Research

Multimodal Learning

Making the V in Text-VQA Matter 

Shamanthak Hegde, Soumya Jahagirdar, Shankar Gangisetty

CVPR O-DRUM Workshop 2023 (Oral)

[Paper] [arXiv]

Consider a question like "What is written on the signboard?": the answer predicted by the model is always "STOP", which leads the model to ignore the image. To address this issue, we propose a method to learn visual features (making V matter in TextVQA) along with the OCR features and question features, using the VQA dataset as external knowledge for text-based VQA. Specifically, we combine the TextVQA dataset and the VQA dataset and train the model on this combined dataset. This simple yet effective approach increases the understanding of, and correlation between, the image features and the text present in the image, which helps the model answer questions better.

Weakly Supervised Visual Question Answer Generation 

Charani Alampalle, Shamanthak Hegde, Soumya Jahagirdar, Shankar Gangisetty

CVPR O-DRUM Workshop 2023 (Oral)

[Paper] [arXiv]

With growing interest in conversational agents, two-way human-computer communication that involves asking and answering visual questions has become an active area of research in AI. Generating visual question-answer pairs is therefore an important and challenging task. To address it, we propose a weakly supervised visual question answer generation method that generates relevant question-answer pairs for a given input image and its associated caption. Most prior works are supervised and depend on annotated question-answer datasets. In contrast, our method synthetically generates question-answer pairs procedurally from visual information and captions.

Look, Read and Ask: Learning to Ask Questions by Reading Text in Images

Soumya Jahagirdar, Shankar Gangisetty and Anand Mishra

ICDAR 2021 (Oral)

[Paper] [GitHub] [Webpage] [Video] [OpenAccess]

TextVQG aims to generate a natural language question for a given input image and a piece of text automatically extracted from it (also known as an OCR token), such that the OCR token is the answer to the generated question.

ShvasAarogya Project

Link for sharing vocal data: https://shvasa.kletech.ac.in/

May 2021

A study to collect sound samples for developing non-invasive diagnosis of respiratory diseases and COVID-19.

3D Scene Understanding

SHREC'22: Open-set 3D Object Retrieval 

Yifan Feng, Yue Gao, Xibin Zhao, Yandong Guo, Nihar Bagewadi, Nhat-Tan Bui, Hieu Dao, Shankar Gangisetty, et al.

Computers & Graphics 2022

[Paper]

The goal of this track is to evaluate the performance of different retrieval algorithms under the open-set setting and the modality-missing setting. Since objects from unseen categories are very common in real-world applications, we design open-set 3D object retrieval to expand the application of traditional 3D object retrieval.

PIG-Net: Inception based Deep Learning architecture for 3D point cloud segmentation

Sindhu Hegde, Shankar Gangisetty

Computers & Graphics 2021

[Paper] [arXiv] [Webpage]

Unlike in human vision, training a machine to analyze the segments of an object is challenging, yet essential in many machine vision applications. We propose PIG-Net, an Inception-based architecture for 3D part segmentation of point clouds.

SHREC 2020: 3D point cloud semantic segmentation for street scenes

Tao Ku, Remco C. Veltkamp, Kiran Akadas, Shankar Gangisetty, et al.

Computers & Graphics 2020 - EuroGraphics 3DOR

[Paper] [GitHub]

Scene understanding of large-scale 3D point clouds of outdoor spaces is a challenging task. We propose GRanD-Net for semantic segmentation of large-scale 3D point clouds of street scenes.

3D Semantic Segmentation for Large-Scale Scene Understanding 

Kiran Akadas, Shankar Gangisetty  

ACCVW 2020

[Paper] [OpenAccess] [GitHub]

3D semantic segmentation is one of the most challenging tasks in robotic vision for detecting and identifying the various objects in a scene. We propose a lightweight semantic segmentation network with dilated convolutions.

3D Representations

An Evaluation of Feature Encoding Techniques for Non-Rigid and Rigid 3D Point Cloud Retrieval

Sindhu Hegde, Shankar Gangisetty

BMVC 2019

[Paper]

As powerful computation resources and scanning devices have led to an exponential growth of 3D point cloud data, retrieving the relevant 3D objects from databases has become a challenging task. We argue for the necessity of 3D feature encoding along with local descriptors for solving non-rigid and rigid point cloud retrieval.

Evaluation of Point Cloud Categorization for Rigid and Non-Rigid 3D Objects

Shankar Gangisetty, Sindhu Hegde, Supriya Satyappanavar, Uma Mudenagudi

ICVGIP 2018, WiCV@CVPR 2019

[Paper]

We propose a 3D object categorization framework comprising IWKS and MTCS feature descriptors, obtained by approximating the Laplace-Beltrami operator on point cloud data, for non-rigid objects, and MTCS for rigid objects.

Underwater Imaging

FloodNet: Underwater image restoration based on residual dense learning

Shankar Gangisetty, Raghu Raj Rai

Signal Processing: Image Communication 2022 

[Paper]

In recent years, different methods have relied on the underwater image formation model and deep learning techniques to restore underwater images, but they tend to produce unnatural artifacts and reduced sharpness. To address these challenges, we propose FloodNet, which uses residual dense learning to estimate restored underwater images from a wide variety of degraded underwater images.

Underwater image restoration using deep encoder–decoder network with symmetric skip connections 

Shankar Gangisetty, Raghu Raj Rai

Signal, Image and Video Processing 2022 

[Paper]

We propose an end-to-end deep convolutional network architecture that restores degraded underwater images and improves visual perception by using symmetric skip connections between the encoder and decoder.

3D Data Processing and Indian Digital Heritage

Region of Interest-Based 3D Inpainting of Cultural Heritage Artifacts

Shankar Gangisetty, Uma Mudenagudi

JOCCH 2018

[Paper]

Example-based 3D inpainting of point clouds using metric tensor and Christoffel symbols

Shankar Gangisetty, Uma Mudenagudi

MVAP 2018

[Paper]

3D Data Processing of Cultural Heritage Artifacts

Shankar Gangisetty (Advisor: Uma Mudenagudi)

ICVGIP 2016 

Best Doctoral Symposium Award

With the advent of modern digital technology, there is a great surge of interest among the computer graphics and vision community in the digital restoration of 3D models, yet many problems still exist in building a complete 3D reconstruction framework. In our research, we address the problems of digital restoration of 3D models at cultural heritage sites, specifically artifacts at Hampi, India. We propose to build a 3D reconstruction pipeline, covering acquisition, data processing, and rendering, for digital restoration of 3D models at cultural heritage sites.

Region of Interest (ROI) Based 3D Inpainting

Shankar Gangisetty, Himanshu Shekhar, Uma Mudenagudi

SIGGRAPH ASIA 2016 (Poster)

[Paper]

We address the problem of 3D inpainting using an ROI-based method for point clouds. We focus on inpainting complex, irregular, and large missing regions that cover prominent geometric features by considering n self-similar examples.

Framework for 3D object hole filling

Shankar Gangisetty, Syed Altaf Ganihar, Uma Mudenagudi

NCVPRIPG 2015

[Paper]

Metric Tensor and Christoffel Symbols Based 3D Object Categorization

Syed Altaf Ganihar, Shreyas Joshi, Shankar Gangisetty, Uma Mudenagudi

SIGGRAPH 2014 (Poster), ACCVW 2014

[Paper SIGGRAPH] [Paper ACCV]

3D object decomposition and super resolution

Syed Altaf Ganihar, Shreyas Joshi, Shankar Gangisetty, Uma Mudenagudi

SIGGRAPH ASIA 2014 (Poster)

[Paper]

3D Object Super Resolution using Metric Tensor and Christoffel Symbols

Syed Altaf Ganihar, Shreyas Joshi, Shankar Gangisetty, Uma Mudenagudi

ICVGIP 2014

[Paper]

Realistic Walkthrough of Cultural Heritage Sites-Hampi

Uma Mudenagudi, Syed Altaf Ganihar, Shreyas Joshi, Shankar Gangisetty, Prem Kalra, et al.

ACCVW 2014

[Paper]

Misc

Indian movie face database: a benchmark for face recognition under wide variations

Shankar Gangisetty, Moula Husain, C V Jawahar, et al.

NCVPRIPG 2013

[Paper] [Webpage]

Multi Class Video Categorization Using Noise Free Text

Swathi Shetty, Shankar Gangisetty

ICDMW 2013

[Paper]