Research
Multimodal Learning
Consider a question like "What is written on the signboard?": the answer predicted by the model is always "STOP", which means the model ignores the image. To address these issues, we propose a method to learn visual features (making V matter in TextVQA) along with the OCR features and question features, using the VQA dataset as external knowledge for text-based VQA. Specifically, we combine the TextVQA dataset and the VQA dataset and train the model on this combined dataset. This simple yet effective approach increases the understanding of, and correlation between, the image features and the text present in the image, which helps the model answer questions better.
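The dataset combination described above can be sketched as follows. This is a minimal illustration, not the authors' training code: the `Sample` record, its field names, the `combine_datasets` helper, and the choice to give VQA samples an empty OCR token list are all assumptions made for the example.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    """One hypothetical training example; the field names are illustrative only."""
    image_id: str
    question: str
    answer: str
    ocr_tokens: List[str] = field(default_factory=list)  # assumed empty for VQA samples
    source: str = "textvqa"


def combine_datasets(textvqa: List[Sample], vqa: List[Sample], seed: int = 0) -> List[Sample]:
    """Concatenate both datasets and shuffle so that batches mix the two sources."""
    combined = list(textvqa) + list(vqa)
    random.Random(seed).shuffle(combined)  # seeded for reproducibility
    return combined


# Toy data: one text-reading question and one purely visual question.
textvqa = [Sample("img1", "What is written on the signboard?", "stop", ["stop"], "textvqa")]
vqa = [Sample("img2", "What color is the car?", "red", [], "vqa")]
combined = combine_datasets(textvqa, vqa)
```

In this sketch a model trained on `combined` sees questions that cannot be answered from OCR tokens alone, which is one way to read the motivation stated above for making the visual features matter.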
With growing interest in conversational agents, two-way human-computer communication that involves asking and answering visual questions has become an active area of research in AI. Generating visual question-answer pairs is therefore an important and challenging task. To address it, we propose a weakly supervised visual question-answer generation method that generates relevant question-answer pairs for a given input image and its associated caption. Most prior works are supervised and depend on annotated question-answer datasets. In our work, we present a weakly supervised method that procedurally synthesizes question-answer pairs from visual information and captions.
Look, Read and Ask: Learning to Ask Questions by Reading Text in Images
Soumya Jahagirdar, Shankar Gangisetty and Anand Mishra
ICDAR 2021 (Oral)
[Paper] [GitHub] [Webpage] [Video] [OpenAccess]
TextVQG aims to generate a natural language question for a given input image and a piece of text automatically extracted from it, also known as an OCR token, such that the OCR token is the answer to the generated question.
A study to collect sound samples for developing non-invasive diagnosis of respiratory diseases and COVID-19.
3D Scene Understanding
The goal of this track is to evaluate the performance of different retrieval algorithms under the open-set setting and the modality-missing setting. Since objects from unseen categories are very common in real-world applications, we design the open-set 3D object retrieval task to extend the applicability of traditional 3D object retrieval.
Unlike in human vision, supervising a machine to analyze the segments of an object is challenging, yet essential in various machine vision applications. We propose PIG-Net, an inception-based network for 3D part segmentation of point clouds.
Scene understanding of large-scale 3D point clouds of outdoor spaces is a challenging task. We propose GRanD-Net for semantic segmentation of large-scale 3D point clouds of street scenes.
3D Semantic Segmentation for Large-Scale Scene Understanding
Kiran Akadas, Shankar Gangisetty
ACCVW 2020
[Paper] [OpenAccess] [GitHub]
3D semantic segmentation is one of the most challenging tasks in robotic vision for detecting and identifying the various objects in a scene. We propose a lightweight semantic segmentation network with dilated convolutions.
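To illustrate what a dilated convolution does, here is a minimal 1D sketch. It is not the network from the paper, which operates on 3D point clouds; the function names and the toy signal are assumptions for the example. It shows that spacing the kernel taps apart enlarges the receptive field without adding weights, which is one common reason dilated convolutions appear in lightweight networks.

```python
from typing import List


def receptive_field(kernel_size: int, dilation: int) -> int:
    """Number of input positions spanned by one output value."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(signal: List[float], kernel: List[float], dilation: int = 1) -> List[float]:
    """'Valid' 1D cross-correlation whose kernel taps are `dilation` steps apart."""
    span = receptive_field(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        # Each tap k reads the input at offset k * dilation from the window start.
        out.append(sum(w * signal[start + k * dilation] for k, w in enumerate(kernel)))
    return out


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
kernel = [1.0, 1.0, 1.0]  # three weights in both cases

dense = dilated_conv1d(signal, kernel, dilation=1)    # each output sees 3 inputs
dilated = dilated_conv1d(signal, kernel, dilation=2)  # each output sees a span of 5 inputs
```

With the same three weights, `receptive_field(3, 1)` is 3 while `receptive_field(3, 2)` is 5.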
3D Representations
An Evaluation of Feature Encoding Techniques for Non-Rigid and Rigid 3D Point Cloud Retrieval
Sindhu Hegde, Shankar Gangisetty
BMVC 2019
[Paper]
As powerful computation resources and scanning devices have led to an exponential growth of 3D point cloud data, retrieving the relevant 3D objects from databases has become a challenging task. We argue for the necessity of 3D feature encoding alongside local descriptors for solving non-rigid and rigid point cloud retrieval.
Evaluation of Point Cloud Categorization for Rigid and Non-Rigid 3D Objects
Shankar Gangisetty, Sindhu Hegde, Supriya Satyappanavar, Uma Mudenagudi
ICVGIP 2018, WiCV@CVPR 2019
[Paper]
We propose a 3D object categorization framework comprising IWKS and MTCS feature descriptors, obtained by approximating the Laplace-Beltrami operator on point cloud data, for non-rigid objects, and MTCS for rigid objects.
Underwater Imaging
FloodNet: Underwater image restoration based on residual dense learning
Shankar Gangisetty, Raghu Raj Rai
Signal Processing: Image Communication 2022
[Paper]
In recent years, different methods have relied on the underwater image formation model and deep learning techniques to restore underwater images, but they tend to produce unnatural artifacts and reduced sharpness. To address these challenges, we propose FloodNet, which uses residual dense learning to estimate restored underwater images from a wide variety of degraded underwater images.
Underwater image restoration using deep encoder–decoder network with symmetric skip connections
Shankar Gangisetty, Raghu Raj Rai
Signal, Image and Video Processing 2022
[Paper]
We propose an end-to-end deep convolutional network architecture that restores degraded underwater images and improves visual perception by using symmetric skip connections between the encoder and decoder.
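The idea of symmetric skip connections can be sketched with a toy 1D encoder-decoder. This is only an illustration under stated assumptions, not the published architecture: `downsample` and `upsample` are simple stand-ins for learned convolution and deconvolution layers, the skip is merged by addition, and all names are hypothetical.

```python
from typing import List


def downsample(x: List[float]) -> List[float]:
    """Average neighbouring pairs (stand-in for a strided convolution)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def upsample(x: List[float]) -> List[float]:
    """Repeat each value twice (stand-in for a transposed convolution)."""
    return [v for v in x for _ in range(2)]


def encode_decode(x: List[float], depth: int = 2) -> List[float]:
    """Run `depth` encoder steps, then `depth` decoder steps with mirrored skips.

    The input length is assumed to be divisible by 2 ** depth.
    """
    skips = []
    for _ in range(depth):
        skips.append(x)  # remember the feature at this resolution
        x = downsample(x)
    for _ in range(depth):
        x = upsample(x)
        skip = skips.pop()  # last saved, first reused: the symmetric pairing
        x = [a + b for a, b in zip(x, skip)]  # merge the skip by addition
    return x


restored = encode_decode([1.0, 3.0, 5.0, 7.0], depth=2)
```

Because the skips are popped in reverse order, each decoder stage receives the encoder feature of the same resolution, so fine detail lost during downsampling is reintroduced on the way back up.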
3D Data Processing and Indian Digital Heritage
Region of Interest-Based 3D Inpainting of Cultural Heritage Artifacts
Shankar Gangisetty, Uma Mudenagudi
JOCCH 2018
[Paper]
Example-based 3D inpainting of point clouds using metric tensor and Christoffel symbols
Shankar Gangisetty, Uma Mudenagudi
MVAP 2018
[Paper]
3D Data Processing of Cultural Heritage Artifacts
Shankar Gangisetty (Advisor: Uma Mudenagudi)
Best Doctoral Symposium Award
With the advent of modern digital technology, there has been a great surge of interest in the computer graphics and vision community in the digital restoration of 3D models, yet many problems remain in building a complete 3D reconstruction framework. In our research work, we address the problems of digital restoration of 3D models at cultural heritage sites, specifically artifacts at Hampi, India. We propose to build a 3D reconstruction pipeline, covering acquisition, data processing and rendering, for the digital restoration of 3D models at cultural heritage sites.
Region of Interest (ROI) Based 3D Inpainting
Shankar Gangisetty, Himanshu Shekhar, Uma Mudenagudi
SIGGRAPH ASIA 2016 (Poster)
[Paper]
We address the problem of 3D inpainting of point clouds using an ROI-based method. We focus on inpainting complex, irregular and large missing regions that cover prominent geometric features by considering n self-similar examples.
Framework for 3D object hole filling
Shankar Gangisetty, Syed Altaf Ganihar, Uma Mudenagudi
NCVPRIPG 2015
[Paper]
Metric Tensor and Christoffel Symbols Based 3D Object Categorization
Syed Altaf Ganihar, Shreyas Joshi, Shankar Gangisetty, Uma Mudenagudi
SIGGRAPH 2014 (Poster), ACCVW 2014
3D object decomposition and super resolution
Syed Altaf Ganihar, Shreyas Joshi, Shankar Gangisetty, Uma Mudenagudi
SIGGRAPH ASIA 2014 (Poster)
[Paper]
3D Object Super Resolution using Metric Tensor and Christoffel Symbols
Syed Altaf Ganihar, Shreyas Joshi, Shankar Gangisetty, Uma Mudenagudi
ICVGIP 2014
[Paper]
Realistic Walkthrough of Cultural Heritage Sites-Hampi
Uma Mudenagudi, Syed Altaf Ganihar, Shreyas Joshi, Shankar Gangisetty, Prem Kalra, et al.
ACCVW 2014
[Paper]
Misc
Multi Class Video Categorization Using Noise Free Text
Swathi Shetty, Shankar Gangisetty
ICDMW 2013
[Paper]