Research

Multimodal Learning in Autonomous Driving

Early Anticipation of Driving Maneuvers

Abdul Wasi, Shankar Gangisetty, Shyam Nandan, C V Jawahar

ECCV 2024 (Accepted)

[Paper] [arXiv] [Webpage]

Early anticipation in driving is important in scenarios that demand a preemptive response before a maneuver begins. However, no prior work has addressed the problem of anticipating driver actions before the onset of a maneuver, which limits the ability of advanced driver assistance systems (ADAS) to anticipate maneuvers early. In this work, we introduce Anticipating Driving Maneuvers (ADM), a new task that enables driver action anticipation before the onset of the maneuver.

Visual Place Recognition in Unstructured Driving Environments

Utkarsh Rai, Shankar Gangisetty, Abdul Hafez, Anbumani, C V Jawahar

IROS 2024 (Accepted)

[Paper] [arXiv] [Webpage]

Determining geolocation from visual input, known as Visual Place Recognition (VPR), has attracted significant attention in recent years owing to its potential applications in autonomous driving. We address VPR challenges in unstructured settings by proposing an Indian driving VPR dataset that captures the semantic diversity of unstructured driving environments, including occlusions caused by dynamic scenes, variations in traffic density, viewpoint variability, and changing lighting conditions.
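VPR is commonly evaluated as retrieval: each query's global image descriptor is matched against a database of geotagged descriptors, and a query counts as correctly localized if any of its top-K matches lies within a distance threshold of its true position (Recall@K). The sketch below is a minimal illustration under stated assumptions: descriptors are precomputed, positions are planar coordinates in metres, and the 25 m radius (a common convention in VPR benchmarks) may differ from this dataset's own protocol. The function names and toy data are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def recall_at_k(query_desc, query_pos, db_desc, db_pos, k=1, radius=25.0):
    """Fraction of queries with at least one top-k match within `radius` of the true position."""
    hits = 0
    for qd, qp in zip(query_desc, query_pos):
        # Rank database entries by descending descriptor similarity.
        ranked = sorted(range(len(db_desc)), key=lambda i: cosine(qd, db_desc[i]), reverse=True)
        if any(math.dist(qp, db_pos[i]) <= radius for i in ranked[:k]):
            hits += 1
    return hits / len(query_desc)

# Hypothetical toy data: 3 database places along a road, 2 queries.
db_desc = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
db_pos = [(0.0, 0.0), (100.0, 0.0), (200.0, 0.0)]
query_desc = [[0.9, 0.1], [0.1, 0.9]]
query_pos = [(5.0, 0.0), (210.0, 0.0)]

print(recall_at_k(query_desc, query_pos, db_desc, db_pos, k=1))  # 0.5
print(recall_at_k(query_desc, query_pos, db_desc, db_pos, k=2))  # 1.0
```

The second query shows why VPR is hard in repetitive scenes: its most similar descriptor belongs to a place 110 m away, and only the second-ranked match is correct.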

Multimodal Learning

Making the V in Text-VQA Matter

Shamanthak Hegde, Soumya Jahagirdar, Shankar Gangisetty

CVPR O-DRUM Workshop 2023 (Oral)

[Paper] [arXiv]

Consider a question like "What is written on the signboard?": a model that always predicts "STOP" answers from question priors and ignores the image. To address this issue, we propose learning visual features (making the V matter in TextVQA) alongside OCR and question features, using the VQA dataset as external knowledge for text-based VQA. Specifically, we combine the TextVQA and VQA datasets and train the model on the combined dataset. This simple yet effective approach strengthens the understanding of, and correlation between, the image features and the text present in the image, which helps the model answer questions better.
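At the data level, combining the datasets amounts to pooling and shuffling them so that each training batch can mix scene-text questions with general visual questions. A minimal sketch, assuming each sample is a dict with hypothetical `image`, `question`, and `answer` fields; the field names and the added `source` tag are illustrative, not the paper's data loader.

```python
import random

def combine_datasets(textvqa, vqa, seed=0):
    """Merge two lists of QA samples into one shuffled training pool.

    Each sample is copied and tagged with its source so per-dataset metrics can still be reported.
    """
    combined = [dict(s, source="textvqa") for s in textvqa] + [dict(s, source="vqa") for s in vqa]
    random.Random(seed).shuffle(combined)  # seeded for reproducible ordering
    return combined

def batches(samples, batch_size):
    """Yield consecutive mini-batches from the combined pool."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

textvqa = [{"image": "sign.jpg", "question": "What is written on the signboard?", "answer": "exit"}]
vqa = [{"image": "park.jpg", "question": "How many dogs are there?", "answer": "2"},
       {"image": "room.jpg", "question": "What color is the sofa?", "answer": "red"}]

pool = combine_datasets(textvqa, vqa)
for batch in batches(pool, batch_size=2):
    print([s["source"] for s in batch])
```

Keeping the source tag is a design choice: the model sees one mixed stream, but evaluation can still be split per dataset.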

Weakly Supervised Visual Question Answer Generation

Charani Alampalle, Shamanthak Hegde, Soumya Jahagirdar, Shankar Gangisetty

CVPR O-DRUM Workshop 2023 (Oral)

[Paper] [arXiv]

Growing interest in conversational agents has made two-way human-computer communication, involving asking and answering visual questions, an active area of research in AI. Generating visual question-answer pairs has therefore become an important and challenging task. To address this, we propose a weakly supervised visual question answer generation method that generates relevant question-answer pairs for a given input image and its associated caption. Unlike most prior works, which are supervised and depend on annotated question-answer datasets, our method synthetically generates question-answer pairs procedurally from visual information and captions.
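To illustrate what procedural generation from a caption can look like, the sketch below extracts simple "count, optional color, noun" phrases with a regular expression and fills fixed question templates. It is a hypothetical, caption-only toy: the word lists, pattern, and templates are assumptions, and the paper's method also uses visual information.

```python
import re

NUMBER_WORDS = {"a": 1, "an": 1, "one": 1, "two": 2, "three": 3, "four": 4}
COLORS = ("red", "blue", "green", "white", "black", "yellow")

# Matches phrases such as "two red cars" or "a dog": count word, optional color, noun.
PHRASE = re.compile(
    r"\b(" + "|".join(NUMBER_WORDS) + r")\s+(?:(" + "|".join(COLORS) + r")\s+)?([a-z]+)"
)

def generate_qa(caption):
    """Synthesize (question, answer) pairs from a caption using fixed templates."""
    pairs = []
    for number, color, noun in PHRASE.findall(caption.lower()):
        count = NUMBER_WORDS[number]
        if count > 1:  # counting questions only make sense for plural phrases
            pairs.append((f"How many {noun} are there?", str(count)))
        if color:  # findall returns "" when the optional color group did not match
            verb = "are" if count > 1 else "is"
            pairs.append((f"What color {verb} the {noun}?", color))
    return pairs

for q, a in generate_qa("Two red cars are parked next to a white van."):
    print(q, "->", a)
```

Because no annotated QA pairs are needed, such generators scale to any captioned image collection; the cost is limited question diversity, which is why templates are usually only one ingredient.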

Look, Read and Ask: Learning to Ask Questions by Reading Text in Images

Soumya Jahagirdar, Shankar Gangisetty and Anand Mishra

ICDAR 2021 (Oral)

[Paper] [GitHub] [Webpage] [Video] [OpenAccess]

TextVQG aims to generate a natural-language question for a given input image and a text token automatically extracted from it (an OCR token), such that the OCR token is the answer to the generated question.

ShvasAarogya Project

Link for sharing vocal data: https://shvasa.kletech.ac.in/

May 2021

A study to collect sound samples for developing non-invasive diagnosis of respiratory diseases and COVID-19.

3D Scene Understanding

SHREC'22: Open-set 3D Object Retrieval

Yifan Feng, Yue Gao, Xibin Zhao, Yandong Guo, Nihar Bagewadi, Nhat-Tan Bui, Hieu Dao, Shankar Gangisetty, et al.

Computers & Graphics 2022

[Paper]

The goal of this track is to evaluate the performance of different retrieval algorithms under the open-set setting and the modality-missing setting. Since objects from unseen categories are very common in real-world applications, we design the open-set 3D object retrieval task to extend traditional 3D object retrieval.

PIG-Net: Inception based Deep Learning architecture for 3D point cloud segmentation

Sindhu Hegde, Shankar Gangisetty

Computers & Graphics 2021

[Paper] [arXiv] [Webpage]

Unlike humans, machines find it challenging to analyze the parts of an object, yet this ability is essential in various machine vision applications. We propose PIG-Net, an inception-based architecture for 3D part segmentation of point clouds.
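An inception-style block runs several branches with different receptive fields in parallel and concatenates their outputs. The sketch below transposes that idea to a raw point cloud by using k-nearest-neighbor neighborhoods of several sizes as the branches. The branch function (offset from a point to its local centroid) and the scales (2, 4, 8) are hypothetical choices for illustration; PIG-Net's branches are learned layers, not this hand-crafted feature.

```python
import math

def knn(points, idx, k):
    """Indices of the k nearest neighbors of points[idx], excluding the point itself."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]

def branch(points, idx, k):
    """One branch: 3D offset from the point to the centroid of its k nearest neighbors."""
    nbrs = knn(points, idx, k)
    centroid = [sum(points[j][d] for j in nbrs) / k for d in range(3)]
    return [c - p for c, p in zip(centroid, points[idx])]

def inception_features(points, scales=(2, 4, 8)):
    """Run the branches at every scale in parallel and concatenate their outputs per point."""
    return [sum((branch(points, i, k) for k in scales), []) for i in range(len(points))]

# Hypothetical toy cloud: 10 points on a line along the x-axis.
points = [(float(i), 0.0, 0.0) for i in range(10)]
feats = inception_features(points)
print(len(feats), len(feats[0]))  # 10 points, 3 values x 3 scales = 9 features each
print(feats[0])
```

Small neighborhoods capture fine local geometry while large ones capture part-level context; concatenation lets later layers use both, which is the motivation for inception-style designs.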

SHREC 2020: 3D point cloud semantic segmentation for street scenes

Tao Ku, Remco C. Veltkamp, Kiran Akadas, Shankar Gangisetty, et al.

Computers & Graphics 2020 - Eurographics 3DOR

[Paper] [GitHub]

Scene understanding of large-scale 3D point clouds of outdoor scenes is a challenging task. We propose GRanD-Net for semantic segmentation of large-scale 3D point clouds of street scenes.

3D Semantic Segmentation for Large-Scale Scene Understanding

Kiran Akadas, Shankar Gangisetty

ACCVW 2020

[Paper] [OpenAccess] [GitHub]

3D semantic segmentation, which detects and identifies the various objects in a scene, is one of the most challenging tasks in robotic vision. We propose a lightweight semantic segmentation network with dilated convolutions.
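Dilated convolutions help keep a network lightweight: a kernel of size k with dilation d spaces its taps d samples apart, so it covers (k - 1)·d + 1 inputs with only k weights, and stacking layers with growing dilation enlarges the receptive field quickly. The 1D sketch below only illustrates the operation and the receptive-field arithmetic; the paper's network operates on 3D data, and the function names are hypothetical.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (no padding), stride-1 1D convolution with kernel taps `dilation` samples apart.

    As in deep-learning frameworks, this is cross-correlation: the kernel is not flipped.
    """
    span = (len(kernel) - 1) * dilation + 1  # input samples covered by one output
    return [
        sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]

def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

x = list(range(10))
print(dilated_conv1d(x, [1, 1, 1], dilation=1))  # [3, 6, 9, 12, 15, 18, 21, 24]
print(dilated_conv1d(x, [1, 1, 1], dilation=2))  # [6, 9, 12, 15, 18, 21]
print(receptive_field(3, [1, 2, 4]))             # 15
```

Three 3-tap layers with dilations 1, 2, 4 see 15 inputs, whereas three undilated 3-tap layers would see only 7, with the same number of weights.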

3D Representations

An Evaluation of Feature Encoding Techniques for Non-Rigid and Rigid 3D Point Cloud Retrieval

Sindhu Hegde, Shankar Gangisetty

BMVC 2019

[Paper]

As powerful computational resources and scanning devices have led to exponential growth in 3D point cloud data, retrieving relevant 3D objects from databases has become a challenging task. We argue that 3D feature encoding, used together with local descriptors, is necessary for non-rigid and rigid point cloud retrieval.
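Feature encoding turns a variable-size set of local descriptors (one per point or keypoint) into a fixed-length global vector that can be compared across objects. Bag-of-words (BoW) is one such encoding; the evaluation in the paper may cover other encoders, so this is only an illustration of the idea. The sketch assumes a hypothetical, pre-learned codebook and 2D toy descriptors.

```python
import math

def nearest_codeword(desc, codebook):
    """Index of the codeword closest (Euclidean) to a local descriptor."""
    return min(range(len(codebook)), key=lambda c: math.dist(desc, codebook[c]))

def bow_encode(descriptors, codebook):
    """L1-normalized histogram of codeword assignments: a fixed-length global descriptor."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest_codeword(d, codebook)] += 1.0
    return [h / len(descriptors) for h in hist]

codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]              # hypothetical; usually learned by k-means
shape_a = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.8), (4.8, 5.2)]  # 4 local descriptors
shape_b = [(0.0, 0.2), (1.1, 1.0), (5.1, 4.9)]              # 3 local descriptors

enc_a, enc_b = bow_encode(shape_a, codebook), bow_encode(shape_b, codebook)
print(enc_a)                     # [0.25, 0.5, 0.25]
print(enc_b)
print(math.dist(enc_a, enc_b))  # retrieval ranks database objects by this distance
```

Both encodings have the same length even though the objects have different numbers of descriptors, which is what makes direct comparison, and hence retrieval, possible.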

Evaluation of Point Cloud Categorization for Rigid and Non-Rigid 3D Objects

Shankar Gangisetty, Sindhu Hegde, Supriya Satyappanavar, Uma Mudenagudi

ICVGIP 2018, WiCV@CVPR 2019 (show)

[Paper]

We propose a 3D object categorization framework that uses IWKS and MTCS feature descriptors, computed by approximating the Laplace-Beltrami operator on point cloud data, for non-rigid objects, and MTCS descriptors for rigid objects.

Underwater Imaging

FloodNet: Underwater image restoration based on residual dense learning

Shankar Gangisetty, Raghu Raj Rai

Signal Processing: Image Communication 2022

[Paper]

In recent years, various methods have relied on the underwater image formation model and on deep learning techniques to restore underwater images, but they tend to produce unnatural artifacts and reduced sharpness. To address these challenges, we propose FloodNet, which uses residual dense learning to estimate restored underwater images from a wide variety of degraded underwater images.

Underwater image restoration using deep encoder–decoder network with symmetric skip connections

Shankar Gangisetty, Raghu Raj Rai

Signal, Image and Video Processing 2022

[Paper]

We propose an end-to-end deep convolutional network architecture that restores degraded underwater images and improves visual perception by using symmetric skip connections between the encoder and the decoder.
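Symmetric skip connections link each encoder level to its mirror-image decoder level, so fine detail lost by downsampling can be added back during reconstruction. The weight-free 1D sketch below shows only this wiring; real networks place learned convolutions at every level, and the pooling, upsampling, and additive merge used here are hypothetical stand-ins rather than the paper's layers.

```python
def downsample(x):
    """Average-pool adjacent pairs of samples (halves the length)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]

def upsample(x):
    """Nearest-neighbor upsampling (doubles the length)."""
    return [v for v in x for _ in range(2)]

def encoder_decoder(x, depth=2):
    """Encode by repeated downsampling, then decode with symmetric additive skips.

    Assumes len(x) is divisible by 2**depth so shapes match at every level.
    """
    skips = []
    for _ in range(depth):
        skips.append(x)      # remember the encoder feature at this resolution
        x = downsample(x)
    for skip in reversed(skips):  # deepest encoder level feeds the first decoder level
        x = [u + s for u, s in zip(upsample(x), skip)]
    return x

signal = [1.0, 2.0, 3.0, 4.0]
print(encoder_decoder(signal))  # [5.0, 6.0, 9.0, 10.0]
```

Without the skips, the output would be the constant bottleneck value 2.5 repeated, so the variation across samples in the result comes entirely from the encoder features passed through the skip connections.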

3D Data Processing and Indian Digital Heritage

Region of Interest-Based 3D Inpainting of Cultural Heritage Artifacts

Shankar Gangisetty, Uma Mudenagudi

JOCCH 2018

[Paper]

Example-based 3D inpainting of point clouds using metric tensor and Christoffel symbols

Shankar Gangisetty, Uma Mudenagudi

MVAP 2018

[Paper]

3D Data Processing of Cultural Heritage Artifacts

Shankar Gangisetty (Advisor: Uma Mudenagudi)

ICVGIP 2016

Best Doctoral Symposium Award

With the advent of modern digital technology, there has been a surge of interest in the computer graphics and vision community in the digital restoration of 3D models, yet many problems remain in building a complete 3D reconstruction framework. In our research, we address the digital restoration of 3D models at cultural heritage sites, specifically artifacts at Hampi, India. We propose to build a 3D reconstruction pipeline, covering acquisition, data processing, and rendering, for the digital restoration of 3D models at cultural heritage sites.

Region of Interest (ROI) Based 3D Inpainting

Shankar Gangisetty, Himanshu Shekhar, Uma Mudenagudi

SIGGRAPH ASIA 2016 (Poster)

[Paper]

We address the problem of 3D inpainting of point clouds using an ROI-based method. We focus on inpainting complex, irregular, and large missing regions that cover prominent geometric features by considering n self-similar examples.