Sweta Kumari

Ph.D.
CNS Lab, IIT Madras

About Me

I am currently pursuing a Ph.D. in the Computational Neuroscience Lab (CNS Lab) under the supervision of Prof. V. Srinivasa Chakravarthy, Department of Biotechnology, IIT Madras. I lead a team of 10, Neuromotive, which works in collaboration with the German automotive company Continental. Our team is devoted to developing a bio-inspired model for object detection and classification by mimicking the "what" and "where" pathways of the primary visual cortex in the human brain.

Research Interest

  • Artificial Intelligence in Neuroscience

  • Visual Attention

  • Hemineglect

  • Object Detection

Education

IIT Madras

Ph.D. in Computational Neuroscience

MANIT Bhopal

M.Tech in Bioinformatics

MAKAUT WB

B.Tech in Computer Science and Engineering

Project

Virtual Reality

VR_Assignment_Project_Demo_3D.mp4

Augmented Reality

2018-11-07-03-58-44-T6ILT.mp4

VR Model of IIT Madras Campus (GitHub Repository)

Traffic Sign Prediction Using Modified Q-Learning

Traffic sign detection is an important component of Advanced Driver Assistance Systems (ADAS), and deep learning has contributed immensely to progress in this field. The current state-of-the-art methods for traffic sign detection, such as Faster R-CNN and YOLO9000, use convolutional neural networks (CNNs) and were developed to detect objects in a single image. A more naturalistic approach to detecting traffic signs is to use road video data and harness the temporal information of traffic sign movement. We propose a deep Q neural network for predicting the location of traffic signs in the next frame, given the current frame. The proposed network has two parts: feature extraction and "value" estimation using reinforcement learning. First, a pretrained AlexNet is used to extract features from the input frames. Second, a convolutional layer is added that outputs a 29 x 63 feature map, which computes the value estimate for the input frame. For each frame, a reward of 1 is defined at the locations where the next frame contains traffic signs, and this reward map is resized to 29 x 63. The weights of the network are fine-tuned on grayscale frames of road video data by backpropagating the temporal-difference error. The network was trained on 12,224 frames and tested on 715 frames. The model outputs an activity map, and after contour detection, the coordinates where the traffic sign is likely to appear in the next frame are obtained. Our model achieved a mean average precision (mAP) of 56.001%, beating both Faster R-CNN and YOLO9000, which achieved mAPs of 53.85% and 49.648% respectively. Our model took 18 hours to train on a GPU, compared to 36 and 83 hours for Faster R-CNN and YOLO9000 respectively. Thus, our model not only outperforms both state-of-the-art models in accuracy, but also takes less time to train and generate predictions.
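The sketch below outlines this two-part architecture (pretrained AlexNet features plus an added convolutional "value" head trained with a temporal-difference loss). It is only a minimal illustration, assuming a PyTorch/torchvision implementation; the class and function names, the upsampling used to reach the 29 x 63 map, and the discount factor are assumptions, not the exact configuration used in the paper.

```python
# Minimal sketch of the prediction network; layer choices beyond what the
# abstract states (29 x 63 value map, AlexNet backbone, TD error) are assumptions.
import torch
import torch.nn as nn
from torchvision import models

class TrafficSignValueNet(nn.Module):
    """AlexNet features followed by a conv layer that scores each cell of a
    29 x 63 grid with the 'value' of a traffic sign appearing there."""
    def __init__(self):
        super().__init__()
        backbone = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)  # pretrained backbone
        self.features = backbone.features
        self.value_head = nn.Sequential(                                   # added convolutional layer
            nn.Conv2d(256, 1, kernel_size=1),
            nn.Upsample(size=(29, 63), mode="bilinear", align_corners=False),
        )

    def forward(self, frame):
        return self.value_head(self.features(frame))   # (B, 1, 29, 63) activity map


def td_loss(value_t, value_tp1, reward_tp1, gamma=0.9):
    """Temporal-difference error between the current frame's value map and the
    reward map plus the discounted value map of the next frame."""
    target = reward_tp1 + gamma * value_tp1.detach()
    return torch.mean((target - value_t) ** 2)


if __name__ == "__main__":
    net = TrafficSignValueNet()
    frame_t = torch.randn(1, 3, 224, 224)       # grayscale frames replicated to 3 channels
    frame_tp1 = torch.randn(1, 3, 224, 224)
    reward = torch.zeros(1, 1, 29, 63)           # 1 where the next frame contains a sign
    loss = td_loss(net(frame_t), net(frame_tp1), reward)
    loss.backward()
```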

[Paper]

Results on Grayscale Dataset

Faster_RCNN_Triangle_Night.mp4

Faster R-CNN

YOLO_Triangle_Night.mp4

YOLOv2

Prediction_Network.mp4

Prediction Network

Results on Color Dataset

video.avi

Traffic sign

test.avi

Traffic lights

video.avi

Pedestrian

Traffic Sign Detection by Attentional Search Using a Deep Q-Neural Network

Object detection in autonomous driving requires high performance in terms of both accuracy and speed, so that quick and correct decisions can be made to avoid accidents. Detecting traffic signs in low-quality images, in bad weather, at night, and when signs are damaged or obstructed are some of the existing challenges. Although deep learning-based solutions have demonstrated excellent performance on various pattern recognition tasks, their failures are creating a need for more bio-inspired solutions to the same problems. We propose the Attentional Search Network (ASN), a model that incorporates attentional search inspired by the human visual system. Instead of searching for an object at every pixel of the image, as the state-of-the-art model YOLOv2 does, our model is motivated to search for it near its expected location. The Attentional Search Network learns the saliency map for the traffic sign in the given image frame by combining global and local information. The network has two parts: feature extraction from the image, which provides global information, and feature extraction from the soft-attention image, which provides local information. A reward matrix for the next frame is created from the training dataset. The proposed network was trained for 18 hours on 13,248 RGB frames of a road video recorded during both day and night, with the objective of minimizing the temporal-difference error (TDE), and achieved a mean average precision (mAP) of 41.53%, outperforming YOLOv2, which requires 24 hours of training and reaches a mAP of 18.01% on the same testing dataset of 1,893 frames.
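A minimal sketch of the two-branch idea described above, assuming a PyTorch implementation; the convolutional widths, the input resolution, and the names AttentionalSearchNet and conv_branch are illustrative assumptions rather than the paper's exact architecture.

```python
# Two feature-extraction branches: one over the full frame (global information),
# one over the soft-attention image (local information), fused into a saliency map.
import torch
import torch.nn as nn

def conv_branch():
    """Small convolutional feature extractor; both branches share this structure."""
    return nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=5, stride=2, padding=2), nn.ReLU(),
        nn.Conv2d(16, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
    )

class AttentionalSearchNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.global_branch = conv_branch()   # features of the full frame
        self.local_branch = conv_branch()    # features of the soft-attention image
        self.saliency_head = nn.Conv2d(64, 1, kernel_size=1)

    def forward(self, frame, soft_attention_image):
        g = self.global_branch(frame)
        l = self.local_branch(soft_attention_image)
        return self.saliency_head(torch.cat([g, l], dim=1))  # predicted saliency map


if __name__ == "__main__":
    net = AttentionalSearchNet()
    frame = torch.randn(1, 3, 120, 256)
    soft_attention = torch.randn(1, 3, 120, 256)  # frame weighted around the expected sign location
    saliency = net(frame, soft_attention)
    # Training would minimize a temporal-difference error between this saliency
    # map and the reward matrix built for the next frame.
    print(saliency.shape)
```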

[Reports]

HybridModelMixedvideo.avi

Attentional search network (ASN)

test.avi

ASN on Traffic lights

YOLOVideoMixed.avi

YOLOv2

Convolutional Elman Jordan Neural Network for Reconstruction and Classification Using Attention Window

In deep learning-based visual pattern recognition systems, the entire image is typically presented to the system for recognition. However, the human visual system often scans a large visual object through sequential shifts of attention, which are integrated for visual classification. Even in artificial domains, such sequential integration is particularly useful when the input image is too large. Some previous studies based on Elman and Jordan networks have explored only fully connected layers with the full image as input, not convolutional layers with attention windows as input. To this end, we present a novel recurrent neural network architecture with spatiotemporal memory, the Convolutional Elman Jordan Neural Network (CEJNN), which integrates information by looking at a series of small attention windows applied over the full image. Two variations of CEJNN, with some modifications, have been developed for two tasks: reconstruction and classification. The network is trained on 48K images and tested on 10K images of the MNIST handwritten digit database for both tasks. Our experiments show that the network captures the correlation of the spatiotemporal information better, achieving a mean square error (MSE) of 0.012 on the reconstruction task and 97.62% classification accuracy on the test set.
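The core idea can be illustrated with a single convolutional Elman-Jordan cell: at each time step the convolution sees the current attention window together with the previous hidden map (Elman context) and the previous output (Jordan context). The sketch below assumes a PyTorch implementation; the channel counts, the 12 x 12 window size, and the class name ConvElmanJordanCell are assumptions rather than the paper's exact settings.

```python
# One convolutional Elman-Jordan recurrent step applied to a series of attention windows.
import torch
import torch.nn as nn

class ConvElmanJordanCell(nn.Module):
    """The convolution sees the current attention window, the previous hidden map
    (Elman context), and the previous output (Jordan context)."""
    def __init__(self, in_ch=1, hidden_ch=16, out_ch=1):
        super().__init__()
        self.hidden_conv = nn.Conv2d(in_ch + hidden_ch + out_ch, hidden_ch, 3, padding=1)
        self.out_conv = nn.Conv2d(hidden_ch, out_ch, 3, padding=1)

    def forward(self, window, hidden, output):
        x = torch.cat([window, hidden, output], dim=1)
        hidden = torch.relu(self.hidden_conv(x))       # new Elman context
        output = torch.sigmoid(self.out_conv(hidden))  # e.g. reconstructed patch (Jordan context)
        return hidden, output


if __name__ == "__main__":
    cell = ConvElmanJordanCell()
    windows = torch.rand(8, 1, 12, 12)       # a series of small attention windows
    hidden = torch.zeros(1, 16, 12, 12)       # Elman context, initialised to zero
    output = torch.zeros(1, 1, 12, 12)        # Jordan context, initialised to zero
    for t in range(windows.shape[0]):         # integrate the windows over time
        hidden, output = cell(windows[t:t + 1], hidden, output)
```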

[Paper]

Copy of recons_elman.avi

Reconstruction in MNIST

Elman and Jordan Recurrence in Convolutional Neural Networks Using Attention Window

The retina of the human eye constantly foveates on target objects in a large visual field in order to recognize them, and the foveated region is called the central fovea. At each eye movement, the human visual system integrates the significant information observed at the central fovea into a spatiotemporal memory to perform object pattern recognition. This integration property is widely useful in the artificial domain when the image is too large: presenting the full image to deep learning-based visual pattern recognition systems is computationally expensive and biologically implausible. Inspired by this biological hypothesis, we propose five variations of Elman and Jordan recurrence in convolutional neural networks (EJRCNNs). Each of the five networks takes as input a series of small attention windows cropped from different locations in the image; here, the attention window plays the role of the central fovea of the human visual system. The proposed networks integrate the information presented in all of the attention windows inside context layers, using recurrent connections in both the convolutional and fully connected layers. After processing all of the attention windows, the networks perform the classification task. Previous studies have only partially explored Elman and Jordan recurrence in fully connected layers using the full image as input. Each of the networks is trained on the MNIST [1] handwritten digit database. In our extensive experiments, the networks capture the correlation of the spatiotemporal information better and outperform a standard RNN.
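All five variants share the same input format: the full image is decomposed into a raster-ordered sequence of small attention windows that are presented to the network one at a time. The helper below sketches this decomposition in Python; the 14 x 14 window size, the stride, and the function name attention_windows are assumptions for illustration only.

```python
# Decompose a full image into the raster-ordered sequence of attention windows
# consumed by the recurrent networks.
import torch

def attention_windows(image, window=14, stride=14):
    """Crop an image tensor (C, H, W) into a raster-ordered sequence of windows."""
    c, h, w = image.shape
    crops = []
    for top in range(0, h - window + 1, stride):
        for left in range(0, w - window + 1, stride):
            crops.append(image[:, top:top + window, left:left + window])
    return torch.stack(crops)                  # (num_windows, C, window, window)


if __name__ == "__main__":
    mnist_digit = torch.rand(1, 28, 28)        # stand-in for one MNIST image
    seq = attention_windows(mnist_digit)        # 4 windows of 14 x 14
    print(seq.shape)                            # torch.Size([4, 1, 14, 14])
    # Each window is fed to the recurrent network in turn; the Elman/Jordan
    # context layers carry information across windows before classification.
```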

[Paper]

Addressing the working memory for object searching and classification task using attention windows

Attention and memory have many possible forms of interaction. If memory has a limited capacity, it makes sense for the brain to be selective about what is allowed to enter it. Similarly, for an object search task in an image, we need to deploy attention at a specific location and on a specific feature, reaching the target object by tuning out non-target objects. Inspired by this, we propose a model architecture that takes as input patches cropped from a large image and predicts both the class and the next location of the patches. Here, a patch refers to an attention window. The architecture has two major parts: a classifier network and a saccade network. Three attention windows at three different scales are cropped from the large 60x60 image (in the case of the cluttered MNIST handwritten digit dataset). The cropped attention windows are resized to the scale of the smallest, central attention window (12x12), and the stacked resized windows of size 12x12x3 are passed as input to both the classifier and saccade networks. A heatmap representation of the central attention window (treated as the eye position), after passing through several hidden layers, is concatenated with the outputs of the higher hidden layers of the classifier and saccade networks. The classifier network outputs the class prediction, while the saccade network predicts one of nine movement directions for the attention window: up, down, left, right, top-left, top-right, bottom-left, bottom-right, or no movement. The "what pathway" of the classifier network guides the saccade network by giving a reward signal at each saccadic movement of the attention window: the saccade network receives a reward of 1 if the classifier network predicts the correct class of the object, and 0 otherwise. Both networks keep storing the integrated features and locations of the attention window over time in order to reach the target object. JK flip-flop recurrent layers, Elman recurrent layers as local loops, and a Jordan recurrent layer as a global loop are used to store the integrated information in memory. The proposed model achieves 96.9%, 96.45%, 93.75%, and 90% classification accuracy on 28x28 MNIST, 60x60 translated MNIST, 60x60 cluttered translated MNIST, and the 640x480 Yale Face B recognition dataset respectively.
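The code below sketches one step of the classifier-plus-saccade loop, assuming a PyTorch implementation: a multi-scale glimpse is cropped around the current eye position, resized to the smallest 12x12 scale and stacked, then passed to both a classifier head and a nine-way saccade head, with a reward of 1 whenever the classification is correct. The hidden sizes, the function multi_scale_glimpse, and the plain feed-forward layers are placeholders for the flip-flop, Elman, and Jordan memory layers used in the actual model.

```python
# One step of the classifier + saccade arrangement with multi-scale attention windows.
import torch
import torch.nn as nn
import torch.nn.functional as F

def multi_scale_glimpse(image, centre, scales=(12, 24, 48)):
    """Crop three windows of increasing scale around `centre`, resize each to the
    smallest (12x12) scale, and stack them into a 3-channel glimpse."""
    crops = []
    for s in scales:
        half = s // 2
        y, x = centre
        patch = image[:, :, max(0, y - half):y + half, max(0, x - half):x + half]
        crops.append(F.interpolate(patch, size=(12, 12), mode="bilinear", align_corners=False))
    return torch.cat(crops, dim=1)               # (B, 3, 12, 12)

class ClassifierNet(nn.Module):
    """Predicts the object class from the stacked glimpse ("what pathway")."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.body = nn.Sequential(nn.Flatten(), nn.Linear(3 * 12 * 12, 128), nn.ReLU())
        self.head = nn.Linear(128, n_classes)
    def forward(self, glimpse):
        return self.head(self.body(glimpse))

class SaccadeNet(nn.Module):
    """Predicts one of nine movement directions for the attention window."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Flatten(), nn.Linear(3 * 12 * 12, 128), nn.ReLU())
        self.head = nn.Linear(128, 9)             # 8 directions + no movement
    def forward(self, glimpse):
        return self.head(self.body(glimpse))

if __name__ == "__main__":
    image = torch.rand(1, 1, 60, 60)              # e.g. one translated-MNIST image
    label = torch.tensor([3])
    glimpse = multi_scale_glimpse(image, centre=(30, 30))
    logits = ClassifierNet()(glimpse)
    direction = SaccadeNet()(glimpse).argmax(dim=1)
    reward = float(logits.argmax(dim=1) == label)  # 1 if the class is correct, else 0
```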

[Paper]

original_video.avi

28x28 MNIST

translated_video.mp4

60x60 Translated MNIST

cluttered_video.mp4

60x60 Cluttered MNIST

face_yale2_flip_flop_scale4.mp4

640x480 Yale Face

Publications

Journal Publication

  1. Sweta Kumari, Usha Chouhan and Sunil Kumar Suryawanshi, "Machine learning approaches to study HIV/AIDS infection: A review", Bioscience Biotechnology Research Communications (2017), http://bbrc.in/bbrc/2017jan-marchPDF/BBRC11_006.pdf.

  2. S. Kumari, C. Vigneswaran, and V. S. Chakravarthy, "The flip-flop neuron - a memory efficient alternative for solving challenging sequence processing and decision-making problems", bioRxiv, 2021. [link]

  3. S. Kumari, Shobha, Nivethithan, and V. S. Chakravarthy, "ASM-3D: An attentional search model fashioned after what and where/how pathways for target search in 3D environment", bioRxiv, 2022. [link]

Conference Publication

  1. Niraj Singh, Sweta Kumari, V. Srinivasa Chakravarthy, Jitendra Kumar, “Traffic Sign Detection using a modified Deep Q Neural Network”, Proceedings of IAC in Budapest 2018 [Abstract] [Poster2]

  2. Sweta Kumari, S. Aravindakshan, Umangi Jain, V. Srinivasa Chakravarthy, “Convolutional Elman Jordan Neural Network for Reconstruction and Classification Using Attention Window”, Proceedings of ICICV 2020, https://doi.org/10.1007/978-981-15-6067-5_20.

  3. Sweta Kumari, S. Aravindakshan, V. Srinivasa Chakravarthy, “Elman and Jordan Recurrence in Convolutional Neural Networks Using Attention Window”, Proceedings of ICICC 2020, https://doi.org/10.1007/978-981-15-5113-0_83.

  4. Sweta Kumari and V. Srinivasa Chakravarthy, "Addressing the working memory for object searching and classification task using attention windows", Bernstein Conference 2020, https://doi.org/10.12751/nncn.bc2020.0051. [ppt]

  5. Dhruv Chopra, Sweta Kumari, and V. Srinivasa Chakravarthy, "Modelling working memory using deep convolutional Elman and Jordan neural networks", Journal of Computational Neuroscience, Vol. 49, Suppl. 1, Springer, 2021.

Blog

Book Review: Tuesdays with Morrie

Contact