Visiting Scholar @ MIT (Short-term)
June 2025
Through the MISTI fellowship, I had the opportunity to present my current research and brainstorm about how transportation shapes urban areas, the urban-planning challenges facing the Global South, and our approaches to solving them. Hosted by the Senseable City Lab at MIT, Cambridge.
Junior Research Fellow @
INDIAN INSTITUTE OF SCIENCE
DEC 23 - Present
As a Junior Research Fellow at one of India's premier research institutes, my research focuses on building a digital twin of the city to enable next-generation intelligent transportation systems. My work involves leveraging computer vision to analyze complex traffic dynamics, using both uncalibrated ground-level cameras to determine vehicular turning patterns and UAVs for broader traffic monitoring and behavior analysis. A key emphasis of my approach is extracting the most granular insights from Global South datasets, creating a highly detailed and accurate picture of urban mobility.
This rich data serves as the foundation for forecasting traffic flow across entire road networks using advanced geometric deep learning models. A significant aspect of my research is the practical application of these models, focusing on developing end-to-end solutions optimized for deployment on edge computing devices like the Nvidia Jetson. To complement this, I also contribute to building representation learning models for label-free tasks, advancing the development of more data-efficient AI systems.
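As a rough illustration of the graph-based forecasting idea, the sketch below defines a tiny two-layer GCN over a road network using PyTorch Geometric; the node features, shapes, and class names are hypothetical and do not represent the deployed models.

```python
# Illustrative sketch only: a minimal graph model for network-wide traffic
# forecasting, assuming PyTorch Geometric. Node features could be recent
# flow counts per junction/segment; names and shapes are hypothetical.
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv

class TrafficForecaster(nn.Module):
    def __init__(self, in_feats=12, hidden=64, horizon=3):
        super().__init__()
        self.gc1 = GCNConv(in_feats, hidden)   # mix information from neighbouring junctions
        self.gc2 = GCNConv(hidden, hidden)
        self.head = nn.Linear(hidden, horizon) # predict the next `horizon` time steps

    def forward(self, x, edge_index):
        # x: [num_nodes, in_feats] recent flow history per node
        # edge_index: [2, num_edges] road-network connectivity
        h = torch.relu(self.gc1(x, edge_index))
        h = torch.relu(self.gc2(h, edge_index))
        return self.head(h)                    # [num_nodes, horizon] forecast

# Toy example with a 3-node road network
model = TrafficForecaster()
x = torch.randn(3, 12)                          # 12 past observations per node
edge_index = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]])
pred = model(x, edge_index)                     # -> shape [3, 3]
```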
Research/SDE Intern @
MODECI, Princeton University & University College London
MAY-AUG 22
As a Research Intern at ModECI (Model Exchange and Convergence Initiative), a multi-investigator collaboration dedicated to establishing a standardized format for exchanging computational models across software platforms and scientific domains such as neuroscience, machine learning, and artificial intelligence, I developed serialization and deserialization pipelines. Specifically, I focused on adapting PyTorch vision models to the standard format, enabling seamless interchange as JSON for improved model interoperability.
GitHub - https://github.com/ModECI
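The snippet below is only a generic illustration of the serialization idea, dumping a torchvision model's module structure to JSON and reading it back; it does not use or represent the actual ModECI MDF API.

```python
# Generic illustration only (not the ModECI MDF API): dumping the structure of
# a torchvision model to JSON as a stand-in for a serialization pipeline.
import json
import torchvision.models as models

def serialize_structure(model):
    # Record each named module's class name in traversal order
    layers = []
    for name, module in model.named_modules():
        if name:  # skip the root module itself
            layers.append({"name": name, "type": type(module).__name__})
    return {"model": type(model).__name__, "layers": layers}

resnet = models.resnet18(weights=None)
spec = serialize_structure(resnet)
json_blob = json.dumps(spec, indent=2)          # JSON interchange format

# The deserialization side would rebuild modules from the same spec
restored = json.loads(json_blob)
print(restored["model"], len(restored["layers"]), "layers")
```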
Unsupervised Extraction of Vehicle Turning Patterns from Uncalibrated Camera Setups using Spatio-Temporal data
(Underway)
White boxes are manually drawn regions; blue polygons are the algorithm's predictions.
A city is a collection of complex road networks, and each junction sees a large volume of vehicle movements over the course of a day. The traffic load at each junction depends on its connected counterparts. Correctly counting vehicles and enforcing policies at an intersection currently requires manual counting or labeling of entry/exit regions. With a target of 6,000 cameras in Bengaluru city alone, this is a tedious, cumbersome task, and it cannot scale to the 110 cities selected under India's Smart City Mission. To automate the entire process, we use vehicle trajectory data (comprising both spatial and temporal components) to predict each vehicle's entry/exit hotspots, supplemented with custom clustering algorithms and techniques in a handcrafted algorithmic pipeline. The output, in turn, feeds downstream tasks such as intersection-level traffic prediction with GNNs and "what if" analyses (for example, when an intersection closes).
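As a minimal sketch of the hotspot-discovery step (not the actual handcrafted pipeline), the example below clusters trajectory entry and exit points with scikit-learn's DBSCAN and counts (entry, exit) cluster pairs as candidate turning movements; thresholds and coordinates are illustrative.

```python
# Minimal sketch (not the actual pipeline): clustering trajectory entry/exit
# points with DBSCAN to discover hotspot regions at a junction.
import numpy as np
from sklearn.cluster import DBSCAN

def entry_exit_hotspots(trajectories, eps=25.0, min_samples=10):
    # trajectories: list of [T_i, 2] arrays of (x, y) image coordinates over time
    entries = np.array([t[0] for t in trajectories])    # first observed point
    exits = np.array([t[-1] for t in trajectories])     # last observed point

    entry_labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(entries)
    exit_labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(exits)

    # A turning pattern is an (entry cluster, exit cluster) pair; -1 is noise
    patterns = {}
    for e, x in zip(entry_labels, exit_labels):
        if e != -1 and x != -1:
            patterns[(e, x)] = patterns.get((e, x), 0) + 1
    return patterns   # e.g. {(0, 1): 143, ...} counts per turning movement

# Toy example: 50 synthetic trajectories entering near (0, 0), exiting near (100, 100)
rng = np.random.default_rng(0)
trajs = [np.linspace(rng.normal([0, 0], 2), rng.normal([100, 100], 2), 30) for _ in range(50)]
print(entry_exit_hotspots(trajs, eps=10.0, min_samples=5))
```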
STEAM: Squeeze and Transform Enhanced Attention Module
2024
Channel and spatial attention mechanisms introduced by earlier works enhance the representation abilities of deep convolutional neural networks (CNNs) but often lead to increased parameter and computation costs. While recent approaches focus solely on efficient feature context modeling for channel attention, we aim to model both channel and spatial attention comprehensively with minimal parameters and reduced computation. Leveraging the principles of relational modeling in graphs, we introduce a constant-parameter module, STEAM: Squeeze and Transform Enhanced Attention Module, which integrates channel and spatial attention to enhance the representation power of CNNs. Additionally, we introduce Output Guided Pooling (OGP), which efficiently captures spatial context to further enhance spatial attention. We extensively evaluate STEAM for large-scale image classification, object detection and instance segmentation on standard benchmark datasets. STEAM achieves a 2% increase in accuracy over the standard ResNet-50 model with only a meager increase in GFLOPs.
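For orientation only, the block below shows a generic squeeze-and-excitation-style channel attention layer; it is not the STEAM module or OGP, just a reference point for what reweighting channels from global context looks like in code.

```python
# Generic channel-attention block (squeeze-and-excitation style),
# NOT the STEAM module described above.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)          # global spatial context per channel
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                                # per-channel gates in [0, 1]
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.excite(self.squeeze(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                     # reweight the feature maps

feat = torch.randn(2, 64, 56, 56)                        # a ResNet-stage feature map
print(ChannelAttention(64)(feat).shape)                  # torch.Size([2, 64, 56, 56])
```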
Benchmarking Object Detection and Tracking for UAVs: An Algorithmic Comparison
2024
Object detection and tracking have been pursued by researchers for a long time, from traditional computer vision techniques to advanced deep learning architectures, and various detection and tracking models have been developed for unmanned aerial vehicle (UAV) applications. However, to our knowledge, no study has yet provided a comparative analysis of existing detection and tracking models and their combinations with UAV and edge feasibility in perspective. In this study, we focus on implementing various object detection and tracking models for edge-device deployment on UAVs. We combine different object detection algorithms with different multi-class, multi-object trackers to track multiple targets from the video feed and test their performance on edge devices. This comparison yields a comprehensive analysis of current state-of-the-art tracking and detection algorithms that best suit UAV use cases across different applications.
International Conference on Vehicular Electronics and Safety (ICVES) 2024 - https://ieeexplore.ieee.org/document/10928092
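The pattern benchmarked in the paper pairs a per-frame detector with a multi-object tracker; the sketch below shows that loop with hypothetical `detector.detect` and `tracker.update` interfaces standing in for the specific models compared.

```python
# Hedged sketch of the detector-plus-tracker pattern; `detector` and `tracker`
# are hypothetical stand-ins, not any specific library's API.
import cv2

def run_pipeline(video_path, detector, tracker):
    """Feed per-frame detections into a multi-object tracker and collect tracks."""
    cap = cv2.VideoCapture(video_path)
    tracks_per_frame = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # detector.detect (assumed interface) returns
        # [[x1, y1, x2, y2, score, class_id], ...] for the current frame
        detections = detector.detect(frame)
        # tracker.update (assumed interface) associates detections across
        # frames and returns boxes with persistent track IDs
        tracks = tracker.update(detections, frame)
        tracks_per_frame.append(tracks)
    cap.release()
    return tracks_per_frame
```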
This paper presents an efficient solution for weed classification in agriculture, focusing on optimizing inference performance while respecting agricultural constraints. It introduces a quantized deep neural network model that classifies nine weed classes using 8-bit integer quantization, reducing model size and inference time while maintaining accuracy. The study evaluates this approach on ResNet-50 and InceptionV3 architectures, showing significant reductions in model size and inference time in real-world scenarios on desktop, mobile, and Raspberry Pi processors. This work offers a promising direction for efficient AI applications in agriculture, with broader potential uses.
Conference on Robots and Vision / International Conference on High Performance Computing, Data, and Analytics (HiPC) 2024 - https://crv.pubpub.org/pub/g5b99349/release/1
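A minimal post-training int8 quantization sketch with TensorFlow Lite is shown below, assuming a trained Keras classifier and a representative-data generator for calibration; it mirrors the general technique, not the paper's exact pipeline or datasets.

```python
# Post-training int8 quantization sketch with TensorFlow Lite.
# Assumes a trained 9-class Keras classifier; data here is random placeholder.
import numpy as np
import tensorflow as tf

model = tf.keras.applications.ResNet50(weights=None, classes=9)   # 9 weed classes

def representative_data():
    # Yield a few calibration batches so activations can be quantized to int8
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()            # roughly 4x smaller than float32
open("weeds_int8.tflite", "wb").write(tflite_model)
```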
KERAS-CV
Contributed to KerasCV, a widely used open-source library for computer vision research that provides modular computer vision components working natively with TensorFlow, JAX, or PyTorch.
GitHub (stars 852) - https://github.com/keras-team/keras-cv
TRAIN YOUR OWN YOLO
Contributed to TrainYourOwnYOLO, an open-source library offering an easy, beginner-friendly implementation of YOLO, one of the most commonly used object detection models. Everything from dataset preparation and annotation to model training is covered.
GitHub (stars 633) - https://github.com/AntonMu/TrainYourOwnYOLO
TARDIS
Contributed to the TARDIS repository, used extensively by astrophysics researchers to simulate supernovae. The TARDIS package provides several tools for physics calculations and visualization to make supernova research easier.
GitHub (stars 182) - https://github.com/tardis-sn/tardis