Welcome to my personal website! I am a Computer Vision Engineer at Verkada, a security camera unicorn in the San Francisco Bay Area. I am the earliest member of the five-person Computer Vision team, which manages all of Verkada's awesome Computer Vision functionality. Apart from training models, developing new features and general maintenance, my primary contribution at Verkada has been the design, development and release of our state-of-the-art license plate recognition system.

I completed my MS in Computer Vision at Carnegie Mellon University under the wing of Prof. Jeff Schneider, where my capstone project focused on Reinforcement Learning for self-driving cars. We trained an image-based end-to-end self-driving agent on the CARLA simulator using reinforcement learning (RL) and imitation learning (IL). My primary interests lie in the practical applications of deep learning-based computer vision systems, particularly in real-time deployment on cloud and edge devices.

My previous work experience includes my internship with the perception team at Uber ATG and my job in India as a Research Engineer at Conduent Labs India (formerly XRCI). At Uber, I developed in-house Adversarial RL/IRL modules to retrain their vehicle motion prediction solution, while at Conduent, my work resulted in three publications as listed below. 

I received my undergraduate degree in Electrical Engineering from IIT Delhi. I am indebted to Prof. S. D. Joshi for his guidance on my undergraduate thesis and in life, and I appreciate Prof. Sumeet Agrawal for his exceptional course offering in Machine Learning.

If you want to guess which house I'm from, here's a hint: सर्वश्रेष्ठम् सर्वसुन्दरम् काराकोरम् काराकोरम् ("The best of all, the most beautiful of all: Karakoram, Karakoram")

Publications

(All relevant materials which can be made available are linked)
A flow chart describing our text-to-image matching architecture. We found a hybrid Fisher-NN architecture, combined with a novel symmetric triplet loss, to be optimal.

Zero shot License Plate Re-Identification

PDF   -   Slides   -   Poster   -   Article   -   WACV 2019

Mini Abstract: We present a robust license plate re-identification system that matches plates in a holistic manner against a text database of known customers. We propose a novel multimodal triplet loss function on Kumar et al.'s hybrid Fisher Vector-based architecture, which outperforms the existing SoTA on this task by ~1%, achieving 99.6% matching.

This work complements traditional OCRs on the LPR task and greatly boosts their performance. We also propose a holistic end-to-end Fisher OCR that performs with accuracy similar to traditional character-level OCRs.

An example of a frontal view image with ROI detection using YOLOv3

VPDS: An AI-Based Automated Vehicle Occupancy and Violation Detection System

PDF   -   Slides   -   Article   -   IAAI 2019

Mini Abstract: We used the latest segmentation and object detection/classification techniques to count the number of passengers in vehicles using HOV lanes. The data, front and side view images, were produced and labelled by Conduent using near-IR cameras. We used YOLOv3 to extract front and side ROIs, then counted the number of people in each vehicle with two separate GoogLeNets to classify vehicles as violators or non-violators. I trained models for deployment on highways in New York, San Francisco and Los Angeles.

Interpolation over the noise input for a fixed text label input. "NYABC1234" does not exist in real life.

Parametric Synthesis of Text on Stylized Backgrounds using ProGANs

PDF   -   arXiv

Mini Abstract: Recent progress in Progressive Growing of GANs and text-to-image synthesis has made it possible to generate high-resolution images. We use the high-quality text "signatures" obtained in the License Plate Re-ID project with ProGANs modified to do Generative Adversarial Text to Image Synthesis. We obtained high-quality generated images and were able to fool a Tesseract-based industrial OCR using this method, with 89% accuracy on fake plates as compared to 92% on NY plates.

Projects

The final SAC agent learns to avoid collisions and obey traffic light rules. Interestingly, it stops for yellow lights, which is a self-learned behavior.

Reinforcement Learning for Self Driving Cars

We developed an end-to-end reinforcement learning model which takes in RGB images, waypoints and outputs from the previous frame to produce vehicle speed and steering.

Our agent beats previous works on the no-crash benchmark (regular and dense).

Step 1 of refining pretrained models for a predetermined set of important classes. More details in the presentation.

 Online Low-shot Learning

We proposed an online low-shot learning approach to achieve better accuracy than the pretrained model on a set of "important" classes. We use a nearest-neighbour approach in the feature space with a threshold. We explored several methods for finetuning in an online manner and found that using K-Means clustering to create prototypes works well to increase the True Positive rate.
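To make the thresholded nearest-neighbour idea concrete, here is a minimal sketch, not the project's actual code. The function name, the class labels, the prototype layout and the policy of returning None (so a caller could fall back to the pretrained model's prediction) are all assumptions made for illustration; the prototypes stand in for per-class K-Means centroids.

```python
import math


def nearest_prototype(feature, prototypes, threshold):
    """Return the label of the closest prototype, or None if it is too far.

    prototypes: list of (label, vector) pairs, e.g. K-Means centroids
    computed per "important" class (hypothetical layout).
    """
    best_label, best_dist = None, math.inf
    for label, proto in prototypes:
        dist = math.dist(feature, proto)  # Euclidean distance in feature space
        if dist < best_dist:
            best_label, best_dist = label, dist
    # Reject matches beyond the threshold so the caller can fall back
    # to the pretrained model's own prediction (an assumed policy).
    return best_label if best_dist <= threshold else None


# Toy 2-D "features"; real features would come from a network backbone.
prototypes = [("forklift", [0.0, 1.0]), ("ambulance", [1.0, 0.0])]
near = nearest_prototype([0.1, 0.9], prototypes, threshold=0.5)
far = nearest_prototype([5.0, 5.0], prototypes, threshold=0.5)
```

In this sketch a tighter threshold trades recall on the important classes for fewer false overrides of the pretrained model.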

Dense reconstruction with 50% compression ratio

Denser Reconstruction Using ORB-SLAM2

We used the camera trajectory predicted by ORB-SLAM2 to unproject RGB-D frames into 3D. We applied point-based fusion during this process to keep memory usage efficient.
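The unprojection step can be sketched for a single pixel as follows. This is a hedged illustration under a standard pinhole camera model, not the project's implementation: the function name, the intrinsics (fx, fy, cx, cy) and the assumption that the pose is a 4x4 camera-to-world matrix are all illustrative choices.

```python
def unproject(u, v, depth, fx, fy, cx, cy, pose):
    """Lift pixel (u, v) with metric depth into world coordinates.

    pose is assumed to be a 4x4 camera-to-world matrix given as
    nested lists in row-major order.
    """
    # Back-project through the pinhole model into the camera frame.
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    cam = (x, y, depth, 1.0)
    # Apply the rigid transform; only the first three rows are needed.
    return tuple(sum(pose[r][c] * cam[c] for c in range(4)) for r in range(3))


# Hypothetical pose: camera shifted 0.5 m along the world x axis, no rotation.
pose = [
    [1.0, 0.0, 0.0, 0.5],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
point = unproject(420, 240, 2.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0, pose=pose)
```

Repeating this for every valid depth pixel in every frame yields the raw point cloud that a fusion step would then merge.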

Minor Projects

t-SNE plot of a custom CNN trained on the EMNIST dataset.

An analysis of the embedding space

Report   -   Poster

The aim is to explore alternate methods of classification apart from the softmax layer. The motivation for this is described beautifully by this blog post by Christopher Olah. As illustrated in the blog post, and through experimentation, we see that using K-Nearest Neighbour classification with PCA/NCA can capture certain tangled manifolds which are tough to capture using shallow neural networks. This was a course project for the Math Fundamentals for Robotics course.

Examples of synthetically constructed and real fingerprint data.

Learning Fingerprint Identification with fake data

This was a collaborative effort with Indu Joshi, a PhD student from IIT Delhi, and Ayush Utkarsh from IBM India Labs, part of which was also presented as a course project.

Get in touch at mgmayank18@gmail.com