A Python library of pipelines for visual analysis of different sports using computer vision and deep learning [work in progress]. Link
Challenges: Each sport calls for a different analysis and brings its own difficulties, for example multi-person detection and tracking in football, or handling high frame rates during detection and tracking in cricket.
Contributions:
1. Badminton player detector
2. Badminton heatmap generation
3. Football player detection and tracking
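As a rough illustration of the heatmap idea, the sketch below bins player positions into a coarse grid over the court and counts visits per cell. It is not the library's implementation: the function name `position_heatmap`, the grid size, the sample positions, and the assumption that a detector already yields court coordinates in metres are all hypothetical.

```python
def position_heatmap(positions, court_w, court_h, cols, rows):
    """Count how often a player's court position falls into each grid cell."""
    grid = [[0] * cols for _ in range(rows)]
    for x, y in positions:
        if not (0 <= x <= court_w and 0 <= y <= court_h):
            continue  # ignore detections that fall outside the court
        # Map metres to a cell index; clamp so the far edge lands in the last cell
        col = min(int(x / court_w * cols), cols - 1)
        row = min(int(y / court_h * rows), rows - 1)
        grid[row][col] += 1
    return grid


# Hypothetical positions (metres) on a 6.1 m x 13.4 m doubles court
positions = [(1.0, 1.0), (1.2, 1.1), (5.0, 12.0), (7.0, 3.0)]
heatmap = position_heatmap(positions, court_w=6.1, court_h=13.4, cols=2, rows=2)
```

In practice the counts would be accumulated over every frame of a rally and rendered as a colour map over the court image; a finer grid gives a smoother picture at the cost of sparser cells.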
Implementing Iterative Knowledge Distillation (IKD) for Data-Free Learning of Student Networks (DAFL) to compress large models into lightweight ones. The student networks learn without ground-truth labels, using soft labels from the teacher network and a GAN-based network. Link
Challenges: Keeping the performance trade-off minimal when transferring knowledge from large models to smaller ones, the absence of training data, and unstable GAN training.
Contributions: A data-free approach to Iterative Knowledge Distillation, and a modified loss function for stable knowledge compression on smaller batches with T-softened probability distributions (softmax outputs softened by a temperature T).
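For readers unfamiliar with T-softened distributions, here is a minimal sketch of the standard temperature-scaled distillation term (KL divergence between softened teacher and student outputs, scaled by T squared). It is the textbook formulation, not the modified loss used in this project; the function names and sample logits are assumptions made for illustration.

```python
import math


def softened_probs(logits, temperature):
    """Softmax over logits divided by a temperature T (T > 1 flattens the output)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature):
    """KL(teacher || student) on T-softened distributions, scaled by T^2."""
    p = softened_probs(teacher_logits, temperature)
    q = softened_probs(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Hypothetical logits for a 3-class problem
teacher = [2.0, 1.0, 0.0]
student = [1.5, 1.2, 0.1]
loss = distillation_loss(student, teacher, temperature=4.0)
```

The T squared factor keeps the gradient magnitude of this term roughly comparable across temperatures, which matters when it is combined with other loss terms.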
Recognizing dynamic hand gestures using 3D-CNNs and mapping them to human-computer interaction tasks such as controlling portable devices. Link
Challenges: Spatiotemporal modelling and efficient detection for real-time applications.
Contributions: Implemented an efficient gesture recognition tool that can fully control portable devices such as laptops and mobile phones.
A tool that helps differently-abled people by recognizing sign-language hand gestures and converting them to text/audio. It can also convert text/voice into a motion sequence of gestures and read out newspapers/articles. Link
Challenges: Background motion and noise, and the design complications that come with building a tool practical enough to improve the lives of differently-abled people.
Contributions:
1. Segmenting the hand from the background to remove background noise.
2. Suggesting the words/sentences the person uses most frequently, based on the detected gestures, for efficient communication.
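The frequency-based suggestion step could look something like the sketch below, which counts words already recognized from gestures and proposes the most frequent ones matching a partial input. The class name `WordSuggester`, its methods, and the sample words are hypothetical; the actual tool may rank suggestions differently.

```python
from collections import Counter


class WordSuggester:
    """Suggest previously used words, most frequent first."""

    def __init__(self):
        self.counts = Counter()

    def record(self, word):
        # Called each time a full word has been recognized from gestures
        self.counts[word.lower()] += 1

    def suggest(self, prefix, k=3):
        prefix = prefix.lower()
        matches = [(w, c) for w, c in self.counts.items() if w.startswith(prefix)]
        # Sort by descending count, then alphabetically to break ties
        matches.sort(key=lambda wc: (-wc[1], wc[0]))
        return [w for w, _ in matches[:k]]


suggester = WordSuggester()
for word in ["hello", "help", "hello", "thanks", "help", "hello"]:
    suggester.record(word)

suggestions = suggester.suggest("he")
```

Here `suggestions` lists "hello" before "help" because it was recorded more often, so the user can complete a frequent word after signing only its first letters.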
Reconstruction of a person’s face from their speech using an encoder-decoder network on the AVSpeech dataset. Link
Challenges: Unstable training of generative models.
Motivation, sentiment and offensiveness classification of memes on social media, using both image and text data, for a safer internet.
Challenges: Memes contain a lot of text within the image, which can cause problems. Moreover, multi-modal networks require careful selection of fusion strategies.
Contributions: Explored different fusion and embedding strategies for multi-modal meme emotion classification on three tasks. We built a single common model that performs equally well on all three.
A modern implementation of the classic arcade game Space Invaders, built with Pygame. Link
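To make "fusion strategies" concrete, the sketch below contrasts two common options: early fusion, which concatenates image and text features before a shared classifier, and late fusion, which averages the class probabilities of separate per-modality classifiers. These are generic examples, not necessarily the strategies explored in this project; the function names, weights, and sample vectors are assumptions.

```python
def early_fusion(image_features, text_features):
    """Concatenate per-modality feature vectors into one joint vector."""
    return list(image_features) + list(text_features)


def late_fusion(image_probs, text_probs, image_weight=0.5):
    """Weighted average of class probabilities predicted by each modality."""
    text_weight = 1.0 - image_weight
    return [image_weight * pi + text_weight * pt
            for pi, pt in zip(image_probs, text_probs)]


# Hypothetical feature vectors and per-modality class probabilities
joint = early_fusion([0.1, 0.2], [0.3])
fused = late_fusion([0.8, 0.2], [0.4, 0.6])
```

Early fusion lets a single model learn interactions between image and text, while late fusion keeps the modalities independent and is easier to debug when one of them is noisy.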