Around May 2020, Google announced a summer school in Artificial Intelligence. Intrigued by the program and its speakers, I decided to apply. The application required me to write a review of a research paper, which gave me the opportunity to analyse a very interesting paper in detail. After about two months of waiting, the results were announced and I was selected as one of 150 students from all over India to participate in the program, as part of the Computer Vision track.
The event ran for three days, with talks and tutorials by Google researchers on a wide spectrum of topics. The meet-room sessions also gave me the opportunity to interact with other Google researchers as well as some of the top upcoming minds in India. Each day concluded with panel discussions by top researchers, and the overall experience was extremely enjoyable.
Created an AI assistant for differently-abled people as part of the hackathon and was the second runner-up in the competition.
I started off with gesture classification on the ASL dataset, as gesture recognition was the base of the project.
On testing the model with live images, I found that accuracy was not very good on real-life images, even after using optimization techniques.
I realised that real-life images have a lot of noise in the background, which lowers accuracy.
After some research, I decided to use background subtraction to eliminate the background noise. Accuracy on real-life images improved significantly: the model accurately classified 28 of 29 gestures with over 98% accuracy, which is better than most models available.
I then added more functionality: smart word and sentence recommendations, conversion of text or audio into a live motion sequence of gestures so that differently-abled people can follow it in their sign language, and reading out text from images of articles and newspapers.
In the end, I was able to secure 3rd position in the competition.
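The background-subtraction step described above can be illustrated with a minimal sketch. It assumes a grayscale frame stored as nested lists, a static reference image of the empty background, and an arbitrary threshold of 30; the function name and all values are hypothetical and are not taken from the project, which may well have used a different variant of the technique.

```python
def subtract_background(frame, background, threshold=30):
    """Keep pixels that differ from the reference background by at least
    `threshold`; set every other pixel to 0 (treated as background)."""
    result = []
    for frame_row, bg_row in zip(frame, background):
        result.append([
            pixel if abs(pixel - bg) >= threshold else 0
            for pixel, bg in zip(frame_row, bg_row)
        ])
    return result


# Hypothetical 3x4 grayscale images: an empty scene and the same scene with a hand.
background = [
    [10, 12, 11, 10],
    [11, 10, 12, 11],
    [10, 11, 10, 12],
]
frame = [
    [10, 13, 11, 10],
    [11, 200, 210, 11],
    [10, 190, 10, 12],
]

foreground = subtract_background(frame, background)
for row in foreground:
    print(row)
```

Only the bright "hand" pixels survive, so a classifier fed with `foreground` sees the gesture without the cluttered surroundings.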
Creating a model for reconstructing a person's face from a sample of their voice.
The project started off with understanding the workflow and creating a timeline of the tasks to be done.
First, I extracted the voice segments and faces of the speakers from YouTube videos using the AVSpeech dataset.
Next, I augmented the audio segments with themselves (repeating each clip) until they reached a fixed clip size, so that the inputs to the encoder network were uniform.
After that, I extracted facial features from the extracted faces using VGG-Face; these served as the ground-truth outputs for my encoder network. I then proceeded with building and training the encoder network.
The next task was to reconstruct the image of a person from the output of the encoder using a face decoder network.
I built the decoder network for this using transpose convolution layers and am currently in the process of optimizing its hyper-parameters.
In the end, I was able to create a model that can give an approximation of a person's facial appearance from their audio samples.
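The audio preprocessing step mentioned above, where each segment is extended with copies of itself until it reaches a fixed size, might look roughly like the sketch below. The function name `loop_to_length` and the sample values are hypothetical, and truncating clips that are already longer than the target is an assumption rather than something stated in the project.

```python
def loop_to_length(samples, target_len):
    """Repeat `samples` end to end until it is `target_len` long,
    cutting off any excess from the final repetition."""
    if not samples:
        raise ValueError("cannot loop an empty clip")
    repeats = -(-target_len // len(samples))  # ceiling division
    return (samples * repeats)[:target_len]


short_clip = [1, 2, 3]           # stand-in for raw audio samples
fixed = loop_to_length(short_clip, 8)
print(fixed)
```

Looping the clip, rather than padding with silence, keeps every part of the fixed-size input filled with the speaker's voice.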
The problem statement was to create a model to detect objects in images.
I started this project with a detailed study of the concepts of object detection and computer vision, which helped me prepare a plan of action.
Since the competition ran for two months and involved three stages, my primary focus was to create a base model for the second round and then optimize it.
I cleared round two and proceeded to round three, in which I started using advanced optimization techniques to improve my model's performance, and finally achieved a nationwide rank of 29.
GLMC is a specialized linear algebra library written in C, along with a Python wrapper.
I started this project with a detailed study of the linear algebra concepts I had previously learnt, along with learning some new ones, which helped me set up a strong foundation to code optimally.
The next step was to decide on a timeline and create a rough model for reference.
Finally came the coding. This part of the project was made relatively simple by the strong base set up beforehand through the rough model and the revision of linear algebra concepts. After a few weeks of coding, the C library was complete and could perform operations on 2x2, 3x3, and 4x4 matrices and 2D, 3D, and 4D vectors, as well as operations between matrices and vectors. The matrices were column-major and the vectors were column vectors. Note that the emphasis was on speed, and most of the functions were hence hard-coded.
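To illustrate what "column-major" and "hard-coded" mean here, the following Python sketch writes out 2x2 operations term by term instead of looping. It is only an illustration of the layout: the function names are hypothetical and are not GLMC's actual C API.

```python
# A 2x2 matrix is stored as a flat list in column-major order:
# [m00, m10, m01, m11], i.e. the first column followed by the second.

def mat2_mul_vec2(m, v):
    """Multiply a column-major 2x2 matrix by a 2D column vector, fully unrolled."""
    return [
        m[0] * v[0] + m[2] * v[1],
        m[1] * v[0] + m[3] * v[1],
    ]


def mat2_mul_mat2(a, b):
    """Multiply two column-major 2x2 matrices, fully unrolled."""
    return [
        a[0] * b[0] + a[2] * b[1],  # row 0, column 0
        a[1] * b[0] + a[3] * b[1],  # row 1, column 0
        a[0] * b[2] + a[2] * b[3],  # row 0, column 1
        a[1] * b[2] + a[3] * b[3],  # row 1, column 1
    ]


# The matrix with rows (1, 2) and (3, 4), stored column by column.
A = [1, 3, 2, 4]
print(mat2_mul_vec2(A, [5, 6]))
print(mat2_mul_mat2(A, A))
```

Writing every term explicitly avoids loop overhead and index arithmetic for these small fixed sizes, which is the usual reason such functions are hard-coded when speed is the priority.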