Training Physics-Based Characters to Dance to Music (Best Poster award - MIG 2020)
This was one of the primary works towards my Master's thesis. Here is a link.
We use Machine Learning techniques to animate arbitrary characters/agents to dance to songs. The main contribution of this work is that our method does not require any motion capture data and can automatically generate dance moves for a given character. We also use a physics-based controller and train the character to dance in a physical environment, thereby avoiding artifacts common to kinematics-based controllers, such as foot sliding and unnatural transitions.
We train a Generative Adversarial Network to generate a corpus of dance moves through physical simulation. The character is then trained with Deep Reinforcement Learning to interact with its physical environment and dance to the beats of songs while maintaining balance. Finally, we construct a motion graph for the character containing the full set of moves it has been trained on.
We use music information retrieval techniques to extract features from the song, and then apply a graph-search method to generate a choreography for it.
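The beat-extraction step can be illustrated with a toy sketch. This is not the actual pipeline (which uses proper music information retrieval tools); it is a minimal stand-in that detects beats as local maxima of frame-wise energy in a synthetic click track. All parameter values here are illustrative.

```python
import numpy as np

def extract_beats(signal, sr, frame_len=1024):
    """Toy beat extraction: frame-wise energy + peak picking."""
    n_frames = len(signal) // frame_len
    energy = np.array([
        np.sum(signal[i * frame_len:(i + 1) * frame_len] ** 2)
        for i in range(n_frames)
    ])
    # A frame is a beat candidate if its energy is a local maximum
    # and exceeds the mean energy.
    peaks = [i for i in range(1, n_frames - 1)
             if energy[i] > energy[i - 1]
             and energy[i] >= energy[i + 1]
             and energy[i] > energy.mean()]
    # Convert frame indices to timestamps in seconds.
    return [p * frame_len / sr for p in peaks]

# Synthetic click track: a short burst every 0.5 s (120 BPM) over silence.
sr = 8000
signal = np.zeros(sr * 4)           # 4 seconds of audio
for k in range(8):                  # 8 clicks, one every 0.5 s
    start = int(k * 0.5 * sr)
    signal[start:start + 200] = 1.0
beats = extract_beats(signal, sr)
print(len(beats))
```

A real system would instead use onset-strength envelopes and tempo estimation, but the output is the same kind of object: a list of beat timestamps that the choreography search can align dance moves to.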
Semi-supervised Coordinated Locomotion controllers trained for a variety of characters
Learning Coordinated Locomotion in Latent Space for Articulated Characters
We are training arbitrary characters to learn locomotion skills using Deep Reinforcement Learning.
Our method does not require motion capture data; instead, it automatically extracts the most important modes (low-dimensional coactivations) of a character given its anatomy.
The central idea of our approach is to actuate different joints of the character with random motor commands for short periods of time. We evaluate various methods, such as PCA and Variational Autoencoders (VAEs), to construct this low-dimensional latent space, and then use it to train locomotion skills with DRL. Our results show considerably improved locomotion gaits for the character, which are otherwise difficult to achieve when training in the high-dimensional joint-action space.
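The PCA variant of this idea can be sketched in a few lines. This is a toy version, not the project's code: the "character" is simulated by a hidden linear coupling between joints, the random actuation data is synthetic, and all dimensions are illustrative.

```python
import numpy as np

# Toy version of the idea: actuate joints with random motor commands,
# record the resulting joint-space snapshots, then use PCA to find a
# low-dimensional basis of coactivations.
rng = np.random.default_rng(0)

n_joints, n_samples, n_modes = 12, 500, 3

# Pretend the character's anatomy couples its joints through 3 hidden modes.
mixing = rng.normal(size=(n_modes, n_joints))        # anatomy-induced coupling
activations = rng.normal(size=(n_samples, n_modes))  # random motor commands
poses = activations @ mixing + 0.01 * rng.normal(size=(n_samples, n_joints))

# PCA via SVD of the centered pose matrix.
centered = poses - poses.mean(axis=0)
_, s, vt = np.linalg.svd(centered, full_matrices=False)
components = vt[:n_modes]            # low-dimensional latent basis

# The first few modes should capture nearly all of the variance.
var_ratio = (s ** 2) / np.sum(s ** 2)
print(round(float(var_ratio[:n_modes].sum()), 3))

# A DRL policy can then act in latent space; actions map back to joints:
latent_action = rng.normal(size=n_modes)
joint_command = latent_action @ components           # shape (n_joints,)
```

The payoff is the last two lines: the policy outputs a 3-dimensional latent action instead of 12 joint torques, which is what makes the DRL training tractable.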
Simplified Network Architecture (Top)
Reconstructed images from latent space (Bottom)
Predicting Autism from Brain fMRI Scans with Unsupervised Latent Space Embeddings
This is my final project for the course CPSC 8420: Advanced Machine Learning, with Dr. Kai Liu.
Functional Magnetic Resonance Imaging (fMRI) is a brain-scanning method that measures neural activity in the brain across a temporal axis. The data is represented as a 4D tensor (3D space + 1D time) with close to 1 million features per scan. With a very limited number of scans available compared to the number of features, learning useful representations of the data remains a challenging task for traditional Machine Learning approaches.
In this project, we explore the idea of using Deep Learning-based unsupervised training methods to obtain latent spatio-temporal embeddings of the data. Specifically, we use 3D convolution-based autoencoders to extract spatial features of the data, and an attention-based sequence-to-sequence encoder-decoder recurrent network to capture the temporal characteristics. We then use these latent representations for supervised prediction of Autism Spectrum Disorder (ASD) with traditional machine learning techniques such as kernel SVMs and logistic regression.
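The two-stage shape of this pipeline (unsupervised embedding, then a simple supervised classifier) can be sketched as follows. This is a heavily simplified stand-in: a linear autoencoder (PCA) replaces the 3D convolutional autoencoder, a nearest-centroid rule replaces the SVM/logistic-regression stage, and the data is synthetic.

```python
import numpy as np

# Stage 1: unsupervised embedding. Stage 2: supervised prediction on the
# embeddings. All shapes and data here are made up for illustration.
rng = np.random.default_rng(1)

n_scans, n_voxels, latent_dim = 80, 2000, 10
labels = np.repeat([0, 1], n_scans // 2)             # 0 = control, 1 = ASD
signal = rng.normal(size=(2, n_voxels))              # class-specific pattern
scans = signal[labels] + 0.5 * rng.normal(size=(n_scans, n_voxels))

# Unsupervised embedding: project scans onto the top principal components
# (a linear autoencoder), never looking at the labels.
centered = scans - scans.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
embed = centered @ vt[:latent_dim].T                 # (n_scans, latent_dim)

# Supervised stage on the low-dimensional embeddings: nearest class centroid.
centroids = np.stack([embed[labels == c].mean(axis=0) for c in (0, 1)])
pred = np.argmin(
    np.linalg.norm(embed[:, None, :] - centroids[None], axis=2), axis=1)
accuracy = float(np.mean(pred == labels))
print(accuracy)
```

The point of the design is the same as in the project: the classifier only ever sees a 10-dimensional embedding rather than ~1 million voxels, sidestepping the features-vs-samples imbalance.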
Multi-Hypergraph structure of the mined network
Movie Recommendation System using embeddings produced by Hypergraph Convolution Networks
We are working on a scalable movie recommendation system that can generate recommendations as well as explanations for them. First, a movie knowledge graph, represented as a multi-hypergraph network, is constructed: each node represents a movie, and nodes are connected by hyperedges of various modalities such as cast, director, and plot keywords. The additional information is mined using TMDBpy.
The embedding of each movie is generated using a hypergraph convolution method, parameterized by a neural network. We aggregate local features of each movie with respect to its modalities (cast, genre, keywords, users) to generate movie embeddings. Similarly, user embeddings are generated by aggregating the information from adjacent movies and co-cited users. A neighborhood sampling method is used for computation graph construction, similar to GraphSAGE, which allows the parameters to be trained in mini-batches and makes the proposed method scalable to large networks.
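A single layer of this kind of aggregation can be sketched with the standard hypergraph-convolution propagation rule (vertex features are pooled into hyperedges, then redistributed to vertices, with degree normalization). This is a toy illustration, not the project's model: the incidence matrix, features, and weights below are all made up.

```python
import numpy as np

# One hypergraph-convolution layer on a tiny toy hypergraph:
# 4 movies, 2 hyperedges (say, a shared cast and a shared keyword).
rng = np.random.default_rng(2)

# H[v, e] = 1 if movie v belongs to hyperedge e.
H = np.array([[1, 0],
              [1, 1],
              [0, 1],
              [1, 1]], dtype=float)
X = rng.normal(size=(4, 8))          # initial movie features
W = np.eye(2)                        # hyperedge weights (uniform here)
theta = rng.normal(size=(8, 5))      # learnable projection

Dv = H @ W @ np.ones(2)              # vertex degrees
De = H.sum(axis=0)                   # hyperedge degrees

Dv_inv_sqrt = np.diag(1.0 / np.sqrt(Dv))
De_inv = np.diag(1.0 / De)

# Propagate: aggregate vertex features into hyperedges, then back to
# vertices, with symmetric degree normalization, then project and apply ReLU.
X_next = Dv_inv_sqrt @ H @ W @ De_inv @ H.T @ Dv_inv_sqrt @ X @ theta
X_next = np.maximum(X_next, 0.0)
print(X_next.shape)
```

In the full system this dense matrix product is replaced by neighborhood sampling, which is what makes mini-batch training and large networks feasible.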
The method is tested on the MovieLens-1M dataset and produces promising results on commonly used evaluation metrics such as Precision@K, Recall@K, F1@K, Hit@K, and NDCG@K.
More Academic Projects
Dancing with Proximal Policy Optimization
This was my final project for CPSC 8810: Motion Planning. I implemented Proximal Policy Optimization (PPO) in Tensorflow and tested it on various Gym environments (images on the right).
The goal was to use the PPO implementation to train agents to dance using physical controllers, even in the absence of motion capture (or other expert) data.
First, a vocabulary of poses was constructed by sampling the pose space of a given character. The agent learnt to interpolate between dance poses, and was rewarded for hitting zero velocity whenever a beat of the song hit.
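The core of PPO is its clipped surrogate objective, which can be written in a few lines. This is just the loss computation in plain numpy, with made-up probabilities and advantages standing in for one training batch; the full implementation of course wraps this in policy/value networks and rollout collection.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """PPO clipped surrogate loss for one batch of actions."""
    ratio = np.exp(logp_new - logp_old)              # pi_new / pi_old
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    # Take the pessimistic (minimum) surrogate, negated to form a loss,
    # so large policy updates gain nothing beyond the clip range.
    return -np.mean(np.minimum(ratio * advantages, clipped * advantages))

# Illustrative batch: old/new action log-probabilities and advantages.
logp_old = np.log(np.array([0.2, 0.5, 0.1]))
logp_new = np.log(np.array([0.3, 0.4, 0.3]))
adv = np.array([1.0, -0.5, 2.0])

loss = ppo_clip_loss(logp_new, logp_old, adv)
print(round(float(loss), 4))
```

The clipping is what keeps each policy update close to the data-collecting policy, which matters a lot when the reward (hitting zero velocity on the beat) is as sparse as it is here.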
PPO Trained in various gym environments
Monte Carlo Path Tracing with Photon Mapping
This was my final project for CPSC 6050: Computer Graphics in Fall 2019.
I developed a renderer in C++ with global illumination via Monte Carlo path tracing and photon mapping. This was one of the most fun projects I have done, especially because I did not use any existing rendering/vector/GL libraries at all! Please check out this Document for more details.
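One small building block of a Monte Carlo path tracer is worth showing: cosine-weighted hemisphere sampling for diffuse bounces (importance sampling the Lambertian BRDF via Malley's method). The sketch below is in Python for readability, though the renderer itself is C++.

```python
import numpy as np

rng = np.random.default_rng(3)

def cosine_sample_hemisphere(u1, u2):
    """Map two uniform [0,1) samples to a direction about the +z normal,
    distributed proportionally to cos(theta)."""
    r = np.sqrt(u1)                   # sample a disk uniformly...
    phi = 2.0 * np.pi * u2
    x, y = r * np.cos(phi), r * np.sin(phi)
    z = np.sqrt(max(0.0, 1.0 - u1))   # ...then lift it onto the hemisphere
    return np.array([x, y, z])

samples = [cosine_sample_hemisphere(*rng.random(2)) for _ in range(1000)]
# Each direction is unit length and lies in the upper hemisphere; the
# average cosine against the normal should approach E[cos(theta)] = 2/3.
mean_cos = float(np.mean([d[2] for d in samples]))
print(round(mean_cos, 2))
```

Sampling proportionally to the cosine term cancels it out of the rendering-equation estimator, which noticeably reduces variance for diffuse surfaces.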
Various Deep Learning Projects
These are some selected code samples from one of my favourite courses at Clemson - CPSC 8430 (formerly 8810): Deep Learning, taught by Dr. Feng Luo.
S2VT Implementation (Sequence to Sequence - Video to Text): Implemented an automatic video captioning system by reimplementing the S2VT paper in Tensorflow.
Generative Adversarial Networks (DCGAN, WGAN, WGAN-GP, LSGAN): Implemented various GAN architectures in Tensorflow and generated images on the CIFAR10 dataset.
Images produced by training various GAN architectures on the CIFAR10 dataset
Traversal Set Centralities for Networks
This is an implementation of the traversal-set centrality score for each edge of a network.
To calculate the centrality, the traversal set of each edge e is computed: the set of nodes that have a geodesic (shortest path) passing through e. The centrality score of an edge is then equal to the minimum vertex cover (equivalently, the maximum flow) of the graph induced by its traversal set.
Scores were computed for various real-world networks. Edges that form interconnections between different clusters were scored higher than edges connecting nodes within the same cluster.
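The traversal-set computation can be shown on a toy graph. This sketch computes, for one edge, the node pairs whose geodesic can pass through it (brute-force, via all-pairs BFS); the vertex-cover/max-flow scoring stage is omitted. The example graph, two triangles joined by a bridge, is made up for illustration.

```python
from itertools import combinations

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
nodes = sorted({v for e in edges for v in e})
adj = {v: set() for v in nodes}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

def bfs_dist(src):
    """All shortest-path distances from src (unweighted BFS)."""
    dist, frontier = {src: 0}, [src]
    while frontier:
        nxt = []
        for u in frontier:
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    nxt.append(w)
        frontier = nxt
    return dist

dist = {v: bfs_dist(v) for v in nodes}

def traversal_set(u, v):
    """Pairs (s, t) with a geodesic passing through edge (u, v)."""
    pairs = set()
    for s, t in combinations(nodes, 2):
        if (dist[s][u] + 1 + dist[v][t] == dist[s][t]
                or dist[s][v] + 1 + dist[u][t] == dist[s][t]):
            pairs.add((s, t))
    return pairs

# The bridge lies on every geodesic between the two triangles: 3 x 3 pairs.
print(len(traversal_set(2, 3)))
```

Intuitively this is why cluster-connecting edges score high: every cross-cluster geodesic is forced through them, so their traversal sets are large.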
Some AI / Motion Planning Projects
Some of my favourite course assignments from CPSC 6420: Artificial Intelligence and CPSC 8810: Motion Planning.
Unlimited Range Surveillance Robot Controlled by Android Application
This was the final project for my Bachelor's Degree (2016).
We developed an Android application to control an Arduino-based robot through MQTT messaging. As long as there is an internet connection, the robot can be controlled from the Android application, and a camera attached to the front of the robot streams live video to the client's app through an external media server.
Soccer/Football Fantasy Prediction Social Networking App
An Android app to (a) let users predict scorelines and scorers of football matches in the BPL 2016-17 season, (b) create customized accounts, groups, and friends, and (c) maintain performance statistics and track progress on a global leaderboard.
Built with Java/Android, a MySQL server, and PHP.
Live Auctioning Android Application
Using this app, one can upload an Excel sheet of items with attributes. Participants of the auction sign in (one of them as the auctioneer), and a fixed budget is added to everyone's account. Every minute, a new item is displayed on all connected screens, and everyone has a limited time to submit a bid. The highest bidder wins the item, and the corresponding amount is deducted from their budget. The primary intention of creating this app was so that my friends and I could auction cricket players from the video game Ashes Cricket 2009 and compete against each other with our own teams! (The auctions were more fun than the matches themselves.)
The major challenge of this work was making the auction run in real time, a problem I solved using Google Cloud Firestore, an excellent Google service that maintains a real-time connection between a client (an Android phone) and a document-based database in the cloud.
The app was never released on the Play Store or elsewhere because I got too busy with my MS applications, but it remains one of the coolest things I did before joining Clemson. The scantily documented project can be found here.
Tic-Tac-Toe "AI" on 5 x 5, and 7 x 7 board
20-year-old me would be very disappointed if I didn't refer to this code as "AI". This was one of my first projects (way back in 2014), where I found my love for programming and creating smart agents. Admittedly, this is not the greatest code I have written, but it is perhaps the hardest work I have ever done! All the screens in this game are hand-drawn, and the code was written for Android without using any of the built-in UI APIs (not even buttons). I learnt so much about myself while writing this that I can't imagine not putting it on my resume!
On the technical side, the program uses a simple heuristic-based greedy search to play Tic-Tac-Toe on large boards; some non-traditional rules add more flavour to the game design.
The app used to be on Google Play and had 500+ downloads. Due to lack of maintenance and the ever-growing policy requirements for Google Play apps, I took the app down last year! The source code is still available here!
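A heuristic greedy search of this flavour can be sketched briefly. This is not the original Android/Java code; it is a small Python illustration where each empty cell is scored by the win-length lines it could complete (for the agent) or block (for the opponent), and the scoring function and board size are illustrative.

```python
# Greedy heuristic move selection for large-board Tic-Tac-Toe
# (5x5 board, 4-in-a-row to win; both numbers are illustrative).
N, WIN = 5, 4
DIRS = [(0, 1), (1, 0), (1, 1), (1, -1)]

def line_score(board, r, c, player):
    """Score (r, c) by counting `player` marks in every open WIN-length
    line through it; squaring rewards nearly complete lines heavily."""
    score = 0
    for dr, dc in DIRS:
        for offset in range(WIN):
            cells = [(r + (i - offset) * dr, c + (i - offset) * dc)
                     for i in range(WIN)]
            if all(0 <= x < N and 0 <= y < N for x, y in cells):
                marks = [board[x][y] for x, y in cells]
                if all(m in (player, '.') for m in marks):
                    score += marks.count(player) ** 2
    return score

def greedy_move(board, me, opponent):
    """Greedy choice: value my own lines plus blocking the opponent's."""
    best, best_score = None, -1
    for r in range(N):
        for c in range(N):
            if board[r][c] != '.':
                continue
            s = line_score(board, r, c, me) + line_score(board, r, c, opponent)
            if s > best_score:
                best, best_score = (r, c), s
    return best

board = [['.'] * N for _ in range(N)]
board[2][1] = board[2][2] = board[2][3] = 'X'   # X threatens 4-in-a-row
print(greedy_move(board, 'O', 'X'))             # O should block at an end
```

Summing the agent's own score with the opponent's score for the same cell is what makes a single greedy pass both build threats and block them, with no lookahead at all.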