About Me

I am a first-year PhD student in Computer Science at Stanford University, currently advised by Carlos Guestrin. I am enthusiastic about bridging the gap between machine and human intelligence.

I am fortunate to be supported by the Stanford EDGE and SOE fellowships.



Metabolomics (2020-2021)

I enjoyed my year as a Deep Learning Scientist at ReviveMed (September 2020 to September 2021), where I was advised by Leila Pirhaji.

I re-imagined the metabolomics signal detection pipeline, which led to a patent and the discovery of potential kidney cancer biomarkers.

My work was guided by Ernest Fraenkel (MIT Biological Engineering) and Clary Clish (Broad Institute of MIT and Harvard).





MIT (2014-2020)

I graduated from MIT with B.S. and MEng. degrees in Computer Science in 2018 and 2020.

During my MEng., I was advised by Leslie Kaelbling. My research in the Learning and Intelligent Systems group centered on probabilistic modeling and inference, and on applying graph neural networks to 3D spatial and geometric problems.

I enjoy teaching and I was fortunate to have my MEng. funded through teaching assistantships: 6.008: Intro to Inference (Fall 2018, Fall 2019) and 6.036: Intro to Machine Learning (Spring 2019, Spring 2020).

Other important figures during my time at MIT were Patrick Winston, who introduced me to research in artificial intelligence, Polina Golland and Gregory Wornell, who taught me how to teach, and Ferran Alet, who helped hone my scientific thinking.

Publications

1. Adversarially-learned Inference via an Ensemble of Discrete Undirected Graphical Models

Adarsh K. Jeewajee, Leslie P. Kaelbling

Neural Information Processing Systems (NeurIPS), 2020


Published and invited for poster presentation - Paper - Video - Code - Slides

2. Robotic Gripper Design with Evolutionary Strategies and Graph Element Networks

Adarsh K. Jeewajee*, Ferran Alet*, Maria Bauza*, Max Thomsen*, Alberto Rodriguez, Leslie P. Kaelbling, Tomás Lozano-Pérez

(* equal contributions)

NeurIPS Workshop on Machine Learning for Engineering Modeling, Simulation, and Design (NeurIPS ML4Eng), 2020


Published and invited for poster presentation - Paper

3. Graph Element Networks: Adaptive, Structured Computation and Memory

Ferran Alet, Adarsh K. Jeewajee, Maria Bauza, Alberto Rodriguez, Tomás Lozano-Pérez, Leslie P. Kaelbling

International Conference on Machine Learning (ICML), 2019


Published and invited for oral presentation (4.5% of all submissions) - Paper - Code - Slides

Research Projects

Adversarially-learned Inference via an Ensemble of Discrete Undirected Graphical Models

Machine Learning | Probabilistic Models | Adversarial Learning

Paper | Video | Code | Slides

Training: A neural network L (learner) produces a full set of parameters (edge potentials) for the graphical model (GM), given noise. The GM models a distribution over N random variables.
Belief propagation is run on the GM, producing node marginal probabilities, which are decoded as a data sample.
A discriminator (not shown) judges whether the sample is real (from the true data distribution) or fake (produced by this pipeline). This signal is used to train the pipeline end-to-end.
Testing: At inference time, a subset of the variables is observed (evidence), and we produce M sets of beliefs over the unobserved nodes, conditioned on the evidence.
How do we produce this ensemble? We sample M noise vectors and run M pipelines similar to the one seen in the left image, the only difference being that the same evidence (red) is also fed to the GM.
The ensemble of beliefs is combined into one final belief vector over the unobserved variables.
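
The text above does not say how the M belief sets are merged, so the following is only a minimal sketch under one assumption: the ensemble is combined by averaging each unobserved variable's marginal and renormalizing. The function name `combine_beliefs` and the toy numbers are hypothetical.

```python
def combine_beliefs(ensemble):
    """Average M belief tables into one.

    ensemble[m][i][k] is the belief of ensemble member m that unobserved
    variable i takes state k. Plain averaging is an assumption made for
    this illustration, not necessarily the rule used in the paper.
    """
    num_members = len(ensemble)
    combined = []
    for i in range(len(ensemble[0])):
        num_states = len(ensemble[0][i])
        avg = [sum(member[i][k] for member in ensemble) / num_members
               for k in range(num_states)]
        total = sum(avg)
        combined.append([p / total for p in avg])  # renormalize per variable
    return combined


# Two ensemble members (M = 2), two unobserved binary variables.
ensemble = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.4, 0.6]],
]
final_beliefs = combine_beliefs(ensemble)
```

Each row of `final_beliefs` is again a distribution over the states of one unobserved variable.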

Undirected graphical models are compact representations of joint probability distributions over random variables. Given a distribution over inference tasks, graphical models of arbitrary topology can be trained using empirical risk minimization. However, when faced with new task distributions, these models (EGMs) often need to be re-trained.

Instead, we propose an inference-agnostic adversarial training framework for producing an ensemble of graphical models (AGMs). The ensemble is optimized to generate data, and inference is learned as a by-product of this endeavor.

AGMs:

  • perform comparably with EGMs on inference tasks that the latter were specifically optimized for

  • show significantly better generalization capabilities than EGMs across distributions of inference tasks

  • are on par with GibbsNet and VAEAC, state-of-the-art deep neural architectures that, like AGMs, allow arbitrary conditioning

  • allow fast data sampling, competitive with Gibbs sampling from EGMs.

Rendering scene images from novel viewpoints

Geometric Deep Learning | Representation Learning | Auto-encoding Architectures

Paper | Blog Post | Code

9 mazes are placed in a 3x3 grid structure. Left shows generated scenes and right shows a top-down view of the 9 mazes. After propagating information about how the mazes appear in a few given locations, we query the GEN for the inferred view at new query coordinates, while rotating 360 degrees for each position. The red nodes (right) are active nodes from which information is interpolated to generate a new view, for each query location.

We investigate whether Graph Element Networks (a graph convolutional neural network architecture that we published in ICML 2019) can be used to organize memories spatially, in the problem of generating scene images from novel viewpoints.

We sample 3D mazes from the DeepMind Lab game platform (dataset) and each maze comes with a series of images. Each image reveals how the maze appears, from a specific 2D coordinate and given a specific (yaw, pitch, roll) triple for the camera.

In the animation, we have mazes positioned in a 3x3 grid structure. The animation shows generated scenes on the left and a top-down view of the 9 mazes on the right. We first sample views from different places inside the mazes, and insert them into the GEN. We then query the GEN for the inferred view at new query coordinates, while rotating 360 degrees for each position. The red nodes (in the top-down map) are active nodes from which information is interpolated to generate a new view, for each query location.

In this problem, the GEN:

  • has its nodes spread across the 2D ground plane of the mazes (see white circles in right image)

  • learns a useful representation for what mazes look like and we interpolate information from its nodes to generate new images

  • compartmentalizes spatial memories since it trains on mazes one by one but at test time succeeds in absorbing information from 9 mazes simultaneously
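
One simple way to picture "interpolating information from nodes" at a query location is a distance-weighted average of node states. The sketch below assumes softmax weights over negative squared distances; this weighting, the `temperature` parameter, and the function name are illustrative assumptions and may differ from the interpolation used in the GEN paper.

```python
import math


def interpolate_state(query_xy, node_positions, node_states, temperature=1.0):
    """Blend node states at a 2D query point (hypothetical weighting scheme)."""
    # Closer nodes get larger logits, hence larger weights.
    logits = [-((query_xy[0] - x) ** 2 + (query_xy[1] - y) ** 2) / temperature
              for x, y in node_positions]
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]  # shift for numerical stability
    norm = sum(exps)
    weights = [e / norm for e in exps]
    dim = len(node_states[0])
    state = [sum(w * s[d] for w, s in zip(weights, node_states))
             for d in range(dim)]
    return state, weights


# Three nodes on the ground plane, each holding a 2-dimensional state.
positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state, weights = interpolate_state((0.1, 0.1), positions, states)
```

The nodes with the largest weights play the role of the "active" nodes highlighted in the top-down map.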

How do we decode node states to draw scene images? This work was done to improve on DeepMind's work (Eslami et al.), which uses a representation-learning network and an image-generation network resembling the standard DRAW architecture. Their model can only represent one maze at a time, as it absorbs information without spatial disentangling. We use our GENs for representation learning, and apply their standard drawing architecture to decode our hidden states.

Optimizing the shape of robotic fingers for increased object grasping accuracy on adversarially-shaped objects

Robotics | Computer Graphics | Machine Learning

Paper (partial) | Code (simulation) | Code (morphology optimization)

WSG-32 parallel gripper (pybullet) simulation
Left gripper finger morphology being optimized to pick up bottles

We investigate whether there exist 3D-printable robotic finger morphologies that have better object grasping performance than default finger shapes.

We open-source a WSG-32 parallel-jaw gripper simulation (see first animation) and collect a dataset of hard-to-grasp objects from distinct categories (e.g. bottles, cones, and adversarial objects). On these objects, we would like our optimized gripper morphologies (see second animation) to achieve better grasp success than the out-of-the-box WSG-32 gripper configuration.

We search the space of gripper morphologies using augmented random search or evolutionary algorithms, and propose changes to the base morphology at each iteration by:

We find that:

  • optimizing the shape of the robotic fingers increases grasp performance

  • restricting our evaluation set to a specific object type is reflected in the salient visual features of the emergent gripper finger morphologies

Learning a policy for the locomotion and morphology adaptation of a bipedal walker, on difficult terrains

Reinforcement Learning | Deep Learning | Curriculum Learning

Paper | Slides (gifs of emergent morphologies) | Code

Default-shaped agent can only learn to walk but cannot go further in this environment
By altering its body morphology, it can learn to walk and become small enough to crawl below the obstacles

We investigate whether allowing a locomotive agent to modify its own morphology has any beneficial effect on the task of finding optimal locomotive policies, on terrains of varying difficulty.

We use the augmented random search algorithm to optimize the policy of a bipedal walker (parametrized by a feed-forward neural network). We allow the agent to modify its morphology to increase its score, and by doing so we observe that:

  • convergence to optimal policies (across all terrain types) is considerably faster

  • the agent solves environments that were initially impossible with the default morphology (see animation on the left).
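
For readers unfamiliar with random-search policy optimization, here is a minimal sketch of the core update: probe the parameters along random directions, then move along the directions that improved the reward, scaled by the spread of observed rewards. It is a simplified form of augmented random search (no state normalization, no top-direction selection). The toy `reward` function stands in for an episode return and is purely hypothetical; it is not the bipedal walker environment.

```python
import random
import statistics


def reward(theta, target=(1.0, -2.0, 0.5)):
    """Toy stand-in for an episode return: higher is better, best at `target`."""
    return -sum((t - g) ** 2 for t, g in zip(theta, target))


def random_search_step(theta, rng, num_directions=8, step_size=0.02, noise=0.1):
    """One update: probe +/- random directions and follow the better ones."""
    dim = len(theta)
    update = [0.0] * dim
    rewards = []
    for _ in range(num_directions):
        delta = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        r_plus = reward([t + noise * d for t, d in zip(theta, delta)])
        r_minus = reward([t - noise * d for t, d in zip(theta, delta)])
        rewards += [r_plus, r_minus]
        for i in range(dim):
            update[i] += (r_plus - r_minus) * delta[i]
    sigma = statistics.pstdev(rewards) or 1.0  # guard against zero spread
    scale = step_size / (num_directions * sigma)
    return [t + scale * u for t, u in zip(theta, update)]


rng = random.Random(0)
theta = [0.0, 0.0, 0.0]
initial_reward = reward(theta)
for _ in range(300):
    theta = random_search_step(theta, rng)
final_reward = reward(theta)
```

In a morphology-adaptation setting, one could imagine appending body parameters to `theta` so that the same update adjusts both policy and shape; that extension is an assumption here, not a description of our code.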

We vary the difficulty of the terrains by making the terrain rougher (including hills and valleys), and by including obstacles such as blocks and pits.

We also search for a morphology-policy pair that generalizes to many environments with very little fine-tuning. See our slides for gifs of discovered agent morphologies.

Grouping similar questions on online discussion boards through domain adaptation

Natural Language Processing | Transfer Learning | Deep Learning with Domain Adaptation

Paper | Code

We learn a high-dimensional encoding per question and the similarity between questions is given by the cosine similarity measure between the encodings.

In online forums, a major area of interest is the consolidation of questions: merging similar questions into one to avoid diluting answers, save storage, and so on.

I. Our main goal is to train language models that can detect if two questions are similar or not.

II. Language models require large datasets of annotated questions to be trained on and some online forums lack that much data. Our second goal is to train our models on annotated data from a forum X and fine-tune it on the small amount of annotated data available for forum Y, hoping for generalization.

For goal I, we need to learn the best possible encoding for our questions, such that some similarity metric (here cosine similarity) between questions in that encoding accurately measures how similar two questions are. We train encoding architectures described by Lei et al. (using a CNN or LSTM) on our dataset of annotated questions.
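
As a small illustration of the similarity measure, the sketch below computes cosine similarity between two encoding vectors. The vectors are made-up placeholders, not outputs of our encoders.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two encodings; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical encodings of three questions.
question_a = [1.0, 2.0, 3.0]
question_b = [2.0, 4.0, 6.0]   # same direction as question_a
question_c = [3.0, 0.0, -1.0]  # orthogonal to question_a

sim_ab = cosine_similarity(question_a, question_b)
sim_ac = cosine_similarity(question_a, question_c)
```

Encodings that point in the same direction score close to 1, while orthogonal encodings score 0, regardless of vector length.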

For goal II, we use a domain adaptation technique from Ganin et al. Our training pipeline now uses encoders from part I, and we again minimize the loss associated with predicting question similarity, now using a mixture of mostly annotated questions from dataset X and a few from Y.

We also pass the encoded questions from both domains into a discriminator whose job is to predict the domain (X or Y) of each question from its encoded representation.

We have two training pipelines:

  • We minimize a classification loss to improve the discriminator in isolation (without updating the encoder weights).

  • To train the encoder, we minimize [(lambda * negative classification loss) + similarity loss from part I], so that the encoder learns to encode questions in a domain-invariant way. Essentially we are maximizing the discrimination loss and we want the learned encoding to make the discriminator unsure of which domain each question is from, even though in isolation, the discriminator is getting better and better at its job.
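
The two objectives above can be sketched numerically. This is a minimal illustration that assumes binary cross-entropy as the classification loss (the text only says "classification loss"); the function names and all numbers are hypothetical.

```python
import math


def binary_cross_entropy(probs, labels):
    """Mean classification loss of the domain discriminator.

    probs[i] is the predicted probability that question i comes from
    forum X; labels[i] is 1 for forum X and 0 for forum Y.
    """
    eps = 1e-12  # avoids log(0)
    total = sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(probs, labels))
    return -total / len(labels)


def encoder_objective(similarity_loss, domain_loss, lam):
    """(lambda * negative classification loss) + similarity loss."""
    return similarity_loss - lam * domain_loss


labels = [1, 0]
confident_loss = binary_cross_entropy([0.95, 0.05], labels)  # domains easy to tell apart
unsure_loss = binary_cross_entropy([0.5, 0.5], labels)       # domain-invariant encodings

objective_confident = encoder_objective(1.0, confident_loss, lam=0.1)
objective_unsure = encoder_objective(1.0, unsure_loss, lam=0.1)
```

For a fixed similarity loss, the encoder objective is lower when the discriminator is unsure, which is what pushes the encoder toward domain-invariant encodings.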

Evaluating the hardware limitations of Google Glass through augmented reality games

Augmented Reality Game Development | Wearable Devices | Performance Optimization

Video | Code

Demo of the "safari adventure" game being played in my dorm room, on Google Glass

We investigate the extent to which the Google Glass device could be used for augmented reality games, while avoiding issues such as overheating and running out of memory. We design an augmented reality game ("safari adventure") and optimize its performance to measure actual game play time that a user can get from the device.

The game can be run on any Android device and has been optimized for Google Glass. The game uses one's camera and accelerometer. It was designed using the Unity3D engine, with C# as the scripting language. The animals merge into one's actual room decor, and extensions of this game could be used to help children get acclimatized to new surroundings such as classrooms.

The logic behind the game is rather simple:

  • load an animal at start at a random position in the user's 3D neighborhood

  • if the animal is kept within the camera focus for more than t seconds, a photograph is taken and the animal is replaced by another one in another random position

  • the game ends after n seconds
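
The three rules above can be simulated in a few lines. The game itself is written in C# for Unity3D; the Python sketch below only mirrors the logic, counting time in discrete ticks instead of seconds. `spawn_animal`, `play`, and the unit-cube spawn bounds are hypothetical names and values chosen for illustration.

```python
import random


def spawn_animal(rng):
    """Pick a random position in a cube around the user (hypothetical bounds)."""
    return tuple(rng.uniform(-1.0, 1.0) for _ in range(3))


def play(focus_per_tick, hold_ticks, game_ticks, seed=0):
    """Return the positions of all animals photographed during one game."""
    rng = random.Random(seed)
    animal = spawn_animal(rng)   # load an animal at the start
    photos = []
    held = 0                     # consecutive ticks the animal stayed in focus
    for tick, in_focus in enumerate(focus_per_tick):
        if tick >= game_ticks:   # the game ends after n ticks
            break
        held = held + 1 if in_focus else 0
        if held > hold_ticks:    # kept in focus for more than t ticks
            photos.append(animal)
            animal = spawn_animal(rng)  # replace it at another random position
            held = 0
    return photos


# In focus for 4 ticks, lost for 2, in focus again for 4.
focus = [True] * 4 + [False] * 2 + [True] * 4
photos_full = play(focus, hold_ticks=3, game_ticks=10)
photos_short = play(focus, hold_ticks=3, game_ticks=8)
```

With the full 10 ticks both animals are photographed; ending the game after 8 ticks cuts the second attempt short.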

The game starts lagging after about 60 seconds and the Google Glass gets too hot to be worn after about 120 seconds. Running the game plus a screen recorder was overkill for the hardware, which is why the video/gif is choppy.

After optimizing for performance, game time was boosted to around 110 seconds with no lagging and to about 200 seconds without overheating. Some performance optimization ideas were to:

  • use less detailed animal assets

  • check presence of animals in the camera focus only when the accelerometer is quasi-steady
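
The second idea can be sketched as a cheap gate in front of the focus check. The sketch assumes "quasi-steady" means that each accelerometer axis varies by less than a small threshold over a recent window of samples; the threshold value and the function name are assumptions, not values taken from the game.

```python
def is_quasi_steady(samples, threshold=0.05):
    """True if every axis varies by less than `threshold` over the window.

    samples is a list of (x, y, z) accelerometer readings.
    """
    for axis in range(3):
        values = [s[axis] for s in samples]
        if max(values) - min(values) >= threshold:
            return False
    return True


steady_window = [(0.0, 0.0, 9.81), (0.01, 0.0, 9.80), (0.0, 0.01, 9.81)]
moving_window = [(0.0, 0.0, 9.81), (0.5, 0.2, 9.3)]

run_check_when_steady = is_quasi_steady(steady_window)
run_check_when_moving = is_quasi_steady(moving_window)
```

Only when the gate returns True would the more expensive "is the animal in the camera focus?" test run, which saves work while the wearer's head is moving.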