I re-imagined the metabolomics signal detection pipeline, which led to a patent and the discovery of potential kidney cancer biomarkers.
I graduated from MIT with B.S. and MEng. degrees in Computer Science in 2018 and 2020, respectively.
During my MEng., I was advised by Leslie Kaelbling. My research at the Learning and Intelligent Systems group centered on probabilistic modeling and inference, and on applying graph neural networks to 3D spatial and geometric problems.
I enjoy teaching and I was fortunate to have my MEng. funded through teaching assistantships: 6.008: Intro to Inference (Fall 2018, Fall 2019) and 6.036: Intro to Machine Learning (Spring 2019, Spring 2020).
Other important figures during my time at MIT include Patrick Winston, who introduced me to research in artificial intelligence; Polina Golland and Gregory Wornell, who taught me how to teach; and Ferran Alet, who helped hone my scientific thinking.
2. Robotic Gripper Design with Evolutionary Strategies and Graph Element Networks
Adarsh K. Jeewajee*, Ferran Alet*, Maria Bauza*, Max Thomsen*, Alberto Rodriguez, Leslie P. Kaelbling, Tomás Lozano-Pérez (* equal contributions)
NeurIPS Workshop on Machine Learning for Engineering Modeling, Simulation, and Design (NeurIPS ML4Eng), 2020
Published and invited for poster presentation - Paper
3. Graph Element Networks: Adaptive, Structured Computation and Memory
Ferran Alet, Adarsh K. Jeewajee, Maria Bauza, Alberto Rodriguez, Tomás Lozano-Pérez, Leslie P. Kaelbling
International Conference on Machine Learning (ICML), 2019
Published and invited for oral presentation (4.5% of all submissions) - Paper - Code - Slides
Belief propagation is run on the GM, producing node marginal probabilities, which are decoded as a data sample.
A discriminator (not shown) judges whether the sample is real (from the true data distribution) or fake (produced by the pipeline). This signal is used to train the pipeline end-to-end.
How do we produce this ensemble? We sample M noise vectors and run M pipelines similar to the one seen in the left image, the only difference being that the same evidence (red) is also fed to the GM.
The ensemble of beliefs is combined into one final belief vector over the unobserved variables.
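The two steps described in these captions (belief propagation producing node marginals, then merging an ensemble of beliefs) can be sketched roughly as follows. This is a minimal illustration, not the published implementation: it assumes binary variables, a pairwise model, and a plain average as the combination rule. The names `run_bp`, `combine_beliefs`, `sample_potentials`, and `ensemble_inference` are hypothetical, and `sample_potentials` is only a random stand-in for the learned mapping from noise vectors to GM parameters.

```python
import random

def normalize(v):
    s = sum(v)
    return [x / s for x in v]

def run_bp(n_nodes, edges, unary, pairwise, evidence=None, n_iters=50):
    """Sum-product (loopy) belief propagation on a binary pairwise model.

    unary[i][x]              potential of node i taking value x in {0, 1}
    pairwise[(i, j)][xi][xj] positive potential for edge (i, j)
    evidence                 optional {node: observed_value}; those nodes are clamped
    Returns one belief [p(x=0), p(x=1)] per node.
    """
    evidence = evidence or {}
    phi = [list(u) for u in unary]
    for node, value in evidence.items():
        # Clamp observed nodes by zeroing out the state that was not observed.
        phi[node] = [1.0 if x == value else 0.0 for x in (0, 1)]

    neighbors = {i: [] for i in range(n_nodes)}
    psi = {}  # psi[(s, d)][xs][xd], stored for both directions of each edge
    for (i, j) in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
        psi[(i, j)] = pairwise[(i, j)]
        psi[(j, i)] = [[pairwise[(i, j)][a][b] for a in (0, 1)] for b in (0, 1)]

    msgs = {key: [0.5, 0.5] for key in psi}
    for _ in range(n_iters):
        new_msgs = {}
        for (s, d) in msgs:
            out = []
            for xd in (0, 1):
                total = 0.0
                for xs in (0, 1):
                    incoming = 1.0
                    for k in neighbors[s]:
                        if k != d:
                            incoming *= msgs[(k, s)][xs]
                    total += phi[s][xs] * psi[(s, d)][xs][xd] * incoming
                out.append(total)
            new_msgs[(s, d)] = normalize(out)
        msgs = new_msgs  # synchronous update of all messages

    beliefs = []
    for i in range(n_nodes):
        b = list(phi[i])
        for k in neighbors[i]:
            b = [b[x] * msgs[(k, i)][x] for x in (0, 1)]
        beliefs.append(normalize(b))
    return beliefs

def combine_beliefs(ensemble):
    """Average per-node beliefs across ensemble members (an assumed rule)."""
    m = len(ensemble)
    return [[sum(member[i][x] for member in ensemble) / m for x in (0, 1)]
            for i in range(len(ensemble[0]))]

def sample_potentials(n_nodes, edges, rng):
    """Hypothetical stand-in for the learned noise-to-potentials generator."""
    unary = [[rng.uniform(0.5, 1.5) for _ in (0, 1)] for _ in range(n_nodes)]
    pairwise = {e: [[rng.uniform(0.5, 2.0) for _ in (0, 1)] for _ in (0, 1)]
                for e in edges}
    return unary, pairwise

def ensemble_inference(n_nodes, edges, evidence, m=5, seed=0):
    """Run M pipelines that share the same evidence, then merge their beliefs."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(m):
        unary, pairwise = sample_potentials(n_nodes, edges, rng)
        ensemble.append(run_bp(n_nodes, edges, unary, pairwise, evidence))
    return combine_beliefs(ensemble)
```

On tree-structured graphs these beliefs are exact marginals; on graphs with cycles they are approximations.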
Undirected graphical models are compact representations of joint probability distributions over random variables. Given a distribution over inference tasks, graphical models of arbitrary topology can be trained using empirical risk minimization. However, when faced with new task distributions, these models (EGMs) often need to be re-trained.
Instead, we propose an inference-agnostic adversarial training framework for producing an ensemble of graphical models (AGMs). The ensemble is optimized to generate data, and inference is learned as a by-product of this endeavor.
AGMs:
perform comparably with EGMs on inference tasks that the latter were specifically optimized for
show significantly better generalization capabilities than EGMs across distributions of inference tasks
are on par with GibbsNet and VAEAC, state-of-the-art deep neural architectures that, like AGMs, allow arbitrary conditioning
allow fast data sampling, competitive with Gibbs sampling from EGMs
We investigate whether Graph Element Networks (a graph convolutional neural network architecture that we published at ICML 2019) can be used to organize memories spatially, in the problem of generating scene images from novel viewpoints.
We sample 3D mazes from the DeepMind Lab game platform (dataset) and each maze comes with a series of images. Each image shows how the maze appears from a specific 2D coordinate, given a specific (yaw, pitch, roll) triple for the camera.
In the animation, we have mazes positioned in a 3x3 grid structure. The animation shows generated scenes on the left and a top-down view of the 9 mazes on the right. We first sample views from different places inside the mazes, and insert them into the GEN. We then query the GEN for the inferred view at new query coordinates, while rotating 360 degrees for each position. The red nodes (in the top-down map) are active nodes from which information is interpolated to generate a new view, for each query location.
In this problem, the GEN:
has its nodes spread across the 2D ground plane of the mazes (see white circles in right image)
learns a useful representation for what mazes look like and we interpolate information from its nodes to generate new images
compartmentalizes spatial memories since it trains on mazes one by one but at test time succeeds in absorbing information from 9 mazes simultaneously
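One simple way to picture "interpolating information from nodes" at a query location is barycentric interpolation inside a triangle of nodes on the ground plane. The sketch below is only an illustration under that assumption: the triangular element, the function names, and the toy node states are hypothetical and are not taken from the GEN code.

```python
def barycentric_weights(p, a, b, c):
    """Weights (wa, wb, wc) with p = wa*a + wb*b + wc*c and wa + wb + wc = 1."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0:
        raise ValueError("degenerate triangle")
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return wa, wb, 1.0 - wa - wb

def interpolate_state(query, positions, states):
    """Blend the hidden states of three nodes at a 2D query coordinate."""
    weights = barycentric_weights(query, *positions)
    dim = len(states[0])
    return [sum(w * s[d] for w, s in zip(weights, states)) for d in range(dim)]

def active_nodes(query, positions, tol=1e-9):
    """Indices of nodes that contribute a non-zero weight to the query."""
    weights = barycentric_weights(query, *positions)
    return [i for i, w in enumerate(weights) if abs(w) > tol]

# Toy usage: three nodes on the ground plane, each holding a 2-dimensional state.
positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
states = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
blended = interpolate_state((0.25, 0.25), positions, states)
```

In this picture, the nodes returned by `active_nodes` play the role of the red nodes in the top-down map: they are the ones whose states feed the decoder for that query.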
How do we decode node states to draw scene images? This work was done to improve on DeepMind's work (Eslami et al.), where they have a representation-learning network and an image-generation network resembling the standard DRAW architecture. They can only represent one maze at a time, as their model absorbs information without spatial disentangling. We use our GENs for representation learning, and apply their standard drawing architecture to decode our hidden states.
We investigate whether there exist 3D-printable robotic finger morphologies that have better object grasping performance than default finger shapes.
We open-source a WSG-32 parallel-jaw gripper simulation (see first animation) and collect a dataset of hard-to-grasp objects from distinct categories (e.g. bottles, cones, or adversarial objects). On these objects, we would like our optimized gripper morphologies (see second animation) to achieve better grasp success than the out-of-the-box WSG-32 gripper configuration.
We search the space of gripper morphologies using augmented random search or evolutionary algorithms, and propose changes to the base morphology at each iteration by:
either randomly adding noise to the 3D finger mesh coordinates,
We find that:
optimizing the shape of the robotic fingers increases grasp performance
restricting our evaluation set of objects to a specific type is reflected in the salient visual features of the emergent gripper finger morphologies
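The noise-based proposal step described above can be illustrated with a very small (1+λ) evolution strategy: perturb the finger mesh vertices with Gaussian noise, score each candidate, and keep a candidate only if it beats the current best. This is a sketch under stated assumptions, not the project's code. `toy_grasp_score` is a hypothetical stand-in for the simulated grasp success rate, and the function names and hyperparameters are illustrative.

```python
import random

def mutate_mesh(vertices, sigma, rng):
    """Add independent Gaussian noise to every 3D vertex coordinate."""
    return [tuple(c + rng.gauss(0.0, sigma) for c in v) for v in vertices]

def evolve_morphology(base_vertices, grasp_score, n_iters=50,
                      population=8, sigma=0.01, seed=0):
    """(1+lambda) evolution strategy over mesh vertex positions."""
    rng = random.Random(seed)
    best = list(base_vertices)
    best_score = grasp_score(best)
    history = [best_score]
    for _ in range(n_iters):
        candidates = [mutate_mesh(best, sigma, rng) for _ in range(population)]
        scored = [(grasp_score(c), c) for c in candidates]
        top_score, top = max(scored, key=lambda pair: pair[0])
        if top_score > best_score:  # keep the parent unless a child improves on it
            best, best_score = top, top_score
        history.append(best_score)
    return best, best_score, history

# Hypothetical score: closeness to an arbitrary target shape. In the real
# setting this would be grasp success measured in the gripper simulation.
TARGET = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]

def toy_grasp_score(vertices):
    return -sum((a - b) ** 2 for v, t in zip(vertices, TARGET) for a, b in zip(v, t))

base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
best, best_score, history = evolve_morphology(base, toy_grasp_score, sigma=0.05)
```

Because the parent is only replaced by a strictly better child, the best score never decreases from one iteration to the next.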
We investigate whether allowing a locomotive agent to modify its own morphology has any beneficial effect on the task of finding optimal locomotive policies, on terrains of varying difficulty.
We use the augmented random search algorithm to optimize the policy of a bipedal walker (parametrized by a feed-forward neural network). We allow the agent to modify its morphology to increase its score, and by doing so we observe that:
convergence to optimal policies (across all terrain types) is considerably faster
the agent solves environments that were initially impossible with the default morphology (see animation on the left).
We vary the difficulty of the terrains by making the terrain rougher (including hills and valleys), and by including obstacles such as blocks and pits.
We also search for a morphology-policy pair that generalizes to many environments with very little fine-tuning. See our slides for gifs of discovered agent morphologies.
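As a rough sketch of how augmented random search could optimize policy and morphology together, the example below runs a basic ARS update on a single parameter vector that concatenates both. It is an illustration under assumptions: `toy_return` is a hypothetical stand-in for an episode return from the walker simulation, the joint-vector layout is assumed, and the hyperparameters are arbitrary.

```python
import random
import statistics

def ars(reward, theta, n_iters=200, n_dirs=8, top_b=4,
        step=0.02, noise=0.03, seed=0):
    """Basic augmented random search: maximize reward(theta) without gradients."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(n_iters):
        rollouts = []
        for _ in range(n_dirs):
            delta = [rng.gauss(0.0, 1.0) for _ in theta]
            r_plus = reward([t + noise * d for t, d in zip(theta, delta)])
            r_minus = reward([t - noise * d for t, d in zip(theta, delta)])
            rollouts.append((r_plus, r_minus, delta))
        # Keep only the directions whose better rollout scored highest.
        rollouts.sort(key=lambda r: max(r[0], r[1]), reverse=True)
        top = rollouts[:top_b]
        # Scale the step by the spread of the rewards that are actually used.
        used = [r for r_plus, r_minus, _ in top for r in (r_plus, r_minus)]
        sigma_r = statistics.pstdev(used) or 1.0
        for i in range(len(theta)):
            direction = sum((rp - rm) * d[i] for rp, rm, d in top)
            theta[i] += step / (top_b * sigma_r) * direction
    return theta

# Hypothetical joint parameter vector: two policy weights followed by two
# morphology parameters (for example, leg segment lengths).
TARGET = [1.0, -1.0, 0.5, 2.0]

def toy_return(theta):
    return -sum((t - g) ** 2 for t, g in zip(theta, TARGET))

theta0 = [0.0, 0.0, 0.0, 0.0]
theta = ars(toy_return, theta0)
```

Treating morphology parameters as extra entries of the search vector is one simple design choice; it lets the same update rule change the body and the controller at once.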
In online forums, a major area of interest is the consolidation of questions: merging similar questions into one to avoid diluting answers, save storage, etc.
I. Our main goal is to train language models that can detect if two questions are similar or not.
II. Language models require large datasets of annotated questions to be trained on, and some online forums lack that much data. Our second goal is to train our models on annotated data from a forum X and fine-tune them on the small amount of annotated data available for forum Y, hoping for generalization.
For goal I, we need to learn the best possible encoding for our questions, such that some similarity metric (here cosine similarity) between questions in that encoding accurately measures how similar two questions are. We train encoding architectures described by Lei et al. (using a CNN or LSTM) on our dataset of annotated questions.
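For concreteness, cosine similarity between two encoded questions, and its use for ranking candidate duplicates, can be sketched as below. The vectors are toy values and the helper names are illustrative; in the project the encodings would come from the trained CNN or LSTM encoder.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two encoding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_candidates(query_vec, candidate_vecs):
    """Candidate indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, c) for c in candidate_vecs]
    return sorted(range(len(candidate_vecs)), key=lambda i: scores[i], reverse=True)

# Toy encodings: the second candidate points in the same direction as the query.
query = [1.0, 0.0]
candidates = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
order = rank_candidates(query, candidates)
```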
For goal II, we use a domain adaptation technique from Ganin et al. Our training pipeline now uses encoders from part I, and we again minimize the loss associated with predicting question similarity, now using a mixture of mostly annotated questions from dataset X and a few from Y.
We also pass the encoded questions from both domains into a discriminator whose job is to predict the domain (X or Y) of each question from its encoded representation.
We have two training pipelines:
We minimize a classification loss to improve the discriminator in isolation (without updating the encoder weights).
To train the encoder, we minimize [(lambda * negative classification loss) + similarity loss from part I], so that the encoder learns to encode questions in a domain-invariant way. Essentially we are maximizing the discrimination loss and we want the learned encoding to make the discriminator unsure of which domain each question is from, even though in isolation, the discriminator is getting better and better at its job.
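The two objectives above can be written down in a few lines. The sketch below only computes the loss values from given discriminator outputs; it does not train anything, and the function names, the binary cross-entropy form of the classification loss, and the example numbers are assumptions made for illustration.

```python
import math

def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean classification loss for predicted P(domain = Y) against 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += y * math.log(max(p, eps)) + (1 - y) * math.log(max(1.0 - p, eps))
    return -total / len(labels)

def discriminator_objective(domain_probs, domain_labels):
    """Minimized with the encoder frozen: get better at telling X from Y."""
    return binary_cross_entropy(domain_probs, domain_labels)

def encoder_objective(similarity_loss, domain_probs, domain_labels, lam):
    """Minimized for the encoder: similarity loss plus lam * negative classification loss."""
    return similarity_loss - lam * binary_cross_entropy(domain_probs, domain_labels)

# Labels: 0 = forum X, 1 = forum Y. A confused discriminator outputs 0.5.
labels = [0, 1]
confident = [0.1, 0.9]
confused = [0.5, 0.5]
enc_when_confident = encoder_objective(1.0, confident, labels, lam=0.1)
enc_when_confused = encoder_objective(1.0, confused, labels, lam=0.1)
```

With the similarity loss held fixed, the encoder objective is lower when the discriminator is confused, which is the pressure toward domain-invariant encodings.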
We investigate the extent to which the Google Glass device could be used for augmented reality games, while avoiding issues such as overheating and running out of memory. We design an augmented reality game ("safari adventure") and optimize its performance to measure actual game play time that a user can get from the device.
The game can be run on any Android device and has been optimized for Google Glass. The game uses one's camera and accelerometer. It was designed using the Unity3D engine, with C# as the scripting language. The animals merge into one's actual room decor, and extensions of this game could be used to help children get acclimatized to new surroundings, such as classrooms.
The logic behind the game is rather simple:
load an animal at start at a random position in the user's 3D neighborhood
if the animal is kept within the camera focus for more than t seconds, a photograph is taken and the animal is replaced by another one in another random position
the game ends after n seconds
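The three rules above can be simulated in a few lines. The actual game was written in C# for Unity3D; the Python sketch below is only an illustration of the logic, and the `in_focus` callback, the fixed time step `dt`, and the spawn range are hypothetical.

```python
import random

def run_game(in_focus, n_seconds, t_hold, dt=0.1, seed=0):
    """Simulate the game loop and return the number of photographs taken.

    in_focus(animal_position, now) is a hypothetical callback that reports
    whether the animal is currently within the camera focus.
    """
    rng = random.Random(seed)

    def spawn():
        # Random position in the user's 3D neighborhood (range is arbitrary).
        return tuple(rng.uniform(-1.0, 1.0) for _ in range(3))

    animal = spawn()
    held = 0.0      # how long the animal has stayed in focus without a break
    photos = 0
    for step in range(int(round(n_seconds / dt))):
        now = step * dt
        if in_focus(animal, now):
            held += dt
        else:
            held = 0.0
        if held > t_hold:
            photos += 1          # take the photograph
            animal = spawn()     # replace the animal at a new random position
            held = 0.0
    return photos                # the game ends after n_seconds

# Usage: a player who always keeps the animal in focus.
photos = run_game(lambda animal, now: True, n_seconds=10, t_hold=1.0, dt=0.5)
```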
Before optimization, the game started lagging after about 60 seconds and the Google Glass got too hot to be worn after about 120 seconds. Running the game plus a screen recorder was overkill for the hardware, which is why the video/gif is choppy.
After optimizing for performance, game time was boosted to around 110 seconds with no lagging and to about 200 seconds without overheating. Some performance optimization ideas were to:
use less detailed animal assets
check presence of animals in the camera focus only when the accelerometer is quasi-steady
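The quasi-steady check in the last idea could look something like the sketch below: keep a short window of accelerometer magnitudes and only run the (expensive) focus test when their spread is small. This is an assumption about how such a gate might work, not the game's actual C# code; the class name, window size, and threshold are hypothetical.

```python
import math
from collections import deque

class SteadyGate:
    """Reports whether recent accelerometer readings are quasi-steady."""

    def __init__(self, window=10, threshold=0.05):
        self.samples = deque(maxlen=window)  # recent acceleration magnitudes
        self.threshold = threshold           # maximum allowed standard deviation

    def update(self, ax, ay, az):
        self.samples.append(math.sqrt(ax * ax + ay * ay + az * az))

    def is_steady(self):
        if len(self.samples) < self.samples.maxlen:
            return False  # not enough readings yet
        mean = sum(self.samples) / len(self.samples)
        var = sum((s - mean) ** 2 for s in self.samples) / len(self.samples)
        return math.sqrt(var) < self.threshold

# Usage: only call the focus test when the head is roughly still.
gate = SteadyGate(window=5)
for _ in range(5):
    gate.update(0.0, 0.0, 9.8)
should_check_focus = gate.is_steady()
```

A magnitude-only test ignores slow rotations at constant acceleration, so a real implementation might also need orientation data.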