Developed an algorithm to accelerate and optimize domain randomization by leveraging Bayesian Optimization in conjunction with our novel fine-tuning strategies, achieving up to a 178% performance improvement on the true environment.
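For illustration only, here is a minimal sketch of the core loop, assuming a scikit-optimize-style Bayesian Optimization over two hypothetical randomization parameters; the train-and-evaluate step is a toy placeholder, not the project's actual pipeline.

```python
from skopt import gp_minimize
from skopt.space import Real

# Randomization ranges to search over (hypothetical parameters for illustration).
search_space = [
    Real(0.5, 2.0, name="mass_scale"),
    Real(0.1, 1.5, name="friction_scale"),
]

def train_and_evaluate(mass_scale, friction_scale):
    # Placeholder for "train a policy under these randomization settings, then
    # measure its return on the true environment". A toy quadratic stands in so
    # the sketch runs end to end.
    return -((mass_scale - 1.2) ** 2 + (friction_scale - 0.8) ** 2)

def objective(params):
    mass_scale, friction_scale = params
    # Negate the return because gp_minimize minimizes its objective.
    return -train_and_evaluate(mass_scale, friction_scale)

result = gp_minimize(objective, search_space, n_calls=20, random_state=0)
print("Best randomization parameters found:", result.x)
```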
Designed a sim-to-real approach that combines system identification with an external residual dynamics model learned using deep reinforcement learning and demonstrated a 45% increase in performance over baselines on BALLU hardware.
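A rough sketch of the hybrid-model idea is below: an identified nominal model corrected by a learned residual. Dimensions and the nominal model are dummy placeholders (not BALLU-specific), and the deep-RL training of the residual is omitted.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 8, 2  # placeholder dimensions, not the robot's actual ones

def nominal_dynamics(state, action):
    # Placeholder for the system-identified analytic model.
    return state + 0.01 * torch.tanh(torch.cat([state, action], dim=-1)[..., :STATE_DIM])

class ResidualDynamics(nn.Module):
    """Small MLP that predicts the gap between the nominal model and reality."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
            nn.Linear(64, STATE_DIM),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

residual = ResidualDynamics()

def predict_next_state(state, action):
    # Hybrid model: analytic prediction plus learned correction.
    return nominal_dynamics(state, action) + residual(state, action)

s = torch.zeros(1, STATE_DIM)
a = torch.zeros(1, ACTION_DIM)
print(predict_next_state(s, a).shape)  # torch.Size([1, 8])
```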
Developed a novel adaptive sampling-based approach to train a single policy using deep reinforcement learning to imitate an entire motion library. The trained policy generalizes to out-of-distribution motions that are up to 4 times as long as the training motions.
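As a toy illustration of adaptive motion sampling: the criterion below (weighting clips by recent tracking error) is purely an assumption for illustration, not the scheme used in the project.

```python
import numpy as np

class AdaptiveMotionSampler:
    """Samples motion clips from a library, favoring harder-to-imitate clips."""
    def __init__(self, num_motions, temperature=1.0):
        self.errors = np.ones(num_motions)  # start with uniform difficulty
        self.temperature = temperature

    def sample(self):
        # Clips with higher recent tracking error are sampled more often.
        logits = self.errors / self.temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return np.random.choice(len(self.errors), p=probs)

    def update(self, motion_id, tracking_error, ema=0.9):
        # Exponential moving average of the per-clip tracking error.
        self.errors[motion_id] = ema * self.errors[motion_id] + (1 - ema) * tracking_error

sampler = AdaptiveMotionSampler(num_motions=50)
clip_id = sampler.sample()
sampler.update(clip_id, tracking_error=0.3)
```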
Proposed a novel two-stage deep reinforcement learning (deep RL)-based approach that combines learning-based planning and imitation learning. Our method outperforms our single-stage deep RL and PRM-RL baselines by at least 68%.
Designed a novel architecture for training more robust locomotion control policies and demonstrated results on the A1 robot.
The goal of this project is to create a dynamics simulator for human arms that is physiologically as realistic as muscle models but more efficient, by avoiding explicit muscle simulation. The first step is to learn a state-dependent joint torque limit for the upper body, using the Upper Extremity Dynamic Model provided by OpenSim. Once the torque limit model is learned, we aim to use deep reinforcement learning and trajectory optimization to generate motion for various throwing tasks and compare it to motion generated with musculotendon simulations and conventional box torque limit models.
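A minimal sketch of the first step, framed as regression, is shown below; the network, joint dimensions, and dummy data are assumptions, and the actual torque-limit targets would come from the OpenSim model, which is not shown here.

```python
import torch
import torch.nn as nn

NUM_JOINTS = 7  # placeholder for the number of upper-body joints modeled

class TorqueLimitModel(nn.Module):
    """Maps joint positions and velocities to per-joint torque limits."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * NUM_JOINTS, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, NUM_JOINTS), nn.Softplus(),  # torque limits are positive
        )

    def forward(self, q, qdot):
        return self.net(torch.cat([q, qdot], dim=-1))

model = TorqueLimitModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Dummy batch standing in for (state, max-torque) pairs sampled from the muscle model.
q, qdot = torch.randn(256, NUM_JOINTS), torch.randn(256, NUM_JOINTS)
tau_max = torch.rand(256, NUM_JOINTS)

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(q, qdot), tau_max)
loss.backward()
optimizer.step()
```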
This project aimed to address the challenge of sample-efficient multi-task learning. We used reward functions consisting of multiple terms, one for each sub-task, weighted to reflect relative task importance. Borrowing ideas from Hindsight Experience Replay, we stored copies of experience tuples in the replay buffer and relabeled them with different weight configurations; the rewards for these tuples were then recalculated and the tuples stored back in the replay buffer as additional experience. We experimented with different relabeling strategies and tested our method on the FetchReach and DartHalfCheetah environments.
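A simplified sketch of the relabeling mechanism is below; the buffer structure and the Dirichlet weight-sampling strategy shown are illustrative assumptions rather than the exact implementation.

```python
import numpy as np

class WeightedRelabelBuffer:
    def __init__(self):
        self.buffer = []

    def add(self, state, action, reward_terms, next_state, weights):
        # reward_terms holds the per-sub-task reward components; the scalar
        # reward is their weighted sum under a given weight configuration.
        scalar_reward = float(np.dot(weights, reward_terms))
        self.buffer.append((state, action, scalar_reward, next_state, weights))

    def relabel(self, state, action, reward_terms, next_state, num_copies=2):
        # Store extra copies of the same transition under different weightings,
        # recomputing the scalar reward for each copy.
        for _ in range(num_copies):
            new_weights = np.random.dirichlet(np.ones(len(reward_terms)))
            self.add(state, action, reward_terms, next_state, new_weights)

buf = WeightedRelabelBuffer()
state, action, next_state = np.zeros(4), np.zeros(2), np.zeros(4)
reward_terms = np.array([1.2, 1.0, -0.3])  # e.g. velocity, alive bonus, energy penalty
weights = np.array([0.6, 0.3, 0.1])        # original weight configuration
buf.add(state, action, reward_terms, next_state, weights)
buf.relabel(state, action, reward_terms, next_state)
```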
Here are a few videos of the results:
FetchReach with 3 goals. The end effector must reach one of the three goals, which is specified by the user. In this video, the desired goal is selected uniformly at random.
The following 3 clips show results produced by the same policy, each prioritizing a different component of the reward function.
DartHalfCheetah with a high velocity weight: The HalfCheetah moves forward with a high velocity without falling down.
DartHalfCheetah with a high alive bonus weight: In this case, since the weight for the velocity component is low, there is not much incentive for the HalfCheetah to move forward, and therefore it just stays still.
DartHalfCheetah with a high energy penalty weight: Generating forward velocity requires expending energy, and high energy expenditure leads to a lower reward in this case. The policy therefore tries to terminate the episode as quickly as possible.
The goal of this project was to accelerate fluid simulation using machine learning techniques, focusing on dam-break scenarios. We started with an Eulerian approach but later switched to a Lagrangian one; data for both was generated using mantaflow. For the Lagrangian approach, I used PointNet to extract features from the data and convolutional autoencoders for dimensionality reduction. The reduced-dimension data was then used to train a layer-normalized LSTM to predict particle trajectories.
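A rough PyTorch sketch of the Lagrangian pipeline follows; apart from the 4096-to-576 bottleneck mentioned in the clip caption below, the layer shapes are assumptions, PointNet feature extraction is omitted, and applying LayerNorm to the LSTM outputs merely stands in for a layer-normalized LSTM cell.

```python
import torch
import torch.nn as nn

FEATURE_DIM, LATENT_DIM = 4096, 576  # dimensions quoted in the clip caption below

class ConvAutoencoder(nn.Module):
    """Compresses a per-frame feature vector to a low-dimensional code."""
    def __init__(self):
        super().__init__()
        # Treat the per-frame feature vector as a 1-channel 1D signal.
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=4, stride=4), nn.ReLU(),  # 4096 -> 1024
            nn.Conv1d(16, 4, kernel_size=4, stride=4), nn.ReLU(),  # 1024 -> 256 (4 channels)
            nn.Flatten(),
            nn.Linear(4 * 256, LATENT_DIM),
        )
        self.decoder = nn.Linear(LATENT_DIM, FEATURE_DIM)  # simplified decoder

    def encode(self, x):                      # x: (batch, FEATURE_DIM)
        return self.encoder(x.unsqueeze(1))

    def forward(self, x):
        return self.decoder(self.encode(x))

class LatentDynamics(nn.Module):
    """LSTM that predicts the next latent code from a history of codes."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(LATENT_DIM, 512, batch_first=True)
        self.norm = nn.LayerNorm(512)         # stand-in for a layer-normalized LSTM cell
        self.head = nn.Linear(512, LATENT_DIM)

    def forward(self, codes):                 # codes: (batch, time, LATENT_DIM)
        out, _ = self.lstm(codes)
        return self.head(self.norm(out[:, -1]))

ae, dyn = ConvAutoencoder(), LatentDynamics()
frames = torch.randn(2, 10, FEATURE_DIM)      # dummy sequence of per-frame features
codes = torch.stack([ae.encode(frames[:, t]) for t in range(10)], dim=1)
next_code = dyn(codes)                        # (2, LATENT_DIM)
next_frame = ae.decoder(next_code)            # (2, FEATURE_DIM)
```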
I also created a visualization tool to help me replay the generated data. Here are a few clips:
Reconstruction with an autoencoder with bottleneck dimension 576 (original dimension: 4096)
The report may be found here.
Slides: