Developed an algorithm to accelerate and optimize domain randomization by leveraging Bayesian Optimization in conjunction with our novel fine-tuning strategies, achieving up to a 178% performance improvement on the true environment.
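For illustration only, here is a minimal sketch of the core loop, assuming a scikit-optimize-style Bayesian Optimization over two hypothetical randomization parameters; the train-and-evaluate step is a toy placeholder, not the project's actual pipeline.

```python
from skopt import gp_minimize
from skopt.space import Real

# Randomization ranges to search over (hypothetical parameters for illustration).
search_space = [
    Real(0.5, 2.0, name="mass_scale"),
    Real(0.1, 1.5, name="friction_scale"),
]

def train_and_evaluate(mass_scale, friction_scale):
    # Placeholder for "train a policy under these randomization settings, then
    # measure its return on the true environment". A toy quadratic stands in so
    # the sketch runs end to end.
    return -((mass_scale - 1.2) ** 2 + (friction_scale - 0.8) ** 2)

def objective(params):
    mass_scale, friction_scale = params
    # Negate the return because gp_minimize minimizes its objective.
    return -train_and_evaluate(mass_scale, friction_scale)

result = gp_minimize(objective, search_space, n_calls=20, random_state=0)
print("Best randomization parameters found:", result.x)
```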
Designed a sim-to-real approach that combines system identification with an external residual dynamics model learned using deep reinforcement learning and demonstrated a 45% increase in performance over baselines on BALLU hardware.
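A rough sketch of the hybrid-model idea is below: an identified nominal model corrected by a learned residual. Dimensions and the nominal model are dummy placeholders (not BALLU-specific), and the deep-RL training of the residual is omitted.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 8, 2  # placeholder dimensions, not the robot's actual ones

def nominal_dynamics(state, action):
    # Placeholder for the system-identified analytic model.
    return state + 0.01 * torch.tanh(torch.cat([state, action], dim=-1)[..., :STATE_DIM])

class ResidualDynamics(nn.Module):
    """Small MLP that predicts the gap between the nominal model and reality."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
            nn.Linear(64, STATE_DIM),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

residual = ResidualDynamics()

def predict_next_state(state, action):
    # Hybrid model: analytic prediction plus learned correction.
    return nominal_dynamics(state, action) + residual(state, action)

s = torch.zeros(1, STATE_DIM)
a = torch.zeros(1, ACTION_DIM)
print(predict_next_state(s, a).shape)  # torch.Size([1, 8])
```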
Developed a novel adaptive sampling-based approach to train a single policy using deep reinforcement learning to imitate an entire motion library. The trained policy generalizes to out-of-distribution motions that are up to 4 times as long as the training motions.
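As a toy illustration of adaptive motion sampling: the criterion below (weighting clips by recent tracking error) is purely an assumption for illustration, not the scheme used in the project.

```python
import numpy as np

class AdaptiveMotionSampler:
    """Samples motion clips from a library, favoring harder-to-imitate clips."""
    def __init__(self, num_motions, temperature=1.0):
        self.errors = np.ones(num_motions)  # start with uniform difficulty
        self.temperature = temperature

    def sample(self):
        # Clips with higher recent tracking error are sampled more often.
        logits = self.errors / self.temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return np.random.choice(len(self.errors), p=probs)

    def update(self, motion_id, tracking_error, ema=0.9):
        # Exponential moving average of the per-clip tracking error.
        self.errors[motion_id] = ema * self.errors[motion_id] + (1 - ema) * tracking_error

sampler = AdaptiveMotionSampler(num_motions=50)
clip_id = sampler.sample()
sampler.update(clip_id, tracking_error=0.3)
```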
Proposed a novel two-stage deep reinforcement learning (deep RL)-based approach that combines learning-based planning and imitation learning. Our method outperforms our single-stage deep RL and PRM-RL baselines by at least 68%.
Designed a novel architecture for training more robust locomotion control policies and demonstrated results on the A1 robot.
The goal of this project is to create a dynamics simulator for human arms that is physiologically as realistic as muscle models but more efficient, by avoiding explicit muscle simulation. The first step is to learn a state-dependent joint torque limit for the upper body, using the Upper Extremity Dynamic Model provided by OpenSim. Once the torque limit model is learned, we aim to use deep reinforcement learning and trajectory optimization to generate motion for various throwing tasks and compare it to motion generated with musculotendon simulations and conventional box torque limit models.
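A minimal sketch of the first step, framed as regression, is shown below; the network, joint dimensions, and dummy data are assumptions, and the actual torque-limit targets would come from the OpenSim model, which is not shown here.

```python
import torch
import torch.nn as nn

NUM_JOINTS = 7  # placeholder for the number of upper-body joints modeled

class TorqueLimitModel(nn.Module):
    """Maps joint positions and velocities to per-joint torque limits."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * NUM_JOINTS, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, NUM_JOINTS), nn.Softplus(),  # torque limits are positive
        )

    def forward(self, q, qdot):
        return self.net(torch.cat([q, qdot], dim=-1))

model = TorqueLimitModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Dummy batch standing in for (state, max-torque) pairs sampled from the muscle model.
q, qdot = torch.randn(256, NUM_JOINTS), torch.randn(256, NUM_JOINTS)
tau_max = torch.rand(256, NUM_JOINTS)

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(q, qdot), tau_max)
loss.backward()
optimizer.step()
```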
This project aimed to address the challenge of sample-efficient multi-task learning. We used reward functions consisting of multiple terms, one for each sub-task, weighted to reflect relative task importance. Borrowing ideas from Hindsight Experience Replay, we stored copies of experience tuples in the replay buffer and relabeled them with different weight configurations; the rewards for these tuples were then recalculated and the tuples stored back in the replay buffer as additional experience. We experimented with different relabeling strategies and tested our method on the FetchReach and DartHalfCheetah environments.
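A simplified sketch of the relabeling mechanism is below; the buffer structure and the Dirichlet weight-sampling strategy shown are illustrative assumptions rather than the exact implementation.

```python
import numpy as np

class WeightedRelabelBuffer:
    def __init__(self):
        self.buffer = []

    def add(self, state, action, reward_terms, next_state, weights):
        # reward_terms holds the per-sub-task reward components; the scalar
        # reward is their weighted sum under a given weight configuration.
        scalar_reward = float(np.dot(weights, reward_terms))
        self.buffer.append((state, action, scalar_reward, next_state, weights))

    def relabel(self, state, action, reward_terms, next_state, num_copies=2):
        # Store extra copies of the same transition under different weightings,
        # recomputing the scalar reward for each copy.
        for _ in range(num_copies):
            new_weights = np.random.dirichlet(np.ones(len(reward_terms)))
            self.add(state, action, reward_terms, next_state, new_weights)

buf = WeightedRelabelBuffer()
state, action, next_state = np.zeros(4), np.zeros(2), np.zeros(4)
reward_terms = np.array([1.2, 1.0, -0.3])  # e.g. velocity, alive bonus, energy penalty
weights = np.array([0.6, 0.3, 0.1])        # original weight configuration
buf.add(state, action, reward_terms, next_state, weights)
buf.relabel(state, action, reward_terms, next_state)
```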
Here are a few videos of the results:
FetchReach with 3 goals. The end effector must reach one of the three goals, which is specified by the user. In this video, the desired goal is selected uniformly at random.
The following 3 clips show results produced by the same policy, each prioritizing a different component of the reward function.
DartHalfCheetah with a high velocity weight: The HalfCheetah moves forward with a high velocity without falling down.
DartHalfCheetah with a high alive bonus weight: In this case, since the weight for the velocity component is low, there is not much incentive for the HalfCheetah to move forward, and therefore it just stays still.
DartHalfCheetah with a high energy penalty weight: Generating forward velocity requires expending energy, and high energy expenditure leads to a lower reward in this case. The policy therefore tries to terminate the episode as quickly as possible.
The goal of this project was to accelerate fluid simulation using machine learning techniques, focusing on dam-break scenarios. We started with an Eulerian approach but later switched to a Lagrangian one; data for both was generated using mantaflow. For the Lagrangian approach, I used PointNet to extract features from the data and convolutional autoencoders for dimensionality reduction. The reduced-dimension data was then used to train a layer-normalized LSTM to predict particle trajectories.
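A rough PyTorch sketch of the Lagrangian pipeline follows; apart from the 4096-to-576 bottleneck mentioned in the clip caption below, the layer shapes are assumptions, PointNet feature extraction is omitted, and applying LayerNorm to the LSTM outputs merely stands in for a layer-normalized LSTM cell.

```python
import torch
import torch.nn as nn

FEATURE_DIM, LATENT_DIM = 4096, 576  # dimensions quoted in the clip caption below

class ConvAutoencoder(nn.Module):
    """Compresses a per-frame feature vector to a low-dimensional code."""
    def __init__(self):
        super().__init__()
        # Treat the per-frame feature vector as a 1-channel 1D signal.
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=4, stride=4), nn.ReLU(),  # 4096 -> 1024
            nn.Conv1d(16, 4, kernel_size=4, stride=4), nn.ReLU(),  # 1024 -> 256 (4 channels)
            nn.Flatten(),
            nn.Linear(4 * 256, LATENT_DIM),
        )
        self.decoder = nn.Linear(LATENT_DIM, FEATURE_DIM)  # simplified decoder

    def encode(self, x):                      # x: (batch, FEATURE_DIM)
        return self.encoder(x.unsqueeze(1))

    def forward(self, x):
        return self.decoder(self.encode(x))

class LatentDynamics(nn.Module):
    """LSTM that predicts the next latent code from a history of codes."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(LATENT_DIM, 512, batch_first=True)
        self.norm = nn.LayerNorm(512)         # stand-in for a layer-normalized LSTM cell
        self.head = nn.Linear(512, LATENT_DIM)

    def forward(self, codes):                 # codes: (batch, time, LATENT_DIM)
        out, _ = self.lstm(codes)
        return self.head(self.norm(out[:, -1]))

ae, dyn = ConvAutoencoder(), LatentDynamics()
frames = torch.randn(2, 10, FEATURE_DIM)      # dummy sequence of per-frame features
codes = torch.stack([ae.encode(frames[:, t]) for t in range(10)], dim=1)
next_code = dyn(codes)                        # (2, LATENT_DIM)
next_frame = ae.decoder(next_code)            # (2, FEATURE_DIM)
```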
I also created a visualization tool to help me replay the generated data. Here are a few clips:
Reconstruction with an autoencoder with bottleneck dimension 576 (original dimension: 4096)
The report may be found here.
Slides: