Projects

Reinforcement Learning for Bipedal Locomotion

I am currently researching reinforcement learning (specifically Natural Policy Gradients) for learning walking policies on a Cassie robot. I have modified open-source Julia learning libraries and written infrastructure and wrapper code that interfaces with the Cassie simulator libraries to create a working training environment.
To aid this project, I am also applying reinforcement learning to a much simpler and smaller problem: a SLIP (spring-loaded inverted pendulum) model, a reduced-order model that applies to Cassie. Successful policies in the SLIP domain can help inform how the full robot should act.
I am also investigating how learned policies transfer to hardware. In particular, I am interested in what is needed to cross the sim-to-real gap, i.e. what needs to be modeled, how accurate the model must be, etc.
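As a rough illustration of the Natural Policy Gradient update mentioned above, the sketch below preconditions the ordinary policy gradient with an estimate of the Fisher information matrix. It is written in Python with NumPy for brevity and is not the Julia code used in the project; the function name, the `damping` term, and the toy data are all assumptions made for the example.

```python
import numpy as np


def natural_gradient_step(scores, advantages, damping=1e-3):
    """Return the natural policy gradient direction F^{-1} g.

    scores:     (N, d) array, row i is grad_theta log pi(a_i | s_i)
    advantages: (N,) array of advantage estimates for the same samples
    damping:    small ridge term that keeps the Fisher estimate invertible
                (an assumed value, not a tuned one)
    """
    n_samples, dim = scores.shape
    vanilla_grad = scores.T @ advantages / n_samples  # g
    fisher = scores.T @ scores / n_samples            # F = E[s s^T]
    fisher += damping * np.eye(dim)
    return np.linalg.solve(fisher, vanilla_grad)


# Toy data: two samples whose scores make the Fisher estimate the identity,
# so the natural and vanilla gradients coincide when damping is zero.
scores = np.sqrt(2.0) * np.eye(2)
advantages = np.array([1.0, -0.5])
direction = natural_gradient_step(scores, advantages, damping=0.0)
```

Rescaling the policy parameterization changes the vanilla gradient but is compensated by the Fisher term, which is the main reason to prefer this direction over plain gradient ascent.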

Reinforcement Learning for non-prehensile manipulation: Transfer from simulation to physical system.

Reinforcement learning for continuous control has emerged as a promising methodology for training robot controllers. Most results, however, have been limited to simulation, due to the need for a large number of samples and the lack of automated-yet-safe data collection methods. Model-based reinforcement learning methods provide an avenue to circumvent these challenges, but the traditional concern has been the mismatch between the simulator and the real world. Here, we show that control policies learned in simulation can successfully transfer to a dynamic physical system, composed of three Phantom robots pushing an object to changing targets. The resulting policies, trained in simulation, work well on the physical system without additional training. In addition, we show that training with an ensemble of models makes the learned policies more robust to modeling errors, thus compensating for difficulties in system identification.
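The ensemble idea can be illustrated with a minimal sketch: build several perturbed copies of a nominal physics model and simulate each training episode with a randomly chosen copy, so the policy cannot overfit to one set of parameters. The parameter names (`mass`, `friction`), the nominal values, and the 20% spread are hypothetical and are not taken from the paper.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelParams:
    """Hypothetical physical parameters of the pushed object."""
    mass: float      # kg
    friction: float  # unitless coefficient


def make_ensemble(nominal, rel_spread, size, seed=0):
    """Build `size` models, scaling each nominal parameter by a factor
    drawn uniformly from [1 - rel_spread, 1 + rel_spread]."""
    rng = random.Random(seed)

    def perturb(value):
        return value * rng.uniform(1.0 - rel_spread, 1.0 + rel_spread)

    return [
        ModelParams(perturb(nominal.mass), perturb(nominal.friction))
        for _ in range(size)
    ]


def sample_model(ensemble, rng):
    """Pick the model used to simulate the next training episode."""
    return rng.choice(ensemble)


nominal = ModelParams(mass=0.2, friction=0.8)
ensemble = make_ensemble(nominal, rel_spread=0.2, size=8)
episode_model = sample_model(ensemble, random.Random(1))
```

In a training loop, `sample_model` would be called once per episode before resetting the simulator with the chosen parameters.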
Won Best Paper Award at IEEE SIMPAR 2018
Paper Link

Realtime State Estimation with Whole-Body Multi-Contact Dynamics: A Modified UKF Approach.

We present a real-time state estimator applicable to whole-body dynamics in contact-rich behaviors. Our estimator is based on the Unscented Kalman Filter (UKF), with modifications that proved essential in the presence of strong contact non-linearities combined with unavoidable model errors. We also develop a rich model of process noise including noise in system parameters, control noise, as well as a novel form of timing noise which makes the estimator more robust. The method is applied to an enhanced Darwin robot in walking as well as pseudo-walking while lying on the floor. A full update takes around 7 msec on a laptop processor. Furthermore, we perform the computationally expensive prediction step (involving 210 forward dynamics evaluations in the MuJoCo simulator) while waiting for sensor data, and then apply the correction step as soon as sensor data arrive. This correction takes only 0.5 msec. Thus the estimator adds minimal latency to closed-loop control, even though it handles the whole-body robot dynamics directly, without resorting to any of the modeling shortcuts used in the past.
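The split between an expensive prediction step and a cheap correction step can be seen in a textbook UKF, sketched below with NumPy. This is the standard filter with additive noise, not the modified estimator from the paper: the toy constant-velocity model, the `kappa` value, and all function names are assumptions made for illustration. The dynamics function is called once per sigma point inside `predict`, which is why that step dominates the cost when each call is a full physics evaluation.

```python
import numpy as np


def sigma_points(mean, cov, kappa=1.0):
    """Return the 2n+1 sigma points and their weights for N(mean, cov)."""
    n = mean.size
    scale = n + kappa
    root = np.linalg.cholesky(scale * cov)  # columns are the offsets
    points = np.vstack([mean, mean + root.T, mean - root.T])
    weights = np.full(2 * n + 1, 1.0 / (2.0 * scale))
    weights[0] = kappa / scale
    return points, weights


def unscented_moments(points, weights, noise_cov):
    """Weighted mean and covariance of transformed sigma points."""
    mean = weights @ points
    diff = points - mean
    cov = (weights[:, None] * diff).T @ diff + noise_cov
    return mean, cov


def predict(mean, cov, dynamics, process_cov):
    """Expensive step: one dynamics evaluation per sigma point."""
    points, weights = sigma_points(mean, cov)
    propagated = np.array([dynamics(p) for p in points])
    return unscented_moments(propagated, weights, process_cov)


def correct(mean, cov, measure, sensor_cov, reading):
    """Cheap step: runs once the sensor reading is available."""
    points, weights = sigma_points(mean, cov)
    predicted = np.array([measure(p) for p in points])
    z_mean, z_cov = unscented_moments(predicted, weights, sensor_cov)
    cross = (weights[:, None] * (points - mean)).T @ (predicted - z_mean)
    gain = cross @ np.linalg.inv(z_cov)
    new_mean = mean + gain @ (reading - z_mean)
    new_cov = cov - gain @ z_cov @ gain.T
    return new_mean, new_cov


# Toy constant-velocity model standing in for the robot dynamics.
dt = 0.01
A = np.array([[1.0, dt], [0.0, 1.0]])  # state: position, velocity
H = np.array([[1.0, 0.0]])             # sensor reads position only
Q = 1e-4 * np.eye(2)
R = np.array([[1e-2]])
x0, P0 = np.zeros(2), np.eye(2)

# predict() can run while waiting for the sensor; correct() runs on arrival.
x_pred, P_pred = predict(x0, P0, lambda x: A @ x, Q)
x_new, P_new = correct(x_pred, P_pred, lambda x: H @ x, R, np.array([0.5]))
```

For linear dynamics and measurements, as in this toy model, the result matches an ordinary Kalman filter; the unscented transform matters once the dynamics are non-linear.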
Accepted to IEEE/RAS Humanoids 2016
Paper Link

The Impact of Curriculum Learning: Evaluating Speed and Performance in a Multiagent Grid World Domain

One of the most widely used techniques to train an agent to achieve a certain task is reinforcement learning. Though it is used in many applications, the agent may be unable to learn sufficiently in complex domains where rewards are sparse or have high variance. Thus step-by-step learning, or curriculum learning, is often used: the agent first learns a simpler task or learns in a smaller domain and slowly works its way up to the full task or world size. However, this requires many more samples and more computation. It is reasonable to think that for simple problems this might be unnecessary extra work. In this paper, we compare curriculum learning to vanilla Deep Q Networks (DQN) on a 15x15 grid world multiagent task. Speed of convergence, policy performance, and sample complexity obtained by the optimal curriculum generated will be compared against DQN. We hope to find the boundary in task complexity and size beyond which curriculum learning outperforms DQN. This would give insight into when the extra effort of implementing curriculum learning is worth it for a given task.
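A curriculum over world size can be sketched as a small scheduler that starts on a small grid and moves to the next size once the agent succeeds often enough. The stage sizes, the 0.8 success threshold, and the class name below are hypothetical; they only illustrate the idea and are not the curriculum generated in the project.

```python
from dataclasses import dataclass, field


@dataclass
class GridCurriculum:
    """Hypothetical curriculum: train on growing grids up to the full size."""
    sizes: list = field(default_factory=lambda: [5, 10, 15])
    threshold: float = 0.8  # success rate needed to advance (assumed value)
    stage: int = 0

    @property
    def grid_size(self):
        """Side length of the grid for the current stage."""
        return self.sizes[self.stage]

    @property
    def on_final_task(self):
        """True once training has reached the full-size grid."""
        return self.stage == len(self.sizes) - 1

    def report(self, success_rate):
        """Advance one stage if the agent has mastered the current one,
        then return the grid size to train on next."""
        if success_rate >= self.threshold and not self.on_final_task:
            self.stage += 1
        return self.grid_size


curriculum = GridCurriculum()
first_size = curriculum.grid_size
```

A vanilla DQN baseline corresponds to skipping the scheduler and training on the 15x15 grid from the start, which is the comparison described above.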
Paper Link
Course Project from Autonomous Agents and Multiagent Systems, Fall '17
