My research at Oregon State focused on developing controllers capable of performing athletic maneuvers on real bipedal hardware.
The approach sought to synthesize the strengths of model-based optimal control and deep reinforcement learning:
namely, the ability to quickly specify and iterate on bespoke optimal trajectories using a dynamically descriptive Reduced-Order Model (ROM), such as the Single Rigid-Body Model (SRBM), combined with the robustness of learned policies.
Past successes with reinforcement learning controllers have demonstrated highly robust policies capable of impressive bipedal locomotion feats such as blind stair traversal and running a full 5k. However, these policies still do not approach the level of dynamism that animals and humans are capable of.
credits: NFL Network
credits: Brookfield Zoo
Part of the difficulty in developing learned control policies that can replicate such dynamic feats lies in designing reward functions that succinctly capture all of the dynamic complexities of a very specific maneuver, such as the football cone drill shown above.
In contrast, model-based control techniques offer many advantages in this area. General mathematical-programming recipes for arbitrary running trajectories can be solved offline using a variety of simple constraints. However, using model-based tracking controllers to execute these trajectories online in real time is an entirely different, difficult problem that often requires many hours of hand-tuning by an expert engineer.
The goal of this work was to use the descriptive power and iteration simplicity of ROM trajectory optimization to develop general recipes for optimal trajectories across a variety of athletic maneuvers (chaining steps and turns in sequence), and then use deep RL to learn control policies capable of executing these complicated maneuvers reliably on hardware.
SRBMs strike a nice balance: they capture rich dynamic information such as angular momentum without being so high-dimensional that they become a cumbersome tool. With trajectory optimization we can plan dynamic maneuvers offline using a first-principles approach to legged locomotion. By setting useful objectives, such as minimizing actuator work, and specifying the right physics-based constraints on the model, we can plan dynamic behaviors by solving for footstep locations and ground reaction forces.
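As a rough illustration of what such a formulation looks like, here is a minimal planar single rigid-body trajectory optimization written with CasADi. This is a simplified sketch, not the code used in the work: the mass, inertia, horizon, friction coefficient, and boundary conditions are all assumed values chosen for readability.

```python
# Sketch: planar single rigid-body trajectory optimization (assumed values throughout).
import casadi as ca

N, dt = 50, 0.02           # horizon length and timestep (assumed)
m, I, g = 31.0, 2.1, 9.81  # rough Cassie-scale mass and inertia (assumed)

opti = ca.Opti()

# State: [x, z, theta, xdot, zdot, thetadot]; control: ground reaction force [fx, fz]
X = opti.variable(6, N + 1)
U = opti.variable(2, N)
p_foot = opti.variable(2)   # stance-foot location is also a decision variable

for k in range(N):
    x, z = X[0, k], X[1, k]
    fx, fz = U[0, k], U[1, k]
    # Single rigid-body dynamics: the GRF drives linear acceleration, and its
    # moment about the center of mass drives angular acceleration.
    r = ca.vertcat(p_foot[0] - x, p_foot[1] - z)
    acc = ca.vertcat(fx / m, fz / m - g, (r[0] * fz - r[1] * fx) / I)
    xdot = ca.vertcat(X[3:6, k], acc)
    # Forward-Euler integration as an equality constraint (collocation also works)
    opti.subject_to(X[:, k + 1] == X[:, k] + dt * xdot)
    # Physics constraints: unilateral contact force and a simple friction cone
    opti.subject_to(fz >= 0)
    opti.subject_to(fx <= 0.8 * fz)
    opti.subject_to(fx >= -0.8 * fz)

# Boundary conditions: start standing at rest, travel 1 m forward by the end
opti.subject_to(X[:, 0] == ca.DM([0, 0.9, 0, 0, 0, 0]))
opti.subject_to(X[0, N] == 1.0)

# Effort-style objective (squared GRF magnitude as a stand-in for actuator work)
opti.minimize(ca.sumsqr(U) * dt)

opti.solver("ipopt")
sol = opti.solve()
```

The full formulations in the work chain multiple stance and flight phases and richer constraints, but the structure above (dynamics, contact constraints, effort objective) is the core idea.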
Early progress on a 4-step 90deg grounded-running turning maneuver (leg vectors and GRFs are shown only when a foot is in contact)
3.0 m/s cyclic running gait trajectory - Blender animation credits: Kevin Green
Once a recipe for a particular trajectory has been solved, an entire library of similar trajectories can be created:
Edited video showing early progress of successive warm-started solves for a 2-step turning maneuver. The current desired velocity and the final state and control plots are overlaid. *This was very early progress (take these results with a grain of salt), but the video illustrates the warm-starting process well.*
A complete gait library of 4-step 90deg turning maneuvers from the final results of the work.
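In practice, a library like the one above can be generated by sweeping the boundary conditions (desired velocity, turn angle) and warm-starting each solve from its neighbor's solution. Below is a minimal sketch of that loop, assuming a hypothetical `solve_trajectory()` wrapper around a trajectory-optimization problem like the one sketched earlier.

```python
import numpy as np

def build_turn_library(solve_trajectory, headings_deg=np.arange(0, 95, 5)):
    """Solve a family of turning trajectories, warm-starting each from the last.

    `solve_trajectory` is a hypothetical helper that accepts a desired final
    heading and an initial guess, and returns the solved states and controls.
    """
    library, guess = {}, None
    for heading in headings_deg:
        # Warm-starting keeps neighboring trajectories in the library
        # consistent with each other and speeds up each successive solve.
        states, controls = solve_trajectory(final_heading_deg=heading,
                                            initial_guess=guess)
        library[float(heading)] = (states, controls)
        guess = (states, controls)
    return library
```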
Dynamic trajectories obtained with these models can be used as expert references for deep reinforcement learning control policies. The power of deep RL controllers are their ability to seek out the corners of a full-order robot models dynamics. Through millions of iterations in simulation, with a carefully constructed reward function and techniques such as dynamics randomization, deep RL controllers are capable of learning highly robust control policies.
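For intuition, here is a minimal sketch of what a reference-tracking reward term in this style might look like. It is not the exact reward used in the work; the state/reference fields, phase indexing, and weights are illustrative assumptions.

```python
import numpy as np

def tracking_reward(state, ref, phase):
    """Reward closeness to the ROM reference at the current phase of the maneuver.

    `state` holds the simulated robot's quantities and `ref` holds the
    model-based reference trajectory indexed by phase (assumed structure).
    """
    com_err = np.linalg.norm(state["com_pos"] - ref["com_pos"][phase])
    vel_err = np.linalg.norm(state["com_vel"] - ref["com_vel"][phase])
    yaw_err = abs(state["yaw"] - ref["yaw"][phase])
    # Exponential kernels keep each term in (0, 1]; the weights sum to 1.
    return (0.4 * np.exp(-5.0 * com_err)
            + 0.4 * np.exp(-2.0 * vel_err)
            + 0.2 * np.exp(-3.0 * yaw_err))
```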
Trained RL Policies: The following videos depict control policies (neural networks) that were trained using the model-based trajectory libraries as expert reference information in their reward functions.
This running policy was trained from a library of model-based running reference trajectories. It captures the natural oscillation of momentum defined by the model.
This video demonstrates a previously trained (non-reference-based) locomotion policy switching into the reference-based turning policy and then back into the baseline locomotion policy. The turning policy was trained by Fangzhou Yu.
More on this work, including links to the full papers, the conference submission videos, and my full thesis diving deeper into this topic, can be found on my Publications page.
When I first dove into the world of trajectory optimization, I practiced by creating simpler dynamic models to experiment with.
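As an example of the kind of simpler practice problem this involved (an assumed example for illustration, not one of the specific models shown below), here is a torque-limited pendulum swing-up solved by direct transcription with CasADi:

```python
import casadi as ca

N, dt = 100, 0.05
opti = ca.Opti()
theta = opti.variable(1, N + 1)  # pendulum angle (0 = hanging straight down)
omega = opti.variable(1, N + 1)  # angular velocity
tau = opti.variable(1, N)        # motor torque

for k in range(N):
    # Unit-mass, unit-length pendulum: theta_ddot = -g*sin(theta) + tau
    alpha = -9.81 * ca.sin(theta[k]) + tau[k]
    opti.subject_to(theta[k + 1] == theta[k] + dt * omega[k])
    opti.subject_to(omega[k + 1] == omega[k] + dt * alpha)
    opti.subject_to(opti.bounded(-5.0, tau[k], 5.0))  # torque limit forces energy pumping

opti.subject_to(theta[0] == 0)
opti.subject_to(omega[0] == 0)
opti.subject_to(theta[N] == ca.pi)  # finish balanced upright
opti.subject_to(omega[N] == 0)
opti.minimize(ca.sumsqr(tau) * dt)
opti.solver("ipopt")
sol = opti.solve()
```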
Older Trajectory Optimizations: