LTC20

Learning to control

Topic Leaders

Aditya Nair, University of Washington (agnair@uw.edu)
Tobi Delbruck, University of Zürich/ETH Zürich (tobi@ini.uzh.ch)

Invitees

Prof. John Doyle (Caltech) - confirmed
Rodolphe Sepulchre (Cambridge) - confirmed
Prof. Jon Tapson (Gray Matter AI Labs) - confirmed
Prof. Yorie Nakahira (Carnegie Mellon Univ.) - confirmed
Dr. Krithika Manohar (Caltech) - confirmed
Prof. Frank Allgöwer (U Stuttgart)
Sen. Lecturer Stefan Leutenegger (Imperial)
Prof. Dana Kulic (Monash U, Melbourne, Australia)
Prof. Evangelos Theodorou (Georgia Tech)
Prof. Rob Mahony (Australian Natl U Canberra)
Prof. John C. Gallagher (Wright State U)
Prof. Guodong Shi (U Sydney)
Prof. Nikolai Matni (U Pennsylvania)
Prof. Ayonga Hereid (Oregon State U)
Prof. Dan Lee (Cornell and Samsung)
Prof. Samuel Burden (U Washington)
Prof. Christian Hubicki (Florida State University)
Dr. Jacopo Tani (ETH Zurich)
Prof. Marco Hutter (ETH Zurich)
Ms. Prof. Xiaonan Wang (NUS Singapore)
Prof. Aude Billard (EPFL Lausanne)

There is great interest in applying advances in machine learning to control. In 2019, the CDS19 topic area focused on developing control tools for specific applications (pencil balancer, cart pendulum, AMPRO prosthetic leg). After the workshop, we felt the need for theoretical advances in data-driven control algorithms that handle the issues associated with complex dynamical systems, including:

High-dimensionality
Non-linearity
Time-delays
Latent/hidden variables
Training data requirement
Adaptive sample rate (data driven sampling?)
Robustness to noise and disturbance

For LTC20, we will focus on making theoretical strides in control algorithms using simulations and experiments. We will explore model problems and data of varying complexity (cart pendulum models, Lorenz attractor model, simulated flight data, network models, neuron/circadian models, turbulence data) for systematically tackling the issues above.

Projects

Data-driven system identification techniques for control: System identification techniques using data-based regression and other machine learning approaches (neural networks, deep learning) have been shown to be effective in reducing the high-dimensional and nonlinear nature of complex dynamical systems and in controlling them (Brunton and Kutz 2019). This year we want to extend these system identification techniques so that they are oriented specifically toward control and address the issues listed above. For neuromorphic systems, we will explore I-V loop shaping based on the invited talk given by Rodolphe Sepulchre in 2019 (Ribar and Sepulchre 2019), where it is used to control spike bursting, adaptation, and other neural firing patterns.
Reinforcement learning and exploring diversity in control architectures: We want to discuss novel methods for investigating the “exploration versus exploitation trade-off” in reinforcement learning (Kaelbling, Littman, and Moore 1996) to achieve specific control objectives, and to study speed-accuracy tradeoffs in light of the recent notion of diversity sweet spots (Nakahira et al. 2019). The idea of diversity sweet spots, from John Doyle’s group, was investigated in sensorimotor control, where it exploits the heterogeneity and diversity across neurophysiological layers. It was demonstrated with participants at Telluride 2019 using a simulated mountain biker. This year, we will extend this fundamental understanding to other dynamical systems (e.g., the cart-pole) to explore diversity-enabled sweet spots for control.
Model Predictive Control (MPC): MPC is a powerful optimal control technique that uses a model to predict future states from past states and control inputs, and optimizes the control input to reach a desired target. MPC can handle actuator constraints and arbitrary nonlinearities. In the nonlinear case, however, MPC is computationally expensive because the prediction and optimization steps must be applied to the nonlinear model, resulting in a non-convex optimization problem. Nonlinear MPC (NMPC) generally cannot be solved by simple gradient descent, and sometimes thousands of candidate trajectories must be evaluated at each update step. Strategies such as model predictive path integral (MPPI) control, used in the Georgia Tech AutoRally dirt track robot race cars, weight many sampled trajectories by their cost and thus find acceptable solutions to the NMPC optimization (Williams et al. 2016; Drews et al. 2017). This year we will continue to develop methods for NMPC. We envision several projects combining system identification with MPC:
1. Combining Model Predictive Control (MPC) with Recurrent Neural Networks (RNNs): A continuing aim of the topic area is to join ML with control by replacing the usual hand-crafted model in MPC with a dynamical systems model that is learned from data. RNNs are a type of deep neural network (DNN) that seems ideally suited to modeling many dynamical systems, and we can accelerate RNN inference on the EdgeDRNN hardware accelerator (Gao et al. n.d.). Using EdgeDRNN will allow the many optimizer iterations needed in each time step even when a large RNN is used to model and predict system dynamics. To train the RNN, we can try a combination of three approaches: (1) data collected from the input-output responses during system operation with a conventional PID controller; (2) data from simulations (transfer learning); (3) end-to-end (E2E) dataset collection, in which a human controls the system and the input-output responses are recorded.
2. MPC for Cart-Pendulum: We will use a simulated and a physical cart-pendulum (as in 2019, but with a much better understanding this time).
3. MPC for AMPRO: In 2019, we started working with the AMPRO transfemoral prosthetic, but could not achieve the goal of MPC in this framework because we only managed to train the RNN to directly generate the control signals. This year, we want to close the loop on AMPRO using MPC. In addition, by combining MPC with a variational autoencoder, we can compress a higher-dimensional state (input) space into a much lower-dimensional representation that is more amenable to MPC.
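
To make the idea of combining system identification with sampling-based MPC concrete, the following is a minimal Python sketch, not the topic area's actual code. Everything in it is an assumption chosen for illustration: the toy inverted-pendulum plant, the feature library, the cost weights, the sampling parameters, and all function names are hypothetical. A least-squares regression stands in for the RNN model, and the controller is a simplified MPPI-style update (cost-weighted averaging of sampled control sequences) that omits the control-cost correction terms of the published algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
DT, G, U_MAX = 0.02, 9.81, 10.0  # time step, gravity/length, actuator limit (all assumed)

def true_step(x, u):
    """Hypothetical 'unknown' plant: inverted pendulum, theta = 0 is upright."""
    theta, omega = x[..., 0], x[..., 1]
    omega_next = omega + DT * (G * np.sin(theta) + u)
    theta_next = theta + DT * omega_next
    return np.stack([theta_next, omega_next], axis=-1)

def features(x, u):
    """Assumed candidate library for the regression: [theta, omega, sin(theta), u]."""
    theta, omega = x[..., 0], x[..., 1]
    return np.stack([theta, omega, np.sin(theta), u], axis=-1)

# 1) System identification: fit next-state = features @ W from random input-output data.
X = rng.uniform([-np.pi, -5.0], [np.pi, 5.0], size=(500, 2))
U = rng.uniform(-U_MAX, U_MAX, size=500)
W, *_ = np.linalg.lstsq(features(X, U), true_step(X, U), rcond=None)

def model_step(x, u):
    """Learned one-step predictor used inside the controller."""
    return features(x, u) @ W

# 2) Simplified MPPI-style update using only the learned model.
def mppi_update(x0, u_nom, n_samples=256, sigma=3.0, lam=1.0):
    horizon = len(u_nom)
    noise = rng.normal(0.0, sigma, size=(n_samples, horizon))
    u = np.clip(u_nom + noise, -U_MAX, U_MAX)      # respect the actuator limit
    x = np.tile(x0, (n_samples, 1))
    cost = np.zeros(n_samples)
    for t in range(horizon):                       # roll out all samples in parallel
        x = model_step(x, u[:, t])
        cost += x[:, 0] ** 2 + 0.1 * x[:, 1] ** 2 + 1e-3 * u[:, t] ** 2
    w = np.exp(-(cost - cost.min()) / lam)         # low cost -> high weight
    w /= w.sum()
    return w @ u                                   # cost-weighted control sequence

# 3) Closed loop: plan with the learned model, act on the true plant.
x = np.array([0.3, 0.0])
u_nom = np.zeros(25)
applied = []
for _ in range(200):
    u_nom = mppi_update(x, u_nom)
    x = true_step(x, u_nom[0])
    applied.append(u_nom[0])
    u_nom = np.append(u_nom[1:], 0.0)              # shift the plan to warm-start the next step
print("final angle [rad]:", x[0])
```

Because the toy plant happens to be linear in the chosen features, the regression recovers it almost exactly; with a real system the model would be approximate and the closed-loop behavior would depend strongly on the horizon, the number of samples, and the noise scale. Swapping `model_step` for a learned RNN predictor is the step where inference speed becomes the bottleneck, since the model is evaluated once per sample per horizon step at every control update.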

Provided Hardware and Software

Cart-pole robot, with working Python controller on host
Simulation model of race car with unknown dynamics
Simulated mountain biker with controlled visual advance warning horizon and tactile feedback via race steering wheel
(possible) AMPRO powered transfemoral prosthetic
EdgeDRNN recurrent neural network hardware accelerator

Participant Preparation

(To Be Determined)