Reinforcement Learning for Controlled Nuclear Fusion

Abstract:
Harnessing nuclear fusion offers the prospect of limitless, clean, affordable energy.  Achieving this goal is extremely challenging because the magnetohydrodynamics of fusion plasmas are nonlinear, unstable, and hard to model.  Bayesian optimization and reinforcement learning are increasingly successful methods for optimizing and controlling such challenging systems, but they still struggle to reach the sample efficiency and dimensionality required by real applications such as nuclear fusion.


In this talk, I will first describe the problem of achieving fusion in tokamaks.  I will then present our recent advances in developing sample efficient algorithms for learning control policies.  Finally, I'll show their performance on tokamak models and describe our plans to utilize them on a physical tokamak.


Bio:

Dr. Jeff Schneider is a professor in Carnegie Mellon University's School of Computer Science. His research is on machine learning and autonomous systems. He has over 20 years of experience developing, publishing, and applying machine learning algorithms in government, science, and industry.  Jeff is also an entrepreneur. He was a founding member of Uber's self-driving car program. Before that, he developed a machine-learning-based CNS drug discovery system and commercialized it as Psychogenics' Chief Informatics Officer. Earlier, he was the co-founder and CEO of Schenley Park Research, a company dedicated to bringing machine learning to industry. Through his research, commercial, and consulting efforts, he has worked with dozens of companies and government agencies around the world.

Summary:

* Problem: nuclear fusion

   * Fuse hydrogen isotopes (from water) into helium

   * Energy density ~4x that of fission

   * Much more available fuel (hydrogen vs uranium)

   * Safety:

      * Waste products: helium, neutrons

      * Fusion is hard to get going; any disruption stops it

* Achieving fusion: press hydrogen into tiny space

   * Gravitational confinement: e.g. sun

   * Inertial confinement: push hydrogen into place using direct force

      * Fission bomb

      * Lasers: NIF

   * Magnetic confinement: use magnetic fields (Tokamak; toroidal shape)

      * Focus of this talk

* State vector:

   * Near-circular torus, broken up into different radii (~30)

   * Multiple scalars for the different radii

      * Values assumed to be the same around the torus body

   * A few parameters describe the shape of the torus (its deviation from a pure circle), with limited flexibility
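The state vector above might be represented as follows. This is a minimal sketch, not the talk's actual representation: the profile names (`temperature`, `density`) and the number of shape parameters are hypothetical; only the ~30 radial zones and the "few shape parameters" structure come from the talk.

```python
from dataclasses import dataclass
import numpy as np

N_RADII = 30  # talk: torus broken up into ~30 radii

@dataclass
class TokamakState:
    """Hypothetical state vector: per-radius profiles plus a few shape scalars.

    Values are assumed uniform around the torus body, so each profile is
    indexed by radius only.
    """
    temperature: np.ndarray   # shape (N_RADII,) - one scalar per radius (hypothetical profile)
    density: np.ndarray       # shape (N_RADII,) - another per-radius scalar (hypothetical)
    shape_params: np.ndarray  # small vector describing deviation from a pure circle

    def as_vector(self) -> np.ndarray:
        # Flatten everything into one vector for a learned model.
        return np.concatenate([self.temperature, self.density, self.shape_params])

state = TokamakState(
    temperature=np.linspace(1.0, 10.0, N_RADII),
    density=np.ones(N_RADII),
    shape_params=np.array([0.1, -0.05, 0.02]),
)
print(state.as_vector().shape)  # (63,)
```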

* Modeling challenge

   * Magnetohydrodynamics (MHD): Navier-Stokes + Maxwell's equations

      * Simulations are expensive to run

   * Classic approach: separate problem into 1d control loops and use a proportional–integral–derivative controller

   * Opportunity: use ML to control these better
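The classic per-loop approach above can be sketched as a textbook PID controller on a toy plant. The gains and the first-order plant here are illustrative only, not from the talk:

```python
class PID:
    """Textbook proportional-integral-derivative controller for one 1-D loop."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt          # accumulate error (I term)
        derivative = (error - self.prev_error) / self.dt  # error rate (D term)
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Toy first-order plant x' = -x + u, driven toward setpoint 1.0.
pid = PID(kp=2.0, ki=0.5, kd=0.1, dt=0.01)
x = 0.0
for _ in range(2000):
    u = pid.step(1.0, x)
    x += (-x + u) * 0.01
print(round(x, 2))
```

The integral term is what drives the steady-state error to zero; a pure proportional controller would settle short of the setpoint.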

* Approach:

   * RL requires the ability to randomly sample control space

   * Simulations:

      * Don’t have enough compute power to run sufficiently accurate simulations

      * Generically accurate simulations exist but are too slow for live control

   * Data-driven models fall short because there is not enough experimental data to train them

   * Model

      * Scientific model makes a prediction

      * Neural net predicts the residual

      * Train both using ODE solver

* Uncertainty Calibration

   * Using their Uncertainty Toolbox: https://uncertainty-toolbox.github.io/

   * More rigorous calibration of uncertainty estimates

   * They also developed a new method for doing gradient descent on these improved uncertainty metrics
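One common calibration metric of the kind Uncertainty Toolbox provides is the gap between expected and observed coverage of predictive intervals. The sketch below computes it from scratch for Gaussian predictions (the function name and test data are mine, not the toolbox's API):

```python
import numpy as np
from math import erf, sqrt

def miscalibration(y_true, mu, sigma, n_levels=19):
    """Mean absolute gap between expected and observed central-interval coverage.

    For a calibrated predictor, the p% central predictive interval should
    contain ~p% of the true values, at every level p.
    """
    z = (y_true - mu) / sigma
    # Standard-normal CDF of each standardized residual.
    cdf = np.array([0.5 * (1 + erf(v / sqrt(2))) for v in z])
    gaps = []
    for p in np.linspace(0.05, 0.95, n_levels):
        lo, hi = (1 - p) / 2, (1 + p) / 2          # central interval in CDF space
        observed = np.mean((cdf >= lo) & (cdf <= hi))
        gaps.append(abs(observed - p))
    return float(np.mean(gaps))

rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, size=5000)
well = miscalibration(y, np.zeros_like(y), np.ones_like(y))        # honest sigma
over = miscalibration(y, np.zeros_like(y), 3.0 * np.ones_like(y))  # inflated sigma
print(well < over)
```

A metric like this is piecewise constant in the predictions, which is part of why descending on calibration metrics directly takes extra care, as the talk notes.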

* RL process:

   * Simulator is expensive: 200 ms of plasma time takes ~40 minutes of wall-clock time

   * Need to carefully choose policy samples since they are expensive

      * Optimize mutual information between current estimate and quantity we want to optimize

      * I.e., the reduction in entropy from adding a new observation to the dataset

   * Note: we don’t care about the optimal policy for most of the dynamic domain, just the stable/high payoff region

      * Their sampling algorithm is actually poor at characterizing the regions of the dynamic space away from the optimal policy

   * Their model ends up being more accurate than the physics-based model

      * The physics-based model uses many approximations; the ML model adjusts for them
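The entropy-reduction criterion for choosing expensive samples can be illustrated in 1-D. For jointly Gaussian variables, the mutual information between a noisy observation at a candidate point and the quantity of interest at a target point has a closed form; this toy uses a squared-exponential prior covariance and is a stand-in for the talk's acquisition method, not a description of it:

```python
import numpy as np

def rbf_cov(a, b, ls=1.0):
    """Squared-exponential prior covariance between two input locations."""
    return np.exp(-0.5 * ((a - b) / ls) ** 2)

def info_gain(x_candidate, x_target, noise_var=0.1):
    """Mutual information (nats) between a noisy observation at x_candidate
    and the function value at x_target, i.e. the entropy reduction at the
    target from adding that one observation to the dataset."""
    var_target = rbf_cov(x_target, x_target)      # prior variance at target
    cov = rbf_cov(x_candidate, x_target)          # prior cross-covariance
    var_obs = rbf_cov(x_candidate, x_candidate) + noise_var
    var_post = var_target - cov**2 / var_obs      # posterior variance at target
    return 0.5 * np.log(var_target / var_post)    # Gaussian entropy difference

# Candidates near the target are far more informative about it, so the
# criterion naturally ignores regions far from where we care about the policy.
x_target = 0.0
candidates = np.linspace(-3, 3, 61)
gains = np.array([info_gain(c, x_target) for c in candidates])
best = candidates[np.argmax(gains)]
print(best)
```

This also mirrors the note above: points far from the region of interest get near-zero gain, so the method spends no budget characterizing them.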