Searchlight: a Markov Chain Monte Carlo Method for Arbitrarily High Dimensions
Peter Behroozi, University of Arizona
Abstract:
Markov Chain Monte Carlo (MCMC) methods are widely used across science and industry to extract the range of physical model parameters consistent with observed data. Yet, current exploration algorithms do not scale well to the millions to trillions of parameters present in the most advanced models, including climate, macroeconomic, and astronomical models. Likewise, exploration of neural network parameter spaces (a.k.a. Bayesian Deep Learning) remains impossible for all but the simplest models. I provide physical intuition for why high-dimensional spaces are hard to explore with existing algorithms and show preliminary results from a new MCMC method (Searchlight) that has generically logarithmic scaling with dimensionality, and hence will scale to trillions of dimensions and beyond.
Bio:
Dr. Peter Behroozi is an associate professor in the Department of Astronomy at the University of Arizona. His main research involves resimulating the Universe millions of times on supercomputers to reconstruct how galaxies evolved over time, which prompted his interest in more efficient MCMC algorithms. He has received many awards for his research, including a Packard Fellowship, and he is a Clarivate Highly Cited Researcher.
Talk notes:
Challenge
Simulating galaxy formation and evolution is very challenging
Gravitational effects are well-understood
Poorly understood: Gas, magnetism, supernovae, shock formation and propagation
Requires simulations that resolve scales so small that no supercomputer, today or in the foreseeable future, can run them
Proposed alternative: data-driven models of galaxy formation and evolution
Gravity-only simulation +
Learned galaxy-matter connection based on observations of the universe
Differential equation with interpretable terms like star formation rate
Use Bayesian methods to learn the parameters of the equation via Markov Chain Monte Carlo (MCMC)
50-100 variables to fit
Account for biases of measurements
Sampling biases (different telescopes detect certain galaxies more easily than others)
Nuisance parameters to account for systematic biases of the instruments
Sampling algorithm scales well on LBNL Edison supercomputer
Perfect strong scaling to 100,000 processors (~11 universes per second)
Markov Chain Monte Carlo (MCMC)
Many problems require finding distributions over large parameter spaces; typical dimensionalities:
Galaxies: 50-100
Neural nets: 100-10^14
Confused Source Recovery (infer properties of astronomical light sources): 100-10^9
Constrained Realizations (infer initial conditions from outcomes): >10^9
MCMC key idea:
Do a random walk of possible parameter values
Goal: sample all the regions of the parameter space to determine how likely they are
Along the way, we learn whether a given set of parameter values is “acceptable” (consistent with the data) or not.
Transition probability density: probability of how likely we are to jump from one point in parameter space to another
Different algorithms use different transition probabilities to find good values efficiently
Simplest: randomly generate parameter vectors
More complex: walk in a way that depends on the current likelihood gradient
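As a concrete instance of the transition-probability idea, here is a minimal random-walk Metropolis sampler for a toy 1-D Gaussian target (this is the standard textbook algorithm, not the specific samplers from the talk; the function names are illustrative):

```python
import math
import random

def log_likelihood(x):
    # Toy target: standard normal log-density (up to a constant).
    return -0.5 * x * x

def metropolis(n_steps, step_size=1.0, seed=0):
    """Random-walk Metropolis: propose a Gaussian jump, accept with
    probability min(1, p(new)/p(old)); otherwise stay put."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)
        # Accept/reject using the likelihood ratio (proposal is symmetric).
        if math.log(rng.random()) < log_likelihood(proposal) - log_likelihood(x):
            x = proposal
        samples.append(x)
    return samples

samples = metropolis(20000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

The sample mean and variance should approach the target's 0 and 1, illustrating how the chain's visit frequency traces out the distribution.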
High-dimensional spaces:
As the dimensionality of a space increases, its surface area grows faster than its volume, so volume concentrates near the boundary
At higher dimensions:
Most points will appear on the boundaries of the space
Most pairs of random unit vectors in the space are nearly orthogonal to each other
This makes it hard to randomly walk from one parameter vector to another “related” one
Most vectors are mutually dissimilar and extremal
The walk is confined to the thin surface layer
Complexity scales as O(D^(1/4)) to O(D^(9/4)), depending on the algorithm (D = dimensionality of the space)
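The near-orthogonality claim above is easy to check numerically; a quick pure-Python sketch (illustrative only):

```python
import math
import random

def random_unit_vector(dim, rng):
    # Gaussian components normalized to length 1 give a uniform direction.
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def mean_abs_dot(dim, n_pairs=200, seed=1):
    """Average |cos(angle)| between pairs of random unit vectors."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        a = random_unit_vector(dim, rng)
        b = random_unit_vector(dim, rng)
        total += abs(sum(x * y for x, y in zip(a, b)))
    return total / n_pairs

# |cos(angle)| shrinks roughly as 1/sqrt(D): random directions decorrelate.
low_d = mean_abs_dot(3)      # ~0.5 in 3 dimensions
high_d = mean_abs_dot(1000)  # much closer to 0 in 1000 dimensions
```

In 3 dimensions the average is about 0.5, while in 1000 dimensions random directions are almost perpendicular, which is why naive random walks rarely move toward "related" parameter vectors.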
Searchlight algorithm:
Complexity: O(1) to O(log D)
Idea:
Given some point in the space with a given likelihood
Use search algorithm to find another point with same likelihood
Pick a direction to search in and use root finding as search algorithm
Need the gradient in one dimension
Binary search: time logarithmic in the distance searched along that direction
Number of searches per independent sample doesn’t increase with increasing dimensionality
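The root-finding step might look like the following sketch (my illustrative reconstruction of the idea, not the actual Searchlight code): starting from a point with known log-likelihood, bisect along a chosen direction until we bracket another point at the same likelihood level.

```python
def log_like(x):
    # Toy 2-D Gaussian log-likelihood (up to a constant).
    return -0.5 * (x[0] ** 2 + x[1] ** 2)

def find_equal_likelihood(point, direction, target, lo=1e-6, hi=10.0, iters=60):
    """Bisect for a step size t > 0 such that the likelihood at
    point + t*direction returns to the target level. Assumes the
    level is crossed exactly once inside (lo, hi)."""
    def f(t):
        return log_like([p + t * d for p, d in zip(point, direction)]) - target

    if f(lo) * f(hi) > 0:
        return None  # no crossing bracketed in (lo, hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid  # crossing lies in the lower half
        else:
            lo = mid  # crossing lies in the upper half
    return 0.5 * (lo + hi)

# Start at (1, 0) and search through the mode: the equal-likelihood
# point is the mirror image (-1, 0), reached at step size t = 2.
t = find_equal_likelihood([1.0, 0.0], [-1.0, 0.0], target=log_like([1.0, 0.0]))
```

Each bisection step halves the bracket, so the cost is logarithmic in the search distance over the desired precision, matching the claimed per-search complexity.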
Future work:
Optimal search algorithms and plans
Automatic tuning as the algorithm searches
Handling boundaries
Partnerships with teams that have concrete search problems