Searchlight: a Markov Chain Monte Carlo Method for Arbitrarily High Dimensions
Peter Behroozi, University of Arizona
Abstract:
Markov Chain Monte Carlo (MCMC) methods are widely used across science and industry to extract the range of physical model parameters consistent with observed data. Yet, current exploration algorithms do not scale well to the millions to trillions of parameters present in the most advanced models, including climate, macroeconomic, and astronomical models. Likewise, exploration of neural network parameter spaces (a.k.a. Bayesian Deep Learning) remains impossible for all but the simplest models. I provide physical intuition for why high-dimensional spaces are hard to explore with existing algorithms and show preliminary results from a new MCMC method (Searchlight) that has generically logarithmic scaling with dimensionality, and hence will scale to trillions of dimensions and beyond.
Bio:
Dr. Peter Behroozi is an associate professor in the Department of Astronomy at the University of Arizona. His main research involves resimulating the Universe millions of times on supercomputers to reconstruct how galaxies evolved over time, which prompted his interest in more efficient MCMC algorithms. He has received many awards for his research, including a Packard Fellowship, and he is a Clarivate Highly Cited Researcher.
Talk notes:
Challenge
Simulating galaxy formation and evolution is very challenging
Gravitational effects are well-understood
Poorly understood: Gas, magnetism, supernovae, shock formation and propagation
Requires simulations that resolve scales so small that no supercomputer, today or in the foreseeable future, can run them
Proposed alternative: data-driven models of galaxy formation and evolution
Gravity-only simulation +
Learned galaxy-matter connection based on observations of the universe
Differential equation with interpretable terms like star formation rate
Use Bayesian methods to learn the parameters of the equation via Markov Chain Monte Carlo (MCMC)
50-100 variables to fit
Account for biases of measurements
Sampling biases (different telescopes detect certain galaxies more easily than others)
Nuisance parameters to account for systematic biases of the instruments
Sampling algorithm scales well on LBNL Edison supercomputer
Perfect strong scaling to 100,000 processors (~11 universes per second)
Markov Chain Monte Carlo (MCMC)
Many problems require finding distributions over large parameter spaces; typical dimensionalities:
Galaxies: 50-100
Neural nets: 100-10^14
Confused Source Recovery (infer properties of astronomical light sources): 100-10^9
Constrained Realizations (infer initial conditions from outcomes): >10^9
MCMC key idea:
Do a random walk of possible parameter values
Goal: sample all the regions of the parameter space to determine how likely they are
Along the way, we learn whether a given set of parameter values is “acceptable” (consistent with the data) or not.
Transition probability density: probability of how likely we are to jump from one point in parameter space to another
Different algorithms use different transition probabilities to find good values efficiently
Simplest: randomly generate parameter vectors
More complex: walk in a way that depends on the current likelihood gradient
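As a concrete instance of the transition-probability idea, here is a minimal random-walk Metropolis sampler for a toy 1-D Gaussian target (this is the standard textbook algorithm, not the specific samplers from the talk; the function names are illustrative):

```python
import math
import random

def log_likelihood(x):
    # Toy target: standard normal log-density (up to a constant).
    return -0.5 * x * x

def metropolis(n_steps, step_size=1.0, seed=0):
    """Random-walk Metropolis: propose a Gaussian jump, accept with
    probability min(1, p(new)/p(old)); otherwise stay put."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)
        # Accept/reject using the likelihood ratio (proposal is symmetric).
        if math.log(rng.random()) < log_likelihood(proposal) - log_likelihood(x):
            x = proposal
        samples.append(x)
    return samples

samples = metropolis(20000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

The sample mean and variance should approach the target's 0 and 1, illustrating how the chain's visit frequency traces out the distribution.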
High-dimensional spaces:
As the dimensionality of a space increases, its surface area grows faster than its volume, so volume concentrates near the boundary
At higher dimensions:
Most points will appear on the boundaries of the space
Most pairs of random unit vectors in the space are nearly orthogonal to each other
This makes it hard to randomly walk from one parameter vector to another “related” one
Most vectors are mutually dissimilar and extremal
The walk is confined to the thin surface layer
Complexity scales as O(D^(1/4)) to O(D^(9/4)), depending on the algorithm (D = dimensionality of the space)
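The near-orthogonality claim above is easy to check numerically; a quick pure-Python sketch (illustrative only):

```python
import math
import random

def random_unit_vector(dim, rng):
    # Gaussian components normalized to length 1 give a uniform direction.
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def mean_abs_dot(dim, n_pairs=200, seed=1):
    """Average |cos(angle)| between pairs of random unit vectors."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        a = random_unit_vector(dim, rng)
        b = random_unit_vector(dim, rng)
        total += abs(sum(x * y for x, y in zip(a, b)))
    return total / n_pairs

# |cos(angle)| shrinks roughly as 1/sqrt(D): random directions decorrelate.
low_d = mean_abs_dot(3)      # ~0.5 in 3 dimensions
high_d = mean_abs_dot(1000)  # much closer to 0 in 1000 dimensions
```

In 3 dimensions the average is about 0.5, while in 1000 dimensions random directions are almost perpendicular, which is why naive random walks rarely move toward "related" parameter vectors.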
Searchlight algorithm:
Complexity: O(1) to O(log D)
Idea:
Given some point in the space with a given likelihood
Use search algorithm to find another point with same likelihood
Pick a direction to search in and use root finding as search algorithm
Need the gradient in one dimension
Binary search: time logarithmic in the distance searched along that direction
Number of searches per independent sample doesn’t increase with increasing dimensionality
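The root-finding step might look like the following sketch (my illustrative reconstruction of the idea, not the actual Searchlight code): starting from a point with known log-likelihood, bisect along a chosen direction until we bracket another point at the same likelihood level.

```python
def log_like(x):
    # Toy 2-D Gaussian log-likelihood (up to a constant).
    return -0.5 * (x[0] ** 2 + x[1] ** 2)

def find_equal_likelihood(point, direction, target, lo=1e-6, hi=10.0, iters=60):
    """Bisect for a step size t > 0 such that the likelihood at
    point + t*direction returns to the target level. Assumes the
    level is crossed exactly once inside (lo, hi)."""
    def f(t):
        return log_like([p + t * d for p, d in zip(point, direction)]) - target

    if f(lo) * f(hi) > 0:
        return None  # no crossing bracketed in (lo, hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid  # crossing lies in the lower half
        else:
            lo = mid  # crossing lies in the upper half
    return 0.5 * (lo + hi)

# Start at (1, 0) and search through the mode: the equal-likelihood
# point is the mirror image (-1, 0), reached at step size t = 2.
t = find_equal_likelihood([1.0, 0.0], [-1.0, 0.0], target=log_like([1.0, 0.0]))
```

Each bisection step halves the bracket, so the cost is logarithmic in the search distance over the desired precision, matching the claimed per-search complexity.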
Future work:
Optimal search algorithms and plans
Automatic tuning as the algorithm searches
Handling boundaries
Partnerships with teams that have concrete search problems