The problem of uncomputable posteriors
Bayesian inference is an appealing method for anyone interested in an axiomatic approach to inference. Indeed, the physicist R. T. Cox set out a simple set of rules for a robot to represent, using real numbers, the strength of its beliefs in various propositions, and proved that the only system which obeys these axioms is probability theory. Furthermore, the rule that the robot should use to update its beliefs when faced with new information is Bayes's rule. Under Cox's axioms, Bayesian inference is thus the only consistent formalization of rational thinking about the world.
However, there is a huge problem with Bayesian inference: in most cases, the computations required for an exact implementation of the method are too expensive to be used in practice. Most of the practical research on Bayesian methods actually revolves around how to deal with this thorny issue through various approximation schemes. These schemes fall into two large families: sampling methods (dominated by Markov chain Monte Carlo, or MCMC), which aim at producing samples from the posterior distribution and on which I don't have much to say, and what I call approximate inference methods, which aim to return a parametric approximation (very often Gaussian) of the true posterior distribution.
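To make the source of the cost concrete, here is Bayes's rule in standard notation. The symbols are an assumption of this note rather than notation used elsewhere on this page: θ stands for the unknown parameters and y for the observed data. The expensive part is usually the normalizing integral on the right, which ranges over the whole parameter space and rarely has a closed form.

```latex
p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)},
\qquad
p(y) = \int p(y \mid \theta)\, p(\theta)\, \mathrm{d}\theta
```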
Approximate inference schemes
My work so far centers on this second family. These methods are often called "variational" methods, but I would rather call them approximate inference methods. This is because:
- the word "variational" is already used for the Variational Bayes algorithm which, even though it is the most popular approximate inference method, is far from being the only one
- "variational" implies an optimisation, which means that the term variational excludes the Expectation Propagation algorithm
- the only criticism of the "approximate inference" vs "sampling" separation I have received so far is that sampling methods also aim at producing an approximation of the posterior. I still think the separation is sound, since samples require a fair amount of further processing to answer questions about the posterior, whereas approximate inference methods directly output an approximation.
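As a small illustration of what "directly output an approximation" means, here is a minimal sketch of one of the simplest Gaussian approximations, the Laplace approximation, which fits a Gaussian at the posterior mode. It is not Variational Bayes or Expectation Propagation, and the toy model (a Bernoulli rate under a flat prior, for which the exact posterior is a Beta distribution) and all function names are assumptions chosen only so that the approximation can be checked against the exact answer.

```python
def laplace_approximation(successes, failures):
    """Gaussian (mean, variance) fitted at the posterior mode.

    Toy model (an assumption of this sketch): Bernoulli data, flat prior.
    The log posterior is then, up to a constant,
        successes * log(theta) + failures * log(1 - theta),
    which is maximized at theta = successes / (successes + failures).
    Requires successes > 0 and failures > 0.
    """
    mode = successes / (successes + failures)
    # Negative second derivative of the log posterior, evaluated at the mode.
    curvature = successes / mode**2 + failures / (1.0 - mode) ** 2
    return mode, 1.0 / curvature


def exact_moments(successes, failures):
    """Mean and variance of the exact Beta(successes + 1, failures + 1) posterior."""
    a, b = successes + 1, failures + 1
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, variance


approx_mean, approx_var = laplace_approximation(30, 70)
exact_mean, exact_var = exact_moments(30, 70)
print("Laplace:", approx_mean, approx_var)
print("Exact:  ", exact_mean, exact_var)
```

The whole approximation is two numbers, a mean and a variance, which can be used immediately to answer questions about the posterior. A sampler would instead return a list of draws that still has to be summarized.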
There are currently many open questions about such methods.
1. The most important one concerns the speed at which these algorithms perform their task. Most often, these algorithms iterate an update until they reach a fixed point. Estimating the number of iterations needed for convergence is essential if we want to guarantee that our algorithms will run quickly.
2. A second critical question concerns the quality of the approximation we obtain. We need to understand in which cases these algorithms are good enough, and in which cases we should use the more expensive but more accurate sampling methods. However, current theoretical results are hard to apply for a number of reasons: their hypotheses cannot be checked in practice, they concern approximations that are not in use, etc.
3. The final important question is of a more practical nature. It is simply whether the current versions of these algorithms are the best we can do, or whether there are better variants yet to be found. This can only be answered once we are able to quantify the speed and the approximation quality of current methods. We will then be able to see whether introducing slight changes to the current algorithms improves them.
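The first question above can be made concrete with a toy example. The sketch below is a generic fixed-point loop that also reports how many iterations it needed. The linear update is a hypothetical stand-in for the update of a real approximate inference algorithm; it was chosen because its contraction rate is known, so the iteration count can be predicted and compared with what the loop actually does.

```python
import math


def iterate_to_fixed_point(update, x0, tol=1e-8, max_iter=10_000):
    """Apply `update` until successive iterates differ by less than `tol`.

    Returns the final iterate and the number of iterations performed.
    """
    x = x0
    for iteration in range(1, max_iter + 1):
        new_x = update(x)
        if abs(new_x - x) < tol:
            return new_x, iteration
        x = new_x
    raise RuntimeError("no convergence within max_iter iterations")


def make_linear_update(rate, fixed_point=2.0):
    """Hypothetical stand-in update that contracts toward `fixed_point` at `rate`."""
    return lambda x: fixed_point + rate * (x - fixed_point)


def predicted_iterations(rate, initial_step, tol=1e-8):
    """Rough estimate: step sizes shrink geometrically, by a factor `rate` per iteration."""
    return math.ceil(1 + math.log(tol / initial_step) / math.log(rate))


# Starting from x0 = 0, the first step has size (1 - rate) * |x0 - fixed_point|.
fast_x, fast_n = iterate_to_fixed_point(make_linear_update(0.5), x0=0.0)
slow_x, slow_n = iterate_to_fixed_point(make_linear_update(0.9), x0=0.0)
print("rate 0.5:", fast_n, "iterations, predicted", predicted_iterations(0.5, 1.0))
print("rate 0.9:", slow_n, "iterations, predicted", predicted_iterations(0.9, 0.2))
```

For a real algorithm the contraction rate is not known in advance, which is precisely what makes the question hard. The sketch only shows how strongly the iteration count depends on that rate.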
Bayesian statistics for the frequentist
Another aspect of my work is trying to convince my frequentist colleagues that Bayesian inference is the best system of statistics.
In practice, the best way to do this is not to be a dogmatic Bayesian who goes on shouting about the Cox axioms and subjective probability, but instead to adopt the frequentist point of view on problems and show that Bayesian methods are the best at solving them. Adopting Bayesian methods is then simply a matter of choosing the most efficient tool for solving problems.
Concretely, following this idea means treating the posterior distribution as a function-valued random variable (it is random because the data are) and then studying the behavior of this random variable. My work builds on earlier work by, among others, Le Cam on what is called the "Bernstein-von Mises theorem". My objective is to extend current forms of this theorem so that they are:
- As general as they can be.
- As efficient as they can be.
- Relevant in practical applications of Bayesian inference.
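For readers unfamiliar with the theorem, here is a rough numerical illustration of the kind of statement it makes: as the sample size grows, the posterior looks more and more like a Gaussian centered at the maximum likelihood estimate, with the variance a frequentist would assign to that estimate. The toy model (a Bernoulli rate with a flat prior) and the function names are assumptions of this sketch. The observed frequency is held fixed at 0.3 rather than drawn at random, so this is a deterministic picture of the limit and not a study of the posterior as a random object.

```python
def beta_posterior_moments(successes, trials):
    """Mean and variance of the exact posterior, Beta(successes + 1, failures + 1)."""
    a = successes + 1
    b = trials - successes + 1
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, variance


def gaussian_limit_moments(successes, trials):
    """Maximum likelihood estimate and its usual asymptotic variance."""
    p_hat = successes / trials
    return p_hat, p_hat * (1.0 - p_hat) / trials


results = []
for trials in (10, 100, 1000, 10000):
    successes = (3 * trials) // 10  # observed frequency fixed at 0.3
    post_mean, post_var = beta_posterior_moments(successes, trials)
    mle, limit_var = gaussian_limit_moments(successes, trials)
    mean_gap = abs(post_mean - mle)
    variance_ratio = post_var / limit_var
    results.append((trials, mean_gap, variance_ratio))
    print(trials, mean_gap, variance_ratio)
```

In this example the gap between the posterior mean and the maximum likelihood estimate shrinks, and the variance ratio approaches 1, as the number of trials grows.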