All About that Series is a periodic seminar focusing on computational statistics and machine learning methods, mostly within the Bayesian paradigm. It delves into solutions to challenging learning problems arising from a wide range of applications, such as ecology, population genetics, signal processing and energy optimisation, to name a few. It is part of the activities of the specialised group "Statistique Bayésienne" of the SFdS.
Current organiser: Kaniav Kamary
Previous organisers: from 2019 to 2025, the seminar was organised by Sylvain Le Corff (LPSM, Sorbonne Université) and Julien Stoehr (Université Paris-Dauphine).
If you want to receive details on upcoming talks, sign up for the newsletter here!
Access to SCAI: https://iscd.sorbonne-universite.fr/about/contact/
The afternoon session is open to everyone, but please confirm your participation by registering via the following link: https://forms.office.com/e/APAHDfyYfQ
Clément Bonet (ENSAE) - Mirror and Preconditioned Gradient Descent in Wasserstein Space
Abstract: As the problem of minimizing functionals on the Wasserstein space encompasses many applications in machine learning, different optimization algorithms on $\mathbb{R}^d$ have received their counterparts on the Wasserstein space. We focus here on lifting two explicit algorithms: mirror descent and preconditioned gradient descent. These algorithms have been introduced to better capture the geometry of the function to minimize and are provably convergent under appropriate (namely relative) smoothness and convexity conditions. Adapting these notions to the Wasserstein space, we prove guarantees of convergence of some Wasserstein-gradient-based discrete-time schemes for new pairings of objective functionals and regularizers. The difficulty here is to carefully select along which curves the functionals should be smooth and convex. We illustrate the advantages of adapting the geometry induced by the regularizer on ill-conditioned optimization tasks, and showcase the improvement of choosing different discrepancies and geometries in a computational biology task of aligning single cells.
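The Euclidean ancestors of these schemes are easy to demonstrate. Below is a minimal sketch of mirror descent with the entropic mirror map on the probability simplex; the objective (a KL divergence), the step size and the target `p` are illustrative choices, not taken from the talk.

```python
import numpy as np

def mirror_descent_simplex(grad, x0, steps=200, lr=0.5):
    """Mirror descent on the probability simplex with the entropic
    mirror map, i.e. the exponentiated-gradient update."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x * np.exp(-lr * grad(x))  # multiplicative (mirror) step
        x /= x.sum()                   # Bregman projection onto the simplex
    return x

# Illustrative objective: KL(x || p), whose gradient is log(x/p) + 1.
p = np.array([0.5, 0.3, 0.2])
grad_kl = lambda x: np.log(x / p) + 1.0
x_star = mirror_descent_simplex(grad_kl, np.ones(3) / 3)  # converges to p
```

With the entropic mirror map the update keeps the iterates on the simplex by construction; matching the geometry to the objective in this way is the idea the talk lifts to the Wasserstein space.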
Antoine Godichon Baggioni (Sorbonne Université) - Stochastic Newton algorithms with O(Nd) operations
Abstract: The majority of machine learning methods can be regarded as the minimization of an unavailable risk function. To optimize this function using samples provided in an online fashion, stochastic gradient descent is a common tool. However, it can be highly sensitive to ill-conditioned problems. To address this issue, we focus on Stochastic Newton methods. We first examine a version based on the Riccati (or Sherman-Morrison) formula, which allows recursive estimation of the inverse Hessian with reduced computational time. Specifically, we show that this method leads to asymptotically efficient estimates and requires $O(Nd^2)$ operations (where N is the sample size and d is the dimension). Finally, we explore how to adapt the Stochastic Newton algorithm to a streaming context, where data arrives in blocks, and demonstrate that this approach can reduce the computational requirement to $O(Nd)$ operations.
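The Riccati (Sherman-Morrison) formula mentioned in the abstract lets one maintain an inverse-Hessian estimate recursively, each rank-one update costing $O(d^2)$ instead of a full $O(d^3)$ inversion. A minimal sketch of that recursion (the random updates below are illustrative, not the talk's estimator):

```python
import numpy as np

def sherman_morrison_update(A_inv, u):
    """Return (A + u u^T)^{-1} from A^{-1} via the Sherman-Morrison
    (Riccati) formula, in O(d^2) operations."""
    Au = A_inv @ u
    return A_inv - np.outer(Au, Au) / (1.0 + u @ Au)

# Sanity check of the recursion on random rank-one updates (illustrative).
rng = np.random.default_rng(0)
d = 5
A, A_inv = np.eye(d), np.eye(d)
for _ in range(20):
    u = rng.standard_normal(d)
    A += np.outer(u, u)
    A_inv = sherman_morrison_update(A_inv, u)  # tracks inv(A) recursively
```

In a stochastic Newton scheme, `u` would be built from the current observation with suitable step weights; the sketch only checks the algebraic identity that makes the recursion cheap.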
The afternoon session is open to everyone, but please confirm your participation by registering via the following link: https://forms.office.com/e/me7PJmkQDm
Joshua Bon (Université Paris Dauphine) - Bayesian score calibration for approximate models
Abstract: Scientists continue to develop increasingly complex mechanistic models to reflect their knowledge more realistically. Statistical inference using these models can be challenging since the corresponding likelihood function is often intractable and model simulation may be computationally burdensome. Fortunately, in many of these situations, it is possible to adopt a surrogate model or approximate likelihood function. It may be convenient to conduct Bayesian inference directly with the surrogate, but this can result in bias and poor uncertainty quantification. In this paper (https://arxiv.org/abs/2211.05357) we propose a new method for adjusting approximate posterior samples to reduce bias and produce more accurate uncertainty quantification. We do this by optimizing a transform of the approximate posterior that maximizes a scoring rule. Our approach requires only a (fixed) small number of complex model simulations and is numerically stable. We demonstrate beneficial corrections to several approximate posteriors using our method on several examples of increasing complexity.
Giacomo Zanella (Bocconi University) - Entropy contraction of the Gibbs sampler under log-concavity
Abstract: In this talk I will present recent work (https://arxiv.org/abs/2410.00858) on the non-asymptotic analysis of the Gibbs sampler, a classical and popular MCMC algorithm for sampling. In particular, under the assumption that the probability measure π of interest is strongly log-concave, we show that the random scan Gibbs sampler contracts in relative entropy, and provide a sharp characterization of the associated contraction rate. The result implies that, under appropriate conditions, the number of full evaluations of π required for the Gibbs sampler to converge is independent of the dimension. If time permits, I will also discuss connections and applications of the above results to the problem of zero-order parallel sampling, as well as extensions to Hit-and-Run and Metropolis-within-Gibbs.
Based on joint work with Filippo Ascolani and Hugo Lavenant.
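For intuition, here is a toy random-scan Gibbs sampler on a strongly log-concave target: a bivariate Gaussian with unit variances and correlation `rho` (an illustrative example, not taken from the paper).

```python
import numpy as np

def random_scan_gibbs(rho, n_iter=5000, seed=1):
    """Random-scan Gibbs sampler for a bivariate Gaussian with unit
    variances and correlation rho (a strongly log-concave target)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(2)
    samples = np.empty((n_iter, 2))
    for t in range(n_iter):
        i = rng.integers(2)  # pick a coordinate uniformly at random
        # Full conditional: x_i | x_j ~ N(rho * x_j, 1 - rho^2).
        x[i] = rho * x[1 - i] + np.sqrt(1.0 - rho**2) * rng.standard_normal()
        samples[t] = x
    return samples

samples = random_scan_gibbs(rho=0.5)
```

Each step only requires the full conditional of one coordinate, which is the sense in which the talk counts "full evaluations" of the target.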
Paul Bastide (Université Paris Cité) - Goodness of Fit for Bayesian Generative Models with Applications in Population Genetics
Abstract: In population genetics, inference about intractable likelihood models is common, and simulation methods, including Approximate Bayesian Computation (ABC) and Simulation-Based Inference (SBI), are essential. ABC/SBI methods work by simulating instrumental data sets of the models under study and comparing them with the observed data set $y_{obs}$. Advanced machine learning tools are used for tasks such as model selection and parameter inference. The present work focuses on model criticism. This type of analysis, called goodness of fit (GoF), is important for model validation. It can also be used for model pruning when the number of candidates to be considered is excessive, especially in the context where data simulation is expensive. We introduce two new GoF tests based on the local outlier factor (LOF), an indicator that was initially defined for outlier and novelty detection. We test whether $y_{obs}$ is distributed from the prior predictive distribution (pre-inference GoF) and whether there is a parameter value such that $y_{obs}$ is distributed from the likelihood with that value (post-inference GoF). We evaluate the performance of our two GoF tests on simulated datasets from three different model settings of varying complexity, and on a dataset of single nucleotide polymorphism (SNP) markers for the evaluation of complex evolutionary scenarios of modern human populations.
Joint work with Guillaume Le Mailloux, Jean-Michel Marin and Arnaud Estoup.
The afternoon session is open to everyone, but please confirm your participation by registering via the following link: https://forms.office.com/e/fuVzYurNRY
Stanislas Strasman (Sorbonne Université) - An analysis of the noise schedule for score-based generative models
Abstract: Score-based generative models (SGMs) aim at estimating a target data distribution by learning score functions using only noise-perturbed samples from the target. Recent literature has focused extensively on assessing the error between the target and estimated distributions, gauging the generative quality through the Kullback-Leibler (KL) divergence and Wasserstein distances. Under mild assumptions on the data distribution, we establish an upper bound for the KL divergence between the target and the estimated distributions, explicitly depending on any time-dependent noise schedule. Under additional regularity assumptions, taking advantage of favorable underlying contraction mechanisms, we provide a tighter error bound in Wasserstein distance compared to state-of-the-art results. In addition to being tractable, this upper bound jointly incorporates properties of the target distribution and SGM hyperparameters that need to be tuned during training.
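As a concrete instance of a time-dependent noise schedule, here is the forward marginal of the variance-preserving diffusion with a linear schedule beta(t) — the standard VP-SDE parametrisation; the particular `beta_min`/`beta_max` values are common illustrative defaults, not the talk's choices.

```python
import numpy as np

def vp_forward_marginal(x0, t, beta_min=0.1, beta_max=20.0):
    """Marginal of the variance-preserving forward diffusion at time
    t in [0, 1] with a linear schedule beta(t): the conditional law is
    x_t | x_0 ~ N(alpha_t * x0, 1 - alpha_t^2)."""
    # alpha_t = exp(-0.5 * int_0^t beta(s) ds), with beta(s) linear in s.
    integral = beta_min * t + 0.5 * (beta_max - beta_min) * t**2
    alpha = np.exp(-0.5 * integral)
    return alpha * x0, 1.0 - alpha**2
```

The KL bound discussed in the talk depends explicitly on such a schedule through $\alpha_t$; at $t = 1$ the marginal is close to a standard Gaussian regardless of $x_0$.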
Geneviève Robin (Owkin) - Generative methods for sampling transition paths in molecular dynamics
Abstract: Molecular systems often remain trapped for long times around some local minimum of the potential energy function, before switching to another one -- a behavior known as metastability. Simulating transition paths linking one metastable state to another one is difficult by direct numerical methods. In view of the promises of machine learning techniques, we explore in this work two approaches to more efficiently generate transition paths: sampling methods based on generative models such as variational autoencoders, and importance sampling methods based on reinforcement learning.
Gabriel Victorino Cardoso (École des Mines) - Solving inverse problems with score-based priors
Abstract: Solving ill-posed (Bayesian) inverse problems generally relies on the power of the prior distribution (or data fidelity term). In this talk, we focus on how to use an off-the-shelf score-based generative model as a prior, and how to modify the inner sampling procedure of the generative model to sample (approximately) from the posterior distribution. This is done without retraining the off-the-shelf generative model. We then present how we have used this procedure to solve inverse problems arising in electrocardiogram analysis.
References:
[1] Gabriel Cardoso, Yazid Janati, Sylvain Le Corff, and Eric Moulines. Monte Carlo guided Denoising Diffusion models for Bayesian linear inverse problems. The Twelfth International Conference on Learning Representations. 2023.
[2] Cardoso, G. V., Bedin, L., Duchateau, J., Dubois, R., & Moulines, E. (2023). Bayesian ECG reconstruction using denoising diffusion generative models. To appear in NeurIPS 2024.
Guanyang Wang (Rutgers University) - MCMC when you do not want to evaluate the target distribution
Abstract: In sampling tasks, it is common for target distributions to be known up to a normalizing constant. However, in many situations, evaluating even the unnormalized distribution can be costly or infeasible. This issue arises in scenarios such as sampling from Bayesian posteriors for large datasets and from 'doubly intractable' distributions. We provide a way to unify various MCMC algorithms, including several minibatch MCMC algorithms and the exchange algorithm. This framework not only simplifies the theoretical analysis of existing algorithms but also creates new algorithms. Similar frameworks exist in the literature, but they concentrate on different objectives.
François Caron (University of Oxford) - Deep Neural Networks with Dependent Weights: Gaussian Process Mixture Limit, Heavy Tails, Sparsity and Compressibility
Abstract: This article studies the infinite-width limit of deep feedforward neural networks whose weights are dependent, and modelled via a mixture of Gaussian distributions. Each hidden node of the network is assigned a nonnegative random variable that controls the variance of the outgoing weights of that node. We make minimal assumptions on these per-node random variables: they are iid and their sum, in each layer, converges to some finite random variable in the infinite-width limit. Under this model, we show that each layer of the infinite-width neural network can be characterised by two simple quantities: a non-negative scalar parameter and a Lévy measure on the positive reals. If the scalar parameters are strictly positive and the Lévy measures are trivial at all hidden layers, then one recovers the classical Gaussian process (GP) limit, obtained with iid Gaussian weights. More interestingly, if the Lévy measure of at least one layer is non-trivial, we obtain a mixture of Gaussian processes (MoGP) in the large-width limit. The behaviour of the neural network in this regime is very different from the GP regime. One obtains correlated outputs, with non-Gaussian distributions, possibly with heavy tails. Additionally, we show that, in this regime, the weights are compressible, and some nodes have asymptotically non-negligible contributions, therefore representing important hidden features. Many sparsity-promoting neural network models can be recast as special cases of our approach, and we discuss their infinite-width limits; we also present an asymptotic analysis of the pruning error. We illustrate some of the benefits of the MoGP regime over the GP regime in terms of representation learning and compressibility on simulated, MNIST and Fashion MNIST datasets.
Elisabeth Gassiat (Université Paris-Saclay) - A stroll through hidden Markov models
Abstract: Hidden Markov models are latent variables models producing dependent sequences. I will survey recent results providing guarantees for their use in various fields such as clustering, multiple testing, nonlinear ICA or variational autoencoders.
Ritabrata Dutta (University of Warwick) - Bayesian Model Averaging with exact inference of likelihood-free Scoring Rule Posteriors
Abstract: A novel application of Bayesian Model Averaging to generative models parameterized with neural networks (GNN) characterized by intractable likelihoods is presented. We leverage a likelihood-free generalized Bayesian inference approach with Scoring Rules. To tackle the challenge of model selection in neural networks, we adopt a continuous shrinkage prior, specifically the horseshoe prior. We introduce an innovative blocked sampling scheme, offering compatibility with both the Boomerang Sampler (a type of piecewise deterministic Markov process sampler) for exact but slower inference and with Stochastic Gradient Langevin Dynamics (SGLD) for faster yet biased posterior inference. This approach serves as a versatile tool bridging the gap between intractable likelihoods and robust Bayesian model selection within the generative modelling framework.
Sylvain Le Corff (Sorbonne Université) - Monte Carlo guided Diffusion for Bayesian linear inverse problems
Joint work with G. Cardoso, Y. Janati, E. Moulines.
Abstract: Ill-posed linear inverse problems that combine knowledge of the forward measurement model with prior models arise frequently in various applications, from computational photography to medical imaging. Recent research has focused on solving these problems with score-based generative models (SGMs) that produce perceptually plausible images, especially in inpainting problems. In this study, we exploit the particular structure of the prior defined in the SGM to formulate recovery in a Bayesian framework as a Feynman--Kac model adapted from the forward diffusion model used to construct score-based diffusion. To solve this Feynman--Kac problem, we propose the use of Sequential Monte Carlo methods. The proposed algorithm, MCGdiff, is shown to be theoretically grounded and we provide numerical simulations showing that it outperforms competing baselines when dealing with ill-posed inverse problems.
Kaniav Kamary (CentraleSupélec) - Bayesian principal component analysis
The technique of principal component analysis (PCA) has recently been expressed as the maximum likelihood solution for a generative latent variable model. In this talk, I'll first present the probabilistic reformulation that forms the basis for a Bayesian treatment of PCA. Then, my focus will be on showing that the effective dimensionality of the latent space (equivalent to the number of retained principal components) can be determined automatically as part of the Bayesian inference procedure.
Meïli Baragatti (ENSAE) - A stroll through Bayesian statistics: an elicitation method, a likelihood-free method, and a simple example in an epidemiological disease-transmission modelling setting
Webpage: http://www.meilibaragatti.fr
Francesca Crucinio (ENSAE) - Optimal Scaling Results for a Wide Class of Proximal MALA Algorithms
We consider a recently proposed class of MCMC methods which uses proximity maps instead of gradients to build proposal mechanisms that can be employed for both differentiable and non-differentiable targets. These methods have been shown to be stable for a wide class of targets, making them a valuable alternative to the Metropolis-adjusted Langevin algorithm (MALA), and have found wide application in imaging contexts. The wider stability properties are obtained by building the Moreau-Yosida envelope of the target of interest, which depends on a parameter $\lambda$. In this work, we investigate the optimal scaling problem for this class of algorithms, which encompasses MALA, and provide practical guidelines for the implementation of these methods.
Joint work with Alain Durmus, Pablo Jiménez, Gareth O. Roberts.
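The Moreau-Yosida construction is easy to illustrate in one dimension for the non-differentiable term $|x|$: its proximity map is soft thresholding, and the envelope's gradient is $\lambda^{-1}(x - \mathrm{prox}(x))$, which is Lipschitz. A sketch (illustrative, not the authors' code):

```python
import numpy as np

def prox_l1(x, lam):
    """Proximity map of lam * |x|: soft thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def moreau_grad(x, lam):
    """Gradient of the Moreau-Yosida envelope of |x| with parameter
    lam, namely (x - prox(x)) / lam; it is (1/lam)-Lipschitz even
    though |x| is not differentiable at the origin."""
    return (x - prox_l1(x, lam)) / lam
```

In proximal MALA, this smoothed gradient stands in for the (possibly undefined) gradient of the non-differentiable part when building the Langevin proposal; the talk's optimal scaling results inform how the step size and $\lambda$ should be tuned jointly.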
Adrian Raftery (University of Washington) - Very Long-Term Bayesian Global Population and Migration Projections for Assessing the Social Cost of Carbon
Population forecasts are used by governments and the private sector for planning purposes, with horizons up to about three generations (to around 2100). The traditional methods are deterministic and scenario-based, but probabilistic forecasts are desired to get an idea of accuracy, to assess changes, and to make decisions involving risks. In a major breakthrough, since 2015 the United Nations has issued probabilistic population forecasts for all countries using a Bayesian methodology. Assessment of the social cost of carbon relies on long-term forecasts of carbon emissions, which in turn rely on even longer-range population and economic forecasts, to 2300. We extend the UN method to very-long-range population forecasts by combining the statistical approach with expert review and elicitation. We find that, while world population is projected to grow for most of the rest of this century, it is likely to stabilize in the 22nd century and to decline in the 23rd century.
Daniele Durante (Bocconi University) - Detective Bayes: Bayesian nonparametric stochastic block modeling of criminal networks
Europol recently defined criminal networks as a modern version of the Hydra mythological creature, with covert structure and multifaceted evolutions. Indeed, relationship data among criminals are subject to measurement errors and structured missingness patterns, and exhibit a complex combination of an unknown number of core-periphery, assortative and disassortative structures that may encode key architectures of the criminal organization. The coexistence of these noisy block patterns limits the reliability of community detection algorithms routinely used in criminology, thereby leading to overly simplified and possibly biased reconstructions of organized crime topologies. In this seminar, I will present a number of model-based solutions which aim at covering these gaps via a combination of stochastic block models and priors for random partitions arising from Bayesian nonparametrics. These include Gibbs-type priors, and random partition priors driven by the urn scheme of a hierarchical normalized completely random measure. Product-partition models to incorporate criminals' attributes, and zero-inflated Poisson representations accounting for weighted edges and secrecy strategies, will also be discussed. Collapsed Gibbs samplers for posterior computation will be presented, and refined strategies for estimation, prediction, uncertainty quantification and model selection will be outlined. Results are illustrated in an application to an Italian Mafia network, where the proposed models unveil a structure of the criminal organization mostly hidden to state-of-the-art alternatives routinely used in criminology. I will conclude the seminar with ideas on how to learn the evolutionary history of the criminal organization from the relationship data among its criminals via a novel combination of latent space models for network data and phylogenetic trees.
Marylou Gabrié (Ecole Polytechnique) - Opportunities and Challenges in Enhancing Sampling with Learning
Deep generative models parametrize very flexible families of distributions able to fit complicated datasets of images or text. In effect, these models provide independent samples from complex high-dimensional distributions at negligible cost. On the other hand, sampling exactly from a target distribution, such as a Bayesian posterior, is typically challenging: because of dimensionality, multi-modality, ill-conditioning, or a combination of these. In this talk, I will review recent works that enhance traditional inference and sampling algorithms with learning. In particular, I will present flowMC, an adaptive MCMC method with Normalizing Flows, along with first applications and remaining challenges.
Webpage: https://marylou-gabrie.github.io/
Filippo Ascolani (Bocconi University) - Clustering consistency with Dirichlet process mixtures
Dirichlet process mixtures are flexible non-parametric models, particularly suited to density estimation and probabilistic clustering. In this work we study the posterior distribution induced by Dirichlet process mixtures as the sample size increases, and more specifically focus on consistency for the unknown number of clusters when the observed data are generated from a finite mixture. Crucially, we consider the situation where a prior is placed on the concentration parameter of the underlying Dirichlet process. Previous findings in the literature suggest that Dirichlet process mixtures are typically not consistent for the number of clusters if the concentration parameter is held fixed and data come from a finite mixture. Here we show that consistency for the number of clusters can be achieved if the concentration parameter is adapted in a fully Bayesian way, as commonly done in practice. Our results are derived for data coming from a class of finite mixtures, with mild assumptions on the prior for the concentration parameter and for a variety of choices of likelihood kernels for the mixture.
Joint work with Antonio Lijoi, Giovanni Rebaudo, and Giacomo Zanella.
Reference: https://arxiv.org/abs/2205.12924 (Biometrika, forthcoming)
Webpage: https://filippoascolani.github.io/
Andrew Gelman (Columbia University) - Prior distribution for causal inference
In Bayesian inference, we must specify a model for the data (a likelihood) and a model for parameters (a prior). Consider two questions:
Why is it more complicated to specify the likelihood than the prior?
In order to specify the prior, how can we switch between the theoretical literature (invariance, normality assumptions, ...) and the applied literature (expert elicitation, robustness, ...)?
I will discuss these questions in the domain of causal inference: prior distributions for causal effects, regression coefficients, and the other parameters in causal models.
Alexandre Bouchard-Côté (University of British Columbia) - Approximation of intractable integrals using non-reversibility and non-linear distribution paths
In the first part of the talk, I will present an adaptive, non-reversible Parallel Tempering (PT) scheme allowing MCMC exploration of challenging problems such as single-cell phylogenetic trees. A sharp divide emerges in the behaviour and performance of reversible versus non-reversible PT schemes: the performance of the former eventually collapses as the number of parallel cores increases, whereas the latter benefits from arbitrarily many available parallel cores. These theoretical results are exploited to develop an adaptive scheme to efficiently optimize over annealing schedules.
In the second half, I will talk about the global communication barrier, a fundamental limit shared by both reversible and non-reversible PT methods, and about our recent work that leverages non-linear annealing paths to provably and practically break that barrier.
My group is also interested in making these advanced non-reversible Monte Carlo methods easily available to data scientists. To do so, we have designed a Bayesian modelling language to perform inference over arbitrary data types using non-reversible, highly parallel algorithms.
References:
Non-Reversible Parallel Tempering: a Scalable Highly Parallel MCMC Scheme (2021). S. Syed, A. Bouchard-Côté, G. Deligiannidis, A. Doucet. Journal of the Royal Statistical Society, Series B. https://rss.onlinelibrary.wiley.com/doi/10.1111/rssb.12464
Parallel Tempering on Optimized Paths (2021). S. Syed, V. Romaniello, T. Campbell, A. Bouchard-Côté. International Conference on Machine Learning (ICML). http://proceedings.mlr.press/v139/syed21a/syed21a.pdf
Software: Blang: Probabilistic Programming for Combinatorial Spaces. A. Bouchard-Côté, K. Chern, D. Cubranic, S. Hosseini, J. Hume, M. Lepur, Z. Ouyang, G. Sgarbi. Journal of Statistical Software (Accepted). https://arxiv.org/abs/1912.10396, https://www.stat.ubc.ca/~bouchard/blang/
Webpage: https://www.stat.ubc.ca/~bouchard/index.html
Arnaud Guyader (LPSM, Sorbonne Université) - On the Asymptotic Normality of Adaptive Multilevel Splitting
Adaptive Multilevel Splitting (AMS) is a Sequential Monte Carlo method for Markov processes that simulates rare events and estimates associated probabilities. Despite its practical efficiency, there are almost no theoretical results on the convergence of this algorithm. The purpose of this talk is to prove both consistency and asymptotic normality results in a general setting. This is done by associating to the original Markov process a level-indexed process, also called a stochastic wave, and by showing that AMS can then be seen as a Fleming-Viot type particle system. This is a joint work with Frédéric Cérou, Bernard Delyon, and Mathias Rousset.
Webpage: https://www.lpsm.paris/pageperso/guyader/index.html
Estelle Kuhn (INRAE, Unité MaIAGE) - Properties of the stochastic approximation EM algorithm with mini-batch sampling
To deal with very large datasets, a mini-batch version of the Markov chain Monte Carlo Stochastic Approximation Expectation-Maximization (MCMC-SAEM) algorithm for general latent variable models is proposed. For exponential family models, the algorithm is shown to be convergent under classical conditions as the number of iterations increases. Numerical experiments illustrate the performance of the mini-batch algorithm in various models. In particular, we highlight that mini-batch sampling results in an important speed-up of the convergence of the sequence of estimators generated by the algorithm. Moreover, insights on the effect of the mini-batch size on the limit distribution are presented. Finally, we illustrate how to use mini-batch sampling in practice to improve results when a constraint on the computing time is given.
Reference: Journal version, ArXiv version
Webpage: http://genome.jouy.inra.fr/~ekuhn/
Julyan Arbel (INRIA Grenoble) - Understanding Priors in Bayesian Neural Networks at the Unit Level
We investigate deep Bayesian neural networks with Gaussian weight priors and a class of ReLU-like nonlinearities. Bayesian neural networks with Gaussian priors are well known to induce an L2, “weight decay”, regularization. Our results characterize a more intricate regularization effect at the level of the unit activations. Our main result establishes that the induced prior distribution on the units before and after activation becomes increasingly heavy-tailed with the depth of the layer. We show that first layer units are Gaussian, second layer units are sub-exponential, and units in deeper layers are characterized by sub-Weibull distributions. Our results provide new theoretical insight on deep Bayesian neural networks, which we corroborate with simulation experiments.
Webpage: https://www.julyanarbel.com/
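The heavier-tails-with-depth phenomenon can be eyeballed with a short simulation — an illustrative sketch with a scalar input, hidden width 50 and He-scaled iid Gaussian weights, none of which is the paper's code. First-layer pre-activations are exactly Gaussian, while second-layer pre-activations are a Gaussian scale mixture and hence show visibly larger kurtosis.

```python
import numpy as np

def kurtosis(z):
    z = (z - z.mean()) / z.std()
    return (z**4).mean()  # equals 3 for a Gaussian

rng = np.random.default_rng(0)
M, n = 100_000, 50  # number of sampled networks, hidden width

# Layer-1 pre-activations (scalar input x = 1): exactly Gaussian.
z1 = np.sqrt(2.0) * rng.standard_normal((M, n))
# Layer-2 pre-activations: He-scaled Gaussian weights applied to ReLU
# units, i.e. a Gaussian scale mixture, hence heavier-tailed.
z2 = np.sqrt(2.0 / n) * (rng.standard_normal((M, n)) * np.maximum(z1, 0.0)).sum(axis=1)

k1, k2 = kurtosis(z1[:, 0]), kurtosis(z2)  # k2 exceeds k1, which is near 3
```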
Pierre E. Jacob (Harvard University) - Unbiased MCMC with couplings
MCMC methods yield estimators that converge to integrals of interest in the limit of the number of iterations. This iterative asymptotic justification is not ideal: first, it stands at odds with current trends in computing hardware, with increasingly parallel architectures; second, the choice of "burn-in" or "warm-up" is arduous. This talk will describe recently proposed estimators that are unbiased for the expectations of interest while having a finite computing cost and a finite variance. They can thus be generated independently in parallel and averaged over. The method also provides practical upper bounds on the distance (e.g. total variation) between the marginal distribution of the chain at a finite step and its invariant distribution. The key idea is to generate "faithful" couplings of Markov chains, whereby pairs of chains coalesce after a random number of iterations. This talk will provide an overview of this line of research.
Reference: https://arxiv.org/abs/1708.03625. Code in R available at: https://github.com/pierrejacob/unbiasedmcmc.
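The coupling ingredient can be sketched compactly. Below is the standard rejection-based maximal coupling of two distributions p and q, which returns a pair (X, Y) with the correct marginals and maximal meeting probability; the Gaussian example is illustrative, whereas in the talk's setting p and q would be transition kernels of the two chains.

```python
import numpy as np

def maximal_coupling(rng, p_sample, p_logpdf, q_sample, q_logpdf):
    """Rejection sampler for a maximal coupling of p and q: returns
    (X, Y) with X ~ p, Y ~ q and P(X = Y) = 1 - TV(p, q)."""
    x = p_sample(rng)
    if np.log(rng.random()) <= q_logpdf(x) - p_logpdf(x):
        return x, x  # the two chains would coalesce here
    while True:
        y = q_sample(rng)
        if np.log(rng.random()) > p_logpdf(y) - q_logpdf(y):
            return x, y

# Illustration with p = N(0, 1) and q = N(1, 1); the scales match, so
# normalising constants cancel in the log-density ratios.
rng = np.random.default_rng(0)
lp = lambda z: -0.5 * z**2
lq = lambda z: -0.5 * (z - 1.0) ** 2
draws = [maximal_coupling(rng, lambda r: r.standard_normal(), lp,
                          lambda r: 1.0 + r.standard_normal(), lq)
         for _ in range(5000)]
meet_rate = np.mean([x == y for x, y in draws])  # about 1 - TV, here 0.62
```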
Scott Sisson (UNSW) - Approximate posteriors and data for Bayesian inference
For various reasons, including large datasets and complex models, approximate inference is becoming increasingly common. In this talk I'll provide three vignettes of recent work. These cover
approximate Bayesian computation for Gaussian process density estimation
likelihood-free Gibbs sampling
MCMC for approximate (rounded) data.
François Portier (Télécom Paris) - On adaptive importance sampling: theory and methods
Adaptive importance sampling (AIS) uses past samples to update the sampling policy $q_t$ at each stage $t$. Each stage $t$ consists of two steps:
to explore the space with $n_t$ points drawn according to $q_t$;
to exploit the current amount of information to update the sampling policy.
In this talk, I will present different AIS methods and show that they are optimal in some sense.
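The two steps above can be sketched with a Gaussian sampling policy whose mean is re-estimated from self-normalised importance weights at each stage (an illustrative scheme under simplifying assumptions — identity covariance, fixed stage size — not one of the specific methods of the talk):

```python
import numpy as np

def adaptive_importance_sampling(log_target, mu0, n_stages=20, n_per_stage=500, seed=0):
    """AIS with policy q_t = N(mu_t, I): explore by sampling from q_t,
    then exploit the weighted sample to update mu_t."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu0, dtype=float)
    for _ in range(n_stages):
        x = mu + rng.standard_normal((n_per_stage, mu.size))        # explore
        log_w = log_target(x) + 0.5 * ((x - mu) ** 2).sum(axis=1)   # target / q_t
        w = np.exp(log_w - log_w.max())                             # stabilised weights
        mu = (w[:, None] * x).sum(axis=0) / w.sum()                 # exploit
    return mu

# Illustrative target N((3, -2), I): the policy mean tracks the target mean.
log_target = lambda x: -0.5 * ((x - np.array([3.0, -2.0])) ** 2).sum(axis=1)
mu_hat = adaptive_importance_sampling(log_target, np.zeros(2))
```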
Grégoire Clarté - Component-wise approximate Bayesian computation via Gibbs-like steps
Approximate Bayesian computation methods are useful for generative models with intractable likelihoods. These methods are however sensitive to the dimension of the parameter space, requiring exponentially increasing resources as this dimension grows. To tackle this difficulty, we explore a Gibbs version of the ABC approach that runs component-wise approximate Bayesian computation steps aimed at the corresponding conditional posterior distributions, and based on summary statistics of reduced dimensions. While lacking the standard justifications for the Gibbs sampler, the resulting Markov chain is shown to converge in distribution under some partial independence conditions. The associated stationary distribution can further be shown to be close to the true posterior distribution and some hierarchical versions of the proposed mechanism enjoy a closed form limiting distribution. Experiments also demonstrate the gain in efficiency brought by the Gibbs version over the standard solution.
Reference: https://arxiv.org/abs/1905.13599
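A toy version of the component-wise scheme: a two-parameter model in which each Gibbs-like step runs a small rejection-ABC update of one component given the other. This is illustrative only (the paper's setting involves summary statistics of reduced dimension); here the raw observation serves as the summary.

```python
import numpy as np

def abc_gibbs(y_obs, n_iter=2000, eps=0.2, seed=0):
    """Toy ABC within Gibbs for theta1, theta2 ~ N(0, 1) iid and
    y | theta ~ N(theta1 + theta2, 1). Each step refreshes one
    component by rejection ABC against the observed data."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)
    out = np.empty((n_iter, 2))
    for t in range(n_iter):
        for i in range(2):
            while True:
                prop = rng.standard_normal()  # draw from the prior
                y_sim = prop + theta[1 - i] + rng.standard_normal()
                if abs(y_sim - y_obs) < eps:  # ABC acceptance test
                    theta[i] = prop
                    break
        out[t] = theta
    return out

# With y_obs = 1, the exact posterior mean of each component is 1/3.
samples = abc_gibbs(y_obs=1.0)
```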