We present a supervised learning framework for training generative models for density estimation. Generative models, including generative adversarial networks, normalizing flows, and variational auto-encoders, are usually regarded as unsupervised learning models because labeled data are usually unavailable for training. Despite the success of generative models, unsupervised training has several issues, e.g., the requirement of reversible architectures, vanishing gradients, and training instability.
To enable supervised learning in generative models, we use a score-based diffusion model to generate labeled data. Unlike existing diffusion models that train neural networks to learn the score function, we develop a training-free score estimation method. This approach uses mini-batch-based Monte Carlo estimators to directly approximate the score function at any spatial-temporal location when solving the ordinary differential equation (ODE) that corresponds to the reverse-time stochastic differential equation (SDE). This approach can offer both high accuracy and substantial time savings in neural network training. Once the labeled data are generated, we can train a simple fully connected neural network to learn the generative model in a supervised manner.
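As a minimal sketch of this data-generation step, the code below estimates the score with a mini-batch Monte Carlo average and uses it inside an explicit Euler solver for the probability-flow ODE. Several choices here are assumptions made for illustration and are not taken from the paper: the variance-preserving linear noise schedule (`BETA_MIN`, `BETA_MAX`), the Euler discretization, the cutoff `t_min` that avoids the singular score at t=0, the batch size, and the hypothetical function names `mc_score` and `generate_labeled_data`.

```python
import numpy as np

BETA_MIN, BETA_MAX = 0.1, 20.0  # assumed linear noise schedule (not from the paper)

def beta(t):
    return BETA_MIN + (BETA_MAX - BETA_MIN) * t

def alpha_sigma(t):
    """Scale and std of the forward kernel Z_t | Z_0 ~ N(alpha * Z_0, sigma^2 I)."""
    alpha = np.exp(-0.5 * (BETA_MIN * t + 0.5 * (BETA_MAX - BETA_MIN) * t**2))
    return alpha, np.sqrt(1.0 - alpha**2)

def mc_score(z, t, data, rng, batch_size=256):
    """Mini-batch Monte Carlo estimate of the score at points z (M, d) and time t."""
    idx = rng.choice(len(data), size=min(batch_size, len(data)), replace=False)
    x = data[idx]                                       # (N, d) mini-batch of target samples
    alpha, sigma = alpha_sigma(t)
    diff = z[:, None, :] - alpha * x[None, :, :]        # (M, N, d)
    logw = -0.5 * np.sum(diff**2, axis=-1) / sigma**2   # log Gaussian kernel weights
    logw -= logw.max(axis=1, keepdims=True)             # stabilize the exponentials
    w = np.exp(logw)
    w /= w.sum(axis=1, keepdims=True)                   # normalized weights per point
    return -np.einsum("mn,mnd->md", w, diff) / sigma**2

def generate_labeled_data(data, n_pairs, rng, n_steps=200, t_min=1e-3):
    """Return (y, z): Gaussian inputs at t=1 and their ODE images near t=0."""
    y = rng.standard_normal((n_pairs, data.shape[1]))
    z = y.copy()
    ts = np.linspace(1.0, t_min, n_steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        dt = t_next - t_cur                             # negative: integrating backward in time
        drift = -0.5 * beta(t_cur) * (z + mc_score(z, t_cur, data, rng))
        z = z + drift * dt
    return y, z
```

Calling `generate_labeled_data(samples, 1000, np.random.default_rng(0))` on an array `samples` of shape `(n, d)` would return input-output pairs that a regression network can be fitted to. No neural network is trained to obtain the score, which is the point of the training-free estimator.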
Compared with existing normalizing flow models, our method does not require reversible neural networks and avoids computing the Jacobian matrix. Compared with existing diffusion models, our method does not need to solve the reverse-time SDE to generate new samples. As a result, the sampling efficiency is significantly improved. We demonstrate the performance of our method on a set of 2D datasets as well as real data from the UCI repository.
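To illustrate why sampling becomes cheap, the sketch below fits a small fully connected network to input-output pairs by plain mean-squared-error regression, after which a new sample costs one forward pass. It is only an illustration under stated assumptions: the helper names (`init_mlp`, `forward`, `train`), the tanh activations, the layer sizes, and full-batch gradient descent are all hypothetical choices, not the architecture or optimizer used in the paper.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Random weights for a fully connected network with the given layer sizes."""
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, y):
    """Return the network output and all layer activations (tanh hidden layers)."""
    acts = [y]
    for i, (W, b) in enumerate(params):
        h = acts[-1] @ W + b
        acts.append(h if i == len(params) - 1 else np.tanh(h))
    return acts[-1], acts

def train(params, y, z, lr=0.01, n_epochs=3000):
    """Full-batch gradient descent on the mean-squared error between F(y) and z."""
    for _ in range(n_epochs):
        out, acts = forward(params, y)
        grad = 2.0 * (out - z) / len(y)            # gradient of the loss w.r.t. the output
        for i in reversed(range(len(params))):
            W, b = params[i]
            gW, gb = acts[i].T @ grad, grad.sum(axis=0)
            grad = grad @ W.T                      # backpropagate through the linear map
            if i > 0:
                grad = grad * (1.0 - acts[i] ** 2) # derivative of tanh
            params[i] = (W - lr * gW, b - lr * gb)
    return params
```

Once trained on pairs `(y, z)`, drawing new samples amounts to `forward(params, rng.standard_normal((n, d)))[0]`. The network is an ordinary feed-forward map, so nothing in it needs to be invertible and no Jacobian determinant is evaluated.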
Figure: Why the ODE model, rather than the SDE model, can be used to generate the labeled data for supervised learning of the generator. Although both the ODE model (top) and the SDE model (bottom) map the standard Gaussian distribution at T=1 to the target distribution at T=0, the relationship between the state at T=1 and the state at T=0 is completely different for the two models. For the SDE model, the relationship between the states at T=1 and T=0 is purely random (bottom right plot) because of the stochastic transport. This makes it infeasible to use the reverse SDE to generate labeled data, because a NN would have to learn such randomness. In comparison, the ODE model defines a very smooth function between the states at T=1 and T=0 (top right plot). This smooth relationship suggests that the ODE model can be reliably used to generate the labeled data for supervised learning of the generative model F.
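The contrast described in the caption can be reproduced in a toy setting where the score is known in closed form. The sketch below assumes a 1D Gaussian target and a variance-preserving linear noise schedule, both chosen only for illustration (`M0`, `S0`, `BETA_MIN`, `BETA_MAX`, and `integrate_backward` are hypothetical names), and integrates the same starting points backward with the probability-flow ODE and with an Euler-Maruyama discretization of the reverse-time SDE.

```python
import numpy as np

BETA_MIN, BETA_MAX = 0.1, 20.0   # assumed linear noise schedule
M0, S0 = 2.0, 0.5                # assumed 1D Gaussian target N(M0, S0^2)

def beta(t):
    return BETA_MIN + (BETA_MAX - BETA_MIN) * t

def score(z, t):
    """Exact score of the marginal p_t = N(alpha*M0, alpha^2*S0^2 + 1 - alpha^2)."""
    alpha = np.exp(-0.5 * (BETA_MIN * t + 0.5 * (BETA_MAX - BETA_MIN) * t**2))
    var = alpha**2 * S0**2 + 1.0 - alpha**2
    return -(z - alpha * M0) / var

def integrate_backward(y, rng=None, n_steps=1000):
    """Euler steps from t=1 to t=0; pass rng to add the reverse-SDE noise term."""
    z = y.copy()
    ts = np.linspace(1.0, 0.0, n_steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        dt = t_next - t_cur                       # negative step
        b = beta(t_cur)
        if rng is None:                           # probability-flow ODE
            z = z + (-0.5 * b * z - 0.5 * b * score(z, t_cur)) * dt
        else:                                     # reverse-time SDE (Euler-Maruyama)
            noise = np.sqrt(b * -dt) * rng.standard_normal(z.shape)
            z = z + (-0.5 * b * z - b * score(z, t_cur)) * dt + noise
    return z

rng = np.random.default_rng(0)
y = rng.standard_normal(5000)                     # states at T=1
z_ode = integrate_backward(y)                     # deterministic images at T=0
z_sde = integrate_backward(y, rng=rng)            # stochastic images at T=0
corr_ode = np.corrcoef(y, z_ode)[0, 1]
corr_sde = np.corrcoef(y, z_sde)[0, 1]
print(f"corr(y, z) with ODE: {corr_ode:.3f}, with SDE: {corr_sde:.3f}")
```

In this Gaussian case the ODE map is affine, so the correlation between the two endpoints should be essentially one, while the injected noise in the SDE should leave the endpoints nearly uncorrelated. Both sets of endpoints nevertheless follow approximately the same target distribution, which is why only the ODE pairs are usable as regression labels.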
Y. Liu, M. Yang, Z. Zhang, F. Bao, Y. Cao, G. Zhang, Diffusion-model-assisted supervised learning of generative models for density estimation, Journal of Machine Learning for Modeling and Computing, 5(1), pp. 25-38, 2024.
D. Lu, Y. Liu, Z. Zhang, F. Bao, G. Zhang, A diffusion-based uncertainty quantification method to advance E3SM model calibration, Journal of Geophysical Research: Machine Learning and Computation, 1, e2024JH000234, 2024.