Bayesian Analysis of Dependent Non-Gaussian Data
The Conjugate Multivariate Distribution:
Suppose Z is distributed according to the natural exponential family, so that
f(Z|Y) = exp{ZY - b ψ(Y) + c(Z)},
where f denotes a generic probability density function/probability mass function (pdf/pmf). The function bψ(Y) is often called the log partition function, and exp{c(Z)} is a normalizing constant. It follows from Diaconis and Ylvisaker (1979) that the conjugate prior distribution for Y is given by,
f(Y|a,k) = K(a,k) exp{a Y - k ψ(Y)},
where K(a, k) is a normalizing constant. Let DY(a; k; ψ) denote a shorthand for the pdf in the previous equation. Here “DY” stands for “Diaconis-Ylvisaker.” It is immediate from the previous two equations that Y|Z ~ DY(a + Z; k + b; ψ). This conjugacy motivates the development of a multivariate version of the DY random variable to model dependent non-Gaussian data from the natural exponential family. Specifically, we let,
Z = mu + Vw,
where Z, mu, and w are n-dimensional random vectors, mu is unknown, w consists of iid DY random variables, and V is an n by n real-valued matrix representing the Cholesky decomposition of a covariance matrix. In our work we show that conditional conjugacy exists in this multivariate setting as well. This allows one to improve, in terms of computation and predictive performance, upon the typical latent Gaussian process modeling strategy.
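As a rough illustration of the two ideas above, the following Python sketch works through the Poisson case, where b = 1 and ψ(Y) = exp(Y), so that a DY(a; k; ψ) draw is the log of a Gamma(shape a, rate k) draw. The hyperparameter values, the observed count, the exponential-decay covariance matrix, and the names `sample_dy_poisson` and `z_vec` are all assumptions chosen for illustration; mu is fixed at zero only to keep the example short. This is not the implementation used in the articles below.

```python
import numpy as np

def sample_dy_poisson(a, k, size, rng):
    """Draw from DY(a; k; psi) with psi(Y) = exp(Y), i.e. exp(Y) ~ Gamma(shape=a, rate=k)."""
    return np.log(rng.gamma(shape=a, scale=1.0 / k, size=size))

rng = np.random.default_rng(0)

# Univariate conjugate update: Y | Z ~ DY(a + Z; k + b; psi).
a, k = 2.0, 1.0          # hypothetical prior hyperparameters
z_obs, b = 7, 1.0        # one observed Poisson count; b = 1 in the Poisson case
a_post, k_post = a + z_obs, k + b
y_draws = sample_dy_poisson(a_post, k_post, size=200_000, rng=rng)
post_mean_rate = np.exp(y_draws).mean()   # Monte Carlo estimate of E[exp(Y) | Z]

# Multivariate construction: mu + V w, with iid DY entries in w.
n = 5
idx = np.arange(n)
cov = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 2.0)  # hypothetical covariance
V = np.linalg.cholesky(cov)   # lower-triangular factor, V @ V.T equals cov
mu = np.zeros(n)              # fixed at zero here purely for illustration
w = sample_dy_poisson(a, k, size=n, rng=rng)
z_vec = mu + V @ w            # one draw of the dependent non-Gaussian vector
```

Under these assumptions the posterior is available in closed form, so no MCMC tuning is needed for the univariate update; the Monte Carlo estimate `post_mean_rate` should be close to (a + Z)/(k + b) = 4.5.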
Articles:
Bradley, JR, and Clinch, M (2023). Generating Independent Replicates Directly from the Posterior Distribution for a Class of Spatial Latent Gaussian Process Models. arXiv preprint: https://arxiv.org/abs/2203.10028
Bradley, JR, Holan, SH, and Wikle, CK (2020). Bayesian Hierarchical Models with Conjugate Full-Conditional Distributions for Dependent Data from the Natural Exponential Family. Journal of the American Statistical Association.
Heli, G, and Bradley, JR (2019). Bayesian Analysis of Areal Data with Unknown Adjacencies Using the Stochastic Edge Mixed Effects Model. Spatial Statistics.
Bradley, JR, Wikle, CK, and Holan SH. (2019). Spatio-temporal Models for Big Multinomial Data using the Conditional Multivariate Logit Beta Distribution. Journal of Time Series Analysis.
Bradley, JR, Holan, SH, and Wikle, CK (2018). Computationally Efficient Distribution Theory for Bayesian Inference of High-Dimensional Dependent Count-Valued Data (with discussion). Bayesian Analysis. 13: 253 - 310. (Rejoinder: pp. 302 - 310).
Hu, G, and Bradley, JR (2018). A Bayesian spatial-temporal model with latent multivariate log-gamma random effects with application to earthquake magnitudes. Stat.
Data:
Quarterly Workforce Indicators can be downloaded from the Longitudinal Employer-Household Dynamics (LEHD) program: http://ledextract.ces.census.gov/
Bayesian Models with Unknown Transformations:
Let Zij denote the observed data for i = 1,..., nj and j = 1, 2, 3. We consider the setting where for each i, Zi1 is continuous-valued, Zi2 is integer-valued taking values in 0,..., bi, and Zi3 is binary. It is assumed that Zi1, Zi2, and Zi3 are distributed according to different classes of probability density functions/probability mass functions (henceforth referred to as multiple-type responses). One classical strategy to model data of this type is to impose a transformation,
hj(Zij) | Yij, θ ~ Dist(Yij, θ), i = 1,…, nj, j = 1, 2, 3,
where hj is a transformation of the datum Zij, “Dist” is a shorthand for a probability density function (pdf), gj{E(Zij)} = Yij with gj known as a link function, and θ is a real-valued parameter vector. Additionally, Yij is defined for i = 1,…, n, j = 1, 2, 3. Here, “Dist(Yij, θ)” represents any preferred model for continuous data, and inference on Yij and θ is the primary goal.
Drop the functional notation for hj and write hij = hj(Zij). These transformations convert a multiple response type data set (e.g., { Zij }) to a single response type data set (e.g., { hij }), since hij follows a single distribution with a continuous support.
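To make the idea of a known transformation concrete, the Python sketch below maps the three response types to a single continuous-valued array. The specific choices of hj (identity for continuous data, an empirical logit with a 0.5 continuity correction for bounded counts and for binary data), the function names, and the toy data are all assumptions made for illustration and are not taken from the articles below.

```python
import numpy as np

def h_continuous(z):
    """Identity transformation for continuous-valued responses."""
    return np.asarray(z, dtype=float)

def h_bounded_count(z, b, eps=0.5):
    """Empirical logit for counts in 0, ..., b; eps keeps the result finite at 0 and b."""
    z = np.asarray(z, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.log((z + eps) / (b - z + eps))

def h_binary(z, eps=0.5):
    """Binary data treated as a bounded count with b = 1."""
    return h_bounded_count(z, b=1, eps=eps)

# Hypothetical toy data for the three response types.
z1 = [1.3, -0.2]            # continuous
z2, b2 = [0, 3], [4, 4]     # integer-valued in 0, ..., b
z3 = [0, 1]                 # binary

# A single continuous-valued data set {hij} built from the three types.
h = np.concatenate([h_continuous(z1), h_bounded_count(z2, b2), h_binary(z3)])
```

Any such fixed choice of hj is somewhat arbitrary (for example, the value of `eps`), which is one way to see why treating the transformation as unknown can be attractive.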
We introduce a Bayesian solution to the problem of an unknown transformation. In particular, we define pdfs and probability mass functions (pmf), f(Zij | hij) and f(hij).
Articles:
Bradley, JR, Zhou, S, and Liu, X. (2023). Deep Hierarchical Generalized Transformation Models for Spatio-Temporal Data with Discrepancy Errors. Spatial Statistics. 55: 100749.
Bradley, JR (2022). Joint Bayesian Analysis of Multiple Response-Types Using the Hierarchical Generalized Transformation Model. Bayesian Analysis.
Yang, H.-C., and Bradley, JR (2022). Bayesian Inference for Spatial Count Data that May be Over-Dispersed or Under-Dispersed with Application to the 2016 US Presidential Election. Journal of Data Science.
S. Nandy, S. H. Holan, J. R. Bradley, C. K. Wikle (2022). Bayesian Hierarchical Models For Multi-type Survey Data Using Spatially Correlated Covariates Measured With Error. arXiv:2211.09797
Data: