Spatial Statistics

Abstract: We propose a method of spatial prediction using count data that can be reasonably modeled assuming the Conway-Maxwell Poisson distribution (COM-Poisson). The COM-Poisson model is a two-parameter generalization of the Poisson distribution that allows for the flexibility needed to model count data that are either over- or under-dispersed. The computationally limiting factor of the COM-Poisson distribution is that the likelihood function contains multiple intractable normalizing constants, so it is not always feasible to use with Markov chain Monte Carlo (MCMC) techniques. Thus, we develop a prior distribution of the parameters associated with the COM-Poisson that avoids the intractable normalizing constant. Also, allowing for spatial random effects induces additional variability that makes it unclear if a spatially correlated Conway-Maxwell Poisson random variable is over- or under-dispersed. We propose a computationally efficient hierarchical Bayesian model that addresses these issues. In particular, in our model, the parameters associated with the COM-Poisson do not include spatial random effects (which would introduce additional variability that changes the dispersion properties of the data), and are then spatially smoothed in subsequent levels of the Bayesian hierarchical model. Furthermore, the spatially smoothed parameters have a simple regression interpretation that facilitates computation. We demonstrate the applicability of our approach using simulated examples, and a motivating application using 2016 US presidential election voting data in the state of Florida obtained from the Florida Division of Elections.

Articles:

  • Yang, H.-C., Bradley, JR. (2021+). Bayesian Inference for Spatial Count Data that May be Over-Dispersed or Under-Dispersed with Application to the 2016 US Presidential Election. Journal of Data Science.
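
As a small illustration of the dispersion behavior described in the abstract above, the following sketch evaluates the standard COM-Poisson probability mass function, which is proportional to lambda^y / (y!)^nu, by truncating the normalizing constant at a finite count. It is not the method proposed in the article (which is designed to avoid the normalizing constant); the function names, the truncation point `max_count`, and the parameter values are assumptions chosen only for illustration.

```python
import math


def com_poisson_pmf(lam, nu, max_count=200):
    """Truncated COM-Poisson probabilities for counts 0..max_count.

    Assumption: the infinite normalizing sum is approximated by its
    first max_count + 1 terms, which is adequate for moderate lam.
    """
    # Log of the unnormalized weight: y*log(lam) - nu*log(y!)
    log_w = [y * math.log(lam) - nu * math.lgamma(y + 1)
             for y in range(max_count + 1)]
    shift = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(v - shift) for v in log_w]
    z = sum(w)  # truncated normalizing constant (up to the shift)
    return [v / z for v in w]


def mean_and_variance(pmf):
    """Mean and variance of a distribution on 0, 1, 2, ..."""
    mean = sum(y * p for y, p in enumerate(pmf))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
    return mean, var


# nu < 1 gives over-dispersion, nu = 1 recovers the Poisson,
# and nu > 1 gives under-dispersion.
for nu in (0.5, 1.0, 2.0):
    mean, var = mean_and_variance(com_poisson_pmf(lam=3.0, nu=nu))
    print(f"nu={nu}: mean={mean:.3f}, variance={var:.3f}, "
          f"variance/mean={var / mean:.3f}")
```

The variance-to-mean ratio printed for each value of nu shows how a single extra parameter moves the distribution between over- and under-dispersion.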

Abstract: Prediction of a spatial process using a “big dataset” has become a topical area of research over the last decade. The available solutions often involve placing strong assumptions on the error process associated with the data. Specifically, it has typically been assumed that the data are equal to the spatial process of principal interest plus a mutually independent error process. This is done to avoid modeling confounded cross-covariances between the signal and noise within an additive model. We consider an alternative latent process modeling schematic where it is assumed that the error process is spatially correlated and correlated with the latent process of interest. We show that such error process dependencies allow one to obtain precise predictions and avoid confounded error covariances within the expression of the marginal distribution of the data. We refer to these covariances as “non-confounded discrepancy error covariances.” Additionally, a “process augmentation” technique is developed to aid in computation. Demonstrations are provided through simulated examples and through an application using a large dataset consisting of the U.S. Census Bureau’s American Community Survey 5-year period estimates of median household income on census tracts.

Articles:

Abstract: Traditional conditional autoregressive (CAR) models use neighborhood information to define the adjacency matrix. Specifically, the neighborhoods are defined deterministically using the boundaries between the regions. However, covariates may inform the entries of the adjacency matrix and may not correspond to the nearest neighbor structure that is typically assumed. We propose a class of prior distributions for adjacency matrices, which incorporate covariates and can detect a relationship between two areas that do not share a boundary. Our approach is fully Bayesian, and involves a computationally efficient conjugate update of the adjacency matrix. To illustrate the high performance of our Bayesian hierarchical model, we present a simulation study, and an example using data made publicly available by the New York City Department of Health.

Articles:

  • Heli, G, Bradley, JR. (2019). Bayesian Analysis of Areal Data with Unknown Adjacencies Using the Stochastic Edge Mixed Effects Model (Spatial Statistics).
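
For context, the traditional set-up that the abstract above contrasts against can be sketched as follows: the adjacency matrix W is fixed in advance from shared boundaries, and a proper CAR precision matrix is formed as tau * (D - rho * W), where D holds the neighbor counts. This is only the deterministic baseline, not the stochastic edge model proposed in the article; the four-region layout, the function names, and the values of `rho` and `tau` are hypothetical.

```python
def adjacency_from_borders(n_regions, borders):
    """Binary adjacency matrix W built from pairs of regions sharing a boundary."""
    w = [[0] * n_regions for _ in range(n_regions)]
    for i, j in borders:
        w[i][j] = 1
        w[j][i] = 1  # adjacency is symmetric
    return w


def car_precision(w, rho, tau=1.0):
    """Proper CAR precision matrix tau * (D - rho * W), with D = diag(row sums of W)."""
    n = len(w)
    return [[tau * ((sum(w[i]) if i == j else 0.0) - rho * w[i][j])
             for j in range(n)]
            for i in range(n)]


def is_strictly_diagonally_dominant(q):
    """A symmetric matrix with positive diagonal that passes this check is positive definite."""
    return all(
        q[i][i] > sum(abs(q[i][j]) for j in range(len(q)) if j != i)
        for i in range(len(q))
    )


# Hypothetical layout: four regions in a row, each bordering the next.
W = adjacency_from_borders(4, [(0, 1), (1, 2), (2, 3)])
Q = car_precision(W, rho=0.9)

for row in Q:
    print(row)
print("valid precision matrix:", is_strictly_diagonally_dominant(Q))
```

In this baseline, regions 0 and 3 can never be linked directly because they share no boundary; the class of priors described in the abstract is aimed at exactly that limitation.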

Abstract: Interest in online rating data has increased in recent years. Such data consists of ordinal ratings of products or local businesses provided by users of a website, such as Yelp or Amazon. One source of heterogeneity in ratings is that users apply different standards when supplying their ratings; even if two users benefit from a product the same amount, they may translate their benefit into ratings in different ways. In this article we propose an ordinal data model, which we refer to as a multi-rubric model, which treats the criteria used to convert a latent utility into a rating as user-specific random effects, with the distribution of these random effects being modeled nonparametrically. We demonstrate that this approach is capable of accounting for this type of variability in addition to usual sources of heterogeneity due to item quality, user biases, interactions between items and users, and the spatial structure of the users and items. We apply the model developed here to publicly available data from the website Yelp and demonstrate that it produces interpretable clusterings of users according to their rating behavior, in addition to providing better predictions of ratings and better summaries of overall item quality.

Articles:

Model and Predictor Selection for Spatial Data: Spatial statistical models are extremely complex and require a large number of tuning parameters. As a result, I feel that it is incumbent on us to evaluate the performance of these methods. Thus, my primary focus in the area of spatial-only statistics is on model comparison and selection. In particular, I am interested in developing new criteria and ways to compare competing spatial statistical methods.

Articles:

Thesis: