Multiscale Spatio-Temporal Data

Spatial Change of Support:

Figure 1: In (a) we display the boundaries of the community districts in New York City. In (b) we display the boundaries of the census tracts in New York City. ACS produces estimates of key demographics by census tract, but not by community district. The key problem is to use the census-tract-based estimates to produce estimates defined on community districts.

Our methodology is motivated by the American Community Survey (ACS), an ongoing survey administered by the U.S. Census Bureau that provides timely information on several key demographic variables. The ACS produces 1-year, 3-year, and 5-year “period-estimates,” and corresponding margins of error, over different regions (e.g., census tracts, counties, and states). By period-estimate we mean that the estimate was compiled using confidential micro-data recorded within a given time period.

Although ACS provides data over many different geographies (e.g., census tracts, counties, and states), data users often want to specify their own geography. For example, the Department of City Planning (DCP) in New York City uses ACS period-estimates of poverty, and would like to visualize these estimates defined on community districts (see Figure 1). However, ACS does not provide estimates on community districts, and hence the DCP is forced to visualize ACS period-estimates on census tracts (see Figure 2). This general problem is known as spatial change of support (COS).

Figure 2: In (a) we display 2012 ACS 5-year period estimates (defined on census tracts) of the number of individuals that lie below the poverty threshold; in (b) the corresponding ACS survey variance is plotted. In (c), we plot the posterior mean (using the methodology from Bradley et al. (2014)) of the number of individuals that lie below the poverty threshold by community districts; in (d) we provide the corresponding posterior variance.

To solve this problem, we provide several methodological developments; for those interested, I suggest reading our paper, Bradley et al. (2014), for the details. To quickly summarize, these methodological advances include a novel parameter model to define spatial covariances, a way to incorporate ACS estimates of the survey variance, and a way to allow for efficient computation. Together they lead to substantial improvements over alternative approaches (see Section 3 of Bradley et al. (2014)). Using the methodology in Bradley et al. (2014), we are able to estimate the number of individuals that lie below the poverty level by community district; see Figure 2(c). We also provide associated measures of uncertainty in Figure 2(d).
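For contrast with the model-based approach summarized above, it can help to see what a naive change of support looks like. The sketch below is not the method of Bradley et al. (2014). It is a simple baseline that assumes every census tract nests inside exactly one community district and that tract-level sampling errors are independent. ACS margins of error are published at the 90% confidence level, so a standard error can be recovered as the margin of error divided by 1.645. The function name, the crosswalk, and all numbers are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

Z_90 = 1.645  # ACS margins of error correspond to a 90% confidence level


def aggregate_to_districts(tracts, tract_to_district):
    """Sum tract estimates within each district and add up their variances.

    tracts: dict mapping tract id -> (estimate, margin_of_error)
    tract_to_district: dict mapping tract id -> district id
        (a hypothetical crosswalk; assumes tracts nest within districts)
    Returns a dict mapping district id -> (estimate, variance).
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for tract_id, (estimate, moe) in tracts.items():
        district = tract_to_district[tract_id]
        standard_error = moe / Z_90
        totals[district][0] += estimate
        # Adding variances is only valid if tract errors are independent,
        # which is an assumption of this baseline, not a fact about ACS data.
        totals[district][1] += standard_error ** 2
    return {d: (est, var) for d, (est, var) in totals.items()}


# Made-up tract estimates and margins of error, for illustration only.
tracts = {"t1": (120.0, 32.9), "t2": (80.0, 16.45), "t3": (200.0, 49.35)}
crosswalk = {"t1": "cd_A", "t2": "cd_A", "t3": "cd_B"}
district_estimates = aggregate_to_districts(tracts, crosswalk)
```

A baseline like this ignores spatial dependence and breaks down when tracts straddle district boundaries, which is part of why a model-based treatment of change of support is attractive.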

Article:

News Articles:

Data:

Space-Time Change of Support Software:

A Criterion for Spatial Aggregation Error:

For a given variable, what spatial support should a data-user perform inference on? This is known as “regionalization” (Openshaw, 1977), and the answer to this inferential question is intimately related to the modifiable areal unit problem (MAUP).

To see how the MAUP might guide a regionalization over the spatial domain of interest, consider the example from Bradley et al. (201). We specify a 2 by 2 grid in panel (a), and in the remaining panels, regionalizations that have 2 or 3 areal units are given. Many regionalizations tell a different story from the one told in panel (a): that the largest value is in the lower right-hand corner of the grid, the smallest value is in the upper left-hand corner of the grid, and intermediate values are given in the remaining grid cells. For some regionalizations, one might reach this same conclusion but still incur noticeable MAUP error. For example, panel (e) shows that larger (smaller) values are given in the lower right-hand (upper left-hand) corner of the grid, but the functional complexity of the values in panel (a) is missing.

Figure: Example. In panel (a) the values are (from left to right, top to bottom) 0, 5, 6, and 10. Different regionalizations based on 2 or 3 areal units are given in the remaining panels. Panels (b) through (h) assume 2 areal units, and panels (i) through (n) assume 3 areal units.

The regionalization that is closest in squared error to panel (a) is given by panel (i), where the top-right and bottom-left grid cells correspond to one areal unit, and the sum (over the four grid cells) of squared differences between panels (a) and (i) is 0.5. Hence, for this example, spatial aggregation error is quantified by the sum of squared differences between the process at the finest spatial resolution (i.e., panel (a)) and the process at coarser resolutions (i.e., panels (b) through (n)), and the preferred regionalization is the one that minimizes it. Bradley et al. (2014) formalized this idea to quantify the MAUP in a more general setting, where the finest-resolution process can be observed at the point level.
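The arithmetic in this toy example can be checked with a short script. The sketch below enumerates every grouping of the four grid cells into 2 or 3 areal units, assigns each areal unit the average of its cells (an assumption, though it reproduces the 0.5 quoted above), and computes the sum of squared differences from the original cell values. The function names and cell labels are hypothetical, and this is only an illustration of the toy example, not the general criterion developed in the paper.

```python
def set_partitions(items):
    """Yield every partition of `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Place `first` into each existing block in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or give it a block of its own.
        yield [[first]] + partition


def aggregation_error(values, partition):
    """Sum of squared differences between cell values and their block means."""
    error = 0.0
    for block in partition:
        mean = sum(values[cell] for cell in block) / len(block)
        error += sum((values[cell] - mean) ** 2 for cell in block)
    return error


# Cell values from the example, read left to right, top to bottom.
values = {"top_left": 0, "top_right": 5, "bottom_left": 6, "bottom_right": 10}
cells = list(values)

# Keep only the regionalizations with 2 or 3 areal units.
candidates = [p for p in set_partitions(cells) if len(p) in (2, 3)]
best = min(candidates, key=lambda p: aggregation_error(values, p))
best_error = aggregation_error(values, best)
```

There are 7 groupings with 2 areal units and 6 with 3, matching the thirteen panels (b) through (n). The minimizer merges the top-right and bottom-left cells into one unit with mean 5.5, giving (5 - 5.5)^2 + (6 - 5.5)^2 = 0.5.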

Bradley et al. (2014) considered many examples, including federal data obtained from the American Community Survey and Mediterranean wind measurements. In both of these examples, regionalizations were determined that respected the finer-resolution information in the data.

Articles:

Data:

Regionalization Software: