Geospatial Interpolation - Wind Speed

Travis Zalesky

As part of UA GIST 602B

Abstract

Introduction

Methods

Data Acquisition

Preprocessing

Inverse Distance Weighted (IDW)

Figure 1. Arizona Meteorological Network (AZMET) weather stations distributed across Southern and Central Arizona.

Abstract

Average wind speed data was obtained from The Arizona Meteorological Network weather stations for the date 5/14/2023. Using three interpolation methods, inverse distance weighted, spline, and Kriging, the average wind speed was predicted across Arizona as a continuous field with a raster cell size of approximately 111m. The interpolation fields were compared to the measured data distribution, as well as compared to a testing data set of known values. Additionally, a semivariogram was created for the wind speed data, and a variety of additional summary statistics were produced for the Kriging method, which can also be used to draw conclusions about the other interpolation methods generally. The strengths and weaknesses of each method are discussed, and an optimal model is chosen.

Introduction

The weather is a continuous phenomenon across the Earth's surface. It is happening everywhere, all the time. This may seem like an obvious statement, but it begs the question, how do you measure something that is everywhere? Recent decades have seen an explosion of interest in use of remotely sensed satellite data for continuous weather monitoring, and for good reason (Thies & Bendix, 2011). However, satellites are expensive, subject to interference from atmospheric diffraction, and are dependent on ground measurements for calibration. For all these reasons, and more, people still rely on discrete, point measurements of weather phenomena collected from on-the-ground weather stations. One such network of discrete weather stations is The Arizona Meteorological Network (AZMET) operated by the University of Arizona, College of Agriculture and Life Sciences (Arizona Meteorological Network, 2024).

AZMET has been continuously recording weather data from discrete locations in Arizona (AZ) since 1987, and currently has 28 operational weather stations throughout South and Central AZ (Fig. 1). Active weather stations range in elevation from 28m (92ft, Yuma South) to 1479m (4852ft, Payson). Although there has historically been an AZMET weather station in Flagstaff, there are currently no active weather stations in Northern AZ.

So how is weather data collected at multiple (discrete) locations applied across an area to draw conclusions about places where no measurements have been taken? Through a process known as geospatial interpolation.

Geospatial interpolation is a broad term that encompasses a variety of mathematical models, that take advantage of Tobler's 1st Law of Geography, which states "everything is related to everything else, but near things are more related than distant" (i.e., spatial autocorrelation; Tobler, 2004). Using this insight, mathematical models can be developed which (generally) use distance as a weighting factor as they attempt to "smooth out" the data between measured points. While there are innumerable variations on these interpolation functions, they typically fall into one of three categories, Inverse Distance Weighting (IDW), spline, and Kriging (pronounced kric-king; O’Sullivan & Unwin, 2014).

Using an example data set of average daily wind speed collected from 28 AZMET weather stations on 5/14/2023 I will explore a variety of geospatial interpolation methods, demonstrating the use of each of the aforementioned interpolation methods, and evaluating the results of each method. The applicability of each method for average wind speed in AZ will be discussed, and an optimal model will be chosen.

Methods

Data Acquisition

Daily AZMET weather station data for 5/14/2023 was downloaded in bulk from the AZMET API utilizing R v4.3.2 (programming environment; R Core Team, 2021) and the "azmetr" package (Weiss & Scott, 2024). Wind speed data was collected and recorded every 10 seconds using either a Met One Model 014A (±1.5%) or an RM Young Wind Sentry Model 03101-5 (±0.5 m/s) anemometer located 3m above ground level (Arizona Meteorological Network, 2024). Daily averages are calculated from the mean of the daily recordings (i.e., 8,640 10-second readings).

AZMET location data was scraped from https://ag.arizona.edu/azmet/ using a custom R scrip (R Core Team, 2021), using packages "RCurl", "XML", and "tidyverse" (Lang, 2024a; Lang, 2024b; Wickham et al., 2019). All data used in this analysis is publically available, and R scripts for data acquisition have been uploaded to my GitHub repository.

State and county data were obtained from ESRI Living Atlas (online), and were used exclusively for cartography purposes. All layer metadata is summarized in Table 1.

AZMET MetaData Table

Preprocessing

Prior to analysis, weather data was joined to weather station location data. The data table was reduced to a more manageable size by removing extraneous weather data, not related to wind speed. The data table was imported into ArcGIS Pro v3.2.2, and the weather station data was converted to a point feature class with the "XY Table to Point" tool. The distribution of average wind speed was assessed.

All data processing and cartography was performed in a NAD 1983 StatePlane Arizona Central FIPS 0202 (Intl. Feet) (EPSG:2223) projection. This is an N. to S. Transverse Mercator projection, and is the most suitable compromise projection for data spanning the width of AZ. This projection will minimize distance distortions for this dataset.

The full data set (28 weather stations) was randomly subset into training and testing categories, with 25 and 3 data points respectively, using the "Subset Features" geoprocessing tool (Fig. 2). The three, randomly determined, testing stations are Payson, Tucson, and Bowie (Fig. 1). All subsequent analysis herein were performed on the training stations dataset, with the testing stations reserved for post hoc model evaluation, unless otherwise stated.

Figure 2. Workflow demonstration for ESRI "Subset Features" tool, used to randomly determine training and testing data points.

Figure 3. Workflow demonstration for ESRI "IDW" tool, used to calculate an inverse distance weighted interpolation field for average wind speed.

Inverse Distance Weighted (IDW)

The IDW method utilizes a weighted average function such that the weight between any known location and an unknown point is equal to the inverse of the distance between the two points, raised to some (user defined) power (Eq. 1). The function typically either utilizes a fixed number of nearest neighbors, or a search distance to define points used for the local weighted average. Additionally, the Euclidean distance between points is frequently raised to some power to increase the decay rate of the weighting factor as desired. Often this will take the form of 1/d2 a form of the IDW function sometimes referred to as inverse distance squared. A special case of the IDW equation is applied any time that the interpolation surface is coincident with a measured data point (i.e., dij = 0) such that the IDW result equals the measured data. This avoids the issue of an undefined (divide by 0) weight, and has the added benefit that it perfectly matches known data values. Because IDW is a weighted average function, it will inherently retain minimum and maximum values within the dataset, as interpolated values will trend towards the mean. Depending upon the use case, this could be considered an advantage or disadvantage of IDW.

Utilizing the "IDW" spatial analyst tool, I've used an inverse distance squared function with 12 nearest neighbors, and a raster cell size of 0.001 decimal degrees (about 111 m, 364 ft) for interpolating wind speed (Fig. 3).

Spline

The spline method involves generating a complex mathematical function which aims to generate the "smoothest possible" interpolation surface, while maintaining all known (measured) points. This method has the advantage of creating inherently visually appealing surfaces, especially when point density is relatively high. However, spline functions do not honor the minimum or maximum values of a data set, which may result in unlikely predictions, particularly when the distance between points is large.

The "Spline" spatial analysis tool was used to generate a continuous field of average wind speed using a regularized (smoothest) spline function across the 12 nearest neighbors with the default weight value (0.1, unitless). As with the IDW the output raster cell size was defined as 0.001 decimal degrees (about 111 m, 364 ft).

Figure 4. Workflow demonstration of the ESRI "Spline" tool, used to calculate an interpolation field for average wind speed.

Kriging

Kriging, named after Dani Krige, and further refined by Georges Matheron, is a sophisticated form of distance weighting interpolation that uses the semivariance (a measure of spatial autocorrelation) to empirically define the weighting function, thereby reducing the arbitrary nature of the user defined IDW function (O’Sullivan & Unwin, 2014). The semivariance is a function of the distance between two points (dij) and the corresponding difference between some measured value at those two points squared, in our case average wind speed (wij2). The semivariance can be summarized graphically through use of the semivariogram, in which the points are typically binned by distance (lag), and represented as a box plot. Figure 5 shows an example semivariogram of elevation, taken from O’Sullivan & Unwin (2014) Figure 10.8.

Once the semivariance has been calculated, the dataset can be modeled with a line of best fit, using one of a variety of common functions, including linear, circular, etc. The model intercept is known as the "kernel", and is representative of the minimum model error. Most semivariogram models will also have a plateau, beyond which the data is no longer well autocorrelated. This model defines the distance weights to be used in the Kriging interpolation function.

Figure 5. O’Sullivan & Unwin (2014), Figure 10.8, example semivariogram of elevation in a theoretical study area.

Figure 6. Workflow demonstration of the ESRI "Empirical Bayesian Kriging" tool, used to calculate an interpolation field for average wind speed.

Due in part to the need to compute the semivariance of the datasets, Kriging is most suitable for large data sets. In ESRI ArcGIS Pro, the need for large numbers of measurements can be somewhat alleviated through use of Empirical Bayesian Kriging (EBK). In EBK a semivariogram is calculated as previously described. Then the EBK algorithm uses the modeled weight function to estimate simulated values at each location. A new semivariogram is calculated using the simulated data, and the model error is calculated. This process is repeated a large number of times, and an optimal weights model is chosen from the many semivariograms (What Is Empirical Bayesian Kriging?, 2024). This is a computationally intensive process, but the results are typically more accurate, with smaller standard errors, than ordinary Kriging methods.

Euclidean (planar) distances between weather station coordinates (WGS 84) were calculated in R for every possible pair of stations (378 possible combinations) using the "terra" package (Hijmans, 2023), and the standard semivariogram was plotted for the entire data set (training and testing data).

Due to the relatively small sample size of the average wind speed dataset, EBK was employed using the self-named ESRI geostistical analyst tool (Fig. 6). As before, the output raster had a cell size of 0.001 decimal degrees (about 111 m, 364 ft). There was no data transformation applied, and the semivariogram model type "Power" was used, which is of type γ(h)= Nugget + b|h|α, where Nugget is the y-intercept, b is the slope coefficient, and α is some power between 0.25 and 1.75. The semivariogram algorithm was iterated 100 times, and an optimal neighborhood search radius was set with between 10 and 15 nearest neighbors. Additional EBK statistics were provided by the ESRI "Geostatistical Wizard".

Results

The distribution of AZMET average daily wind speed values across AZ is reasonably normal, with a slight positive skew, and very slightly leptokurtic (Fig. 7). The average wind speed ranged from 1.2 m/sec to 5.2 m/sec. The skewness and kurtosis of the data were 1.17 and 3.68 respectively (unitless).

Figure 7. The histogram and summary statistics of daily average wind speed collected across Arizona Meteorological Network (AZMET) weather stations on 5/14/2023.

The minimum and maximum average wind speed values of the measured dataset (training data only) were 1.2 m/sec (2.7 mph) and 5.2 m/sec (11.6 mph) respectively. The average wind speed was 2.54 m/sec (5.7 mph), with a standard deviation (SD) of 0.99 (Table 2; Fig. 7). The IDW function honors the minimum and maximum values of the input data set. Its average wind speed was 2.60 m/sec (5.8 mph), with a SD of 0.60. The spline method returned an illogical minimum wind speed, -34.79 m/sec (-77.8 mph), and a highly improbable maximum wind speed, 23.71 m/sec (53 mph). However, the mean was a reasonable 2.89 m/s (6.5 mph), with a SD of 7.96. The EBK method found a minimum and maximum wind speed of 1.32 m/sec (3 mph), and 5.01 m/sec (11.2 mph) respectively. The mean of the Kriging method was 2.67 m/sec (6 mph), with a SD of 0.67. The summary statistics of each interpolation field are recorded in Table 2.

Table 2

Table 3

The raster values of the various interpolation fields were extracted at each of the testing station locations using the ESRI "Extract Multi Values to Points" spatial analysis tool (Fig. 8). The results are compared to the known values in Table 3. All three models predicted the wind speed values accurately at Tucson and Bowie. However, all three models also overpredicted wind speed at Payson. The IDW method had the lowest overall error, however the Kriging method was a close second. The spline method severely overpredicted the wind speed at Payson, by > 200%.

Figure 8. Workflow demonstration of the ESRI "Extract Multi Values to Points" tool, used to extract predicted wind speeds at testing station locations.

Additional statistics were calculated for the Kriging model. The standard semivariogram shows little trend in the mean wind speed difference, although the quartiles and outliers do tend to increase with distance (Fig. 9). There is somewhat of a concave upward curve, up to 500 Km.

Several advanced statistics for the EBK model are summarized in Figure 10. The standard error map clearly indicates a rapid increase in the error rate towards the North. The leave-out-one error test is a feature of the ESRI EBK tool, which will remove data points, one at a time, from a dataset and use a given Kriging model to predict the value at that point. The predicted value is then compared to the measured value, and the process is repeated for all data points. The results of the leave-out-one error test are shown in Figure 10 (top-left) as well as the histograms of both the prediction values and the error values (center-left and center-right respectively). The prediction values are highly skewed, however the error values closely approximate a normal distribution. Finally, the semivariogram cloud is shown with measured values represented by blue crosses, and all the simulated EBK semivariograms shown in light gray (bottom-left). The optimal, least-error, Kriging weights model is indicated in magenta.

Figure 9. The standard semivariogram of average daily wind speed for 5/14/2024 (all stations). Variable box plot width relative to the number of data points per bin (range 21 to 115).

Emperical Bayesian Kriging advanced statistics.

Figure 10. Advanced statistical output of the Empirical Bayesian Kriging (EBK) model provided by the ESRI "Geospatial Wizard". Top-left: The standard error field for the EBK model. Top-right: The predicted and measured values of the leave-out-one error test for the best fit Kriging model. Center-left: The histogram of predicted values for the leave-out-one error test for the best fit Kriging model. Center-right: The histogram of the error values (difference between predicted and measured values) for the leave-out-one error test for the best fit Kriging model. Bottom-left: The semivariogram cloud of 100 EBK simulated semivariograms. Measured data values represented by "+" with the best fit weighting model indicated in magenta.

All three interpolation prediction fields are plotted in Figure 11. Graphically, the spline method appears to be the smoothest, particularly in the North. However, from the summary statistics we know that much of the cyan and magenta (lowest and highest color values respectively) are far below/above the shown legend values and are either illogical or extremely unlikely. Conversely, the IDW and Kriging methods produced visually similar results with a similar range of values in the South/Central regions, and similar (but distinct) artifacts in the North. Looking closely, the Kriging method produces a smoother surface in the South/Central regions, while the IDW method resulted in several distinct "bulls-eyes" that are typical of this method. This is particularly apparent by comparing the two fields in the region from Safford to Queens Creek, and again in the area around Sahuarita.

Figure 11. Graphical comparison of interpolation fields generated via three different methods, (A) inverse distance weighted, (B) spline, and (C) empirical Baysean Kriging.

Conclusions

The raw average wind speed values are reasonably close to normally distributed, which is a good indication that this dataset is generally suitable for interpolation (Fig. 7). However, the data set is relatively small and would likely benefit from additional measurements, especially in N AZ. Both the IDW and Kriging models resulted in accurate prediction fields in the South/Central regions. The spline model also performed reasonably well, however the large region in N. AZ without any measured values resulted in extreme, illogical, prediction values.

The standard semivariogram of all data points shows a slight upward concave shape. Although subtle, this could be an indication of nonstationarity of conditions (O’Sullivan & Unwin, 2014). Kriging in particular can be sensitive to nonstationay data, however the results of the testing data, combined with the normal distribution of the Kriging leave-one-out error, do not seem to indicate a serious problem with the model.

The Kriging model likely benefited from the use of Empirical Bayesian Kriging technique. The use of simulated semivariograms and leave-one-out testing reduced the number of measured data points required. In fact, I was unable to successfully run the ESRI ordinary "Kriging" spatial analysis tool on this dataset, most likely due to the relatively small number of points.

Unfortunately, the Payson weather station was randomly selected for the testing data set. The removal of this station, in central AZ, representing the most NW active AZMET weather station, very likely limited the predictive power of all tested interpolation models in N-Central AZ. This is further supported by the percent error rate of all three models in predicting Payson, relative to the other testing stations (Table 3). While the Kriging method predicted the Payson wind speed most accurately, all models would likely improve if Payson were replaced as a testing station with a station in a more densely packed region, such as Salome or Harquahala.

It is apparent that all three models failed to accurately predict wind speed values in N AZ. Although it is impossible to quantify the accuracy of predicted values in this region without measured data, it is clear that this data should not be relied on. It is recommended that predicted values NE of the line between Mohave and Safford be considered highly imprecise (Fig. 11). However, if Payson weather station could be included in re-analysis it is likely that this theoretical line could be extended N, perhaps as far north as I-40 (not shown).

All things considered, I would recommend the Kriging model for this data set. Although the IDW had a lower error rate at the testing station locations, the difference in error rate between these two models was marginal (Table 2). Additionally, the Kriging model was more accurate at the Payson station (Table 3), and the prediction field was smoother, particularly around S. Central AZ (Figure 11). Furthermore, the SD of the Kriging predictions more closely resembled the distribution of measured values.

Using the Kriging model it can be seen that on this day (5/14/2023), the wind speeds were higher in the E, while Central AZ, and especially W AZ were relatively still.

References

Arizona Meteorological Network. (2024). Arizona Meteorological Network (AZMet) Data. https://azmet.arizona.edu

Hijmans, R. J. (2023). terra: Spatial Data Analysis. https://CRAN.R-project.org/package=terra

Knowing the Unknowable: The Statistics of Fields. (2014). In D. O’Sullivan & D. J. Unwin, Geographic information analysis, 2nd edition (2nd ed., pp. 277–314). Wiley. https://doi.org/10.1002/9780470549094

Lang, D. T. (2024). RCurl: General Network (HTTP/FTP/...) Client Interface for R. https://CRAN.R-project.org/package=RCurl

Lang, D. T. (2024). XML: Tools for Parsing and Generating XML Within R and S-Plus. https://CRAN.R-project.org/package=XML

R Core Team. (2021). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. https://www.R-project.org/

Thies, B., & Bendix, J. (2011). Satellite based remote sensing of weather and climate: Recent achievements and future perspectives. Meteorological Applications, 18(3), 262–295. https://doi.org/10.1002/met.288

Tobler, W. (2004). On the First Law of Geography: A Reply. Annals of the Association of American Geographers, 94(2), 304–310. https://doi.org/10.1111/j.1467-8306.2004.09402009.x

Weiss, J., & Scott, E. (2024). azmetr: Access Arizona weather data from the AZMet API. https://doi.org/10.5281/zenodo.7675685

What is empirical Bayesian kriging? (2024, April 9). [Help documentations]. ArcGIS Pro. https://pro.arcgis.com/en/pro-app/latest/help/analysis/geostatistical-analyst/what-is-empirical-bayesian-kriging-.htm

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Bonus Image

The chosen Kriging model for wind speed was combined with a simple IDW interpolation for wind direction to create a wind vector model (Fig. 12).

Figure 12. A wind vector map of SE AZ created by combining the best Kriging average wind speed prediction field with a simple IDW wind direction prediction field. Arrow size and color represent wind speed, and arrow orientation shows wind direction.

Page updated

Google Sites

Report abuse