Our Data

Datasets available on this website are licensed under the Creative Commons Attribution 4.0 (CC BY 4.0) license. Attribution must be provided for any use of our data. The article you are required to cite for each dataset is listed below.

53,677 Labeled Aerial Images of Road Infrastructure (Road Roughness)

Description: This project explored the degree to which road quality can be estimated from high-resolution satellite imagery, with training labels collected during 2020 and 2021 via a custom Android application and dedicated Android hardware.

Imagery: Each image is a (variably sized) crop of aerial imagery showing a road segment for which road roughness information was collected. Imagery was collected as part of the Virginia Basemap Program and is provided in both 8-bit and 16-bit versions. Files are named following a [Date]_[ID]_[bitmap].png format, with [ID] being a value that links to the labels and [bitmap] being a value of either 8 or 16.
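
The naming convention above can be split into its parts programmatically. The following is a minimal sketch, not code from the project: the function name, the example file name, and the assumption that [Date] contains no underscore are all illustrative, and the date is kept as a plain string because its format is not specified here.

```python
from pathlib import Path
from typing import NamedTuple


class RoadImageName(NamedTuple):
    date: str      # kept as text; the date format is not documented here
    image_id: str  # value that links the image to its labels
    bitmap: int    # 8 or 16, as stated in the naming convention


def parse_road_image_name(filename: str) -> RoadImageName:
    """Split a [Date]_[ID]_[bitmap].png file name into its three parts."""
    stem = Path(filename).stem
    # Assumption: [Date] has no underscore, so the first underscore ends it,
    # and the last underscore starts the [bitmap] field.
    date, sep1, rest = stem.partition("_")
    image_id, sep2, bitmap = rest.rpartition("_")
    if not (sep1 and sep2 and date and image_id):
        raise ValueError(f"unexpected file name: {filename!r}")
    if bitmap not in ("8", "16"):
        raise ValueError(f"unexpected bitmap value in {filename!r}")
    return RoadImageName(date, image_id, int(bitmap))


if __name__ == "__main__":
    # Hypothetical file name, used only to show the call.
    print(parse_road_image_name("20210315_1042_16.png"))
```

Splitting from both ends keeps the sketch working even if an [ID] happens to contain an underscore.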

Labels: Two labels are provided for each image. The first is a classification (0, 1, or 2) representing road quality, with 0 indicating low quality, 1 indicating mid quality, and 2 indicating high quality. The second is a continuous metric of road quality derived more directly from the roughness sensors. More information on how these values were derived can be found in the article cited below (under Requirements for Use).

Requirements for Use: Any use of these data should cite Brewer, E., Kemper, P., Lin, J., Hennin, J., and Runfola, D. 2021. Predicting Road Quality using High Resolution Satellite Imagery: A Transfer Learning Approach. PLoS One. https://doi.org/10.1371/journal.pone.0253370

5,505 Labeled Satellite Images of Schools (Test Scores)

Description: These data were used (in part) to explore whether satellite imagery can be used to estimate school test scores, helping to offset gaps in measurement throughout much of the developing world.

Imagery: Each image is a 256x256 pixel crop of the school and surrounding area, retrieved from the Landsat satellite series. Imagery was collected from approximately September 2013 to June 2014, with a single cloud-free composite constructed from the full time series. Images are named in the format "[ID]_[TimePeriod].png", in which [ID] is the ID of the school (cross-linked with the labels CSV file) and [TimePeriod] is the range of dates for which the imagery was retrieved.

Labels: Each image is labeled with a value ranging from 0 to 40, representing the average test score of all students within a given school in the Philippines. These data were retrieved through Freedom of Information requests made to the Philippines government for information on the Philippines' 2013-2014 National Achievement Test.
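
As an illustration of how the [ID] in each file name can be cross-linked with the labels CSV file, here is a minimal sketch. The column names `school_id` and `score`, the function names, and the assumption that [ID] contains no underscore are hypothetical; check the header of the actual labels file before using it.

```python
import csv
from pathlib import Path


def school_id_from_name(filename: str) -> str:
    """Return the [ID] part of an "[ID]_[TimePeriod].png" file name."""
    # Assumption: [ID] has no underscore, so it ends at the first one.
    school_id, sep, _time_period = Path(filename).stem.partition("_")
    if not sep or not school_id:
        raise ValueError(f"unexpected file name: {filename!r}")
    return school_id


def load_scores(csv_lines, id_column="school_id", score_column="score"):
    """Read the labels into {school ID: score}. Column names are hypothetical."""
    reader = csv.DictReader(csv_lines)
    return {row[id_column]: float(row[score_column]) for row in reader}


def pair_images_with_scores(filenames, scores):
    """Yield (file name, score) for each image whose ID appears in the labels."""
    for name in filenames:
        school_id = school_id_from_name(name)
        if school_id in scores:
            yield name, scores[school_id]
```

`load_scores` accepts any iterable of text lines, so an open file handle (opened with `newline=""`) can be passed directly. Images without a matching label are skipped rather than raising an error, which is one possible design choice among several.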

Requirements for Use: Any use of these data should cite Runfola, D., Stefanidis, A., Baier, H., 2021. Using Satellite Data and Deep Learning to Estimate Educational Outcomes in Data Sparse Environments. Remote Sensing Letters 13(1). https://doi.org/10.1080/2150704X.2021.1987575

Global Political Administrative Boundaries (geoBoundaries)

Description: Built by the community and the William & Mary geoLab, the geoBoundaries Global Database of Political Administrative Boundaries is an online, open-license resource of boundaries (e.g., state, county) for every country in the world. We currently track approximately 1 million boundaries within over 200 entities, including all UN member states. All boundaries are available to view or download in common file formats, including shapefiles.

Requirements for Use: Any use of geoBoundaries should cite or otherwise acknowledge Runfola, D. et al. (2020) geoBoundaries: A global database of political administrative boundaries. PLoS ONE 15(4): e0231866. https://doi.org/10.1371/journal.pone.0231866. More specific usage examples can be seen at geoboundaries.org.

Yearly CO2 Concentrations (Raster)

Description: The raster files available for download here are modeled from daily global measures of CO2 concentration available from NASA JPL's Orbiting Carbon Observatory-2 (OCO-2; see figure 1 for an example of coverage). CO2 concentration is defined as the average concentration of carbon dioxide in a column of dry air extending from Earth’s surface to the top of the atmosphere. The daily measures from OCO-2 are concatenated for an entire year and then aggregated to a regular global 10 km grid. Points flagged in the OCO-2 data as lower quality are omitted. The resulting grid is then linearly interpolated to fill gaps and produce a final yearly raster surface. Raster values are CO2 concentrations in parts per million (ppm). The underlying data were produced by the OCO-2 project at the Jet Propulsion Laboratory, California Institute of Technology, and obtained from the OCO-2 data archive maintained at the NASA Goddard Earth Science Data and Information Services Center. Replication code is available at https://github.com/wmgeolab/geo-datasets/tree/master/oco2 .
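
The processing steps described above (drop flagged points, aggregate to a regular grid, fill gaps by linear interpolation) can be sketched in a simplified form. This is not the replication code linked above: the function names and input layout are hypothetical, the cell size is given in degrees rather than as a true 10 km grid, and gaps are filled along one grid row at a time as a one-dimensional stand-in for the interpolation actually used.

```python
from collections import defaultdict


def grid_mean(points, cell_deg):
    """Average CO2 values per grid cell, skipping lower-quality points.

    `points` is an iterable of (lat, lon, co2_ppm, is_good) tuples
    (a hypothetical layout chosen for this sketch).
    """
    sums = defaultdict(lambda: [0.0, 0])  # cell -> [running total, count]
    for lat, lon, co2_ppm, is_good in points:
        if not is_good:
            continue  # omit points flagged as lower quality
        cell = (int((lat + 90) // cell_deg), int((lon + 180) // cell_deg))
        sums[cell][0] += co2_ppm
        sums[cell][1] += 1
    return {cell: total / count for cell, (total, count) in sums.items()}


def fill_row_gaps(row):
    """Linearly interpolate None values lying between known values in a row."""
    filled = list(row)
    known = [i for i, value in enumerate(row) if value is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            t = (i - left) / (right - left)
            filled[i] = row[left] + t * (row[right] - row[left])
    return filled  # gaps before the first or after the last known value stay None
```

Gaps at the edges of a row are left empty in this sketch because linear interpolation needs a known value on both sides.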

Requirements for Use: This dataset should be cited when used. An example of how to cite this dataset is: Goodman, S., Runfola, D.M. (2019), Global Carbon Dioxide Concentration: 2015-2018. http://geolab.wm.edu/. Accessed On: April 25th, 2019.

NASA Jet Propulsion Laboratory
OCO-2 Lite Version 9
O'Dell, C. W., Connor, B., Bösch, H., O'Brien, D., Frankenberg, C., Castano, R., Christi, M., Eldering, D., Fisher, B., Gunson, M., McDuffie, J., Miller, C. E., Natraj, V., Oyafuso, F., Polonsky, I., Smyth, M., Taylor, T., Toon, G. C., Wennberg, P. O., and Wunch, D.: The ACOS CO2 retrieval algorithm – Part 1: Description and validation against synthetic observations, Atmos. Meas. Tech., 5, 99-121, https://doi.org/10.5194/amt-5-99-2012, 2012.