Data Sets
Kaggle Data Sets
Kaggle is a website for data science competitions that hosts thousands of data sets and code examples. It is a great resource for those starting out with data science. Examples of data sets that may be of interest to students in EEPS 1960D:
Additional data sets are available at Kaggle Data Sets.
Awesome Public Data Sets
GitHub repository [link] of public data sets, including "Climate + Weather" and "Earth Science" categories. Some examples are listed below:
UCI Machine Learning Repository
Collection of data sets [link] used in published machine learning papers. Some examples of interest:
Brown University Library
Open Data Resources: includes data in the categories "Earth and Geoscience" and "Environment."
Academic Data Sets
CloudCase: A large-scale dataset and baseline for forecasting clouds
WildfireDB: A Spatio-Temporal Dataset Combining Wildfire Occurrence with Relevant Covariates
FOWD: Free Ocean Wave Dataset for Data Mining and Machine Learning
Harvard Dataverse: a repository for research data
Other Data Sets or Repositories
ClimateNet Datasets (NERSC)
Google Dataset Search: search engine for data
Google Earth Engine Data Sets: a planetary-scale platform for Earth Science data and analysis
Registry of Open Data on AWS: open geospatial data
National Centers for Environmental Information - Data Access: climate and weather data (NOAA)
Paleoclimatology, Marine/Ocean, Model data, etc.
xBD Dataset: annotated high-resolution satellite imagery for building damage assessment
iWildCam 2020 Competition: camera trap images
DeepWeeds: a multi-class weed species image classification data set
xView: aerial imagery
AI for Good Foundation (data available for some projects)
TensorFlow Datasets: ready-to-use datasets for machine learning in Python (e.g. BigEarthNet, EuroSAT)
Earth System Science Data: data publishing journal
Data.gov: US government's Open Data
COVID + Atmosphere data sets (Caltech)
Gridded Climate Data Sets (Physical Sciences Laboratory, NOAA)
New Data Sets
Weather4Cast: Multi-sensor Weather Forecast Competition
HumAID: human-annotated disaster incidents data from Twitter [paper] [dataset]
CrisisBench: crisis-related social media datasets for humanitarian information processing [paper] [dataset]
This list of data sets has been made available as a resource. I did not create and do not maintain any of these data sets, so I cannot guarantee their accuracy or availability.
If you are familiar with other interesting data sets related to Earth, Environmental or Planetary Sciences, please email the course staff and we'll add them to the list!