Sub-Seasonal Climate Forecasting (SSF) Dataset
University of Illinois at Urbana-Champaign (UIUC) University of Minnesota, Twin Cities (UMN)
George Mason University (GMU) Carnegie Mellon University (CMU)
Overview
The SSF climate dataset is a benchmark dataset for training and evaluating machine learning models for sub-seasonal climate forecasting. It includes a variety of high-resolution climate variables covering the atmosphere, ocean, and land from 1980 through recent years.
The code developed for data extraction, preprocessing, and SSF model training and evaluation using the SSF dataset is publicly available at https://github.com/SSF-climate/SSF and https://github.com/Sijie-umn/SSF-MIP/.
Details
The SSF dataset consists of climate variables representing the state of the atmosphere, land, and ocean from 1980 through recent years, collected from a diverse set of data sources. It also includes climate indices that monitor, among others, the El Niño-Southern Oscillation (Niño indices) and the North Atlantic Oscillation (NAO). All climate variables are interpolated to a 0.5 degree latitude by 0.5 degree longitude grid at daily temporal resolution. The climate variables for each year are saved as multi-index pandas DataFrames and can be read using Python 3.
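To illustrate how a multi-index pandas DataFrame of a gridded daily variable can be sliced, here is a minimal sketch on synthetic data. The index level names (lat, lon, start_date), the column name, and the values are assumptions made for this example. The released files may use different names, and their on-disk format is not stated here, so no loading call is shown.

```python
import numpy as np
import pandas as pd

# Hypothetical layout: one row per (lat, lon, date) on a 0.5 degree grid.
lats = np.arange(30.0, 31.5, 0.5)    # 30.0, 30.5, 31.0
lons = np.arange(240.0, 241.0, 0.5)  # 240.0, 240.5
dates = pd.date_range("2020-01-01", periods=4, freq="D")

index = pd.MultiIndex.from_product(
    [lats, lons, dates], names=["lat", "lon", "start_date"]
)
rng = np.random.default_rng(0)
df = pd.DataFrame({"sst": rng.normal(15.0, 1.0, size=len(index))}, index=index)

# All grid points on a single day (remaining index: lat, lon).
one_day = df.xs(pd.Timestamp("2020-01-02"), level="start_date")

# Daily time series at a single grid point (remaining index: start_date).
one_point = df.xs((30.5, 240.0), level=["lat", "lon"])

# Reshape one day into a lat-by-lon grid, e.g. for plotting a map.
grid = one_day["sst"].unstack("lon")
```

Selecting with xs keeps the remaining index levels, so the same pattern works for both a spatial snapshot and a per-location time series.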
Spatiotemporal Climate Variables
tmp2m: 2 meter temperature
precip: precipitation
sst: sea surface temperature
slp: sea level pressure
icec: sea ice concentration
hgt10, hgt200, hgt500, hgt700: geopotential height at 10mb, 200mb, 500mb, and 700mb
rhum.sig995: relative humidity at level sig 995
sm: soil moisture
Temporal Climate Variables
mei: Multivariate ENSO Index Version 2
mjo_phase: Madden-Julian Oscillation phase
mjo_amplitude: Madden-Julian Oscillation amplitude
nao: North Atlantic Oscillation index
nino1+2, nino3, nino4, nino3.4: Niño SST indices
ssw: sudden stratospheric warming index
Spatial Climate Variables
elevation
masks: latitude and longitude coverage of the United States, North Atlantic, and North Pacific Ocean, etc.
SubX (NCEP-CFSv2 and GMAO-GEOS)
the climatology of tmp2m of weeks 3 and 4 computed from the corresponding hindcast periods
the SubX hindcasts of tmp2m anomalies of weeks 3 and 4 over the western U.S. (1999 - 2015)
the SubX forecasts of tmp2m anomalies of weeks 3 and 4 over the western U.S. (2017 - 2020)
the initialization dates in the forecast periods of NCEP-CFSv2 and GMAO-GEOS
Groundtruth (computed from NOAA's Climate Prediction Center (CPC) Global Gridded Temperature dataset)
the temperature anomalies of weeks 3 and 4 over the western U.S. from 1990 to 2020
the climatology computed from 1990 to 2017 for each grid point over the western U.S. and each month-day combination
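The relationship between a per-grid-point, per-month-day climatology and the resulting anomalies can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the index level names, the function name climatology_and_anomaly, and the toy values are assumptions, and any smoothing or weekly averaging applied in the actual dataset is not reproduced.

```python
import pandas as pd

def climatology_and_anomaly(series, clim_start, clim_end):
    """Return (climatology, anomalies) for a Series indexed by (lat, lon, start_date).

    The index level names are assumptions made for this sketch.
    """
    frame = series.rename("value").reset_index()
    frame["month"] = frame["start_date"].dt.month
    frame["day"] = frame["start_date"].dt.day

    # Restrict the averaging to the climatology period.
    in_period = frame["start_date"].between(
        pd.Timestamp(clim_start), pd.Timestamp(clim_end)
    )
    keys = ["lat", "lon", "month", "day"]
    clim = frame[in_period].groupby(keys)["value"].mean().rename("clim").reset_index()

    # Attach the matching climatological mean to every row and subtract it.
    merged = frame.merge(clim, on=keys, how="left")
    merged["anom"] = merged["value"] - merged["clim"]
    anom = merged.set_index(["lat", "lon", "start_date"])["anom"]
    return clim.set_index(keys)["clim"], anom


# Toy data: one grid point, two days in each of two years.
dates = pd.to_datetime(["2000-01-01", "2000-01-02", "2001-01-01", "2001-01-02"])
index = pd.MultiIndex.from_product(
    [[40.0], [250.0], dates], names=["lat", "lon", "start_date"]
)
tmp2m = pd.Series([1.0, 2.0, 3.0, 4.0], index=index, name="tmp2m")

clim, anom = climatology_and_anomaly(tmp2m, "2000-01-01", "2001-12-31")
```

For the period described above, the same function would be called with clim_start="1990-01-01" and clim_end="2017-12-31", assuming the input covers those years.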
Preprocessed data
pca-zscore: This folder contains the covariates after preprocessing (PCA first, then z-scoring) from 1986 to 2018.
train-validation: This folder contains the training and validation data used for hyperparameter tuning.
train-test: This folder contains the training and test data used for tmp2m forecasting over 2017-2018.
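The "PCA first, then z-score" order used for the pca-zscore covariates can be sketched with NumPy alone. This is an illustrative sketch under assumptions, not the released preprocessing code: the function name pca_then_zscore, the number of components, and the choice to fit both steps on the training rows only are all hypothetical.

```python
import numpy as np

def pca_then_zscore(train, other, n_components):
    """Project onto the leading principal components, then standardize each one.

    Both steps are fit on `train` only (an assumption of this sketch) and
    then applied unchanged to `other`.
    """
    mean = train.mean(axis=0)
    # SVD of the centered training matrix; rows of vt are principal directions.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    components = vt[:n_components]

    pcs_train = (train - mean) @ components.T
    pcs_other = (other - mean) @ components.T

    # Z-score each principal component using training statistics.
    mu = pcs_train.mean(axis=0)
    sigma = pcs_train.std(axis=0)
    return (pcs_train - mu) / sigma, (pcs_other - mu) / sigma


# Synthetic covariates: 50 training rows, 10 held-out rows, 8 features.
rng = np.random.default_rng(0)
train = rng.normal(size=(50, 8))
test = rng.normal(size=(10, 8))
z_train, z_test = pca_then_zscore(train, test, n_components=3)
```

Fitting on the training rows only avoids leaking held-out statistics into the covariates. Requesting more components than the rank of the training matrix would give zero-variance components and a division by zero, so n_components should stay below that rank.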
Sample Data
Sea surface temperature on Jan. 1, 2019
Multi-index DataFrame of a spatiotemporal climate variable (sst in 2020)
DataFrame of a temporal climate variable (MJO amplitude)
Downloads
Please download the dataset using Google Drive.
The original sources of the climate variables are listed in Appendix B.1 of "Sub-Seasonal Climate Forecasting via Machine Learning: Challenges, Analysis, and Advances" and in Table 2 of "Learning and Dynamical Models for Sub-seasonal Climate Forecasting: Comparison and Collaboration".
Citation
@inproceedings{ssf_dataset,
title={Sub-Seasonal Climate Forecasting via Machine Learning: Challenges, Analysis, and Advances},
author={He, Sijie and Li, Xinyan and DelSole, Timothy and Ravikumar, Pradeep and Banerjee, Arindam},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={35},
number={1},
pages={169--177},
year={2021}
}
@article{ssf_mip,
title={Learning and Dynamical Models for Sub-seasonal Climate Forecasting: Comparison and Collaboration},
author={He, Sijie and Li, Xinyan and Trenary, Laurie and Cash, Benjamin and DelSole, Timothy and Banerjee, Arindam},
journal={arXiv preprint arXiv:2110.05196},
year={2021}
}
Faculty
Arindam Banerjee (UIUC)
Timothy DelSole (GMU)
Pradeep Ravikumar (CMU)
Benjamin A. Cash (GMU)
Laurie Trenary (GMU)
The SSF dataset is part of the Harnessing the Data Revolution (HDR) project on 'Physics-Based Machine Learning for Sub-Seasonal Climate Forecasting'.