Sub-Seasonal Climate Forecasting (SSF) Dataset

University of Illinois at Urbana-Champaign (UIUC) University of Minnesota, Twin Cities (UMN)

George Mason University (GMU) Carnegie Mellon University (CMU)

Overview

The SSF climate dataset is a benchmark for training and evaluating machine learning models for sub-seasonal climate forecasting. It includes a variety of high-resolution climate variables covering the atmosphere, ocean, and land from 1980 to the present.

The code for data extraction, preprocessing, and SSF model training and evaluation on the SSF dataset is publicly available at https://github.com/SSF-climate/SSF and https://github.com/Sijie-umn/SSF-MIP/.



Details

The SSF dataset is constructed from climate variables representing the condition of the atmosphere, land, and ocean from 1980 to the present, collected from a diverse set of data sources. In addition, we include climate indices that monitor, among others, the El Niño-Southern Oscillation (Niño indices) and the North Atlantic Oscillation (NAO). All climate variables are interpolated to a 0.5 degree latitude by 0.5 degree longitude grid at daily temporal resolution. The climate variables of each year are saved as multi-index pandas DataFrames and can be read using Python 3.
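As a rough illustration of how a multi-index pandas DataFrame on a 0.5 degree grid can be queried, the sketch below builds a small synthetic frame. The index level names (`lat`, `lon`, `start_date`), the grid extent, and the random values are assumptions made for this example, not the documented layout of the SSF files.

```python
import numpy as np
import pandas as pd

# Hypothetical layout: one row per (lat, lon, start_date), one column per variable.
# Level names, grid extent, and values are assumptions for illustration only.
lats = np.arange(30.0, 31.5, 0.5)      # 30.0, 30.5, 31.0 on a 0.5-degree grid
lons = np.arange(240.0, 241.0, 0.5)    # 240.0, 240.5
dates = pd.date_range("2020-01-01", periods=4, freq="D")

index = pd.MultiIndex.from_product(
    [lats, lons, dates], names=["lat", "lon", "start_date"]
)
rng = np.random.default_rng(0)
df = pd.DataFrame({"sst": rng.normal(15.0, 1.0, size=len(index))}, index=index)

# All grid points on a single day: rows indexed by (lat, lon).
one_day = df.xs(pd.Timestamp("2020-01-02"), level="start_date")

# Full daily time series at a single grid point: rows indexed by start_date.
one_point = df.xs((30.5, 240.0), level=["lat", "lon"])

# Reshape one day into a 2-D lat x lon table, e.g. for plotting a map.
grid = one_day["sst"].unstack("lon")
```

The same `xs` and `unstack` calls apply to any frame whose index has this shape, whatever the actual level names in the downloaded files turn out to be.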

Spatiotemporal Climate Variables

  • tmp2m: 2 meter temperature

  • precip: precipitation

  • sst: sea surface temperature

  • slp: sea level pressure

  • icec: sea ice concentration

  • hgt10, hgt200, hgt500, hgt700: geopotential height at 10mb, 200mb, 500mb, and 700mb

  • rhum.sig995: relative humidity at the 0.995 sigma level

  • sm: soil moisture

Temporal Climate Variables

  • mei: Multivariate ENSO Index Version 2

  • mjo_phase: Madden-Julian Oscillation phase

  • mjo_amplitude: Madden-Julian Oscillation amplitude

  • nao: North Atlantic Oscillation index

  • nino1+2, nino3, nino4, nino3.4: Niño SST indices

  • ssw: sudden stratospheric warming index

Spatial Climate Variables

  • elevation

  • masks: latitude and longitude coverage of the United States, North Atlantic, and North Pacific Ocean, etc.

SubX (NCEP-CFSv2 and GMAO-GEOS)

  • the climatology of tmp2m of weeks 3 and 4 computed from the corresponding hindcast periods

  • the SubX hindcasts of tmp2m anomalies of weeks 3 and 4 over the western U.S. (1999 - 2015)

  • the SubX forecasts of tmp2m anomalies of weeks 3 and 4 over the western U.S. (2017 - 2020)

  • the initialization dates in the forecast periods of NCEP-CFSv2 and GMAO-GEOS

Ground truth (computed from NOAA's Climate Prediction Center (CPC) Global Gridded Temperature dataset)

  • the week 3 & 4 temperature anomalies over the western U.S. from 1990 to 2020

  • the climatology computed from 1990 to 2017 for each grid point over the western U.S. and each month-day combination
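The climatology and anomaly definitions above can be sketched as follows: average each grid point's temperature over all years for every month-day combination, then subtract that average from each observation. The column names, the three synthetic years, and the random values are assumptions for this example. The dataset itself uses 1990 to 2017 as the climatology period, and its anomalies refer to weeks 3 and 4 rather than single days.

```python
import numpy as np
import pandas as pd

# Synthetic daily temperatures for two grid points over three years
# (column names and values are assumptions for illustration only).
dates = pd.date_range("1990-01-01", "1992-12-31", freq="D")
index = pd.MultiIndex.from_product(
    [[40.0], [250.0, 250.5], dates], names=["lat", "lon", "start_date"]
)
rng = np.random.default_rng(0)
df = pd.DataFrame({"tmp2m": rng.normal(10.0, 5.0, size=len(index))}, index=index)
df = df.reset_index()

# Month-day keys identify the calendar day independently of the year.
df["month"] = df["start_date"].dt.month
df["day"] = df["start_date"].dt.day
keys = ["lat", "lon", "month", "day"]

# Climatology: mean over years for each grid point and month-day combination.
climatology = df.groupby(keys)["tmp2m"].mean()

# Anomaly: observation minus the climatology of its grid point and month-day.
df["anomaly"] = df["tmp2m"] - df.groupby(keys)["tmp2m"].transform("mean")
```

By construction the anomalies average to zero within each grid point and month-day group, which is a quick sanity check when applying the same steps to real data.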

Preprocessed data

  • pca-zscore: This folder contains the covariates from 1986 to 2018 after preprocessing (PCA first, then z-scoring).

  • train-validation: This folder contains the training and validation data used for hyperparameter tuning.

  • train-test: This folder contains the training and test data used for tmp2m forecasting over 2017-2018.
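The "PCA first, then z-score" order used for the pca-zscore folder can be sketched with plain NumPy. The matrix sizes, the number of retained components, and the choice to fit both steps on the training rows only are assumptions for this example. The source does not state how many components were kept or which period the statistics were fitted on.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic covariate: 200 days x 50 flattened grid points (hypothetical sizes).
X = rng.normal(size=(200, 50))
n_train, n_components = 150, 10   # assumed split and component count

# Step 1: PCA fitted on the training rows via SVD of the centered matrix.
mean = X[:n_train].mean(axis=0)
_, _, vt = np.linalg.svd(X[:n_train] - mean, full_matrices=False)
components = vt[:n_components]            # shape (n_components, n_features)

# Project every row (training and later) onto the leading components.
scores = (X - mean) @ components.T        # shape (200, n_components)

# Step 2: z-score each principal component using training-period statistics.
mu = scores[:n_train].mean(axis=0)
sigma = scores[:n_train].std(axis=0)
Z = (scores - mu) / sigma
```

Fitting the projection and the scaling on the training rows alone keeps information from the held-out period out of the features, which is why the sketch makes that choice.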

Sample Data

Sea surface temperature on Jan. 1, 2019

Multi-index DataFrame of a spatiotemporal climate variable (sst in 2020)

DataFrame of a temporal climate variable (MJO amplitude)

Downloads

Citation

@inproceedings{ssf_dataset,
  title={Sub-Seasonal Climate Forecasting via Machine Learning: Challenges, Analysis, and Advances},
  author={He, Sijie and Li, Xinyan and DelSole, Timothy and Ravikumar, Pradeep and Banerjee, Arindam},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={35},
  number={1},
  pages={169--177},
  year={2021}
}

@article{ssf_mip,
  title={Learning and Dynamical Models for Sub-seasonal Climate Forecasting: Comparison and Collaboration},
  author={He, Sijie and Li, Xinyan and Trenary, Laurie and Cash, Benjamin and DelSole, Timothy and Banerjee, Arindam},
  journal={arXiv preprint arXiv:2110.05196},
  year={2021}
}

Research Team

Graduate Students

Acknowledgments

The SSF dataset is part of the Harnessing the Data Revolution (HDR) project on 'Physics-Based Machine Learning for Sub-Seasonal Climate Forecasting'.


Contacts
Please contact Sijie He and Xinyan Li for questions about the dataset.