GraphCast: Learning skillful medium-range global weather forecasting

Ferran Alet, Google DeepMind

Slides (pptx, pdf)

Abstract:
Global medium-range weather forecasting is critical to decision-making across many social and economic domains. Traditional numerical weather prediction uses increased compute resources to improve forecast accuracy, but cannot directly use historical weather data to improve the underlying model. We introduce a machine learning-based method called “GraphCast”, which can be trained directly from reanalysis data. It predicts hundreds of weather variables, over 10 days at 0.25° resolution globally, in under one minute. We show that GraphCast significantly outperforms the most accurate operational deterministic systems on 90% of 1380 verification targets, and its forecasts support better severe event prediction, including tropical cyclones, atmospheric rivers, and extreme temperatures. GraphCast is a key advance in accurate and efficient weather forecasting, and helps realize the promise of machine learning for modeling complex dynamical systems.

Bio:
Ferran Alet is a Research Scientist at Google DeepMind working on ML for Science, particularly weather forecasting and mathematics. His research leverages techniques from meta-learning and program synthesis, and insights from mathematics and the physical sciences. He completed his PhD at MIT CSAIL advised by Leslie Kaelbling, Tomas Lozano-Perez, and Joshua Tenenbaum. During his PhD, he created the MIT Embodied Intelligence Seminar, mentored 17 students, and won the MIT Outstanding Mentor award 2021. Ferran studied mathematics and physics in Barcelona thanks to CFIS, a program for doing two degrees, where he was the valedictorian of his promotion. In grad school, he earned a “La Caixa” fellowship and was responsible for the high-level planner of the MIT-Princeton team for the Amazon Robotics Challenge, which won the stowing task in 2017. You can find more information and papers at https://web.mit.edu/alet/www

Summary:

State of the Earth’s weather described as a 3D global map of physical quantities (pressure, humidity, temperature, etc.)
Dynamics traditionally modeled by approximate differential equations
Goal:
- Learn the approximate physical laws governing weather dynamics from real data
- Medium-range forecast: 10 days
- Evaluation using skill metrics and prediction of extreme weather events
- Code: https://github.com/google-deepmind/graphcast
- Paper: https://arxiv.org/abs/2212.12794
Data
- 37 levels of atmospheric pressure
- Temperature, humidity, wind, lat/lon, land/sea, sun
- 0.25° lat/lon resolution (ERA5)
- Very large dataset by ML standards
- ~35GB
- ERA5: reanalysis of prior weather observations
  - From numerical weather simulations
  - 40+ years of data, high quality
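To give a feel for the scale of a single weather state, the grid dimensions implied by the figures above can be worked out directly. This is a rough arithmetic sketch: it assumes the 0.25° grid includes both poles (giving 721 latitude rows) and uses the 6-hour step from the notes; the variable names are illustrative only.

```python
# Grid-size arithmetic for a 0.25-degree global lat/lon grid.
RESOLUTION_DEG = 0.25
N_LEVELS = 37          # pressure levels, from the notes
STEP_HOURS = 6         # time step used as model input, from the notes

n_lat = int(180 / RESOLUTION_DEG) + 1   # assumes both poles are grid rows
n_lon = int(360 / RESOLUTION_DEG)       # longitude wraps, so no extra column

points_per_level = n_lat * n_lon
points_all_levels = points_per_level * N_LEVELS
steps_per_year = (24 // STEP_HOURS) * 365

print(f"{n_lat} x {n_lon} = {points_per_level:,} grid points per level")
print(f"{points_all_levels:,} values per atmospheric variable per time step")
print(f"{steps_per_year} time steps per (non-leap) year")
```

Each atmospheric variable therefore contributes tens of millions of values per time step, before multiplying by the number of variables and the 40+ years of history.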
Available compute:
- Training: 32 TPUs
- Prediction: about 1 minute on a single TPU, vs. about 1 hour on ~10k CPUs for numerical simulations
Approach:
- Create multiple icosahedral meshes of the earth sphere with different resolutions: capture different causal distances/delays
- Input physical weather data on a lat/lon grid at 6-hour, 0.25° resolution
- Encode the state at each lat/lon grid point onto the icosahedral mesh using a neural net, producing an embedded weather state at each mesh node
- Given the embedded node states at the current and previous time steps, predict the embedded state at adjacent mesh nodes at the next time step
- Decode that state back to physical quantities at the lat/lon grid points corresponding to each mesh node, and compute the loss on the predicted physical quantities
  - Mean squared error on physical quantities
  - Averaged across different
    - lead times,
    - globe locations, and
    - physical variables: different scales, so weighted by
      - inverse residual variance,
      - importance for human applications, and
      - area of lat/lon grid cells (cells are larger at the equator than near the poles)
- Autoregressive training for 3 weeks
- Fine-tune for 1 week
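The autoregressive loss described above can be sketched in a few lines. This is a minimal illustration, not GraphCast's implementation: the function names (`latitude_weights`, `weighted_mse`, `rollout_loss`) are hypothetical, the `step` argument stands in for the whole encode/process/decode network, and the area weight is assumed to be proportional to the cosine of latitude, normalized to mean 1.

```python
import math
from typing import Callable, List, Sequence

Grid = List[List[float]]  # one physical variable on a [lat][lon] grid


def latitude_weights(lats_deg: Sequence[float]) -> List[float]:
    """Area weights proportional to cos(latitude), normalized to mean 1."""
    raw = [math.cos(math.radians(lat)) for lat in lats_deg]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def weighted_mse(pred: Grid, target: Grid,
                 lat_w: Sequence[float], var_w: float = 1.0) -> float:
    """Area-weighted MSE for one variable at one lead time.

    var_w is a per-variable weight (e.g. an inverse residual variance).
    """
    total, count = 0.0, 0
    for i, w in enumerate(lat_w):
        for p, t in zip(pred[i], target[i]):
            total += w * (p - t) ** 2
            count += 1
    return var_w * total / count


def rollout_loss(step: Callable[[Grid, Grid], Grid],
                 prev: Grid, curr: Grid, targets: Sequence[Grid],
                 lat_w: Sequence[float], var_w: float = 1.0) -> float:
    """Average the weighted MSE over an autoregressive rollout."""
    losses = []
    for target in targets:
        nxt = step(prev, curr)      # predict from the last two states
        losses.append(weighted_mse(nxt, target, lat_w, var_w))
        prev, curr = curr, nxt      # feed the prediction back in
    return sum(losses) / len(losses)
```

For example, passing `lambda prev, curr: curr` as `step` scores a persistence forecast. With several variables, one would sum `rollout_loss` over variables, each with its own `var_w`.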
Evaluation
- Whole-world metrics:
  - Comparing GraphCast to HRES (top numerical simulation)
  - Better skill over all prediction horizons (0-10 days) and physical quantities
- Prediction of severe weather
  - Tropical cyclones
    - Predicted previously observed cyclones, compared to HRES
    - GraphCast’s predicted cyclone paths are much more accurate than HRES’:
      - Median track error: gives warnings with the same level of accuracy 9 hours earlier than HRES
      - Median paired error: identifies the cyclone center 20 km more precisely for the same accuracy level
  - Atmospheric rivers
    - Transport water vapor away from the tropics, delivering it as heavy rain
    - Improved prediction of the physical properties of these high-humidity zones
  - Extreme heat
    - Predict when surface temperature will reach the top 2% of extremes.
    - GraphCast is better at 5-day and 10-day lead times but worse at 12 hours; consistent with the RMSE results.
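As an illustration of the cyclone track metric, the sketch below computes a median track error as the median great-circle distance between predicted and observed cyclone centers. It is a simplified stand-in for the evaluation described in the talk: the haversine formula on a spherical Earth is an assumption, the function names are hypothetical, and the cyclone tracking step that produces the center positions is not shown.

```python
import math
from statistics import median
from typing import Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


def great_circle_km(a: LatLon, b: LatLon) -> float:
    """Haversine distance between two points on a spherical Earth."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlam = math.radians(b[1] - a[1])
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to guard against rounding slightly above 1 near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def median_track_error_km(predicted: Sequence[LatLon],
                          observed: Sequence[LatLon]) -> float:
    """Median distance between matched predicted and observed centers."""
    return median(great_circle_km(p, o) for p, o in zip(predicted, observed))
```

Computing this per lead time for two forecast systems gives two error curves; a statement like "the same accuracy 9 hours earlier" corresponds to a horizontal shift between those curves.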
Challenges in evaluating accuracy
- What is ground truth?
  - The ERA5 reanalysis is itself produced by numerical simulations constrained by past data
  - For HRES, the operational analysis (stored results from prior runs) is used. This is not equal to the ERA5 data.
  - Thus, GraphCast is fine-tuned on the HRES operational analysis and all its predictions target it, rather than ERA5.
- Leakage from future:
  - ERA5 uses 12h data assimilation windows, with 00z/12z states being influenced by observations up to 9h into the future.
  - HRES operational analysis uses 6h windows, with states influenced only 3h into the future.
  - As such, the comparison uses ERA5 06z/18z states, which are also influenced only 3h into the future.
- RMSE tends to be lower when forecasts are spatially blurred
  - GraphCast’s use of RMSE as loss encourages it to use optimal blurring, which may cheat the metric
  - To make the evaluation fairer, HRES predictions were also allowed to be blurred.
  - This improves HRES’ RMSE but GraphCast is still better
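The effect of blurring on RMSE is easy to reproduce on a toy example. In the sketch below, a forecast places a sharp feature one grid cell away from where it actually occurs; smoothing that forecast lowers its RMSE even though it no longer contains a realistic sharp feature. The 3-point moving average is a simple stand-in chosen for illustration, not the blurring procedure used in the GraphCast evaluation.

```python
import math
from typing import List, Sequence


def rmse(forecast: Sequence[float], truth: Sequence[float]) -> float:
    """Root mean squared error between two equal-length fields."""
    return math.sqrt(
        sum((f - t) ** 2 for f, t in zip(forecast, truth)) / len(truth)
    )


def blur3(field: Sequence[float]) -> List[float]:
    """3-point moving average with zero padding at both ends."""
    padded = [0.0] + list(field) + [0.0]
    return [
        (padded[i] + padded[i + 1] + padded[i + 2]) / 3
        for i in range(len(field))
    ]


truth = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]   # feature at index 3
sharp = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]   # feature misplaced by one cell
blurred = blur3(sharp)

print(f"sharp forecast RMSE:   {rmse(sharp, truth):.3f}")
print(f"blurred forecast RMSE: {rmse(blurred, truth):.3f}")
```

The sharp forecast is penalized twice (a miss at the true location and a false alarm at the predicted one), while the blurred forecast spreads its error out, which is why comparing a blurred and an unblurred system on RMSE alone can be misleading.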
Geographic biases
- E.g., GraphCast predicts Antarctica as colder than observed and North America as warmer, for longer lead-time predictions
- But this is in line with the biases of other weather models
Distribution shift?
- When predicting for 2021 it helps to train on reanalyses from 2020
- Predictions from models trained on recent data are better than predictions from models trained only on data before 2018
- Has the climate shifted? Has observational technology shifted?