GraphCast: Learning skillful medium-range global weather forecasting
Abstract:
Global medium-range weather forecasting is critical to decision-making across many social and economic domains. Traditional numerical weather prediction uses increased compute resources to improve forecast accuracy, but cannot directly use historical weather data to improve the underlying model. We introduce a machine learning-based method called “GraphCast”, which can be trained directly from reanalysis data. It predicts hundreds of weather variables, over 10 days at 0.25° resolution globally, in under one minute. We show that GraphCast significantly outperforms the most accurate operational deterministic systems on 90% of 1380 verification targets, and its forecasts support better severe event prediction, including tropical cyclones, atmospheric rivers, and extreme temperatures. GraphCast is a key advance in accurate and efficient weather forecasting, and helps realize the promise of machine learning for modeling complex dynamical systems.
Bio:
Ferran Alet is a Research Scientist at Google DeepMind working on ML for Science, particularly weather forecasting and mathematics. His research leverages techniques from meta-learning and program synthesis, and insights from mathematics and the physical sciences. He completed his PhD at MIT CSAIL advised by Leslie Kaelbling, Tomas Lozano-Perez, and Joshua Tenenbaum. During his PhD, he created the MIT Embodied Intelligence Seminar, mentored 17 students, and won the 2021 MIT Outstanding Mentor award. Ferran studied mathematics and physics in Barcelona thanks to CFIS, a double-degree program, where he was the valedictorian of his class. In grad school, he earned a “La Caixa” fellowship and was responsible for the high-level planner of the MIT-Princeton team for the Amazon Robotics Challenge, which won the stowing task in 2017. You can find more information and papers at https://web.mit.edu/alet/www
Summary:
State of the Earth’s weather described as a 3D global map of physical quantities (pressure, humidity, temperature, etc.)
Dynamics traditionally modeled by approximate differential equations
Goal:
Learn the approximate physical laws governing weather dynamics from real data
Medium-range forecast: 10 days
Evaluation using skill metrics and prediction of extreme weather events
Data
37 levels of atmospheric pressure
Temperature, humidity, wind, lat/lon, land/sea, sun
0.25° lat/lon resolution (ERA5)
Very large dataset by ML standards
~35GB
ERA5: reanalysis of prior weather observations
From numerical weather simulations
40+ years of data, high quality
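To give a feel for the scale of one weather state, the sketch below counts grid points on a regular 0.25° lat/lon grid with 37 pressure levels. It assumes the grid includes both poles; the function name is illustrative and not taken from the talk.

```python
def grid_shape(resolution_deg):
    """Return (n_lat, n_lon) for a regular lat/lon grid that includes both poles."""
    n_lat = int(round(180 / resolution_deg)) + 1  # -90 to +90 inclusive
    n_lon = int(round(360 / resolution_deg))      # 0 up to, but excluding, 360
    return n_lat, n_lon


n_lat, n_lon = grid_shape(0.25)
points_per_level = n_lat * n_lon              # horizontal grid points
values_per_variable = points_per_level * 37   # one variable on 37 pressure levels

print(n_lat, n_lon, points_per_level, values_per_variable)
```

Each atmospheric variable therefore contributes tens of millions of values per time step, before multiplying by the number of variables and the 6-hourly snapshots over 40+ years.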
Available compute:
Training: 32 TPUs
Prediction: about 1 minute on a single TPU, vs about 1 hour on 10k CPUs for numerical simulations
Approach:
Create multiple icosahedral meshes of the earth sphere with different resolutions: capture different causal distances/delays
Input physical weather data on a lat/lon grid at 6-hour, 0.25° resolution
Encode the state at each lat/lon grid point onto the icosahedral mesh using a neural net, producing an embedded weather state at each icosahedral mesh node
Given the embedded node states at the current and previous time steps, predict the embedded state at adjacent icosahedral mesh nodes at the next time step
Decode that state back to physical states at the lat/lon grid points that correspond to each icosahedral mesh node, and compute the loss on the predicted physical quantities
Mean squared error on physical quantities
Averaged across different
lead times,
globe locations, and
physical variables: different scales, so weighted by
inverse residual variance,
importance for human applications
Area of lat/lon grid cells (cells at the equator are larger than at the poles)
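The weighted loss described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes cell-area weights proportional to cos(latitude) on a lat/lon grid, keeps only a latitude axis for brevity, and uses hypothetical per-variable weights that combine importance with inverse residual variance.

```python
import math


def area_weights(latitudes_deg):
    """Weights proportional to cos(latitude), normalized to have mean 1 (assumption)."""
    raw = [math.cos(math.radians(lat)) for lat in latitudes_deg]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def variable_weight(importance, residual_variance):
    """Hypothetical per-variable weight: importance times inverse residual variance."""
    return importance / residual_variance


def weighted_mse(pred, target, latitudes_deg, variable_weights):
    """Weighted mean squared error.

    pred and target are nested lists indexed as [lead_time][variable][lat_index].
    The error is averaged over lead times, variables, and latitudes.
    """
    lat_w = area_weights(latitudes_deg)
    total, count = 0.0, 0
    for pred_t, target_t in zip(pred, target):
        for v, (pred_v, target_v) in enumerate(zip(pred_t, target_t)):
            for w, p, y in zip(lat_w, pred_v, target_v):
                total += variable_weights[v] * w * (p - y) ** 2
                count += 1
    return total / count


# One lead time, one variable, three latitudes: the same error counts
# twice as much at the equator as at 60 degrees latitude.
lats = [-60.0, 0.0, 60.0]
zeros = [[[0.0, 0.0, 0.0]]]
loss_equator = weighted_mse([[[0.0, 1.0, 0.0]]], zeros, lats, [1.0])
loss_60_north = weighted_mse([[[0.0, 0.0, 1.0]]], zeros, lats, [1.0])
```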
Autoregressive training for 3 weeks
Fine-tune for 1 week
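The multi-resolution icosahedral meshes used in the approach can be illustrated with a small sketch. It builds a unit icosahedron, refines it by splitting each triangle into four, and collects the edges from every refinement level so that both long-range and short-range connections exist on the finest node set. The function names and the number of refinements are illustrative assumptions, not details from the talk.

```python
import math


def normalize(v):
    """Project a 3D point onto the unit sphere."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def icosahedron():
    """Return the 12 vertices and 20 triangular faces of a unit icosahedron."""
    t = (1 + math.sqrt(5)) / 2
    raw = [(-1, t, 0), (1, t, 0), (-1, -t, 0), (1, -t, 0),
           (0, -1, t), (0, 1, t), (0, -1, -t), (0, 1, -t),
           (t, 0, -1), (t, 0, 1), (-t, 0, -1), (-t, 0, 1)]
    faces = [(0, 11, 5), (0, 5, 1), (0, 1, 7), (0, 7, 10), (0, 10, 11),
             (1, 5, 9), (5, 11, 4), (11, 10, 2), (10, 7, 6), (7, 1, 8),
             (3, 9, 4), (3, 4, 2), (3, 2, 6), (3, 6, 8), (3, 8, 9),
             (4, 9, 5), (2, 4, 11), (6, 2, 10), (8, 6, 7), (9, 8, 1)]
    return [normalize(v) for v in raw], faces


def subdivide(verts, faces):
    """Split every triangle into four, reusing midpoints shared between faces."""
    verts = list(verts)
    midpoint_index = {}

    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in midpoint_index:
            mid = tuple((a + b) / 2 for a, b in zip(verts[i], verts[j]))
            midpoint_index[key] = len(verts)
            verts.append(normalize(mid))
        return midpoint_index[key]

    new_faces = []
    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_faces += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
    return verts, new_faces


def multi_mesh_edges(refinements):
    """Union of undirected edges from every refinement level, on the finest nodes."""
    verts, faces = icosahedron()
    edges = set()
    for level in range(refinements + 1):
        for a, b, c in faces:
            for i, j in ((a, b), (b, c), (c, a)):
                edges.add((min(i, j), max(i, j)))
        if level < refinements:
            verts, faces = subdivide(verts, faces)
    return verts, edges


nodes, edges = multi_mesh_edges(2)
print(len(nodes), len(edges))
```

Because refinement only appends vertices, coarse-level nodes keep their indices, so coarse edges act as long-range shortcuts between nodes of the finest mesh. The node count after r refinements is 10 * 4**r + 2.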
Evaluation
Whole-world metrics:
Comparing GraphCast to HRES (top numerical simulation)
Better skill across prediction horizons (0-10 days) and physical quantities: 90% of 1380 verification targets, per the abstract
Prediction of severe weather
Tropical cyclones
Forecast previously observed cyclones and compared the tracks against HRES
GraphCast’s predicted cyclone paths are much more accurate than HRES’:
Median track error: gives warnings at the same level of accuracy 9 hr earlier than HRES
Median paired error: identifies the center of the cyclone 20 km more precisely for the same accuracy level
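A track error of this kind is typically a great-circle distance between predicted and observed cyclone centers. The sketch below uses the haversine formula with a mean Earth radius of 6371 km and takes the median over matched track points. The function names and the track coordinates are hypothetical, and the exact matching procedure used in the evaluation is not specified in these notes.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def median_track_error_km(predicted, observed):
    """Median distance between matched predicted and observed cyclone centers."""
    return median(great_circle_km(p[0], p[1], o[0], o[1])
                  for p, o in zip(predicted, observed))


# Hypothetical track: three (lat, lon) positions at matching times.
observed_track = [(15.0, -60.0), (16.0, -62.0), (17.5, -64.0)]
predicted_track = [(15.2, -60.1), (16.5, -62.4), (18.0, -65.0)]
error_km = median_track_error_km(predicted_track, observed_track)
```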
Atmospheric rivers
Transport water vapor away from the tropics, delivering it as heavy rain
Improved prediction of the physical properties of these high-humidity zones
Extreme heat
Predict when surface temperature will reach top 2% extreme.
GraphCast is better at 5d, 10d but worse at 12h; consistent with RMSE results.
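One way to score "top 2%" events is to derive a threshold from a climatological sample and then compare forecast and observed exceedances. The sketch below does this with precision and recall. The threshold rule, the function names, and the sample values are assumptions for illustration only.

```python
def top_fraction_threshold(climatology, top_fraction=0.02):
    """Smallest value that belongs to the top `top_fraction` of the sample."""
    ordered = sorted(climatology)
    n_top = max(1, round(len(ordered) * top_fraction))
    return ordered[-n_top]


def precision_recall(forecast, observed, threshold):
    """Precision and recall of forecast exceedances against observed exceedances."""
    pairs = list(zip(forecast, observed))
    tp = sum(1 for f, o in pairs if f >= threshold and o >= threshold)
    fp = sum(1 for f, o in pairs if f >= threshold and o < threshold)
    fn = sum(1 for f, o in pairs if f < threshold and o >= threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical climatology of 100 temperature values; the top 2% are 98 and 99.
climatology = list(range(100))
threshold = top_fraction_threshold(climatology)

observed = [99, 50, 98, 10]
forecast = [98.5, 99, 60, 5]
precision, recall = precision_recall(forecast, observed, threshold)
```

Sweeping the forecast threshold while keeping the observed threshold fixed would trace out a precision-recall curve, which makes it possible to compare two forecasting systems at several lead times.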
Challenges in evaluating accuracy
What is ground truth?
ERA5 reanalysis is itself produced by numerical simulations constrained by past observations
For HRES, the reference is the operational analysis (stored results from prior runs), which is not equal to ERA5 data.
Thus, all GraphCast predictions are fine-tuned to the HRES operational analysis and predict it, rather than ERA5.
Leakage from future:
ERA5 uses 12hr data assimilation windows, with 00z/12z states being influenced 9h into the future.
HRES operational analysis uses 6hr windows, with states influenced only 3h into the future.
As such, the comparison uses ERA5 06z/18z states, which are also influenced only 3h into the future.
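The lookahead figures above follow from where an initialization time falls inside its assimilation window. The sketch below reproduces them under assumed window boundaries: 09z-21z and 21z-09z for the 12hr windows, and 6hr windows starting at 03z, 09z, 15z and 21z. These boundaries were chosen to match the numbers in the notes and should be checked against the data providers' documentation.

```python
def lookahead_hours(init_hour_utc, window_starts_utc, window_length_h):
    """Hours of future observations in the assimilation window containing init_hour_utc."""
    for start in window_starts_utc:
        offset = (init_hour_utc - start) % 24
        if offset < window_length_h:
            return window_length_h - offset
    raise ValueError("no assimilation window contains this hour")


ERA5_WINDOWS = ([9, 21], 12)         # assumed: 09z-21z and 21z-09z
HRES_WINDOWS = ([3, 9, 15, 21], 6)   # assumed: 6hr windows

for hour in (0, 6, 12, 18):
    print(f"{hour:02d}z  ERA5 lookahead: {lookahead_hours(hour, *ERA5_WINDOWS)}h")
for hour in (0, 12):
    print(f"{hour:02d}z  HRES lookahead: {lookahead_hours(hour, *HRES_WINDOWS)}h")
```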
RMSE tends to be lower when forecasts are spatially blurred
GraphCast’s use of RMSE as loss encourages it to use optimal blurring, which may cheat the metric
To make the evaluation fairer, HRES was also allowed to blur its predictions.
This improves HRES’ RMSE but GraphCast is still better
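A toy example shows why blurring can lower RMSE. When a sharp feature is forecast one grid cell away from its true position, the sharp forecast is penalized twice (a miss plus a false alarm), while a smoothed version of the same forecast scores better. The 1D periodic field and the box blur below are illustrative assumptions, not the filter used in the evaluation.

```python
import math


def rmse(pred, truth):
    """Root mean squared error between two equally sized sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))


def box_blur(field, radius=1):
    """Moving average over a periodic 1D field (periodic like longitude)."""
    n = len(field)
    blurred = []
    for i in range(n):
        window = [field[(i + k) % n] for k in range(-radius, radius + 1)]
        blurred.append(sum(window) / len(window))
    return blurred


truth = [0.0] * 10
truth[4] = 1.0            # true feature at index 4

sharp = [0.0] * 10
sharp[5] = 1.0            # forecast feature displaced by one cell

blurred = box_blur(sharp)

rmse_sharp = rmse(sharp, truth)
rmse_blurred = rmse(blurred, truth)
print(rmse_sharp, rmse_blurred)
```

The blurred forecast is less informative about the feature's position and intensity, yet it receives the lower RMSE, which is why both systems need comparable smoothing for a fair comparison.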
Geographic biases
E.g. GraphCast predicts Antarctica as colder than observed, and North America as warmer, at longer lead times
But this is in line with the biases of other weather models
Distribution shift?
When predicting for 2021 it helps to train on reanalyses from 2020
Predictions from models trained on recent data are better than predictions from models trained only on data from before 2018
Has the climate shifted? Has observational technology shifted?