By downloading the master CSV file enumerating all available data stores, we can load the spreadsheet into a pandas DataFrame and use the CMIP6 controlled vocabulary to search for and explore relevant data:
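For example, a minimal sketch (the catalog URL, column names, and facet values below are assumptions based on the publicly documented Pangeo CMIP6 listing):

    import pandas as pd

    # Read the master CSV enumerating every available Zarr data store
    df = pd.read_csv("https://storage.googleapis.com/cmip6/cmip6-zarr-consolidated-stores.csv")

    # Filter with CMIP6 controlled-vocabulary facets, e.g. monthly near-surface air temperature
    # from the historical experiment
    subset = df.query("table_id == 'Amon' and variable_id == 'tas' and experiment_id == 'historical'")
    print(subset[["source_id", "member_id", "zstore"]].head())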

When working with multiple data stores at the same time, it may be necessary to combine several together to form a dataset for analysis. In these cases, it is easier to access them using an ESM collection with intake-esm. An ESM collection contains metadata describing how data stores can be combined to yield highly aggregated datasets, which is used by intake-esm to automatically merge/concatenate them when they are loaded into an xarray container. This eases the burden on the user to manually combine data, while still offering the ability to search and explore all of the available data stores.
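A minimal sketch of opening such a collection (the collection JSON URL is an assumption based on the public Pangeo catalog; intake-esm must be installed):

    import intake

    # Open the ESM collection and search it with the same controlled-vocabulary facets
    col = intake.open_esm_datastore("https://storage.googleapis.com/cmip6/pangeo-cmip6.json")
    cat = col.search(experiment_id="historical", table_id="Amon", variable_id="tas")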


Download CMIP6 Data With Python


This gives a summary of the ESM collection, including the total number of Zarr data stores (referred to as assets), along with the total number of datasets these Zarr data stores correspond to. The collection can also be viewed as a DataFrame:
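Assuming the collection object from the sketch above is named col, the summary and the underlying table can be inspected like this:

    # Printing the collection shows the asset/dataset counts; .df exposes the catalog as a DataFrame
    print(col)
    print(col.df.head())
    print(col.df.nunique())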

Though these aggregation specifications are sufficient to merge individual data assets into xarray datasets, sometimes additional arguments must be provided depending on the format of the data assets. For example, Zarr-based assets can be loaded with the option consolidated=True, which relies on a consolidated metadata file to describe the assets with minimal data egress:
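A hedged example, continuing from the searched catalog cat above (depending on the intake-esm version, the keyword may be zarr_kwargs or xarray_open_kwargs):

    # Load the matching assets into a dictionary of aggregated xarray Datasets;
    # consolidated=True uses the consolidated Zarr metadata to minimise data egress
    dsets = cat.to_dataset_dict(zarr_kwargs={"consolidated": True})
    print(list(dsets.keys()))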

For the sake of simplicity, and to facilitate data download, the tutorial will make use of some of the coarser-resolution models that have a smaller data size. This is nevertheless only a choice for this exercise and not a recommendation (since ideally all models, including those with the highest resolution, should be used). Many more models are available on the CDS, and when calculating an ensemble of models, it is best practice to use as many as possible for a more reliable output. See here for a full list of models included in the CDS-CMIP6 dataset.

The next step is then to request the data with the help of the CDS API. Below, we loop through multiple data requests. These include data for different models and scenarios. It is not possible to specify multiple models in one data request as their spatial resolution varies.
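A sketch of such a loop with the cdsapi client is shown below; the dataset name, request keys, and model/variable spellings are assumptions, so copy the exact request from the CDS web form for your own case:

    import cdsapi

    c = cdsapi.Client()

    # One request per model and experiment, since models cannot be mixed in a single request
    models = ["ipsl_cm6a_lr", "mpi_esm1_2_lr"]    # illustrative model names
    experiments = ["historical", "ssp5_8_5"]      # illustrative experiment names

    for model in models:
        for experiment in experiments:
            c.retrieve(
                "projections-cmip6",              # assumed CDS dataset name
                {
                    "temporal_resolution": "monthly",
                    "experiment": experiment,
                    "variable": "near_surface_air_temperature",
                    "model": model,
                    "format": "zip",
                },
                f"cmip6_{model}_{experiment}.zip",
            )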

Finally, we will create additional dimensions for the model and for the experiment. These we will label with the model and experiment name as taken from the metadata of the original data (see above). These will be useful when we repeat the processes above for all models and experiments, and combine them into one array.
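One possible way to do this with xarray, assuming ds is one of the loaded datasets and that its source_id and experiment_id are available in the file attributes:

    import xarray as xr

    # Add singleton model/experiment dimensions labelled from the original metadata
    ds = ds.expand_dims(
        model=[ds.attrs.get("source_id", "unknown")],
        experiment=[ds.attrs.get("experiment_id", "unknown")],
    )

    # After repeating this for every model/experiment, the pieces can be combined, e.g.
    # combined = xr.combine_by_coords(list_of_datasets)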

b. During my PhD, I mainly used Matlab and I became quite comfortable with it and good at it too. Back in 2013, Python was not as big as it is today and there was not as much interest/push for open-source data or online tutorials. I remember how difficult it was for me in those days to analyze NetCDF data without making a mistake on the lat/lon or time coordinate (but now xarray has made it unbelievably easy!). It is always difficult to change, and my hope is to showcase the capabilities and convenience of Python, and be a starting point that helps people navigate through their learning journey.

This section focuses on some of the basic commands and essential functions of xarray. We will download and extract daily observed precipitation data (CPC-CONUS from NOAA) for 4 years, and we will practice working with functions such as groupby, concat, and sel & isel (to select data for specific dates or particular locations). In addition, we will learn how to handle leap years, and we will save our desired outputs as NetCDF files. Lastly, we will make simple plots and save them as high-quality figures. In this section, our main focus is on the 2012 precipitation across the CONUS (when an extreme drought caused massive damage to Midwest crops). Two of the plots that are developed in this section are presented here for your reference.
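A brief sketch of the kind of operations practiced there (the file pattern and variable name are assumptions for the CPC-CONUS files):

    import xarray as xr

    ds = xr.open_mfdataset("precip.V1.0.*.nc", combine="by_coords")

    # Label-based selection of the 2012 drought year and of a single grid cell
    precip_2012 = ds["precip"].sel(time=slice("2012-01-01", "2012-12-31"))
    point = precip_2012.sel(lat=41.5, lon=266.5, method="nearest")  # CPC longitudes run 0-360

    # Positional selection and a monthly climatology with groupby
    first_day = ds["precip"].isel(time=0)
    monthly_mean = ds["precip"].groupby("time.month").mean("time")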

In this section, we go beyond the basics. We utilize observed air temperature data from two datasets, and we focus on the February 2021 cold storm that happened in Texas (more info here). We will be practicing interpolation, scaling datasets (i.e. converting from K to C), assigning new attributes (e.g. converting 0:360 degrees longitude to -180:180), timeseries analysis (working with a datetime-format coordinate), and rolling averages (with a specific time window). Two of the sample plots that are generated in this section are shown here.
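A short sketch of these operations (the variable name t2m, the latitude ordering, and the Texas bounding box are assumptions):

    # Re-label longitudes from 0:360 to -180:180 and re-sort
    ds = ds.assign_coords(lon=(((ds.lon + 180) % 360) - 180)).sortby("lon")

    # Scale 2 m temperature from K to C
    t2m_c = ds["t2m"] - 273.15
    t2m_c.attrs["units"] = "degC"

    # Area-average over a rough Texas box (lat slice order assumes descending latitudes),
    # then apply a 7-day centered rolling average
    ts = t2m_c.sel(lat=slice(36, 25), lon=slice(-106, -93)).mean(["lat", "lon"])
    smoothed = ts.rolling(time=7, center=True).mean()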

In both parts, we will load the data directly into memory (without downloading it to disk, thus skipping the data download step). The main functionalities that are explored in this section are timeseries analysis, anomaly calculation, working with the Zarr data format, and making a timelapse animation.
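For instance, a hedged sketch of opening a cloud-hosted Zarr store lazily and computing monthly anomalies (the store path and variable name are placeholders):

    import xarray as xr

    # Open the Zarr store over the network; nothing is written to disk
    ds = xr.open_zarr("gs://bucket/path/to/store", consolidated=True)

    # Anomalies relative to the monthly climatology
    clim = ds["tas"].groupby("time.month").mean("time")
    anom = ds["tas"].groupby("time.month") - clim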

Climate datasets are stored in different formats, and I think it is essential for a climate data scientist to be able to analyze various data formats. Here are a few data formats (along with a few examples for each case) that you can consider working with:

In addition to the data format, I think it is beneficial to experience working with data at different temporal resolutions (e.g. sub-hourly, daily, monthly, seasonal, annual, and decadal), since each requires unique skills. Similarly, it is beneficial to work with data at different spatial resolutions. Overall, you need to have a general understanding of how large a project you can define and what is practical given your available resources (e.g. is it practical for you to downscale CMIP6 climate projections for all models and ensemble members to 1 km resolution?).
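As a small illustration of moving between temporal resolutions (variable name assumed), daily data can be aggregated with resample:

    # Monthly and annual means from a daily field
    monthly = ds["precip"].resample(time="1MS").mean()
    annual = ds["precip"].resample(time="1YS").mean()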

Dask is a customizable open-source library that provides advanced parallelism for analytics. It can be employed to speed up analyses or to analyze data that is larger than the available memory (RAM). I tried to add an example with Dask to the tutorial, but I was getting errors on Google Colab and I did not want to spend too much time on debugging, so I excluded it. If you are interested in learning Dask and playing around with it, I suggest two resources: one is the tutorials on Dask's website, and the other is a short course developed by the Coiled team on TalkPython.
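For reference, a minimal sketch of the kind of Dask-backed workflow the tutorial omits (file pattern, variable name, and chunk size are assumptions):

    import xarray as xr

    # Opening with chunks makes the arrays Dask-backed; computation is deferred until .compute()
    ds = xr.open_mfdataset("precip.V1.0.*.nc", chunks={"time": 365})
    annual_totals = ds["precip"].groupby("time.year").sum("time").compute()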

I mentioned this one in the tutorial as well; OpenCV is a powerful image processing tool that has many applications in computer vision analyses. However, it can be super useful for processing climate data too. In fact, if we consider one time step of any climate variable (at a certain elevation or pressure level), it can be viewed as an image with hundreds or thousands of pixels (i.e. grids). OpenCV can be utilized for geospatial analyses (e.g. edge detection for detecting clouds or detecting the boundary of a farm from satellite imagery, object tracking for tracking a storm or cold front through time, blurring and spatial smoothing, etc.).
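A small sketch of applying OpenCV to one time step of a gridded field (the variable name is illustrative):

    import cv2
    import numpy as np

    # Treat a single 2-D time step as an image
    field = ds["precip"].isel(time=0).values.astype("float32")

    # Scale to 0-255 uint8 so standard OpenCV routines can be applied
    norm = cv2.normalize(np.nan_to_num(field), None, 0, 255, cv2.NORM_MINMAX).astype("uint8")

    blurred = cv2.GaussianBlur(norm, (5, 5), 0)   # spatial smoothing
    edges = cv2.Canny(blurred, 50, 150)           # simple edge detection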

Notably, there are numerous tutorials, blog posts, packages, and codes available to develop complicated machine learning models, and you can easily train a model with several million parameters on Google Colab or even your laptop. While it is necessary to know how to train and test such models, it is more important to learn the fundamentals and know what approach is suitable for your problem (e.g. the difference between logistic and linear regression, when to choose parametric vs. non-parametric models, how to validate/interpret model outputs, or how to work with imbalanced datasets). It is often easy to jump in and develop/implement predictive AI models, but being sure of a model's robustness when new data is fed to it is crucial. Oftentimes, depth is more important than breadth of knowledge.

First, we need to understand a bit better what all these facets mean and how to see which ones are in the full collection. Let's start with the experiment_id: this is the prescribed forcing for a particular MIP that is exactly the same across all different models. We can look at the values of the collection as a pandas DataFrame to conveniently list all values.
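Assuming the collection is held in a variable named col, the unique experiment_id values can be listed directly from the DataFrame:

    # Unique experiments, and the number of assets per experiment
    print(col.df["experiment_id"].unique())
    print(col.df.groupby("experiment_id").size().sort_values(ascending=False).head(10))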

A set of scripts to access the CMIP6 model simulations from the Pangeo gallery using Google and Amazon APIs. The script finds the data in the store, downloads it, and regrids it to a common spatial resolution. The latest version of the script can be downloaded from the GitHub repository at -su/python_pangeo_cmip6
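The regridding step could look roughly like the xesmf sketch below; this is an illustration under assumed names, not the repository's exact script:

    import xesmf as xe

    # Build a global target grid at the common resolution and regrid bilinearly
    ds_out = xe.util.grid_global(1.40625, 1.40625)
    regridder = xe.Regridder(ds, ds_out, method="bilinear", periodic=True)
    ds_regridded = regridder(ds)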

To download and regrid a CMIP6 dataset to a common resolution (e.g., 1.40625 degrees), go to the corresponding directory inside snakemake_configs and run

    snakemake all --configfile config_2m_temperature.yml --cores 8

This script will download and regrid the 2m_temperature data in parallel using 8 CPU cores. Modify configfile for other variables. After downloading and regridding, run the following script to preprocess the .nc files into .npz format for pretraining ClimaX:

    python src/data_preprocessing/nc2np_equally_cmip6.py \
        --dataset mpi --path /data/CMIP6/MPI-ESM/1.40625deg/ --num_shards 10 --save_dir /data/CMIP6/MPI-ESM/1.40625deg_np_10shards

in which num_shards denotes the number of chunks to break each .nc file into.

Then, preprocess the NetCDF data into small numpy files and compute important statistics:

    python src/data_preprocessing/nc2np_equally_era5.py \
        --root_dir /mnt/data/5.625deg \
        --save_dir /mnt/data/5.625deg_npz \
        --start_train_year 1979 --start_val_year 2016 \
        --start_test_year 2017 --end_year 2019 --num_shards 8

First, download the ClimateBench data. ClimaX can work with either the original ClimateBench data or a regridded version. In the experiment in the paper, we regridded the ClimateBench data to 5.625 degrees. To do that, run

    python src/data_preprocessing/regrid_climatebench.py /mnt/data/climatebench/train_val \
        --save_path /mnt/data/climatebench/5.625deg/train_val --ddeg_out 5.625

and

    python src/data_preprocessing/regrid_climatebench.py /mnt/data/climatebench/test \
        --save_path /mnt/data/climatebench/5.625deg/test --ddeg_out 5.625
