Date: Thu, 05 Jul 2012 17:01:44 +0200
Subject: [PMIPn news] Accessing PMIP3/CMIP5 data on the web
Dear PMIPers,
We have received several requests about "How to access the PMIP3/CMIP5
data", so I will summarize it below. The following mail should guide you
through your first steps, so that getting data is easy instead of
frustrating :)
1) Create a regular CMIP5 ESGF account if you don't have one already
(i.e. get a CMIP5 "OpenID" identifier)
http://cmip-pcmdi.llnl.gov/cmip5/data_getting_started.html
2) Look for PMIP3/CMIP5 data on one of the ESGF "peer-to-peer
front-ends" (aka "P2P FE") that are gradually replacing the existing
"gateway" system
3) Go to the PCMDI P2P
http://pcmdi9.llnl.gov/
or to a P2P closer to you (the links to other P2P FE are
available at the bottom left of the PCMDI P2P).
Each P2P will display
* either the data that is stored locally on the current node
* or ALL the available data on ALL the P2P nodes
==> This is what you want! Learn how to do it below
4) Click on the "Search" button and possibly clear a previous search
query in the Current Selections panel ("remove all" link)
IMPORTANT!!! Select "Search All Sites" if you want to access ALL the
available data. See attached P2P_01.jpg picture
5) If you want to look for PMIP3-specific data, click on "Project" in
the "Search Categories" side panel, and select "PMIP3". "(x)
project:PMIP3" will appear in the "Current Selections" panel, and you
can then click on all the "Search Categories" items to check their
updated content.
6) What you really want is probably "data that is available for a given
paleo experiment", regardless of the data being "PMIP3" or "CMIP5"!
Click on "(x) project:PMIP3" to remove this search constraint, then
choose "Experiment Family:paleo" and then "Experiment:lgm" (or any other
"Paleo" experiment you are interested in). If you then click on
"Project", you will see that you get results for both "CMIP5" and
"PMIP3", and if you click on "Institute", it will list which institutes
have submitted "lgm" data. See attached P2P_02.jpg picture
7) You can keep on adding more search constraints using the Search
Categories, or remove existing constraints from the "Current Selections".
You can also use the text input field to specify the variables you want
(and click "Search"). See attached P2P_03.jpg picture
e.g. "variable:tas OR variable:pr"
WARNING! Use "OR" (uppercase letters) and not "or", otherwise you will
possibly end up getting more variables than what you need. The text
search field is currently case-sensitive ("project:cmip5" will return NO
results, "project:CMIP5" will return CMIP5-only results). More examples
are available on
http://www.esgf.org/wiki/ESGF_Web_Search_User_Guide
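If you build such queries often, a small helper can avoid the "or" vs "OR" mistake. The following is only a minimal Python sketch: the function name build_query is made up for illustration and is not part of any ESGF tool. It simply produces the string to paste into the text input field.

```python
def build_query(field, values):
    """Join field:value terms with an uppercase OR for the P2P text search field."""
    if not values:
        raise ValueError("at least one value is required")
    # Values are passed through unchanged because the search is case-sensitive
    # (e.g. "CMIP5" must not be lowercased).
    return " OR ".join(f"{field}:{value}" for value in values)


print(build_query("variable", ["tas", "pr"]))
```

This prints the same query as the example above, variable:tas OR variable:pr.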
Reminder: the list of requested variables (and their standard "output
variable names") and what "realms" and "tables" they belong to is
available on
http://cmip-pcmdi.llnl.gov/cmip5/output_req.html#req_list
A search may return more variables than what you are looking for,
because a variable may belong to several "Realms" and several MIP tables
(or "CMOR" tables).
e.g. "pr" can be found in the "atmos", "ocean" and "seaIce" realms
and in the following MIP tables
"Amon", "OImon", "Omon", "day"
"Aclim", "Oclim" ("Xclim" tables are PMIP3 specific)
You should therefore narrow the search by specifying a MIP/CMOR table
e.g. You need the Atmos Monthly "tas" and "pr"
==> add the constraint "MIP table:Amon"
8) Once you have narrowed your search to the datasets containing the
variables you are looking for, you can check the results on the central
part of the page and then add the datasets to the cart to download the data.
The attached P2P_04.jpg picture shows the result of the previous search.
You should pay attention to the version of a dataset if you want to
check whether there is a newer version of the data (if you have
already downloaded the same data and have kept track of the version
numbers...). Unless you have selected "Show All Versions", the "Results"
window only displays the most recent version of the data. In the example
shown, the same NCAR data seems to have 2 different versions. This is
not an error, because a closer look shows that the 2 versions are
associated with 2 different ensembles, "r1i1p1" and "r2i1p1" (the group
who produced the data specified "The r2i1p1 simulation is a high
temporal resolution dataset that replicates the last 30 years of the LGM
simulation. This simulation outputs more variables than our normal
simulation, including sub-daily and daily output")
You can click on the "Add to Cart" option of each of the datasets you are
interested in, and then go to the "Data Cart Window". Make sure you have
"Filter over search constraints" selected and click on the "WGET All
Selected" link to generate a wget script. See attached P2P_05.jpg picture
9) Save the "wget-<date>.sh" script that is generated by the P2P FE to a
Linux machine. You should save the wget script to a new directory (where
you have enough space), because the script will download all the
requested data to the directory from where it is executed.
WARNING! You may want to check the script before executing it, to make
sure that it will not download more variables than what you have
requested. If this is the case, just remove the unrequested files from
the list of files at the end of the script
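The check can be partly automated. The Python sketch below is only an illustration, not an official tool. It relies on the CMIP5 file naming convention, in which a file name starts with <variable>_<MIP table>_, and it assumes that the file names appear as plain text somewhere in the script (the exact layout of the generated script is not assumed). The function name and the sample file names are hypothetical.

```python
import re

# Matches file names of the form <variable>_<MIP table>_<...>.nc
NC_NAME = re.compile(r"\b([A-Za-z0-9-]+)_([A-Za-z0-9]+)_[A-Za-z0-9_.-]*\.nc\b")


def unrequested_files(script_text, variables, tables):
    """Return the sorted .nc file names whose variable or MIP table was not requested."""
    unexpected = set()
    for match in NC_NAME.finditer(script_text):
        variable, table = match.group(1), match.group(2)
        if variable not in variables or table not in tables:
            unexpected.add(match.group(0))
    return sorted(unexpected)


# Hypothetical file names, made up for this example only.
sample = """
'tas_Amon_MODEL_lgm_r1i1p1_000101-000112.nc'
'pr_Omon_MODEL_lgm_r1i1p1_000101-000112.nc'
"""
for name in unrequested_files(sample, {"tas", "pr"}, {"Amon"}):
    print("not requested:", name)
```

To check a real script, read it with open("wget-<date>.sh").read() and pass the text to unrequested_files together with the variables and MIP tables you asked for.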
When you execute the script, it will prompt you for your OpenID and your
password and then download the requested files. If the transfer is
interrupted, just restart the script and it will resume from where it
stopped. If you are not sure if all the files were (correctly)
downloaded, just restart the script as well!
Congratulations, you are ready to download some data and to do some
science! :)
EXTRA INFORMATION
=================
Miscellaneous other things that may be useful to know
1) How to access the PMIP3 climatological monthly means (monClim variables)?
PMIP3 "Xclim" variables are computed from the PMIP3/CMIP5 "Xmon" variables
e.g. "Amon" MIP/CMOR table ==> "Aclim" PMIP3 CMOR table
You can access these variables by selecting "project:PMIP3" and "time
frequency:monClim" in the "Search Categories"
These variables are not automatically updated and are being upgraded.
They may not be based on the latest version of the Xmon data available
in the distributed database
2) What is a Dataset?
A dataset logically groups all the variables of the same type (same
realm, frequency and CMOR2 table) of an experiment performed by a
modeling group. If you change 1 file (1 variable can be split into
several files along the time axis), you have to change the version of
the dataset
The identifier of a dataset (as can be seen in the Data Cart) has the
following form
<activity>.<product>.<institute>.<model>.<experiment>.
<frequency>.<modeling realm>.<MIP table>.
<ensemble member>.<version>
e.g. pmip3.output.IPSL.IPSL-CM5A-LR.lgm.
monClim.atmos.Aclim.
r1i1p1.v20120418
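If you keep track of what you have downloaded, it can help to split a dataset identifier into its named parts. Here is a minimal Python sketch, assuming that the identifier is written on a single line and that no individual field contains a period. The names FIELDS and parse_dataset_id are made up for this example.

```python
# Field names, in the order given in the identifier template above.
FIELDS = (
    "activity", "product", "institute", "model", "experiment",
    "frequency", "modeling_realm", "mip_table", "ensemble_member", "version",
)


def parse_dataset_id(dataset_id):
    """Split a dot-separated dataset identifier into a dict of named fields."""
    parts = dataset_id.strip().split(".")
    if len(parts) != len(FIELDS):
        raise ValueError(
            f"expected {len(FIELDS)} fields, got {len(parts)}: {dataset_id!r}"
        )
    return dict(zip(FIELDS, parts))


# The example identifier from above, joined on one line.
example = "pmip3.output.IPSL.IPSL-CM5A-LR.lgm.monClim.atmos.Aclim.r1i1p1.v20120418"
info = parse_dataset_id(example)
print(info["model"], info["mip_table"], info["ensemble_member"], info["version"])
```

Comparing the "version" field of two identifiers that agree on all the other fields tells you whether you already have the latest version of a dataset.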
3) Read the data errata page from time to time, and report things that
look suspicious
http://pcmdi-cmip.llnl.gov/cmip5/errata/cmip5errata.html
4) The model documentation (when the questionnaire has already been
filled) is available from
http://q.cmip5.ceda.ac.uk/
(see "Published CIM metadata" at the bottom)
5) You can install and use the synchro-data tool developed at IPSL to
automatically download data
https://forge.ipsl.jussieu.fr/prodiguer/wiki/docs/synchro-data
6) You can learn a lot more about searching the data by following the
links in the upper right corner, next to the "Search" button. See
attached P2P_06.jpg picture
7) The ESGF system is quite reliable (surprisingly reliable, considering
the size of the distributed database and the complex technical
infrastructure involved), but please remember that some data can be
temporarily missing or out-of-date somewhere, some computers or disks
may be down for maintenance (or because of unexpected crashes), etc...
and sometimes there are just bugs waiting to be fixed