I began working on the Predictive Ecosystem Analyzer (PEcAn) project in August 2015 as a Graduate Research Assistant under Dr. Ankur Desai at the University of Wisconsin–Madison. PEcAn (see pecanproject.org) is an integrated ecological bioinformatics toolbox (Dietze et al. 2013; LeBauer et al. 2013) that consists of: 1) a scientific workflow system to manage the immense amounts of publicly available environmental data and 2) a Bayesian data assimilation system to synthesize this information within state-of-the-art ecosystem models. The project is motivated by the fact that many of the most pressing questions about global change are limited not so much by the need to collect new data as by our ability to synthesize existing data. PEcAn seeks to improve this ability by developing an accessible framework for integrating multiple data sources in a sensible manner.
The PEcAn workflow system allows ecosystem modeling to be more reproducible, automated, and transparent in terms of operations applied to data, and thus ultimately more comprehensible to both peers and the public. It reduces the redundancy of effort among modeling groups, facilitates collaboration, and makes models more accessible to the rest of the research community. -- (https://github.com/PecanProject/pecan)
As a graduate research assistant for this project, I was responsible for creating and testing code that extracted, debiased, and temporally downscaled meteorological data. I developed code consistent with the coding guidelines we had set and integrated the scripts into the workflow. The scripts I developed are now part of the PEcAn R package, so users can extract a wide variety of datasets, debias the data, and temporally downscale it in a robust fashion. These scripts can also be accessed through my GitHub page (https://github.com/jsimkins2). A more detailed explanation of the functions I developed follows.
Data Extraction Functions
Each of these functions extracts netCDF (Network Common Data Form) files from a THREDDS (Thematic Real-time Environmental Distributed Data Services) catalog using OPeNDAP (Open-source Project for a Network Data Access Protocol). I convert latitude and longitude from canonical coordinates to decimal form and query meteorological variables for the specified years. The output of each function is a netCDF file in the PEcAn standard format, so that each meteorological data file can be downloaded on the fly and used to drive an ecosystem model in PEcAn.
download.CRUNCEP - CRUNCEP is a global reanalysis meteorological data product that combines Climatic Research Unit (CRU) and National Centers for Environmental Prediction (NCEP) data.
download.CRUNCEP_NARR - Extracts the North American Regional Reanalysis (NARR), a high-resolution National Centers for Environmental Prediction data product that encompasses the contiguous United States (CONUS)
download.GFDL - Extracts global climate projection data from the Geophysical Fluid Dynamics Laboratory (GFDL) models
download.MACA - Extracts spatially downscaled CMIP5 (Coupled Model Intercomparison Project Phase 5) projection data over CONUS, available from the Multivariate Adaptive Constructed Analogs (MACA) database
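The coordinate handling shared by these download functions can be illustrated with a short sketch. PEcAn itself is written in R; the Python below is only an illustration of the idea, and the helper names and the commented-out URL are hypothetical, not the actual PEcAn code or THREDDS endpoints.

```python
import numpy as np

def to_decimal_degrees(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Convert a canonical degrees/minutes/seconds coordinate to signed
    decimal degrees (negative for the southern/western hemispheres)."""
    dec = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    return -dec if hemisphere in ("S", "W") else dec

def nearest_grid_index(grid, value):
    """Index of the grid point closest to the requested coordinate,
    used to slice a single point out of a gridded product."""
    return int(np.argmin(np.abs(np.asarray(grid) - value)))

# With a real THREDDS catalog one would then open the dataset over
# OPeNDAP and slice the variable at that index, e.g. (placeholder URL):
#   from netCDF4 import Dataset
#   nc = Dataset("https://thredds.example.org/thredds/dodsC/some/product.nc")
#   i = nearest_grid_index(nc.variables["lat"][:], lat_decimal)
#   j = nearest_grid_index(nc.variables["lon"][:], lon_decimal)
#   tair = nc.variables["air_temperature"][:, i, j]
```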
Statistical Correction Functions
These functions offer robust statistical corrections of meteorological data. Nearly all meteorological data products are offered at coarse spatial and temporal resolution. We can improve a coarse-resolution product by training it against a specified higher-resolution dataset. The debias script corrects coarse-resolution data against a training dataset using one of several user-selectable methods; these currently include debiasing based on the mean, the median, and linear regression. I also added a window argument that allows a user to specify the time window over which the debiasing statistics are computed.

The next set of functions I developed make up the Temporal Downscale Meteorology (TDM) family. These functions allow a user to statistically downscale daily-resolution meteorological data based on the covariances and regression coefficients (betas) of an hourly-resolution data product. We offer the first open-source temporal downscaling procedure that propagates uncertainty and generates ensembles of predictions. We fit subdaily models to the hourly statistics of the training dataset and use them to downscale the daily-resolution data via linear regression. I generalized these scripts so that any type of meteorological data can be used to train a coarse-resolution product, though the most robust training data come from eddy covariance towers.
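The mean and linear-regression debiasing methods can be sketched in a few lines. This is an illustrative Python reimplementation, not the actual PEcAn R code; it assumes the coarse and training series are a single variable, aligned in time, and of equal length.

```python
import numpy as np

def debias_mean(coarse, train):
    """Shift the coarse-resolution series so its mean matches the
    training dataset (a simple additive bias correction)."""
    coarse = np.asarray(coarse, dtype=float)
    return coarse + (np.mean(train) - np.mean(coarse))

def debias_linear(coarse, train):
    """Map coarse values onto the training data with a linear fit:
    fit train ~ a + b * coarse, then apply the fit to the coarse series."""
    coarse = np.asarray(coarse, dtype=float)
    b, a = np.polyfit(coarse, train, 1)  # slope first, then intercept
    return a + b * coarse
```

A median-based correction would follow the same pattern, replacing `np.mean` with `np.median` in the additive shift.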
debias.met - Debiases meteorological data using statistics from a training dataset
tdm_nc2dat.train - Combines netCDF met files and parses them for specific use in the tdm_gen_subdaily_models function
tdm_gen_subdaily_models - Reads dat.train created in previous step, configures lag/next time steps for coherence, and calls tdm_temporal_downscale_functions.R to generate subdaily models and betas of the hourly meteorological dataset
tdm_temporal_downscale_functions.R - Does the heavy lifting for the tdm_gen_subdaily_models function
tdm_predict_subdaily_met - The workflow script that calls the final script (tdm_temporal_downscale.R) to downscale the modeled daily dataset. It outputs multiple ensemble members of meteorological data as netCDF files that are ready to be used by the PEcAn workflow to run ecosystem models.
tdm_temporal_downscale.R - Does the heavy lifting for the tdm_predict_subdaily_met function
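The core idea behind the TDM family can be sketched as follows: fit a regression per hour of day against the daily mean of an hourly training product, record the residual spread, then predict subdaily values for new daily data while drawing residual noise per ensemble member to propagate uncertainty. This is an illustrative Python sketch under simplifying assumptions (one variable, one independent regression per hour, Gaussian residuals); the actual R implementation in tdm_gen_subdaily_models and tdm_predict_subdaily_met is considerably more involved (lag/next-step structure, covariances across variables).

```python
import numpy as np

rng = np.random.default_rng(42)

def fit_subdaily_model(hourly_train):
    """For each hour of day, fit hourly = a + b * daily_mean on the
    training data and record the residual standard deviation.
    hourly_train: array of shape (n_days, 24)."""
    daily = hourly_train.mean(axis=1)
    params = []
    for h in range(24):
        b, a = np.polyfit(daily, hourly_train[:, h], 1)
        resid = hourly_train[:, h] - (a + b * daily)
        params.append((a, b, resid.std()))
    return params

def downscale(daily_series, params, n_ens=10):
    """Predict hourly values for each day in daily_series, drawing
    residual noise per ensemble member so the output carries the
    model uncertainty. Returns shape (n_ens, n_days, 24)."""
    daily_series = np.asarray(daily_series, dtype=float)
    out = np.empty((n_ens, daily_series.size, 24))
    for h, (a, b, sd) in enumerate(params):
        mean = a + b * daily_series
        out[:, :, h] = mean + rng.normal(0.0, sd, size=(n_ens, daily_series.size))
    return out
```

Each ensemble member is a plausible subdaily trajectory; the spread across members reflects how well the training data constrain each hour.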
At the American Geophysical Union conference in December 2016, I presented preliminary research using my temporal downscaling algorithms, analyzing sources of uncertainty in projecting the impact of future climate on ecosystem carbon cycling.