Crop & GHG Global Benchmark Database

A Global Collaborative Network

Rationale

Data volume is increasing by up to 20% each year worldwide and is projected to reach 175 zettabytes within the next two to three years. This data comes from diverse sources of varying quality and includes an enormous amount of data from scientific research. Science and business often use these so-called 'alternative' data for research analyses.

Analyzing alternative data in combination with datasets from one's own experiments can yield insights and support decisions grounded in intensive data analysis. Ecosystem models, which describe a complex ecosystem as a simplified mathematical system, often use alternative data from model development through to application. Analyses with diverse data sources, such as experimental data, geodata, weather data, genotyping data, and phenotyping data, help improve model predictions and provide valuable information for decision-making.

Using alternative data sources in modeling studies provides advantages such as:

1. Gain more substantial insights and a comprehensive overview by augmenting self-produced data with diverse sources.

2. Establish multi-scale evaluation of models to gain a competitive advantage in model accuracy and improvement.

3. Identify the advantages and disadvantages of the models, leading to improvements.

4. Promote cost-effective expansion of model applications beyond their initial development domain.

5. Promote cross-spatial (multi-site) validation of measurement, reporting, and verification (MRV) tools to enlarge their potential adaptation regions, increasing the value of the MRV tool and associated carbon credits.

Initiative

IRRI is involved in several activities on greenhouse gas (GHG) modeling: 1) AgMIP Rice Team GHG modeling for model comparison and improvement; 2) an ensemble GHG modeling concept for emissions verification; 3) a global collaborative network for sharing rice and rice-based field GHG data to serve MRV development; and 4) cross-spatial meta-analysis of specific crop and land management technologies relating to production and environmental sustainability and to climate change mitigation and adaptation. As part of these initiatives, we will invest efforts to build a collaborative network to identify and establish a worldwide benchmark database on crop management, crop growth, and GHG emissions.

Data quality and quantity

The datasets for the benchmark database are expected to include as much of the following information as possible:

1. Daily weather data: geolocation, maximum and minimum air temperature, radiation, and rainfall as essential; relative humidity and wind speed if applicable;

2. Soil texture, pH, SOM (or SOC and SON), and total soil carbon as essential; soil hydraulic properties if applicable;

3. Land and crop management records (sowing, transplanting, cropping density, irrigation, fertilizer, tillage, weeding, pest and disease control, harvest, and residue management), with related information such as date, method, and amount;

4. Crop phenology and sequential LAI and biomass accumulation measurements; where possible, biomass measurements should be organ-based;

5. Crop seasonal grain yield and straw biomass;

6. GHG measurements (weekly or biweekly fluxes and/or seasonal totals of CH4 and N2O as essential; CO2 if applicable);

7. Field soil moisture and temperature, if applicable.

The data collection scheme and the protocols for plant, soil, and GHG sampling and sample analysis are attached as references to help members manage data collection and contribution.

The data owner (contributor) is not required to provide a dataset exactly as listed above; a more or less intensive dataset is acceptable as long as it meets the minimal requirement (see data classification).
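
As an illustration of how a contributor might screen a dataset against the items marked "essential" in the list above, the following is a minimal sketch. The category and variable names (`tmax`, `organic_matter`, and so on) are hypothetical and do not come from the official template, and SOM is treated as a single variable for simplicity. The sketch only reports which essential variables are absent; it does not decide whether a dataset meets the minimal requirement, which is defined in the data classification.

```python
# Hypothetical variable names; the "essential" grouping follows the list above.
ESSENTIAL = {
    "weather": {"geolocation", "tmax", "tmin", "radiation", "rainfall"},
    "soil": {"texture", "ph", "organic_matter", "total_carbon"},
    "ghg": {"ch4", "n2o"},
}


def missing_essentials(dataset):
    """Return {category: sorted list of missing essential variables}.

    `dataset` maps a category name to the variable names it provides.
    Categories with nothing missing are left out of the report.
    """
    report = {}
    for category, required in ESSENTIAL.items():
        provided = set(dataset.get(category, ()))
        missing = required - provided
        if missing:
            report[category] = sorted(missing)
    return report


example = {
    "weather": ["geolocation", "tmax", "tmin", "rainfall", "wind_speed"],
    "soil": ["texture", "ph", "organic_matter", "total_carbon"],
    "ghg": ["ch4"],
}
print(missing_essentials(example))
# {'weather': ['radiation'], 'ghg': ['n2o']}
```

A report like this could accompany a submission so that the system manager knows in advance which items will need clarification.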

Data contribution & sharing pathway

IRRI will be the focal point and the voluntary system manager for data gathering, data quality checks, establishing and managing the database, and maintaining a data-sharing system. Figure 1 illustrates the data contribution and sharing scheme. First, the data owner (contributor) submits the data to the system manager (IRRI). Second, the system manager conducts the data quality check and contacts the data owner about any clarifications or missing information. Third, the data is integrated into the database as a new record, and membership is assigned to the contributor or to an entity or person appointed by the contributor. Fourth, the member can query and share the data in the database. Data owners (contributors) are encouraged to contribute more datasets. The quantity of data shared with a member will increase with that member's contributions: 'contributing more for sharing more'.

Figure 1. The scheme of data contribution and sharing
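
The four steps of the scheme can be sketched as a small state model. This is only an illustration under stated assumptions: the class, field, and status names below are hypothetical and do not describe the actual IRRI system.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    SUBMITTED = "submitted"                # step 1: owner sent data to the manager
    IN_CLARIFICATION = "in clarification"  # step 2: quality check raised questions
    INTEGRATED = "integrated"              # step 3: stored as a new database record


@dataclass
class Submission:
    dataset_id: str
    contributor: str
    status: Status = Status.SUBMITTED
    member: Optional[str] = None  # assigned only once the data is integrated


def review(sub: Submission, passed: bool, appointee: Optional[str] = None) -> Submission:
    """Steps 2 and 3: apply the quality-check outcome to a submission."""
    if not passed:
        # The manager goes back to the owner for clarification or missing data.
        sub.status = Status.IN_CLARIFICATION
        return sub
    sub.status = Status.INTEGRATED
    # Membership goes to the contributor or to whoever the contributor appoints.
    sub.member = appointee or sub.contributor
    return sub


def can_query(sub: Submission) -> bool:
    """Step 4: only an assigned member can query and share data."""
    return sub.member is not None


sub = Submission("dataset-001", "contributor-A")
review(sub, passed=False)
print(sub.status.value, can_query(sub))  # in clarification False
review(sub, passed=True)
print(sub.status.value, sub.member)      # integrated contributor-A
```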

The number of datasets shared per contribution will increase as the database is enriched. For example, at the beginning stage, contributing one dataset may grant access to ten datasets, and this may increase to 100 after contributors enrich the database.

A dataset may include several datasheets in an Excel file, where one sheet may contain the general description of the experiment (observation or survey) and metadata, and other sheets hold soil, crop, GHG, and weather information. A template is provided on this site (Appendix section) so that the data owner (contributor) can conveniently organize the data and email it to the contact person(s) listed at the end of this document.

Data ownership, accessibility, & rights of data owner

This system recognizes two types of data ownership, protected and open, which correspond to two data-sharing categories (protected and open data) and two sharing modules: partial access to unprotected data and full access to unprotected data (Table 1).

A data owner who contributes only protected data can access only 5% of the unprotected data. A data owner who contributes only unprotected data can access 30% of all unprotected data for each contributed dataset. One data owner can contribute both types of data. In that case, the data owner will have access to 5% of the unprotected data for each protected dataset contributed, plus 30% of the unprotected data for each unprotected dataset contributed. A member's first download of a dataset counts toward that dataset's download total, and each counted download awards the data owner an additional 1% access to unprotected data. Accessibility management reflects the basic principle of “sharing more with more contribution”.
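
The access arithmetic can be written out as a short sketch. The function and constant names are hypothetical, and capping the result at 100% is an assumption; the text does not say how totals above 100% are handled.

```python
PROTECTED_SHARE = 5     # % of unprotected data per protected dataset contributed
UNPROTECTED_SHARE = 30  # % of unprotected data per unprotected dataset contributed
DOWNLOAD_BONUS = 1      # % awarded per counted download of the owner's data


def access_percent(n_protected, n_unprotected, counted_downloads=0):
    """Share of the unprotected data an owner may access, in percent.

    The cap at 100 is an assumption made for this sketch.
    """
    total = (
        PROTECTED_SHARE * n_protected
        + UNPROTECTED_SHARE * n_unprotected
        + DOWNLOAD_BONUS * counted_downloads
    )
    return min(total, 100)


print(access_percent(1, 0))     # 5
print(access_percent(2, 1, 4))  # 44  (2*5 + 1*30 + 4*1)
print(access_percent(0, 4))     # 100 (4*30 = 120, capped)
```

For instance, an owner with two protected datasets, one unprotected dataset, and four counted downloads would reach 10% + 30% + 4% = 44% under these rules.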

All data owners reserve the right to be appropriately cited and acknowledged; including them as co-authors or collaborators on research projects is optional.

Table 1. Correspondence between data types, member types, and data accessibility.

Data type (open vs. protected) and sharing:

There are two types of contributed data: open (unprotected) and protected.

The open data will have the following characteristics:

1. All data will be physically held in the data center;

2. The metadata is available for data search;

3. There is no conflict of interest associated with the data;

4. There is no IP right associated with the data (i.e., the original data owner relinquishes the IP right once the data has been contributed to and collected in the database);

5. The data can be used and cited freely without obstacle; and

6. There is no need to have confirmation from the original owner for using the data as long as the data is cited or acknowledged correctly.

The protected data will have the following characteristics:

1. The metadata for the general introduction, data variable list, and contact information will be stored in the database system;

2. All copyrights are reserved;

3. Other data users can access only the metadata; and

4. The data can be accessed by contacting the data owner using the contact information in the metadata.

The metadata of protected and unprotected data will be accessible to non-members, i.e., the metadata will be open to internet search to enhance the data's use.

All data is held for scientific research and public service. Direct use of the data for business purposes is restricted, but the data can serve as validation data for evaluating business products.

Data sharing & membership management

This initiative is a voluntary program. The data will be managed under the membership framework.

1. Only members can access the data. For each successful contribution, a member can access ten datasets physically held in the database, in the same category as the contributed data, so contributing more grants more access.

2. Only contributors of data and metadata, or their representatives (entities or persons), qualify for membership.

3. A data contributor or representative will automatically become a member only if the contributed data or metadata has passed the data quality check and been collected in the database.

4. One dataset can be assigned to only one member. Identical data, or data from the same experiment, observation, or survey, cannot be assigned membership again.

5. Data owners are encouraged to contribute more datasets, which secures access to more of the other data. High-quality data is likely to attract more downloads, which also earns the owner more access to other data.

6. Membership is inheritable and transferable, but it cannot be held twice: the original holder automatically loses the membership once it is inherited by or transferred to another representative, based on a written document signed by both parties.

7. Members are encouraged to work together or individually for non-business purposes using the shared data.
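
Rules 3, 4, and 6 above can be illustrated with a toy registry. This is a sketch under stated assumptions, not a description of the planned system: the class, method, and identifier names are hypothetical, and a single experiment identifier stands in for "the same experiment, observation, or survey".

```python
class MembershipRegistry:
    """Toy in-memory registry; all names here are hypothetical."""

    def __init__(self):
        self._member_by_experiment = {}  # experiment id -> current member

    def register(self, experiment_id, member, passed_quality_check):
        """Assign membership for a contributed dataset (rules 3 and 4)."""
        if not passed_quality_check:
            raise ValueError("data must pass the quality check first")
        if experiment_id in self._member_by_experiment:
            raise ValueError("this experiment already carries a membership")
        self._member_by_experiment[experiment_id] = member

    def transfer(self, experiment_id, new_member, signed_by_both):
        """Move a membership to another representative (rule 6)."""
        if not signed_by_both:
            raise ValueError("transfer needs a document signed by both parties")
        if experiment_id not in self._member_by_experiment:
            raise KeyError(experiment_id)
        # Overwriting the entry means the original holder loses the membership.
        self._member_by_experiment[experiment_id] = new_member

    def member_of(self, experiment_id):
        """Return the current member for an experiment, or None."""
        return self._member_by_experiment.get(experiment_id)


registry = MembershipRegistry()
registry.register("exp-A", "contributor-A", passed_quality_check=True)
registry.transfer("exp-A", "contributor-B", signed_by_both=True)
print(registry.member_of("exp-A"))  # contributor-B
```

Keying the registry on the experiment rather than on the file is one way to keep the same experiment from being assigned membership twice.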

Upcoming Activities

IRRI will develop a database to hold the data, a database management system for data quality control, upload and download, and internet access for searching, submitting, and sharing data.

Until the system is made available to members and the public, please contact us for additional information or to submit data. To organize and submit your data for sharing and exchange, you can use the two data templates available in the Appendix section of this webpage.

We look forward to your collaboration and hearing your suggestions.