Philosophy
To many, observational data has no further use or value once it has served its original purpose. Data, however, has intrinsic value to the wider community: it can be reprocessed, combined with other data streams, and used for purposes beyond the original intention of its providers. This can only be achieved if the data can be found, carries sufficient supporting information (metadata) to identify and trace its origin and veracity, and can be extracted. These requirements are captured by the FAIR principles (Findable, Accessible, Interoperable and Reusable).
Delivering data that the community can use in this way requires three distinct components:
A data storage facility that is expandable, searchable and has a user-friendly interface for both data providers and users (expert and non-expert).
Data products that consistently follow a standard format, conform to internationally recognised standards, and are documented, with the processing applied to the data on its way from instrument to archive being transparent, traceable, and documented.
Tools and services for both providers and users that make the creation of compliant data files easy and the extraction and visualisation of data a simple process.
Although each component could be addressed individually, it is the feedback and interplay between the components that is key. Developers of software need to know what to expect in the data files and be sure that the format is consistent. Data providers need to know what tool developers and archive managers require, and archives need to deliver an easily navigable system that allows data to be found and extracted. As with the fire triangle, these three components must work together to ignite the fire that is increased data usage: tinkering with any one component in isolation can have unforeseen consequences that disrupt the interaction between the three, and the data uptake fire goes out.
The NCAS Data Project has taken a collective approach in which data providers, tool developers and archive managers work together. The UK is already served by the CEDA data archive, NERC's data repository for atmospheric science and Earth observation, and the NCAS Data Project has seen an unprecedented collaboration between CEDA Data Scientists, NCAS-IT, and NCAS-Observations. By working together, the team has developed defined file structures and products, and these have been brought together in a series of NCAS data standards (all defined on these pages). With these definitions in place, tools have been developed to automate conformity checking and reporting, and also conversion from providers' native files. The project has also developed a joined-up processing and data capture approach that makes use of the JASMIN supercomputer. This reduces network traffic, the burden on host institution IT resources, and the time staff spend on data archiving (NERC requires all data to be deposited with an archive). The project is also developing bespoke tools to help users visualise and extract data.
Data in the files should be traceable to standards where applicable, and the process applied in quality control should be traceable and transparent. To this end, the IT side of the project has developed a GitHub repository structure for code and calibration documentation, all version controlled and branchable. The repositories are open access and can be inspected by the community. This approach has multiple benefits:
The user community can see what has been done and by doing so develop "trust" in the product.
The user community can spot errors and inform providers if and when they are found, leading to rapid rectification.
The user community can advise on best processing practice and offer help.
The user community can suggest and develop new products that can be incorporated into the NCAS data product catalogue.
Finally, this approach highlighted the need to keep the user and provider communities informed and to ensure that documentation, training, and support are available. It is unreasonable to expect new standards to be adopted without provision of support. The project has again worked with CEDA to develop a series of webinars, and data providers have developed documentation for each data product.