Data, whether observations or model output, are being produced in ever-increasing quantities, but what happens to them once the initiating project has used them? For programmes funded by NERC there is a requirement that the data be made freely available to any user. Programme funding comes from UK taxes, so it can be argued that the data ultimately belong to the UK public, who have the right to access them.
For data to be "useful" beyond the remit of the initiating programme, the data files need to be stand-alone and self-supporting: all the necessary information is contained within the file, so that the user does not need to ask questions of the scientist who generated it. Put another way, if a user came along in 30 years' time with questions about a data file, it is unlikely that the scientist who produced it would be in any position to help. This means taking metadata, data standards and file structures seriously. Users also need to be confident in the quality of the data being provided, so the processes by which data are obtained, calibrated and quality controlled need to be documented and available for inspection.
The tools available for users to find, access and visualise data are as important as the data themselves: even the best data files ever produced are of little value if there is no facility to store, discover, read and visualise them. Likewise, if we expect NCAS scientists to provide data in a stand-alone and self-supporting format, then tools need to be made available to help them do this.
As a provider of data, NCAS-Observations decided to put its house in order, and in September 2015 it held an open data forum. This forum contributed to the development of the short- and medium-term strategy for archiving and exploiting observational data from NCAS and the wider community at CEDA (formerly known as BADC).
The forum addressed the tensions between what is necessary to manage and exploit data at scale and what individuals and projects want, whether as providers or consumers. These tensions surface at many points in the workflow, but the forum concentrated on three aspects of our observational data workflow:
archiving and discovery at scale
project requirements
visualisation
Over the following 12–18 months this work coalesced into the NCAS Data Project, which is now delivering the strategy and protocols being applied to data from the NCAS-Observations programme.