Applying the FAIR principles to quantitative data can be quite straightforward, with most areas requiring no steps beyond those laid out in the general guidance pages in this resource. However, care should be taken to ensure that others can get the most out of the data and code you make available, and that it is as easy as possible to understand and use.
For more general information on making your data FAIR, please see the guidance below:
Do not include special characters or spaces in the column headers of tabular data. If the whole column is in the same unit, include the unit in the column header where possible (not in the data values). If there are multiple units per column (e.g. months, years), either separate these into multiple columns or place the unit in a separate column from the measurement.
Use international standards for fields where possible. For example, dates should be in the ISO 8601 format YYYY-MM-DD.
Ensure that each observation has its own row (and each variable its own column).
Any column headings should be in the first row.
Column names should be consistent in case and format (e.g. all lower case, underscore used as separator).
Use a clear, consistent standard for NA, NULL, or empty values (a short sketch of these conventions follows this list).
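As an illustration, the sketch below applies these conventions using pandas; the file name, column names, and values are hypothetical and are only meant to show the formatting, not a required workflow.

```python
# Illustrative only: hypothetical column names, values, and file name.
import numpy as np
import pandas as pd

observations = pd.DataFrame(
    {
        # one observation per row, one variable per column,
        # lower-case column names with underscores as separators
        "participant_id": ["p001", "p002", "p003"],
        # dates in ISO 8601 format (YYYY-MM-DD)
        "visit_date": ["2023-01-15", "2023-02-03", "2023-02-17"],
        # the unit lives in the column name, not in the data values
        "height_cm": [172.0, 165.5, np.nan],  # missing value recorded consistently
        "weight_kg": [68.2, 59.0, 81.4],
    }
)

# column headings end up in the first row of the CSV;
# missing values are written as empty cells by default
observations.to_csv("observations.csv", index=False)
```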
If using a spreadsheet program to work on your data:
Do not include any charts or images in the spreadsheet; keep these separate and make the relationship between them and your data clear in your README file.
Each tab in your spreadsheet should be saved as a separate CSV (or similar) file (one way of doing this is sketched after this list).
Any calculations or formulae that need to be preserved should be saved in a separate spreadsheet (e.g. Excel Book, .xlsx) and kept alongside the data files.
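One way to export every tab to its own CSV is sketched below. This is a minimal example, assuming a hypothetical workbook called analysis.xlsx and that pandas (with an Excel engine such as openpyxl) is installed.

```python
# Minimal sketch: "analysis.xlsx" is a hypothetical file name.
import pandas as pd

# sheet_name=None loads every tab as a separate DataFrame, keyed by tab name
sheets = pd.read_excel("analysis.xlsx", sheet_name=None)

for tab_name, frame in sheets.items():
    # write one CSV per tab, named after the tab
    frame.to_csv(f"{tab_name}.csv", index=False)
```

Note that CSV files cannot store formulae, which is why any calculations you want to preserve should be kept in a separate spreadsheet as described above.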
Further guidance for tabulated data can be found on the Turing Way webpages.
Data should be effectively documented in order to enable its reuse by others, fulfilling the 'R' element of the FAIR principles. Provide a README file, complete all metadata fields when depositing your data, and include a data dictionary, ideally as a component of your README.
A data dictionary can be a very useful document for anyone wanting to reuse your data. While some of the information in a data dictionary could be found in your README file, if your data is particularly large or complex (including multiple linked tables), setting it out in a tabular way can make it much easier for someone to comprehend (an illustrative example follows the list below). A data dictionary could include:
Field name.
Displayed field title.
Field data type (string, integer, date, etc.).
Measures such as min and max values, display width, or number of decimal places.
Default value.
Is-required (Boolean). If 'true', the value cannot be blank, null, or whitespace only.
Reference table name, if the field is a foreign key (linking its values to another table).
Description or synopsis.
Reference data that is used (e.g. diagnosis codes).
Code lists for coded fields, including all possible values (e.g. 1=Male).
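As a simple illustration, a small data dictionary for a hypothetical observations file might be recorded as below; the field names, types, and codes are invented for the example and are not a required template.

```python
# Illustrative only: the fields, types, and codes below are hypothetical.
import pandas as pd

data_dictionary = pd.DataFrame(
    [
        ("participant_id", "string", True, "Unique participant identifier; foreign key to the recruitment table", ""),
        ("visit_date", "date (YYYY-MM-DD)", True, "Date of the measurement visit", ""),
        ("height_cm", "decimal (1 dp)", False, "Standing height in centimetres", ""),
        ("sex", "integer code", True, "Participant sex", "1=Male; 2=Female"),
    ],
    columns=["field_name", "data_type", "required", "description", "codes"],
)

# keep the dictionary alongside the data, e.g. as its own CSV or within the README
data_dictionary.to_csv("data_dictionary.csv", index=False)
```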
General guidance on how to create a data dictionary can be found on the Open Science Framework website.
When dealing with large datasets, a process of appraisal and selection might be necessary in order to determine which (if not all) of the data to deposit in a repository. For more guidance on what to do with such data, see the large data page.
However, it is worth knowing that while the University of Sheffield's institutional repository, ORDA, allows up to 25GB of storage per user by default, this can be increased to 100GB on request via the deposit form. If more than this is required, please contact rdm@sheffield.ac.uk.
If your project involves mixed methods, you should also read the guidance on Qualitative data, and be aware that combining different datasets could potentially lead to participants being identifiable. See the Sensitive data page for more information on what to do in such cases.