Collect & Capture

The WHAT questions

Research data are very much about when they are used as well as what they constitute and the purpose for which they are to be used.


New to using data? See the UK Data Service for assistance.

Types of data

Research data exist in many different forms: textual, numerical, databases, geospatial, images, audio-visual recordings, and data generated by machines or instruments. Digital data exist in specific file formats, which are coded so that a software programme can read and interpret them.

Using standard and interchangeable or open lossless data formats helps to ensure the longer-term usability of data. For long-term preservation, digital data are converted to such formats. UK Data Service

Research data can be classified in different ways, for example based on their:

Ghent University, University of Pittsburgh

UK Data Service

Secondary data


Data should be managed so that any researcher can discover, use and interpret the data after a period of time has passed.

Reusing data in this way is known as secondary data use.

To prepare data for secondary research, researchers should document data appropriately. They should also explain the procedures and fieldwork methods, the objectives and methodology of the research, and explicitly describe the meanings of variables and codes used. Additionally, they should describe any derivation, transformations, de-identification (pseudonymisation/anonymisation) or data cleaning carried out.

They should also ensure that data are held in an organised manner. Documentation is invaluable in enabling secondary users to contextualise data and conduct better, informed re-use of the material. UK Data Service

MANTRA - John MacInnes - Primary data versus secondary data

4 May 2012

MANTRA - John MacInnes - Issues with secondary data

4 May 2012

Research data formats

A file format is a standard way that information is encoded for storage in a computer file. It specifies how bits are used to encode information in a digital storage medium. File formats may be either proprietary or free. Wikipedia

Data files should be clearly named, well organised and structured, of good quality, and version-controlled throughout the research. It is vital to develop suitable procedures before data gathering starts in order to adhere to any conventions, instructions, guidelines or templates that will help to ensure quality and consistency across a data collection. UK Data Service

A file format describes how information is stored within a digital file. Although each file format is unique, different file formats exist for similar types of information (e.g. text can be stored in a plain text file as well as in a Word file).

On most computer systems, the format of a file is indicated by the ‘extension’ in the filename (e.g. .txt, .csv). The extension provides an immediate clue about the type of data within a file. For example, we expect that a file with a .jpg extension is an image, whereas a .docx file should contain formatted text.
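As a rough illustration of how an extension hints at a file's contents, the sketch below uses Python's standard `pathlib` and `mimetypes` modules. The function name `describe_file` and the sample file names are hypothetical. Note that the guess is based on the name alone, so it is a clue rather than a guarantee of what the file actually holds.

```python
import mimetypes
from pathlib import Path
from typing import Optional, Tuple


def describe_file(filename: str) -> Tuple[str, Optional[str]]:
    """Return the extension and the media type that the extension suggests."""
    extension = Path(filename).suffix.lower()
    # guess_type inspects only the file name; it never opens the file itself.
    media_type, _encoding = mimetypes.guess_type(filename)
    return extension, media_type


# Hypothetical file names, used only to show the idea.
for name in ["interview_notes.txt", "site_photo.jpg"]:
    print(name, describe_file(name))
```

A renamed or corrupted file can carry a misleading extension, which is one reason to record the intended format in your documentation as well.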

Simple vs complex formats: e.g. the .txt format is a very simple way of storing text, while a .docx file has more complex properties.

Examples of recommended file formats for different types of data can be found via:

Ghent University

Choosing the file format

The format of the electronic data files you work with during your research may be determined by the research equipment and computer hardware and software that you have access to. However, for long-term preservation and ease of sharing, best practices may dictate that the files be converted to a different format after your project has ended. Give some thought to this eventuality at the outset. Considerations include:

University of Pittsburgh


Stanford University Libraries - Data Management Services provides a useful overview of preferred file formats. From the Stanford resource:

Additional helpful guidelines for selecting file formats can be found at these websites:

University of Pittsburgh

The WHY questions

Research data are very much about when they are used as well as what they constitute and the purpose for which they are to be used. UK Data Service

Data Capture

Why document data?

Now where did I put that file?

Finding and reusing your data will be easier, both for you and for other researchers, if you give a little thought early in the process to how you will name your data files and what file formats you will use to store your data. If you are planning to archive or share your data, you will also want to consider best practices for describing your data.

University of Pittsburgh

A crucial part of ensuring that research data can be shared and reused by a wide range of researchers for a variety of purposes is taking care that those data are accessible, understandable and (re)usable.

This requires clear and detailed data description and annotation. Besides the information that is needed to reuse the data, data also need to be accompanied by information for citing and discovering the data.

UK Data Service

Data documentation is the comprehensive description of the data, together with the contextual information that future researchers need to understand and use the data.

Documentation deposited alongside data files should enable users, with no prior knowledge of the research project and data collected, to understand exactly how the research was carried out and what the data mean, in order to (re)use the data correctly in their respective projects and for their respective purposes.

Original researchers wishing to return to their data some time later, or new users wanting to use data, need sufficient contextual and explanatory information to make sense of those data.

Research data should always be accompanied by documentation because it:

As such, documentation is an essential step in making your data FAIR.

Ghent University

File Naming

A File Naming Convention (FNC) is a framework for naming your files in a way that describes what they contain and how they relate to other files.

A file naming convention (FNC) can help you stay organized by making it easy to identify the file(s) that contain the information you are looking for just from their names, and by grouping files that contain similar information close together. A good FNC can also help others better understand and navigate through your work. Purdue University

File name elements

Researchers are advised to decide on a naming convention for files at the start of the research project.

File names can be constructed using the following elements:

Example: CONS_INT1_12-03-2019.rtf. 

Ghent University

University of Pittsburgh
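The example file name above can be reproduced with a small helper, sketched here under stated assumptions: the element names (`project`, `content`, `created`) are hypothetical labels for the three parts of the example, and the date is read as day-month-year. It is one possible convention, not a prescribed one.

```python
from datetime import date


def build_file_name(project: str, content: str, created: date, extension: str) -> str:
    """Join the naming elements with underscores, mirroring the example above."""
    # Day-month-year is assumed here to match the example. An ISO 8601 date
    # (YYYY-MM-DD) is a common alternative because it sorts chronologically.
    date_part = created.strftime("%d-%m-%Y")
    return f"{project}_{content}_{date_part}.{extension}"


print(build_file_name("CONS", "INT1", date(2019, 3, 12), "rtf"))
```

Generating names from a single function like this keeps every file in a project consistent, which is the main benefit of agreeing a convention before data gathering starts.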

Version Control


Versioning refers to saving new copies of your files when you make changes so that you can go back and retrieve specific versions of your files later. Saving multiple versions makes it possible to decide at a later time that you prefer an earlier version. You can then immediately revert to that version instead of having to retrace your steps to recreate it. University of Pittsburgh


Version control is a good research practice in the collaborative research environment. UK Data Service

When you work with different versions of a file, it can be a challenge to locate the 'correct' version or to know how versions differ from each other. If not done well, it can even be difficult to know which file preceded the other.

The matter is complicated even further when files are kept in multiple locations and multiple users edit them. To avoid confusion and safeguard against accidental loss, a versioning system can be put in place.

Example:

Ghent University

In its most basic form, versioning relies on a sequential numbering system. Within a given version number category (major, minor), these numbers are generally assigned in increasing order and correspond to changes in the data. The US Geological Survey recommends the following structure: 

 

University of Pittsburgh
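The idea of sequential major and minor version numbers can be sketched in a few lines of Python. This is an illustration only and is not the US Geological Survey structure: the two-part "major.minor" string, the function name `bump_version`, and the choice to reset the minor number after a major change are all assumptions.

```python
def bump_version(version: str, change: str) -> str:
    """Return the next version number for a 'major.minor' version string."""
    major, minor = (int(part) for part in version.split("."))
    if change == "major":
        # Assumed convention: a major change restarts the minor count at 0.
        return f"{major + 1}.0"
    if change == "minor":
        return f"{major}.{minor + 1}"
    raise ValueError("change must be 'major' or 'minor'")


print(bump_version("1.2", "minor"))
print(bump_version("1.2", "major"))
```

Whatever scheme you adopt, recording what changed between versions alongside the number makes it easier to tell versions apart later.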

MANTRA - Richard Rodger - Organising data

13 May 2014

MANTRA - Richard Rodger - Organising data (short)

1 May 2020

MANTRA - Stephen Lawrie - File transformation

30 May 2014

MANTRA - Jeff Haywood - Importance of good file management in research

30 November 2011

MANTRA - Lynn Jamieson - Documenting research data

4 May 2012

MANTRA - Lynn Jamieson - Importance of documenting data in research

4 May 2012

John MacInnes - Tips on Documentation

3 June 2014

MANTRA - John MacInnes - Data documentation in secondary data analysis

4 May 2012

Metadata

What is metadata?

Metadata is "data that provides information about other data", but not the content of the data, such as the text of a message or the image itself. There are many distinct types of metadata, including descriptive metadata: the descriptive information about a resource, which is used for discovery and identification. Wikipedia

Why is metadata so important?

Librarians and publishers assign metadata to data in accordance with international standards, including ISO standards.

The data descriptors include:

 

UK Data Service

How will you ensure data quality?

Librarians, publishers and data scientists make use of internationally recognised metadata practices, which ensure the data are described to...


UK Data Service

Metadata represents data about data. Metadata enriches the data with information that makes it easier to find, use and manage. For instance, HTML tags define layout for human readers. Semantic metadata helps computers to interpret data by adding references to concepts in a knowledge graph. 

Metadata are an important subset of core data documentation

Collating and recording metadata is important for the purposes of cataloguing, citing, discovering and retrieving data collections. Metadata are a subset of core data documentation providing standardised, structured information.

Metadata are intended for reading by machines, and help to explain the purpose, origin, time references, geographic location, creator, access conditions and terms of use of a data collection. Without this essential documentation, collections become of limited value simply because researchers and reusers will not be able to search for or cite the data collection. UK Data Service
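To make the idea of machine-readable metadata concrete, here is a minimal sketch of a metadata record covering the elements listed above (purpose, origin, time references, geographic location, creator, access conditions and terms of use). The key names are illustrative and do not follow any particular metadata standard, and every value is a hypothetical placeholder.

```python
import json

# Descriptive elements named in the text above; the key names are illustrative.
REQUIRED_FIELDS = [
    "purpose",
    "origin",
    "time_references",
    "geographic_location",
    "creator",
    "access_conditions",
    "terms_of_use",
]


def missing_fields(record: dict) -> list:
    """List the required descriptive fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Every value below is a hypothetical placeholder, not a real data collection.
example_record = {
    "purpose": "Study of commuting habits",
    "origin": "Face-to-face interviews",
    "time_references": "2019-03 to 2019-06",
    "geographic_location": "Example City",
    "creator": "A. Researcher",
    "access_conditions": "Registered users only",
    "terms_of_use": "Non-commercial research use",
}

print(json.dumps(example_record, indent=2))
print("Missing:", missing_fields(example_record))
```

In practice a repository or data archive will usually specify the metadata schema to use, so a check like this would be adapted to that schema's required elements.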

Data Sharing

Do your chosen formats and software enable sharing and long-term access to the data?

A crucial part of ensuring that research data can be shared and reused by a wide range of researchers for a variety of purposes is taking care that those data are accessible, understandable and (re)usable.

This requires clear and detailed data description and annotation. Besides the information that is needed to reuse the data, data also need to be accompanied by information for citing and discovering the data.  UK Data Service

Collaborative research

Collaborative research brings additional data management challenges for providing shared storage, access and the transfer of research data across the various partners or institutions. UK Data Service

Accessible data & authentication

The list of typical requirements for researchers working in a collaborative environment.

UK Data Service

Why consider data copyright?


Copyright is essential for data sharing and fair dealing

When data are shared or archived, the original copyright owner retains the copyright. UK Data Service

A data archive cannot archive data unless all rights holders are identified and give their permission for the data to be shared. Secondary users need to obtain copyright clearance before data can be reproduced. However, exceptions exist under the fair dealing concept. UK Data Service

Research Data ownership

To help inform our members and the broader community regain and maintain community ownership over data and data infrastructure. 

Creative Commons is a nonprofit organization that helps overcome legal obstacles to the sharing of knowledge and creativity to address the world’s pressing challenges. 

In the traditional publication process, authors transfer the copyright in their work to the publisher when the article is published.

However, when authors publish their work via the Open Access process, they retain the copyright of that work. It is important that authors assign a Creative Commons license to determine how their work may be used and shared.

Choose the Creative Commons license which is right for you!

The WHEN questions

Research data are very much about when they are used as well as what they constitute and the purpose for which they are to be used. UK Data Service

Research Lifecycle

Data is collected, captured, managed, stored and preserved throughout the research process.

The HOW questions

What standards or methodologies will be used?

How data is managed depends on the types of data involved, how the data is collected and stored, and how it is used throughout the research lifecycle.


Observational 

Observational data is 

Survey Data

Qualitative data

UK Data Service

Derived or compiled


Derived or compiled data result from processing or combining 'raw' data. They are often reproducible, but expensive to reproduce, e.g. compiled databases, text mining, aggregate census data. UK Data Service

Reference or canonical 


Reference or canonical data is a (static or organic) conglomeration or collection of smaller (peer-reviewed) datasets, most probably published and curated, e.g. gene databanks, crystallographic databases. UK Data Service

Experimental data


Experimental data is 

UK Data Service

Simulation


Simulation data is generated from test models, where the model and metadata may be more important than the output data from the model, e.g. economic or climate models.

Computational social science

UK Data Service

Qualitative

MANTRA - Lynn Jamieson - Challenges in working with qualitative data

4 May 2012

This unit introduces you to concepts around data, what constitutes research data, and the multiple forms of data that make up the digital world.

After completing this unit you will:


▪ Be able to distinguish between various types of research data.

▪ Recognise the importance of managing research materials.

▪ Be aware of challenges presented by data in society.

▪ Understand the need for data science and data literacy.

The aim of this unit is to introduce you to the concepts of research data organisation, explain why it is important, and what constitutes good data file management.

After completing this unit you will:


▪ Appreciate why research data organisation is important as your project grows.

▪ Understand data file naming, re-naming and versioning conventions.

▪ Be prepared to manage your code and track workflows to make them shareable and reproducible.

▪ See how electronic lab notebooks can support the collaborative research process.

This unit introduces you to the concepts of documentation and metadata.

After completing this unit you will:


▪ Understand why documenting your research data is important, and why documentation is important for future users of the data.

▪ Know why and when to use metadata.

▪ Understand the importance of citing data, and how to do it.

This unit introduces you to the concepts of data file formatting, compression, normalisation, and other kinds of data transformation and why they are useful.

After completing this unit you will:


▪ Understand why research data formatting and transformation is important.

▪ Know how to make decisions about data file formatting, compression, normalisation and other transformations.

▪ Use the information featured in the course to improve your research data management practice.


The UK Data Service provides various training opportunities.

Data Skills Modules

There is a wealth of data available for reuse in research and reports. These free, interactive tutorials are designed for anyone who wants to start using secondary data. They show you how to get started with finding good quality data, understanding it and starting your analyses. 

New to using data

Best practice and training for researchers new to accessing and using data in our collection. Includes advice and tools to correctly cite data; student-specific information on our Dissertation Award for undergraduates; and more.

Survey Data

Survey data, including data from long-running surveys, series and longitudinal studies, are a major part of social science research. Learn how to use survey and longitudinal data through training resources including videos, on-demand webinars and written guides. 

Qualitative data

Qualitative research gives a voice to the lived experience, offering researchers a deeper insight into a topic or individuals’ experiences. Qualitative data can be combined with quantitative data to enhance understanding around a policy or topic in a way that quantitative data by itself often cannot.

Computational social science

New technologies, resources and methods are constantly changing how researchers interact with and use data. This section provides the latest insights, learning materials and practical advice on rapidly developing techniques including modelling, simulation, big data, web-scraping, social media and more. 

Geography and data

Learn how to use updateable subnational population information to assess the impact of policies and tackle area-based issues, such as neighbourhood deprivation and poor health. Includes guidance on generating local survey estimates and mapping data from key sources such as the Census and the UK Household Longitudinal Study.

International data

Our international macrodata contain socio-economic time series data aggregated to a country or regional level for a range of countries over a substantial time period.