Discover, Reuse & Cite

Many kinds of data created as part of a research project are subject to the same rights as literary or artistic work. Such items acquire rights like copyright or more general Intellectual Property rights when they are created. This gives the rights owner control over the exploitation of their work, such as the right to copy and adapt the work, the right to rent or lend it, the right to communicate it to the public and the right to license and distribute.

Source: UK Data Service

Discover

Data discovery

Data discovery is the process of visually navigating data and applying analytics in order to detect patterns, gain insight, answer specific questions, and derive value from the data. 

This stage of the Research Process is a time for reviewing the RDM plan.

 

New data discovery and data creation

Data should be managed so that any scientist (including the collector or data originator) can discover, use and interpret the data after a period of time has passed.

Data mining is the process of finding anomalies, patterns and correlations within large data sets to predict outcomes. Using a broad range of techniques, you can use this information to increase revenues, cut costs, improve customer relationships, reduce risks and more. 

Locating existing data


Identifying and locating sources of existing data can be important for a variety of reasons.

Source: University of Pittsburgh

List of resources to locate existing data

Data Directories

These online directories maintain lists of data sources and repositories across a wide range of disciplines.

re3data - A global registry of research data repositories covering a wide variety of academic subjects in the sciences, social sciences, and humanities.

Open Access Directory of Data Repositories - Lists of open access data repositories for a wide range of subject areas.

General repositories

These repositories maintain data from a wide range of subject areas and are not limited to a particular discipline.

figshare - A repository for sharing all types of research output in any subject - includes papers, figures, posters, slides.

Amazon Web Services Public Data Sets - Hosts a variety of large public datasets, such as Landsat, census, and genomic data. Creating an account may be required and charges may apply for computing time and data transfer.


Discipline related repositories

The following are examples of data repositories that focus on a particular subject area, discipline, or cluster of related disciplines within the broad categories of humanities, sciences, social sciences, and government. 

 

HUMANITIES

LINGUISTICS

OLAC – Open Language Archives Community - An international partnership “creating a worldwide virtual library of language resources,” currently with 58 participating archives.

TROLLing – Tromsø Repository of Language and Linguistics - An open access repository of linguistic data and statistical code.

MUSIC

Mutopia Project - Free sheet music.

 

SCIENCES

BIOLOGY / LIFE SCIENCES

DRYAD - General purpose repository for data underlying scientific and medical publications, historically with a concentration in life sciences.

Gene Expression Atlas - Information on gene expression patterns under different biological conditions, such as different cell types, organism parts, or diseases.

genenames.org (HUGO Gene Nomenclature Committee) - Curated repository of HGNC approved gene names and symbols, gene families, and links to related genomic, proteomic, and phenotypic information.

NCBI (National Center for Biotechnology Information) - Provides access to a variety of sources for biomedical and genomic data, including:

Conserved Domain Database (CDD) - Sequence alignments and profiles representing protein domains conserved in molecular evolution.

Gene - Gene data from a variety of species with related information, such as nomenclature, chromosome location, phenotypes, etc.

Database of Genotypes and Phenotypes (dbGaP) - Data and results from investigations of the interaction of genotypes and phenotypes in humans.

WormBase - Data on the genetics, genomics, and biology of C. elegans and some related nematodes.

UniProt (The Universal Protein Resource) - Collection of databases that provide a comprehensive source for protein sequence and annotation data, including a repository for metagenomics and environmental data.

CHEMISTRY

eCrystals - Mostly open access source of fundamental and derived data from single crystal X-ray structure determinations from the University of Southampton and EPSRC UK National Crystallography Service.

PubChem - Database of chemical substances with descriptive and property information along with bioactivity screening data.

ZINC15 - Database of commercially available compounds with 3-D structure representations in a format ready for virtual screening for potential biological activity.

 

SOCIAL SCIENCES

ECONOMICS

GTAP Database – Global Trade Analysis Project - Global database describing bilateral trade patterns, production, consumption and intermediate use of commodities and services.

GeoFRED® - Geographical Economic Data - Maps of data contained in FRED®. Create customized maps and download data.

 

Data Journals

Many journals can be helpful tools in locating data, although they can play different roles as noted below.

 

Traditional Journals that Publish Data

These traditional "data journals" publish only articles that focus on presenting data, either experimental or computational, or may review experimental methods.

Journal of Physical and Chemical Reference Data - Publishes articles reporting critically evaluated reference data and property measurements.

Journal of Chemical and Engineering Data - Publishes both experimental and computational data.

  

Data Journals or "Data Paper" Journals

These newer style "data journals" primarily publish articles that describe publicly available datasets and link to those datasets. They may also publish articles on data-related topics, such as describing or reviewing certain analytical or statistical methods. However, traditional research articles that actually analyze the data and draw conclusions from that analysis are generally outside the scope of these journals.

Biodiversity Data Journal - Community peer-reviewed and open-access. Promotes the publishing, dissemination and sharing of biodiversity-related data of any kind. Publishes data papers, general articles, software descriptions, species inventories, and more.

Earth System Science Data - An international interdisciplinary journal that provides a distinctive model for publishing papers about original research data sets and encouraging the reuse of high quality data. Includes methods and review articles and a "living data" process for handling datasets that undergo regular updating or extension.

IUCrData - Open-access and peer-reviewed. Provides descriptions of crystallographic datasets and datasets from related disciplines.

Scientific Data - Open-access and peer-reviewed. Its Data Descriptor articles describe data sets, the method of data collection and analyses relating to the quality of the data. They also link to one or more published sources of the data.

 

Mixed Journals

These journals publish a mixture of article types, including "data papers" that describe datasets along with traditional research articles and other formats.

International Journal of Robotics Research - Publishes peer-reviewed data papers and multimedia extensions in addition to articles.

Internet Archaeology - Open access and peer-reviewed. Publishes data papers as well as research articles, methodologies, reviews and more.

Nucleic Acids Research - For more than 20 years, this journal has published a special issue each January reporting on databases related to bioinformatics generally, including nucleic acids, proteins, and genomics.

 

These are only a few examples of journals that can point you to useful data. For more complete listings, check these sites:

Sources of Dataset Peer Review (from the Edinburgh DataShare Wiki)

A Growing List of Data Journals (from Data@MLibrary)

Open Data Journals (from the FOSTER project)

Data Visualisation

Data visualisation is the visual representation of data, used to enable people to both understand and communicate information.

Data Visualisation tools

A variety of tools are available that support data discovery, integration, analysis, and visualization.


Therefore the tools enable

Data Analysis

Data interpretation and analysis is the process of assigning meaning to the gathered information and ascertaining “the conclusions, significance, and implications of the findings.” Source: University of Pittsburgh & University of Oxford

Reuse

Data for reuse & interpretation

Data should be managed so that any scientist (including the collector or data originator) can discover, use and interpret the data after a period of time has passed.


The comprehensive description of the data and contextual information that future researchers need to understand and use the data

Data sharing for reuse & interpretation is good science


A crucial part of ensuring that research data can be shared and reused by a wide range of researchers for a variety of purposes is taking care that those data are accessible, understandable and (re)usable. Source: UK Data Service

Save time and money

Data reuse also enables colleagues to save time and money. 


Data sharing for reuse & interpretation enables peer researchers to validate research findings

By sharing their data, researchers enable others to reproduce and validate their research findings, providing the researcher with transparency, accountability, and material support to strengthen their findings. 

Source: University of Pittsburgh & University of Oxford

Rights in data reuse

Many kinds of data created as part of a research project are subject to the same rights as literary or artistic work. 

Such items acquire rights like copyright or more general Intellectual Property rights when they are created. 

This gives the rights owner control over the exploitation of their work, such as the right to copy and adapt the work, the right to rent or lend it, the right to communicate it to the public and the right to license and distribute.

When data are shared or archived, the original copyright owner retains the copyright.

Copyright is an intellectual property right assigned automatically to the creator. It prevents unauthorised copying and publishing of an original work. Copyright applies to research data and plays a role when creating, sharing and reusing data. Source: UK Data Service

Fair dealing and the FAIR principles in data reuse

Under the fair dealing concept, data can be copied for non-commercial teaching or research purposes, private study, criticism or review without infringing copyright, provided that the owner of the work is sufficiently acknowledged. Source: UK Data Service

In 2016, the ‘FAIR Guiding Principles for scientific data management and stewardship’ were published in Scientific Data. The authors intended to provide guidelines to improve the Findability, Accessibility, Interoperability, and Reuse of digital assets. The principles emphasise machine-actionability (i.e., the capacity of computational systems to find, access, interoperate, and reuse data with none or minimal human intervention) because humans increasingly rely on computational support to deal with data as a result of the increase in volume, complexity, and creation speed of data.

Findable

The first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers. Machine-readable metadata are essential for automatic discovery of datasets and services, so this is an essential component of the FAIRification process. 
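As a rough illustration of what machine-readable metadata can look like, the sketch below builds a small dataset description and serialises it as JSON. It assumes the schema.org "Dataset" vocabulary as the metadata format, which is one common choice and not one this guide prescribes. The function name build_dataset_metadata and every value passed to it (title, DOI, creator names, keywords) are hypothetical placeholders.

```python
import json


def build_dataset_metadata(name, description, doi, license_url, creators, keywords):
    """Return a schema.org-style Dataset description as a plain dict.

    Assumption: schema.org is used as the vocabulary; other schemas
    (for example a repository's own) would use different field names.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": doi,        # persistent identifier, here a DOI URL
        "license": license_url,   # tells reusers what they may do
        "creator": [{"@type": "Person", "name": c} for c in creators],
        "keywords": keywords,
    }


# All values below are hypothetical and only show the shape of a record.
record = build_dataset_metadata(
    name="Example Survey Data",
    description="Hypothetical survey responses used to illustrate metadata.",
    doi="https://doi.org/10.1234/example",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    creators=["Doe, J.", "Roe, R."],
    keywords=["survey", "example"],
)

# Serialised JSON is what a harvester or search index would read.
print(json.dumps(record, indent=2))
```

A record like this would typically be published alongside the data so that both people and software can find the dataset by title, keyword or identifier.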

Accessible

Once the user finds the required data, they need to know how the data can be accessed, possibly including authentication and authorisation.

Interoperable

The data usually need to be integrated with other data. In addition, the data need to interoperate with applications or workflows for analysis, storage, and processing. 

Reusable

The ultimate goal of FAIR is to optimise the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated and/or combined in different settings. 


Information about Persistent Identifiers (PID)

Sharing and licensing for reuse

When publishing research data, researchers need to consider how they want their data to be reused by other researchers. 

Thereafter, researchers need to specify their choice by licensing the data to match the intended uses. Source: UK Data Service

Creative Commons licenses

Creative Commons (CC) licenses allow creators to easily communicate which rights they wish to keep and which rights they wish to waive so that other people can reuse their intellectual property. Source: UK Data Service


GO FAIR: Information about licenses

Research data ownership

Copyright is essential for data sharing and fair dealing

When data are shared or archived, the original copyright owner retains the copyright. Source: UK Data Service

A data archive cannot archive data unless all rights holders are identified and give their permission for the data to be shared. Secondary users need to obtain copyright clearance before data can be reproduced. However, exceptions exist under the fair dealing concept. Source: UK Data Service

SPARC

SPARC aims to inform its members and the broader community and to help them regain and maintain community ownership over data and data infrastructure.

MANTRA - Jeff Haywood - RDM Legacy access and data reuse

8 November 2013

Creative Commons

Creative Commons is a nonprofit organization that helps overcome legal obstacles to the sharing of knowledge and creativity to address the world’s pressing challenges. 

In the traditional publication process, authors typically transfer the copyright in their work to the publisher when the article is published.

However, when authors publish their work via the Open Access process, they retain the copyright of that work. It is important that authors assign a Creative Commons license to determine how their work may be used and shared.

Choose the Creative Commons license which is right for you!

Cite

Make data easy to reuse & cite

This requires clear and detailed data description and annotation. Besides the information that is needed to reuse the data, data also need to be accompanied by information for citing and discovering the data. Source: UK Data Service


Why make data easy to cite?

By documenting your data and recommending appropriate ways to cite your data, you can be sure to get credit for your data products.

Source: Purdue University

Citing data

If you’re reusing a dataset to inform your own work, you’ll want to make sure that you are providing proper recognition. 

Datasets are scholarly products and should be cited as such. 

If you are using a dataset that was deposited in a disciplinary data repository, you may find that the repository has a recommended citation standard.

ICPSR provides useful guidance on data citations and suggests that a citation for a dataset should include the following basic elements:

For general information about citing a dataset, see the following resources:

University of Pittsburgh
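As a minimal sketch of the advice above on citing datasets, the following code assembles a citation string from a handful of commonly used elements. The choice of elements (authors, year, title, version, distributor, persistent identifier) and the output layout are assumptions for illustration only, not ICPSR's or any repository's official style. Where a repository recommends its own citation, use that instead. The class name DatasetCitation and all example values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DatasetCitation:
    """Elements assumed here to make up a basic dataset citation."""
    authors: list[str]
    year: int
    title: str
    version: str
    distributor: str
    identifier: str  # persistent identifier such as a DOI

    def format(self) -> str:
        # Join multiple authors with semicolons, then append the
        # remaining elements in a fixed, illustrative order.
        names = "; ".join(self.authors)
        return (
            f"{names} ({self.year}). {self.title} [dataset]. "
            f"Version {self.version}. {self.distributor}. {self.identifier}"
        )


# Hypothetical example values.
example = DatasetCitation(
    authors=["Doe, J.", "Roe, R."],
    year=2020,
    title="Example Survey Data",
    version="2",
    distributor="Example Data Archive",
    identifier="https://doi.org/10.1234/example",
)

print(example.format())
```

Keeping the elements as separate fields makes it easy to re-render the same citation in whatever style a journal or repository asks for.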