Open Data repositories

What are Open Data repositories?

Open data repositories are where researchers deposit and archive datasets from original research, and where others can discover them. Some repositories are designed around specific disciplines, others around specific data types.


Identifying and locating sources of existing data can be important for a variety of reasons, including: asking new questions or providing a new analysis of the data, comparing results from various studies, replicating and validating previous results, developing or testing computational models, and extending a study across time, geography, or other variables by incorporating data from multiple datasets.

The resources listed below can help you find relevant datasets for use in your research. Many also serve as repositories if you are interested in a place to deposit and share your own research data. University of Pittsburgh 

Open Data Directories: Lists

These online directories maintain lists of data sources and repositories across a wide range of disciplines.


Open Access Directory (OAD) - Lists of open access data repositories for a wide range of subject areas.

About the Open Access Directory

The Open Access Directory (OAD) is a wiki where the open access (OA) community can create and support simple factual lists about open access to science and scholarship. It launched on April 30, 2008.

A wiki is a natural solution for hosting, organizing, and maintaining lists. On a wiki, revised continuously by the community, a list can be more comprehensive and up to date than the same list maintained by an individual. By bringing many OA-related lists together in one place, OAD will make it easier for users, especially newcomers, to discover them and use them for reference. The easier they are to discover, the more effectively they can spread useful, accurate information about OA.

The goal is for the OA community itself to enlarge and correct the lists with little intervention from the editors or editorial board. For quality control, we limit editing privileges to registered users.

As far as possible, the OAD lists will be limited to brief factual statements without narrative or opinion. One reason is that there are already outlets for narrative texts, such as Wikipedia and OASIS. Another reason is that factual lists create a much lighter load for the editors and editorial board.

re3data - A global registry of research data repositories covering a wide variety of academic subjects in the sciences, social sciences, and humanities.

Research data are valuable and ubiquitous. Permanent access to research data is a challenge for all stakeholders in the scientific community. Long-term preservation and the principle of open access to research data offer broad opportunities for the scientific community. In the last decade, more and more universities and research centres established research data repositories allowing permanent access to data sets in a trustworthy environment. Due to disciplinary requirements, the landscape of data repositories is very heterogeneous. Thus, it is difficult for researchers, funding bodies, publishers, and scholarly institutions to select appropriate repositories for storing and searching research data.

re3data is a global registry of research data repositories from different academic disciplines. It includes repositories that enable permanent storage of and access to data sets for researchers, funding bodies, publishers, and scholarly institutions. re3data promotes a culture of sharing, increased access, and better visibility of research data. The registry went live in autumn 2012 and has been funded by the German Research Foundation (DFG).

Open Data Repository

The Open Data Repository (ODR) has built a platform with an easy-to-use interface that allows researchers to quickly create online databases to share data for collaboration, for publication, or as a citable repository. As the project matures, it aims to be a simple system for managing, publishing, and working with research data that will promote semantic approaches to data sharing.

The Open Data Repository's Data Publisher allows researchers, graduate students, and the general public to quickly create database structures and publish data on the web. Using the drag-and-drop form designer, you can easily create your database schema and then populate it with metadata, files, and graphs.

General Repositories

These repositories maintain data from a wide range of subject areas and are not limited to a particular discipline.

Figshare

Figshare is a cloud-based repository where users can make all of their research outputs, in any subject, available in a citable, shareable, and discoverable manner. Accepted outputs include papers, figures, posters, and slides.

Academics are busy enough. Figshare's features aim to help you organize your research and get as much impact for it as possible, without adding time or effort to your day.

Registry of Open Data on AWS - Hosts a variety of large public datasets, such as Landsat, census, and genomic data. Creating an account may be required, and charges may apply for computing time and data transfer.

This registry exists to help people discover and share datasets that are available via AWS resources. It also offers tutorials with associated SageMaker Studio Lab notebooks and usage examples for the listed datasets.

The registry includes datasets from the Allen Institute for Artificial Intelligence (AI2), Digital Earth Africa, Data for Good at Meta, NASA Space Act Agreement, NIH STRIDES, the NOAA Big Data Program, the Space Telescope Science Institute, and the Amazon Sustainability Data Initiative.

Discipline-Specific Repositories

The following are examples of data repositories that focus on a particular subject area, discipline, or cluster of related disciplines within the broad categories of humanities, sciences, social sciences, and government. 

Humanities

LINGUISTICS

OLAC – Open Language Archives Community - An international partnership “creating a worldwide virtual library of language resources,” currently with 58 participating archives.

TROLLing (Tromsø Repository of Language and Linguistics) - An open access repository of linguistic data and statistical code.


MUSIC

Mutopia Project - Free sheet music.


Sciences

BIOLOGY / LIFE SCIENCES

DRYAD - General purpose repository for data underlying scientific and medical publications, historically with a concentration in life sciences.

Gene Expression Atlas - Information on gene expression patterns under different biological conditions, such as different cell types, organism parts, or diseases.

genenames.org (HUGO Gene Nomenclature Committee) - Curated repository of HGNC approved gene names and symbols, gene families, and links to related genomic, proteomic, and phenotypic information.

NCBI (National Center for Biotechnology Information) - Provides access to a variety of sources for biomedical and genomic data, including:

     Conserved Domain Database (CDD) - Sequence alignments and profiles representing protein domains conserved in molecular evolution.

     Gene - Gene data from a variety of species with related information, such as nomenclature, chromosome location, phenotypes, etc.

     Database of Genotypes and Phenotypes (dbGaP) - Data and results from investigations of the interaction of genotypes and phenotypes in humans.

WormBase - Data on the genetics, genomics, and biology of C. elegans and some related nematodes.

UniProt (The Universal Protein Resource) - Collection of databases that provide a comprehensive source for protein sequence and annotation data, including a repository for metagenomics and environmental data.



CHEMISTRY

eCrystals - Mostly open access source of fundamental and derived data from single crystal X-ray structure determinations from the University of Southampton and EPSRC UK National Crystallography Service.

PubChem - Database of chemical substances with descriptive and property information along with bioactivity screening data.

ZINC15 - Database of commercially available compounds with 3-D structure representations in a format ready for virtual screening for potential biological activity.


DRYAD

Mission

The Dryad Digital Repository is a curated resource that makes research data discoverable, freely reusable, and citable. Dryad provides a general-purpose home for a wide diversity of data types.

Dryad originated from an initiative among a group of leading journals and scientific societies to adopt a joint data archiving policy (JDAP) for their publications, and the recognition that open, easy-to-use, not-for-profit, community-governed data infrastructure was needed to support such a policy. These remain our guiding principles.

Dryad’s vision is to promote a world where research data is openly available, integrated with the scholarly literature, and routinely re-used to create knowledge.

Our mission is to provide the infrastructure for, and promote the re-use of, data underlying the scholarly literature.


Zenodo

Passionate about Open Science!

Built and developed by researchers, to ensure that everyone can join in Open Science.

The OpenAIRE project, in the vanguard of the open access and open data movements in Europe, was commissioned by the EC to support its nascent Open Data policy by providing a catch-all repository for EC-funded research. CERN, an OpenAIRE partner and pioneer in open source, open access, and open data, provided this capability, and Zenodo was launched in May 2013.

In support of its research programme CERN has developed tools for Big Data management and extended Digital Library capabilities for Open Data. Through Zenodo these Big Science tools could be effectively shared with the long tail of research.

Open Science knows no borders!

The need for a catch-all is not restricted to one funder, or one nation, so the concept caught on, and Zenodo rapidly started welcoming research from all over the world, and from every discipline.

The digital revolution has necessitated a retooling of the scholarly processes to handle data and software, but this is proceeding at varying speeds across different communities, disciplines, and nations. To ensure no one is left behind through lack of access to the necessary tools and resources, Zenodo makes the sharing, curation, and publication of data and software a reality for all researchers.

Social Sciences

ECONOMICS

GTAP Database – Global Trade Analysis Project - Global database describing bilateral trade patterns, production, consumption, and intermediate use of commodities and services.


Data Journals

Many journals can be helpful tools in locating data, although they can play different roles as noted below.

Traditional Journals that Publish Data

These traditional "data journals" publish only articles that focus on presenting experimental or computational data, or that review experimental methods.

Journal of Physical and Chemical Reference Data - Publishes articles reporting critically evaluated reference data and property measurements.

Journal of Chemical and Engineering Data - Publishes both experimental and computational data.


Data Journals or "Data Paper" Journals

These newer style "data journals" primarily publish articles that describe publicly available datasets and link to those datasets. They may also publish articles on data-related topics, such as describing or reviewing certain analytical or statistical methods. However, traditional research articles that actually analyze the data and draw conclusions from that analysis are generally outside the scope of these journals.

Biodiversity Data Journal - Community peer-reviewed and open-access. Promotes the publishing, dissemination and sharing of biodiversity-related data of any kind. Publishes data papers, general articles, software descriptions, species inventories, and more.

Earth System Science Data - An international interdisciplinary journal that provides a distinctive model for publishing papers about original research data sets and encouraging the reuse of high quality data. Includes methods and review articles and a "living data" process for handling datasets that undergo regular updating or extension.

IUCrData - Open-access and peer-reviewed. Provides descriptions of crystallographic datasets and datasets from related disciplines.

Scientific Data - Open-access and peer-reviewed. Its Data Descriptor articles describe data sets, the method of data collection, and analyses relating to the quality of the data. They also link to one or more published sources of the data.


Mixed Journals

These journals publish a mixture of article types, including "data papers" that describe datasets along with traditional research articles and other formats.

International Journal of Robotics Research - Publishes peer-reviewed data papers and multimedia extensions in addition to articles.

Internet Archaeology - Open access and peer-reviewed. Publishes data papers as well as research articles, methodologies, reviews, and more.

Nucleic Acids Research - For more than 20 years, it has published a special issue each January that reports on databases containing data related to bioinformatics generally, including nucleic acids, proteins, and genomics.


For more complete listings of journals that can point you to useful data, check these sites:

Sources of Dataset Peer Review (from the Edinburgh DataShare Wiki)

A Growing List of Data Journals (from Data@MLibrary)

Open Data Journals (from the FOSTER project)
