Open science is a rapidly growing movement to make scientific research accessible to all. It encompasses policy changes, publishing, community engagement, equitable access, and more. This page focuses on the requirements, processes, practices, and tools that can help you make research data and code more easily accessible and reproducible while minimizing the time and effort required by the research team.
An increasing number of Federal agencies require grant applications to include a data management plan (DMP) that specifies how research data will be handled during and after the project. Specifying how research data will be openly shared (or why it won't be) is frequently a requirement. dmptool.org is a free, easy-to-use service that provides templates from various agencies and allows you to collaborate with co-authors. Key aspects of your plan include:
Roles and responsibilities of various team members
Who will be responsible for storing, organizing, and documenting data?
Will this be a shared responsibility?
Who will be responsible for ensuring compliance with the DMP?
Though the DMP may not require you to specify it, consider where you will store the data during the project to ensure that research team members have the appropriate level of access.
Code, metadata, data dictionaries
Reproducibility is a key goal of open science. Providing clear descriptions of data sets and variables and documenting how each code file works facilitates the ability of other researchers to understand and reproduce the work.
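As an illustration of what a starting point for a data dictionary might look like, the sketch below reads a small CSV file and produces one entry per variable with a guessed type, a count of missing values, and a description. It is a minimal example under stated assumptions, not a prescribed format: the column names, the sample data, and the function names (infer_type, build_data_dictionary) are hypothetical, and the type guessing is deliberately simple. Descriptions still have to be written by someone who knows the data.

```python
import csv
import io


def infer_type(values):
    """Guess a simple type label from the non-empty string values of a column."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "empty"
    for label, cast in (("integer", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return label
        except ValueError:
            continue
    return "text"


def build_data_dictionary(csv_text, descriptions):
    """Return one data-dictionary entry (a dict) per column of the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        # Treat absent cells the same as empty strings.
        values = [row[name] or "" for row in rows]
        entries.append({
            "variable": name,
            "type": infer_type(values),
            "n_missing": sum(1 for v in values if v == ""),
            "description": descriptions.get(name, "TODO: describe"),
        })
    return entries


# Hypothetical sample data, for illustration only.
SAMPLE = (
    "participant_id,age,score,site\n"
    "1,34,0.82,College Park\n"
    "2,,0.91,Baltimore\n"
    "3,29,,College Park\n"
)

DESCRIPTIONS = {
    "participant_id": "Study-assigned identifier",
    "age": "Age in years at enrollment",
    "score": "Task accuracy, from 0 to 1",
}

if __name__ == "__main__":
    for entry in build_data_dictionary(SAMPLE, DESCRIPTIONS):
        print(entry)
```

Columns without a supplied description are flagged with a TODO placeholder, so gaps in the documentation are visible before the data are shared.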
Projects that rely on secondary data may not have the rights to re-share or distribute the data. However, the code developed to process and analyze those existing data sets serves as an important record of how the project was executed and, in most cases, can still be shared.
Following best practices for data management and programming sets you up for successfully sharing data and code at the end of your project by keeping files organized and commented.
What data will be shared? How and where?
Data sets containing individually identifiable information or information that could be used to reverse-identify individuals need to be "sanitized" before they are shared. This should be done in accordance with the IRB requirements for the project. Johns Hopkins University offers some helpful principles regarding de-identification.
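To make the idea concrete, the sketch below shows three common de-identification steps applied to a single record: dropping direct identifiers, replacing the participant ID with a keyed hash, and coarsening age into ten-year bands. This is only an illustration under stated assumptions. The column names, the list of identifiers, and the function names are hypothetical, and these steps alone do not guarantee that a data set meets your IRB requirements; indirect identifiers and small cell sizes need separate review.

```python
import hashlib
import hmac

# Hypothetical column names treated as direct identifiers in this example.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def pseudonymize(value, secret_key):
    """Replace an ID with a keyed hash so it cannot be looked up without the key."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def age_band(age, width=10):
    """Coarsen an exact age into a band such as '30-39'."""
    low = (int(age) // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(record, secret_key):
    """Return a copy of the record with identifiers removed or coarsened."""
    clean = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    clean["participant_id"] = pseudonymize(str(record["participant_id"]), secret_key)
    if "age" in clean:
        clean["age"] = age_band(clean["age"])
    return clean


if __name__ == "__main__":
    # The key is a placeholder; in practice it would be stored securely
    # and never shared alongside the data.
    key = b"replace-with-a-secret-key"
    record = {
        "participant_id": 1017,
        "name": "Example Person",
        "email": "person@example.org",
        "age": 34,
        "score": 0.82,
    }
    print(deidentify(record, key))
```

In this example an age of 34 is reported as "30-39", and the same participant ID always maps to the same pseudonym under a given key, so records can still be linked across files without exposing the original ID.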
Funding agencies and disciplines may have recommended repositories for data sharing. At UMD, most studies can also be deposited in the Digital Repository at the University of Maryland (DRUM). The Open Science Framework also offers long-term digital storage with persistent identifiers.
Effort and budget implications
Though some practices that facilitate data sharing and open science can be built into a research project, it still takes additional time and effort to make files interpretable and accessible to others and to de-identify data. Wherever possible, budget for this extra effort based on the anticipated complexity of the data and code.
If the repository you will use for data sharing charges a fee, be sure to budget for that as well. Many repositories offer free storage up to a certain size. For example, DRUM is free for up to 15 GB in 2 GB file increments.
Perhaps the single most useful tool for facilitating the computing aspects of open science is the Open Science Framework (OSF). OSF is a free tool, supported by the UMD Libraries, that aims to support researchers throughout the lifecycle of their projects.
UMD PACT (Publishing, Access, and Contract Terms) is a campus hub for open research and scholarship.
UMD Libraries’ Research Data Services can provide guidance on data management plans and archiving.