Frequently Asked Questions


What is the Data Curation Network (DCN)?

The Data Curation Network is a Sloan-funded project that aims to conceptualize and develop a “network of expertise” model for U.S. academic libraries to collectively provide data curation services that support digital research data deposit into repositories for open access and reuse. The project launched as a one-year planning phase in May 2016 to develop a model for how the DCN will function, reflecting the experiences of, and demand for data curation services at, our six partner institutions, each of which separately provides repository and curation services to its campus constituents. A future implementation phase will pilot the DCN model across our institutions to demonstrate the value proposition that a network of expertise provides more effective data curation services, across a wider variety of data types, disciplines, and file formats, than any single institution might offer alone.


Who is involved with the DCN?

The DCN project is in its initial planning phase, developing a model for how the Data Curation Network might function once implemented. The planning phase brings together the perspectives of research data librarians, academic library administrators, and data curation subject experts from six major academic institutions, led by the University of Minnesota. Project members are described at https://sites.google.com/site/datacurationnetwork/people.


Why is data curation needed?

Researchers are required by many federal [1] and private funders, and by publishers [2], to make the digital data underlying their research openly available for sharing and reuse. However, for data to be fully publicly accessible to search, retrieve, and analyze, specialized curatorial actions must be taken to best prepare these data for reuse, including quality assurance, file integrity checks, documentation review, metadata creation for discoverability, and file transformations into archival formats.
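The file integrity checks mentioned above are commonly implemented as checksum (fixity) verification: a checksum is recorded at deposit time and recomputed later to confirm the file has not been altered or corrupted. A minimal sketch in Python, using SHA-256 (the function names and file paths here are illustrative, not part of any DCN specification):

```python
import hashlib


def sha256_checksum(path, chunk_size=65536):
    """Compute a SHA-256 fixity checksum, reading the file in chunks
    so that large data files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, recorded_checksum):
    """Return True if the file's current checksum matches the one
    recorded when the file was deposited."""
    return sha256_checksum(path) == recorded_checksum
```

In practice a repository would store the recorded checksum alongside the file's metadata and rerun the verification on a schedule, flagging any mismatch for curatorial review.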

[1]  The 2013 Public Access to Federally Funded Research memo from the White House Office of Science and Technology Policy directed most federal grant-funding agencies to develop policy requirements for public access to resulting articles and data. The memo is available at https://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf. Analyses of federal funders' responses to the memo are available at https://www.lib.umn.edu/datamanagement/funding.

[2]  Some publishers have specific requirements for how authors must publicly share the data related to their publication. See the author guidance for data sharing from PLoS ONE (http://journals.plos.org/plosone/s/data-availability) or Nature (http://www.nature.com/authors/policies/availability.html#data) as examples.


Why are libraries involved with data curation?

Supporting researchers with data curation is an important role that academic research libraries aspire to fill as we transform our workforce to assume greater digital stewardship responsibilities in the academy [3]. Libraries are experts at identifying, selecting, organizing, describing, preserving, and providing access to information materials, both print and digital. As a cornerstone of the academic institution, libraries are persistent, with a demonstrated and sustainable model for providing services such as collection management, preservation, and access to a broad variety of information.
[3]  See for example the 2015 “Open Letter to PLoS: Libraries' Role in Data Curation,” co-authored by DCN project personnel and signed by 48 signatories, outlining the value of institutional repositories for meeting the needs of journal data sharing requirements (https://datacurepublic.wordpress.com/open-letter-to-plos-libraries-role-in-data-curation/).

What about subject repositories?

While discipline-specific data repositories (e.g., ICPSR, GenBank) manage largely homogeneous data sets, our general-purpose repositories for data, called institutional repositories (IRs), can accommodate a wider range of data formats and metadata from many disciplines. IRs support the long tail of research subjects that do not have a subject repository home and/or data that must stay at the home institution in accordance with ownership and research IP policies.

What is a “network of expertise” model?

A network of expertise model enables multiple institutions to deliver a unique service by sharing relevant expert staff. Kirchner et al. (2015) define networks of expertise as “... a way to implement and sustain new information services for research…. Through this method, existing organizations will start to change as they integrate experts more fully into the daily work and as a greater number of information professionals share knowledge” (p. 16). Additionally, they define a network of expertise as “a shared approach to address specialized information needs or to solve a common problem” (p. 17).
Reference: Kirchner, J., Diaz, J., Henry, G., Fliss, S., Culshaw, J., Gendron, H., and Cawthorne, J. E. (2015). The Center of Excellence Model for Information Services. Retrieved from the Council on Library and Information Resources, http://www.clir.org/pubs/reports/pub163.

What are some examples of networked or shared staffing models in the library science field?

Successful models in the library and information science discipline for sharing expertise and staff include the 2CUL project, which supports collection development and cataloging services jointly at Columbia University and Cornell University (https://www.2cul.org); the Digital Preservation Network (http://www.dpn.org), which consists of geographically redundant nodes such as the Academic Preservation Trust; and the DuraSpace (http://www.duraspace.org) repository software projects, such as Fedora and DSpace, whose code bases and service models have been developed by a global community.

Why is a network of expertise model needed for data curation?

Due to the heterogeneous and multidisciplinary nature of research data generated in our nation's academic institutions, the skills and expertise required to curate data (to prepare, arrange, describe, and test data for optimal reuse) cannot reasonably be provided by a few experts siloed at single institutions. University of Minnesota research [4] has shown that multiple data curation experts are needed to effectively curate the diverse data types an IR typically receives. Local implementation of a small-scale distributed data curation staffing model [5] for the Data Repository at the University of Minnesota has proven to be a workable solution. Given limited resources, it is unrealistic, however, to expect that every academic library can hire a data curator for every data type (e.g., GIS, spreadsheet/tabular, statistical/survey, video/audio, computer code) or discipline-specific data set (genomic sequence, chemical spectra, biological image) that one might encounter. Similarly, each type of data curation expertise might only be needed occasionally, depending on the disciplinary makeup at each institution.


[4] In order to better understand the capacity and limitations of accepting and curating data in the Digital Conservancy, the University of Minnesota Libraries ran a successful Data Curation Pilot in 2013. Five data sets from five faculty in diverse fields provided a draft workflow model for data curation in the libraries. The full report and recommendations from this pilot, which provided a roadmap for implementing these services, were published as: Johnston, Lisa R. (2014). A Workflow Model for Curating Research Data in the University of Minnesota Libraries: Report from the 2013 Data Curation Pilot. Retrieved from the University of Minnesota Digital Conservancy, http://hdl.handle.net/11299/162338.


[5] The business and service model for the Data Repository for the University of Minnesota (DRUM) recommended a distributed and coordinated staffing model, successfully instituted in November 2014, that includes five data curation experts covering data in the physical sciences, health sciences, social sciences, spatial data, and the digital humanities. This model was published as:

University of Minnesota Libraries. (2015). The Supporting Documentation for Implementing the Data Repository for the University of Minnesota (DRUM): A Business Model, Functional Requirements, and Metadata Schema. Retrieved from the University of Minnesota Digital Conservancy, http://hdl.handle.net/11299/171761.

What are some projects related to the goals of the DCN?

The DCN project involves data curation workflows and best practices, shared staffing models, and the skill sets needed for data curators. On data curation workflows and best practices, relevant efforts include the Research Data Alliance (RDA) Publishing Data Workflows working group (Bloom et al., 2015) and the Stewardship Gap project (http://www.colorado.edu/ibs/cupc/stewardship_gap), which seeks to “ensure sustainable long-term access to valuable sponsored research data and creative content”; our team will closely monitor both. In the area of data curation skill sets for professional staff, the research underway by Camille Thomas and Richard J. Urban (Florida State University School of Information) assessing perceptions of educational preparation for data curation services will be relevant. Finally, the Data Curation Network model is distinct from the existing collaboration-focused programs underway.


Reference: Bloom, Theodora, et al. (2015). Workflows for Research Data Publishing: Models and Key Components (Submitted Version). Zenodo. http://dx.doi.org/10.5281/zenodo.20308.


What is the team’s approach for developing a DCN?

The project team will develop a model Data Curation Network that is intended for future pilot implementation across our six institutions. The model for the Data Curation Network will focus on enabling multiple institutions to share data curation staff in a “network of expertise” in order to provide high-quality data curation services that span a greater breadth of disciplines and researcher needs than any single institution might offer alone. The DCN model will include the following:

  • An implementation plan, informed by input from researchers at each of our institutions, for a clear, well-coordinated DCN that addresses the procedures necessary to handle the curation workload for a wide variety of data types and formats, as well as the challenges of managing a geographically and institutionally distributed staff.

  • Baseline measures of current demand for data curation services to forecast the potential workload and effort involved with providing data curation services, using our institutions as a gauge.

  • An assessment plan that defines ways to assess the cost-effectiveness, efficiency, and demand (both in data variety and skills utilized) for a Data Curation Network model.

  • A sustainability plan that recommends potential levels of support needed to sustain the Data Curation Network post-implementation and provides the necessary incentives to grow the network beyond these initial partners (e.g., a membership model that allows new institutions to join the Network).
