Lesson 7

Synopsis

Data collection is essential to the great majority of research projects in linguistics, but sadly, many researchers fail to seek advice on what to do with their data when a project ends. This can lead to poorly structured or unsafely stored data and, in the worst case, to valuable data being forgotten, becoming unusable, or being lost. In this chapter, I discuss how linguists can go about archiving their data, including how to select a data repository that ensures optimal conditions for preservation, retrieval, and visibility. I also discuss the potential benefits of archiving as well as some key barriers to data sharing. Although the chapter focuses on archiving open data, most of the advice also applies to data with restricted access.

Core concepts & keywords

Research Data Management (RDM): The practice of creating, maintaining, and preserving digitally created data.

Repository: A database or virtual archive established to collect, disseminate, and preserve scientific output, where material is deposited via archiving or self-archiving.

Archiving: The action of transferring data to a resource provider, e.g., a repository or a data center, while complying with any documented guidance, policies, or legal requirements.

Dataset: Data with content of a particular kind that are related, treated collectively, and share a distinctive intended application.

Curation: The action of maintaining, preserving, and adding value to digital research data throughout its lifecycle.

Metadata: Documentation that describes archived data and which facilitates search, retrieval, understanding and reuse.

Activities

Exercises - Practice what you've learned

  • Pick three articles from your linguistic sub-discipline that use original data. For each article, answer the following questions: Is the research data archived? If so, was it easy to locate? What are the access and reuse restrictions?

Implement these practices in your career

  • Look at one of your existing datasets and consider whether it is ready to be archived. Are you missing any metadata, methods information, or permissions? Do you have everything you need to deposit it in the repository you chose from the previous exercise? If you do not have a dataset of your own, create a plan for the data you intend to collect.


Related readings

Kaplan, Judith. 2016. Archiving descriptive language data. Limn 6 (March), edited by Boris Jardine and Christopher M. Kelty. http://limn.it/archiving-descriptive-language-data/

Good, Jeff, and Heidi Johnson, organizers. 2005. Archiving and linguistic resources, or How to keep your data from becoming endangered. LSA tutorial on archiving, The Open Language Archives Community. LSA 2005, Oakland, California. http://www.language-archives.org/events/olac05/ Accessed October 18, 2021. (See the slides and PDFs.)

The Language Archive. Max Planck Institute for Psycholinguistics. Accessed October 18, 2021. https://tla.mpi.nl.

Goodman, Alyssa, et al. 2014. Ten simple rules for the care and feeding of scientific data. PLOS Computational Biology. https://doi.org/10.1371/journal.pcbi.1003542

Share your thoughts on this article or topic

Use #LingData #ArchivingData #DataRepositories on your favorite social media platform!

About the author:


Helene N. Andreassen

Helene N. Andreassen is the Head of Library Teaching and Learning Support at UiT The Arctic University of Norway, where she coordinates the institutional training programme on research data management as well as seminars on open science and research data management for PhD candidates and supervisors. Helene holds a PhD in French Linguistics, and her research interests are foreign language phonology and varieties of spoken French.

Citations

Cite this chapter:

Andreassen, Helene N. 2022. Archiving research data. In The Open Handbook of Linguistic Data Management, edited by Andrea L. Berez-Kroeker, Bradley McDonnell, Eve Koller, and Lauren B. Collister, 89–100. Cambridge, MA: MIT Press Open. https://doi.org/10.7551/mitpress/12200.003.0011

Cite this online lesson:

Gabber, Shirley, Danielle Yarbrough, Andrea L. Berez-Kroeker, Bradley McDonnell, Eve Koller, Lauren B. Collister, and Helene N. Andreassen. 2022. "Lesson 7." Linguistic Data Management: Online companion course to The Open Handbook of Linguistic Data Management. Website: https://sites.google.com/hawaii.edu/linguisticdatamanagement/course-lessons/07-archiving-research-data [Date accessed].