Lesson 7
Helene N. Andreassen
Synopsis
Data collection is essential to the great majority of research projects in linguistics, but many researchers fail to seek advice on what to do with their data when the project ends. This can lead to poorly structured data or data that are not stored safely, and in the worst case valuable data end up forgotten, impossible to reuse, or lost. In this chapter, I discuss how linguists can go about archiving their data, including how to select a data repository that ensures optimal conditions for preservation, retrieval, and visibility. I also discuss potential benefits of archiving as well as some key barriers to data sharing. Although the chapter focuses on archiving open data, most of the advice also applies to data with restricted access.
Core concepts & keywords
Research Data Management (RDM): The practice of creating, maintaining, and preserving digitally created data.
Repository: Database or virtual archive established to collect, disseminate and preserve scientific output, where material is deposited via archiving or self-archiving.
Archiving: The action of transferring data to a resource provider, e.g. a repository or a data center, while complying with any documented guidance, policies, or legal requirements.
Data set: Data with content of a particular kind, that are related and treated collectively, and which have a shared and distinctive intended application.
Curation: The action of maintaining, preserving and adding value to digital research data throughout its lifecycle.
Metadata: Documentation that describes archived data and which facilitates search, retrieval, understanding and reuse.
Activities
Exercises - Practice what you've learned
Pick three articles from your linguistic sub-discipline which use original data. For these articles, answer the following questions. Is the research data archived? If so, was it easy to locate the data? What are the access and reuse restrictions?
Implement these practices in your career
Visit re3data.org to locate repositories where you may be interested in archiving your research data. Use the criteria discussed in Section 4 of the chapter to find the most suitable repository for you. Figure out how to become a depositor there. You can also visit the Open Language Archives Community (OLAC) site and the CLARIN Virtual Language Observatory to look for more repositories.
Look at one of your existing datasets and consider its readiness to be archived. Are you missing any metadata, methods information, or permissions? Do you have everything needed to deposit into the repository you chose in the previous exercise? If you do not have your own dataset, create a plan for the data you intend to collect.
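One way to work through this readiness check systematically is to write down what you know about the dataset and compare it against a checklist. The minimal Python sketch below does exactly that. The three categories mirror the questions above (metadata, methods information, permissions), but the individual field names are hypothetical placeholders, not a repository standard; replace them with whatever your chosen repository actually requires.

```python
# Hypothetical readiness checklist for archiving a dataset.
# The field names are illustrative assumptions; adapt them to the
# requirements of the repository you intend to deposit in.
REQUIRED_ITEMS = {
    "metadata": ["title", "creator", "language", "description", "license"],
    "methods": ["collection_method", "recording_equipment"],
    "permissions": ["consent_documented", "access_conditions"],
}


def missing_items(record: dict) -> dict:
    """Return, per category, the checklist items that are absent or empty (falsy) in record."""
    gaps = {}
    for category, fields in REQUIRED_ITEMS.items():
        absent = [f for f in fields if not record.get(f)]
        if absent:
            gaps[category] = absent
    return gaps


if __name__ == "__main__":
    # A made-up description of a small interview corpus.
    example = {
        "title": "Interviews with speakers of variety X",
        "creator": "Doe, Jane",
        "language": "fra",
        "license": "CC BY 4.0",
        "collection_method": "semi-structured interviews",
        "consent_documented": True,
    }
    for category, fields in missing_items(example).items():
        print(f"{category}: missing {', '.join(fields)}")
```

Running the script on the made-up record lists one gap per category (a missing description, recording equipment, and access conditions), which is the kind of to-do list you would then resolve before depositing.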
Quiz - Test yourself!
Relevant data management use cases
Managing legacy data in a sociophonetic study of vowel variation and change by James Grama
Managing data for writing a reference grammar by Nala H. Lee
Managing data in TerraLing, a large-scale cross-linguistic database of morphological, syntactic, and semantic patterns by Hilda Koopman and Cristina Guardiano
Managing data for integrated speech corpus analysis in SPeech Across Dialects of English (SPADE) by Morgan Sonderegger et al.
Managing data for descriptive morphosemantics of six language varieties by Malin Petzell and Caspar Jordan
Related readings
Kaplan, Judith. Archiving descriptive language data. In Limn, edited by Boris Jardine and Christopher M. Kelty, 6 (March 2016). http://limn.it/archiving-descriptive-language-data/
LSA Tutorial on Archiving: Archiving and linguistic resources or How to keep your data from becoming endangered. The Open Language Archives Community. Organized by Jeff Good and Heidi Johnson. LSA 2005, Oakland, California. Accessed October 18, 2021. http://www.language-archives.org/events/olac05/ (see the PowerPoint slides and PDFs)
The Language Archive. Max Planck Institute for Psycholinguistics. Accessed October 18, 2021. https://tla.mpi.nl.
Goodman, Alyssa et al. 2014. Ten simple rules for the care and feeding of scientific data. PLOS Computational Biology.
Share your thoughts on this article or topic
Use #LingData #ArchivingData #DataRepositories on your favorite social media platform!
About the author:
Helene N. Andreassen is the Head of Library Teaching and Learning Support at UiT The Arctic University of Norway, where she coordinates the institutional training programme on research data management, as well as seminars on open science and research data management for PhD candidates and supervisors. Helene holds a PhD in French Linguistics and her research interests are foreign language phonology and varieties of spoken French.
Citations
Cite this chapter:
Andreassen, Helene N. 2022. Archiving research data. In The Open Handbook of Linguistic Data Management, edited by Andrea L. Berez-Kroeker, Bradley McDonnell, Eve Koller, and Lauren B. Collister, 89-100. Cambridge, MA: MIT Press Open. https://doi.org/10.7551/mitpress/12200.003.0011.
Cite this online lesson:
Gabber, Shirley, Danielle Yarbrough, Andrea L. Berez-Kroeker, Bradley McDonnell, Eve Koller, Lauren B. Collister, and Helene N. Andreassen. 2022. "Lesson 7." Linguistic Data Management: Online companion course to The Open Handbook of Linguistic Data Management. Website: https://sites.google.com/hawaii.edu/linguisticdatamanagement/course-lessons/07-archiving-research-data [Date accessed].