Lesson 5
Synopsis
With the growth of data management requirements from funding agencies and a recognition of the value of reproducibility, replication, and data reuse, there have been efforts by disciplinary communities, administrators of data repositories, and libraries to develop guidance and services to support researchers as they care for and share data. While there are indeed disciplinary differences in the types of data collected and used, varying expectations from funders and journals for data preservation and sharing, and distinct traditions for open research, this chapter takes a high-level view. As a starting point, this chapter considers the data lifecycle model as a means for perceiving the persistent and ongoing nature of data management. It reviews guidance and best practices for sustaining data, emphasizing the value of consistency and a future-minded orientation as the core principles that should underlie this work.
Core concepts & keywords
Data Management Plan (DMP): A living document, created before data collection begins, that often describes the data types, the metadata to be produced, storage and backup plans, security and privacy restrictions, access policies, and the means for data preservation and dissemination.
Metadata: Information about data that describes the data and its scope and aids in its discovery.
Records lifecycle: The conceptualization of the stages of data records. Stages typically include creation, a period of active use, an inactive phase in which long-term value is assessed, and either the destruction of the record or its preservation at an archival repository.
3-2-1 Rule: The commonly adopted storage backup plan of keeping 3 copies of the data: 2 on different storage media on-site, and 1 off-site.
Appraisal: Examining data to determine its research value for long-term preservation.
Economic sustainability of research data: The resources involved in digital data stewardship.
Social sustainability: The commitment needed among all parties with an interest in the maintenance and accessibility of digital data.
Technical sustainability: Development of robust repository architectures, workflows, tools, and preservation techniques.
Lossless formats: File formats that do not lose information when compressed (e.g., .tiff files).
Open file formats: File formats which are accessible by more than one software program or platform and are supported by more than one developer.
Proprietary file formats: File formats which are dependent on a particular software program or platform and are supported by only one developer.
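The 3-2-1 Rule defined above can be checked mechanically against a list of where copies of a dataset are kept. The following is a minimal sketch, not a prescribed tool: the `Copy` class, its field names, the medium labels, and the example plan are all hypothetical, and the check follows one reading of the rule (at least 3 copies, at least 2 different storage media among the on-site copies, and at least 1 off-site copy).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One stored copy of a dataset (all fields are illustrative)."""
    label: str     # where the copy lives, e.g. "laptop"
    medium: str    # kind of storage, e.g. "internal-ssd", "external-hdd"
    offsite: bool  # True if the copy is kept away from the primary location


def check_3_2_1(copies):
    """Return a list of problems; an empty list means the plan passes."""
    problems = []
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies; need at least 3")
    # Distinct storage media among the copies kept on-site.
    onsite_media = {c.medium for c in copies if not c.offsite}
    if len(onsite_media) < 2:
        problems.append("fewer than 2 different storage media on-site")
    if not any(c.offsite for c in copies):
        problems.append("no off-site copy")
    return problems


# Hypothetical backup plan for a small corpus.
plan = [
    Copy("laptop", "internal-ssd", offsite=False),
    Copy("desk drive", "external-hdd", offsite=False),
    Copy("institutional cloud storage", "cloud", offsite=True),
]

for problem in check_3_2_1(plan) or ["plan satisfies the 3-2-1 Rule"]:
    print(problem)
```

Returning a list of problems rather than a single yes/no makes it easier to see which part of the rule a given plan fails.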
Activities
Exercises - Practice what you've learned
Choose one or more articles from linguistics journals in which the authors discuss their research process. For each article, consider the following questions:
(1) What do you observe about the lifecycle of the data, as described in this chapter?
(2) Can you visualize the stages that this data moves through?
(3) Is the metadata available?
(4) If the data are available to you, are the file formats open or proprietary? Lossless or lossy?
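For question (4), a small lookup by file extension can give a first impression of a dataset's formats. This is only a sketch under stated assumptions: the `FORMATS` table and the `classify` function are hypothetical, and the entries are simplified judgments based on the definitions of open, proprietary, and lossless formats given above. A real assessment should consult each format's documentation, since some formats allow more than one kind of compression.

```python
from pathlib import Path

# Illustrative, simplified table (an assumption, not taken from the chapter).
# extension -> (openness, compression)
FORMATS = {
    ".tiff": ("open", "lossless"),
    ".png": ("open", "lossless"),
    ".flac": ("open", "lossless"),
    ".jpg": ("open", "lossy"),
    ".mp3": ("open", "lossy"),
    ".psd": ("proprietary", "lossless"),
}


def classify(filename):
    """Look up a file's extension, ignoring case; unknown if not listed."""
    extension = Path(filename).suffix.lower()
    return FORMATS.get(extension, ("unknown", "unknown"))


# Hypothetical file names from a shared dataset.
for name in ["interview_01.FLAC", "field_photo.jpg", "poster_draft.psd", "notes.xyz"]:
    openness, compression = classify(name)
    print(f"{name}: {openness}, {compression}")
```

Extensions that are not in the table are reported as unknown rather than guessed, which flags them for a closer look.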
Implement these practices in your career
If you currently have a Data Management Plan, compare it to two data lifecycle models: the UK Data Service model and the United States Geological Survey Science Data Lifecycle Model (see section 2). Which one does your project more closely follow? If it deviates from that model, why?
Consider the file naming system you use for your data. Is it consistent? Does it include elements suggested in this chapter? Could you improve it using Mattern's guidelines?
Watch this video on filenaming from the Archive of Indigenous Languages of Latin America (AILLA) for further tips on naming files and examples of good and bad file naming schemes.
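One way to test whether a file naming system is consistent is to write the convention down as a pattern and check every file name against it. The sketch below assumes a hypothetical convention, `project_YYYYMMDD_description_vNN.ext`, in lowercase with no spaces. It is not Mattern's or AILLA's scheme, and the pattern and function names are illustrative; substitute the elements your own project uses.

```python
import re

# Hypothetical convention: project_YYYYMMDD_description_vNN.ext
PATTERN = re.compile(
    r"[a-z0-9]+"     # project or language code
    r"_\d{8}"        # date written as YYYYMMDD (digits only; not validated)
    r"_[a-z0-9-]+"   # short description, hyphens allowed
    r"_v\d{2}"       # two-digit version number
    r"\.[a-z0-9]+"   # lowercase file extension
)


def follows_convention(filename):
    """True if the whole file name matches the hypothetical convention."""
    return PATTERN.fullmatch(filename) is not None


# Hypothetical file names to audit.
names = [
    "sesotho_20210314_elicitation-session_v01.wav",
    "Final recording (2).WAV",
    "sesotho_2021-03-14_notes_v1.txt",
]
for name in names:
    status = "ok" if follows_convention(name) else "does not match"
    print(f"{status}: {name}")
```

The pattern only checks the shape of a name. It does not confirm that the eight digits form a real calendar date, so a stricter audit would parse the date as well.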
Quiz - Test yourself!
Relevant data management use cases
Managing data in a language documentation corpus by Christopher Cox
Managing data for descriptive and historical research by Don and Kelsey Daniels
Managing acquisition data for developing large Sesotho, English and French corpora for CHILDES by Katherine Demuth
Managing historical linguistic data for computational phylogenetics and computer-assisted language comparison by Tiago Tresoldi et al.
Managing computational data for models of language acquisition and change by Matthew Lou-Magnuson and Luca Onnis
Related readings
Ball, Alex. “Review of Data Management Lifecycle Models: REDm-MED Project Document.” Bath, UK: University of Bath, 2012. https://purehost.bath.ac.uk/ws/portalfiles/portal/206543/redm1rep120110ab10.pdf
Berman, Francine. “Got Data?: A Guide to Data Preservation in the Information Age.” Communications of the ACM 51, no. 12 (December 2008), 50-56.
Lavoie, Brian F. “Sustainable Research Data.” In Managing Research Data, edited by Graham Pryor, 67-82. London: Facet Publishing, 2012.
National Digital Stewardship Alliance. “Levels of Digital Preservation,” version 1 (2013). https://ndsa.org/activities/levels-of-digital-preservation/.
Van den Eynden, Veerle, Louise Corti, Matthew Woollard, Libby Bishop and Laurence Horton. “Managing and Sharing Data: Best Practices for Researchers,” 3rd edition. Colchester, UK: UK Data Archive, 2011. https://data-archive.ac.uk/media/2894/managingsharing.pdf
Share your thoughts on this article or topic
Use #LingData, #DataPreservation, #Metadata, and #DataLifecycle on your favorite social media platform!
About the author:
Eleanor “Nora” Mattern is a teaching assistant professor at the University of Pittsburgh’s School of Computing and Information. Her research interests are in the areas of information policy, archives, government information practices and systems, and digital curation.
Citations
Cite this chapter:
Mattern, Eleanor. 2022. The linguistic data life cycle, sustainability of data, and principles of solid data management. In The Open Handbook of Linguistic Data Management, edited by Andrea L. Berez-Kroeker, Bradley McDonnell, Eve Koller, and Lauren B. Collister, 61-72. doi.org/10.7551/mitpress/12200.003.0009. Cambridge, MA: MIT Press Open.
Cite this online lesson:
Gabber, Shirley, Danielle Yarbrough, Andrea L. Berez-Kroeker, Bradley McDonnell, Eve Koller, Lauren B. Collister, and Eleanor Mattern. 2022. "Lesson 5." Linguistic Data Management: Online companion course to The Open Handbook of Linguistic Data Management. Website: https://sites.google.com/hawaii.edu/linguisticdatamanagement/course-lessons/05-the-linguistic-data-life-cycle-sustainability-of-data-and-principles [Date accessed].