Lesson 11
Philipp Conzett and Koenraad De Smedt
Synopsis
While citation of publications is fairly standardized, citation of data is not yet common practice and does not have widely established guidelines. This chapter is addressed mainly to authors wishing to understand why and how to cite linguistic data in scholarly works. We first sketch the rationale and motivation for the citation of linguistic data, based on the Austin Principles of Data Citation in Linguistics and the FAIR principles of data management. We then suggest a set of principles for data citation and concrete guidelines concerning bibliographical references as well as in-text citations. We also discuss issues such as granularity and citation of unpublished data. The guidelines are illustrated in an example section, followed by some practical tips for citing data with reference managers. The chapter concludes with some advice for resource providers and for academic publishers.
Core concepts & keywords
Resource: General term for various types of digital research products, including research data, language models, analyzers, annotation tools, and statistical code.
Metadata: Documentation of data that should explain how to obtain and interpret the data, and the conditions of its potential reuse.
Data Availability Statement: A statement that explains where to find the data used in a publication; it is provided in addition to full citations.
Reference management tool: Software used for storing and formatting bibliographic citations; such tools may not fully support research data citations.
Activities
Exercises - Practice what you've learned
Conzett and De Smedt discuss the first three of the Austin Principles of Data Citation in this chapter (see section 2). Why are these principles important for linguistic data? What are some possible consequences of not following these principles?
How do the Tromsø recommendations for citation of research data in linguistics implement the Austin Principles of Importance, Credit and Attribution, and Evidence?
Find a linguistic journal article that builds on data without citing them (properly). How would you create a citation that would conform with the Tromsø recommendations?
Implement these practices in your career
If you have not been citing data in your own articles, how would you do so following the Tromsø recommendations? (See sections 4 and 5.) Consider both in-text citations and the references section. Remember that even your own data needs to be properly cited.
In your next publication, consider providing a Data Availability Statement. Think about what types of data may be associated with your publication and what information you would want to include in the statement.
Quiz - Test yourself!
Relevant data management use cases
Managing Conversation Analysis Data by Elliott M. Hoey and Chase Wesley Raymond
Managing sociolinguistic data with the Corpus of Regional African American Language (CORAAL) by Tyler Kendall and Charlie Farrington
Managing data for integrated speech corpus analysis in SPeech Across Dialects of English (SPADE) by Morgan Sonderegger, Jane Stuart-Smith, Michael McAuliffe, Rachel Macdonald, and Tyler Kendall
Related readings
Developing Standards for Data Citation and Attribution for Reproducible Research in Linguistics. Accessed October 18, 2021. https://sites.google.com/a/hawaii.edu/data-citation/welcome
Berez-Kroeker, Andrea L., Helene N. Andreassen, Lauren Gawne, Gary Holton, Susan Smythe Kung, Peter Pulsifer, Lauren B. Collister, the Data Citation and Attribution in Linguistics Group, and the Linguistics Data Interest Group. 2018. The Austin Principles of data citation in linguistics. https://site.uit.no/linguisticsdatacitation/austinprinciples/.
GO FAIR. Accessed October 18, 2021. https://www.go-fair.org/
Data Citation Working Group (WG). The Research Data Alliance. Accessed October 18, 2021. https://www.rd-alliance.org/groups/data-citation-wg.html
Linguistics Data Interest Group (IG). The Research Data Alliance. Accessed October 18, 2021. https://rd-alliance.org/groups/linguistics-data-ig
Andreassen, Helene N.; Berez-Kroeker, Andrea L.; Gawne, Lauren; Conzett, Philipp; De Smedt, Koenraad; Cox, Christopher; Collister, Lauren B. Using the Tromsø Recommendations to cite data in language work. 2021. YouTube. https://www.youtube.com/watch?v=GyBCslbn6tc&feature=youtu.be
Share your thoughts on this article or topic
Use #LingData #Citation #TromsøRecommendations on your favorite social media platform!
About the authors:
Philipp Conzett is a Senior Research Librarian at UiT The Arctic University of Norway working with Digital Scholarship, especially Open Science and research data management. He is one of the managers of the Tromsø Repository of Language and Linguistics (TROLLing), and is currently doing research on word-formation and grammatical gender in Norwegian.
Koenraad De Smedt is Professor of computational linguistics at the University of Bergen (Norway), where he teaches natural language processing. His current research interests are in corpus linguistics and grammar. Since 2008 he has been National Coordinator for Norway in CLARIN (the European Research Infrastructure for Language Resources and Technology).
Citations
Cite this chapter:
Conzett, Philipp and Koenraad De Smedt. 2022. Guidance for citing linguistic data. In The Open Handbook of Linguistic Data Management, edited by Andrea L. Berez-Kroeker, Bradley McDonnell, Eve Koller, and Lauren B. Collister, 143-156. doi.org/10.7551/mitpress/12200.003.0015. Cambridge, MA: MIT Press Open.
Cite this online lesson:
Gabber, Shirley, Danielle Yarbrough, Andrea L. Berez-Kroeker, Bradley McDonnell, Eve Koller, Lauren B. Collister, Philipp Conzett, and Koenraad De Smedt. 2022. "Lesson 11." Linguistic Data Management: Online companion course to The Open Handbook of Linguistic Data Management. Website: https://sites.google.com/hawaii.edu/linguisticdatamanagement/course-lessons/11-guidance-for-citing-linguistic-data [Date accessed].