Lesson 11

Philipp Conzett and Koenraad De Smedt

Synopsis

While citation of publications is fairly standardized, citation of data is not yet common practice and does not have widely established guidelines. This chapter is mainly addressed to authors wishing to understand why and how to cite linguistic data in scholarly works. We first sketch the rationale and motivation for the citation of linguistic data, based on the Austin Principles of Data Citation in Linguistics and the FAIR principles of data management. We then suggest a set of principles for data citation and concrete guidelines concerning bibliographical references as well as in-text citations. We also discuss issues such as granularity and citation of unpublished data. The guidelines are illustrated in an example section, followed by some practical tips for citing data with reference managers. The chapter concludes with some advice for resource providers and for academic publishers.

Core concepts & keywords

Resource: General term for various types of digital research products, including research data, language models, analyzers, annotation tools, and statistical code.

Metadata: Documentation of data that should explain how to obtain and interpret the data, and the conditions of its potential reuse.

Data Availability Statement: Explains where to find the data used in a publication, provided in addition to full citations.

Reference management tool: Software used to store and format bibliographic citations; it may not fully support research data citations.

Activities

Exercises - Practice what you've learned

  • Conzett and De Smedt discuss the first three of the Austin Principles of Data Citation in this chapter (see section 2). Why are these principles important for linguistic data? What are some possible consequences of not following these principles?

  • How do the Tromsø recommendations for citation of research data in linguistics implement the Austin Principles of Importance, Credit and Attribution, and Evidence?

  • Find a linguistic journal article that builds on data without citing them (properly). How would you create a citation that would conform with the Tromsø recommendations?

Implement these practices in your career

  • If you have not been citing data in your own articles, how would you do that following the Tromsø recommendations? (See sections 4 and 5.) Consider both in-text citations and the references section. Remember that even your own data need to be properly cited.

  • In your next publication, consider providing a Data Availability Statement. Think about what types of data may be associated with your publication and what information you would want to include in the statement.

Quiz - Test yourself!

Related readings

Developing Standards for Data Citation and Attribution for Reproducible Research in Linguistics. Accessed October 18, 2021. https://sites.google.com/a/hawaii.edu/data-citation/welcome

Berez-Kroeker, Andrea L., Helene N. Andreassen, Lauren Gawne, Gary Holton, Susan Smythe Kung, Peter Pulsifer, Lauren B. Collister, the Data Citation and Attribution in Linguistics Group, and the Linguistics Data Interest Group. 2018. The Austin Principles of data citation in linguistics. https://site.uit.no/linguisticsdatacitation/austinprinciples/.

GO FAIR. Accessed October 18, 2021. https://www.go-fair.org/

Data Citation Working Group (WG). The Research Data Alliance. Accessed October 18, 2021. https://www.rd-alliance.org/groups/data-citation-wg.html

Linguistics Data Interest Group (IG). The Research Data Alliance. Accessed October 18, 2021. https://rd-alliance.org/groups/linguistics-data-ig

Andreassen, Helene N.; Berez-Kroeker, Andrea L.; Gawne, Lauren; Conzett, Philipp; De Smedt, Koenraad; Cox, Christopher; Collister, Lauren B. Using the Tromsø Recommendations to cite data in language work. 2021. YouTube. https://www.youtube.com/watch?v=GyBCslbn6tc&feature=youtu.be

Share your thoughts on this article or topic

Use #LingData #Citation #TromsøRecommendations on your favorite social media platform!

About the authors:

Philipp Conzett is a Senior Research Librarian at UiT The Arctic University of Norway working with Digital Scholarship, especially Open Science and research data management. He is one of the managers of the Tromsø Repository of Language and Linguistics (TROLLing), and is currently doing research on word-formation and grammatical gender in Norwegian.

Koenraad De Smedt

Koenraad De Smedt is Professor of computational linguistics at the University of Bergen (Norway), where he teaches natural language processing. His current research interests are in corpus linguistics and grammar. Since 2008 he has been National Coordinator for Norway in CLARIN (the European Research Infrastructure for Language Resources and Technology).

Citations

Cite this chapter:

Conzett, Philipp and Koenraad De Smedt. 2022. Guidance for citing linguistic data. In The Open Handbook of Linguistic Data Management, edited by Andrea L. Berez-Kroeker, Bradley McDonnell, Eve Koller, and Lauren B. Collister, 143-156. doi.org/10.7551/mitpress/12200.003.0015. Cambridge, MA: MIT Press Open.

Cite this online lesson:

Gabber, Shirley, Danielle Yarbrough, Andrea L. Berez-Kroeker, Bradley McDonnell, Eve Koller, Lauren B. Collister, Philipp Conzett, and Koenraad De Smedt. 2022. "Lesson 11." Linguistic Data Management: Online companion course to The Open Handbook of Linguistic Data Management. Website: https://sites.google.com/hawaii.edu/linguisticdatamanagement/course-lessons/11-guidance-for-citing-linguistic-data [Date accessed].