Linguistic Data Management Handbook

About The Open Handbook of Linguistic Data Management

The Open Handbook of Linguistic Data Management, edited by Andrea L. Berez-Kroeker, Bradley McDonnell, Eve Koller and Lauren B. Collister, is an open-access publication by MIT Press Open (2022), and is the inaugural volume in the Open Handbooks in Linguistics Series.

The Handbook is a guide to the principles and methods for managing, storing, sharing, and citing linguistic data, especially digital data. The volume is intended for linguistics researchers at all levels, from undergraduate students through graduate students and faculty. Anyone who is concerned about the proper care of linguistic data will find this Handbook useful.

The Handbook features chapters from 127 international authors across the discipline. The volume contains two major parts. Part 1 contains 13 chapters describing the conceptual foundations, principles, and implementation of data management in linguistics.

Part 2 contains 43 Data Management Use Cases, which describe real-life data management cases in actual linguistic research across a variety of subdisciplines, including sociolinguistics, phonetics, phonology, syntax, discourse analysis, language documentation, language reclamation, semantics, natural language processing, sign linguistics, corpus linguistics, first- and second-language acquisition, historical linguistics, neurolinguistics, psycholinguistics, and typology.

The links below will take you to the free, online versions of the chapters in the volume.

Part 1: Conceptual Foundations, Principles, and Implementation of Data Management in Linguistics

Foreword, by Sarah G. Thomason

Chapter 1: Data, Data Management, and Reproducible Research in Linguistics: On the Need for The Open Handbook of Linguistic Data Management, by Andrea L. Berez-Kroeker, Bradley McDonnell, Lauren B. Collister, and Eve Koller

Chapter 2: Situating Linguistics in the Social Science Data Movement, by Lauren Gawne and Suzy Styles

Chapter 3: The Scope of Linguistic Data, by Jeff Good

Chapter 4: Indigenous People, Ethics, and Linguistic Data, by Gary Holton, Wesley Y. Leonard, and Peter L. Pulsifer

Chapter 5: The Linguistic Data Life Cycle, Sustainability of Data, and Principles of Solid Data Management, by Eleanor Mattern

Chapter 6: Transforming Data, by Na-Rae Han

Chapter 7: Archiving Research Data, by Helene N. Andreassen

Chapter 8: Developing a Data Management Plan, by Susan Smythe Kung

Chapter 9: Copyright and Sharing Linguistic Data, by Lauren B. Collister

Chapter 10: Linguistic Data in the Long View, by Laura Buszard-Welcher

Chapter 11: Guidance for Citing Linguistic Data, by Philipp Conzett and Koenraad De Smedt

Chapter 12: Metrics for Evaluating the Impact of Data Sets, by Robin Champieux and Heather L. Coates

Chapter 13: The Value of Data and Other Non-traditional Scholarly Outputs in Academic Review, Promotion, and Tenure in Canada and the United States, by Juan Pablo Alperin, Lesley A. Schimanski, Michelle La, Meredith T. Niles, and Erin C. McKiernan

Part 2: Data Management Use Cases

Chapter 14: Managing Sociolinguistic Data with the Corpus of Regional African American Language (CORAAL), by Tyler Kendall and Charlie Farrington

Chapter 15: Managing Data for Integrated Speech Corpus Analysis in SPeech Across Dialects of English (SPADE), by Morgan Sonderegger, Jane Stuart-Smith, Michael McAuliffe, Rachel Macdonald, and Tyler Kendall

Chapter 16: Data Management at the Ottawa Sociolinguistics Laboratory, by Shana Poplack

Chapter 17: Managing Legacy Data in a Sociophonetic Study of Vowel Variation and Change, by James Grama

Chapter 18: Managing Sociophonetic Data in a Study of Regional Variation, by Valerie Fridland and Tyler Kendall

Chapter 19: Data Management Practices in an Ethnographic Study of Language and Migration, by Lynette Arnold

Chapter 20: Managing Conversation Analysis Data, by Elliott M. Hoey and Chase Wesley Raymond

Chapter 21: Managing Sign Language Data from Fieldwork, by Nick Palfreyman

Chapter 22: Managing Data in a Language Documentation Corpus, by Christopher Cox

Chapter 23: Managing Data for Writing a Reference Grammar, by Nala H. Lee

Chapter 24: Managing Lexicography Data: A Practical, Principled Approach Using FLEx (FieldWorks Language Explorer), by Christine Beier and Lev Michael

Chapter 25: Managing Data from Archival Documentation for Language Reclamation, by Megan Lukaniec

Chapter 26: Managing Data for Descriptive and Historical Research, by Don Daniels and Kelsey Daniels

Chapter 27: Managing Historical Data in the Chirila Database, by Claire Bowern

Chapter 28: Managing Historical Linguistic Data for Computational Phylogenetics and Computer-Assisted Language Comparison, by Tiago Tresoldi, Christoph Rzymski, Robert Forkel, Simon J. Greenhill, Johann-Mattis List, and Russell D. Gray

Chapter 29: Managing Computational Data for Models of Language Acquisition and Change, by Matthew Lou-Magnuson and Luca Onnis

Chapter 30: Managing Sign Language Acquisition Video Data: A Personal Journey in the Organization and Representation of Signed Data, by Julie A. Hochgesang

Chapter 31: Managing Acquisition Data for Developing Large Sesotho, English, and French Corpora for CHILDES, by Katherine Demuth

Chapter 32: Managing Phonological Development Data within PhonBank: The Chisasibi Child Language Acquisition Study, by Yvan Rose and Julie Brittain

Chapter 33: Managing Oral and Written Data from an ESL Corpus from Canadian Secondary School Students in a Compulsory, School-Based ESL Program, by Philippa Bell, Laura Collins, and Emma Marsden

Chapter 34: Managing Second Language Acquisition Data with Natural Language Processing Tools, by Scott A. Crossley and Kristopher Kyle

Chapter 35: Managing Data Workflows for Untrained Forced Alignment: Examples from Costa Rica, Mexico, the Cook Islands, and Vanuatu, by Rolando Coto-Solano, Sally Akevai Nicholas, Brittany Hoback, and Gregorio Tiburcio Cano

Chapter 36: Managing Transcription Data for Automatic Speech Recognition with Elpis, by Ben Foley, Daan van Esch, and Nay San

Chapter 37: Managing Data and Statistical Code According to the FAIR Principles, by Laura Janda

Chapter 38: Managing Synchronic Corpus Data with the British National Corpus (BNC), by Stefan Th. Gries

Chapter 39: Managing Data in Sign Language Corpora, by Onno Crasborn

Chapter 40: Managing Sign Language Video Data Collected from the Internet, by Lynn Hou, Ryan Lepic, and Erin Wilkinson

Chapter 41: Managing Data from Social Media: The Indigenous Tweets Project, by Kevin P. Scannell

Chapter 42: Managing Semantic Norms for Cognitive Linguistics, Corpus Linguistics, and Lexicon Studies, by Bodo Winter

Chapter 43: Managing Treebank Data with the Infrastructure for the Exploration of Syntax and Semantics (INESS), by Victoria Rosén and Koenraad De Smedt

Chapter 44: Managing Data in a Formal Syntactic Study of an Underinvestigated Language (Uzbek), by Vera Gribanova

Chapter 45: Managing Data for Theoretical Syntactic Study of Underdocumented Languages, by Philip T. Duncan, Harold Torrence, Travis Major, and Jason Kandybowicz

Chapter 46: Managing Experimental Data in a Study of Syntax, by Matthew Wagers

Chapter 47: Managing Web Experiments for Psycholinguistics: An Example from Experimental Semantics/Pragmatics, by Judith Degen and Judith Tonhauser

Chapter 48: Managing, Sharing, and Reusing fMRI Data in Computational Neurolinguistics, by Hiroyuki Akama

Chapter 49: Managing Phonological Data in a Perception Experiment, by Rory Turnbull

Chapter 50: Managing Speech Perception Data Sets, by Anne Cutler, Mirjam Ernestus, Natasha Warner, and Andrea Weber

Chapter 51: Managing and Analyzing Data with Phonological CorpusTools, by Kathleen Curry Hall, J. Scott Mackie, and Roger Yu-Hsiang Lo

Chapter 52: Managing Phonological Inventory Data in the Development of PHOIBLE, by Steven Moran

Chapter 53: Managing Data in a Typological Study, by Volker Gast and Łukasz Jędrzejowski

Chapter 54: Managing Data for Descriptive Morphosemantics of Six Language Varieties, by Malin Petzell and Caspar Jordan

Chapter 55: Managing Data in TerraLing, a Large-Scale Crosslinguistic Database of Morphological, Syntactic, and Semantic Patterns, by Hilda Koopman and Cristina Guardiano

Chapter 56: Managing AUTOTYP Data: Design Principles and Implementation, by Alena Witzlack-Makarevich, Johanna Nichols, Kristine A. Hildebrandt, Taras Zakharko, and Balthasar Bickel
