Lesson 2

Synopsis

We can learn a lot about good data management in our own discipline by looking at what is happening in related fields, both their innovations and new benchmarks and the cases where things have gone wrong. We look at how data has been conceptualised and managed in other areas of the social sciences, particularly social psychology, and at how current attitudes are shaping the future of research. The central theme of this discussion is openness: transparency of methodology, and making primary data more accessible to people beyond the original researchers. This move towards open research aims to reduce biases, both for individual researchers and for the discipline as a whole, and encourages more carefully considered data collection and presentation.

Core concepts & keywords

The File Drawer: A term coined by Rosenthal to describe the many studies that are conducted but never published.

Publication Bias: Explicit biases in high-profile publication outlets, where results showing 'no difference' are considered less desirable than novel findings.

HARKing: 'Hypothesizing after the results are known': the practice of retroactively coming up with a hypothesis once statistical significance has been achieved, and presenting it as if it had been formulated in advance.

P-hacking: Running statistical tests on a variety of subsets of the data until the desired level of statistical significance is achieved.

Replicability: The application of the original methodology to a new sample, to check whether the original findings hold more generally.

Replication Crisis: The concern that if core papers in a field fail to replicate, the reputation of the whole field will be called into question.

Conceptual Replications: Testing hypotheses from earlier studies with a different methodological setup.

Reproducibility: Whether applying the original materials and methods to the original data confirms the original conclusions.

Inference Reproducibility: Whether different researchers draw the same or different inferences from the results of a single study.
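Why is p-hacking a problem? Each extra test is another chance of a false positive. As a rough illustration (not taken from the chapter), suppose a researcher tries k independent analyses of data in which there is no real effect, each at the conventional threshold alpha = 0.05. The chance that at least one comes out 'significant' is then 1 - (1 - alpha)^k. The minimal Python sketch below computes this and checks it by simulation. The function names are hypothetical, and independence is a simplifying assumption: real subsets of one dataset overlap, which makes the inflation smaller than shown here, though still above alpha.

```python
import random


def false_positive_rate(alpha: float, k: int) -> float:
    """Probability of at least one p < alpha across k independent tests
    when there is no true effect: 1 - (1 - alpha) ** k.
    (Hypothetical helper; assumes the tests are independent.)"""
    return 1 - (1 - alpha) ** k


def simulate_p_hacking(alpha: float = 0.05, k: int = 10,
                       trials: int = 20000, seed: int = 1) -> float:
    """Monte Carlo check of the same quantity.

    Under the null hypothesis (with a continuous test statistic), each
    p-value is uniformly distributed on [0, 1], so one 'test' is modelled
    here as a single uniform random draw.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Try up to k subsets; count the study as 'significant'
        # as soon as any one of them falls below alpha.
        if any(rng.random() < alpha for _ in range(k)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        print(f"{k:>2} tests: {false_positive_rate(0.05, k):.3f}")
    print(f"simulated, 10 tests: {simulate_p_hacking():.3f}")
```

Run as a script, it reports a false-positive rate of 0.05 for a single test, rising to about 0.23, 0.40 and 0.64 for 5, 10 and 20 tests, which is why pre-registering one planned analysis matters.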

Activities

Exercises - Practice what you've learned

  • Choose 3 articles in linguistic subfields of interest to you. For each article, decide (1) whether the study is replicable and (2) whether it is reproducible. If the answer to either question is "no," decide what information would be needed to make the study replicable and/or reproducible.

  • Find an article in any social science field that is a good example of reproducible research. Determine what elements of the paper make it a good example of reproducible research and consider how to apply them to linguistics.

Implement these practices in your career

  • Select a published study by someone else to replicate using your own data. This could potentially be developed into a publication.

  • Add a section or sections to your CV for non-traditional types of outputs (e.g. open data sets, preprints).

  • Do you have data that you can get "out of the file drawer" and publish in a digital archive? (See section 4.1). Is it possible to make that data open?

  • Before you start data collection on a research project, pre-register your hypothesis and methods with an online service such as AsPredicted.org, or submit your study to a journal as a registered report.


Related readings

Penders, Bart, J. Britt Holbrook, and Sarah de Rijcke. 2019. Rinse and repeat: Understanding the value of replication across different ways of knowing. Publications 7(3): 52. https://www.mdpi.com/2304-6775/7/3/52

Share your thoughts on this article or topic

Use #LingData #ReproductionAndReplication #LingDataManagement on your favorite social media platform!

About the authors:


Lauren Gawne

Lauren Gawne’s research focuses on the documentation of Tibeto-Burman languages, with specialisation in evidentiality, gesture and critical approaches to language documentation. Lauren is a Senior Lecturer at La Trobe University.

Suzy Styles

Suzy J. Styles trained in Linguistics and Asian Studies at the Australian National University and Tohoku University, followed by a doctorate in Experimental Psychology at the University of Oxford. Her BLIP Lab investigates Brain, Language and Intersensory Processing in multilingual Singapore.


Supplemental Materials

"Lucky Cowboys" distribution

Illustration by Dannii Yarbrough


Citations

Cite this chapter:

Gawne, Lauren, and Suzy Styles. 2022. Situating linguistics in the social science data movement. In The Open Handbook of Linguistic Data Management, edited by Andrea L. Berez-Kroeker, Bradley McDonnell, Eve Koller, and Lauren B. Collister, 9-26. Cambridge, MA: MIT Press Open. doi.org/10.7551/mitpress/12200.003.0006.

Cite this online lesson:

Gabber, Shirley, Danielle Yarbrough, Andrea L. Berez-Kroeker, Bradley McDonnell, Eve Koller, Lauren B. Collister, Lauren Gawne, and Suzy Styles. 2022. "Lesson 2." Linguistic Data Management: Online companion course to The Open Handbook of Linguistic Data Management. Website: https://sites.google.com/hawaii.edu/linguisticdatamanagement/course-lessons/02-situating-linguistics-in-the-social-science-data-movement [Date accessed].