Resources


    Data repositories

Supplementary materials for: Podcasts as an emerging register of computer-mediated communication (version 1.0)

DOI: tba

Supplementary materials for: Characterising online news comments: a multi-dimensional cruise through online registers (version 1.0)

DOI: 10.5281/zenodo.4885180

Replication data and scripts for: Meaning and measures: Interpreting and evaluating the meaning of complexity metrics (version 1.0)

Data, scripts and additional statistics for the interpretation of complexity measures as described in Ehret et al. (2021).

DOI: 10.5281/zenodo.7371015

Scripts for the retrieval and processing of gzip's debugging output and the basic analysis of compressed strings.

DOI: 10.5281/zenodo.7350707

Replication data and scripts for: Ehret, Katharina, and Maite Taboada (2021). "The interplay of complexity and subjectivity in opinionated discourse."

DOI: 10.5281/zenodo.4106003

Repository containing the original dataset, scripts and extensive statistics for the analysis of text complexity and subjectivity in online news comments, opinion articles and general news articles.

Supplementary materials for: Are online news comments like face-to-face conversation? A multi-dimensional analysis of an emerging register (version 1.0).

DOI: 10.5281/zenodo.3556820

Repository comprising the supplementary materials and replication data for Ehret and Taboada (2020), a multi-dimensional analysis of online news comments alongside traditional English registers.

Repository of the Interactive Workshop on Measuring Language Complexity (IWMLC), containing the metrics, code and data presented by the workshop participants as part of a shared task.

Scripts and sample data for the compression technique (Ehret 2017, 2018) (version 1.0)

DOI: 10.5281/zenodo.3727536

Repository containing R scripts and test resources for implementing the compression technique, a Kolmogorov-based metric of language complexity, described in Ehret (2017) and Ehret (2018).  
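For orientation, the general idea behind compression-based complexity measures can be sketched as follows. This is a minimal Python illustration, not the R scripts in the repository: the function names are hypothetical, and it only compares gzip-compressed size with original size, on the assumption that text which compresses less well carries more information and is therefore more complex in the Kolmogorov sense.

```python
import gzip
import random


def compressed_size(text: str) -> int:
    """Size in bytes of the gzip-compressed UTF-8 encoding of `text`."""
    # mtime=0 keeps the gzip header identical across runs.
    return len(gzip.compress(text.encode("utf-8"), mtime=0))


def compression_ratio(text: str) -> float:
    """Compressed size divided by original size (hypothetical helper).

    Lower values mean the text is more redundant, i.e. easier to compress.
    """
    raw_size = len(text.encode("utf-8"))
    if raw_size == 0:
        raise ValueError("cannot compute a ratio for empty text")
    return compressed_size(text) / raw_size


# Toy comparison: a highly repetitive string versus pseudo-random letters.
repetitive = "the cat sat on the mat " * 100
rng = random.Random(42)
varied = "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(2300))

ratio_repetitive = compression_ratio(repetitive)
ratio_varied = compression_ratio(varied)
```

In this toy comparison the repetitive string yields the lower ratio, because its regularities are easy for the compressor to exploit.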

    Databases

The largest database to date of morphosyntactic features of English varieties worldwide. Edited by Bernd Kortmann, Kerstin Lunkenheimer, and Katharina Ehret (2020).

Interactive interface to the Freiburg Corpus of English Dialects (FRED), which allows easy access to the corpus texts and many audio recordings. Data managers: Katharina Ehret, Martin Helfer, and Wael Sidawi.