Resources

Data repositories

EWW_speakerNumbers

Supplementary materials for: How to obtain speaker numbers for English varieties around the world: Theoretical concepts, challenges and estimations

MDA-podcasts

Supplementary materials for: Podcasts as an emerging register of computer-mediated communication (version 1.0)

DOI: https://doi.org/10.5281/zenodo.14868843

MDA workshop

Workshop materials for: How to do Multi-dimensional Analysis: Theoretical background and practical analyses in R

MDA-OnlineRegisters

Supplementary materials for: Characterising online news comments: a multi-dimensional cruise through online registers (version 1.0)

DOI: 10.5281/zenodo.4885180

complexityMeaning: Interpreting and evaluating the meaning of complexity metrics

Replication data and scripts for: Meaning and measures: Interpreting and evaluating the meaning of complexity metrics (version 1.0).

Data, scripts and additional statistics for the interpretation of complexity measures as described in Ehret et al. (2021).

DOI: 10.5281/zenodo.7371015

compStrings: Analysing algorithmically compressed strings

Scripts for the retrieval and processing of gzip's debugging output and the basic analysis of compressed strings.

DOI: 10.5281/zenodo.7350707

compinion: Analysing complexity and subjectivity

Replication data and scripts for: Ehret, Katharina, and Maite Taboada (2021). "The interplay of complexity and subjectivity in opinionated discourse."

DOI: 10.5281/zenodo.4106003

Repository containing the original dataset, scripts and extensive statistics for the analysis of text complexity and subjectivity in online news comments, opinion articles and general news articles.

MDA-OnlineComments

Supplementary materials for: Are online news comments like face-to-face conversation? A multi-dimensional analysis of an emerging register (version 1.0).

DOI: 10.5281/zenodo.3556820

Repository comprising the supplementary materials and replication data for Ehret and Taboada (2020), a multi-dimensional analysis of online news comments and other traditional English registers.

IWMLC

Repository of the Interactive Workshop on Measuring Language Complexity (IWMLC), which contains the metrics, code and data presented by the workshop participants as part of a shared task.

measuring-language-complexity

Scripts and sample data for the compression technique (Ehret 2017, 2018) (version 1.0)

DOI: 10.5281/zenodo.3727536

Repository containing R scripts and test resources for implementing the compression technique, a Kolmogorov-based metric of language complexity, described in Ehret (2017) and Ehret (2018).

Databases

A socio-demographic Dataset fOr Varieties of English - DOVE

A unique dataset which comprises extensive socio-demographic information for spontaneous spoken varieties of English around the world. The extra-linguistic triggers (aka socio-demographic information) can be broadly assigned to the categories geography, language contact/isolation, and demography. Compiled and released by Kat Ehret (2025).

DOI: 10.5281/zenodo.16794586

eWAVE 3.0: The electronic World Atlas of Varieties of English

The largest database to date of morphosyntactic features of English varieties worldwide. Edited by Bernd Kortmann, Kerstin Lunkenheimer, and Katharina Ehret (2020).

Freiburg Corpus of English Dialects Interactive Database

Interactive interface to the Freiburg Corpus of English Dialects (FRED), which allows easy access to the corpus texts and many audio recordings. Data managers: Katharina Ehret, Martin Helfer, and Wael Sidawi.
