Datasets

I'm working on adding more datasets; please stay tuned.

The Grafmiller, Szmrecsanyi & Hinrichs (in press) supplementary materials
download the zip archive here

The Szmrecsanyi, Biber, Egbert & Franco (2016) supplementary materials
download the zip archive here

The CBDM dataset
download the following datasets, on which my (2013) monograph is based:
  • the 34 x 57 frequency matrix as a csv table (see Szmrecsanyi 2013: chapter 2.2.2): counties are in rows and features in columns; observations are log-transformed, normalized feature frequencies per 10,000 words.
  • the 34 x 34 distance matrix (Euclidean; see Szmrecsanyi 2013: chapter 2.2.3) in L04 format.
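
For readers who want to see how the two files relate, here is a minimal sketch of deriving a county-by-county Euclidean distance matrix from a counties-by-features table. It is an illustration under assumptions, not necessarily the procedure used for the monograph: the csv layout (one header row, county label in the first column), the county and feature names, and all values below are invented stand-ins, and the real table is 34 x 57. Because the published frequencies are already log-transformed, no further transformation is applied.

```python
import csv
import io
import math

# Hypothetical stand-in for the frequency table: 3 counties x 2 features.
# Names and values are invented for illustration; the real table is 34 x 57.
SAMPLE_CSV = """county,feature_a,feature_b
County1,0.0,0.0
County2,3.0,4.0
County3,6.0,8.0
"""


def read_frequency_table(handle):
    """Return (row labels, feature vectors) from a csv with labels in column 1.

    Assumes one header row of feature names; this layout is an assumption.
    """
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    labels, rows = [], []
    for record in reader:
        if not record:
            continue  # ignore blank lines
        labels.append(record[0])
        rows.append([float(value) for value in record[1:]])
    return labels, rows


def euclidean_distance_matrix(rows):
    """Pairwise Euclidean distances between row vectors."""
    return [[math.dist(a, b) for b in rows] for a in rows]


labels, rows = read_frequency_table(io.StringIO(SAMPLE_CSV))
distances = euclidean_distance_matrix(rows)

for label, row in zip(labels, distances):
    print(label, row)
```

To try this on the downloaded table, replace the `io.StringIO(SAMPLE_CSV)` argument with an opened file handle, after checking that the file's header and label columns match the assumed layout.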

The Wolk, Bresnan, Rosenbach & Szmrecsanyi (2013) dataset
download a zip archive containing csv tables & documentation here

The Hinrichs & Szmrecsanyi (2007) dataset
download the csv table here

The Handbook of Varieties of English morphosyntax survey (Kortmann & Szmrecsanyi 2004, Szmrecsanyi & Kortmann 2009, etc.)
download the csv table here