Datasets

I'm working on adding more datasets; please stay tuned.
  • The Szmrecsanyi, Grafmiller, Bresnan, Rosenbach, Tagliamonte & Todd (2017) supplementary materials:
    download the datasets here
  • The Grafmiller, Szmrecsanyi & Hinrichs (in press) supplementary materials:
    download the zip archive here
  • The Szmrecsanyi, Biber, Egbert & Franco (2016) supplementary materials:
    download the zip archive here
  • The CBDM dataset -- download the following datasets on which my (2013) monograph is based:
    • the 34 x 57 frequency matrix as a csv table (see Szmrecsanyi 2013: chapter 2.2.2): counties are in rows and features in columns; each cell is a log-transformed normalized feature frequency (normalized per 10,000 words).
    • the 34 x 34 distance matrix (Euclidean; see Szmrecsanyi 2013: chapter 2.2.3) in L04 format.
  • The Wolk, Bresnan, Rosenbach & Szmrecsanyi (2013) dataset:
    download a zip archive containing csv tables & documentation here
  • The Hinrichs & Szmrecsanyi (2007) dataset:
    download the csv table here
  • The Handbook of Varieties of English morphosyntax survey (Kortmann & Szmrecsanyi 2004, Szmrecsanyi & Kortmann 2009, etc.):
    download the csv table here
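
For readers who want to see how the two CBDM files relate, here is a minimal sketch of turning a counties-by-features frequency matrix into a county-by-county Euclidean distance matrix. It uses random toy data with the same 34 x 57 shape rather than the actual csv table, the function name is hypothetical, and it is not necessarily the exact procedure followed in the monograph.

```python
import numpy as np


def euclidean_distance_matrix(freq):
    """Return the pairwise Euclidean distances between the rows of `freq`.

    `freq` is assumed to be a (counties x features) array of already
    log-transformed, normalized frequencies. The function name is
    hypothetical and chosen for illustration only.
    """
    freq = np.asarray(freq, dtype=float)
    # Broadcast to shape (n, n, features): the difference of every row pair.
    diff = freq[:, None, :] - freq[None, :, :]
    # Sum squared differences over features and take the square root.
    return np.sqrt((diff ** 2).sum(axis=-1))


# Toy stand-in for the real table: 34 rows (counties), 57 columns (features).
rng = np.random.default_rng(0)
toy = rng.random((34, 57))

dist = euclidean_distance_matrix(toy)
print(dist.shape)
```

The result is a symmetric 34 x 34 matrix with zeros on the diagonal, matching the shape of the distance matrix listed above. To work with the real data, the toy array would be replaced by the values read from the downloaded csv table.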