Research

My research focuses primarily on writing systems and written language. At the broadest level, I'm interested in why writing systems are the way they are (and aren't the way they aren't), particularly with respect to the following question:

This research builds on the growing literature in linguistics and cognitive science on how communicative efficiency may shape language, a literature that has thus far concerned itself predominantly with spoken language. My research aims to broaden this scope to include written language and, in doing so, to determine the extent to which efficiency pressures are modality-general or modality-specific.

My work concerns writing systems and written languages generally, but has focused somewhat more on writing systems that make heavier use of logography, meaning that a word's spelling or reading is based on morphological or lexical identity and/or semantic content instead of (or in addition to) phonological form. Complementing this is my interest in writing system typology, including how membership in conventional typological categories can be quantified.


From CDLI (the Cuneiform Digital Library Initiative). CDLI #p102525

Ambiguity and Efficiency in Sumerian Cuneiform

Written Sumerian made extensive use of polyvalence: any given cuneiform character could potentially map to multiple different phonetic, morphological, and/or semantic values. The system was clearly functional, but it's unclear how efficient it was. Using data from ORACC (the Open Richly Annotated Cuneiform Corpus), this project asks how much context is needed to maximally disambiguate the reading of a character, and whether the system was well suited to writing Sumerian. Parts of this project were presented as a talk at the 41st Annual Meeting of the Cognitive Science Society (link to proceedings paper).
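One way to make the "how much context" question concrete is to compare the uncertainty about a character's reading with and without its neighbors. The sketch below is only an illustration, not the project's actual method: the `conditional_entropy` helper, the toy tokens, and the choice of "previous sign" as the context are all assumptions made for this example.

```python
import math
from collections import Counter

def conditional_entropy(pairs):
    """Estimate H(Y | X) in bits from an iterable of (x, y) observations."""
    pairs = list(pairs)
    n = len(pairs)
    joint = Counter(pairs)
    marginal_x = Counter(x for x, _ in pairs)
    # H(Y|X) = sum over (x, y) of p(x, y) * log2(p(x) / p(x, y))
    return sum(
        (count / n) * math.log2(marginal_x[x] / count)
        for (x, _), count in joint.items()
    )

# Hypothetical toy corpus of (previous sign, sign, reading) tokens.
# Sign "X" is polyvalent (readings r1 and r2); sign "Y" has one reading.
tokens = [
    ("A", "X", "r1"),
    ("A", "X", "r1"),
    ("B", "X", "r2"),
    ("B", "X", "r2"),
    ("A", "Y", "r3"),
    ("B", "Y", "r3"),
]

# Uncertainty about the reading given only the sign itself.
h_sign_only = conditional_entropy((sign, reading) for _, sign, reading in tokens)

# Uncertainty about the reading given the sign plus the preceding sign.
h_with_context = conditional_entropy(
    ((prev, sign), reading) for prev, sign, reading in tokens
)
```

In this toy corpus the preceding sign fully determines the reading, so the conditional entropy drops to zero once context is included. Real data would be far noisier, and wider or differently defined context windows could be compared in the same way.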

Writing system taxonomy

Writing systems have traditionally been categorized as being phonographic or logographic. This project joins a recent push in the literature to determine how membership in these categories may be quantified, proposing a mutual information-based metric that incorporates phonology, morphology, and semantics.

Parts of this work were presented at the first workshop on Computation and Written Language (CAWL), hosted at ACL 2023. (Link to proceedings paper)
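As a rough illustration of what a mutual information-based measure could look like, the sketch below estimates how much a glyph tells you about a sound versus a morpheme. It is not the metric proposed in the paper: the `mutual_information` helper and the toy observations are hypothetical, and semantics is left out for brevity.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Estimate I(X; Y) in bits from an iterable of (x, y) observations."""
    pairs = list(pairs)
    n = len(pairs)
    joint = Counter(pairs)
    count_x = Counter(x for x, _ in pairs)
    count_y = Counter(y for _, y in pairs)
    # I(X;Y) = sum over (x, y) of p(x, y) * log2(p(x, y) / (p(x) * p(y)))
    return sum(
        (count / n) * math.log2(count * n / (count_x[x] * count_y[y]))
        for (x, y), count in joint.items()
    )

# Hypothetical observations of (glyph, sound, morpheme).
# Each glyph always spells the same sound but appears in both morphemes.
observations = [
    ("g1", "s1", "m1"),
    ("g1", "s1", "m2"),
    ("g2", "s2", "m1"),
    ("g2", "s2", "m2"),
]

mi_sound = mutual_information((glyph, sound) for glyph, sound, _ in observations)
mi_morph = mutual_information((glyph, morph) for glyph, _, morph in observations)
```

Here the glyphs carry one bit of information about sound and none about morpheme identity, which is the pattern one would expect from a purely phonographic system. A more logographic system would shift that balance toward the morphological side.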



ごろごろ = /gorogoro/

ゴロゴロ = /gorogoro/

ごろごろ = ゴロゴロ?

Orthographic representation of Japanese mimetic vocabulary

One distinctive feature of written Japanese is that it canonically uses different scripts to mark different etymological strata of the lexicon. Mimetic vocabulary is an outlier in the orthography: it can be written with either of the two phonographic kana systems, hiragana or katakana. This ongoing project has identified a range of structural and sociolinguistic factors that influence script choice when writing mimetic words, using data drawn from the Balanced Corpus of Contemporary Written Japanese (BCCWJ).