Research & funding

My research interests all broadly fall within the remit of variationist linguistics and variation studies, including their interfaces with typology, geolinguistics, and psycholinguistics. I view linguistic variation as a window into the hidden structure of human language and the nature of linguistic knowledge, and I am ultimately interested in what fuels linguistic variation in synchrony and diachrony.

Research interests

  • variation studies (synchronic & diachronic)
  • probabilistic grammar
  • language complexity
  • geolinguistics, dialectology, and dialect typology
  • varieties of English world-wide
  • methods: probabilistic modeling, aggregate analysis techniques, corpus-based dialectometry

Current funded projects

  • Exploring probabilistic grammar(s) in varieties of English around the world
    Applicant and PI
    Funded by a Type II Odysseus grant awarded by the Research Foundation Flanders (FWO) (grant # G0C5913N, budget: €856,260)
The project is situated at the crossroads of research on English as a World Language, usage-based theoretical linguistics, variationist linguistics, and cognitive sociolinguistics. It specifically marries the spirit of the Probabilistic Grammar framework (which posits that grammatical knowledge is experience-based and partially probabilistic) to research along the lines of the "English World-Wide" paradigm (which is concerned with the dialectology and sociolinguistics of post-colonial English-speaking communities around the world). The overarching objective is to understand the lectal plasticity of probabilistic knowledge of English grammar among language users from diverse regional and cultural backgrounds.
  • The register-specificity of probabilistic grammatical knowledge in English and Dutch
    Applicant and PI, with Jason Grafmiller and Freek Van de Velde (Co-PIs)
    Funded by the Research Foundation Flanders (FWO) (grant # G0D4618N, budget: €229,000)
Probabilistic grammars regulate how we choose between different ways of saying the same thing. For example, in English people can say either Tom sent Mary a letter or Tom sent a letter to Mary. Both syntactic variants have roughly the same meaning, and we know that variant choice is a function of precisely quantifiable effects of probabilistic factors such as the length of the theme or the pronominality of the recipient. The project asks whether language users have different probabilistic grammars for different types of speech situations. In other words, do our linguistic choice-making processes differ depending on whether we engage in, e.g., informal conversation or write blog entries? The project will tackle this question empirically by investigating the register-specificity of grammatical variation in English and Dutch. The contrastive variation analysis will rely on both corpus evidence (i.e., observation) and rating task experiments.
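To make the idea of "precisely quantifiable effects" concrete, here is a minimal sketch, assuming a logistic model of the dative alternation with two predictors: theme length (in words) and whether the recipient is a pronoun. Logistic regression is a common choice for modeling binary variant choices, but it is an assumption here, and the register labels and all coefficient values below are hypothetical, invented purely for illustration rather than taken from the project. A register-specific grammar would show up as different weights for different registers.

```python
import math

# HYPOTHETICAL coefficients on the log-odds scale, invented for illustration;
# they are NOT estimates from the project. Positive values favour the double
# object variant ("Tom sent Mary a letter") over the prepositional dative
# ("Tom sent a letter to Mary"): long themes and pronominal recipients both
# push towards the double object construction.
HYPOTHETICAL_WEIGHTS = {
    "conversation": {"intercept": 0.5, "theme_length": 0.3, "recipient_pronoun": 1.5},
    "blog": {"intercept": 0.2, "theme_length": 0.2, "recipient_pronoun": 1.0},
}


def p_double_object(register: str, theme_length: int, recipient_is_pronoun: bool) -> float:
    """Probability of the double object variant under a logistic model."""
    w = HYPOTHETICAL_WEIGHTS[register]
    log_odds = (
        w["intercept"]
        + w["theme_length"] * theme_length
        + w["recipient_pronoun"] * (1 if recipient_is_pronoun else 0)
    )
    # The logistic (inverse logit) function maps log-odds to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    for register in ("conversation", "blog"):
        p = p_double_object(register, theme_length=2, recipient_is_pronoun=True)
        print(register, round(p, 3))
```

With these made-up weights the same utterance context yields different predicted probabilities in the two registers, which is the kind of difference the project sets out to test for empirically.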

  • Nephological Semantics: using token clouds for meaning detection in variationist linguistics
    Co-PI with Dirk Geeraerts, Stefania Marzo & Dirk Speelman
    Funded by a C1 grant awarded by the KU Leuven Research Council (grant # 3H150305, budget: €1,271,200)
The increasing importance of corpus data in linguistics creates a need for appropriate methods for retrieving semantic information from corpora. In this project, existing computational methods of distributional corpus semantics are further developed into a meaning detection approach based on token clouds, i.e. clusters of distributionally similar attestations of words or expressions in a multidimensional vector space. The first phase of the project has a methodological orientation, focusing on the fine-tuning of such a 'nephological' method for detecting linguistic meanings in corpus data. In the second phase, the method is put to use in two descriptive research lines: lectometrical research into the relationship between language varieties, and variationist grammar research.
project website
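The basic notion of a token cloud can be illustrated with a deliberately simplified sketch, assuming that each attestation (token) of a word is represented by counts of its context words and that distributional similarity is measured with cosine similarity. The toy vectors, the similarity threshold, and the single-linkage grouping are hypothetical stand-ins chosen for brevity; they are not the project's actual models or clustering method, which work with much larger, weighted vector spaces.

```python
import math
from itertools import combinations

# HYPOTHETICAL toy data: four attestations of an ambiguous word, each
# represented by counts of the context words that surround it.
TOKENS = {
    "t1": {"river": 2, "water": 1},
    "t2": {"water": 2, "shore": 1},
    "t3": {"money": 2, "loan": 1},
    "t4": {"loan": 1, "account": 2, "money": 1},
}


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse context-count vectors."""
    dot = sum(count * b.get(word, 0) for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def token_clouds(tokens: dict, threshold: float = 0.3) -> list:
    """Group tokens into clouds by single linkage: any two tokens whose
    similarity exceeds the (hypothetical) threshold share a cloud."""
    parent = {t: t for t in tokens}

    def find(t):
        # Union-find root lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in combinations(tokens, 2):
        if cosine(tokens[a], tokens[b]) > threshold:
            parent[find(a)] = find(b)

    clouds = {}
    for t in tokens:
        clouds.setdefault(find(t), []).append(t)
    return sorted(sorted(members) for members in clouds.values())


if __name__ == "__main__":
    print(token_clouds(TOKENS))
```

On this toy input the "water" tokens and the "money" tokens fall into two separate clouds, mirroring the intuition that each cloud corresponds to a distinct usage or meaning.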

  • North and South, bottom to top: using big data to model syntactic variation in Belgian and Netherlandic Dutch
    Co-PI with Dirk Speelman, Stefan Grondelaers, and Antal van den Bosch
    Funded by a "Letteren, Nijmegen en Leuven" (LN&L) grant (budget: approx. €100,000)
While Belgians and the Dutch are well aware that they use different words and that their pronunciation diverges, they are mostly oblivious to the fact that there are also grammatical differences between Belgian and Netherlandic Dutch. Few Belgians, for instance, will realize that the preposition voor in Jan maakte (voor) haar een boterham ('Jan made her a sandwich') is optional for them, whereas it is indispensable for almost all speakers in the Netherlands. Why are there such pronounced syntactic differences between two varieties, spoken in a comparatively small language area, that did not begin to diverge until the 16th century? And where do these differences come from? To answer these questions, we draw on large subtitle and newspaper corpora and marshal machine translation, machine learning, and automated semantic classification technologies to access the syntactic motor, or motors, of Dutch.