Research & funding

My research interests all broadly fall within the remit of variationist linguistics and variation studies, including their interfaces with typology, geolinguistics, and psycholinguistics. I view linguistic variation as a window into the hidden structure of human language and the nature of linguistic knowledge, and I am ultimately interested in what fuels linguistic variation in synchrony and diachrony.

Research interests

  • variation studies (synchronic & diachronic)
  • probabilistic grammar
  • language complexity
  • geolinguistics, dialectology, and dialect typology
  • varieties of English world-wide
  • methods: probabilistic modeling, aggregate analysis techniques, corpus-based dialectometry

Current funded projects

  • Exploring probabilistic grammar(s) in varieties of English around the world
    PI -- funded by a Type II Odysseus grant awarded by the Research Foundation Flanders (FWO) (grant # G.0C59.13N, budget: €856,260)
The project is situated at the crossroads of research on English as a World Language, usage-based theoretical linguistics, variationist linguistics, and cognitive sociolinguistics. It specifically marries the spirit of the Probabilistic Grammar framework (which posits that grammatical knowledge is experience-based and partially probabilistic) to research along the lines of the "English World-Wide" paradigm (which is concerned with the dialectology and sociolinguistics of post-colonial English-speaking communities around the world). The overarching objective is to understand the lectal plasticity of probabilistic knowledge of English grammar, on the part of language users with diverse regional and cultural backgrounds.
  • Nephological Semantics: using token clouds for meaning detection in variationist linguistics
    Co-PI with Dirk Geeraerts, Stefania Marzo & Dirk Speelman
    Funded by a C1 grant awarded by the KU Leuven Research Council (grant # 3H150305, budget: €1,271,200)
The increasing importance of corpus data in linguistics creates a need for appropriate methods for retrieving semantic information from corpora. In the project proposed here, existing computational methods of distributional corpus semantics are further developed in the form of a meaning detection approach based on token clouds, i.e. clusters of distributionally similar attestations of words or expressions in a multidimensional vector space. The first phase of the project has a methodological orientation, focusing on the fine-tuning of such a 'nephological' method for detecting linguistic meanings in corpus data. In the second phase of the project, the method is put to use in two descriptive research lines: lectometrical research into the relationship between language varieties, and variationist grammar research.
project website

  • North and South, bottom to top: using big data to model syntactic variation in Belgian and Netherlandic Dutch
    Co-PI with Dirk Speelman, Stefan Grondelaers, and Antal van den Bosch
    Funded by a "Letteren, Nijmegen en Leuven" (LN&L) grant (budget: approx. €100,000)
While Belgians and the Dutch are well aware that they use different words, and that their pronunciation diverges, they are mostly oblivious to the fact that there are also grammatical discrepancies between Belgian and Netherlandic Dutch. Few Belgians, for instance, will realize that the preposition "voor" in "Jan maakte (voor) haar een boterham" ('Jan made her a sandwich') is optional for them, whereas it is indispensable for almost all the Dutch. How come there are such pronounced syntactic differences between two varieties (in a comparatively small language area) which did not begin to diverge before the 16th century? And where do these differences come from? In order to answer these questions, we draw on large subtitle and newspaper corpora, and marshal machine translation, machine learning, and automated semantic classification technologies to access the syntactic motor, or motors, of Dutch.