2021 - 2025
The University of Groningen
Human conversation is more than words. Tone of voice, facial expression, and context often reveal what we really mean. For example, “Oh, that’s just great” can sound like genuine praise when said with a smile, or clearly sarcastic when spoken with a flat tone after missing the last train. Humans use these cues naturally, but machines usually miss them.
My research asks: how can AI understand non-literal human language that is full of emotion, figurative, and embedded in rich context?
To explore this, I focus on sarcasm as a test case. Sarcasm is tricky because its meaning depends on the interplay between different channels of communication: text, speech, and visual expression. My approach combines these cues in new ways, not just by stacking data together, but by looking at how they complement or contradict each other. This insight helps design AI models that are not only data-driven, but also grounded in linguistic and cognitive science.
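The idea that contradiction between channels is itself a signal can be sketched in a toy example. All names and scores below are hypothetical illustrations, not part of the actual models: each modality is reduced to a single affect score in [-1, 1], and the gap between them serves as an incongruity cue.

```python
def incongruity_score(text_valence: float, audio_valence: float) -> float:
    """Cross-modal incongruity: near 1 when text and prosody disagree strongly.

    Both inputs are hypothetical per-modality affect scores in [-1, 1],
    where +1 is clearly positive and -1 clearly negative.
    """
    return abs(text_valence - audio_valence) / 2.0

# "Oh, that's just great": positive wording, flat or negative tone.
print(incongruity_score(0.9, -0.7))   # large gap between channels -> sarcasm cue

# The same wording delivered warmly: channels agree, incongruity is low.
print(incongruity_score(0.9, 0.8))
```

In a real system the per-modality scores would come from learned encoders, but the design choice is the same: treat disagreement between channels as information rather than noise.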
The ultimate goal is to build human-centered AI that can handle the subtleties of everyday conversation. Such systems could make human–machine interaction feel more natural, and they could also support people who find it difficult to interpret social cues, such as individuals with autism or dementia.
While this research focuses on sarcasm as a test case, the broader vision extends well beyond sarcasm. The same multimodal and context-sensitive approaches can inform:
Broader pragmatics in conversation → modeling humor, irony, politeness, intention, and indirect requests.
Cross-lingual communication → exploring how pragmatic cues differ across cultures and languages, and building tools that adapt to these variations.
Inclusive AI systems → supporting individuals with communication challenges (e.g., autism, dementia) by designing assistive systems that help interpret intent and emotion.
More human-centered HMI → moving toward machines that can engage in interaction that feels natural, empathetic, and socially aware.
This research has been featured in international and regional media:
Appearances by the supervisory team on BBC Radio 4 and CBC Radio 1.
Additional articles appeared in EW Magazine, RTV Noord, Leeuwarder Courant, Fox Business, and El Correo.
2024
COST Action – Short Term Scientific Mission
Full funding awarded for research visit at The Hong Kong Polytechnic University
Sarcasm is more than clever wordplay; it is also carried in how something is said. A phrase like “Oh my god, what amazing weather!” can be either sincere or sarcastic, depending on tone, pitch, and timing. Linguists have shown that sarcasm is often signaled by changes in tone: English speakers may use a slower tempo and lower pitch, while Cantonese speakers raise their pitch. These studies show how prosody shapes interpretation. Yet in Mandarin, this relationship is far less understood. Because Mandarin is a tonal language, where pitch is used to distinguish word meaning, the same vocal cues may simultaneously signal both lexical content and speaker intent. This makes Mandarin a fascinating and challenging case for sarcasm detection.
This project asks: What speech features, particularly prosodic changes, are used to convey sarcasm in Mandarin conversations? Can these features be incorporated into data-efficient, interpretable AI models for sarcasm detection?
To answer these questions, we begin by curating a new audiovisual dataset of Mandarin conversations, drawn from naturalistic sources such as talk shows that capture spontaneous, humorous speech. From this dataset, we conduct acoustic analyses to identify prosodic markers (shifts in pitch, changes in timing, or variations in speech rate) that distinguish sarcastic from sincere expressions. The goal is not simply to feed these signals into a black-box model, but to design architectures that explicitly incorporate linguistic insights.
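As a rough sketch of what such an acoustic analysis involves, the fragment below estimates fundamental frequency (F0) with a simple autocorrelation method on synthetic tones. This is only an illustration under strong simplifying assumptions: real prosodic analysis would use robust pitch trackers on actual recordings, and the pure sine "utterances" here are stand-ins for speech.

```python
import numpy as np

def estimate_f0(signal: np.ndarray, sr: int,
                fmin: float = 75.0, fmax: float = 400.0) -> float:
    """Estimate the fundamental frequency (Hz) of a voiced frame
    by finding the autocorrelation peak within a plausible pitch range."""
    sig = signal - signal.mean()
    corr = np.correlate(sig, sig, mode="full")[len(sig) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)   # lag range for fmin..fmax Hz
    lag = lo + int(np.argmax(corr[lo:hi]))
    return sr / lag

sr = 16000
t = np.arange(sr // 4) / sr                    # 0.25 s analysis frame
baseline = np.sin(2 * np.pi * 120 * t)         # stand-in: neutral delivery (~120 Hz)
raised = np.sin(2 * np.pi * 180 * t)           # stand-in: raised-pitch delivery (~180 Hz)

# A raised F0 relative to a speaker's baseline is one candidate prosodic marker.
print(round(estimate_f0(baseline, sr)), round(estimate_f0(raised, sr)))  # prints: 120 180
```

Timing and speech-rate features would be derived analogously (e.g., from pause durations and syllable counts per second); the point is that each marker is an explicit, inspectable quantity rather than an opaque learned feature.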
Understanding sarcasm in Mandarin has broader impact. Linguistically, it pushes sarcasm research beyond English and Indo-European languages, offering new insights into how prosody functions across linguistic systems. Technologically, it contributes to the development of systems that can operate across cultures and languages. By showing how subtle vocal shifts shape meaning in Mandarin, this project lays the groundwork for more natural and inclusive human–machine interaction.
Cross-Linguistic Comparisons: Comparing sarcasm cues across tonal and non-tonal languages to reveal universal vs. culture-specific patterns.
Multimodal Expansion: Integrating facial expressions and gesture data alongside audio for richer sarcasm detection.
Interpretability & Data Efficiency: Advancing methods that don’t just classify sarcasm, but explain why an utterance is sarcastic, which is crucial for trustworthy, transparent AI.