Speakers & abstracts

The following speakers will present at the third SpeechTechday on February 10, 2025.

Academic speakers

Cristian Tejedor Garcia (Radboud University, Nijmegen)

Title: The Transformation of Language and Speech Models in Artificial Intelligence Research

Summary: Transformer-based speech and language models are revolutionizing research in Artificial Intelligence (AI). In speech, models such as Wav2Vec 2.0, Whisper, and SpeechT5 excel at automatic speech recognition (ASR), text-to-speech (TTS), and speaker identification. In language, models such as GPT and BERT dominate tasks such as question answering (Q&A), summarization, and sentiment analysis. These models are increasingly used in healthcare and education, for example for the early diagnosis of neurodegenerative diseases in elderly populations and the detection of reading difficulties in children. By analyzing speech patterns, pauses, and language use, these models can help identify markers of Parkinson’s or Alzheimer’s disease, offering non-invasive, cost-effective, and scalable tools for early intervention.

However, a significant limitation concerns data privacy and the potential harms of foundation models. These models are often trained on massive datasets scraped from the internet, which can inadvertently include sensitive or biased information. This makes it difficult to ensure ethical use, avoid harmful outputs, and comply with data protection regulations, especially when the models are deployed in sensitive or regulated research areas. This presentation discusses the scalability and adaptability that make these models indispensable tools for advancing both fundamental and applied research, and argues that responsible use is critical to mitigating these risks.

Wietse de Vries (RU Groningen)

Title: Challenges in Speech Technology for Minority Languages in the Netherlands

Short summary: Developing speech technology for minority languages is not just a matter of training models on smaller datasets. Minority languages often show more dialectal variation and do not always have a single standardized written form. As a result, they pose challenges beyond simply being low-resource. For Frisian and Low Saxon language varieties specifically, we collect data together with local communities and volunteers.

Moreover, we are developing novel modeling techniques that can handle highly diverse minority languages. Language technology should help preserve local language varieties rather than force people to use the standard form of a majority language.

Zhengjun Yue (TU Delft)

Title: Challenges and recommendations for Dutch atypical speech data collection, annotation, sharing, and usage

Short summary: The quality of speech datasets is crucial for advancing speech technology, particularly for atypical speech, where data scarcity and variability pose significant challenges. This talk discusses the challenges encountered in collecting, annotating, sharing, and using Dutch atypical speech datasets, based on insights from several collaborative research projects that TU Delft is involved in, covering stuttered and disordered child speech, personalized dysarthric speech, and mock medical conversations. Common issues such as therapist dominance in recordings, the unsuitability of clinical data for downstream speech tasks, and the lack of standardized protocols are discussed alongside practical recommendations to address them.

Emphasis is placed on creating high-quality datasets for specific speech-related tasks like speech recognition, speech analysis, and assessment, with a call for collaboration and knowledge-sharing to advance research and clinical applications in Dutch speech technology.

Tom Lentz (Tilburg University)

Title: Analyzing speech models with tools from psycholinguistics and communication science

Short summary: Dutch (as well as English) prosody is hardly reflected in text and is largely, but not completely, speaker-specific, which makes it likely to be lost in speech recognition. It is also unlikely to reappear when text is used to generate speech. However, prosody conveys important non-verbal information that is hard to predict without (higher-order) context. I will discuss two prosodic phenomena: (1) the representation of lexical stress in self-supervised speech models and (2) the quality of intonation in speech synthesis. For (1), I present results showing that Wav2Vec 2.0 contains representations of lexical stress, which become stronger in higher layers and appear to generalize across phone types, as a proof of concept that prosodic information may not be fully lost. For (2), I present communication science results on synthesized speech showing that its prosody is not aligned with the context, which causes listeners to doubt that they are being addressed personally.

Berend Jutte (Attendi)

Use case: speech technology in healthcare at Attendi

Jari Hazelebach (Speaksee)

Use case: speech technology for the hearing impaired at Speaksee

Heino Schaght (Mediahuis)

Use case: speech technology at Mediahuis

Jan-Marc Verlinden

The MediSpeech project

Roeland Ordelman (TU Twente, Beeld en Geluid)

The HOSAN project: Building Speech Models for All Dutch Voices