The following speakers will give presentations at the third SpeechTechday on February 2, 2026.
This list will be further updated.
David van Leeuwen (Radboud University, Spraaklab)
Title: Sound or silence?
Abstract: In this talk I will present my views and experiences in speech technology in relation to this Speech Tech Day's theme. I will do this along the lines of several contrasting pairs: science vs technology, academia vs industry, difficult vs solved problems, speech vs non-speech, speaker vs speech recognition, engineered vs learned features, LLM vs human, and past vs future knowledge.
Aki Kunikoshi (ReadSpeaker)
Title: A Comparative Look at Industry and Academic Research
Abstract: Research conducted in companies and research carried out in universities may look similar at first glance, yet they differ in fundamental ways. Their goals, the processes through which ideas are developed, and the users they ultimately serve are shaped by distinct missions. Industrial research is driven by real-world applications, product timelines, and the needs of customers and markets. Academic research, by contrast, is guided by curiosity, theory building, and the pursuit of new knowledge, often without immediate commercial constraints.
Drawing on my experience working at ReadSpeaker while remaining actively involved in the academic community, I will explore these contrasts through concrete examples. Having moved between both environments, I have seen how the same technical idea can evolve differently depending on its purpose.
In this talk, I will highlight what each environment values, how decisions are made, and how researchers navigate expectations, collaboration, and impact. By comparing these two worlds, I hope to offer insights that help participants understand where their interests and strengths may align, and how industry and academia can learn from one another to foster meaningful innovation.
Tessel Wisman (Juvoly)
Title: Project FRIS: Creating a Multi-Speaker, Multi-Dialect Frisian Speech Dataset for Robust ASR
Abstract: Open-source speech data for minority languages remains scarce, limiting the inclusion of dialectal and minority language speakers in speech recognition solutions. We present Project FRIS, a multi-speaker, multi-dialect speech corpus for the Frisian language. The corpus comprises approximately 30 hours of spontaneous speech collected through simulated doctor–patient conversations, and is designed to contribute towards the development of robust Frisian automatic speech recognition (ASR) systems for real-world use, with a particular focus on the healthcare domain. We describe the practical and technical processes involved in creating this resource and present a comparative evaluation of ASR models trained on the corpus. Curated by Juvoly in collaboration with the Province of Fryslân, Project FRIS aims to encourage collaboration between commercial partners, academia, and public authorities, contributing to more inclusive and representative ASR technologies.
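For readers unfamiliar with how such a comparative evaluation is typically scored, the sketch below computes word error rate (WER) for two hypothetical ASR systems against reference transcripts using the jiwer library; the transcripts and model names are invented placeholders, not Project FRIS data.

```python
# Minimal sketch of a comparative ASR evaluation via word error rate (WER).
# All transcripts and model names below are invented placeholders.
from jiwer import wer

# Ground-truth reference transcripts for a small test set.
references = [
    "the patient reports a persistent headache",
    "the symptoms started three days ago",
]

# Hypothetical outputs from two ASR systems under comparison.
hypotheses = {
    "model_a": ["the patient reports a persistent headache",
                "the symptoms started two days ago"],
    "model_b": ["the patient report a persistent head ache",
                "the symptom started three day ago"],
}

for name, hyps in hypotheses.items():
    print(f"{name}: WER = {wer(references, hyps):.2%}")
```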
Martijn Bauer (Autoscriber)
Title: Automated medical notes with Autoscriber: from idea to validated tool
Abstract: Autoscriber was born as an idea at the Leiden University Medical Center, with the aim of providing automated summarization of medical consultations so that healthcare providers no longer need to type and can direct all their attention to the patient. After an initial phase in the hospital IT department, the idea was transferred to a newly founded company, where it was transformed into a commercially available tool. The use of the tool and the quality of its output have now been validated in two clinical studies, both showing that typing is indeed reduced and that the quality of the note may even improve. However, typing has not yet been reduced to zero. How we aim to reduce the remaining typing to a minimum will also be addressed in this talk.
Roeland Ordelman & Xiyuan Gao
Title: The HOSAN project: past, present, future
Abstract:
Sil Aarts (U Maastricht)
Title: From Data Abundance to Care Insights: An AI-Based Approach for Quality of Care in Long-Term Care
Abstract: This presentation describes our approach, which focuses on transforming spoken narratives from residents, their family members, and care professionals into structured textual data. By converting speech into text, qualitative experiences of care become accessible for systematic analysis. AI methods are then used to identify themes, emotions, and patterns, supporting the assessment of experienced quality of care while preserving the richness of personal narratives. The ultimate goal is to enhance reflection, learning, and data-informed quality improvement in long-term care settings.
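A minimal sketch of such a speech-to-insight pipeline, assuming an off-the-shelf Whisper model for transcription and generic Hugging Face classifiers for themes and sentiment; the file name, model choices, and candidate labels are illustrative assumptions, not the project's actual setup.

```python
# Sketch: transcribe a spoken care narrative, then tag themes and sentiment.
# Models, labels, and the file name are illustrative assumptions only.
import whisper
from transformers import pipeline

asr = whisper.load_model("base")                     # generic multilingual ASR
text = asr.transcribe("care_narrative.wav")["text"]  # hypothetical recording

theme_clf = pipeline("zero-shot-classification")
sentiment_clf = pipeline("sentiment-analysis")

themes = theme_clf(text, candidate_labels=["safety", "autonomy", "social contact"])
print(themes["labels"][0], themes["scores"][0])      # top-ranked theme
print(sentiment_clf(text))                           # overall sentiment
```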
Lottie Stipdonk (Erasmus MC)
Title: Development of a child-friendly, clinical tool to objectively measure speech motor control in children who stutter
Abstract: Stuttering is a complex neurodevelopmental condition for which the biological cause remains unknown. While most children recover within 2–3 years after onset, approximately 25% develop persistent stuttering into adulthood, and early prediction is still not possible. This project therefore focuses on developing an objective, non-invasive assessment tool targeting a proposed core underlying skill in stuttering: speech motor skill (SMS). SMS refers to the ability to coordinate the jaw, tongue, lips, and laryngeal system to produce stable speech movements. Reduced SMS leads to increased variability and instability in speech, making breakdowns (stutters) more likely.
Because current clinical assessment relies largely on perceptual judgments with limited reliability, we aim to develop an instrument-based alternative. In the ongoing OSMOS study, participating children (4–10 years) produce 5–8 repetitions of 30 nonwords varying in phonetic complexity. Acoustic analyses quantify within-item variability across repetitions as an index of speech motor control. We examine group differences between children who stutter, children with a history of stuttering, and typically developing peers, as well as variability within the stuttering population. Complementary video-based lip movement analyses capture potential compensation strategies (e.g., larger articulatory movements) and their impact on SMS.
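As a rough sketch of such a within-item variability index, the snippet below computes the coefficient of variation of a per-repetition acoustic measure; using utterance duration is an assumption, and the values are invented, not OSMOS data.

```python
# Minimal sketch of a within-item variability index: the coefficient of
# variation (CV) of an acoustic measure across repetitions of one nonword.
# The duration values are invented for illustration; the OSMOS study's
# actual acoustic measures may differ.
import numpy as np

def within_item_cv(measures: np.ndarray) -> float:
    """CV = sample std / mean of a per-repetition acoustic measure."""
    return float(np.std(measures, ddof=1) / np.mean(measures))

# e.g., durations (s) of six repetitions of one nonword by one child
durations = np.array([0.82, 0.95, 0.78, 1.10, 0.88, 0.99])
print(f"within-item CV: {within_item_cv(durations):.3f}")
```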
Following validation, this child-friendly tool will provide clinicians with objective markers to support prognosis and treatment decisions, representing a substantial advance over current subjective clinical measures.
Thomas Wilschut (RUG)
Title: Speech technology in the classroom: using prosodic speech analysis to optimize inclusive adaptive fact learning systems
Abstract: Memorizing declarative knowledge—such as vocabulary items or toponymy—is a central component of formal education. Adaptive fact learning systems optimize this process by tailoring practice sessions to the needs of individual learners. Typically, these systems present retrieval questions, record learner responses, and use performance data to estimate memory strength for real-time personalization. Most current implementations, however, rely on typed or mouse-based input.
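One common basis for such real-time memory-strength estimates is an ACT-R-style activation model; the sketch below is a generic illustration with invented practice times and a conventional decay parameter, not necessarily the model used by the system discussed in this talk.

```python
# Sketch of an ACT-R-style activation estimate of memory strength, a common
# basis for adaptive fact learning systems. The decay parameter (0.5) and
# practice times are illustrative assumptions.
import math

def activation(practice_times_s: list[float], now_s: float, decay: float = 0.5) -> float:
    """Activation = ln(sum of t^-d), with t the time since each practice."""
    return math.log(sum((now_s - t) ** -decay for t in practice_times_s))

# a fact practiced 300, 50, and 10 seconds before the current moment
print(activation(practice_times_s=[0.0, 250.0, 290.0], now_s=300.0))
```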
Our research demonstrates that adaptive learning systems can be made more inclusive by supporting spoken responses. This is particularly beneficial for learners with dyslexia or spelling difficulties, for whom text-based input can form a barrier that increases error rate and processing time.
Beyond accessibility, speech input offers an additional advantage: the spoken response contains rich acoustic information that can be leveraged to improve the underlying learning models. Adaptive systems rely on accurate inference of latent memory parameters, for which prosodic speech features provide valuable cues. Specifically, our analyses show that different prosodic dimensions reflect distinct cognitive and metacognitive states. For example, response intensity relates to the strength of memory encoding, while features such as fundamental frequency (F0) and response duration provide information about learner confidence.
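A minimal sketch of extracting these prosodic cues (F0, intensity, duration) from a recorded spoken response with librosa; the file name is a placeholder, and the pitch search range is an assumption.

```python
# Sketch: extract mean F0, RMS intensity, and duration from one response.
# The file name is a placeholder; the pitch range (65-600 Hz) is an
# assumption meant to cover typical speaking voices.
import librosa
import numpy as np

y, sr = librosa.load("spoken_response.wav", sr=None)

f0, voiced_flag, _ = librosa.pyin(y, fmin=65.0, fmax=600.0, sr=sr)
mean_f0 = np.nanmean(f0)                     # Hz; unvoiced frames are NaN
intensity = librosa.feature.rms(y=y).mean()  # rough proxy for loudness
duration = librosa.get_duration(y=y, sr=sr)  # response duration in seconds

print(f"F0 {mean_f0:.1f} Hz, RMS {intensity:.4f}, duration {duration:.2f} s")
```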
These findings suggest that speech technology can play a valuable role in educational technology: increasing inclusivity while simultaneously enhancing the accuracy of adaptive learning models. This approach opens new opportunities for integrating speech technology and cognitive modeling in the design of inclusive and intelligent educational systems.