Date: Saturday, August 16th, 2025
Time: 09:00 to 17:00 (welcome coffee from 08:30)
Venue: Aula Conference Center
Delft University of Technology
The Netherlands
Find the venue on Google Maps.
Schedule:
08:30-09:00 welcome coffee
09:00-09:30 opening: welcome, introductions
09:30-09:45 lightning poster overviews I (chair: Johannah O'Mahony)
09:45-10:45 morning poster session
Poster 1: Jiashu Dong "Singing Voice Synthesis in your language: cross-lingual transfer with limited data using diffusion models"
Poster 2: Anna Taylor "Modeling Emphasis Area Prediction for Text-to-Speech Synthesis"
Poster 3: Aruna Srivastava "Transcribing In Context"
Poster 4: Cathy Zhang "Computer Vision-based Assessment of Limb Motor Function in ALS using Remote Monitoring of Activities of Daily Living via Multimodal Dialog System"
Poster 5: Chin-Jou Li "Towards Inclusive ASR: Investigating Voice Conversion for Dysarthric Speech Recognition in Low-Resource Languages"
Poster 6: Emma Sharratt "Automatic Assessment and Feature-based Characterisation of Oral Narratives of Afrikaans and isiXhosa Children"
Poster 7: Hang Chen "Layer-wise Cross-Lingual Depression Detection from Speech: A HuBERT-Based Study on English and Mandarin"
10:45-11:15 coffee break
11:15-12:15 doctoral student panel (chair: Spyretta Leivaditi)
Ariadna Sanchez (University of Edinburgh, UK)
Yuanyuan Zhang (TU Delft, The Netherlands)
Jhansi Mallela (IIIT Hyderabad, India)
Pooneh Mousavi (Concordia University, Canada)
Tina Raissi (RWTH Aachen, Germany)
12:15-13:15 lunch break
13:15-14:15 mentoring (chair: Iona Gessinger)
Table 1: Bornali Phukon (University of Illinois Urbana-Champaign, USA)
Group 1: Emma Sharratt, Maria Paula Cardeliquio Orfanelli, Anna Taylor
Group 2: Hiya Chaudhari, Aruna Srivastava, Jiashu Dong
Table 2: Jessica Fernando (LXT, Canada)
Group 1: Hiya Chaudhari, Aruna Srivastava, Jiashu Dong
Group 2: Emma Sharratt, Maria Paula Cardeliquio Orfanelli, Anna Taylor
Table 3: Éva Székely (KTH Royal Institute of Technology, Sweden)
Group 1: Hang Chen, Piroska Zsófia Barta, Renyi Yang, Larissa Kleppel
Group 2: Cathy Zhang, Chin-Jou Li, Signe Gram Sand, Anjana Rajasekhar
Table 4: Tanvina Patel (Erasmus MC | TU Delft | DataQueue, The Netherlands)
Group 1: Cathy Zhang, Chin-Jou Li, Signe Gram Sand, Anjana Rajasekhar
Group 2: Hang Chen, Piroska Zsófia Barta, Renyi Yang, Larissa Kleppel
14:15-14:30 coffee break
14:30-14:45 lightning poster overviews II (chair: Yuanyuan Zhang)
14:45-15:45 afternoon poster session
Poster 8: Hiya Chaudhari "Echoes from the Womb: Modeling Prenatal Musical Memory and Predicting Postnatal Vocal Complexity"
Poster 9: Anjana Rajasekhar "Assessing the Effectiveness of Obfuscation Techniques for Speech Privacy Preservation and Their Impact on Utility"
Poster 10: Larissa Kleppel "Efficient One-Pass Decoding for Current ASR Architectures and their Combination"
Poster 11: Maria Paula Cardeliquio Orfanelli "Prosodic Rhythm Development in Brazilian Portuguese: A Dynamic Systems Approach to Children's Speech from Ages 6 to 17"
Poster 12: Piroska Zsófia Barta "Language Model-Based Correction for Dysarthric Speech Recognition"
Poster 13: Renyi Yang "AdvSpeech: Adversarial Attack Against Zero-Shot Voice Cloning"
Poster 14: Signe Gram Sand "Uncovering Emotional Dynamics in Schizophrenia through Interpretable Multimodal Analysis"
15:45-16:45 senior panel (chair: Iona Gessinger)
Catherine Lai (University of Edinburgh, UK)
Jingyao Wu (MIT, USA)
Bornali Phukon (University of Illinois Urbana-Champaign, USA)
Jessica Fernando (LXT, Canada)
Helena Moniz (INESC-ID / University of Lisbon, Portugal)
16:45-17:00 closing: best poster, final comments, group photo
Biographies of panelists and mentors
Ariadna Sanchez is a doctoral candidate in the CDT in Natural Language Processing at the University of Edinburgh, working on speech synthesis technologies for speakers with dysarthria. She is also part of Young IT Girls, a Catalan non-profit that encourages young girls to pursue careers in STEM. Previously, she worked at Amazon on the Text-to-Speech team for Alexa and Polly. She holds an MSc in Speech and Language Processing from the University of Edinburgh and a BSc in Audiovisual Systems Engineering from the Universitat Politecnica de Barcelona. In her free time, she enjoys practicing karate, reading, attempting pottery and bookbinding, and spending time with her cat.
Dr. Bornali Phukon is a researcher in Natural Language Processing (NLP) and Speech Processing, focusing on low-resource languages, ASR evaluation, and disordered speech recognition. She earned her PhD in NLP and was a postdoctoral researcher at the University of Illinois Urbana-Champaign, where she worked on improving ASR for dysarthric speech as part of the Speech Accessibility Project. Her research explores LLM-based ASR error correction, intelligibility metrics, and evaluation methods that better align with human perception. She is currently a Research Scientist at the University of Illinois Urbana-Champaign.
Dr. Catherine Lai is a Reader (Associate Professor) in Speech and Language Technology, based in Linguistics and English Language and the Centre for Speech Technology Research at the University of Edinburgh. Her main interest is speech prosody, e.g., the intonational and rhythmic properties of speech, and how it contributes to spoken dialogue understanding, drawing on both speech technology/machine learning and linguistic/social science perspectives.
Dr. Éva Székely is an Assistant Professor at KTH Royal Institute of Technology in Stockholm. Her primary research interest lies in modelling spontaneous speech phenomena in conversational TTS. She is PI of three research projects, two of which introduce a novel research methodology that uses spontaneous speech synthesis to study speech perception, and aim to uncover biases in how listeners perceive and evaluate speakers based on their voice and speaking style. Her latest project aims to develop self-supervised approaches for modeling conversational dynamics. Éva holds a Master's degree in Speech and Language Technology from the University of Utrecht. She completed her PhD at University College Dublin on the topic of expressive speech synthesis in human interaction.
Dr. Helena Moniz is the President of the European Association for Machine Translation (2021-) and a former President of the International Association for Machine Translation (2023-2025). She is a Visiting Research Fellow at the UNESCO Chair in Translating Cultures (2025-). Helena is an Assistant Professor at the School of Arts and Humanities at the University of Lisbon, where she teaches Computational Linguistics, Computer Assisted Translation, and Machine Translation Systems and Post-editing. She is the Chair of the Ethics Committee of the Center for Responsible AI (2023-) and the coordinator of the project Bridge AI. Since 2025, she has been an integrated researcher at the Center of Linguistics of the University of Lisbon and a collaborator of INESC-ID.
Jessica Fernando has worked in language technology and AI data for 10 years, following a BA (Hons) in phonetics with a focus on sociophonetics, phonation, and prosody. She spent most of her career as a linguistic specialist in areas such as pronunciation dictionary development, language resources, and voice coaching. Now in Business Development at LXT, she helps clients design data strategies for launching their AI models. While she supports projects across all AI/ML areas, her core interest remains speech, particularly ASR and TTS.
Jhansi Mallela is a Ph.D. student in the Department of Electronics and Communication Engineering (ECE) at IIIT Hyderabad, affiliated with the Language Technology Research Center (LTRC). She joined the Ph.D. program directly after completing her B.Tech, demonstrating a strong early interest in research. She was awarded the prestigious Kohli Research Fellowship for the year 2021-22 in recognition of her prior research work. Her research focuses on speech processing, with a particular emphasis on prosody modeling, especially stress and intonation. She works on expressive speech synthesis, automatic syllable stress detection, and machine learning–based pathological speech assessment, particularly under noisy or real-world conditions. She has co-authored multiple peer-reviewed papers on syllable stress detection, sequence-dependent neural architectures, and prosody embedding improvements for non-native speech.
Dr. Jingyao Wu is a Postdoctoral Associate at the MIT Media Lab and a recipient of the MIT–Novo Nordisk Artificial Intelligence Postdoctoral Fellowship (2025-2027). She received her B.E. (Hons) in Telecommunications Engineering and her Ph.D. in Speech Signal Processing from the University of New South Wales, Sydney, Australia, in 2020 and 2024, respectively. She is the lead author of the paper that received the Best Paper Award at ACII 2023, and her work was recognized as a Top 3% Paper at ICASSP 2023. She received The Rising Stars Women in Engineering award at the Asian Deans' Forum 2023. She will serve on the Publications Committee for INTERSPEECH 2026. Her research interests include affective computing, AI in mental healthcare, speech processing, and deep learning.
Pooneh Mousavi is a Ph.D. candidate and affiliated researcher at Concordia University and Mila, working under the supervision of Mirco Ravanelli and Cem Subakan. Her research focuses on Conversational AI and representation learning for speech and audio, with an emphasis on bridging the gap between audio and large language models. She holds a Master’s degree in Computer Science from the University of Texas at Dallas (UTD). Pooneh is a core contributor to SpeechBrain, a widely used open-source toolkit for conversational AI. She also leads a weekly Conversational AI Reading Group at Mila, which features leading researchers and scientists in the field.
Dr. Tanvina Patel is a Senior Researcher at Erasmus Medical Center (EMC), Rotterdam, in collaboration with TU Delft, where she works on the speech of children with cleft lip and palate. She also works as a Machine Learning Engineer at DataQueue, Netherlands, on ASR systems. In 2017, she earned her Ph.D. from DA-IICT, Gandhinagar, on spoofed speech detection. During this time, she was also associated with two ASR/TTS projects sponsored by DeitY, Government of India. From 2017 to 2021, she was a Data Scientist at Cogknit Semantics, Bangalore, where she worked on ASR and various speech-technology applications. In 2024, she completed her postdoctoral research at TU Delft on inclusive ASR.
Tina Raissi is a final-year PhD candidate at RWTH Aachen University, Germany, specializing in acoustic modelling for automatic speech recognition. She earned her Master's in computer science at RWTH Aachen and her Bachelor's in computer engineering at the University of Florence, Italy. Her research involves principled comparisons between classic and end-to-end approaches, focusing on simplifying classic methods by incorporating features proven effective in end-to-end systems. Before studying computer science, she was a professional classical pianist, holding a Master's in piano performance and a Bachelor's in arts and theatre in Italy, where she taught and performed.
Yuanyuan Zhang is a Ph.D. student in the Delft Inclusive Speech Communication (DISC) Lab / Multimedia Computing Group at Delft University of Technology. Her research focuses on inclusive speech technology, particularly automatic speech recognition for atypical speech such as dysarthric and child speech, as well as mitigating bias against non-native accents. She has experience creating a dysarthric speech dataset, developing data augmentation methods, and quantifying bias in ASR systems. Before her Ph.D., Yuanyuan worked as a speech algorithm engineer at Li Auto in China, where she contributed to multiple projects on child speech recognition and code-switched speech recognition.