Mark Huckvale is Emeritus Professor of Speech Sciences in the Department of Speech, Hearing and Phonetic Sciences at University College London. In a long career in Speech and Hearing Sciences, he has published over 100 research articles in areas involving speech recognition, speech synthesis, voice conversion, speech intelligibility and computational paralinguistics. He is best known for work in accent recognition, use of avatars to provide mental health therapy, hearing for speech, and voice analysis for the measurement of speaker state. He is currently CEO of Avatar Therapy Ltd which seeks to make available in clinical practice a novel therapy for relief from auditory hallucinations in schizophrenia.
Verena Rieser is a Senior Staff Research Scientist at Google DeepMind, where she founded the VOICES team (Voices-of-all in alignment). Her team is a core contributor to Gemini with a mission to enhance model safety and usability for diverse communities.
Verena has pioneered work in data-driven multimodal Dialogue Systems and Natural Language Generation, encompassing conversational RL agents, faithful data-to-text generation, spoken language understanding, evaluation methodologies, and applications of AI for societal good.
Verena previously directed the NLP lab as a full professor at Heriot-Watt University, Edinburgh, and held a Royal Society Leverhulme Senior Research Fellowship. She earned her PhD from Saarland University.
Nadine Lavan is currently a Senior Lecturer at the Department of Psychology, Queen Mary University of London. Nadine completed her BA in Linguistics, Phonetics and English Studies at the University of Cologne in 2011, went to UCL to complete an MRes in Speech, Language and Cognition in 2011/2012, and eventually completed a PhD at Royal Holloway between 2013 and 2017. She then worked as a post-doc at Brunel, Royal Holloway, and UCL until joining QMUL in 2020 with a Sir Henry Wellcome Fellowship.
Abstract: Avatar Therapy allows people with psychosis to confront the voices they hear and thereby gain control over them. In Avatar Therapy, the voice hearer creates a computer representation of a persecutory voice in the form of an avatar with an appearance and voice chosen by the person. Avatar Therapy sessions then involve a dialogue between the person and the avatar voiced by a trained therapist. In these sessions the person learns how to stand up to the avatar and by proxy gains control over their voice. The safety, efficacy and cost-effectiveness of the therapy have now been shown in two large clinical trials. I’ve been involved in Avatar Therapy for more than 15 years, a period which has seen a transformation in the technology of speech and video generation, and the use of generative AI within healthcare. In this talk I’ll give an update on the current state of the therapy, the challenges in running clinical trials and dealing with medical device regulation, progress on moving the therapy from research into clinical practice, and the opportunities afforded by foundation models and large corpora to build better custom avatars of the voices heard in psychosis.
Title: Whose Gold? Re-imagining Alignment for Truly Beneficial AI
Abstract: Human feedback is often the "gold standard" for AI alignment, but what if this "gold" reflects diverse, even contradictory human values? This keynote explores the technical and ethical challenges of building beneficial AI when values conflict -- not just between individuals, but also within them. My talk advocates for a dual expansion of the AI alignment framework: moving beyond a single, monolithic viewpoint to a plurality of perspectives, and transcending narrow safety and engagement metrics to promote comprehensive human well-being.
Title: Forming first impressions from voices
Abstract: As soon as we hear a voice, we very quickly form an impression of the person we are hearing: Are they young or old? Are they friendly or grumpy? Do they sound posh? Impressions are often quite detailed and include a whole range of different characteristics. While some aspects of these first impressions are accurate to some degree (e.g., rough age estimates, gender judgments), other aspects do not seem to have any link to the person's actual characteristics (e.g., personality-related judgments).
In this talk, I will trace how listeners put together these complex impressions, from the first few milliseconds of hearing a voice to having formed a full first impression. I will also discuss factors, such as 'personal taste' and shared conceptual knowledge (or stereotypes), that can shape first impressions from voices.