Invited speakers
Prof. Carlos Toshinori Ishi (Guardian Robot Project, RIKEN and ATR)
Title: “Multimodal speech processing for dialogue robots and applications of sound environment intelligence”
Abstract:
In this talk, I will introduce our research progress on multimodal speech processing for dialogue robots, focusing on two topics. The first is speech-driven motion generation, including lip motion, head motion, facial expressions, upper-body motion (during laughter, surprise utterances and anger expressions), hand gestures and gaze behaviors. The second is sound environment intelligence technology based on multiple microphone arrays and human tracking, which estimates who is talking, when, and where in an environment, and its applications to dialogue robots. During the talk, I will present several demo videos of behaviors generated on humanoid and android robots, showing how human-likeness and expressed personality change when different modalities are controlled.
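As a rough illustration of one signal-processing building block behind such sound environment intelligence (not the speaker's actual system), the sketch below estimates the time difference of arrival between two microphones with GCC-PHAT and converts it into an approximate direction of arrival, i.e. "where" someone is talking. The microphone spacing, sample rate, and function names are assumptions made for the example.

```python
# Minimal GCC-PHAT sketch: estimate the time difference of arrival (TDOA)
# between two microphones and turn it into a rough direction of arrival.
# All parameters (10 cm spacing, 16 kHz sampling) are illustrative assumptions.
import numpy as np

def gcc_phat(sig: np.ndarray, ref: np.ndarray, fs: int, max_tau: float) -> float:
    """Return the estimated delay (seconds) of `sig` relative to `ref`."""
    n = sig.size + ref.size
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12          # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    max_shift = int(fs * max_tau)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs

if __name__ == "__main__":
    fs = 16000
    mic_distance = 0.1                      # assumed 10 cm microphone spacing
    c = 343.0                               # speed of sound (m/s)
    rng = np.random.default_rng(0)
    source = rng.standard_normal(fs)        # 1 s of noise as a stand-in for speech
    mic1 = source
    mic2 = np.roll(source, 3)               # mic2 hears the source 3 samples later
    tau = gcc_phat(mic2, mic1, fs, max_tau=mic_distance / c)
    doa = np.degrees(np.arcsin(np.clip(tau * c / mic_distance, -1.0, 1.0)))
    print(f"estimated TDOA: {tau * 1e6:.1f} us, approximate DOA: {doa:.1f} deg")
```

In a real multi-array setup, such pairwise delay estimates would be combined across arrays and fused with human tracking to decide who is talking, when, and where.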
Dr. Shota Horiguchi (Research & Development Group, Hitachi)
Title: “Speaker Diarization: A Key to Solving the Cocktail Party Problem”
Abstract:
Multi-speaker automatic speech recognition (ASR) plays an important role in realizing human-robot and human-system speech communication. Multi-speaker ASR is typically implemented by first separating each speaker's speech from the input mixture and then applying ASR to each separated signal. In noisy conditions with heavily overlapping speech, however, a promising alternative is to first estimate "who spoke and when" and then perform speech separation guided by that estimate, rather than separating each speaker's speech directly. This step of estimating "who spoke and when" is called speaker diarization. Traditionally, speaker diarization has been performed by cascading multiple modules, but recently end-to-end approaches, which handle overlapping utterances with a simpler model, have been actively studied. In this talk, I will present these research trends in speaker diarization and, in particular, introduce an ongoing research project in our group: end-to-end neural diarization (EEND).
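As a hedged illustration of the end-to-end idea described above (not the actual EEND implementation presented in the talk), the minimal PyTorch sketch below maps a sequence of acoustic frames directly to per-frame, per-speaker activity logits and trains it with a permutation-invariant binary cross-entropy, so that the ordering of speakers in the labels does not matter. The feature dimension, network size, and two-speaker limit are assumptions made for brevity.

```python
# Minimal end-to-end diarization sketch: frames in, per-frame per-speaker
# activity logits out, trained with permutation-invariant BCE (PIT-BCE).
from itertools import permutations

import torch
import torch.nn as nn
import torch.nn.functional as F

class MiniEEND(nn.Module):
    def __init__(self, n_feats: int = 23, n_speakers: int = 2, d_model: int = 128):
        super().__init__()
        self.proj = nn.Linear(n_feats, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_speakers)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, frames, n_feats) -> (batch, frames, n_speakers) logits
        return self.head(self.encoder(self.proj(feats)))

def pit_bce_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Pick the best speaker-label permutation independently per utterance."""
    n_speakers = labels.shape[-1]
    per_perm = torch.stack(
        [F.binary_cross_entropy_with_logits(
            logits, labels[..., list(p)], reduction="none").mean(dim=(1, 2))
         for p in permutations(range(n_speakers))], dim=1)
    return per_perm.min(dim=1).values.mean()

if __name__ == "__main__":
    model = MiniEEND()
    feats = torch.randn(4, 500, 23)                     # 4 utterances, 500 frames
    labels = torch.randint(0, 2, (4, 500, 2)).float()   # per-frame speech activity
    loss = pit_bce_loss(model(feats), labels)
    loss.backward()
    print("PIT-BCE loss:", float(loss))
```

Because both speakers' activities are predicted jointly for every frame, overlapping speech is handled naturally, which is one of the main motivations for the end-to-end formulation over cascaded pipelines.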