Program
1st day (March 4th)
9:00-9:15 Opening
9:15-10:00 Keynote 1 (Giuseppe Riccardi, University of Trento Italy)
1. Conversational AI to Benefit Individuals
Research in human-machine dialogue (aka conversational AI) has been driven by the quest for open-domain, knowledgeable and multimodal agents. In contrast, the complex problem of designing, training, and evaluating a conversational system and its components is currently reduced to a) prompting large language models, b) coarse evaluation of machine responses, and c) poor management of affective signals. In this talk we will review the current state of the art in human-machine dialogue research and its limitations. We will present the most challenging frontiers of conversational AI when the objective is to create personal conversational systems that benefit individuals. In this context we will report experiments and randomized controlled trials of so-called personal healthcare agents supporting individuals and healthcare professionals.
10:00-10:30 coffee break
10:30-12:10 Oral Session 1 (Cognitive and Psychological perspective of dialogue systems)
2. Response Generation for Cognitive Behavioral Therapy with Large Language Models: Comparative Study with Socratic Questioning; Kenta Izumi (Nara Institute of Science and Technology)*; Hiroki Tanaka (Nara Institute of Science and Technology); Kazuhiro Shidara (Nara Institute of Science and Technology); Hiroyoshi Adachi (Osaka University); Daisuke Kanayama (Osaka University); Takashi Kudo (Osaka University); Satoshi Nakamura (Nara Institute of Science and Technology, Japan)
3. Persona-based Dialogue Response Generation Using Personal Facts and Personality Traits; Weiwen SU (University of Tokyo)*; Naoki Yoshinaga (Institute of Industrial Science, The University of Tokyo); Yuma Tsuta (University of Tokyo); Masashi Toyoda (Institute of Industrial Science, The University of Tokyo)
4. Can Large Language Models Be Used to Provide Psychological Counselling? An Analysis of GPT-4-Generated Responses Using Role-Play Dialogues; Michimasa Inaba (The University of Electro-Communications)*; Mariko Ukiyo (iDEAR Human Support Service); Keiko Takamizo (Japanese Organization of Mental Health and Educational Agencies)
5. Is the plan ready yet? – Exploring LLMs when talking about well-being and health; Kristiina Jokinen (AIRC, AIST)*
6. A Unified Approach to Emotion Detection and Task-Oriented Dialogue Modeling; Armand Stricker (LISN, CNRS)*; Patrick Paroubek (LISN, CNRS)
12:10-13:25 Lunch (lunch box will be provided)
13:25-13:30 Lightning talks (LT) for online posters
13:30-15:00 Poster/Demo/Position Session 1
7. Toward building dialogue system deepening knowledges on past dialogue: a corpus study and analysis; Kanta Watanabe (NAIST)*; Seiya Kawano (RIKEN); Akishige Yuguchi (Tokyo University of Science); Koichiro Yoshino (RIKEN)
8. Dialog Breakdown Recovery Strategies Based on User Personality; Kazuya Tsubokura (Aichi Prefectural University)*; Takuya Takeda (Aichi Prefectural University); Yurie Iribe (Aichi Prefectural University); Norihide Kitaoka (Toyohashi University of Technology)
9. Examining the Impact of a Forgetful Multi-store Memory System in a Cognitive Assistive Robot; Angel F Garcia Contreras (RIKEN)*; Seiya Kawano (RIKEN); Yasutomo Kawanishi (RIKEN); Yutaka Nakamura (RIKEN); Saito Satoru (Kyoto University); Koichiro Yoshino (RIKEN)
10. Mnemosyne: Scaling-up Conversation Data using Conversation Design Graphs for Task-oriented Dialogue Systems Training; Agathe Lherondelle (J.P.Morgan Chase)*; Ruibo Shi (JP Morgan Chase & Co); Denis Kochedykov (JP Morgan Chase & Co)
11. Retrieval-Augmented Language Model for Long-Term Conversation via Weakly-Supervised Learning from Perplexity Improvements; Kosuke Nishida (NTT Human Informatics Laboratories / The University of Tokyo)*; Naoki Yoshinaga (Institute of Industrial Science, The University of Tokyo); Masashi Toyoda (Institute of Industrial Science, The University of Tokyo)
12. Evaluating Dialogue Systems from the System Owners' Perspectives; Mikio Nakano (C4A Research Institute, Inc.)*; Hisahiro Mukai (Nextremer Co., Ltd.); Yoichi Matsuyama (Equmenopolis, Inc.); Kazunori Komatani (Osaka University)
13. RASwDA: Re-Aligned Switchboard Dialog Act Corpus for Dialog Act Prediction in Conversations; Run Chen (Columbia University)*; Eleanor M Lin (Columbia University); Shayan Hooshmand (Columbia University); Mariam Mustafa (Columbia University); Rose Sloan (Bard College); Ritika Nandi (Columbia University); Alicia Yang (Columbia University); Andrea Lopez (Columbia University); Ansh Kothary (Columbia University); Isaac Suh (Columbia University); Catherine Lyu (Columbia University); Eric Chen (Columbia University); Sophia Horng (Columbia University); Julia Hirschberg (Columbia University) (online presentation)
14. Can Noisy Cross-Utterance Contexts Help Speech-Recognition Error Correction?; Seongmin Lee (The University of Tokyo); Kohki Tamura (University of Tokyo); Tomoaki Nakamura (The University of Tokyo); Naoki Yoshinaga (Institute of Industrial Science, The University of Tokyo)*
15. Entrainment Metrics/Strategies Evaluation in Conversation Response Re-ranking; Shota Kanezaki (Doshisha University/RIKEN)*; Seiya Kawano (RIKEN); Akishige Yuguchi (RIKEN/Tokyo University of Science); Marie Katsurai (Doshisha University); Koichiro Yoshino (RIKEN)
15:00-15:30 coffee break
15:30-16:30 Oral Session 2 (Dialogue and LLMs)
16. A Hybrid Rule-based and Generative Language Model for Flexible Instructional Dialogue; Carl Strathearn (Edinburgh Napier University); Yanchao Yu (Edinburgh Napier University)*; Dimitra Gkatzia (Edinburgh Napier University)
17. Are LLMs Robust for Spoken Dialogues?; Seyed Mahed Mousavi (Signals and Interactive Systems Lab, University of Trento, Italy)*; Gabriel Roccabruna (University of Trento); Simone Alghisi (Signals and Interactive Systems Lab, University of Trento, Italy); Massimo Rizzoli (Signals and Interactive Systems Lab, University of Trento, Italy); Mirco Ravanelli (Université de Montréal); Giuseppe Riccardi (University of Trento)
18. Multi-Intent Recognition in Dialogue Understanding: A Comparison Between Smaller Open-Source LLMs; Adnan Ahmad (Technical University of Berlin)*; Philine Thalia Kowol (Technical University of Berlin); Stefan Hillmann (Technische Universität Berlin); Sebastian Möller (TU Berlin) (online presentation)
27. New technologies for spoken dialogue systems: LLMs, RAG and the GenAI Stack (demo paper); Graham Wilcock (CDM Interact and University of Helsinki)*
16:50-18:20 Poster/Demo/Position Session 2
19. Automating User Feedback: Constructing a Chat Dialogue Model Using Reinforcement Learning with Rewards from Videos; Enfu Guo (The University of Electro-Communications)*; Yasuhiro Minami (the University of Electro-Communications)
20. Towards Harnessing Large Language Models for Comprehension of Conversational Grounding; Kristiina Jokinen (AIRC, AIST)*; Phillip Schneider (Technical University of Munich); Taiga Mori (AI Research Center AIST)
21. Real-time and Continuous Turn-taking Prediction Using Voice Activity Projection; Koji Inoue (Kyoto University)*; Bing'er Jiang (KTH); Erik Ekstedt (KTH); Tatsuya Kawahara (Kyoto University); Gabriel Skantze (KTH)
22. Towards Interactive Anomaly Detection using Natural Language; Callum Rothon (University of Hull); Simon Keizer (Toshiba Europe Ltd)*; Rama S Doddipatla (Toshiba Europe LTD); Nina Dethlefs (University of Hull) (online presentation)
23. “I’m not sure I heard you right, but I think I know what you mean” – investigations into the impact of speech recognition errors on response selection for a virtual human.; Vera Harris (University of the Incarnate Word); Robert Braggs (United States Military Academy); David Traum (USC Institute for Creative Technologies)* (online presentation)
24. Why should a dialogue system speak more than one language?; Jacqueline Brixey (University of Southern California)*; David Traum (USC Institute for Creative Technologies)
25. Commonsense Generation and Evaluation for Dialogue Systems using Large Language Models; Marcos Estecha Garitagoitia (UPM)*; Chen Zhang (National University of Singapore); Mario Rodríguez-Cantelar (Universidad Politécnica de Madrid); Luis Fernando D'Haro (Speech Technology and Machine Learning Group - Universidad Politécnica de Madrid)
26. The Remdis Toolkit: Building Advanced Real-time Multimodal Dialogue Systems with Incremental Processing and Large Language Models; Yuya Chiba (NTT); Koh Mitsuda (rinna Co., Ltd.); Akinobu Lee (Nagoya Institute of Technology); Ryuichiro Higashinaka (Nagoya University)*
2nd day (March 5th)
9:00-9:05 Announcement
9:05-9:50 Keynote 2 (Tatsuya Kawahara, Kyoto University)
28. Semi-autonomous Dialogue for Cybernetic Avatars and Ainu Speech Processing
Spoken dialogue systems (SDS) have made dramatic advances thanks to improved speech technology and large language models. However, they still have limitations in terms of human-level empathy. Purely autonomous systems might run out of control and produce unexpected responses. Meanwhile, avatars have become prevalent in online communication since the pandemic. Cybernetic avatars (CAs), a hybrid of avatars with AI and robotics, are expected to combine the strengths of both and provide human-level services to many people in parallel and simultaneously. The semi-autonomous dialogue system can be implemented for either a robot or a CG-based avatar. The talk addresses this new project on spoken dialogue systems for cybernetic avatars, which is sponsored by the Moonshot Research and Development Program in Japan.
The talk also introduces projects on speech processing for the Ainu language, a critically endangered language of Hokkaido. Specifically, automatic speech recognition is being developed for annotating the folklore archive, and speech synthesis is being explored for language learning.
9:50-10:10 coffee break
10:10-11:30 Oral Session 3 (Best Paper Session)
29. Going beyond word-similarity in evaluating document-grounded response generation in task-oriented dialogues; Abigail M Sticha (University of Cambridge)*; Norbert Braunschweiler (Toshiba Europe Limited); Rama S Doddipatla (Toshiba Europe LTD)
30. Evaluation of Off-the-shelf Whisper Models for Speech Recognition Across Diverse Dialogue Domains; Kallirroi Georgila (University of Southern California)*; David Traum (USC Institute for Creative Technologies)
31. Acknowledgment of Emotional States: Generating Validating Responses for Empathetic Dialogue; Zi Haur Pang (Kyoto University)*; Yahui Fu (Kyoto University); Divesh Lala (Kyoto University); Keiko Ochi (Kyoto University); Koji Inoue (Kyoto University); Tatsuya Kawahara (Kyoto University)
32. ASMR: Augmenting Life Scenario using Large Generative Models for Robotic Action Reflection; Shang-Chi Tsai (National Taiwan University)*; Seiya Kawano (RIKEN); Angel F Garcia Contreras (RIKEN); Koichiro Yoshino (RIKEN); Yun-Nung Chen (National Taiwan University)
11:30-12:30 Sponsor Lunch (lunch box will be provided)
12:30-19:00 Excursion (Upopoy, National Ainu Museum)
19:00-21:00 Banquet (Hotel)
3rd day (March 6th)
9:00-9:15 Announcement
9:15-10:00 Keynote 3 (Yukiko Nakano, Seikei University)
33. Modeling Multimodal Interactions to Enhance the User Understanding in Dialogue Systems
In human-agent interaction using virtual characters and robots, both the user and the conversational agent display nonverbal signals. In order for the system to accurately understand the content and state of such multimodal interactions, it is essential to enhance the agent's ability to properly interpret nonverbal information such as prosody, facial expressions, and eye gaze, in addition to verbal information. This talk will discuss our research in multimodal interaction, addressing issues in detecting important events and utterances that provide a richer representation of dialogue states, and in estimating user characteristics that would be useful in understanding group dynamics in multiparty communication. The talk will cover methodologies for interaction analysis, multimodal machine learning, multimodal corpus collection, and communicative behavior annotation, and will show how multimodal information contributes to improving the performance of machine learning models.
10:00-10:30 coffee break
10:30-12:10 Oral Session 4 (Dialogue Related Technologies)
34. Toward OOV-word Acquisition during Spoken Dialogue using Syllable-based ASR and Word Segmentation; Ryu Takeda (Osaka University)*; Kazunori Komatani (Osaka University)
35. JudgerToken: A Single-Token Method for Reducing Repetition in Dialogue System; Qiang Xue (Kobe University)*; Tetsuya Takiguchi (Kobe University); Yasuo Ariki (Kobe University)
36. Development and Validation of Engagement and Rapport Scales for Evaluating User Experience in Multimodal Dialogue Systems; Fuma Kurata (Waseda University)*; Mao Saeki (Waseda University); Masaki Eguchi (Waseda University); Shungo Suzuki (Waseda University); Hiroaki Takatsu (Waseda University); Yoichi Matsuyama (Waseda University)
37. Enhancing Personality Recognition in Dialogue by Data Augmentation and Heterogeneous Conversational Graph Networks; Yahui Fu (Kyoto University)*; Haiyue Song (Kyoto University); Tianyu Zhao (rinna Co., Ltd.); Tatsuya Kawahara (Kyoto University)
38. Data Augmentation for Robust Natural Language Generation Based on Phrase Alignment and Sentence Structure; Kenta Yamamoto (Osaka University); Seiya Kawano (RIKEN); Tatsuya Kawahara (Kyoto University); Koichiro Yoshino (RIKEN)*
12:10-13:30 Lunch (lunch box will be provided)
13:30-14:50 Oral Session 5 (Dialogue Evaluation)
39. Evaluation of a semi-autonomous attentive listening system with takeover prompting; Haruki Kawai (Kyoto University); Divesh Lala (Kyoto University)*; Koji Inoue (Kyoto University); Keiko Ochi (Kyoto University); Tatsuya Kawahara (Kyoto University)
40. An Analysis of User Behaviours for Objectively Evaluating Spoken Dialogue Systems; Koji Inoue (Kyoto University)*; Divesh Lala (Kyoto University); Keiko Ochi (Kyoto University); Tatsuya Kawahara (Kyoto University); Gabriel Skantze (KTH)
41. Multifaceted Evaluation of Automatically Generated Dialogue Format Summary; Sanae Yamashita (Nagoya University)*; Ryuichiro Higashinaka (Nagoya University)
42. Dialogue System Live Competition Goes Multimodal: Analyzing the Effects of Multimodal Information in Situated Dialogue Systems; Ryuichiro Higashinaka (Nagoya University)*; Tetsuro Takahashi (Fujitsu LTD.); Michimasa Inaba (The University of Electro-Communications); Zhiyang Qi (The University of Electro-Communications); Yuta Sasaki (Tokyo Institute of Technology); Kotaro Funakoshi (Tokyo Institute of Technology); Shoji Moriya (Tohoku University); Shiki Sato (Tohoku University); Takashi Minato (ATR); Kurima Sakai (ATR); Tomo Funayama (ATR); Masato Komuro (IR-Advanced Linguistic Technologies Inc.); Hiroyuki Nishikawa (Meikai University); Ryosaku Makino (Waseda University); Hirofumi Kikuchi (Waseda University); Mayumi Usami (Tokyo University of Foreign Studies)
14:50-15:20 coffee break
15:20-16:20 Panel
Topic: Research directions and sustainability for linguistic diversity in the age of LLMs?
Panelists:
•Yun-Nung (Vivian) Chen (National Taiwan University)
•Luis Fernando D'Haro (Universidad Politécnica de Madrid)
•Jacqueline Brixey (University of Southern California)
16:30-17:00 Closing (Includes introduction of the Best Paper Award)
Instruction to presenters
Oral presentations
Each oral presentation has 15 minutes for the talk and 5 minutes for Q&A. Presentations will be streamed to online participants via Zoom. We will record the presentations and share the videos with participants for a short period after the workshop. If you or your organization requires that your presentation not be recorded, please get in touch with the organizers <iwsds2024@gmail.com>.
If you plan to present online, please share your screen when the chair asks you to.
Poster/demo/position presentations
Each poster/demo/position presentation will have a poster board suitable for an A0 portrait poster. If you need a power supply, please get in touch with the organizers <iwsds2024@gmail.com>. Other equipment, such as monitors, will not be provided.
If you plan to present online, please send your poster and a 3-minute video to the organizers by February 23rd. We will play your video during the LT session and print your poster and hang it on your poster board.