Recently, conversational systems have seen a significant rise in popularity due to commercial applications such as Amazon's Alexa, Apple's Siri, Microsoft's Cortana, and Google Assistant. However, there is untapped potential in the study of multimodal chatbots, which let users and dialogue agents converse using both natural language and visual information.
Due to this increased demand, multimodal agents are becoming ubiquitous as various companies push for this technology. This growing use raises many challenges in achieving more natural, human-like, and engaging interactions. Several of the resulting research questions are currently very active in the community: How can visual and text data be combined? How should the user's intent be interpreted? How can an agent engage in a conversation about multimodal content?
The target audience is research students and practitioners who wish to broaden their understanding of conversational agents that can engage in conversations about multimedia content.
7th November 2022, 2:00 PM IST, at the 24th ACM International Conference on Multimodal Interaction (ICMI)
We will start the tutorial with an introduction to the concept of Conversational Task Assistants, agents that can assist users in completing tasks.
The second part of the tutorial will focus on the introduction of multimodality to conversational systems, and we will address some of the challenges of assistant embodiment and user understanding.
In the third part, we will discuss other components needed to support multimodal conversations, including a dialogue policy, search/recommendation components, and response generation methods.
In the final part of the tutorial, we will present case studies of these methods: Nova Wiki Wizard, a conversational search platform; iFetch, an online fashion shopping assistant; and TWIZ, the award-winning bot from the Alexa Prize TaskBot Challenge.
Part 1: Introduction (30 mins)
What is a Conversational Agent?
Task Assistants
Open-task Assistants
Dialogue Systems Concepts
Part 2: Multimodal Conversational Agents (50 mins)
Virtual Assistant Embodiment and Personality
Multimodal Conversations
Dialogue State Tracking (DST) and Dialogue Managers
Coffee Break (15 mins)
Part 3: Conversational Agent Components (50 mins)
Dialogue Policy
Answering User Needs (Search and Recommendation)
Response Computation
Part 4: Case Studies (30 mins)
Case Study - iFetch: Online Fashion Shopping Assistant
Case Study - TWIZ: The Multimodal Task-Assistant
Associate Professor at the Department of Computer Science, Universidade Nova de Lisboa (FCT NOVA).
He holds a Ph.D. degree (2004-2008) in Computer Science from Imperial College London, UK.
He is regularly involved in international program committees and research projects.
His research interests cover different problems in Vision and Language Mining and Search, such as multimedia retrieval, social media information analysis, and machine learning.
Researcher at NOVA LINCS currently pursuing a Ph.D. degree in the area of multimodal conversational systems.
He holds an M.Sc. degree (2015-2020) in Computer Science from NOVA University.
He has experience in conversational search and task-guiding agents and was the team leader of TWIZ, the award-winning bot in the Alexa Prize TaskBot Challenge.
His interests include the development of conversational agents, NLP, and multimodal AI.
You can find the resources for this tutorial here.