
Multimodal Analytics for Interactive Voice Platforms


Analyze call logs and transcripts of spoken dialogue systems (SDS) deployed by the SME partners. The aim is to use machine learning to detect places where the call-flow, prompts and grammars should be improved, based on a combination of features, including acoustic features that indicate the affective state of the user.
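As a toy illustration of the acoustic side of this analysis, the sketch below computes a crude per-frame "arousal" score from a waveform. The function names, the frame sizes, and the energy/zero-crossing heuristic are illustrative assumptions only; the project's actual affective features would come from trained models over richer prosodic and spectral cues, as described in the tasks below.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=200):
    """Slice a waveform into overlapping frames (25 ms / 12.5 ms hop at 16 kHz)."""
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def arousal_proxy(x):
    """Crude per-frame arousal score: loud, rapidly varying speech scores higher.

    This heuristic combines frame energy (RMS) with zero-crossing rate and is
    only a stand-in for properly trained affective models.
    """
    frames = frame_signal(x)
    rms = np.sqrt((frames ** 2).mean(axis=1))                   # frame energy
    zcr = (np.diff(np.sign(frames), axis=1) != 0).mean(axis=1)  # zero-crossing rate
    return rms + 0.5 * zcr
```

Given two synthetic segments, a louder, faster-varying one receives a higher mean score than a quiet one, which is the qualitative behaviour an arousal feature should exhibit.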

Description of Work

  • Task 2.1 Affective Analysis of Dialogues [TSI-TUC, INESC ID, VoiceWeb] In this task we will identify useful features for continuous-time affective tagging of SDS user utterances. These will include dimensions such as valence (positive/negative), arousal (high/low), and certainty (certain/uncertain). Affective analysis of SDS transcripts (from text) and pragmatic modelling will also be used to improve performance. In this task, we will also automatically label the age and gender of the user.
  • Task 2.2 Call-flow, discourse and cross-modal analytics [KTH, TSI-TUC, VoiceWeb] This task will be responsible for manually tagging the users’ intended speech acts (on the transcribed user utterances) and for identifying problematic and successful parts of the dialogues. To do this, we will investigate the use of crowd-sourcing services such as Amazon's Mechanical Turk. Speaker turns will be diarized and talk-over (overlapping speech) analysis will be performed. We will explore different machine learning algorithms to automatically identify patterns that distinguish unsuccessful from successful interactions and to pinpoint problematic parts of the interaction (dialogue hot-spot analysis). These algorithms will use the acoustic features identified in Task 2.1, as well as other acoustic features (speaking rate, prosody, voice quality), KPI-related metrics found in the call logs (number of turns, total duration, number of information slots, number of attempts to fill a slot, ASR errors, ASR confidence, task success, and the occurrence of specific words) and the transcribed text.
  • Task 2.3 Multilingual analytics [INESC ID, TSI-TUC, KTH] We will investigate how the above techniques (affective analysis, call-flow analysis, hot-spot detection) can be applied across multiple languages (i.e., their universality) and adapt them for the set of languages that will be available in the SpeDial platform.
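The hot-spot analysis of Task 2.2 can be sketched as a binary classification over per-dialogue KPI features. The example below trains a logistic-regression classifier implemented from scratch with NumPy; the three features (number of turns, slot-filling retries, mean ASR confidence) are taken from the KPI list above, but the toy data and all names are invented for illustration.

```python
import numpy as np

# Toy per-dialogue KPI features: [number of turns, slot-filling retries, mean ASR confidence].
# Labels: 1 = problematic dialogue ("hot spot"), 0 = successful dialogue.
X = np.array([
    [20, 5, 0.40],
    [18, 4, 0.50],
    [22, 6, 0.30],
    [ 6, 0, 0.90],
    [ 5, 1, 0.85],
    [ 7, 0, 0.80],
], dtype=float)
y = np.array([1, 1, 1, 0, 0, 0], dtype=float)

# Standardise features so plain gradient descent converges quickly.
mu, sigma = X.mean(axis=0), X.std(axis=0)
Xz = (X - mu) / sigma

def train_logreg(X, y, lr=0.5, epochs=500):
    """Batch-gradient-descent logistic regression."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(problematic)
        grad = p - y                            # gradient of the log loss
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def predict(w, b, X):
    return (1.0 / (1.0 + np.exp(-(X @ w + b))) >= 0.5).astype(int)

w, b = train_logreg(Xz, y)
```

In the project, a classifier of this shape would consume the affective features from Task 2.1 alongside the call-log KPIs; the toy data above merely shows the interface between the two tasks.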

Deliverables

  • D2.1 Interim Report on IVR Analytics (M12)
  • D2.2 IVR Analytics Demonstrator (M12)
  • D2.3 Final Report on IVR Analytics (M24)