We will develop and transfer technologies for porting SDS across languages using top-down and bottom-up methods for speech resource creation. Additionally, we will investigate machine-aided translation, with crowd-sourced post-editing, of the resources (annotated corpora) needed to train language-understanding models and build grammars using bottom-up methods. One of the key challenges of this work package is the fusion of top-down (ontology-based) and bottom-up (corpus-based) approaches to language porting for both resource-rich and resource-poor domains. These algorithms will be integrated into an interface for porting dialogue system resources across languages.

  • Porting using Multilingual Lexicalized Ontologies for Resource-rich Domains: We focus on cross-language porting of domain ontologies by localizing their lexica. Two approaches will be investigated: 1) the lexical layer of the ontologies will be populated from the available target-language in-domain data, and 2) machine translation-based techniques will be used to translate the lexical entries of the ontologies. The ported ontology lexicon will then be exploited to generate speech grammars using the knowledge-based grammar creation approach.
  • Porting using Corpora and Machine Translation for Resource-poor Domains: The goal of this task is to combine corpus-based grammar induction and machine translation for porting grammars across languages. Two approaches will be investigated: 1) machine translation of mildly lexicalized ontologies followed by the corpus-based grammar induction algorithm, and 2) machine-aided translation of the linguistic resources (ontologies, grammars). Crowd-sourced post-editing will also be investigated as a means of improving the resources created above.
  • Fusion of Ontology and Corpus-based Approaches to Language Porting: We will investigate the fusion of the knowledge-based and corpus-based approaches described above. Performance will be evaluated for various application domains (resource-rich and resource-poor) and languages in order to select the optimal algorithmic combination for language porting.
  • Interface for Porting Across Languages: We will build an interface for porting dialogue systems across languages. Similar to the grammar domain-porting module, the language-porting interface will allow for grammar fragment selection and post-editing. The interface emphasizes iterative grammar development, where an annotator selects or edits grammar fragments and the automated system then suggests new fragments based on the annotator's input.
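
As an illustration of the second ontology-porting approach (translating lexical entries and then generating grammar rules from the ported lexicon), the following is a minimal sketch only, not the project's implementation. The lexicon contents, the toy English-to-Spanish phrase table standing in for a machine translation system, and the function names `toy_translate`, `port_lexicon` and `lexicon_to_grammar` are all assumptions made for this example. The output uses a simplified JSGF-style rule notation.

```python
from typing import Callable, Dict, List

# Hypothetical source-language ontology lexicon: concept -> surface forms.
SOURCE_LEXICON: Dict[str, List[str]] = {
    "city": ["athens", "new york"],
    "depart": ["leave from", "depart from"],
}

# Toy phrase table standing in for a real MT system (assumption).
TOY_PHRASE_TABLE: Dict[str, str] = {
    "athens": "atenas",
    "new york": "nueva york",
    "leave from": "salir de",
    "depart from": "salir de",
}


def toy_translate(phrase: str) -> str:
    """Look the phrase up in the toy table; fall back to the source phrase."""
    return TOY_PHRASE_TABLE.get(phrase, phrase)


def port_lexicon(
    lexicon: Dict[str, List[str]], translate: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Translate every lexical entry, dropping duplicates but keeping order."""
    ported: Dict[str, List[str]] = {}
    for concept, forms in lexicon.items():
        targets: List[str] = []
        for form in forms:
            target = translate(form)
            if target not in targets:
                targets.append(target)
        ported[concept] = targets
    return ported


def lexicon_to_grammar(lexicon: Dict[str, List[str]]) -> List[str]:
    """Emit one JSGF-style rule per concept: <concept> = form1 | form2;"""
    return [
        f"<{concept}> = {' | '.join(forms)};"
        for concept, forms in lexicon.items()
    ]


ported = port_lexicon(SOURCE_LEXICON, toy_translate)
grammar = lexicon_to_grammar(ported)
print("\n".join(grammar))
```

In the sketch, two source entries that translate to the same target phrase collapse into a single alternative. An annotator or crowd-sourced post-editor would be expected to review such merged or fallback entries before the grammar is used.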