OBJECTIVES
  1. Specify the user requirements and overall architecture of the linguistic resource modules in the SDS development platform.
  2. Integrate the ontology evolution and grammar induction modules into the platform.
  3. Prototype, clean up and package the linguistic resources for the specified domains and languages.
  4. Evaluate the platform, data and services.
Development of speech services in various domains/languages will be crucial for the evaluation of the resulting
platform and its ability to meet application development requirements.

DESCRIPTION OF WORK

  • Platform, Data and Evaluation Metrics Specification: First, the user requirements for speech service creation using the platform and associated linguistic data will be collected for two scenarios: porting to a new domain and porting an existing application to a new language. These requirements will drive the design of the platform and data architecture, as well as the design of the interfaces for enriching linguistic resources. The domains, languages, speech services and data formats that will be supported in PortDial will be fully defined in this task, as will the evaluation metrics for determining the quality of, and the effort associated with, creating ontologies and inducing grammars for SDS.
  • Integration of the Ontology Evolution Module: The best-performing ontology evolution algorithms will be selected and integrated into the speech services prototyping platform. An interface for post-editing ontologies will be designed specifically for SDS (a Protégé plug-in) and integrated into the platform. Edited ontologies will be transformed and represented internally in a flat database schema to ensure scalability and easy integration with VXML.
  • Integration of the Grammar Induction Modules: We will select the best-of-breed grammar induction algorithms for integration into the platform for the domain porting and language porting scenarios. The grammar enrichment interfaces will also be integrated into the platform, following the architecture specified above. Practical grammar design issues related to handling ill-formed spontaneous speech, e.g., false starts and hesitations, will be investigated here. Design choices will be made for resource-rich and resource-poor domains.
  • Data Cleanup and Prototype Services Creation: Using the platform, we will generate and post-edit (clean up) linguistic resources (domain ontologies, grammars) for the specified domains and languages. We will also fully prototype at least four speech services covering both resource-rich and resource-poor domains, as well as multiple languages. These services will be used for evaluation purposes.
  • Evaluation of Data, Platform and Services: We will evaluate the algorithms for ontology enrichment and grammar induction, as well as the data, platform and services created with these algorithms. The evaluation will focus on ease of development for the platform (for both experienced professionals and new entrants); precision/recall for data; and concept accuracy, recognition accuracy, domain coverage and cross-domain compatibility for services.
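
As an illustration of the precision/recall measure named for data evaluation, the following minimal sketch compares a set of automatically induced items (ontology concepts or grammar rules) against a manually post-edited reference set. The function name, the set-based exact-match comparison and the example concepts are assumptions made for illustration only; the actual metrics are to be defined in the specification task.

```python
def precision_recall(induced, gold):
    """Return (precision, recall) of an induced item set against a gold set.

    Items are compared by exact match; this is an illustrative assumption,
    not the project's defined metric.
    """
    induced, gold = set(induced), set(gold)
    correct = len(induced & gold)  # items found in both sets
    precision = correct / len(induced) if induced else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical example: concepts for a travel domain.
gold_concepts = {"city", "date", "airline", "fare"}   # post-edited reference
induced_concepts = {"city", "date", "hotel"}          # automatic output

precision, recall = precision_recall(induced_concepts, gold_concepts)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this hypothetical case two of the three induced concepts appear in the reference (precision 2/3) and two of the four reference concepts are recovered (recall 1/2).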