UK EPSRC Grant
Runtime: 01.06.2017 – 31.05.2020
An international collaboration among linguists and speech experts that studies child language development across nations and cultures, with the aim of better understanding how an infant’s environment affects subsequent language ability.
FP7 ERC Starting Grant (StG)
Runtime: 01.01.2014 – 31.12.2018
Automatic speech and speaker recognition has recently matured to the degree that it has entered the daily lives of thousands of Europe’s citizens, e.g., on their smartphones or in call services. In the coming years, speech processing technology will move to a new level of social awareness to make interaction more intuitive and speech retrieval more efficient, and to lend additional competence to computer-mediated communication and speech-analysis services in the commercial, health, security, and further sectors. To reach this goal, rich speaker traits and states such as age, height, personality, and physical and mental state, as carried by the tone of the voice and the spoken words, must be reliably identified by machines. In the iHEARu project, ground-breaking methodology, including novel techniques for multi-task and semi-supervised learning, will deliver for the first time intelligent, holistic, and evolving analysis under real-life conditions of universal speaker characteristics that have so far been considered only in isolation. Today’s sparseness of annotated realistic speech data will be overcome by large-scale speech and meta-data mining from public sources such as social media, crowd-sourcing for labelling and quality control, and shared semi-automatic annotation. All stages, from pre-processing and feature extraction to statistical modelling, will evolve in “life-long learning” according to new data by utilising feedback, deep, and evolutionary learning methods. Human-in-the-loop system validation and novel perception studies will analyse the self-organising systems and the relation of automatic signal processing to human interpretation in a previously unseen variety of speaker classification tasks. The project’s work plan gives the unique opportunity to transfer current world-leading expertise in this field into a new de-facto standard of speaker characterisation methods and open-source tools ready for tomorrow’s challenge of socially aware speech analysis.
DeepGLASS -- Deep Learning Speech Enhancement
Industry Cooperation with HUAWEI TECHNOLOGIES
Runtime: 12.11.2016 – 11.11.2018
The research target of this project is to develop state-of-the-art methods for speech enhancement based on deep learning. The aim is to overcome limitations in challenging scenarios posed by non-stationary noise and distant speech, with a potentially moving device and potentially limited power and memory on the device. The project will study how deep learning speech enhancement can be applied successfully to multi-channel input signals. A further important aspect is robustness and adaptation to unseen conditions, such as different noise types.
EU Horizon 2020 Research & Innovation Action (RIA)
Runtime: 01.01.2015 – 31.12.2017
The ARIA-VALUSPA project will create a ground-breaking new framework that will allow easy creation of Artificial Retrieval of Information Assistants (ARIAs) that are capable of holding multi-modal social interactions in challenging and unexpected situations. The system can generate search queries and return the requested information by interacting with humans through virtual characters. These virtual humans will be able to sustain an interaction with a user for some time, and react appropriately to the user’s verbal and non-verbal behaviour when presenting the requested information and refining search results. Using audio and video signals as input, both verbal and non-verbal components of human communication are captured. Together with a rich and realistic emotive personality model, a sophisticated dialogue management system decides how to respond to a user’s input, be it a spoken sentence, a head nod, or a smile. The ARIA uses special speech synthesisers to create emotionally coloured speech and a fully expressive 3D face to create the chosen response. Back-channelling, indicating that the ARIA understood what the user meant, or returning a smile are but a few of the many ways in which it can employ emotionally coloured social signals to improve communication. As part of the project, the consortium will develop two specific implementations of ARIAs for two different industrial applications. A ‘speaking book’ application will create an ARIA with a rich personality capturing the essence of a novel, whom users can ask novel-related questions. An ‘artificial travel agent’ web-based ARIA will be developed to help users find their perfect holiday – something that is difficult to do with existing web interfaces such as those created by Booking.com or TripAdvisor.