Spoken Language Understanding
Speech is the most natural, powerful and universal medium for human-machine communication. Our focus is on developing all the modules a spoken dialog system needs, including robust speech, speaker and language recognition, and natural speech synthesis.
Toward the goal of an end-to-end speech synthesis system, we propose using character-level recurrent neural networks (RNNs) to directly convert input character sequences into latent linguistic feature vectors.
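The idea of mapping characters to latent feature vectors can be illustrated with a minimal sketch, assuming a plain Elman RNN with one-hot character inputs and a tanh hidden state. The weights here are random and untrained, and the function names and hidden size are hypothetical; a real system would use trained (likely gated, e.g. LSTM) layers.

```python
import math
import random

def make_char_rnn(alphabet, hidden_size, seed=0):
    """Build a toy Elman RNN with random, untrained weights (hypothetical parameters)."""
    rng = random.Random(seed)
    vocab = {ch: i for i, ch in enumerate(alphabet)}
    # W_xh: hidden_size x vocab_size, W_hh: hidden_size x hidden_size
    w_xh = [[rng.gauss(0, 0.1) for _ in range(len(alphabet))] for _ in range(hidden_size)]
    w_hh = [[rng.gauss(0, 0.1) for _ in range(hidden_size)] for _ in range(hidden_size)]
    b_h = [0.0] * hidden_size
    return vocab, w_xh, w_hh, b_h

def encode(text, vocab, w_xh, w_hh, b_h):
    """Return one latent feature vector h_t per input character, where
    h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h) and x_t is a one-hot vector."""
    hidden_size = len(b_h)
    h = [0.0] * hidden_size
    features = []
    for ch in text:
        idx = vocab[ch]  # one-hot input: only column idx of W_xh contributes
        h = [
            math.tanh(w_xh[i][idx]
                      + sum(w_hh[i][j] * h[j] for j in range(hidden_size))
                      + b_h[i])
            for i in range(hidden_size)
        ]
        features.append(h)
    return features
```

Because each h_t depends on h_{t-1}, the vector for a character carries context from the characters before it; two inputs sharing a prefix produce identical vectors over that prefix.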
Second-language learning is increasingly important in today's "global village". We would therefore like to build a computer-assisted language learning system that can simultaneously detect pronunciation errors, deviations in speech prosody and dialogue act mistakes.
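One common way to locate pronunciation errors is to align the phone sequence a learner was expected to say with the phone sequence a recognizer actually heard. The sketch below assumes such a recognizer already exists and uses a standard Levenshtein alignment; the phone symbols and function names are hypothetical, and prosody and dialogue act checking are not covered.

```python
def align_phones(expected, recognized):
    """Levenshtein alignment; returns a list of (op, expected_phone, recognized_phone)."""
    n, m = len(expected), len(recognized)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if expected[i - 1] == recognized[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # match or substitution
                             cost[i - 1][j] + 1,        # deletion (phone skipped)
                             cost[i][j - 1] + 1)        # insertion (extra phone)
    # Backtrace from the bottom-right cell to recover the edit operations.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        same = i > 0 and j > 0 and expected[i - 1] == recognized[j - 1]
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (0 if same else 1):
            ops.append(("ok" if same else "substitution", expected[i - 1], recognized[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("deletion", expected[i - 1], None))
            i -= 1
        else:
            ops.append(("insertion", None, recognized[j - 1]))
            j -= 1
    ops.reverse()
    return ops

def pronunciation_errors(expected, recognized):
    """Keep only the mismatches, i.e. candidate pronunciation errors."""
    return [op for op in align_phones(expected, recognized) if op[0] != "ok"]
```

For example, a learner saying "sink" for "think" would surface as a single substitution of /th/ by /s/.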
Multimedia Signal Processing
We are participating in The Nature Conservancy Fisheries Monitoring competition. Ref: Deep Learning Goes To The Deep Seas And The Billion-Dollar Tuna Industry.
Sound event detection is essential for advanced smart-home applications. We would like to build a system that integrates several Kinect One sensors for elderly care, baby monitoring and, especially, home security.
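As a point of reference only, the simplest sound event detector flags frames whose short-term energy exceeds a fixed level and merges consecutive active frames into events. This is a generic energy-threshold baseline, not the Kinect-based system described above; the frame length and the dB threshold (relative to full scale, assuming samples in [-1, 1]) are hypothetical parameters.

```python
import math

def detect_events(samples, frame_len, threshold_db):
    """Return (start_frame, end_frame) pairs, inclusive, for runs of frames whose
    mean power in dBFS exceeds threshold_db. A trailing partial frame is ignored."""
    events = []
    start = None
    n_frames = len(samples) // frame_len
    for f in range(n_frames):
        frame = samples[f * frame_len:(f + 1) * frame_len]
        power = sum(x * x for x in frame) / frame_len
        level_db = 10 * math.log10(power + 1e-12)  # small floor avoids log10(0)
        active = level_db > threshold_db
        if active and start is None:
            start = f                      # an event begins
        elif not active and start is not None:
            events.append((start, f - 1))  # the event ended on the previous frame
            start = None
    if start is not None:                  # event still active at the end of the signal
        events.append((start, n_frames - 1))
    return events
```

A practical detector would add a classifier on top of these segments to tell, for instance, a crying baby from a breaking window.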
Microphone arrays are key to the success of mobile phone, Smart-TV and Smart-Home applications. In particular, a good microphone array should not only remove background noise but also let the speaker move freely to any position.
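A classic way a microphone array suppresses noise is delay-and-sum beamforming: each channel is shifted so the target's wavefront lines up across microphones, then the channels are averaged, which keeps the target intact while reducing uncorrelated noise power by roughly a factor of the number of microphones. The minimal sketch below assumes a uniform linear array, a far-field source and whole-sample delays; it is not necessarily the array processing used in our system. Following a moving speaker would additionally require re-estimating the steering angle, which is not shown.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 degrees C

def steering_delays(num_mics, spacing_m, angle_deg, fs):
    """Per-microphone arrival delays, in whole samples, for a uniform linear array
    and a far-field source at angle_deg from broadside, shifted so the smallest is 0.
    Convention (assumed): a positive angle means the wavefront reaches mic 0 first."""
    tau = spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND
    raw = [round(m * tau * fs) for m in range(num_mics)]
    lowest = min(raw)
    return [d - lowest for d in raw]

def delay_and_sum(channels, delays):
    """Advance channel m by delays[m] samples so the target lines up, then average.
    The output is shorter than the input by max(delays) samples."""
    n = len(channels[0]) - max(delays)
    return [sum(ch[t + d] for ch, d in zip(channels, delays)) / len(channels)
            for t in range(n)]
```

For example, `steering_delays(4, 0.05, 90.0, 16000)` gives delays of 0, 2, 5 and 7 samples for a 4-microphone array with 5 cm spacing sampled at 16 kHz.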
Our deep LSTM-based ASR system ranked second in overall performance and first in English performance in the extended submission of the OC16 Chinese-English MixASR Challenge.
Our gated DNN system for the NIST 2015 language recognition i-vector machine learning challenge was designed to solve the language clustering and out-of-set detection issues simultaneously. It achieves a relative performance gain of up to 51% over the baseline cosine distance scoring (CDS) system provided by the organizers.
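For context, a cosine distance scoring baseline of the kind mentioned above can be sketched as follows: each language is modelled by the mean of its training i-vectors, and a test i-vector is scored by its cosine similarity to each language model. The out-of-set rule (reject when even the best score falls below a threshold) and all names and values here are hypothetical illustrations, not the challenge's exact baseline procedure, and this is not our gated DNN.

```python
import math

def cosine_score(a, b):
    """Cosine similarity between two i-vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def train_language_models(train):
    """train: dict mapping language -> list of i-vectors; model = mean i-vector."""
    return {lang: [sum(col) / len(vecs) for col in zip(*vecs)]
            for lang, vecs in train.items()}

def classify(ivector, models, oos_threshold):
    """Return the best-scoring language, or 'out-of-set' if the best cosine score
    is below the (hypothetical) threshold, together with all scores."""
    scores = {lang: cosine_score(ivector, m) for lang, m in models.items()}
    best = max(scores, key=scores.get)
    return ("out-of-set" if scores[best] < oos_threshold else best), scores
```

Real i-vectors typically have a few hundred dimensions and are usually length-normalized or whitened first; the two-dimensional vectors one might use to try this out are only for illustration.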
Example of FA-DNN outputs for speaker recognition: (a) original speaker i-vectors, (b) purified speaker i-vectors.
Although deep neural networks (DNNs) are very powerful, they can still be easily affected by noise. We have developed a new factor analysis DNN (FA-DNN) structure and training algorithm that can successfully separate wanted signals from noise.
Sound is the most natural, powerful and universal medium for wireless communication. Therefore, we would like to build an acoustic communication/networking system that transmits messages directly through the air.
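One simple way to send bits through the air as sound is binary frequency-shift keying (FSK): each bit becomes a short tone at one of two frequencies, and the receiver decides each bit by measuring the power at both frequencies. The sketch below assumes this scheme; the sample rate, symbol length and the near-ultrasonic tone frequencies are hypothetical choices, and synchronization, error correction and real microphone/speaker effects are left out.

```python
import math

FS = 44100                  # sample rate in Hz (assumed)
SYMBOL_LEN = 441            # samples per bit, i.e. 10 ms (assumed)
F0, F1 = 18000.0, 19000.0   # hypothetical tones for bit 0 and bit 1

def modulate(bits):
    """Binary FSK: emit a SYMBOL_LEN-sample sine burst at F0 or F1 for each bit."""
    samples = []
    for bit in bits:
        freq = F1 if bit else F0
        samples.extend(math.sin(2 * math.pi * freq * n / FS) for n in range(SYMBOL_LEN))
    return samples

def goertzel_power(block, freq, fs):
    """Power of a block at a single frequency, via the Goertzel recursion."""
    coeff = 2 * math.cos(2 * math.pi * freq / fs)
    s_prev = s_prev2 = 0.0
    for x in block:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def demodulate(samples):
    """Decide each bit by comparing the power at F1 against the power at F0."""
    bits = []
    for start in range(0, len(samples) - SYMBOL_LEN + 1, SYMBOL_LEN):
        block = samples[start:start + SYMBOL_LEN]
        bits.append(1 if goertzel_power(block, F1, FS) > goertzel_power(block, F0, FS) else 0)
    return bits
```

With these values each tone completes a whole number of cycles per symbol (180 and 190), so the two tones do not leak into each other's Goertzel bin; the Goertzel filter is used instead of a full FFT because only two frequencies need to be measured.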
Indoor navigation and its applications have recently become a hot research topic. We want to combine Air-Beacon and Internet information retrieval to build an indoor navigation system for location-based services (LBS).
Sound is a universal medium for broadcasting or exchanging messages between different platforms, for example, between iOS/Android smart TVs, tablets and mobile phones.