Dr. Yong XU

Bio:

I am a Senior Research Scientist at Tencent America LLC, Seattle, USA. Previously, I was a Research Fellow at the University of Surrey, UK, and I worked at iFLYTEK (科大讯飞) from April 2015 to April 2016. I received my PhD from the University of Science and Technology of China (USTC) in 2015, supervised by Prof. Chin-Hui Lee (Georgia Tech, USA), Prof. Li-Rong Dai (USTC), and Prof. Jun Du (USTC). I was a visiting student at the Georgia Institute of Technology, USA, from September 2014 to May 2015. I won first prize in the DCASE 2017 challenge on "Large-scale weakly supervised sound event detection for smart cars", have two ESI highly cited IEEE journal papers, and received the 2018 IEEE SPS Best Paper Award. I serve as a reviewer for several conferences and journals.

I serve as a reviewer for ICASSP, IJCNN, EUSIPCO, DSP, the Audio Engineering Society conference, ICONIP, IEEE/ACM Transactions on Audio, Speech, and Language Processing, IEEE Signal Processing Letters, Neurocomputing, IEEE Journal of Selected Topics in Signal Processing, Speech Communication, IEEE Access, IEEE Transactions on Emerging Topics in Computational Intelligence, IEEE Transactions on Signal Processing, IEEE Transactions on Multimedia, etc.

My Google Scholar: https://scholar.google.co.uk/citations?user=nCmKPM4AAAAJ&hl=en (1500+ citations)

My Github: github.com/yongxuUSTC (500+ stars)

My ResearchGate: https://www.researchgate.net/profile/Yong_Xu63

Email: yong.xu.ustc@gmail.com

News:

  1. I received the 2018 IEEE SPS Best Paper Award for the paper "A Regression Approach to Speech Enhancement Based on Deep Neural Networks", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 1, pp. 7–19, 2015
  2. Our paper "Single-Channel Signal Separation and Deconvolution with Generative Adversarial Networks" was accepted by IJCAI 2019 (acceptance rate: 18%)
  3. Three papers were accepted at ICASSP 2019 (Brighton, UK)!
  4. We won first place in the DCASE 2017 "large-scale weakly supervised sound event detection for smart cars" challenge. The detailed results and rankings can be found here: http://www.cs.tut.fi/sgn/arg/dcase2017/challenge/task-large-scale-sound-event-detection-results [media] [slides] [PDF]

Publications:

Google Scholar: https://scholar.google.co.uk/citations?user=nCmKPM4AAAAJ&hl=en (total citations: 1500+, h-index: 16, i10-index: 20)

Patent:

[1] Speech separation method and system, US Patent US 20160189730A1

Jun Du, Yong Xu, Yanhui Tu, Li-Rong Dai, Zhiguo Wang, Yu Hu, Qingfeng Liu, June 2016

Journal papers:

[2] Unsupervised Feature Learning Based on Deep Models for Environmental Audio Tagging

Yong Xu, Qiang Huang, Wenwu Wang, Peter Foster, Siddharth Sigtia, Philip J. B. Jackson, Mark D. Plumbley, IEEE/ACM Transactions on Audio, Speech, and Language Processing, July 2016 [currently a Top 3 Popular Article]

[3] A Regression Approach to Speech Enhancement Based on Deep Neural Networks. [2018 IEEE SPS Best Paper Award] [citations: 320, currently a Top 5 Popular Article] [ESI highly cited paper]

Yong Xu, Jun Du, Li-Rong Dai and Chin-Hui Lee, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 1, pp. 7–19, 2015

[4] Sound Event Detection and Time-Frequency Segmentation from Weakly Labelled Data

Qiuqiang Kong*, Yong Xu* (equal contribution), Iwona Sobieraj, Wenwu Wang, Mark D. Plumbley, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019

[5] An Experimental Study on Speech Enhancement Based on Deep Neural Networks. [citations: 340] [ESI highly cited paper]

Yong Xu, Jun Du, Li-Rong Dai and Chin-Hui Lee, IEEE Signal Processing Letters, vol. 21, no. 1, pp. 65–68, January 2014

[6] Hierarchical deep neural network for multivariate regression

Jun Du and Yong Xu, Pattern Recognition, vol. 63, pp. 149–157, March 2017

[7] Joint training of DNNs by incorporating an explicit dereverberation structure for distant speech recognition

Tian Gao, Jun Du, Yong Xu, Cong Liu, Li-Rong Dai, Chin-Hui Lee, EURASIP Journal on Advances in Signal Processing, 2016

[8] Auxiliary Features from Laser-Doppler Vibrometer Sensor for Deep Neural Network Based Robust Speech Recognition

Lei Sun, Jun Du, Zhipeng Xie, Yong Xu, Journal of Signal Processing Systems, Springer, 2017

Conference papers:

[33] Single-Channel Signal Separation and Deconvolution with Generative Adversarial Networks,

Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark Plumbley, Philip Jackson, accepted to IJCAI2019 (accept rate=18%)

[32] Joint training of complex ratio mask based beamformer and acoustic model for noise robust ASR,

Yong Xu, Chao Weng, Like Hui, Jianming Liu, Meng Yu, Dan Su, Dong Yu, accepted to ICASSP2019

[31] Acoustic scene generation with conditional sampleRNN,

Qiuqiang Kong, Yong Xu, Turab Iqbal, Yin Cao, Wenwu Wang, Mark Plumbley, accepted to ICASSP2019

[30] An attention-based neural network approach for single channel speech enhancement,

Xiang Hao, Changhao Shan, Yong Xu, Sining Sun, Lei Xie, accepted to ICASSP2019

[29] Large-scale weakly supervised audio classification using gated convolutional neural network, [pdf] [Rank 1st system in DCASE2017 challenge]

Yong Xu, Qiuqiang Kong, Wenwu Wang and Mark D. Plumbley, accepted to ICASSP2018

[28] A joint separation-classification model for sound event detection of weakly labelled data

Qiuqiang Kong*, Yong Xu* (equal contribution), Wenwu Wang and Mark D. Plumbley, accepted to ICASSP2018

[27] Audio Set classification with attention model: A probabilistic perspective

Qiuqiang Kong*, Yong Xu* (equal contribution), Wenwu Wang and Mark D. Plumbley, accepted to ICASSP2018

[26] Iterative deep neural networks for speaker-independent binaural blind speech separation

Qingju Liu, Yong Xu, Philip Coleman, Philip Jackson, Wenwu Wang, accepted to ICASSP2018

[25] Intelligent signal processing mechanisms for nuanced anomaly detection in action audio-visual data streams

Josef Kittler, Ioannis Kaloskampis, Cemre Zor, Yong Xu, Yulia Hicks and Wenwu Wang, accepted to ICASSP2018

[24] Attention and Localization based on a Deep Convolutional Recurrent Model for Weakly Supervised Audio Tagging,

Yong Xu, Qiuqiang Kong, Qiang Huang, Wenwu Wang and Mark D. Plumbley, accepted to Interspeech2017

[23] Joint Detection and Classification Convolutional Neural Network (JDC-CNN) on Weakly Labelled Bird Audio Data (BAD)

Qiuqiang Kong, Yong Xu, Mark D. Plumbley, accepted to EUSIPCO2017

[22] Convolutional Gated Recurrent Neural Network Incorporating Spatial Features for Audio Tagging

Yong Xu, Qiuqiang Kong, Qiang Huang, Wenwu Wang and Mark D. Plumbley, IJCNN2017

[21] A joint detection-classification model for audio tagging of weakly labelled data

Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark Plumbley, ICASSP2017

[20] Fast Tagging of Natural Sounds Using Marginal Co-regularization

Qiang Huang, Yong Xu, Philip J. B. Jackson, Wenwu Wang, Mark D. Plumbley, ICASSP2017

[19] Binaural and Log-Power Spectra Features with Deep Neural Networks for Speech-Noise Separation

Alfredo Zermini, Qingju Liu, Yong Xu, Mark D. Plumbley, Dave Betts, Wenwu Wang, MMSP2017

[18] Deep neural network based audio source separation

A. Zermini, Y. Yu, Yong Xu, W. Wang and M. D. Plumbley, 11th IMA International Conference on Mathematics in Signal Processing, 2016

[17] Fully DNN-based Multi-label regression for audio tagging.

Yong Xu, Qiang Huang, Wenwu Wang, Philip J B Jackson, Mark D Plumbley, accepted by DCASE2016 workshop, July 2016

[16] Hierarchical learning for DNN-based acoustic scene classification

Yong Xu, Qiang Huang, Wenwu Wang, Mark D. Plumbley, accepted by DCASE2016 workshop, July 2016

[15] Deep Neural Network for Robust Speech Recognition with Auxiliary Features from Laser-Doppler Vibrometer Sensor.

Zhi-Peng Xie, Jun Du, Ian Vince McLoughlin, Yong Xu, Feng Ma, Haikun Wang, ISCSLP2016

[14] Multi-objective learning and Mask-based Post-processing for Deep Neural Network based Speech Enhancement.

Yong Xu, Jun Du, Zhen Huang, Li-Rong Dai, Chin-Hui Lee, accepted, Interspeech2015, Dresden, Germany

[13] DNN-Based Speech Bandwidth Expansion and Its Application to Adding High Frequency Missing Features for Automatic Speech Recognition of Narrowband Speech.

Kehuang Li, Zhen Huang, Yong Xu and Chin-Hui Lee, accepted, Interspeech2015, Dresden, Germany

[12] Dynamic Noise Aware Training for Speech Enhancement Based on Deep Neural Networks.

Yong Xu, Jun Du, Li-Rong Dai and Chin-Hui Lee, Interspeech2014, Singapore

[11] Improving Deep Neural Network Based Speech Enhancement in Low SNR Environments. (Best paper candidate)

Tian Gao, Jun Du, Yong Xu, Cong Liu, Li-Rong Dai, Chin-Hui Lee, accepted, LVA/ICA 2015, Liberec, Czech Republic

[10] Robust Speech Recognition with Speech Enhanced Deep Neural Networks

Jun Du, Qing Wang, Tian Gao, Yong Xu, Li-Rong Dai and Chin-Hui Lee, Interspeech2014, Singapore

[9] Cross-language Transfer Learning for Deep Neural Network Based Speech Enhancement

Yong Xu, Jun Du, Li-Rong Dai and Chin-Hui Lee, ISCSLP2014, Singapore

[8] Speech Separation Based on Improved Deep Neural Networks with Dual Outputs of Speech Features for both Target and Interfering Speakers

Yanhui Tu, Jun Du, Yong Xu, Lirong Dai and Chin-Hui Lee, ISCSLP2014, Singapore

[7] Speech separation of a target speaker based on deep neural networks.

Jun Du, Yanhui Tu, Yong Xu, Li-Rong Dai and Chin-Hui Lee, pp. 532–536, ICSP2014, Hangzhou, China

[6] Deep neural network based speech separation for robust speech recognition.

Yanhui Tu, Jun Du, Yong Xu, Lirong Dai and Chin-Hui Lee, pp. 532–536, ICSP2014, Hangzhou, China

[5] Global Variance Equalization for Improving Deep Neural Network Based Speech Enhancement.

Yong Xu, Jun Du, Li-Rong Dai and Chin-Hui Lee, ChinaSIP2014, Xi’an, China

[4] Spoken Term Detection for OOV Terms Based on Phone Fragment.

Yong Xu, Wu Guo, Shan Su and Li-Rong Dai, ICALIP2012, Shanghai, China

[3] Improved Spoken Term Detection by Template-based Confidence Measure.

Shan Su, Wu Guo, Yong Xu and Li-Rong Dai, ICALIP2012, Shanghai, China

[2] A hybrid fragment / syllable-based system for improved OOV term detection.

Yong Xu, Wu Guo and Li-Rong Dai, ISCSLP2012, Hong Kong

[1] Spoken term detection for OOV terms based on tri-phone confusion matrix.

Yong Xu, Wu Guo and Li-Rong Dai, ISCSLP2012, Hong Kong


Research Experience:

University of Surrey, Guildford, UK, Full-time Research Fellow, Apr. 2016 – present

Deep learning (DNN/CNN/LSTM, etc.) based environmental sound classification and analysis.

IFLYTEK Co., Ltd., Hefei, China, Full-time Researcher, Jun. 2015 – Apr. 2016

Deep learning (DNN/CNN/RNN, etc.) based speech recognition, speech enhancement, and speech dereverberation, with a focus on distant speech recognition.

Georgia Institute of Technology, USA, Visiting Student, Sep. 2014 – Apr. 2015

Deep neural network based speech enhancement and its application to automatic speech recognition (ASR), advised by Prof. Chin-Hui Lee.

Bosch Research Center, CA, USA, Short Internship, Oct. 2014 – Nov. 2014

Deep neural network based speech enhancement and its application to automatic speech recognition (ASR).

National Engineering Laboratory of Speech and Language Information Processing, USTC, China, Jul. 2012 – Jun. 2015

DNN based speech enhancement, in cooperation with Prof. Chin-Hui Lee (Georgia Tech).

IFLYTEK Co., Ltd., Hefei, China, Internship, Jul. 2010 – Nov. 2010

I developed a Large Vocabulary Continuous Speech Recognition (LVCSR) system trained on a 2300-hour English speech database and built a baseline for OOV term detection; MLE, DT, and Tandem systems were built.

IFly Speech Lab, USTC, Hefei, China, Graduate Student, Sept. 2010 – Jul. 2012

I worked on Spoken Term Detection (STD) for Out-Of-Vocabulary (OOV) words, using a tri-phone confusion matrix and a hybrid fragment/syllable system to improve OOV term detection performance.

IFly Speech Lab, USTC, Hefei, China, Undergraduate Student, Mar. 2010 – Jul. 2010

My undergraduate thesis project was on room acoustic impulse responses.