Publications

Journal articles

S. Feng, B. M. Halpern, O. Kudina, O. Scharenborg (2023). Towards inclusive automatic speech recognition. To appear in Computer Speech & Language.
Bence Mark Halpern, Siyuan Feng, Rob van Son, Michel van den Brekel, Odette Scharenborg (2022). Automatic evaluation of spontaneous oral cancer speech with crowdsourcing data. In Speech Communication.
Bence Mark Halpern*, Siyuan Feng*, Rob van Son, Michel van den Brekel, Odette Scharenborg (2022). Low-Resource Automatic Speech Recognition and Error Analyses of Oral Cancer Speech. In Speech Communication. (*Equal contribution)
P. Żelasko, S. Feng, L. Moro-Velázquez, A. Abavisani, S. Bhati, O. Scharenborg, M. Hasegawa-Johnson, N. Dehak (2021). Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition. In Computer Speech & Language.
Siyuan Feng, Odette Scharenborg (2021). The effectiveness of unsupervised subword modeling with autoregressive and cross-lingual phone-aware networks. In IEEE Open Journal of Signal Processing, doi: 10.1109/OJSP.2021.3076914.
Siyuan Feng, Tan Lee (2019). Exploiting Cross-Lingual Speaker and Phonetic Diversity for Unsupervised Subword Modeling. In IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, no. 12, pp. 2000-2011, Dec. 2019, doi: 10.1109/TASLP.2019.2937953.

Conference proceedings

S. Feng, M. Tu, R. Xia, C. Huang, Y. Wang (2023). Language-universal phonetic encoder for low-resource speech recognition. In INTERSPEECH 2023.
S. Feng, M. Tu, R. Xia, C. Huang, Y. Wang (2023). Language-universal phonetic representation in multilingual speech pretraining for low-resource speech recognition. In INTERSPEECH 2023.
Liming Wang, Siyuan Feng, Mark A. Hasegawa-Johnson, Chang D. Yoo (2022). Self-supervised Semantic-driven Phoneme Discovery for Zero-resource Speech Recognition. In Proc. ACL 2022.
L. Prananta, B. M. Halpern, S. Feng, O. Scharenborg (2022). The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for Improved Dysarthric Speech Recognition. In INTERSPEECH 2022.
Siyuan Feng, Odette Scharenborg (2021). Effectiveness of self-supervised representation learning in zero-resource subword modeling. In IEEE Asilomar Conference on Signals, Systems, and Computers (ACSSC). [Slides][Poster][Video (15 min)]
Siyuan Feng, Piotr Żelasko, Laureano Moro-Velázquez, Odette Scharenborg (2021). Unsupervised acoustic unit discovery via learning a subword discriminative feature representation. In INTERSPEECH 2021. [Poster][Video (3 min)][Slides (2-page)]
Siyuan Feng*, Piotr Żelasko*, Laureano Moro-Velázquez, Ali Abavisani, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak (2021). How Phonotactics Affect Multilingual and Zero-shot ASR Performance. In ICASSP 2021. [Preprint][Slides][Poster]
Xinsheng Wang, Siyuan Feng, Jihua Zhu, Mark Hasegawa-Johnson, Odette Scharenborg (2021). Show and Speak: Directly Synthesize Spoken Description of Images. In ICASSP 2021. [Preprint]
Siyuan Feng, Odette Scharenborg (2020). Unsupervised Subword Modeling Using Autoregressive Pretraining and Cross-Lingual Phone-Aware Modeling. In INTERSPEECH 2020. [Video (90 sec)][Video (15 min)][Slides]
Zhiyuan Peng, Siyuan Feng, Tan Lee (2020). Mixture factorized auto-encoder for unsupervised hierarchical deep factorization of speech signal. In ICASSP 2020. [Preprint]
Siyuan Feng, Tan Lee (2019). Improving Unsupervised Subword Modeling via Disentangled Speech Representation Learning and Transformation. In Proc. INTERSPEECH 2019. [Slides]
Siyuan Feng, Tan Lee, Zhiyuan Peng (2019). Combining Adversarial Training and Disentangled Speech Representation for Robust Zero-Resource Subword Modeling. In Proc. INTERSPEECH 2019. [Slides]
Zhiyuan Peng*, Siyuan Feng*, Tan Lee (2019). Adversarial Multi-Task Deep Features and Unsupervised Back-End Adaptation for Language Recognition. In Proc. ICASSP 2019. (*Equal contribution)
Siyuan Feng, Tan Lee (2018). Exploiting speaker and phonetic diversity of mismatched language resources for unsupervised subword modeling. In Proc. INTERSPEECH 2018. [Slides]
Siyuan Feng, Tan Lee (2018). Improving cross-lingual knowledge transferability using multilingual TDNN-BLSTM with language-dependent pre-final layer. In Proc. INTERSPEECH 2018. [Poster]
Ying Qin, Tan Lee, Siyuan Feng, Anthony Pak Hin Kong (2018). Automatic speech assessment for people with aphasia using TDNN-BLSTM with multi-task learning. In Proc. INTERSPEECH 2018.
Man-Ling Sung, Siyuan Feng, Tan Lee (2018). Unsupervised pattern discovery from thematic speech archives based on multilingual bottleneck features. In Proc. APSIPA-ASC 2018. [Preprint]
Yuanyuan Liu, Ying Qin, Siyuan Feng, Tan Lee, P. C. Ching (2018). Disordered speech assessment using Kullback-Leibler divergence features with multi-task acoustic modeling. In Proc. ISCSLP 2018.
Siyuan Feng, Tan Lee (2017). On the linguistic relevance of speech units learned by unsupervised acoustic modeling. In Proc. INTERSPEECH 2017. [Slides]
Siyuan Feng, Tan Lee, Haipeng Wang (2016). Exploiting language-mismatched phoneme recognizers for unsupervised acoustic modeling. In Proc. ISCSLP 2016.

Thesis

Siyuan Feng (2020). Exploiting Cross-Lingual Knowledge in Unsupervised Acoustic Modeling for Low-Resource Languages. Ph.D. Thesis. The Chinese University of Hong Kong, Electronic Engineering, May 2020.

Preprints and technical reports

Q. Dong, Z. Huang, Q. Tian, C. Xu, T. Ko, Y. Zhao, S. Feng et al. PolyVoice: Language Models for Speech to Speech Translation. Under review.
M. W. Y. Lam, Q. Tian, T. Li, Z. Yin, S. Feng, M. Tu, Y. Ji, R. Xia, M. Ma, X. Song, J. Chen, Y. Wang, Y. Wang. Efficient Neural Music Generation.
Siyuan Feng, Olya Kudina, Bence Mark Halpern, Odette Scharenborg. Quantifying bias in automatic speech recognition.
Si-Ioi Ng, Wei Liu, Zhiyuan Peng, Siyuan Feng, Hing-Pang Huang, Odette Scharenborg, Tan Lee. The CUHK-TUDelft System for The SLT 2021 Children Speech Recognition Challenge. Technical report, submitted to SLT 2021 Children Speech Recognition Challenge.
