Prosody modeling of spontaneous Mandarin speech and its application to automatic speech recognition

Author(s): Cheng-Hsien Lin, Meng-Chian Wu, Chung-Long You, Chen-Yu Chiang, Yih-Ru Wang and Sin-Horng Chen


A prosody-assisted automatic speech recognition (ASR) approach for spontaneous Mandarin speech is proposed. It uses a previously proposed joint prosody labeling and modeling algorithm to construct a hierarchical prosodic model (HPM), and it applies the HPM in two-stage speech recognition. In the first stage, an HMM-based recognizer with a tri-phone acoustic model (AM) and a bigram language model (LM) generates a word lattice, which is then expanded by replacing the bigram LM with a trigram LM. In the second stage, a rescoring process sequentially adds the POS and PM LMs and then the HPM as additional factors. The method is evaluated on the MCDC database, which comprises 8 dialogues by 16 speakers totaling 9.09 hours. Syllable/character/word error rates were reduced from 35.6/40.2/45.1% for the baseline trigram HMM method to 32.4/36.5/41.8% for the proposed method. This improvement is reasonably good considering that, owing to the high OOV rate of the database, the word lattice itself cannot yield a WER below 13.4%. Error analysis shows that many tone recognition errors and word segmentation errors were corrected. In addition, the ASR yields further information about each test utterance, including the POS of each word, the PMs, the tone of each syllable, the break type of each syllable juncture, and the prosodic state of each syllable.
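To make the second-stage rescoring concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simplified N-best list in place of the word lattice, a log-linear sum of per-model log scores with hand-set weights (all 1.0 here; a real system would tune them), and hypothetical model names (`am`, `lm3`, `pos_lm`, `pm_lm`, `hpm`) with toy scores. It illustrates how adding the HPM as the last factor can flip the decision toward a hypothesis with a different syllable tone.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class Hypothesis:
    """One recognition hypothesis (an N-best entry standing in for a lattice path)."""
    words: List[str]
    # Log-domain scores keyed by hypothetical model names.
    log_scores: Dict[str, float] = field(default_factory=dict)


def combined_score(hyp: Hypothesis, weights: Dict[str, float]) -> float:
    """Log-linear combination: weighted sum of the active models' log scores."""
    return sum(w * hyp.log_scores[name] for name, w in weights.items())


def rescore(hyps: Sequence[Hypothesis], weights: Dict[str, float]) -> Hypothesis:
    """Pick the hypothesis with the highest combined score."""
    return max(hyps, key=lambda h: combined_score(h, weights))


def sequential_rescoring(
    hyps: Sequence[Hypothesis],
    factor_order: Sequence[str],
    weights: Dict[str, float],
) -> List[Tuple[str, str]]:
    """Activate one factor at a time and record the best hypothesis after each."""
    active: Dict[str, float] = {}
    history: List[Tuple[str, str]] = []
    for name in factor_order:
        active[name] = weights[name]
        best = rescore(hyps, active)
        history.append((name, " ".join(best.words)))
    return history


# Toy data (made up): two hypotheses differing only in the tone of one syllable.
hyps = [
    Hypothesis(["wo3men", "qu4"],
               {"am": -100.0, "lm3": -12.0, "pos_lm": -5.0, "pm_lm": -2.0, "hpm": -20.0}),
    Hypothesis(["wo3men", "qu3"],
               {"am": -99.0, "lm3": -11.5, "pos_lm": -5.0, "pm_lm": -2.0, "hpm": -26.0}),
]
order = ["am", "lm3", "pos_lm", "pm_lm", "hpm"]
weights = {name: 1.0 for name in order}  # hypothetical; real weights are tuned

history = sequential_rescoring(hyps, order, weights)
for stage, best in history:
    print(f"after {stage}: {best}")
```

With these toy scores the tone-error hypothesis wins until the HPM is added, after which the combined score favors `wo3men qu4`. Scoring full lattice paths instead of an N-best list would follow the same log-linear idea but requires lattice search.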