Prediction of learners' listening disfluency (LD)
Language learners can improve their ability to identify words in speech through continued shadowing practice, and each learner's practice is stored as sequential data on listening disfluency. Using these data and machine learning, we developed a model that predicts which words in new speech an individual learner will find difficult to identify. Pedagogically, listening skills improve most efficiently when speech samples are of appropriate difficulty for the learner, neither too easy nor too difficult. Our model will make it possible to prepare such samples for each individual learner.
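As an illustration of how per-word predictions could be used to choose samples of appropriate difficulty, here is a minimal sketch. It is not our model: the `Sample` class, the `select_samples` function, the per-word probabilities, and the 0.2 to 0.5 difficulty band are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sample:
    """A candidate speech sample with hypothetical per-word LD predictions."""
    sample_id: str
    # (word, predicted probability that the learner fails to identify it)
    word_ld: List[Tuple[str, float]]

    def mean_ld(self) -> float:
        # Average predicted disfluency over all words in the sample.
        return sum(p for _, p in self.word_ld) / len(self.word_ld)


def select_samples(samples: List[Sample],
                   low: float = 0.2,
                   high: float = 0.5) -> List[Sample]:
    """Keep samples whose mean predicted LD lies in the band [low, high].

    The band limits are illustrative assumptions, not tuned values.
    """
    return [s for s in samples if s.word_ld and low <= s.mean_ld() <= high]


candidates = [
    Sample("easy", [("the", 0.0), ("cat", 0.1), ("sat", 0.1)]),
    Sample("moderate", [("she", 0.1), ("thoroughly", 0.6), ("agreed", 0.3)]),
    Sample("hard", [("phenomenon", 0.9), ("ubiquitous", 0.8), ("rhythm", 0.7)]),
]

selected = [s.sample_id for s in select_samples(candidates)]
print(selected)
```

With these made-up numbers, only the sample whose average predicted disfluency falls inside the band is kept; the samples that are too easy or too hard for this learner are filtered out.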
Prediction of raters' listening disfluency (LD)
We built medium-sized corpora from raters who shadowed hundreds of speech samples produced by language learners. The raters also script-shadowed the learners' speech samples. We treat a learner's reading-aloud (R) and a rater's shadowing (S) as the source and target of voice conversion (VC). We treat two further pairs in the same way: a learner's reading-aloud (R) with a rater's script-shadowing (SS), and a rater's script-shadowing (SS) with his/her shadowing (S). The R-S conversion functions as a virtual shadower, the R-SS conversion works as an accent reducer, and the SS-S conversion adds inarticulate or disfluent speech production to SS to simulate S. VC generally refers to converting a given utterance into another in which the content of the spoken message is retained but some non-linguistic or para-linguistic information, such as speaker identity, is changed. The R-S and R-SS conversions are more complicated because they have to change speaker identity and speaking style at the same time. For details, please read the papers that will be published in the near future.
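The three source-target pairings described above can be sketched as a small data-preparation step. This is only an assumed layout for illustration: the record format, the file names, the task names, and the `build_pairs` helper are hypothetical and do not describe our actual corpora or training code.

```python
from typing import Dict, List, Tuple

# One record per learner utterance, holding hypothetical file names:
# R = learner's reading-aloud, S = rater's shadowing,
# SS = rater's script-shadowing.
Record = Dict[str, str]

# (source, target) keys for the three conversions.
CONVERSIONS: Dict[str, Tuple[str, str]] = {
    "virtual_shadower": ("R", "S"),    # R-S
    "accent_reducer": ("R", "SS"),     # R-SS
    "disfluency_adder": ("SS", "S"),   # SS-S
}


def build_pairs(records: List[Record], task: str) -> List[Tuple[str, str]]:
    """Return (source, target) file pairs for one conversion task.

    Records that lack either side of the pair are skipped.
    """
    src, tgt = CONVERSIONS[task]
    return [(r[src], r[tgt]) for r in records if src in r and tgt in r]


corpus: List[Record] = [
    {"R": "utt001_R.wav", "S": "utt001_S.wav", "SS": "utt001_SS.wav"},
    {"R": "utt002_R.wav", "S": "utt002_S.wav"},  # no script-shadowing here
]

for task in CONVERSIONS:
    print(task, build_pairs(corpus, task))
```

Each task draws its training pairs from the same recordings; only the choice of source and target changes, which is why one corpus can serve all three converters.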
References
Send emails to shadowing [ATMARK] gavo.t.u-tokyo.ac.jp
Minematsu-Saito lab. of Graduate School of Engineering, UTokyo, Japan