Conventional method to measure LD without brain-sensing techniques
In language classes, LD is often measured as a word-based correct transcription rate: a learner is asked to transcribe a given speech, and the transcript is compared with the content of that speech. When a rater transcribes a learner's speech instead, the rater's LD is often interpreted as reflecting the intelligibility of the learner's speech. However, this approach has three clear drawbacks.
1. Writing takes much longer than the given speech, and while writing, the transcriber may rephrase or rewrite what s/he actually heard.
2. The given speech has to be transcribed after a single presentation, so only a short speech can be used because of the transcriber's limited memory capacity.
3. For correct transcription, the transcriber has to recall the orthography (spelling) of the words heard in the audio, which is totally irrelevant to listening disfluency.
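For illustration, the word-based correct transcription rate mentioned above can be computed by aligning the transcript with the reference script and counting correctly reproduced words. The following is only a minimal sketch, not the scoring procedure of any particular class; the whitespace tokenization and the function name are assumptions.

```python
# Minimal sketch (assumptions: whitespace tokenization, case-insensitive
# matching); not the scoring procedure used in any particular classroom.
import difflib

def correct_transcription_rate(reference: str, transcript: str) -> float:
    """Fraction of reference words reproduced, in order, in the transcript."""
    ref_words = reference.lower().split()
    hyp_words = transcript.lower().split()
    matcher = difflib.SequenceMatcher(a=ref_words, b=hyp_words)
    # Sum the sizes of all in-order matching word blocks.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref_words) if ref_words else 0.0

# Example: 5 of the 6 reference words are reproduced -> 5/6.
print(correct_transcription_rate("the cat sat on the mat",
                                 "the cat sat on a mat"))
```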
Our proposal to measure LD
The conventional method uses the hand for writing, which inevitably causes the above three drawbacks. In our project, the mouth is used for speaking instead: the input speech is repeated orally while it is being listened to. This speech task is often called shadowing. After the listener shadows the input speech, s/he shadows it again with the content (text or script) of the speech visually presented. The first shadowing (S1) often includes shadowing disfluency, which is generally caused by listening disfluency (LD), whereas in the following script shadowing (SS), LD never happens because every word of the input speech is visually presented.
When S1 and SS are compared as sequences by dynamic time warping (DTW), the comparison yields sequential data of shadowing disfluency, which can reasonably be interpreted as LD. It should be noted that the above three drawbacks never arise in our proposed method, even when the speech presented for shadowing is as long as one minute.
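As an illustration of this comparison, the sketch below aligns S1 and SS by DTW over MFCC frames and uses the local spectral distance along the warping path as a frame-wise disfluency curve. This is only a minimal sketch, not the analysis pipeline used in our project: the file names, the choice of MFCC features, and the librosa-based DTW are assumptions.

```python
# Minimal sketch (assumptions: MFCC features, librosa's DTW, hypothetical
# file names); not the analysis pipeline actually used in the project.
import numpy as np
import librosa

def disfluency_curve(s1_wav, ss_wav, sr=16000, n_mfcc=13):
    """Align S1 with SS by DTW and return a frame-wise distance curve."""
    y1, _ = librosa.load(s1_wav, sr=sr)
    y2, _ = librosa.load(ss_wav, sr=sr)
    X = librosa.feature.mfcc(y=y1, sr=sr, n_mfcc=n_mfcc)   # shape (n_mfcc, T1)
    Y = librosa.feature.mfcc(y=y2, sr=sr, n_mfcc=n_mfcc)   # shape (n_mfcc, T2)
    # DTW alignment; wp holds (S1 frame, SS frame) index pairs in reverse order.
    _, wp = librosa.sequence.dtw(X=X, Y=Y, metric='euclidean')
    wp = wp[::-1]
    # Local distance along the path: large values mark frames where the
    # first shadowing deviates from the script shadowing (candidate LD).
    curve = np.array([np.linalg.norm(X[:, i] - Y[:, j]) for i, j in wp])
    return wp, curve

# Usage (hypothetical file names):
# path, curve = disfluency_curve("s1.wav", "ss.wav")
```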
Shadowing is voluntary and vocalized mirroring, and mirroring is involuntary and silent shadowing.
How valid is the shadowing method compared to the transcription method from the viewpoint of brain science? Brain scientists explain that, when a human listens to an oral message, his/her brain inevitably replicates the articulatory movements needed to "mirror" the oral message silently. As far as we know, however, the human brain does not execute the finger movements needed to transcribe the oral message while listening. Shadowing is therefore considered valid enough to expose what listeners' brains are doing internally, so that listening behaviors can be measured acoustically.
For details, please read the papers listed in References.
References
Contact
Send emails to shadowing [ATMARK] gavo.t.u-tokyo.ac.jp
Minematsu-Saito Lab., Graduate School of Engineering, UTokyo, Japan