Dysarthria detection and assessment (my PhD thesis problem)

Summary of my thesis


Title: Extraction of information from speech for detection and assessment of dysarthria


Dysarthria is a speech disorder caused by neurological damage that impairs the motor control of speech production. It affects speech quality and intelligibility. Analysis of speech helps not only in discriminating dysarthric speech from normal speech but also in assessing its severity level. Such analysis is useful in clinical studies of dysarthric patients, because it reduces the dependence on purely qualitative and subjective assessments.


Many speech analysis methods are used for dysarthric speech detection and assessment, but most of them adopt standard signal processing methods that were developed to represent speech for other speech-based applications. This thesis attempts to analyse speech keeping in mind the changes in the vocal-tract dynamics and excitation of dysarthric patients. We developed new signal processing methods to highlight specific features of speech production, and examined whether they reflect the characteristics of dysarthria better than the features obtained from standard signal processing methods. The studies are made using the standard UA-dysarthric speech database.


In the first study, the effect of dysarthria on tongue-tip movement is examined by comparing the duration of the rhotic approximant in dysarthric speech with that for normal speakers. The duration was measured manually from the lowering of the third formant (F3) of the rhotic approximant in different phonetic contexts. The F3 contour is traced on spectrograms obtained using quasi-closed-phase analysis. The results showed that the duration of the rhotic approximant is a good indicator of the severity level of dysarthria: the longer the duration, the higher the severity level. The dependency of the duration feature on gender, speaker, and phonetic context suggests the need for more careful analysis of speech. Some new methods are therefore proposed in this thesis for more robust detection and assessment of dysarthric speech.


Low-severity dysarthric speech may differ from normal speech only through small changes in the vocal-tract dynamics. To capture these changes, instantaneous spectrum analysis based on single frequency filtering (SFF) is proposed. A new feature, called perceptually enhanced single frequency cepstral coefficients, is introduced. These coefficients are used with standard classifiers for detection as well as for classification into different intelligibility groups to assess the severity. The studies showed that the proposed features performed better than standard features in terms of accuracy, both in detection and in intelligibility assessment.
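
As a rough illustration of the single frequency filtering step mentioned above, the following Python sketch computes SFF amplitude envelopes in the way the technique is commonly described: the signal is frequency-shifted so that the frequency of interest lands at half the sampling rate, and is then passed through a single-pole filter whose pole lies close to the unit circle at z = -r. The function name, the value r = 0.995, and the test tone are assumptions made for this example. The perceptual enhancement and cepstral steps of the proposed feature are not reproduced here.

```python
import numpy as np


def sff_envelopes(x, fs, freqs_hz, r=0.995):
    """Return SFF amplitude envelopes, shape (len(x), len(freqs_hz)).

    Hypothetical helper for illustration; r = 0.995 is an assumed pole radius.
    """
    x = np.asarray(x, dtype=float)
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    n = np.arange(len(x))

    # Shift each frequency of interest f_k to fs/2 (omega = pi):
    # x_k[n] = x[n] * exp(j * (pi - 2*pi*f_k/fs) * n)
    shift = np.pi - 2.0 * np.pi * freqs_hz / fs
    shifted = x[:, None] * np.exp(1j * np.outer(n, shift))

    # Single-pole filter with pole at z = -r:
    # y_k[n] = -r * y_k[n-1] + x_k[n], run for all frequencies at once.
    y = np.empty_like(shifted)
    prev = np.zeros(len(freqs_hz), dtype=complex)
    for i in range(len(x)):
        prev = -r * prev + shifted[i]
        y[i] = prev

    # The envelope is the magnitude of the complex filter output.
    return np.abs(y)


if __name__ == "__main__":
    fs = 8000
    t = np.arange(2000) / fs
    tone = np.cos(2 * np.pi * 1000 * t)  # synthetic 1 kHz test tone
    env = sff_envelopes(tone, fs, [1000.0, 3000.0])
    print(env[-1])  # envelope at the last sample for each frequency
```

Reading one row of the returned array gives the envelope across frequencies at a single instant, which is what makes the analysis an instantaneous spectrum rather than a block-based one.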


The significance of the analytic phase is examined for dysarthric speech detection and intelligibility assessment by proposing SFF-based instantaneous frequency cepstral coefficients. These features performed better than the standard features. In addition, score-level fusion of the proposed features with the magnitude spectral features improved the accuracy further, indicating that the analytic phase feature carries complementary information.
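
Score-level fusion can take several forms, and the summary does not say which one was used. A minimal sketch, assuming a simple weighted sum of per-utterance scores from two classifiers that are already on a comparable scale, could look like the following. The function name, the weight alpha, and the toy scores are all hypothetical.

```python
import numpy as np


def fuse_scores(scores_phase, scores_magnitude, alpha=0.5):
    """Weighted-sum fusion of two score arrays (illustrative only).

    alpha weights the first system and (1 - alpha) the second. Both arrays
    are assumed to be on a comparable scale, e.g. posterior probabilities.
    """
    a = np.asarray(scores_phase, dtype=float)
    b = np.asarray(scores_magnitude, dtype=float)
    if a.shape != b.shape:
        raise ValueError("score arrays must have the same shape")
    return alpha * a + (1.0 - alpha) * b


if __name__ == "__main__":
    # Toy scores for two utterances, not taken from the thesis.
    fused = fuse_scores([0.2, 0.8], [0.6, 0.4], alpha=0.75)
    print(fused)
```

In practice the weight would be tuned on held-out data, and the scores would often be normalised first if the two systems produce values on different scales.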


In the final study, the excitation features are explored. The features are extracted using epoch-based speech processing. A new method for epoch extraction, namely zero-phase zero frequency filtering, is proposed. The effect of different combinations of the excitation features is analysed. The study demonstrated the importance of the excitation features for dysarthria detection and intelligibility assessment, even though their performance is not better than that of the features based on the vocal-tract system.
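
For background, the sketch below shows conventional zero frequency filtering as it is usually described, not the zero-phase variant proposed in the thesis, whose details are not given in this summary. The signal is differenced, integrated four times (equivalent to a cascade of two ideal zero-frequency resonators), and detrended by repeatedly subtracting a local mean. Epoch candidates are then taken at zero crossings of the result. The function name, the 10 ms average pitch period, the 1.5-period window, and the three detrending passes are assumptions made for this example.

```python
import numpy as np


def zff_epochs(x, fs, avg_pitch_ms=10.0, passes=3):
    """Conventional zero frequency filtering (illustrative sketch).

    Returns (epoch_indices, filtered_signal). Parameter defaults are assumed.
    """
    x = np.asarray(x, dtype=float)

    # Difference the signal to remove any slowly varying bias.
    y = np.diff(x, prepend=x[0])

    # Two ideal zero-frequency resonators in cascade = four integrations.
    for _ in range(4):
        y = np.cumsum(y)

    # Remove the polynomial trend by subtracting the local mean several
    # times, using an odd-length window of about 1.5 average pitch periods.
    win = int(round(1.5 * avg_pitch_ms * 1e-3 * fs)) | 1
    kernel = np.ones(win) / win
    for _ in range(passes):
        y = y - np.convolve(y, kernel, mode="same")

    # Ignore the edges, where zero padding in the convolution corrupts
    # the local mean.
    edge = passes * (win // 2)
    core = y[edge:len(y) - edge]

    # Positive-going zero crossings are taken as epoch candidates. Which
    # crossing polarity marks the epochs depends on the signal polarity.
    idx = np.flatnonzero((core[:-1] < 0) & (core[1:] >= 0)) + 1 + edge
    return idx, y


if __name__ == "__main__":
    fs = 8000
    pulses = np.zeros(fs)
    pulses[::80] = 1.0  # synthetic 100 Hz impulse train
    epochs, _ = zff_epochs(pulses, fs)
    print(np.median(np.diff(epochs)))  # spacing between detected crossings
```

The test signal here is a synthetic impulse train, so the sketch only shows that the crossing spacing tracks the pitch period. Excitation features for dysarthric speech would be computed around the epochs found in real recordings.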