ALS vs. Healthy Classification Metrics for Spontaneous (SPON) and Diadochokinetic Rate (DIDK) Tasks
DNN models with varying numbers of dense layers (DNN-2L, DNN-1L, DNN-0L), trained on healthy speech for nasal vs. non-nasal phoneme classification, are used to classify ALS vs. healthy control (HC) speech, treating ALS as the nasal class and HC as the non-nasal class. Training and validation use speech samples from the TIMIT [1] and ITIMIT [2] datasets. Testing is performed on 57 ALS and 55 HC subjects. Precision, recall, and F1 scores are reported.
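As a reminder of how the three reported metrics relate, the following is a minimal sketch that computes precision, recall, and F1 with ALS as the positive class. It assumes subject-level labels given as the strings "ALS" and "HC"; the function name and label encoding are illustrative and not taken from the original evaluation code.

```python
def precision_recall_f1(y_true, y_pred, positive="ALS"):
    """Return (precision, recall, F1) for the given positive class.

    y_true, y_pred: equal-length sequences of labels ("ALS" or "HC").
    The label strings are an assumption made for this sketch.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    truth = ["ALS", "ALS", "HC", "HC"]
    preds = ["ALS", "HC", "ALS", "HC"]
    print(precision_recall_f1(truth, preds))
```

Because ALS is mapped to the nasal class, a prediction of "nasal" from the underlying phoneme classifier counts as a positive (ALS) prediction here.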
APFL (All Phoneme Frame Level): Training is done using all phoneme frames in the dataset.
APPL (All Phoneme Phoneme Level): Training uses the mean representation of all frames of each phoneme.
VPFL (Voiced Phoneme Frame Level): Training uses only the voiced frames of the voiced phonemes in the dataset.
VPPL (Voiced Phoneme Phoneme Level): Training uses only the voiced frames, represented by the mean over all frames of each voiced phoneme.
Speech: The input for the test set consists of all speech frames.
Voiced: The input for the test set consists of only the voiced frames.
For models trained at the phoneme level, testing is performed at the chunk level with a chunk size of 60 ms, which corresponds to the average phoneme duration.
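The phoneme-level averaging used in training and the 60 ms chunking used in testing can be sketched as follows. This is only an illustration under stated assumptions: a 10 ms frame shift (so one chunk spans 6 frames), mean pooling within each test chunk to mirror the phoneme-level training representation, and a shorter trailing chunk that is kept rather than dropped. None of these details, nor the function names, are confirmed by the description above.

```python
def mean_pool(frames):
    """Average a non-empty list of equal-length feature vectors."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def phoneme_level(frames, segments):
    """Training-side pooling: one mean vector per phoneme.

    segments: list of (start, end) frame indices, end exclusive,
    e.g. derived from phoneme boundaries (hypothetical input format).
    """
    return [mean_pool(frames[start:end]) for start, end in segments]


def chunk_level(frames, chunk_ms=60, frame_shift_ms=10):
    """Test-side pooling: one mean vector per fixed-length chunk.

    frame_shift_ms=10 is an assumption; with it, a 60 ms chunk
    covers 6 consecutive frames. A shorter final chunk is averaged
    over however many frames remain (also an assumption).
    """
    size = max(1, chunk_ms // frame_shift_ms)
    return [mean_pool(frames[i:i + size])
            for i in range(0, len(frames), size)]


if __name__ == "__main__":
    # Eight toy 2-dimensional frames standing in for acoustic features.
    toy_frames = [[float(i), 2.0 * i] for i in range(8)]
    print(phoneme_level(toy_frames, [(0, 2), (2, 8)]))
    print(chunk_level(toy_frames))
```

Fixed-length chunks stand in for phonemes at test time because phoneme boundaries are not assumed to be available for the test recordings.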
[1] J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, and D. S. Pallett, “DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. NIST speech disc 1-1.1,” NASA STI/Recon Technical Report N, vol. 93, p. 27403, 1993.
[2] C. Yarra, R. Aggarwal, A. Rajpal, and P. K. Ghosh, “Indic TIMIT and Indic English lexicon: A speech database of Indian speakers using TIMIT stimuli and a lexicon from their mispronunciations,” in 22nd Conference of the Oriental International Committee for the Coordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA). IEEE, 2019, pp. 1–6.