Baseline Performance
AUC = 0.82 (PVH vs controls)
AUC = 0.78 (NPVH vs controls)
Can your model beat this?
Models trained on the VHM ambulatory dataset have already shown strong potential for detecting vocal hyperfunction. Past studies demonstrate that accelerometer-based features collected in real-world conditions can reliably distinguish both phonotraumatic (PVH) and nonphonotraumatic (NPVH) voice disorders from matched healthy controls.
These benchmark results provide a useful reference: the challenge invites teams to improve upon them with advanced yet interpretable models.
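To make the comparison against the benchmarks concrete, here is a minimal sketch of how AUC can be computed from classifier scores. It uses the rank interpretation of AUC: the probability that a randomly chosen patient receives a higher score than a randomly chosen control. The function name `auc_score` and the example scores are hypothetical and are not challenge data or official evaluation code.

```python
def auc_score(scores_patients, scores_controls):
    """Probability that a random patient outscores a random control.

    Ties count as half a win, which matches the usual
    Mann-Whitney formulation of the area under the ROC curve.
    """
    wins = 0.0
    for p in scores_patients:
        for c in scores_controls:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(scores_patients) * len(scores_controls))


# Made-up classifier outputs, for illustration only.
patients = [0.9, 0.8, 0.6, 0.4]
controls = [0.7, 0.3, 0.2, 0.1]

BASELINE_PVH = 0.82  # PVH vs controls benchmark quoted on this page

auc = auc_score(patients, controls)
print(f"AUC = {auc:.3f}, beats PVH baseline: {auc > BASELINE_PVH}")
```

The pairwise loop is quadratic in the number of subjects, which is fine for a small cohort. A sort-based rank computation would be a better fit for large score lists.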
PVH Detection
Logistic regression and support vector machine (SVM) using airflow- and spectral-based features:
AUC 0.82 (Cortés et al., 2018; Van Stan et al., 2020)
Logistic regression with nested cross-validation, using the kurtosis and skewness of the difference between the first and second harmonic magnitudes (H1-H2):
AUC 0.81 (Ghasemzadeh et al., 2024)
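As an illustration of the distributional features mentioned above, the following sketch summarizes a set of frame-level H1-H2 values by their skewness and kurtosis. It assumes population (biased) moment estimators and reports excess kurtosis; whether the cited study used these exact definitions is not stated here, so treat the choice as an assumption. The function name and the sample values are hypothetical.

```python
def skewness_and_kurtosis(values):
    """Return (skewness, excess kurtosis) using population moments."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3, and 4.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skew, excess_kurt


# Made-up H1-H2 values in dB for one day of voiced frames.
h1h2_frames_db = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

skew, kurt = skewness_and_kurtosis(h1h2_frames_db)
print(f"skewness = {skew:.5f}, excess kurtosis = {kurt:.5f}")
```

Each subject-day would contribute one (skewness, kurtosis) pair, which could then serve as the input features for a classifier.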
NPVH Detection
Quadratic discriminant analysis (QDA) using cepstral peak prominence (CPP) and the mode of H1-H2:
AUC 0.78 (Van Stan et al., 2021)
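For readers unfamiliar with quadratic discriminant analysis, the sketch below shows the idea for two features: fit one Gaussian per class, each with its own covariance matrix, and score a new point by the difference in log densities. This is a generic textbook formulation, not the pipeline of the cited study. The feature values (CPP and H1-H2 mode, both in dB), the equal class priors, and all function names are assumptions made for illustration.

```python
import math


def fit_gaussian(rows):
    """Return the mean and sample covariance (sxx, sxy, syy) of (x, y) pairs."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    sxx = sum((r[0] - mx) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - my) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def log_density(point, mean, cov):
    """Bivariate Gaussian log density, omitting the shared constant -log(2*pi)."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Squared Mahalanobis distance via the closed-form 2x2 inverse.
    maha = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * math.log(det) - 0.5 * maha


def qda_score(point, patient_model, control_model, prior_patient=0.5):
    """Log-odds that `point` belongs to the patient class."""
    lp = log_density(point, *patient_model) + math.log(prior_patient)
    lc = log_density(point, *control_model) + math.log(1.0 - prior_patient)
    return lp - lc


# Made-up (CPP, H1-H2 mode) pairs; not real subject data.
patients = [(18.0, 3.0), (19.0, 4.5), (17.5, 2.5), (20.0, 5.0), (18.5, 4.0)]
controls = [(23.0, 6.0), (24.5, 8.0), (22.0, 7.5), (25.0, 6.5), (23.5, 9.0)]

patient_model = fit_gaussian(patients)
control_model = fit_gaussian(controls)

for point in [(18.0, 3.5), (24.0, 7.0)]:
    score = qda_score(point, patient_model, control_model)
    label = "patient-like" if score > 0 else "control-like"
    print(point, f"log-odds = {score:.2f}", label)
```

Because each class keeps its own covariance, the decision boundary is a quadratic curve rather than a straight line, which is what distinguishes QDA from linear discriminant analysis.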
Early models applied linear classifiers to limited feature sets and small datasets, demonstrating feasibility (AUC 0.70–0.74).
Introducing airflow-estimation features derived from impedance-based inverse filtering (IBIF) markedly improved discrimination between PVH and controls.
Parameters such as spectral tilt, H1-H2 variability, and loudness distribution consistently show strong effect sizes in ambulatory monitoring.
Models built on daily distributional statistics perform well, but frame-level or deep learning approaches may uncover additional patterns.
For NPVH detection, CPP (a measure of voice quality) and reduced variability in harmonics are particularly effective.