Baseline Performance

AUC = 0.82 (PVH vs controls)
AUC = 0.78 (NPVH vs controls)

Can your model beat this?


Models trained on the VHM ambulatory dataset have already shown strong potential for detecting vocal hyperfunction. Past studies demonstrate that accelerometer-based features collected in real-world conditions can reliably distinguish both phonotraumatic (PVH) and nonphonotraumatic (NPVH) voice disorders from matched healthy controls.

These benchmark results provide a useful reference: the challenge invites teams to improve upon them with advanced yet interpretable models.
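The baseline figures on this page are reported as AUC (area under the ROC curve). One way to read an AUC is as the probability that a randomly chosen patient receives a higher model score than a randomly chosen control. The sketch below computes it that way in plain Python. The function name `roc_auc` and the scores are hypothetical and chosen only for illustration; they are not taken from the cited studies.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical classifier outputs, for illustration only.
patient_scores = [0.9, 0.8, 0.6, 0.4]
control_scores = [0.7, 0.5, 0.3, 0.2]

auc = roc_auc(patient_scores, control_scores)
print(f"AUC = {auc:.4f}")
```

The pairwise loop is quadratic in the number of participants, which is fine for a sanity check; a rank-based implementation from an established library is likely the better choice for real evaluation.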

Key Baseline Results

PVH Detection

Logistic regression and support vector machine (SVM) classifiers using airflow- and spectral-based features:
- AUC 0.82 (Cortés et al., 2018; Van Stan et al., 2020)
Logistic regression with nested cross-validation, using the kurtosis and skewness of H1–H2 (the difference between the first and second harmonic magnitudes):
- AUC 0.81 (Ghasemzadeh et al., 2024)

NPVH Detection

Quadratic discriminant analysis (QDA) using cepstral peak prominence (CPP) and the mode of H1–H2:
- AUC 0.78 (Van Stan et al., 2021)
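As a rough illustration of how a two-feature quadratic discriminant classifier works, the sketch below fits one bivariate Gaussian per class and compares their posterior probabilities. It is a minimal plain-Python sketch, not the pipeline used in the cited study. All function names are hypothetical, and the (CPP, H1–H2 mode) pairs are made-up numbers with no clinical meaning.

```python
import math


def fit_gaussian(rows):
    """Mean vector and unbiased 2x2 covariance for two-feature rows."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    sxx = sum((r[0] - mx) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - my) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def log_density(x, mean, cov):
    """Log of the bivariate normal density at x."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Squared Mahalanobis distance via the closed-form 2x2 inverse.
    maha = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * (maha + math.log(det)) - math.log(2.0 * math.pi)


def qda_posterior(x, patients, controls):
    """Posterior probability that x belongs to the patient class."""
    n_p, n_c = len(patients), len(controls)
    lp = log_density(x, *fit_gaussian(patients)) + math.log(n_p / (n_p + n_c))
    lc = log_density(x, *fit_gaussian(controls)) + math.log(n_c / (n_p + n_c))
    diff = lp - lc
    # Numerically stable logistic function of the log-odds.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


# Made-up (CPP, H1-H2 mode) pairs, for illustration only.
patients = [(18.0, 6.0), (19.0, 7.5), (17.5, 5.0), (18.5, 7.0), (19.5, 6.5)]
controls = [(22.0, 3.0), (23.0, 4.0), (21.0, 2.5), (22.5, 3.5), (24.0, 4.5)]

print(qda_posterior((18.5, 6.4), patients, controls))
print(qda_posterior((22.5, 3.5), patients, controls))
```

Because each class keeps its own covariance matrix, the decision boundary is quadratic rather than linear, which is what distinguishes QDA from linear discriminant analysis.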

Helpful Insights

- Early models applied linear classifiers and demonstrated feasibility (AUC 0.70–0.74) using limited features and small datasets
- Introduction of airflow-estimation features derived with impedance-based inverse filtering (IBIF) markedly improved discrimination between PVH and controls
- Parameters such as spectral tilt, H1–H2 variability, and loudness distribution consistently show strong effect sizes in ambulatory monitoring
- Models built on daily distributional statistics perform well, but frame-level or deep learning approaches may uncover additional patterns
- For NPVH detection, CPP (voice quality) and reduced variability in harmonics are particularly effective
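The daily distributional statistics mentioned above (for example the skewness, kurtosis, and mode of H1–H2) can be sketched as follows. This is a minimal plain-Python illustration under stated assumptions: the function name `daily_summary`, the 1 dB histogram bin width used to estimate the mode, and the frame values are all hypothetical and not taken from the dataset or the cited studies.

```python
import math
from collections import Counter


def daily_summary(values, bin_width=1.0):
    """Summarize one day of frame-level values with distributional statistics."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two frames")
    mean = sum(values) / n
    # Population central moments of order 2, 3 and 4.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0.0:
        raise ValueError("values are constant; skewness and kurtosis are undefined")
    # Mode of a continuous feature: centre of the most populated histogram bin
    # (ties go to the lowest bin).
    bins = Counter(math.floor(v / bin_width) for v in values)
    top_bin = max(bins, key=lambda b: (bins[b], -b))
    return {
        "mean": mean,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,  # excess kurtosis (0 for a Gaussian)
        "mode": (top_bin + 0.5) * bin_width,
    }


# Hypothetical frame-level H1-H2 values in dB, for illustration only.
h1h2_frames = [1.2, 1.4, 1.6, 2.5, 7.9]
summary = daily_summary(h1h2_frames)
print(summary)
```

One such summary per participant per day gives a compact feature vector for a classical classifier; frame-level or deep learning approaches would instead consume the raw sequence.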
