FAQ
Quick guidance on key participant questions
Researchers, students, and industry professionals globally are welcome to join. Each person may belong to only one team.
A registration link (Google Form) will be available on the Home page. Registration is required before dataset access.
Teams must complete a Data Transfer and Use Agreement (DTUA) through their institution. An institutional authorized official must review and sign the standard, unaltered DTUA template. Participation is contingent upon full execution of the DTUA and approval of the relevant Institutional Review Board (IRB) amendment authorizing the sharing of deidentified data with the participating institution. Data access will be granted only after both steps are completed.
No. The dataset was collected under Mass General Brigham IRB-approved protocols. However, you are responsible for checking whether your institution requires its own internal review.
No. Only one DTUA is required per participating team. All researchers at the same institution are covered under the executed agreement as Recipient Personnel.
The DTUA must be signed by an institutional authorized official designated by the participating institution. This is typically handled through offices such as contracts, grants, research administration, or legal counsel. Teams should consult their institution to identify the appropriate signing authority.
After registration, teams should complete and sign the provided PDF form and return it as instructed by the challenge organizers.
No. Access to the dataset is limited to personnel at the recipient institution named in the executed DTUA. Third parties or collaborators at other institutions are not permitted to access the data under the same agreement. Researchers at a different institution must complete the DTUA process separately to obtain access.
The data were collected using a high-bandwidth neck-surface accelerometer, not an acoustic microphone; see the descriptions in Mehta et al. (2012) and Mehta et al. (2015). When worn on the front of the neck, the accelerometer measures skin-surface vibrations during phonation.
Across the study, multiple accelerometer assemblies and smartphone platforms were used, including instances where multiple recording kits were deployed in parallel across participants to support large-scale data collection. Minor variability related to hardware components and device audio codecs is expected and is addressed through calibration and standardized processing procedures during data collection and analysis.
Once registration is complete and all required approvals are in place, including full execution of the DTUA, approved teams are granted access to the challenge dataset. A labeled training set is released first for model development, followed later by an unlabeled test set for standardized evaluation.
The dataset is provided via a secure, password-protected file-sharing system, with access credentials shared separately to ensure data security.
The dataset package includes documentation (README, dataset dictionary, usage guidance, references), label files, and voice feature files. Detailed descriptions of labels, features, and expected usage will be provided in the dataset dictionary.
Data are organized at the subject-day level, with each recording day defined by an anonymized subject identifier and a deidentified monitoring date. Subjects typically have multiple monitoring days.
Feature data are provided as MATLAB .mat files containing frame-based time-series features, with each file corresponding to a single subject and monitoring day.
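As an illustration only, the minimal sketch below shows one way a team might inspect such a file in Python; the file name and variable names are hypothetical, and the authoritative naming conventions are defined in the dataset dictionary.

```python
from pathlib import Path

from scipy.io import loadmat

# Hypothetical file name; actual naming conventions are defined in the dataset dictionary.
feature_file = Path("features/subject001_day01.mat")

# Load the frame-based feature data for one subject-day.
# Note: files saved in MATLAB v7.3 (HDF5) format would require h5py instead of loadmat.
mat = loadmat(feature_file, squeeze_me=True)

# List the variables stored in the file (the names are not prescribed here).
for name, value in mat.items():
    if not name.startswith("__"):  # skip MATLAB header entries such as __header__
        print(name, getattr(value, "shape", type(value)))
```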
Yes. All team members accessing the dataset must use workstations that meet institutional security standards, including password protection, encryption at rest, up-to-date operating systems, firewalls, and antivirus/endpoint security protections.
Only authorized team members listed under the executed DTUA and affiliated with the approved institution are permitted to access the dataset. Data must not be shared outside the approved team or institution.
Yes. Teams are required to submit the names, titles, and institutional affiliations of all members who will access the dataset and participate as coauthors.
Along with voice features and diagnostic labels, the dataset includes a limited set of deidentified metadata to protect participant privacy and ensure consistency across teams. The provided metadata include anonymized subject identifiers, biological sex, and monitoring dates. Other variables (e.g., age or occupation) were used during study design for cohort matching but are not included in the dataset shared with participants.
Age and occupation are not provided to ensure a uniform dataset across all participating teams and to avoid introducing potential confounds that could influence model development and evaluation. The challenge is designed to focus analyses on the vocal measures, which are the primary targets of the challenge tasks.
Yes. A small number of subjects have missing values for certain features (primarily IBIF measures; see Tasks & Data for details). Teams should account for the presence of missing data when designing their analysis or modeling approach.
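The sketch below is a minimal example, assuming missing values are encoded as NaNs (the actual encoding is described in the dataset documentation); it uses simple per-feature median imputation, although teams may prefer masking, missingness indicators, or models that handle missing inputs natively.

```python
import numpy as np

# Toy feature matrix: rows are subject-days, columns are features.
# NaNs mark missing values (e.g., subject-days without certain IBIF measures);
# the actual missing-value encoding is described in the dataset documentation.
X = np.array([
    [0.82, 1.4, np.nan],
    [0.75, 1.1, 0.90],
    [np.nan, 1.3, 1.20],
])

# Simple per-feature median imputation computed while ignoring NaNs.
col_medians = np.nanmedian(X, axis=0)
X_imputed = np.where(np.isnan(X), col_medians, X)
print(X_imputed)
```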
Data were split into training and testing sets at the subject level using random assignment. Note that the split was not conditioned on, e.g., feature availability, so subjects with missing features may appear in either set. This approach ensures unbiased and consistent evaluation across teams.
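For teams building internal validation splits during model development, the hypothetical sketch below mirrors this subject-level grouping so that all monitoring days of a subject fall in the same split; the subject identifiers and split proportions are illustrative only.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# One row per subject-day; `subjects` holds the anonymized subject identifier for each row.
subjects = np.array(["S01", "S01", "S02", "S02", "S03", "S04", "S04"])
X = np.zeros((len(subjects), 1))  # placeholder features for illustration

# Grouping by subject keeps every monitoring day of a subject in the same split,
# mirroring the subject-level assignment used for the official training/testing sets.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, val_idx = next(splitter.split(X, groups=subjects))
print("train subjects:", sorted(set(subjects[train_idx])))
print("val subjects:  ", sorted(set(subjects[val_idx])))
```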
No external private data are allowed. Publicly available pretrained models (e.g., open-source embeddings, foundation models) may be used, provided they do not incorporate voice data beyond what is released as part of this challenge.
Submitted results will be evaluated on the unlabeled test data using:
AUC (primary metric)
Accuracy, sensitivity, and specificity
Additional diagnostics to assess clinical relevance
Details are on the Tasks & Data page.
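For reference, the sketch below shows one common way to compute these metrics from predicted probabilities and binary labels using scikit-learn; the official evaluation pipeline, decision thresholds, and additional diagnostics are defined by the organizers, so this is only an informal cross-check.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

# Toy predictions; in practice these come from a model evaluated on held-out data.
y_true = np.array([0, 0, 1, 1, 1, 0])
y_prob = np.array([0.20, 0.40, 0.90, 0.60, 0.70, 0.30])
y_pred = (y_prob >= 0.5).astype(int)  # example threshold; the official convention may differ

auc = roc_auc_score(y_true, y_prob)  # primary metric
acc = accuracy_score(y_true, y_pred)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(f"AUC={auc:.3f} accuracy={acc:.3f} sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
```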
Yes. Approaches that provide insight into underlying voice mechanisms (e.g., feature importance, physiological relevance) are strongly encouraged and will be considered favorably during final review.
Only one final submission per team is allowed, including:
Prediction file (probabilities and labels)
Short technical report
Reproducible code or pipeline
Full instructions are available on the Tasks & Data page.
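Purely as an illustration, the sketch below writes a prediction file containing per-case probabilities and binary labels; the required case identifiers, column names, and file format are whatever the official instructions specify, not the hypothetical ones used here.

```python
import csv

# Hypothetical layout: one row per test case with a predicted probability and a binary label.
# The required identifiers, column names, and file format are specified on the Tasks & Data page.
predictions = [
    ("case_0001", 0.87, 1),
    ("case_0002", 0.12, 0),
]

with open("team_predictions.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "probability", "label"])
    writer.writerows(predictions)
```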
Participation requires submission of a conference paper to Interspeech 2026. Paper submission is not limited to the winning team.
Teams are required to submit a conference paper in accordance with the guidelines on the Interspeech conference website. The paper is handled through a separate conference track and is not associated with the challenge submissions.
All dates follow the Interspeech 2026 schedule. Any updates will be communicated directly via email to registered teams.
Please contact neckvibe-challenge+managers@googlegroups.com