FAQ
Quick guidance on key participant questions
Researchers, students, and industry professionals globally are welcome to join. Each person may belong to only one team.
A registration link (Google Form) will be available on the Home page. Registration is required before dataset access.
Teams must complete a Data Transfer and Use Agreement (DTUA) through their institution. An institutional authorized official must review and sign the standard, unaltered DTUA template. Participation is contingent upon full execution of the DTUA and approval of the relevant Institutional Review Board (IRB) amendment authorizing the sharing of deidentified data with the participating institution. Data access will be granted only after both steps are completed.
No. The dataset was collected under Mass General Brigham IRB-approved protocols. However, you are responsible for checking whether your institution requires its own internal review.
No. Only one DTUA is required per participating team. All researchers at the same institution are covered under the executed agreement as Recipient Personnel.
The DTUA must be signed by an institutional authorized official designated by the participating institution. This is typically handled through offices such as contracts, grants, research administration, or legal counsel. Teams should consult their institution to identify the appropriate signing authority.
After registration, teams should complete and sign the provided PDF form and return it as instructed by the challenge organizers.
No. Access to the dataset is limited to personnel at the recipient institution named in the executed DTUA. Third parties or collaborators at other institutions are not permitted to access the data under the same agreement. Researchers at a different institution must complete the DTUA process separately to obtain access.
The data were collected using a high-bandwidth neck-surface accelerometer, not an acoustic microphone; see the descriptions in Mehta et al. (2012) and Mehta et al. (2015). When worn on the front of the neck, the accelerometer measures skin-surface vibrations during phonation.
Across the study, multiple accelerometer assemblies and smartphone platforms were used, including instances where multiple recording kits were deployed in parallel across participants to support large-scale data collection. Minor variability related to hardware components and device audio codecs is expected and is addressed through calibration and standardized processing procedures during data collection and analysis.
Once registration is complete and all required approvals are in place, including full execution of the DTUA, approved teams are granted access to the challenge dataset. A labeled training set is released first for model development, followed later by an unlabeled test set for standardized evaluation.
The dataset is provided via a secure, password-protected file-sharing system, with access credentials shared separately to ensure data security.
The dataset package includes documentation (README, dataset dictionary, usage guidance, references), label files, and voice feature files. Detailed descriptions of labels, features, and expected usage will be provided in the dataset dictionary.
Data are organized at the subject-day level, with each recording day defined by an anonymized subject identifier and a deidentified monitoring date. Subjects typically have multiple monitoring days.
Feature data are provided as MATLAB .mat files containing frame-based time-series features, with each file corresponding to a single subject and monitoring day.
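As an illustration only, the minimal sketch below shows one way a team might inspect such a file in Python; the file name and variable names are hypothetical, and the authoritative naming conventions are defined in the dataset dictionary.

```python
from pathlib import Path

from scipy.io import loadmat

# Hypothetical file name; actual naming conventions are defined in the dataset dictionary.
feature_file = Path("features/subject001_day01.mat")

# Load the frame-based feature data for one subject-day.
# Note: files saved in MATLAB v7.3 (HDF5) format would require h5py instead of loadmat.
mat = loadmat(feature_file, squeeze_me=True)

# List the variables stored in the file (the names are not prescribed here).
for name, value in mat.items():
    if not name.startswith("__"):  # skip MATLAB header entries such as __header__
        print(name, getattr(value, "shape", type(value)))
```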
Yes. All team members accessing the dataset must use workstations that meet institutional security standards, including password protection, encryption at rest, up-to-date operating systems, firewalls, and antivirus/endpoint security protections.
Only authorized team members listed under the executed DTUA and affiliated with the approved institution are permitted to access the dataset. Data must not be shared outside the approved team or institution.
Yes. Teams are required to submit the names, titles, and institutional affiliations of all members who will access the dataset and participate as coauthors.
Along with voice features and diagnostic labels, the dataset includes a limited set of deidentified metadata to protect participant privacy and ensure consistency across teams. The provided metadata include anonymized subject identifiers, biological sex, and monitoring dates. Other variables (e.g., age or occupation) were used during study design for cohort matching but are not included in the dataset shared with participants.
Age and occupation are not provided to ensure a uniform dataset across all participating teams and to avoid introducing potential confounds that could influence model development and evaluation. The challenge is designed to focus analyses on the vocal measures, which are the primary targets of the challenge tasks.
Yes. A small number of subjects have missing values for certain features (primarily IBIF measures; see Tasks & Data for details). Teams should account for the presence of missing data when designing their analysis or modeling approach.
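The sketch below is a minimal example, assuming missing values are encoded as NaNs (the actual encoding is described in the dataset documentation); it uses simple per-feature median imputation, although teams may prefer masking, missingness indicators, or models that handle missing inputs natively.

```python
import numpy as np

# Toy feature matrix: rows are subject-days, columns are features.
# NaNs mark missing values (e.g., subject-days without certain IBIF measures);
# the actual missing-value encoding is described in the dataset documentation.
X = np.array([
    [0.82, 1.4, np.nan],
    [0.75, 1.1, 0.90],
    [np.nan, 1.3, 1.20],
])

# Simple per-feature median imputation computed while ignoring NaNs.
col_medians = np.nanmedian(X, axis=0)
X_imputed = np.where(np.isnan(X), col_medians, X)
print(X_imputed)
```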
Data were split into training and testing sets at the subject level using random assignment. Note that the split was not conditioned on, e.g., feature availability, so subjects with missing features may appear in either set. This approach ensures unbiased and consistent evaluation across teams.
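For teams building internal validation splits during model development, the hypothetical sketch below mirrors this subject-level grouping so that all monitoring days of a subject fall in the same split; the subject identifiers and split proportions are illustrative only.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# One row per subject-day; `subjects` holds the anonymized subject identifier for each row.
subjects = np.array(["S01", "S01", "S02", "S02", "S03", "S04", "S04"])
X = np.zeros((len(subjects), 1))  # placeholder features for illustration

# Grouping by subject keeps every monitoring day of a subject in the same split,
# mirroring the subject-level assignment used for the official training/testing sets.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, val_idx = next(splitter.split(X, groups=subjects))
print("train subjects:", sorted(set(subjects[train_idx])))
print("val subjects:  ", sorted(set(subjects[val_idx])))
```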
No external private data are allowed. Publicly available pretrained models (e.g., open-source embeddings, foundation models) may be used, provided they do not incorporate voice data beyond what is released as part of this challenge.
Submitted results will be evaluated on the unlabeled test data using:
AUC (primary metric)
Accuracy, sensitivity, and specificity
Additional diagnostics to assess clinical relevance
Details are on the Tasks & Data page.
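For reference, the sketch below shows one common way to compute these metrics from predicted probabilities and binary labels using scikit-learn; the official evaluation pipeline, decision thresholds, and additional diagnostics are defined by the organizers, so this is only an informal cross-check.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

# Toy predictions; in practice these come from a model evaluated on held-out data.
y_true = np.array([0, 0, 1, 1, 1, 0])
y_prob = np.array([0.20, 0.40, 0.90, 0.60, 0.70, 0.30])
y_pred = (y_prob >= 0.5).astype(int)  # example threshold; the official convention may differ

auc = roc_auc_score(y_true, y_prob)  # primary metric
acc = accuracy_score(y_true, y_pred)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(f"AUC={auc:.3f} accuracy={acc:.3f} sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
```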
Yes. Approaches that provide insight into underlying voice mechanisms (e.g., feature importance, physiological relevance) are strongly encouraged and will be considered favorably during final review.
Only one final submission per team is allowed, including:
Prediction file (probabilities and labels)
Short technical report
Reproducible code or pipeline
Full instructions are available on the Tasks & Data page.
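Purely as an illustration, the sketch below writes a prediction file containing per-case probabilities and binary labels; the required case identifiers, column names, and file format are whatever the official instructions specify, not the hypothetical ones used here.

```python
import csv

# Hypothetical layout: one row per test case with a predicted probability and a binary label.
# The required identifiers, column names, and file format are specified on the Tasks & Data page.
predictions = [
    ("case_0001", 0.87, 1),
    ("case_0002", 0.12, 0),
]

with open("team_predictions.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "probability", "label"])
    writer.writerows(predictions)
```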
Participation requires submission of a conference paper to Interspeech 2026. Paper submission is not limited to the winning team.
Teams are required to submit a conference paper in accordance with the guidelines on the Interspeech conference website. The paper is handled through a separate conference track and is not associated with the challenge submissions.
All dates follow the Interspeech 2026 schedule. Any updates will be communicated directly via email to registered teams.
Please contact neckvibe-challenge+managers@googlegroups.com