Use this guide as a quick reference when reviewing Common Voice submissions. The goal is to ensure contributions are accurate, understandable, and usable for building high-quality speech datasets.
As a validator, you are not judging accents, voice style, or personal delivery. Focus on correctness, clarity, and adherence to guidelines.
Confirm that the submission represents natural, accurate speech that matches the provided text and follows content standards.
Focus on:
Accuracy
Clarity
Natural speech
Avoid judging stylistic preferences.
Check:
Does the speaker read the sentence exactly as written?
Are all words present and in the correct order?
Is anything added or removed?
Accept if:
The sentence matches the audio precisely.
Reject if:
Words are missing, changed, or added.
The speaker clearly misreads the sentence.
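As a rough illustration of what counts as missing, added, or changed words, the sketch below compares the sentence text with a typed note of what the validator heard. The helpers `word_diff` and `matches_exactly` and the simple tokenizer are assumptions made for this example, not part of Common Voice. Real validation is done by listening.

```python
import difflib
import re


def tokenize(text):
    """Lowercase the text and keep only word characters and apostrophes."""
    return re.findall(r"[\w']+", text.lower())


def word_diff(prompt, heard):
    """Return the words that are missing, added, or changed relative to the prompt."""
    expected, actual = tokenize(prompt), tokenize(heard)
    result = {"missing": [], "added": [], "changed": []}
    matcher = difflib.SequenceMatcher(a=expected, b=actual, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":
            # Words in the prompt that were not spoken.
            result["missing"].extend(expected[i1:i2])
        elif tag == "insert":
            # Words spoken that are not in the prompt.
            result["added"].extend(actual[j1:j2])
        elif tag == "replace":
            # Words spoken in place of the prompt's words.
            result["changed"].append((expected[i1:i2], actual[j1:j2]))
    return result


def matches_exactly(prompt, heard):
    """True only when no word is missing, added, or changed."""
    return not any(word_diff(prompt, heard).values())
```

For example, `word_diff("The cat sat on the mat.", "The cat sat on mat")` reports `the` as missing, so the recording would be rejected. Because the comparison ignores case and punctuation, it only reflects the words themselves.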
Check:
Is the recording understandable?
Is the audio complete from start to finish?
Accept if:
Minor pauses or breathing sounds are present.
Accent differences are noticeable but understandable.
Reject if:
Loud background noise makes understanding difficult.
Audio cuts off at the beginning or end.
Audio is distorted or unintelligible.
If validating written sentences:
Accept if:
The sentence is complete and readable.
It sounds natural when spoken aloud.
It contains no sensitive or personal information.
Reject if:
It includes private names, addresses, emails, or URLs.
It contains offensive or inappropriate content.
It feels unnatural or incomplete.
Important reminders:
Accents are not errors.
Dialects are valid.
Natural pronunciation differences are expected.
Do not reject submissions solely due to accent or regional variation.
Common mistakes to avoid:
Rejecting due to accent or tone differences.
Rejecting for minor pauses or natural breathing.
Expecting studio-quality audio.
Accepting recordings that clearly misread the text.
Focus on understandability and correctness rather than perfection.
Ask yourself:
Does the recording match the text exactly?
Can the speech be understood clearly?
If YES to both:
✅ Approve.
If NO to either:
❌ Reject.
Listen once for clarity, again for accuracy.
Trust guidelines over personal preference.
Avoid over-rejecting borderline cases.
Common Voice succeeds because it reflects real people speaking naturally!
Use this decision flow when validating audio recordings.
Check:
Are all words present?
Are the words in the correct order?
Is the sentence free of added, skipped, or changed words?
👉 YES → Continue
👉 NO → ❌ Reject
Check:
Does the audio include the full sentence?
Are the beginning and end intact (not cut off)?
👉 YES → Continue
👉 NO → ❌ Reject
Ask:
Can you understand the speaker without difficulty?
Is background noise minimal?
Is the audio free from heavy distortion?
👉 YES → Continue
👉 NO → ❌ Reject
Important reminder:
Accents are not errors.
Dialects are valid.
Natural speech variation is expected.
Ask:
Can the recording still be understood clearly?
👉 YES → Continue
👉 NO (unintelligible) → ❌ Reject
Accept:
Natural pacing
Minor pauses
Breathing sounds
Reject only if:
Speech is unintelligible or clearly incorrect.
If all checks pass:
✅ Approve recording.
If any critical rule fails:
❌ Reject recording.
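The audio decision flow above can be sketched as an ordered series of checks that stops at the first failure. `AudioReview` and `validate_recording` are hypothetical names chosen for this example. Each field holds the validator's own answer after listening, and nothing here analyzes audio automatically.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AudioReview:
    """A validator's answers for one recording (hypothetical structure)."""
    text_matches: bool      # all words present, in order, nothing added or changed
    audio_complete: bool    # full sentence, beginning and end not cut off
    audio_clear: bool       # understandable, minimal noise, no heavy distortion
    intelligible: bool      # still understandable with the speaker's accent or dialect
    has_accent: bool = False             # recorded for illustration, never a reject reason
    has_pauses_or_breaths: bool = False  # recorded for illustration, never a reject reason


def validate_recording(review: AudioReview) -> Tuple[str, Optional[str]]:
    """Walk the checks in order and reject at the first one that fails."""
    checks = [
        (review.text_matches, "words are missing, added, or changed"),
        (review.audio_complete, "audio is incomplete or cut off"),
        (review.audio_clear, "noise or distortion makes the audio hard to understand"),
        (review.intelligible, "speech is unintelligible"),
    ]
    for passed, reason in checks:
        if not passed:
            return ("reject", reason)
    # Accent, pacing, pauses, and breathing are deliberately not checked.
    return ("approve", None)
```

Calling `validate_recording(AudioReview(True, True, True, True, has_accent=True))` approves the recording, which mirrors the rule that accents and natural speech variation are never grounds for rejection.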
Use this decision flow when validating written sentences.
Check:
Does it form a full sentence?
Does it make grammatical sense?
👉 YES → Continue
👉 NO → ❌ Reject
Ask:
Would a real person realistically say this?
Is it easy to read aloud?
👉 YES → Continue
👉 NO → ❌ Reject
Reject if it includes:
Full names of private individuals
Phone numbers, emails, addresses
URLs or hashtags
Offensive or inappropriate content
References to violence, hate, or illegal activity
👉 If clean → Continue
👉 If restricted → ❌ Reject
Check:
Is the tone neutral?
Is it free of sensitive or personal information?
👉 YES → Continue
👉 NO → ❌ Reject
If all checks pass:
✅ Approve sentence.
If one or more critical rules fail:
❌ Reject sentence.
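The sentence flow above could be sketched as follows. The function names and the regular expressions are assumptions for illustration only. The patterns are deliberately simple, will miss cases, and cannot detect private names or offensive content, so those remain the validator's judgment and are passed in as `reviewer_flags`.

```python
import re

# Illustrative patterns only (assumptions, not an official Common Voice filter).
RESTRICTED_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "url": re.compile(r"(https?://|www\.)\S+", re.IGNORECASE),
    "hashtag": re.compile(r"(?<!\w)#\w+"),
    "phone number": re.compile(r"\+?\d[\d\s().-]{6,}\d"),
}


def find_restricted(sentence):
    """Return the names of the restricted patterns found in the sentence."""
    return [
        name
        for name, pattern in RESTRICTED_PATTERNS.items()
        if pattern.search(sentence)
    ]


def validate_sentence(sentence, is_complete, is_natural, reviewer_flags=()):
    """Apply the checks in order: completeness, naturalness, restricted content."""
    if not is_complete:
        return ("reject", "incomplete or ungrammatical")
    if not is_natural:
        return ("reject", "unnatural or hard to read aloud")
    # Combine automatic matches with anything the reviewer flagged by hand,
    # such as a private individual's full name or offensive wording.
    restricted = find_restricted(sentence) + list(reviewer_flags)
    if restricted:
        return ("reject", "restricted content: " + ", ".join(restricted))
    return ("approve", None)
```

For instance, `validate_sentence("I love #sunsets", True, True)` is rejected for containing a hashtag, while a plain sentence such as "The weather is nice today." passes when the reviewer marks it complete and natural.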