Mozilla Common Voice

Overview

Mozilla Common Voice tasks support the creation of open, high-quality voice datasets used to train and evaluate speech recognition systems. These datasets are published by the Mozilla Foundation and used by researchers, developers, and organizations around the world.

On Effect Alpha, Common Voice tasks focus on natural human speech, clarity, and correctness. The goal is not perfection, but realism. Contributors help create datasets that reflect how real people speak in everyday situations, including different accents, dialects, and speaking styles.

You may encounter three main types of tasks:

Sentence creation
Audio recording
Sentence or audio validation

Each workflow has slightly different expectations, but all follow the same core principles.

What You Will Do

Depending on the assignment, you may:

Write natural sentences that will later be recorded.
Record yourself reading provided sentences aloud.
Validate sentences or recordings created by other contributors.

Your role is to help create clear, natural, and usable voice data that accurately represents human speech.

Requirements Before Starting

Work in a quiet environment when recording.
Use a functioning microphone.
Be attentive and follow instructions carefully.
Use natural language and realistic speech patterns.

Avoid rushing through tasks. Accuracy and clarity are more important than speed.

Key Principles of Common Voice Tasks

Natural Language and Speech

Content should reflect how people naturally speak. Avoid overly complex phrasing, unnatural wording, or machine-like language.

Accuracy Matters

Read or write sentences exactly as required.
Do not add, remove, or alter words unless instructed.
Avoid paraphrasing.

Inclusivity and Accent Diversity

Common Voice is designed to include diverse accents and speaking styles.

Accents are not errors.
Dialects are valid.
Natural variations in speech are expected.

Sentence Creation Guidelines

When writing sentences:

Use complete, grammatically correct sentences.
Write in a natural conversational style.
Keep phrasing easy to read aloud.

Good examples:

The sun was already setting behind the hills.
She forgot her keys on the kitchen table.

Poor examples:

XJ92 device initialization complete.
!!! This is so cool !!!

Content Restrictions

Do not include:

Full names of private individuals
Phone numbers, addresses, or emails
URLs or hashtags
Profanity or offensive language
References to violence, hate, or illegal activity

Sentences should remain neutral and appropriate for a global audience.
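
Some of these restrictions are mechanical enough to pre-check before you submit a sentence. The sketch below is only an illustration: the patterns and the function name `find_restricted_content` are hypothetical and are not part of any Common Voice or Effect Alpha tool. It flags likely emails, URLs, hashtags, and phone numbers. Names, profanity, and references to violence still need human judgment.

```python
import re

# Hypothetical, deliberately simple patterns for illustration only.
# They will miss some cases and may flag harmless text.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "url": re.compile(r"(https?://|www\.)\S+", re.IGNORECASE),
    "hashtag": re.compile(r"(?<!\w)#\w+"),
    "phone number": re.compile(r"\+?\d[\d\s().-]{6,}\d"),
}

def find_restricted_content(sentence: str) -> list[str]:
    """Return the labels of every restricted pattern found in the sentence."""
    return [label for label, pattern in PATTERNS.items() if pattern.search(sentence)]

if __name__ == "__main__":
    for text in [
        "She forgot her keys on the kitchen table.",
        "Call me at 555-123-4567 today.",
        "Visit www.example.org #fun",
    ]:
        print(text, "->", find_restricted_content(text))
```

An empty result does not mean the sentence is acceptable; it only means none of these four patterns matched.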

Audio Recording Guidelines

Recording Best Practices

Before recording:

Find a quiet environment.
Ensure your microphone works correctly.
Speak at a natural pace.

During recording:

Read the sentence exactly as written.
Do not paraphrase or correct grammar.
Speak naturally, not robotically.

If you make a mistake, re-record the sentence instead of submitting the flawed take.

Common Recording Issues

Avoid:

Background noise.
Whispering or exaggerated pronunciation.
Speaking too quickly or too slowly.
Cutting off the start or end of the recording.

Natural clarity is more important than perfection.

Validation Guidelines

Validation helps maintain dataset quality. A handy "Validator Cheat Sheet" can be found here.

Accept when:

The recording matches the text exactly.
The audio is clear and understandable.
The sentence follows content guidelines.

Reject when:

Words are missing or changed.
The speaker clearly misreads the sentence.
Audio is distorted, incomplete, or unintelligible.
The sentence violates content rules.

Language and Accent Considerations

Common Voice encourages linguistic diversity.

Do not reject for:

Accent differences
Regional pronunciation
Natural speaking variation

Reject only when the difference makes the recording hard to understand or changes the words that were spoken.
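
The accept and reject criteria, together with the accent rule, can be read as a small decision procedure. The following is a minimal sketch under stated assumptions: the `Observation` fields and the `decide` function are hypothetical names chosen for illustration, and the actual task interface simply asks you to accept or reject.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    """What a validator noticed about one recording (hypothetical fields)."""
    words_match_text: bool       # no words missing, changed, or misread
    audio_intelligible: bool     # not distorted, incomplete, or unintelligible
    follows_content_rules: bool  # sentence respects the content restrictions
    has_accent: bool = False     # present only to show that it is ignored

def decide(obs: Observation) -> tuple[str, list[str]]:
    """Return ("accept" or "reject", list of reasons for a rejection)."""
    reasons = []
    if not obs.words_match_text:
        reasons.append("words missing, changed, or misread")
    if not obs.audio_intelligible:
        reasons.append("audio distorted, incomplete, or unintelligible")
    if not obs.follows_content_rules:
        reasons.append("sentence violates content rules")
    # obs.has_accent is deliberately never checked: accents are not errors.
    return ("reject" if reasons else "accept", reasons)

if __name__ == "__main__":
    print(decide(Observation(True, True, True, has_accent=True)))
    print(decide(Observation(False, True, True)))
```

The point of the design is that accent never appears in any rejection condition; only mismatched words, unusable audio, or a content violation can lead to a rejection.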

Common Mistakes to Avoid

Rejecting recordings because of accent or tone.
Writing overly complex or unnatural sentences.
Speaking too slowly or with exaggerated, unnatural enunciation.
Accepting recordings that clearly misread the text.

Why These Rules Matter

Common Voice datasets are used globally to improve speech recognition systems. Following these guidelines ensures:

Voice models become more inclusive.
Data reflects real human speech.
Contributors receive fair rewards.
Open datasets remain reliable and trustworthy.

When You Are Unsure

If something feels unclear:

Reread the task instructions.
Use your best judgment.
Ask for clarification in Discord.

It is always better to ask than to guess!
