Building Africa-focused AI and Data

Among the datasets that enable state-of-the-art speech recognition systems (LibriSpeech, SpeechCommands, …), many provide sufficient labeled or unlabeled data for major languages (English, French, German, …). However, data remain scarce for languages such as Ewè, which is spoken by more than 10 million people in West Africa. The Yodi Project aims to provide the state-of-the-art datasets needed for machine learning development in Africa, with a focus on West Africa.

Mission of the project

Neural Machine Translation Text Dataset (Melinda Text Dataset)

Pre-processing and labeling textual data from different sources.

Speech Recognition Dataset (Melinda Vocal Dataset)

Pre-processing and labeling audio data for the West African speech dataset.

AI & ML Development

Help Umbaji build the Yodi model.

Project Management

Help Umbaji manage the repositories for the Yodi Project.

Take action

Questions?

Contact contact@umbaji.org to get more information about the project.
