About the Aya Project

Aya is an open science project that aims to build a state-of-the-art multilingual generative language model that harnesses the collective wisdom and contributions of people from all over the world. Support this amazing initiative and make sure no language is left behind!

We are seeking volunteers with fluency in languages other than English to contribute to our open science project and help integrate your languages into worldwide AI development. Volunteers will submit strong examples of writing in those languages and review others’ contributions to help build an open-source language model that will be available to research scientists around the world, ensuring that the next generation of generative AI and AI tools will be accessible to members of your communities.

 

From our blog, initially published on 05/06/2023

Introducing Aya: An Open Science Initiative to Accelerate Multilingual AI Progress

TL;DR:

Aya is an open science project that aims to build a state-of-the-art multilingual generative language model that harnesses the collective wisdom and contributions of people from all over the world.

Cohere For AI is a research lab that seeks to solve complex machine learning problems. We are honored to introduce Aya, an ongoing collaborative open science endeavour aimed at building a multilingual language model via instruction tuning that harnesses the collective wisdom and contributions of people from all over the world. This year-long open science initiative brings together AI experts from academia, industry, non-profits and independent research to create a state-of-the-art multilingual model and foster open collaboration.

In the Aya Multilingual project, we want to improve available multilingual generative models and accelerate progress for languages across the world. The word Aya is derived from the Twi language and translates to “fern”. Aya is a symbol of endurance and resourcefulness, which captures the spirit of our own commitment to accelerate multilingual AI progress. Contributing to Aya is open to anyone who is passionate about advancing the field of natural language processing and is committed to promoting open science. You don’t have to be an AI expert to be involved; we are looking for everyday citizens, teachers, linguists and lifelong learners. By joining Aya, you become part of a global movement dedicated to democratizing access to language technology. We will be open-sourcing all our models, training data, and the data collection tool as part of this project.


As natural language processing technologies advance, not all languages have been treated equally by developers and researchers. Much of the data used to train large language models comes from the internet, which continues to reflect the composition of early users of this technology: 5% of the world speaks English at home, yet 63.7% of internet communication is in English. There are around 7,000 languages spoken in the world, and around 400 languages have more than 1M speakers [1]. However, coverage of these languages in multilingual datasets is scarce [2, 3]. On top of this, the under-indexing of certain languages is also driven by limited access to compute resources. Mobile data, compute, and other computational resources may often be expensive or unavailable in regions that are home to under-represented languages. Unless we address this disproportionate representation head-on, we risk perpetuating this divide and further widening the gap in language access to new technologies.

...

The project is led by Cohere For AI, which also supports it with compute and resources. However, it is a truly multi-institutional initiative, made possible by a community of researchers, engineers, linguists, social scientists, and lifelong learners from over 100 countries around the world.

Join us on this remarkable journey as we collectively shape the future of multilingual language models. Let's unite, collaborate, and unleash the true potential of open science for the betterment of global communication. Get started today by contributing in your language.

Not sure where to start? Join our dedicated Discord server for the Aya multilingual project, where you can meet people contributing in your language.

1. How many languages are there in the world? (2023). Retrieved 30 May 2023, from https://www.ethnologue.com/insights/how-many-languages/

2. Lauscher, A., Ravishankar, V., Vulić, I., & Glavaš, G. (2020). From Zero to Hero: On the Limitations of Zero-Shot Language Transfer with Multilingual Transformers. Retrieved 30 May 2023, from https://aclanthology.org/2020.emnlp-main.363.pdf

3. NLLB Team, Costa-jussà, M., Cross, J., Çelebi, O., Elbayad, M., Heafield, K., et al. (2022). No Language Left Behind: Scaling Human-Centered Machine Translation. Retrieved 30 May 2023, from https://arxiv.org/abs/2207.04672