A multilingual, open-source suite of high-quality LLMs for European languages.
New Website:
Goal
To build an open-source European Large Language Model that supports all 24 official European Union languages, as well as a few other strategically important languages.
News
12-12-2024 We released our 9B model on Hugging Face on 2 December, and it reached 50k downloads in its first week! It is also currently the top-performing 9B model on the OpenGPTX European leaderboard, behind only Gemma 2, which is effectively a larger model since its 9B parameter count excludes embeddings.
22-10-2024 We are attending the EuroHPC User Day in Amsterdam in October; please reach out if you would like to meet us in person!
24-09-2024 Our paper on the 1.7B EuroLLM model is published on arXiv: https://arxiv.org/abs/2409.16235
Duration
Project Timeline: 1 May 2024 - 30 April 2025
We thank EuroHPC for the HPC resources used to support this work through grant EHPC-EXT-2023E01-042.
Deliverables
A series of models of different sizes (1B, 9B, and 22B), trained on 4T tokens, balancing effectiveness and efficiency
A multimodal model which can process and understand speech or text input
Full project codebase available to the public with detailed data and model descriptions
Models pretrained and finetuned on text from all supported languages for better performance, especially for native speakers of those languages.
Models trained to process speech and text in various languages, enabling them to support spoken languages and recognize prosody and emotion.
Models that offer high performance on various tasks in multiple languages, including QA, summarisation, and translation.
Models that can be used freely by all researchers, organisations and citizens of Europe.
Models
EuroLLM-9B is a 9B parameter model trained on similar data to EuroLLM-1.7B.
EuroLLM-1.7B
EuroLLM-1.7B is a 1.7B parameter model trained on 4 trillion tokens divided across the considered languages and several data sources: Web data, parallel data (en-xx and xx-en), and high-quality datasets. EuroLLM-1.7B-Instruct was further instruction tuned on EuroBlocks, an instruction tuning dataset with focus on general instruction-following and machine translation.
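As a minimal sketch, the released models can be loaded through the Hugging Face `transformers` library. The model id `utter-project/EuroLLM-1.7B` and the plain translation-style prompt below are assumptions for illustration; check the project's Hugging Face page for the exact identifiers and recommended prompting.

```python
# Hedged sketch: running EuroLLM-1.7B via Hugging Face transformers.
# MODEL_ID is an assumption -- verify it on the project's Hugging Face page.
MODEL_ID = "utter-project/EuroLLM-1.7B"


def build_translation_prompt(source: str, src_lang: str, tgt_lang: str) -> str:
    # Simple continuation-style prompt for the base (non-instruct) model;
    # the model is expected to continue with the translation.
    return f"{src_lang}: {source}\n{tgt_lang}:"


if __name__ == "__main__":
    # Heavy imports kept inside the guard so the helper above stays
    # importable even without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    prompt = build_translation_prompt("Hello, Europe!", "English", "Portuguese")
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```

For the instruction-tuned variant (EuroLLM-1.7B-Instruct), the model's chat template via `tokenizer.apply_chat_template` would be the more appropriate entry point.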
Do you have questions?
Contact eurollm.info@gmail.com for more information about the project.