A 1.7M GPU grant from EuroHPC on the MareNostrum 5 supercomputer at the Barcelona Supercomputing Center to train a language model for Europe. It covers the 24 official European languages and 11 strategic international languages. We pretrained a 1.7B and a 9B model from scratch on 50% non-English data. EuroLLM achieves the best translation performance among the top medium-sized LLMs. It also performs competitively in understanding and reasoning across 11 EU languages, outperforming other European models, and it is fully open.
Unified Transcription and Translation for Extended Reality
UTTER aims to use large language models to develop scalable, adaptable, contextualised, emotion-aware and explainable technologies for translation, summarisation and transcription.
UTTER is a Horizon Europe project running from October 2022 for three years.
MTStretch aimed to improve low-resource machine translation by stretching our capabilities and resources.
MTStretch was an EPSRC Innovation Fellowship which ran from 2019 to 2021.
Global Under-Resourced Media Translation
GoURMET improves neural machine translation for low-resource language pairs and domains.
The project develops open-source applications and implements these through use cases at the BBC and Deutsche Welle.
GoURMET was a European Horizon 2020 project which ran from 2019 to 2022.
Scalable Understanding of Multilingual Media
SUMMA developed an extensible media monitoring platform and tools to enhance multilingual and cross-lingual capabilities. It was an EU Horizon 2020 project that ran from 2016 to 2019.
An industrial collaboration with Samsung Research Poland, which ran from 2015 to 2020 and covered a broad range of challenges for neural machine translation.
Health in my language
Making public health information available in a wider variety of languages. It was an EU Horizon 2020 project which ran from 2015 to 2018.