AfricaNLP 2023 Workshop

African NLP in the Era of Large Language Models.

(Co-located with ICLR 2023, 5th May 2023)

About the Workshop

Over 1 billion people live in Africa, and its residents speak more than 2,000 languages. Yet those languages are among the least represented in NLP research, and work on African languages is often sidelined at major venues. In 2022, the wave of large language models built through collaborative networks and large investments in compute reached the shores of African languages. That year saw the release of large multilingual models such as BLOOM and, for machine translation, NLLB-200. While these models have been open-sourced, their impact on the community of African NLP researchers has yet to be assessed and deserves wider discussion. This inspired the theme for the 2023 workshop: African NLP in the Era of Large Language Models.

The workshop has several aims

This workshop follows the successful previous editions in 2020, 2021, and 2022. It will be a hybrid event co-located with ICLR 2023. No paper will be automatically desk-rejected :)

Important Dates

Speakers

Perez Ogayo 

Perez Ogayo is a master's student at Carnegie Mellon University's Language Technologies Institute (LTI). Before her studies at Carnegie Mellon, she received her BSc in Computer Science from African Leadership University, Rwanda. Perez's research interests lie in multilingual and low-resource natural language processing (NLP), with a focus on machine translation, speech synthesis and recognition, and NLP for endangered languages. She is also interested in the efficient deployment of NLP models on smaller devices, recognizing the importance of accessibility and sustainability in the field. Alongside her studies at Carnegie Mellon, Perez serves as a researcher at Masakhane, where she works on the Luo, Swahili, and Suba languages.

Elizabeth Salesky

Elizabeth Salesky is a Ph.D. student at Johns Hopkins University, advised by Philipp Koehn and Matt Post. Her research primarily focuses on language representations for machine translation and multilinguality, including how to create models that are more data-efficient and robust to variation across languages and data sources. 

Dr. Seid Muhie Yimam 

Dr. Seid Muhie Yimam is currently a technical lead at HCDS and a research associate at the Language Technology Group, under the supervision of Prof. Chris Biemann. At HCDS, he mainly leads and advises research in digital humanities that involves big-data processing of textual content. He continues to teach NLP and data science courses at HCDS while supervising students on interdisciplinary AI and data science research topics. He is currently participating in the development of a research data and knowledge management project at the intersection of knowledge management, AI, and library science. The project aims to automatically ingest metadata about research reports and projects from diverse sources and to present the outcomes through appealing visualization components.

He joined the Language Technology Group, UHH, as a postdoctoral researcher in January 2020. He received his Ph.D. from Universität Hamburg, specializing in the integration of adaptive machine learning models into annotation tools and NLP applications. From January 2020 to March 2022, he worked on multiple research topics, including social media NLP (hate speech detection, fake news identification, and sentiment analysis) and low-resource NLP, mostly for Amharic, an Ethiopian language, covering named entity recognition, semantic models, hate speech detection, and sentiment analysis. He has taught NLP courses and supervised Master's projects and theses in the group.


Paco Guzman

Paco is a Research Scientist Manager supporting translation teams at Meta AI (FAIR). He works in machine translation with a focus on low-resource translation (e.g., NLLB, FLORES), with the aim of breaking language barriers. He joined Meta in 2016. His research has been published in top-tier NLP venues such as ACL and EMNLP.

He was the co-chair of the Research director at AMTA (2020-2022). He has organized several research competitions focused on low-resource translation (including the WMT 2022 shared task on African languages) and data filtering. Paco obtained his PhD from ITESM in Mexico, was a visiting scholar at LTI-CMU from 2008 to 2009, and participated in DARPA’s GALE evaluation program. He was a postdoc and scientist at the Qatar Computing Research Institute in Qatar from 2012 to 2016.

Laurent Besacier 

Laurent Besacier is a principal scientist and Natural Language Processing (NLP) research team lead at Naver Labs Europe. Before that, he was a professor at the University Grenoble Alpes (UGA) from 2009, where he led the GETALP group (natural language and speech processing). Laurent is still affiliated with UGA.

His main research expertise and interests lie in natural language processing, automatic speech recognition, machine translation, under-resourced languages, machine-assisted language documentation, and the evaluation of NLP systems.

Asmelash Teka Hadgu 

Asmelash Teka Hadgu is the co-founder and CTO of Lesan and a fellow at the Distributed AI Research Institute (DAIR). At Lesan, he has built state-of-the-art machine translation systems to and from Amharic, Tigrinya, and English. Prior to Lesan, Asmelash completed his Ph.D. at Leibniz University Hannover, where his research focused on applied machine learning for scholarly communication, crisis communication, and natural language processing in low-resource settings. Currently, as part of the Lesan-DAIR partnership, he is working on language technologies for Ge’ez-based languages such as Tigrinya and Amharic.

Niyonkuru Audace


Audace Niyonkuru is the Chief Executive Officer of Digital Umuganda, an AI and open data company focused on democratising access to information in African languages by creating open, publicly available datasets to spur AI research and innovation on the continent. He is also a member of the United Nations Internet Governance Forum's Multistakeholder Advisory Group.

Schedule

Best Papers 

We are pleased to announce the three best papers awarded at the AfricaNLP workshop 2023. The papers are:

Accepted Papers

Natural Language Understanding for African Languages [paper] [video]

Authors: Pierrette Mahoro Mastel, Ester Namara, Aime Munezero, Richard Kagame, Zihan Wang, Allan Anzagira, Akshat Gupta, Jema David Ndibwile

AfriSign: Machine Translation for African Sign Languages [paper] [video]

Authors: Shester Gueuwou, Kate Takyi, Mathias Müller, Marco Stanley Nyarko, Richard Adade, Rose-Mary Owusuaa Mensah Gyening

Kinyarwanda TTS: Using a multi-speaker dataset to build a Kinyarwanda TTS model [paper] [video]

Authors: Samuel Rutunda, Kleber Kabanda, Adriana Stan

Improving African Language Identification with Multi-task Learning [paper] [video]

Authors: Ife Adebara, AbdelRahim A. Elmadany, Muhammad Abdul-Mageed

Error Analysis of Tigrinya–English Machine Translation Systems [paper] [video]

Authors: Nuredin Ali Abdelkadir, Negasi Haile Abadi, Asmelash Teka Hadgu

Tigrinya Dialect Identification [paper] [video]

Authors: Asfaw Gedamu Haileslasie, Asmelash Teka Hadgu, Solomon Teferra Abate 

HausaNLP at SemEval-2023 Task 12: Leveraging African Low Resource TweetData for Sentiment Analysis [paper] [video]

Authors: Saheed Salahudeen Abdullahi, Falalu Ibrahim Lawan, Ahmad Mustapha Wali, Amina Abubakar Imam, Aliyu Rabiu Shuaibu, Yusuf Aliyu, Nur Bala Rabiu, Musa Bello, Shamsuddeen Umar Adamu, Saminu Mohammad Aliyu, Murja Sani Gadanya, Sanah Abdullahi Muaz, Mahmoud Said Ahmad, Abdulkadir Abdullahi, Abdulmalik Yusuf Jamoh

IgboNER 2.0: Expanding Named Entity Recognition Datasets via Projection [paper] [video]

Authors: Chiamaka Ijeoma Chukwuneke, Paul Rayson, Ignatius Ezeani, Mo El-Haj, Doris Chinedu Asogwa, Chidimma Lilian Okpalla, Chinedu Emmanuel Mbonu

MphayaNER: Named Entity Recognition for Tshivenda [paper] [video]

Authors: Rendani Mbuvha, David Ifeoluwa Adelani, Tendani Mutavhatsindi, Tshimangadzo Rakhuhu, Aluwani Mauda, Tshifhiwa Joshua Maumela, Andisani Masindi, Seani Rananga, Vukosi Marivate, Tshilidzi Marwala 

Fine-Tuning Multilingual Pretrained African Language Models [paper] [video]

Authors: Rozina Lucy Myoya, Fiskani Banda, Vukosi Marivate, Abiodun Modupe

Breaking the Low-Resource Barrier for Dagbani ASR: From Data Collection to Modeling [paper] [video]

Authors: Paul Azunre, Naafi Dasana Ibrahim

Yoruba and Unicode: An Overview of a Problem [paper] [video]

Authors: Kolawole Olatubosun

African Substrates Rather Than European Lexifiers to Augment African-diaspora Creole Translation  [paper] [video]

Authors: Nathaniel Romney Robinson, Matthew Dean Stutzman, Stephen D. Richardson, David R Mortensen

Multilingual Automatic Speech Recognition for Kinyarwanda, Swahili, and Luganda: Advancing ASR in Select East African Languages [paper] [video]

Authors: Moayad Elamin, Yonas Chanie, Paul Ewuzie, Samuel Rutunda

Speech Recognition Datasets for Low-resource Congolese Languages [paper] [video]

Authors: Ussen Abre Kimanuka, Ciira wa Maina, Osman Büyük

Multilingual Model and Data Resources for Text-To-Speech in Ugandan Languages  [paper] [video]

Authors: Isaac Owomugisha, Benjamin Akera, Ernest Tonny Mwebaze, John Quinn

Lexicon and Rule-Based Word Lemmatization Approach for Somali Language [paper] [video]

Authors: Shafie Abdi Mohamed, Muhidin A. Mohamed 

AfriSenti: A Benchmark Twitter Sentiment Analysis for African Languages [paper] [video]

Authors: Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Abinew Ali Ayele, Nedjma Ousidhoum, David Ifeoluwa Adelani, Seid Muhie Yimam, Ibrahim Said Ahmad, Meriem Beloucif, Saif M. Mohammad, Oumaima Hourrane, Pavel Brazdil, Felermino D. M. A. Ali, Davis David, Salomey Osei, Bello Shehu Bello, Falalu Ibrahim Lawan, Tajuddeen Gwadabe, Samuel Rutunda, Tadesse Belay, Bernard Opoku

Koya: A Recommender System for Large Language Model Selection [paper] [video]

Authors: Abraham Toluwase Owodunni, Chris Chinenye Emezue

How good are Commercial Large Language Models on African Languages? [paper] [video]

Authors: Jessica Ojo, Kelechi Ogueji 

AfroDigits: A Community-Driven Spoken Digit Dataset for African Languages [paper] [video]

Authors: Chris Chinenye Emezue, Sanchit Gandhi, Lewis Tunstall, Abubakar Abid, Joshua Meyer, Quentin Lhoest, Pete Allen, Patrick Von Platen, Douwe Kiela, Yacine Jernite, Julien Chaumond, Merve Noyan, Omar Sanseviero 

MasakhaNEWS: News Topic Classification for African languages [paper] [video]

Authors: David Ifeoluwa Adelani, Marek Masiak, Israel Abebe Azime, Jesujoba Oluwadara Alabi, Atnafu Lambebo Tonja, Christine Mwase, Odunayo Ogundep et al.

ε KÚ <MASK>: Integrating Yorùbá Cultural Greetings into Machine Translation [paper] [video]

Authors: Idris Akinade, Jesujoba Oluwadara Alabi, David Ifeoluwa Adelani, Clement Oyeleke Odoje, Dietrich Klakow

AfriNames: Most ASR models "butcher" African Names  [paper] [video]

Authors: Tobi Olatunji, Tejumade Afonja, Bonaventure F. P. Dossou, Atnafu Lambebo Tonja, Chris Chinenye Emezue, Amina Mardiyyah Rufai, Sahib Singh

Evaluating the Robustness of Machine Reading Comprehension Models to Low Resource Entity Renaming  [paper] [video]

Authors: Clemencia Siro, Tunde Oluwaseyi Ajayi

Adapting to the Low-Resource Double-Bind: Investigating Low-Compute Methods on Low-Resource African Languages [paper] [video]

Authors: Colin Leong, Herumb Shandilya, Bonaventure F. P. Dossou, Atnafu Lambebo Tonja, Joel Mathew, Abdul-Hakeem Omotayo, Oreen Yousuf, Zainab Akinjobi, Chris Chinenye Emezue, Shamsudeen Muhammad, Steven Kolawole, Younwoo Choi, Tosin Adewumi

Online Threats Detection in Hausa Language [paper] [video]

Authors: Abubakar Yakubu Zandam, Fatima Adam Muhammad, Isa Inuwa-Dutse

VoxMg: An Automatic Speech Recognition Dataset for Malagasy [paper] [video]

Authors: Falia Ramanantsoa 

Prediction of the ability and motivation to adopt Reproductive Health Behavioural change using anonymized customer center audio data [paper] [video]

Authors: Olubayo Adekanmbi, Anthony Soronnadi

The first large scale collection of diverse Hausa language datasets [paper] [video]

Authors: Isa Inuwa-Dutse

Amharic Text Complexity Classification Using Complexity Annotator Tool and Supervised Machine Learning [paper] [video]

Authors: Gebregziabihier Nigusie, Tesfa Tegegne

AROT-COV23: A Dataset of 500K Original Arabic Tweets on COVID-19 [paper] [video]

Authors: Cheng Xu, Nan Yan

Organizers

David Ifeoluwa Adelani

Research Fellow, UCL

Bonaventure F. P. Dossou

Ph.D. Student, Mila & McGill

Shamsuddeen Muhammad

Ph.D. Student, UPorto

Atnafu Lambebo Tonja

Ph.D. Student, IPN

Hady Elsahar

Research Scientist, Meta AI

Happy Buzaaba

Postdoc, RIKEN Center for AIP

Aremu Anuoluwapo

Linguist, YorubaNames

Salomey Osei

Ph.D. Student, DeustoTech

Tunde Ajayi

Ph.D. Student, Insight Centre, University of Galway

Constantine Lignos

Assistant Professor, Brandeis University

Tajuddeen Rabiu Gwadabe

Project Manager, Masakhane Research Foundation

Clemencia Siro

Ph.D. Student, University of Amsterdam

Everlyn Asiko Chimoto

Ph.D. Student, University of Cape Town, AIMS

Contacts & Slack Workspace

You're invited to join the Masakhane community Slack (channel #africanlp-iclr2023-support), where you can meet other participants and find collaborators, mentors, and advice. Organizers will be available on Slack to answer questions about submissions, format, topics, etc. If you are unsure whether you can contribute to this workshop (e.g., if you have never written a paper, are new to NLP, have no collaborators, or do not know LaTeX), please join the Slack and contact us there as well.

To contact the workshop organizers, please send an email to: africanlp-ICLR2023@googlegroups.com



Sponsors

Digital Umuganda