NLP@Deep Learning Indaba

The Natural Language Processing workshop at the Deep Learning Indaba 2022 takes place on Thursday, 25 August 2022. It is organized by Wilhelmina Nekoto, Sebastian Ruder, Tajuddeen Gwadabe, David Adelani, Bonaventure Dossou, and the Masakhane community.

Schedule

Thursday, August 25. Location (morning and midday sessions): Amphitheatre Wood 1 (turn right after entering the main building, where registration took place). Location (afternoon session): Library (where the breaks are held). All times are in Tunisia local time (GMT+1).

Morning session (8:30–10:30): Building NLP for Africa

Midday session (14:00–16:00): Practical NLP

Coffee break (16:00–16:30)

Afternoon session (16:30–18:30)

Invited Speakers

Kathleen Siminyu
Kiswahili Machine Learning Fellow, Mozilla

Jade Abbott
NLP Engineer, Lelapa AI
Researcher, Masakhane

Vukosi Marivate
Associate Professor & ABSA Chair of Data Science, University of Pretoria

Alia Morsi
PhD Researcher, Music Technology Group

Geoffroy Peeters
Full Professor, Télécom Paris / IP-Paris

Julia Kreutzer
Research Scientist, Google Research Montreal

Ndapa Nakashole
Assistant Professor of Computer Science, UC San Diego

Chris Emezue
MSc student, TU Munich
Researcher, Masakhane

Bonaventure Dossou
Researcher, Masakhane

Invited Talk Details

Kathleen Siminyu & Jade Abbott

Jade Abbott is a leading NLP researcher and engineer on the African continent. She has an MSc in Computer Science from the University of Pretoria and has worked as a software engineer across Africa in fields ranging from fintech to NGOs to startups. For the past 9 years, she has trained and deployed deep learning systems that perform a variety of tasks in real-world settings. In 2019, she co-founded Masakhane, an open, grassroots research initiative in natural language processing for Africans, by Africans, which aims to spur research into NLP for African languages. It now has over 1000 members from 38+ African countries and 30+ affiliated publications. She is busy working on something new and exciting, so watch this space!

Masakhane: Beginnings, Celebrations and Imaginings

It's been 3 years since the launch of Masakhane at Indaba 2019. While the entire world was forced into their homes, Masakhane took this opportunity to grow, learn, and change the face of African NLP. The community has grown to hundreds of members from all over the world and has become a pillar of inspiration, an example of how we can solve NLP differently. Let's examine where we came from, celebrate how far we have come, and imagine further into the future.

Vukosi Marivate

Dr Vukosi Marivate is the ABSA UP Chair of Data Science at the University of Pretoria. Vukosi works on developing Machine Learning/Artificial Intelligence methods to extract insights from data. He currently serves as a chief investigator on the Masakhane NLP project and on the steering committee of the Lacuna Fund. As part of his vision for Data Science, Vukosi is interested in Data Science for Social Impact, using local challenges as a springboard for research.

Opportunities for Dataset Creation and Curation - Lacuna Fund

Taking in recent progress in African NLP and dataset creation, we should be thinking about the near future and how we can shape the next innovations. As such, I provide an extended introduction to the Lacuna Fund. Lacuna Fund will provide data scientists, researchers, and social entrepreneurs in low- and middle-income contexts globally with the resources they need to either produce new datasets to address an underserved population or problem, augment existing datasets to be more representative, or update old datasets to be more sustainable.

Alia Morsi & Geoffroy Peeters

Alia Morsi is a second-year PhD student at the Music Technology Group in Universitat Pompeu Fabra (UPF), Barcelona. She holds a Bachelor’s degree in Computer Science from the American University in Cairo (AUC) and a Master in Sound and Music Computing from UPF. She is interested in technologies with potential to support music instrument learning, with a current focus on Audio to Score alignment. Currently, she is the student board member for the International Society for Music Information Retrieval (ISMIR).

Music Information Retrieval for Music from Africa

In this introductory talk, we go over three popular research problems in the Music Information Retrieval (MIR) community: source separation, beat tracking, and music auto-tagging. The goal is to demonstrate how state-of-the-art methods for each problem perform when applied to different music from Africa, showing when things work and, most importantly, when they don’t. Finally, we introduce the International Society for Music Information Retrieval (ISMIR), with more information on how to become a part of it.

Julia Kreutzer

Julia Kreutzer is a Research Scientist at Google Montreal, where she works on improving machine translation. She holds a PhD in Computational Linguistics from Heidelberg University, Germany, and has also been working with the Masakhane community to develop NLP technologies for African languages. She cares deeply about open source, accessibility, and creativity in NLP research.

Tackling Low-Resource Machine Translation with Participation, Data and Scale

This talk will feature three aspects that have recently changed the landscape for low-resource machine translation: First, we'll discover the role of participatory approaches that place native speakers at the core of the development, with the Masakhane community as an example for African languages. Second, we'll dive deep into quality issues of multilingual public datasets that affect low-resource languages disproportionately. And last, we'll learn about the tricks behind Google Translate's most recent success in launching NMT for languages without any parallel data.

Ndapa Nakashole

Ndapa Nakashole is an Assistant Professor of Computer Science at the University of California, San Diego. Before that, she was a postdoc at Carnegie Mellon University. She obtained her PhD from Saarland University and the Max Planck Institute for Informatics, Germany, and her BSc and MSc in Computer Science from the University of Cape Town, South Africa. In NLP, her work has been on machine reading, machine translation, and natural language interfaces. Her PhD thesis was awarded the Otto Hahn Medal by the Max Planck Society. Her proposed work for 2022–2027 has been awarded an NSF CAREER award.

Probabilistic Language Models for Healthcare

Much has been said about the progress brought about by probabilistic language models (LMs) in representing, analyzing, and generating natural language(s). Indisputable performance gains have been realized on popular corpora such as those derived from mainstream news or from Wikipedia. Specialized domains whose vocabularies and language patterns are uncommon in standard benchmarks, such as healthcare, have also benefited from advances in LMs. In this talk, I will start with a big-picture look at these language technologies. Next, I will discuss why healthcare is different, and finish with some of our own work in healthcare applications that leverage LMs, with a focus on entity linking and question answering.

Chris Emezue & Bonaventure Dossou

Chris Emezue spends his time studying at the Technical University of Munich, doing research on structure learning and causal inference at the Mila Quebec AI Institute, ML advocacy at Hugging Face, AfricaNLP research with Masakhane, and building Lanfrica to connect African language resources across the world.

Bonaventure Dossou has a Bachelor's in Mathematics and a Master's in Data Engineering. He is a Drug Discovery and Deep Learning researcher at the Mila Quebec AI Institute and an African NLP researcher with the Masakhane Research Foundation, and is co-building Lanfrica to increase the discoverability of African works. He previously worked as an NLP researcher at Google Research and Roche Canada.

Preserving our African languages through NLP

Chris Emezue and Bonaventure Dossou are active members of the Masakhane research community. The dynamic duo has contributed immensely to NLP for African Languages—from (multilingual) machine translation, speech recognition, named entity recognition, to datasets and Lanfrica. In their keynote, they will tell us about their journey into AfricaNLP research, what kept them going, challenges faced, and lessons learned.

Lacuna Fund Project Talks, Hands-on Sessions, and Lightning Talk Details

Tolúlọpẹ́ Ògúnrẹ̀mí

Tolúlọpẹ́ Ògúnrẹ̀mí is a Ph.D. student at Stanford University in the Stanford NLP Group. Her work focuses on speech and language processing for low-resource languages, currently sub-Saharan African languages. Before that, she completed a Master's in Speech and Language Processing at the University of Edinburgh.

Computer Aided Pronunciation Training of Yorùbá using Bayesian Item Response Theory

Yorùbá is a tonal language, making it hard to pronounce for L2 (second language) learners. Tone is usually taught by Elicited Imitation, which is time-consuming and almost impossible in larger classroom settings. Prior work has shown that visualizing the tones helps students with their pronunciation. We combine this with Bayesian Item Response Theory to create a standalone application to help L2 learners of Yorùbá with their pronunciation.

Sebastien Diarra

Sebastien Diarra is a Malian computer science student currently managing two NLP projects at RobotsMali, funded by Lacuna Fund and Google. The work on these projects has revolved around the development of datasets, resources, and tools to enable the broader participation of the Malian community in the development of the Bambara language.

Constructing Malian Social, Cultural, and Scientific Identity Through NLP

The ends to which we direct NLP research can have a profound, multi-dimensional effect on African nations. In Mali, our research aims to support the construction of our national identity, with implications for our social organizations, education, and cultural expression. The talk will highlight how our work in corpus acquisition used recordings of Griots in Bambara to advance MT and ASR development while valorizing an important element of Malian culture, and how our initiatives in crowdsourcing are enabling broad engagement in the national project to use Malian languages in education and, ultimately, in all spheres of civic life.