ILLC-NLP 2024
First Workshop on NLP for Indigenous Languages of Lusophone Countries
March 12, 2024
Thank you to everyone who attended ILLC-NLP 2024, the first edition of our workshop!
A recording of the panel discussion is available here: https://drive.google.com/file/d/1dax00Ey2gIGTxPxlt3hKjYU7Nz98IAfA/view?usp=sharing
The keynote and online paper presentations can be viewed here: https://drive.google.com/file/d/1RqFCWCx4kPm2EqhqcLiVzFzK_Ixq86bW/view?usp=sharing
Welcome to ILLC-NLP 2024, the first Workshop on NLP for Indigenous Languages of Lusophone Countries, co-located with PROPOR 2024 in Santiago de Compostela, Galicia!
Get in touch at: illc-nlp-2024@googlegroups.com
The Lusophone community includes nine Portuguese-speaking nations on four different continents; Portuguese is the sole official language in seven of them and one of the official languages in the other two. While Portuguese is widely spoken in these countries, they are also home to many indigenous and minority languages spoken natively by large populations (e.g. Umbundu in Angola has ~7 million native speakers, Makhuwa in Mozambique ~9 million, and Fang in Equatorial Guinea ~1 million). Many other languages are spoken natively by smaller populations but nevertheless carry an important cultural history (e.g. Brazil alone has 217 recognized indigenous languages).
Despite their prevalence and importance, these languages are seriously under-resourced and under-researched, and many face extinction. As NLP-based technologies reach more spheres of society, with applications in diverse domains, they have begun to include indigenous languages, with tools that help promote and preserve them. However, work focusing on the indigenous languages of Lusophone countries has been limited. ILLC-NLP therefore targets an area in which research, resources, and tools are in dire need of development and promotion.
Call for Papers
The aim of ILLC-NLP is to encourage the development and application of Natural Language Processing (NLP) techniques to indigenous languages of countries where Portuguese is an official language. Such languages are under-resourced and marginalized, despite often having a large number of native speakers, so there is a strong need to develop techniques that: preserve these languages, and with them their indigenous cultures; improve the visibility of marginalized communities; and improve communication and access to information for these communities. To address this need, the ILLC-NLP 2024 workshop brings together researchers and practitioners from academia and industry to share their work, including annotated datasets, methods, trained models, and applications. The workshop will also provide a forum for researchers and practitioners to collaborate on new projects, and will feature a combination of keynote presentations, panel discussions, paper presentations, and interactive hands-on sessions. We will encourage participants to collaborate and develop concrete projects and initiatives during the workshop.
We call for papers by researchers in industry or academia describing work on any topic related to computational language and speech processing of indigenous languages of Lusophone countries. We also welcome work on low-resource languages of Lusophone countries that do not necessarily fall under the indigenous label, including Portuguese creoles (e.g. Cape Verdean Creole, Guinea-Bissau Creole, Papiamento). Topics of interest include, but are not limited to:
Datasets and resources that could help train NLP models and create applications for indigenous and low-resource languages of Lusophone countries.
Models trained to specifically target these languages (e.g. machine translation, text classification, morphological processing etc.).
Applications developed in these languages.
Novel tasks involving these languages.
Resources, models, applications, and tasks for any languages spoken in Lusophone countries other than Portuguese.
Previous NLP work on other low-resource languages.
ILLC-NLP 2024 will be co-located with PROPOR 2024, which will be held at the University of Santiago de Compostela (Santiago de Compostela – Galicia, Spain) from March 14th to March 15th.
Submissions should describe original, unpublished work. Authors are invited to submit two kinds of papers:
Long papers – Reporting substantial and completed work, especially work that may contribute significantly to the advancement of the area. Wherever appropriate, concrete evaluation results should be included. Long papers may consist of up to 8 pages of content, plus unlimited pages of references.
Short papers – Reporting small, focused contributions such as ongoing work, position papers, potential ideas to be discussed, or negative results. Short papers may consist of up to 4 pages of content, plus unlimited pages of references.
Submissions should be written in English and must be submitted in PDF format. For the final versions, authors of accepted papers will be given one extra content page to address the reviewers' comments, and will be asked to send the source files for the production of the proceedings. All submitted papers must conform to the official ACL style guidelines (LaTeX or Word).
Both long and short papers will be published in the ACL Anthology.
Submission site: Papers should be submitted via EasyChair by selecting either the ILLC-NLP2024 Long Paper or the ILLC-NLP2024 Short Paper track.
Reviewing format: At least two reviewers will evaluate each submission. The reviewing format will be single-blind.
Dates
All deadlines are 23:59 AoE (Anywhere on Earth).
Long and short paper submission deadline: 17 Jan 2024 (extended from 10 Jan 2024)
Notification of paper acceptance or rejection: 01 Feb 2024 (extended from 25 Jan 2024)
Camera-ready papers due: 05 Feb 2024 (extended from 01 Feb 2024)
Workshop day: 12 March 2024
Organisers
Aline Paes, Federal Fluminense University
Aline Villavicencio, University of Sheffield and University of Exeter
Claudio Pinhanez, IBM Research Brazil and University of São Paulo
Paulo Rodrigo Cavalin, IBM Research Brazil
Edward Gow-Smith, University of Sheffield
Program Committee
Arnaldo Candido Junior, UTFPR
Leonel Figueiredo de Alencar, UFC
Lílian Teixeira De Sousa, UNICAMP
Marcelo Finger, USP
Marcely Zanon Boito, NAVER Labs
Rodrigo Wilkens, KU Leuven
Program
Fontán Building (room 7)
11:10–12:10: Keynote Speaker (Prof. Leonel Figueiredo de Alencar, UFC)
12:10–1:10: Paper session 1
P1: Network-based Approach for Stopwords Detection (Felermino D. M. A. Ali, Gabriel de Jesus, Henrique Lopes Cardoso, Sérgio Nunes and Rui Sousa-Silva, LIACC, INESC TEC and CLUP, Universidade do Porto, Portugal)
P2: Computational Model for Yoruba Aroko Communication System (Adéwuyì Adétáyò ADÉGBÌTÉ and Odétúnjí Àjàdí ODÉJOBÍ, Akungba Akoko and Ile-Ife, Nigeria)
P3: African Languages: Overview (Joaquim Mussandi and Andreas Wichert, Instituto Superior Técnico-Lisbon University, Portugal)
1:10–2:40: Lunch
2:40–4:00: Paper session 2
P1: A Universal Dependencies Treebank for Nheengatu (Leonel Figueiredo de Alencar, Universidade Federal do Ceará, Brazil)
P2: Human Evaluation of the Usefulness of Fine-Tuned English Translators for the Guarani Mbya and Nheengatu Indigenous Languages (Claudio Pinhanez, Paulo Cavalin and Julio Nogima, IBM Research, Brazil)
P3: Grammar Induction for Brazilian Indigenous Languages (Diego Silva and Thiago Pardo, Universidade de São Paulo, Brazil)
P4: Building a Language-Learning Game for Brazilian Indigenous Languages: A Case of Study (Gustavo Polleti, Universidade de São Paulo, Brazil)
4:00–4:30: Coffee break
4:30–6:00: Panel: Past, Present, and Future of NLP for Indigenous Languages of Lusophone Countries
Panelists: Antonios Anastasopoulos, Éric Le Ferrand, Fabrício Ferraz Gerardi