The Seventh
Arabic Natural Language Processing Workshop (WANLP 2022)

Workshop Proceedings

at EMNLP 2022 in Abu Dhabi, UAE (Dec 8th, 2022)

with associated event: The Arabic NLP Tutorial (Dec. 7th, 2022)

Workshop Description

Arabic is a challenging language for the field of computational linguistics. This is due to many factors including its complex and rich morphology, its high degree of ambiguity, and the presence of a number of dialects that vary quite widely. Arabic is also a language with important geopolitical connections. It is spoken by over 400 million people in countries with varying degrees of prosperity and stability. It is the primary language of the latest world refugee problem affecting the Middle East and Europe. The opportunities made possible by working on this language and its dialects cannot be overstated in their consequences for the Arab World, the Mediterranean Region, and the rest of the World.

There has been a lot of progress in the last 20 years in the area of Arabic Natural Language Processing (NLP). Many Arabic NLP (or Arabic NLP-related) workshops and conferences have taken place, both in the Arab World and in association with international conferences. Examples include the following:

- The First, Second, Third, Fourth, and Fifth Arabic Natural Language Processing Workshops at EMNLP 2014, ACL 2015, EACL 2017, ACL 2019, and COLING 2020, respectively.
- The First, Second, Third, and Fourth Workshops on Arabic Corpora and Processing Tools at LREC 2014, LREC 2016, LREC 2018, and LREC 2020, respectively.
- The conference on Arabic Language Resources and Tools (MEDAR-2009, NEMLAR-2004).
- The workshop on Computational Approaches to Semitic Languages (LREC 2010, EACL 2009, ACL 2007, ACL 2005, ACL 2002, ACL 1998).
- The workshop on Computational Approaches to Arabic Script-based Languages (MTSummit XII 2009, LSA 2007, COLING 2004).
- The International Symposium on Computer and Arabic Language (ISCAL 2009, ISCAL 2007).

This workshop follows in the footsteps of these efforts to provide a forum for researchers to share and discuss their ongoing work. This workshop is timely given the continued rise in research projects focusing on Arabic NLP.

We invite submissions on topics of natural language processing that include, but are not limited to, the following:

- Basic core technologies: morphological analysis, disambiguation, tokenization, POS tagging, named entity detection, chunking, parsing, semantic role labeling, Arabic dialect modeling, etc.
- Applications: machine translation, speech recognition, speech synthesis, optical character recognition, pedagogy, assistive technologies, social media analytics, sentiment analysis, summarization, dialogue systems, etc.
- Resources: lexicons, dictionaries, annotated and unannotated corpora, etc.

Submissions may include work in progress, as well as finished work that has not been previously published. Submissions must have a clear focus on specific issues pertaining to the Arabic language, whether standard Arabic, dialectal, or mixed. Papers on other languages that share problems faced by Arabic NLP researchers, such as Semitic languages or languages using Arabic script, are welcome. Papers on efforts using Arabic resources but targeting other languages are also welcome. Descriptions of commercial systems are welcome, but authors should be willing to discuss the details of their work.

This workshop is endorsed by SIGARAB, the Association for Computational Linguistics Special Interest Group on Arabic Natural Language Processing.

Important Dates

Workshop Paper Due Date (Extended): ~~September 5~~ September 12
Notification of Acceptance: ~~October 10~~ October 12. The list of accepted papers is available here.
Camera-ready papers due: October 21
Workshop Date (one day): December 8

All deadlines are 11:59 pm UTC-12 (“Anywhere on Earth”).

Invited Speaker: Prof. Karim Bouzoubaa

Title: Digital Preservation of Arabic between Linguistics and AI

Abstract: Languages are one of the oldest studied disciplines as they are intimately linked to the existence of human beings. The study of languages is a multidisciplinary field which has attracted the interest of several related fields such as linguistics and NLP, each providing additional knowledge for language understanding, learning, evolution, or preservation.

From the technological point of view, computer science in general and artificial intelligence in particular study languages through natural language processing techniques, where the main goal is to discover linguistic patterns from corpora without resorting to linguists at all in many cases. Research in this field is diverse and currently benefits from advances in machine learning and deep learning techniques. One of the less studied aspects is the use and exploitation of these techniques for language preservation needs, for language comprehension needs or for the explanation of linguistic phenomena.

The objective of this talk is to emphasize this perspective and to show, through concrete cases, how exploiting several old and new computer and AI techniques can advance the digital preservation of Arabic and the explanation of some linguistic properties.

Short Biography: Karim Bouzoubaa is a Professor of computer science at the Mohammadia School of Engineers at the Mohammed 5th University of Rabat. Prof. Bouzoubaa holds an M.Sc. and a Ph.D. from Laval University in Canada in the fields of Artificial Intelligence and multi-agent systems. He is a research-driven professional with a distinctive combination of leadership, research & development, and education in the areas of Artificial Intelligence and Data Science. He contributed to the release of the Amine platform for the development of intelligent systems. He has published two books and over a hundred papers in top-ranked conferences and journals, taught at undergraduate and postgraduate levels, and worked on various R&D projects. He is the founding president of the Arabic Language Engineering Society in Morocco and the director of the Language Engineering lab. His research interests include Arabic NLP, NLP frameworks, linguistic resources and ontologies, IR and QA systems, dialect processing, and cognitive systems.