The Fourth Arabic Natural Language Processing Workshop

(WANLP 2019)

co-located with ACL 2019, Florence, Italy, July 28-Aug 2, 2019

Workshop Description

Arabic is a challenging language for the field of computational linguistics. This is due to many factors, including its complex and rich morphology, its high degree of ambiguity, and the presence of a number of dialects that vary quite widely. Arabic is also a language with important geopolitical connections. It is spoken by over 400 million people in countries with varying degrees of prosperity and stability. It is the primary language of the latest world refugee problem affecting the Middle East and Europe. The opportunities made possible by working on this language and its dialects should not be underestimated, given their consequences for the Arab World, the Mediterranean Region, and the rest of the world.

There has been a lot of progress in the last 20 years in the area of Arabic Natural Language Processing (NLP). Many Arabic NLP (or Arabic NLP-related) workshops and conferences have taken place, both in the Arab World and in association with international conferences. Examples include the following:

    • The First, Second, and Third Arabic Natural Language Processing Workshops at EMNLP 2014, ACL 2015, and EACL 2017, respectively.
    • The First, Second, and Third Workshops on Arabic Corpora and Processing Tools at LREC 2014, LREC 2016, and LREC 2018, respectively.
    • The conference on Arabic Language Resources and Tools (MEDAR-2009, NEMLAR-2004).
    • The workshop on Computational Approaches to Semitic Languages (LREC 2010, EACL 2009, ACL 2007, ACL 2005, ACL 2002, ACL 1998).
    • The workshop on Computational Approaches to Arabic Script-based Languages (MTSummit XII 2009, LSA 2007, COLING 2004).
    • The International Symposium on Computer and Arabic Language (ISCAL 2009, ISCAL 2007).

This workshop follows in the footsteps of these efforts to provide a forum for researchers to share and discuss their ongoing work. This workshop is timely given the continued rise in research projects focusing on Arabic NLP.

We invite submissions on topics that include, but are not limited to, the following:

    • Basic core technologies: morphological analysis, disambiguation, tokenization, POS tagging, named entity detection, chunking, parsing, semantic role labeling, sentiment analysis, Arabic dialect modeling, etc.
    • Applications: machine translation, speech recognition, speech synthesis, optical character recognition, pedagogy, assistive technologies, social media, etc.
    • Resources: dictionaries, annotated data, corpora, etc.

Submissions may include work in progress as well as finished work. Submissions must have a clear focus on specific issues pertaining to the Arabic language whether it is standard Arabic, dialectal, or mixed. Papers on other languages sharing problems faced by Arabic NLP researchers such as Semitic languages or languages using Arabic script are welcome. Additionally, papers on efforts using Arabic resources but targeting other languages are also welcome. Descriptions of commercial systems are welcome, but authors should be willing to discuss the details of their work.

Associated with the workshop will be a shared task on Arabic dialect identification. Unlike previous shared tasks, which focused on regional-level dialect labeling, this shared task will be the first to target a large set of dialect labels at the city and country levels.

Important Dates

April 26, 2019: Workshop Paper Due Date

May 24, 2019: Notification of Acceptance

June 3, 2019: Camera-ready papers due

August 1-2, 2019: Workshop Dates

Paper Submission Instructions

Paper Length: Submissions are expected to be up to 8 pages long plus any number of pages for references. Final versions of long papers will be given one additional page of content (up to 9 pages) so that reviewers’ comments can be taken into account.

Submission Format: Submissions must be in PDF and prepared using LaTeX. The format must conform to the official ACL 2019 style templates.

Submission Website: Submissions will be made via Softconf. The link will be shared once ready.

Blind Reviewing Policy: The workshop follows a blind reviewing policy. The authors should omit their names and affiliations from the paper and avoid self-references that reveal their identity. Papers that do not conform to these requirements will be rejected without review.

Multiple Submission Policy: Papers that have been or will be submitted to other meetings or publications must indicate this at submission time. Authors must inform the organizers immediately once a paper is to be withdrawn from the workshop for any reason. Attempting to publish the same paper, or a paper with major overlap (50%), elsewhere may lead to rejection of the paper even after an acceptance notification has gone out.

Workshop Schedule

8:30 - 8:40: Opening remarks

8:40 - 9:30: Invited Keynote Speaker

9:30 - 10:30: Session #1 (3 papers)

10:30 - 11:00: Coffee Break

11:00 - 12:20: Session #2 (4 papers)

12:20 - 14:00: Lunch

14:00 - 15:00: Session #3 (3 papers)

15:00 - 16:30: Session #4 (poster session [9-12 papers])

16:30 - 17:00: Coffee Break

17:00 - 18:00: Shared Task Session

    • 17:00 - 17:30: 3 short talks
    • 17:30 - 18:00: Group Discussion

Invited Speaker

Dr. Ahmed Ali of the Qatar Computing Research Institute (QCRI) has agreed to be the keynote speaker at the workshop. He will be talking about the latest research and advances in Arabic dialect speech recognition. The speaker will cover his own expenses.

Workshop Organizers

General Chair:

      • Wassim El-Hajj, American University of Beirut, Lebanon. Email: we07@aub.edu.lb

Program Chairs:

      • Lamia Hadrich Belguith, Sfax University, Tunisia. Email: lamia.belguith@gmail.com
      • Fethi Bougares, University of Le Mans, France. Email: fethi.bougares@univ-lemans.fr
      • Walid Magdy, University of Edinburgh, Scotland. Email: wmagdy@inf.ed.ac.uk
      • Imed Zitouni, Microsoft. Email: izitouni@microsoft.com

Publication Chairs:

      • Nadi Tomeh, LIPN, Université Paris 13, Sorbonne Paris Cité. Email: nadi.tomeh@lipn.univ-paris13.fr
      • Mahmoud El-Haj, Lancaster University, England. Email: m.el-haj@lancaster.ac.uk

Publicity Chair:

      • Wajdi Zaghouani, Hamad Bin Khalifa University, Qatar. Email: wzaghouani@hbku.edu.qa

Ex-General Chair / Advisor:

      • Nizar Habash, New York University Abu Dhabi, UAE. Email: nizar.habash@nyu.edu

Advisory Committee:

      • Hend Al-Khalifa, King Saud University, KSA. Email: hendk@ksu.edu.sa
      • Houda Bouamor, Fortia Financial Solutions, France. Email: houda.bouamor@fortia.fr
      • Fethi Bougares, University of Le Mans, France. Email: fethi.bougares@univ-lemans.fr
      • Kareem Darwish, Qatar Computing Research Institute, Qatar. Email: kdarwish@qf.org.qa
      • Mona Diab, The George Washington University, USA. Email: mtdiab@email.gwu.edu
      • Mahmoud El-Haj, Lancaster University, England. Email: m.el-haj@lancaster.ac.uk
      • Wassim El-Hajj, American University of Beirut, Lebanon. Email: we07@aub.edu.lb
      • Nizar Habash, New York University Abu Dhabi, UAE. Email: nizar.habash@nyu.edu
      • Nadi Tomeh, LIPN, Université Paris 13, Sorbonne Paris Cité. Email: nadi.tomeh@lipn.univ-paris13.fr
      • Wajdi Zaghouani, Hamad Bin Khalifa University, Qatar. Email: wzaghouani@hbku.edu.qa

Program Committee Members

  1. Abdelali Ahmed, Qatar Computing Research Institute, Qatar
  2. Abdul-Mageed Muhammad, The University of British Columbia, Canada
  3. Afli Haithem, Cork Institute of Technology, Ireland
  4. Ali Ahmed, Qatar Computing Research Institute, Qatar
  5. Alkhalifa Hend, King Saud University, Saudi Arabia
  6. Alowsiheq Areeb, Imam University, KSA
  7. Al-Twairesh Nora, King Saud University, Saudi Arabia
  8. Alzahrani Salha, Taif University, Saudi Arabia
  9. B. Al-Said Almoataz, Cairo University, Egypt
  10. Baly Ramy, Massachusetts Institute of Technology, USA
  11. Barrón-Cedeño Alberto, Qatar Computing Research Institute, Qatar
  12. Ben-Hamadou Abdelmajid, University of Sfax, Tunisia
  13. Bouamor Houda, Fortia Financial Solutions, France
  14. Bougares Fethi, Le Mans University, France
  15. Bouzoubaa Karim, Mohammad V University, Morocco
  16. Buckwalter Tim, University of Maryland, USA
  17. Cavalli-Sforza Violetta, Al Akhawayn University, Morocco
  18. Chalabi Achraf, Microsoft Research, Egypt
  19. Choukri Khalid, ELDA, European Language Resource Association, France
  20. Darwish Kareem, Qatar Computing Research Institute, Qatar
  21. Dayel Abeer, King Saud University, Saudi Arabia
  22. Diab Mona, George Washington University, USA
  23. Dichy Joseph, Université Lyon 2, France
  24. El Haj Mahmoud, Lancaster University, UK
  25. El-Hajj Wassim, American University of Beirut, Lebanon
  26. Elmahdy Mohamed, Qatar University, Qatar
  27. Elsayed Tamer, Qatar University, Qatar
  28. Emam Ossama, IBM, USA
  29. Eskander Ramy, Columbia University, USA
  30. Fahmy Aly, Cairo University, Egypt
  31. Farghaly Ali, Monterey Peninsula College, USA
  32. Ghneim Nada, Higher Institute for Applied Sciences and Technology, Syria
  33. Habash Nizar, New York University Abu Dhabi, UAE
  34. Haddad Bassam, University of Petra, Jordan
  35. Hadrich Belguith Lamia, University of Sfax, Tunisia
  36. Hajj Hazem, American University of Beirut, Lebanon
  37. Hamada Salwa, Cairo University, Egypt
  38. Jarrar Mustafa, Bir Zeit University, Palestine
  39. Khadivi Shahram, Tehran Polytechnic, Iran
  40. Maamouri Mohamed, Linguistic Data Consortium, USA
  41. Magdy Walid, University of Edinburgh, Scotland
  42. Mazroui Azzeddine, University Mohamed I, Morocco
  43. Megerdoomian Karine, The MITRE Corporation, USA
  44. Mohamed Emad, Suez Canal University, Egypt
  45. Mourad Ghassan, Lebanese University, Lebanon
  46. Mubarak Hamdy, Qatar Computing Research Institute, Qatar
  47. Nakov Preslav, Qatar Computing Research Institute, Qatar
  48. Nasr Alexis, University of Marseille, France
  49. Nwesri Abdelsalam, University of Tripoli, Libya
  50. Oflazer Kemal, Carnegie Mellon University Qatar, Qatar
  51. Rafea Ahmed, The American University in Cairo, Egypt
  52. Rambow Owen, Columbia University, USA
  53. Refaee Eshrag, Jazan University, Saudi Arabia
  54. Salameh Mohammad, Carnegie Mellon University, Qatar
  55. Sawaf Hassan, eBay Inc., USA
  56. Shaalan Khaled, The British University in Dubai, UAE
  57. Shaban Khaled, Qatar University, Qatar
  58. Smrž Otakar, Institute of Formal and Applied Linguistics, Charles University in Prague, Czech Republic
  59. Tomeh Nadi, University Paris 13, France
  60. Vogel Stephan, Qatar Computing Research Institute, Qatar
  61. Wray Samantha, Qatar Computing Research Institute, Qatar
  62. Zaghouani Wajdi, Hamad Bin Khalifa University, Qatar
  63. Zerrouki Taha, University of Bouira, Algeria
  64. Zitouni Imed, Microsoft Research, USA

MADAR Shared Task: Arabic Fine-Grained Dialect Identification

Introduction: Arabic dialect identification is the task of automatically labeling a segment of speech or text with the dialect it comes from. Most previous work and shared tasks on dialect identification focused on regional-level dialect labeling, including efforts by Zaidan and Callison-Burch, Elfardy and Diab, and the VarDial ADI evaluation campaign (http://alt.qcri.org/vardial2018/index.php?id=campaign). This newly proposed shared task will be the first to target a large set of dialect labels at the city and country levels. The data for the shared task is created or collected under the Multi-Arabic Dialect Applications and Resources (MADAR) project (MADAR Project Page: https://camel.abudhabi.nyu.edu/madar/).

Task: The participants will be given access to two MADAR datasets corresponding to two evaluation tracks: dialect identification in (1) the travel domain and (2) free-text social media (Twitter).

1) 80,000 sentences in the travel domain from a mix of 25 cities.

    • Bouamor, H., Habash, N., Salameh, M., Zaghouani, W., Rambow, O., et al. (2018). The MADAR Arabic Dialect Corpus and Lexicon. In Proceedings of the 11th International Conference on Language Resources and Evaluation. (PDF: http://www.lrec-conf.org/proceedings/lrec2018/pdf/351.pdf)
    • Salameh, M., Bouamor, H. & Habash, N. (2018). Fine-Grained Arabic Dialect Identification. In Proceedings of the 27th International Conference on Computational Linguistics. (PDF: http://aclweb.org/anthology/C18-1113)

2) 2,500 Twitter profiles manually annotated for country (and city when possible) dialects.

The evaluation metrics will include precision, recall, F-score, and accuracy, in addition to a new hierarchical evaluation metric designed for Arabic dialects. Average F-score will be the primary metric.
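
As a rough illustration of the standard (non-hierarchical) metrics listed above, the following Python sketch computes per-label precision, recall, and F-score, their unweighted average, and accuracy. It assumes that "average F-score" means the unweighted mean of per-label F-scores (macro averaging); the official evaluation script may define it differently. The function names and the dialect labels used in the example are illustrative assumptions, not part of the shared task specification.

```python
from collections import Counter


def per_label_prf(gold, pred):
    """Return {label: (precision, recall, f1)} for every label seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1      # correct prediction for label g
        else:
            fp[p] += 1      # p was predicted but is wrong
            fn[g] += 1      # g was the true label but was missed
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of per-label F-scores (assumed meaning of 'average F-score')."""
    scores = per_label_prf(gold, pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


def accuracy(gold, pred):
    """Fraction of items whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Hypothetical labels, for illustration only.
gold = ["CAI", "CAI", "BEI", "TUN"]
pred = ["CAI", "BEI", "BEI", "TUN"]
print(round(macro_f1(gold, pred), 4), accuracy(gold, pred))
```

Macro averaging gives every dialect label equal weight regardless of how many test items it has, which is one plausible reason to prefer it over accuracy when the number of labels is large.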

Participants need to register. Once registered, all participating teams will be provided with a common training data set. A common development set will also be provided. A blind test data set will be used to evaluate the output of the participating teams. An evaluation script will also be provided to all the teams.

The shared task will be hosted through CodaLab.

Shared Task Organizers

    • Nizar Habash, New York University Abu Dhabi, UAE
    • Houda Bouamor, Fortia Financial Solutions, France
    • Sabit Hasan, Carnegie Mellon University Qatar, Qatar