This is the web page for LIN655, Spring 2021.
Link to SBU Class Find system here. Note that the description there is wrong (since this is a course number that is reused).
Tuesdays and Thursdays 11:30-12:50. BLDG: Frey Hall Room: 105. We will meet there in person, but remote participation is possible from the beginning. Check with the instructor.
All types of participation are welcome: enrolled students, auditors, unofficial auditors, spontaneous auditors, occasional auditors.
Join this mailing list for updates about this course: https://groups.google.com/a/stonybrook.edu/g/linguistics_arabicnlp
Taught by: Owen Rambow, owen.rambow@stonybrook.edu
Arabic is a linguistically very interesting and challenging language. These are the aspects the course will concentrate on:
Arabic has a combination of affixal and templatic morphology (both types are used for both inflectional and derivational morphology).
Arabic shows diglossia, meaning that there is a standard written language with no native speakers, and dialects which show wide variation (mainly geographic) and which have not been traditionally written. The standard written language is called Modern Standard Arabic (MSA).
Of course, other aspects of Arabic will also be discussed.
These aspects of Arabic are not unique to it. Languages with templatic morphology include the other Semitic languages (Hebrew, Amharic), and to some extent Germanic languages show similar patterns. Almost all languages have morphology more complex than English. Many languages show extensive dialectal variation and diglossia, such as Tamil or Swiss German. The situation of African American Vernacular English in the US can be analyzed as diglossic.
Because of the morphological and dialectal complexity mentioned in Section 1, Arabic is a technical challenge for NLP: techniques developed for English cannot be applied to Arabic without adaptation.
Arabic is used (in some form) by 270 million people. The market for Arabic NLP is thus substantial.
There has been a rapid increase in NLP for languages other than English, both in academia and in the private sector, including the big technology companies (Google, Amazon, Facebook, Microsoft). It is important to test NLP techniques on multiple languages to understand their generality.
The goal of the course is not simply to appeal to people already working on Arabic NLP. The course is designed to appeal to several groups of people:
Anyone with an interest in computational linguistics and/or natural language processing, i.e., both linguistics and computer science students.
For students with an interest in Arabic linguistics (not specifically computational linguistics), the course will show how existing corpora and computational methods can be used to serve their (non-computational) research interests.
For students with an interest in natural language processing and/or computational linguistics, but who have not so far worked on Arabic, the course will explore how standard approaches can be used for a language which is quite challenging in many ways, and how techniques need to be extended or new techniques developed.
Advanced undergrads are welcome in addition to graduate students. See section on prerequisites.
Native speakers and learners of Arabic are of course welcome, but no knowledge of any Semitic language is required. The instructor does not speak Arabic himself.