Students should have taken at least one of:
LIN537.01, Computational Linguistics I (offered Fall 2020)
CSE354, Natural Language Processing
CSE538, Natural Language Processing (offered Fall 2020)
Contact me if you want to take the course but don't meet these requirements.
One small project using existing resources (corpora, morphological analyzers). The goal is to find (or confirm) an interesting fact about Arabic from existing corpora, using existing tools.
One small project creating a small standard tool (such as a morphological analyzer, a tagger, or a parser) for Modern Standard Arabic (MSA) or for one or more of the dialects. Ideally, this project will build on the first one.
One large project, which will build on existing resources (corpora and software), possibly including the first two projects. This project aims to discover interesting empirical facts about Arabic and/or to find computational solutions to problems in Arabic NLP. The final project will be presented (possibly as work in progress) in the final week of the course.
One in-class presentation of a relevant paper (linguistic and/or computational).
All three projects can be done in pairs, ideally pairing one student more versed in linguistic issues with one more versed in computational issues. Because the projects are expected to build on one another, pairs normally remain the same for the whole semester. The expected workload will be adjusted for students who cannot or prefer not to work in a team. Larger teams will be considered on a case-by-case basis.
For a list of sample projects, see here.