Time: R 4:10pm-6:00pm
Place: SEELEY W. MUDD 327
Instructors: Dr. Nizar Habash and Dr. Nadi Tomeh
Email: habash [at] ccls.columbia.edu
Office: 212-870-1289
Office Hours: R 6-8 at the Center for Computational Learning Systems
Teaching Assistant: Wael Salloum
Each session except as indicated in the Syllabus will include:
Grades in this course will be calculated as follows:
Important Dates
All deadlines are by 11:59pm (ET) of the due date unless otherwise specified.
Copying or paraphrasing someone's work (code included), or permitting your own work to be copied or paraphrased, even if only in part, is not allowed, and will result in an automatic grade of 0 for the entire assignment or exam in which the copying or paraphrasing was done. Your grade should reflect your own work. If you believe you are going to have trouble completing an assignment, please talk to the instructor or TA in advance of the due date.
Philipp Koehn's book Statistical Machine Translation is recommended but not required.
All required readings will be available online.
This is a short presentation of around 10 minutes on a particular language, e.g., Arabic, Chinese, Czech, Hindi, Italian, Ewe, or Maltese.
Each student will prepare three to six slides on a language they do not speak natively. The slides must cover (1) language facts (demographics, location, etc.), (2) important linguistic characteristics (orthography, morphology, syntax), and (3) computational efforts such as resources, tools, and papers; for example, how many entries does the language have in the MT Archive, and what are they generally about? Be creative and have fun with this. Asking native speakers or language experts for help is fine, but the student is ultimately responsible for the presentation.
Examples from previous presentations are also available here.
Resources that can help your preparation of slides:
Midterm Report
The midterm report must include the following:
a. Introduction and problem definition
b. Literature review (at least 5 papers)
c. Description of resources used. This may include statistics on the data, OOV rates, and the like.
d. Baseline results (comparable to those in the MT lab, but for your language).
e. Analysis of errors in the baseline, based on a sample of at least 20 sentences and looking at the English side only; focus on the problem you are targeting.
f. Bibliography of cited papers.
The midterm report should be about half the length of a conference paper (that is, 3-4 pages single spaced or 6-8 pages double spaced).
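As one illustration of the data statistics mentioned in item (c), the sketch below computes out-of-vocabulary (OOV) rates of a test set against a training set. It is not an official course script: the function name oov_rates and the toy sentences are hypothetical, and it assumes whitespace-tokenized text. It reports both the token-level rate (share of running words unseen in training) and the type-level rate (share of distinct words unseen in training).

```python
from collections import Counter


def oov_rates(train_tokens, test_tokens):
    """Return (token-level, type-level) OOV rates of test against train.

    Hypothetical helper for illustration; assumes pre-tokenized input.
    """
    vocab = set(train_tokens)
    test_counts = Counter(test_tokens)
    total_tokens = sum(test_counts.values())
    total_types = len(test_counts)
    if total_tokens == 0:
        # An empty test set has no OOV words by convention here.
        return 0.0, 0.0
    # Token-level: every occurrence of an unseen word counts.
    oov_tokens = sum(c for w, c in test_counts.items() if w not in vocab)
    # Type-level: each distinct unseen word counts once.
    oov_types = sum(1 for w in test_counts if w not in vocab)
    return oov_tokens / total_tokens, oov_types / total_types


# Toy data, invented for this example.
train = "the cat sat on the mat".split()
test = "the dog sat on the log".split()
token_rate, type_rate = oov_rates(train, test)
print(f"token OOV: {token_rate:.2%}, type OOV: {type_rate:.2%}")
```

In a real report you would read the tokens from your own training and test files, and you may want to apply the same preprocessing (lowercasing, tokenization) that your MT system uses before counting.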
Final Report
The final report should be in the style of an ACL publication: eight pages, double column, plus any number of pages for references. You will see many examples in class.