Schedule

Links to readings and labs will be added here as we progress through the semester. Readings should be posted the Friday before the relevant Tuesday class, while labs will be posted Thursday mornings just before class. All times are in Pacific Time (UTC-7 until November, then UTC-8).

Week 1: Regular Expressions (8/25, 8/27)

What's this class about? How do computers represent text? What are some basic ways to specify patterns of characters from text, and how might they be useful?

Reading: Syllabus (Tue 8/25), J&M 2.1 (Thu 8/27).

Other recommended reading:

  • This blog post on text encodings and Unicode by David Zentgraf (stop at “Encodings and PHP”)

  • This TechCrunch article by Devin Coldewey on Unicode’s role in the new Reiwa era in Japan

Assignments:


Week 2: Tokenization, Normalization, and Segmentation (9/1, 9/3)

How do we separate out and distinguish words in text? Why is it so challenging?

Reading:

Assignments:


Week 3: Probability, N-Grams, and Smoothing (9/8, 9/10)

How can we model the probabilities of sequences of words when some word sequences have never before been spoken?

Reading:

  • All - J&M 3.1-3.4. Need a probability term refresher? Check out these videos on Bayes' Rule from 3Blue1Brown (created by Grant Sanderson) for a review of the terminology and some beautiful visuals to build your probabilistic intuitions.

  • Sign up for discussion readings on Google Drive

Assignments:

  • Gradescope Worksheet (due Tue 9/8 at 12 PM on Gradescope)

  • Lab 3. Zipf's Law and NGrams, due Wed 9/16 at 10 PM.


Week 4: Vector Semantics (9/15, 9/17)

Speaker: Maria Antoniak, Cornell University

Reading:

Assignments:

  • Gradescope Worksheet (due Tue 9/15 at 12 PM on Gradescope)

  • Lab 4. GloVe Experiments, due Wed 9/23 at 10 PM.


Week 5: Word Sense Disambiguation (9/22, 9/24)

Reading:

Assignment:


Week 6: Part-of-Speech (POS) Tagging (9/29, 10/1)

Reading:

Assignments:

  • Gradescope Worksheet (due Tue 9/29 at 12 PM on Gradescope)

  • Lab 6. HMMs for POS tagging, due Wed 10/7 at 10 PM.


Week 7: Text Classification (10/6, 10/8)

Speaker: Neha Nayak Kennard, UMass Amherst

Reading:

Assignment:


Week 8: Exam Review (10/13, 10/15)

10/13: Midterm Review

10/15: No Class (midterm)

Assignments:


Week 9: Brainstorming (10/20, 10/22)

10/20: Special Topic Planning Session

10/22: Project Proposal Session, Speaker (Yuval Pinter, Georgia Tech)

Assignments:

  • Special Topic Reading Assignments (due 10/23 at 10 PM)

  • Project Quick Proposals (due 10/26 at 10 PM)


Week 10: Automatic Summarization / Project Work (10/27, 10/29)

10/27: Automatic Summarization, Work Session

Readings:

10/29: Work Session, Speaker (Emma Manning, Georgetown)


Week 11: Question Answering / CSS (11/3, 11/5)

11/3: Question Answering, Work Session

Readings:

11/5: Computational Social Science, Speaker (Jonathan Cheng, Apple)

Readings:

Assignments: Project Literature Review + Methods (due 11/11 at 10 PM)


Week 12: Machine Translation / Project Work (11/10, 11/12)

11/10: Machine Translation, Work Session

Readings:

11/12: Work Session, Speaker (Huda Khayrallah, Johns Hopkins University)

Assignments: Peer Review (due 11/18 at 10 PM)


Week 13: Special Topic 5 / Project Presentations (11/17, 11/19)

11/17: Text to Speech

Readings:

11/19: Final Presentations Day 1


Week 14: Project Presentations & Wrap-Up (11/24)

11/24: Final Presentations Day 2


Exam Week

12/4: Final Project Writeup due at 5 PM