Schedule

This schedule is tentative. Links to readings and labs will be added here as we progress through the semester. Readings should be posted the Friday before the relevant Tuesday class, while labs will be posted Thursday mornings just before class. All times are in Pacific Time (UTC-8 until March 14, then UTC-7).

Week 1: Regular Expressions (1/26, 1/28)

What's this class about? How do computers represent text? What are some basic ways to specify patterns of characters from text, and how might they be useful?

Reading: Syllabus (Tue 1/26), J&M 2.1 (Thu 1/28).

Other recommended reading:

  • This blog post on text encodings and Unicode by David Zentgraf (stop at “Encodings and PHP”)

  • This TechCrunch article by Devin Coldewey on Unicode’s role in the new Reiwa era in Japan

Slides from day 1: https://drive.google.com/file/d/1mTOmyI29XBLaedxkyk0LeAQdskE_j7GH/view?usp=sharing

Assignments:


Week 2: Tokenization, Normalization, and Segmentation (2/2, 2/4)

How do we separate out and distinguish words in text? Why is it so challenging?

Reading:

Assignments:


Week 3: Probability, N-Grams, and Smoothing (2/9, 2/11)

How can we model the probabilities of sequences of words when some word sequences have never before been spoken?

Reading:

  • All - J&M 3.1-3.4. Need a probability term refresher? Check out these videos on Bayes' Rule from 3Blue1Brown (created by Grant Sanderson) to refresh on terminology and for some beautiful visuals to help your probabilistic intuitions.

  • Sign up for discussion readings.

Assignments:

  • Gradescope Worksheet (due Tue 2/9 at 5 PM on Gradescope)

  • Lab 3. Zipf's Law and N-Grams, due Wed 2/17 at 10 PM.


Week 4: Vector Semantics (2/16, 2/18)

Reading:

Assignments:

  • Gradescope Worksheet (due Tue 2/16 at 5 PM on Gradescope)

  • Lab 4. GloVe Experiments, due Wed 2/24 at 10 PM.


Week 5: Word Sense Disambiguation (2/23, 2/25)

Reading:

Assignment:


Week 6: Project Ramp-Up (3/2, 3/4)

Reading:

Assignments:

  • Gradescope Worksheet (due Tue 3/2 at 5 PM on Gradescope)

  • Project Proposals (due Wed 3/24 at 10 PM)


Spring Break 3/6-3/14, no class


Week 7: Part-of-Speech Tagging (3/16, 3/18)

Reading:

Assignments:

  • Gradescope Worksheet (due Tue 3/16 at 5 PM on Gradescope)

  • Sign up for special topics (link out 3/18), due Tue 3/23

  • Lab 6. HMMs for POS tagging, due Wed 3/24 at 10 PM.


Week 8: Text Classification (3/23, 3/25)

Reading:

Assignment:


Week 9: Midterm Review (3/30, 4/1)

3/30: Midterm Review

4/1: No Class (midterm)

Assignments:

  • Midterm Exam Released 3/31 at 10 PM, due 4/5 at 10 PM


Week 10: Brainstorming (4/6, 4/8)

4/6: Shell Basics and Special Topics Coordination

4/8: Special Topic 1

Readings: Section 1/Section 2

Slides: UNIX basics (and detailed guide)

Assignments: Literature Review (due 4/21 at 10 PM)
Please have a first draft of your related work section (methods optional) ready for peer review on 4/15.

Week 11: Special Topics/Projects (4/13, 4/15)

4/13: Special Topic 2

Readings: Section 1/Section 2

4/15: Literature Review Peer Review Session

Assignments:


Week 12: Special Topics/Projects (4/20, 4/22)

4/20: Special Topics 3 and 4

Readings: Section 1/Section 2

4/22: Invited Speaker, Work Session

Section 1: Zhijing Jin, Max Planck Institute

Section 2: Rishi Bommasani, Stanford

Assignments: Literature Review (due 4/21 at 10 PM), Project Update Video (due 4/28 at 10 PM)


Week 13: Special Topics/Projects (4/27, 4/29)

4/27: Special Topics 5 and 6

Readings: Section 1/Section 2

4/29: Invited Speaker, Work Session

Section 1: Tal Perry, LightTag

Section 2: Vasundhara Gautam, Dialpad, Inc.

Assignments:

  • Project video peer reviews, due Monday 5/3 at 10 PM

  • Final Project Submission, due 5/14 at 5 PM


Week 14: Wrapping Up (5/4, 5/6)

5/4: Topic Models, Work Session

5/6: Invited Speaker, Evals, Closing Thoughts

Section 1: Sabrina Mielke, Johns Hopkins University

Section 2: Volkan Cirik, CMU


Exam Week

5/14: Final Project Writeup due at 5 PM*

*This is the later of the two exam dates.