Schedule
This schedule is tentative. Links to readings and labs will be added here as we progress through the semester. Readings should be posted the Friday before the relevant Tuesday class, while labs will be posted Thursday mornings just before class. All times are in Pacific Time (UTC-8 until March 14, then UTC-7).
Week 1: Regular Expressions (1/26, 1/28)
What's this class about? How do computers represent text? What are some basic ways to specify patterns of characters from text, and how might they be useful?
Reading: Syllabus (Tue 1/26), J&M 2.1 (Thu 1/28).
Other recommended reading:
This blog post on text encodings and Unicode by David Zentgraf (stop at “Encodings and PHP”)
This TechCrunch article by Devin Coldewey on Unicode’s role in the new Reiwa era in Japan
Slides from day 1: https://drive.google.com/file/d/1mTOmyI29XBLaedxkyk0LeAQdskE_j7GH/view?usp=sharing
Assignments:
Biographical Survey due Thu 1/28 before class. Please use your g.hmc.edu login.
Worksheet (due Thu 1/28 by 5 PM on Gradescope)
Lab 1. RegExp Golf, Making a Discord Bot, due Wed 2/3 at 10 PM on Gradescope.
Week 2: Tokenization, Normalization, and Segmentation (2/2, 2/4)
How do we separate out and distinguish words in text? Why is it so challenging?
Reading:
All - J&M 2.2-2.4 except 2.4.3, 4.7. Also, How to Read a Paper by S. Keshav.
Sign up for discussion readings.
Assignments:
Gradescope In-Class Worksheet (due Tue 2/2 at 5 PM on Gradescope)
Lab 2. Tokenization and Normalization, due Wed 2/10 at 10 PM.
Week 3: Probability, N-Grams, and Smoothing (2/9, 2/11)
How can we model the probabilities of sequences of words when some word sequences have never before been spoken?
Reading:
All - J&M 3.1-3.4. Need a probability term refresher? Check out these videos on Bayes' Rule from 3Blue1Brown (created by Grant Sanderson) to refresh on terminology and for some beautiful visuals to help your probabilistic intuitions.
Sign up for discussion readings.
Assignments:
Gradescope Worksheet (due Tue 2/9 at 5 PM on Gradescope)
Lab 3. Zipf's Law and NGrams, due Wed 2/17 at 10 PM.
Week 4: Vector Semantics (2/16, 2/18)
Reading:
All - J&M 6.1-6.5, 6.10-6.12
Sign up for discussion readings.
Assignments:
Gradescope Worksheet (due Tue 2/16 at 5 PM on Gradescope)
Lab 4. GloVe Experiments, due Wed 2/24 at 10 PM.
Week 5: Word Sense Disambiguation (2/23, 2/25)
Reading:
All - J&M 18.1-18.5.1, 4.8-4.9 (starting on the bottom of page 13).
Sign up for a SemEval 2020 paper to discuss.
Assignments:
Gradescope Worksheet (due Tue 2/23 at 5 PM on Gradescope)
Lab 5. Senseval Decision List Classification, due Wed 3/3 at 10 PM.
Week 6: Project Ramp-Up (3/2, 3/4)
Reading:
Sign up for in-class groups here.
Hovy and Spruit, The Social Impact of Natural Language Processing (2016)
Mitchell et al., Model Cards for Model Reporting (2018)
Assignments:
Gradescope Worksheet (due Tue 3/2 at 5 PM on Gradescope)
Project Proposals (due Wed 3/24 at 10 PM)
Spring Break 3/6-3/14, no class
Week 7: Part-of-Speech Tagging (3/16, 3/18)
Reading:
Assignments:
Gradescope Worksheet (due Tue 3/16 at 5 PM on Gradescope)
Sign up for special topics (link out 3/18), due Tue 3/23
Lab 6. HMMs for POS tagging, due Wed 3/24 at 10 PM.
Week 8: Text Classification (3/23, 3/25)
Reading:
All - J&M 4-4.5, SemEval Hyperpartisan News task paper
Sign up for readings / answer questions about readings on Gradescope (no long discussion).
Assignments:
Gradescope Worksheet (due Tue 3/23 at 5 PM on Gradescope)
Lab 7. Hyperpartisan News Detection, due Wed 3/31 at 10 PM.
Week 9: Midterm Review (3/30, 4/1)
3/30: Midterm Review
4/1: No Class (midterm)
Assignments:
Midterm Exam Released 3/31 at 10 PM, due 4/5 at 10 PM
Week 10: Brainstorming (4/6, 4/8)
4/6: Shell Basics and Special Topics Coordination
4/8: Special Topic 1
Slides: UNIX basics (and detailed guide)
Assignments: Literature Review (due 4/21 at 10 PM)
Please have a first draft of your related work (methods optional) for peer review 4/15.
Week 11: Special Topics/Projects (4/13, 4/15)
4/13: Special Topic 2
4/15: Literature Review Peer Review Session
Assignments:
Week 12: Special Topics/Projects (4/20, 4/22)
4/20: Special Topics 3 and 4
4/22: Invited Speaker, Work Session
Section 1: Zhijing Jin, Max Planck Institute
Section 2: Rishi Bommasani, Stanford
Assignments: Lit Review due 4/21 at 10 PM, Project Update Video (due 4/28 at 10 PM)
Week 13: Special Topics/Projects (4/27, 4/29)
4/27: Special Topics 5 and 6
4/29: Invited Speaker, Work Session
Section 1: Tal Perry, LightTag
Section 2: Vasundhara Gautam, Dialpad, Inc.
Assignments:
Project video peer reviews, due Mon 5/3 at 10 PM
Final Project Submission, due 5 PM on May 14th
Week 14: Wrapping Up (5/4, 5/6)
5/4: Topic Models, Work Session
5/6: Invited Speaker, Evals, Closing Thoughts
Section 1: Sabrina Mielke, Johns Hopkins University
Section 2: Volkan Cirik, CMU
Exam Week
5/14: Final Project Writeup due at 5 PM*
*This is the later of the two exam dates.