Schedule
Links to readings and labs will be added here as we progress through the semester. Readings should be posted the Friday before the relevant Tuesday class, while labs will be posted Thursday mornings just before class. All times are in Pacific Time (UTC-7 until November, then UTC-8).
Week 1: Regular Expressions (8/25, 8/27)
What's this class about? How do computers represent text? What are some basic ways to specify patterns of characters from text, and how might they be useful?
Reading: Syllabus (Tue 8/25), J&M 2.1 (Thu 8/27).
Other recommended reading:
This blog post on text encodings and Unicode by David Zentgraf (stop at “Encodings and PHP”)
This TechCrunch article by Devin Coldewey on Unicode’s role in the new Reiwa era in Japan
Assignments:
Biographical Survey due Thu 8/27 before class. Please use your g.hmc.edu login.
Worksheet (due Thu 8/27 at 12 PM on Gradescope)
Lab 1. RegExp Golf, Making a Slackbot, due Wed 9/2 at 10 PM.
Week 2: Tokenization, Normalization, and Segmentation (9/1, 9/3)
How do we separate out and distinguish words in text? Why is it so challenging?
Reading:
All - J&M 2.2-2.4 except 2.4.3, 4.7. Also, How to Read a Paper by S. Keshav.
Sign up for discussion readings on Google Drive
Assignments:
Gradescope In-Class Worksheet (due Tue 9/1 at 12 PM on Gradescope)
Lab 2. Tokenization and Normalization, due Wed 9/9 at 10 PM.
Week 3: Probability, N-Grams, and Smoothing (9/8, 9/10)
How can we model the probabilities of sequences of words when some word sequences have never before been spoken?
Reading:
All - J&M 3.1-3.4. Need a refresher on probability terminology? Check out these videos on Bayes' Rule from 3Blue1Brown (created by Grant Sanderson), which review the terminology and offer some beautiful visuals to build your probabilistic intuition.
Sign up for discussion readings on Google Drive
Assignments:
Gradescope Worksheet (due Tue 9/8 at 12 PM on Gradescope)
Lab 3. Zipf's Law and NGrams, due Wed 9/16 at 10 PM.
Week 4: Vector Semantics (9/15, 9/17)
Speaker: Maria Antoniak, Cornell University
Reading:
All - J&M 6.1-6.5, 6.10-6.12
Sign up for discussion readings on Google Drive
Assignments:
Gradescope Worksheet (due Tue 9/15 at 12 PM on Gradescope)
Lab 4. GloVe Experiments, due Wed 9/23 at 10 PM.
Week 5: Word Sense Disambiguation (9/22, 9/24)
Reading:
All - J&M 19.1-19.5.1, 4.8-4.9 (starting on page 14).
Sign up for an ACL 2020 paper to discuss.
Assignments:
Gradescope Worksheet (due Tue 9/22 at 12 PM on Gradescope)
Lab 5. Senseval Decision List Classification, due Wed 9/30 at 10 PM.
Week 6: Part-of-Speech (POS) Tagging (9/29, 10/1)
Reading:
All - J&M 8-8.4
Sign up for POS paper readings
Assignments:
Gradescope Worksheet (due Tue 9/29 at 12 PM on Gradescope)
Lab 6. HMMs for POS tagging, due Wed 10/7 at 10 PM.
Week 7: Text Classification (10/6, 10/8)
Speaker: Neha Nayak Kennard, UMass Amherst
Reading:
All - J&M 4-4.5, SemEval Hyperpartisan News task paper
Sign up for text classification paper readings.
Assignments:
Gradescope Worksheet (due Tue 10/6 at 12 PM on Gradescope)
Lab 7. Hyperpartisan News Detection, due Wed 10/14 at 10 PM.
Week 8: Exam Review (10/13, 10/15)
10/13: Midterm Review
10/15: No Class (midterm)
Assignments:
Midterm Exam Released at 10 PM, due 10/16 at 10 PM
Sign up for a special topic presentation.
Week 9: Brainstorming (10/20, 10/22)
10/20: Special Topic Planning Session
10/22: Project Proposal Session, Speaker (Yuval Pinter, Georgia Tech)
Assignments:
Special Topic Reading Assignments (due 10/23 at 10 PM)
Project Quick Proposals (due 10/26 at 10 PM)
Week 10: Automatic Summarization / Project Work (10/27, 10/29)
10/27: Automatic Summarization, Work Session
Readings:
10/29: Work Session, Speaker (Emma Manning, Georgetown)
Week 11: Question Answering / CSS (11/3, 11/5)
11/3: Question Answering, Work Session
Readings:
The NarrativeQA Reading Comprehension Challenge (click Download Options)
11/5: Computational Social Science, Speaker (Jonathan Cheng, Apple)
Readings:
Assignments: Project Literature Review + Methods (due 11/11 at 10 PM)
Week 12: Machine Translation / Project Work (11/10, 11/12)
11/10: Machine Translation, Work Session
Readings:
11/12: Work Session, Speaker (Huda Khayrallah, Johns Hopkins University)
Assignments: Peer Review (due 11/18 at 10 PM)
Week 13: Special Topic 5 / Project Presentations (11/17, 11/19)
11/17: Text to Speech
Readings:
11/19: Final Presentations Day 1
Week 14: Project Presentations & Wrap-Up (11/24)
11/24: Final Presentations Day 2
Exam Week
12/4: Final Project Writeup due at 5 PM