This schedule is tentative. Links to readings and labs will be added here as the semester progresses. Readings should be posted the Friday before the relevant Tuesday class, and labs will be posted Thursday mornings just before class. All times are in Pacific Time (UTC-8 until March 14, then UTC-7).
What's this class about? How do computers represent text? What are some basic ways to specify patterns of characters from text, and how might they be useful?
Reading: Syllabus (Tue 1/26), J&M 2.1 (Thu 1/28).
Other recommended reading:
This blog post on text encodings and Unicode by David Zentgraf (stop at “Encodings and PHP”)
This TechCrunch article by Devin Coldewey on Unicode’s role in the new Reiwa era in Japan
Slides from day 1: https://drive.google.com/file/d/1mTOmyI29XBLaedxkyk0LeAQdskE_j7GH/view?usp=sharing
Assignments:
Biographical Survey due Thu 1/28 before class. Please use your g.hmc.edu login.
Worksheet (due Thu 1/28 by 5 PM on Gradescope)
Lab 1. RegExp Golf, Making a Discord Bot, due Wed 2/3 at 10 PM on Gradescope.
How do we separate out and distinguish words in text? Why is it so challenging?
Reading:
All - J&M 2.2-2.4 except 2.4.3, 4.7. Also, How to Read a Paper by S. Keshav.
Sign up for discussion readings.
Assignments:
Gradescope In-Class Worksheet (due Tue 2/2 at 5 PM on Gradescope)
Lab 2. Tokenization and Normalization, due Wed 2/10 at 10 PM.
How can we model the probabilities of sequences of words when some word sequences have never before been spoken?
Reading:
All - J&M 3.1-3.4. Need a probability term refresher? Check out these videos on Bayes' Rule from 3Blue1Brown (created by Grant Sanderson) to refresh on terminology and for some beautiful visuals to help your probabilistic intuitions.
Sign up for discussion readings.
Assignments:
Gradescope Worksheet (due Tue 2/9 at 5 PM on Gradescope)
Lab 3. Zipf's Law and NGrams, due Wed 2/17 at 10 PM.
Reading:
All - J&M 6.1-6.5, 6.10-6.12
Sign up for discussion readings.
Assignments:
Gradescope Worksheet (due Tue 2/16 at 5 PM on Gradescope)
Lab 4. GloVe Experiments, due Wed 2/24 at 10 PM.
Reading:
All - J&M 18.1-18.5.1, 4.8-4.9 (starting on the bottom of page 13).
Sign up for a SemEval 2020 paper to discuss.
Assignments:
Gradescope Worksheet (due Tue 2/23 at 5 PM on Gradescope)
Lab 5. Senseval Decision List Classification, due Wed 3/3 at 10 PM.
Reading:
Sign up for in-class groups here.
Hovy and Spruit, The Social Impact of Natural Language Processing (2016)
Mitchell et al., Model Cards for Model Reporting (2018)
Assignments:
Gradescope Worksheet (due Tue 3/2 at 5 PM on Gradescope)
Project Proposals (due Wed 3/24 at 10 PM)
Reading:
Assignments:
Gradescope Worksheet (due Tue 3/16 at 5 PM on Gradescope)
Sign up for special topics (link out 3/18), due Tue 3/23
Lab 6. HMMs for POS tagging, due Wed 3/24 at 10 PM.
Reading:
All - J&M 4-4.5, SemEval Hyperpartisan News task paper
Sign up for readings / answer questions about readings on Gradescope (no long discussion).
Assignments:
Gradescope Worksheet (due Tue 3/23 at 5 PM on Gradescope)
Lab 7. Hyperpartisan News Detection, due Wed 3/31 at 10 PM.
3/31: Midterm Review
4/1: No Class (midterm)
Assignments:
Midterm Exam released 3/31 at 10 PM, due 4/5 at 10 PM
4/6: Shell Basics and Special Topics Coordination
4/8: Special Topic 1
Slides: UNIX basics (and detailed guide)
Assignments: Literature Review (due 4/21 at 10 PM)
Please have a first draft of your related work (methods optional) ready for peer review on 4/15.
4/13: Special Topic 2
4/15: Literature Review Peer Review Session
Assignments:
4/20: Special Topics 3 and 4
4/22: Invited Speaker, Work Session
Section 1: Zhijing Jin, Max Planck Institute
Section 2: Rishi Bommasani, Stanford
Assignments: Literature Review (due 4/21 at 10 PM), Project Update Video (due 4/28 at 10 PM)
4/27: Special Topics 5 and 6
4/29: Invited Speaker, Work Session
Section 1: Tal Perry, LightTag
Section 2: Vasundhara Gautam, Dialpad, Inc.
Assignments:
Project video peer reviews, due Mon 5/3 at 10 PM
Final Project Submission, due Fri 5/14 at 5 PM
5/4: Topic Models, Work Session
5/6: Invited Speaker, Evals, Closing Thoughts
Section 1: Sabrina Mielke, Johns Hopkins University
Section 2: Volkan Cirik, CMU
5/14: Final Project Writeup due at 5 PM*
*This is the later of the two exam dates.