Schedule
This schedule is tentative. Links to readings and labs will be updated here as we progress through the semester. Readings should be posted the Thursday before the relevant Monday class, while labs will be posted Wednesday mornings just before class. All times are in Pacific Time (UTC-7 until November 7, then UTC-8).
Week 0: Warmup and Regular Expressions (8/30, 9/1)
Reading: Syllabus (Mon 8/30), J&M 2.1 (Wed 9/1).
Other recommended reading:
This blog post on text encodings and Unicode by David Zentgraf (stop at “Encodings and PHP”)
This TechCrunch article by Devin Coldewey on Unicode’s role in the new Reiwa era in Japan
Assignments:
Biographical Survey due Wednesday 9/8 before class. Please use your g.hmc.edu login.
Worksheet (due Wed 9/1 before class)
Lab 0: Warm Up, due Tue 9/7 at 10 PM
Presentation Slides: Monday, Wednesday (video)
Week 1: Regular Expressions Continued (9/8)
Labor Day Monday - no reading assignment
Assignments:
Lab 1: Discord Chatbot, due Tue 9/14 at 10 PM
Week 2: Tokenization, Segmentation, and Evaluation (9/13, 9/15)
Reading:
All - J&M 2.2-2.4 (except 2.4.3) and 4.7. Also, How to Read a Paper by S. Keshav.
Sign up for discussion readings.
Assignments:
Gradescope Worksheet (due before class Monday)
Lab 2: Tokenization and Segmentation, due Tue 9/21 at 10 PM
Presentation Slides: Monday
Week 3: Probability, N-Grams, and Smoothing (9/20, 9/22)
Reading:
All - J&M 3.1-3.4. Need a probability term refresher? Check out these videos on Bayes' Rule from 3Blue1Brown (created by Grant Sanderson) to refresh on terminology and for some beautiful visuals to help your probabilistic intuitions.
Sign up for discussion readings.
Assignments:
Gradescope Worksheet (due before class Monday)
Lab 3: Zipf's Law and NGrams, due Tue 9/28 at 10 PM.
Presentation Slides: Monday (video)
Week 4: Vector Semantics (9/27, 9/29)
Reading:
All - J&M 6.1-6.5, 6.10-6.12
Sign up for discussion readings.
Assignments:
Gradescope Worksheet (due before class Monday)
Lab 4: GloVe Vectors, due Tue 10/5 at 10 PM.
Presentation Slides: Monday (video)
Week 5: Word Sense Disambiguation (10/4, 10/6)
Reading:
All - J&M 18.1-18.5.1, 4.8-4.9 (starting on the bottom of page 13).
Sign up for a SemEval paper to discuss.
Assignments:
Gradescope Worksheet (due before class Monday)
Lab 5: Senseval Decision List Classification, due Thursday 10/14 at 10 PM.
Presentation Slides: Monday (video)
Week 6: POS Tagging and NLP Ethics (10/11, 10/13)
Reading:
Assignments:
Gradescope Worksheets (due before class Monday and Wednesday)
In-class exercise: Evaluating Ethics Criteria for NLP
Wednesday is also a discussion day this week, but please still bring your laptop!
Presentation Slides: Monday (video), Wednesday
Week 7: POS Tagging Continued (10/20)
Fall Break Monday & Tuesday - no reading assignment
Assignment:
Lab 6: Part of Speech Tagging, due Tuesday 10/26 at 10 PM.
Week 8: Text Classification (10/25, 10/27)
Reading:
All - J&M 4-4.5, Don't Patronize Me dataset paper
Assignment:
Lab 7: Classification, due Tuesday 11/2 at 10 PM. (Link pending; for now, see SemEval 2022 PCL Detection task page.)
Presentation Slides: Monday
Week 9: Project Brainstorming and Midterm Review (11/1, 11/3)
Monday - Midterm Review. Please sign up for a topic to help write questions on, and be ready to contribute to some brainstorming! (Also fill in this survey about special topics so I can assign groups.)
Wednesday - Project Pitches, Guest Speaker: Angie McMillan-Major. Come with some project ideas!
Assignments:
Sign up for a special projects presentation day (due 11/5)
Project Pitch (due Sunday 11/14 at 10 PM). See the Final Projects page for more information.
Week 10: Midterm (11/8, 11/10)
Monday - Class canceled for the take-home midterm.
Wednesday - A bit about neural nets and NLP. Guest Speaker: Jonathan Chang.
Week 11: Project Work and Special Topics (11/15, 11/17)
Monday:
ST - Contextual Embeddings: Arora et al. (2020), "Contextual embeddings: when are they worth it?"
ST - Machine Translation: Ruiz and Federico (2014), "Complexity of spoken versus written language for machine translation"
Wednesday: Final project/lit review work time
Assignments:
Reading linked above (no quiz)
Related work + methods due December 2 by 10 PM on Gradescope (please bring a rough draft of related work for 11/22).
Week 12: Project Work and Special Topics (11/22)
Monday:
ST - Text Generation: Radford et al. (2019), "Language models are unsupervised multitask learners"
Peer Review Exercise for Literature Review. Please bring a draft of your Related Work section for your literature review to class for our exercise.
(Note: attendance is mandatory for the literature review exercise, but you may attend by Zoom if you give me advance notice.)
Peer Review Worksheet (make a copy and share with the person you're reviewing): https://docs.google.com/document/d/196snRKtAyGElTt4dD3wCncKgPr-GfwWxXPVEZ2dtdDU/edit?usp=sharing
If you need to upload your PDF to share it, you can do so here: https://drive.google.com/drive/u/0/folders/1todNB7O-46vQ-zrLndCTkardEg-heVlC
Assignments:
Reading linked above (no quiz)
Thanksgiving Break starts Wednesday 11/24 - no class
Week 13: Project Work and Special Topics (11/29, 12/1)
Monday:
ST - Automatic Speech Recognition: Chan et al. (2016), "Listen, attend, and spell: A neural network for large vocabulary conversational speech recognition"
Wednesday:
ST - Dialogue Systems: Serban et al. (2016), "Building end-to-end dialogue systems using generative hierarchical neural network models"
ST - Cultural Analytics: Bagga et al. (2021), "'Are you kidding me?': Detecting Unpalatable Questions on Reddit"
Assignments:
Reading linked above
Sign up for a presentation slot.
Week 14: Presentations and Wrapping Up (12/6, 12/8)
Monday, December 6: Presentation Day 1. Prepare a short presentation for class. Prof. Xanda will talk a bit about topic models, too.
Wednesday, December 8: Presentation Day 2. Prepare a short presentation for class. We'll have some reflective conversation about what we learned in this class. Also, course evaluations!
Finals Week
Final Paper due December 16th at 12 PM (noon).