CS 173: Introduction to Natural Language Processing, Spring 2023
Welcome to CS 173!
Join Slack and eLearn (links in navigation bar above) if you haven't yet.
FAQ
Slack Sign-up: email the professor to register
Workspace URL: ucr-cs173-2023sp.slack.com — please post all questions and answers in the #general channel. Please do not direct-message the instructor unless the matter concerns (1) medical issues or (2) grades. Make sure you are signed up and participating on Slack BEFORE the first day of class. "I did not get invited to Slack" is no excuse for missing important announcements.
Class overview
Students will gain an overview of modern approaches to Natural Language Processing (NLP). Students will learn the theory and algorithms of NLP through applications such as part-of-speech tagging, parsing, named entity recognition, coreference resolution, sentiment analysis, and machine translation.
Students who successfully complete this course will be able to:
Implement algorithms for basic language models
Install, configure and run sophisticated NLP toolkits
Identify which NLP tasks apply to given real world problems involving unstructured text data
Apply standard modelling techniques to a given NLP task
Research and apply current NLP modelling techniques to solve novel problems
Course Details
Prerequisites: CS 150
Format: The course consists of two 80-minute lectures and a 50-minute discussion per week. Students are highly encouraged to use office hours for homework help and exam review.
Instructor: Yue Dong, Ph.D. — please communicate via Slack (see above), not email
Teaching Assistant: Jannat Ara Meem
Lectures: TTh 9:30am - 10:50am, Materials Sci and Engineering | Room 103
Discussion: W 11:00 - 11:50 am, Winston Chung Hall | Room 143
Preliminary Schedule
Act 1: Preliminaries
Weeks 1-3:
Introduction & Regular Expressions (ch 2)
N-gram Language Models (ch 3)
EXAM (April 18th, 2023)
Naive Bayes (ch 4)
Weekly small assignments
Act 2: Modeling Techniques
Weeks 4-7:
Logistic Regression (ch 5)
Vector Semantics (ch 6)
Hidden Markov Models (Appendix A)
Weekly small assignments
EXAM
Act 3: Linguistics & Applications
Weeks 8-10:
Part of Speech Tagging (ch 8)
Entity and Relation Extraction (ch 18)
Question Answering, Information Retrieval (ch 14)
Chatbots and Dialogue Systems (ch 15)
FINAL PROJECTS PRESENTATIONS
FINAL EXAM
Homeworks and Tutorials
Study problems and tutorial problems can be found on the Tutorials page.
You are expected to come to the tutorial class each week and make a good-faith attempt to work on the problems in your group.
Each Sunday evening (weeks 3-10), one or two pieces of homework will be due on eLearn:
an autograded assignment
a typed-out solution
Grading
Homeworks — 30% — if you do well on homeworks, you will do well in this course
Midterms (two) — 30% — if you complete homeworks successfully, you should excel on midterms
Final — 20% — if you excel on the midterms, you should excel on the final, which is comprehensive
Quizzes, Attendance, Participation, Office Hours, and other activities — 20% — instructor's discretion
Standard +/- Scale: 92% or higher is the cutoff for an A- (similarly for B, C, D); 87% or higher is the cutoff for a B+ (and so on). A+ is awarded at the instructor's discretion. 59% or less is an F.
Textbook
Required Textbook: Dan Jurafsky and James Martin, Speech and Language Processing, 3rd ed. — the draft is freely available online at web.stanford.edu/~jurafsky/slp3/
Additional Recommended Reference:
C.D. Manning & H. Schütze, Foundations of Statistical Natural Language Processing
Office hours
Instructor Office Hours: TTh 8:30-9:20 am MRB 4135
TA Office Hours: Tuesday 11:00am -12:00 pm at WCH 363
Additional Support: Academic Resources Center (ARC), 156 Surge, http://www.arc.ucr.edu
About me
Hi, I am Yue Dong, an assistant professor of CSE at UCR. My research interests include natural language processing, machine learning, and artificial intelligence. I lead the Natural Language Processing group at UCR, which develops natural language understanding and generation systems that are controllable, trustworthy, and efficient.