CS 173: Introduction to Natural Language Processing, Spring 2023
Welcome to CS 173!
Join Slack and eLearn (links in navigation bar above) if you haven't yet.
FAQ
Slack Sign-up: email the professor to register
Workspace URL: ucr-cs173-2023sp.slack.com — please post all questions and answers in the #general channel. Please do not direct-message the instructor unless the matter concerns (1) medical issues or (2) grades. Make sure you are signed up and participating on Slack BEFORE the first day of class. "I did not get invited to Slack" is no excuse for missing important announcements.
Class overview
Students will gain an overview of modern approaches to Natural Language Processing (NLP). Students will learn the theory and algorithms of NLP through applications such as part-of-speech tagging, parsing, named entity recognition, coreference resolution, sentiment analysis, and machine translation.
Students who successfully complete this course will be able to:
Implement algorithms for basic language models
Install, configure and run sophisticated NLP toolkits
Identify which NLP tasks apply to given real world problems involving unstructured text data
Apply standard modelling techniques to a given NLP task
Research and apply current NLP modelling techniques to solve novel problems
Course Details
Prerequisites: CS 150
Format: The course consists of two 80-minute lectures and a 50-minute discussion per week. Students are highly encouraged to use office hours for homework help and exam review.
Instructor: Yue Dong, Ph.D. — please communicate via Slack (see above), not email
Teaching Assistant: Jannat Ara Meem
Lectures: TTh 9:30am - 10:50am, Materials Sci and Engineering | Room 103
Discussion: W 11:00 - 11:50 am, Winston Chung Hall | Room 143
Preliminary Schedule
Act 1: Preliminaries
Weeks 1-3:
Introduction & Regular Expressions (ch 2)
N-gram Language Models (ch 3)
EXAM (April 18th, 2023)
Naive Bayes (ch 4)
Weekly small assignments
Act 2: Modeling Techniques
Weeks 4-7:
Logistic Regression (ch 5)
Vector Semantics (ch 6)
Hidden Markov Models (Appendix A)
Weekly small assignments
EXAM
Act 3: Linguistics & Applications
Weeks 8-10:
Part of Speech Tagging (ch 8)
Entity and Relation Extraction (ch 18)
Question Answering, Information Retrieval (ch 14)
Chatbots and Dialogue Systems (ch 15)
FINAL PROJECTS PRESENTATIONS
FINAL EXAM
Homeworks and Tutorials
Study problems and tutorial problems can be found on the Tutorials page.
You are expected to come to the tutorial class each week and make a good-faith attempt to work on the problems in your group.
Each Sunday evening (weeks 3-10), one or two pieces of homework will be due on eLearn:
an autograded assignment
a typed-out solution
Grading
Homeworks — 30% — if you do well on homeworks, you will do well in this course
Midterms (two) — 30% — if you complete homeworks successfully, you should excel on midterms
Final — 20% — if you excel on the midterms, you should excel on the final, which is comprehensive
Quizzes, Attendance, Participation, Office Hours, and other activities — 20% — instructor's discretion
Standard +/- Scale: 92% or higher is the cutoff for an A- (similarly for B, C, D); 87% or higher is the cutoff for a B+ (and so on). A+ is awarded at the instructor's discretion. 59% or less is an F.
Textbook
Required Textbook: Dan Jurafsky and James Martin, Speech and Language Processing, 3rd ed. — the draft is freely available online at web.stanford.edu/~jurafsky/slp3/
Additional Recommended Reference:
C.D. Manning & H. Schütze, Foundations of Statistical Natural Language Processing
Office hours
Instructor Office Hours: TTh 8:30-9:20 am MRB 4135
TA Office Hours: Tuesday 11:00am -12:00 pm at WCH 363
Additional Support: Academic Resources Center (ARC), 156 Surge, http://www.arc.ucr.edu
About me
Hi, I am Yue Dong, an assistant professor of CSE at UCR. My research interests include natural language processing, machine learning, and artificial intelligence. I lead the Natural Language Processing group at UCR, which develops natural language understanding and generation systems that are controllable, trustworthy, and efficient.