Natural Language Processing
1. Course Information
Instructor: Hanjie Chen
Semester: Spring 2026
Time: Monday and Wednesday, 4:00PM - 5:15PM
Location: HRZ 210
Email: hanjie@rice.edu
Office Hours:
Haotian: Monday 9:00AM - 10:00AM, DH 3100
Zhouxiang: Tuesday 11:00AM - 12:00PM, DH 3100
Morris: Wednesday 2:00PM - 3:00PM, DH 2090
Hanjie: Friday 4:30PM - 5:00PM, DH 2081
2. Course Description
This course provides an in-depth introduction to Natural Language Processing (NLP), a field at the intersection of computer science, artificial intelligence, and linguistics that focuses on the interaction between computers and human language. As NLP technologies continue to transform industries—from chatbots to virtual assistants—this course equips students with the foundational knowledge and practical skills necessary to understand and develop NLP applications.
Throughout the semester, students will explore key concepts and methodologies in NLP, such as text preprocessing, language modeling, and deep learning approaches. This course will cover a wide range of NLP topics, including Text Classification, Word Embeddings, Language Modeling, Machine Translation, Text Generation, Transformers, Pre-trained and Large Language Models, Modern Techniques (Prompting, In-context Learning, RLHF, etc.), and Advanced Topics in NLP (e.g., Interpretability, Robustness, Bias, Safety).
3. Assignments and Evaluation
Quizzes: 8 * 5% = 40%
There will be 8 quizzes (please refer to the Schedule). Each quiz is closed-book and will take place during the first 15 minutes of class. The format may include short-answer questions or calculations.
Homework
Graduate students (30%): two homework assignments, each accounting for 15%.
Undergraduate students (20%): two homework assignments, each accounting for 10%.
Each assignment will include additional tasks for graduate students.
Project
Graduate students (30%)
Project proposal (10%): Introduction, Problem Statement and Objective, Method, Plan.
Final presentation (20%): Introduction, Problem Statement and Objective, Method, Experiments, Conclusion (including the contributions of each team member).
Undergraduate students (40%)
Project proposal (20%): Introduction, Problem Statement and Objective, Method, Plan.
Final presentation (20%): Introduction, Problem Statement and Objective, Method, Experiments, Conclusion (including the contributions of each team member).
Project Requirements:
There is one course project; students may form teams of 3-4. Each team will give a proposal presentation (7-8 minutes) in the middle of the semester and a final presentation (7-8 minutes) at the end of the semester. Both presentations should cover the components listed above.
The project should focus on NLP tasks or applications. Possible projects include: Machine Translation for Low-Resource Languages, Text Summarization for Long Documents, Large Language Model Post-Training...
While many Large Language Models (LLMs) such as GPT-4 can solve a wide range of NLP tasks, students are encouraged to experiment with traditional NLP models (e.g., LSTMs) and compare their performance as part of the project. It is acceptable to use LLMs for your project, but students should not rely solely on API-based methods for their experiments (see the baseline sketch at the end of this section).
Graduate and undergraduate students will follow the same overall project structure, but the expected level of technical depth will differ. Undergraduate projects can focus on implementing and evaluating established NLP methods or applications with clear problem formulation and experimental analysis (e.g., projects on Kaggle). Graduate students are expected to work on more challenging projects, demonstrating greater technical depth, originality, and rigor in methodology and experimentation. For teams that include both undergraduate and graduate students, the project will be evaluated according to the graduate-level standard.
Examples of original projects from previous years include: “Beyond Language: Applying PatchScope to Reveal Multi-Modal Representations in VLMs”, “Analyzing Natural Language’s Ability to Derive Semantic Meaning of Heteronyms in Unusual Grammatical Contexts”, and “LLM-Guided Uncertainty-Aware Conceptual Bottleneck Models for Robust and Interpretable Learning”, some of which have resulted in research papers.
Each team should upload its code and slides to Canvas.
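For reference, the sketch below shows one possible non-API baseline: a small LSTM text classifier written in PyTorch. It is illustrative only, not a course-provided template; the vocabulary size, hyperparameters, and dummy batch are placeholders you would replace with your project's task and data.
```python
# Illustrative sketch of a local (non-API) baseline: an LSTM text classifier in PyTorch.
# All sizes and the dummy input below are placeholders, not course-specified values.
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) tensor of word indices
        embedded = self.embedding(token_ids)
        _, (last_hidden, _) = self.lstm(embedded)       # last_hidden: (1, batch, hidden_dim)
        return self.classifier(last_hidden.squeeze(0))  # logits: (batch, num_classes)

# Dummy forward pass to check shapes; real experiments would train on your task's data.
model = LSTMClassifier(vocab_size=10_000)
dummy_batch = torch.randint(1, 10_000, (4, 32))  # 4 "sentences", 32 tokens each
logits = model(dummy_batch)
print(logits.shape)  # torch.Size([4, 2])
```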
Grading
The letter grade will be assigned based on the points accumulated:
A+ (96-100), A (90-95), A- (87-89), B+ (84-86), B (80-83), B- (77-79), C+ (74-76), C (70-73), C- (67-69), D+ (64-66), D (60-63), F (<60)
4. Prerequisites
Students are expected to have completed at least one machine learning course, such as COMP 341 (Practical Machine Learning), COMP 441 (Large-Scale Machine Learning), or COMP 642 (Machine Learning). They should know the foundations of machine learning, such as logistic regression, cross-validation, optimization with gradient descent, bias-variance decomposition, etc.
Students should be proficient in Python, with experience using packages such as SciPy and scikit-learn, and be familiar with frameworks such as PyTorch and Hugging Face. Students are also expected to have a background in Calculus, Linear Algebra, and Probability and Statistics, including concepts such as mean and variance, the multinomial distribution, conditional dependence, maximum likelihood estimation, Bayes' theorem, etc.
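As a rough self-check (a sketch only, assuming scikit-learn is installed; the dataset and settings are arbitrary), students should be comfortable reading and writing code like the following:
```python
# Quick self-check: logistic regression evaluated with 5-fold cross-validation.
# The built-in dataset is chosen for convenience, not as a course dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)   # small binary classification dataset
clf = LogisticRegression(max_iter=5000)      # logistic regression classifier
scores = cross_val_score(clf, X, y, cv=5)    # 5-fold cross-validation accuracy
print(f"mean accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```
If the concepts here (logistic regression, k-fold cross-validation, accuracy as an evaluation metric) are unfamiliar, please review them before the first homework.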
5. Honor Code
All students are expected to adhere to the standards of the Rice Honor Code, which you agreed to uphold upon matriculation. For detailed information on the Honor Code, including its administration and the procedures for addressing alleged violations, please refer to the Honor System Handbook available at http://honor.rice.edu/honor-system-handbook/. This handbook outlines the University’s expectations for academic integrity, the procedures for resolving any alleged violations, and the rights and responsibilities of students and faculty throughout the process.
Students should use AI tools responsibly in their assignments and follow the university's AI Usage Guidelines.