Course Webpage for Spring Semester, 2025
Updates (All dates in dd-mm-yyyy)
20-03-2025: Assignment 2 is out. Deadline extended to 10-04-2025, 11:59 PM (previously 31-03-2025, then 05-04-2025).
07-02-2025: Assignment 1 is out. Deadline extended to 09-03-2025, 11:59 PM (previously 03-03-2025).
09-01-2025: First lecture will be delivered.
Course Description
Helen Toner defines AI Safety as follows: “AI safety focuses on technical solutions to ensure that AI systems operate safely and reliably.”
AI safety is an interdisciplinary field concerned with preventing accidents, misuse, or other harmful consequences that could result from AI systems. It encompasses machine ethics and AI alignment (which aim to make AI systems moral and beneficial), as well as monitoring AI systems for risks and making them highly reliable. It also involves developing norms and policies that promote safety.
Syllabus
Basics of Transformers
Quantization + PEFT/LoRA
Prompting Techniques
Stages of LLM Safety + Hardness of Alignment
Reward Functions + Policy Optimization + RLHF + DPO
AI Vulnerabilities: Robustness and Adversaries
Jailbreak Taxonomy
Model Editing
Task Vectors and Training Time Alignment
Model Arithmetic and Decoding Time Alignment
Machine Unlearning
Mechanistic Interpretability
Scalable Oversight
Activation Patching/Causal Tracing
Model Merging
AI Watermarking
Superalignment/AI Alignment Paradox
Class Timings
Thursday: 3:00 PM - 5:00 PM
Friday: 3:00 PM - 4:00 PM
CSE-120
Instructors
Department of Computer Science and Engineering, IIT Kharagpur
Email: animeshm@cse.iitkgp.ac.in
Department of Computer Science and Engineering, IIT Kharagpur
Email: pawang@cse.iitkgp.ac.in
Teaching Assistants
Course Evaluation
Midsem: 20%
Endsem: 35%
Assignments: 35% (2 programming assignments; Assignment 1: 10%, Assignment 2: 25%)
Attendance (taken on randomly chosen days): 10%