CS60216: Safe Gen-AI

Safety Fundamentals of Generative AI

Course Webpage for Spring Semester, 2025

Updates (All dates in dd-mm-yyyy)

20-03-2025: Assignment 2 is out. Deadline extended to 10-04-2025, 11:59 PM (previously 31-03-2025, then 05-04-2025).
07-02-2025: Assignment 1 is out. Deadline extended to 09-03-2025, 11:59 PM (previously 03-03-2025).
09-01-2025: First lecture will be delivered.

Course Description

Helen Toner defines AI Safety as follows: “AI safety focuses on technical solutions to ensure that AI systems operate safely and reliably.”

AI safety is an interdisciplinary field concerned with preventing accidents, misuse, and other harmful consequences that could result from AI systems. It encompasses machine ethics and AI alignment (which aim to make AI systems moral and beneficial), as well as monitoring AI systems for risks and making them highly reliable. It also involves developing norms and policies that promote safety.

Syllabus

Basics of Transformers
Quantization + PEFT/LoRA
Prompting Techniques
Stages of LLM Safety + Hardness of Alignment
Reward Functions + Policy Optimization + RLHF + DPO
AI Vulnerabilities: Robustness and Adversaries
Jailbreak Taxonomy
Model Editing
Task Vectors and Training Time Alignment
Model Arithmetic and Decoding Time Alignment
Machine Unlearning
Mechanistic Interpretability
Scalable Oversight
Activation Patching/Causal Tracing
Model Merging
AI Watermarking
Superalignment/AI Alignment Paradox

Class Timings

Thursday: 3:00 PM - 5:00 PM

Friday: 3:00 PM - 4:00 PM

Class Location

CSE-120

Instructors

Animesh Mukherjee

Department of Computer Science and Engineering, IIT Kharagpur
Email: animeshm@cse.iitkgp.ac.in

Department of Computer Science and Engineering, IIT Kharagpur
Email: pawang@cse.iitkgp.ac.in

Teaching Assistants

Ph.D. Fellow

Email: naqueerizwan1998@gmail.com

Ph.D. Fellow

Email: sayantanadak.skni@gmail.com

Ph.D. Fellow

Email: sagnikbasu19@gmail.com

Course Evaluation

Midsem: 20%
Endsem: 35%
Assignments: 35% (2 programming assignments; Assignment 1: 10%, Assignment 2: 25%)
Attendance (taken on randomly chosen days): 10%
