Class Timings: Thursday (19:45-21:15) and Saturday (09:00-10:30) at https://zoom.us/meeting/register/7t_H7iwVSfyoXBFCyBRe2Q#/registration
Midsem: 20th June (09:30-10:30) at https://zoom.us/meeting/register/7t_H7iwVSfyoXBFCyBRe2Q#/registration; Syllabus: Lectures 1-14; Submission Link:
Quizzes: every alternate week, with two weeks' time to submit
Project: 10th-14th August; Instructions; Submission Link:
Credits: 3-0-0-0-3
This course introduces the foundations and evolution of Large Language Models (LLMs), covering key concepts from neural language models to transformer architectures. It explores major LLM architectures such as GPT, BERT, and encoder–decoder models, along with tokenization and decoding strategies. The course further examines advanced topics including fine-tuning, scaling, retrieval-augmented generation, and prompt engineering. Practical aspects focus on real-world applications, agents, and tool integration. Key challenges such as bias, hallucination, and alignment are also discussed. The course concludes with emerging trends including multimodal and next-generation LLM systems.
Unit 1: Foundations of Large Language Models
Introduction to Language Models: history, evolution, and significance of language models in NLP; word embeddings; neural LMs
Transformer Architecture: transformer basics; self-attention mechanism (see the sketch below); encoder-decoder structure; encoder layers and the encoder stack
Core Components of Transformers: position encoding; batch normalization and layer normalization; teacher forcing and masked attention
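To make the self-attention mechanism above concrete, here is a minimal, illustrative sketch of single-head scaled dot-product attention in PyTorch (one of the frameworks listed under Libraries & Frameworks below); the function name and toy dimensions are our own assumptions, not course material.

import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    # x: (seq_len, d_model); w_q/w_k/w_v: (d_model, d_k) learned projection matrices
    q, k, v = x @ w_q, x @ w_k, x @ w_v        # project input into queries, keys, values
    scores = q @ k.T / (k.shape[-1] ** 0.5)    # scaled dot-product similarity of each query with each key
    weights = F.softmax(scores, dim=-1)        # attention distribution over positions
    return weights @ v                         # each output is a weighted mix of the values

x = torch.randn(4, 8)                                  # toy input: 4 tokens, model dim 8
w_q, w_k, w_v = (torch.randn(8, 8) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)                 # shape (4, 8)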
Unit 2: Architectures of Large Language Models
Decoder-only Large Language Models: detailed study of the GPT architecture (causal language model); pre-training vs. fine-tuning for downstream applications; decoding strategies: greedy, beam search, top-k, and top-p (see the sampling sketch below)
Encoder-only Large Language Models: BERT architecture (masked language model) and training objectives; applications and adaptation for various NLP tasks
Tokenization Techniques: sub-word tokenization; Byte Pair Encoding, WordPiece, and SentencePiece
Introduction to Small Language Models: overview and applications of small language models in resource-constrained settings
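The decoding strategies in this unit are simple enough to sketch directly. Below is a minimal, illustrative implementation of top-k followed by top-p (nucleus) filtering over a vector of next-token logits, in PyTorch; the function name and the default values of k and p are assumptions for illustration only.

import torch
import torch.nn.functional as F

def sample_top_k_top_p(logits, k=50, p=0.9, temperature=1.0):
    # logits: 1-D tensor of next-token scores over the vocabulary
    logits = logits / temperature
    # Top-k: keep only the k highest-scoring tokens.
    top_k_vals, _ = torch.topk(logits, k)
    logits[logits < top_k_vals[-1]] = float("-inf")
    # Top-p: keep the smallest set of tokens whose cumulative probability exceeds p.
    sorted_logits, sorted_idx = torch.sort(logits, descending=True)
    cum_probs = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)
    cutoff = cum_probs > p
    cutoff[1:] = cutoff[:-1].clone()   # shift right so the token that crosses p is kept
    cutoff[0] = False                  # always keep the single most likely token
    logits[sorted_idx[cutoff]] = float("-inf")
    return torch.multinomial(F.softmax(logits, dim=-1), num_samples=1).item()

vocab_logits = torch.randn(1000)       # stand-in for a model's next-token logits
print(sample_top_k_top_p(vocab_logits))

Greedy decoding, by contrast, is just torch.argmax over the same logits; beam search keeps the several highest-scoring partial sequences at each step instead of one.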
Unit 3: Advanced Architectures and Adaptation of LLMs
Encoder-Decoder Models: introduction to models like BART and T5; the text-to-text framework and zero-shot learning
Advanced Model Components: pre-training strategies, scaling laws, and instruction fine-tuning; advanced attention mechanisms and the Mixture of Experts approach
Parameter-Efficient Fine-Tuning (PEFT): techniques for efficient adaptation and inference; advantages of PEFT in optimizing LLMs for specific tasks (see the LoRA sketch below)
Efficient Model Adaptation: retrieval and tool augmentation for enhanced model capabilities; prompt engineering, chains, memory, and agents
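As a concrete picture of PEFT, the sketch below shows the core idea behind LoRA (covered in Week 12): freeze the pretrained weight matrix and train only a low-rank update. This is a conceptual toy in PyTorch, not a production implementation; in practice one would use a library such as Hugging Face PEFT, and the class name and hyperparameters here are our own assumptions.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # A frozen pretrained linear layer plus a trainable low-rank update (scale * B @ A).
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for param in self.base.parameters():
            param.requires_grad_(False)        # freeze the pretrained weights and bias
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))   # zero init: update starts at 0
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))
y = layer(torch.randn(2, 768))   # during fine-tuning, only A and B receive gradients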
Unit 4: LLMs in Practice and Future Directions
Addressing Challenges in LLMs: bias, toxicity, hallucination, and alignment in LLMs; interpreting LLMs: understanding their inner workings and output reasoning
Specialized Models and Emerging Trends: multimodal LLMs, vision-language models, and long-context LLMs; model editing, self-evolving LLMs, and efficient inference
Future of LLMs: ethical implications, evolving capabilities, and emerging applications
Week 1:
Lecture 1: What is a Language Model?
Lecture 2: Word Representations: One-hot vs Embeddings
Week 2:
Lecture 3: Neural Language Models
Lecture 4: Introduction to Transformers
Week 3:
Lecture 5: Self-Attention mechanism (Q, K, V intuition)
Lecture 6: Multi-head attention
Week 4:
Lecture 7: Encoder architecture (layer stack, FFN, normalization)
Lecture 8: Decoder basics, Masked attention, Teacher forcing, End-to-end transformer flow
Week 5:
Lecture 9: Decoder-only models, GPT architecture (detailed block-level view)
Lecture 10: Pretraining objective (causal LM), Training pipeline
Week 6:
Lecture 11: Decoding strategies: Greedy, Beam search
Lecture 12: Sampling methods: Top-k, Top-p
Week 7:
Lecture 13: Encoder-only models, BERT architecture
Lecture 14: Masked Language Modeling (MLM), Fine-tuning BERT for tasks (classification, NER)
Week 8:
Midsem
Week 9:
Lecture 15: Tokenization: BPE, WordPiece, SentencePiece
Lecture 16: Small Language Models
Week 10:
Lecture 17: Encoder–Decoder models, BART and T5 overview
Lecture 18: Text-to-text paradigm, Zero-shot & few-shot learning
Week 11:
Lecture 19: Scaling laws (data, params, compute), Instruction tuning
Lecture 20: Advanced attention, Mixture of Experts (conceptual)
Week 12:
Lecture 21: Parameter-Efficient Fine-Tuning (PEFT), LoRA adapters
Lecture 22: Retrieval-Augmented Generation (RAG)
Week 13:
Lecture 23: Prompt Engineering
Lecture 24: Agents, memory, chaining
Week 14:
Lecture 25: Challenges in LLMs: Hallucination, Bias, Toxicity
Lecture 26: Alignment techniques, Evaluation of LLMs
Week 15:
Lecture 27: Multimodal LLMs
Lecture 28: Future of LLMs
Week 16:
Individual Project Presentation
Textbooks:
Introduction to Large Language Models: Generative AI for Text by Prof. Tanmoy Chakraborty
Additional Resources:
Speech and Language Processing, Dan Jurafsky and James H. Martin
Foundations of Statistical Natural Language Processing, Chris Manning and Hinrich Schütze
Natural Language Processing, Jacob Eisenstein
A Primer on Neural Network Models for Natural Language Processing, Yoav Goldberg
Datasets (for experiments & demos)
Wikipedia Corpus
Common Crawl (via filtered subsets like C4)
BookCorpus
OpenWebText
GLUE / SuperGLUE Benchmarks
SQuAD (QA tasks)
CNN/DailyMail (summarization)
IndicCorp / AI4Bharat datasets (for Indian context)
Kaggle NLP datasets (sentiment, classification, etc.)
Libraries & Frameworks
Hugging Face Transformers (see the usage sketch after this list)
Hugging Face Datasets
PyTorch
TensorFlow / Keras (optional)
SentenceTransformers
spaCy / NLTK (for basics)
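As a minimal usage sketch of the Hugging Face Transformers library listed above, the snippet below loads a small open model for text generation; gpt2 is chosen only because it is small and freely available.

from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")         # small, freely available demo model
result = generator("Language models are", max_new_tokens=20)  # generate a short continuation
print(result[0]["generated_text"])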
Pretrained Models (for usage & demos)
GPT family (via APIs or open models)
BERT, RoBERTa
T5, BART
LLaMA / Mistral (open-weight models)
IndicBERT / multilingual models
Tools & Platforms
Google Colab / Jupyter Notebook
Hugging Face Hub
OpenAI API / similar LLM APIs
Kaggle Notebooks
GitHub (model/code access)
LLM Application Frameworks
LangChain
LlamaIndex
Haystack
Vector Databases (for RAG)
FAISS (see the retrieval sketch after this list)
ChromaDB
Pinecone (intro level)
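A minimal sketch of the retrieval step behind RAG, using FAISS from the list above; the embedding dimension and random vectors are stand-ins for real encoded documents and queries.

import faiss
import numpy as np

d = 384                                                      # embedding dimension (assumed)
doc_embeddings = np.random.rand(1000, d).astype("float32")   # stand-in for encoded documents
index = faiss.IndexFlatL2(d)                                 # exact L2 nearest-neighbour index
index.add(doc_embeddings)

query = np.random.rand(1, d).astype("float32")               # stand-in for an encoded query
distances, doc_ids = index.search(query, 4)                  # ids of the 4 closest documents
# In a RAG pipeline, the retrieved documents are placed in the LLM prompt as context.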
Evaluation & Experimentation
BLEU, ROUGE (basic metrics)
Hugging Face Evaluate (see the sketch after this list)
Prompt evaluation (manual + automated)
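A minimal sketch of computing ROUGE with the Hugging Face Evaluate library listed above; the prediction and reference strings are toy examples.

import evaluate

rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["the cat sat on the mat"],
    references=["a cat was sitting on the mat"],
)
print(scores)   # rouge1 / rouge2 / rougeL scores in [0, 1]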
Deployment / Lightweight Tools (optional exposure)
Gradio (quick demos; see the sketch after this list)
Streamlit (simple apps)
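A minimal Gradio sketch for a quick demo, wrapping a placeholder function; a real demo would call one of the pretrained models above instead.

import gradio as gr

def generate(prompt):
    # Placeholder: a real demo would call an LLM (e.g., a Transformers pipeline) here.
    return prompt.upper()

gr.Interface(fn=generate, inputs="text", outputs="text").launch()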
Assessment
Midterm (20%)
14 Quizzes (42%)
Project (30%): Problem Formulation (5%), Project Presentation (5%), Project Implementation (20%)
Attendance (8% or 3%)