Class Timings: Thursday (19:45-21:15) and Saturday (09:00-10:30) at https://zoom.us/meeting/register/7t_H7iwVSfyoXBFCyBRe2Q#/registration
Midsem: 20th June (09:30-10:30) at https://zoom.us/meeting/register/7t_H7iwVSfyoXBFCyBRe2Q#/registration; Syllabus: Lectures 1-14; Submission Link:
Quizzes: every alternate week, with two weeks' time to submit
Project: 10th-14th August; Instructions; Submission Link:
Credits: 3-0-0-0-3
This course introduces the foundations and evolution of Large Language Models (LLMs), covering key concepts from neural language models to transformer architectures. It explores major LLM architectures such as GPT, BERT, and encoder–decoder models, along with tokenization and decoding strategies. The course further examines advanced topics including fine-tuning, scaling, retrieval-augmented generation, and prompt engineering. Practical aspects focus on real-world applications, agents, and tool integration. Key challenges such as bias, hallucination, and alignment are also discussed. The course concludes with emerging trends including multimodal and next-generation LLM systems.
Unit 1: Foundations of Large Language Models
Introduction to Language Models: history, evolution, and significance of language models in NLP; word embeddings; neural LMs
Transformer Architecture: transformer basics; self-attention mechanism (see the sketch below); encoder-decoder structure; encoder layers and the encoder stack
Core Components of Transformers: position encoding; batch normalization and layer normalization; teacher forcing and masked attention
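To make the self-attention mechanism above concrete, here is a minimal, illustrative sketch of single-head scaled dot-product attention in PyTorch (one of the frameworks listed under Libraries & Frameworks below); the function name and toy dimensions are our own assumptions, not course material.

import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    # x: (seq_len, d_model); w_q/w_k/w_v: (d_model, d_k) learned projection matrices
    q, k, v = x @ w_q, x @ w_k, x @ w_v        # project input into queries, keys, values
    scores = q @ k.T / (k.shape[-1] ** 0.5)    # scaled dot-product similarity of each query with each key
    weights = F.softmax(scores, dim=-1)        # attention distribution over positions
    return weights @ v                         # each output is a weighted mix of the values

x = torch.randn(4, 8)                                  # toy input: 4 tokens, model dim 8
w_q, w_k, w_v = (torch.randn(8, 8) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)                 # shape (4, 8)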
Unit 2: Architectures of Large Language Models
Decoder-only Large Language Models: detailed study of the GPT architecture (causal language model); pre-training vs. fine-tuning for downstream applications; decoding strategies: greedy, beam search, top-k, and top-p (see the sampling sketch below)
Encoder-only Large Language Models: BERT architecture (masked language model) and training objectives; applications and adaptation for various NLP tasks
Tokenization Techniques: sub-word tokenization; Byte Pair Encoding, WordPiece, and SentencePiece
Introduction to Small Language Models: overview and applications of small language models in resource-constrained settings
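The decoding strategies in this unit are simple enough to sketch directly. Below is a minimal, illustrative implementation of top-k followed by top-p (nucleus) filtering over a vector of next-token logits, in PyTorch; the function name and the default values of k and p are assumptions for illustration only.

import torch
import torch.nn.functional as F

def sample_top_k_top_p(logits, k=50, p=0.9, temperature=1.0):
    # logits: 1-D tensor of next-token scores over the vocabulary
    logits = logits / temperature
    # Top-k: keep only the k highest-scoring tokens.
    top_k_vals, _ = torch.topk(logits, k)
    logits[logits < top_k_vals[-1]] = float("-inf")
    # Top-p: keep the smallest set of tokens whose cumulative probability exceeds p.
    sorted_logits, sorted_idx = torch.sort(logits, descending=True)
    cum_probs = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)
    cutoff = cum_probs > p
    cutoff[1:] = cutoff[:-1].clone()   # shift right so the token that crosses p is kept
    cutoff[0] = False                  # always keep the single most likely token
    logits[sorted_idx[cutoff]] = float("-inf")
    return torch.multinomial(F.softmax(logits, dim=-1), num_samples=1).item()

vocab_logits = torch.randn(1000)       # stand-in for a model's next-token logits
print(sample_top_k_top_p(vocab_logits))

Greedy decoding, by contrast, is just torch.argmax over the same logits; beam search keeps the several highest-scoring partial sequences at each step instead of one.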
Unit 3: Advanced Architectures and Adaptation of LLMs
Encoder-Decoder Models: introduction to models like BART and T5; the text-to-text framework and zero-shot learning
Advanced Model Components: pre-training strategies, scaling laws, and instruction fine-tuning; advanced attention mechanisms and the Mixture of Experts approach
Parameter-Efficient Fine-Tuning (PEFT): techniques for efficient adaptation and inference; advantages of PEFT in optimizing LLMs for specific tasks (see the LoRA sketch below)
Efficient Model Adaptation: retrieval and tool augmentation for enhanced model capabilities; prompt engineering, chains, memory, and agents
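As a concrete picture of PEFT, the sketch below shows the core idea behind LoRA (covered in Week 12): freeze the pretrained weight matrix and train only a low-rank update. This is a conceptual toy in PyTorch, not a production implementation; in practice one would use a library such as Hugging Face PEFT, and the class name and hyperparameters here are our own assumptions.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # A frozen pretrained linear layer plus a trainable low-rank update (scale * B @ A).
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for param in self.base.parameters():
            param.requires_grad_(False)        # freeze the pretrained weights and bias
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))   # zero init: update starts at 0
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))
y = layer(torch.randn(2, 768))   # during fine-tuning, only A and B receive gradients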
Unit 4: LLMs in Practice and Future Directions
Addressing Challenges in LLMs: bias, toxicity, hallucination, and alignment in LLMs; interpreting LLMs: understanding their inner workings and output reasoning
Specialized Models and Emerging Trends: multimodal LLMs, vision-language models, and long-context LLMs; model editing, self-evolving LLMs, and efficient inference
Future of LLMs: ethical implications, evolving capabilities, and emerging applications
Week 1:
Lecture 1: What is a Language Model?
Lecture 2: Word Representations: One-hot vs Embeddings
Week 2:
Lecture 3: Neural Language Models
Lecture 4: Introduction to Transformers
Week 3:
Lecture 5: Self-Attention mechanism (Q, K, V intuition)
Lecture 6: Multi-head attention
Week 4:
Lecture 7: Encoder architecture (layer stack, FFN, normalization)
Lecture 8: Decoder basics, Masked attention, Teacher forcing, End-to-end transformer flow
Week 5:
Lecture 9: Decoder-only models, GPT architecture (detailed block-level view)
Lecture 10: Pretraining objective (causal LM), Training pipeline
Week 6:
Lecture 11: Decoding strategies: Greedy, Beam search
Lecture 12: Sampling methods: Top-k, Top-p
Week 7:
Lecture 13: Encoder-only models, BERT architecture
Lecture 14: Masked Language Modeling (MLM), Fine-tuning BERT for tasks (classification, NER)
Week 8:
Midsem
Week 9:
Lecture 15: Tokenization: BPE, WordPiece, SentencePiece
Lecture 16: Small Language Models
Week 10:
Lecture 17: Encoder–Decoder models, BART and T5 overview
Lecture 18: Text-to-text paradigm, Zero-shot & few-shot learning
Week 11:
Lecture 19: Scaling laws (data, params, compute), Instruction tuning
Lecture 20: Advanced attention, Mixture of Experts (conceptual)
Week 12:
Lecture 21: Parameter-Efficient Fine-Tuning (PEFT), LoRA adapters
Lecture 22: Retrieval-Augmented Generation (RAG)
Week 13:
Lecture 23: Prompt Engineering
Lecture 24: Agents, memory, chaining
Week 14:
Lecture 25: Challenges in LLMs: Hallucination, Bias, Toxicity
Lecture 26: Alignment techniques, Evaluation of LLMs
Week 15:
Lecture 27: Multimodal LLMs
Lecture 28: Future of LLMs
Week 16:
Individual Project Presentation
Textbooks:
Introduction to Large Language Models: Generative AI for Text by Prof. Tanmoy Chakraborty
Additional Resources:
Speech and Language Processing, Dan Jurafsky and James H. Martin
Foundations of Statistical Natural Language Processing, Chris Manning and Hinrich Schütze
Natural Language Processing, Jacob Eisenstein
A Primer on Neural Network Models for Natural Language Processing, Yoav Goldberg
Datasets (for experiments & demos)
Wikipedia Corpus
Common Crawl (via filtered subsets like C4)
BookCorpus
OpenWebText
GLUE / SuperGLUE Benchmarks
SQuAD (QA tasks)
CNN/DailyMail (summarization)
IndicCorp / AI4Bharat datasets (for Indian context)
Kaggle NLP datasets (sentiment, classification, etc.)
Libraries & Frameworks
Hugging Face Transformers (see the usage sketch after this list)
Hugging Face Datasets
PyTorch
TensorFlow / Keras (optional)
SentenceTransformers
spaCy / NLTK (for basics)
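As a minimal usage sketch of the Hugging Face Transformers library listed above, the snippet below loads a small open model for text generation; gpt2 is chosen only because it is small and freely available.

from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")         # small, freely available demo model
result = generator("Language models are", max_new_tokens=20)  # generate a short continuation
print(result[0]["generated_text"])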
Pretrained Models (for usage & demos)
GPT family (via APIs or open models)
BERT, RoBERTa
T5, BART
LLaMA / Mistral (open-weight models)
IndicBERT / multilingual models
Tools & Platforms
Google Colab / Jupyter Notebook
Hugging Face Hub
OpenAI API / similar LLM APIs
Kaggle Notebooks
GitHub (model/code access)
LLM Application Frameworks
LangChain
LlamaIndex
Haystack
Vector Databases (for RAG)
FAISS (see the retrieval sketch after this list)
ChromaDB
Pinecone (intro level)
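A minimal sketch of the retrieval step behind RAG, using FAISS from the list above; the embedding dimension and random vectors are stand-ins for real encoded documents and queries.

import faiss
import numpy as np

d = 384                                                      # embedding dimension (assumed)
doc_embeddings = np.random.rand(1000, d).astype("float32")   # stand-in for encoded documents
index = faiss.IndexFlatL2(d)                                 # exact L2 nearest-neighbour index
index.add(doc_embeddings)

query = np.random.rand(1, d).astype("float32")               # stand-in for an encoded query
distances, doc_ids = index.search(query, 4)                  # ids of the 4 closest documents
# In a RAG pipeline, the retrieved documents are placed in the LLM prompt as context.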
Evaluation & Experimentation
BLEU, ROUGE (basic metrics)
Hugging Face Evaluate (see the sketch after this list)
Prompt evaluation (manual + automated)
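A minimal sketch of computing ROUGE with the Hugging Face Evaluate library listed above; the prediction and reference strings are toy examples.

import evaluate

rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["the cat sat on the mat"],
    references=["a cat was sitting on the mat"],
)
print(scores)   # rouge1 / rouge2 / rougeL scores in [0, 1]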
Deployment / Lightweight Tools (optional exposure)
Gradio (quick demos; see the sketch after this list)
Streamlit (simple apps)
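A minimal Gradio sketch for a quick demo, wrapping a placeholder function; a real demo would call one of the pretrained models above instead.

import gradio as gr

def generate(prompt):
    # Placeholder: a real demo would call an LLM (e.g., a Transformers pipeline) here.
    return prompt.upper()

gr.Interface(fn=generate, inputs="text", outputs="text").launch()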
Assessment
Midterm (20%)
14 Quizzes (42%)
Project (30%): Problem Formulation (5%), Project Presentation (5%), Project Implementation (20%)
Attendance (8% or 3%)