ELL884 (2024) - Lectures

Introduction (01/01/24)
- Slide: https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/lqustymr7n5bn
Regular expressions & morphology (04/01/24)
- Terminology, regular expressions, morphology, Porter stemmer, Edit distance
- Slide: https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/lqxy96vku3g4z0
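Edit distance can be illustrated with a minimal sketch of the standard dynamic-programming algorithm. It assumes insertions and deletions cost 1 and exposes the substitution cost as a parameter `sub_cost` (our name), since both unit cost (Levenshtein) and cost 2 (the Jurafsky & Martin variant) are common. The slides may use a different convention.

```python
def edit_distance(source: str, target: str, sub_cost: int = 1) -> int:
    """Minimum edit distance between two strings by dynamic programming.

    Insertions and deletions cost 1; a substitution costs `sub_cost`
    (1 gives classic Levenshtein distance, 2 is the Jurafsky & Martin variant).
    """
    n, m = len(source), len(target)
    # dist[i][j] = cheapest way to turn source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete all i source characters
    for j in range(1, m + 1):
        dist[0][j] = j  # insert all j target characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                              # deletion
                dist[i][j - 1] + 1,                              # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # match or substitution
            )
    return dist[n][m]


print(edit_distance("intention", "execution"))              # 5
print(edit_distance("intention", "execution", sub_cost=2))  # 8
```

Keeping the full table (rather than two rows) costs O(nm) memory but makes it easy to add backpointers later to recover the alignment.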
POS and NE Tagging -- Introduction (07/01/24)
- Intro to POS tagging and NER, Unsupervised methods, evaluation
- Slide: https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/lr3r5r313a41qi
Hidden Markov Model (11/01/24)
- Markov chain, Intro to HMM, Forward algorithm, Viterbi algorithm
- Slide: https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/lr7yt1vqdn52md
- PDF: https://web.stanford.edu/~jurafsky/slp3/A.pdf
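The forward and Viterbi algorithms listed above can be sketched for a toy two-state HMM. The parameters are illustrative assumptions loosely in the spirit of the ice-cream example in the Jurafsky & Martin appendix linked above, not values taken from the slides. Viterbi here stores whole paths instead of backpointer arrays, which is simpler to read but less memory-efficient.

```python
# Toy two-state HMM; all numbers are illustrative assumptions.
STATES = ["HOT", "COLD"]
START = {"HOT": 0.8, "COLD": 0.2}
TRANS = {"HOT": {"HOT": 0.6, "COLD": 0.4}, "COLD": {"HOT": 0.5, "COLD": 0.5}}
EMIT = {"HOT": {1: 0.2, 2: 0.4, 3: 0.4}, "COLD": {1: 0.5, 2: 0.4, 3: 0.1}}


def forward(obs, states, start_p, trans_p, emit_p):
    """Forward algorithm: P(obs), summed over all hidden state paths."""
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {
            s: sum(alpha[r] * trans_p[r][s] for r in states) * emit_p[s][o]
            for s in states
        }
    return sum(alpha.values())


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Viterbi algorithm: most likely hidden state path and its joint probability."""
    delta = {s: start_p[s] * emit_p[s][obs[0]] for s in states}  # best score ending in s
    paths = {s: [s] for s in states}                               # best path ending in s
    for o in obs[1:]:
        new_delta, new_paths = {}, {}
        for s in states:
            prev = max(states, key=lambda r: delta[r] * trans_p[r][s])
            new_delta[s] = delta[prev] * trans_p[prev][s] * emit_p[s][o]
            new_paths[s] = paths[prev] + [s]
        delta, paths = new_delta, new_paths
    best = max(states, key=lambda s: delta[s])
    return paths[best], delta[best]


obs = [3, 1, 3]
print(round(forward(obs, STATES, START, TRANS, EMIT), 6))  # 0.028562
path, prob = viterbi(obs, STATES, START, TRANS, EMIT)
print(path, round(prob, 4))                                # ['HOT', 'COLD', 'HOT'] 0.0128
```

The two functions share the same recursion; the only difference is `sum` (forward) versus `max` (Viterbi) over the previous state.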
Hidden Markov Model and MEMM -- Part II (15/01/24)
- Forward-backward algorithm, linear and logistic regression, MaxEnt classifier
- Slide: https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/lrdi7fy7o7p2dx
- Lecture note: https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/lrezqkvd2ng2zq
- Study materials: Speech and language processing, Jurafsky & Martin (Chapter 6)
Parsing -- Part I (18/01/24)
- Introduction to statistical parsing, Constituency vs dependency parsing, CFG
- Slide: https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/lrhxat10ieo3om
- Slide: https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/lrhxaxjsfbr3te
- Study materials: Speech and language processing, Jurafsky & Martin
Parsing -- Part II (24/01/24)
- CYK algorithm, evaluation
- Slide: https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/lrhxaxjsfbr3te
- Study materials: Speech and language processing, Jurafsky & Martin
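The CYK algorithm can be sketched as a plain recognizer for a grammar in Chomsky normal form. This is an assumption-laden simplification: the lecture may present the probabilistic (PCFG) version or build parse trees with backpointers, and the toy grammar below is hypothetical.

```python
def cyk_recognize(words, lexical_rules, binary_rules, start="S"):
    """CYK recognizer for a grammar in Chomsky normal form.

    lexical_rules: list of (A, word) for rules A -> word
    binary_rules:  list of (A, (B, C)) for rules A -> B C
    """
    n = len(words)
    # table[i][j] holds every nonterminal that can derive words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        table[i][i + 1] = {lhs for lhs, w in lexical_rules if w == word}
    for span in range(2, n + 1):              # fill shorter spans first
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):         # every split point
                for lhs, (b, c) in binary_rules:
                    if b in table[i][k] and c in table[k][j]:
                        table[i][j].add(lhs)
    return n > 0 and start in table[0][n]


# Hypothetical toy grammar, already in CNF
LEXICAL = [("Det", "the"), ("N", "dog"), ("N", "cat"), ("V", "saw")]
BINARY = [("S", ("NP", "VP")), ("NP", ("Det", "N")), ("VP", ("V", "NP"))]

print(cyk_recognize("the dog saw the cat".split(), LEXICAL, BINARY))  # True
print(cyk_recognize("saw the dog the".split(), LEXICAL, BINARY))      # False
```

Replacing the sets with per-nonterminal maximum probabilities and storing the best split turns this into probabilistic CKY.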
Lexical Semantics -- Part I (24/01/24)
- Word relations, WordNet and word similarity
- Slide: https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/lrs1hh1p83khj
- Study materials: Speech and language processing, Jurafsky & Martin
Distributional Semantics (29/01/24)
- Word similarity, TF-IDF, distributional similarity
- Slide: https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/lrynyvyy26v7c7
- Study materials: Speech and language processing, Jurafsky & Martin
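TF-IDF weighting and cosine similarity can be sketched on a hypothetical three-document corpus. The weighting assumed here is tf = log10(1 + count) and idf = log10(N / df), one common variant among several; the slides may use another.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per tokenized document."""
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]
    df = Counter(term for c in counts for term in c)  # documents containing each term
    return [
        {t: math.log10(1 + k) * math.log10(n_docs / df[t]) for t, k in c.items()}
        for c in counts
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [
    "the sword and the battle".split(),
    "the fool and the clown".split(),
    "the battle sword fight".split(),
]
vecs = tfidf_vectors(docs)
print(vecs[0]["the"])                                      # 0.0 ("the" occurs in every document)
print(cosine(vecs[0], vecs[2]) > cosine(vecs[0], vecs[1]))  # True
```

A term that appears in every document gets idf = 0, which is exactly why TF-IDF down-weights function words like "the".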
Introduction to Deep Learning (01/02/24)
- Perceptron, MLP, Backpropagation
- Slide: https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/ls2tqxvym2f65u
- Study materials: https://arxiv.org/pdf/2301.09977.pdf (math of backpropagation),
  Deep Learning by Aaron Courville, Yoshua Bengio, and Ian Goodfellow
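The perceptron from this lecture can be sketched with the classic Rosenblatt update rule on the AND function. This is a minimal illustration assuming labels in {0, 1} and a step activation; the learning rate and epoch count are arbitrary choices, not values from the slides.

```python
def predict(w, b, x):
    """Step activation: 1 if w . x + b > 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_perceptron(data, epochs=10, lr=1.0):
    """Rosenblatt perceptron; data is a list of (features, label) with label in {0, 1}."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            err = y - predict(w, b, x)  # 0 if correct, +1 or -1 otherwise
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b


AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(AND)
print(all(predict(w, b, x) == y for x, y in AND))  # True
```

The update only fires on mistakes, so it converges on linearly separable data like AND but never settles on XOR, which motivates the MLP and backpropagation.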
Language Models (04/02/24; 08/02/24)
- N-gram language models, perplexity, smoothing
- Slide: https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/ls7ohr9cfsjje
- Slide: https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/lsbt3tfj8sa3zl
- Study materials: Speech and language processing, Jurafsky & Martin
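An N-gram model with smoothing and perplexity can be sketched as a bigram model with Laplace (add-one) smoothing. The sketch assumes a closed vocabulary (no unknown-word handling) and counts the end marker `</s>` as a predicted token; the tiny corpus is the well-known "I am Sam" example.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Collect context and bigram counts, padding each sentence with <s> and </s>."""
    context_counts, bigram_counts = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        context_counts.update(tokens[:-1])                  # tokens that have a successor
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    vocab = {w for sent in sentences for w in sent} | {"</s>"}  # tokens that can be predicted
    return context_counts, bigram_counts, vocab


def add_one_prob(prev, word, context_counts, bigram_counts, vocab):
    """Laplace-smoothed P(word | prev) = (C(prev word) + 1) / (C(prev) + V)."""
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))


def perplexity(sentences, context_counts, bigram_counts, vocab):
    """exp of the average negative log-probability per predicted token."""
    log_prob, n_predicted = 0.0, 0
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for prev, word in zip(tokens[:-1], tokens[1:]):
            log_prob += math.log(add_one_prob(prev, word, context_counts, bigram_counts, vocab))
            n_predicted += 1
    return math.exp(-log_prob / n_predicted)


corpus = [["i", "am", "sam"], ["sam", "i", "am"],
          ["i", "do", "not", "like", "green", "eggs", "and", "ham"]]
model = train_bigram(corpus)
print(perplexity([["i", "am", "sam"]], *model) < perplexity([["sam", "am", "i"]], *model))  # True
```

Working in log space avoids underflow on long texts; a word order seen in training gets lower perplexity than a scrambled one.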
Convolutional Neural Networks (12/02/24)
- Convolutional Neural Networks; Case studies on specific architectural modifications
- Slide: https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/lsmysrg2m7f3me
- Study materials: Deep Learning by Aaron Courville, Yoshua Bengio, and Ian Goodfellow
- CNN visualization : https://poloclub.github.io/cnn-explainer/
Word Vectors (08/02/24; 15/02/24)
- Issues with lexical-based and one-hot vectors, Word2Vec and its derivation, GloVe and its derivation, Evaluation, Bias
- Slides: https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/lsbt3xcyovy41r
- Study materials: Word2Vec paper: https://arxiv.org/abs/1301.3781; GloVe: https://nlp.stanford.edu/pubs/glove.pdf; Evaluation of embeddings: https://aclanthology.org/D15-1036/
- Relevant blogs: Word2Vec: https://jalammar.github.io/illustrated-word2vec/; GloVe: https://towardsdatascience.com/light-on-math-ml-intuitive-guide-to-understanding-glove-embeddings-b13b4f19c010
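The Word2Vec derivation can be made concrete with a sketch of skip-gram with negative sampling (SGNS): generating (center, context) pairs, the per-pair loss, and one SGD step using its hand-derived gradients. This is a minimal illustration under assumptions: the negative context vectors are passed in directly (real implementations sample them from a unigram distribution raised to the 3/4 power), and the vectors and learning rate are arbitrary.

```python
import math


def skipgram_pairs(tokens, window=2):
    """All (center, context) pairs within +/- `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sgns_loss(w, c_pos, c_negs):
    """-log sigma(c_pos . w) - sum_k log sigma(-c_neg_k . w) for one center word w."""
    loss = -math.log(sigmoid(dot(c_pos, w)))
    loss -= sum(math.log(sigmoid(-dot(c, w))) for c in c_negs)
    return loss


def sgns_step(w, c_pos, c_negs, lr=0.1):
    """One simultaneous SGD step; returns updated (w, c_pos, c_negs)."""
    g_pos = sigmoid(dot(c_pos, w)) - 1.0           # dL / d(c_pos . w)
    g_negs = [sigmoid(dot(c, w)) for c in c_negs]  # dL / d(c_neg . w)
    grad_w = [g_pos * c_pos[d] + sum(g * c[d] for g, c in zip(g_negs, c_negs))
              for d in range(len(w))]
    new_c_pos = [cp - lr * g_pos * wd for cp, wd in zip(c_pos, w)]
    new_c_negs = [[cd - lr * g * wd for cd, wd in zip(c, w)] for g, c in zip(g_negs, c_negs)]
    new_w = [wd - lr * gd for wd, gd in zip(w, grad_w)]
    return new_w, new_c_pos, new_c_negs


print(skipgram_pairs(["a", "b", "c"], window=1))
w, c_pos, c_negs = [0.1, 0.2], [0.3, -0.1], [[0.2, 0.2]]
print(sgns_loss(*sgns_step(w, c_pos, c_negs)) < sgns_loss(w, c_pos, c_negs))  # True
```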

---------------------------------- Minor Exam -------------------------------

Recurrent Neural Networks (07/03/24)
- Fixed-window model, Intro to RNNs, Derivation of backpropagation through time, applications of RNNs
- Slides: https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/ltkbmdtkd3m7d
- Reading materials: http://karpathy.github.io/2015/05/21/rnn-effectiveness/; https://www.deeplearningbook.org/contents/rnn.html
Recurrent Neural Networks -- Part II (11/03/24)
- Vanishing and exploding gradient, LSTM and GRU
- Slides: https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/ltloanepvz25jv
- Reading materials: https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1224/readings/cs224n-2019-notes05-LM_RNN.pdf; http://colah.github.io/posts/2015-08-Understanding-LSTMs; Proof of vanishing gradient problem: https://arxiv.org/pdf/1211.5063.pdf
Sequence-to-Sequence Models and Attention (11/03/24; 14/03/24)
- Introduction to Seq2Seq, Beam search, maths behind attention and self-attention
- Slides: https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/ltloaqrx7bf5lz
- Reading materials: Original seq2seq NMT paper: https://arxiv.org/pdf/1409.3215.pdf; Bahdanau et al., ICLR 2015 (paper that introduced attention): https://arxiv.org/pdf/1409.0473.pdf; Nice blog on Seq2Seq and attention: https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/
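Beam search can be sketched against any function that returns next-token log-probabilities for a prefix. The toy "model" below is a hypothetical lookup table built so that greedy decoding (beam size 1) and beam size 2 disagree. The sketch assumes no length normalization and shrinks the beam as hypotheses finish, which is one of several common variants.

```python
import math


def beam_search(next_log_probs, beam_size=2, max_len=5, eos="</s>"):
    """next_log_probs(prefix) -> {token: log-prob}; returns (best tokens, total log-prob)."""
    beams = [([], 0.0)]  # (tokens so far, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = [(tokens + [tok], score + lp)
                      for tokens, score in beams
                      for tok, lp in next_log_probs(tokens).items()]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:
            (finished if tokens[-1] == eos else beams).append((tokens, score))
        if not beams:  # every surviving hypothesis has ended
            break
    finished.extend(beams)  # keep unfinished hypotheses if max_len was reached
    return max(finished, key=lambda c: c[1])


# Hypothetical next-token distributions
TOY = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"</s>": 0.4, "C": 0.3, "D": 0.3},
    ("B",): {"</s>": 0.9, "C": 0.1},
}


def toy_model(prefix):
    return {tok: math.log(p) for tok, p in TOY.get(tuple(prefix), {"</s>": 1.0}).items()}


print(beam_search(toy_model, beam_size=1)[0])  # ['A', '</s>']  (greedy: 0.6 * 0.4 = 0.24)
print(beam_search(toy_model, beam_size=2)[0])  # ['B', '</s>']  (0.4 * 0.9 = 0.36)
```

Without length normalization, beam search tends to favour short outputs, which is why many systems divide the score by the hypothesis length.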
Transformer (18/03/24)
- Introduction to Transformer, BERT, ELMo
- Slides: Mostly whiteboard; slide: https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/ltvpxrd2w3uwz
- Lecture note: https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/ltx44kb0zu35j1
- Reading materials: Transformer paper (https://arxiv.org/pdf/1706.03762.pdf); Blog illustrating Transformers: https://jalammar.github.io/illustrated-transformer
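The core operation of the Transformer paper, scaled dot-product attention softmax(Q K^T / sqrt(d_k)) V, can be sketched with plain Python lists (one row per query, key, or value). This single-head sketch omits masking, multiple heads, and the learned projections.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for lists of row vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)  # one attention distribution per query
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out


# Identical keys give uniform weights, so the output is the mean of the values.
print(scaled_dot_product_attention([[1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]],
                                   [[2.0, 0.0], [0.0, 4.0]]))  # [[1.0, 2.0]]
```

Dividing by sqrt(d_k) keeps the dot products from growing with dimension, which would otherwise push the softmax into a near one-hot regime with tiny gradients.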
Guest lecture by Prof Sourish Dasgupta: Positional encoding, tokenization (18/03/24)
- Lecture note: https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/ltx419p9bgh5xa
Transformer (Contd.) (21/03/24)
- Types of Transformer, BERT, ELMo, pretraining encoder-only and decoder-only models
- Slides: Mostly on whiteboard
- Lecture note: https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/lu1co6ycybj4ny
- Reading materials: BERT: https://arxiv.org/pdf/1810.04805.pdf; ELMo: https://arxiv.org/pdf/1802.05365.pdf; Blog on Transfer learning: https://www.ruder.io/state-of-transfer-learning-in-nlp; Limits of Transfer Learning: https://arxiv.org/pdf/1910.10683.pdf; Visual illustration: http://jalammar.github.io/illustrated-bert/; Instruction fine-tuning: https://arxiv.org/pdf/2109.01652.pdf

---------------------------------- Midsem Break -------------------------------

Text-to-Text Transfer and Decoding (01/04/24)
- Explaining the T5 model and the C4 dataset; understanding the functionality of T5
- Slides: https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/luf2bom78bb669
- Reading materials: T5 paper: https://arxiv.org/abs/1910.10683
- Paper reading: https://www.ijcai.org/proceedings/2021/0315.pdf (by Sahil Mishra)
Prompting and Instruction Finetuning (04/04/24)
- Introduction to prompting, CoT, instruction finetuning
- Slides: https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/lul5q0vhe3l2rl
- Reading materials: Language models are few-shot learners (https://arxiv.org/abs/2005.14165), Chain-of-thought (https://arxiv.org/abs/2201.11903), https://arxiv.org/abs/2109.01652
- Paper reading: Locating and Editing Factual Associations in GPT (https://arxiv.org/abs/2202.05262) (by Palash Nandi)
Reinforcement Learning from Human Feedback (RLHF) (06/04/24)
- Introduction to RLHF, PPO, RLAIF
- Slides: On the whiteboard
- Note: https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/lunyqwoem7z78x
- Reading materials: https://arxiv.org/pdf/2212.08073.pdf; https://gist.github.com/yoavg/6bff0fecd65950898eba1bb321cfbd81; https://cdn.openai.com/papers/Training_language_models_to_follow_instructions_with_human_feedback.pdf
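The PPO step used in RLHF can be sketched through its per-token clipped surrogate objective, min(r A, clip(r, 1 - eps, 1 + eps) A), with the probability ratio r computed from log-probabilities. This is a minimal sketch under assumptions: eps = 0.2 is a typical default, the advantage is given, and the KL penalty against the reference model used in RLHF setups is omitted.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantage, eps=0.2):
    """Per-token PPO clipped surrogate (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)             # pi_new(a|s) / pi_old(a|s)
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)   # clip(ratio, 1 - eps, 1 + eps)
    return min(ratio * advantage, clipped * advantage)


# Positive advantage: the gain is capped once the ratio exceeds 1 + eps.
print(round(ppo_clip_objective(math.log(0.6), math.log(0.4), 1.0), 6))   # 1.2
# Negative advantage: the pessimistic (unclipped) term is kept.
print(round(ppo_clip_objective(math.log(0.6), math.log(0.4), -1.0), 6))  # -1.5
```

In training, the loss is the negative mean of this objective over sampled tokens; the clipping removes any incentive to move the policy far from the one that generated the samples.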
Direct Preference Optimization (DPO) (08/04/24)
- DPO derivation
- Slides: On the whiteboard
- Class Note (boardwork): https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/luqtfsgaav3h5
- Lecture note: https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/lur7pj0bej4wn
- Reading materials: DPO: https://arxiv.org/abs/2305.18290
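The end point of the DPO derivation, the per-pair loss -log sigma(beta [(log pi_theta(y_w|x) - log pi_ref(y_w|x)) - (log pi_theta(y_l|x) - log pi_ref(y_l|x))]), can be sketched from sequence log-probabilities. The sketch assumes those log-probabilities are already computed; beta = 0.1 is only an illustrative default.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (y_w preferred over y_l)."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp        # log pi_theta(y_w|x)/pi_ref(y_w|x)
    rejected_ratio = policy_rejected_logp - ref_rejected_logp  # log pi_theta(y_l|x)/pi_ref(y_l|x)
    margin = beta * (chosen_ratio - rejected_ratio)
    z = -margin
    # -log sigmoid(margin) = softplus(-margin), computed in a numerically stable way
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


print(round(dpo_loss(-5.0, -5.0, -5.0, -5.0), 6))  # 0.693147 (log 2: policy equals reference)
print(dpo_loss(-4.0, -5.0, -5.0, -5.0) < math.log(2))  # True: chosen response became more likely
```

Because only log-ratios against the frozen reference appear, no separate reward model or sampling loop is needed, which is the main practical appeal of DPO over PPO-based RLHF.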
- Optional reading: KTO: https://github.com/ContextualAI/HALOs/blob/main/assets/report.pdf
Model Compression Techniques (13/04/24)
- PEFT, distillation, etc.
- Slide: https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/luybyfb1jcy3mk
- Reading materials: https://arxiv.org/abs/2308.07633; LoRA: https://arxiv.org/abs/2106.09685
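The LoRA idea from the reading above can be sketched as a linear layer y = W0 x + (alpha / r) B A x. The frozen weight W0 is kept, and only the low-rank factors A (r x d_in) and B (d_out x r) would be trained. The class name `LoRALinear`, the Gaussian init scale, and the defaults r = 2 and alpha = 4 are illustrative assumptions; B is zero-initialized so the adapted layer starts out identical to the base layer.

```python
import random


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


class LoRALinear:
    """Hypothetical LoRA-adapted layer: y = W0 x + (alpha / r) * B (A x)."""

    def __init__(self, W0, r=2, alpha=4, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W0), len(W0[0])
        self.W0 = W0                                                            # frozen
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]  # trainable
        self.B = [[0.0] * r for _ in range(d_out)]                              # trainable, zero init
        self.scale = alpha / r

    def __call__(self, x):
        base = matvec(self.W0, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


layer = LoRALinear([[1, 2, 3], [4, 5, 6]], r=2)
print(layer([1, 0, -1]))         # [-2.0, -2.0], identical to W0 x because B starts at zero
print(layer.trainable_params())  # 10 = r * d_in + d_out * r, versus 6 entries in W0
```

The saving is negligible for this toy layer but large in practice: for a d x d weight, LoRA trains 2rd parameters instead of d squared.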
Constitutional AI and Soft Prompts (15/04/24)
- Prompt tuning and prefix-tuning, introduction to retrieval-augmented models -- REALM, nearest-neighbour machine translation
- Slide: https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/lv0rd5ccdy27il
- Reading materials: Constitutional AI: https://arxiv.org/abs/2212.08073; Prefix-tuning: https://aclanthology.org/2021.acl-long.353.pdf; Explaining in-context learning: https://arxiv.org/abs/2111.02080; REALM paper: https://arxiv.org/pdf/2002.08909.pdf (On HuggingFace: https://huggingface.co/docs/transformers/model_doc/realm); Surveys: https://arxiv.org/abs/2302.07842; https://arxiv.org/abs/2202.01110; Other readings: https://arxiv.org/pdf/2010.00710.pdf
- Paper reading: Multilingual LLMs are Better Cross-lingual In-context Learners with Alignment (https://aclanthology.org/2023.acl-long.346.pdf) (by Eshaan Tanwar)
Retrieval-augmented Generation (18/04/24)
- Introduction to retrieval-augmented models -- REALM, ATLAS, Nearest-neighbour machine translation, RAG evaluation, etc.
- Slide: https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/lv54dmyb4lhe6
- Reading materials: REALM paper: https://arxiv.org/pdf/2002.08909.pdf (On HuggingFace: https://huggingface.co/docs/transformers/model_doc/realm); Surveys: https://arxiv.org/abs/2302.07842; https://arxiv.org/abs/2202.01110; Other readings: https://arxiv.org/pdf/2010.00710.pdf; https://arxiv.org/pdf/2005.11401.pdf; ATLAS: https://arxiv.org/pdf/2208.03299.pdf; Evaluation: https://arxiv.org/abs/2309.15217
LLM Hallucinations, Multilingual LLMs (22/04/24)
- Introduction to hallucination, types of hallucination and mitigation, Intro to multilingual LLMs, etc.
- Slide: https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/lvatck6hfva1bo; https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/lvatcitqvhu1an
- Reading materials: Hallucination survey: https://arxiv.org/abs/2311.05232; https://arxiv.org/abs/2401.11817; https://aclanthology.org/2023.findings-emnlp.123; Multilingual LLMs: https://arxiv.org/pdf/2005.00052.pdf; https://arxiv.org/pdf/2010.11125.pdf
- Paper reading: Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection (https://arxiv.org/abs/2310.11511) (by Michael)
Tool Augmentation with LMs (25/04/24)
- Toolformer and SyReLM.
- Slide: https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/lvf38hd95vd25c
- Reading materials: Toolformer: https://arxiv.org/abs/2302.04761; SyReLM: https://arxiv.org/abs/2312.05571; Optional read: https://arxiv.org/abs/2301.13867; Survey: https://arxiv.org/abs/2302.07842
Bias, Fairness and Other Ethical Aspects + Conclusion (26/04/24)
- Different types of biases, bias mitigation, and current issues in NLP
- Slides: https://piazza.com/class_profile/get_resource/lqt8bj4zlof29w/lvgqh4dl6hf5ek
- Reading materials: A course on Computational Ethics for NLP (http://demo.clab.cs.cmu.edu/ethical_nlp2020/#syllabus); Blog: https://huggingface.co/blog/evaluating-llm-bias; Cognitive bias in NLP: https://arxiv.org/abs/2304.01358