CS728 Organization of Web Information
CS635 (Web Search and Mining) and CS728 (Organization of Web Information) will be replaced by two new courses. The first will be called "Indexing, retrieval and learning for text and graphs" and will run in Autumn. The second, which will run in Spring, will be named soon. The course codes will change, and the contents will change significantly. This page will eventually disappear.
Course contents (through Spring 2022)
Part 1: Building blocks
Sequence, tree and graph decoding and labeling
HMM, chain MRF, CRF (CRF decoders used in deep networks)
Extensions beyond chains: PCFG, skip-chain CRF, message passing, dual decomposition
Dense text representation
word2vec, LSI, GloVe
CNN, LSTM, GRU, self-attention, transformer
Sparse attention: Longformer, BigBird
DPR (replication study on DPR), dual encoder, ColBERT
Sequence to sequence (encoder-decoder)
Without attention, left-to-right
Decoder-to-encoder attention (Bahdanau)
Bidirectional, masking
Evils of MAP decoding and mitigations
Graph neural networks
GCN, GraphSAGE, RGCN ... embedding-based message passing
Graph attention network, graph transformer, TokenGT
Part 2: Applications
Applications of sequence and graph labeling
NER and fine-type tagging. Multi-instance and distant supervision.
Named entity disambiguation/entity linking. Collective inference.
Closed-domain relation classification: PCNN, BERT-based. Distant supervision.
Coreference resolution, correlation clustering, record linkage
Knowledge graphs and their deep representation
Translation models: TransE, XTransY
Rotation models: RotatE
Multiplicative/factorization models: DistMult, ComplEx, etc.
Completion and alignment tasks; cross-language transfer
Question answering
Closed book and open book architectures
Against KGs (KGQA and semantic interpretation); GNNs in KGQA
Against corpus, reading comprehension (RC) ... SQuAD, HotpotQA, BigTextQA
Multi-modal: text + table (...), time (EXAQT), numerical reasoning (Atlas, MultiHiertt), chart, images etc.
Against relational tables (text2sql and semantic interpretation)
Multi-tasked language models with prompting
T5, PaLM, GPTx, BLOOM ...
Adapter architectures (GreaseLM, K-Adapter), HuggingFace AdapterHub
Open information extraction
OpenIE6, lifelong learning (NELL)
Cross-lingual OpenIE training transfer
Course material
Any credentials needed will be distributed separately.
Course calendar for Spring 2023 (work in progress).
Many past exams with solutions are provided here. Some links may ask for a login and password, which will be posted on Moodle.
Prerequisites
Check the prerequisites for CS728 here.
Evaluation components
(May change from year to year.)
Best 60 marks from the following 90 marks:
Midterm exam/quiz1 = 30%
Quiz2 = 30%
Final exam long answer part = 30%
Final exam MCQ part = 10%
Weekly SAFE quizzes = 10-15%
Assignments = 15-20%
Extra credit is available for projects and critical paper reviews.
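As a rough illustration of the "best 60 marks from the following 90 marks" rule, the sketch below assumes it means that the two highest of the three 30-mark components (midterm/quiz1, quiz2, final exam long answer part) are counted. That reading is an assumption, and the function name `best_60_of_90` is hypothetical; the actual computation may differ from year to year.

```python
def best_60_of_90(midterm: float, quiz2: float, final_long: float) -> float:
    """Return the sum of the two highest of three 30-mark components.

    Assumption: "best 60 from 90" means dropping the lowest of the
    three components; this is an illustrative reading, not official policy.
    """
    scores = [midterm, quiz2, final_long]
    for s in scores:
        if not 0 <= s <= 30:
            raise ValueError("each component is assumed to be marked out of 30")
    # Sort in descending order and keep the top two components.
    return sum(sorted(scores, reverse=True)[:2])


if __name__ == "__main__":
    # Hypothetical marks: the lowest component (10) is dropped.
    print(best_60_of_90(20, 25, 10))
```

Under this reading, a weak score in any one of the three components does not count against the 60 marks; the remaining components (MCQ part, SAFE quizzes, assignments) are added separately.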