Course Schedule
Introduction
Sep. 9: Course description. An overview of the research topics we will cover during the semester (recording).
Sep. 11: How to read and present research material. Sign up for paper preferences and teams (recording).
How to Read a CS Research Paper by Philip Fong.
How to do research by Bill Freeman.
How to write a good paper by Bill Freeman (video).
Natural Language Processing, Recap & Overview
Sep. 14: Quick introduction to NLP, word embeddings (recording).
Supplementary reading: The Illustrated Word2vec
Word Embeddings and Recurrent NNs by Eugene Charniak
Sep. 16: Sequence-to-sequence modeling, recurrent neural networks (recording).
Supplementary reading: Chapter 10 of Deep Learning
Sep. 18: Attention in NLP (recording).
Supplementary reading: Effective Approaches to Attention-based Neural Machine Translation
Assignment #1 is released!
Computer Vision, Recap & Overview
Sep. 21: Attention in Machine Translation (recording).
Supplementary reading: Effective Approaches to Attention-based Neural Machine Translation
Sep. 23: Image classification, convolutional neural networks (recording, slides).
Supplementary reading: Chapter 9 of Deep Learning
Sep. 25: Object detection and semantic segmentation (recording, slides).
Supplementary reading: Mask R-CNN
Joint visual-semantic embeddings
Sep. 28: DeViSE: A Deep Visual-Semantic Embedding Model
Presented by: Ifrah Idrees and Zhizhong Chen (recording)
Supplementary reading: WSABIE: Scaling Up To Large Vocabulary Image Annotation
Image captioning and its evaluation
Sep. 30: Baby Talk: Understanding and Generating Image Descriptions
Presented by: Xiling Zhang and Peter Lyu (recording)
Supplementary reading: Neural Baby Talk
Assignment #1 due.
Assignment #2 released.
Oct. 2: Show and Tell: A Neural Image Caption Generator
Presented by: Sean Hastings and Suchen Zheng (recording)
Supplementary reading: DenseCap: Fully Convolutional Localization Networks for Dense Captioning
Oct. 5: SPICE: Semantic Propositional Image Caption Evaluation
Presented by: William Kuenne and Tyler DeFroscia (recording)
Supplementary reading: Improved Image Captioning via Policy Gradient optimization of SPIDEr
Attention, self-attention and Transformers
Oct. 7: Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Presented by: Zachary Hoffman and Deniz Bayazit (recording)
Supplementary reading: Learning Deep Features for Discriminative Localization
Oct. 9: Attention Is All You Need
Presented by: Houyu Zhang, Yihang Dong, and Daniel Kotroco (recording)
Supplementary reading: Stand-Alone Self-Attention in Vision Models
Assignment #2 due.
Oct. 12: Indigenous Peoples’ Day, no classes.
Oct. 14: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Presented by: Min Jean Cho, Lu Shao, and Tian Yun (recording)
Supplementary reading: Language Models are Few-Shot Learners (GPT-3)
Assignment #3 released: submission
Sign up for final project teams: sheet
Sign up for paper presentation preferences: sheet
Oct. 16: End-To-End Memory Networks
Presented by: Nihal Nayak and Kenny Jones (recording)
Supplementary reading: REALM: Retrieval-Augmented Language Model Pre-Training
Oct. 19: VideoBERT: A Joint Model for Video and Language Representation Learning
Presented by: Michael Mao and Ziyang Long
Supplementary reading: ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Oct. 21: Image GPT: Generative Pretraining from Pixels
Presented by: Ruochen Zhang and Rafael Rodriguez (recording)
Supplementary reading: Representation Learning with Contrastive Predictive Coding
Oct. 23: Final project idea pitch
Roughly 10 groups, 5 minutes each (recording).
Oct. 26: Invited talk by Carl Vondrick on learning from unlabeled videos.
Assignment #3 due.
Visual question answering
Oct. 28: GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering
Presented by: Ruochen Zhang and Peter Lyu
Supplementary reading: CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
Oct. 30: From Recognition to Cognition: Visual Commonsense Reasoning
Presented by: Xiling Zhang and Nihal Nayak
Supplementary reading: VIOLIN: A Large-Scale Dataset for Video-and-Language Inference
Nov. 2: Learning to Reason: End-To-End Module Networks for Visual Question Answering
Presented by: Zachary Hoffman and Zhizhong (Isaac) Chen
Supplementary reading: Measuring compositionality in representation learning
Embodied AI, visual-language navigation
Nov. 4: Invited talk by Peter Anderson on visual-language navigation
Nov. 6: Speaker-Follower Models for Vision-and-Language Navigation
Presented by: Rafael Rodriguez, William Kuenne, and Deniz Bayazit
Supplementary reading: Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation
Nov. 9: Habitat: A Platform for Embodied AI Research
Presented by: Ifrah Idrees and Kenny Jones
Supplementary reading: ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
Multimodal learning
Nov. 11: See, Hear, Explore: Curiosity via Audio-Visual Association
Presented by: Sean Hastings, Lu Shao, and Yihang Dong
Supplementary reading: Objects that Sound
Nov. 13: Speech2Action: Cross-modal Supervision for Action Recognition
Presented by: Suchen Zheng and Min Jean Cho
Supplementary reading: MovieGraphs: Towards Understanding Human-Centric Situations from Videos
Nov. 16: End-to-End Learning of Visual Representations from Uncurated Instructional Videos
Presented by: Tian Yun, Michael Mao, and Tyler DeFroscia
Supplementary reading: Visual Grounding in Video for Unsupervised Word Translation
Nov. 18: Invited talk by Miki Rubinstein on audio-visual learning.
Nov. 20: Invited talk by Jiajun Wu on neural symbolic VQA.
Dataset and model biases
Nov. 23: Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
Presented by: Houyu Zhang and Daniel Kotroco
Supplementary reading: Women also Snowboard: Overcoming Bias in Captioning Models
Nov. 25 & 27: Thanksgiving, no classes.
Nov. 30: Project presentation, part I
Dec. 2: Project presentation, part II