For detailed lecture schedule and recommended papers for each class, see this.
Lecturer: Aishwarya Agrawal
Recorded Lecture (See Piazza for passcode)
Slides (key), Slides (pdf)
Lecturer: Aishwarya Agrawal
Recorded Lecture (See Piazza for passcode)
Slides (key), Slides (pdf)
Lecturer: Aishwarya Agrawal
Recorded Lecture (See Piazza for passcode)
Slides (key), Slides (pdf)
Lecturer: Aishwarya Agrawal
Recorded Lecture (See Piazza for passcode)
Slides (key), Slides (pdf)
Review Paper: Learning Deep Structure-Preserving Image-Text Embeddings
Aishwarya's class discussion slides
Paper Presentation I: None.
Paper Presentation II: A Corpus for Reasoning About Natural Language Grounded in Photographs
Lecturer: Xing Han Lu
Slides
Project Presentation: Motion, Objects, Language
Project Lead: Simon Ramstedt
Slides
Recorded Lecture (See Piazza for passcode)
Review Paper: Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Aishwarya's class discussion slides
Paper Presentation I: None.
Paper Presentation II: Exploring Nearest Neighbor Approach for Image Captioning
Lecturer: Edward Son
Slides
Project Presentation: Improving Evaluation Consistency via Image-aware Textual Expansion
Project Lead: Xing Han Lu
Slides
Recorded Lecture (See Piazza for passcode)
Review Paper: Neural Baby Talk
Aishwarya's class discussion slides
Paper Presentation I: Neural Baby Talk
Lecturer: Joshua Jacobs
Slides
Paper Presentation II: None
Project Presentation: Toward Improving Language Modeling with Visual Grounding Information
Project Leads: Ge Li and Benjamin Akera
Slides
Recorded Lecture (See Piazza for passcode)
Review Paper: VQA: Visual Question Answering
Aishwarya's class discussion slides
Paper Presentation I: VQA: Visual Question Answering
Lecturer: Benjamin Akera
Slides
Paper Presentation II: None
Project Presentation: CLIPing VQA
Project Lead: Etienne Boucher
Slides
Recorded Lecture (See Piazza for passcode)
Review Paper: Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Aishwarya's class discussion slides
Paper Presentation I: Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Lecturer: Prishruit Punia
Slides
Paper Presentation II: None
Project Presentation: Referring Image Segmentation Using ROI Features
Project Lead: Edward Son
Slides
Additional Project Presentation: OOD Generalization with CLIP on VQA
Project Lead: Sai Aravind
Slides
Recorded Lecture (See Piazza for passcode)
Review Paper: Deep Compositional Question Answering with Neural Module Networks
Paper Presentation I: Deep Compositional Question Answering with Neural Module Networks
Lecturer: Emanuele Bugliarello
Slides
Paper Presentation II: Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering
Lecturer: Joshua Jacobs
Slides
Project Presentation: Multimodal Few-shot Learning with Frozen Pre-trained Models
Project Lead: Oscar Mañas
Slides
Recorded Lecture (See Piazza for passcode)
Review Paper: Visual Dialog
Paper Presentation I: Visual Dialog
Lecturer: Xing Han Lu
Slides
Paper Presentation II: Visual Dialogue: Datasets and Models
Lecturer: Etienne Boucher
Slides
Project Presentation: iCLIP
Project Lead: Farzad Salajegheh
Slides
Recorded Lecture (See Piazza for passcode)
Review Paper: Multimodal Explanations: Justifying Decisions and Pointing to Explanations
Paper Presentation I: Multimodal Explanations: Justifying Decisions and Pointing to Explanations
Lecturer: Ge Li
Slides
Paper Presentation II: None
Project Presentation: Comparative Study of Top-Down Attention Models through Bottom-up Attention
Project Lead: Romeo Anawi
Slides
Project Presentation: Text/Character Recognition with Various Models
Project Lead: Joshua Jacobs
Slides
Project Presentation: Continual Image Caption
Project Lead: Prishruit Punia
Slides
Recorded Lecture (See Piazza for passcode)
Review Paper: ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Paper Presentation I: This presentation is moved to the next class.
Paper Presentation II: None
Project Presentation: None
Review Paper: Unifying Vision-and-Language Tasks via Text Generation
Paper Presentation I: ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Lecturer: Romeo Anawi
Slides
Paper Presentation II: VLP Models without Task-Specific Heads (Mini Survey)
Lecturer: Oscar Manas
Slides
The above presentation has been moved to the March 22nd class because the presenter was unwell.
Project Presentation: Analyzing the robustness of V&L evaluation metrics
Lecturer: Xing Han Lu
Slides
Recorded Lecture (See Piazza for passcode)
Review Paper: VirTex: Learning Visual Representations from Textual Annotations
Paper Presentation I: VirTex: Learning Visual Representations from Textual Annotations
Lecturer: Oscar Manas
Slides
Paper Presentation II: Learning Transferable Visual Models from Natural Language Supervision
Lecturer: Farzad Salajegheh
Slides
Recorded Lecture (See Piazza for passcode)
Review Paper: Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision
Paper Presentation I: Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision
Lecturer: Farzad Salajegheh
Slides
Paper Presentation II: VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer
Lecturer: Prishruit Punia
Slides
Project Presentation: CLIPing VQA
Lecturer: Etienne Boucher
Slides
Recorded Lecture (See Piazza for passcode)
Review Paper: Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models
Paper Presentation I: VLP Models without Task-Specific Heads (Mini Survey)
Lecturer: Oscar Manas
Slides
Project Presentation: Improving Language Modelling with Visual Grounding Information
Lecturer: Ge Li & Benjamin Akera
Slides
Project Presentation: Analysis of Using Detection Features in Referring Image Segmentation
Lecturer: Edward Son
Slides
Recorded Lecture (See Piazza for passcode)
Review Paper: Analyzing the Behaviors of Visual Question Answering Models
Paper Presentation I: Analyzing the Behaviors of Visual Question Answering Models
Lecturer: Edward Son
Slides
Paper Presentation II: Gender Bias in Multimodal Models
Lecturer: Ge Li
Slides
Project Presentation: Data-efficient Adaptation of Large Pretrained Models for Multimodal Few-shot Learning
Lecturer: Oscar Manas
Slides
Recorded Lecture (See Piazza for passcode)
Review Paper: Don't just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
Paper Presentation I: Don't just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
Lecturer: Etienne Boucher
Slides
Paper Presentation II: RUBi: Reducing Unimodal Biases for VQA
Lecturer: Romeo Anawi
Slides
Project Presentation: iCLIP
Lecturer: Farzad Salajegheh
Slides
Recorded Lecture (See Piazza for passcode)
Review Paper: Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision
Paper Presentation I: Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision
Lecturer: Benjamin Akera
Slides
Project Presentation: Analyzing Impact of Bottom-up Attention
Lecturer: Romeo Anawi
Slides
Project Presentation: Text/Character Recognition
Lecturer: Joshua Jacobs
Slides
Project Presentation: Continual Image Captioning
Lecturer: Prishruit Punia
Slides
Recorded Lecture (See Piazza for passcode)
Project Presentation: Assessing Robustness of Evaluation Metrics for Vision and Language
Lecturer: Xing Han Lu
Slides
Project Presentation: Towards Improving Language Modelling with Visual Grounding Information
Lecturer: Ge Li and Benjamin Akera
Slides
Project Presentation: CLIPing VQA
Lecturer: Etienne Boucher
Slides
Project Presentation: Analysis of Using Detection Features in Referring Image Segmentation
Lecturer: Edward Son
Slides
Recorded Lecture (See Piazza for passcode)
Project Presentation: Data Efficient Adaptation of Large Pretrained Models for Multimodal Few-Shot Learning
Lecturer: Oscar Manas
Slides
Project Presentation: iCLIP: Iterative CLIP
Lecturer: Farzad Salajegheh
Slides
Project Presentation: Analyzing Impact of Bottom-Up Attention
Lecturer: Romeo Anawi
Slides
Project Presentation: Text and Character Recognition with Various Techniques
Lecturer: Joshua Jacobs
Slides
Project Presentation: Continual Image Caption
Lecturer: Prishruit Punia
Slides
Recorded Lecture (See Piazza for passcode)
Title: AREVL: Assessing Robustness of Evaluation Metrics for Vision and Language
Lecturer: Xing Han Lu
Video
Title: iCLIP: Iterative CLIP
Lecturer: Farzad Salajegheh
Video
Title: Data-efficient Adaptation of Large Pretrained Models for Multimodal Few-Shot Learning
Lecturer: Oscar Manas
Video
Title: Towards Improving Language Modelling with Visual Grounding Information
Lecturer: Benjamin Akera, Ge Li
Video
Title: Analysis of Using Detection Features in Referring Image Segmentation
Lecturer: Edward Son
Video
Title: Text and Character Recognition with Various Techniques
Lecturer: Joshua Jacobs
Video
Title: Analyzing Impact of Bottom-up Attention
Lecturer: Romeo Anawi
Video
Title: Continual Image Captioning
Lecturer: Prishruit Punia
Video