IFT 6765 - H2022 - Lecture Schedule

IFT 6765 - Links between Computer Vision and Language

Course Lectures

For detailed lecture schedule and recommended papers for each class, see this.

Lecture 1 (01/18/2022) : Introduction to the course

Lecturer: Aishwarya Agrawal
Recorded Lecture (See Piazza for passcode)
Slides (key), Slides (pdf)

Lecture 2 (01/21/2022) : Vision-Language landscape before Transformer + Pre-training

Lecturer: Aishwarya Agrawal
Recorded Lecture (See Piazza for passcode)
Slides (key), Slides (pdf)

Lecture 3 (01/25/2022) : Vision-Language landscape during Transformer + Pre-training 

Lecturer: Aishwarya Agrawal
Recorded Lecture (See Piazza for passcode)
Slides (key), Slides (pdf)

Lecture 4 (01/28/2022) : Shortcomings of Vision-Language models and Open Challenges

Lecturer: Aishwarya Agrawal
Recorded Lecture (See Piazza for passcode)
Slides (key), Slides (pdf)

Lecture 5 (02/01/2022) : Image Retrieval and Referring Expressions

Review Paper: Learning Deep Structure-Preserving Image-Text Embeddings
Aishwarya's class discussion slides

Paper Presentation I: None.

Paper Presentation II: A Corpus for Reasoning About Natural Language Grounded in Photographs
Lecturer: Xing Han Lu
Slides

Project Presentation: Motion, Objects, Language
Project Lead: Simon Ramstedt
Slides

Recorded Lecture (See Piazza for passcode)

Lecture 6 (02/04/2022) : Image Captioning: Part I

Review Paper: Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Aishwarya's class discussion slides

Paper Presentation I: None.

Paper Presentation II: Exploring Nearest Neighbor Approach for Image Captioning
Lecturer: Edward Son
Slides

Project Presentation: Improving Evaluation Consistency via Image-aware Textual Expansion
Project Lead: Xing Han Lu
Slides

Recorded Lecture (See Piazza for passcode)

Lecture 7 (02/08/2022) : Image Captioning: Part II

Review Paper: Neural Baby Talk
Aishwarya's class discussion slides

Paper Presentation I: Neural Baby Talk
Lecturer: Joshua Jacobs
Slides

Paper Presentation II: None

Project Presentation: Toward Improving Language Modeling with Visual Ground Information
Project Leads: Ge Li and Benjamin Akera
Slides

Recorded Lecture (See Piazza for passcode)

Lecture 8 (02/11/2022) : Visual Question Answering: Datasets

Review Paper: VQA: Visual Question Answering
Aishwarya's class discussion slides

Paper Presentation I: VQA: Visual Question Answering
Lecturer: Benjamin Akera
Slides

Paper Presentation II: None

Project Presentation: CLIPing VQA
Project Lead: Etienne Boucher
Slides

Recorded Lecture (See Piazza for passcode)

Lecture 9 (02/15/2022) : Visual Question Answering: Models (Part I)

Review Paper: Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Aishwarya's class discussion slides

Paper Presentation I: Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Lecturer: Prishruit Punia
Slides

Paper Presentation II: None

Project Presentation: Referring Image Segmentation Using ROI Features
Project Lead: Edward Son
Slides

Additional Project Presentation: OOD Generalization with CLIP on VQA
Project Lead: Sai Aravind
Slides

Recorded Lecture (See Piazza for passcode)

Lecture 10 (02/18/2022) : Visual Question Answering: Models (Part II)

Review Paper: Deep Compositional Question Answering with Neural Module Networks

Paper Presentation I: Deep Compositional Question Answering with Neural Module Networks
Lecturer: Emanuele Bugliarello
Slides

Paper Presentation II: Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering
Lecturer: Joshua Jacobs
Slides

Project Presentation: Multimodal Few-shot Learning with Frozen Pre-trained Models
Project Lead: Oscar Mañas
Slides

Recorded Lecture (See Piazza for passcode)

Lecture 11 (02/22/2022) : Visual Dialog

Review Paper: Visual Dialog

Paper Presentation I: Visual Dialog
Lecturer: Xing Han Lu
Slides

Paper Presentation II: Visual Dialogue: Datasets and Models
Lecturer: Etienne Boucher
Slides

Project Presentation: iCLIP
Project Lead: Farzad Salajegheh
Slides

Recorded Lecture (See Piazza for passcode)

Lecture 12 (02/25/2022) : Interpretability and Explainability

Review Paper: Multimodal Explanations: Justifying Decisions and Pointing to Explanations

Paper Presentation I: Multimodal Explanations: Justifying Decisions and Pointing to Explanations
Lecturer: Ge Li
Slides

Paper Presentation II: None

Project Presentation: Comparative Study of Top-Down Attention Models through Bottom-up Attention
Project Lead: Romeo Anawi
Slides

Project Presentation: Text/Character Recognition with Various Models
Project Lead: Joshua Jacobs
Slides

Project Presentation: Continual Image Caption
Project Lead: Prishruit Punia
Slides

Recorded Lecture (See Piazza for passcode)

Lecture 13 (03/08/2022) : VLP Models with task-specific heads

This lecture is cancelled.

Review Paper: ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

Paper Presentation I: This presentation is moved to the next class.

Paper Presentation II: None

Project Presentation: None

Lecture 14 (03/11/2022) : VLP Models without task-specific heads

Review Paper: Unifying Vision-and-Language Tasks via Text Generation

Paper Presentation I: ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Lecturer: Romeo Anawi
Slides

Paper Presentation I: ~~VLP Models without Task-Specific Heads (Mini Survey)~~
Lecturer: ~~Oscar Mañas~~
~~Slides~~
The above presentation was moved to the March 22nd class because the presenter was unwell.

Project Presentation: Analyzing the robustness of V&L evaluation metrics
Lecturer: Xing Han Lu
Slides

Recorded Lecture (See Piazza for passcode)

Lecture 15 (03/15/2022) : VLP Models for Vision

Review Paper: VirTex: Learning Visual Representations from Textual Annotations

Paper Presentation I: VirTex: Learning Visual Representations from Textual Annotations
Lecturer: Oscar Mañas
Slides

Paper Presentation II: Learning Transferable Visual Models from Natural Language Supervision
Lecturer: Farzad Salajegheh
Slides

Recorded Lecture (See Piazza for passcode)

Lecture 16 (03/18/2022) : VLP Models for Language

Review Paper: Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision

Paper Presentation I: Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision
Lecturer: Farzad Salajegheh
Slides

Paper Presentation II: VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer
Lecturer: Prishruit Punia
Slides

Project Presentation: CLIPing VQA
Lecturer: Etienne Boucher
Slides

Recorded Lecture (See Piazza for passcode)

Lecture 17 (03/22/2022) : Analyzing VLP Models

Review Paper: Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models

Paper Presentation I: VLP Models without Task-Specific Heads (Mini Survey)
Lecturer: Oscar Mañas
Slides

Project Presentation: Improving Language Modelling with Visual Grounding Information
Lecturer: Ge Li & Benjamin Akera
Slides

Project Presentation: Analysis of Using Detection Features in Referring Image Segmentation
Lecturer: Edward Son
Slides

Recorded Lecture (See Piazza for passcode)

Lecture 18 (03/25/2022) : Shortcomings of Vision-Language Models

Review Paper: Analyzing the Behaviors of Visual Question Answering Models

Paper Presentation I: Analyzing the Behaviors of Visual Question Answering Models
Lecturer: Edward Son
Slides

Paper Presentation II: Gender Bias in Multimodal Models
Lecturer: Ge Li
Slides

Project Presentation: Data-efficient Adaptation of Large Pretrained Models for Multimodal Few-shot Learning
Lecturer: Oscar Mañas
Slides

Recorded Lecture (See Piazza for passcode)

Lecture 20 (03/29/2022) : Beyond Statistical Learning in Vision-Language: Part 1

Review Paper: Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering

Paper Presentation I: Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
Lecturer: Etienne Boucher
Slides

Paper Presentation II: RUBi: Reducing Unimodal Biases for VQA
Lecturer: Romeo Anawi
Slides

Project Presentation: iCLIP
Lecturer: Farzad Salajegheh
Slides

Recorded Lecture (See Piazza for passcode)

04/01/2022 : NO LECTURE TODAY

Lecture 21 (04/05/2022) : Beyond Statistical Learning in Vision-Language: Part 2

Review Paper: Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision

Paper Presentation I: Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision
Lecturer: Benjamin Akera
Slides

Project Presentation: Analyzing Impact of Bottom-up Attention
Lecturer: Romeo Anawi
Slides

Project Presentation: Text/Character Recognition
Lecturer: Joshua Jacobs
Slides

Project Presentation: Continual Image Captioning
Lecturer: Prishruit Punia
Slides

Recorded Lecture (See Piazza for passcode)

Lecture 22 (04/08/2022) : Final Project Presentations

Project Presentation: Assessing Robustness of Evaluation Metrics for Vision and Language
Lecturer: Xing Han Lu
Slides

Project Presentation: Towards Improving Language Modelling with Visual Grounding Information
Lecturer: Ge Li and Benjamin Akera
Slides

Project Presentation: CLIPing VQA
Lecturer: Etienne Boucher
Slides

Project Presentation: Analysis of Using Detection Features in Referring Image Segmentation
Lecturer: Edward Son
Slides

Recorded Lecture (See Piazza for passcode)

Lecture 23 (04/12/2022) : Final Project Presentations

Project Presentation: Data Efficient Adaptation of Large Pretrained Models for Multimodal Few-Shot Learning
Lecturer: Oscar Mañas
Slides

Project Presentation: iCLIP: Iterative CLIP
Lecturer: Farzad Salajegheh
Slides

Project Presentation: Analyzing Impact of Bottom-Up Attention
Lecturer: Romeo Anawi
Slides

Project Presentation: Text and Character Recognition with Various Techniques
Lecturer: Joshua Jacobs
Slides

Project Presentation: Continual Image Caption
Lecturer: Prishruit Punia
Slides

Recorded Lecture (See Piazza for passcode)

Lecture 24 (04/19/2022) : Spotlight Video

Title: AREVL: Assessing Robustness of Evaluation Metrics for Vision and Language
Lecturer: Xing Han Lu
Video

Title: iCLIP: Iterative CLIP
Lecturer: Farzad Salajegheh
Video

Title: Data-efficient Adaptation of Large Pretrained Models for Multimodal Few-Shot Learning
Lecturer: Oscar Mañas
Video

Title: Towards Improving Language Modelling with Visual Grounding Information
Lecturer: Benjamin Akera, Ge Li
Video

Title: Analysis of Using Detection Features in Referring Image Segmentation
Lecturer: Edward Son
Video

Title: Text and Character Recognition with Various Techniques
Lecturer: Joshua Jacobs
Video

Title: Analyzing Impact of Bottom-up Attention
Lecturer: Romeo Anawi
Video

Title: Continual Image Captioning
Lecturer: Prishruit Punia
Video
