IFT 6765 - Links between Computer Vision and Language

Course Lectures

Lecture 1 (01/17/2024) : Introduction to the course

Lecturer: Aishwarya Agrawal
Slides (pdf, keynote)

Lecture 2 (01/19/2024, 01/24/2024, 01/26/2024) : Vision-Language landscape before Transformer + Pre-training

Lecturer: Aishwarya Agrawal
Slides (pdf, keynote)

Lecture 3 (01/26/2024, 01/31/2024) : Vision-Language landscape during Transformer + Pre-training


Lecturer: Aishwarya Agrawal
Slides (pdf, keynote)

Lecture 4 (02/02/2024) : Shortcomings of Vision-Language models and Open Challenges

Lecturer: Aishwarya Agrawal
Slides (pdf, keynote)

Lecture 5 (02/16/2024) : Image captioning

Review paper

Paper presentation 1:  Image captioning  Slides

Paper presentation 2:  Image captioning  Slides

Project presentation : Multimodal Retrieval Augmented Generation for Natural Language Query in Egocentric Video Slides

Lecture 6 (02/21/2024) : Visual Question Answering: Datasets

Review paper

Paper presentation 1: Visual Question Answering  Slides

Paper presentation 2: VQA:NMN models Slides

Project presentation 1:  Augmenting Language Models with Vision Capabilities Slides

Project presentation 2: Enhancing the diffusion model to understand simple-prompt Slides


Lecture 7 (02/23/2024) :  Visual Dialog: Datasets and Models

Review paper

Paper presentation : Visual Dialog: Datasets and Models  Slides

Project presentation 1:  Knowledge Graphs to facilitate Domain Adaptation? A biomedicine study case Slides

Project presentation 2:  Evaluating Adversarial Robustness of VLMs Slides

Lecture 8 (02/28/2024) :  Interpretability and Explainability

Review paper

Paper presentation 1: Generating Visual Explanations and Grounding Visual Explanations  Slides

Paper presentation 2:  Interpretability and Explainability Slides

Project presentation 1:  Text-to-Image Generation with Mamba Slides

Project presentation 2:  Video Narration : Recursive Captioning and Query-Driven Conversations for Enhanced Video Understanding Slides

Lecture 9 (03/01/2024) :  Finetuning based VLP models

Review paper

Paper presentation 1:  Fine Tuning based VLP models Slides

Paper presentation 2:  Fine Tuning based VLP models Slides

Project presentation 1:  Solving Geometry Problems by Generating Modular Code through VLMs Slides

Project presentation 2:  Dataset and Facial skin VQA Slides

Lecture 10 (03/13/2024) :  LLM based vision-language models

Review paper

Paper presentation 1: Instruction Following LLM based VLMs Slides

Paper presentation 2:  Parameter efficient LLM based vision-language models  Slides

Project presentation 1:  Spatially Aware VLM for Autonomous Driving Slides

Project presentation 2:  Unsupervised Multi-Source Domain Generalization Fine-Tuning for CLIP  Slides

Lecture 11 (03/15/2024) : VLP models for vision: classification, image generation

Review paper

Paper presentation 1: Learning Vision Representation with Vision-Language Models Slides

Project presentation 1:  Text-Guided World-to-3D Generation on Mobile Devices Slides

Lecture 12 (03/20/2024) : Vision-language models for language-only tasks

Review paper

Paper presentation 1: Vision-Language Models for Language-only Tasks Slides

Project update 1:  Retrieval Augmented Generation for Natural Language Query in Egocentric Video Slides

Lecture 13 (03/22/2024) : Shortcomings of Vision-Language models

Review paper

Paper presentation 1: Shortcomings of Vision-Language models Slides

Paper presentation 2: Shortcomings of Vision Language Models Slides

Project update 1:  team 4 Hanrui Huang & Cheng Chen Slides

Project update 2:  Augmenting Language Models with Vision Capabilities Slides

Lecture 14 (03/27/2024) : Beyond statistical learning in vision-language

Review paper

Paper presentation 1: Beyond statistical learning in vision-language Slides

Paper presentation 2: Beyond statistical learning in vision-language Slides

Project update 1:  Evaluating Adversarial Robustness of VLMs Slides

Project update 2:  Knowledge Graphs to facilitate Domain Adaptation, A biomedicine study case Slides

Lecture 15 (04/03/2024) : Image Generation

Review paper

Paper presentation 1: Evaluation of Text to Image Models Slides

Paper presentation 2: Text-to-Image Generation Slides

Project update 1:  Text-to-Image Generation with Mamba Slides

Project update 2:  Augmented Video Understanding: Soccer games dense captioning Slides

Lecture 16 (04/05/2024) : Video understanding and generation

Review paper

Paper presentation 1: Video Understanding Slides

Paper presentation 2: Video Understanding and Generation Slides

Project update 1:  Projects Slides

Project update 2:  Skin VQA Slides

Lecture 17 (04/10/2024) : Embodied AI

Review paper

Paper presentation 1: LLM for Embodied AI Slides

Paper presentation 2: Embodied AI Platform Slides

Project update 1:  Spatially Aware VLM for Autonomous Driving Slides

Project update 2:  Multi-Source Domain Generalization using CLIP Slides

Lecture 18 (04/12/2024) : 3D understanding and generation

Review paper

Paper presentation 1: Text-to-3D Generative Models Slides

Project update 1: Text-Guided World-to-3D Generation on Mobile Devices Slides