IFT 6765 - H2024 - Lecture Schedule

IFT 6765 - Links between Computer Vision and Language

Course Lectures

Lecture 1 (01/17/2024) : Introduction to the course

Lecturer: Aishwarya Agrawal
Slides (pdf, keynote)

Lecture 2 (01/19/2024, 01/24/2024, 01/26/2024) : Vision-Language landscape before Transformer + Pre-training

Lecturer: Aishwarya Agrawal
Slides (pdf, keynote)

Lecture 3 (01/26/2024, 01/31/2024) : Vision-Language landscape during Transformer + Pre-training 

Lecturer: Aishwarya Agrawal
Slides (pdf, keynote)

Lecture 4 (02/02/2024) : Shortcomings of Vision-Language models and Open Challenges

Lecturer: Aishwarya Agrawal
Slides (pdf, keynote)

Lecture 5 (02/16/2024) : Image captioning

Review paper

Paper presentation 1: Image captioning Slides

Paper presentation 2: Image captioning Slides

Project presentation : Multimodal Retrieval Augmented Generation for Natural Language Query in Egocentric Video Slides

Lecture 6 (02/21/2024) : Visual Question Answering: Datasets

Review paper

Paper presentation 1: Visual Question Answering Slides

Paper presentation 2: VQA:NMN models Slides

Project presentation 1: Augmenting Language Models with Vision Capabilities Slides

Project presentation 2: Enhancing the diffusion model to understand simple-prompt Slides

Lecture 7 (02/23/2024) : Visual Dialog: Datasets and Models

Review paper

Paper presentation : Visual Dialog: Datasets and Models Slides

Project presentation 1: Knowledge Graphs to facilitate Domain Adaptation? A biomedicine study case Slides

Project presentation 2: Evaluating Adversarial Robustness of VLMs Slides

Lecture 8 (02/28/2024) : Interpretability and Explainability

Review paper

Paper presentation 1: Generating Visual Explanations and Grounding Visual Explanations Slides

Paper presentation 2: Interpretability and Explainability Slides

Project presentation 1: Text-to-Image Generation with Mamba Slides

Project presentation 2: Video Narration : Recursive Captioning and Query-Driven Conversations for Enhanced Video Understanding Slides

Lecture 9 (03/01/2024) : Finetuning based VLP models

Review paper

Paper presentation 1: Fine Tuning based VLP models Slides

Paper presentation 2: Fine Tuning based VLP models Slides

Project presentation 1: Solving Geometry Problems by Generating Modular Code through VLMs Slides

Project presentation 2: Dataset and Facial skin VQA Slides

Lecture 10 (03/13/2024) : LLM based vision-language models

Review paper

Paper presentation 1: Instruction Following LLM based VLMs Slides

Paper presentation 2: Parameter efficient LLM based vision-language models Slides

Project presentation 1: Spatially Aware VLM for Autonomous Driving Slides

Project presentation 2: Unsupervised Multi-Source Domain Generalization Fine-Tuning for CLIP Slides

Lecture 11 (03/15/2024) : VLP models for vision: classification, image generation

Review paper

Paper presentation 1: Learning Vision Representation with Vision-Language Models Slides

Project presentation 1: Text-Guided World-to-3D Generation on Mobile Devices Slides

Lecture 12 (03/20/2024) : Vision-language models for language-only tasks

Review paper

Paper presentation 1: Vision-Language Models for Language-only Tasks Slides

Project update 1: Retrieval Augmented Generation for Natural Language Query in Egocentric Video Slides

Lecture 13 (03/22/2024) : Shortcomings of Vision-Language models

Review paper

Paper presentation 1: Shortcomings of Vision-Language models Slides

Paper presentation 2: Shortcomings of Vision Language Models Slides

Project update 1: Team 4 Hanrui Huang & Cheng Chen Slides

Project update 2: Augmenting Language Models with Vision Capabilities Slides

Lecture 14 (03/27/2024) : Beyond statistical learning in vision-language

Review paper

Paper presentation 1: Beyond statistical learning in vision-language Slides

Paper presentation 2: Beyond statistical learning in vision-language Slides

Project update 1: Evaluating Adversarial Robustness of VLMs Slides

Project update 2: Knowledge Graphs to facilitate Domain Adaptation, A biomedicine study case Slides

Lecture 15 (04/03/2024) : Image Generation

Review paper

Paper presentation 1: Evaluation of Text to Image Models Slides

Paper presentation 2: Text-to-Image Generation Slides

Project update 1: Text-to-Image Generation with Mamba Slides

Project update 2: Augmented Video Understanding: Soccer games dense captioning Slides

Lecture 16 (04/05/2024) : Video understanding and generation

Review paper

Paper presentation 1: Video Understanding Slides

Paper presentation 2: Video Understanding and Generation Slides

Project update 1: Projects Slides

Project update 2: Skin VQA Slides

Lecture 17 (04/10/2024) : Embodied AI

Review paper

Paper presentation 1: LLM for Embodied AI Slides

Paper presentation 2: Embodied AI Platform Slides

Project update 1: Spatially Aware VLM for Autonomous Driving Slides

Project update 2: Multi-Source Domain Generalization using CLIP Slides

Lecture 18 (04/12/2024) : 3D understanding and generation

Review paper

Paper presentation 1: Text-to-3D Generative Models Slides

Project update 1: Text-Guided World-to-3D Generation on Mobile Devices Slides