IFT 6765 - Links between Computer Vision and Language
Course Lectures
Lecture 6 (02/21/2024) : Visual Question Answering: Datasets
Paper presentation 1: Visual Question Answering Slides
Paper presentation 2: VQA:NMN models Slides
Project presentation 1: Augmenting Language Models with Vision Capabilities Slides
Project presentation 2: Enhancing the diffusion model to understand simple-prompt Slides
Lecture 8 (02/28/2024) : Interpretability and Explainability
Paper presentation 1: Generating Visual Explanations and Grounding Visual Explanations Slides
Paper presentation 2: Interpretability and Explainability Slides
Project presentation 1: Text-to-Image Generation with Mamba Slides
Project presentation 2: Video Narration : Recursive Captioning and Query-Driven Conversations for Enhanced Video Understanding Slides
Lecture 9 (03/01/2024) : Finetuning based VLP models
Paper presentation 1: Fine Tuning based VLP models Slides
Paper presentation 2: Fine Tuning based VLP models Slides
Project presentation 1: Solving Geometry Problems by Generating Modular Code through VLMs Slides
Project presentation 2: Dataset and Facial skin VQA Slides
Lecture 10 (03/13/2024) : LLM based vision-language models
Paper presentation 1: Instruction Following LLM based VLMs Slides
Paper presentation 2: Parameter efficient LLM based vision-language models Slides
Project presentation 1: Spatially Aware VLM for Autonomous Driving Slides
Project presentation 2: Unsupervised Multi-Source Domain Generalization Fine-Tuning for CLIP Slides
Lecture 13 (03/22/2024) : Shortcomings of Vision-Language models
Paper presentation 1: Shortcomings of Vision-Language models Slides
Paper presentation 2: Shortcomings of Vision-Language models Slides
Project update 1: team 4 Hanrui Huang & Cheng Chen Slides
Project update 2: Augmenting Language Models with Vision Capabilities Slides
Lecture 14 (03/27/2024) : Beyond statistical learning in vision-language
Paper presentation 1: Beyond statistical learning in vision-language Slides
Paper presentation 2: Beyond statistical learning in vision-language Slides
Project update 1: Evaluating Adversarial Robustness of VLMs Slides
Project update 2: Knowledge Graphs to facilitate Domain Adaptation, A biomedicine study case Slides
Lecture 15 (04/03/2024) : Image Generation
Paper presentation 1: Evaluation of Text to Image Models Slides
Paper presentation 2: Text-to-Image Generation Slides
Project update 1: Text-to-Image Generation with Mamba Slides
Project update 2: Augmented Video Understanding: Soccer games dense captioning Slides