IFT 6765 - Links between Computer Vision and Language

Winter 2023, a seminar course offered by the Université de Montréal

(Guidelines credits: Advanced Computer Vision course taught by Devi Parikh at Virginia Tech)

Course Overview:

What is this course about? This will be a seminar course on recent advances in vision and language research – a sub-field of artificial intelligence that studies multimodal tasks at the intersection of computer vision and natural language processing. Examples of these tasks include image / video captioning (automatically describing images / videos in natural language), visual question answering (automatically answering natural language questions about images / videos), visual dialog (holding a conversation with a human grounded in an image), and visual commonsense reasoning (automatically answering questions that require commonsense reasoning about situations depicted in images).


Why study Vision and Language: Vision and Language research has seen tremendous progress over the past decade, owing to the availability of large-scale datasets, the development of high-capacity deep learning models, and the availability of computational resources. There are various motivations behind studying vision and language:


Topics covered: Major Vision and Language tasks, datasets, modelling techniques and their shortcomings, such as: 


Course Objectives: Gain a thorough understanding of recent advances in Vision and Language (tasks, datasets, modelling techniques, shortcomings).


Course Structure: This is a seminar course. The vast majority of the lecture time will be devoted to (i) students presenting papers to each other, (ii) group discussion of the papers, (iii) students presenting their project ideas and updates to the class, and (iv) group discussion and brainstorming around the project presentations. A more detailed course structure is outlined below.


Prerequisites: Please note that this is an advanced course at the intersection of computer vision and natural language processing. As prerequisites, you should have basic knowledge of computer vision, machine learning, deep learning, and natural language processing. Projects are a major part of this course, so you should be well versed in programming and comfortable with deep learning frameworks such as PyTorch or TensorFlow. If you have any concerns about whether you meet the prerequisites, feel free to talk to the instructor in the first class.

Class Timings:


First class: Jan 17th

Last scheduled class: April 14th (classes could end earlier if we manage to finish the course)


Last class before Winter break: Feb 24th

First class after Winter break: March 7th

No class on April 7th due to Easter holiday


Class Format:  


In person, in Auditorium 1 at Mila (6650 Rue St. Urbain, Montreal).

Students are required to attend all classes in person. There is no online access to the class. Lectures will be recorded, but the recordings will be shared only if a student misses a class due to exceptional circumstances (such as medical reasons).

 

Evaluation:


Instructor and TA:


Communication Platform:

We will use Piazza (the access code to join Piazza has been shared via Studium).

Writing a question:

Privately emailing TA or instructor vs submitting a private question on Piazza:


Office Hours: