COMP 590/790: Vision Transformers
Spring 2025
Course Description
This is an advanced course that will focus on the latest research in vision transformers. It consists of research paper presentations, paper discussions, and a semester-long course project. Topics will include transformer architectures for classification and dense prediction tasks in images and videos, efficient transformer architectures, data-efficient learning, self-supervised learning, multi-modal learning, image/video generation, and others. A background in deep learning is required.
Administrative Information
Instructor: Gedas Bertasius
Time: Tue & Thu 11:00 am - 12:15 pm
Location: FB 007
Office Hours: By Appointment
TA: Ce Zhang
TA Office Hours: Fridays 10:00 am - 12:00 pm, SN252
Canvas Site: link
Grading
Class Participation: 10%
Paper Critiques: 20%
Paper Presentations: 30%
Course Project: 40%
Course Policies
Class Participation: Please come to class prepared for a paper discussion with your peers.
Late Submissions: Because the class follows a tight paper presentation schedule, late assignments will not be accepted.
Academic Integrity: For your presentations and projects, you are allowed to use materials from external sources. However, you must clearly acknowledge those sources.