COMP 590/790: Vision Transformers 

Spring 2025

Course Description

This advanced course focuses on the latest research in vision transformers. It consists of research paper presentations, paper discussions, and a semester-long course project. Topics include transformer architectures for classification and dense prediction in images and videos, efficient transformer architectures, data-efficient learning, self-supervised learning, multi-modal learning, and image/video generation, among others. A background in deep learning is required.

Administrative Information

Instructor: Gedas Bertasius

Time: Tue & Thu 11:00 am - 12:15 pm

Location: FB 007

Office Hours: By Appointment

TA: Ce Zhang

TA Office Hours: Fridays 10:00 am - 12:00 pm in SN252

Canvas Site: link

Grading

Class Participation: 10%

Paper Critiques: 20%

Paper Presentations: 30%

Course Project: 40%

Course Policies