OSU CSE 5524 (2025 Fall)

Foundations of Computer Vision

Class time: Tuesday and Thursday, 12:45 pm - 2:05 pm

Classroom: Cockins Hall 240

Course website: https://sites.google.com/view/osu-cse-5524-au25-chao

Instructor: Prof. Wei-Lun (Harry) Chao

Email: chao.209@osu.edu

Office hours: tentatively Tuesday 3 pm - 4 pm & Friday 9 am - 10 am (DL587)

TA: Zheda Mai

Email: mai.145@osu.edu

Office hours: tentatively Monday 11 am - 12 pm & Wednesday 2 pm - 3 pm (BE406, Station #5)

Course information

Syllabus: Link (pay particular attention to the academic misconduct statement)

Course Description:

Computer vision algorithms for use in human-computer interactive systems; image formation, image features, image processing, object recognition, image generation, 3D from images, and applications.

This course focuses on the foundations of computer vision, with particular emphasis on learning-based methods and 3D. To build background, the course covers the basics of image formation, camera modeling, machine learning, and neural networks. On this groundwork, the course introduces image-processing-based methods and probabilistic models of images. Next, it explores modern neural network architectures for computer vision, including convolutional neural networks and transformers, and builds on these models to develop algorithms for image feature extraction, visual recognition, image generation, and vision-and-language understanding. Moving beyond single images, the course introduces stereo vision and multi-view vision, including structure from motion and neural radiance fields. Finally, it covers algorithms for motion estimation and tracking. Representative applications of computer vision will be introduced and discussed throughout the course.

Course Goals / Objectives:

Master fundamental and recent computer vision concepts and algorithms
Be competent with computer vision application design and evaluation
Gain a deep understanding of learning-based algorithms and 3D inference for computer vision
Be exposed to original research and applications in computer vision
Be familiar with the Python/PyTorch programming environment
More broadly, the aim is to provide students with a strong foundational background, enabling them to pursue computer-vision-centered or machine-learning-centered MS/PhD paths or explore future opportunities in the computer vision, machine learning, and artificial intelligence industries.

Course Topics (subject to change):

1. Introduction to computer vision

a. Introduction to the course

b. A simple vision system

2. Image formation

a. Concepts of imaging and lenses

b. Images and 3D geometry

c. Camera modeling

d. Cameras as linear systems

3. Foundations of image processing

a. Linear filtering and convolution

b. Fourier analysis

c. Blur filters, image derivatives, and filter banks

d. (Up/down) sampling

e. Image pyramids

4. Foundations of learning

a. Introduction to learning

b. Gradient-based learning algorithms

c. Generalization

d. Neural networks as distribution transformers

5. Probabilistic models of images

a. Color

b. Statistical image models

c. Textures

6. Neural architectures for vision

a. Convolutional neural nets

b. Transformers

7. Generative image models and representation learning

a. Representation learning

b. Generative models

8. Understanding vision with semantics and language

a. Visual recognition

b. Vision and language

9. Challenges in learning-based vision

a. Data bias and shift

b. Robustness and generality

c. Transfer learning and adaptation

10. Understanding geometry

a. Stereo vision

b. Homographies

c. Depth estimation from single images

d. Feature detection and matching

e. Multi-view geometry and structure from motion

f. Radiance fields

11. Understanding motion

a. Motion estimation

b. Optical flow estimation

c. Object tracking

Course Credits: 3 units

Prerequisites:

Required background:

- Data structures and algorithms: 2331

- Statistics and probability: 5522, Stat 3460, or 3470

Suggested background:

- Linear algebra: Math 2568, 2174, 4568, or 5520H

- Artificial intelligence: 3521, 5521, or 5243

Students are expected to have a reasonable degree of mathematical sophistication, including linear algebra, multivariate calculus, probability, and statistics. Students are also expected to be comfortable with programming, algorithm design, and data structures.
Programming in Python 3 is required. Programming in PyTorch and using Hugging Face may be needed.
Review materials can be found here: linear algebra, probability, Python-1, Python-2, Python-3
A set of slide decks for linear algebra is also available HERE.

Announcements, communications, and discussions:

We will post regular announcements on Carmen (Canvas). Announcements of urgent matters will be emailed to your name.#@osu.edu address. If you do not check that account regularly, make sure it forwards to an account you do check.
We will use Piazza for discussions. If you have questions about the course materials or policies, please post them on Piazza. The TA and I will monitor these discussions and answer as appropriate, but students are encouraged to be active and to use the forums for group discussions as well.
Please use email to contact the instructor or the TA only for urgent or personal issues. Any emails sent to the instructor or TA should include the tag "[OSU-CSE-5524]" in the subject line. (This ensures we can filter and prioritize your messages.) We reserve the right to forward any questions (and their answers) to the entire class if they prove relevant. Please indicate if you wish to be anonymized (i.e., have your name removed) in this case.

Textbooks and References

Reading before or after class plays an important part in your learning. Readings will be assigned for each lecture. Scientific papers, book chapters, and other technical material may be suggested in class and provided by the instructor.

Required Textbook:

Antonio Torralba, Phillip Isola, and William T. Freeman, Foundations of Computer Vision. MIT Press, 2024. (Ebook available through the OSU library website)

Suggested References:

Richard Szeliski, Computer Vision: Algorithms and Applications (second edition). Springer, 2022.
David Foster, Generative Deep Learning: Teaching Machines To Paint, Write, Compose, and Play (second edition). O'Reilly, 2023. (Ebook available through the OSU library website)

Other Good References:

Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola, Dive into Deep Learning. 2021. https://d2l.ai/index.html
Simon J. D. Prince, Understanding Deep Learning. MIT Press, 2023. https://udlbook.github.io/udlbook/
Christopher M. Bishop and Hugh Bishop, Deep Learning: Foundations and Concepts. Springer, 2024.

Other Good CV Courses:

Stanford CV: http://vision.stanford.edu/teaching/cs131_fall2223/ and https://cs231n.stanford.edu/
MIT CV: http://6.869.csail.mit.edu/sp22/schedule.html and https://advances-in-vision.github.io/schedule.html
CMU CV: http://16385.courses.cs.cmu.edu/spring2024/
Brown CV: https://browncsci1430.github.io/
NYU CV: https://www.sainingxie.com/cv-fall2024/
Wisconsin-Madison CV: https://sites.google.com/view/cs639spring2023dlcv
Michigan CV: https://web.eecs.umich.edu/~justincj/teaching/eecs442/WI2021/ and https://web.eecs.umich.edu/~justincj/teaching/eecs498/WI2022/
Cornell CV: https://www.cs.cornell.edu/courses/cs4670/2021sp/ and https://www.cs.cornell.edu/courses/cs6670/2023fa/

PyTorch:

Useful Reference:

Kaare Brandt Petersen and Michael Syskind Pedersen, The Matrix Cookbook

Grading and Homework Assignments

Grading (tentative):

Quizzes (linear algebra): 4%
Homework: 50% (9 + 5 + 9 + 9 + 9 + 9)
Midterm exam (10/23/2025, in class): 20%
Final project (presentation on 12/16/2025, in class): 26%
- Please reserve 2:00 - 6:00 pm on 12/16/2025 for the presentations.
- See Carmen announcements for detailed breakdowns and requirements.

Homework:

There will be around 6 homework assignments.
Each assignment may include a problem set and a programming set.
Programming in Python 3 is required.
Carmen (and other platforms, such as GitHub) will be used for submissions.
Problem sets and programming-set reports must be submitted as PDF files; no other format will be accepted.
You must strictly follow the homework and submission instructions.

Quizzes:

Quizzes, if any, will be administered through Carmen.

Midterm exam:

The midterm is in person.
Exam materials/questions may come from the readings listed in the schedule below.

Final project:

We strongly suggest that you become familiar with PyTorch, GitHub, and Hugging Face as soon as possible.
Details will be announced in class and can be found on Carmen.

Policy (see the syllabus for details)