Machine Learning for Visual Understanding
Spring 2021 (Mon/Wed 12:30 - 1:45)
Instructor: Joonseok Lee
TA: TBD
Summary
This course covers mathematical modeling and machine learning techniques for analyzing visual (and other multimedia) data. It focuses on fundamental machine learning methods and recent deep learning methods that are widely used in visual data analysis, and discusses how they are applied to solve various problems involving visual data. The course consists of lectures, practice sessions, and a team project. Topics include:
Review of machine learning and neural networks
Convolutional neural networks (CNNs)
Recurrent neural networks (RNNs)
Image problems (image classification, object detection, segmentation)
Video problems (video classification, action recognition, temporal localization, tracking)
Multi-modal data analysis (visual-audio-text)
Generative modeling
Logistics
Textbook
"Probabilistic Machine Learning: An Introduction (2nd Ed.)" by Kevin Murphy, 2021, MIT Press.
"Deep Learning" by Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016, MIT Press.
Additional reading materials and papers will be provided.
Prerequisite
Intermediate+ Python programming: you should be able to implement your ideas in Python.
Machine learning basics: you should have taken an introductory machine learning course or equivalent.
Basic calculus, linear algebra, data structures and algorithms
Grading
Assignments 20%, Mid-term exam 25%, Final exam 25%, Team project 30% (proposal 5%, mid-term 10%, final 15%)
Content
Course Introduction
First Approaches for Image Classification
Loss Functions and Optimization
Neural Networks Basics & Backpropagation
Convolutional Neural Networks
Training Neural Networks
Transfer Learning, CNN Case Studies
Object Detection
Video Classification (Action Recognition)
Recurrent Neural Networks
RNN-based Video Models
Metric Learning
Multimodal Learning
Generative Models
Self-supervised Learning
Style Transfer
Scientific Applications