A note on recordings:
Recording links below will direct you to YouTube lecture recordings.
For events specific to our course (office hours, project presentations, etc.), videos can be found on Canvas under Media Gallery.
Please allow up to 24 hours for recordings to be uploaded (they are typically available sooner). Zoom processing introduces a lag, and some clips require editing; we do our best to post the videos quickly so you can use them to study.
Prerequisite: MATH 18 AND MATH 20B AND (CSE 103 or ECON 120A or MATH 183 or ECE 109 or MATH 180A or MATH 181A) or instructor approval. Programming ability in Python required. Musical skills are not required but would be an advantage.
To enroll: Submit a course clearance request via the Enrollment Authorization System (EASy)
Description: The course covers machine learning topics dealing with music and audio signals, including basic concepts in digital signal processing, MIDI, audio analysis and feature extraction, temporal models (including Markov and autoregressive models), and generative neural networks and representation learning, with applications to automatic music generation and sound synthesis. There will be several short programming assignments that correspond to the lecture materials. Students may choose between a more advanced final programming assignment set and a small-group final project of their choice. Prior musical knowledge is not required but would be an advantage.
Schedule
— WEEK 1 —
Class 1 (slides):
Welcome Session, Syllabus Overview, and Introduction to Assignments (recording available on Canvas)
Features, Structures, and Representation of Sound and Music Data (MIDI, audio)
Audio features: pitch, timbre, loudness, MFCC, Chroma
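As a flavor of the feature extraction covered in this class, here is a minimal Python sketch (the librosa library and the synthetic test tone are assumptions, not prescribed by the course) computing MFCC, chroma, and a loudness proxy:

```python
# Minimal sketch: frame-level audio features with librosa (assumed library).
# The 1-second 440 Hz tone stands in for real course audio.
import numpy as np
import librosa

sr = 22050
y = 0.5 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)   # synthetic A4 tone

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # timbre: (13, n_frames)
chroma = librosa.feature.chroma_stft(y=y, sr=sr)          # pitch classes: (12, n_frames)
rms = librosa.feature.rms(y=y)                            # loudness proxy: (1, n_frames)
print(mfcc.shape, chroma.shape, rms.shape)
```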
Class 2 (slides):
Aleatoric music, stochastic processes in music (Mozart Dice Game, Xenakis)
Noise, Periodicity, Spectral Flatness, Perception and Cognition in Music Information Dynamics
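For intuition, a small NumPy sketch of spectral flatness (the geometric-to-arithmetic mean ratio of the power spectrum), which is high for noise and low for a periodic tone; the signals below are illustrative, not course data:

```python
# Spectral flatness: geometric mean / arithmetic mean of the power spectrum.
import numpy as np

def spectral_flatness(frame):
    power = np.abs(np.fft.rfft(frame)) ** 2 + 1e-12       # avoid log(0)
    return np.exp(np.mean(np.log(power))) / np.mean(power)

sr = 22050
noise = np.random.default_rng(0).standard_normal(2048)    # broadband noise
tone = np.sin(2 * np.pi * 440 * np.arange(2048) / sr)     # periodic tone
print(spectral_flatness(noise), spectral_flatness(tone))  # high vs. low
```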
Class 3 (slides):
Spectral Analysis, Fourier transform
Linear Filters and Convolution Theorem
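A quick NumPy check of the convolution theorem (linear convolution in time equals pointwise multiplication of zero-padded spectra); the random signals are placeholders:

```python
# Convolution theorem sketch: np.convolve vs. multiplication of FFTs.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
h = rng.standard_normal(16)

direct = np.convolve(x, h)                          # linear convolution, length 79
N = len(direct)
via_fft = np.real(np.fft.ifft(np.fft.fft(x, N) * np.fft.fft(h, N)))  # zero-padded FFTs
print(np.allclose(direct, via_fft))                 # True (up to rounding)
```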
Class 4 (slides):
Short-Time Fourier Analysis, Perfect Reconstruction (COLA), and Griffin-Lim Phase Reconstruction
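A minimal librosa sketch (the library, hop size, and test tone are assumptions) of STFT analysis/resynthesis and Griffin-Lim reconstruction from magnitudes only:

```python
# STFT round-trip vs. Griffin-Lim phase reconstruction (librosa assumed).
import numpy as np
import librosa

sr = 22050
y = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)             # synthetic test tone

S = librosa.stft(y, n_fft=1024, hop_length=256)              # hop/window satisfy COLA
y_exact = librosa.istft(S, hop_length=256)                   # near-perfect reconstruction
y_gl = librosa.griffinlim(np.abs(S), n_fft=1024, hop_length=256)  # phase re-estimated
print(y_exact.shape, y_gl.shape)
```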
— WEEK 2 —
Class 5 (slides):
Information Theory and Music, Shannon’s Theorems for Compression and Rate-Distortion,
Markov Models for Text and Music, Lempel-Ziv Algorithm and Musical Style
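A toy first-order Markov chain over pitch names, in the spirit of the Markov-model material; the melody is made up for illustration:

```python
# First-order Markov model of a melody: count transitions, then sample.
import random
from collections import defaultdict

melody = ["C", "D", "E", "C", "E", "G", "E", "D", "C", "D", "E", "E", "D", "C"]

successors = defaultdict(list)
for a, b in zip(melody, melody[1:]):
    successors[a].append(b)                    # observed successors of each pitch

state, generated = "C", ["C"]
for _ in range(12):
    state = random.choice(successors[state])   # sample the next pitch
    generated.append(state)
print(" ".join(generated))
```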
Class 6 (slides):
History of the voder / vocoder, Text to Speech
Linear Prediction, Formants and Spectral Density
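A small linear-prediction sketch (librosa and SciPy assumed; the sawtooth source is a crude stand-in for a voiced excitation): fit an order-12 LPC model and locate the strongest peak of its all-pole envelope, the kind of peak that corresponds to a formant in speech:

```python
# LPC envelope sketch: fit an all-pole model, inspect its frequency response.
import numpy as np
import librosa
import scipy.signal

sr = 16000
t = np.arange(sr) / sr
y = scipy.signal.sawtooth(2 * np.pi * 110 * t)            # toy voiced-like source
a = librosa.lpc(y, order=12)                              # coefficients [1, a1, ..., a12]
w, h = scipy.signal.freqz(1.0, a, worN=512, fs=sr)        # all-pole spectral envelope
print(w[np.argmax(np.abs(h))], "Hz")                      # strongest envelope peak
```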
Class 7 (slides):
HMM (Hidden Markov Model) in Speech and Music
Variable Markov Oracle (VMO), Music Information Dynamics
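A minimal HMM fit, assuming the hmmlearn package (not specified by the course) and random stand-in features:

```python
# Gaussian HMM sketch: fit with EM, then decode a state sequence (hmmlearn assumed).
import numpy as np
from hmmlearn import hmm

X = np.random.default_rng(0).standard_normal((300, 12))    # dummy chroma-like frames
model = hmm.GaussianHMM(n_components=4, covariance_type="diag", n_iter=50)
model.fit(X)                                                # EM training
states = model.predict(X)                                   # Viterbi decoding
print(states[:20])
```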
Class 8 (slides):
Introduction to Neural Networks & Keras
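A bare-bones Keras example of the kind introduced here; the feature dimension, class count, and random data are placeholders:

```python
# Tiny fully connected Keras classifier on dummy frame-level features.
import numpy as np
from tensorflow import keras

X = np.random.rand(500, 13).astype("float32")     # e.g. 13 MFCCs per frame (dummy)
y = np.random.randint(0, 4, size=500)             # 4 made-up classes

model = keras.Sequential([
    keras.layers.Input(shape=(13,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(4, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=3, batch_size=32, verbose=0)
```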
— WEEK 3 —
Class 9 (slides):
Neural Network Models of Music
Autoencoder (AE) and Feature Learning
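A minimal Keras autoencoder sketch for feature learning; the 128-bin "spectrum frames" are random placeholders:

```python
# Autoencoder sketch: compress frames to a 16-dim code, reconstruct the input.
import numpy as np
from tensorflow import keras

X = np.random.rand(1000, 128).astype("float32")               # dummy spectrum frames

inputs = keras.Input(shape=(128,))
code = keras.layers.Dense(16, activation="relu")(inputs)      # learned features
outputs = keras.layers.Dense(128, activation="sigmoid")(code) # reconstruction
autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=3, batch_size=64, verbose=0)     # target = input
```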
Class 10 (slides):
Class 11 (slides):
Recurrent Neural Network for Music
Generative Adversarial Networks
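A small recurrent next-note predictor in Keras; the vocabulary size and note sequences are invented for illustration:

```python
# LSTM next-note predictor sketch on dummy symbolic sequences.
import numpy as np
from tensorflow import keras

vocab = 64                                               # e.g. pitch vocabulary (made up)
X = np.random.randint(0, vocab, size=(200, 16))          # dummy 16-step note sequences
y = np.random.randint(0, vocab, size=200)                # dummy next-note targets

model = keras.Sequential([
    keras.layers.Input(shape=(16,)),
    keras.layers.Embedding(vocab, 32),
    keras.layers.LSTM(64),
    keras.layers.Dense(vocab, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(X, y, epochs=2, verbose=0)
```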
Class 12 (slides):
— WEEK 4 —
Class 13:
Diffusion Models (slides, extra slides 1, extra slides 2):
PF-ODE, DDIM and Consistency, DSM Colab Simulation (a toy DSM sketch follows this class entry)
Project Proposal Presentations
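A toy 1-D denoising score matching (DSM) check related to the DSM simulation listed for Class 13 (the Gaussian data, linear score model, and noise level are all assumptions): the DSM-optimal slope should match the true score slope of the noised density.

```python
# Toy DSM: fit s(x) = a*x to N(0,1) data noised with sigma; compare to -1/(1+sigma^2).
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5
x = rng.standard_normal(100_000)            # clean samples
eps = rng.standard_normal(100_000)
x_tilde = x + sigma * eps                   # perturbed samples

# DSM objective E[(a*x_tilde + eps/sigma)^2]; closed-form minimizer over a:
a = -np.mean(x_tilde * eps / sigma) / np.mean(x_tilde ** 2)
print(a, -1.0 / (1.0 + sigma ** 2))         # both approximately -0.8
```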
Class 14:
Project Proposal Presentations
Class 15 (slides):
(Note: there is no quiz in class; this video is for review purposes only.)
Class 16 (slides):
VMO Threshold, Deep Music Information Dynamics
— WEEK 5 —
Project Presentations and Discussion