Disclaimer: This course is mainly based on Guillaume Gravier's original course. I warmly thank Guillaume for the course material and for his numerous recommendations regarding the details of the course.
Data, whatever they are, are of very limited value without the means to extract valuable information from them, so as to better synthesize, understand, and predict. Statistical methods for data analysis and probabilistic models for statistical machine learning are commonly used to this end. This course aims at providing the basic techniques of data analysis (exploratory statistics) and probabilistic modeling (inferential statistics), and at studying their application to different types of data (symbolic data, language, numerical data, signals, images, etc.). The lectures naturally articulate around the two major steps of any modeling process: first understand your data, then design an adequate model.
Keywords: data analysis, factor analysis, variance analysis, clustering, hypothesis testing, decision theory, estimation theory, Gaussian mixture models, EM algorithm, Markov chains, Markov fields, hidden Markov chains, Viterbi algorithm, Bayesian networks, token passing algorithm
(Lectures are 1.5-hour sessions, held in Room Guernesey unless otherwise specified)
([prev.] indicates last year material which may be updated during the semester)
Wed. Sep. 11 9h45. A gentle reminder of the basics of probability: Kolmogorov's axioms
Fri. Sep. 13 9h45. A gentle reminder of the basics of probability: random variables, moments, classical laws
Wed. Sep. 18 8h. Exploratory statistics: visualization, summaries, correlation
Fri. Sep. 20 11h30 (12D-i50). Exploratory statistics: PCA/LDA. Cluster analysis: k-means
Mon. Sep. 23 8h, Fri. Sep. 27 11h30, Wed. Oct. 2 8h. Cluster analysis: agglomerative/divisive clustering, spectral clustering and other weird things
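As a taste of the cluster-analysis sessions, here is a minimal k-means sketch in plain NumPy, run on a synthetic two-blob dataset (an illustration of the alternating assignment/update steps, not a reference implementation):

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Plain k-means: alternate nearest-centroid assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assignment step: label each point with its nearest centroid
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # update step: each centroid becomes the mean of its assigned points
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

# synthetic data: two well-separated Gaussian blobs around 0 and 5
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
centers, labels = kmeans(X, 2)
```

With well-separated blobs like these, the algorithm converges in a handful of iterations; in general k-means only finds a local optimum of the within-cluster variance, which is why multiple restarts are common in practice.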
Thu. Oct. 3 8h. Fundamentals of statistical machine learning and estimation theory: cost function, decision theory, basic estimators
Wed. Oct. 9 9h45 (12D-i213). Fundamentals of statistical machine learning and estimation theory: estimation theory, practical estimation techniques
Thu. Oct. 10 13h15 (12D-i52). Mixture models: mixture models, hidden variables, expectation-maximization (EM) algorithm
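To make the EM mechanics concrete, here is a minimal sketch of expectation-maximization for a two-component 1-D Gaussian mixture on synthetic data (initialization and iteration count are illustrative choices, not the course's reference recipe):

```python
import numpy as np

def em_gmm_1d(x, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture (illustrative only)."""
    # crude initialization from the data quantiles
    mu = np.quantile(x, [0.25, 0.75])
    var = np.array([x.var(), x.var()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point
        dens = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted maximum-likelihood re-estimates
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
    return pi, mu, var

# synthetic mixture: components around -2 and 3, equal weights
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(3, 0.5, 300)])
pi, mu, var = em_gmm_1d(x)
```

Each iteration provably does not decrease the data log-likelihood, which is the key property of EM seen in the lecture.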
Wed. Oct. 16 8h. Observable and hidden Markov models: Markov property, Markov chain, hidden Markov chain, Viterbi algorithm, Baum-Welch algorithm, practical examples
Fri. Oct. 18 8h. Observable and hidden Markov models: Markov property, Markov chain, hidden Markov chain, Viterbi algorithm, Baum-Welch algorithm, practical examples
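The Viterbi algorithm covered in these sessions fits in a few lines of dynamic programming; the toy weather HMM below (states, observations and probabilities are made up for illustration) recovers the most likely hidden state sequence in the log domain:

```python
import numpy as np

def viterbi(obs, A, B, pi):
    """Most likely HMM state sequence by dynamic programming (log domain)."""
    T, N = len(obs), len(pi)
    logA, logB, logpi = np.log(A), np.log(B), np.log(pi)
    delta = np.zeros((T, N))            # best log-prob of a path ending in each state
    psi = np.zeros((T, N), dtype=int)   # backpointers to the best predecessor
    delta[0] = logpi + logB[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA   # scores[i, j]: transition i -> j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + logB[:, obs[t]]
    # backtrack from the best final state
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

# toy HMM: states {0: rainy, 1: sunny}, observations {0: walk, 1: shop, 2: clean}
A = np.array([[0.7, 0.3], [0.4, 0.6]])   # state transition probabilities
B = np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]])  # emission probabilities
pi = np.array([0.6, 0.4])                 # initial state distribution
path = viterbi([0, 1, 2], A, B, pi)       # -> [1, 0, 0]: sunny, rainy, rainy
```

Working in the log domain avoids the numerical underflow that plagues products of many small probabilities on long sequences.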
Thu. Oct. 24 13h15. Graphical models and Bayesian network: directed/undirected graphical models, Bayesian networks, inference and reasoning, moralization, variable elimination, junction tree algorithm
Fri. Oct. 25 8h00. Entropy and conditional random fields: maximum entropy principle, maxent model, logistic regression, log-linear sequence models, parameter estimation
Wed. Nov. 6 9h45. Hypothesis testing: typology, likelihood ratio test, classical mean value tests, comparison and statistical significance, variance analysis
Fri. Nov. 8 8h. Hypothesis testing: multiple-hypothesis testing, exercises
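As a flavor of the classical mean-value tests from these sessions, here is a minimal two-sided z-test sketch (known variance, synthetic data; purely illustrative):

```python
import numpy as np
from math import erf, sqrt

def z_test(x, mu0, sigma):
    """Two-sided z-test of H0: mean = mu0, with known standard deviation sigma."""
    z = (x.mean() - mu0) / (sigma / sqrt(len(x)))
    phi = lambda t: 0.5 * (1 + erf(t / sqrt(2)))  # standard normal CDF
    p = 2 * (1 - phi(abs(z)))                     # two-sided p-value
    return z, p

# synthetic sample whose true mean (0.5) violates H0: mean = 0
rng = np.random.default_rng(0)
x = rng.normal(0.5, 1.0, 100)
z, p = z_test(x, 0.0, 1.0)   # p is far below 0.05, so H0 is rejected
```

When the variance is unknown it is estimated from the sample and the z statistic is replaced by Student's t, which is the case treated in the lecture's classical mean-value tests.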
Fri. Nov. 15 8h. Final exam.
Homework. Write a short comment on one of the articles below. Maximum length is 1000 words, i.e., about 1.5-2 pages, single column, 11-point font (in English or French, as you wish). Your report shall: identify the techniques seen in the classroom and explain why they are appropriate in the context of the paper; explain what efforts the authors have made to cast their work into a probabilistic framework, and how the techniques were adapted and/or extended; and discuss the limits you foresee (whether mentioned in the paper or not). Deadline for mailing your comment: before Nov. 12, 2024, 08:00 CET
Douglas Reynolds, Thomas Quatieri and Robert Dunn. Speaker verification using adapted Gaussian mixture models. Digital Signal Processing, 10:19-41, 2000
Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579-2605, 2008
Tao Xiang and Shaogang Gong. Spectral clustering with eigenvector selection. Pattern Recognition, 41:1012-1029, 2008
Final exam. Standard 1h30 written exam. You can check the text of past exams below.
2023 in French, in English
2022 in French, in English
2019 in French
2018 in French, in English
2017 in French, in English