ADM

ADM - Analyse de Données et Modélisation probabiliste

M. Sc. course - Data analysis and probabilistic modeling

Making data speak: Advanced probabilistic data analysis and modeling

Disclaimer: This course is mainly based on Guillaume Gravier's original course. I warmly thank Guillaume for the course material and for his numerous recommendations regarding the details of the course.

Data of any kind are of very limited value unless one can extract information from them to better summarize, understand, and predict. Statistical methods for data analysis and probabilistic models for statistical machine learning are commonly used to do so. This course aims to teach the basic techniques of data analysis (exploratory statistics) and probabilistic modeling (inferential statistics) and to study their application to different types of data (symbolic data, language, numerical data, signals, images, etc.). The lectures are organized around the two major steps of any modeling process: understand your data, then design an adequate model.

Keywords: data analysis, factor analysis, variance analysis, clustering, hypothesis testing, decision theory, estimation theory, Gaussian mixture models, EM algorithm, Markov chains, Markov fields, hidden Markov chains, Viterbi algorithm, Bayesian networks, token passing algorithm

Lectures, with the 2024-2025 dates

(Lectures are 1.5-hour sessions, held in Room Guernesey unless otherwise specified)
([prev.] indicates last year's material, which may be updated during the semester)

Wed. Sep. 11 9h45. A gentle reminder of the basics of probability: Kolmogorov
Fri. Sep. 13 9h45. A gentle reminder of the basics of probability: random variables, moments, classical laws
Wed. Sep. 18 8h. Exploratory statistics: visualization, summaries, correlation
Fri. Sep. 20 11h30 (12D-i50).
- Exploratory statistics: PCA/LDA
- Cluster analysis: k-means
Mon. Sep. 23 8h.
Fri. Sep. 27 11h30.
Wed. Oct. 2 8h.
- Cluster analysis: agglomerative/divisive clustering, spectral clustering and other weird things
Thu. Oct. 3 8h. Fundamentals of statistical machine learning and estimation theory: cost function, decision theory, basic estimators
Wed. Oct. 9 9h45 (12D-i213). Fundamentals of statistical machine learning and estimation theory: estimation theory, practical estimation techniques
Thu. Oct. 10 13h15 (12D-i52). Mixture models: mixture models, hidden variables, expectation-maximization (EM) algorithm
- see also the handwritten notes on EM for a two-component Gaussian mixture model
Wed. Oct. 16 8h. Observable and hidden Markov models: Markov property, Markov chain, hidden Markov chain, Viterbi algorithm, Baum-Welch algorithm, practical examples
Fri. Oct. 18 8h. Observable and hidden Markov models: Markov property, Markov chain, hidden Markov chain, Viterbi algorithm, Baum-Welch algorithm, practical examples
Thu. Oct. 24 13h15. Graphical models and Bayesian networks: directed/undirected graphical models, Bayesian networks, inference and reasoning, moralization, variable elimination, junction tree algorithm
Fri. Oct. 25 8h00. Entropy and conditional random fields: maximum entropy principle, maxent model, logistic regression, log-linear sequence models, parameter estimation
Wed. Nov. 6 9h45. Hypothesis testing: typology, likelihood ratio test, classical mean value tests, comparison and statistical significance, variance analysis
Fri. Nov. 8 8h. Hypothesis testing: multiple-hypothesis testing, exercises
Fri. Nov. 15 8h. Final exam.

Evaluation / exams

Homework. Write a short commentary on one of the articles below. Maximum length is 1000 words, ca. 1.5-2 pages in single-column, 11-point font (English or French, as you wish). Your report shall identify the techniques seen in class, explain why they are appropriate in the context of the paper and what efforts the authors made to cast their work into a probabilistic framework, explain how the techniques were adapted and/or extended, and discuss the limits you foresee (whether mentioned in the paper or not). Deadline for emailing your commentary: before Nov. 12, 2024, 08:00 CET
- Douglas Reynolds, Thomas Quatieri and Robert Dunn. Speaker verification using adapted Gaussian mixture models. Digital Signal Processing, 10:19-41, 2000
- Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579-2605, 2008
- Tao Xiang and Shaogang Gong. Spectral clustering with eigenvector selection. Pattern Recognition, 41:1012-1029, 2008
Final exam. Standard 1h30 written exam. You can check the text of past exams below.
- 2023 in French, in English
- 2022 in French, in English
- 2019 in French
- 2018 in French, in English
- 2017 in French, in English
