ADM
ADM - Analyse de Données et Modélisation probabiliste
M. Sc. course - Data analysis and probabilistic modeling
Making data speak: Advanced probabilistic data analysis and modeling
Disclaimer: This course is mainly based on Guillaume Gravier's original course. Many thanks to Guillaume for the course material and for his numerous recommendations regarding the details of the course.
Data, whatever their nature, are of very limited value unless one can extract information from them to better synthesize, understand and predict. Statistical methods for data analysis and probabilistic models for statistical machine learning are commonly used to do so. This course teaches the basic techniques of data analysis (exploratory statistics) and probabilistic modeling (inferential statistics) and studies their application to different types of data (symbolic data, language, numerical data, signals, images, etc.). The lectures are organized around the two major steps of any modeling process: understand your data, then design an adequate model.
Keywords: data analysis, factor analysis, variance analysis, clustering, hypothesis testing, decision theory, estimation theory, Gaussian mixture models, EM algorithm, Markov chains, Markov fields, hidden Markov chains, Viterbi algorithm, Bayesian networks, token passing algorithm
Lectures, with the 2023-2024 dates
([prev.] indicates last year's material, which may be updated during the semester)
Wed. Sep. 13 8h00. A gentle reminder of the basics of probability: Kolmogorov
Fri. Sep. 15 11h30. A gentle reminder of the basics of probability: random variables, moments, classical laws
Thu. Sep. 21 09h45 (12D-213). Exploratory statistics: visualization, summaries, correlation, factor analysis, PCA/LDA
Fri. Sep. 22 11h30 (Guernesey).
Exploratory statistics: PCA/LDA
Cluster analysis: k-means, agglomerative/divisive clustering
Wed. Sep. 27 11h30 (Guernesey).
Cluster analysis: spectral clustering and other weird things
Fundamentals of statistical machine learning and estimation theory: cost function, decision theory
Fri. Sep. 29 11h30 (Guernesey). Fundamentals of statistical machine learning and estimation theory: empirical estimation, estimation theory
Thu. Oct. 5 13h15 (12D-52). Fundamentals of statistical machine learning and estimation theory: practical estimation techniques
Fri. Oct. 6 11h30 (Guernesey). Mixture models: mixture models, hidden variables, expectation-maximization (EM) algorithm
see also [prev.] handwritten notes on EM for a two-component Gaussian mixture model
Thu. Oct. 12 8h00 (12D-52) & 09h45 (12D-50). Observable and hidden Markov models: Markov property, Markov chain, hidden Markov chain, Viterbi algorithm, Baum-Welch algorithm, practical examples
Fri. Oct. 13 11h30 (Guernesey). Entropy and conditional random fields: maximum entropy principle, maxent model, logistic regression, log-linear sequence models, parameter estimation
Mon. Oct. 16 16h45 (Guernesey).
Fri. Oct. 20 8h00 (Guernesey). Graphical models and Bayesian networks: directed/undirected graphical models, Bayesian networks, inference and reasoning, moralization, variable elimination, junction tree algorithm
Wed. Oct. 25 8h00 (Guernesey). Hypothesis testing: typology, likelihood ratio test, classical mean value tests, comparison and statistical significance, variance analysis
Fri. Oct. 27 11h30 (Guernesey). Hypothesis testing: multiple-hypothesis testing, exercises
Fri. Nov. 10 11h30 (Guernesey). Final exam.
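As an illustration of the Oct. 6 lecture on mixture models, the EM iterations for a two-component Gaussian mixture can be sketched as follows. This is a minimal sketch under stated assumptions, not the course's reference implementation: the function names (`normal_pdf`, `em_two_gaussians`), the min/max initialisation, the variance floor, and the synthetic data are all choices made here for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture with plain EM.

    Returns (weights, means, variances), each a list of length 2.
    The initialisation (min and max of the data as initial means,
    global variance for both components) is an arbitrary choice.
    """
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    weights = [0.5, 0.5]
    means = [min(data), max(data)]
    variances = [overall_var, overall_var]

    for _ in range(n_iter):
        # E-step: posterior probability that each point comes from component 0.
        resp0 = []
        for x in data:
            p0 = weights[0] * normal_pdf(x, means[0], variances[0])
            p1 = weights[1] * normal_pdf(x, means[1], variances[1])
            resp0.append(p0 / (p0 + p1))

        # M-step: re-estimate each component from the soft assignments.
        for k in (0, 1):
            resp = resp0 if k == 0 else [1.0 - r for r in resp0]
            nk = sum(resp)
            weights[k] = nk / n
            means[k] = sum(r * x for r, x in zip(resp, data)) / nk
            variances[k] = max(
                sum(r * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk,
                1e-6,  # variance floor (assumption) to avoid degenerate components
            )
    return weights, means, variances


# Synthetic data: two well-separated clusters (values chosen for illustration).
rng = random.Random(0)
data = [rng.gauss(-2.0, 0.5) for _ in range(300)]
data += [rng.gauss(3.0, 1.0) for _ in range(300)]
weights, means, variances = em_two_gaussians(data)
print(weights, means, variances)
```

On well-separated data such as this, the estimated means should land close to the generating values; on overlapping clusters the result depends more strongly on the initialisation, which is one reason to run EM from several starting points.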
Evaluation / exams
Homework. Write a short commentary on one of the articles below. Maximum length is 1000 words, ca. 1.5-2 pages, single column, 11-point font (English or French, as you wish). Your report shall identify the techniques seen in class, explain why they are appropriate in the context of the paper and what efforts the authors made to cast their work into a probabilistic framework, explain how the techniques were adapted and/or extended, and discuss the limits you foresee (whether mentioned in the paper or not). Deadline for emailing your commentary: before Nov. 13, 2023, 08:00 CET.
Douglas Reynolds, Thomas Quatieri and Robert Dunn. Speaker verification using adapted Gaussian mixture models. Digital Signal Processing, 10:19-41, 2000
Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579-2605, 2008
Tao Xiang and Shaogang Gong. Spectral clustering with eigenvector selection. Pattern Recognition, 41:1012-1029, 2008
Final exam. Standard 1h30 written exam. You can check the text of past exams below.
2023 in French, in English
2022 in French, in English
2019 in French
2018 in French, in English
2017 in French, in English