Courses in English
Data Analysis (2016-2024) - ISEP (ING2)
Data analysis is the process of exploring raw data and making sense of them using various statistical and Machine Learning tools. The process includes cleaning, transforming and modeling the data with the goal of extracting useful knowledge. As a field at the crossroads of Mathematics and Computer Science, data analysis is used in many research fields as well as in several industries, where it helps to make informed decisions.
In this course, I introduce several key tools of data analysis. The course first covers basic elements of statistics that are useful to analyse numerical data (univariate, bivariate and multivariate statistics, estimation, and confidence intervals). It then explains how to deal with different types of data (numerical, binary, categorical, text data and time series), providing specific tools for each type. The course also covers important elements of data visualization, including techniques such as PCA, ISOMAP, LLE and t-SNE that are useful to visualize high-dimensional data. Finally, two lectures on clustering and classification serve as a brief introduction to Machine Learning and its most basic methods and concepts.
Lecture 1 : Introduction to data analysis - Univariate statistics and random variables - [PDF]
Lecture 2 : Mining bivariate data - [PDF]
Lecture 3 : Mining categorical bivariate data & introduction to multivariate data - [PDF]
Lecture 4 : Data Visualization - Feature selection and linear methods - [PDF]
Lecture 5 : Data Visualization - Non-linear methods - [PDF]
Lecture 6 : Introduction to unsupervised learning and clustering - [PDF]
Lecture 7 : Introduction to supervised learning - [PDF]
Lecture 8 : Time series analysis, ARIMA models - [PDF]
Lecture 9 : Time series analysis, Hidden Markov Models - [PDF]
Lecture 10 : Introduction to text mining - [PDF]
Introduction to Data Stream Processing (2018-2024) - UPSaclay & IPP (M2)
This lecture is part of a course taught in the Data Science Master of Paris Polytechnic Institute. It is an introduction to the main concepts of data stream processing, and in particular to the difficulties that may arise when one tries to perform clustering on data streams. Other key topics and tools, such as Kafka, are covered by colleagues who also teach in this course. The material for my part of the course is the following:
Basics of data stream processing, data streams and clustering : From online clustering to data stream clustering - [PDF]
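As an illustration of the "online clustering" starting point mentioned in the lecture title, here is a minimal sketch of sequential (MacQueen-style) k-means, which processes each point once and updates only the nearest centroid. This is not taken from the lecture slides: the function name `online_kmeans` is hypothetical, and seeding the centroids with the first k points is an assumption made for simplicity.

```python
from math import dist  # Euclidean distance, Python 3.8+


def online_kmeans(stream, k):
    """Hypothetical single-pass k-means over an iterable of points.

    Assumption: the first k points of the stream seed the centroids.
    """
    centroids = []  # current centroid coordinates
    counts = []     # number of points absorbed by each centroid
    for x in stream:
        x = list(x)
        if len(centroids) < k:
            # Seed phase: the first k points become the initial centroids.
            centroids.append(x)
            counts.append(1)
            continue
        # Assign the incoming point to its nearest centroid.
        j = min(range(k), key=lambda i: dist(x, centroids[i]))
        counts[j] += 1
        # Move that centroid towards x by a step of 1/n_j, so that it stays
        # the running mean of the points assigned to it.
        centroids[j] = [c + (xi - c) / counts[j] for c, xi in zip(centroids[j], x)]
    return centroids, counts


if __name__ == "__main__":
    points = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
    centroids, counts = online_kmeans(points, k=2)
    print(counts)  # prints [3, 3]
```

Each point is seen once and only k centroids and k counters are kept in memory. Data stream clustering typically has to go further, for example coping with distributions that evolve over time or with an unknown number of clusters, which this sketch does not handle.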