Data Wrangling Course
This page hosts the course slides and the lab assignment handouts. I will post each of them here before the corresponding lecture.
General introduction: Data Quality and Data Wrangling
A First Hands-on in Data Wrangling: Data Extraction
A Second Hands-on in Data Wrangling: Data Cleaning
Cities.csv
Data cleaning using Constraints
Object identification (not included in the exam)
Datasets that can be used for Functional Dependencies Discovery
IRIS (CSV): 5 columns, 150 rows, 4 functional dependencies
Adult Dataset (CSV): 14 columns, 48,842 rows, 78 functional dependencies
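To make the notion of a functional dependency concrete before trying these datasets, here is a minimal sketch of a naive discovery procedure. It is not the algorithm used in the course or in any particular tool. The function names `fd_holds` and `discover_fds`, the `max_lhs` parameter, and the toy rows are all hypothetical and chosen only for illustration. The number of dependencies a tool reports also depends on its exact definition (for example minimal and non-trivial dependencies), so this sketch is not expected to reproduce the counts listed above.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """Return True if the columns in `lhs` functionally determine `rhs`.

    The dependency lhs -> rhs holds when no two rows agree on every
    column of `lhs` while disagreeing on `rhs`.
    """
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        if key in seen and seen[key] != row[rhs]:
            return False
        seen.setdefault(key, row[rhs])
    return True


def discover_fds(rows, columns, max_lhs=2):
    """Naively enumerate minimal dependencies with at most `max_lhs` left-hand columns.

    This brute-force search is exponential in the number of columns and is
    only meant for small tables.
    """
    found = []
    for rhs in columns:
        others = [c for c in columns if c != rhs]
        for size in range(1, max_lhs + 1):
            for lhs in combinations(others, size):
                # Skip left-hand sides that extend an already found one (not minimal).
                if any(r == rhs and set(prev) <= set(lhs) for prev, r in found):
                    continue
                if fd_holds(rows, lhs, rhs):
                    found.append((lhs, rhs))
    return found


# Hypothetical toy table, not taken from Cities.csv or the datasets above.
rows = [
    {"city": "Paris", "country": "France", "zip": "75001"},
    {"city": "Paris", "country": "France", "zip": "75002"},
    {"city": "Lyon", "country": "France", "zip": "69001"},
    {"city": "Rome", "country": "Italy", "zip": "00100"},
]
columns = ["city", "country", "zip"]

for lhs, rhs in discover_fds(rows, columns):
    print(", ".join(lhs), "->", rhs)
```

To try it on one of the CSV files, the rows could be loaded with `csv.DictReader` from the standard library and the column list taken from its `fieldnames`. Dedicated discovery algorithms prune the search space much more aggressively than this sketch does.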
Resources:
- Here is an article on data quality. It gives the data quality terms (dimensions) in French.
- Yannis Sismanis, Paul Brown, Peter J. Haas, Berthold Reinwald: GORDIAN: Efficient and Scalable Discovery of Composite Keys. VLDB 2006: 691-702
Internships:
- Research Internship: On Enhancing Knowledge Graphs with Provenance Support
- Industry Internship: Data Management, Analysis, and Recommendations - Application to Psychometrics