Massive Data Mining by Sampling


Data mining is the scientific discipline of building software that can extract (or "mine") interesting patterns from possibly huge data sets. With the abundance of information being collected and stored in society, business, and science, it is no longer feasible for humans to find and extract all the relevant facts, models, and insights that may be buried in the data. In contrast to traditional databases and search engines, data mining algorithms find features or models of data that a human user may be able to describe only vaguely.

In this project we:
-  Investigate data mining algorithms that are based on sophisticated sampling and sketching methods. Roughly speaking, the idea is that useful information about significant patterns in a data set may be inferred from samples of the set of possible patterns, or more generally by computing a summary or "sketch" of the possible patterns.
-  Work on basic aspects of data mining and, in collaboration with external partners, on application areas of data mining: financial forecasting, cross-analysis of genetic and phenotype data, and recommendation systems.
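
To make the idea of a "sketch" of the possible patterns concrete, here is a minimal illustration in Python. It is not taken from the project: it uses a Count-Min sketch, one well-known summary structure, to approximately count how often pairs of items occur together in a list of transactions. The names CountMinSketch, pair_key, and sketch_pairs, the default width and depth, and the choice of item pairs as the pattern type are all assumptions made for this example.

```python
import hashlib
from itertools import combinations


class CountMinSketch:
    """Fixed-size summary of item counts (illustrative, not the project's code).

    Estimates never fall below the true count; hash collisions can only
    push them above it.
    """

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        # One salted hash per row, so rows behave like independent hash functions.
        digest = hashlib.blake2b(
            key.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count

    def estimate(self, key):
        # The minimum over rows is the estimate least inflated by collisions.
        return min(
            self.table[row][self._index(key, row)] for row in range(self.depth)
        )


def pair_key(a, b):
    """Canonical string key for an unordered item pair (hypothetical helper)."""
    return "|".join(sorted((a, b)))


def sketch_pairs(transactions, sketch):
    """Add every item pair of every transaction to the sketch."""
    for basket in transactions:
        for a, b in combinations(sorted(set(basket)), 2):
            sketch.add(pair_key(a, b))
    return sketch
```

The sketch uses memory proportional to width times depth, regardless of how many distinct pairs appear in the data. A pair whose estimate is below a chosen threshold can safely be ruled out as frequent, because the estimate is never lower than the true count.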


MaDaMS PhD students: Ninh Pham, Morten Stöckel, and Konstantin Kutzkov
