One-day Workshop on Constraint Programming and Data Mining

Orléans, 7th November 2014
Amphi Herbrand, building 3IA.

Recent work on pattern mining and clustering has shown the value of Constraint Programming for specifying and solving Data Mining tasks. Declarativeness, combined with a large set of available constraints, allows complex data-mining queries to be specified with unprecedented ease. On the other hand, formulating Data Mining tasks in Constraint Programming raises new challenges for Constraint Programming, requiring existing tools to be adapted or new primitives to be defined for the task at hand. Moreover, Data Mining can in turn serve as a tool for Constraint Programming.

Research topics combining Constraint Programming and Data Mining attract a growing number of researchers in both communities. This one-day workshop offers an opportunity to present contributions to the field and to discuss new challenges in this area.

Registration is free but mandatory (before October 20) for organizational reasons. This workshop is supported by LIFO (Laboratoire d'Informatique Fondamentale d'Orléans).

The workshop will take place in Amphi Herbrand, building 3IA, on the campus of the University of Orléans.


Program
  • On Preference-based (soft) pattern sets
    Bruno Crémilleux, GREYC, Université de Caen
  • Towards a Cross-fertilization between Data Mining and Constraint Programming
    Lakhdar Saïs, CRIL, Université d'Artois
  • Relational Constraint Programming for Data Mining
    Siegfried Nijssen, KU Leuven
  • Trends in the use of constraint solving for data mining
    Tias Guns, KU Leuven
  • Distance-Based Constrained Clustering by Constraint Programming
    Khanh-Chuong Duong, Université d'Orléans

Abstracts

On Preference-based (soft) pattern sets
Patrice Boizumault, Bruno Crémilleux, Samir Loudni and Willy Ugarte
GREYC, Université de Caen

In the last decade, the pattern mining community has witnessed a sharp shift from efficiency-based approaches to methods that extract more meaningful patterns. Recently, new methods adapting results from multi-criteria decision analysis, such as Pareto efficiency or skylines, have been studied. Within pattern mining, this novel line of research makes it easy to express preferences through a dominance relation on a set of measures, and it avoids the well-known threshold issue.

In this talk, we present the discovery of soft skyline patterns (or soft skypatterns), based on theoretical relationships with condensed representations of patterns and with the framework of dynamic constraint satisfaction problems. To avoid an a priori choice of measures, we propose to use the skypattern cube built over the set of measures. We show how to build the skypattern cube efficiently and provide a concise representation of the cube based on skypattern equivalence classes. Navigation through the cube reveals differences and similarities between skypattern sets when a measure is added or removed; it highlights the role of the measures and helps to discover the most interesting skypattern sets. We place our work in the broader picture of the discovery of pattern sets and k-pattern sets/n-ary patterns.
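To make the dominance relation concrete, here is a minimal Python sketch, not the authors' method. It assumes each pattern has already been scored on a few measures (the pattern names, the measure names `freq`, `area`, `growth` and their values are hypothetical toy data) and that higher is better for every measure. A pattern belongs to the skyline if no other pattern dominates it; the cube is computed naively by recomputing the skyline for every non-empty subset of measures, whereas the talk is about building it efficiently.

```python
from itertools import combinations

# Hypothetical toy data: each pattern scored on three measures.
# Assumption: higher is better for every measure.
SCORES = {
    "ab":  {"freq": 5, "area": 10, "growth": 1},
    "abc": {"freq": 3, "area": 9,  "growth": 4},
    "bd":  {"freq": 5, "area": 8,  "growth": 1},
    "cd":  {"freq": 2, "area": 4,  "growth": 2},
}


def dominates(p, q, measures):
    """p dominates q: at least as good on every measure, strictly better on one."""
    at_least = all(SCORES[p][m] >= SCORES[q][m] for m in measures)
    strictly = any(SCORES[p][m] > SCORES[q][m] for m in measures)
    return at_least and strictly


def skyline(measures):
    """Patterns not dominated by any other pattern w.r.t. the given measures (naive, quadratic)."""
    return {p for p in SCORES
            if not any(dominates(q, p, measures) for q in SCORES if q != p)}


def skypattern_cube(all_measures):
    """Naive cube: one skyline per non-empty subset of the measures."""
    cube = {}
    for k in range(1, len(all_measures) + 1):
        for subset in combinations(all_measures, k):
            cube[subset] = skyline(subset)
    return cube


if __name__ == "__main__":
    for subset, sky in skypattern_cube(["freq", "area", "growth"]).items():
        print(subset, sorted(sky))
```

On this toy data, the skyline for (`freq`, `area`) is {ab}; adding `growth` brings `abc` in, since it is the only pattern with a high growth value. This is the kind of difference that navigating the cube is meant to expose.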


Towards a Cross-fertilization between Data Mining and Constraint Programming
Said Jabbour, Lakhdar Sais and Yakoub Salhi
CRIL, Université d'Artois

In this talk, we give an overview of our contributions to data mining and, more generally, to the cross-fertilization between data mining, constraint programming and propositional satisfiability (http://www.cril.univ-artois.fr/decMining/). We will focus on two contributions. First, we show how propositional satisfiability can be used to model and solve problems in data mining. As an illustration, we present a SAT-based declarative approach for enumerating top-k (closed, frequent) itemsets in transactional databases. Second, we discuss the potential contribution of data mining to propositional satisfiability. In this context, we present a first application of data mining to the compression of Boolean formulas in conjunctive normal form (CNF). This work is supported by the ANR DAG project "Declarative approaches for enumerating interesting patterns" (http://liris.cnrs.fr/dag/).


Relational Constraint Programming for Data Mining
Siegfried Nijssen
KU Leuven

In this talk, I will present the framework of Relational Constraint Programming, which combines relational algebra with constraint programming in order to allow the specification and solution of a wider range of data mining tasks than can be solved using constraint programming alone. I will show how pattern mining tasks such as skyline pattern mining and sequence mining fit in this framework, and will also illustrate its use on other tasks, such as probabilistic inference.


Trends in the use of constraint solving for data mining
Tias Guns
KU Leuven

The use of constraints is prevalent in data mining, most often to express background knowledge or feedback from the user. Computationally, constraints have been studied for many years by the constraint programming community. In recent years, such constraint solving technology has increasingly been applied in the field of data mining.
Using examples from pattern mining, clustering and structure learning, we discuss recent trends in the use of constraint solving for data mining. These range from aspects of modeling the problem (high-level languages, low-level encodings, new primitives) to solving it (redundant constraints, relaxations, propagators, search strategies).
Three recurring motivations are: the ability to handle complex constraints, the flexibility to combine constraints, and the ability to reuse state-of-the-art solvers. To conclude, we discuss a number of challenges in the use of constraint solvers for data mining.


Distance-Based Constrained Clustering by Constraint Programming
Thi-Bich-Hanh Dao, Khanh-Chuong Duong and Christel Vrain
LIFO, Université d'Orléans

Constrained clustering has received much attention over the last decade. By integrating user constraints, it can make the clustering task easier or its results more accurate. In this talk, we present a framework based on Constraint Programming for distance-based constrained clustering. The framework is general and declarative: it allows the user to choose among different optimization criteria and to integrate different kinds of user constraints. The model is flexible, since the number of clusters need not be fixed: only a lower and an upper bound on it have to be set. We show that the framework can easily be embedded in a more general process, and we illustrate this on the problem of finding the Pareto-optimal solutions of a bi-criterion optimization problem.