This page presents the experiments, datasets and results of the paper invited to the Advances in Knowledge Discovery and Management book, vol. 7 (Springer, LNAI series) (under submission). This book gathers the post-proceedings of the French conference on Knowledge Discovery and Management 2016 (Conférence EGC).
This article presents the use of Answer Set Programming (ASP) to mine sequential patterns. ASP is a high-level declarative logic programming paradigm for encoding combinatorial and optimization problems as well as for knowledge representation and reasoning. ASP is thus a good candidate for implementing pattern mining with background knowledge, a long-standing issue in data mining.
We propose encodings of the classical sequential pattern mining tasks within two representations of embeddings (fill-gaps vs skip-gaps) and for various kinds of patterns: frequent, constrained and condensed.
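For readers new to the task, the notion of a frequent sequential pattern can be sketched in a few lines of Python. This is only an illustration of the classical definition (a pattern is supported by a sequence when its items occur in the same order, possibly with gaps), not a transcription of the ASP encodings, and it does not distinguish the fill-gaps and skip-gaps representations. The function names `find_embedding`, `support` and `is_frequent` and the toy database are hypothetical.

```python
from typing import List, Optional


def find_embedding(pattern: List[str], sequence: List[str]) -> Optional[List[int]]:
    """Return the leftmost positions at which the pattern items occur,
    in order, in the sequence; return None if the pattern is not embedded."""
    positions = []
    start = 0
    for item in pattern:
        try:
            # Search for the item at or after the current start position.
            pos = sequence.index(item, start)
        except ValueError:
            return None
        positions.append(pos)
        start = pos + 1  # the next item must occur strictly later
    return positions


def support(pattern: List[str], database: List[List[str]]) -> int:
    """Number of sequences of the database in which the pattern is embedded."""
    return sum(1 for seq in database if find_embedding(pattern, seq) is not None)


def is_frequent(pattern: List[str], database: List[List[str]], min_support: int) -> bool:
    """A pattern is frequent when its support reaches the threshold."""
    return support(pattern, database) >= min_support


# Toy database (illustrative data, not one of the benchmark datasets).
database = [list("abcd"), list("acbd"), list("bca")]
```

With this toy database, `support(["a", "d"], database)` counts the first two sequences only, since the third contains no `d` after its `a`.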
We compare the computational performance of these encodings with each other to gain insight into the efficiency of ASP encodings. The results show that the fill-gaps strategy performs better on real problems due to lower memory consumption. Finally, compared to a constraint programming approach (CPSM), another declarative programming paradigm, our proposal shows comparable performance.
The encodings presented in the article can be downloaded here: encodings.zip.
Below are the ASP facts of the datasets we used to benchmark our ASP encodings. Simulated datasets are also available in a format that can be read by CPSM (Constraint Based Sequence Mining).
Simulated datasets used to evaluate computational performance:
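To run the encodings on your own data, sequences must first be written as ASP facts. The following is a minimal sketch of such a conversion. It assumes a fact layout `seq(SequenceId, Position, Item)` with 1-based identifiers; the predicate name and argument order are assumptions made for illustration, so check the downloaded .lp files for the layout the encodings actually expect. The helper `to_asp_facts` is hypothetical.

```python
from typing import List


def to_asp_facts(database: List[List[str]], predicate: str = "seq") -> str:
    """Serialize a list of sequences as ASP facts, one fact per item.

    Assumed layout (not confirmed by the page): predicate(seq_id, position, item).
    Items must be valid ASP constants, e.g. integers or lowercase identifiers.
    """
    lines = []
    for seq_id, sequence in enumerate(database, start=1):
        for position, item in enumerate(sequence, start=1):
            lines.append(f"{predicate}({seq_id},{position},{item}).")
    return "\n".join(lines)


# Illustrative data only.
facts = to_asp_facts([["a", "b"], ["c"]])
print(facts)
```

The resulting text can be saved to a .lp file and passed to an ASP solver together with one of the encodings.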
Real datasets used to compare mining tasks (.lp files, the original datasets for CPSM can be found here):