Mining rare sequential patterns

This page presents the results and experiments used for the ILP 2017 article.

Download the article here.

Abstract

We present an approach to mining meaningful rare sequential patterns based on the declarative programming paradigm of ASP (Answer Set Programming). We introduce the new setting of rare sequential pattern mining. To cope with the huge number of meaningless rare patterns, our ASP approach provides an easy way to encode expert constraints on the expected patterns. We use clingo 5.0 as a solver.

Authors

    • A. Samet
    • T. Guyet
    • B. Negrevergne

Encodings

ASP-Rare Sequence Miner

ASP-Minimal Rare Sequence Miner

An optimized version for mining minimal rare patterns has been designed.

Experiments

Data

Simulated datasets used to evaluate computing performance:

    • dataset generator: generator.py (run it with the -h option for details on how to use it)
    • dataset ZIP: a set of databases of simulated sequences. The mean length of the sequences they contain ranges from 10 to 20. For example, the file database_100_2_10.lp contains 100 sequence records with a mean length of 10; the 2 is the dataset id.
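
The file naming convention described above can be decoded programmatically when looping over the archive. The following is a minimal sketch, assuming every file follows the pattern database_<records>_<id>_<mean length>.lp shown in the example; the function name parse_dataset_name and the DatasetInfo type are hypothetical helpers, not part of the distributed code.

```python
import re
from typing import NamedTuple


class DatasetInfo(NamedTuple):
    """Fields encoded in a dataset file name (hypothetical helper type)."""
    n_records: int    # number of sequence records in the database
    dataset_id: int   # identifier of the dataset
    mean_length: int  # mean length of the sequences


# Assumed convention: database_<records>_<id>_<mean length>.lp
_NAME_PATTERN = re.compile(r"^database_(\d+)_(\d+)_(\d+)\.lp$")


def parse_dataset_name(filename: str) -> DatasetInfo:
    """Split a dataset file name into its three numeric fields."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected dataset file name: {filename!r}")
    n_records, dataset_id, mean_length = (int(g) for g in match.groups())
    return DatasetInfo(n_records, dataset_id, mean_length)
```

For instance, parse_dataset_name("database_100_2_10.lp") yields the fields 100, 2 and 10, which makes it easy to group runtime measurements by database size or mean sequence length.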

Apriori Rare & MRG_EXP

Our ASP encodings are compared with procedural approaches.

    • code of the procedural version to mine rare patterns
    • code of the procedural version to mine minimal rare patterns (MRPs)
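
To make the two mining tasks concrete, here is a brute-force sketch in Python. It is not the Apriori Rare or MRG_EXP code linked above, only an illustration under stated assumptions: a pattern is supported by a sequence when it occurs in it as a subsequence (gaps allowed), a pattern is rare when its support is non-zero and strictly below a threshold, and a rare pattern is minimal when none of its proper sub-patterns is rare. All function names and the example database are hypothetical.

```python
from itertools import product


def is_subsequence(pattern, sequence):
    """True if the items of `pattern` occur in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern, database):
    """Number of sequences of the database that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def rare_patterns(database, threshold, max_len):
    """Enumerate every pattern up to `max_len` with 0 < support < threshold."""
    items = sorted({item for seq in database for item in seq})
    found = {}
    for length in range(1, max_len + 1):
        for pattern in product(items, repeat=length):
            s = support(pattern, database)
            if 0 < s < threshold:  # assumption: zero-support patterns are ignored
                found[pattern] = s
    return found


def minimal_rare_patterns(database, threshold, max_len):
    """Keep the rare patterns none of whose one-item deletions is rare.

    Support can only grow when an item is removed, so checking the
    sub-patterns obtained by deleting a single item is sufficient.
    """
    rare = rare_patterns(database, threshold, max_len)
    minimal = {}
    for pattern, s in rare.items():
        subs = [pattern[:i] + pattern[i + 1:] for i in range(len(pattern))]
        # The empty pattern is treated as frequent (assumption).
        if all(len(sub) == 0 or sub not in rare for sub in subs):
            minimal[pattern] = s
    return minimal


# Tiny illustrative database (made up for this sketch).
EXAMPLE_DB = [("a", "b", "c"), ("a", "b"), ("a", "c")]
```

With EXAMPLE_DB and a threshold of 2, the rare patterns are ("b", "c") and ("a", "b", "c"), each with support 1, and only ("b", "c") is minimal because it is a sub-pattern of the other. The exhaustive enumeration grows exponentially with max_len, which is the kind of cost that dedicated procedural algorithms and the ASP encodings are meant to avoid.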