Introduction to contrast mining

In many domains, statistical reports aim to show contrasts between groups. For example, a poll taken after an election records, for each elector surveyed, the vote (candidate A or candidate B), the social category (rich or poor) and the age (old or young). The electors are grouped according to their votes. The aim is to find significant contrasts in proportions between the two groups of electors. A result could be, for instance, that 30% of the electors of candidate A are rich and old, whereas only 10% of the electors of candidate B are. I leave it to you to find names for candidates A and B in your own country.

In this very simple example, a hard-working statistician can perform a complete search of the contrasts by hand. In more realistic cases, there are more than two candidates and more than two features describing the electors. The statistician will be overwhelmed by the amount of data, and his or her statistical reports will be biased by prior beliefs that depend, of course, on his or her age and salary.
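
To make the complete search concrete, here is a minimal Python sketch of it on the poll example. It is only an illustration, not one of the algorithms from the papers summarized later: the toy data, the names POLL, support and find_contrasts, and the 0.15 threshold are all assumptions made up for this example, and a raw difference in proportions stands in for a proper test of statistical significance.

```python
from itertools import combinations, product

FEATURES = ("category", "age")

# Hypothetical toy poll, invented for illustration: (vote, category, age).
POLL = (
    [("A", "rich", "old")] * 3 + [("A", "rich", "young")] * 2
    + [("A", "poor", "old")] * 2 + [("A", "poor", "young")] * 3
    + [("B", "rich", "old")] * 1 + [("B", "rich", "young")] * 2
    + [("B", "poor", "old")] * 3 + [("B", "poor", "young")] * 4
)


def support(records, pattern):
    """Fraction of records that match every (feature index, value) pair."""
    if not records:
        return 0.0
    hits = sum(all(rec[i] == v for i, v in pattern) for rec in records)
    return hits / len(records)


def find_contrasts(poll, min_diff):
    """Enumerate every combination of feature values and keep those whose
    proportion differs by at least min_diff between the two groups."""
    group_a = [rec[1:] for rec in poll if rec[0] == "A"]
    group_b = [rec[1:] for rec in poll if rec[0] == "B"]
    n = len(FEATURES)
    # Values observed for each feature, over both groups.
    values = [sorted({rec[i] for rec in group_a + group_b}) for i in range(n)]
    results = []
    for size in range(1, n + 1):
        for idxs in combinations(range(n), size):
            for vals in product(*(values[i] for i in idxs)):
                pattern = tuple(zip(idxs, vals))
                sup_a = support(group_a, pattern)
                sup_b = support(group_b, pattern)
                if abs(sup_a - sup_b) >= min_diff:
                    named = tuple((FEATURES[i], v) for i, v in pattern)
                    results.append((named, sup_a, sup_b))
    return results


if __name__ == "__main__":
    for named, sup_a, sup_b in find_contrasts(POLL, min_diff=0.15):
        print(named, f"A: {sup_a:.0%}", f"B: {sup_b:.0%}")
```

With two features of two values each, only eight combinations have to be checked. The number of combinations grows exponentially with the number of features, which is why this brute-force approach stops being practical on realistic data.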

Therefore, computationally mining the interesting differences between predefined groups of data is a promising data mining task. It makes it possible to automate the statistician's analyses. Research publications in this field began to appear in 1999.

The document below was written in 2008. It tries to summarize the concepts and algorithms introduced in the main papers on this subject.

Antoine Botrel