We worked out an approach that combines matrix reduction techniques with clustering. It aims to find homogeneous
groups (clusters) of observations (cases) within a dataset that contain the highest proportion of identified "rare" cases. I refined and coded the whole approach in R (with a Shiny
application) so that it can be reproduced easily and quickly on any kind of
data.
Unlike predictive methods, which deliberately look for a relationship between the inputs and a chosen target and estimate parameters to model that relationship, the method proposed here is unsupervised. It exposes the data to many clustering trials, which increases the chance of finding a cluster that groups together a significant number of these "rare" cases. We let the data "speak" and reveal the structure of the relationships that form natural groups. We then check for any potentially interesting association by cross-tabulating the "rare" events with the groups of each cluster model.
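
The cross-tabulation step can be illustrated with a minimal sketch. The original implementation is in R; the version below is a Python illustration only, and every name in it (`rare_share_by_cluster`, `best_cluster`, `min_size`, the toy labels) is hypothetical. It assumes that the cluster labels from each trial are already available and that "highest proportion" means the share of rare cases inside a cluster, which is one possible reading.

```python
from collections import Counter


def rare_share_by_cluster(labels, is_rare):
    """Cross-tabulate cluster labels with a rare-case flag.

    Returns {cluster: (n_rare, n_total, share_of_rare_in_cluster)}.
    """
    totals = Counter(labels)
    rare = Counter(lab for lab, flag in zip(labels, is_rare) if flag)
    return {c: (rare[c], n, rare[c] / n) for c, n in totals.items()}


def best_cluster(trials, is_rare, min_size=2):
    """Scan several clustering trials and keep the cluster whose share of
    rare cases is highest. `min_size` (an assumption) ignores tiny clusters.

    `trials` maps a trial name to its list of cluster labels.
    Returns (trial_name, cluster, share), or None if nothing qualifies.
    """
    best = None
    for name, labels in trials.items():
        table = rare_share_by_cluster(labels, is_rare)
        for cluster, (n_rare, n_total, share) in table.items():
            if n_total >= min_size and (best is None or share > best[2]):
                best = (name, cluster, share)
    return best


# Toy data: eight cases, three of them flagged as rare (made up for illustration).
is_rare = [True, True, False, False, False, True, False, False]

# Labels from two hypothetical clustering trials on the same eight cases.
trials = {
    "k=2": [0, 0, 0, 0, 1, 1, 1, 1],
    "k=3": [0, 0, 1, 1, 1, 0, 2, 2],
}

table = rare_share_by_cluster(trials["k=3"], is_rare)
winner = best_cluster(trials, is_rare)
print(table)
print(winner)
```

In this toy example the "k=3" trial isolates all three rare cases in a single cluster, so it is the one retained. In practice the labels would come from the many clustering trials described above rather than being written by hand.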