Skewed Component Distributions or Transformations? Modelling Skewness in Cluster Analysis.
Paul McNicholas, McMaster University.
19th of March 2026
Abstract:
Because of its mathematical tractability, the Gaussian mixture model holds a special place in the model-based clustering literature. For all its benefits, however, the Gaussian mixture model can be ineffective for clustering when there is skewness in one or more clusters. For this reason, approaches have been developed over the years for handling data with skewed clusters. Broadly, there are two types of approaches. The first is to consider a mixture of skewed distributions, and the second is based on incorporating a transformation to near normality within the clustering algorithm. A detailed comparison of these approaches is presented to help determine if and when one method might be more suitable than the other. Results on several benchmark datasets are presented, and there is also some discussion of how to assess cluster separation.