PMoGT_TNN13

A Parsimonious Mixture of Gaussian Trees Model for Oversampling in Imbalanced and Multi-Modal Time-Series Classification

Cao, H.; Tan, V.Y.F.; and Pang, J.Z.F.

Neural Networks and Learning Systems, IEEE Transactions on

Submitted March 2013; accepted February 2014

Abstract:

We propose a novel framework that uses a parsimonious statistical model, known as a mixture of Gaussian trees, to model multi-modal minority-class time-series data and thereby address the problem of imbalanced classification. By exploiting the fact that close-by time points are highly correlated, our model significantly reduces the number of covariance parameters to be estimated from O(d^2) to O(Ld), where L is the number of mixture components and d is the dimensionality of the data. Thus our model is particularly effective for modeling high-dimensional time-series when only a limited number of time-series instances are available in the minority class. In addition, the computational complexity of learning the model is also low, of the order O(L n_+ d^2), where n_+ is the number of positively-labelled samples. We conduct extensive classification experiments on several well-known time-series datasets (both single- and multi-modal) by first randomly generating synthetic instances from our estimated mixture model to correct the imbalance. We compare our results to several state-of-the-art oversampling techniques, and the results demonstrate that when our proposed model is used for oversampling, the same support vector machine classifier achieves much better classification accuracy across the range of data sets. In fact, the proposed method achieves the best average performance on 30 of the 36 multi-modal data sets according to the F-value metric. Our results are also highly competitive with non-oversampling-based classifiers for dealing with imbalanced time-series data sets.
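
As a rough illustration of the parameter savings described in the abstract, the Python sketch below counts covariance parameters for a single unrestricted Gaussian and for a mixture of tree-structured Gaussians. It assumes each tree component stores d variances plus one covariance for each of its d - 1 tree edges; this parameterization, the function names, and the example values d = 500 and L = 3 are illustrative assumptions, not taken from the paper or from the mogt MATLAB code.

```python
def full_covariance_params(d: int) -> int:
    """Free parameters in one unrestricted symmetric d x d covariance matrix."""
    return d * (d + 1) // 2


def tree_mixture_covariance_params(d: int, num_components: int) -> int:
    """Covariance parameters for a mixture of tree-structured Gaussians.

    Assumption (not from the paper): each component keeps d variances
    plus one covariance for each of the d - 1 edges of its spanning tree.
    """
    return num_components * (d + (d - 1))


if __name__ == "__main__":
    d, L = 500, 3  # hypothetical series length and number of components
    print(full_covariance_params(d))             # 125250
    print(tree_mixture_covariance_params(d, L))  # 2997
```

Under these assumptions the count grows quadratically in d for the full covariance but only linearly in d (times L) for the tree mixture, which is why the reduction matters most when the minority class has few instances.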

Paper Download

-----------------------

The imbalanced datasets used are downloadable here.

The supplementary results of this paper are available in the PDF file here.

Email me to request the mogt MATLAB code.