Nonlinear Rescaling Method for Machine Learning and Data Mining
Abstract: We apply the nonlinear rescaling (NR) method to machine learning and data mining problems.
First, we construct a linear support vector machine (SVM). The formulation of the linear SVM based on the NR method leads to an algorithm that reduces the number of support vectors without compromising classification performance relative to the linear soft-margin SVM formulation. The NR algorithm computes both the primal and the dual approximation at each step. The dual variables associated with the given data-set provide important information about each data point and play a key role in selecting the set of support vectors. Experimental results on ten benchmark classification problems show that the NR formulation is feasible. The quality of discrimination is, in most instances, comparable to that of the linear soft-margin SVM, while the number of support vectors was substantially reduced in several instances.
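The role of the dual variables in selecting support vectors can be illustrated on the linear soft-margin SVM used as the baseline above. The sketch below is not the NR algorithm. It swaps in a standard dual coordinate descent solver for the hinge-loss SVM, and the function names, the toy data, and the choice to fold the bias into the weight vector are all assumptions made for illustration. It only shows that the points with nonzero dual variables are the support vectors.

```python
def train_linear_svm_dual(X, y, C=1.0, epochs=200):
    """Dual coordinate descent for the linear soft-margin (hinge-loss) SVM.

    Assumption: a constant feature 1.0 is appended to every point, so the
    bias is the last weight and is regularized along with the others.
    Returns (w, alpha), where alpha[i] is the dual variable of point i.
    """
    Xa = [list(x) + [1.0] for x in X]
    n, d = len(Xa), len(Xa[0])
    alpha = [0.0] * n
    w = [0.0] * d
    for _ in range(epochs):
        for i in range(n):
            q = sum(v * v for v in Xa[i])  # never zero: constant feature
            # Partial derivative of the dual objective w.r.t. alpha[i].
            g = y[i] * sum(wj * xj for wj, xj in zip(w, Xa[i])) - 1.0
            # Newton step on one coordinate, clipped to the box [0, C].
            new = min(max(alpha[i] - g / q, 0.0), C)
            delta = new - alpha[i]
            if delta != 0.0:
                w = [wj + delta * y[i] * xj for wj, xj in zip(w, Xa[i])]
                alpha[i] = new
    return w, alpha


def support_vectors(alpha, tol=1e-8):
    """Indices of points whose dual variable is nonzero."""
    return [i for i, a in enumerate(alpha) if a > tol]


def predict(w, x):
    s = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if s >= 0 else -1


# Hypothetical toy data: two linearly separable clusters.
X = [(2, 2), (3, 3), (3, 2), (-2, -2), (-3, -3), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]

w, alpha = train_linear_svm_dual(X, y)
sv = support_vectors(alpha)
print("support vectors:", [X[i] for i in sv])
```

Only the points closest to the separating hyperplane keep a nonzero dual variable, so the classifier is determined by a subset of the data.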
Second, we propose a novel machine learning algorithm for identifying relevant objects in a large amount of data. This approach is driven by the linear SVM based on the NR method and by transductive inference. The linear SVM computes both the primal and the dual approximation at each step. The dual variables associated with the given labeled data-set provide important information about the objects in the data-set and play a key role in ordering these objects. A confidence score based on a transductive inference procedure using the linear SVM is used to rank and identify the relevant objects from a pool of unlabeled data. Experimental results on an unbalanced protein data-set for the drug target prioritization and identification problem illustrate the feasibility of the proposed identification algorithm.
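One generic way to turn dual variables into a transductive confidence score is sketched below. It is an assumption for illustration, not the procedure of the cited paper. Each unlabeled object is tentatively labeled as relevant and the SVM is retrained. The object's dual variable then serves as a strangeness measure, and the score is the fraction of objects that are at least as strange. The solver is a standard dual coordinate descent for the soft-margin SVM rather than the NR method, and all names and data are hypothetical.

```python
def dual_variables(X, y, C=1.0, epochs=200):
    """Dual variables of a linear soft-margin SVM (dual coordinate descent).

    Assumption: the bias is folded in as a constant feature 1.0.
    """
    Xa = [list(x) + [1.0] for x in X]
    alpha = [0.0] * len(Xa)
    w = [0.0] * len(Xa[0])
    for _ in range(epochs):
        for i, xi in enumerate(Xa):
            q = sum(v * v for v in xi)
            g = y[i] * sum(wj * xj for wj, xj in zip(w, xi)) - 1.0
            new = min(max(alpha[i] - g / q, 0.0), C)
            delta = new - alpha[i]
            if delta != 0.0:
                w = [wj + delta * y[i] * xj for wj, xj in zip(w, xi)]
                alpha[i] = new
    return alpha


def confidence(X, y, x_new, label=1, C=1.0):
    """Score in (0, 1]: how typical x_new looks when given `label`.

    The dual variable of x_new is used as its strangeness; a larger
    value means the tentative label fits the labeled data less well.
    """
    alpha = dual_variables(list(X) + [x_new], list(y) + [label], C)
    a_new = alpha[-1]
    return sum(1 for a in alpha if a >= a_new) / len(alpha)


def rank_pool(X, y, pool, relevant=1):
    """Rank unlabeled objects by confidence of being relevant, best first."""
    scored = [(confidence(X, y, x, relevant), x) for x in pool]
    return sorted(scored, key=lambda t: t[0], reverse=True)


# Hypothetical labeled data (+1 = relevant) and an unlabeled pool.
X = [(2, 2), (3, 3), (3, 2), (-2, -2), (-3, -3), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
pool = [(4, 4), (-4, -4)]

ranked = rank_pool(X, y, pool)
for score, obj in ranked:
    print(obj, round(score, 3))
```

Retraining once per candidate is expensive for a large pool. The sketch favors clarity over efficiency.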
Patent:
R. Polyak, S.-S. Ho, and I. Griva, Classification Tool, US Patent No. 7,840,505, Nov. 23, 2010.
References:
S.-S. Ho and R. Polyak, Confident Identification of Relevant Objects Based on Nonlinear Rescaling Method and Transductive Inference, Proc. 7th Int. Conf. on Data Mining (ICDM 2007), Omaha, NE, Oct. 28-31, 2007.
R. Polyak, S.-S. Ho, and I. Griva, Support Vector Machine via Nonlinear Rescaling Method, Optimization Letters, vol. 1, no. 4, 2007, pp. 367-378.