In this article we show how to retrieve a set of good features via logistic regression. Logistic regression is a linear classifier parameterized by a weight vector w and a regularization parameter lambda. After training, w is estimated, and we show that the magnitude of each weight reflects how important the corresponding feature is for classifying the training set. We also compare the weights against the mutual information (MI) computed at each feature. Our experiment is as follows:
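For reference, one common formulation of binary L2-regularized logistic regression is given below; the sign and scaling conventions for lambda vary across packages, and the multi-class case is typically handled one-vs-rest or with a multinomial likelihood:

\hat{w} = \arg\min_{w} \sum_{i=1}^{n} \log\left(1 + e^{-y_i \, w^\top x_i}\right) + \lambda \lVert w \rVert_2^2, \qquad y_i \in \{-1, +1\}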
We generate a synthetic data set of 3 classes, each containing 100 examples produced by adding noise to the class template.
The templates of the 3 classes, in which each pixel's value is either 0 or 1, are shown below:
Each example is generated by adding zero-mean Gaussian noise to the corresponding template. Examples for each class at noise standard deviations of 0.5 and 2 are shown below:
noise level = 0.5
noise level = 2
Notice that when the standard deviation is 2, the SNR is very low; in fact, since the template pixel values are at most 1, the noise level exceeds the signal level.
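A minimal MATLAB sketch of this generation step is given below; the template matrices (template1, template2, template3) are placeholders for the actual 0/1 templates, which are not reproduced here:

% Generate 100 noisy examples per class by adding zero-mean Gaussian
% noise to each binary (0/1) class template.
nPerClass = 100;
sigma     = 0.5;                               % noise std; use 2 for the low-SNR setting
templates = {template1, template2, template3}; % placeholder 0/1 matrices

X = [];  y = [];
for c = 1:3
    T = templates{c};
    for i = 1:nPerClass
        img = T + sigma * randn(size(T));  % template + Gaussian noise
        X   = [X; img(:)'];                % each row is one vectorized example
        y   = [y; c];                      % class label
    end
end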
We train L2-regularized (linear) logistic regression with lambda = 0.1. The data set is randomly split into training and test sets at a 60:40 ratio. Each feature is z-scored before training to speed up convergence. Once the logistic regression is trained, the weight vector is used to predict the class label of each example in the test set.
Using the weight vectors obtained from training, the test accuracies are 100% for noise level = 0.5 and 80% for noise level = 2.
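The training and prediction steps can be sketched in MATLAB as below. This sketch uses MATLAB's fitclinear in a one-vs-rest loop; the original code relies on a separate logistic regression package, so details such as the exact lambda convention may differ:

% 60/40 train-test split, per-feature z-scoring (with training-set
% statistics), and one-vs-rest L2-regularized logistic regression.
cv  = cvpartition(y, 'HoldOut', 0.4);
Xtr = X(training(cv), :);  ytr = y(training(cv));
Xte = X(test(cv), :);      yte = y(test(cv));

[Xtr, mu, sd] = zscore(Xtr);     % z-score each feature
Xte = (Xte - mu) ./ sd;          % apply the same transform to the test set

lambda = 0.1;
W      = zeros(size(X, 2), 3);   % one weight vector per class
scores = zeros(size(Xte, 1), 3);
for c = 1:3
    mdl = fitclinear(Xtr, ytr == c, 'Learner', 'logistic', ...
                     'Regularization', 'ridge', 'Lambda', lambda);
    W(:, c)      = mdl.Beta;     % weight vector for class c
    [~, s]       = predict(mdl, Xte);
    scores(:, c) = s(:, 2);      % score of the positive (c-th) class
end
[~, yhat] = max(scores, [], 2);  % assign each test example to the best class
accuracy  = mean(yhat == yte);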
To compare against the logistic regression weight of each feature, we also calculate the mutual information (MI) between each feature and the class label, treating each feature independently. The resulting MI maps are shown below:
noise level = 0.5
noise level = 2
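Since each feature is continuous and the class label is discrete, the MI has to be estimated; a simple histogram-based estimate is sketched below (the estimator used in the original code may differ):

% Estimate I(feature; label) per feature: discretize the feature into
% bins, build the joint distribution with the label, and apply
% I(F;Y) = sum p(f,y) * log2( p(f,y) / (p(f) p(y)) ).
nBins = 10;
nFeat = size(Xtr, 2);
mi    = zeros(nFeat, 1);
for f = 1:nFeat
    fbin  = discretize(Xtr(:, f), nBins);          % uniform-width bins
    joint = accumarray([fbin, ytr], 1, [nBins, 3]);
    joint = joint / sum(joint(:));                 % joint p(f, y)
    pf    = sum(joint, 2);                         % marginal p(f)
    py    = sum(joint, 1);                         % marginal p(y)
    outer = pf * py;                               % product of marginals
    nz    = joint > 0;                             % avoid log(0)
    mi(f) = sum(joint(nz) .* log2(joint(nz) ./ outer(nz)));
end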
To select a set of good features, we can simply pick a weight threshold and keep only the features whose weights exceed it. We first plot the cumulative distribution of the class-1 weights.
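A sketch of this plot, assuming W holds one weight vector per class as in the training sketch above:

% Empirical cumulative distribution of the class-1 weights; an abrupt
% jump at either end of the curve marks weights that stand out from
% the bulk.
w1 = sort(W(:, 1));
plot(w1, (1:numel(w1)) / numel(w1));
xlabel('weight value');
ylabel('cumulative fraction of features');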
Observe the abrupt departure from the plateau at the right end of the cumulative curve. This jump indicates a group of weights that stands out from the rest, which suggests the corresponding pixels might be good features. So we pick threshold = -8.5 and obtain the region below from the class-1 weight map.
Note that the region selected with threshold >= -8.5 does not include the small rectangles, even though they also contribute to the classification boundary. Two things are worth noting here: 1) the big top-right rectangle is sufficient to distinguish class 1 from the rest, so the small rectangles receive small weights that are not picked up from the histogram; 2) eyeballing the threshold might not be sensitive enough to pick up those small weights. Now we decrease the threshold to -10 in the hope that the small rectangles will be retrieved.
When we decrease the threshold further to -11, we fully recover the small rectangles, but noisy features are included as well.
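Thresholding the weight map and visualizing the selected features as a binary mask can be done as below; the [10 10] image size is a placeholder for the actual template dimensions:

% Keep features whose class-1 weight exceeds the chosen threshold and
% display the selected region on the image grid.
threshold = -8.5;                              % also tried: -10, -11
mask = reshape(W(:, 1) >= threshold, [10 10]); % placeholder image size
imagesc(mask);
colormap(gray);
axis image;
title(sprintf('class-1 features with weight >= %.1f', threshold));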
MATLAB code is made available here. The code requires the following packages: 1) the logistic regression package and 2) the miscellaneous package, both of which can be downloaded from here. Note that this approach can be used with other linear classifiers, such as SVM, too.