Disadvantage of rotationally-invariant classifiers for feature selection

Post date: Mar 29, 2013 4:59:16 AM

There are very nice slides showing that rotationally-invariant classifiers are usually inefficient for feature selection.

http://cseweb.ucsd.edu/~elkan/254spring05/Hammon.pdf

Discussion#1: why the l1-norm is usually better than the l2-norm for feature selection

  • Theoretical results show that classifiers with the rotational invariance property have worse sample complexity for feature selection than classifiers without it.
  • A rotationally-invariant classifier requires a number of training examples n that is at least linear in the number of input features m, i.e., n = Ω(m). That is, rotationally-invariant learning algorithms do not perform well when n << m, where the dimensionality m is very high relative to the number of examples. (By contrast, l1-regularized logistic regression can learn well with a number of examples that grows only logarithmically in m.)
  • Logistic regression without regularization and logistic regression with l2-norm regularization both have the rotational invariance property, so in this regime they are inferior to l1-regularized logistic regression, which does not have this property.
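The contrast is easy to see numerically. Below is a minimal sketch (assuming NumPy is available; the data, hyperparameters, and function names are my own illustrations, not from the slides) that fits logistic regression on data where only the first of many features is relevant and n is not much larger than m. It uses plain gradient descent for l2 regularization and proximal gradient descent (soft-thresholding) for l1:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, reg="l2", lam=0.1, lr=0.1, iters=2000):
    """Regularized logistic regression.

    reg="l2": gradient descent on the l2-penalized objective.
    reg="l1": proximal gradient descent, i.e. a gradient step on the
              data loss followed by soft-thresholding, which can set
              weights exactly to zero.
    """
    n, m = X.shape
    w = np.zeros(m)
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ w) - y) / n
        if reg == "l2":
            w -= lr * (grad + lam * w)
        else:
            w -= lr * grad
            w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w

rng = np.random.default_rng(0)
n, m = 50, 30                      # few examples relative to dimensionality
X = rng.normal(size=(n, m))
# Only feature 0 carries signal; the other 29 features are pure noise.
y = (X[:, 0] + 0.1 * rng.normal(size=n) > 0).astype(float)

w_l2 = train_logreg(X, y, reg="l2")
w_l1 = train_logreg(X, y, reg="l1")
print("l2: weights driven exactly to zero:", int(np.sum(np.abs(w_l2) < 1e-8)))
print("l1: weights driven exactly to zero:", int(np.sum(np.abs(w_l1) < 1e-8)))
```

On data like this the l1 penalty zeroes out most of the irrelevant weights while keeping the relevant one, whereas the l2 penalty only shrinks the irrelevant weights toward zero without performing any selection.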

Discussion#2: Many classifiers fall into the rotationally-invariant category; see page 26 of the slides. The extended analysis on page 30 also explains that although irrelevant features do not affect the SVM's margin, they increase the radius of the data, which loosens the radius/margin bound and harms the SVM's performance.
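Rotational invariance itself can be checked empirically. The sketch below (again assuming NumPy; all names are my own) trains l2-regularized logistic regression by gradient descent on the original data and on a randomly rotated copy of it. Because the l2 objective is rotationally invariant, the fitted weight vector simply rotates with the data, so the decision values on the (correspondingly rotated) inputs match up to numerical error:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_l2(X, y, lam=0.1, lr=0.1, iters=2000):
    """l2-regularized logistic regression via gradient descent."""
    n, m = X.shape
    w = np.zeros(m)
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ w) - y) / n
        w -= lr * (grad + lam * w)
    return w

rng = np.random.default_rng(1)
n, m = 50, 10
X = rng.normal(size=(n, m))
y = (X[:, 0] > 0).astype(float)

# Random orthogonal matrix (a rotation/reflection) from a QR decomposition.
Q, _ = np.linalg.qr(rng.normal(size=(m, m)))

w = train_l2(X, y)          # fit on original features
w_rot = train_l2(X @ Q, y)  # fit on rotated features

# Decision values agree: the l2 solution rotates along with the data,
# so rotating away a sparse, axis-aligned representation costs nothing
# to the classifier -- but it destroys any feature-selection structure.
print(np.allclose(X @ w, (X @ Q) @ w_rot, atol=1e-5))
```

This is exactly why rotational invariance is at odds with feature selection: the axes (individual features) have no special status for such a classifier, while selecting features is inherently an axis-aligned operation.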