In this part I relate current work on additive classifier learning in the computer vision community to work done in the statistics community in the 1990s. The decision functions of additive kernel SVMs belong to a class of functions called generalized additive models (GAMs), proposed by Hastie and Tibshirani. We consider a popular optimization framework for learning such models, based on regularized empirical loss minimization, where the regularization prefers smooth functions. We propose representations of the classifier and the regularization for which this optimization problem can be solved efficiently. The framework allows one to learn approximate additive classifiers directly, without having to approximate the kernel first. The first class of representations is based on splines; it is related to the "penalized-spline" learning framework of Eilers & Marx '02, and is inspired by our earlier work on approximating learned additive classifiers with splines. An attractive property of this framework is that, for a certain choice of regularization and spline basis, the optimization closely approximates the learning problem of an intersection kernel SVM. The framework also accommodates other representations of the functions, such as a generalized Fourier expansion. We identify a class of orthogonal basis functions with orthogonal derivatives that is particularly well suited to this optimization. I will also discuss the tradeoffs between training time, test time, memory overhead, and accuracy of these representations.
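As a concrete illustration of the regularized empirical loss setup, the sketch below fits an additive classifier f(x) = Σ_d f_d(x_d), with each f_d expanded in a piecewise-linear ("hat") spline basis and a second-difference penalty on the spline coefficients, in the spirit of penalized splines. This is a minimal sketch under my own naming; the function names and hyperparameters are hypothetical and not taken from the tutorial's released software.

```python
import numpy as np

def linear_spline_basis(x, knots):
    """Encode a 1-D feature with piecewise-linear 'hat' basis functions.
    Returns an (n, len(knots)) matrix; rows sum to 1 inside the knot range."""
    B = np.zeros((len(x), len(knots)))
    for j in range(len(knots) - 1):
        left, right = knots[j], knots[j + 1]
        in_seg = (x >= left) & (x <= right)
        t = (x[in_seg] - left) / (right - left)
        B[in_seg, j] = 1 - t
        B[in_seg, j + 1] = t
    return B

def fit_additive(X, y, knots, lam=0.01, lr=0.5, iters=2000):
    """Fit f(x) = sum_d f_d(x_d) by gradient descent on the logistic loss
    plus a second-difference (roughness) penalty on each f_d's coefficients."""
    n, D = X.shape
    K = len(knots)
    Phi = np.hstack([linear_spline_basis(X[:, d], knots) for d in range(D)])
    Dm = np.diff(np.eye(K), n=2, axis=0)        # (K-2, K) second differences
    P = np.kron(np.eye(D), Dm.T @ Dm)           # block-diagonal penalty matrix
    w = np.zeros(D * K)
    for _ in range(iters):
        m = y * (Phi @ w)                       # margins
        g = -Phi.T @ (y / (1 + np.exp(m))) / n + lam * (P @ w)
        w -= lr * g
    return w, Phi
```

Note that the penalty vanishes on coefficient sequences that are linear in the knot index, so linear functions of each feature are left unpenalized, which is one standard choice; penalizing first differences instead would shrink toward constants.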
Summary of the tutorial: Three main points
- Additive kernels are widely used in computer vision
- Additive kernel SVMs can be efficiently evaluated
- Additive kernel SVMs can be efficiently trained

Learning additive classifiers directly (a.k.a. generalized additive models)
- An optimization framework (regularized empirical loss)
- Search for efficient representations of the function and regularization
- Representation and regularization
- Regularization: a penalty on derivatives
- A practical basis: an orthogonal basis with orthogonal derivatives
- Linearization and visualizing the implicit kernel
- Efficiently solving the optimization
- Computational tradeoffs

Experiments, Conclusions, Software, References

A PDF copy of the slides is included at the end of this page.
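One way to see the "efficiently evaluated" point above: for the intersection (min) kernel, the SVM decision function is additive in the feature dimensions, so each per-dimension contribution can be precomputed into a lookup table, turning an O(#SV × D) evaluation into O(D) interpolations. The numpy sketch below is a hedged illustration under my own naming, not the released evaluation code.

```python
import numpy as np

def min_kernel_decision_exact(x, SV, alpha_y, b=0.0):
    """Exact IKSVM decision: sum_i a_i * sum_d min(x_d, sv_id) + b.
    SV is (#SV, D); alpha_y holds the signed dual coefficients."""
    return np.sum(alpha_y[:, None] * np.minimum(x[None, :], SV)) + b

def build_tables(SV, alpha_y, grid):
    """Precompute h_d(v) = sum_i a_i * min(v, sv_id) on a grid of values.
    Each h_d is piecewise linear with kinks at the support vector values,
    so a fine grid plus linear interpolation approximates it closely."""
    return np.array([[np.dot(alpha_y, np.minimum(g, SV[:, d]))
                      for g in grid] for d in range(SV.shape[1])])

def fast_decision(x, tables, grid, b=0.0):
    """Approximate decision via D linear interpolations into the tables."""
    total = b
    for d, v in enumerate(x):
        total += np.interp(v, grid, tables[d])
    return total
```

The tables are built once after training; at test time the cost per example is independent of the number of support vectors.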
Software
- Fast additive kernel SVM evaluation code
- LIBSPLINE - direct learning of additive classifiers based on splines and Fourier embeddings (fast dual coordinate descent algorithm)
- PWLSGD - primal stochastic gradient descent method for learning piecewise linear classifiers (approximate IKSVMs)
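To illustrate the Fourier-embedding idea behind direct learning of smooth additive classifiers: on [0,1] the basis b_k(x) = cos(kπx) has orthogonal functions *and* orthogonal derivatives, so the smoothness penalty ∫ f′(x)² dx becomes diagonal in the coefficients; rescaling each basis function by 1/(kπ) turns it into a plain L2 penalty, after which any off-the-shelf linear SVM or logistic solver applies. The sketch below uses my own naming and is not the LIBSPLINE API.

```python
import numpy as np

def cosine_embedding(X, K):
    """Map each feature x_d to [cos(k*pi*x_d)/(k*pi)] for k = 1..K.
    With f_d(x) = sum_k w_dk * cos(k*pi*x)/(k*pi), the derivative is
    f_d'(x) = -sum_k w_dk * sin(k*pi*x), and since the sines are
    orthogonal on [0,1], integral_0^1 f_d'(x)^2 dx = (1/2) sum_k w_dk^2.
    A standard L2-regularized linear solver on these features therefore
    learns a derivative-penalized GAM."""
    n, D = X.shape
    ks = np.arange(1, K + 1)
    blocks = [np.cos(np.pi * np.outer(X[:, d], ks)) / (np.pi * ks)
              for d in range(D)]
    return np.hstack(blocks)
```

The embedded feature vector has D·K entries, so the memory and test-time cost grow linearly in the number of basis functions per dimension; this is the training-time / test-time / accuracy tradeoff discussed in the tutorial.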
References
- S. Maji and A. C. Berg. Max-margin additive classifiers for detection. In Proc. ICCV, 2009.
- S. Maji, A. C. Berg, and J. Malik. Classification using intersection kernel SVMs is efficient. In Proc. CVPR, 2008.
- S. Maji. Smooth linearized additive classifiers. In ECCV Workshop on Web-scale Vision and Social Media, 2012.
- P. Eilers and B. Marx. Generalized linear additive smooth structures. Journal of Computational and Graphical Statistics, 11(4):758–783, 2002.
- F. Perronnin, J. Sánchez, and Y. Liu. Large-scale image categorization with explicit data embedding. In Proc. CVPR, 2010.
- A. Vedaldi and A. Zisserman. Efficient additive kernels via explicit feature maps. PAMI, 34(3), 2012.