Pedestrian (Human/People) Detection and Posture Estimation
1. Haar-like feature + Cascaded AdaBoost (OpenCV)
Haar-like features: box-like edge, line and center-surround features;
Integral image for fast feature extraction in rectangular sub-windows;
Cascaded AdaBoost for quickly rejecting negative samples;
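The integral image idea can be sketched in a few lines. This is a minimal pure-Python illustration, not OpenCV's implementation; the function names (integral_image, box_sum, haar_edge_feature) and the toy image are assumptions made for the example. Once the table is built, any rectangle sum costs four lookups, so a two-rectangle Haar-like edge feature costs eight.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def haar_edge_feature(ii, top, left, height, width):
    """Two-rectangle edge feature: left half minus right half."""
    half = width // 2
    return (box_sum(ii, top, left, height, half)
            - box_sum(ii, top, left + half, height, half))


# Toy image (hypothetical data): dark left half, bright right half.
image = [
    [1, 1, 9, 9],
    [1, 1, 9, 9],
    [1, 1, 9, 9],
    [1, 1, 9, 9],
]
table = integral_image(image)
total = box_sum(table, 0, 0, 4, 4)
edge = haar_edge_feature(table, 0, 0, 4, 4)
```

A large absolute value of the edge feature indicates a strong vertical edge inside the window; a cascade stage would threshold such responses.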
1-a. HoG feature + Cascaded AdaBoost (OpenCV)
2. HoG + SVM (OpenCV)
HoG introduces invariance
Bias / gain / nonlinear transformations
– bias: gradients / gain: local normalization
– nonlinearity: clamping magnitude, orientations
Small deformations
– spatial subsampling
– local “bag” models
At each pixel
Gradient magnitude:
$m = \lVert (I_x, I_y) \rVert = \sqrt{I_x^2 + I_y^2}$
Gradient orientation:
$o = \tan^{-1}(I_y / I_x)$
Quantize orientation: each pixel votes into an orientation bin, weighted by its gradient magnitude m;
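The per-pixel steps above can be sketched as follows. This is a simplified illustration under stated assumptions: central-difference gradients, unsigned orientation in [0, 180) degrees, 9 bins, and hard assignment to a single bin (practical HoG implementations usually interpolate votes between neighboring bins and cells). The function name cell_histogram is hypothetical.

```python
import math


def cell_histogram(cell, n_bins=9):
    """Magnitude-weighted orientation histogram of one cell.

    Uses central differences on interior pixels and unsigned
    orientation in [0, 180) degrees with hard bin assignment.
    """
    h, w = len(cell), len(cell[0])
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix = cell[y][x + 1] - cell[y][x - 1]
            iy = cell[y + 1][x] - cell[y - 1][x]
            m = math.hypot(ix, iy)                        # gradient magnitude
            o = math.degrees(math.atan2(iy, ix)) % 180.0  # unsigned orientation
            hist[int(o // bin_width) % n_bins] += m       # weighted vote
    return hist


# Toy cells (hypothetical data): intensity ramps along x and along y.
ramp_x = [[x for x in range(4)] for _ in range(4)]
ramp_y = [[y for _ in range(4)] for y in range(4)]
hist_x = cell_histogram(ramp_x)
hist_y = cell_histogram(ramp_y)
```

The horizontal ramp puts all its votes in the first bin (orientation near 0 degrees), the vertical ramp in the middle bin (near 90 degrees).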
Linear SVM as classifier, applied in a sliding window over multiple scales.
3. DPM: Pictorial Structure + Latent SVM (OpenCV)
Pictorial Structure: Objects are modeled by a collection of parts in a deformable configuration;
Statistical framework
Prior distribution is defined as a tree-structured Markov random field where no preference is given to the absolute location of each part;
Model parts based on the response of Gaussian derivative filters of different orders, orientations and scales;
Connections between parts are modeled as springs, i.e. Gaussian distributions over relative part locations.
Best match is found by minimizing an energy function that measures both the individual match costs and the connection costs;
The energy captures how well each part matches the image at its location and how well the part locations agree with the deformable model;
Matching a pictorial structure does not involve making any initial decisions about the locations of individual parts; an overall decision is made from match and deformation costs together;
Appearance parameters are estimated independently for each part, so any kind of part model can be used as long as a maximum likelihood estimate can be computed for an individual part;
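The matching step can be illustrated with a toy star-shaped model on a 1D grid. This is a brute-force sketch under assumptions made for the example: locations are list indices, the spring cost is quadratic (the negative log of a Gaussian, up to a constant), and the names match_star_model, offsets and stiffness are hypothetical. Because the model is a tree, each part can be placed optimally given its parent's location, so the minimization decomposes part by part.

```python
def match_star_model(root_cost, part_costs, offsets, stiffness):
    """Minimize root match cost + sum of (part match cost + spring cost).

    root_cost[l] and part_costs[i][l] are match costs at location l.
    Part i prefers to sit at root location + offsets[i]; deviations
    are penalized by stiffness[i] * deviation**2.
    Returns (best_energy, root_location, part_locations).
    """
    n = len(root_cost)
    best = None
    for r in range(n):
        energy = root_cost[r]
        locs = []
        for cost, off, k in zip(part_costs, offsets, stiffness):
            # Best placement of this part given the root location.
            e, loc = min((cost[l] + k * (l - (r + off)) ** 2, l)
                         for l in range(n))
            energy += e
            locs.append(loc)
        if best is None or energy < best[0]:
            best = (energy, r, locs)
    return best


# Toy costs (hypothetical data): low cost means a good appearance match.
root_cost = [5, 0, 5, 5, 5]
part_costs = [[9, 9, 9, 0, 9],   # part A matches best at location 3
              [0, 9, 9, 9, 9]]   # part B matches best at location 0
result = match_star_model(root_cost, part_costs, offsets=[2, -1], stiffness=[1, 1])
```

This brute-force version costs O(h^2) per part for h locations; it is only meant to show the structure of the energy, not an efficient solver.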
Latent SVM model training: a discriminative model with latent variables
The learned positions of object-parts and the position of the whole object are the Latent Variables;
Training data consists of images with labeled bounding boxes;
Need to learn the model structure, filters and deformation costs;
Detection: 8x8 pixel blocks, HOG features computed at different resolutions;
Root filter: a rectangular template defining weights for the features;
Learn the root filter with a standard SVM;
Part filters: capture finer features at twice the resolution of the root filter;
Deformation model: matching with pictorial structures.
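As a hedged summary of how root filter, part filters and deformation costs combine, the score of a placement can be written in the notation of Felzenszwalb et al. (reference 3); the symbols below are not defined elsewhere in these notes and are introduced only for illustration. Here $H$ is the HOG feature pyramid, $p_0$ the root location, $p_1,\dots,p_n$ the part locations, $F_i$ the filters, $d_i$ the deformation weights, $(dx_i, dy_i)$ the displacement of part $i$ from its anchor, and $b$ a bias.

```latex
\mathrm{score}(p_0,\dots,p_n)
  = \sum_{i=0}^{n} F_i \cdot \phi(H, p_i)
  - \sum_{i=1}^{n} d_i \cdot \phi_d(dx_i, dy_i) + b,
\qquad
\phi_d(dx, dy) = (dx,\; dy,\; dx^2,\; dy^2)

% Detection score of a root location: maximize over the latent part placements
\mathrm{score}(p_0) = \max_{p_1,\dots,p_n} \mathrm{score}(p_0,\dots,p_n)

% Latent SVM classifier: \beta stacks all filters, deformation weights and bias,
% and z ranges over the latent placements
f_\beta(x) = \max_{z \in Z(x)} \beta \cdot \Phi(x, z)
```

The maximization over part placements is the pictorial structure matching step, and the latent variables $z$ are exactly the placements that training does not observe.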
4. Integral Channel Features + Soft cascades (decision trees) + Feature Pyramid
Integral Channel Features (ICF):
Multiple registered image channels are computed using image linear/non-linear transformations;
Features such as local sums, histograms, Haar features and their various generalizations are computed using integral images;
ICF naturally integrate heterogeneous sources of information, have few parameters, and result in fast, accurate detectors.
Boosted classifiers: soft cascades
Two-level decision trees as weak learners.
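A soft cascade over two-level trees can be sketched as below. This is an illustrative sketch, not the code of any particular toolbox: the tree encoding, the per-tree rejection thresholds and all numeric values are assumptions. The key idea is that the running boosted score is checked after every weak learner, so most negative windows are rejected after evaluating only a few trees.

```python
def depth_two_tree(feat_root, thr_root, left, right):
    """Build a two-level decision tree.

    left / right are (feature_index, threshold, score_if_below, score_otherwise)
    and are used when x[feat_root] is below / not below thr_root.
    """
    def evaluate(x):
        f, t, lo, hi = left if x[feat_root] < thr_root else right
        return lo if x[f] < t else hi
    return evaluate


def soft_cascade(x, trees, rejection_thresholds):
    """Return (accepted, score, number_of_trees_evaluated)."""
    score = 0.0
    for k, (tree, thr) in enumerate(zip(trees, rejection_thresholds), start=1):
        score += tree(x)
        if score < thr:          # early rejection on the running score
            return False, score, k
    return True, score, len(trees)


# Hypothetical two-tree classifier over a 3-dimensional feature vector.
trees = [
    depth_two_tree(0, 0.5, (1, 0.5, -1.0, -0.5), (1, 0.5, 0.5, 1.0)),
    depth_two_tree(2, 0.5, (0, 0.5, -1.0, -0.5), (0, 0.5, 0.5, 1.0)),
]
thresholds = [-0.75, 0.0]
positive = soft_cascade([0.9, 0.9, 0.9], trees, thresholds)
negative = soft_cascade([0.1, 0.1, 0.1], trees, thresholds)
```

In this toy run the negative sample is rejected after the first tree, while the positive sample passes through both.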
5. Aggregate Channel Features + Soft cascades (decision trees) + Feature Pyramid
ACF: Aggregate Channel Feature;
Compute several channels of the image, sum every block of pixels, and smooth the resulting lower-resolution channels;
Features are single pixel lookups in the aggregated channels;
Boosting is used to learn decision trees over these features (pixels) to distinguish object from background;
A multiscale sliding window is applied;
With an appropriate choice of channels, ACF achieves state-of-the-art performance in pedestrian detection:
normalized gradient magnitude, histogram of oriented gradients and LUV color;
Compute the feature pyramid exactly at octave-spaced scale intervals and approximate the scales in between;
AdaBoost for training, combining 2048 depth-two trees over the 5120 candidate features in each 128x64 search window.
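The aggregation step and the count of 5120 candidate features can be checked with a small sketch. The assumptions here are not stated in the notes above and follow the ACF paper (reference 5): 10 channels (1 normalized gradient magnitude, 6 gradient orientation bins, 3 LUV) and 4x4 aggregation blocks. The function names are hypothetical, the channel data is dummy, and the smoothing step is omitted for brevity.

```python
def aggregate_channel(channel, shrink=4):
    """Sum every shrink x shrink block of pixels into one output pixel."""
    h, w = len(channel), len(channel[0])
    out = []
    for by in range(0, h - h % shrink, shrink):
        row = []
        for bx in range(0, w - w % shrink, shrink):
            row.append(sum(channel[y][x]
                           for y in range(by, by + shrink)
                           for x in range(bx, bx + shrink)))
        out.append(row)
    return out


def window_features(agg_channels, top, left, height, width):
    """Feature vector of single-pixel lookups in the aggregated channels."""
    return [ch[y][x]
            for ch in agg_channels
            for y in range(top, top + height)
            for x in range(left, left + width)]


# Assumed channel layout: 1 gradient magnitude + 6 orientation bins + 3 LUV.
n_channels = 1 + 6 + 3
# Dummy 128x64 channels (hypothetical data): channel c is constant c.
channels = [[[c] * 64 for _ in range(128)] for c in range(n_channels)]
aggregated = [aggregate_channel(ch) for ch in channels]
features = window_features(aggregated, 0, 0, 128 // 4, 64 // 4)
```

Under these assumptions a 128x64 window shrinks to 32x16 per channel, giving 32 x 16 x 10 = 5120 features, which matches the number quoted above.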
6. Pose Estimation with Flexible Mixture-of-Parts (Matlab)
Human pose estimation is challenging:
Clothing variation, strong articulation, partial occlusion, truncation at the scene border;
Varied activities and viewpoints: sports, front-facing poses, interaction with objects, upright standing.
Pictorial structure models are the pioneering work on which various methods are based;
Poselets provide a new data-driven feature that is learned and used to infer body posture;
Methods of full body posture estimation:
Flexible mixture of parts [Yang & Ramanan, 2011];
Pictorial structure with spatial and appearance modeling [Pishchulin et al., 2013];
Methods of upper body posture estimation:
MODEC (multimodal decomposable model) [Sapp & Taskar, 2013];
Poselet-based armlet [Gkioxari et al., 2013].
Flexible mixture of parts as human body model;
Define a representation of deformable part models with a mixture of small, non-oriented parts;
Jointly capture spatial relations between part locations and co-occurrence relations between part mixtures;
Extend the pictorial structure model by including the appearance modeling;
Learn the spatial and appearance model by structured SVM.
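For reference, the joint score that the flexible mixture-of-parts model assigns to a pose can be written as follows, in the notation of Yang & Ramanan (reference 8). The symbols are introduced here for illustration only: $I$ is the image, $p_i$ the pixel location of part $i$, $t_i$ its mixture type, $V$ and $E$ the nodes and edges of the tree-structured part graph.

```latex
S(I, p, t) = S(t)
  + \sum_{i \in V} w_i^{t_i} \cdot \phi(I, p_i)
  + \sum_{ij \in E} w_{ij}^{t_i, t_j} \cdot \psi(p_i - p_j)

% Co-occurrence term over part mixture types
S(t) = \sum_{i \in V} b_i^{t_i} + \sum_{ij \in E} b_{ij}^{t_i, t_j}

% Spring-like deformation features on the relative location
\psi(p_i - p_j) = \begin{bmatrix} dx & dx^2 & dy & dy^2 \end{bmatrix}^{T}
```

The first sum is the appearance model, the second the spatial (spring) model, and $S(t)$ the co-occurrence model between part mixtures; all weights are the parameters learned by the structured SVM.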
References
1. R. Lienhart and J. Maydt. An extended set of Haar-like features for rapid object detection. IEEE ICIP, 2002.
2. N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. IEEE CVPR, 2005.
3. P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part based models. IEEE T-PAMI, Sep. 2010.
4. P. Dollár, Z. Tu, P. Perona, and S. Belongie. Integral channel features. BMVC, 2009.
5. P. Dollár, R. Appel, S. Belongie, and P. Perona. Fast feature pyramids for object detection. IEEE T-PAMI, July 2014.
6. R. Benenson, M. Mathias, R. Timofte, and L. Van Gool. Pedestrian detection at 100 frames per second. IEEE CVPR, 2012.
7. P. Dollár, C. Wojek, B. Schiele, and P. Perona. Pedestrian detection: an evaluation of the state of the art. IEEE T-PAMI, Feb. 2012.
8. Y. Yang and D. Ramanan. Articulated pose estimation using flexible mixtures of parts. IEEE CVPR, 2011.
9. M. Andriluka, L. Pishchulin, P. Gehler, and B. Schiele. 2D human pose estimation: new benchmark and state of the art analysis. IEEE CVPR, 2014.
Appendix: DPM for Generic Object Detection
Cats, dogs, horses, cars, motorbikes, bicycles.