Human Action, Pose, and Gait

1. A Belief-Theoretical Approach to Example-Based Pose Estimation

  • Description:

In example-based human pose estimation, the configuration of an evolving object is sought given visual evidence, relying solely on a set of sample images. We assume here that, at each time instant of a training session, a number of feature measurements are extracted from the available images, while ground truth is provided in the form of the true object pose. In this scenario, a sensible approach consists in learning maps from features to poses, using the information provided by the training set. In particular, multivalued mappings linking feature values to sets of training poses can be constructed. To this end we propose a belief modeling regression (BMR) approach in which a probability measure on any individual feature space maps to a convex set of probabilities on the set of training poses, in the form of a belief function. Given a test image, its feature measurements translate into a collection of belief functions on the set of training poses which, when combined, yield an entire family of probability distributions. From the latter either a single central pose estimate or a set of extremal ones can be computed, together with a measure of how reliable the estimate is. Unlike competing models, BMR can take the sparsity of the training samples into account to model the level of uncertainty associated with these estimates. We illustrate BMR's performance in an application to human pose recovery, showing how it outperforms our implementations of both relevance vector machine and Gaussian process regression. Finally, we discuss the motivation for and advantages of the proposed approach with respect to its most direct competitors.
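The core mechanism, combining per-feature belief functions over the training poses and extracting a central estimate, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the mass assignments, the use of Dempster's rule, and the pignistic (expected-pose) read-out are generic belief-function tools, not the paper's exact BMR construction.

```python
# Minimal sketch of combining per-feature belief functions over training poses.
# Dempster's rule and the pignistic estimate are illustrative choices, not
# necessarily the combination/estimation rule used by BMR itself.
from itertools import product
import numpy as np

def dempster_combine(m1, m2):
    """Combine two mass assignments (dict: frozenset of pose indices -> mass)."""
    combined, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb
    norm = 1.0 - conflict
    return {s: m / norm for s, m in combined.items()}

def pignistic_estimate(mass, training_poses):
    """Central pose estimate: spread each mass uniformly over its focal set."""
    probs = np.zeros(len(training_poses))
    for focal, m in mass.items():
        for i in focal:
            probs[i] += m / len(focal)
    return probs @ training_poses  # expected pose under the pignistic distribution

# Toy usage: three training poses (e.g. joint-coordinate vectors), with two
# features each inducing a belief function over subsets of the training poses.
training_poses = np.array([[0.0, 0.1], [0.5, 0.4], [1.0, 0.9]])
bf_feature_1 = {frozenset({0, 1}): 0.7, frozenset({0, 1, 2}): 0.3}
bf_feature_2 = {frozenset({1, 2}): 0.6, frozenset({0, 1, 2}): 0.4}
combined = dempster_combine(bf_feature_1, bf_feature_2)
print(pignistic_estimate(combined, training_poses))
```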

2. Human Pose Estimation from Monocular Images: A Comprehensive Survey

Figure 2. The composition of the review. The survey considers three processing units and dedicates one section to each; once these units are applied, human poses can be estimated from images. Each directed flow chart denotes the composition of a specific type of method. Rectangular units are motion-based components.

  • Description:

Human pose estimation refers to the estimation of the location of body parts and how they are connected in an image. Human pose estimation from monocular images has wide applications (e.g., image indexing). Several surveys on human pose estimation can be found in the literature, but each focuses on a particular category, for example, model-based approaches or human motion analysis. As far as we know, an overall review of this problem domain has yet to be provided. Furthermore, recent advancements based on deep learning have brought novel algorithms for this problem. In this paper, a comprehensive survey of human pose estimation from monocular images is carried out, including milestone works and recent advancements. Following a standard pipeline for solving computer vision problems, this survey splits the problem into three modules: feature extraction and description, human body models, and modeling methods. Modeling methods are categorized in two ways: as top-down versus bottom-up, and as generative versus discriminative. Since one direct application of human pose estimation is to provide initialization for automatic video surveillance, additional sections cover motion-related methods in every module: motion features, motion models, and motion-based methods. Finally, the survey collects 26 publicly available data sets for validation and reviews frequently used error measurement methods.
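As a rough illustration of the three-module pipeline the survey uses to organize the literature, the sketch below wires placeholder stages together. The stage names follow the survey's decomposition; all function bodies, data types, and numbers (e.g., 14 joints) are illustrative assumptions rather than any specific method.

```python
# Schematic sketch of the survey's three-module pipeline; the bodies are
# placeholders, not real feature extractors, body models, or inference methods.
import numpy as np

def extract_features(image: np.ndarray) -> np.ndarray:
    """Feature extraction and description (e.g. silhouettes, HOG, CNN features)."""
    return image.reshape(-1).astype(float)  # placeholder descriptor

def fit_body_model(features: np.ndarray) -> dict:
    """Human body model (e.g. a kinematic tree or part-based model)."""
    return {"n_joints": 14, "features": features}  # placeholder model state

def infer_pose(body_model: dict) -> np.ndarray:
    """Modeling method (generative or discriminative) producing joint locations."""
    return np.zeros((body_model["n_joints"], 2))  # placeholder 2-D joint estimates

def estimate_pose(image: np.ndarray) -> np.ndarray:
    return infer_pose(fit_body_model(extract_features(image)))

print(estimate_pose(np.zeros((64, 64))).shape)  # (14, 2)
```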

3. Fisher Tensor Decomposition for Unconstrained Gait Recognition

Fig. 3. A bird's-eye view of the proposed tensor classification framework.

  • Description:

This paper proposes a simplified Tucker decomposition of a tensor model for gait recognition from dense local spatio-temporal (S/T) features extracted from gait video sequences. Unlike silhouettes, local S/T features have displayed state-of-the-art performance on challenging action recognition test beds, and have the potential to push gait ID towards real-world deployment. We adopt a Fisher representation of S/T features, rearranged as tensors. These tensors still contain redundant information, and are projected onto a lower-dimensional space with tensor decomposition. The dimensions of the reduced tensor space can be automatically selected by keeping a proportion of the energy of the original tensor. Gait features can then be extracted from the reduced "core" tensor, and ranked according to how relevant each feature is for classification. We validate our method on the benchmark USF/NIST gait data set, showing performances in line with the best reported results.
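The dimensionality-reduction step described above, keeping along each mode enough components to retain a chosen fraction of the tensor's energy, can be sketched with a plain truncated HOSVD. This is an assumption-laden stand-in: the paper uses a simplified Tucker decomposition of Fisher-representation tensors, while the code below applies a generic per-mode SVD truncation to a random tensor.

```python
# Truncated-HOSVD sketch: per-mode SVD, keep singular vectors retaining a
# chosen energy fraction, then project the tensor onto the reduced "core".
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding of a tensor into a matrix."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def truncated_factors(tensor, energy=0.95):
    """For each mode, keep the leading singular vectors retaining `energy`."""
    factors = []
    for mode in range(tensor.ndim):
        u, s, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        cum = np.cumsum(s**2) / np.sum(s**2)
        rank = int(np.searchsorted(cum, energy)) + 1
        factors.append(u[:, :rank])
    return factors

def core_tensor(tensor, factors):
    """Project the tensor onto the reduced space: G = T x_1 U1^T x_2 U2^T ..."""
    core = tensor
    for mode, u in enumerate(factors):
        core = np.moveaxis(np.tensordot(u.T, core, axes=(1, mode)), 0, mode)
    return core

# Toy usage: a random 3-way tensor standing in for a Fisher-vector tensor.
rng = np.random.default_rng(0)
T = rng.standard_normal((20, 15, 10))
U = truncated_factors(T, energy=0.9)
G = core_tensor(T, U)  # reduced "core" tensor: gait features would be read here
print(T.shape, "->", G.shape)
```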