Action Recognition

Action Recognition Using Hierarchical Hidden Markov Models

Although there are many classification approaches for IMU-based human action recognition, they are generally not explicitly designed to account for the temporal nature of human actions. Techniques such as Hidden Markov Models (HMMs) have shown promising performance on this task due to their ability to model the dynamics of such activities. In this work, we propose a novel classification technique for human activity recognition. Our technique uses HMMs to model the sequential nature of the timeseries samples and then classifies an unseen sample based on a dissimilarity between the HMMs generated from it and the HMMs previously generated from training/template samples; it can therefore be viewed as a variant of the nearest-neighbor method. We apply our method to two publicly available action recognition datasets and compare it against an existing approach based on feature extraction and another based on a deep Long Short-Term Memory (LSTM) classifier. Our experimental results indicate that the proposed method outperforms both baselines in terms of several standard metrics.

Framework of the proposed model. AM represents the training signals of activity M, and TSn represents the testing signals of activity n. N is the number of hidden Markov models in the training stage, and T is the number of hidden Markov models in the testing phase corresponding to TSn.

This research therefore focuses on the design of a new method for the recognition and monitoring of detailed human activities, using a combination of multiple HMMs and dissimilarity measures. Such a method requires the classification of fine-grained behaviors. We believe that more accurate and precise human activity monitoring can provide a clearer view of a subject’s health and lifestyle. The contributions of this work are as follows:

  1. Proposing a composite recognition model, called multiple HAR-HMM, comprising individual HMM models per sensor axis per activity (see the training sketch after this list). In contrast, previous works build a single HMM per activity to be recognized; for a given sample, they compute the probability of the sample originating from each activity model and choose the activity with the largest probability as the recognition result.

  2. Proposing a novel dissimilarity measure for testing a new timeseries inertial sample against the prototypical HMMs obtained during the training process to represent given activities.

  3. Providing detailed experimental tests and analyses of the performance of the proposed multiple HAR-HMM model. The results suggest that the proposed technique is more reliable in terms of precision, recall, and F-measure compared with recent studies on the same datasets.
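To make the structure of contribution 1 concrete, the following is a minimal training sketch assuming the hmmlearn library and a Gaussian-emission HMM per sensor axis; the function name, data layout, and hyper-parameters (e.g., two hidden states) are illustrative and not taken verbatim from the paper.

```python
# Sketch of the training stage: one univariate HMM per sensor axis per activity.
# Assumes hmmlearn; data layout and hyper-parameters are illustrative.
from hmmlearn.hmm import GaussianHMM

def train_axis_hmms(samples_per_activity, n_states=2):
    """samples_per_activity: {activity: list of (n_timesteps, n_axes) arrays}.
    Returns {activity: list of per-sample templates, each a list of per-axis HMMs}."""
    models = {}
    for activity, samples in samples_per_activity.items():
        models[activity] = []
        for sample in samples:                   # each training sample becomes a template
            axis_models = []
            for axis in range(sample.shape[1]):  # one HMM per sensor axis
                hmm = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
                hmm.fit(sample[:, axis].reshape(-1, 1))
                axis_models.append(hmm)
            models[activity].append(axis_models)
    return models
```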

The developed dissimilarity measure combines the likelihood function, which captures the generative ability of the HMMs, with the KL-divergence.
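As an illustration of this idea, the sketch below computes a likelihood-based, KL-style dissimilarity between two trained HMMs using the standard Monte-Carlo approximation (sequences sampled from one model are scored under both). It assumes hmmlearn GaussianHMM models such as those produced by the earlier training sketch; the exact measure used in the paper may differ in its details.

```python
import numpy as np

# KL-style dissimilarity between two trained hmmlearn HMMs, approximated by
# sampling sequences from one model and comparing per-step log-likelihoods.
def hmm_dissimilarity(hmm_a, hmm_b, seq_len=200, n_seqs=10, seed=0):
    rng = np.random.RandomState(seed)
    gap = 0.0
    for _ in range(n_seqs):
        obs, _ = hmm_a.sample(seq_len, random_state=rng)  # sequences generated by model A
        # large when model B explains A-generated data much worse than A itself
        gap += (hmm_a.score(obs) - hmm_b.score(obs)) / seq_len
    return gap / n_seqs

def symmetric_dissimilarity(hmm_a, hmm_b):
    # symmetrize, since the KL-style gap above is not symmetric in its arguments
    return 0.5 * (hmm_dissimilarity(hmm_a, hmm_b) + hmm_dissimilarity(hmm_b, hmm_a))
```

In the nearest-neighbor spirit described above, a test sample would be converted into its own per-axis HMMs and assigned the activity of the template with the smallest aggregated dissimilarity across axes.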

One advantage of this approach is that new actions/activities can be added in a straightforward, incremental way: the inertial samples of the new action are modeled with HMMs, producing templates for that activity to use at test time. The training process is thus disentangled across the different actions.
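For example, under the same assumptions as the earlier sketches, registering a new activity only involves training its own templates; none of the existing activity models are retrained.

```python
# Adding a new activity trains only its own templates; existing models are untouched.
# Reuses the hypothetical train_axis_hmms helper from the earlier sketch.
def add_activity(models, activity_name, new_samples, n_states=2):
    models[activity_name] = train_axis_hmms({activity_name: new_samples},
                                            n_states=n_states)[activity_name]
    return models
```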

We adopt the following configuration for all tests performed:

  • In each test we performed a total of 5 experiments. Results are averaged over those 5 experiments.

  • For each activity, the number of samples taken as the prototypes (templates) is 66% of the total number of samples. The remaining 34% are used for testing.

  • For each experiment we produced the following performance metrics (a sketch of how they can be derived from the confusion matrix follows this list):

    • The confusion matrix for all 14 activities.

    • The overall accuracy and its 95% confidence interval.

    • The sensitivity and specificity for each activity.

    • The average sensitivity, average specificity, average precision, and average F-score.
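The following sketch shows one way these metrics can be computed from a multi-class confusion matrix, using a one-vs-rest decomposition and a normal-approximation confidence interval; it is illustrative and not the paper's evaluation code.

```python
import numpy as np

def summarize_confusion(conf):
    """conf[i, j] = number of samples of true activity i predicted as activity j."""
    conf = np.asarray(conf, dtype=float)
    total = conf.sum()
    tp = np.diag(conf)
    fn = conf.sum(axis=1) - tp          # missed samples of each activity
    fp = conf.sum(axis=0) - tp          # samples wrongly assigned to each activity
    tn = total - tp - fn - fp
    sensitivity = tp / (tp + fn)        # per-activity recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = tp.sum() / total
    ci95 = 1.96 * np.sqrt(accuracy * (1.0 - accuracy) / total)  # 95% normal-approx. CI
    return {
        "accuracy": accuracy, "accuracy_ci95": ci95,
        "sensitivity": sensitivity, "specificity": specificity,
        "precision": precision, "f_score": f_score,
        "macro_sensitivity": sensitivity.mean(),
        "macro_specificity": specificity.mean(),
        "macro_precision": precision.mean(),
        "macro_f_score": f_score.mean(),
    }
```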

Sample results are as follows.

The table on the right shows the accuracy of the method with different numbers of HMM hidden states on the EJUST-ADL-1 dataset. Binary encoding (just two hidden states) appears to be sufficient for differentiating among the actions in this dataset (EJUST-ADL-1 has 14 actions performed by only 3 subjects).

The table on the right shows the effect of different combinations of inertial modalities on the accuracy of action classification (tested on the EJUST-ADL-1 dataset). A single motion modality is evidently sufficient to achieve the best accuracy (with respect to the actions in the target dataset).

The table on the right gives a comparison between the proposed HMM-based method, an LSTM using raw data, and a random forest (RF), on two different public datasets.

In this work, we presented an improved HMM-based technique for human activity recognition based on IMU-streamed data. We evaluated our technique on two publicly available activity recognition datasets and compared it against two baseline methods: one based on traditional feature extraction and the other based on a deep LSTM applied to raw data. The experimental results indicate that the proposed method is effective for the stated task, as it outperforms both baseline methods in terms of several metrics, e.g., accuracy, sensitivity, specificity, precision, and F-measure. A potential drawback of the proposed method is its computational complexity, as it requires training and retaining a large number of HMM models. This weakness can be mitigated using parallelization methods such as GPU-based acceleration. In future work, we intend to investigate the use of multivariate HMMs, both per modality (rather than individual HMMs per axis per modality) and over all axes simultaneously. Furthermore, we intend to investigate the effect of different HMM (dis)similarity measures on the performance of the proposed method.

References

Sara Ashry, Walid Gomaa, Mubarak G. Abdu-Aguye, and Nahla El-borae. Improved IMU-based human activity recognition using hierarchical HMM dissimilarity. In Proceedings of the 17th International Conference on Informatics in Control, Automation and Robotics (ICINCO), Volume 1, pages 702–709. INSTICC, SciTePress, 2020.