Human Activity Recognition

Activity Recognition on Border Secured Data

Multi-label, Multi-class Human Activity Recognition on Full-Frame Video

Introduction

The project focuses on predicting human activity from untrimmed videos. A single clip can contain one or several instances of one or several types of action. The problem is approached by first predicting activity localization tubes and then classifying the features that correspond to the predicted activity regions. The imbalanced distribution of actions across clips strongly affects performance in this setting. To handle the severe class imbalance in the dataset, a weighted binary cross-entropy loss that weights positive and negative samples separately has been implemented.

The project was carried out during my work as a Research Assistant at the Center for Research in Computer Vision (CRCV) at the University of Central Florida, and was funded by Elbit Systems of America. It closely follows the approach of Rizve et al. [1] for real-time activity detection in untrimmed videos from the MEVA dataset. As the dataset from Elbit is not public, the project details are outlined here using the public MEVA dataset.

System Flow

Feature Pooling Using Predicted Localization Tubes

Localization tubes are predicted by passing a video clip through an encoder-decoder network. A 16-frame clip is used as input. Using the predicted tubes, features are extracted from the encoder. These extracted features are then passed to a classifier, whose output is a probability vector over the activity classes.
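The pooling and classification steps can be illustrated with a small, dependency-free sketch. It is not the project's implementation: the function names, the flat-list layout of the feature map, the 0.5 mask threshold, the use of average pooling, and the linear classifier with per-class sigmoids are all assumptions made for illustration.

```python
import math


def pool_tube_features(features, tube_mask, threshold=0.5):
    """Average encoder features over the locations inside a predicted tube.

    features:  list of C channels, each a flat list of T*H*W values
               (hypothetical layout chosen for this sketch)
    tube_mask: flat list of T*H*W tube probabilities from the decoder
    Returns a list of C pooled values (all zeros if the tube is empty).
    """
    # Keep only the locations the decoder marks as part of the tube.
    inside = [i for i, p in enumerate(tube_mask) if p >= threshold]
    if not inside:
        return [0.0] * len(features)
    return [sum(channel[i] for i in inside) / len(inside) for channel in features]


def classify(pooled, weights, biases):
    """Linear layer followed by a per-class sigmoid.

    A sigmoid per class (rather than a softmax) is assumed here because the
    task is multi-label: several activities can be active at once.
    """
    probs = []
    for w_row, b in zip(weights, biases):
        logit = sum(w * x for w, x in zip(w_row, pooled)) + b
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    return probs


# Toy example: 2 channels, 4 spatio-temporal locations, 2 activity classes.
features = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 2.0, 2.0]]
tube_mask = [0.9, 0.1, 0.8, 0.2]
pooled = pool_tube_features(features, tube_mask)
probs = classify(pooled, weights=[[0.0, 0.0], [1.0, -2.0]], biases=[0.0, 0.0])
```

In this toy example only the first and third locations fall inside the tube, so each channel is averaged over those two positions before classification.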

Loss Modification: Using Weighted BCE with Different Weights for Positive and Negative Samples

The weighted binary cross-entropy loss has been implemented to focus more on classes with a small number of samples. The weights for each class are chosen based on the data distribution of that class.

Here N_T is the total number of samples, and N_P and N_N are the numbers of positive and negative samples for the respective class. The more samples there are, the smaller the weight applied to the corresponding loss term for that class.
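A minimal sketch of how such weights could enter the loss is given below. It assumes simple inverse-frequency weights, w_P = N_T / N_P and w_N = N_T / N_N, which match the behaviour described above but may differ from the exact formula used in the project. The function names and the `eps` clamp are likewise illustrative.

```python
import math


def class_weights(labels):
    """Compute per-class positive and negative weights.

    labels: list of N_T samples, each a list of 0/1 targets per class.
    Assumes inverse-frequency weighting: w_P = N_T / N_P, w_N = N_T / N_N.
    """
    n_total = len(labels)
    n_classes = len(labels[0])
    w_pos, w_neg = [], []
    for c in range(n_classes):
        n_pos = sum(sample[c] for sample in labels)
        n_neg = n_total - n_pos
        # max(..., 1) guards against division by zero for absent classes.
        w_pos.append(n_total / max(n_pos, 1))
        w_neg.append(n_total / max(n_neg, 1))
    return w_pos, w_neg


def weighted_bce(probs, targets, w_pos, w_neg, eps=1e-7):
    """Weighted binary cross-entropy for one sample, averaged over classes."""
    total = 0.0
    for p, y, wp, wn in zip(probs, targets, w_pos, w_neg):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total += -(wp * y * math.log(p) + wn * (1 - y) * math.log(1.0 - p))
    return total / len(probs)


# Toy dataset: 4 samples, 2 classes; class 0 is rarer than class 1.
labels = [[1, 0], [0, 0], [0, 1], [0, 1]]
w_pos, w_neg = class_weights(labels)
loss = weighted_bce([0.5, 0.5], [1, 0], w_pos, w_neg)
```

In the toy dataset, class 0 has one positive out of four samples and class 1 has two, so the positive term of class 0 receives twice the weight of class 1.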

With this weighted BCE, we achieved further performance improvement.

References

[1] M. N. Rizve et al., "Gabriella: An Online System for Real-Time Activity Detection in Untrimmed Security Videos," 2020 25th International Conference on Pattern Recognition (ICPR), 2021, pp. 4237-4244, doi: 10.1109/ICPR48806.2021.9412791.

Questions?

Email mdmahfuzalhasan@ufl.edu with any questions about the project.
