Dynamic Mode Decomposition of Blood Volume Pulse Signals for Heart Rate Estimation
K. Kurihara, Y. Maeda, D. Sugimura and T. Hamamoto, IEEE Access, IEEE, 2023. [paper]
K. Kurihara, Y. Maeda, D. Sugimura and T. Hamamoto, IEEE VCIP, IEEE, 2022. [paper]
Abstract
This article proposes a novel blood volume pulse (BVP) signal extraction method for heart rate (HR) estimation that incorporates medical knowledge of the spatio-temporal BVP dynamics. Previous methods merely exploited the spatial similarity of BVPs observed from multiple facial patches and performed low-rank approximation to extract BVP signals. If noise components are superimposed over the entire face, such methods have difficulty distinguishing between the BVP component and noise, even in the low-rank subspace. The main novelty of the proposed method is the exploitation of the BVP characteristics in the spatial and temporal domains in a unified manner based on a dynamic mode decomposition (DMD) framework, which extracts spatio-temporal structures from multidimensional time-series signals. To analyze the BVP dynamics, which exhibit nonlinearity and quasi-periodicity, physics-informed DMD is performed on the time-series signals extracted from facial patches in a time-delay coordinate system. This approach enables the estimation of DMD modes that effectively represent the spatio-temporal structures of the BVP dynamics. The other novelty of the proposed method is the incorporation of medical knowledge of the HR frequency band to select the optimal DMD mode, which allows the proposed method to estimate the BVP signal and HR accurately. Experimental results on three publicly available datasets yielded a root-mean-square error of 6.37 bpm for HR estimation, a 36.5% improvement over state-of-the-art methods.
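As a rough illustration of the pipeline, the following sketch applies exact DMD to time-delay-embedded patch signals and selects the mode whose frequency falls inside a plausible HR band. Everything here is an assumption for illustration: the function name, the 0.7–4 Hz band, the delay and rank counts, and the amplitude-based mode selection are not the paper's actual design, and the physics-informed constraints are omitted.

```python
import numpy as np

def dmd_hr_estimate(patch_signals, fs, delays=10, rank=8,
                    hr_band=(0.7, 4.0)):
    """Sketch: exact DMD on time-delay coordinates of facial patch
    signals, followed by HR-band mode selection. Hypothetical; the
    paper's physics-informed DMD constraints are not reproduced here.

    patch_signals: (n_patches, n_frames) array of raw pulse traces
    fs:            sampling rate in Hz
    """
    n, T = patch_signals.shape
    # Time-delay (Hankel) embedding: stack `delays` shifted copies so
    # the quasi-periodic dynamics admit a better linear approximation.
    H = np.vstack([patch_signals[:, d:T - delays + d]
                   for d in range(delays)])
    X, Y = H[:, :-1], H[:, 1:]

    # Exact DMD: low-rank approximation of the linear operator Y ~ A X.
    U, S, Vh = np.linalg.svd(X, full_matrices=False)
    U, S, Vh = U[:, :rank], S[:rank], Vh[:rank]
    A_tilde = U.conj().T @ Y @ Vh.conj().T @ np.diag(1.0 / S)
    eigvals, W = np.linalg.eig(A_tilde)
    modes = Y @ Vh.conj().T @ np.diag(1.0 / S) @ W

    # Continuous-time frequency (Hz) of each DMD mode.
    freqs = np.angle(eigvals) * fs / (2.0 * np.pi)

    # Medical prior: keep only modes inside the plausible HR band and
    # pick the one with the largest amplitude (a simple stand-in for
    # the paper's mode-selection criterion).
    amps = np.linalg.norm(modes, axis=0)
    in_band = (freqs >= hr_band[0]) & (freqs <= hr_band[1])
    if not np.any(in_band):
        raise ValueError("no DMD mode inside the HR band")
    best = np.argmax(np.where(in_band, amps, -np.inf))
    return 60.0 * freqs[best]  # beats per minute
```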
Group-Level Emotion Recognition
K. Fujii, D. Sugimura and T. Hamamoto, IEEE TMM, IEEE, 2020. [paper]
K. Fujii, D. Sugimura and T. Hamamoto, IEEE FG, IEEE, 2019. [paper]
Abstract
Group-level emotion recognition is a technique for estimating the emotion of a group of people. In this paper, we propose a novel method for group-level emotion recognition with two main contributions: (1) recognition of group-level emotion using a hierarchical classification approach, and (2) incorporation of novel features that contribute to the description of group-level emotion. We consider that the facial expressions of people are effective only in identifying images labeled as “Positive”, because those labeled as “Neutral” or “Negative” are likely to contain similar facial expressions. Therefore, we first perform binary classification based on facial expression recognition to distinguish “Positive” labels, which include discriminative facial expressions (e.g., smiles), from the others. We then evaluate outcomes that are not classified as “Positive” in the first stage by exploiting scene features that describe what type of event (e.g., a demonstration or a funeral) is shown in the image. The novel features we introduce are two-fold. The first is the exploitation of visual attention in the first classification, which allows us to estimate which faces are the main subjects in the target image, thereby suppressing the influence of background faces that contribute less to group-level emotion. The second is the exploitation of object-wise semantic information (labels) in the second classification, which provides a more detailed description of the scene context in the image and enhances performance in the second stage. We demonstrate the effectiveness of our method through experiments using public datasets.
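The hierarchical decision logic could be sketched as follows. This is a hypothetical illustration, not the authors' classifier: the names, the attention-weighted averaging, and the threshold are all assumptions, and the actual method learns both stages from data.

```python
from typing import Callable, Sequence

def group_emotion(face_scores: Sequence[float],
                  attention: Sequence[float],
                  scene_classifier: Callable[[object], str],
                  scene_features: object,
                  positive_threshold: float = 0.5) -> str:
    """Sketch of the two-stage cascade (illustrative assumptions only).

    face_scores: per-face probability of a positive expression
    attention:   per-face visual-attention weight (main subjects high,
                 background faces low)
    """
    # Stage 1: attention-weighted facial-expression evidence decides
    # "Positive" vs. everything else.
    total = sum(attention)
    if total > 0:
        weighted = sum(s * a for s, a in zip(face_scores, attention)) / total
        if weighted >= positive_threshold:
            return "Positive"

    # Stage 2: "Neutral" vs. "Negative" is decided from scene context
    # (event type, object-wise semantic labels), since the two classes
    # share similar facial expressions.
    return scene_classifier(scene_features)  # "Neutral" or "Negative"
```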
Point-Cloud 3D Object Detection
T. Yamazaki, D. Sugimura and T. Hamamoto, IEEE TCSVT, IEEE, 2019. [paper]
T. Yamazaki, D. Sugimura and T. Hamamoto, IEEE ICASSP, IEEE, 2018. [paper]
Abstract
Three-dimensional (3D) object detection in point clouds is an important technique for various high-level computer vision tasks. In this study, we propose a method for point-wise detection of regions of objects in a scene. We regard the 3D object detection problem as a series of optimal matching problems between object and scene images, which are obtained by projecting point clouds into multiple viewpoints. The main novelty of this study is treating the 3D object detection problem as the determination of optimal correspondence among image sets. Unlike existing methods that directly employ individual correspondences between projected image pairs, the simultaneous matching of projected image sets allows the evaluation of the appearance consistency of the target object in multi-viewpoint scene images. The other novelty of the proposed method is the use of principal component analysis to estimate effective image-projection directions for object point clouds. By projecting object point clouds in directions orthogonal to the first principal component basis, the projected images retain rich information about the point cloud, thus providing highly discriminative features for image matching. We back-project reliable matching results retrieved from the image-set correspondence into 3D space to achieve point-wise object detection. Experiments using public datasets demonstrate the effectiveness and performance of the proposed method.
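A minimal sketch of the PCA-based projection step, under our own reading of the abstract: viewing directions are sampled orthogonal to the first principal component, so each projected image preserves the object's dominant extent. The function name, view count, and the omitted rasterization and matching are illustrative assumptions.

```python
import numpy as np

def pca_projection_images(points, n_views=4):
    """Sketch: choose viewing directions orthogonal to the first
    principal component of an object point cloud and project the
    points onto the corresponding image planes.

    points: (N, 3) array of object points
    """
    centered = points - points.mean(axis=0)
    # Principal axes of the point cloud (right singular vectors,
    # sorted by decreasing variance).
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    p1, p2, p3 = Vt  # p1 = first principal component

    projections = []
    for k in range(n_views):
        theta = np.pi * k / n_views
        # Viewing direction lies in the plane spanned by p2 and p3,
        # hence orthogonal to p1: each image keeps the dominant extent.
        view = np.cos(theta) * p2 + np.sin(theta) * p3
        # Image-plane axes: p1 and the direction completing the frame.
        u = p1
        v = np.cross(view, u)
        projections.append(centered @ np.stack([u, v], axis=1))  # (N, 2)
    return projections
```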
Online Background Subtraction with Freely Moving Cameras
D. Sugimura, F. Teshima and T. Hamamoto, IMAVIS, Elsevier, 2018. [paper]
Abstract
We propose a method for online background subtraction from video captured using a freely moving camera. Our method exploits a technique for interactive image segmentation with seeds (subsets of pixels marked as “foreground” or “background”). The key novelty of our method is to estimate the seeds automatically by exploiting two different motion boundaries, computed respectively from the magnitude and the direction of the flow field. The magnitude of the flow field is useful for differentiating foreground and background motions when the moving objects and the camera move in the same direction. In contrast, the direction of the flow field helps discriminate the observed motions when the amounts of displacement of the moving objects and the camera are the same. By adaptively exploiting the advantages of these different motion boundaries, our method estimates reliable foreground/background seeds. With the estimated seeds, our method performs accurate background subtraction even under complex camera movements (e.g., large pan-tilt-zoom motions or rotations). Our experiments demonstrate the effectiveness of our method on a public dataset and other real image sequences.
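A minimal sketch of the two motion-boundary cues, assuming Farneback dense flow from OpenCV (the paper does not commit to a specific flow algorithm here) and illustrative thresholds:

```python
import cv2
import numpy as np

def motion_boundaries(prev_gray, curr_gray, mag_thresh=1.0,
                      dir_thresh=0.5):
    """Sketch: motion boundaries from the magnitude and the direction
    of a dense flow field. Farneback flow and the thresholds are
    illustrative choices, not the paper's exact configuration."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])

    # Magnitude cue: discontinuities in flow speed separate an object
    # moving in the same direction as the camera but at a different rate.
    gm_x = cv2.Sobel(mag, cv2.CV_32F, 1, 0)
    gm_y = cv2.Sobel(mag, cv2.CV_32F, 0, 1)
    mag_boundary = np.hypot(gm_x, gm_y) > mag_thresh

    # Direction cue: gradients of sin/cos of the flow angle (robust to
    # the 0 / 2*pi wrap-around) separate motions of equal displacement
    # but different headings.
    gs_x = cv2.Sobel(np.sin(ang), cv2.CV_32F, 1, 0)
    gs_y = cv2.Sobel(np.sin(ang), cv2.CV_32F, 0, 1)
    gc_x = cv2.Sobel(np.cos(ang), cv2.CV_32F, 1, 0)
    gc_y = cv2.Sobel(np.cos(ang), cv2.CV_32F, 0, 1)
    dir_boundary = np.sqrt(gs_x**2 + gs_y**2 + gc_x**2 + gc_y**2) > dir_thresh

    # Either cue can dominate depending on how object and camera move;
    # their union yields candidate seeds for the segmentation.
    return mag_boundary | dir_boundary
```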
Pedestrian Detection from Aerial Images
D. Sugimura, T. Fujimura and T. Hamamoto, IJPRAI, World Scientific, 2016. [paper]
Abstract
We propose a method for pedestrian detection from aerial images captured by unmanned aerial vehicles (UAVs). Aerial images are captured at considerably low resolution, and they are often subject to heavy noise and blur as a result of atmospheric influences. Furthermore, significant changes in the appearance of pedestrians frequently occur because of UAV motion. To address these problems, we propose a cascade of classifiers that concatenates a pre-trained classifier and an online learning-based classifier. We construct the first classifier using a deep belief network (DBN) with an extended input layer. Unlike previous approaches that use raw images as the input layer of the DBN, we exploit multi-scale histogram of oriented gradients (MS-HOG) features. MS-HOG supplies better and richer information than low-resolution aerial images for constructing a reliable deep DBN structure, because the dimensionality of the input features can be expanded. Furthermore, MS-HOG effectively extracts the necessary edge information while suppressing trivial gradients and noise. The second classifier is based on online learning and exploits predictions of the target's appearance derived from UAV motion. Predicting the target's appearance enables us to collect reliable training samples for the classifier's online learning process. Experiments using aerial videos demonstrate the effectiveness of the proposed method.
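The MS-HOG idea could be sketched as follows, assuming scikit-image's hog and hypothetical cell sizes and patch geometry; the paper may define the scales differently:

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import resize

def ms_hog(patch, scales=((4, 4), (8, 8), (16, 16)), size=(64, 32)):
    """Sketch of a multi-scale HOG (MS-HOG) descriptor: HOG features
    computed with several cell sizes and concatenated into one vector.
    Cell sizes and patch size are illustrative assumptions.

    patch: 2-D grayscale pedestrian candidate region
    """
    patch = resize(patch, size, anti_aliasing=True)
    features = []
    for cell in scales:
        # Finer cells keep detailed edge structure; coarser cells
        # suppress noise and trivial gradients in low-resolution
        # aerial imagery.
        features.append(hog(patch, orientations=9,
                            pixels_per_cell=cell,
                            cells_per_block=(2, 2)))
    # The expanded feature vector feeds the DBN's input layer in
    # place of raw pixels.
    return np.concatenate(features)
```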
Gaussian Process Regression for Detection of Flaws in Golf Swing
D. Sugimura, H. Tsutsui and T. Hamamoto, MVAP, Springer, 2016. [paper]
Abstract
We propose a golf swing instruction system that detects important flaws to facilitate the improvement of a user's golf swing. Golf players generally differ greatly in body size and flexibility; these individual differences make it difficult to identify the underlying characteristics of a good swing. In this study, we exploit the common movements made by professional players to provide golf swing instruction to diverse users. Because they are essential to performing a professional-level swing, the common movements of professionals are likely to be similar regardless of individual differences. This suggests that these common movements are helpful components for providing appropriate golf swing instruction to diverse users. We construct an ideal posture estimator by aggregating the movements of professionals, using Gaussian process regression to infer which parts of the golf swing characterize common or individual movements. Using the results inferred by the ideal posture estimator, we identify the joints that are most important for improving each user's golf swing. Our experiments demonstrate that the use of the common movements made by professionals significantly improves the detection of flaws in individual users' swings.
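A minimal sketch of the Gaussian-process step, assuming scikit-learn and a simplified one-joint data layout (both assumptions for illustration): the predictive mean stands in for the ideal posture, and low predictive variance marks the common movements shared across professionals.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def ideal_posture_model(pro_swings, var_thresh=None):
    """Sketch: aggregate professionals' swings with GP regression.
    The data layout and threshold are illustrative assumptions.

    pro_swings: (n_pros, n_phases) array holding one joint-angle
                trajectory per professional, sampled at common
                swing phases normalized to [0, 1]
    """
    n_pros, n_phases = pro_swings.shape
    phases = np.linspace(0.0, 1.0, n_phases)

    # Pool all professionals' samples; the WhiteKernel absorbs their
    # individual deviations from the shared "ideal" trajectory.
    X = np.tile(phases, n_pros)[:, None]
    y = pro_swings.ravel()
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel())
    gp.fit(X, y)

    # Predictive mean = ideal posture; low predictive variance marks
    # phases where professionals move alike (common movements), high
    # variance marks individual style.
    mean, std = gp.predict(phases[:, None], return_std=True)
    common = std < (var_thresh if var_thresh is not None
                    else np.median(std))
    return mean, std, common
```

A user's flaw could then be flagged wherever their trajectory departs from the predictive mean at the low-variance (common) phases.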
Tracking People in Crowds using Gait Features
D. Sugimura, K. Kitani, T. Okabe, Y. Sato and A. Sugimoto, IEEE ICCV, IEEE, 2009. [paper]
Abstract
In this work, we propose a method for tracking individuals in crowds. Our method is based on a trajectory-based clustering approach that groups the trajectories of image features belonging to the same person. The key novelty of our method is its use of a person's individuality, namely gait features and the temporal consistency of local appearance, to track each individual in a crowd. Gait features in the frequency domain have been shown to be an effective biometric cue for discriminating between individuals, and our method uses such features for tracking people in crowds for the first time. Unlike existing trajectory-based tracking methods, our method evaluates the dissimilarity of trajectories with respect to a group of three adjacent trajectories. In this way, we incorporate the temporal consistency of local patch appearance to differentiate trajectories of multiple people moving in close proximity. Our experiments show that the use of gait features and the temporal consistency of local appearance contributes to significant performance improvements in tracking people in crowded scenes.
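A minimal sketch of the frequency-domain gait feature, under illustrative assumptions (the 0.5–3 Hz band, the linear detrending, and the plain spectral distance are ours, not the paper's):

```python
import numpy as np

def gait_spectrum(traj_y, fps, band=(0.5, 3.0)):
    """Sketch: frequency-domain gait feature of one image-feature
    trajectory. Band and normalization are illustrative assumptions.

    traj_y: vertical image coordinates of a tracked feature over time
    fps:    video frame rate
    """
    # Remove the walker's gross translation, keeping the periodic
    # oscillation that characterizes gait.
    t = np.arange(len(traj_y))
    y = traj_y - np.poly1d(np.polyfit(t, traj_y, 1))(t)
    spec = np.abs(np.fft.rfft(y))
    freqs = np.fft.rfftfreq(len(y), d=1.0 / fps)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    feat = spec[mask]
    return feat / (np.linalg.norm(feat) + 1e-8)

def gait_dissimilarity(traj_a, traj_b, fps):
    """Trajectories on the same person share a gait spectrum, so a
    small distance here supports grouping them into one cluster.

    traj_a, traj_b: (T, 2) arrays of (x, y) feature positions
    """
    fa = gait_spectrum(traj_a[:, 1], fps)
    fb = gait_spectrum(traj_b[:, 1], fps)
    return np.linalg.norm(fa - fb)
```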