Freehand interactions with augmented and virtual reality are growing in popularity, but they lack reliability and robustness. Implicit user behavior, such as hand or gaze movements, might provide additional signals to improve the reliability of input. In this paper, our primary goal is to improve the detection of a selection gesture in VR during point-and-click interaction. We therefore propose and investigate the use of information contained in the hand motion dynamics that precede a selection gesture. We built two models that classify whether a user is likely to perform a selection gesture at the current moment in time. We collected data from 15 participants during a point-and-select task and trained two models with different architectures: a logistic regression classifier trained on predefined hand motion features, and a temporal convolutional network (TCN) classifier trained on raw hand motion data. Leave-one-subject-out cross-validation yielded PR-AUCs of 0.36 and 0.90, respectively, demonstrating that both models performed well above chance (PR-AUC = 0.13). The TCN model improved the precision of a noisy selection gesture by 11.2% without sacrificing recall. An initial analysis of the models' generalizability demonstrated above-chance performance, suggesting that this approach could scale to other interaction tasks in the future.
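The chance level quoted above follows from a standard property of precision-recall analysis: a classifier that scores samples at random has expected precision equal to the positive-class prevalence at every recall level, so its PR-AUC equals that prevalence. A minimal sketch (the function name is our own, not from the paper):

```python
def chance_pr_auc(labels):
    """Chance-level PR-AUC for a binary problem.

    For a random classifier, expected precision at any recall equals the
    positive-class prevalence, so the chance PR-AUC is simply the fraction
    of positive labels. A prevalence of 0.13 (13% of windows contain an
    imminent selection gesture) yields the chance PR-AUC of 0.13.
    """
    return sum(labels) / len(labels)
```

With 13 positive windows out of 100, `chance_pr_auc` returns 0.13, matching the baseline the PR-AUCs of 0.36 and 0.90 are compared against.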
The task was a VR version of Yahtzee. The participant's goal was to compete against the computer to collect as many points as possible within a specific time period (i.e., a three-minute block). Each turn started by rolling five dice, and the number of turns depended on how quickly the participant played the game.
Feature-based logistic regression model development. (a) The feature exploration pipeline. Hand features were first extracted from the raw hand motion time series data. To identify a set of predictive features, training samples were generated using a sliding-window approach, followed by a recursive feature addition (RFA) method. A logistic regression model leveraged this set of selected features. (b) A flowchart of the RFA method. (c) The feature correlation matrix.
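Recursive feature addition is a greedy forward-selection procedure: starting from an empty set, the candidate feature that most improves a validation score is added until no remaining feature helps. The following sketch illustrates the idea; the function and feature names, the `score_fn` interface, and the `tol` stopping threshold are our own assumptions, not details from the paper, and in practice `score_fn` would wrap a cross-validated logistic regression returning PR-AUC:

```python
def recursive_feature_addition(features, score_fn, tol=0.0):
    """Greedy forward selection over candidate feature names.

    features: list of candidate feature names.
    score_fn: callable mapping a tuple of feature names to a validation
              score (e.g. cross-validated PR-AUC of a logistic regression
              fit on those features). Higher is better.
    tol:      minimum improvement required to keep adding features.
    """
    selected = []
    best_score = float("-inf")
    remaining = list(features)
    while remaining:
        # Score the current set extended by each remaining feature.
        trials = [(score_fn(tuple(selected + [f])), f) for f in remaining]
        score, feat = max(trials)
        if score <= best_score + tol:
            break  # no candidate improves the score; stop
        selected.append(feat)
        remaining.remove(feat)
        best_score = score
    return selected, best_score
```

For example, with a toy `score_fn` that rewards the features `"speed"` and `"accel"` and penalizes set size, the loop selects exactly those two features and then stops.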
The TCN model's leave-one-subject-out cross-validation (LOSOCV) performance. The ROC and PR curves are depicted in the left and right panels, respectively. Each colored curve represents one participant's performance; the thick red curve is the average across participants.
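In LOSOCV, the model is trained on all but one participant and evaluated on the held-out participant, once per participant, which estimates how well the model transfers to unseen users. A minimal sketch of the split logic (the function name and interface are our own, not from the paper; libraries such as scikit-learn provide an equivalent `LeaveOneGroupOut`):

```python
def loso_splits(subject_ids):
    """Yield (train_indices, test_indices) pairs, one fold per subject.

    subject_ids: per-sample subject labels, e.g. ["p1", "p1", "p2", ...].
    Each fold holds out every sample belonging to one subject.
    """
    subjects = sorted(set(subject_ids))
    for held_out in subjects:
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield train, test
```

With 15 participants this produces 15 folds; the per-fold PR curves correspond to the colored curves in the figure.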
To explore the model's generalizability to other point-and-select task scenarios, the trained model was applied to an existing dataset from a reciprocal pointing task in VR, and vice versa.