A key aspect of situated social intelligence is the ability to model and predict human behaviors such as head and body poses, gaze and attention, and conversation groups. This requires computationally integrating diverse data streams after sensing to form a holistic understanding of the environment and the social context. Context spans multiple dimensions, including spatial, temporal, conversational, social role, and prior relationships, all of which can matter for social signal estimation and understanding. Consequently, any modeling approach should account for the context of interactions, which can be captured through the appropriate choice and design of features, modalities, and algorithms.
A model that accounts for interaction partners' behaviors
Head orientation estimation is important because it indicates social attention. Most current methods rely on visual data and deep learning but overlook social context in crowded, unstructured environments. We show that alternative inputs, such as speaking status and body location, orientation, and movement, improve head orientation estimation, particularly under visual occlusion or spatial constraints. We propose a method that accounts for group dynamics and jointly predicts head orientations for all group members. Our model outperforms baselines that ignore group context and generalizes to data from unseen social events.
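The intuition that group context constrains head orientation can be illustrated with a minimal heuristic sketch (not the proposed model, whose features and learning procedure are described in the cited papers): given only member positions and speaking status, each member is predicted to orient toward the current speaker, or toward the centroid of the other members when no one else speaks. All names here are hypothetical.

```python
import numpy as np

def predict_head_orientations(positions, speaking):
    """Group-context heuristic for head orientation (radians).

    positions: (N, 2) array of body locations in the plane.
    speaking:  (N,) boolean array, True for current speakers.
    Each member is assumed to look at the speaker(s), falling back to
    the centroid of the other group members when no one else speaks.
    """
    n = len(positions)
    preds = np.empty(n)
    speakers = np.flatnonzero(speaking)
    for i in range(n):
        other_speakers = [s for s in speakers if s != i]
        if other_speakers:
            target = positions[other_speakers].mean(axis=0)
        else:
            others = np.delete(np.arange(n), i)
            target = positions[others].mean(axis=0)
        dx, dy = target - positions[i]
        preds[i] = np.arctan2(dy, dx)
    return preds
```

Even this crude use of proxemics and speaking status already couples each member's prediction to the rest of the group, which is the property the learned model exploits with richer features.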
Matrix-completion model for joint head and body orientation estimation
We aim to estimate human head and body orientations, key social cues in free-standing conversations. Automatically estimating these orientations supports research on conversation dynamics such as involvement and influence. However, collecting and annotating large-scale interaction data is challenging and costly. Our approach reduces the need for extensive training labels by framing the task as a transductive low-rank matrix-completion problem with sparse labels. Unlike conventional supervised deep learning, our method performs well in low-data regimes by leveraging social context, diverse information sources, and physical priors.
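The transductive low-rank completion idea can be sketched with a generic hard-impute procedure (a simplification, not the paper's exact formulation, which additionally incorporates multimodal cues and physical priors): arrange labels in a matrix with missing entries, then alternate between a rank-constrained SVD projection and restoring the sparse observed labels.

```python
import numpy as np

def complete_low_rank(M, observed, rank=2, n_iters=100):
    """Fill missing entries of M via iterative hard-thresholded SVD.

    M:        matrix with arbitrary values (e.g. NaN) at unobserved entries.
    observed: boolean mask, True where a label is available.
    rank:     assumed rank of the underlying label matrix.
    Alternates between projecting onto rank-`rank` matrices and
    re-imposing the observed entries (a transductive scheme: unlabeled
    entries are filled jointly with the labeled ones).
    """
    X = np.where(observed, M, 0.0)  # initialize missing entries at 0
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        s[rank:] = 0.0              # keep only the top `rank` components
        low_rank = (U * s) @ Vt
        X = np.where(observed, M, low_rank)  # restore observed labels
    return np.where(observed, M, low_rank)
```

Because the low-rank structure ties every entry to every other, a handful of observed labels can propagate to the unlabeled entries, which is why the approach needs far fewer annotations than fully supervised training.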
S. Tan, D.M.J. Tax, H. Hung, Multimodal joint head orientation estimation in interacting groups via proxemics and interaction dynamics, Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), 2021, Vol.5, No.1, 1-22.
S. Tan, D.M.J. Tax, H. Hung, Head and body orientation estimation with sparse weak labels in free standing conversational settings, Proceedings of Machine Learning Research, 2022, 179-203.