Social scene understanding

Scene understanding of human behaviors involves interpreting and analyzing human actions, gestures, and interactions within a given environment. It integrates social signals with visual and contextual cues to make sense of what people are doing, what they intend, and how they interact with one another and with the surrounding space. This understanding can include recognizing activities, social dynamics, body language, and spatial relationships.

Conversation group detection with spatio-temporal context

The automatic detection of conversation groups is a compelling challenge with applications in areas such as social surveillance and social robotics. In social settings such as cocktail parties or professional networking events, interactions take place across multiple conversation groups that shift continuously as the behaviors shaping these exchanges unfold. Understanding the interpersonal relationships that lead people to gather and form focused interactions, such as the spontaneous affinity between individuals, could provide valuable insights into the quality and experience of these exchanges. However, because human social dynamics are complex and nuanced, and vary with context and environment, automatically detecting conversation groups remains a challenging and open area of research.
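One common way to frame the task is in two stages: score how strongly each pair of people appears to belong together, then turn those pairwise scores into groups. The sketch below illustrates only the second stage, under loudly stated assumptions: the pairwise affinity scores are taken as given, the function name `groups_from_affinities` and the `threshold` parameter are hypothetical, and the grouping rule (connected components over pairs whose affinity meets the threshold) is a simple illustrative baseline, not the spatio-temporal method of the publication listed below.

```python
from typing import Dict, List, Tuple


def groups_from_affinities(
    people: List[str],
    affinity: Dict[Tuple[str, str], float],
    threshold: float = 0.5,
) -> List[List[str]]:
    """Hypothetical baseline: link each pair whose affinity >= threshold
    and report every connected component as one conversation group."""
    # Union-find: each person starts as the root of their own group.
    parent = {p: p for p in people}

    def find(p: str) -> str:
        # Walk up to the root, halving the path along the way.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    # Merge the groups of every sufficiently affine pair.
    for (a, b), score in affinity.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    # Collect members by root; people with no strong link stay alone.
    groups: Dict[str, List[str]] = {}
    for p in people:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(g) for g in groups.values())


people = ["A", "B", "C", "D", "E"]
scores = {("A", "B"): 0.9, ("B", "C"): 0.7, ("D", "E"): 0.8, ("A", "D"): 0.2}
print(groups_from_affinities(people, scores))  # [['A', 'B', 'C'], ['D', 'E']]
```

Connected components chain membership transitively, so a single spurious high-affinity pair can merge two distinct groups; a dedicated clustering step over the affinity matrix is a natural refinement of this baseline.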

Shape Language as an alternative and complementary modality

While audiovisual and wearable sensors provide valuable insights into human behaviors, they fall short in identifying complex and abstract human values such as integrity, respect, honesty, and fairness. Shape Language is a novel communication modality that complements traditional methods in identifying these implicit, abstract concepts. It involves 2D and 3D geometric shapes (e.g., pyramids, spheres, and cubes) that individuals interact with directly during face-to-face encounters. By analyzing people's choices among shapes, colors, and sizes and their patterns of interaction with them, Shape Language yields identifiable representations of abstract concepts such as human norms and values, which passive audiovisual and movement signals cannot capture. Understanding scenes constructed with shapes, combined with classical audiovisual and movement signals, enables a complementary and holistic multimodal framework for perceiving complex, value-laden behaviors.

Publications

S. Tan, D.M.J. Tax, H. Hung, Conversation group detection with spatio-temporal context, Proceedings of the 2022 International Conference on Multimodal Interaction (ICMI), 2022, pp. 170–180. Oral presentation.

TU Delft

Building 28, Van Mourik Broekmanweg 6, 2628 XE Delft
