1st Workshop on
Crossmodal Social Animation

The XS-Anim workshop is organized in conjunction with

ICCV 2021 - International Conference on Computer Vision, Montreal, Canada, October 11th - October 17th, 2021

The workshop will be held virtually


Find the video recording of the workshop here: video


Computer vision has seen an extraordinary level of innovation in the past decade with the advent of deep neural architectures. This progress has been particularly impressive in generative modeling, such as the automatic generation of images. The next frontier for generative computer vision models is to model human motion and its close relationship with language and speech. Generating videos is already a major technical challenge, but animating human motion is harder still: years of experience interacting with other people make viewers sensitive to even minute details of motion. The challenge becomes even more interesting given its multimodal nature, since human motion is a communicative channel linked to other modalities such as language and speech. This makes it a timely opportunity to study crossmodal social animation. Such study will require both data-driven empirical modeling and the integration of social and communication theories. It also requires a multidisciplinary approach, inviting researchers from computer vision, computer graphics, social robotics, virtual reality and human social communication. This workshop is a unique occasion to study the crossmodal factors involved in naturalistic and engaging body motion generation.

We are in the middle of a revolution in artificial intelligence, where new technologies are becoming more interactive (e.g., virtual assistants such as Alexa, Google Assistant, Siri, Cortana). The next generation of these interactive technologies is likely to present some embodiment, such as a virtual character or social robot. It is essential to better understand the link between human body motion and other communicative channels such as language and speech. This research has the potential to enable more realistic and engaging social interactions, which are central to sharing knowledge and ideas and are an important part of successful collaboration and teamwork. Furthermore, it encourages the study of effective communication by intelligent tutoring systems in classroom settings, empathy in clinical psychology, and tools to aid animation generation. It is also a key building block for forging new relationships through self-expression as well as understanding others' emotions and thoughts.

Invited Speakers

Yaser Sheikh

Facebook Reality Labs; Carnegie Mellon University, USA

Maja Matarić

University of Southern California, USA

Stacy Marsella

Northeastern University, USA

Hae Won Park

MIT Media Lab, MIT, USA

Richard Bowden

University of Surrey, UK

Workshop Schedule

Date: October 16, 2021

12:00 pm - 12:15 pm Introduction and Opening Remarks (Chaitanya Ahuja)

Invited Speakers - Session 1

12:15 pm - 01:05 pm Yaser Sheikh (video)

Telepresence with codec avatars

01:05 pm - 01:55 pm Richard Bowden (video)

Towards Computational Sign Language Translation

Spotlight Talks (video)

02:00 pm - 02:15 pm Shyam Krishna, Vijay Vignesh P, Dinesh Babu J

SignPose: Sign Language Animation Through 3D Pose Lifting

02:15 pm - 02:30 pm Xiaopeng Lu, Zhen Fan, Yansen Wang, Jean Oh, Carolyn Rosé

Localize, Group, and Select: Boosting Text-VQA by Scene Text Modeling

02:30 pm - 02:45 pm Jonathan Windle, Sarah Taylor, David Greenwood, Iain Matthews

Motion Symmetry in Conversation

02:45 pm - 03:00 pm Jón Helgason, Johann Skulason, Anna Islind, Steinunn Sigurðardóttir, Hannes Vilhjálmsson

Integrating Video with Artificial Gesture

Invited Speakers - Session 2

03:00 pm - 03:50 pm Maja Matarić (video)

Multimodal Human-Robot Interaction: Understanding, Engaging, and Supporting Each User

03:50 pm - 04:40 pm Hae Won Park (video)

Long-term Relational Agents - Design and Impact

04:40 pm - 05:30 pm Stacy Marsella (video)

Gestures: Some thoughts on form, function and aesthetics

*All times in EST

Call for Papers

Topics for submission include (but are not limited to):

(1) Generative animation models

  • Generative Modeling of human body motion

  • Facial motion and facial expression generation and representation

  • Body (including hands, arms, head, eye-gaze) shape and motion representation

  • Multi-party social interactions and generative models

(2) Vision-Language-Speech Grounding

  • Natural language grounding with human body motion

  • Grounding of speech and acoustics signals

  • Co-speech gesture grounding modeling

  • Multimodal and multi-party grounding

(3) Gesture and Animation Styles

  • Style content disentanglement of grounded body motion

  • Style transfer for grounded body motion

(4) Privacy and ethical issues

  • Detecting biases in generated animations

  • Detecting Fake Animations

  • Ramifications of socially adept virtual agents/robots in societies

(5) Data Efficiency and Resources

  • Few-shot generative modeling and domain transfer of animation models

  • Semi-supervised or self-supervised generative animation modeling

  • Body motion corpora, including diverse speakers, styles and topics

  • Semi-automatic corpora annotation tools

(6) Application domains

  • Embodied agents, including robot and virtual humans

  • Social Interaction in Virtual and Augmented Reality

  • Sign-Language generation

  • Locomotion modeling and animation

  • Rhythmic body motion animation (e.g., dance)

Submission Guidelines

The format for paper submission is the same as the ICCV 2021 submission format. Papers that violate anonymity or do not use the ICCV submission template will be rejected without review. Papers will be selected based on relevance, significance and novelty of results, technical merit, and clarity of presentation. In submitting a manuscript to this workshop, the authors acknowledge that no paper substantially similar in content has been submitted to another workshop or conference during the review period.

Main Track*

8 pages (excluding references)

*Accepted papers will appear in the proceedings of ICCV 2021 workshops

Late Breaking Results

4 pages (excluding references)

Important Dates

Main Track

Abstract Submission: July 27, 2021 (extended from July 20, 2021)

Paper submission: July 27, 2021 (extended from July 24, 2021)

Notification of acceptance: August 10, 2021

Camera-ready submission: August 16, 2021

Presentation Materials Submission: TBA

*All deadlines are at 23:59 PST

Late Breaking Results

Abstract Submission: September 3, 2021

Paper submission: September 7, 2021

Notification of acceptance: September 27, 2021

Camera-ready submission: October 7, 2021

Presentation Materials Submission: TBA


Organizers

Chaitanya Ahuja

Carnegie Mellon University, USA

Louis-Philippe Morency

Carnegie Mellon University, USA

Yukiko I. Nakano

Seikei University, Japan

Ryo Ishii

NTT, Japan

Publication Chair

Dong Won Lee

Carnegie Mellon University, USA