Program


The HANDS workshop will be held on Monday afternoon (13:30-17:30, Paris time), October 2, 2023, in W5, Paris Convention Center, France.

The program outline and the recordings are shown below:

13:30 - 13:40

Angela Yao

Opening Remarks

13:40 - 14:10

Invited Talk: He Wang

Title: Learning universal dexterous grasping policy from 3D point cloud observations

Abstract: Dexterous hand grasping is an essential research problem for the vision, graphics, and robotics communities. In this talk, I will first cover our recent work, DexGraspNet, on synthesizing million-scale diverse dexterous hand grasping data, which was a finalist for the ICRA 2023 Outstanding Manipulation Paper Award. Building on this data, our CVPR 2023 work, UniDexGrasp, learns a generalizable point-cloud-based dexterous grasping policy that generalizes across thousands of objects. We further extend this work to UniDexGrasp++, accepted as an ICCV 2023 oral, which proposes a general framework that raises the success rate to more than 80%.


14:10 - 14:40

Invited Talk: Gül Varol

Title: Automatic annotation of open-vocabulary sign language videos

Abstract: Research on sign language technologies has suffered from a lack of data to train machine learning models. This talk will describe our recent efforts on scalable approaches to automatically annotate continuous sign language videos with the goal of building a large-scale dataset. In particular, we leverage weakly-aligned subtitles from sign-interpreted broadcast footage. These subtitles provide us with candidate keywords with which to search for and localise individual signs. To this end, we develop several sign spotting techniques: (i) using mouthing cues at the lip region, (ii) looking up videos from sign language dictionaries, and (iii) exploiting the sign localisation that emerges from the attention mechanism of a sequence prediction model. We further tackle the subtitle alignment problem to improve their synchronization with the signing. With these methods, we build the BBC-Oxford British Sign Language Dataset (BOBSL), comprising more than a thousand hours of continuous signing videos and containing millions of sign instance annotations from a large vocabulary. These annotations allow us to train models for large-vocabulary continuous sign language recognition (transcription of each sign), as well as for subtitle-video retrieval, which we hope will open up new possibilities towards addressing the currently unsolved problem of sign language translation in the wild.

14:40 - 15:10

Invited Talk: Gyeongsik Moon

Title: Towards 3D Interacting Hands Recovery in the Wild

Abstract: Understanding interactions between two hands is critical for analyzing various hand-driven social signals and the manipulation of objects using both hands. The recently introduced large-scale InterHand2.6M dataset has enabled learning-based approaches to recover 3D interacting hands from a single image. Despite significant improvements, most methods have focused on recovering 3D interacting hands mainly from images of InterHand2.6M, which have very different appearances from in-the-wild images because the dataset was captured in a constrained studio. Towards 3D interacting hands recovery in the wild, this talk will introduce two recent works: an algorithmic approach, accepted at CVPR 2023, and a dataset approach, accepted at NeurIPS 2023. For the algorithmic approach, we introduce InterWild, a 3D interacting hands recovery system that brings inputs from in-the-lab and in-the-wild datasets to a shared domain to reduce the domain gap between them. For the dataset approach, we introduce our new dataset, Re:InterHand, which consists of accurately tracked 3D geometry of interacting hands and images rendered with a pre-trained state-of-the-art relighting network. As the images are rendered with lighting from high-resolution environment maps, our Re:InterHand dataset provides images with highly diverse and realistic appearances. As a result, 3D interacting hands recovery systems trained on Re:InterHand generalize better to in-the-wild images than systems trained only on in-the-lab datasets.

15:10 - 16:10

Coffee Break & Poster Session

16:10 - 16:40

Invited Talk: David Fouhey

Title: From Hands In Action to Possibilities of Interaction

Abstract: In this talk, I'll show some recent work from our research group spanning the gamut from understanding hands in action to imagining possibilities for interaction. In the first part, I'll focus on a new system and dataset for obtaining a deeper basic understanding of hands and in-contact objects, including tool use. The second part looks towards the future and will show a new system that aims to provide information at potential interaction sites.

16:40 - 17:10

Invited Talk: Lixin Yang

Title: Paving the way for further understanding of human interactions with objects in task completion: the OakInk and OakInk2 datasets

Abstract: Researching how humans accomplish daily tasks through object manipulation presents a long-standing challenge. Recognizing object affordances and learning human interactions with these affordances offers a potential solution. In 2022, to facilitate data-driven learning methodologies, we proposed OakInk, a substantial knowledge repository consisting of two wings: 'Oak' for object affordances and 'Ink' for intention-oriented, affordance-aware interactions. This talk will introduce our work in 2023: we expanded the OakInk methodology, giving rise to OakInk2, a comprehensive dataset encompassing embodied hand-object interactions during complex, long-horizon task completion. OakInk2 incorporates demonstrations of 'Primitive Tasks', defined as the minimal interactions necessary for fulfilling object affordance attributes, and 'Combined Tasks', which merge Primitive Tasks with specific dependencies. Both OakInk and OakInk2 capture multi-view image streams, provide detailed pose annotations for embodied hands and diverse interacting objects, and scrutinize dependencies between Primitive Task completion and underlying object affordance fulfillment. With all this knowledge incorporated, we show that OakInk and OakInk2 provide strong support for a variety of tasks, including hand-object reconstruction, motion synthesis, and planning, imitation, and manipulation within the scope of embodied AI.

17:10 - 17:17

Report: Aditya Prakash

Title: Reducing Scale Ambiguity due to Data Augmentation

17:17 - 17:24

Report: Karim Abou Zeid

Title: Joint Transformer

17:24 - 17:31

Report: Zhishan Zhou

Title: A Concise Pipeline for Egocentric Hand Pose Reconstruction

17:31 - 17:31

Closing Remarks