Under Review
Anonymous authors
Garments are common in daily life and are important to the embodied intelligence community. Current category-level garment pose tracking works focus on predicting point-wise canonical correspondence and learning shape deformation in point cloud sequences. In this paper, motivated by the 2D warping space and shape priors, we propose GaPT-DAR, a novel category-level Garment Pose Tracking framework with integrated 2D Deformation And 3D Reconstruction, which fully utilizes 3D-2D projection and 2D-3D reconstruction to transform 3D point-wise learning into 2D warping deformation learning. Specifically, GaPT-DAR first builds a Voting-based Projection module that learns the optimal 3D-2D projection plane, maintaining the maximum orthogonal entropy during point projection. Next, a Garment Deformation module is designed in 2D space to explicitly model the garment warping procedure with deformation parameters. Finally, we build a Depth Reconstruction module to recover the 2D images into a 3D warp field. We provide extensive experiments on the VR-Folding dataset to evaluate GaPT-DAR, and the results show clear improvements on most metrics over the state-of-the-art methods (i.e., GarmentNets and GarmentTracking).
Illustration of Our Method. Category-level Garment Pose Tracking framework with integrated 2D Deformation And 3D Reconstruction (GaPT-DAR). Given the partial point clouds from adjacent frames and the pose prediction from the previous frame, our garment pose tracking pipeline consists of 3D-2D projection, 2D deformation learning, and 2D-3D reconstruction steps. The output is the tracked pose (a complete point cloud) of the garment in task space.
The Pipeline of Our GaPT-DAR Framework. Formally, taking the partial observations P_{k} of the k-th frame and P_{k+1} of the (k+1)-th frame as input, and using the PC NOCS P^{*}_{k} of the k-th frame and the mesh NOCS P'_{k+1} of the (k+1)-th frame as auxiliary information, we output the complete (k+1)-th frame observation (mesh Task) in task space. GaPT-DAR consists of the following components: (a) Inter-frame Feature Fusion. Features extracted from the two input frames are fused for the downstream network. (b) Voting-based Projection. A voting-based mechanism is proposed to conduct 3D-2D projection via the optimal projection plane. (c) Garment Deformation. We perform garment deformation guided by the TPS transformation parameters θ. (d) Depth Reconstruction. We recover the depth for the point set G_{W} and output the complete point cloud (mesh Task) of the (k+1)-th frame.
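To make steps (c) and (d) concrete, below is a minimal NumPy sketch of a thin plate spline (TPS) warp driven by parameters θ, followed by a depth-based lifting of the warped 2D points back to 3D. The function names (`tps_fit`, `tps_warp`, `backproject`) and the plane-basis parameterization are illustrative assumptions; in GaPT-DAR, θ is predicted by the Garment Deformation module and the depth by the Depth Reconstruction module, rather than obtained in closed form.

```python
import numpy as np

def _tps_kernel(r2):
    # TPS radial basis U(r) = r^2 * log(r^2), with U(0) defined as 0.
    out = np.zeros_like(r2)
    mask = r2 > 0
    out[mask] = r2[mask] * np.log(r2[mask])
    return out

def tps_fit(src, dst):
    """Closed-form TPS parameters theta mapping control points src (N,2) to dst (N,2)."""
    n = src.shape[0]
    K = _tps_kernel(((src[:, None] - src[None, :]) ** 2).sum(-1))
    P = np.hstack([np.ones((n, 1)), src])            # affine part
    A = np.zeros((n + 3, n + 3))
    A[:n, :n], A[:n, n:], A[n:, :n] = K, P, P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst
    return np.linalg.solve(A, b)                     # theta: (N+3, 2)

def tps_warp(theta, src, pts):
    """Apply the TPS parameterized by theta to arbitrary 2D points pts (M,2)."""
    U = _tps_kernel(((pts[:, None] - src[None, :]) ** 2).sum(-1))
    P = np.hstack([np.ones((pts.shape[0], 1)), pts])
    return U @ theta[: src.shape[0]] + P @ theta[src.shape[0]:]

def backproject(uv, depth, basis, normal):
    """Lift the warped 2D point set G_W (M,2) back to 3D: in-plane coordinates
    via the projection basis (2,3) plus predicted depth along the plane normal (3,)."""
    return uv @ basis + depth[:, None] * normal[None, :]
```

In the actual pipeline, a network would regress θ (and the per-point depth) from the fused inter-frame features, with the TPS warp acting as a differentiable layer.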
Download the VR-Folding dataset from Hugging Face and save it in /data.
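A minimal download sketch using `huggingface_hub`; the repo id below is a placeholder, since the exact Hugging Face path is not given here.

```python
from huggingface_hub import snapshot_download

# Placeholder repo id: substitute the actual VR-Folding dataset path on Hugging Face.
snapshot_download(
    repo_id="<org>/VR-Folding",
    repo_type="dataset",
    local_dir="data",  # save under ./data (or /data, per the note above)
)
```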
Visualization of the 3D-2D Projection used in GaPT-DAR.
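For intuition, here is a minimal sketch of how a projection plane could be scored by orthogonal entropy: each candidate plane basis projects the points to 2D, and the plane whose occupancy histogram has the highest Shannon entropy (i.e., spreads the points most evenly, losing the least structure) wins the vote. The helper names and the histogram-entropy proxy are assumptions for illustration; in GaPT-DAR the Voting-based Projection module learns this choice.

```python
import numpy as np

def occupancy_entropy(uv, bins=32):
    """Shannon entropy of the 2D occupancy histogram of projected points uv (M,2)."""
    hist, _, _ = np.histogram2d(uv[:, 0], uv[:, 1], bins=bins)
    p = hist.ravel() / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def select_projection_plane(points, candidate_bases):
    """Vote for the plane basis (2,3) that maximizes the occupancy entropy
    of the orthographic 3D-2D projection of points (M,3)."""
    scores = [occupancy_entropy(points @ basis.T) for basis in candidate_bases]
    return candidate_bases[int(np.argmax(scores))]

# Example: vote among the three axis-aligned planes.
axes = np.eye(3)
candidates = [axes[[1, 2]], axes[[0, 2]], axes[[0, 1]]]  # yz, xz, xy planes
```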