Under Review
Anonymous authors
Garments are common in daily life and are important to the embodied intelligence community. Current category-level garment pose tracking works focus on predicting point-wise canonical correspondence and learning shape deformation in point cloud sequences. In this paper, motivated by the 2D warping space and shape priors, we propose GaPT-DAR, a novel category-level Garment Pose Tracking framework with integrated 2D Deformation And 3D Reconstruction, which fully utilizes 3D-2D projection and 2D-3D reconstruction to transform 3D point-wise learning into 2D warping deformation learning. Specifically, GaPT-DAR first builds a Voting-based Projection module that learns the optimal 3D-2D projection plane, maintaining the maximum orthogonal entropy during point projection. Next, a Garment Deformation module is designed in 2D space to explicitly model the garment warping procedure with deformation parameters. Finally, we build a Depth Reconstruction module to recover the 2D images into a 3D warp field. We provide extensive experiments on the VR-Folding dataset to evaluate GaPT-DAR, and the results show clear improvements on most metrics over the state-of-the-art methods (i.e., GarmentNets and GarmentTracking).
Illustration of Our Method. Category-level Garment Pose Tracking framework with integrated 2D Deformation And 3D Reconstruction (GaPT-DAR). Given the partial point clouds from adjacent frames and the pose prediction from the previous frame, our garment pose tracking pipeline consists of 3D-2D projection, 2D deformation learning, and 2D-3D reconstruction steps. The output is the tracked pose (a complete point cloud) of the garment in task space.
The Pipeline of Our GaPT-DAR Framework. Formally, taking the partial observations P_{k} of the k-th frame and P_{k+1} of the (k+1)-th frame as input, and using the PC NOCS P^{*}_{k} of the k-th frame and the mesh NOCS P'_{k+1} of the (k+1)-th frame as auxiliary information, we output the complete (k+1)-th frame observation (mesh Task) in task space. GaPT-DAR consists of the following components: (a) Inter-frame Feature Fusion. Features extracted from the two input frames are fused for the downstream network. (b) Voting-based Projection. A voting-based mechanism is proposed to conduct 3D-2D projection via the optimal projection plane. (c) Garment Deformation. We perform garment deformation guided by the TPS transformation parameters θ. (d) Depth Reconstruction. We recover the depth for the point set G_{W} and output the complete point cloud (mesh Task) of the (k+1)-th frame.
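To make steps (c) and (d) concrete, below is a minimal NumPy sketch of a thin plate spline (TPS) warp driven by parameters θ, followed by a depth-based lifting of the warped 2D points back to 3D. The function names (`tps_fit`, `tps_warp`, `backproject`) and the plane-basis parameterization are illustrative assumptions; in GaPT-DAR, θ is predicted by the Garment Deformation module and the depth by the Depth Reconstruction module, rather than obtained in closed form.

```python
import numpy as np

def _tps_kernel(r2):
    # TPS radial basis U(r) = r^2 * log(r^2), with U(0) defined as 0.
    out = np.zeros_like(r2)
    mask = r2 > 0
    out[mask] = r2[mask] * np.log(r2[mask])
    return out

def tps_fit(src, dst):
    """Closed-form TPS parameters theta mapping control points src (N,2) to dst (N,2)."""
    n = src.shape[0]
    K = _tps_kernel(((src[:, None] - src[None, :]) ** 2).sum(-1))
    P = np.hstack([np.ones((n, 1)), src])            # affine part
    A = np.zeros((n + 3, n + 3))
    A[:n, :n], A[:n, n:], A[n:, :n] = K, P, P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst
    return np.linalg.solve(A, b)                     # theta: (N+3, 2)

def tps_warp(theta, src, pts):
    """Apply the TPS parameterized by theta to arbitrary 2D points pts (M,2)."""
    U = _tps_kernel(((pts[:, None] - src[None, :]) ** 2).sum(-1))
    P = np.hstack([np.ones((pts.shape[0], 1)), pts])
    return U @ theta[: src.shape[0]] + P @ theta[src.shape[0]:]

def backproject(uv, depth, basis, normal):
    """Lift the warped 2D point set G_W (M,2) back to 3D: in-plane coordinates
    via the projection basis (2,3) plus predicted depth along the plane normal (3,)."""
    return uv @ basis + depth[:, None] * normal[None, :]
```

In the actual pipeline, a network would regress θ (and the per-point depth) from the fused inter-frame features, with the TPS warp acting as a differentiable layer.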
Download the VR-Folding dataset from Hugging Face and save it in /data.
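A minimal download sketch using `huggingface_hub`; the repo id below is a placeholder, since the exact Hugging Face path is not given here.

```python
from huggingface_hub import snapshot_download

# Placeholder repo id: substitute the actual VR-Folding dataset path on Hugging Face.
snapshot_download(
    repo_id="<org>/VR-Folding",
    repo_type="dataset",
    local_dir="data",  # save under ./data (or /data, per the note above)
)
```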
Visualization of the 3D-2D Projection used in GaPT-DAR.
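For intuition, here is a minimal sketch of how a projection plane could be scored by orthogonal entropy: each candidate plane basis projects the points to 2D, and the plane whose occupancy histogram has the highest Shannon entropy (i.e., spreads the points most evenly, losing the least structure) wins the vote. The helper names and the histogram-entropy proxy are assumptions for illustration; in GaPT-DAR the Voting-based Projection module learns this choice.

```python
import numpy as np

def occupancy_entropy(uv, bins=32):
    """Shannon entropy of the 2D occupancy histogram of projected points uv (M,2)."""
    hist, _, _ = np.histogram2d(uv[:, 0], uv[:, 1], bins=bins)
    p = hist.ravel() / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def select_projection_plane(points, candidate_bases):
    """Vote for the plane basis (2,3) that maximizes the occupancy entropy
    of the orthographic 3D-2D projection of points (M,3)."""
    scores = [occupancy_entropy(points @ basis.T) for basis in candidate_bases]
    return candidate_bases[int(np.argmax(scores))]

# Example: vote among the three axis-aligned planes.
axes = np.eye(3)
candidates = [axes[[1, 2]], axes[[0, 2]], axes[[0, 1]]]  # yz, xz, xy planes
```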