Xiaotong Chen

Google Scholar                 CV                 cxt@umich.edu

I am a research scientist at ProtagoLabs.AI. My current research direction is combining robotics and NLP to build more intelligent general-purpose AI systems. I am happy to chat about future perspectives on AI and am open to research collaborations.

I finished my PhD in the Department of Robotics at the University of Michigan, Ann Arbor, where I worked on visual perception for robot manipulation in the Laboratory for Progress led by Prof. Chad Jenkins. My research goal is to enhance robots' ability to complete goal-directed tasks by integrating 3D vision with robot action design. My main focus is visual perception for object-centric robotic manipulation, including object pose estimation, category-level object manipulation, and dataset creation. One prospective application is a home service robot that can recognize, localize, and manipulate novel objects and interact with humans.

Before coming to Michigan, I did research in the soft robotics group led by Prof. Xiaoping Chen at the University of Science and Technology of China, focusing on the design, fabrication, and control of a soft manipulator named "Honeycomb Pneumatic Networks". Check out our recent IJRR paper here.

In addition, I worked as a research engineer on self-driving truck localization at TuSimple during summer 2020, and as an applied scientist intern at Amazon Lab126 during summer 2022.

Recent Research Projects

T-Recs: Transparency Reconstruction from Single-view Fusion (PDF)

Xiaotong Chen, Zheming Zhou, Zhuo Deng, Omid Ghasemalizadeh, Cheng-Hao Kuo. Humanoids 2023

This work proposes a two-stage reconstruction pipeline for transparent object scenes: single-view depth completion and segmentation, followed by multi-view optimization using bundle adjustment and epipolar-assisted optical flow for 2D landmark estimation.

TransNet: Transparent Object Manipulation Through Category-Level Pose Estimation (PDF)

Huijie Zhang, Anthony Opipari, Xiaotong Chen, Jiyue Zhu, Zeren Yu, Odest Chadwicke Jenkins. ECCV workshop 2022

This work proposes TransNet, a category-level object pose estimation network that works on transparent objects in highly challenging scenes. The work analyzes input modality combinations and network architecture choices, and compares against recent state-of-the-art methods.

VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation (PDF, code)

Kaizhi Zheng, Xiaotong Chen, Odest Chadwicke Jenkins, Xin Eric Wang. NeurIPS 2022

This work proposes a compositional benchmark of robotic manipulation tasks with language instructions. Unlike other benchmarks, tasks are categorized by 3D position and 3D orientation constraints and can be automatically generalized to novel object shapes.

ClearPose: Large-scale Transparent Object Dataset and Benchmark (PDF, code)

Xiaotong Chen, Huijie Zhang, Zeren Yu, Anthony Opipari, Odest Chadwicke Jenkins. ECCV 2022

This work proposes a large-scale dataset of everyday transparent objects (63 objects, 350K frames) with RGB, depth, surface normal, and mask images and object-level 6D pose annotations. We also provide benchmark results for recent depth completion and object pose estimation deep networks.

ProgressLabeller: Visual Data Stream Annotation for Training Object-Centric 3D Perception (PDF, code)

Xiaotong Chen, Huijie Zhang, Zeren Yu, Stanley Lewis, Odest Chadwicke Jenkins. IROS 2022

This work proposes a rapid, customizable labeling tool for object 6D poses. Datasets labeled with it prove more accurate than those from LabelFusion and synthetic data from BlenderProc, as measured by deep CNN pose estimation and pose-based robot grasping performance.

Manipulation-Oriented Object Perception in Clutter through Affordance Coordinate Frames (PDF, video, code)

Xiaotong Chen, Kaizhi Zheng, Zhen Zeng, Shreshtha Basu, James Cooney, Jana Pavlasek, Odest Chadwicke Jenkins. Humanoids 2022

This work proposes the Affordance Coordinate Frame, a novel representation that connects visual perception with action execution for semantic tasks beyond pick-and-place.

LIT: Light-field Inference of Transparency for Refractive Object Localization (PDF, video, site)

Zheming Zhou, Xiaotong Chen, Odest Chadwicke Jenkins. RA-L 2020 

Best Paper Award

This work combines feature extraction specific to light-field sensors with CNNs for more accurate 6D pose estimation of transparent objects. The estimated poses of glass cups enabled a robot to build a transparent champagne tower.

GRIP: Generative robust inference and perception for semantic robot manipulation in adversarial environments (PDF, video)

Xiaotong Chen, Rui Chen, Zhiqiang Sui, Zhefan Ye, Yanqi Liu, Iris R. Bahar, Odest Chadwicke Jenkins. IROS 2019

This work combines the discriminative power of CNNs with the robustness of probabilistic inference for object pose estimation in dark environments. In the proposed two-stage pipeline, a particle optimization process takes neural network output as a prior and samples in SE(3) space to finalize the object pose.
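As a rough illustration of the second stage, the sketch below runs a cross-entropy-style particle search around a pose prior. It is only a toy under loud assumptions: poses are a simplified translation-plus-Euler-angle vector (GRIP samples on SE(3) proper), and the observation likelihood is a synthetic placeholder standing in for a rendered-vs-observed comparison; all names here are hypothetical, not the GRIP implementation.

```python
import math
import random

def refine_pose(prior, score_fn, n_particles=200, n_iters=10, sigma=0.3):
    """Particle-based refinement: sample pose hypotheses around the current
    best, score them, and shrink the sampling radius each iteration."""
    best = list(prior)
    for it in range(n_iters):
        s = sigma * (0.7 ** it)  # anneal the sampling noise
        particles = [[c + random.gauss(0.0, s) for c in best]
                     for _ in range(n_particles)]
        particles.append(best)   # elitism: keep the current best hypothesis
        best = max(particles, key=score_fn)
    return best

# Placeholder observation model: in GRIP this would compare a rendered scene
# against the observed depth image; here a hidden "true" pose defines the score.
TRUE_POSE = [0.5, -0.2, 0.8, 0.1, 0.0, math.pi / 4]  # x, y, z, roll, pitch, yaw

def toy_score(pose):
    return -sum((a - b) ** 2 for a, b in zip(pose, TRUE_POSE))

random.seed(0)
cnn_prior = [0.3, 0.0, 1.0, 0.0, 0.2, 0.5]  # noisy "network output" as prior
estimate = refine_pose(cnn_prior, toy_score)
```

Because the current best is always re-injected into the particle set, the score is non-decreasing across iterations, which mirrors how the probabilistic stage can only sharpen, never degrade, the network's prior.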

Hierarchical control of soft manipulators towards unstructured interactions (PDF, video)

Hao Jiang, Zhanchi Wang, Yusong Jin, Xiaotong Chen, Peijin Li, Yinghao Gan, Sen Lin and Xiaoping Chen. IJRR 2021

Performing daily interaction tasks such as opening doors and pulling drawers in unstructured environments is a challenging problem for robots. The emergence of soft-bodied robots brings a new perspective to solving this problem. In this paper, inspired by how humans perform interaction tasks through simple behaviors, we propose a hierarchical control system for soft arms: the low-level controller achieves motion control of the arm tip, the high-level controller realizes the arm's behaviors on top of the low-level controller, and the top-level planner chooses which behaviors to take according to the task.
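The three-level hierarchy can be sketched as follows. This is a minimal toy with hypothetical class names, behaviors, and a 1-D proportional tip controller; the paper's controllers operate on a physical soft arm, not this abstraction.

```python
class LowLevelController:
    """Moves the arm tip toward a target (toy 1-D proportional step)."""
    def step(self, tip, target, gain=0.5):
        return tip + gain * (target - tip)

class HighLevelController:
    """Realizes a named behavior as a sequence of tip targets."""
    BEHAVIORS = {
        "reach": [0.5, 1.0],       # approach, then touch the handle
        "pull":  [1.0, 0.6, 0.2],  # hold, then retract
    }
    def __init__(self, low):
        self.low = low
    def run(self, behavior, tip):
        for target in self.BEHAVIORS[behavior]:
            for _ in range(20):    # iterate low-level steps until converged
                tip = self.low.step(tip, target)
        return tip

class TopLevelPlanner:
    """Chooses which behaviors to execute for a given task."""
    PLANS = {"open_drawer": ["reach", "pull"]}
    def execute(self, task, tip=0.0):
        high = HighLevelController(LowLevelController())
        for behavior in self.PLANS[task]:
            tip = high.run(behavior, tip)
        return tip

final_tip = TopLevelPlanner().execute("open_drawer")
```

The design choice the paper highlights survives even in this toy: each level only talks to the level directly below it, so behaviors can be recombined by the planner without changing the motion controller.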