Object Rearrangement with Nested Nonprehensile Manipulation Actions
IROS 2019 [PDF]
This paper considers the problem of rearrangement planning, i.e., finding a sequence of manipulation actions that displace multiple objects from an initial configuration to a given goal configuration. Rearrangement is a critical skill for robots so that they can effectively operate in confined spaces that contain clutter. Examples of tasks that require rearrangement include packing objects inside a bin, wherein objects need to lie according to a predefined pattern. In tight bins, collision-free grasps are often unavailable. Nonprehensile actions, such as pushing and sliding, are preferred because they can be performed using minimalistic end-effectors that can easily be inserted in the bin. Rearrangement with nonprehensile actions is a challenging problem as it requires reasoning about object interactions in a combinatorially large configuration space of multiple objects. This work revisits several existing rearrangement planning techniques and introduces a new one that exploits nested nonprehensile actions by pushing several similar objects simultaneously along the same path, which removes the need to rearrange each object individually. Experiments in simulation and using a real Kuka robotic arm show the ability of the proposed approach to solve difficult rearrangement tasks while reducing the length of the end-effector's trajectories.
Inferring 3D Shapes of Unknown Rigid Objects in Clutter through Inverse Physics Reasoning with Monte Carlo Tree Search
We present a probabilistic approach for building, on the fly, 3-D models of unknown objects while they are being manipulated by a robot. We specifically consider manipulation tasks in piles of clutter that contain previously unseen objects. Most manipulation algorithms for performing such tasks require known geometric models of the objects in order to grasp or rearrange them robustly. One of the novel aspects of this work is the use of a physics engine for verifying hypothesized geometries in simulation. The evidence provided by physics simulations is used in a probabilistic framework that accounts for the fact that the mechanical properties of the objects are uncertain. We present an efficient algorithm for inferring occluded parts of objects based on their observed motions and mutual interactions. Experiments using a robot show that this approach is efficient for constructing physically realistic 3-D models, which can be useful for manipulation planning. Experiments also show that the proposed approach significantly outperforms alternative approaches in terms of shape accuracy.
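The idea of using simulation results as evidence for hypothesized geometries can be illustrated with a small Bayesian reweighting step. This is only a sketch under stated assumptions, not the paper's algorithm: the function `update_shape_beliefs`, the `simulate` callback (a stand-in for a physics engine), the Gaussian error model, and the `sigma` parameter are all hypothetical names and choices introduced here for illustration.

```python
import math


def update_shape_beliefs(prior, simulate, observed, sigma=0.01):
    """Reweight shape hypotheses by how well simulation matches observation.

    prior:    dict mapping a hypothesis label to its prior probability.
    simulate: hypothetical callback returning the predicted displacement
              (a tuple of floats) for a hypothesis; stands in for a physics engine.
    observed: the displacement actually observed, same dimensionality.
    sigma:    assumed noise scale; loosely models uncertain mechanical properties.
    """
    posterior = {}
    for shape, p in prior.items():
        # Distance between simulated and observed motion for this hypothesis.
        err = math.dist(simulate(shape), observed)
        # Assumed Gaussian likelihood of the observation given the hypothesis.
        posterior[shape] = p * math.exp(-0.5 * (err / sigma) ** 2)
    total = sum(posterior.values())
    if total == 0.0:
        # Every hypothesis underflowed to zero; fall back to the prior.
        return dict(prior)
    return {shape: w / total for shape, w in posterior.items()}
```

A hypothesis whose simulated motion agrees with the observed motion gains probability mass, while one that predicts a very different motion is suppressed. A larger `sigma` makes the update more tolerant of mismatch.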
Towards Robust Product Packing with a Minimalistic End-Effector
Advances in sensor technologies, object detection algorithms, planning frameworks and hardware designs have motivated the deployment of robots in warehouse automation. A variety of such applications, like order fulfillment or packing tasks, require picking objects from unstructured piles and carefully arranging them in bins or containers. Desirable solutions need to be low-cost, easily deployable and controllable, making minimalistic hardware choices desirable. The challenge in designing an effective solution to this problem relates to appropriately integrating multiple components, so as to achieve a robust pipeline that minimizes failure conditions. The current work proposes a complete pipeline for solving such packing tasks, given access only to RGB-D data and a single robot arm with a minimalistic, vacuum-based end-effector. To achieve the desired level of robustness, three key manipulation primitives are identified, which take advantage of the environment and simple operations to successfully pack multiple cubic objects. The overall approach is demonstrated to be robust to execution and perception errors. The impact of each manipulation primitive is evaluated by considering different versions of the proposed pipeline that incrementally introduce reasoning about object poses and corrective manipulation actions.
3D Monocular Multiview Tracker with 3D Aspect Parts
(ECCV 2014 paper) [PDF]
In this work, we focus on the problem of tracking objects under significant viewpoint variations, which poses a big challenge to traditional object tracking methods. We propose a novel method to track an object and estimate its continuous pose and part locations under severe viewpoint change. In order to handle the change in topological appearance introduced by viewpoint transformations, we represent objects with 3D aspect parts (Xiang and Savarese, 2012) and model the relationship between viewpoint and 3D aspect parts in a part-based particle filtering framework. Moreover, we show that instance-level online-learned part appearance can be incorporated into our model, which makes it more robust in difficult scenarios with occlusions. Experiments are conducted on a new dataset of challenging YouTube videos and a subset of the KITTI dataset (Geiger et al., 2012) that include significant viewpoint variations, as well as a standard sequence for car tracking. We demonstrate that our method is able to track the 3D aspect parts and the viewpoint of objects accurately despite significant changes in viewpoint.
(a) An example output of our tracking framework. Our multiview tracker provides the estimates for continuous pose and 3D aspect parts of the object. (b) An example of the 3D aspect part representation of a 3D object (car) and the projections of the object from different viewpoints.
Multiview Tracking Dataset, ~650 MB (hosted at Stanford)
Y. Xiang and S. Savarese. Estimating the Aspect Layout of Object Categories. In CVPR, 2012.
A. Geiger, P. Lenz and R. Urtasun. Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. In CVPR, 2012.
We acknowledge the support of DARPA UPSIDE grant A13-0895-S002 and NSF CAREER grant N.1054127.
Human Intuitive Hierarchical Model for Fine-Grained Categorization
(Machine Learning Class Project, 2013.Jan ~ 2013.Apr) [PDF]
This project focuses on the fine-grained categorization problem, exemplified by datasets such as Caltech-UCSD Birds, on which state-of-the-art methods perform poorly. Our method categorizes images according to a hierarchical model that is built from categorization examples provided by humans. We collected a sufficient number of human examples by asking people to classify a small set of training data. We then transform each classification result into a hierarchical tree and synthesize all of the trees into a single hierarchical tree. To synthesize the trees into one, we transform each binary tree into matrix form and find a new matrix that has minimum norm error with respect to those matrices and that represents a new binary tree. In summary, we propose a categorization method based on human intuition, together with a novel method for finding a new binary tree that captures the common hierarchical characteristics of a number of binary trees.
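The tree-synthesis step can be sketched as follows. The project description does not state which matrix form or norm is used, so everything here is an assumption made for illustration: each tree is encoded by the number of leaves under the lowest common ancestor of every leaf pair, the element-wise mean is taken as the minimizer of the summed squared (Frobenius) error over unconstrained matrices, and greedy average-linkage merging is used to map the result back to a binary tree. The function names are hypothetical, and trees are written as nested 2-tuples of string labels.

```python
from itertools import combinations


def leaves(tree):
    """Return the leaf labels of a nested 2-tuple binary tree."""
    if not isinstance(tree, tuple):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def tree_to_matrix(tree, dist=None):
    """Map each leaf pair to the leaf count under its lowest common ancestor."""
    if dist is None:
        dist = {}
    if isinstance(tree, tuple):
        left, right = leaves(tree[0]), leaves(tree[1])
        for a in left:
            for b in right:
                dist[frozenset((a, b))] = len(left) + len(right)
        tree_to_matrix(tree[0], dist)
        tree_to_matrix(tree[1], dist)
    return dist


def mean_matrix(matrices):
    """Element-wise mean, which minimizes the summed squared error to the inputs."""
    return {k: sum(m[k] for m in matrices) / len(matrices) for k in matrices[0]}


def matrix_to_tree(dist, labels):
    """Recover a binary tree by greedy average-linkage merging (an approximation)."""
    clusters = [(label, [label]) for label in labels]  # (subtree, its leaves)
    while len(clusters) > 1:
        def linkage(pair):
            la, lb = clusters[pair[0]][1], clusters[pair[1]][1]
            total = sum(dist[frozenset((a, b))] for a in la for b in lb)
            return total / (len(la) * len(lb))

        # Merge the two clusters whose leaves are closest on average.
        i, j = min(combinations(range(len(clusters)), 2), key=linkage)
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return clusters[0][0]
```

For example, `matrix_to_tree(mean_matrix([tree_to_matrix(t) for t in trees]), labels)` combines several human-provided trees over the same labels into one. Because the mean matrix need not correspond exactly to any tree, the final clustering step acts as a projection back onto binary trees rather than an exact constrained minimizer.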
Dynamic Resource Allocation by Particle Filter Tracking
We propose a dynamic resource allocation algorithm based on Ranking Support Vector Machine (R-SVM) for particle filter tracking. We adjust the number of observations in each frame adaptively and automatically: the tracker performs measurements only for a subset of particles that are ranked highly in likelihood, which preserves mode locations in the posterior, and allocates the remaining particles to maintain the diversity of the posterior without actual measurements.
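The allocation idea described above can be sketched in a few lines. This is a minimal illustration under assumptions, not the published algorithm: `rank_score` is a hypothetical cheap scoring function standing in for the R-SVM ranker, `likelihood` stands in for the expensive observation model, the number of measured particles `k` is passed in rather than chosen adaptively, and giving unmeasured particles the lowest measured likelihood as a floor weight is a choice made here only to keep them alive for diversity.

```python
def allocate_measurements(particles, rank_score, likelihood, k):
    """Measure only the k highest-ranked particles; floor-weight the rest.

    Returns the normalized weights (in particle order) and the number of
    expensive likelihood evaluations actually performed.
    """
    k = max(1, min(k, len(particles)))
    # Rank particles with the cheap score (stand-in for the R-SVM ranker).
    order = sorted(range(len(particles)),
                   key=lambda i: rank_score(particles[i]), reverse=True)
    measured, skipped = order[:k], order[k:]

    weights = [0.0] * len(particles)
    for i in measured:
        # Expensive observation is evaluated for top-ranked particles only.
        weights[i] = likelihood(particles[i])

    # Assumption: unmeasured particles receive the smallest measured weight,
    # so they keep the posterior diverse without dominating the modes.
    floor = min(weights[i] for i in measured)
    for i in skipped:
        weights[i] = floor

    total = sum(weights)
    if total == 0.0:
        # Degenerate case: fall back to uniform weights.
        return [1.0 / len(particles)] * len(particles), len(measured)
    return [w / total for w in weights], len(measured)
```

In a full tracker this step would sit between prediction and resampling, with `k` adjusted per frame; here it is fixed so the example stays self-contained.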