Abstract: Vision in robotics is still a challenging problem today. High-priced sensors that require strong expertise and engineered environments to achieve robust operation are among the main obstacles to versatile perception for generic applications. Roboception works toward easy-to-use perception solutions for robotics, offering both hardware and software that robustly perform common tasks, including tag and object pose estimation and the computation of potential grasp positions, to close the value chain of "Sense.Reason.Act." Application examples include robust bin picking and pick-from-shelf operations implemented in real production settings. Finally, current developments in the fusion of visual and tactile information will be discussed.
Abstract: Contact-rich manipulation tasks in unstructured environments often require both haptic and visual feedback. However, it is non-trivial to manually design a robot controller that combines modalities with very different characteristics. While deep reinforcement learning has shown success in learning control policies for high-dimensional inputs, these algorithms are generally intractable to deploy on real robots because of their sample complexity. We use self-supervision to learn a compact, multimodal representation of visual and haptic sensory inputs, which can then be used to improve the sample efficiency of policy learning. I present experiments on a peg insertion task in which the learned policy generalizes over different geometries, configurations, and clearances, while being robust to external perturbations.
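As context for the representation-learning step described above, the following is a minimal sketch, in a PyTorch style, of how camera images and force/torque windows might be fused into a single latent vector and trained with a self-supervised signal such as contact prediction. The layer sizes, the 6-axis wrench window, the 3-DoF action, and the contact-prediction head are assumptions made for illustration; this is not the speaker's implementation.

```python
# Illustrative sketch (not the speaker's implementation): fuse camera images
# and force/torque windows into one latent vector and train it with a
# self-supervised signal (here, next-step contact prediction). All shapes,
# layer sizes, and the 3-DoF action are assumptions made for this example.
import torch
import torch.nn as nn

class MultimodalEncoder(nn.Module):
    def __init__(self, latent_dim=64):
        super().__init__()
        # Small CNN for 64x64 RGB images -> 128-d visual feature.
        self.vision = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.Flatten(), nn.Linear(32 * 13 * 13, 128), nn.ReLU(),
        )
        # MLP for a short window of 6-axis force/torque readings (32 x 6).
        self.haptic = nn.Sequential(
            nn.Flatten(), nn.Linear(32 * 6, 64), nn.ReLU(),
        )
        self.fuse = nn.Linear(128 + 64, latent_dim)
        # Self-supervised head: predict contact on the next step from the
        # fused latent and the commanded action (assumed 3-DoF here).
        self.contact_head = nn.Sequential(
            nn.Linear(latent_dim + 3, 32), nn.ReLU(), nn.Linear(32, 1),
        )

    def forward(self, image, wrench, action):
        z = self.fuse(torch.cat([self.vision(image), self.haptic(wrench)], dim=-1))
        contact_logit = self.contact_head(torch.cat([z, action], dim=-1))
        return z, contact_logit

# One training step on a batch of logged robot data; the contact label comes
# from the robot's own sensors, which is what makes this self-supervised.
model = MultimodalEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
image = torch.randn(8, 3, 64, 64)              # stand-in camera batch
wrench = torch.randn(8, 32, 6)                 # stand-in F/T windows
action = torch.randn(8, 3)                     # stand-in commanded motions
contact = torch.randint(0, 2, (8, 1)).float()  # stand-in contact labels
_, logit = model(image, wrench, action)
loss = nn.functional.binary_cross_entropy_with_logits(logit, contact)
opt.zero_grad()
loss.backward()
opt.step()
```

The fused latent z would then serve as the compact observation for downstream policy learning, which is where the claimed sample-efficiency gain comes from.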
Abstract: While joint position sensing is commonplace in robot hands, joint torque sensing is relatively rare, and often overlooked as a sensing modality for manipulation. Changing this state of affairs holds much promise: joint torques can provide a direct measure of contact forces, with relatively simple hardware, no occlusion problems, and few distractions (such as lighting, texture, etc.). This talk will present our work investigating the ability to grasp unknown objects with a two-finger gripper equipped exclusively with proprioception. We use two very different approaches (feedback control on the real gripper and model-free Reinforcement Learning on a simulated gripper) and arrive at similar results, showing that proprioception does indeed provide the information needed to achieve stable grasping in this scenario. We believe this can serve as a useful study of proprioception in isolation, and a foundation for future, more powerful multimodal systems.
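To make the proprioception-only idea concrete, here is a minimal sketch, under assumed torque thresholds and a hypothetical gripper interface (read_joint_torques, command_joint_velocities), of a torque-feedback closing rule that stops when both fingers report balanced contact. It illustrates the general idea of grasping from joint-torque feedback alone, not the controller presented in the talk.

```python
# Minimal sketch (my own illustration, not the speaker's controller): close a
# two-finger gripper using only measured joint torques, stopping when both
# fingers report contact and the torques are balanced. The hooks
# read_joint_torques() and command_joint_velocities() are hypothetical
# placeholders for a real gripper driver.
import time

CONTACT_TORQUE = 0.15   # Nm, assumed threshold indicating finger contact
BALANCE_TOL = 0.05      # Nm, acceptable torque imbalance between fingers
CLOSE_SPEED = 0.2       # rad/s, nominal closing speed
K_BALANCE = 1.0         # gain slowing down the finger that presses harder

def grasp_step(torques):
    """Map the two measured finger torques to per-finger closing velocities."""
    left, right = torques
    if (left > CONTACT_TORQUE and right > CONTACT_TORQUE
            and abs(left - right) < BALANCE_TOL):
        return (0.0, 0.0)               # balanced contact: stop closing
    # Slow down whichever finger already carries more torque so the object
    # is centered between the fingers instead of being pushed to one side.
    imbalance = left - right
    v_left = CLOSE_SPEED - K_BALANCE * max(imbalance, 0.0)
    v_right = CLOSE_SPEED - K_BALANCE * max(-imbalance, 0.0)
    return (max(v_left, 0.0), max(v_right, 0.0))

def run_grasp(read_joint_torques, command_joint_velocities, rate_hz=100):
    """Proprioception-only grasp loop at a fixed control rate."""
    while True:
        velocities = grasp_step(read_joint_torques())
        command_joint_velocities(velocities)
        if velocities == (0.0, 0.0):
            return
        time.sleep(1.0 / rate_hz)
```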
Abstract: In applications of deep reinforcement learning to robotics, we often want to learn pose-invariant policies: policies that are invariant to changes in the position and orientation of objects in the world. For example, consider a peg-in-hole insertion task. If the agent learns to insert a peg into one hole, we would like that policy to generalize to holes presented in different poses. Unfortunately, this is challenging with conventional methods. In this talk, I will describe a novel state and action abstraction, called "deictic image maps," that is invariant to pose shifts and can be used with deep reinforcement learning. I will provide broad conditions under which optimal abstract policies are optimal for the underlying system. Finally, I will show that the method can help solve challenging robotic manipulation problems.
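As an illustration of why such an abstraction can be pose-invariant (a sketch of the general idea, not the talk's exact construction), the snippet below crops a fixed-size patch centered on each candidate action location, so translating the object only changes which candidate yields the familiar patch. The heightmap input, the candidate list, and the 24-pixel patch size are assumptions.

```python
# Sketch of the pose-invariance idea behind deictic-style abstractions (an
# illustration, not the talk's exact construction): crop a patch centered on
# each candidate action location, so the same local appearance produces the
# same abstract state wherever the object sits. The heightmap input, the
# candidate list, and the 24-pixel patch size are assumptions.
import numpy as np

def deictic_crops(image, candidate_pixels, patch=24):
    """Return one patch per candidate action, re-centered on that action.

    image: HxW (or HxWxC) array, e.g. a heightmap of the scene.
    candidate_pixels: list of (row, col) action locations.
    """
    half = patch // 2
    pad_width = [(half, half), (half, half)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad_width, mode="constant")
    return np.stack([padded[r:r + patch, c:c + patch]
                     for r, c in candidate_pixels])

# A value function over these crops is evaluated per candidate; translating
# the object only changes which candidate yields the familiar crop, so the
# learned values transfer without retraining.
heightmap = np.random.rand(96, 96)
candidates = [(10, 12), (48, 48), (80, 20)]
patches = deictic_crops(heightmap, candidates)   # shape (3, 24, 24)
```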
Abstract: Dexterous manipulation by a multi-fingered hand is still one of the ongoing hot topics. To realize such manipulation tasks, integrating sensing information into the robotic hand system is very important; in particular, visual and tactile information are essential for dexterous manipulation. However, such sensing information intrinsically includes considerable time delay and noise. In this talk, several approaches for integrating visual and tactile information into the grasping and manipulation of an object by a multi-fingered hand are introduced, based on our proposed "fingers-thumb opposition" controller.
Abstract: It is a long-term goal of AI research to understand the complex neural, cognitive and computational mechanisms of cross-modal learning and to use this understanding for (1) enhancing human performance, and (2) improving the performance of artificial systems. The term cross-modal learning refers to the synergistic synthesis of information from multiple sensory modalities such that the learning that occurs within any individual sensory modality can be enhanced with information from one or more other modalities. Cross-modal learning is crucial for human understanding of the world, and examples are ubiquitous: learning to grasp and manipulate objects; learning to walk; learning to read and write; learning to understand language and its referents; etc. In all these examples, visual, auditory, somatosensory or other modalities have to be integrated, and learning must be cross-modal. In fact, the broad range of acquired human skills is cross-modal, and many of the most advanced human capabilities, such as those involved in social cognition, require learning from the richest combinations of cross-modal information.
In a dynamic and changing world, a robust and effective robot system must have adaptive behaviors, incrementally learnable skills, and a high-level conceptual understanding of the world it inhabits, as well as planning capabilities for autonomous operation. Future intelligent robot systems will benefit from recent research on neurocognitive models for processing cross-modal data, exploiting synergy, integrating high-level knowledge and learning, and more. I will first introduce cross-modal learning issues for intelligent robots. Then I will present our investigation and experiments on a synergy technique that uses fewer parameters to govern the high degrees of freedom of robot movement. The third part of my talk will demonstrate how an intelligent system such as a robot can evolve its model as a result of learning from experience, and how such a model allows a robot to better understand new situations by integrating knowledge, planning, and learning.
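For the synergy part, a common way to obtain a few parameters that govern many joints is a PCA-style postural-synergy basis. The sketch below illustrates that generic idea with NumPy and a hypothetical 20-DOF hand; it is not the specific technique presented in the talk.

```python
# Generic postural-synergy sketch (an illustration of the "few parameters
# govern many DOF" idea, not the specific technique presented in the talk):
# PCA on recorded joint configurations yields a low-dimensional basis from
# which full postures of a hypothetical 20-DOF hand are reconstructed.
import numpy as np

def learn_synergies(joint_data, n_synergies=3):
    """joint_data: (num_samples, num_joints) array of recorded postures."""
    mean = joint_data.mean(axis=0)
    # Principal directions of postural variation act as the synergy basis.
    _, _, vt = np.linalg.svd(joint_data - mean, full_matrices=False)
    return mean, vt[:n_synergies]          # basis: (n_synergies, num_joints)

def synthesize_posture(mean, basis, coefficients):
    """Map a few synergy coefficients to a full joint-space command."""
    return mean + coefficients @ basis

# Example: 200 recorded grasp postures of an assumed 20-DOF hand, controlled
# afterwards through only 3 synergy coefficients.
recorded = np.random.rand(200, 20)
mean, basis = learn_synergies(recorded, n_synergies=3)
posture = synthesize_posture(mean, basis, np.array([0.5, -0.2, 0.1]))
```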
Abstract: Context is an important part of manipulation. Humans demonstrate a remarkable ability to integrate contextual information such as tactile feedback with visual stimuli and their intuitions to perform complex manipulation skills. We not only see, but feel our actions. In contrast, most current robotic learning methodologies exploit only visual information, leveraging recent advances in computer vision and deep learning to acquire data-hungry pixel-to-action policies. These methodologies do not exploit force signatures or basic intuitive structure such as causality. In this talk, I'll discuss an alternative approach that emulates human causal reasoning using a learned generative representation that naturally enables multi-sensory fusion. I'll demonstrate how a robot can use this approach to learn to play Jenga, a complex manipulation task. Playing Jenga is an interesting manipulation skill because successful execution requires physical interaction; vision alone is not enough. During game play, the robot first learns and then leverages this representation to make inferences about the tower and block states, perform low-level control, and carry out some degree of high-level decision making. I'll end the talk with some of the lessons we learned and a conceptual comparison to other approaches.
Abstract: Up to now, industrial robots have mostly been programmed with procedural programming languages that support the specification of robot motion sequences. Still, creating a robot application requires relatively deep robot-programming knowledge and is time consuming. For decades, the robotics community has been working on approaches that support robot programming on a task-oriented level, i.e., defining robot tasks rather than motions. One of them is the introduction of robot skills: robot functions representing capabilities to perform certain tasks. However, existing robot skill models involve knowledge and control policies that are usually complex and difficult to implement, and not easy for application engineers to use. To achieve practically usable results, we focus on assembly applications and introduce a relatively simple control policy for robot skills based on compliant motion control. Furthermore, we propose a software concept for application programming based on such robot skills and demonstrate its feasibility with test implementations.
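To suggest what such a compliant-motion skill might look like at the code level, here is a minimal sketch under assumed parameters and hypothetical robot-interface hooks (read_wrench, read_depth, move_increment). It illustrates the general pattern of a parameterized assembly skill that reports success or failure, rather than the proposed software concept itself.

```python
# Minimal sketch of a compliant-motion assembly skill (my illustration, not
# the proposed software concept): the skill is parameterized, yields laterally
# to measured contact forces while advancing along the insertion axis, and
# reports success or failure so a task-level program can sequence it. The
# hooks read_wrench, read_depth and move_increment are hypothetical.
from dataclasses import dataclass

@dataclass
class InsertSkillParams:
    target_depth: float      # m, how far to push along the insertion axis
    force_limit: float       # N, axial force that counts as "seated"
    compliance: float        # m per N, admittance gain for lateral errors
    step: float = 0.001      # m, insertion increment per control cycle

def insert_skill(params, read_wrench, read_depth, move_increment):
    """Compliant insertion skill returning True on success, False on failure."""
    while True:
        fx, fy, fz = read_wrench()
        if read_depth() >= params.target_depth or abs(fz) >= params.force_limit:
            return True                          # part seated: skill succeeded
        # Yield sideways in proportion to lateral contact forces and advance
        # a small fixed step along the insertion direction.
        dx = params.compliance * fx
        dy = params.compliance * fy
        if not move_increment(dx, dy, params.step):
            return False                         # robot refused motion: failure
```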
Abstract: In this talk, we will focus on sensing methods and strategies for robot manipulation in industrial applications that require accuracy and speed. Different sensing modalities, such as vision, force/torque, and tactile sensing, can be fused in parallel or serial manners depending on the application's requirements and constraints. Furthermore, the design of suitable end-effectors can simplify both the sensing modality and the manipulation. We will present our findings from item-picking robots developed as a participating team in the 2015 Amazon Robotics Challenge, the winning team of the 2017 Amazon Robotics Challenge, and the runner-up team of the 2017 DHL Robotics Challenge, as well as from several e-commerce scenarios. Sensing-modality fusion is further illustrated through our projects in hospital logistics, aerospace part repair, and manufacturing applications.
Abstract: Traditional dexterous manipulation algorithms require complex mechanisms and sensing modalities. In this talk, the role of underactuation in enabling simple and reliable manipulation strategies will be discussed. Compliance achieved via underactuation removes the need for force sensing, joint encoders, and accurate system models when adaptive vision-based methods are employed. These methods are also supported by data-driven state estimation for handling workspace constraints and avoiding failures. In addition, a simple mechanism for varying the surface friction of the fingers allows us to slide and flip objects within the hand without sophisticated planning or force-control strategies.
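As a rough illustration of how variable finger friction can stand in for force control during within-hand sliding (an assumption-laden sketch, not the presented method), the loop below switches each finger between high- and low-friction modes based on a vision-estimated object offset. The hooks estimate_object_offset, set_friction, and squeeze are hypothetical hardware interfaces, and the sliding direction convention is assumed for this example.

```python
# Assumption-laden sketch (not the presented method) of within-hand sliding
# with variable-friction fingers: switch each finger between high- and
# low-friction modes based on a vision-estimated object offset, then apply a
# gentle squeeze so the object slides relative to the low-friction finger
# while the high-friction finger acts as the "pusher".
# estimate_object_offset, set_friction and squeeze are hypothetical hooks.
def reposition_in_hand(estimate_object_offset, set_friction, squeeze,
                       tolerance=0.002, slide_force=1.0, hold_force=4.0):
    """Slide the grasped object until its signed offset (m) is within tolerance."""
    while True:
        offset = estimate_object_offset()        # from the vision system
        if abs(offset) <= tolerance:
            set_friction("high", "high")         # lock the object in place
            squeeze(hold_force)
            return True
        if offset > 0:
            set_friction("low", "high")          # left finger lets the object slide
        else:
            set_friction("high", "low")          # right finger lets the object slide
        squeeze(slide_force)                     # light squeeze drives the slide
```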