To realize the cable configuration specified by a target image through dynamic manipulation, we formulate dynamic cable manipulation as a stochastic forward model. We then propose a method that handles uncertainty by maximizing the expectation, which also accounts for estimation errors of the trained model. To avoid issues such as multiple local minima and the differentiability required by gradient-based methods, we propose using black-box optimization (BBO) to optimize the joint angles that realize a goal image. Among BBO methods, we use the Tree-structured Parzen Estimator (TPE), a type of Bayesian optimization. By incorporating constraints into TPE, the optimized joint angles are kept within the robot's range of motion. Since TPE is population-based, it is better able to detect multiple feasible configurations using the estimated inverse model. We evaluated the image similarity between the target image and the cable image captured after the robot executed the optimized motion, using an optimal transport distance. The results show that the proposed method improves accuracy compared to conventional gradient-based approaches and to methods that use deterministic models without considering uncertainty.
[1] Kuniyuki Takahashi, Tadahiro Taniguchi: "Goal-Image Conditioned Dynamic Cable Manipulation through Bayesian Inference and Multi-Objective Black-Box Optimization," 2023 IEEE International Conference on Robotics and Automation (ICRA2023), 2023
https://arxiv.org/abs/2301.11538
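The constrained joint-angle search described above can be sketched with an off-the-shelf TPE implementation. Below is a minimal illustration using Optuna's TPESampler; the expected_image_distance function, the joint limits, and the trial budget are placeholder assumptions standing in for the learned stochastic forward model and the optimal-transport image distance, not the paper's actual setup.

```python
import numpy as np
import optuna

# Placeholder for the learned stochastic forward model combined with the
# optimal-transport image distance: given joint angles, return the expected
# distance between the predicted cable image and the goal image (assumption).
def expected_image_distance(joint_angles: np.ndarray) -> float:
    return float(np.sum((np.sin(joint_angles) - 0.5) ** 2))

JOINT_LIMITS = [(-1.5, 1.5)] * 6  # illustrative range-of-motion constraints [rad]

def objective(trial: optuna.Trial) -> float:
    # Each suggested angle is constrained to its range of motion.
    q = np.array([trial.suggest_float(f"q{i}", lo, hi)
                  for i, (lo, hi) in enumerate(JOINT_LIMITS)])
    return expected_image_distance(q)

study = optuna.create_study(sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=200)  # gradient-free, tolerant of local minima
print("best joint angles:", study.best_params)
```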
We propose a new method for collision-free planning using Conditional Generative Adversarial Networks (cGANs) to transform between the robot's joint space and a latent space that captures only the collision-free areas of the joint space, conditioned on an obstacle map. Generating multiple plausible trajectories is useful in applications such as robot-arm manipulation because it enables selecting a trajectory that avoids collisions with the robot itself or the surrounding environment. In the proposed method, various trajectories that avoid obstacles can be generated by connecting the start and goal states with arbitrary line segments in this generated latent space. Our method provides this collision-free latent space, after which any planner, using any optimization criteria, can be used to generate the most suitable paths on the fly. We verified this method with a simulated and an actual UR5e 6-DoF robotic arm and confirmed that different trajectories can be generated depending on the optimization criteria.
[1] Tomoki Ando, Hiroto Iino, Hiroki Mori, Ryota Torishima, Kuniyuki Takahashi, Shoichiro Yamaguchi, Daisuke Okanohara, Tetsuya Ogata, "Learning-based Collision-free Planning on Arbitrary Optimization Criteria in the Latent Space through cGANs," Advanced Robotics, 2023
https://arxiv.org/abs/2202.13062
[2] Tomoki Ando, Hiroki Mori, Ryota Torishima, Kuniyuki Takahashi, Shoichiro Yamaguchi, Daisuke Okanohara, Tetsuya Ogata, "Collision-free Path Planning in the Latent Space through cGANs," arXiv, 2022
https://arxiv.org/abs/2202.07203
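As a rough illustration of planning by connecting latent points with line segments, the sketch below linearly interpolates between latent encodings of the start and goal and decodes each waypoint with a stand-in for the cGAN generator conditioned on an obstacle map; the decode function, dimensions, and obstacle map are illustrative assumptions, not the trained model.

```python
import numpy as np

def decode_to_joint_space(z: np.ndarray, obstacle_map: np.ndarray) -> np.ndarray:
    """Stand-in for the trained cGAN generator G(z | obstacle_map), which maps a
    latent point to collision-free joint angles (assumed interface)."""
    return np.tanh(z)

def plan_path(z_start, z_goal, obstacle_map, n_waypoints=20):
    # Because the latent space covers only collision-free regions, any straight
    # line segment between latent points decodes to a collision-free path.
    alphas = np.linspace(0.0, 1.0, n_waypoints)
    latent_path = [(1.0 - a) * z_start + a * z_goal for a in alphas]
    return np.stack([decode_to_joint_space(z, obstacle_map) for z in latent_path])

obstacle_map = np.zeros((64, 64))          # illustrative obstacle-map condition
z_start, z_goal = np.zeros(6), np.ones(6)  # latent encodings of start/goal states
trajectory = plan_path(z_start, z_goal, obstacle_map)
print(trajectory.shape)  # (20, 6): joint-space waypoints for a 6-DoF arm
```

Any planner or optimization criterion can then operate on top of this latent space, for example by scoring several such candidate paths and picking the most suitable one on the fly.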
Food packing industries typically use seasonal ingredients with an immense variety that factory workers pack manually. For small pieces of food picked by volume or weight that tend to get entangled, stick, or clump together, it is difficult to predict from visual examination alone how intertwined they are, making it challenging to grasp the requisite target mass accurately. Workers rely on a combination of weighing scales and a sequence of complex maneuvers to separate the food and reach the target mass, which makes automation of the process non-trivial. In this study, we propose a method that combines 1) pre-grasping to reduce the degree of entanglement, 2) post-grasping to adjust the grasped mass using a novel gripper mechanism that carefully discards excess food when the grasped amount is larger than the target mass, and 3) selecting a grasping point where the grasped amount is confidently expected to be reasonably higher than the target mass. We evaluate the method on a variety of foods that entangle, stick, and clump, each with a different size, shape, and material properties such as volumetric mass density. We show a significant improvement in grasp accuracy for user-specified target masses using the proposed method.
[1] Kuniyuki Takahashi, Naoki Fukaya, Avinash Ummadisingu: "Target-mass Grasping of Entangled Food using Pre-grasping & Post-grasping," IEEE Robotics and Automation Letters (RA-L) with ICRA2022, pp. 1222-1229, 2022
https://arxiv.org/abs/2201.00933
Explanation blog
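A minimal sketch of step 3) above, grasp-point selection, assuming each candidate comes with a predicted grasped mass and an uncertainty estimate; the confidence bound, margin, and example data are illustrative assumptions, not the paper's exact criterion.

```python
import numpy as np

def select_grasp_point(candidates, target_mass, margin=1.1):
    """Pick a candidate whose conservative (lower-bound) mass estimate still
    exceeds the target, so the post-grasping step only has to discard excess.

    candidates: list of (grasp_point_xy, predicted_mass, predicted_std)
    """
    feasible = []
    for point, mass, std in candidates:
        lower_bound = mass - 2.0 * std          # conservative estimate of grasped mass
        if lower_bound >= margin * target_mass:
            feasible.append((mass, point))
    if not feasible:
        return None  # fall back to pre-grasping to reduce entanglement first
    # Prefer the smallest sufficient mass to minimize how much must be discarded.
    return min(feasible)[1]

candidates = [((0.10, 0.20), 12.0, 1.0), ((0.30, 0.40), 18.0, 4.0)]
print(select_grasp_point(candidates, target_mass=8.0))  # -> (0.1, 0.2)
```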
The food packaging industry handles an immense variety of food products with wide-ranging shapes and sizes, even within one kind of food. Menus are also diverse and change frequently, making automation of pick-and-place difficult. A popular approach to bin-picking is to first identify each piece of food in the tray using an instance segmentation method. However, human annotations for training these methods are unreliable and error-prone, since foods are packed close together with unclear boundaries and high visual similarity, making separation of pieces difficult. To address this problem, we propose a method that trains purely on synthetic data and successfully transfers to the real world (sim2real) by creating datasets of filled food trays from high-quality 3D models of real pieces of food for training instance segmentation models. Another concern is that foods are easily damaged during grasping. We address this by introducing two additional methods: a novel adaptive finger mechanism that passively retracts when a collision occurs, and a method to filter out grasps that are likely to damage neighbouring pieces of food. We demonstrate the effectiveness of the proposed method on several kinds of real foods.
[1] Avinash Ummadisingu, Kuniyuki Takahashi, Naoki Fukaya: "Cluttered Food Grasping with Adaptive Fingers and Synthetic-Data Trained Object Detection," 2022 IEEE International Conference on Robotics and Automation (ICRA2022), 2022
https://arxiv.org/abs/2203.05187
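A hedged sketch of the grasp-filtering idea described above: given instance masks from the segmentation model, reject grasps whose finger footprint overlaps neighbouring pieces. The footprint representation and overlap threshold are assumptions for illustration.

```python
import numpy as np

def grasp_is_safe(finger_footprint, target_mask, instance_masks, max_overlap=0.05):
    """Reject grasps whose finger footprint overlaps neighbouring food instances
    beyond a small tolerance. All inputs are boolean HxW arrays."""
    for mask in instance_masks:
        if mask is target_mask:
            continue  # only neighbouring instances matter
        overlap = np.logical_and(finger_footprint, mask).sum()
        if overlap > max_overlap * mask.sum():
            return False  # fingers would likely collide with / damage a neighbour
    return True

h, w = 64, 64
target = np.zeros((h, w), bool)
target[20:30, 20:30] = True
neighbour = np.zeros((h, w), bool)
neighbour[20:30, 31:41] = True
footprint = np.zeros((h, w), bool)
footprint[20:30, 17:19] = True  # finger lands just left of the target piece
print(grasp_is_safe(footprint, target, [target, neighbour]))  # True: clears the neighbour
```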
Food packing industry workers typically pick a target amount of food by hand from a food tray and place it in containers. Since menus are diverse and change frequently, robots must adapt and learn to handle new foods in a short period of time. Learning to grasp a specific amount of granular food requires a large training dataset, which is challenging to collect reasonably quickly. In this study, we propose ways to reduce the necessary amount of training data by augmenting a deep neural network with models that estimate its uncertainty through self-supervised learning. To further reduce human effort, we devise a data collection system that automatically generates labels. We build on the idea that we can grasp sufficiently well if there is at least one high-confidence (low-uncertainty) grasp point among the various grasp point candidates. We evaluate the proposed methods on a variety of granular foods: coffee beans, rice, oatmeal, and peanuts, each of which has a different size, shape, and material properties, such as volumetric mass density or friction. For these foods, we show significantly improved grasp accuracy for user-specified target masses using smaller datasets by incorporating uncertainty.
[1] Kuniyuki Takahashi, Wilson Ko, Avinash Ummadisingu, Shin-ichi Maeda: "Uncertainty-Aware Self-Supervised Target-Mass Grasping of Granular Foods," 2021 IEEE International Conference on Robotics and Automation (ICRA2021), 2021
https://arxiv.org/abs/2105.12946
Explanation blog
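As a sketch of how an uncertainty estimate can drive grasp-point selection, the snippet below scores candidate grasp points with a toy regressor and picks the lowest-uncertainty one. MC dropout is used here only as a generic uncertainty estimator and is an assumption; the network, inputs, and sampling count are likewise illustrative.

```python
import torch
import torch.nn as nn

class MassRegressor(nn.Module):
    """Toy depth-patch -> grasped-mass regressor with dropout, used only to
    illustrate uncertainty-aware grasp-point selection."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(), nn.Linear(32 * 32, 128), nn.ReLU(),
            nn.Dropout(p=0.2), nn.Linear(128, 1),
        )

    def forward(self, x):
        return self.net(x)

def predict_with_uncertainty(model, patches, n_samples=20):
    model.train()  # keep dropout active so repeated passes give different samples
    with torch.no_grad():
        samples = torch.stack([model(patches) for _ in range(n_samples)])
    return samples.mean(0).squeeze(-1), samples.std(0).squeeze(-1)

patches = torch.randn(5, 1, 32, 32)   # 5 candidate grasp-point patches
mean, std = predict_with_uncertainty(MassRegressor(), patches)
best = torch.argmin(std)              # grasp only where the model is confident
print(f"candidate {best.item()}: {mean[best]:.1f} g +/- {std[best]:.1f} g")
```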
We propose a method to annotate segmentation masks accurately and automatically using an invisible marker for object manipulation. The invisible marker is invisible under visible (regular) light but becomes visible under invisible light, such as ultraviolet (UV) light. Large-scale annotated datasets are created quickly and inexpensively by painting objects with the invisible marker and capturing images while alternately switching between regular and UV light at high speed. We compare our proposed method with manual annotation. Under controlled environmental light conditions, we demonstrate semantic segmentation for deformable objects, including clothes, liquids, and powders. In addition, we demonstrate liquid pouring tasks under uncontrolled environmental light conditions in complex environments such as inside an office, a house, and outdoors. Furthermore, capturing data while the camera is in motion makes it easier to collect large datasets, as shown in our demonstration.
[1] Kuniyuki Takahashi*, Kenta Yonekura*: "Invisible Marker: Automatic Annotation of Segmentation Masks for Object Manipulation," 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS2020), 2020
*The starred authors contributed equally.
https://arxiv.org/abs/1909.12493
Dataset
Explanation blog
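A minimal sketch of how a paired regular-light / UV-light frame could be turned into a segmentation label by thresholding the glowing marker regions; the HSV thresholds, morphology, and color space are assumptions that depend on the actual marker and camera.

```python
import numpy as np
import cv2

def annotate_pair(rgb_regular: np.ndarray, rgb_uv: np.ndarray) -> np.ndarray:
    """Return a segmentation mask for the regular-light frame, computed from the
    paired UV-light frame in which the fluorescent marker glows."""
    hsv = cv2.cvtColor(rgb_uv, cv2.COLOR_RGB2HSV)
    # Keep bright, saturated pixels: the glowing marker regions (thresholds are
    # illustrative and depend on the marker and camera).
    mask = cv2.inRange(hsv, (0, 80, 120), (179, 255, 255))
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))

# Alternating frames from high-speed light switching are paired like this:
regular = np.zeros((480, 640, 3), np.uint8)  # image used for training
uv = np.zeros((480, 640, 3), np.uint8)       # image used only to derive the label
print(annotate_pair(regular, uv).shape)      # (480, 640) mask aligned with `regular`
```

Because the two frames are captured back-to-back under high-speed light switching, the resulting mask can be paired directly with the regular-light image as its annotation.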
For in-hand manipulation, estimating the pose of the object inside the hand is essential for manipulating it to a target pose. Since in-hand manipulation tends to cause occlusions by the hand or the object itself, image information alone is not sufficient for in-hand object pose estimation. Multiple modalities can be used in this case, with the advantage that other modalities can compensate for occlusion, noise, and sensor malfunctions. Although it is essential to decide the utilization rate of each modality (referred to as its reliability value) according to the situation, manually designing such models is difficult, especially across diverse situations. In this paper, we propose deep gated multi-modal learning, which self-determines the reliability value of each modality through end-to-end deep learning. For the experiments, an RGB camera and a GelSight tactile sensor were attached to the parallel gripper of a Sawyer robot, and object pose changes were estimated during grasping. A total of 15 objects were used in the experiments. In the proposed model, the reliability values of the modalities were determined according to each modality's noise level and failures, and pose changes were estimated even for unknown objects.
[1] Tomoki Anzai*, Kuniyuki Takahashi*: "Deep Gated Multi-modal Learning: In-hand Object Pose Changes Estimation using Tactile and Image Data," 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS2020), 2020
*The starred authors contributed equally.
https://arxiv.org/abs/1909.12494
Explanation blog
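The gating idea can be sketched as follows: a small network outputs one reliability value per modality, and the fused feature is the reliability-weighted combination, trained end-to-end with the pose-change loss. The layer sizes, feature dimensions, and output parameterization below are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class GatedMultiModalFusion(nn.Module):
    """A gate network outputs one reliability value per modality; the fused
    feature is the reliability-weighted sum, trained end-to-end."""
    def __init__(self, dim=64):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, 2), nn.Softmax(dim=-1))
        self.head = nn.Linear(dim, 3)  # e.g. in-hand pose change (x, y, yaw)

    def forward(self, image_feat, tactile_feat):
        reliability = self.gate(torch.cat([image_feat, tactile_feat], dim=-1))
        fused = (reliability[:, 0:1] * image_feat
                 + reliability[:, 1:2] * tactile_feat)
        return self.head(fused), reliability

model = GatedMultiModalFusion()
img, tac = torch.randn(4, 64), torch.randn(4, 64)   # image / tactile features
pose_change, reliability = model(img, tac)
print(reliability[0])  # per-modality utilization rates for the first sample
```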
Estimating tactile properties from vision, such as slipperiness or roughness, is essential to interact with the environment effectively. These tactile properties help us decide which actions to choose and how to perform them. For example, we can drive slower if we see bad traction or grasp tighter if an item looks slippery. We believe that this ability also helps robots enhance their understanding of the environment and thus enables them to tailor their actions to the situation. We therefore propose a model to estimate the degree of tactile properties from visual perception alone (e.g., the level of slipperiness or roughness). Our method extends an encoder-decoder network, wherein the latent variables are visual and tactile features. In contrast to previous works, our method does not require manual labeling, only RGB images and the corresponding tactile sensor data. All our data is collected with a webcam and a uSkin tactile sensor mounted on the end-effector of a Sawyer robot, which strokes the surfaces of 25 different materials. By evaluating the feature space, we show that our model generalizes to materials not included in the training data, indicating that it has learned to associate important tactile properties with images.
[1] Kuniyuki Takahashi, Jethro Tan: "Deep Visuo-Tactile Learning: Estimation of Tactile Properties from Images," 2019 IEEE International Conference on Robotics and Automation (ICRA2019), 20-24th May, 2019
Best Paper Award Finalist (Selected as one of 32 papers from about 2900 papers)
https://arxiv.org/abs/1803.03435
Explanation blog
[2] Kuniyuki Takahashi, Jethro Tan: "Deep Visuo-Tactile Learning: Estimation of Tactile Properties from Images (Extended Abstract)," The 29th International Joint Conference on Artificial Intelligence (IJCAI 2020), pp. 4780-4784, 2020
https://www.ijcai.org/Proceedings/2020/665
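A minimal sketch of the encoder-decoder idea, where a latent code learned from RGB images is also required to predict the paired tactile signal, so tactile properties can later be estimated from vision alone; all dimensions and layers are placeholder assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class VisuoTactileAE(nn.Module):
    """Encoder-decoder whose latent code, learned from images, must also predict
    the paired tactile reading, tying visual and tactile features together."""
    def __init__(self, img_dim=3 * 64 * 64, tactile_dim=48, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(img_dim, 256),
                                     nn.ReLU(), nn.Linear(256, latent_dim))
        self.image_decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                           nn.Linear(256, img_dim))
        self.tactile_decoder = nn.Linear(latent_dim, tactile_dim)

    def forward(self, image):
        z = self.encoder(image)
        return self.image_decoder(z), self.tactile_decoder(z), z

model = VisuoTactileAE()
image = torch.randn(8, 3, 64, 64)   # webcam frames of the stroked surface
tactile = torch.randn(8, 48)        # paired uSkin readings (no manual labels)
recon, tactile_pred, z = model(image)
loss = (nn.functional.mse_loss(recon, image.flatten(1))
        + nn.functional.mse_loss(tactile_pred, tactile))
loss.backward()  # at test time, tactile properties are estimated from images alone
```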
Comprehension of spoken natural language is an essential skill for robots to communicate with humans effectively. However, handling unconstrained spoken instructions is challenging due to (1) the complex structures and wide variety of expressions used in spoken language and (2) the inherent ambiguity of human instructions. In this research, we propose the first comprehensive system for controlling robots with unconstrained spoken language that can effectively resolve ambiguity in spoken instructions. Specifically, we integrate deep learning-based object detection with natural language processing technologies to handle unconstrained spoken instructions, and we propose a method for robots to resolve instruction ambiguity through dialogue. Through experiments in both a simulated environment and on a physical industrial robot arm, we demonstrate that our system understands natural instructions from human operators effectively and show how higher success rates in the object picking task can be achieved through an interactive clarification process.
[1] Jun Hatori*, Yuta Kikuchi*, Sosuke Kobayashi*, Kuniyuki Takahashi*, Yuta Tsuboi*, Yuya Unno*, Wilson Ko, Jethro Tan: "Interactively Picking Real-World Objects with Unconstrained Spoken Language Instructions," 2018 IEEE International Conference on Robotics and Automation (ICRA2018), 21-25th May, 2018
*The starred authors contributed equally and are ordered alphabetically.
Best Paper Award on HRI (Selected from about 2600 papers)
https://arxiv.org/abs/1710.06280
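The interactive clarification process can be sketched as a simple loop: ground the instruction against detected objects and keep asking follow-up questions while more than one candidate remains. The function interfaces and the keyword-matching demo below are assumptions standing in for the learned detection and language-grounding components.

```python
def pick_with_clarification(instruction, detections, ground_instruction, ask_user):
    """Ground the spoken instruction against detected objects and resolve
    ambiguity through dialogue instead of guessing.

    ground_instruction(text, candidates) -> remaining candidate objects
    ask_user(candidates) -> follow-up utterance from the operator
    """
    candidates = ground_instruction(instruction, detections)
    while len(candidates) > 1:
        follow_up = ask_user(candidates)                      # clarifying question
        candidates = ground_instruction(f"{instruction} {follow_up}", candidates)
    return candidates[0] if candidates else None

# Toy stand-ins for the learned components, just to run the loop end-to-end.
objects = ["red cup", "blue cup", "red box"]
vocab = {"red", "blue", "cup", "box"}
ground = lambda text, cands: [o for o in cands
                              if all(w in o for w in text.split() if w in vocab)]
answers = iter(["the cup"])
print(pick_with_clarification("red", objects, ground, lambda c: next(answers)))  # "red cup"
```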
We propose a tool-body assimilation model that considers grasping during motor babbling for tool use. A robot with tool-use skills can be helpful in human-robot symbiosis because tool use expands its task-performing abilities. Past studies on tool-body assimilation mainly focused on acquiring the functions of tools and assumed that the tool was pre-attached to the robot before the motion started, which means the robot could not decide whether and where to grasp the tool. In real-life environments, robots need to consider possible tool-grasping positions and then grasp the tool. To address these issues, the robot performs motor babbling both with and without grasping tools to learn its body model and the tool functions. In addition, the robot grasps various parts of the tools to learn the different functions that arise from different grasping positions. The motion experiences are learned using deep learning. In the evaluation, the robot performs an object manipulation task without a tool and with several tools of different shapes. Given the initial state and a target image, the robot generates motions by deciding whether and where to grasp the tool. The robot can thus generate proper motions and grasping decisions when the initial state and a target image are provided.
[1] Kuniyuki Takahashi, Kitae Kim, Tetsuya Ogata, Shigeki Sugano: "Tool-body Assimilation Model Considering Grasping Motion through Deep Learning," Robotics and Autonomous Systems, vol. 91, pp. 115-127, IF: 1.616, January 2017
DOI: 10.1016/j.robot.2017.01.002
[2] Kuniyuki Takahashi, Tetsuya Ogata, Hadi Tjandra, Yuki Yamaguchi, Shigeki Sugano: "Tool-body Assimilation Model Based on Body Babbling and Neuro-dynamical System," Mathematical Problems in Engineering, Article ID 837540, IF: 1.082, vol. 2015, 15 pages, 2015
DOI: 10.1155/2015/837540
[3] Kuniyuki Takahashi, Hadi Tjandra, Tetsuya Ogata, Shigeki Sugano: "Body Model Transition by Tool Grasping During Motor Babbling using Deep Learning and RNN," Lecture Notes in Computer Science (The 25th International Conference on Artificial Neural Networks (ICANN 2016)), pp. 166-174, Barcelona, Spain, September 6th-9th, 2016
DOI: 10.1007/978-3-319-44778-0_20
[4] Kuniyuki Takahashi, Tetsuya Ogata, Hadi Tjandra, Shingo Murata, Hiroaki Arie, Shigeki Sugano: "Tool-body Assimilation Model based on Body Babbling and a Neuro-dynamical System for Motion Generation," Lecture Notes in Computer Science (The 24th International Conference on Artificial Neural Networks (ICANN 2014)), pp. 363-370, Hamburg, Germany, September 15th-19th, 2014
DOI: 10.1007/978-3-319-11179-7_46
This paper proposes a learning strategy for robots with flexible joints with multiple degrees of freedom to achieve dynamic motion tasks. Despite several potential benefits of flexible-joint robots, such as exploitation of intrinsic dynamics and passive adaptation to environmental changes with mechanical compliance, controlling such robots is challenging because of the increased complexity of their dynamics. To achieve dynamic movements, we introduce a two-phase learning framework of the robot's body dynamics using a recurrent neural network motivated by a deep learning strategy. The proposed methodology comprises a pre-training phase with motor babbling and a fine-tuning phase with additional learning of the target tasks. We consider active and passive exploratory motions in the pre-training phase to efficiently acquire body dynamics. The learned body dynamics are adjusted for specific tasks in the fine-tuning phase. We demonstrate the effectiveness of the proposed methodology in achieving dynamic tasks involving constrained movement requiring interactions with the environment on a simulated robot model and an actual PR2 robot, both of which have a compliantly actuated seven-degree-of-freedom arm. The results illustrate a reduction in the required number of training iterations for task learning and generalization capabilities for untrained situations.
[1] Kuniyuki Takahashi, Tetsuya Ogata, Jun Nakanishi, Gordon Cheng, Shigeki Sugano: "Dynamic Motion Learning for Multi-DOF Flexible-Joint Robots Using Active-Passive Motor Babbling through Deep Learning," Advanced Robotics, vol. 31, issue 18, pp. 1002-1015, 2017 (Open Access)
DOI: 10.1080/01691864.2017.1383939
2019 Advanced Robotics Best Paper Award
2019 FA Foundation Paper Award
[2] Kuniyuki Takahashi, Tetsuya Ogata, Hiroki Yamada, Hadi Tjandra, Shigeki Sugano: "Effective Motion Learning for a Flexible-Joint Robot using Motor Babbling," 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS2015), pp. 2723-2728, Hamburg, Germany, September 28th - October 2nd, 2015
DOI: 10.1109/IROS.2015.7353750
[3] Kuniyuki Takahashi, Tetsuya Ogata, Shigeki Sugano, Gordon Cheng: "Dynamic Motion Learning for a Flexible-Joint Robot using Active-Passive Motor Babbling," The 33rd Annual Conference of the Robotics Society of Japan (domestic conference in Japan), 2G1-07, Tokyo, September 3rd-5th, 2015 [PDF]
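A compact sketch of the two-phase framework: the same recurrent model is first pre-trained on motor-babbling sequences to acquire body dynamics, then fine-tuned on a much smaller set of task sequences. The LSTM, dimensions, learning rates, and random data below are placeholders, not the paper's network or datasets.

```python
import torch
import torch.nn as nn

# The same RNN weights are used in both phases: pre-training on motor-babbling
# sequences (body dynamics), then fine-tuning on task sequences.
rnn = nn.LSTM(input_size=7, hidden_size=64, batch_first=True)
head = nn.Linear(64, 7)  # predict the next joint state of a 7-DoF compliant arm
params = list(rnn.parameters()) + list(head.parameters())

def train(sequences, epochs, lr):
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        out, _ = rnn(sequences[:, :-1])                 # predict x_{t+1} from x_{<=t}
        loss = nn.functional.mse_loss(head(out), sequences[:, 1:])
        opt.zero_grad()
        loss.backward()
        opt.step()

babbling = torch.randn(32, 50, 7)  # phase 1: active + passive exploratory motions
task = torch.randn(8, 50, 7)       # phase 2: far fewer task sequences are needed
train(babbling, epochs=100, lr=1e-3)  # pre-training: acquire body dynamics
train(task, epochs=20, lr=1e-4)       # fine-tuning: adjust to the target task
```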
We propose an exploratory form of motor babbling that uses variance predictions from a recurrent neural network to acquire the body dynamics of a robot with flexible joints. With conventional methods, applying motor babbling to real robots is challenging because of the large number of babbling motions required. In motor babbling, some motions are easy to predict and others are difficult: the predicted variance is large for difficult-to-predict motions and small for easy-to-predict ones. We use a Stochastic Continuous Timescale Recurrent Neural Network to predict motions and their variance. Using the proposed method, a robot can explore motions based on this variance. To evaluate the method, we conducted experiments in which the robot learns crank-turning and door opening/closing tasks after exploring its body dynamics. The results show that the proposed method can efficiently generate motions for the given tasks.
[1] Kuniyuki Takahashi, Kanata Suzuki, Tetsuya Ogata, Hadi Tjandra, Shigeki Sugano: "Efficient Motor Babbling Using Variance Predictions from a Recurrent Neural Network," Lecture Notes in Computer Science (22nd International Conference on Neural Information Processing (ICONIP2015)), pp.26-33, Istanbul, Turkey, November 9th - 12th, 2015
DOI: 10.1007/978-3-319-26555-1_4
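The exploration rule can be sketched as: among candidate babbling motions, pick the one with the largest predicted variance, i.e. the motion the current model predicts worst. The predictor below is a toy stand-in for the network's mean/variance outputs, used only to illustrate the selection rule.

```python
import numpy as np

def select_next_babbling_motion(candidate_motions, predict_mean_and_var):
    """Pick the candidate motion with the largest mean predicted variance, so
    exploration focuses on motions the model finds hardest to predict.

    predict_mean_and_var(motion) -> (predicted_trajectory, predicted_variance)
    """
    variances = [np.mean(predict_mean_and_var(m)[1]) for m in candidate_motions]
    return candidate_motions[int(np.argmax(variances))]

# Toy stand-in for the learned predictor: variance grows with motion amplitude.
fake_predictor = lambda m: (m, np.abs(m))
candidates = [np.full(10, a) for a in (0.1, 0.5, 0.9)]
print(select_next_babbling_motion(candidates, fake_predictor))  # the 0.9 motion
```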
We applied a deep learning framework to a multi-DOF robot with flexible joints to achieve a folding task with a flexible object. In general, it is challenging to design motions for a robot with flexible joints handling a flexible object. Therefore, we use a deep neural network (DNN) that can self-organize models. The proposed approach uses a two-phase deep learning model: a deep autoencoder extracts image features and reconstructs images, and a deep time-delay neural network learns the dynamics of the robot's task process from the extracted image features and joint angle signals. The "PR2" humanoid robot is used as an experimental platform to evaluate the proposed model.
[1] Kanata Suzuki, Kuniyuki Takahashi, Gordon Cheng, Tetsuya Ogata: "Motion Generation of Flexible Object Folding Task Applied on Humanoid Robot using Deep Learning," The 78th National Convention of Information Processing Society of Japan (IPSJ2016, domestic conference in Japan), Japan, 10th-12th March, 2016 [PDF]
2016 Student Encouragement Award & Conference Encouragement Award
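For illustration, the two-phase model described above might be sketched as follows: an autoencoder compresses camera images into low-dimensional features, and a time-delay network predicts the next image feature and joint angles from a short history window. All sizes and data here are placeholder assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

# Phase 1: a deep autoencoder extracts image features (and reconstructs images).
autoencoder = nn.Sequential(
    nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU(), nn.Linear(256, 20),
)
decoder = nn.Sequential(  # used for the phase-1 reconstruction loss (not run here)
    nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 3 * 64 * 64),
)

# Phase 2: a time-delay network learns task dynamics from a short window of
# image features and joint angles.
window = 5
tdnn = nn.Sequential(
    nn.Flatten(), nn.Linear(window * (20 + 7), 128), nn.ReLU(), nn.Linear(128, 20 + 7),
)

images = torch.randn(16, window, 3, 64, 64)   # history of camera frames
joints = torch.randn(16, window, 7)           # matching joint-angle history
feats = autoencoder(images.flatten(0, 1)).view(16, window, 20)
next_step = tdnn(torch.cat([feats, joints], dim=-1))  # next feature + joint angles
print(next_step.shape)  # torch.Size([16, 27])
```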