University of Leeds
Title: Manipulation Planning as Model Simplification
Abstract: I will talk about robotic planners, controllers, and perception systems that use physics-based predictions about the motion of objects in contact. In my group at the University of Leeds, we are interested in developing such systems for cluttered scenes that include rigid and deformable objects. Predicting the dynamics of such high-dimensional systems (i.e., soft/deformable objects, as well as collections of multiple in-contact rigid objects) is extremely expensive computationally. In particular, I will talk about the model-reduction approaches we have investigated to address this problem.
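The abstract does not name a specific reduction technique; purely as a generic illustration, here is a minimal proper orthogonal decomposition (POD) sketch in Python, with all shapes and data as hypothetical placeholders, that projects high-dimensional deformable-object states onto a low-dimensional subspace learned from simulation snapshots.

    import numpy as np

    # Snapshot matrix: each column is one high-dimensional state of the
    # deformable object (e.g., stacked mesh-node positions). The random
    # data below is a stand-in for recorded simulation states.
    n_dof, n_snapshots = 3000, 200
    X = np.random.randn(n_dof, n_snapshots)

    # Proper orthogonal decomposition: SVD of the mean-centered snapshots.
    mean = X.mean(axis=1, keepdims=True)
    U, S, _ = np.linalg.svd(X - mean, full_matrices=False)  # S: mode energies

    k = 10                  # reduced dimension
    basis = U[:, :k]        # leading modes span the low-dimensional subspace

    def reduce_state(x):
        """Project a full state onto the k-dimensional subspace."""
        return basis.T @ (x - mean[:, 0])

    def reconstruct_state(z):
        """Lift reduced coordinates back to the full state space."""
        return mean[:, 0] + basis @ z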
Korea Advanced Institute of Science and Technology
Title: Hierarchical and Modular Neural Network for Manipulation Skill Discovery
Abstract: The idea currently in vogue for developing a general-purpose robot is big models, big data, and end-to-end training. But here is the problem with this approach: it consumes too much power. For instance, the 8-billion-parameter LLaMA model draws 250-300 watts just to make a single inference. And that's for a language model, which only has to process a discrete set of symbols. We can only expect the power requirements to be larger for robotics, which has to process a continuous stream of high-dimensional sensory data to output a sequence of continuous actions. In contrast, the human brain runs on only about 20 watts. This tells us that there is something wrong with how we are building our AI models. In this talk, I will present our lab's recent efforts to discover useful inductive biases for robot manipulation, so that we can do more with less data and smaller models, much as CNNs did for images.
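For scale, the figures above imply a gap of roughly an order of magnitude: $250\text{--}300\,\mathrm{W} \,/\, 20\,\mathrm{W} \approx 12.5\text{--}15\times$, i.e., a single language-model inference draws more than ten times the brain's entire power budget.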
Monash University
Title: Convex Learning-Based Model Predictive Control for Trajectory Tracking in Robotic Manipulators
Abstract: Accurate trajectory tracking is a fundamental control task in robotics, typically achieved using model-based methods. Classical approaches, such as Computed Torque Control (CTC), model the dynamics explicitly from physical laws and use the inverse dynamics to feedback-linearize the system. These nominal models can be enhanced with learning-based methods that capture the discrepancies between model predictions and actual observations. Although computationally efficient, this class of methods cannot handle system constraints explicitly. Optimization-based methods, such as Model Predictive Control (MPC), inherently handle constraints but often involve solving a nonlinear optimization problem, especially when the dynamics model has learned components. Since MPC uses the dynamics model for rollout prediction, it is typically formulated in forward dynamics.
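For reference, the computed-torque law takes the standard form (standard manipulator-dynamics notation, not taken from the talk):

    \tau = M(q)\left(\ddot{q}_d + K_d\,\dot{e} + K_p\,e\right) + C(q,\dot{q})\,\dot{q} + g(q), \qquad e = q_d - q,

where $M$ is the inertia matrix, $C$ the Coriolis/centrifugal term, $g$ gravity, and $K_p$, $K_d$ gain matrices; a learned residual typically enters as an additive correction to this model. Because MPC rolls the same model out through the nonlinear forward dynamics, the resulting optimization is generally nonconvex, which motivates the convex formulation introduced below.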
This talk introduces a convex MPC formulation that incorporates learned residual dynamics for trajectory tracking on robot manipulators. Simulation and experimental results demonstrate that our approach outperforms both CTC and nonlinear MPC with residual models in handling a broad range of model mismatches, and that it can execute at a 500 Hz control rate under significant torque constraints and model discrepancies across multiple links.
Roma Tre University
Title: Visual Action Planning with Multiple Heterogeneous Agents
Abstract: Visual planning methods are promising for tackling complex environments where extracting system states analytically is difficult. At the same time, leveraging multiple heterogeneous agents, with distinct capabilities or embodiments, can significantly enhance the efficiency, robustness, and flexibility of the system. This talk presents a method for enabling visual planning from raw observations using a team of heterogeneous agents. The approach relies on a roadmap constructed in a low-dimensional latent space to guide planning. To support multi-agent execution, potential parallel actions are inferred from a dataset of individual action tuples. The feasibility and cost of these combinations are then evaluated based on the capabilities of the team and embedded in the latent space roadmap for effective multi-agent coordination.
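As a rough sketch of the latent-roadmap idea only (the encoder, data, and merging rule below are hypothetical stand-ins, not the method from the talk), one can cluster encoded observations into graph nodes and let recorded actions define weighted edges:

    import numpy as np
    import networkx as nx

    rng = np.random.default_rng(0)
    proj = rng.standard_normal((8, 64))     # stand-in for a trained encoder

    def encode(obs):
        """Map a raw observation (64-dim stand-in) to a latent vector."""
        return proj @ np.asarray(obs).ravel()

    def build_latent_roadmap(transitions, merge_dist=2.0):
        """Nodes = merged latent states; edges = observed action transitions."""
        G, reps = nx.Graph(), []

        def node_for(z):
            for i, rep in enumerate(reps):
                if np.linalg.norm(z - rep) < merge_dist:
                    return i                # merge nearby latent states
            reps.append(z)
            G.add_node(len(reps) - 1)
            return len(reps) - 1

        for obs_a, action, obs_b in transitions:
            u, v = node_for(encode(obs_a)), node_for(encode(obs_b))
            # Edge cost could reflect team capabilities and action feasibility.
            G.add_edge(u, v, action=action, cost=1.0)
        return G, reps

    # Planning then reduces to graph search over the roadmap, e.g.:
    # path = nx.shortest_path(G, start, goal, weight="cost")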
Google DeepMind
Title: Gemini Robotics: Bringing AI to the Physical World
Abstract: TBA
University of Southern California
Title: Benchmarking and Guiding Physical Reasoning in Vision-Language Models for Robotic Manipulation
Abstract: Recent vision-language models (VLMs) offer promising generalization for open-world manipulation, but their reasoning abilities over low-level physical interactions remain relatively understudied. In this talk, I present ManipBench, a benchmark that evaluates VLMs on over 12,000 multiple-choice questions requiring low-level physical reasoning, such as object-object interactions and deformable material handling. I then introduce IMPACT, a motion-planning framework that prompts VLMs to infer which objects can tolerate physical contact, enabling efficient and semantically informed planning in clutter. Together, these tools highlight use cases, strengths, and limitations of current VLMs for physical reasoning.
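As a conceptual sketch of the contact-tolerance idea only (the query_vlm interface and the cost values below are hypothetical, not the actual IMPACT API):

    # Hypothetical sketch: ask a VLM which objects can tolerate contact,
    # then soften their collision penalties in a motion planner.
    def contact_aware_costs(image, object_names, query_vlm):
        """Return a per-object collision penalty based on VLM answers."""
        costs = {}
        for name in object_names:
            prompt = (f"Can the robot safely push or brush against the "
                      f"'{name}' in this image? Answer yes or no.")
            answer = query_vlm(image, prompt).strip().lower()
            # Contact-tolerant objects get a low penalty; fragile or
            # off-limits objects keep an effectively hard constraint.
            costs[name] = 0.1 if answer.startswith("yes") else 1e6
        return costs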
Google DeepMind
Title: A Seemingly Doable Robotics+Reasoning Problem
Abstract: In this talk, I present a problem that should be solvable by existing robotics and reasoning approaches: Embodied Chess. Playing chess doesn't require dexterous skills, and we already know how to build superhuman chess agents. So, why haven't we combined these capabilities into an end-to-end chess-playing system that uses a camera and a robot arm?
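To make the missing glue concrete, a minimal sketch of one perception-reasoning-action cycle might look like the following, where the perception and actuation callables are hypothetical and only the python-chess engine calls are real APIs:

    import chess
    import chess.engine

    def play_one_move(capture_fen, move_piece, engine):
        """One cycle: camera -> board state -> engine move -> arm motion.

        capture_fen: hypothetical perception routine, image -> FEN string.
        move_piece: hypothetical actuation routine, (from_sq, to_sq) -> None.
        """
        board = chess.Board(capture_fen())           # perceive the board
        result = engine.play(board, chess.engine.Limit(time=0.5))
        move_piece(result.move.from_square, result.move.to_square)
        board.push(result.move)                      # track game state
        return board

    # engine = chess.engine.SimpleEngine.popen_uci("stockfish")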
Worcester Polytechnic Institute
Title: Perception-Action Synergy for Robotic Manipulation
Abstract: Many robotic applications require a robot to manipulate objects in an environment with unknowns or uncertainty. The robot must rely on sensing and perception to guide its actions and obtain feedback on their outcomes in order to handle errors caused by uncertainty. In this talk, I will address the importance of tight perception-action synergy in enabling general-purpose robot manipulators to accomplish complex and contact-rich manipulation tasks, such as complex assembly and deformable-object manipulation, autonomously and robustly.