University of Leeds
Title: Manipulation Planning as Model Simplification
Abstract: I will talk about robotic planners, controllers, and perception systems that use physics-based predictions about the motion of objects in contact. In my group at the University of Leeds, we are interested in developing such systems for cluttered scenes that include rigid and deformable objects. Predicting the dynamics of such high-dimensional systems (i.e., soft/deformable objects, as well as collections of multiple in-contact rigid objects) is extremely expensive computationally. In particular, I will talk about the model-reduction approaches we have investigated to address this problem.
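The abstract does not name a specific reduction technique; purely as a generic illustration, here is a minimal proper orthogonal decomposition (POD) sketch in Python, with all shapes and data as hypothetical placeholders, that projects high-dimensional deformable-object states onto a low-dimensional subspace learned from simulation snapshots.

    import numpy as np

    # Snapshot matrix: each column is one high-dimensional state of the
    # deformable object (e.g., stacked mesh-node positions). The random
    # data below is a stand-in for recorded simulation states.
    n_dof, n_snapshots = 3000, 200
    X = np.random.randn(n_dof, n_snapshots)

    # Proper orthogonal decomposition: SVD of the mean-centered snapshots.
    mean = X.mean(axis=1, keepdims=True)
    U, S, _ = np.linalg.svd(X - mean, full_matrices=False)  # S: mode energies

    k = 10                  # reduced dimension
    basis = U[:, :k]        # leading modes span the low-dimensional subspace

    def reduce_state(x):
        """Project a full state onto the k-dimensional subspace."""
        return basis.T @ (x - mean[:, 0])

    def reconstruct_state(z):
        """Lift reduced coordinates back to the full state space."""
        return mean[:, 0] + basis @ z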
Korea Advanced Institute of Science and Technology
Title: Hierarchical and Modular Neural Network for Manipulation Skill Discovery
Abstract: The idea currently in vogue for developing a general-purpose robot is big models, big data, and end-to-end training. But here is the problem with this approach: it consumes too much power. For instance, the 8-billion-parameter LLaMA model draws 250-300 watts just to make a single inference. And that's for a language model, which only has to process a discrete set of symbols. We can only expect the power requirements to be larger for robotics, which has to process a continuous stream of high-dimensional sensory data to output a sequence of continuous actions. In contrast, the human brain runs on only about 20 watts. This tells us that there is something wrong with how we are building our AI models. In this talk, I will present our lab's recent efforts to discover useful inductive biases for robot manipulation, so that we can do more with less data and smaller models, much as CNNs did for images.
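For scale, the figures above imply a gap of roughly an order of magnitude: $250\text{--}300\,\mathrm{W} \,/\, 20\,\mathrm{W} \approx 12.5\text{--}15\times$, i.e., a single language-model inference draws more than ten times the brain's entire power budget.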
Monash University
Title: Convex Learning-Based Model Predictive Control for Trajectory Tracking in Robotic Manipulators
Abstract: Accurate trajectory tracking is a fundamental control task in robotics, typically achieved using model-based methods. Classical approaches, such as Computed Torque Control (CTC), model the dynamics explicitly from physical laws and use the inverse dynamics to feedback-linearize the system. These nominal models can be enhanced with learning-based methods that capture the discrepancies between model predictions and actual observations. Although computationally efficient, this class of methods cannot handle system constraints explicitly. Optimization-based methods, such as Model Predictive Control (MPC), inherently handle constraints but often involve solving a nonlinear optimization problem, especially when the dynamics model has learned components. Since MPC uses the dynamics model for rollout prediction, it is typically formulated in forward dynamics.
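For reference, the computed-torque law takes the standard form (standard manipulator-dynamics notation, not taken from the talk):

    \tau = M(q)\left(\ddot{q}_d + K_d\,\dot{e} + K_p\,e\right) + C(q,\dot{q})\,\dot{q} + g(q), \qquad e = q_d - q,

where $M$ is the inertia matrix, $C$ the Coriolis/centrifugal term, $g$ gravity, and $K_p$, $K_d$ gain matrices; a learned residual typically enters as an additive correction to this model. Because MPC rolls the same model out through the nonlinear forward dynamics, the resulting optimization is generally nonconvex, which motivates the convex formulation introduced below.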
This talk introduces a convex MPC formulation that incorporates learned residual dynamics for trajectory tracking on robot manipulators. Simulation and experimental results demonstrate that our approach outperforms both CTC and nonlinear MPC with residual models in handling a broad range of model mismatches, and that it can execute at a 500 Hz control rate under significant torque constraints and model discrepancies across multiple links.
Roma Tre University
Title: Visual Action Planning with Multiple Heterogeneous Agents
Abstract: Visual planning methods are promising for tackling complex environments where extracting system states analytically is difficult. At the same time, leveraging multiple heterogeneous agents, with distinct capabilities or embodiments, can significantly enhance the efficiency, robustness, and flexibility of the system. This talk presents a method for enabling visual planning from raw observations using a team of heterogeneous agents. The approach relies on a roadmap constructed in a low-dimensional latent space to guide planning. To support multi-agent execution, potential parallel actions are inferred from a dataset of individual action tuples. The feasibility and cost of these combinations are then evaluated based on the capabilities of the team and embedded in the latent space roadmap for effective multi-agent coordination.
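As a rough sketch of the latent-roadmap idea only (the encoder, data, and merging rule below are hypothetical stand-ins, not the method from the talk), one can cluster encoded observations into graph nodes and let recorded actions define weighted edges:

    import numpy as np
    import networkx as nx

    rng = np.random.default_rng(0)
    proj = rng.standard_normal((8, 64))     # stand-in for a trained encoder

    def encode(obs):
        """Map a raw observation (64-dim stand-in) to a latent vector."""
        return proj @ np.asarray(obs).ravel()

    def build_latent_roadmap(transitions, merge_dist=2.0):
        """Nodes = merged latent states; edges = observed action transitions."""
        G, reps = nx.Graph(), []

        def node_for(z):
            for i, rep in enumerate(reps):
                if np.linalg.norm(z - rep) < merge_dist:
                    return i                # merge nearby latent states
            reps.append(z)
            G.add_node(len(reps) - 1)
            return len(reps) - 1

        for obs_a, action, obs_b in transitions:
            u, v = node_for(encode(obs_a)), node_for(encode(obs_b))
            # Edge cost could reflect team capabilities and action feasibility.
            G.add_edge(u, v, action=action, cost=1.0)
        return G, reps

    # Planning then reduces to graph search over the roadmap, e.g.:
    # path = nx.shortest_path(G, start, goal, weight="cost")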
Google DeepMind
Title: Gemini Robotics: Bringing AI to the Physical World
Abstract: TBA
University of Southern California
Title: Benchmarking and Guiding Physical Reasoning in Vision-Language Models for Robotic Manipulation
Abstract: Recent vision-language models (VLMs) offer promising generalization for open-world manipulation, but their reasoning abilities over low-level physical interactions remain relatively understudied. In this talk, I present ManipBench, a benchmark that evaluates VLMs on over 12,000 multiple-choice questions requiring low-level physical reasoning, such as object-object interactions and deformable material handling. I then introduce IMPACT, a motion-planning framework that prompts VLMs to infer which objects can tolerate physical contact, enabling efficient and semantically informed planning in clutter. Together, these tools highlight use cases, strengths, and limitations of current VLMs for physical reasoning.
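As a conceptual sketch of the contact-tolerance idea only (the query_vlm interface and the cost values below are hypothetical, not the actual IMPACT API):

    # Hypothetical sketch: ask a VLM which objects can tolerate contact,
    # then soften their collision penalties in a motion planner.
    def contact_aware_costs(image, object_names, query_vlm):
        """Return a per-object collision penalty based on VLM answers."""
        costs = {}
        for name in object_names:
            prompt = (f"Can the robot safely push or brush against the "
                      f"'{name}' in this image? Answer yes or no.")
            answer = query_vlm(image, prompt).strip().lower()
            # Contact-tolerant objects get a low penalty; fragile or
            # off-limits objects keep an effectively hard constraint.
            costs[name] = 0.1 if answer.startswith("yes") else 1e6
        return costs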
Google DeepMind
Title: A Seemingly Doable Robotics+Reasoning Problem
Abstract: In this talk, I present a problem that should be solvable by existing robotics and reasoning approaches: Embodied Chess. Playing chess doesn't require dexterous skills, and we already know how to build superhuman chess agents. So, why haven't we combined these capabilities into an end-to-end chess-playing system that uses a camera and a robot arm?
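To make the missing glue concrete, a minimal sketch of one perception-reasoning-action cycle might look like the following, where the perception and actuation callables are hypothetical and only the python-chess engine calls are real APIs:

    import chess
    import chess.engine

    def play_one_move(capture_fen, move_piece, engine):
        """One cycle: camera -> board state -> engine move -> arm motion.

        capture_fen: hypothetical perception routine, image -> FEN string.
        move_piece: hypothetical actuation routine, (from_sq, to_sq) -> None.
        """
        board = chess.Board(capture_fen())           # perceive the board
        result = engine.play(board, chess.engine.Limit(time=0.5))
        move_piece(result.move.from_square, result.move.to_square)
        board.push(result.move)                      # track game state
        return board

    # engine = chess.engine.SimpleEngine.popen_uci("stockfish")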
Worcester Polytechnic Institute
Title: Perception-Action Synergy for Robotic Manipulation
Abstract: Many robotic applications require a robot to manipulate objects in an environment with unknowns or uncertainty. The robot must rely on sensing and perception to guide its actions and obtain feedback on their outcomes in order to handle errors caused by uncertainty. In this talk, I will address the importance of tight perception-action synergy in enabling general-purpose robot manipulators to accomplish complex and contact-rich manipulation tasks, such as complex assembly and deformable-object manipulation, autonomously and robustly.