Robotic Tasks and How to Specify Them?
Task Specification for General-Purpose Intelligent Robots
Delft, Netherlands
July 19th, 2024
With the rise of foundation models, there has been a collective, accelerated pursuit of general-purpose robots. However, we are still far from truly versatile robotic systems. A key challenge lies in how we can effectively impart human knowledge about tasks to robots: the problem of task specification [1, 2, 3, 4, 5, 6, 7, 8]. Despite its importance, the study of task specification has remained largely fragmented in robotics research. Researchers from different disciplines adopt particular task formalisms for particular environments (e.g., goal and subgoal poses, numerical rewards and discount factors, natural and formal language, images, and demonstrations). Furthermore, task specifications are often tailored to, or exploit, a particular algorithm, environment, or inductive bias. A general-purpose robot that operates in unstructured, human-centric environments must be able to understand new tasks and perceive feedback in many forms.
Consider this scenario: you wake up in the morning and ask your robot butler to make a cup of coffee. The robot must decipher the meaning of “a cup of coffee,” identify the required actions, detect the relevant objects, and monitor task progress. Hang on, are object-centric representations the right way to encode the scene? Perhaps operating on pixels or even voxels gives us more detail and opens up the potential for solving more sophisticated tasks. Moreover, the robot has to determine whether “a cup of coffee” has been made successfully. How do we evaluate this? Based on what input modalities? By comparing it to an image of coffee? By tasting the coffee? Through human feedback? Or with some black-box classifier that can magically score the quality of the coffee?
From this simple example, several notions of task specification already emerge. To facilitate discussion, we break instances of task specification into two complementary groups: “formalisms” and “modalities”. The former refers to structured constructs that formally convey a task to a robot, including explicit task objectives such as reward functions, as well as demonstrations, feedback, and other implicit task knowledge. The latter encompasses the sensory modalities that ground and contextualize these “formalisms” in the robot’s particular environment, such as vision, touch, sound, language, gestures, eye gaze, and physical interaction.
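To make the distinction between formalisms and modalities concrete, here is a minimal, hypothetical Python sketch (all class and field names are our own illustrative choices, not drawn from any existing library or from the papers below). It pairs one formalism with the modalities needed to ground it, and expresses the coffee-making example both as a language instruction and as a sparse reward function.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Any


class Formalism(Enum):
    """Structured constructs that formally convey a task (illustrative, not exhaustive)."""
    REWARD_FUNCTION = auto()
    GOAL_STATE = auto()
    DEMONSTRATION = auto()
    LANGUAGE_INSTRUCTION = auto()
    TEMPORAL_LOGIC = auto()


class Modality(Enum):
    """Sensory channels that ground a specification in the robot's environment."""
    VISION = auto()
    LANGUAGE = auto()
    TOUCH = auto()
    SOUND = auto()
    GESTURE = auto()


@dataclass
class TaskSpecification:
    """Pairs one formalism with the modalities needed to ground it."""
    formalism: Formalism
    content: Any  # e.g., an instruction string, a goal image, or a reward callable
    grounding_modalities: list = field(default_factory=list)


# The coffee example above, specified in two different ways.
instruction_spec = TaskSpecification(
    formalism=Formalism.LANGUAGE_INSTRUCTION,
    content="Make me a cup of coffee.",
    grounding_modalities=[Modality.LANGUAGE, Modality.VISION],
)

reward_spec = TaskSpecification(
    formalism=Formalism.REWARD_FUNCTION,
    # A sparse reward: 1.0 once a cup containing coffee is detected, else 0.0.
    content=lambda observation: float(observation.get("coffee_in_cup", False)),
    grounding_modalities=[Modality.VISION, Modality.TOUCH],
)
```

In practice, a single task may carry several such specifications at once; how to combine, translate between, and verify them is exactly the kind of question this workshop aims to discuss.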
In this workshop, we are interested in developing a shared community-level understanding of task specification in various formalisms and modalities. These topics have been studied with different emphases in reinforcement learning [1, 9], human-robot interaction [2, 10], natural language processing [3, 4], formal methods [11], foundation models [5, 12], representation learning [6, 7], and cognitive science [8]. We hope to bring together these sub-communities, assimilate recent advances from diverse perspectives, and lead in-depth discussions toward developing a shared vision of the key open problems in the area.
Discussion topics:
What constitutes a task specification:
E.g., task objective, environment definition, and the robot’s capabilities.
What do we need to explicitly represent vs. implicitly learn from human data (e.g., instructions, demonstrations, preferences)?
What are the characteristics that a task specification framework should have, and how do we integrate them:
E.g., expressivity, verifiability, unambiguity, compactness, and compositionality.
What are the advantages and drawbacks of existing forms of task specification?
Can we invent a universal task specification language?
Communicating task specifications with robots:
How do humans communicate tasks? How can human-human communication inform human-robot communication?
How do we quantify the level of expertise each form of task specification requires?
What are “proper” ways for humans to communicate task knowledge to robots that ensure alignment with human intent?
Compatibility of specification formalisms, sensory modalities, and algorithms:
Are there inherent limitations to some modalities used in task specification? Will using different modalities help? If so, how can we combine them effectively?
How does a good task specification inform data collection in robotics?
How do we design efficient supervision techniques that allow robots to learn from less data or ambiguous information? How can robots express uncertainty over tasks?
What are the suitable algorithms for different task specifications?
Invited Speakers
Hadas Kress-Gazit
Cornell University, USA
Peter Stone
The University of Texas at Austin, USA
Yonatan Bisk
Carnegie Mellon University, USA
David Abel
Google DeepMind, UK
Cédric Colas
MIT, USA
Accepted Papers
DaTAPlan: Data-driven Task Anticipation and Knowledge-driven Planning for Human-robot Collaboration. Karthik Swaminathan, Shivam Singh, Raghav Arora, Snehasis Banerjee, Mohan Sridharan, Madhava Krishna. PDF.
Boosting Autonomous Reinforcement Learning via Action-Free Video and Plasticity Preservation. Daesol Cho, Jigang Kim, H. Jin Kim. PDF.
Few-Shot Task Learning Through Inverse Generative Modeling. Aviv Netanyahu, Yilun Du, Antonia Bronars, Jyothish Pari, Joshua B. Tenenbaum, Tianmin Shu, Pulkit Agrawal. PDF.
“Set It Up!”: Functional Object Arrangement with Compositional Generative Models. Yiqing Xu, Jiayuan Mao, Yilun Du, Tomás Lozano-Pérez, Leslie Pack Kaelbling, David Hsu. PDF.
Reward Machines for Deep RL in Noisy and Uncertain Environments. Andrew C Li, Zizhao Chen, Toryn Q. Klassen, Pashootan Vaezipoor, Rodrigo Toro Icarte, Sheila A. McIlraith. PDF.
RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics. Wentao Yuan, Jiafei Duan, Valts Blukis, Wilbert Pumacay, Ranjay Krishna, Adithyavairavan Murali, Arsalan Mousavian, Dieter Fox. PDF.
Reducing Human-AI Goal State Divergence with Environment Design. Kelsey Sikes, Sarath Sreedharan, Sarah Keren. PDF.
Composing Option Sequences by Adaption: Initial Results. Charles A Meehan, Paul Rademacher, Mark Roberts, Laura M. Hiatt. PDF.
Helpful Robots Requesting Relevant Help. Sarah Keren. PDF.
Affordance-Guided Reinforcement Learning via Visual Prompting. Olivia Y. Lee, Annie Xie, Kuan Fang, Karl Pertsch, Chelsea Finn. PDF.
WoCoCo: Learning Whole-Body Humanoid Control with Sequential Contacts. Chong Zhang, Wenli Xiao, Tairan He, Guanya Shi. PDF.
Contrastive Learning for Scene-Agnostic Task-Orientated Visual Representation. Jiaxu Xing, Leonard Bauersfeld, Yunlong Song, Chunwei Xing, Davide Scaramuzza. PDF.
Multi-Stage Task Specification Learning From Demonstration. Mattijs Baert, Sam Leroux, Pieter Simoens. PDF.
CLIP-Motion: Learning Reward Functions for Robotic Actions Using Consecutive Observations. Xuzhe Dang, Stefan Edelkamp. PDF.
Learning Task Decompositions for Multi-agent Teams. Thomas Chen, Nikhil Pitta, Ameesh Shah, Niklas Lauffer, Sanjit A. Seshia. PDF.
OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics. Peiqi Liu, Yaswanth Orru, Jay Vakil, Chris Paxton, Nur Muhammad Mahi Shafiullah, Lerrel Pinto. PDF.
Verifiably Following Complex Robot Instructions with Foundation Models. Benedict Quartey, Eric Rosen, Stefanie Tellex, George Konidaris. PDF.
LaNMP: A Multifaceted Mobile Manipulation Benchmark for Robots. Ahmed Jaafar, Shreyas Sundara Raman, Yichen Wei, Sofia E. Juliani, Anneke Wernerfelt, Ifrah Idrees, Jason Xinyu Liu, Stefanie Tellex. PDF.
Equivariant Open-vocabulary Pick and Place via Language Kernels and Patch-level Semantic Maps. Mingxi Jia, Haojie Huang, Zhewen Zhang, Chenghao Wang, Linfeng Zhao, Dian Wang, Jason Xinyu Liu, Robin Walters, Robert Platt, Stefanie Tellex. PDF.
Towards Adapting Reinforcement Learning Agents to New Tasks: Insights from Q-Values. Ashwin Ramaswamy, Ransalu Senanayake. PDF.
Cook2LTL: Translating Cooking Recipes to LTL Formulae using Large Language Models. Angelos Mavrogiannis, Christoforos Mavrogiannis, Yiannis Aloimonos. PDF.
Vote-Tree-Planner: Optimizing Execution Order in LLM-based Task Planning Pipeline via Voting Mechanism. Chaoyuan Zhang, Zhaowei Li, Wentao Yuan, Zhihao Zhao. PDF.
Find It Like a Dog: Using Gesture to Improve Robot Object Search. Ivy Xiao He, Madeline H. Pelgrim, Kyle Lee, Falak Pabari, Stefanie Tellex, Thao Nguyen, Daphna Buchsbaum. PDF.
Grounding Language Plans in Demonstrations Through Counterfactual Perturbations. Yanwei Wang, Tsun-Hsuan Wang, Jiayuan Mao, Michael Hagenow, Julie Shah. PDF.
A Minimalist Prompt for Zero-Shot Policy Learning. Meng Song, Xuezhi Wang, Tanay Biradar, Yao Qin, Manmohan Chandraker. PDF.
Organizing Committee
Jason Liu
Brown University, USA
Jason Ma
University of Pennsylvania, USA
Yiqing Xu
National University of Singapore, Singapore
Andi Peng
MIT, USA
Ankit Shah
Brown University, USA
Andreea Bobu
University of California Berkeley, USA
Anca Dragan
University of California Berkeley, USA
Julie Shah
MIT, USA
Dinesh Jayaraman
University of Pennsylvania, USA
Stefanie Tellex
Brown University, USA
Program Committee
Every paper received at least two reviews. We thank all our reviewers for their help!
Alper Ahmetoglu
Akhil Bagaria
Ali Baheri
Yue Cao
Xuzhe Dang
Lakshita Dodeja
Haotian Fu
Ivy Xiao He
Benned Hedegaard
Ahmed Jaafar
Andrew C Li
Hongyu Li
Jason Xinyu Liu
Peiqi Liu
Ziqian Luo
Charles A Meehan
Karan Muvvala
Siddharth Nayak
Naman Shah
Shane Parr
David Paulius
Benedict Quartey
Eric Rosen
Ameesh Shah
Meng Song
Karthik Swaminathan
Zhanyi Sun
Christopher Thierauf
Yichen Wei
Jiaxu Xing
Yiqing Xu
Xinchen Yang
Chong Zhang