Robotic Tasks and How to Specify Them?
Task Specification for General-Purpose Intelligent Robots
Delft, Netherlands
July 19th, 2024
With the rise of foundation models, there has been a collective, accelerated pursuit of general-purpose robots. However, we are still far from truly versatile robotic systems. A key challenge lies in how we can effectively impart human knowledge about tasks to robots: the problem of task specification [1, 2, 3, 4, 5, 6, 7, 8]. Despite its importance, the study of task specification has remained largely fragmented in robotics research. Researchers from different disciplines adopt particular task formalisms for particular environments (e.g., goal and subgoal poses, numerical rewards and discount factors, natural and formal language, images, and demonstrations). Furthermore, task specifications are often tailored to, or exploit, a particular algorithm, environment, or inductive bias. A general-purpose robot that operates in unstructured, human-centric environments must be able to understand new tasks and perceive feedback in many forms.
Consider this scenario: you wake up in the morning and ask your robot butler to make a cup of coffee. The robot must decipher the meaning of “a cup of coffee,” identify the required actions, detect the relevant objects, and monitor task progress. Hang on, are object-centric representations the right way to encode the scene? Perhaps operating on pixels or even voxels gives us more detail and opens up the potential for solving more sophisticated tasks. Moreover, the robot has to determine whether “a cup of coffee” has been made successfully. How do we evaluate this? Based on what input modalities? By comparing it to an image of coffee? By tasting the coffee? Through human feedback? Or with some black-box classifier that can magically score the quality of the coffee?
From this simple example, several notions of task specification already emerge. To facilitate discussion, we break instances of task specification into two complementary groups: “formalisms” and “modalities”. The former refers to structured constructs that formally convey a task to a robot, including explicit task objectives such as reward functions, as well as demonstrations, feedback, and other implicit task knowledge. The latter encompasses the sensory modalities that ground and contextualize these “formalisms” in the robot’s particular environment, such as vision, touch, sound, language, gestures, eye gaze, and physical interaction.
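To make the distinction between formalisms and modalities concrete, here is a minimal, hypothetical Python sketch (all class and field names are our own illustrative choices, not drawn from any existing library or from the papers below). It pairs one formalism with the modalities needed to ground it, and expresses the coffee-making example both as a language instruction and as a sparse reward function.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Any


class Formalism(Enum):
    """Structured constructs that formally convey a task (illustrative, not exhaustive)."""
    REWARD_FUNCTION = auto()
    GOAL_STATE = auto()
    DEMONSTRATION = auto()
    LANGUAGE_INSTRUCTION = auto()
    TEMPORAL_LOGIC = auto()


class Modality(Enum):
    """Sensory channels that ground a specification in the robot's environment."""
    VISION = auto()
    LANGUAGE = auto()
    TOUCH = auto()
    SOUND = auto()
    GESTURE = auto()


@dataclass
class TaskSpecification:
    """Pairs one formalism with the modalities needed to ground it."""
    formalism: Formalism
    content: Any  # e.g., an instruction string, a goal image, or a reward callable
    grounding_modalities: list = field(default_factory=list)


# The coffee example above, specified in two different ways.
instruction_spec = TaskSpecification(
    formalism=Formalism.LANGUAGE_INSTRUCTION,
    content="Make me a cup of coffee.",
    grounding_modalities=[Modality.LANGUAGE, Modality.VISION],
)

reward_spec = TaskSpecification(
    formalism=Formalism.REWARD_FUNCTION,
    # A sparse reward: 1.0 once a cup containing coffee is detected, else 0.0.
    content=lambda observation: float(observation.get("coffee_in_cup", False)),
    grounding_modalities=[Modality.VISION, Modality.TOUCH],
)
```

In practice, a single task may carry several such specifications at once; how to combine, translate between, and verify them is exactly the kind of question this workshop aims to discuss.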
In this workshop, we are interested in developing a shared community-level understanding of task specification in various formalisms and modalities. These topics have been studied with different emphases in reinforcement learning [1, 9], human-robot interaction [2, 10], natural language processing [3, 4], formal methods [11], foundation models [5, 12], representation learning [6, 7], and cognitive science [8]. We hope to bring together these sub-communities, assimilate recent advances from diverse perspectives, and lead in-depth discussions toward developing a shared vision of the key open problems in the area.
Discussion topics:
What constitutes a task specification:
E.g., task objective, environment definition, and the robot’s capabilities.
What do we need to explicitly represent vs. implicitly learn from human data (e.g., instructions, demonstrations, preferences)?
What are the characteristics that a task specification framework should have, and how do we integrate them:
E.g., expressivity, verifiability, unambiguity, compactness, and compositionality.
What are the advantages and drawbacks of existing forms of task specification?
Can we invent a universal task specification language?
Communicating task specifications with robots:
How do humans communicate tasks? How can human-human communication inform human-robot communication?
How do we quantify the level of expertise each form of task specification requires?
What are “proper” ways for humans to communicate task knowledge to robots that ensure alignment with human intent?
Compatibility of specification formalisms, sensory modalities, and algorithms:
Are there inherent limitations to some modalities used in task specification? Will using different modalities help? If so, how can we combine them effectively?
How does a good task specification inform data collection in robotics?
How do we design efficient supervision techniques that allow robots to learn from less data or ambiguous information? How can robots express uncertainty over tasks?
What are the suitable algorithms for different task specifications?
Invited Speakers
Hadas Kress-Gazit
Cornell University, USA
Peter Stone
The University of Texas at Austin, USA
Yonatan Bisk
Carnegie Mellon University, USA
David Abel
Google DeepMind, UK
Cédric Colas
MIT, USA
Accepted Papers
DaTAPlan: Data-driven Task Anticipation and Knowledge-driven Planning for Human-robot Collaboration. Karthik Swaminathan, Shivam Singh, Raghav Arora, Snehasis Banerjee, Mohan Sridharan, Madhava Krishna. PDF.
Boosting Autonomous Reinforcement Learning via Action-Free Video and Plasticity Preservation. Daesol Cho, Jigang Kim, H. Jin Kim. PDF.
Few-Shot Task Learning Through Inverse Generative Modeling. Aviv Netanyahu, Yilun Du, Antonia Bronars, Jyothish Pari, Joshua B. Tenenbaum, Tianmin Shu, Pulkit Agrawal. PDF.
“Set It Up!”: Functional Object Arrangement with Compositional Generative Models. Yiqing Xu, Jiayuan Mao, Yilun Du, Tomás Lozano-Pérez, Leslie Pack Kaelbling, David Hsu. PDF.
Reward Machines for Deep RL in Noisy and Uncertain Environments. Andrew C Li, Zizhao Chen, Toryn Q. Klassen, Pashootan Vaezipoor, Rodrigo Toro Icarte, Sheila A. McIlraith. PDF.
RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics. Wentao Yuan, Jiafei Duan, Valts Blukis, Wilbert Pumacay, Ranjay Krishna, Adithyavairavan Murali, Arsalan Mousavian, Dieter Fox. PDF.
Reducing Human-AI Goal State Divergence with Environment Design. Kelsey Sikes, Sarath Sreedharan, Sarah Keren. PDF.
Composing Option Sequences by Adaption: Initial Results. Charles A Meehan, Paul Rademacher, Mark Roberts, Laura M. Hiatt. PDF.
Helpful Robots Requesting Relevant Help. Sarah Keren. PDF.
Affordance-Guided Reinforcement Learning via Visual Prompting. Olivia Y. Lee, Annie Xie, Kuan Fang, Karl Pertsch, Chelsea Finn. PDF.
WoCoCo: Learning Whole-Body Humanoid Control with Sequential Contacts. Chong Zhang, Wenli Xiao, Tairan He, Guanya Shi. PDF.
Contrastive Learning for Scene-Agnostic Task-Orientated Visual Representation. Jiaxu Xing, Leonard Bauersfeld, Yunlong Song, Chunwei Xing, Davide Scaramuzza. PDF.
Multi-Stage Task Specification Learning From Demonstration. Mattijs Baert, Sam Leroux, Pieter Simoens. PDF.
CLIP-Motion: Learning Reward Functions for Robotic Actions Using Consecutive Observations. Xuzhe Dang, Stefan Edelkamp. PDF.
Learning Task Decompositions for Multi-agent Teams. Thomas Chen, Nikhil Pitta, Ameesh Shah, Niklas Lauffer, Sanjit A. Seshia. PDF.
OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics. Peiqi Liu, Yaswanth Orru, Jay Vakil, Chris Paxton, Nur Muhammad Mahi Shafiullah, Lerrel Pinto. PDF.
Verifiably Following Complex Robot Instructions with Foundation Models. Benedict Quartey, Eric Rosen, Stefanie Tellex, George Konidaris. PDF.
LaNMP: A Multifaceted Mobile Manipulation Benchmark for Robots. Ahmed Jaafar, Shreyas Sundara Raman, Yichen Wei, Sofia E. Juliani, Anneke Wernerfelt, Ifrah Idrees, Jason Xinyu Liu, Stefanie Tellex. PDF.
Equivariant Open-vocabulary Pick and Place via Language Kernels and Patch-level Semantic Maps. Mingxi Jia, Haojie Huang, Zhewen Zhang, Chenghao Wang, Linfeng Zhao, Dian Wang, Jason Xinyu Liu, Robin Walters, Robert Platt, Stefanie Tellex. PDF.
Towards Adapting Reinforcement Learning Agents to New Tasks: Insights from Q-Values. Ashwin Ramaswamy, Ransalu Senanayake. PDF.
Cook2LTL: Translating Cooking Recipes to LTL Formulae using Large Language Models. Angelos Mavrogiannis, Christoforos Mavrogiannis, Yiannis Aloimonos. PDF.
Vote-Tree-Planner: Optimizing Execution Order in LLM-based Task Planning Pipeline via Voting Mechanism. Chaoyuan Zhang, Zhaowei Li, Wentao Yuan, Zhihao Zhao. PDF.
Find It Like a Dog: Using Gesture to Improve Robot Object Search. Ivy Xiao He, Madeline H. Pelgrim, Kyle Lee, Falak Pabari, Stefanie Tellex, Thao Nguyen, Daphna Buchsbaum. PDF.
Grounding Language Plans in Demonstrations Through Counterfactual Perturbations. Yanwei Wang, Tsun-Hsuan Wang, Jiayuan Mao, Michael Hagenow, Julie Shah. PDF.
A Minimalist Prompt for Zero-Shot Policy Learning. Meng Song, Xuezhi Wang, Tanay Biradar, Yao Qin, Manmohan Chandraker. PDF.
Organizing Committee
Jason Liu
Brown University, USA
Jason Ma
University of Pennsylvania, USA
Yiqing Xu
National University of Singapore, Singapore
Andi Peng
MIT, USA
Ankit Shah
Brown University, USA
Andreea Bobu
University of California Berkeley, USA
Anca Dragan
University of California Berkeley, USA
Julie Shah
MIT, USA
Dinesh Jayaraman
University of Pennsylvania, USA
Stefanie Tellex
Brown University, USA
Program Committee
Every paper received at least two reviews. We thank all our reviewers for their help!
Alper Ahmetoglu
Akhil Bagaria
Ali Baheri
Yue Cao
Xuzhe Dang
Lakshita Dodeja
Haotian Fu
Ivy Xiao He
Benned Hedegaard
Ahmed Jaafar
Andrew C Li
Hongyu Li
Jason Xinyu Liu
Peiqi Liu
Ziqian Luo
Charles A Meehan
Karan Muvvala
Siddharth Nayak
Naman Shah
Shane Parr
David Paulius
Benedict Quartey
Eric Rosen
Ameesh Shah
Meng Song
Karthik Swaminathan
Zhanyi Sun
Christopher Thierauf
Yichen Wei
Jiaxu Xing
Yiqing Xu
Xinchen Yang
Chong Zhang