Robotic Tasks and How to Specify Them?
Task Specification for General-Purpose Intelligent Robots
Delft, Netherlands
July 19th, 2024
With the rise of foundation models, there has been a collective, accelerated pursuit of general-purpose robots. However, we are still far from developing truly versatile robotic systems. A key challenge lies in how we can effectively impart human knowledge about tasks to robots: the problem of task specification [1, 2, 3, 4, 5, 6, 7, 8]. Despite its importance, research on task specification has remained largely fragmented in robotics. Researchers in different subfields adopt particular task formalisms for particular environments (e.g., goal and subgoal poses, numerical rewards and discount factors, natural and formal language, images, and demonstrations). Furthermore, a task specification is often tailored to, or exploits, a particular algorithm, environment, or inductive bias. A general-purpose robot that can operate in unstructured, human-centric environments must be able to understand new tasks and perceive feedback in many forms.
Consider this scenario: you wake up in the morning and ask your robot butler to make a cup of coffee. The robot must decipher the meaning of "a cup of coffee," identify the required actions, detect the relevant objects, and monitor task progress. But wait, are object-centric representations the right way to encode the scene? Perhaps operating on pixels or even voxels gives us more detail and opens up the potential for solving more sophisticated tasks. Moreover, the robot has to determine whether “a cup of coffee” has been made successfully. How do we evaluate this, and based on what input modalities? By comparing it to an image of coffee? By tasting the coffee? Through human feedback? Or with some black-box classifier that can magically score the quality of the coffee?
From this simple example, we already see several notions of task specification emerge. To facilitate discussion, we break instances of task specification into two complementary groups: “formalisms” and “modalities”. The former specifies structured constructs that formally convey a task to a robot. This includes task objectives in the form of reward functions, demonstrations, or feedback, as well as additional factors such as the environment definition, robot capabilities, and other implicit task knowledge. The latter encompasses sensory modalities that ground and contextualize various specification “formalisms” to the particular robot environment, such as vision, touch, sound, language, gestures, eye-gaze, and physical interactions.
In this workshop, we are interested in developing a shared community-level understanding of task specification in various formalisms and modalities. These topics have been studied with different emphases in reinforcement learning [1, 9], human-robot interaction [2, 10], natural language processing [3, 4], formal methods [11], foundation models [5, 12], representation learning [6, 7], and cognitive science [8]. We hope to bring together these sub-communities, assimilate recent advances from diverse perspectives, and lead in-depth discussions toward developing a shared vision of the key open problems in the area.
Discussion topics:
What constitutes a task specification:
E.g., task objective, environment definition, robot’s capability.
What do we need to explicitly represent vs. implicitly learn from human data (e.g., instructions, demonstrations, preferences, etc.)?
What are the characteristics that a task specification framework should have, and how do we integrate them:
E.g., expressivity, verifiability, unambiguity, compactness, and compositionality.
What are the advantages and drawbacks of existing forms of task specification?
Can we invent a universal task specification language?
Communicating task specifications with robots:
How do humans communicate tasks? How can human-human communication inform human-robot communication?
How do we quantify the expertise required by each form of task specification?
What are “proper” ways for humans to communicate task knowledge with robots that ensure alignment with human intents?
Compatibility of specification formalisms, sensory modalities, and algorithms:
Are there inherent limitations to some modalities used in task specification? Will using different modalities help? If so, how can we combine them effectively?
How does a good task specification inform data collection in robotics?
How can we design efficient supervision techniques that allow robots to learn from less data or from ambiguous information? How can robots express uncertainty over tasks?
What are the suitable algorithms for different task specifications?
Speakers and Panelists
Hadas Kress-Gazit
Cornell University, USA
Peter Stone
The University of Texas at Austin, USA
Jesse Thomason
University of Southern California, USA
David Abel
Google DeepMind, UK
Dagmar Sternad
Northeastern University, USA
Cédric Colas
MIT, USA
Organizing Committee
Jason Liu
Brown University, USA
Jason Ma
University of Pennsylvania, USA
Yiqing Xu
National University of Singapore, Singapore
Andi Peng
MIT, USA
Ankit Shah
Brown University, USA
Andreea Bobu
University of California Berkeley, USA
Anca Dragan
University of California Berkeley, USA
Julie Shah
MIT, USA
Dinesh Jayaraman
University of Pennsylvania, USA
Stefanie Tellex
Brown University, USA