Robotic Tasks and How to Specify Them? 

Task Specification for General-Purpose Intelligent Robots

Delft, Netherlands

July 19th, 2024

With the rise of foundation models, there has been a collective, accelerated pursuit of general-purpose robots. However, we are still far from developing truly versatile robotic systems. A key challenge lies in how we can effectively impart human knowledge about tasks to robots: the problem of task specification [1, 2, 3, 4, 5, 6, 7, 8]. Despite its importance, research on task specification has remained largely fragmented in robotics. Researchers from different disciplines adopt particular task formalisms for different environments (e.g., goal and subgoal poses, numerical rewards and discount factors, natural and formal language, images, and demonstrations). Furthermore, task specifications are often tailored to, or exploit, a particular algorithm, environment, or inductive bias. A general-purpose robot that operates in unstructured, human-centric environments must be able to understand new tasks and perceive feedback in various forms.


Consider this scenario: you wake up in the morning and ask your robot butler to make a cup of coffee. The robot must decipher the meaning of "a cup of coffee," identify the required actions, detect the relevant objects, and monitor task progress. Hang on, are object-centric representations the right way to encode the scene? Perhaps operating on pixels or even voxels gives us more detail and opens up the potential for solving more sophisticated tasks. Moreover, the robot has to determine whether the cup of coffee was made successfully. How do we evaluate this? Based on what input modalities? By comparing against an image of coffee? Tasting the coffee? Human feedback? Or some black-box classifier that can magically score the quality of the coffee?
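The alternatives raised in this scenario can all be viewed as interchangeable "success evaluators." As a purely hypothetical sketch (the `SuccessEvaluator` protocol and both implementations below are illustrative, not an existing API), evaluators over different input modalities could share a single scoring interface:

```python
from typing import Protocol

import numpy as np


class SuccessEvaluator(Protocol):
    """Hypothetical interface: score task success from some observation."""

    def score(self, observation) -> float:
        """Return a success score in [0, 1]."""
        ...


class GoalImageEvaluator:
    """Compares the current camera image against a goal image of coffee."""

    def __init__(self, goal_image: np.ndarray):
        self.goal_image = goal_image.astype(np.float32)

    def score(self, observation: np.ndarray) -> float:
        # Naive pixel-space similarity; a learned embedding would be more robust.
        diff = np.abs(observation.astype(np.float32) - self.goal_image).mean()
        return float(1.0 / (1.0 + diff))


class HumanFeedbackEvaluator:
    """Defers to a person's judgment (e.g., a verbal yes/no)."""

    def score(self, observation) -> float:
        answer = input("Was the coffee made successfully? [y/n] ")
        return 1.0 if answer.strip().lower().startswith("y") else 0.0
```

Under this framing, "comparing to an image," "tasting," and "asking a human" differ only in which observation stream and judgment mechanism feed the evaluator.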


From this simple example, several notions of task specification already emerge. To facilitate discussion, we divide instances of task specification into two complementary groups: "formalisms" and "modalities." The former comprises the structured constructs that formally convey a task to a robot: task objectives in the form of reward functions, demonstrations, or feedback, as well as additional factors such as the environment definition, the robot's capabilities, and other implicit task knowledge. The latter encompasses the sensory modalities that ground and contextualize these formalisms in the robot's particular environment, such as vision, touch, sound, language, gestures, eye gaze, and physical interaction.
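To make the formalism/modality distinction concrete, here is a minimal, hypothetical Python sketch (all names, including `TaskSpecification`, are illustrative rather than drawn from any existing library) pairing a formalism with the modalities that ground it:

```python
from dataclasses import dataclass
from enum import Enum, auto


class Formalism(Enum):
    """Structured constructs that formally convey a task."""
    REWARD_FUNCTION = auto()
    GOAL_POSE = auto()
    DEMONSTRATION = auto()
    NATURAL_LANGUAGE = auto()
    FORMAL_LANGUAGE = auto()  # e.g., a temporal logic formula
    HUMAN_FEEDBACK = auto()


class Modality(Enum):
    """Sensory channels that ground a formalism in the robot's environment."""
    VISION = auto()
    TOUCH = auto()
    SOUND = auto()
    LANGUAGE = auto()
    GESTURE = auto()
    EYE_GAZE = auto()
    PHYSICAL_INTERACTION = auto()


@dataclass
class TaskSpecification:
    """One task, conveyed via a formalism and grounded through one or more modalities."""
    formalism: Formalism
    modalities: list[Modality]
    payload: object = None  # e.g., a reward callable, a trajectory, or an instruction


# "Make a cup of coffee" as a natural-language instruction grounded in
# language and vision:
coffee_task = TaskSpecification(
    formalism=Formalism.NATURAL_LANGUAGE,
    modalities=[Modality.LANGUAGE, Modality.VISION],
    payload="make a cup of coffee",
)
```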


In this workshop, we are interested in developing a shared community-level understanding of task specification in various formalisms and modalities. These topics have been studied with different emphases in reinforcement learning [1, 9], human-robot interaction [2, 10], natural language processing [3, 4], formal methods [11], foundation models [5, 12], representation learning [6, 7], and cognitive science [8]. We hope to bring together these sub-communities, assimilate recent advances from diverse perspectives, and lead in-depth discussions toward developing a shared vision of the key open problems in the area.

Discussion topics:

- What constitutes a task specification?
- What characteristics should a task specification framework have, and how do we integrate them?
- Communicating task specifications with robots
- Compatibility of specification formalisms, sensory modalities, and algorithms



Speakers and Panelists

Hadas Kress-Gazit

Cornell University, USA

Peter Stone

The University of Texas at Austin, USA

Jesse Thomason

University of Southern California, USA

David Abel

Google DeepMind, UK

Dagmar Sternad

Northeastern University, USA

Cédric Colas

MIT, USA

Organizing Committee

Jason Liu

Brown University, USA

Jason Ma

University of Pennsylvania, USA

Yiqing Xu

National University of Singapore, Singapore

Andi Peng

MIT, USA

Ankit Shah

Brown University, USA

Andreea Bobu

University of California Berkeley, USA

Anca Dragan

University of California Berkeley, USA

Julie Shah

MIT, USA

Dinesh Jayaraman

University of Pennsylvania, USA

Stefanie Tellex

Brown University, USA