Reasoning is crucial for developing robots capable of open-world manipulation. Humans learn to interpret the world through numerical and physical laws as well as logical principles, which raises the question: can we equip robots with the same capacity for reasoning? Many everyday manipulation tasks require simple reasoning grounded in visual perception and natural language understanding. Open-vocabulary semantic segmentation models enable robots to handle diverse visual and linguistic inputs, providing a solid foundation on which reasoning can be built. The arrival of these models calls for a new set of tools that practitioners can use to elicit reasoning capabilities from them, including prompt engineering, in-context learning, and fine-tuning. We will discuss how to formalize and codify these practices for both fundamental developers and applied practitioners.