Reasoning is crucial for developing robots capable of open-world manipulation. Humans learn to interpret the world through numerical and physical laws as well as logical principles, which raises the question: can we equip robots with the same capacity for reasoning? Many everyday manipulation tasks require simple reasoning grounded in visual perception and natural language understanding. Open-vocabulary semantic segmentation models enable robots to handle diverse visual and linguistic inputs, providing a solid foundation on which reasoning can be built. The arrival of these models calls for a new set of tools that practitioners can use to elicit reasoning capabilities from them, including prompt engineering, in-context learning, and fine-tuning. We will discuss how to formalize and codify these practices for both fundamental developers and applied practitioners.