Composing Pre-Trained Object-Centric Representations for Robotics From "What" and "Where" Foundation Models