Imitation Learning (IL)


Robot imitation learning enables robots to acquire task behaviors directly from expert demonstrations, making it a data-efficient approach to learning complex manipulation skills. Our lab studies imitation learning in the context of vision-based robot control and Vision-Language-Action (VLA) models. In particular, we propose RetoVLA, an architecture that reuses the register tokens normally discarded by Vision Transformers (ViTs), injecting them into the action generation module to improve spatial reasoning. This approach has shown strong performance gains on real robot tasks that require complex spatial understanding. We also develop Depth-ACT, a framework that combines RGB and depth encoders to provide imitation learning policies with richer 3D scene information.
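As a rough illustration of the register-token idea, the sketch below splits a ViT output sequence into CLS, register, and patch tokens, then fuses pooled register features with the usual visual feature before a toy action head. All shapes, names, and the linear head are hypothetical choices for illustration; this is not the actual RetoVLA implementation.

```python
import numpy as np

# Hypothetical token layout: [CLS | register tokens | patch tokens].
NUM_REG = 4     # number of register tokens (assumed)
EMBED_DIM = 8   # ViT embedding dimension (toy size)
ACTION_DIM = 7  # e.g. 6-DoF end-effector pose + gripper (assumed)

rng = np.random.default_rng(0)

def split_vit_tokens(tokens, num_reg=NUM_REG):
    """Split ViT output of shape [1 + num_reg + P, D] into its parts."""
    cls_tok = tokens[0]
    reg_toks = tokens[1 : 1 + num_reg]
    patch_toks = tokens[1 + num_reg :]
    return cls_tok, reg_toks, patch_toks

def action_head(features, w):
    """Toy linear action head mapping fused features to an action vector."""
    return features @ w

# Toy ViT output: 1 CLS + 4 register + 16 patch tokens.
tokens = rng.standard_normal((1 + NUM_REG + 16, EMBED_DIM))
cls_tok, reg_toks, patch_toks = split_vit_tokens(tokens)

# Instead of discarding reg_toks, pool them and concatenate with the
# usual visual feature (here: mean-pooled patch tokens) so the action
# head also sees the register-token information.
visual_feat = patch_toks.mean(axis=0)
reg_feat = reg_toks.mean(axis=0)
fused = np.concatenate([visual_feat, reg_feat])  # shape (2 * EMBED_DIM,)

w = rng.standard_normal((2 * EMBED_DIM, ACTION_DIM))
action = action_head(fused, w)
print(action.shape)  # (7,)
```

The same concatenation pattern can sketch the Depth-ACT direction as well: replace `reg_feat` with a pooled feature from a separate depth encoder, and the policy head receives fused RGB and depth information.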

Through these efforts, our research aims to improve spatial awareness, data efficiency, and real-world applicability in robot imitation learning.