Featured Latest... 🌱🌷🍃

💎 Robotic VLA: We proposed a joint learning strategy with motion-image diffusion that equips VLA models with motion-reasoning capabilities: the VLA is extended into a dual-head architecture, pairing the standard action head with a DiT-based motion head for language-conditioned optical flow prediction. Great job, Yu! 👍
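To make the dual-head idea concrete, here's a minimal numpy sketch of the joint objective: a shared embedding feeds an action head (plain regression) and a motion head trained diffusion-style to denoise a noisy optical-flow map. Everything here (the tiny MLPs, dimensions, noise schedule) is an illustrative stand-in, not the actual model or its DiT motion head.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, b1, w2, b2):
    # two-layer MLP with ReLU (stand-in for either head)
    h = np.maximum(x @ w1 + b1, 0.0)
    return h @ w2 + b2

D, A = 32, 7      # shared embedding dim, action dim (illustrative)
H, W = 8, 8       # flow map resolution; 2 channels (dx, dy)

# shared backbone output for one (image, instruction) pair -- stand-in
z = rng.standard_normal(D)

# action head: embedding -> action vector
aw1, ab1 = rng.standard_normal((D, 64)) * 0.1, np.zeros(64)
aw2, ab2 = rng.standard_normal((64, A)) * 0.1, np.zeros(A)
action = mlp(z, aw1, ab1, aw2, ab2)

# motion head (stand-in for the DiT denoiser): takes the noisy flow plus
# timestep conditioning and predicts the noise, as in diffusion training
mw1, mb1 = rng.standard_normal((D + H * W * 2 + 1, 128)) * 0.1, np.zeros(128)
mw2, mb2 = rng.standard_normal((128, H * W * 2)) * 0.1, np.zeros(H * W * 2)

flow = rng.standard_normal(H * W * 2)   # ground-truth optical flow, flattened
t = 0.5                                 # diffusion timestep in [0, 1]
noise = rng.standard_normal(H * W * 2)
noisy_flow = np.sqrt(1 - t) * flow + np.sqrt(t) * noise

inp = np.concatenate([z, noisy_flow, [t]])
pred_noise = mlp(inp, mw1, mb1, mw2, mb2)

# joint loss: action regression + flow-denoising, optimized together
target_action = rng.standard_normal(A)
loss = np.mean((action - target_action) ** 2) + np.mean((pred_noise - noise) ** 2)
```

The key point is simply that both heads share one embedding `z`, so gradients from the flow-denoising term shape the same representation the action head reads from.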

💎 Video agent: The old passive video-perception setup just doesn't make sense anymore. Grabbing all visual info once, with fixed granularity and no query awareness, is inefficient and overloads the model. So we built Active Video Perception (AVP) — an agentic, evidence-seeking framework that treats a video as an interactive environment to be explored in a goal-directed manner. Check out my LinkedIn post. Excellent work, Ziyang!
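The evidence-seeking loop can be caricatured in a few lines: instead of ingesting every frame up front, the agent probes the video coarsely, keeps only query-relevant observations, and zooms in around hits until its budget runs out. This is a toy sketch of the idea under my own assumptions — the function names and the probe-then-refine policy are illustrative, not the AVP API.

```python
def active_video_perception(num_frames, inspect, relevant, budget=8):
    """Toy evidence-seeking loop: probe evenly spaced frames first,
    then refine around frames whose observations matter for the query."""
    evidence, seen = [], set()
    # coarse pass: evenly spaced probes across the video
    frontier = list(range(0, num_frames, max(1, num_frames // 4)))
    while frontier and len(seen) < budget:
        f = frontier.pop(0)
        if f in seen or not (0 <= f < num_frames):
            continue
        seen.add(f)
        obs = inspect(f)                 # query the "environment": one frame
        if relevant(obs):                # keep only goal-directed evidence
            evidence.append((f, obs))
            frontier += [f - 1, f + 1]   # zoom in around useful frames
    return evidence

# usage: a 100-frame "video" where the answer lives in frames 48-52
ev = active_video_perception(
    100,
    inspect=lambda f: f,
    relevant=lambda f: 48 <= f <= 52,
)
```

With a budget of 8 frame inspections the loop recovers all five relevant frames, versus the 100 a passive grab-everything pass would pay for.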