This tutorial provides a structured introduction to the foundations and frontiers of Embodied AI (EAI), with a particular focus on the computational architectures behind robot motion generation. We begin by tracing the evolution of EAI models from early transformer-based agents to the latest vision-language-action frameworks built on diffusion and flow-matching techniques. The tutorial dissects the architectures of current action expert modules—token-based transformers and diffusion-based denoising models—and discusses their implications for dexterity, inference speed, and real-world deployment. In addition to covering state-of-the-art models such as RT-2, PaLM-E, and π-0, we explore unsolved challenges in scaling laws, out-of-distribution (OOD) generalization, and real-world data bottlenecks. Finally, we invite participants to an open discussion of the deeper theoretical gaps in EAI—what might serve as the “aerodynamics” of robotics—and how we might shape the next generation of general-purpose embodied agents. This tutorial is designed for researchers and practitioners seeking both foundational insights and forward-looking perspectives on Embodied AI.