This project develops a software-first pipeline for a hand-launched fixed-wing glider that autonomously returns to its launch point while maximizing airtime, path length, or both. A reinforcement learning (RL) guidance policy is trained in a physics-based six-degree-of-freedom (6-DOF) simulator that models wind, actuator limits, and sensor noise and dropouts, with domain randomization used to improve sim-to-real robustness. The deployed system is hierarchical: an ESP32 executes real-time stabilization and safety enforcement, while a Raspberry Pi 5 runs RL inference and navigation at a lower rate. State estimation uses a GPS receiver and an IMU, and a downward-facing LiDAR provides height above ground for enforcing altitude constraints. This Term 1 report specifies the requirements, architecture, and evaluation plan for simulation, training, and embedded deployment.
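As an illustration of the domain randomization mentioned above, the following is a minimal sketch of how per-episode simulator parameters might be drawn before each training rollout. The names `EpisodeParams` and `sample_episode_params`, the choice of parameters, and all numeric ranges are hypothetical placeholders for illustration, not values taken from this report.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeParams:
    """Hypothetical per-episode simulator settings (illustrative only)."""
    wind_speed_mps: float     # mean wind speed
    wind_dir_rad: float       # direction of the mean wind
    servo_limit_deg: float    # control-surface deflection limit
    imu_noise_std: float      # standard deviation of additive IMU noise
    gps_dropout_prob: float   # probability that a GPS fix is dropped per step


def sample_episode_params(rng: random.Random) -> EpisodeParams:
    """Draw one randomized parameter set; all ranges are assumed placeholders."""
    return EpisodeParams(
        wind_speed_mps=rng.uniform(0.0, 6.0),
        wind_dir_rad=rng.uniform(0.0, 2.0 * math.pi),
        servo_limit_deg=rng.uniform(15.0, 25.0),
        imu_noise_std=rng.uniform(0.0, 0.05),
        gps_dropout_prob=rng.uniform(0.0, 0.2),
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a training run can be reproduced
    for episode in range(3):
        print(episode, sample_episode_params(rng))
```

Sampling from a seeded `random.Random` instance rather than the module-level generator keeps the randomization reproducible and independent of other code that draws random numbers.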