We aim to improve the accuracy and efficiency of physical simulations by using computer vision. The project develops a system with spatial perception capabilities that captures and interprets real-world environments in three dimensions. Using machine learning models, the system can reconstruct spatial layouts and track objects as they move within a scene. This spatial data then drives realistic physics simulations, with applications in robotics, augmented reality, virtual reality, and autonomous systems. The ultimate goal is a seamless interface between the digital and physical worlds that allows more intuitive and precise interaction with complex environments.
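
As a rough illustration of how tracked spatial data could feed a simulation, the sketch below seeds a simulated body from two tracked 3D positions and advances it under gravity. Everything in it is an assumption made for illustration, not the project's implementation: the names `Body`, `body_from_track`, and `step`, the z-up axis convention, the flat ground plane, and the choice of semi-implicit Euler integration are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]

GRAVITY = -9.81  # m/s^2 along z (assumes z is the up axis)


@dataclass
class Body:
    """Hypothetical simulated body: position in metres, velocity in m/s."""
    position: Vec3
    velocity: Vec3


def body_from_track(p_prev: Vec3, p_curr: Vec3, dt: float) -> Body:
    """Seed a body from two tracked positions taken dt seconds apart.

    Velocity is estimated with a backward finite difference.
    """
    velocity = tuple((c - p) / dt for p, c in zip(p_prev, p_curr))
    return Body(position=tuple(p_curr), velocity=velocity)


def step(body: Body, dt: float, ground_z: float = 0.0) -> Body:
    """Advance one semi-implicit Euler step under gravity.

    The velocity is updated first, then the position uses the new velocity.
    The body is clamped to an assumed flat ground plane at z = ground_z.
    """
    vx, vy, vz = body.velocity
    vz += GRAVITY * dt
    x, y, z = body.position
    x, y, z = x + vx * dt, y + vy * dt, z + vz * dt
    if z < ground_z:
        z, vz = ground_z, 0.0  # inelastic contact: stop at the ground
    return Body(position=(x, y, z), velocity=(vx, vy, vz))
```

For example, `step(body_from_track((0.0, 0.0, 1.0), (0.1, 0.0, 1.0), 0.1), 0.1)` takes an object observed moving along x at a height of 1 m and predicts its state one time step later. A real system would replace the finite-difference estimate with a filtered tracker output and the hand-written integrator with a physics engine.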