Abstract
Physics Engines (PEs) are fundamental software frameworks that simulate physical interactions in applications ranging from entertainment to safety-critical systems. Despite their importance, PEs suffer from physics failures: deviations from expected physical behaviors that can compromise software reliability, degrade user experience, and potentially cause critical failures in autonomous vehicles or medical robotics. Current testing approaches for PE-based software are inadequate, typically requiring white-box access and focusing on crash detection rather than semantically complex physics failures. This paper presents the first large-scale empirical study characterizing physics failures in PE-based software. We investigate two research questions, addressing the manifestations of physics failures and the effectiveness of detection techniques. Our contributions include: (1) a taxonomy of physics failure manifestations; (2) a comprehensive evaluation of detection methods including deep learning, prompt-based techniques, and large multimodal models; and (3) recommendations for improving detection approaches.
We arrived at a final dataset, PhysiXFails, comprising 1,000 video clips. These include 500 instances where the game adhered to real-world physical laws (non-buggy videos) and 500 instances of physics-related bugs (buggy videos). The non-buggy videos were all sourced from YouTube, while the buggy videos came from multiple platforms: 429 from YouTube, 54 from Reddit, and 17 from the GlitchBench dataset.
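As a quick sanity check on the composition described above, the following minimal sketch tallies the reported per-platform counts. The variable names are illustrative and are not taken from the released code; only the counts come from the text.

```python
# Per-platform clip counts as reported in the dataset description.
# Variable names are hypothetical; only the numbers come from the text.
buggy_sources = {"YouTube": 429, "Reddit": 54, "GlitchBench": 17}
non_buggy_sources = {"YouTube": 500}

buggy_total = sum(buggy_sources.values())
non_buggy_total = sum(non_buggy_sources.values())
dataset_total = buggy_total + non_buggy_total

# Share of each platform among the buggy clips, in percent.
buggy_share = {
    name: 100 * count / buggy_total for name, count in buggy_sources.items()
}

print(f"buggy={buggy_total}, non-buggy={non_buggy_total}, total={dataset_total}")
for name, share in buggy_share.items():
    print(f"{name}: {share:.1f}% of buggy clips")
```

The totals confirm that the two classes are balanced at 500 clips each.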
Results
This section presents results for the detection methods we evaluated.
Code
We have released the code for this work.
💻 Code
Performance on Complex Multi-Violation Cases
The following figures present performance on physicsbenchmulti, a more challenging subset containing videos with multiple simultaneous physics violations. The results reveal how different approaches handle this increased complexity.
Performance Summary of Violation Detection (VD) on physicsbenchmulti. This figure compares different methods based on F1 Score (x-axis) and AUC-ROC (y-axis) for VD tasks on physicsbenchmulti. Higher values indicate better performance, with the upper-right quadrant representing the most effective models. Triangles denote multi-violation cases, while squares represent single-violation cases.
Performance Summary of Violation Identification (VI) on physicsbenchmulti. This figure compares different methods based on F1 Score (x-axis) and AUC-ROC (y-axis) for VI tasks on physicsbenchmulti. Higher values indicate better performance, with the upper-right quadrant representing the most effective models. Triangles denote multi-violation cases, while squares represent single-violation cases.
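For readers less familiar with the two axes used in these figures, the sketch below computes F1 Score and AUC-ROC for a binary detector in plain Python. It is an illustration of the standard metric definitions, not the evaluation code used in the paper; the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for the example.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive (buggy) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc_roc(y_true, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. This pairwise form is equivalent to the area
    under the ROC curve.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = buggy video, 0 = non-buggy video.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

f1 = f1_score(labels, preds)
auc = auc_roc(labels, scores)
print(f"F1 = {f1:.3f}, AUC-ROC = {auc:.3f}")
```

F1 depends on the chosen decision threshold, whereas AUC-ROC summarizes ranking quality across all thresholds, which is why plotting both gives a fuller picture of a detector.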
Details of Manifestation Categories
In Section 4.2 of the paper, we provide a detailed description of each manifestation category, along with examples to enhance understanding. Here, we show examples of all categories.
Weightlessness: Objects remain stationary, move uniformly, or drift erratically in midair without any external force, violating the principle of gravity. For example, Figure shows a police car hovering above a city street without visible support.
Anti-Gravity: Objects accelerate upward without any external force acting against gravity, contradicting the expected direction of gravitational pull. For example, Figure depicts a cup resting on a balcony suddenly floating upwards without cause.
Delayed Gravity Effect: Gravity is delayed or temporarily suspended, causing objects to remain suspended in the air for a period before falling, which defies the principle of instantaneous gravitational influence. For example, a person floats in midair for a while before eventually beginning to fall.
Clipping-Through (Phantom Overlap): Objects fail to register collisions upon contact, resulting in mutual penetration or embedding. This behavior violates the law of impenetrability. For example, Figure shows a car driving through and embedding itself into the road surface.
Spontaneous Rapid Spinning: Objects begin to spin rapidly without the presence of external torque or energy input, violating the conservation of angular momentum and mechanical energy. For example, a cone-shaped barrier spontaneously begins spinning at high speed without any external force.
Sudden Appearance or Disappearance: Objects materialize or vanish without following the law of conservation of mass and matter. Figure presents a scenario where a person intermittently appears and disappears in front of an autonomous vehicle’s imaging system.
Spontaneous Motion: Objects accelerate or decelerate without any external force, violating Newton’s First Law of Motion. For example, a person running inside a stationary kayak causes the kayak to move forward, incorrectly treating internal forces as external.
Uncaused Directional Change: Objects change direction abruptly without external force, violating the principle of inertia. Figure shows a bowling ball abruptly swerving from side to side while rolling forward.
Instantaneous State Transition: The movement or position of an object undergoes a sudden, discontinuous jump, violating the continuity of space and time. For example, Figure depicts a soldier teleporting instantaneously to a distant location.
Incorrect Force-Acceleration Relation: The net force applied to an object does not match the product of its mass and acceleration, violating Newton’s Second Law. For example, a person kicks a heavy truck, causing the truck to fly away, indicating an incorrect force calculation.
Biomechanical Failures: Biomechanical failures occur when the motion, posture, or anatomy of biological entities violate fundamental biological principles. For example, Figure shows a pit crew member in a racing game missing their head, violating basic anatomical rules.
Buoyancy Violations: Buoyancy, governed by an object’s density and fluid properties, is violated when objects do not float or sink in accordance with these factors. Figure illustrates a person appearing to float on the ocean’s surface, contradicting the expected behavior based on buoyancy laws.
Fluid Dynamics Failures: Fluid dynamics involves the interaction of liquids with objects and other fluids. Physics failures in this area occur when liquid motion or interaction defies physical laws governing fluid behavior. For example, a person walks into a lake, but the water's surface shows no ripples or disturbance.
Aerodynamics Violations: Aerodynamics governs the motion of objects in air, including drag, lift, and fluid flow. Failures in aerodynamics are observed when objects behave in ways that contradict these principles. Figure shows a helicopter flying inverted between mountains, defying the laws of lift and drag.
Thermodynamic Anomalies: Thermodynamics relates to heat transfer, energy conversion, and changes in entropy. Failures in thermodynamics occur when systems violate these laws. For example, a person on fire is suddenly extinguished without any external cause, violating conservation of energy and heat transfer principles.
Optical Physics Violations: Optical physics governs the behavior of light, including reflection, refraction, and linear propagation. Failures in this domain include violations of these principles, such as incorrect light propagation. Figure shows a telescope aimed at the sky displaying an image inside that does not match the external view.
Structural and Material Mechanics Failures: Structural and material mechanics govern how materials behave under stress and deformation, guided by principles such as Hooke’s Law and Young’s Modulus. Failures in this area include unrealistic material deformations under force. For example, a car’s body continuously twists and deforms while driving, violating principles of structural integrity and elasticity.
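Several of the categories above can be restated as simple kinematic conditions. The sketch below does this for three of them (Instantaneous State Transition, Weightlessness or Delayed Gravity Effect, and Incorrect Force-Acceleration Relation), assuming that per-frame object positions, support flags, forces, and masses are already available. This is purely an illustration of the definitions: it is not one of the detection methods evaluated in the paper, which operate on video, and every function name, threshold, and input value here is hypothetical.

```python
def find_position_jumps(positions, dt, max_speed):
    """Frame indices where the implied speed exceeds max_speed.

    A very large implied speed between consecutive frames suggests a
    discontinuous jump (Instantaneous State Transition).
    """
    jumps = []
    for i in range(1, len(positions)):
        (x0, y0), (x1, y1) = positions[i - 1], positions[i]
        speed = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5 / dt
        if speed > max_speed:
            jumps.append(i)
    return jumps


def longest_unsupported_hover(heights, supported, tol=1e-3):
    """Longest run of frame steps where an unsupported object keeps its height.

    A long run suggests Weightlessness; a short run followed by a fall
    suggests a Delayed Gravity Effect.
    """
    longest = current = 0
    for i in range(1, len(heights)):
        still = abs(heights[i] - heights[i - 1]) < tol
        if still and not supported[i] and not supported[i - 1]:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def violates_second_law(net_force, mass, acceleration, rel_tol=0.1):
    """True if the observed acceleration deviates from F / m by more than rel_tol."""
    expected = net_force / mass
    return abs(acceleration - expected) > rel_tol * max(abs(expected), 1e-9)


# Hypothetical inputs: 30 fps tracking, positions and heights in metres.
track = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (25.0, 0.0), (25.1, 0.0)]
jumps = find_position_jumps(track, dt=1 / 30, max_speed=50.0)

heights = [2.0, 2.0, 2.0, 2.0, 1.9, 1.6]
hover = longest_unsupported_hover(heights, supported=[False] * 6)

# A 500 N kick on a 10,000 kg truck that accelerates at 20 m/s^2.
kicked_truck = violates_second_law(net_force=500.0, mass=10000.0, acceleration=20.0)

print(jumps, hover, kicked_truck)
```

In practice such signals are rarely available for black-box software, which is part of why the categories are described in terms of visible manifestations rather than internal simulation state.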