RVPLAYER is a novel forensic analysis technique for robotic vehicles (RVs). It supports replay with what-if reasoning inside a simulator (e.g., checking whether an accident could have been avoided by changing a control parameter, a piece of code, or the vehicle's state). It is a low-cost replacement for expensive field-test based forensics. It features an efficient demand-driven adaptive logging method to capture non-deterministic physical conditions, and a novel replay technique supporting various replay policies that selectively enable/disable information during replay.
This webpage provides supplementary materials for our paper:
Hongjun Choi, Zhiyuan Cheng, Xiangyu Zhang. "RVPLAYER: Robotic Vehicle Forensics by Replay with What-if Reasoning", Network and Distributed System Security (NDSS) Symposium 2022, 27 February - 3 March 2022, San Diego, CA, USA
A poorly designed logging system that does not fully consider a CPS's characteristics can miss important information (i.e., forensic evidence) at runtime. We observed such problems in our target systems. For example, critical data goes missing because of buffer overflow, preemption of the logging task, stale log removal, log halt, and low resolution. We elaborate on each case below.
Buffer overflow: Because of limited computing resources and tight real-time requirements, IO tasks, which have large overhead and lower priority, receive less execution time. As a result, log data in a buffer cannot be written to the SD card in a timely manner, and subsequent data is dropped until the current buffer is written out and cleared (see Figure A1).
Preempted log task: To keep a constant main-loop frequency, the time for each iteration is bounded. In iterations with more tasks scheduled, time-consuming tasks are skipped, together with any log requests issued by those tasks (see Figure A2).
Stale log removal: When storage runs out of space, ArduPilot deletes stale log entries at power-on to maintain 10% free space (see Figure A3).
Log halt: If the disk runs out of space while the system is in operation, ArduPilot's logging module simply stops logging (see Figure A4).
Low resolution: To mitigate the above unpredictable log-loss problems, some systems reduce the logging frequency, leading to low resolution and opening a new surface for advanced attacks such as inter-sample attacks.
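The buffer-overflow cause above can be illustrated with a small sketch (illustrative Python, not ArduPilot's actual AP_Logger code): a fixed-capacity buffer silently drops log requests whenever the low-priority IO task has not yet flushed it.

```python
from collections import deque

class BoundedLogBuffer:
    """Illustrative sketch of a logging buffer that drops entries while a
    slow flush to storage is pending (capacity and rates are assumptions)."""
    def __init__(self, capacity):
        self.buf = deque()
        self.capacity = capacity
        self.dropped = 0

    def log(self, entry):
        if len(self.buf) >= self.capacity:
            self.dropped += 1      # subsequent data is silently lost
            return False
        self.buf.append(entry)
        return True

    def flush(self):
        """Write out and clear the buffer (e.g., when the IO task finally runs)."""
        written = len(self.buf)
        self.buf.clear()
        return written

# A 400 Hz loop issues log requests, but the IO task flushes only once
# every 100 iterations, so half of the requests are dropped.
buf = BoundedLogBuffer(capacity=50)
for i in range(400):
    buf.log(i)
    if i % 100 == 99:
        buf.flush()
```

The more log options are enabled (i.e., the more requests per second), the larger the dropped fraction, which matches the trend in Table A1.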
Due to the first two causes, even if we have enough storage space, some logs will still be missing from the final log file. The more log options we turn on, the more data is lost. Table A1 shows the log missing ratio on the 3DR Solo under different log options. With the default logging configuration, little data is missing, but the logging frequency of some important states is very low compared with the 400Hz main-loop frequency. For instance, attitude angles are logged at 10Hz, IMU data at 25Hz, and positions at 35Hz. When the PID option (PID controller data) is turned on, the missing rate does not increase much because the logging frequency is still low. However, if fast attitude logging (i.e., ATT_FAST) is turned on to raise the logging frequency of attitude data to 400Hz, 66.33% of data points are missing. If PID is further enabled, the situation becomes worse: 85.06% of data points are missing. From the table, it is easy to see that the logging bottleneck of the 3DR Solo is under 500 data points per second; requests beyond that are dropped.
Figure A1: AP_Logger_File.cpp Buffer overflow problem
Figure A2: AP_Scheduler.cpp Preempted task problem
Figure A3: AP_Logger_File.cpp Stale log removal problem
Figure A4: AP_Logger_File.cpp Log halt problem
Table A1: Logging Rate (Unit: Data Points/s) and Missing Rate in 3DR Solo
To facilitate understanding the quadrotor model, we first introduce the coordinate system in which states are quantified. The state values are usually represented in two different frames (coordinate systems), as shown in Figure B1. The first one is the East, North, Up (ENU) inertia frame of reference (or the earth frame), shown at the bottom-left corner. The other is the body coordinate frame at the center of the vehicle, with XB pointing along the vehicle's nose, YB pointing to the left of the body, and ZB pointing up. With these coordinate frames, the vehicle's dynamics are described using the standard Newton-Euler equations as follows.
The quadrotor state s is a vector [x, y, z, φ, θ, ψ, ẋ,ẏ,ż, p, q, r], with [x, y, z] representing the position coordinates of the vehicle in the inertia frame, [φ, θ, ψ] denoting the roll, pitch, and yaw Euler angles in the inertia frame, [ẋ,ẏ,ż] representing velocities in the inertia frame, and [p, q, r] denoting angular velocities in the body frame. Wi denotes the square of the i-th motor's angular velocity, which is determined by a control program. The equations include parameters that are determined by the vehicle's physical properties: KT is a lift (thrust) constant, KQ a drag constant, Ix, Iy, Iz inertia values, la, lb, lc, ld distances between the rotors and the center of mass, and m the mass of the quadrotor. Intuitively, the first three equations project angular velocities in the body frame to those in the inertia frame. The next three equations describe the relations between force (generated by the rotors) and accelerations. Note that the symbol ẍ denotes the second-order derivative of x and hence the acceleration along the X direction. The last three equations denote the relations between force and angular accelerations. Note that accelerations determine velocities, which in turn determine positions and angles. The parameters (e.g., KT and Ix) in the equations are static and vehicle-specific. Therefore, for a target vehicle, their values are identified through SI.
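The description above is consistent with the following standard Newton-Euler form (a hedged reconstruction; the exact signs and motor indexing depend on the airframe layout, and Figure B2 gives the authoritative equations):

```latex
\begin{aligned}
\dot{\phi}   &= p + q\sin\phi\tan\theta + r\cos\phi\tan\theta \\
\dot{\theta} &= q\cos\phi - r\sin\phi \\
\dot{\psi}   &= (q\sin\phi + r\cos\phi)/\cos\theta \\[4pt]
\ddot{x} &= \tfrac{K_T}{m}\textstyle\sum_i W_i\,(\cos\phi\sin\theta\cos\psi + \sin\phi\sin\psi) \\
\ddot{y} &= \tfrac{K_T}{m}\textstyle\sum_i W_i\,(\cos\phi\sin\theta\sin\psi - \sin\phi\cos\psi) \\
\ddot{z} &= \tfrac{K_T}{m}\textstyle\sum_i W_i\,\cos\phi\cos\theta - g \\[4pt]
\dot{p} &= \big(K_T(l_a W_1 - l_b W_2 - l_c W_3 + l_d W_4) + (I_y - I_z)\,q\,r\big)/I_x \\
\dot{q} &= \big(K_T(l_a W_1 + l_b W_2 - l_c W_3 - l_d W_4) + (I_z - I_x)\,p\,r\big)/I_y \\
\dot{r} &= \big(K_Q(W_1 - W_2 + W_3 - W_4) + (I_x - I_y)\,p\,q\big)/I_z
\end{aligned}
```

The three groups match the description: body-to-inertia projection of angular velocities, thrust-to-acceleration relations, and torque-to-angular-acceleration relations.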
Figure B1: A Coordinate Frame of a Quadrotor (3DR Solo)
Figure B2: Quadrotor Dynamics Equation
To build a model for a specific vehicle, we need to concretize the parameter values of the given mathematical equations. To this end, we utilize the MATLAB System Identification Toolbox. The procedure works as follows. First, the model equations are written directly in the MATLAB language or in a low-level programming language such as C/C++ in a MEX file, which enables us to call functions written in other languages. The file is then loaded and executed by the SI Toolbox, and an idnlgrey model object (i.e., a nonlinear grey-box model object) is created from the file. The user then executes a MATLAB built-in function, nlgreyest, with initial states, initial parameters, and operational data (i.e., observed actuation and state data). It searches for the best parameter set that minimizes errors between the model outputs and the profile data. We use fmincon, a constrained nonlinear optimization solver provided by the MATLAB Optimization Toolbox. It offers better efficiency and accuracy for bounded parameters and MIMO (multiple inputs, multiple outputs) models, and is hence suitable for our purpose.
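The same grey-box fitting idea can be sketched outside MATLAB. The following illustrative Python sketch stands in for idnlgrey + nlgreyest/fmincon; the toy two-parameter model (thrust constant KT and drag constant KQ with known mass and inertia) and all numeric values are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def simulate(params, u, dt=0.0025):
    """Toy model: vertical velocity driven by total thrust (KT) and yaw rate
    driven by differential drag torque (KQ); m, Iz, g are treated as known."""
    KT, KQ = params
    m, Iz, g = 1.5, 0.03, 9.81
    v = r = 0.0
    out = []
    for w_sum, w_diff in u:              # per-step squared rotor speeds
        v += (KT * w_sum / m - g) * dt
        r += (KQ * w_diff / Iz) * dt
        out.append((v, r))
    return np.array(out)

rng = np.random.default_rng(0)
u = [(4.0e6, 2.0e5)] * 400                       # 1 s of actuation at 400 Hz
true_params = (1.0e-5, 1.0e-6)                   # the "unknown" KT and KQ
observed = simulate(true_params, u) + rng.normal(0.0, 1e-3, (400, 2))

def loss(p):
    # Error between model outputs and the profiled (observed) data
    return float(np.mean((simulate(p, u) - observed) ** 2))

# Bounded search over the parameters, analogous to fmincon on an idnlgrey model
res = minimize(loss, x0=[5e-6, 5e-7], bounds=[(1e-6, 1e-4), (1e-7, 1e-5)])
KT_hat, KQ_hat = res.x
```

As in the real procedure, the optimizer recovers the physical parameters by minimizing the output error against operational data under bound constraints.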
Table B1 shows the example parameters identified for the X-shape quadcopter, 3DR Solo. Note that the model with the estimated parameters does not consider unusual environmental conditions (e.g., wind gust) or malicious attacks. When a vehicle encounters an unexpected condition, model responses would substantially deviate from real states. We leverage this observation to perform adaptive logging to effectively record unusual physical disturbances. Modeling using SI is a standard procedure and used in many existing works. We mainly use the same procedure and do not claim any novelty. We include this step for completeness.
Table B1: Identified Parameters (3DR Solo)
Figure B3 illustrates a model of a four-wheel two-axis vehicle. The vehicle's dynamics is described using Newton's law of motion equations as follows.
Here, the vehicle has 6 states [x, y, ψ, vx, vy, r]; x and y are the positions with respect to the fixed reference frame (i.e., the earth frame), vx and vy the longitudinal and lateral velocities, and ψ and r the yaw and yaw rate measured around the Center Of Gravity (COG) of the vehicle. u is an input signal vector consisting of [sFL, sFR, sRL, sRR, 𝜹], where sx is the tire slip derived from the speed of each wheel (denoted FL, FR, RL, RR) and 𝜹 the steering angle. Intuitively, the first three equations project linear and angular velocities in the body frame to those in the inertia frame. The next three equations describe the relations between force (generated by the wheels) and accelerations.
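The kinematic projection described by the first three equations can be written as follows (a hedged reconstruction of the standard form; Figure B4 gives the authoritative equations, including the force relations):

```latex
\begin{aligned}
\dot{x}    &= v_x\cos\psi - v_y\sin\psi \\
\dot{y}    &= v_x\sin\psi + v_y\cos\psi \\
\dot{\psi} &= r
\end{aligned}
```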
The equations include parameters that are determined by the vehicle's physical properties: m is the mass, J the moment of inertia, a and b the distances from the center of gravity, Cx and Cy the longitudinal and lateral tire stiffness, and CA the air resistance, respectively. The SI method identifies their values; Table B2 shows the example parameters identified via grey-box system identification for Erle-Rover.
Figure B3: Four-Wheel Vehicle Model
Figure B4: Four-Wheel Vehicle Dynamics Equation
Table B2: Identified Parameters (Erle-Rover)
ArduCopter determines that a drone is crashing if the angle error is larger than 30 degrees for 2 seconds. These thresholds are provided by control software developers based on intuition; as such, they may not be well thought out. For example, a collision with an object may cause a drone to tilt more than 30 degrees, but the tilt may not last 2 seconds. The CPI paper [ccs2020choi] reported that defective checks can lead to false warnings (false positives) and to failures to activate countermeasures in real accidents (false negatives). These defective conditions can be exploited through fabricated environmental conditions.
Figure C1 shows a highly simplified crash-check function of ArduCopter, adapted from the original paper. It regularly evaluates multiple if-statements to determine whether the vehicle has crashed. If all the conditions are satisfied, it disarms the motors as a countermeasure. The checks at lines 14, 18, and 24 are potentially problematic and can lead to a false negative.
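The style of check in Figure C1 can be sketched as follows (a hedged Python sketch, not ArduCopter's actual C++ implementation; the constant names mirror the paper's, but the values and structure here are illustrative):

```python
CRASH_CHECK_ANGLE_DEG = 30.0     # angle error must exceed this ...
CRASH_CHECK_ACCEL_MAX = 3.0      # ... while acceleration stays below this ...
CRASH_CHECK_TRIGGER_SEC = 2.0    # ... continuously, for this long
LOOP_DT = 0.0025                 # 400 Hz main loop

crash_timer = 0.0

def crash_check(angle_error_deg, accel_mss, landed):
    """Return True (disarm motors) only when all conditions persist long enough."""
    global crash_timer
    if (landed or angle_error_deg < CRASH_CHECK_ANGLE_DEG
            or accel_mss > CRASH_CHECK_ACCEL_MAX):
        crash_timer = 0.0        # any failing condition resets the timer
        return False
    crash_timer += LOOP_DT
    return crash_timer >= CRASH_CHECK_TRIGGER_SEC

# A wall collision that tilts the drone 40 degrees for only 1 s never
# trips the 2 s timer, so no disarm is triggered (a false negative).
hits = [crash_check(40.0, 0.0, False) for _ in range(400)]   # 1 s at 400 Hz
crash_timer = 0.0   # reset between scenarios
```

A shorter trigger time (as in the mutation RVPlayer finds) would catch such a short-lived tilt.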
Video C1 shows how the function misses a crash under a crafted exploit condition, i.e., a heavy wall causing a side collision at a 30-degree angle. When the vehicle crashes into the wall, the code is supposed to detect the crash and disarm the motors to avoid subsequent damage (e.g., hitting a nearby person). However, the checks fail to detect the crash, and the vehicle continues flying out of control.
Figure C2-(a) shows the attitude changes during the crash. To identify the root cause of the false negative, that is, the check(s) that fail to detect the crash, RVPlayer searches for the minimum threshold changes that allow the RV to detect the crash during replay.
In this case, after 144 rounds of replay, RVPlayer identifies the third (line 14) and fifth (line 24) conditions as the root cause, since the mutated conditions (CRASH_CHECK_ACCEL_MAX changed to 6 and CRASH_CHECK_TRIGGER_SEC to 0.8) successfully detect the crash without affecting normal operation (before the crash).
Figure C2-(b) shows the attitude changes during replay with mutation (see Video C2). Observe that the angles become stationary after the crash checks successfully detect the crash and disarm the motors at the 28th second.
Figure C1: Simplified Crash Check Function in ArduCopter
Video C1: Crash Detection Failed (False Negative): crashing into a wall and multiple subsequent crashes without warnings or proper stop
Figure C2: Traces of what-if analysis. (a) crash and failed detection (left), (b) replay with mutation (right)
Video C2: Replay with Mutation.
A parameter list used in the real parameter tampering attack on ArduPilot.
Table: A List of Updated Parameters
Parameter Tampering Attack
We reproduce 12 parameter tampering attacks: 6 on SimQuad, 2 on SimRover, 2 on Solo, and 2 on Erle-Rover. We tamper with a different set of parameters in each attack. The attacks on the simulated vehicles are performed on two different missions: the square-route mission (M1) and the zig-zag-route mission (M2). The attacks on the physical vehicles are performed on a mission where the vehicles go straight and make a turn. In each attack, besides the malicious parameter updates, we also perform a few normal parameter updates to test whether RVPlayer can isolate the root causes. In addition, we include 4 accidents caused by natural perturbations (i.e., not by compromised parameters). We also perform parameter updates in these accidents; the goal is to check whether RVPlayer reports false-positive parameter tampering. Parameter updates are done by sending messages, which are recorded and can be disabled/enabled during what-if analysis.
Short-duration Sensor Spoofing Attack
We reproduce 7 short-duration spoofing attacks. We attack the barometer, rangefinder, and IMU sensors of SimQuad, the barometer and IMU sensors of Solo, and the IMU sensor of SimRover. As in the original works, we simulate these attacks by inserting a piece of attack code at the sensor interfaces inside the control programs and manipulating sensor measurements maliciously.
Gradual Sensor Spoofing Attack
We reproduce 16 gradual attacks. We attack the GPS, barometer, rangefinder, gyro, accelerometer, and compass sensors of SimQuad, the GPS, gyro, accelerometer, and compass sensors of SimRover, and the GPS sensor of Solo. We attack one sensor at a time. These attacks use the same attack code as the short-duration spoofing attacks, but increase the inserted values gradually from small initial ones.
MSF Attack
We reproduce the MSF attack with GPS spoofing. The attack aims to defeat sensor fusion in autonomous-driving cars such that long-term GPS spoofing can succeed. MSF works by dynamically adapting weight values for different sensors. The attack identifies moments when the weight values for GPS are large, such that the system is particularly vulnerable to GPS spoofing. Following the same setup as the original paper, we leverage its trace-based attack method, which uses the original sensor traces to generate attacks. Intuitively, it synthesizes sensor measurements corresponding to the vehicle's reactions under the attack. The control algorithm calculates the steering amount based on the attacked MSF outputs. The steering is then mathematically converted to state changes and added to the real-world sensor measurements to reflect the attack's consequences. We use the 1146 attack traces generated from 5 real-world traces of 2 autonomous-driving cars in the original paper. These traces cover different road types. The attack traces are generated with different start times; the magnitude of spoofing differs across start times, even for the same vehicle.
Defective Safety-check Attack
We reproduce 3 safety-check attacks: a crash-check attack in ArduCopter, a crash-check attack in APMRover2, and a ground-check attack in PX4. In these attacks, malfunctions of safety-check functions are triggered by crafted environmental conditions such as walls, pedestrians, and wind gusts with certain conditions.
Space Overhead
Since RVPlayer instruments the controllers, it induces static space overhead. It enlarges the firmware size by 0.10% to 0.28% on average. Details are elided.
RVPlayer's dynamic space overhead is mainly due to logging. Table D1 shows the results on different vehicles with multiple missions and environmental effects. Originally (without adaptive logging), ArduPilot-based drones consume space at a rate of 4.04GB/day and APMRover-based rovers at 0.35GB/day. With adaptive logging, RVPlayer reduces space consumption by more than 90%.
Note that the reduction depends on external disturbances: stronger environmental variations allow less reduction.
Our experiments simulate the normal operation scenarios (that is, not under extreme conditions). We collect the data for real vehicles in regular weather conditions and use the built-in noise model to simulate regular real-world environments for simulated vehicles.
Runtime Overhead
For the runtime overhead, we report the ratio of RVPlayer's execution time to the execution time of each main-loop iteration. For simulated vehicles, RVPlayer's execution time is negligible (less than 1 microsecond) because of the powerful desktop CPU, so we instead use Callgrind to compare the number of instructions RVPlayer executes with the number executed in a main-loop iteration.
We measure the runtime overhead for missions of different complexities with different levels of disturbance. Table D2 shows the results. On the simulated vehicles, the runtime overhead introduced by RVPlayer is under 5%. Because of the powerful desktop CPU, the overhead is marginal (less than 1 microsecond) compared with the iteration time of the main loop. On the real vehicles, RVPlayer requires slightly more time because of the limited CPU performance; the overhead is below 8%. RVPlayer does not reduce the main-loop frequency or affect the controller's performance. Additionally, we measure the CPU utilization rate on the real vehicles to see whether RVPlayer interferes with other tasks. Since RVPlayer is added to the main loop and executed with the highest priority, we can confirm that our module does not impact normal operation by observing that its execution time fits into the remaining time of each loop. After RVPlayer is enabled, the CPU utilization rate increases from 55.19% to 57.07% on the 3DR Solo and from 20.03% to 20.46% on the Erle-Rover. The observed overhead is marginal and easily accommodated in practice.
Table D1: Space Overhead (Unit: GB/Day)
Table D2: Runtime Overhead
We log states at the highest frequency possible (i.e., every control loop iteration), in addition to the replay log. We then compare the replayed executions with the detailed operation traces. We also evaluate a baseline in which we replay without using the recovered disturbances. Table 1 shows the errors in position (in meters) and attitude (in degrees). We report two kinds of errors: the average error over the whole mission, and the average error over durations with non-trivial variation. We partition a mission into 3-second time windows and consider those whose variation is larger than the variation of the whole mission.
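The two error metrics above can be sketched as follows (an illustrative Python sketch; the 400Hz log rate default and the synthetic traces are assumptions):

```python
import numpy as np

def replay_errors(original, replayed, rate_hz=400, window_s=3.0):
    """Mean absolute error over the whole mission, and over the windows whose
    variation exceeds the whole-mission variation."""
    original = np.asarray(original, dtype=float)
    err = np.abs(original - np.asarray(replayed, dtype=float))
    w = int(rate_hz * window_s)
    mission_var = original.var()
    high = [err[i:i + w].mean()
            for i in range(0, len(original) - w + 1, w)
            if original[i:i + w].var() > mission_var]   # high-variance windows
    return err.mean(), (float(np.mean(high)) if high else err.mean())

# Synthetic example: the replay is off by 2 only in the second window,
# which oscillates and hence has above-mission variance.
original = np.r_[np.zeros(1200), np.tile([0.0, 10.0], 600)]
replayed = original.copy()
replayed[1200:] += 2.0
whole_err, high_var_err = replay_errors(original, replayed)
```

Reporting both numbers avoids the whole-mission average masking errors that concentrate in the dynamic parts of a flight.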
Observe that the mean position error of RVPlayer is less than 0.73 meters and the mean attitude error is less than 6.4 degrees across all target vehicles, for both the whole mission and the high-variance durations. In other words, we successfully reproduce the original traces with marginal errors while consuming less than 10% of the space of the original traces. In comparison, the baseline's mean position and attitude errors are 364.8% and 290.5% larger for the whole mission, and 361.3% and 453.6% larger in the high-variance durations.
Next, we use a case study to illustrate the effectiveness of our trace reproduction.
Figure 1 shows a comparison between the original trace and the replayed traces for a mission of Solo. The mission has a square trajectory with 30-meter edges and 4 waypoints. Observe that the errors of RVPlayer are much smaller than those of the baseline, especially in the latitude and longitude positions and the roll and pitch angles.
Table 1: Replay Errors in Position and Attitude
Figure 1: Comparison between original trace and replayed traces in a complex mission of 3DR Solo. Replaying without captured disturbance is a baseline.
This attack is from a previous GPS attack paper. As in the original paper, we add a gradually growing offset, from 0 to 32m over a duration of 15s, to the longitude value of the GPS signal.
Note that a small constant offset does not have persistent effect on the vehicle whereas a large instantaneous offset can be easily detected.
We launch the attack in a straight-line mission (Video E5). Figure E5 shows the planned trajectory and the actual (i.e., deviated) trajectory under attack. We add a positive offset to the longitude signal (towards the right in Figure E5); the drone thus deviates to the left to compensate. The attack starts at 16.12s.
The vanilla replay in Figure E5 shows that the reproduced position does not match the accident position, which indicates that some sensor is spoofed. In the spoofing analysis, RVPlayer first validates GPS by comparing the recorded GPS position with the accident position and concludes that GPS is spoofed because the two do not match. Next, RVPlayer leverages the recorded IMU data to identify the start time of GPS spoofing through second-derivative analysis. We trust the IMU because we assume the IMU and GPS are not spoofed at the same time. Figure E6 presents the results. Figure E6-(a) shows the recorded position and the replayed longitude position using IMU, as well as the error between them. Figure E6-(b) shows the second derivative of the replay error. Observe that there is a significant peak at 17s, which indicates the start time of spoofing.
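The second-derivative analysis can be sketched as follows (an illustrative Python sketch with synthetic values; the attack parameters mirror the 0-to-32m-over-15s ramp above, while the vehicle speed and sample rate are assumptions):

```python
import numpy as np

def spoof_start_time(recorded_pos, imu_replayed_pos, dt):
    """Locate the spoof onset at the peak of the error's second derivative."""
    err = np.asarray(recorded_pos) - np.asarray(imu_replayed_pos)
    d2 = np.diff(err, n=2) / dt ** 2      # discrete second derivative
    return (np.argmax(np.abs(d2)) + 1) * dt

dt = 0.1
t = np.arange(0.0, 30.0, dt)
true_pos = 2.0 * t                         # vehicle moving at 2 m/s (assumed)
# Gradual spoof: offset ramps from 0 toward 32 m at 32/15 m/s, starting at 16.1 s
offset = np.where(t >= 16.1, np.minimum((t - 16.1) * (32.0 / 15.0), 32.0), 0.0)
gps = true_pos + offset                    # recorded (spoofed) GPS longitude
est = spoof_start_time(gps, true_pos, dt)  # close to the 16.1 s onset
```

A gradual ramp is nearly invisible in the error itself, but its onset is a kink, which shows up as a sharp peak in the second derivative.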
Video E5: Gradual GPS Spoofing Attack
Figure E5: Real Trajectory under the Attack and Replayed Trajectory
Figure E6: Spoofing Analysis. (a) Longitude Position (left), (b) Second Derivative of Error (right)
Figure E1 shows the crash check code in APMRover2. It regularly evaluates multiple if-statements with thresholds and determines whether the rover has crashed. If all the conditions are satisfied, it raises an alert and disarms the motors. Note that vanilla replay would require significant manual effort to identify which condition is problematic.
To show a malfunction of this code, we reproduce the crash in simulation, where the code makes a wrong decision. Video E1 shows the crash simulation. When the vehicle hits a person, the code is supposed to detect the crash and launch an emergency stop to avoid further damage or subsequent crashes. However, the code fails to detect it, leading to further injury to pedestrians.
To diagnose the malfunction of the safety check, an investigator uses RVPlayer. Given the code (i.e., a list of if-statements), they replay the crash multiple times in simulation with changes to identify which code leads to the malfunction (i.e., the missed crash detection). Specifically, RVPlayer uses a genetic algorithm to find the problematic if-condition(s), with threshold changes that prevent the malfunction. It finds the minimal set of changes, which is the root cause.
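The threshold search can be sketched with a minimal genetic algorithm (illustrative Python: the replay is replaced by a toy scoring rule, and the bounds, mutation scale, generation count, and penalty weight are assumptions that differ from the paper's actual setup):

```python
import random

DEFAULTS = {"CRASH_CHECK_VEL_MIN": 1.0, "CRASH_CHECK_TRIG_SEC": 2.0}
BOUNDS   = {"CRASH_CHECK_VEL_MIN": (0.1, 2.0), "CRASH_CHECK_TRIG_SEC": (0.2, 4.0)}

def replay_score(cand):
    """Stand-in for one replay run: how close the crash check came to firing
    (1.0 means the crash was detected). RVPlayer instead replays the recorded
    mission in simulation; this toy rule is an assumption."""
    v, t = cand["CRASH_CHECK_VEL_MIN"], cand["CRASH_CHECK_TRIG_SEC"]
    return min(1.0, 0.3 / v) * min(1.0, 1.0 / t)

def fitness(cand):
    change = sum(abs(cand[k] - DEFAULTS[k]) for k in DEFAULTS)
    return replay_score(cand) - 0.01 * change   # prefer minimal changes

def mutate(cand, rng):
    key = rng.choice(list(cand))
    lo, hi = BOUNDS[key]
    out = dict(cand)
    out[key] = min(hi, max(lo, out[key] + rng.gauss(0.0, 0.3)))
    return out

def search(generations=40, n=30, seed=1):
    rng = random.Random(seed)
    pop = [{k: rng.uniform(*BOUNDS[k]) for k in DEFAULTS} for _ in range(n)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: n // 3]                   # elitist selection
        pop = elite + [mutate(rng.choice(elite), rng) for _ in range(n - len(elite))]
    return max(pop, key=fitness)

best = search()
```

The change-magnitude penalty pulls the survivors toward the smallest threshold adjustments that still detect the crash, mirroring the "minimal set of changes" objective.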
In the above case, after 6 evolutionary iterations (population size N = 30), RVPlayer determines that the conditions involving CRASH_CHECK_VEL_MIN and CRASH_CHECK_TRIG_SEC, with the changed values (0.3 and 1.0, respectively), preclude the malfunction (i.e., successfully detect the crash) and are the root cause.
Figure E2 shows a comparison of motor signals during the original run (left) and the replayed run (right). When replayed with the changed predicates, the crash check code launches an emergency stop successfully, as shown in Video E2.
Figure E1: Simplified Code of Crash Checker in APMRover2
Video E1: Crash Detection Failure: a rover crashes into pedestrians and continues to drive without safety countermeasure
Figure E2: What-if analysis of a defective crash checking code. (a) motor signals of the failed case (left), (b) replay with mutation (right)
Video E2: Replay with mutation. Crash check codes launch an emergency stop successfully.
This case study illustrates how RVPlayer diagnoses a code defect attack via replay with what-if analysis. We use a defective safety-check function in PX4, which has a cyber-physical inconsistency vulnerability.
Figure E3 shows the simplified code of a safety check function from the PX4 controller. It checks if a vehicle contacts the ground. If all the conditions are satisfied, it raises a warning and launches a counter-measure action (e.g., emergency landing). Note that there are multiple if-statements that include several parameters and variables interacting with external conditions. By investigating the program code alone, it is difficult to determine the root cause of the malfunction.
Video E3 illustrates the ground-contact situation of the drone in simulation. We inject a wind gust (a 26N force for 10 seconds) that pushes the drone down and makes it contact the ground momentarily. However, the code fails to detect the contact (i.e., no alert is raised), and when the wind gust stops (i.e., the wind force is removed), the drone suddenly flies away at full thrust.
As discussed in Section IV-A, RVPlayer replays with different mutations of candidate predicates. It aims to find the minimum mutations that allow the vehicle to detect the ground contact and issue the warning.
It takes RVPlayer 48 replay runs to find that changing MPC_LAND_SPEED from 0.5 to 0.7 allows detecting the ground contact without affecting normal operation. Intuitively, the changed parameter relaxes a ground-contact detection condition that requires no vertical movement. The original condition determines that there is vertical movement, because of a small bounce caused by hitting the ground, and hence the detection fails.
Figure E4 shows the altitude change in the original failure and in the replay with the mutation. Observe in (b) that the landing mode is activated (see Video E4).
Figure E3: Simplified Code of Ground Contact Detection in PX4
Video E3: Ground Contact Detection Failure: the drone hits the ground due to a strong wind gust but no (ground contact) warning reported.
Figure E4: What-if analysis of a defective checking code. (a) altitude of the failed case (left), (b) replay with mutation (right)
Video E4: Replay with mutation. The ground contact check activates the landing mode successfully.
We investigate state-of-the-art RV attacks with our technique to find attack root causes.
APT-style parameter tampering attacks
Gradual and short-duration sensor spoofing attacks
Defective code attacks (exploiting malfunctions of safety-check functions)
The following videos show selected attacks and replays to demonstrate the effectiveness of our technique for attack investigation in the simulated and real-world environment.
In this experiment, we additionally conduct an advanced parameter tampering attack with multiple root causes, and show how a combination of multiple parameters, rather than a single root cause, makes RVPlayer's search process harder. While SimQuad performs our M1 mission, we update a total of 10 parameters, including 3 malicious (root cause) and 7 benign parameter updates. In this case, only the combination of all three malicious parameter updates causes a crash; no partial combination causes the same accident.
The following table summarizes the results. Observe that RVPlayer still successfully identifies the combination as the root cause, with reasonable search time. Compared with the analysis results of the two-parameter attack (A4) and the one-parameter attack (A1) in Table III of the paper, the search time increases by 18 min and 29 min (by 28% and 48%), respectively. The result indicates that even if an advanced attacker exploits our algorithm by combining multiple parameters to evade our search, RVPlayer can still identify the minimum combination of malicious parameters with an acceptable increase in search time.
* malicious parameters
Table: Multi-Root-Cause Attack Analysis Result
Canyon-like urban environments (e.g., dense, tall buildings) affect various conditions of drone operation, including wind, air quality, and radio reception. In particular, GPS signals are susceptible to these effects, and thus urban canyons can cause significant GPS measurement errors and degrade GPS accuracy. To mitigate such GPS glitches, modern drones support GPS protection features during real-time flight, such as EKF (Extended Kalman Filter) based error correction.
The demo videos show simulated urban canyon effects on GPS (GPS signal loss) and how our subject drones respond under the bad GPS conditions. The real trajectories are reproduced from the logs recorded at runtime.
Real trajectory. Solo1 performs the multi-waypoint mission. We remotely trigger GPS loss between waypoints 4 and 5.
Real trajectory. Solo2 performs the multi-waypoint mission. We remotely trigger GPS loss right after the drone passes waypoint 5.
We additionally evaluate the urban canyon effects in different settings (e.g., effect duration and GPS protection enabled/disabled) with real drones to see how these systems respond to GPS glitches. We perform the test flights with multi-waypoint missions in an open and safe space.
Table G1 summarizes the experiments for different durations of GPS loss on the two drones. We report the maximum deviation, the consequence, and whether the mission is completed. In general, Solo1 performs better than Solo2 because its GPS protection module mitigates the GPS loss effects, and thus its deviations are smaller than Solo2's. For short durations of GPS loss (i.e., <3 sec), both systems have small errors and continue to operate after regaining the GPS signal. For longer GPS loss (i.e., >5 sec), Solo1 triggers safe landing via the EKF-based fail-safe function, while Solo2 drifts away.
Table G1: Urban Canyons Effects (GPS Signal Loss) with/without GPS Protection
In this experiment, we show how replay errors and log reduction rates change with different values of Emax in our adaptive logging, compared with threshold-based logging under different values of the threshold Ethr. Figure H shows the results. We use Em to denote the maximum error in our profile runs, as defined in Section III-B2 of the paper. We set Emax or Ethr to different multiples of Em and measure the logging ratio (the lower the better) and the average reproduction error of roll angles (the lower the better) in a complex mission for Solo. Note that larger Emax and Ethr values lead to lower logging rates and higher reproduction errors.
Comparing adaptive logging with threshold-based logging, the reproduction error is not sensitive to Emax in adaptive logging but is sensitive to Ethr in threshold-based logging; RVPlayer achieves smaller reproduction errors than threshold-based logging while producing smaller logs.
Figure H: Log ratio and replay error using different Emax in adaptive logging and Ethr in threshold-based logging
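The two logging policies compared in Figure H can be sketched as follows (an illustrative Python sketch: the one-line "model", the synthetic roll-angle trace, and the disturbance are stand-ins, not the actual vehicle model):

```python
def adaptive_log(trace, model_step, e_max):
    """Record a state only when the model's prediction error exceeds Emax,
    then re-sync the model to the real state."""
    logged, pred = [], trace[0]
    for i, actual in enumerate(trace):
        if abs(actual - pred) > e_max:
            logged.append((i, actual))   # record the disturbance
            pred = actual                # re-sync the model
        pred = model_step(pred)
    return logged

def threshold_log(trace, e_thr):
    """Record whenever the value changes by more than Ethr since the last record."""
    logged, last = [(0, trace[0])], trace[0]
    for i, actual in enumerate(trace[1:], start=1):
        if abs(actual - last) > e_thr:
            logged.append((i, actual))
            last = actual
    return logged

# Roll angle decaying smoothly, with a 3-degree disturbance injected at i = 500
model_step = lambda x: 0.99 * x
trace = [10.0 * 0.99 ** i + (3.0 if 500 <= i < 520 else 0.0) for i in range(1000)]
adaptive = adaptive_log(trace, model_step, e_max=0.5)
threshold = threshold_log(trace, e_thr=0.5)
# The model absorbs the smooth dynamics, so adaptive logging records only the
# disturbance, while threshold-based logging also records the smooth decay.
```

This is why adaptive logging produces smaller logs for the same error budget: the model predicts ordinary dynamics for free, and only genuine disturbances consume log entries.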
In this experiment, we show how model accuracy impacts the performance of RVPlayer. We artificially inject different levels of random noise into our model to vary its accuracy. The added noise follows a normal distribution with zero mean and standard deviations ranging from 0.5 Emax to 2 Emax in steps of 0.5.
Figure I shows how the log ratio and replay error change with different noise levels (and thus different model accuracies). Noise level 0 indicates our model, and level 4 denotes the model with the highest noise level. Observe that a more accurate model (i.e., lower noise) yields better log reduction (i.e., a smaller log ratio).
Meanwhile, an inaccurate model increases the prediction errors, which are then partly recorded as external disturbances. However, as shown in the figure, the replay errors are similar regardless of model accuracy, since the recorded external disturbances compensate for model errors during vanilla replay. Therefore, model accuracy mainly determines the log reduction effectiveness. Note that an inaccurate model (and thus inaccurately recorded disturbances) will significantly affect the fidelity of what-if analysis: as shown in Figure 1, disabling one factor can lead to substantial errors.
Figure I: Log ratio and replay error with different levels of model accuracy
Git repository for the RVPlayer prototype and the RV and AD datasets:
https://anonymous.4open.science/r/4fde3179-a0f9-46ec-a63c-90be771a97f3/