We propose the first robustness benchmark of point cloud detectors against common corruption patterns. Due to the page limit of the paper, we present some tables and figures (e.g., results on Pedestrian detection, naturalness validation of the weather simulation, and corruption simulation implementations) on this website. We encourage readers to refer to this website for more complete details.
Table S14: CEAP (%) of different detectors under different corruptions on Pedestrian detection
As shown in Table S14, the average mCEAP of 8.18% still indicates a noticeable accuracy drop of Pedestrian detectors under diverse corruption patterns. Specifically, scene-level {uniform_rad, gaussian_rad, local_dec} and object-level {cutout} corruptions cause an AP loss of more than 20%, representing a serious degradation of detection accuracy. By contrast, some corruption patterns (e.g., scene-level {background, beam_del} and object-level {upsample, rotation, translation}) show weaker effects on detectors (absolute CEAP of less than 1.25%), demonstrating that background and local upsampling noise, sparse beam loss, and slight rotation and translation barely affect the accuracy of Pedestrian detectors. Surprisingly, compared to Car detection, Pedestrian detection is much less affected by rain and snow, with average CEAPs of 2.42% and 2.58%, respectively. After investigating the point clouds, we found that this is because the proportion of points with zero-value reflection intensity on cars (58.94%) is much higher than that on pedestrians (10.01%), and such points are easily blocked by dense rain and snow droplets.
Table S15: CEAP (%) under different severity levels of different common corruptions on Pedestrian detection
Table S15 shows the CEAP under different severity levels of corruptions on Pedestrian detection. According to Table S15, albeit with some minor exceptions, the CEAP of each corruption increases as the severity level increases; this trend holds particularly strictly for the relatively severe corruptions with an average CEAP of more than 5%.
Figure S6: mCEAP of detectors with different representations on Pedestrian detection ({red, green, blue} for {voxel-based, point-based, voxel-point-based} detectors and {circle, triangle} for {two-stage, one-stage} ones)
Figure S6 depicts the relationship between AP and mCEAP of Pedestrian detectors. As shown in Figure S6, similar to Car detection, the mCEAP of Pedestrian detection increases as its AP increases.
Figure S7: TD rate of all frames under scene-level corruptions w.r.t. different distances of objects to the LiDAR sensor (green dotted lines for the median and green triangles for the mean)
Figure S8: TD rate of all frames under object-level corruptions w.r.t. different distances of objects to the LiDAR sensor (green dotted lines for the median and green triangles for the mean)
Figure S9: Damage of KNN-RO on Car imaging in the point clouds
Figure S10: Point clouds of the clean Car (left) after the 3D-Vfield augmentation (right)
Figure S11: More visualizations of real rain and simulated rain
Figure S12: Clean data and the Car detection by PVRCNN (red BBoxes for the ground-truth and green ones for the PVRCNN detection)
Figure S13: Rain data and the Car detection by PVRCNN (red BBoxes for the ground-truth and green ones for the PVRCNN detection)
Figure S14: Snow data and the Car detection by PVRCNN (red BBoxes for the ground-truth and green ones for the PVRCNN detection)
Rain and Snow. We adopt the rain and snow simulators of LISA (introduced in the paper) to simulate rain and snow corruption. For LISA, the rainfall-rate and snowfall-rate parameters can be regulated to simulate corruptions at different severity levels. After investigating real-world rainfall rates, we set the rainfall rate and snowfall rate to {0, 5.0, 15.0, 50.0, 150.0, 500.0} mm/hr and {0, 0.5, 1.5, 5.0, 15.0, 50.0} mm/hr as 6 severity levels (i.e., 0 to 5) for rain and snow corruption, respectively. Figures S13 and S14 display point cloud examples under rain and snow. Note that severity level 0 stands for the original clean data, which also applies to the rest of the corruptions.
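As a reference, the following is a minimal sketch of how severity levels are mapped to simulator parameters; lisa_augment is a hypothetical wrapper around the LISA simulator (its real API may differ), so this only illustrates the severity sweep.

```python
import numpy as np

# Rainfall/snowfall rates (mm/hr) for severity levels 0-5; level 0 is the clean data.
RAIN_RATES = [0.0, 5.0, 15.0, 50.0, 150.0, 500.0]
SNOW_RATES = [0.0, 0.5, 1.5, 5.0, 15.0, 50.0]

def corrupt_weather(points: np.ndarray, severity: int, mode: str = "rain") -> np.ndarray:
    """points: (N, 4) array of [x, y, z, intensity]; severity in {0, ..., 5}."""
    rate = (RAIN_RATES if mode == "rain" else SNOW_RATES)[severity]
    if rate == 0.0:
        return points  # severity 0 keeps the clean point cloud
    # Hypothetical wrapper around the LISA simulator; replace with the actual call.
    return lisa_augment(points, rate_mm_per_hr=rate, mode=mode)
```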
Figure S15: Fog data and the Car detection by PVRCNN (red BBoxes for the ground-truth and green ones for the PVRCNN detection)
Fog. We adopt the fog simulator of LFS (introduced in the paper) to simulate fog corruption in point clouds. For LFS, the parameter α regulates the corruption severity level. Following the recommended setting in LFS (with slight modifications for a relatively wide range of severity), we set α to {0, 0.005, 0.01, 0.02, 0.05, 0.1}. Figure S15 displays the point cloud example under fog.
Figure S16: Uniform_rad data and the Car detection by PVRCNN (red BBoxes for the ground-truth and green ones for the PVRCNN detection)
Figure S17: Gaussian_rad data and the Car detection by PVRCNN (red BBoxes for the ground-truth and green ones for the PVRCNN detection)
Uniform_rad and Gaussian_rad. To fit the mechanism of LiDAR scanning, we first convert the coordinates of points from the Cartesian system into the spherical system (i.e., [x, y, z] to [r, θ, φ]). Then, we add uniform or Gaussian noise to the r of every point. The upper and lower bounds of the uniform_rad range are set to ±{0, 0.04, 0.08, 0.12, 0.16, 0.2} m and the standard deviation of gaussian_rad to {0, 0.04, 0.06, 0.08, 0.10, 0.12} m. Figures S16 and S17 display the examples under uniform_rad and gaussian_rad.
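A minimal numpy sketch of this radial-noise procedure is given below; the helper names are ours rather than from the released toolkit.

```python
import numpy as np

def cart_to_sph(xyz):
    """[x, y, z] -> (r, theta, phi): range, polar angle, azimuth."""
    r = np.linalg.norm(xyz, axis=1)
    theta = np.arccos(xyz[:, 2] / np.maximum(r, 1e-8))
    phi = np.arctan2(xyz[:, 1], xyz[:, 0])
    return r, theta, phi

def sph_to_cart(r, theta, phi):
    return np.stack([r * np.sin(theta) * np.cos(phi),
                     r * np.sin(theta) * np.sin(phi),
                     r * np.cos(theta)], axis=1)

def radial_noise(xyz, severity, kind="uniform"):
    bounds = [0.0, 0.04, 0.08, 0.12, 0.16, 0.2]     # uniform_rad bounds (m)
    sigmas = [0.0, 0.04, 0.06, 0.08, 0.10, 0.12]    # gaussian_rad std (m)
    r, theta, phi = cart_to_sph(xyz)
    if kind == "uniform":
        r = r + np.random.uniform(-bounds[severity], bounds[severity], size=r.shape)
    else:
        r = r + np.random.normal(0.0, sigmas[severity], size=r.shape)
    return sph_to_cart(r, theta, phi)
```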
Figure S18: Impulse_rad data and the Car detection by PVRCNN (red BBoxes for the ground-truth and green ones for the PVRCNN detection)
Impulse_rad. Likewise, we first convert the coordinates of points from the Cartesian system to the spherical system, and then add deterministic perturbations of ±0.2 m to the r of a certain portion of points. The portion is set to {0, N/30, N/25, N/20, N/15, N/10} for 6 severity levels, where N represents the number of points. Figure S18 displays the point cloud example under impulse_rad.
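Continuing the sketch above (and reusing its cart_to_sph / sph_to_cart helpers), impulse_rad can be written roughly as follows; the random per-point sign is our reading of "±0.2 m".

```python
def impulse_rad(xyz, severity):
    # Fraction of perturbed points per severity level: 0, 1/30, 1/25, 1/20, 1/15, 1/10.
    fracs = [0.0, 1/30, 1/25, 1/20, 1/15, 1/10]
    r, theta, phi = cart_to_sph(xyz)
    n_hit = int(len(r) * fracs[severity])
    idx = np.random.choice(len(r), n_hit, replace=False)
    # Deterministic 0.2 m perturbation with a random sign per selected point.
    r[idx] += 0.2 * np.random.choice([-1.0, 1.0], size=n_hit)
    return sph_to_cart(r, theta, phi)
```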
Figure S19: Background data and the Car detection by PVRCNN (red BBoxes for the ground-truth and green ones for the PVRCNN detection)
Background. For background, within the spatial range of the scene, background points are randomly sampled uniformly and concatenated to the original points. The number of background points is set to {0, N/45, N/40, N/35, N/30, N/20} where N represents the number of original points. Figure S19 displays the point cloud example under background.
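A minimal sketch of the background injection, assuming the scene range is taken from the axis-aligned bounds of the original points:

```python
import numpy as np

def add_background(xyz, severity):
    # Number of injected background points: 0, N/45, N/40, N/35, N/30, N/20.
    divisors = [None, 45, 40, 35, 30, 20]
    if severity == 0:
        return xyz
    n_bg = len(xyz) // divisors[severity]
    lo, hi = xyz.min(axis=0), xyz.max(axis=0)       # spatial range of the scene
    bg = np.random.uniform(lo, hi, size=(n_bg, 3))  # uniformly sampled background points
    return np.concatenate([xyz, bg], axis=0)
```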
Figure S20: Upsample data and the Car detection by PVRCNN (red BBoxes for the ground-truth and green ones for the PVRCNN detection)
Upsample. For upsample, we spatially upsample points (with a random bias within [-0.1, 0.1]) near a certain portion of the original points. The portion is set to {0, N/10, N/8, N/6, N/4, N/2} for 6 severity levels, where N represents the number of original points. Figure S20 displays the point cloud example under upsample.
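A minimal sketch of this upsampling, where "near" is realized by duplicating selected points and adding the random bias:

```python
import numpy as np

def upsample_scene(xyz, severity):
    # Portion of points duplicated with a small spatial bias: 0, N/10, N/8, N/6, N/4, N/2.
    divisors = [None, 10, 8, 6, 4, 2]
    if severity == 0:
        return xyz
    n_new = len(xyz) // divisors[severity]
    idx = np.random.choice(len(xyz), n_new, replace=False)
    bias = np.random.uniform(-0.1, 0.1, size=(n_new, 3))  # bias within [-0.1, 0.1]
    return np.concatenate([xyz, xyz[idx] + bias], axis=0)
```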
Figure S21: Cutout data and the Car detection by PVRCNN (red BBoxes for the ground-truth and green ones for the PVRCNN detection)
Cutout. For cutout, we first randomly select a certain portion of points as centers. Then, using KNN, we erase the distance-related neighborhood of every center. The portion of selected points and the number of neighbor points are set to {(0, 0), (N/2000, 100), (N/1500, 100), (N/1000, 100), (N/800, 100), (N/600, 100)}. Figure S21 displays the point cloud example under cutout.
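A minimal sketch of the KNN-based cutout, using scikit-learn's NearestNeighbors for the neighborhood search (any KNN implementation would do):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def cutout_scene(xyz, severity):
    # (divisor for the number of centers, neighborhood size) per severity level.
    cfg = [(0, 0), (2000, 100), (1500, 100), (1000, 100), (800, 100), (600, 100)]
    div, k = cfg[severity]
    if div == 0:
        return xyz
    n_centers = len(xyz) // div
    centers = xyz[np.random.choice(len(xyz), n_centers, replace=False)]
    # For every center, find its k nearest neighbors and erase them.
    nn = NearestNeighbors(n_neighbors=k).fit(xyz)
    _, idx = nn.kneighbors(centers)
    return np.delete(xyz, np.unique(idx.ravel()), axis=0)
```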
Figure S22: Local_dec data and the Car detection by PVRCNN (red BBoxes for the ground-truth and green ones for the PVRCNN detection)
Figure S23: Local_inc data and the Car detection by PVRCNN (red BBoxes for the ground-truth and green ones for the PVRCNN detection)
Local_dec and Local_inc. For local_dec, we randomly select a certain portion of points as centers and delete 75% of the points in the spatial neighborhood of every center. For local_inc, within the neighborhood of every center, we utilize quadratic-polynomial fitting to upsample as many points as there are neighbors and concatenate them with the original points. The portion of selected points and the number of neighbor points are set to {(0, 0), (N/300, 100), (N/250, 100), (N/200, 100), (N/150, 100), (N/100, 100)} for local_dec and {(0, 0), (N/2000, 100), (N/1500, 100), (N/1000, 100), (N/800, 100), (N/600, 100)} for local_inc. Figures S22 and S23 display the point cloud examples under local_dec and local_inc.
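A minimal sketch of local_dec is shown below (local_inc is omitted, since its quadratic-polynomial upsampling step is more involved); the neighborhood search again uses scikit-learn's NearestNeighbors.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def local_dec(xyz, severity, drop_ratio=0.75):
    # (divisor for the number of centers, neighborhood size) per severity level.
    cfg = [(0, 0), (300, 100), (250, 100), (200, 100), (150, 100), (100, 100)]
    div, k = cfg[severity]
    if div == 0:
        return xyz
    n_centers = len(xyz) // div
    centers = xyz[np.random.choice(len(xyz), n_centers, replace=False)]
    nn = NearestNeighbors(n_neighbors=k).fit(xyz)
    _, idx = nn.kneighbors(centers)
    # Randomly drop 75% of each center's neighborhood.
    drop = [np.random.choice(row, int(drop_ratio * k), replace=False) for row in idx]
    return np.delete(xyz, np.unique(np.concatenate(drop)), axis=0)
```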
Figure S24: Beam_del data and the Car detection by PVRCNN (red BBoxes for the ground-truth and green ones for the PVRCNN detection)
Beam_del. For beam_del, we randomly delete a certain portion of points in the point cloud. The portion is set to {0, N/100, N/30, N/10, N/5, N/3} for 6 severity levels. Figure S24 displays the point cloud example under beam_del.
Figure S25: Layer_del data and the Car detection by PVRCNN (red BBoxes for the ground-truth and green ones for the PVRCNN detection)
Layer_del. First, we convert the coordinates of points from the Cartesian system into the spherical system and obtain the range of the polar angle θ of all points in the point cloud. Then, based on the layer number of LiDAR scanning (e.g., 64 for KITTI), we divide the range of θ into 32 or 64 bins. Finally, we randomly select a certain number of θ bins and delete the corresponding points. For KITTI, the number of deleted bins is set to {0, 3, 7, 11, 15, 19}. Figure S25 displays the point cloud example under layer_del.
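A minimal sketch of layer_del for a 64-beam LiDAR, binning the polar angle θ and dropping whole bins:

```python
import numpy as np

def layer_del(xyz, severity, n_layers=64):
    # Number of deleted theta bins per severity level (KITTI setting).
    n_del = [0, 3, 7, 11, 15, 19][severity]
    if n_del == 0:
        return xyz
    r = np.linalg.norm(xyz, axis=1)
    theta = np.arccos(xyz[:, 2] / np.maximum(r, 1e-8))        # polar angle of every point
    edges = np.linspace(theta.min(), theta.max(), n_layers + 1)
    bin_id = np.clip(np.digitize(theta, edges) - 1, 0, n_layers - 1)
    dropped = np.random.choice(n_layers, n_del, replace=False)
    return xyz[~np.isin(bin_id, dropped)]                     # delete points in the chosen bins
```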
Figure S26: Uniform_obj data and the Car detection by PVRCNN (red BBoxes for the ground-truth and green ones for the PVRCNN detection)
Figure S27: Gaussian_obj data and the Car detection by PVRCNN (red BBoxes for the ground-truth and green ones for the PVRCNN detection)
Figure S28: Impulse_obj data and the Car detection by PVRCNN (red BBoxes for the ground-truth and green ones for the PVRCNN detection)
Uniform_obj, Gaussian_obj, and Impulse_obj. For every annotated object (e.g., Car or Cyclist), we add noise to the Cartesian coordinates of its points. The upper and lower bounds of the uniform_obj range are set to ±{0, 0.02, 0.04, 0.06, 0.08, 0.10} m. The standard deviation of gaussian_obj is set to {0, 0.02, 0.03, 0.04, 0.05, 0.06} m. The number of points affected by impulse_obj, with a bias of ±0.1 m, is set to {0, N/30, N/25, N/20, N/15, N/10}. Figures S26, S27, and S28 display the point cloud examples under uniform_obj, gaussian_obj, and impulse_obj, respectively.
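A minimal sketch of the object-level noise, assuming a boolean mask of the points inside one annotated BBox is already available (extracting that mask from the KITTI labels is not shown); the per-coordinate sign for impulse_obj is our reading of "±0.1 m".

```python
import numpy as np

def object_noise(xyz, obj_mask, severity, kind="uniform"):
    """obj_mask: boolean array marking the points inside one annotated object's BBox."""
    bounds = [0.0, 0.02, 0.04, 0.06, 0.08, 0.10]   # uniform_obj bounds (m)
    sigmas = [0.0, 0.02, 0.03, 0.04, 0.05, 0.06]   # gaussian_obj std (m)
    pts = xyz.copy()
    n_obj = int(obj_mask.sum())
    if kind == "uniform":
        pts[obj_mask] += np.random.uniform(-bounds[severity], bounds[severity], (n_obj, 3))
    elif kind == "gaussian":
        pts[obj_mask] += np.random.normal(0.0, sigmas[severity], (n_obj, 3))
    else:  # impulse_obj: a portion of object points shifted by 0.1 m with random signs
        fracs = [0.0, 1/30, 1/25, 1/20, 1/15, 1/10]
        idx = np.random.choice(np.where(obj_mask)[0], int(n_obj * fracs[severity]), replace=False)
        pts[idx] += 0.1 * np.random.choice([-1.0, 1.0], size=(len(idx), 3))
    return pts
```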
Figure S29: Upsample_obj data and the Car detection by PVRCNN (red BBoxes for the ground-truth and green ones for the PVRCNN detection)
Upsample_obj. We upsample points (with a spatial bias within [-0.05, 0.05]) near a certain portion of points of annotated objects. The portion is set to {0, N/5, N/4, N/3, N/2, N}, where N represents the number of original points of objects. Figure S29 displays the point cloud example under upsample_obj.
Figure S30: Cutout_obj data and the Car detection by PVRCNN (red BBoxes for the ground-truth and green ones for the PVRCNN detection)
Cutout_obj. For cutout_obj, we erase the distance-related neighborhoods of selected points from every annotated object in the point cloud. For each individual object, the number of selected points and the number of neighbor points are set to {(0, 0), (1, 20), (2, 20), (3, 20), (4, 20), (5, 20)}. Figure S30 displays the point cloud example under cutout_obj.
Figure S31: Local_dec_obj data and the Car detection by PVRCNN (red BBoxes for the ground-truth and green ones for the PVRCNN detection)
Figure S32: Local_inc_obj data and the Car detection by PVRCNN (red BBoxes for the ground-truth and green ones for the PVRCNN detection)
Local_dec_obj and Local_inc_obj. For local_dec_obj, we randomly select a certain number of points of every annotated object as centers and delete 75% of the points in the spatial neighborhood of every center. For local_inc_obj, within the neighborhood of every center on objects, we utilize linear fitting to upsample as many points as there are neighbors. The number of selected points and the number of neighbor points are set to {(0, 0), (1, 30), (2, 30), (3, 30), (4, 30), (5, 30)} for both local_dec_obj and local_inc_obj. Figures S31 and S32 display the point cloud examples under local_dec_obj and local_inc_obj.
Figure S33: Shear data and the Car detection by PVRCNN (red BBoxes for the ground-truth and green ones for the PVRCNN detection)
Shear. For shear, we slant points of objects along the X- and Y-axes by a transformation matrix A = [[1, a, b], [c, 1, d], [0, 0, 1]], where a, b, c, d are floats sampled from a uniform distribution and assigned a random sign (±1). The lower and upper boundaries of the uniform distribution are set to {(0, 0), (0, 0.10), (0.05, 0.15), (0.10, 0.20), (0.15, 0.25), (0.20, 0.30)}. Figure S33 displays the point cloud example under shear.
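A minimal sketch of the shear transform; applying it in the object's centered (local) frame is our simplifying assumption, since applying the matrix to raw LiDAR coordinates would also translate the object.

```python
import numpy as np

def shear_object(obj_xyz, severity):
    # (lower, upper) bound of |a|, |b|, |c|, |d| per severity level.
    cfg = [(0.0, 0.0), (0.0, 0.10), (0.05, 0.15), (0.10, 0.20), (0.15, 0.25), (0.20, 0.30)]
    lo, hi = cfg[severity]
    a, b, c, d = np.random.uniform(lo, hi, 4) * np.random.choice([-1.0, 1.0], 4)
    A = np.array([[1.0, a,   b  ],
                  [c,   1.0, d  ],
                  [0.0, 0.0, 1.0]])
    center = obj_xyz.mean(axis=0)                 # shear about the object center (assumption)
    return (obj_xyz - center) @ A.T + center
```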
Figure S34: FFD data and the Car detection by PVRCNN (red BBoxes for the ground-truth and green ones for the PVRCNN detection)
FFD. We use the FFD tool of the pygem package to distort points of objects. With the prior setting of 5 × 5 × 5 control points, the distortion ratio is sampled from a uniform distribution. The upper and lower boundaries of the uniform distribution are set to ±{0, 0.1, 0.2, 0.3, 0.4, 0.5}. Figure S34 displays the point cloud example under FFD.
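A rough sketch of this step, assuming the PyGeM ≥ 2.0 FFD interface (lattice placement around the object's bounding box and per-axis control-point displacements are our assumptions; the actual API and setup may differ):

```python
import numpy as np
from pygem import FFD  # PyGeM >= 2.0 is assumed here

def ffd_object(obj_xyz, severity):
    ratios = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
    r = ratios[severity]
    ffd = FFD(n_control_points=[5, 5, 5])
    # Place the 5x5x5 control lattice around the object's bounding box.
    ffd.box_origin = obj_xyz.min(axis=0)
    ffd.box_length = obj_xyz.max(axis=0) - obj_xyz.min(axis=0)
    # Randomly displace the control points within +/- r (relative to the box length).
    for mu in (ffd.array_mu_x, ffd.array_mu_y, ffd.array_mu_z):
        mu += np.random.uniform(-r, r, mu.shape)
    return ffd(obj_xyz)
```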
Figure S35: Scale data and the Car detection by PVRCNN (red BBoxes for the ground-truth and green ones for the PVRCNN detection)
Scale. Along a randomly selected axis (height, length, or width), we scale points of objects up or down by a transformation matrix A = [[xs, 0, 0], [0, ys, 0], [0, 0, zs]]. The scaling parameter, applied to the randomly selected one of xs, ys, and zs, is set to 1 ± {0, 0.04, 0.08, 0.12, 0.16, 0.20}. Note that for scaling along the Z-axis, we correspondingly move the object to the ground. Also, the ground-truth labels of objects (specifically, the dimensions and locations of BBoxes) are modified accordingly. Figure S35 displays the point cloud example under scale.
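A minimal sketch of the scaling step; scaling about the point centroid and re-seating the object on its original lowest point are our simplifications, and the corresponding BBox label update is not shown.

```python
import numpy as np

def scale_object(obj_xyz, severity):
    deltas = [0.0, 0.04, 0.08, 0.12, 0.16, 0.20]
    s = 1.0 + deltas[severity] * np.random.choice([-1.0, 1.0])  # scale up or down
    axis = np.random.randint(3)                                 # randomly pick X, Y, or Z
    factors = np.ones(3)
    factors[axis] = s
    center = obj_xyz.mean(axis=0)
    scaled = (obj_xyz - center) * factors + center
    if axis == 2:
        # Keep the object seated on the ground after scaling along Z.
        scaled[:, 2] += obj_xyz[:, 2].min() - scaled[:, 2].min()
    return scaled
```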
Figure S36: Rotation data and the Car detection by PVRCNN (red BBoxes for the ground-truth and green ones for the PVRCNN detection)
Figure S37: Translation data and the Car detection by PVRCNN (red BBoxes for the ground-truth and green ones for the PVRCNN detection)
Rotation and Translation. We rotate or translate annotated objects to a mild degree. Specifically, objects are 1) moved forward or backward along the X- and Y-axes by a distance sampled from the uniform distribution U_distance, and 2) rotated clockwise or anticlockwise by an angle sampled from the uniform distribution U_degree. The lower and upper boundaries of U_degree are set to {(0, 0), (0, 2), (3, 4), (5, 6), (7, 8), (9, 10)} degrees and those of U_distance to {(0, 0), (0.0, 0.2), (0.3, 0.4), (0.5, 0.6), (0.7, 0.8), (0.9, 1.0)} m. Note that the ground-truth labels of objects are modified accordingly. Figures S36 and S37 display the examples under rotation and translation.
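A minimal sketch of both operations; rotating about the vertical axis through the BBox center and sampling one signed shift per axis are our assumptions, and the matching update of the BBox labels (yaw and location) is not shown.

```python
import numpy as np

def rotate_translate_object(obj_xyz, box_center, severity, mode="rotation"):
    deg_cfg  = [(0, 0), (0, 2), (3, 4), (5, 6), (7, 8), (9, 10)]                     # degrees
    dist_cfg = [(0, 0), (0.0, 0.2), (0.3, 0.4), (0.5, 0.6), (0.7, 0.8), (0.9, 1.0)]  # meters
    if mode == "rotation":
        ang = np.deg2rad(np.random.uniform(*deg_cfg[severity])) * np.random.choice([-1, 1])
        R = np.array([[np.cos(ang), -np.sin(ang), 0.0],
                      [np.sin(ang),  np.cos(ang), 0.0],
                      [0.0,          0.0,         1.0]])
        # Rotate the object points about the vertical axis through the BBox center.
        return (obj_xyz - box_center) @ R.T + box_center
    # Translation: shift along X and Y by a signed distance sampled per axis.
    shift = np.random.uniform(*dist_cfg[severity], size=2) * np.random.choice([-1, 1], size=2)
    out = obj_xyz.copy()
    out[:, :2] += shift
    return out
```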