This supplement to the paper presents box plots of the top-1 change rate for all seven DNNs, split into two categories: inconsistency and consistency. For each seed DNN, we randomly selected 150-200 inputs per category. In total, we studied 1145 inconsistencies and 1332 consistencies to observe the distribution of the top-1 change rate. Details for each DNN are shown as follows.
Note that the distribution of the top-1 change rate on ResNet-20 differs markedly from that on the other models. This is expected: the ResNet-20 seed models contain several layers (i.e., Conv2D layers with strides=2) that inherently introduce large layer distances, owing to the difference in the implementation of the padding scheme between TensorFlow and CNTK, as stated in the paper. That is to say, the majority of inputs still show a large top-1 change rate, even when TensorFlow and CNTK predict the same label for them. Even so, this does not affect the subsequent bug localization on ResNet-20 using the determined thresholds.
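To make the padding remark concrete, the following is a minimal sketch, not the code used in the paper. It computes the padding that TensorFlow's documented SAME rule assigns along one spatial dimension, and compares it with a hypothetical backend that places the odd leftover padding on the opposite side. The function names, the 32-pixel input size, the 3-wide kernel, and the "flipped" scheme are all illustrative assumptions; CNTK's exact auto-padding rule is not reproduced here. The point is only that a stride of 1 yields symmetric padding, whereas a stride of 2 leaves one extra padded element whose placement is a backend choice, so the convolution windows shift and the layer outputs diverge.

```python
import math


def same_padding_tf(in_size, kernel, stride):
    """(pad_before, pad_after) along one dimension under TensorFlow's SAME rule."""
    out_size = math.ceil(in_size / stride)
    pad_total = max((out_size - 1) * stride + kernel - in_size, 0)
    pad_before = pad_total // 2          # any odd leftover goes to the end
    pad_after = pad_total - pad_before
    return pad_before, pad_after


def same_padding_flipped(in_size, kernel, stride):
    """Hypothetical alternative (assumption): odd leftover goes to the start."""
    before, after = same_padding_tf(in_size, kernel, stride)
    return after, before


def conv1d(signal, weights, stride, pads):
    """Plain 1-D cross-correlation with explicit zero padding."""
    before, after = pads
    padded = [0.0] * before + list(signal) + [0.0] * after
    k = len(weights)
    return [
        sum(w * x for w, x in zip(weights, padded[i:i + k]))
        for i in range(0, len(padded) - k + 1, stride)
    ]


if __name__ == "__main__":
    # Assumed sizes: 32-pixel side, 3-wide kernel.
    print(same_padding_tf(32, 3, 1), same_padding_flipped(32, 3, 1))
    print(same_padding_tf(32, 3, 2), same_padding_flipped(32, 3, 2))

    signal = [1, 2, 3, 4, 5, 6, 7, 8]
    weights = [1, 1, 1]
    for stride in (1, 2):
        a = conv1d(signal, weights, stride, same_padding_tf(len(signal), 3, stride))
        b = conv1d(signal, weights, stride, same_padding_flipped(len(signal), 3, stride))
        print(stride, a, b, a == b)
```

With a stride of 1 the two schemes agree, so both convolutions produce identical outputs. With a stride of 2 the windows are offset by one element, which is the kind of per-layer distance that can appear even when the final predicted label is unchanged.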