Abstract
Limited resources on mobile devices have necessitated collaboration with cloud servers, called "Collaborative Intelligence", to process growing Deep Neural Network (DNN) model sizes. Collaborative intelligence, however, spends a long time transferring large volumes of feature data from clients to servers. The transfer time can be reduced with the User Datagram Protocol (UDP), but packets dropped during UDP transfer reduce inference accuracy. This paper proposes a DNN retraining method that develops a robust DNN model. The server-side layers are retrained to tolerate lossy features by modeling the consecutive feature losses caused by a packet drop. Our results show that the method mitigates the accuracy loss caused by dropped packets, provides reliable accuracy despite changes in the communication environment, and reduces the storage overhead of mobile devices.
Proposed Method
Figure 1. Comparison of training methods: prior work [1] and our work. Figure (a) shows the overall architecture of the prior work, which inserts a dropout layer between the input sub-DNN and the output sub-DNN for model retraining. Because it retrains the whole model, both the input sub-DNN and the output sub-DNN are newly generated. Figure (b) shows that our method pre-processes the intermediate features computed by the input sub-DNN and retrains only the output sub-DNN. The pre-processing sets a run of consecutive features to zero.
This paper proposes a deep neural network retraining method that mitigates the drop in inference accuracy of artificial intelligence models under packet loss. Compared to [1], the proposed method reduces the degradation in image inference accuracy even when packets are lost during feature transmission, and it alleviates the storage burden on mobile devices with limited capacity.
Figure 1(b) illustrates the overall structure of the proposed method. As in [1], the proposed method consists of a retraining process and an inference process. During retraining, consecutive features transmitted from the input sub deep neural network to the output sub deep neural network are intentionally dropped to mimic packet loss. Instead of adding a separate layer as in the previous study, the preprocessing stage masks a run of consecutive intermediate features to zero (indicated in red). The number of features masked to zero is determined by the packet loss rate used for retraining. The input sub deep neural network is then frozen, and only the output sub deep neural network is retrained. After retraining completes for all packet loss rates, we obtain one original input sub deep neural network and multiple output sub deep neural networks, each retrained for a different packet loss rate.
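The preprocessing step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name and the choice of a uniformly random start position for the zeroed run are assumptions, and the run length is simply the loss rate times the feature count. In an actual training loop, the input sub-DNN's weights would additionally be frozen (e.g., `requires_grad = False` in PyTorch) before retraining the output sub-DNN.

```python
import numpy as np

def mask_consecutive_features(features, loss_rate, rng=None):
    """Zero out one consecutive run of intermediate features.

    The run length is round(loss_rate * feature_count); the start
    position is chosen at random (an illustrative assumption),
    mimicking the loss of adjacent features carried by dropped
    UDP packets.
    """
    rng = rng or np.random.default_rng()
    flat = features.reshape(-1).copy()
    n_lost = int(round(loss_rate * flat.size))
    if n_lost > 0:
        start = rng.integers(0, flat.size - n_lost + 1)
        flat[start:start + n_lost] = 0.0  # consecutive loss, not scattered
    return flat.reshape(features.shape)
```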
During inference, the mobile device stores only the single input sub deep neural network. The cloud determines the actual packet loss rate from the packets received from the mobile device and their sequence numbers [2]. It can then select, from the multiple output sub deep neural networks, the one with the best accuracy for that loss rate and continue the computation.
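The server-side selection can be sketched as below. Both function names are illustrative, and the sketch assumes the cloud knows the expected packet count (e.g., from a header field) and picks the output sub-DNN retrained at the rate closest to the measured one; the paper only states that the optimal network is selected, so the nearest-rate rule is an assumption.

```python
def estimate_loss_rate(received_seq_nums, total_packets):
    """Estimate the actual packet loss rate from the sequence
    numbers of the packets that arrived at the server."""
    return 1.0 - len(set(received_seq_nums)) / total_packets

def select_output_subdnn(actual_rate, retrained_rates):
    """Pick the retraining loss rate closest to the measured one
    (assumed selection rule); the corresponding output sub-DNN
    is then used for the remaining computation."""
    return min(retrained_rates, key=lambda r: abs(r - actual_rate))
```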
The proposed method differs from the existing method [1] in three main aspects. First, the retraining method differs. The existing method uses dropout layers to randomly zero out individual nodes, which does not faithfully simulate real packet loss: a packet is typically large and carries consecutive features. This study therefore induces consecutive feature losses during preprocessing, which better matches actual packet loss environments.
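The contrast between the two masking styles can be illustrated with a toy feature vector (the values and loss fraction here are arbitrary): dropout-style masking zeroes scattered, independent positions, while packet loss removes one contiguous block.

```python
import numpy as np

rng = np.random.default_rng(42)
features = np.arange(1.0, 11.0)  # toy intermediate features, all nonzero

# Dropout-style masking (prior work [1]): each element survives
# independently, so lost positions are scattered across the vector.
keep = rng.random(features.size) >= 0.3
dropout_masked = features * keep

# Consecutive masking (this work): one contiguous run is zeroed,
# as when a UDP packet carrying adjacent features is dropped.
n_lost = 3
start = rng.integers(0, features.size - n_lost + 1)
consecutive_masked = features.copy()
consecutive_masked[start:start + n_lost] = 0.0
```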
Second, the methods differ in their ability to use the sub deep neural network with the best accuracy for the actual packet loss rate. In the existing method, the mobile device and the cloud choose the input/output sub deep neural networks based on previously observed packet loss rates. Because communication environments fluctuate and packet loss rates vary, a discrepancy between the predicted and actual loss rates can significantly degrade image inference accuracy. In contrast, the proposed method uses a single input sub deep neural network regardless of past packet loss rates. When the mobile device transmits intermediate features to the cloud, the cloud computes the actual packet loss rate from the packet sequence numbers and selects the output sub deep neural network that provides the best accuracy, resolving the issue of the previous method.
Lastly, the proposed method reduces the storage burden on mobile devices because only one input sub deep neural network is stored. For instance, when retraining is conducted for ten different packet loss rates, the device-side storage for the input sub deep neural network is reduced tenfold compared to the existing method. This is a significant advantage, particularly for mobile devices with limited storage.
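The tenfold figure follows directly from the counting argument: the prior method keeps one input sub-DNN per retrained loss rate on the device, while the proposed method keeps a single one. A minimal sketch of this arithmetic (the function name and the model size are illustrative, not from the paper):

```python
def device_input_storage(input_size_mb, n_loss_rates, shared_input=True):
    """Device-side storage for input sub-DNN weights.

    Prior work [1] retrains the whole model per loss rate, so the
    device holds one input sub-DNN per rate; the proposed method
    holds a single input sub-DNN regardless of the number of rates.
    """
    return input_size_mb * (1 if shared_input else n_loss_rates)
```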
Experimental Result
Figure 2. Comparison of inference accuracy against various packet loss rates, prior work [1] and our work. Our work provides higher accuracy than the best case of the prior work when the packet loss rate is 0.2.
Figure 3. Inference accuracy of our work and prior work [1] against various packet loss rates. The overall accuracy reduction as the packet loss rate increases is smaller in the prior work than in ours. However, our method shows higher inference accuracy when the packet loss rate is 0.2 or less, an acceptable range in real communication environments.
Figure 2 compares the measured inference accuracy with that of the existing research. In the existing research, accuracy varies depending on which packet loss rate the input/output sub deep neural networks were retrained with. Generally, models retrained at the same loss rate as the actual packet loss rate exhibit higher accuracy, although at some loss rates a model trained at a different rate performs better (e.g., when the actual loss rate is 0.1, the model retrained at a 0.2 loss rate is the most accurate). Conversely, when the actual loss rate differs significantly from the training loss rate, accuracy drops drastically. The accuracy of the existing research is therefore shown as a best case (solid line) and a worst case (dotted line). In contrast, the proposed method determines the actual packet loss rate from the sequence numbers of the packets arriving at the server and selects the optimal output sub deep neural network accordingly.
Overall, the proposed method achieves accuracy similar to the best case of the existing method: it is 0.6 percentage points higher at a loss rate of 0.2 and 1.44 percentage points lower at a loss rate of 0.4. Compared to the worst case, however, it provides significantly higher accuracy. When the loss rate is below 0.3, its accuracy is more than 4 percentage points higher (up to 4.58 percentage points at a loss rate of 0), and at a loss rate of 0.3 it is 3.26 percentage points higher. This demonstrates that the proposed method provides more stable and higher accuracy in real communication environments, where predicting the actual loss rate is difficult.
Figure 3 presents detailed views of the accuracy curves of the proposed and existing methods shown in Figure 2. The left graph shows the proposed method and the right graph shows the existing method. The black line indicates experiments conducted without retraining the model. The other five lines (red to purple) represent models retrained at loss rates ranging from 0.1 to 0.5. As the actual packet loss rate increases, the decrease in accuracy in the right graph (existing method) is more gradual than in the left graph (proposed method). This is because the existing method uses random dropout layers, which prevent overfitting to a specific packet loss rate. However, as Figure 3 shows, the proposed method consistently outperforms the existing method in the acceptable range of packet loss rates, 0.0 to 0.2. Additionally, as shown in Figure 2, the proposed method can use the sub deep neural network with the best accuracy for the actual packet loss rate, reducing accuracy degradation compared to the existing method.
Conclusion
In this paper, we retrained pre-trained deep neural networks to reflect the actual data losses caused by dropped packets. Unlike [1], which mimicked situations where transmitted data is lost at random, we mimicked situations where transmitted data is lost consecutively. Furthermore, by retraining only the output sub deep neural network, the cloud server can dynamically use the weights with the highest accuracy for each packet loss rate, resulting in higher accuracy than the existing method. Compared to the previous research, we observed an accuracy increase of up to 4.58 percentage points. In our experiments, the storage required for weights on the mobile device also decreased by a factor of 10 compared to the existing research.
References
[1] S. Itahara, T. Nishio, and K. Yamamoto, "Packet-loss-tolerant split inference for delay-sensitive deep learning in lossy wireless networks," in Proc. IEEE Global Commun. Conf. (GLOBECOM), Dec. 2021, pp. 1–6.
[2] Z. Jian et al., "An approach for storage and search of UDP packet data," in Proc. IEEE Int. Conf. Computer Science and Electronics Engineering, vol. 2, 2012.