[1] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 770-778.
[2] Yalniz I Z, Jégou H, Chen K, et al. Billion-scale semi-supervised learning for image classification[J]. arXiv preprint arXiv:1905.00546, 2019.
[3] Woodworth B, Patel K K, Stich S, et al. Is local SGD better than minibatch SGD?[C]//International Conference on Machine Learning. PMLR, 2020: 10334-10343.
[4] Erhan D, Courville A, Bengio Y, et al. Why does unsupervised pre-training help deep learning?[C]//Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. JMLR Workshop and Conference Proceedings, 2010: 201-208.