During training, the input images are resized to the resolution each model expects. The training loss is label-smoothed cross-entropy, and the validation loss is plain cross-entropy. Because there are only four classes in total, the final evaluation metric is top-1 accuracy. The default optimizer is SGD (stochastic gradient descent) with momentum and weight decay, which can later be changed to Adam. Training ran for 200 epochs on a GTX 1080 GPU with 8 GB of memory. The code is forked from “rwightman/pytorch-image-models”.
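
To make the two losses and the metric concrete, here is a minimal dependency-free sketch in plain Python. It is an illustration under stated assumptions, not the code from the forked repository: the smoothing factor of 0.1 is an assumed value (the text does not give one), and the function names `smoothed_cross_entropy`, `cross_entropy`, and `top1_accuracy` are hypothetical.

```python
import math

NUM_CLASSES = 4   # from the text: four classes in total
SMOOTHING = 0.1   # assumed value; the text does not state the smoothing factor


def log_softmax(logits):
    """Numerically stable log-softmax for a single sample."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def cross_entropy(logits, target):
    """Plain cross-entropy, as used for the validation loss."""
    return -log_softmax(logits)[target]


def smoothed_cross_entropy(logits, target, smoothing=SMOOTHING):
    """Smoothed cross-entropy, as used for the training loss.

    The one-hot target is mixed with a uniform distribution over classes:
    (1 - smoothing) * one_hot + smoothing / num_classes.
    """
    log_probs = log_softmax(logits)
    nll = -log_probs[target]                    # one-hot part
    uniform = -sum(log_probs) / len(log_probs)  # uniform part
    return (1.0 - smoothing) * nll + smoothing * uniform


def top1_accuracy(batch_logits, targets):
    """Fraction of samples whose highest-scoring class equals the target."""
    correct = 0
    for logits, target in zip(batch_logits, targets):
        predicted = max(range(len(logits)), key=logits.__getitem__)
        correct += int(predicted == target)
    return correct / len(targets)


if __name__ == "__main__":
    # Made-up logits for two samples over the four classes.
    batch_logits = [[2.0, 0.5, 0.1, -1.0], [0.2, 0.1, 3.0, 0.0]]
    targets = [0, 2]
    for logits, target in zip(batch_logits, targets):
        print(smoothed_cross_entropy(logits, target), cross_entropy(logits, target))
    print(top1_accuracy(batch_logits, targets))
```

With `smoothing=0` the smoothed loss reduces to plain cross-entropy, which is why the same logits can be scored with either function. With only four classes a top-5 metric would always be 100%, so top-1 is the only informative accuracy.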