1) The ResNet18 network structure is adopted, and the redundant fully connected layers in the baseline are discarded so that more discriminative features can be learned.
2) Pre-training is used to keep the ResNet18 network from overfitting: initializing from parameter weights trained on the ImageNet dataset gives the whole network a better starting point for optimization.
3) Most of the time, Adam converges to a sharp minimum while SGD finds a flat minimum, which tends to generalize better; this is why the optimizer is set to SGD.
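A sketch of the optimizer choice in point 3), assuming PyTorch; the momentum and weight-decay values are illustrative defaults, not taken from the source.

```python
from torch import nn, optim

model = nn.Linear(512, 5)  # stand-in for the classification network

# SGD with momentum: the flat minima it tends to reach are the
# motivation given in the text for preferring it over Adam.
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9,
                      weight_decay=1e-4)
```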
4) The ImageNet dataset is very large, so the flower samples it contains are highly similar to the provided flower dataset; since the provided dataset is also relatively large, different learning rates are adopted for different network layers.
[Figure: accuracy curve and loss curve]