By checking the dataset, we notice that each category contains close to 400 pictures. As shown in the picture, images within the same category can still differ greatly, which affects classification accuracy. For example, in the second sunflower picture, the woman actually occupies most of the frame.
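The per-category count mentioned above can be checked with a few lines of Python. This is only a minimal sketch: it assumes the dataset is stored as one subfolder per category, and the function name `count_images_per_class` and the list of image suffixes are hypothetical choices, not taken from the original setup.

```python
from collections import Counter
from pathlib import Path

# Assumed set of image file types; adjust to match the actual dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(root):
    """Return {category_name: number_of_image_files} for each subfolder of root."""
    counts = Counter()
    for class_dir in Path(root).iterdir():
        if not class_dir.is_dir():
            continue  # skip stray files at the top level
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return dict(counts)
```

Calling `count_images_per_class` on the dataset root (whatever path it lives at) gives one count per category, which makes it easy to confirm that the classes are roughly balanced.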
This baseline uses two convolutional layers to extract features and four fully-connected layers to classify the extracted features. The problem is twofold: two convolutional layers are too few to extract features adequately, and four fully-connected layers are too many, which makes the whole network prone to overfitting.
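To make the imbalance concrete, the sketch below counts trainable parameters for a network with two convolutional layers and four fully-connected layers. The layer sizes are not given in the text, so every number here is an assumption chosen for illustration: 64×64 RGB input, 3×3 convolutions with 16 and 32 output channels, each followed by 2×2 pooling, fully-connected widths of 512, 256, and 128, and 5 output classes.

```python
def conv_params(in_ch, out_ch, kernel):
    """Weights plus biases of a 2-D convolution with a square kernel."""
    return in_ch * out_ch * kernel * kernel + out_ch


def fc_params(in_features, out_features):
    """Weights plus biases of a fully-connected layer."""
    return in_features * out_features + out_features


def baseline_param_counts(image_size=64, num_classes=5):
    """Return (conv_total, fc_total) for the assumed 2-conv / 4-FC layout."""
    # Assumed channel progression: RGB input -> 16 -> 32 feature maps.
    conv_channels = [3, 16, 32]
    # Assumed hidden widths for the four fully-connected layers.
    fc_widths = [512, 256, 128, num_classes]

    conv_total = sum(
        conv_params(i, o, 3) for i, o in zip(conv_channels, conv_channels[1:])
    )

    # Assume "same" padding and one 2x2 pooling step after each convolution,
    # so the spatial side length is halved twice.
    side = image_size // 2 // 2
    flat = conv_channels[-1] * side * side

    fc_sizes = [flat] + fc_widths
    fc_total = sum(fc_params(i, o) for i, o in zip(fc_sizes, fc_sizes[1:]))
    return conv_total, fc_total


if __name__ == "__main__":
    conv_total, fc_total = baseline_param_counts()
    print(f"convolutional parameters:   {conv_total}")
    print(f"fully-connected parameters: {fc_total}")
```

Under these assumed sizes, the fully-connected layers hold nearly all of the parameters while the convolutional feature extractor holds only a few thousand, which is consistent with the concern that the classifier can overfit roughly 400 images per category while the feature extractor stays too shallow.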