MNIST: MNIST is one of the most iconic datasets in the field of machine learning. It consists of 70,000 handwritten digit images (0 through 9) split into a training set of 60,000 images and a test set of 10,000 images. Each image is grayscale and has a resolution of 28x28 pixels.
Fashion-MNIST: Fashion-MNIST serves as a more challenging alternative to the traditional MNIST dataset, containing 70,000 grayscale images across 10 fashion categories (e.g., T-shirt/top, Trouser, Pullover). Each image is 28x28 pixels, and the dataset is similarly split into 60,000 training images and 10,000 test images.
CIFAR-10: The CIFAR-10 dataset consists of 60,000 32x32 color images in 10 different classes, with 6,000 images per class. The dataset is divided into 50,000 training images and 10,000 testing images. The classes include objects like airplanes, cars, birds, cats, etc.
ImageNet: ImageNet is a vast dataset designed for visual object recognition research. It contains over 14 million images that have been hand-annotated to indicate the objects pictured, and bounding boxes are provided for at least one million of them. The dataset is organized according to the WordNet hierarchy, where each meaningful concept (synonym set, or "synset") is depicted by hundreds to thousands of images.
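For concreteness, the three smaller datasets above can be loaded with torchvision; this is a sketch under the assumption that torchvision is the data-loading library in use, and the `root="data"` path is illustrative rather than taken from this work.

```python
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

# MNIST / Fashion-MNIST / CIFAR-10 download automatically on first use.
mnist_train = datasets.MNIST(root="data", train=True, download=True, transform=to_tensor)
mnist_test = datasets.MNIST(root="data", train=False, download=True, transform=to_tensor)
fashion_train = datasets.FashionMNIST(root="data", train=True, download=True, transform=to_tensor)
cifar_train = datasets.CIFAR10(root="data", train=True, download=True, transform=to_tensor)

print(len(mnist_train), len(mnist_test))   # 60000 10000 (train/test split)
print(mnist_train[0][0].shape)             # torch.Size([1, 28, 28]) -- grayscale 28x28
print(cifar_train[0][0].shape)             # torch.Size([3, 32, 32]) -- color 32x32

# ImageNet is not downloaded automatically; torchvision's datasets.ImageNet
# expects the official archives to already be present on disk.
```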
MLP (Multilayer Perceptron): An MLP is a fundamental type of neural network composed of multiple layers of nodes arranged in a feedforward (directed) graph, with each layer fully connected to the next. The MLP configuration used in this work has two hidden layers: the first with 512 neurons and the second with 256 neurons.
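A minimal PyTorch sketch of this two-hidden-layer configuration follows; the 784-dimensional input and 10-way output assume flattened 28x28 images with 10 classes, and the ReLU activations are an assumption rather than something stated above.

```python
import torch.nn as nn

# Hidden sizes (512, 256) follow the configuration described above.
mlp = nn.Sequential(
    nn.Flatten(),              # flatten 1x28x28 images to 784-dim vectors
    nn.Linear(28 * 28, 512),   # first hidden layer: 512 neurons
    nn.ReLU(),
    nn.Linear(512, 256),       # second hidden layer: 256 neurons
    nn.ReLU(),
    nn.Linear(256, 10),        # 10-class output
)
```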
AlexNet: AlexNet is a convolutional neural network developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton. It marked a milestone in computer vision by winning the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC) by a wide margin. A key design choice in AlexNet is the use of the ReLU (Rectified Linear Unit) activation function, which helped mitigate the vanishing gradient problem during training.
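As a quick illustration, the reference AlexNet shipped with torchvision can be instantiated and inspected; its convolutional stack places an nn.ReLU after every convolution. This assumes torchvision is available and uses randomly initialized weights.

```python
import torchvision.models as models

alexnet = models.alexnet()   # random weights; pretrained weights are optional
print(alexnet.features)      # convolutional stack, with nn.ReLU after each conv layer
```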
VGG-16: Developed by the Visual Geometry Group at Oxford, VGG-16 is known for its simplicity and effectiveness among convolutional neural network architectures. It stacks small 3x3 convolutional layers to considerable depth (16 weight layers in total), demonstrating the power of deep networks in image recognition tasks. The architecture's straightforward design has made it a popular choice for many computer vision applications.
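The sketch below shows one VGG-style stage of stacked 3x3 convolutions followed by pooling, the repeated building block behind this design; the channel counts are illustrative and not taken from the text.

```python
import torch.nn as nn

def vgg_stage(in_ch, out_ch):
    """Two stacked 3x3 convolutions followed by 2x2 max-pooling."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(kernel_size=2, stride=2),
    )

stage1 = vgg_stage(3, 64)   # e.g. RGB input -> 64 feature maps at half resolution
```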
ResNet-101: ResNet-101, part of the Residual Network family, was developed by Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. It features 101 layers and incorporates residual connections to address the vanishing gradient problem, facilitating the training of deep networks. These residual connections allow information to bypass one or more layers, enhancing the network's training efficiency and performance.
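A simplified residual block illustrates the skip connection: the input bypasses the convolutions and is added back to their output, so gradients can flow through the identity path. ResNet-101 itself uses three-layer bottleneck blocks; this two-layer version is only a sketch of the idea.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)   # skip connection: input added back to the output

block = ResidualBlock(64)
y = block(torch.randn(1, 64, 56, 56))   # output keeps the input shape
```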
VGG-19: An extension of VGG-16, VGG-19 comprises 19 weight layers, adding three more convolutional layers to capture finer detail in image recognition tasks. Despite its structural similarity to VGG-16, the added depth gives VGG-19 greater learning capacity, making it better suited to more complex image recognition challenges.
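One way to see the extra capacity is to compare parameter counts of the two torchvision reference implementations; this assumes torchvision is available, and random weights suffice for counting.

```python
import torchvision.models as models

for name, ctor in [("VGG-16", models.vgg16), ("VGG-19", models.vgg19)]:
    m = ctor()
    n_params = sum(p.numel() for p in m.parameters())
    print(f"{name}: {n_params / 1e6:.1f}M parameters")   # VGG-19 has a few million more
```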