Developed a model that takes a 28x28 grayscale image of a handwritten digit and outputs, as a percentage, its confidence that the image shows each of the digits 0 through 9. The initial model had 2 linear layers, trained with cross-entropy loss and an SGD optimizer with momentum, and reached 95% accuracy on the test set.
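As a rough illustration of how per-digit confidences in percent can be produced, here is a minimal sketch. It assumes the model emits one raw score (logit) per digit and that the percentages come from a softmax over those scores, which is the usual pairing with cross-entropy loss; the function name `confidences_percent` and the example scores are hypothetical, not taken from the project.

```python
import math

def confidences_percent(logits):
    """Convert raw class scores into percentages that sum to 100 (softmax)."""
    # Subtract the max score first for numerical stability; this does not
    # change the result because softmax is invariant to a constant shift.
    top = max(logits)
    exps = [math.exp(score - top) for score in logits]
    total = sum(exps)
    return [100.0 * e / total for e in exps]

# Hypothetical raw scores for the digits 0 through 9 (illustration only).
logits = [0.1, -1.2, 0.3, 4.5, -0.7, 1.1, -2.0, 0.0, 0.9, -0.3]
percents = confidences_percent(logits)

# The predicted digit is the one with the highest confidence.
predicted_digit = max(range(len(percents)), key=lambda d: percents[d])

for digit, pct in enumerate(percents):
    print(f"{digit}: {pct:5.1f}%")
print("predicted digit:", predicted_digit)
```

The digit with the largest raw score always receives the largest percentage, so the softmax step changes how the output is presented, not which digit is predicted.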
Revised the model to use 6 stride-2 convolutional layers with a kernel size of 3, normalized the input, added a batch normalization layer between consecutive conv layers, and trained with a One Cycle LR scheduler. These changes raised test-set accuracy to 98.7%. Convolutions help because they let the model extract local features from the image (such as strokes and curves characteristic of particular digits), which gives it better evidence for its prediction than raw pixels alone. The One Cycle LR scheduler starts at a learning rate of max_lr/div_factor, linearly increases it to max_lr, then linearly decreases it to a minimum well below the starting rate (in PyTorch's OneCycleLR, the starting rate divided by final_div_factor). This improves training and helps reduce overfitting, since the very high learning rate in the middle of the run has a regularizing effect. (For more information, visit https://doi.org/10.48550/arXiv.1708.07120)
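The shape of the one-cycle schedule can be sketched in a few lines of plain Python. This is an illustrative reimplementation, not the project's training code. It assumes a two-phase linear schedule, a warm-up fraction `pct_start`, and a final rate of (max_lr/div_factor)/final_div_factor, which is how PyTorch's `OneCycleLR` defines its minimum; the library choice and all numeric values below are assumptions.

```python
def one_cycle_lr(step, total_steps, max_lr, div_factor=25.0,
                 final_div_factor=1e4, pct_start=0.3):
    """Learning rate at a 0-based `step` of a two-phase linear one-cycle schedule."""
    initial_lr = max_lr / div_factor          # rate at the first step
    min_lr = initial_lr / final_div_factor    # rate at the last step (assumed convention)
    warmup_end = pct_start * total_steps - 1  # step at which max_lr is reached
    last_step = total_steps - 1

    if step <= warmup_end:
        # Phase 1: linear warm-up from initial_lr to max_lr.
        frac = step / warmup_end
        return initial_lr + frac * (max_lr - initial_lr)

    # Phase 2: linear decay from max_lr down to min_lr.
    frac = (step - warmup_end) / (last_step - warmup_end)
    return max_lr + frac * (min_lr - max_lr)

# Illustrative run: 100 optimizer steps with a peak rate of 0.1.
schedule = [one_cycle_lr(s, total_steps=100, max_lr=0.1) for s in range(100)]
peak_step = max(range(100), key=lambda s: schedule[s])
print("start:", schedule[0])
print("peak :", schedule[peak_step], "at step", peak_step)
print("end  :", schedule[-1])
```

With these values the rate climbs for roughly the first 30% of training and decays for the rest, so most steps are spent coming down from the peak. Swapping the two linear interpolations for cosine ones would give the cosine variant of the same schedule.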