I see a cute dog.
I only have a vague idea of what kind of dog it is.
Luckily, I had a (computer) vision.
I leveraged an existing model built with deep learning techniques, specifically convolutional neural networks.
I aimed to provide a reliable and efficient solution for dog detection, which can be valuable in various applications such as pet monitoring systems, animal shelter management, or even dog breed identification.
A short demonstration of the methodology and results of the identifier.
I utilized a pre-existing, pre-trained ResNet-50 with an added fully connected layer to compress the 2048 outputs from ResNet into 133 different dog breed classes.
By using a pre-trained neural net, I was able to get better results than by training one myself on a local dataset.
Over 10 epochs, the pre-trained neural net was able to reduce its training loss to the low 3.0s.
In contrast, the neural net created locally was only able to reduce its training loss to the low 4.0s.
Using a convolutional neural network (CNN), the image was processed, and a breed was predicted that matched the best out of 133 breeds.
Model architecture:
3 convolutional layers:
3 --> 32
32 --> 64
64 --> 128
Activation function: ReLU
Pool with a stride of 2
Loss function: cross entropy loss
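The from-scratch model above can be sketched in PyTorch roughly as follows. Only the channel widths (3 --> 32 --> 64 --> 128), ReLU activations, stride-2 pooling, and cross-entropy loss come from the write-up; the kernel sizes, padding, and classifier head are my assumptions:

```python
import torch
import torch.nn as nn

class DogCNN(nn.Module):
    """Rough sketch of the three-convolutional-layer model described above.
    Kernel sizes, padding, and the final linear head are assumptions."""
    def __init__(self, num_classes=133):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),    # 3 --> 32
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),   # 32 --> 64
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1),  # 64 --> 128
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        # After three stride-2 pools, a 224x224 input becomes 28x28.
        self.classifier = nn.Linear(128 * 28 * 28, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)

criterion = nn.CrossEntropyLoss()  # cross entropy loss, as in the write-up
```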
After training with a small, local dataset, we only see an 8% accuracy from the model (on average over 20 runs). This is much better than the model making a random guess. With a random guess, the model would provide a correct answer roughly 1/133 times, slightly less than 1%.
Nonetheless, I thought this was really bad and realized that I should use a model pre-trained on ImageNet so that I would get better results.
Created and released by Microsoft Research
Pre-trained on ImageNet-1k at resolution 224x224.
Additional information regarding architecture of the neural net:
Loss function: cross entropy loss
Final fully connected layer
2048 --> 133
I researched popular deep neural networks that can still train quickly. ResNet was discussed briefly in class as a more effective way of creating deep neural networks while avoiding problems such as vanishing gradient and overfitting while training.
Since ResNet-50 has 2048 outputs, I added a fully connected layer to output the probabilities among the 133 different dog breeds.
After training and testing with the same dataset as the original learning model, we are able to see an 87% accuracy from the model (on average over 20 runs). I thought this was sufficient for the task!
My code and the testing data that I used for my demos are uploaded to GitHub. I downloaded the data from Kaggle, performed some modifications to the dataset (such as removing outliers), and then zipped it into dog_dataset.zip to upload to GitHub. However, the zip was still too large to be uploaded to GitHub, so I uploaded the data to Google Drive instead. I included a terminal command in the comments in the first cell block in case you want to try it out ;)
Originally, I had used a .py file to code. However, I quickly realized that having to run the code from the first line to debug every time was very time-consuming as the self-made model took quite a while to run 10 epochs.
I switched to a Jupyter notebook and tested out the runtimes on Google Colab. However, I kept getting timeouts on Google Colab, even just running 8 epochs. I wanted to run many more epochs for the pre-trained model to ensure that I would be getting more optimal results. Thus, I stayed on VSCode, training and testing locally, and switched to using Jupyter notebooks to leverage the ability to run single cells of code instead of blocks of code.
Running the entire code takes about 5 hours due to the time needed to train the model through 20 epochs. If the number of epochs is reduced, the code will run much faster. Moreover, if you allow the program to use your GPU, it should run much faster.
The data was downloaded from Kaggle. All images were resized to 224x224 to match the input size of ResNet-50.
The data from Kaggle contained only 120 dog classes. After some research, I discovered that ImageNet has 133 categories. To improve my model's coverage, I found the remaining 13 dog breeds and added 10-20 images of each dog breed to my data set.
There were some categories from Kaggle that had very few dog images. When testing the classifier, the model performed poorly on a few specific categories. To improve this, I added new images that I found online.
Normalization and data augmentation were applied to the training data to prevent excessive overfitting, as the dataset is not very large. Moreover, since I ran the code locally, I did not want to combine multiple datasets, so that my prediction algorithm could complete in a reasonable time.
Every image was rotated by a random angle, and a random horizontal flip was applied.
For a majority of the time, The Dog Identification Classifier-Inator did a really, really good job (good boy!!).
A really good boy!
This image has been successfully categorized as a Bernese mountain dog.
Looks like a Pembroke Welsh Corgi, but it also looks like an absolute cutieeee!
This image was successfully categorized as a corgi.
This image was successfully categorized as a golden retriever.
Looks like the model is also able to easily identify dogs with hats on - definitely going to come in handy when it is sunny out!
Even with so many cute dogs and amazing classification, The Dog Identification Classifier-Inator still misclassifies. This may be because of poor lighting, the weird faces that the dogs are making, or just having really similar features.
This image was categorized as a Mastiff even though it is a (mildly confused) Labrador Retriever. Looks like the dog is not the only one who is confused.
Beside it, there is an image of a Mastiff, and they have similar facial features, such as a dimple in the forehead.
This image was categorized as a Havanese, even though it is a Maltese.
Beside it, there is an image of a Havanese, and they have similar facial features, such as a hairier face and a prominent nose.
This image was categorized as a golden retriever, even though it is a Tibetan Mastiff.
Beside it, there is an image of a Golden Retriever, and they have similar facial features, such as longer, yellow hair and a long nose.
After classifying so many dogs, I wondered if the classification would also work on humans. Moreover, if it does work, how similar does the dog breed appear to the human?
Rob Lowe was classified as a Dachshund!
Just hear me out.
Hear me out.
I got classified as a Bulldog!
Not sure where the resemblance is. I hope it is not the jowls.
I definitely know that I need to work on my skincare routine now.
Steve Jobs got classified as a Bullmastiff!
Sorry, Steve. I didn't know this was going to happen.
I still like the aesthetics of your products.
Sorry, Taylor, I still love you.
Pretty good!!
I walk through the process of creating the model and the types of experiments that I have done.
Researched different dataset sources that were of adequate size but not so large that my model would not finish training in a reasonable time
Created a convolutional neural network from scratch
Adapted the dataset from Kaggle to personal use
Identified outliers in the dataset
Adapted ResNet-50 from Microsoft Research
Applied additional fully connected layer to reduce output to 133 different classification types
Defined custom forwarding behavior
Performed post-processing on the outputs to identify the best-matching dog breed
Modified training parameters to identify best fit to the training and testing model
Varied epochs and batch sizes to identify best balance between training time and test results/outputs
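As one illustration, the post-processing step above (turning the 133 raw outputs into a single breed) could be sketched like this; `predict_breed` and `breed_names` are hypothetical stand-ins, not the actual code:

```python
import torch
import torch.nn.functional as F

def predict_breed(logits, breed_names):
    """Hypothetical sketch of the post-processing step: convert raw model
    outputs into probabilities and return the best-matching breed name
    with its confidence. `breed_names` stands in for the real list of
    133 class labels."""
    probs = F.softmax(logits, dim=-1)
    confidence, index = probs.max(dim=-1)
    return breed_names[index.item()], confidence.item()
```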
Creating a neural network from scratch gave me really, really, really bad results. I wanted to see if I would be able to get at least comparable results between a neural net I built and trained on a specific dataset versus a large neural net trained on ImageNet.
The results, as I found out, were not comparable. The results that came from ResNet-50 were far beyond what I could've created locally.
I ran into many issues attempting to adapt ResNet-50 to my existing data preprocessing system. However, I was able to figure out how to alter how the data was being preprocessed.
Another issue I ran into was classification. It was pretty straightforward to have the images classified as either a human or not a human. However, it was much more difficult to identify whether there was a dog in the image, especially when the lighting in the image was poor. It also ended up taking a really long time, and my program would often time out before it was possible to begin the classification of dog breeds.
If I had more time to work on this project, I would do the following:
Find different types of classification algorithms and compare them to my current algorithm to find the best algorithm
Discover better object detection and semantic segmentation algorithms to have a faster algorithm that identifies humans or dogs in the image
Find/combine different data sets to have better results on images with more objects
Add the ability to use the camera (or a USB-attached video device) to explore without having to upload an image to run the classifier