Discussion

The following section provides an overview of our project's strengths and weaknesses, including aspects of the project that were completed with a Python implementation and those that were explored theoretically. We initially trained decision tree and KNN models to establish a baseline accuracy. To interpret the data, we researched and implemented PCA and LDA (to compare and determine the utility of the results of our feature extractions), and we also applied edge detection to our data before feeding it into our respective models (KNN, decision tree). We further improved our accuracy by building a one-layer CNN model and then a pre-trained VGG-16 CNN model.

Strengths

One of the biggest strengths of this project was understanding the mathematics and theory behind the algorithms we used. Most of our group had little to no experience with Python and ML algorithms, so each algorithm came with a learning curve. In particular, the "black box" inside a CNN proved challenging to grasp and explain. Since our pre-trained VGG-16 is also a CNN model, we had to comprehend that black box to validate its accuracy and ensure that the results we achieved were reliable.
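As a concrete illustration, below is a minimal sketch of the transfer-learning pattern behind our approach: a pre-trained VGG-16 with its convolutional "black box" frozen and a small classifier head trained on top. It assumes a Keras/TensorFlow setup; the input size matches our 200x200x3 images, but the head sizes and training settings are illustrative rather than our exact configuration.

```python
# Minimal transfer-learning sketch (assumed Keras/TensorFlow setup).
# Head sizes and training settings are illustrative, not our exact configuration.
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

base = VGG16(weights="imagenet", include_top=False, input_shape=(200, 200, 3))
base.trainable = False  # freeze the convolutional "black box" layers

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # binary output: cat vs. dog
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5, batch_size=32)
```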

Another challenge we ran into was getting the PCA model to extract meaningful features that contributed to the classification of our cat and dog images. Initially we created a PCA model without applying any processing techniques to the images, to see whether color images assisted in classification. However, our initial PCA revealed some issues: we realized that we were projecting onto the axes of greatest variance rather than the axes that separate the classes. This was not meaningful, since the classification of images was based on reviewing each image holistically rather than as individual pixel contributions. To overcome this issue, we researched the use of supervised PCA and LDA. To learn more about the supervised PCA algorithm, we referred to the research paper Supervised Principal Component Analysis Via Manifold Optimization. We implemented the LDA algorithm in Python, and our results are displayed on the "Gray Pixel Values as Features" and "Edge Extraction" pages under "Feature Extraction." To utilize PCA and LDA effectively, we used feature extraction (gray-scale pixel, edge, and object extraction) to process the images and then fed them into our PCA and LDA algorithms. This allowed us to determine whether the extraction process could be used effectively to help classify cat and dog images. Please see the page "Gray Pixel Values as Features" for a more in-depth explanation of the extraction methods.
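To make the PCA-vs-LDA distinction concrete, the sketch below contrasts the two with scikit-learn on flattened grayscale pixels: PCA projects onto the axes of greatest variance without consulting the labels, while LDA uses the cat/dog labels to find the axis that best separates the classes. The placeholder arrays X and y are illustrative stand-ins, not our actual data.

```python
# Hedged sketch: unsupervised PCA vs. supervised LDA (scikit-learn).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.random((100, 200 * 200))   # placeholder flattened grayscale images
y = rng.integers(0, 2, size=100)   # placeholder labels: 0 = cat, 1 = dog

# PCA: finds directions of maximum variance, ignoring labels entirely.
X_pca = PCA(n_components=2).fit_transform(X)

# LDA: uses the labels to find the axis that best separates the two classes
# (with two classes, at most one discriminant axis exists).
X_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y)
```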

Weaknesses

The most prominent obstacle we ran into during this project, and one we were unable to fully overcome, was access to computational power. Since our data included 25,000 images, each 200x200x3 pixels, processing the images and running machine learning models on them required more computational power than we had available locally or on Google Cloud. Furthermore, when our code for running these models or processing the data was inefficient, it would take a very long time to run or exhaust memory before finishing. While we could and did run on smaller subsets of the data, the results from these subsets were not always an accurate representation of the full dataset, because the subsets sometimes had less variability within themselves.
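A back-of-the-envelope estimate (illustrative, assuming the whole dataset is held in memory at once) shows why this was a problem:

```python
# Rough memory estimate for the full dataset (illustrative assumption:
# all images loaded into memory simultaneously).
n_images, h, w, c = 25_000, 200, 200, 3
bytes_uint8 = n_images * h * w * c   # 3.0e9 bytes, about 3 GB as uint8
bytes_float32 = bytes_uint8 * 4      # about 12 GB once cast to float32
print(f"{bytes_uint8 / 1e9:.1f} GB (uint8), {bytes_float32 / 1e9:.1f} GB (float32)")
```

Holding even a few working copies of arrays at that scale quickly exceeds the memory available on a typical laptop or a small cloud instance.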

Our baseline algorithms (KNN and decision tree, with and without edge extraction) had accuracies close to 50-60%, which indicates that the algorithms learned little and were essentially guessing which class each image belonged to.
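For reference, a minimal scikit-learn sketch of such a baseline on flattened pixel features looks like the following; the placeholder data, hyperparameters, and train/test split are illustrative, not our exact setup.

```python
# Hedged baseline sketch: KNN and decision tree on flattened pixels (scikit-learn).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.random((200, 200 * 200))   # placeholder flattened images
y = rng.integers(0, 2, size=200)   # placeholder labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
for clf in (KNeighborsClassifier(n_neighbors=5), DecisionTreeClassifier(max_depth=10)):
    clf.fit(X_tr, y_tr)
    print(type(clf).__name__, "accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```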

When running the extraction algorithms, we ran into various other issues. For instance, performing edge extraction by hand requires a predetermined kernel size, and no single kernel size remains accurate across all subsets of the data, since each subset may contain different types of images (e.g., images with a single cat against a noisy background vs. images with a cat against a plain background). Performing object extraction requires contour mapping and area thresholding, but because our images had a considerable amount of noise in their backgrounds, we did not have the computational power to fully execute this algorithm. As mentioned before, the wide variety of images (e.g., images with a single cat vs. multiple cats, images with a human holding the cat vs. the cat alone, images with plain backgrounds vs. backgrounds with other objects) meant that our models had to account for each of these scenarios without any prior notion of what cats and dogs look like.
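The sketch below illustrates both extraction steps with OpenCV. The filename, blur kernel size, Canny thresholds, and area threshold are all illustrative assumptions; as noted above, no single fixed choice suits every image in the dataset.

```python
# Hedged sketch of edge extraction and contour-based object extraction (OpenCV).
import cv2
import numpy as np

img = cv2.imread("cat.0.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical filename

# Edge extraction: smooth with a fixed (illustrative) kernel, then detect edges.
blurred = cv2.GaussianBlur(img, (5, 5), 0)
edges = cv2.Canny(blurred, 100, 200)

# Object extraction: contour mapping plus area thresholding.
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
large = [c for c in contours if cv2.contourArea(c) > 500]  # illustrative threshold
mask = np.zeros_like(img)
cv2.drawContours(mask, large, -1, 255, thickness=cv2.FILLED)
extracted = cv2.bitwise_and(img, mask)  # keep only the large foreground regions
```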

Future Improvements

Many of our algorithms could be refined further given more time to work on the project and more computational power. The University of Michigan has a campus computing cluster called Great Lakes that is typically used to support research needs. It offers much more powerful GPUs and larger memory, which would help us run our models on large datasets. This would also aid us in processing our images before feeding them into our algorithms and would help us achieve more accurate results. On that note, we could also explore other pre-trained models, such as ResNet, or models trained on other similar datasets to compare results and extract features from the images. If we had the chance to improve our project further, we would continue our research on object extraction and use this extraction in tandem with other models. As mentioned in the "Weaknesses" section above, each extraction method we used had its own set of pros and cons, many of which stem from the dataset containing a variety of image types. Using pre-trained models or creating a dataset of "headshots" of cats and dogs to initially train our model on may help us refine our extraction methods and make them more robust.