Results and Discussion

Decision Tree Performance

A Receiver Operating Characteristic (ROC) curve is commonly used to evaluate binary classifiers. It shows the trade-off between the True Positive Rate (sensitivity) and the False Positive Rate (1 - specificity). An algorithm with no classification power theoretically falls on a 45° diagonal line through the middle of the plot, while a curve that bends closer to the top-left corner indicates a more accurate classifier. Performance can also be summarized with the Area Under the Curve (AUC) score, where higher AUC scores generally mean better classifier performance. ROC curves are not usually drawn for discrete classifiers like decision trees because a discrete classifier yields only a single point rather than a full curve. However, I created ROC curves anyway to standardize the comparison of performance across classifiers.

Both Figs. 1A and 1B display high classification performance. While the ROC curve for fold 4 of the Gini Index model showed the highest AUC of any single fold, the decision tree using entropy as the splitting criterion performed better overall. The error rate of each fold and the average error rate across folds were computed for each classifier. For the Gini Index, the average error rate was 7.07%, while it was 6.41% for the entropy model. These error rates were measured over multiple runs with randomly generated folds, and the decision tree using entropy consistently outperformed the Gini Index, even if only slightly. The decision trees for the top-performing fold of each criterion were plotted for visual comparison. While the depths of both trees are equal, the entropy tree has much greater breadth; given their similar performance, this structural difference does not appear to affect classification quality.
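As a rough sketch of this comparison, the snippet below cross-validates decision trees built with the Gini and entropy criteria and reports the per-fold error rate and AUC. It assumes scikit-learn and uses its bundled Breast Cancer Wisconsin data as a stand-in for the dataset analyzed here; the fold count and random seed are illustrative choices, not the settings actually used in this project.

# Sketch: comparing Gini vs. entropy decision trees with k-fold cross-validation.
# The dataset, fold count, and seed below are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

X, y = load_breast_cancer(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

for criterion in ("gini", "entropy"):
    errors, aucs = [], []
    for train_idx, test_idx in cv.split(X, y):
        clf = DecisionTreeClassifier(criterion=criterion, random_state=0)
        clf.fit(X[train_idx], y[train_idx])
        y_pred = clf.predict(X[test_idx])
        errors.append(np.mean(y_pred != y[test_idx]))  # per-fold error rate
        # class-probability scores give the (single-point) ROC/AUC for the tree
        aucs.append(roc_auc_score(y[test_idx], clf.predict_proba(X[test_idx])[:, 1]))
    print(f"{criterion}: mean error {np.mean(errors):.2%}, mean AUC {np.mean(aucs):.3f}")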

Gini Index Tree

Entropy Tree

K-Nearest Neighbors (KNN) Performance

For the KNN classifier, I wanted to select the hyperparameter that gives the best algorithm performance. The classification power of KNN depends heavily on the number of nearest neighbors considered, k, so the performance of each cross-validation loop was recorded for multiple values of k, as shown in the graph above. The average error rate for the weighted voting method was 3.6%, while the average error rate for the majority voting method was 3.8%. Weighted voting slightly outperformed majority voting, but both KNN variants had low error rates and classified more accurately than the decision trees. In general, both majority voting and weighted voting performed best at k = 13, which minimized their error curves. Because the weighted voting curve lies almost always below the majority voting curve, weighted voting has a lower error rate in general and better overall performance. This matches expectations, since weighted voting also takes into account how far away each neighbor is when quantifying that neighbor's influence on the classification.
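A minimal sketch of this hyperparameter sweep, assuming scikit-learn, is shown below: it compares majority (uniform) and distance-weighted voting across a range of k values. The feature-scaling step and the exact range of k are my own illustrative choices rather than details taken from this project.

# Sketch: sweeping k for majority- vs. distance-weighted KNN via cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

for weights in ("uniform", "distance"):  # majority vs. weighted voting
    error_by_k = {}
    for k in range(1, 26, 2):
        knn = make_pipeline(StandardScaler(),
                            KNeighborsClassifier(n_neighbors=k, weights=weights))
        acc = cross_val_score(knn, X, y, cv=5).mean()
        error_by_k[k] = 1 - acc  # cross-validated error rate for this k
    best_k = min(error_by_k, key=error_by_k.get)
    print(f"{weights}: best k = {best_k}, error = {error_by_k[best_k]:.2%}")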

Naive Bayes Performance

The Naive Bayes algorithm generated very strong ROC curves, with most folds having an AUC of 0.98 or 0.99. The error rate for this classifier was 5.79%, which is better than both decision trees but not as good as the KNN classifiers. The Naive Bayes classifier assumes that the attributes are conditionally independent of one another given the class, which is not necessarily true and may be one drawback of this method.
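The sketch below reproduces this setup with scikit-learn's GaussianNB; the Gaussian likelihood is an assumption on my part, since the specific Naive Bayes variant used here is not described, and the dataset and fold count are again illustrative.

# Sketch: cross-validated error rate and AUC for a Gaussian Naive Bayes model.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
nb = GaussianNB()
acc = cross_val_score(nb, X, y, cv=5)                    # per-fold accuracy
auc = cross_val_score(nb, X, y, cv=5, scoring="roc_auc") # per-fold AUC
print(f"error rate: {1 - acc.mean():.2%}, mean AUC: {auc.mean():.3f}")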

Support Vector Machines (SVM) Performance

The SVM ROC curves were also quite strong, with higher AUC values than the decision trees but lower than Naive Bayes. The error rate for this classifier was 8.2%, which is slightly higher than any of the other classifiers. Comparing all of the classifiers tested, weighted voting KNN had the best overall performance and the lowest error rate. However, this classifier is not very efficient, taking around 1 minute and 49 seconds to run. In contrast, the error rate of Naive Bayes is only about two percentage points higher, but the program runs very quickly, and Naive Bayes also produced the most promising ROC curves. This project also showed that decision trees achieved relatively accurate performance, and that entropy was a better splitting criterion than the Gini Index for this particular dataset.
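To illustrate the speed-versus-accuracy trade-off discussed above, the sketch below times each classifier over the same cross-validation split. The SVM kernel and other hyperparameters are illustrative defaults rather than the settings used in this project, so the absolute timings and error rates will differ from those reported here.

# Sketch: comparing cross-validated error and wall-clock time across classifiers.
import time
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
models = {
    "SVM (RBF kernel)": make_pipeline(StandardScaler(), SVC()),
    "Naive Bayes": GaussianNB(),
    "Weighted KNN (k=13)": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=13, weights="distance")),
}
for name, model in models.items():
    start = time.perf_counter()
    acc = cross_val_score(model, X, y, cv=5).mean()
    elapsed = time.perf_counter() - start
    print(f"{name}: error {1 - acc:.2%}, wall time {elapsed:.2f}s")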

Overall, all of the classifiers achieved accuracies ranging from about 91% to 96.5%, which is quite high. This demonstrates that physical tumor characteristics are a good indicator of whether a breast tumor is benign or malignant. Classifiers of this kind could be groundbreaking for cancer diagnosis if a quick MRI or CT scan of a tumor could reveal the nature of the cancer before an invasive biopsy is taken. In the future, I would like to see whether the error rates for this breast cancer model decrease even further if I layer several different classifiers on top of each other, as sketched below.
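One way to layer classifiers, sketched here under the assumption of scikit-learn, is a stacking ensemble in which a meta-learner combines the predictions of the four base classifiers studied above. The particular base learners and the logistic-regression meta-learner are illustrative choices, not a planned design.

# Sketch: stacking the four classifiers from this report behind a meta-learner.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
stack = StackingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(criterion="entropy", random_state=0)),
        ("knn", make_pipeline(StandardScaler(),
                              KNeighborsClassifier(n_neighbors=13, weights="distance"))),
        ("nb", GaussianNB()),
        ("svm", make_pipeline(StandardScaler(), SVC())),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # meta-learner over base outputs
)
print(f"stacked error rate: {1 - cross_val_score(stack, X, y, cv=5).mean():.2%}")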