Breast cancer is one of the most common cancers affecting women, accounting for about one-third of all newly diagnosed cancer cases in women each year [1]. After a breast tumor is detected, the doctor must perform multiple invasive tests, such as biopsies, to better understand its nature: whether the tumor is malignant or benign and, if it is malignant, what grade the cancer is. While benign tumors are unable to spread to nearby tissues and are relatively harmless, malignant tumors are fast-growing and can spread throughout the body [2]. Given these major differences, a patient's treatment plan depends heavily on whether the tumor is malignant or benign. However, there is a waiting period between when a tumor is detected and when the patient learns its nature. This delay can cause high patient anxiety and, for patients with malignant tumors, may even worsen their prognosis. Early classification of breast tumors is therefore of great value.
Benign and malignant tumors tend to differ strongly in shape. While benign tumors tend to be rounded and smooth, malignant tumors tend to have highly irregular borders [2]. These physical differences suggest a potential method for faster tumor classification. A machine learning classifier predicts the label of an item from a set of its attributes. In this case, tumor properties such as area, compactness, and concavity can be used to predict whether a tumor is malignant or benign.
There are various types of classification algorithms, such as decision trees, K-nearest neighbors, Naive Bayes, and support vector machines (SVMs). In this supervised learning setting, each classifier is built using training data, and its performance is evaluated by the accuracy of its predictions on testing data. Cross-validation will be used to generate the training and testing datasets, and the average error rate across the splits will be used to measure each algorithm's performance. The ultimate goal of this project is to test several classification algorithms on a breast cancer tumor dataset and determine which machine learning model best predicts whether a tumor is benign or malignant.
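The evaluation procedure described above can be illustrated with a minimal sketch in plain Python. It is not the implementation used in this project: the data are synthetic placeholders rather than real tumor measurements, the two features are unnamed stand-ins for properties such as area or concavity, the helper names (k_fold_splits, cv_error_rate, and so on) are hypothetical, and the majority-class baseline is added only as a point of comparison for the K-nearest neighbors classifier.

```python
import random
from collections import Counter

def k_fold_splits(n_samples, n_folds, seed=0):
    """Yield (train_idx, test_idx) pairs that partition range(n_samples)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x."""
    def sq_dist(i):
        return sum((a - b) ** 2 for a, b in zip(train_X[i], x))
    nearest = sorted(range(len(train_X)), key=sq_dist)[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def majority_predict(train_X, train_y, x):
    """Baseline: always predict the most frequent training label."""
    return Counter(train_y).most_common(1)[0][0]

def cv_error_rate(predict, X, y, n_folds=5):
    """Average the per-fold misclassification rate over all folds."""
    fold_errors = []
    for train_idx, test_idx in k_fold_splits(len(X), n_folds):
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        wrong = sum(predict(train_X, train_y, X[i]) != y[i] for i in test_idx)
        fold_errors.append(wrong / len(test_idx))
    return sum(fold_errors) / len(fold_errors)

# Synthetic placeholder data (NOT real patient measurements): two
# well-separated Gaussian clusters standing in for two tumor classes.
rng = random.Random(42)
X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(50)]
X += [[rng.gauss(4.0, 1.0), rng.gauss(4.0, 1.0)] for _ in range(50)]
y = ["benign"] * 50 + ["malignant"] * 50

# Compare classifiers by their average cross-validated error rate.
error_rates = {
    "3-nearest neighbors": cv_error_rate(knn_predict, X, y),
    "majority-class baseline": cv_error_rate(majority_predict, X, y),
}
for name, err in sorted(error_rates.items(), key=lambda kv: kv[1]):
    print(f"{name}: mean error rate = {err:.3f}")
```

Every sample is used for testing exactly once, so the averaged error rate reflects performance on data the classifier did not see during training. In practice, a library such as scikit-learn provides tested implementations of the classifiers named above and of cross-validation, and would likely be preferable to hand-written helpers like these.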