In this step, we used a feature extraction algorithm to select the image features that best distinguished between active and inactive cells. This allowed us to later remove all insignificant features, which vastly improved the predictive power of the ML algorithm.
In computer vision, a feature refers to a measurable piece of data in an image that is distinctive to a specific object, in our case an infected cell. Features can include shapes, lines, edges, and distinct colours, but they are often difficult to identify by eye.
Image Description: A feature extraction algorithm identifies the key components that define a motorcycle, which will later be used in classification.
In our case, feature extraction was much more complicated than the above example, which is why we had to use a feature extraction algorithm to extract the best features for us.
There are a few characteristics that all good features possess:
Invariant to rotation, scaling, and changes in illumination
Repeatable
Precise
Distinctive
Strongly influential on the output of the classifier
We tested two methods of feature extraction:
Raw Pixel Feature Vector Extraction involves directly extracting pixel values from each image and appending them one after the other to create a long feature vector. In this case, each pixel represents a feature.
We decided to convert the image to grayscale before running the extraction algorithm, which allowed each pixel value to consist of a single number, as opposed to the three numbers that represent colours in RGB images. This shortened the feature vector.
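As a minimal sketch of this idea (not our exact pipeline; the image size and conversion weights here are illustrative), a raw pixel feature vector can be built by converting to grayscale and flattening the result:

```python
import numpy as np

def image_to_feature_vector(rgb_image):
    """Convert an RGB image to grayscale, then flatten it into a 1-D feature vector."""
    # Standard luminance weights: each pixel becomes one number instead of three
    gray = (0.299 * rgb_image[..., 0]
            + 0.587 * rgb_image[..., 1]
            + 0.114 * rgb_image[..., 2])
    # Append the rows one after the other to form the long feature vector
    return gray.flatten()

# A hypothetical 4x4 RGB image: 48 raw values become 16 grayscale features
image = np.random.rand(4, 4, 3)
features = image_to_feature_vector(image)
print(features.shape)  # (16,)
```

Without the grayscale step, flattening the same image would produce 48 features instead of 16, which illustrates why the conversion shortens the vector.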
Edge Feature Extraction involves extracting edges within the image as features and using those edges as the input for the machine learning model. Edges are defined as sharp changes in colour and image intensity.
Filters can also be applied in edge detection to enhance image properties and make the edges more visible. Filters modify the pixel values of images, lightening or darkening certain parts of the image, but they do not change the pixel positions. Some examples of filters include Sobel, Canny, and Prewitt. We used the Sobel filter in the code above.
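A minimal NumPy sketch of Sobel edge detection (our actual code may differ; libraries such as scikit-image provide `skimage.filters.sobel` with proper border handling):

```python
import numpy as np

def sobel_edges(gray):
    """Compute the Sobel gradient magnitude of a 2-D grayscale image."""
    # Sobel kernels: kx responds to vertical edges, ky to horizontal edges
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = gray.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = gray[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(patch * kx)
            gy[i, j] = np.sum(patch * ky)
    # Gradient magnitude: large values mark sharp intensity changes (edges)
    return np.sqrt(gx ** 2 + gy ** 2)

# A synthetic image with a vertical step edge down the middle
img = np.zeros((8, 8))
img[:, 4:] = 1.0
edges = sobel_edges(img)
```

The filter responds strongly only where the pixel intensity changes sharply, which is exactly the definition of an edge given above.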
Different filters produce very different edge detection results. This is what edge detection using the Canny filter would look like:
We also applied both horizontal and vertical filters to the images. The following images of puppies demonstrate what an edge feature extraction would look like with a horizontal filter and a vertical filter individually:
Horizontal filter applied:
Vertical filter applied:
Both filters applied:
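The effect of applying the horizontal and vertical filters separately, then combining them, can be sketched as follows (a toy example with standard Sobel kernels, not our production code):

```python
import numpy as np
from scipy.ndimage import convolve

# Standard Sobel kernels: sobel_x responds to vertical edges,
# sobel_y (its transpose) responds to horizontal edges
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
sobel_y = sobel_x.T

# A synthetic image with a single horizontal step edge
img = np.zeros((6, 6))
img[3:, :] = 1.0

gx = convolve(img, sobel_x)    # vertical-edge response: ~0 for this image
gy = convolve(img, sobel_y)    # horizontal-edge response: strong along the step
both = np.hypot(gx, gy)        # "both filters applied": combined gradient magnitude
```

Here the vertical filter finds nothing because the image only contains a horizontal edge, which mirrors how the puppy images above look very different under each filter individually.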
We preferred edge feature extraction for visualization purposes because it allowed us to better observe the features that would influence the output of the machine learning model, as can be seen in the cell edge detection images. In contrast, the long feature vector produced through Raw Pixel Feature Vector Extraction was relatively difficult to interpret, so we chose not to use it. For our classification model, we decided to use the feature embeddings already in the dataset, as these had already been selected as the best features.