Classification of images is ultimately the main purpose of our project. The most common approach for classifying images or videos is to use convolutional neural networks (CNNs). However, CNNs are a poor fit if we wish to compare feature extraction methods, since CNNs generally learn their own feature representations. We therefore decided to use a more traditional machine learning algorithm: boosting. Boosting is a meta-algorithm that combines an ensemble of weak learners, such as short decision trees, into a strong learner.
The image above shows how the boosting algorithm derives a strong learner. During training, several weak learners (short decision trees, for example) are trained to classify the dataset. As these weak learners are trained, the boosting algorithm learns a weight for each one based on how well it classifies the dataset: if weak learner A does a better job of classifying the dataset than weak learner B, then the weight corresponding to weak learner A will be higher. To classify an example, the boosting algorithm asks each weak learner to classify it, then takes a weighted average of their predictions using the weights learned during training. This weighted average determines the label assigned to the input example.
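As a rough sketch of that prediction step, the snippet below implements the weighted vote for the binary case (labels in {-1, +1}). The names `boosted_predict`, `weak_learners`, and `alphas` are illustrative choices for this sketch, not taken from our actual code:

```python
import numpy as np

def boosted_predict(weak_learners, alphas, X):
    """Weighted vote over weak learners (binary labels in {-1, +1})."""
    # Each row holds one weak learner's predictions for every example.
    votes = np.array([clf.predict(X) for clf in weak_learners])
    # Better learners were given larger alphas during training,
    # so they sway the vote more.
    weighted_sum = np.dot(alphas, votes)
    # The sign of the weighted vote is the final label.
    return np.sign(weighted_sum)
```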
One advantage of boosting is that it is a relatively intuitive way to classify a dataset that is not linearly separable. Each example is treated as a point in N-dimensional space; the image above uses 2-dimensional space, since anything beyond that is difficult to visualize. Each weak learner learns some simple linear decision boundary to separate the data. No single weak learner is expected to separate the data perfectly, but each one learns a different boundary. Boosting combines these boundaries through a weighted average to create a non-linear decision boundary.
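To make this concrete, here is a small demonstration (an illustration of the idea, not our project code) using scikit-learn's AdaBoostClassifier on a toy two-moons dataset; note that the `estimator` parameter is named `base_estimator` in scikit-learn versions before 1.2:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Two interleaved half-circles: no single line separates the classes.
X, y = make_moons(n_samples=1000, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One depth-1 tree ("stump") can only draw a single axis-aligned split...
stump = DecisionTreeClassifier(max_depth=1).fit(X_train, y_train)
print("single stump:", stump.score(X_test, y_test))

# ...but a weighted combination of 100 stumps bends into a
# non-linear decision boundary.
ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1), n_estimators=100
)
ada.fit(X_train, y_train)
print("boosted stumps:", ada.score(X_test, y_test))
```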
In our project, we classify images using a popular boosting algorithm called AdaBoost. It is not obvious how to represent an image as a point in N-dimensional space, and our project explores several ways of accomplishing this; a description of each method is available here. Our classification problem has 29 classes: the 26 letters of the English alphabet along with three other symbols: space, del, and nothing (the absence of a symbol). It stands to reason that by simply guessing we would achieve only about a 3.4% accuracy (1 in 29), which is terrible. The success of our classifier depended on the feature extraction method; the actual results are discussed here, but all of our feature extraction methods performed better than guessing.
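As a minimal sketch of the overall pipeline (under the same scikit-learn version caveat as above), the snippet below runs AdaBoost with decision stumps over feature vectors. The images here are synthetic random noise standing in for real data, and `extract_features` is a placeholder that simply flattens pixels; in the real project this is where one of the feature extraction methods described above would plug in:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

N_CLASSES = 29  # 26 letters + space, del, nothing

def extract_features(img):
    # Placeholder feature extractor: flatten raw pixels into a vector
    # so each image becomes a point in N-dimensional space.
    return np.asarray(img, dtype=float).ravel()

# Synthetic stand-in data: random 32x32 "images" with random labels.
rng = np.random.default_rng(0)
images = rng.random((300, 32, 32))
labels = rng.integers(0, N_CLASSES, size=300)

X = np.array([extract_features(img) for img in images])
X_train, X_test, y_train, y_test = train_test_split(X, labels, random_state=0)

clf = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1), n_estimators=200
)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))  # ~chance on random labels
```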