Support Vector Machine (SVM) Model
[In Progress]
SVM models are classification models that, in short, linearly separate data into two classes1. A relatively simple-sounding goal, but there's quite a bit that goes into getting a model to do this, and even more if the data isn't already separable in its original dimensions.
Several components go into how the model accomplishes this.
Optimization: maximize the margin while satisfying certain constraints
Transform the data so that it becomes linearly separable (cast it into higher dimensions)
Tools used to do this:
Lagrangian multipliers
Kernel transformation functions
(the result of) dot products2 between "data vectors" that may or may not be cast into higher dimensions
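As a small illustrative sketch (not from the original page) of why those dot products matter: a kernel function returns the dot product of two data vectors *as if* they had been cast into a higher-dimensional space, without ever computing that cast explicitly. The degree-2 polynomial kernel and the explicit feature map below are standard textbook choices, used here only to demonstrate the equivalence.

```python
import numpy as np

# Kernel trick sketch: for 2-D points, the degree-2 polynomial kernel
# K(x, z) = (x . z)^2 equals an ordinary dot product after casting each
# point into 3-D via phi(x) = [x1^2, sqrt(2)*x1*x2, x2^2].

def phi(x):
    """Explicitly cast a 2-D point into the higher-dimensional space."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly_kernel(x, z):
    """Degree-2 polynomial kernel, computed directly from the dot product."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

print(poly_kernel(x, z))                                   # 121.0
print(np.allclose(poly_kernel(x, z), np.dot(phi(x), phi(z))))  # True
```

The kernel gets the same number as the explicit cast, which is why the model can work in "higher dimensions" while only ever evaluating dot products.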
There's actually quite a bit that goes into understanding why the model uses the formulas and tools it does (one might consider it mathematically heavy), but it's pretty fascinating.
The whole story is too long to include in a simple "overview" section of this page, so the following are some resources for going through that journey if you wish.
Resources made by my professor, Dr. Ami Gates3, on her webpage: gatesboltonanalytics.com/?page_id=304
An explanation subpage I made as a form of learning "output" (based on my understanding of my professor's resources above): SVM Detailed Explanation [still under construction]
---
Footnotes:

Used the 'temp-classification-test2.csv' data that I cleaned up and created on the Classification Data Prep page.
Used 'TAVG_label' as the target label. Everything else (except 'year') was used as the data.
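A hedged sketch of that setup: the file name and the 'TAVG_label'/'year' columns come from this page, but the tiny stand-in frame below (and its 'PRCP'/'WIND' feature columns) is made up for illustration, since the real cleaned data lives on the Classification Data Prep page.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for the real 'temp-classification-test2.csv'; only the
# 'TAVG_label' and 'year' columns match the page, the rest are placeholders.
df = pd.DataFrame({
    "year":       [2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022],
    "PRCP":       [0.1, 0.4, 0.2, 0.8, 0.3, 0.5, 0.6, 0.2],   # placeholder feature
    "WIND":       [3.2, 4.1, 2.8, 5.0, 3.9, 4.4, 3.1, 4.8],   # placeholder feature
    "TAVG_label": ["Above", "Below", "Above", "Extreme-above",
                   "Above", "Below", "Above", "Extreme-above"],
})

# 'TAVG_label' is the target; everything else except 'year' is a feature.
X = df.drop(columns=["TAVG_label", "year"])
y = df["TAVG_label"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)  # (4, 2) (4, 2)
```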
The following figures are the confusion matrix results for each evaluated kernel type.
A linear, polynomial (of varying degrees) and rbf kernel were tested and evaluated.
Linear Kernel
Polynomial Kernel (degree=5)
Polynomial Kernel (degree = 8)
rbf Kernel
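The kernel comparison above can be sketched roughly as follows. This is a minimal stand-in, not the page's actual code: it runs on synthetic data from `make_classification` rather than the weather features, so its numbers will not match the figures.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, accuracy_score

# Synthetic 4-class stand-in for the weather data used on this page.
X, y = make_classification(n_samples=400, n_features=6, n_informative=4,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Same kernel lineup as the figures: linear, polynomial (two degrees), rbf.
kernels = [
    ("linear",     SVC(kernel="linear")),
    ("poly deg=5", SVC(kernel="poly", degree=5)),
    ("poly deg=8", SVC(kernel="poly", degree=8)),
    ("rbf",        SVC(kernel="rbf")),
]

for name, model in kernels:
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(name, round(accuracy_score(y_test, pred), 3))
    print(confusion_matrix(y_test, pred))
```

Each confusion matrix is a 4x4 grid (one row per true class, one column per predicted class), which is the format the figures above display.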
Technically, various degree values were tested for the polynomial kernel, specifically degrees ranging from 2 to 10. The two selected in the figures above were the ones that seemed to have the most interesting results. However, below is a carousel of all tested polynomial kernel degrees.
One would probably notice that the results start to saturate and look similar with degrees 7 through 10.
Much like the other classification models so far, the SVM models don't seem to predict the "Below" class in particular very well, if at all.
In a very broad sense, all kernels performed more or less similarly (no drastic differences; the general "shapes" in the confusion matrices are relatively similar). Once again, this might be telling me something about how the data was discretized (and the data set was relatively small, too).
If I were to nit-pick the results, I could say the following.
The linear kernel appears to predict "Extreme-above" better than the rest of the kernels.
All kernels predict the "Above" class more than any other class (again, this probably says something about the balance in the training data).
The results for the varying polynomial kernel degrees start to saturate and look similar with degrees 7 through 10.
The kernel with the most "evenly" distributed set of correctly predicted classes is the polynomial kernel of degree = 5.
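The class-balance hunch above is easy to check before blaming the kernels. A hedged sketch, with made-up counts (only the label names come from this page):

```python
import pandas as pd

# If 'Above' dominates the training labels, every kernel will lean toward
# predicting it regardless of the features. The counts below are invented
# for illustration; only the label names match the page.
y_train = pd.Series(["Above"] * 50 + ["Extreme-above"] * 12 + ["Below"] * 5)

# Proportion of each class in the training labels.
print(y_train.value_counts(normalize=True).round(2))
```

A heavily skewed distribution here would explain both the over-prediction of "Above" and the near-total miss on "Below".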