Support Vector Machine (SVM) Model
[In Progress]
SVM models are classification models that, in short, linearly separate data into two classes1. A relatively simple-sounding goal, but there's quite a bit that goes into getting a model to do this, and even more if the data isn't already separable in its original dimensions.
Several components go into how the model accomplishes this.
Optimization: maximize the margin while satisfying certain constraints
Transform the data so that it becomes linearly separable (cast it into higher dimensions)
Tools used to do this:
Lagrangian multipliers
Kernel transformation functions
(the result of) dot products2 between "data vectors" that may or may not be cast into higher dimensions
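As a small illustrative sketch (not from the original page) of why those dot products matter: a kernel function returns the dot product of two data vectors *as if* they had been cast into a higher-dimensional space, without ever computing that cast explicitly. The degree-2 polynomial kernel and the explicit feature map below are standard textbook choices, used here only to demonstrate the equivalence.

```python
import numpy as np

# Kernel trick sketch: for 2-D points, the degree-2 polynomial kernel
# K(x, z) = (x . z)^2 equals an ordinary dot product after casting each
# point into 3-D via phi(x) = [x1^2, sqrt(2)*x1*x2, x2^2].

def phi(x):
    """Explicitly cast a 2-D point into the higher-dimensional space."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly_kernel(x, z):
    """Degree-2 polynomial kernel, computed directly from the dot product."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

print(poly_kernel(x, z))                                   # 121.0
print(np.allclose(poly_kernel(x, z), np.dot(phi(x), phi(z))))  # True
```

The kernel gets the same number as the explicit cast, which is why the model can work in "higher dimensions" while only ever evaluating dot products.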
There's actually quite a bit that goes into understanding why the model uses the formulas and tools it does (one might consider it mathematically heavy), but it's pretty fascinating.
The whole story is too long to include in a simple "overview" section of this page, so the following are some resources for going through that journey if you wish.
Resources made by my professor, Dr. Ami Gates3, on her webpage: gatesboltonanalytics.com/?page_id=304
An explanation subpage I made as a form of learning "output" (based on my understanding of my professor's resources above): SVM Detailed Explanation [still under construction]
---
Footnotes:

Used the 'temp-classification-test2.csv' data that I cleaned up and created on the Classification Data Prep page.
Used 'TAVG_label' as the target label. Everything else (except 'year') was used as the data.
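A hedged sketch of that setup: the file name and the 'TAVG_label'/'year' columns come from this page, but the tiny stand-in frame below (and its 'PRCP'/'WIND' feature columns) is made up for illustration, since the real cleaned data lives on the Classification Data Prep page.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for the real 'temp-classification-test2.csv'; only the
# 'TAVG_label' and 'year' columns match the page, the rest are placeholders.
df = pd.DataFrame({
    "year":       [2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022],
    "PRCP":       [0.1, 0.4, 0.2, 0.8, 0.3, 0.5, 0.6, 0.2],   # placeholder feature
    "WIND":       [3.2, 4.1, 2.8, 5.0, 3.9, 4.4, 3.1, 4.8],   # placeholder feature
    "TAVG_label": ["Above", "Below", "Above", "Extreme-above",
                   "Above", "Below", "Above", "Extreme-above"],
})

# 'TAVG_label' is the target; everything else except 'year' is a feature.
X = df.drop(columns=["TAVG_label", "year"])
y = df["TAVG_label"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)  # (4, 2) (4, 2)
```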
The following figures are the confusion matrix results for each evaluated kernel type.
A linear, polynomial (of varying degrees) and rbf kernel were tested and evaluated.
Linear Kernel
Polynomial Kernel (degree=5)
Polynomial Kernel (degree = 8)
rbf Kernel
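The kernel comparison above can be sketched roughly as follows. This is a minimal stand-in, not the page's actual code: it runs on synthetic data from `make_classification` rather than the weather features, so its numbers will not match the figures.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, accuracy_score

# Synthetic 4-class stand-in for the weather data used on this page.
X, y = make_classification(n_samples=400, n_features=6, n_informative=4,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Same kernel lineup as the figures: linear, polynomial (two degrees), rbf.
kernels = [
    ("linear",     SVC(kernel="linear")),
    ("poly deg=5", SVC(kernel="poly", degree=5)),
    ("poly deg=8", SVC(kernel="poly", degree=8)),
    ("rbf",        SVC(kernel="rbf")),
]

for name, model in kernels:
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(name, round(accuracy_score(y_test, pred), 3))
    print(confusion_matrix(y_test, pred))
```

Each confusion matrix is a 4x4 grid (one row per true class, one column per predicted class), which is the format the figures above display.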
Technically, various degree values were tested for the polynomial kernel, specifically degrees ranging from 2 to 10. The two selected in the figures above were the ones that seemed to have the most interesting results. However, below is a carousel of all tested polynomial kernel degrees.
One would probably notice that the results start to saturate and look similar with degrees 7 through 10.
Much like the other classification models so far, the SVM models don't seem to predict the "Below" class in particular very well, if at all.
In a very broad sense, all kernels performed more or less similarly (no drastic differences; the general "shapes" in the confusion matrices are relatively similar). Once again, this might be telling me something about how the data was discretized (and the data set was relatively small, too).
If I were to nit-pick the results, I could say the following.
The linear kernel appears to predict "Extreme-above" better than the rest of the kernels.
All kernels predict the "Above" class more than any other class (again, this probably says something about the balance in the training data).
The results for the varying polynomial kernel degrees start to saturate and look similar with degrees 7 through 10.
The kernel with the most "evenly" distributed set of correctly predicted classes is the polynomial kernel of degree = 5.
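The class-balance hunch above is easy to check before blaming the kernels. A hedged sketch, with made-up counts (only the label names come from this page):

```python
import pandas as pd

# If 'Above' dominates the training labels, every kernel will lean toward
# predicting it regardless of the features. The counts below are invented
# for illustration; only the label names match the page.
y_train = pd.Series(["Above"] * 50 + ["Extreme-above"] * 12 + ["Below"] * 5)

# Proportion of each class in the training labels.
print(y_train.value_counts(normalize=True).round(2))
```

A heavily skewed distribution here would explain both the over-prediction of "Above" and the near-total miss on "Below".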