In this part of the project, after feature extraction and selection, we tried different types of classifiers to find the one that best suits our problem. We are looking for the classifier that maximizes the true positive rate and, equivalently, minimizes the false negative rate. The classifiers we tried in this project are listed below.
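To make the two rates concrete, the short sketch below computes them from confusion-matrix counts, treating packed as the positive class. The counts are hypothetical and chosen only for illustration; they are not results from this project. Because both rates share the same denominator, they always sum to one.

```python
def rates(tp, fn):
    """Return (true positive rate, false negative rate) for the positive class."""
    positives = tp + fn
    if positives == 0:
        raise ValueError("no positive instances")
    return tp / positives, fn / positives


# Hypothetical counts: 95 packed files detected, 5 packed files missed.
tpr, fnr = rates(tp=95, fn=5)
print(f"TPR = {tpr:.2f}, FNR = {fnr:.2f}")
```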
1) Decision Tree (C4.5):
The Decision Tree classifier predicts the value of a target variable from several input variables. The basic idea is to break up a complex decision into a union of several simpler decisions. The tree consists of a root node, intermediate nodes, leaf nodes and edges. Each internal node tests one of the features, each edge represents the path to follow for a given outcome of that test, and each leaf node assigns a class. The tree design consists of two stages. In the first stage, we decide on the node splitting criterion, which involves specifying the split condition and evaluating the quality of that decision. In the second stage, we define the stopping criterion, which determines the depth of the tree, that is, when to stop growing it. The simplest choice for the second stage is to keep splitting until no features remain. The decision tree has many advantages such as:
-Inexpensive to construct
-Fast to classify new instances
-Easy to visualize and interpret
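The node splitting criterion described above can be illustrated with a small sketch. C4.5 is generally described as ranking candidate splits by gain ratio, which is the information gain divided by the split information. The code below computes both for a binary threshold split on one numeric feature. The data and the threshold are hypothetical, and this is a simplified illustration rather than Weka's implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def split_scores(values, labels, threshold):
    """Information gain and gain ratio of the binary split value <= threshold."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0, 0.0  # the split does not separate anything
    total = len(labels)
    weights = [len(left) / total, len(right) / total]
    remainder = weights[0] * entropy(left) + weights[1] * entropy(right)
    gain = entropy(labels) - remainder
    split_info = -sum(w * math.log2(w) for w in weights)
    return gain, gain / split_info


# Hypothetical toy data: one numeric feature and a packed / not_packed label.
values = [5.1, 5.8, 6.2, 6.9, 7.4, 7.8]
labels = ["not_packed", "not_packed", "not_packed", "packed", "packed", "packed"]
gain, ratio = split_scores(values, labels, threshold=6.5)
print(gain, ratio)
```

In this toy case the threshold separates the two classes perfectly, so the gain equals the entropy of the unsplit labels.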
In the project we used the decision tree implemented in Weka, which uses the C4.5 algorithm. We have 4493 instances, each with 10 attributes (after feature selection using Information Gain). The resulting tree has 13 leaves and a size of 25. The tree has a very high accuracy of 99.5771%. We observed that very few packed instances are classified as unpacked, which means the false negative rate is very small, around 0.001112842. The resulting tree is shown in the figure below:
2) Rule Based Classifier:
In this classifier the classification is based on if-then conditions, which together are called the rule set. The rules must have two properties. First, the rules should be mutually exclusive, so that no instance triggers more than one rule. Second, the rules should be exhaustive, so that every instance is covered by at least one rule. The rule based classifier is very similar to the tree based classifier, and it is very easy to convert a tree into a rule based classifier. In fact, to guarantee the two properties above, it is recommended to construct a tree and then convert it to a set of rules. However, a rule based classifier has the advantage over a decision tree that its rules can be simplified.
The rule based classifier has many advantages such as:
-As expressive as a decision tree
-Easy to interpret
-Easy to generate
-Classifies new instances rapidly
-Comparable performance to a decision tree
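The conversion from a tree to a rule set mentioned earlier in this section can be sketched as follows: every root-to-leaf path becomes one rule, and because the paths partition the feature space, the resulting rules are mutually exclusive and exhaustive. The nested-dictionary tree format, the thresholds and the function name are hypothetical and serve only as an illustration.

```python
def tree_to_rules(node, conditions=()):
    """Return one (conditions, label) rule per root-to-leaf path."""
    if "label" in node:  # leaf node: emit the accumulated conditions
        return [(list(conditions), node["label"])]
    feature, threshold = node["feature"], node["threshold"]
    rules = []
    rules += tree_to_rules(node["left"],
                           conditions + (f"{feature} <= {threshold}",))
    rules += tree_to_rules(node["right"],
                           conditions + (f"{feature} > {threshold}",))
    return rules


# Hypothetical two-level tree; thresholds are made up for illustration.
toy_tree = {
    "feature": "FE", "threshold": 6.5,
    "left": {"feature": "IAT", "threshold": 10,
             "left": {"label": "packed"},
             "right": {"label": "not_packed"}},
    "right": {"label": "packed"},
}

toy_rules = tree_to_rules(toy_tree)
for conds, label in toy_rules:
    print(" and ".join(conds), "=>", label)
```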
In the project we used the rule based classifier implemented by Weka (JRip). Weka uses the RIPPER method, which learns one rule at a time and removes the instances covered by that rule before learning the next. We got a classifier with 11 rules before pruning and 8 rules after pruning, with an accuracy of 99.3991%. We observed that this classifier, just like the decision tree, has a very low false negative rate, around 0.003338527. The rules we got are the following:
JRIP rules:
===========
(FE <= 6.685571) and (IAT >= 17) and (NSS <= 1) => class=not_packed (2080.0/2.0)
(FE <= 5.900778) and (RWX <= 0) and (FE <= 5.209362) => class=not_packed (66.0/0.0)
(IAT >= 30) and (HE >= 1.358913) and (RWX <= 0) => class=not_packed (55.0/0.0)
(IAT >= 15) and (RWX <= 0) and (CE >= 6.280178) and (CE <= 6.636319) => class=not_packed (16.0/1.0)
(FE <= 6.750366) and (IAT <= 1) => class=not_packed (6.0/1.0)
(IAT >= 169) => class=not_packed (5.0/0.0)
(SS >= 4) and (FE <= 6.628477) => class=not_packed (3.0/0.0)
=> class=packed (2262.0/4.0)
Number of Rules : 8
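To show how such a rule list classifies a new instance, the sketch below transcribes the JRIP rules above into Python and applies them in order, assuming that the first matching rule wins and that the last rule is the default. The two test instances are hypothetical feature values, not samples from our data set.

```python
import operator

OPS = {"<=": operator.le, ">=": operator.ge}

# Rules transcribed from the JRIP output: (conditions, predicted class).
RULES = [
    ([("FE", "<=", 6.685571), ("IAT", ">=", 17), ("NSS", "<=", 1)], "not_packed"),
    ([("FE", "<=", 5.900778), ("RWX", "<=", 0), ("FE", "<=", 5.209362)], "not_packed"),
    ([("IAT", ">=", 30), ("HE", ">=", 1.358913), ("RWX", "<=", 0)], "not_packed"),
    ([("IAT", ">=", 15), ("RWX", "<=", 0), ("CE", ">=", 6.280178),
      ("CE", "<=", 6.636319)], "not_packed"),
    ([("FE", "<=", 6.750366), ("IAT", "<=", 1)], "not_packed"),
    ([("IAT", ">=", 169)], "not_packed"),
    ([("SS", ">=", 4), ("FE", "<=", 6.628477)], "not_packed"),
    ([], "packed"),  # default rule: fires when nothing above matches
]


def classify(instance):
    """Return the class of the first rule whose conditions all hold."""
    for conditions, label in RULES:
        if all(OPS[op](instance[name], value) for name, op, value in conditions):
            return label


# Hypothetical instances, made up for illustration.
low_entropy = {"FE": 6.0, "IAT": 20, "NSS": 1, "RWX": 0, "HE": 2.0, "CE": 6.0, "SS": 3}
high_entropy = {"FE": 7.5, "IAT": 3, "NSS": 5, "RWX": 1, "HE": 1.0, "CE": 7.0, "SS": 2}
print(classify(low_entropy), classify(high_entropy))
```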
3) Naive Bayes Classifier:
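This classifier is a probabilistic classifier that is based on Bayes' theorem and assumes conditional independence between the features given the class. The classifier tries to find the parameters \mu and \pi that maximize the likelihood of the training data under the following model:
\[ p(y) = \mu^{y} (1 - \mu)^{1 - y} \]
\[ p(x_j \mid y = k) = \pi_{jk}^{x_j} (1 - \pi_{jk})^{1 - x_j} \]
where, by the independence assumption, the joint likelihood of an instance x = (x_1, ..., x_d) with class y = k is:
\[ p(x, y = k) = p(y = k) \prod_{j=1}^{d} p(x_j \mid y = k) \]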
The Naive Bayes classifier has several advantages: it is not very sensitive to noise, and it can capture the inherent uncertainty in the data and in the prediction task.
We used the Naive Bayes classifier implemented by Weka. Weka uses a normal distribution to model numeric attributes. We got an accuracy of 98.442% and a false negative rate of around 0.003783663. This false negative rate is very close to that of the rule based classifier.
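The idea of modeling each numeric attribute with a normal distribution can be sketched in a few lines. The code below estimates a mean and variance per class and per feature, then picks the class with the highest log prior plus summed log likelihoods. It is a minimal illustration under our own assumptions (function names, variance floor and toy data are all hypothetical), not Weka's implementation.

```python
import math
from collections import defaultdict


def fit(rows, labels):
    """Estimate class priors and per-feature (mean, variance) for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column)
            stats.append((mean, max(var, 1e-9)))  # floor avoids division by zero
        model[label] = (len(members) / len(rows), stats)
    return model


def log_gaussian(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict(model, row):
    """Return the class with the highest log prior plus summed log likelihoods."""
    def score(label):
        prior, stats = model[label]
        return math.log(prior) + sum(
            log_gaussian(x, mean, var) for x, (mean, var) in zip(row, stats))
    return max(model, key=score)


# Hypothetical toy data with two numeric features.
rows = [[5.0, 40.0], [5.5, 35.0], [6.0, 50.0], [7.5, 3.0], [7.8, 5.0], [7.2, 2.0]]
labels = ["not_packed"] * 3 + ["packed"] * 3
model = fit(rows, labels)
print(predict(model, [7.6, 4.0]), predict(model, [5.4, 42.0]))
```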
4) Support Vector Machine:
The SVM classifier tries to find a hyperplane that separates the data. As we can see from the figure below, in the linear case the idea is to find the best hyperplane that separates the data perfectly. The SVM maximizes the margin, that is, the distance between the hyperplane and the closest training points, which is a constrained optimization problem. The black points in the graph are known as the support vectors.
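The margin and the support vectors can be made concrete with a small numeric sketch. For a separating hyperplane w . x + b = 0, scaled so that the closest points satisfy y (w . x + b) = 1, the margin width is 2 / ||w|| and the support vectors are the points where that constraint holds with equality. The symbols w and b, the data points and the hand-picked hyperplane below are assumptions made for illustration; no optimization is performed here.

```python
import math


def functional_margin(w, b, x, y):
    """y * (w . x + b); at least 1 for every point under the hard-margin constraints."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)


def margin_width(w):
    """Distance between the two margin hyperplanes, 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical 2-D data with labels +1 / -1 and a hand-picked hyperplane.
points = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((0.0, 0.0), -1), ((-1.0, 0.0), -1)]
w, b = (0.5, 0.5), -1.0

margins = [functional_margin(w, b, x, y) for x, y in points]
support_vectors = [x for (x, y), m in zip(points, margins) if abs(m - 1.0) < 1e-9]
print(margins, support_vectors, margin_width(w))
```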
We used the SVM classifier implemented by Weka. Weka implements John Platt's sequential minimal optimization algorithm for training a support vector classifier. This implementation does the following:
-Replaces all missing values and transforms nominal attributes into binary ones
-Normalizes all attributes
-Solves multi-class problems using pairwise classification
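The preprocessing steps listed above can be sketched as follows. The code assumes that normalization means min-max scaling to [0, 1] and that nominal attributes become one 0/1 indicator per distinct value. Both are assumptions made for illustration, as are the function names, and none of this is Weka's code. The last helper enumerates the class pairs used by pairwise classification, which reduces to a single pair for a two-class problem like ours.

```python
from itertools import combinations


def min_max_normalize(column):
    """Rescale a numeric column to [0, 1]; a constant column maps to all zeros."""
    low, high = min(column), max(column)
    if high == low:
        return [0.0 for _ in column]
    return [(v - low) / (high - low) for v in column]


def nominal_to_binary(column):
    """Turn a nominal column into one 0/1 indicator column per distinct value."""
    return {value: [1 if v == value else 0 for v in column]
            for value in sorted(set(column))}


def pairwise_problems(classes):
    """List the class pairs that pairwise classification trains a classifier for."""
    return list(combinations(sorted(classes), 2))


# Hypothetical inputs, made up for illustration.
print(min_max_normalize([2.0, 4.0, 6.0]))
print(nominal_to_binary(["a", "b", "a"]))
print(pairwise_problems(["packed", "not_packed"]))
```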
The accuracy of the classifier was 98.0414% and its false negative rate was around 0.007789895.