In the hands-on section, we applied the decision tree classifier to detect malicious phishing websites. Below are some ideas to deepen your understanding and to develop other models on the same dataset:
We used entropy as the criterion for training the model. Instead, you can use the Gini index as the criterion, train the model, and compare the results.
Hint: DecisionTreeClassifier(criterion='gini', random_state=0)
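The comparison above can be sketched as follows. This is a minimal example that trains one tree per criterion and prints the test accuracy of each; it uses a synthetic dataset from `make_classification` as a stand-in, since the phishing dataset from the hands-on section is not loaded here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic binary-classification data standing in for the phishing dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Train one tree with each split criterion and compare test accuracy
for criterion in ("entropy", "gini"):
    clf = DecisionTreeClassifier(criterion=criterion, random_state=0)
    clf.fit(X_train, y_train)
    acc = accuracy_score(y_test, clf.predict(X_test))
    print(f"{criterion}: accuracy = {acc:.3f}")
```

On real data the two criteria often give similar accuracy; the trees they grow can still differ in structure, which is worth inspecting.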
There are many hyperparameters (parameters we set manually before training the model) that we can tune to find the optimal results. For example:
max_depth: the maximum depth of the tree. By default (if no value is given), the tree is expanded until all leaves are pure (all samples in a leaf belong to the same class). We can prune, or stop expanding, by setting an integer value such as 2, 3, or 4.
min_samples_split: the minimum number of samples required to split an internal node. By default, an internal node is split if it contains at least two samples. We can make splitting stricter by increasing the value to 3, 4, 5, 6, and so forth.
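Rather than trying values for max_depth and min_samples_split by hand, you can search over them with cross-validation. The sketch below uses scikit-learn's GridSearchCV (a technique not shown in the hands-on section) on synthetic stand-in data; the candidate values are just examples.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the phishing dataset
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Candidate values for the two hyperparameters discussed above
param_grid = {
    "max_depth": [2, 3, 4, None],          # None = grow until leaves are pure
    "min_samples_split": [2, 3, 4, 5, 6],
}

search = GridSearchCV(
    DecisionTreeClassifier(criterion="entropy", random_state=0),
    param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))
```

GridSearchCV refits the best combination on the full data, so `search` can be used directly for prediction afterwards.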
In the hands-on section, we only calculated the overall accuracy score to evaluate performance. You can also compute the accuracy, precision, recall, and F1 score for each class to examine the model's performance in more depth.
Hint: from sklearn.metrics import classification_report --> classification_report(y_true, y_pred)
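A small example of the hint above, using made-up true and predicted labels (not the real phishing predictions) to show the per-class report:

```python
from sklearn.metrics import classification_report

# Hypothetical labels for a binary task (0 = legitimate, 1 = phishing)
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

# Prints precision, recall, F1 and support for each class,
# plus overall accuracy and macro/weighted averages
print(classification_report(y_true, y_pred))
```

The per-class rows are what reveal problems that overall accuracy hides, such as a model that rarely catches the phishing class.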