Decision Trees (DTs) are a supervised learning method used for classification and regression. They split data into branches based on feature values, building a tree-like structure in which each internal node represents a decision rule and each leaf node represents an outcome or prediction. DTs are widely used in classification tasks because they are interpretable, handle non-linear relationships well, and do not require feature scaling.
Three key concepts underlie decision trees: Gini impurity, entropy, and information gain. Gini impurity measures the impurity of a split; a lower Gini value indicates a cleaner, better split. Entropy measures the uncertainty of the class distribution; its value is higher when the class distribution is more uncertain. Finally, information gain quantifies the "goodness" of a split as the reduction in impurity from the parent node to its child nodes; a higher information gain implies a more effective split.
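To make these three measures concrete, here is a minimal sketch (plain NumPy, not tied to the patent data; the split shown is hypothetical) that computes Gini impurity, entropy, and the information gain of a binary split:

```python
# Minimal sketch of the three impurity measures.
# Gini and entropy are computed from class proportions; information gain is
# the parent impurity minus the size-weighted impurity of the child nodes.
import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right, impurity=entropy):
    n = len(parent)
    weighted_children = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted_children

# Hypothetical example: a split that separates the two classes fairly well
# yields a high information gain.
parent = np.array(["B2"] * 6 + ["B1"] * 4)
left, right = parent[:5], parent[5:]
print(gini(parent), entropy(parent), information_gain(parent, left, right))
```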
For this, we use the same binary "patent_kind" target variable (values B1 and B2) and three randomly picked continuous features, "patent_num_claims", "detail_desc_length", and "patent_processing_time", as inputs. The data are split into training and testing sets using an 80%/20% split, and sample rows are displayed to confirm the data preparation, as sketched below.
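A sketch of this preparation step, assuming the data sit in a CSV file (the file name "patents.csv" and the random seed are assumptions; the column names are the ones used in this section):

```python
# Sketch of the data preparation described above.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("patents.csv")  # hypothetical file name

features = ["patent_num_claims", "detail_desc_length", "patent_processing_time"]
X = df[features]
y = df["patent_kind"]            # binary target: B1 vs. B2

# 80%/20% train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)

print(X_train.head())            # sample rows to confirm the preparation
print(y_train.value_counts())
```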
For this dataset, adding or dropping different columns for each decision tree (and for the other methods tried) kept producing the same root node, so the first split was adjusted manually to force a different root node for each tree.
Tree 1 used Gini as the criterion with a maximum depth of 3, achieving an accuracy of 83.08%.
Tree 2 used entropy as the criterion with a maximum depth of 3, achieving an accuracy of 83.91%.
Tree 3 used Gini as the criterion with a maximum depth of 3 and a minimum sample split of 20, achieving an accuracy of 75.41%.
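A sketch of how these three trees can be fit with scikit-learn, continuing from the train/test split above (hyperparameters follow the text; the random seed is an added assumption for reproducibility):

```python
# The three decision trees described above, trained and scored on the held-out set.
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

trees = {
    "Tree 1 (Gini, depth 3)": DecisionTreeClassifier(
        criterion="gini", max_depth=3, random_state=42),
    "Tree 2 (Entropy, depth 3)": DecisionTreeClassifier(
        criterion="entropy", max_depth=3, random_state=42),
    "Tree 3 (Gini, depth 3, min split 20)": DecisionTreeClassifier(
        criterion="gini", max_depth=3, min_samples_split=20, random_state=42),
}

for name, clf in trees.items():
    clf.fit(X_train, y_train)
    acc = accuracy_score(y_test, clf.predict(X_test))
    print(f"{name}: accuracy = {acc:.2%}")
```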
Tree 1 (Gini) performs evenly, with a good balance of correct classifications for both classes. Tree 2 (Entropy) shows slightly improved performance, particularly for one of the classes, reflecting its higher accuracy. Tree 3 (Gini, min split = 20) has lower accuracy, with a noticeable increase in misclassifications due to the minimum split constraint.
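The per-class behavior behind this comparison can be read off a confusion matrix for each tree, continuing from the fitted models above:

```python
# Confusion matrix for each tree (rows = true class, columns = predicted class).
from sklearn.metrics import confusion_matrix

for name, clf in trees.items():
    cm = confusion_matrix(y_test, clf.predict(X_test), labels=["B1", "B2"])
    print(name)
    print(cm)
```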
The three Decision Trees show similar performance, with a slight improvement in accuracy for the second (entropy) model. Increasing depth and altering split parameters generally increase the model's ability to capture complexity but may risk overfitting. The root nodes and branching criteria in these models highlight different features for classification, providing insight into feature importance.
Decision Trees classified the patent kinds effectively at moderate depth and achieved good accuracy. These models show that the number of claims, description length, and processing time are reasonably predictive for distinguishing between B1 and B2 patent types. The Decision Tree structure also offers interpretability, with each split indicating a feature-based decision criterion.
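One way to inspect those feature-based criteria, continuing from the fitted models above, is to export the learned rules of a tree as text (a graphical view is also possible with scikit-learn's plot_tree):

```python
# Print the split rules of Tree 1 using the feature names defined earlier.
from sklearn.tree import export_text

rules = export_text(trees["Tree 1 (Gini, depth 3)"], feature_names=features)
print(rules)
```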