Layman's explanation
Feature importance refers to techniques that assign a score to input features based on how useful they are at predicting a target variable. These scores can guide feature selection and improve model quality; this document explains the main ideas.
There are many types and sources of feature importance scores; popular examples include statistical correlation scores, coefficients calculated as part of linear models, decision-tree importances, and permutation importance scores.
The relative scores can highlight which features may be most relevant to the target and, conversely, which are least relevant. A domain expert can interpret these scores and use them as the basis for gathering more or different data.
Inspecting the importance scores provides insight into a specific model: which features it relies on most, and least, when making a prediction.
The scores can also drive feature selection: delete the features with the lowest scores, or keep only those with the highest scores.
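Importance-driven selection can be sketched as follows. This is a minimal illustration using scikit-learn's `SelectFromModel` on a synthetic dataset; the dataset, model, and threshold are assumptions for the example, not prescribed by this document.

```python
# Sketch: keep only the features whose importance exceeds the mean importance.
# Synthetic regression data: 10 features, of which only 3 actually matter.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       random_state=0)

# SelectFromModel fits the estimator, reads its importance scores, and
# drops every feature below the chosen threshold.
selector = SelectFromModel(RandomForestRegressor(random_state=0),
                           threshold="mean")
X_selected = selector.fit_transform(X, y)

print(X.shape, "->", X_selected.shape)  # fewer columns remain after selection
```

The same pattern works with any estimator that exposes `feature_importances_` or `coef_` after fitting.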
One general approach: a feature is "important" if shuffling its values increases the model error, because in that case the model relied on the feature for the prediction. A feature is "unimportant" if shuffling its values leaves the model error unchanged, because in that case the model ignored the feature for the prediction.
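The shuffling idea can be hand-rolled in a few lines. The sketch below assumes a fitted scikit-learn-style model with a `.score()` method (R² here); the dataset is synthetic and only illustrative.

```python
# Sketch of permutation importance by hand: shuffle one column at a time
# and measure how much the score drops relative to the unshuffled baseline.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=300, n_features=5, n_informative=2,
                       random_state=0)
model = LinearRegression().fit(X, y)
baseline = model.score(X, y)  # R^2 on unshuffled data

rng = np.random.default_rng(0)
drops = []
for j in range(X.shape[1]):
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])  # break feature-target link
    drops.append(baseline - model.score(X_perm, y))
    print(f"feature {j}: score drop = {drops[-1]:.3f}")
```

Features the model relies on show a large score drop when shuffled; ignored features show a drop near zero.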
This approach is known as permutation feature importance; see the Interpretable Machine Learning book link in the references below for more detail.
Some models also expose importance scores directly: the random forest algorithm reports a per-feature importance score, and LASSO's coefficients serve the same purpose (see the example score charts linked in the references below).
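Both built-in scores are one attribute away in scikit-learn. The sketch below uses synthetic data; the `alpha` value is an illustrative assumption, and exact scores will vary with the data.

```python
# Sketch: built-in importance scores from a random forest and from LASSO.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=6, n_informative=2,
                       random_state=1)

# Random forest: impurity-based importances, normalized to sum to 1.
rf = RandomForestRegressor(random_state=1).fit(X, y)
print("random forest:", rf.feature_importances_.round(3))

# LASSO: the L1 penalty shrinks unhelpful coefficients toward zero,
# so the coefficient magnitudes double as importance scores.
lasso = Lasso(alpha=1.0).fit(X, y)
print("lasso |coef|:", abs(lasso.coef_).round(3))
```

Note that the two scores are on different scales: forest importances are relative shares, while LASSO coefficients depend on the feature scale, so standardizing features first is usually advisable for LASSO.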
If you measure the model error (or performance) on the same data on which the model was trained, the measurement is usually too optimistic, which means that the model seems to work much better than it does in reality.
Feature importance based on training data can therefore make us mistakenly believe that features are important for the predictions, when in reality the model was just overfitting and the features were not important at all. The figure linked in the references illustrates this: it shows distributions of permutation feature importance values for an SVM trained on a regression dataset with 50 random features and 200 instances. Because the SVM overfits the data, feature importance computed on the training data shows many important features; computed on unseen test data, the importances are close to a ratio of one (= unimportant).
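The train-vs-test gap can be reproduced in a few lines. The sketch below mimics the setup just described (an SVM on purely random features) using scikit-learn's `permutation_importance`; note that scikit-learn reports the *drop* in score (0 = unimportant) rather than the error ratio (1 = unimportant) used in the linked figure, and the large `C` is an assumption chosen to make the SVM overfit.

```python
# Sketch: permutation importance looks inflated on training data when the
# model overfits, but collapses toward zero on held-out test data.
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))  # 50 purely random features
y = rng.normal(size=200)        # target unrelated to any feature

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = SVR(C=100).fit(X_tr, y_tr)  # large C encourages memorization

imp_train = permutation_importance(model, X_tr, y_tr,
                                   n_repeats=5, random_state=0)
imp_test = permutation_importance(model, X_te, y_te,
                                  n_repeats=5, random_state=0)

print("mean importance on train:", imp_train.importances_mean.mean())
print("mean importance on test: ", imp_test.importances_mean.mean())
```

The training-set importances come out clearly positive even though no feature is genuinely predictive, while the test-set importances hover near zero.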
References:
- Interpretable Machine Learning, permutation feature importance chapter: https://christophm.github.io/interpretable-ml-book/feature-importance.html
- How to calculate feature importance with Python: https://machinelearningmastery.com/calculate-feature-importance-with-python/
- Example figure: https://images.app.goo.gl/3SXfL7o9Eu22etux8
- Random forest feature importance chart in Python: https://stackoverflow.com/questions/44101458/random-forest-feature-importance-chart-using-python
- Feature importance values from a trained LASSO model: https://www.researchgate.net/figure/Feature-importance-values-after-training-lasso-model_fig1_330114046