Layman's explanation
Feature importance refers to techniques that assign a score to input features based on how useful they are at predicting a target variable. These scores can guide feature selection and improve model quality; this document explains the main ideas.
There are many types and sources of feature importance scores; popular examples include statistical correlation scores, coefficients calculated as part of linear models, decision-tree importances, and permutation importance scores.
The relative scores can highlight which features may be most relevant to the target and, conversely, which are least relevant. A domain expert can interpret these scores and use them as the basis for gathering more or different data.
Inspecting the importance scores provides insight into a specific model: which features it relies on most, and least, when making a prediction.
The scores can also drive feature selection: delete the features with the lowest scores, or keep only those with the highest scores.
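Importance-driven selection can be sketched as follows. This is a minimal illustration using scikit-learn's `SelectFromModel` on a synthetic dataset; the dataset, model, and threshold are assumptions for the example, not prescribed by this document.

```python
# Sketch: keep only the features whose importance exceeds the mean importance.
# Synthetic regression data: 10 features, of which only 3 actually matter.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       random_state=0)

# SelectFromModel fits the estimator, reads its importance scores, and
# drops every feature below the chosen threshold.
selector = SelectFromModel(RandomForestRegressor(random_state=0),
                           threshold="mean")
X_selected = selector.fit_transform(X, y)

print(X.shape, "->", X_selected.shape)  # fewer columns remain after selection
```

The same pattern works with any estimator that exposes `feature_importances_` or `coef_` after fitting.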
One general approach: a feature is "important" if shuffling its values increases the model error, because in that case the model relied on the feature for the prediction. A feature is "unimportant" if shuffling its values leaves the model error unchanged, because in that case the model ignored the feature for the prediction.
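The shuffling idea can be hand-rolled in a few lines. The sketch below assumes a fitted scikit-learn-style model with a `.score()` method (R² here); the dataset is synthetic and only illustrative.

```python
# Sketch of permutation importance by hand: shuffle one column at a time
# and measure how much the score drops relative to the unshuffled baseline.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=300, n_features=5, n_informative=2,
                       random_state=0)
model = LinearRegression().fit(X, y)
baseline = model.score(X, y)  # R^2 on unshuffled data

rng = np.random.default_rng(0)
drops = []
for j in range(X.shape[1]):
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])  # break feature-target link
    drops.append(baseline - model.score(X_perm, y))
    print(f"feature {j}: score drop = {drops[-1]:.3f}")
```

Features the model relies on show a large score drop when shuffled; ignored features show a drop near zero.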
This approach is known as permutation feature importance; see the Interpretable Machine Learning book link in the references below for more detail.
Some models also expose importance scores directly: the random forest algorithm reports a per-feature importance score, and LASSO's coefficients serve the same purpose (see the example score charts linked in the references below).
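Both built-in scores are one attribute away in scikit-learn. The sketch below uses synthetic data; the `alpha` value is an illustrative assumption, and exact scores will vary with the data.

```python
# Sketch: built-in importance scores from a random forest and from LASSO.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=6, n_informative=2,
                       random_state=1)

# Random forest: impurity-based importances, normalized to sum to 1.
rf = RandomForestRegressor(random_state=1).fit(X, y)
print("random forest:", rf.feature_importances_.round(3))

# LASSO: the L1 penalty shrinks unhelpful coefficients toward zero,
# so the coefficient magnitudes double as importance scores.
lasso = Lasso(alpha=1.0).fit(X, y)
print("lasso |coef|:", abs(lasso.coef_).round(3))
```

Note that the two scores are on different scales: forest importances are relative shares, while LASSO coefficients depend on the feature scale, so standardizing features first is usually advisable for LASSO.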
If you measure the model error (or performance) on the same data on which the model was trained, the measurement is usually too optimistic, which means that the model seems to work much better than it does in reality.
Feature importance based on training data can therefore make us mistakenly believe that features are important for the predictions, when in reality the model was just overfitting and the features were not important at all. The figure linked in the references illustrates this: it shows distributions of permutation feature importance values for an SVM trained on a regression dataset with 50 random features and 200 instances. Because the SVM overfits the data, feature importance computed on the training data shows many important features; computed on unseen test data, the importances are close to a ratio of one (= unimportant).
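The train-vs-test gap can be reproduced in a few lines. The sketch below mimics the setup just described (an SVM on purely random features) using scikit-learn's `permutation_importance`; note that scikit-learn reports the *drop* in score (0 = unimportant) rather than the error ratio (1 = unimportant) used in the linked figure, and the large `C` is an assumption chosen to make the SVM overfit.

```python
# Sketch: permutation importance looks inflated on training data when the
# model overfits, but collapses toward zero on held-out test data.
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))  # 50 purely random features
y = rng.normal(size=200)        # target unrelated to any feature

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = SVR(C=100).fit(X_tr, y_tr)  # large C encourages memorization

imp_train = permutation_importance(model, X_tr, y_tr,
                                   n_repeats=5, random_state=0)
imp_test = permutation_importance(model, X_te, y_te,
                                  n_repeats=5, random_state=0)

print("mean importance on train:", imp_train.importances_mean.mean())
print("mean importance on test: ", imp_test.importances_mean.mean())
```

The training-set importances come out clearly positive even though no feature is genuinely predictive, while the test-set importances hover near zero.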
References:
- Interpretable Machine Learning, permutation feature importance chapter: https://christophm.github.io/interpretable-ml-book/feature-importance.html
- How to calculate feature importance with Python: https://machinelearningmastery.com/calculate-feature-importance-with-python/
- Example figure: https://images.app.goo.gl/3SXfL7o9Eu22etux8
- Random forest feature importance chart in Python: https://stackoverflow.com/questions/44101458/random-forest-feature-importance-chart-using-python
- Feature importance values from a trained LASSO model: https://www.researchgate.net/figure/Feature-importance-values-after-training-lasso-model_fig1_330114046