Our project followed a structured approach to build and evaluate a credit score prediction model using machine learning techniques. The steps involved were as follows:
1. Data Cleaning: Initially, we cleaned the data by removing irrelevant columns and handling missing values. We also converted categorical variables to numerical values where necessary[3].
2. Data Balancing: We identified an imbalance in the target label Credit_Score. To address this, we used the Synthetic Minority Over-sampling Technique (SMOTE) to balance the dataset, ensuring an equal representation of each class.
3. Feature Scaling: We standardized the features with StandardScaler so that each has zero mean and unit variance. This matters for regularized linear models and support vector machines, whose penalties and kernels are sensitive to feature scale.
4. Initial Modeling: We trained an Elastic Net model to identify the most influential features. We analyzed the model's coefficients and kept the features with non-zero coefficients for further analysis. The model's hyperparameters were alpha, l1_ratio, and random_state.
5. Cross-Validation with Multiple Models: We cross-validated Ridge, Lasso, and Elastic Net models to compare their performance. Evaluating each model on different subsets of the data gave a more reliable performance estimate than a single train/test split.
6. Stacking Classifier: We employed a stacking classifier to combine multiple models for improved prediction performance. The base classifiers were a calibrated Support Vector Classifier (SVC) and an XGBoost classifier, with a Logistic Regression model as the final estimator. The ensemble aimed to leverage the complementary strengths of the base models to achieve higher prediction accuracy than any one of them alone.
7. Evaluation and Visualization: We used K-Fold cross-validation to train and evaluate the stacking classifier. Confusion matrices and accuracy scores were generated for each fold to assess model performance. Finally, we visualized the importance of the selected features based on their coefficients from the Elastic Net model using a horizontal bar plot.
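The workflow from feature scaling onward can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the data is a synthetic three-class set from make_classification standing in for the cleaned and balanced credit data, all hyperparameter values are placeholders, and scikit-learn's GradientBoostingClassifier is swapped in for XGBoost so the example needs only one library. SMOTE (provided by the separate imbalanced-learn package) and the bar plot are left out.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import ElasticNet, LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic, roughly balanced 3-class data in place of the real dataset.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# Step 3: standardize every feature to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# Step 4: fit an Elastic Net on the numerically encoded label and keep
# the features whose coefficients are non-zero (placeholder hyperparameters).
enet = ElasticNet(alpha=0.01, l1_ratio=0.5, random_state=0)
enet.fit(X_scaled, y)
selected = np.flatnonzero(enet.coef_)
X_sel = X_scaled[:, selected]

# Step 6: stack a calibrated SVC and a gradient-boosted model
# (stand-in for XGBoost) under a logistic regression final estimator.
stack = StackingClassifier(
    estimators=[
        ("svc", CalibratedClassifierCV(SVC(), cv=3)),
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)

# Step 7: K-Fold evaluation with a confusion matrix and accuracy per fold.
kfold = KFold(n_splits=3, shuffle=True, random_state=0)
fold_accuracies = []
for fold, (train_idx, test_idx) in enumerate(kfold.split(X_sel), start=1):
    stack.fit(X_sel[train_idx], y[train_idx])
    y_pred = stack.predict(X_sel[test_idx])
    fold_accuracies.append(accuracy_score(y[test_idx], y_pred))
    print(f"Fold {fold}: accuracy = {fold_accuracies[-1]:.3f}")
    print(confusion_matrix(y[test_idx], y_pred, labels=[0, 1, 2]))

print(f"Mean accuracy: {np.mean(fold_accuracies):.3f}")
```

Scaling and feature selection are applied once to the full data here to mirror the order of the steps above. Wrapping them in a scikit-learn Pipeline, so that they are refit inside each fold, would be a stricter alternative.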
Through this systematic approach, we effectively prepared and modeled the data, ultimately achieving a robust and interpretable credit score prediction model.
For a detailed explanation of the methodologies used, please refer to the in-depth documentation.