5.3 Assignment: Model Validation and Verification

 

Machine Learning

Indiana Wesleyan University

 

APRIL 2025
Scenario Selected: Predicting Default Risk for Loans

Data Preprocessing

The first preprocessing step was handling missing values and outliers. Using the pandas library in Python, I explored the dataset and found that columns such as employment_status and credit_score contained empty values. I imputed categorical features with the mode and numerical features with the median, which is robust to the skewed distributions caused by outliers.
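A minimal sketch of this imputation step, assuming the column names from the scenario; the sample values are made up for illustration:

```python
import pandas as pd

# Hypothetical sample of the loan dataset with missing values.
df = pd.DataFrame({
    "employment_status": ["salaried", "self-employed", None, "salaried"],
    "credit_score": [710.0, None, 640.0, 580.0],
})

# Mode for categorical columns, median for numerical columns.
df["employment_status"] = df["employment_status"].fillna(
    df["employment_status"].mode()[0]
)
df["credit_score"] = df["credit_score"].fillna(df["credit_score"].median())
```

The median is preferred over the mean here precisely because extreme loan amounts or incomes would drag the mean toward the tail.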

Outliers were identified using boxplots and the z-score method. Loan amounts more than three standard deviations from the mean were capped to reduce their influence.
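The capping rule can be sketched as follows; the loan amounts below are hypothetical stand-ins chosen so that one value clearly exceeds the three-standard-deviation threshold:

```python
import pandas as pd

# Hypothetical loan amounts with one extreme value.
loan_amount = pd.Series([12_000] * 5 + [14_000] * 5 + [16_000] * 4 + [500_000])

# Cap values lying more than 3 standard deviations from the mean.
mean, std = loan_amount.mean(), loan_amount.std()
capped = loan_amount.clip(lower=mean - 3 * std, upper=mean + 3 * std)
```

One caveat worth noting: a single extreme outlier inflates the standard deviation itself, so with very small samples the threshold can fail to flag it.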

Next, I standardized the numerical features (applicant_income, loan_amount, credit_score) with StandardScaler so that they share a consistent scale. One-hot encoding was applied to categorical features such as loan_purpose and employment_status so they can be consumed by machine learning algorithms.
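A sketch of the scaling and encoding steps, using the feature names from the scenario with hypothetical sample rows:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical sample rows with the scenario's feature names.
df = pd.DataFrame({
    "applicant_income": [45_000, 72_000, 38_000],
    "loan_amount": [10_000, 25_000, 8_000],
    "credit_score": [680, 720, 590],
    "loan_purpose": ["auto", "home", "auto"],
    "employment_status": ["salaried", "self-employed", "salaried"],
})

# Standardize numerical features to zero mean and unit variance.
num_cols = ["applicant_income", "loan_amount", "credit_score"]
df[num_cols] = StandardScaler().fit_transform(df[num_cols])

# One-hot encode the categorical features.
df = pd.get_dummies(df, columns=["loan_purpose", "employment_status"])
```

In a real pipeline the scaler would be fit on the training split only and then applied to the test split, to avoid leaking test statistics into training.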

Model Selection and Training

Because this is a binary classification task, predicting whether a borrower will default, I chose Logistic Regression as the baseline model for its interpretability and well-known effectiveness on linearly separable data. For comparison, I also trained a Random Forest, since it is robust to outliers, captures non-linear relationships, and provides feature importance rankings.

I split the data into an 80/20 train-test ratio so that the model would be evaluated on data unseen during training. I also performed k-fold cross-validation (k=5) to balance bias and variance and improve the model's generalization.
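The split and cross-validation setup can be sketched as below; since the loan data itself is not reproduced here, a synthetic dataset stands in for it:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

# Synthetic stand-in for the loan dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# 80/20 train-test split, stratified to preserve the default rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# 5-fold cross-validation on the training split only.
model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X_train, y_train, cv=5)
```

Stratifying the split matters for default prediction because defaults are typically the minority class, and an unstratified split can leave the test set with too few of them.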

Model Evaluation

I calculated the following metrics to evaluate the model performance:

Accuracy: 86%

Precision: 82%

Recall: 75%

F1-Score: 78%

These metrics were selected to address concerns about both false positives and false negatives. When predicting loan defaults, false negatives (predicting that a borrower will not default when in reality they will) pose the higher risk to the financial institution, since they result in direct financial losses. Recall was therefore weighted slightly more heavily than precision.

A confusion matrix was plotted to break down prediction outcomes, and the ROC-AUC score was computed to measure the model's ability to distinguish between the two classes.
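The evaluation step can be sketched with scikit-learn's metric functions; the labels and predicted probabilities below are hypothetical stand-ins for the held-out test set, not the actual model outputs:

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, roc_auc_score)

# Hypothetical test labels and predicted default probabilities.
y_true = [0, 0, 1, 1, 0, 1, 0, 1, 0, 0]
y_prob = [0.1, 0.3, 0.8, 0.6, 0.2, 0.4, 0.1, 0.9, 0.7, 0.2]
y_pred = [int(p >= 0.5) for p in y_prob]  # default threshold of 0.5

acc = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)   # rows: actual, columns: predicted
auc = roc_auc_score(y_true, y_prob)     # uses probabilities, not labels
```

Because recall is prioritized here, lowering the 0.5 decision threshold is one lever for catching more defaults at the cost of some precision.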

Ethical Considerations

Several ethical issues were recognized. First, there is data bias: if the training data reflects past discrimination against certain groups, the model may perpetuate those biases. For example, applicants from certain areas or minority backgrounds may have lower credit scores than the general population, not because of individual behavior but because of systemic factors.

To address this, I conducted a fairness audit by examining performance across demographic groups (sex, race, income level). Any discrepancies were reduced through reweighting and equalized-odds post-processing.
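The core of such an audit is comparing a metric like recall across groups. A minimal sketch, with an entirely hypothetical sensitive attribute and predictions:

```python
import pandas as pd
from sklearn.metrics import recall_score

# Hypothetical audit frame: sensitive group, true labels, predictions.
audit = pd.DataFrame({
    "group":  ["A", "A", "A", "B", "B", "B"],
    "y_true": [1, 0, 1, 1, 1, 0],
    "y_pred": [1, 0, 0, 1, 1, 0],
})

# Recall per group; a large gap signals a potential fairness problem.
per_group_recall = audit.groupby("group").apply(
    lambda g: recall_score(g["y_true"], g["y_pred"])
)
gap = per_group_recall.max() - per_group_recall.min()
```

A recall gap here means one group's actual defaulters are missed more often, which is exactly the kind of disparity that reweighting or equalized-odds post-processing aims to shrink.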

Data privacy is another concern. Because the model relies on sensitive financial data, I took steps to comply with data protection regulations such as the General Data Protection Regulation (GDPR) by anonymizing personally identifiable information (PII) and encrypting the data in storage.

Verification and Validation

The model's performance was additionally validated on a held-out validation set that was unseen during both training and testing. On this additional dataset, accuracy dropped by only about one percentage point, suggesting the model is stable.

After deployment, I recommend a continuous monitoring strategy that includes:

Drift detection: Monitor shifts in the data distribution that could degrade model performance over time.

Regular re-training: Update the model with recent loan data every few months.

Real-time dashboards: Track metrics such as precision, recall, and F1-score after deployment.

Ethical audits: Systematically reassess model decisions over time for fairness and bias.

Regular auditing complements these measures: randomly sampled decisions from the model in production are reviewed to verify its performance against ethical guidelines.
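One common way to implement the drift-detection item above is the Population Stability Index (PSI), which compares a feature's current distribution to its training-time distribution. A sketch with synthetic data (the distributions and the conventional 0.25 alert threshold are assumptions, not from the original model):

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between two samples of a numeric feature."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor the proportions to avoid division by zero and log(0).
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(50_000, 10_000, 5_000)  # training-time incomes
stable = rng.normal(50_000, 10_000, 5_000)    # similar distribution: low PSI
shifted = rng.normal(65_000, 10_000, 5_000)   # drifted distribution: high PSI
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.25 as significant drift warranting re-training.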

Key Takeaways and AI Collaboration

To explore other ways of writing the model training scripts, I used ChatGPT to generate alternative base scripts, clarify the purpose of each metric, and simulate edge cases to stress-test the solution. Logging the prompts and responses helped me better understand model tuning, idiosyncrasies in the data, and the ethical framing of the machine learning process.

This project solidified the importance of quality assurance and ethical diligence at every step of the ML lifecycle. The goal is not only high accuracy but trustworthy, fair, and sustainable models. Every aspect, from validating data integrity to interpreting evaluation metrics to mitigating bias, contributes to the model's ultimate trustworthiness.

Conclusion

Building a loan default model is not merely a technical exercise; it is also an ethical responsibility. Through careful data preprocessing, model validation, and fairness auditing, I was able to create a model that is both functional and fair. Once deployed, with a strong monitoring plan in place and continued attention to fairness, the model can reliably meet its objectives, generating actionable insights for lenders while ensuring equitable treatment across applicant profiles.

 
