GOAL
The project aimed to predict customer churn in a banking context, which is crucial for preemptive customer-retention strategies. Using Azure Machine Learning tools, the objective was to develop an accurate predictive model. Through iterative refinement, the best-performing model was selected, enabling proactive identification of customers likely to churn. Additionally, the analysis identified the customer characteristics most influential on churn, providing insights for optimizing customer satisfaction and retention strategies.
Azure Machine Learning Studio was chosen over developing individual algorithms in Python for several reasons. The platform provides a streamlined workflow with a range of pre-built modules and tools, so each algorithm does not have to be coded separately. This reduces development time and complexity and allows quick experimentation and model deployment. Its graphical interface covers data preparation, model training, and deployment without requiring extensive coding. The environment also supports collaborative work and version control and integrates with other Azure services, which makes the overall approach more efficient and scalable.
ABOUT DATA
The dataset covers a broad set of factors that may influence customer churn within a bank. Customer Id, Surname, and Row Number serve only as identification and indexing variables and have no direct impact on a customer's decision to leave the bank. Credit Score, Geography, and Gender describe the customer's profile and may relate to the propensity to exit. Age and Tenure capture seniority and the length of the banking relationship as possible determinants of loyalty. Balance, Number Of Products, and Has Credit Card describe financial behavior, while Is Active Member denotes the customer's engagement level, a key factor in the likelihood of leaving. Finally, Estimated Salary, Complain, Satisfaction Score, and Card Type complete the picture of the dimensions that contribute to churn.
EXPLORATORY DATA ANALYSIS
First, an exploratory analysis was conducted to understand the information provided by the company and its quality. The dataset was found to contain enough features for an initial analysis of customer churn behavior. The findings from several graphs are presented below.
The data shows that twenty percent of the customers have churned from the bank. This proportion provides a viable basis for constructing a predictive model.
Understanding how customer churn varies with the geographical location of the bank is crucial for targeted retention campaigns. Germany shows a higher churn rate than Spain, even though Spain has the second-largest number of enrolled customers.
As the number of products a customer holds with the bank increases, the likelihood of churn decreases. This graph serves as an initial analysis and suggests that campaigns encouraging customers to open new products could help reduce churn.
The age distribution of customers who remain with the bank appears to be quite similar to that of customers who have left.
This graph shows a direct relationship between customer complaints and churn: customers who voiced complaints ultimately left the bank. Addressing and effectively resolving complaints is therefore a pivotal retention strategy.
Having a credit card with the bank does not ensure customer retention. Unlike the number of products held, credit card ownership does not emerge as a strong retention factor.
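The group comparisons in this section (geography, number of products, complaints, credit card) all reduce to computing a churn rate per category. A minimal sketch of that calculation follows. The target column name Exited and the toy rows are assumptions made for illustration, not the bank's data or the code used in the project.

```python
from collections import defaultdict

def churn_rate_by(rows, key, target="Exited"):
    """Return {category: share of rows with target == 1} for one column."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        churned[row[key]] += int(row[target])
    return {category: churned[category] / totals[category] for category in totals}

# Hypothetical toy rows; "Exited" is an assumed name for the churn flag.
rows = [
    {"Geography": "Germany", "Exited": 1},
    {"Geography": "Germany", "Exited": 0},
    {"Geography": "Spain", "Exited": 0},
    {"Geography": "Spain", "Exited": 0},
    {"Geography": "France", "Exited": 1},
    {"Geography": "France", "Exited": 0},
    {"Geography": "France", "Exited": 0},
    {"Geography": "France", "Exited": 0},
]

rates = churn_rate_by(rows, "Geography")
for category, rate in sorted(rates.items()):
    print(f"{category}: {rate:.0%}")
```

The same function can be called with any other categorical column as the key, for example the number of products or the complaint flag.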
DATA PREPROCESSING
After the exploratory analysis, the features were checked for high cardinality, and none were found. No missing values were detected in the features of the training dataset. Equally important, all classes exhibit balanced representation in the training data, which supports the development of robust and representative models.
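The three checks described above (missing values, cardinality, class balance) can be approximated by hand as in the sketch below. It is only an illustration under stated assumptions: the column names, the target name Exited, and the toy rows are hypothetical, and the thresholds Azure Machine Learning applies in its own checks are not stated in this report.

```python
from collections import Counter

def data_quality_report(rows, target="Exited", missing=("", None)):
    """Return per-column missing counts and unique-value ratios, plus class shares."""
    n = len(rows)
    report = {}
    for column in rows[0].keys():
        present = [row.get(column) for row in rows if row.get(column) not in missing]
        report[column] = {
            "missing": n - len(present),
            # A ratio close to 1.0 suggests an identifier-like, high-cardinality column.
            "unique_ratio": len(set(present)) / n,
        }
    class_share = {
        label: count / n
        for label, count in Counter(row[target] for row in rows).items()
    }
    return report, class_share

# Hypothetical toy rows, used only to exercise the function.
rows = [
    {"Geography": "France", "Age": 42, "Exited": 1},
    {"Geography": "Spain", "Age": 41, "Exited": 0},
    {"Geography": "France", "Age": None, "Exited": 0},
    {"Geography": "Germany", "Age": 39, "Exited": 0},
]

report, class_share = data_quality_report(rows)
print(report["Age"]["missing"], report["Geography"]["unique_ratio"], class_share)
```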
BUILDING AND TRAINING MODEL PREDICTOR
BEST ALGORITHMS AND METRICS
Among the evaluated algorithms, the following achieved the highest weighted AUC:
Voting Ensemble with AUC weighted 0.99978
XGBoost Classifier with AUC weighted 0.99976
LightGBM with AUC weighted 0.99973
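To make the metric in this ranking concrete, the sketch below computes a plain binary AUC as the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner. Azure documents AUC weighted as the per-class AUC averaged with weights proportional to class frequency; the code here is a hand-written illustration of the underlying quantity, not Azure's implementation, and the labels and scores are made up.

```python
def binary_auc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical labels (1 = churned) and model scores.
y_true = [1, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.5, 0.1]

auc = binary_auc(y_true, scores)
print(round(auc, 4))
```

The pairwise loop is quadratic in the number of rows, which is fine for illustration; a rank-based formulation would be preferable on a full dataset.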
A range of performance metrics was used to assess the selected model. Precision indicates how many of the customers flagged as churners actually churned, and recall indicates how many of the actual churners the model identified. The ROC (Receiver Operating Characteristic) curve visualizes the trade-off between sensitivity and specificity across classification thresholds. The calibration curve shows how well the predicted probabilities match the observed outcomes. Lastly, the confusion matrix gives a detailed breakdown of predicted versus actual outcomes, which is crucial for understanding the model's strengths and weaknesses in classification.
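As an illustration of how the confusion matrix, precision, and recall relate to each other, the sketch below derives all three from a list of actual and predicted labels. The labels are hypothetical and the function names are chosen for this example only.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the churn class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def precision_recall(tp, fp, fn):
    """Precision = tp / (tp + fp); recall = tp / (tp + fn); 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical actual and predicted labels (1 = churned).
y_true = [1, 0, 1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(tp, fp, fn)
print(tp, fp, fn, tn, precision, recall)
```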
The analysis identified the features that most influence the churn detection model. The presence of complaints is the most impactful factor, with a substantial influence on the model's predictions. Customer age contributes moderately, with a discernible but comparatively smaller influence. The number of products the customer holds with the bank is also a significant contributor. Lastly, geographical location, which represents the bank's locale, stands out as an influential element in predicting customer churn.
WEB APPLICATION TO PREDICT