Prepared the datasets by creating a new variable, Total Users, as the sum of casual users and regular users, for use in model comparison
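The derived variable can be sketched in plain Python as below. This is only an illustration: the column names `casual_users`, `regular_users`, and `total_users` and the sample rows are assumptions, not the project's actual dataset.

```python
# Hypothetical column names and made-up rows; the real dataset is not shown here.
rows = [
    {"casual_users": 3, "regular_users": 13},
    {"casual_users": 8, "regular_users": 32},
]


def add_total_users(records):
    """Return copies of the records with total_users = casual + regular."""
    return [
        {**r, "total_users": r["casual_users"] + r["regular_users"]}
        for r in records
    ]


prepared = add_total_users(rows)
```

The function returns new records rather than editing the originals, so the source data stays unchanged for other models.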
Data Understanding: Box plots in SAS EM to detect outliers in the datasets
The Final Diagram
Best Model: A stepwise linear regression, with interval inputs transformed to range and class inputs converted to dummy variables, connected to a neural network gave the lowest Average Squared Error of all the models compared
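A minimal sketch of comparing models by Average Squared Error, taken here as the mean of the squared differences between actual and predicted values. The model names and numbers are made up for illustration and are not the SAS EM results.

```python
def average_squared_error(actual, predicted):
    """Mean of squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical target values and predictions from two placeholder models.
actual = [10.0, 20.0, 30.0]
candidate_predictions = {
    "model_a": [12.0, 18.0, 33.0],
    "model_b": [11.0, 20.0, 29.0],
}

ase_by_model = {
    name: average_squared_error(actual, preds)
    for name, preds in candidate_predictions.items()
}

# The model with the smallest ASE is treated as the best one.
best_model = min(ase_by_model, key=ase_by_model.get)
```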
Created a Machine Learning model to determine the influential variables for predicting whether consumers upgrade to a premium subscription
Data Understanding
Descriptive Statistics from SAS EG
Box and Whisker Plots to identify outliers in the variables age and account_score
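Box and whisker plots conventionally flag points lying more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that rule in Python. The `age` values are invented, and the quartile method is an assumption that may differ from the one SAS uses.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical ages; 90 sits far above the rest of the sample.
age = [22, 25, 27, 30, 31, 33, 35, 90]
outliers = iqr_outliers(age)
```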
Correlation analysis of upgrade with 7 other variables
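One way to sketch such an analysis is the Pearson correlation between the binary upgrade flag and each candidate variable. This assumes Pearson correlation, since the source does not name the coefficient used. The two variables and all values below are placeholders, not the project's data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sx * sy)


# Hypothetical data: 1 means the customer upgraded, 0 means they did not.
upgrade = [0, 0, 1, 1, 0, 1]
features = {
    "age": [23, 31, 45, 52, 28, 39],
    "account_score": [40, 55, 70, 65, 80, 60],
}

correlations = {name: pearson(vals, upgrade) for name, vals in features.items()}

# Rank variables by the strength of their correlation with upgrade.
ranked = sorted(correlations, key=lambda k: abs(correlations[k]), reverse=True)
```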
Bar Chart and Box Plot, built in Python, to show the influence of Gender on Upgrade (Bar Chart) and the influence of Monthly Changes on Upgrade (Box Plot)
Developed several Machine Learning models to determine the most influential variables for predicting whether consumers upgrade to a premium subscription, and scored new customers based on the predictions
Best Model:
The decision tree model (Branch = 2, Depth = 6, Size = 5) was chosen as the best model: its misclassification rate of 14% is slightly higher than the neural network's, but it is more interpretable. It also has a RASE of 0.339, a false positive rate of 2.35%, and a false negative rate of 11.65%. The decision tree used Age, Past Purchases, Active Status, Location, and Monthly Charges as the most significant predictors of upgrades. The neural network model, by contrast, has a lower misclassification rate (13.77%) and a reported false negative rate of 10.54%.
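The decision tree figures are consistent with reading the false positive and false negative rates as shares of all scored cases, since 2.35% + 11.65% = 14%. The sketch below computes the rates under that assumption, which differs from the conventional FP / (FP + TN) definition. The confusion-matrix counts are hypothetical, chosen only to reproduce those percentages.

```python
def classification_rates(tp, fp, tn, fn):
    """Rates expressed as a share of all scored cases (an assumption here)."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return {
        "false_positive_rate": fp / total,
        "false_negative_rate": fn / total,
        "misclassification_rate": (fp + fn) / total,
    }


# Hypothetical counts over 10,000 cases; not the project's actual matrix.
rates = classification_rates(tp=1500, fp=235, tn=7100, fn=1165)
```

Under this reading, the misclassification rate is always the sum of the other two rates, which makes it a quick consistency check on reported model results.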