This assignment focuses on selecting the best machine learning classifier to predict purchase intentions in the Soweto market, using key variables identified in a previous assignment on the subsistence retail dataset. It involves evaluating four classifiers by optimising their hyperparameters and assessing performance with precision, recall, F1-score, and accuracy. Additionally, learning curves will be generated, misclassification errors analysed, and the top-performing model serialised for independent testing. The assignment concludes with a detailed summary of findings and two business recommendations: one drawn from the model's insights and one broader strategic suggestion for the Soweto market.
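The tuning and serialisation steps described above can be sketched with scikit-learn's `GridSearchCV` and `joblib`. This is a minimal illustration under stated assumptions, not the assignment's actual code: the data are synthetic (`make_classification`) rather than the subsistence retail dataset, and the parameter grid, fold count and file name `random_forest.joblib` are hypothetical choices. It shows the procedure for Random Forest only; the other classifiers would be tuned the same way with their own grids.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Synthetic stand-in for the subsistence retail data (assumption).
X, y = make_classification(
    n_samples=420, n_features=8, n_informative=6, n_redundant=0,
    n_classes=4, random_state=1,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1
)

# Hypothetical search space, tuned on macro F1 with stratified 5-fold CV.
param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [None, 10],
    "min_samples_leaf": [1, 3],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=1),
    param_grid,
    scoring="f1_macro",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=1),
)
search.fit(X_train, y_train)
print("best params:", search.best_params_)
print("CV macro F1:", round(search.best_score_, 3))

# Serialise the tuned model, then reload it as an independent tester would.
path = os.path.join(tempfile.mkdtemp(), "random_forest.joblib")
joblib.dump(search.best_estimator_, path)
reloaded = joblib.load(path)
test_f1 = f1_score(y_test, reloaded.predict(X_test), average="macro")
print("test macro F1:", round(test_f1, 3))
```

Tuning on macro F1 rather than accuracy keeps the search from favouring models that simply predict the majority class well.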
Learning Curves of Different Classifiers
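Learning curves of this kind can be generated with scikit-learn's `learning_curve` function, which refits a model on increasing fractions of each training fold and records cross-validated scores. The sketch below is illustrative only: it uses synthetic data from `make_classification` in place of the subsistence retail dataset, the class weights loosely echo the imbalanced test set reported later, and the three classifiers other than Random Forest are hypothetical stand-ins, since this section names only Random Forest.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, learning_curve
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the survey data (assumption): 8 features and 4
# imbalanced classes whose weights roughly follow the test-set supports.
X, y = make_classification(
    n_samples=420,
    n_features=8,
    n_informative=6,
    n_redundant=0,
    n_classes=4,
    weights=[0.12, 0.21, 0.51, 0.16],
    random_state=42,
)

# Hypothetical candidate classifiers; only Random Forest is named in the report.
classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "k-NN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
curves = {}
for name, clf in classifiers.items():
    # Refit on 20%, 40%, ..., 100% of each training fold; score with macro F1.
    sizes, train_scores, val_scores = learning_curve(
        clf, X, y, cv=cv, scoring="f1_macro", train_sizes=np.linspace(0.2, 1.0, 5)
    )
    curves[name] = (sizes, train_scores.mean(axis=1), val_scores.mean(axis=1))
    for n, tr, va in zip(*curves[name]):
        print(f"{name:<20} n={int(n):4d}  train F1={tr:.3f}  val F1={va:.3f}")
```

Each entry in `curves` holds the training sizes with the mean training and validation scores, ready to plot with matplotlib. A large, persistent gap between the two curves suggests overfitting, while two low curves that converge suggest underfitting.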
Performance Metrics (Random Forest)
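Macro Precision: 0.89 – Of all samples the model assigns to a class, the proportion that truly belong to that class, averaged equally across the four classes. Random Forest has the highest macro precision of the classifiers compared, indicating that when it predicts a purchase-intention class, that prediction is usually correct.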
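Macro Recall: 0.80 – Of all samples that truly belong to a class, the proportion the model identifies correctly, averaged equally across classes. Random Forest has the highest macro recall, so it finds a larger share of each class's true members than the other classifiers. Recall is uneven across classes, however: class 4, the majority class (43 of 85 test samples), is recalled perfectly (1.00), while the smaller classes 2 and 5 are recalled at only 0.70 and 0.71. Class imbalance therefore still affects the model: 8 of its 11 test errors are samples from other classes predicted as class 4.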
Macro F1-Score: 0.84 – The F1-score is the harmonic mean of precision and recall; the macro version averages the per-class F1-scores equally. Random Forest's high macro F1-score indicates a well-rounded balance between the two.
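Accuracy: 87.1% – The model classifies 74 of the 85 test samples correctly, the highest accuracy among the classifiers compared. Because class 4 alone accounts for just over half of the test set, accuracy is best read alongside the macro metrics above rather than as evidence of equally strong performance on every class.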
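For reference, these metrics can be written out per class. With K classes, N test samples, and per-class counts of true positives TP_k, false positives FP_k and false negatives FN_k, the macro averages are unweighted means of the per-class scores, which is the convention scikit-learn's classification report uses:

```latex
\begin{aligned}
P_k &= \frac{TP_k}{TP_k + FP_k}, &
R_k &= \frac{TP_k}{TP_k + FN_k}, &
F1_k &= \frac{2\,P_k R_k}{P_k + R_k}, \\
\text{Macro-}P &= \frac{1}{K}\sum_{k=1}^{K} P_k, &
\text{Macro-}R &= \frac{1}{K}\sum_{k=1}^{K} R_k, &
\text{Macro-}F1 &= \frac{1}{K}\sum_{k=1}^{K} F1_k, \\
\text{Accuracy} &= \frac{1}{N}\sum_{k=1}^{K} TP_k
\end{aligned}
```

With K = 4 and N = 85, for example, the per-class precisions in the classification report below give a macro precision of (0.88 + 0.93 + 0.84 + 0.91) / 4 = 0.89, and the diagonal of the confusion matrix gives an accuracy of (7 + 14 + 43 + 10) / 85 = 74 / 85 ≈ 0.871.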
Therefore, the best-performing model is Random Forest.
Random Forest Confusion Matrix
Confusion Matrix (Test Data; rows = actual class, columns = predicted class):
          Pred 2  Pred 3  Pred 4  Pred 5
Actual 2       7       1       2       0
Actual 3       1      14       2       1
Actual 4       0       0      43       0
Actual 5       0       0       4      10
Classification Report (Test Data):
              precision    recall  f1-score   support

           2       0.88      0.70      0.78        10
           3       0.93      0.78      0.85        18
           4       0.84      1.00      0.91        43
           5       0.91      0.71      0.80        14

    accuracy                           0.87        85
   macro avg       0.89      0.80      0.84        85
weighted avg       0.88      0.87      0.87        85
Accuracy Score (Test Data):
0.8705882352941177
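The per-class and averaged figures in the report can be reproduced directly from the confusion matrix, which is a useful check when analysing misclassification errors. The sketch below does this in plain Python with no libraries, treating rows as actual classes and columns as predicted classes (scikit-learn's `confusion_matrix` convention, which matches the row totals equalling the supports), and then lists the off-diagonal errors.

```python
# Confusion matrix from the report: rows = actual class, columns = predicted.
labels = [2, 3, 4, 5]
cm = [
    [7, 1, 2, 0],
    [1, 14, 2, 1],
    [0, 0, 43, 0],
    [0, 0, 4, 10],
]


def per_class_metrics(cm):
    """Return (precision, recall, f1, support) lists, one entry per class."""
    k = len(cm)
    precision, recall, f1, support = [], [], [], []
    for i in range(k):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(k))  # column sum: TP + FP
        actual = sum(cm[i])                          # row sum: TP + FN
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(f)
        support.append(actual)
    return precision, recall, f1, support


precision, recall, f1, support = per_class_metrics(cm)
n = sum(support)
accuracy = sum(cm[i][i] for i in range(len(cm))) / n
macro = [sum(m) / len(m) for m in (precision, recall, f1)]
weighted = [sum(v * s for v, s in zip(m, support)) / n for m in (precision, recall, f1)]

for lbl, p, r, f, s in zip(labels, precision, recall, f1, support):
    print(f"{lbl:>12} {p:9.2f} {r:9.2f} {f:9.2f} {s:9d}")
print(f"{'macro avg':>12} " + " ".join(f"{m:9.2f}" for m in macro))
print(f"{'weighted avg':>12} " + " ".join(f"{m:9.2f}" for m in weighted))
print(f"accuracy = {accuracy:.4f}")  # 74 / 85, prints accuracy = 0.8706

# Off-diagonal cells are the misclassifications: (actual, predicted, count).
errors = [
    (labels[i], labels[j], cm[i][j])
    for i in range(len(cm))
    for j in range(len(cm))
    if i != j and cm[i][j] > 0
]
for actual, predicted, count in errors:
    print(f"actual {actual} predicted as {predicted}: {count}")
```

Running it shows that most errors are minority-class samples predicted as class 4, which is consistent with the perfect recall but lower precision for that class.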
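The analysis found that the Random Forest model was the top-performing classifier, achieving an accuracy of 87.1%, a macro precision of 0.89, a macro recall of 0.80, and a macro F1-score of 0.84. This makes it a reliable basis for predicting purchase intentions, with the caveat that it recognises the smaller classes 2 and 5 less reliably than the majority class 4.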
Recommendation: A business in Soweto should use this model to identify customers with a high likelihood of purchasing, based on demographic and behavioural data, so that marketing can be targeted at them.
Application: By analysing factors such as demographics, empathy, convenience, price sensitivity, physical environment, perceived product quality, customer trust, and perceived value, businesses can create targeted ad campaigns to maximise conversions and reduce marketing costs.
Used this way, the Random Forest model helps businesses identify high-potential customer segments from these factors, allowing resources to be allocated more effectively and marketing to be better targeted.
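As a hedged illustration of this targeting idea, the sketch below trains a Random Forest on synthetic data, ranks unseen customers by their predicted probability of the highest intention class, and lists impurity-based feature importances. The feature names, the synthetic data, the shortlist size of 10 and the assumption that class 5 represents the strongest purchase intention are all hypothetical; a real deployment would use the survey variables from the earlier assignment.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical feature names mirroring the factors listed in this report.
FEATURES = [
    "demographics", "empathy", "convenience", "price_sensitivity",
    "physical_environment", "product_quality", "customer_trust", "perceived_value",
]

# Synthetic stand-in data (assumption); labels are shifted to 2..5 to mimic the report.
X, y = make_classification(
    n_samples=420, n_features=len(FEATURES), n_informative=6, n_redundant=0,
    n_classes=4, random_state=0,
)
y = y + 2

X_train, X_new, y_train, _ = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Assumption: class 5 is the highest purchase-intention level.
top_class = 5
col = list(model.classes_).index(top_class)
p_top = model.predict_proba(X_new)[:, col]

# Rank unseen customers by probability of the top class; shortlist the top 10.
ranking = np.argsort(p_top)[::-1]
shortlist = ranking[:10]
for idx in shortlist:
    print(f"customer {int(idx):3d}: P(class {top_class}) = {p_top[idx]:.2f}")

# Impurity-based importances: which factors the forest relied on most.
importances = sorted(
    zip(FEATURES, model.feature_importances_), key=lambda t: t[1], reverse=True
)
for name, score in importances:
    print(f"{name:22s} {score:.3f}")
```

Impurity-based importances can favour features with many distinct values, so scikit-learn's `permutation_importance` (in `sklearn.inspection`), evaluated on held-out data, is a common cross-check before acting on these rankings.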
Key strategies for Soweto businesses include offering flexible payment options to demonstrate empathy, focusing on affordable products for price-sensitive customers, and promoting convenience through services like delivery. Trust-building and ensuring a safe, welcoming store environment are also important for long-term customer relationships.
Lastly, segmenting customers by age and income can help tailor offerings: younger, lower-income groups may be drawn to affordable, trendy products, while older, working-class customers may prioritise reliability and family-oriented goods. Engaging with the local community can also strengthen a business’s presence in Soweto.