This assignment uses Bayesian Networks to examine relationships in subsistence retail consumer purchase behavior in South Africa. It starts with an Exploratory Data Analysis (EDA) to identify correlations and select one key variable in each category. Two Bayesian Networks are then created: a manually designed Directed Acyclic Graph (DAG) and a data-driven learned structure. The learned network is used for inference to analyse which variable combinations affect purchase intentions, providing insight into the underlying relationships.
Example justification for the best-performing variable in a category:
Correlation - In the heatmap, Shopping Frequency shows moderate to strong correlations with key demographic variables such as Age, Marital Status, Employment Status, and Regular Customer. The heatmap colours (reds and blues) indicate the strength, suggesting that Shopping Frequency is meaningfully associated with these factors.
P-Values - Most of the p-values for Shopping Frequency are below the 0.05 threshold, indicating statistically significant relationships. This suggests that the observed correlations are unlikely to have occurred by chance, adding credibility to Shopping Frequency as a critical variable.
Bar-Graph - Shopping Frequency has one of the more substantial averages among the demographic variables. Although it is not the highest, its level is substantial compared with other variables such as Regular Customer and Employment Status. This highlights its importance in capturing customer behavior trends.
Pairplot - The plots use multiple hues, one for each shopping frequency category.
Scatter plots show distinct patterns in which certain shopping frequency categories cluster differently, indicating associations between the continuous variables and the shopping frequency categories.
Some variables clearly differentiate between the number of shopping trips per week, with noticeable separation in both the density and scatter plots.
I chose Shopping Frequency because the pairplot reveals that different shopping frequency categories show clear variations in behavior across multiple features. These variations are crucial for understanding how frequently customers shop and how it influences their perceptions and experience.
The selected variables are: E2, C2, PS1, PE6, PPQ1, CT5, PV3, PI1 and Shopping Frequency.
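The correlation and p-value screening described above can be sketched as follows. This is a minimal illustration, not the procedure used in the assignment: it assumes the survey answers are coded as ordinal integers, uses Pearson's r, and swaps in a permutation test for the p-value so that only the Python standard library is needed. The function names and the two short data lists are hypothetical and made up purely for illustration.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for the correlation between x and y."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real link between x and y
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Made-up ordinal codes for illustration only (NOT the survey data).
shopping_frequency = [1, 2, 2, 3, 3, 4, 4, 5, 5, 5]
age_group = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]

r = pearson_r(shopping_frequency, age_group)
p = permutation_p_value(shopping_frequency, age_group)
print(f"r = {r:.2f}, p = {p:.4f}, significant at 0.05: {p < 0.05}")
```

Running the same two functions over every candidate variable in a category gives the pairs of values that the heatmap and the p-value table summarise. A rank-based coefficient may suit ordinal survey items better than Pearson's r.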
Directed Acyclic Graphs (DAGs)
Two Directed Acyclic Graphs (DAGs) are built to model purchase intentions from a retail dataset. The first DAG is manually designed using expert knowledge to map logical relationships between variables. The second DAG is generated by bnlearn, which uses data-driven algorithms to detect variable dependencies automatically. This comparison highlights how expert input may capture anticipated causal influences, while bnlearn may reveal hidden patterns, providing a balanced view of both theory-driven and data-driven insights. Evaluating both structures gives a more complete understanding of how the variables interact.
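A manually designed structure is only a valid Bayesian Network if it contains no directed cycles. The sketch below shows one way to check this with Python's standard library before passing an edge list to any modelling tool. The edges are hypothetical and chosen only to illustrate the idea using the selected variable names; they are not the structure used in this assignment, and the helper names are assumptions as well.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical expert edges (parent -> child), for illustration only.
manual_edges = [
    ("Shopping Frequency", "CT5"),
    ("E2", "CT5"),
    ("C2", "PV3"),
    ("PE6", "PV3"),
    ("PPQ1", "PV3"),
    ("PS1", "PI1"),
    ("CT5", "PI1"),
    ("PV3", "PI1"),
]


def topological_order(edges):
    """Return a parents-before-children ordering; raise CycleError if not a DAG."""
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
        parents.setdefault(parent, set())
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    return list(TopologicalSorter(parents).static_order())


def is_dag(edges):
    """True if the directed edge list contains no cycle."""
    try:
        topological_order(edges)
        return True
    except CycleError:
        return False


order = topological_order(manual_edges)
print("Valid DAG:", is_dag(manual_edges))
print("One parents-first ordering:", order)
```

The same check can be applied to a learned structure, which makes it easy to compare the two edge lists, for example by taking the set difference of the manual and learned edges.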
Insights Gained
A medium level of empathy (the store somewhat understands customer needs) combined with a high level of convenience (easy layout) gives the highest customer satisfaction.
A high level of empathy (the store has a good understanding of customer needs) combined with a high price sensitivity score (customers are willing to continue purchasing even when prices increase) leads to the most favourable outcome.
A high level of perceived product quality (customers believe the products they buy are of excellent quality) combined with a high level of physical environment security (customers feel very safe and secure in the store) results in the most favourable purchase intentions.
A medium level of customer trust (customers believe the store consistently provides good products and services) combined with a medium price sensitivity score (customers are moderately willing to continue buying even if prices increase) leads to the most favourable outcome.
A high level of convenience (the store's layout is easy to navigate) combined with high perceived product quality (customers believe the products they buy are of excellent quality) leads to the most favourable outcome.
A high level of empathy (the store has a good understanding of customer needs) combined with a medium level of customer trust (the store consistently provides good quality products) leads to the most favourable outcome.
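Each insight above comes from querying the network for the probability of a favourable purchase intention under every combination of levels for a pair of variables, and then picking the best combination. The sketch below illustrates that kind of query with inference by enumeration on a toy network, E2 -> CT5 -> PI1 <- C2. The structure, the two-level coding of CT5 and PI1, and every probability in the tables are assumptions made up for illustration; they are not the learned network or its parameters, and the output does not reproduce the insights reported above.

```python
from itertools import product

LEVELS = ("low", "medium", "high")

# Hypothetical conditional probability tables (illustration only).
# P(CT5 = high | E2)
p_ct5_high_given_e2 = {"low": 0.2, "medium": 0.5, "high": 0.8}
# P(PI1 = high | CT5, C2), keyed by (CT5 level, C2 level)
p_pi1_high_given_ct5_c2 = {
    ("low", "low"): 0.10, ("low", "medium"): 0.25, ("low", "high"): 0.40,
    ("high", "low"): 0.45, ("high", "medium"): 0.65, ("high", "high"): 0.85,
}


def p_pi1_high(e2, c2):
    """P(PI1 = high | E2 = e2, C2 = c2), summing out the unobserved CT5."""
    p_high = p_ct5_high_given_e2[e2]
    total = 0.0
    for ct5, p_ct5 in (("low", 1.0 - p_high), ("high", p_high)):
        total += p_ct5 * p_pi1_high_given_ct5_c2[(ct5, c2)]
    return total


# Query every evidence combination and rank by the probability of PI1 = high.
ranking = sorted(
    ((p_pi1_high(e2, c2), e2, c2) for e2, c2 in product(LEVELS, LEVELS)),
    reverse=True,
)
best_p, best_e2, best_c2 = ranking[0]
print(f"Best combination: E2={best_e2}, C2={best_c2}, P(PI1=high)={best_p:.3f}")
```

With a learned network, the hand-written tables would be replaced by the fitted parameters and the summation by the inference routine of the chosen library, but the loop over evidence combinations stays the same.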