This assignment analyzes a dataset using descriptive and inferential statistics, feature engineering, and data visualization. Key variables are explored to identify those significantly associated with purchase intentions. The dataset is then reduced to the best-performing variables overall, not just the best within each category. The selected variables are transformed and engineered to improve their analytical value. The findings are presented in an interactive Looker Studio dashboard that summarizes how the selected variables relate to purchase intention.
Correlation Analysis
Regression Analysis
--- Regression Analysis for PI1 ---
Model Coefficients: [-0.09701702 0.02514731 0.03683982 0.01861369 -0.10838317 0.17015032
-0.14036088 0.04816357 0.02420297 0.18002811 0.03361933 0.05885744
-0.12535669 0.10413195 0.12117951 -0.03051598 0.16818491 0.04475443
0.03415804 -0.16124357 -0.17229295 0.1469165 -0.18244428 0.01539774
-0.00526124 -0.05383703 -0.13505006 0.11789565 0.16378804 0.33953666
-0.03400674 -0.06915525 0.07454367 0.09390615 -0.11706537 -0.45052749
-0.08727535]
Intercept: 2.7210503865426263
Mean Squared Error (MSE): 0.27493342359701173
R-squared (R²): 0.6182655156979953
Row    Actual    Predicted
33     4         4.352027
158    2         2.868208
249    3         2.995834
260    4         4.272288
101    4         3.308348
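Output of this kind (coefficients, intercept, MSE, R², and an actual-versus-predicted sample) is what a linear regression evaluated on held-out rows typically produces. The following is a minimal sketch of that workflow, not the code used for this assignment: it fits ordinary least squares with `numpy.linalg.lstsq` on synthetic Likert-style data. The function names `fit_ols` and `evaluate`, the 80/20 split, and the generated data are all assumptions made for illustration.

```python
import numpy as np


def fit_ols(X_train, y_train):
    """Fit ordinary least squares and return (coefficients, intercept)."""
    # Prepend a column of ones so the intercept is estimated with the slopes.
    A = np.column_stack([np.ones(len(X_train)), X_train])
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return beta[1:], beta[0]


def evaluate(X_test, y_test, coef, intercept):
    """Return (MSE, R-squared, predictions) on held-out data."""
    y_pred = X_test @ coef + intercept
    residual = y_test - y_pred
    mse = float(np.mean(residual ** 2))
    ss_res = float(np.sum(residual ** 2))
    ss_tot = float(np.sum((y_test - y_test.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return mse, r2, y_pred


# Synthetic stand-in for the survey data (assumption: 1-5 response scales).
rng = np.random.default_rng(0)
n_rows, n_features = 300, 5
X = rng.integers(1, 6, size=(n_rows, n_features)).astype(float)
true_coef = np.array([0.4, 0.3, 0.0, -0.2, 0.1])
y = 1.0 + X @ true_coef + rng.normal(0.0, 0.5, size=n_rows)

# Hold out 20% of the rows for testing (assumed split ratio).
order = rng.permutation(n_rows)
n_test = n_rows // 5
test_idx, train_idx = order[:n_test], order[n_test:]

coef, intercept = fit_ols(X[train_idx], y[train_idx])
mse, r2, y_pred = evaluate(X[test_idx], y[test_idx], coef, intercept)

print("Model Coefficients:", coef)
print("Intercept:", intercept)
print("Mean Squared Error (MSE):", mse)
print("R-squared (R²):", r2)
```

Taking the square root of the MSE gives the error in the units of the response, which is often easier to interpret on a 1-5 scale than the MSE itself.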
Minimum Dataset
Ten key variables were selected to predict purchase intention, based on the correlation matrix and the regression analysis. The target is PI1 ("I intend to purchase from this grocery store"). In the correlation matrix, PI1 showed strong associations with Perceived Value (PV3, PV2), Price Sensitivity (PS3), and Perceived Product Quality (PPQ1, PPQ4), which made it the most informative of the PI variables to model. The regression analysis supported PI1’s selection: an R² of 0.618 means the model explains about 62% of the variance in PI1, and an MSE of 0.275 corresponds to a typical prediction error of about 0.52 points on the PI1 response scale (√0.275 ≈ 0.52), indicating a good fit and reasonable prediction accuracy. Shopping Frequency, Perceived Product Quality, Convenience, Customer Trust, and Physical Environment were among the top predictors: these variables combined high correlation coefficients with notable regression effects, underscoring their importance in shaping customer purchase intentions. The selected variables will be used to build the dashboard, which presents these insights visually to support decision-making.
The best-performing variables are: Shopping Frequency; PV3; PPQ1; PS3; C2; PPQ4; CT6; PE6; PV2; and CT7.
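One simple way to shortlist variables like this is to rank every candidate by the absolute value of its Pearson correlation with the target and keep the top k. The sketch below illustrates that idea only; it is not the selection procedure used here, which also drew on the regression results. The helper `top_k_by_correlation`, the column names `V1` to `V6`, and the synthetic data are hypothetical.

```python
import numpy as np


def top_k_by_correlation(X, y, names, k=10):
    """Return the k (name, r) pairs with the largest |Pearson r| against y."""
    scores = []
    for j, name in enumerate(names):
        # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r.
        r = np.corrcoef(X[:, j], y)[0, 1]
        scores.append((name, float(r)))
    # Rank by strength of association, ignoring the sign.
    scores.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return scores[:k]


# Synthetic example (assumption): the target depends on V1 and, negatively, on V3.
rng = np.random.default_rng(1)
names = ["V1", "V2", "V3", "V4", "V5", "V6"]
X = rng.normal(size=(500, len(names)))
y = 1.0 * X[:, 0] - 0.5 * X[:, 2] + rng.normal(0.0, 0.5, size=500)

ranked = top_k_by_correlation(X, y, names, k=2)
for name, r in ranked:
    print(f"{name}: r = {r:+.3f}")
```

Ranking on the absolute value keeps strong negative associations in the shortlist. Correlation is undefined for a column with no variation, so constant columns should be dropped before ranking.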