In association analysis, the objective is to understand and determine the relationships among variables. For example, in market basket analysis, if a father purchases diapers for his child, there is a 0.40 probability that he will also purchase beer.
Association analysis examines the co-occurrence of items, for example, which item is the antecedent and which is the consequent. The antecedent (X) is the input, while the consequent (Y) is the output or result. Three measures describe a rule. Antecedent support is the percentage of transactions that contain X, P(X). Rule support is the percentage of transactions in which X and Y appear together, P(X and Y). Finally, confidence is the likelihood that Y appears when X occurs, P(X and Y) / P(X).
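The three measures can be computed directly from a list of transactions. The sketch below uses a small, hypothetical set of ten baskets chosen so that the diaper-to-beer rule reproduces the 0.40 confidence from the example above; the function names are illustrative and are not SPSS Modeler's API.

```python
# Hypothetical basket data: 10 transactions, each a set of item names.
transactions = [
    {"diaper", "beer"},
    {"diaper", "beer", "milk"},
    {"diaper", "milk"},
    {"diaper", "bread"},
    {"diaper"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"bread"},
    {"chips"},
    {"milk"},
]

def antecedent_support(transactions, x):
    """P(X): share of transactions that contain every item in x."""
    return sum(1 for t in transactions if x <= t) / len(transactions)

def rule_support(transactions, x, y):
    """P(X and Y): share of transactions that contain all items of x and y."""
    return sum(1 for t in transactions if (x | y) <= t) / len(transactions)

def confidence(transactions, x, y):
    """P(X and Y) / P(X): how often y appears when x occurs."""
    support_x = antecedent_support(transactions, x)
    if support_x == 0:
        return 0.0  # the antecedent never occurs, so the rule never fires
    return rule_support(transactions, x, y) / support_x

x, y = {"diaper"}, {"beer"}
print(f"antecedent support = {antecedent_support(transactions, x):.2f}")  # 0.50
print(f"rule support       = {rule_support(transactions, x, y):.2f}")   # 0.20
print(f"confidence         = {confidence(transactions, x, y):.2f}")     # 0.40
```

Here 5 of the 10 baskets contain diapers and 2 contain both diapers and beer, so confidence = 0.20 / 0.50 = 0.40.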
Nevertheless, association analysis is exploratory in nature: its aim is to better understand the groupings and patterns in the data set.
Clustering analysis is an exploratory technique that discovers natural groupings in data. It places similar (homogeneous) objects in the same cluster and dissimilar (heterogeneous) objects in different clusters.
For example, when K-means is deployed in SPSS Modeler, the number of clusters (k) is specified in advance, and the clusters are formed by:
1) Assigning each object to its closest centroid
2) Updating the centroid of each cluster at every iteration
3) Repeating until the stopping criterion is met
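The three steps can be sketched in plain Python. This is a minimal illustration, not SPSS Modeler's K-Means node: it assumes Euclidean distance, initial centroids supplied by the caller, and a stopping criterion of "no centroid moves more than `tol`" or `max_iter` iterations (both parameters are hypothetical choices).

```python
import math

def kmeans(points, centroids, max_iter=100, tol=1e-6):
    """Basic K-means on 2-D points; `centroids` holds the k initial centres."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Step 1: assign every point to its closest centroid (Euclidean distance).
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step 2: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for j, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(c)  # an empty cluster keeps its old centroid
        # Step 3: stop once no centroid has moved more than `tol`.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return labels, centroids

# Hypothetical 2-D data with two obvious groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
labels, centres = kmeans(points, centroids=[points[0], points[3]])
print(labels)   # [0, 0, 0, 1, 1, 1]
print(centres)
```

Because k and the starting centroids are fixed by the caller, different starting points can lead to different final clusters; tools typically mitigate this with their own initialisation rules.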
One example of cluster analysis is identifying the areas/zones in Singapore with a high rate of dengue fever. Areas with high values would be classified as red zones, while areas with low values would be classified as green zones.
Refer to the figure below for the clustering analysis of a cafe.
Predictive analysis uses current or historical data to build models that predict the future values or states of certain variables. There are two kinds of predictive models:
1) Predictor (Estimator): Predicts a numeric or metric target
2) Classifier: Predicts a class or categorical (non-metric) target
The types of predictive models include:
1) Regression (Simple Linear Regression / Multiple Linear Regression)
2) Decision Tree (CHAID, CART, C5.0, and QUEST)
3) Artificial Neural Network (ANN)
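To make the predictor (estimator) case concrete, a simple linear regression predicts a numeric target. The sketch below fits it by ordinary least squares in plain Python; the data are hypothetical, and this is an illustration of the technique rather than the SPSS Modeler Regression node's implementation.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, x_new):
    """A predictor returns a numeric value rather than a class label."""
    return intercept + slope * x_new

# Hypothetical data: one numeric input and one numeric target.
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

intercept, slope = fit_simple_linear_regression(x, y)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
print(f"prediction at x = 6: {predict(intercept, slope, 6):.2f}")
```

A classifier would instead map the inputs to a category (for example, "will buy" / "will not buy"), which is why the two kinds of models are evaluated differently.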
Refer to the diagram above for the decision tree built by the CHAID model.
The first split of the model produces three nodes: Customer Tier A (Node 1), Customer Tier B (Node 2), and Customer Tier C (Node 3). Tier A has a size of 25.47% with a predicted value of 9.996, and Tier B has a size of 29.593% with a predicted value of 8.854. Tier C, on the other hand, has a size of 44.936% with a predicted value of 8.159. This preliminary analysis suggests that Tier A customers, with the highest predicted value, are the most promising target for achieving the revenue goal. However, they form the smallest customer base of the three tiers.
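The trade-off between a high predicted value and a small customer base can be checked with simple arithmetic on the first-split figures quoted above. This sketch assumes that each node's predicted value is the mean revenue of the records in that node and that the size is the node's share of records; under those assumptions, size multiplied by predicted value gives each tier's weighted contribution, and the size-weighted mean of the three children should approximate the root node's prediction (up to rounding, since the sizes sum to 99.999%).

```python
# Node sizes (% of records) and predicted values copied from the first CHAID split.
# Assumption: each predicted value is the mean revenue of the records in that node.
first_split = {
    "Tier A (Node 1)": {"size_pct": 25.47, "predicted": 9.996},
    "Tier B (Node 2)": {"size_pct": 29.593, "predicted": 8.854},
    "Tier C (Node 3)": {"size_pct": 44.936, "predicted": 8.159},
}

# Rank tiers by predicted value per customer (highest first).
by_value = sorted(first_split, key=lambda t: first_split[t]["predicted"], reverse=True)

# Weighted contribution of each tier: share of records x predicted value.
contribution = {t: n["size_pct"] / 100 * n["predicted"] for t, n in first_split.items()}
by_contribution = sorted(contribution, key=contribution.get, reverse=True)

# Size-weighted mean of the child predictions, normalised because sizes are rounded.
total_share = sum(n["size_pct"] for n in first_split.values()) / 100
weighted_mean = sum(contribution.values()) / total_share

print("By predicted value:", by_value)
print("By weighted contribution:", by_contribution)
for t in by_contribution:
    print(f"  {t}: {contribution[t]:.3f}")
print(f"Size-weighted mean prediction: {weighted_mean:.3f}")
```

Under these assumptions, Tier A ranks first per customer (about 2.546 in weighted contribution), while Tier C, despite the lowest predicted value, contributes the most once its size is taken into account (about 3.666).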
For example, Node 7 (Actual Revenue > 9.988) has a predicted value of 10.683. Its child node, Node 35 (customers who require a follow-up call), has a predicted value of 11.097. Although Node 35 is the smallest of the nodes at its level, a follow-up call is likely to convince these customers to purchase the products and achieve the predicted goal. Similarly, Node 34, with a predicted value of 10.744, is likely to achieve the predicted goal when a follow-up call is made.