Association Learning (AL) is a type of machine learning technique used to discover interesting relationships (associations) or patterns (expressed as "IF-THEN" statements) within large datasets. These algorithms aim to identify sets of items that frequently occur together, revealing dependencies between variables.
Use Case : Analyze product dependencies to uncover cross-sell opportunities and revenue-generating product placement / recommendation strategies. For example, a customer who buys bread and butter is also likely to buy milk
Overview : Association learning algorithms, also called association rule learning, are a type of unsupervised machine learning technique. AL algorithms are valuable tools for discovering hidden insights in transactional datasets across various domains.
Domain : Online recommendation engines, e-commerce retail marketplaces, and online auction stores
Key Topics
Support: Measures how frequently an itemset appears in a dataset. It's the fraction of transactions containing the itemset
Confidence: Measures the strength of an association rule. It's the probability that the consequent (B) will be present given that the antecedent (A) is present
Frequent Itemsets: Sets of items that meet a minimum support threshold
Association Rules: IF-THEN statements derived from frequent itemsets, meeting a minimum confidence threshold
Algorithms: Popular algorithms include Apriori, Eclat, and FP-growth, each with different strategies for efficiently finding frequent itemsets and generating rules
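The support and confidence definitions above can be sketched in a few lines of plain Python. This is a minimal brute-force illustration over the five-transaction dataset used in the Prototype section below, not an optimized implementation:

```python
# Five sample transactions (same items as the Prototype section)
transactions = [
    {'Bread', 'Butter', 'Milk'},
    {'Bread', 'Diaper', 'Beer', 'Eggs'},
    {'Milk', 'Diaper', 'Beer', 'Coke'},
    {'Bread', 'Milk', 'Diaper', 'Beer'},
    {'Bread', 'Milk', 'Diaper', 'Coke'},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction
    that also contain the consequent."""
    both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    ante = sum(1 for t in transactions if antecedent <= t)
    return both / ante

print(support({'Diaper', 'Beer'}))       # 0.6
print(confidence({'Diaper'}, {'Beer'}))  # 0.75
```

These are exactly the numbers that a library such as mlxtend reports for the Diaper → Beer rule later in this document.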
Challenges
Scalability: Processing large datasets can be computationally expensive
Rule Explosion: Algorithms can generate a vast number of rules, many of which are uninteresting or redundant, posing long-term maintenance and compliance challenges
Setting Thresholds: Determining appropriate support and confidence thresholds to avoid spurious or irrelevant rules
Interpretability: While individual association rules are easy to understand, a large volume of rules can be overwhelming to interpret, and overlapping rules can adversely impact the final product recommendations
Missing Data: Incomplete data can reduce the accuracy of association rule mining, and dynamically regenerated rules may vary from one run or time period to the next
Non-Causal Relationships: Association rules identify correlations, not necessarily causal relationships
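The scalability and rule-explosion challenges are visible even on a toy dataset: the number of frequent itemsets (and hence candidate rules) grows quickly as the support threshold is lowered. A brute-force sketch over the five transactions from the Prototype section (the helper `frequent_itemset_count` is hypothetical, using exhaustive enumeration for illustration only):

```python
from itertools import combinations

# Five sample transactions (same items as the Prototype section)
transactions = [
    {'Bread', 'Butter', 'Milk'},
    {'Bread', 'Diaper', 'Beer', 'Eggs'},
    {'Milk', 'Diaper', 'Beer', 'Coke'},
    {'Bread', 'Milk', 'Diaper', 'Beer'},
    {'Bread', 'Milk', 'Diaper', 'Coke'},
]
items = sorted(set().union(*transactions))

def frequent_itemset_count(min_support):
    """Exhaustively count itemsets whose support meets the threshold."""
    count = 0
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            hits = sum(1 for t in transactions if set(candidate) <= t)
            if hits / len(transactions) >= min_support:
                count += 1
    return count

for ms in (0.6, 0.4, 0.2):
    print(ms, frequent_itemset_count(ms))
```

On this dataset the count more than doubles when min_support drops from 0.6 to 0.4; on real transactional data the same effect produces thousands of rules, which is why threshold tuning and pruning matter.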
Proposed Solutions
Use efficient algorithms: FP-growth and Eclat can be more efficient than Apriori for large datasets
Apply interestingness measures beyond support and confidence: metrics like lift and conviction help filter out less interesting rules
Apply continuous pruning techniques to remove redundant or trivial rules, reducing the rule space
Combine AL techniques with other techniques like clustering or classification (associative classification) to enhance analysis and reduce the rule space
Establish an A/B testing framework to assess association rule effectiveness across categories and marketplaces, and implement continuous monitoring to eliminate redundant or ineffective rules
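As a sketch of the lift/conviction filtering idea above, the rules from the Prototype's output can be scored with the standard formulas lift = confidence / support(B) and conviction = (1 − support(B)) / (1 − confidence). The rule list and support values below are taken from that output; the lift > 1 cutoff is an illustrative choice:

```python
# (antecedent, consequent, support of consequent, confidence)
# -- values taken from the Prototype section's output
rules = [
    ('Diaper', 'Beer',   0.6, 0.75),
    ('Beer',   'Diaper', 0.8, 1.00),
    ('Bread',  'Milk',   0.8, 0.75),
]

def lift(support_b, conf):
    """> 1 means the consequent is more likely when the antecedent is present."""
    return conf / support_b

def conviction(support_b, conf):
    """How strongly the rule would fail if antecedent and consequent were independent."""
    if conf == 1.0:
        return float('inf')  # rule is never violated
    return (1 - support_b) / (1 - conf)

# Keep only rules with a genuine positive association
interesting = [(a, c) for a, c, sb, conf in rules if lift(sb, conf) > 1.0]
print(interesting)  # [('Diaper', 'Beer'), ('Beer', 'Diaper')]
```

Note that Bread → Milk is dropped: its confidence (0.75) is high, but its lift (0.9375) shows Milk is actually slightly less likely when Bread is present, which is exactly the kind of misleading rule support and confidence alone would keep.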
Applications
Recommendation Systems: Suggesting products or content to users based on their past behavior or similar user preferences. For example, generating related-product and frequently-bought-together item lists
Market Basket Analysis: Identifying products frequently purchased together to optimize store layout, product placement, and promotional strategies. For example, customers who buy shoes will most likely purchase comfort / orthopedic inserts
Web Usage Mining: Analyzing website navigation patterns to improve user experience and website design. For example, actively engaged users usually buy more, with larger average order size and total value
Medical Diagnosis: Identifying disease symptoms or risk factors that frequently occur together
Cybersecurity: Detecting unusual patterns or anomalies in network traffic that may indicate a security threat such as IP address and fraudulent request location
Healthcare: Predicting patient outcomes or identifying optimal treatment strategies such as prescribing medication A and B to treat underlying diagnosed condition(s)
Prototype
Use Case included : Develop a Market Basket Analysis using mlxtend machine learning library extensions such as apriori and association_rules
Use Case not included : Applying FP-growth and Eclat, or combining AL techniques with other ML techniques like clustering or classification
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules
# Step 1: Define the dataset
dataset = [
['Bread', 'Butter', 'Milk'],
['Bread', 'Diaper', 'Beer', 'Eggs'],
['Milk', 'Diaper', 'Beer', 'Coke'],
['Bread', 'Milk', 'Diaper', 'Beer'],
['Bread', 'Milk', 'Diaper', 'Coke']
]
# Step 2: Encode the dataset using TransactionEncoder
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)
# Step 3: Find frequent itemsets using Apriori
frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)
print("Frequent Itemsets:")
print(frequent_itemsets)
# Step 4: Generate association rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)
print("\nAssociation Rules:")
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])
# Output
Association Rules:
  antecedents consequents  support  confidence    lift
0    (Diaper)      (Beer)      0.6        0.75  1.2500
1      (Beer)    (Diaper)      0.6        1.00  1.2500
2    (Diaper)     (Bread)      0.6        0.75  0.9375
3     (Bread)    (Diaper)      0.6        0.75  0.9375
4      (Milk)     (Bread)      0.6        0.75  0.9375
5     (Bread)      (Milk)      0.6        0.75  0.9375
6    (Diaper)      (Milk)      0.6        0.75  0.9375
7      (Milk)    (Diaper)      0.6        0.75  0.9375
Metric       Formula                                              Purpose
Support      (Transactions with A and B) / Total Transactions     How often A and B occur together
Confidence   (Transactions with A and B) / (Transactions with A)  How often B occurs when A has occurred
Lift         Confidence / (Support of B)                          How much more likely B is given A
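The table's formulas can be checked directly against row 0 of the printed output (Diaper → Beer): in the five-transaction dataset, Diaper appears in 4 transactions, Beer in 3, and both together in 3:

```python
total = 5    # transactions in the Prototype dataset
diaper = 4   # transactions containing Diaper
beer = 3     # transactions containing Beer
both = 3     # transactions containing both Diaper and Beer

support = both / total              # 0.6  -- (A and B) / total
confidence = both / diaper          # 0.75 -- (A and B) / A
lift = confidence / (beer / total)  # 1.25 -- confidence / support(B)

print(support, confidence, lift)
```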
Summary : Market basket analysis uses association rules to identify products often bought together. These patterns help retailers optimize placement, promotions, and cross-selling to boost sales and customer satisfaction. Metrics like support, confidence, and lift measure the strength of these relationships.