Overview
Association Rule Mining (ARM) is a technique in data mining that identifies relationships between variables in large datasets. It is widely used in applications such as market basket analysis, recommendation systems, and healthcare analytics. ARM helps uncover patterns in data, allowing businesses and researchers to understand which items frequently appear together.
Association Rules:
Association rules describe the relationship between items in a dataset. They follow the format:
If {X}, then {Y}
Here, {X} is the antecedent (condition), and {Y} is the consequent (result).
Important ARM Metrics:
Support:
Measures how frequently an item or itemset appears in the dataset.
Support = (Frequency of itemset) / (Total transactions)
Confidence:
Indicates how often the consequent {Y} appears in transactions that contain the antecedent {X}.
Confidence = (Support of {X → Y}) / (Support of {X})
Lift:
Evaluates how much more likely {Y} is to appear when {X} is present compared to when {X} is absent.
Lift = (Confidence of {X → Y}) / (Expected Confidence of {Y})
Lift > 1: {X} and {Y} occur together more often than expected.
Lift < 1: {X} and {Y} occur together less often than expected.
Lift = 1: {X} and {Y} are independent.
The Apriori algorithm is a popular method for discovering association rules in large datasets. It follows the principle that any subset of a frequent itemset must also be frequent.
How Apriori Works:
Step 1 - Identify Frequent Itemsets:
Scan the dataset to calculate the support of individual items.
Discard items below a predefined minimum support threshold.
Generate candidate itemsets by combining frequent items.
Step 2 - Generate Association Rules:
Use the frequent itemsets to generate rules.
Calculate confidence and lift for each rule.
Retain only the rules that meet the minimum confidence threshold.
Step 3 - Pruning (Reducing Computation):
Discard infrequent itemsets early to optimize performance.
Stop when no new frequent itemsets can be generated.