What Are Association Rules?
Association rules are if-then statements that reveal relationships between items in a data set.
- Antecedent (If): The item or set of items whose presence triggers the rule.
- Consequent (Then): The item or set of items predicted to occur when the antecedent is present.
For example, in a supermarket an association rule might be: If a customer buys bread and butter, then they are likely to buy milk.
Key Metrics in Association Rule Learning
- Support: The proportion of transactions in the data set that contain a given itemset.
  - Formula: $$\text{Support}(A) = \frac{\text{Number of transactions containing } A}{\text{Total number of transactions}}$$
- Confidence: The likelihood that the consequent occurs given the antecedent, i.e. the share of transactions containing A that also contain B. Here A ∪ B denotes the combined itemset, so Support(A ∪ B) is the share of transactions containing both A and B.
  - Formula: $$\text{Confidence}(A \Rightarrow B) = \frac{\text{Support}(A \cup B)}{\text{Support}(A)}$$
- Lift: How much more often A and B occur together than would be expected if they were independent.
  - Formula: $$\text{Lift}(A \Rightarrow B) = \frac{\text{Confidence}(A \Rightarrow B)}{\text{Support}(B)}$$
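A lift greater than 1 means the antecedent and consequent occur together more often than they would if independent (a positive association); a lift of 1 indicates independence, and a lift below 1 indicates a negative association.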
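The three formulas can be checked by hand on a tiny example. The Python sketch below assumes a hypothetical five-basket data set (invented for illustration, not real store data) and computes support, confidence, and lift for the rule {bread, butter} ⇒ {milk} directly from the definitions above.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy data: each transaction is the set of items in one basket.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk", "eggs"}),
    frozenset({"eggs"}),
]


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    target = frozenset(itemset)
    hits = sum(1 for t in transactions if target <= t)
    return hits / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Support(A ∪ B) / Support(A), where A ∪ B is the combined itemset."""
    combined = frozenset(antecedent) | frozenset(consequent)
    return support(combined, transactions) / support(antecedent, transactions)


def lift(antecedent: Iterable[str], consequent: Iterable[str],
         transactions: List[FrozenSet[str]]) -> float:
    """Confidence(A ⇒ B) / Support(B); a value of 1.0 suggests independence."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


A, B = {"bread", "butter"}, {"milk"}
print(round(support(A | B, TRANSACTIONS), 3))    # 0.4
print(round(confidence(A, B, TRANSACTIONS), 3))  # 0.667
print(round(lift(A, B, TRANSACTIONS), 3))        # 1.111
```

In this toy data the rule's confidence is about 0.67, but because milk already appears in 60% of baskets the lift is only about 1.11, a weak positive association. This is why confidence alone can overstate a rule.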
The Apriori Algorithm: A Step-by-Step Guide
The Apriori algorithm is a popular method for mining frequent itemsets and generating association rules.
- Set Minimum Thresholds: Define minimum support and confidence levels.
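- Generate Frequent Itemsets: Count the support of individual items and keep those that meet the minimum support threshold.
- Create Higher-Order Itemsets: Combine frequent itemsets into larger candidate itemsets, discard any candidate that has an infrequent subset (the Apriori property: every subset of a frequent itemset must itself be frequent), keep the candidates that meet minimum support, and repeat until no new frequent itemsets appear.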
- Generate Association Rules: Create rules from frequent itemsets and calculate their confidence.
- Prune Rules: Discard rules that do not meet the confidence threshold.
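The steps above can be sketched in plain Python. This is a minimal, unoptimized illustration (it rescans every transaction for each candidate), assuming transactions are given as a list of frozensets; the function names `apriori` and `generate_rules` and the toy baskets are hypothetical, not a library API or real data.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset, float]  # (antecedent, consequent, confidence)


def apriori(transactions: List[Itemset], min_support: float) -> Dict[Itemset, float]:
    """Level-wise search returning every frequent itemset with its support."""
    n = len(transactions)

    def keep_frequent(candidates) -> Dict[Itemset, float]:
        # Scan the data once per candidate and keep those meeting min_support.
        kept = {}
        for c in candidates:
            s = sum(1 for t in transactions if c <= t) / n
            if s >= min_support:
                kept[c] = s
        return kept

    # Frequent 1-itemsets.
    items = {item for t in transactions for item in t}
    current = keep_frequent(frozenset([i]) for i in items)
    frequent = dict(current)
    k = 2
    while current:
        # Join frequent (k-1)-itemsets into candidate k-itemsets.
        prev = list(current)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Apriori pruning: drop candidates with any infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
        frequent.update(current)
        k += 1
    return frequent


def generate_rules(frequent: Dict[Itemset, float], min_confidence: float) -> List[Rule]:
    """Split each frequent itemset into A => B and keep rules meeting min_confidence."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is frequent, so this lookup succeeds.
                conf = supp / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules


# Hypothetical toy baskets (illustrative only).
TRANSACTIONS = [
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk", "eggs"}),
    frozenset({"eggs"}),
]

freq = apriori(TRANSACTIONS, min_support=0.4)
rules = generate_rules(freq, min_confidence=0.7)
for a, b, conf in sorted(rules, key=lambda r: (sorted(r[0]), sorted(r[1]))):
    print(sorted(a), "=>", sorted(b), round(conf, 2))
# Prints:
# ['bread'] => ['butter'] 1.0
# ['bread', 'milk'] => ['butter'] 1.0
# ['butter'] => ['bread'] 1.0
# ['butter', 'milk'] => ['bread'] 1.0
```

With a minimum confidence of 0.7, the rule {bread, butter} ⇒ {milk} (confidence ≈ 0.67) is discarded, which is the final pruning step in action.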
Ethical note: when applying association rule learning to crime analysis, check that the data does not over-represent specific communities; otherwise the mined rules can reinforce biased policing practices.
Real-World Applications
- Market Basket Analysis: Retailers discover products often purchased together.
- Crime Analysis: Identify patterns (e.g., vandalism often co-occurs with theft).
- Medical Research: Find correlations between symptoms and diseases.
- Recommendation Systems: Suggest products/items based on frequent combinations.
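One simple way mined rules can drive recommendations: given a customer's current basket, suggest the consequents of every rule whose antecedent is already in the basket. The sketch below assumes a hypothetical, hand-written rule list and ranks suggestions by confidence; both are illustrative choices, and a real system might rank by lift or combine several signals.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float]  # (antecedent, consequent, confidence)

# Hypothetical rules, e.g. the output of an Apriori run over past baskets.
RULES: List[Rule] = [
    (frozenset({"bread", "butter"}), frozenset({"milk"}), 0.67),
    (frozenset({"bread"}), frozenset({"butter"}), 1.00),
    (frozenset({"milk"}), frozenset({"cereal"}), 0.55),
]


def recommend(basket: Iterable[str], rules: List[Rule], top_n: int = 3) -> List[str]:
    """Suggest items from rules whose antecedent is already in the basket."""
    current = frozenset(basket)
    scores: Dict[str, float] = {}
    for antecedent, consequent, conf in rules:
        if antecedent <= current:
            # Only suggest items the customer does not already have.
            for item in consequent - current:
                # Keep the best confidence seen for each candidate item.
                scores[item] = max(conf, scores.get(item, 0.0))
    # Highest confidence first; ties broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]


print(recommend({"bread", "butter"}, RULES))  # ['milk']
print(recommend({"bread", "milk"}, RULES))    # ['butter', 'cereal']
```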
Exams often ask for practical scenarios → always link to retail, healthcare, or crime analysis.
Common mistake: presenting trivial associations (e.g., “people who buy bread also buy food”) as meaningful findings.
Broader Implications
- Advantages:
  - Uncovers hidden patterns.
  - Helps decision-making in retail, law enforcement, and healthcare.
- Limitations:
  - May produce too many rules, including irrelevant ones.
  - Sensitive to noisy or sparse data.
Exam tip: mention “interpretation”; every discovered rule must be evaluated for practical usefulness.
Common mistake: assuming that discovered rules are causal; association ≠ causation.