What Are Association Rules?
Association rules are if-then statements that reveal relationships between items in a data set.
- Antecedent (If): The item or set of items whose presence triggers the rule.
- Consequent (Then): The item or set of items expected to appear when the antecedent is present.
In a supermarket, an association rule might be: If a customer buys bread and butter, Then they are likely to buy milk.
Key Metrics in Association Rule Learning
- Support: The fraction of transactions in the data set that contain a given itemset.
- Formula: $$\text{Support}(A) = \frac{\text{Number of transactions containing } A}{\text{Total number of transactions}}$$
- Confidence: The likelihood that the consequent occurs given the antecedent, i.e. the share of transactions containing A that also contain B.
- Formula: $$\text{Confidence}(A \Rightarrow B) = \frac{\text{Support}(A \cup B)}{\text{Support}(A)}$$
- Lift: How much more often the antecedent and consequent occur together than would be expected if they were independent.
- Formula: $$\text{Lift}(A \Rightarrow B) = \frac{\text{Confidence}(A \Rightarrow B)}{\text{Support}(B)}$$
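A lift value greater than 1 indicates a positive association: the antecedent and consequent occur together more often than expected under independence. A lift of 1 indicates independence, and a lift below 1 indicates a negative association. The further the value is above 1, the stronger the positive association.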
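To make the three formulas concrete, here is a minimal Python sketch that computes them for the bread-and-butter rule. The eight transactions and the function names are hypothetical, made up purely for illustration.

```python
# Hypothetical toy data: each transaction is the set of items in one basket.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread"},
    {"milk"},
    {"eggs"},
    {"eggs", "bread"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support(A ∪ B) / Support(A)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence(A => B) / Support(B)."""
    return confidence(antecedent, consequent, transactions) / support(
        consequent, transactions
    )


A, B = {"bread", "butter"}, {"milk"}
print(support(A, transactions))        # 0.5   (4 of 8 baskets)
print(confidence(A, B, transactions))  # 0.75  (3 of those 4 also contain milk)
print(lift(A, B, transactions))        # 1.5   (0.75 / 0.5)
```

In this toy data, milk appears in half of all baskets but in three quarters of the baskets that contain bread and butter, which is why the lift comes out above 1.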
The Apriori Algorithm: A Step-by-Step Guide
The Apriori algorithm is a popular method for mining frequent itemsets and generating association rules.
- Set Minimum Thresholds: Define minimum support and confidence levels.
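- Generate Frequent Itemsets: Start with single items and keep those that meet the minimum support threshold.
- Create Higher-Order Itemsets: Combine frequent itemsets to form larger candidates, discard any candidate that has an infrequent subset, and repeat until no new frequent itemsets are found.
- Generate Association Rules: Create rules from the frequent itemsets, calculate their confidence, and keep the rules that meet the minimum confidence.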
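The steps above can be sketched in plain Python as follows. This is a simplified illustration, not an optimized implementation: the function names `apriori` and `generate_rules`, the toy transactions, and the threshold values are all assumptions chosen for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: keep single items that meet the support threshold.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: drop candidates that have any infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


def generate_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules meeting min_confidence."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is itself frequent,
                # so its support is already stored in `frequent`.
                conf = sup / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules


# Hypothetical toy data and thresholds.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread"},
    {"milk"},
    {"eggs"},
    {"eggs", "bread"},
]
frequent = apriori(transactions, min_support=0.375)
rules = generate_rules(frequent, min_confidence=0.7)
for antecedent, consequent, conf in rules:
    print(sorted(antecedent), "=>", sorted(consequent), round(conf, 2))
```

The pruning step is what keeps the search tractable: an itemset can only be frequent if all of its subsets are frequent, so candidates with an infrequent subset never need their support counted. With these thresholds, eggs falls below minimum support and is dropped at the first level, while the rule from bread and butter to milk survives with a confidence of 0.75.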