Classification Techniques
- Classification = Supervised Learning used to predict discrete categorical outcomes.
- Learns from labeled training data to classify new examples.
- Examples: Spam detection, fruit type prediction, disease diagnosis.
K-Nearest Neighbors (K-NN)
How It Works
- Calculate distance between new sample and all training points.
- Select K nearest neighbors.
- Classify by majority vote among those neighbors (a minimal sketch follows below).
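As a concrete illustration of these three steps, here is a minimal from-scratch sketch in Python (the function and variable names are illustrative, not part of the original notes):

```python
import math
from collections import Counter

def knn_classify(new_point, training_data, k=3):
    """Classify new_point by majority vote among its k nearest neighbors.

    training_data is a list of (features, label) pairs.
    """
    # 1. Compute the distance from the new sample to every training point.
    distances = []
    for features, label in training_data:
        dist = math.dist(new_point, features)  # Euclidean distance
        distances.append((dist, label))

    # 2. Select the k nearest neighbors.
    nearest = sorted(distances, key=lambda d: d[0])[:k]

    # 3. Classify by majority vote over the neighbors' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

With `fruits` holding the (weight, color code) pairs and labels from the fruit table later in these notes, `knn_classify((160, 1), fruits, k=3)` returns "Apple".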
Classifying Fruits
- Dataset Table (Weight + Color Code → Apple/Banana).
- Step-by-step: Distance calculation → Nearest neighbors → Majority voting.
- Result: New fruit classified as Apple.
- Tip: use an odd K to avoid tied votes.
- Common pitfall: forgetting to normalize features before computing distances.
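Why normalization matters here: in the fruit data below, weight spans tens of grams while the color code spans only one unit, so unscaled weight dominates the Euclidean distance. A minimal min-max scaling sketch (the helper name is illustrative, not from the notes):

```python
def min_max_scale(values):
    """Rescale a list of numbers to [0, 1] so no single feature dominates the distance."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Raw weights (130-180 g) would swamp the color code (1-2) in a Euclidean distance.
weights = [150, 170, 130, 180]
colors = [1, 1, 2, 2]
print(min_max_scale(weights))  # [0.4, 0.8, 0.0, 1.0]
print(min_max_scale(colors))   # [0.0, 0.0, 1.0, 1.0]
```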
Decision Trees
How They Work
- Root Node: Start at the root node, which represents the entire dataset.
- Splitting: Split the data on the feature that produces the most homogeneous child nodes. Common criteria include Gini impurity and entropy (information gain); see the sketch after this list.
- Recursive Splitting: Apply the splitting process recursively to each child node until a stopping condition is met (e.g., all instances at a node belong to the same class).
- Pruning: Optionally, prune the tree to remove splits that have little importance, reducing overfitting.
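To make the splitting criterion concrete, the sketch below (illustrative names, numeric features assumed) scores every candidate threshold by weighted Gini impurity and keeps the best one:

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_i^2)."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

def best_split(rows, labels, feature_indices):
    """Pick the (feature, threshold) pair with the lowest weighted Gini impurity."""
    best, best_impurity = None, float("inf")
    for f in feature_indices:
        for threshold in {row[f] for row in rows}:
            left = [lbl for row, lbl in zip(rows, labels) if row[f] <= threshold]
            right = [lbl for row, lbl in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # degenerate split, skip it
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if weighted < best_impurity:
                best, best_impurity = (f, threshold), weighted
    return best, best_impurity
```

Applying `best_split` recursively to each child node, and stopping when a node is pure (or a depth limit is reached), yields the full tree; pruning can then remove splits that add little.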
Classifying Fruits with Decision Trees
- Dataset Table (Weight, Color Code, Sweetness → Apple/Banana).
- Build tree step by step: split on Color Code → then Sweetness.
- Result: Simple rules classify new fruits.
- Tip: always name the splitting criterion (Gini impurity or information gain) when describing how the tree splits.
- Common pitfall: growing trees too deep, which leads to overfitting.
Classifying Fruits with K-NN
Consider a dataset with fruits classified as "Apple" or "Banana" based on weight and color code.
| Fruit | Weight (grams) | Color Code | Label |
|---|---|---|---|
| Fruit 1 | 150 | 1 | Apple |
| Fruit 2 | 170 | 1 | Apple |
| Fruit 3 | 130 | 2 | Banana |
| Fruit 4 | 180 | 2 | Banana |
A new fruit has a weight of 160 grams and a color code of 1. Using K = 3, we classify it as follows:
- Distance Calculation: Compute the Euclidean distance to each fruit in the dataset.
  - Distance to Fruit 1: 10
  - Distance to Fruit 2: 10
  - Distance to Fruit 3: ≈ 30.02
  - Distance to Fruit 4: ≈ 20.02
- Nearest Neighbors: The three closest fruits are Fruit 1, Fruit 2, and Fruit 4.
- Majority Voting: Two out of three neighbors are labeled as "Apple," so the new fruit is classified as an "Apple."
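The distances and the vote above can be reproduced with a few lines of standard-library Python (a quick sketch, not part of the original notes):

```python
import math
from collections import Counter

# Training data from the table above: (weight, color code) -> label
fruits = [
    ((150, 1), "Apple"),
    ((170, 1), "Apple"),
    ((130, 2), "Banana"),
    ((180, 2), "Banana"),
]
new_fruit = (160, 1)
k = 3

# Euclidean distances from the new fruit to every training fruit, sorted ascending
distances = sorted(
    (math.dist(new_fruit, features), label) for features, label in fruits
)
print(distances)  # [(10.0, 'Apple'), (10.0, 'Apple'), (20.02..., 'Banana'), (30.01..., 'Banana')]

# Majority vote among the 3 nearest neighbors
votes = Counter(label for _, label in distances[:k])
print(votes.most_common(1)[0][0])  # Apple
```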
Classifying Fruits with Decision Trees
Consider a dataset with three features: weight (grams), color code, and sweetness level.
| Weight (grams) | Color Code | Sweetness Level | Label |
|---|---|---|---|
| 150 | 1 | 8 | Apple |
| 100 | 1 | 7 | Apple |
| 120 | 2 | 9 | Banana |
| 130 | 2 | 10 | Banana |
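In this small table, Color Code alone already separates the classes (code 1 → Apple, code 2 → Banana), so a very shallow tree suffices. A short sketch, assuming scikit-learn is available, fits such a tree on this data and prints its rules:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Features from the table above: [weight, color code, sweetness]
X = [
    [150, 1, 8],
    [100, 1, 7],
    [120, 2, 9],
    [130, 2, 10],
]
y = ["Apple", "Apple", "Banana", "Banana"]

# Limit depth to keep the tree simple and reduce overfitting.
tree = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0)
tree.fit(X, y)

# Print the learned rules; with this tiny table a single split
# (on color code or sweetness) is enough to separate the classes.
print(export_text(tree, feature_names=["weight", "color_code", "sweetness"]))
print(tree.predict([[160, 1, 8]]))  # a new fruit -> ['Apple']
```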