Clustering Techniques in Unsupervised Learning
What is Clustering?
Clustering is a technique used to group a set of objects so that objects in the same group (or cluster) are more similar to each other than to those in other groups.
- Clustering is a form of unsupervised learning, meaning it works with unlabeled data.
- The algorithm identifies patterns and structures without prior knowledge of the data's categories.
How Clustering Works
- Feature Extraction: Identify the characteristics or features of the data points.
- Similarity Measurement: Use a distance or similarity metric (for example, Euclidean distance) to quantify how alike or different two data points are.
- Grouping: Organize data points into clusters based on their similarities.
Think of clustering like organizing a library: books are grouped by genre, author, or topic, even if they don't carry labels. The goal is to place similar books together, making it easier to find related content.
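The similarity-measurement step can be sketched in a few lines of Python. This is a minimal illustration, assuming Euclidean distance as the metric (one common choice, not the only one). The book names, the feature values, and the helper names `euclidean_distance` and `nearest_neighbor` are all made up for this example.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical features per book: (length score, topic score).
# These values are invented for illustration only.
books = {
    "novel_a": (3.0, 1.0),
    "novel_b": (3.2, 1.1),
    "atlas": (9.0, 5.0),
}

def nearest_neighbor(name, items):
    """Return the key of the other item whose features are closest to `name`."""
    return min(
        (other for other in items if other != name),
        key=lambda other: euclidean_distance(items[name], items[other]),
    )

# The two novels have similar features, so each is the other's nearest neighbor.
closest_to_novel_a = nearest_neighbor("novel_a", books)
```

A smaller distance means two items are more alike, which is the signal the grouping step relies on.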
Key Clustering Techniques
K-Means Clustering
K-Means is one of the most popular clustering algorithms. It partitions the data into k distinct, non-overlapping clusters, where k is a number chosen in advance.
How K-Means Works
- Initialize: Randomly select k centroids (central points) in the data.
- Assign: Assign each data point to the nearest centroid.
- Update: Recalculate the centroids as the mean of all points in each cluster.
- Repeat: Iterate the assign and update steps until the centroids stop changing.

- A centroid is the average position of all data points in a cluster.
- It represents the "center" of the cluster.
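The four K-Means steps can be sketched in plain Python. This is a minimal illustration rather than a production implementation. It assumes points are tuples of floats, uses Euclidean distance, and picks the initial centroids from the data points themselves. The function name `kmeans` and the parameters `max_iters` and `seed` are choices made for this example.

```python
import math
import random

def kmeans(points, k, max_iters=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups.

    Returns (centroids, clusters), where clusters[i] holds the points
    assigned to centroids[i] in the last assignment step.
    """
    rng = random.Random(seed)
    # Initialize: pick k distinct data points as the starting centroids.
    # (Raises ValueError if k is larger than the number of points.)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]

    for _ in range(max_iters):
        # Assign: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)

        # Update: move each centroid to the mean of its assigned points.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean = tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
                new_centroids.append(mean)
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[i])

        # Repeat: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids

    return centroids, clusters

# Example data invented for illustration: two loose groups of 2-D points.
sample_points = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2),
                 (8.0, 8.0), (8.5, 7.5), (9.0, 8.2)]
sample_centroids, sample_clusters = kmeans(sample_points, k=2)
```

Because the starting centroids are random, different seeds can produce different clusterings. Running the algorithm several times and keeping the tightest result is a common remedy.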
Hierarchical Clustering
Hierarchical clustering is useful for data sets where tree-like relationships are important, such as taxonomy creation.
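To make the tree-like structure concrete, here is a small sketch of one way to build such a hierarchy. It assumes a bottom-up (agglomerative) approach with single linkage, meaning the distance between two clusters is the distance between their closest members. The text above does not specify a method, so treat this as one option among several. The function name `single_linkage_tree`, the labels, and the feature values are all hypothetical.

```python
import math

def single_linkage_tree(points):
    """Build a nested-tuple tree by repeatedly merging the two closest clusters.

    `points` maps a label to a feature tuple and must contain at least one item.
    """
    # Each cluster is (tree, member_coordinates); start with one per point.
    clusters = [(label, [coords]) for label, coords in points.items()]

    while len(clusters) > 1:
        # Find the pair of clusters whose closest members are nearest.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b)
                        for a in clusters[i][1] for b in clusters[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)

        # Merge that pair into one node and append it to the remaining clusters.
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)

    return clusters[0][0]

# Invented feature values for a toy taxonomy: two animals and two trees.
organisms = {
    "cat": (1.0, 1.0),
    "lion": (1.5, 1.2),
    "oak": (9.0, 9.0),
    "pine": (9.3, 8.8),
}
tree = single_linkage_tree(organisms)
```

The result nests the two animals under one branch and the two trees under another, which is the kind of grouping a taxonomy needs.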