In the previous segments, you got an idea of how clustering works – it groups objects on the basis of their similarity or closeness to each other.
Now, the next important step is to get into the nitty-gritty of how clustering algorithms generally work. You will learn about two types of clustering methods – K-means and Hierarchical clustering – and how each goes about the clustering process.
We have learnt that clustering works by grouping the observations that are most similar to each other. What does this mean exactly?
In simple terms, the algorithm needs to find data points whose values are similar to each other; such points would then belong to the same cluster. Any clustering algorithm does this by computing a “distance measure” between points. The distance measure used in K-means clustering is the Euclidean distance. Let’s look at the following lecture to understand how this value is calculated.
The Euclidean distance between two points is measured as follows: if there are two points X and Y having n dimensions, with coordinates (x1, x2, …, xn) and (y1, y2, …, yn),
then the Euclidean distance D is given as

D = sqrt((x1 − y1)^2 + (x2 − y2)^2 + … + (xn − yn)^2)
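To make the formula concrete, here is a minimal Python sketch of the calculation. The function name `euclidean_distance` and the sample points are illustrative choices, not part of the lecture:

```python
import math

def euclidean_distance(x, y):
    """Euclidean distance between two points with the same number of dimensions."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of dimensions")
    # Sum the squared differences across every dimension, then take the square root
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

# Example: two points in 3 dimensions
p = (1, 2, 3)
q = (4, 6, 3)
print(euclidean_distance(p, q))  # → 5.0, since sqrt(9 + 16 + 0) = 5
```

In practice, libraries such as NumPy or SciPy provide optimised versions of this computation, but the underlying arithmetic is exactly what the formula above describes.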
The idea of a distance measure is quite intuitive. Essentially, observations that are closer or more similar to each other have a low Euclidean distance, and observations that are farther apart or less similar have a higher Euclidean distance. So can you now guess how the clustering process would work based on the Euclidean distance?