Clustering is an unsupervised machine learning algorithm that groups together similar data points based on criteria like shared attributes. Each cluster has data points that are similar to the other data points in the cluster while as a whole, the cluster is dissimilar to other data points. By making use of clustering algorithms, we can uncover hidden structures, patterns, and correlations in the data. Fuzzy C Means (FCM) is one among the variety of clustering algorithms. What makes it stand out as a powerful clustering technique is that it can handle complex, overlapping clusters. Let us understand this technique better through this article.
This article was published as a part of the Data Science Blogathon.
Fuzzy C Means is a soft clustering technique in which every data point is assigned a cluster along with the probability of it being in the cluster.
But wait! What is soft clustering?
Before getting into Fuzzy C Means, let us understand what soft clustering means and how it is different from hard clustering.
Hard clustering and soft clustering are two different ways to partition data points into clusters. Hard clustering, also known as crisp clustering, assigns each data point exactly to one cluster, based on some criteria like for example – the proximity of the data point to the cluster centroid. It produces non-overlapping clusters. K-Means is an example of hard clustering.
Soft clustering, also known as fuzzy clustering or probabilistic clustering, assigns each data point a degree of membership/probability values that indicate the likelihood of a data point belonging to each cluster. Soft clustering allows the representation of data points that may belong to multiple clusters. Fuzzy C Means and Gaussian Mixed Models are examples of Soft clustering.
Now that we are clear with the difference in hard and soft clustering, let us understand the working of the Fuzzy C Means algorithm.
In a traditional k-means algorithm, we mathematically solve it via the following steps:
But in Fuzzy C-Means, the algorithm differs.
1. Our objective is to minimize the objective function which is as follows:
Here:
n = number of data point
c = number of clusters
x = ‘i’ data point
v = centroid of ‘j’ cluster
w = membership value of data point of i to cluster j
m = fuzziness parameter (m>1)
2. Update the membership values using the formula:
3. Update cluster centroid values using a weighted average of the data points:
4. Keep updating the membership values and the cluster centers until the membership values and cluster centers stop changing significantly or when a predefined number of iterations is reached.
5. Assign each data point to the cluster or multiple clusters for which it has the highest membership value.
There are various differences in both these clustering algorithms. A few of them are:
Fuzzy C Means | K-Means |
Each data point is assigned a degree of membership to each cluster, indicating the probability or likelihood of the point belonging to each cluster. | Each data point is exclusively assigned to one and only one cluster, based on the closest centroid, typically determined using Euclidean distance. |
It does not impose any constraints on the shape or variance of clusters. It can handle clusters of different shapes and sizes, making it more flexible. | It assumes that clusters are spherical and have equal variance. Thus it may not perform well with clusters of non-spherical shapes or varying sizes. |
It is less sensitive to noise and outliers as it allows for soft, probabilistic cluster assignments. | It is sensitive to noise and outliers in the data |
Let us now implement Fuzzy C Means using Python.
I have downloaded the dataset from the following source: mall_customers.csv · GitHub
!pip install scikit-fuzzy
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import skfuzzy as fuzz
from sklearn.preprocessing import StandardScaler
###Load and explore the dataset
data = pd.read_csv("/content/mall_customers.csv")
# Display the first few rows of the dataset and check for missing values
print(data.head(),"\n")
print(data.info())
# Preprocess the data
X = data[['Annual Income (k$)', 'Spending Score (1-100)']].values
print(X)
# Scale the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled)
#Apply Fuzzy C Means clustering
n_clusters = 5 # Number of clusters
m = 2 # Fuzziness parameter
cntr, u, u0,d,jm,p, fpc = fuzz.cluster.cmeans(
X_scaled.T, n_clusters, m, error=0.005, maxiter=1000, init=None
)
# Visualize the clusters
cluster_membership = np.argmax(u, axis=0)
plt.figure(figsize=(8, 6))
for i in range(n_clusters):
plt.scatter(X[cluster_membership == i, 0], X[cluster_membership == i, 1], label=f'Cluster {i+1}')
plt.scatter(cntr[0], cntr[1], marker='x', color='black', label='Centroids')
plt.title('Fuzzy C-Means Clustering on Mall Customer Data')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.legend()
plt.grid(True)
plt.show()
Here:
The function returns the following:
Output:
Here are the 5 most common applications of the FCM algorithm:
Now, let’s discuss the advantages and disadvantages of using Fuzzy C Means.
Fuzzy C Means is a clustering algorithm that is very diverse and quite powerful in uncovering hidden meanings (in the form of patterns) in data, offering flexibility in handling complex datasets. It can be considered a better algorithm compared to the k-means algorithm. By understanding its principles, applications, advantages, and limitations, data scientists and practitioners can leverage this clustering algorithm effectively to extract valuable insights from their data, making well-informed decisions.
A. Fuzzy C Means is a clustering algorithm that aims to improve the existing K-Means algorithm by allowing soft assignments of clusters to data points, based on the degree of membership/probability values so that data points can belong to multiple clusters.
A. Jim Bezdek developed the general case in 1973 for any m>1. He developed this in his PhD thesis at Cornell University. Joe Dunn first reported the FCM algorithm in 1974 for a special case (m=2).
A. Fuzziness parameter controls the degree of fuzziness or uncertainty in cluster assignments. For example: if m=1, then the data point will belong only to one cluster. Whereas, if m=2, then the data point has a degree of membership equal to 0.8 for one cluster, and a membership value equal to 0.2 for another cluster. This indicates that there is a high chance of the data point belonging to cluster 1 while also a chance of it belonging to cluster 2.
A. They are updated iteratively by performing a weighted average of the data points where the weights are the membership values of each data point.
A. Yes! Some of the fuzzy clustering algorithms apart from FCM are the Gustafson-Kessel algorithm and the Gath-Geva algorithm.
The media shown in this article on Data Visualization Tools are not owned by Analytics Vidhya and is used at the Author’s discretion.
The article is just amazing. It is very clear and concise looks like the the author has done pretty good job in researching about the topic as it's definitely reflected in the blog. Great work!!
Well explained Aditi, Keep growing, All the best