K-MODES CLUSTERING
K-Modes clustering is an unsupervised machine learning algorithm for clustering categorical variables. Rather than Euclidean distance, it measures the dissimilarity between two data points as the total number of attributes on which they do not match: the fewer the mismatches, the more similar the points. Each cluster is represented by its mode (the most frequent category of every attribute) instead of its mean.
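The two ideas in this paragraph, mismatch counting and modes, can be sketched in a few lines of plain Python. This is an illustration only: the helper names `matching_dissimilarity` and `column_modes` are hypothetical, and ties between equally frequent categories are broken by first occurrence, which is an assumption (libraries may break ties differently).

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def column_modes(records):
    """Most frequent category of each attribute (ties: first value seen)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))

# Hypothetical cluster of three records with three categorical attributes.
cluster = [
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "small", "round"),
]
mode = column_modes(cluster)
print(mode)                                                # ('red', 'small', 'round')
print([matching_dissimilarity(r, mode) for r in cluster])  # [0, 1, 1]
```

The mode plays the role a mean plays in k-means: it is the record that minimizes the total number of mismatches to the cluster's members.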
First, we use MinMaxScaler to rescale every feature to the range 0 to 1.
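A minimal sketch of this step with scikit-learn's `MinMaxScaler`, assuming the data is already a numeric array (the matrix `X` below is made up). Each column is mapped linearly so that its minimum becomes 0 and its maximum becomes 1.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical numeric matrix: two columns with different ranges.
X = np.array([[1.0, 10.0],
              [3.0, 14.0],
              [5.0, 12.0]])

scaler = MinMaxScaler(feature_range=(0, 1))  # (0, 1) is also the default
X_scaled = scaler.fit_transform(X)           # (x - column min) / (column max - column min)
print(X_scaled)                              # rows: [0, 0], [0.5, 1], [1, 0.5]
```

Because this mapping is one-to-one within each column, values that were equal stay equal and values that differed still differ, so the K-Modes mismatch count is the same before and after scaling.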
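Next, we use the elbow method to choose k. The elbow method runs K-Modes for a range of k values and plots the cost (the total dissimilarity between the data points and the modes of their clusters) against k. We can observe that the “elbow”, where adding more clusters stops reducing the cost sharply, is at k = 5, which we take as the optimal k for our case.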
Now fit K-Modes with k = 5 and run it!
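The elbow search and the final fit can be sketched with a small hand-written K-Modes in NumPy. This is an illustration under stated assumptions, not the code behind the plot: `kmodes` is a hypothetical helper that starts from k distinct random rows, alternates between assigning each point to the mode with the fewest mismatches and recomputing each cluster's mode, and keeps the lowest-cost result of several restarts. The data `X` is synthetic, with five planted groups, so the printed costs say nothing about the real dataset.

```python
import numpy as np

def kmodes(X, k, n_init=5, max_iter=20, seed=0):
    """Minimal K-Modes sketch (hypothetical helper, not a library API).

    X: 2-D array of integer-coded categories, one row per data point.
    Returns (labels, modes, cost) for the best of `n_init` random starts,
    where cost is the total number of mismatches between points and their modes.
    Requires k <= number of distinct rows in X.
    """
    rng = np.random.default_rng(seed)
    distinct = np.unique(X, axis=0)  # initial modes are drawn from distinct rows
    best = None
    for _ in range(n_init):
        modes = distinct[rng.choice(len(distinct), size=k, replace=False)]
        for _ in range(max_iter):
            # dist[i, j] = number of attributes where point i differs from mode j
            dist = (X[:, None, :] != modes[None, :, :]).sum(axis=2)
            labels = dist.argmin(axis=1)
            new_modes = modes.copy()
            for j in range(k):
                members = X[labels == j]
                if len(members) == 0:
                    continue  # empty cluster: keep its old mode
                for col in range(X.shape[1]):
                    values, counts = np.unique(members[:, col], return_counts=True)
                    new_modes[j, col] = values[counts.argmax()]
            if np.array_equal(new_modes, modes):
                break  # modes stopped changing
            modes = new_modes
        dist = (X[:, None, :] != modes[None, :, :]).sum(axis=2)
        labels = dist.argmin(axis=1)
        cost = int(dist[np.arange(len(X)), labels].sum())
        if best is None or cost < best[2]:
            best = (labels, modes, cost)
    return best

# Hypothetical toy data: 5 planted prototypes over 6 categorical attributes
# (4 levels each), with about 20% of the cells replaced by random noise.
rng = np.random.default_rng(42)
prototypes = rng.integers(0, 4, size=(5, 6))
X = prototypes[rng.integers(0, 5, size=300)]
noise = rng.random(X.shape) < 0.2
X = np.where(noise, rng.integers(0, 4, size=X.shape), X)

# Elbow method: cost for each candidate k; look for where the drop flattens.
costs = {k: kmodes(X, k)[2] for k in range(1, 9)}
for k, c in costs.items():
    print(f"k={k}: cost={c}")

# Final fit with the k read off the elbow (k = 5 in this write-up).
labels, modes, cost = kmodes(X, 5)
```

In practice a library implementation is usually preferable; for example, the `kmodes` package provides `KModes(n_clusters=k, init="Huang", n_init=5)`, whose fitted `cost_` attribute gives the same quantity for the elbow plot.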
After converting the categorical data to numerical codes, create a new DataFrame called latest_df by concatenating all the data from final_df with the K-Modes cluster labels. Then drop columns that are no longer needed, such as 'Cluster Labels' and 'Segment'.
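A pandas sketch of this step follows. The contents of `final_df`, the columns `Gender` and `Profession`, the label list, and the new column name `KModes Cluster` are all hypothetical; the encoding shown (integer category codes via `astype("category").cat.codes`) is one common choice, not necessarily the one used here.

```python
import pandas as pd

# Hypothetical stand-in for final_df; the real columns are not listed in the text.
final_df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Profession": ["Artist", "Doctor", "Artist", "Engineer"],
    "Segment": ["A", "B", "A", "C"],
    "Cluster Labels": [0, 1, 0, 2],
})
kmode_labels = [0, 1, 0, 2]  # hypothetical K-Modes labels, one per row

# Categorical -> numerical: replace each text column by integer category codes
# (categories are sorted, e.g. Female -> 0, Male -> 1).
categorical_cols = ["Gender", "Profession", "Segment"]  # hypothetical list
encoded = final_df.copy()
for col in categorical_cols:
    encoded[col] = encoded[col].astype("category").cat.codes

# Attach the K-Modes labels, then drop the columns not needed downstream.
latest_df = pd.concat(
    [encoded, pd.Series(kmode_labels, name="KModes Cluster", index=encoded.index)],
    axis=1,
).drop(columns=["Cluster Labels", "Segment"])
print(latest_df)
```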
These clusters will be used later in the feature selection stage.