Notebook
1. As a preprocessing step prior to some other data analysis technique
2. As an exploratory data analysis technique in its own right
- Data summarisation
- Recommendation Systems (Collaborative Filtering)
- Customer Segmentation
- Document Clustering (Topic Modelling)
- Biological Data Analysis (Gene networks)
- Social Network Analysis
- Good if the data is well described by centroids
- Be careful if the data has a different scale in different dimensions
- Remove irrelevant dimensions if possible
- Measuring similarity in high dimensions can get difficult
- Will not work well for non-isotropic data
- Use "Inertia" (SSE) to choose between different clustering results with the same number of clusters
- Use the "Silhouette Coefficient" to choose the number of clusters
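
The points above can be sketched with scikit-learn, assuming the centroid-based method in question is K-Means. The synthetic blobs, the seeds and the range of `k` are illustrative assumptions, not part of the original notebook. The sketch scales the features first, uses inertia to pick between runs with the same number of clusters, and uses the silhouette coefficient to pick the number of clusters.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): three tight, well-separated blobs.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=0,
)

# Put every dimension on the same scale before measuring distances.
X_scaled = StandardScaler().fit_transform(X)

# Same k, different initialisations: keep the run with the lowest inertia (SSE).
runs = [
    KMeans(n_clusters=3, n_init=1, random_state=seed).fit(X_scaled)
    for seed in range(5)
]
best_run = min(runs, key=lambda km: km.inertia_)

# Different k: compare the silhouette coefficient (higher is better).
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)
best_k = max(scores, key=scores.get)

print(f"lowest inertia for k=3: {best_run.inertia_:.2f}")
print(f"best k by silhouette: {best_k}")
```

Inertia always tends to fall as `k` grows, so it is only compared here between runs that share the same `k`; the silhouette coefficient is used across different values of `k`.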
- Finds clusters by looking for connectivity between cluster members
- Not restricted to similarity based on Euclidean distance
- Works for non-isotropic data
- Provides a way to estimate the number of clusters
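
The notes do not name the connectivity-based method, so the following is only a minimal sketch under the assumption that a graph-based (spectral-style) approach is meant. It builds a neighbourhood graph on two concentric rings, a shape that centroids describe poorly, and estimates the number of clusters as the number of near-zero eigenvalues of the graph Laplacian, which equals the number of connected components. The helper `ring`, the radius `eps` and the tolerance are hypothetical choices for this toy data.

```python
import numpy as np

def ring(radius, n):
    """Return n evenly spaced points on a circle (hypothetical toy data)."""
    angles = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return np.column_stack([radius * np.cos(angles), radius * np.sin(angles)])

# Two concentric rings: non-isotropic clusters that share the same centre.
X = np.vstack([ring(1.0, 40), ring(3.0, 40)])

# Connectivity graph: link points closer than eps (assumed value).
eps = 0.6
dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
W = (dists < eps).astype(float)
np.fill_diagonal(W, 0.0)  # no self-loops

# Unnormalised graph Laplacian L = D - W.
L = np.diag(W.sum(axis=1)) - W
eigenvalues = np.linalg.eigvalsh(L)

# Each connected component contributes one zero eigenvalue.
n_clusters_est = int(np.sum(eigenvalues < 1e-8))

# Assign labels by walking the graph from each unvisited point.
labels = -np.ones(len(X), dtype=int)
current = 0
for start in range(len(X)):
    if labels[start] != -1:
        continue
    labels[start] = current
    stack = [start]
    while stack:
        i = stack.pop()
        for j in np.flatnonzero(W[i]):
            if labels[j] == -1:
                labels[j] = current
                stack.append(j)
    current += 1

print(f"estimated number of clusters: {n_clusters_est}")
```

On noisy real data the components are rarely perfectly disconnected, so in practice one would look for a gap in the smallest eigenvalues rather than exact zeros, and the affinity need not be Euclidean.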