Useful Notes and Links

Reynier Cruz-Torres, PhD

K-Means Clustering

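Begin by scaling the data. K-means groups points by Euclidean distance, so features with larger numeric ranges would otherwise dominate the result: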

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaled_X = scaler.fit_transform(X)
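
As a quick sanity check of what the scaler does, here is a minimal sketch on made-up data. The array X below is a hypothetical toy example, not the data these notes were written for:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two features on very different scales
X = np.array([[1.0, 1000.0],
              [2.0, 3000.0],
              [3.0, 5000.0],
              [4.0, 7000.0]])

scaler = StandardScaler()
scaled_X = scaler.fit_transform(X)

# After scaling, each column should have roughly zero mean
# and unit (population) standard deviation
print(scaled_X.mean(axis=0))
print(scaled_X.std(axis=0))
```

Both features now contribute on a comparable scale to any distance computed between rows.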

Create the model and fit it (fit_predict trains the model and returns one cluster label per row):

from sklearn.cluster import KMeans
model = KMeans(n_clusters=2)
cluster_labels = model.fit_predict(scaled_X)
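
For a version that runs end to end, the sketch below uses synthetic data from make_blobs in place of the real X. The dataset, n_init=10, and random_state=42 are assumptions added here for reproducibility; they are not part of the original notes:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X: 200 points drawn around 2 centers
X, _ = make_blobs(n_samples=200, centers=2, random_state=42)
scaled_X = StandardScaler().fit_transform(X)

# n_init and random_state are fixed only to make the run repeatable
model = KMeans(n_clusters=2, n_init=10, random_state=42)
cluster_labels = model.fit_predict(scaled_X)

print(cluster_labels[:10])     # one integer label per sample
print(model.cluster_centers_)  # one row per cluster, in scaled units
```

Note that the label values (0 or 1) are arbitrary identifiers; which cluster gets which number can change between runs unless the random state is fixed.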

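The right number of clusters is often not known in advance. A common heuristic is the elbow (knee) method: fit the model for a range of k values, record the sum of squared distances (SSD) from each point to its cluster center, and pick the k where the curve stops dropping sharply: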

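ssd = []  # sum of squared distances for each value of k

for k in range(2, 10):
    model = KMeans(n_clusters=k)
    model.fit(scaled_X)

    # inertia_ is the sum of squared distances of samples to their closest cluster center
    ssd.append(model.inertia_)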
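
The SSD values are easiest to read as a plot. This is a minimal sketch on synthetic data; the make_blobs dataset, the fixed random_state, and the plot styling are assumptions for illustration only:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
scaled_X = StandardScaler().fit_transform(X)

k_values = list(range(2, 10))
ssd = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(scaled_X)
    ssd.append(model.inertia_)

# Plot SSD against k; the "elbow" is where the curve starts to flatten
plt.plot(k_values, ssd, 'o--')
plt.xlabel('Number of clusters k')
plt.ylabel('Sum of squared distances')
```

In a notebook the figure renders automatically; in a script, add plt.show(). SSD tends to keep decreasing as k grows, so the goal is the point of diminishing returns rather than the minimum.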

Color quantization example:

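import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# Load the image as an (h, w, c) array of pixel values
image = mpimg.imread('palm_trees.jpg')

plt.imshow(image)

# Flatten to one row per pixel, so each pixel is a sample with c color features
(h, w, c) = image.shape
image_2d = image.reshape(h*w, c)

# Group the pixel colors into 6 clusters
model = KMeans(n_clusters=6)
labels = model.fit_predict(image_2d)

# Replace every pixel with the rounded color of its cluster center
rgb_codes = model.cluster_centers_.round(0).astype(int)
new_image = rgb_codes[labels]
new_image = np.reshape(new_image, (h, w, c))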
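
To try the same steps without the image file, here is a sketch that swaps in a small random "image". The random array, its size, and the fixed seeds are assumptions made only so the example is self-contained:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for palm_trees.jpg: a 20x30 RGB image of random pixels
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(20, 30, 3), dtype=np.uint8)

h, w, c = image.shape
image_2d = image.reshape(h * w, c)

model = KMeans(n_clusters=6, n_init=10, random_state=0)
labels = model.fit_predict(image_2d)

# Map every pixel to the rounded color of its cluster center
rgb_codes = model.cluster_centers_.round(0).astype(int)
new_image = rgb_codes[labels].reshape(h, w, c)

# Count the distinct colors left in the quantized image
n_colors = len(np.unique(new_image.reshape(-1, c), axis=0))
print(n_colors)
```

The quantized image keeps the original shape but contains at most n_clusters distinct colors; plt.imshow(new_image) would display it next to the original for comparison.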