Useful Notes and Links

Reynier Cruz-Torres, PhD

Principal Component Analysis (PCA)

PCA reduces high-dimensional data to a small number of orthogonal components that capture as much of the variance as possible. This makes the data easier to analyze further or to plot.
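To make "orthogonal components that capture the variance" concrete, here is a minimal NumPy-only sketch of PCA via the singular value decomposition. It is not scikit-learn's implementation. The function name `pca_svd` and the random test data are hypothetical and used only for illustration.

```python
import numpy as np

def pca_svd(X, n_components):
    """Minimal PCA sketch (hypothetical helper): center, SVD, keep the top axes."""
    X = np.asarray(X, dtype=float)
    # PCA works on mean-centered data
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance along each axis is S**2 / (n_samples - 1)
    explained_variance = S**2 / (X.shape[0] - 1)
    explained_variance_ratio = explained_variance[:n_components] / explained_variance.sum()
    # Project the centered data onto the retained axes
    scores = X_centered @ components.T
    return scores, components, explained_variance_ratio

# Hypothetical data: 100 samples, 5 features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
scores, components, ratio = pca_svd(X, n_components=2)
print(scores.shape, components.shape, ratio)
```

The axes are orthonormal and the ratios are sorted in decreasing order. Keeping all components makes the ratios sum to 1.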

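from sklearn.decomposition import PCA

# scaled_X: feature matrix standardized beforehand (e.g. with StandardScaler)
pca_model = PCA(n_components=2)
principal_components = pca_model.fit_transform(scaled_X)  # shape (n_samples, 2)
pca_model.components_                # principal axes, shape (2, n_features)
pca_model.explained_variance_ratio_  # fraction of total variance captured by each component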
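The snippet above assumes `scaled_X` already exists. Below is a self-contained sketch that builds it from hypothetical random data with `StandardScaler`. It also checks what the attributes mean: `fit_transform` returns the centered data projected onto the rows of `components_`. The data and the added correlation are assumptions for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 200 samples, 6 features (stand-in for the real X)
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 6))
X[:, 1] += 2 * X[:, 0]  # correlate two features so the first component captures more variance

# Standardize each feature to zero mean and unit variance
scaled_X = StandardScaler().fit_transform(X)

pca_model = PCA(n_components=2)
principal_components = pca_model.fit_transform(scaled_X)

# Each row of components_ is a unit-length principal axis in feature space
print(pca_model.components_.shape)  # (2, 6)
print(pca_model.explained_variance_ratio_)

# fit_transform equals the projection of the centered data onto those axes
projection = (scaled_X - pca_model.mean_) @ pca_model.components_.T
print(np.allclose(principal_components, projection))  # True
```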

Elbow method to determine the optimal number of components

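import numpy as np
import matplotlib.pyplot as plt

# Cumulative variance explained by the first n components, for n = 1..29.
# n_components cannot exceed min(n_samples, n_features), so cap the range.
max_components = min(29, min(scaled_X.shape))
explained_variance = []

for n in range(1, max_components + 1):
    pca = PCA(n_components=n)
    pca.fit(scaled_X)
    explained_variance.append(np.sum(pca.explained_variance_ratio_))

# Look for the "elbow" where adding components stops adding much variance
plt.plot(list(range(1, max_components + 1)), explained_variance)
plt.xlabel('Num of components')
plt.ylabel('Cumulative variance explained')
plt.show()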
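A k-component PCA keeps the first k axes of the full decomposition. The loop above can therefore likely be replaced by a single fit, taking the cumulative sum of `explained_variance_ratio_` from `PCA().fit(scaled_X)`. Small numerical differences are possible if a randomized solver is used. The sketch below also picks the smallest number of components that reaches a variance threshold. The helper name `n_components_for_variance` and the 95% default are assumptions (a common rule of thumb), not part of the original notes.

```python
import numpy as np

def n_components_for_variance(explained_variance_ratio, threshold=0.95):
    """Hypothetical helper: smallest number of components whose cumulative
    explained variance ratio reaches `threshold`."""
    cumulative = np.cumsum(explained_variance_ratio)
    reached = cumulative >= threshold
    if not reached.any():
        raise ValueError("threshold is never reached; fit more components")
    # argmax on a boolean array returns the first True index (0-based)
    return int(np.argmax(reached)) + 1

# Hypothetical per-component ratios, e.g. from PCA().fit(scaled_X).explained_variance_ratio_
ratios = [0.5, 0.3, 0.15, 0.05]
print(np.cumsum(ratios))
print(n_components_for_variance(ratios, threshold=0.9))  # 3
```

A threshold gives a reproducible cut-off. The elbow plot remains useful for seeing where the curve flattens.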