Principal component analysis (PCA) is a mathematical
procedure that transforms a set of possibly correlated
attributes into a set of uncorrelated attributes, usually
fewer in number, called principal components. Each principal
component is a linear combination of the original attributes.
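To make the "uncorrelated" claim concrete, here is a minimal sketch that computes principal components with plain NumPy via the singular value decomposition. The synthetic data and the names `X_demo`, `n_keep`, and `scores` are assumptions made for illustration; this is not necessarily how scikit-learn implements PCA internally.

```python
import numpy as np

# Synthetic data (an assumption for illustration): 200 samples, 3 attributes,
# where the first two attributes are strongly correlated with each other
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X_demo = np.hstack([
    base + 0.1 * rng.normal(size=(200, 1)),
    2.0 * base + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

# Center each attribute, then decompose; the rows of Vt are the principal axes
X_centered = X_demo - X_demo.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Project the centered data onto the first n_keep axes
n_keep = 2
scores = X_centered @ Vt[:n_keep].T

# Compare covariances: the original attributes are correlated,
# while the off-diagonal entries for the components are close to zero
cov_original = np.cov(X_demo, rowvar=False)
cov_scores = np.cov(scores, rowvar=False)
print("Covariance of original attributes:\n", cov_original)
print("Covariance of principal components:\n", cov_scores)
```

The off-diagonal entries of the second matrix should be zero up to floating-point error, which is what "uncorrelated" means here.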
This recipe includes the following topics:
- Initialize the PCA class with the number of components to keep set to 3
- Call fit() to fit the model with X
- Display principal axes in feature space
- Call transform() to project X onto the 3 principal components
# 3. Principal Component Analysis
# import modules
import pandas as pd
import numpy as np
from sklearn.decomposition import PCA
# read data file from github
# dataframe: pimaDf
gitFileURL = 'https://raw.githubusercontent.com/andrewgurung/data-repository/master/pima-indians-diabetes.data.csv'
cols = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
pimaDf = pd.read_csv(gitFileURL, names = cols)
# convert into numpy array
pimaArr = pimaDf.values
# Split the array into input features (X) and the target class (Y)
# Y is not used in this example because PCA is unsupervised
X = pimaArr[:, 0:8]
Y = pimaArr[:, 8]
# initialize PCA class
# 1. set number of components to keep to 3
# 2. call fit() to learn the principal components from X
pca = PCA(n_components=3).fit(X)
# display PCA attributes
print("Principal axes in feature space: %s" % pca.components_)
print('-'*60)
# call transform() to project X onto the 3 principal components
pcaArr = pca.transform(X)
# print the first 3 rows of the reduced array
print(pcaArr[:3, :])
Principal axes in feature space: [[-2.02176587e-03 9.78115765e-02 1.60930503e-02 6.07566861e-02
9.93110844e-01 1.40108085e-02 5.37167919e-04 -3.56474430e-03]
[-2.26488861e-02 -9.72210040e-01 -1.41909330e-01 5.78614699e-02
9.46266913e-02 -4.69729766e-02 -8.16804621e-04 -1.40168181e-01]
[-2.24649003e-02 1.43428710e-01 -9.22467192e-01 -3.07013055e-01
2.09773019e-02 -1.32444542e-01 -6.39983017e-04 -1.25454310e-01]]
------------------------------------------------------------
[[-75.71465491 -35.95078264 -7.26078895]
[-82.3582676 28.90821322 -5.49667139]
[-74.63064344 -67.90649647 19.46180812]]
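The following sketch shows how the two printed results relate: with the default `whiten=False`, `transform()` centers the data using the fitted `mean_` and projects it onto the rows of `components_`. It uses synthetic data rather than the Pima dataset so that it runs offline; the names `X_demo`, `pca_demo`, `manual`, and `reduced` are assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the data (an assumption): 100 rows, 8 attributes
# with different scales
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(100, 8)) * np.arange(1, 9)

# Fit PCA and keep 3 components, as in the recipe above
pca_demo = PCA(n_components=3).fit(X_demo)

# Manual projection: subtract the fitted mean, then multiply by the
# transposed principal axes (each row of components_ is one axis)
manual = (X_demo - pca_demo.mean_) @ pca_demo.components_.T
reduced = pca_demo.transform(X_demo)

# The manual projection should match transform() up to floating-point error
print(np.allclose(manual, reduced))

# Fraction of the total variance captured by each kept component
print(pca_demo.explained_variance_ratio_)
```

The first check is expected to print `True`. The `explained_variance_ratio_` attribute is a quick way to judge how much information the 3 retained components preserve.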