Rescaling transforms the data so that every feature shares the same scale.
The transformed values lie between a given minimum and maximum, often zero and one.
This recipe includes the following topics:
- Rescale using the MinMaxScaler class
- Call fit() to compute the per-feature min and max values used for later scaling
- Call transform() on the input data
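To make the two steps concrete, here is a minimal pure-Python sketch of the min-max arithmetic that fit() and transform() correspond to. The helper names `fit_min_max` and `transform_min_max` and the toy data are hypothetical, chosen only for illustration; this is not scikit-learn's implementation, and sending constant columns to the lower bound is an assumption made here to avoid division by zero.

```python
def fit_min_max(rows):
    """Learn the (min, max) pair of each column, analogous to fit()."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def transform_min_max(rows, stats, feature_range=(0.0, 1.0)):
    """Rescale each value with the learned stats, analogous to transform()."""
    lo, hi = feature_range
    scaled_rows = []
    for row in rows:
        scaled = []
        for value, (col_min, col_max) in zip(row, stats):
            span = col_max - col_min
            # Assumption: a constant column (span == 0) maps to the lower bound
            unit = (value - col_min) / span if span else 0.0
            scaled.append(lo + unit * (hi - lo))
        scaled_rows.append(scaled)
    return scaled_rows


# Hypothetical toy data: 3 rows, 2 features on very different scales
toy = [[1.0, 10.0], [2.0, 20.0], [3.0, 40.0]]
stats = fit_min_max(toy)
scaled = transform_min_max(toy, stats)
print(stats)
print(scaled)
```

Each column is rescaled independently, so the smallest value in a column becomes 0 and the largest becomes 1 regardless of the column's original units.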
# import modules
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
# read data file from github
# dataframe: pimaDf
gitFileURL = 'https://raw.githubusercontent.com/andrewgurung/data-repository/master/pima-indians-diabetes.data.csv'
cols = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
pimaDf = pd.read_csv(gitFileURL, names = cols)
# convert into numpy array
pimaArr = pimaDf.values
# Split the array into input features (X) and the target column (Y)
# Only X is rescaled in this example; Y is not used
X = pimaArr[:, 0:8]
Y = pimaArr[:, 8]
# 1. initialize MinMaxScaler class to limit output range between 0 and 1
# 2. call fit() function to compute the min and max value
scaler = MinMaxScaler(feature_range=(0,1)).fit(X)
# rescale input data using transform()
rescaledX = scaler.transform(X)
# limit precision to 3 decimal places for printing
np.set_printoptions(precision=3)
# print first 3 rows of input data
print(X[:3,])
print('-'*60)
# print first 3 rows of output data
print(rescaledX[:3,])
[[ 6. 148. 72. 35. 0. 33.6 0.627 50. ]
[ 1. 85. 66. 29. 0. 26.6 0.351 31. ]
[ 8. 183. 64. 0. 0. 23.3 0.672 32. ]]
------------------------------------------------------------
[[0.353 0.744 0.59 0.354 0. 0.501 0.234 0.483]
[0.059 0.427 0.541 0.293 0. 0.396 0.117 0.167]
[0.471 0.92 0.525 0. 0. 0.347 0.254 0.183]]
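As a quick sanity check, the first column of the printed output can be reproduced by hand. The sketch below assumes the 'preg' column ranges from 0 to 17 in this dataset; those bounds are not printed above, but they are consistent with the rescaled values shown.

```python
# Assumed range of the 'preg' column (not shown in the output above)
preg_min, preg_max = 0.0, 17.0

# Raw 'preg' values from the first three rows of X
raw = [6.0, 1.0, 8.0]

# Min-max formula: (x - min) / (max - min), rounded to 3 decimal places
rescaled = [round((v - preg_min) / (preg_max - preg_min), 3) for v in raw]
print(rescaled)
```

Under that assumption the result is 0.353, 0.059 and 0.471, matching the first column of rescaledX.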