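Skewness measures how asymmetric a distribution is compared with the symmetric bell curve of a normal distribution, and in which direction its longer tail points: positive values indicate a longer right tail, negative values a longer left tail.
A value near 0 indicates little or no skew.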
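To make the sign convention concrete, here is a minimal sketch that computes skewness by hand for three small, made-up lists. The function name `sample_skewness` and the sample data are illustrative assumptions, not part of the recipe. It uses the bias-corrected (adjusted Fisher-Pearson) estimator, which, to my understanding, is the form that pandas reports from `skew()`.

```python
import math

def sample_skewness(values):
    """Bias-corrected sample skewness (adjusted Fisher-Pearson); needs n >= 3."""
    n = len(values)
    mean = sum(values) / n
    # second and third central moments
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    g1 = m3 / m2 ** 1.5  # uncorrected skewness
    return math.sqrt(n * (n - 1)) / (n - 2) * g1

# Hypothetical data chosen only to show the sign of the result
right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]   # one large value stretches the right tail
symmetric = [1, 2, 3, 4, 5]                # mirror image around the mean
left_tailed = [-x for x in right_tailed]   # same shape, flipped

print(sample_skewness(right_tailed))  # positive
print(sample_skewness(symmetric))     # zero
print(sample_skewness(left_tailed))   # negative
```

Flipping the data around zero flips the sign of the skewness but not its magnitude, which is why the sign alone tells you which side the long tail is on.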
This recipe includes the following topics:
- Calculate skew
# import pandas
import pandas as pd
# URL of the Pima Indians diabetes dataset (CSV)
fileGitURL = 'https://raw.githubusercontent.com/andrewgurung/data-repository/master/pima-indians-diabetes.data.csv'
# define column names
cols = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
# load file as a Pandas DataFrame
pimaDf = pd.read_csv(fileGitURL, names=cols)
# calculate the skewness of each column (axis=0), skipping null values
skew = pimaDf.skew(axis=0, skipna=True)
print(skew)
preg     0.902
plas     0.174
pres    -1.844
skin     0.109
test     2.272
mass    -0.429
pedi     1.920
age      1.130
class    0.635
dtype: float64
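One way to read these numbers is to bucket them by magnitude and direction. The sketch below does that for the values printed above. The helper `describe_skew` and its cut-offs (0.5 and 1.0) are assumptions based on a common rule of thumb, not something pandas defines, so adjust them to taste.

```python
# Skew values copied from the output above
reported_skew = {
    'preg': 0.902, 'plas': 0.174, 'pres': -1.844,
    'skin': 0.109, 'test': 2.272, 'mass': -0.429,
    'pedi': 1.920, 'age': 1.130, 'class': 0.635,
}

def describe_skew(value, moderate=0.5, high=1.0):
    """Label a skew value; the thresholds are rule-of-thumb assumptions."""
    magnitude = abs(value)
    if magnitude < moderate:
        return "approximately symmetric"
    direction = "right" if value > 0 else "left"
    level = "highly" if magnitude >= high else "moderately"
    return f"{level} skewed to the {direction}"

for column, value in reported_skew.items():
    print(f"{column:<6}{value:>7.3f}  {describe_skew(value)}")
```

With these thresholds, `test`, `pedi`, `age` and `pres` stand out as the most skewed columns, with `pres` being the only one whose long tail is on the left.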