Recursive feature elimination (RFE) works by repeatedly fitting an external estimator, ranking the features by importance, and pruning the weakest feature until only the desired number remains. The example below uses the logistic regression algorithm as the estimator.
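Conceptually, the elimination loop looks like the following minimal sketch. This is not scikit-learn's implementation, just an illustration assuming a linear estimator that exposes `coef_` after fitting; `rfe_sketch` is a hypothetical helper name:

```python
import numpy as np

def rfe_sketch(estimator, X, y, n_features_to_select):
    """Minimal sketch of recursive feature elimination (step size 1)."""
    remaining = list(range(X.shape[1]))  # indices of surviving features
    while len(remaining) > n_features_to_select:
        # refit the estimator on the surviving features only
        estimator.fit(X[:, remaining], y)
        # rank surviving features by the magnitude of their coefficients
        importances = np.abs(estimator.coef_).ravel()
        # drop the least important feature, then repeat
        remaining.pop(int(np.argmin(importances)))
    return remaining
```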
This recipe includes the following topics:
- Initialize the external estimator: the LogisticRegression class
- Initialize the RFE class with the output reduced to 3 features
- Call fit() to run the estimator and eliminate features
- Display RFE attributes such as the mask of selected features
- Call transform() on the input data
```python
# import modules
import pandas as pd
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# read data file from github
# dataframe: pimaDf
gitFileURL = 'https://raw.githubusercontent.com/andrewgurung/data-repository/master/pima-indians-diabetes.data.csv'
cols = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
pimaDf = pd.read_csv(gitFileURL, names=cols)

# convert into numpy array
pimaArr = pimaDf.values

# split the columns into input features (X) and target (Y)
X = pimaArr[:, 0:8]
Y = pimaArr[:, 8]

# initialize external estimator
# liblinear (the old scikit-learn default) keeps this recipe's output stable on newer versions
model = LogisticRegression(solver='liblinear')

# initialize RFE class
# 1. select LogisticRegression as estimator
# 2. reduce the output to 3 features
# 3. call fit() to run the estimator and eliminate features
rfe = RFE(model, n_features_to_select=3).fit(X, Y)

# display rfe attributes
print("Selected Features: %s" % rfe.support_)
print("Feature Ranking: %s" % rfe.ranking_)
print('-' * 60)

# call transform() to reduce X to the selected features/columns
rfeArr = rfe.transform(X)

# print first 3 rows of output with only the best 3 features/columns
print(rfeArr[:3])
```
```
Selected Features: [ True False False False False  True  True False]
Feature Ranking: [1 2 3 5 6 1 1 4]
------------------------------------------------------------
[[ 6.    33.6    0.627]
 [ 1.    26.6    0.351]
 [ 8.    23.3    0.672]]
```
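The boolean mask and the ranking are positional, so it helps to map them back to the column names. Below is a minimal follow-up sketch, assuming the variables from the recipe above (`cols`, `rfe`, `model`, `X`, `Y`) are still in scope; RFECV is scikit-learn's cross-validated variant of RFE that chooses the number of features automatically:

```python
# map the boolean mask back to the input column names
# (the last entry of cols is the target 'class', so only the first 8 apply)
selected = [name for name, keep in zip(cols[:8], rfe.support_) if keep]
print("Selected columns: %s" % selected)  # ['preg', 'mass', 'pedi']

# optional: let cross-validation pick how many features to keep
from sklearn.feature_selection import RFECV
rfecv = RFECV(model, cv=5).fit(X, Y)
print("Optimal number of features: %d" % rfecv.n_features_)
```

Passing an explicit n_features_to_select, as in the recipe, is the right choice when the target size is fixed; RFECV trades extra fit time for a data-driven choice.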