Feature Selection

Updated: 2018-11-21

  • Embedded approaches: the data mining algorithm itself decides which attributes to use as part of training, e.g. a decision tree classifier.
  • Filter approaches: features are selected before the data mining algorithm is run, independently of the data mining task, e.g. choosing attributes whose pairwise correlation is as low as possible.
  • Wrapper approaches: use the data mining algorithm as a black box to evaluate candidate feature subsets.
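The filter idea of keeping attributes with low pairwise correlation can be sketched as below. This is a minimal illustration, not a standard library routine: the function name drop_correlated_features, the 0.9 threshold, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def drop_correlated_features(X, threshold=0.9):
    """Filter-style selection (hypothetical helper): keep a column only if its
    absolute Pearson correlation with every already-kept column is at most
    `threshold`. No downstream model is consulted."""
    corr = np.abs(np.corrcoef(X, rowvar=False))  # feature-by-feature matrix
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept

# Synthetic data: column 1 is almost a rescaled copy of column 0.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, 2 * a + 0.01 * rng.normal(size=200), b])

kept = drop_correlated_features(X)
print("kept columns:", kept)
```

Because the selection only looks at the features themselves, it runs once before any model is trained, which is what separates a filter from a wrapper.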

Feature Selection Architecture

  • a measure for evaluating a subset
  • a strategy that controls the generation of a new subset of features
  • a stopping criterion
  • a validation procedure
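The four components above can be seen together in a small greedy forward-selection sketch. It is only one possible instantiation, chosen for illustration: the evaluation measure is validation MSE of a least-squares fit, the generation strategy adds one feature at a time, the stopping criterion is a minimum gain (min_gain, an assumed parameter), and the validation procedure is a final check on a held-out test split. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def holdout_mse(X_fit, y_fit, X_eval, y_eval, cols):
    """Evaluation measure: MSE on (X_eval, y_eval) of a least-squares model
    with intercept, fitted on (X_fit, y_fit) using only columns `cols`."""
    A_fit = np.column_stack([np.ones(len(X_fit)), X_fit[:, cols]])
    A_eval = np.column_stack([np.ones(len(X_eval)), X_eval[:, cols]])
    coef, *_ = np.linalg.lstsq(A_fit, y_fit, rcond=None)
    return float(np.mean((A_eval @ coef - y_eval) ** 2))

def forward_select(X_train, y_train, X_val, y_val, min_gain=0.01):
    selected = []
    remaining = list(range(X_train.shape[1]))
    # Baseline: predict the training mean (intercept-only model).
    best = float(np.mean((y_val - y_train.mean()) ** 2))
    while remaining:
        # Generation strategy: try extending the current subset by one feature.
        scores = {j: holdout_mse(X_train, y_train, X_val, y_val, selected + [j])
                  for j in remaining}
        j_best = min(scores, key=scores.get)
        # Stopping criterion: stop once the best candidate gains too little.
        if best - scores[j_best] < min_gain:
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best = scores[j_best]
    return selected, best

# Synthetic data: only columns 0 and 3 influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=400)
X_train, X_val, X_test = X[:200], X[200:300], X[300:]
y_train, y_val, y_test = y[:200], y[200:300], y[300:]

selected, val_mse = forward_select(X_train, y_train, X_val, y_val)
# Validation procedure: score the chosen subset on data never used above.
test_mse = holdout_mse(X_train, y_train, X_test, y_test, selected)
print("selected:", selected, "test MSE:", round(test_mse, 4))
```

Since the model itself is used as a black box to score each subset, this sketch is also an example of a wrapper approach.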

mRMR (minimum Redundancy Maximum Relevance Feature Selection)

Univariate Selection

One variable + One target
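A minimal sketch of univariate selection follows: each feature is scored against the target on its own and the features are ranked by that score. The absolute Pearson correlation is used as the score here purely as an assumption; the helper name rank_features_univariate and the synthetic data are hypothetical.

```python
import numpy as np

def rank_features_univariate(X, y):
    """Score every column against the target independently (|Pearson r|)
    and return column indices ordered from most to least relevant."""
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                       for j in range(X.shape[1])])
    order = np.argsort(-scores)
    return order, scores

# Synthetic data: column 1 matters most, column 2 less, column 0 not at all.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 2.0 * X[:, 1] + 1.0 * X[:, 2] + 0.1 * rng.normal(size=300)

order, scores = rank_features_univariate(X, y)
print("ranking:", order.tolist())
```

Scoring one variable at a time is cheap, but it cannot detect features that are only useful in combination with others.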


Reasons for feature selection:

  1. Reducing the number of features, to reduce overfitting and improve the generalization of models.
  2. To gain a better understanding of the features and their relationship to the response variables.

Pearson Correlation

A value between -1 and 1

  • -1: perfect negative correlation
  • +1: perfect positive correlation
  • 0: no linear correlation

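Use pearsonr from SciPy: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.pearsonr.html

from scipy.stats import pearsonr

# x and y are 1-D arrays of equal length; the result unpacks into
# the correlation coefficient and a p-value
r, p_value = pearsonr(x, y)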


Pros:

  • fast to calculate
  • the value lies in [-1, 1] rather than [0, 1], so its sign also tells whether the relationship is positive or negative

Cons:

  • only sensitive to linear relationships
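The sensitivity to linear relationships only can be illustrated with a small sketch. The data are synthetic and chosen as an assumption for the example: an exact negative linear relationship, and a quadratic relationship that is fully deterministic but symmetric around zero.

```python
import numpy as np
from scipy.stats import pearsonr

x = np.linspace(-1.0, 1.0, 201)

# Exact negative linear relationship: r is -1 up to rounding.
r_linear, _ = pearsonr(x, -3.0 * x + 2.0)

# y depends entirely on x, but not linearly; on a range symmetric
# around zero the positive and negative halves cancel, so r is about 0.
r_quadratic, _ = pearsonr(x, x ** 2)

print(f"linear:    r = {r_linear:.3f}")
print(f"quadratic: r = {r_quadratic:.3f}")
```

A coefficient near zero therefore says only that there is no linear trend, not that the feature is useless.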

Maximal Information Coefficient


  • Searches for an optimal binning and turns the mutual information score into a metric in the range [0, 1].
  • Captures both linear and non-linear relationships.
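The idea of searching over binnings and normalizing mutual information can be sketched as follows. This is a crude approximation and not the actual MIC algorithm: it only tries equal-frequency grids, whereas MIC also optimizes where the grid lines are placed. The function names are hypothetical, and the n^0.6 cap on the number of grid cells is taken as an assumption about the usual default.

```python
import numpy as np

def binned_mutual_information(x, y, nx, ny):
    """Mutual information (in nats) of x and y after equal-frequency binning."""
    x_edges = np.quantile(x, np.linspace(0, 1, nx + 1))
    y_edges = np.quantile(y, np.linspace(0, 1, ny + 1))
    counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])
    p_xy = counts / counts.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of x, shape (nx, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, ny)
    nonzero = p_xy > 0
    return float(np.sum(p_xy[nonzero] * np.log(p_xy[nonzero] / (p_x @ p_y)[nonzero])))

def approximate_mic(x, y):
    """Maximum over grid sizes of MI / log(min(nx, ny)), which lies in [0, 1].
    Grid sizes are limited to nx * ny <= n ** 0.6 (assumed budget)."""
    max_cells = len(x) ** 0.6
    best = 0.0
    for nx in range(2, int(max_cells // 2) + 1):
        for ny in range(2, int(max_cells // nx) + 1):
            mi = binned_mutual_information(x, y, nx, ny)
            best = max(best, mi / np.log(min(nx, ny)))
    return best

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=500)
noise = rng.uniform(-1.0, 1.0, size=500)

score_linear = approximate_mic(x, x)          # linear dependence
score_quadratic = approximate_mic(x, x ** 2)  # non-linear dependence
score_independent = approximate_mic(x, noise) # no dependence

print(score_linear, score_quadratic, score_independent)
```

Dividing by log(min(nx, ny)) is what maps the score into [0, 1], since the mutual information of a grid cannot exceed the log of its smaller dimension.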

Distance correlation

A Pearson correlation of 0 does not imply independence, whereas a distance correlation of 0 does imply that the two variables are independent.
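The contrast can be checked numerically with a small sketch of the sample distance correlation, written from its textbook definition (double-centered pairwise distance matrices). The function names are hypothetical and the quadratic test data are an assumption made for the example; the O(n^2) memory use makes this suitable for small samples only.

```python
import numpy as np

def _centered_distances(v):
    """Pairwise |v_i - v_j| matrix with row means and column means
    subtracted and the grand mean added back (double centering)."""
    d = np.abs(v[:, None] - v[None, :])
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_correlation(x, y):
    """Sample distance correlation of two 1-D arrays of equal length."""
    a = _centered_distances(np.asarray(x, dtype=float))
    b = _centered_distances(np.asarray(y, dtype=float))
    dcov2 = (a * b).mean()                       # squared distance covariance
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    if denom == 0:
        return 0.0                               # a constant input
    return float(np.sqrt(max(dcov2, 0.0) / denom))

x = np.linspace(-1.0, 1.0, 201)
y = x ** 2  # fully determined by x, yet linearly uncorrelated with it

pearson_r = float(np.corrcoef(x, y)[0, 1])
dcor_quadratic = distance_correlation(x, y)
dcor_linear = distance_correlation(x, 2.0 * x + 1.0)

print(f"Pearson r (quadratic): {pearson_r:.3f}")
print(f"distance correlation (quadratic): {dcor_quadratic:.3f}")
print(f"distance correlation (linear): {dcor_linear:.3f}")
```

On this data the Pearson coefficient is essentially zero while the distance correlation is clearly positive, which is the behaviour the statement above describes.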

Linear Model And Regularization


Random Forest