# Feature Selection

Updated: 2018-11-21

## Feature Selection

• Embedded approaches: the data mining algorithm itself decides which attributes to use, e.g. a decision tree classifier.
• Filter approaches: features are selected before the data mining algorithm is run, independently of the data mining task, e.g. choosing features whose pairwise correlation is as low as possible.
• Wrapper approaches: the data mining algorithm is used as a black box to evaluate candidate feature subsets.
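
As an illustration of the filter idea, here is a minimal sketch that greedily keeps features whose pairwise correlation with already-kept features stays below a threshold. The function name `drop_correlated_features` and the 0.9 threshold are hypothetical choices, not taken from any particular library. No model is trained, which is what separates a filter from embedded and wrapper methods.

```python
import numpy as np

def drop_correlated_features(X, threshold=0.9):
    """Filter approach (hypothetical helper): keep columns of X whose
    absolute pairwise Pearson correlation stays below `threshold`.
    Only the data is inspected; no model is trained."""
    X = np.asarray(X, dtype=float)
    corr = np.abs(np.corrcoef(X, rowvar=False))  # feature-by-feature |r|
    kept = []
    for j in range(X.shape[1]):
        # Keep feature j only if it is not highly correlated with any kept feature.
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return kept

rng = np.random.default_rng(0)
a = rng.normal(size=100)
b = rng.normal(size=100)
# Column 1 is almost a scaled copy of column 0; column 2 is independent.
X = np.column_stack([a, 2 * a + 0.01 * rng.normal(size=100), b])
print(drop_correlated_features(X))  # → [0, 2]
```

The greedy order means the first of two redundant columns is the one that survives; a different ordering (e.g. by variance) would be an equally valid design choice.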

### Feature Selection Architecture

• a measure for evaluating a subset
• a strategy that controls the generation of a new subset of features
• a stopping criterion
• a validation procedure
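
The four components can be mapped onto a small wrapper-style search. This sketch assumes sequential forward selection as the generation strategy, the validation MSE of an ordinary least-squares model as the evaluation measure, a minimum relative improvement (`min_gain`, a hypothetical parameter) as the stopping criterion, and a separate test split as the validation procedure. All function names are illustrative.

```python
import numpy as np

def holdout_mse(X_train, y_train, X_eval, y_eval, cols):
    # Evaluation measure: held-out MSE of an ordinary least-squares model
    # (with intercept) fitted on the training data, using only `cols`.
    A = np.column_stack([np.ones(len(X_train)), X_train[:, cols]])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    B = np.column_stack([np.ones(len(X_eval)), X_eval[:, cols]])
    return float(np.mean((B @ coef - y_eval) ** 2))

def forward_selection(X_train, y_train, X_val, y_val, max_features=None, min_gain=0.01):
    n_features = X_train.shape[1]
    max_features = max_features or n_features
    selected = []
    # Baseline: an intercept-only model that predicts the training mean.
    best = float(np.mean((y_val - y_train.mean()) ** 2))
    while len(selected) < max_features:
        # Generation strategy: extend the current subset by each unused feature.
        scores = {
            j: holdout_mse(X_train, y_train, X_val, y_val, selected + [j])
            for j in range(n_features)
            if j not in selected
        }
        j_best = min(scores, key=scores.get)
        # Stopping criterion: quit when the best extension improves the
        # validation MSE by less than the fraction `min_gain`.
        if scores[j_best] > best * (1 - min_gain):
            break
        selected.append(j_best)
        best = scores[j_best]
    return selected, best

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = 3 * X[:, 0] - 2 * X[:, 2] + 0.1 * rng.normal(size=300)
train, val, test = slice(0, 200), slice(200, 250), slice(250, 300)

selected, val_mse = forward_selection(X[train], y[train], X[val], y[val])
# Validation procedure: score the chosen subset on data the search never saw.
test_mse = holdout_mse(X[train], y[train], X[test], y[test], selected)
print(sorted(selected), test_mse)
```

On this synthetic data, features 0 and 2 carry all of the signal and should be picked first; whether an irrelevant feature also slips in depends on `min_gain` and on the noise in the small validation split.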

## Univariate Selection

One variable + one target: each feature is scored on its own against the target.

Reasons for feature selection:

1. To reduce the number of features, which reduces overfitting and improves the generalization of models.
2. To gain a better understanding of the features and their relationship to the response variable.
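
Univariate selection can be sketched as: score every feature individually against the target and keep the k best. This sketch assumes absolute Pearson correlation as the score, and `univariate_top_k` is a hypothetical name; scikit-learn's `SelectKBest` implements the same pattern with pluggable scoring functions.

```python
import numpy as np

def univariate_top_k(X, y, k):
    """Score each feature on its own against the target (here |Pearson r|)
    and return the indices of the k highest-scoring features plus all scores."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    order = np.argsort(scores)[::-1]  # highest score first
    return order[:k].tolist(), scores

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2 * X[:, 1] - X[:, 3] + 0.5 * rng.normal(size=200)
top, scores = univariate_top_k(X, y, k=2)
print(top)  # → [1, 3]
```

Because each feature is judged in isolation, this cannot detect features that are only useful in combination, nor redundancy between two strongly correlated features.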

### Pearson Correlation

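Pearson's correlation coefficient measures the strength of the linear relationship between two variables and takes a value between -1 and 1: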

• -1: perfect negative correlation
• +1: perfect positive correlation
• 0: no linear correlation
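
```python
from scipy.stats import pearsonr

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # example feature values
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # example target values
r, p_value = pearsonr(x, y)     # correlation coefficient and two-sided p-value
```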

Pros:

• fast to calculate
• the returned value lies in [-1, 1] rather than [0, 1], so its sign also tells whether the relationship is positive or negative

Cons:

• only sensitive to linear relationships; a strong non-linear dependence can still give a coefficient close to 0.
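
A quick way to see this limitation: below, y is a deterministic function of x, yet the Pearson coefficient is essentially 0 because the relationship is symmetric rather than linear. The data are made up for illustration.

```python
import numpy as np
from scipy.stats import pearsonr

x = np.linspace(-1, 1, 201)  # symmetric around 0
y = x ** 2                   # fully determined by x, but not linearly
r, _ = pearsonr(x, y)
print(r)  # essentially 0 (tiny floating-point residue)
```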

### Maximal Information Coefficient

https://en.wikipedia.org/wiki/Maximal_information_coefficient

• Searches for the optimal binning and normalizes the mutual information score into a metric that lies in the range [0, 1].
• Captures both linear and non-linear relationships.
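
The idea can be sketched in NumPy with one important simplification: the real MIC algorithm also searches over bin boundaries for each grid size, whereas this sketch only tries equal-frequency grids. Because it checks a subset of the grids, its result can only be at or below the exact MIC. The grid-size bound B(n) = n^0.6 follows the default of the original definition; the function names are hypothetical, and a dedicated implementation such as the `minepy` package is the better choice in practice.

```python
import numpy as np

def _quantile_bins(v, k):
    # Rank-based equal-frequency binning: each of the k bins gets about len(v)/k points.
    ranks = np.argsort(np.argsort(v, kind="stable"), kind="stable")
    return (ranks * k) // len(v)

def _mutual_info(a, b):
    # Plug-in mutual information (in nats) of two integer label arrays.
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1)
    joint /= joint.sum()
    outer = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / outer[nz])))

def approx_mic(x, y, alpha=0.6):
    """Simplified MIC: max over nx-by-ny equal-frequency grids with
    nx * ny < n**alpha of I(X;Y) / log(min(nx, ny))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    limit = len(x) ** alpha  # grid-size bound B(n)
    best = 0.0
    for nx in range(2, int(limit) + 1):
        for ny in range(2, int(limit) + 1):
            if nx * ny >= limit:
                break
            mi = _mutual_info(_quantile_bins(x, nx), _quantile_bins(y, ny))
            best = max(best, mi / np.log(min(nx, ny)))  # normalize into [0, 1]
    return best

x = np.linspace(-1, 1, 200)
print(approx_mic(x, x))       # linear relation: close to 1
print(approx_mic(x, x ** 2))  # non-linear relation where Pearson r is about 0: also close to 1
```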

### Distance correlation

A Pearson correlation of 0 does not imply independence, whereas a distance correlation of 0 does imply that the two variables are independent.
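
A direct sketch of the sample distance correlation (the V-statistic form built from double-centred distance matrices), compared with Pearson on the same non-linear example. The function name is hypothetical, and it uses O(n²) memory, so it only suits small samples. The sample value is rarely exactly 0 even for independent data; the independence property refers to the population quantity.

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation of two 1-D samples (V-statistic form)."""
    x = np.asarray(x, dtype=float)[:, None]
    y = np.asarray(y, dtype=float)[:, None]
    a = np.abs(x - x.T)  # pairwise distances within x
    b = np.abs(y - y.T)  # pairwise distances within y
    # Double-centre: subtract column and row means, add back the grand mean.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2 = (A * B).mean()
    dvar_x = (A * A).mean()
    dvar_y = (B * B).mean()
    if dvar_x == 0 or dvar_y == 0:
        return 0.0  # a constant sample carries no dependence to measure
    # Clamp tiny negative rounding errors before taking the square root.
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar_x * dvar_y)))

x = np.linspace(-1, 1, 201)
print(np.corrcoef(x, x ** 2)[0, 1])        # Pearson: essentially 0
print(distance_correlation(x, x ** 2))     # clearly greater than 0
print(distance_correlation(x, 2 * x + 1))  # about 1.0 for an exact linear relation
```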