Feature Selection
- Embedded approaches: the data mining algorithm itself decides which attributes to use, e.g. a decision tree classifier.
- Filter approaches: features are selected before the data mining algorithm is run, independently of the data mining task, e.g. keep only features whose pairwise correlation is as low as possible.
- Wrapper approaches: use the data mining algorithm as a black box to evaluate candidate feature subsets (see the sketch below).
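A rough illustration of the three approaches using scikit-learn (the synthetic dataset, estimators, and 0.9 correlation threshold are illustrative assumptions, not prescribed by these notes): a decision tree's feature_importances_ act as an embedded selector, a correlation threshold acts as a filter, and RFE wraps the classifier as a black box.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Embedded: the tree itself decides which attributes matter
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
embedded_ranking = np.argsort(tree.feature_importances_)[::-1]

# Filter: drop one feature of every highly correlated pair, independent of any model
corr = np.abs(np.corrcoef(X, rowvar=False))
keep = [i for i in range(X.shape[1]) if not any(corr[i, j] > 0.9 for j in range(i))]

# Wrapper: use the classifier as a black box to search for a good subset
wrapper = RFE(DecisionTreeClassifier(random_state=0), n_features_to_select=5).fit(X, y)
print(embedded_ranking, keep, wrapper.support_)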
Feature Selection Architecture
- a measure for evaluating a subset
- a strategy that controls the generation of a new subset of features
- a stopping criterion
- a validation procedure (a sketch combining all four pieces follows below)
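A minimal forward-selection loop assembling these four components; the dataset, logistic-regression model, and cross-validated accuracy measure are illustrative assumptions.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

def evaluate(subset):
    # evaluation measure: cross-validated accuracy of a simple model on the candidate subset
    return cross_val_score(LogisticRegression(max_iter=5000), X[:, subset], y, cv=5).mean()

selected, remaining, best_score = [], list(range(X.shape[1])), -np.inf
while remaining:  # generation strategy: greedy forward selection
    candidate = max(remaining, key=lambda j: evaluate(selected + [j]))
    score = evaluate(selected + [candidate])
    if score <= best_score:  # stopping criterion: stop once accuracy no longer improves
        break
    selected.append(candidate)
    remaining.remove(candidate)
    best_score = score

print(selected, best_score)  # validation: confirm the chosen subset on truly held-out data in practice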
mRMR (minimum Redundancy Maximum Relevance Feature Selection)
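A greedy mRMR-style sketch, assuming scikit-learn's mutual information estimators as the relevance and redundancy measures (the original mRMR formulation computes mutual information on discretized features); the dataset and k are illustrative.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr(X, y, k):
    relevance = mutual_info_classif(X, y)  # maximum relevance: MI(feature, target)
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k and remaining:
        def score(j):
            if not selected:
                return relevance[j]
            # minimum redundancy: penalize MI with already-selected features
            redundancy = np.mean([mutual_info_regression(X[:, [s]], X[:, j])[0] for s in selected])
            return relevance[j] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

X, y = make_classification(n_samples=300, n_features=8, n_informative=3, random_state=0)
print(mrmr(X, y, 3))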
Univariate Selection
One variable + One target
http://blog.datadive.net/selecting-good-features-part-i-univariate-selection/
Reasons for feature selection:
- Reducing the number of features, to reduce overfitting and improve the generalization of models.
- To gain a better understanding of the features and their relationship to the response variables.
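A minimal univariate-selection sketch, assuming scikit-learn's SelectKBest with an F-test score function (other univariate scores, e.g. mutual information, plug in the same way); the dataset and k=4 are illustrative.

from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectKBest, f_regression

X, y = load_diabetes(return_X_y=True)
selector = SelectKBest(score_func=f_regression, k=4).fit(X, y)
print(selector.get_support(indices=True))  # indices of the 4 top-scoring features
print(selector.scores_)                    # per-feature univariate F scores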
Pearson Correlation
A value between -1 and 1
- -1: perfect negative correlation
- +1: perfect positive correlation
- 0: no linear correlation
Use pearsonr in SciPy: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.pearsonr.html
from scipy.stats import pearsonr
# x and y are 1-D arrays of equal length
r, p_value = pearsonr(x, y)  # r in [-1, 1]; p-value for the null hypothesis of zero correlation
Pros:
- fast to calculate
- returned value [-1, 1] instead of [0, 1], extra negative/positive info
Cons:
- only sensitive to linear relationships (illustrated below).
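A quick illustration of that limitation (random seed and sample sizes are arbitrary): a clean quadratic relationship yields a Pearson correlation near zero.

import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 1000)
print(pearsonr(x, 2 * x + rng.normal(0, 0.1, 1000))[0])  # close to 1: linear dependence detected
print(pearsonr(x, x ** 2)[0])                            # close to 0: non-linear dependence missed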
Maximal Information Coefficient
https://en.wikipedia.org/wiki/Maximal_information_coefficient
- Searches over binnings of the two variables and turns the mutual information score into a metric that lies in the range [0, 1].
- Captures linear or non-linear relationships (see the sketch below).
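A minimal sketch assuming the third-party minepy package (not part of SciPy), which provides an estimator of MIC; the data are the same quadratic example used above.

import numpy as np
from minepy import MINE

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 1000)

mine = MINE(alpha=0.6, c=15)    # default MIC parameters
mine.compute_score(x, x ** 2)   # deterministic but non-linear relationship
print(mine.mic())               # close to 1, unlike the Pearson correlation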
Distance correlation
While a Pearson correlation of 0 does not imply independence, a distance correlation of 0 does imply that there is no dependence between the two variables.
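A self-contained sketch of the sample distance correlation for two 1-D variables (dedicated packages such as dcor exist; this direct implementation is only for illustration).

import numpy as np

def distance_correlation(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    a = np.abs(x[:, None] - x[None, :])  # pairwise distance matrices
    b = np.abs(y[:, None] - y[None, :])
    # double-center both matrices
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    dcov2 = (A * B).mean()  # squared sample distance covariance
    return np.sqrt(dcov2 / np.sqrt((A * A).mean() * (B * B).mean()))

x = np.random.default_rng(0).uniform(-1, 1, 500)
print(distance_correlation(x, x ** 2))  # clearly non-zero even though Pearson r is ~0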
Linear Model And Regularization
http://blog.datadive.net/selecting-good-features-part-ii-linear-models-and-regularization/
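A minimal sketch of L1-based selection with scikit-learn's Lasso, where the L1 penalty drives coefficients of uninformative features to exactly zero; the dataset and alpha value are illustrative assumptions.

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X = StandardScaler().fit_transform(X)   # scale features so coefficients are comparable

lasso = Lasso(alpha=0.5).fit(X, y)
selected = np.flatnonzero(lasso.coef_)  # features with non-zero coefficients survive
print(selected, lasso.coef_[selected])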
Random Forest
http://blog.datadive.net/selecting-good-features-part-iii-random-forests/
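A minimal sketch of tree-based selection via random forest feature importances, assuming scikit-learn's RandomForestRegressor and its impurity-based importances (permutation importance is a common, less biased alternative).

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True)
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

ranking = np.argsort(forest.feature_importances_)[::-1]  # most important features first
print(ranking, forest.feature_importances_[ranking])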