Weight of Evidence and Information Value

Updated: 2019-01-13

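Calculation of Weight of Evidence (WOE)

Weight of evidence (WOE):

$$WOE_i = \ln\left(\frac{\%pos_i}{\%neg_i}\right)$$

where $i = 1, 2, \ldots, k$, $k$ is the number of bins, $\%pos_i$ ($\%neg_i$) is the share of all positives (negatives) that fall into bin $i$, and $\ln$ is the natural logarithm.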
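A minimal Python sketch of the definition for a single bin. The function `woe` and its parameters are hypothetical names, and the sketch assumes both shares are strictly greater than zero, since the logarithm is undefined otherwise.

```python
import math


def woe(pct_pos: float, pct_neg: float) -> float:
    """Hypothetical helper: weight of evidence of one bin, ln(%pos_i / %neg_i).

    Assumes both shares are > 0: a zero pct_neg raises ZeroDivisionError
    and a zero pct_pos makes math.log raise ValueError.
    """
    return math.log(pct_pos / pct_neg)


print(round(woe(0.10, 0.05), 3))  # 0.693, i.e. ln 2
```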

Calculation of Information Value (IV)

Information Value (IV):

$$IV = \sum_{i=1}^k \left\{(\%pos_i - \%neg_i) \times WOE_i\right\}$$
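Summing the per-bin terms gives IV. This sketch assumes the shares arrive as two equal-length lists in bin order, all strictly positive; `information_value` is a hypothetical name, and the numbers are the made-up shares from the Example section.

```python
import math
from typing import Sequence


def information_value(pct_pos: Sequence[float], pct_neg: Sequence[float]) -> float:
    """Hypothetical helper: IV = sum over bins of (%pos_i - %neg_i) * ln(%pos_i / %neg_i).

    Assumes pct_pos[i] and pct_neg[i] describe the same bin i and are all > 0.
    """
    if len(pct_pos) != len(pct_neg):
        raise ValueError("pct_pos and pct_neg need one entry per bin")
    return sum((p - n) * math.log(p / n) for p, n in zip(pct_pos, pct_neg))


# Made-up shares: MISSING, then bins 1-5.
pos = [0.10, 0.15, 0.15, 0.20, 0.20, 0.20]
neg = [0.05, 0.05, 0.10, 0.20, 0.25, 0.35]
print(f"{information_value(pos, neg):.3f}")  # 0.260
```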

WOE and IV work for both continuous and categorical variables.

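CONTINUOUS/CATEGORICAL -> CATEGORICAL (discrete numeric values): either kind of variable is first turned into a set of discrete bins, and WOE and IV are then computed per bin.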

Calculation

Step 1: Binning (out of the scope of this post)

  • CONTINUOUS: split the values into intervals and calculate the relative frequencies (percentages) of pos and neg in each interval
  • CATEGORICAL: calculate the relative frequencies (percentages) of pos and neg in each category

Optionally, missing values can go into a separate MISSING bin.
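Since choosing the bins is out of scope, the sketch below simply takes them as given: hypothetical fixed interval edges for a continuous variable, one bin per category for a categorical one, and `None` routed to a MISSING bin. It then counts pos and neg per bin and converts the counts to relative frequencies. All helper names, the edges `[30, 50]`, and the toy data are illustrative assumptions; a bin with zero pos or zero neg would still need handling before Step 2.

```python
import bisect
from collections import Counter
from collections.abc import Hashable, Sequence
from typing import Optional

MISSING = "MISSING"


def interval_bin(x: Optional[float], edges: Sequence[float]) -> Hashable:
    """Hypothetical helper (CONTINUOUS): index of the interval containing x.

    For sorted edges [e0, e1, ...], bin 0 is x < e0, bin 1 is e0 <= x < e1, ...
    None goes to the MISSING bin.
    """
    if x is None:
        return MISSING
    return bisect.bisect_right(edges, x)


def category_bin(x: Optional[str]) -> Hashable:
    """Hypothetical helper (CATEGORICAL): each category is its own bin; None -> MISSING."""
    return MISSING if x is None else x


def relative_frequencies(bins: Sequence[Hashable], labels: Sequence[int]) -> dict:
    """Hypothetical helper: map each bin to (%pos, %neg); label 1 = pos, 0 = neg."""
    pos = Counter(b for b, y in zip(bins, labels) if y == 1)
    neg = Counter(b for b, y in zip(bins, labels) if y == 0)
    total_pos, total_neg = sum(pos.values()), sum(neg.values())
    return {b: (pos[b] / total_pos, neg[b] / total_neg) for b in pos.keys() | neg.keys()}


# Made-up toy data: ages with two missing values, binary labels.
ages = [22, 25, 28, 33, 41, 47, 52, 66, None, None]
y = [1, 1, 0, 1, 0, 0, 0, 1, 1, 0]
dist = relative_frequencies([interval_bin(a, edges=[30, 50]) for a in ages], y)
print(dist[0], dist[1], dist[2], dist[MISSING])
# (0.4, 0.2) (0.2, 0.4) (0.2, 0.2) (0.2, 0.2)
```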

Step 2: Calculate WOE for each bin

$$WOE_i = \ln\left(\frac{\%pos_i}{\%neg_i}\right) = \ln\left(\frac{pos_i / \sum_i pos_i}{neg_i / \sum_i neg_i}\right)$$
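The second form works directly from raw counts. A sketch, assuming every per-bin count is non-zero; `woe_from_counts` is a hypothetical name, and the counts are hypothetical ones chosen (100 pos, 100 neg) so that they reproduce the shares in the example table.

```python
import math
from typing import List, Sequence


def woe_from_counts(pos_counts: Sequence[int], neg_counts: Sequence[int]) -> List[float]:
    """Hypothetical helper: WOE_i = ln((pos_i / sum pos) / (neg_i / sum neg)) per bin."""
    total_pos, total_neg = sum(pos_counts), sum(neg_counts)
    return [
        math.log((p / total_pos) / (n / total_neg))
        for p, n in zip(pos_counts, neg_counts)
    ]


# Hypothetical counts for MISSING and bins 1-5.
pos_counts = [10, 15, 15, 20, 20, 20]
neg_counts = [5, 5, 10, 20, 25, 35]
print([round(w, 3) for w in woe_from_counts(pos_counts, neg_counts)])
# [0.693, 1.099, 0.405, 0.0, -0.223, -0.56]
```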

Step 3: Calculate IV

$$IV_i = (\%pos_i - \%neg_i) \times WOE_i$$
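Each bin's contribution can be computed on its own. A sketch under the same assumptions as before (shares strictly positive, bins in matching order); `iv_per_bin` is a hypothetical name and the shares are the made-up ones from the example.

```python
import math
from typing import List, Sequence


def iv_per_bin(pct_pos: Sequence[float], pct_neg: Sequence[float]) -> List[float]:
    """Hypothetical helper: IV_i = (%pos_i - %neg_i) * WOE_i for each bin (shares > 0)."""
    return [(p - n) * math.log(p / n) for p, n in zip(pct_pos, pct_neg)]


pct_pos = [0.10, 0.15, 0.15, 0.20, 0.20, 0.20]
pct_neg = [0.05, 0.05, 0.10, 0.20, 0.25, 0.35]
iv_bins = iv_per_bin(pct_pos, pct_neg)
print([round(v, 3) for v in iv_bins])
# [0.035, 0.11, 0.02, 0.0, 0.011, 0.084]
```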

Step 4: Sum Up

$$IV = \sum_{i=1}^k IV_i$$

Putting everything together:

$$IV = \sum_{i=1}^k \left\{(\%pos_i - \%neg_i) \ln\left(\frac{\%pos_i}{\%neg_i}\right)\right\}$$
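Steps 2 to 4 collapse into one loop over the bins. A sketch, assuming Step 1 has already produced per-bin counts of pos and neg, all non-zero; `information_value_from_counts` is a hypothetical name and the counts are hypothetical.

```python
import math
from typing import Sequence


def information_value_from_counts(pos_counts: Sequence[int], neg_counts: Sequence[int]) -> float:
    """Hypothetical helper: total IV computed straight from per-bin counts.

    Assumes every count is > 0 so that every WOE_i is defined.
    """
    total_pos, total_neg = sum(pos_counts), sum(neg_counts)
    iv = 0.0
    for p, n in zip(pos_counts, neg_counts):
        pct_pos, pct_neg = p / total_pos, n / total_neg  # relative percentages
        woe_i = math.log(pct_pos / pct_neg)              # Step 2
        iv += (pct_pos - pct_neg) * woe_i                # Steps 3 and 4
    return iv


# Hypothetical counts (100 pos, 100 neg) matching the example's shares.
print(f"{information_value_from_counts([10, 15, 15, 20, 20, 20], [5, 5, 10, 20, 25, 35]):.3f}")  # 0.260
```

Because only the shares enter the formula, scaling all counts by the same factor leaves IV unchanged.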

Example

(The data are made up and serve only to illustrate the calculation.)

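| bin | %pos | %neg | WOE | IV |
|---|---|---|---|---|
| MISSING | 0.1 | 0.05 | 0.693 | 0.035 |
| 1 | 0.15 | 0.05 | 1.099 | 0.110 |
| 2 | 0.15 | 0.1 | 0.405 | 0.020 |
| 3 | 0.2 | 0.2 | 0.0 | 0.0 |
| 4 | 0.2 | 0.25 | -0.223 | 0.011 |
| 5 | 0.2 | 0.35 | -0.560 | 0.084 |
| Sum | 1.0 | 1.0 | | 0.260 |
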
  • WOE of (e.g.) MISSING: $WOE_{MISSING} = \ln(0.1/0.05) = 0.693$
  • IV of (e.g.) MISSING: $IV_{MISSING} = (0.1 - 0.05) \times 0.693 = 0.035$
  • Total IV: $0.035 + 0.110 + 0.020 + 0.0 + 0.011 + 0.084 = 0.260$
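The table can be reproduced from its %pos and %neg columns alone. A short sketch; the `rows` list simply restates the made-up example data.

```python
import math

# The made-up (bin, %pos, %neg) rows of the example table.
rows = [
    ("MISSING", 0.10, 0.05),
    ("1", 0.15, 0.05),
    ("2", 0.15, 0.10),
    ("3", 0.20, 0.20),
    ("4", 0.20, 0.25),
    ("5", 0.20, 0.35),
]

results = []
total_iv = 0.0
for name, p, n in rows:
    woe_i = math.log(p / n)  # Step 2
    iv_i = (p - n) * woe_i   # Step 3
    total_iv += iv_i         # Step 4
    results.append((name, round(woe_i, 3), round(iv_i, 3)))
    print(f"{name:<8} WOE={woe_i:7.3f}  IV={iv_i:.3f}")

print(f"Total IV = {total_iv:.3f}")  # Total IV = 0.260
```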

Observations

  • if %pos > %neg, WOE is positive
  • if %pos < %neg, WOE is negative
  • if %pos = %neg, WOE is 0
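  • IV is always non-negative: each IV_i ≥ 0 because %pos - %neg and WOE always have the same sign, and IV_i = 0 exactly when %pos = %neg (as in bin 3 of the example)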
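The last observation follows from the first three, checked bin by bin: the two factors of $IV_i$ always share a sign.

```latex
\begin{aligned}
\%pos_i > \%neg_i &\implies (\%pos_i - \%neg_i) > 0 \text{ and } WOE_i > 0 \implies IV_i > 0 \\
\%pos_i < \%neg_i &\implies (\%pos_i - \%neg_i) < 0 \text{ and } WOE_i < 0 \implies IV_i > 0 \\
\%pos_i = \%neg_i &\implies (\%pos_i - \%neg_i) = 0 \text{ and } WOE_i = 0 \implies IV_i = 0 \\
\therefore\ IV = \sum_{i=1}^k IV_i &\ge 0, \text{ with } IV = 0 \iff \%pos_i = \%neg_i \text{ for all } i
\end{aligned}
```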