Weight of Evidence and Information Value

Last Updated: 2021-11-19

Calculation of Weight of Evidence (WOE)

Weight of Evidence (WOE):

$$WOE_i = \ln\left(\frac{\%pos_i}{\%neg_i}\right)$$

where $i = 1, 2, \dots, k$ and $k$ is the number of bins.

Calculation of Information Value (IV)

Information Value (IV):

$$IV = \sum_{i=1}^k \left\{ (\%pos_i - \%neg_i) \times WOE_i \right\}$$

WOE and IV work for both continuous and categorical variables.

CONTINUOUS/CATEGORICAL -> CATEGORICAL (discrete numeric values)

Calculation

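Step 1: Binning (how to choose the bins is out of the scope of this post)

  • CONTINUOUS: calculate the relative frequencies of pos and neg for each interval
  • CATEGORICAL: calculate the relative frequencies of pos and neg for each category

Here $\%pos_i$ is the share of all positives that fall in bin $i$, and $\%neg_i$ is the share of all negatives that fall in bin $i$.

Optionally, there can be a MISSING bin.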
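As a minimal sketch of Step 1 for data that is already binned, the snippet below counts pos and neg per bin and turns the counts into relative frequencies. The function name `bin_distributions`, the toy `records`, the 1/0 label encoding, and the use of `None` to mark a missing value are all assumptions made for illustration.

```python
from collections import Counter


def bin_distributions(records):
    """Return {bin: (%pos, %neg)} from (bin, label) pairs.

    Assumes label 1 means pos and any other label means neg.
    A bin value of None is counted under a separate "MISSING" bin.
    """
    pos_counts, neg_counts = Counter(), Counter()
    for bin_label, y in records:
        key = "MISSING" if bin_label is None else bin_label
        if y == 1:
            pos_counts[key] += 1
        else:
            neg_counts[key] += 1

    total_pos = sum(pos_counts.values())
    total_neg = sum(neg_counts.values())
    bins = set(pos_counts) | set(neg_counts)
    # Each bin's share of all positives and of all negatives.
    return {
        b: (pos_counts[b] / total_pos, neg_counts[b] / total_neg)
        for b in bins
    }


# Hypothetical toy data: (bin, label) pairs, with None as a missing value.
records = [
    ("A", 1), ("A", 0), ("A", 0),
    ("B", 1), ("B", 1), ("B", 0),
    (None, 1), (None, 0),
]
dist = bin_distributions(records)
for name in sorted(dist):
    pct_pos, pct_neg = dist[name]
    print(name, pct_pos, pct_neg)
```

Both columns sum to 1 across bins, which is what the later steps rely on.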

Step 2: Calculate WOE for each bin

$$WOE_i = \ln\left(\frac{\%pos_i}{\%neg_i}\right) = \ln\left(\frac{pos_i / \sum_{j=1}^k pos_j}{neg_i / \sum_{j=1}^k neg_j}\right)$$

where $pos_i$ and $neg_i$ are the counts of positives and negatives in bin $i$.

Step 3: Calculate IV for each bin

$$IV_i = (\%pos_i - \%neg_i) \times WOE_i$$

Step 4: Sum Up

$$IV = \sum_{i=1}^k IV_i$$

Putting everything together:

$$IV = \sum_{i=1}^k \left\{ (\%pos_i - \%neg_i) \ln\left(\frac{\%pos_i}{\%neg_i}\right) \right\}$$
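Steps 2 to 4 can be sketched in a few lines of Python, starting from per-bin counts. The function name `woe_iv_from_counts` and the counts are hypothetical, and the sketch assumes every bin contains at least one pos and one neg (otherwise the ratio or its logarithm is undefined).

```python
import math


def woe_iv_from_counts(pos_counts, neg_counts):
    """Return (WOE per bin, IV per bin, total IV) from per-bin counts.

    Assumes both lists have the same length and no count is zero.
    """
    total_pos = sum(pos_counts)
    total_neg = sum(neg_counts)
    woe, iv_terms = [], []
    for pos, neg in zip(pos_counts, neg_counts):
        pct_pos = pos / total_pos          # %pos_i
        pct_neg = neg / total_neg          # %neg_i
        w = math.log(pct_pos / pct_neg)    # Step 2: natural log
        woe.append(w)
        iv_terms.append((pct_pos - pct_neg) * w)  # Step 3
    return woe, iv_terms, sum(iv_terms)    # Step 4


# Hypothetical counts for three bins.
pos_counts = [30, 50, 20]
neg_counts = [10, 50, 40]
woe, iv_terms, iv = woe_iv_from_counts(pos_counts, neg_counts)
print([round(w, 3) for w in woe], round(iv, 3))
```

Returning the per-bin terms alongside the total makes it easy to see which bins contribute most to IV.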

Example

(This data is made up and only for illustration of calculation)

| bin     | %pos | %neg | WOE    | IV    |
|---------|------|------|--------|-------|
| MISSING | 0.10 | 0.05 | 0.693  | 0.035 |
| 1       | 0.15 | 0.05 | 1.099  | 0.110 |
| 2       | 0.15 | 0.10 | 0.405  | 0.020 |
| 3       | 0.20 | 0.20 | 0.000  | 0.000 |
| 4       | 0.20 | 0.25 | -0.223 | 0.011 |
| 5       | 0.20 | 0.35 | -0.560 | 0.084 |
| Sum     | 1.00 | 1.00 |        | 0.260 |

  • WOE of (e.g.) MISSING: $WOE_{MISSING} = \ln(0.1/0.05) = 0.693$
  • IV of (e.g.) MISSING: $IV_{MISSING} = (0.1 - 0.05) \times 0.693 = 0.035$
  • Total IV: $0.035 + 0.110 + 0.020 + 0.0 + 0.011 + 0.084 = 0.260$
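The example can be checked with a short script. This is only a sketch: the `rows` list copies the made-up %pos and %neg values from the table, and the variable names are chosen here for illustration.

```python
import math

# (bin, %pos, %neg) copied from the illustrative table.
rows = [
    ("MISSING", 0.10, 0.05),
    ("1", 0.15, 0.05),
    ("2", 0.15, 0.10),
    ("3", 0.20, 0.20),
    ("4", 0.20, 0.25),
    ("5", 0.20, 0.35),
]

total_iv = 0.0
table = []
for name, pct_pos, pct_neg in rows:
    woe = math.log(pct_pos / pct_neg)   # WOE_i
    iv = (pct_pos - pct_neg) * woe      # IV_i
    total_iv += iv
    table.append((name, woe, iv))
    print(f"{name:>7}  {pct_pos:.2f}  {pct_neg:.2f}  {woe:6.3f}  {iv:.3f}")

print(f"Total IV: {total_iv:.3f}")
```

Rounded to three decimals, the printed WOE and IV columns should match the table, including the total of 0.260.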

Observations

  • if %pos > %neg, WOE is positive
  • if %pos < %neg, WOE is negative
  • if %pos = %neg, WOE is 0
  • IV is always non-negative, and it is 0 only when %pos = %neg in every bin (bin 3 in the example contributes exactly 0)
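
The sign of IV follows from the first three observations. As a short sketch, write $a_i = \%pos_i$ and $b_i = \%neg_i$ (shorthand introduced here, not part of the notation above) and assume $a_i, b_i > 0$:

$$
\begin{aligned}
a_i > b_i &\implies a_i - b_i > 0,\ \ln\frac{a_i}{b_i} > 0 &&\implies IV_i > 0 \\
a_i < b_i &\implies a_i - b_i < 0,\ \ln\frac{a_i}{b_i} < 0 &&\implies IV_i > 0 \\
a_i = b_i &\implies a_i - b_i = 0,\ \ln\frac{a_i}{b_i} = 0 &&\implies IV_i = 0
\end{aligned}
$$

The two factors of $IV_i = (a_i - b_i) \ln(a_i / b_i)$ always share the same sign, so every term is at least 0 and therefore $IV = \sum_{i=1}^k IV_i \ge 0$.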