# Error

## Linear Error vs Squared Error vs Log Loss

There are many ways of quantifying how incorrect a specific prediction is.

For example, suppose a model predicts a 1% chance of an event (e.g. an ad click or payment fraud). Depending on whether the event happens, the error is:

metric | if it does not happen (0) | if it does happen (1) |
---|---|---|
linear error | 0.01 | 0.99 |
squared error | $0.01^2 = 0.0001$ | $0.99^2 \approx 0.98$ |
log loss | $\log_2 0.99 \approx -0.01$ | $\log_2 0.01 \approx -6.6$ |

Note that each log-loss entry is the log of one minus the absolute difference between the prediction and the outcome; equivalently, it is the log of the probability the model assigned to the outcome that actually occurred.
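The per-instance errors in the table can be reproduced with a short Python sketch. It assumes the prediction is a probability `p` and the outcome `y` is 0 or 1; the function name `per_instance_errors` is hypothetical, and log loss is taken in base 2 to match the table.

```python
import math


def per_instance_errors(p: float, y: int) -> dict:
    """Return linear, squared and log (base 2) error for one prediction.

    p: predicted probability of the event; y: actual outcome (0 or 1).
    """
    diff = abs(y - p)  # distance between prediction and outcome
    return {
        "linear": diff,
        "squared": diff ** 2,
        # log of one minus the absolute difference, as described above
        "log": math.log2(1 - diff),
    }


for outcome in (0, 1):
    errors = per_instance_errors(0.01, outcome)
    print(outcome, {name: round(value, 4) for name, value in errors.items()})
```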

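Linear error performs poorly when the average probability of the event is substantially smaller or larger than 50%. The prediction that minimizes expected linear error is the most likely outcome (0 or 1) rather than the event's true probability, so for a rare event it rewards always predicting 0.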
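One way to see this is to find the prediction that minimizes the *expected* error when the event truly happens with probability `p`. The Python sketch below uses a simple grid search (an illustrative choice, with `P_TRUE = 0.01` assumed to match the example above) and compares the three per-instance errors.

```python
import math

P_TRUE = 0.01  # assumed true probability of the event, as in the example above
GRID = [i / 1000 for i in range(1, 1000)]  # candidate predictions strictly between 0 and 1


def expected_linear(q: float, p: float = P_TRUE) -> float:
    # the event happens with probability p (error 1 - q), otherwise the error is q
    return p * (1 - q) + (1 - p) * q


def expected_squared(q: float, p: float = P_TRUE) -> float:
    return p * (1 - q) ** 2 + (1 - p) * q ** 2


def expected_log_loss(q: float, p: float = P_TRUE) -> float:
    # negated log, so that smaller is better like the other two errors
    return -(p * math.log(q) + (1 - p) * math.log(1 - q))


losses = {"linear": expected_linear, "squared": expected_squared, "log loss": expected_log_loss}
best = {name: min(GRID, key=f) for name, f in losses.items()}
for name, q in best.items():
    print(f"{name}: best prediction = {q}")
```

Linear error's best prediction lands at the edge of the grid (effectively 0), whereas squared error and log loss both recover the true 1%.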

The above scores only a single prediction. Prediction accuracy is usually measured as the total error accumulated over many predictions; to compute the error over a whole dataset, there are a few options:

## Mean Squared Error

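**Mean Squared Error**: the standard **Cost Function** in **Linear Regression**; it can also score probability predictions:

$$\text{MSE} = {1 \over n} \sum_{i=1}^{n} (y_i - y_i')^2$$

where

- $n$ is the number of predictions
- $y_i$ is the actual value, either 0 or 1
- $y_i'$ is the predicted value, a number between 0 and 1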
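A minimal Python sketch of mean squared error, the average of $(y_i - y_i')^2$ over all $n$ predictions. The function name `mean_squared_error` and the four example predictions are illustrative assumptions, not a specific library's API.

```python
def mean_squared_error(actual: list[float], predicted: list[float]) -> float:
    """Average of (y_i - y_i')^2 over all predictions."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


actual = [1, 1, 0, 0]             # observed outcomes y_i
predicted = [0.9, 0.6, 0.7, 0.1]  # predicted probabilities y_i'
print(mean_squared_error(actual, predicted))
```

Here the squared errors are 0.01, 0.16, 0.49 and 0.01, so the mean is about 0.1675.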

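With a linear hypothesis $h_\theta(x) = \theta \cdot x$, MSE is a convex function of $\theta$, so gradient descent can find the global minimum. In Logistic Regression, however, the hypothesis $h_\theta(x) = {1 \over 1 + e^{-\theta \cdot x}}$ is nonlinear, so MSE is no longer convex in $\theta$ and optimization may end in a local minimum instead.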
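The loss of convexity can be checked numerically. The Python sketch below assumes a single training example with $x = 1$ and $y = 1$ (a hypothetical setup chosen for simplicity) and uses the second difference $f(\theta - h) - 2f(\theta) + f(\theta + h)$, which is negative wherever a function curves downward.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical single training example: x = 1, y = 1
X, Y = 1.0, 1.0


def squared_error(theta: float) -> float:
    return (Y - sigmoid(theta * X)) ** 2


def log_loss(theta: float) -> float:
    p = sigmoid(theta * X)
    return -(Y * math.log(p) + (1 - Y) * math.log(1 - p))


def second_difference(f, theta: float, h: float = 0.1) -> float:
    # Approximates h^2 * f''(theta); a negative value means f is locally concave
    return f(theta - h) - 2 * f(theta) + f(theta + h)


for theta in (-5.0, 0.0, 5.0):
    print(theta, second_difference(squared_error, theta), second_difference(log_loss, theta))
```

At $\theta = -5$ the squared error curves downward while log loss still curves upward. This only illustrates the one-dimensional case; it is not a proof.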

## Log Loss

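**Log Loss** is the cost function used in Logistic Regression; unlike MSE with the sigmoid hypothesis, it is convex in $\theta$:

$$\text{LogLoss} = -{1 \over n} \sum_{i=1}^{n} \left[ y_i \log y_i' + (1 - y_i) \log(1 - y_i') \right]$$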

Since the actual $y_i$ is either 0 or 1, only one of the two terms in $y_i \log y_i' + (1-y_i) \log(1-y_i')$ is non-zero:

- if $y_i=1$, it adds $\log y_i'$
- if $y_i=0$, it adds $\log(1-y_i')$
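A small Python check of this shortcut (the helper names `full_term` and `branch_term` are hypothetical): the two-term expression and the one-branch version give the same value for every outcome.

```python
import math


def full_term(y: int, p: float) -> float:
    # y * log(p) + (1 - y) * log(1 - p), written exactly as in the text
    return y * math.log(p) + (1 - y) * math.log(1 - p)


def branch_term(y: int, p: float) -> float:
    # only the non-zero term: log(p) if y == 1, log(1 - p) if y == 0
    return math.log(p) if y == 1 else math.log(1 - p)


for y in (0, 1):
    for p in (0.1, 0.5, 0.9):
        assert math.isclose(full_term(y, p), branch_term(y, p))
print("both forms agree")
```

The branch form also avoids evaluating `math.log(0)` for the unused term when a prediction is exactly 0 or 1, which would otherwise raise a `ValueError`.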

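In the example below, the corrected probability is the probability assigned to the actual outcome ($y_i'$ if $y_i = 1$, $1 - y_i'$ if $y_i = 0$), and the logs are base 10:

event | predicted | actual | corrected probability | log loss ($\log_{10}$) |
---|---|---|---|---|
1 | 0.9 | 1 | 0.9 | -0.046 |
2 | 0.6 | 1 | 0.6 | -0.222 |
3 | 0.7 | 0 | 0.3 | -0.523 |
4 | 0.1 | 0 | 0.9 | -0.046 |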

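The overall log loss is then the negative average of these values (the negative sum divided by the number of events). For the four events above: $-(-0.046 - 0.222 - 0.523 - 0.046) / 4 \approx 0.209$.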
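The whole calculation can be sketched in Python. It assumes base-10 logarithms to match the table; many libraries, including scikit-learn's `log_loss`, use the natural logarithm instead, which scales every value by the same constant factor. The function `log_loss` here is a hypothetical helper, not a library call.

```python
import math


def log_loss(actual: list[int], predicted: list[float], base: float = 10.0) -> float:
    """Negative average log of the probability assigned to the actual outcome."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    total = 0.0
    for y, p in zip(actual, predicted):
        corrected = p if y == 1 else 1.0 - p  # probability of what actually happened
        total += math.log(corrected, base)
    return -total / len(actual)


actual = [1, 1, 0, 0]
predicted = [0.9, 0.6, 0.7, 0.1]
print(round(log_loss(actual, predicted), 3))  # 0.209
```

Passing `base=math.e` gives the natural-log version of the same score.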