Updated: 2021-01-01

Linear Error vs Squared Error vs Log Loss

There are many ways of quantifying how incorrect a specific prediction is.

For example, suppose a model predicts there is a 1% chance of an event (e.g. an ad click or payment fraud). The error depends on the outcome:

|               | if it does not happen (0) | if it does happen (1)  |
|---------------|---------------------------|------------------------|
| linear error  | 0.01                      | 0.99                   |
| squared error | 0.01 ^ 2 = 0.0001         | 0.99 ^ 2 = 0.9801      |
| log loss      | log_2 0.99 ≈ -0.0145      | log_2 0.01 ≈ -6.64     |

Note that log loss here is calculated as the log of one minus the absolute difference between the prediction and the outcome.
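The three error measures above can be sketched in Python (function names are my own; the log-loss helper follows the note above and uses base 2 to match the table):

```python
import math

def linear_error(pred, actual):
    # Absolute difference between prediction and outcome.
    return abs(pred - actual)

def squared_error(pred, actual):
    # Squared difference; penalizes confident misses more heavily.
    return (pred - actual) ** 2

def log_loss(pred, actual):
    # Log of one minus the absolute error (base 2, matching the table).
    return math.log2(1 - abs(pred - actual))

pred = 0.01  # model says there is a 1% chance of the event

print(linear_error(pred, 0))   # 0.01
print(squared_error(pred, 1))  # 0.9801 (up to float precision)
print(log_loss(pred, 0))       # ≈ -0.0145
print(log_loss(pred, 1))       # ≈ -6.64
```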

Linear error performs poorly when the average probability of the event is substantially smaller or larger than 50%.

The above calculates the error of only one prediction; prediction accuracy is usually measured as the total error accumulated across many predictions. To calculate the error over a whole dataset, we have a few options:

Mean Squared Error

Mean Squared Error: used as the cost function in Linear Regression

J = {1 \over N} \sum (y' - y)^2


  • y is the actual value, either 0 or 1
  • y' is the predicted value, a number between 0 and 1

It forms a convex curve, so gradient descent can find the global minimum. However, in Logistic Regression the hypothesis h_\theta(x) = {1 \over 1 + e^{-\theta \cdot x}} is nonlinear, which makes MSE non-convex, so gradient descent may settle in a local minimum instead.
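A minimal sketch of the MSE formula above in Python (the labels and predictions are made-up example values):

```python
def mse(y_true, y_pred):
    # J = (1/N) * sum((y' - y)^2)
    n = len(y_true)
    return sum((p - y) ** 2 for y, p in zip(y_true, y_pred)) / n

y_true = [1, 1, 0, 0]          # actual outcomes
y_pred = [0.9, 0.6, 0.7, 0.1]  # predicted probabilities
print(mse(y_true, y_pred))     # (0.01 + 0.16 + 0.49 + 0.01) / 4 = 0.1675
```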

Log Loss

Log Loss is used as the cost function in Logistic Regression.

J = - {1 \over N} \sum_i \left[ y_i \log y_i' + (1-y_i) \log(1-y_i') \right]

Since the actual y_i is either 0 or 1, only one term in y_i \log y_i' + (1-y_i) \log(1-y_i') will be non-zero:

  • if y_i = 1, it adds error \log y_i'
  • if y_i = 0, it adds error \log(1-y_i')
| event | predicted | actual | p(actual outcome) | log loss (base 10) |
|-------|-----------|--------|-------------------|--------------------|
| 1     | 0.9       | 1      | 0.9               | -0.046             |
| 2     | 0.6       | 1      | 0.6               | -0.222             |
| 3     | 0.7       | 0      | 0.3               | -0.523             |
| 4     | 0.1       | 0      | 0.9               | -0.046             |

Then the overall log loss is the negative average of the per-event values (the negative sum normalized by the number of events); here, -(-0.046 - 0.222 - 0.523 - 0.046) / 4 ≈ 0.209.
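The table and the negative-average step can be reproduced with a short Python sketch (the table's values appear to use base-10 logs, so `math.log10` is used here; standard log loss would use the natural log):

```python
import math

def event_log_loss(pred, actual):
    # Probability the model assigned to what actually happened,
    # i.e. y' when actual is 1, and 1 - y' when actual is 0.
    p_actual = pred if actual == 1 else 1 - pred
    return math.log10(p_actual)

events = [(0.9, 1), (0.6, 1), (0.7, 0), (0.1, 0)]  # (predicted, actual)
losses = [event_log_loss(p, y) for p, y in events]
print([round(l, 3) for l in losses])  # [-0.046, -0.222, -0.523, -0.046]

# Overall log loss: negative sum normalized by the number of events.
overall = -sum(losses) / len(losses)
print(round(overall, 3))  # 0.209
```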