
Data Science - Functions

Last Updated: 2021-11-19

Softmax

The softmax model generalizes logistic regression to classification problems where the class label y can take on more than two possible values.
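
A minimal NumPy sketch of the softmax function (the function name and the max-subtraction stability trick are illustrative, not from a specific library):

import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; softmax is unchanged
    # when a constant is subtracted from every input.
    e = np.exp(z - np.max(z))
    return e / e.sum()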

Sigmoid

Sigmoid: an "S"-shaped curve.

The logistic sigmoid is a squashing function: it maps the whole real axis into a finite interval. Its inverse is the logit (log-odds) function.

import numpy as np

def sigmoid(z):
    # Logistic sigmoid: squashes any real input into (0, 1)
    return 1.0 / (1 + np.exp(-z))
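
Its inverse, the logit (log-odds) function mentioned above, can be sketched the same way (assuming np is NumPy, as in the sigmoid snippet):

def logit(p):
    # Inverse of the sigmoid: maps a probability in (0, 1) to log-odds on the real axis
    return np.log(p / (1 - p))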

Sigmoid vs Logistic

The logistic function is one kind of sigmoid (S-curve) function.

Sigmoid vs Tanh

  • tanh: y in [-1,1]
  • sigmoid: y in [0,1]
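
A quick numerical check of the two output ranges, reusing the sigmoid defined above:

x = np.linspace(-10, 10, 1001)
print(np.tanh(x).min(), np.tanh(x).max())    # approaches -1 and 1
print(sigmoid(x).min(), sigmoid(x).max())    # approaches 0 and 1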

Rectifier

f(x) = max(0, x)

It is also known as a ramp function and is analogous to half-wave rectification in electrical engineering.

A unit employing the rectifier is also called a rectified linear unit (ReLU).

A smooth approximation to the rectifier is the analytic softplus function:

f(x) = ln(1 + e^x)

The derivative of softplus is the logistic function:

f'(x) = e^x / (e^x + 1) = 1 / (1 + e^-x)
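
A minimal NumPy sketch of the rectifier and softplus (function names are illustrative), with a numerical check that the derivative of softplus matches the logistic sigmoid defined above:

def relu(x):
    # Rectifier / ramp function: max(0, x)
    return np.maximum(0, x)

def softplus(x):
    # Smooth approximation to the rectifier: ln(1 + e^x)
    return np.log1p(np.exp(x))

x = np.linspace(-3, 3, 7)
eps = 1e-6
numeric_grad = (softplus(x + eps) - softplus(x - eps)) / (2 * eps)
print(np.allclose(numeric_grad, sigmoid(x)))   # True: softplus' = sigmoid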

https://en.wikipedia.org/wiki/Rectifier_(neural_networks)

tanh

wiki: https://en.wikipedia.org/wiki/Hyperbolic_function

The tanh activation function is just a scaled and shifted sigmoid:

tanh(x) = 2σ(2x) - 1
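
A quick numerical check of this identity, again reusing the sigmoid defined above:

x = np.linspace(-3, 3, 13)
print(np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1))   # True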

There are two reasons for choosing tanh (assuming you have normalized your data, which is very important):

  • Having stronger gradients: since the data is centered around 0, the derivatives are larger. To see this, calculate the derivative of the tanh function and notice that its values lie in the range (0, 1], whereas the sigmoid's derivative never exceeds 0.25 (see the sketch after this list).
  • Avoiding bias in the gradients. This is explained very well in the paper, and it is worth reading to understand these issues.
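
As a rough illustration of the first point, the slope of tanh peaks at 1 (at x = 0), while the slope of the sigmoid never exceeds 0.25; a minimal sketch:

x = np.linspace(-5, 5, 1001)
tanh_grad = 1 - np.tanh(x) ** 2                  # derivative of tanh, values in (0, 1]
sigmoid_grad = sigmoid(x) * (1 - sigmoid(x))     # derivative of sigmoid, values in (0, 0.25]
print(tanh_grad.max(), sigmoid_grad.max())       # ~1.0 vs ~0.25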