What are the commonly used activation functions? When are they used?

Ans. The commonly used activation functions are:

  1. Linear : g(x) = x. This is the simplest activation function, but it cannot model complex decision boundaries: a deep network built only from linear activations collapses into a single linear transformation, and so is incapable of representing non-linear decision boundaries.
  2. Sigmoid : g(x) = 1 / (1 + e^(-x)). This is a common activation function in the last layer of a neural network, particularly for classification problems with a cross-entropy loss. The problem with the sigmoid is that its gradient becomes close to 0 for very high and very low inputs (saturation), which slows learning and leads to the vanishing gradient problem.
  3. Tanh : g(x) = tanh(x), a rescaled version of the sigmoid. This is the most common choice for intermediate layers. It is not as prone to saturation as the sigmoid and has stronger gradients (its maximum slope is 1, versus 0.25 for the sigmoid).
  4. ReLU : g(x) = max(0, x). ReLU tends to give a sparse output, since every negative input is mapped to 0, and it does not suffer from the vanishing gradient problem for positive inputs. A drawback of ReLU is that it is unbounded above, so activation values can blow up.
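The four activations above, and the saturation behaviour of the sigmoid, can be sketched in a few lines of NumPy (a minimal illustration; the function names are mine, not from any particular library):

```python
import numpy as np

def linear(x):
    return x

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def sigmoid_grad(x):
    # Derivative of the sigmoid: s * (1 - s), maximal (0.25) at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_grad(x):
    # Derivative of tanh: 1 - tanh(x)^2, maximal (1.0) at x = 0.
    return 1.0 - np.tanh(x) ** 2

x = np.array([-10.0, 0.0, 10.0])
# Sigmoid gradient is near 0 at the extremes: the vanishing gradient problem.
print(sigmoid_grad(x))
# Tanh has a stronger peak gradient (1.0 vs 0.25 for the sigmoid).
print(tanh_grad(np.array([0.0])))
# ReLU maps all negative inputs to 0, giving a sparse output.
print(relu(np.array([-2.0, -1.0, 0.0, 1.0, 2.0])))
```

Evaluating the gradients at a few points makes the trade-offs concrete: both sigmoid and tanh saturate far from 0, while ReLU keeps a constant gradient of 1 for all positive inputs.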
