Activation functions
We now know that an ANN is created by stacking individual computing units called perceptrons. We have also seen how a perceptron works and have summarized its behavior as: output 1 if w·x + b > 0, and output 0 otherwise.
That is, it either outputs a 1 or a 0 depending on the values of the weight, w, and bias, b.
To understand why outputting only a 1 or a 0 is a problem, let's look at the following diagram of a simple perceptron with just a single input, x:
For simplicity, let's call z = w·x + b, where the following applies:
- w is the weight of the input, x, and b is the bias
- a is the output, which is either 1 or 0 (see the code sketch after this list)
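The following is a minimal sketch of this rule in Python; the function name and the sample values for w and b are illustrative rather than taken from any library:

```python
# A minimal sketch of a single-input perceptron with a hard threshold at zero.
# The name step_perceptron and the sample values of w and b are illustrative only.

def step_perceptron(x, w, b):
    """Return 1 if z = w*x + b is positive, otherwise 0."""
    z = w * x + b
    return 1 if z > 0 else 0

# With w = 2.0 and b = -1.0, the output flips where z crosses 0 (at x = 0.5).
print(step_perceptron(0.4, w=2.0, b=-1.0))  # z = -0.2 -> prints 0
print(step_perceptron(0.6, w=2.0, b=-1.0))  # z =  0.2 -> prints 1
```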
Here, as the value of z changes, at some point, the output, a, changes from 0 to 1. As you can see, the change in output a is sudden and drastic:
What this means is that for some small change, Δz, we get a dramatic change in the output, a. This is not particularly helpful if the perceptron is part of a network, because if each perceptron exhibits such a drastic change, the network becomes unstable and therefore fails to learn.
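To make this concrete, here is a small, self-contained sketch (the threshold at zero is an assumption carried over from the rule above) showing the output flipping completely for a tiny change, Δz:

```python
# Illustrative only: with a hard threshold, a tiny change in z flips the
# output all the way from 0 to 1.

def step(z):
    return 1 if z > 0 else 0

z = -0.001
delta_z = 0.002              # a very small change in z
print(step(z))               # prints 0
print(step(z + delta_z))     # prints 1 -- the output jumps by a whole unit
```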
Therefore, to make the network more efficient and stable, we need to slow down the way each perceptron learns. In other words, we need to replace this sudden jump in output from 0 to 1 with a more gradual change:
This is made possible by activation functions. An activation function is applied to a perceptron so that, instead of outputting only a 0 or a 1, it can output any value between 0 and 1.
This means that each neuron can learn more slowly and at a greater level of detail, using smaller changes in its output, a. Activation functions can be thought of as transformation functions that map the binary output to a range of values between a given minimum and maximum.
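As one concrete illustration, the sketch below uses the logistic sigmoid, 1 / (1 + e^(-z)) (one of the functions introduced next), and repeats the same small change in z; the output now moves only slightly instead of jumping:

```python
import math

# A sketch using the logistic sigmoid, 1 / (1 + e^(-z)), which squashes any
# real-valued z into the range (0, 1).
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

z = -0.001
delta_z = 0.002
print(sigmoid(z))            # ~0.49975
print(sigmoid(z + delta_z))  # ~0.50025 -- only a tiny change in the output
```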
There are a number of ways to perform this transformation, namely the sigmoid function, the tanh function, and the ReLU function. We will take a quick look at each of these activation functions now.