Computational model efficiency
Neural networks are currently learning millions of weights. Millions of weights mean millions of multiplications. This makes it essential to find a highly efficient model to do this multiplication, and that is done by using matrices. The following diagram depicts how weights are placed in a matrix:
The weight matrix here has one row and four columns, and the inputs are in another matrix. These inputs can be the outputs of the previous hidden layer.
To find the output, we need to simply perform a simple multiplication of these two matrices. This means that is the multiplication of the row and the column.
To make it more complex, let us vary our neural network to have one more hidden layer.
Having a new hidden layer will change our matrix as well. All the weights from the hidden layer 2 will be added as a second row to the matrix. The value is the multiplication of the second row of the matrix with the column containing the input values:
Notice now how and can be actually calculated in parallel, because they don't have any dependents, so really, the multiplication of the first row with the inputs column is not dependent on the multiplication of the second row with the inputs column.
To make this more complex, we can have another set of examples that will affect the matrix as follows:
We now have four sets and we can actually calculate each of them in parallel. Consider , which is the result of the multiplication of the first row with the first input column, while this is the multiplication of the second row of weights with the second column of the input.
In standard computers, we currently have 16 of these operations carried out in parallel. But the biggest gain here is when we use GPUs, because GPUs enable us to execute from 100 to 1,000 of these operations in parallel. One of the reasons that deep learning has been taking off recently is because of GPUs offering really great computational power.