How does a neural network learn?
In this section, we will look at how a simple model makes predictions and how it learns from data. We will then move on to deep networks, which will give us some insight into why they are better and more efficient than simpler, shallow networks.
Assume we are given the task of predicting whether a person could develop heart disease in the near future. We have a considerable amount of data about individuals' histories and whether or not they went on to develop heart disease.
The parameters that will be taken into consideration are age, height, weight, genetic factors, whether the patient is a smoker or not, and their lifestyle. Let us begin by building a simple model:
We will use all the information we have about the individual as input, and we will call these inputs features. As we learned in the previous section, our next step is to multiply the features by the weights, take the sum of these products, and feed that sum into a sigmoid function, also known as the activation function. The sigmoid function outputs a value between 0 and 1: the more positive the sum, the closer the output is to 1, and the more negative the sum, the closer it is to 0:
In this case, the activation value produced by the activation function is also the output, since we don't have any hidden layers. We interpret an output close to 1 (above 0.5) to mean that the person will not develop heart disease, and an output close to 0 (below 0.5) to mean that the person will develop heart disease in the near future.
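To make this concrete, here is a minimal sketch of such a single-neuron model in Python. The feature values, weights, and bias below are made-up placeholders chosen purely for illustration; a real model would learn its weights from data, as we will see shortly.

```python
import math

def sigmoid(z):
    # Squashes any real-valued sum into the (0, 1) range.
    return 1.0 / (1.0 + math.exp(-z))

def predict(features, weights, bias=0.0):
    # Multiply each feature by its weight, sum the products,
    # and pass the sum through the sigmoid activation.
    weighted_sum = sum(x * w for x, w in zip(features, weights)) + bias
    return sigmoid(weighted_sum)

# Placeholder features (already scaled to roughly 0-1) and hand-picked weights:
# age, height, weight, family history, smoker, lifestyle
features = [0.6, 0.9, 0.5, 0.3, 0.0, 0.8]
weights = [-2.0, 0.0, -2.0, -3.0, -2.0, 3.0]

activation = predict(features, weights, bias=1.0)
print(activation)  # a value between 0 and 1
print("no heart disease" if activation >= 0.5 else "heart disease")
```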
Let's use a comparative example with three individuals to check whether this model functions appropriately:
As we can see in the preceding diagram, the input values for person 1 are as follows:
- Age = 60 years old
- Height = 180 centimeters
- Weight = 75 kilograms
- Number of people in their family affected by heart disease = 3
- Non-smoker
- Has a good lifestyle
The input values for person 2 are as follows:
- Age = 50 years old
- Height = 170 centimeters
- Weight = 120 kilograms
- Number of people in their family affected by heart disease = 7
- Smoker
- Has a sedentary lifestyle
The input values for person 3 are as follows:
- Age = 40 years old
- Height = 175 centimeters
- Weight = 85 kilograms
- Number of people in their family affected by heart disease = 4
- Light smoker
- Has a very good and clean lifestyle
So, if we had to estimate the probability of each of them developing heart disease, we might come up with something like this:
For person 1, there is just a 20% chance of heart disease because of their fairly good family history and the fact that they don't smoke and have a good lifestyle. For person 2, it's obvious that the chance of being affected by heart disease is much higher because of their family history, heavy smoking, and sedentary lifestyle. For person 3, we are not quite sure, which is why we give them a 50/50 chance: they smoke a little, but they also have a really good and clean lifestyle, and their family history is not that bad. We also factor in that this individual is quite young.
If we were to reflect on how we, as humans, arrived at these probabilities, we would realize that we were weighing the impact of each feature on the person's overall health: a good lifestyle has a positive impact on the outcome, genetics and family history have a strongly negative impact, excess weight has a negative impact, and so on.
It just so happens that neural networks learn in a similar manner; the difference is that they capture the impact of each feature by figuring out the weights. A large weight for lifestyle reinforces the positive contribution of a good lifestyle in the equation, whereas for genetics and family history the network assigns a much smaller or negative weight, so those features push the output in the opposite direction. In reality, a neural network is busy figuring out a great many weights at once.
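As a rough illustration of this idea, the following sketch encodes the three individuals as feature vectors and applies hand-picked weights whose signs mirror the intuition above. Both the numeric encoding (for example, treating a light smoker as 0.5 or scoring lifestyle between 0 and 1) and the weight and bias values are assumptions made up for this example, not values learned from data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Assumed encoding, roughly scaled to the 0-1 range:
# age/100, height/200, weight/150, family cases/10,
# smoker (0 = no, 0.5 = light, 1 = heavy), lifestyle (0 = sedentary ... 1 = very good)
people = {
    "person 1": [0.60, 0.90, 0.50, 0.3, 0.0, 0.8],
    "person 2": [0.50, 0.85, 0.80, 0.7, 1.0, 0.2],
    "person 3": [0.40, 0.875, 0.567, 0.4, 0.3, 0.9],
}

# Hand-picked weights (not learned). An output close to 1 means "no heart disease",
# so lifestyle gets a positive weight, while age, weight, family history, and smoking
# get negative weights that pull the output toward 0 ("heart disease").
weights = [-2.0, 0.0, -2.0, -3.0, -2.0, 3.0]
bias = 1.0

for name, features in people.items():
    weighted_sum = sum(x * w for x, w in zip(features, weights)) + bias
    print(name, round(sigmoid(weighted_sum), 3))

# The exact numbers are arbitrary, but the ordering matches the intuition:
# person 2 ends up closest to 0 (highest risk) and person 1 closest to 1 (lowest risk).
```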
Now let's see how neural networks actually learn the weights.