Basic terminology of Artificial Intelligence
Let’s now jump into the basic terminology of AI. When we say Artificial Intelligence, we mostly mean machine learning, a domain of computer science that uses learning algorithms able to tune themselves on data provided by a user. A fundamental building block of modern machine learning is the neural network: an algorithmic system based on simulating connected “neural units,” loosely modeling the way that neurons interact in the brain.
As we have mentioned above, these computational models inspired by neural connections have been studied since the 1940s. They have returned to prominence with the rise of computer processing power able to cope with large training data sets, and they have been used to successfully analyze input data such as images, video, and speech. Deep learning is a subset of machine learning in which neural networks have many layers of neurons (a “deep network”). The more layers you include in your machine learning model, the more computational power you need to train it. We talk about the architecture of a model when we want to describe how many layers it has, how many neurons each layer contains, and how they are connected.
The most common neural networks appearing in applications are:
1. Feedforward neural networks: this is the simplest type of neural network. In this architecture, information moves in only one direction, forward, from the input layer, through the hidden layers (those between input and output), to the output layer. There are no loops in the network. Single-neuron networks were already being studied in the 1950s. Advances in computing power and available data allowed this method to achieve great performance in the 21st century.
2. Recurrent neural networks (RNNs): neural networks whose connections between neurons include loops. One of the most common examples of an RNN is the LSTM (long short-term memory) network, which is used in language processing tasks.
3. Convolutional neural networks (CNNs): CNNs were originally designed for cases in which the input data consists of images. In 2012 a CNN was used for the winning entry in the ImageNet Large Scale Visual Recognition Challenge, which sparked interest in machine learning again.
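To make the idea of an architecture and of one-directional information flow more concrete, here is a minimal sketch of a feedforward network in plain Python. It is only an illustration: the layer sizes, the sigmoid activation, the random untrained weights, and the function names (`make_network`, `forward`, `count_parameters`) are all assumptions made for this example, not something prescribed by any particular library.

```python
import math
import random


def make_network(layer_sizes, seed=0):
    """Create random weights and zero biases for a fully connected network.

    layer_sizes such as [3, 4, 2] means 3 inputs, one hidden layer
    with 4 neurons, and 2 outputs (an illustrative architecture).
    """
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(layers, inputs):
    """Push the inputs through each layer in turn; there are no loops back."""
    activations = inputs
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


def count_parameters(layers):
    """Number of tunable values (weights plus biases) in the network."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


network = make_network([3, 4, 2])
outputs = forward(network, [0.5, -1.0, 2.0])
print(outputs)                    # two values between 0 and 1
print(count_parameters(network))  # (3*4 + 4) + (4*2 + 2) = 26
```

Adding layers or neurons to `layer_sizes` increases the parameter count, which is one reason deeper models need more computation to train.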
I also note generative adversarial networks (GANs) and reinforcement learning as two methods soon to be more common in commercial applications.
GANs use two neural networks competing against each other. They are often used for photo-realistic image generation: one network (the discriminator) is trained to detect fakes, while the other (the generator) is trained to produce images that fool it.
Reinforcement learning is an approach in machine learning in which developers design rewards that steer the machine toward good behavior. Algorithms learn by trial and error. A well-known application of reinforcement learning is AlphaGo, created by Google DeepMind, which was trained to play Go at a world-class level.
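The trial-and-error idea can be sketched with a toy problem that is far simpler than Go: an agent repeatedly chooses one of several options, receives a reward of 1 or 0, and gradually learns which option pays off most often. The example below uses an epsilon-greedy strategy on a so-called multi-armed bandit. The reward probabilities, the exploration rate, and the function name `run_bandit` are assumptions chosen for illustration; this is not how AlphaGo works.

```python
import random


def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn by trial and error which action yields the most reward.

    reward_probs[i] is the (hidden) chance that action i pays a reward of 1.
    """
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)    # how often each action was tried
    values = [0.0] * len(reward_probs)  # running average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.randrange(len(reward_probs))
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(len(values)), key=values.__getitem__)
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Incremental update of the average reward for this action.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts


values, counts = run_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(values)), key=values.__getitem__)
print(best_action, counts)
```

The only feedback the agent gets is the reward signal, which is why the way developers design rewards matters so much in practice.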
All of these deep learning methods require thousands of data records for a model to train and achieve the desired accuracy. The authors of the book ‘Deep Learning’ mention a general rule of thumb: a supervised deep learning algorithm should achieve acceptable performance with around 5,000 labeled examples per category, and match human-level performance when trained on at least 10 million labeled examples. Of course, this also depends on the particular use case and the algorithm’s architecture. Sometimes more data isn’t that helpful if you don’t know how to feed it properly into machine learning models. On the other hand, sometimes machine learning techniques won’t add more value than traditional statistical analytics. That’s why it’s essential to assess your level of technical development, look at your goals, and first think about possible solutions that do not involve AI.
Many of the machine learning models in use today are trained through “supervised learning,” which requires humans to label and categorize the underlying data. Nevertheless, newer methods such as ‘one-shot learning’ suggest that in the future we won’t need that much data to train effective AI systems. One will need only a small set of labeled data and a good architecture in place. On top of that, AutoML might improve AI even further without the need for human supervisors.
All that means that if an organisation wants to adopt AI successfully, it needs to start by assessing its technology stack and collecting data at scale. Linking data across various segments (customer, communication channel, platform), as well as controlling whether the right amount of data is provided, is crucial. A machine learning model is ‘overfitted’ if it matches the training data too well but doesn’t work on new data in production, and ‘underfitted’ if it fails to capture essential features of the data and thus fails to generalise.
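The difference between the two failure modes shows up when you compare the error on the data a model was trained on with the error on data it has never seen. The sketch below is a toy illustration under assumed conditions: synthetic noisy data, a simple nearest-neighbour averaging model, and hypothetical helper names (`make_data`, `knn_predict`, `mse`). With `k=1` the model memorises every training point (overfitting); with `k` equal to the size of the training set it always predicts the overall average (underfitting).

```python
import math
import random


def make_data(n, rng):
    """Noisy samples of a sine curve (synthetic data for illustration)."""
    xs = [rng.uniform(0, 6) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0, 0.3)) for x in xs]


def knn_predict(train, x, k):
    """Predict y as the average of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, data, k):
    """Mean squared error of the k-nearest-neighbour model on `data`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(0)
train = make_data(30, rng)
test = make_data(200, rng)

for k in (1, 5, len(train)):
    print(k, round(mse(train, train, k), 3), round(mse(train, test, k), 3))
```

A training error far below the held-out error (as with `k=1`) is a sign of overfitting, while a high error on both sets (as with `k=len(train)`) points to underfitting.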