Applied Deep Learning with Keras

Introduction

In the previous chapter, we discussed some applications of machine learning and even built models with the scikit-learn Python package. In this chapter, we will continue learning how to build machine learning models and extend our knowledge to build an Artificial Neural Network (ANN) with the Keras package. (Remember that ANNs represent a large class of machine learning algorithms that are so called because their architecture resembles the neurons in the human brain.)

Keras is a machine learning library designed specifically for building neural networks. scikit-learn covers a broader range of machine learning algorithms, but its support for neural networks is minimal.

ANNs can be used for the same machine learning tasks as other algorithms that we have encountered, such as logistic regression for classification tasks, linear regression for regression problems, and k-means for clustering. Whenever we begin any machine learning problem, to determine what kind of task it is (regression, classification, or clustering), we need to ask the following questions:

  • What outcomes matter the most to me or my business? For example, if you are predicting the value of stock market indices, you could predict whether the price is higher or lower than the previous time point, which would be a classification task, or you could predict the value itself, which would be a regression problem. Each may lead to a different subsequent action or trading strategy. Figure 2.1 shows a candlestick chart; a classification task would predict a positive or negative change (whether the bar color is green or red), whereas a regression task aims to predict the value.
  • Do I have the appropriately labeled data to train a model? For a supervised learning task, we must have at least some labeled data in order to train a model. ANNs often need a lot of data to develop accurate models, so the amount of available data is a factor to consider when deciding which algorithm is appropriate for a given task.

    The following figure shows the trend of a stock price using a candlestick chart:

Figure 2.1: A candlestick chart indicating the movement of a stock index over the course of a month
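
The two framings of the stock index question above can be made concrete with a small sketch. The same price series yields either class labels (did the price rise?) or numeric targets (what is the next value?). The function names and the prices are made up for illustration, and treating an unchanged price as "not higher" is an assumption of this sketch.

```python
def regression_targets(prices):
    """Next-step values: the quantity a regression model would predict."""
    return prices[1:]


def classification_targets(prices):
    """1 if the price rose versus the previous time point, else 0.

    An unchanged price is labeled 0 here; that choice is an assumption.
    """
    return [1 if later > earlier else 0
            for earlier, later in zip(prices, prices[1:])]


# Hypothetical closing prices for five consecutive time points.
closing_prices = [100.0, 101.5, 101.0, 103.2, 103.2]

print(regression_targets(closing_prices))      # [101.5, 101.0, 103.2, 103.2]
print(classification_targets(closing_prices))  # [1, 0, 1, 0]
```

Which set of targets to train on depends on the action that follows from the prediction, as the first question above suggests.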

ANNs are one type of machine learning algorithm among many. They excel in certain respects and have drawbacks in others, and these pros and cons should be considered before choosing this type of algorithm. Deep learning networks are distinguished from single-layer ANNs by their depth, that is, the total number of hidden layers within the network.

So, deep learning is really just a specific subgroup of machine learning that relies on ANNs with multiple layers.
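
To illustrate what depth means here, the following is a minimal sketch of a fully connected network written with NumPy rather than Keras, so that the layer structure is explicit. The helper names (`init_network`, `forward`, `count_hidden_layers`), the layer sizes, and the ReLU activation are all assumptions chosen for illustration.

```python
import numpy as np


def init_network(layer_sizes, seed=0):
    """Create (weights, bias) pairs for consecutive layer sizes."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


def forward(params, x):
    """Apply ReLU after every hidden layer and no activation on the output."""
    activation = x
    for weights, bias in params[:-1]:
        activation = np.maximum(0.0, activation @ weights + bias)
    weights, bias = params[-1]
    return activation @ weights + bias


def count_hidden_layers(params):
    """Every weight matrix except the last one feeds a hidden layer."""
    return len(params) - 1


shallow = init_network([4, 8, 1])        # input -> 1 hidden layer -> output
deep = init_network([4, 8, 8, 8, 1])     # input -> 3 hidden layers -> output

x = np.ones((2, 4))                      # two made-up samples, four features
print(count_hidden_layers(shallow), count_hidden_layers(deep))
print(forward(deep, x).shape)
```

The only difference between the two networks is the number of hidden layers; the forward pass is identical.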

Advantages of ANNs over Traditional Machine Learning Algorithms

  • Best performance: For many supervised learning tasks, the best-performing models have been ANNs trained on large amounts of data. For example, in classification tasks such as classifying images from ImageNet, ANNs can attain greater accuracy than humans.
  • Scales effectively with data: Traditional machine learning models plateau in performance as more data is added, whereas ANN architectures are able to learn higher-level representations. This enables ANNs to perform better when provided with large amounts of data, and is especially the case for ANNs with deep architectures.

Figure 2.2: Performance scaling with the amount of data for both deep learning algorithms and traditional machine learning algorithms

  • No need for feature engineering: ANNs are able to identify which features are important in modeling, so they are able to model directly from raw data. In traditional machine learning algorithms, the features must be engineered in an iterative process that can be manual and time-consuming.
  • Adaptable and transferable: Weights and features learned by ANNs can be applied to similar tasks. In computer vision tasks, pre-trained classification models can be used as the starting point to build models for other classification tasks. For example, VGG-16 is a 16-layer deep learning model used for ImageNet to classify images into 1,000 object categories. The weights learned in the model can be transferred to classify other objects, given new training data, in significantly less time.
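
The idea of transferring learned weights, mentioned in the last point above, can be sketched without any pre-trained model. In this NumPy sketch a randomly generated layer stands in for weights learned on a large source task, it is kept frozen, and only a new logistic output layer is trained on a small made-up dataset. Every name, shape, and hyperparameter here is hypothetical; in practice the frozen part would be a real pre-trained network such as VGG-16.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for weights learned on a large source task (random here).
pretrained_W = rng.normal(size=(10, 16))
pretrained_b = np.zeros(16)


def extract_features(x):
    """Frozen feature extractor: reuses the stand-in weights unchanged."""
    return np.tanh(x @ pretrained_W + pretrained_b)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def log_loss(p, y):
    eps = 1e-12  # avoids log(0)
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))


def train_new_head(x, y, epochs=200, lr=0.1):
    """Fit only the new output layer with plain gradient descent."""
    feats = extract_features(x)
    w = np.zeros(feats.shape[1])
    b = 0.0
    losses = []
    for _ in range(epochs):
        p = sigmoid(feats @ w + b)
        losses.append(log_loss(p, y))
        grad = p - y
        w -= lr * (feats.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b, losses


# Small made-up target task: the label depends on the first input feature.
x_new = rng.normal(size=(64, 10))
y_new = (x_new[:, 0] > 0).astype(float)

frozen_copy = pretrained_W.copy()
head_w, head_b, losses = train_new_head(x_new, y_new)

print(np.array_equal(pretrained_W, frozen_copy))  # frozen layer is untouched
print(losses[0] > losses[-1])                     # new head has been fitted
```

Only 17 values (16 weights and one bias) are trained here, which hints at why reusing learned weights can take far less data and time than training a whole network.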

Advantages of Traditional Machine Learning Algorithms over ANNs

  • When the training data available is small: In order to attain high performance, ANNs require a lot of data, and the deeper the network, the more data is required. This is because ANNs need to learn the optimal values for a large number of parameters. For example, VGG-16, a model built for the ImageNet challenge, has over 138 million parameters. The ImageNet database contains over 14 million hand-labeled images, and the challenge subset used to train models such as VGG-16 contains more than a million of them.
  • Cost effective: Deep networks can take a lot of computing power and time to train, which makes them costly both financially and computationally. This demands resources that may not be available to all. Moreover, these models are time-consuming to tune effectively and require a domain expert who is familiar with the inner workings of the model to achieve optimal performance.
  • Easy to interpret: Many traditional machine learning models are easy to interpret, so identifying which feature had the most predictive power in the model is straightforward. This can be incredibly useful when working with non-technical team members who wish to understand and interpret the results of the model. ANNs are more of a black box, in that understanding the structure of the network or the values of the various weights does not provide insight into how the results are generated. As such, interpretation of the results requires more effort.
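
The figure of over 138 million parameters quoted above for VGG-16 can be checked with simple arithmetic. The sketch below assumes the standard VGG-16 layout (13 convolutional layers with 3x3 kernels followed by 3 fully connected layers, for 224x224 RGB inputs and 1,000 output classes); the layer list is not given in this chapter and is stated here as an assumption.

```python
# (in_channels, out_channels) for the 13 convolutional layers, all 3x3 kernels.
CONV_LAYERS = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]

# (inputs, outputs) for the 3 fully connected layers. 7 * 7 * 512 is the
# size of the flattened final feature map for a 224x224 input image.
DENSE_LAYERS = [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]


def conv_params(c_in, c_out, kernel=3):
    """Weights plus one bias per output channel."""
    return kernel * kernel * c_in * c_out + c_out


def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out


conv_total = sum(conv_params(c_in, c_out) for c_in, c_out in CONV_LAYERS)
dense_total = sum(dense_params(n_in, n_out) for n_in, n_out in DENSE_LAYERS)
total = conv_total + dense_total

print(f"{conv_total:,}")   # 14,714,688
print(f"{dense_total:,}")  # 123,642,856
print(f"{total:,}")        # 138,357,544
```

Most of the parameters sit in the fully connected layers, which is one reason why such a network needs so much labeled data to train from scratch.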

Hierarchical Data Representation

One reason that ANNs are able to perform so well is that the large number of layers enables the network to learn representations of the data at many different levels. This is illustrated in Figure 2.3, which shows the representations learned by an ANN used to identify faces. At lower levels of the model, simple features are learned, such as edges and gradients. As the model progresses, combinations of lower-level features activate to form face parts, and at later layers of the model, generic faces are learned. This is known as a feature hierarchy and illustrates the power that this layered representation has for model building and interpretation.

Many real-world applications of deep neural networks take images, video, or natural language text as input. The feature hierarchy learned by deep neural networks enables them to discover latent structure within such unlabeled, unstructured data. This makes them useful for processing real-world data, which is most often raw and unprocessed.

The following figure shows an example of the representations learned by a deep learning model:

Figure 2.3: Learned representation at various parts of a deep learning model

As deep neural networks become more accessible, their applications are being exploited by various companies. The following are some examples of companies that use ANNs:

  • Yelp: Yelp uses deep neural networks to process, classify, and label its images more efficiently. Since photos are an important aspect of Yelp reviews, the company has placed an emphasis on classifying and categorizing them. This is achieved more efficiently with deep neural networks.
  • Clarifai: This cloud-based company is able to classify images and videos using deep neural network-based models.
  • Enlitic: This company uses deep neural networks to analyze medical image data such as X-rays or MRIs. The use of such networks in this application increases diagnostic accuracy and decreases diagnostic time and cost.