R Machine Learning By Example

Algorithms in machine learning

So far, we have developed an abstract understanding of machine learning. We have seen the definition of machine learning, which states that a computer program learns a task T from data in the form of experience E if its performance P improves with that experience. We have also seen how machine learning differs from conventional programming paradigms: instead of coding each and every step, we let the program form an understanding of the problem space and help us solve it. It is rather remarkable to see such a program work right in front of us.

All along, while learning about the concept of machine learning, we have treated this computer program as a mysterious black box that learns and solves problems for us. Now is the time to unravel the enigma, look under the hood, and see these algorithms in their full glory.

We will begin with some of the most widely used algorithms in machine learning, looking at their intricacies, usage, and a bit of mathematics wherever necessary. Through this chapter, you will be introduced to different families of algorithms. The list is by no means exhaustive, and even though the algorithms will be explained in fair detail, a deep theoretical treatment of each of them is beyond the scope of this book; plenty of material is easily available in the form of books, online courses, blogs, and more.

Perceptron

This is the Hello World algorithm of the machine learning universe. It may be one of the easiest algorithms to understand and use, but it is by no means any less powerful.

Published in 1958 by Frank Rosenblatt, the perceptron algorithm gained much attention because it is guaranteed to find a separator for any linearly separable data set.

A perceptron is a function (a simplified neuron, to be precise) that takes a vector of real numbers as input and generates a real number as output.

Mathematically, a perceptron can be represented as:

y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)

where w1, …, wn are the weights, b is a constant termed the bias, x1, …, xn are the inputs, and y is the output of the function f, which is called the activation function.
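To make the formula concrete, the following is a minimal sketch of a single perceptron evaluation in R using a sign activation; the weights, bias, and input below are hypothetical values chosen purely for illustration:

#hypothetical weights, bias, and input (illustration only)
w_demo <- c(0.5, -0.3)  #weights w1, w2
b_demo <- 0.1           #bias b
x_demo <- c(2, 1)       #inputs x1, x2

#activation function f: the sign of the weighted sum
f <- function(z) ifelse(z >= 0, +1, -1)

#output y = f(w1*x1 + w2*x2 + b)
f(sum(w_demo * x_demo) + b_demo)  #0.5*2 - 0.3*1 + 0.1 = 0.8, so y is +1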

The algorithm is as follows:

  1. Initialize weight vector w and bias b to small random numbers.
  2. Calculate the output vector y based on the function f and vector x.
  3. Update the weight vector w and bias b to counter the error.
  4. Repeat steps 2 and 3 until there is no error or the error drops below a certain threshold.

The algorithm tries to find a separator that divides the input into two classes by using a labeled data set called the training data set (the training data set corresponds to the experience E stated in the definition of machine learning in the previous section). The algorithm starts by initializing the weight vector w and the bias b. It then processes the input with the function f and produces an output vector y. This generated output is compared with the correct output values from the training data set, and w and b are updated accordingly. To understand the weight update process, consider a point, say p1, with a correct output value of +1. Now, suppose the perceptron misclassifies p1 as -1. The algorithm then updates the weight vector w and the bias b to move the separator a small amount (the movement is restricted by the learning rate to prevent sudden jumps) in the direction of p1, in order to classify it correctly. The algorithm stops when it finds a correct separator or when the error in classifying the inputs drops below a certain user-defined threshold.
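The update itself is a single arithmetic step. The following is a rough sketch of one such update for a hypothetical misclassified point p1 whose correct label is +1; the values are illustrative, and the full training function shown later additionally scales the bias update by R^2:

learning_rate <- 1
w_demo <- c(0, 0)   #current weights (hypothetical)
b_demo <- 0         #current bias (hypothetical)
p1 <- c(0.4, 0.7)   #the misclassified point
label <- +1         #its correct class

#nudge the separator in the direction of p1, scaled by the learning rate
w_demo <- w_demo + learning_rate * label * p1
b_demo <- b_demo + learning_rate * label

w_demo  #now 0.4 0.7
b_demo  #now 1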

Now, let us see the algorithm in action with the help of a small example.

For the algorithm to work, we need a linearly separable data set. Let us assume the data is generated by the following function:

Based on the preceding equation, the correct separator will be given as:

Generating an input vector x using uniformly distributed data in R is done as follows:

#30 uniformly distributed random numbers between -1 and 1
x1 <- runif(30, -1, 1)
x2 <- runif(30, -1, 1)
#form the input x
x <- cbind(x1, x2)
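The labels (Y in the code that follows) come from the generating function shown earlier, which is not reproduced here. As a stand-in, the snippet below assumes the true separator is the line x2 = x1 and labels points above it as +1 and points below it as -1; this is purely an illustrative choice, and any linearly separable labeling would work:

#illustrative labels assuming the true separator is the line x2 = x1
Y <- ifelse(x2 > x1, +1, -1)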

Now that we have the data, we need a function to classify it into one of the two categories.

#helper function to calculate the signed distance of a point from the hyperplane
calculate_distance = function(x, w, b) {
  sum(x * w) + b
}

#linear classifier: assigns class -1 or +1 based on the sign of the distance
linear_classifier = function(x, w, b) {
  distances = apply(x, 1, calculate_distance, w, b)
  return(ifelse(distances < 0, -1, +1))
}

The helper function, calculate_distance, computes the signed distance of a point from the separating hyperplane, while linear_classifier uses it to classify each point as belonging to either class -1 or class +1.
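As a quick sanity check, the classifier can be applied to the generated data with an arbitrary, hand-picked weight vector and bias; the values below are hypothetical, not learned:

#classify every point using arbitrary weights and zero bias (illustration only)
table(linear_classifier(x, c(1, -1), 0))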

The perceptron algorithm then uses the preceding classifier function to find the correct separator using the training data set.

#function to calculate the second (Euclidean) norm of a vector
second_norm = function(x) {
  sqrt(sum(x * x))
}

#perceptron training algorithm
perceptron = function(x, y, learning_rate = 1) {

  w = rep(0, ncol(x)) # initialize weight vector w
  b = 0               # initialize bias b
  k = 0               # count the weight updates

  #R is the norm of the point farthest from the origin
  R = max(apply(x, 1, second_norm))

  incorrect = TRUE # flag indicating that some points are still misclassified

  #initialize the plot
  plot(x, cex = 0.2)

  #loop until a correct classifier is found
  while (incorrect) {

    incorrect = FALSE

    #classify with the current weights
    yc <- linear_classifier(x, w, b)

    #loop over each point in the input x
    for (i in 1:nrow(x)) {
      #update the weights if the point is not classified correctly
      if (y[i] != yc[i]) {
        w <- w + learning_rate * y[i] * x[i, ]
        b <- b + learning_rate * y[i] * R^2
        k <- k + 1

        #plot the current classifier's components after every 5 updates
        if (k %% 5 == 0) {
          intercept <- - b / w[[2]]
          slope <- - w[[1]] / w[[2]]
          #plot the classifier hyperplane
          abline(intercept, slope, col = "red")
          #wait for user input
          cat("Iteration # ", k, "\n")
          cat("Press [enter] to continue")
          line <- readline()
        }
        incorrect = TRUE
      }
    }
  }

  #scale the classifier to a unit weight vector
  s = second_norm(w)
  return(list(w = w / s, b = b / s, updates = k))
}

It's now time to train the perceptron!

#train the perceptron
p <- perceptron(x,Y)

The plot will look as follows:

The perceptron working its way to find the correct classifier. The correct classifier is shown in green

The preceding plots show the perceptron's training progress. Each intermediate, incorrect classifier is drawn as a red line. As shown, the perceptron stops once it finds a correct classifier, which is marked in green.
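Once training finishes, the list returned by perceptron holds the normalized weight vector, the normalized bias, and the number of updates made, which can be inspected directly:

p$w        #normalized weight vector of the separator
p$b        #normalized bias
p$updates  #number of weight updates performed during training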

A zoomed-in view of the final separator can be seen as follows:

#classify based on the weights and bias found by the perceptron
y <- linear_classifier(x, p$w, p$b)

plot(x, cex = 0.2)

#zoom into the points near the separator and color code them,
#marking data points with Y = +1 as + and the others as -
points(subset(x, Y == 1), col = "black", pch = "+", cex = 2)
points(subset(x, Y == -1), col = "red", pch = "-", cex = 2)

#the separator satisfies w1*x1 + w2*x2 + b = 0, that is,
#x2 = -(w1/w2)*x1 - b/w2, which gives the intercept and slope below

#compute the intercept of the separator on the vertical (x2) axis from w and b
intercept <- - p$b / p$w[[2]]

#compute the slope of the separator from w
slope <- - p$w[[1]] / p$w[[2]]

#draw the separating boundary
abline(intercept, slope, col = "green")

The plot looks as shown in the following figure:

The correct classifier as found by the perceptron