Machine Learning
Today’s Artificial Intelligence (AI) has far surpassed the hype of blockchain and
quantum computing. This is largely because huge computing resources are now easily
available to the average user. Developers take advantage of this to create new
Machine Learning models and to re-train existing models for better performance and
results. The easy availability of High Performance Computing (HPC) has led to a
sudden rise in demand for IT professionals with Machine Learning skills.
In this tutorial, you will learn in detail about −
What is the crux of machine learning?
What are the different types of machine learning?
What are the different algorithms available for developing machine learning
models?
What tools are available for developing these models?
What are the programming language choices?
What platforms support development and deployment of Machine Learning
applications?
What IDEs (Integrated Development Environment) are available?
How to quickly upgrade your skills in this important area?
When you tag a face in a Facebook photo, it is AI running behind the scenes that
identifies the faces in the picture. Face tagging is now omnipresent in applications
that display pictures with human faces. And why just human faces? There are several
applications that detect objects such as cats, dogs, bottles, cars, etc. We have
autonomous cars running on our roads that detect objects in real time to steer the car.
When you travel, you use Google Directions to check the real-time traffic situation and
follow the best path suggested by Google at that point in time. This is yet another
application of AI techniques working in real time.
Let us consider the example of the Google Translate application that we typically use
while visiting foreign countries. Google’s online translator app on your mobile helps you
communicate with local people who speak a language that is foreign to you.
There are several applications of AI that we use practically today. In fact, each one of
us uses AI in many parts of our lives, even without knowing it. Today’s AI can
perform extremely complex jobs with great accuracy and speed. Let us discuss an
example of a complex task to understand what capabilities are expected in an AI
application that you would be developing today for your clients.
Example
We all use Google Directions during our trips anywhere in the city, for a daily commute
or even for inter-city travel. The Google Directions application suggests the fastest path
to our destination at that instant. When we follow this path, we observe that Google's
suggestions are accurate almost every time, and we save valuable time on the trip.
You can imagine the complexity involved in developing this kind of application,
considering that there are multiple paths to your destination and the application has to
judge the traffic situation on every possible path to give you a travel time estimate for
each such path. Besides, consider the fact that Google Directions covers the entire
globe. Undoubtedly, lots of AI and Machine Learning techniques are in use under the
hood of such applications.
Considering the continuous demand for the development of such applications, you will
now appreciate why there is a sudden demand for IT professionals with AI skills.
Statistical Techniques
The development of today’s AI applications started with the use of age-old traditional
statistical techniques. You must have used straight-line interpolation in school to
predict a future value. Several other such statistical techniques have been
successfully applied in developing so-called AI programs. We say “so-called” because
the AI programs that we have today are much more complex and use techniques far
beyond the statistical techniques used by the early AI programs.
Some examples of statistical techniques that were used for developing early AI
applications, and are still in practice, are listed here −
Regression
Classification
Clustering
Probability Theory
Decision Trees
Here we have listed only some primary techniques that are enough to get you started
on AI, without overwhelming you with the vastness of the field. If you are developing AI
applications based on limited data, you would be using these statistical techniques.
However, today data is abundant. To analyze the kind of huge data that we
possess, statistical techniques are of not much help, as they have some limitations of
their own. More advanced methods, such as deep learning, have therefore been
developed to solve many complex problems.
Consider the following figure, which shows a plot of house prices versus house size in
sq. ft. After plotting various data points on the XY plane, we draw a best-fit line to make
predictions for any other house given its size. You will feed the known data to the
machine and ask it to find the best-fit line. Once the best-fit line is found by the
machine, you will test its suitability by feeding in a known house size, i.e. the X-value
on the above curve. The machine will now return the estimated Y-value, i.e. the
expected price of the house. The line can be extrapolated to find the price of a house
of 3000 sq. ft. or even larger. This is called regression in statistics. In particular,
this kind of regression is called linear regression, as the relationship between the X
and Y data points is linear.
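The idea above can be sketched in a few lines of Python. The house sizes and prices below are made-up illustrative values, not data from the text; the closed-form least-squares formula is a standard way of finding the best-fit line.

```python
# Least-squares fit of a straight line y = m*x + b, illustrating
# linear regression. The sample data below is hypothetical.

def fit_line(xs, ys):
    """Return slope m and intercept b of the best-fit line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares solution for the slope
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - m * mean_x
    return m, b

# Known data: house size in sq. ft. versus price (invented values)
sizes = [1000, 1500, 2000, 2500]
prices = [200000, 290000, 410000, 500000]

m, b = fit_line(sizes, prices)
predicted = m * 3000 + b   # extrapolate to a 3000 sq. ft. house
```

Once the line is fitted, extrapolating to any new house size is a single multiply-and-add.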
In many cases, the relationship between the X and Y data points may not be a straight
line; it may be a curve with a complex equation. Your task would then be to find the
best-fitting curve, which can be extrapolated to predict future values. One such
application plot is shown in the figure below.
You will use statistical optimization techniques to find the equation of the best-fit
curve here. And this is exactly what Machine Learning is about: you use known
optimization techniques to find the best solution to your problem.
Machine learning evolved from left to right as shown in the above diagram.
Initially, researchers started out with Supervised Learning. This is the case of
housing price prediction discussed earlier.
This was followed by unsupervised learning, where the machine is made to learn
on its own without any supervision.
Scientists discovered further that it may be a good idea to reward the machine
when it does the job the expected way and there came the Reinforcement
Learning.
Very soon, the data available became so humongous that the conventional
techniques developed so far failed to analyze this big data and provide us
with predictions.
Thus came deep learning, where the human brain is simulated in Artificial
Neural Networks (ANN) created on our binary computers.
The machine now learns on its own using the high computing power and huge
memory resources that are available today.
It is now observed that Deep Learning has solved many previously
unsolvable problems.
The technique has since been advanced further by giving incentives to Deep
Learning networks in the form of rewards, finally giving rise to Deep
Reinforcement Learning.
Let us now study each of these categories in more detail.
Supervised Learning
Supervised learning is analogous to training a child to walk. You will hold the child’s
hand, show him how to take his foot forward, walk yourself for a demonstration and so
on, until the child learns to walk on his own.
Regression
Similarly, in the case of supervised learning, you give concrete known examples to the
computer. You say that for given feature value x1 the output is y1, for x2 it is y2, for x3
it is y3, and so on. Based on this data, you let the computer figure out an empirical
relationship between x and y.
Once the machine has been trained in this way with a sufficient number of data points,
you can ask it to predict Y for a given X. Assuming that you know the real value of Y
for this given X, you will be able to deduce whether the machine’s prediction is
correct.
Thus, you will test whether the machine has learned by using the known test data.
Once you are satisfied that the machine is able to do the predictions with a desired
level of accuracy (say 80 to 90%) you can stop further training the machine.
Now, you can safely use the machine to do the predictions on unknown data points, or
ask the machine to predict Y for a given X for which you do not know the real value of
Y. This training comes under the regression that we talked about earlier.
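The train-then-test workflow described above can be sketched as follows. The data is synthetic (points near the line y = 2x + 1), and the 10% error threshold is an assumed acceptance criterion chosen for illustration.

```python
# Fit on known (x, y) pairs, then check prediction quality on
# held-out test points before trusting the model on unknown data.

def fit_line(xs, ys):
    """Least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    m = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return m, my - m * mx

# Training data roughly follows y = 2x + 1, with a little noise
train_x = [1, 2, 3, 4, 5]
train_y = [3.1, 4.9, 7.2, 9.0, 10.8]
m, b = fit_line(train_x, train_y)

# Held-out test points whose true y values we know
test_x, test_y = [6, 7], [13.0, 15.1]
predictions = [m * x + b for x in test_x]

# Accept the model if every prediction is within 10% of the truth
within_10pct = all(abs(p - t) / t < 0.10
                   for p, t in zip(predictions, test_y))
```

If the accuracy check fails, you would continue training with more data before deploying the model.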
Classification
You may also use machine learning techniques for classification problems. In
classification problems, you group objects of a similar nature into a single class. For
example, in a set of 100 students, you may like to group them into three
groups based on their height: short, medium and tall. Measuring the height of
each student, you will place each one in the proper group.
Now, when a new student arrives, you will put him in the appropriate group by
measuring his height. Following the principles of regression training, you will train the
machine to classify a student based on his feature, the height. When the machine
learns how the groups are formed, it will be able to classify any unknown new student
correctly. Once again, you would use the test data to verify that the machine has
learned your classification scheme before putting the developed model into
production.
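The height-based grouping above can be written as a simple threshold classifier. The group boundaries of 160 cm and 175 cm are assumed values chosen purely for illustration.

```python
# Classify students into short / medium / tall by height.
# The thresholds (160 cm, 175 cm) are invented for this example.

def classify_height(height_cm):
    """Place a student into one of three height groups."""
    if height_cm < 160:
        return "short"
    elif height_cm < 175:
        return "medium"
    return "tall"

# Classify three new students by their measured heights
groups = [classify_height(h) for h in [150, 168, 181]]
```

A real classifier would learn these boundaries from labelled training data rather than hard-code them.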
Supervised Learning is where AI really began its journey. This technique has been
applied successfully in several cases. You have used this model while doing
handwritten recognition on your machine. Several algorithms have been developed for
supervised learning. You will learn about them in the following chapters.
Unsupervised Learning
In unsupervised learning, we do not specify a target variable to the machine; rather,
we ask the machine, “What can you tell me about X?”. More specifically, we may ask
questions such as, given a huge data set X, “What are the five best groups we can
make out of X?” or “What features occur together most frequently in X?”. To arrive at
the answers to such questions, you can understand that the number of data points the
machine would require to deduce a strategy would be very large. In the case of
supervised learning, the machine can be trained with as few as a few thousand data
points. However, in the case of unsupervised learning, the number of data points
reasonably required for learning starts at a few million. These days, data is generally
abundantly available. Ideally, the data requires curating; however, given the amount of
data continuously flowing through a social media network, data curation is in most
cases an impossible task.
The following figure shows the boundary between the yellow and red dots as
determined by unsupervised machine learning. You can clearly see that the machine
would be able to determine the class of each of the black dots with fairly good
accuracy.
Unsupervised learning has shown great success in many modern AI
applications, such as face detection, object detection, and so on.
Reinforcement Learning
Consider training a pet dog: we train our pet to bring a ball to us. We throw the
ball a certain distance and ask the dog to fetch it back to us. Every time the dog
does this right, we reward the dog. Slowly, the dog learns that doing the job right
earns it a reward, and then the dog starts doing the job the right way every time.
Exactly this concept is applied in the “reinforcement” type of learning. The technique
was initially developed for machines to play games. The machine is given an algorithm
to analyze all possible moves at each stage of the game. The machine may select one
of the moves at random. If the move is right, the machine is rewarded; otherwise, it
may be penalized. Slowly, the machine starts differentiating between right and wrong
moves, and after several iterations it learns to solve the game puzzle with better
accuracy. The accuracy of winning improves as the machine plays more and more
games.
The entire process may be depicted in the following diagram –
This technique of machine learning differs from supervised learning in that you
need not supply labelled input/output pairs. The focus is on finding a balance
between exploring new solutions and exploiting the solutions learned so far.
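The reward-and-penalty loop described above can be sketched with a tiny value-learning example. The two moves and their rewards are invented for illustration; real reinforcement learning adds states, discounting, and more careful exploration schedules.

```python
import random

# The "machine" repeatedly picks one of two moves, receives a
# reward, and keeps a running value estimate for each move.
random.seed(0)

rewards = {"right_move": 1.0, "wrong_move": -1.0}
value = {"right_move": 0.0, "wrong_move": 0.0}
alpha = 0.1      # learning rate
epsilon = 0.2    # fraction of moves chosen at random (exploration)

for _ in range(500):
    if random.random() < epsilon:
        move = random.choice(list(value))      # explore
    else:
        move = max(value, key=value.get)       # exploit best estimate
    # Nudge the estimate toward the observed reward
    value[move] += alpha * (rewards[move] - value[move])

best = max(value, key=value.get)
```

After many iterations the value estimates separate, and the machine reliably prefers the rewarded move: exactly the explore/exploit balance mentioned above.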
Deep Learning
Deep learning is based on Artificial Neural Networks (ANN). Several architectures
are used in deep learning, such as deep neural networks, deep belief networks,
recurrent neural networks, and convolutional neural networks (CNN).
These networks have been successfully applied in solving problems of computer
vision, speech recognition, natural language processing, bioinformatics, drug design,
medical image analysis, and games. There are several other fields in which deep
learning is actively applied. Deep learning requires huge processing power and
humongous data, both of which are generally easy to come by these days.
We will talk about deep learning more in detail in the coming chapters.
Several algorithms are available for developing machine learning models. Some of
the widely used ones are −
k-Nearest Neighbours
Decision Trees
Naive Bayes
Logistic Regression
Support Vector Machines
As we move ahead in this chapter, let us discuss each of these algorithms in detail.
k-Nearest Neighbours
The k-Nearest Neighbours algorithm, simply called kNN, is a statistical technique that
can be used for solving classification and regression problems. Let us discuss the
case of classifying an unknown object using kNN. Consider the distribution of objects
as shown in the image given below −
The diagram shows three types of objects, marked in red, blue and green colors. When
you run the kNN classifier on the above dataset, the boundaries for each type of object will
be marked as shown below –
Now, consider a new unknown object that you want to classify as red, green or blue. This
is depicted in the figure below.
As you can see visually, the unknown data point belongs to the class of blue objects.
Mathematically, this can be concluded by measuring the distance of this unknown
point from every other point in the data set. When you do so, you will find that most of
its neighbours are blue. The average distance to the red and green objects would
definitely be greater than the average distance to the blue objects. Thus, this unknown
object can be classified as belonging to the blue class.
The kNN algorithm can also be used for regression problems. It is available
ready-to-use in most ML libraries.
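The voting procedure just described can also be implemented from scratch in a few lines. The 2-D points and their labels below are made-up sample data.

```python
import math
from collections import Counter

# Label an unknown point by majority vote among its k nearest
# neighbours, measured by Euclidean distance.

def knn_classify(points, labels, query, k=3):
    # Distance from the query to every known point
    dists = sorted(
        (math.dist(p, query), lab) for p, lab in zip(points, labels)
    )
    # Majority vote among the k closest neighbours
    votes = Counter(lab for _, lab in dists[:k])
    return votes.most_common(1)[0][0]

points = [(1, 1), (1, 2), (2, 1),     # "red" cluster
          (8, 8), (8, 9), (9, 8)]     # "blue" cluster
labels = ["red"] * 3 + ["blue"] * 3

predicted = knn_classify(points, labels, (8.5, 8.5), k=3)
```

Library versions (for example in scikit-learn) add efficient spatial indexing, but the core logic is this distance-then-vote step.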
Decision Trees
A simple decision tree in a flowchart format is shown below −
You would write code to classify your input data based on this flowchart. The
flowchart is self-explanatory and trivial. In this scenario, you are trying to classify an
incoming email to decide when to read it.
In reality, decision trees can be large and complex. Several algorithms are
available to create and traverse these trees. As a Machine Learning enthusiast, you
need to understand and master these techniques of creating and traversing decision
trees.
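A flowchart like the one described can be expressed directly as nested if/else checks. Since the figure itself is not reproduced here, the exact questions (spam, sender, urgency) are assumed for illustration.

```python
# A hand-written decision tree for deciding when to read an email.
# Each if-branch corresponds to one decision node of the flowchart.

def when_to_read(email):
    if email["spam"]:
        return "never"
    if email["from_boss"]:
        return "now"
    if email["urgent"]:
        return "today"
    return "when time permits"

decision = when_to_read(
    {"spam": False, "from_boss": False, "urgent": True}
)
```

Decision-tree learning algorithms (such as ID3 or CART) build these branching rules automatically from labelled data instead of hand-coding them.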
Naive Bayes
Naive Bayes is used for creating classifiers. Suppose you want to sort out (classify)
fruits of different kinds from a fruit basket. You may use features such as the color,
size and shape of a fruit. For example, any fruit that is red in color, round in shape
and about 10 cm in diameter may be considered an apple. So, to train the model, you
would use these features and test the probability that a given feature matches the
desired constraints. The probabilities of the different features are then combined to
arrive at the probability that a given fruit is an apple. Naive Bayes generally requires
only a small amount of training data for classification.
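The combine-the-feature-probabilities idea can be worked through by hand. All probabilities below are invented for illustration; a real model would estimate them from training data.

```python
# Naive Bayes scoring for the apple example: multiply per-feature
# probabilities under the "naive" assumption that features are
# independent given the class. All numbers are made up.

p_given_apple = {"red": 0.8, "round": 0.9, "about_10cm": 0.7}
p_given_other = {"red": 0.2, "round": 0.5, "about_10cm": 0.3}
p_apple, p_other = 0.5, 0.5          # prior probabilities

features = ["red", "round", "about_10cm"]

score_apple, score_other = p_apple, p_other
for f in features:
    score_apple *= p_given_apple[f]
    score_other *= p_given_other[f]

# Normalize the two scores to get P(apple | observed features)
prob_apple = score_apple / (score_apple + score_other)
```

Because each feature contributes a simple multiplicative factor, the model needs only per-feature probability estimates, which is why it works with relatively little training data.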
Logistic Regression
Look at the following diagram. It shows the distribution of data points in the XY plane.
From the diagram, we can visually inspect the separation of the red dots from the
green dots. You may draw a boundary line to separate these dots. Now, to classify a
new data point, you just need to determine on which side of the line the point lies.
Logistic regression learns such a boundary from the data, using the logistic (sigmoid)
function to convert a point's signed distance from the line into a probability of class
membership.
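The classify-by-line-side step can be sketched as follows. The boundary line x + y = 10 is an assumed example; in practice, logistic regression would learn the line's coefficients from the data.

```python
# Classify a point by which side of the boundary line it lies on.
# The line a*x + b*y + c = 0 separates the plane in two; the sign
# of a*x + b*y + c tells us the side.

def side_of_line(point, a=1.0, b=1.0, c=-10.0):
    """Return 'green' above the line x + y = 10, 'red' below it."""
    x, y = point
    return "green" if a * x + b * y + c > 0 else "red"

labels = [side_of_line(p) for p in [(2, 3), (8, 7)]]
```

Full logistic regression additionally passes the quantity a*x + b*y + c through the sigmoid function, so each point receives a probability rather than a hard label.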
Support Vector Machines
Look at the following distribution of data. Here the three classes of data cannot be
linearly separated. The boundary curves are non-linear, and in such a case, finding
the equation of the boundary becomes a complex job.
Support Vector Machines (SVM) come in handy for determining the separation
boundaries in such situations.
k-means clustering
The 2000 and 2004 Presidential elections in the United States were close — very
close. The largest percentage of the popular vote that any candidate received was
50.7% and the lowest was 47.9%. If a percentage of the voters were to have switched
sides, the outcome of the election would have been different. There are small groups of
voters who, when properly appealed to, will switch sides. These groups may not be
huge, but with such close races, they may be big enough to change the outcome of the
election. How do you find these groups of people? How do you appeal to them with a
limited budget? The answer is clustering.
Let us understand how it is done.
First, you collect information on people either with or without their consent: any sort of
information that might give some clue about what is important to them and what will
influence how they vote.
Then you put this information into some sort of clustering algorithm.
Next, for each cluster (it would be smart to choose the largest one first) you craft a message
that will appeal to these voters.
Finally, you deliver the campaign and measure to see if it’s working.
Clustering is a type of unsupervised learning that automatically forms clusters of similar
things. It is like automatic classification. You can cluster almost anything, and the more
similar the items are in the cluster, the better the clusters are. In this chapter, we are
going to study one type of clustering algorithm called k-means. It is called k-means
because it finds ‘k’ unique clusters, and the center of each cluster is the mean of the
values in that cluster.
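The k-means procedure just described (pick k starting centers, assign each point to its nearest center, recompute each center as the mean of its points, and repeat) can be sketched in plain Python. One-dimensional data keeps the example short; the data points are made up, and real implementations add smarter initialization and a convergence test.

```python
import random

# A compact 1-D k-means: alternate between assigning points to
# their nearest center and recomputing each center as the mean
# of its assigned points.

def kmeans_1d(points, k, iters=20, seed=0):
    random.seed(seed)
    centers = random.sample(points, k)
    for _ in range(iters):
        # Assignment step: nearest center for every point
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: each center becomes the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

data = [1.0, 1.2, 0.8, 9.0, 9.4, 8.6]   # two obvious groups
centers = kmeans_1d(data, k=2)
```

On this toy data the two centers settle at the means of the two groups, illustrating why the algorithm is named after the k cluster means.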
Cluster Identification
Cluster identification tells an algorithm, “Here’s some data. Now group similar things
together and tell me about those groups.” The key difference from classification is that
in classification you know what you are looking for, while in clustering you do not.
Clustering is sometimes called unsupervised classification because it produces the
same result as classification does but without having predefined classes.
Language Choice
Here is a list of languages that support ML development −
Python
R
Matlab
Octave
Julia
C++
C
This list is not necessarily comprehensive; however, it covers many popular languages
used in machine learning development. Depending on your comfort level, select a
language for development, develop your models and test them.
IDEs
Here is a list of IDEs which support ML development −
R Studio
PyCharm
IPython/Jupyter Notebook
Julia
Spyder
Anaconda
Rodeo
Google Colab
The above list is not necessarily comprehensive. Each IDE has its own merits and
demerits. The reader is encouraged to try out these different IDEs before settling
on a single one.
Platforms
Here is a list of platforms on which ML applications can be deployed −
IBM
Microsoft Azure
Google Cloud
Amazon
MLflow