UNIT I-Machine Learning
Course Outcome
1. Recognize the characteristics of machine learning that make it useful
to real-world problems.
2. Use regularized regression and classification algorithms.
3. Evaluate machine learning algorithms and model selection.
4. Understand scalable machine learning and machine learning for IoT.
5. Understand deep learning and expert systems.
Introduction to Machine Learning
What is Machine Learning?
"Learning is any process by which a system improves performance from
experience." - Herbert Simon
Definition by Tom Mitchell (1998):
Machine Learning is the study of algorithms that
• improve their performance P
• at some task T
• with experience E.
A well-defined learning task is given by <P,T,E>.
Machine learning is a set of tools that allow us to "teach" computers
how to perform tasks by providing examples of how they should be done.
Introduction to Machine Learning
• In traditional programming, a programmer codes all the rules in consultation with an expert
in the domain for which the software is being developed. Each rule is based on a logical
foundation, and the machine executes an output following the logical statements. As the
system grows more complex, more rules need to be written, and maintenance can quickly
become unsustainable.
• In machine learning, the machine learns how the input and output data are correlated
and writes the rules itself. The programmers do not need to write new rules each time
there is new data; the algorithms adapt in response to new data and experience to
improve efficacy over time.
Machine learning is a diverse and exciting field,
and there are multiple ways of defining it:
The Artificial Intelligence View: Learning is central to human knowledge and intelligence,
and, likewise, it is also essential for building intelligent machines. Years of effort in AI has shown that
trying to build intelligent computers by programming all the rules cannot be done; automatic learning is
crucial. For example, we humans are not born with the ability to understand language — we learn it — and
it makes sense to try to have computers learn language instead of trying to program it all.
The Software Engineering View: Machine learning allows us to program computers by
example, which can be easier than writing code the traditional way.
The Statistics View: Machine learning is the marriage of computer science and statistics:
computational techniques are applied to statistical problems. Machine learning has been applied to a vast
number of problems in many contexts, beyond the typical statistics problems. Machine learning is often
designed with different considerations than statistics (e.g., speed is often more important than accuracy).
History of Machine Learning
• It was in the 1940s when the first manually operated computer system, ENIAC
(Electronic Numerical Integrator and Computer), was invented.
• At that time the word "computer" was being used as a name for a human with
intensive numerical computation capabilities, so ENIAC was called a numerical
computing machine!
• Well, you may say it has nothing to do with learning?! WRONG, from the
beginning the idea was to build a machine able to emulate human thinking and
learning.
History of Machine Learning
• In the 1950s, we see the first computer game program claiming to be able to beat the checkers
world champion. This program helped checkers players a lot in improving their skills!
• Around the same time, Frank Rosenblatt invented the Perceptron which was a very, very simple
classifier but when it was combined in large numbers, in a network, it became a powerful monster.
• Thanks to statistics, machine learning became very famous in the 1990s. The intersection of
computer science and statistics gave birth to probabilistic approaches in AI. This shifted the field
further toward data-driven approaches. Having large-scale data available, scientists started to build
intelligent systems that were able to analyze and learn from large amounts of data.
Need of Machine Learning
• Machine learning is a field that grew out of artificial intelligence (AI). Through the
application of AI, humans built better, more intelligent machines, but they were
unable to program for more complex and constantly evolving challenges.
• Then came the realization that the only way to achieve more advanced tasks was
to let the machine learn from its own input.
• So machine learning was developed as a new capability for computers, and is now
present in so many segments of technology we may not even realize it's there.
Need of Machine Learning
• Some data sets are so massive that the human brain needs help finding patterns, and this is
where machine learning swings into action.
• If big data and cloud computing are gaining importance for their contributions, machine
learning also deserves recognition for helping data scientists analyze large chunks of data
via an automated process that saves time and effort.
• The techniques used for data mining have been around for years, but they're not effective
without the power to run algorithms. When you run deep learning with access to better
data, the output leads to dramatic advances, which is why there's a need for machine
learning.
Why Machine Learning is so important?
Working Process of Machine Learning
• The raw data is divided into two parts, i.e., training data and testing data.
Machine Learning Applications
Supervised Learning
• Supervised learning uses the data patterns to predict the values of
additional data for the labels.
• The algorithm must figure out what it is being shown; the purpose
is to explore the data and find some structure within.
Unsupervised Learning
• In unsupervised learning the data is unlabeled: raw information is fed directly to
the algorithm without pre-processing of the data and without knowing the desired
output.
• The algorithm makes sense of the data and, according to the data segments, forms
clusters of data with new labels.
• This learning technique works well on transactional data.
• These algorithms are also used to segment text topics, recommend items and
identify data outliers.
Types of Unsupervised learning
• Clustering: A clustering technique finds similarities among data points,
groups similar data points together, and figures out which cluster new
data should belong to.
Reinforcement Learning
• Reinforcement learning is a type of dynamic programming that trains algorithms
using a system of reward and punishment.
• The objective is for the agent to take actions that maximise the expected reward over a
given measure of time. The agent reaches the goal much quicker by following a good
policy, so the purpose of reinforcement learning is to learn the best policy.
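As a toy illustration of learning a policy from reward, the sketch below runs tabular Q-learning (one standard reinforcement learning method, chosen here for illustration) on a hypothetical 5-state corridor in which only the rightmost state pays a reward:

```python
import random

# Toy environment: states 0..4 in a corridor, actions 0 (left) / 1 (right);
# reward +1 only for reaching state 4. The agent learns a policy
# (state -> action) that maximises expected reward.
N_STATES, ACTIONS = 5, (0, 1)
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

def step(state, action):
    """Move left/right along the corridor; reward 1 at the goal."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)

random.seed(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]
for episode in range(200):
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy: mostly exploit the best-known action, sometimes explore
        a = random.choice(ACTIONS) if random.random() < EPSILON else Q[s].index(max(Q[s]))
        s2, r = step(s, a)
        # Q-learning update: move Q[s][a] toward r + gamma * max Q[s2]
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2

policy = [q.index(max(q)) for q in Q]
print(policy)  # learned action per state
```

After training, the learned policy is to move right in every state before the goal, i.e. the agent has learned the best policy for this environment.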
Distance Based Machine Learning Methods
• The data points that are closest to the old data points have the largest
influence on the classification assigned to those points.
• Among distance-based machine learning algorithms, the K-Nearest Neighbor
algorithm is the simplest algorithm used for classification.
• Similarity measure
• Euclidean space
• Euclidean distance
• Manhattan distance
• Minkowski distance
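The distance measures listed above can be sketched in a few lines of Python, representing points as plain coordinate tuples:

```python
import math

def euclidean(p, q):
    # straight-line distance in Euclidean space
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    # sum of absolute coordinate differences ("city block" distance)
    return sum(abs(a - b) for a, b in zip(p, q))

def minkowski(p, q, r):
    # generalisation: r=1 gives Manhattan, r=2 gives Euclidean
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1 / r)

p, q = (1, 2), (4, 6)
print(euclidean(p, q))     # 5.0
print(manhattan(p, q))     # 7
print(minkowski(p, q, 2))  # 5.0
```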
K-Nearest Neighbor (KNN) Algorithm
• K-Nearest Neighbor is a type of Supervised Machine Learning algorithm.
• K-NN algorithm can be used for both Regression as well as for Classification, however
it is mainly used for the Classification problems.
• K-NN algorithm uses 'feature similarity' to compare new data points with the available
data points, and puts the new data points into the category that is most similar to the
available categories.
• K-NN algorithm stores all the available data and classifies a new data point based on
similarity. This means that when new data appears, it can be easily classified into a
well-suited category using the K-NN algorithm.
K-Nearest Neighbor (KNN) Algorithm
• The following two properties define KNN well:
• Non-parametric learning algorithm − KNN is a non-parametric learning
algorithm because it does not assume anything about the underlying data.
• Lazy learning algorithm − KNN is a lazy learning algorithm because it has
no specialized training phase and uses all the data for training during
classification.
• At the training phase the KNN algorithm just stores the dataset; when it gets
new data, it classifies that data into the category most similar to the
new data.
K-Nearest Neighbor (KNN) Algorithm
With the help of K-NN, we can easily identify the category or class of a particular data point.
Consider the below diagram:
K-Nearest Neighbor (KNN) Algorithm
• The working of the K-NN algorithm:
• Step-1: Select the number K of neighbors.
• Step-2: Calculate the Euclidean distance from the new data point to the available data points.
• Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
• Step-4: Among these K neighbors, count the number of data points in each category.
• Step-5: Assign the new data point to the category for which the number of neighbors
is maximum.
• For example, if the 3 nearest neighbors of a new data point are all from category A,
the new data point must belong to category A.
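The steps above can be sketched as a minimal K-NN classifier; the small 2-D dataset with categories "A" and "B" below is made up purely for illustration:

```python
import math
from collections import Counter

# Made-up training data: (point, category) pairs in two well-separated groups.
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B"),
]

def knn_classify(x, train, k=3):
    # Step-2: Euclidean distance from x to every stored data point
    dists = [(math.dist(x, p), label) for p, label in train]
    # Step-3: take the K nearest neighbors
    nearest = sorted(dists)[:k]
    # Step-4/5: count categories among them and pick the majority
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_classify((1.8, 1.8), train))  # "A": its 3 nearest neighbors are all A
print(knn_classify((6.2, 6.4), train))  # "B"
```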
How to select the value of K in the K-NN Algorithm?
• Below are some points to remember while selecting the value of K in the K-NN
algorithm:
• There is no particular way to determine the best value for "K", so we need to try some
values to find the best one. The most commonly preferred value for K is 5.
• A very low value for K, such as K=1 or K=2, can be noisy and expose the model to the
effects of outliers.
• Large values for K smooth out noise, but can make the boundaries between categories
less distinct.
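The effect of K can be checked empirically. The sketch below uses made-up data with one deliberately noisy "A" point placed inside the "B" cluster, and compares candidate values of K by leave-one-out accuracy:

```python
import math
from collections import Counter

data = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.3), "A"),
    ((6.1, 5.9), "A"),  # noisy/outlier point sitting inside the B cluster
    ((6.0, 6.0), "B"), ((6.3, 5.8), "B"), ((5.7, 6.2), "B"), ((5.9, 6.3), "B"),
]

def predict(x, train, k):
    # majority vote among the k nearest stored points
    nearest = sorted((math.dist(x, p), lbl) for p, lbl in train)[:k]
    return Counter(lbl for _, lbl in nearest).most_common(1)[0][0]

def loo_accuracy(data, k):
    # leave-one-out: classify each point using all the other points
    hits = sum(predict(p, data[:i] + data[i + 1:], k) == lbl
               for i, (p, lbl) in enumerate(data))
    return hits / len(data)

for k in (1, 3, 5):
    print(k, loo_accuracy(data, k))
# With K=1 the outlier corrupts its neighbours; K=3 smooths it out.
```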
Advantages and Disadvantages of KNN Algorithm
• Advantages :
• It is simple to implement.
• It is robust to noisy training data.
• It can be more effective if the training data is large.
• Disadvantages:
• The value of K always needs to be determined, which may be complex at times.
• The computation cost is high because the distance to every training sample
must be calculated.
Introduction to Clustering Techniques
• Clustering is basically a type of unsupervised learning method.
• It groups the unlabeled dataset by finding similar patterns, such as shape, size,
color, or behavior, and divides the data according to the presence or absence of
those patterns.
• Data mining: Cluster analysis serves as a tool to gain insight into the
distribution of data and to observe the characteristics of each cluster.
Requirements of Clustering Algorithm
• Discovery of clusters with arbitrary shape − The clustering algorithm should be capable of
detecting clusters of arbitrary shape. It should not be bounded to distance measures that
tend to find spherical clusters of small size.
• Insensitivity to the order of input records − Some clustering algorithms cannot incorporate
newly inserted data, and some are sensitive to the order of the input data. It is important to
develop algorithms that are insensitive to input order.
• High dimensionality − The clustering algorithm should not only be able to handle
low-dimensional data but also the high dimensional space.
• Ability to deal with noisy data − Databases contain noisy, missing or incorrect
data. Some algorithms are sensitive to such data and may lead to poor quality
clusters.
• Dealing with a large number of dimensions and a large number of data items can be
problematic because of time complexity;
• The effectiveness of the method depends on the definition of "distance" (for distance-
based clustering);
• If an obvious distance measure doesn't exist we must "define" it, which is not always easy,
especially in multi-dimensional spaces;
• The result of the clustering algorithm (which in many cases can be arbitrary itself) can be
interpreted in different ways.
Types of Clustering Methods
• Clustering methods are used to identify groups of similar objects in multivariate data
sets collected from fields such as marketing, bio-medicine, and geo-spatial analysis.
Hierarchical methods
• A hierarchical method can be classified as agglomerative or divisive, based on how the
hierarchical decomposition is formed.
• The agglomerative approach, also called the bottom-up approach, starts with each object
forming a separate group. It successively merges the objects or groups close to one another,
until all the groups are merged into one (the topmost level of the hierarchy) or a termination
condition holds.
• The divisive approach, also called the top-down approach, starts with all the objects in the
same cluster. In each successive iteration, a cluster is split into smaller clusters, until
eventually each object is in its own cluster or a termination condition holds.
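The agglomerative (bottom-up) procedure can be sketched as follows. Single linkage (distance between the closest pair of points across two groups) is one possible choice of group distance, and a target number of clusters is used here as the termination condition; both are illustrative choices, and the data is made up:

```python
import math

points = [(1.0, 1.0), (1.2, 1.1), (0.9, 0.8), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]

def single_linkage(c1, c2):
    # distance between two clusters = closest pair of points across them
    return min(math.dist(p, q) for p in c1 for q in c2)

def agglomerative(points, n_clusters):
    clusters = [[p] for p in points]       # each object starts as its own group
    while len(clusters) > n_clusters:      # termination condition
        # find the pair of clusters closest to one another
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] += clusters.pop(j)     # merge them into one group
    return clusters

for c in agglomerative(points, 2):
    print(sorted(c))
```

Running this on the six points above merges them into the two obvious groups near (1, 1) and (8, 8).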
• Algorithms under Hierarchical method:
• BIRCH(Balanced Iterative Reducing and Clustering using Hierarchies): is
designed for clustering a large amount of numeric data by integrating hierarchical and
other clustering methods such as iterative partitioning.
Density-based methods
• These methods consider clusters as dense regions having some similarity,
different from the lower-density regions of the space.
• These methods have good accuracy and the ability to merge two clusters.
• Algorithms under Density-based method:
• DBSCAN(Density-Based Spatial Clustering of Applications with Noise):
DBSCAN grows clusters according to a density-based connectivity analysis.
• OPTICS (Ordering Points To Identify Clustering Structure): OPTICS
extends DBSCAN to produce a cluster ordering obtained from a wide range of
parameter settings.
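DBSCAN's density-based connectivity analysis can be sketched compactly: core points (with at least `min_pts` neighbours within radius `eps`) grow clusters by absorbing density-reachable points, while isolated points are left as noise. The `eps` and `min_pts` values and the data below are arbitrary choices for this toy example, not canonical defaults:

```python
import math

def dbscan(points, eps=1.0, min_pts=3):
    labels = {}                   # point index -> cluster id (-1 = noise)

    def neighbours(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if i in labels:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1        # tentatively noise (may join a cluster later)
            continue
        labels[i] = cluster       # i is a core point: start a new cluster
        queue = list(nbrs)
        while queue:              # density-based connectivity expansion
            j = queue.pop()
            if labels.get(j, -1) == -1:
                labels[j] = cluster
                jn = neighbours(j)
                if len(jn) >= min_pts:
                    queue.extend(jn)   # j is also core: keep growing
        cluster += 1
    return labels

pts = [(0, 0), (0.5, 0), (0, 0.5), (0.4, 0.4),
       (5, 5), (5.5, 5), (5, 5.5), (9, 0)]
print(dbscan(pts))  # two dense clusters; the lone point (9, 0) stays noise
```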
Grid-based methods
• These methods quantize the object space into a finite number of cells that form a
grid structure, on which all of the operations for clustering are performed.
• The main advantage of this approach is its fast processing time, which is
typically independent of the number of data objects and dependent only on the
number of cells in each dimension of the quantized space.
• Algorithms under Grid-based method:
• STING (Statistical Information Grid): The algorithm can be used on spatial
queries. The spatial area is divided into rectangle cells, which are represented
by a hierarchical structure. Statistical information regarding the attributes in
each grid cell is pre-computed and stored.
• Subspace clustering methods search for clusters in subspaces of the full data space, typically based on
distance or density. For example, the CLIQUE algorithm is a subspace clustering algorithm.
• Top-down approaches start from the full space and search smaller and smaller subspaces recursively.
Top-down approaches are effective only if the subspace of a cluster can be determined by the local
neighborhood.
Graph based methods
• To find clusters in a graph, visualize cutting the graph into pieces, each piece being a cluster, such that the vertices
within a cluster are well connected and the vertices in different clusters are connected in a much weaker way.
• The size of the cut is the number of edges in the cut set. For weighted graphs, the size of a cut is the sum of the weights
of the edges in the cut set.
• In graph theory and some network applications, a minimum cut is of importance. A cut is minimum if the cut's size is
not greater than any other cut's size. There are polynomial time algorithms to compute minimum cuts of graphs.
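The notion of cut size can be computed directly. The graph and edge weights below are made up: two dense triangles joined by one weak edge, so the cut separating them is small, matching the intuition that a good cluster boundary cuts few or light edges:

```python
# Weighted undirected graph: edge -> weight.
edges = {
    ("a", "b"): 3, ("b", "c"): 2, ("a", "c"): 2,  # dense cluster {a, b, c}
    ("d", "e"): 3, ("e", "f"): 2, ("d", "f"): 2,  # dense cluster {d, e, f}
    ("c", "d"): 1,                                # weak link between them
}

def cut_size(edges, side_a):
    """Sum of weights of edges with one endpoint in side_a and one outside."""
    return sum(w for (u, v), w in edges.items() if (u in side_a) != (v in side_a))

print(cut_size(edges, {"a", "b", "c"}))  # only the weak c-d edge crosses: 1
print(cut_size(edges, {"a", "b"}))       # a-c and b-c cross: 2 + 2 = 4
```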
• There are two kinds of methods for clustering graph data, which address some challenges such as High computational
cost, High dimensionality, Sparsity, etc. One uses clustering methods for high-dimensional data, while the other is