UNIT I-Machine Learning

Copyright
© All Rights Reserved

Machine Learning

Course Outcome
1. Recognize the characteristics of machine learning that make it useful
to real-world problems.
2. Use regularized regression and classification algorithms.
3. Evaluate machine learning algorithms and perform model selection.
4. Understand scalable machine learning and machine learning for IoT.
5. Understand deep learning and expert systems.
Introduction to Machine Learning
What is Machine Learning?
“Learning is any process by which a system improves performance from
experience.” - Herbert Simon
Definition by Tom Mitchell (1998):
Machine Learning is the study of algorithms that
• improve their performance P
• at some task T
• with experience E.
A well-defined learning task is given by <P,T,E>.
Machine learning is a set of tools that allow us to “teach” computers
how to perform tasks by providing examples of how they should be done.
Introduction to Machine Learning
• In traditional programming, a programmer codes all the rules, working with an expert in the domain for which
the software is being developed. Each rule is based on a logical foundation, and the machine
produces an output by following the logical statements. As the system grows more
complex, more rules need to be written, and it can quickly become unsustainable to
maintain.
• In machine learning, the machine learns how the input and output data are correlated and derives the rule itself.
Programmers do not need to write new rules each time there is new data: the
algorithms adapt in response to new data and experience to improve their performance
over time.
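To make the contrast concrete, here is a minimal Python sketch. The data (hours studied versus pass/fail) and all function names are hypothetical; the "learning" is a simple decision stump that picks the threshold with the fewest training errors, chosen only as an illustration of a rule derived from examples rather than written by hand.

```python
def hand_coded_rule(hours):
    # Traditional programming: the threshold is written by hand.
    return 1 if hours >= 5.0 else 0


def learn_threshold(xs, ys):
    """Learn the rule 'predict 1 if x >= t' from labeled examples by trying
    every observed value as t and keeping the one with the fewest errors."""
    best_t, best_errors = None, len(xs) + 1
    for t in sorted(set(xs)):
        errors = sum((1 if x >= t else 0) != y for x, y in zip(xs, ys))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


# Hypothetical examples: hours studied -> passed (1) or failed (0).
hours = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0]
passed = [0, 0, 0, 1, 1, 1, 1]

t = learn_threshold(hours, passed)


def learned_rule(h):
    # Machine learning: the threshold t came from the data.
    return 1 if h >= t else 0
```

When new examples arrive, rerunning `learn_threshold` updates the rule, whereas `hand_coded_rule` (which still predicts that 4.5 hours fails) has to be edited by a programmer.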
Machine learning is a diverse and exciting field,
and there are multiple ways of defining it:
The Artificial Intelligence View: Learning is central to human knowledge and intelligence,
and, likewise, it is also essential for building intelligent machines. Years of effort in AI have shown that
trying to build intelligent computers by programming all the rules cannot be done; automatic learning is
crucial. For example, we humans are not born with the ability to understand language (we learn it), and
it makes sense to try to have computers learn language instead of trying to program it all in.
The Software Engineering View: Machine learning allows us to program computers by
example, which can be easier than writing code the traditional way.
The Statistics View: Machine learning is the marriage of computer science and statistics:
computational techniques are applied to statistical problems. Machine learning has been applied to a vast
number of problems in many contexts, beyond the typical statistics problems. Machine learning is often
designed with different considerations than statistics (e.g., speed is often more important than accuracy).
History of Machine Learning
• It was in the 1940s that the first manually operated computer system, ENIAC
(Electronic Numerical Integrator and Computer), was invented.
• At that time, the word “computer” was used as a name for a human with
intensive numerical computation capabilities, so ENIAC was called a numerical
computing machine!
• You may say this has nothing to do with learning. Wrong: from the
beginning, the idea was to build a machine able to emulate human thinking and
learning.
History of Machine Learning
• In the 1950s came the first computer game program claimed to be able to beat the checkers
world champion. This program greatly helped checkers players improve their skills!

• Around the same time, Frank Rosenblatt invented the Perceptron, a very simple
classifier; but when many perceptrons were combined in a network, they became a powerful monster.

• Thanks to statistics, machine learning became widely popular in the 1990s. The intersection of
computer science and statistics gave birth to probabilistic approaches in AI. This shifted the field
further toward data-driven approaches. Having large-scale data available, scientists started to build
intelligent systems that were able to analyze and learn from large amounts of data.
Need of Machine Learning
• Machine learning is a field that grew out of artificial intelligence (AI). Through AI,
humans set out to build better and more intelligent machines, but they were
unable to program solutions to more complex and constantly evolving challenges.

• Then came the realization that the only way to achieve more advanced tasks was
to let the machine learn from its own input.

• So machine learning was developed as a new capability for computers, and is now
present in so many segments of technology we may not even realize it's there.
Need of Machine Learning
• Some data sets are so massive that the human brain needs help finding patterns, and this is
where machine learning swings into action.

• If big data and cloud computing are gaining importance for their contributions, machine
learning also deserves recognition for helping data scientists analyze large chunks of data
via an automated process that saves time and effort.

• The techniques used for data mining have been around for years, but they're not effective
without the power to run algorithms. When you run deep learning with access to better
data, the output leads to dramatic advances, which is why there's a need for machine
learning.
Why is Machine Learning so important?
Working Process of Machine Learning
Machine Learning Applications

• There are many uses of Machine Learning in various fields. These
fields have different applications of Supervised, Unsupervised and
Reinforcement learning.
Types of Machine Learning

Machine learning can be classified into 3 types of algorithms:
▪ Supervised Learning
▪ Unsupervised Learning
▪ Reinforcement Learning
Supervised Learning
• In supervised learning, algorithms are trained using labeled data, where
the input and the output are known.
• The data is fed into the learning algorithm as a set of inputs, along with the
corresponding outputs, and the algorithm learns by comparing its actual
output with the correct outputs to find errors. It then modifies the model
accordingly.

• The raw data is divided into two parts, i.e. training data and testing data.
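The split into training and testing data can be sketched in a few lines of Python. This is a minimal illustration under assumptions: `split_data` is a hypothetical helper (not scikit-learn's `train_test_split`), the 25% test fraction is an arbitrary choice, and the labeled pairs are made up.

```python
import random


def split_data(pairs, test_fraction=0.25, seed=0):
    """Shuffle labeled (input, output) pairs and split them into
    a training set and a testing set."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # shuffle so both sets are representative
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training, testing)


# Hypothetical labeled data: each input x comes with its known output 2x + 1.
data = [(x, 2 * x + 1) for x in range(20)]
train, test = split_data(data)
```

The model is fitted on `train` only; `test` is held back to measure how well it predicts outputs it has not seen.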
Supervised Learning
• Supervised learning uses patterns in the labeled data to predict the labels of
additional, unseen data.

• This method is commonly used in applications where historical data is used to
predict likely future events.
• Example: it can anticipate when transactions are likely to be fraudulent or
which insurance customer is expected to file a claim.
Supervised Learning Algorithm
Types of Supervised learning
• Classification: A classification problem is when the output variable is
a category, such as “red” or “blue”, or “disease” and “no disease”.

• Regression: A regression problem is when the output variable is a real
value, such as “dollars” or “weight”.
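As a small illustration of regression, the sketch below fits a straight line y = a·x + b with ordinary least squares and then predicts a real-valued output. The house-size and price data are hypothetical, and least squares is simply one common regression method, used here as an example. A classification model would instead output a category label.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept passes through the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b


# Hypothetical data: house size (square metres) -> price (thousands of dollars).
sizes = [50, 70, 90, 110]
prices = [150, 210, 270, 330]
a, b = fit_line(sizes, prices)
predicted = a * 100 + b  # a real-valued prediction for a 100 m^2 house
```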
Unsupervised Learning
• Unsupervised Learning is the second type of machine learning, in
which unlabeled data are used to train the algorithm, which means it is
used on data that has no historical labels.

• The algorithm must figure out what is being shown. The purpose
is to explore the data and find some structure within it.
Unsupervised Learning
• In unsupervised learning the data is unlabeled: raw information is fed
directly to the algorithm without pre-processing and without knowing
the output for the data.

• The algorithm explores the data and, according to the segments it finds, groups the
data into clusters with new labels.
• This learning technique works well on transactional data.
• These algorithms are also used to segment text topics, recommend items and
identify data outliers.
Unsupervised Learning
Types of Unsupervised learning
• Clustering: A clustering technique finds similarities among the data
points, groups similar data points together, and determines which cluster
new data should belong to.

• Association: An association rule learning problem is where you want
to discover rules that describe large portions of your data, such as
people that buy X also tend to buy Y.
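A rule such as "people that buy X also tend to buy Y" is commonly scored with two standard measures: support (how often the items appear together) and confidence (how often Y appears given X). The minimal sketch below computes both; the shopping baskets are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, x, y):
    """Confidence of the rule X -> Y: support(X together with Y) / support(X)."""
    return support(transactions, set(x) | set(y)) / support(transactions, x)


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
s = support(baskets, {"bread", "milk"})       # 2 of 4 baskets contain both
c = confidence(baskets, {"bread"}, {"milk"})  # 2 of the 3 bread baskets contain milk
```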
Reinforcement Learning
• A reinforcement learning algorithm, or agent, learns by interacting with its environment.
• The agent receives rewards by performing correctly and penalties for performing
incorrectly. The agent learns without intervention from a human by maximizing its reward
and minimizing its penalty.

• The objective is for the agent to take actions that maximise the expected reward over a
given measure of time. The agent will reach the goal much quicker by following a good
policy. So the purpose of reinforcement learning is to learn the best plan.
Reinforcement Learning
• It is closely related to dynamic programming and trains algorithms using a
system of reward and punishment.

• With reinforcement learning, the algorithm discovers through trial and
error which actions yield the most significant rewards.

• Reinforcement learning is frequently used for robotics, gaming, and
navigation.
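The trial-and-error loop can be sketched with tabular Q-learning, one standard reinforcement learning algorithm (not named in the slides). Everything here is an illustrative assumption: a hypothetical 5-cell corridor where the agent earns a reward of 1 only on reaching the rightmost cell, and arbitrary values for the learning rate `alpha`, discount `gamma`, and exploration rate `epsilon`.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal
MOVES = (-1, +1)    # action 0 = step left, action 1 = step right


def step(state, action):
    """Hypothetical environment: move along the corridor (walls at both ends);
    the reward is 1 only when the goal cell is reached, which ends the episode."""
    next_state = min(max(state + MOVES[action], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:
            # Trial and error: explore at random with probability epsilon
            # (or when both actions look equally good), otherwise exploit.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Move Q(s, a) toward reward + discounted value of the best next action.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state, steps = next_state, steps + 1
    return q


q = q_learning()
# Learned policy (best action) for every non-goal cell: 1 means "step right".
policy = [0 if q[s][0] > q[s][1] else 1 for s in range(N_STATES - 1)]
```

The learned policy is the "best plan" in this toy setting: from every cell, step toward the goal.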
Summarization of all Machine Learning Types

• Supervised Learning – Train Me!

• Unsupervised Learning – I am self-sufficient in learning

• Reinforcement Learning – My life My rules! (Hit & Trial)


Classification of Machine Learning Concept
Distance Based Machine Learning Methods
• Distance-based algorithms are machine learning algorithms that classify
data points by computing distances between a new data point and a
number of internally stored data points.

• The stored data points that are closest to the new data point have the largest
influence on the classification assigned to it.
• Among distance-based machine learning algorithms, the K-Nearest Neighbor
algorithm is the simplest one used for classification.
Distance Based Machine Learning Methods

• Similarity measure

• Euclidean space
• Euclidean distance
• Manhattan distance
• Minkowski distance
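The three distance measures listed above can be written directly from their definitions. In this minimal sketch, Minkowski distance of order r is (Σ|pᵢ − qᵢ|ʳ)^(1/r); Manhattan is the case r = 1 and Euclidean the case r = 2. The two sample points are arbitrary.

```python
import math


def minkowski(p, q, r):
    """Minkowski distance of order r: (sum of |p_i - q_i|^r) ** (1/r)."""
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1.0 / r)


def euclidean(p, q):
    """Straight-line distance; equal to Minkowski with r = 2."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """Sum of absolute coordinate differences; equal to Minkowski with r = 1."""
    return sum(abs(a - b) for a, b in zip(p, q))


p, q = (1, 2), (4, 6)
```

For these points the coordinate differences are 3 and 4, so the Euclidean distance is 5 and the Manhattan distance is 7.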
K-Nearest Neighbor (KNN) Algorithm
• K-Nearest Neighbor is a type of Supervised Machine Learning algorithm.
• The K-NN algorithm can be used for both Regression and Classification; however,
it is mainly used for Classification problems.

• The K-NN algorithm uses ‘feature similarity’ to compare a new data point with the
available data points and puts the new data point into the category that is most
similar to it.

• The K-NN algorithm stores all the available data and classifies a new data point based on
similarity. This means that when new data appears, it can be easily classified into a well-suited
category using the K-NN algorithm.
K-Nearest Neighbor (KNN) Algorithm
• The following two properties define KNN well:
• Non-parametric learning algorithm − KNN is a non-parametric learning
algorithm because it does not assume anything about the underlying data distribution.
• Lazy learning algorithm − KNN is a lazy learning algorithm because it does
not have a specialized training phase; it stores all the training data and uses it
only at classification time.
• In the training phase, the KNN algorithm just stores the dataset; when it gets
new data, it classifies that data into the category that is most similar to the
new data.
K-Nearest Neighbor (KNN) Algorithm
With the help of K-NN, we can easily identify the category or class of a particular data point.
Consider the diagram below:
K-Nearest Neighbor (KNN) Algorithm
• The working of the K-NN algorithm:

• Step-1: Select the number K of the neighbors

• Step-2: Calculate the Euclidean distance from the new data point to every training data point.

• Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.

• Step-4: Among these k neighbors, count the number of data points in each category.
• Step-5: Assign the new data point to the category with the largest
number of neighbors.

• Step-6: Our model is ready.


• Suppose we have a new data point and we need to put it in the correct category.
Consider the image below:
• Firstly, we choose the number of neighbors; here we choose k=5.
• Next, we calculate the Euclidean distance between the new data point and each existing data point.
• The Euclidean distance is the straight-line distance between two points. For points $(x_1, y_1)$ and $(x_2, y_2)$ it is calculated as:
$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$
• By calculating the Euclidean distances, we find the five nearest neighbors: three in
category A and two in category B.

• Consider the below image:

• Since 3 of the 5 nearest neighbors are from category A, this new data point is assigned to
category A.
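The K-NN steps and the k=5 example above can be sketched in Python as follows. The 2-D points and the query point (3, 3) are hypothetical, chosen so that, as in the worked example, three of the five nearest neighbors are in category A and two are in category B.

```python
import math
from collections import Counter


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_classify(train_points, train_labels, new_point, k=5):
    """Classify new_point by majority vote among its k nearest training points."""
    # Step 2: distance from the new point to every stored training point.
    distances = [(euclidean(p, new_point), label)
                 for p, label in zip(train_points, train_labels)]
    # Step 3: keep the k nearest neighbors.
    nearest = sorted(distances, key=lambda d: d[0])[:k]
    # Steps 4-5: count labels among the neighbors and return the most common one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D training data for categories "A" and "B".
points = [(1, 1), (2, 1), (1, 2), (5, 5), (6, 5), (5, 6), (9, 9)]
labels = ["A", "A", "A", "B", "B", "B", "B"]
```

Note that "training" is just keeping `points` and `labels` in memory, which is why KNN is called a lazy learner.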
How to select the value of K in the K-NN Algorithm?

• Below are some points to remember while selecting the value of K in the K-NN
algorithm:

• There is no particular way to determine the best value for "K", so we need to try several
values and pick the best one. A commonly used value for K is 5.

• A very low value for K, such as K=1 or K=2, can be noisy and make the model sensitive to
outliers.

• Larger values for K smooth out noise, but if K is too large, neighbors from other categories
can outvote the correct category.
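One common way to "try some values" of K is leave-one-out validation: classify each training point using all the others and keep the K with the best accuracy. The sketch below assumes that approach; the candidate list and the data (including one deliberately mislabeled "B" point inside the "A" group) are hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(points, labels, query, k):
    """Majority vote among the k training points closest to query (Euclidean distance)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(points, labels, k):
    """Hold out each point in turn, classify it using the rest, return the fraction correct."""
    correct = 0
    for i in range(len(points)):
        rest_p = points[:i] + points[i + 1:]
        rest_l = labels[:i] + labels[i + 1:]
        correct += knn_predict(rest_p, rest_l, points[i], k) == labels[i]
    return correct / len(points)


def choose_k(points, labels, candidates=(1, 3, 5, 7)):
    """Return the candidate K with the best leave-one-out accuracy (first one on ties)."""
    return max(candidates, key=lambda k: leave_one_out_accuracy(points, labels, k))


# Hypothetical data: (0.5, 0.5) is a noisy "B" point sitting inside the "A" group.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5), (5, 6), (6, 5), (6, 6)]
labs = ["A", "A", "A", "A", "B", "B", "B", "B", "B"]
```

Here K=1 is misled by the noisy point (every "A" point's single nearest neighbor is the outlier), while K=3 outvotes it, which matches the advice above.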
Advantages and Disadvantages of KNN Algorithm

• Advantages :
• It is simple to implement.
• It is robust to noisy training data.
• It can be more effective if the training data is large.

• Disadvantages:
• The value of K always needs to be determined, which can sometimes be complex.
• The computation cost is high because the distance between the new data point and
every training sample must be calculated.
Introduction to Clustering Techniques
• Clustering is a type of unsupervised learning method.

• Unsupervised learning is a method where inferences are drawn
from datasets consisting of input data without labelled responses.

• Generally, it is used as a process to find meaningful structure,
explanatory underlying processes, generative features, and groupings
inherent in a set of examples.
Introduction to Clustering Techniques
• Clustering is the task of dividing the objects or data points into a number of groups such
that data points in the same group are more similar to each other and more
dissimilar to the data points in other groups.

• It does this by finding similar patterns in the unlabeled dataset, such as shape, size,
color, or behavior, and divides the data according to the presence or absence of those
patterns.

• It is essentially a grouping of objects on the basis of the similarity and dissimilarity between
them.
Introduction to Clustering Techniques
Example 1: Here the clusters are distinct, and we can identify that there are 3
clusters in the picture below.
Example 2: Imagine there are a number of objects in a basket. Each item has a distinct set of
features (size, shape, color, etc.). The task is to select the features of each object in the basket and
divide the objects into groups accordingly.
Possible Applications
• Clustering Algorithms can be applied in many fields, for instance:

1. Marketing: Finding groups of customers with similar behavior given a large
database of customer data containing their properties and past buying records.

2. Biology: Classification of plants and animals given their features.

3. City-Planning: Helps in the identification of groups of houses in a city according
to house type, value, and geographic location.

4. Libraries: It is used in clustering different books on the basis of topics and
information.
Possible Applications
5. WWW: Clustering also helps in classifying documents on the web for
information discovery.

6. Insurance: It is used to recognize groups of customers and their policies and to
identify fraud.

7. Data mining: Cluster analysis serves as a tool to gain insight into the
distribution of data and to observe the characteristics of each cluster.

8. Earthquake studies: By clustering earthquake-affected areas, we can
determine the dangerous zones.
Requirements of Clustering Algorithm
• Scalability − We need highly scalable clustering algorithms to deal with large
databases.

• Minimal requirements for domain knowledge to determine input parameters −
Many clustering algorithms require users to input certain parameters in cluster
analysis. Parameters are often difficult to determine, especially for data sets
containing high-dimensional objects. This not only burdens users, but also
makes the quality of clustering difficult to control.
Requirements of Clustering Algorithm
• Ability to deal with different kinds of attributes − Algorithms should be capable of being applied
to any kind of data, such as interval-based (numerical), categorical, and binary data.

• Discovery of clusters with arbitrary shape − The clustering algorithm should be capable of
detecting clusters of arbitrary shape. It should not be limited to distance measures that
tend to find spherical clusters of small size.

• Insensitivity to the order of input records − Some clustering algorithms cannot incorporate newly
inserted data, and some are sensitive to the order of the input data. It is important to
develop algorithms that are insensitive to the order of input.
Requirements of Clustering Algorithm
• High dimensionality − The clustering algorithm should be able to handle not only
low-dimensional data but also high-dimensional spaces.

• Ability to deal with noisy data − Databases contain noisy, missing or incorrect
data. Some algorithms are sensitive to such data and may produce poor-quality
clusters.

• Interpretability − The clustering results should be interpretable, comprehensible,
and usable.
Problems Associated with using Clustering Technique
1. Current clustering techniques do not address all the requirements adequately (and
concurrently);

2. Dealing with a large number of dimensions and a large number of data items can be
problematic because of time complexity;

3. The effectiveness of the method depends on the definition of “distance” (for
distance-based clustering);

4. If an obvious distance measure doesn’t exist, we must “define” it, which is not always easy,
especially in multi-dimensional spaces;

5. The result of the clustering algorithm (which in many cases can itself be arbitrary) can be
interpreted in different ways.
Types of Clustering Methods
• Clustering methods are used to identify groups of similar objects in multivariate data
sets collected from fields such as marketing, bio-medical and geo-spatial.

• There are different types of clustering methods, including:
• Partitioning methods
• Hierarchical methods
• Density-based methods
• Grid-based methods
• Model-based methods
• Subspace methods
• Graph based methods
Partitioning Method
• From a data set of n objects, a partitioning method constructs k partitions of the
data, where each partition represents a cluster and k ≤n.
• That is, it divides the data into k groups such that each group must contain at
least one object.
• In other words, partitioning methods conduct one-level partitioning on data
sets.
• To find clusters with complex shapes and for very large data sets, partitioning-
based methods need to be extended.
Partitioning Method
• Algorithms under Partitioning Method:
• K-means: The main objective of the K-Means algorithm is to minimize the
sum of squared distances between the points and their respective cluster centroids.
• K-medoids: A medoid can be defined as the point in the cluster whose
total dissimilarity to all the other points in the cluster is minimal.
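A minimal sketch of both ideas follows. K-means is implemented with Lloyd's algorithm (the standard assign-then-update loop); for K-medoids only the medoid computation is shown, not a full K-medoids algorithm such as PAM. The sample points, the random initialization, and the iteration limit are illustrative assumptions. `math.dist` requires Python 3.8 or later.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct data points
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for each point.
        assign = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = []
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, assign


def medoid(cluster):
    """The member with the smallest total distance to the other members."""
    return min(cluster, key=lambda p: sum(math.dist(p, q) for q in cluster))


# Hypothetical data: two well-separated groups of four points.
pts = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
centroids, assign = kmeans(pts, 2)
```

Unlike a centroid, a medoid is always one of the actual data points, which makes K-medoids less sensitive to outliers: for the points (0,0), (1,0), (2,0), (3,0), (10,0) the medoid is (2,0), while the mean is pulled toward the outlier.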
Hierarchical methods
• A hierarchical clustering method works by grouping data objects into a tree of clusters.

• A hierarchical method can be classified into: Agglomerative method and Divisive method
based on how the hierarchical decomposition is formed.

• The agglomerative approach, also called the bottom-up approach, starts with each object
forming a separate group. It successively merges the objects or groups close to one another,
until all the groups are merged into one (the topmost level of the hierarchy), or a termination
condition holds.

• The divisive approach, also called the top-down approach, starts with all the objects in the
same cluster. In each successive iteration, a cluster is split into smaller clusters, until
eventually each object is in one cluster, or a termination condition holds.
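The agglomerative (bottom-up) approach can be sketched in a few lines. This sketch assumes single linkage (the distance between two clusters is the distance between their closest pair of points), which is only one of several common linkage choices; the points are hypothetical. The divisive approach would run in the opposite direction and is not shown.

```python
import math


def single_linkage(c1, c2):
    """Distance between clusters = distance between their closest pair of points."""
    return min(math.dist(p, q) for p in c1 for q in c2)


def agglomerative(points, n_clusters=1):
    """Bottom-up clustering: start with one cluster per point and repeatedly
    merge the two closest clusters until n_clusters remain."""
    clusters = [[p] for p in points]
    merges = []  # merge order, i.e. the levels of the hierarchy
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest single-linkage distance.
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: single_linkage(clusters[ab[0]], clusters[ab[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters, merges


# Hypothetical points: two tight pairs and one isolated point.
pts = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters, merges = agglomerative(pts, n_clusters=3)
```

Stopping at `n_clusters` plays the role of the termination condition; running to `n_clusters=1` builds the full hierarchy up to the topmost level.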
Hierarchical methods
• Algorithms under Hierarchical method:
• BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies): BIRCH is
designed for clustering a large amount of numeric data by integrating hierarchical and
other clustering methods such as iterative partitioning.

• Chameleon: Chameleon is a hierarchical clustering algorithm that uses dynamic
modeling to determine the similarity between pairs of clusters.
Density-based methods

• To discover clusters with arbitrary shape, density-based clustering methods have been
developed.
• The clusters are modeled as dense regions in the data space, separated by sparse regions.
Density-based methods

• This is the main strategy behind density-based clustering methods,
which can discover clusters of non-spherical shape.

• These methods treat clusters as dense regions of similar points, separated from
the lower-density regions of the space.

• These methods have good accuracy and the ability to merge two clusters.
Density-based methods
• Algorithms under Density-based method:
• DBSCAN(Density-Based Spatial Clustering of Applications with Noise):
DBSCAN grows clusters according to a density-based connectivity analysis.
• OPTICS (Ordering Points To Identify Clustering Structure): OPTICS
extends DBSCAN to produce a cluster ordering obtained from a wide range of
parameter settings.

• DENCLUE (DENsity-based CLUstEring): It clusters objects based on a set
of density distribution functions.
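The density-based connectivity idea behind DBSCAN can be sketched as follows. `eps` (neighborhood radius) and `min_pts` (minimum neighbors for a core point) are DBSCAN's usual two parameters; in this sketch `min_pts` counts the point itself, a convention that varies between implementations. The brute-force neighbor search and the sample points are illustrative assumptions. `math.dist` requires Python 3.8 or later.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points within distance eps of point i (including i itself).
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster_id += 1  # i is a core point: start a new cluster
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is also a core point: keep growing
                queue.extend(j_neighbors)
    return labels


# Hypothetical data: two dense squares and one isolated point.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
labels = dbscan(pts, eps=1.5, min_pts=3)
```

The number of clusters is not given in advance; it emerges from the density, and isolated points are reported as noise (-1).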
Grid-based methods
• The grid-based clustering approach uses a multi-resolution grid data
structure.

• It quantizes the object space into a finite number of cells that form a grid
structure on which all of the operations for clustering are performed.
• The main advantage of the approach is its fast processing time, which is
typically independent of the number of data objects, yet dependent on
only the number of cells in each dimension in the quantized space.
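A generic grid-based sketch is shown below. It is not STING or CLIQUE and uses a single-resolution grid rather than a multi-resolution one: one pass assigns each 2-D point to a cell, cells with at least `min_count` points are treated as dense, and touching dense cells are joined into clusters. After the first pass, all work is done on cells rather than individual points. `cell_size`, `min_count`, and the points are hypothetical.

```python
from collections import Counter


def cell_of(point, cell_size):
    """Quantize a 2-D point to the integer (column, row) index of its grid cell."""
    return (int(point[0] // cell_size), int(point[1] // cell_size))


def grid_clusters(points, cell_size, min_count):
    """Return {cell: cluster_id} for dense cells, joining dense cells that touch."""
    # One pass over the data: count points per cell.
    counts = Counter(cell_of(p, cell_size) for p in points)
    dense = {cell for cell, n in counts.items() if n >= min_count}
    # From here on, work only with cells, not with individual points.
    cluster_of, next_id = {}, 0
    for start in dense:
        if start in cluster_of:
            continue
        cluster_of[start] = next_id
        stack = [start]
        while stack:
            cx, cy = stack.pop()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in cluster_of:
                        cluster_of[nb] = next_id
                        stack.append(nb)
        next_id += 1
    return cluster_of


# Hypothetical 2-D points.
pts = [(0.1, 0.1), (0.5, 0.5), (0.2, 1.2), (0.8, 1.9), (5.5, 5.5), (5.1, 5.9), (9.5, 0.5)]
clusters = grid_clusters(pts, cell_size=1.0, min_count=2)
```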
Grid-based methods
• Algorithms under Grid-based method:
• STING (Statistical Information Grid): The algorithm can be used for spatial
queries. The spatial area is divided into rectangular cells, which are represented
by a hierarchical structure. Statistical information regarding the attributes in
each grid cell is pre-computed and stored.

• CLIQUE (CLustering In QUEst): It is a simple grid-based method for
finding density-based clusters in subspaces.
Model-based methods
• Model-based methods are based on probability models, such as the finite
mixture model for probability densities.

• In other words, in model-based clustering, it is assumed that the data are
generated by a mixture of probability distributions in which each
component represents a different cluster.

• Thus, a particular clustering method can be expected to work well when
the data conform to the model.
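As an illustration of model-based clustering, the sketch below assumes a mixture of 1-D Gaussian distributions and fits it with the Expectation-Maximization (EM) algorithm, one common way to estimate such a model. The data, the starting means, the iteration count, and the variance floor `min_var` are hypothetical choices.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def gmm_em_1d(xs, means, iterations=50, min_var=1e-3):
    """Fit a 1-D Gaussian mixture with EM, starting from the given means."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component j for each point x.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate each component from its responsibility-weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, min_var)  # floor avoids a collapsing component
    return weights, means, variances


# Hypothetical data drawn around 1.0 and 5.0.
xs = [0.8, 0.9, 1.0, 1.1, 1.2, 4.8, 4.9, 5.0, 5.1, 5.2]
weights, means, variances = gmm_em_1d(xs, [0.0, 6.0])
```

Each fitted component corresponds to one cluster, and a point's cluster is the component with the highest responsibility for it.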
Subspace methods
• A subspace method searches various subspaces for clusters. Here, a cluster is a subset of objects that are
similar to each other in a subspace. The similarity is often captured by conventional measures such as
distance or density. For example, the CLIQUE algorithm is a subspace method.

• Generally there are two kinds of strategies:

• Bottom-up approaches start from low-dimensional subspaces and search higher-dimensional
subspaces.

• Top-down approaches start from the full space and search smaller and smaller subspaces recursively.
Top-down approaches are effective only if the subspace of a cluster can be determined by the local
neighborhood.
Graph based methods
• To find clusters in a graph, imagine cutting the graph into pieces, each piece being a cluster, such that the vertices
within a cluster are well connected and the vertices in different clusters are connected much more weakly.

• The size of the cut is the number of edges in the cut set. For weighted graphs, the size of a cut is the sum of the weights
of the edges in the cut set.

• In graph theory and some network applications, a minimum cut is of importance. A cut is minimum if the cut’s size is
not greater than any other cut’s size. There are polynomial-time algorithms to compute minimum cuts of graphs.

• There are two kinds of methods for clustering graph data, which address challenges such as high computational
cost, high dimensionality, and sparsity. One uses clustering methods for high-dimensional data, while the other is
designed specifically for clustering graphs.
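The cut-size definition can be checked on a tiny example. The sketch below computes the size of a weighted cut and finds a minimum cut by brute force over every two-way split. Brute force is exponential and is used here only for illustration; polynomial-time methods such as the Stoer–Wagner algorithm are used in practice. The graph is hypothetical: two tightly connected triangles joined by one weak edge.

```python
from itertools import combinations


def cut_size(edges, part):
    """Sum of the weights of edges with exactly one endpoint in `part` (the cut set)."""
    part = set(part)
    return sum(w for u, v, w in edges if (u in part) != (v in part))


def brute_force_min_cut(nodes, edges):
    """Try every split of the nodes into two non-empty groups and keep the cheapest.
    Exponential time: for illustration on tiny graphs only."""
    nodes = sorted(nodes)
    best = None
    for r in range(1, len(nodes)):
        for part in combinations(nodes, r):
            size = cut_size(edges, part)
            if best is None or size < best[0]:
                best = (size, set(part))
    return best


# Hypothetical weighted graph: (u, v, weight) triples.
edges = [
    ("a", "b", 5), ("b", "c", 4), ("a", "c", 6),
    ("d", "e", 5), ("e", "f", 4), ("d", "f", 6),
    ("c", "d", 1),  # the only weak link between the two groups
]
```

The minimum cut removes only the weak edge c–d, so the two pieces {a, b, c} and {d, e, f} are exactly the well-connected clusters.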


END OF UNIT I
