Machine Learning using R and Python
The Road Map…
We are living in the primitive age of machines; what they may yet become is limited only by our imagination.
Instead of writing the code ourselves, we feed data to a generic algorithm, and the algorithm/machine builds the logic based on the given data.
1. These programs or algorithms are designed in a way that they learn and improve over time as they are exposed to more data.
2. In machine learning, we do not have to explicitly define all the steps or conditions as in a conventional program.
3. On the contrary, the machine gets trained on a training dataset large enough to create a model, which it then uses to make decisions on new data.
Regression:
• Linear Regression
• Logistic Regression
Least Square Method – Finding the Best-Fit Line
Least squares is a statistical method used to determine the best-fit line, or regression line, by minimizing the sum of squared errors. The "square" here refers to squaring the vertical distance between a data point and the regression line. The line with the minimum value of the sum of squares is the best-fit regression line.
where:
y = actual value
ȳ = mean value of y
yp = predicted value of y
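To make this concrete, here is a minimal Python sketch of a least-squares fit; the data points are invented for illustration, and the slope/intercept formulas are the standard closed-form least-squares estimates.

import numpy as np

# Illustrative data (made-up points)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least-squares estimates for slope (m) and intercept (c):
# m = sum((x - x_mean)*(y - y_mean)) / sum((x - x_mean)**2)
x_mean, y_mean = x.mean(), y.mean()
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
c = y_mean - m * x_mean

y_pred = m * x + c                 # predicted values (yp)
sse = np.sum((y - y_pred) ** 2)    # the sum of squares being minimized

print(f"best-fit line: y = {m:.3f}x + {c:.3f}, SSE = {sse:.3f}")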
"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." – Tom Mitchell
Classification Algorithms:
1. Logistic Regression
2. Naive Bayes
3. Decision Trees
4. Random Forest
5. Nearest Neighbor
6. Support Vector Machine
Classifier: An algorithm that maps the input data to a specific category.
Classification model: A classification model draws conclusions from the input values given during training and predicts class labels/categories for new data.
Feature: A feature is an individual measurable property of a phenomenon being observed.
Binary classification: A classification task with two possible outcomes. E.g., gender classification (male/female).
Multi-class classification: Classification with more than two classes, where each sample is assigned to one and only one target label. E.g., an animal can be a cat or a dog, but not both at the same time.
Multi-label classification: A classification task where each sample is mapped to a set of target labels (more than one class). E.g., a news article can be about sports, a person, and a location at the same time.
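To ground these terms, here is a minimal scikit-learn sketch (the height/weight numbers are invented, echoing the gender example above): the estimator is the classifier, the fitted object is the classification model, each column is a feature, and the target is a binary label.

from sklearn.linear_model import LogisticRegression

# Features: each row is one sample, each column one measurable property
X_train = [[170, 60], [180, 80], [160, 50], [175, 75]]   # height, weight
y_train = ["female", "male", "female", "male"]           # binary target

clf = LogisticRegression()        # the "classifier"
clf.fit(X_train, y_train)         # the trained "classification model"

print(clf.predict([[168, 58]]))   # predicts a class label for new data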
Naive Bayes:
1. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.
2. Even if these features depend on each other or upon the existence of the other features, each of these properties independently contributes to the probability of the outcome.
3. A Naive Bayes model is easy to build and particularly useful for very large data sets.
So, given the data, we have to predict whether "we can play on that day or not". Our model predicts that there is a 55% chance that there will be a game tomorrow.
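A minimal sketch of the underlying Bayes-rule computation on a toy "weather → play" table; the day counts below are illustrative assumptions, not the dataset behind the 55% figure above.

from collections import Counter

days = [
    ("sunny", "yes"), ("sunny", "no"), ("overcast", "yes"),
    ("rainy", "yes"), ("rainy", "no"), ("sunny", "yes"),
    ("overcast", "yes"), ("rainy", "no"),
]

outlook_counts = Counter(o for o, _ in days)
play_counts = Counter(p for _, p in days)
joint = Counter(days)

def p_play_given(outlook, play="yes"):
    # Bayes: P(play|outlook) = P(outlook|play) * P(play) / P(outlook)
    p_outlook_given_play = joint[(outlook, play)] / play_counts[play]
    p_play = play_counts[play] / len(days)
    p_outlook = outlook_counts[outlook] / len(days)
    return p_outlook_given_play * p_play / p_outlook

print(f"P(play=yes | sunny) = {p_play_given('sunny'):.2f}")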
Decision Trees:
1. A decision tree builds classification or regression models in the form of a tree structure.
2. It breaks down a data set into smaller and smaller subsets while, at the same time, an associated decision tree is incrementally developed.
3. The final result is a tree with decision nodes and leaf nodes.
4. A decision node has two or more branches, and a leaf node represents a classification or decision.
5. The topmost decision node in a tree, which corresponds to the best predictor, is called the root node.
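A short scikit-learn sketch of such a tree; the outlook/humidity numbers are invented, and export_text prints the decision nodes, leaf nodes, and root split described above.

from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data (illustrative): [outlook, humidity] encoded as numbers
X = [[0, 85], [0, 90], [1, 78], [2, 96], [2, 80], [1, 70]]
y = ["no", "no", "yes", "yes", "yes", "no"]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

# The printed tree shows decision nodes (tests on features) and
# leaf nodes (class decisions); the topmost split is the root node.
print(export_text(tree, feature_names=["outlook", "humidity"]))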
Random Forest:
1. Random forests, or random decision forests, are an ensemble learning method that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (for classification) or the mean prediction (for regression) of the individual trees.
2. Random decision forests correct for decision trees' habit of overfitting to their training set.
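A brief scikit-learn sketch of a forest voting over many trees, using the standard Iris dataset for convenience.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Many trees, each trained on a bootstrap sample with random feature
# subsets; the forest predicts the mode (majority vote) of the trees.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_tr, y_tr)
print("test accuracy:", forest.score(X_te, y_te))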
Nearest Neighbor:
1. k-Nearest Neighbors (k-NN) takes a bunch of labelled points and uses them to learn how to label other points.
2. To label a new point, it looks at the labelled points closest to that new point (its nearest neighbors) and has those neighbors vote: whichever label most of the neighbors have becomes the label for the new point (the "k" is the number of neighbors it checks).
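A from-scratch sketch of the voting procedure just described; the labelled points are made up for illustration.

from collections import Counter
import math

def knn_predict(train, new_point, k=3):
    """Label new_point by majority vote of its k nearest neighbours.
    train is a list of ((x, y), label) pairs."""
    by_distance = sorted(train, key=lambda p: math.dist(p[0], new_point))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Toy labelled points (illustrative)
train = [((1, 1), "red"), ((1, 2), "red"),
         ((5, 5), "blue"), ((6, 5), "blue")]
print(knn_predict(train, (2, 1), k=3))   # -> "red"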
Support Vector Machine
An SVM model represents the training examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall.
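A minimal scikit-learn sketch: a linear SVM fits the widest-gap separator between two toy categories, then classifies new points by the side of the gap they land on (the points are invented).

from sklearn import svm

# Points in 2-D space from two categories (illustrative)
X = [[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [6, 8]]
y = [0, 0, 0, 1, 1, 1]

# A linear SVM finds the separating line with the widest possible gap
clf = svm.SVC(kernel="linear").fit(X, y)

# New examples are mapped into the same space and classified by
# which side of the gap they fall on
print(clf.predict([[3, 3], [6, 6]]))   # -> [0 1]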
Clustering
Hard Clustering: In hard clustering, each data point either belongs to a cluster completely or not.
Soft Clustering: In soft clustering, instead of putting each data point into a separate cluster, a probability or likelihood of that data point belonging to each cluster is assigned.
Types of clustering algorithms
1. Connectivity Models: These models are based on the notion that data points closer to each other in the data space are more similar to each other; hierarchical clustering is an example.
2. Centroid Models: These are iterative algorithms in which similarity is derived from a point's closeness to the cluster centroid; K-means is an example.
3. Distribution Models: These models assume the data points in a cluster come from the same probability distribution, e.g. a Gaussian.
4. Density Models: These models search the data space for areas of varied density of data points in the data space.
K-Means Clustering
K-means is an iterative clustering algorithm that refines the cluster assignments in each iteration, converging to a local optimum. The algorithm works in these 5 steps:
1. Specify the desired number of clusters K: let us choose k=2 for these 5 data points in 2-D space.
2. Randomly assign each data point to a cluster: let's assign three points to cluster 1, shown in red, and two points to cluster 2, shown in grey.
3. Compute the cluster centroids: the centroid of the data points in the red cluster is shown using a red cross, and that of the grey cluster using a grey cross.
4. Re-assign each point to the closest cluster centroid: note that only the data point at the bottom was assigned to the red cluster, even though it is closer to the centroid of the grey cluster. Thus, we re-assign that data point to the grey cluster.
5. Re-compute the cluster centroids: now we re-compute the centroids for both clusters.
Repeat steps 4 and 5 until no improvements are possible: we repeat steps 4 and 5 until the assignments converge. When no data point switches between the two clusters for two successive iterations, the algorithm terminates (unless a maximum number of iterations is set explicitly).
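A compact NumPy sketch of these five steps; the five 2-D points are invented, and the initial assignment is round-robin rather than random so the run is reproducible.

import numpy as np

def k_means(points, k=2, n_iter=20):
    # Steps 1-2: choose k and give each point an initial cluster
    # (round-robin here for reproducibility; the slide assigns randomly)
    labels = np.arange(len(points)) % k
    for _ in range(n_iter):
        # Steps 3/5: (re-)compute the centroid of each cluster
        centroids = np.array([points[labels == c].mean(axis=0)
                              for c in range(k)])
        # Step 4: re-assign each point to its closest centroid
        dists = np.linalg.norm(points[:, None] - centroids[None], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):  # no switching -> converged
            break
        labels = new_labels
    return labels, centroids

points = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 4.0],
                   [5.0, 7.0], [3.5, 5.0]])
labels, centroids = k_means(points, k=2)
print(labels, centroids, sep="\n")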
Hierarchical Clustering
This algorithm starts with each data point assigned to a cluster of its own. The two nearest clusters are then merged into one. The algorithm terminates when only a single cluster is left.
At the bottom, we start with 25 data points, each assigned to a separate cluster. The two closest clusters are then merged repeatedly until just one cluster remains at the top.
The height in the dendrogram at which two clusters are merged represents the
distance between two clusters in the data space.
Here, the best choice for the number of clusters is 4, as the red horizontal line in the dendrogram covers the maximum vertical distance AB.
Two important things that you should know about hierarchical clustering are:
This algorithm has been described above using the bottom-up (agglomerative) approach. It is also possible to follow a top-down (divisive) approach, starting with all data points assigned to the same cluster and recursively performing splits until each data point is in a separate cluster.
The decision to merge two clusters is taken on the basis of the closeness of these clusters. There are multiple metrics for deciding the closeness of two clusters:
Euclidean distance: ||a−b||₂ = √(Σᵢ(aᵢ−bᵢ)²)
Squared Euclidean distance: ||a−b||₂² = Σᵢ(aᵢ−bᵢ)²
Manhattan distance: ||a−b||₁ = Σᵢ|aᵢ−bᵢ|
Maximum distance: ||a−b||∞ = maxᵢ|aᵢ−bᵢ|
Mahalanobis distance: √((a−b)ᵀ S⁻¹ (a−b)), where S is the covariance matrix
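A short SciPy sketch of the bottom-up merging with one of the metrics above (Euclidean); the points are invented, and fcluster cuts the dendrogram at a chosen number of clusters.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy points (illustrative); each starts in its own cluster
points = np.array([[1, 1], [1.2, 1.1], [5, 5], [5.2, 4.9], [9, 9]])

# Bottom-up (agglomerative) merging of the two closest clusters,
# measured with Euclidean distance
Z = linkage(points, method="single", metric="euclidean")

# Cut the dendrogram to obtain a chosen number of clusters, e.g. 3
print(fcluster(Z, t=3, criterion="maxclust"))   # -> cluster labels 1..3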
Applications of Clustering
1. Recommendation engines
2. Market segmentation
3. Social network analysis
4. Search result grouping
5. Medical imaging
6. Image segmentation
7. Anomaly detection
A pizza chain wants to open its delivery centres across a city. What do you
think would be the possible challenges?
They need to analyse the areas from which pizza is ordered frequently.
They need to understand how many pizza stores have to be opened to cover delivery in the area.
They need to figure out the locations of the pizza stores within all these areas in order to keep the distance between each store and its delivery points to a minimum.
Resolving these challenges involves a lot of analysis and mathematics.
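Clustering addresses these challenges directly: treating each delivery order as a point on the map, the cluster centroids become candidate store locations. A minimal sketch with invented order coordinates:

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical delivery-order coordinates for the city (made-up data)
rng = np.random.default_rng(42)
orders = np.vstack([rng.normal(loc, 0.5, size=(50, 2))
                    for loc in ([2, 2], [8, 3], [5, 8])])

# One store per cluster: each centroid is a candidate store location
# that minimizes the distance to the delivery points assigned to it
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(orders)
print("candidate store locations:\n", kmeans.cluster_centers_)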
Association Rules
The Problem
When we go grocery shopping, we often have a standard list of things to buy. Each
shopper has a distinctive list, depending on one’s needs and preferences. A housewife
might buy healthy ingredients for a family dinner, while a bachelor might buy beer
and chips. Understanding these buying patterns can help to increase sales in several
ways. If there is a pair of items, X and Y, that are frequently bought together:
1. Both X and Y can be placed on the same shelf, so that buyers of one item would be prompted to buy the other.
2. Promotional discounts could be applied to just one out of the two items.
While we may know that certain items are frequently bought together, the question is, how do we uncover these associations? Besides increasing sales profits, association rules can also be used in other fields.
Measure 1: Support. This says how popular an itemset is, as measured by the proportion of transactions in which the itemset appears. In an example set of 8 transactions, the support of {apple} is 4 out of 8, or 50%. Itemsets can also contain multiple items; for instance, the support of {apple, beer, rice} is 2 out of 8, or 25%.
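A minimal sketch of the support computation; the eight transactions below are invented to echo the apple/beer/rice example.

from itertools import combinations  # handy for enumerating itemsets

# 8 illustrative transactions (made-up data)
transactions = [
    {"apple", "beer", "rice", "chicken"}, {"apple", "beer", "rice"},
    {"apple", "beer"}, {"apple", "mango"}, {"milk", "beer", "rice"},
    {"milk", "beer"}, {"milk", "mango"}, {"bread", "milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

print(support({"apple"}))                  # 4/8 = 0.5
print(support({"apple", "beer", "rice"}))  # 2/8 = 0.25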
For example: if someone buys tea, he is likely to have bought fruit as well.
Net Run Rate (NRR)
The NRR for a team in cricket is calculated by the following formula:
NRR = (average runs scored per over by the team throughout the tournament) − (average runs scored per over by the opposing teams against it)
Cricinfo also says the following rules are applied in case a match is abandoned or concluded under the Duckworth-Lewis method:
Where a match is abandoned, but a result is achieved under Duckworth/Lewis, for
net run rate purposes Team 1 will be accredited with Team 2's Par Score on
abandonment off the same number of overs faced by Team 2. Where a match is
concluded but with Duckworth/Lewis having been applied at an earlier point in the
match, Team 1 will be accredited with 1 run less than the final Target Score for Team
2 off the total number of overs allocated to Team 2 to reach the target.
Also, only matches that never take place (abandoned without a ball being bowled) are, I believe, not considered in the calculation.
For example, in the current IPL, RCB has "for" and "against" values of 1046 runs in 133.0 overs and 1034 runs in 139.5 overs (139 overs and 5 balls, i.e. about 139.83 overs). So its NRR is 1046/133.0 − 1034/139.83 = 7.8647 − 7.3946 ≈ +0.470.
Another example:
Across the three games, TEAM1 scored 678 runs in a total of 147 overs and 2
balls (actually 147.333 overs), a rate of 678/147.333 or 4.602 rpo.
The run-rate scored against TEAM1 across the three games is calculated on the
basis of 466 runs in a total of 50 + 50 + 50 = 150 overs, a rate of 466/150 or
3.107 rpo.
The net run-rate is, therefore, 4.602 - 3.107 = + 1.495
NET RUN RATE OF TEAM1 is + 1.495
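Both calculations can be reproduced with a short helper that converts cricket "overs.balls" notation (139.5 = 139 overs and 5 balls) into decimal overs:

def to_overs(overs_balls):
    """Convert 'overs.balls' notation (e.g. 139.5 = 139 overs 5 balls)
    into a decimal number of overs (6 balls per over)."""
    overs, balls = divmod(round(overs_balls * 10), 10)
    return overs + balls / 6

def net_run_rate(runs_for, overs_for, runs_against, overs_against):
    return runs_for / to_overs(overs_for) - runs_against / to_overs(overs_against)

# RCB example: 1046 runs off 133.0 overs for, 1034 off 139.5 against
print(round(net_run_rate(1046, 133.0, 1034, 139.5), 3))   # ~ +0.470

# TEAM1 example: 678 runs off 147.2 overs for, 466 off 150.0 against
print(round(net_run_rate(678, 147.2, 466, 150.0), 3))     # ~ +1.495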
Deep Learning
Introduction
Artificial Intelligence is the capability of a machine to imitate intelligent human behavior. AI is achieved by mimicking the human brain: understanding how it thinks, learns, decides, and works while trying to solve a problem.
Limitations
Traditional machine learning was not capable of solving use cases involving very high-dimensional data, and hence deep learning came to the rescue. Deep learning is capable of handling high-dimensional data and is also efficient at focusing on the right features on its own; this process is called feature extraction.
How Deep Learning Works
In an attempt to re-engineer the human brain, deep learning studies the basic unit of the brain, called a brain cell or neuron. Inspired by the neuron, an artificial neuron, or perceptron, was developed.
Any deep neural network consists of three types of layers:
The Input Layer
The Hidden Layer
The Output Layer
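A minimal NumPy sketch of these three layers doing a single forward pass (the weights are random and untrained, and backpropagation is not shown): each artificial neuron computes a weighted sum of its inputs followed by an activation.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # input layer: 2 inputs
W1 = rng.normal(size=(2, 3))   # weights into a 3-neuron hidden layer
W2 = rng.normal(size=(3, 1))   # weights into a 1-neuron output layer

hidden = sigmoid(X @ W1)       # hidden layer: weighted sum + activation
output = sigmoid(hidden @ W2)  # output layer: the network's prediction
print(output.round(3))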
Thank You