Bagging, Boosting, Decision Trees, Random Forest

Introduction to Decision Trees
Before Starting the Course
● Upon completion of this course, you will have acquired general knowledge of introductory artificial intelligence algorithms and data analysis.
● Because these topics are conceptually demanding and the underlying logic rarely settles during a single lesson, a great deal of individual practice is required.
● The slides rely on visuals rather than long blocks of text, so it is extremely important to take notes during the lesson.
● The slide titles are sufficient starting points for the basic algorithms; research each title further on sites that offer plenty of practice exercises and theoretical material.
Decision Trees
A decision tree uses a tree structure to represent a set of possible decision paths and an outcome for each path.

The decision tree is one of the most commonly used classification techniques.

It is very easy to understand and interpret, and the process of arriving at an estimate is completely transparent.
Structure of Decision Trees
Root: The first cell of a decision tree is called the root (root node). Each observation is first split as “Yes” or “No” according to the condition at the root.

Node: Below the root are the internal nodes. Each observation is classified further with the help of these nodes, and the complexity of the model increases as the number of nodes increases.

Leaf: At the bottom of the decision tree are the leaves (leaf nodes). The leaves give us the final result.
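As an illustrative sketch (not from the slides), scikit-learn's export_text can print a fitted tree so the root condition, the internal nodes, and the leaves are all visible; the Iris dataset and the max_depth value are arbitrary choices for the example.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# The first split is the root, indented splits are internal nodes,
# and the "class:" lines are the leaves that give the final result.
print(export_text(clf, feature_names=["sepal len", "sepal wid",
                                      "petal len", "petal wid"]))
```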
Decision Trees Application
Let's examine the following two-dimensional data, which has four class labels.
A simple decision tree built on this data will iteratively split the data along one axis or the other according to some quantitative criterion and, at each level, assign the label of each new region by a majority vote of the points within it.

Note that after the first split, every point in one of the branches may already share the same label, so there is no need to subdivide that branch further. At each level, every region is split again along one of the two features, except for nodes that already contain only a single class (color).
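A minimal sketch of this idea, assuming scikit-learn and a synthetic two-dimensional, four-class dataset generated with make_blobs (the dataset and the depth limit are illustrative choices, not the data from the slide):

```python
from sklearn.datasets import make_blobs
from sklearn.tree import DecisionTreeClassifier

# Two-dimensional points grouped into four class labels
X, y = make_blobs(n_samples=300, centers=4, random_state=0)

# Each level of the tree splits a region along one of the two axes;
# regions that already contain a single class are not split further.
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
print("training accuracy:", tree.score(X, y))
```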
How to Calculate Decision Trees
Gini Index
Pure means that all data in a selected subset of the dataset belongs to the same class.

Impure means that the data is a mix of different classes.

Gini impurity is a measure of the probability that a randomly chosen sample will be misclassified if it is labelled at random according to the distribution of class labels in the dataset.

If our dataset is pure, the probability of misclassification is 0. If our sample is a mix of different classes, the probability of misclassification is high.
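As a small sketch of the definition above (the function name and the toy label lists are my own, not from the slides), the Gini impurity of a set of labels can be computed directly from the class proportions:

```python
import numpy as np

def gini_impurity(labels):
    """Probability that a randomly drawn sample is misclassified when it is
    labelled at random according to the class distribution of the set."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini_impurity(["A", "A", "A", "A"]))   # pure set   -> 0.0
print(gini_impurity(["A", "A", "B", "B"]))   # 50/50 mix  -> 0.5
```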
Entropy
To construct a decision tree, we need to decide which questions to ask and in what order. Each question at each stage of the tree eliminates some possibilities and leaves others open.

Example: After learning that an animal has no more than five legs, we have eliminated the possibility that it is a grasshopper, but we have not ruled out the possibility that it is a duck. Each possible question partitions the remaining possibilities according to its answers.

Entropy is a measure of the uncertainty of a random variable. The higher the entropy, the greater the uncertainty, and the more information we gain by resolving it.
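A comparable sketch for entropy (again, the function name and toy examples are my own), measured in bits with base-2 logarithms:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

print(entropy(["duck"] * 8))                        # pure set            -> 0.0
print(entropy(["duck"] * 4 + ["grasshopper"] * 4))  # maximum uncertainty -> 1.0
```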
Decision Trees Advantages
● They can work with both continuous and discrete data
● They need little data preprocessing: no outlier removal or feature scaling is required
● Decision trees are easily visualized and the classification rules are clearly visible, so they are easy to understand and interpret
● They can be used for multi-output problems (see the sketch below)
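A brief sketch of the multi-output point, assuming scikit-learn: DecisionTreeClassifier accepts a two-dimensional target array, so a single tree can predict several outputs at once (the synthetic data below is purely illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))

# Two output columns: a single tree predicts both labels at once.
y = np.column_stack([(X[:, 0] > 0).astype(int),
                     (X[:, 1] + X[:, 2] > 0).astype(int)])

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(clf.predict(X[:3]))   # shape (3, 2): one prediction per output
```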
Decision Trees Visualization
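A possible way to produce the kind of visualization this slide refers to, assuming scikit-learn and matplotlib (the Iris dataset and the depth limit are illustrative choices):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)

# Draw the fitted tree: each box shows the split condition,
# the impurity, the sample counts, and the majority class.
plt.figure(figsize=(10, 6))
plot_tree(clf, filled=True,
          feature_names=iris.feature_names,
          class_names=list(iris.target_names))
plt.show()
```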
Ensemble Learning
Ensemble learning is the idea that several prediction algorithms, working together, give more successful results than any single algorithm on its own.

Example: A random forest builds many decision trees for the same problem and uses them together to solve it.

Common combination strategies (a max-voting sketch follows this list):
● Max Voting
● Averaging
● Weighted Averaging
● Stacking
● Blending
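As a sketch of the max-voting strategy, assuming scikit-learn (the choice of base models and dataset is illustrative), VotingClassifier with voting="hard" lets each model cast a vote and returns the majority label:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Max voting: each model casts a vote and the majority label wins.
ensemble = VotingClassifier(
    estimators=[("tree", DecisionTreeClassifier(max_depth=3)),
                ("knn", KNeighborsClassifier()),
                ("logreg", LogisticRegression(max_iter=1000))],
    voting="hard")

print(cross_val_score(ensemble, X, y, cv=5).mean())
```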
Bootstrapping
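The slide above presumably illustrated bootstrapping with a figure; as a plain NumPy sketch (entirely illustrative), a bootstrap sample draws n observations with replacement, so some points repeat and the rest are left "out of bag":

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.arange(10)   # the original dataset

# Draw n items *with replacement* from the data: this is a bootstrap sample.
bootstrap_sample = rng.choice(data, size=data.size, replace=True)
out_of_bag = np.setdiff1d(data, bootstrap_sample)

print("bootstrap sample:", bootstrap_sample)
print("out-of-bag points:", out_of_bag)
```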
Bagging
Bootstrap Aggregation, or Bagging, is a widely used ensemble learning method for reducing variance on a noisy dataset.

After generating random data subsets (bootstrap samples) from the main dataset, multiple weak models are trained on them independently, and their predictions are aggregated.

The random forest algorithm is an extension of the bagging method that uses both bagging and feature randomness to build a forest of uncorrelated decision trees.
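A hedged sketch of bagging with scikit-learn's BaggingClassifier (the dataset, number of trees, and subsample fraction are illustrative assumptions, not values from the slides):

```python
from sklearn.datasets import make_blobs
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_blobs(n_samples=400, centers=4, random_state=0)

# Many trees, each trained independently on a random bootstrap subset;
# their votes are aggregated to reduce variance.
bagging = BaggingClassifier(DecisionTreeClassifier(),
                            n_estimators=100,
                            max_samples=0.8,
                            bootstrap=True,
                            random_state=0)

print(cross_val_score(bagging, X, y, cv=5).mean())
```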
Random Forest
This concept operates under the ensemble method called bagging, in which multiple fitted estimators are combined to reduce the effect of overfitting.

Bagging uses a collection of parallel estimators, each of which fits the data independently, and averages their results to find a better classification.

A collection of randomized decision trees combined in this way is known as a random forest.
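A minimal random forest sketch, assuming scikit-learn (the Iris dataset and hyperparameters are illustrative); max_features="sqrt" is the feature-randomness ingredient that distinguishes a random forest from plain bagging:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging plus feature randomness: each split considers only a random
# subset of the features, which decorrelates the individual trees.
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                                random_state=0)
forest.fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))
```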
Boosting
Boosting is an ensemble learning method that combines a set of weak learners into a strong learner in order to minimize training errors.

In boosting, a random sample of data is selected and the models are trained sequentially: each model tries to compensate for the weaknesses of the previous one.

At each iteration, the weak rules from each classifier are combined to form a single strong prediction rule.
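A short boosting sketch using scikit-learn's AdaBoostClassifier (the dataset and number of estimators are illustrative assumptions); by default it boosts decision stumps sequentially and combines them into one weighted rule:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Weak learners are trained one after another; each focuses on the samples
# the previous ones got wrong, and the weighted combination of all of them
# forms a single strong prediction rule.
boosted = AdaBoostClassifier(n_estimators=100, random_state=0)
print(cross_val_score(boosted, X, y, cv=5).mean())
```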
Boosting App
Source: [Link] edac1174e971
7/3 = 2.33, 11/3 = 3.66
