Recap
31 July 2023 15:16
Why Do Ensemble Techniques Work?
31 July 2023 08:42
Random Forest Hyperparameters
31 July 2023 08:44
Forest-Level HP: n_estimators, max_features, bootstrap, max_samples
Tree-Level HP: criterion, max_depth, min_samples_split, min_samples_leaf, min_weight_fraction_leaf, max_leaf_nodes, min_impurity_decrease, ccp_alpha
Miscellaneous HP: oob_score, n_jobs, random_state, verbose, warm_start, class_weight
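A minimal sketch (assuming scikit-learn's RandomForestClassifier; the values shown are only illustrative, not recommendations) of where each group of hyperparameters appears when building the model:

# Hyperparameters grouped by the levels in the table above (illustrative values)
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    # Forest-level: how many trees and how each one samples the data
    n_estimators=200,
    max_features="sqrt",
    bootstrap=True,
    max_samples=0.8,              # fraction of rows drawn per tree (only used if bootstrap=True)
    # Tree-level: how each individual tree grows and when it stops
    criterion="gini",
    max_depth=10,
    min_samples_split=2,
    min_samples_leaf=1,
    min_weight_fraction_leaf=0.0,
    max_leaf_nodes=None,
    min_impurity_decrease=0.0,
    ccp_alpha=0.0,
    # Miscellaneous: validation, parallelism, reproducibility, bookkeeping
    oob_score=True,
    n_jobs=-1,
    random_state=42,
    verbose=0,
    warm_start=False,
    class_weight=None,
)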
OOB Score
31 July 2023 08:47
"OOB" stands for "out-of-bag". In the context of machine learning, an out-of-bag score is a
method of measuring the prediction error of random forests, bagging classifiers, and other
ensemble methods that use bootstrap aggregation (bagging) when sub-samples of the training
dataset are used to train individual models.
Here's how it works:
1. Each tree in the ensemble is trained on a distinct bootstrap sample of the data. By the
nature of bootstrap sampling, some samples from the dataset will be left out during the
training of each tree. These samples are called "out-of-bag" samples.
2. The out-of-bag samples can then be used as a validation set: each sample is passed
through the trees that did not see it during training to obtain predictions.
3. These predictions are then compared to the actual values to compute an "out-of-bag
score", which can be thought of as an estimate of the prediction error on unseen data.
One of the advantages of the out-of-bag score is that it allows us to estimate the prediction
error without needing a separate validation set. This can be particularly useful when the
dataset is small and partitioning it into training and validation sets might leave too few
samples for effective learning.
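A minimal sketch, assuming scikit-learn and its built-in breast-cancer dataset, of getting an OOB estimate without holding out a separate validation set:

# Out-of-bag accuracy estimated while the forest trains
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

rf = RandomForestClassifier(
    n_estimators=300,
    bootstrap=True,      # OOB only makes sense with bootstrap sampling
    oob_score=True,      # collect out-of-bag predictions during training
    random_state=0,
)
rf.fit(X, y)

# Accuracy computed from samples each tree never saw during training
print("OOB accuracy:", rf.oob_score_)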
Extremely Randomized Trees
31 July 2023 11:31
Extra Trees is short for "Extremely Randomized Trees". It's a modification of the Random
Forest algorithm that changes the way the splitting points for decision tree branches are
chosen.
In traditional decision tree algorithms (and therefore in Random Forests), the optimal split
point for each feature is calculated, which involves a degree of computation. For a given node,
the feature and the corresponding optimal split point that provide the best split are chosen.
On the other hand, in the Extra Trees algorithm, for each feature under consideration, a split
point is chosen completely at random. The best-performing feature and its associated random
split are then used to split the node. This adds an extra layer of randomness to the model,
hence the name "Extremely Randomized Trees".
Because of this difference, Extra Trees tend to grow deeper (with more branches) than Random
Forests, and their splits are more arbitrary. This can sometimes lead to models that
perform better, especially on tasks where the data does not have clear optimal split points.
However, like all models, whether Extra Trees will outperform Random Forests (or any other
algorithm) depends on the specific dataset and task.
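A minimal sketch, assuming scikit-learn and a synthetic dataset, comparing ExtraTreesClassifier (random split thresholds) with RandomForestClassifier (optimised split thresholds) under cross-validation:

# Compare the two ensembles on the same data with 5-fold cross-validation
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
et = ExtraTreesClassifier(n_estimators=200, random_state=0)

print("Random Forest:", cross_val_score(rf, X, y, cv=5).mean())
print("Extra Trees  :", cross_val_score(et, X, y, cv=5).mean())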
Advantages and Disadvantages
31 July 2023 08:53
Advantages
• Robustness to Overfitting: Random Forests are less prone to overfitting compared to
individual decision trees, because they average the results from many different trees, each
of which might overfit the data in a different way.
• Handling Large Datasets: They can handle large datasets with high dimensionality
effectively.
• Less Pre-processing: Because tree splits depend only on feature orderings, Random Forests
need no scaling or normalization and can work with both categorical and numerical variables;
many implementations can also deal with missing values.
• Variable Importance: They provide insights into which features are most important for the
predictions (see the sketch after this list).
• Parallelizable: The training of individual trees can be parallelized, as they are independent
of each other. This speeds up the training process.
• Non-Parametric: Random Forests are non-parametric, meaning they make no assumptions
about the functional form of the transformation from inputs to output. This makes them
very flexible and able to model complex, non-linear relationships.
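A minimal sketch, assuming scikit-learn and its iris dataset, of reading the impurity-based variable importances mentioned above and training the trees in parallel:

# Fit a forest using all CPU cores, then inspect which features drove the splits
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
rf = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0)
rf.fit(data.data, data.target)

# Importances sum to 1; higher means the feature contributed more to the splits
for name, importance in zip(data.feature_names, rf.feature_importances_):
    print(f"{name}: {importance:.3f}")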
Disadvantages
• Model Interpretability: One of the biggest drawbacks of Random Forests is that they lack
the interpretability of simpler models like linear regression or decision trees. While you can
rank features by their importance, the model as a whole is essentially a black box.
• Performance with Unbalanced Data: Random Forests can be biased towards the majority
class when dealing with unbalanced datasets. This can sometimes be mitigated by
balancing the dataset prior to training.
• Predictive Performance: Although Random Forests generally perform well, they may not
always provide the best predictive performance. Gradient boosting machines, for instance,
often outperform Random Forests. If the relationships within the data are linear, a linear
model will likely perform better than a Random Forest.
• Inefficiency with Sparse Data: Random Forests might not be the best choice for sparse data
or text data where linear models or other algorithms might be more suitable.
• Parameter Tuning: Although Random Forests require less tuning than some other models,
there are still several parameters (like the number of trees, tree depth, etc.) that can affect
model performance and need to be optimized.
• Difficulty with High Cardinality Features: Random Forests can struggle with high cardinality
categorical features (features with a large number of distinct values). These types of
features can lead to trees that are biased towards the variables with more levels, and may
cause overfitting.
• Can't Extrapolate: Random Forest regressors cannot predict values outside the range of the
target values seen during training, so they may be less accurate than other regression
models when extrapolation is required, as illustrated in the sketch below.
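A minimal sketch, assuming scikit-learn and synthetic linear data, showing why a Random Forest regressor cannot extrapolate while a linear model can:

# Train on x in [0, 10], then ask both models to predict far outside that range
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

X_train = np.linspace(0, 10, 200).reshape(-1, 1)
y_train = 3 * X_train.ravel() + 1          # simple linear target, max value = 31

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
lr = LinearRegression().fit(X_train, y_train)

X_new = np.array([[20.0]])                  # well outside the training range
print("True value    :", 3 * 20 + 1)        # 61
print("Linear model  :", lr.predict(X_new)[0])  # close to 61
print("Random Forest :", rf.predict(X_new)[0])  # stuck near 31, the training maximum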