
RandomForest Project Report

Uploaded by

navshu35
Copyright © All Rights Reserved

Random Forest Classification on Social Network Ads Dataset

Project Title: Random Forest Classification on Social Network Ads


Name: Naveen Kumar

Date: August 1, 2025

Abstract

This project uses a Random Forest classifier to predict whether a user on a social network will purchase a product based on their age and estimated salary. The features are standardized with StandardScaler, and the model is evaluated using accuracy and a confusion matrix. A plot of the decision boundary shows clear class separation, and the model reaches 93% accuracy on the test set, demonstrating the effectiveness of ensemble learning for this task.

Table of Contents

1. Introduction
2. Literature Review
3. Problem Statement
4. Data Collection and Preprocessing
5. Methodology
6. Implementation
7. Results
8. Discussion
9. Conclusion
10. References
11. Appendices
12. Acknowledgments

Introduction

This project applies a supervised machine learning approach to predict how users respond to online advertisements. Random Forest, an ensemble of decision trees, is used with the aim of improving classification accuracy over a single decision tree. The goal is to predict whether a user will purchase a product from two simple inputs: age and estimated salary.

Literature Review

Ensemble methods such as Random Forest are widely used to reduce overfitting and improve prediction accuracy. Individual decision trees are prone to high variance, because small changes in the training data can produce very different trees. Random Forest (Breiman, 2001) reduces this variance by training many trees on bootstrap samples of the data, considering a random subset of features at each split, and aggregating their predictions. Such methods have shown significant gains in marketing and user-behavior prediction.
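To make the variance-reduction idea concrete, here is a minimal from-scratch sketch of the Random Forest recipe. Each tree is fitted on a bootstrap sample, only a random subset of features is considered at each split (max_features="sqrt"), and the trees' 0/1 predictions are combined by majority vote, as in Breiman's original formulation. scikit-learn's RandomForestClassifier instead averages the trees' predicted probabilities. The synthetic make_moons data and the helper names fit_forest and predict_forest are illustrative assumptions, not the project's code.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_forest(X, y, n_trees=10, seed=0):
    """Hypothetical helper: fit n_trees trees, each on a bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    trees = []
    n = len(X)
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # draw n rows with replacement
        tree = DecisionTreeClassifier(
            criterion="entropy",
            max_features="sqrt",  # random feature subset at each split
            random_state=int(rng.integers(0, 2**31 - 1)),
        )
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def predict_forest(trees, X):
    """Hypothetical helper: majority vote over the trees' 0/1 predictions (ties go to 0)."""
    votes = np.stack([t.predict(X) for t in trees])  # shape (n_trees, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)


# Synthetic two-feature data standing in for (age, salary)
X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

single = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X_train, y_train)
forest = fit_forest(X_train, y_train, n_trees=10)

print("single tree accuracy:", (single.predict(X_test) == y_test).mean())
print("forest accuracy:     ", (predict_forest(forest, X_test) == y_test).mean())
```

On noisy data like this, the voted ensemble typically generalizes better than one fully grown tree, although the exact gap depends on the data and the random seeds.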

Problem Statement

Predict whether a user will purchase a product based on age and estimated salary.

Assumptions:

- Only two features are used.
- Binary classification (0 = No, 1 = Yes).

Limitations:

- Does not account for other possible influences (e.g., device used, browsing time, gender).

Data Collection and Preprocessing

Dataset: Social_Network_Ads.csv

Features: Age, Estimated Salary

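Target: Purchased (0 or 1)

The data was split into training and testing sets (75% train, 25% test). Both features were then standardized with scikit-learn's StandardScaler, which rescales each feature to zero mean and unit variance. This prevents salary, measured in the tens of thousands, from dominating age numerically.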
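The split and scaling steps can be sketched as follows. The CSV is not bundled with the report, so this sketch builds a hypothetical 400-row stand-in with assumed column names (Age, EstimatedSalary, Purchased) and a toy labelling rule. With the real file you would use pd.read_csv("Social_Network_Ads.csv") instead. The split's random_state and the choice to fit the scaler on the training set only are assumptions of this sketch, not details stated in the report.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for Social_Network_Ads.csv; column names are assumed
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Age": rng.integers(18, 61, size=400),
    "EstimatedSalary": rng.integers(15_000, 150_001, size=400),
})
# Toy labelling rule, for illustration only
df["Purchased"] = ((df["Age"] > 42) | (df["EstimatedSalary"] > 110_000)).astype(int)

X = df[["Age", "EstimatedSalary"]].to_numpy(dtype=float)
y = df["Purchased"].to_numpy()

# 75% train / 25% test, as in the report (random_state=0 is an assumption)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Learn mean/std from the training set only, then apply them to both sets
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

print(X_train.shape, X_test.shape)  # (300, 2) (100, 2)
print(X_train.mean(axis=0).round(6), X_train.std(axis=0).round(6))
```

Fitting the scaler only on the training rows keeps test-set statistics from leaking into preprocessing. The same fitted scaler must later be applied to any new input.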

Methodology

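We used a Random Forest classifier with:

- 10 decision trees (n_estimators=10)
- the entropy (information gain) criterion for choosing splits (criterion='entropy')

Rationale:

- Combining many decorrelated trees reduces the overfitting typical of a single deep decision tree.
- The model can learn non-linear decision boundaries, so it handles data that is not linearly separable better than linear models do.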
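With criterion='entropy', each tree chooses the split that maximizes information gain. This is the drop in Shannon entropy H(S) = -sum_k p_k log2(p_k) from a parent node to the size-weighted average of its children. The small sketch below computes both quantities on a hypothetical node of 4 non-buyers and 4 buyers. The helper names are illustrative only. The log base does not affect which split wins, because it rescales every candidate's gain by the same factor.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float((p * np.log2(1.0 / p)).sum())  # equals -sum(p * log2(p))


def information_gain(parent, left, right):
    """Entropy reduction from splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted_child = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted_child


# Hypothetical node: 4 non-buyers (0) and 4 buyers (1)
parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(entropy(parent))  # 1.0

# A perfect split (e.g. on some age threshold) removes all uncertainty
print(information_gain(parent, parent[:4], parent[4:]))  # 1.0

# A useless split leaves each child as mixed as the parent
print(information_gain(parent, parent[[0, 1, 4, 5]], parent[[2, 3, 6, 7]]))  # 0.0
```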

Implementation

Language: Python

Libraries: scikit-learn, pandas, matplotlib, numpy

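Code Example:

from sklearn.ensemble import RandomForestClassifier

# x_train, y_train: standardized training features (age, estimated salary) and 0/1 purchase labels
# random_state=0 makes the bootstrap sampling and feature selection reproducible
classifier = RandomForestClassifier(n_estimators=10, criterion='entropy', random_state=0)
classifier.fit(x_train, y_train)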
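One way to make sure new inputs are always scaled the same way as the training data is to wrap StandardScaler and the classifier in a scikit-learn Pipeline. The sketch below does that with the report's hyperparameters and evaluates on a held-out 25%. The 400-row dataset and its labelling rule are hypothetical stand-ins for Social_Network_Ads.csv, so the printed numbers will not match the report's results. The Pipeline itself is a design choice of this sketch, not necessarily how the original script is organized.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data with the report's two features: [age, estimated salary]
rng = np.random.default_rng(0)
age = rng.integers(18, 61, size=400)
salary = rng.integers(15_000, 150_001, size=400)
X = np.column_stack([age, salary]).astype(float)
y = ((age > 42) | (salary > 110_000)).astype(int)  # toy labelling rule

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Scaling + forest as one estimator; forest hyperparameters match the report
model = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=10, criterion="entropy", random_state=0),
)
model.fit(X_train, y_train)  # the scaler is fitted on the training data only

y_pred = model.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print("accuracy:", accuracy_score(y_test, y_pred))

# Raw (unscaled) inputs are fine here: the pipeline standardizes them first
print(model.predict([[30, 87000], [40, 0]]))
```

Without the pipeline, the equivalent call is classifier.predict(scaler.transform(raw_inputs)), where scaler is the StandardScaler already fitted on the training data.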

Results

Model Predictions:

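classifier.predict([[30, 87000]]) -> [1]
classifier.predict([[40, 0]]) -> [0]

Because the classifier was trained on standardized features, new raw inputs such as [30, 87000] (age 30, estimated salary 87,000) should be transformed with the same fitted StandardScaler before being passed to predict.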

Confusion Matrix:

[[64  4]
 [ 3 29]]

Accuracy: 93% (64 + 29 = 93 of the 100 test samples classified correctly)

The decision-boundary plot shows clear class separation.
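A decision-boundary plot like the one described above is usually produced by predicting a class for every point of a dense grid over the two scaled features and shading the resulting regions. The sketch below follows that approach with matplotlib's contourf. The training data is a hypothetical stand-in, and the grid step, padding, colour map, labels and output filename are assumptions of this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: [age, estimated salary] with a toy labelling rule
rng = np.random.default_rng(0)
X = np.column_stack([rng.integers(18, 61, 300), rng.integers(15_000, 150_001, 300)]).astype(float)
y = ((X[:, 0] > 42) | (X[:, 1] > 110_000)).astype(int)

sc = StandardScaler()
X_scaled = sc.fit_transform(X)
clf = RandomForestClassifier(n_estimators=10, criterion="entropy", random_state=0)
clf.fit(X_scaled, y)

# Dense grid over the scaled feature space, padded by 1 unit on each side
x0, x1 = np.meshgrid(
    np.arange(X_scaled[:, 0].min() - 1, X_scaled[:, 0].max() + 1, 0.02),
    np.arange(X_scaled[:, 1].min() - 1, X_scaled[:, 1].max() + 1, 0.02),
)
grid_pred = clf.predict(np.c_[x0.ravel(), x1.ravel()]).reshape(x0.shape)

fig, ax = plt.subplots()
ax.contourf(x0, x1, grid_pred, alpha=0.3, cmap="RdYlGn")  # predicted class regions
ax.scatter(X_scaled[:, 0], X_scaled[:, 1], c=y, cmap="RdYlGn", edgecolors="k", s=15)
ax.set_xlabel("Age (scaled)")
ax.set_ylabel("Estimated Salary (scaled)")
ax.set_title("Random Forest decision boundary (training set)")
fig.savefig("decision_boundary.png")
```

Because each tree splits on one feature at a time, the shaded regions of a Random Forest are made of axis-aligned rectangles rather than a smooth curve.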

Discussion

The model performed well, with 93% accuracy. Only 7 of the 100 test samples were misclassified (4 and 3 in the two off-diagonal cells of the confusion matrix). Some of these misclassifications may be due to the limited feature space or to overlapping classes in the dataset.
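Accuracy alone hides which kind of error occurs. The short computation below derives precision, recall and F1 for the "purchased" class from the reported confusion matrix. It assumes the matrix follows scikit-learn's confusion_matrix layout (rows = actual class, columns = predicted class). Under that assumption the 4 errors are non-buyers predicted as buyers and the 3 errors are missed buyers.

```python
import numpy as np

# Confusion matrix as reported; the layout (rows = actual, columns = predicted,
# as in sklearn.metrics.confusion_matrix) is an assumption
cm = np.array([[64, 4],
               [3, 29]])

tn, fp, fn, tp = cm.ravel()
accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)  # of users predicted to buy, fraction who bought
recall = tp / (tp + fn)     # of users who bought, fraction the model caught
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# accuracy=0.93 precision=0.879 recall=0.906 f1=0.892
```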

Conclusion

The Random Forest model performed strongly on this binary classification task. The approach could be extended with additional features (for example, those listed under Limitations) to improve predictive power. This work illustrates how ensemble models can be applied to marketing and recommendation problems.

References

- scikit-learn documentation
- Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5-32.
- Dataset: Social_Network_Ads.csv

Appendices

How to Reproduce:

1. Install dependencies:
   pip install numpy pandas matplotlib scikit-learn
2. Run the script:
   python social_ads_rf.py

Acknowledgments

Thanks to the open-source community, in particular the contributors to scikit-learn and matplotlib.
