Project Report on Credit Risk Analysis Using Random Forest

This document outlines a project focused on credit risk analysis using a Random Forest classifier to predict loan defaults based on a dataset containing various applicant characteristics. It details the dataset, preprocessing steps, exploratory data analysis, model training, and performance evaluation, highlighting the Random Forest model's accuracy of 91.77% compared to baseline models. Future directions include enhancing the model through advanced feature engineering and incorporating external data sources for improved predictive accuracy.

Credit Risk Analysis Using Random Forest

1. Introduction and Dataset Description

Introduction

In today's financial landscape, assessing credit risk is crucial for lenders and financial
institutions. This project analyzes credit risk using machine learning techniques: the goal is to
predict loan defaults by developing a model, specifically a Random Forest classifier, on a given
dataset. The project offers a simplified view of the factors that contribute to credit risk, making
it a good opportunity for data scientists to apply their skills in machine learning and predictive
modeling.

Dataset Description

The dataset provides essential information about loan applicants and their characteristics. Key
features include:

- ID: Unique identifier for each loan applicant.
- Age: Age of the loan applicant.
- Income: Income of the loan applicant.
- Home: Home ownership status (Own, Mortgage, Rent).
- Emp_Length: Employment length in years.
- Intent: Purpose of the loan (e.g., education, home improvement).
- Amount: Loan amount applied for.
- Rate: Interest rate on the loan.
- Status: Loan approval status (Fully Paid, Charged Off, Current).
- Percent_Income: Loan amount as a percentage of income.
- Default: Whether the applicant has defaulted on a loan previously (Yes, No).
- Cred_Length: Length of the applicant's credit history.

The dataset is valuable for assessing credit risk, a crucial task for lenders and financial
institutions. Proper handling and analysis of this data can lead to better credit risk assessment
methods.

2. Importing Libraries

Essential libraries are imported for data manipulation, visualization, and machine learning; a
sketch of the corresponding import statements follows the list. These include:

- Pandas and NumPy for data manipulation.
- Seaborn and Matplotlib for data visualization.
- Plotly for interactive visualizations.
- Scikit-learn for machine learning models and evaluation.
- Yellowbrick for visualizing model performance.
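
A minimal set of import statements matching this stack might look like the following; the exact modules used in the original notebook are not shown in the report, so this is a representative sketch.

```python
# Data manipulation
import pandas as pd
import numpy as np

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

# Machine learning: preprocessing, models, and evaluation
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Yellowbrick for visualizing classifier performance
from yellowbrick.classifier import ConfusionMatrix
```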
3. Data Loading and Data Preprocessing

Data Loading

The dataset is loaded into a DataFrame to facilitate data manipulation and analysis. This step
involves reading the data from a CSV file.
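
A minimal loading step, assuming the file is named credit_risk.csv (the report does not give the exact file name), could be:

```python
import pandas as pd

# Read the applicant data from CSV (file name assumed)
df = pd.read_csv("credit_risk.csv")

# Inspect the shape and the first few rows
print(df.shape)
print(df.head())
```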

Data Preprocessing

Data preprocessing is a critical step in preparing the dataset for analysis and modeling. This
process begins with handling missing values, which are common in real-world data. In this
project, missing values in the Emp_Length and Rate columns were addressed with a mean
imputation strategy: by applying the SimpleImputer from Scikit-learn, we replaced the missing
values in these columns with the column mean, ensuring that no important information was lost
and that the dataset remained complete.

To further ensure the integrity and quality of the dataset, we removed all duplicate rows; this
step is crucial because duplicates can skew analysis and model training. The ID column, which
serves only as a unique identifier and does not contribute to predictive modeling, was dropped
to streamline the data. Finally, we inspected the number of unique values in each column to
better understand the dataset's structure and to identify any potential issues or insights
regarding the diversity of the data. This thorough preprocessing ensures that the dataset is
clean, consistent, and ready for the subsequent exploratory data analysis and model building.
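
The preprocessing described above could be expressed roughly as follows; the column names follow the dataset description, and the identifier column name in the actual CSV may differ.

```python
from sklearn.impute import SimpleImputer

# Replace missing values in Emp_Length and Rate with the column mean
imputer = SimpleImputer(strategy="mean")
df[["Emp_Length", "Rate"]] = imputer.fit_transform(df[["Emp_Length", "Rate"]])

# Remove duplicate rows so they do not skew analysis or training
df = df.drop_duplicates()

# Drop the identifier column, which carries no predictive signal
# (written as "ID" in this report; adjust to the actual column name)
df = df.drop(columns=["ID"])

# Inspect the number of unique values per column
print(df.nunique())
```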

4. Exploratory Data Analysis (EDA)

Exploratory Data Analysis (EDA) is an essential step in understanding the underlying patterns
and relationships in the dataset. It helps in visualizing the distribution of data, identifying
outliers, and discovering trends that can inform feature engineering and model building.

Target Feature: Status

The Status column, which indicates the loan approval status (Fully Paid, Charged Off, Current),
is the target variable for our predictive modeling. To understand its distribution, we use a count
plot:
This visualization helps us see the frequency of each loan status category, providing insight into
class imbalances that may need to be addressed in the modeling phase.
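
A count plot of the target can be drawn with Seaborn, roughly as below; the same call with a different x column is what the later count plots for Home, Intent, and Default would use.

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Frequency of each loan status category
sns.countplot(x="Status", data=df)
plt.title("Distribution of Loan Status")
plt.show()
```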

Home Ownership Status

The Home column indicates the home ownership status of the loan applicants. We visualize its
distribution using a count plot:

This plot helps us understand the distribution of home ownership types (Own, Mortgage, Rent)
among the applicants.

Loan Intent

The Intent column specifies the purpose of the loan, such as education or home improvement. Its
distribution is visualized as follows:

This visualization reveals the most common reasons for loan applications, which can be crucial
for understanding different segments of loan applicants.
Default History

The Default column indicates whether the applicant has defaulted on a loan previously. Its
distribution is visualized using a count plot:

Understanding the proportion of applicants with a default history helps in assessing the risk
profile of the dataset.

Converting Categorical Columns to Numeric

To prepare the dataset for machine learning algorithms, we convert categorical columns to
numeric using LabelEncoder:

This step ensures that all features are in a numerical format, suitable for feeding into machine
learning models.
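
A sketch of this encoding step, assuming Home, Intent, Default, and Status are the categorical columns:

```python
from sklearn.preprocessing import LabelEncoder

# Encode each categorical column as integer labels
categorical_cols = ["Home", "Intent", "Default", "Status"]
encoder = LabelEncoder()
for col in categorical_cols:
    df[col] = encoder.fit_transform(df[col])
```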
Histogram of Dataset

Visualizing the distribution of numerical features using histograms:

Histograms provide a visual summary of the data distribution for each feature, highlighting
skewness, kurtosis, and the presence of outliers.
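
Pandas can draw histograms for all numerical columns in one call; a sketch:

```python
import matplotlib.pyplot as plt

# Histogram for every numerical feature in the DataFrame
df.hist(figsize=(12, 10), bins=30)
plt.tight_layout()
plt.show()
```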

Heatmap of Correlation Matrix

The heatmap visualizes the correlation matrix of the features, indicating how features are related
to each other and to the target variable:
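
With all columns numeric after encoding, the heatmap can be produced roughly as follows:

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Correlation matrix of all features, visualized as a heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(df.corr(), annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Correlation Matrix")
plt.show()
```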
Correlations with Status

Analyzing the correlation of each feature with the status:

This bar plot highlights the features most strongly correlated with the target variable, providing
insights into which features might be most predictive of loan status.
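
One way to produce such a bar plot is to sort each feature's correlation with Status; a sketch:

```python
import matplotlib.pyplot as plt

# Correlation of each feature with the target, sorted for readability
corr_with_status = df.corr()["Status"].drop("Status").sort_values()
corr_with_status.plot(kind="barh", figsize=(8, 6))
plt.title("Feature Correlations with Loan Status")
plt.xlabel("Correlation coefficient")
plt.show()
```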
5. Model Building

Data Splitting

We start by splitting the dataset into training and testing sets, ensuring an unbiased assessment of
the model's performance. The training set is used to train the model, and the testing set is used to
evaluate its effectiveness on unseen data.
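
A typical split holds out a portion of the data for testing; the 80/20 ratio and random seed below are assumptions, since the report does not state them.

```python
from sklearn.model_selection import train_test_split

# Separate the features from the target column
X = df.drop(columns=["Status"])
y = df["Status"]

# Hold out 20% of the rows for unbiased evaluation (split ratio assumed)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```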

Training the Random Forest Model

We focus on the Random Forest algorithm, which combines multiple decision trees to improve
prediction accuracy and robustness. The Random Forest model is trained on the training data to
learn the patterns and relationships within the dataset.
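
Training then reduces to fitting the classifier on the training split; default hyperparameters are assumed here, since the report does not list them.

```python
from sklearn.ensemble import RandomForestClassifier

# Fit a Random Forest on the training data (default hyperparameters assumed)
rf_model = RandomForestClassifier(random_state=42)
rf_model.fit(X_train, y_train)
```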

Making Predictions

Once the model is trained, we use it to predict the loan statuses in the testing set. This step
involves applying the trained model to new data to assess its predictive capabilities.
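
Prediction is then a single call on the held-out features:

```python
# Predict loan statuses for the unseen test set
y_pred = rf_model.predict(X_test)
```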

Evaluating Model Performance

We evaluate the model's performance using several metrics; a sketch of the corresponding
scikit-learn calls follows the list.

- Accuracy Score: The Random Forest model achieved an accuracy score of 91.77%, indicating its high predictive capability.
- Confusion Matrix: Provides a detailed breakdown of the true positives, false positives, true negatives, and false negatives.
- Classification Report: Offers detailed metrics like precision, recall, and F1-score for each class.
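
A sketch of computing these metrics with scikit-learn, using the predictions from the previous step:

```python
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Overall accuracy on the test set
print("Accuracy:", accuracy_score(y_test, y_pred))

# Per-class breakdown of correct and incorrect predictions
print(confusion_matrix(y_test, y_pred))

# Precision, recall, and F1-score for each class
print(classification_report(y_test, y_pred))
```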

Feature Importances

Random Forest provides a measure of feature importance, indicating which features are most
influential in making predictions.
This bar plot visualizes the importance of each feature, helping to identify the most critical
factors influencing loan approval status.
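
The importances stored on the fitted model can be plotted directly; a sketch:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Feature importances learned by the Random Forest, sorted for plotting
importances = pd.Series(rf_model.feature_importances_, index=X.columns).sort_values()
importances.plot(kind="barh", figsize=(8, 6))
plt.title("Random Forest Feature Importances")
plt.xlabel("Importance")
plt.show()
```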

6. Comparing Model Performance with Baseline Models

To ensure that the Random Forest model is the best choice, we compare its performance with
other baseline models like Logistic Regression and K-Nearest Neighbors.

The Random Forest model outperformed the baseline models, achieving an accuracy of 91.77%
compared to 80.58% for Logistic Regression and 82.82% for K-Nearest Neighbors. This
indicates that the Random Forest model is more effective in predicting credit risk, providing
more accurate and reliable results.
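
The comparison can be reproduced by training each baseline on the same split and reporting its test accuracy; the models below use default settings (an assumption), and the exact figures depend on the data and random seed.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Baseline models trained on the same split as the Random Forest
baselines = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Nearest Neighbors": KNeighborsClassifier(),
}

for name, model in baselines.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {acc:.2%}")
```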

Future Directions

To enhance the credit risk analysis model, future efforts should focus on advanced feature
engineering, handling imbalanced data through resampling or cost-sensitive learning, and
optimizing model performance with hyperparameter tuning and ensemble methods.
Incorporating external data sources, such as credit bureau scores and macroeconomic indicators,
can further improve predictive accuracy. Additionally, using tools like SHAP values for model
interpretation, ensuring regulatory compliance, and addressing ethical considerations will help
build a robust, transparent, and fair predictive model. Deploying and monitoring the model in a
real-time environment will ensure its ongoing effectiveness and reliability.
