Credit Risk Modeling in Python: Chapter 4
and implementation
Michael Crabtree
Data Scientist, Ford Motor Company
Comparing classification reports
Create the reports with classification_report() and compare
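A minimal sketch of comparing two models' reports side by side, assuming scikit-learn is installed. The arrays y_test, preds_lr and preds_gbt are hypothetical stand-ins for the true loan_status values and the predicted loan_status from two models (for example a logistic regression and gradient boosted trees).

```python
import numpy as np
from sklearn.metrics import classification_report

# Hypothetical true loan_status values (1 = default)
y_test = np.array([0, 0, 0, 1, 1, 0, 1, 0, 0, 1])
# Hypothetical predicted loan_status from two different models
preds_lr = np.array([0, 0, 1, 1, 0, 0, 1, 0, 0, 0])
preds_gbt = np.array([0, 0, 0, 1, 1, 0, 1, 0, 1, 1])

target_names = ['Non-Default', 'Default']

# Human-readable reports, one per model
print(classification_report(y_test, preds_lr, target_names=target_names))
print(classification_report(y_test, preds_gbt, target_names=target_names))

# output_dict=True returns nested dicts, which makes metric-by-metric comparison easy
report_lr = classification_report(y_test, preds_lr, target_names=target_names, output_dict=True)
report_gbt = classification_report(y_test, preds_gbt, target_names=target_names, output_dict=True)
for metric in ['precision', 'recall', 'f1-score']:
    print(metric, round(report_lr['Default'][metric], 3), round(report_gbt['Default'][metric], 3))
```

Comparing the Default row matters most here, since its recall measures how many true defaults each model catches.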
For a sample of loans, the average predicted probability of default should be close to the percentage of
defaults actually observed in that sample; a model with this property is well calibrated
Loans in sample | Average prob_default | Actual default rate | Calibrated?
10 | 0.25 | 0.65 | No
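A minimal sketch of this sample-level check, using NumPy. The ten prob_default and loan_status values and the 0.05 tolerance are hypothetical, chosen only to illustrate a sample that is not well calibrated.

```python
import numpy as np

# Hypothetical sample of 10 loans: predicted probabilities of default
prob_default = np.array([0.10, 0.15, 0.20, 0.22, 0.25, 0.28, 0.30, 0.32, 0.33, 0.35])
# Hypothetical actual outcomes for the same loans (1 = default)
loan_status = np.array([0, 1, 1, 0, 1, 1, 0, 1, 1, 0])

avg_prob = prob_default.mean()     # average predicted probability of default
default_rate = loan_status.mean()  # observed share of defaults in the sample

# Hypothetical tolerance for calling the sample "close enough"
tolerance = 0.05
is_calibrated = abs(avg_prob - default_rate) <= tolerance

print(f"Average prob_default: {avg_prob:.2f}, actual default rate: {default_rate:.2f}")
print("Calibrated?", "Yes" if is_calibrated else "No")
```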
Further reading: http://datascienceassn.org/sites/default/files/Predicting%20good%20probabilities%20with%20supervised%20learning.
# Fraction of positives
(array([0.09602649, 0.19521012, 0.62035996, 0.67361111]),
# Average probability
array([0.09543535, 0.29196742, 0.46898465, 0.65512207]))
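The two arrays above have the shape returned by scikit-learn's calibration_curve(): the fraction of positives per bin first, then the average predicted probability per bin. A minimal sketch, assuming scikit-learn is installed; the data is simulated and hypothetical (outcomes are drawn with probability prob_default squared, so the model deliberately over-predicts defaults).

```python
import numpy as np
from sklearn.calibration import calibration_curve

# Hypothetical predicted probabilities of default for a test set
rng = np.random.default_rng(42)
prob_default = rng.uniform(0, 1, size=2000)
# Simulated outcomes: true default probability is prob_default**2, so predictions run high
y_test = (rng.uniform(0, 1, size=2000) < prob_default ** 2).astype(int)

# Returns (fraction of positives, average predicted probability) for each of 4 equal-width bins
frac_of_pos, mean_pred_val = calibration_curve(y_test, prob_default, n_bins=4)

for frac, pred in zip(frac_of_pos, mean_pred_val):
    print(f"avg predicted {pred:.3f} vs actual {frac:.3f} (gap {frac - pred:+.3f})")
```

Bins where the gap is large are the prob_default ranges in which the model is poorly calibrated.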
Thresholds and loan status
Previously, we chose a threshold within the range of prob_default values
This threshold was used to change the predicted loan_status of each loan: loans with prob_default above it are predicted as defaults (1)
Loan | prob_default | Threshold | loan_status
1 | 0.25 | 0.4 | 0
2 | 0.42 | 0.4 | 1
3 | 0.75 | 0.4 | 1
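A minimal sketch of applying the threshold to the three example loans, assuming pandas and NumPy. The DataFrame name preds_df is hypothetical; a value exactly equal to the threshold is treated as non-default here, which is a modeling choice rather than a rule.

```python
import numpy as np
import pandas as pd

# The three example loans, indexed by loan number
preds_df = pd.DataFrame({'prob_default': [0.25, 0.42, 0.75]}, index=[1, 2, 3])
threshold = 0.4

# Loans with prob_default strictly above the threshold are predicted defaults (1)
preds_df['loan_status'] = np.where(preds_df['prob_default'] > threshold, 1, 0)
print(preds_df)
```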
Acceptance rate: the percentage of new loans that are accepted, chosen to keep the number of defaults in the
portfolio low
Accepted loans that turn out to be defaults have an impact similar to false negatives
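import numpy as np
# prob_default: the model's predicted probabilities of default for the loans
# Compute the threshold for an 85% acceptance rate
threshold = np.quantile(prob_default, 0.85)
print(threshold)  # 0.804 in this example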
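A minimal end-to-end sketch of turning an acceptance rate into accept/reject decisions, assuming pandas and NumPy. The 1,000 simulated prob_default values and the names test_pred_df and pred_loan_status are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical predicted probabilities of default for 1,000 new loan applications
rng = np.random.default_rng(0)
test_pred_df = pd.DataFrame({'prob_default': rng.beta(2, 5, size=1000)})

acceptance_rate = 0.85
# 85th percentile of prob_default: about 85% of loans fall at or below it
threshold = np.quantile(test_pred_df['prob_default'], acceptance_rate)

# Loans above the threshold are predicted defaults (1) and rejected; the rest are accepted
test_pred_df['pred_loan_status'] = np.where(test_pred_df['prob_default'] > threshold, 1, 0)

print(f"threshold = {threshold:.3f}")
print(test_pred_df['pred_loan_status'].value_counts())
```

Using the quantile rather than a fixed cutoff ties the threshold directly to the share of applications the business wants to accept.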
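Accepted loans that turn out to be defaults are loans whose prob_default values fall in the range where the model is not well calibrated
When computing the bad rate, .count() on a single column gives the same number as the row count of the data frame, as long as that column has no missing values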
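A minimal sketch of the bad rate (the share of accepted loans that are actually defaults), assuming pandas. The accepted_loans data and the column name true_loan_status are hypothetical.

```python
import pandas as pd

# Hypothetical accepted loans with their true outcome (1 = default)
accepted_loans = pd.DataFrame({
    'prob_default': [0.05, 0.12, 0.30, 0.45, 0.52, 0.61, 0.70, 0.78],
    'true_loan_status': [0, 0, 0, 1, 0, 1, 0, 0],
})

# .count() on one column equals the row count here because the column has no missing values
n_defaults = accepted_loans[accepted_loans['true_loan_status'] == 1]['true_loan_status'].count()
bad_rate = n_defaults / accepted_loans['true_loan_status'].count()

print(f"Bad rate: {bad_rate:.2%}")
```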
Selecting acceptance rates
The first acceptance rate was set to 85%, but other rates could be chosen as well
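A minimal sketch of comparing several acceptance rates in one table, assuming pandas and NumPy. The simulated test set (outcomes drawn from the predicted probabilities) and the column names are hypothetical; for each rate it computes the threshold and the resulting bad rate among accepted loans.

```python
import numpy as np
import pandas as pd

# Hypothetical test set: predicted PDs and true outcomes drawn from those PDs
rng = np.random.default_rng(1)
prob_default = rng.beta(2, 5, size=5000)
true_loan_status = (rng.uniform(size=5000) < prob_default).astype(int)

accept_rates = [1.0, 0.95, 0.9, 0.85, 0.8, 0.75, 0.7]
rows = []
for rate in accept_rates:
    # Threshold that accepts this share of loans
    thresh = np.quantile(prob_default, rate)
    accepted = prob_default <= thresh
    # Bad rate: share of accepted loans that are defaults
    bad_rate = true_loan_status[accepted].mean()
    rows.append((rate, round(thresh, 3), round(bad_rate, 3)))

strat_df = pd.DataFrame(rows, columns=['Acceptance Rate', 'Threshold', 'Bad Rate'])
print(strat_df)
```

Reading down the table shows the trade-off: lower acceptance rates give lower thresholds and lower bad rates, but fewer loans are accepted.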
Your journey...so far
Prepare credit data for machine learning models
Important to understand the data
Develop, score, and understand logistic regressions and gradient boosted trees
Structural model framework: the model explains the default event based on other factors
Other techniques
Through-the-cycle model (continuous time): macro-economic conditions and other effects are
used, but the risk is seen as an independent event
In many cases, business users will not accept a model they cannot understand
Complex models can be very large and difficult to put into production