Assignment 2

A logistic regression model was built using training data to identify customers who have never delayed payments ("good") versus those who have delayed at least once ("bad") based on various predictor variables. The model's performance was validated against additional data and adding a region variable improved accuracy. Total potential profit was calculated based on true/false positive/negative classifications and costs. Alternative models and how the model can help managerial decisions were also examined.

Uploaded by

AbhishekKumar
Assignment 2

Due Date: July 25, 2018 (11 AM)
Presentation: July 25, 2018

Consider the problem of “Retail Credit Scoring for Auto Finance Ltd.” (Case 5). Auto Finance Ltd., a
part of one of India’s large conglomerates, provides loans that enable cash-strapped lower-middle-class
Indian customers to buy two-wheelers. A major problem observed by the IT team of Auto
Finance Ltd. is a high default rate. Specifically, they have observed that approximately 71% of
the customers have delayed their repayments. They are therefore interested in developing a model
that will help them decide whether or not to extend a loan to a prospective customer.
Based on the attached data set “Assignment_2_data.csv” and the list of variables provided in Exhibit
2 of the case, answer the following questions.

1. Build a logistic regression model on the training data set to identify good customers and
bad customers. A good customer is one who has never delayed a payment, whereas a bad
customer is one who has delayed a payment at least once. Use the variables “AGE”,
“NOOFDEPE”, “MTHINCTH”, “SALDATFR”, “TENORYR”, “DWNPMFR”, “PROFBUS”,
“QUALHSC”, “QUAL_PG”, “SEXCODE”, “FULLPDC”, “FRICODE” and “WASHCODE” as predictors
in your logistic model. Clearly interpret the output of the model.
2. Judge the performance of the model on the validation data set. Is the performance of the
model satisfactory? Consider at least two criteria.
3. Include the variable “Region” as an additional predictor in your logistic model. Note that you
have to create appropriate dummy variables for “Region”. Does the inclusion of “Region”
improve the performance of the model?
4. Suppose Auto Finance Ltd. provides loans for a 2-year period. The management of Auto
Finance Ltd. has estimated that the profit associated with a “True Positive” case is Rs. 6360.
Furthermore, they have estimated that the losses associated with a “False Negative” case and
a “False Positive” case are Rs. 12500 and Rs. 6360, respectively. Based on the confusion matrix
obtained for the validation data set, calculate the total profit for the company.
5. Can you suggest an alternative model? Is the alternative model better than the logistic
regression model?
6. How will the fitted model be helpful in taking managerial decisions?
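
The arithmetic behind question 4 can be sketched in R as follows. This is only an illustration under stated assumptions: the function name `total_profit` is hypothetical, true negatives are assumed to contribute nothing (the assignment gives no figure for them), and the counts in the example call are made up, not results from the case data.

```r
# Profit and loss figures (Rs.) taken from question 4
profit_tp <- 6360   # profit per "True Positive" case
loss_fn   <- 12500  # loss per "False Negative" case
loss_fp   <- 6360   # loss per "False Positive" case

# Total profit from confusion-matrix counts.
# Assumption: true negatives add nothing, since no figure is given for them.
total_profit <- function(tp, fn, fp) {
  tp * profit_tp - fn * loss_fn - fp * loss_fp
}

# Hypothetical counts, for illustration only (not results from the case data)
total_profit(tp = 100, fn = 20, fp = 30)  # 195200
```

With real data, the three counts would be read off the confusion matrix for the validation set, after deciding which class is treated as "positive".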

Hint: 1. You may start with the following code:


# Reading the data set
# (change the file name if your copy is saved as "Assignment_2_data.csv")
d <- read.csv("case_data.csv", header = TRUE)
attach(d)
names(d)

# Creating a hold-out data set
train = (DATASET == "BUILD")
d.test = d[!train, ]

# Creating an array of "DefaulterFlag" variable for the training data
# May be required later
DefaulterFlag_train = DefaulterFlag[train]

# Creating an array of "DefaulterFlag" variable for the hold-out data
# May be required later
DefaulterFlag_test = DefaulterFlag[!train]

# Model
mod = glm(DefaulterFlag ~ AGE + NOOFDEPE + MTHINCTH + SALDATFR + TENORYR + DWNPMFR +
            PROFBUS + QUALHSC + QUAL_PG + SEXCODE + FULLPDC + FRICODE + WASHCODE,
          data = d, family = binomial, subset = train)
summary(mod)
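
One possible way to judge a fitted model on hold-out data is sketched below. It is not the expected solution. It runs on a small synthetic data frame (`toy`) with made-up values, and it assumes that `DefaulterFlag` is coded 1 for a "bad" customer and 0 for a "good" one, that hold-out rows carry a label other than "BUILD" (the label "VALIDATE" is hypothetical), and that 0.5 is used as the classification cut-off.

```r
set.seed(1)
n <- 400

# Synthetic stand-in for the case data (all values are made up)
toy <- data.frame(
  DATASET  = sample(c("BUILD", "VALIDATE"), n, replace = TRUE, prob = c(0.7, 0.3)),
  AGE      = round(runif(n, 21, 60)),
  MTHINCTH = round(runif(n, 3, 30), 1)
)
# Assumed coding: 1 = "bad" customer (delayed at least once), 0 = "good"
p <- plogis(1.5 - 0.03 * toy$AGE - 0.02 * toy$MTHINCTH)
toy$DefaulterFlag <- rbinom(n, 1, p)

# Fit on the training rows only
train <- toy$DATASET == "BUILD"
fit <- glm(DefaulterFlag ~ AGE + MTHINCTH, data = toy,
           family = binomial, subset = train)

# Predicted default probabilities for the hold-out rows
prob_test <- predict(fit, newdata = toy[!train, ], type = "response")

# Classify with an assumed cut-off of 0.5
pred_test   <- factor(as.integer(prob_test > 0.5), levels = c(0, 1))
actual_test <- factor(toy$DefaulterFlag[!train], levels = c(0, 1))

# Confusion matrix and three summary criteria
cm <- table(Predicted = pred_test, Actual = actual_test)
accuracy    <- sum(diag(cm)) / sum(cm)
sensitivity <- cm["1", "1"] / sum(cm[, "1"])  # share of actual 1s predicted as 1
specificity <- cm["0", "0"] / sum(cm[, "0"])  # share of actual 0s predicted as 0

print(cm)
print(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity))
```

Because about 71% of customers in the case have delayed repayments, accuracy alone can look acceptable for a model that labels nearly everyone as "bad", so it is worth reading it alongside sensitivity and specificity.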

2. You may have to define dummy variables for “Region” as follows. Note that the reference category is all other
regions. Include the dummy variables in your model.
# Region Code
# Refer to columns as d$...: attach(d) took a copy of d, so columns added to d
# afterwards (such as d$AP2) cannot be found by their bare names
d$AP2     <- as.factor(ifelse(d$Region == "AP2", 1, 0))
d$Chennai <- as.factor(ifelse(d$Region == "Chennai", 1, 0))
d$KA1     <- as.factor(ifelse(d$Region == "KA1", 1, 0))
d$KE2     <- as.factor(ifelse(d$Region == "KE2", 1, 0))
d$TN1     <- as.factor(ifelse(d$Region == "TN1", 1, 0))
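
As a cross-check on the dummy coding, the same design can be sketched with a single factor whose reference level collects all other regions. This is an illustrative alternative, not part of the hint: the toy `region` vector and the names `named`, `region_grp` and `X` are hypothetical, and the region labels "AP1" and "KA2" are invented stand-ins for regions outside the five listed.

```r
# Toy region values; "AP1" and "KA2" are invented examples of "other" regions
region <- c("AP2", "Chennai", "KA1", "KE2", "TN1", "AP1", "KA2", "Chennai")

# The five regions that get their own dummy variable in the hint
named <- c("AP2", "Chennai", "KA1", "KE2", "TN1")

# Collapse everything else into "Other" and make it the first (reference) level
region_grp <- factor(ifelse(region %in% named, region, "Other"),
                     levels = c("Other", named))

# model.matrix() shows the 0/1 columns a model formula would create for this factor
X <- model.matrix(~ region_grp)
print(colnames(X))
print(X[, -1])
```

Rows belonging to "Other" have zeros in all five dummy columns, which is what "the reference region is all others" means. Passing a factor like this to `glm()` in place of the five separate dummies should give the same fitted model.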
