This document provides an overview of binary logistic regression. It explains that binary logistic regression is used when the dependent variable is dichotomous (coded as 0 and 1). Ordinary least squares regression is not appropriate in this case because the assumptions of OLS are violated. Binary logistic regression transforms the dependent variable using the logit function to create a linear model that addresses these violations. The document outlines the key aspects of binary logistic regression including interpreting the coefficients, goodness of fit tests, and using the model to predict probabilities.
Binary Logistic Regression Lecture 9
Binary Logistic Regression
Binary logistic regression is a type of regression analysis where the dependent variable is a dummy variable (coded 0, 1). Why not just use ordinary least squares? Ŷ = a + bX. You would typically get the correct answers in terms of the sign and significance of the coefficients. However, there are three problems.
Binary Logistic Regression
OLS on a dichotomous dependent variable: [Figure: scatter of Y = Support Privatizing Social Security (No = 0, Yes = 1) against X = Income, with a fitted OLS line]
Binary Logistic Regression
However, there are three problems:
1. The error terms are heteroskedastic (the variance of the dependent variable differs at different values of the independent variables).
2. The error terms are not normally distributed.
3. The predicted probabilities can be greater than 1 or less than 0, which can be a problem for subsequent analysis.
Binary Logistic Regression
The logit model solves these problems:
ln[p/(1-p)] = a + bX
or
p/(1-p) = e^(a + bX)
p/(1-p) = e^a (e^b)^X
Where:
ln is the natural logarithm, log base e, where e = 2.71828...
p is the probability that Y equals 1 for a case, P(Y=1)
1-p is the probability that Y equals 0 for a case, 1 - P(Y=1)
p/(1-p) is the odds
ln[p/(1-p)] is the log odds, or "logit"
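These definitions are easy to check numerically. A minimal sketch in Python; the p value here is just an illustrative number, not from any dataset:

```python
import math

p = 0.349                    # an illustrative P(Y = 1)
odds = p / (1 - p)           # p/(1-p)
logit = math.log(odds)       # ln[p/(1-p)], the log odds

# Exponentiating the log odds recovers the odds: ln and e^ are inverses
assert abs(math.exp(logit) - odds) < 1e-12

print(f"p = {p}, odds = {odds:.3f}, logit = {logit:.3f}")
# p = 0.349, odds = 0.536, logit = -0.623
```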
Binary Logistic Regression
Logistic Distribution
[Figure: the untransformed logistic curve P(Y=1) vs. x is S-shaped; transformed, however, the log odds ln[p/(1-p)] vs. x are linear.]
Binary Logistic Regression
So what are natural logs and exponents? Ask Dr. Math: http://mathforum.org/library/drmath/view/55555.html
ln(x) = y is the same as: x = e^y
READ THE ABOVE LIKE THIS: when you see ln(x), say "the value after the equal sign is the power to which I need to take e to get x"; so y is the power to which you would take e to get x.
Binary Logistic Regression
So ln[p/(1-p)] = y is the same as: p/(1-p) = e^y
READ THE ABOVE LIKE THIS: when you see ln[p/(1-p)], say "the value after the equal sign is the power to which I need to take e to get p/(1-p)"; so y is the power to which you would take e to get p/(1-p).
Binary Logistic Regression
So ln[p/(1-p)] = a + bX is the same as: p/(1-p) = e^(a + bX)
READ THE ABOVE LIKE THIS: when you see ln[p/(1-p)], say "the value after the equal sign is the power to which I need to take e to get p/(1-p)"; so a + bX is the power to which you would take e to get p/(1-p).
Binary Logistic Regression
The logistic regression model is simply a non-linear transformation of the linear regression. The logistic distribution is an S-shaped distribution function (a cumulative distribution function) that is similar to the standard normal distribution and constrains the estimated probabilities to lie between 0 and 1.
Binary Logistic Regression
Logistic Distribution: with the logistic transformation, we're fitting the model to the data better.
Transformed, however, the log odds are linear.
[Figure: P(Y=1) vs. X (0 to 20) rises from 0 through .5 to 1 in an S-shape; ln[p/(1-p)] vs. X is a straight line.]
Binary Logistic Regression
Recall that OLS regression used an ordinary least squares formula to create the linear model. The logistic regression model is instead constructed by an iterative maximum likelihood procedure (a minimal sketch of this iteration appears after this slide). This is a computer-dependent algorithm that:
1. starts with arbitrary values of the regression coefficients and constructs an initial model for predicting the observed data;
2. then evaluates the errors in that prediction and changes the regression coefficients so as to make the likelihood of the observed data greater under the new model;
3. repeats until the model converges, meaning the differences between the newest model and the previous model are trivial.
The idea is that you find, and report as statistics, the parameters that are most likely to have produced your data. Model and inferential statistics will differ from OLS because of this technique and because of the nature of the dependent variable. (Remember how we used chi-squared with classification?)
Binary Logistic Regression
You're likely feeling overwhelmed, perhaps anxious about understanding this. Don't worry: coherence is gained when you see the similarity to OLS regression:
1. Model fit
2. Interpreting coefficients
3. Inferential statistics
4. Predicting Y for values of the independent variables (the most difficult, but we'll make it easy)
Binary Logistic Regression
So in logistic regression, we will take the twisted concept of a transformed dependent variable equaling a line and manipulate the equation to untwist the interpretation. We will focus on:
1. Model fit
2. Interpreting coefficients
3. Inferential statistics
4. Predicting Y for values of the independent variables (the most difficult); the prediction of probability, appropriately, will be an S-shape
Let's start with a research example and SPSS output.
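First, the sketch of the iterative maximum likelihood idea promised above. This is plain NumPy gradient ascent on simulated data; SPSS's actual routine is a Newton-type algorithm, so treat this only as an illustration of steps 1-3:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one predictor
true_b = np.array([-0.5, 1.0])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ true_b)))      # simulated 0/1 outcomes

b = np.zeros(2)                        # 1. start with arbitrary coefficients
for _ in range(20000):
    p = 1 / (1 + np.exp(-X @ b))       # current model's P(Y=1) for each case
    grad = X.T @ (y - p)               # 2. direction that raises the likelihood
    b_new = b + 0.001 * grad           # adjust coefficients a little
    if np.max(np.abs(b_new - b)) < 1e-8:   # 3. converged: changes are trivial
        b = b_new
        break
    b = b_new

print(b)   # close to true_b in a large sample
```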
Binary Logistic Regression
A researcher is interested in the likelihood of gun ownership in the US, and what would predict that. He uses the 2002 GSS to test the following research hypotheses:
1. Men are more likely to own guns than women
2. The older persons are, the more likely they are to own guns
3. White people are more likely to own guns than those of other races
4. The more educated persons are, the less likely they are to own guns
Binary Logistic Regression
Variables are measured as such:
Dependent: Havegun: no gun = 0, own gun(s) = 1
Independent:
1. Sex: men = 0, women = 1
2. Age: entered as number of years
3. White: all other races = 0, white = 1
4. Education: entered as number of years
SPSS: Analyze > Regression > Binary Logistic. Enter your variables; for the output below, under Options, I checked "iteration history". (A Python equivalent is sketched below.)
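The same model could be fit in Python with statsmodels. This is a sketch only: the file name and column names are hypothetical stand-ins for a 2002 GSS extract coded as above:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical file and columns; assumes the coding above (havegun 0/1, sex 0/1, etc.)
df = pd.read_csv("gss2002.csv")

X = sm.add_constant(df[["sex", "age", "white", "educ"]])
model = sm.Logit(df["havegun"], X).fit()   # iterative maximum likelihood
print(model.summary())                     # coefficients, tests, log-likelihood
print(np.exp(model.params))                # Exp(B): the odds ratios
```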
Binary Logistic Regression SPSS Output: Some descriptive information first
The maximum likelihood process stops at the third iteration and yields an intercept (-.625) for a model with no predictors. A measure of fit, -2 Log Likelihood, is generated. The equation producing it:
-2LL = -2 Σ [Yi ln P(Yi) + (1 - Yi) ln(1 - P(Yi))]
This is simply the relationship between the observed values for each case in your data and the model's prediction for each case. The -2 makes this number distribute as a χ² distribution. In a perfect model, -2 log likelihood would equal 0. Therefore, lower numbers imply better model fit.
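As a sketch, the -2LL formula is easy to compute directly from observed outcomes and predicted probabilities; the toy data below are invented to match the 34.9%/65.1% split, not real GSS cases:

```python
import numpy as np

def neg2_log_likelihood(y, p):
    """-2LL = -2 * sum[ Yi*ln P(Yi) + (1 - Yi)*ln(1 - P(Yi)) ]"""
    return -2 * np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Intercept-only (null) model: every case gets the same predicted probability
y = np.array([1] * 349 + [0] * 651)     # toy 34.9% owners / 65.1% non-owners
p_null = np.full(y.shape, y.mean())     # .349 for everyone
print(neg2_log_likelihood(y, p_null))   # lower would mean better fit
```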
Binary Logistic Regression
Originally, the best guess for each person in the data set is 0: have no gun! This is the model for the log odds when every other potential variable equals zero (the null model). It predicts P = .651, as above: P(Y=0) = 1/(1 + e^a) = 1/(1 + .535) = .651. The real P(Y=1) = .349. If you added each …
Binary Logistic Regression
Next are the iterations for our full model.
Binary Logistic Regression
Goodness-of-fit statistics for the new model come next: a test of the new model vs. the intercept-only model (the null model), based on the difference between the -2LL of each. The difference has a χ² distribution. Is the new -2LL significantly smaller? (A sketch of this comparison follows.)
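That comparison is a likelihood-ratio test. A sketch with hypothetical -2LL values; the real ones would come from the SPSS iteration history:

```python
from scipy.stats import chi2

neg2ll_null = 1293.8    # hypothetical intercept-only -2LL
neg2ll_full = 1222.6    # hypothetical full-model -2LL

lr = neg2ll_null - neg2ll_full     # the difference distributes as chi-squared
df = 4                             # four predictors added to the null model
print(f"model chi2 = {lr:.1f}, df = {df}, p = {chi2.sf(lr, df):.4f}")
```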
The -2LL number is ungrounded, but it has a χ² distribution. Smaller is better. In a perfect model, -2 log likelihood would equal 0.
Next come attempts to replicate R² using information based on -2 log likelihood, -2 Σ [Yi ln P(Yi) + (1 - Yi) ln(1 - P(Yi))]: the Cox & Snell and Nagelkerke statistics (Cox & Snell's cannot equal 1). Then comes an assessment of the new model's predictions. (A sketch of the pseudo-R² arithmetic follows.)
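A sketch of those pseudo-R² formulas, using the same hypothetical -2LL values as above; the ceiling term shows why Cox & Snell's version cannot reach 1:

```python
import math

def pseudo_r2(neg2ll_null, neg2ll_full, n):
    """Cox & Snell and Nagelkerke R-squareds from the two -2LL values."""
    cox_snell = 1 - math.exp((neg2ll_full - neg2ll_null) / n)
    ceiling = 1 - math.exp(-neg2ll_null / n)   # C&S maximum, always below 1
    nagelkerke = cox_snell / ceiling           # rescaled so 1 is attainable
    return cox_snell, nagelkerke

print(pseudo_r2(1293.8, 1222.6, n=1000))       # hypothetical inputs
```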
Binary Logistic Regression
Interpreting Coefficients
ln[p/(1-p)] = a + b1X1 + b2X2 + b3X3 + b4X4
In the output, the estimates are the constant a and the coefficients b1 through b4. Being male, getting older, and being white have a positive effect on the likelihood of owning a gun. On the other hand, education does not affect owning a gun.
We'll discuss the Wald test in a moment.
Which b's are significant?
ln[p/(1-p)] = a + b1X1 + … + bkXk is the power to which you need to take e to get p/(1-p).
So p/(1-p) = e^(a + b1X1 + … + bkXk)
Ergo, plug in values of X to get the odds (= p/(1-p)).
Binary Logistic Regression
The coefficients can be manipulated as follows:
Odds = p/(1-p) = e^(a + b1X1 + b2X2 + b3X3 + b4X4) = e^a (e^b1)^X1 (e^b2)^X2 (e^b3)^X3 (e^b4)^X4
Odds = p/(1-p) = e^(-1.864 + .898X1 + .008X2 + 1.249X3 - .056X4) = e^-1.864 (e^.898)^X1 (e^.008)^X2 (e^1.249)^X3 (e^-.056)^X4
Binary Logistic Regression
The coefficients can be manipulated as follows:
Odds = p/(1-p) = e^(a + b1X1 + b2X2 + b3X3 + b4X4) = e^a (e^b1)^X1 (e^b2)^X2 (e^b3)^X3 (e^b4)^X4
Odds = p/(1-p) = e^(-2.246 - .780X1 + .020X2 + 1.618X3 - .023X4) = e^-2.246 (e^-.780)^X1 (e^.020)^X2 (e^1.618)^X3 (e^-.023)^X4
Check it out! Each coefficient increases the odds by a multiplicative amount; that amount is e^b. Every unit increase in X multiplies the odds by e^b. In the example above, e^b = Exp(B) in the last column of the output.
Binary Logistic Regression
Each coefficient increases the odds by a multiplicative amount; the amount is e^b. Every unit increase in X multiplies the odds by e^b. In the example above, e^b = Exp(B) in the last column.
For Sex: e^-.780 = .458. If you subtract 1 from this value, you get the proportional increase (or decrease) in the odds caused by being female: -.542. In percent terms, the odds of owning a gun decrease 54.2% for women.
Age: e^.020 = 1.020. A one-year increase in age increases the odds of owning a gun by 2%.
White: e^1.618 = 5.044. Being white increases the odds of owning a gun by 404%.
Educ: e^-.023 = .977. Not significant.
Binary Logistic Regression
Age: e^.020 = 1.020. A one-year increase in age increases the odds of owning a gun by 2%. How would a 10-year increase in age affect the odds? Recall that (e^b)^X is the equation component for a variable. For 10 years, (1.020)^10 = 1.219. The odds jump by 22% for a ten-year increase in age.
Note: you'd have to know the current prediction level of the dependent variable to know whether this percent change is actually making a big difference! (A numeric sketch of these odds-ratio calculations follows below.)
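A numeric sketch of those odds-ratio manipulations, using the coefficients from the output above:

```python
import math

# Sex (women = 1): the odds of owning a gun decrease 54.2% for women
print(math.exp(-0.780) - 1)        # ≈ -0.542

# Age: each additional year multiplies the odds by ≈ 1.020, a 2% increase
or_age = math.exp(0.020)
print(or_age)                      # ≈ 1.020
print(or_age ** 10)                # ten years: ≈ 1.22, about a 22% jump

# White: odds about 5 times higher, i.e., a 404% increase
print(math.exp(1.618))             # ≈ 5.04
```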
Binary Logistic Regression
Recall that the logistic regression tells us two things at once. Transformed, the log odds are linear:
[Figure: Logistic Distribution. ln[p/(1-p)] vs. x is a straight line; P(Y=1) vs. x is the S-shaped logistic curve.]
Binary Logistic Regression
We can also get P(Y=1) for particular folks. Odds = p/(1-p); p = P(Y=1). With algebra:
Odds(1 - p) = p
Odds - p(Odds) = p
Odds = p + p(Odds)
Odds = p(1 + Odds)
Odds/(1 + Odds) = p, or p = Odds/(1 + Odds)
ln(Odds) = a + bX and Odds = e^(a + bX),
so P = e^(a + bX) / (1 + e^(a + bX)). We can therefore plug in numbers for X to get P.
If a + bX = 0, then p = .5.
As a + bX gets really big, p approaches 1.
As a + bX gets really small, p approaches 0.
(Our model is an S-curve.)
Binary Logistic Regression
For our problem:
P = e^(-2.246 - .780X1 + .020X2 + 1.618X3 - .023X4) / (1 + e^(-2.246 - .780X1 + .020X2 + 1.618X3 - .023X4))
For a man, age 30, Latino, with 12 years of education, what does P equal? Let's solve for:
e^(-2.246 - .780X1 + .020X2 + 1.618X3 - .023X4) = e^(-2.246 - .780(0) + .020(30) + 1.618(0) - .023(12))
= e^(-2.246 - 0 + .6 + 0 - .276) = e^-1.922 = 2.71828^-1.922 = .146
Therefore, P = .146/1.146 = .127. The probability that the 30-year-old Latino with 12 years of education will own a gun is .127! Or you could say there is a 12.7% chance.
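The same arithmetic as a small sketch, using the estimated coefficients:

```python
import math

a, b1, b2, b3, b4 = -2.246, -0.780, 0.020, 1.618, -0.023

def p_own_gun(sex, age, white, educ):
    z = a + b1 * sex + b2 * age + b3 * white + b4 * educ
    return math.exp(z) / (1 + math.exp(z))   # P = e^z / (1 + e^z)

# A man (sex = 0), age 30, Latino (white = 0), 12 years of education
print(p_own_gun(sex=0, age=30, white=0, educ=12))   # ≈ 0.127
```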
Binary Logistic Regression Inferential statistics are as before:
In model fit, if the χ² test is significant, the expanded model (with your variables) improves prediction. This chi-squared test tells us that, as a set, the variables improve classification.
Binary Logistic Regression Inferential statistics are as before:
The significance of the coefficients is determined by a Wald test. Wald is χ² with 1 df and equals a two-tailed t² with exactly the same p-value. Binary Logistic Regression
1. Significance test for α-level = .05
2. Critical χ² (df = 1) = 3.84
3. To find if there is a significant slope in the population:
   H0: β = 0
   Ha: β ≠ 0
4. Collect data
5. Calculate Wald, like t (z): t = b/s.e., and Wald = t² (note that 1.96 × 1.96 = 3.84)
6. Make a decision about the null hypothesis
7. Find the p-value
(A sketch of steps 5 and 7 follows.)
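A sketch of the Wald calculation. The coefficient is from the output above, but the standard error here is a hypothetical value chosen so the p-value lands near the .242 reported for education:

```python
from scipy.stats import chi2

def wald_test(b, se):
    """Wald = (b / s.e.)^2, distributed chi-squared with 1 df."""
    w = (b / se) ** 2
    return w, chi2.sf(w, df=1)

w, p = wald_test(b=-0.023, se=0.0197)   # se is hypothetical, not from the output
print(f"Wald = {w:.2f}, p = {p:.3f}")   # p ≈ .24: fail to reject H0 for education
```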
So how would I do hypothesis testing? An example: reject the null for sex (male), age, and white; fail to reject the null for education. The p-value of .242 means that, if the sample came from a population where the education coefficient equals 0, there would be a 24.2% chance of observing a coefficient at least this far from 0.