Logistic Regression in R and Python
However, the collection, processing, and analysis of data have been largely manual, and given the nature of human-resources dynamics and HR KPIs, this approach has constrained HR. It is therefore surprising that HR departments woke up to the utility of machine learning so late in the game. Here is an opportunity to try predictive analytics to identify the employees most likely to be promoted.
g(E(y)) = α + βx1 + γx2

Here, g() is the link function, E(y) is the expectation of the target variable, and α + βx1 + γx2 is the linear predictor (α, β, γ are to be estimated). The role of the link function is to 'link' the expectation of y to the linear predictor.
To start with logistic regression, I'll first write the simple linear regression equation with the dependent variable enclosed in a link function:

g(y) = βo + β(Age) ------- (a)

Important Points
In logistic regression we are concerned only with the probability of the outcome (success or failure), so the link function must satisfy two conditions: the probability p must always be positive, and it must always be less than or equal to 1.

Now, we'll simply satisfy these 2 conditions and get to the core of logistic regression. To establish the link function, we'll denote g() with 'p' initially and eventually end up deriving this function.
Since probability must always be positive, we'll put the linear equation in exponential form. For any values of the intercept, the slope, and Age, the exponential is never negative.
p = exp(βo + β(Age)) = e^(βo + β(Age)) ------- (b)
To make the probability less than 1, we must divide p by a number greater than p. This can simply be done by:

p = exp(βo + β(Age)) / (exp(βo + β(Age)) + 1) ------- (c)
Using (a), (b) and (c), we can redefine the probability as:

p = e^y / (1 + e^y) ------- (d), where y = βo + β(Age)
If p is the probability of success, 1 − p will be the probability of failure, which can be written as:

1 − p = 1 / (1 + e^y), where y = βo + β(Age)

Dividing p by 1 − p then gives p / (1 − p) = e^y, and taking the log yields log(p / (1 − p)) = y. The log-odds, or logit, is thus linear in the predictors; this is the defining equation of logistic regression.
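As a sanity check on the derivation above, a few lines of Python (standard library only; the coefficient values are made up purely for illustration) confirm that p = e^y / (1 + e^y) always lies strictly between 0 and 1 and that log(p / (1 − p)) recovers the linear predictor y:

```python
import math

def success_probability(b0, b1, age):
    """p = e^(b0 + b1*age) / (1 + e^(b0 + b1*age)) -- the form derived above."""
    y = b0 + b1 * age
    return math.exp(y) / (1 + math.exp(y))

# made-up coefficients, purely for illustration
b0, b1 = -3.0, 0.08

for age in [10, 25, 40, 60, 90]:
    p = success_probability(b0, b1, age)
    q = 1 - p                       # probability of failure
    assert 0 < p < 1                # both conditions on p hold
    logit = math.log(p / q)         # log-odds
    # the logit recovers the linear predictor y = b0 + b1*age
    assert abs(logit - (b0 + b1 * age)) < 1e-9
```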
The R code is provided below; if you're a Python user, an equivalent model can be built in Python just as easily.
Without going deep into feature engineering, here's the script for a simple logistic regression model:
#set working directory
setwd('C:/Users/manish/Desktop/dressdata')
#load data
train <- read.csv('Train_Old.csv')
#sample.split() comes from the caTools package
library(caTools)
set.seed(88)
split <- sample.split(train$Recommended, SplitRatio = 0.75)
#fit the logistic regression model on the training subset
model <- glm(Recommended ~ ., data = subset(train, split == TRUE), family = binomial)
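For Python users, the same idea can be sketched without any external libraries. Since the Train_Old.csv dress data isn't reproduced here, the example below fits the derived model p = e^y / (1 + e^y) by gradient ascent on a tiny made-up dataset (the predictor values and 0/1 outcomes are hypothetical, chosen only to illustrate the fitting step):

```python
import math

def fit_logistic(xs, ys, lr=0.01, epochs=5000):
    """Fit p = 1 / (1 + e^-(b0 + b1*x)) by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))
            # per-observation gradient of the log-likelihood is (y - p) * [1, x]
            b0 += lr * (y - p)
            b1 += lr * (y - p) * x
    return b0, b1

# hypothetical data: predictor x and binary outcome (success = 1)
xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
ys = [0,   0,   0,   0,   1,   1,   1,   1]

b0, b1 = fit_logistic(xs, ys)
predict = lambda x: 1 / (1 + math.exp(-(b0 + b1 * x)))
print(predict(1.0), predict(4.0))  # low probability vs high probability
```

In practice you would reach for a library (e.g. scikit-learn's LogisticRegression, or statsmodels for glm-style output) rather than hand-rolling the optimizer; the sketch only mirrors the R `glm(..., family = binomial)` call conceptually.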