Data Analysis for Workforce Insights

Python code for regression on PLFS data

4/2/24, 2:43 PM PLFS_MVPA

In [1]: # Step 1
# Load the dataset from the Excel file

import pandas as pd
df = pd.read_excel('C:/Users/user/Desktop/PLFS_2022_23.xlsx')

In [2]: #List the columns in the dataset

df.columns.tolist()

Out[2]:
['Sector',
'State',
'Religion',
'Social Group',
'Sex',
'Age',
'Marital Status',
'General Education',
'Technical Education',
'No of years in formal education',
'Status of Current Attendance in Educational Institution',
'Whether received any Vocational/ Technical Training',
'Duration of Training',
'Status Code',
'Industry Code',
'Whether Engaged in any work in Subsidiary Capacity',
'No of Workers in the Enterprise',
'Type of Job Contract',
'Eligible of Paid Leave',
'Social Security Benefits',
'Earning for Regular Salaried/ Wage Workers',
'Earnings for Self Employed']

In [3]: # Data cleaning step - 2

# Sector variable - Assigning Rural as 0 and Urban as 1

df['Sector'] = df['Sector'].apply(lambda x: 1 if x == 2 else 0)

In [4]: df['Sector'].value_counts()


Out[4]:
0    56713
1    31542
Name: Sector, dtype: int64
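
The same 0/1 recoding can be written without `apply`. A minimal sketch on a toy column (the values are made up, not PLFS records), assuming as in In [3] that code 2 marks urban and every other code is treated as rural. The names `toy` and `Sector_urban` are illustrative only.

```python
import pandas as pd

# Toy stand-in for the 'Sector' column; 2 = urban, anything else = rural (assumption from In [3])
toy = pd.DataFrame({'Sector': [1, 2, 2, 1, 2]})

# Vectorised equivalent of .apply(lambda x: 1 if x == 2 else 0)
toy['Sector_urban'] = (toy['Sector'] == 2).astype(int)

print(toy['Sector_urban'].value_counts())
```

The comparison produces a boolean column and `astype(int)` turns it into 0/1, so the result should match the lambda version while avoiding a Python-level call per row.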

In [5]: # Data Cleaning - Step 3

# Assign 1 for Hinduism (majority) and 0 for other religions (minority)

df['Religion'] = df['Religion'].apply(lambda x: 1 if x == 1 else 0)

In [6]: df['Religion'].value_counts()

Out[6]:
1    72706
0    15549
Name: Religion, dtype: int64

In [7]: # Data Cleaning - Step 4

# Assign 0 to SC/ST/OBC and 1 for others

df['Social Group'] = df['Social Group'].apply(lambda x: 1 if x == 9 else 0)

df['Social Group'].value_counts()

Out[7]:
0    71226
1    17029
Name: Social Group, dtype: int64

In [8]: # Data Cleaning - Step 5

# Assign 1 to Male and 0 for others

df['Sex'] = df['Sex'].apply(lambda x: 1 if x == 1 else 0)

df['Sex'].value_counts()

Out[8]:
1    45439
0    42816
Name: Sex, dtype: int64

In [9]: # Data Cleaning - Step 6

# Assign 0 to education up to higher secondary and 1 to education above higher secondary

df['General Education'] = df['General Education'].apply(lambda x: 0 if x in (1,2,3,4,5,6,7,8,10) else 1)
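
One side effect of the lambda above is worth checking: a missing value is not in the tuple, so it falls into the `else` branch and is coded 1. Whether this PLFS extract has blanks in the column is not shown in the notebook, so the following is only a sketch on made-up codes; `codes`, `up_to_hs`, `as_in_notebook` and `keep_missing` are illustrative names.

```python
import numpy as np
import pandas as pd

# Made-up education codes, including one missing value
codes = pd.Series([1, 8, 10, 11, 13, np.nan])
up_to_hs = (1, 2, 3, 4, 5, 6, 7, 8, 10)

# Same logic as the notebook cell: NaN is "not in" the tuple, so it becomes 1
as_in_notebook = codes.apply(lambda x: 0 if x in up_to_hs else 1)

# Variant that leaves missing values missing instead of coding them as 1
keep_missing = (~codes.isin(up_to_hs)).astype(float).where(codes.notna())

print(pd.DataFrame({'code': codes,
                    'as_in_notebook': as_in_notebook,
                    'keep_missing': keep_missing}))
```

If the column has no missing values the two versions agree; if it does, the second keeps those rows out of the "above higher secondary" group.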

In [10]: # Data Cleaning - Step 7

# Assign 0 to NO technical education and 1 for others

df['Technical Education'] = df['Technical Education'].apply(lambda x: 0 if x == 1 else 1)


In [11]: # Data Cleaning - Step 8

# Assign 0 to received no vocational/technical training and 1 for others

df['Whether received any Vocational/ Technical Training'] = df['Whether received any Vocational/ Technical Training'].apply(lambd

In [12]: # Data Cleaning - Step 9

# Assign 0 to those engaged in any work in a subsidiary capacity and 1 to those who are not

df['Whether Engaged in any work in Subsidiary Capacity'] = df['Whether Engaged in any work in Subsidiary Capacity'].apply(lambda

In [13]: # Data Cleaning - Step 10

# Assign 0 to NO written contract and 1 for others

df['Type of Job Contract'] = df['Type of Job Contract'].apply(lambda x: 0 if x == 1 else 1)

In [14]: # Data Cleaning - Step 11

# Assign 1 to currently married and 0 for others

df['Marital Status'] = df['Marital Status'].apply(lambda x: 1 if x == 2 else 0)

In [15]: # Data Cleaning - Step 12

# Add log-transformed columns to reduce the high variation in both earnings columns
# (epsilon avoids log(0); zero-earning rows are dropped in Step 14 before modelling)

import numpy as np

epsilon = 1e-7

df['log_sal'] = np.log(df['Earning for Regular Salaried/ Wage Workers'] + epsilon)

df['log_self'] = np.log(df['Earnings for Self Employed']+ epsilon)
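
To see what the epsilon does, here is a small sketch on made-up earnings (not PLFS values). It shows that a zero becomes a large negative log rather than an error, and an alternative that takes logs only where earnings are positive. The names `earnings`, `with_eps` and `masked` are illustrative.

```python
import numpy as np
import pandas as pd

# Made-up earnings, including a zero
earnings = pd.Series([0.0, 5000.0, 12000.0])
epsilon = 1e-7

# As in the notebook: a zero turns into log(1e-7), roughly -16.1
with_eps = np.log(earnings + epsilon)

# Alternative: mask non-positive values first, so they stay missing (NaN)
masked = np.log(earnings.where(earnings > 0))

print(pd.DataFrame({'earnings': earnings, 'with_eps': with_eps, 'masked': masked}))
```

Because the later cells keep only rows with earnings above zero, the roughly -16.1 values never reach the regressions; the masked version simply makes that exclusion visible in the column itself.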

In [16]: # Data Cleaning - Step 13

# Add squared columns to capture possible non-linear relationships

df['Age_sq'] = df['Age'] ** 2
df['Formal_Edu_sq'] = df['No of years in formal education'] ** 2

In [17]: #List all the final columns in the dataset

df.columns


Out[17]:
Index(['Sector', 'State', 'Religion', 'Social Group', 'Sex', 'Age',
'Marital Status', 'General Education', 'Technical Education',
'No of years in formal education',
'Status of Current Attendance in Educational Institution',
'Whether received any Vocational/ Technical Training',
'Duration of Training', 'Status Code', 'Industry Code',
'Whether Engaged in any work in Subsidiary Capacity',
'No of Workers in the Enterprise', 'Type of Job Contract',
'Eligible of Paid Leave', 'Social Security Benefits',
'Earning for Regular Salaried/ Wage Workers',
'Earnings for Self Employed', 'log_sal', 'log_self', 'Age_sq',
'Formal_Edu_sq'],
dtype='object')

In [18]: # Data Cleaning - Step 14

# Segregating the data for the salaried and self-employed populations into 2 separate dataframes

df_sal = df[df['Earning for Regular Salaried/ Wage Workers'] > 0]

df_self = df[df['Earnings for Self Employed'] > 0]

In [19]: #Step - 15 - Regression Model

# MODEL No. 1 - Estimating Earnings for Regular Salaried/ Wage Workers

import statsmodels.api as sm

# Define the independent variables

independent_vars = [
'Sector',
'Religion',
'Sex',
'Age',
'Social Group',
'General Education',
'Marital Status',
'Technical Education',
'No of years in formal education',
'Whether received any Vocational/ Technical Training',
'Whether Engaged in any work in Subsidiary Capacity',
'Type of Job Contract',
'Age_sq',
'Formal_Edu_sq'
]

# Add a constant to the independent variables

X = sm.add_constant(df_sal[independent_vars])

# Define the target variable :

y = df_sal['log_sal']

# Fit the linear regression model

model = sm.OLS(y, X).fit()

# Print the model summary

print(model.summary())


                            OLS Regression Results
==============================================================================
Dep. Variable:                log_sal   R-squared:                       0.502
Model:                            OLS   Adj. R-squared:                  0.501
Method:                 Least Squares   F-statistic:                     403.0
Date:                Tue, 02 Apr 2024   Prob (F-statistic):               0.00
Time:                        14:36:52   Log-Likelihood:                -4655.5
No. Observations:                5610   AIC:                             9341.
Df Residuals:                    5595   BIC:                             9441.
Df Model:                          14
Covariance Type:            nonrobust
=======================================================================================================================
                                                          coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------------------------------------------
const                                                   6.7024      0.090     74.836      0.000       6.527       6.878
Sector                                                  0.1722      0.017      9.907      0.000       0.138       0.206
Religion                                                0.0626      0.021      3.018      0.003       0.022       0.103
Sex                                                     0.5594      0.022     25.993      0.000       0.517       0.602
Age                                                     0.0596      0.005     13.043      0.000       0.051       0.069
Social Group                                            0.1134      0.017      6.624      0.000       0.080       0.147
General Education                                       0.0320      0.036      0.898      0.369      -0.038       0.102
Marital Status                                          0.1157      0.022      5.297      0.000       0.073       0.159
Technical Education                                     0.2505      0.029      8.581      0.000       0.193       0.308
No of years in formal education                         0.0017      0.006      0.283      0.777      -0.010       0.013
Whether received any Vocational/ Technical Training     0.0275      0.015      1.815      0.070      -0.002       0.057
Whether Engaged in any work in Subsidiary Capacity      0.2856      0.028     10.222      0.000       0.231       0.340
Type of Job Contract                                    0.4822      0.017     27.850      0.000       0.448       0.516
Age_sq                                                 -0.0006   5.59e-05    -11.248      0.000      -0.001      -0.001
Formal_Edu_sq                                           0.0022      0.000      5.798      0.000       0.001       0.003
==============================================================================
Omnibus:                      173.792   Durbin-Watson:                   1.969
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              355.426
Skew:                          -0.205   Prob(JB):                     6.61e-78
Kurtosis:                       4.163   Cond. No.                     2.05e+04
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.05e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
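
Because the dependent variable is a log, a dummy coefficient b corresponds to a proportional difference of exp(b) - 1, and the Age and Age_sq terms together imply a peak at -b_age / (2 * b_age_sq). The sketch below works this out from the Model 1 coefficients as printed above. The inputs are rounded (Age_sq is shown only as -0.0006), so the turning point in particular is a rough figure, not an estimate the notebook itself reports.

```python
import math

# Coefficients copied from the Model 1 summary, rounded as printed there
dummy_coefs = {
    'Sex': 0.5594,
    'Sector': 0.1722,
    'Type of Job Contract': 0.4822,
}

# Proportional earnings difference implied by each dummy: exp(b) - 1
pct_effects = {name: (math.exp(b) - 1) * 100 for name, b in dummy_coefs.items()}
for name, pct in pct_effects.items():
    print(f"{name}: about {pct:.1f}% higher earnings when the dummy equals 1")

# Age at which predicted log earnings peak: -b_age / (2 * b_age_sq)
b_age, b_age_sq = 0.0596, -0.0006
turning_point = -b_age / (2 * b_age_sq)
print(f"Approximate age at peak earnings: {turning_point:.1f} years")
```

These are conditional associations given the other regressors in the model, not causal effects.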

In [20]: # Step - 16
# Calculate VIF values for the Model - 1


from statsmodels.stats.outliers_influence import variance_inflation_factor

from statsmodels.tools.tools import add_constant

# Reuse independent_vars, the list of regressors defined for Model 1

# Add a constant column for intercept (necessary for VIF calculation)

df_with_const = add_constant(df_sal[independent_vars])

# Calculate VIF for each independent variable

vif_data = pd.DataFrame()
vif_data["Variable"] = df_with_const.columns
vif_data["VIF"] = [variance_inflation_factor(df_with_const.values, i) for i in range(df_with_const.shape[1])]

print(vif_data)


                                             Variable         VIF
0                                               const  145.780813
1                                              Sector    1.190473
2                                            Religion    1.066603
3                                                 Sex    1.091667
4                                                 Age   48.178772
5                                        Social Group    1.086503
6                                   General Education    5.226346
7                                      Marital Status    1.666061
8                                 Technical Education    1.353767
9                     No of years in formal education   15.317316
10  Whether received any Vocational/ Technical Tra...    1.045240
11  Whether Engaged in any work in Subsidiary Capa...    1.182732
12                               Type of Job Contract    1.218561
13                                             Age_sq   44.150284
14                                      Formal_Edu_sq   25.843036
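
The large VIFs for Age, Age_sq, the years-of-education term and its square are what one would expect when a variable enters together with its own square. A common remedy is to centre the variable before squaring. The sketch below illustrates this on synthetic ages (not PLFS data), using a small NumPy helper `vif` that follows the usual definition 1 / (1 - R^2); the helper and all variable names are illustrative and not part of the notebook.

```python
import numpy as np

def vif(X, i):
    """VIF of column i: 1 / (1 - R^2) from regressing it on the other columns plus an intercept."""
    y = X[:, i]
    others = np.delete(X, i, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - resid.var() / y.var()
    return 1 / (1 - r2)

rng = np.random.default_rng(0)
age = rng.integers(15, 66, size=5000).astype(float)  # synthetic ages, 15 to 65

# Raw age and its square are almost collinear
raw = np.column_stack([age, age ** 2])

# Centre first, then square
centred_age = age - age.mean()
centred = np.column_stack([centred_age, centred_age ** 2])

vif_raw = vif(raw, 0)
vif_centred = vif(centred, 0)
print(f"VIF of age, raw terms:     {vif_raw:.1f}")
print(f"VIF of age, centred terms: {vif_centred:.2f}")
```

Centring changes the intercept and the coefficient on the linear term but leaves the fitted values and the coefficient on the squared term unchanged, so it addresses the numerical symptom without changing the model.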

In [21]: # Step - 17
# MODEL No. 2 - Estimating Earnings for Self Employed

import statsmodels.api as sm

# Reuse the independent variables (independent_vars) defined for Model 1

# Add a constant to the independent variables

X = sm.add_constant(df_self[independent_vars])

# Define the target variable :

y = df_self['log_self']

# Fit the linear regression model

model = sm.OLS(y, X).fit()

# Print the model summary

print(model.summary())


                            OLS Regression Results
==============================================================================
Dep. Variable:               log_self   R-squared:                       0.418
Model:                            OLS   Adj. R-squared:                  0.418
Method:                 Least Squares   F-statistic:                     715.8
Date:                Tue, 02 Apr 2024   Prob (F-statistic):               0.00
Time:                        14:37:51   Log-Likelihood:                -12329.
No. Observations:               13960   AIC:                         2.469e+04
Df Residuals:                   13945   BIC:                         2.480e+04
Df Model:                          14
Covariance Type:            nonrobust
=======================================================================================================================
                                                          coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------------------------------------------
const                                                   6.4892      0.080     80.708      0.000       6.332       6.647
Sector                                                  0.3093      0.012     26.292      0.000       0.286       0.332
Religion                                               -0.0239      0.014     -1.753      0.080      -0.051       0.003
Sex                                                     0.8892      0.014     64.611      0.000       0.862       0.916
Age                                                     0.0461      0.003     17.629      0.000       0.041       0.051
Social Group                                            0.1482      0.013     11.301      0.000       0.122       0.174
General Education                                       0.0799      0.033      2.435      0.015       0.016       0.144
Marital Status                                          0.0814      0.016      5.022      0.000       0.050       0.113
Technical Education                                     0.0932      0.043      2.156      0.031       0.008       0.178
No of years in formal education                         0.0147      0.004      3.808      0.000       0.007       0.022
Whether received any Vocational/ Technical Training     0.0205      0.011      1.946      0.052      -0.000       0.041
Whether Engaged in any work in Subsidiary Capacity      0.1606      0.013     11.935      0.000       0.134       0.187
Type of Job Contract                                    0.5474      0.058      9.418      0.000       0.433       0.661
Age_sq                                                 -0.0005   2.87e-05    -18.217      0.000      -0.001      -0.000
Formal_Edu_sq                                           0.0004      0.000      1.271      0.204      -0.000       0.001
==============================================================================
Omnibus:                      736.413   Durbin-Watson:                   1.970
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             1526.202
Skew:                          -0.368   Prob(JB):                         0.00
Kurtosis:                       4.443   Cond. No.                     4.34e+04
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 4.34e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
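
Note [1] matters because both summaries use the "nonrobust" covariance, which assumes a constant error variance. The sketch below uses synthetic data (not PLFS) to compare classical standard errors with White (HC0) heteroskedasticity-robust ones, computed directly in NumPy so each step is visible. In statsmodels the comparable option is, to my knowledge, passing `cov_type='HC0'` (or HC1 to HC3) to `.fit()`; the notebook does not do this, so treat it as a suggestion rather than the author's method.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(0, 10, size=n)
# Synthetic outcome whose error spread grows with x (heteroskedastic by construction)
y = 1.0 + 2.0 * x + rng.normal(size=n) * x ** 2

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# Classical ("nonrobust") covariance: sigma^2 * (X'X)^-1
sigma2 = resid @ resid / (n - X.shape[1])
classical_se = np.sqrt(np.diag(sigma2 * XtX_inv))

# White / HC0 sandwich covariance: (X'X)^-1 X' diag(e^2) X (X'X)^-1
meat = (X * resid[:, None] ** 2).T @ X
robust_se = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print("classical SE (const, x):", classical_se)
print("HC0 SE       (const, x):", robust_se)
```

In this constructed example the robust standard error of the slope comes out larger than the classical one. Whether the same holds for the PLFS models can only be checked by refitting them with a robust covariance.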

In [22]: # Step - 18
# Calculate VIF Values for Model - 2


from statsmodels.stats.outliers_influence import variance_inflation_factor

from statsmodels.tools.tools import add_constant

# Reuse independent_vars, the same list of regressors used in Model 2

# Add a constant column for intercept (necessary for VIF calculation)

df_with_const = add_constant(df_self[independent_vars])

# Calculate VIF for each independent variable

vif_data = pd.DataFrame()
vif_data["Variable"] = df_with_const.columns
vif_data["VIF"] = [variance_inflation_factor(df_with_const.values, i) for i in range(df_with_const.shape[1])]

print(vif_data)


                                             Variable         VIF
0                                               const  263.213006
1                                              Sector    1.163624
2                                            Religion    1.074025
3                                                 Sex    1.220613
4                                                 Age   44.292376
5                                        Social Group    1.096183
6                                   General Education    4.291929
7                                      Marital Status    1.206137
8                                 Technical Education    1.179082
9                     No of years in formal education   16.354550
10  Whether received any Vocational/ Technical Tra...    1.109433
11  Whether Engaged in any work in Subsidiary Capa...    1.116591
12                               Type of Job Contract    1.026795
13                                             Age_sq   43.738403
14                                      Formal_Edu_sq   25.047101
