Simple Linear Regression Using SPSS

Introduction

Regression analysis is a simple statistical tool used to model the dependence of a variable on one (or more) explanatory variables. The functional relationship may then be formally stated as an equation, with associated statistical values that describe how well this equation fits the data.

Linear regression is a very powerful statistical technique. Many people have some
familiarity with regression just from reading the news, where graphs with straight lines are
overlaid on scatter plots. Linear models can be used for prediction or to evaluate whether
there is a linear relationship between two numerical variables.

Objectives:
a. Explain simple linear regression
b. Discuss the assumptions of simple linear regression
c. Discuss how to perform a simple linear regression in SPSS
d. Interpret simple linear regression output in SPSS

Discussion
Simple Linear Regression

In simple linear regression, we try to find the relationship between a single independent variable (input) and a corresponding dependent variable (output). It is a statistical method that allows us to summarize and study relationships between two continuous (quantitative) variables, and the relationship can be expressed in the form of a straight line. Often, the objective is to predict the value of an output variable (or response) based on the value of an input (or predictor) variable.

 One variable, denoted x, is regarded as the predictor, explanatory, or independent variable.
 The other variable, denoted y, is regarded as the response, outcome, or dependent variable.

Simple linear regression gets the adjective "simple" because it concerns the study of only one predictor variable. In contrast, multiple linear regression, which we study later in this course, gets the adjective "multiple" because it concerns the study of two or more predictor variables.

Simple Linear Regression using SPSS


In regression it is convenient to define X as the explanatory (or independent) variable and Y as the outcome (or dependent) variable. We are concerned with determining how well X can predict Y.

It is important to know which variable is the outcome (Y) and which is the explanatory (X)! This may sound obvious, but in education research it is not always clear. For example, does greater interest in reading predict better reading skills? Possibly. But it may be that having better reading skills encourages greater interest in reading. Education research is littered with such 'chicken and egg' arguments! Make sure that you know what your hypothesis about the relationship is when you perform a regression analysis, as it is fundamental to your interpretation.

Let's try and visualize how we can make a prediction using a scatterplot:

Figure 1 and Figure 1.1

Figure 1 plots five observations (XY pairs). We can summarize the linear relationship
between X and Y best by drawing a line straight through the data points. This is called
the regression line and is calculated so that it represents the relationship as accurately
as possible.
Figure 1.1 shows this regression line. It is the line that minimizes the differences
between the actual Y values and the Y value that would be predicted from the line.
These differences are squared so that negative signs are removed, hence the term 'sum
of squares' which you may have come across before. You do not have to worry about
how to calculate the regression line - SPSS/PASW does this for you!
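
To make this concrete, here is a minimal Python sketch of the same least-squares calculation. The five (X, Y) points are hypothetical stand-ins for the observations in Figure 1 (chosen so the fitted line roughly matches Figure 1.3), not the actual data behind the figures.

```python
# A sketch of how the least-squares regression line is computed.
# The data points are hypothetical; SPSS does this calculation for you.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.5, 5.5, 6.0, 7.5, 9.0])

# Slope b and intercept a that minimize the sum of squared residuals
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

residuals = y - (a + b * x)
print(f"intercept a = {a:.2f}, slope b = {b:.2f}")        # a = 2.40, b = 1.30
print(f"sum of squared residuals = {np.sum(residuals ** 2):.2f}")
```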

Figure 1.3

The line has the formula Y = A + BX, where A is the intercept (the point where the
line meets the Y axis, where X = 0) and B is the slope (gradient) of the line (the
amount Y increases for each unit increase in X), also called the regression coefficient.
Figure 1.3 shows that for this example the intercept (where the line meets the Y axis)
is 2.4. The slope is 1.31, meaning for every unit increase in X (an increase of 1) the
predicted value of Y increases by 1.31 (this value would have a negative sign if the
correlation was negative).
The regression line represents the predicted value of Y for each value of X. We can use it to generate a predicted value of Y for any given value of X using our formula, even if we don't have a specific data point that covers the value. From Figure 1.3 we see that an X value of 3.5 predicts a Y value of about 7 (Ŷ = 2.4 + 1.31 × 3.5 ≈ 6.99).
Of course, the model is not perfect. The vertical distance from each data point to the regression line (see Figure 1.2) represents the error of prediction. These errors are called residuals. We can average the sizes of these errors (ignoring their signs) to get a measure of how much the regression equation typically over-predicts or under-predicts the Y values. The higher the correlation, the smaller these errors (residuals), and the more accurate the predictions are likely to be.
We will have a go at using SPSS/PASW to perform a linear regression soon but first we must
consider some important assumptions that need to be met for simple linear regression to be
performed.

Assumptions of Linear Regression

Simple linear regression is only appropriate when the following conditions are satisfied:

 Linear relationship: The outcome variable Y has a roughly linear relationship with the explanatory variable X.
 Homoscedasticity: For each value of X, the distribution of residuals has the same variance. This means that the level of error in the model is roughly the same regardless of the value of the explanatory variable (homoscedasticity - another disturbingly complicated word for something less confusing than it sounds).
 Independent errors: This means that residuals (errors) should be uncorrelated.

It may seem as if we're complicating matters, but checking that the analysis you perform meets these assumptions is vital to ensuring that you draw valid conclusions.

Other important things to consider

The following issues are not as important as the assumptions, because the regression analysis can still work even if there are problems in these areas. However, it is still vital that you check for these potential issues, as they can seriously mislead your analysis and conclusions.

 Problems with outliers/influential cases: It is important to look out for cases which may unduly influence your regression model by differing substantially from the rest of your data.
 Normally distributed residuals: The residuals (errors in prediction) should be normally distributed.
Let us look at these assumptions and related issues in more detail - they make more sense
when viewed in the context of how you go about checking them.

Test Procedure in SPSS Statistics

The steps below show you how to analyze your data using linear regression in SPSS Statistics when the assumptions discussed in the previous section have not been violated. At the end of these steps, we show you how to interpret the results from your linear regression.

EXAMPLE 1: A boss was interested in predicting job satisfaction from burnout.

Data: 200 counselors; Outcome variable: Job satisfaction; Predictor Variable: Level of
burnout

Step 1: Make a Scatterplot

Go to Graphs > Chart Builder > Gallery > Scatter Plot > drag Simple Scatter into the canvas > move the x variable and the y variable into their axis boxes > click OK.

Figure 2
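
For readers who want to reproduce this step outside SPSS, here is a minimal matplotlib sketch. The burnout and job satisfaction values are simulated stand-ins (the module's data set is not included here), so the picture will only roughly resemble Figure 5.

```python
# A Python analogue of the SPSS scatterplot step, using simulated
# (hypothetical) data in place of the 200 counselors' actual scores.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
burnout = rng.uniform(10, 110, 200)
satisfaction = 235.46 - 2.11 * burnout + rng.normal(0, 25, 200)

plt.scatter(burnout, satisfaction, s=10)
plt.xlabel("Burnout")
plt.ylabel("Job Satisfaction")
plt.title("Simple Scatter of Job Satisfaction by Burnout")
plt.show()
```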

When you run the regression itself (Step 3 below), you will be presented with the Linear Regression dialogue box:

Figure 3

Transfer the independent variable, Burnout, into the Independent(s): box and the dependent variable, Job Satisfaction, into the Dependent: box. You can do this by either drag-and-dropping the variables or by using the appropriate right-arrow buttons. You will end up with the following screen:

Figure 4

You will also need to check four of the issues discussed in the Assumptions section above: no significant outliers, independence of observations (independent errors), homoscedasticity, and normally distributed residuals. You can do this by using the Statistics and Plots features and then selecting the appropriate options within these two dialogue boxes; Steps 3 to 6 below show which options to select and how to read the output.

Finally, click on the OK button. This will generate the results.

Figure 5 shows that there is a negative correlation because the dots generally go down to the right.

Figure 5. Simple Scatter of Job Satisfaction by Burnout

Step 2: Add the Regression Line

Double-click anywhere in the scatter plot to open the Chart Editor. In the Chart Editor, add a regression line by clicking Add Fit Line at Total, then close the Chart Editor.

Figure 6

In the scatter plot the data look linearly related and negative: as burnout goes up, job satisfaction goes down. Also, the spread of the data is similar all along the regression line; the points are not cone-shaped or curved. So we have established homoscedasticity and linearity.

Figure 7

We do not need to check for collinearity because there is only one predictor variable. Collinearity only occurs when we have multiple predictors, some of which are correlated among themselves, so it is not a problem when we have only one predictor.

Step 3: Conduct a Regression Analysis

Go to Analyze> Regression> Linear

Figure 8

This opens the dialogue box for regression. We are using Burnout to predict Job Satisfaction, so we move Burnout into the Independent(s) box, then move the variable Job Satisfaction into the Dependent box.

The method must be set to Enter.

Figure 9
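
As a cross-check outside SPSS, the same kind of model can be fitted in Python with the statsmodels library. This is a sketch with simulated data (hypothetical, so its coefficients will only roughly match the module's output), not the module's own procedure.

```python
# A Python analogue (not SPSS itself) of Analyze > Regression > Linear
# with the Enter method: the single predictor enters the model at once.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
burnout = rng.uniform(10, 110, 200)                       # hypothetical data
satisfaction = 235.46 - 2.11 * burnout + rng.normal(0, 25, 200)

X = sm.add_constant(burnout)           # adds the intercept term (the constant a)
model = sm.OLS(satisfaction, X).fit()  # ordinary least squares
print(model.summary())                 # coefficients, R-squared, ANOVA F-test
```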

Next, click Statistics > check Confidence intervals > check Descriptives > check Durbin-Watson > check Casewise diagnostics > click Continue.

Figure 10

Next, click Plots > move *ZPRED (the standardized predicted values) to X and *ZRESID (the standardized residuals) to Y > check Histogram > check Normal probability plot > click Continue > OK.

Figure 11

*ZRESID - the standardized residuals for each case.

*ZPRED - the standardized predicted values for each case.

As we have discussed, the term standardized simply means that the variable is adjusted such that it has a mean of zero and a standard deviation of one - this makes comparisons between variables much easier because they are all in the same 'standard' units. By plotting *ZRESID on the Y-axis and *ZPRED on the X-axis you will be able to check the assumption of homoscedasticity - residuals should not vary systematically with the predicted values, and the variance in residuals should be similar across all predicted values. You should also tick the boxes marked Histogram and Normal probability plot (P-P plot). This will allow you to check that the residuals are normally distributed. To close the menu, click Continue.
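
The sketch below shows roughly what these standardized variables contain. It mirrors the idea (rescaling to mean 0 and standard deviation 1) rather than SPSS's exact internal computation, and the x and y values are hypothetical.

```python
# Roughly what *ZPRED and *ZRESID contain: predicted values and
# residuals rescaled to mean 0 and standard deviation 1.
import numpy as np

def standardize(v):
    return (v - v.mean()) / v.std(ddof=1)

x = np.array([20.0, 40.0, 60.0, 80.0, 100.0])     # burnout (hypothetical)
y = np.array([195.0, 150.0, 115.0, 65.0, 20.0])   # job satisfaction (hypothetical)
a, b = 235.46, -2.11                              # from the Coefficients table

predicted = a + b * x
residuals = y - predicted

zpred = standardize(predicted)     # *ZPRED
zresid = standardize(residuals)    # *ZRESID
# Plotting zresid (Y) against zpred (X) should show no systematic
# pattern if the homoscedasticity assumption holds.
print(zpred.round(2), zresid.round(2))
```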

Step 4: Check for Outliers

Check the assumptions for regression by examining the Residuals Statistics box in the output.

Look at the standardized residuals; these can be interpreted like z-scores. The minimum and maximum values for the standardized residuals should not exceed -3.29 or +3.29 respectively. If they do, you have outliers: stop the interpretation, go back to the data set, and identify your outliers.
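
A minimal sketch of this rule, applied to a vector of standardized residuals (hypothetical values here; in practice you would use the saved ZRE_1 variable discussed later):

```python
# Flag any case whose standardized residual falls outside +/-3.29.
import numpy as np

zresid = np.array([0.4, -1.2, 2.1, -3.5, 0.8])             # hypothetical values
outlier_cases = np.flatnonzero(np.abs(zresid) > 3.29) + 1  # 1-based case numbers
if outlier_cases.size:
    print("Possible outliers at cases:", outlier_cases)
else:
    print("No standardized residual exceeds +/-3.29.")
```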

Step 5: Check for Independence of Observations

We check this by examining the Independence of Errors using the Durbin-Watson Test

The Durbin-Watson statistic here is close to 2. We do not want it less than 1 or greater than 3, so the assumption of independence of observations has been met.
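
For reference, the statistic itself is simple to compute from the residuals. A sketch with hypothetical residuals:

```python
# The Durbin-Watson statistic: values near 2 suggest uncorrelated
# (independent) errors; the rule of thumb above flags values below 1
# or above 3.
import numpy as np

def durbin_watson(residuals):
    diffs = np.diff(residuals)
    return np.sum(diffs ** 2) / np.sum(residuals ** 2)

e = np.array([1.2, -0.8, -0.5, 1.1, 0.9, -0.3])   # hypothetical residuals
print(f"Durbin-Watson = {durbin_watson(e):.2f}")  # about 1.83, close to 2
```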

Step 6: Check for Normality

Check for the P-P Plot

The dots generally line up along a 45 degree line. So, we have normality of residuals.

We have also generated a P-P plot to check that our residuals are normally distributed. We can use this plot to compare the observed residuals with what we'd expect if they were normally distributed (represented by the diagonal line). The dependent variable, job satisfaction, is also nicely distributed.
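
A sketch of how such a P-P plot is constructed (SPSS builds it for you; the residuals here are simulated):

```python
# A normal P-P plot: the cumulative probability a normal distribution
# assigns each sorted, standardized residual, plotted against its
# observed cumulative proportion. Points hugging the 45-degree line
# indicate normally distributed residuals.
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
resid = rng.normal(size=200)                       # simulated residuals
z = np.sort((resid - resid.mean()) / resid.std(ddof=1))

theoretical = stats.norm.cdf(z)                    # expected cumulative probability
empirical = (np.arange(1, z.size + 1) - 0.5) / z.size   # observed proportion

plt.plot(theoretical, empirical, "o", markersize=3)
plt.plot([0, 1], [0, 1])                           # 45-degree reference line
plt.xlabel("Expected cumulative probability")
plt.ylabel("Observed cumulative probability")
plt.show()
```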

The scatterplot of the standardized residuals vs. the predicted values is elliptical, as it should be; there is no pattern here. Thus, all assumptions are met.

Interpreting Simple Linear Regression SPSS/PASW Output

Simple Linear Regression Descriptive and Correlations output

The Descriptive Statistics table simply provides the mean and standard deviation for both your explanatory and outcome variables and shows that all 200 cases were used. This will be useful when we write up the model.

More useful is the Correlations table, which provides a correlation matrix along with probability values for all variables. As we only have two variables, there is only one correlation coefficient. It shows that the variables correlate at -0.65, which is a moderately strong negative correlation.

The Variables Entered/Removed Box tells us that the only predictor in the model was burnout,
and the only dependent variable was job satisfaction. Also, we used the Enter Method.
SPSS Simple linear regression model output

The Model Summary provides the correlation coefficient and the coefficient of determination (R²) for the regression model. First we get R, the correlation, which here indicates a moderately strong negative relationship. The R Square of 0.423 (note that (-0.65)² ≈ 0.42) tells us the proportion of variance in job satisfaction accounted for by burnout: 42.3% of the variance in job satisfaction was predicted from the level of burnout.

The ANOVA table tells us whether our regression model explains a statistically significant proportion of the variance. Specifically, it uses a ratio to compare how well our linear regression model predicts the outcome with how accurate simply using the mean of the outcome data as an estimate would be. Hopefully our model predicts the outcome more accurately than if we were just guessing the mean every time! Given the strength of the correlation it is not surprising that our model is statistically significant (p < .0005). It is another way of looking at our regression model, and it tells us that our model with one predictor works better than simply predicting the mean. The significance value of .000 (i.e., p < .0005) means that our model using burnout as a predictor was significantly better than prediction without burnout in the model. There is a statistical relationship between the predictor and the outcome variable.

The Coefficients table gives us the values for the regression line.

There are two types of coefficients in regression: standardized and unstandardized. First we need to find our a and b to plug into our regression equation. We notice a column for b but no column for a; in fact both a and b are in the column labeled with a capital B (a is the constant, so it appears in the Constant row). Here the value of a is 235.459 while the b value is -2.11. The b coefficient has a t value associated with it - a t-test that checks whether adding the variable as a predictor improves the predictive ability of the model. If the t-test for a coefficient is not statistically significant, that tells you the predictor does not add to your model, so you ignore it. If it is significant, as it is here, then look at the coefficient: is it positive or negative? That tells you whether the dependent variable will increase or decrease with an increase in the predictor.

For every one-unit increase in the predictor (burnout), the outcome variable changes by the unstandardized coefficient value. This coefficient is negative, so we read it as: for every one-unit increase in burnout, job satisfaction will decrease by 2.112 points. The standardized Beta is interpreted as: for every one standard deviation increase in burnout, job satisfaction will decrease by 0.65 of a standard deviation.
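
In simple regression the standardized Beta equals the correlation r, because standardizing both variables rescales the slope by their standard deviations. A sketch with simulated (hypothetical) data:

```python
# Standardized Beta = unstandardized slope * (SD of x / SD of y);
# in simple linear regression this equals the correlation r.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(10, 110, 200)                    # burnout (hypothetical)
y = 235.46 - 2.11 * x + rng.normal(0, 25, 200)   # job satisfaction

b = np.polyfit(x, y, 1)[0]                  # unstandardized slope
beta = b * x.std(ddof=1) / y.std(ddof=1)    # standardized Beta
r = np.corrcoef(x, y)[0, 1]                 # correlation
print(f"Beta = {beta:.3f}, r = {r:.3f}")    # the two values match
```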

SPSS simple linear regression residuals output

You will also note that you have a new variable in your data set: ZRE_1 (you may want to re-
label this so it is a bit more user friendly!). This provides the standardized residuals for each
of your participants and can be analyzed to answer certain research questions. Residuals are a
measure of error in prediction so it may be worth using them to explore whether the model is
more accurate for predicting the outcomes of some groups compared to others.

Regression Equation

Ŷ = a + bX

Ŷ = 235.46 - 2.11(X)

Prediction: Four more employees are measured for burnout. What do you predict their job satisfaction will be? The numbers below represent X; plug them into the formula.

a. 25
b. 50
c. 70
d. 120
Answer:

a. 182.71
b. 129.96
c. 87.76
d. -17.74
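
A quick way to verify these answers is to evaluate the fitted equation directly:

```python
# Plugging the four burnout scores into Y-hat = 235.46 - 2.11 * X
# reproduces the answers above.
def predict_satisfaction(burnout):
    return 235.46 - 2.11 * burnout

for x in (25, 50, 70, 120):
    print(f"burnout = {x:3d} -> predicted satisfaction = {predict_satisfaction(x):.2f}")
```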

Summary
 Regression analysis is a simple statistical tool used to model the dependence of a
variable on one (or more) explanatory variables. The functional relationship may then
be formally stated as an equation, with associated statistical values that describe how
well this equation fits data. 
 One variable, denoted x, is regarded as the predictor, explanatory, or independent
variable.
 The other variable, denoted y, is regarded as the response, outcome, or dependent
variable.

Step 1: Make a Scatterplot
Go to Graphs > Chart Builder > Gallery > Scatter Plot > drag Simple Scatter into the canvas > move the x variable and the y variable into their axis boxes > click OK.
Step 2: Add the Regression Line
Double-click anywhere in the scatter plot to open the Chart Editor. In the Chart Editor, add a regression line by clicking Add Fit Line at Total, then close the Chart Editor.
Step 3: Conduct a Regression Analysis
Go to Analyze > Regression > Linear.
Next, click Plots > move *ZPRED (standardized predicted values) to X and *ZRESID (standardized residuals) to Y > check Histogram > check Normal probability plot > click Continue > OK.

Step 4: Check for Outliers
Check for outliers in the Residuals Statistics box; standardized residuals should not exceed -3.29 or +3.29.

Step 5: Check for Independence of Observations
We check this by examining the independence of errors using the Durbin-Watson test.

Step 6: Check for Normality
Check the P-P plot of the residuals.

