Lab: Box-Cox and Multiple Linear Regression
(b) That the errors, ϵi, have constant variance. That is, the variation in the errors is theoretically the
same regardless of the value of x or ŷ.
• y versus x.
Assumption 1d is assessed with an autocorrelation (ACF) plot of the residuals. Assumption 1e is assessed
with a normal probability plot, and is considered the least crucial of the assumptions. We will see how
to generate the relevant graphical displays to help us assess whether the assumptions are met.
For this example, we will use the dataset "gpa.txt". As there are no headings in this dataset, it is often
convenient to use the function colnames to name each column.
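For instance, a minimal sketch of reading in the data and naming the columns (assuming the file is in the working directory and that the first column is GPA and the second is ACT, as used below):
gpa <- read.table("gpa.txt", header=FALSE)
colnames(gpa) <- c("GPA", "ACT")   # name the columns; this column order is an assumption
attach(gpa)   # so GPA and ACT can be referred to directly, as in the commands below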
We would like to construct a linear regression model of the variable GPA against ACT. In the
following exercises, we will investigate whether the regression assumptions are met.
(a) Let us first take a look at the scatterplot of GPA against ACT. In addition to the scatterplot,
we would also like to add the estimated regression line to the plot:
result<-lm(GPA~ACT)
plot(ACT,GPA, main="GPA against ACT")
abline(result, col="red")
The function abline() overlays a line on the plot. In this case, it overlays the estimated regression
line from result. What are the features to look out for in a scatterplot of the response against
the predictor?
(b) It is usually easier to assess the regression assumptions using a residual plot (residuals plotted
against either the fitted values or the predictor).
As you may recall from previous labs, $ can be used to access the components that are part
of a more complex object in R. For instance, the function lm returns a data object (which
we save as result) that contains many sub-components. The fitted values and the residuals of
the regression model are stored in vectors named fitted.values and residuals.
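As a sketch, a residual plot of the residuals against the fitted values could be produced as follows:
plot(result$fitted.values, result$residuals, xlab="Fitted values", ylab="Residuals", main="Residual plot")
abline(h=0, col="red")   # horizontal reference line at zero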
Based on this plot, what will you say about the regression assumptions?
(c) To assess independence of errors, we examine an autocorrelation (ACF) plot of the residuals.
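For example, the acf() function in base R produces such a plot from the stored residuals:
acf(result$residuals, main="ACF of residuals")   # bars inside the dashed bands are consistent with uncorrelated errors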
(d) The last thing we need to check is the normality assumption. We use a normal probability plot for
this.
qqnorm(result$residuals)
qqline(result$residuals, col="red")
The first command, qqnorm, draws a plot of the quantiles of the estimated residuals from our linear
regression model against the theoretical quantiles of the errors under the normality assumption. The
second command, qqline, adds a reference line to the first plot to make it easier to examine
whether the distribution of the residuals is consistent with the normality assumption.
Based on this plot, what would you say about the assumption regarding normality of the error terms?
(e) In this part of the lab, we will go through the procedure for carrying out the lack of fit (LOF)
test to check whether the linearity assumption is reasonable, carry out the Box-Cox transformation, and
apply relevant transformation(s) to the predictor and/or response variable. For this example, we
will use the dataset "training.txt". This data set comes from an experiment that investigates
the impact of the number of days of training (first column, the predictor) on the performance scores
(second column, the response variable). Read the data into R and name the columns accordingly.
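As a sketch (assuming the file is in the working directory), the data can be read in and the columns named to match the variable names Training and Performance used below:
training_data <- read.table("training.txt", header=FALSE)
colnames(training_data) <- c("Training", "Performance")
attach(training_data)   # so Training and Performance can be used directly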
i. Generate the scatterplot and residual plot (a sketch is given below). Comment on whether the regression assumptions
are satisfied. What are the consequences of violating these assumptions?
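A minimal sketch of the two plots, re-using the name result so that the boxcox() call in part ii works unchanged:
result <- lm(Performance~Training)
plot(Training, Performance, main="Performance against Training")
abline(result, col="red")
plot(result$fitted.values, result$residuals, xlab="Fitted values", ylab="Residuals", main="Residual plot")
abline(h=0, col="red")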
ii. Suppose we want to apply a Box-Cox transformation. To produce a plot of the profile log-
likelihoods for the parameter, λ, of the Box-Cox power transformation, type
library(MASS)
boxcox(result)
The boxcox() function is stored in the MASS library. You need to load this library to use this
function. What do you notice? For the boxcox() function, there is an optional argument
called lambda which allows us to change the range of λ for the Box-Cox transform. Type
?boxcox to see how to specify this argument.
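For instance, to evaluate the profile log-likelihood on a narrower, finer grid of λ values (the range shown here is only an illustration):
boxcox(result, lambda = seq(-0.5, 1.5, by = 0.01))   # illustrative range and step size for lambda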
iv. Next, we perform a lack of fit (LOF) test. To produce the ANOVA table associated with an
LOF test, type
reduced<-lm(Performance~Training)
full<-lm(Performance~0 + as.factor(Training))
anova(reduced,full)
Here we still use the function lm to construct the full regression model, with the following
modifications. First, we use the command as.factor(Training) to let R treat the variable
Training as a categorical variable, so that R will focus solely on the different levels of the variable
Training. Second, the regression model is specified by Performance~0 + as.factor(Training).
Here 0 + as.factor(Training) is used to specify a regression model without the intercept
term (which is not needed when fitting the full model in this form).
Solely based on the p-value of the LOF test, what conclusion can you draw?
vi. What transformation would you use? Apply the transformation to the data, perform the regres-
sion, and check whether the assumptions are met (an illustrative sketch is given below).
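Purely as an illustration (the actual choice should follow from your Box-Cox plot), if the profile log-likelihood peaked near λ = 0.5, a square-root transformation of the response could be fitted and checked like this:
sqrt_result <- lm(sqrt(Performance)~Training)   # hypothetical choice of transformation, for illustration only
plot(sqrt_result$fitted.values, sqrt_result$residuals, xlab="Fitted values", ylab="Residuals", main="Residual plot")
abline(h=0, col="red")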
2. Multiple Linear Regression in R
Here we will investigate the "Bears.txt" dataset, which contains information on 19 female wild bears.
The variables are Age (age in months), Neck (neck girth in inches), Length (length of bear in inches),
Chest (chest girth in inches), and Weight (weight of bear in pounds).
(a) Before fitting the multiple linear regression model, create separate plots of Weight against each
predictor (a sketch is given below). Comment on the association between Weight and each predictor.
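A sketch of one way to do this (assuming Bears.txt has a header row with the variable names listed above; otherwise, read it without a header and name the columns with colnames as before):
bears <- read.table("Bears.txt", header=TRUE)
attach(bears)
par(mfrow=c(2,2))   # arrange the four plots in a 2-by-2 grid
plot(Age, Weight, main="Weight against Age")
plot(Neck, Weight, main="Weight against Neck")
plot(Length, Weight, main="Weight against Length")
plot(Chest, Weight, main="Weight against Chest")
par(mfrow=c(1,1))   # reset the plotting layout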
(b) Fit a multiple regression model using Weight as the response and the other variables as predictors.
To use lm() for multiple regression, type
result<-lm(Weight~Age+Neck+Length+Chest)
(c) As in simple linear regression, the command summary(result) can be used to display the key infor-
mation from the regression results. Check the result of the above multiple linear regression. Also conduct
four different simple linear regressions of Weight against each of the four predictors (see the sketch below). Compare the
results with the result from the multiple linear regression. Do you think the multiple linear regression model
is appropriate for this data set?
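A sketch of the four simple linear regressions, whose summaries can then be compared with the multiple regression from part (b):
summary(result)              # multiple regression fitted in part (b)
summary(lm(Weight~Age))
summary(lm(Weight~Neck))
summary(lm(Weight~Length))
summary(lm(Weight~Chest))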