
Lab 6 - Shell

This document describes a lab on performing ANOVA in R. The learning objectives are to learn how to perform ANOVA in R using both step-by-step methods and functions, and to perform investigations of the ANOVA model assumptions. The document contains exercises using a dataset of video game reviews to determine if different platforms have different average review scores, and using the iris dataset to determine if species have different average sepal lengths. The results of these analyses support rejecting the null hypotheses and concluding that platforms and species differ in their average scores/lengths.

Uploaded by

Mansi

Lab 6 - ANOVA 1

Mansi Kumari (7908159)

2023-03-03

Learning Objectives

By the end of this lab, you should have a grasp on the following concepts:

• How to perform ANOVA in R, both step-by-step and with an easy R function.

• How to perform a simple investigation of the model assumptions.

Instructions

To complete this worksheet, add code as needed into the R code chunks given below. Do not delete the
question text. All text should be in complete English sentences. Be sure to change the author of this file to
reflect your name and student number.
To properly see the questions, knit this .Rmd file to .pdf and view the output. You will have a link in your
email that takes you to the Crowdmark submission page. Once you have completed the worksheet, knit it
to .pdf and upload your output to Crowdmark.

Exercises
Import the Games200 dataset. This dataset contains a random sample of 200 games released in 2019, along
with the metascore (average critic review), the userscore (average user review), and platform of release.

Games200 <- read.csv("~/Downloads/Games200.csv")

Our goal is to determine whether each video game platform receives the same metascore on average, or not,
based on this sample.
Make a boxplot comparing the metascores for each platform.

boxplot(Metascore ~ Platform, data = Games200)

[Figure: side-by-side boxplots of Metascore (y-axis, roughly 50 to 90) by Platform (x-axis: PC, PlayStation 4, Switch, Xbox One).]

Use aggregate to calculate the mean of each group

aggregate(Metascore ~ Platform, data = Games200, FUN = mean)

##        Platform Metascore
## 1            PC  74.63462
## 2 PlayStation 4  71.48889
## 3        Switch  72.24675
## 4      Xbox One  78.11538

Use aggregate to determine the sample size of each group.

aggregate(Metascore ~ Platform, data = Games200, FUN = length)

##        Platform Metascore
## 1            PC        52
## 2 PlayStation 4        45
## 3        Switch        77
## 4      Xbox One        26

Calculate the overall mean.

mean(Games200$Metascore)

## [1] 73.46

Calculate the SSG by hand, using your earlier calculations.

my.SSG <- 52*(74.63-73.46)^2 + 45*(71.48-73.46)^2 + 77*(72.25-73.46)^2 + 26*(78.12-73.46)^2

my.SSG

## [1] 924.9421

Calculate the MSG by hand, using your earlier calculations.

my.MSG <- my.SSG/(4 - 1)

my.MSG

## [1] 308.314
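
The hand calculation above plugs in group means rounded to two decimals. As a minimal sketch, assuming only the group sizes and unrounded means printed by the earlier aggregate calls, the same SSG and MSG can be computed with vectors. The names n, xbar, grand, SSG and MSG are illustrative and not part of the lab template.

```r
# Group sizes and unrounded group means, copied from the aggregate() output.
n    <- c(PC = 52, PS4 = 45, Switch = 77, XboxOne = 26)
xbar <- c(PC = 74.63462, PS4 = 71.48889, Switch = 72.24675, XboxOne = 78.11538)

# The overall mean is the size-weighted average of the group means.
grand <- sum(n * xbar) / sum(n)

# Between-group sum of squares, and its mean square on k - 1 degrees of freedom.
SSG <- sum(n * (xbar - grand)^2)
MSG <- SSG / (length(n) - 1)

print(round(c(grand = grand, SSG = SSG, MSG = MSG), 2))
```

With unrounded means the SSG comes out near 923.4 rather than 924.94. The difference is rounding error from the two-decimal means, not a different formula.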

Use the aggregate function with var to find the sample variances, and then from there find the SSE.

aggregate(Metascore ~ Platform, FUN = var, data = Games200)

##        Platform Metascore
## 1            PC  58.78544
## 2 PlayStation 4  68.84646
## 3        Switch  57.42515
## 4      Xbox One  43.06615

my.SSE <- 51*58.79 + 44*68.85 + 76*57.43 + 25*43.07

my.SSE

## [1] 11469.12

Calculate the MSE by hand, using your earlier calculations.

my.MSE <- my.SSE/(200 - 4)

my.MSE

## [1] 58.51592
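
The SSE step relies on the identity that each group's sum of squared deviations equals (n - 1) times its sample variance. A short sketch of that pooling, assuming the group sizes and variances printed above (the names n, s2, SSE and MSE are illustrative):

```r
# Group sizes and sample variances, copied from the aggregate() output.
n  <- c(52, 45, 77, 26)
s2 <- c(58.78544, 68.84646, 57.42515, 43.06615)

# Each group contributes (n_i - 1) * s_i^2 to the error sum of squares.
SSE <- sum((n - 1) * s2)

# The error degrees of freedom are N - k: total sample size minus number of groups.
MSE <- SSE / (sum(n) - length(n))

print(round(c(SSE = SSE, MSE = MSE), 2))
```

The MSE is the pooled variance estimate, so it should fall between the smallest and largest group variances, as it does here.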

Calculate the F test statistic, using your earlier calculations.

my.F <- my.MSG/my.MSE
my.F

## [1] 5.268892

Use pf to find the P-value for this test.

1 - pf(my.F, df1 = 3, df2 = 196)

## [1] 0.001622573
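
The P-value is the upper-tail area of the F distribution beyond the observed statistic. As a hedged alternative to subtracting from 1, pf accepts lower.tail = FALSE, which returns the same area directly. The sketch below hard-codes the F statistic from the output above; the names p.upper and p.compl are illustrative.

```r
# F statistic from the by-hand calculation, with 3 and 196 degrees of freedom.
my.F <- 5.268892

# Upper-tail area computed directly.
p.upper <- pf(my.F, df1 = 3, df2 = 196, lower.tail = FALSE)

# The complement form used in the lab gives the same value here.
p.compl <- 1 - pf(my.F, df1 = 3, df2 = 196)

print(c(p.upper = p.upper, p.compl = p.compl))
```

The two forms agree for this test. The lower.tail = FALSE form is usually preferred when the P-value is extremely small, because subtracting from 1 can lose precision.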

What is your conclusion?

The p-value is 0.00162. Because this is below 0.05, we reject the null hypothesis at the 5% level of significance. We
have sufficient evidence to conclude that not all platforms have the same mean metascore.
Repeat the earlier test, using the aov function.

my.aov <- aov(Metascore ~ Platform, data = Games200)

Use the summary function to print out the ANOVA results.

summary(my.aov)

##              Df Sum Sq Mean Sq F value  Pr(>F)
## Platform      3    923  307.80   5.261 0.00164 **
## Residuals   196  11468   58.51
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Create a histogram of the residuals of the ANOVA model

hist(my.aov$residuals)

[Figure: "Histogram of my.aov$residuals"; x-axis my.aov$residuals (about −20 to 20), y-axis Frequency (0 to 50).]

What does this tell you about your Normality assumption?

Use the aggregate function with sd to find the standard deviations of each group.

aggregate(Metascore ~ Platform, FUN = sd, data = Games200)

##        Platform Metascore
## 1            PC  7.667167
## 2 PlayStation 4  8.297377
## 3        Switch  7.577939
## 4      Xbox One  6.562481

What does this tell you about your equal-variances assumption?
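
One way to approach this question is the rule of thumb applied later in this lab to the iris data: the equal-variances condition is treated as reasonable when no group standard deviation is more than twice another. A minimal sketch of that check, assuming the standard deviations printed above (the names sds and ratio are illustrative):

```r
# Group standard deviations, copied from the aggregate() output.
sds <- c(PC = 7.667167, PS4 = 8.297377, Switch = 7.577939, XboxOne = 6.562481)

# Ratio of the largest to the smallest standard deviation.
ratio <- max(sds) / min(sds)

print(round(ratio, 3))
print(ratio < 2)   # TRUE means the rule of thumb is satisfied
```

This is a rough screening rule rather than a formal test of equal variances.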

Next we will do ANOVA on the iris dataset. Use the data function to load in this dataset.

data(iris)

This dataset contains the petal and sepal lengths and widths (in cm) for a sample of 150 iris flowers. They
are divided by their species: iris setosa, iris virginica, and iris versicolor.
We will do an analysis to determine if their sepal lengths differ significantly, on average.
Exercise: Write the hypotheses for this test in TeX

$H_0: \mu_{\text{Setosa}} = \mu_{\text{Virginica}} = \mu_{\text{Versicolor}}$ vs. $H_a$: Not all means are equal

Exercise: Use the aov function to conduct a hypothesis test at the 5% level of significance to
determine whether the mean sepal lengths are equal for all species.

my_aov <- aov(Sepal.Length ~ Species, data = iris)

summary(my_aov)

##              Df Sum Sq Mean Sq F value Pr(>F)
## Species       2  63.21  31.606   119.3 <2e-16 ***
## Residuals   147  38.96   0.265
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
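
Because iris ships with R, the step-by-step method from the first half of the lab can be rerun on it as a cross-check of the aov table. The following is a sketch under that assumption, using tapply in place of aggregate; all object names other than iris and its columns are illustrative.

```r
data(iris)

# Per-species sample sizes, means and variances of sepal length.
n    <- tapply(iris$Sepal.Length, iris$Species, length)
xbar <- tapply(iris$Sepal.Length, iris$Species, mean)
s2   <- tapply(iris$Sepal.Length, iris$Species, var)

grand <- mean(iris$Sepal.Length)
k <- length(n)   # number of groups
N <- sum(n)      # total sample size

# Between-group and within-group sums of squares.
SSG <- sum(n * (xbar - grand)^2)
SSE <- sum((n - 1) * s2)

# F statistic and its upper-tail P-value.
F.stat  <- (SSG / (k - 1)) / (SSE / (N - k))
p.value <- pf(F.stat, df1 = k - 1, df2 = N - k, lower.tail = FALSE)

print(round(c(SSG = SSG, SSE = SSE, F = F.stat), 3))
```

The sums of squares and F statistic should match the Species and Residuals rows of the summary table up to rounding.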

Exercise: Give a fully-worded conclusion to this test.

Because the p-value (< 2e-16) is below 0.05, we reject the null hypothesis. There is sufficient evidence at the
5% level of significance to conclude that the mean sepal lengths are not equal for all species.
Exercise: Check whether the ANOVA model assumptions appear to be accurate.

hist(my_aov$residuals)

[Figure: "Histogram of my_aov$residuals"; x-axis my_aov$residuals (about −2.0 to 1.5), y-axis Frequency (0 to 60).]

aggregate(Sepal.Length ~ Species, data = iris, FUN = sd)

##      Species Sepal.Length
## 1     setosa    0.3524897
## 2 versicolor    0.5161711
## 3  virginica    0.6358796

The residuals appear approximately normal, and no group's standard deviation is more than twice that of
another (the largest ratio is about 1.8), so the conditions of the test appear to be satisfied.
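
The two checks above can also be expressed numerically. The sketch below, which assumes only the built-in iris data, computes the ratio of the largest to smallest group standard deviation and confirms that the ANOVA residuals are simply each observation minus its own species mean. The names fit, sds, ratio and res.hand are illustrative.

```r
data(iris)
fit <- aov(Sepal.Length ~ Species, data = iris)

# Equal-variances rule of thumb: largest SD should be less than twice the smallest.
sds   <- tapply(iris$Sepal.Length, iris$Species, sd)
ratio <- max(sds) / min(sds)

# Residuals by hand: each observation minus the mean of its own species.
res.hand <- iris$Sepal.Length - ave(iris$Sepal.Length, iris$Species)

print(round(ratio, 3))
print(isTRUE(all.equal(res.hand, unname(residuals(fit)))))
```

Since the residuals are deviations from the group means, their histogram pools the within-group shapes, which is why a single histogram is used to judge the Normality condition.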
