R Guide: Descriptive & Inferential Stats

This document provides an overview of descriptive and inferential statistics methods using R. It introduces importing data, descriptive statistics for categorical and numerical variables including frequency tables, histograms, boxplots and scatter plots. Inferential statistics methods covered include hypothesis testing for one and two means and one and two proportions. Exercises are provided to help learn and apply these statistical techniques in R.

Uploaded by

Trần Thị Bích Thảo 3KT -19

Lab 2021-Probability and Statistics

Descriptive and Inferential statistics

With R
Contents
1. Objectives
2. Import data set into R
3. Descriptive statistics
3.1. Categorical variables
a. One categorical variable
b. Two categorical variables
3.2. Numerical variables
a. One numerical variable
b. Two numerical variables
4. Inferential statistics
4.1. Hypothesis testing of mean(s)
a. One mean
b. Two means
4.2. Hypothesis testing of proportion(s)
a. One proportion
b. Two proportions
4.3. Checking assumptions (optional)

1. Objectives
- Read data file into R, introduction to data environment in R
- Descriptive statistics methods for categorical and numerical variables
- Basic Inferential statistics methods
2. Import data set into R

Before reading a data file into R, it is a good idea to create your own folder for the
necessary materials and the output of your working sessions. To check the
current working directory, use:

o getwd()

The output of this command is the current working folder, which R chooses automatically.
If you wish to change to your own folder, use:

o setwd("mydirectory")
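
The two commands above can be tried safely with the minimal sketch below. It assumes a throwaway folder under R's temporary directory (the path and the names old_dir and my_dir are illustrative only, not part of the lab), and it restores the original working directory at the end.

```r
# Sketch: inspect and change the working directory, then restore it.
old_dir <- getwd()                      # remember the current folder

# An illustrative folder inside R's temporary directory (assumption)
my_dir <- file.path(tempdir(), "Learning R")
dir.create(my_dir, showWarnings = FALSE)

setwd(my_dir)                           # switch to the new folder
in_new_dir <- basename(getwd())         # folder name R now reports
list.files()                            # files visible from this folder

setwd(old_dir)                          # go back to where we started
```

Once the working directory is set, file names passed to reading functions are resolved relative to it, so the full path does not have to be typed each time.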

Exercise 1: Create a folder in your E directory and label it Learning R. Copy and paste
all the files to be used for this lab into that directory.

There are three main types of data file: comma-separated values files (.csv), text files (.txt),
and Excel files (.xls or .xlsx). We will focus on the first, using the built-in function
read.csv():

o ourDataframe <- read.csv("filename", stringsAsFactors = FALSE)

filename: name of the delimited text file

stringsAsFactors: a logical value that tells R whether to convert character data into
factors. If you set stringsAsFactors = FALSE, R will not automatically
convert character variables into factors (if you need factors later on, you must
convert them yourself). Since R 4.0.0, FALSE is the default. Factors can be understood
simply as categorical variables. read.csv() returns a data frame, and we assign it to
ourDataframe so we can refer to it again and again.

Now, let’s read in the [Link] dataset from your working directory.

o data1 <- read.csv("[Link]", stringsAsFactors = FALSE)
o head(data1) # to see the first 6 rows of your data frame
o str(data1) # to check the structure of the variables
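
The reading step can be rehearsed without the lab file. The sketch below assumes that R's built-in mtcars data, which has the same variable names used later in this lab (mpg, am, gear, wt), is an acceptable stand-in; the file name cars_example.csv is hypothetical.

```r
# Assumption: the built-in mtcars data stands in for the lab's CSV file.
csv_path <- file.path(tempdir(), "cars_example.csv")   # hypothetical file name
write.csv(mtcars, csv_path, row.names = FALSE)         # create a CSV to read back

data1 <- read.csv(csv_path, stringsAsFactors = FALSE)

head(data1)        # first 6 rows (the default)
head(data1, 3)     # first 3 rows only
str(data1)         # type and first values of each column
dim(data1)         # number of rows and columns
```
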
Exercise 2.

a. What does the following code do?

o extracted1 <- data1[1:10,]
o extracted2 <- data1[,1:3]
b. Select all the rows of the data1 data frame that have mpg of at least 20.0.
c. (Optional) Change the name of variables.
3. Descriptive statistics
3.1. Categorical variables

R does not treat a numerically coded variable as categorical unless we convert it into a
factor with the function factor(). For example, the variable am has only two values, so
we will convert it into a factor to use it as a categorical variable (for example, when we
draw a frequency table or a bar chart).

o data1$am<-factor(data1$am,ordered=FALSE,levels=c(0,1),
labels=c("automatic","manual"))
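
To see what the conversion changes, the sketch below applies factor() to a small made-up vector of 0/1 codes (the vector codes is invented for illustration) and compares how R summarises the numbers and the factor.

```r
# Toy 0/1 codes, made up for illustration
codes <- c(0, 1, 1, 0, 1)

# Map each code to a label; the order of labels follows the order of levels
f <- factor(codes, levels = c(0, 1), labels = c("automatic", "manual"))

is.factor(f)       # TRUE
levels(f)          # the category labels
summary(codes)     # numeric summary: treats the codes as numbers
summary(f)         # factor summary: counts per category
```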

Exercise 3.

Convert the gear variable into a factor, using the labels “3 gears”, “4 gears”,…

a. One categorical variable

The descriptive methods for categorical variables include (relative) frequency tables and
bar charts. The procedure can be carried out in R as follows:

o # create frequency table (am.freq is a placeholder name)

o am.freq <- table(data1$am)
o prop.table(am.freq) # relative frequency table
o prop.table(am.freq) * 100 # percent frequency table
o # create bar chart
o barplot(am.freq, col = "skyblue", main = "Barplot of Types of Transmission",
xlab = "Types of Transmission", ylab = "Frequency")
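
As a self-contained illustration of the same steps, the sketch below assumes the built-in mtcars data in place of data1; the names am, counts and props are illustrative. It plots relative frequencies rather than counts, which is one possible variation.

```r
# Assumption: built-in mtcars stands in for data1
am <- factor(mtcars$am, levels = c(0, 1), labels = c("automatic", "manual"))

counts <- table(am)             # frequency table
props <- prop.table(counts)     # relative frequencies (they sum to 1)
round(props * 100, 1)           # percentages, rounded to one decimal

# Bar chart of the relative frequencies
barplot(props, col = "skyblue",
        main = "Types of Transmission",
        xlab = "Types of Transmission", ylab = "Relative frequency")
```
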
Exercise 4.

Adjust the y-axis limit if it cannot cover the length of bars.

b. Two categorical variables

We can use a clustered or stacked bar graph to describe two categorical variables
simultaneously. First, we need to form the cross-tabulation table:

o am.gear <- table(data1$am, data1$gear) # am.gear is a placeholder name
o barplot(am.gear, col = c("red", "yellow"), beside = TRUE,
ylim = c(0, 20)) # clustered bar graph
o barplot(am.gear, col = c("red", "yellow")) # stacked bar graph
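
A cross-tabulation is easier to read with totals and column proportions next to it. The sketch below assumes the built-in mtcars data in place of data1 (the names am, gear and cross are illustrative) and adds a legend so the two colors can be told apart.

```r
# Assumption: built-in mtcars stands in for data1
am <- factor(mtcars$am, levels = c(0, 1), labels = c("automatic", "manual"))
gear <- factor(mtcars$gear)

cross <- table(am, gear)             # rows: transmission, columns: gears
addmargins(cross)                    # same table with row and column totals
col_props <- prop.table(cross, margin = 2)   # proportions within each column

# Clustered bar graph; legend.text labels the bars by row name
barplot(cross, beside = TRUE, col = c("red", "yellow"),
        legend.text = rownames(cross), ylim = c(0, 20))
```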

Exercise 5. For the above bar graphs:

a. Add a title for the graph and labels for the two axes.
b. Use different colors for the bars. You can use the colors() function to list the
names of the available colors in R.
3.2. Numerical variables
a. One numerical variable

Measures of central tendency and relative standing: the mean, median, Q1, Q3, minimum
and maximum are all computed by the summary() function.

o summary(data1$mpg)

Measures of variability: the sample variance and sample standard deviation of a variable
are available in R through the functions var() and sd(). The interquartile range is given
by the function IQR().

o var(data1$mpg)
o sd(data1$mpg)
o IQR(data1$mpg)
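
To make clear what these functions compute, the sketch below reproduces them by hand, assuming the built-in mtcars data in place of data1. Note that var() and sd() use the sample formulas with n - 1 in the denominator.

```r
# Assumption: built-in mtcars stands in for data1
x <- mtcars$mpg
n <- length(x)

# Sample variance: sum of squared deviations divided by n - 1
manual_var <- sum((x - mean(x))^2) / (n - 1)
manual_sd <- sqrt(manual_var)

# Interquartile range: third quartile minus first quartile
q <- quantile(x, c(0.25, 0.75))
manual_iqr <- unname(q[2] - q[1])

# Each comparison should report TRUE
all.equal(manual_var, var(x))
all.equal(manual_sd, sd(x))
all.equal(manual_iqr, IQR(x))
```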

Stem and leaf display:

o stem(data1$mpg)

Histogram:

o hist(data1$mpg)

Box plot:

The following commands work with box plots (for numerical data):

o boxplot(data1$mpg)
o boxplot.stats(data1$mpg) # to obtain the statistics used to construct the boxplot
o boxplot(data1$mpg ~ data1$gear) # boxplots of mpg for each value of the gear variable
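
The output of boxplot.stats() can be unpacked as in the sketch below, which assumes the built-in mtcars data in place of data1. It also recomputes the usual 1.5 times hinge-spread rule that decides which points are drawn as outliers; the names bs, lower_fence, upper_fence and flagged are illustrative.

```r
# Assumption: built-in mtcars stands in for data1
x <- mtcars$mpg
bs <- boxplot.stats(x)

bs$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$n       # number of non-missing observations
bs$out     # points beyond the whiskers (possibly none)

# Points further than 1.5 hinge spreads from the box are flagged as outliers
hinge_spread <- bs$stats[4] - bs$stats[2]
lower_fence <- bs$stats[2] - 1.5 * hinge_spread
upper_fence <- bs$stats[4] + 1.5 * hinge_spread
flagged <- x[x < lower_fence | x > upper_fence]
```

The hinges are close to Q1 and Q3 but are computed slightly differently from the quartiles reported by summary(), so small discrepancies between the two are expected.
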
b. Two numerical variables

When the dependent and independent variables are both numerical, we use a scatter
plot. From this plot we can read, at a glance, the overall pattern and any possible groups
or outliers in our data set.

Let’s consider the variables mpg and wt (i.e., weight). We produce the scatter plot with
the plot() function:

o plot(data1$mpg~data1$wt)
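
A numerical companion to the scatter plot is the correlation coefficient. The sketch below assumes the built-in mtcars data in place of data1 and uses cor(), which is not covered in this lab, only to put a number on the direction and strength of the linear pattern.

```r
# Assumption: built-in mtcars stands in for data1
# Formula interface: response on the left, explanatory variable on the right
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (wt)", ylab = "Miles per gallon (mpg)")

# Pearson correlation: the sign gives the direction of the linear trend
r <- cor(mtcars$wt, mtcars$mpg)
r
```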

Exercise 6. Add a title to the plot and comments on the trend of this data set. Are there
any outliers or groups of data points?

4. Inferential statistics
4.1. Hypothesis testing of mean(s)
a. One mean:
o t.test(x, alternative = "less", mu = 100)

The above function performs a one-sample t-test on the data contained in x, using a
left-tailed test with H0: μ = 100 against H1: μ < 100.

Other options for alternative are "greater" and "two.sided" (the default).

Note: The conf.level argument allows us to specify the confidence level of the
reported confidence interval (CI) for the relevant parameter in each t-test.
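
The pieces of the t.test() output can be checked by hand. The sketch below uses a simulated sample (invented for illustration, not lab data) and recomputes the test statistic and the left-tail p-value from their definitions.

```r
# Simulated sample, for illustration only
set.seed(1)
x <- rnorm(25, mean = 98, sd = 5)

# Left-tailed test of H0: mu = 100 with a 99% confidence interval
res <- t.test(x, alternative = "less", mu = 100, conf.level = 0.99)
res$statistic   # t statistic
res$p.value     # p-value
res$conf.int    # one-sided interval, unbounded below for "less"

# The same quantities from the formulas
t_manual <- (mean(x) - 100) / (sd(x) / sqrt(length(x)))
p_manual <- pt(t_manual, df = length(x) - 1)   # area to the left of t
```

Reject H0 at significance level α when the p-value is below α.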

Exercise 7: An investor with a stock portfolio worth several hundred thousand dollars
sued his broker and brokerage firm because lack of diversification in his portfolio led to
poor performance. The conflict was settled by an arbitration panel that gave
“substantial damages” to the investor. File [Link] gives the rates of return for the
39 months that the account was managed by the broker. The arbitration panel compared
these returns with the average of the Standard & Poor’s 500-stock index for the same
period. Consider the 39 monthly returns as a random sample from the population of
monthly returns that the brokerage would generate if it managed the account forever.
Are these returns compatible with a population mean of μ = 0.95%, the S&P 500
average? Perform that test at 0.01 level of significance.

b. Two means:
o t.test(y ~ x, data = dataframe, alternative = "two.sided") # if y is a numeric variable and x is a dichotomous variable
o t.test(y1, y2, paired = FALSE, alternative = "two.sided") # if y1 and y2 are numeric, and the samples are independent
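
Both calling styles give the same result when they describe the same data. The sketch below assumes the built-in mtcars data (comparing mpg between the two transmission types); the data frame cars and the vectors auto and manual are illustrative names. By default t.test() does not assume equal variances, so it runs the Welch version of the test.

```r
# Assumption: built-in mtcars data, mpg compared across transmission types
cars <- data.frame(
  mpg = mtcars$mpg,
  am = factor(mtcars$am, levels = c(0, 1), labels = c("automatic", "manual"))
)

# Style 1: formula interface, numeric response ~ two-level grouping variable
res_formula <- t.test(mpg ~ am, data = cars, alternative = "two.sided")

# Style 2: two separate numeric vectors from independent samples
auto <- cars$mpg[cars$am == "automatic"]
manual <- cars$mpg[cars$am == "manual"]
res_vectors <- t.test(auto, manual, paired = FALSE, alternative = "two.sided")

res_formula$method    # which version of the test was run
res_formula$p.value
res_vectors$p.value
```

With the formula interface, a one-sided alternative refers to the first factor level minus the second, so check levels() before choosing "less" or "greater".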

Exercise 8: Load the [Link] dataset. Home values tend to increase over time
under normal conditions, but the recession of 2008 and 2009 has reportedly caused the
sales price of existing homes to fall nationwide (BusinessWeek, March 9, 2009). You
would like to see if the data support this conclusion. The file HomePrices contains data
on 30 existing home sales in 2006 and 40 existing home sales in 2009. Would you feel
justified in concluding that resale prices of existing homes have declined from 2006 to
2009? Why or why not? Use 0.01 level of significance.

Exercise 9: The data come from W.S. Gosset's 1908 paper and are available as a built-in
dataset in R named sleep. Two different sleeping drugs were taken by two groups of patients.
The variable "extra" is the increase in hours of sleep in each group (consisting of 10 patients
each). The variable "group" labels which drug each patient took. Does the
information indicate that the first drug is less effective than the second? Use 0.05
level of significance.

4.2. Hypothesis testing of proportion(s)

Base R does not supply a function that reports the Z statistic for tests of proportion(s);
the built-in prop.test() reports a chi-squared statistic instead. Use the functions
that have been written by FMT teachers to perform the one-proportion Z test and the
two-proportion Z test. You are permitted to use these functions in your project report.
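
The course functions are not reproduced in this handout. As a rough idea of what such functions compute, here is a minimal sketch of both Z tests; the names z_test_one_prop and z_test_two_prop, their arguments, and the counts in the usage lines are all hypothetical and are not the FMT teachers' code.

```r
# Hypothetical helper: p-value of a standard normal statistic z
z_p_value <- function(z, alternative) {
  switch(alternative,
         less = pnorm(z),
         greater = pnorm(z, lower.tail = FALSE),
         two.sided = 2 * pnorm(abs(z), lower.tail = FALSE))
}

# Hypothetical one-proportion Z test: x successes in n trials, H0: p = p0
z_test_one_prop <- function(x, n, p0,
                            alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  p_hat <- x / n
  z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # standard error under H0
  list(p_hat = p_hat, z = z, p_value = z_p_value(z, alternative))
}

# Hypothetical two-proportion Z test with pooled proportion, H0: p1 = p2
z_test_two_prop <- function(x1, n1, x2, n2,
                            alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  p1 <- x1 / n1
  p2 <- x2 / n2
  pooled <- (x1 + x2) / (n1 + n2)
  z <- (p1 - p2) / sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
  list(p1 = p1, p2 = p2, z = z, p_value = z_p_value(z, alternative))
}

# Made-up counts, for illustration only
one <- z_test_one_prop(60, 100, p0 = 0.5, alternative = "greater")
two <- z_test_two_prop(45, 100, 30, 100, alternative = "two.sided")
```

For a two-sided test, the square of z matches the chi-squared statistic from prop.test() when its continuity correction is switched off with correct = FALSE, which gives a quick way to cross-check results.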

a. One proportion

Exercise 10: In a cover story, BusinessWeek published information about sleep habits of
Americans (BusinessWeek, January 26, 2004). The article noted that sleep deprivation
causes a number of problems, including highway deaths. Fifty-one percent of adult
drivers admit to driving while drowsy. A researcher hypothesized that this issue was
an even bigger problem for night shift workers.

i. Formulate the hypotheses that can be used to help determine whether more than
51% of the population of night shift workers admit to driving while drowsy.
ii. A sample of 500 night shift workers identified those who admitted to driving while
drowsy. See the [Link] file. What is the sample proportion? What is the p-
value? At α = .01, what is your conclusion?

b. Two proportions

Exercise 11: JupiterResearch estimated that the U.S. online dating market reached $932
million in 2011 and that the European online dating sites doubled revenues from 243
million euros in 2006 to 549 million euros in 2011. When trying to start a new
relationship, people want to make a favorable impression. Sometimes they will even
stretch the truth a bit when disclosing information about themselves. A study of
deception in online dating studied the accuracy of the information given in their online
dating profiles by 80 online daters. The study found that 22 of 40 men lied about their
height, while 17 of 40 women were deceptive in this way. A difference between the
person’s actual height and that reported in the online dating profile was classified as a
lie if it was greater than 0.5 inches.

i. Find the sample proportion of men who lied about their height. Do the same for
the women.
ii. Do men lie more often about their height than women? State the hypotheses that
can be used to test the assumption. What is the p-value? Use α = 0.05. What is
your conclusion?
4.3. Checking assumptions (optional)

To check the normality assumption of the t-test, we can use the following
methods: a histogram, a QQ-plot, or the Shapiro-Wilk test (with the null hypothesis that
the sample comes from a normally distributed population).
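
The three checks can be run as in the sketch below, which assumes the built-in mtcars data as an example sample; hist(), qqnorm(), qqline() and shapiro.test() are all part of base R.

```r
# Assumption: built-in mtcars provides the example sample
x <- mtcars$mpg

# 1. Histogram: look for a roughly symmetric, bell-shaped outline
hist(x, main = "Histogram of mpg", xlab = "mpg")

# 2. QQ-plot: points close to the reference line suggest normality
qqnorm(x)
qqline(x)

# 3. Shapiro-Wilk test: a small p-value is evidence against normality
sw <- shapiro.test(x)
sw$statistic
sw$p.value
```

A large p-value does not prove normality; it only means the test found no strong evidence against it, so the plots should be read alongside the test.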
