03 Basic Statistical Data Analysis Using Excel

Basic Statistical Data Analysis
Using Microsoft Excel

Glyzel Grace M. Francisco
STAT2100 – Statistical Analysis with Software Application
1st Semester, 2022-2023
Basic Statistical Data Analysis Using Microsoft Excel | 1

CENTRAL LUZON STATE UNIVERSITY
DEPARTMENT of
STATISTICS Learning Outcomes
After completing this chapter, the students must be able to
• Assess the normality of data using Microsoft Excel

• Perform hypothesis testing using Microsoft Excel
• Perform correlation and simple linear regression using Microsoft Excel
GGMFRANCISCO Basic Statistical Data Analysis Using Microsoft Excel | 2

DEPARTMENT of
STATISTICS Normal Distribution
• also called as the Gaussian distribution
• It is a distribution related to continuous data from a statistical analysis

standpoint.
• It is sometimes called the “bell curve,” since it is a bell-shaped and

asymptotic about the x-axis
• Mean = Median = Mode
• Symmetric about the mean (50% of the data will fall below the mean and
half of the data will fall above the mean)

DEPARTMENT of
• Since a number of the most common statistical tests rely on the normality
of a sample or population, it is often useful to test whether the underlying
distribution is normal, or at least symmetric. This can be done via the
following approaches:
o Review the distribution graphically (via histograms, boxplots, QQ plots)
o Analyze the skewness and kurtosis
o Employ statistical tests (esp. Chi-square, Kolmogorov-Smirnov, Shapiro-
Wilk, Jarque-Barre, D’Agostino-Pearson)
• Note: If data is not symmetric, sometimes it is useful to make a
transformation whereby the transformed data is symmetric and so can be
analyzed more easily.

DEPARTMENT of
A normal or Gaussian distribution :

DEPARTMENT of
The following graphs can be used to test whether the data is normally
distributed or not :
• Histogram
• Normal Q-Q plot
• Boxplot
Note: the graphs are not as accurate as the formal test for normality.

DEPARTMENT of
STATISTICS 1. Histogram
➢ A histogram can be used to test

whether data is normally
distributed. This test simply
consists of looking at the
histogram and discerning
whether it approximates the
bell curve of a normal
distribution.

DEPARTMENT of
STATISTICS 2. Q-Q plot
• A Q-Q plot is a short term for “quantile-quantile”

plot
• often used to assess whether or not a set of data

potentially came from some theoretical
distribution.
• In most cases, this type of plot is used to

determine whether or not a set of data follows a
normal distribution.

DEPARTMENT of
• an alternative graphical method of assessing normality to the histogram and

is easier to use when there are small sample sizes. The scatter should lie as
close to the line as possible with no obvious pattern coming away from the
line for the data to be considered normally distributed.

DEPARTMENT of
Example:
Using Q-Q plot determine whether the data set with 20 elements is normally
distributed.
-14 -11 18 37 36 -21 -14 50 43 40
-48 -1 -48 17 -23 18 -50 34 39 -41
0 38 31 -20 27 46 -18 29 -6 -24
-39 -14 -6 -47 45 -27 -37 6 -38 -29
18 49 6 22 45 -24 24 20 4 -43

DEPARTMENT of
Step 1. Enter and sort the data Step 2. Find the rank of each data value
=RANK(number, ref, [order])
(…) 0 – Descending
(…) 1 - Ascending
Note: When a formula contains an absolute

reference, no matter which cell the formula occupies
the cell reference does not change: if you copy or
move the formula, it refers to the same cell as it did
in its original location. In an absolute reference, each
part of the reference (the letter that refers to the
row and the number that refers to the column) is
preceded by a “$” – for example, $A$1 is an absolute
reference to cell A1. Wherever the formula is copied
or moved, it always refers to cell A1.

DEPARTMENT of
Step 3. Find the cumulative

= (𝑖 − 0.5)/𝑛
probability of getting less than or
theoretical z-score of the z-score = (𝑖 − 0.5)/𝐶𝑂𝑈𝑁𝑇(𝑑𝑎𝑡𝑎 𝑟𝑎𝑛𝑔𝑒)
of i.
𝑷𝒓𝒐𝒃. 𝒐𝒇 𝒊 = (𝑖 − 0.5)/𝑛

DEPARTMENT of
= 𝑁𝑂𝑅𝑀. 𝑆. 𝐼𝑁𝑉(𝑃𝑟𝑜𝑏 . 𝑜𝑓 𝑖)
Step 4. Find the theoretical
z-score based on Prob. of i
z ~ 𝑁𝑜𝑟𝑚𝑎𝑙 (0,1)

DEPARTMENT of
Step 5. Get the standardized = ( 𝑋 − 𝐴𝑉𝐸𝑅𝐴𝐺𝐸 𝑑𝑎𝑡𝑎 𝑟𝑎𝑛𝑔𝑒 /

value of X 𝑆𝑇𝐷𝐸𝑉. 𝑆(𝑑𝑎𝑡𝑎 𝑟𝑎𝑛𝑔𝑒))
𝑋−𝜇
𝑧=
𝜎
Where:
𝜇 = 𝑀𝑒𝑎𝑛
𝜎 = 𝑠𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝑑𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛

DEPARTMENT of
Step 6. Plot the z-score and standardized value of X
Insert ➝ scatter (z-score of i, standardized of X) then Add linear trend

DEPARTMENT of

DEPARTMENT of

DEPARTMENT of
Note: Click the line then change the format
2.50 For normally distributed data,
2.00
observations should lie approximately on
a straight line. If the data is non-normal,
1.50
the points form a curve that deviates
1.00
markedly from a straight line. Possible
0.50 outliers are points at the ends of the line,
AXIS TITLE
0.00 distanced from the bulk of the

-3.00 -2.00 -1.00 0.00 1.00 2.00 3.00
-0.50 observations.
-1.00
-1.50 In this case, the data was merely

-2.00 approximately follows a normal
-2.50
distribution
AXIS TITLE

DEPARTMENT of
STATISTICS 3. Box Plot
• Box plots can’t actually be used to test for
normality, they can be useful for testing for
symmetry, which often is a sufficient
substitute for normality.
• The box plot shape will show if a statistical

data set is normally distributed or skewed.
• If your data comes from a NORMAL

DISTRIBUTION, the box will be symmetrical
with the mean and median in the center.
• If the data meets the assumption of

normality, there should also be few outliers.

DEPARTMENT of
STATISTICS 3. Box Plot
Interquartile Range
(IQR)
whisker whisker
Minimum/ Maximum/
Lower Fence Median Upper Fence
𝑸𝟏 𝑸𝟑
(𝑸𝟏 − 1.5 ∗ IQR) (25th Percentile) (75th Percentile) (𝑸𝟑 + 1.5 ∗ IQR)

DEPARTMENT of
STATISTICS Skewness
Skewness is a measure of the asymmetry of the probability distribution of a random
variable about its mean.
• If skewness is 0, the data are perfectly symmetrical.
• If skewness is between -0.5 and 0.5, the distribution is approximately symmetric.
• If skewness is between -1 and -0.5 or between 0.5 and 1, the distribution is
moderately skewed.
• If skewness is less than -1 or greater than 1, the distribution is highly skewed.
In Microsoft Excel: =SKEW(number1, [number2], ...)
Moderately Approximately Moderately
Skewed symmetric Skewed
Highly Highly
Skewed Skewed
−𝟏 − 𝟏ൗ𝟐 𝟎 𝟏ൗ
𝟐 𝟏
DEPARTMENT of
STATISTICS Skewness
Perfectly Symmetric
or Normal Distribution
Positively skewed distribution Negatively skewed distribution
or Skewed to the right or Skewed to the left

DEPARTMENT of
STATISTICS Kurtosis
Kurtosis tells you the height and sharpness of the central peak, relative to
that of a standard bell curve.
• A perfect normal distribution has a kurtosis of 0 (mesokurtic distribution)

• Kurtosis value above 0 = Leptokurtic distribution (sharper peak and
longer/flatter tails)
• Kurtosis value below 0 = Platykurtic distribution (rounder peak and
shorter/thinner tails)
In Microsoft Excel: KURT(number1, [number2], ...)

DEPARTMENT of
STATISTICS Kurtosis
Platykurtic Distribution Normal Distribution or Leptokurtic Distribution

Kurtosis < 0 Mesokurtic Distribution Kurtosis > 0
Kurtosis = 0

DEPARTMENT of
STATISTICS Shapiro-Wilk W Test
Tests for assessing if data is normally distributed
using Shapiro-Wilk W Test
• developed by Shapiro and Wilk (1965)

• It is the ratio of two estimates of the variance of a normal distribution
based on a random sample of n observations.
• This test for normality has been found to be the most powerful test in
most situations.
• Hypothesis:
Ho: The data is normally distributed.
Ha: The data is not normally distributed.
• Decision Rule: Reject Ho if P-value < 𝛼 (level of significance)

DEPARTMENT of
Step 1. Arrange the data in ascending order.

DEPARTMENT of
Step 2. Calculate 𝑆𝑆 as follows:
𝑆𝑆 = σ𝑛𝑖=1 𝑥𝑖 − 𝑥ҧ 2
Note: If 𝑛 is even, let 𝑚 = 𝑛/2, while if 𝑛

is odd let 𝑚 = (𝑛 − 1)/2
In excel use DEVSQ function to calculate

𝑆𝑆
= 𝐷𝐸𝑉𝑆𝑄(𝑑𝑎𝑡𝑎 𝑟𝑎𝑛𝑔𝑒)

DEPARTMENT of
Step 3. Calculate 𝑏 as follows
a. taking the 𝑎𝑖 weights from the
Shapiro-Wilk W Table 1.
b. Corresponding to each of the
6 coefficients a1 ,… , a6 , calculate
the values 𝑥12 − 𝑥1 , 𝑥11 − 𝑥2 ,…,
𝑥7 − 𝑥6 where 𝑥𝑖 is the ith data
element in ascending order.
Shapiro-Wilk W Table 1: Coefficients
c. Get the product of the = 𝑆𝑈𝑀(𝐼5: 𝐼10)
coefficients and difference values.
d. Get the sum of the product of the
coefficients and difference values.

DEPARTMENT of
Step 4. Calculate the test
statistic
𝑊 = 𝑏2 /𝑆𝑆
= (𝑏^2)/𝑆𝑆
or
= (𝑏 ∗ 𝑏)/𝑆𝑆

DEPARTMENT of
Step 5. Find the value in the Shapiro-Wilk W Table 2 that is closest to 𝑊,
interpolating if necessary. This is the p-values for the test.
The p-value lies between 0.50 and 0.90. The W value for 0.5 is 0.943 and
the W value for 0.9 is 0.973.
DEPARTMENT of
We need to interpolate since 𝑊 = 0.971 𝑦 lies between 0.50 and 0.90.
The 𝑊 value for 0.5 𝑥1 is 0.943 𝑦1 and the 𝑊 value for 0.9 𝑥2 is 0.973
𝑦2 .
Linear Interpolation:
𝑦2 − 𝑦1
𝑦 − 𝑦1 = 𝑥 − 𝑥1
𝑥2 − 𝑥1
0.973 − 0.943
0.971 − 0.943 = 𝑥 − 0.5
0.9 − 0.5
𝑥 = 0.873 ⇒ 𝑃 − 𝑣𝑎𝑙𝑢𝑒
Therefore, the p-value is equivalent to 0.873

DEPARTMENT of
STATISTICS
Formal hypothesis testing for Shapiro-Wilk W Test

a. Ho: The data is normally distributed.
Ha: The data is not normally distributed.
b. Test to use: Shapiro-Wilk W Test
c. Decision Rule: P-value Method
Reject Ho if P-value < 𝛼 = 0.05
d. P-value =0.873
e. Decision: Since 0.873 ≮ 0.05, failed to reject Ho
f. Conclusion: We therefore conclude that the data is normally
distributed.

DEPARTMENT of
STATISTICS
What if the data is not normally distributed?

If the data is not normally distributed, there are two options:
• Transform the dependent variable (repeating the normality checks on
the transformed data): Common transformations include taking the
log or square root of the dependent variable.
• Use a non-parametric test: Non-parametric tests are often called
distribution free tests and can be used instead of their parametric
equivalent.

DEPARTMENT of
STATISTICS
Key to non-parametric tests:

Parametric Test What to check for normality Non-parametric test
Independent t-test Dependent variable by group Mann-Whitney test
Paired t-test Paired differences Wilcoxon signed rank test
Residuals/ dependent variable
One-way ANOVA Kruskal-Wallis test
by group
Repeated measures ANOVA Residuals at each time point Friedman test
Pearson’s correlation coefficient Both variables Spearman’s correlation
Simple linear regression Residuals N/A
Note:
• The residuals are the differences between the observed and expected values.
• Excel will not perform non-parametric tests even with the data analysis toolpak add in.

DEPARTMENT of
STATISTICS Hypothesis Testing
One Population
One Sample One Sample t-test
Parameter
Hypothesis Dependent
Paired Sample t-test
Samples
Testing
Independent Sample z-test
Two Population
Parameter (𝑖𝑓 𝜎1 & 𝜎2 𝑎𝑟𝑒 𝑘𝑛𝑜𝑤𝑛)
Independent Independent Sample t-test

Samples (𝑖𝑓 𝜎1 & 𝜎2 𝑎𝑟𝑒 𝑢𝑛𝑘𝑛𝑜𝑤𝑛 𝑎𝑛𝑑 𝜎1 = 𝜎2 )
Independent Sample t-test

(𝑖𝑓 𝜎1 & 𝜎2 𝑎𝑟𝑒 𝑢𝑛𝑘𝑛𝑜𝑤𝑛 𝑎𝑛𝑑 𝜎1 ≠ 𝜎2 )

DEPARTMENT of
STATISTICS Components of Hypothesis Testing
Test of Hypothesis
Set of rules that lead to the acceptance or rejection of statistical method.
• Hypothesis - standard procedure for testing a claim about a property of a
population based on sample data
• It is a claim or statement about a property of a population.
Null Hypothesis (𝑯𝒐 ) Alternative Hypothesis (𝑯𝒂 𝒐𝒓𝑯𝟏 )

Tail of
distribution hypothesis that differs from the null
must contain the condition of equality
hypothesis
Two-tailed  Equal (=) 𝜇 = 𝜇𝑜  Not equal to (≠) 𝜇 ≠ 𝜇𝑜
Right-tailed  Less than or equal to (≤) 𝜇 ≤ 𝜇𝑜  Greater than (>) 𝜇 > 𝜇𝑜
Left-tailed  Greater than or equal to (≥) 𝜇 ≥ 𝜇𝑜  Less than (<) 𝜇 < 𝜇𝑜

DEPARTMENT of
STATISTICS Two Types of Error in Hypothesis Testing
• Type I error - mistake of Rejecting 𝐻𝑜 when it is actually true
𝛼 = probability of committing a Type I error = P(Reject 𝐻𝑜 | 𝐻𝑜 is true)
• Type II error - mistake of Failing to Reject 𝐻𝑜 when it is actually

false
𝛽 = probability of committing a Type II error = P(Reject 𝐻𝑜 | 𝐻𝑜 is false)
Result of testing a hypothesis

Our Decision
Actual Situation
Accept Ho Reject Ho
𝐻𝑜 is true Correct Decision (1-α) Type I error (α)
𝐻𝑜 is false Type II error (β) Correct Decision (1-β)
DEPARTMENT of
Test statistic
– sample statistic or a value based on the sample data.
– used in making decision about the rejection of the null hypothesis
Critical region/Rejection region

– set of all values of the test statistic that would cause to reject 𝐻0
Critical Value
– value/s that separate the critical region from the values that would not
lead to rejection of 𝐻0

DEPARTMENT of
Level of significance 𝜶
• probability of making the mistake of rejecting the null hypothesis when it
is true
Commonly used degree of confidence and level of significance
Degree of Confidence (1 − 𝛼) Level of significance (𝛼)

90% 0.10
95% 0.05
99% 0.01

DEPARTMENT of
Tails in a distribution
• Examine the alternative hypothesis
• Extreme regions bounded by the critical values
Left-tailed test
• The CR is in the extreme left region under the curve.
Right-tailed test:
• The CR is in the extreme right region under the curve.
Two-tailed test:
• The CR is in the two extreme regions under the curve.
DEPARTMENT of
STATISTICS
1. Left-tailed test
• The CR is in the extreme left region under the curve.
Ha: 𝜇 < 𝜇𝑜
Acceptance Region
Rejection Region
It means that if the test
statistic fall in this region,
It means that if the test we failed to reject Ho
we reject Ho
Critical Value

DEPARTMENT of
STATISTICS
2. Right-tailed test:
• The CR is in the extreme right region under the curve.
Ha: 𝜇 > 𝜇𝑜
Acceptance Region
Rejection Region
we failed to reject Ho It means that if the test
we reject Ho
Critical Value

DEPARTMENT of
STATISTICS
3. Two-tailed test:
• The CR is in the two extreme regions under the curve.
Ha: 𝜇 ≠ 𝜇𝑜
Rejection Region
Acceptance Region Rejection Region

statistic fall in this region, It means that if the test It means that if the test
we reject Ho statistic fall in this region, statistic fall in this region,
we failed to reject Ho we reject Ho
Critical Region Critical Value

DEPARTMENT of
STATISTICS
Guidelines in Hypothesis Testing:
a. State the null (Ho) and alternative (Ha) hypothesis. Identify the claim.
b. Determine the test statistic, critical value and tail of the distribution where
the rejection region is located.
c. Formulate the decision rule.
d. Compute the value of the test statistic.
e. Make a decision. ( whether to reject or to accept Ho)
f. Draw the conclusion by answering the original claim.

DEPARTMENT of
STATISTICS
Steps in testing a hypothesis: P-value Method

Identify the
State the test to use,
Make a
null and the tailed Formulate
decision
alternative of the the Determine Interpret
(whether
hypothesis. distribution decision the P-value the result
to reject or
Identify the and the rule
not the Ho)
claim level of
significance
Decision Rule
• If the p-value is smaller than the significance level, Ho is rejected.
• If it is larger than the significance level, Ho is not rejected (or failed to
reject Ho
DEPARTMENT of
STATISTICS
P-value
• The probability of observing a sample value as extreme as, or more
extreme than, the value observed, given that the null hypothesis is true.
INTERPRETING THE WEIGHT OF EVIDENCE AGAINST Ho

If the p-value is less than
(a) 0.10, we have some evidence that Ho is not true.
(b) 0.05, we have strong evidence that Ho is not true.
(c) 0.01, we have very strong evidence that Ho is not true.
(d) 0.001, we have extremely evidence that Ho is not true.

DEPARTMENT of
STATISTICS
It is important to summarize the results of a statistical study correctly.
The following table will help you when you summarize the results.
Decision Claim is Ho Claim is Ha
Reject Ho There is enough There is enough
evidence to reject the evidence to support the
claim. claim.
Failed to reject Ho There is no enough There is no enough
claim. claim.

DEPARTMENT of
STATISTICS
Symbol Definition Example
• greater than • The mean weight is greater than 50kg

> 𝜇 > 50kg
• more than • X is more than 50kg
< • less than • The mean weight is less than 50kg 𝜇 < 50kg
• The mean weight is greater than or
• greater than or equal to
≥ equal to 50kg 𝜇 ≥ 50kg
• at least
• The mean weight is at least 50kg
• The mean weight is less than or equal
• less than or equal to
≤ to 50kg 𝜇 ≤ 50kg
• at most
• The mean weight is at most 50kg
= • equal • The mean weight is equal to 50kg 𝜇 = 50kg
≠ • not equal • The mean weight is not equal to 50kg 𝜇 ≠ 50kg

DEPARTMENT of
STATISTICS
1. t-Test for One Sample Mean

• Used to compare the mean value of a sample with a constant value (𝜇0 ).
Example 1:
Test the hypothesis that the average sugar content of particular cookies is
10g, if the contents of a random sample of 10 cookies are:
10.3 9.7 10.1 10.3 10.1 9.8 9.9 10.5 10.3 9.9
Use 0.01 level of significance and assume that the distribution of contents is
normal. Assume the data are normally distributed.

DEPARTMENT of
STATISTICS
Step in performing t-Test for One

Sample Mean
Step 1: Create a dummy second

variable and input at least 2 zeroes
in it.

DEPARTMENT of
STATISTICS
Step 2: Click Data → Data Analysis → t-Test Two-Sample Assuming

Unequal Variances → OK

DEPARTMENT of
STATISTICS
Step 3:
• For variable 1 range,
select the actual
data.
• For variable 2 range,
select the dummy
data.
• For the hypothesized
mean difference,
input the test value
from the hypothesis.

DEPARTMENT of
STATISTICS
Step 4: For the output,

• delete the dummy column,
• change the title of the
table
• delete the word mean in
the hypothesized mean
difference row.

DEPARTMENT of
STATISTICS
Solution: P-value Method

Step 1: State the null (Ho) and alternative (Ha) hypothesis. Identify the claim
• Ho : 𝜇 = 10g ⟹ The average sugar content of particular cookies is 10g (claim)
• Ha : 𝜇 ≠ 10g ⟹ The average sugar content of particular cookies is not 10g)
Step 2: Identify the test to use, the tailed of the distribution and the level of
significance
• t-Test for One Sample Mean
• Since the alternative hypothesis is 𝜇 ≠ 10g, therefore the tail of the
distribution is two-tailed test.
• Level of significance: 𝛼 = 0.01
Step 3: Formulate the decision rule
• Reject Ho if P-value < 𝛼 = 0.01
DEPARTMENT of
STATISTICS
Step 4: Determine the P-value

• Since the tail of the distribution is
two-tailed, therefore the P-value
is 0.30.
Step 5: Make a decision

(whether to reject or not the Ho)
• Since P-value is not less than 𝛼 =
0.01 , failed to reject Ho
Step 6: Interpret the results

• At 1% level of significance, there is no enough evidence to reject the claim
that the average sugar content of particular cookies is 10g.
DEPARTMENT of
STATISTICS
claim. claim.
claim. claim.

DEPARTMENT of
STATISTICS
Example 2:
Communication company conducted a study about the number of text
messages sent by a teenager in a day. They claim that on the average,
teenager sent at least 65 text messages per day. To update the estimates, a
sample of 11 teenagers ask how many text messages they sent a day. Their
response were:
51 175 47 49 44 54 145 203 21 42 100
At 0.05 level of significance, can you conclude that the mean number of text
messages sent by teenagers per day is 65. Assume the data are normally
distributed.

DEPARTMENT of
STATISTICS

• Ho : 𝜇 ≥ 65 ⟹ the average text messages per day is at least 65 (claim)
• Ha : 𝜇 < 65 ⟹ the average text messages per day is less than 65
significance
• t-Test for One Sample Mean
• Since the alternative hypothesis is 𝜇 < 65 text messages per day,
therefore the tail of the distribution is one tailed to the right test.
DEPARTMENT of
STATISTICS

one-tailed, therefore the P-value
is 0.16.
Step 5: Make a decision

(whether to reject or not the Ho)

• At 5% level of significance, there is no enough evidence to reject the claim
that the average text messages per day is at least 65.
DEPARTMENT of
STATISTICS
claim. claim.
claim. claim.

DEPARTMENT of
STATISTICS
Hypothesis Testing for Two Populations
• It is used when researchers wish to compare two sample means, using

experimental and control groups.
• Main interest is on the difference between the two populations.
• For example, the average lifetimes of two different brands of bus tires
might be compared to see whether there is any difference in tread wear.
Two different brands of fertilizer might be tested to see whether one is
better than the other for growing plants. Or two brands of cough syrup
might be tested to see whether one brand is more effective than the
other.
DEPARTMENT of
STATISTICS
2. z-Test for Two Sample Means

Main interest is on the difference between 𝜇1 and 𝜇2
Assumptions:
1. Both samples are random samples.
2. The samples must be independent of each other. That is, there can be no
relationship between the subjects in each sample.
3. The standard deviations of both populations must be known; and if the
sample sizes are less than 30, the populations must be normally or
approximately normally distributed.

DEPARTMENT of
STATISTICS
Males Females
Example 1: 6 11 11 6 8 11
A random sample of the number of
6 14 8 7 5 13
sports offered by colleges for males
and females is shown. At 𝛼 = 0.10, 6 9 5 6 5 5
is there enough evidence to 6 9 18 10 7 6
support the claim that there is 15 6 11 16 10 7
significant difference in the number 9 9 5 7 5 5
of sports offered by colleges for 8 9 6 9 18 13
males and females?
9 5 11 7 8 5
Assume 𝜎1 = 𝜎2 = 3.3.
7 7 5 11 4 6
10 7 10 14 12 5

DEPARTMENT of
STATISTICS
Step in performing z-Test for Two Sample Means

1. Click the Data tab on the ribbon, then look for Data Analysis
2. Choose z-Test: Two Sample for Means, then click OK

DEPARTMENT of
STATISTICS
3. Input the range that

contains the variables 1
and 2.
4. Input the known
variance for variables 1
and 2.
5. Choose output
options, then click ok

DEPARTMENT of
STATISTICS
6. the output will be

DEPARTMENT of
STATISTICS

• Ho : 𝜇1 = 𝜇2 ⟹ There is no significant difference in the number of sports offered by colleges for males and female.
• Ha : 𝜇1 ≠ 𝜇2 ⟹ There is significant difference in the number of sports offered by colleges for males and female. (claim)
significance
• z-Test for Two Sample Means
• Since the alternative hypothesis is 𝜇1 ≠ 𝜇2 , therefore the tail of the
DEPARTMENT of
STATISTICS

two-tailed, therefore the P-value is
0.619.
Step 5: Make a decision (whether to
reject or not the Ho)
• At 10% level of significance, there is no enough evidence to support the
claim that there is significant difference in the number of sports offered by
colleges for males and females.
DEPARTMENT of
STATISTICS
claim. claim.
claim. claim.

DEPARTMENT of
STATISTICS
Example 2: Resident Commuter

The dean of students wants to see 22 19 23 23 32 18 26 18 36 35
whether there is a significant 25 19 24 21 18 23 23 22 23 24
difference in ages of resident 18 21 18 28 20 26 21 22 25 20
students and commuting students.
19 23 29 26 18 19 25 19 25 32
She selects a random sample of 50
students from each group. The ages 26 27 21 19 24 29 19 19 21 20
are shown here. At 𝛼 = 0.05, 22 26 21 22 19 20 23 19 18 19
determine if there is enough 19 26 26 18 23 20 22 22 27 18
evidence to reject the claim of no 25 18 27 22 22 20 35 28 19 17
difference in the ages of the two
20 19 18 26 18 18 21 22 35 17
groups. Assume 𝜎1 = 3.68 and
𝜎2 = 4.7. 30 19 19 18 20 30 18 19 24 16

DEPARTMENT of
STATISTICS

• Ho : 𝜇1 = 𝜇2 ⟹ There is no significant difference in ages of resident students and commuting students. (claim)
• Ha : 𝜇1 ≠ 𝜇2 ⟹ There is no significant difference in ages of resident students and commuting students.
significance
• z-Test for Two Sample Means
DEPARTMENT of
STATISTICS
Step 4: Determine the

P-value
• Since the tail of the
distribution is two-
tailed, therefore the
P-value is 0.130.

DEPARTMENT of
STATISTICS
Step 5: Make a decision (whether to reject or not the Ho)

• Since P-value is not less than 𝛼 = 0.05 , failed to reject Ho

• At 5% level of significance, there is no enough evidence to reject the
claim that the there is no significant difference in ages of resident
students and commuting students

DEPARTMENT of
STATISTICS
3. t-Test for Two Independent Sample Means

• Samples are independent if the sample selected from one population is
not related to the sample selected from the other population.
Example 1:
A teacher wants to know which of the two sections has a higher score. The
teacher will use these scores to make recommendations to the principal.
Random samples of students are asked about their scores. Test the claim that
the population mean scores are different for the two sections? Assume the
data are normally distributed and equal variances. Use =0.05.
Section A 81 77 75 74 86 90 62 73 91 98
Section B 89 64 35 68 69 55 37 57 42 49

DEPARTMENT of
STATISTICS
Step in performing t-Test for Two Independent Sample Means

2. Choose t-Test: Two Sample…, then click OK

DEPARTMENT of
STATISTICS
3. Input the range that contains the variables 1 and 2.

4. Choose output options, then click ok

DEPARTMENT of
STATISTICS
5. the output
will be

DEPARTMENT of
STATISTICS

• Ho : 𝜇1 = 𝜇2 ⟹ The population mean scores are not different for the two sections
• Ha : 𝜇1 ≠ 𝜇2 ⟹ The population mean scores are different for the two sections (claim)
significance
• t-Test for Two Independent Sample Means (assume equal variances)
• Since the alternative hypothesis is 𝜇1 ≠ 𝜇2 , therefore the tail of the distribution
is two-tailed test.
DEPARTMENT of
STATISTICS

• Since the tail of the distribution
is two-tailed, therefore the P-
value is 0.001.
Step 5: Make a decision (whether
to reject or not the Ho)
• Since P-value is less than 𝛼 =
0.05 , reject Ho

• At 5% level of significance, there is enough evidence to support the claim
that the population mean scores are different for the two sections.
DEPARTMENT of
STATISTICS
claim. claim.
claim. claim.

DEPARTMENT of
STATISTICS
Example 2:
The production manager at Bellevue Steel, a manufacturer of wheelchairs,
wants to compare the number of defective wheelchairs produced on the day
shift with the number on the afternoon shift. A sample of the production
from 6 day shifts and 8 afternoon shifts revealed the following number of
defects. At the 0.05 significance level, test the claim that there is a difference
in the mean number of defects per shift. Assume the data are normally
distributed and unequal variances.
Day 5 8 7 6 9 7
Afternoon 8 10 7 11 9 12 14 9

DEPARTMENT of
STATISTICS

• Ho : 𝜇1 = 𝜇2 ⟹ there is no difference in the mean number of defects per shift
• Ha : 𝜇1 ≠ 𝜇2 ⟹ there is a difference in the mean number of defects per shift (claim)
significance
• t-Test for Two Independent Sample Means (assume unequal variances)
DEPARTMENT of
STATISTICS
Note: Assume unequal variances, so use t-Test: Two-Sample Assuming Unequal Variances

DEPARTMENT of
STATISTICS

• Since the tail of the distribution is two-tailed, therefore the P-value
is 0.010.

DEPARTMENT of
STATISTICS
Step 5: Make a decision (whether to reject or not the Ho)

• Since P-value is less than 𝛼 = 0.05 , reject Ho

that there is a difference in the mean number of defects per shift.

DEPARTMENT of
STATISTICS
claim. claim.
claim. claim.

DEPARTMENT of
STATISTICS
4. t-Test for Two Dependent Sample Means

• Dependent means one sample is related to the other sample.
• such samples are referred to as paired sample or matched sample
• get one value from each of two subjects sharing the same characteristics
Example:
The following are the scores of eight students before and after a review. At 5%
level of significance, test the claim that there is significant difference in the
scores before and after the review. Assume the data are normally distributed.
Student 1 2 3 4 5 6 7 8
Before 77 74 82 73 87 68 66 80
After 72 68 76 68 84 68 61 76
DEPARTMENT of
STATISTICS
Step in performing t-Test for Two Dependent Sample Means

2. Choose t-Test: Paired Two Sample for Means, then click OK

DEPARTMENT of
STATISTICS


DEPARTMENT of
STATISTICS

DEPARTMENT of
STATISTICS

• Ho : 𝜇1 = 𝜇2 ⟹ There is no significant difference in the scores before and after the review
• Ha : 𝜇1 ≠ 𝜇2 ⟹ There is a significant difference in the scores before and after the review (claim)
significance
• t-Test for Two Dependent Sample Means
DEPARTMENT of
STATISTICS

two-tailed, therefore the P-value is
0.0005.
• Since P-value is less than 𝛼 = 0.05,
reject Ho
that there is significant difference in the scores before and after the review.

DEPARTMENT of
STATISTICS
claim. claim.
claim. claim.

DEPARTMENT of
STATISTICS
Example 2:
Advertisements by Fitness Center claim that completing its course will result
in losing weight. A random sample of eight recent participants showed the
following weights before and after completing the course. At 0.01 level of
significance, can we conclude the students lost weight? Assume the data are
normally distributed.
Student 1 2 3 4 5 6 7 8
Before 155 228 141 162 211 164 184 172
After 154 201 147 157 196 150 170 165

DEPARTMENT of
STATISTICS

• Ho : 𝜇1 = 𝜇2 ⟹ The students doesn’t lost weight
• Ha : 𝜇1 > 𝜇2 ⟹The students lost weight (claim)
significance
• t-Test for Two Dependent Sample Means
• Since the alternative hypothesis is 𝜇1 > 𝜇2 , therefore the tail of the
distribution is one tailed to the right test.
DEPARTMENT of
STATISTICS

• Since the tail of the distribution
is two-tailed, therefore the P-
value is 0.016.
• Since P-value is not less than
𝛼 = 0.01 , failed to reject Ho
• At 1% level of significance, there is no enough evidence to support the
claim that the students lost weight.

DEPARTMENT of
STATISTICS
claim. claim.
claim. claim.

DEPARTMENT of
STATISTICS
5. F-test for Two-Sample Variance
• It is used to test whether the variances of two populations are equal.
Step in performing F-test for Two-Sample Variance
2. Choose F-Test: Two Sample for Variance, then click OK

DEPARTMENT of
STATISTICS

Note: The variance of variable 1
must be larger than the variance
of variable 2.
In this case, section B

have larger variance
that section A, so the
data of section B must
be put in the “Variable
1 Range” and section A
in “Variable 2 Range” .

DEPARTMENT of
STATISTICS

DEPARTMENT of
STATISTICS
Solution: F-test
Step 1: Ho: 𝜎12 = 𝜎22 (Variances are equal)
Ha: 𝜎12 ≠ 𝜎22 (Variances are not equal)
Step 2: F-test, 𝛼=0.05
Step 3: Reject Ho F > F Critical one-tail
Reject Ho if P-value < 𝛼 = 0.05 (P-value Method)
Step 4: F = 2.46 and F Critical = 3.17
P-value = 0.098
Step 5: Since F is not greater than F Critical, failed to reject Ho
Since P-value is not less that 𝛼 = 0.05, failed to reject Ho
Step 6: At 5% level of significance, we can conclude that the variances are equal.

DEPARTMENT of
STATISTICS Correlation Analysis
• Correlation is one of the most common and most useful statistics.

• Linear correlation analysis is a statistical technique to determine the
strength or degree of linear relationship existing between two variables.
• A measure of the degree of linear relationship is called population
coefficient ρ
Assumptions:
• The sample of paired (X,Y) data is a random sample.
• The data pairs fall approximately on a straight line and are measured at the
interval or ratio level.
• The pairs of (X,Y) data have a bivariate normal distribution.

DEPARTMENT of
STATISTICS Scatter plot
Scatter Diagram / Scatter plot - is a type of mathematical diagram

using Cartesian coordinates to display values for two variables for a set
of data
Positive linear Negative linear Curvilinear No

Relationship Relationship relationship relationship

DEPARTMENT of
STATISTICS Example
Suppose data below were collected from a sample of 10 Pizza Parlor restaurants
located near college campuses. The size of student population (in thousands) and the
quarterly sales (in thousands of dollars) for the 10 restaurants are summarized in the
table below.
Restaurant
1 2 3 4 5 6 7 8 9 10
(𝒊)
Student Population
(1000s) 2 6 8 8 12 16 19 20 22 26
(𝒙𝒊 )
Quarterly Sales
($1000s) 58 105 88 118 117 137 157 169 148 202
(𝒚𝒊 )

DEPARTMENT of
STATISTICS Example: Scatter plot in Excel
Step 1: Highlight the cells that

contain the data you want to
use in the chart

DEPARTMENT of
STATISTICS
Step 2: Click the insert tab on the ribbon
Step 3: Select the X, Y (scatter) you want to create

DEPARTMENT of
STATISTICS
Step 4: You'll see many options when you select this button, select Scatter

DEPARTMENT of
STATISTICS
Step 5: The chart will appear. Customize bar chart through ”chart design”,
“format”, “Quick Layout”.

DEPARTMENT of
STATISTICS
The final output will be

250
200
QUARTERLY SALES ($1000S)
150
100
50
0
0 5 10 15 20 25 30
STUDENT POPULATION (1000S)

DEPARTMENT of
STATISTICS
Types of Correlation
1. Positive linear correlation – if the independent variable increases, the

dependent variable also increases; or if the independent variable decreases, the
dependent variable also decreases.
2. Negative linear correlation – if the independent variable increases, the
dependent variable decreases; or if the independent variable decreases, the
independent variable increases.
3. Zero linear correlation – no linear relationship exists between the dependent
and independent variables. If the relationship is not linear, then it can be
exponential or logarithmic. Note that this does not imply that the independent
variable is not related with the independent variable.
DEPARTMENT of
STATISTICS a
Linear Correlation Coefficient r

• Measures the strength of the linear relationship between the paired X and Y
values in a sample.
• It is sometimes referred to as Pearson Product Moment Correlation in
honor of Karl Pearson (1857-1936), who originally develop it.
𝑛 σ 𝑋𝑖 𝑌𝑖 − (σ 𝑋𝑖 )(σ 𝑌𝑖 )
𝑟=
2 2 2 2
σ σ σ σ
𝑛 𝑋𝑖 − ( 𝑋𝑖 ) 𝑛 𝑌𝑖 − ( 𝑌𝑖 )

DEPARTMENT of
STATISTICS
Interpretations of the Linear

𝐶𝑃𝑋𝑌 Correlation Coefficient r
𝑟=
𝑆𝑆𝑋 𝑆𝑆𝑌
Value of r Interpretation
where
σ 𝑋𝑖 σ 𝑌𝑖 0.01 to 0.20 Very weak linear relationship
▪ 𝐶𝑃𝑋𝑌 = σ 𝑋𝑖 𝑌𝑖 − 𝑛
2
0.21 to 0.40 Weak linear relationship
2 σ 𝑋𝑖
▪ 𝑆𝑆𝑥 = σ 𝑋𝑖 - 𝑛 0.41 to 0.70 Moderate linear relationship
2 σ 𝑌𝑖 2
▪ 𝑆𝑆𝑦 = σ 𝑌𝑖 - 0.71 to 0.90 Strong linear relationship
𝑛
0.91 to 0.99 Very strong linear relationship

DEPARTMENT of
STATISTICS
Number of Final grade

Example 1: Student
absences (X) (Y)
The data below shows the
1 6 82
number of absences and the
final grades of eight randomly 2 2 86
selected students from a 3 15 45
statistics class. Determine 4 9 74
whether there is significant 5 11 59
linear relationship between
6 5 90
number of absences and final
grade. 7 8 78
8 4 86

DEPARTMENT of
STATISTICS
Step in performing Correlation:

2. Choose Correlation, then click OK

DEPARTMENT of
STATISTICS
3. Input the range that contains the data.


DEPARTMENT of
STATISTICS
𝑟 = −0.94
It means that the number of absences
and final grade have negative very
strong linear relationship
DEPARTMENT of
STATISTICS
Properties of the Linear Correlation Coefficient r
1. The value of r is always -1 to 1. That is, -1 ≤ r ≤ 1.

2. A value near 0 indicates there is little linear relationship between variables
3. The value of r does not change if all values of either variable are converted
to a different scale.
4. The value of r is not affected by the choice of x or y. Interchange all x and y
values and the value of r will not change.
5. r measures the strength of a linear relationship. It is not designated to
measure the strength of a relationship that is not linear.
6. The value of r is sensitive to outliers.

DEPARTMENT of
STATISTICS
Formal Hypothesis Test
➢The null and alternative hypothesis are:

Ho: 𝜌 = 0
(There is no significant linear correlation between variables X and Y.)
Ha: 𝜌 ≠ 0
(There is significant linear correlation between variables X and Y.)
➢The decision rule:

Reject Ho if P-value < 𝛼 (level of significance)

DEPARTMENT of
STATISTICS
How to perform a Pearson correlation test in Excel

Step 1: Calculate the Pearson correlation coefficient in Excel
r = PEARSON(array1, array2)
Step 2: Calculate the t-statistic from the coefficient value
t = (r*SQRT(n-2))/(SQRT(1-r^2))
where: r is the Pearson correlation coefficient
n is the number of observation
Note: if your coefficient value is negative, then use the following
formula
t = (ABS(r)*SQRT(n-2))/(SQRT(1-ABS(r)^2))

DEPARTMENT of
STATISTICS
Step 3: Calculate the p-value from the t statistic

P-value = TDIST(t, df, tails)
where:
t is the t-statistic from the coefficient value
df is the degrees of freedom which is equivalent to n-2
tails is the tailed of the distribution (‘1‘ for a one-tailed analysis or a
‘2‘ for a two-tailed analysis

DEPARTMENT of
STATISTICS
Example 1 in Excel:
=PEARSON(B2:B8, C2:C8)
=(ABS(F2)*SQRT(F3-2))
/(SQRT(1-ABS(F2)^2))
=TDIST(F4,F5,F6)

DEPARTMENT of
STATISTICS
Example 1: Formal Hypothesis Test of Correlation Analysis

1.) Ho: 𝜌 = 0 (There is no significant linear correlation between number of absences and final grade)
Ha: 𝜌 ≠ 0 (There is significant linear correlation between number of absences and final grade)
2.) Correlation analysis
3.) Reject Ho if P-value < 𝛼 = 0.05
4.) P-value = 0.0014
5.) Since P-value is less than 𝛼 = 0.05 , reject Ho
6.) At 5% level of significance, we can conclude that there is significant linear
correlation between number of absences and final grade.

DEPARTMENT of
STATISTICS
Y-Height (cm) X-Span (cm)

Example 2:
171 173
The height and arm span of 10 195 193
adult males were measures (in 180 188
cm). 182 185
190 186
175 178
177 182
178 182
192 198
202 202

DEPARTMENT of
STATISTICS
=PEARSON(A2:A11,B2:B11)
=(ABS(E2)*SQRT(E3-2))/(SQRT(1-E2^2))
=TDIST(E4,E5,E6)

DEPARTMENT of
STATISTICS
Example 2: Formal Hypothesis Test of Correlation Analysis

1.) Ho: 𝜌 = 0 (There is no significant linear correlation between number of absences and final grade)
Ha: 𝜌 ≠ 0 (There is significant linear correlation between number of absences and final grade)
2.) Correlation analysis
3.) Reject Ho if P-value < 𝛼 = 0.05
4.) P-value = 0.000089
5.) Since P-value is less than 𝛼 = 0.05 , reject Ho
6.) At 5% level of significance, we can conclude that there is significant linear
correlation between height and arm span.

DEPARTMENT of
STATISTICS
Other Correlations
• Pearson Product Moment Correlation - appropriate when both
variables are measured at an interval and ratio level.
• Spearman Rank Order Correlation (rho) – appropriate when the two
variables are in ordinal level.
• Point-Biserial Correlation – appropriate when one measure is a
continuous interval level and the other is dichotomous (i.e., two-
category).

DEPARTMENT of
STATISTICS Simple Linear Regression Analysis
• Used to predict the value of a dependent variable based on the value of at

least one independent variable
• Used to explain the impact of changes in an independent variable on the
dependent variable.
Components of Regression Analysis:

1. Dependent/ Response variable (Y)
– the variable we wish to explain
2. Independent/ Predictor / Explanatory variable (X’s)
– the variable used to explain the dependent variable

DEPARTMENT of
STATISTICS
Simple Linear Regression Model:

Population
Slope Independent
Population y Coefficient Variable
intercept
Dependent
Variable Random Error term,
𝒚 = 𝜷𝒐 + 𝜷𝟏 𝑿 + 𝜺 or residual
Linear Regression Assumptions

Error values (𝜀) are statistically independent
Error values are normally distributed for any given values of x
The probability distribution of the errors is normal
The probability distribution of the errors has constant variance
The underlying relationship between x variable and the y variable is linear
DEPARTMENT of
STATISTICS Regression
Estimated Regression Model by Least Squares Method

Estimate of the Estimate of the Formulas for 𝒃𝟏 and 𝒃𝟎 :
Estimated (or regression regression
Independent σ 𝑥−𝑥ҧ 𝑦−𝑦ത 𝐶𝑃𝑥𝑦
predicted) intercept slope
variable
𝑏1 = σ 𝑥−𝑥ҧ 2
or 𝑏1 = 𝑆𝑆
y value 𝑥
𝑏0 = 𝑦ത − 𝑏1 𝑥ҧ
𝒚ෝ𝒊 = 𝒃𝒐 + 𝒃𝟏 𝑿
Interpretation of the slope and the intercept:
▪ 𝑏0 is the estimated average value of y when the value of x is zero
▪ 𝑏1 is the estimated change (increase or decrease) in the average
value of y as a result of a one-unit increase in x
DEPARTMENT of
STATISTICS
Evaluating a regression equation’s ability to predict

𝐶𝑃𝑋𝑌
Coefficient of Determination 𝑅2 = 𝑏1 𝑆𝑆
𝑦
• The proportion of the total variation in the dependent variable Y that is
explained, or accounted for, by the variation in the independent variable X.
• The coefficient of determination is also called R-squared and is denoted as
𝑅2 .
• It is the squared of the correlation coefficient.
• If it were possible to make perfect predictions, the coefficient of
determination would be 100%. A coefficient of determination of 100% is
associated with a correlation coefficient of +1.0 or −1.0.

DEPARTMENT of
STATISTICS
Step in performing Regression:

2. Choose Regression, then click OK

DEPARTMENT of
STATISTICS
3. Input the
range that
contains the
variables X and Y.
4. Choose output
options, then
click ok

DEPARTMENT of
STATISTICS

DEPARTMENT of
STATISTICS
Example: The data below shows the number of absences and the final grades of
seven randomly selected students from a statistics class.
Number of Final grade 1. Create regression equation
Student
absences (X) (Y) to predict the final grade of
A 6 82 student.
B 2 86 2. Interpret 𝑏0 and 𝑏1
C 15 43 3. Compute the value of 𝑅2
4. What is the estimated
D 9 74
value of the final grade if
E 12 58
the number of absences is
F 5 90 1, 4, 7, 10, 20
G 8 78
DEPARTMENT of
STATISTICS a
Solution:
Regression Equation:
𝑦ෝ𝑖 = 𝑏𝑜 + 𝑏1 𝑥𝑖
𝑦ො𝑖 = 101.31 − 3.51𝑥𝑖
• 101.31 is the estimated average final grade when the number of absences
is zero.
• -3.51 is the estimated decreased in the average final grade as a result of
one-unit increase in the number of absences.
DEPARTMENT of
STATISTICS
Solution:
Coefficient of Determination:
𝑅2 = 0.8927 or 89.27%
89.27% of the total variation in the final

grade was explained, or accounted for,
by the variation of the number of
absences.

DEPARTMENT of
STATISTICS a
Solution:
What is the estimated value of the final grade if the number of absences is 1, 4,
7, 10, and 20
Final regression equation:

𝐹𝑖𝑛𝑎𝑙 𝑔𝑟𝑎𝑑𝑒 = 101.31 − 3.51 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑎𝑏𝑠𝑒𝑛𝑐𝑒𝑠
Case 1: no. of absences = 1 Final grade = 101.31 − 3.51 1 = 97.80


DEPARTMENT of
STATISTICS
Y-Height (cm) X-Span (cm)

Example 2: 171 173
The height and arm span of 10 adult 195 193
males were measures (in cm). 180 188
1. Create regression equation to predict 182 185
the height of males.
190 186
2. Interpret 𝑏0 and 𝑏1
175 178
3. Determine and interpret 𝑅2
177 182
4. What is the estimated value of the
height if the span is 150, 200, 250. 178 182
192 198
202 202

DEPARTMENT of
STATISTICS a
Solution:
Regression Equation:
𝑦ෝ𝑖 = 𝑏𝑜 + 𝑏1 𝑥𝑖
𝑦ො𝑖 = −10.997 + 1.05𝑥𝑖
−10.997 is the estimated average height when the size of arm span is zero.
1.05 is the estimated increased in the average height as a result of one-unit
increase in the size of arm span.

DEPARTMENT of
STATISTICS
Solution:
Coefficient of Determination:
𝑅2 = 0.8677 or 86.77%
86.77% of the total variation in the
height was explained, or accounted
for, by the variation of the size of arm
span.

DEPARTMENT of
STATISTICS
Solution:
What is the estimated value of the height if the span is 150, 200, 250.
Final regression equation:

ℎ𝑒𝑖𝑔ℎ𝑡 = −10.997 + 1.05(𝑎𝑟𝑚 𝑠𝑝𝑎𝑛)
Case 1: arm span = 150 ℎ𝑒𝑖𝑔ℎ𝑡 = −10.997 + 1.05 150 = 146.503
Case 2: arm span = 200 ℎ𝑒𝑖𝑔ℎ𝑡 = −10.997 + 1.05 200 = 199.003
Case 3:arm span = 250 ℎ𝑒𝑖𝑔ℎ𝑡 = −10.997 + 1.05 250 = 251.503

03 Basic Statistical Data Analysis Using Excel

Uploaded by

03 Basic Statistical Data Analysis Using Excel

Uploaded by

Basic Statistical Data Analysis

Using Microsoft Excel

Basic Statistical Data Analysis Using Microsoft Excel | 1

After completing this chapter, the students must be able to

• Assess the normality of data using Microsoft Excel

GGMFRANCISCO Basic Statistical Data Analysis Using Microsoft Excel | 2

• also called as the Gaussian distribution

• It is a distribution related to continuous data from a statistical analysis

• It is sometimes called the “bell curve,” since it is a bell-shaped and

• Mean = Median = Mode

GGMFRANCISCO Basic Statistical Data Analysis Using Microsoft Excel | 3

GGMFRANCISCO Basic Statistical Data Analysis Using Microsoft Excel | 4

A normal or Gaussian distribution :

GGMFRANCISCO Basic Statistical Data Analysis Using Microsoft Excel | 5

• Normal Q-Q plot

GGMFRANCISCO Basic Statistical Data Analysis Using Microsoft Excel | 6

➢ A histogram can be used to test

GGMFRANCISCO Basic Statistical Data Analysis Using Microsoft Excel | 7

• A Q-Q plot is a short term for “quantile-quantile”

• often used to assess whether or not a set of data

• In most cases, this type of plot is used to

GGMFRANCISCO Basic Statistical Data Analysis Using Microsoft Excel | 8

• an alternative graphical method of assessing normality to the histogram and

GGMFRANCISCO Basic Statistical Data Analysis Using Microsoft Excel | 9

-48 -1 -48 17 -23 18 -50 34 39 -41

0 38 31 -20 27 46 -18 29 -6 -24

-39 -14 -6 -47 45 -27 -37 6 -38 -29

GGMFRANCISCO Basic Statistical Data Analysis Using Microsoft Excel | 10

=RANK(number, ref, [order])

Note: When a formula contains an absolute

GGMFRANCISCO Basic Statistical Data Analysis Using Microsoft Excel | 11

Step 3. Find the cumulative

GGMFRANCISCO Basic Statistical Data Analysis Using Microsoft Excel | 12

GGMFRANCISCO Basic Statistical Data Analysis Using Microsoft Excel | 13

Step 5. Get the standardized = ( 𝑋 − 𝐴𝑉𝐸𝑅𝐴𝐺𝐸 𝑑𝑎𝑡𝑎 𝑟𝑎𝑛𝑔𝑒 /

GGMFRANCISCO Basic Statistical Data Analysis Using Microsoft Excel | 14

GGMFRANCISCO Basic Statistical Data Analysis Using Microsoft Excel | 15

GGMFRANCISCO Basic Statistical Data Analysis Using Microsoft Excel | 16

GGMFRANCISCO Basic Statistical Data Analysis Using Microsoft Excel | 17

0.00 distanced from the bulk of the

-1.50 In this case, the data was merely

GGMFRANCISCO Basic Statistical Data Analysis Using Microsoft Excel | 18

• The box plot shape will show if a statistical

• If your data comes from a NORMAL

• If the data meets the assumption of

GGMFRANCISCO Basic Statistical Data Analysis Using Microsoft Excel | 19

GGMFRANCISCO Basic Statistical Data Analysis Using Microsoft Excel | 20

GGMFRANCISCO Basic Statistical Data Analysis Using Microsoft Excel | 22

• A perfect normal distribution has a kurtosis of 0 (mesokurtic distribution)

In Microsoft Excel: KURT(number1, [number2], ...)

GGMFRANCISCO Basic Statistical Data Analysis Using Microsoft Excel | 23

Platykurtic Distribution Normal Distribution or Leptokurtic Distribution

GGMFRANCISCO Basic Statistical Data Analysis Using Microsoft Excel | 24

• developed by Shapiro and Wilk (1965)

GGMFRANCISCO Basic Statistical Data Analysis Using Microsoft Excel | 25

GGMFRANCISCO Basic Statistical Data Analysis Using Microsoft Excel | 26

Step 2. Calculate 𝑆𝑆 as follows:

Note: If 𝑛 is even, let 𝑚 = 𝑛/2, while if 𝑛

In excel use DEVSQ function to calculate

GGMFRANCISCO Basic Statistical Data Analysis Using Microsoft Excel | 27

GGMFRANCISCO Basic Statistical Data Analysis Using Microsoft Excel | 28

GGMFRANCISCO Basic Statistical Data Analysis Using Microsoft Excel | 29

Therefore, the p-value is equivalent to 0.873

GGMFRANCISCO Basic Statistical Data Analysis Using Microsoft Excel | 31

Formal hypothesis testing for Shapiro-Wilk W Test

GGMFRANCISCO Basic Statistical Data Analysis Using Microsoft Excel | 32

What if the data is not normally distributed?

GGMFRANCISCO Basic Statistical Data Analysis Using Microsoft Excel | 33

Key to non-parametric tests:

GGMFRANCISCO Basic Statistical Data Analysis Using Microsoft Excel | 34

Independent Independent Sample t-test

Independent Sample t-test

GGMFRANCISCO Basic Statistical Data Analysis Using Microsoft Excel | 35

Null Hypothesis (𝑯𝒐 ) Alternative Hypothesis (𝑯𝒂 𝒐𝒓𝑯𝟏 )

GGMFRANCISCO Basic Statistical Data Analysis Using Microsoft Excel | 36

• Type II error - mistake of Failing to Reject 𝐻𝑜 when it is actually