Statistics Module
TABLE OF CONTENTS
Module Objectives
1. Basic Concepts of Statistics
   1.1 Meaning of statistics
   1.2 Using statistical data
   1.3 Types of statistics
   1.4 Sources of statistical data
   1.5 Types of data variables
2. Summarizing Data
   2.1 Frequency distribution
   2.2 Graphical presentation
3. Describing Data
   3.1 Measures of central tendency
   3.2 Measures of dispersion
4. Probability Theory
   4.1 Probability concepts
   4.2 Discrete probability distribution: Binomial distribution and Poisson distribution
   4.3 Normal probability distribution
5. Sampling Methods and Sampling Distributions
6. Hypothesis Testing
   6.1 Defining hypothesis
   6.2 Setting up hypothesis
   6.3 Test statistics
   6.4 Analysis of Variance [ANOVA]
7
8
POPULATION
A population is a collection of all individuals, objects or measurements of interest. Most of
the time, because of expense, the size of the population, medical concerns, etc., it is not
possible to use the entire population for a statistical study; therefore, researchers use samples.
SAMPLE
A sample is a group of subjects (human or otherwise) selected from the population.
Example
The following constitutes a population:
The heights of students at Natural Resources College.
A sample from it can be:
The heights of students in the Nutrition 12 class.
DATA
These are the values that a variable can assume.
VARIABLES
A variable is a characteristic or attribute that can assume different values.
SOURCES OF STATISTICAL DATA
Sources of data can be internal or external.
INTERNAL DATA SOURCE
All types of organizations collect and keep data; such data is therefore internal to the
organization.
ADVANTAGES OF INTERNAL DATA SOURCES
1. It will be cheaper
2. Readily available information can be used much more quickly.
3. It can be understood much more easily.
EXTERNAL SOURCES
The sources of statistical information
The classification of variables can be summarized as follows:
Data
   Qualitative
   Quantitative
      Discrete
      Continuous
TYPES OF DATA
(a) PRIMARY DATA
If data is collected for a specific purpose, then it is known as primary data. For example:
a population census. Primary data is collected by the researcher himself or herself.
(b) SECONDARY DATA
Secondary data is data which has been collected for some purpose other than that for
which it is being used. For example, if a company has to keep records of when
employees are sick and you use this information to tabulate the number of days
employees had malaria in a given month, then this information would be classified as
secondary data. Secondary data is collected and possibly processed by people other
than the researcher in question. In other words, it is collected by others to be
reused by the researcher.
2. Tally the data and place the results in the second column.
3. Count the tallies and place in the third column
Class   Tally   Frequency
A               5
B               7
AB              9
O               4
Example
A survey taken in a restaurant shows the number of cups of coffee consumed with
each meal. Construct an ungrouped frequency distribution.
0 2 2 1 1 1
3 5 3 2 2 2
1 0 1 2 4 2
0 1 0 1 4 4
2 2 0 1 1 5
Solution
Class   Tally   Frequency
0               5
1               9
2               9
3               2
4               3
5               2
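The tallying steps can also be sketched in code. The following Python snippet is only an illustration and is not part of the original module; the variable names `cups` and `frequency` are assumptions chosen for this example. It counts how often each value occurs in the coffee survey data listed in the example.

```python
from collections import Counter

# Cups of coffee consumed with each meal (the 30 survey values, row by row).
cups = [
    0, 2, 2, 1, 1, 1,
    3, 5, 3, 2, 2, 2,
    1, 0, 1, 2, 4, 2,
    0, 1, 0, 1, 4, 4,
    2, 2, 0, 1, 1, 5,
]

# Counter tallies each distinct value; sorting lists the classes in order.
frequency = dict(sorted(Counter(cups).items()))

for value, count in frequency.items():
    print(value, count)
```

Each distinct data value becomes one class, which is what makes the distribution ungrouped.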
S = the smallest data
K =number of classes
3. The class width should be an odd number. This ensures that the midpoint of each class has the same place value as the data. The class mid-point
Xm is obtained by adding the lower and upper boundaries and dividing
by 2, or by adding the lower and upper limits and dividing by 2.
$X_m = \frac{\text{lower limit} + \text{upper limit}}{2}$
Or
$X_m = \frac{\text{lower boundary} + \text{upper boundary}}{2}$
4. The classes must be mutually exclusive. Mutually exclusive classes have non-overlapping
class limits so that a data value cannot be placed into two classes.
For example, use the classes in table A, not those in table B:
A        B
10-20    10-20
21-31    20-30
32-42    30-40
43-53    40-50
If a person is 40 years old, into which class of table B should he or she be placed?
5. The classes must be continuous. Even if there are no values in a class, the class must be
included in the frequency distribution. In other words, there should be no gaps in the
frequency distribution. The only exception occurs when the class with a zero frequency
is the first or last class.
6. The classes must be exhaustive. There should be enough classes to accommodate all the
data.
Example
The following data represent the times (in seconds) for 50 runners in a race.
246 238 246 251 240
243 245 243 241 248
244 246 249 246 245
244 248 240 243 249
242 245 239 244 245
245 248 248 249 248
250 242 243 245 242
242 246 246 245 247
244 240 245 247 248
247 250 247 248 250
Construct a grouped frequency distribution for the data.
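Solution
Procedure for constructing a grouped frequency distribution
Step 1: Determine the classes.
Suppose we want to have 5 classes. Then the class width is
$W = \frac{L - S}{K} = \frac{251 - 238}{5} = 2.6 \approx 3$
where L is the largest data value, S is the smallest data value and K is the number of classes.
A number is rounded up if there is any decimal remainder when dividing. For example,
53 ÷ 4 = 13.25 is rounded up to 14, and 85 ÷ 6 = 14.167 is rounded up to 15.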
Select a starting point for the lowest class limit. This can be the smallest data value
or any convenient number less than the smallest data value. In this case 237 is
used.
Add the width to the lowest class limit to get the lower limit of the next class.
Keep adding until there are 5 classes, as shown:
237, 240, 243, 246, 249
Subtract one unit from the lower limit of the second class to get the upper limit of
the first class. Then add the width to each upper limit to get all the upper limits, as
shown below:
239, 242, 245, 248, 251
So the five classes are:
237-239, 240-242, 243-245, 246-248, 249-251
Find the class boundaries by subtracting 0.5 from each lower class limit and
adding 0.5 to each upper class limit, i.e. 236.5-239.5, 239.5-242.5, 242.5-245.5, etc.
Find the mid-point of each class.
Step 2: Tally the data.
Step 3: Find the numerical frequency from the tallies.
Class interval   Class boundaries   Mid-point   Tally   Frequency
237-239          236.5-239.5        238                 2
240-242          239.5-242.5        241                 8
243-245          242.5-245.5        244                 16
246-248          245.5-248.5        247                 17
249-251          248.5-251.5        250                 7
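The procedure above can be sketched in a few lines of Python. This is an illustrative sketch rather than part of the original module: the names `times`, `k`, `start`, `classes` and `frequencies` are assumptions, and the starting point 237 (one unit below the smallest value) is a choice made for this example.

```python
import math

# Times (in seconds) for the 50 runners, row by row.
times = [
    246, 238, 246, 251, 240,
    243, 245, 243, 241, 248,
    244, 246, 249, 246, 245,
    244, 248, 240, 243, 249,
    242, 245, 239, 244, 245,
    245, 248, 248, 249, 248,
    250, 242, 243, 245, 242,
    242, 246, 246, 245, 247,
    244, 240, 245, 247, 248,
    247, 250, 247, 248, 250,
]

k = 5  # desired number of classes
# Class width W = (L - S) / K, rounded up to the next whole number.
width = math.ceil((max(times) - min(times)) / k)

start = 237  # assumed starting point for the lowest class limit
# Each class is a (lower limit, upper limit) pair; limits are inclusive.
classes = [(start + i * width, start + (i + 1) * width - 1) for i in range(k)]
midpoints = [(lower + upper) / 2 for lower, upper in classes]
frequencies = [sum(lower <= t <= upper for t in times) for lower, upper in classes]

for (lower, upper), mid, freq in zip(classes, midpoints, frequencies):
    print(f"{lower}-{upper}  mid-point {mid}  frequency {freq}")
```

Because the limits are whole numbers and inclusive, the classes are mutually exclusive and every value falls into exactly one class.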
Example
The average quantitative GRE scores for the top 30 graduate schools of engineering are
listed below. Construct a frequency distribution with six classes.
767 770 763 760 780 750 756 766 761
747 746 758 760 766 764 770 771 754
769 762 768 776 771 771 759 753 746
Solution
$W = \frac{L - S}{K} = \frac{780 - 746}{6} = 5.667 \approx 6$
The smallest lower limit (starting point) is 745. The other lower limits are 751, 757, 763, 769 and 775. The upper limit of
the first class is 751 - 1 = 750. The other upper limits are 756, 762, 768, 774 and 780.
Class limits   Frequency
745-750        4
751-756        7
757-762        7
763-768        11
769-774        2
775-780        1
Relative frequencies are the frequencies divided by the total number of observations.
$\text{Relative frequency} = \frac{\text{actual frequency}}{\text{total number of observations}}$
Example
The following data represent the heights of students in cm.
Height (cm)   Frequency
Under 165     7
Under 170     11
Under 175     17
Under 180     20
Under 185     16
Under 190     9
Construct a relative frequency distribution.
Solution
Height (cm)   Frequency   Relative frequency
Under 165     7           0.0875
Under 170     11          0.1375
Under 175     17          0.2125
Under 180     20          0.25
Under 185     16          0.2
Under 190     9           0.1125
Adding the relative frequencies in turn gives the cumulative relative frequencies:
Height (cm)   Frequency   Relative frequency   Cumulative relative frequency
Under 165     7           0.0875               0 + 0.0875 = 0.0875
Under 170     11          0.1375               0.0875 + 0.1375 = 0.225
Under 175     17          0.2125               0.225 + 0.2125 = 0.4375
Under 180     20          0.25                 0.4375 + 0.25 = 0.6875
Under 185     16          0.2                  0.6875 + 0.2 = 0.8875
Under 190     9           0.1125               0.8875 + 0.1125 = 1.000
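As a check on the arithmetic, the relative and cumulative relative frequencies can be computed with a short Python sketch. It is an illustration only; the names `frequencies`, `relative` and `cumulative` are assumptions introduced here.

```python
from itertools import accumulate

# Frequencies of the six height classes (Under 165 ... Under 190).
frequencies = [7, 11, 17, 20, 16, 9]

total = sum(frequencies)  # total number of observations

# Relative frequency = actual frequency / total number of observations.
relative = [f / total for f in frequencies]

# Running totals of the relative frequencies.
cumulative = list(accumulate(relative))

for f, r, c in zip(frequencies, relative, cumulative):
    print(f, round(r, 4), round(c, 4))
```

The last cumulative value should always come out as 1, which is a quick way to catch an arithmetic slip.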
The reasons for constructing a frequency distribution are:
1. To organize the data in a meaningful way.
2. To enable the reader to determine the nature or shape of the distribution.
3. To facilitate computational procedures for measures of average and spread.
4. To enable the researcher to draw charts and graphs for the presentation of the data.
5. To enable the reader to make comparisons among different data sets.
GRAPHICAL PRESENTATION
After the data have been organized into a frequency distribution, they can be
presented in graphical form. The purpose of graphs in statistics is to convey the
data to the viewers in pictorial form. It is easier for most people to comprehend
the meaning of data presented graphically than data presented numerically in tables or
frequency distributions.
The three most commonly used graphs are:
(a) histogram
(b) frequency polygon
(c) cumulative frequency graph or ogive (pronounced o-jive)
THE HISTOGRAM
The word histogram is derived from Greek: histos, anything set upright, and gramma,
drawing, record or writing.
The histogram is a graph that displays the data by using contiguous vertical bars
(unless the frequency of a class is 0) of various heights to represent the frequencies
of the classes.
Example
The annual exports of a group of small firms in Lilongwe are:
Exports (K millions)   Number of firms
2-4                    4
5-7                    12
8-10                   15
11-13                  8
14-16                  4
Construct a histogram to represent the data shown above.
Solution
Step 1: Construct a frequency distribution that has class boundaries.
Class    Class boundaries   Frequency
2-4      1.5-4.5            4
5-7      4.5-7.5            12
8-10     7.5-10.5           15
11-13    10.5-13.5          8
14-16    13.5-16.5          4
Step 2: Draw and label the x-axis and y-axis. The x-axis is always the horizontal
axis and the y-axis is always the vertical axis.
Step 3: Represent the frequency on the y-axis and the class boundaries on the x-axis.
Step 4: Using the frequencies as the heights, draw vertical bars for each class.
[Figure: histogram of the export data, with Frequency on the y-axis]
4. Extend the lines at each end of the histogram to the mid-points of the next higher and lower classes,
which have a frequency of zero. NOTE: The lines are extended to the x-axis so that the area of
the polygon equals that of the histogram it represents.
Example
Draw a frequency polygon for the data in the example above.
Solution
Class    Class boundaries   Frequency   Mid-point
2-4      1.5-4.5            4           3
5-7      4.5-7.5            12          6
8-10     7.5-10.5           15          9
11-13    10.5-13.5          8           12
14-16    13.5-16.5          4           15
OR
Step 1: Find the mid-points of each class.
Step 2: Draw the x-axis to represent scores.
Step 3: Draw the y-axis to represent frequencies.
Step 4: Plot the frequency against class mid-point.
Step 5: Join the crosses in order, that is the cross representing the first class should be joined
to the one representing the second class and so on.
Step 6: Include the two extreme points.
For instance,
Class    Class boundaries   Frequency   Mid-point
2-4      1.5-4.5            4           3
5-7      4.5-7.5            12          6
8-10     7.5-10.5           15          9
11-13    10.5-13.5          8           12
14-16    13.5-16.5          4           15
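$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$
where n is the number of values in the sample.
For a population, the Greek letter $\mu$ (mu) is used for the mean.
$\mu = \frac{x_1 + x_2 + x_3 + \dots + x_N}{N} = \frac{\sum_{i=1}^{N} x_i}{N}$
where N is the number of values in the population.
If some values appear more than once, we may use the following formula.
$\bar{x} = \frac{\sum fx}{\sum f}$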
Example
Find the mean of the following data:
20, 26, 40, 36, 23, 42, 35, 24, 30.
Solution
$\bar{x} = \frac{\sum x}{n} = \frac{276}{9} = 30.7$
Example
The table below shows the frequency distribution of the number of days on which 100
employees of a firm were late for work in a given month. Using the data, find the mean
number of days on which an employee is late in a month.
Number of days late   Number of employees
1                     32
2                     25
3                     18
4                     14
5                     11
Solution
x     f     fx
1     32    32
2     25    50
3     18    54
4     14    56
5     11    55
$\sum f = 100$ and $\sum fx = 247$
But $\bar{x} = \frac{\sum fx}{\sum f} = \frac{247}{100} = 2.47$
Therefore the mean number of days is 2.47 days.
MEAN FOR GROUPED DATA
Steps to be followed
1. Make a frequency distribution table with the columns shown below.
Class   Frequency (f)   Mid-point (Xm)   f·Xm
2. Find the mid-point of each class and place it in column 3.
3. Multiply the frequency by the mid-point for each class and place the product in column
4.
4. Find the sums of columns 2 and 4. In other words, find $\sum f$ and $\sum f x_m$.
The mean is then $\bar{x} = \frac{\sum f x_m}{\sum f}$
Example
The marks scored by 500 candidates in an examination in which the maximum mark was
50 were:
Mark range   Frequency
1-5          10
6-10         41
11-15        72
16-20        83
21-25        94
26-30        81
31-35        71
36-40        27
41-45        13
46-50        8
Calculate the mean mark for these candidates.
Solution
Mark range   Frequency (f)   Mid-point (xm)   f·xm
1-5          10              3                30
6-10         41              8                328
11-15        72              13               936
16-20        83              18               1494
21-25        94              23               2162
26-30        81              28               2268
31-35        71              33               2343
36-40        27              38               1026
41-45        13              43               559
46-50        8               48               384
$\sum f = 500$ and $\sum f x_m = 11530$
But $\bar{x} = \frac{\sum f x_m}{\sum f} = \frac{11530}{500} = 23.1$
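The grouped-mean steps can be sketched in Python as follows. This is a minimal illustration, not part of the original module; the names `classes`, `frequencies`, `midpoints`, `sum_f` and `sum_fxm` are assumptions chosen to mirror the table columns.

```python
# Class limits and frequencies from the examination marks example.
classes = [(1, 5), (6, 10), (11, 15), (16, 20), (21, 25),
           (26, 30), (31, 35), (36, 40), (41, 45), (46, 50)]
frequencies = [10, 41, 72, 83, 94, 81, 71, 27, 13, 8]

# Column 3: mid-point of each class = (lower limit + upper limit) / 2.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# Column 4: frequency times mid-point, then the two column sums.
sum_f = sum(frequencies)
sum_fxm = sum(f * xm for f, xm in zip(frequencies, midpoints))

mean = sum_fxm / sum_f
print(sum_f, sum_fxm, round(mean, 1))
```

The result is an estimate of the mean, since every mark in a class is treated as if it were equal to the class mid-point.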
ADVANTAGES OF ARITHMETIC MEAN
1. It is easy to calculate.
2. It uses all the values.
3. It is used in computing other statistics such as variance.
DISADVANTAGES OF ARITHMETIC MEAN
1. It is affected by extremely high or low values.
2. It cannot be read from a graph.
MEDIAN
A statistic which is not affected by a few very unusual extreme scores is the median. The median
is the middle value when the values are arranged in order (either ascending or descending
order). When the data is ordered, it is called a data array.
The median is used when one must determine whether the data values fall into the upper
or lower half in the distribution.
Steps in computing the median are:
1. Arrange the data in order.
2. Select the middle point.
NOTE: If the number of values (n) is odd, then the median is the middle value;
if n is even, then the median is the arithmetic mean of the two middle
values.
In other words,
(a) if n is odd and M is the value of the median, then
M = the value of the $\left(\frac{n+1}{2}\right)$th observation
(b) if n is even, the middle observations are the $\left(\frac{n}{2}\right)$th and the $\left(\frac{n}{2}+1\right)$th
observations, and then
M = the mean of the values of these two observations
Example
Find the median of 4, 3, 5, 2, 11.
Solution
Arranging in ascending order, we have
2, 3, 4, 5, 11 and n = 5 (odd)
Median = the value of the $\left(\frac{5+1}{2}\right)$th observation
= 3rd observation
= 4.
Example
Six people take shoe sizes: 7, 9, 9, 8, 5, 6. What is the median?
Solution
Arranging in ascending order, we have 5, 6, 7, 8, 9, 9 and n = 6 (even)
So M = the mean of the $\left(\frac{6}{2}\right)$th and the $\left(\frac{6}{2}+1\right)$th observations
= the mean of the 3rd and 4th observations
= $\frac{7 + 8}{2}$
= 7.5
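The two cases (n odd and n even) can be sketched as a small Python function. This is an illustrative sketch; the function name `median` is an assumption for this example, and Python's standard library also offers `statistics.median` for the same purpose.

```python
def median(values):
    """Return the median of a list of numbers."""
    ordered = sorted(values)      # step 1: arrange the data in order
    n = len(ordered)
    middle = n // 2               # zero-based index used to locate the middle
    if n % 2 == 1:
        # n odd: the ((n + 1) / 2)th observation, i.e. index n // 2.
        return ordered[middle]
    # n even: mean of the (n / 2)th and (n / 2 + 1)th observations.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([4, 3, 5, 2, 11]))
print(median([7, 9, 9, 8, 5, 6]))
```

Note that the extreme value 11 in the first data set has no effect on the result, which is the main advantage of the median over the mean.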
ADVANTAGES OF MEDIAN
1. Its value is not distorted by extreme values.
2. All the observations are used to order the data even though only the middle one or two middle
observations are used in the calculation.
3. It can be illustrated graphically in a very simple way.
DISADVANTAGES OF MEDIAN
1. In a grouped frequency distribution, the value of the median within the median class
can only be an estimate.
2. It is of little use in calculating other statistical measures.
MODE
Mode is the most frequent data value or the value that occurs most often in a data set.
The mode is used when the most typical case is desired.
A set of observations with one mode is called unimodal, a set of observations with two
modes is called bimodal, while a set of observations with more than two modes is called
multimodal.
Example
The monthly salaries of a sample of doctors are: K35000, K58000, K50000, K49000,
K50000, K50000, K60000, K70400, K50000, K40000, K50000, K40000, K65000, K55000.
What is the modal (mode) monthly salary?
Solution
The value which occurs most often is K50000.
Therefore the modal salary is K50000.
Example
Find the modal class for the frequency distribution of miles that 20 runners ran in one week.
Class        Frequency
5.5-10.5     1
10.5-15.5    2
15.5-20.5    3
20.5-25.5    5
25.5-30.5    4
30.5-35.5    3
35.5-40.5    2
Solution
The modal class is 20.5-25.5 since it has the largest frequency.
ADVANTAGES OF THE MODE
1. It is not distorted by extreme values of the observations.
2. It is easy to calculate.
DISADVANTAGES OF THE MODE
1. It cannot be used to calculate any further statistics.
2. It may have more than one value.
THE WEIGHTED MEAN
Sometimes, one must find the mean of a data set in which not all values are equally represented.
This type of mean is called the weighted mean.
To find the weighted mean, multiply each value by its corresponding weight and divide
the sum of the products by the sum of the weights.
In other words,
$\bar{x} = \frac{w_1x_1 + w_2x_2 + \dots + w_nx_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum wx}{\sum w}$
where w1, w2, ..., wn are the weights and x1, x2, ..., xn are the values.
Example
A student received an A in English Composition 1 (3 credits), a C in Introduction to
Psychology (3 credits), a B in Biology 1 (4 credits) and a D in Physical Education (2
credits). Assume A = 4 grade points, B = 3 grade points, C = 2 grade points, D = 1 grade
point and F = 0 grade points. Find the student's grade-point average.
Solution
Course                       Credits (w)   Grade (x)      wx
English Composition 1        3             A (4 points)   12
Introduction to Psychology   3             C (2 points)   6
Biology 1                    4             B (3 points)   12
Physical Education           2             D (1 point)    2
$\sum w = 12$ and $\sum wx = 32$
But $\bar{x} = \frac{\sum wx}{\sum w} = \frac{32}{12} = 2.7$
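The weighted mean is straightforward to express in code. The sketch below is an illustration only; the function name `weighted_mean` and its parameter names are assumptions introduced for this example.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("each value needs exactly one weight")
    sum_wx = sum(w * x for x, w in zip(values, weights))
    return sum_wx / sum(weights)


# Grade points (A=4, C=2, B=3, D=1) weighted by course credits.
grade_points = [4, 2, 3, 1]
credits = [3, 3, 4, 2]

gpa = weighted_mean(grade_points, credits)
print(round(gpa, 1))
```

When all the weights are equal, the weighted mean reduces to the ordinary arithmetic mean.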
MEASURES OF DISPERSION
In statistics, to describe the data set accurately, statisticians must know more than
measures of central tendency. We also need to know the spread of data. There are several
different measures of dispersion. The most important of these (which we will describe in
this section) are:
Range
Variance
Standard deviation.
RANGE
The range is the difference between the highest value and the lowest value.
Example
The weights of the contents of several small bottles are (in grams): 4, 3, 6, 5, 7, 2 and 4.
Find the range.
Solution
The lowest value is 2 and the highest value is 7.
Therefore range = 7-2
=5
ADVANTAGES OF THE RANGE
1. It is easy to understand.
2. It is simple to calculate.
3. It is a good measure for comparison as it spans the whole distribution.
DISADVANTAGES OF THE RANGE
1. It uses only two of the observations and so can be distorted by extreme values.
2. It cannot be used in calculating other functions of the observations.
STANDARD DEVIATION AND VARIANCE
These two measures of dispersion can be discussed in the same section because the standard
deviation is the square root of the variance.
The variance is the average of the squares of the distance each value is from the mean.
The symbol for the population variance is $\sigma^2$ ($\sigma$ is the Greek lowercase letter sigma).
The formula for the population variance is
$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
where x = individual value
$\mu$ = population mean
N = population size
The standard deviation is the square root of the variance. The symbol for the population
standard deviation is $\sigma$ and its formula is
$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
The standard deviation is one of the measures used to describe the variability of a
distribution. It has an additional use which makes it more important than other measures
of dispersions. It is used as a unit to measure the distance between any two observations.
STEPS TO BE FOLLOWED IN CALCULATING STANDARD DEVIATIONS
The reason for taking the square root is that, since the distances were squared, the units
of the resultant numbers are squares of the units of the original raw data. Finding the
square root of the variance puts the standard deviation in the same units as the raw data.
Example
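Find the variance and standard deviation for the following set:
10, 60, 50, 30, 40, 20
Solution
The mean, $\mu = \frac{10 + 60 + 50 + 30 + 40 + 20}{6} = \frac{210}{6} = 35$
x     x - μ            (x - μ)²
10    10 - 35 = -25    625
60    60 - 35 = 25     625
50    50 - 35 = 15     225
30    30 - 35 = -5     25
40    40 - 35 = 5      25
20    20 - 35 = -15    225
$\sum (x - \mu)^2 = 1750$
But $\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{1750}{6} = 291.7$
Therefore, the variance is 291.7
And $\sigma = \sqrt{\sigma^2} = \sqrt{291.7} = 17.1$
Therefore, the standard deviation is 17.1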
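The calculation can be sketched in Python as a check. This is an illustrative sketch; the function names are assumptions, and the data uses 60 as the second value, as in the working table of the solution.

```python
import math


def population_variance(values):
    """Average of the squared distances of each value from the mean."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def population_std(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))


data = [10, 60, 50, 30, 40, 20]
variance = population_variance(data)
std_dev = population_std(data)
print(round(variance, 1), round(std_dev, 1))
```

The standard library functions `statistics.pvariance` and `statistics.pstdev` compute the same population measures.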
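SAMPLE VARIANCE AND STANDARD DEVIATION
The formulas for finding the sample variance and standard deviation are as follows:
Variance, $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ or $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$
Standard deviation, $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ or $s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$
where $\bar{x}$ = sample mean
n = sample size
NOTE: Dividing by n - 1 gives a slightly larger value and an unbiased estimate of the population
variance.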
Example
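Find the sample variance and standard deviation of three students'
earnings. The data are in Kwacha.
3, 4, 5
Solution
(i) $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - n\bar{x}^2}{n - 1} = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$
$\sum x = 3 + 4 + 5 = 12$
$(\sum x)^2 = 12^2 = 144$
$\sum x^2 = 3^2 + 4^2 + 5^2 = 50$
$s^2 = \frac{50 - \frac{144}{3}}{3 - 1} = \frac{50 - 48}{2} = 1$
(ii) The standard deviation is $s = \sqrt{1} = 1$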
For grouped data, we use the following steps:
1. Make a table as shown below and find the mid-point of each class.
Class   Frequency (f)   Mid-point (xm)   f·xm   f·xm²
2. Multiply the frequency by the mid-point of each class and place the product in column 4.
3. Multiply the frequency by the square of the mid-point and place the product in column 5.
4. Find the sums of columns 2, 4 and 5.
5. Substitute the sums into the formula
$s^2 = \frac{\sum f x_m^2 - \frac{(\sum f x_m)^2}{n}}{n - 1}$, where $n = \sum f$
6. Take the square root to get the standard deviation.
Example
Find the sample variance and standard deviation for the following frequency distribution.
Class limits   Frequency
13-19          2
20-26          7
27-33          12
34-40          5
41-47          6
48-54          1
55-61          0
62-68          2
Solution
Class limits   Frequency (f)   Mid-point (xm)   f·xm   f·xm²
13-19          2               16               32     512
20-26          7               23               161    3703
27-33          12              30               360    10800
34-40          5               37               185    6845
41-47          6               44               264    11616
48-54          1               51               51     2601
55-61          0               58               0      0
62-68          2               65               130    8450
$n = \sum f = 35$, $\sum f x_m = 1183$ and $\sum f x_m^2 = 44527$
$s^2 = \frac{44527 - \frac{1183^2}{35}}{35 - 1} = \frac{44527 - 39985.4}{34} = \frac{4541.6}{34} = 133.6$
$s = \sqrt{133.6} = 11.6$
Therefore, the sample variance is 133.6 and the standard deviation is 11.6
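The grouped-data steps can be sketched in Python to check the column sums and the substitution into the shortcut formula. This is a minimal illustration under the assumption that the formula is $s^2 = \frac{\sum f x_m^2 - (\sum f x_m)^2 / n}{n - 1}$ with $n = \sum f$; the variable names are introduced here for the example.

```python
import math

# Class limits and frequencies from the example.
classes = [(13, 19), (20, 26), (27, 33), (34, 40),
           (41, 47), (48, 54), (55, 61), (62, 68)]
frequencies = [2, 7, 12, 5, 6, 1, 0, 2]

midpoints = [(lower + upper) / 2 for lower, upper in classes]

n = sum(frequencies)                                                  # column 2
sum_fxm = sum(f * xm for f, xm in zip(frequencies, midpoints))        # column 4
sum_fxm2 = sum(f * xm ** 2 for f, xm in zip(frequencies, midpoints))  # column 5

# Shortcut formula: note that the sum of f * xm is squared before dividing by n.
variance = (sum_fxm2 - sum_fxm ** 2 / n) / (n - 1)
std_dev = math.sqrt(variance)

print(n, sum_fxm, sum_fxm2, round(variance, 1), round(std_dev, 1))
```

A useful sanity check is that the standard deviation should be small compared with the spread of the class limits (13 to 68 here).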
TUTORIAL 3: DESCRIBING DATA
1. Find the mean, mode, median, variance and standard deviation of the following set
of numbers;
(a) 2, 3, 5, 3, 3
(b) 3, 4, 5, 4, 5, 4, 5, 3, 4, 4
2. Find the weighted mean of 20, 25, 30, 35 if they are assigned weightings of
(a) 1, 2, 3, 4
(b) 1, 3, 7, 9 respectively
3. An instructor weights grades as follows: exams, 20%; term paper, 30%; final exam, 50%. A student had
grades of 83, 72 and 90 respectively. Find the student's final average. Use the
weighted mean.
4. In a class of 29 students, this distribution of quiz scores was recorded.
Class limits   Frequency
0-2            1
3-5            3
6-8            5
9-11           14
12-14          6
Find the mean, variance and standard deviation of the quiz scores.
5. A survey was made of the monthly earnings of four agricultural assistants and the
results are recorded below:
K18,000, K19,000, K20,000 and K21,000
Calculate the following:
(a) Sample variance
(b) Sample standard deviation.
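Solutions
(i) S = {1, 2, 3, 4, 5, 6}
(ii) A = {1, 3, 5}
(iii) B = {2, 4, 6}
(iv) C = {1, 2, 3, 5}
(v) D = {3, 6}
$P(E) = \frac{n(E)}{n(S)} = \frac{\text{number of outcomes in } E}{\text{total number of outcomes in the sample space}}$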
Example
Suppose you toss a coin. Find the probability that the outcome is a head.
Solution
S = {H, T} and E = {H}
But $P(H) = \frac{n(H)}{n(S)} = \frac{1}{2}$
Example
A card is drawn from a standard pack of cards. Find the probability of getting a
queen.
Solution
n(S) = 52 and n(E) = 4
But $P(E) = \frac{n(E)}{n(S)} = \frac{4}{52} = \frac{1}{13}$
Example
If a family has three children, find the probability that all the children are girls.
Solution
S = {BBB, BBG, BGB, GBB, BGG, GBG, GGB, GGG}
So n(S) = 8 and n(E) = 1
But $P(E) = \frac{n(E)}{n(S)} = \frac{1}{8}$
$P(\bar{E}) = 1 - P(E)$ or $P(E) = 1 - P(\bar{E})$ or $P(E) + P(\bar{E}) = 1$
Example
The weather bureau estimates the probability of rain tomorrow to be 0.42. What is the
probability that it does not rain?
Solution
P(no rain) = 1 - P(rain)
= 1 - 0.42
= 0.58
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$ when A and B are not mutually exclusive events.
Example
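A card is picked at random from a standard pack of cards. What is the probability that it
is
(a) a red heart or a club?
(b) a red heart or a king?
Solution
(a) There is no red heart that is a club. So P(heart or club) = P(heart) + P(club)
= $\frac{13}{52} + \frac{13}{52}$
= $\frac{1}{2}$
(b) There is one red heart that is a king. So P(heart or king) = P(heart) + P(king) - P(heart and king)
= $\frac{13}{52} + \frac{4}{52} - \frac{1}{52}$
= $\frac{16}{52}$
= $\frac{4}{13}$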
Example
In a hospital unit there are 8 nurses and 5 physicians; 7 nurses and 3 physicians are
females. If a staff person is selected, find the probability that the subject is a nurse
or a male.
Solution
The sample space is shown below.
Staff        Females   Males   Total
Nurses       7         1       8
Physicians   3         2       5
Total        10        3       13
There is a nurse who is a male. So
P(nurse or male) = P(nurse) + P(male) - P(nurse and male)
= $\frac{8}{13} + \frac{3}{13} - \frac{1}{13}$
= $\frac{10}{13}$
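The addition rule for events that are not mutually exclusive can be checked with exact fractions. The sketch below is an illustration only; the variable names are assumptions, and the counts come from the staff table in the example.

```python
from fractions import Fraction

total_staff = 13

p_nurse = Fraction(8, total_staff)           # 8 nurses out of 13 staff
p_male = Fraction(3, total_staff)            # 3 males out of 13 staff
p_nurse_and_male = Fraction(1, total_staff)  # 1 male nurse

# P(A or B) = P(A) + P(B) - P(A and B); the overlap is subtracted once
# because the male nurse is counted in both P(nurse) and P(male).
p_nurse_or_male = p_nurse + p_male - p_nurse_and_male
print(p_nurse_or_male)
```

The same answer follows by counting directly: 8 nurses plus the 2 male physicians gives 10 of the 13 staff.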
MULTIPLICATION LAWS
1. When two events are independent, the probability of both occurring is
$P(A \cap B) = P(A) \cdot P(B)$
Example
A coin is tossed and a die is rolled. Find the probability of getting a 4 on the die and
a head on the coin.
Solution
P(4 and head) = P(4) · P(head)
= $\frac{1}{6} \times \frac{1}{2}$
= $\frac{1}{12}$
Example
A card is drawn from a deck and replaced. Then a second card is drawn. Find the probability
of getting a queen and then an ace.
Solution
P(queen and ace) = P(queen) · P(ace)
= $\frac{4}{52} \times \frac{4}{52} = \frac{1}{169}$
2. $P(A \cap B) = P(A) \cdot P(B/A) = P(B) \cdot P(A/B)$
Example
A card is drawn from a deck without replacement. Then a second card is
drawn. Find the probability of getting a queen and then an ace.
Solution
P(queen and ace) = $\frac{4}{52} \times \frac{4}{51} = \frac{16}{2652} = \frac{4}{663}$
Example
The World Wide Insurance Company found that 53% of the residents of a city had
homeowner's insurance with the company. Of these clients, 27% also had
automobile insurance with the company. If a resident is selected, find the probability
that the resident has both homeowner's and automobile insurance with the World
Wide Insurance Company.
Solution
Let homeowner's insurance be H and automobile insurance be A.
So P(H) = 0.53 and P(A/H) = 0.27
P(H and A) = P(H) · P(A/H)
= 0.53 x 0.27
= 0.1431
DISCRETE PROBABILITY DISTRIBUTION
RANDOM VARIABLES
A random variable is the numerical outcome of a random experiment, denoted by X.
If the experiment is repeated, different values of X will be obtained and these values
are denoted by small x.
If a variable can assume only a specific number of values, such as the outcome for
the roll of a die or the outcome for the toss of a coin, then the variable is called a
discrete variable. Discrete variables have values that can be counted.
DISCRETE PROBABILITY DISTRIBUTION
It consists of the values a random variable can assume and the corresponding
probabilities of the value. The probabilities are determined theoretically or by
observation.
Example
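Construct a probability distribution for rolling a single die.
Solution
S = {1, 2, 3, 4, 5, 6}
$P(1) = \frac{1}{6}$, $P(2) = \frac{1}{6}$, $P(3) = \frac{1}{6}$, $P(4) = \frac{1}{6}$, $P(5) = \frac{1}{6}$ and $P(6) = \frac{1}{6}$
So its distribution is
Outcome x          1     2     3     4     5     6
Probability P(x)   1/6   1/6   1/6   1/6   1/6   1/6
1. The sum of the probabilities of all the events in the sample space must equal 1.
That is, $\sum P(X = x) = 1$
2. The probability of each event in the sample space must be between or equal to
0 and 1. That is, $0 \le P(X = x) \le 1$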
Example
Represent graphically the probability distribution for the sample space for tossing
three coins.
Number of heads x      0     1     2     3
Probability P(X = x)   1/8   3/8   3/8   1/8
Solution
Example
A random variable has the distribution shown in the following table.
X          0     1     2
P(X = x)   1/4         1/4
(a) Find P(X = 1)
(b) Represent graphically the probability distribution.
Solution
(a) $P(X = 1) = 1 - \left(\frac{1}{4} + \frac{1}{4}\right)$, since $\sum P(X = x) = 1$
= $\frac{1}{2}$
BINOMIAL DISTRIBUTION
Many types of probability problems have only two outcomes or can be reduced to
two outcomes. Some examples of experiments where you have two outcomes are:
1. A win or loss in a football game.
2. A pass or fail in an examination.
3. A head or tail on a coin toss.
4. An effective or ineffective lecturer.
5. A correct or incorrect item.
Situations like these are called binomial experiments.
A binomial experiment is a probability experiment that satisfies the following four requirements:
(i) Each trial can have only two outcomes, or outcomes that can be reduced to two outcomes,
which can be considered as either success or failure.
(ii) There must be a fixed number of trials.
(iii) The outcomes of the trials must be independent of each other.
(iv) The probability of success or failure must remain the same for each trial.
The outcomes of a binomial experiment and the corresponding probabilities of these
outcomes are called a binomial distribution. In a binomial experiment, the
probability of exactly x successes in n trials is
$P(X = x) = \frac{n!}{(n - x)!\,x!}\, p^x q^{n - x}$
where n = the number of trials
x = the number of successes in n trials
NOTE: $0 \le x \le n$
p = the numerical probability of a success, i.e. P(S)
q = the numerical probability of a failure, i.e. P(F) = 1 - P(S)
NOTE: $n! = n(n - 1)(n - 2)(n - 3) \cdots 3 \cdot 2 \cdot 1$
0! = 1
Example
Find (a) 3! and (b) 5!
Solutions
(a) 3!= 3x2x1
=6
(b) 5!=5x4x3x2x1
=120
Example
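A survey found that one out of five Malawians says he or she has visited a doctor in
any given month. If 10 people are selected at random, find the probability that
exactly 3 will have visited a doctor last month.
Solution
n = 10, x = 3, $p = \frac{1}{5}$ and $q = 1 - \frac{1}{5} = \frac{4}{5}$
But $P(X = x) = \frac{n!}{(n - x)!\,x!}\, p^x q^{n - x}$
$P(X = 3) = \frac{10!}{(10 - 3)!\,3!} \left(\frac{1}{5}\right)^3 \left(\frac{4}{5}\right)^{10 - 3}$
$= \frac{10 \cdot 9 \cdot 8 \cdot 7!}{7! \cdot 3 \cdot 2 \cdot 1} \left(\frac{1}{5}\right)^3 \left(\frac{4}{5}\right)^7$
= 0.201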
Example
It has been found that on average 5% of the eggs supplied at NRC market are
cracked. If you buy a box of 6 eggs, what is the probability that it contains 2 or
more cracked eggs?
Solution
p = P(cracked) = 0.05, q = P(not cracked) = 1 - 0.05 = 0.95 and n = 6
P(2 or more cracked) = 1 - P(less than 2 cracked)
= 1 - [P(0) + P(1)]
$P(0) = \frac{6!}{(6 - 0)!\,0!} (0.05)^0 (0.95)^{6 - 0} = 1 \times (0.95)^6 = 0.7351$
$P(1) = \frac{6!}{(6 - 1)!\,1!} (0.05)^1 (0.95)^{6 - 1} = 6 \times 0.05 \times (0.95)^5 = 0.2321$
Therefore P(2 or more cracked) = 1 - (0.7351 + 0.2321)
= 1 - 0.9672
= 0.033 to 3 d.p.
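The binomial formula translates directly into code. The following Python sketch is an illustration, not part of the original module; the function name `binomial_pmf` is an assumption, and `math.comb` (available from Python 3.8) supplies the factor n!/((n - x)! x!).

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    q = 1 - p  # probability of a failure
    return comb(n, x) * p ** x * q ** (n - x)


# Exactly 3 of 10 people visited a doctor, with p = 1/5.
p_three_visits = binomial_pmf(3, 10, 0.2)

# 2 or more cracked eggs in a box of 6, with p = 0.05, using the complement.
p_two_or_more = 1 - (binomial_pmf(0, 6, 0.05) + binomial_pmf(1, 6, 0.05))

print(round(p_three_visits, 3), round(p_two_or_more, 3))
```

Working with the complement avoids adding five separate terms for 2, 3, 4, 5 and 6 cracked eggs.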
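For a binomial distribution
(a) mean = n·p
(b) variance, $\sigma^2$ = n·p·q or n·p·(1 - p)
$p = \frac{1}{6}$, so $q = 1 - \frac{1}{6} = \frac{5}{6}$
(a) mean = n·p
= 480 x $\frac{1}{6}$
= 80
(b) variance = n·p·q
= 480 x $\frac{1}{6}$ x $\frac{5}{6}$
= 66.7
(c) standard deviation = $\sqrt{n \cdot p \cdot q}$
= $\sqrt{66.7}$
= 8.2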
Example
Let X be equal to the number of correct responses out of n = 20 questions and let p be
the probability of a correct choice on a single question. A candidate in an
examination randomly selects one of the 5 possible answers for each question, and
hence $p = \frac{1}{5}$. Find the mean, variance and standard deviation for the student.
Solution
Given: n = 20 and $p = \frac{1}{5}$.
So $q = 1 - \frac{1}{5} = \frac{4}{5}$
Mean = n·p = 20 x $\frac{1}{5}$ = 4
Variance = n·p·q = 20 x $\frac{1}{5}$ x $\frac{4}{5}$ = 3.2
Standard deviation = $\sqrt{3.2}$ = 1.8 to 1 decimal place
POISSON DISTRIBUTION
The binomial distribution is useful in cases where we take a fixed sample size and
count the number of successes. Sometimes, we don't have a definite sample size
and then the binomial distribution is of no use. In such cases we use another
theoretical distribution called the Poisson distribution.
In a Poisson distribution, the probability of exactly r successes is
$P(r) = \frac{e^{-m} m^r}{r!}$
where m = mean
r = number of events (successes)
e = 2.7183
WHEN TO USE THE POISSON DISTRIBUTION
(1) When n is large, i.e. there is a very large number of trials ($n \ge 30$)
(2) p is small
(3) The independent variable occurs over a period of time, or a density of items is distributed over a given
area or volume.
Example
If there are 200 typographical errors randomly distributed in a 500-page manuscript, find
the probability that a given page contains exactly three errors.
Solution
Given: r = 3
But $m = \frac{200}{500} = 0.4$
and $P(r) = \frac{e^{-m} m^r}{r!}$
$P(3) = \frac{(2.7183)^{-0.4} (0.4)^3}{3!}$
= 0.0072
Example
The number of accidents per working week in a particular factory in Lilongwe is
known to follow a Poisson distribution with a mean of 0.5. Find the probability that
in a particular week there will be (i) 2 accidents (ii) less than 3 accidents.
Solution
(i) $P(r) = \frac{e^{-m} m^r}{r!}$
$P(2) = \frac{(2.7183)^{-0.5} (0.5)^2}{2!}$
= 0.08
(ii) P(less than 3 accidents) = P(0) + P(1) + P(2)
$= \frac{(2.7183)^{-0.5} (0.5)^0}{0!} + \frac{(2.7183)^{-0.5} (0.5)^1}{1!} + \frac{(2.7183)^{-0.5} (0.5)^2}{2!}$
= 0.6065 + 0.3033 + 0.0758
= 0.9856
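The Poisson formula can be sketched in Python in the same way. This is an illustrative sketch; the function name `poisson_pmf` is an assumption for this example, and `math.exp` is used in place of the rounded constant e = 2.7183.

```python
import math


def poisson_pmf(r, m):
    """Probability of exactly r events when the mean number of events is m."""
    return math.exp(-m) * m ** r / math.factorial(r)


# Exactly three errors on a page, with m = 200 / 500 = 0.4 errors per page.
p_three_errors = poisson_pmf(3, 200 / 500)

# Accidents per week with mean 0.5: exactly 2, and fewer than 3.
p_two_accidents = poisson_pmf(2, 0.5)
p_less_than_three = sum(poisson_pmf(r, 0.5) for r in range(3))

print(round(p_three_errors, 4), round(p_two_accidents, 2),
      round(p_less_than_three, 4))
```

"Less than 3" means r = 0, 1 or 2, so three terms are added; stopping after the first term would give only P(0).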
NORMAL PROBABILITY DISTRIBUTION
A normal distribution is a continuous, unimodal, symmetric bell shaped distribution
of a variable. For example
(vii)
The area between the curve and the x-axis is 1 unit or 100%
The standard normal distribution is a normal distribution with a mean of 0 and a
standard deviation of 1. All normally distributed variables can be transformed into the
standard normally distributed variables by using the formula for the standard score.
$Z = \frac{X - \mu}{\sigma}$
Where Z = Z-score
X = value
$\mu$ = mean
$\sigma$ = standard deviation
The Z score is actually the number of standard deviations that a particular X value is
away from the mean.
Example
A student scored 65 on a Mathematics test that had a mean of 50 and a standard
deviation of 10. Calculate his Z score.
Solution
$Z = \frac{X - \mu}{\sigma} = \frac{65 - 50}{10} = 1.5$
AREA UNDER THE STANDARD NORMAL CURVE
NOTE: (i) $\Phi(z_1)$ = area under the curve between the ordinate at $z_1$ and the mean.
(ii) $P(x_1 < x < x_2)$ = area under the curve between the ordinates at $x_1$ and $x_2$.
Example
Find the area under the normal curve between Z= 0 and Z= 2.34
Solution
Solution
First step: Find the area between Z=0 and Z=2.47 i.e. 0.4932
Second step: Find the area between Z = 0 and Z = 2.00, i.e. 0.4772
Third step: Find the difference of the two, i.e. 0.4932 - 0.4772
Therefore, the area is 0.0160
NOTE: If the areas are on the same side of Z = 0, subtract the areas.
Example
Find the area between Z = 1.68 and Z = -1.37
Solution
First step: Find the area between Z=0 and Z=1.68 i.e. 0.4535
Second step: Find the area between Z= 0 and Z=-1.37 i.e. 0.4147
Third step: Find the sum of the two i.e. 0.4535 +0.4147
Therefore, the area is 0.8682
NOTE: If the areas are on opposite sides of Z=0, add the two areas.
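The tabulated areas can be reproduced numerically. The sketch below is an illustration and not part of the original module: it assumes the standard identity that the cumulative area to the left of z is 0.5(1 + erf(z/√2)), and the function names `cumulative_area` and `area_between` are introduced here.

```python
import math


def cumulative_area(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def area_between(z1, z2):
    """Area under the standard normal curve between z1 and z2."""
    low, high = sorted((z1, z2))
    return cumulative_area(high) - cumulative_area(low)


# Same side of Z = 0: equivalent to subtracting the two table areas.
print(round(area_between(2.00, 2.47), 4))

# Opposite sides of Z = 0: equivalent to adding the two table areas.
print(round(area_between(-1.37, 1.68), 4))
```

Taking the difference of cumulative areas covers both NOTE rules at once, so no separate check of the signs of the Z values is needed. Results may differ from table-based answers in the fourth decimal place because table entries are rounded.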
NORMAL PROBABILITIES
$P(x_1 < x < x_2) = P(z_1 < z < z_2)$ = area under the standard normal curve between the
ordinates at $z_1$ and $z_2$.
Example
Find the probability for each of the following
(a) P(0<z<2.32)
(b) P(z<1.65)
(c) P(z>1.91)
Solutions
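(a) P(0 < z < 2.32) = 0.4898
(b) P(z < 1.65) = 0.5000 + 0.4505 = 0.9505
(c) P(z > 1.91) = 0.5000 - 0.4719 = 0.0281
(a) Z = 0.4
P(z < 0.4)
(b) P(z > 1.26)
P(z > 1.26) = area to the right of z = 1.26
= 0.5000 - 0.3962
= 0.1038
(c) Z1 = -2 and Z2 = 1
P(-2 < z < 1) = (area between z = -2 and z = 0) + (area between z = 0 and z = 1)
= 0.4772 + 0.3413
= 0.8185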
(b)
X: 6, 12, 15
P(x):
(c)
X: 3, 6, 8
P(x): 0.3, 0.6, 0.7
(d)
X: 5, 10, 15
P(x): 1.2, 0.3, 0.5
5. A die is tossed 100 times. Find the mean, variance and standard deviation of the
number of 2s that will be rolled.
6. The average number of phone inquiries per day at the poison control centre is four. Find
the probability that it will receive five calls on a given day. Use the Poisson approximation.
7. Find the probabilities for each, using the standard normal distribution.
(a) P(0<z<1.69)
(b) P(-1.57<z<0)
(c) P(1.32<z<1.51)
(d) P(z>2.59)
(e) P(z<-1.77)
(f) P(-0.05<z<1.10)
OBJECTIVES
By the end of this topic, you should be able to:
(a) State the four basic sampling techniques.
(b) Define sampling distribution.
(c) Calculate the standard error.
(d) Use the central limit theorem to solve problems involving sample means for larger
samples.
SAMPLING METHODS
To obtain samples that are unbiased, i.e. samples that give each subject in the population an equally likely
chance of being selected, statisticians use four basic methods of sampling: random,
systematic, stratified and cluster sampling.
2. The standard deviation of the sample means will be smaller than the standard
deviation of the population. This is so because the extreme values of $\bar{x}$ must be
less extreme than the extreme values of x. The standard deviation of $\bar{x}$ depends
upon the population standard deviation $\sigma$ and the sample size n.
This standard deviation is called the standard error of the mean (SE), i.e. $SE = \frac{\sigma}{\sqrt{n}}$
Example
What is the standard error for a sample of 100 with a standard deviation of 5? If you
increase the sample size to 200, what change in the standard error do you notice?
Solution
$$SE = \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{100}} = 0.5$$
$$SE = \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{200}} = 0.35$$
Observation: The larger the sample the smaller the standard error.
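The observation can be illustrated with a short sketch that evaluates the standard error for several sample sizes. The function name `standard_error` and the extra sample size of 400 are illustrative assumptions.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean for standard deviation sd and sample size n."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sd / math.sqrt(n)


# The standard error shrinks as the sample size grows
for n in (100, 200, 400):
    print(n, round(standard_error(5, n), 2))
```

Because n sits under a square root, the sample size has to be quadrupled to halve the standard error.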
THE CENTRAL LIMIT THEOREM
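It states that, for a sufficiently large sample size n, the sample mean $\bar{x}$ of n observations is approximately normally distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, so that
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
NOTE: $\bar{x}$ is the sample mean and the denominator is the standard error of the mean.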
Example
The final examination scores for MAT at NRC are normally distributed with mean 60
and standard deviation 10. A Lecturer teaches one of the sets of MAT and his class has
22 students. What is the probability that the average final examination score for his
class is below 55?
Solution
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{55 - 60}{10/\sqrt{22}} = -2.35$$
$$P(\bar{x} < 55) = P(z < -2.35)$$
= 0.5000 - 0.4906 (from normal tables)
= 0.0094
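The examination score calculation can be checked with the sketch below. It is only an illustration: the cumulative area comes from `math.erf` rather than the normal table, and z is not rounded to two decimals, so the probability differs slightly from the table-based 0.0094. The function names are hypothetical.

```python
import math


def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_sample_mean_below(x_bar, mu, sigma, n):
    """Return (z, P(sample mean < x_bar)) using the standard error sigma / sqrt(n)."""
    z = (x_bar - mu) / (sigma / math.sqrt(n))
    return z, phi(z)


z, p = prob_sample_mean_below(55, 60, 10, 22)
print(round(z, 2), round(p, 4))
```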
(4) In a sample of 400 people, 175 were males. Find the standard error of the sample
proportion.
(5) The average age of lawyers is 43.6 years, with a standard deviation of 5.1 years. If
a law firm employs 50 lawyers, find the probability that the average age of the
group is greater than 44.2 years.
(d) Tests of hypothesis using the normal distribution.
(e) Use the one-way ANOVA technique to determine if there is a significant difference among three or more
means.
A hypothesis is a statement about the value of a population parameter. An example of a hypothesis
is: "Fifty percent of eligible voters are below the age of twenty-five."
A statistical hypothesis is a conjecture about a population parameter. This conjecture may or
may not be true.
There are two types of statistical hypotheses for each situation: the null hypothesis and the
alternative hypothesis.
NULL HYPOTHESIS
It is designated H0 and read "H sub-zero".
It is a statistical hypothesis that states that there is no difference between a parameter and a
specific value.
H stands for hypothesis and the subscript zero stands for no difference.
H0: $\mu = \mu_0$, where $\mu_0$ is the value of the population mean given in the problem. It states that the
sample belongs to the population.
ALTERNATIVE HYPOTHESIS
It is symbolized by H1.
It is a statistical hypothesis that states the existence of a difference between a parameter and
a specific value.
The alternative hypothesis depends on the wording of the problem. The wording can
suggest one of three possible meanings:
(a) The sample comes from a population whose mean $\mu$ is not equal to $\mu_0$. In
other words, it may be smaller or larger. Then you take H1: $\mu \ne \mu_0$.
For this alternative, you divide the critical region into two equal parts and put one in
each tail of the distribution.
(b) The sample comes from a population with $\mu$ larger than $\mu_0$, so that H1: $\mu > \mu_0$, and you put
the whole of the critical region in the right-hand tail of the distribution.
(c) The sample comes from a population with $\mu$ smaller than $\mu_0$, so that H1: $\mu < \mu_0$, and you put
the whole of the critical region in the left-hand tail of the distribution.
Example
State the null and alternate hypothesis for each conjecture.
(a) A researcher thinks that if the expectant mothers use vitamin pills, the birth weight
of the babies will increase. The average birth weight of the population is 8.6 kg
(b) An engineer hypothesizes that the mean number of defects can be decreased in a
manufacturing process of compact disks by using robots instead of humans for
certain tasks. The mean number of defective disks per 1000 is 18.
(c) A psychologist feels that playing soft music during a test will change the result of
the test. The psychologist is not sure whether the grades will be higher or lower. In
the past, the mean of the score was 73.
Solutions
(a) H0: $\mu \le 8.6$
H1: $\mu > 8.6$
(b) H0: $\mu \ge 18$
H1: $\mu < 18$
(c) H0: $\mu = 73$
H1: $\mu \ne 73$
POSSIBLE OUTCOMES OF A HYPOTHESIS TEST
There are two possibilities for a correct decision and two possibilities for an incorrect decision.
There are two types of errors, namely type I and type II.
A type I error occurs if one rejects the null hypothesis when it is true. A type II error occurs
if one does not reject the null hypothesis when it is false.
                    H0 true             H0 false
Reject H0           Type I error        Correct decision
Do not reject H0    Correct decision    Type II error
TESTS OF HYPOTHESIS USING THE NORMAL DISTRIBUTION
In hypothesis testing, the following steps are recommended:
1. State the hypotheses. Be sure to state both the null and the alternative hypothesis.
2. State the level of significance.
The level of significance is the maximum probability of committing a type I error.
The critical region is the range of values of the test value that indicates that there
is a significant difference and that the null hypothesis should be rejected.
The shaded part is called the critical region and the unshaded area is called the noncritical or
acceptance region.
3. Standardize x, i.e. compute $Z = \frac{X - \mu}{\sigma}$
Z = -1.65
4. Compare the standardized x to the critical values for the significance level
-1.65 is between -1.96 and 1.96
5. Make the decision
Accept H0, i.e. yes, it belongs to the same variety.
Example
Over a long period, the weights of pots of jam made by a standard process have been
normally distributed with the mean 345g and standard deviation 2.8g. A pot produced just
before the process closed for the day weighs 338.5g. Is the process working correctly?
Solution
1. Formulate the hypotheses
H0: $\mu = 345$ (yes, it is working correctly)
H1: $\mu \ne 345$
2. Decision rule: 5% level of significance (two-tailed critical values -1.96 and 1.96)
3. Standardize x
$$Z = \frac{X - \mu}{\sigma} = \frac{338.5 - 345}{2.8} = -2.32$$
4. Compare the standardized x to the critical values
-2.32 < -1.96 (outside the acceptance region)
5. Make the decision
Reject H0. In other words, it is not working correctly.
Example
A researcher reports that the average salary of assistant professors is more than K42,000.
A sample of 30 assistant professors has a mean salary of K43,260. At
$\alpha = 0.05$, test the claim that assistant professors earn more than K42,000 a month. The
standard deviation of the population is K5,230.
Solution
1. Formulate the hypotheses
H0: $\mu \le$ K42,000
H1: $\mu >$ K42,000
2. Decision rule: 5% level of significance (one-tailed critical value 1.65)
3. Standardize $\bar{x}$
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{43260 - 42000}{5230/\sqrt{30}} = 1.32$$
4. Compare the standardized $\bar{x}$ to the critical value
1.32 < 1.65 (it is in the acceptance region)
5. Make the decision
Do not reject H0. In other words, there is not enough evidence to support the claim that assistant professors earn more than K42,000.
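The five steps used in the jam-pot and salary examples can be sketched as a small helper. This is an illustration under stated assumptions rather than a prescribed procedure: the critical values 1.96 and 1.65 are the 5% values used in the text, and the function names and the `tail` argument are hypothetical.

```python
import math


def z_statistic(sample_mean, mu0, sigma, n=1):
    """Standardize a sample mean (or a single value when n = 1)."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))


def reject_h0(z, critical, tail):
    """Decide whether z falls in the critical region.

    tail is 'two', 'left' or 'right'; critical is the positive table value.
    """
    if tail == "two":
        return abs(z) > critical
    if tail == "right":
        return z > critical
    if tail == "left":
        return z < -critical
    raise ValueError("tail must be 'two', 'left' or 'right'")


# Jam pots: single observation, two-tailed test at the 5% level
z_jam = z_statistic(338.5, 345, 2.8)
print(round(z_jam, 2), reject_h0(z_jam, 1.96, "two"))

# Salaries: sample of 30, right-tailed test at the 5% level
z_salary = z_statistic(43260, 42000, 5230, n=30)
print(round(z_salary, 2), reject_h0(z_salary, 1.65, "right"))
```

In the first case the statistic falls in the critical region, so H0 is rejected; in the second it does not, so H0 is not rejected.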
For comparison of two variances or standard deviations, an F test is used.
$$F = \frac{s_1^2}{s_2^2}$$
[Figure: locating a critical value (2.18 at $\alpha = 0.05$) in the F table from d.f.N and d.f.D]
When there is no difference in the means, the between-group variance estimate will be
equal to the within-group variance and the F test value will be approximately equal to
1. The null hypothesis will not be rejected.
For a test of the difference among three or more means, the following hypothesis should
be used.
H0: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
H1: At least one mean is different from the others.
The degrees of freedom for this F test are
d.f.N = k-1, where k is the number of groups
d.f.D = N-k, where N is the sum of the sample sizes of the groups.
The F-test to compare means is always right-tailed.
STEPS TO BE FOLLOWED
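1. Find the mean and variance of each sample: $(\bar{x}_1, s_1^2), (\bar{x}_2, s_2^2), \dots, (\bar{x}_k, s_k^2)$
2. Find the grand mean.
$$\bar{x}_{GM} = \frac{\sum x}{N}$$
3. Find the between-group variance.
$$s_B^2 = \frac{\sum n_i (\bar{x}_i - \bar{x}_{GM})^2}{k - 1}$$
4. Find the within-group variance.
$$s_W^2 = \frac{\sum (n_i - 1) s_i^2}{\sum (n_i - 1)}$$
5. Find the F test value.
$$F = \frac{s_B^2}{s_W^2}$$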
Example
A researcher wishes to try three different techniques to lower the blood pressure of
individuals diagnosed with high blood pressure. The subjects are randomly assigned to
three groups; the first group takes medication, the second group exercises and the third
group follows a special diet. After four weeks, the reduction in each person's blood
pressure is recorded. At $\alpha = 0.05$, test the claim that there is no difference among the
means. The data are shown below.
Medication          Exercise            Diet
10                  6                   5
12                  8                   9
9                   3                   12
15                  0                   8
13                  2                   4
$\bar{x}_1 = 11.8$          $\bar{x}_2 = 3.8$           $\bar{x}_3 = 7.6$
$s_1^2 = 5.7$           $s_2^2 = 10.2$          $s_3^2 = 10.3$
Solution
1. Formulate the hypotheses
H0: $\mu_1 = \mu_2 = \mu_3$
H1: At least one mean is different from the others.
2. Find the critical value
k = 3 and N = 15
d.f.N = 3-1 = 2
d.f.D = N-k = 15-3 = 12
The critical value is 3.89 (from the F distribution table)
3. (a) The mean and variance of each sample are shown in the table.
(b) Find the grand mean
$$\bar{x}_{GM} = \frac{\sum x}{N} = \frac{116}{15} = 7.73$$
(c) Find the between-group variance
$$s_B^2 = \frac{\sum n_i (\bar{x}_i - \bar{x}_{GM})^2}{k - 1} = \frac{5(11.8 - 7.73)^2 + 5(3.8 - 7.73)^2 + 5(7.6 - 7.73)^2}{3 - 1} = 80.07$$
(d) Find the within-group variance
$$s_W^2 = \frac{\sum (n_i - 1) s_i^2}{\sum (n_i - 1)} = \frac{(5-1)(5.7) + (5-1)(10.2) + (5-1)(10.3)}{(5-1) + (5-1) + (5-1)} = 8.73$$
(e) Find the F test value
$$F = \frac{s_B^2}{s_W^2} = \frac{80.07}{8.73} = 9.17$$
4. Since 9.17 > 3.89, reject the null hypothesis.
5. There is enough evidence to reject the claim and conclude that
at least one mean is different from the others.
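The one-way ANOVA steps for the blood pressure example can be reproduced with the sketch below. It follows the between-group and within-group formulas given in the steps and uses Python's `statistics` module for the sample means and variances. The function name `one_way_anova_f` is an illustrative assumption, and the critical value still has to be read from the F table.

```python
from statistics import mean, variance


def one_way_anova_f(groups):
    """Return (between-group variance, within-group variance, F) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group variance: spread of the group means around the grand mean
    s_b2 = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)
    # Within-group variance: pooled sample variances (variance() divides by n - 1)
    s_w2 = sum((len(g) - 1) * variance(g) for g in groups) / (n_total - k)
    return s_b2, s_w2, s_b2 / s_w2


medication = [10, 12, 9, 15, 13]
exercise = [6, 8, 3, 0, 2]
diet = [5, 9, 12, 8, 4]

s_b2, s_w2, f_value = one_way_anova_f([medication, exercise, diet])
print(round(s_b2, 2), round(s_w2, 2), round(f_value, 2))
```

Comparing the printed F value with the critical value 3.89 for d.f.N = 2 and d.f.D = 12 gives the decision in step 4.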
Example
A state employee wishes to see if there is a significant difference in the number of
employees at the interchanges of three state toll roads. The data are shown. At
$\alpha = 0.05$, can it be concluded that there is a significant difference in the average number
of employees at each interchange?
Pennsylvania Turnpike    Green Bypass / Mon-Fayette Expressway    Beaker Valley Expressway
7                        10                                       1
14                       1                                        12
32                       1                                        1
19                       0                                        9
10                       11                                       1
11                       1                                        11
$\bar{x}_1 = 15.5$               $\bar{x}_2 = 4.0$                                $\bar{x}_3 = 5.8$
$s_1^2 = 81.9$               $s_2^2 = 25.6$                               $s_3^2 = 29.0$
Solution
1. State the hypotheses
H0: $\mu_1 = \mu_2 = \mu_3$
H1: At least one mean is different from the others.
2. Find the critical value
k = 3 and N = 18
d.f.N = 3-1 = 2
d.f.D = N-k = 18-3 = 15
The critical value is 3.68 (from the F distribution table)
3. (a) The mean and variance of each sample are shown in the table.
(b) Find the grand mean
$$\bar{x}_{GM} = \frac{\sum x}{N} = \frac{152}{18} = 8.4$$
(c) Find the between-group variance
$$s_B^2 = \frac{\sum n_i (\bar{x}_i - \bar{x}_{GM})^2}{k - 1} = \frac{6(15.5 - 8.4)^2 + 6(4.0 - 8.4)^2 + 6(5.8 - 8.4)^2}{3 - 1} = 229.59$$
(d) Find the within-group variance
$$s_W^2 = \frac{\sum (n_i - 1) s_i^2}{\sum (n_i - 1)} = \frac{(6-1)(81.9) + (6-1)(25.6) + (6-1)(29.0)}{(6-1) + (6-1) + (6-1)} = 45.5$$
(e) Find the F test value
$$F = \frac{s_B^2}{s_W^2} = \frac{229.59}{45.5} = 5.05$$
4. Make the decision: since 5.05 > 3.68, reject the null hypothesis.
5. Summarize the results: There is evidence that there is a difference among the means.
4260
3500
2300
2000
1850
5238
4626
4347
3300
6529
4543
3668
3379
2874
7. The numbers (in thousands) of farms per state found in three regions of Malawi are
listed below. Test the claim at $\alpha = 0.05$ that the mean number of farms is the
same across these regions.
Northern Malawi Central Malawi Southern Malawi
48
95
29
57
52
40
24
64
40
10
64
68
38
TOPIC 7: TEST OF PROPORTIONS (CHI SQUARE TESTING)
OBJECTIVES
By the end of this topic, you should be able to;
(a) Use chi-squared tables
(b) Calculate the chi-squared statistic of a sample.
(c) State five steps in testing hypothesis using chi-squared test.
(d) Test proportions for homogeneity using chi-square.
This section explains how to conduct a chi-square test of homogeneity. The test is applied to
a single categorical variable from two or more different populations. It is used to determine whether
frequency counts are distributed identically across the populations.
The hypotheses in this case would be
H0: $p_1 = p_2 = p_3 = \dots = p_n$
H1: At least one proportion is different from the others
Steps to be followed when using the chi-square test are as follows:
1. Formulate the hypotheses.
2. Construct a contingency table.
It is made up of R rows and C columns. It should be noted that row and column
headings do not count in determining the number of rows and columns.
3. Determine the appropriate number of degrees of freedom (d.f.).
The degrees of freedom of any contingency table are (rows - 1) times (columns - 1);
that is, d.f. = (R-1)(C-1)
4. Compute the test value. To compute the test value, first find the expected values. For each cell of the
contingency table, use the formula
$$E = \frac{\text{row sum} \times \text{column sum}}{\text{grand total}}$$
To find the test value, use the formula
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Where $\chi^2$ = chi-square
O = Observed frequency/value
E = Expected frequency/value
5. Make the decision
6. Summarize the results
ASSUMPTIONS FOR THE CHI-SQUARE HOMOGENEITY TESTS
a. The data are obtained from a random sample
b. The expected value in each cell must be 5 or more.
Example
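A researcher selected a sample of 50 seniors from each of three area high schools (150 seniors in total) and
asked each senior, "Do you drive to school in a car owned by either you or your parents?"
The data are shown in the table. At $\alpha = 0.05$, test the claim that the proportion of students
who drive their own or their parents' cars is the same at all three schools.
         School 1    School 2    School 3    Total
Yes      18          22          16          56
No       32          28          34          94
Total    50          50          50          150
Solution
Step 1 State the hypotheses
H0: $p_1 = p_2 = p_3$
H1: At least one proportion is different from the others
Expected values, using E = (row sum × column sum) / grand total:
$E_{1,1} = 18.67$, $E_{1,2} = 18.67$, $E_{1,3} = 18.67$
$E_{2,1} = 31.33$, $E_{2,2} = 31.33$, $E_{2,3} = 31.33$
But $\chi^2 = \sum \frac{(O - E)^2}{E}$
$$= \frac{(18 - 18.67)^2}{18.67} + \frac{(22 - 18.67)^2}{18.67} + \frac{(16 - 18.67)^2}{18.67} + \frac{(32 - 31.33)^2}{31.33} + \frac{(28 - 31.33)^2}{31.33} + \frac{(34 - 31.33)^2}{31.33}$$
= 1.596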
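The expected values and the chi-square test value for the three-school example can be checked with the sketch below. It applies the expected-value and test-value formulas from the steps to a table of observed counts. The function name `chi_square_homogeneity` is hypothetical, and the critical value (5.991 for 2 degrees of freedom at the 0.05 level, from the chi-square table) is still looked up by hand.

```python
def chi_square_homogeneity(observed):
    """Return (chi-square test value, degrees of freedom) for a table of observed counts."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_sums)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected value = (row sum x column sum) / grand total
            e = row_sums[i] * col_sums[j] / grand_total
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Rows: Yes / No; columns: School 1, School 2, School 3
schools = [[18, 22, 16],
           [32, 28, 34]]
chi2, df = chi_square_homogeneity(schools)
print(round(chi2, 3), df)
```

A test value below the critical value means the null hypothesis of equal proportions is not rejected.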
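Expected value, E = (row sum × column sum) / grand total
$E_{1,1} = 15.5$, $E_{1,2} = 15.5$, $E_{1,3} = 15.5$, $E_{1,4} = 15.5$
$E_{2,1} = 14.5$, $E_{2,2} = 14.5$, $E_{2,3} = 14.5$, $E_{2,4} = 14.5$
But $\chi^2 = \sum \frac{(O - E)^2}{E}$
$$= \frac{(15 - 15.5)^2}{15.5} + \frac{(18 - 15.5)^2}{15.5} + \frac{(13 - 15.5)^2}{15.5} + \frac{(16 - 15.5)^2}{15.5} + \frac{(15 - 14.5)^2}{14.5} + \frac{(12 - 14.5)^2}{14.5} + \frac{(17 - 14.5)^2}{14.5} + \frac{(14 - 14.5)^2}{14.5}$$
= 1.735
Step 5 Make the decision
The decision is not to reject the null hypothesis.
Step 6 Summarize the results.
There is not enough evidence to reject the claim that the proportions are equal.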
2
1
2
31
16
131
216
66 57 113
Category
Population
1 2
3 4
40 17 3
35 22 8
4. Test the hypothesis that the proportion is the same for all three age groups.
Age groups
            25 and under    Over 25 and under 50    50 and over    Total
Claim       40              35                      60             135
No claim    60              65                      40             165
Total       100             100                     100            300
5. Test the hypothesis that the proportions of individuals in categories 1, 2, 3 and 4 are the same
in populations 1 and 2.
Category
Population 1
2
3
4
40 17
3
35 22
8
SCATTER DIAGRAMS
Consider the following set of pairs of values
x    1    2      3       4      5      6
y    2    3.5    3.75    4.0    4.5    5.5
These (x, y) pairs of values form an example of a bivariate distribution. When they are
plotted on graph paper, the result is called a scatter diagram, scattergram
or scatter plot.
A scatter diagram is a visual way to describe the nature of the relationship between the independent
and dependent variables.
Example
The marks of ten candidates in each of two examinations are given below.
Examination 1    8     10    18    23    29    32    35    38    42    48
Examination 2    10    12    20    20    25    30    29    31    36    35
Plot this information on a scatter diagram.
Solution
[Figure: scatter diagram of the Examination 2 marks against the Examination 1 marks]
CORRELATION
Correlation is a statistical method used to determine whether a relationship between variables
exists. There are two types of correlation, namely (a) linear correlation and (b) nonlinear
correlation.
LINEAR CORRELATION
The correlation is said to be linear when the relationship between the two variables is linear.
In other words, a straight line can represent the points. There are two types of linear
correlation, namely positive and negative correlation.
POSITIVE LINEAR CORRELATION
The points have the appearance of clustering about a line that slopes up to the right.
NEGATIVE LINEAR CORRELATION
The points have the appearance of clustering about a line that slopes down to the right.
NON-LINEAR CORRELATION
Here a straight line cannot represent the points.
REGRESSION
Once a scatter diagram has been produced, the next problem is to determine the line to
which the points approximate. In order to determine the relationship between x and y, we
need to know what straight line to draw through the collection of points on the scatter
diagram. It will not go through all the points but will lie somewhat in the midst of the
collection of points, and it will slope in the direction suggested by the points. Such a line is
called a regression line or line of best fit.
There are two methods for drawing a regression line: a graphical method and a
mathematical method.
(a) GRAPHICAL METHOD
The line is drawn through the point whose coordinates are the means of the x and y values. This point is called the mean
value point, or mean center.
STEPS TO BE FOLLOWED WHEN USING GRAPHICAL METHOD
(b) MATHEMATICAL METHOD
The equation of a straight line is usually given as y = mx + c, where m is the gradient and
c is the y-intercept. In statistics, this equation can also be written as y = a + bx,
where a is the y-intercept and b the slope/gradient, and it is called the equation of the regression
line.
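If the equation of a regression line is y = mx + c, then
$$m = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$$
$$c = \bar{y} - m\bar{x} \quad \text{or} \quad c = \frac{\sum y}{n} - m\frac{\sum x}{n}$$
$$\bar{x} = \frac{\sum x}{n} = \frac{578}{10} = 57.8$$
$$\bar{y} = \frac{\sum y}{n} = \frac{470}{10} = 47$$
Therefore, the mean center is (57.8, 47)
$$m = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = 0.326$$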
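The formulas for m and c can be applied directly in code. The sketch below is an illustration only: it uses the six (x, y) pairs from the scatter diagram section, because the data behind the worked figures above are not reproduced here, and the function name `least_squares_line` is hypothetical.

```python
def least_squares_line(xs, ys):
    """Return the gradient m and intercept c of the regression line y = mx + c."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    c = sum_y / n - m * sum_x / n
    return m, c


# (x, y) pairs from the scatter diagram section
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 3.5, 3.75, 4.0, 4.5, 5.5]
m, c = least_squares_line(xs, ys)
print(round(m, 3), round(c, 3))

# The fitted line passes through the mean center (x-bar, y-bar)
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
print(abs((m * x_bar + c) - y_bar) < 1e-9)
```

The final check shows the property used by the graphical method: the line of best fit passes through the point whose coordinates are the means of x and y.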
THE CORRELATION COEFFICIENT
To measure the strength, or intensity, of the correlation in a particular case, we
calculate a linear correlation coefficient, which we indicate by the small letter r. The
formula for the linear correlation coefficient r is
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$
1. Make a table with columns for x, y, $x^2$, $y^2$ and xy.
2. Find the values of $x^2$, $y^2$ and xy and place these values in the corresponding
columns of the table.
3. Substitute in the formula and solve for r.
Example
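Suppose we are given the following pairs of x and y
x    10    14    7    12    5    6
y    5     3     5    2     7    8
Calculate r.
Solution
x      y      $x^2$     $y^2$     xy
10     5      100    25     50
14     3      196    9      42
7      5      49     25     35
12     2      144    4      24
5      7      25     49     35
6      8      36     64     48
$\sum x = 54$    $\sum y = 30$    $\sum x^2 = 550$    $\sum y^2 = 176$    $\sum xy = 234$
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$
$$r = \frac{6(234) - (54)(30)}{\sqrt{\left[6(550) - 54^2\right]\left[6(176) - 30^2\right]}}$$
$$r = \frac{1404 - 1620}{\sqrt{(3300 - 2916)(1056 - 900)}}$$
$$r = \frac{-216}{\sqrt{59904}} = \frac{-216}{244.75}$$
r = -0.88 to 2 d.p.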
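The tabular calculation of r can be reproduced with a short sketch that builds the same five sums. The function name `pearson_r` is an illustrative choice.

```python
import math


def pearson_r(xs, ys):
    """Linear correlation coefficient r from the sums of x, y, x^2, y^2 and xy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


xs = [10, 14, 7, 12, 5, 6]
ys = [5, 3, 5, 2, 7, 8]
print(round(pearson_r(xs, ys), 2))
```

The negative sign indicates that y tends to decrease as x increases.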
Example
Compute r for the data obtained in a study of age and systolic blood pressure of six
randomly selected subjects. The data are shown in the table below.
Subject    Age x    Pressure y
A          43       128
B          48       120
C          56       135
D          61       143
E          67       141
F          70       152
Solution
x      y      $x^2$      $y^2$       xy
43     128    1849    16384    5504
48     120    2304    14400    5760
56     135    3136    18225    7560
61     143    3721    20449    8723
67     141    4489    19881    9447
70     152    4900    23104    10640
$\sum x = 345$    $\sum y = 819$    $\sum x^2 = 20399$    $\sum y^2 = 112443$    $\sum xy = 47634$
Here n = 6 and
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$
$$r = \frac{6(47634) - (345)(819)}{\sqrt{\left[6(20399) - 345^2\right]\left[6(112443) - 819^2\right]}}$$
r = 0.897
CHARACTERISTICS OF CORRELATION COEFFICIENT
a. The correlation coefficient is always between -1 and +1 inclusive
b. A correlation coefficient of -1.0 occurs when there is perfect negative correlation
i.e. all the points lie exactly on a straight line sloping down from left to right.
c. A correlation of 0 occurs when there is no correlation.
d. A correlation of 1.0 occurs when there is a perfect positive correlation i.e. all the points
lie exactly on a straight line sloping upwards from left to right.
e. A correlation of between 0 and +1 or 0 and -1.0 indicates that the variables are partially
correlated.
CORRELATION ANALYSIS
Correlation analysis is a method used to decide whether there is a statistically significant
relationship between two variables.
In correlation analysis, you perform the following steps:
1. Draw the scatter plot for the variables.
2. Compute the value of the correlation coefficient.
3. State the hypotheses.
The hypotheses will be H0: $\rho = 0$, which means there is no correlation, and
H1: $\rho \ne 0$, which means that there is a significant correlation
between the variables in the population.
Note: $\rho$ is called the population correlation coefficient.
4. Test the significance of the correlation at the given level of significance.
(a) The formula for the t test for the correlation coefficient is
$$t = r\sqrt{\frac{n - 2}{1 - r^2}}$$
with degrees of freedom equal to n-2.
(b) The two-tailed critical values are used. These values are found in the
t distribution tables.
5. Give a brief explanation of the type of relationship.
Example
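A researcher wishes to determine if a person's age is related to the number of hours he
or she exercises per week. The data for the sample are shown here.
Age x      18    26    32    38    52     59
Hours y    10    5     2     3     1.5    1
(b)
x      y      $x^2$      $y^2$      xy
18     10     324     100     180
26     5      676     25      130
32     2      1024    4       64
38     3      1444    9       114
52     1.5    2704    2.25    78
59     1      3481    1       59
$\sum x = 225$    $\sum y = 22.5$    $\sum x^2 = 9653$    $\sum y^2 = 141.25$    $\sum xy = 625$
n = 6
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} = \frac{6(625) - (225)(22.5)}{\sqrt{\left[6(9653) - 225^2\right]\left[6(141.25) - 22.5^2\right]}} = -0.832$$
(c) H0: $\rho = 0$
H1: $\rho \ne 0$
(d)
$$t = r\sqrt{\frac{n - 2}{1 - r^2}} = -0.832\sqrt{\frac{6 - 2}{1 - (-0.832)^2}} = -2.999$$
Since there are 6 - 2 = 4 degrees of freedom, the critical values are -2.78 and 2.78. Because -2.999 < -2.78, the null hypothesis is rejected.
(e) There is a significant relationship between a person's age and the number of hours he or she
exercises.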
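The significance test for the correlation coefficient in the age and exercise example can be sketched as follows. The value r = -0.832 and the critical value 2.78 (4 degrees of freedom, two-tailed, 0.05 level) are taken from the worked example; the function name `correlation_t` is an assumption made for illustration.

```python
import math


def correlation_t(r, n):
    """t test value for a sample correlation coefficient r based on n pairs."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt((n - 2) / (1 - r * r))


t_value = correlation_t(-0.832, 6)
critical = 2.78  # two-tailed t table value for 4 degrees of freedom at the 0.05 level
print(round(t_value, 3), abs(t_value) > critical)
```

Because the test is two-tailed, the absolute value of t is compared with the critical value.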
39
0
43
Cholesterol
45 80 50 55
y
(c) Draw the scatter plot for the variables.
(d) Compute the value of the correlation coefficient.
(e) State the hypotheses.
(f) Test the significance of the correlation at $\alpha = 0.05$
(g) Give a brief explanation of the type of relationship.
52
60
z   0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
F DISTRIBUTION
F Table for alpha=.05 .
df
2/
df
1
10
12
15
20
24
30
40
60
12
0
IN
F
16 19 21 22 23 23 23 23 24 24 24 24 24 24 25 25 25 25 25
1.4 9.5 5.7 4.5 0.1 3.9 6.7 8.8 0.5 1.8 3.9 5.9 8.0 9.0 0.0 1.1 2.1 3.2 4.3
47 00 07 83 61 86 68 82 43 81 06 49 13 51 95 43 95 52 14
6
0
3
2
9
0
4
7
3
7
0
9
1
8
1
2
7
9
4
18. 19. 19. 19. 19. 19. 19. 19. 19. 19. 19. 19. 19. 19. 19. 19. 19. 19.
51 00 16 24 29 32 35 37 38 39 41 42 44 45 46 47 47 48
28 00 43 68 64 95 32 10 48 59 25 91 58 41 24 07 91 74
10. 9.5 9.2 9.1 9.0 8.9 8.8 8.8 8.8 8.7 8.7 8.7 8.6 8.6 8.6 8.5 8.5 8.5 8.5
12 52 76 17 13 40 86 45 12 85 44 02 60 38 16 94 72 49 26
80
1
6
2
5
6
7
2
3
5
6
9
2
5
6
4
0
4
4
7.7 6.9 6.5 6.3 6.2 6.1 6.0 6.0 5.9 5.9 5.9 5.8 5.8 5.7 5.7 5.7 5.6 5.6
08 44 91 88 56 63 94 41 98 64 11 57 02 74 45 17 87 58
6
3
4
2
1
1
2
0
8
4
7
8
5
4
9
0
7
1
6.6 5.7 5.4 5.1 5.0 4.9 4.8 4.8 4.7 4.7 4.6 4.6 4.5 4.5 4.4 4.4 4.4 4.3 4.3
07 86 09 92 50 50 75 18 72 35 77 18 58 27 95 63 31 98 65
9
1
5
2
3
3
9
3
5
1
7
8
1
2
7
8
4
5
0
5.9 5.1 4.7 4.5 4.3 4.2 4.2 4.1 4.0 4.0 3.9 3.9 3.8 3.8 3.8 3.7 3.7 3.7 3.6
87 43 57 33 87 83 06 46 99 60 99 38 74 41 08 74 39 04 68
4
3
1
7
4
9
7
8
0
0
9
1
2
5
2
3
8
7
9
5.5 4.7 4.3 4.1 3.9 3.8 3.7 3.7 3.6 3.6 3.5 3.5 3.4 3.4 3.3 3.3 3.3 3.2 3.2
91 37 46 20 71 66 87 25 76 36 74 10 44 10 75 40 04 67 29
4
4
8
3
5
0
0
7
7
5
7
7
5
5
8
4
3
4
8
5.3 4.4 4.0 3.8 3.6 3.5 3.5 3.4 3.3 3.3 3.2 3.2 3.1 3.1 3.0 3.0 3.0 2.9 2.9
17 59 66 37 87 80 00 38 88 47 83 18 50 15 79 42 05 66 27
7
0
2
9
5
6
5
1
1
2
9
4
3
2
4
8
3
9
6
Page of 91
84
19.
49
57
5.6
28
1
Page 85 of 91
5.1 4.2 3.8 3.6 3.4 3.3 3.2 3.2 3.1 3.1 3.0 3.0 2.9 2.9 2.8 2.8 2.7 2.7 2.7
17 56 62 33 81 73 92 29 78 37 72 06 36 00 63 25 87 47 06
4
5
5
1
7
8
7
6
9
3
9
1
5
5
7
9
2
5
7
10
4.9 4.1 3.7 3.4 3.3 3.2 3.1 3.0 3.0 2.9 2.9 2.8 2.7 2.7 2.6 2.6 2.6 2.5 2.5
64 02 08 78 25 17 35 71 20 78 13 45 74 37 99 60 21 80 37
6
8
3
0
8
2
5
7
4
2
0
0
0
2
6
9
1
1
9
11
4.8 3.9 3.5 3.3 3.2 3.0 3.0 2.9 2.8 2.8 2.7 2.7 2.6 2.6 2.5 2.5 2.4 2.4 2.4
44 82 87 56 03 94 12 48 96 53 87 18 46 09 70 30 90 48 04
3
3
4
7
9
6
3
0
2
6
6
6
4
0
5
9
1
0
5
12
4.7 3.8 3.4 3.2 3.1 2.9 2.9 2.8 2.7 2.7 2.6 2.6 2.5 2.5 2.4 2.4 2.3 2.3 2.2
47 85 90 59 05 96 13 48 96 53 86 16 43 05 66 25 84 41 96
2
3
3
2
9
1
4
6
4
4
6
9
6
5
3
9
2
0
2
13
4.6 3.8 3.4 3.1 3.0 2.9 2.8 2.7 2.7 2.6 2.6 2.5 2.4 2.4 2.3 2.3 2.2 2.2 2.2
67 05 10 79 25 15 32 66 14 71 03 33 58 20 80 39 96 52 06
2
6
5
1
4
3
1
9
4
0
7
1
9
2
3
2
6
4
4
14
4.6 3.7 3.3 3.1 2.9 2.8 2.7 2.6 2.6 2.6 2.5 2.4 2.3 2.3 2.3 2.2 2.2 2.1
00 38 43 12 58 47 64 98 45 02 34 63 87 48 08 66 22 77
1
9
9
2
2
7
2
7
8
2
2
0
9
7
2
4
9
8
2.1
30
7
15
4.5 3.6 3.2 3.0 2.9 2.7 2.7 2.6 2.5 2.5 2.4 2.4 2.3 2.2 2.2 2.2 2.1 2.1
43 82 87 55 01 90 06 40 87 43 75 03 27 87 46 04 60 14
1
3
4
6
3
5
6
8
6
7
3
4
5
8
8
3
1
1
2.0
65
8
16
4.4 3.6 3.2 3.0 2.8 2.7 2.6 2.5 2.5 2.4 2.4 2.3 2.2 2.2 2.1 2.1 2.1 2.0 2.0
94 33 38 06 52 41 57 91 37 93 24 52 75 35 93 50 05 58 09
0
7
9
9
4
3
2
1
7
5
7
2
6
4
8
7
8
9
6
17
4.4 3.5 3.1 2.9 2.8 2.6 2.6 2.5 2.4 2.4 2.3 2.3 2.2 2.1 2.1 2.1 2.0 2.0
51 91 96 64 10 98 14 48 94 49 80 07 30 89 47 04 58 10
3
5
8
7
0
7
3
0
3
9
7
7
4
8
7
0
4
7
18
4.4 3.5 3.1 2.9 2.7 2.6 2.5 2.5 2.4 2.4 2.3 2.2 2.1 2.1 2.1 2.0 2.0 1.9 1.9
13 54 59 27 72 61 76 10 56 11 42 68 90 49 07 62 16 68 16
9
6
9
7
9
3
7
2
3
7
1
6
6
7
1
9
6
1
8
19
4.3 3.5 3.1 2.8 2.7 2.6 2.5 2.4 2.4 2.3 2.3 2.2 2.1 2.1 2.0 2.0 1.9 1.9
80 21 27 95 40 28 43 76 22 77 08 34 55 14 71 26 79 30
7
9
4
1
1
3
5
8
7
9
0
1
5
1
2
4
5
2
20
4.3 3.4 3.0 2.8 2.7 2.5 2.5 2.4 2.3 2.3 2.2 2.2 2.1 2.0 2.0 1.9 1.9 1.8 1.8
51 92 98 66 10 99 14 47 92 47 77 03 24 82 39 93 46 96 43
2
8
4
1
9
0
0
1
8
9
6
3
2
5
1
8
4
3
2
21
4.3 3.4 3.0 2.8 2.6 2.5 2.4 2.4 2.3 2.3 2.2 2.1 2.0 2.0 2.0 1.9 1.9 1.8 1.8
24 66 72 40 84 72 87 20 66 21 50 75 96 54 10 64 16 65 11
8
8
5
1
8
7
6
5
0
0
4
7
0
0
2
5
5
7
7
Page of 91
85
1.9
60
4
1.8
78
0
Page 86 of 91
22
4.3 3.4 3.0 2.8 2.6 2.5 2.4 2.3 2.3 2.2 2.2 2.1 2.0 2.0 1.9 1.9 1.8 1.8 1.7
00 43 49 16 61 49 63 96 41 96 25 50 70 28 84 38 89 38 83
9
4
1
7
3
1
8
5
9
7
8
8
7
3
2
0
4
0
1
23
4.2 3.4 3.0 2.7 2.6 2.5 2.4 2.3 2.3 2.2 2.2 2.1 2.0 2.0 1.9 1.9 1.8 1.8 1.7
79 22 28 95 40 27 42 74 20 74 03 28 47 05 60 13 64 12 57
3
1
0
5
0
7
2
8
1
7
6
2
6
0
5
9
8
8
0
24
4.2 3.4 3.0 2.7 2.6 2.5 2.4 2.3 2.3 2.2 2.1 2.1 2.0 1.9 1.9 1.8 1.8 1.7 1.7
59 02 08 76 20 08 22 55 00 54 83 07 26 83 39 92 42 89 33
7
8
8
3
7
2
6
1
2
7
4
7
7
8
0
0
4
6
0
25
4.2 3.3 2.9 2.7 2.6 2.4 2.4 2.3 2.2 2.2 2.1 2.0 2.0 1.9 1.9 1.8 1.8 1.7 1.7
41 85 91 58 03 90 04 37 82 36 64 88 07 64 19 71 21 68 11
7
2
2
7
0
4
7
1
1
5
9
9
5
3
2
8
7
4
0
26
4.2 3.3 2.9 2.7 2.5 2.4 2.3 2.3 2.2 2.2 2.1 2.0 1.9 1.9 1.9 1.8 1.8 1.7 1.6
25 69 75 42 86 74 88 20 65 19 47 71 89 46 01 53 02 48 90
2
0
2
6
8
1
3
5
5
7
9
6
8
4
0
3
7
8
6
27
4.2 3.3 2.9 2.7 2.5 2.4 2.3 2.3 2.2 2.2 2.1 2.0 1.9 1.9 1.8 1.8 1.7 1.7
10 54 60 27 71 59 73 05 50 04 32 55 73 29 84 36 85 30
0
1
4
8
9
1
2
3
1
3
3
8
6
9
2
1
1
6
1.6
71
7
28
4.1 3.3 2.9 2.7 2.5 2.4 2.3 2.2 2.2 2.1 2.1 2.0 1.9 1.9 1.8 1.8 1.7 1.7
96 40 46 14 58 45 59 91 36 90 17 41 58 14 68 20 68 13
0
4
7
1
1
3
3
3
0
0
9
1
6
7
7
3
9
8
1.6
54
1
29
4.1 3.3 2.9 2.7 2.5 2.4 2.3 2.2 2.2 2.1 2.1 2.0 1.9 1.9 1.8 1.8 1.7 1.6 1.6
83 27 34 01 45 32 46 78 22 76 04 27 44 00 54 05 53 98 37
0
7
0
4
4
4
3
3
9
8
5
5
6
5
3
5
7
1
6
30
4.1 3.3 2.9 2.6 2.5 2.4 2.3 2.2 2.2 2.1 2.0 2.0 1.9 1.8 1.8 1.7 1.7 1.6
70 15 22 89 33 20 34 66 10 64 92 14 31 87 40 91 39 83
9
8
3
6
6
5
3
2
7
6
1
8
7
4
9
8
6
5
40
4.0 3.2 2.8 2.6 2.4 2.3 2.2 2.1 2.1 2.0 2.0 1.9 1.8 1.7 1.7 1.6 1.6 1.5 1.5
84 31 38 06 49 35 49 80 24 77 03 24 38 92 44 92 37 76 08
7
7
7
0
5
9
0
2
0
2
5
5
9
9
4
8
3
6
9
60
4.0 3.1 2.7 2.5 2.3 2.2 2.1 2.0 2.0 1.9 1.9 1.8 1.7 1.7 1.6 1.5 1.5 1.4
01 50 58 25 68 54 66 97 40 92 17 36 48 00 49 94 34 67
2
4
1
2
3
1
5
0
1
6
4
4
0
1
1
3
3
3
12
0
3.9 3.0 2.6 2.4 2.2 2.1 2.0 2.0 1.9 1.9 1.8 1.7 1.6 1.6 1.5 1.4 1.4 1.3 1.2
20 71 80 47 89 75 86 16 58 10 33 50 58 08 54 95 29 51 53
1
8
2
2
9
0
8
4
8
5
7
5
7
4
3
2
0
9
9
in
f
3.8 2.9 2.6 2.3 2.2 2.0 2.0 1.9 1.8 1.8 1.7 1.6 1.5 1.5 1.4 1.3 1.3 1.2 1.0
41 95 04 71 14 98 09 38 79 30 52 66 70 17 59 94 18 21 00
5
7
9
9
1
6
6
4
9
7
2
4
5
3
1
0
0
4
0
Page of 91
86
1.6
22
3
1.3
89
3
Page 87 of 91
df\area      .995      .990      .975      .950      .900      .750      .500      .250      .100      .050      .025      .010      .005
      1   0.00004   0.00016   0.00098   0.00393   0.01579   0.10153   0.45494   1.32330   2.70554   3.84146   5.02389   6.63490   7.87944
      2   0.01003   0.02010   0.05064   0.10259   0.21072   0.57536   1.38629   2.77259   4.60517   5.99146   7.37776   9.21034  10.59663
      3   0.07172   0.11483   0.21580   0.35185   0.58437   1.21253   2.36597   4.10834   6.25139   7.81473   9.34840  11.34487  12.83816
      4   0.20699   0.29711   0.48442   0.71072   1.06362   1.92256   3.35669   5.38527   7.77944   9.48773  11.14329  13.27670  14.86026
      5   0.41174   0.55430   0.83121   1.14548   1.61031   2.67460   4.35146   6.62568   9.23636  11.07050  12.83250  15.08627  16.74960
      6   0.67573   0.87209   1.23734   1.63538   2.20413   3.45460   5.34812   7.84080  10.64464  12.59159  14.44938  16.81189  18.54758
      7   0.98926   1.23904   1.68987   2.16735   2.83311   4.25485   6.34581   9.03715  12.01704  14.06714  16.01276  18.47531  20.27774
      8   1.34441   1.64650   2.17973   2.73264   3.48954   5.07064   7.34412  10.21885  13.36157  15.50731  17.53455  20.09024  21.95495
      9   1.73493   2.08790   2.70039   3.32511   4.16816   5.89883   8.34283  11.38875  14.68366  16.91898  19.02277  21.66599  23.58935
     10   2.15586   2.55821   3.24697   3.94030   4.86518   6.73720   9.34182  12.54886  15.98718  18.30704  20.48318  23.20925  25.18818
     11   2.60322   3.05348   3.81575   4.57481   5.57778   7.58414  10.34100  13.70069  17.27501  19.67514  21.92005  24.72497  26.75685
     12   3.07382   3.57057   4.40379   5.22603   6.30380   8.43842  11.34032  14.84540  18.54935  21.02607  23.33666  26.21697  28.29952
     13   3.56503   4.10692   5.00875   5.89186   7.04150   9.29907  12.33976  15.98391  19.81193  22.36203  24.73560  27.68825  29.81947
     14   4.07467   4.66043   5.62873   6.57063   7.78953  10.16531  13.33927  17.11693  21.06414  23.68479  26.11895  29.14124  31.31935
     15   4.60092   5.22935   6.26214   7.26094   8.54676  11.03654  14.33886  18.24509  22.30713  24.99579  27.48839  30.57791  32.80132
     16   5.14221   5.81221   6.90766   7.96165   9.31224  11.91222  15.33850  19.36886  23.54183  26.29623  28.84535  31.99993  34.26719
     17   5.69722   6.40776   7.56419   8.67176  10.08519  12.79193  16.33818  20.48868  24.76904  27.58711  30.19101  33.40866  35.71847
     18   6.26480   7.01491   8.23075   9.39046  10.86494  13.67529  17.33790  21.60489  25.98942  28.86930  31.52638  34.80531  37.15645
     19   6.84397   7.63273   8.90652  10.11701  11.65091  14.56200  18.33765  22.71781  27.20357  30.14353  32.85233  36.19087  38.58226
     20   7.43384   8.26040   9.59078  10.85081  12.44261  15.45177  19.33743  23.82769  28.41198  31.41043  34.16961  37.56623  39.99685
     21   8.03365   8.89720  10.28290  11.59131  13.23960  16.34438  20.33723  24.93478  29.61509  32.67057  35.47888  38.93217  41.40106
     22   8.64272   9.54249  10.98232  12.33801  14.04149  17.23962  21.33704  26.03927  30.81328  33.92444  36.78071  40.28936  42.79565
     23   9.26042  10.19572  11.68855  13.09051  14.84796  18.13730  22.33688  27.14134  32.00690  35.17246  38.07563  41.63840  44.18128
     24   9.88623  10.85636  12.40115  13.84843  15.65868  19.03725  23.33673  28.24115  33.19624  36.41503  39.36408  42.97982  45.55851
     25  10.51965  11.52398  13.11972  14.61141  16.47341  19.93934  24.33659  29.33885  34.38159  37.65248  40.64647  44.31410  46.92789
     26  11.16024  12.19815  13.84390  15.37916  17.29188  20.84343  25.33646  30.43457  35.56317  38.88514  41.92317  45.64168  48.28988
     27  11.80759  12.87850  14.57338  16.15140  18.11390  21.74940  26.33634  31.52841  36.74122  40.11327  43.19451  46.96294  49.64492
     28  12.46134  13.56471  15.30786  16.92788  18.93924  22.65716  27.33623  32.62049  37.91592  41.33714  44.46079  48.27824  50.99338
     29  13.12115  14.25645  16.04707  17.70837  19.76774  23.56659  28.33613  33.71091  39.08747  42.55697  45.72229  49.58788  52.33562
     30  13.78672  14.95346  16.79077  18.49266  20.59923  24.47761  29.33603  34.79974  40.25602  43.77297  46.97924  50.89218  53.67196
The table should include values for p=0.1 so that a one-tailed test can be conducted at the
p=0.05 level, but we never do such tests in my class, so why clutter up the table?
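
As a rough illustration of how the table above is read, the sketch below compares a chi-square statistic with the critical value in the .050 column. It assumes the table is being used for a goodness-of-fit test with df = number of categories - 1. The function names and the observed and expected counts are made up for the example and do not come from this module.

```python
# Right-tail critical values copied from the .050 column of the table above,
# keyed by degrees of freedom (only the first five rows are reproduced here).
CRITICAL_05 = {1: 3.84146, 2: 5.99146, 3: 7.81473, 4: 9.48773, 5: 11.07050}


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def reject_at_05(observed, expected):
    """Return the statistic and whether it exceeds the .050 critical value.

    Assumes a goodness-of-fit setting, so df = number of categories - 1.
    """
    df = len(observed) - 1
    stat = chi_square_statistic(observed, expected)
    return stat, stat > CRITICAL_05[df]


# Hypothetical counts, for illustration only.
observed = [30, 20, 25, 25]
expected = [25, 25, 25, 25]

stat, reject = reject_at_05(observed, expected)
print(f"chi-square = {stat:.3f}, df = {len(observed) - 1}, reject H0: {reject}")
```

Here the statistic is smaller than the tabulated value for df = 3 (7.81473), so the null hypothesis would not be rejected at the 0.05 level.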
REFERENCES
Bluman, A.G. (2001). Elementary Statistics: A Step by Step Approach. New York: McGraw-Hill.
Clarke, G.M. and Cooke, D. (1991). A Basic Course in Statistics. London: Edward Arnold.
Mendenhall, W., Beaver, R.J. and Beaver, B.M. (2006). Introduction to Probability and Statistics. 12th Edition. Belmont: Thomson Brooks/Cole.
Saiti, F.G. (2003). Mathematics, Module 9: Statistics 11. Domasi College of Education.