Full Stats Notes

The document discusses statistical concepts such as sampling techniques, probability and non-probability sampling methods, data collection methods, data types, measures of central tendency, and measures of dispersion. It covers topics like simple random sampling, stratified sampling, cluster sampling, observation, interview and experimentation methods for data collection. Measures of central tendency discussed include the mean, median, mode and quartiles. Measures of dispersion covered are range, variance, standard deviation and coefficient of variation. Worked examples are provided for each statistical concept.

Contents

1 Introduction 5
1.1. Overview of Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2. Definition of terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3. Sampling techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3.1. Types of sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4. Probability sampling methods . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4.1. Simple random sampling . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4.2. Systematic random sampling . . . . . . . . . . . . . . . . . . . . . 9
1.4.3. Stratified sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.4.4. Cluster sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.5. Non-probability sampling methods . . . . . . . . . . . . . . . . . . . . . . 12
1.5.1. Convenience sampling method . . . . . . . . . . . . . . . . . . . . . 12
1.5.2. Quota sampling method . . . . . . . . . . . . . . . . . . . . . . . . 12
1.5.3. Expert sampling method . . . . . . . . . . . . . . . . . . . . . . . . 12
1.5.4. Chain referral sampling method . . . . . . . . . . . . . . . . . . . 12
1.6. Sampling errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.7. Data collection methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.7.1. Observation method . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.7.2. Interview method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.7.3. Experimentation method . . . . . . . . . . . . . . . . . . . . . . . . 15
1.8. Worked examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2 Data and Data Presentation 17


2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2. Data types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2.1. Qualitative random variables . . . . . . . . . . . . . . . . . . . . . 18
2.2.2. Quantitative random variables . . . . . . . . . . . . . . . . . . . . 18
2.3. Data sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.3.1. Primary data sources . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.3.2. Secondary data sources . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.4. Data presentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.4.1. Frequency distribution table . . . . . . . . . . . . . . . . . . . . . . 23
2.4.2. Pie Charts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.4.3. Bar graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.4.4. Histogram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.4.5. Stem and leaf display . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.4.6. Frequency polygon . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.5. Worked examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.6. Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30


3 Measures of Central Tendency 33


3.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.2. Important Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3. Measures of central tendency . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.4. The arithmetic mean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.4.1. Mean for ungrouped data . . . . . . . . . . . . . . . . . . . . . . . 34
3.4.2. Mean for grouped data . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.5. The Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.5.1. Mode for ungrouped data . . . . . . . . . . . . . . . . . . . . . . . . 37
3.5.2. Mode for grouped data . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.6. The Median . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.6.1. Median for grouped data . . . . . . . . . . . . . . . . . . . . . . . . 40
3.7. Quartiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.7.1. Quartiles for ungrouped data . . . . . . . . . . . . . . . . . . . . . 42
3.7.2. Quartiles for grouped data . . . . . . . . . . . . . . . . . . . . . . . 43
3.7.3. The second quartile, Q2 (Median) . . . . . . . . . . . . . . . . . . . 44
3.7.4. The upper quartile, Q3 . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.7.5. Percentiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.8. Skewness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.9. Kurtosis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.10.Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

4 Measures of Dispersion 47
4.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.2. The Range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.3. The Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.4. The Standard deviation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.5. The Coefficient of variation (CV) . . . . . . . . . . . . . . . . . . . . . . . 52
4.6. Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

5 Basic Probability 55
5.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.2. Definition of terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.3. Approaches to probability theory . . . . . . . . . . . . . . . . . . . . . 55
5.4. Properties of probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.5. Basic probability concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.6. Types of events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.7. Laws of probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.7.1. Addition Law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.7.2. Multiplication laws . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.8. Types of probabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.9. Contingency Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.10.Tree diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.11.Counting rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.11.1. Multiplication rule . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.11.2. Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.11.3. Combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.12.Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

6 Probability Distributions 69
6.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6.2. Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6.3. Random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6.4. Random variable probability distributions . . . . . . . . . . . . . . . . . . 70
6.5. Properties of discrete random variable distribution . . . . . . . . . . . . 71
6.6. Probability terminology and notation . . . . . . . . . . . . . . . . . . . . . 72
6.7. Discrete probability distributions . . . . . . . . . . . . . . . . . . . . . . . 73
6.7.1. Bernoulli distribution . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.7.2. Binomial distribution . . . . . . . . . . . . . . . . . . . . . . . . . . 74
6.7.3. Poisson distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.8. Continuous probability distributions . . . . . . . . . . . . . . . . . . . . . 78
6.8.1. The Uniform distribution . . . . . . . . . . . . . . . . . . . . . . . 78
6.8.2. The Exponential distribution . . . . . . . . . . . . . . . . . . . . . 80
6.8.3. The Normal distribution . . . . . . . . . . . . . . . . . . . . . . . . 80
6.8.4. The standard normal distribution . . . . . . . . . . . . . . . . . . 81

7 Confidence Intervals 85
7.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
7.2. Confidence Intervals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
7.3. Confidence interval for the Population Mean . . . . . . . . . . . . . . . . 86
7.4. Confidence interval for a population proportion . . . . . . . . . . . . . . . 91
7.5. Confidence interval for the population variance . . . . . . . . . . . . . . . 92
7.6. Confidence interval for population standard deviation . . . . . . . . . . . 93
7.7. Confidence interval for difference of two populations means . . . . . . . 94
7.7.1. Case 1: If population variance is known . . . . . . . . . . . . . . . 94
7.7.2. Case 2: If population variances are unknown . . . . . . . . . . . . 95

8 Hypothesis Testing 97
8.1. Definitions and critical clarifications . . . . . . . . . . . . . . . . . . . . . 97
8.2. General procedure on Hypotheses Testing . . . . . . . . . . . . . . . . . . 99
8.3. Hypothesis testing concerning Population Mean . . . . . . . . . . . . . . 99
8.3.1. Case 1: If the population variance is known . . . . . . . . . . . . . 99
8.3.2. Case 2: If the population variance is not known . . . . . . . . . . 100
8.4. Hypothesis testing concerning the Population Proportion . . . . . . . . . 101
8.5. Comparing two populations . . . . . . . . . . . . . . . . . . . . . . . . . . 102
8.5.1. Hypothesis testing concerning difference between two population
means . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
8.6. Independent and dependent samples . . . . . . . . . . . . . . . . . . . . . 104
8.6.1. Advantages of paired comparisons . . . . . . . . . . . . . . . . . . 105
8.6.2. Disadvantages of paired comparisons . . . . . . . . . . . . . . . . 105
8.7. Test Procedure concerning difference of two Population Proportions . . . 106
8.8. Tests for Independence: χ2 -test . . . . . . . . . . . . . . . . . . . . . . . . 107
8.9. Ending Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

9 Regression Analysis 109


9.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
9.2. Uses of Regression Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 109
9.3. Abuses of Regression Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 110

9.4. The Simple Linear Regression model . . . . . . . . . . . . . . . . . . . . . 110


9.4.1. The scatter plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
9.4.2. The regression equation . . . . . . . . . . . . . . . . . . . . . . . . 111
9.4.3. The correlation coefficient . . . . . . . . . . . . . . . . . . . . . . . 113
9.4.4. The coefficient of determination, R2 . . . . . . . . . . . . . . . . . 114

10 Index numbers 117


10.1.Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
10.2.Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
10.3.What is an Index Number? . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
10.3.1. Characteristics of Index Numbers . . . . . . . . . . . . . . . . . . 117
10.3.2. Uses of Index Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 118
10.4.Types of Index Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
10.5.Methods of constructing Index Numbers . . . . . . . . . . . . . . . . . . . 120
10.5.1. Aggregate method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
10.5.2. Weighted Aggregates Index . . . . . . . . . . . . . . . . . . . . . . 121
10.6.Laspeyres Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
10.7.Paasche Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
10.8.Fisher’s Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
Chapter 1

Introduction

1.1. Overview of Statistics

Statistics is the discipline of collecting, summarising, analysing and presenting data, and of using data for decision making. It is an important tool for transforming raw data into meaningful and usable information, and can therefore be regarded as a decision-support tool. The table below shows the transformation process from data to information. Data refers to an unprocessed, raw set of values; information is processed data.

Input              Process                  Output
Data               Statistical Analysis     Information
Raw observation    Transformation process   Useful, Usable and Meaningful

An understanding of statistics allows one to: i) perform simple statistical data manipulation and analysis; ii) intelligently prepare and interpret reports in numerical terms; iii) communicate effectively with statistical analysts; iv) make good decisions.

1.2. Definition of terms

The following terms shall be used in this module more often.

Statistics
Definition 1
Statistics refers to the methodology of collecting, presenting and analysing data, and to the use of such data.

Definition 2
In common usage, it refers to numerical data. This means any collection of data or
information constitutes what is referred to as Statistics. Some examples under this
definition are:

1. Vital statistics - These are numerical data on births, marriages, divorces, communicable diseases, harvests, accidents etc.

2. Business and economic statistics - These are numerical data on employment, production, prices, sales, dismissals etc.

3. Social statistics - These are numerical data on housing, crime, education etc.
Definition 3 - Statistics is making sense of data.

In Statistics, we usually deal with large volumes of data, making it difficult to study each observation in order to draw conclusions about the source of the data. We therefore seek statistical methods that summarise the data so that we can draw conclusions about the data without scrutinising each observation. Such methods fall under an area of statistics called descriptive statistics.

A Statistician is an individual who collects data, analyses it using statistical techniques, interprets the results, and makes conclusions and recommendations on the basis of the data analysis.

Population
A population is a collection of elements about which we wish to make an inference.
The population must be clearly defined before the sample is taken.

Parameter(s)
These are numeric measures derived from a population, e.g. population mean (µ), population variance (σ²) and population standard deviation (σ).

Data
Data is readily available from a variety of sources and is of varying quality and quantity. Precisely, data is a set of individual observations on a variable and, by itself, conveys no useful information.

Information
To make sound decisions, one needs good-quality information. Information must be timely, accurate, relevant, adequate and readily available. Information is defined as processed data.

Random variable
A variable is any characteristic being measured or observed. Since a variable can take
on different values at each measurement it is termed a random variable. For example,
sales, company turnover, weight, height, yield, number of babies born, colour of vehicle
etc.

Target population
This is a population whose properties are estimated via a sample or usually the ’total’
population.

Sample
A sample is a collection of sampling units drawn from a population. Data is obtained
from the sample and used to describe characteristics of the population. A sample can
also be defined as a subset or part of or a fraction of a population.

statistic(s)
The term statistic (with a lowercase s) indicates a numeric measure derived from a sample, e.g. sample mean (x̄), sample variance (s²) and sample standard deviation (s).

Sampling frame
A sampling frame is a list of sampling units. A set of information used to identify a
sample population for statistical treatment. It includes a numerical identifier for each
individual, plus other identifying information about characteristics of the individuals,
to aid in analysis and allow for division into further frames for more in-depth analysis.

Sampling
Sampling is a process used in statistical analysis in which a predetermined number of observations is taken from a larger population. The methodology used to sample from a larger population depends on the type of analysis being performed; methods include simple random sampling, systematic sampling and cluster sampling. These sampling methods are discussed later.

Sampling units
Sampling units are non-overlapping collections of elements from the entire population. A sampling unit is a member of both the sampling frame and the sample. The sampling units partition the population of interest.

1.3. Sampling techniques


We explore sampling techniques in order to decide which one is most appropriate for a given data-collection situation. Sampling techniques are methods of collecting data from a given population.

1.3.1. Types of sampling

Probability sampling
Probability sampling has the distinguishing characteristic that each unit in the population has a known, non-zero probability of being included in the sample. These inclusion probabilities are often, but not necessarily, equal across units. Probability sampling eliminates the danger of bias in the selection process due to one's own opinion or desire.

Non-probability Sampling
Non-probability sampling is a process in which probabilities cannot be assigned to the units objectively, and hence it is difficult to determine the reliability of the sample results in terms of probability. A sample is selected according to one's convenience or judgement. It is a good technique for pilot or feasibility studies. Examples include purposive sampling, convenience sampling and quota sampling. In non-probability sampling, the units that make up the sample are collected with no specific probability structure in mind, e.g. units entering the sample through volunteering.

Remark
We shall focus on probability sampling because, if an appropriate technique is chosen, it assures sample representativeness and the sampling errors can be estimated.

Reasons for sampling

Sampling is done mostly for reasons of cost, time, accessibility, utility and speed. Expansion on these reasons is left for the lecture. Some points to define clearly when sampling are:

• the sampling method to be employed;

• the sample size;

• the degree of reliability of the conclusions that we can obtain, i.e. an estimate of the error that we are going to have.

An inappropriate selection of the elements of the sample can cause further errors once we want to estimate the corresponding population parameters.

1.4. Probability sampling methods

The four methods of probability sampling are simple random, systematic, stratified
and cluster sampling methods.

1.4.1. Simple random sampling

Simple random sampling requires that each element of the population have an equal chance of being selected. A simple random sample is selected by assigning a number to each element in the population list and then using a random number table to draw out the elements of the sample; an element whose number is drawn makes it into the sample. The population is "mixed up" before a previously specified number, n (the sample size), of elements is selected at random. Each member of the population is selected one at a time, independently of one another. Note, however, that all elements of the study population must be either physically present or listed.

Regardless of the process used, this method can be laborious, especially when the population list is long or the selection is completed manually without the aid of a computer. A simple random sample can be obtained using a calculator's random key, the Excel function =RAND(), or random number tables.

In this method, every set of n elements in the population has an equal chance of being selected as the sample.

Advantages of simple random sampling

• It eliminates bias due to the personal judgement or discretion of the researcher.

• More representative of the population.

• Estimates are more accurate.

Disadvantages of simple random sampling

• Requires an up to date sampling frame.

• Numbering of the elements in a population may be time consuming e.g. for large
populations.

Illustration - Simple random sampling


An example of simple random sampling is writing each member of the population on a piece of paper and putting the papers in a hat. Selecting the sample from the hat is random, and each member of the population has an equal chance of being selected: items are drawn one after the other until n items are obtained. This approach is not feasible for large populations, but can be completed easily if the population is very small.
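The hat-drawing procedure above can be sketched in a few lines of Python (an illustrative sketch only; the function name is ours, not part of these notes):

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw a simple random sample of size n without replacement.

    Mirrors the 'names in a hat' illustration: every element, and
    indeed every set of n elements, has the same chance of selection.
    """
    rng = random.Random(seed)          # seeded for a reproducible draw
    return rng.sample(population, n)   # n distinct elements, all equally likely

# A small population of 20 numbered elements; draw a sample of size n = 5.
population = list(range(1, 21))
sample = simple_random_sample(population, 5, seed=42)
print(sample)
```

In practice the same draw can be made with random number tables or Excel's =RAND(), as noted above; random.sample simply automates the hat.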

1.4.2. Systematic random sampling


Selection of sampling units is done in sequence, separated on the list by a selection interval. In this method, every kth element of the list is selected, starting from an element randomly chosen from the first k elements. For example, if the population has 1000 elements and a sample size of 100 is needed, then k, an elevation factor given by N/n, would be 1000/100 = 10. If we randomly select the number 7 as the starting point, the sample continues by selecting every 10th element from the 7th, that is the 17th, 27th, 37th, 47th and so on, up to the 997th. Care must be taken when using systematic sampling to ensure that the original population list has not been ordered in a way that introduces any non-random factors into the sampling.

Illustration - Systematic random sampling


An example of systematic sampling: suppose an official from the Academic Registry of a University with 4000 students is to register 200 students for a tour of regional universities, and the selection of students should be systematic and random.

The official may initially select the 15th student at random. The elevation factor is k = 4000/200 = 20, so the official would then keep adding 20, selecting the 35th, 55th, 75th student and so on to register for the tour of regional universities until the end of the list is reached.

Remark
In cases where the population is large and the population list is available, systematic
sampling is usually preferred over simple random sampling since it is more convenient
to the experimenter.
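The k = N/n procedure described above can be sketched in Python (a sketch under the assumption that N is a multiple of n, as in both worked examples; the function name is ours):

```python
import random

def systematic_sample(population, n, seed=None):
    """Select every k-th element after a random start among the first k,
    where k = N // n is the elevation (skip) factor."""
    N = len(population)
    k = N // n                    # elevation factor, e.g. 1000 // 100 = 10
    rng = random.Random(seed)
    start = rng.randrange(k)      # random start within the first k elements
    return [population[i] for i in range(start, N, k)][:n]

# The worked example: N = 1000, n = 100, so k = 10; a start at position 7
# yields the 7th, 17th, 27th, ..., 997th elements.
population = list(range(1, 1001))
sample = systematic_sample(population, 100, seed=1)
print(len(sample))  # 100
```

Every pair of consecutive selections is exactly k = 10 positions apart, which is what makes the method convenient when a population list is already available.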

1.4.3. Stratified sampling

It is used when representatives from each homogeneous subgroup within the popula-
tion need to be represented in the sample. The first step in stratified sampling is to
divide the population into subgroups called strata based on mutually exclusive crite-
ria. Random or systematic samples are then taken from each subgroup. The sampling
fraction for each subgroup may be taken in the same proportion as the subgroup has
in the population.

Illustration - Stratified sampling


As an example, the owner of a local supermarket conducting a customer satisfaction survey may wish to select random customers from each customer type, in proportion to the number of customers of that type in the population. Suppose 40 sample units are to be selected, and 10% of the customers are managers, 60% are users, 25% are operators and 5% are general customers; then 4 managers, 24 users, 10 operators and 2 general customers would be randomly selected from the strata of managers, users, operators and customers respectively.

Remark
Stratified sampling can also sample an equal number of items from each subgroup.
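The proportional allocation in the supermarket example can be sketched in Python (an illustrative sketch; the function and stratum labels are our own):

```python
import random

def stratified_sample(strata, n, seed=None):
    """Proportional stratified sampling: allocate the sample across strata
    in proportion to stratum size, then draw randomly within each stratum."""
    rng = random.Random(seed)
    N = sum(len(members) for members in strata.values())
    return {name: rng.sample(members, round(n * len(members) / N))
            for name, members in strata.items()}

# The supermarket example: 10% managers, 60% users, 25% operators, 5% customers;
# a sample of 40 should therefore contain 4, 24, 10 and 2 units respectively.
strata = {
    "managers":  [f"M{i}" for i in range(10)],
    "users":     [f"U{i}" for i in range(60)],
    "operators": [f"O{i}" for i in range(25)],
    "customers": [f"C{i}" for i in range(5)],
}
picked = stratified_sample(strata, 40, seed=0)
print({name: len(units) for name, units in picked.items()})
```

Replacing the proportional allocation with a fixed per-stratum size gives the equal-allocation variant mentioned in the remark.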

1.4.4. Cluster sampling

In cluster sampling, the population being sampled is divided into naturally occurring groups called clusters. Each cluster should be as heterogeneous as possible, matching the population, so that a cluster is representative of the population. A random sample is then taken from within one or more selected clusters.

Illustration
An organization with 300 small branches providing a service country wide has an em-
ployee at the HQ who is interested in auditing compliance to some company standards.
The employee might use cluster sampling to randomly select 40 branches as represen-
tatives for the audit and then randomly sample coding systems for auditing from just
the 40.

Remark
Cluster sampling can tell us a lot about that particular cluster, but unless the clusters
are selected randomly and a lot of clusters are sampled, generalizations cannot always
be made about the entire population.
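The branch-audit illustration can be sketched in Python (a one-stage variant in which every member of a chosen cluster is included, as in the summary below; the names and the five-employee branch size are hypothetical):

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly select whole clusters, then include every member of each
    selected cluster (one-stage cluster sampling)."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)   # pick clusters at random
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# The audit example: 300 branches (clusters), each with, say, 5 employees;
# randomly pick 40 branches and audit within those branches only.
clusters = {f"branch_{i}": [f"branch_{i}_employee_{j}" for j in range(5)]
            for i in range(300)}
chosen, members = cluster_sample(clusters, 40, seed=7)
print(len(chosen), len(members))  # 40 200
```

As the remark warns, conclusions from such a sample generalise to the whole population only when enough clusters are selected at random.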

Difference between a cluster and a stratum


A cluster is a heterogeneous subgroup, whereas a stratum is a homogeneous subgroup. A summary of probability sampling methods is given below.

Probability sampling methods summary

Simple random sampling


Each member of the study population has an equal probability of being selected.

Systematic sampling
Each member of the study population is either assembled or listed, a random start is designated, then members of the population are selected at equal intervals.

Stratified sampling
Each member of the study population is assigned to a homogeneous subgroup or stra-
tum, and then a random sample is selected from each stratum.

Cluster sampling
Each member of the study population is assigned to a heterogeneous subgroup or clus-
ter, then clusters are selected at random and all members of a selected cluster are
included in the sample.

1.5. Non-probability sampling methods


There are four non-probability sampling methods: convenience, quota, expert and chain referral sampling.

1.5.1. Convenience sampling method

This is a sampling method based on the proximity of the population elements to the decision maker: being at the right place at the right time. Elements nearby are selected, and those not in close physical or communication range are not considered. The method is also called the availability sampling method.

1.5.2. Quota sampling method

This is a sampling method in which certain distinct or known characteristics of the population should appear in the sample in relatively similar proportions. For example, consider a population (N) of 100 people comprising 60 females and 40 males. If a sample of 20 people is to be selected, then the ratio of 6:4 has to be reflected, indicating that 12 females and 8 males have to be selected. The method is also called the proportionate sampling method.
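The 6:4 quota calculation above can be sketched in Python (an illustrative helper of our own, not part of the notes):

```python
def quota_allocation(group_counts, n):
    """Split a sample of size n across groups in the same proportions as
    they appear in the population (quota / proportionate sampling)."""
    N = sum(group_counts.values())
    return {group: round(n * count / N) for group, count in group_counts.items()}

# The example above: N = 100 with 60 females and 40 males; a sample of 20
# must reflect the 6:4 ratio, i.e. 12 females and 8 males.
print(quota_allocation({"females": 60, "males": 40}, 20))  # {'females': 12, 'males': 8}
```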

1.5.3. Expert sampling method

This is a sampling method in which the decision maker has direct or indirect control over which elements are to be included in the sample. The method is appropriate when the decision maker feels that some members have better or more information than others, or that some members are more representative than others. The method is also called the judgemental sampling method.

1.5.4. Chain referral sampling method

The researcher starts with a person who displays the qualities of interest; that person then refers the researcher to the next, and so on. The method is also called the snowballing or networking sampling method.

1.6. Sampling errors


During sampling, errors can be committed by the statistician or by the one collecting the data. Errors are either sampling or non-sampling errors, and can be reduced by sampling without bias. Some common sources of bias are incorrect sampling operations and non-interviews during data collection. Some errors that arise in sampling are discussed below.

Selection error
Selection error occurs when some elements of the population have a higher probability of being selected than others. Consider a scenario where the manager of a local supermarket wishes to measure how satisfied his customers are, and proceeds to interview some of them from 08:00 to 12:00. Clearly, the customers who do their shopping in the afternoon are left out and will not be represented, making the sample unrepresentative of all the customers. Errors of this kind can be avoided by choosing the sample so that all customers have the same probability of being selected. This is a sampling error.

Non-response error
It is possible that some elements of the population do not want to, or cannot, answer certain questions. It may also happen, when a questionnaire includes personal questions, that some members of the population do not answer honestly or would rather avoid answering. These errors are generally very complicated to avoid, but if we want to check the honesty of answers, we can include some questions, called filter questions, designed to detect whether the answers are honest. This is a non-sampling error.

Interviewer influence error


The interviewer may fail to be impartial i.e. s/he can promote some answers more than
others.

Remark
A sample that is not representative of the population is called a biased sample. Questions relating to selection naturally arise, such as: when concluding about the population, how many of the population elements are represented by each one of the sample elements? What proportion of the population are we selecting? The responses lie in the data collection methods discussed below.

1.7. Data collection methods

The three data collection methods are observation, interviews and experimentation. Depending on the type of research and the data to be collected, different methods can be used to collect a data set.

1.7.1. Observation method

This method comprises direct observation and desk research. Direct observation involves collecting data by observing the item in action. Examples of this method are: pedestrian flow at a junction, traffic flow at a road intersection, purchasing behaviour for a commodity in a shop, quality control inspection, etc. An advantage of this method is that the respondent behaves in a natural way, since he is not aware that he is being observed. A disadvantage is that it is a passive form of data collection; there is also no opportunity to investigate the behaviour further. Desk research involves consulting source documents and extracting secondary data from them.

1.7.2. Interview method

This method collects primary data through direct questioning. A questionnaire is the
instrument used to structure the data collection process. Three approaches in data
collection using interviews are: personal, postal and telephone interviews.

Personal interviews
A questionnaire is completed through face-to-face contact with the respondent. A re-
searcher carries out an interview with the respondent through use of guided questions.
Advantages for this method are: high response rate, it allows probing for reasons,
data collection is immediate, data accuracy is assured, useful for technical data, non-
verbal responses can be observed and noted, more questions can be asked, responses
are spontaneous and use of aided-recall questions is possible. Disadvantages for this
method are that it is time consuming, it requires trained and experienced interview-
ers, fewer interviews are conducted because of cost and time constraints, biased data
can be collected if interviewer is inexperienced.

Telephone interviews
The interview is conducted through telephone between the interviewer and intervie-
wee. The researcher asks questions from a guided questionnaire through phoning
the respondent. Advantages of this method are: it allows quicker contact with geo-
graphically dispersed respondents, callbacks can be made if respondent is not initially
available, low cost, interviewer probing is possible, clarity on questions can be provided
by the interviewer and a larger sample of respondents can be reached in short space
of time. Disadvantages are that respondent anonymity is lost, non-verbal responses
cannot be observed, trained interviewers are required hence more costly, there is possible
interviewer bias, the respondent may terminate the interview prematurely, and sampling
errors are compounded if many respondents do not have telephones.

Postal surveys
When the target population is large or geographically dispersed, the use of postal questionnaires
is considered most suitable. It involves posting questionnaires to the selected
sampling units. Advantages of this method are that a larger sample of respondents
can be reached, it is very cost effective, interviewer bias is eliminated, respondents
have more time to consider their responses, and anonymity of respondents is assured,
resulting in more honest responses and a greater willingness to answer personal
questions. The disadvantages of this method are: low response rate, respondents
cannot get clarity on some questions, mailed questionnaires must be short and simple

to complete, limited possibilities of probing or further investigation, data collection
takes a long time, there is no control over who answers the questionnaire, and no
possibility of validating responses.

1.7.3. Experimentation method


This is when primary data is generated through manipulation of variables under controlled
conditions. The method is mostly used in scientific, agricultural and engineering
research. Data on the primary variable under study is monitored and recorded whilst
the researcher controls the effects of a number of influencing factors. Examples include:
demand elasticity for a product, advertising effectiveness. Advantages of this method
are: quality data is collected and results are generally more objective and valid. The
disadvantages are that the method is costly and time consuming, and it may be
impossible to control for certain factors which affect the results.

1.8. Worked examples


Question

Solution

Question

Solution
Chapter 2

Data and Data Presentation

2.1. Introduction

A statistician collects data, analyses it using statistical techniques, interprets the results
and makes conclusions and recommendations on the basis of the analysis. The
word data keeps turning up in our discussion. Data is the "blood of statistics": it refers
to raw, unprocessed facts or figures.

The world of statistics revolves around data; there is no statistics without data. What
is data? How is it collected? Why do we collect it? These are the questions to be
answered in this chapter.

2.2. Data types

An understanding of the nature of data is necessary for two reasons. It enables a user to:
assess data quality and select the appropriate statistical method to use to analyse
the data.

The quality of data is influenced by three factors: the type of data, its source and the
method used to collect it. The type of data gathered determines the type of analysis
which can be performed on the data. Certain statistical methods are valid for certain
data types only. An incorrect application of a statistical method to a particular data
type can render the findings invalid and give incorrect results.

Data type is determined by the nature of the random variables which the data represents.
Random variables are essentially of two kinds: qualitative and quantitative.

2.2.1. Qualitative random variables


These are variables which yield categorical (non-numeric) responses. The data generated
by qualitative random variables are classified into one of a number of categories.
The numbers representing the categories are arbitrary codes: coded values cannot be
manipulated arithmetically, as doing so does not make sense.

Examples of qualitative random variables

Random variable           Response categories   Data code

Managerial level          Supervisor            1
                          Section head          2
                          Departmental head     3
                          General Manager       4
Do you like soft drink?   Yes                   2
                          No                    1
Gender                    Female                0
                          Male                  1

2.2.2. Quantitative random variables


Quantitative random variables are variables that yield numeric responses. The data
generated for quantitative random variables can be meaningfully manipulated using
conventional arithmetic operations.

Examples of quantitative random variables

Random variables Response range Data


Age of employee 17 - 65 years 39 years
Distance to work 0 - 20 km 5.3 km
Class size 1, 2, 3 ... 15 pupils

Each random variable category is associated with a different type of data. There are
two classifications of data types.

Data type 1 - Data measurement scales

Data measurement scales include nominal, ordinal, interval and ratio-scaled data.

Nominal-scaled data
Objects or events are distinguished on the basis of a name. Nominal-scaled data is
associated mainly with qualitative random variables. Where data of qualitative ran-
dom variables is assigned to one of a number of categories of equal importance, then

such data is referred to as nominal-scaled data. There is no implied ordering between


the groups of the random variable.

Examples of nominal-scaled data


Table below shows examples of nominal scaled data.

Qualitative random variables Response categories Data code


Gender Male / Female 1/2
Car type owned Mazda/Golf/Toyota/Honda 1/2/3/4
City lived in Harare/Byo/Mutare/Gweru 1/2/3/4
Marital Status Married/Single/Divorced/Widow 1/2/3/4
Engineering Profession Civil/Electrical/Mechanical 1/2/3

Each observation of the random variables is assigned to only one of the categories pro-
vided. Arithmetic calculations cannot be meaningfully performed on the coded values
assigned to each category. They are only numeric codes which are arbitrarily assigned
and can be counted. Nominal-scaled data is the weakest form of data, since only a
limited range of statistical analysis can be performed on such data.

Ordinal-scaled data
Objects or events are distinguished on the basis of the relative amounts of some
characteristics they possess. The magnitude between measurements is not reflected in the
rank. Such data is associated mainly with qualitative random variables. Like nominal-
scaled data, ordinal-scaled data is also assigned to only one of a number of coded cat-
egories, but there is now a ranking implied between the categories in terms of being
better, bigger, longer, older, taller, or stronger, etc. While there is an implied differ-
ence between the categories, this difference cannot be measured exactly. That is, the
distance between categories cannot be quantified nor assumed to be equal. Ordinal-
scaled data is generated from ranked responses in market research studies.

Examples of Ordinal-scaled data

Qualitative random variables Response categories Data codes


T-Shirt size Small / Medium / Large 1/2/3
Company turnover Low / Medium / High 1/2/3
Management levels Lower / Middle / Senior 1/2/3
Work experience Little / Moderate / Extensive 1/2/3
Magazine type Rank the top three magazine 1/2/3
you often read
Sizes of bulbs Smallest / Small / Large / Largest 1/2/3/4

There is a wider range of valid statistical methods (i.e. the area of non-parametric
statistics) available for the analysis of ordinal-scaled data than there is for nominal-
scaled data. Ordinal-scaled data is also generated from a ”counting process”.

Interval-scaled data
Interval-scaled data is associated with quantitative random variables. Differences
can be measured between values of a quantitative random variable. Thus interval-
scaled data possesses both order and distance properties. Interval-scaled data, how-
ever, does not possess an absolute origin. Therefore the ratio of values cannot be mean-
ingfully compared for interval-scaled data. The absolute difference makes sense when
interval-scaled data has been collected.

Examples of Interval-scaled data


Suppose four places A, B, C and D have temperatures 20°C, 25°C, 35°C and 40°C
respectively. Using the interval scale, we see that the difference between A and B is equal to
that between C and D. However, ratios are not used: a value of 0°C does not mean absence of
temperature, and it is not correct to say the temperature of D is twice as much as that of A.

Interval-scaled data is most often generated in marketing studies through rating
responses on a continuum scale. A wide range of statistical techniques can be applied to
interval-scaled data.

Ratio-scaled data

This data is associated mainly with quantitative random variables. If the full range of
arithmetic operations can be meaningfully performed on the observations of a random
variable, the data associated with that random variable is termed ratio-scaled. It is
numeric data with a zero origin. The zero origin indicates the absence of the attribute
being measured.

Example 1 of ratio-scaled data

Quantitative random variable Response data values


Age 42 years
Income $2,500
Distance 35 km
Time 32 minutes
Mass 240g
Price $7.82

Such data are the strongest form of statistical data which can be gathered and lends
itself to the widest range of statistical methods. Ratio-scaled data can be manipulated
meaningfully through normal arithmetic operations. Ratio-scaled data is gathered
through a measurement process. It should be noted that if ratio-scaled data is grouped
into categories, the data type becomes ordinal-scaled. This then reduces the scope for
statistical analysis on the random variable.

Example 2: Ratio-scaled data


By capturing the random variable Age in categories instead of as actual age, the data
becomes ordinal-scaled. However, the random variable remains quantitative in nature.
See table below.

Random variable   Response category   Data code used

Age               0 - 16              1
                  17 - 24             2
                  25 - 36             3
                  37 - 45             4
                  46 - 55             5

When data capturing instruments are set up, care must be exercised to ensure that
the most useful form of data is captured. However, this is not always possible for
reasons of convenience, cost and sensitivity of information. This applies particularly
to random variables such as age, personal income, company turnover and consumer
behavior questions of a personal nature. The functional area of marketing generates
mostly categorical that is nominal/ordinal data arising from consumer studies, while
the areas of finance or accounting and production generate mainly quantitative (ratio)
data. Human resources management generates a mix of qualitative and quantitative
data for analysis.

Data type 2

A second classification of data type is either discrete or continuous data.

Discrete data
A random variable whose observations can take on only specific values, usually only
integer values, is referred to as a discrete random variable. In such instances, certain
values are valid, while others are invalid.

Examples of random variables generating discrete data


(i) Number of cars in a parking lot at a given time.
(ii) Daily number of hotel rooms booked for January 1992.
(iii) Number of students in a class.
(iv) Number of employees in an organization.
(v) Number of paintings in an art collection.
(vi) Number of cars sold in a month by a dealer.
(vii) Number of life assurance policies issued in 1990 in Zimbabwe.

Continuous data
A random variable whose observations take on any value in an interval is said to gen-
erate continuous data. This means that any value between a lower and an upper limit
is valid.

Examples of random variables generating continuous data


(i) Time taken to travel to work daily.
(ii) Age of a bottle of red wine.
(iii) Mass of a caravan.
(iv) Tensile strength of material.
(v) Speed of an aircraft.
(vi) Length of a ladder.

2.3. Data sources

Data for statistical analysis are available from many different sources. There are two
classifications of data sources: internal or external, and primary or secondary
sources.

Internal data sources


This refers to the availability of data from within an organisation; internal data are
generated during the course of normal business activities. Examples of internal data
sources include:
i) Financial data - sales vouchers, credit notes, accounts receivable, accounts payable, asset register.
ii) Production data - production cost records, stock sheets.
iii) Human Resource data - time sheets, wages and salaries schedule, employee personal employment files.
iv) Marketing data - sales data, advertising expenditure.

External data sources


Data available from outside an organization is referred to as external data.
Such sources may be private institutions, trade/employer/employee associations, profit
motivated organizations or government bodies. The cost of external data is dependent
on the source. Generally, the cost is greater from private bodies than it is
from government or public sources. Examples of external data sources include:
i) Private sources - Commercial and Industrial Association of Business, Research Bureau.
ii) Public domain sources - newspapers, journals, trade magazines, reference material
in libraries, The Central Statistical Services (ZimStats), which is the Government's
data capturing and dissemination instrument, and others such as universities,
reference libraries and banks' economic reports.

2.3.1. Primary data sources

Data which is captured at the point where it is generated is called primary data. Such
data is captured for the first time and with a specific purpose in mind. Examples of
primary data sources are similar to those for internal data sources, but also include
survey data, that is, personnel, salary and market research surveys.

Advantages of primary data


Primary data are directly relevant to the problem at hand and generally offer greater
control over data accuracy.

Disadvantages of primary data


Primary data can be time consuming to collect and are generally more expensive e.g.
market research.

2.3.2. Secondary data sources

Data collected and processed by others for a purpose other than the problem at hand
are called secondary data. Such data are already in existence either within or outside
an organisation, that is, one can get both internal and external secondary data. The
relation to the problem at hand determines whether data is primary or secondary. Examples of
internal secondary data sources are: aged market research figures, previous financial
statements of your company and past sales reports. Examples of external secondary
data sources are reports produced by external data sources.

Advantages of secondary data


Some of the advantages of use of secondary data are that data is already in existence,
access time is relatively short, data is generally less expensive to acquire.

Disadvantages of secondary data


Some disadvantages of secondary data are that data may not be problem specific, data
may be outdated and hence inappropriate, it may be difficult to assess data accuracy,
data may not be subject to further manipulation and combining various sources could
lead to errors of collation and introduce bias.

2.4. Data presentation


Data can be presented in tables, charts or graphs. Graphical techniques are pictorial
representations of data such that the main features of the data are captured. The var-
ious graphical techniques which we will cover in this section are pie chart, bar graph,
histogram, box and whisker plot and stem and leaf display. Some other techniques
which are important are dotplots, Lorenz and Z curves and these are not discussed
in this module. Various graphs and charts are constructed from data presented in a
frequency distribution table.

2.4.1. Frequency distribution table

A frequency distribution table is a table that summarises a random variable, showing
how it is distributed from the lowest to the highest value and the number of occurrences
(frequencies) of the random variable values. It can show the distribution of exact values
of the random variable or of values grouped into class intervals. Frequency distribution
tables can display values for grouped or ungrouped data sets. An example of a frequency
distribution table is shown below.

Marks Frequencies
10 - 19 7
20 - 29 10
30 - 39 9
40 - 49 3
50 - 59 5
60 - 69 1

2.4.2. Pie Charts

A pie chart, as the name suggests, is a circle divided into segments like a pie cut into
pieces from the centre of the circle going outwards. Each segment represents one or
more values taken by a variable. Such charts are used to display qualitative data. The
example below illustrates how to construct and interpret a pie chart.

Illustration - Constructing a pie chart


The ages of 10 students doing the Accounting program at a University are: 26, 28, 28, 16,
22, 35, 42, 19, 55, 28. Grouping the ages into classes of 25 and below, 26 - 35, 36 - 45, and
above 45 leads to the frequency distribution table below.

Age group   Number of students

Below 25    3
26 - 35     5
36 - 45     1
Above 45    1

We now express these age groups as proportions or percentages, and then indicate
the angle in degrees, as in the table below.

Age group   Number of students   Proportion   Percentage   Angle

Below 25    3                    3/10         30%          108°
26 - 35     5                    5/10         50%          180°
36 - 45     1                    1/10         10%          36°
Above 45    1                    1/10         10%          36°

There are only 4 groups. What we wish to do is to represent these percentages of age
groups as angles in degrees that add up to 360° (the total number of degrees in a circle),
as shown in the last column of the table above. The angle of the ith category can
be calculated directly from the observations by using the formula:

Angle_i = (x_i / Σ_{i=1}^{n} x_i) × 360°

i.e. each observation multiplied by 360° and divided by the sum of the observations.
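The angle calculation can be sketched in a few lines of Python (a minimal sketch, not part of the original notes; the category counts are taken from the age-group table above):

```python
# Category counts from the age-group frequency table above.
counts = {"Below 25": 3, "26 - 35": 5, "36 - 45": 1, "Above 45": 1}

total = sum(counts.values())  # sum of all observations (10 students)

# Angle_i = (x_i / sum of x_i) * 360 degrees, for each category.
angles = {group: x / total * 360 for group, x in counts.items()}

print(angles)
```

The four angles (108°, 180°, 36°, 36°) add up to 360°, confirming the table above.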

2.4.3. Bar graph


A bar chart, as the name suggests, is a visual presentation of data by means of bars
or blocks put side by side but not touching each other. Each bar represents a count
of the different categories of the data. Although both pie charts and bar graphs are
used to illustrate qualitative data or discrete quantitative data, bar charts use the
actual counts or frequencies of occurrence of each category of data. Bar graphs can be
simple, stacked, compound or component, depending on the data type. We need not use
the actual counts; we can use the percentages to come up with the bar graph. A bar graph
from the above data is shown below.

Illustration - Simple bar graph


We will now construct the bar chart using the above data. We come up with suitable
scales for the height and width of the graph, such that the graph is clear and
representative. The bars represent each age group count in terms of height. You can choose
to make the bars thin or wide; all you need to be certain of is that the bars represent
each age group in terms of height, and that the bars are of the same width. Often, we
represent each category by different colours or shades. This is especially useful when
we are comparing several groups. For instance, we could be comparing the age groups
of different intakes; that would mean several graphs all put side by side. In this way
we can compare the intakes of a given age group over different years.

2.4.4. Histogram
A histogram is a graph drawn from a frequency distribution. It is used to represent
continuous quantitative data. It usually consists of adjacent, touching rectangles or
bars. The area of each rectangle is drawn in proportion to the frequency of the
corresponding class. When the class intervals are equal, the area of each rectangle
is a constant multiple of height and so the histogram can be drawn as for a bar chart,
except that the rectangles are touching. If the class intervals are not equal, the fre-
quencies are adjusted accordingly to come up with frequency densities for the larger
class intervals.

Illustration - Histogram
Consider the results of a test written by 35 students and marked out of 70. The data is
presented in categories in the table below. Use the data in the table to draw a histogram
for the mark distribution.

Marks Frequencies
10 - 19 7
20 - 29 10
30 - 39 9
40 - 49 3
50 - 59 5
60 - 69 1

2.4.5. Stem and leaf display

A stem and leaf diagram is basically a histogram where the rectangles are built up to
the correct height by individual numbers. Each data value is split into its stem (the
first digit, or first two digits, etc., depending on the data) and its leaves. Thus, the
number 23 will have stem 2 and leaf 3. The number 7 has stem 0 and leaf 7. Perhaps
an example will illustrate this clearly.

Illustration - Stem and leaf display


A scientist interested in finding out the age groups of people interested in cultural
movies went to a movie theatre and collected the following information. The ages of the
people watching the movie are shown below.

7 15 22 38 12 18 14 26 20 15 22 34 12 18 24
19 14 29 21 32 12 17 24 13 25 20 15 31 11 16
23 39 19 14 28 20 9 16 22 39 13 25 19 14 31

To display this information in a stem and leaf plot, we take stems 0, 1, 2 and 3 and
list them on the left side of a vertical line and the leaves on the right side opposite the
appropriate stem. The stem and leaf display of these data are represented below. A
stem and leaf display should always have a key that indicates how data is displayed
ie. Key: 0|7 = 7 or Key: 3|8 = 38.

Stem and leaf plot of ages

Stem Leaf
0 79
1 122233444455566788999
2 000122234455689
3 1124899

Key: 3|8 = 38

Take note that the 1st, 2nd, 3rd, etc. numbers on the right (leaf) side should be aligned
in the same columns for the histogram feature to show.
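The stem-and-leaf construction above can be sketched in Python (a minimal sketch, not part of the original notes; the ages are those listed in the illustration, with the tens digit as the stem and the units digit as the leaf):

```python
# Ages of people watching the movie (from the illustration above).
ages = [7, 15, 22, 38, 12, 18, 14, 26, 20, 15, 22, 34, 12, 18, 24,
        19, 14, 29, 21, 32, 12, 17, 24, 13, 25, 20, 15, 31, 11, 16,
        23, 39, 19, 14, 28, 20, 9, 16, 22, 39, 13, 25, 19, 14, 31]

# Group the leaves (units digits) under their stems (tens digits);
# iterating over the sorted ages keeps each row of leaves in order.
stems = {}
for age in sorted(ages):
    stems.setdefault(age // 10, []).append(age % 10)

for stem, leaves in sorted(stems.items()):
    print(stem, "|", "".join(str(leaf) for leaf in leaves))
```

The printed rows reproduce the display above, e.g. `0 | 79` and `3 | 1124899`.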

2.4.6. Frequency polygon


A frequency polygon is an alternative to presenting data in a histogram. The only
difference is that a frequency polygon is a line plot of the frequencies against the
corresponding class mid-points. The points are joined by straight lines or a smooth curve.

Cumulative frequency curve


From a frequency polygon, one can deduce an ogive. An ogive is a line graph constructed
from cumulative frequency data. The cumulative frequency value is plotted
against the upper limit of its class interval. Using the data below, a cumulative
frequency table can be constructed as:

Marks Frequencies Cumulative frequency


10 - 19 7 7
20 - 29 2 7+2 = 9
30 - 39 9 9 + 9 = 18
40 - 49 10 28
50 - 59 6 34
60 - 69 1 35
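The running totals in the last column can be computed with, for example, Python's itertools.accumulate (a minimal sketch, not part of the original notes; the frequencies are those in the table above):

```python
from itertools import accumulate

# Class frequencies from the table above (classes 10-19 up to 60-69).
frequencies = [7, 2, 9, 10, 6, 1]

# Each cumulative value is the sum of all frequencies up to and
# including that class; the last value equals the total count.
cumulative = list(accumulate(frequencies))

print(cumulative)  # -> [7, 9, 18, 28, 34, 35]
```

These are the points plotted against the upper class limits to form the ogive.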

An ogive can then be drawn from the above information.

2.5. Worked examples


Question

Solution

Question

Solution

2.6. Exercises
1. Classify the following data sources as either primary or secondary and internal
or external

(a) The economic statistics quoted in The Financial Gazette.


(b) The sum assured values on Life Assurance polices within your company.
(c) The financial reports of all companies on the Zimbabwean Stock Exchange
for the purpose of analyzing earnings per share.
(d) Employment statistics published by ZimStats.
(e) Market research findings on driving habits conducted by the ZRP Traffic
section.

2. Define primary and secondary data. Include in your answers the advantages and
disadvantages of both data types. Give two examples of secondary data.

3. What is the difference between primary and secondary data?

4. Areas of continents of the World

Continent Area in millions of km²


Africa 30.3
Asia 26.9
Europe 4.9
North America 24.3
Oceania 8.5
South America 17.9
Russia 29.5

(a) Draw a bar chart of the above information.


(b) Construct a pie chart to represent the total area.

5. The distances, in km, travelled by a courier service motorcycle on 30 trips were
recorded by the driver as:

24 19 21 27 20 17 17 32 22 26 18 13 23 30 10
13 18 22 34 16 18 23 15 19 28 25 25 20 17 15

(a) Define the random variable, the data type and the measurement scale.
(b) From the data, prepare:
(i) an absolute frequency distribution,
(ii) a relative frequency distribution and
(iii) a less than ogive.
(c) Construct the following graphs:
(i) a histogram of the relative frequency distribution,
(ii) stem and leaf diagram of the original data.
(d) From the graphs, read off what percentage of trips were:
(i) between 25 and 30 km long,
(ii) under 25km,
(iii) 22km or more?
Chapter 3

Measures of Central Tendency

3.1. Introduction
From the previous unit, graphical displays and charts were discussed. These are useful
visual means of communicating broad overviews of the behaviour of a random variable.
However, there is a need for numerical measures which will convey more precise infor-
mation about the behaviour pattern of a random variable. The behaviour or pattern of
any random variable can be described by measures of:

• Central tendency and

• Dispersion of observations about a central value.

3.2. Important Notation


The following notation will be required as we calculate measures of central tendency
and dispersion.

a) Number of observations
The number of all observations in a population is denoted by N whilst those in a sam-
ple are denoted by n. N is a population size while n is a sample size.

b) Observations
A list of values in a data set is named using a random variable name, X or Y. Each
observation is called xi for i = 1, 2, 3, ..., N for a population. This means x1 is the
first value, x2 is the second value, x3 is the third value and so on until the last value,
xN.

c) Sum of observations
Adding up the values in a dataset of N values is done as x1 + x2 + x3 + ... + xN.
This is written in short as Σ_{i=1}^{N} xi.

d) Summing squares of observations
This means squaring each individual observation and then adding them up, i.e.
x1^2 + x2^2 + x3^2 + ... + xN^2, which can be written as Σ_{i=1}^{N} xi^2.
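The two summations can be illustrated in Python (a minimal sketch, not part of the original notes; the dataset is made up for illustration):

```python
# A small, made-up dataset of N = 5 observations.
x = [2, 4, 6, 8, 10]

sum_x = sum(x)                         # sum of x_i        = 30
sum_x_squared = sum(v ** 2 for v in x)  # sum of x_i^2      = 220
```

Note that the sum of squares (220) is not the square of the sum (30² = 900); the two quantities play different roles in later formulas.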

3.3. Measures of central tendency


These are statistical measures which quantify where the majority of observations are
concentrated. They are also called measures of location. A central tendency statistic
represents a typical value or middle data point of a set of observations, and such
statistics are useful for comparing data sets. These measures may be classified by their
source, that is, whether they are computed from a population or from a sample. If from
a population, we talk of a parameter; if from a sample, we refer to a statistic. The three
main measures of central tendency are:

• Arithmetic mean or average,

• Mode and

• Median

Each of these measures will be discussed and computed for grouped and ungrouped
data.

3.4. The arithmetic mean


Given a set of n sample data values denoted by xi for i = 1, 2, 3, ..., n, the arithmetic
mean is denoted by x̄. For a population, we refer to the population mean, µ.

3.4.1. Mean for ungrouped data

The sample arithmetic mean for ungrouped data is defined as:

x̄ = (sum of all observations) / (total number of observations)
  = (x1 + x2 + x3 + ... + xn) / n
  = (Σ_{i=1}^{n} xi) / n     (3.1)

Where:

• n is the number of observations in the sample,



• xi is the value of the ith observation of random variable X and

• X̄ is a sample arithmetic mean.

The population arithmetic mean is defined as:

µ = (x1 + x2 + x3 + ... + xN) / N
  = (Σ_{i=1}^{N} xi) / N

Where:

• N is the number of observations in the population,

• xi is the ith observation of random variable X and

• µ is a population arithmetic mean.

Illustration - Mean for ungrouped data
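Formula (3.1) can be sketched in Python (a minimal sketch, not part of the original notes; the sample values are made up for illustration):

```python
# A made-up sample of n = 5 observations.
x = [12, 15, 11, 18, 14]

# x-bar = (sum of x_i) / n, as in formula (3.1).
x_bar = sum(x) / len(x)

print(x_bar)  # -> 14.0
```

Here the total 70 divided by the 5 observations gives a sample mean of 14.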

3.4.2. Mean for grouped data


Grouped data is represented by a frequency distribution. All that is known is the
frequency with which observations appear in each of the m classes. Thus the sum
of all the observation cannot be determined exactly. Consequently, it is not possible
to compute an exact arithmetic mean for the data set. The computed mean is an
approximation of the actual arithmetic mean.

x̄ = (f1 x1 + f2 x2 + f3 x3 + ... + fm xm) / (f1 + f2 + f3 + ... + fm)
  = (Σ_{i=1}^{m} fi xi) / (Σ fi)     (3.3)

The population mean for grouped data is:

µ = (f1 x1 + f2 x2 + f3 x3 + ... + fm xm) / (f1 + f2 + f3 + ... + fm)
  = (Σ_{i=1}^{m} fi xi) / N     (3.4)

Where:

• m is the number of classes in the frequency distribution,

• n is the number of observations in the sample,



• xi is the value of the ith observation of random variable X,

• fi is frequency of the ith class and

• X̄ is a sample arithmetic mean.

Illustration - Mean for grouped data


Given marks for a test data in a table, calculate the mean student mark.

Student mark, X 5 8 11 13 14 18
Number of students 2 1 4 5 3 2

Solution
The mean student mark is given as:

x̄ = (f1 x1 + f2 x2 + f3 x3 + ... + fm xm) / (f1 + f2 + f3 + ... + fm)
  = (2(5) + 1(8) + 4(11) + 5(13) + 3(14) + 2(18)) / (2 + 1 + 4 + 5 + 3 + 2)
  = 205 / 17 ≈ 12.06

Note
If the frequency distribution table has observations in class intervals, say 6 - 10, 11 - 15,
16 - 20, etc., use the midpoint of each class interval to represent the observation xi. To
get the midpoint of a class interval, use the formula:

Midpoint = (Lower value + Upper value) / 2
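The grouped-mean calculation of the worked example, together with the midpoint rule, can be sketched in Python (a minimal sketch, not part of the original notes; the marks and frequencies are those of the worked example above, and `midpoint` is an illustrative helper, not a standard function):

```python
# Marks and student counts from the worked example above.
marks = [5, 8, 11, 13, 14, 18]
frequencies = [2, 1, 4, 5, 3, 2]

# x-bar = sum(f_i * x_i) / sum(f_i), as in formula (3.3).
x_bar = sum(f * x for f, x in zip(frequencies, marks)) / sum(frequencies)

# For class intervals, each x_i would instead be the class midpoint.
def midpoint(lower, upper):
    return (lower + upper) / 2
```

With these values, `x_bar` is 205/17 ≈ 12.06, and `midpoint(6, 10)` gives 8.0 for the class 6 - 10.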

Properties of the mean

• The arithmetic mean uses all values of the data set in its computation.

• The sum of the deviations of the observations from the mean value is equal to zero,
i.e. Σ_{i=1}^{n} (xi − x̄) = 0. This makes the mean an unbiased statistical measure of
central location.

Advantages of the mean


The advantages of the mean are that it:

• is easy to understand,

• is easy to compute,

• uses every value in the dataset and

• is suitable for advanced calculations.

Disadvantages of the mean


The disadvantages of the mean are that it:

• usually does not correspond to a true data value,

• is affected or distorted by extreme values in the dataset (extreme values are called
outliers),

• is not valid for nominal or ordinal scaled data (it is only meaningful to compute the
arithmetic mean for ratio scaled data, that is, discrete or continuous data) and

• is often a poor measure for grouped data with open-ended extreme classes.

There are other means that can be calculated for different distribution of values. These
are harmonic, geometric and weighted arithmetic means. We will not discuss
them in this module.

3.5. The Mode


The mode of a given set of data is that observation with the highest frequency, that is,
the observation which occurs most frequently: the most common value in the data set.
A distribution can have one mode (unimodal), two modes (bimodal) or many modes
(multimodal). The calculation of a mode from a population or a sample is the same.

3.5.1. Mode for ungrouped data


In an ungrouped data set, the mode is obtained by observing the data carefully and
finding the most frequently occurring observation. If the number of observations is
large, the mode can be found by arranging the data in ascending order and identifying,
by inspection, the value that occurs most frequently.

Illustration
Suppose you are given a list of colours: Blue (B), Green (G), Red (R) and Yellow (Y).
Consider a sample YGBRBBRGYB, picked from a mixed bag. What is the modal colour?

Solution
The modal colour is Blue since it appears most often, with the highest frequency of 4.
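As a quick check (Python is used here purely for illustration; it is not part of the module), the modal colour can be found with a frequency count:

```python
from collections import Counter

# Sample drawn from the mixed bag in the illustration above.
colours = list("YGBRBBRGYB")

counts = Counter(colours)                      # frequency of each colour
modal_colour, modal_freq = counts.most_common(1)[0]
print(modal_colour, modal_freq)                # B 4
```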

3.5.2. Mode for grouped data


In finding the mode for grouped data, we first identify the modal class, that is, the class
interval with the highest frequency. Since the mode is a value, we calculate the modal
value using the formula

Mode = lmo + c(f1 − f0)/(2f1 − f0 − f2)        (3.5)

where:

• lmo is the lower limit of modal class interval,

• f1 is the frequency of the modal class interval,

• f0 is the frequency of the class preceding the modal class interval,

• f2 is the frequency of the class succeeding the modal class interval and

• c is the width of the modal class interval.

Illustration - Mode for grouped data


Find the modal test mark for the following mark distribution.

Test mark, x 5 - 10 10 - 15 15 - 20 20 - 25 25 - 30
Frequency 3 5 7 2 4

Solution
We seek to use the formula

Mode = lmo + c(f1 − f0)/(2f1 − f0 − f2)

where 15 - 20 is the modal class interval with the highest frequency of 7, lmo = 15,
f1 = 7, f0 = 5, f2 = 2, and c = 5. Substituting these in the equation above yields

Mode = 15 + 5(7 − 5)/[2(7) − 5 − 2]
Mode = 15 + 10/7
Mode = 16.43
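The grouped-mode formula above can be sketched as a small Python function (the name `grouped_mode` is our own, chosen for illustration):

```python
def grouped_mode(l_mo, f1, f0, f2, c):
    """Mode = l_mo + c*(f1 - f0)/(2*f1 - f0 - f2) for grouped data."""
    return l_mo + c * (f1 - f0) / (2 * f1 - f0 - f2)

# Modal class 15 - 20 from the test-mark distribution above.
mode = grouped_mode(l_mo=15, f1=7, f0=5, f2=2, c=5)
print(round(mode, 2))  # 16.43
```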

Advantages of the mode


The advantages of the mode are that it:

• corresponds to a particular observation,

• is not affected by extreme values (outliers) and

• is easy to determine.

Disadvantages of the mode


The disadvantages are that it:

• only uses the most common observation,

• does not have a useful mathematical representation and

• is often not unique and may not even exist, e.g. for ungrouped data in which no
value repeats.

3.6. The Median

The median is the value of a random variable which divides an ordered (ascending or
descending order) data set into two equal parts, that is, the value that lies at the centre
of an ordered distribution. It is also called the second quartile Q2 or 50th percentile.
Half of the observations fall below this value and the other half above it. If the number
of observations, n, is odd, then the median is the ((n + 1)/2)th observation. If the
number of observations is even, then the median is the average of the (n/2)th and
(n/2 + 1)th observations. For grouped data in a frequency distribution table, use the
formula below after identifying the median class interval, which is the interval
containing the ((n + 1)/2)th observation. The median formula is:

Median = Lme + c(n/2 − F(<))/fme        (3.6)

where

• Lme is the lower limit of the median class interval,

• c is the median class interval width,

• F (<) if the cumulative frequency of the interval just before the median class
interval and

• fme is the frequency of the median class interval.

Illustration - Median for ungrouped data


Given the following income data presented in a frequency distribution table, find the
median income.

Income ($) 3800 4100 4400 4900 5200 5500 6000


Number of workers 12 13 25 17 15 12 6

Solution
The number of observations is 100, which is even, thus the median is the mean of the
(n/2)th and (n/2 + 1)th observations, i.e. the mean of the 50th and 51st observations.
To find these observations we first find the cumulative frequencies of the data set.

Income ($) 3800 4100 4400 4900 5200 5500 6000


Number of workers 12 13 25 17 15 12 6
Cumulative number of workers 12 25 50 67 82 94 100

The 50th observation is 4400 and the 51st observation is 4900. Thus

Median = (4400 + 4900)/2 = 4650

Interpretation
This means 50% of the workers get income less than $4650 and another 50% get income
which is more than $4650.
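As an illustrative sketch in Python (not part of the original notes), the same median can be obtained by expanding the frequency table and averaging the two middle values:

```python
# Expand the frequency table into an ordered list of incomes, then average
# the 50th and 51st values (n = 100 is even).
incomes = [3800, 4100, 4400, 4900, 5200, 5500, 6000]
workers = [12, 13, 25, 17, 15, 12, 6]

data = [x for x, f in zip(incomes, workers) for _ in range(f)]
n = len(data)                                   # 100
median = (data[n // 2 - 1] + data[n // 2]) / 2  # 0-based 49th and 50th entries
print(median)  # 4650.0
```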

3.6.1. Median for grouped data

Given the following grouped data in a frequency table, find the median.

Income ($) 3601-3800 3801-4100 4101-4400 4401-4900 4901-5200 5201-5500 5501-6000
Number of workers 12 13 25 17 15 12 6
Cumulative frequency 12 25 50 67 82 94 100

We use the standard formula above to calculate the median of the above grouped data,
which is

Median = Lme + c(n/2 − F(<))/fme

where:

• me is the median class,

• Lme is the lower limit of median class,

• n is the sample size that is total number of observations,

• F (<) is the cumulative frequency of class prior to median class and

• c is the median class width.

We calculate the cumulative frequencies and then identify the median class, which is
the class containing the ((n + 1)/2)th observation.

Illustration - Median for grouped data


Calculate the median of following grouped marks of Statistics Test 1. The test was
marked out of 50.

Mark 0-10 10-20 20-30 30-40 40-50


Frequency 2 12 22 8 6

Solution
First, order the data set; in this case it is already ordered. Calculating the cumulative
frequencies, we get:

Mark 0-10 10-20 20-30 30-40 40-50


Frequency 2 12 22 8 6
Cumulative Frequency 2 14 36 44 50

The median position is the ((n + 1)/2)th = ((50 + 1)/2)th = 25.5th value. The 25.5th
value will lie in the 20 - 30 class interval. We use the formula

Median = Lme + c(n/2 − F(<))/fme

Where the median class is 20 - 30, c = 10, n = 50, F (<) = 14, fme = 22 and Lme = 20.
Substituting we have

Me = 20 + 10(50/2 − 14)/22
Me = 20 + 110/22
Me = 25

Interpretation
This implies that 50% of the students got less than 25 marks and the other 50% got
more than 25 marks.
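The grouped-median formula can be checked with a short Python sketch (the helper name `grouped_median` is illustrative only):

```python
def grouped_median(L_me, c, n, F_less, f_me):
    """Median = L_me + c*(n/2 - F(<))/f_me for grouped data."""
    return L_me + c * (n / 2 - F_less) / f_me

# Median class 20 - 30 of the Statistics Test 1 marks.
median = grouped_median(L_me=20, c=10, n=50, F_less=14, f_me=22)
print(median)  # 25.0
```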

Advantages of the median


The advantages of the median are that it:

• is unaffected by outliers,

• is easy to determine,

• highlights skewness and

• usually corresponds to a particular observation.

Disadvantages of the median


The disadvantages of the median are that it:

• is inappropriate for categorical data,

• does not have a useful mathematical formula and

• is best suited as a central location measure for interval-scaled data, such as rating
scales.

3.7. Quartiles
Quartiles are observations that divide an ordered data set into quarters (four equal
parts). The Lower Quartile, Q1, is the first quartile or 25th percentile. It is that
observation which separates the lower 25 percent of the ordered observations from the
top 75 percent. The Middle Quartile, Q2, is the second quartile, 50th percentile or
median. It divides an ordered data set into two equal halves. The Upper Quartile, Q3,
is the third quartile or 75th percentile. It is that observation which separates the lower
75 percent of the observations from the top 25 percent.

To compute quartiles, a formula similar to the median formula is used. The only
difference lies in (i) the identification of the quartile position, and (ii) the choice of the
appropriate quartile interval. Each quartile position is determined as follows:

For the first quartile Q1 position, use the (n/4)th value, for the Q2 position use the
(n/2)th value, and for the Q3 position use the (3n/4)th value. The appropriate quartile
interval is that interval into which the quartile position falls. Like the median
calculation, this is identified using the less-than ogive. A formula for Q1 is:

Q1 = Lq1 + c(n/4 − F(<))/fq1        (3.7)

where:

• Lq1 is the lower limit of the lower quartile class interval,

• F (<) is the cumulative frequency of the class interval before the lower quartile
interval and

• fq1 is the lower quartile interval frequency.

3.7.1. Quartiles for ungrouped data


For an ungrouped data distribution it is easy to calculate the quartiles. Simply identify
the quartile position and read off the value of the variable that lies at that position.

Illustration
Using income data below, find Q1 .

Income ($) 3800 4100 4400 4900 5200 5500 6000


Number of workers 12 13 25 17 15 12 6

Solution
Constructing a cumulative frequency table to use for the calculation, we have
n = 100, hence the Q1 position is at n/4 = 100/4 = 25th position. Arranging the number
of workers cumulatively, i.e. coming up with the cumulative distribution table below,
the 25th value lies at income $4100. Hence Q1 is $4100.
Measures of Central Tendency 43

Income ($) 3800 4100 4400 4900 5200 5500 6000


Number of workers 12 13 25 17 15 12 6
Cumulative number of workers 12 25 50 67 82 94 100

3.7.2. Quartiles for grouped data

In calculating quartiles for grouped data, use of the formula is required since the
position of the quartile falls within an interval of observations. The formula allows
us to find the exact value. Find the first, second and third quartile values from the
distribution below.

Mark 0-9 10-19 20-29 30-39 40-49


Frequency 2 12 22 8 6

The lower quartile, Q1


Using the cumulative frequency table below, the Q1 position is n/4 = 50/4 = 12.5,
i.e. the 12.5th position.

Mark 0-9 10-19 20-29 30-39 40-49
Frequency 2 12 22 8 6
Cumulative frequency 2 14 36 44 50

The Q1 interval is 10 - 19 since the 12.5th observation falls within this interval. The
formula for Q1 is:

Q1 = Lq1 + c(n/4 − F(<))/fq1

where Lq1 = 10, c = 10, n = 50, F(<) = 2 and fq1 = 12. Thus substituting into the
formula, we get:

Q1 = 10 + 10(50/4 − 2)/12 = 10 + 105/12
Q1 = 18.75        (3.8)

Interpretation
25% of the students got marks below 18.75 or 75% of the students got marks above
18.75.

3.7.3. The second quartile, Q2 (Median)

For the Q2 position, use n/2 = 50/2 = 25th position. The Q2 interval is 20 - 29 since
the 25th observation falls within these limits. The formula for Q2 is:

Q2 = Lq2 + c(n/2 − F(<))/fq2

On substituting the values in the formula, we get Q2 = 25 marks.

3.7.4. The upper quartile, Q3

The Q3 position is 3n/4 = (3 × 50)/4 = 37.5th position. The Q3 interval is 30 - 39 since
the 37.5th observation falls within this limit. The formula for Q3 is:

Q3 = Lq3 + c(3n/4 − F(<))/fq3        (3.9)

where Lq3 = 30, n = 50, F(<) = 36, fq3 = 8 and c = 10. Thus:

Q3 = 30 + 10((3 × 50)/4 − 36)/8 = 30 + 15/8
Q3 = 31.875

Interpretation:
75% of the students got below 31.875 marks. Alternatively, 25% of the students got
above 31.875 marks.
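All three quartile calculations follow the same pattern, which can be captured in one illustrative Python function (our own naming; it assumes the class boundaries 0, 10, ..., 50 used in the worked formulas above):

```python
def grouped_quantile(p, bounds, freqs):
    """Quantile at proportion p (0.25, 0.5, 0.75, ...) for grouped data.

    bounds lists the class boundaries, e.g. [0, 10, 20, 30, 40, 50].
    Uses Q = L + c*(p*n - F(<))/f, the same form as the median formula.
    """
    n = sum(freqs)
    pos = p * n                       # quantile position
    cum = 0                           # cumulative frequency F(<)
    for i, f in enumerate(freqs):
        if cum + f >= pos:            # quantile class found
            L, c = bounds[i], bounds[i + 1] - bounds[i]
            return L + c * (pos - cum) / f
        cum += f

bounds = [0, 10, 20, 30, 40, 50]
freqs = [2, 12, 22, 8, 6]
print(grouped_quantile(0.25, bounds, freqs))  # 18.75
print(grouped_quantile(0.50, bounds, freqs))  # 25.0
print(grouped_quantile(0.75, bounds, freqs))  # 31.875
```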

3.7.5. Percentiles

In general, any percentile value can be found by adjusting the median formula to: (i)
find the required percentile position, and from this (ii) establish the percentile
interval.

Illustration
90th percentile position = 0.9 × n, 35th percentile position = 0.35 × n, 29th percentile
position = 0.29 × n.

Uses of percentiles:
Percentiles are used to identify various non-central values. For example, if it is desired
to work with a truncated dataset which excludes extreme values at either end of the
ordered dataset.

3.8. Skewness

Skewness is departure from symmetry. A departure from symmetry is observed by


comparing the mean, median and mode.

1. If mean = median = mode, the frequency distribution is symmetrical. A polygon
of such data resembles a normal distribution.

2. If mean < median < mode, the frequency distribution is negatively skewed i.e.
skewed to the left.

3. If mean > median > mode, the frequency distribution is positively skewed i.e.
skewed to the right.

Remark:

1. If a distribution is distorted by extreme values (i.e. skewed) then the median or


the mode is more representative than the mean.

2. If the frequency distribution is skewed, the median may be the best measure of
central location as it is not pulled by extreme values, nor is it as highly influenced
by the frequency of occurrence.
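A small numeric sketch illustrates the mean-median-mode comparison (the dataset here is invented purely for illustration):

```python
from statistics import mean, median, mode

# An invented right-skewed sample: the outlier 40 pulls the mean above
# the median, and mean > median > mode signals positive skewness.
data = [2, 3, 3, 4, 5, 6, 40]
print(mean(data), median(data), mode(data))  # 9 4 3
```

Removing the outlier 40 brings the mean back close to the median, which is why the median is preferred for skewed data.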

3.9. Kurtosis

Kurtosis is the measure of the degree of peakedness of a distribution. Frequency dis-


tributions can be described as: leptokurtic, mesokurtic and platykurtic.

• Leptokurtic - highly peaked distribution, i.e. a heavy concentration of observations
around the central location.

• Mesokurtic - moderately peaked distribution.

• Platykurtic - flat distribution, i.e. the observations are widely spread about the
central location.

3.10. Exercises
1. The number of days in a year that employees in a certain company were away
from work due to illness is given in the following table:

Sick days Number of employees
5-6 67
7-8 91
9-10 67
11-12 5

Find the modal class and the modal sick days and interpret.

2. A company employs 12 persons in managerial positions. Their seniority (in years


of service) and sex are listed below:

Sex F M F M F M M F F F F M
Seniority (yrs) 8 15 6 2 9 21 9 3 4 7 2 10

(a) Find the seniority mean, median and mode for the above data.
(b) Which of the mean, median and mode is the least useful measure of location
for the seniority data? Give a reason for your answer.
(c) Find the mode for the sex data. Does this indicate anything about the em-
ployment practice of the company when compared to the medians for the
seniority data for males and females?
Chapter 4

Measures of Dispersion

4.1. Introduction

Spread or dispersion refers to the extent to which the observations of a random
variable are scattered about the central value. Measures of dispersion provide useful
information with which the reliability of the central value may be judged. Widely dispersed
observations indicate low reliability and less representativeness of the central value.
Conversely, a high concentration of observation about the central value increases con-
fidence in the reliability and representativeness of the central value. Measures of
dispersion include range, variance and standard deviation.

4.2. The Range

The range is the difference between the highest and the lowest observed values in a
dataset. For ungrouped dataset,

Range = Xmax − Xmin

For a grouped dataset,

Range = Upper limit of last interval − Lower limit of first interval

The range is a crude estimate of spread. It is easily calculated, but is distorted by
extreme values (outliers). An outlier would be either xmax or xmin. It is therefore a
volatile and unstable measure of dispersion. It also provides no information on the
clustering of observations within the dataset about a central value, as it uses only two
observations in its computation.

Illustration - The Range


Given the following data in a frequency distribution table, find the range.
Solution

Income $ 3800 4100 4400 4900 5200 5500 6000


Number of Workers 12 13 25 17 15 12 6

Range = Xmax − Xmin = 6000 − 3800 = 2200

For grouped distribution with class intervals, xmin is the lower limit of the lower class
interval and xmax is the upper limit of the highest class interval.

Interquartile range, IQR


This modified range, (IQR) is the difference between the upper and lower quartiles i.e.

Interquartile Range = Q3 − Q1

This modified range removes some of the instability inherent in the range if outliers
are present, but it excludes 50 percent of all observations from further analysis. This
measure of dispersion, like the range, also provides no information on the clustering
of observations within the dataset as it uses only two observations.

Quartile deviation
A measure of variation based on this modified range is called quartile deviation (QD)
or the semi-interquartile range. It is found by dividing the interquartile range in half
i.e.
Q3 − Q1
Quartile deviation =
2
Remember, when calculating this measure, to order your dataset first before calculating
Q3 and Q1. The quartile deviation is an appropriate measure of spread for the median. It
identifies the range below and above the median within which 50 percent of observa-
tions are likely to fall. It is a useful measure of spread if the sample of observations
contains excessive outliers as it ignores the top 25 percent and bottom 25 percent of
the ranked observations.
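A minimal Python sketch of the range, interquartile range and quartile deviation (the sample data and the simple n/4, 3n/4 position rule are our own illustrative choices; quartile conventions vary between textbooks and software):

```python
# Illustrative sample of n = 8 observations, ordered first.
data = sorted([13, 7, 10, 15, 12, 18, 9, 21])

rng = max(data) - min(data)          # range = 21 - 7 = 14

q1 = data[len(data) // 4 - 1]        # 2nd ordered value (n/4 = 2)
q3 = data[3 * len(data) // 4 - 1]    # 6th ordered value (3n/4 = 6)
iqr = q3 - q1                        # interquartile range
qd = iqr / 2                         # quartile deviation
print(rng, iqr, qd)  # 14 6 3.0
```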

4.3. The Variance

The most useful and reliable measures of dispersion are those that take every observa-
tion into account and are based on an average deviation from a central value. Variance
is such a measure of dispersion. Population variance is denoted by σ 2 whereas sample
variance is denoted by s2 .

Variance for ungrouped data



Sample variance for ungrouped data is given by:

s² = Σ(xi − x̄)²/(n − 1)        (4.1)

A computational formula for sample variance is given by:

s² = (Σxi² − n x̄²)/(n − 1)

Population variance is given by:

σ² = Σ(xi − µ)²/N        (4.2)

A computational formula is given by:

σ² = (Σxi² − N µ²)/N

The main difference between the computational formulae for the sample and population
variances lies in the denominator: the population variance divides the numerator by
N whereas the sample variance divides by n − 1.

Illustration - The Variance


Consider ages, in years, of 7 cars: 13, 7, 10, 15, 12, 18, 9. Find the sample variance of
the ages of cars.

Solution
Step 1: Find the sample mean, x̄ = Σxi/n = 84/7 = 12 years.

Step 2: Find the squared deviation of each observation from the sample mean. See
table below.

Car age, xi   Mean, x̄   Deviation (xi − x̄)   Deviation squared (xi − x̄)²
13            12        (13 − 12) = +1        (1)² = 1
7             12        (7 − 12) = −5         (−5)² = 25
10            12        −2                    4
15            12        +3                    9
12            12        0                     0
18            12        +6                    36
9             12        −3                    9
                        Σ(xi − x̄) = 0         Σ(xi − x̄)² = 84

Step 3: Find the average squared deviation, that is the variance, using the formula:

s² = Σ(xi − x̄)²/(n − 1) = 84/(7 − 1) = 14 years²

Note
Division by n would appear logical, but the variance statistic would then be a biased
measure of dispersion. It can be shown to be unbiased if division is by (n − 1). For large
samples, i.e. for n greater than 30, this distinction becomes less important.

Alternatively, use the formula s² = (Σxi² − n x̄²)/(n − 1); you will get the same value.
With Σxi² = 1092, Σxi = 84, n = 7 and x̄ = 12, substituting the values into the formula:

s² = (1092 − 7(12²))/(7 − 1) = 84/6 = 14 years²
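Both variance formulas can be checked on the car-age data with a short Python sketch (illustrative only, not part of the notes):

```python
# Car ages from the worked example; both variance formulas agree.
ages = [13, 7, 10, 15, 12, 18, 9]
n = len(ages)
xbar = sum(ages) / n                                    # 12.0

s2_def = sum((x - xbar) ** 2 for x in ages) / (n - 1)   # definitional form
s2_comp = (sum(x * x for x in ages) - n * xbar ** 2) / (n - 1)
print(s2_def, s2_comp)  # 14.0 14.0
```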

Variance for grouped data


Grouped data is data presented in a frequency distribution table. If the data is given
in class intervals, use the midpoints of the class intervals to represent the x-values. The
midpoint of a class interval is calculated as:

Midpoint = (Lower limit + Upper limit)/2

Sample variance for such grouped data is calculated using the formula:

s² = Σf(xi − x̄)²/(n − 1)        (4.3)

which is simplified to give:

s² = (Σf xi² − n x̄²)/(n − 1)

Population variance is given by:

σ² = (Σf xi² − N µ²)/N
Illustration
Consider data for student marks obtained from Test 1. Calculate the sample variance
of the student marks shown below.
Marks 0-10 10-20 20-30 30-40 40-50
Frequency 2 12 22 8 6

Solution
The midpoint of each class interval is calculated as:

Midpoint = (Lower limit + Upper limit)/2

Marks, x   Frequency, fi   Midpoint, xi   fi xi   xi²    fi xi²
0-10       2               (0+10)/2 = 5   10      25     50
10-20      12              15             180     225    2700
20-30      22              25             550     625    13750
30-40      8               35             280     1225   9800
40-50      6               45             270     2025   12150
Total      50                             1290           38450

Mean, x̄ = Σf x/n = 1290/50 = 25.8 and Σf x² = 38450. Using the above formula, the
sample variance is:

s² = (Σf xi² − n x̄²)/(n − 1)
s² = (38450 − 50(25.8)²)/(50 − 1) = 5168/49
s² = 105.47 marks²

The variance is a measure of average squared deviation about the arithmetic mean.
It is expressed in squared units. Consequently, the meaning in a practical sense is
obscure. To provide meaning, the measure should be expressed in the original units of
the random variable.

4.4. The Standard deviation

A standard deviation is a measure which expresses the average deviation about the
mean in the original units of the random variable. The standard deviation is the
square root of the variance. Mathematically the standard deviation is calculated as:

A sample standard deviation is:

sx = √(sample variance) = √s²

sx = √[(Σf xi² − n x̄²)/(n − 1)]        (4.4)

A population standard deviation is:

σ = √[(Σf xi² − N µ²)/N]        (4.5)

The standard deviation is a relatively stable measure of dispersion across different


samples of the same random variable. It is therefore a rather powerful statistic. It
describes how the observations are spread about the mean.

4.5. The Coefficient of variation (CV)


From a sample, the coefficient of variation, CV, is defined as follows:

CV = (s/x̄) × 100%

whereas a population coefficient of variation is:

CV = (σ/µ) × 100%
This ratio describes how large the measure of dispersion is relative to the mean of
the observation. A coefficient of variation value close to zero indicates low variability
and a tight clustering of observations about the mean. Conversely, a large coefficient
of variation value indicates that observations are more spread out about their mean
value. From our example above,

CV = (s/x̄) × 100% = (10.27/25.8) × 100% = 39.8%
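The grouped variance, standard deviation and coefficient of variation for the Test 1 marks can be reproduced in a few lines of Python (illustrative sketch):

```python
# Class midpoints and frequencies for the Test 1 marks.
midpoints = [5, 15, 25, 35, 45]
freqs = [2, 12, 22, 8, 6]

n = sum(freqs)                                                # 50
xbar = sum(f * x for f, x in zip(freqs, midpoints)) / n       # 25.8
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))    # 38450

s2 = (sum_fx2 - n * xbar ** 2) / (n - 1)  # grouped sample variance
s = s2 ** 0.5                             # standard deviation
cv = s / xbar * 100                       # coefficient of variation, %
print(round(s2, 2), round(s, 2), round(cv, 1))  # 105.47 10.27 39.8
```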

4.6. Exercises
1. Find the mean and standard deviation for the following data which records the
duration, in minutes of 20 telephone calls for technical advice on car repairs by a
mechanic.
Duration Number of calls
0-≤1 7
1-≤2 0
2-≤3 3
3-≤4 1
4-≤5 9

At a cost of $2.60 per minute, what was the average cost of a call, and what
was the total cost paid by the 20 telephone callers. Calculate the coefficient of
variation and interpret it.

2. Employee bonuses earned by workers at a furniture factory in a recent month


(US$) were:

47 31 42 33 58 51 25 28 62 29 65 46
51 30 43 72 73 37 29 39 53 61 52 35

From the table above, find the

(a) Mean and standard deviation of bonuses.



(b) Interquartile range and quartile deviation.


(c) Coefficient of variation and comment.

3. Give three reasons why the standard deviation is regarded as a better measure
of dispersion than the range.

4. Discuss briefly which measure of dispersion would you use if the:

(a) mean is used as the measure of central location and why?


(b) median is used as a measure of central location and why?

5. Discuss the limitations of the range as a measure of dispersion.

6. Define the following terms as they are used in statistics.

(a) Outliers
(b) Skewness
(c) Kurtosis
Chapter 5

Basic Probability

5.1. Introduction

This unit introduces basic concepts and terminologies in probability. They include
events, types and rules of probabilities. Probability theory is fundamental to the area
of statistical inference. Inferential statistics deals with generalising the behaviour of
random variables from sample findings to the broader population. Probability theory
is used to quantify the uncertainties involved in making these generalisations.

5.2. Definition of terms

An event is a collection of possible outcomes from an experiment or a trial. For
example, Head or Tail are events which can be obtained from an experiment of tossing
a fair coin.

An experiment is a process which generates events or outcomes. For instance, tossing
a fair die three times constitutes an experiment.

Probability is the chance or likelihood of a particular outcome out of a number of
possible outcomes occurring for a given event. Thus probability is a number between
0 and 1, which quantifies likelihood of an event occurring or not occurring. Therefore
probability range is 0 ≤ P (A) ≤ 1, where A is an event of a specific type.

Most decisions are made in the face of uncertainty. Probability is therefore, concerned
with uncertainty.

5.3. Approaches to probability theory

There are two broad approaches to probability namely subjective and objective.

Subjective probability
This is probability based on a personal judgement that a given event will occur. There
is no theoretical or empirical basis for producing subjective probabilities. In other
words, this is the probability of an event based on an educated guess, expert opinion
or just plain intuition. Subjective probabilities cannot be statistically verified and are
not extensively used, hence they will not be considered further in this module.

Examples

1. When commuters board a commuter omnibus, they assume that they will arrive
safely at their destinations, so P(arriving safely) = 1.

2. If you invest some money, you assume that you will get a good return, so P (good
return) = 0.9.

Objective probabilities
These are probabilities that can be verified, through repeated experimentation or em-
pirical observations. Mathematically it is defined as a ratio of two numbers:

r
P (A) =
n

Where:

• A is an event of a specific type,

• r is number of expected outcomes of event A,

• n is total number of possible outcomes and

• P(A) is probability of event A occurring.

Objective probabilities are derived either:

• a priori - that is, when possible outcomes are known in advance, such as tossing
a coin or selecting cards from a deck of cards. Classical probability is given as:

P(A) = Number of outcomes favouring event A / Total number of possible outcomes

For example, the probability of a Head if a fair coin is tossed once is
P(Head) = 1/2 = 0.5.

• Empirically - that is, when the values of r and n are not known in advance and
have to be observed through data collection; from a relative frequency table you
can deduce the probability of the different outcomes.

P(A) = Number of times event A has occurred / Number of times event A could have occurred

For instance, if out of a random sample of 90 customers buying bread 50 said
they prefer Bakers Inn bread, then the relative frequency that a randomly selected
customer will prefer Bakers Inn bread is 50/90 = 0.56.

• Theoretically - that is, through the use of theoretical probability distribution
functions. These are mathematical formulae that can be used to compute
probabilities for certain event types.

Objective probabilities are used extensively in statistical analysis.

5.4. Properties of probability

For a given event to follow a probability distribution, the following properties should
hold.

1. A probability value lies only between 0 and 1 that is 0 ≤ P (A) ≤ 1.

2. If an event A cannot occur i.e an impossible event, then P (A) = 0

3. If an event A is certain to occur, then P (A) = 1

4. The sum of the probabilities of all possible outcomes of a random experiment
equals one, that is, ΣP(Ei) = 1, where the sum runs over all n possible outcomes.

5. Complementary probabilities: If P(A) is the probability of event A occurring,
then the probability of event A not occurring is the complement of event A and is
defined as: P(Ac) = 1 − P(A). Note: P(Ac) is also sometimes written as P(Ā) or
P(A′).

Illustration
Consider random process of drawing cards from a deck of cards. These probabilities
are called a priori probabilities.

1. Let A = event of selecting a red card. Then P(Red card) = 26/52 = 1/2 (26 possible
red cards out of 52 cards).

2. Let B = event of selecting a spade. Then P(Spade) = 13/52 = 1/4 (13 possible
spades out of a total of 52).

3. Let C = event of selecting an ace. Then P(Ace) = 4/52 = 1/13 (4 possible aces out
of a total of 52 cards).

4. Let D = event of selecting 'not an ace'. Then P(not an ace) = 1 − P(Ace) =
1 − 1/13 = 12/13.
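These a priori card probabilities can be checked exactly with Python's `Fraction` type (an illustrative sketch, not part of the notes):

```python
from fractions import Fraction

# 52 equally likely cards, so each probability is a ratio of counts.
p_red = Fraction(26, 52)      # 26 red cards
p_spade = Fraction(13, 52)    # 13 spades
p_ace = Fraction(4, 52)       # 4 aces
p_not_ace = 1 - p_ace         # complement rule
print(p_red, p_spade, p_ace, p_not_ace)  # 1/2 1/4 1/13 12/13
```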

5.5. Basic probability concepts

1. Intersection of two events - The intersection of two events A and B is the set
of outcomes that belong to both A and B simultaneously. It is written as A ∩ B or
A and B and the keyword is and.

2. Union of two events - The union of events A and B is the set of outcomes that
belong to either A or B or both and the key word is or. It is written as A ∪ B.

3. Complement of an event - The complement of an event A is the collection of all
possible outcomes that are not contained in event A but are in the universal set.
That is, P(Ac) = 1 − P(A). Note: P(Ac) is also sometimes written as P(Ā) or P(A′).
In other words, P(A) + P(A′) = 1.

5.6. Types of events

1. Mutually exclusive or disjoint events


These are events which cannot occur at the same time. The occurrence of one
event automatically prevents the occurrence of the other event. For mutually ex-
clusive events the intersection of events is empty i.e. there are no common events.

Examples

(a) Passing and failing the same examination are mutually exclusive. In other
words, it is not possible to pass and fail the same examination at the same time.
(b) In tossing a fair die once, getting a 3 and a 5 are mutually exclusive. You get
one outcome at time and not both.

2. Non-mutually exclusive events


These are events which can occur simultaneously. The occurrence of one event
does not prevent the occurrence of the other events. The intersection of the
events is non-empty.

Examples

(a) In tossing a fair die once, getting an odd number or a number greater than
2 are non mutually exclusive events i.e. it is possible for the number to be
odd and at the same time being greater than 2.
(b) An individual can have more than one bank account i.e. if you open a bank
account it does not prevent you from opening another account with another
bank.

3. Collectively exhaustive events


Events are said to be collectively exhaustive when the union of all possible events
is equal to the sample space. This means that, in a single trial of a random ex-
periment, at least one of these events is certain to occur.

Example
Consider a random experiment of selecting companies from the Zimbabwe Stock
Exchange. Let event A = small company, event B = medium company and event
C = large company. Then (A ∪ B ∪ C) = sample space (small, medium, large
companies) = all ZSE companies.

4. Statistically independent events


Two events are said to be statistically independent if the occurrence of event A
has no effect on the outcome of event B occurring and vice versa.

Example
Let A = event that an employee is over 30 years of age, and B = event that the
employee is female. If it can be assumed, or empirically verified, that a randomly
selected employee over 30 years of age from a large organisation is equally likely
to be either a male or a female employee, then the two events A and B are
statistically independent.

5. Statistically dependent events


Events are dependent if the occurrence of one event A affects the occurrence of a
second event B. These will be discussed under conditional probability.

Remark
The terms statistically independent and mutually exclusive events should
not be confused. They are two very different concepts. When two events are
mutually exclusive, they are NOT statistically independent. They are dependent
in the sense that if one event happens, then the other event cannot happen. In
probability terms, the probability of the intersection of two mutually exclusive
events is zero, while the probability of two independent events is equal to the
product of the probabilities of the separate events.

5.7. Laws of probability

There are generally two laws in probability theory, namely, addition and multiplica-
tion Laws.

5.7.1. Addition Law


Addition laws pertain to mutually and non-mutually exclusive events only. The key
word is OR. What is the probability that event A OR B will occur:

• Addition Law for Mutually Exclusive events:


P (A or B) = P (A) + P (B)

P (A ∪ B) = P (A) + P (B)
Note: Use of Union (∪) means or

Example
What is the probability of getting a 5 or 6 if a fair die is tossed once?

Solution
The sample space has six possible outcomes: 1, 2, 3, 4, 5, 6. Therefore
P(5 or 6) = P(5) + P(6) = 1/6 + 1/6 = 1/3.

• Addition Law for Non-mutually Exclusive events


P (A or B) = P (A) + P (B) − P (A and B)
P (A ∪ B) = P (A) + P (B) − P (A ∩ B)

The intersection sign, ∩ means the joint probability of events A and B. P (A and B)
is subtracted to avoid double counting.

Example
What is the probability of getting an even number or a number less than four if
a fair die is tossed once?

Solution
Let event A = getting an even number, with elements 2, 4, 6, and event B = getting a number less than four, with elements 1, 2, 3. Then P (A) = 3/6 and P (B) = 3/6. The only element common to A and B is 2, so P (A and B) = 1/6. Therefore

P (A or B) = P (A) + P (B) − P (A ∩ B) = 3/6 + 3/6 − 1/6 = 5/6
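The worked answer can be checked by brute-force enumeration of the sample space. A minimal Python sketch (the helper `p` is ours, not from the text):

```python
# Enumerate one toss of a fair die and verify the addition law for the
# non-mutually exclusive events A = "even" and B = "less than four".
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]
A = {x for x in outcomes if x % 2 == 0}   # {2, 4, 6}
B = {x for x in outcomes if x < 4}        # {1, 2, 3}

def p(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(outcomes))

# P(A or B) = P(A) + P(B) - P(A and B)
p_union = p(A) + p(B) - p(A & B)
print(p_union)               # 5/6
print(p(A | B) == p_union)   # True: direct count agrees with the law
```

Counting the union directly gives the same 5/6, which is exactly why P (A and B) must be subtracted once.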
Exercise 1
Sixty per cent of the population of a town read either magazine A or magazine B, and 10% read both. If 50% read magazine A, what is the probability that one person, selected at random, reads magazine B?

5.7.2. Multiplication laws

Multiplication laws pertain to dependent and independent events. The key word is
AND

1. For independent events: P (A and B) = P (A) × P (B)

Illustration
What is the probability of getting a tail on each coin when two fair coins are tossed at the same time?

Solution
Define events T1 = getting a tail from the first coin and T2 = getting a tail from the second coin. The two outcomes do not affect each other. Therefore:

P (T1 and T2) = P (T1) × P (T2) = 1/2 × 1/2 = 1/4

2. Dependent events will be discussed in the section on conditional probability.

5.8. Types of probabilities

Objective probabilities can be classified into 3 categories, namely:

• Marginal probability,

• Joint probability and

• Conditional probabilities

Marginal probability
It is the probability of only a single event A occurring regardless of certain conditions
prevailing. It is written as P(A). A frequency distribution describes the occurrence of
only one characteristic of interest at a time and is used to estimate marginal probabil-
ities.

Joint probability
It is the chance that two or more events will occur simultaneously. It is the occurrence
of more than one event at the same time. If the joint probability on any two events is
zero, then the events are mutually exclusive.

Conditional probability
It is the probability that a given event occurs given that another event has already
occurred. P (A|B) means the probability that event A will occur given that event B has
already occurred.

P (A ∩ B)
P (A|B) = (5.1)
P (B)

This is possible provided P (B) > 0, and similarly

P (B ∩ A)
P (B|A) = (5.2)
P (A)

provided P (A) > 0

Note
P (A ∩ B) = P (B|A) × P (A) = P (A|B) × P (B). P(A and B) is the joint probability of
events A and B. P (B) is the probability of event B, which is a marginal probability.

Illustration - Joint marginal and Conditional probabilities


Consider the table below showing fees payment methods by University students dis-
tributed by sex

Sex
Payment Method Total
Male Female
Credit Card 10 15 25
Cash 8 6 14
Total 18 21 39

What is the probability of getting a person who is

a) (i) Female and uses a credit card?


(ii) Male and uses cash?

b) (i) Credit card user?


(ii) Female?

c) (i) Female given that she uses cash?


(ii) Credit card user given that he is a male?

Solution

1. This is a joint probability of the events. The sample space has 39 people altogether.
(i) P(female and credit card) = 15/39 = 0.3846.

Note: The two events should not be confused with independent events. In this case find the value in the intersection of the female column and the credit card row, which is 15.

(ii) P(male and cash) = 8/39 = 0.2051.
It is the chance of two events occurring at the same time.

2. (i) P(credit card user) = 25/39 = 0.6410.

This question requires a marginal probability; the prevailing conditions are sex and payment method.

Note: The prevailing condition which has been ignored is sex.

(ii) P(female) = 21/39 = 0.5385.
The condition which has been ignored is payment method. For joint probabilities, consider values inside the table as ratios of the grand total, 39. For marginal probabilities, consider row and column totals as ratios of the grand total.

3. This question requires conditional probability.

i) P(female | cash user) = P (female and cash user) / P (cash user) = (6/39) / (14/39) = 6/14 = 0.4286

ii) P(credit card | male) = P (credit card and male) / P (male) = (10/39) / (18/39) = 10/18 = 0.5556
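The joint, marginal and conditional probabilities can all be read off the table by counting. A sketch in Python (the dictionary layout of the table is ours):

```python
# Joint, marginal and conditional probabilities from the fees-payment table.
counts = {("credit", "male"): 10, ("credit", "female"): 15,
          ("cash", "male"): 8, ("cash", "female"): 6}
n = sum(counts.values())   # 39 students in total

joint_female_credit = counts[("credit", "female")] / n         # 15/39
marginal_credit = (counts[("credit", "male")]
                   + counts[("credit", "female")]) / n         # 25/39
# P(female | cash) = P(female and cash) / P(cash)
cond_female_given_cash = (counts[("cash", "female")] / n) / (
    (counts[("cash", "male")] + counts[("cash", "female")]) / n)

print(round(joint_female_credit, 4))     # 0.3846
print(round(marginal_credit, 4))         # 0.641
print(round(cond_female_given_cash, 4))  # 0.4286
```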

Exercise 2
A golfer has 12 golf shirts in his closet. Suppose 9 of these shirts are white and the others are blue. He gets dressed in the dark, so he just grabs a shirt and puts it on. He does this for two days in a row, taking a fresh shirt each day, putting the worn one in the washing basket, and not doing laundry. What is the likelihood that both shirts selected are white?

5.9. Contingency Tables


A contingency table is a cross-tabulation that simultaneously summarises two variables of interest. The level of measurement can be nominal. It is a table that is used to classify sample observations according to two or more identifiable characteristics.

Exercise
A survey of 150 students classified each student according to gender and the number of movies watched in a month. Each respondent is classified according to two criteria, that is, the number of movies watched and gender.

Gender
Movie watched Total
Male Female
0 20 40 60
1 40 30 70
2 or more 10 10 20
Total 70 80 150

Find the probability that, in a given month, a selected student

i) is a male student.

ii) watched movie only once.

iii) is a female student who watched a movie more than once.

iv) has not watched any movie given that he is a male student.

v) is a female student given that she has only watched a movie once.

5.10. Tree diagrams


A tree diagram is a graph that is helpful in presenting information for determining probabilities of events that involve several stages. Each branch in the tree represents an event, and the branches are weighted by probabilities. The probabilities depend on the selection procedure, which may involve replacement or non-replacement of items; the selection procedure therefore affects the resulting probabilities.

Illustration
Referring to the previous illustration of 150 students, use a tree diagram to find the probability of selecting a male student given that he has seen one movie.

5.11. Counting rules


Probability computations involve counting the number of successful outcomes (r) and
the total number of possible outcomes (n) and expressing them as a ratio. Often the
values of r and n are not feasible to count because of the large number of possible
outcomes involved. Counting rules assist in finding values of r and n. There are three
basic counting rules:

• Multiplication rule,

• Permutations and

• Combinations.

5.11.1. Multiplication rule


The multiplication rule is applied in two ways:

a) The total number of ways in which n objects can be arranged in order is given by:
n! read as n f actorial where n! = n(n − 1)(n − 2)(n − 3) . . . 3.2.1
Note that 0! = 1.

Illustration
The number of different ways in which 7 horses can complete a race is given by:
7! = 7.6.5.4.3.2.1 = 5040 different ways.

b) If a particular random process has

• n1 possible outcomes on the first trial


• n2 possible outcomes on the second trial and
• nj possible outcomes on the last trial.

Then the total number of outcomes for the j trials is: n1 × n2 × n3 × . . . × nj

Illustration
A restaurant menu has a choice of 4 starters, 10 main courses and 6 desserts. What is
the total number of meals that can be ordered in this restaurant.

Solution
The total number of possible meals that can be ordered is: 4 × 10 × 6 = 240 meals.
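The rule can be confirmed by listing every possible meal. A small sketch (the menu items are represented by hypothetical indices):

```python
# Menu count: multiplication rule vs explicit enumeration of every
# (starter, main course, dessert) combination.
from itertools import product

n_starters, n_mains, n_desserts = 4, 10, 6
by_rule = n_starters * n_mains * n_desserts

meals = list(product(range(n_starters), range(n_mains), range(n_desserts)))
print(by_rule, len(meals))   # 240 240
```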

5.11.2. Permutations
A permutation is the number of distinct ways in which a group of objects can be arranged. Each possible ordered arrangement is called a permutation. In a permutation, the order is important, such that ABC, ACB and CBA are considered different. The number of ways of arranging r objects selected from n objects, where ordering is important, is given by the formula:

Prn = n!/(n − r)!     (5.3)

Where n! = n factorial = n(n − 1)(n − 2)(n − 3) . . . 3.2.1, r = number of objects selected at a time and n = total number of objects.

Illustration
10 horses compete in a race.

(i) How many distinct ways are there of the first 3 horses past the post?

(ii) What is the probability of predicting the order of the first 3 horses past the post?

Solution

(i) Since the order of the 3 horses is important, it is appropriate to use the permutation formula.
That is: Prn = 10!/(10 − 3)! = 720
There are 720 distinct ways of selecting 3 horses out of 10 horses.

(ii) The probability of predicting the order of the first 3 horses past the post is:
P(first 3 horses in order) = 1/720, that is a 1-in-720 chance of winning.

5.11.3. Combinations

A combination is the number of different ways of arranging a subset of objects se-


lected from a group of objects where ordering is not important. In a combination, ABC
is similar to ACB, BAC, BCA, CAB and CBA. Each possible arrangement is called a
combination. The number of ways of arranging r objects selected from n objects, not
considering order, is given by the formula:

Crn = n!/((n − r)! r!)     (5.4)

Where n! is defined as before, r is number of objects selected and n is total number of


objects.

Illustration
10 horses compete in a race.

(i) How many ways are there of the first 3 horses past the post, not considering the
order in which the first three pass the post?

(ii) What is the probability of predicting the first 3 horses past the post, in any order?

Solution

(i) The order of the first 3 horses is not important, hence apply the combination for-
mula.
Crn = n!/((n − r)! r!) = 10!/((10 − 3)! 3!) = 120
There are 120 different ways of selecting the first 3 horses out of 10 horses, with-
out regard to order.

(ii) The probability of selecting the first 3 horses past the post, disregarding order, is given by
P(first 3 horses) = 1/120, that is a 1-in-120 chance of winning.
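Both counts are available directly in Python's standard library (`math.perm` and `math.comb`, Python 3.8+), which lets the two horse-race answers be checked:

```python
# Horse-race counts: ordered (permutation) vs unordered (combination).
import math

n, r = 10, 3
print(math.perm(n, r))       # 720 ordered finishes for the first three
print(math.comb(n, r))       # 120 unordered selections
print(1 / math.perm(n, r))   # one guess at the exact order: 1-in-720
```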

5.12. Exercise

1 Find the values of:

(a) P47
(b) C28

2 There are 5 levels of shelving in a supermarket. If 3 brands of soup must each


be placed on a separate shelf, how many different ways can a packer arrange the
soup brands?

3 In an examination a student is asked to answer three questions from an examination paper containing eight questions. How many different selections are possible?

4 A company has 12 products in its product range. It wishes to advertise in the


local newspaper, but due to space constraints, it is only allowed to display 7 of
its products at a time. How many different ways can this company compose a
display in the local newspaper?
Chapter 6

Probability Distributions

6.1. Introduction

This unit studies probability distributions. A probability distribution gives the entire range of values that can occur based on an experiment. A probability distribution is similar to a relative frequency distribution; however, instead of describing the past, it describes a likely future event. For instance, a drug manufacturer may claim a treatment will cause weight loss for 80% of the population. A consumer protection agency may test the treatment on a sample of six people. If the manufacturer's claim is true, it is almost impossible to have an outcome where no one in the sample loses weight, and it is most likely that 5 out of the 6 do lose weight.

6.2. Definition

A probability distribution is a listing of all the possible outcomes of an experiment and


their associated probabilities.

6.3. Random variables

A random variable is a function whose value is a real number determined by each element in the sample space. In other words, it is a quantity resulting from an experiment that, by chance, can assume different values. There are two types of random variables: discrete random variables and continuous random variables.

Discrete random variable is a variable that can assume a countable number of


values. Examples include:

• Number of defective light bulbs obtained when three light bulbs are selected at
random from a consignment could be 0, 1, 2, or 3.

• The number of employees absent from the day shift on Monday.



• The daily number of accidents that occur in a given city.

Continuous random variable is a variable that can assume values corresponding


to any of the points contained in one or more intervals. Examples include:

• The waiting time for customers to receive their order at a manufacturing com-
pany.

• Tire pressure of an automobile (in kPa).

• The height of each student in this class.

The choice of a particular probability distribution function depends primarily on the


nature of the random variable (i.e. discrete or continuous) under study. Thus we have either discrete or continuous probability distributions.

6.4. Random variable probability distributions

The probability distribution of a discrete random variable is a graph, table or formula


that specifies the probability associated with each possible value the random variable
can assume. The random variable assumes a countable number of values.

Illustration
Find the probability distribution of the sum of numbers when a pair of dice is thrown.

Solution
Let X be a random variable whose values x are the possible totals of the outcomes of the two dice. Then x can be an integer from 2 to 12. Two dice can fall in 6 × 6 = 36 ways, each with a probability of 1/36. For example, P (X = 3) = 2/36 since a total of 3 can occur in two ways, that is (1, 2) or (2, 1). The probability distribution is shown in the table below:

x        2     3     4     5     6     7     8     9     10    11    12
P(X=x)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
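The table can be reproduced by enumerating all 36 ordered outcomes. A sketch using the standard library (the names `sums` and `dist` are ours):

```python
# Distribution of the sum of two dice, built by enumerating all 36 outcomes.
from collections import Counter
from fractions import Fraction
from itertools import product

sums = Counter(a + b for a, b in product(range(1, 7), repeat=2))
dist = {s: Fraction(c, 36) for s, c in sorted(sums.items())}

print(dist[3])             # 2/36, printed reduced as 1/18
print(dist[7])             # 6/36, printed reduced as 1/6
print(sum(dist.values()))  # 1
```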

Exercise 1

1 Three coins are tossed all at once. Let X be the number of heads obtained. Find
the probability distribution of X.

2 Suppose you are instead interested in the number of tails showing face up. What is the probability distribution for the number of tails?

6.5. Properties of discrete random variable distribution


Let X be a discrete random variable that can assume values x1 , x2 , x3 , . . . , xn . Then
P (X = x) is a probability distribution if:

i. The probability of any value of x is never negative and never exceeds 1, that is 0 ≤ P (X = x) ≤ 1.

ii. P (X = x1) + P (X = x2) + . . . + P (X = xn) = Σ_{i=1}^{n} P (X = xi) = 1. The sum of the probabilities of the discrete random variable should be equal to 1.

iii. The mean or expected value of a discrete random variable X is x̄, given by:

x̄ = E(X) = Σ_{all x} xi P (X = xi)

iv. The variance of a discrete random variable X is

Var(X) = s² = E(X − µ)² = Σ_{all x} (xi − µ)² P (X = xi)

which simplifies to:

Var(X) = E(X²) − (E(X))², that is s² = E(X²) − x̄²

Illustration
Consider the following probability distribution for a discrete random variable, X. Verify

x 0 1 2 5 10
P (X = xi ) 0.05 0.25 0.30 0.20 0.20

the probability properties and find the standard deviation of the distribution.

Solution

i. All P (X = xi ) are between 0 and 1, for x = 0, 1, 2, 5, 10.

ii. Sum of probabilities should be equal to 1.

Σ P (X = xi) = P (X = x1) + P (X = x2) + . . . + P (X = x5)
             = 0.05 + 0.25 + 0.30 + 0.20 + 0.20
             = 1

iii. The mean is given by:

x̄ = Σ xi P (X = xi) = 0 × 0.05 + 1 × 0.25 + 2 × 0.30 + 5 × 0.20 + 10 × 0.20 = 3.85

iv. The variance is given by

Var(X) = E(X − µ)² = Σ_{i=1}^{n} (xi − µ)² P (X = xi)
s² = (0 − 3.85)² × 0.05 + (1 − 3.85)² × 0.25 + . . . + (10 − 3.85)² × 0.20
s² = 11.6275

v. Standard deviation, s = √Var(X) = √11.6275 = 3.410
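The whole worked example, the probability check, mean, variance and standard deviation, can be verified directly from the definitions. A sketch (the `pmf` dictionary mirrors the table above):

```python
# Mean, variance and standard deviation of the worked distribution.
import math

pmf = {0: 0.05, 1: 0.25, 2: 0.30, 5: 0.20, 10: 0.20}
assert abs(sum(pmf.values()) - 1) < 1e-12   # property (ii): probabilities sum to 1

mean = sum(x * p for x, p in pmf.items())
var = sum((x - mean) ** 2 * p for x, p in pmf.items())
sd = math.sqrt(var)

print(round(mean, 2), round(var, 4), round(sd, 3))   # 3.85 11.6275 3.41
```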

6.6. Probability terminology and notation


a) At most 3, means not more than 3. Here 3 is an arbitrary number, it therefore
means 3 is the maximum discrete value which can be assumed by a random
variable. Let X be the random variable, taking x = 0, 1, 2, 3, . . . , n, where n is the
sample size. Notation: P (X ≤ 3) = P (X = 0) + P (X = 1) + P (X = 2) + P (X = 3).

b) At least 3, means not less than 3. The minimum that can be assumed is 3 since
3 is not less than itself. Notation: P (X ≥ 3) = P (X = 3) + P (X = 4) + P (X =
5) + . . . + P (X = n).

c) Less than 3 this effectively means values below 3, and 3 is not included. Notation:
P (X < 3) = P (X = 0) + P (X = 1) + P (X = 2).

d) More than 3, means values above 3, in discrete terms it is from 4 upwards. No-
tation: P (X > 3) = P (X = 4) + P (X = 5) + P (X = 6) + . . . + P (X = n) or using
the complimentary rule it is given as 1 − P (X ≤ 3).

e) Exactly 3, it means equal to 3. Notation: P (X = 3).

f) Between 3 and 6 means the discrete values between 3 and 6, which are 4 and 5. However it should be noted that the limits can be exclusive or inclusive. Notation for exclusive: P (3 < X < 6) = P (X = 4) + P (X = 5). Notation for inclusive: P (3 ≤ X ≤ 6) = P (X = 3) + P (X = 4) + P (X = 5) + P (X = 6).
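Each phrase translates to a sum over part of a pmf. A sketch using the two-dice distribution from earlier as a stand-in (the helper `prob` is ours):

```python
# "At most", "at least", "between": cumulative sums over a pmf, using the
# two-dice sums distribution as a concrete stand-in.
from fractions import Fraction

pmf = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}

def prob(pred):
    """P(pred(X)): sum the probabilities of the values where pred holds."""
    return sum(p for x, p in pmf.items() if pred(x))

print(prob(lambda x: x <= 3))     # at most 3: 1/12
print(prob(lambda x: x >= 10))    # at least 10: 1/6
print(prob(lambda x: 3 < x < 6))  # strictly between 3 and 6: 7/36
print(prob(lambda x: x > 9) == 1 - prob(lambda x: x <= 9))  # complementary rule: True
```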

Exercise 2
Consider the following probability distribution that characterises a marketing ana-
lyst’s belief concerning the probabilities associated with the number, x of sales that a
company might expect per month for a new super computer:

x 0 1 2 3 4 5 6 7 8
P(X=x) 0.02 0.08 0.15 0.19 0.24 0.17 0.10 0.04 0.01

a What is the probability that the company will sell:

i At most three computers.


ii At least three computers.
iii Less than three computers.
iv More than three computers.
v Exactly four computers.
vi Between three and seven computers inclusive.

b Find the mean, variance and standard deviation of X.

6.7. Discrete probability distributions


For the purpose of this course we will focus on three special and commonly used discrete probability distributions: the Bernoulli, Binomial and Poisson distributions. In discrete probability distributions, the random variable X takes whole numbers starting at zero and follows certain conditions. A specific formula for each distribution is used to calculate the probability of the given random variable. The term discrete means the random variable takes integer values 0, 1, 2, 3, . . . .

6.7.1. Bernoulli distribution

In probability theory the Bernoulli distribution, named after the Swiss scientist Jacob Bernoulli, is the probability distribution of a random variable which takes the value 1 with success probability p and the value 0 with failure probability 1 − p. A random variable X which has two possible outcomes, say 0 and 1, is called a Bernoulli random variable. The probability distribution of X is:

P (X = 1) = p

P (X = 0) = 1 − p

i.e. P (X = 0) = 1 − P (X = 1) = 1 − p

This distribution best describes all situations where a ”trial” is made resulting in ei-
ther ”success” or ”failure,” such as when tossing a coin or when modeling the success
or failure of a surgical procedure. The Bernoulli distribution function is defined as:

f (X = x) = p^x (1 − p)^(1−x), for x = 0, 1.     (6.1)



Where, p is the probability that a particular event (e.g. success) will occur.

Note- Bernoulli experiment is performed only once and has only two possible out-
comes (success and failure).

Illustration
Tossing a fair coin, you get a head or a tail, each with probability 0.5. Thus, if a head is labelled 1 and a tail 0, the random variable X representing the outcome takes values 0 or 1. If the probability that X = 1 is p, then we have:

P (X = 1) = 1/2
P (X = 0) = 1 − 1/2 = 1/2,

since events X = 1 and X = 0 are mutually exclusive.
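Equation (6.1) is simple enough to code directly. A sketch (the function name `bernoulli_pmf` is ours):

```python
# Bernoulli pmf f(x) = p^x (1 - p)^(1 - x), evaluated for a fair coin.
def bernoulli_pmf(x, p):
    return p ** x * (1 - p) ** (1 - x)

print(bernoulli_pmf(1, 0.5))                          # P(X = 1) = 0.5
print(bernoulli_pmf(0, 0.5))                          # P(X = 0) = 0.5
print(bernoulli_pmf(1, 0.5) + bernoulli_pmf(0, 0.5))  # probabilities sum to 1.0
```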

6.7.2. Binomial distribution

A Binomial distribution arises when a Bernoulli experiment is repeated two or more


times. Data often arise in the form of counts or proportions which are realizations of a
discrete random variable. A common situation is to record how many times an event
occurs in n repetitions of an experiment, i.e. for each repetition the event either occurs
(a ”success”) or does not occur (a ”failure”). More specifically, consider the following
experimental process:

1. There are n trials.

2. Each trial results in either a success or a failure.

3. The probability of a success, p is constant from trial to trial.

4. The trials are independent.

An experiment satisfying these four conditions is called a binomial experiment. The


outcome of this type of experiment is the number of successes, i.e., a count. The dis-
crete variable X representing the number of successes is called a binomial random
variable. The possible counts, X = 0, 1, 2, . . . n, and their associated probabilities de-
fine the binomial distribution, denoted by Bin(n, p).

Suppose we repeat a Bernoulli p experiment n times and count the number X of suc-
cesses, the distribution of X is called the Binomial, Bin(n, p) random variable. The
quantities n and p are called parameters and they specify the distribution.

If X = X1 + X2 + . . . + Xn, where the Xi are independent and identically distributed Bernoulli random variables, then X is called a Binomial random variable. Thus the Binomial probability distribution function for X ∼ Bin(n, p) is given by:

P (X = x) = Cxn p^x (1 − p)^(n−x)     (6.2)

for x = 0, 1, 2, . . . , n and 0 < p < 1, where Cxn = n!/((n − x)! x!). The notation Bin(n, p) means a Binomial distribution with parameters n and p.

Mean and variance of the Binomial distribution


The mean and variance of the random variable that follows a Binomial distribution is
given by:

1. Mean, µx = E(X) = np

2. Variance, σx2 = V ar(X) = np(1 − p) = npq

Illustration - Binomial distribution


Given an experiment of tossing a fair coin six times, find the:

i) Probability of getting exactly 4 heads.

ii) Mean and

iii) Variance.

Solution:

i) Let X be the number of heads when a fair coin is tossed six times. Thus:

P (X = 4) = C46 (1/2)^4 (1 − 1/2)^2 = 15/64.

ii) Mean = np = 6 × 1/2 = 3.

iii) Variance = npq = 6 × 1/2 × (1 − 1/2) = 1.5.
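The same numbers fall out of a direct implementation of equation (6.2) with `math.comb` (the helper name `binom_pmf` is ours):

```python
# Binomial pmf via math.comb: six tosses of a fair coin.
import math

def binom_pmf(x, n, p):
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 6, 0.5
print(binom_pmf(4, n, p))    # 0.234375, i.e. 15/64
print(n * p)                 # mean 3.0
print(n * p * (1 - p))       # variance 1.5
```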

Example

1. A manufacturer of nails claims that only 3% of its nails produced are defective. A
random sample of 24 nails is selected, what is the probability that 5 of the nails
are defective?

Solution
P(defective) = p = 0.03 (3%), n = 24.

P (X = 5) = C524 (0.03)^5 (0.97)^19 ≈ 0.0006

2. A certain rare blood type can be found in only 0.05% of people. If the population
of a randomly selected group is 3000, what is the probability that at least two
persons in the group have this rare blood type?

Solution

6.7.3. Poisson distribution

The Poisson distribution, named after French mathematician Simon Denis Poisson,
is a discrete probability distribution that expresses the probability of a given number
of events occurring in a fixed interval of time, distance, mass, volume if these events
occur with a known average rate and independently of the time since the last event.
The Poisson distribution can also be used for the number of events in other specified
intervals such as distance, area or volume.

A Poisson random variable is a discrete random variable that can take integer values from 0 up to infinity (∞). The parameter for this distribution is λ, i.e. Po(λ). The Poisson probability distribution function for X ∼ Po(λ) is given by:

P (X = x) = λ^x e^(−λ) / x!     (6.3)

For x = 0, 1, 2, . . . ∞ and 0 < λ < ∞.

Examples of Poisson experiments

i) The number of cars arriving at a parking station in one-hour time interval.

ii) The number of defective screws per consignment.

iii) Number of typing errors per page.

iv) Number of particles of a given chemical in a litre of water.

The Poisson question: What is the probability of r occurrences of a given outcome be-
ing observed in a predetermined time, space or volume interval?

Illustration - Poisson distribution


The number of students arriving at a takeaway every 15 minutes follows a Poisson random variable with parameter λ = 0.2. Find the probability that zero, at most one, and at least two students arrive at the takeaway.

Solution:
X ∼ Po(0.2). Using the formula:

P (X = x) = λ^x e^(−λ) / x!

i) Probability that no students arrive:

P (X = 0) = 0.2^0 e^(−0.2) / 0! = 0.8187

ii) Probability that at most one student arrives:

P (X ≤ 1) = P (X = 0) + P (X = 1)
P (X ≤ 1) = 0.2^0 e^(−0.2) / 0! + 0.2^1 e^(−0.2) / 1!
P (X ≤ 1) = 0.81873 + 0.16375 = 0.9825

iii) Probability that at least two students arrive:

P (X ≥ 2) = P (X = 2) + P (X = 3) + . . .
P (X ≥ 2) = 1 − (P (X = 0) + P (X = 1))
P (X ≥ 2) = 1 − (0.81873 + 0.16375)
P (X ≥ 2) = 0.0175
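Equation (6.3) can be evaluated with the standard library to check the three answers; carrying full precision through the sums gives 0.9825 and 0.0175 at four decimal places (the helper name `poisson_pmf` is ours):

```python
# Poisson(0.2) arrivals at the takeaway, eq. (6.3) evaluated directly.
import math

def poisson_pmf(x, lam):
    return lam ** x * math.exp(-lam) / math.factorial(x)

lam = 0.2
p0 = poisson_pmf(0, lam)
p1 = poisson_pmf(1, lam)

print(round(p0, 4))              # 0.8187  (no students)
print(round(p0 + p1, 4))         # 0.9825  (at most one)
print(round(1 - (p0 + p1), 4))   # 0.0175  (at least two)
```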

Mean and variance of Poisson random variable


The mean and variance of the random variable, X that follows a Poisson distribution
is given by:

1) Mean = E(X) = λ

2) Variance = V ar(X) = λ

Example

1) A textile producer has established that a spinning machine stops randomly due
to thread breakages at an average rate of 5 stoppages per hour. What is the
probability that in a given hour on a spinning machine:

i) 3 stoppages will occur?


ii) at most 2 stoppages will occur?
iii) more than 4 stoppages will occur?

iv) between 2 and 6 stoppages will occur?


v) No more than 1 stoppage will occur in a given two-hour interval?

Solution

2) The arrival of patients at a rural clinic is 2 patients per hour.

a) In any given hour, what is the probability that:


i. no patient will arrive?
ii. exactly six patients will arrive?
iii. not less than 2 patients will arrive?
b) Determine the variance.

Solution

Remark
As a general rule, always check that the time, space or volume interval over which
occurrences of the random variable are observed is the same as the time, space or
volume interval corresponding to the average rate of occurrences, λ. When they differ,
adjust the rate of occurrences to coincide with the observed interval.

6.8. Continuous probability distributions


We will discuss only three continuous random variable distributions: the Uniform, Exponential and Normal distributions. Continuous distributions take real values from −∞ to +∞. For a random variable X with probability density function (pdf) f(x), the total probability is

∫_{−∞}^{∞} f(x) dx = 1.

Probabilities for different intervals are obtained by integrating the pdf over the given limits, that is:

P (a ≤ X ≤ b) = ∫_a^b f(x) dx

6.8.1. The Uniform distribution


This distribution is also known as the rectangular distribution. A continuous uniform
variable has a probability density over an interval. Its probability distribution density
function is:
f(x) = 1/(b − a) for a < x < b, and 0 elsewhere     (6.4)

and is illustrated graphically as: insert a graph

Properties of the Uniform distribution


The mean and variance of the Uniform distribution is given by:

i) Mean is given by

E(X) = (b + a)/2     (6.5)

ii) Variance is given by

Var(X) = (b − a)²/12     (6.6)

Note
The probability that X falls in some interval [c, d], where a ≤ c < d ≤ b, can be easily calculated by integrating the density function f(x) = 1/(b − a) to obtain

P (c ≤ X ≤ d) = (d − c)/(b − a)
Illustration
The marks of students from a certain examination are uniformly distributed in the
interval 50 to 75. The density function for the marks is given by:
f (X = x) = 1/(75 − 50) for 50 < x < 75, and 0 elsewhere

Find the mean and variance of this distribution.

Solution:
1) The mean is given by E(X) = (b + a)/2 = (75 + 50)/2 = 62.5
2) The variance is given by (b − a)²/12 = (75 − 50)²/12 = 52.083

Interpretation
The average mark for the examination was 62.5 with a variance of 52.083.
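The closed-form results can be sanity-checked against a crude numerical integral of the density. A sketch (the midpoint-sum approach is ours, not from the text):

```python
# Mean and variance of the Uniform(50, 75) marks distribution, with a
# midpoint Riemann sum as a numerical cross-check.
a, b = 50.0, 75.0

mean = (a + b) / 2            # 62.5
var = (b - a) ** 2 / 12       # 52.083...

n = 100_000
f = 1.0 / (b - a)             # density is constant on [a, b]
xs = [a + (b - a) * (i + 0.5) / n for i in range(n)]
num_mean = sum(x * f * (b - a) / n for x in xs)
num_var = sum((x - mean) ** 2 * f * (b - a) / n for x in xs)

print(mean, round(var, 3))    # 62.5 52.083
print(round(num_mean, 3), round(num_var, 2))
```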

Exercise
For the continuous uniform distribution defined on the interval [a, b], where b > a, show that
i) Mean = (b + a)/2 and
ii) Variance = (b − a)²/12

6.8.2. The Exponential distribution

An exponential random variable is a continuous random variable that can take on any positive value. X is said to be an exponential random variable with parameter λ if it has probability density function:

f(x) = λe^(−λx) for x > 0, and 0 otherwise     (6.7)

The exponential distribution function often arises in practice as the distribution of


waiting time i.e. the amount of time until a specified event occurs. Examples include
time until a customer arrives, or time until a machine fails.

Illustration
Suppose that the length of a phone call in minutes is exponentially distributed with
parameter, λ = 0.1. If someone arrives immediately ahead of you at a public telephone
booth, what is the probability that you will wait for at least 20 minutes?

Solution
Let X be the length of the phone call made in front of you. Then

P (X > 20) = ∫_20^∞ 0.1 e^(−0.1x) dx = [−e^(−0.1x)]_20^∞ = e^(−2) = 0.1353
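The survival probability has the closed form P(X > t) = e^(−λt), so P(X > 20) = e^(−2) ≈ 0.1353; a numerical integral of the density confirms it. A sketch (the truncation point 200 is our choice):

```python
# P(X > 20) for Exp(lambda = 0.1): closed form vs numerical integration.
import math

lam, t = 0.1, 20.0
closed_form = math.exp(-lam * t)   # e^-2

n, upper = 200_000, 200.0          # truncate the integral; the tail is negligible
h = (upper - t) / n
numeric = sum(lam * math.exp(-lam * (t + (i + 0.5) * h)) * h for i in range(n))

print(round(closed_form, 4))       # 0.1353
print(round(numeric, 4))
```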

6.8.3. The Normal distribution

One of the most useful and frequently encountered continuous random variable distri-
butions is called the Normal distribution. Its graph is called the normal curve, which
is bell shaped. The curve describes the distribution of so many sets of data that occur
in nature, industry and research.

Characteristics of the Normal Distribution

i. It is bell-shaped.

ii. It is symmetrical about a central value, the mean µ.

iii. The tails of the distribution never touch the axis (i.e. asymptotic).

iv. A normally distributed random variable is described by two parameters, namely the mean and the variance. A random variable X that is normally distributed is denoted by X ∼ N(µ, σ²).

v. The area under the curve is equal to 1.

A random variable X is said to be normally distributed if it has probability density function:

f(x) = (1/(σ√(2π))) e^(−(1/2)((x − µ)/σ)²), −∞ < x < ∞     (6.8)

where µ = mean of the random variable X and σ² = variance of the random variable X. The random variable X is represented as X ∼ N(µ, σ²); µ and σ² are said to be the parameters of X.

It is difficult to use the probability density function of the normal distribution to calculate the probabilities for X directly. Hence the process of standardisation is used, so that the probability values are taken directly from the standard normal distribution table. This table indicates the probabilities corresponding to different values of Z starting at −3. The process of standardisation involves calculating the value of Z using the formula:

Z = (X − µ)/σ ∼ N(0, 1)     (6.9)

6.8.4. The standard normal distribution


The standard normal distribution is a special kind of normal distribution with mean zero and variance 1. Z is called a standard normal random variable and is written Z ∼ N(0, 1). The cumulative distribution function of Z is denoted by Φ(z), that is P (Z < z) = Φ(z). Its values are found in the standard normal distribution tables.

Illustration
Use Standard normal distribution tables to find the probabilities below.

a. P (Z ≥ −2)

b. P (Z > 0.79)

c. P (−1.11 < Z < −0.7)

d. P (−1.3 < Z < 2.1)

e. P (Z ≤ −3)

f. P (0.04 < Z < 1.46)



Solution

a.

P (Z ≥ −2) = 1 − P (Z ≤ −2)
P (Z ≥ −2) = 1 − Φ(−2)
P (Z ≥ −2) = 1 − 0.0228
P (Z ≥ −2) = 0.9772

b.

P (Z > 0.79) = 1 − P (Z < 0.79)


P (Z > 0.79) = 1 − Φ(0.79)
P (Z > 0.79) = 1 − 0.7852
P (Z > 0.79) = 0.2148

c.

P (−1.11 < Z < −0.7) = Φ(−0.7) − Φ(−1.11)


P (−1.11 < Z < −0.7) = 0.2420 − 0.1335
P (−1.11 < Z < −0.7) = 0.1085

d.

P (−1.3 < Z < 2.1) = Φ(2.1) − Φ(−1.3)


P (−1.3 < Z < 2.1) = 0.9821 − 0.0968
P (−1.3 < Z < 2.1) = 0.8853

e.

P (Z ≤ −3) = Φ(−3)
P (Z ≤ −3) = 0.0013

f.

P (0.04 < Z < 1.46) = Φ(1.46) − Φ(0.04)


P (0.04 < Z < 1.46) = 0.9278 − 0.5160
P (0.04 < Z < 1.46) = 0.4118
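Without tables, Φ can be computed from the error function, Φ(z) = (1 + erf(z/√2))/2, which is enough to reproduce the answers above. A sketch (the helper name `phi` is ours):

```python
# Standard normal CDF via the error function: Phi(z) = (1 + erf(z/sqrt(2)))/2.
import math

def phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(round(1 - phi(-2), 4))            # a) P(Z >= -2)       -> 0.9772
print(round(phi(2.1) - phi(-1.3), 4))   # d) P(-1.3 < Z < 2.1) -> 0.8853
```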

Finding probabilities using the standard normal distribution

A normal random variable X with mean µ and variance σ² is standardised via

Z = (X − µ)/σ

Given the distribution of X, to find the probability of a given event, standardise X and use the standard normal distribution table to read off the probability value. An illustration of finding a probability for a normal distribution is given below.

Illustration
Chapter 7

Confidence Intervals

7.1. Introduction

We now know that a population parameter can be estimated from sample data by calculating the corresponding point estimate. This chapter is motivated by the desire to understand the goodness of such a point estimate. Due to sampling variability, it is almost never the case that the population parameter equals the sample statistic. Further, the point estimate does not provide any information about its closeness to the true population parameter. Thus, we cannot rely on point estimates for decision making and policy formulation in day to day living or in any organisation, institution or country. We need bounds that represent a range of plausible values for a population parameter. Such ranges are called confidence interval estimates.

To obtain the interval estimates, the same data from which the point estimate was
obtained is used. Interval estimates may be in the form of a confidence interval whose
purpose is to bound population parameters such as the mean, the proportion, the vari-
ance, and the standard deviation; a tolerance interval which bounds a selected propor-
tion of the population; and a prediction interval which places bounds on one or more
future observations from a population.

7.2. Confidence Intervals

It is noted that we cannot be certain that an interval contains the true but unknown population parameter, since only a sample from the full population is used to compute both the point estimate and the interval estimate. A confidence interval is constructed so that there is a high probability that it does contain the true but unknown population parameter. Generally, a 100(1 − α)% confidence interval equals

point estimate ± reliability coefficient × s.e.(parameter) (7.1)



where α is the level of significance, between zero and one; 1 − α is a value called the ”confidence coefficient”; 100(1 − α)% is the confidence level; the parameter estimate is a value for the point estimate, such as the sample mean, x, or the sample proportion, p̂; the reliability coefficient is a probability point obtained from an appropriate table, for example z(α/2) (standard normal z-value) or t(α/2, n − 1) (Student’s t-distribution value); and s.e.(parameter), read ”standard error of the parameter”, measures the closeness of the point estimate to the true population parameter, i.e. it measures the precision of the estimate.

7.3. Confidence interval for the Population Mean

The overall assumption made is that the sample comes from a normally distributed
population.

Case 1: If the population variance is known


Suppose that, in addition to the overall assumption, the variance of the population, σ², is known. Then a random variable called the sample mean, X, is defined such that

X ∼ N (µ, σ²/n),

whose standardised result is

Z = (X − µ)/(σ/√n)

which simplifies to:

Z = √n(X − µ)/σ ∼ N (0, 1)
σ
Z ∼ N (0, 1) means Z is normally distributed with mean zero and standard
deviation 1. The 100(1 − α) % confidence interval estimate for the population
mean may also take the form `1 ≤ µ ≤ `2 where the end points `1 and `2 are
called lower and upper confidence limits respectively and are computed from
the sample data. Different samples will produce different values for the end
points.

A confidence interval x ± z(α/2) × σ/√n has the lower and upper limits:

ℓ1 = x − z(α/2) × σ/√n

and

ℓ2 = x + z(α/2) × σ/√n

Thus, a 100(1 − α)% confidence interval for the population mean is given by:

x − z(α/2) × σ/√n ≤ µ ≤ x + z(α/2) × σ/√n (7.2)

Illustration
Consider data of weights in kg, of ten randomly selected students.

64.3 64.6 64.8 64.2 64.5 64.3 64.6 64.8 64.2 64.3

Assume that it is normally distributed with population variance of 1. Construct


a 95% confidence interval for the population mean.

Solution
Using the data, n = 10, x = 64.46, the level of significance, α = 5% = 0.05, and
from the given assumption, σ 2 = 1. Now, the 95% confidence interval for the
population mean is
x − z0.025 × σ/√n ≤ µ ≤ x + z0.025 × σ/√n

Substituting we have

64.46 − 1.96 × 1/√10 ≤ µ ≤ 64.46 + 1.96 × 1/√10.

1.96 is the standard z-value from standard normal tables that gives a cumulative probability of 0.975. Simplifying, we then have the 95% confidence interval for the population mean as

63.84 ≤ µ ≤ 65.08.

Interpretation
From the above confidence interval estimation, we are 95% confident that the population mean lies within 63.84 ≤ µ ≤ 65.08.
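The interval above can be reproduced with a few lines of Python (a sketch; variable names are our own):

```python
from math import sqrt

weights = [64.3, 64.6, 64.8, 64.2, 64.5, 64.3, 64.6, 64.8, 64.2, 64.3]
n = len(weights)
xbar = sum(weights) / n      # sample mean, 64.46
sigma = 1.0                  # population standard deviation (variance assumed to be 1)
z = 1.96                     # z-value for 95% confidence, from tables

half_width = z * sigma / sqrt(n)
lower, upper = xbar - half_width, xbar + half_width   # (63.84, 65.08) to 2 d.p.
```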

Exercise
For the example above, separately construct a 90% and 99% confidence interval
for the population mean.

Further, the length or width of a confidence interval is given by ℓ2 − ℓ1 . Now,


how long are the resulting confidence intervals? Of the three confidence inter-
vals constructed so far, which one is the most precise?

Task Starting from the cases considered above, what is the general relation-
ship between confidence levels and their precision?

Remark
The precision of a confidence interval is inversely related to the confidence level. It is desirable to obtain a confidence interval that is short enough for purposes of decision making and that also has adequate confidence. This is largely why the 95% confidence level is the default confidence level chosen by researchers and practitioners.

Case 2: If population variance is not known

(a): Large Samples (n > 30)


It was assumed in the foregoing discussion that the population distribution is
normal with an unknown µ and a known standard deviation σ. However, these
assumptions may be dropped when dealing with large-samples.

Let the observations X1 , X2 , ..., Xn be a random sample from a population with


unknown mean, µ and an unknown variance, σ 2 . If n is large, then

X ∼ N (µ, σ²/n)

and it follows that

Z = √n(X − µ)/σ ∼ N (0, 1).
In this case n is large and so it is permissible to replace the unknown σ by s.
This has close to no effect on the distribution of Z.

For large n, the quantity √n(X − µ)/s follows a standard normal distribution with mean 0 and a standard deviation of 1.

Then the 100(1 − α)% confidence interval for µ is


x − z(α/2) × s/√n ≤ µ ≤ x + z(α/2) × s/√n (7.3)

which holds approximately regardless of the sample’s underlying distribution, by the Central Limit Theorem.

Illustration
A study was carried out in Zimbabwe to investigate pollutant contamination
in small fish. A sample of small fish was selected from 53 rivers across the

country and the pollutant concentration in the muscle tissue was measured
(ppm). The pollutant concentration values are shown below. Construct a 95%
confidence interval for the population mean, µ.

1.230 1.330 0.040 0.044 1.200 0.270 0.490 0.190 0.940 0.520 0.830
0.810 0.710 0.500 0.490 1.160 0.050 0.150 0.400 0.190 0.650 0.770
1.080 0.980 0.630 0.560 0.410 0.730 0.430 0.590 0.340 0.340 0.270
0.840 0.500 0.340 0.280 0.340 0.250 0.750 0.870 0.560 0.100 0.170
0.180 0.190 0.040 0.490 0.270 1.100 0.160 0.210 0.860

Solution
From the data, x = 0.5250 and s = 0.3486. Since n > 30, the 95% confidence interval for µ is

0.5250 − 1.96 × 0.3486/√53 ≤ µ ≤ 0.5250 + 1.96 × 0.3486/√53

which simplifies to
0.431 ≤ µ ≤ 0.619
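A sketch of the computation in Python, using the summary statistics implied by the solution (x = 0.5250, s = 0.3486):

```python
from math import sqrt

n, xbar, s = 53, 0.5250, 0.3486   # summary statistics of the pollutant data
z = 1.96                          # 95% confidence

half_width = z * s / sqrt(n)
lower, upper = xbar - half_width, xbar + half_width   # ≈ (0.431, 0.619)
```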

Exercise
Construct 90% and 99% confidence interval for µ using the above data. Fur-
ther, using the above data construct the 90%, 95%, and the 99% lower and
upper confidence interval for the population mean.

(b): Small Samples (For n ≤ 30)


It is now necessary to introduce a new confidence interval construction proce-
dure that addresses the scenario of small samples. In many cases, it is reason-
able to assume that the underlying distribution is normal and that moderate
departure from normality will have little effect on validity of the result.

Remark:
If this assumption is unreasonable, an alternative is to use non-parametric procedures, which are valid regardless of the underlying population.

For our purposes, it will be reasonable to assume that the population of interest
is normal with an unknown mean, µ, and an unknown variance, σ 2 . A small
random sample of size n is drawn. Let X and S2 be the sample mean and
sample variance, respectively. We wish to construct a two-sided confidence
interval on µ . The population variance, σ 2 , is unknown and it is a reasonable
procedure to use s2 to estimate σ 2 . Then the random variable Z is replaced
with t (the student t-distribution) which is given by:

t = (X − µ)/(s/√n) = √n(X − µ)/s
which is a random variable that follows the student’s t-distribution with n − 1
degrees of freedom which are associated with the estimated standard deviation.

Notation
We let t(α, n − 1) and t(α/2, n − 1) be the values of the random variable T with n − 1 degrees of freedom above which we find a probability of α or α/2 respectively.

The 100(1 − α)% confidence interval for population mean µ is given by


x − t(α/2, n − 1) × s/√n ≤ µ ≤ x + t(α/2, n − 1) × s/√n

where t(α/2, n − 1) is the upper 100(α/2) percentage point of the t-distribution with n − 1 degrees of freedom.

Illustration
Consider the following data obtained from a local Transport Logistics company.
Data shows the distance travelled daily by one of the company’s trucks.

19.8 10.1 14.9 7.5 15.4 15.4 15.4 18.5 7.9 12.7 11.9
11.4 11.4 14.1 17.6 16.7 15.8 19.5 8.8 13.6 11.9 11.4

Construct a 95% confidence interval for the population mean, µ.

Solution
Since our sample is small, n = 22, the 95% confidence interval for the population mean is given by

x − t(α/2, n − 1) × s/√n ≤ µ ≤ x + t(α/2, n − 1) × s/√n

From the data, x = 13.71 and s = 3.55, and from tables t(0.025, 21) = 2.080. Substituting yields

13.71 − 2.080 × 3.55/√22 ≤ µ ≤ 13.71 + 2.080 × 3.55/√22

and simplifying we have


12.1 ≤ µ ≤ 15.3

as the 95% confidence interval for µ.
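The same interval can be computed from the raw data using only the standard library; the value t(0.025, 21) = 2.080 is taken from tables:

```python
from math import sqrt
import statistics

distances = [19.8, 10.1, 14.9, 7.5, 15.4, 15.4, 15.4, 18.5, 7.9, 12.7, 11.9,
             11.4, 11.4, 14.1, 17.6, 16.7, 15.8, 19.5, 8.8, 13.6, 11.9, 11.4]
n = len(distances)                  # 22
xbar = statistics.mean(distances)   # ≈ 13.71
s = statistics.stdev(distances)     # sample standard deviation, ≈ 3.55
t = 2.080                           # t(0.025, 21), from tables

half_width = t * s / sqrt(n)
lower, upper = xbar - half_width, xbar + half_width   # ≈ (12.1, 15.3)
```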

Exercise
For the above data, construct the 90% and the 99% confidence intervals on the population mean and interpret the two confidence intervals. Further, construct the 90%, the 95% and the 99% lower and upper confidence limits, and give an interpretation of each of them.

Remark
One-sided confidence intervals for the mean of a normal population are constructed by choosing the appropriate lower or upper confidence limit and then replacing t(α/2, n − 1) by t(α, n − 1).

7.4. Confidence interval for a population proportion

Suppose that a random sample of size n (n large) has been taken from a large population and that x of the n observations in this sample belong to a class of interest. Then p̂, calculated as x/n, is a point estimator of the proportion p of the population that belongs to this class. It is noted that n and p are the parameters of a binomial distribution. The sampling distribution of p̂ is approximately normal with mean p and variance p(1 − p)/n, if p is not too close to either 0 or 1 and if n is relatively large. To apply this, it is required that np and n(1 − p) be greater than or equal to 5. We are saying that: if n is large, then the distribution of

Z = (p̂ − p)/√(p(1 − p)/n) ∼ N (0, 1).

For large samples, which usually is the case when dealing with proportions, a
satisfactory 100(1 − α)% confidence interval on the population proportion p is
p̂ − z(α/2) × √(p̂(1 − p̂)/n) ≤ p ≤ p̂ + z(α/2) × √(p̂(1 − p̂)/n) (7.4)

where p̂ is the point estimate of p, and z(α/2) is the upper α/2 probability point of the standard normal distribution.

Illustration
In a random sample of 85 stone sculptures, 10 have a surface finish that is rougher than expected. Construct a 95% confidence interval for the population proportion of stone sculptures with a surface finish that is rougher than expected.

Solution
Here p̂ = 10/85 ≈ 0.12. Using the formula above, a two-sided 95% confidence interval for p is

0.12 − 1.96 × √(0.12(1 − 0.12)/85) ≤ p ≤ 0.12 + 1.96 × √(0.12(1 − 0.12)/85)

which simplifies to

0.05 ≤ p ≤ 0.19
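A quick numerical check of this interval (p̂ is rounded to 0.12, as in the text):

```python
from math import sqrt

n, x = 85, 10
p_hat = round(x / n, 2)    # 0.12, as used in the text
z = 1.96

half_width = z * sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - half_width, p_hat + half_width   # ≈ (0.05, 0.19)
```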

Remark
The one-sided lower and upper confidence intervals are respectively given as

p̂ − zα × √(p̂(1 − p̂)/n) ≤ p

and

p ≤ p̂ + zα × √(p̂(1 − p̂)/n)
Exercise
In the above example, construct and interpret the 95% and the 99% lower and
upper confidence limits for the population proportion.

7.5. Confidence interval for the population variance

Let X1 , X2 , ..., Xn be a random sample from a normal distribution with mean µ


and variance σ², and let s² be the sample variance. Then the random variable

V = (n − 1)s²/σ²

has a chi-square (χ²) distribution with n − 1 degrees of freedom.

Now, if s2 is the sample variance from a random sample of n observations from


a normal distribution with unknown variance, σ 2 , then a 100(1−α)% confidence
interval on σ 2 is
(n − 1)s²/χ²(α/2, n − 1) ≤ σ² ≤ (n − 1)s²/χ²(1 − α/2, n − 1)

where χ²(α/2, n − 1) and χ²(1 − α/2, n − 1) are the upper and lower 100(α/2) percentage points of the χ² distribution with n − 1 degrees of freedom, respectively.

Illustration
An entrepreneur has an automatic filling machine that she uses to fill bottles with liquid detergent. A random sample of 20 bottles results in a sample variance of fill volume of s² = 0.0153 (fluid ounces)².

Assume that the fill volume is normally distributed. Then a 95% upper confidence interval is

σ² ≤ (n − 1)s²/χ²(1 − α, n − 1)

Substituting yields

σ² ≤ (20 − 1) × 0.0153/χ²(1 − 0.05, 20 − 1)

Simplifying, we have

σ² ≤ 19 × 0.0153/χ²(0.95, 19)

where χ²(0.95, 19) is 10.117, so we get

σ² ≤ 19 × 0.0153/10.117

giving

σ² ≤ 0.0287

NB: The statistical tables round off 10.117 to 3 s.f.
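The computation can be sketched as follows; 10.117 is the chi-square table value used above (19 degrees of freedom, probability 0.95 to its right):

```python
from math import sqrt

n, s2 = 20, 0.0153
chi2_095_19 = 10.117                    # chi-square table value, 19 d.f.

var_upper = (n - 1) * s2 / chi2_095_19  # sigma^2 <= 0.0287
sd_upper = sqrt(var_upper)              # sigma   <= 0.17 (used in the next section)
```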

7.6. Confidence interval for population standard devia-


tion

The one-sided lower and upper confidence intervals for σ² are

(n − 1)s²/χ²(α, n − 1) ≤ σ²

and

σ² ≤ (n − 1)s²/χ²(1 − α, n − 1)

Remark
Clearly, the lower and upper confidence intervals for σ are the square roots of
the corresponding limits in the above equations.

The bound σ² ≤ 0.0287 obtained above is converted into an upper confidence limit for the population standard deviation σ by taking the square root of both sides. The resulting 95% confidence interval is σ ≤ 0.17.

Exercise
Using the information from the above illustration, construct a 90% lower and
upper confidence limits for the population standard deviation, σ.

7.7. Confidence interval for difference of two populations


means

The overall normality assumption remains in place. We are simply considering two populations and constructing confidence intervals for the difference in two population means, µ1 − µ2 . The confidence interval for the difference between two population means is:

(x̄1 − x̄2 ) ± z(α/2) × √(s1²/n1 + s2²/n2 ) (7.5)

This is written out as:

(x̄1 − x̄2 ) − z(α/2) × √(s1²/n1 + s2²/n2 ) < (µ1 − µ2 ) < (x̄1 − x̄2 ) + z(α/2) × √(s1²/n1 + s2²/n2 ) (7.6)
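Expression (7.5) can be wrapped in a small helper function; the numbers in the example call are hypothetical, purely for illustration:

```python
from math import sqrt

def two_mean_ci(x1, x2, s1, s2, n1, n2, z):
    """100(1-alpha)% confidence interval for mu1 - mu2, as in expression (7.5)."""
    half_width = z * sqrt(s1**2 / n1 + s2**2 / n2)
    diff = x1 - x2
    return diff - half_width, diff + half_width

# Hypothetical summary statistics; z = 1.96 for 95% confidence
lower, upper = two_mean_ci(x1=10.5, x2=9.8, s1=1.2, s2=1.5, n1=40, n2=50, z=1.96)
```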

7.7.1. Case 1: If population variance is known

Illustration
An entrepreneur is interested in reducing the drying time of a wall paint. Two
formulations of the paint are tested; formulation 1 is the standard, and formu-
lation 2 has a new drying ingredient that should reduce the drying time. From
experience, it is known that the standard deviation of drying time is 8 min-
utes, and this inherent variability should be unaffected by the addition of the
new ingredient. Ten specimens are painted with formulation 1, and another 10 specimens are painted with formulation 2; the 20 specimens are painted in random order. The two sample mean drying times are 121 minutes and 112
minutes, respectively. Construct a 99% confidence interval for the difference in
the two population means.

Solution
Solution to be provided.

7.7.2. Case 2: If population variances are unknown

Here we make the homogeneous variance assumption: the two population variances are unknown but assumed to be equal.

Illustration
The following data is from two populations, A and B. Ten samples from A had
a mean of 90.0 with a sample standard deviation of s1 = 5.0, while 15 sam-
ples from B had a mean of 87.0 with a sample standard deviation of s2 = 4.0.
Assume that the populations, A and B are normally distributed and that both
normal populations have the same standard deviation. Construct a 95% confi-
dence interval on the difference in the two population means.

Solution
Solution to be provided.
Chapter 8

Hypothesis Testing

8.1. Definitions and critical clarifications

Hypotheses
A hypothesis is a statement about a population. Testing of hypotheses involves the evaluation of two hypotheses, called the null and the alternative, denoted H0 and H1 respectively. H0 is the assertion that a population parameter takes on a particular value. On the other hand, H1 expresses the way in which the value of the population parameter may deviate from that specified under H0 . The direction of deviation may be specified (a one-sided or one-tailed test) or unspecified (a two-sided or two-tailed test).

We take time to point out that the language and grammar of testing of hypotheses does not use the word ”accept” or any of its numerous synonyms. This is beyond semantics. To say one ”accepts” the null hypothesis is to imply that they have proved the null hypothesis to be true. This practice is incorrect. The null hypothesis is the claim that is usually set up with the expectation of rejecting it. The null hypothesis is assumed true until proven otherwise. If the weight of evidence suggests that the null hypothesis is unlikely, then there exists a statistical basis upon which we may reject the null hypothesis. The design of hypothesis tests is such that we stay with the null hypothesis until there is enough evidence to suggest support for the alternative hypothesis. Clearly, the design is never about selecting the more likely of the two hypotheses. Let us take this to our legal system. One is considered not guilty until proven otherwise. It is the job of the prosecutor to build a case, that is, to put evidence before the court of law that the person in question is guilty. The judge will give their verdict as guilty or not guilty but will NEVER give a verdict of ”innocent”. By and large, the courts of law are a classical example of the testing of hypotheses procedure. So, let it be clear that, on the basis of the data from the sample, we either reject the null hypothesis or fail to reject the null hypothesis.

In the words of R. A. Fisher


In relation to any experiment we may speak of ... the ”null hypothesis,” and it
should be noted that the null hypothesis is never proved or established, but is
possibly disproved, in the course of experimentation. Every experiment may be
said to exist only in order to give the facts a chance of disproving the null hy-
pothesis.

Remarks

1. The H0 reflects the position of no change and will always be worded as an equality.

2. Language which implies ”acceptance” of the null hypothesis is both misleading and against the grammar of the testing of hypotheses.

Test statistic
This is a value calculated from sample data and is used to decide on rejecting
H0 .

Critical region
This is a range of values which is such that when the test statistic falls into it
then H0 would be rejected.

Critical value
Is a value that separates the rejection region and the non-rejection region.

Type I error
Occurs when a true null hypothesis is rejected. A null hypothesis is rejected
when in actual fact it is true.

Type II error
It occurs when a false null hypothesis is not rejected. Alternatively, it is when
a null hypothesis is not rejected when in actual fact it is false.

Level of significance of a Test


Is the probability of making a Type I error expressed as a percentage. It is
denoted by α.

Power of a Statistical Test



It is the probability that the testing of hypotheses procedure rejects the null hypothesis when the null hypothesis is indeed false.

8.2. General procedure on Hypotheses Testing

The following steps are recommended in applying the testing of hypotheses


procedure.

• From the problem context, identify the parameter of interest.

• Clearly state the hypotheses i.e. H0 and H1 .

• Identify or choose the level of significance, α.

• Determine an appropriate test statistic.

• Obtain the critical value from appropriate tables.

• Compute the test statistic by substituting the necessary statistics into an appropriate equation.

• Decide on the basis of a decision criterion that rejects H0 if, upon compar-
ison, the test statistic is more extreme than a critical value.

• Conclude on the basis of the decision’s import, and report in the context of
the problem.

8.3. Hypothesis testing concerning Population Mean

8.3.1. Case 1: If the population variance is known

The overall normality assumption is made. In this case, we further assume that the population variance, σ², is known. The sample mean X, which is a point estimator of µ, is a random variable with population mean µ and population variance σ²/n. The test statistic is

Zcal = √n(x − µ0 )/σ ∼ N (0, 1)
Exercise
Consider the following data, where the population mean is claimed to be 50: σ = 2, α = 0.05, n = 25, and x = 51.3. What conclusions should be drawn about the claim?

Guidelines to the solution


It is given that the population is normal and the population standard deviation
is known. Z score is the test statistic. The testing of hypothesis procedure is
two sided. At the 0.05 level of significance and based on the sample evidence,
we conclude that the population mean is different from 50.
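This test can be sketched in Python; the decision follows the two-sided criterion |Zcal| > z(α/2):

```python
from math import sqrt

mu0, sigma, n, xbar = 50, 2, 25, 51.3
z_cal = sqrt(n) * (xbar - mu0) / sigma   # 5 * 1.3 / 2 = 3.25
z_crit = 1.96                            # two-sided critical value at alpha = 0.05
reject_h0 = abs(z_cal) > z_crit          # True: the mean differs from 50
```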

Exercise
For the above exercise, instead of using hypothesis testing procedure, construct
a 95% confidence interval. Test the same hypothesis using the confidence inter-
val. Is the value specified under H0 contained in the confidence interval? Or, is
zero contained in the confidence interval? What conclusions should be drawn?

Hint on the decision criterion using the confidence interval approach

When using the confidence interval approach to test a hypothesis, if the value specified under H0 is contained in the confidence interval, then we fail to reject H0 ; otherwise we reject H0 .

8.3.2. Case 2: If the population variance is not known

(a) Large samples scenario (For n > 30)


The Z test statistic is used when n > 30. We find the critical value using the level of significance specified (or 0.05 if none is specified), on the basis of the test being one-sided or two-sided.

Illustration
Let the mean cost of an Introduction to Statistics textbook be µ. In testing the claim that the average price of a textbook is not $34.50, a sample of 36 current textbooks had selling costs with a sample mean of $32.00 and a sample standard deviation of $6.30. Using a 10% level of significance, what conclusion can be made?

Solution
This is a two-tailed test with n > 30 and α = 0.1; thus, the critical value is the z-value ±1.645. Detailed solution to be done.

(b) Small sample scenario (For n ≤ 30)


We consider now the case of hypothesis testing on the mean of a population with an unknown variance, σ². The test statistic is

tcal = √n(x − µ0 )/s

which follows a t-distribution with n − 1 degrees of freedom.

Exercise
The increased availability of light materials with high strength has revolution-
ized the design and manufacture of golf clubs, particularly drivers. Clubs with
hollow heads and very thin faces can result in much longer tee shots, especially
for players of modest skills. This is due partly to the spring-like effect that the
thin face imparts to the ball. Firing a golf ball at the head of the club and mea-
suring the ratio of the outgoing velocity of the ball to the incoming velocity can
quantify this spring-like effect. The ratio of velocities is called the coefficient
of restitution of the club. An experiment was performed in which 15 drivers
produced by a particular club maker were selected at random and their coeffi-
cients of restitution measured. In the experiment the golf balls were fired from
an air cannon so that the incoming velocity and spin rate of the ball could be
precisely controlled. The sample mean and sample standard deviation are x =
0.83725 and s = 0.02456. Determine if there is evidence at the α = 0.05 level to
support the claim that the mean coefficient of restitution exceeds 0.82.

Guidelines to the expected solution


The mean is the parameter of interest. The population standard deviation is
unknown and the sample size is small. Therefore, the appropriate test statistic
to be used is the t - statistic and the corresponding critical value is tcrit = 1.76.
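The guideline computation can be sketched as follows; t(0.05, 14) ≈ 1.761 is the one-sided table value (the text rounds it to 1.76):

```python
from math import sqrt

n, xbar, s, mu0 = 15, 0.83725, 0.02456, 0.82
t_cal = sqrt(n) * (xbar - mu0) / s   # ≈ 2.72
t_crit = 1.761                       # t(0.05, 14), from tables
reject_h0 = t_cal > t_crit           # True: evidence that the mean exceeds 0.82
```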

Exercise
For the above exercise, instead of using the testing of hypothesis procedure,
construct a 95% confidence interval. Test the same hypothesis using the confi-
dence interval approach.

8.4. Hypothesis testing concerning the Population Propor-


tion

Illustration
The advertised claim for batteries for cell phones is set at 48 operating hours,
with proper charging procedures. A study of 5000 batteries is carried out and
15 stop operating prior to 48 hours. Do these experimental results support the
claim that less than 0.2 percent of the company’s batteries will fail during the
advertised time period, with proper charging procedures? Use a hypothesis
testing procedure with α = 0.01. Is the conclusion the same at the 10% level of

significance?

Solution
H0 : p = 0.002 against
H1 : p < 0.002, with p̂ = 15/5000 = 0.003.

Note
The claim specifies 0.002 as the hypothesised value of the population proportion. Letting p0 denote this hypothesised value yields

Zcal = (p̂ − p0 )/√(p0 (1 − p0 )/n) = 1.5827

Since H1 is one-sided to the left, we would reject H0 only if Zcal < −Z0.01 = −2.3263. As Zcal = 1.5827 > −2.3263, we fail to reject H0 and conclude that, at the 1% level of significance, there is not enough evidence to suggest that less than 0.2 percent of the company’s batteries will fail during the advertised time period, with proper charging procedures.
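A numerical check of this solution (note that, for H1 : p < p0, the rejection region lies in the left tail):

```python
from math import sqrt

n, x, p0 = 5000, 15, 0.002
p_hat = x / n                                    # 0.003
z_cal = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # ≈ 1.5827
reject_h0 = z_cal < -2.3263                      # reject only if z_cal < -z(0.01); False here
```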

Exercise
Let p be the proportion of new car loans having a 48 months period. In some
year p = 0.74. Suppose it is believed that this has declined and accordingly we
wish to test this belief using a 1% level of significance. What is the conclusion
if 350 of a sample of 500 new car loans have a time period of 48 months?

8.5. Comparing two populations

We now extend the previous one-population results to the difference of means for two populations. This test is done to assess whether the two populations have similar means, i.e. whether they are producing similar results.

8.5.1. Hypothesis testing concerning difference between two popula-


tion means

Case 1: If population variances are known

Illustration
Consider the following gasoline mileages of two makes of light trucks. Trucks 1 and 2 have population means and population standard deviations of 28 and 6, and 24 and 9, respectively. If 35 of truck 1 and 40 of truck 2 are tested, test the claim that the mean difference is 4.

Solution
Exercise in the lecture.

Remark
In inferential applications the population variances σ1² and σ2² are generally not known and must be estimated by s1² and s2². The standard error is estimated by

standard error = √(s1²/n1 + s2²/n2 )

Case 2: Unknown population variance and small sample (n1 + n2 ≤ 31)

We assume that the variances of both distributions, σ1² and σ2², are unknown but equal. This common variance is estimated by a quantity called the pooled variance, denoted sp² and calculated as

sp² = [(n1 − 1)s1² + (n2 − 1)s2²]/(n1 + n2 − 2)

Thus, the test statistic is

tcal = [(x̄1 − x̄2 ) − (µ1 − µ2 )]/√(sp²(1/n1 + 1/n2 ))

which follows a t-distribution with n1 + n2 − 2 degrees of freedom.
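The pooled procedure can be wrapped in a helper; the numbers in the example call are hypothetical and are not those of the illustration below:

```python
from math import sqrt

def pooled_t(x1, x2, s1, s2, n1, n2):
    """Pooled two-sample t statistic for H0: mu1 - mu2 = 0."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)  # pooled variance
    return (x1 - x2) / sqrt(sp2 * (1 / n1 + 1 / n2))

# Hypothetical summary statistics
t = pooled_t(x1=10.0, x2=9.0, s1=2.0, s2=2.0, n1=12, n2=12)   # ≈ 1.22, with 22 d.f.
```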

Illustration
Consider the following data. n1 = 10, x1 = 90, s1 = 5, n2 = 15, x2 = 87 and
s2 = 4. Assume that the populations are normally distributed and that both
populations have the same standard deviation. At the 5% level of significance,
can we conclude that there is a difference in the two population means?

Solution
Left as an exercise.

8.6. Independent and dependent samples

In testing for the equality of two population means, we may choose to select two random samples, one from each population, and compare their means. If these sample means exhibit a significant difference, then we reject the null hypothesis H0 : µ1 − µ2 = 0. Another approach is to try and match the subjects from
the two populations according to variables which will be expected to have an
influence on the variable under study. The two samples are no longer indepen-
dent and the inferences are now based on the differences of the observations
from the matched pairs.

Case 3: Independent Samples


An illustrative example will be vital in exposing the testing of hypothesis pro-
cedure.

Illustration
Samples of two brands of pork sausage are tested for their fat content. The re-
sults of the percentage of fat are summarised as follows: Brand A (n = 50, x =
26.0, s = 9.0) and Brand B (n = 46, x = 29.3, s = 8.0). Can we conclude that there is sufficient evidence to suggest that there is a difference in the fat content of the two brands of pork sausage? Use a 5% level of significance.

Solution
Left as an exercise for the lecture.

Case 4: Dependent or Paired samples


Given two paired samples X11 , X12 , · · · , X1n and X21 , X22 , · · · , X2n , we form a single sample of the differences d1 , d2 , · · · , dn

where d1 = X11 − X21 , d2 = X12 − X22 , · · · , dn = X1n − X2n .

For the new single sample, we find its mean, d, which estimates the population mean of the differences, µd , and its standard deviation, sd . Assuming that the original populations are normally distributed with equal means (i.e. µ1 = µ2 ) and equal variances, the population mean of the differences, µd , is zero, with a standard error estimated by sd /√n.

The test statistic in this case is tcal = d/s.e. = d/(sd /√n).

The hypotheses tests concerning µ1 and µ2 are now based on the sample mean

using the single sample and we have a modified null hypothesis, H0 : µd = 0


against an appropriate alternative hypothesis as instructed by the situation.
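The paired procedure can be sketched in Python; the paired observations below are invented purely for illustration:

```python
from math import sqrt
import statistics

# Hypothetical paired measurements on the same five units
sample1 = [10.2, 9.8, 11.1, 10.5, 9.9]
sample2 = [9.6, 9.9, 10.4, 10.1, 9.5]

d = [a - b for a, b in zip(sample1, sample2)]   # differences
n = len(d)
d_bar = statistics.mean(d)                      # estimates mu_d
s_d = statistics.stdev(d)
t_cal = d_bar / (s_d / sqrt(n))                 # compare with t(alpha/2, n-1) from tables
```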

Illustration
Five machines are tested for wind resistance with two types of grills. Their
drag coefficients were determined and recorded as follows.

Machine 1 2 3 4 5
Grill A 0.47 0.46 0.40 0.44 0.43
Grill B 0.50 0.45 0.47 0.44 0.48

Using a 5% level of significance test for the difference in the drag coefficients
due to type of grill.

Solution
Left as an exercise during the lecture.

8.6.1. Advantages of paired comparisons

• By pairing, we remove an additional source of variation and hence reduce the random variation, as measured by sp² in the case without pairing and sd² in the case of pairing. Therefore, sd² < sp² implies a gain in precision due to pairing.

• The confidence interval based on the paired comparison is much narrower


than that from two sample analysis using unpaired observations. This
also implies a gain in precision due to pairing.

• It may be less expensive since in most cases fewer experimental units are
used when compared to a two sample design.

8.6.2. Disadvantages of paired comparisons

• There is a substantial loss in degrees of freedom in a paired comparison compared with a two-sample t-test.

• A rest period may be required between applying the first and second treatment in order to minimise the carry-over effect from the first treatment. Even then, the carry-over effect may not be completely eliminated.

8.7. Test Procedure concerning difference of two Popula-


tion Proportions

Suppose that two independent random samples of sizes n1 and n2 are taken
from two populations, and let x1 and x2 represent the number of observations
that belong to the class of interest in sample 1 and sample 2, respectively. In
testing the hypotheses

H0 : p1 − p2 = 0
H1 : p1 − p2 6= 0,

the test statistic is

Zcal = [(p̂1 − p̂2 ) − (p1 − p2 )]/√(p1 (1 − p1 )/n1 + p2 (1 − p2 )/n2 )

which is approximately standard normal.


If H0 : p1 − p2 = 0 is true, then p1 = p2 . Thus p1 = p2 = p, such that the test statistic becomes

Zcal = [(p̂1 − p̂2 ) − (p1 − p2 )]/√(p(1 − p)[1/n1 + 1/n2 ])

which is still approximately standard normal. The common population proportion, p, is estimated by

p̂ = (x1 + x2 )/(n1 + n2 ).

Assuming that H0 : p1 − p2 = 0 is true, the test statistic is therefore given by

Zcal = (p̂1 − p̂2 )/√(p̂(1 − p̂)[1/n1 + 1/n2 ])

Illustration
Consider the following situation in which comparison is made of two concept
exposition methods. Method A is the standard and method B is the proposed. A
class of 200 Statistics students at a University is used. The students were ran-
domly assigned to two groups of equal size. One group was exposed to method
A and the other group was exposed to method B. At the end of the semester, 19
of the students exposed to method B showed improvement, while 27 of those
exposed to method A improved. At the 5% level of significance, is there sufficient
reason to believe that method A is more effective than method B in concept
exposition?

Solution

First, we state the hypotheses:

H0 : pA − pB = 0
H1 : pA − pB ≠ 0

Then, we extract the given data: nA = nB = 100, p̂A = 0.27, p̂B = 0.19, xA = 27,
and xB = 19. Thus, p̂ = 0.23.

The test statistic and the critical value are Zcal = 1.34 and Zcrit = 1.96,
respectively.

After comparing Zcal and Zcrit, the decision is that we fail to reject H0. From
this decision, we therefore conclude that, at the 5% level of significance, there
is insufficient evidence to support the assertion that method A is more effective
than method B in concept exposition.

Exercise
A study is made of business support for proposed immigration enforcement
practices. Suppose 73% of a sample of 300 cross border traders and 64% of a
sample of light manufacturers said they fully supported the policies being
proposed. Is there sufficient evidence to conclude that the proposed policies
are equally supported by the two groups sampled? Use a 1% level of significance.

8.8. Tests for Independence: χ2 -test

Tests for independence are performed on categorical data, such as when testing
for independence of opinion on a public policy and gender. The data are
contained in what is called a contingency table. The hypotheses are tested
using a Chi-square test statistic, χ²cal.

Illustration
A company operates four machines over three shifts each day. From production
records, the following data on the number of breakdowns are collected:

Machines
Shifts A B C D
1 4 3 2 1
2 3 1 9 4
3 1 1 6 0

Using the 5% level of significance, test the hypothesis that breakdowns are
independent of the shift.

Solution: To be provided.
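As a sketch of the computation (not the promised solution), the χ² statistic for this table can be obtained in plain Python. Note that several expected counts here fall below 5, so the chi-square approximation should be treated with caution:

```python
# Chi-square test of independence for the machines-by-shifts table.
# Expected count for cell (i, j) = row_total_i * col_total_j / grand_total.
observed = [
    [4, 3, 2, 1],   # shift 1, machines A-D
    [3, 1, 9, 4],   # shift 2
    [1, 1, 6, 0],   # shift 3
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / grand
        chi2 += (obs - exp) ** 2 / exp

df = (len(observed) - 1) * (len(observed[0]) - 1)   # (3-1)(4-1) = 6
# chi2 ≈ 9.64 < 12.592 (the 5% critical value for 6 df from tables),
# so we would fail to reject independence of breakdowns and shift.
```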

Exercise
Grades in Statistics and Communication Skills courses taken simultaneously
were recorded as follows for a particular group of students.

Com. Skills Grade


Stats Grade 1 2.1 2.2 Other
1 25 6 17 13
2.1 17 16 15 6
2.2 18 4 18 10
Other 10 8 11 20

Are the grades in Statistics and Communication Skills related? Use α = 0.01.

8.9. Ending Remarks

It can be demonstrated that hypothesis testing and confidence
intervals are equivalent procedures in so far as decision making or inference
about population parameters is concerned. However, each procedure presents
different insights. What is the major difference between these two cousin
procedures?
Chapter 9

Regression Analysis

9.1. Introduction

It is important to note that the approach used here first exposes the useful
concepts of the regression analysis technique, gives an illustrative example of
the application of these concepts, and then wraps up with a practice question.

Many problems that are encountered in everyday life involve exploring the
relationships between two or more variables. Regression analysis is a statistical
tool that is very useful for these types of problems. For example, in the
clothing industry, the sales obtained from selling particular designer outfits
are related to the amount of time spent advertising the label. Regression
analysis can be used to build a model to predict the sales given the amount of
time devoted to advertising the label. In the sciences, regression models can
be used for process optimisation, for instance finding the temperature level
that maximises yield, or for purposes of process control.

After studying this chapter, you are expected to be able to i) use simple linear
regression to build models for everyday data; ii) apply the method of least
squares to estimate the parameters in a linear regression model; iii) use the
fitted regression model to predict a future observation; iv) interpret the
scatter plot, the correlation coefficient, the coefficient of determination
and the regression parameters.

9.2. Uses of Regression Analysis

The uses of regression include, but are not limited to:

• understanding underlying processes

• prediction

• forecasting in the near future

• optimisation

• control purposes

9.3. Abuses of Regression Analysis

Regression analysis is widely used and frequently misused. A common abuse of
regression is developing statistically significant relationships among variables
that are completely unrelated in a cause-effect sense. A strong observed
association between variables does not necessarily translate into a cause-effect
relationship between the variables. Therefore, care must be exercised when
choosing variables on which to perform regression analysis.

Regression relationships are valid only for values of the explanatory variable
within the range of the original data. The linear relationship that we have
assumed may be valid over the original range of X, but is unlikely to remain
so as we extrapolate, i.e. if we use values of X beyond that range to estimate
the value of Y. Put differently, as we move away from the range of the values
of X for which data were collected, our certainty about the validity of the
assumed model tends to fade away. We caution that linear regression models are
not necessarily valid for extrapolation purposes. Note, however, that in many
real-life situations extrapolation of a regression model may be the only way
to approach a given problem.

We will concentrate on two random variables: the explanatory variable (also
called the independent or cause variable), denoted by X, and the response
variable (also called the dependent or effect variable), denoted by Y. These
two variables vary together: X causes Y to vary, or X explains the response in
Y. Such situations are modelled using the simple linear regression technique
because they have only one explanatory or independent variable.

9.4. The Simple Linear Regression model

The simple linear regression model is an equation of a straight line given by

Y = a + bX + ε

where Y is the response or dependent variable, a is a regression coefficient or
parameter called the intercept, b is a regression coefficient or parameter
called the slope, X is the explanatory or independent variable and ε is the
random error term.

The random error term follows a normal distribution with a mean zero and
an unknown variance σ 2 . For completeness, we state that the random errors
corresponding to different observations are also assumed to be uncorrelated or
independent random variables. To determine the appropriateness of employing
simple linear regression we use (1) the scatter plot and/or (2) the correlation
coefficient.

9.4.1. The scatter plot

The choice of the model is based on inspection of a scatter diagram. We merely
use our eyes to inspect the nature of the relationship exhibited by the points.
The slope of the points tells us the direction of the relationship, and the
distances between the points tell us the magnitude of the relationship.
Invariably, there is a deep-seated tendency to join the points on a scatter
plot. The points in a scatter diagram need not be joined; there is NO line of
whatever form to be drawn on a scatter diagram.

A scatter plot will physically show if there is negative, positive or no correlation


between the two variables.

9.4.2. The regression equation

Having established that a linear relationship exists between the random
variables X and Y, we proceed to fit the linear regression model or line or
equation. To fit a regression model is to estimate the regression coefficients
a and b. The estimated regression coefficients are denoted â and b̂. The fitted
model is written in the form

Ŷ = â + b̂X

Now, we have fitted a model and we wish to determine how good it is and then
use it for prediction of new values for the system in question. To determine
how good our model is, we calculate the fitted value of the response variable
for each and every value of the explanatory variable and then note the
difference. This difference, obtained by subtracting the fitted value from the
actually observed value, is the error in our model for that observation and is
called the residual. By performing what is called residual analysis we are able
to come up with a statement on the adequacy of our fitted regression model.

After establishing the adequacy of our model we then proceed to predict future
values of the response variable for the system in question. This is technically
called forecasting.

Computation of the regression coefficients


The method used to compute the regression coefficients is called the least
squares method. It requires us to estimate the value of b̂ first, and then use
it to obtain the value of â, using the equations below.

We first estimate the slope, b, as

b̂ = [nΣxy − ΣxΣy] / [nΣx² − (Σx)²]                                        (9.1)

And then we estimate a by

â = ȳ − b̂x̄                                                               (9.2)

Interpretation of regression coefficients


The intercept, a, is the value of Y when X = 0. The slope, b, indicates the
change in the value of Y when X changes by one unit.

Naturally, we ask: how much of the variability in the response variable has
been explained by fitting the regression model? To answer this question we
compute the coefficient of determination, discussed in Section 9.4.4 below.
Illustration
Consider the following set of observations. Take X to be the explanatory
variable and Y to be the response variable.

Y 1 0 1 2 5 1 4 6 2 3 5 4 6 8 4
X 60 63 65 70 70 70 80 90 80 80 85 89 90 90 90

a) Draw a scatter plot for the above data. Comment on the suitability of
using simple linear regression to describe the relationship.

b) Calculate and comment on the Pearson correlation coefficient.

c) Fit the regression model using the method of least squares. Interpret the
regression coefficients.

d) State how much of the variation in Y has been accounted for by fitting the
linear regression model.

e) Using the fitted regression model, what is the value of Y when X = 60?
What is the residual?

f) What is the value of Y when X = 95?

Solution
A scatter diagram of the above data is shown in the figure below.

Figure 9.1: A scatter diagram of X and Y values

The scatter diagram shows a positive linear relationship between the x and y
values. This suggests that a linear regression equation can be established.
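The least squares computations for this illustration can be sketched in plain Python, using formulas (9.1)–(9.3):

```python
# Least squares fit and Pearson correlation for the illustration data.
from math import sqrt

x = [60, 63, 65, 70, 70, 70, 80, 90, 80, 80, 85, 89, 90, 90, 90]
y = [1, 0, 1, 2, 5, 1, 4, 6, 2, 3, 5, 4, 6, 8, 4]
n = len(x)

sx, sy = sum(x), sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))
sxx = sum(xi * xi for xi in x)
syy = sum(yi * yi for yi in y)

b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)     # slope, equation (9.1)
a = sy / n - b * sx / n                           # intercept, equation (9.2)
r = (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# b ≈ 0.170, a ≈ -9.79, r ≈ 0.81 (a very strong positive linear relationship),
# so R² = r² ≈ 0.65, i.e. about 65% of the variation in Y is explained.
```

The same sums (Σx, Σy, Σxy, Σx², Σy²) are exactly the quantities one would tabulate when computing the fit by hand.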

9.4.3. The correlation coefficient

A correlation coefficient measures the strength and direction of the
relationship between two variables that vary together. The Pearson product
moment correlation coefficient, denoted by r, is a measure of the extent of the
relationship between the variables. It is given by

r = [nΣxy − ΣxΣy] / √([nΣx² − (Σx)²][nΣy² − (Σy)²])                       (9.3)

Note that −1 ≤ r ≤ 1. Though seemingly intimidating, this formula is quite


user friendly.

Interpretation of r
When interpreting r, care should be taken to mention both the magnitude (size)
of the correlation and the direction of the linear relationship. The absence of the

other component renders the interpretation incomplete. We now give typical
interpretations for some values of r. If r = −1, then there is a perfect
negative linear relationship between X and Y. If r = 1, then there is a perfect
positive linear relationship between X and Y. If r = 0, then there is no linear
relationship between X and Y; note that this implies that other forms of
relationships may best model the situation. If r = −0.9, then there is a very
strong negative linear relationship between X and Y. If r = 0.9, then there is
a very strong positive linear relationship between X and Y. If r = −0.75, then
there is a strong negative linear relationship between X and Y. If r = 0.75,
then there is a strong positive linear relationship between X and Y. If
r = −0.5, then there is a fair negative linear relationship between X and Y. If
r = 0.5, then there is a fair positive linear relationship between X and Y. Any
value of r between −0.5 and 0.5 indicates a weak negative or weak positive
linear relationship between X and Y; in that case, using simple linear
regression to model the situation is not advisable.
Remark
The interpretation of r must clearly state the magnitude or size and direction of
the strength of relationship between the random variables under investigation.

9.4.4. The coefficient of determination, R²

The coefficient of determination establishes the amount of variability in the
response variable that is explained, or accounted for, by fitting a regression
model. It is obtained by squaring the correlation coefficient r, and we have
0 ≤ r² ≤ 1. Expressing r² as a percentage,

R² = r² × 100%

gives the amount of variability in the response variable that is explained by
the fitted regression model.

Exercise
Consider the following quantities for two random variables X and Y. Let X be
the cause variable and Y be the effect variable.

n = 20, Σx = 24, Σy = 1843, Σy² = 170045, Σx² = 29 and Σxy = 2215

a) Is it appropriate to employ simple linear regression analysis on these


data?

b) Fit the regression model using the method of least squares. What is the
meaning of the regression coefficients?

c) How much of the variability in Y is explained by the fitted linear regres-


sion model above.

d) Using the fitted regression model, what would be the value of Y when X =
2? What is the residual?

e) What is the value of Y when X = 25?

f) Comment on the usefulness of the values in parts (d) and (e) given that,
for the twenty observations, Σx = 24. Hint: You are expected to reflect on the
uses and abuses of the regression analysis technique.
Chapter 10

Index numbers

10.1. Objectives

After reading this chapter, you will be conversant with


1. The concept of Index Numbers

2. Uses of Index Number

3. Different types of Index Numbers

4. Aggregates method of constructing Index Numbers

10.2. Introduction

Index numbers are today among the most widely used statistical indicators of
changes in the values of commodities. Because they are generally used to
indicate the state of the economy, index numbers are called barometers of
economic activity. Index numbers are used in comparing production, sales, or
changes in exports or imports over a certain period of time, or wages as a
measure of the cost of living. It is well known that the wage contracts of
workers in our country are tied to the cost of living, which is measured by
index numbers.

10.3. What is an Index Number?

An index number is a statistical measure designed to show changes in a vari-


able or a group of related variables with respect to time, geographic location or
other characteristics such as income, profession, etc.

10.3.1. Characteristics of Index Numbers

1. They are expressed as a percentage: an index number is calculated as a
ratio of the current value to a base value and expressed as a percentage.
It must be clearly understood that the index number for the base year is
always 100. An index number is commonly referred to as an index.

2. Index numbers are specialized averages: an index number is an average
with a difference. An index number is used for purposes of comparison in
cases where the series being compared could be expressed in different
units, e.g. a manufactured products index is constructed using items like
dairy products, sugar, edible oils, tea and coffee. These items naturally
are expressed in different units, like sugar in kg and milk in litres. The
index number is obtained as an average of all these items, which are
expressed in different units. An ordinary average, on the other hand, is a
single figure representing a group expressed in the same units.

3. Index numbers measure changes that are not directly measurable: an index
number is used for measuring the magnitude of changes in phenomena that
are not capable of direct measurement. Index numbers essentially capture
the changes in a group of related variables over a period of time. For
example, if the index of industrial production is 215.1% in 1992 (base
year 1980), it means that industrial production in that year was up by
2.15 times compared to 1980. It does not, however, mean that the net
increase in the index reflects an equivalent increase in industrial
production in all sectors of the industry. Some sectors might have
increased their production more than 2.15 times, while other sectors may
have increased their production only marginally.

10.3.2. Uses of Index Numbers

Index numbers are used to:

1. Establishing trends - index numbers, when analyzed, reveal a general trend
of the phenomenon under study. For example, index numbers of unemployment
of the country not only reflect the trends in the phenomenon but are also
useful in determining factors leading to unemployment.

2. Helping in policy making - it is widely known that the dearness allowances
paid to employees are linked to the cost of living index, generally the
consumer price index. From time to time it is the cost of living index
which forms the basis of many a wage agreement between the employees'
union and the employer. Thus index numbers guide policy making.

3. Determining the purchasing power of the dollar - usually index numbers are
used to determine the purchasing power of the dollar. Suppose the consumer
price index for urban non-manual employees increased from 100 in 2004 to
202 in 2006. The real purchasing power of the dollar can then be found as
follows:

100/202 = 0.495

This calculation means that if a dollar was worth $100 in 2004, its
purchasing power is $49.50 in 2006.

4. Deflating time series data - index numbers play a vital role in adjusting
the original data to reflect reality. For example, nominal income (income
at current prices) can be transformed into real income (reflecting the
actual purchasing power) by using an income deflator. Similarly, assume
that industrial production is represented in value terms as a product of
volume of production and price. If the subsequent year's industrial
production were higher by 20% in value, the increase may not be a result
of an increase in the volume of production, as one might assume, but
because of an increase in price. The inflation which has caused the
increase in the series can be eliminated by the use of an appropriate
price index, thus making the series real.

10.4. Types of Index Numbers

There are three principal types of indices which are: i) Price index, ii) Quantity
index and iii) Value index.

1. Price index - the most frequently used form of index number is the price
index. A price index compares changes in the prices of commodities. If an
attempt is being made to compare the prices of edible oils this year to the
prices of edible oils last year, it involves, firstly, a comparison of two
price situations over time and, secondly, the heterogeneity of the
commodities given the various varieties of edible oils. By constructing a
price index number, we summarize the price movements of each type of oil in
this group of edible oils into a single number called the price index. The
Wholesale Price Index (WPI) and the Consumer Price Index (CPI) are some of
the popularly used price indices.

2. Quantity index - a quantity index measures the change in quantity from
one period to another. If, in the above example, instead of the price of
edible oils we are interested in the quantum of production of edible oils
in those years, then we are comparing quantities in two different years or
over a period of time. It is the quantity index that needs to be
constructed here. The popular quantity index used in this country and
elsewhere is the index of industrial production (IIP). The index of
industrial production measures the increase or decrease in the level of
industrial production in a given period compared to some base period.

3. Value index - the value index is a combination index. It combines price
and quantity changes to present a more complete comparison. The value
index as such measures changes in net monetary worth. Though the value
index enables comparison of the value of a commodity in a year to its
value in a base year, it has limited use. Usually the value index is used
in sales, inventories, foreign trade, etc. Its limited use is owing to the
inability of the value index to distinguish the effects of price and
quantity separately.

What, then, are the methods of constructing index numbers?

10.5. Methods of constructing Index Numbers

There are two approaches for constructing an index number, namely the
aggregates method and the average of relatives method. The index numbers
constructed by either of these methods could be either weighted or unweighted.

10.5.1. Aggregate method

Under the aggregates method we have the weighted and the unweighted aggregates
indices.

Unweighted Aggregates Index


An unweighted aggregates index is calculated by summing the current (given)
year's elements and then dividing the result by the sum of the same elements
for the base period. To construct a price index, the following formula may be
used:

Unweighted Aggregate Price Index = (ΣP1 / ΣP0) × 100%                     (10.1)

where ΣP1 = the sum of all elements in the composite for the current year and
ΣP0 = the sum of all elements in the composite for the base year.
Merits and demerits of the unweighted aggregates method

Merit - this is the simplest method of constructing index numbers.

Demerits - it does not consider the relative importance of the various
commodities involved. The unweighted index does not reflect reality, since the
price changes are not linked to any usage or consumption levels.

Illustration
Construct an unweighted index for the three commodities taking 2010 as the
base year.

Prices
Commodities
2010 2012
Oranges (Pockets) 20 28
Milk (Ltr) 5 8
Gas 76 100

The unweighted aggregate price index (UAPI) is given by:

UAPI = (ΣP1 / ΣP0) × 100% = [(28 + 8 + 100) / (20 + 5 + 76)] × 100%
UAPI = (136 / 101) × 100%
UAPI = 134.65%

Above, we measured changes in general price levels on the basis of changes in
the prices of a few items. A comparison has been made between the prices of
2012 and those of the base year 2010.

Interpretation
The price index of 134.65% means that the prices of commodities rose by 34.65%
from 2010 to 2012.

10.5.2. Weighted Aggregates Index

In a weighted aggregates index, weights are assigned according to the
significance of the items, and consequently the weighted index improves the
accuracy of the general price level estimate based on the calculated index. The
level of consumption of an item is taken as a measure of its importance in
computing a weighted aggregates index. There are various methods of assigning
weights to an index. The more important ones are: i) Laspeyres, ii) Paasche,
iii) Fixed Weight Aggregates and iv) Fisher's Ideal Method.

10.6. Laspeyres Index

The Laspeyres method uses the quantities consumed during the base period in
computing the index number. This method is also the most commonly used method,
which incidentally requires quantity measures for only one period. The
Laspeyres index can be calculated using the following formula:

Laspeyres Price Index (LPI) = (ΣP1Q0 / ΣP0Q0) × 100%                      (10.2)

where P1 = prices in the current year, P0 = prices in the base year and Q0 =
quantities in the base year.

The Laspeyres price index calculates the change in the aggregate value of the
base year's list of goods when valued at current year prices. In other words,
the Laspeyres index measures the difference between the theoretical cost in a
given year and the actual cost in the base year of maintaining a standard of
living as in the base year. The Laspeyres quantity index can be calculated
using the formula:

Laspeyres Quantity Index (LQI) = (ΣP0Q1 / ΣP0Q0) × 100%                   (10.3)

where Q1 = quantities in the current year and Q0, P0 are as defined earlier.

Illustration
Calculate the Laspeyres price and quantity indices for the following production
data.

                Prices           Production
Product      P0       P1       Q0      Q1       P0Q0        P1Q0        P0Q1
            1985     1990     1985    1990
Rice        46.60    58.00     700     910    32620.00    40600.00    42406.00
Sugar       14.57    17.92     620     950     9033.40    11110.40    13841.50
Salt        69.46    85.10     205     300    14239.30    17445.50    20838.00
Wheat       33.84    40.30     330     470    11167.20    13299.00    15904.80

Solution
The Laspeyres price index is:

LPI = (ΣP1Q0 / ΣP0Q0) × 100%
LPI = [(40600.00 + 11110.40 + 17445.50 + 13299.00) /
       (32620.00 + 9033.40 + 14239.30 + 11167.20)] × 100%
LPI = (82454.90 / 67059.90) × 100%
LPI = 122.96%

A 22.96% increase in prices for 1990 compared to 1985.

The Laspeyres quantity index is:

LQI = (ΣP0Q1 / ΣP0Q0) × 100%
LQI = (92990.30 / 67059.90) × 100%
LQI = 138.67%

A 38.67% increase in quantities for 1990 compared to 1985.
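The Laspeyres computations can be sketched as a minimal Python example using the production data above:

```python
# Laspeyres price and quantity indices (equations 10.2 and 10.3):
# base-period quantities weight the price index; base-period prices
# weight the quantity index.
p0 = [46.60, 14.57, 69.46, 33.84]   # 1985 prices: rice, sugar, salt, wheat
p1 = [58.00, 17.92, 85.10, 40.30]   # 1990 prices
q0 = [700, 620, 205, 330]           # 1985 quantities
q1 = [910, 950, 300, 470]           # 1990 quantities

def aggregate(prices, quantities):
    """Sum of price * quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))

lpi = aggregate(p1, q0) / aggregate(p0, q0) * 100   # price index ≈ 122.96
lqi = aggregate(p0, q1) / aggregate(p0, q0) * 100   # quantity index ≈ 138.67
```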

Merits and demerits of Laspeyres method

Merits - the Laspeyres index is simpler to calculate and can be computed as
soon as the current year prices are known, since the weights in the price index
are base year quantities. This enables easy comparison of one index with
another.

Demerits - the Laspeyres index tends to overestimate the rise in prices, i.e.
it has an upward bias. There is usually a decrease in the consumption of items
whose prices have risen considerably, so the use of base year quantities
assigns too much weight to the prices that have increased the most, and the net
result is that the numerator of the Laspeyres index is too large. Similarly,
when prices go down, consumers tend to demand more of the items whose prices
have declined the most, so the use of base period quantities assigns too little
weight to the prices that have decreased the most, and the net result is again
that the numerator of the Laspeyres index is too large. This is a major
disadvantage of the Laspeyres index. However, the Laspeyres index remains the
most popular by reason of its practicability. In most countries, index numbers
are constructed using the Laspeyres formula.

10.7. Paasche Index

The Paasche index calculation is similar to the Laspeyres index calculation.
The difference is that the Paasche method uses quantity measures for the
current period rather than for the base period. The Paasche index can be
calculated using the formula:

Paasche Price Index (PPI) = (ΣP1Q1 / ΣP0Q1) × 100%                        (10.4)

where P1 = prices in the current year, P0 = prices in the base year and Q1 =
quantities in the current year. The Paasche quantity index is given by:

Paasche Quantity Index (PQI) = (ΣP1Q1 / ΣP1Q0) × 100%                     (10.5)

Merits and demerits of Paasche's index

Merit - Paasche's index attaches weights according to their significance.

Demerits - the Paasche index is not frequently used in practice when the
number of commodities is large. This is because, for the Paasche index, revised
weights or quantities must be computed for each year examined. Such information
is either unavailable or hard to gather, adding to the data collection expense,
which makes the index unpopular. The Paasche index tends to underestimate the
rise in prices, i.e. it has a downward bias.

Illustration
The table below represents prices and quantities of commodities A, B, C and D
for the years 1992 and 1993. Calculate the Paasche price and quantity indices.

                  1992               1993
Commodity    Price  Quantity    Price  Quantity    P0Q0   P0Q1   P1Q0   P1Q1
A              3       18         4       15        54     45     72     60
B              5        6         5        9        30     45     30     45
C              4       20         6       26        80    104    120    156
D              1       14         3       15        14     15     42     45
Total                                              178    209    264    306

Solution
The Paasche price index (PPI) is:

PPI = (ΣP1Q1 / ΣP0Q1) × 100%
PPI = (306 / 209) × 100%
PPI = 146.41%

The Paasche quantity index (PQI) is:

PQI = (ΣP1Q1 / ΣP1Q0) × 100%
PQI = (306 / 264) × 100%
PQI = 115.91%

Paasche price index is 146.41% showing a 46.41% increase.

The difference between the Paasche index and the Laspeyres index reflects the
change in consumption patterns of the commodities A, B, C and D used in that
table. As the weighted aggregates price index for the set of prices was 148.31%
using the Laspeyres method and 146.41% using the Paasche method for the same
set, it indicates a trend towards less expensive goods. Generally, the
Laspeyres and Paasche methods tend to produce opposite extremes in index values
computed from the same data. The use of the Paasche index requires the
continuous use of new quantity weights for each period considered. As opposed
to the Laspeyres index, the Paasche index generally tends to underestimate
prices, i.e. it has a downward bias. Because people tend to spend less on goods
when their prices are rising, the use of the Paasche index, which is based on
current weighting, produces an index which understates the rise in prices,
showing a downward bias. Since all prices or all quantities do not move in the
same order, the goods which have risen in price more than others at a time when
prices in general are rising will tend to have smaller current quantities, and
they will thus have less weight in the Paasche index.
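The Paasche computations for the table above can be sketched as:

```python
# Paasche price and quantity indices (equations 10.4 and 10.5):
# current-period quantities weight the price index; current-period
# prices weight the quantity index.
p0 = [3, 5, 4, 1]     # 1992 prices for commodities A-D
q0 = [18, 6, 20, 14]  # 1992 quantities
p1 = [4, 5, 6, 3]     # 1993 prices
q1 = [15, 9, 26, 15]  # 1993 quantities

def aggregate(prices, quantities):
    """Sum of price * quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))

ppi = aggregate(p1, q1) / aggregate(p0, q1) * 100   # 306/209 ≈ 146.41
pqi = aggregate(p1, q1) / aggregate(p1, q0) * 100   # 306/264 ≈ 115.91
```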

10.8. Fisher’s Index

Prof. Irving Fisher proposed a formula for constructing index numbers as the
geometric mean of the Laspeyres and Paasche indices, i.e. Fisher's quantity and
price indices are calculated as:

Fisher's Quantity Index = √(Laspeyres Quantity Index × Paasche Quantity Index)
Fisher's Quantity Index = √(LQI × PQI)                                    (10.6)

Fisher's Price Index = √(Laspeyres Price Index × Paasche Price Index)
Fisher's Price Index = √(LPI × PPI)                                       (10.7)

The following advantages can be cited in favor of Fisher's index:

1. Theoretically, the geometric mean is considered the best average for the
construction of index numbers, and Fisher's index uses the geometric mean.

2. As already noted, the Laspeyres index and the Paasche index exhibit
opposing biases, and Fisher's index reduces their respective biases. In
fact, Fisher's ideal index is free from any bias, as has been amply
demonstrated by the time reversal and factor reversal tests.

3. Both the current year and base year prices and quantities are taken into
account by this index. The index is not widely used owing to the practical
limitations of collecting data. Fisher's ideal quantity index can be found
using formula (10.6) above.
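Continuing the A–D commodity illustration, Fisher's indices can be sketched as geometric means of the Laspeyres and Paasche results (the numerical values in the comments follow from that table, not from the text):

```python
# Fisher's ideal indices (equations 10.6 and 10.7) for the A-D data.
from math import sqrt

p0 = [3, 5, 4, 1]     # 1992 prices
q0 = [18, 6, 20, 14]  # 1992 quantities
p1 = [4, 5, 6, 3]     # 1993 prices
q1 = [15, 9, 26, 15]  # 1993 quantities

def aggregate(prices, quantities):
    return sum(p * q for p, q in zip(prices, quantities))

lpi = aggregate(p1, q0) / aggregate(p0, q0) * 100   # Laspeyres price
ppi = aggregate(p1, q1) / aggregate(p0, q1) * 100   # Paasche price
lqi = aggregate(p0, q1) / aggregate(p0, q0) * 100   # Laspeyres quantity
pqi = aggregate(p1, q1) / aggregate(p1, q0) * 100   # Paasche quantity

fpi = sqrt(lpi * ppi)   # Fisher price index, lies between LPI and PPI
fqi = sqrt(lqi * pqi)   # Fisher quantity index
# fpi ≈ 147.36, fqi ≈ 116.66
```

Note how the geometric mean lands between the two component indices, illustrating how Fisher's index moderates the upward bias of Laspeyres and the downward bias of Paasche.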
