Full Stats Notes
Contents

1 Introduction
1.1. Overview of Statistics
1.2. Definition of terms
1.3. Sampling techniques
1.3.1. Types of sampling
1.4. Probability sampling methods
1.4.1. Simple random sampling
1.4.2. Systematic random sampling
1.4.3. Stratified sampling
1.4.4. Cluster sampling
1.5. Non-probability sampling methods
1.5.1. Convenience sampling method
1.5.2. Quota sampling method
1.5.3. Expert sampling method
1.5.4. Chain referral sampling method
1.6. Sampling errors
1.7. Data collection methods
1.7.1. Observation method
1.7.2. Interview method
1.7.3. Experimentation method
1.8. Worked examples
4 Measures of Dispersion
4.1. Introduction
4.2. The Range
4.3. The Variance
4.4. The Standard deviation
4.5. The Coefficient of variation (CV)
4.6. Exercises

5 Basic Probability
5.1. Introduction
5.2. Definition of terms
5.3. Approaches to probability theory
5.4. Properties of probability
5.5. Basic probability concepts
5.6. Types of events
5.7. Laws of probability
5.7.1. Addition Law
5.7.2. Multiplication laws
5.8. Types of probabilities
5.9. Contingency Tables
5.10. Tree diagrams
5.11. Counting rules
5.11.1. Multiplication rule
5.11.2. Permutations
5.11.3. Combinations
5.12. Exercise
6 Probability Distributions
6.1. Introduction
6.2. Definition
6.3. Random variables
6.4. Random variable probability distributions
6.5. Properties of discrete random variable distribution
6.6. Probability terminology and notation
6.7. Discrete probability distributions
6.7.1. Bernoulli distribution
6.7.2. Binomial distribution
6.7.3. Poisson distribution
6.8. Continuous probability distributions
6.8.1. The Uniform distribution
6.8.2. The Exponential distribution
6.8.3. The Normal distribution
6.8.4. The standard normal distribution

7 Confidence Intervals
7.1. Introduction
7.2. Confidence Intervals
7.3. Confidence interval for the Population Mean
7.4. Confidence interval for a population proportion
7.5. Confidence interval for the population variance
7.6. Confidence interval for population standard deviation
7.7. Confidence interval for difference of two populations means
7.7.1. Case 1: If population variance is known
7.7.2. Case 2: If population variances are unknown

8 Hypothesis Testing
8.1. Definitions and critical clarifications
8.2. General procedure on Hypotheses Testing
8.3. Hypothesis testing concerning Population Mean
8.3.1. Case 1: If the population variance is known
8.3.2. Case 2: If the population variance is not known
8.4. Hypothesis testing concerning the Population Proportion
8.5. Comparing two populations
8.5.1. Hypothesis testing concerning difference between two population means
8.6. Independent and dependent samples
8.6.1. Advantages of paired comparisons
8.6.2. Disadvantages of paired comparisons
8.7. Test Procedure concerning difference of two Population Proportions
8.8. Tests for Independence: χ²-test
8.9. Ending Remarks
Chapter 1

Introduction
An understanding of statistics allows one to: (i) perform simple statistical data manipulation and analysis; (ii) intelligently prepare and interpret reports in numerical terms; (iii) communicate effectively with statistical analysts; and (iv) make good decisions.
Statistics
Definition 1
Statistics refers to the methodology of collecting, presenting and analysing data, and the use of such data.
Definition 2
In common usage, it refers to numerical data. This means any collection of data or
information constitutes what is referred to as Statistics. Some examples under this
definition are:
1. Vital statistics - These are numerical data on births, marriages, divorces, communicable diseases, harvests, accidents etc.

2. Social statistics - These are numerical data on housing, crime, education etc.
Definition 3 - Statistics is making sense of data.
In Statistics, we usually deal with large volumes of data, making it difficult to study each observation in order to draw conclusions about the source of the data. We therefore seek statistical methods that summarise the data so that we can draw conclusions about these data without scrutinising each observation. Such methods fall under the area of statistics called descriptive statistics.
Population
A population is a collection of elements about which we wish to make an inference.
The population must be clearly defined before the sample is taken.
Parameter(s)
These are numeric measures derived from a population e.g. population mean (µ), population variance (σ²) and population standard deviation (σ).
Data
Data is readily available from a variety of sources and is of varying quality and quantity. Precisely, data is an individual observation on a variable and, in itself, conveys no useful information.
Information
To make sound decisions, one needs good quality information. Information must be timely, accurate, relevant, adequate and readily available. Information is defined as processed data.
Random variable
A variable is any characteristic being measured or observed. Since a variable can take on different values at each measurement, it is termed a random variable. Examples include sales, company turnover, weight, height, yield, number of babies born, colour of vehicle, etc.
Target population
This is the population whose properties are estimated via a sample; it is usually the 'total' population of interest.
Sample
A sample is a collection of sampling units drawn from a population. Data is obtained
from the sample and used to describe characteristics of the population. A sample can also be defined as a subset, part or fraction of a population.
Statistic(s)
The term statistic, with a lowercase s, indicates a numeric measure derived from a sample e.g. sample mean (x̄), sample variance (s²) and sample standard deviation (s).
Sampling frame
A sampling frame is a list of sampling units. It is a set of information used to identify a sample population for statistical treatment. It includes a numerical identifier for each individual, plus other identifying information about characteristics of the individuals, to aid in analysis and allow for division into further frames for more in-depth analysis.
Sampling
Sampling is a process used in statistical analysis in which a predetermined number of observations is taken from a larger population. The methodology used to sample from a larger population depends on the type of analysis being performed; the methods include simple random sampling, systematic sampling and cluster sampling. These sampling methods are discussed later.
Sampling units
Sampling units are non-overlapping collections of elements from the entire population. A sampling unit is a member of both the sampling frame and the sample. The sampling units partition the population of interest.
Probability sampling
Probability sampling has the distinguishing characteristic that each unit in the population has a known, non-zero probability of being included in the sample. These probabilities are usually, but not necessarily, equal for each unit. Probability sampling eliminates the danger of bias in the selection process due to one's own opinion or desire.
Non-probability Sampling
Non-probability sampling is a process where probabilities cannot be assigned to the
units objectively and hence it is difficult to determine the reliability of the sample
results in terms of probability. A sample is selected according to one's convenience or judgement rather than by chance. It is a good technique for pilot or feasibility studies. Exam-
ples include purposive sampling, convenience sampling and quota sampling. In non-
probability sampling, the units that make up the sample are collected with no specific
probability structure in mind e.g. units making up the sample through volunteering.
Remark
We shall focus on probability sampling because, if an appropriate technique is chosen, it assures sample representativeness and hence the sampling errors can be estimated.
Sample size
The sample size determines the degree of reliability of the conclusions that we can obtain, i.e. an estimate of the error that we are going to have. An inappropriate selection of the elements of the sample can cause further errors when we want to estimate the corresponding population parameters.
The four methods of probability sampling are simple random, systematic, stratified
and cluster sampling methods.
Simple random sampling requires that each element of the population have an equal chance of being selected. A simple random sample is selected by assigning a number to each element in the population list and then using a random number table to draw out the elements of the sample. The element with the number drawn out makes it into the sample. The population is "mixed up" before a previously specified number, n (the sample size), of elements is selected at random. Each member of the population is selected one at a time, independently of one another. However, it is noted that all elements of the study population must either be physically present or listed.

Regardless of the process used for this method, it can be laborious, especially when the list of the population is long or it is completed manually without the aid of a computer. A simple random sample can be obtained using a calculator's random key, a computer using the Excel function =rand(), or random number tables.
In this method, every set of n elements in the population has an equal chance of being
selected as the sample unit.
• Numbering of the elements in a population may be time consuming e.g. for large
populations.
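As a minimal sketch of how this selection could be done in software, the snippet below draws a simple random sample with Python's random.sample, which gives every element (and every set of n elements) an equal chance of selection. The numbered employee list is purely hypothetical.

```python
import random

# Hypothetical sampling frame: a numbered list of 500 employees.
population = [f"employee_{i:03d}" for i in range(1, 501)]

n = 20  # required sample size

# random.sample draws n distinct elements without replacement,
# so each element has the same chance of being included.
simple_random_sample = random.sample(population, n)
print(simple_random_sample)
```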
For example, one might select the element that is 17th, 27th, 37th, 47th and so on up to the 997th. Care must be taken when using systematic sampling to ensure that the original population list has not been ordered in a way that introduces any non-random factors into the sampling.
The official may initially randomly select the 15th student. The elevation factor is k = 4000/200 = 20. The official would then keep adding 20, selecting the 35th, 55th, 75th student and so on to register for the tour of regional universities, until the end of the list is reached.
Remark
In cases where the population is large and the population list is available, systematic
sampling is usually preferred over simple random sampling since it is more convenient
to the experimenter.
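A brief sketch of systematic selection follows, assuming a hypothetical numbered list of 4000 students as in the illustration above; the function name is illustrative only.

```python
import random

def systematic_sample(frame, n):
    """Select every k-th unit after a random start, where k = N // n."""
    N = len(frame)
    k = N // n                        # sampling interval (elevation factor)
    start = random.randint(0, k - 1)  # random start within the first interval
    return frame[start::k][:n]

# Hypothetical frame of 4000 students, numbered 1 to 4000.
students = list(range(1, 4001))
sample = systematic_sample(students, 200)   # k = 20
print(sample[:5])   # e.g. [15, 35, 55, 75, 95]
```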
Stratified sampling is used when representatives from each homogeneous subgroup within the population need to be represented in the sample. The first step in stratified sampling is to
divide the population into subgroups called strata based on mutually exclusive crite-
ria. Random or systematic samples are then taken from each subgroup. The sampling
fraction for each subgroup may be taken in the same proportion as the subgroup has
in the population.
Remark
Stratified sampling can also sample an equal number of items from each subgroup.
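The proportional allocation described above can be illustrated with a short sketch; the stratum sizes below are assumed purely for illustration.

```python
# Proportional allocation: each stratum contributes to the sample in the same
# proportion that it has in the population.
strata_sizes = {"urban": 6000, "peri-urban": 3000, "rural": 1000}  # assumed N_h values
N = sum(strata_sizes.values())   # population size, 10 000
n = 200                          # overall sample size

allocation = {h: round(n * N_h / N) for h, N_h in strata_sizes.items()}
print(allocation)   # {'urban': 120, 'peri-urban': 60, 'rural': 20}
```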
In cluster sampling, the population that is being sampled is divided into naturally occurring groups called clusters. Each cluster should be as heterogeneous as possible, mirroring the population, so that a cluster is representative of the population. A random sample is then taken from within one or more selected clusters.
Illustration
An organization with 300 small branches providing a service countrywide has an employee at the HQ who is interested in auditing compliance with some company standards. The employee might use cluster sampling to randomly select 40 branches as representatives for the audit and then randomly sample coding systems for auditing from just those 40.
Remark
Cluster sampling can tell us a lot about that particular cluster, but unless the clusters
are selected randomly and a lot of clusters are sampled, generalizations cannot always
be made about the entire population.
Systematic sampling
Each member of the study population is either assembled or listed, a random start is
designated, then members of the population are selected at equal intervals
Stratified sampling
Each member of the study population is assigned to a homogeneous subgroup or stra-
tum, and then a random sample is selected from each stratum.
Cluster sampling
Each member of the study population is assigned to a heterogeneous subgroup or clus-
ter, then clusters are selected at random and all members of a selected cluster are
included in the sample.
Convenience sampling is a sampling method based on the proximity of the population elements to the decision maker - being at the right place at the right time. Elements nearby are selected and those not in close physical or communication range are not considered. The method is also called the availability sampling method.
Expert sampling is a method in which the decision maker has direct or indirect control over which elements are to be included in the sample. The method is appropriate when the decision maker feels that some members have better or more information than others, or that some members are more representative than others. The method is also called the judgemental sampling method.
In chain referral sampling, the researcher starts with a person who displays qualities of interest, who then refers the researcher to the next person, and so on. The method is also called the snowballing or networking sampling method.
Selection error
Selection error occurs when some elements of the population have a higher probability of being selected than others. Consider a scenario where the manager of a local supermarket wishes to measure how satisfied his customers are. He proceeds to interview some of them from 08:00 to 12:00. Clearly, the customers who do their shopping in the afternoon are left out and will not be represented, making the sample unrepresentative of all the customers. Such errors can be avoided by choosing the sample so that all the customers have the same probability of being selected. This is a sampling error.
Non-response error
It is possible that some of the elements of the population do not want to, or cannot, answer certain questions. It may also happen, when we have a questionnaire including personal questions, that some of the members of the population do not answer honestly or would rather avoid answering. These errors are generally very complicated to avoid, but if we want to check the honesty of answers, we can include some questions, called filter questions, to detect whether the answers are honest. This is a non-sampling error.
Remark
A sample that is not representative of the population is called a biased sample. Questions relating to sample selection naturally arise. These are: when drawing conclusions about the population, how many of the population elements are represented by each one of the sample elements? What proportion of the population are we selecting? The responses lie in the data collection methods discussed below.
The three data collection methods are: observation, interviews and experimenta-
tion. Depending on the type of research and data to be collected, different methods
can be used to collect that data set.
This method comprises direct observation and desk research. Direct observation involves collecting data by observing the item in action. Examples for this method are: pedestrian flow at a junction, traffic flow at a road intersection, purchase behaviour of a commodity in a shop, quality control inspection etc. An advantage of this method is that the respondent behaves in a natural way since he is not aware that he is being observed. A disadvantage is that it is a passive form of data collection.
The interview method collects primary data through direct questioning. A questionnaire is the instrument used to structure the data collection process. Three approaches to data collection using interviews are: personal, postal and telephone interviews.
Personal interviews
A questionnaire is completed through face-to-face contact with the respondent. A re-
searcher carries out an interview with the respondent through use of guided questions.
Advantages for this method are: high response rate, it allows probing for reasons,
data collection is immediate, data accuracy is assured, useful for technical data, non-
verbal responses can be observed and noted, more questions can be asked, responses
are spontaneous and use of aided-recall questions is possible. Disadvantages of this method are that it is time consuming, it requires trained and experienced interviewers, fewer interviews are conducted because of cost and time constraints, and biased data can be collected if the interviewer is inexperienced.
Telephone interviews
The interview is conducted through telephone between the interviewer and intervie-
wee. The researcher asks questions from a guided questionnaire through phoning
the respondent. Advantages of this method are: it allows quicker contact with geo-
graphically dispersed respondents, callbacks can be made if respondent is not initially
available, low cost, interviewer probing is possible, clarity on questions can be provided
by the interviewer and a larger sample of respondents can be reached in short space
of time. Disadvantages are that respondent anonymity is lost, non-verbal responses cannot be observed, trained interviewers are required (hence more costly), interviewer bias is possible, the respondent may terminate the interview prematurely, and sampling errors are compounded if many respondents do not have telephones.
Postal surveys
When target population is large or geographically dispersed then use of postal ques-
tionnaires is considered most suitable. It involves posting questionnaires to the se-
lected sampling units. Advantages of this method are that a larger sample of respondents can be reached, it is very cost effective, interviewer bias is eliminated, respondents have more time to consider their responses, anonymity of respondents is assured resulting in more honest responses, and respondents are more willing to answer personal questions. The disadvantages of this method are: low response rate, respondents cannot get clarity on some questions, and mailed questionnaires must be short and simple.
Chapter 2

Data and Data Presentation
2.1. Introduction
A statistician collects data, analyses it using statistical techniques, interprets the results and makes conclusions and recommendations on the basis of the analysis. The word data keeps turning up in our discussion. Data is the "blood of statistics"; it refers to the raw, unprocessed facts or figures.

The world of statistics revolves around data; there is no statistics without data. What is data? How is it collected? Why do we collect it? These are the questions to be answered in this chapter.
An understanding of the nature of data is necessary for two reasons. It enables a user to assess data quality and to select the appropriate statistical method to use to analyse the data.
The quality of data is influenced by three factors: the type of data, its source and the method used to collect it. The type of data gathered determines the type of analysis which can be performed on the data. Certain statistical methods are valid for certain data types only. An incorrect application of a statistical method to a particular data type can render the findings invalid and give incorrect results.

Data type is determined by the nature of the random variables which the data represents. Random variables are essentially of two kinds: qualitative and quantitative.
Each random variable category is associated with a different type of data. There are
two classifications of data types.
Data measurement scales include nominal, ordinal, interval and ratio-scaled data.
Nominal-scaled data
Objects or events are distinguished on the basis of a name. Nominal-scaled data is associated mainly with qualitative random variables. Where data of a qualitative random variable is assigned to one of a number of categories of equal importance, the data is nominal-scaled. Each observation of the random variable is assigned to only one of the categories provided. Arithmetic calculations cannot be meaningfully performed on the coded values assigned to each category. They are only numeric codes which are arbitrarily assigned and can be counted. Nominal-scaled data is the weakest form of data, since only a limited range of statistical analysis can be performed on such data.
Ordinal-scaled data
Objects or events are distinguished on the basis of the relative amounts of some characteristic they possess. The magnitude between measurements is not reflected in the
rank. Such data is associated mainly with qualitative random variables. Like nominal-
scaled data, ordinal-scaled data is also assigned to only one of a number of coded cat-
egories, but there is now a ranking implied between the categories in terms of being
better, bigger, longer, older, taller, or stronger, etc. While there is an implied differ-
ence between the categories, this difference cannot be measured exactly. That is, the
distance between categories cannot be quantified nor assumed to be equal. Ordinal-
scaled data is generated from ranked responses in market research studies.
There is a wider range of valid statistical methods (i.e. the area of non-parametric
statistics) available for the analysis of ordinal-scaled data than there is for nominal-
scaled data. Ordinal-scaled data is also generated from a ”counting process”.
Interval-scaled data
Interval-scaled data is associated with quantitative random variables. Differences
can be measured between values of a quantitative random variable. Thus interval-
scaled data possesses both order and distance properties. Interval-scaled data, how-
ever, does not possess an absolute origin. Therefore the ratio of values cannot be mean-
ingfully compared for interval-scaled data. The absolute difference makes sense when
interval-scaled data has been collected.
Interval-scaled data is most often generated in marketing studies through rating responses on a continuum scale. A wide range of statistical techniques can be applied to interval-scaled data.

Ratio-scaled data
This data is associated mainly with quantitative random variables. If the full range of
arithmetic operations can be meaningfully performed on the observations of a random
variable, the data associated with that random variable is termed ratio-scaled. It is a
numeric data with a zero origin. The zero origin indicates the absence of the attribute
being measured.
Such data are the strongest form of statistical data which can be gathered and lends
itself to the widest range of statistical methods. Ratio-scaled data can be manipulated
meaningfully through normal arithmetic operations. Ratio-scaled data is gathered
through a measurement process. It should be noted that if ratio-scaled data is grouped
into categories, the data type becomes ordinal-scaled. This then reduces the scope for
statistical analysis on the random variable.
When data capturing instruments are set up, care must be exercised to ensure that
the most useful form of data is captured. However, this is not always possible for
reasons of convenience, cost and sensitivity of information. This applies particularly
to random variables such as age, personal income, company turnover and consumer
behaviour questions of a personal nature. The functional area of marketing generates mostly categorical (nominal/ordinal) data arising from consumer studies, while the areas of finance or accounting and production generate mainly quantitative (ratio)
data. Human resources management generates a mix of qualitative and quantitative
data for analysis.
Data type 2
Discrete data
A random variable whose observations can take on only specific values, usually only
integer values, is referred to as a discrete random variable. In such instances, certain
values are valid, while others are invalid.
Continuous data
A random variable whose observations take on any value in an interval is said to gen-
erate continuous data. This means that any value between a lower and an upper limit
is valid.
Examples of continuous data include the tensile strength of a material, the speed of an aircraft and the length of a ladder.
Data for statistical analysis are available from many different sources. There are two classifications of data sources: internal or external, and primary or secondary sources.
Data which is captured at the point where it is generated is called primary data. Such data is captured for the first time and with a specific purpose in mind. Examples of primary data sources are similar to those for internal data sources but also include survey data, that is, personnel, salary and market research surveys.
Data collected and processed by others for a purpose other than the problem at hand are called secondary data. Such data are already in existence, either within or outside an organisation; that is, one can get both internal and external secondary data. The problem at hand is to determine whether data is primary or secondary. Examples of internal secondary data sources are: aged market research figures, previous financial statements of your company and past sales reports. Examples of external secondary data sources are reports produced by external data sources.
A frequency distribution table is a table that summarises a random variable, showing how it is distributed from the lowest to the highest value and the number of occurrences (frequencies) of the random variable values. It can show the distribution of exact values of the random variable or of values grouped into class intervals. Frequency distribution tables can display values for grouped or ungrouped data sets. An example of a frequency distribution table is shown below.
Marks Frequencies
10 - 19 7
20 - 29 10
30 - 39 9
40 - 49 3
50 - 59 5
60 - 69 1
A pie chart, as the name suggests, is a circle divided into segments like a pie cut into pieces from the centre of the circle going outwards. Each segment represents one or more values taken by a variable. Such charts are used to display qualitative data. The example below illustrates how to construct and interpret a pie chart.
We now express these age groups as proportions or percentages. There are only 4 groups. What we wish to do is to represent these percentages of age groups as angles in degrees that add up to 360° (the total number of degrees in a circle), as shown in column 5 of table 2.4.2. The calculation of the angle of the ith category can be done directly from the observations by using the formula:
\text{Angle}_i = \frac{X_i}{\sum_{i=1}^{n} X_i} \times 360^\circ
i.e. each observation multiplied by 360° and divided by the sum of the observations.
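A small sketch of this angle calculation is given below; since the original age-group table is not reproduced here, the counts used are assumed purely for illustration.

```python
# Hypothetical age-group counts used only to demonstrate the angle formula.
counts = {"<20": 10, "20-39": 25, "40-59": 40, "60+": 25}

total = sum(counts.values())
percentages = {group: x / total * 100 for group, x in counts.items()}
angles = {group: x / total * 360 for group, x in counts.items()}

print(percentages)
print(angles)
print(sum(angles.values()))   # the angles always sum to 360.0
```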
2.4.4. Histogram
A histogram is a graph drawn from a frequency distribution. It is used to represent
continuous quantitative data. It usually consists of adjacent, touching rectangles or
bars. The area of each rectangle is drawn in proportion to the frequency corresponding
to that frequency class. When the class intervals are equal, the area of each rectangle
is a constant multiple of height and so the histogram can be drawn as for a bar chart,
except that the rectangles are touching. If the class intervals are not equal, the fre-
quencies are adjusted accordingly to come up with frequency densities for the larger
class intervals.
Illustration - Histogram
Consider results of a test written by 45 students and marked out of 70. Data is pre-
sented in categories in table below. Use the data in the table to draw a histogram for
the mark distribution.
Marks Frequencies
10 - 19 7
20 - 29 10
30 - 39 9
40 - 49 3
50 - 59 5
60 - 69 1
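One possible way of drawing this histogram in software is sketched below using matplotlib (assumed to be installed). The class midpoints are taken from the table above, and setting the bar width equal to the class width produces the touching rectangles of a histogram.

```python
import matplotlib.pyplot as plt

# Class midpoints and frequencies from the marks table above (equal class widths of 10).
midpoints   = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
frequencies = [7, 10, 9, 3, 5, 1]

# width=10 makes adjacent bars touch, giving the histogram shape.
plt.bar(midpoints, frequencies, width=10, edgecolor="black")
plt.xlabel("Marks")
plt.ylabel("Frequency")
plt.title("Histogram of test marks")
plt.show()
```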
A stem and leaf diagram is basically a histogram where the rectangles are built up to
the correct height by individual numbers. Each data value is split up into its stem,
the first digit or first two digits, etc., depending on the data and its leaves. Thus, the
number 23 will have a stem 2 and leaf 3. The number 7 has stem 0 and leaf 7. Perhaps
an example will illustrate this clearly.
7 15 22 38 12 18 14 26 20 15 22 34 12 18 24
19 14 29 21 32 12 17 24 13 25 20 15 31 11 16
23 39 19 14 28 20 9 16 22 39 13 25 19 14 31
To display this information in a stem and leaf plot, we take stems 0, 1, 2 and 3 and
list them on the left side of a vertical line and the leaves on the right side opposite the
appropriate stem. The stem and leaf display of these data is represented below. A stem and leaf display should always have a key that indicates how data is displayed, i.e. Key: 0|7 = 7 or Key: 3|8 = 38.
Stem Leaf
0 79
1 122233444455566788999
2 000122234455689
3 1124899
Key: 3|8 = 38
Take note that the 1st, 2nd, 3rd, etc. numbers on the right (leaf) side should be aligned in the same columns for the histogram feature to show.
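The display above can be reproduced with a short sketch that splits each value into its stem and leaf:

```python
from collections import defaultdict

data = [7, 15, 22, 38, 12, 18, 14, 26, 20, 15, 22, 34, 12, 18, 24,
        19, 14, 29, 21, 32, 12, 17, 24, 13, 25, 20, 15, 31, 11, 16,
        23, 39, 19, 14, 28, 20, 9, 16, 22, 39, 13, 25, 19, 14, 31]

stems = defaultdict(list)
for value in sorted(data):
    stems[value // 10].append(value % 10)   # tens digit = stem, units digit = leaf

for stem in sorted(stems):
    print(stem, "|", "".join(str(leaf) for leaf in stems[stem]))
# Key: 3|8 = 38
```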
2.6. Exercises
1. Classify the following data sources as either primary or secondary and internal
or external
2. Define primary and secondary data. Include in your answers the advantages and
disadvantages of both data types. Give two examples of secondary data.
24 19 21 27 20 17 17 32 22 26 18 13 23 30 10
13 18 22 34 16 18 23 15 19 28 25 25 20 17 15
(a) Define the random variable, the data type and the measurement scale.
(b) From the data, prepare:
(i) an absolute frequency distribution,
(ii) a relative frequency distribution and
(iii) a less than ogive.
(c) Construct the following graphs:
(i) a histogram of the relative frequency distribution,
(ii) stem and leaf diagram of the original data.
(d) From the graphs, read off what percentage of trips were:
(i) between 25 and 30 km long,
(ii) under 25km,
(iii) 22km or more?
Chapter 3

Measures of Central Tendency
3.1. Introduction
In the previous unit, graphical displays and charts were discussed. These are useful visual means of communicating broad overviews of the behaviour of a random variable. However, there is a need for numerical measures which convey more precise information about the behaviour pattern of a random variable. The behaviour or pattern of any random variable can be described by measures of central location, dispersion and shape.
a) Number of observations
The number of all observations in a population is denoted by N whilst those in a sam-
ple are denoted by n. N is a population size while n is a sample size.
b) Observations
A list of values in a data set is named using a random variable name such as X or Y. Each of the observations is called x_i for i = 1, 2, 3, ..., N for a population. This means x_1 is the first value, x_2 is the second value, x_3 is the third value and so on until the last value, x_N.
c) Sum of observations
Adding up the values in a dataset of N values is done as x_1 + x_2 + x_3 + ... + x_N. This is written in short as \sum_{i=1}^{N} x_i.
The measures of central tendency include:
• Arithmetic mean,
• Mode and
• Median.
Each of these measures will be discussed and computed for grouped and ungrouped
data.
The population mean for ungrouped data is calculated as:

\mu = \frac{x_1 + x_2 + x_3 + \cdots + x_N}{N} = \frac{\sum_{i=1}^{N} x_i}{N}

For data summarised in a frequency table with m distinct values (or class midpoints) x_1, x_2, \ldots, x_m and corresponding frequencies f_1, f_2, \ldots, f_m, the sample mean is:

\bar{x} = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \cdots + f_m x_m}{f_1 + f_2 + f_3 + \cdots + f_m} = \frac{\sum_{i=1}^{m} f_i x_i}{\sum f_i}    (3.3)

and the population mean is:

\mu = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \cdots + f_m x_m}{f_1 + f_2 + f_3 + \cdots + f_m} = \frac{\sum_{i=1}^{m} f_i x_i}{N}    (3.4)

Illustration
The marks of 17 students are summarised in the frequency table below. Find the mean student mark.

Student mark, X:      5    8   11   13   14   18
Number of students:   2    1    4    5    3    2
Solution
The mean student mark is given as:

\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_m x_m}{f_1 + f_2 + \cdots + f_m} = \frac{2(5) + 1(8) + 4(11) + \cdots + 2(18)}{2 + 1 + 4 + \cdots + 2} = \frac{205}{17} \approx 12.1
Note
If the frequency distribution table has observations in class intervals, say 6 - 10, 11 - 15, 16 - 20 etc., use the midpoints of the class intervals to represent the observations x_i. To get the midpoint of a class interval use the formula:

\text{Midpoint} = \frac{\text{Lower value} + \text{Upper value}}{2}
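As a quick numerical check of the worked example above, the mean can be computed from the frequency table as follows:

```python
# Student marks and their frequencies from the illustration above.
marks       = [5, 8, 11, 13, 14, 18]
frequencies = [2, 1, 4, 5, 3, 2]

n = sum(frequencies)                                    # 17 students
total = sum(f * x for f, x in zip(frequencies, marks))  # sum of f_i * x_i = 205
mean = total / n
print(round(mean, 1))   # 12.1
```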
• The arithmetic mean uses all values of the data set in its computation.
• The sum of deviations of each observation from the mean value is equal to zero, i.e. \sum_{i=1}^{n}(x_i - \bar{x}) = 0. This makes the mean an unbiased statistical measure of central location.
• is easy to understand,
• easy to compute,
• affected or distorted by extreme values in the dataset. Extreme values are called outliers.
• not valid to compute for nominal or ordinal scaled data. It is only meaningful to compute the arithmetic mean for interval or ratio scaled data, that is, discrete or continuous data.
• often a poor measure for grouped data with open-ended extreme classes
There are other means that can be calculated for different distribution of values. These
are harmonic, geometric and weighted arithmetic means. We will not discuss
them in this module.
Illustration
Suppose you are given a list of colours: Blue (B), Green (G), Red (R) and Yellow (Y). Consider a sample YGBRBBRGYB picked from a mixed bag. What is the modal colour?
Solution
The modal colour is Blue since it appears most, with a highest frequency of 4.
For grouped data, the mode is estimated using the formula:

\text{Mode} = l_{mo} + \frac{c(f_1 - f_0)}{2f_1 - f_0 - f_2}    (3.5)
where l_{mo} is the lower limit of the modal class interval, c is the class width, f_1 is the frequency of the modal class, f_0 is the frequency of the class preceding the modal class and f_2 is the frequency of the class following the modal class.

Illustration
Find the modal mark for the grouped test marks below.

Test mark, x:   5 - 10   10 - 15   15 - 20   20 - 25   25 - 30
Frequency:         3         5         7         2         4
Solution
We seek to use the formula

\text{Mode} = l_{mo} + \frac{c(f_1 - f_0)}{2f_1 - f_0 - f_2}

where 15 - 20 is the modal class interval with the highest frequency of 7, l_{mo} = 15, f_1 = 7, f_0 = 5, f_2 = 2 and c = 5. Substituting these in the equation above yields

\text{Mode} = 15 + \frac{5(7 - 5)}{2(7) - 5 - 2} = 15 + \frac{10}{7} \approx 16.43
• is easy to determine.
• often not unique and may not even exist e.g. for ungrouped data.
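A small sketch that evaluates the grouped-mode formula with the values from the illustration above (the function name is illustrative only):

```python
def grouped_mode(l_mo, c, f1, f0, f2):
    """Mode = l_mo + c*(f1 - f0) / (2*f1 - f0 - f2) for grouped data."""
    return l_mo + c * (f1 - f0) / (2 * f1 - f0 - f2)

# Values from the test-mark illustration: modal class 15-20, class width c = 5.
print(round(grouped_mode(l_mo=15, c=5, f1=7, f0=5, f2=2), 2))   # 16.43
```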
The median is the value of a random variable which divides an ordered (ascending or descending) data set into two equal parts, or that value which lies at the centre of an ordered distribution. It is also called the second quartile Q_2 or the 50th percentile. Half of the observations fall below this value and the other half above it. If the number of observations, n, is odd, then the median is the ((n+1)/2)th observation. If the number of observations is even, then the median is the average of the (n/2)th and (n/2 + 1)th observations. For grouped data in a frequency distribution table, use the formula below after identifying the median class interval, which is the interval containing the ((n+1)/2)th observation. The median formula is:

\text{Median} = L_{me} + \frac{c(\frac{n}{2} - F(<))}{f_{me}}    (3.6)
where:
• L_{me} is the lower limit of the median class interval,
• c is the class width,
• F(<) is the cumulative frequency of the interval just before the median class interval and
• f_{me} is the frequency of the median class interval.
Solution
The number of observations is 100, which is even; thus the median is the mean of the (n/2)th and (n/2 + 1)th observations, i.e. the mean of the 50th and 51st observations. To find these observations we first find the cumulative frequencies of the data set.

\text{Median} = \frac{4400 + 4900}{2} = 4650
Interpretation
This means 50% of the workers get income less than $4650 and another 50% get income
which is more than $4650.
Given the following grouped data in a frequency table, find the median. We use the standard formula above to calculate the median of the grouped data, which is

\text{Median} = L_{me} + \frac{c(\frac{n}{2} - F(<))}{f_{me}}

We calculate the cumulative frequencies and then identify the median class, which is the class containing the ((n+1)/2)th observation.
Solution
First and foremost, order the data set; in this case it is already ordered. Then we calculate the cumulative frequencies.

\text{Median} = L_{me} + \frac{c(\frac{n}{2} - F(<))}{f_{me}}

where the median class is 20 - 30, c = 10, n = 50, F(<) = 14, f_{me} = 22 and L_{me} = 20. Substituting we have

M_e = 20 + \frac{10[\frac{50}{2} - 14]}{22} = 20 + \frac{110}{22} = 25
Interpretation
This implies that 50% of the students got less than 25 marks and the other 50% got
more than 25 marks.
• unaffected by outliers,
• easy to determine,
• best suited as a central location measure for interval-scaled data such as rating
scales.
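The grouped-median formula can be checked with a short sketch using the values from the worked example above:

```python
def grouped_median(L_me, c, n, F_less, f_me):
    """Median = L_me + c*(n/2 - F(<)) / f_me for grouped data."""
    return L_me + c * (n / 2 - F_less) / f_me

# Values from the worked example: median class 20-30, c = 10, n = 50.
print(grouped_median(L_me=20, c=10, n=50, F_less=14, f_me=22))   # 25.0
```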
3.7. Quartiles
Quartiles are observations that divide an ordered data set into quarters (four equal parts). The Lower Quartile, Q_1, is the first quartile or 25th percentile. It is that observation which separates the lower 25 percent of the ordered observations from the top 75 percent. The Middle Quartile, Q_2, is the second quartile, 50th percentile or median. It divides an ordered data set into two equal halves; the middle quartile is also called the median. The Upper Quartile, Q_3, is the third quartile or 75th percentile. It is that observation which separates the lower 75 percent of the ordered observations from the top 25 percent.
To compute quartiles, a similar formula is used as for calculating the median. The only difference lies in (i) the identification of the quartile position and (ii) the choice of the appropriate quartile interval. Each quartile position is determined as follows: for the first quartile Q_1, use the (n/4)th value; for Q_2, use the (n/2)th value; and for Q_3, use the (3n/4)th value. The appropriate quartile interval is that interval into which the quartile position falls. Like the median calculation, this is identified using the less-than ogive. A formula for Q_1 is:
Q_1 = L_{q1} + \frac{c(\frac{n}{4} - F(<))}{f_{q1}}    (3.7)
where:
• L_{q1} is the lower limit of the lower quartile interval,
• c is the class width,
• F(<) is the cumulative frequency of the class interval before the lower quartile interval and
• f_{q1} is the frequency of the lower quartile interval.
Illustration
Using income data below, find Q1 .
Solution
Constructing a cumulative frequency table to use for the calculation, we have:
n = 100, hence the Q_1 position is at n/4 = 100/4 = 25th position. Arranging the number of workers cumulatively, i.e. coming up with a cumulative distribution table, the 25th value lies at income $4100. Hence Q_1 is $4100.
In calculating quartiles for grouped data, use of the formula is required since the position of the quartile falls within an interval of observations. The formula allows us to find the exact value. Find the first, second and third quartile values from the distribution below.
Q_1 = L_{q1} + \frac{c(\frac{n}{4} - F(<))}{f_{q1}}

where L_{q1} = 10, c = 10, n = 50, F(<) = 2 and f_{q1} = 12. Thus, substituting into the formula, we get:

Q_1 = 10 + \frac{10[\frac{50}{4} - 2]}{12} = 10 + \frac{105}{12} = 18.75    (3.8)
Interpretation
25% of the students got marks below 18.75 or 75% of the students got marks above
18.75.
For the Q_2 position, use n/2 = 50/2 = 25th position. The Q_2 interval is 20 - 29, since the 25th observation falls within these limits. The formula for Q_2 is:

Q_2 = L_{q2} + \frac{c[\frac{n}{2} - F(<)]}{f_{q2}}

For the Q_3 position, use 3n/4 = (3 × 50)/4 = 37.5th position. The Q_3 interval is 30 - 39, since the 37.5th observation falls within this limit. The formula for Q_3 is:

Q_3 = L_{q3} + \frac{c[\frac{3n}{4} - F(<)]}{f_{q3}}    (3.9)

where L_{q3} = 30, n = 50, F(<) = 36, f_{q3} = 8 and c = 10. Thus:

Q_3 = 30 + \frac{10[\frac{3 \times 50}{4} - 36]}{8} = 31.875
Interpretation:
75% of the students got below 31.875 marks. Alternatively, 25% of the students got
above 31.875 marks.
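Since Q_1, Q_2 and Q_3 all follow the same pattern of formula, a single short sketch can verify the three results above (the helper name is illustrative only):

```python
def grouped_quantile(position, L, c, F_less, f):
    """General grouped-quantile formula: L + c*(position - F(<)) / f."""
    return L + c * (position - F_less) / f

n = 50
print(grouped_quantile(n / 4,     L=10, c=10, F_less=2,  f=12))  # Q1 = 18.75
print(grouped_quantile(n / 2,     L=20, c=10, F_less=14, f=22))  # Q2 = 25.0
print(grouped_quantile(3 * n / 4, L=30, c=10, F_less=36, f=8))   # Q3 = 31.875
```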
3.7.5. Percentiles
In general, any percentile value can be found by adjusting the median formula to: (i) find the required percentile position and, from this, (ii) establish the percentile interval.

Illustration
90th percentile position = 0.9 × n; 35th percentile position = 0.35 × n; 29th percentile position = 0.29 × n.
Uses of percentiles:
Percentiles are used to identify various non-central values. For example, if it is desired
to work with a truncated dataset which excludes extreme values at either end of the
ordered dataset.
3.8. Skewness

1. If mean = median = mode, the frequency distribution is symmetrical.

2. If mean < median < mode, the frequency distribution is negatively skewed i.e. skewed to the left.

3. If mean > median > mode, the frequency distribution is positively skewed i.e. skewed to the right.
Remark:
If the frequency distribution is skewed, the median may be the best measure of central location as it is not pulled by extreme values, nor is it as highly influenced by the frequency of occurrence.
3.9. Kurtosis

• Leptokurtic - a peaked distribution, i.e. the observations are closely clustered about the central location.
• Mesokurtic - a moderately peaked distribution.
• Platykurtic - a flat distribution, i.e. the observations are widely spread about the central location.
3.10. Exercises
1. The number of days in a year that employees in a certain company were away
from work due to illness is given in the following table:
Find the modal class and the modal sick days and interpret.
Sex F M F M F M M F F F F M
Seniority (yrs) 8 15 6 2 9 21 9 3 4 7 2 10
(a) Find the seniority mean, median and mode for the above data.
(b) Which of the mean, median and mode is the least useful measure of location
for the seniority data? Give a reason for your answer.
(c) Find the mode for the sex data. Does this indicate anything about the em-
ployment practice of the company when compared to the medians for the
seniority data for males and females?
Chapter 4
Measures of Dispersion
4.1. Introduction
Spread or Dispersion refers to the extent by which the observations of a random vari-
able are scattered about the central value. Measures of dispersion provide useful infor-
mation with which the reliability of the central value may be judged. Widely dispersed
observations indicate low reliability and less representativeness of the central value.
Conversely, a high concentration of observation about the central value increases con-
fidence in the reliability and representativeness of the central value. Measures of
dispersion include range, variance and standard deviation.
The range is the difference between the highest and the lowest observed values in a dataset. For an ungrouped dataset,

\text{Range} = x_{max} - x_{min}

For a grouped distribution with class intervals, x_{min} is the lower limit of the lowest class interval and x_{max} is the upper limit of the highest class interval.
Interquartile Range = Q3 − Q1
This modified range removes some of the instability inherent in the range if outliers
are present, but it excludes 50 percent of all observations from further analysis. This
measure of dispersion, like the range, also provides no information on the clustering
of observations within the dataset as it uses only two observations.
Quartile deviation
A measure of variation based on this modified range is called quartile deviation (QD)
or the semi-interquartile range. It is found by dividing the interquartile range in half
i.e.
\text{Quartile deviation} = \frac{Q_3 - Q_1}{2}
Remember that when calculating this measure you order your dataset first in order to calculate Q3 and Q1. The quartile deviation is an appropriate measure of spread for the median. It
identifies the range below and above the median within which 50 percent of observa-
tions are likely to fall. It is a useful measure of spread if the sample of observations
contains excessive outliers as it ignores the top 25 percent and bottom 25 percent of
the ranked observations.
The most useful and reliable measures of dispersion are those that take every observa-
tion into account and are based on an average deviation from a central value. Variance
is such a measure of dispersion. Population variance is denoted by σ 2 whereas sample
variance is denoted by s2 .
s^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1}

\sigma^2 = \frac{\sum x^2 - N\mu^2}{N}
The main difference between computational formulae for sample and population vari-
ances is on the denominator of the two. Population variance divides the numerator by
N whereas sample variance divides the numerator by n − 1.
Illustration
The ages (in years) of a sample of n = 7 cars are 13, 7, 10, 15, 12, 18 and 9. Find the sample variance of the car ages.

Solution
Step 1: Find the sample mean, \bar{x} = \frac{\sum x_i}{n} = \frac{84}{7} = 12 years.
Step 2: Find the squared deviation of each observation from the sample mean. See the table below.
Car age, x_i    Mean, x̄    Deviation (x_i − x̄)    Squared deviation (x_i − x̄)²
13              12          +1                      1
7               12          −5                      25
10              12          −2                      4
15              12          +3                      9
12              12          0                       0
18              12          +6                      36
9               12          −3                      9
                            Σ(x_i − x̄) = 0          Σ(x_i − x̄)² = 84
Step 3: Find the average squared deviation, that is the variance, using the formula:

s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1} = \frac{84}{7 - 1} = 14 \text{ years}^2
Note
Division by n would appear logical, but the variance statistic would then be a biased measure of dispersion. It can be shown to be unbiased if division is by (n − 1). For large samples, i.e. for n greater than 30, this distinction becomes less important.

Alternatively, use the formula s^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1}; you will get the same value. With \sum x_i^2 = 1092, \sum x_i = 84, n = 7 and \bar{x} = 12, substituting the values into the formula gives:

s^2 = \frac{1092 - 7(12^2)}{7 - 1} = \frac{84}{6} = 14 \text{ years}^2
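A brief sketch verifying that the definitional and computational formulae give the same sample variance for the car ages above:

```python
# Car ages from the worked example above.
ages = [13, 7, 10, 15, 12, 18, 9]
n = len(ages)
mean = sum(ages) / n                                    # 12.0

# Definitional formula: average squared deviation about the mean.
s2_deviation = sum((x - mean) ** 2 for x in ages) / (n - 1)

# Computational formula: (sum of x^2 - n*mean^2) / (n - 1).
s2_shortcut = (sum(x ** 2 for x in ages) - n * mean ** 2) / (n - 1)

print(s2_deviation, s2_shortcut)   # both give 14.0
```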
For data summarised in a frequency table (grouped data), the sample variance is calculated using the formula:

s^2 = \frac{\sum_{i=1}^{m} f_i(x_i - \bar{x})^2}{n - 1}    (4.3)

and the population variance is:

\sigma^2 = \frac{\sum f_i x_i^2 - N\mu^2}{N}
Illustration
Consider data for student marks obtained from Test 1. Calculate the sample variance
of the student marks shown below.
Marks 0-10 10-20 20-30 30-40 40-50
Frequency 2 12 22 8 6
Solution
The midpoints of the class intervals are 5, 15, 25, 35 and 45. These give \bar{x} = \frac{\sum f x_i}{n} = \frac{1290}{50} = 25.8 and \sum f x_i^2 = 38450. Then

s^2 = \frac{\sum f x_i^2 - n\bar{x}^2}{n - 1} = \frac{38450 - 50(25.8)^2}{50 - 1} = \frac{5168}{49} = 105.47 \text{ marks}^2
The variance is a measure of average squared deviation about the arithmetic mean.
It is expressed in squared units. Consequently, the meaning in a practical sense is
obscure. To provide meaning, the measure should be expressed in the original units of
the random variable.
A standard deviation is a measure which expresses the average deviation about the
mean in the original units of the random variable. The standard deviation is the
square root of the variance. Mathematically the standard deviation is calculated as:
s_x = \sqrt{\frac{\sum f x_i^2 - n\bar{x}^2}{n - 1}}    (4.4)
4.5. The Coefficient of variation (CV)

The coefficient of variation expresses the measure of dispersion as a percentage of the mean. For a sample, CV = \frac{s}{\bar{x}} \times 100\%, and for a population, CV = \frac{\sigma}{\mu} \times 100\%.
This ratio describes how large the measure of dispersion is relative to the mean of
the observation. A coefficient of variation value close to zero indicates low variability
and a tight clustering of observations about the mean. Conversely, a large coefficient
of variation value indicates that observations are more spread out about their mean
value. From our example above,
CV = \frac{s}{\bar{x}} \times 100\% = \frac{10.27}{25.8} \times 100\% = 39.8\%.
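A short sketch tying together the grouped variance, standard deviation and coefficient of variation for the Test 1 marks used above:

```python
# Grouped student marks: class midpoints and frequencies from Test 1.
midpoints   = [5, 15, 25, 35, 45]
frequencies = [2, 12, 22, 8, 6]

n = sum(frequencies)                                                 # 50
mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n        # 25.8
sum_fx2 = sum(f * x ** 2 for f, x in zip(frequencies, midpoints))    # 38450

variance = (sum_fx2 - n * mean ** 2) / (n - 1)   # about 105.47 marks squared
std_dev = variance ** 0.5                        # about 10.27 marks
cv = std_dev / mean * 100                        # about 39.8 %
print(round(variance, 2), round(std_dev, 2), round(cv, 1))
```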
4.6. Exercises
1. Find the mean and standard deviation for the following data which records the
duration, in minutes of 20 telephone calls for technical advice on car repairs by a
mechanic.
Duration Number of calls
0-≤1 7
1-≤2 0
2-≤3 3
3-≤4 1
4-≤5 9
At a cost of $2.60 per minute, what was the average cost of a call, and what
was the total cost paid by the 20 telephone callers. Calculate the coefficient of
variation and interpret it.
47 31 42 33 58 51 25 28 62 29 65 46
51 30 43 72 73 37 29 39 53 61 52 35
3. Give three reasons why the standard deviation is regarded as a better measure
of dispersion than the range.
(a) Outliers
(b) Skewness
(c) Kurtosis
Chapter 5
Basic Probability
5.1. Introduction
This unit introduces basic concepts and terminologies in probability. They include
events, types and rules of probabilities. Probability theory is fundamental to the area
of statistical inference. Inferential statistics deals with generalising the behaviour of
random variables from sample findings to the broader population. Probability theory
is used to quantify the uncertainties involved in making these generalisations.
Most decisions are made in the face of uncertainty. Probability is therefore, concerned
with uncertainty.
There are two broad approaches to probability namely subjective and objective.
Subjective probability
It is probability which is based on a personal judgement that a given event will occur.
There is no theoretical or empirical basis for producing subjective probabilities. In
other words this is probability of an event based on an educated guess, expert opin-
ion or just plain intuition. Subjective probabilities cannot be statistically verified and are not extensively used; hence they will not be considered further in this module.
Examples
1. When commuters board a commuter omnibus, they assume that they will arrive
safely at their destinations, so P(arriving safely) = 1.
2. If you invest some money, you assume that you will get a good return, so P (good
return) = 0.9.
Objective probabilities
These are probabilities that can be verified through repeated experimentation or empirical observation. Mathematically, an objective probability is defined as a ratio of two numbers:

P(A) = \frac{r}{n}

where r is the number of outcomes favourable to event A and n is the total number of possible outcomes. Objective probabilities can be determined:

• a priori - that is, when the possible outcomes are known in advance, such as tossing a coin or selecting cards from a deck of cards. For example, the probability of a Head when a fair coin is tossed once is P(Head) = 1/2 = 0.5.

• empirically - that is, when the values of r and n are not known in advance and have to be observed through data collection; from a relative frequency table you can deduce the probabilities of the different outcomes.
For a given random process, the following properties of probability should hold: (i) 0 ≤ P(A) ≤ 1 for any event A; (ii) the probability of the entire sample space is 1, i.e. P(S) = 1; and (iii) the probabilities of all the mutually exclusive outcomes of the process sum to 1.
Illustration
Consider a random process of drawing cards from a deck of cards. These probabilities are called a priori probabilities.
1. Let A = the event of selecting a red card. Then P(Red card) = 26/52 = 1/2 (26 possible red cards out of 52 cards).

2. Let B = the event of selecting a spade. Then P(Spade) = 13/52 = 1/4 (13 possible spades out of a total of 52).

3. Let C = the event of selecting an ace. Then P(Ace) = 4/52 = 1/13 (4 possible aces out of a total of 52 cards).

4. Let D = the event of selecting 'not an ace'. Then P(not an ace) = 1 − P(ace) = 1 − 1/13 = 12/13.
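These a priori probabilities can be checked with a short sketch using exact fractions:

```python
from fractions import Fraction

deck_size = 52
P_red     = Fraction(26, deck_size)   # reduces to 1/2
P_spade   = Fraction(13, deck_size)   # reduces to 1/4
P_ace     = Fraction(4, deck_size)    # reduces to 1/13
P_not_ace = 1 - P_ace                 # complement rule gives 12/13

print(P_red, P_spade, P_ace, P_not_ace)   # 1/2 1/4 1/13 12/13
```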
1. Intersection of two events - The intersection of two events A and B is the set
of outcomes that belong to both A and B simultaneously. It is written as A ∩ B or
A and B and the keyword is and.
2. Union of two events - The union of events A and B is the set of outcomes that
belong to either A or B or both and the key word is or. It is written as A ∪ B.
Examples
(a) Passing and failing the same examination are mutually exclusive. In other words, it is not possible to pass and fail the same examination at the same time.
(b) In tossing a fair die once, getting a 3 and getting a 5 are mutually exclusive. You get one outcome at a time and not both.
Examples
(a) In tossing a fair die once, getting an odd number or a number greater than
2 are non mutually exclusive events i.e. it is possible for the number to be
odd and at the same time being greater than 2.
(b) An individual can have more than one bank account i.e. if you open a bank
account it does not prevent you from opening another account with another
bank.
Example
Consider a random experiment of selecting companies from the Zimbabwe Stock
Exchange. Let event A = small company, event B = medium company and event
C = large company. Then (A ∪ B ∪ C) = sample space (small, medium, large
companies) = all ZSE companies.
Example
Let A = the event that an employee is over 30 years of age and B = the event that the employee is female. If it can be assumed, or empirically verified, that a randomly selected employee over 30 years of age from a large organisation is equally likely to be either a male or a female employee, then the two events A and B are statistically independent.
Remark
The terms statistically independent and mutually exclusive events should
not be confused. They are two very different concepts. When two events are
mutually exclusive, they are NOT statistically independent. They are dependent
in the sense that if one event happens, then the other event cannot happen. In
probability terms, the probability of the intersection of two mutually exclusive
events is zero, while the probability of two independent events is equal to the
product of the probabilities of the separate events.
There are generally two laws in probability theory, namely, addition and multiplica-
tion Laws.
P (A ∪ B) = P (A) + P (B)
Note: Use of Union (∪) means or
Example
What is the probability of getting a 5 or 6 if a fair die is tossed once?
Solution
The sample space has six possible outcomes: 1, 2, 3, 4, 5, 6. Therefore P(5 or 6) = P(5) + P(6) = \frac{1}{6} + \frac{1}{6} = \frac{1}{3}.
For events that are not mutually exclusive,

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

The intersection sign ∩ denotes the joint probability of events A and B. P(A and B) is subtracted to avoid double counting.
Example
What is the probability of getting an even number or a number less than four if
a fair die is tossed once?
Solution
Let event A = getting an even number, with elements 2, 4, 6, and event B = getting a number less than four, with elements 1, 2, 3. Then P(A) = \frac{3}{6} and P(B) = \frac{3}{6}. Thus P(A and B) = \frac{1}{6}, since there is only one element common to A and B, namely 2. Therefore

P(A or B) = P(A) + P(B) − P(A ∩ B) = \frac{3}{6} + \frac{3}{6} − \frac{1}{6} = \frac{5}{6}
Exercise 1
Sixty per cent of the population of a town read either magazine A or magazine B and
10% read both. If 50% read magazine A, what is the probability that one person,
selected at random, read magazine B?
Multiplication laws pertain to dependent and independent events. The key word is
AND
Illustration
What is the probability of getting a tail on both coins when two fair coins are tossed at the same time?

Solution
Take events T1 and T2 such that T1 = the event of getting a tail from the first coin and T2 = the event of getting a tail from the second coin. The two outcomes do not affect each other. Therefore:

P(T1 and T2) = P(T1) × P(T2) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}
The types of probabilities considered are:
• Marginal probability,
• Joint probability, and
• Conditional probability.
Marginal probability
It is the probability of only a single event A occurring regardless of certain conditions
prevailing. It is written as P(A). A frequency distribution describes the occurrence of
only one characteristic of interest at a time and is used to estimate marginal probabil-
ities.
Joint probability
It is the chance that two or more events will occur simultaneously. It is the occurrence
of more than one event at the same time. If the joint probability on any two events is
zero, then the events are mutually exclusive.
Conditional probability
It is the probability that a given event occurs given that another event has already
occurred. P (A|B) means the probability that event A will occur given that event B has
already occurred.
P(A|B) = P(A ∩ B)/P(B)    (5.1)
P(B|A) = P(B ∩ A)/P(A)    (5.2)
Note
P (A ∩ B) = P (B|A) × P (A) = P (A|B) × P (B). P(A and B) is the joint probability of
events A and B. P (B) is the probability of event B, which is a marginal probability.
Illustration
The table below classifies a sample of 39 people by sex and payment method. Find the probability that a randomly selected person: 1. (i) is female and pays by credit card; (ii) is male and pays cash. 2. (i) is a credit card user; (ii) is female.

Payment Method | Male | Female | Total
Credit Card    | 10   | 15     | 25
Cash           | 8    | 6      | 14
Total          | 18   | 21     | 39
Solution
1. This is a joint probability of the events. The sample space has 39 people altogether.
(i) P(female and credit card) = 15/39 = 0.3846.
Note: these joint events should not be confused with independent events. In this case, find the value in the cell where the female column meets the credit card row, which is 15.
(ii) P(male and cash) = 8/39 = 0.2051.
It is the chance of two events occurring at the same time.
2. (i) P(credit card user) = 25/39 = 0.6410.
(ii) P(female) = 21/39 = 0.5385.
The condition which has been ignored is payment method. For joint probabilities,
consider values inside the table as a ratio of the grand total 39. For marginal
probabilities consider row and column totals as ratios of grand total.
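The arithmetic above can be checked with a short Python sketch; it is not part of the original notes, and the dictionary layout of the table is simply an illustrative assumption.

counts = {("male", "credit"): 10, ("female", "credit"): 15,
          ("male", "cash"): 8, ("female", "cash"): 6}
n = sum(counts.values())                                  # grand total = 39

p_female_and_credit = counts[("female", "credit")] / n    # joint: 15/39, about 0.3846
p_credit = sum(v for (sex, pay), v in counts.items() if pay == "credit") / n   # marginal: 25/39
p_female = sum(v for (sex, pay), v in counts.items() if sex == "female") / n   # marginal: 21/39
p_female_given_credit = p_female_and_credit / p_credit    # conditional: 15/25 = 0.6

print(round(p_female_and_credit, 4), round(p_credit, 4),
      round(p_female, 4), round(p_female_given_credit, 4))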
Exercise 2
A golfer has 12 golf shirts in his closet. Suppose 9 of these shirts are white and the
others are blue. He gets dressed in the dark, so he just grabs a shirt and puts it on.
He does this for two days in a row, taking a fresh shirt each day and putting the worn one in the washing basket without doing laundry. What is the likelihood that both shirts selected are white?
Exercise
A survey of 150 students classified each student according to gender and the number of movies watched, as shown in the table below.
Movies watched | Male | Female | Total
0              | 20   | 40     | 60
1              | 40   | 30     | 70
2 or more      | 10   | 10     | 20
Total          | 70   | 80     | 150
Find the probability that a randomly selected student:
i) is a male student.
iv) has not watched any movie given that he is a male student.
v) is a female student given that she has only watched a movie once.
Illustration
Referring to the above table of 150 students, use a tree diagram to find the probability of selecting a male student given that he has seen one movie.
The counting rules considered in this section are:
• Multiplication rule,
• Permutations and
• Combinations.
a) The total number of ways in which n objects can be arranged in order is given by n!, read as 'n factorial', where n! = n(n − 1)(n − 2)(n − 3) . . . 3 × 2 × 1.
Note that 0! = 1.
Illustration
The number of different ways in which 7 horses can complete a race is given by:
7! = 7 × 6 × 5 × 4 × 3 × 2 × 1 = 5040 different ways.
If a series of j trials is carried out such that the first trial has n1 possible outcomes, the second trial has n2 outcomes, and so on, then the total number of outcomes for the j trials is: n1 × n2 × n3 × n4 × · · · × nj
Illustration
A restaurant menu has a choice of 4 starters, 10 main courses and 6 desserts. What is the total number of meals that can be ordered in this restaurant?
Solution
The total number of possible meals that can be ordered is: 4 × 10 × 6 = 240 meals.
5.11.2. Permutations
A permutation is a number of distinct ways in which a group of objects can be arranged.
Each possible ordered arrangement is called a permutation. In a permutation, the order is important, so that ABC, ACB and CBA are considered different. The number of ways of arranging r objects selected from n objects, where ordering is important,
is given by the formula:
P_r^n = n!/(n − r)!    (5.3)
Illustration
10 horses compete in a race.
(i) How many distinct ways are there of the first 3 horses past the post?
(ii) What is the probability of predicting the order of the first 3 horses past the post?
Solution
(i) Since the order of 3 horses is important, it is appropriate to use the permutation
formula.
That is, P_r^n = P_3^10 = 10!/(10 − 3)! = 720.
There are 720 distinct ways of selecting the first 3 horses, in order, out of 10 horses.
(ii) The probability of predicting the correct order of the first 3 horses past the post is:
P(first 3 horses in the correct order) = 1/720
i.e. a 1 in 720 chance of winning.
5.11.3. Combinations
A combination is a selection of r objects from n objects in which the order of selection does not matter. The number of such selections is given by:
C_r^n = n!/((n − r)! r!)    (5.4)
Illustration
10 horses compete in a race.
(i) How many ways are there of the first 3 horses past the post, not considering the
order in which the first three pass the post?
(ii) What is the probability of predicting the first 3 horses past the post, in any order?
Solution
(i) The order of the first 3 horses is not important, hence apply the combination for-
mula.
C_r^n = n!/((n − r)! r!) = 10!/((10 − 3)! 3!) = 120
There are 120 different ways of selecting the first 3 horses out of 10 horses, with-
out regard to order.
(ii) The probability of predicting the first 3 horses past the post, disregarding order, is
P(first 3 horses in any order) = 1/120
i.e. a 1 in 120 chance of winning.
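For readers who want to check such counts numerically, here is a small Python sketch (an addition, not part of the notes); math.perm requires Python 3.8 or later.

import math

# Ordered and unordered selections of the first 3 horses out of 10.
n, r = 10, 3
ordered = math.perm(n, r)        # 10!/(10 - 3)! = 720 permutations
unordered = math.comb(n, r)      # 10!/((10 - 3)! 3!) = 120 combinations

print(ordered, 1 / ordered)      # 720 and the 1/720 chance of the exact order
print(unordered, 1 / unordered)  # 120 and the 1/120 chance in any order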
5.12. Exercise
Evaluate the following:
(a) P_4^7
(b) C_2^8
Chapter 6
Probability Distributions
6.1. Introduction
This unit studies probability distributions. A probability distribution gives the entire range of values that can occur in an experiment, together with their probabilities. A probability distribution is similar to a relative frequency distribution; however, instead of describing the past, it describes how likely future outcomes are. For instance, a drug manufacturer may claim a treatment will cause weight loss for 80% of the population. A consumer protection agency may test the treatment on a sample of six people. If the manufacturer's claim is true, it is almost impossible to have an outcome where no one in the sample loses weight, and it is most likely that 5 out of the 6 do lose weight.
6.2. Definition
A random variable is a function whose value is a real number determined by each ele-
ment in the sample space. In other words, it is a quantity resulting from an experiment that, by chance, can assume different values. There are two types of random variables: discrete random variables and continuous random variables. Examples include:
• The number of defective light bulbs obtained when three light bulbs are selected at random from a consignment; this could be 0, 1, 2, or 3 (a discrete random variable).
• The waiting time for customers to receive their order at a manufacturing company (a continuous random variable).
Illustration
Find the probability distribution of the sum of numbers when a pair of dice is thrown.
Solution
Let X be a random variable whose values x are the possible totals of the outcomes of the two dice. Then x can be any integer from 2 to 12. Two dice can fall in 6 × 6 = 36 ways, each with probability 1/36. For example, P(X = 3) = 2/36 since a total of 3 can occur in two ways, namely (1, 2) or (2, 1). The probability distribution is shown in the table below:

x        | 2    | 3    | 4    | 5    | 6    | 7    | 8    | 9    | 10   | 11   | 12
P(X = x) | 1/36 | 2/36 | 3/36 | 4/36 | 5/36 | 6/36 | 5/36 | 4/36 | 3/36 | 2/36 | 1/36
Exercise 1
1 Three coins are tossed all at once. Let X be the number of heads obtained. Find
the probability distribution of X.
2 Suppose you are instead interested in the number of tails showing face up when the three coins are tossed. What is the probability distribution for the number of tails?
A discrete probability distribution has the following properties:
i. The probability of any value of x is never negative and never exceeds 1, that is, 0 ≤ P(X = x) ≤ 1.
ii. P(X = x_1) + P(X = x_2) + · · · + P(X = x_n) = Σ_{i=1}^{n} P(X = x_i) = 1. The sum of the probabilities of the discrete random variable should be equal to 1.
iii. The mean or expected value of a discrete random variable X is x̄, given by:
x̄ = E(X) = Σ_{all x} x_i P(X = x_i)
Illustration
Consider the following probability distribution for a discrete random variable, X.

x          | 0    | 1    | 2    | 5    | 10
P(X = x_i) | 0.05 | 0.25 | 0.30 | 0.20 | 0.20

Verify the probability properties and find the standard deviation of the distribution.
Solution
i. Each probability lies between 0 and 1.
ii. The probabilities sum to one: 0.05 + 0.25 + 0.30 + 0.20 + 0.20 = 1.
iii. The mean is E(X) = 0(0.05) + 1(0.25) + 2(0.30) + 5(0.20) + 10(0.20) = 3.85.
iv. The variance is s² = E(X²) − [E(X)]² = 26.45 − 3.85² = 11.6275.
v. Standard deviation, √s² = √Var(X) = √11.6275 = 3.410
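The moments above can be reproduced with the following Python sketch, which is an addition and not part of the original notes.

xs = [0, 1, 2, 5, 10]
ps = [0.05, 0.25, 0.30, 0.20, 0.20]

assert abs(sum(ps) - 1.0) < 1e-9 and all(0 <= p <= 1 for p in ps)  # properties i and ii

mean = sum(x * p for x, p in zip(xs, ps))                # E(X) = 3.85
var = sum((x - mean) ** 2 * p for x, p in zip(xs, ps))   # Var(X) = 11.6275
print(mean, var, round(var ** 0.5, 3))                   # 3.85 11.6275 3.41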
b) At least 3, means not less than 3. The minimum that can be assumed is 3 since
3 is not less than itself. Notation: P (X ≥ 3) = P (X = 3) + P (X = 4) + P (X =
5) + . . . + P (X = n).
c) Less than 3: this effectively means values below 3, and 3 is not included. Notation: P(X < 3) = P(X = 0) + P(X = 1) + P(X = 2).
d) More than 3 means values above 3; in discrete terms it is from 4 upwards. Notation: P(X > 3) = P(X = 4) + P(X = 5) + P(X = 6) + · · · + P(X = n) or, using the complementary rule, 1 − P(X ≤ 3).
f) Between 3 and 6 means the discrete values between 3 and 6, which are 4 and 5. However, it should be noted that the limits can be exclusive or inclusive. Notation for exclusive: P(3 < X < 6) = P(X = 4) + P(X = 5). Notation for inclusive: P(3 ≤ X ≤ 6) = P(X = 3) + P(X = 4) + P(X = 5) + P(X = 6).
Exercise 2
Consider the following probability distribution that characterises a marketing ana-
lyst’s belief concerning the probabilities associated with the number, x of sales that a
company might expect per month for a new super computer:
x        | 0    | 1    | 2    | 3    | 4    | 5    | 6    | 7    | 8
P(X = x) | 0.02 | 0.08 | 0.15 | 0.19 | 0.24 | 0.17 | 0.10 | 0.04 | 0.01
In probability theory the Bernoulli distribution, named after the Swiss mathematician Jacob Bernoulli, is the probability distribution of a random variable which takes the value 1 with success probability p and the value 0 with failure probability 1 − p. A random variable X which has two possible outcomes, say 0 and 1, is called a Bernoulli random variable.
The probability distribution of X is:
P (X = 1) = p
P (X = 0) = 1 − p
i.e. P (X = 0) = 1 − P (X = 1) = 1 − p
This distribution best describes all situations where a ”trial” is made resulting in ei-
ther ”success” or ”failure,” such as when tossing a coin or when modeling the success
or failure of a surgical procedure. The Bernoulli probability distribution function is defined as:
P(X = x) = p^x (1 − p)^{1−x},  x = 0, 1
where p is the probability that a particular event (e.g. success) will occur.
Note- Bernoulli experiment is performed only once and has only two possible out-
comes (success and failure).
Illustration
Tossing a fair coin, you get a head or a tail each with probability of 0.5. Thus, if a head
is labelled 1 and a tail 0, the random variable X representing the outcome takes values
0 or 1. If the probability that X = 1 is p, then we have that:
P(X = 1) = 1/2
P(X = 0) = 1 − 1/2 = 1/2,
since events X = 1 and X = 0 are mutually exclusive.
Suppose we repeat a Bernoulli p experiment n times and count the number X of suc-
cesses, the distribution of X is called the Binomial, Bin(n, p) random variable. The
quantities n and p are called parameters and they specify the distribution.
The Binomial probability distribution function is
P(X = x) = C_x^n p^x (1 − p)^{n−x}
for x = 0, 1, 2, . . . , n and 0 < p < 1, where C_x^n = n!/((n − x)! x!). The notation Bin(n, p) means a Binomial distribution with parameters n and p.
The Binomial distribution has:
1. Mean, µ_X = E(X) = np
2. Variance, σ²_X = Var(X) = np(1 − p)
Example
1. A manufacturer of nails claims that only 3% of its nails produced are defective. A
random sample of 24 nails is selected, what is the probability that 5 of the nails
are defective?
Solution
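One way to evaluate this, shown purely as a Python sketch and not as the official solution, is with the binomial formula.

import math

# P(X = 5) for X ~ Bin(n = 24, p = 0.03): the nails example above.
n, p, x = 24, 0.03, 5
prob = math.comb(n, x) * p**x * (1 - p)**(n - x)
print(prob)   # roughly 0.0006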
2. A certain rare blood type can be found in only 0.05% of people. If a randomly selected group contains 3000 people, what is the probability that at least two persons in the group have this rare blood type?
Solution
The Poisson distribution, named after the French mathematician Siméon Denis Poisson, is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time, distance, mass or volume, if these events occur with a known average rate and independently of the time since the last event.
The Poisson distribution can also be used for the number of events in other specified
intervals such as distance, area or volume.
A Poisson random variable is a discrete random variable that can take integer values from 0 up to infinity (∞). The parameter of this distribution is λ, i.e. Po(λ). The Poisson probability distribution function for X ∼ Po(λ) is given by:
P(X = x) = λ^x e^{−λ}/x!    (6.3)
The Poisson question: What is the probability of r occurrences of a given outcome be-
ing observed in a predetermined time, space or volume interval?
Solution:
X ∼ Po(0.2). Using the formula
P(X = x) = λ^x e^{−λ}/x!
we have
P(X = 0) = 0.2^0 e^{−0.2}/0! = 0.8187
P(X ≤ 1) = P(X = 0) + P(X = 1) = 0.2^0 e^{−0.2}/0! + 0.2^1 e^{−0.2}/1! = 0.9824
P (X ≥ 2) = P (X = 2) + P (X = 3) + . . .
P (X ≥ 2) = 1 − (P (X = 0) + P (X = 1))
P (X ≥ 2) = 1 − (0.8187 + 0.1637)
P (X ≥ 2) = 0.0176
1) Mean = E(X) = λ
2) Variance = V ar(X) = λ
Example
1) A textile producer has established that a spinning machine stops randomly due
to thread breakages at an average rate of 5 stoppages per hour. What is the
probability that in a given hour on a spinning machine:
Solution
Solution
Remark
As a general rule, always check that the time, space or volume interval over which
occurrences of the random variable are observed is the same as the time, space or
volume interval corresponding to the average rate of occurrences, λ. When they differ,
adjust the rate of occurrences to coincide with the observed interval.
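As an illustration of this remark, the following Python sketch (an addition, with a hypothetical 30-minute interval as the assumption) rescales the hourly rate of 5 stoppages before computing Poisson probabilities.

import math

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return lam**x * math.exp(-lam) / math.factorial(x)

rate_per_hour = 5
lam_half_hour = rate_per_hour * 0.5          # rescale the rate to a 30-minute interval

p_no_stoppage_half_hour = poisson_pmf(0, lam_half_hour)                       # e^(-2.5), about 0.0821
p_at_most_2_per_hour = sum(poisson_pmf(k, rate_per_hour) for k in range(3))   # about 0.1247
print(round(p_no_stoppage_half_hour, 4), round(p_at_most_2_per_hour, 4))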
The total area under f(x) equals 1 if f(x) is a probability density function (pdf). Probabilities for different intervals are obtained by integrating the pdf over the given limits, that is:
P(a ≤ X ≤ b) = ∫_a^b f(x) dx
For the continuous uniform distribution on the interval [a, b], with density f(x) = 1/(b − a) for a ≤ x ≤ b:
i) The mean is given by
E(X) = (a + b)/2    (6.5)
ii) The variance is given by
Var(X) = (b − a)²/12    (6.6)
Note
The probability that X falls in some interval [c, d], with a ≤ c < d ≤ b, can easily be calculated by integrating the density function f(x) = 1/(b − a) to obtain
P(c ≤ X ≤ d) = (d − c)/(b − a)
Illustration
The marks of students from a certain examination are uniformly distributed in the
interval 50 to 75. The density function for the marks is given by:
f(x) = 1/(75 − 50) for 50 < x < 75, and f(x) = 0 elsewhere.
Find the mean and the variance of the marks.
Solution:
1) The mean is given by E(X) = (b + a)/2 = (75 + 50)/2 = 62.5
2) The variance is given by (b − a)²/12 = (75 − 50)²/12 = 52.083
Interpretation
The average mark for the examination was 62.5 with a variance of 52.083.
Exercise
For the continuous uniform distribution defined on the interval [a, b], where b > a, show that
i) Mean = (a + b)/2 and
ii) Variance = (b − a)²/12
If X is exponentially distributed with parameter λ > 0, its probability density function is f(x) = λe^{−λx} for x ≥ 0.
Illustration
Suppose that the length of a phone call in minutes is exponentially distributed with parameter λ = 0.1. If someone arrives immediately ahead of you at a public telephone booth, what is the probability that you will wait for at least 20 minutes?
Solution
Let X be the length of a phone call made in front of you. Then
P(X > 20) = ∫_{20}^{∞} 0.1e^{−0.1x} dx = [−e^{−0.1x}]_{20}^{∞} = e^{−2} ≈ 0.1353
One of the most useful and frequently encountered continuous random variable distri-
butions is called the Normal distribution. Its graph is called the normal curve, which
is bell shaped. The curve describes the distribution of so many sets of data that occur
in nature, industry and research.
i. It is bell-shaped.
iii. The tails of the distribution never touch the axis (i.e. asymptotic).
f(x) = (1/(√(2π)σ)) e^{−(1/2)((x−µ)/σ)²},  −∞ < x < ∞    (6.8)
where µ = mean of the random variable X and σ 2 = variance of the random variable X.
The random variable X is represented as X ∼ N(µ, σ²); µ and σ² are said to be the parameters of X.
X −µ
Z= ∼ N (0, 1) (6.9)
σ
Illustration
Use Standard normal distribution tables to find the probabilities below.
a. P (Z ≥ −2)
b. P (Z > 0.79)
e. P (Z ≤ −3)
Solution
a.
P (Z ≥ −2) = 1 − P (Z ≤ −2)
P (Z ≥ −2) = 1 − Φ(−2)
P (Z ≥ −2) = 1 − 0.0228
P (Z ≥ −2) = 0.9772
b.
c.
d.
e.
P (Z ≤ −3) = Φ(−3)
P (Z ≤ −3) = 0.0013
f.
If X is a normal random variable with mean µ and variance σ², then
Z = (X − µ)/σ
has a standard normal distribution.
Given the distribution of X, to find the probability of a given event, standardise X and use standard normal distribution tables to read off the probability values. An illustration of finding probabilities for a normal distribution is given below.
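The table look-ups above can be mimicked in Python using the error function; this is an added sketch, and the distribution N(50, 25) in the last lines is a purely hypothetical example of standardising X.

import math

def phi(z):
    """Standard normal cumulative probability, Phi(z), as read from Z-tables."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(round(1 - phi(-2), 4))   # P(Z >= -2) = 0.9772
print(round(phi(-3), 4))       # P(Z <= -3) = 0.0013

# Standardising a hypothetical X ~ N(mu = 50, sigma^2 = 25):
mu, sigma = 50, 5
print(round(phi((60 - mu) / sigma), 4))   # P(X < 60) = Phi(2) = 0.9772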
Illustration
Chapter 7
Confidence Intervals
7.1. Introduction
We are now in the knowledge that a population parameter can be estimated from sam-
ple data by calculating the corresponding point estimate. This chapter is motivated by
the desire to understand the goodness of such a point estimate. However, due to sam-
pling variability, it is almost never the case that the population parameter equals the
sample statistic. Further, the point estimate does not provide any information about
its closeness to the true population parameter. Thus, we cannot rely on point estimates
for decision making and policy formulation in day to day living and or in any organisa-
tion, institution or country. We need bounds that represent a range of plausible values
for a population parameter. Such ranges are called confidence interval estimates.
To obtain the interval estimates, the same data from which the point estimate was
obtained is used. Interval estimates may be in the form of a confidence interval whose
purpose is to bound population parameters such as the mean, the proportion, the vari-
ance, and the standard deviation; a tolerance interval which bounds a selected propor-
tion of the population; and a prediction interval which places bounds on one or more
future observations from a population.
It is noted that we cannot be certain that an interval contains the true but unknown population parameter, since only a sample from the full population is used to compute both the point estimate and the interval estimate. A confidence interval is constructed so that there is a high probability that it does contain the true but unknown population parameter. Generally, a 100(1 − α)% confidence interval equals
parameter estimate ± (reliability coefficient) × s.e.(parameter estimate),
where α is the level of significance, a value between zero and one; 1 − α is a value called the "confidence coefficient"; 100(1 − α)% is the confidence level; the parameter estimate is the value of the point estimate, such as x̄ for the sample mean or p̂ for the population proportion; the reliability coefficient is a probability point obtained from an appropriate table, for example z_α (standard normal z-value) or t_{α/2, n−1} (student t-distribution value); and s.e.(parameter), read 'standard error of the parameter', measures the closeness of the point estimate to the true population parameter, i.e. it measures the precision of the estimate.
The overall assumption made is that the sample comes from a normally distributed
population.
A confidence interval x̄ ± z_{α/2} × σ/√n has the lower and upper limits
ℓ1 = x̄ − z_{α/2} × σ/√n
and
ℓ2 = x̄ + z_{α/2} × σ/√n
Thus, a 100(1 − α)% confidence interval for the population mean is given by:
x̄ − z_{α/2} × σ/√n ≤ µ ≤ x̄ + z_{α/2} × σ/√n    (7.2)
Illustration
Consider the following data of weights, in kg, of ten randomly selected students. Assuming that the population variance is σ² = 1, construct a 95% confidence interval for the population mean weight.
64.3 64.6 64.8 64.2 64.5 64.3 64.6 64.8 64.2 64.3
Solution
Using the data, n = 10, x = 64.46, the level of significance, α = 5% = 0.05, and
from the given assumption, σ 2 = 1. Now, the 95% confidence interval for the
population mean is
x̄ − z_{0.025} × σ/√n ≤ µ ≤ x̄ + z_{0.025} × σ/√n
Substituting we have
64.46 − 1.96 × 1/√10 ≤ µ ≤ 64.46 + 1.96 × 1/√10.
Here 1.96 is the z-value from standard normal tables that gives a cumulative probability of 0.975. Simplifying, we then have the 95% confidence interval for the population mean as
63.84 ≤ µ ≤ 65.08.
Interpretation
From the above confidence interval estimation, the population mean lies within 63.84 ≤ µ ≤ 65.08 with a probability of 0.95.
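A quick numerical check of this interval, given as an added Python sketch under the same assumption σ² = 1:

import math

data = [64.3, 64.6, 64.8, 64.2, 64.5, 64.3, 64.6, 64.8, 64.2, 64.3]
n = len(data)
xbar = sum(data) / n               # 64.46
sigma, z = 1.0, 1.96               # assumed known sigma and z_{0.025}

half_width = z * sigma / math.sqrt(n)
print(round(xbar - half_width, 2), round(xbar + half_width, 2))   # 63.84 65.08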
Exercise
For the example above, separately construct a 90% and 99% confidence interval
for the population mean.
Task Starting from the cases considered above, what is the general relation-
ship between confidence levels and their precision?
Remark
The precision of a confidence interval is inversely proportional to the confidence
level. It is desirable to obtain a confidence interval that is short enough for pur-
poses of decision making and that also has adequate confidence. This is largely the reason why the 95% confidence level is the default confidence level chosen by researchers and practitioners.
If X̄ is the sample mean of a random sample of size n from a population with mean µ and variance σ², then
X̄ ∼ N(µ, σ²/n)
and it follows that
Z = √n(X̄ − µ)/σ ∼ N(0, 1).
In this case n is large and so it is permissible to replace the unknown σ by s.
This has close to no effect on the distribution of Z.
For large n, the quantity √n(X̄ − µ)/s follows a standard normal distribution with mean 0 and a standard deviation of 1.
Illustration
A study was carried out in Zimbabwe to investigate pollutant contamination
in small fish. A sample of small fish was selected from 53 rivers across the
country and the pollutant concentration in the muscle tissue was measured
(ppm). The pollutant concentration values are shown below. Construct a 95%
confidence interval for the population mean, µ.
1.230 1.330 0.040 0.044 1.200 0.270 0.490 0.190 0.940 0.520 0.830
0.810 0.710 0.500 0.490 1.160 0.050 0.150 0.400 0.190 0.650 0.770
1.080 0.980 0.630 0.560 0.410 0.730 0.430 0.590 0.340 0.340 0.270
0.840 0.500 0.340 0.280 0.340 0.250 0.750 0.870 0.560 0.100 0.170
0.180 0.190 0.040 0.490 0.270 1.100 0.160 0.210 0.860
Solution
Since n > 30, then the 95% confidence interval for µ is
0.5250 − 1.96 × 0.3486/√53 ≤ µ ≤ 0.5250 + 1.96 × 0.3486/√53
which simplifies to
0.431 ≤ µ ≤ 0.619
Exercise
Construct 90% and 99% confidence interval for µ using the above data. Fur-
ther, using the above data construct the 90%, 95%, and the 99% lower and
upper confidence interval for the population mean.
Remark:
In the event that the normality assumption is unreasonable, an alternative is to use non-parametric procedures, which are valid regardless of the underlying population.
For our purposes, it will be reasonable to assume that the population of interest
is normal with an unknown mean, µ, and an unknown variance, σ 2 . A small
random sample of size n is drawn. Let X̄ and S² be the sample mean and sample variance, respectively. We wish to construct a two-sided confidence interval on µ. The population variance, σ², is unknown and it is a reasonable procedure to use s² to estimate σ². Then the random variable Z is replaced with t (the student t-distribution), which is given by:
t = (X̄ − µ)/(s/√n) = √n(X̄ − µ)/s
which is a random variable that follows the student’s t-distribution with n − 1
degrees of freedom which are associated with the estimated standard deviation.
Notation
We let t_{α, n−1} and t_{α/2, n−1} be the values of the random variable T with n − 1 degrees of freedom above which we find a probability of α or α/2, respectively.
Illustration
Consider the following data obtained from a local Transport Logistics company.
Data shows the distance travelled daily by one of the company's trucks. Construct a 95% confidence interval for the mean daily distance travelled.
19.8 10.1 14.9 7.5 15.4 15.4 15.4 18.5 7.9 12.7 11.9
11.4 11.4 14.1 17.6 16.7 15.8 19.5 8.8 13.6 11.9 11.4
Solution
Since our sample is small, n = 22, then the 95% confidence interval for the
population mean is given by
x̄ − t_{α/2, n−1} × s/√n ≤ µ ≤ x̄ + t_{α/2, n−1} × s/√n
Substituting x̄ = 13.71, s = 3.55 and t_{0.025, 21} = 2.080 yields
13.71 − 2.080 × 3.55/√22 ≤ µ ≤ 13.71 + 2.080 × 3.55/√22
which simplifies to
12.14 ≤ µ ≤ 15.28.
Exercise
For the above data, construct the 90% and the 99% confidence intervals on the
population mean and interpret the two confidence intervals. Further, construct
the 90%, the 95% and the 99% lower and upper confidence limits. Give an inter-
pretation of each and all of them.
Remark
One-sided confidence intervals for the mean of a normal population are con-
structed by choosing the appropriate lower or upper confidence limit and then
replacing t_{α/2, n−1} by t_{α, n−1}.
Suppose that a random sample of size n (large n) has been taken from a large population, and that x of the observations in this sample (with x less than n) belong to a class of interest. Then p̂, calculated as x/n, is a point estimator of the proportion p of the population that belongs to this class. It is noted that n and p are the parameters of a binomial distribution. The sampling distribution of p̂ is approximately normal with mean p and variance p(1 − p)/n if p is not too close to either 0 or 1 and if n is relatively large. To apply this, it is required that np and n(1 − p) be greater than or equal to 5. We are saying that, if n is large, then the distribution of
Z = (p̂ − p)/√(p(1 − p)/n) ∼ N(0, 1).
For large samples, which usually is the case when dealing with proportions, a
satisfactory 100(1 − α)% confidence interval on the population proportion p is
p̂ − z_{α/2} × √(p̂(1 − p̂)/n) ≤ p ≤ p̂ + z_{α/2} × √(p̂(1 − p̂)/n)    (7.4)
where p̂ is the point estimate of p, and z_{α/2} is the upper α/2 probability point of the standard normal distribution.
Illustration
In a random sample of 85 stone sculptures, 10 have a surface finish that is rougher than expected. Construct a 95% confidence interval for the population proportion of stone sculptures with a surface finish that is rougher than expected.
Solution
Using the formula above, with p̂ = 10/85 ≈ 0.12, a two-sided 95% confidence interval for p is
0.12 − 1.96 × √(0.12(1 − 0.12)/85) ≤ p ≤ 0.12 + 1.96 × √(0.12(1 − 0.12)/85)
which simplifies to
0.05 ≤ p ≤ 0.19
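A sketch of the same computation in Python (added here, not part of the original notes), using the exact p̂ = 10/85 rather than the rounded 0.12:

import math

x, n, z = 10, 85, 1.96
p_hat = x / n                                   # about 0.1176 (0.12 after rounding)

half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
print(round(p_hat - half_width, 2), round(p_hat + half_width, 2))   # roughly 0.05 and 0.19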
Remark
The one-sided lower and upper confidence intervals are respectively given as
p̂ − z_α × √(p̂(1 − p̂)/n) ≤ p
and
p ≤ p̂ + z_α × √(p̂(1 − p̂)/n)
Exercise
In the above example, construct and interpret the 95% and the 99% lower and
upper confidence limits for the population proportion.
If s² is the sample variance from a random sample of n observations from a normal population, then the quantity
V = (n − 1)s²/σ²
has a chi-square (χ²) distribution with n − 1 degrees of freedom. A 100(1 − α)% two-sided confidence interval for σ² is
(n − 1)s²/χ²_{(α/2, n−1)} ≤ σ² ≤ (n − 1)s²/χ²_{(1−α/2, n−1)}
where χ²_{(α/2, n−1)} and χ²_{(1−α/2, n−1)} are the upper and lower α/2 percentage points of the χ² distribution with n − 1 degrees of freedom, respectively.
Illustration
An entrepreneur has an automatic filling machine that she uses to fill bottles. A random sample of 20 bottles gives a sample variance of fill volume of s² = 0.0153. Assume that the fill volume is normally distributed. Then a 95% upper confidence interval for σ² is
σ² ≤ (n − 1)s²/χ²_{(1−α, n−1)}
Substituting yields
σ² ≤ (20 − 1) × 0.0153/χ²_{(1−0.05, 20−1)} = 19 × 0.0153/χ²_{(0.95, 19)} = 19 × 0.0153/10.117
giving
σ² ≤ 0.0287
In general, the one-sided lower and upper confidence bounds for σ² are
(n − 1)S²/χ²_{(α, n−1)} ≤ σ²
and
σ² ≤ (n − 1)S²/χ²_{(1−α, n−1)}
Remark
Clearly, the lower and upper confidence intervals for σ are the square roots of
the corresponding limits in the above equations.
We state that σ² ≤ 0.0287 is converted into an upper confidence limit for the population standard deviation σ by taking the square root of both sides, giving σ ≤ 0.17.
Exercise
Using the information from the above illustration, construct a 90% lower and
upper confidence limits for the population standard deviation, σ.
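For readers with scipy available (an assumption, not a requirement of the notes), the chi-square quantile and the bound above can be obtained as follows.

from scipy import stats

# Upper 95% confidence bound for the filling-machine variance: n = 20, s^2 = 0.0153.
n, s2, alpha = 20, 0.0153, 0.05
chi2_lower_point = stats.chi2.ppf(alpha, n - 1)      # about 10.117 for 19 degrees of freedom

upper_bound_var = (n - 1) * s2 / chi2_lower_point    # about 0.0287
print(round(upper_bound_var, 4), round(upper_bound_var ** 0.5, 2))   # variance bound, sigma bound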
The overall assumption remains in place. And, the same with everything else.
We are simply considering two populations and constructing confidence inter-
vals for the difference in two population means, µ1 − µ2. The confidence interval for the difference between two population means is:
(x̄1 − x̄2) ± z_{α/2} √(s1²/n1 + s2²/n2)    (7.5)
Illustration
An entrepreneur is interested in reducing the drying time of a wall paint. Two
formulations of the paint are tested; formulation 1 is the standard, and formu-
lation 2 has a new drying ingredient that should reduce the drying time. From
experience, it is known that the standard deviation of drying time is 8 min-
utes, and this inherent variability should be unaffected by the addition of the
new ingredient. Ten specimens are painted with formulation 1, and another
10 specimens are painted with formulation 2; the 20 specimens are painted in random order. The two sample mean drying times are 121 minutes and 112
minutes, respectively. Construct a 99% confidence interval for the difference in
the two population means.
Solution
Solution to be provided.
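While the worked solution is still to be provided, a hedged Python sketch of the calculation (using z_{0.005} ≈ 2.576) would look like this.

import math

# 99% CI for mu1 - mu2: known sigma = 8 for both formulations, n1 = n2 = 10.
x1, x2, sigma, n1, n2, z = 121, 112, 8, 10, 10, 2.576

se = math.sqrt(sigma**2 / n1 + sigma**2 / n2)     # standard error of the difference
diff = x1 - x2
print(round(diff - z * se, 2), round(diff + z * se, 2))   # roughly -0.22 and 18.22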
Illustration
The following data is from two populations, A and B. Ten samples from A had
a mean of 90.0 with a sample standard deviation of s1 = 5.0, while 15 sam-
ples from B had a mean of 87.0 with a sample standard deviation of s2 = 4.0.
Assume that the populations, A and B are normally distributed and that both
normal populations have the same standard deviation. Construct a 95% confi-
dence interval on the difference in the two population means.
Solution
Solution to be provided.
Chapter 8
Hypothesis Testing
Hypotheses
A hypothesis is a statement about a population. Testing of hypotheses involves the evaluation of two hypotheses, called the null and the alternative, denoted H0 and H1 respectively. H0 is the assertion that a population parameter takes on a particular value. On the other hand, H1 expresses the way in which the value of a population parameter may deviate from that specified under H0. The direction of deviation may be specified, giving a one-sided or one-tailed test, or may not be specified, giving a two-sided or two-tailed test.
We take time to point out that the language and grammar of testing of hy-
potheses does not use the word ”accept” or any of its numerous synonyms. This
is beyond semantics. To say one ”accepts” the null hypothesis is to imply that
they have proved the null hypothesis to be true. This practice is incorrect. The
null hypothesis is the claim that is usually set up with the expectation of re-
jecting it. The null hypothesis is assumed true until proven otherwise. If the
weight of evidence points to the belief that the null hypothesis is unlikely with
high probability, then there exists a statistical basis upon which we may reject
the null hypothesis. The design of hypothesis tests is such that we stay with the null hypothesis until there is enough evidence to suggest support for the alternative hypothesis. Clearly, the design is never about selecting the more
likely of the two hypotheses. Let’s take this to our legal system. One is consid-
ered not guilty until proven otherwise. It is the job of the prosecutor to build a
case that is put evidence before the court of law that the person in question is
guilty. The judge will give their verdict as guilty or not guilty but will NEVER
give their verdict with an import of being innocent. By and large, the courts of
law are a classical example of constant testing of hypotheses procedure. So, let
it be clear that on the basis of the data from the sample, we either reject the
98 Hypothesis Testing
Remarks
Test statistic
This is a value calculated from sample data and is used to decide on rejecting
H0 .
Critical region
This is a range of values which is such that when the test statistic falls into it
then H0 would be rejected.
Critical value
Is a value that separates the rejection region and the non-rejection region.
Type I error
Occurs when a true null hypothesis is rejected. A null hypothesis is rejected
when in actual fact it is true.
Type II error
It occurs when a false null hypothesis is not rejected. Alternatively, it is when
a null hypothesis is not rejected when in actual fact it is false.
Power of a test
It is the probability that the testing of hypotheses procedure rejects the null hypothesis when the null hypothesis is indeed false.
• Decide on the basis of a decision criterion that rejects H0 if, upon compar-
ison, the test statistic is more extreme than a critical value.
• Conclude on the basis of the decision’s import, and report in the context of
the problem.
Exercise
For the above exercise, instead of using hypothesis testing procedure, construct
a 95% confidence interval. Test the same hypothesis using the confidence inter-
val. Is the value specified under H0 contained in the confidence interval? Or, is
zero contained in the confidence interval? What conclusions should be drawn?
Illustration
Let the mean cost of an Introduction to Statistics textbook be µ. In testing the
claim that the average price of textbook is not $34.50 a sample of 36 current
textbooks had selling costs with a sample mean $32.00 and a sample standard
deviation of $6.30. Using a 10% level of significance, what conclusion can be
made?
Solution
This is a two-tailed test with n > 30 and α = 0.1 thus, the critical value is a z -
value ±1.96. Detailed solution to be done.
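Pending the detailed solution, the following added Python sketch shows how the test statistic would be computed and compared with the critical value.

import math

# H0: mu = 34.50 vs H1: mu != 34.50 with n = 36, xbar = 32.00, s = 6.30, alpha = 0.10.
mu0, xbar, s, n, z_crit = 34.50, 32.00, 6.30, 36, 1.645

z_cal = (xbar - mu0) / (s / math.sqrt(n))    # about -2.38
print(round(z_cal, 2), abs(z_cal) > z_crit)  # True, so H0 is rejected at the 10% level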
Exercise
The increased availability of light materials with high strength has revolution-
ized the design and manufacture of golf clubs, particularly drivers. Clubs with
hollow heads and very thin faces can result in much longer tee shots, especially
for players of modest skills. This is due partly to the spring-like effect that the
thin face imparts to the ball. Firing a golf ball at the head of the club and mea-
suring the ratio of the outgoing velocity of the ball to the incoming velocity can
quantify this spring-like effect. The ratio of velocities is called the coefficient
of restitution of the club. An experiment was performed in which 15 drivers
produced by a particular club maker were selected at random and their coeffi-
cients of restitution measured. In the experiment the golf balls were fired from
an air cannon so that the incoming velocity and spin rate of the ball could be
precisely controlled. The sample mean and sample standard deviation are x =
0.83725 and s = 0.02456. Determine if there is evidence at the α = 0.05 level to
support the claim that the mean coefficient of restitution exceeds 0.82.
Exercise
For the above exercise, instead of using the testing of hypothesis procedure,
construct a 95% confidence interval. Test the same hypothesis using the confi-
dence interval approach.
Illustration
The advertised claim for batteries for cell phones is set at 48 operating hours,
with proper charging procedures. A study of 5000 batteries is carried out and
15 stop operating prior to 48 hours. Do these experimental results support the
claim that less than 0.2 percent of the company’s batteries will fail during the
advertised time period, with proper charging procedures? Use a hypothesis
testing procedure with α = 0.01. Is the conclusion the same at the 10% level of
significance?
Solution
H0 : p = 0.002 against
H1 : p < 0.002, with p̂ = 15/5000 = 0.003.
Note
The claim that less than 0.2 percent of the batteries fail corresponds to a hypothesised population proportion of p0 = 0.002. Hence, letting p0 be the hypothesised population proportion value yields
Z_cal = (p̂ − p0)/√(p0(1 − p0)/n) = 1.5827 < Z_crit = Z_{0.01} = 2.3263
Exercise
Let p be the proportion of new car loans having a 48 months period. In some
year p = 0.74. Suppose it is believed that this has declined and accordingly we
wish to test this belief using a 1% level of significance. What is the conclusion
if 350 of a sample of 500 new car loans have a time period of 48 months?
We now extend the previous one population results to the difference of means
for two populations. This test is done to check whether the two populations are similar, i.e. whether they produce similar results.
Illustration
Consider the following gasoline mileages of two makes of light trucks. The
trucks 1 and 2 have given population means and population standard deviations.
Solution
Exercise in the lecture.
Remark
In inferential applications the population variances σ12 and σ22 are generally not
known and must be estimated by s21 and s22 . The standard error is estimated by
standard error = √(s1²/n1 + s2²/n2)
We assume that the variances of both distributions, σ1² and σ2², are unknown but equal. This common variance is estimated by a quantity called the pooled variance, denoted s_p² and calculated as
s_p² = ((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2)
The test statistic is then
t_cal = ((x̄1 − x̄2) − (µ1 − µ2))/√(s_p²(1/n1 + 1/n2))
Illustration
Consider the following data. n1 = 10, x1 = 90, s1 = 5, n2 = 15, x2 = 87 and
s2 = 4. Assume that the populations are normally distributed and that both
populations have the same standard deviation. At the 5% level of significance,
can we conclude that there is a difference in the two population means?
Solution
Left as an exercise.
In testing for the equality of two population means, we may choose to select
two random samples one from each population and compare their means. If
these sample means exhibit a difference, then we reject the null hypothesis
that H0 : µ1 − µ2 = 0. Another approach is to try and match the subjects from
the two populations according to variables which will be expected to have an
influence on the variable under study. The two samples are no longer indepen-
dent and the inferences are now based on the differences of the observations
from the matched pairs.
Illustration
Samples of two brands of pork sausage are tested for their fat content. The re-
sults of the percentage of fat are summarised as follows: Brand A (n = 50, x =
26.0, s = 9.0) and Brand B (n = 46, x = 29.3, s = 8.0). Can we conclude that there
is sufficient evidence to suggest that there is a difference in the fat content
of the two brands of pork sausage? Use a 5% level of significance.
Solution
Left as an exercise for the lecture.
For the new single sample of differences, we find its mean, d̄, which estimates the population mean of the differences, µd, and its standard deviation, s_d. Assuming that the original populations are normally distributed with equal means, i.e. µ1 = µ2, and equal variances, the population mean of the differences µd is zero and the standard error is estimated by s_d/√n.
The test statistic in this case is t_cal = d̄/(s_d/√n).
The hypotheses tests concerning µ1 and µ2 are now based on the sample mean of the differences, d̄.
Illustration
Five machines are tested for wind resistance with two types of grills. Their
drag coefficients were determined and recorded as follows.
Machine | 1    | 2    | 3    | 4    | 5
Grill A | 0.47 | 0.46 | 0.40 | 0.44 | 0.43
Grill B | 0.50 | 0.45 | 0.47 | 0.44 | 0.48
Using a 5% level of significance, test for the difference in the drag coefficients due to the type of grill.
Solution
Left as an exercise during the lecture.
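As a sketch of the paired computation (an addition; the lecture solution remains the authoritative one), the differences, their mean and standard deviation, and the t statistic can be obtained as follows.

import math
import statistics

grill_a = [0.47, 0.46, 0.40, 0.44, 0.43]
grill_b = [0.50, 0.45, 0.47, 0.44, 0.48]

diffs = [a - b for a, b in zip(grill_a, grill_b)]
d_bar = statistics.mean(diffs)                    # -0.028
s_d = statistics.stdev(diffs)                     # about 0.0335
t_cal = d_bar / (s_d / math.sqrt(len(diffs)))     # about -1.87; compare with t_{0.025, 4} = 2.776

print(round(d_bar, 3), round(s_d, 4), round(t_cal, 2))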
• It may be less expensive since in most cases fewer experimental units are
used when compared to a two sample design.
• A rest period may be required between applying the first and second treat-
ment in order to minimise the carry over effect from the first treatment.
Even, then the carry over effect may not be completely eliminated.
Suppose that two independent random samples of sizes n1 and n2 are taken
from two populations, and let x1 and x2 represent the number of observations
that belong to the class of interest in sample 1 and sample 2, respectively. In
testing the hypotheses
H0 : p1 − p2 = 0
H1 : p1 − p2 ≠ 0,
the test statistic is
Z_cal = (p̂1 − p̂2)/√(p̂(1 − p̂)(1/n1 + 1/n2))
where p̂ = (x1 + x2)/(n1 + n2) is the pooled sample proportion.
Illustration
Consider the following situation in which comparison is made of two concept
exposition methods. Method A is the standard and method B is the proposed. A
class of 200 Statistics students at a University is used. The students were ran-
domly assigned to two groups of equal size. One group was exposed to method
A and the other group was exposed to method B. At the end of the semester, 19
of the students exposed to method B showed improvement, while 27 of those
exposed to method A improved. At the 5% level of significance, is there suffi-
cient reason to believe that method A is effective in concept exposition?
Solution
H0 : p_A − p_B = 0
H1 : p_A − p_B ≠ 0
Then, we extract the given data: n_A = n_B = 100, p̂_A = 0.27, p̂_B = 0.19, x_A = 27 and x_B = 19. Thus, the pooled proportion is p̂ = 0.23.
The test statistic and the critical value are Z_cal ≈ 1.34 and Z_crit = 1.96 respectively.
After comparing Zcal and Zcrit , the decision is that we fail to reject H0 . From
this decision, we therefore conclude that, at the 5% level of significance, there
is no sufficient evidence to support the assertion that method A is effective in
concept exposition.
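A short Python check of the same test statistic, added here purely as a sketch:

import math

x_a, n_a, x_b, n_b = 27, 100, 19, 100
p_a, p_b = x_a / n_a, x_b / n_b
p_pool = (x_a + x_b) / (n_a + n_b)                # pooled proportion = 0.23

z_cal = (p_a - p_b) / math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
print(round(z_cal, 2))   # about 1.34, well below z_crit = 1.96, so H0 is not rejected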
Exercise
A study is made of business support of the immigration enforcement practices.
Suppose 73% of a sample of 300 cross border traders and 64% of the light man-
ufacturers said they fully supported the policies being proposed. Is there sufficient evidence to conclude that the proposed policies are equally supported by the two groups sampled? Use a 1% level of significance.
Tests for independence are performed on categorical data such as when testing
for independence of opinion on a public policy and gender. The data is con-
tained in what is called a contingency table. The hypotheses are tested using a
Chi - square test statistic, χ2cal .
Illustration
A company operates four machines three shifts each day. From production
records, the following data on the number of breakdowns are collected:
Shift | Machine A | Machine B | Machine C | Machine D
1     | 4         | 3         | 2         | 1
2     | 3         | 1         | 9         | 4
3     | 1         | 1         | 6         | 0
Using a 5% level of significance, test the hypothesis that breakdowns are independent of the shift.
Solution: To be provided.
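Until the full solution is provided, the test can be sketched with scipy's chi-square contingency routine (assuming scipy is available; this sketch is an addition to the notes).

from scipy import stats

observed = [[4, 3, 2, 1],   # shift 1, machines A to D
            [3, 1, 9, 4],   # shift 2
            [1, 1, 6, 0]]   # shift 3

chi2_cal, p_value, dof, expected = stats.chi2_contingency(observed)
print(round(chi2_cal, 3), dof, round(p_value, 4))   # compare p_value with alpha = 0.05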
Exercise
Grades in Statistics and Communication Skills courses taken simultaneously
were recorded as follows for a particular group of students.
Are the grades in Statistics and Communication Skills related? Use α = 0.01.
It is required that one may demonstrate that hypothesis testing and confidence
intervals are equivalent procedures in so far as decision making or inference
about population parameters is concerned. However, each procedure presents
different insights. What is the major difference between these two cousin pro-
cedures?
Chapter 9
Regression Analysis
9.1. Introduction
It is important to note that the approach used here first exposes the useful con-
cepts of the regression analysis technique, gives an illustrative example on the
application of these concepts, and then wraps up with a practice question.
Many problems that are encountered in everyday life involve exploring the
relationships between two or more variables. Regression analysis is a statis-
tical tool that is very useful for these types of problems. For example, in the
clothing industry, the sales obtained from selling particular designer outfits is
related to the amount of time spent advertising the label. Regression analysis
can be used to build a model to predict the sales given the amount of time de-
voted to advertising the label. In the sciences, regression analysis models can
be used for process optimization. For instance, finding the temperature levels
that maximises yield or for purposes of process control.
After studying this chapter, you are expected to be able to: i) use simple linear regression to build models for everyday data; ii) apply the method of least squares to estimate the parameters in a linear regression model; iii) use the fitted regression model to make a prediction of a future observation; iv) interpret the scatter plot, the correlation coefficient, the coefficient of determination and the regression parameters.
Regression models are used for:
• prediction,
• optimisation, and
• control purposes.
Regression relationships are valid only for values of the explanatory variable
within the range of the original data. The linear relationship that we have
assumed may be valid over the original range of X, but it is unlikely to remain so as we extrapolate, i.e. if we use values of X beyond the range in question to estimate the value of Y. Put differently, as we stray from the range of the values of X for which data were collected, our certainty about the validity of the assumed model tends to fade away. We caution that linear regression mod-
els are not necessarily valid for extrapolation purposes. Note that in many life
situations extrapolation of a regression model may be the only way to approach
a given problem.
The simple linear regression model relating the response Y to the explanatory variable X is
Y = a + bX + ε
where a and b are the regression coefficients and ε is the random error term.
The random error term follows a normal distribution with a mean zero and
an unknown variance σ 2 . For completeness, we state that the random errors
corresponding to different observations are also assumed to be uncorrelated or
independent random variables. To determine the appropriateness of employing
simple linear regression we use (1) the scatter plot and or (2) the correlation
coefficient techniques.
Having established that a linear relationship exists between the random vari-
ables X and Y, we proceed to fit the linear regression model or line or equation.
To fit a regression model is to estimate the regression coefficients a and b. The estimated regression coefficients are denoted â and b̂. The fitted model is written in the form
Ŷ = â + b̂X
Now, we have fitted a model and we wish to determine how good it is and then
use it for prediction of new values for the system in question. To determine
how good our model is we calculate the values of the response variable for each
and every value of the explanatory variable and then note the difference. This difference, obtained by subtracting the fitted value from the actually observed value, is the error in our model for that observation and it is called the residual. By performing what is called residual analysis we are able to come to a conclusion about the adequacy of the fitted model.
After establishing the adequacy of our model we then proceed to predict future
values of the response variable for the system in question. This is technically
called forecasting.
The least squares estimates of the regression coefficients are
b̂ = (nΣxy − Σx Σy)/(nΣx² − (Σx)²)    (9.1)
â = ȳ − b̂x̄    (9.2)
Naturally, we ask how much of the variability in the response variable has been explained by fitting the regression model. To answer this question we need to compute the coefficient of determination, given later in this section.
Illustration
Consider the following set of observations. Take X to be the exploratory vari-
able and Y to be the response variable.
Y 1 0 1 2 5 1 4 6 2 3 5 4 6 8 4
X 60 63 65 70 70 70 80 90 80 80 85 89 90 90 90
a) Draw a scatter plot for the above data. Comment on the suitability of
using simple linear regression to describe the relationship.
c) Fit the regression model using the method of least squares. Interpret the
regression coefficients.
d) State how much of the variation in Y has been accounted for by fitting the
linear regression model.
e) Using the fitted regression model, what is the value of Y when X = 60?
What is the residual?
Solution
A scatter diagram of the above data is shown in the figure below.
The scatter diagram shows a positive linear relationship between x and y val-
ues. This shows that a linear regression equation can be established.
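A minimal numpy sketch (added, not from the notes) fits the least-squares line and reports the correlation for this data; numpy.polyfit returns the slope first.

import numpy as np

x = np.array([60, 63, 65, 70, 70, 70, 80, 90, 80, 80, 85, 89, 90, 90, 90])
y = np.array([1, 0, 1, 2, 5, 1, 4, 6, 2, 3, 5, 4, 6, 8, 4])

b_hat, a_hat = np.polyfit(x, y, 1)       # fitted slope and intercept of y-hat = a-hat + b-hat * x
r = np.corrcoef(x, y)[0, 1]              # correlation coefficient

y_at_60 = a_hat + b_hat * 60             # prediction when X = 60; residual = 1 - y_at_60
print(round(a_hat, 3), round(b_hat, 3), round(r, 3), round(y_at_60, 3))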
Interpretation of r
When interpreting r note should be taken to mention the magnitude / size of
the correlation and the direction of the linear relationship.
The coefficient of determination is given by
R² = r² × 100%
Exercise
Consider the following quantities for two random variables X and Y. Let X be
the cause variable and Y be the effect variable.
n = 20, Σx = 24, Σy = 1843, Σy² = 170045, Σx² = 29 and Σxy = 2215
b) Fit the regression model using the method of least squares. What is the
meaning of the regression coefficients?
d) Using the fitted regression model, what would be the value of Y when X =
2? What is the residual?
f) Comment on the usefulness of the values in parts (d) and (e) given that for the twenty observations Σx = 24. Hint: You are expected to reflect on the uses and abuses of the regression analysis technique.
Chapter 10
Index numbers
10.1. Objectives
10.2. Introduction
Index numbers are today one of the most widely used statistical indicators for tracking changes in the values of commodities. Because they are generally used to indicate the state of the economy, index numbers are often called barometers of economic activity. Index numbers are used in comparing production, sales, and changes in exports or imports over a certain period of time, or wages as a measure of the cost of living. It is well known that the wage contracts of workers in our country are tied to the cost of living, which is measured by index numbers.
It must be clearly understood that the index number for the base year is
always 100. An index number is commonly referred to as an index.
3. Index numbers measure changes that are not directly measurable. An in-
dex number is used for measuring the magnitude of changes in such phe-
nomenon, which are not capable of direct measurement. Index numbers
essentially capture the changes in the group of related variables over a pe-
riod of time. For example, if the index of industrial production is 215.1% in 1992 (base year 1980), it means that industrial production in that year was 2.151 times the 1980 level. It does not, however, mean that the net increase in the index reflects an equivalent increase in industrial production in all sectors of the industry. Some sectors might have increased their production by more than 2.15 times while other sectors may have increased their production only marginally.
Index numbers also help in measuring the purchasing power of money. For example, if the price of a given basket of goods rises from $100 in 2004 to $202 in 2006, the real purchasing power of the dollar can be found out as follows:
100/202 = 0.495
The above calculation means that if the dollar's worth is taken as $100 in 2004, its purchasing power is $49.50 in 2006.
4. Deflating time series data - Index numbers play a vital role in adjusting
the original data to reflect reality. For example, nominal income (income
at current prices) can be transformed into real income(reflecting the ac-
tual purchasing power) by using income deflators. Similarly, assume that
industrial production is represented in value terms as a product of vol-
ume of production and price. If the subsequent year's industrial production were to be higher by 20% in value, the increase may not be a result of an increase in the volume of production, as one might assume, but because of an increase in prices. The inflation which has caused the increase in the series can be eliminated by the use of an appropriate price index, thus making the series real.
There are three principal types of indices which are: i) Price index, ii) Quantity
index and iii) Value index.
1. Price Index - The most frequently used form of index numbers is the price
index. A price index compares changes in the prices of commodities. If an
attempt is being made to compare the prices of edible oils this year to the
prices of edible oils last year, it involves, firstly, a comparison of two price
situations over time and secondly, the heterogeneity of the commodities
given the various varieties of commodities. By constructing a price index
number, we are summarizing the price movements of each type of oil in
this group of edible oils into a single number called the price index. The
Wholesale Price Index (WPI) and the Consumer Price Index (CPI) are some of
the popularly used price indices.
There are two approaches for constructing an index number namely Aggre-
gate and Average of relatives methods. The index numbers constructed in
either of these methods could be either a weighted or an unweighted index
number.
The simple unweighted aggregate price index expresses the total of current-year prices as a percentage of the total of base-year prices, that is, (ΣP1/ΣP0) × 100%.
Demerits - It does not consider the relative importance of the various commodities involved. The unweighted index does not reflect reality since the price changes are not linked to any usage or consumption levels.
Illustration
Construct an unweighted index for the three commodities taking 2010 as the
base year.
Commodity         | 2010 Price | 2012 Price
Oranges (Pockets) | 20         | 28
Milk (Ltr)        | 5          | 8
Gas               | 76         | 100

Solution
Unweighted Price Index = (ΣP1/ΣP0) × 100% = ((28 + 8 + 100)/(20 + 5 + 76)) × 100% = (136/101) × 100% = 134.65%
Interpretation
The price index of 134.65% means that the prices of commodities rose by 34.65%
from 2010 to 2012.
Laspeyres method uses the quantities consumed during the base period in com-
puting the index number. This method is also the most commonly used method
which incidentally requires quantity measures for only one period. Laspeyres
index can be calculated using the following formula:
Laspeyres Price Index (LPI) = (ΣP1Q0/ΣP0Q0) × 100%    (10.2)
Where, P1 = Prices in the current year, P0 = Prices in the base year, Q0 = Quan-
tities in the base year.
Laspeyres price index calculates the changes in the aggregate value of the
base year’s list of goods when valued at current year prices. In other words,
Laspeyres index measures the difference between the theoretical cost in a given
year and the actual cost in the base year of maintaining a standard of living
as in the base year. Laspeyres quantity index can be calculated by using the
formula:
Laspeyres Quantity Index (LQI) = (ΣP0Q1/ΣP0Q0) × 100%    (10.3)
Illustration
Calculate the Laspeyres price and quantity indices for the following production
data.
Product | P0 (1985) | P1 (1990) | Q0 (1985) | Q1 (1990) | P0Q0     | P1Q0     | P0Q1
Rice    | 46.60     | 58.00     | 700       | 910       | 32620.00 | 40600.00 | 42406.00
Sugar   | 14.57     | 17.92     | 620       | 950       | 9033.40  | 11110.40 | 13841.50
Salt    | 69.46     | 85.10     | 205       | 300       | 14239.30 | 17445.50 | 20838.00
Wheat   | 33.84     | 40.30     | 330       | 470       | 11167.20 | 13299.00 | 15904.80
Total   |           |           |           |           | 67059.90 | 82454.90 | 92990.30
Solution
The Laspeyres price index is:
Laspeyres Price Index (LPI) = (ΣP1Q0/ΣP0Q0) × 100% = (82454.90/67059.90) × 100% = 122.96%
The Laspeyres quantity index is:
Laspeyres Quantity Index (LQI) = (ΣP0Q1/ΣP0Q0) × 100% = (92990.30/67059.90) × 100% = 138.67%
Paasche's method uses the quantities consumed during the current period rather than the base period. The Paasche index can be calculated
using the formula.
Paasche Price Index (PPI) = (ΣP1Q1/ΣP0Q1) × 100%    (10.4)
Where P1 = Prices in the current year P0 = Prices in the base year Q1 = Quan-
tities in the current year. The Paasche quantity index is given by:
Paasche Quantity Index (PQI) = (ΣP1Q1/ΣP1Q0) × 100%    (10.5)
Demerits - Paasche index is not frequently used in practice when the num-
ber of commodities is large. This is because for Paasche index, revised weights
or quantities must be computed for each year examined. Such information
is either unavailable or hard to gather adding to the data collection expense,
which makes the index unpopular. Paasche index tends to underestimate the
rise in prices or has a downward bias.
Illustration
The table below represents prices and quantities of commodities A, B, C and D
for the years 1992 and 1993. Calculate the Paasche price and quantity indices.
          | 1992            | 1993            |
Commodity | Price | Quantity| Price | Quantity| P0Q0 | P0Q1 | P1Q0 | P1Q1
A         | 3     | 18      | 4     | 15      | 54   | 45   | 72   | 60
B         | 5     | 6       | 5     | 9       | 30   | 45   | 30   | 45
C         | 4     | 20      | 6     | 26      | 80   | 104  | 120  | 156
D         | 1     | 14      | 3     | 15      | 14   | 15   | 42   | 45
Total     |       |         |       |         | 178  | 209  | 264  | 306
Solution
Paasche Price Index, (PPI) is:
Paasche Price Index (PPI) = (ΣP1Q1/ΣP0Q1) × 100% = (306/209) × 100% = 146.41%
The difference between Paasche index and Laspeyres index reflects the change
in consumption patterns of the commodities A, B, C and D used in that table.
As the weighted aggregates price index for the set of prices was 148.31% us-
ing the Laspeyres method and 146.41% using the Paasche method for the same
set, it indicates a trend towards less expensive goods. Generally, Laspeyres and
Paasche methods tend to produce opposite extremes in index values computed
from the same data. The use of Paasche index requires the continuous use of
new quantity weights for each period considered. As opposed to the Laspeyres
index, Paasche index generally tends to under estimate the prices or has a
downward bias. Because people tend to spend less on goods when their prices
are rising, the use of the Paasche index, which is based on current weighting, produces an index which underestimates the rise in prices, showing a downward bias. Since all prices or all quantities do not move in the same proportion, the goods which have risen in price more than others at a time when prices in general are rising will tend to have smaller current quantities and will thus carry less weight in the Paasche index.
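The two index values quoted above can be verified with a few lines of Python (an added sketch, not part of the original notes).

p0 = [3, 5, 4, 1]
q0 = [18, 6, 20, 14]
p1 = [4, 5, 6, 3]
q1 = [15, 9, 26, 15]

sum_p0q0 = sum(p * q for p, q in zip(p0, q0))   # 178
sum_p1q0 = sum(p * q for p, q in zip(p1, q0))   # 264
sum_p0q1 = sum(p * q for p, q in zip(p0, q1))   # 209
sum_p1q1 = sum(p * q for p, q in zip(p1, q1))   # 306

lpi = sum_p1q0 / sum_p0q0 * 100                 # Laspeyres, about 148.31%
ppi = sum_p1q1 / sum_p0q1 * 100                 # Paasche, about 146.41%
print(round(lpi, 2), round(ppi, 2))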
Prof. Irving Fisher has proposed a formula for constructing index numbers, as
a geometric mean of the Laspeyres and Paasche indices i.e. Fisher’s quantity
and price index are calculated as:
Fisher's Quantity Index = √(Laspeyres Quantity Index × Paasche Quantity Index) = √(LQI × PQI)    (10.6)
Fisher's Price Index = √(Laspeyres Price Index × Paasche Price Index) = √(LPI × PPI)    (10.7)
1. Theoretically, the geometric mean is considered the best average for the construction of index numbers, and Fisher's index uses the geometric mean.
3. Both the current year and base year prices and quantities are taken into account by this index. The index is not widely used owing to the practical limitations of collecting data. Fisher's Ideal Quantity Index can be found out by the formula.