Variability
MEASURES OF VARIABILITY
Measures of average such as the median and mean represent the typical value
for a dataset. Within the dataset the actual values usually differ from one another and
from the average value itself. The extent to which the median and mean are good
representatives of the values in the original dataset depends upon the variability or
dispersion in the original data. Datasets are said to have high dispersion when they
contain values considerably higher and lower than the mean value. Figure 1 shows the
number of tutorial groups of each size in semester 1 and semester 2.
In both semesters the mean and median tutorial group size is 5 students; however, the
groups in semester 2 show more dispersion (or variability in size) than those in
semester 1. Dispersion within a dataset can be measured or described in several ways
including the range, inter-quartile range and standard deviation.
The Range.
The range is the most obvious measure of dispersion and is the difference
between the lowest and highest values in a dataset. In figure 1, the size of the largest
semester 1 tutorial group is 6 students and the size of the smallest group is 4 students,
resulting in a range of 2 (6-4). In semester 2, the largest tutorial group size is 7 students
and the smallest tutorial group contains 3 students, therefore the range is 4 (7-3).
The range is simple to compute and is useful when you wish to evaluate
the whole of a dataset.
The range is useful for showing the spread within a dataset and for
comparing the spread between similar datasets.
1 rensonrobles@yahoo.com
STATISTICS 101
To find the range of marks, the highest and lowest values need to be found from
the table. The highest coursework mark was 48 and the lowest was 27, giving a range
of 21. In the examination, the highest mark was 45 and the lowest 12, producing a range
of 33. This indicates that there was wider variation in the students' performance in the
examination than in the coursework for this module. Since the range is based solely on
the two most extreme values within the dataset, if one of these is either exceptionally
high or low (sometimes referred to as an outlier) it will result in a range that is not typical
of the variability within the dataset. For example, imagine that one student failed to
hand in any coursework and was awarded a mark of zero, but sat the exam and scored 40.
The range for the coursework marks would now become 48 (48-0) rather than 21;
however, this new range is not typical of the dataset as a whole because it is distorted by
the outlier in the coursework marks. In order to reduce the problems caused by outliers
in a dataset, the inter-quartile range is often calculated instead of the range.
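As a quick check of the arithmetic, the range can be computed in a couple of lines of Python. This is a minimal sketch: the function name `data_range` is our own, the marks are the ten values from Exercise 1 at the end of this guide (the coursework table itself is not reproduced here), and the appended zero is a hypothetical outlier mirroring the missing-coursework example.

```python
def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

marks = [44, 49, 52, 62, 53, 48, 54, 49, 46, 51]   # Exercise 1 data
print(data_range(marks))           # 18

# Hypothetical: the same marks plus one mark of zero (an outlier)
print(data_range(marks + [0]))     # 62
```

A single extreme value more than triples the range, even though the other ten marks are unchanged.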
The median lies at the mid-point between the two central values (10th and 11th)
= half-way between 60 and 62 = 61
The lower quartile lies at the mid-point between the 5th and 6th values
= half-way between 52 and 53 = 52.5
The upper quartile lies at the mid-point between the 15th and 16th values
= half-way between 70 and 71 = 70.5
The inter-quartile range for this dataset is therefore 70.5 - 52.5 = 18, whereas the
range is 80 - 43 = 37.
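The hand method above (the quartiles are the medians of the lower and upper halves of the ordered data) can be sketched in Python as follows. The function names are hypothetical; for an odd number of values we assume the median itself is left out of both halves, which is only one of several conventions in use. Because the original 20-value dataset is not reproduced here, the sketch reuses the Exercise 1 data, and the added zero is again a hypothetical outlier.

```python
import statistics

def quartiles_by_halves(values):
    """Return (lower quartile, median, upper quartile) using the median-of-halves rule.

    Assumption: when the number of values is odd, the median is excluded from both halves.
    """
    data = sorted(values)
    n = len(data)
    lower_half = data[: n // 2]          # for n = 20: the 1st to 10th values
    upper_half = data[(n + 1) // 2 :]    # for n = 20: the 11th to 20th values
    return (
        statistics.median(lower_half),
        statistics.median(data),
        statistics.median(upper_half),
    )

def interquartile_range(values):
    lower_q, _, upper_q = quartiles_by_halves(values)
    return upper_q - lower_q

marks = [44, 49, 52, 62, 53, 48, 54, 49, 46, 51]   # Exercise 1 data
print(quartiles_by_halves(marks))        # (48, 50.0, 53)
print(interquartile_range(marks))        # 5

with_outlier = marks + [0]               # hypothetical outlier
print(interquartile_range(with_outlier))         # 7
print(max(with_outlier) - min(with_outlier))     # 62 (the range was 18 without the zero)
```

Adding the outlier moves the inter-quartile range only from 5 to 7, while the range jumps from 18 to 62.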
The inter-quartile range provides a clearer picture of the overall dataset by
ignoring the outlying values. Like the range, however, the inter-quartile range
is a measure of dispersion that is based upon only two values from the dataset.
Statistically, the standard deviation is a more powerful measure of dispersion because it
takes into account every value in the dataset. The standard deviation is explored in the
next section of this guide.
Calculating the Inter-quartile range using Excel.
The method Excel uses to calculate quartiles is not commonly used and tends to
produce unusual results, particularly when the dataset contains only a few values. For
this reason it may be best to calculate the inter-quartile range by hand.
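To see how a spreadsheet-style rule can disagree with the hand method, the sketch below compares the two on the Exercise 1 data. It uses Python's `statistics.quantiles` (Python 3.8 or later) with `method="inclusive"`, which interpolates at position (n - 1)p + 1 of the sorted data. We believe this is the same rule Excel's QUARTILE (QUARTILE.INC) uses, but treat that correspondence as an assumption and check it against your own spreadsheet.

```python
import statistics

def quartiles_by_halves(values):
    """Lower and upper quartiles as medians of the lower and upper halves (hand method)."""
    data = sorted(values)
    n = len(data)
    return statistics.median(data[: n // 2]), statistics.median(data[(n + 1) // 2 :])

marks = [44, 49, 52, 62, 53, 48, 54, 49, 46, 51]   # Exercise 1 data

by_hand = quartiles_by_halves(marks)

# Linear interpolation between neighbouring sorted values (assumed Excel-like rule)
q1, q2, q3 = statistics.quantiles(marks, n=4, method="inclusive")

print(by_hand)                       # (48, 53)
print((q1, q3))                      # (48.25, 52.75)
print(by_hand[1] - by_hand[0])       # 5
print(q3 - q1)                       # 4.5
```

Even with ten values the two rules give different quartiles and a different inter-quartile range, which is why the method used should always be stated.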
determine the proportion of values that lie within a particular range of the mean value.
For such distributions, approximately 68% of values lie within one standard deviation
(1SD) of the mean value, approximately 95% lie within two standard deviations (2SD) of
the mean and approximately 99.7% lie within three standard deviations (3SD) of the
mean. Figure 3 shows this concept in diagrammatical form.
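For a normal distribution these percentages follow from the error function: the proportion of values within k standard deviations of the mean is erf(k/√2). A minimal Python sketch (the function name is our own) reproduces the figures:

```python
import math

def proportion_within(k):
    """Fraction of a normal distribution lying within k standard deviations of its mean.

    For a normal distribution, P(|X - mean| < k * SD) = erf(k / sqrt(2)).
    """
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k}SD: {proportion_within(k):.2%}")
# within 1SD: 68.27%
# within 2SD: 95.45%
# within 3SD: 99.73%
```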
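$$\sigma = \sqrt{\frac{\sum (x-\mu)^2}{N}} \qquad\text{or}\qquad \sigma = \sqrt{\frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2}$$

Where x represents each value in the population, μ is the mean value of the
population, Σ is the summation (or total), and N is the number of values in the
population.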
The standard deviation of a sample is known as S and is calculated using:
$$S = \sqrt{\frac{\sum (x-\bar{x})^2}{n-1}}$$

Where x represents each value in the sample, x̄ is the mean value of the
sample, Σ is the summation (or total), and n - 1 is the number of values in the sample
minus 1.
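The population and sample formulas can be checked with a short Python sketch. The function names are our own (hypothetical), and the data are the five kilometres-per-litre readings used in the Excel example of this guide (16.13, 16.40, 15.81, 17.07, 15.69).

```python
import math

def population_sd(values):
    """sigma: square root of the mean squared deviation from the mean (divide by N)."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)

def population_sd_shortcut(values):
    """sigma via the equivalent form sqrt(sum(x^2)/N - (sum(x)/N)^2)."""
    n = len(values)
    return math.sqrt(sum(x * x for x in values) / n - (sum(values) / n) ** 2)

def sample_sd(values):
    """S: squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))

km_per_litre = [16.13, 16.40, 15.81, 17.07, 15.69]
print(round(population_sd(km_per_litre), 2))            # 0.49
print(round(population_sd_shortcut(km_per_litre), 2))   # 0.49
print(round(sample_sd(km_per_litre), 2))                # 0.55
```

The two population forms agree here. In floating-point arithmetic the shortcut form subtracts two large, nearly equal numbers, so the first (definitional) form is usually the safer one for a computer.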
presented as column A of the spreadsheet (figure 5). As you have only made 5 trips you
do not have any further information and you are therefore measuring the whole
population at this point in time. The command to find the population standard deviation
in Excel is =STDEVP(VALUES) and in this case the command is =STDEVP(A2:A6) which
gives an answer of 0.49. Basing your results on the population standard deviation and
assuming that your first 5 trips in your new car have been typical of your usual
journeys, you can be 99% confident that your new car will do between 14.75
(MEAN-3SD) and 17.69 (MEAN+3SD) kilometres per litre.
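A rough Python counterpart of this spreadsheet calculation is sketched below, assuming the five values in A2:A6 are the ones listed later in this section (16.13, 16.40, 15.81, 17.07, 15.69). `statistics.pstdev` divides by N, like STDEVP. The standard deviation is rounded to 0.49 before it is used, as the text does; with the unrounded value the limits come out at about 14.74 and 17.70.

```python
import statistics

km_per_litre = [16.13, 16.40, 15.81, 17.07, 15.69]   # assumed contents of A2:A6

mean = statistics.mean(km_per_litre)
sd = statistics.pstdev(km_per_litre)   # population SD, counterpart of =STDEVP(A2:A6)
sd_2dp = round(sd, 2)                  # rounded to 0.49, as in the text

low, high = mean - 3 * sd_2dp, mean + 3 * sd_2dp   # MEAN-3SD and MEAN+3SD
print(round(sd, 2), round(low, 2), round(high, 2))  # 0.49 14.75 17.69
```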
The same data can be used to demonstrate how to calculate the sample standard
deviation in Excel. In this case, imagine that the data in column A represent the
kilometres per litre found for a sample of 5 new cars tested by the manufacturer. The
sample standard deviation is calculated using =STDEV(VALUES) and in this case the
command is =STDEV(A2:A6), which produces an answer of 0.55. The sample standard
deviation will always be greater than the population standard deviation when they are
calculated for the same dataset (unless every value is identical, in which case both are
zero). This is because the sample formula divides by n - 1 rather than n, allowing for
the possibility of there being more variation in the true population than has been
measured in the sample. Based on their sample of 5 cars, and therefore using the
sample standard deviation, the manufacturer could state with 99% confidence that
similar cars will do between 14.57 (MEAN-3SD) and 17.87 (MEAN+3SD) kilometres
per litre. These examples show the quick method of
calculating standard deviations using a cell range. Each of the commands can also be
written out in a longer format with the individual kilometres/litre entered.
For example, entering =STDEV(16.13,16.40,15.81,17.07,15.69) produces an identical
result to =STDEV(A2:A6). However, if one of the values in column A was found to be
incorrect and adjusted, the cell range method would automatically update the
calculation of the standard deviation whereas the longer format will require manual
adjustment of the command.
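The sample calculation can be sketched the same way with `statistics.stdev`, which divides by n - 1 like STDEV. The sketch also checks the relationship behind "always greater": S = σ·√(n/(n - 1)), so for any data with some variation S exceeds σ. The data are again assumed to be the five values in A2:A6.

```python
import math
import statistics

km_per_litre = [16.13, 16.40, 15.81, 17.07, 15.69]   # assumed contents of A2:A6
n = len(km_per_litre)

s = statistics.stdev(km_per_litre)        # sample SD, counterpart of =STDEV(A2:A6)
sigma = statistics.pstdev(km_per_litre)   # population SD, counterpart of =STDEVP(A2:A6)

# Dividing by n - 1 instead of n scales the SD by sqrt(n / (n - 1)), about 1.118 for n = 5
print(math.isclose(s, sigma * math.sqrt(n / (n - 1))))   # True

mean = statistics.mean(km_per_litre)
s_2dp = round(s, 2)                       # 0.55, as in the text
print(round(mean - 3 * s_2dp, 2), round(mean + 3 * s_2dp, 2))   # 14.57 17.87
```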
VARIANCE
The variance and the closely-related standard deviation are measures of
how spread out a distribution is. In other words, they are measures of variability.
The variance is computed as the average squared deviation of each number from
its mean. For example, for the numbers 1, 2, and 3, the mean is 2 and the variance is:
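$$\sigma^2 = \frac{(1-2)^2 + (2-2)^2 + (3-2)^2}{3} = \frac{2}{3} \approx 0.667$$

In general, the variance of a population is calculated using:

$$\sigma^2 = \frac{\sum (x-\mu)^2}{N} \qquad\text{or}\qquad \sigma^2 = \frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2$$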
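The two forms of the variance formula give the same answer, which can be confirmed with exact fractions in Python. This sketch assumes whole-number data (as in the 1, 2, 3 example), and the function names are our own.

```python
import statistics
from fractions import Fraction

def variance_definitional(values):
    """Average squared deviation from the mean, computed exactly (integer data assumed)."""
    n = len(values)
    mu = Fraction(sum(values), n)
    return sum((x - mu) ** 2 for x in values) / n

def variance_computational(values):
    """Mean of the squares minus the square of the mean, computed exactly."""
    n = len(values)
    return Fraction(sum(x * x for x in values), n) - Fraction(sum(values), n) ** 2

data = [1, 2, 3]
print(variance_definitional(data))    # 2/3
print(variance_computational(data))   # 2/3
print(statistics.pvariance(data))     # the same value as a float, about 0.667
```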
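For data grouped into classes, the standard deviation is calculated using:

$$\sigma = \sqrt{\frac{\sum f(M-\mu)^2}{N}} \qquad\text{or}\qquad \sigma = \sqrt{\frac{\sum fM^2}{N} - \left(\frac{\sum fM}{N}\right)^2}$$

Where σ is the standard deviation
M is the class mark
μ is the mean
f is the frequency
N is the total frequency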
Example:
Below are the scores of 40 BS Architecture students in Building Technology.
Compute the variance and standard deviation.
Class Interval   Frequency (f)   Class Mark (M)   M - μ     (M - μ)²   f(M - μ)²
45 - 49          2               47               -16.875   284.766    569.5313
50 - 54          5               52               -11.875   141.016    705.0781
55 - 59          6               57               -6.875    47.2656    283.5938
60 - 64          10              62               -1.875    3.51563    35.15625
65 - 69          4               67               3.125     9.76563    39.0625
70 - 74          6               72               8.125     66.0156    396.0938
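The grouped-data calculation can be sketched in Python as below. ASSUMPTION, please read: the table above lists only 33 of the 40 students; the remaining rows are missing from this copy. If those rows follow the 70 - 74 class, the printed deviation of -16.875 for M = 47 implies a mean of 63.875, so ΣfM = 2555, and the 7 missing students must then all fall in a single 75 - 79 class (M = 77). That last row is our reconstruction, not printed data, and the function name is our own.

```python
import math

# (lower limit, upper limit, frequency) for each class interval.
# The first six rows are the ones printed in the table above.
# ASSUMPTION: the last row (75 - 79, frequency 7) is reconstructed, not printed.
classes = [
    (45, 49, 2), (50, 54, 5), (55, 59, 6),
    (60, 64, 10), (65, 69, 4), (70, 74, 6),
    (75, 79, 7),
]

def grouped_variance_and_sd(classes):
    """Population variance and SD of grouped data, using class marks in place of raw scores."""
    n = sum(f for _, _, f in classes)                      # N, the total frequency
    marks = [((lo + hi) / 2, f) for lo, hi, f in classes]  # class mark M = interval midpoint
    mean = sum(f * m for m, f in marks) / n                # mu = sum(fM) / N
    variance = sum(f * (m - mean) ** 2 for m, f in marks) / n
    return n, mean, variance, math.sqrt(variance)

n, mean, variance, sd = grouped_variance_and_sd(classes)
print(n, mean, variance, round(sd, 2))   # 40 63.875 80.859375 8.99
```

Under this assumption, the second (computational) form gives the same variance: 166435/40 - 63.875² = 4160.875 - 4080.015625 = 80.859375.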
Exercises:
Find the variance and standard deviation of each of the following sets of data.
1. 44, 49, 52, 62, 53, 48, 54, 49, 46, 51
2. 12, 13, 10, 14, 14, 15, 17, 17, 10, 12, 11
3. 5.5, 4.3, 3.4, 5.6, 5.4, 7.8
4. 65, 75, 73, 50, 60, 64, 69, 62, 67, 85
5. 85, 79, 57, 39, 45, 71, 67, 87, 91, 49
6. 43, 51, 53, 110, 50, 48, 87, 69, 68, 91
7. Class      Frequency
   27 - 32    1
   33 - 38    0
   39 - 44    6
   45 - 49    4
   50 - 55    2
8. Class      Frequency
   5 - 9      1
   9 - 13     2
   13 - 17    5
   17 - 20    6
   20 - 24    3
9. Class      Frequency
   9 - 13     1
   14 - 19    6
   20 - 25    2
   26 - 28    5
   29 - 32    9