Quantitative Methods -1
Jighyasu Gaur
Statistics
•The term statistics can refer to
numerical facts such as averages,
medians, percent, and index numbers
that help us understand a variety of
business and economic situations.
•Statistics can also refer to the art and
science of collecting, analyzing,
presenting, and interpreting data.
•Data
– Data are facts and figures collected, analyzed,
and summarized for presentation and
interpretation.
•Data Set
– All the data collected in a particular study are
referred to as the data set for the study.
What is Categorical data?
Summarizing Categorical Data
Frequency Distribution
Relative Frequency Distribution
Percent Frequency Distribution
Bar Chart
Pie Chart
Frequency Distribution
A frequency distribution is a tabular summary of
data showing the frequency (or number) of items
in each of several non-overlapping classes.
The objective is to provide insights about the data
that cannot be quickly obtained by looking only at
the original data.
Frequency Distribution
Example: Marada Inn
Guests staying at Marada Inn were asked to rate the
quality of their accommodations as being excellent,
above average, average, below average, or poor. The
ratings provided by a sample of 20 guests are:
Below Average, Average, Above Average,
Above Average, Above Average, Above Average,
Above Average, Below Average, Below Average,
Average, Poor, Poor,
Above Average, Excellent, Above Average,
Average, Above Average, Average,
Above Average, Average
Frequency Distribution
Example: Marada Inn
Rating Frequency
Poor 2
Below Average 3
Average 5
Above Average 9
Excellent 1
Total 20
Relative Frequency Distribution
The relative frequency of a class is the fraction or
proportion of the total number of data items
belonging to the class.
A relative frequency distribution is a tabular
summary of a set of data showing the relative
frequency for each class.
Percent Frequency Distribution
The percent frequency of a class is the relative
frequency multiplied by 100.
A percent frequency distribution is a tabular
summary of a set of data showing the percent
frequency for each class.
Relative Frequency and
Percent Frequency Distributions
Example: Marada Inn
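Rating          Relative Frequency   Percent Frequency
Poor            .10                  10
Below Average   .15                  15
Average         .25                  25
Above Average   .45                  45
Excellent       .05                  5
Total           1.00                 100
Relative frequency = frequency/n; for example, Excellent: 1/20 = .05.
Percent frequency = relative frequency × 100; for example, Poor: .10(100) = 10.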
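All three Marada Inn tables can be reproduced with a short Python sketch. It assumes the 20 ratings are typed in as strings exactly as listed in the example; the helper name frequency_tables is hypothetical and not part of any library.

```python
from collections import Counter

# Classes in the order used by the tables above
CLASSES = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]

# The 20 guest ratings, transcribed from the Marada Inn example
ratings = [
    "Below Average", "Average", "Above Average",
    "Above Average", "Above Average", "Above Average",
    "Above Average", "Below Average", "Below Average",
    "Average", "Poor", "Poor",
    "Above Average", "Excellent", "Above Average",
    "Average", "Above Average", "Average",
    "Above Average", "Average",
]

def frequency_tables(data, classes):
    """Return {class: (frequency, relative frequency, percent frequency)}."""
    counts = Counter(data)
    n = len(data)
    table = {}
    for c in classes:
        f = counts[c]                     # Counter gives 0 for an absent class
        rel = f / n                       # relative frequency = frequency / n
        table[c] = (f, rel, rel * 100)    # percent frequency = relative * 100
    return table

for c, (f, rel, pct) in frequency_tables(ratings, CLASSES).items():
    print(f"{c:<14} {f:>3} {rel:>5.2f} {pct:>5.0f}")
```

Listing the classes explicitly keeps the natural ordinal order (Poor to Excellent) instead of the order in which ratings happen to appear.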
Bar Chart
A bar chart is a graphical device for depicting
qualitative data.
On one axis (usually the horizontal axis), we specify
the labels that are used for each of the classes.
A frequency, relative frequency, or percent frequency
scale can be used for the other axis (usually the
vertical axis).
Using a bar of fixed width drawn above each class
label, we extend the bar's height to the frequency,
relative frequency, or percent frequency of that class.
The bars are separated to emphasize the fact that each
class is a separate category.
Bar Chart
[Figure: Marada Inn Quality Ratings bar chart; horizontal axis: Rating (Poor, Below Average, Average, Above Average, Excellent); vertical axis: Frequency (0 to 10)]
Pareto Diagram
In quality control, bar charts are used to identify the
most important causes of problems.
When the bars are arranged in descending order of
height from left to right (with the most frequently
occurring cause appearing first) the bar chart is
called a Pareto diagram.
This diagram is named for Vilfredo Pareto, an
Italian economist.
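A Pareto diagram only reorders the bars, so the underlying computation is a sort of categories by frequency. The Python sketch below relies on collections.Counter.most_common, which returns counts from largest to smallest; the helper name pareto_order and the defect categories and counts are hypothetical, chosen only for illustration.

```python
from collections import Counter

def pareto_order(observations):
    """Return (category, frequency) pairs from most to least frequent,
    i.e. the left-to-right bar order of a Pareto diagram."""
    # most_common() with no argument returns every category, largest count first
    return Counter(observations).most_common()

# Hypothetical defect log; categories and counts are illustrative only
defects = ["scratch"] * 12 + ["dent"] * 5 + ["wrong part"] * 20 + ["missing bolt"] * 3

for category, freq in pareto_order(defects):
    print(f"{category:<13} {'#' * freq}")   # crude text bar for each cause
```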
Pie Chart
The pie chart is a commonly used graphical device
for presenting relative frequency and percent
frequency distributions for categorical data.
First draw a circle; then use the relative frequencies
to subdivide the circle into sectors that correspond to
the relative frequency for each class.
Since there are 360 degrees in a circle, a class with a
relative frequency of .25 would consume .25(360) = 90
degrees of the circle.
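The degree calculation extends to every class in the same way. A minimal Python sketch, assuming the Marada Inn relative frequencies are supplied as a dictionary; pie_angles is a hypothetical helper name.

```python
def pie_angles(rel_freqs):
    """Convert relative frequencies (summing to 1) into sector angles in degrees."""
    return {label: rf * 360 for label, rf in rel_freqs.items()}

# Relative frequencies from the Marada Inn table
marada = {"Poor": 0.10, "Below Average": 0.15, "Average": 0.25,
          "Above Average": 0.45, "Excellent": 0.05}

for label, deg in pie_angles(marada).items():
    print(f"{label:<14} {deg:6.1f} degrees")
```

Because the relative frequencies sum to 1.00, the sector angles sum to 360 degrees.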
Pie Chart
[Figure: Marada Inn Quality Ratings pie chart; Poor 10%, Below Average 15%, Average 25%, Above Average 45%, Excellent 5%]
Example: Marada Inn
Insights Gained from the Preceding Pie Chart
• One-half of the customers surveyed gave Marada
a quality rating of “above average” or “excellent”
(45% + 5% = 50%). This might please the manager.
• For each customer who gave an “excellent” rating,
there were two customers who gave a “poor”
rating (5% versus 10%). This should displease
the manager.
Summarizing Quantitative Data
Frequency Distribution
Relative Frequency and
Percent Frequency Distributions
Dot Plot
Histogram
Cumulative Distributions
Ogive
Frequency Distribution
Example: Hudson Auto Repair
The manager of Hudson Auto would like to gain a
better understanding of the cost of parts used in the
engine tune-ups performed in the shop. She examines
50 customer invoices for tune-ups. The costs of parts,
rounded to the nearest dollar, are listed on the next
slide.
Frequency Distribution
Example: Hudson Auto Repair
Sample of Parts Cost($) for 50 Tune-ups
91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73
Frequency Distribution
The three steps necessary to define the classes for a
frequency distribution with quantitative data are:
1. Determine the number of non-overlapping classes.
2. Determine the width of each class.
3. Determine the class limits.
Frequency Distribution
Guidelines for Determining the Number of Classes
• Use between 5 and 20 classes.
• Data sets with a larger number of elements
usually require a larger number of classes.
• Smaller data sets usually require fewer classes.
The goal is to use enough classes to show the
variation in the data, but not so many classes
that some contain only a few data items.
Frequency Distribution
Guidelines for Determining the Width of Each Class
• Use classes of equal width.
• Approximate Class Width = $\frac{\text{Largest Data Value} - \text{Smallest Data Value}}{\text{Number of Classes}}$
Making the classes the same
width reduces the chance of
inappropriate interpretations.
Frequency Distribution
Note on Number of Classes and Class Width
• In practice, the number of classes and the
appropriate class width are determined by trial
and error.
• Once a possible number of classes is chosen, the
appropriate class width is found.
• The process can be repeated for a different
number of classes.
• Ultimately, the analyst uses judgment to
determine the combination of the number of
classes and class width that provides the best
frequency distribution for summarizing the data.
Frequency Distribution
Guidelines for Determining the Class Limits
• Class limits must be chosen so that each data
item belongs to one and only one class.
• The lower class limit identifies the smallest
possible data value assigned to the class.
• The upper class limit identifies the largest
possible data value assigned to the class.
• The appropriate values for the class limits
depend on the level of accuracy of the data.
An open-end class requires only a
lower class limit or an upper class limit.
Frequency Distribution
Example: Hudson Auto Repair
If we choose six classes:
Approximate Class Width = (109 - 52)/6 = 9.5, rounded up to 10
Parts Cost ($) Frequency
50-59 2
60-69 13
70-79 16
80-89 7
90-99 7
100-109 5
Total 50
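The class-width rule and the tally can be checked in Python. This sketch assumes whole-dollar costs, so adjacent limits such as 59 and 60 leave no gaps, and it takes the first lower limit of $50 as the analyst's choice, as in the table above; class_frequencies is a hypothetical helper.

```python
import math

# Parts costs ($) for the 50 Hudson Auto tune-ups, transcribed from the sample
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

def class_frequencies(data, first_lower, width, num_classes):
    """Return [((lower, upper), frequency), ...] for equal-width classes.
    Assumes whole-number data, so each value falls in exactly one class."""
    table = []
    for k in range(num_classes):
        lower = first_lower + k * width
        upper = lower + width - 1              # e.g. 50-59 when lower = 50, width = 10
        count = sum(1 for x in data if lower <= x <= upper)
        table.append(((lower, upper), count))
    return table

num_classes = 6
approx_width = (max(costs) - min(costs)) / num_classes   # (109 - 52)/6 = 9.5
width = math.ceil(approx_width)                          # rounded up to 10
table = class_frequencies(costs, 50, width, num_classes) # first lower limit chosen as $50

n = len(costs)
for (lower, upper), f in table:
    print(f"{lower}-{upper}: frequency {f:2d}, relative {f / n:.2f}, percent {100 * f / n:.0f}")
```

The printed relative and percent columns match the Hudson Auto relative and percent frequency distributions.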
Relative Frequency and
Percent Frequency Distributions
Example: Hudson Auto Repair
Parts Cost ($)   Relative Frequency   Percent Frequency
50-59            .04                  4
60-69            .26                  26
70-79            .32                  32
80-89            .14                  14
90-99            .14                  14
100-109          .10                  10
Total            1.00                 100
Relative frequency = frequency/n; for example, 2/50 = .04 for the 50-59 class.
Percent frequency is the relative frequency multiplied by 100; for example, .04(100) = 4.
Relative Frequency and
Percent Frequency Distributions
Example: Hudson Auto Repair
Insights Gained from the % Frequency Distribution:
• Only 4% of the parts costs are in the $50-59 class.
• 30% of the parts costs are under $70.
• The greatest percentage (32% or almost one-third)
of the parts costs are in the $70-79 class.
• 10% of the parts costs are $100 or more.
Histogram
Another common graphical presentation of
quantitative data is a histogram.
The variable of interest is placed on the horizontal
axis.
A rectangle is drawn above each class interval with
its height corresponding to the interval’s frequency,
relative frequency, or percent frequency.
Unlike a bar graph, a histogram has no natural
separation between rectangles of adjacent classes.
Histogram
Example: Hudson Auto Repair
[Figure: Tune-up Parts Cost histogram; horizontal axis: Parts Cost ($), classes 50-59, 60-69, 70-79, 80-89, 90-99, 100-109; vertical axis: Frequency (0 to 18)]
Histograms Showing Skewness
Symmetric
• Left tail is the mirror image of the right tail
• Examples: heights and weights of people
[Figure: symmetric relative frequency histogram; vertical axis: Relative Frequency (0 to .35)]
Histograms Showing Skewness
Moderately Skewed Left
• A longer tail to the left
• Example: exam scores
[Figure: moderately left-skewed relative frequency histogram; vertical axis: Relative Frequency (0 to .35)]
Histograms Showing Skewness
Moderately Skewed Right
• A longer tail to the right
• Example: housing values
[Figure: moderately right-skewed relative frequency histogram; vertical axis: Relative Frequency (0 to .35)]
Histograms Showing Skewness
Highly Skewed Right
• A very long tail to the right
• Example: executive salaries
[Figure: highly right-skewed relative frequency histogram; vertical axis: Relative Frequency (0 to .35)]
Cumulative Distributions
Cumulative frequency distribution – shows the
number of items with values less than or equal to the
upper limit of each class.
Cumulative relative frequency distribution – shows
the proportion of items with values less than or
equal to the upper limit of each class.
Cumulative percent frequency distribution – shows
the percentage of items with values less than or
equal to the upper limit of each class.
Cumulative Distributions
The last entry in a cumulative frequency distribution
always equals the total number of observations.
The last entry in a cumulative relative frequency
distribution always equals 1.00.
The last entry in a cumulative percent frequency
distribution always equals 100.
Cumulative Distributions
Hudson Auto Repair
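Cost ($)   Cumulative Frequency   Cumulative Relative Frequency   Cumulative Percent Frequency
≤ 59       2                      .04                             4
≤ 69       15                     .30                             30
≤ 79       31                     .62                             62
≤ 89       38                     .76                             76
≤ 99       45                     .90                             90
≤ 109      50                     1.00                            100
For example, for ≤ 69: cumulative frequency = 2 + 13 = 15; cumulative relative frequency = 15/50 = .30; cumulative percent frequency = .30(100) = 30.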
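Each cumulative column is a running total of the corresponding column in the ordinary distribution. A short Python sketch using itertools.accumulate, with the Hudson Auto class upper limits and frequencies entered by hand:

```python
from itertools import accumulate

# Class upper limits and frequencies from the Hudson Auto Repair distribution
upper_limits = [59, 69, 79, 89, 99, 109]
freqs = [2, 13, 16, 7, 7, 5]

n = sum(freqs)
cum_freq = list(accumulate(freqs))     # running totals: 2, 15, 31, ...
cum_rel = [c / n for c in cum_freq]    # proportion of items <= each upper limit
cum_pct = [100 * r for r in cum_rel]   # the same proportions as percentages

for ul, cf, cr, cp in zip(upper_limits, cum_freq, cum_rel, cum_pct):
    print(f"<= {ul:3d}  {cf:2d}  {cr:.2f}  {cp:5.1f}")
```

The last entries come out as n, 1.00 and 100, as the cumulative distribution rules require.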
Descriptive Statistics
Descriptive Statistics
Most of the statistical information in newspapers,
magazines, company reports, and other
publications consists of data that are summarized
and presented in a form that is easy to understand.
Such summaries of data, which may be tabular,
graphical, or numerical, are referred to as descriptive
statistics.
Descriptive Statistics: Numerical Measures
Measures of Location
Measures of Variability
Measures of Location
Mean
Median
Mode
Percentiles
Quartiles
If the measures are computed for data from a sample,
they are called sample statistics.
If the measures are computed for data from a population,
they are called population parameters.
A sample statistic is referred to as the point estimator
of the corresponding population parameter.
Mean
Perhaps the most important measure of location is
the mean.
The mean provides a measure of central location.
The mean of a data set is the average of all the data
values.
The sample mean $\bar{x}$ is the point estimator of the
population mean $\mu$.
Sample Mean
$\bar{x} = \frac{\sum x_i}{n}$
where $\sum x_i$ is the sum of the values of the n observations
and n is the number of observations in the sample.
Population Mean μ
$\mu = \frac{\sum x_i}{N}$
where $\sum x_i$ is the sum of the values of the N observations
and N is the number of observations in the population.
Sample Mean
Example: Apartment Rents
Seventy efficiency apartments were randomly
sampled in a small college town. The monthly rent
prices for these apartments are listed below.
445 615 430 590 435 600 460 600 440 615
440 440 440 525 425 445 575 445 450 450
465 450 525 450 450 460 435 460 465 480
450 470 490 472 475 475 500 480 570 465
600 485 580 470 490 500 549 500 500 480
570 515 450 445 525 535 475 550 480 510
510 575 490 435 600 435 445 435 430 440
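The sample mean of these rents can be verified in Python. The sketch assumes the 70 values are transcribed as listed; sample_mean is a hypothetical helper that applies the sample mean formula directly.

```python
# Monthly rents ($) for the 70 sampled efficiency apartments
rents = [
    445, 615, 430, 590, 435, 600, 460, 600, 440, 615,
    440, 440, 440, 525, 425, 445, 575, 445, 450, 450,
    465, 450, 525, 450, 450, 460, 435, 460, 465, 480,
    450, 470, 490, 472, 475, 475, 500, 480, 570, 465,
    600, 485, 580, 470, 490, 500, 549, 500, 500, 480,
    570, 515, 450, 445, 525, 535, 475, 550, 480, 510,
    510, 575, 490, 435, 600, 435, 445, 435, 430, 440,
]

def sample_mean(values):
    """x-bar = (sum of the n observations) / n."""
    return sum(values) / len(values)

print(len(rents), sum(rents), round(sample_mean(rents), 2))  # 70 34356 490.8
```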
Sample Mean
Example: Apartment Rents
$\bar{x} = \frac{\sum x_i}{n} = \frac{34{,}356}{70} = 490.80$
Median
The median of a data set is the value in the middle
when the data items are arranged in ascending order.
Whenever a data set has extreme values, the median
is the preferred measure of central location.
The median is the measure of location most often
reported for annual income and property value data.
A few extremely large incomes or property values
can inflate the mean.
Median
For an odd number of observations:
26 18 27 12 14 27 19 (7 observations)
12 14 18 19 26 27 27 (in ascending order)
the median is the middle value.
Median = 19
Median
For an even number of observations:
26 18 27 12 14 27 30 19 (8 observations)
12 14 18 19 26 27 27 30 (in ascending order)
the median is the average of the middle two values.
Median = (19 + 26)/2 = 22.5
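Both cases can be handled by one function. A minimal Python sketch; the name median is our own choice, and the standard library's statistics.median applies the same odd/even rule.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    s = sorted(values)                 # arrange in ascending order
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                  # odd n: the single middle value
    return (s[mid - 1] + s[mid]) / 2   # even n: average of the middle two

print(median([26, 18, 27, 12, 14, 27, 19]))      # 19
print(median([26, 18, 27, 12, 14, 27, 30, 19]))  # 22.5
```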
Median
Example: Apartment Rents
Averaging the 35th and 36th data values:
Median = (475 + 475)/2 = 475
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Note: The data are in ascending order.
Trimmed Mean
Another measure, sometimes used when extreme
values are present, is the trimmed mean.
It is obtained by deleting a percentage of the
smallest and largest values from a data set and then
computing the mean of the remaining values.
For example, the 5% trimmed mean is obtained by
removing the smallest 5% and the largest 5% of the
data values and then computing the mean of the
remaining values.
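A trimmed mean is a sort, a slice from each end, and an ordinary mean. The sketch below rests on an assumption: the description above does not say what to do when the percentage gives a fractional number of values, so this version rounds the count removed from each end down. The helper name trimmed_mean and the data set are hypothetical.

```python
import math

def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the values from EACH end of the sorted data.
    Assumption: a fractional count to remove is rounded down."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    s = sorted(values)
    k = math.floor(proportion * len(s))   # number of values removed from each end
    kept = s[k:len(s) - k]
    return sum(kept) / len(kept)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]   # hypothetical data with one extreme value
print(sum(data) / len(data))     # ordinary mean: 14.5
print(trimmed_mean(data, 0.10))  # 10% trimmed mean: 5.5
```

The single extreme value pulls the ordinary mean far above most of the data, while the trimmed mean stays near the middle.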
Mode
The mode of a data set is the value that occurs with
greatest frequency.
The greatest frequency can occur at two or more
different values.
If the data have exactly two modes, the data are
bimodal.
If the data have more than two modes, the data are
multimodal.
Caution: If the data are bimodal or multimodal,
Excel’s MODE function will incorrectly identify a
single mode.
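One way to avoid reporting a single mode for bimodal or multimodal data is to return every value tied for the greatest frequency. A Python sketch with hypothetical helper names (modes, modality); the standard library's statistics.multimode returns the same values, in first-seen order rather than sorted.

```python
from collections import Counter

def modes(values):
    """Return all values that occur with the greatest frequency, in ascending order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

def modality(values):
    """Label a data set by how many modes it has."""
    k = len(modes(values))
    if k == 1:
        return "unimodal"
    return "bimodal" if k == 2 else "multimodal"

print(modes([1, 2, 2, 3]), modality([1, 2, 2, 3]))                   # [2] unimodal
print(modes([1, 1, 2, 3, 3]), modality([1, 1, 2, 3, 3]))             # [1, 3] bimodal
print(modes([1, 1, 2, 2, 3, 3, 4]), modality([1, 1, 2, 2, 3, 3, 4])) # [1, 2, 3] multimodal
```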
Mode
Example: Apartment Rents
450 occurred most frequently (7 times)
Mode = 450
Descriptive Statistics: Numerical Measures for
Grouped Data
Measures of Location: Mean
Class 0-1 1-2 2-3 3-4 4-5 5-6
Frequency 1 4 8 6 3 1
Descriptive Statistics: Numerical Measures for
Grouped Data
Measures of Location: Mean
Class   Midpoint (x)   Frequency (f)   fx
0-1     0.5            1               0.5
1-2     1.5            4               6
2-3     2.5            8               20
3-4     3.5            6               21
4-5     4.5            3               13.5
5-6     5.5            1               5.5
Total                  23              66.5
Mean = Σfx/Σf = 66.5/23 ≈ 2.89
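The grouped-data calculation multiplies each class midpoint by its frequency and divides the total by the number of observations. A Python sketch, assuming classes are given as (lower, upper) pairs; grouped_mean is a hypothetical name.

```python
def grouped_mean(classes, freqs):
    """Approximate mean of grouped data: sum(f * midpoint) / sum(f)."""
    midpoints = [(lo + hi) / 2 for lo, hi in classes]        # x = class midpoint
    total_f = sum(freqs)                                     # number of observations
    total_fx = sum(f * x for f, x in zip(freqs, midpoints))  # sum of f * x
    return total_fx / total_f

classes = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
freqs = [1, 4, 8, 6, 3, 1]
print(round(grouped_mean(classes, freqs), 2))   # 66.5 / 23, about 2.89
```

Because every observation in a class is represented by the class midpoint, the result approximates the mean of the underlying raw data.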
Descriptive Statistics: Numerical Measures for
Grouped Data
Measures of Location: Mean
Find the mean for the following data set:
Class 5-10 10-15 15-20 20-25 25-30
Frequency 10 12 16 14 8
Descriptive Statistics: Numerical Measures for
Grouped Data
Measures of Location: Mean
Class   Midpoint (x)   Frequency (f)   fx
5-10    7.5            10              75
10-15   12.5           12              150
15-20   17.5           16              280
20-25   22.5           14              315
25-30   27.5           8               220
Total                  60              1040
Mean = Σfx/Σf = 1040/60 ≈ 17.33
Percentiles
A percentile provides information about how the
data are spread over the interval from the smallest
value to the largest value.
Admission test scores for colleges and universities
are frequently reported in terms of percentiles.
The pth percentile of a data set is a value such that at
least p percent of the items take on this value or less
and at least (100 - p) percent of the items take on this
value or more.
Percentiles
Arrange the data in ascending order.
Compute index i, the position of the pth percentile.
i = (p/100)n
If i is not an integer, round up. The pth percentile
is the value in the ith position.
If i is an integer, the pth percentile is the average
of the values in positions i and i + 1.
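The index rule can be written as a small Python function. This is a sketch of the convention described above, not a general-purpose percentile routine: library functions such as NumPy's percentile use other conventions by default and can return different values. It assumes 0 < p < 100 and reuses the 70 apartment rents; the name percentile is our own.

```python
import math

# Monthly rents ($) for the 70 sampled efficiency apartments
rents = [
    445, 615, 430, 590, 435, 600, 460, 600, 440, 615,
    440, 440, 440, 525, 425, 445, 575, 445, 450, 450,
    465, 450, 525, 450, 450, 460, 435, 460, 465, 480,
    450, 470, 490, 472, 475, 475, 500, 480, 570, 465,
    600, 485, 580, 470, 490, 500, 549, 500, 500, 480,
    570, 515, 450, 445, 525, 535, 475, 550, 480, 510,
    510, 575, 490, 435, 600, 435, 445, 435, 430, 440,
]

def percentile(values, p):
    """pth percentile by the index rule above (assumes 0 < p < 100)."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    s = sorted(values)                  # step 1: ascending order
    i = p * len(s) / 100                # step 2: i = (p/100)n
    if i != int(i):                     # not an integer: round up and
        return s[math.ceil(i) - 1]      # take the ith value (lists are 0-based)
    i = int(i)                          # integer: average positions i and i + 1
    return (s[i - 1] + s[i]) / 2

for p in (25, 50, 75, 80):
    print(p, percentile(rents, p))
```

With p = 50 the rule reduces to the median, and p = 25 and p = 75 give the first and third quartiles.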
80th Percentile
Example: Apartment Rents
i = (p/100)n = (80/100)70 = 56
Averaging the 56th and 57th data values:
80th Percentile = (535 + 549)/2 = 542
80th Percentile
Example: Apartment Rents
“At least 80% of the items take on a value of 542 or less”: 56/70 = .8 or 80%
“At least 20% of the items take on a value of 542 or more”: 14/70 = .2 or 20%
Quartiles
Quartiles are specific percentiles.
First Quartile = 25th Percentile
Second Quartile = 50th Percentile = Median
Third Quartile = 75th Percentile
Third Quartile
Example: Apartment Rents
Third quartile = 75th percentile
i = (p/100)n = (75/100)70 = 52.5, which rounds up to 53
Third quartile = 525
Measures of Variability
It is often desirable to consider measures of variability
(dispersion), as well as measures of location.
For example, in choosing supplier A or supplier B we
might consider not only the average delivery time for
each, but also the variability in delivery time for each.
Measures of Variability
Range
Interquartile Range
Variance
Standard Deviation
Coefficient of Variation
Range
The range of a data set is the difference between the
largest and smallest data values.
It is the simplest measure of variability.
It is very sensitive to the smallest and largest data
values.
Range
Example: Apartment Rents
Range = largest value - smallest value
Range = 615 - 425 = 190
Interquartile Range
The interquartile range of a data set is the difference
between the third quartile and the first quartile.
It is the range for the middle 50% of the data.
It overcomes the sensitivity to extreme data values.
Interquartile Range
Example: Apartment Rents
3rd Quartile (Q3) = 525
1st Quartile (Q1) = 445
Interquartile Range = Q3 - Q1 = 525 - 445 = 80
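Range and interquartile range for the rents can be computed in Python as a sketch. The percentile helper repeats the index rule for percentiles described earlier so that the block runs on its own; its name is hypothetical.

```python
import math

# Monthly rents ($) for the 70 sampled efficiency apartments
rents = [
    445, 615, 430, 590, 435, 600, 460, 600, 440, 615,
    440, 440, 440, 525, 425, 445, 575, 445, 450, 450,
    465, 450, 525, 450, 450, 460, 435, 460, 465, 480,
    450, 470, 490, 472, 475, 475, 500, 480, 570, 465,
    600, 485, 580, 470, 490, 500, 549, 500, 500, 480,
    570, 515, 450, 445, 525, 535, 475, 550, 480, 510,
    510, 575, 490, 435, 600, 435, 445, 435, 430, 440,
]

def percentile(values, p):
    """Index rule: i = (p/100)n; round up if fractional, else average i and i + 1."""
    s = sorted(values)
    i = p * len(s) / 100
    if i != int(i):
        return s[math.ceil(i) - 1]
    i = int(i)
    return (s[i - 1] + s[i]) / 2

data_range = max(rents) - min(rents)                # largest - smallest
q1, q3 = percentile(rents, 25), percentile(rents, 75)
iqr = q3 - q1                                       # spread of the middle 50%
print(data_range, q1, q3, iqr)   # 190 445 525 80
```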
Variance
The variance is a measure of variability that utilizes
all the data.
It is based on the difference between the value of
each observation ($x_i$) and the mean ($\bar{x}$ for a sample,
$\mu$ for a population).
The variance is useful in comparing the variability
of two or more variables.
Variance
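The variance is based on the squared differences
between each data value and the mean. For a population
it is the average of these squared differences; for a
sample their sum is divided by n − 1 rather than n.
The variance is computed as follows:
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$ for a sample
$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$ for a population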
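The two formulas differ only in the mean used and the divisor. A minimal Python sketch on a small hypothetical data set; the helper names are our own, and the standard library's statistics.variance and statistics.pvariance compute the sample and population versions respectively.

```python
def sample_variance(values):
    """s^2 = sum of (x_i - x_bar)^2 divided by n - 1 (needs n >= 2)."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def population_variance(values):
    """sigma^2 = sum of (x_i - mu)^2 divided by N."""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N

data = [46, 54, 42, 46, 32]      # hypothetical values with mean 44
print(sample_variance(data))      # 256 / 4 = 64.0
print(population_variance(data))  # 256 / 5 = 51.2
```

Dividing by n − 1 makes the sample variance slightly larger than the population formula applied to the same numbers.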
Standard Deviation
The standard deviation of a data set is the positive
square root of the variance.
It is measured in the same units as the data, making
it more easily interpreted than the variance.
Standard Deviation
The standard deviation is computed as follows:
$s = \sqrt{s^2}$ for a sample
$\sigma = \sqrt{\sigma^2}$ for a population
Coefficient of Variation
The coefficient of variation indicates how large the
standard deviation is in relation to the mean.
The coefficient of variation is computed as follows:
$\left(\frac{s}{\bar{x}} \times 100\right)\%$ for a sample
$\left(\frac{\sigma}{\mu} \times 100\right)\%$ for a population
Sample Variance, Standard Deviation,
And Coefficient of Variation
Example: Apartment Rents
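• Variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 2{,}996.16$
• Standard Deviation: $s = \sqrt{2{,}996.16} = 54.74$
• Coefficient of Variation: $\left(\frac{54.74}{490.80} \times 100\right)\% = 11.15\%$; the standard deviation is about 11% of the mean.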
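These figures can be reproduced with Python's statistics module, assuming the 70 rents are transcribed as listed in the example. statistics.variance uses the n − 1 divisor, matching the sample formula; statistics.stdev would give the same standard deviation directly.

```python
import math
import statistics

# Monthly rents ($) for the 70 sampled efficiency apartments
rents = [
    445, 615, 430, 590, 435, 600, 460, 600, 440, 615,
    440, 440, 440, 525, 425, 445, 575, 445, 450, 450,
    465, 450, 525, 450, 450, 460, 435, 460, 465, 480,
    450, 470, 490, 472, 475, 475, 500, 480, 570, 465,
    600, 485, 580, 470, 490, 500, 549, 500, 500, 480,
    570, 515, 450, 445, 525, 535, 475, 550, 480, 510,
    510, 575, 490, 435, 600, 435, 445, 435, 430, 440,
]

x_bar = statistics.mean(rents)
s2 = statistics.variance(rents)   # sample variance, divisor n - 1
s = math.sqrt(s2)                 # sample standard deviation
cv = s / x_bar * 100              # coefficient of variation, in percent

print(f"mean={x_bar:.2f}  s^2={s2:.2f}  s={s:.2f}  CV={cv:.2f}%")
# mean=490.80  s^2=2996.16  s=54.74  CV=11.15%
```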