Course: Basic Statistics
Faculty: Dr. Avanish Singh Chauhan
ACADEMIC YEAR: 2025 - 26
Unit: II – Descriptive Statistics
PROGRAM: MBA
SEMESTER: I
Types of Statistics
• Statistics
• The branch of mathematics that transforms data into
useful information for decision makers.
Descriptive Statistics: Collecting, summarizing, and describing data
Inferential Statistics: Drawing conclusions and/or making decisions concerning a population based only on sample data
Descriptive Statistics
• Collect data
• e.g., Survey
• Present data
• e.g., Tables and graphs
• Characterize data
• e.g., Sample mean $\bar{X} = \frac{\sum X_i}{n}$
Descriptive Summary Measures
Describing Data Numerically
Central Tendency: Arithmetic Mean, Median, Mode, Geometric Mean
Quartiles
Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
Shape: Skewness
Measures of Central Tendency
Overview
Central Tendency
Arithmetic Mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Median: Midpoint of ranked values
Mode: Most frequently observed value
Geometric Mean: $\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$
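As a rough illustration of the four measures in this overview, the sketch below uses Python's standard statistics module. The data values are made up for illustration and are not from the slides.

```python
import statistics

# Hypothetical data, chosen only for illustration (not from the slides).
values = [1, 2, 2, 4, 8]

arithmetic_mean = statistics.mean(values)           # sum of values / n
median = statistics.median(values)                  # midpoint of ranked values
mode = statistics.mode(values)                      # most frequently observed value
geometric_mean = statistics.geometric_mean(values)  # (X1 * X2 * ... * Xn) ** (1/n)

print(arithmetic_mean, median, mode, geometric_mean)
```

The geometric mean is only defined here for positive values, which is why the sample data avoids zeros and negatives.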
Arithmetic Mean
• The arithmetic mean (mean) is the most common
measure of central tendency
• For a sample of size n:
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$
where n is the sample size and X_1, ..., X_n are the observed values
Arithmetic Mean
(continued)
• The most common measure of central tendency
• Mean = sum of values divided by the number of
values
• Affected by extreme values (outliers)
Data 1, 2, 3, 4, 5: Mean = (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3
Data 1, 2, 3, 4, 10: Mean = (1 + 2 + 3 + 4 + 10)/5 = 20/5 = 4
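The effect of a single extreme value on the mean can be checked with a few lines of Python. This is a minimal sketch using the two data sets from this slide; the function name arithmetic_mean is just an illustrative choice.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)

without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]   # same data, but 5 replaced by 10

mean_a = arithmetic_mean(without_outlier)  # 15 / 5
mean_b = arithmetic_mean(with_outlier)     # 20 / 5

print(mean_a, mean_b)
```

Changing one value out of five moves the mean from 3 to 4.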
Median
• In an ordered array, the median is the “middle”
number (50% above, 50% below)
[Figure: two dot plots on a 0 to 10 axis; Median = 3 in both]
• Not affected by extreme values
Finding the Median
• The location of the median:
Median position = $\frac{n + 1}{2}$ position in the ordered data
• If the number of values is odd, the median is the middle number
• If the number of values is even, the median is the average of the two
middle numbers
• Note that $\frac{n + 1}{2}$ is not the value of the median, only the position
of the median in the ranked data
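The position rule can be sketched in Python as follows. The function is an illustrative helper, not a library API; the two samples are the ordered arrays used in the quartile and standard deviation examples of this unit.

```python
def median(values):
    """Median via the (n + 1) / 2 position rule on ranked data."""
    ranked = sorted(values)
    n = len(ranked)
    position = (n + 1) / 2              # 1-based position, not the value
    if n % 2 == 1:
        # Odd n: the position is a whole number, take that value.
        return ranked[int(position) - 1]
    # Even n: the position ends in .5, average the two middle values.
    lower = ranked[n // 2 - 1]
    upper = ranked[n // 2]
    return (lower + upper) / 2

odd_sample = [11, 12, 13, 16, 16, 17, 18, 21, 22]  # n = 9, position 5
even_sample = [10, 12, 14, 15, 17, 18, 18, 24]     # n = 8, position 4.5

print(median(odd_sample), median(even_sample))
```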
Mode
• A measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• Used for either numerical or categorical
(nominal) data
• There may be no mode
• There may be several modes
[Figure: two dot plots, on a 0 to 14 axis and a 0 to 6 axis; panel labels: Mode = 9 and No Mode]
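A small sketch of how the mode cases above (one mode, several modes, no mode) might be handled in Python. The helper name modes and all data values are hypothetical, and treating "every value occurs once" as "no mode" is an assumed convention.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        # Assumed convention: if no value repeats, there is no mode.
        return []
    return [value for value, count in counts.items() if count == top]

# Hypothetical data for illustration (not from the slides).
numeric = [2, 5, 9, 9, 9, 11, 13]
categorical = ["red", "blue", "red", "green", "blue"]
no_repeats = [0, 1, 2, 3, 4, 5, 6]

print(modes(numeric), modes(categorical), modes(no_repeats))
```

The same function works for numerical and categorical data because it only counts occurrences.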
Example
• Five houses on a hill by the beach
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Example: Summary Statistics
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Sum $3,000,000
• Mean: ($3,000,000/5) = $600,000
• Median: middle value of ranked data = $300,000
• Mode: most frequent value = $100,000
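The three summary statistics for the house prices can be reproduced with Python's standard statistics module. This is a minimal sketch; only the variable names are assumptions.

```python
import statistics

house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = statistics.mean(house_prices)      # 3,000,000 / 5
median_price = statistics.median(house_prices)  # middle value of ranked data
mode_price = statistics.mode(house_prices)      # most frequent value

print(mean_price, median_price, mode_price)
```

The single $2,000,000 house pulls the mean to twice the median, which is why the median is often the better summary for data like this.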
Quartiles
• Quartiles split the ranked data into 4 segments
with an equal number of values per segment
25% | Q1 | 25% | Q2 | 25% | Q3 | 25%
◼ The first quartile, Q1, is the value for which 25% of the
observations are smaller and 75% are larger
◼ Q2 is the same as the median (50% are smaller, 50% are
larger)
◼ Only 25% of the observations are greater than the third
quartile, Q3
Quartile Formulas
Find a quartile by determining the value in the appropriate
position in the ranked data, where
First quartile position: Q1 = (n+1)/4
Second quartile position: Q2 = (n+1)/2 (the median position)
Third quartile position: Q3 = 3(n+1)/4
where n is the number of observed values
Quartiles
◼ Example: Find the first quartile
Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22
(n = 9)
Q1 is in the (9+1)/4 = 2.5 position of the ranked data,
so use the value halfway between the 2nd and 3rd values (12 and 13):
Q1 = 12.5
Q1 and Q3 are measures of noncentral location
Q2 = median, a measure of central tendency
Quartiles
◼ Example:
Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22
(n = 9)
Q1 is in the (9+1)/4 = 2.5 position of the ranked data,
so Q1 = 12.5
Q2 is in the (9+1)/2 = 5th position of the ranked data,
so Q2 = median = 16
Q3 is in the 3(9+1)/4 = 7.5 position of the ranked data,
so Q3 = 19.5
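The position formulas can be sketched in Python as below. The helper quartile is illustrative, not a library function. For positions that fall between two ranks it interpolates linearly, which matches the "halfway" rule used above for .5 positions; how other fractional positions should be handled is an assumption, since the slides only show the .5 case.

```python
def quartile(ranked, k):
    """k-th quartile (k = 1, 2, 3) of already ranked data, n >= 3 assumed."""
    n = len(ranked)
    position = k * (n + 1) / 4          # 1-based position in the ranked data
    lower = int(position)               # rank just below (or at) the position
    fraction = position - lower
    if fraction == 0:
        return ranked[lower - 1]
    # Assumption: interpolate linearly between the two neighbouring ranks.
    below, above = ranked[lower - 1], ranked[lower]
    return below + fraction * (above - below)

ranked = [11, 12, 13, 16, 16, 17, 18, 21, 22]   # n = 9

q1 = quartile(ranked, 1)   # position 2.5
q2 = quartile(ranked, 2)   # position 5
q3 = quartile(ranked, 3)   # position 7.5
print(q1, q2, q3)
```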
Measures of Variation
Variation
Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
◼ Measures of variation give
information on the spread
or variability of the data
values.
[Figure: two distributions with the same center but different variation]
Range
• Simplest measure of variation
• Difference between the largest and the smallest
values in a set of data:
Range = Xlargest – Xsmallest
Example:
(dot plot on a 0 to 14 axis; smallest value 1, largest value 14)
Range = 14 - 1 = 13
Disadvantages of the Range
• Ignores the way in which data are distributed
[Figure: two differently distributed data sets on a 7 to 12 axis; Range = 12 - 7 = 5 for both]
• Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
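The outlier sensitivity can be verified directly. This is a minimal sketch using the two data sets above; the helper name value_range is an illustrative choice.

```python
def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

base = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 5]
with_outlier = base[:-1] + [120]    # same data, last value replaced by 120

range_base = value_range(base)
range_outlier = value_range(with_outlier)
print(range_base, range_outlier)
```

Only the two extreme values enter the calculation, so one unusual observation changes the range from 4 to 119 while the other 24 values are ignored.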
Interquartile Range
• Can eliminate some outlier problems by using
the interquartile range
• Eliminate some high- and low-valued
observations and calculate the range from the
remaining values
• Interquartile range = 3rd quartile – 1st quartile
= Q3 – Q1
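As a hedged illustration, the interquartile range of the ordered array from the quartile example (11 12 13 16 16 17 18 21 22) can be computed with Python's statistics.quantiles. Its "exclusive" method places quartiles using (n + 1)-based positions, which for this data gives the same Q1 and Q3 as the position formulas in this unit.

```python
import statistics

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")

iqr = q3 - q1    # interquartile range = Q3 - Q1
print(q1, q3, iqr)
```

Because the lowest and highest quarters of the data are dropped, replacing 22 with a much larger value would leave this result unchanged.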
Standard Deviation
• Most commonly used measure of variation
• Shows variation about the mean
• Is the square root of the variance
• Has the same units as the original data
• Sample standard deviation: $S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
Example: Sample Standard Deviation
Sample Data (Xi): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{X}$ = 16
$S = \sqrt{\frac{(10 - \bar{X})^2 + (12 - \bar{X})^2 + (14 - \bar{X})^2 + \cdots + (24 - \bar{X})^2}{n - 1}}$
$= \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}}$
$= \sqrt{\frac{130}{7}} = 4.3095$
A measure of the “average” scatter around the mean
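The same calculation can be followed step by step in Python. This is a minimal sketch using the sample data above; the variable names are illustrative, and statistics.stdev from the standard library is used only as a cross-check.

```python
import math
import statistics

data = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(data)

mean = sum(data) / n                                   # 128 / 8
squared_deviations = [(x - mean) ** 2 for x in data]   # (Xi - mean)^2
sum_of_squares = sum(squared_deviations)               # numerator under the root
sample_variance = sum_of_squares / (n - 1)             # divide by n - 1, not n
s = math.sqrt(sample_variance)                         # sample standard deviation

print(mean, sum_of_squares, round(s, 4))

# Cross-check against the standard library's sample standard deviation.
assert math.isclose(s, statistics.stdev(data))
```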
Measuring Variation
Small standard deviation
Large standard deviation
Comparing Standard Deviations
Three data sets plotted on the same 11 to 21 axis, all with the same mean:
Data A: Mean = 15.5, S = 3.338
Data B: Mean = 15.5, S = 0.926
Data C: Mean = 15.5, S = 4.567
Shape of a Distribution
• Describes how data are distributed
• Measures of Shape
• Symmetric or Skewed [Skewness]
Skewness
Skewness is a measure of the asymmetry of a distribution. A
distribution is asymmetrical when its left and right side are not
mirror images. A distribution can have right (or positive), left (or
negative), or zero skewness. A right-skewed distribution is longer
on the right side of its peak, and a left-skewed distribution is
longer on the left side of its peak.
• In a distribution with right skew, the mean is typically greater
than the median.
• In a symmetric distribution (zero skew), the mean and median are equal.
• In a distribution with left skew, the mean is typically less than
the median.
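Comparing the mean with the median gives a quick, rough check of the direction of skew. The sketch below reuses the two data sets from the arithmetic mean slide; the left-tailed data set is a hypothetical mirror image added for illustration, and the helper name compare_mean_median is an assumption. This comparison is a rule of thumb, not a formal skewness coefficient.

```python
import statistics

def compare_mean_median(values):
    """Describe how the mean sits relative to the median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "mean > median (suggests right skew)"
    if mean < median:
        return "mean < median (suggests left skew)"
    return "mean = median (consistent with symmetry)"

symmetric = [1, 2, 3, 4, 5]       # from the arithmetic mean slide
right_tail = [1, 2, 3, 4, 10]     # from the arithmetic mean slide
left_tail = [-4, 2, 3, 4, 5]      # hypothetical mirror image of right_tail

for name, values in [("symmetric", symmetric),
                     ("right_tail", right_tail),
                     ("left_tail", left_tail)]:
    print(name, "->", compare_mean_median(values))
```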
Dr. Avanish Singh Chauhan [ डॉ. अवनीश सिंह चौहान ]
Associate Professor | Faculty of Management Studies
9680099891
Office Location: SB511H