Unit 1
Statistics: Statistics is a science dealing with the collection, analysis, interpretation, and
presentation of numerical data.
Descriptive Statistics: Using data gathered on a group to describe or reach conclusions
about that same group.
Inferential Statistics: Using data gathered from a sample, and the statistics generated from
it, to reach conclusions about the population from which the sample was taken.
Basic Statistical Concepts
Census: A study in which the researcher gathers data from the whole population for a
given measurement of interest.
Ungrouped data: Raw data that have not been summarized in any way.
Grouped data: Data that have been organized into a frequency distribution are called
grouped data.
Quantitative Data Graphs: Histograms, frequency polygons, ogives, dot plots, stem-and-leaf
plots, scatter plots (for two numerical variables)
Qualitative Data Graphs: Pie charts, bar graphs, Pareto charts, etc.
The following data represent the ages of patients admitted to a small hospital in
September 2023.
85 75 66 43 40
88 80 56 56 67
89 83 65 53 75
87 83 52 44 48
Construct a frequency distribution. Compute the sample mean from the frequency
distribution.
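One possible way to work this exercise is sketched below in Python. The class width of 10 and the starting point of 40 are assumptions chosen for illustration (the exercise does not fix them). The grouped mean uses class midpoints, so it only approximates the mean of the raw data.

```python
# Ages of the 20 patients from the exercise above.
ages = [85, 75, 66, 43, 40,
        88, 80, 56, 56, 67,
        89, 83, 65, 53, 75,
        87, 83, 52, 44, 48]

# Assumed class design (not given in the exercise): width 10, first class at 40.
class_width = 10
first_lower = 40
num_classes = 5  # covers 40 up to (but not including) 90

# Build the table: each class is [lower, upper) with its midpoint and frequency.
freq_table = []
for k in range(num_classes):
    lower = first_lower + k * class_width
    upper = lower + class_width
    count = sum(1 for age in ages if lower <= age < upper)
    midpoint = (lower + upper) / 2
    freq_table.append((lower, upper, midpoint, count))

# Grouped mean = sum(frequency * midpoint) / sum(frequency)
total_f = sum(count for _, _, _, count in freq_table)
grouped_mean = sum(mid * count for _, _, mid, count in freq_table) / total_f

for lower, upper, mid, count in freq_table:
    print(f"{lower}-under {upper}: midpoint={mid}, frequency={count}")
print("Grouped mean:", grouped_mean)             # 67.0
print("Raw-data mean:", sum(ages) / len(ages))   # 66.75
```

With these assumed classes the frequencies are 4, 4, 3, 2, and 7, and the grouped mean (67.0) is close to, but not equal to, the raw-data mean (66.75).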
Steps for designing frequency distribution:
1. Calculate Range
Descriptive measures:
1. Measure of Central Tendency
a. Arithmetic mean
b. Median
c. Mode
2. Measure of Dispersion
a. Absolute
b. Relative
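As a small illustration of the mean, median, and mode listed above, the sketch below applies Python's standard statistics module to the hospital ages from the earlier exercise. The variable names are illustrative only.

```python
import statistics

# Hospital ages from the exercise earlier in these notes.
ages = [85, 75, 66, 43, 40,
        88, 80, 56, 56, 67,
        89, 83, 65, 53, 75,
        87, 83, 52, 44, 48]

mean_age = statistics.mean(ages)      # arithmetic mean: sum / count
median_age = statistics.median(ages)  # middle value (average of the two middle values here)
modes = statistics.multimode(ages)    # all values tied for the highest frequency

print("Mean:", mean_age)      # 66.75
print("Median:", median_age)  # 66.5
print("Mode(s):", modes)      # 56, 75 and 83 each occur twice
```

This data set has three modes, which is why multimode is used here rather than a single-valued mode.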
Measures of Dispersion/Variability/Spread
While a measure of central tendency describes the typical value, measures of variability
define how far away the data points tend to fall from the center.
A low dispersion indicates that the data points tend to be clustered tightly around the
center. High dispersion signifies that they tend to fall further away.
Graphical Presentation of Dispersion/Variability/Spread
Why Understanding of Variability is Important
If our morning commute takes much longer than the mean travel
time, we will be late for work.
Measures of Variability
Range
Inter-quartile Range
Variance/standard deviation
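The three measures listed above can be computed as in the following sketch, which reuses the hospital ages from the earlier exercise. It assumes the data are a sample (so the variance divides by n - 1) and uses the default quartile method of Python's statistics.quantiles; other quartile conventions give slightly different inter-quartile ranges.

```python
import statistics

# Hospital ages from the exercise earlier in these notes.
ages = [85, 75, 66, 43, 40,
        88, 80, 56, 56, 67,
        89, 83, 65, 53, 75,
        87, 83, 52, 44, 48]

# Range: largest value minus smallest value.
data_range = max(ages) - min(ages)

# Inter-quartile range: Q3 - Q1 (quartile convention is an assumption, see above).
q1, q2, q3 = statistics.quantiles(ages, n=4)
iqr = q3 - q1

# Sample variance and sample standard deviation (divide by n - 1).
sample_var = statistics.variance(ages)
sample_sd = statistics.stdev(ages)

print("Range:", data_range)  # 49
print("IQR:", iqr)
print("Sample variance:", round(sample_var, 2))
print("Sample SD:", round(sample_sd, 2))
```

The standard deviation is the square root of the variance, so it is expressed in the same units as the data (years of age here).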
But who is the Hero of the Story?
The standard deviation (SD) is a single number that summarizes the variability in a dataset.
The standard deviation uses the original data units, simplifying the interpretation.
Suppose a pizza restaurant measures its delivery time in minutes and has an SD of 5. In that case, the
interpretation is that the typical delivery occurs 5 minutes before or after the mean time.
After calculating the standard deviation, you can use various methods to evaluate it. The graphs above
incorporate the SD into the normal probability distribution. Alternatively, you can use the Empirical Rule
or Chebyshev’s Theorem to assess how the standard deviation relates to the distribution of values.
You can also calculate the coefficient of variation (the SD divided by the mean), which uses both the SD and the mean.
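A brief sketch of two of these methods follows. It uses the pizza delivery figures from these notes (SD of 5 minutes, and the mean of 20 minutes used in the Empirical Rule example). The function names are illustrative, not from any library.

```python
def coefficient_of_variation(sd, mean):
    """Relative dispersion: standard deviation as a fraction of the mean."""
    return sd / mean

def chebyshev_lower_bound(k):
    """Chebyshev's Theorem: at least 1 - 1/k^2 of values lie within
    k standard deviations of the mean, for any distribution (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k**2

mean_time = 20  # minutes (pizza delivery example)
sd_time = 5     # minutes

cv = coefficient_of_variation(sd_time, mean_time)
print(f"CV = {cv:.0%}")  # 25%

# At least 75% of delivery times fall within 2 SDs (10 to 30 minutes),
# whatever the shape of the distribution.
print(f"Within 2 SDs: at least {chebyshev_lower_bound(2):.0%}")
```

Chebyshev's bound is weaker than the Empirical Rule because it makes no assumption about the shape of the distribution.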
The Empirical Rule for the Standard Deviation of a Normal Distribution
For normally distributed data, the Empirical Rule gives the approximate proportion of the
values that fall within a specified number of standard deviations from the mean.
In the pizza delivery example, suppose the mean delivery time is 20 minutes and the
standard deviation is 5 minutes. Using the Empirical Rule, about 68% of the delivery
times will fall between 15 and 25 minutes (20 +/- 5), and about 95% will fall between
10 and 30 minutes (20 +/- 2*5).
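The percentages in the pizza delivery example can be checked with the sketch below, which assumes delivery times are exactly normally distributed with mean 20 and SD 5. It uses NormalDist from Python's standard statistics module.

```python
from statistics import NormalDist

# Assumed model: delivery times ~ Normal(mean=20, sd=5), in minutes.
mean_time = 20
sd_time = 5
delivery = NormalDist(mu=mean_time, sigma=sd_time)

def proportion_within(k):
    """Proportion of values within k standard deviations of the mean."""
    lower = mean_time - k * sd_time
    upper = mean_time + k * sd_time
    return delivery.cdf(upper) - delivery.cdf(lower)

for k in (1, 2, 3):
    lo, hi = mean_time - k * sd_time, mean_time + k * sd_time
    print(f"Within {k} SD ({lo}-{hi} min): {proportion_within(k):.1%}")
```

The exact normal proportions are roughly 68.3%, 95.4%, and 99.7%, which the Empirical Rule rounds to 68%, 95%, and 99.7%.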
Comparing Summary Statistics among groups