DESCRIPTIVE STATISTICS
Instructor: Nexon P. Castillo
OVERVIEW OF USING DATA
DATA
• The facts and figures collected, analyzed, and summarized for
presentation and interpretation.
“The number of Americans with diabetes will nearly double in the next 25 years.”
(Source: Diabetes Care)
Variable
A characteristic or a quantity of interest that can take on
different values.
Observation
A set of values corresponding to a set of variables.
TYPES OF DATA
• Population
The collection of all outcomes, responses,
measurements, or counts that are of interest.
• Sample
A subset, or part, of the population.
• Quantitative
Data are quantitative if they are numeric and arithmetic operations,
such as addition, subtraction, multiplication, and division, can be
performed on them.
• Categorical
Data are categorical if arithmetic operations cannot be performed on
them.
• Cross-Sectional
Collected from several entities at the same, or
approximately the same, point in time.
• Time Series
Collected over several time periods. Graphs of time series data
are frequently found in business and economic publications. Such
graphs help analysts understand what happened in the past, identify
trends over time, and project future levels for the time series.
• Sources of Data:
Experimental Study
A variable of interest is first identified. Then one or
more other variables are identified and controlled or
manipulated to obtain data about how these variables
influence the variable of interest.
Observational Study
Observational studies make no attempt to control the variables of interest. A
survey is perhaps the most common type of observational study. For
instance, in a personal interview survey, research questions are first
identified. Then a questionnaire is designed and administered to a
sample of individuals.
MODIFYING DATA
Sorting and Filtering Data in Excel
Suppose that we want to sort these automobiles by March 2010 sales
instead of by March 2011 sales. To do this, we use Excel’s Sort function, as
shown in the following steps:
Step 1: Select cells A1:F21
Step 2: Click the Data tab in the Ribbon
Step 3: Click Sort in the Sort & Filter group
Step 4: Select the check box for My data has headers
Step 5: In the first Sort by dropdown menu, select Sales (March 2010)
Step 6: In the Order dropdown menu, select Largest to Smallest
Step 7: Click OK
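For readers who want to see the same operation outside Excel, here is a minimal Python sketch of the sort. The worksheet in A1:F21 is not reproduced in these notes, so the rows, model names, and numbers below are hypothetical placeholders, not actual sales figures; the field names `sales_2010` and `sales_2011` are assumed stand-ins for the two sales columns.

```python
# Hypothetical placeholder rows standing in for the data rows of A1:F21
# (not real sales figures). The header row is left out, which is what
# "My data has headers" tells Excel to do.
rows = [
    {"model": "Model A", "manufacturer": "Maker X", "sales_2011": 120, "sales_2010": 95},
    {"model": "Model B", "manufacturer": "Maker Y", "sales_2011": 80, "sales_2010": 130},
    {"model": "Model C", "manufacturer": "Maker X", "sales_2011": 150, "sales_2010": 110},
]

# Sort by the March 2010 sales column, Largest to Smallest.
sorted_rows = sorted(rows, key=lambda r: r["sales_2010"], reverse=True)

for r in sorted_rows:
    print(r["model"], r["sales_2010"])
```

As with Excel's Sort, whole rows move together, so each model keeps its own values in every column.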
Now let’s suppose that we are interested only in seeing the sales of
models made by Toyota. We can do this using Excel’s Filter function:
Step 1: Select cells A1:F21
Step 2: Click the Data tab in the Ribbon
Step 3: Click Filter in the Sort & Filter group
Step 4: Click the filter arrow in column B, next to Manufacturer
Step 5: If all choices are checked, deselect them all by unchecking
(Select All). Then select only the check box for Toyota.
Step 6: Click OK
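A filter keeps only the rows that match a condition. The Python sketch below mirrors the Toyota filter; the rows, model names, and numbers are hypothetical placeholders (not real sales figures), and "Other Maker" is an invented label for every non-Toyota row.

```python
# Hypothetical placeholder rows (not real sales figures).
rows = [
    {"model": "Model A", "manufacturer": "Toyota", "sales_2011": 120},
    {"model": "Model B", "manufacturer": "Other Maker", "sales_2011": 80},
    {"model": "Model C", "manufacturer": "Toyota", "sales_2011": 150},
]

# Keep only rows whose manufacturer is Toyota, like checking only "Toyota"
# in the Filter dropdown. The original list is left unchanged, much as
# Excel hides rather than deletes the other rows.
toyota_rows = [r for r in rows if r["manufacturer"] == "Toyota"]

for r in toyota_rows:
    print(r["model"], r["sales_2011"])
```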
Conditional Formatting of Data in Excel
Conditional formatting in Excel can make it easy to identify data
that satisfy certain conditions in a data set.
Step 1: Starting with the original data, select cells F1:F21
Step 2: Click the Home tab in the Ribbon
Step 3: Click Conditional Formatting in the Styles group
Step 4: Select Highlight Cells Rules, and click Less Than from the
dropdown menu
Step 5: Enter 0% in the Format cells that are LESS THAN: box
Step 6: Click OK
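The Less Than rule is a test applied to each cell in the selected range. A minimal Python sketch follows, assuming (as the 0% threshold suggests) that column F holds percentage changes stored as fractions, so -0.02 means -2%; the model names and values are hypothetical.

```python
# Hypothetical percent-change values assumed to come from column F,
# stored as fractions (e.g., -0.02 means -2%).
pct_change = {"Model A": 0.263, "Model B": -0.385, "Model C": 0.0, "Model D": -0.02}

# Same test as Highlight Cells Rules > Less Than 0%: strictly negative values.
highlighted = [model for model, change in pct_change.items() if change < 0]
print(highlighted)  # ['Model B', 'Model D']
```

Model C, at exactly 0%, is not flagged because the rule is strictly "less than".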
CREATING DISTRIBUTIONS FROM DATA
Frequency Distribution for Categorical Data
A frequency distribution is a summary of data that shows
the number (frequency) of observations in each of several
nonoverlapping classes, typically referred to as bins.
We can use Excel to calculate the frequency of categorical
observations occurring in a data set using the COUNTIF
function.
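In Excel, a formula of the form =COUNTIF(range, criteria) counts the cells in range that match criteria, so one COUNTIF per category yields the frequency column. The Python sketch below performs the same tally with `collections.Counter`; the brand observations are hypothetical.

```python
from collections import Counter

# Hypothetical categorical observations (e.g., the brand each customer bought).
purchases = ["Brand A", "Brand B", "Brand A", "Brand C", "Brand A", "Brand B"]

# Counter tallies every category at once, like one COUNTIF per bin.
freq = Counter(purchases)

for category in sorted(freq):
    print(category, freq[category])
```

Because every observation lands in exactly one category, the frequencies add up to the number of observations.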
Relative Frequency and Percent Frequency Distribution
A relative frequency distribution is a tabular summary of
data showing the relative frequency for each bin. A percent
frequency distribution summarizes the percent frequency of the
data for each bin.
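The relative frequency of a bin equals the fraction or proportion of
items belonging to that bin. For a data set with n observations, the relative
frequency of each bin can be determined as follows:

$$\text{Relative frequency of a bin} = \frac{\text{Frequency of the bin}}{n}$$

The percent frequency of a bin is its relative frequency multiplied by 100.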
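Relative frequency is each bin's frequency divided by n, and percent frequency is that fraction times 100. A minimal Python sketch with hypothetical categorical data:

```python
from collections import Counter

# Hypothetical categorical observations.
purchases = ["Brand A", "Brand B", "Brand A", "Brand C", "Brand A", "Brand B"]

n = len(purchases)
freq = Counter(purchases)

# Relative frequency = frequency / n; percent frequency = relative frequency * 100.
relative = {category: count / n for category, count in freq.items()}
percent = {category: r * 100 for category, r in relative.items()}

for category in sorted(freq):
    print(f"{category}: {freq[category]}  {relative[category]:.3f}  {percent[category]:.1f}%")
```

A useful check: the relative frequencies sum to 1 and the percent frequencies sum to 100 (up to rounding).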
Frequency Distributions for Quantitative Data
The three steps necessary to define the classes for a
frequency distribution with quantitative data are as follows:
1. Determine the number of nonoverlapping bins.
2. Determine the width of each bin.
3. Determine the bin limits.
Number of Bins
Bins are formed by specifying the ranges used to group the data. As
a general guideline, we recommend using between 5 and 20 bins. For a small
number of data items, as few as five or six bins may be used to summarize
the data. For a larger number of data items, more bins are usually required.
The goal is to use enough bins to show the variation in the data, but not so
many that some contain only a few data items.
Width of the Bins
Second, choose a width for the bins. As a general guideline, we
recommend that the width be the same for each bin. Thus the choices of the
number of bins and the width of bins are not independent decisions. A larger
number of bins means a smaller bin width and vice versa. To determine an
approximate bin width, we begin by identifying the largest and smallest
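data values. Then, with the desired number of bins specified, we can use the
following expression to determine the approximate bin width:

$$\text{Approximate bin width} = \frac{\text{Largest data value} - \text{Smallest data value}}{\text{Number of bins}}$$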
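The approximate bin width is (largest value - smallest value) / number of bins. A minimal Python sketch with a hypothetical data set; rounding the result up to a whole number is an assumed practical choice, not a fixed rule.

```python
import math

# Hypothetical quantitative data set.
data = [12, 15, 9, 22, 18, 30, 25, 14, 11, 27, 19, 16]
num_bins = 5

# Approximate bin width = (largest value - smallest value) / number of bins.
approx_width = (max(data) - min(data)) / num_bins
print(approx_width)  # 4.2

# Round up to a convenient whole number (an assumption for this sketch).
bin_width = math.ceil(approx_width)
print(bin_width)  # 5
```

Rounding up rather than down ensures that 5 bins of width 5 span at least the full range of 21.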
Bin Limits
Bin limits must be chosen so that each data item belongs to one and
only one bin. The lower bin limit identifies the smallest possible data
value assigned to the bin. The upper bin limit identifies the largest possible
data value assigned to the bin. In developing frequency distributions for
categorical data, we did not need to specify bin limits because each data
item naturally fell into a separate bin.
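The "one and only one bin" requirement can be checked mechanically. The Python sketch below builds bin limits from a starting lower limit, a width, and a number of bins, then verifies that every data item matches exactly one bin. The data, the starting limit of 9, the width of 5, and the bin count are hypothetical, and the integer-style limits (9-13, 14-18, ...) assume whole-number data.

```python
# Hypothetical whole-number data and bin choices.
data = [12, 15, 9, 22, 18, 30, 25, 14, 11, 27, 19, 16]
lower_start = 9   # first lower bin limit (assumed: the smallest data value)
width = 5
num_bins = 5

# (lower limit, upper limit) pairs; for whole numbers, bins like 9-13 and
# 14-18 do not overlap.
bins = [(lower_start + i * width, lower_start + i * width + width - 1)
        for i in range(num_bins)]
print(bins)  # [(9, 13), (14, 18), (19, 23), (24, 28), (29, 33)]

# Check that each data item belongs to one and only one bin.
for x in data:
    matches = [b for b in bins if b[0] <= x <= b[1]]
    assert len(matches) == 1, f"{x} falls in {len(matches)} bins"
```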
We can use the FREQUENCY function in Excel to count the
number of observations in each bin.
Step 1: Select cells B10:B14
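Step 2: Type the formula =FREQUENCY(A2:D6, A10:A14). The range
A2:D6 defines the data set, and the range A10:A14 holds the upper limit of each bin.
Step 3: Press CTRL+SHIFT+ENTER after typing the formula in Step 2 to enter it as an
array formula. (In versions of Excel with dynamic arrays, such as Microsoft 365,
pressing ENTER is enough.)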
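Excel's FREQUENCY treats each value in its bins argument as an upper limit: a data item is counted in the first bin whose upper limit is greater than or equal to it, and one extra count is returned for items above the highest limit. The Python sketch below imitates that behavior with `bisect_left`; the function name `frequency`, the data, and the upper limits are hypothetical, and the limits are assumed to be sorted in ascending order.

```python
from bisect import bisect_left


def frequency(data, upper_limits):
    """Count data items per bin in the style of Excel's FREQUENCY.

    Each entry of upper_limits is a bin's upper limit (sorted ascending).
    Returns len(upper_limits) + 1 counts; the last one counts values
    above the highest limit.
    """
    counts = [0] * (len(upper_limits) + 1)
    for x in data:
        # Index of the first upper limit >= x, or len(upper_limits)
        # if x exceeds every limit.
        counts[bisect_left(upper_limits, x)] += 1
    return counts


# Hypothetical data and upper bin limits.
data = [12, 15, 9, 22, 18, 30, 25, 14, 11, 27, 19, 16]
upper_limits = [13, 18, 23, 28, 33]

print(frequency(data, upper_limits))  # [3, 4, 2, 2, 1, 0]
```

Selecting only B10:B14 (five cells for five bins) leaves out that extra overflow count, which is harmless when the highest upper limit is at least the largest data value.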
Cumulative Distribution
Cumulative Frequency Distribution
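A variation of the frequency distribution that provides another
tabular summary of quantitative data. Instead of the number of data items
in each bin, it shows the number of data items with values less than or
equal to the upper limit of each bin.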
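A cumulative frequency is a running total of the bin frequencies, so the last value equals the total number of observations. A minimal Python sketch using `itertools.accumulate`, with hypothetical upper limits and bin frequencies:

```python
from itertools import accumulate

# Hypothetical bin upper limits and their frequencies.
upper_limits = [13, 18, 23, 28, 33]
freq = [3, 4, 2, 2, 1]

# Each cumulative frequency counts the data items less than or equal to
# that bin's upper limit.
cumulative = list(accumulate(freq))

for limit, c in zip(upper_limits, cumulative):
    print(f"<= {limit}: {c}")
```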