Identifying Types of Variables
decision makers.
DESCRIPTIVE STATISTICS
Descriptive statistics are the methods that help collect, summarize, present, and analyze a set
of data.
INFERENTIAL STATISTICS
Inferential statistics are the methods that use the data collected from a small group to draw
conclusions about a larger group.
VARIABLE
A variable is a characteristic of an item or individual.
DATA
Data are the different values associated with a variable.
POPULATION
A population consists of all the items or individuals about which you want to reach conclusions.
SAMPLE
A sample is the portion of a population selected for analysis.
PARAMETER
A parameter is a measure that describes a characteristic of a population.
STATISTIC
A statistic is a measure that describes a characteristic of a sample.
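To make the parameter/statistic distinction concrete, the short Python sketch below builds a hypothetical population of ages (randomly generated values, not real data), computes its mean (a parameter), and then computes the mean of a random sample drawn from it (a statistic). Inferential statistics uses the second number to draw conclusions about the first.

```python
import random
import statistics

# Hypothetical population: ages of 1,000 individuals (illustrative values only)
rng = random.Random(42)  # fixed seed so the example is reproducible
population = [rng.randint(18, 65) for _ in range(1000)]

# Parameter: a measure that describes the whole population
population_mean = statistics.mean(population)

# Statistic: the same measure computed on a sample of 50 items
sample = rng.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"Population mean (parameter): {population_mean:.2f}")
print(f"Sample mean (statistic):     {sample_mean:.2f}")
```

Rerunning with a different seed changes the sample mean but not the population mean, which is why a statistic is treated as an estimate of the parameter.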
Measurement Scales
A nominal scale classifies data into distinct categories in which no ranking is implied.
An ordinal scale classifies values into distinct categories in which ranking is implied.
An interval scale is an ordered scale in which the difference between measurements is a meaningful
quantity but does not involve a true zero point, as in temperature (in degrees Celsius or Fahrenheit)
or standardized exam scores (e.g., ACT or SAT).
A ratio scale is an ordered scale in which the difference between the measurements involves a
true zero point, as in height, weight, age, or salary measurements.
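The practical consequence of the missing true zero can be checked numerically. In the Python sketch below (the temperatures and weights are arbitrary example values), the ratio of two Celsius readings changes when the same readings are expressed in Fahrenheit, so "twice as warm" has no fixed meaning on an interval scale. On a ratio scale such as weight, the ratio is the same in kilograms and pounds. The conversion factor 2.20462 lb/kg is rounded.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

# Interval scale: the zero point is arbitrary, so ratios depend on the unit
c_low, c_high = 10.0, 20.0
f_low, f_high = celsius_to_fahrenheit(c_low), celsius_to_fahrenheit(c_high)
ratio_c = c_high / c_low          # 20 / 10
ratio_f = f_high / f_low          # 68 / 50

# Ratio scale: zero means "no weight", so ratios are unit-free
KG_TO_LB = 2.20462                # approximate conversion factor
w_low, w_high = 40.0, 80.0
ratio_kg = w_high / w_low
ratio_lb = (w_high * KG_TO_LB) / (w_low * KG_TO_LB)

print(f"Celsius ratio: {ratio_c:.2f}, Fahrenheit ratio: {ratio_f:.2f}")
print(f"Kilogram ratio: {ratio_kg:.2f}, pound ratio: {ratio_lb:.2f}")
```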
Simple random sampling is the basic sampling technique where we select a group of subjects (a sample) for
study from a larger group (a population). Each individual is chosen entirely by chance and each member of the
population has an equal chance of being included in the sample.
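A minimal sketch of simple random sampling in Python, assuming the population is available as a list (a sampling frame). `random.Random.sample` draws without replacement and gives every member the same chance of selection; the customer IDs are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; each member is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical sampling frame of 500 customer IDs
frame = [f"C{i:03d}" for i in range(1, 501)]
sample = simple_random_sample(frame, 10, seed=1)
print(sample)
```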
Stratified sampling is a probability sampling method. With stratified sampling, the researcher divides the
population into separate groups, called strata. Then, a probability sample (often a simple random sample) is drawn
from each group. Stratified sampling has several advantages over simple random sampling; for example, it guarantees
that every stratum is represented in the sample.
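The following Python sketch divides a population into strata and then draws a simple random sample from each one. It assumes proportional allocation (each stratum contributes the same fraction of its members, with at least one member); that allocation rule, the `stratum_of` helper, and the department data are illustrative assumptions rather than part of the definition above.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, fraction, seed=None):
    """Split the population into strata, then draw a simple random sample from each.
    Proportional allocation (an assumption): round(fraction * stratum size), minimum 1."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in population:
        strata[stratum_of(item)].append(item)
    sample = {}
    for name, members in strata.items():
        k = max(1, round(fraction * len(members)))
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical employees tagged with their department (the stratum)
employees = ([("Sales", i) for i in range(60)]
             + [("IT", i) for i in range(30)]
             + [("HR", i) for i in range(10)])
result = stratified_sample(employees, stratum_of=lambda e: e[0], fraction=0.1, seed=7)
for dept, chosen in result.items():
    print(dept, len(chosen))
```

Even the small HR stratum appears in the sample, which a single simple random sample of 10 employees would not guarantee.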
Cluster sampling is a sampling technique used when "natural" but relatively heterogeneous groupings are evident
in a statistical population. It is often used in marketing research. In this technique, the total population is divided into
these groups (or clusters) and a simple random sample of the groups is selected.
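A sketch of one-stage cluster sampling in Python: a simple random sample of whole clusters is selected and every member of each selected cluster is kept. Keeping all members (rather than subsampling within clusters) is an assumption for this example, and the store clusters are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """One-stage cluster sampling: pick whole clusters at random, keep all their members."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen_names}

# Hypothetical clusters: stores, each holding its customers
stores = {
    "North": ["n1", "n2", "n3"],
    "South": ["s1", "s2"],
    "East": ["e1", "e2", "e3", "e4"],
    "West": ["w1"],
}
picked = cluster_sample(stores, n_clusters=2, seed=3)
print(picked)
```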
Systematic sampling is a type of probability sampling method in which sample members from a larger population
are selected according to a random starting point and a fixed periodic interval. This interval, called
the sampling interval, is calculated by dividing the population size by the desired sample size.
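The procedure above translates directly into a few lines of Python. This sketch computes the sampling interval k = N / n (using integer division when N is not a multiple of n, an assumption), picks a random starting point within the first interval, and then takes every k-th member. The population of 100 numbered members is hypothetical.

```python
import random

def systematic_sample(population, n, seed=None):
    """Random start within the first interval, then every k-th member (requires n <= N)."""
    k = len(population) // n                  # sampling interval = N / n
    start = random.Random(seed).randrange(k)  # random starting point in [0, k)
    return population[start::k][:n]           # trim in case N is not a multiple of n

frame = list(range(1, 101))   # hypothetical population of N = 100 members
sample = systematic_sample(frame, n=10, seed=5)
print(sample)
```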
• Define the variables that you want to study in order to solve a business problem or meet
a business objective
• Collect the data from appropriate sources
• Organize the data collected by developing tables
• Visualize the data by developing charts
• Analyze the data by examining the appropriate tables and charts (and in later chapters by
using other statistical methods) to reach conclusions.
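As a rough illustration of the collect and organize steps, the Python sketch below tallies hypothetical survey responses for one categorical variable into a small table of counts and percentages. The responses and the table layout are assumptions made for the example.

```python
from collections import Counter

# Collect: hypothetical survey responses for one categorical variable
responses = ["Yes", "No", "Yes", "Unsure", "Yes", "No", "Yes"]

# Organize: a summary table of counts and percentages
counts = Counter(responses)
total = sum(counts.values())
print(f"{'Response':<10}{'Count':>6}{'Percent':>9}")
for category, count in counts.most_common():
    print(f"{category:<10}{count:>6}{count / total:>9.1%}")
```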
CENTRAL TENDENCY
The central tendency is the extent to which the data values group around a typical or central
value.
VARIATION
The variation is the amount of dispersion, or scattering, of values away from a central value.
SHAPE
The shape is the pattern of the distribution of values from the lowest value to the highest value.
SAMPLE MEAN
The sample mean is the sum of the values in a sample divided by the number of values in the
sample. Because all the values play an equal role, a mean is greatly affected by any value that is
greatly different from the others. When you have such extreme values, you should avoid using
the mean as a measure of central tendency. The mean can suggest a typical or central value for
a data set.
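The sample mean is X̄ = (X1 + X2 + ... + Xn) / n. The Python sketch below, using hypothetical preparation times in minutes, shows how replacing a single value with an extreme one pulls the mean upward while the median (covered next) stays put.

```python
import statistics

def sample_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)

# Hypothetical times (minutes); the second list swaps the last value for an extreme one
times = [30, 32, 35, 36, 38, 40, 41, 44, 45, 49]
with_extreme = times[:-1] + [139]

print(sample_mean(times), statistics.median(times))
print(sample_mean(with_extreme), statistics.median(with_extreme))
```

A single extreme value moves the mean from 39.0 to 48.0 minutes, which is why the text recommends avoiding the mean when such values are present.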
The Median
The median is the middle value in an ordered array of data that has been ranked from smallest
to largest. Half the values are smaller than or equal to the median, and half the values are larger
than or equal to the median. The median is not affected by extreme values, so you can use the
median when extreme values are present.
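A short Python sketch of the median. The text defines it as the middle value of the ordered array; when the number of values is even there are two middle values, and the usual convention (assumed here) is to average them.

```python
def median(values):
    """Middle value of the ordered array; mean of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd = [7, 1, 5, 3, 9]          # ordered: 1 3 5 7 9
even = [7, 1, 5, 3, 9, 100]    # ordered: 1 3 5 7 9 100
print(median(odd), median(even))
```

Replacing 9 with 900 in the first list leaves the median at 5, illustrating why the median is not affected by extreme values.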
The Mode
The mode is the value in a set of data that appears most frequently. Like the median and unlike
the mean, the mode is not affected by extreme values. Often, there is no mode or there are several
modes in a set of data.
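The Python sketch below returns every value tied for the highest frequency, so it can report one mode, several modes, or none. Treating "no value repeats" as "no mode" is the convention assumed here; the function expects a non-empty list.

```python
from collections import Counter

def modes(values):
    """All values tied for the highest frequency; an empty list means no mode."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value appears once, so there is no mode
    return [v for v, c in counts.items() if c == top]

print(modes([3, 1, 3, 2]))      # one mode
print(modes([1, 1, 2, 2, 3]))   # two modes
print(modes([1, 2, 3]))         # no mode
```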
The characteristics of the range, variance, and standard deviation can be summarized as
follows:
• The greater the spread or dispersion of the data, the larger the range, variance, and standard
deviation.
• The smaller the spread or dispersion of the data, the smaller the range, variance, and standard
deviation.
• If the values are all the same (so that there is no variation in the data), the range, variance, and
standard deviation will all equal zero.
• None of the measures of variation (the range, variance, and standard deviation) can ever be
negative.
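The characteristics listed above can be checked with a small Python sketch. It computes the range as the largest minus the smallest value, and uses the standard library's `statistics.variance` and `statistics.stdev`, which compute the sample variance (squared deviations divided by n − 1) and its square root. The three data sets are made up to show narrow, wide, and zero spread.

```python
import statistics

def spread(values):
    """Range, sample variance (n - 1 denominator), and sample standard deviation."""
    value_range = max(values) - min(values)
    variance = statistics.variance(values)
    std_dev = statistics.stdev(values)
    return value_range, variance, std_dev

narrow = [9, 10, 11]
wide = [0, 10, 20]
same = [5, 5, 5]
for data in (narrow, wide, same):
    print(data, spread(data))
```

The wider data set has the larger range, variance, and standard deviation; the constant data set gives zero for all three; and none of the results is negative.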
Z Scores
An extreme value or outlier is a value located far away from the mean. The Z score, which is
the difference between the value and the mean, divided by the standard deviation, is useful in
identifying outliers. Values located far away from the mean will have either very small (negative)
Z scores or very large (positive) Z scores.
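In symbols, Z = (X − X̄) / S, where S is the sample standard deviation. The Python sketch below computes Z scores and flags values whose Z score is below −3.0 or above +3.0. That cutoff is a commonly used rule of thumb rather than something stated above, so it is exposed as an adjustable parameter; the order counts are hypothetical, and the data must not all be identical (S would be zero).

```python
import statistics

def z_scores(values):
    """Z = (X - mean) / S for each value, using the sample standard deviation S."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - mean) / s for x in values]

def outliers(values, cutoff=3.0):
    """Values whose Z score is below -cutoff or above +cutoff (cutoff is an assumption)."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > cutoff]

orders = [48, 49, 50, 51, 52] * 4 + [100]   # hypothetical daily orders, one extreme day
print(outliers(orders))
```

Here only the value 100 (Z ≈ 4.3) is flagged; every other value has a Z score close to 0.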
Shape
Shape is the pattern of the distribution of data values throughout the entire range of all the
values.
A distribution is either symmetrical or skewed. In a symmetrical distribution, the values below
the mean are distributed in exactly the same way as the values above the mean. In this case, the
low and high values balance each other out. In a skewed distribution, the values are not
symmetrical around the mean. This skewness results in an imbalance of low values or high
values.
Shape also can influence the relationship of the mean to the median. In most cases:
• Mean < median: negative, or left-skewed
• Mean = median: symmetric, or zero skewness
• Mean > median: positive, or right-skewed
Skewness and kurtosis are two shape-related statistics. The skewness statistic measures the
extent to which a set of data is not symmetric. The kurtosis statistic measures the relative
concentration of values in the center of the distribution of a data set, as compared with the tails.
A symmetric distribution has a skewness value of zero. A right-skewed distribution has a positive
skewness value, and a left-skewed distribution has a negative skewness value.
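The Python sketch below computes a sample skewness statistic and applies the mean-versus-median rule from the text. It assumes the adjusted Fisher-Pearson formula, n / ((n − 1)(n − 2)) × Σ((X − X̄) / S)³, which is the version reported by spreadsheet functions such as Excel's SKEW; other textbooks use slightly different adjustments. The data sets are made up.

```python
import statistics

def skewness(values):
    """Adjusted Fisher-Pearson sample skewness (needs at least 3 values)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)

def describe_shape(values):
    """Classify shape by comparing the mean with the median."""
    mean, median = statistics.mean(values), statistics.median(values)
    if mean > median:
        return "right-skewed"
    if mean < median:
        return "left-skewed"
    return "symmetric"

right = [2, 3, 3, 4, 4, 4, 5, 15]   # long tail of high values
left = [20 - x for x in right]      # mirror image: long tail of low values
symmetric = [1, 2, 3, 4, 5]
for data in (right, left, symmetric):
    print(describe_shape(data), round(skewness(data), 3))
```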
A bell-shaped distribution has a kurtosis value of zero. A distribution that is flatter than a
bell-shaped distribution has a negative kurtosis value. A distribution with a sharper peak (one that
has a higher concentration of values in the center of the distribution than a bell-shaped
distribution) has a positive kurtosis value.
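Because a bell-shaped distribution is assigned a kurtosis of zero, the statistic described here is the excess kurtosis. The Python sketch below assumes the bias-adjusted formula used by Excel's KURT function, n(n + 1) / ((n − 1)(n − 2)(n − 3)) × Σ((X − X̄) / S)⁴ − 3(n − 1)² / ((n − 2)(n − 3)), which needs at least four values. The two data sets are made up: one evenly spread and one concentrated at the center.

```python
import statistics

def excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis (0 for a bell-shaped distribution); n >= 4."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    fourth = sum(((x - mean) / s) ** 4 for x in values)
    adjust = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    return adjust * fourth - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))

flat = list(range(1, 11))     # values spread evenly: flatter than bell-shaped
peaked = [5] * 8 + [1, 9]     # most values at the center
print(round(excess_kurtosis(flat), 3), round(excess_kurtosis(peaked), 3))
```

The evenly spread data give a negative value (−1.2) and the centrally concentrated data give a positive value (4.5), matching the signs described above.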