Module 1 - Review of Basic Concepts
Learning Outcomes
At the end of this module you shall be able to:
1. define the concepts related to basic statistics;
2. enumerate and differentiate the types of data; and
3. discuss the importance of a frequency distribution in determining statistical probability.
Pre-Assessment
Before you begin this module, you must first answer the Pre-Assessment
posted in Google Classroom. If you are unable to connect to Google Classroom, you
can access the Pre-Assessment through this link: https://bre.is/wTGnRKS2 or by
scanning the QR code presented. You can access the Pre-Assessment on your
laptop or mobile device.
Discussion
VARIABLES
In mathematics, a variable, also called an unknown, is a quantity whose value is not
necessarily specified but can be determined according to certain rules. Mathematical variables
are expressed using italicized letters of the alphabet, usually in lowercase. For example, in the
expression x + y + z = 5, the letters x, y, and z are variables that represent numbers. In statistics,
variables are similar to those in mathematics, but there are some subtle distinctions. Perhaps the most
important is this: in statistics, a variable is always associated with one or more experiments.
Marvin Y. Arce
All Rights Reserved
2020
Advance Statistics: Self-Learning Module for College Students
Categorical Variable
In statistics, a categorical variable, also called a qualitative variable, is a variable that can take
on one of a limited, and usually fixed, number of possible values, assigning each individual
or other unit of observation to a particular group or nominal category on the basis of some
qualitative property. A categorical variable describes the qualities of the objects of interest.
Examples of categorical variables:
The blood type of a person can be categorized as A, B, AB, or O, each being either
RhD positive or RhD negative;
Marital status can be single, married, or widow/widower;
Religious affiliation.
Numerical Variable
A numerical variable, also called a quantitative variable, is a variable whose
measurement or count has a numerical meaning. A numerical variable describes
quantities of the objects of interest. Examples of numerical variables are:
The temperature of a given body or place;
The number of students in a statistics class.
Determine whether each of the following variables is categorical or numerical. Write C in the space
provided if the variable is categorical; otherwise, write N.
Discrete vs. Continuous
Numerical data can be either discrete or continuous.
Discrete Variables
In statistics, a discrete variable, also called a meristic or discontinuous variable, is a variable
that can attain only specific values. The number of possible values is countable. It is easy to
express the value of a discrete variable because it can be assumed to be exact. Examples of
discrete variables are:
Continuous Variables
A continuous variable can attain infinitely many values over a certain span or range.
Instead of taking only specific values with a fixed increment between any two, a
continuous variable can change value by an arbitrarily small amount. There is always
a value between any two other values, no matter how close together those values are.
Measurements of continuous variables are always approximations and depend on the
precision of the measuring instrument. Examples:
height;
weight;
age (can be presented as 7 years 6 months);
time; and
systolic blood pressure.
Example:
If we are interested in making generalizations about reading habits of all college
students in Tarlac, then the statistical population is the set of all college students in Tarlac
that exist now (currently enrolled), ever existed (those who already graduated), or will
exist in the future (incoming college students). Since in this case and many others it is
impossible to observe the entire statistical population, due to time constraints, constraints
of geographical accessibility, and constraints on the researcher’s resources, a researcher
would instead observe a statistical sample from the population in order to attempt to learn
something about the population as a whole.
Example:
From our previous example, having identified the population for a study on the
reading habits of college students, we may consider any of the following subsets as the
sample of the study:
currently enrolled college students;
college students from public institutions;
college students from private institutions;
college students from a selected school; or
a group of college students which represent their respective schools.
When a sample consists of the whole population, it is called a census. When a sample
consists of a subset of a population whose elements are chosen at random, it is called a random
sample.
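To make the distinction concrete, here is a minimal Python sketch using a small, hypothetical population of student ID numbers (the IDs, the population size of 500, and the sample size of 50 are assumptions for illustration only). A census keeps every element, while a simple random sample draws elements at random without replacement using the standard library's `random.sample`.

```python
import random

# Hypothetical population: ID numbers of 500 college students.
population = [f"STUDENT-{i:03d}" for i in range(1, 501)]

# A census observes every member of the population.
census = list(population)

# A simple random sample of 50 students, drawn without replacement.
rng = random.Random(2020)  # fixed seed so the same draw can be reproduced
sample = rng.sample(population, k=50)

print(len(census), len(sample))  # 500 50
```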
Types of Sample
Complete Sample
A complete sample is a set of objects from a parent population that includes all such
objects that satisfy a set of well-defined selection criteria. For example, a complete sample
of men from Tarlac taller than 2 meters would consist of a list of every male from Tarlac
taller than 2 meters. It wouldn’t include males from other provinces, or tall females, or
people shorter than 2 meters.
Compiling such a complete sample requires a complete list of the parent
population, including data on height, sex, and province of residence for each member of that
parent population.
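Because a complete sample is defined entirely by well-defined selection criteria, it amounts to filtering the parent population. The sketch below is illustrative only: the `Person` record, its fields, and the five people listed are hypothetical, and it assumes the parent population already carries sex, province, and height for everyone.

```python
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    sex: str          # "M" or "F"
    province: str
    height_m: float   # height in meters

# Hypothetical parent population with the data needed for the selection criteria.
parent_population = [
    Person("A", "M", "Tarlac", 2.03),
    Person("B", "M", "Tarlac", 1.75),
    Person("C", "F", "Tarlac", 2.01),
    Person("D", "M", "Pampanga", 2.05),
    Person("E", "M", "Tarlac", 2.10),
]

def complete_sample(people):
    """Return every person who meets all of the selection criteria."""
    return [p for p in people
            if p.sex == "M" and p.province == "Tarlac" and p.height_m > 2.0]

print([p.name for p in complete_sample(parent_population)])  # ['A', 'E']
```

Note that people C (female) and D (from another province) are excluded even though they are taller than 2 meters, exactly as the definition requires.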
For example, an unbiased sample of men from Tarlac taller than 2 meters might consist of a
randomly sampled subset of 1% of males from Tarlac taller than 2 meters. However, a sample
chosen from the electoral register might be biased since, for example, males aged
under 18 will not be on the electoral register.
Random Sampling
The best way to avoid a biased or unrepresentative sample is to select a random sample,
also known as a probability sample. A random sample is defined as a sample wherein each
individual member of the population has a known, non-zero chance of being selected as part of the
sample. Several types of random samples are simple random samples, systematic samples,
stratified random samples, and cluster random samples.
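For a simple random sample of size n drawn without replacement from a population of size N, each member's chance of being selected is n/N, which is both known and non-zero. The following Python sketch checks this by simulation; the values N = 200 and n = 20 (so n/N = 0.1), the seed, and the number of trials are all hypothetical choices.

```python
import random

N, n = 200, 20             # hypothetical population and sample sizes
population = list(range(N))
rng = random.Random(7)

trials = 20_000
target = 0                  # follow one particular member of the population
times_selected = 0

for _ in range(trials):
    if target in rng.sample(population, k=n):
        times_selected += 1

estimated_chance = times_selected / trials
print(f"theoretical: {n / N:.3f}, simulated: {estimated_chance:.3f}")
```

The simulated value should land close to 0.100; the small gap between the two is itself a sampling effect.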
Cluster Sample
Cluster sampling divides the population into groups, or clusters. Some of these
clusters are randomly selected. Then, all the individuals in the chosen clusters are selected to
be in the sample. This process is often used because it can be cheaper and more
time-efficient.
For example, while surveying households within a city, we might choose to select
100 city blocks and then interview every household within the selected blocks, rather than
interview random households spread out over the entire city.
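The city-block example can be sketched in Python as follows. Everything here is hypothetical: a city of 400 blocks, each holding between 5 and 15 households, and the helper name `cluster_sample`. The two steps mirror the description above: pick whole clusters at random, then keep every unit inside the chosen clusters.

```python
import random

# Hypothetical city: 400 blocks, each with 5 to 15 households.
rng = random.Random(1)
city = {
    block: [f"B{block}-H{h}" for h in range(rng.randint(5, 15))]
    for block in range(400)
}

def cluster_sample(clusters, num_clusters, rng):
    """Randomly choose whole clusters, then keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), k=num_clusters)
    units = [unit for c in chosen for unit in clusters[c]]
    return chosen, units

chosen_blocks, households = cluster_sample(city, 100, rng)
print(f"{len(chosen_blocks)} blocks selected, {len(households)} households interviewed")
```

Unlike a simple random sample, the number of households interviewed is not fixed in advance; it depends on how many households the selected blocks happen to contain.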
Systematic Sample
Systematic sampling relies on arranging the target population according to some
ordering scheme and then selecting elements at regular intervals through that ordered list.
Systematic sampling involves a random start and then proceeds with the selection of every
kth element from then onward. In this case, $k = \dfrac{\text{population size}}{\text{sample size}}$.
It is important that the starting point is not automatically the first element in the list, but is
instead randomly chosen from among the first to the kth elements in the list. A simple
example would be to select every 10th name from a list (an “every 10th” sample, also referred
to as “sampling with a skip of 10”).
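Here is a minimal Python sketch of systematic sampling with a random start. It assumes the population list is already in the chosen order, and when the population size is not an exact multiple of the sample size it rounds k down and keeps the first n selected elements; that rounding rule is one common convention, not the only one. The helper name `systematic_sample` and the list of 1000 names are hypothetical.

```python
import random

def systematic_sample(ordered_population, n, rng=random):
    """Random start within the first k elements, then every kth element."""
    N = len(ordered_population)
    k = N // n                      # sampling interval k = N / n, rounded down
    start = rng.randrange(k)        # 0-based index of the start: 1st to kth element
    return ordered_population[start::k][:n]

names = [f"Name {i}" for i in range(1, 1001)]   # hypothetical ordered list
sample = systematic_sample(names, 100, random.Random(3))
print(len(sample), sample[:3])
```

With 1000 names and a sample of 100, k = 10, which reproduces the “every 10th” sample described above.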
Sampling Error
Sampling errors are incurred when the statistical characteristics of a population are
estimated from a subset, or sample, of that population. Since the sample does not include all
members of the population, statistics on the sample generally differ from the characteristics
of the entire population.
For example, if one measures the height of a thousand individuals from a country of one
million, the average height of the thousand is typically not the same as the average height of
all one million people in the country. Since sampling is typically done to determine the
characteristics of a whole population, the difference between the sample and population
values is considered an error.
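The height example can be simulated in Python. The population below is made up: one million heights drawn from a normal distribution with mean 160 cm and standard deviation 8 cm, which is purely an assumption for illustration. The population mean plays the role of the true value, the mean of a random sample of 1000 is the estimate, and their difference is the sampling error.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: heights (cm) of one million people.
population = [rng.gauss(160, 8) for _ in range(1_000_000)]
population_mean = statistics.fmean(population)

# Measure only a random sample of one thousand people.
sample = rng.sample(population, k=1000)
sample_mean = statistics.fmean(sample)

sampling_error = sample_mean - population_mean
print(f"population mean {population_mean:.2f} cm, "
      f"sample mean {sample_mean:.2f} cm, "
      f"sampling error {sampling_error:+.2f} cm")
```

Rerunning with a different seed gives a different sample and therefore a different sampling error, which is why the error is described as typical rather than fixed.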
FREQUENCY
The frequency of a particular outcome (result) of an event is the number of times that
outcome occurs within a specific sample of a population. In statistics, the term “frequency” means
“how often.” There are two types of statistical frequency: absolute frequency and relative
frequency.
Absolute Frequency
Absolute frequency is a statistical term describing the number of times a particular
piece of data or a particular value appears during a trial or set of trials. Essentially, absolute
frequency is a simple count of the number of times a value is observed. Suppose you toss a
die 6000 times. If the die is not “weighted,” you should expect that the die will turn up
showing one dot approximately 1000 times, two dots approximately 1000 times, and so on,
up to six dots approximately 1000 times. The absolute frequency in such an experiment is
therefore approximately 1000 for each face of the die.
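Since absolute frequency is simply a count, it can be tallied with Python's `collections.Counter`. The sketch below simulates 6000 tosses of a fair die; the random tosses stand in for a real experiment and are an assumption of the example.

```python
import random
from collections import Counter

rng = random.Random(6000)
tosses = [rng.randint(1, 6) for _ in range(6000)]   # simulated fair-die tosses

absolute_frequency = Counter(tosses)
for face in range(1, 7):
    print(face, absolute_frequency[face])
```

Each count should come out near 1000 but rarely exactly 1000, matching the “approximately” in the text.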
Relative Frequency
The relative frequency of an event is defined as the number of times that the event
occurs during experimental trials, divided by the total number of trials conducted. The
relative frequency is not a theoretical quantity, but an experimental one. We have to repeat
an experiment a number of times and count how many times the outcome of the trial is in
the event set. Because it is experimental, it is possible to get a different relative frequency
every time we repeat an experiment. In the example above, where a fair die is thrown 6000 times,
the relative frequency of each of the six faces is approximately 1 in 6, which is
equivalent to about 16.67%.
Example:
The table above shows the result of a single hypothetical test wherein an “unweighted” die
was tossed 6000 times. The result of each toss was recorded. After 6000 trials, the face with one dot
showed 968 times. This is the absolute frequency of the event where a toss will result in a face with
one dot. The relative frequency was computed using the equation:
$$\text{Relative Frequency} = \frac{\text{absolute frequency}}{\text{total number of trials}}$$
Hence, the relative frequency of the event resulting in a one-dot face is:
$$\text{Relative Frequency} = \frac{968}{6000} \approx 0.1613 \text{ or } 16.13\%$$
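The same computation as a small Python function; the function name `relative_frequency` is hypothetical, and the numbers are the ones from the worked example above.

```python
def relative_frequency(absolute_frequency, total_trials):
    """Relative frequency = absolute frequency / total number of trials."""
    if total_trials <= 0:
        raise ValueError("total_trials must be positive")
    return absolute_frequency / total_trials

rf = relative_frequency(968, 6000)
print(f"{rf:.4f} or {rf:.2%}")   # 0.1613 or 16.13%
```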
The following is the result of a hypothetical experiment where a balanced coin was tossed 30
times. Construct an absolute and relative frequency table.
H T T T H T H T T T
T H H H T T T T H H
H T H H T H T T T H
Parameter
A specific, well-defined characteristic of a population is known as a parameter of
that population. Parameters are numbers that summarize data for an entire population.
Statistics are numbers that summarize data from a sample, i.e. some subset of the entire
population. For example, suppose we are interested in learning about the average
weight of all middle-aged Filipino females. The population consists of all middle-aged
Filipino females, and the parameter is their average weight.
Statistic
A specific characteristic of a sample is called a statistic of that sample. A statistic is
any summary number, like an average or percentage, that describes the sample. Returning to
the parameter example, if we are interested in learning about the average weight of
middle-aged Filipino females and we randomly select 100 females, these 100 randomly
selected females form the sample, and their average weight is the statistic.
Example:
An entrepreneur wanted to open a new coffee shop in Paniqui. He wanted to determine the
average daily coffee intake of residents of Paniqui aged 21 and above. Fifty interviewed
individuals revealed that they drink an average of 4 cups of coffee a day.
1. A researcher wants to estimate the average height of women aged 20 years or older. From a
simple random sample of 45 women, the researcher obtains a sample mean height of 63.9
inches.
Population: _____________________________________________________________________________________________
Parameter: _____________________________________________________________________________________________
Sample: _____________________________________________________________________________________________
Statistic: _____________________________________________________________________________________________
2. A nutritionist wants to estimate the mean amount of sodium consumed by children under the
age of 10. From a random sample of 75 children under the age of 10, the nutritionist obtains a
sample mean of 2993 milligrams of sodium consumed.
Population: _____________________________________________________________________________________________
Parameter: _____________________________________________________________________________________________
Sample: _____________________________________________________________________________________________
Statistic: _____________________________________________________________________________________________
3. Nexium is a drug that can be used to reduce the acid produced by the body and heal damage
to the esophagus. A researcher wants to estimate the proportion of patients taking Nexium
that are healed within 8 weeks. A random sample of 224 patients suffering from acid reflux
disease is obtained, and 213 of those patients were healed after 8 weeks.
Population: _____________________________________________________________________________________________
Parameter: _____________________________________________________________________________________________
Sample: _____________________________________________________________________________________________
Statistic: _____________________________________________________________________________________________
4. A researcher wants to estimate the average farm size in Tarlac. From a simple random sample
of 40 farms, the researcher obtains a sample mean farm size of 731 acres.
Population: _____________________________________________________________________________________________
Parameter: _____________________________________________________________________________________________
Sample: _____________________________________________________________________________________________
Statistic: _____________________________________________________________________________________________
5. An education official wants to estimate the proportion of adults aged 18 or older who had
read at least one book during the previous year in the province of Tarlac. A random sample of
1006 adults aged 18 or older is obtained, and 835 of those adults had read at least one book
during the previous year.
Population: _____________________________________________________________________________________________
Parameter: _____________________________________________________________________________________________
Sample: _____________________________________________________________________________________________
Statistic: _____________________________________________________________________________________________
6. The International Dairy Foods Association (IDFA) wants to estimate the average amount of
calcium Filipino male teenagers consume. From a random sample of 50 male teenagers, the
IDFA obtained a sample mean of 1081 milligrams of calcium consumed.
Population: _____________________________________________________________________________________________
Parameter: _____________________________________________________________________________________________
Sample: _____________________________________________________________________________________________
Statistic: _____________________________________________________________________________________________
7. A sociologist wants to estimate the proportion of adults with children under the age of 18 who eat
dinner together 7 nights a week. A simple random sample of 1122 adults with children under
the age of 18 was obtained, and 337 of those adults reported eating dinner together with
their families 7 nights a week.
Population: _____________________________________________________________________________________________
Parameter: _____________________________________________________________________________________________
Sample: _____________________________________________________________________________________________
Statistic: _____________________________________________________________________________________________
8. A school administrator wants to estimate the mean score on the verbal portion of a
Standardized English Test for students whose first language is not English. From a simple
random sample of 20 students whose first language is not English, the administrator obtains
a sample mean verbal score of 458.
Population: _____________________________________________________________________________________________
Parameter: _____________________________________________________________________________________________
Sample: _____________________________________________________________________________________________
Statistic: _____________________________________________________________________________________________
9. A language teacher wanted to test the effectiveness of a new reading comprehension program
for kindergarten learners in the school districts of Paniqui. The mean score of 100 randomly
selected learners on the 50-item reading comprehension test is 27.52.
Population: _____________________________________________________________________________________________
Parameter: _____________________________________________________________________________________________
Sample: _____________________________________________________________________________________________
Statistic: _____________________________________________________________________________________________
10. A nutritionist wanted to determine the average daily calorie intake of adolescents who engage in
excessive mobile gaming. A sample of 1000 such adolescents revealed that their
average daily calorie intake is 1998 calories.
Population: _____________________________________________________________________________________________
Parameter: _____________________________________________________________________________________________
Sample: _____________________________________________________________________________________________
Statistic: _____________________________________________________________________________________________
SCALES OF MEASUREMENT
Data can be classified as being on one of four scales: nominal, ordinal, interval or ratio. Each
level of measurement has some important properties that are useful to know.
Nominal Scale
A nominal scale is a scale of measurement used to assign events or objects into
discrete categories. This form of scale does not require the use of numeric values or
categories ranked by class, but simply unique identifiers to label each distinct category.
Often regarded as the most basic form of measurement, nominal scales are used to
categorize and analyze data in many disciplines. Examples of nominal scale are sex (male or
female), eye color (black, brown, green, blue, etc.) and religious affiliation.
Ordinal Scale
Ordinal data are categorical data that have a naturally occurring order, but the
differences between the values are unknown. With an ordinal scale, the order of the values is
what is important and significant, but the differences between them are not really known.
“Ordinal” indicates “order”. Ordinal data can be named, grouped, and also ranked. Ordinal scales are
typically measures of non-numeric concepts like satisfaction, happiness, discomfort, etc. For
example, when asked how satisfied you are with your mobile device, you can answer
“very satisfied”, “satisfied”, “so-so”, “unsatisfied”, or “very unsatisfied”.
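The satisfaction example can be sketched in Python. The numeric ranks 0 to 4 below are an assumption used only to encode order; they say nothing about how far apart the categories are. For that reason the sketch sorts the responses and reports the median category, both of which rely only on order, instead of averaging the ranks. The five survey responses are hypothetical.

```python
import statistics

# The ordinal categories, listed from lowest to highest.
SCALE = ["very unsatisfied", "unsatisfied", "so-so", "satisfied", "very satisfied"]
RANK = {label: i for i, label in enumerate(SCALE)}

# Hypothetical survey responses.
responses = ["satisfied", "so-so", "very satisfied", "satisfied", "unsatisfied"]

# Ranks let us sort and compare responses...
ordered = sorted(responses, key=RANK.get)
# ...and find the median category, which needs only the order.
# median_low always returns an actual rank, so it maps back to a real category.
median_label = SCALE[statistics.median_low([RANK[r] for r in responses])]

print(ordered)
print("median response:", median_label)
```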
Interval Scale
The interval scale is a quantitative measurement scale in which there is order, the
difference between any two values is meaningful and equal, and the zero point is
arbitrary. It measures variables that exist along a common scale at equal intervals, so the
distance between any two values can be calculated meaningfully.
Interval scales are numeric scales in which we know both the order and the exact
differences between the values. The classic example of an interval scale is Celsius
temperature because the difference between each value is the same. For example, the
difference between 60 and 50 degrees is a measurable 10 degrees, as is the difference
between 80 and 70 degrees.
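Ratio Scale
Ratio scale is a type of variable measurement scale which is quantitative in nature. It has all
the properties of an interval scale and, in addition, a true zero point that indicates the absence
of the quantity being measured, so ratios between values are meaningful (for example, 80 kg is
twice as heavy as 40 kg). This makes the ratio scale the most informative of the four scales.
Common examples of ratio-scale variables are height, weight, age, and money.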
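The difference between the interval and ratio scales can be shown with temperature. Celsius is an interval scale, while Kelvin, whose zero is absolute zero, is a ratio scale; the conversion K = °C + 273.15 is standard. The Python sketch below (the two temperatures are arbitrary choices) shows that a 10-degree difference survives the change of units, but the ratio does not: 20 °C is not “twice as hot” as 10 °C.

```python
def celsius_to_kelvin(c):
    """Convert Celsius to Kelvin; Kelvin's zero is absolute zero (a true zero)."""
    return c + 273.15

a_c, b_c = 20.0, 10.0
a_k, b_k = celsius_to_kelvin(a_c), celsius_to_kelvin(b_c)

# Interval property: the 10-degree difference is the same in both units.
diff_c, diff_k = a_c - b_c, a_k - b_k
# Ratios are not preserved, because 0 °C is not the absence of temperature.
ratio_c, ratio_k = a_c / b_c, a_k / b_k

print(f"difference: {diff_c:.2f} in Celsius vs {diff_k:.2f} in Kelvin")
print(f"ratio:      {ratio_c:.3f} in Celsius vs {ratio_k:.3f} in Kelvin")
```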
Determine the level of measurement of each of the following data by writing nominal, ordinal,
interval, or ratio.
Post-Assessment
Answer the Post-Assessment posted in Google Classroom. If you are unable to
connect to Google Classroom, you can access the Post-Assessment by using this
link https://bre.is/brBJ4NYw or by scanning the QR Code presented. You can access the Post-
Assessment using your laptop or your mobile devices.