Introduction To Statistics: Objectives
Introduction to Statistics
INTRODUCTION
Welcome to the world of statistics! You are about to encounter numbers, tables, names, graphs,
probabilities, and trends; in other words, everything statistics is about.
This module will teach you what descriptive statistics is all about. Statistics is an orderly science;
hence, it can be understood easily. This module gives you a conceptual understanding of the statistical
procedures used in nursing as well as the computational skills to carry out these procedures. At the
end of the module, some activities and exercises are given. Please do the activities and answer the
questions, because they will enhance your mastery of the lesson. Approach this module with an open
and positive mind. You will like statistics because it is a very useful course.
OBJECTIVES
Statistics is the science of data. It is a meaningful and useful science whose scope of application
to nursing and other health sciences, government, business, and the physical and
biopsychosocial sciences is nearly limitless. What about you? What comes to mind when you think of
statistics? Does it bring to mind unemployment figures, election returns, or basketball scores?
Or is it simply a graduate course requirement you have to complete?
Statistics is logical. It has a key role in critical thinking in the classroom, in the hospital, on the job, and
in everyday life. Thus, the time you spend studying the subject will repay you in many ways later.
Each of us has a built-in system of reference that helps us make decisions. However, we also have a
built-in set of prejudices that may affect our decisions. One definite advantage of statistics is that it can
help us make decisions without prejudice. Moreover, statistics can be used for making decisions when
faced with uncertainty. For example, suppose you want to estimate the proportion of nurses enrolled
in this course who will finish it on time. You would need statistics to predict how many will finish
and how many will not.
The general prerequisite for statistical decision-making is the gathering of numerical facts or
information. Procedures for evaluating numerical data, together with rules of inference, are prime
topics in the study of statistics.
In this sense, statisticians are trained in collecting, evaluating, and drawing conclusions from
numerical information. More importantly, statisticians determine what information is relevant to a
given problem and whether the conclusions drawn from a study can be trusted.
Statistical methods by themselves have no power to work miracles; however, they can help
us make decisions. Furthermore, statistical results should be interpreted by someone who
understands not only the methods but also the subject matter, especially the conceptual or theoretical
framework to which the statistics have been applied.
Thus, statistics is the science of data that involves collecting, classifying, summarizing, organizing,
analyzing, and interpreting numerical information or data.
Statistical methods are useful for studying, analyzing, and learning about populations. A population is
a set of units, such as people, objects, transactions, or events, that we are interested in studying. For
example, populations may include:
1. People
1.1 all Filipino women working in foreign countries
1.2 all registered nurses in the Philippines
1.3 everyone who is enrolled in nursing at WCC Antipolo
2. Objects
2.1 all theses and dissertations done in 1998
2.2 all stores selling Filipino products
2.3 all shoes manufactured in Marikina
3. Transactions
3.1 all memos of agreement signed by the WCC Antipolo administration in 1998
3.2 all sales of Jollibee foods delivered to the WCC College of Nursing from the Antipolo
branch in January-February 1999
3.3 all promotions of the WCC Antipolo faculty in 1997
4. Events
4.1 all victims of fireworks accidents brought to the PGH emergency room in December 1998
and January 1999
4.2 all birthday celebrations of graduating students in April 1999
4.3 all births registered at all Manila hospitals on February 14, 1999
In the above examples, you will notice that each set includes all the units in the population.
According to McClave and Sincich (1997), it is possible to measure a characteristic for every unit in
the population if the population you wish to study is small. For example, if you are measuring the
high school GPA of all incoming first-year students at WCC Antipolo, it is feasible to obtain these
data. When we measure a characteristic for every unit of a population, the result is a census of the
population.
Often it is not feasible to study the entire population. For instance, how would you measure the
weight and height of every 5-year-old boy in the Philippines? For such a population, conducting a
census would be prohibitively time-consuming and costly. A reasonable alternative is to select
and study a subset, or portion, of the population.
A sample is a subset of a population: a finite number of units selected from the population. Thus, a
sample is simply a part of the population. However, not every sample is representative of the population. To
be representative, a sample must be selected randomly. A random sample is determined
completely by chance. According to Brase and Brase (1983), in simple random sampling every
member, or unit, of the population has an equal probability, or chance, of being included in the sample.
For example, instead of polling all 139,000 registered nurses in the Philippines about whom they
voted for in the 1998 presidential election, a pollster can randomly select a sample of 1,000
registered nurses to represent all the registered nurses in the Philippines.
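The selection step can be sketched in Python. This is a minimal illustration, assuming the pollster has a numbered list (a sampling frame) of all 139,000 registered nurses; the ID numbers and the helper name `simple_random_sample` are hypothetical. Python's standard `random.sample` draws without replacement, so each nurse can be picked at most once and every nurse has the same chance of selection.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Hypothetical helper: draw n distinct units, each with the same chance of selection."""
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    return rng.sample(population, n)  # sampling without replacement


# Hypothetical sampling frame: ID numbers 1..139,000 standing in for the registered nurses.
nurse_ids = range(1, 139_001)

sample = simple_random_sample(nurse_ids, 1_000, seed=1998)
print(len(sample), len(set(sample)))  # 1000 1000: 1,000 different nurses
```

Changing or omitting `seed` produces a different sample; fixing it lets someone else reproduce the same 1,000 IDs.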
In studying a population, we focus on one or more characteristics or properties of the units in the
population. Such characteristics are called variables.
Example 1
A PhD student in Nursing investigated the number of children per household in Quezon City.
A sample of 500 households in Quezon City was randomly selected to determine the number of
children per family.
a. Describe the population
b. Describe the sample
c. Describe the variable of interest
Solution
a. The population is the set of all households in Quezon City.
b. The sample is the 500 households randomly selected from Quezon City.
c. The variable of interest is the total number of children per household.
Example 2
“Cola wars” is the popular term for the intense competition between Coca Cola and
Pepsi Cola displayed in their marketing campaigns. Their campaigns have featured movie and
television stars, rock videos, athletic endorsements, and claims of consumer preference based
on taste tests. Suppose, as part of a Pepsi marketing campaign, 1,000 cola consumers are
given a blind taste test (i.e., a taste test in which the two brand names are disguised). Each
consumer is asked to state a preference between brand A and brand B.
a. Describe the population
b. Describe the sample
c. Describe the variable of interest
Solution
a. The population of interest is the collection or set of all cola consumers.
b. The sample is the 1,000 consumers selected from the population of all cola
consumers.
c. The characteristic that Pepsi wants to measure is the consumer’s cola preference.
1.2.3 Measurement
Statistics can be applied to the analysis of a variable if the variable can be represented numerically. We
do this through the process of measurement. Measurement is the process we use to assign numbers
to variables of individual population units. For example, we can measure the teaching performance of
a faculty member by asking all his/her students to rate his/her performance on a scale from 1 to 10. Or,
we can measure research assistants' ages by simply asking them their actual ages. To gather data for a
variable, we can use either quantitative measurements or qualitative measurements.
Quantitative measurements use a naturally occurring numerical scale to describe the size or amount of a
particular characteristic.
Examples:
1. The temperature (in degrees Celsius) at which 20 pieces of heat-resistant plastic begin to
melt.
2. The current unemployment rate (measured as a percentage) for each province and city of
the Philippines.
3. The NMAT scores of a sample of 150 medical school applicants (the NMAT is administered
nationwide).
4. The number of master's students who successfully finished the degree over a ten-year period.
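Qualitative measurements cannot be recorded on a naturally occurring numerical scale; they can only be
classified into categories.
Examples: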
1. The political party affiliation (Lakas NUCD, Laban, Peoples’ Party, Masang Makabayan,
or Independent) of 100 voters from Parañaque.
2. The academic status (pass or fail) on the comprehensive exam of 20 doctoral students.
3. The size of the refrigerators (big, medium, small) rented by each of a sample of 30 transient
boarders.
4. The rankings (best, average, worst) of four brands of salad dressing given by each member of a
panel of 10 taste testers.
After the variables of interest for every unit in the sample or population are measured, the data are
analyzed either by descriptive or inferential statistical methods.
Descriptive statistics utilizes numerical and graphical methods to look for patterns in a data set and to
summarize the information in a convenient form.
Inferential statistics utilizes sample data to make estimates, decisions, predictions, or other
generalizations about a population. In this unit, we will focus only on descriptive statistics.
Let us now pause for some activities and exercises. Compare your responses with the answers given at
the end of this module. Do not skip these exercise questions; they are important.
As the media show every day, there is a need to evaluate the flood of information reaching our
homes. Each day the media present us with published results on economic, health, social, and other
concerns. The growth in data collection associated with scientific phenomena, business operations,
and government activities (quality control, statistical auditing, forecasting, etc.) has been remarkable
in the 1990s. This scenario demands that each of us develop a discerning sense: an ability to
use rational thought to interpret the meaning of data. This ability can help us make intelligent
decisions, inferences, and generalizations, and to think critically. This is possible with the use of statistics.
Statistical thinking involves applying rational thought to critically assess data and the inferences made
from them.
Are you still with me? Let us pause and do some activities.
1.4 SUMMATION NOTATION
In statistics, it is necessary to work with sums of numerical values. To express these, we make use of
standard notation. Let us consider the exam scores of Bertha Pila on 9 statistics exams.
In mathematical notation, the letter X denotes a score in a data set. From Bertha’s scores, we have the
following data:
X1 = score on Exam 1 = 88
X2 = score on Exam 2 = 6
X3 = score on Exam 3 = 46
X4 = score on Exam 4 = 55
X5 = score on Exam 5 = 28
X6 = score on Exam 6 = 9
X7 = score on Exam 7 = 78
X8 = score on Exam 8 = 64
X9 = score on Exam 9 = 16
The numbers 1 to 9 written beside the Xs are called subscripts. They indicate the first through the ninth
observed scores in a given data set. In this case, $X_1$ represents Bertha’s score on the first exam, while $X_9$
represents her score on the ninth exam. In general, $X_i$ denotes the ith value in a data set. Using this
notation, the sum of Bertha’s exam scores can be expressed symbolically as:
$$X_1 + X_2 + X_3 + X_4 + X_5 + X_6 + X_7 + X_8 + X_9$$
But instead of writing down all these Xs, we can simply express this sum as
$$\sum_{i=1}^{9} X_i$$
where the symbol $\sum$ (Greek capital letter “sigma”) is the summation notation used in statistics.
Thus, $\sum_{i=1}^{9} X_i$ tells us to add the first, second, third, and so on through the ninth values.
In statistics, we usually compute the total sum rather than a partial sum, so $\sum_{i=1}^{9} X_i$ can be further
simplified to $\sum X$, which means “summation of all the scores” in a data set. For Bertha’s scores:
$$\sum_{i=1}^{9} X_i = X_1 + X_2 + X_3 + X_4 + X_5 + X_6 + X_7 + X_8 + X_9 = 88 + 6 + 46 + 55 + 28 + 9 + 78 + 64 + 16 = 390$$
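The same sum can be checked in Python. In this sketch the loop index `i` runs from 1 to 9 to mirror the limits of the sigma notation; because Python lists start at index 0, the value $X_i$ is stored at `scores[i - 1]`.

```python
# Bertha Pila's nine exam scores, X1 through X9.
scores = [88, 6, 46, 55, 28, 9, 78, 64, 16]

total = 0
for i in range(1, len(scores) + 1):  # i = 1, 2, ..., 9, as in the sigma notation
    total += scores[i - 1]           # add X_i (Python lists are 0-indexed)

print(total)        # 390
print(sum(scores))  # 390: the built-in sum() gives the same total
```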
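Rule 1: $\sum XY$ is not equal to $\sum X \sum Y$.
$$\sum XY \stackrel{?}{=} \sum X \sum Y \;\Rightarrow\; 32 \stackrel{?}{=} (6)(15) \;\Rightarrow\; 32 \neq 90$$
Therefore, $\sum XY \neq \sum X \sum Y$.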
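Rule 2: $\sum (X + C)$ is not equal to $\sum X + C$.
$$\sum (X + C) \stackrel{?}{=} \sum X + C \;\Rightarrow\; 36 \stackrel{?}{=} 21 + 5 \;\Rightarrow\; 36 \neq 26$$
Therefore, $\sum (X + C) \neq \sum X + C$.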
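Rule 3: $(\sum X)^2$ is not equal to $\sum X^2$.
Example:
X     X²
2     4
4     16
6     36
∑X = 12     ∑X² = 56
Steps:
1. Multiply each X value by itself.
2. Get $\sum X$ and $\sum X^2$.
3. Check if $(\sum X)^2 = \sum X^2$:
$$(\sum X)^2 \stackrel{?}{=} \sum X^2 \;\Rightarrow\; (12)^2 \stackrel{?}{=} 56 \;\Rightarrow\; (12)(12) \stackrel{?}{=} 56 \;\Rightarrow\; 144 \neq 56$$
Therefore, $(\sum X)^2 \neq \sum X^2$.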
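The three rules can be checked numerically with a short Python sketch. It reuses the X values 2, 4, and 6 from the Rule 3 example; the paired Y values (1, 3, 5) and the constant C = 5 are hypothetical, chosen only for illustration, because the data behind Rules 1 and 2 are not shown. The sketch also prints $\sum X + NC$, which is what adding a constant C to each of N scores actually gives.

```python
# X values from the Rule 3 example; Y and C are hypothetical, for illustration only.
X = [2, 4, 6]
Y = [1, 3, 5]  # hypothetical paired values
C = 5          # hypothetical constant
N = len(X)     # number of scores

sum_x = sum(X)
sum_y = sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))  # multiply each pair, then add
sum_x_plus_c = sum(x + C for x in X)       # add C to each score, then add
sum_x_squared = sum(x * x for x in X)      # square each score, then add

# Rule 1: the sum of products is not the product of the sums.
print(sum_xy, sum_x * sum_y)                   # 44 108
# Rule 2: C is added to each of the N scores, so the total grows by N * C, not C.
print(sum_x_plus_c, sum_x + C, sum_x + N * C)  # 27 17 27
# Rule 3: the square of the sum is not the sum of the squares.
print(sum_x ** 2, sum_x_squared)               # 144 56
```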
SUMMARY
In this module, we saw that statistics is the study of how to collect, organize, analyze, and interpret
numerical information. We investigated some types of problems where statistics can be used, and in these
situations we saw examples of populations and samples. It is important to remember that the main role
of inferential statistics is to draw conclusions about a population based on information obtained from a
sample, whereas the main role of descriptive statistics is to present or summarize a large mass of data
in a manageable form. We also saw the elements of statistics and, finally, the role of statistics in
critical thinking. With all this, let us cultivate a liking for this course. We shall
learn more as we study the other modules. Keep up the good work of reading your modules. Statistics
is a skill; you will soon have it.
2
Frequency Distributions
INTRODUCTION
The initial step in the descriptive process (that is, describing the data and the cases represented by
those data) is the organization of otherwise disorganized information and the condensation of
otherwise unmanageably large quantities of information.
A large mass of data may be organized by creating a frequency distribution table containing the
following components: frequency, percentage, cumulative frequency, and cumulative percentage. This
module first discusses ungrouped frequency distributions and then grouped frequency distributions.
OBJECTIVES
TABLE 2.1 Raw Scores on the Statistics Final Examination of Master’s Students
81 94 90 80 87 80 85 95
83 92 87 70 96 76 87 89
86 79 75 83 84 75 81 81
81 84 70 78 96 94 88 78
80 77 93 87 77 78 79 72
Table 2.2, on the other hand, presents another version of the data in Table 2.1. Notice that the final
examination scores are now arranged in order from highest to lowest in the first column, labeled X.
Frequencies are then listed in the second column, labeled f, showing how many students received each
listed score. When data are organized this way, we can see at a glance that the scores ranged from a
low of 70 to a high of 96, or that four students had a score of 81 and another four had a score of 87.
Such a presentation is called an ungrouped frequency distribution. Ungrouped frequency distributions
begin the process of organizing the data into a meaningful form. An ungrouped frequency distribution
table can include columns for raw score (X), frequency (f), percentage (%), cumulative
frequency (cf), and cumulative percentage (c%).
2.1.1 Frequencies
To determine the frequencies of the scores in the data set, first arrange the raw scores in ascending or
descending order (as shown in Table 2.2). Then, under the f column, indicate the number of times
each score appears in the data set (see Table 2.1). Notice that the sum of all the frequency values (∑f)
is equal to N, the total number of observations or scores in the data set.
TABLE 2.2 Ungrouped Frequency Distribution of the Statistics Final Examination Scores of 40
Master’s Students
X    f     %      cf (≤X)   cf (≥X)   c%
96   2     5.0    40        2         100.0
95   1     2.5    38        3         95.0
94   2     5.0    37        5         92.5
93   1     2.5    35        6         87.5
92   1     2.5    34        7         85.0
91   0     0.0    33        7         82.5
90   1     2.5    33        8         82.5
89   1     2.5    32        9         80.0
88   1     2.5    31        10        77.5
87   4     10.0   30        14        75.0
86   1     2.5    26        15        65.0
85   1     2.5    25        16        62.5
84   2     5.0    24        18        60.0
83   2     5.0    22        20        55.0
82   0     0.0    20        20        50.0
81   4     10.0   20        24        50.0
80   3     7.5    16        27        40.0
79   2     5.0    13        29        32.5
78   3     7.5    11        32        27.5
77   2     5.0    8         34        20.0
76   1     2.5    6         35        15.0
75   2     5.0    5         37        12.5
74   0     0.0    3         37        7.5
73   0     0.0    3         37        7.5
72   1     2.5    3         38        7.5
71   0     0.0    2         38        5.0
70   2     5.0    2         40        5.0
∑f = N = 40
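The tallying described under 2.1.1 can be sketched with Python's `collections.Counter`, using the raw scores of Table 2.1. The loop lists every score from the highest to the lowest, including scores such as 91 that have a frequency of 0, in the same order as Table 2.2.

```python
from collections import Counter

# Raw final-exam scores of the 40 master's students in Table 2.1, row by row.
raw_scores = [
    81, 94, 90, 80, 87, 80, 85, 95,
    83, 92, 87, 70, 96, 76, 87, 89,
    86, 79, 75, 83, 84, 75, 81, 81,
    81, 84, 70, 78, 96, 94, 88, 78,
    80, 77, 93, 87, 77, 78, 79, 72,
]

freq = Counter(raw_scores)  # maps each score X to its frequency f
N = len(raw_scores)

# Descending order, as in Table 2.2; Counter returns 0 for scores that never occur.
for x in range(max(raw_scores), min(raw_scores) - 1, -1):
    print(x, freq[x])

print(sum(freq.values()) == N)  # True: the frequencies add up to N = 40
```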
The percentage associated with each score can be computed using this equation:
$$\text{Percentage } (\%) = \frac{f}{N} \times 100$$
where f = each score’s frequency of occurrence
N = total number of scores in the distribution
Percentages have one advantage over frequencies: it is often easier to compare two or more
percentages than two or more frequencies. This is particularly true when two or more
distributions have different sample sizes.
The cumulative percentage for any given score is computed using this equation:
$$c\% = \frac{cf}{N} \times 100$$
where cf = the cumulative frequency listed for a score
N = total number of scores in the distribution
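Both equations can be applied to every row at once. This sketch, again using the Table 2.1 scores, accumulates frequencies from the lowest score upward, so `cf` is the number of scores at or below X, the column on which c% in Table 2.2 is based; `cf_at_or_above` is the count of scores at or above X. Printed values are rounded to one decimal place, as in the table.

```python
from collections import Counter

# Raw final-exam scores from Table 2.1.
raw_scores = [
    81, 94, 90, 80, 87, 80, 85, 95,
    83, 92, 87, 70, 96, 76, 87, 89,
    86, 79, 75, 83, 84, 75, 81, 81,
    81, 84, 70, 78, 96, 94, 88, 78,
    80, 77, 93, 87, 77, 78, 79, 72,
]
N = len(raw_scores)
freq = Counter(raw_scores)

rows = []
cf = 0
for x in range(min(raw_scores), max(raw_scores) + 1):  # lowest score upward
    f = freq[x]
    cf += f                        # scores at or below x
    pct = f / N * 100              # Percentage (%) = f / N x 100
    c_pct = cf / N * 100           # c% = cf / N x 100
    cf_at_or_above = N - cf + f    # scores at or above x
    rows.append((x, f, pct, cf, cf_at_or_above, c_pct))

# Print the highest score first, as Table 2.2 does.
for x, f, pct, cf_x, cf_up, c_pct in reversed(rows):
    print(f"{x:>3} {f:>2} {pct:>5.1f} {cf_x:>3} {cf_up:>3} {c_pct:>6.1f}")
```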
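To construct a grouped frequency distribution for the data set in Table 2.1, do the following steps:
$$i = \frac{R}{\text{number of class intervals}} = 4.5 \approx 5$$
where i is the size (width) of each class interval and R is the range of the scores.
4. Determine f, %, cf, and c%.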
Table 2.3 Grouped Frequency Distribution of Statistics Final Exam Scores of 40 Nursing Master’s
Students
Class Interval f % cf c%
95-99 3 7.5 40 100.0
90-94 5 12.5 37 92.5
85-89 8 20.0 32 80.0
80-84 11 27.5 24 60.0
75-79 10 25.0 13 32.5
70-74 3 7.5 3 7.5
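The grouping can be sketched the same way. This sketch assumes, as in Table 2.3, a class interval size of i = 5 and a first interval starting at 70; each score is assigned to the interval whose lower limit is `70 + 5 * ((X - 70) // 5)`.

```python
from collections import Counter

# Raw final-exam scores from Table 2.1.
raw_scores = [
    81, 94, 90, 80, 87, 80, 85, 95,
    83, 92, 87, 70, 96, 76, 87, 89,
    86, 79, 75, 83, 84, 75, 81, 81,
    81, 84, 70, 78, 96, 94, 88, 78,
    80, 77, 93, 87, 77, 78, 79, 72,
]
N = len(raw_scores)

width = 5   # class interval size i
start = 70  # lower limit of the first class interval (70-74)

# Map each score to the lower limit of its class interval, e.g. 83 -> 80.
groups = Counter(start + (x - start) // width * width for x in raw_scores)

rows = []
cf = 0
for lower in range(start, max(raw_scores) + 1, width):
    f = groups[lower]
    cf += f
    label = f"{lower}-{lower + width - 1}"
    rows.append((label, f, f / N * 100, cf, cf / N * 100))

# Print the highest interval first, as Table 2.3 does.
for label, f, pct, cf_x, c_pct in reversed(rows):
    print(f"{label:>6} {f:>3} {pct:>5.1f} {cf_x:>3} {c_pct:>6.1f}")
```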
In comparing Table 2.2 with Table 2.3, we see that the grouped frequency distribution table has
class intervals while the ungrouped table lists individual scores. Furthermore, grouped frequency distributions provide a
simpler, more economical description of the data than ungrouped frequency distributions do. By
combining several scores into one class interval, grouped frequency distributions reduce the total
amount of information that a reader must digest.
Again, take a look at the class intervals in Table 2.3. Each class interval is bounded by numbers called
real limits or exact limits: a lower exact limit and an upper exact limit. For example, the lower and upper
exact limits of the class interval 85-89 are 84.5 and 89.5, respectively. Furthermore, each class interval
can be represented by a single value, its midpoint. A midpoint is the middle value in a class
interval; for the class interval 80-84, the midpoint is 82.
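Exact limits and midpoints for all six intervals of Table 2.3 can be computed as follows. The sketch assumes, as the 85-89 example implies, that scores are whole numbers, so each exact limit lies 0.5 beyond the stated limit.

```python
width = 5
limits = {}

for lower in range(70, 100, width):  # stated lower limits 70, 75, ..., 95
    upper = lower + width - 1        # stated upper limit, e.g. 74
    lower_exact = lower - 0.5        # exact (real) lower limit, e.g. 69.5
    upper_exact = upper + 0.5        # exact (real) upper limit, e.g. 74.5
    midpoint = (lower + upper) / 2   # middle value of the interval, e.g. 72.0
    limits[f"{lower}-{upper}"] = (lower_exact, upper_exact, midpoint)

for label, (lo, hi, mid) in limits.items():
    print(f"{label}: exact limits {lo}-{hi}, midpoint {mid}")
```

Averaging the two exact limits, (84.5 + 89.5) / 2, gives the same midpoint as averaging the stated limits, 87.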
SUMMARY
This module showed you the importance of arranging data and presenting them in distribution tables
that show the frequency, percentage, cumulative frequency, and cumulative percentage.
One application of a frequency distribution is that it can give us an idea of how many students
performed below a given passing score. It can also give us a picture of how well or how badly a student
performed in a class relative to the scores of the other students.
In the succeeding modules, you will see more of this frequency distribution theme presented in
graphs, histograms, and other position measures. I encourage you to go on; statistics is not
really hard, because it is a science of order and logic.
So, until next time, keep on doing the activities because they will build your statistical skills.