Understanding Data and Variables in Statistics

1. Statistics involves collecting, organizing, summarizing, analyzing, and drawing conclusions from data. Data are values that variables can assume, and variables are characteristics that can differ. 2. There are different types of variables (quantitative, qualitative, discrete, continuous) and levels of measurement (nominal, ordinal, interval, ratio). Data can be collected from populations or samples using various sampling methods. 3. Organizing data involves constructing frequency distributions to group values into classes. Presenting data uses charts and graphs like histograms, frequency polygons, and ogives to visualize patterns in the data.

STATISTICS FOR RESEARCH

DATA AND VARIABLES

1 Statistics is the science of conducting studies to collect, organize, summarize, analyze, and draw
conclusions from data

2 Data are the values, either measurements or observations, that variables can assume

3 A variable is a characteristic or attribute that can assume different values

4 If the values of a variable are obtained by chance, it is called a random variable

5 If data are numerical, such as the age of people, the variable is called a Quantitative Variable. Being
numerical, these variables can be ordered and ranked

6 On the other hand, when variables take the value of some characteristic or attribute, such as
gender, which is not numerical, these variables are called Qualitative Variables.

7 Discrete variables are Quantitative Variables that can be counted, such as 6 children or 10 students
(there is no 0.5 child or 0.025 student)

8 Continuous Variables are also Quantitative Variables that can assume all values between any two
specific values. For instance, temperature which can have values between, say 0 and 100, or -5
and 200.

9 When the values of a Continuous Variable are rounded off, for example to the nearest whole
number, and then grouped, the lower and upper boundaries of the grouping are given one additional
decimal place and always end with the digit 5.
For instance, the value “71” belongs to the group whose boundaries are 70.5 (71 minus 0.5)
and 71.5 (71 plus 0.5); 62.1 belongs to the group whose boundaries are 62.05 (62.1 minus 0.05) and
62.15 (62.1 plus 0.05).
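
The rule in item 9 can be sketched in a few lines of Python. This is only an illustration: the function name `rounding_boundaries` is made up here, and the value is passed as text so that its number of decimal places is not lost.

```python
from decimal import Decimal

def rounding_boundaries(recorded):
    """Return the (lower, upper) boundaries implied by a rounded value.

    `recorded` is given as text, e.g. "62.1", so the number of decimal
    places in the recorded value is preserved exactly.
    """
    value = Decimal(recorded)
    # Exponent is 0 for whole numbers, -1 for tenths, -2 for hundredths.
    exponent = value.as_tuple().exponent
    # Half of one unit in the last recorded place: 0.5, 0.05, 0.005, ...
    half_unit = Decimal(5).scaleb(exponent - 1)
    return value - half_unit, value + half_unit

print(rounding_boundaries("71"))    # boundaries 70.5 and 71.5
print(rounding_boundaries("62.1"))  # boundaries 62.05 and 62.15
```

Exact decimal arithmetic is used instead of floating point so that the boundaries always end in the digit 5, as the item states.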

10 Data can also be classified according to how they are categorized, counted or measured using
measurement scales. For instance eggs can be ranked as small, medium, large; a community can
be classified as rural, suburban, or urban.

11 There are four (4) types of measurement scales: nominal, ordinal, interval, and ratio

12 Nominal level of measurement classifies data into non-overlapping categories which cannot be
ordered or ranked. For example: Male or Female; Christian, Jew, Hindu. Note that even if data are
numbers, when no meaningful rank or order can be assigned to them, the level of
measurement is still nominal. Postal ZIP codes are examples of numbers that are nominal.

13 Ordinal level of measurement allows data to be categorized, and these categories can be ordered or
ranked. For example, grades can be ranked as A, B, C, D, E and F; candidates can be ranked as
first, second, third. Note that precise differences between the ranks do not exist.

14 Interval level of measurement differs from the ordinal level in that precise differences between
units exist, but there is no real zero. For example, temperature in degrees Celsius or Fahrenheit:
zero (0) temperature does not mean there is no heat.

15 Ratio level of measurement possesses all the characteristics of Interval measurement but there is
a real zero. Examples are height, weight, area etc.

16 Sometimes data can be altered so that they fit into a different level of measurement. Take,
for instance, the income of contractors. Although it is a ratio variable, when grouped as low, average,
and high, income becomes an ordinal variable.
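
As a rough sketch of item 16, the snippet below regroups a ratio-level income into the ordinal labels low, average, and high. The cutoffs and the income figures are invented for illustration only; a real study would define its own.

```python
def income_level(income, low_cutoff=20_000, high_cutoff=60_000):
    """Map a ratio-level income to an ordinal label.

    Both cutoffs are hypothetical values chosen for this example.
    """
    if income < low_cutoff:
        return "low"
    if income < high_cutoff:
        return "average"
    return "high"

incomes = [12_000, 35_000, 58_000, 90_000]  # made-up contractor incomes
levels = [income_level(x) for x in incomes]
print(levels)  # ['low', 'average', 'average', 'high']
```

After the regrouping, the labels can still be ranked, but the exact differences between incomes are no longer available.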

17 Data may be obtained from a Population or a Sample.

18 Population consists of ALL subjects/objects being studied

19 Sample is a group of subjects/objects selected from the Population

20 Descriptive Statistics consists of the collection, organization, summarization, and presentation of
data used to describe a situation.

21 Inferential Statistics uses probability, i.e., the chance of something occurring, to determine how
certain things or events will turn out. It therefore consists of generalizing situations or conditions
for a population based on samples, performing estimations and hypothesis tests,
determining relationships among variables, and making predictions

DATA COLLECTION

1 Data are obtained by surveys from respondents, survey of records, or direct observation

2 Surveys can be done by Telephone survey, Mailed questionnaire, or Personal interview survey.

3 It is more convenient and less costly to obtain data from samples rather than from a population

4 Samples must not be biased, meaning, all subjects of the population must be represented in the
sample or have equal chances of being selected from the population. Otherwise data obtained
may be misleading.

5 The four methods used for sampling are: random, systematic, stratified, and cluster

6 Random sampling obtains samples by chance. For instance, subjects in a population are assigned
numbers. Numbers randomly generated by a calculator or computer are used as the basis for
selecting the sample members. A table of random numbers can also be used. If random numbers
are not available, the “fishbowl” technique is used: the assigned numbers of the
subjects, or sometimes the names of the subjects, are written on pieces of paper, placed in a
container, then drawn out randomly. The subjects drawn make up the sample
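
A minimal sketch of random sampling with computer-generated random numbers, using Python's standard `random` module. The population of 50 numbered subjects and the sample size of 10 are assumptions made for the example.

```python
import random

population = list(range(1, 51))  # hypothetical: subjects numbered 1 to 50
rng = random.Random(2024)        # fixed seed only so the draw can be repeated

# Draw 10 subjects without replacement, like slips pulled from a fishbowl.
sample = rng.sample(population, k=10)
print(sorted(sample))
```

Because the draw is without replacement, no subject can appear in the sample twice.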

7 Systematic Sampling obtains samples following a certain order:

• All subjects of the population are numbered;
• a certain interval or multiple is identified;
• then every subject at that interval is selected.

Note however that in this sampling technique, the first number is selected at random.
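
The steps of systematic sampling might be sketched as follows. The helper `systematic_sample`, the population of 50, and the interval of 5 are all illustrative assumptions.

```python
import random

def systematic_sample(population, interval, rng=random):
    """Pick every `interval`-th subject after a random starting point."""
    # The first subject is chosen at random among the first `interval` subjects.
    start = rng.randrange(interval)
    return population[start::interval]

population = list(range(1, 51))  # hypothetical: 50 numbered subjects
sample = systematic_sample(population, interval=5, rng=random.Random(1))
print(sample)  # 10 subjects, each 5 positions apart
```

Only the starting point is random; once it is fixed, the rest of the sample is fully determined by the interval.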

8 Stratified Sampling proceeds by grouping the members of the population according to a certain
characteristic important to the study. For instance, if the year level of students is a concern,
students can be grouped according to year level. Samples are then selected from each group.
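
A possible sketch of stratified sampling, using the year-level example. The student records, the helper name `stratified_sample`, and the choice of 3 students per year level are assumptions for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(subjects, stratum_of, per_stratum, rng=random):
    """Group subjects into strata, then draw a random sample from each."""
    strata = defaultdict(list)
    for subject in subjects:
        strata[stratum_of(subject)].append(subject)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

# Hypothetical students as (student number, year level), 10 per year level.
students = [(i, 1 + i % 4) for i in range(40)]
sample = stratified_sample(students, stratum_of=lambda s: s[1],
                           per_stratum=3, rng=random.Random(7))
for year in sorted(sample):
    print(year, sample[year])
```

Every year level is guaranteed to be represented, which simple random sampling cannot promise.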

9 Cluster Sampling obtains data from intact groups. For instance, if a study on medical patients is to
be done and there are several hospitals, one or more of the hospitals can be selected randomly and the
patients in the selected hospitals are the subjects. This method of sampling is usually used when
the subjects are spread over a wide geographical area or when the population is very large.
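
Cluster sampling could be sketched like this. The hospital names and patient IDs are invented, and selecting two hospitals is an arbitrary choice for the example.

```python
import random

# Hypothetical hospitals (the clusters) and their patient IDs.
hospitals = {
    "Hospital A": ["A1", "A2", "A3"],
    "Hospital B": ["B1", "B2"],
    "Hospital C": ["C1", "C2", "C3", "C4"],
    "Hospital D": ["D1", "D2"],
}

rng = random.Random(3)
# Randomly select whole clusters, not individual patients.
chosen = rng.sample(sorted(hospitals), k=2)
# Every patient in a selected hospital becomes a subject.
subjects = [patient for name in chosen for patient in hospitals[name]]
print(chosen, subjects)
```

The random step happens at the level of the hospital, which is what distinguishes this method from stratified sampling.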

ORGANIZING and PRESENTING DATA

1 Raw data must be organized in a meaningful way so as to be able to draw conclusions, or make
inferences about events. The most convenient method of organizing data is to construct a
frequency distribution.

2 After organizing data, these must be presented in a way that can be understood by those who
will benefit from them. The most common method of presentation is by constructing statistical
charts and graphs.

3 A frequency distribution is the organization of raw data in table form, using classes and
frequencies

4 The two types of frequency distribution are categorical and grouped

5 Categorical frequency distribution is used for data that can be placed in specific categories, such
as nominal or ordinal data. Examples of categories are religious affiliations, political affiliation,
major fields of study or course, profession, or even brand names.

For example, the following frequency distribution table shows what paint brand is preferred by
contractors:

Paint Brand      Frequency   Percent, %
Boysen                5          25
Davis                 3          15
Coat Saver            8          40
Rain or Shine         4          20
Total                20         100

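
The paint brand table can be reproduced with a short tally. In this sketch the list of responses is rebuilt from the frequencies in the table (5, 3, 8, and 4), since the individual answers are not given.

```python
from collections import Counter

# Responses rebuilt from the frequencies in the table above.
responses = (["Boysen"] * 5 + ["Davis"] * 3 +
             ["Coat Saver"] * 8 + ["Rain or Shine"] * 4)

frequency = Counter(responses)   # tally of each category
total = sum(frequency.values())
percent = {brand: 100 * count / total for brand, count in frequency.items()}

for brand, count in frequency.items():
    print(f"{brand:<15}{count:>4}{percent[brand]:>8.0f}")
print(f"{'Total':<15}{total:>4}{sum(percent.values()):>8.0f}")
```

The percent column is simply each frequency divided by the total and multiplied by 100.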
6 When the range of data is large, the data must be grouped into classes that are more than one
unit in width.

To refresh your memory, study the example below and try to recall how the frequency
distribution was created.

How were the Class Limits established/determined?


How are the Class Boundaries obtained?

Example:
The following data represent the tensile strengths, in MPa, of various metals.
Notice that 100 and 134 are the lowest and highest values, respectively.

(Raw Data)
112 100 127 120 134 118 105 110 109 112
110 118 117 116 118 122 114 114 105 109
107 112 114 115 118 117 118 122 106 110
116 108 110 121 113 120 119 111 104 111
120 113 120 117 105 110 118 112 114 114

Grouped Frequency Distribution

Class Limits   Class Boundaries   Frequency   Cumulative Frequency
100-104        99.5-104.5              2               2
105-109        104.5-109.5             8              10
110-114        109.5-114.5            18              28
115-119        114.5-119.5            13              41
120-124        119.5-124.5             7              48
125-129        124.5-129.5             1              49
130-134        129.5-134.5             1              50
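
The frequency column can be checked by tallying the raw data against the class limits. The following is a small sketch in which the variable names are illustrative.

```python
# Tensile strengths in MPa, copied from the raw data above.
data = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
    110, 118, 117, 116, 118, 122, 114, 114, 105, 109,
    107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
    120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]

class_limits = [(100, 104), (105, 109), (110, 114), (115, 119),
                (120, 124), (125, 129), (130, 134)]

# Count how many values fall within each pair of class limits (inclusive).
frequencies = [sum(low <= x <= high for x in data) for low, high in class_limits]
print(frequencies)  # [2, 8, 18, 13, 7, 1, 1]
```

The counts add up to 50, the number of observations, which is a quick check that no value was missed or counted twice.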

7 The three most commonly used graphs in research are:

• Histogram
• Frequency Polygon
• Cumulative frequency graph, also called Ogive (pronounced o-jive)

8 A Histogram is a graph that displays data using contiguous vertical bars of various heights to
represent the frequencies of the classes

9 A Frequency Polygon is a graph that displays data by using lines that connect plotted frequencies
at the midpoints of the classes. The frequencies are represented by the heights of the points

10 An Ogive is a graph that represents the cumulative frequencies of the classes in a frequency
distribution
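
To make the three definitions concrete, the sketch below computes the points each graph would plot for the tensile strength example, without drawing anything. Starting the ogive at zero on the first lower boundary is a common convention assumed here, not something stated in these notes.

```python
from itertools import accumulate

# Class boundaries and frequencies from the tensile strength example.
boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
frequencies = [2, 8, 18, 13, 7, 1, 1]

# Histogram: one bar per class, spanning its boundaries, height = frequency.
bars = list(zip(boundaries[:-1], boundaries[1:], frequencies))

# Frequency polygon: frequencies plotted at the class midpoints.
midpoints = [(low + high) / 2 for low, high in zip(boundaries, boundaries[1:])]
polygon = list(zip(midpoints, frequencies))

# Ogive: cumulative frequencies plotted at the upper class boundaries,
# starting from 0 at the first lower boundary (assumed convention).
ogive = list(zip(boundaries, [0, *accumulate(frequencies)]))

print(polygon)
print(ogive)
```

Any plotting tool can then draw the bars, connect the polygon points with lines, and connect the ogive points with a rising line.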
11 When a histogram or frequency polygon is constructed, one would notice that the data
follow certain patterns, a distribution shape. The shape may indicate that there is one peak
or two peaks. It would also show whether it is skewed to the left (values taper off to the left) or
skewed to the right (values taper off to the right). A U-shaped distribution can also occur.

12 It is important to note the distribution shape as it will determine the appropriate statistical
method to analyze data.

13 There are also other graphs such as the Pareto Chart, Time Series Graph, and the Pie Graph

14 A Pareto Chart is used to represent a frequency distribution for a categorical variable; the
frequencies are displayed by the heights of vertical bars, which are arranged in order from the
highest to the lowest
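
The ordering that defines a Pareto Chart can be sketched with the paint brand frequencies from item 5. The text bars printed here are only a stand-in for real vertical bars.

```python
frequency = {"Boysen": 5, "Davis": 3, "Coat Saver": 8, "Rain or Shine": 4}

# Arrange the categories from the highest frequency to the lowest.
pareto_order = sorted(frequency.items(), key=lambda item: item[1], reverse=True)

for brand, count in pareto_order:
    print(f"{brand:<15}{'#' * count} {count}")
```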

15 A Time Series Graph represents data that occur over a specific period of time, for instance
rainfall distribution over several hours.

16 A Pie Graph is a circle that is divided into sections or sectors according to the percentage of
frequencies in each category of distribution.
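
Since a full circle has 360 degrees, each sector's angle is its percentage of 360. A small sketch using the paint brand percentages from item 5:

```python
percent = {"Boysen": 25, "Davis": 15, "Coat Saver": 40, "Rain or Shine": 20}

# Each sector's central angle is its share of the full 360 degrees.
degrees = {brand: pct * 360 / 100 for brand, pct in percent.items()}

for brand, angle in degrees.items():
    print(f"{brand:<15}{angle:>6.0f} degrees")
```

The angles add up to 360, just as the percentages add up to 100.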

17 Procedure for constructing a Grouped Frequency Distribution

a. Establish Class Width

Width = Range / number of classes
Where: Range = Highest value – Lowest value

Round up (not round off) the width to a whole number

The number of classes is arbitrarily chosen between 5 and 20.
There is no hard-and-fast rule on the number of classes, but there should be enough such that
a clear picture of the data can be obtained.

It is preferred that the Class Width be an ODD number to ensure that the Class Midpoint has
the same place value as the data.
Where: Midpoint = (LCB + UCB)/2
OR
Midpoint = (LCL + UCL)/2

LCB = Lower Class Boundary
UCB = Upper Class Boundary
Example: in the class boundary “99.5-104.5”, LCB = 99.5 and UCB = 104.5

LCL = Lower Class Limit
UCL = Upper Class Limit
Example: in the class limit “100-104”, LCL = 100 and UCL = 104
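
Step a can be worked through with the tensile strength data from item 6. The choice of seven classes is an assumption taken from the number of rows in that example's table.

```python
import math

lowest, highest = 100, 134  # extremes of the tensile strength data in item 6
number_of_classes = 7       # assumption: the example table has seven classes

# Range = 34, and 34 / 7 is about 4.86, which is rounded UP to 5.
width = math.ceil((highest - lowest) / number_of_classes)

def midpoint(lower, upper):
    """Midpoint of a class, from its limits or from its boundaries."""
    return (lower + upper) / 2

print(width)                  # 5
print(midpoint(100, 104))     # 102.0, from the class limits
print(midpoint(99.5, 104.5))  # 102.0, from the class boundaries
print(midpoint(100, 103))     # 101.5, an even width of 4 adds a decimal place
```

The last line shows why an odd width is preferred: with an even width the midpoint falls between two whole numbers.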
b. Select a starting point for the Lower Class Limit
Starting point may be the lowest data value or lower

c. The starting point for the Upper Class Limit is obtained by:
Starting UCL = starting LCL + (width - 1)

d. Establish all other Class Limits

To get the next Lower Limit, add the width to the previous Lower Limit
To get the next Upper Limit, add the width to the previous Upper Limit

Note that the Class Limits should have the same decimal place value as the raw data.
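
Steps b to d might be sketched as below for whole-number data. The function name `class_limits` is illustrative; the starting point of 100, width of 5, and seven classes come from the tensile strength example.

```python
def class_limits(start, width, number_of_classes):
    """Build (lower, upper) class limits for whole-number data."""
    limits = []
    lower = start
    for _ in range(number_of_classes):
        upper = lower + (width - 1)   # step c: UCL = LCL + (width - 1)
        limits.append((lower, upper))
        lower += width                # step d: next LCL = previous LCL + width
    return limits

print(class_limits(start=100, width=5, number_of_classes=7))
```

Each upper limit also ends up one width above the previous upper limit, as step d requires.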

e. Establish Class Boundaries

Class Boundaries should have one more decimal place than the Class Limits and end with “5”.

The Lower Class Boundary is half a unit below the Lower Class Limit, while the Upper Class
Boundary is half a unit above the Upper Class Limit, where a unit is the last decimal place of
the data. Both boundaries therefore end in “5”.

Following the above, if data are whole numbers, subtract 0.5 from the Lower Class Limit and add
0.5 to the Upper Class Limit to obtain the Lower and Upper Class Boundaries respectively;
if data are in tenths, subtract/add 0.05; in hundredths, 0.005
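
Step e can be sketched as follows. The function name and the `decimal_places` parameter are illustrative, and the class "2.5-2.9" in the second call is a made-up example of data recorded in tenths.

```python
from decimal import Decimal

def class_boundaries(lower_limit, upper_limit, decimal_places=0):
    """Return (LCB, UCB) for class limits recorded to `decimal_places`."""
    # Half a unit of the last place: 0.5, 0.05, 0.005, ...
    half_unit = Decimal(5).scaleb(-(decimal_places + 1))
    return (Decimal(str(lower_limit)) - half_unit,
            Decimal(str(upper_limit)) + half_unit)

print(class_boundaries(100, 104))                   # 99.5 and 104.5
print(class_boundaries(2.5, 2.9, decimal_places=1)) # 2.45 and 2.95
```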

f. Tally to obtain the frequency of data

g. Cumulative Frequency
1st CF = frequency + 0, meaning 1st CF is the frequency itself
2nd CF = frequency + Previous CF
3rd CF = frequency + Previous CF
etc.

See example in item 6 under Organizing and Presenting Data
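
The running total in step g can be written out directly. This sketch uses the frequencies from the example in item 6.

```python
frequencies = [2, 8, 18, 13, 7, 1, 1]  # from the example in item 6

cumulative = []
previous_cf = 0                        # the first CF is the frequency plus 0
for frequency in frequencies:
    previous_cf = frequency + previous_cf  # CF = frequency + previous CF
    cumulative.append(previous_cf)

print(cumulative)  # [2, 10, 28, 41, 48, 49, 50]
```

The last cumulative frequency equals the total number of observations, 50.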
