Data Collection & Sampling Methods Sampling Methods
Introduction To Statistics: Part 2.
[Link]
sbanya@[Link]
University of Malawi, The Polytechnic
Monday 11th January, 2021
Data Collection
Data Collection Strategy: there is no single best way; the decision
depends on:
What you need to know: numbers or stories
Where the data reside: environment, files, people
Resources and time available
Complexity of the data to be collected
Frequency of data collection
Intended forms of data analysis
Rules for Collecting Data.
Use multiple data collection methods
Use available data, but you need to know:
how the measures were defined
how the data were collected and cleaned
the extent of missing data
how accuracy of the data was ensured
Rules for Collecting Data.
If you must collect original data:
be sensitive to burden on others
pre-test, pre-test, pre-test
establish procedures and follow them (protocol)
maintain accurate records of definitions and coding
verify the accuracy of coding and data input
Data Collection Tools
Participatory Methods
Records and Secondary Data
Observation
Surveys and Interviews
Focus Groups
Diaries, Journals, Self-reported Checklists
Other Tools
Participatory Methods
Involve groups or communities heavily in data collection
Examples:
community meetings
mapping
transect walks
Community Meetings
One of the most common participatory methods
Must be well organized
agree on purpose
establish ground rules
who will speak
time allotted for speakers
format for questions and answers
Records and Secondary data
Examples of sources:
files/records
computer data bases
industry or government reports
other reports or prior evaluations
census data and household survey data
electronic mailing lists and discussion groups
documents (budgets, organizational charts, policies and
procedures, maps, monitoring reports)
newspapers and television reports
Using Existing Data Set
Key issues to consider: validity, reliability, accuracy, response
rates, data dictionaries, and missing data rates
Advantages/Disadvantages
Advantage: Often less expensive and faster than
collecting the original data again
Disadvantages: There may be coding errors or other
problems. Data may not be exactly what is needed. You
may have difficulty getting access. You have to verify the
validity and reliability of the data.
Observation
One sees what is happening:
traffic patterns
land use patterns
layout of city and rural areas
quality of housing
condition of roads
condition of buildings
who goes to a health clinic
Observation is helpful when:
need direct information
trying to understand ongoing behavior
there is physical evidence, products, or outputs that can be
observed
need an alternative when other data collection methods are
infeasible or inappropriate
Ways to Record Information from Observations:
Observation guide: printed form with space to record observations
Recording sheet or checklist: yes/no options, tallies, rating
scales
Field notes: least structured, recorded in a narrative,
descriptive style
Guidelines for Planning Observations
Have more than one observer, if feasible
Train observers so they observe the same things
Pilot test the observation data collection instrument
For a less structured approach, have a few key questions in
mind
Advantages/Disadvantages
Advantage: Collects data on actual rather than self-reported
behavior or perceptions. It is real-time rather than retrospective.
Disadvantages: Observer bias, potentially unreliable;
interpretation and coding challenges; sampling can be a
problem; can be labor intensive; low response rates.
Surveys and Interviews
Excellent for asking people about: perceptions, opinions,
ideas
Less accurate for measuring behavior
Sample should be representative of the whole population
Low response rates are a major problem
Modes of Survey
Telephone surveys
Self-administered questionnaires distributed by mail,
e-mail, or websites
Administered questionnaires, common in the development
context
In development context, often issues of language and
translation
Advantage/Disadvantage
Advantage: Best when you want to know what people
think, believe, or perceive; only they can tell you that.
Disadvantages: People may not accurately recall their
behavior or may be reluctant to reveal their behavior if it is
illegal or stigmatized. What people think they do or say
they do is not always the same as what they actually do.
Interviews.
Often semi-structured
Used to explore complex issues in depth
Forgiving of mistakes: unclear questions can be clarified
during the interview and changed for subsequent interviews
Can provide evaluators with an intuitive sense of the
situation
Challenges of Interviews.
Can be expensive, labor intensive, and time consuming
Selective hearing on the part of the interviewer may miss
information that does not conform to pre-existing beliefs
Cultural sensitivity: e.g., gender issues
Focus Group
Type of qualitative research where small homogeneous
groups of people are brought together to informally discuss
specific topics under the guidance of a moderator
Purpose: to identify issues and themes, not just interesting
information, and not "counts"
Focus Groups are Inappropriate when:
language barriers are insurmountable
evaluator has little control over the situation
trust cannot be established
free expression cannot be ensured
confidentiality cannot be assured
Advantage/Disadvantage
Advantages: Can be conducted relatively quickly and
easily; may take less staff time than in-depth, in-person
interviews; allow flexibility to make changes in process and
questions; can explore different perspectives; can be fun.
Disadvantages: Analysis is time consuming; participants may
not be representative of the population, possibly biasing the
data; the group may be influenced by the moderator or dominant
group members.
The Population
There are two different types of population:
Target Population: Consists of the group of population
units from whom we would like to collect data (e.g. all
students at UNIMA)
Study or Survey Population: Consists of the group of
population units from whom we can collect data (e.g. all
students at UNIMA with laptops)
The Population
NOTE: Ideally a sample survey should collect data from the
Target Population, but in practice we collect data from the Study
Population due to practical constraints.
The Sample
A sample must be:
Unbiased: The chosen sample should be representative of
the entire population of interest. E.g. if we are interested
in the weight of primary school children, we should select a
sample that includes children from a range of primary
school classes and year groups.
Taken from the correct population: The sample should
only contain members of the population of interest. E.g. if
we are interested in the characteristics of primary school
children, the sample should not contain children from
secondary school.
Sampling Methods
Grouped into two categories:
Non-Probability Sampling: Involves non-random
selection based on convenience or other criteria, allowing
you to easily collect initial data.
Probability Sampling: Involves random selection,
allowing you to make statistical inferences about the whole
group.
Non-Probability Sampling
Has the following characteristics:
No sampling frame is used, therefore the chance of someone
being included in the sample cannot be calculated.
Results from the survey can be produced cheaply and
quickly.
Population coverage is poor since it only captures those
that are available to contribute at the time and/or are
interested enough in the subject under investigation.
It is difficult to make estimates of the population from the
sample results and any generalizations that are made must
be treated with caution.
Performing non-probability sampling is considerably less
expensive than probability sampling methods.
Types of Non-probability Sampling
Convenience Sampling: Data is collected from any
willing and available respondent. Examples include
Street corner interviews;
Magazine and newspaper questionnaires; and
Phone-in polls.
The sample is likely to be unrepresentative of the
population, because only those who feel strongly about the
topic are likely to respond and interviewers may only
approach one particular type of respondent, usually those
that they feel comfortable with. Therefore, the results of
the survey may be biased.
Types of Non-probability Sampling
Purposive Sampling:
Read up on purposive sampling and write down what it
is, when to use it, and its advantages and disadvantages.
Types of Non-probability Sampling
Quota Sampling: The population is divided into different
groups or classes according to different characteristics of
the population, and a quota (a fixed proportion) is set for
each group according to its share of the total population.
In quota sampling, researchers create a sample involving
individuals that represent a population.
Researchers choose these individuals according to specific
traits or qualities.
Quotas are devised to reflect the characteristics of the
population, hence quota sampling attempts to obtain a
more representative sample than convenience sampling, and
therefore more representative sample results should be
obtained.
Quota Sampling Example & Steps
A study to investigate the proportions of people who eat pizza
and who eat cake at home.
Steps
Divide the group into subgroups based on some characteristic
Identify the proportion of these subgroups in the population,
e.g. N = 10: 4 eat cake and 6 eat pizza (40% and 60%)
Lastly, select subjects to fill each subgroup's quota, e.g. 50% of each
subgroup: cake (n = 2) and pizza (n = 3), hence total sample n = 5,
which keeps the 40:60 ratio of the population
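The quota arithmetic in the steps above can be checked with a minimal Python sketch. The function name `quota_allocation` and the group labels are illustrative assumptions. The sketch only computes how many people each quota needs, because in quota sampling the people who fill each quota are then chosen non-randomly.

```python
def quota_allocation(group_sizes, fraction):
    """Number of people to recruit from each subgroup when the same
    fraction is taken from every subgroup, so the sample keeps the
    population's subgroup proportions (rounded to whole people)."""
    return {group: round(size * fraction) for group, size in group_sizes.items()}


# Population from the example: N = 10 (4 eat cake, 6 eat pizza)
population = {"cake": 4, "pizza": 6}
quotas = quota_allocation(population, fraction=0.5)
print(quotas)                # {'cake': 2, 'pizza': 3}
print(sum(quotas.values()))  # 5
```

When subgroup sizes do not divide evenly, rounding can make the quotas add up to slightly more or less than the intended n.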
Advantages & Disadvantages of Non-probability
Sampling
Advantages:
Non-probability sampling techniques are a more conducive
and practical method for researchers deploying surveys in
the real world.
Getting responses using non-probability sampling is
faster (more time-effective) and more cost-effective than
probability sampling because the sample is known to the
researcher. The respondents respond more quickly than
randomly selected people, as they have a high level of
motivation to participate.
Effective when it is unfeasible or impractical to conduct
probability sampling.
Advantages & Disadvantages of Non-probability
Sampling
Disadvantages:
Lower level of generalization of research findings compared
to probability sampling
Difficulties in estimating sampling variability and
identifying possible bias
Probability Sampling
All members of the study population have a known probability of
being included in the sample
Has the following characteristics:
Use a sampling frame from which to select a sample
Select samples at random from the sampling frame.
Therefore every item on the sampling frame has a chance
of being selected and the probability of selection can be
calculated
Select a sample that is more representative of the
population (than non-probability methods) and
Researchers can calculate the accuracy of the survey
estimates
Example Questions
1 What is the distribution of household sizes in Mulanje
district?
2 What proportion of children aged 6 and attending standard
1 in Mangochi sleep under a mosquito net?
3 What is the distribution of ages of University students in
Malawi?
Some Important terms
Target population: Total population about which
information is required, e.g. all University students at the time
of study
Study population: The set of individuals from which
individuals to be studied will be selected, e.g. all those
attending classes during the study period (when data
collection takes place)
Often these are identical or very similar, but not always.
Some Important terms Cont...
Population characteristic: The aspect(s) of the
population to be studied, e.g. mean age, proportion of
babies who sleep under a net
Sampling units: The persons or groupings used to select
sample members, e.g. households
Sampling frame: Set of sampling units, e.g. schools in a
village
List: A real list of units in the sampling frame
Some notation
Population size: N
Sample size: n
Sampling fraction: $f = \frac{n}{N}$
Probability Sampling Methods
1 Simple Random sample
2 Systematic sample
3 Stratified sample
4 Cluster sample
5 Multi-stage sample
Simple Random sample (SRS)
Each and every member of the study population has the
same chance of being selected into the sample.
The chance is equal to the sampling fraction $f = \frac{n}{N}$.
Requirements:
A list of all members of the sampling frame
Possible methods:
Pieces of paper in a hat / drum
Random digit tables
Use a random number generator in a software package
Replacement
Sample without replacement - once selected a sampling
unit cannot be drawn again
Sample with replacement - after being selected a sampling
unit can still be drawn again (same chance each time)
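The difference between the two schemes can be seen with Python's standard `random` module, where `Random.sample` draws without replacement and `Random.choices` draws with replacement. The function names and the 10-unit frame are illustrative assumptions; this is a minimal sketch, not a prescribed procedure.

```python
import random


def sample_without_replacement(frame, n, seed=None):
    """Each unit can be drawn at most once (n must not exceed len(frame))."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


def sample_with_replacement(frame, n, seed=None):
    """Each draw is made from the full frame, so a unit can be drawn again
    and has the same chance on every draw."""
    rng = random.Random(seed)
    return rng.choices(frame, k=n)


frame = list(range(1, 11))  # hypothetical sampling frame of N = 10 unit IDs
print(sample_without_replacement(frame, 5, seed=1))  # 5 distinct IDs
print(sample_with_replacement(frame, 5, seed=1))     # IDs may repeat
```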
Simple Random Sample (WITHOUT Replacement)
Step 1: List the N subjects in the study population. This is
the list of the sampling frame.
Step 2: Number entries in the listing from 1 to N
Step 3: Select n random numbers between 1 and N
Step 4: Use the list of the sampling frame to identify each
individual corresponding to the ID numbers selected
Step 5: Locate each and seek their consent to participate
in the survey
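Steps 1 to 4 can be sketched in Python as follows, assuming the sampling frame is held as a list whose position i-1 corresponds to ID i. The function name `simple_random_sample` and the subject labels are hypothetical. Step 5 (locating subjects and seeking consent) is fieldwork and is not represented.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """SRS without replacement following Steps 1-4.

    frame: the listing of the N subjects (Step 1); position i-1 holds ID i.
    Returns (ID, subject) pairs for the n selected IDs.
    """
    N = len(frame)
    if not 0 <= n <= N:
        raise ValueError("n must be between 0 and N")
    rng = random.Random(seed)
    # Steps 2-3: the IDs are 1..N; draw n distinct IDs at random
    ids = sorted(rng.sample(range(1, N + 1), n))
    # Step 4: look up the individual listed under each selected ID
    return [(i, frame[i - 1]) for i in ids]


# Hypothetical frame of 500 subjects
frame = [f"subject_{i}" for i in range(1, 501)]
for subject_id, name in simple_random_sample(frame, 5, seed=42):
    print(subject_id, name)
```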
Selecting n random numbers using Excel
Use the function RANDBETWEEN(1, N)
Repeat at least n times, discarding any duplicate numbers, until n
distinct numbers are obtained
Example
Select an SRS of 30 subjects from a population of 500
N = 500
n = 30
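The reason for repeating RANDBETWEEN "at least n times" is that it can return the same number twice. The sketch below mimics that procedure in Python, using `random.randint` as a stand-in for Excel's RANDBETWEEN (both return an integer between the two bounds, inclusive). The function name `repeat_randbetween` is hypothetical.

```python
import random


def repeat_randbetween(N, n, seed=None):
    """Mimic repeating RANDBETWEEN(1, N) until n distinct IDs are obtained.

    Returns the distinct IDs (in the order drawn) and the total number of
    draws, which exceeds n whenever a duplicate had to be discarded.
    """
    if n > N:
        raise ValueError("cannot draw more distinct IDs than N")
    rng = random.Random(seed)
    chosen, seen, draws = [], set(), 0
    while len(chosen) < n:
        x = rng.randint(1, N)  # like RANDBETWEEN(1, N): both ends inclusive
        draws += 1
        if x not in seen:      # discard duplicates (sampling without replacement)
            seen.add(x)
            chosen.append(x)
    return chosen, draws


ids, draws = repeat_randbetween(N=500, n=30, seed=2021)
print(len(ids), draws)  # 30 distinct IDs; draws >= 30
```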
Stratified random sampling
Stratification is the process of grouping the units within a
population of interest into homogeneous sub-groups called
strata
All strata should be mutually exclusive, that is, every
unit within the population of interest can only be assigned
to one stratum.
Collectively the strata should also be exhaustive so that all
units are covered by one of the strata
Stratified random sampling cont...
A stratified random sample can be chosen by following the steps
below:
Divide the population into groups called strata: The
population should be split into groups according to some
characteristic that is related to the subject of the survey
A sample is selected from within each stratum using the SRS
method. We determine the number of units to be selected
from each stratum using an allocation method such as
equal, proportional or optimal allocation.
The samples for each stratum are collated to form the total
sample of the population. This ensures that each stratum
is represented in the sample.
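A minimal Python sketch of these steps, assuming proportional allocation (one of the allocation methods mentioned) and strata supplied as a dictionary of unit lists that are already mutually exclusive and exhaustive. The function name and the "urban"/"rural" strata are hypothetical.

```python
import random


def stratified_sample(strata, n, seed=None):
    """Stratified random sample with proportional allocation.

    strata: dict mapping stratum name -> list of units; the strata are
    assumed to be mutually exclusive and exhaustive.
    Each stratum gets n_h = round(n * N_h / N) units, drawn by SRS.
    """
    rng = random.Random(seed)
    N = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_h = round(n * len(units) / N)          # proportional allocation
        sample[name] = rng.sample(units, n_h)    # SRS within the stratum
    return sample


# Hypothetical population of 1,000 units in two strata
strata = {
    "urban": [f"U{i}" for i in range(600)],
    "rural": [f"R{i}" for i in range(400)],
}
result = stratified_sample(strata, n=50, seed=7)
total_sample = [u for units in result.values() for u in units]  # collated sample
print({name: len(units) for name, units in result.items()})  # {'urban': 30, 'rural': 20}
print(len(total_sample))  # 50
```

Because each n_h is rounded, the collated sample can differ slightly from n when the stratum shares do not divide evenly.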
Allocating the Sample among the Strata
Once we have split our population into strata, we need to
work out how many units to sample from each stratum.
There are three methods of allocating a sample of size n
among the different strata: equal allocation, proportional
allocation and optimal allocation.
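The three allocation rules can be compared side by side. In this sketch, optimal allocation is shown in its Neyman form ($n_h \propto N_h S_h$, where $S_h$ is the stratum standard deviation), which is the optimal allocation when sampling costs the same per unit in every stratum; more general optimal allocation also accounts for per-unit costs. The function name `allocate` and all numbers are hypothetical.

```python
def allocate(n, sizes, method="proportional", sds=None):
    """Split a total sample size n among strata (unrounded).

    sizes: dict stratum -> N_h (stratum population size)
    sds:   dict stratum -> S_h (stratum standard deviation); needed for Neyman
    """
    N = sum(sizes.values())
    if method == "equal":          # same n_h in every stratum
        return {h: n / len(sizes) for h in sizes}
    if method == "proportional":   # n_h proportional to N_h
        return {h: n * N_h / N for h, N_h in sizes.items()}
    if method == "neyman":         # n_h proportional to N_h * S_h
        if sds is None:
            raise ValueError("Neyman allocation needs stratum standard deviations")
        total = sum(sizes[h] * sds[h] for h in sizes)
        return {h: n * sizes[h] * sds[h] / total for h in sizes}
    raise ValueError(f"unknown method: {method}")


sizes = {"A": 600, "B": 400}  # hypothetical stratum sizes
sds = {"A": 10, "B": 30}      # hypothetical standard deviations
for method in ("equal", "proportional", "neyman"):
    print(method, allocate(100, sizes, method, sds))
```

With these made-up numbers, equal allocation gives 50 and 50, proportional gives 60 and 40, and Neyman gives about 33.3 and 66.7, shifting units toward the more variable stratum B. In practice each n_h is rounded to a whole number.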
Advantages
1 The results of stratified random samples tend to be more
accurate (have lower variance) since the grouping together
of similar units controls for the variation within strata.
2 The sample obtained through stratification is more
representative of the population
3 Stratification also permits separate analyses on each group,
which researchers may find useful
Disadvantages
1 This method is more costly and difficult to organize, since
it involves splitting the population into different strata and
taking a sample from each stratum
2 There is a danger of splitting the population into too many
small strata. This may mean that some of the strata may
not contain any sample members or the sample may not be
large enough to be spread across all of the strata
3 Sometimes there may be more than one variable that the
survey needs to be stratified by
Systematic Random Sampling
Systematic random sampling
Use the anticipated population size and planned sample
size to determine the sampling fraction f to be used
Determine a sequence in which sampling units are added to
the list, e.g. entry in a register, order on a route
Determine the sampling interval $k = \frac{1}{f}$
Randomly select a number between 1 and k
Select this sampling unit
Then select every $k$th sampling unit thereafter
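When the full ordered list is available, the procedure above can be sketched in Python as below. The sketch assumes $k$ is taken as $N // n$ (integer division), which equals $1/f$ when N is a multiple of n; otherwise it can return slightly more than n units. The function name `systematic_sample` is hypothetical.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Systematic random sample from an ordered frame.

    k = N // n is used as the sampling interval (k = 1/f when N is a
    multiple of n). A random start r in 1..k is chosen, then units
    r, r + k, r + 2k, ... are taken.
    """
    N = len(frame)
    k = N // n
    if k < 1:
        raise ValueError("sample size n cannot exceed N")
    rng = random.Random(seed)
    start = rng.randint(1, k)                  # random start between 1 and k
    positions = range(start, N + 1, k)         # every k-th unit from the start
    return [frame[p - 1] for p in positions]   # positions are 1-based


frame = list(range(1, 101))  # hypothetical ordered list of 100 units
print(systematic_sample(frame, 10, seed=5))  # 10 units, 10 apart
```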
Example
Target population: Patients attending the Out Patient
Department (OPD) at QECH
Number of patients expected in study period = 20,000
Sample size = 200
Sampling fraction f = 1/100; k = 1/f = 100
Select a random number between 1 and 100, say 42
Approach the 42nd patient, then the 142nd, 242nd, etc.
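In an OPD the list is not known in advance, so patients are selected as they arrive. The sketch below reproduces the example's numbers under that assumption, using a made-up stream of patient labels in place of real registrations; `select_on_arrival` is a hypothetical name. If fewer patients attend than the 20,000 expected, the achieved sample will be smaller than 200.

```python
def select_on_arrival(arrivals, start, k):
    """Yield (arrival number, patient) for the start-th arrival and every
    k-th arrival after it, as patients are registered one at a time."""
    for number, patient in enumerate(arrivals, start=1):
        if number >= start and (number - start) % k == 0:
            yield number, patient


expected_patients = 20_000
n = 200
f = n / expected_patients  # sampling fraction 1/100
k = round(1 / f)           # sampling interval 100

# Hypothetical stream of OPD registrations standing in for real arrivals
arrivals = (f"patient_{i}" for i in range(1, expected_patients + 1))
selected = [number for number, _ in select_on_arrival(arrivals, start=42, k=k)]
print(selected[:3])   # [42, 142, 242]
print(len(selected))  # 200
```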
Cluster Random Sampling
Cluster sampling
Used when members of the study population are naturally in
groups, called clusters,
e.g. villages for residence,
schools for education,
health center catchment areas for health care, etc.
Obtain a simple random sample of clusters
Sample members from the selected clusters only
May select all members or only a subset of them
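A minimal Python sketch of these steps, assuming clusters are given as a dictionary from cluster ID to member list. It takes an SRS of clusters, then either every member or an SRS of a fixed number of members per chosen cluster. The function name, the 20 villages and the 30 households per village are all made up for illustration.

```python
import random


def cluster_sample(clusters, m, per_cluster=None, seed=None):
    """Cluster sample: an SRS of m clusters, then either every member of
    each chosen cluster (per_cluster=None) or an SRS of per_cluster
    members within each chosen cluster.

    clusters: dict mapping cluster id -> list of members
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)   # SRS of clusters
    sample = {}
    for c in chosen:
        members = clusters[c]
        if per_cluster is None or per_cluster >= len(members):
            sample[c] = list(members)                      # take everyone
        else:
            sample[c] = rng.sample(members, per_cluster)   # subset only
    return sample


# Hypothetical: 20 villages, each with 30 households
villages = {f"village_{v:02d}": [f"v{v:02d}_hh_{h}" for h in range(30)] for v in range(1, 21)}
everyone = cluster_sample(villages, m=4, seed=3)
subset = cluster_sample(villages, m=4, per_cluster=10, seed=3)
print(sum(len(v) for v in everyone.values()))  # 4 * 30 = 120
print(sum(len(v) for v in subset.values()))    # 4 * 10 = 40
```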
Cluster Sampling Example
What proportion of standard 1 students sleep under a mosquito
net in Mangochi district?
Study population: Standard 1 students aged 6 in Mangochi
district
Population size: approximately 3,000
Number of schools = 54
Randomly select 7 schools and obtain data for every
standard 1 student in the chosen schools
Final sample size is approximately $3{,}000 \times \frac{7}{54} \approx 389$
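The approximate sample size is the population size times the fraction of schools selected. The short Python check below assumes, as the word "approximately" implies, that schools have roughly equal numbers of standard 1 students; the function name is hypothetical.

```python
from fractions import Fraction


def expected_cluster_sample_size(population_size, clusters_total, clusters_selected):
    """Approximate sample size when every member of each selected cluster is
    taken; assumes clusters are of roughly equal size."""
    return population_size * Fraction(clusters_selected, clusters_total)


n_expected = expected_cluster_sample_size(3_000, 54, 7)
print(float(n_expected))  # about 388.9
print(round(n_expected))  # 389
```

The achieved sample size will vary with which schools are drawn, because real schools differ in size.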
Do all members of the study population have known probability
of being included in the sample?
Yes:
probability a school is selected $= \frac{7}{54} \approx 0.13$
since all students in the selected schools are included, this is also
the probability that a student is selected
Sometimes clusters are sampled with probability proportional
to size (PPS)
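The inclusion probability can be computed exactly and checked by simulation. This sketch labels the schools 0 to 53 (an arbitrary assumption) and counts how often school 0 appears in an SRS of 7 schools; the function name `simulate_inclusion` is hypothetical.

```python
import random
from fractions import Fraction

# Exact probability that a given school (and so each of its students) is chosen
exact = Fraction(7, 54)
print(float(exact))  # about 0.1296, i.e. 0.13 to two decimal places


def simulate_inclusion(schools_total=54, schools_selected=7, trials=100_000, seed=1):
    """Monte Carlo check: how often is school 0 in an SRS of schools?"""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(trials)
        if 0 in rng.sample(range(schools_total), schools_selected)
    )
    return hits / trials


print(simulate_inclusion())  # close to 0.13
```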
What are the sampling units?
In cluster sampling the primary sampling units are the
clusters
Individuals that make up the clusters are secondary
sampling units
For the standard 1 students e.g:
primary sampling units - schools
secondary sampling units - students
Multistage cluster sampling
The End.