Two Stage Sampling
In cluster sampling, the clusters are treated as sampling units and all the elements in the selected clusters are enumerated completely. This method of sampling is economical under certain circumstances, but it is generally less efficient than sampling individual units directly, because it restricts the spread of the sample over the population. It is therefore logical to expect that the efficiency of the estimator will be increased by distributing the elements over a large number of clusters and surveying only a sample of units in each selected cluster, instead of completely enumerating all the elements in a small number of clusters. This type of sampling, which consists in first selecting the clusters and then selecting a specified number of elements from each selected cluster, is known as sub-sampling or two-stage sampling. The clusters that form the units of sampling at the first stage are termed first-stage units (fsu) or primary-stage units, and the elements within clusters which form the units of sampling at the second stage are called second-stage units (ssu) or secondary units. For example, in a socio-economic survey of a region, villages or urban blocks may be taken as first-stage units and households as second-stage units.
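The selection procedure just described can be sketched in a few lines. The frame, the village and household names, and the sizes below are illustrative assumptions, not data from the text:

```python
import random

# A hypothetical frame: N = 10 villages (fsu), each with M = 8 households (ssu).
random.seed(1)  # fixed seed so the sketch is reproducible
frame = {f"village_{i}": [f"hh_{i}_{j}" for j in range(8)] for i in range(10)}

n, m = 3, 2  # first-stage and second-stage sample sizes
first_stage = random.sample(sorted(frame), n)                  # n fsu by srswor
sample = {v: random.sample(frame[v], m) for v in first_stage}  # m ssu within each selected fsu
print(sample)  # nm = 6 households in all
```

The same two calls to `random.sample` at the two stages are all that distinguishes this from cluster sampling, where every household of a selected village would be retained.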
Notation (with $N$ fsu, each containing $M$ ssu; $n$ fsu are selected, and $m$ ssu from each selected fsu, by srswor at both stages):

$\bar{Y}_{i.} = \dfrac{1}{M}\sum_{j=1}^{M} y_{ij}$, mean per ssu of the $i$-th fsu.

$\bar{Y} = \dfrac{1}{N}\sum_{i=1}^{N}\bar{Y}_{i.} = \dfrac{1}{NM}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij}$, mean per element in the population.

$S_b^2 = \dfrac{1}{N-1}\sum_{i=1}^{N}(\bar{Y}_{i.}-\bar{Y})^2$, mean square between fsu means.

$S_i^2 = \dfrac{1}{M-1}\sum_{j=1}^{M}(y_{ij}-\bar{Y}_{i.})^2$, mean square between ssu within the $i$-th fsu.

$S_w^2 = \dfrac{1}{N}\sum_{i=1}^{N}S_i^2$, mean of the mean squares between ssu within fsu.

$\bar{y}_{i.} = \dfrac{1}{m}\sum_{j=1}^{m} y_{ij}$, sample mean per ssu in the $i$-th fsu, based on the $m$ selected ssu from the $i$-th fsu.

$\bar{y} = \dfrac{1}{n}\sum_{i=1}^{n}\bar{y}_{i.} = \dfrac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m} y_{ij}$, sample mean based on all the $nm$ units.
126 RU Khan
Theorem: In two-stage sampling with equal first-stage units, the sample mean $\bar{y}$ is an unbiased estimator of $\bar{Y}$, with variance
$$V(\bar{y}) = \frac{N-n}{Nn}S_b^2 + \frac{M-m}{Mm}\cdot\frac{S_w^2}{n}.$$
Proof: Here the expectation is taken in two stages. The expectation $E_2$ is taken over all possible samples of ssu from the selected fsu (i.e. for fixed $n$ fsu), and the expectation $E_1$ is taken over all samples of $n$ fsu from the $N$ fsu. We have
$$E(\bar{y}) = E_1 E_2\left[\frac{1}{n}\sum_{i=1}^{n}\bar{y}_{i.}\,\Big|\,i\right] = E_1\left[\frac{1}{n}\sum_{i=1}^{n}E_2(\bar{y}_{i.}\mid i)\right] = E_1\left[\frac{1}{n}\sum_{i=1}^{n}\bar{Y}_{i.}\right], \ \text{since the ssu are selected by srs},$$
$$= \frac{1}{N}\sum_{i=1}^{N}\bar{Y}_{i.} = \bar{Y}, \ \text{as the fsu are selected by srs}.$$
To obtain the variance of $\bar{y}$, write
$$V(\bar{y}) = V_1[E_2(\bar{y}\mid i)] + E_1[V_2(\bar{y}\mid i)].$$
The first term is
$$V_1\left[\frac{1}{n}\sum_{i=1}^{n}\bar{Y}_{i.}\right] = \left(\frac{1}{n}-\frac{1}{N}\right)S_b^2,$$
and the second term is
$$E_1\left[\frac{1}{n^2}\sum_{i=1}^{n}\left(\frac{1}{m}-\frac{1}{M}\right)S_i^2\right] = \frac{1}{n^2}\sum_{i=1}^{n}\left(\frac{1}{m}-\frac{1}{M}\right)E_1(S_i^2) = \frac{1}{nN}\sum_{i=1}^{N}\left(\frac{1}{m}-\frac{1}{M}\right)S_i^2 = \frac{1}{n}\left(\frac{1}{m}-\frac{1}{M}\right)S_w^2.$$
Hence
$$V(\bar{y}) = \left(\frac{1}{n}-\frac{1}{N}\right)S_b^2 + \frac{1}{n}\left(\frac{1}{m}-\frac{1}{M}\right)S_w^2. \qquad (8.3)$$
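Because both stages are srswor over finite sets, the theorem can be checked by brute force on a toy population: enumerate every equally likely two-stage sample, average the sample means, and compare the exact variance with (8.3). The population values below are an arbitrary illustration:

```python
from itertools import combinations, product
from statistics import mean

# Toy population: N = 4 fsu, each with M = 3 ssu (values are arbitrary).
Y = [[1, 4, 7], [2, 5, 3], [8, 6, 9], [4, 2, 6]]
N, M, n, m = 4, 3, 2, 2

Ybar = mean(y for row in Y for y in row)                # population mean per element
Yi = [mean(row) for row in Y]                           # fsu means
Sb2 = sum((yi - Ybar) ** 2 for yi in Yi) / (N - 1)      # mean square between fsu means
Sw2 = mean(sum((y - yi) ** 2 for y in row) / (M - 1)    # mean of within-fsu mean squares
           for row, yi in zip(Y, Yi))

# Enumerate every equally likely two-stage sample (srswor at both stages).
means = []
for fsus in combinations(range(N), n):
    for ssus in product(*(combinations(range(M), m) for _ in range(n))):
        means.append(mean(Y[i][j] for i, js in zip(fsus, ssus) for j in js))

E_ybar = mean(means)
V_ybar = mean((yb - E_ybar) ** 2 for yb in means)
V_formula = (1/n - 1/N) * Sb2 + (1/n) * (1/m - 1/M) * Sw2   # equation (8.3)
print(round(E_ybar, 6), round(Ybar, 6))       # agree: ybar is unbiased
print(round(V_ybar, 6), round(V_formula, 6))  # agree: exact variance matches (8.3)
```

With $N=4$, $M=3$, $n=m=2$ there are only $\binom{4}{2}\cdot 3^2 = 54$ possible samples, so the check is exact rather than a simulation.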
Note:
i) If $f_1 = \dfrac{n}{N}$ and $f_2 = \dfrac{m}{M}$ are the sampling fractions in the first and second stages, the result can be written as
$$V(\bar{y}) = \frac{1-f_1}{n}S_b^2 + \frac{1-f_2}{nm}S_w^2.$$
ii) When $N$ is large compared with $n$, then $V(\bar{y}) = \dfrac{1}{n}S_b^2 + \dfrac{1-f_2}{nm}S_w^2$.
iii) When $M$ is large compared with $m$, the result reduces to $V(\bar{y}) = \dfrac{1-f_1}{n}S_b^2 + \dfrac{S_w^2}{nm}$.
iv) When both $N$ and $M$ are large compared with $n$ and $m$ respectively, then $V(\bar{y}) = \dfrac{S_b^2}{n} + \dfrac{S_w^2}{nm}$.
Remarks
i) If the selected fsu's are completely enumerated, i.e. $m = M$, this situation corresponds to cluster sampling.
ii) If every fsu in the population is included in the sample, i.e. $n = N$, this case corresponds to stratified sampling with the fsu's as strata.
Corollary: If the $n$ fsu's, and $m$ ssu's from each chosen fsu, are selected by simple random sampling wor, the estimator $\hat{Y} = \dfrac{NM}{n}\sum_{i=1}^{n}\bar{y}_{i.} = NM\,\bar{y}$ is an unbiased estimator of the population total $Y$, and its variance is
$$V(\hat{Y}) = N^2M^2\,\frac{1-f_1}{n}S_b^2 + N^2M^2\,\frac{1-f_2}{nm}S_w^2.$$
Estimation of $V(\bar{y})$
Define
$$s_w^2 = \frac{1}{n}\sum_{i=1}^{n}s_i^2, \quad\text{and}\quad s_b^2 = \frac{1}{n-1}\sum_{i=1}^{n}(\bar{y}_{i.}-\bar{y})^2,$$
where $s_i^2 = \dfrac{1}{m-1}\sum_{j=1}^{m}(y_{ij}-\bar{y}_{i.})^2$ is the sample mean square within the $i$-th selected fsu.
Consider
$$E(s_w^2) = E_1 E_2\left[\frac{1}{n}\sum_{i=1}^{n}s_i^2\,\Big|\,i\right] = E_1\left[\frac{1}{n}\sum_{i=1}^{n}S_i^2\right] = \frac{1}{N}\sum_{i=1}^{N}S_i^2 = S_w^2.$$
This shows that $s_w^2$ is an unbiased estimate of $S_w^2$. Further,
$$s_b^2 = \frac{1}{n-1}\sum_{i=1}^{n}(\bar{y}_{i.}-\bar{y})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n}\bar{y}_{i.}^2 - n\bar{y}^2\right],$$
so that
$$E(s_b^2) = \frac{1}{n-1}\left[\sum_{i=1}^{n}E(\bar{y}_{i.}^2) - nE(\bar{y}^2)\right]. \qquad (8.4)$$
Now
$$E\left[\sum_{i=1}^{n}\bar{y}_{i.}^2\right] = E_1\left[\sum_{i=1}^{n}E_2(\bar{y}_{i.}^2\mid i)\right] = E_1\left[\sum_{i=1}^{n}\{\bar{Y}_{i.}^2 + V_2(\bar{y}_{i.}\mid i)\}\right]$$
$$= E_1\left[\sum_{i=1}^{n}\bar{Y}_{i.}^2\right] + \left(\frac{1}{m}-\frac{1}{M}\right)\sum_{i=1}^{n}E_1(S_i^2) = \frac{n}{N}\sum_{i=1}^{N}\bar{Y}_{i.}^2 + n\left(\frac{1}{m}-\frac{1}{M}\right)S_w^2$$
$$= \frac{n}{N}\left[(N-1)S_b^2 + N\bar{Y}^2\right] + n\left(\frac{1}{m}-\frac{1}{M}\right)S_w^2, \qquad (8.5)$$
using $\sum_{i=1}^{N}\bar{Y}_{i.}^2 = (N-1)S_b^2 + N\bar{Y}^2$.
and
$$E(\bar{y}^2) = V(\bar{y}) + \bar{Y}^2 = \left(\frac{1}{n}-\frac{1}{N}\right)S_b^2 + \frac{1}{n}\left(\frac{1}{m}-\frac{1}{M}\right)S_w^2 + \bar{Y}^2. \qquad (8.6)$$
Substituting the values from equations (8.5) and (8.6) in equation (8.4), we get
$$E(s_b^2) = \frac{1}{n-1}\left[\frac{n(N-1)}{N}S_b^2 + n\bar{Y}^2 + n\left(\frac{1}{m}-\frac{1}{M}\right)S_w^2 - n\left(\frac{1}{n}-\frac{1}{N}\right)S_b^2 - \left(\frac{1}{m}-\frac{1}{M}\right)S_w^2 - n\bar{Y}^2\right]$$
$$= \frac{1}{n-1}\left[\left\{\frac{n(N-1)}{N} - \frac{N-n}{N}\right\}S_b^2 + (n-1)\left(\frac{1}{m}-\frac{1}{M}\right)S_w^2\right]$$
$$= \frac{1}{n-1}\left[(n-1)S_b^2 + (n-1)\left(\frac{1}{m}-\frac{1}{M}\right)S_w^2\right] = S_b^2 + \left(\frac{1}{m}-\frac{1}{M}\right)S_w^2. \qquad (8.7)$$
Since $s_w^2$ is an unbiased estimate of $S_w^2$, it follows from equation (8.7) that an unbiased estimate of $S_b^2$ is
$$\hat{S}_b^2 = s_b^2 - \left(\frac{1}{m}-\frac{1}{M}\right)s_w^2.$$
Therefore, an unbiased estimate of $V(\bar{y})$ will be
$$\hat{V}(\bar{y}) = v(\bar{y}) = \left(\frac{1}{n}-\frac{1}{N}\right)\left[s_b^2 - \left(\frac{1}{m}-\frac{1}{M}\right)s_w^2\right] + \frac{1}{n}\left(\frac{1}{m}-\frac{1}{M}\right)s_w^2$$
$$= \left(\frac{1}{n}-\frac{1}{N}\right)s_b^2 + \left\{\frac{1}{n} - \left(\frac{1}{n}-\frac{1}{N}\right)\right\}\left(\frac{1}{m}-\frac{1}{M}\right)s_w^2$$
$$= \left(\frac{1}{n}-\frac{1}{N}\right)s_b^2 + \frac{1}{N}\left(\frac{1}{m}-\frac{1}{M}\right)s_w^2 = \frac{N-n}{nN}s_b^2 + \frac{M-m}{mM}\cdot\frac{s_w^2}{N}.$$
Note:
i) If $f_1 = \dfrac{n}{N}$ and $f_2 = \dfrac{m}{M}$ are the sampling fractions in the first and second stages, an alternative form of the result is
$$\hat{V}(\bar{y}) = v(\bar{y}) = \frac{1-f_1}{n}s_b^2 + \frac{f_1(1-f_2)}{nm}s_w^2.$$
ii) If $N$ is large, $\hat{V}(\bar{y}) = v(\bar{y}) = \dfrac{1}{n}s_b^2$.
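The unbiasedness of the variance estimator $v(\bar{y})$ can also be checked exactly by enumeration: compute $v(\bar{y})$ for every possible two-stage sample of a toy population and verify that its average equals the true $V(\bar{y})$. The population values below are an arbitrary illustration:

```python
from itertools import combinations, product
from statistics import mean

Y = [[1, 4, 7], [2, 5, 3], [8, 6, 9], [4, 2, 6]]  # toy population: N = 4 fsu of M = 3 ssu
N, M, n, m = 4, 3, 2, 2

Ybar = mean(y for row in Y for y in row)
Sb2 = sum((mean(row) - Ybar) ** 2 for row in Y) / (N - 1)
Sw2 = mean(sum((y - mean(row)) ** 2 for y in row) / (M - 1) for row in Y)
V_true = (1/n - 1/N) * Sb2 + (1/n) * (1/m - 1/M) * Sw2       # equation (8.3)

v_vals = []
for fsus in combinations(range(N), n):
    for ssus in product(*(combinations(range(M), m) for _ in range(n))):
        rows = [[Y[i][j] for j in js] for i, js in zip(fsus, ssus)]
        ybar_i = [mean(r) for r in rows]                     # per-fsu sample means
        ybar = mean(ybar_i)
        sb2 = sum((yi - ybar) ** 2 for yi in ybar_i) / (n - 1)
        sw2 = mean(sum((y - yi) ** 2 for y in r) / (m - 1)
                   for r, yi in zip(rows, ybar_i))
        # v(ybar) = (1/n - 1/N) sb^2 + (1/N)(1/m - 1/M) sw^2
        v_vals.append((1/n - 1/N) * sb2 + (1/N) * (1/m - 1/M) * sw2)

print(round(mean(v_vals), 6), round(V_true, 6))  # agree: v(ybar) is unbiased
```

Note that the naive plug-in $(1/n - 1/N)s_b^2 + (1/n)(1/m - 1/M)s_w^2$ would not average to $V(\bar{y})$, because $E(s_b^2) \ne S_b^2$ by (8.7); the $1/N$ coefficient on the second term is exactly the correction derived above.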
Optimum values of $n$ and $m$: Consider the simple cost function
$$C = c_0 + c_1 n + c_2 nm,$$
where $c_0$ is the overhead cost, $c_1$ is the cost of including an fsu in the sample, and $c_2$ is the cost of including an ssu in the sample, so that
$$C - c_0 = c_1 n + c_2 nm = C' \ \text{(say)}.$$
In two-stage sampling the variance can be written as
$$V(\bar{y}) = \left(\frac{1}{n}-\frac{1}{N}\right)S_b^2 + \frac{1}{n}\left(\frac{1}{m}-\frac{1}{M}\right)S_w^2 = \frac{S_b^2}{n} - \frac{S_b^2}{N} + \frac{S_w^2}{nm} - \frac{S_w^2}{nM},$$
so that
$$V(\bar{y}) + \frac{S_b^2}{N} = \frac{S_b^2}{n} + \frac{S_w^2}{nm} - \frac{S_w^2}{nM} = V' \ \text{(say)},$$
where $C'$ and $V'$ are functions of $n$ and $m$. Choosing $n$ and $m$ to minimize $V$ for fixed $C$, or $C$ for fixed $V$, are both equivalent to minimizing the product
$$V'C' = \left(\frac{S_b^2}{n} + \frac{S_w^2}{nm} - \frac{S_w^2}{nM}\right)(c_1 n + c_2 nm) = \left(S_b^2 - \frac{S_w^2}{M} + \frac{S_w^2}{m}\right)(c_1 + mc_2)$$
$$= c_1\left(S_b^2 - \frac{S_w^2}{M}\right) + mc_2\left(S_b^2 - \frac{S_w^2}{M}\right) + \frac{c_1}{m}S_w^2 + c_2 S_w^2$$
$$= c_1\lambda + mc_2\lambda + \frac{c_1}{m}S_w^2 + c_2 S_w^2, \quad\text{where } \lambda = S_b^2 - \frac{S_w^2}{M}. \qquad (8.8)$$
Since equation (8.8) is independent of $n$, minimizing $V'C'$ will provide the optimum value of $m$. We have to consider two cases, according as $\lambda > 0$ or $\lambda \le 0$.
Case i) For the optimum $m$ when $\lambda > 0$, differentiate (8.8) with respect to $m$ and equate to zero:
$$\frac{\partial}{\partial m}(V'C') = c_2\lambda - \frac{c_1}{m^2}S_w^2 = 0, \quad\text{so that}\quad m^2 = \frac{c_1 S_w^2}{c_2\lambda},$$
and hence
$$m_{opt} = S_w\sqrt{\frac{c_1}{c_2\lambda}} = \frac{S_w}{\sqrt{S_b^2 - S_w^2/M}}\,\sqrt{\frac{c_1}{c_2}}.$$
The optimum value of $n$ is found by solving either the cost equation or the variance equation, depending on which has been pre-assigned.
a) When the cost is fixed, substitute the optimum value of $m$ in the cost function and solve for $n$:
$$C - c_0 = c_1 n + c_2 n\, m_{opt} = n\left(c_1 + \frac{S_w}{\sqrt{\lambda}}\sqrt{c_1 c_2}\right), \quad\text{then}\quad n_{opt} = \frac{C - c_0}{c_1 + \dfrac{S_w}{\sqrt{\lambda}}\sqrt{c_1 c_2}}.$$
Substituting the values of $m_{opt}$ and $n_{opt}$ in the expression for the variance, we get the minimum variance.
b) When the variance is fixed, substitute the optimum value of $m$ in the expression for the variance and solve for $n$:
$$V(\bar{y}) = -\frac{S_b^2}{N} + \frac{1}{n}\left(\lambda + \frac{S_w^2}{m_{opt}}\right) = -\frac{S_b^2}{N} + \frac{1}{n}\left(\lambda + S_w\sqrt{\frac{\lambda c_2}{c_1}}\right), \quad\text{then}\quad n_{opt} = \frac{\lambda + S_w\sqrt{\lambda c_2/c_1}}{V + \dfrac{S_b^2}{N}}.$$
Substituting the values of $m_{opt}$ and $n_{opt}$ in the expression for the cost, we get the minimum cost.
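For the case $\lambda > 0$ with the cost fixed, the optimum allocation is a short computation. All numeric planning inputs below are assumed for illustration, not taken from the text:

```python
import math

# Assumed planning inputs (illustrative only).
Sb2, Sw2, M = 40.0, 25.0, 50        # variance components and fsu size
c0, c1, c2 = 100.0, 16.0, 1.0       # overhead, per-fsu and per-ssu costs
C = 1000.0                          # total budget

lam = Sb2 - Sw2 / M                 # lambda = Sb^2 - Sw^2/M, positive here
m_opt = math.sqrt(Sw2) * math.sqrt(c1 / (c2 * lam))  # optimum ssu per selected fsu
n_opt = (C - c0) / (c1 + c2 * m_opt)                 # fsu count exhausting the budget
print(round(m_opt, 2), round(n_opt, 1))              # round to integers in practice
```

The pattern is the usual one: when visiting an fsu is expensive relative to measuring an ssu ($c_1/c_2$ large), or when within-fsu variation dominates ($S_w^2$ large relative to $\lambda$), more ssu should be taken per selected fsu.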
Case ii) For the optimum $m$ when $\lambda \le 0$: in equation (8.8), the terms containing $m$ are $mc_2\lambda$ and $\dfrac{c_1}{m}S_w^2$. Since $\lambda \le 0$, the term $mc_2\lambda$ is non-increasing in $m$, and $\dfrac{c_1}{m}S_w^2$ is smallest when $m$ attains the largest attainable value, i.e. $m = M$. Thus $V'C'$ is minimized when $m = M$, so that $m_{opt} = M$. The optimum value of $n$ is again found by solving either the cost equation or the variance equation, depending on which has been pre-assigned.
a) When the cost is fixed, substitute the optimum value of $m$ in the cost function and solve for $n$:
$$C - c_0 = c_1 n + c_2 n M = n(c_1 + Mc_2), \quad\text{so that}\quad n_{opt} = \frac{C - c_0}{c_1 + Mc_2}.$$
Substituting the values of $m_{opt}$ and $n_{opt}$ in the expression for the variance, we get the minimum variance.
b) When the variance is fixed, substitute $m_{opt} = M$ in the expression for the variance; the terms in $S_w^2$ cancel, leaving
$$V(\bar{y}) = \frac{S_b^2}{n} - \frac{S_b^2}{N}, \quad\text{so that}\quad n_{opt} = \frac{S_b^2}{V + \dfrac{S_b^2}{N}}.$$
Substituting the values of $m_{opt}$ and $n_{opt}$ in the expression for the cost, we get the minimum cost.
Comparison of two-stage sampling with simple random sampling: If a simple random sample of $nm$ elements is drawn directly from the $NM$ elements of the population, then
$$V(\bar{y}_{sr}) = \left(\frac{1}{nm} - \frac{1}{NM}\right)S^2 \approx \frac{1}{nm}S^2, \ \text{for large } N,$$
where $S^2 = \dfrac{1}{NM-1}\sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij}-\bar{Y})^2$, while in two-stage sampling
$$V(\bar{y}) = \left(\frac{1}{n}-\frac{1}{N}\right)S_b^2 + \frac{1}{n}\left(\frac{1}{m}-\frac{1}{M}\right)S_w^2. \qquad (8.9)$$
To compare the two, express $S_b^2$ and $S_w^2$ in terms of $S^2$ and the intraclass correlation coefficient $\rho$ between ssu of the same fsu,
$$\rho = \frac{\sum_{i=1}^{N}\sum_{j\ne k}(y_{ij}-\bar{Y})(y_{ik}-\bar{Y})}{(M-1)(NM-1)S^2}.$$
The analysis of variance identity gives
$$(NM-1)S^2 = M(N-1)S_b^2 + N(M-1)S_w^2, \qquad (8.10)$$
and, on expanding $\sum_{i}\left[\sum_{j}(y_{ij}-\bar{Y})\right]^2 = M^2(N-1)S_b^2$, the definition of $\rho$ gives
$$\rho(M-1)(NM-1)S^2 = M^2(N-1)S_b^2 - (NM-1)S^2.$$
Substituting $M^2(N-1)S_b^2 = M(NM-1)S^2 - NM(M-1)S_w^2$, from equation (8.10),
$$\rho(M-1)(NM-1)S^2 = (M-1)(NM-1)S^2 - NM(M-1)S_w^2,$$
or $NM\,S_w^2 = (NM-1)(1-\rho)S^2$, so that
$$S_w^2 = \frac{NM-1}{NM}(1-\rho)S^2. \qquad (8.11)$$
Substituting equation (8.11) in equation (8.10), we get
$$S_b^2 = \frac{1}{M(N-1)}\left[(NM-1)S^2 - N(M-1)\,\frac{NM-1}{NM}(1-\rho)S^2\right]$$
$$= \frac{NM-1}{M(N-1)}\left[1 - \frac{M-1}{M}(1-\rho)\right]S^2 = \frac{NM-1}{M(N-1)}\cdot\frac{1+(M-1)\rho}{M}\,S^2. \qquad (8.12)$$
Substituting equations (8.11) and (8.12) in equation (8.9), we get
$$V(\bar{y}) = \frac{N-n}{nN}\cdot\frac{NM-1}{M(N-1)}\cdot\frac{1+(M-1)\rho}{M}S^2 + \frac{1}{n}\cdot\frac{M-m}{mM}\cdot\frac{NM-1}{NM}(1-\rho)S^2$$
$$= \frac{S^2}{nm}\cdot\frac{NM-1}{NM}\left[\frac{N-n}{N-1}\cdot\frac{m}{M}\{1+(M-1)\rho\} + \frac{M-m}{M}(1-\rho)\right].$$
If the fpc is ignored at both stages (i.e. $N$ and $M$ are large), then
$$V(\bar{y}) = \frac{S^2}{nm}\left[1+(m-1)\rho\right].$$
Thus, the relative efficiency of two-stage sampling compared with simple random sampling is given by
$$RE = \frac{V(\bar{y}_{sr})}{V(\bar{y})} = \frac{1}{1+(m-1)\rho}.$$
It can be seen that the relative efficiency depends on the value of $\rho$:
i) if $\rho = 0$, then $V(\bar{y}_{sr}) = V(\bar{y})$, i.e. both methods are equally precise;
ii) if $\rho < 0$, then $V(\bar{y}) = V(\bar{y}_{sr}) + (m-1)\rho\,V(\bar{y}_{sr}) < V(\bar{y}_{sr})$, i.e. two-stage sampling is more precise;
iii) if $\rho > 0$, then $V(\bar{y}) = V(\bar{y}_{sr}) + (m-1)\rho\,V(\bar{y}_{sr}) > V(\bar{y}_{sr})$, i.e. simple random sampling is more precise.
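The three cases can be read off numerically from $RE = 1/[1+(m-1)\rho]$; the values of $\rho$ and $m$ below are arbitrary illustrations:

```python
def rel_eff(rho: float, m: int) -> float:
    """RE of two-stage sampling vs. an srs of the same size nm (fpc ignored)."""
    return 1.0 / (1.0 + (m - 1) * rho)

for rho in (-0.05, 0.0, 0.1, 0.4):
    print(f"rho={rho:+.2f}  m=5  RE={rel_eff(rho, 5):.3f}")
# rho = 0 gives RE = 1; rho < 0 favours two-stage sampling; rho > 0 favours srs.
```

In practice $\rho$ is usually positive (elements of the same cluster resemble each other), which is why two-stage sampling typically trades some precision for its lower cost.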
Two-stage sampling with unequal first-stage units: Let the population consist of $N$ fsu, the $i$-th fsu containing $M_i$ ssu, and let $n$ fsu be selected, with $m_i$ ssu sub-sampled from the $i$-th selected fsu, by srswor at each stage. Define:

$\bar{Y}_{i.} = \dfrac{1}{M_i}\sum_{j=1}^{M_i} y_{ij}$, mean of the observations in the $i$-th fsu.

$\bar{Y}_N = \dfrac{1}{N}\sum_{i=1}^{N}\bar{Y}_{i.}$, the overall mean of the fsu means.

$\bar{M} = \dfrac{1}{N}\sum_{i=1}^{N}M_i = \dfrac{M_0}{N}$, the average number of ssu per fsu.

$\bar{Y} = \dfrac{1}{N\bar{M}}\sum_{i=1}^{N}\sum_{j=1}^{M_i} y_{ij} = \dfrac{1}{N\bar{M}}\sum_{i=1}^{N}M_i\bar{Y}_{i.}$, mean per element in the population.

$S_b^2 = \dfrac{1}{N-1}\sum_{i=1}^{N}(\bar{Y}_{i.}-\bar{Y}_N)^2$, mean square between fsu means.

$S_i^2 = \dfrac{1}{M_i-1}\sum_{j=1}^{M_i}(y_{ij}-\bar{Y}_{i.})^2$, mean square between ssu within the $i$-th fsu.

$\bar{y}_{i.} = \dfrac{1}{m_i}\sum_{j=1}^{m_i} y_{ij}$, sample mean per ssu of the $i$-th fsu.
There are several estimators of the population mean $\bar{Y}$; a few that are useful from the practical point of view are considered here.

1st estimator: It is defined by $\bar{y}_I = \dfrac{1}{n}\sum_{i=1}^{n}\bar{y}_{i.}$.
By definition,
$$E(\bar{y}_I) = E_1 E_2\left[\frac{1}{n}\sum_{i=1}^{n}\bar{y}_{i.}\,\Big|\,i\right] = E_1\left[\frac{1}{n}\sum_{i=1}^{n}\bar{Y}_{i.}\right], \ \text{as the ssu are selected under srs},$$
$$= \frac{1}{N}\sum_{i=1}^{N}\bar{Y}_{i.} = \bar{Y}_N \ne \bar{Y}, \ \text{as the fsu are selected under srs}.$$
This shows that $\bar{y}_I$ is a biased estimator of $\bar{Y}$. The bias of the estimator is given by
$$B = E(\bar{y}_I) - \bar{Y} = \frac{1}{N}\sum_{i=1}^{N}\bar{Y}_{i.} - \frac{1}{N\bar{M}}\sum_{i=1}^{N}M_i\bar{Y}_{i.} = \frac{1}{N\bar{M}}\left[\bar{M}\sum_{i=1}^{N}\bar{Y}_{i.} - \sum_{i=1}^{N}M_i\bar{Y}_{i.}\right] = -\frac{1}{N\bar{M}}\sum_{i=1}^{N}(M_i-\bar{M})\bar{Y}_{i.},$$
which vanishes when the fsu sizes are equal, or when the $M_i$ are uncorrelated with the $\bar{Y}_{i.}$. The variance of $\bar{y}_I$ is obtained from
$$V(\bar{y}_I) = V_1[E_2(\bar{y}_I\mid i)] + E_1[V_2(\bar{y}_I\mid i)]. \qquad (8.13)$$
Consider
$$V_1[E_2(\bar{y}_I\mid i)] = V_1\left[\frac{1}{n}\sum_{i=1}^{n}E_2(\bar{y}_{i.}\mid i)\right] = V_1\left[\frac{1}{n}\sum_{i=1}^{n}\bar{Y}_{i.}\right] = \left(\frac{1}{n}-\frac{1}{N}\right)S_b^2. \qquad (8.14)$$
Consider
$$E_1[V_2(\bar{y}_I\mid i)] = E_1\left[\frac{1}{n^2}\sum_{i=1}^{n}V_2(\bar{y}_{i.}\mid i)\right] = E_1\left[\frac{1}{n^2}\sum_{i=1}^{n}\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_i^2\right] = \frac{1}{nN}\sum_{i=1}^{N}\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_i^2. \qquad (8.15)$$
Substituting the values from (8.14) and (8.15) in (8.13), we get
$$V(\bar{y}_I) = \left(\frac{1}{n}-\frac{1}{N}\right)S_b^2 + \frac{1}{nN}\sum_{i=1}^{N}\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_i^2 = \frac{1-f}{n}S_b^2 + \frac{1}{nN}\sum_{i=1}^{N}\frac{1-f_i}{m_i}S_i^2,$$
where $f = \dfrac{n}{N}$ and $f_i = \dfrac{m_i}{M_i}$.
Estimation of $V(\bar{y}_I)$
We have, for the sample,
$$s_b^2 = \frac{1}{n-1}\sum_{i=1}^{n}(\bar{y}_{i.}-\bar{y}_I)^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n}\bar{y}_{i.}^2 - n\bar{y}_I^2\right], \quad\text{and}\quad s_i^2 = \frac{1}{m_i-1}\sum_{j=1}^{m_i}(y_{ij}-\bar{y}_{i.})^2.$$
Proceeding as in the equal-size case, one finds
$$E(s_b^2) = S_b^2 + \frac{1}{N}\sum_{i=1}^{N}\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_i^2. \qquad (8.16)$$
Also,
$$E\left[\frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{m_i}-\frac{1}{M_i}\right)s_i^2\right] = E_1\left[\frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{m_i}-\frac{1}{M_i}\right)E_2(s_i^2\mid i)\right] = E_1\left[\frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_i^2\right] = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_i^2.$$
This shows that $\dfrac{1}{n}\sum_{i=1}^{n}\left(\dfrac{1}{m_i}-\dfrac{1}{M_i}\right)s_i^2$ is an unbiased estimate of $\dfrac{1}{N}\sum_{i=1}^{N}\left(\dfrac{1}{m_i}-\dfrac{1}{M_i}\right)S_i^2$; thus it follows from (8.16) that an unbiased estimate of $S_b^2$ is
$$\hat{S}_b^2 = s_b^2 - \frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{m_i}-\frac{1}{M_i}\right)s_i^2.$$
Therefore,
$$v(\bar{y}_I) = \left(\frac{1}{n}-\frac{1}{N}\right)\left[s_b^2 - \frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{m_i}-\frac{1}{M_i}\right)s_i^2\right] + \frac{1}{n}\cdot\frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{m_i}-\frac{1}{M_i}\right)s_i^2$$
$$= \left(\frac{1}{n}-\frac{1}{N}\right)s_b^2 + \frac{1}{nN}\sum_{i=1}^{n}\left(\frac{1}{m_i}-\frac{1}{M_i}\right)s_i^2 = \frac{1-f}{n}s_b^2 + \frac{1}{nN}\sum_{i=1}^{n}\frac{1-f_i}{m_i}s_i^2.$$
Note: For large $N$, $v(\bar{y}_I) = \dfrac{1}{n}s_b^2$.
2nd estimator: It is defined by $\bar{y}_{II} = \dfrac{1}{n}\sum_{i=1}^{n}u_i\bar{y}_{i.}$, where $u_i = \dfrac{M_i}{\bar{M}}$.
By definition,
$$E(\bar{y}_{II}) = E_1 E_2\left[\frac{1}{n}\sum_{i=1}^{n}u_i\bar{y}_{i.}\,\Big|\,i\right] = E_1\left[\frac{1}{n}\sum_{i=1}^{n}u_i E_2(\bar{y}_{i.}\mid i)\right], \ \text{as } u_i \text{ is constant for fixed } i,$$
$$= E_1\left[\frac{1}{n}\sum_{i=1}^{n}u_i\bar{Y}_{i.}\right] = \frac{1}{N}\sum_{i=1}^{N}u_i\bar{Y}_{i.} = \frac{1}{N\bar{M}}\sum_{i=1}^{N}M_i\bar{Y}_{i.} = \bar{Y},$$
which shows that $\bar{y}_{II}$ is an unbiased estimator of $\bar{Y}$.
NM i 1
136 RU Khan
1 1 2 1 N
S b , where S b
n N
2
N 1 i 1
(ui Yi. Y ) 2 .
Consider
1 n 1 n 1 1 2
E1[V2 ( y II i)] E1 ui2 V2 ( yi. i ) E1 ui2 S i
n 2 i 1 n 2 i 1 mi M i
1 n 2 1 1 1 N 2 1 1 2
2 i m
u E1 ( S i2 ) ui S i .
n i 1 i M i nN i 1 mi M i
Therefore,
1 1 1 N 2 1 1 2
V ( y II ) S b 2
n N
ui
nN i 1 mi M i
S i
1 f 2 1 N 2 1 fi 2
n
Sb ui
nN i 1 mi
S i .
Estimation of $V(\bar{y}_{II})$
We have, for the sample,
$${s'_b}^2 = \frac{1}{n-1}\sum_{i=1}^{n}(u_i\bar{y}_{i.}-\bar{y}_{II})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n}(u_i\bar{y}_{i.})^2 - n\bar{y}_{II}^2\right].$$
Further, we know that
$$E({s'_b}^2) = {S'_b}^2 + \frac{1}{N}\sum_{i=1}^{N}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_i^2. \qquad (8.17)$$
Also,
$$E\left[\frac{1}{n}\sum_{i=1}^{n}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)s_i^2\right] = E_1\left[\frac{1}{n}\sum_{i=1}^{n}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)E_2(s_i^2\mid i)\right] = \frac{1}{N}\sum_{i=1}^{N}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)S_i^2.$$
This shows that $\dfrac{1}{n}\sum_{i=1}^{n}u_i^2\left(\dfrac{1}{m_i}-\dfrac{1}{M_i}\right)s_i^2$ is an unbiased estimate of $\dfrac{1}{N}\sum_{i=1}^{N}u_i^2\left(\dfrac{1}{m_i}-\dfrac{1}{M_i}\right)S_i^2$. Therefore, it follows from (8.17) that an unbiased estimate of ${S'_b}^2$ is
$$\hat{S'_b}^2 = {s'_b}^2 - \frac{1}{n}\sum_{i=1}^{n}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)s_i^2.$$
Hence, an unbiased estimate of $V(\bar{y}_{II})$ will be
$$v(\bar{y}_{II}) = \left(\frac{1}{n}-\frac{1}{N}\right)\left[{s'_b}^2 - \frac{1}{n}\sum_{i=1}^{n}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)s_i^2\right] + \frac{1}{n}\cdot\frac{1}{n}\sum_{i=1}^{n}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)s_i^2$$
$$= \left(\frac{1}{n}-\frac{1}{N}\right){s'_b}^2 + \frac{1}{nN}\sum_{i=1}^{n}u_i^2\left(\frac{1}{m_i}-\frac{1}{M_i}\right)s_i^2 = \frac{1-f}{n}{s'_b}^2 + \frac{1}{nN}\sum_{i=1}^{n}u_i^2\,\frac{1-f_i}{m_i}s_i^2.$$
Note: For large $N$, $v(\bar{y}_{II}) = \dfrac{1}{n}{s'_b}^2$.
Corollary: Show that the estimator $\hat{Y} = \dfrac{N}{n}\sum_{i=1}^{n}M_i\bar{y}_{i.} = N\bar{M}\,\bar{y}_{II}$ is an unbiased estimator of the population total $Y$, and that its variance is
$$V(\hat{Y}) = N^2\,\frac{1-f}{n}{S''_b}^2 + \frac{N}{n}\sum_{i=1}^{N}M_i^2\,\frac{1-f_i}{m_i}S_i^2, \quad\text{where } {S''_b}^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(M_i\bar{Y}_{i.}-\frac{Y}{N}\right)^2 = \bar{M}^2{S'_b}^2.$$