Simple Random Sampling Without Replacement (SRSWOR)
Notations:
The following notations will be used in further notes:
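$N$: number of sampling units in the population (population size); $n$: number of sampling units in the sample (sample size).
Population mean: $\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i$; sample mean: $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$.
$S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i-\bar{Y})^2$, $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(Y_i-\bar{Y})^2$, $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2$.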
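There are $\binom{N}{n}$ possible samples of size $n$ from a population of $N$ units, all equally likely under SRSWOR. So the probability of selecting any one of these samples is $1/\binom{N}{n}$.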
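Note that a unit can be selected at any one of the $n$ draws. Let $u_i$ be the $i$th unit selected in the sample. This unit can be selected in the sample either at the first draw, second draw, …, or $n$th draw.
Let $P_j(i)$ denote the probability of selection of $u_i$ at the $j$th draw, $j = 1, 2, \ldots, n$. Then
$$P_1(i) + P_2(i) + \cdots + P_n(i) = \frac{1}{N} + \frac{1}{N} + \cdots + \frac{1}{N} \ (n \text{ times}) = \frac{n}{N}.$$
Now if $u_1, u_2, \ldots, u_n$ are the $n$ units selected in the sample, then the probability of their selection is
$$P(u_1, u_2, \ldots, u_n) = P(u_1)\,P(u_2)\cdots P(u_n).$$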
Note that when the second unit is to be selected, there are $(n-1)$ units left to be selected in the sample from the population of $(N-1)$ units. Similarly, when the third unit is to be selected, there are $(n-2)$ units left to be selected in the sample from the population of $(N-2)$ units, and so on.
If $P(u_1) = \frac{n}{N}$, then
$$P(u_2) = \frac{n-1}{N-1}, \ \ldots, \ P(u_n) = \frac{1}{N-n+1}.$$
Thus
$$P(u_1, u_2, \ldots, u_n) = \frac{n}{N}\cdot\frac{n-1}{N-1}\cdot\frac{n-2}{N-2}\cdots\frac{1}{N-n+1} = \frac{1}{\binom{N}{n}}.$$
Alternative approach:
The probability of drawing a sample in SRSWOR can alternatively be found as follows:
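Let $u_{i(k)}$ denote the $i$th unit drawn at the $k$th draw. Note that the $i$th unit can be any unit out of the $N$ units. Then $s_o = (u_{i(1)}, u_{i(2)}, \ldots, u_{i(n)})$ is an ordered sample in which the order in which the units are drawn, i.e., $u_{i(1)}$ drawn at the first draw, $u_{i(2)}$ drawn at the second draw and so on, is also considered. The probability of drawing $u_{i(k)}$ at the $k$th draw, given the units already drawn in the first $(k-1)$ draws, is
$$P\big(u_{i(k)} \mid u_{i(1)} u_{i(2)} \cdots u_{i(k-1)}\big) = \frac{1}{N-k+1}.$$
So
$$P(s_o) = \prod_{k=1}^{n} \frac{1}{N-k+1} = \frac{(N-n)!}{N!}.$$
The same $n$ units can be drawn in $n!$ different orders, and
Probability of drawing a sample in a given order $= \frac{(N-n)!}{N!}$.
So the probability of drawing a sample in which the order of units in which they are drawn is irrelevant is
$$n!\,\frac{(N-n)!}{N!} = \frac{1}{\binom{N}{n}}.$$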
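As a quick numerical check of the ordered-sample argument, the sketch below multiplies the per-draw probabilities $1/(N-k+1)$ and then accounts for the $n!$ orderings. The function name `prob_unordered_sample` and the values of `N` and `n` are illustrative assumptions, not part of the notes.

```python
from fractions import Fraction
from math import comb, factorial

def prob_unordered_sample(N, n):
    """Probability of one specific unordered SRSWOR sample of size n from N units."""
    p_ordered = Fraction(1)
    for k in range(1, n + 1):
        p_ordered *= Fraction(1, N - k + 1)  # k-th draw picks one of N-k+1 remaining units
    return factorial(n) * p_ordered          # n! orderings give the same set of units

N, n = 10, 3  # hypothetical population and sample sizes
print(prob_unordered_sample(N, n) == Fraction(1, comb(N, n)))  # True
```

Exact fractions are used so that the comparison with $1/\binom{N}{n}$ is not affected by floating-point rounding.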
Alternatively, under SRSWR, let $u_i$ be the $i$th unit selected in the sample. This unit can be selected in the sample either at the first draw, second draw, …, or $n$th draw. At any stage, there are always $N$ units in the population in the case of SRSWR, so the probability of selection of $u_i$ at any stage is $1/N$ for all $i = 1, 2, \ldots, n$. Then the probability of selection of the $n$ units $u_1, u_2, \ldots, u_n$ in the sample is
$$P(u_1, u_2, \ldots, u_n) = P(u_1)\,P(u_2)\cdots P(u_n) = \frac{1}{N}\cdot\frac{1}{N}\cdots\frac{1}{N} = \frac{1}{N^n}.$$
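1. SRSWOR
Let $A_e$ denote the event that a particular unit $u_j$ is not selected at the $e$th draw. The probability of selecting $u_j$ at the $k$th draw is
$$P[\text{selection of } u_j \text{ at the } k\text{th draw}] = P(A_1)\,P(A_2 \mid A_1)\cdots P(A_{k-1} \mid A_1, \ldots, A_{k-2})\,P(\bar{A}_k \mid A_1, A_2, \ldots, A_{k-1})$$
$$= \left(1-\frac{1}{N}\right)\left(1-\frac{1}{N-1}\right)\left(1-\frac{1}{N-2}\right)\cdots\left(1-\frac{1}{N-k+2}\right)\frac{1}{N-k+1}$$
$$= \frac{N-1}{N}\cdot\frac{N-2}{N-1}\cdots\frac{N-k+1}{N-k+2}\cdot\frac{1}{N-k+1} = \frac{1}{N}.$$
Sampling Theory | Chapter 2 | Simple Random Sampling | Shalabh, IIT Kanpur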
2. SRSWR
$$P[\text{selection of } u_j \text{ at the } k\text{th draw}] = \frac{1}{N}.$$
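The telescoping product for SRSWOR can be checked exactly. The following is a minimal sketch; the function name `prob_selected_at_draw` is a hypothetical helper introduced only for illustration.

```python
from fractions import Fraction

def prob_selected_at_draw(N, k):
    """P(a given unit is first selected at the k-th draw) under SRSWOR."""
    p = Fraction(1)
    for e in range(1, k):
        p *= 1 - Fraction(1, N - e + 1)   # unit not selected at draw e
    return p * Fraction(1, N - k + 1)     # unit selected at draw k

N = 8  # hypothetical population size
print(all(prob_selected_at_draw(N, k) == Fraction(1, N) for k in range(1, N + 1)))  # True
```

Under SRSWR no computation is needed, since every draw is made from the full population of $N$ units.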
Let us consider the sample arithmetic mean $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ as an estimator of the population mean $\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i$.
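SRSWOR
$$E(\bar{y}) = \frac{1}{n}E\Big(\sum_{i=1}^{n} y_i\Big) = \frac{1}{n}E(t) = \frac{1}{n}\left[\frac{1}{\binom{N}{n}}\sum_{i=1}^{\binom{N}{n}} t_i\right],$$
where $t_i$ is the total of the observations in the $i$th of the $\binom{N}{n}$ possible samples.
When $n$ units are sampled from $N$ units without replacement, each unit of the population can occur with other units selected out of the remaining $N-1$ units in the population, and so each unit occurs in $\binom{N-1}{n-1}$ of the $\binom{N}{n}$ possible samples.
So
$$\sum_{i=1}^{\binom{N}{n}} t_i = \binom{N-1}{n-1}\sum_{i=1}^{N} Y_i$$
and
$$E(\bar{y}) = \frac{1}{n}\,\frac{\binom{N-1}{n-1}}{\binom{N}{n}}\sum_{i=1}^{N} Y_i = \frac{1}{N}\sum_{i=1}^{N} Y_i = \bar{Y}.$$
Alternatively, with $P_j(i) = 1/N$ denoting the probability of selecting the $i$th unit at the $j$th draw,
$$E(\bar{y}) = \frac{1}{n}\sum_{j=1}^{n} E(y_j) = \frac{1}{n}\sum_{j=1}^{n}\left[\sum_{i=1}^{N} Y_i P_j(i)\right] = \frac{1}{n}\sum_{j=1}^{n}\left[\sum_{i=1}^{N} Y_i\cdot\frac{1}{N}\right] = \frac{1}{n}\sum_{j=1}^{n}\bar{Y} = \bar{Y}.$$
Thus $\bar{y}$ is an unbiased estimator of the population mean under SRSWOR.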
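SRSWR
$$E(\bar{y}) = \frac{1}{n}E\Big(\sum_{i=1}^{n} y_i\Big) = \frac{1}{n}\sum_{i=1}^{n} E(y_i) = \frac{1}{n}\sum_{i=1}^{n}\left(Y_1 P_1 + \cdots + Y_N P_N\right) = \frac{1}{n}\sum_{i=1}^{n}\bar{Y} = \bar{Y},$$
where $P_i = \frac{1}{N}$ for all $i = 1, 2, \ldots, N$ is the probability of selection of a unit. Thus $\bar{y}$ is an unbiased estimator of the population mean under SRSWR also.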
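Unbiasedness can also be confirmed by brute force on a small population: enumerate every equally likely sample and average the sample means. This is only an illustrative sketch; the population `Y` and the helper `expected_sample_mean` are assumptions made for the example.

```python
from fractions import Fraction
from itertools import combinations, product

def expected_sample_mean(Y, n, replace):
    """Average the sample mean over all equally likely samples of size n."""
    # SRSWR: all N**n ordered samples; SRSWOR: all C(N, n) unordered samples.
    samples = product(Y, repeat=n) if replace else combinations(Y, n)
    means = [Fraction(sum(s), n) for s in samples]
    return sum(means) / len(means)

Y = [3, 7, 8, 12, 15]  # toy population with mean 9
print(expected_sample_mean(Y, 2, replace=False))  # 9
print(expected_sample_mean(Y, 2, replace=True))   # 9
```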
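$$V(\bar{y}) = E(\bar{y}-\bar{Y})^2 = E\left[\frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{Y})\right]^2$$
$$= \frac{1}{n^2}\sum_{i=1}^{n}E(y_i-\bar{Y})^2 + \frac{1}{n^2}\sum_{i\neq j}^{n}E(y_i-\bar{Y})(y_j-\bar{Y})$$
$$= \frac{1}{n^2}\,n\sigma^2 + \frac{K}{n^2} = \frac{N-1}{Nn}S^2 + \frac{K}{n^2},$$
where $K = \sum_{i\neq j}^{n} E(y_i-\bar{Y})(y_j-\bar{Y})$, assuming that each observation has variance $\sigma^2$. Now we find $K$ under the setups of SRSWOR and SRSWR.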
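SRSWOR
$$K = \sum_{i\neq j}^{n} E(y_i-\bar{Y})(y_j-\bar{Y}).$$
Consider
$$E(y_i-\bar{Y})(y_j-\bar{Y}) = \frac{1}{N(N-1)}\sum_{k\neq \ell}^{N}(Y_k-\bar{Y})(Y_\ell-\bar{Y}).$$
Since
$$\left[\sum_{k=1}^{N}(Y_k-\bar{Y})\right]^2 = \sum_{k=1}^{N}(Y_k-\bar{Y})^2 + \sum_{k\neq \ell}^{N}(Y_k-\bar{Y})(Y_\ell-\bar{Y}),$$
we have $0 = (N-1)S^2 + \sum_{k\neq \ell}^{N}(Y_k-\bar{Y})(Y_\ell-\bar{Y})$, so that
$$E(y_i-\bar{Y})(y_j-\bar{Y}) = \frac{1}{N(N-1)}\left[-(N-1)S^2\right] = -\frac{S^2}{N}.$$
Thus $K = -n(n-1)\frac{S^2}{N}$ and
$$V(\bar{y}_{WOR}) = \frac{N-1}{Nn}S^2 - \frac{1}{n^2}\,n(n-1)\frac{S^2}{N} = \frac{N-n}{Nn}S^2.$$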
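SRSWR
$$K = \sum_{i\neq j}^{n} E(y_i-\bar{Y})(y_j-\bar{Y}) = \sum_{i\neq j}^{n} E(y_i-\bar{Y})\,E(y_j-\bar{Y}) = 0,$$
because the $i$th and $j$th draws ($i \neq j$) are independent. Thus
$$V(\bar{y}_{WR}) = \frac{N-1}{Nn}S^2.$$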
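Both variance formulas can be verified exactly on a toy population by enumerating all samples. The helpers `pop_S2` and `var_of_sample_mean` and the population `Y` below are hypothetical names and values chosen for this sketch.

```python
from fractions import Fraction
from itertools import combinations, product

def pop_S2(Y):
    """S^2 = sum (Y_i - Ybar)^2 / (N - 1)."""
    N = len(Y)
    Ybar = Fraction(sum(Y), N)
    return sum((y - Ybar) ** 2 for y in Y) / (N - 1)

def var_of_sample_mean(Y, n, replace):
    """Exact variance of the sample mean over all equally likely samples."""
    Ybar = Fraction(sum(Y), len(Y))
    samples = list(product(Y, repeat=n) if replace else combinations(Y, n))
    return sum((Fraction(sum(s), n) - Ybar) ** 2 for s in samples) / len(samples)

Y, n = [3, 7, 8, 12, 15], 2
N = len(Y)
print(var_of_sample_mean(Y, n, False) == Fraction(N - n, N * n) * pop_S2(Y))  # True
print(var_of_sample_mean(Y, n, True) == Fraction(N - 1, N * n) * pop_S2(Y))   # True
```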
It is to be noted that if $N$ is infinite (large enough), then
$$V(\bar{y}) = \frac{S^2}{n}$$
in both the cases of SRSWOR and SRSWR. So the factor $\frac{N-n}{N}$ is responsible for changing the variance of $\bar{y}$ when the sample is drawn from a finite population in comparison to an infinite population. This is why $\frac{N-n}{N}$ is called the finite population correction (fpc). It may be noted that
$$V(\bar{y}_{WOR}) = \frac{N-n}{Nn}S^2,$$
$$V(\bar{y}_{WR}) = \frac{N-1}{Nn}S^2 = \frac{N-n}{Nn}S^2 + \frac{n-1}{Nn}S^2 = V(\bar{y}_{WOR}) + \text{a positive quantity}.$$
Thus $V(\bar{y}_{WR}) > V(\bar{y}_{WOR})$ for $n > 1$, so SRSWOR is more efficient than SRSWR.
To estimate the variance of $\bar{y}$ on the basis of a sample, an estimator of $S^2$ (or equivalently $\sigma^2$) is needed. Consider $s^2$ as an estimator of $S^2$ (or $\sigma^2$) and we investigate its biasedness for $S^2$ in the cases of SRSWOR and SRSWR,
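$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left[(y_i-\bar{Y})-(\bar{y}-\bar{Y})\right]^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n}(y_i-\bar{Y})^2 - n(\bar{y}-\bar{Y})^2\right]$$
$$E(s^2) = \frac{1}{n-1}\left[\sum_{i=1}^{n}E(y_i-\bar{Y})^2 - nE(\bar{y}-\bar{Y})^2\right] = \frac{1}{n-1}\left[\sum_{i=1}^{n}Var(y_i) - n\,Var(\bar{y})\right] = \frac{n}{n-1}\left[\sigma^2 - Var(\bar{y})\right].$$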
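In case of SRSWOR,
$$V(\bar{y}_{WOR}) = \frac{N-n}{Nn}S^2$$
and so
$$E(s^2) = \frac{n}{n-1}\left[\sigma^2 - \frac{N-n}{Nn}S^2\right] = \frac{n}{n-1}\left[\frac{N-1}{N}S^2 - \frac{N-n}{Nn}S^2\right] = S^2.$$
In case of SRSWR,
$$V(\bar{y}_{WR}) = \frac{N-1}{Nn}S^2$$
and so
$$E(s^2) = \frac{n}{n-1}\left[\sigma^2 - \frac{N-1}{Nn}S^2\right] = \frac{n}{n-1}\left[\frac{N-1}{N}S^2 - \frac{N-1}{Nn}S^2\right] = \frac{N-1}{N}S^2 = \sigma^2.$$
Hence
$$E(s^2) = \begin{cases} S^2 & \text{in SRSWOR} \\ \sigma^2 & \text{in SRSWR.} \end{cases}$$
An unbiased estimate of $Var(\bar{y})$ is
$$\hat{V}(\bar{y}_{WOR}) = \frac{N-n}{Nn}s^2 \ \text{ in case of SRSWOR and}$$
$$\hat{V}(\bar{y}_{WR}) = \frac{N-1}{Nn}\cdot\frac{N}{N-1}s^2 = \frac{s^2}{n} \ \text{ in case of SRSWR.}$$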
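The expectation of $s^2$ under the two schemes can be checked by exhaustive enumeration on a small population. This is a sketch under the assumption of a toy population `Y`; `expected_s2` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import combinations, product

def expected_s2(Y, n, replace):
    """Average of s^2 = sum (y_i - ybar)^2 / (n - 1) over all equally likely samples."""
    samples = list(product(Y, repeat=n) if replace else combinations(Y, n))
    total = Fraction(0)
    for s in samples:
        ybar = Fraction(sum(s), n)
        total += sum((y - ybar) ** 2 for y in s) / (n - 1)
    return total / len(samples)

Y = [3, 7, 8, 12, 15]
N = len(Y)
Ybar = Fraction(sum(Y), N)
S2 = sum((y - Ybar) ** 2 for y in Y) / (N - 1)    # population S^2
sigma2 = Fraction(N - 1, N) * S2                   # population sigma^2
print(expected_s2(Y, 2, replace=False) == S2)      # True
print(expected_s2(Y, 2, replace=True) == sigma2)   # True
```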
Standard errors
The standard error of $\bar{y}$ is defined as $\sqrt{Var(\bar{y})}$.
In order to estimate the standard error, one simple option is to consider the square root of the estimate of the variance of the sample mean.
• under SRSWOR, a possible estimator is $\hat{\sigma}(\bar{y}) = \sqrt{\frac{N-n}{Nn}}\,s$.
• under SRSWR, a possible estimator is $\hat{\sigma}(\bar{y}) = \sqrt{\frac{N-1}{Nn}\cdot\frac{N}{N-1}}\,s = \frac{s}{\sqrt{n}}$.
It is to be noted that this estimator does not possess the same properties as $\widehat{Var}(\bar{y})$. The reason is that if $\hat{\theta}$ is an estimator of $\theta$, then $\sqrt{\hat{\theta}}$ is not necessarily an estimator of $\sqrt{\theta}$ with the same properties.
In fact, $\hat{\sigma}(\bar{y})$ is a negatively biased estimator under SRSWOR.
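The negative bias of the estimated standard error under SRSWOR can be seen numerically on a toy population. The sketch below averages $\sqrt{(N-n)/(Nn)}\,s$ over all possible samples and compares it with the true standard error; the function `se_bias_check` and the data are illustrative assumptions.

```python
from itertools import combinations
from math import sqrt

def se_bias_check(Y, n):
    """Return (expected estimated SE, true SE) of the sample mean under SRSWOR."""
    N = len(Y)
    Ybar = sum(Y) / N
    S2 = sum((y - Ybar) ** 2 for y in Y) / (N - 1)
    factor = sqrt((N - n) / (N * n))
    true_se = factor * sqrt(S2)
    estimates = []
    for s in combinations(Y, n):
        ybar = sum(s) / n
        s2 = sum((y - ybar) ** 2 for y in s) / (n - 1)
        estimates.append(factor * sqrt(s2))   # estimated SE from this sample
    return sum(estimates) / len(estimates), true_se

expected_se, true_se = se_bias_check([3, 7, 8, 12, 15], 2)
print(expected_se < true_se)  # True: the estimator underestimates on average
```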
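The approximate expressions for the large-$N$ case are as follows. Consider $s$ as an estimator of $S$. Let
$$s^2 = S^2 + \varepsilon \ \text{ with } E(\varepsilon) = 0, \ E(\varepsilon^2) = Var(s^2).$$
Write
$$s = (S^2+\varepsilon)^{1/2} = S\left(1+\frac{\varepsilon}{S^2}\right)^{1/2} = S\left(1 + \frac{\varepsilon}{2S^2} - \frac{\varepsilon^2}{8S^4} + \cdots\right)$$
assuming $\varepsilon$ will be small as compared to $S^2$ and, as $n$ becomes large, the probability of such an event approaches one. Neglecting the powers of $\varepsilon$ higher than two and taking expectation, we have
$$E(s) = \left[1 - \frac{Var(s^2)}{8S^4}\right]S$$
where
$$Var(s^2) = \frac{2S^4}{n-1}\left[1 + \frac{n-1}{2n}(\beta_2 - 3)\right] \ \text{ for large } N,$$
$$\mu_j = \frac{1}{N}\sum_{i=1}^{N}(Y_i-\bar{Y})^j,$$
$$\beta_2 = \frac{\mu_4}{S^4}: \ \text{coefficient of kurtosis.}$$
Thus
$$E(s) = S\left[1 - \frac{1}{4(n-1)} - \frac{\beta_2-3}{8n}\right],$$
$$Var(s) \approx \frac{Var(s^2)}{4S^2} = \frac{S^2}{2(n-1)}\left[1 + \frac{n-1}{2n}(\beta_2 - 3)\right].$$
Both $Var(s)$ and $Var(s^2)$ are inflated due to non-normality by the same inflation factor
$$\left[1 + \frac{n-1}{2n}(\beta_2 - 3)\right],$$
which equals 1 when $\beta_2 = 3$ (as for a normal population), and this does not depend on the coefficient of skewness.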
This is an important result to be kept in mind while determining the sample size in which it is assumed that $S^2$ is known. If the inflation factor is ignored and the population is non-normal, then the reliability on $s^2$ may be misleading.
Alternative approach:
The results for the unbiasedness property and the variance of the sample mean can also be proved in an
alternative way as follows:
(i) SRSWOR
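With the $i$th unit of the population, we associate a random variable $a_i$ defined as follows:
$$a_i = \begin{cases} 1, & \text{if the } i\text{th unit occurs in the sample} \\ 0, & \text{if the } i\text{th unit does not occur in the sample} \end{cases} \quad (i = 1, 2, \ldots, N).$$
Then,
$$E(a_i) = \frac{n}{N}, \quad E(a_i^2) = \frac{n}{N}, \quad E(a_i a_j) = \frac{n(n-1)}{N(N-1)} \ (i \neq j),$$
$$Var(a_i) = E(a_i^2) - [E(a_i)]^2 = \frac{n(N-n)}{N^2}, \quad i = 1, 2, \ldots, N,$$
$$Cov(a_i, a_j) = E(a_i a_j) - E(a_i)E(a_j) = -\frac{n(N-n)}{N^2(N-1)}, \quad i \neq j = 1, 2, \ldots, N.$$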
We can rewrite the sample mean as
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{N} a_i Y_i.$$
Then
$$E(\bar{y}) = \frac{1}{n}\sum_{i=1}^{N} E(a_i)\,Y_i = \bar{Y}$$
and
$$Var(\bar{y}) = \frac{1}{n^2}Var\Big(\sum_{i=1}^{N} a_i Y_i\Big) = \frac{1}{n^2}\left[\sum_{i=1}^{N} Var(a_i)\,Y_i^2 + \sum_{i\neq j}^{N} Cov(a_i, a_j)\,Y_i Y_j\right].$$
Substituting the values of $Var(a_i)$ and $Cov(a_i, a_j)$ in the expression of $Var(\bar{y})$ and simplifying, we get
$$Var(\bar{y}) = \frac{N-n}{Nn}S^2.$$
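The moments of the inclusion indicators can be checked by listing every SRSWOR sample of unit labels. The following is a minimal sketch; `indicator_moments` is a hypothetical helper, and units 0 and 1 stand in for any pair $i \neq j$ by symmetry.

```python
from fractions import Fraction
from itertools import combinations

def indicator_moments(N, n):
    """E(a_i), Var(a_i) and Cov(a_i, a_j) for SRSWOR inclusion indicators."""
    samples = list(combinations(range(N), n))
    M = len(samples)
    e_a = Fraction(sum(0 in s for s in samples), M)               # P(unit 0 in sample)
    e_ab = Fraction(sum(0 in s and 1 in s for s in samples), M)   # P(units 0 and 1 in sample)
    var_a = e_a - e_a ** 2        # a_i^2 = a_i for a 0/1 variable
    cov_ab = e_ab - e_a ** 2      # E(a_0) = E(a_1) by symmetry
    return e_a, var_a, cov_ab

N, n = 6, 3
e_a, var_a, cov_ab = indicator_moments(N, n)
print(e_a == Fraction(n, N))                                   # True
print(var_a == Fraction(n * (N - n), N ** 2))                  # True
print(cov_ab == -Fraction(n * (N - n), N ** 2 * (N - 1)))      # True
```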
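To show that $E(s^2) = S^2$, consider
$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right] = \frac{1}{n-1}\left[\sum_{i=1}^{N} a_i Y_i^2 - n\bar{y}^2\right].$$
Hence, taking expectation,
$$E(s^2) = \frac{1}{n-1}\left[\sum_{i=1}^{N} E(a_i)\,Y_i^2 - n\left\{Var(\bar{y}) + \bar{Y}^2\right\}\right].$$
Substituting the values of $E(a_i)$ and $Var(\bar{y})$ in this expression and simplifying, we get $E(s^2) = S^2$.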
(ii) SRSWR
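Let a random variable $a_i$ associated with the $i$th unit of the population denote the number of times the $i$th unit occurs in the sample, $i = 1, 2, \ldots, N$. So $a_i$ assumes values $0, 1, 2, \ldots, n$. The joint distribution of $a_1, a_2, \ldots, a_N$ is the multinomial distribution given by
$$P(a_1, a_2, \ldots, a_N) = \frac{n!}{\prod_{i=1}^{N} a_i!}\cdot\frac{1}{N^n}, \quad \text{where } \sum_{i=1}^{N} a_i = n.$$
For this multinomial distribution, we have
$$E(a_i) = \frac{n}{N}, \quad Var(a_i) = \frac{n(N-1)}{N^2}, \quad i = 1, 2, \ldots, N,$$
$$Cov(a_i, a_j) = -\frac{n}{N^2}, \quad i \neq j = 1, 2, \ldots, N.$$
We rewrite the sample mean as
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{N} a_i Y_i.$$
Hence, taking the expectation of $\bar{y}$ and substituting the value of $E(a_i) = n/N$, we obtain that
$$E(\bar{y}) = \bar{Y}.$$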
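Further,
$$Var(\bar{y}) = \frac{1}{n^2}\left[\sum_{i=1}^{N} Var(a_i)\,Y_i^2 + \sum_{i\neq j}^{N} Cov(a_i, a_j)\,Y_i Y_j\right]$$
and, substituting the values of $Var(a_i)$ and $Cov(a_i, a_j)$ and simplifying,
$$Var(\bar{y}) = \frac{N-1}{Nn}S^2.$$
To show that $E(s^2) = \frac{N-1}{N}S^2 = \sigma^2$ in SRSWR, consider
$$E(s^2) = \frac{1}{n-1}\left[\sum_{i=1}^{N} E(a_i)\,Y_i^2 - n\left\{Var(\bar{y}) + \bar{Y}^2\right\}\right]$$
$$= \frac{1}{n-1}\left[\frac{n}{N}\sum_{i=1}^{N} Y_i^2 - n\cdot\frac{N-1}{nN}S^2 - n\bar{Y}^2\right]$$
$$= \frac{(n-1)(N-1)}{(n-1)N}S^2 = \frac{N-1}{N}S^2 = \sigma^2.$$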
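The population total is
$$Y_T = \sum_{i=1}^{N} Y_i = N\bar{Y},$$
and it is estimated by
$$\hat{Y}_T = N\hat{\bar{Y}} = N\bar{y}.$$
Obviously
$$E(\hat{Y}_T) = N E(\bar{y}) = N\bar{Y},$$
$$Var(\hat{Y}_T) = N^2\,Var(\bar{y}) = \begin{cases} \dfrac{N(N-n)}{n}S^2 & \text{for SRSWOR} \\[2mm] \dfrac{N(N-1)}{n}S^2 & \text{for SRSWR} \end{cases}$$
and the estimates of variance of $\hat{Y}_T$ are
$$\widehat{Var}(\hat{Y}_T) = \begin{cases} \dfrac{N(N-n)}{n}s^2 & \text{for SRSWOR} \\[2mm] \dfrac{N^2}{n}s^2 & \text{for SRSWR.} \end{cases}$$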
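A small sketch of how the total and its estimated variance might be computed from a single sample is given below. The function name `estimate_total`, its `replace` flag and the sample values are assumptions for illustration; the variance estimates are $N^2$ times the estimated variance of the sample mean under each scheme.

```python
def estimate_total(sample, N, replace=False):
    """Return (estimated population total, estimated variance of that estimate)."""
    n = len(sample)
    ybar = sum(sample) / n
    s2 = sum((y - ybar) ** 2 for y in sample) / (n - 1)
    total_hat = N * ybar
    if replace:
        var_hat = N ** 2 * s2 / n          # N^2 * (s^2 / n) for SRSWR
    else:
        var_hat = N * (N - n) * s2 / n     # N^2 * (N - n) / (N n) * s^2 for SRSWOR
    return total_hat, var_hat

total_hat, var_hat = estimate_total([2, 4, 6], N=10)
print(total_hat)  # 40.0
```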
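Assuming that the population is normally distributed, $\dfrac{\bar{y}-\bar{Y}}{\sqrt{Var(\bar{y})}}$ follows $N(0,1)$ when $\sigma^2$ is known. If $\sigma^2$ is unknown and is estimated from the sample, then $\dfrac{\bar{y}-\bar{Y}}{\sqrt{\widehat{Var}(\bar{y})}}$ follows a $t$-distribution with $(n-1)$ degrees of freedom. When $\sigma^2$ is known, then the $100(1-\alpha)\%$ confidence interval is given by
$$P\left[-Z_{\alpha/2} \le \frac{\bar{y}-\bar{Y}}{\sqrt{Var(\bar{y})}} \le Z_{\alpha/2}\right] = 1-\alpha$$
or
$$P\left[\bar{y} - Z_{\alpha/2}\sqrt{Var(\bar{y})} \le \bar{Y} \le \bar{y} + Z_{\alpha/2}\sqrt{Var(\bar{y})}\right] = 1-\alpha$$
and the confidence limits are
$$\left(\bar{y} - Z_{\alpha/2}\sqrt{Var(\bar{y})}, \ \bar{y} + Z_{\alpha/2}\sqrt{Var(\bar{y})}\right)$$
where $Z_{\alpha/2}$ denotes the upper $\frac{\alpha}{2}\%$ point of the $N(0,1)$ distribution. Similarly, when $\sigma^2$ is unknown, the $100(1-\alpha)\%$ confidence interval is given by
$$P\left[-t_{\alpha/2} \le \frac{\bar{y}-\bar{Y}}{\sqrt{\widehat{Var}(\bar{y})}} \le t_{\alpha/2}\right] = 1-\alpha$$
or
$$P\left[\bar{y} - t_{\alpha/2}\sqrt{\widehat{Var}(\bar{y})} \le \bar{Y} \le \bar{y} + t_{\alpha/2}\sqrt{\widehat{Var}(\bar{y})}\right] = 1-\alpha$$
and the confidence limits are
$$\left(\bar{y} - t_{\alpha/2}\sqrt{\widehat{Var}(\bar{y})}, \ \bar{y} + t_{\alpha/2}\sqrt{\widehat{Var}(\bar{y})}\right)$$
where $t_{\alpha/2}$ denotes the upper $\frac{\alpha}{2}\%$ point of the $t$-distribution with $(n-1)$ degrees of freedom.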
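For the known-variance case, the interval can be computed with the Python standard library alone. The sketch below assumes SRSWOR, a known $S^2$ and a normal population; the function name `mean_ci_known_S2` and the numbers are hypothetical. The $t$-based interval would need a $t$ quantile, which the standard library does not provide, so it is not shown.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci_known_S2(ybar, S2, N, n, alpha=0.05):
    """100(1 - alpha)% confidence limits for the population mean under SRSWOR."""
    z = NormalDist().inv_cdf(1 - alpha / 2)        # upper alpha/2 point of N(0, 1)
    half_width = z * sqrt((N - n) / (N * n) * S2)  # z * sqrt(Var(ybar))
    return ybar - half_width, ybar + half_width

lower, upper = mean_ci_known_S2(ybar=50.0, S2=100.0, N=1000, n=100)
print(round(lower, 2), round(upper, 2))
```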
Now we find the sample size under different criteria assuming that the samples have been drawn using
SRSWOR. The case for SRSWR can be derived similarly.
1. Prespecified variance
The sample size is to be determined such that the variance of $\bar{y}$ should not exceed a given value, say $V$. In this case, find $n$ such that
$$Var(\bar{y}) \le V$$
$$\text{or } \ \frac{N-n}{Nn}S^2 \le V$$
$$\text{or } \ \frac{1}{n} - \frac{1}{N} \le \frac{V}{S^2}$$
$$\text{or } \ \frac{1}{n} - \frac{1}{N} \le \frac{1}{n_e}$$
$$\text{or } \ n \ge \frac{n_e}{1 + \dfrac{n_e}{N}}$$
where $n_e = \dfrac{S^2}{V}$.
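It may be noted here that $n_e$ can be known only when $S^2$ is known. This compels us to assume that $S$ is known. The same requirement will also be seen in the other cases.
The smallest sample size needed in this case is $n_{smallest} = \dfrac{n_e}{1 + n_e/N}$. If $N$ is large, then the required $n$ is $n \ge n_e$ and $n_{smallest} = n_e$.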
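The prespecified-variance rule translates directly into a small calculation. In the sketch below the function name `n_for_variance` is hypothetical, and rounding up to the next integer is an added assumption, since a sample size must be a whole number.

```python
from math import ceil

def n_for_variance(S2, V, N=None):
    """Smallest n with Var(ybar) <= V under SRSWOR; N=None means N is large."""
    n_e = S2 / V
    if N is None:
        return ceil(n_e)                 # large-N case: n >= n_e
    return ceil(n_e / (1 + n_e / N))     # finite-population case

print(n_for_variance(S2=100, V=1, N=1000))  # 91
print(n_for_variance(S2=100, V=1))          # 100
```

With $N = 1000$, $n = 91$ gives $Var(\bar{y}) = \frac{909}{91000}\cdot 100 \approx 0.999 \le 1$, while $n = 90$ gives about $1.011$.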
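2. Pre-specified estimation error
It may be required that the sample mean $\bar{y}$ should not differ from the population mean $\bar{Y}$ by more than a specified amount of absolute estimation error $e$, i.e., $|\bar{y}-\bar{Y}| \le e$, where $e$ is a small quantity. Such a requirement can be satisfied by associating a probability $(1-\alpha)$ with it and can be expressed as
$$P\left[\frac{|\bar{y}-\bar{Y}|}{\sqrt{Var(\bar{y})}} \le \frac{e}{\sqrt{Var(\bar{y})}}\right] = 1-\alpha$$
which implies, assuming $\bar{y}$ to be normally distributed, that
$$\frac{e}{\sqrt{Var(\bar{y})}} = Z_{\alpha/2}$$
$$\text{or } \ Z_{\alpha/2}^2\,Var(\bar{y}) = e^2$$
$$\text{or } \ Z_{\alpha/2}^2\,\frac{N-n}{Nn}S^2 = e^2$$
$$\text{or } \ n = \frac{\left(\dfrac{Z_{\alpha/2}S}{e}\right)^2}{1 + \dfrac{1}{N}\left(\dfrac{Z_{\alpha/2}S}{e}\right)^2}.$$
If $N$ is large, then
$$n = \left(\frac{Z_{\alpha/2}S}{e}\right)^2.$$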
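A minimal sketch of the sample size for a prespecified absolute error follows, assuming $S$ is known and the sample mean is normally distributed. The function name `n_for_abs_error` is hypothetical, and rounding up is an added assumption.

```python
from math import ceil
from statistics import NormalDist

def n_for_abs_error(S, e, N=None, alpha=0.05):
    """Sample size so that |ybar - Ybar| <= e with probability 1 - alpha (SRSWOR)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 point of N(0, 1)
    n0 = (z * S / e) ** 2                     # large-N sample size
    n = n0 if N is None else n0 / (1 + n0 / N)
    return ceil(n)

print(n_for_abs_error(S=10, e=2))           # 97
print(n_for_abs_error(S=10, e=2, N=1000))   # 88
```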
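3. Pre-specified width of the confidence interval
If the requirement is that the width of the confidence interval of $\bar{y}$ with confidence coefficient $(1-\alpha)$ should not exceed a prespecified amount $W$, then the sample size $n$ is determined such that
$$2Z_{\alpha/2}\sqrt{Var(\bar{y})} \le W$$
$$\text{or } \ 2Z_{\alpha/2}\sqrt{\frac{N-n}{Nn}}\,S \le W$$
$$\text{or } \ 4Z_{\alpha/2}^2\left(\frac{1}{n} - \frac{1}{N}\right)S^2 \le W^2$$
$$\text{or } \ \frac{1}{n} \le \frac{1}{N} + \frac{W^2}{4Z_{\alpha/2}^2 S^2}$$
$$\text{or } \ n \ge \frac{\dfrac{4Z_{\alpha/2}^2 S^2}{W^2}}{1 + \dfrac{4Z_{\alpha/2}^2 S^2}{NW^2}}.$$
The smallest sample size needed in this case is
$$n_{smallest} = \frac{\dfrac{4Z_{\alpha/2}^2 S^2}{W^2}}{1 + \dfrac{4Z_{\alpha/2}^2 S^2}{NW^2}}.$$
If $N$ is large, then
$$n \ge \frac{4Z_{\alpha/2}^2 S^2}{W^2} \ \text{ and } \ n_{smallest} = \frac{4Z_{\alpha/2}^2 S^2}{W^2}.$$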
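4. Pre-specified coefficient of variation
If it is desired that the coefficient of variation of $\bar{y}$ should not exceed a given or pre-specified value of the coefficient of variation, say $C_0$, then the required sample size $n$ is to be determined such that
$$CV(\bar{y}) \le C_0$$
$$\text{or } \ \frac{\sqrt{Var(\bar{y})}}{\bar{Y}} \le C_0$$
$$\text{or } \ \frac{N-n}{Nn}\cdot\frac{S^2}{\bar{Y}^2} \le C_0^2$$
$$\text{or } \ \frac{1}{n} - \frac{1}{N} \le \frac{C_0^2}{C^2}$$
$$\text{or } \ n \ge \frac{\dfrac{C^2}{C_0^2}}{1 + \dfrac{C^2}{NC_0^2}}$$
is the required sample size, where $C = \dfrac{S}{\bar{Y}}$ is the population coefficient of variation.
The smallest sample size needed in this case is
$$n_{smallest} = \frac{\dfrac{C^2}{C_0^2}}{1 + \dfrac{C^2}{NC_0^2}}.$$
If $N$ is large, then
$$n \ge \frac{C^2}{C_0^2} \ \text{ and } \ n_{smallest} = \frac{C^2}{C_0^2}.$$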
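The coefficient-of-variation criterion can be sketched in the same way, assuming the population coefficient of variation $C$ is known. The function name `n_for_cv` is hypothetical, and rounding up to an integer is an added assumption.

```python
from math import ceil

def n_for_cv(C, C0, N=None):
    """Smallest n with CV(ybar) <= C0 under SRSWOR; N=None means N is large."""
    n0 = (C / C0) ** 2                          # large-N sample size C^2 / C0^2
    n = n0 if N is None else n0 / (1 + n0 / N)  # finite-population adjustment
    return ceil(n)

print(n_for_cv(C=0.5, C0=0.06))          # 70
print(n_for_cv(C=0.5, C0=0.06, N=500))   # 61
```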
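5. Pre-specified relative error
When it is required that the relative estimation error $\dfrac{|\bar{y}-\bar{Y}|}{\bar{Y}}$ should not exceed $R$ with probability $(1-\alpha)$, then such a requirement can be satisfied by expressing it as
$$P\left[\frac{|\bar{y}-\bar{Y}|}{\sqrt{Var(\bar{y})}} \le \frac{R\bar{Y}}{\sqrt{Var(\bar{y})}}\right] = 1-\alpha.$$
Assuming the population to be normally distributed, $\bar{y}$ follows $N\left(\bar{Y}, \dfrac{N-n}{Nn}S^2\right)$. So
$$\frac{R\bar{Y}}{\sqrt{Var(\bar{y})}} = Z_{\alpha/2}$$
$$\text{or } \ Z_{\alpha/2}^2\,\frac{N-n}{Nn}S^2 = R^2\bar{Y}^2$$
$$\text{or } \ \frac{1}{n} - \frac{1}{N} = \frac{R^2}{C^2 Z_{\alpha/2}^2}$$
$$\text{or } \ n = \frac{\left(\dfrac{Z_{\alpha/2}C}{R}\right)^2}{1 + \dfrac{1}{N}\left(\dfrac{Z_{\alpha/2}C}{R}\right)^2}$$
where $C = \dfrac{S}{\bar{Y}}$ is the population coefficient of variation and should be known.
If $N$ is large, then
$$n = \left(\frac{Z_{\alpha/2}C}{R}\right)^2.$$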
6. Pre-specified cost
Let an amount of money $C$ be designated for the sample survey to collect $n$ observations, $C_0$ be the overhead cost and $C_1$ be the cost of collection of one unit in the sample. Then the total cost $C$ can be expressed as
$$C = C_0 + nC_1$$
$$\text{or } \ n = \frac{C - C_0}{C_1}$$
is the required sample size.
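As a small illustration of the cost rule, the sketch below computes the number of units a budget allows. The function name `n_for_budget` and the figures are hypothetical, and rounding down to a whole number of units is an added assumption so that the total cost does not exceed $C$.

```python
def n_for_budget(C, C0, C1):
    """Largest whole n with C0 + n * C1 <= C."""
    return int((C - C0) // C1)

print(n_for_budget(C=10000, C0=2000, C1=25))  # 320
print(n_for_budget(C=1000, C0=100, C1=40))    # 22
```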