Project2_Stratified_Sampling

Stratified Random Sampling
• A stratified random sample is obtained by separating the
population elements into non-overlapping groups, called
strata
• A simple random sample is then selected independently from each stratum
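The two steps above can be sketched in R. This is only an illustration: the data frame `fish`, its `habitat` column, and the per-stratum sample sizes `n_h` are hypothetical and do not come from the slides.

```r
set.seed(1)  # for reproducibility of this illustration only

# Hypothetical population: 30 fish, each belonging to exactly one habitat (stratum)
fish <- data.frame(
  id      = 1:30,
  habitat = rep(c("riffle", "run", "pool"), times = c(8, 10, 12))
)

# Step 1: separate the population into non-overlapping strata
strata <- split(fish, fish$habitat)

# Step 2: draw a simple random sample (without replacement) within each stratum
n_h <- c(riffle = 3, run = 3, pool = 4)  # assumed per-stratum sample sizes
samp <- do.call(rbind, lapply(names(strata), function(h) {
  s <- strata[[h]]
  s[sample(nrow(s), n_h[[h]]), ]
}))

table(samp$habitat)  # number of sampled fish per stratum
```

Because each fish belongs to exactly one habitat, no fish can be selected twice, and every stratum is guaranteed to appear in the sample.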
Stratified Random Sampling…
• E.g., sampling fish from a stream, with the goal of
estimating the average length of trout
– Want to know the size of fish (length)
– Stream is made up of riffles, runs and pools
• larger (longer) fish live in the pools
• smaller fish in the riffles.
– Strata = stream habitat type
Why Choose Stratification?
• Minimize uncertainty
– equivalent to minimizing the variability associated with our response
variable
• Example
– If fish in riffles are similar in length (small within-habitat
variability), then averages computed stratum by stratum will each
have low variation
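One way to make this precise uses standard survey-sampling notation that is not in the slides: stratum $h$ has population size $N_h$, sample size $n_h$, sample mean $\bar{y}_h$, and within-stratum variance $S_h^2$, with weight $W_h = N_h / N$ and $L$ strata in total.

```latex
\bar{y}_{st} = \sum_{h=1}^{L} W_h \, \bar{y}_h,
\qquad
\operatorname{Var}(\bar{y}_{st}) = \sum_{h=1}^{L} W_h^2 \left(1 - \frac{n_h}{N_h}\right) \frac{S_h^2}{n_h}
```

Only the within-stratum variances $S_h^2$ enter the formula, so strata that are internally homogeneous give a precise overall estimate even when the stratum means differ a great deal.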
Simulation Comparing Stratified and
Simple Random Sampling
simu <- function(N1, N2, N3, n, no)
{
  # N1: size of population (stratum) 1
  # N2: size of population (stratum) 2
  # N3: size of population (stratum) 3
  # n: total sample size
  # no: total number of iterations
  pop1 <- rnorm(N1, 5, 1)
  pop2 <- rnorm(N2, 10, 2)
  pop3 <- rnorm(N3, 20, 4)
  pop <- c(pop1, pop2, pop3)
  pop.mean <- mean(pop)
  N <- N1 + N2 + N3
  # proportional allocation; n3 takes the remainder so that n1 + n2 + n3 = n
  n1 <- round(n * N1 / N, 0)
  n2 <- round(n * N2 / N, 0)
  n3 <- n - n1 - n2
  me.srs <- numeric(no)
  me.st <- numeric(no)
  for (i in 1:no)
  {
    sample.srs <- sample(pop, n)
    me.srs[i] <- mean(sample.srs)
    # stratified estimate: stratum sample means weighted by stratum size
    me.st[i] <- (N1 * mean(sample(pop1, n1)) + N2 * mean(sample(pop2, n2)) +
                 N3 * mean(sample(pop3, n3))) / N
  }
  # common x-axis range so the two histograms are directly comparable
  a <- min(me.srs, me.st)
  b <- max(me.srs, me.st)
  par(mfrow = c(2, 1))
  hist(me.srs, main = "mean obtained by Simple Random Sample",
       col = "red", xlim = c(a, b))
  abline(v = pop.mean, lwd = 2.5)  # vertical line at the true population mean
  hist(me.st, main = "mean obtained by Stratified Random Sample",
       col = "blue", xlim = c(a, b))
  abline(v = pop.mean, lwd = 2.5)
  cat("Population mean:", pop.mean, "\n")
}
simu(200, 500, 800, 100, 1000)
[Figure: two histograms on a common x-axis (sample mean, about 13 to 16) with Frequency (0 to 250) on the y-axis. Top panel: me.srs, "mean obtained by Simple Random Sample". Bottom panel: me.st, "mean obtained by Stratified Random Sample".]
Note that with the stratified random sample, the sampling distribution
of the sample mean shows less variation (uncertainty) than under the
simple random sampling protocol.
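The same comparison can be made numerically rather than by eye. The sketch below repeats the simulation with the populations and sample size of the call `simu(200, 500, 800, 100, 1000)` and reports the standard deviation of each set of simulated means. The seed and the names `N_h`, `n_h`, and `pops` are choices made for this illustration, and the stratified estimate is taken to be the stratum means weighted by stratum size.

```r
set.seed(123)  # arbitrary seed, for reproducibility only

# Same three normal populations as in simu(200, 500, 800, 100, 1000)
N_h  <- c(200, 500, 800)
pops <- list(rnorm(N_h[1], 5, 1), rnorm(N_h[2], 10, 2), rnorm(N_h[3], 20, 4))
pop  <- unlist(pops)
N    <- sum(N_h)
n    <- 100
n_h  <- c(13, 33, 54)  # proportional allocation, adjusted to sum to n

# Sample means under simple random sampling
me.srs <- replicate(1000, mean(sample(pop, n)))

# Stratified estimates: stratum sample means weighted by stratum size
me.st <- replicate(1000, {
  ybar_h <- mapply(function(p, k) mean(sample(p, k)), pops, n_h)
  sum(N_h * ybar_h) / N
})

c(sd.srs = sd(me.srs), sd.st = sd(me.st))
```

The stratified standard deviation should come out clearly smaller, because the large differences between the three population means no longer contribute to the sampling variability.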
Why Choose Stratification…
• Estimates of population parameters may be desired for
subgroups of the population
E.g., by stratifying on stream habitat type:
• You can easily provide estimates of the mean fish length for each habitat
type (riffle, run, and pool)
• Separate confidence intervals for each of the strata
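As a sketch of what such per-stratum estimates might look like, the code below uses the fish lengths from the Example Data slide. The t-based interval is an assumption made here; the slides do not say which interval method to use for the individual strata, and names such as `lengths_by_habitat` are illustrative.

```r
# Fish lengths from the Example Data slide
riffles <- c(5.02, 5.28, 6.40, 5.15, 5.86, 6.25, 4.73, 6.11)
runs    <- c(14.10, 12.80, 14.22, 15.42, 14.47, 14.71,
             14.10, 13.47, 13.67, 14.00, 14.10, 14.94)
pools   <- c(16.39, 18.34, 17.73, 17.96, 18.27, 19.44, 17.76, 17.61, 18.76, 16.22,
             16.88, 18.91, 16.98, 20.49, 16.90, 17.79, 19.69, 19.31, 18.66, 16.29)

lengths_by_habitat <- list(riffle = riffles, run = runs, pool = pools)

# Mean and t-based 95% confidence interval for each stratum separately
# (ASSUMPTION: t interval; no finite population correction is applied)
ci_by_habitat <- t(sapply(lengths_by_habitat, function(x) {
  ci <- t.test(x, conf.level = 0.95)$conf.int
  c(n = length(x), mean = mean(x), lower = ci[1], upper = ci[2])
}))

round(ci_by_habitat, 2)
```

Each row of the result describes one habitat, which is exactly the kind of subgroup summary that a simple random sample of the whole stream does not guarantee.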
Why Choose Stratification…
• The cost per observation in the sample may be reduced
– E.g., sampling gear changes when the habitat changes
• Simple random sampling of stream sections means more gear changes
Example Data
Obtain a 95% bootstrap CI on the mean length of fish across
the three habitats
Riffles (n = 8):
5.02 5.28 6.40 5.15 5.86 6.25 4.73 6.11
Runs (n = 12):
14.10 12.80 14.22 15.42 14.47 14.71 14.10 13.47 13.67 14.00 14.10 14.94
Pools (n = 20):
16.39 18.34 17.73 17.96 18.27 19.44 17.76 17.61 18.76 16.22
16.88 18.91 16.98 20.49 16.90 17.79 19.69 19.31 18.66 16.29
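One possible approach to this exercise is a stratified bootstrap: resample with replacement within each habitat, recompute the weighted mean, and take percentiles of the results. The sketch below rests on several assumptions that are not in the slides: the stratum weights `W` are taken proportional to the sample sizes (8, 12, 20) because the true habitat sizes are not given, the interval is a percentile interval, and the number of replicates `B` and the seed are arbitrary.

```r
set.seed(2024)  # arbitrary seed, for reproducibility only

riffles <- c(5.02, 5.28, 6.40, 5.15, 5.86, 6.25, 4.73, 6.11)
runs    <- c(14.10, 12.80, 14.22, 15.42, 14.47, 14.71,
             14.10, 13.47, 13.67, 14.00, 14.10, 14.94)
pools   <- c(16.39, 18.34, 17.73, 17.96, 18.27, 19.44, 17.76, 17.61, 18.76, 16.22,
             16.88, 18.91, 16.98, 20.49, 16.90, 17.79, 19.69, 19.31, 18.66, 16.29)
strata  <- list(riffles, runs, pools)

# ASSUMPTION: stratum weights proportional to the sample sizes (8, 12, 20),
# because the slides do not give the true habitat sizes
W <- lengths(strata) / sum(lengths(strata))

# Stratified estimate of the overall mean: weighted sum of stratum means
strat_mean <- function(s) sum(W * sapply(s, mean))

B <- 2000  # number of bootstrap replicates (illustrative choice)
boot_means <- replicate(B, {
  # resample with replacement within each stratum, keeping stratum sizes fixed
  resampled <- lapply(strata, function(x) sample(x, length(x), replace = TRUE))
  strat_mean(resampled)
})

est <- strat_mean(strata)
ci  <- quantile(boot_means, c(0.025, 0.975))  # percentile 95% CI
est
ci
```

If the habitats make up known proportions of the stream, those proportions should replace `W`; resampling within strata, rather than from the pooled data, keeps the habitat composition of every bootstrap sample fixed.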