Stratified Random Sampling

• A stratified random sample is obtained by separating the population elements into non-overlapping groups, called strata
• A simple random sample is then selected from each stratum

Stratified Random Sampling (continued)

• Example: sampling fish from a stream, with the goal of estimating the average length of trout
  – The response variable is fish size (length)
  – The stream is made up of riffles, runs, and pools
    • Larger (longer) fish live in the pools
    • Smaller fish live in the riffles
  – Strata = stream habitat types

Why Choose Stratification?

• Minimize uncertainty
  – Equivalent to minimizing the variability associated with our response variable
• Example
  – If fish in riffles are similar in length (small within-habitat variability), then taking averages stratum by stratum gives low variation for each average

Simulation Comparing Stratified and Simple Random Sampling

    simu <- function(N1, N2, N3, n, no) {
      # N1, N2, N3: sizes of populations (strata) 1, 2 and 3
      # n:  total sample size
      # no: number of simulation iterations
      pop1 <- rnorm(N1, 5, 1)
      pop2 <- rnorm(N2, 10, 2)
      pop3 <- rnorm(N3, 20, 4)
      pop <- c(pop1, pop2, pop3)
      pop.mean <- mean(pop)
      N <- N1 + N2 + N3

      # Proportional allocation. n3 takes the remainder so that
      # n1 + n2 + n3 = n (rounding all three can give a total other than n).
      n1 <- round(n * N1 / N)
      n2 <- round(n * N2 / N)
      n3 <- n - n1 - n2

      me.srs <- numeric(no)
      me.st <- numeric(no)
      for (i in 1:no) {
        sample.srs <- sample(pop, n)
        me.srs[i] <- mean(sample.srs)
        # Stratified estimate: stratum sample means weighted by N_h / N
        me.st[i] <- (N1 * mean(sample(pop1, n1)) +
                     N2 * mean(sample(pop2, n2)) +
                     N3 * mean(sample(pop3, n3))) / N
      }

      # Common x-axis limits so the two histograms are directly comparable
      a <- min(me.srs)
      b <- max(me.srs)
      par(mfrow = c(2, 1))
      hist(me.srs, main = "mean obtained by Simple Random Sample",
           col = "red", xlim = c(a, b))
      abline(v = pop.mean, lwd = 2.5)
      hist(me.st, main = "mean obtained by Stratified Random Sample",
           col = "blue", xlim = c(a, b))
      abline(v = pop.mean, lwd = 2.5)
      cat("Population mean:", pop.mean, "\n")
    }

    simu(200, 500, 800, 100, 1000)

[Figure: two stacked histograms, "mean obtained by Simple Random Sample" (x-axis me.srs) and "mean obtained by Stratified Random Sample" (x-axis me.st), both with y-axis Frequency and x-axis ticks from 13 to 16.]

Note with the
stratified random sample that the sampling distribution of the sample mean shows less variation (uncertainty) than under the simple random sample protocol.

Why Choose Stratification? (continued)

• Estimates of population parameters may be desired for subgroups of the population
  – Example: by stratifying on stream habitat type
    • You can easily provide estimates of the mean fish length for each habitat type (riffle, run, and pool)
    • You can compute separate confidence intervals for each of the strata

Why Choose Stratification? (continued)

• The cost per observation in the sample may be reduced
  – Example: sampling gear changes when the habitat changes
    • Simple random sampling of stream sections means more gear changes

Example Data

Obtain a 95% bootstrap CI on the mean length of fish across the three habitats.

Riffles: 5.02 5.28 6.40 5.15 5.86 6.25 4.73 6.11

Runs: 14.10 12.80 14.22 15.42 14.47 14.71 14.10 13.47 13.67 14.00 14.10 14.94

Pools: 16.39 18.34 17.73 17.96 18.27 19.44 17.76 17.61 18.76 16.22 16.88 18.91 16.98 20.49 16.90 17.79 19.69 19.31 18.66 16.29
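
One possible way to approach the bootstrap task is a stratified bootstrap: resample with replacement within each habitat, keep each stratum's sample size fixed, and recompute the weighted mean each time. The R sketch below is only an illustration under stated assumptions. The slides do not give the true habitat sizes, so it assumes the stratum weights equal the sample proportions (proportional allocation). The names `strata`, `st.mean`, `boot.st`, and `B` are hypothetical, and the percentile interval is just one of several bootstrap interval methods.

```r
# Fish lengths by habitat, copied from the example data.
riffles <- c(5.02, 5.28, 6.40, 5.15, 5.86, 6.25, 4.73, 6.11)
runs <- c(14.10, 12.80, 14.22, 15.42, 14.47, 14.71, 14.10, 13.47,
          13.67, 14.00, 14.10, 14.94)
pools <- c(16.39, 18.34, 17.73, 17.96, 18.27, 19.44, 17.76, 17.61,
           18.76, 16.22, 16.88, 18.91, 16.98, 20.49, 16.90, 17.79,
           19.69, 19.31, 18.66, 16.29)
strata <- list(riffles = riffles, runs = runs, pools = pools)

# ASSUMPTION: weights W_h = n_h / n (proportional allocation).
# Replace with N_h / N if the true habitat sizes are known.
n.h <- sapply(strata, length)
w <- n.h / sum(n.h)

# Stratified estimate of the mean: weighted sum of stratum means.
st.mean <- function(s, w) sum(w * sapply(s, mean))

# Stratified bootstrap: resample within each stratum, n_h held fixed.
boot.st <- function(strata, w, B = 2000) {
  est <- numeric(B)
  for (b in 1:B) {
    resampled <- lapply(strata,
                        function(x) sample(x, length(x), replace = TRUE))
    est[b] <- st.mean(resampled, w)
  }
  est
}

set.seed(1)  # arbitrary seed, for reproducibility only
boot.est <- boot.st(strata, w)
ci <- quantile(boot.est, c(0.025, 0.975))  # 95% percentile interval

cat("Stratified estimate:", st.mean(strata, w), "\n")
cat("95% bootstrap CI:", ci, "\n")
```

Resampling within strata, rather than from the pooled data, keeps the number of fish per habitat the same in every bootstrap replicate, which mirrors how the stratified sample was drawn. Applying `quantile` to the bootstrap means of a single habitat would give the separate per-stratum intervals mentioned earlier.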