Stat 475 Notes 8

Reading: Lohr, Chapter 4.2-4.5

Note for Homework 2: For estimating a ratio $B = \bar{y}_U / \bar{x}_U$ with the estimator $\hat{B} = \bar{y} / \bar{x}$, the standard error of $\hat{B}$ is

$$SE(\hat{B}) = \sqrt{\left(1 - \frac{n}{N}\right) \frac{1}{n \bar{x}_U^2} \, \frac{\sum_{i \in S} (y_i - \hat{B} x_i)^2}{n - 1}}.$$

If $\bar{x}_U$ is unknown, then we substitute the sample mean $\bar{x}$ for it.

I. Inference from a stratified sample

Suppose we take a stratified sample from $H$ strata with $N_1, \ldots, N_H$ units in the population in the strata ($N_1 + \cdots + N_H = N$) and sample sizes in the strata of $n_1, \ldots, n_H$. Our estimators of the population total and population mean are

$$\hat{t}_{str} = \sum_{h=1}^{H} \hat{t}_h = \sum_{h=1}^{H} N_h \bar{y}_h, \qquad \bar{y}_{str} = \frac{\hat{t}_{str}}{N} = \sum_{h=1}^{H} \frac{N_h}{N} \bar{y}_h.$$

The properties of these estimators follow directly from the properties of simple random sample estimators:

Unbiasedness. $\bar{y}_{str}$ and $\hat{t}_{str}$ are unbiased estimators of $\bar{y}_U$ and $t$. This is true because

$$E[\bar{y}_{str}] = E\left[\sum_{h=1}^{H} \frac{N_h}{N} \bar{y}_h\right] = \sum_{h=1}^{H} \frac{N_h}{N} E[\bar{y}_h] = \sum_{h=1}^{H} \frac{N_h}{N} \bar{y}_{hU} = \bar{y}_U.$$

Variance of the estimators. Since we sample independently from the strata and we know $\mathrm{Var}(\hat{t}_h)$ from simple random sampling theory, we have

$$\mathrm{Var}(\hat{t}_{str}) = \sum_{h=1}^{H} \mathrm{Var}(\hat{t}_h) = \sum_{h=1}^{H} \left(1 - \frac{n_h}{N_h}\right) N_h^2 \frac{S_h^2}{n_h}.$$

Variance estimates for stratified samples. We can obtain an unbiased estimator of $\mathrm{Var}(\hat{t}_{str})$ by substituting the sample estimates $s_h^2$ for the population quantities $S_h^2$. Note that to estimate the variances, we need at least two units from each stratum (otherwise $s_h^2$ is undefined).

$$\widehat{\mathrm{Var}}(\hat{t}_{str}) = \sum_{h=1}^{H} \left(1 - \frac{n_h}{N_h}\right) N_h^2 \frac{s_h^2}{n_h}$$

$$\widehat{\mathrm{Var}}(\bar{y}_{str}) = \frac{1}{N^2} \widehat{\mathrm{Var}}(\hat{t}_{str}) = \sum_{h=1}^{H} \left(1 - \frac{n_h}{N_h}\right) \left(\frac{N_h}{N}\right)^2 \frac{s_h^2}{n_h}$$

As always, the standard error of an estimator is the square root of the estimated variance: $SE(\bar{y}_{str}) = \sqrt{\widehat{\mathrm{Var}}(\bar{y}_{str})}$.

If either (1) the sample sizes within each stratum are large or (2) the sampling design has a large number of strata, an approximate 95% confidence interval for the population mean $\bar{y}_U$ is

$$\bar{y}_{str} \pm 1.96 \, SE(\bar{y}_{str}).$$

Some survey researchers use the 0.975 quantile of the t distribution with $n - H$ degrees of freedom instead of 1.96 (this multiplier converges to 1.96 as $n - H$ gets large).
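As an illustration, the formulas above translate directly into code. The following is a minimal Python sketch, not part of the original notes; the function names `stratified_estimates` and `ratio_se` are hypothetical. The stratified function works from stratum summaries (population sizes, sample sizes, sample means, sample standard deviations) rather than raw data and uses the normal multiplier 1.96 by default; a t quantile with $n - H$ degrees of freedom can be passed as `z` instead.

```python
import math


def stratified_estimates(N_h, n_h, ybar_h, s_h, z=1.96):
    """Stratified estimates from stratum summaries.

    N_h: stratum population sizes; n_h: stratum sample sizes (each >= 2);
    ybar_h: stratum sample means; s_h: stratum sample standard deviations.
    Returns (t_hat_str, ybar_str, SE(ybar_str), (lower, upper)).
    """
    N = sum(N_h)
    # t_hat_str = sum_h N_h * ybar_h
    t_hat = sum(Nh * yb for Nh, yb in zip(N_h, ybar_h))
    ybar_str = t_hat / N
    # Estimated Var(t_hat_str) = sum_h (1 - n_h/N_h) * N_h^2 * s_h^2 / n_h
    var_t = sum((1 - nh / Nh) * Nh**2 * sh**2 / nh
                for Nh, nh, sh in zip(N_h, n_h, s_h))
    # SE(ybar_str) = sqrt(estimated Var(t_hat_str) / N^2)
    se = math.sqrt(var_t) / N
    return t_hat, ybar_str, se, (ybar_str - z * se, ybar_str + z * se)


def ratio_se(x, y, N, xbar_U=None):
    """SE of B_hat = ybar / xbar from an SRS of n pairs (x_i, y_i).

    If xbar_U is not supplied, the sample mean xbar is substituted for it.
    """
    n = len(x)
    xbar = sum(x) / n
    B_hat = (sum(y) / n) / xbar
    if xbar_U is None:
        xbar_U = xbar
    # Residual variance: sum (y_i - B_hat * x_i)^2 / (n - 1)
    s_e2 = sum((yi - B_hat * xi) ** 2 for xi, yi in zip(x, y)) / (n - 1)
    return math.sqrt((1 - n / N) * s_e2 / (n * xbar_U**2))


# Example 1 summaries: stratum sizes, sample sizes, sample means, sample SDs.
t_hat, ybar_str, se, ci = stratified_estimates(
    [155, 62, 93], [20, 8, 12], [33.9, 25.125, 19.0],
    [5.94625, 15.24502, 9.36143])
print(round(ybar_str, 3), round(se, 2), [round(v, 1) for v in ci])
```

Fed the Example 1 summaries, the sketch reproduces $\bar{y}_{str} \approx 27.675$ and $SE \approx 1.40$ without intermediate rounding.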
Example 1: An advertising firm, interested in determining how much to emphasize television advertising in a certain county, decides to conduct a sample survey to estimate the average number of hours each week that households within the county watch television. The county contains two towns, A and B, and a rural area. Town A is built around a factory, and most households contain factory workers with school-age children. Town B is an exclusive suburb of a city in a neighboring county and contains older residents with few children at home. There are 155 households in town A, 62 in town B and 93 in the rural area.

Merits of using stratified random sampling in this situation: The population of households falls into three natural groupings, two towns and a rural area, according to geographic location. It is therefore natural to use these divisions as three strata, if only for administrative convenience in selecting the samples and carrying out the fieldwork. In addition, the households within each of the three groups should have similar behavioral patterns. We expect relatively small variability in the number of hours of television viewing among households within a group, and this is precisely the situation in which stratification reduces the variance of the estimate of the population mean.

The advertising firm has enough time and money to interview $n = 40$ households and decides to select random samples of size $n_1 = 20$ from town A, $n_2 = 8$ from town B and $n_3 = 12$ from the rural area (we will discuss the choice of sample sizes later). The simple random samples are selected and the interviews are conducted. The data and summaries are shown below.
    towna <- c(35, 43, 36, 39, 28, 28, 29, 25, 38, 27, 26, 32, 29, 40, 35, 41, 37, 31, 45, 34)
    townb <- c(27, 15, 4, 41, 49, 25, 10, 30)
    rural <- c(8, 14, 12, 15, 30, 32, 21, 20, 34, 7, 11, 24)
    mean(towna)  # [1] 33.9
    mean(townb)  # [1] 25.125
    mean(rural)  # [1] 19
    sd(towna)    # [1] 5.94625
    sd(townb)    # [1] 15.24502
    sd(rural)    # [1] 9.36143

A good way to view the key features of these samples and look for any outliers or unusual features is to make side-by-side boxplots.

    boxplot(towna, townb, rural, names = c("Town A", "Town B", "Rural"),
            main = "Box plots of Television Viewing Time")

[Figure: Box plots of Television Viewing Time for Town A, Town B and Rural]

There do not appear to be any outliers or unusual features to be concerned about.

Note that $N_1 = 155$, $N_2 = 62$, $N_3 = 93$ and $N = 155 + 62 + 93 = 310$. Our estimate of the population mean is

$$\bar{y}_{str} = \sum_{h=1}^{H} \frac{N_h}{N} \bar{y}_h = \frac{(155)(33.90) + (62)(25.125) + (93)(19)}{310} = 27.7.$$

The standard error is

$$SE(\bar{y}_{str}) = \sqrt{\sum_{h=1}^{H} \left(1 - \frac{n_h}{N_h}\right) \left(\frac{N_h}{N}\right)^2 \frac{s_h^2}{n_h}} = \sqrt{\left(1 - \frac{20}{155}\right)\left(\frac{155}{310}\right)^2 \frac{5.95^2}{20} + \left(1 - \frac{8}{62}\right)\left(\frac{62}{310}\right)^2 \frac{15.25^2}{8} + \left(1 - \frac{12}{93}\right)\left(\frac{93}{310}\right)^2 \frac{9.36^2}{12}} = 1.40.$$

An approximate 95% confidence interval for the population mean is

$$\bar{y}_{str} \pm 1.96 \, SE(\bar{y}_{str}) = 27.7 \pm 1.96(1.40) = (25.0, 30.4).$$

II. Sampling Weights

The stratified sampling estimator $\hat{t}_{str}$ can be expressed as a weighted sum of the individual sampling units:

$$\hat{t}_{str} = \sum_{h=1}^{H} N_h \bar{y}_h = \sum_{h=1}^{H} \sum_{j \in S_h} \frac{N_h}{n_h} y_{hj}.$$

The sampling weight $w_{hj} = N_h / n_h$ can be thought of as the number of units in the population represented by the sample member $(h, j)$. If the population has 1600 men and 400 women and the stratified sample design specifies sampling 200 men and 200 women, then each man in the sample has weight 8 and each woman has weight 2. Each woman in the sample represents herself and 1 other woman not selected to be in the sample, and each man represents himself and 7 other men not in the sample. Note that the probability of selecting the $j$th unit in the $h$th stratum to be in the sample is $\pi_{hj} = n_h / N_h$, the sampling fraction in the $h$th stratum. Thus, the sampling weight is simply the reciprocal of the probability of selection:

$$w_{hj} = \frac{1}{\pi_{hj}}.$$
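The same Example 1 numbers can be reproduced from the raw data. This Python sketch mirrors the R session above but is not from the original notes; it uses `statistics.variance`, which (like R's `var()`) divides by $n_h - 1$.

```python
import math
import statistics

# Example 1: weekly hours of television per sampled household, by stratum.
samples = {
    "Town A": [35, 43, 36, 39, 28, 28, 29, 25, 38, 27,
               26, 32, 29, 40, 35, 41, 37, 31, 45, 34],
    "Town B": [27, 15, 4, 41, 49, 25, 10, 30],
    "Rural": [8, 14, 12, 15, 30, 32, 21, 20, 34, 7, 11, 24],
}
N_h = {"Town A": 155, "Town B": 62, "Rural": 93}
N = sum(N_h.values())

ybar_str = 0.0
var_ybar = 0.0
for stratum, ys in samples.items():
    n_h = len(ys)
    W_h = N_h[stratum] / N            # population share of the stratum
    s2_h = statistics.variance(ys)    # sample variance, divisor n_h - 1
    ybar_str += W_h * statistics.mean(ys)
    # (1 - n_h/N_h) * (N_h/N)^2 * s_h^2 / n_h
    var_ybar += (1 - n_h / N_h[stratum]) * W_h**2 * s2_h / n_h

se = math.sqrt(var_ybar)
lower, upper = ybar_str - 1.96 * se, ybar_str + 1.96 * se
print(f"ybar_str = {ybar_str:.3f}, SE = {se:.3f}, 95% CI = ({lower:.1f}, {upper:.1f})")
```

Carrying full precision gives $\bar{y}_{str} = 27.675$ and $SE \approx 1.403$; the lower confidence limit comes out as 24.9 rather than 25.0 only because the hand calculation rounds $\bar{y}_{str}$ to 27.7 first.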
The sum of the sampling weights equals the population size $N$; each sampled unit “represents” a certain number of units in the population, so the whole sample “represents” the whole population. The stratified estimate of the population total may thus be written as

$$\hat{t}_{str} = \sum_{h=1}^{H} \sum_{j \in S_h} w_{hj} y_{hj}$$

and the estimate of the population mean as

$$\bar{y}_{str} = \frac{\sum_{h=1}^{H} \sum_{j \in S_h} w_{hj} y_{hj}}{\sum_{h=1}^{H} \sum_{j \in S_h} w_{hj}}.$$

Example 1 continued. In Example 1, the weights are

Stratum   N_h   n_h   w_hj = N_h / n_h
Town A    155    20   7.75
Town B     62     8   7.75
Rural      93    12   7.75

The sampling weights are the same in every stratum. This is an example of proportional allocation, so called because the number of sampled units in each stratum is proportional to the size of the stratum. In proportional allocation, the probability of selection $\pi_{hj} = n_h / N_h$ is the same ($n/N$) for all strata: in a population of 2400 men and 1600 women, proportional allocation with a 10% sample would mean sampling 240 men and 160 women. For a stratified random sample with proportional allocation, the probability that an individual will be selected in the sample, $n/N$, is the same as in a simple random sample, but many of the “bad” samples that could occur in a simple random sample (for example, a sample in which all 400 persons are men) cannot be selected in a sample with proportional allocation.

III. Optimal Allocation

The objective in designing a sample survey is to maximize the information, i.e., minimize the variance of the estimator of the desired quantity, for a fixed total cost. Let $C$ represent total cost, $c_0$ represent overhead cost such as maintaining an office, and $c_h$ represent the cost of taking an observation in stratum $h$, so that

$$C = c_0 + \sum_{h=1}^{H} c_h n_h.$$

We want to allocate observations to strata in order to minimize $\mathrm{Var}(\bar{y}_{str})$ for a given total cost $C$ or, equivalently, to minimize $C$ for a fixed $\mathrm{Var}(\bar{y}_{str})$. Suppose the costs $c_1, \ldots, c_H$ are fixed.
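The weighted form can be checked numerically. This Python sketch (not from the original notes) attaches the weight $w_{hj} = N_h / n_h$ to every sampled household in Example 1, confirms that the weights sum to $N = 310$, and recovers $\hat{t}_{str}$ and $\bar{y}_{str}$ as a weighted total and a weighted mean; the last lines redo the 1600 men / 400 women illustration from Section II.

```python
# Example 1 data (weekly TV hours) and stratum population sizes.
samples = {
    "Town A": [35, 43, 36, 39, 28, 28, 29, 25, 38, 27,
               26, 32, 29, 40, 35, 41, 37, 31, 45, 34],
    "Town B": [27, 15, 4, 41, 49, 25, 10, 30],
    "Rural": [8, 14, 12, 15, 30, 32, 21, 20, 34, 7, 11, 24],
}
N_h = {"Town A": 155, "Town B": 62, "Rural": 93}

weights, values = [], []
for stratum, ys in samples.items():
    w = N_h[stratum] / len(ys)   # w_hj = N_h / n_h = 1 / pi_hj
    weights.extend([w] * len(ys))
    values.extend(ys)

total_weight = sum(weights)                            # equals N
t_hat = sum(w * y for w, y in zip(weights, values))    # weighted total
ybar_str = t_hat / total_weight                        # weighted mean
print(sorted(set(weights)), total_weight, t_hat, round(ybar_str, 3))

# Men/women illustration: 200 sampled from each of 1600 men and 400 women.
w_men, w_women = 1600 / 200, 400 / 200
print(w_men, w_women)
```

Because every weight is 7.75, the weighted mean here is just the ordinary sample mean of all 40 observations, which is the self-weighting property of proportional allocation.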
To minimize the variance for a fixed cost, we can prove, using calculus, that the optimal allocation has $n_h$ proportional to $N_h S_h / \sqrt{c_h}$ for each $h$. Thus, the optimal sample size in stratum $h$ is

$$n_h = n \, \frac{N_h S_h / \sqrt{c_h}}{\sum_{l=1}^{H} N_l S_l / \sqrt{c_l}}.$$

We thus sample heavily within a stratum if

- the stratum accounts for a large part of the population;
- the variance within the stratum is large (we sample more heavily to compensate for the heterogeneity);
- sampling in the stratum is inexpensive.

The variance of $\bar{y}_{str}$ is

$$\mathrm{Var}(\bar{y}_{str}) = \sum_{h=1}^{H} \left(1 - \frac{n_h}{N_h}\right) \left(\frac{N_h}{N}\right)^2 \frac{S_h^2}{n_h}.$$

If we would like to set $\mathrm{Var}(\bar{y}_{str})$ equal to some fixed value $D$ and we use the optimal allocation, then we can solve for the value of $n$ that makes $\mathrm{Var}(\bar{y}_{str})$ equal to $D$:

$$n = \frac{\left(\sum_{h=1}^{H} N_h S_h / \sqrt{c_h}\right) \left(\sum_{h=1}^{H} N_h S_h \sqrt{c_h}\right)}{N^2 D + \sum_{h=1}^{H} N_h S_h^2}.$$

Example 1 continued. The advertising firm finds that obtaining an observation from a rural household costs more than obtaining a response in town A or B. The increase is due to the costs of traveling from one rural household to another. The cost per observation in each town is estimated to be 9 dollars (that is, $c_1 = c_2 = 9$) and the cost per observation in the rural area 16 dollars (that is, $c_3 = 16$). The stratum standard deviations (approximated by the stratum sample standard deviations from a prior survey) are $S_1 = 5$, $S_2 = 15$, $S_3 = 10$. Find the overall sample size $n$ and the stratum sample sizes $n_1, n_2, n_3$ that allow the firm to estimate, at minimum cost, the average television-viewing time with a margin of error equal to 2 hours.

The margin of error is half the width of the 95% confidence interval, which is approximately 2 times the standard deviation of $\bar{y}_{str}$. Thus, we want the standard deviation of $\bar{y}_{str}$, and hence its variance, to be 1; that is, $D = 1$.
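The allocation rule and the sample-size formula for a target variance $D$ are easy to evaluate numerically. A minimal Python sketch, assuming the planning values of Example 1 ($N_h$ = 155, 62, 93; $S_h$ = 5, 15, 10; $c_h$ = 9, 9, 16; $D = 1$); the function name `optimal_allocation` is hypothetical, and the sketch rounds $n$ up so that the variance target is met while leaving the $n_h$ unrounded.

```python
import math


def optimal_allocation(N_h, S_h, c_h, D):
    """Optimal allocation (n_h proportional to N_h * S_h / sqrt(c_h)),
    with n chosen so that Var(ybar_str) = D.

    Returns (n_exact, n, n_h): n = ceil(n_exact) and n_h are the
    unrounded stratum sample sizes for that n.
    """
    N = sum(N_h)
    a = [Nh * Sh / math.sqrt(ch) for Nh, Sh, ch in zip(N_h, S_h, c_h)]
    b = [Nh * Sh * math.sqrt(ch) for Nh, Sh, ch in zip(N_h, S_h, c_h)]
    within = sum(Nh * Sh**2 for Nh, Sh in zip(N_h, S_h))
    # n = (sum N_h S_h / sqrt(c_h)) (sum N_h S_h sqrt(c_h)) / (N^2 D + sum N_h S_h^2)
    n_exact = sum(a) * sum(b) / (N**2 * D + within)
    n = math.ceil(n_exact)
    n_h = [n * ah / sum(a) for ah in a]
    return n_exact, n, n_h


n_exact, n, n_h = optimal_allocation([155, 62, 93], [5, 15, 10], [9, 9, 16], D=1)
print(round(n_exact, 2), n, [round(x, 2) for x in n_h])
```

With these inputs the sketch gives $n \approx 57.42$, rounded up to 58, and unrounded stratum sizes of about 18.71, 22.45 and 16.84. Passing equal values for every $c_h$ turns the same function into Neyman allocation.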
We have

$$\sum_{h=1}^{H} \frac{N_h S_h}{\sqrt{c_h}} = \frac{155(5)}{\sqrt{9}} + \frac{62(15)}{\sqrt{9}} + \frac{93(10)}{\sqrt{16}} = 800.83,$$

$$\sum_{h=1}^{H} N_h S_h \sqrt{c_h} = 155(5)\sqrt{9} + 62(15)\sqrt{9} + 93(10)\sqrt{16} = 8835,$$

and $\sum_{h=1}^{H} N_h S_h^2 = 155(5^2) + 62(15^2) + 93(10^2) = 27{,}125$. Thus,

$$n = \frac{\left(\sum_{h=1}^{H} N_h S_h / \sqrt{c_h}\right)\left(\sum_{h=1}^{H} N_h S_h \sqrt{c_h}\right)}{N^2 D + \sum_{h=1}^{H} N_h S_h^2} = \frac{(800.83)(8835)}{(310)^2(1) + 27{,}125} = 57.42 \approx 58,$$

rounding up so that the variance does not exceed $D$. Then

$$n_1 = n \, \frac{N_1 S_1 / \sqrt{c_1}}{\sum_{h=1}^{H} N_h S_h / \sqrt{c_h}} = 58 \, \frac{155(5)/3}{800.83} = 58(0.323) = 18.71 \approx 19.$$

Similarly,

$$n_2 = 58 \, \frac{62(15)/3}{800.83} = 58(0.387) = 22.45 \approx 22, \qquad n_3 = 58 \, \frac{93(10)/4}{800.83} = 58(0.290) = 16.84 \approx 17.$$

Hence, we should select 19 households at random from town A, 22 from town B, and 17 from the rural area ($19 + 22 + 17 = 58$). We can then estimate the average number of hours spent watching television at minimum cost with a margin of error of 2 hours.

Neyman allocation is a special case of optimal allocation used when the costs in the strata are approximately equal. Under Neyman allocation, $n_h$ is proportional to $N_h S_h$.

If all stratum variances and costs are equal, proportional allocation is the same as optimal allocation. If we know the variances within each stratum and they differ, optimal allocation gives a smaller variance than proportional allocation. But optimal allocation is a more complicated scheme; often the simplicity and self-weighting property of proportional allocation are worth the extra variance. In addition, the optimal allocation will differ for each variable being measured, whereas the proportional allocation depends only on the number of population units in each stratum.

Variance comparisons for different designs

Let $\bar{y}$, $\bar{y}_{str,pa}$ and $\bar{y}_{str,na}$ be, for a sample of size $n$, the mean from a simple random sample, from a stratified sample with proportional allocation, and from a stratified sample with Neyman allocation, respectively. Ignoring the finite population correction,

$$\mathrm{Var}(\bar{y}_{str,pa}) - \mathrm{Var}(\bar{y}_{str,na}) = \frac{1}{n} \sum_{h=1}^{H} \frac{N_h}{N} \left(S_h - \bar{S}\right)^2, \qquad \text{where } \bar{S} = \sum_{h=1}^{H} \frac{N_h}{N} S_h,$$

and, approximately,

$$\mathrm{Var}(\bar{y}) - \mathrm{Var}(\bar{y}_{str,pa}) \approx \frac{1}{n} \sum_{h=1}^{H} \frac{N_h}{N} \left(\bar{y}_{hU} - \bar{y}_U\right)^2.$$

Thus proportional allocation yields the same variance as the optimal Neyman allocation (assuming costs are the same) when the standard deviations of the strata are all the same, but if they differ, the optimal allocation is better. To the approximation above, stratified random sampling with proportional allocation never gives a larger variance than simple random sampling, and it gives a strictly smaller one whenever the stratum means $\bar{y}_{hU}$ differ. Comparing the equations for the variances under simple random sampling, proportional allocation and optimal allocation, assuming the costs of all observations are equal, we see that stratification with proportional allocation is better than simple random sampling if the stratum means are quite variable, and that stratification with optimal allocation is even better than stratification with proportional allocation if the stratum standard deviations are variable.
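To see the size of these differences in a concrete case, the following Python sketch (not from the original notes) evaluates the three variances, ignoring the finite population correction, for a hypothetical population. It borrows Example 1's stratum sizes, uses the planning values $S_h$ = 5, 15, 10 as population standard deviations, and, purely as an illustrative assumption, treats the Example 1 sample means 33.9, 25.125 and 19 as the population stratum means $\bar{y}_{hU}$, with $n = 40$. The SRS variance is built from the approximation above, so it is only as good as that approximation.

```python
# Hypothetical population assembled from Example 1 values (see lead-in).
N_h = [155, 62, 93]
S_h = [5.0, 15.0, 10.0]
ybar_hU = [33.9, 25.125, 19.0]   # assumed stand-ins for population stratum means
n = 40

N = sum(N_h)
W = [Nh / N for Nh in N_h]                           # N_h / N
S_bar = sum(w * s for w, s in zip(W, S_h))           # weighted mean of the S_h
ybar_U = sum(w * m for w, m in zip(W, ybar_hU))

var_pa = sum(w * s**2 for w, s in zip(W, S_h)) / n   # proportional allocation
var_na = S_bar**2 / n                                # Neyman allocation
between = sum(w * (m - ybar_U) ** 2 for w, m in zip(W, ybar_hU)) / n
var_srs = var_pa + between                           # SRS, via the approximation

# Identity: Var_pa - Var_na = (1/n) * sum_h W_h * (S_h - S_bar)^2
gap = sum(w * (s - S_bar) ** 2 for w, s in zip(W, S_h)) / n
print(f"Neyman {var_na:.4f} <= proportional {var_pa:.4f} <= SRS {var_srs:.4f}")
print(f"pa - na = {var_pa - var_na:.5f}, identity gives {gap:.5f}")
```

Under these assumptions the variances are roughly 1.81 (Neyman), 2.19 (proportional) and 3.27 (SRS). Most of the gain over SRS comes from the spread in stratum means, while the smaller gain of Neyman over proportional allocation comes from the spread in the $S_h$.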