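Lohr 2.2

a) The inclusion probability $\pi_i$ of unit $i$ is the sum of the probabilities of the samples that contain it, where $P(S_1) = 1/8$, $P(S_2) = 1/4$, $P(S_3) = 1/8$, $P(S_4) = 3/8$ and $P(S_5) = 1/8$.

Unit 1 is included in samples 1 and 3. $\pi_1$ is therefore 1/8 + 1/8 = 1/4
Unit 2 is included in samples 2 and 4. $\pi_2$ is therefore 1/4 + 3/8 = 5/8
Unit 3 is included in samples 1 and 2. $\pi_3$ is therefore 1/8 + 1/4 = 3/8
Unit 4 is included in samples 3, 4 and 5. $\pi_4$ is therefore 1/8 + 3/8 + 1/8 = 5/8
Unit 5 is included in samples 1 and 5. $\pi_5$ is therefore 1/8 + 1/8 = 1/4
Unit 6 is included in samples 1, 3 and 4. $\pi_6$ is therefore 1/8 + 1/8 + 3/8 = 5/8
Unit 7 is included in samples 2 and 5. $\pi_7$ is therefore 1/4 + 1/8 = 3/8
Unit 8 is included in samples 2, 3, 4 and 5. $\pi_8$ is therefore 1/4 + 1/8 + 3/8 + 1/8 = 7/8

b)

Sample   y-values      $\bar{y}$   $\hat{t} = 8\bar{y}$   Prob.
1        1, 4, 7, 7    4.75        38                     1/8
2        2, 4, 7, 8    5.25        42                     1/4
3        1, 4, 7, 8    5           40                     1/8
4        2, 4, 7, 8    5.25        42                     3/8
5        4, 7, 7, 8    6.5         52                     1/8

Thus, the sampling distribution is

$\hat{t}$   Prob.
38          1/8
40          1/8
42          5/8
52          1/8

Lohr 2.6

a) [Histogram: frequency (0 to 30) against number of refereed publications (0 to 10)]

b)
\[ \hat{\bar{y}}_U = \bar{y}_S = \frac{1}{50}\sum_{i \in S} y_i = \frac{1}{50}\left(28 \cdot 0 + 4 \cdot 1 + 3 \cdot 2 + \dots + 1 \cdot 10\right) = \frac{89}{50} = 1.78 \]
\[ s^2 = \frac{1}{49}\sum_{i \in S}\left(y_i - \bar{y}_S\right)^2 = \frac{1}{49}\left(28\,(0 - 1.78)^2 + 4\,(1 - 1.78)^2 + \dots + 1\,(10 - 1.78)^2\right) = 7.1955 \]
\[ SE\left(\hat{\bar{y}}_U\right) = \sqrt{\left(1 - \frac{n}{N}\right)\frac{s^2}{n}} = \sqrt{\left(1 - \frac{50}{807}\right)\frac{7.1955}{50}} = 0.37 \]

d)
\[ \hat{p} = \frac{\#\,\text{sample units with no refereed publications}}{n} = \frac{28}{50} = 0.56 \]
\[ SE(\hat{p}) = \sqrt{\left(1 - \frac{n}{N}\right)\frac{\hat{p}(1 - \hat{p})}{n - 1}} = \sqrt{\left(1 - \frac{50}{807}\right)\frac{0.56 \cdot 0.44}{49}} = 0.0687 \]
95% C.I.: $0.56 \pm 1.96 \cdot 0.0687$, i.e. $0.43 \le p \le 0.69$

Lohr 2.26

Assume $N = n \cdot k$, so that there are $k$ possible samples. Exactly one of them contains unit $i$, so
\[ \Pr(\text{unit } i \text{ is in the sample}) = \frac{1}{k} = \frac{n}{N} \]
but
\[ \Pr(S) = \frac{1}{k} \neq \frac{1}{\binom{N}{n}} \]

Stratified sampling

• Population divided into strata (singular: stratum). [Males and females; different regions; age classes, etc.] Stratum sizes $N_1, N_2, \dots, N_H$
• Sampling is made from each stratum to account for different variation within different strata. This increases precision. Sample sizes $n_1, n_2, \dots, n_H$
• Estimation of population totals and population means (proportions) by weighting the sample means from all strata. Weights are computed from the relative stratum sizes ($N_h / N$):
\[ \hat{\bar{y}}_U = \bar{y}_{str} = \sum_{h=1}^{H} \frac{N_h}{N}\,\bar{y}_h \]
• Design planning: Select the total sample size based on precision requirements (typically the length of confidence intervals).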
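  Allocate the sample units over strata: proportional allocation (size-based only) or optimal allocation (size-, variation- and cost-based).

Lohr 3.2

a)

Sample, stratum 1   $\bar{y}_1$   $\Pr(S_1)$      Sample, stratum 2   $\bar{y}_2$   $\Pr(S_2)$
1, 2                1.5           1/6             4, 7                5.5           1/6
1, 4                2.5           1/6             4, 7                5.5           1/6
1, 8                4.5           1/6             4, 7                5.5           1/6
2, 4                3             1/6             7, 7                7             1/6
2, 8                5             1/6             7, 7                7             1/6
4, 8                6             1/6             7, 7                7             1/6

b) There are 36 combinations of samples but only 12 distinct pairs $(\bar{y}_1, \bar{y}_2)$. Each pair has probability $3 \cdot (1/6) \cdot (1/6) = 1/12$, and the estimate is $\hat{t}_{str} = 8\,(0.5\,\bar{y}_1 + 0.5\,\bar{y}_2)$.

$\bar{y}_1$   $\bar{y}_2$   $\hat{t}_{str}$   Prob.
1.5           5.5           28                1/12
2.5           5.5           32                1/12
3             5.5           34                1/12
1.5           7             34                1/12
2.5           7             38                1/12
4.5           5.5           40                1/12
3             7             40                1/12
5             5.5           42                1/12
4.5           7             46                1/12
6             5.5           46                1/12
5             7             48                1/12
6             7             52                1/12

Collecting equal values gives the sampling distribution

$\hat{t}_{str}$   28     32     34    38     40    42     46    48     52
Prob.             1/12   1/12   1/6   1/12   1/6   1/12   1/6   1/12   1/12

c) Using the sampling distribution from b):
\[ E\left(\hat{t}_{str}\right) = 28 \cdot \tfrac{1}{12} + 32 \cdot \tfrac{1}{12} + 34 \cdot \tfrac{1}{6} + 38 \cdot \tfrac{1}{12} + 40 \cdot \tfrac{1}{6} + 42 \cdot \tfrac{1}{12} + 46 \cdot \tfrac{1}{6} + 48 \cdot \tfrac{1}{12} + 52 \cdot \tfrac{1}{12} = 40 \]
\[ V\left(\hat{t}_{str}\right) = (28 - 40)^2 \cdot \tfrac{1}{12} + (32 - 40)^2 \cdot \tfrac{1}{12} + (34 - 40)^2 \cdot \tfrac{1}{6} + \dots + (52 - 40)^2 \cdot \tfrac{1}{12} = \frac{568}{12} = 47.33 \]
The same values follow from the stratum parameters:
\[ \bar{y}_{1U} = \frac{1 + 2 + 4 + 8}{4} = 3.75, \qquad S_1^2 = \frac{1}{3}\left((1 - 3.75)^2 + (2 - 3.75)^2 + (4 - 3.75)^2 + (8 - 3.75)^2\right) = \frac{28.75}{3} \]
\[ \bar{y}_{2U} = \frac{4 + 7 + 7 + 7}{4} = 6.25, \qquad S_2^2 = \frac{1}{3}\left((4 - 6.25)^2 + (7 - 6.25)^2 + (7 - 6.25)^2 + (7 - 6.25)^2\right) = \frac{6.75}{3} = 2.25 \]
\[ E\left(\hat{t}_{str}\right) = 8\left(\tfrac{1}{2}\,\bar{y}_{1U} + \tfrac{1}{2}\,\bar{y}_{2U}\right) = 8\left(\tfrac{1}{2} \cdot 3.75 + \tfrac{1}{2} \cdot 6.25\right) = 40 \]
\[ V\left(\hat{t}_{str}\right) = 8^2\left[\left(\tfrac{1}{2}\right)^2\left(1 - \tfrac{2}{4}\right)\frac{28.75/3}{2} + \left(\tfrac{1}{2}\right)^2\left(1 - \tfrac{2}{4}\right)\frac{2.25}{2}\right] = 47.33 \]

Lohr 3.7

a)

Stratum, h        $N_h$   $n_h$   $\bar{y}_h$      $s_h^2$
Biological, 1     102     7       22/7 = 3.143     6.810
Physical, 2       310     19      40/19 = 2.105    8.211
Social, 3         217     13      16/13 = 1.231    4.359
Humanities, 4     178     11      5/11 = 0.455     0.873
Total             807     50

For example, in stratum 1:
\[ \bar{y}_1 = \frac{1 \cdot 0 + 2 \cdot 1 + \dots + 0 \cdot 8}{7} = \frac{22}{7} = 3.143, \qquad s_1^2 = \frac{1 \cdot (0 - 3.143)^2 + 2 \cdot (1 - 3.143)^2 + \dots + 0 \cdot (8 - 3.143)^2}{6} = 6.810 \]
\[ \hat{t}_{str} = N \sum_{h=1}^{4} \frac{N_h}{N}\,\bar{y}_h = \sum_{h=1}^{4} N_h\,\bar{y}_h = 102 \cdot \frac{22}{7} + 310 \cdot \frac{40}{19} + 217 \cdot \frac{16}{13} + 178 \cdot \frac{5}{11} = 1321 \]
\[ SE\left(\hat{t}_{str}\right) = \sqrt{\sum_{h=1}^{4}\left(1 - \frac{n_h}{N_h}\right) N_h^2\,\frac{s_h^2}{n_h}} \]
\[ = \sqrt{\left(1 - \tfrac{7}{102}\right) 102^2\,\tfrac{6.810}{7} + \left(1 - \tfrac{19}{310}\right) 310^2\,\tfrac{8.211}{19} + \left(1 - \tfrac{13}{217}\right) 217^2\,\tfrac{4.359}{13} + \left(1 - \tfrac{11}{178}\right) 178^2\,\tfrac{0.873}{11}} \]
\[ = \sqrt{9427 + 38985 + 14843 + 2359} = 256.2 \]

b) To compare with the results from 2.6 b), divide both estimates by N:
\[ \bar{y}_{str} = \frac{\hat{t}_{str}}{N} = \frac{1321}{807} = 1.64 \quad \text{(compare with 1.78 from 2.6 b)} \]
\[ SE\left(\bar{y}_{str}\right) = \frac{SE\left(\hat{t}_{str}\right)}{N} = \frac{256.2}{807} = 0.32 \quad \text{(compare with 0.37 from 2.6 b)} \]

c)
\[ \hat{p}_{str} = \sum_{h=1}^{4} \frac{N_h}{N}\,\hat{p}_h = \frac{102}{807} \cdot \frac{1}{7} + \frac{310}{807} \cdot \frac{10}{19} + \frac{217}{807} \cdot \frac{9}{13} + \frac{178}{807} \cdot \frac{8}{11} = 0.57 \]
\[ SE\left(\hat{p}_{str}\right) = \sqrt{\sum_{h=1}^{4}\left(1 - \frac{n_h}{N_h}\right)\left(\frac{N_h}{N}\right)^2 \frac{\hat{p}_h\left(1 - \hat{p}_h\right)}{n_h - 1}} \]
\[ = \sqrt{\left(1 - \tfrac{7}{102}\right)\left(\tfrac{102}{807}\right)^2 \tfrac{(1/7)(6/7)}{6} + \left(1 - \tfrac{19}{310}\right)\left(\tfrac{310}{807}\right)^2 \tfrac{(10/19)(9/19)}{18} + \left(1 - \tfrac{13}{217}\right)\left(\tfrac{217}{807}\right)^2 \tfrac{(9/13)(4/13)}{12} + \left(1 - \tfrac{11}{178}\right)\left(\tfrac{178}{807}\right)^2 \tfrac{(8/11)(3/11)}{10}} = 0.0658 \]

d) In 2.6 d) the standard error of the corresponding estimated proportion was 0.0687. Thus, the precision has improved slightly. A corresponding confidence interval here gets the error margin $1.96 \cdot 0.0658 = 0.129$, compared to 0.135 in 2.6 d).

Lohr 3.22

a) Optimal allocation:
\[ n_h = n \cdot \frac{N_h S_h / \sqrt{c_h}}{\sum_l N_l S_l / \sqrt{c_l}} = n \cdot \frac{(N_h / N)\,S_h / \sqrt{c_h}}{\sum_l (N_l / N)\,S_l / \sqrt{c_l}} \]
For 0/1 data, $\sigma^2 = p(1 - p)$ and $S^2 = \frac{N}{N - 1}\,\sigma^2$, so in general
\[ S_h^2 = \frac{N_h}{N_h - 1}\,p_h\left(1 - p_h\right) \]
Now, $c_h = c$ for $h = 1, 2$, and $N_h / (N_h - 1) \approx 1$ for large strata, so
\[ n_h = n \cdot \frac{(N_h / N)\sqrt{p_h\left(1 - p_h\right)}}{\sum_l (N_l / N)\sqrt{p_l\left(1 - p_l\right)}} \]
\[ n_1 = 2000 \cdot \frac{0.4\sqrt{0.10 \cdot 0.90}}{0.4\sqrt{0.10 \cdot 0.90} + 0.6\sqrt{0.03 \cdot 0.97}} = 1079, \qquad n_2 = 2000 - 1079 = 921 \]

b)
\[ V\left(\hat{p}_{str}\right) = \sum_h \left(1 - \frac{n_h}{N_h}\right)\left(\frac{N_h}{N}\right)^2 \frac{S_h^2}{n_h} = \sum_h \left(1 - \frac{n_h}{N_h}\right)\left(\frac{N_h}{N}\right)^2 \frac{N_h}{N_h - 1} \cdot \frac{p_h\left(1 - p_h\right)}{n_h} \]
$N_h$ can be assumed very large, so
\[ V\left(\hat{p}_{str}\right) \approx \sum_h \left(\frac{N_h}{N}\right)^2 \frac{p_h\left(1 - p_h\right)}{n_h} \]
Under proportional allocation, $n_h = \frac{N_h}{N}\,n$:
\[ V\left(\hat{p}_{str}\right) = \sum_h \frac{N_h}{N} \cdot \frac{p_h\left(1 - p_h\right)}{n} = 0.4 \cdot \frac{0.10 \cdot 0.90}{2000} + 0.6 \cdot \frac{0.03 \cdot 0.97}{2000} = 2.673 \cdot 10^{-5} \]
Under optimal allocation (see a)), $n_1 = 1079$, $n_2 = 921$:
\[ V\left(\hat{p}_{str}\right) = 0.4^2 \cdot \frac{0.10 \cdot 0.90}{1079} + 0.6^2 \cdot \frac{0.03 \cdot 0.97}{921} = 2.472 \cdot 10^{-5} \]

Cluster sampling

• Population divided into heterogeneous groups (clusters), each serving as a mirror of the whole population [communities, living areas, schools, classes within a school]
• Clusters are not the same as strata. Care should be taken so that two clusters do not, by definition, have different population properties.
• Cluster sampling is a tool for economising the sampling. The precision of estimates is usually worse than for simple random sampling (SRS) of observation units.
• Cluster sampling can be made as one-stage, two-stage or multi-stage sampling: primary units [highest level, e.g. communities], secondary units [e.g. living areas], tertiary units [e.g. individuals]
• Clusters can be of equal or unequal sizes [different formulas for estimators]
• Sampling at different stages can be made differently, with equal or unequal (Ch. 6) probabilities
• Stratified sampling can be involved. E.g. if communities are clusters, we may still consider individuals in owned homes to have different living habits than individuals in rented homes. Thus we may stratify within communities.
• National surveys are almost always made with cluster sampling in a complex fashion.
• Formulas are more complicated because of the more complex structure of the sampling, but the cost is lower.

Lohr 5.11

Claims are primary units, fields are secondary units. There are 215 (interesting) fields in each claim, i.e. clusters of equal size, M = 215.
One-stage sampling: SRS of primary units, checking all secondary units within a sampled primary unit.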
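N = 828 (claims), n = 85 (number of sampled claims)

Claim, i    $M_i = M$   $t_i$
1           215         4
2           215         3
3 to 6      215         2
7 to 28     215         1
29 to 85    215         0

a)
\[ \hat{t} = \frac{N}{n}\sum_{i \in S} t_i = \frac{828}{85}\left(4 + 3 + 4 \cdot 2 + 22 \cdot 1 + 57 \cdot 0\right) = \frac{30636}{85} = 360 \]
\[ \hat{\bar{y}} = \frac{\hat{t}}{N \cdot M} = \frac{30636 / 85}{828 \cdot 215} = 0.0020 \]
\[ s_t^2 = \frac{1}{n - 1}\sum_{i \in S}\left(t_i - \frac{\hat{t}}{N}\right)^2 = \frac{1}{84}\left[\left(4 - \tfrac{30636}{85 \cdot 828}\right)^2 + \left(3 - \tfrac{30636}{85 \cdot 828}\right)^2 + \dots + 57\left(0 - \tfrac{30636}{85 \cdot 828}\right)^2\right] = 0.5583 \]
\[ SE\left(\hat{\bar{y}}\right) = \frac{1}{M}\sqrt{\left(1 - \frac{n}{N}\right)\frac{s_t^2}{n}} = \frac{1}{215}\sqrt{\left(1 - \frac{85}{828}\right)\frac{0.5583}{85}} = 0.00036 \]

b)
\[ \hat{t} = 360, \qquad SE\left(\hat{t}\right) = N\sqrt{\left(1 - \frac{n}{N}\right)\frac{s_t^2}{n}} = 828\sqrt{\left(1 - \frac{85}{828}\right)\frac{0.5583}{85}} = 63.6 \]

Lohr 5.6

N = 580 (cases)
$M_i = M = 24$ (cans) [equal cluster sizes]
n = 12 (sample of cases, primary sample)
$m_i = m = 3$ (sample size of secondary sample)

\[ \hat{t}_{unb} = \sum_{i \in S}\sum_{j \in S_i} w_{ij}\,y_{ij} = \sum_{i \in S}\sum_{j \in S_i} \frac{N}{n} \cdot \frac{M_i}{m_i}\,y_{ij} = \frac{N \cdot M}{n \cdot m}\sum_{i \in S}\sum_{j \in S_i} y_{ij} \]
\[ = \frac{580 \cdot 24}{12 \cdot 3}\left(1 + 5 + 7 + 4 + 2 + 4 + \dots + 0 + 0 + 0\right) = \frac{580 \cdot 24}{12 \cdot 3} \cdot 131 = \frac{580 \cdot 262}{3} = 50653, \qquad \frac{\hat{t}_{unb}}{N} = \frac{262}{3} \]
\[ s_t^2 = \frac{1}{n - 1}\sum_{i \in S}\left(\hat{t}_i - \frac{\hat{t}_{unb}}{N}\right)^2, \qquad \hat{t}_i = \frac{M_i}{m_i}\sum_{j \in S_i} y_{ij} \]
\[ s_t^2 = \frac{1}{11}\left[\left(\tfrac{24}{3}(1 + 5 + 7) - \tfrac{262}{3}\right)^2 + \dots + \left(\tfrac{24}{3}(0 + 0 + 0) - \tfrac{262}{3}\right)^2\right] = 2611.88 \]
\[ s_i^2 = \frac{1}{m_i - 1}\sum_{j \in S_i}\left(y_{ij} - \bar{y}_i\right)^2 \]
\[ s_1^2 = \frac{1}{2}\left[\left(1 - \tfrac{1 + 5 + 7}{3}\right)^2 + \left(5 - \tfrac{1 + 5 + 7}{3}\right)^2 + \left(7 - \tfrac{1 + 5 + 7}{3}\right)^2\right] = 9.33 \quad \text{etc.} \]
\[ \hat{V}\left(\hat{t}_{unb}\right) = N^2\left(1 - \frac{n}{N}\right)\frac{s_t^2}{n} + \frac{N}{n}\sum_{i \in S}\left(1 - \frac{m_i}{M_i}\right) M_i^2\,\frac{s_i^2}{m_i} \]
\[ = 580^2\left(1 - \tfrac{12}{580}\right)\tfrac{2611.88}{12} + \tfrac{580}{12}\left(1 - \tfrac{3}{24}\right)\tfrac{24^2}{3}\left(9.33 + 1.33 + 1 + 3 + 7 + 12.33 + 5.33 + 2.33 + 4 + 2.33 + 6.33 + 0\right) \]
\[ \approx 71704800 + 441200 = 72146000 \]
Compare with the with-replacement estimate
\[ \hat{V}_{WR}\left(\hat{t}_{unb}\right) = N^2\,\frac{s_t^2}{n} = 580^2 \cdot \frac{2611.88}{12} = 73219700 \]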
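
The variance estimate for Lohr 5.6 can be checked numerically. What follows is a minimal Python sketch, assuming equal cluster sizes M and equal subsample sizes m; the function name `two_stage_variance` is hypothetical and not taken from Lohr. The inputs are the summary values quoted in the solution ($s_t^2 = 2611.88$ and the twelve $s_i^2$), so rounding in those inputs carries over to the result. The within-case term divides each $s_i^2$ by $m$, as in the variance formula.

```python
def two_stage_variance(N, n, M, m, s_t2, s_i2):
    """Return the (between, within) terms of the estimated variance of
    the unbiased two-stage total estimator, for equal M and equal m.

    N, n  : number of primary units in the population / in the sample
    M, m  : secondary units per primary unit / sampled per primary unit
    s_t2  : sample variance of the estimated primary-unit totals
    s_i2  : within-unit sample variances, one per sampled primary unit
    """
    # Between-primary-unit term: N^2 (1 - n/N) s_t^2 / n
    between = N**2 * (1 - n / N) * s_t2 / n
    # Within-primary-unit term: (N/n) * sum of (1 - m/M) M^2 s_i^2 / m
    within = (N / n) * sum((1 - m / M) * M**2 * s2 / m for s2 in s_i2)
    return between, within


# Summary values quoted in the Lohr 5.6 solution (rounded as printed there).
N, n, M, m = 580, 12, 24, 3
s_t2 = 2611.88
s_i2 = [9.33, 1.33, 1, 3, 7, 12.33, 5.33, 2.33, 4, 2.33, 6.33, 0]

between, within = two_stage_variance(N, n, M, m, s_t2, s_i2)
# With-replacement approximation, which ignores the finite population correction.
with_replacement = N**2 * s_t2 / n

print(f"between term:     {between:,.0f}")
print(f"within term:      {within:,.0f}")
print(f"total:            {between + within:,.0f}")
print(f"with replacement: {with_replacement:,.0f}")
```

Returning the two terms separately makes it easy to see that the between-case term dominates, which is why the simpler with-replacement estimate lands close to the full estimate.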