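Suggested solutions to tasks in Exam in 732A28 Survey Sampling, 2011-11-09

1.

a) $\hat t_y = N\bar y_S = 5500 \cdot 31 \text{ mg} = 170500 \text{ mg} = 170.50 \text{ g}$

$$\widehat{SE}(\hat t_y) = N\sqrt{\left(1-\frac{n}{N}\right)\frac{s^2}{n}} = 5500\sqrt{\left(1-\frac{50}{5500}\right)\frac{0.31^2}{50}} \approx 240 \text{ mg}$$

95 % confidence interval: $170500 \pm 1.96 \cdot 5500\sqrt{\left(1-\frac{50}{5500}\right)\frac{0.31^2}{50}}$ mg $= 170.50 \pm 0.47$ g

b) $e = \dfrac{\text{desired width of C.I. for the mean}}{2} = \dfrac{800/5500}{2} = \dfrac{8}{110} = 0.0727$

$$n = \frac{1.96^2 \cdot 0.31^2}{\left(\frac{8}{110}\right)^2 + \frac{1.96^2 \cdot 0.31^2}{5500}} = 68.92$$

Choose $n = 69$.

2.

a) $\hat t_{str} = N_1\bar y_1 + N_2\bar y_2 = 5500 \cdot 31 + 4100 \cdot 45 = 355000$ mg $= 355.00$ g

$$\widehat{SE}(\hat t_{str}) = \sqrt{N_1^2\left(1-\frac{n_1}{N_1}\right)\frac{s_1^2}{n_1} + N_2^2\left(1-\frac{n_2}{N_2}\right)\frac{s_2^2}{n_2}} = \sqrt{5500^2\left(1-\frac{50}{5500}\right)\frac{0.31^2}{50} + 4100^2\left(1-\frac{40}{4100}\right)\frac{0.35^2}{40}} = 329.530 \text{ mg}$$

99 % C.I.: $355000 \pm 2.576 \cdot 329.530$ mg $= 355.00 \pm 0.85$ g

b) Find a 95 % C.I. for the difference $\bar y_{1,U} - \bar y_{2,U}$. Since the samples are assumed to be taken independently,

$$V(\bar y_1 - \bar y_2) = V(\bar y_1) + V(\bar y_2) = \left(1-\frac{n_1}{N_1}\right)\frac{S_1^2}{n_1} + \left(1-\frac{n_2}{N_2}\right)\frac{S_2^2}{n_2}$$

Thus, the standard error of the difference in sample means becomes

$$\widehat{SE}(\bar y_1 - \bar y_2) = \sqrt{\left(1-\frac{n_1}{N_1}\right)\frac{s_1^2}{n_1} + \left(1-\frac{n_2}{N_2}\right)\frac{s_2^2}{n_2}} = \sqrt{\left(1-\frac{50}{5500}\right)\frac{0.31^2}{50} + \left(1-\frac{40}{4100}\right)\frac{0.35^2}{40}} = 0.0703$$

and a 95 % confidence interval for the difference in population means becomes

$$\bar y_1 - \bar y_2 \pm 1.96\,\widehat{SE}(\bar y_1 - \bar y_2) = 31 - 45 \pm 1.96 \cdot 0.0703 = -14 \pm 0.14 \text{ mg}$$

Hence, there is a significant difference between the mean MDMA contents of the two consignments, since the interval does not contain 0.

c) Optimal allocation is here the same as Neyman allocation, since nothing is specified about costs:

$$n_h = n\,\frac{N_h s_h}{N_1 s_1 + N_2 s_2}, \qquad n_1 = 90 \cdot \frac{5500 \cdot 0.31}{5500 \cdot 0.31 + 4100 \cdot 0.35} \approx 48.9 \approx 49, \qquad n_2 = 90 - 49 = 41$$

3.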
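As a numerical check on tasks 1 and 2 above, the formulas can be evaluated in a few lines of Python. This is a minimal sketch that assumes only the summary statistics quoted in those solutions (N1 = 5500, n1 = 50, s1 = 0.31; N2 = 4100, n2 = 40, s2 = 0.35; means 31 mg and 45 mg). The helper names are hypothetical and do not come from any library.

```python
import math

# Hypothetical helpers; inputs are the summary values quoted in tasks 1 and 2.

def srs_total_halfwidth(N, n, ybar, s, z):
    """Estimated total N*ybar and CI half-width z*SE under SRS without replacement."""
    se = N * math.sqrt((1 - n / N) * s**2 / n)
    return N * ybar, z * se

def srs_sample_size(e, S, z, N):
    """Sample size giving half-width e for the mean: z^2 S^2 / (e^2 + z^2 S^2 / N)."""
    n0 = z**2 * S**2
    return n0 / (e**2 + n0 / N)

def strat_total_se(strata):
    """SE of the stratified total; strata = [(N_h, n_h, s_h), ...]."""
    return math.sqrt(sum(Nh**2 * (1 - nh / Nh) * sh**2 / nh for Nh, nh, sh in strata))

def diff_means_se(strata):
    """SE of ybar_1 - ybar_2 for two independently drawn SRS samples."""
    return math.sqrt(sum((1 - nh / Nh) * sh**2 / nh for Nh, nh, sh in strata))

def neyman_allocation(n, strata):
    """Allocate n proportionally to N_h * s_h; strata = [(N_h, s_h), ...]."""
    w = [Nh * sh for Nh, sh in strata]
    return [n * wh / sum(w) for wh in w]

strata = [(5500, 50, 0.31), (4100, 40, 0.35)]

t1, h1 = srs_total_halfwidth(5500, 50, 31, 0.31, 1.96)     # mg
n_req = srs_sample_size(800 / 5500 / 2, 0.31, 1.96, 5500)
t_str = 5500 * 31 + 4100 * 45                              # mg
se_str = strat_total_se(strata)                            # mg
se_diff = diff_means_se(strata)                            # mg
alloc = neyman_allocation(90, [(5500, 0.31), (4100, 0.35)])

print(f"1a: {t1 / 1000:.2f} g +/- {h1 / 1000:.2f} g")
print(f"1b: n = {n_req:.2f} -> {math.ceil(n_req)}")
print(f"2a: {t_str / 1000:.2f} g +/- {2.576 * se_str / 1000:.2f} g")
print(f"2b: {31 - 45} +/- {1.96 * se_diff:.2f} mg")
print(f"2c: n1 = {round(alloc[0])}, n2 = {round(alloc[1])}")
```

Note that 1b rounds the required sample size up (to guarantee the desired width), whereas the Neyman allocation in 2c is rounded to the nearest integers that sum to 90.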
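a) Ratio estimate:

$$\hat p_r = \hat{\bar y}_r = \frac{\sum_{i\in S} M_i \hat p_i}{\sum_{i\in S} M_i} = \frac{550 \cdot \frac{33}{50} + 610 \cdot \frac{38}{50} + 740 \cdot \frac{31}{50} + 480 \cdot \frac{33}{50}}{550 + 610 + 740 + 480} = \frac{1602.2}{2380} = 0.6732 \approx 0.67 = 67\%$$

Estimated variance:

$$\hat V(\hat{\bar y}_r) = \frac{1}{\bar M^2}\left[\left(1-\frac{n}{N}\right)\frac{s_r^2}{n} + \frac{1}{nN}\sum_{i\in S} M_i^2\left(1-\frac{m_i}{M_i}\right)\frac{s_i^2}{m_i}\right]$$

$$s_r^2 = \frac{1}{n-1}\sum_{i\in S}\left(M_i\bar y_i - M_i\hat{\bar y}_r\right)^2 = \frac{1}{3}\left[\left(550 \cdot \tfrac{33}{50} - 550 \cdot 0.6732\right)^2 + \left(610 \cdot \tfrac{38}{50} - 610 \cdot 0.6732\right)^2 + \left(740 \cdot \tfrac{31}{50} - 740 \cdot 0.6732\right)^2 + \left(480 \cdot \tfrac{33}{50} - 480 \cdot 0.6732\right)^2\right] = 1482.04$$

For a proportion, $s_i^2 = \frac{m_i}{m_i - 1}\hat p_i(1-\hat p_i)$, so $\frac{s_i^2}{m_i} = \frac{\hat p_i(1-\hat p_i)}{m_i - 1}$. The mean sampled cluster size is $\bar M = \frac{550 + 610 + 740 + 480}{4} = 595$, and $N = 50$, so

$$\hat V(\hat{\bar y}_r) = \frac{1}{595^2}\left[\left(1-\frac{4}{50}\right)\frac{1482.04}{4} + \frac{1}{4 \cdot 50}\left(550^2\left(1-\tfrac{50}{550}\right)\frac{\frac{33}{50}\left(1-\frac{33}{50}\right)}{49} + 610^2\left(1-\tfrac{50}{610}\right)\frac{\frac{38}{50}\left(1-\frac{38}{50}\right)}{49} + 740^2\left(1-\tfrac{50}{740}\right)\frac{\frac{31}{50}\left(1-\frac{31}{50}\right)}{49} + 480^2\left(1-\tfrac{50}{480}\right)\frac{\frac{33}{50}\left(1-\frac{33}{50}\right)}{49}\right)\right] = \frac{340.87 + 29.66}{354025} \approx 0.0010466$$

99 % confidence interval: $0.67 \pm 2.576\sqrt{0.0010466} = 0.67 \pm 0.08 = (0.59,\ 0.75)$

Alternatively, we can use $M_0$ to compute the mean cluster size, $\bar M_U = M_0/N = 32150/50 = 643$:

$$\hat V(\hat{\bar y}_r) = \frac{595^2}{643^2} \cdot 0.0010466 \approx 0.000896, \qquad 0.67 \pm 2.576\sqrt{0.000896} = 0.67 \pm 0.08 = (0.59,\ 0.75)$$

b) A ratio estimate should be more efficient here, since the sizes of the living areas (clusters) vary substantially. However, the ratio estimator is always biased, and the smaller the sample size, the larger the bias.

c) No. For two-stage cluster sampling with SRS in both stages to be self-weighting, we require the ratio $m_i/M_i$ to be at least approximately constant. This ensures that each sampled individual represents (approximately) the same number of individuals in the population. Here $m_i = 50$ in every cluster while $M_i$ ranges from 480 to 740, so $m_i/M_i$ is not constant.

d) PPS sampling:

$$\hat p_{pps} = \frac{1}{n}\sum_{i\in S}\hat p_i = \frac{1}{4}\left(\frac{33}{50} + \frac{38}{50} + \frac{31}{50} + \frac{33}{50}\right) = 0.675 \approx 0.68 = 68\%$$

$$\hat V(\hat p_{pps}) = \frac{1}{n(n-1)}\sum_{i\in S}\left(\hat p_i - \hat p_{pps}\right)^2 = \frac{1}{4 \cdot 3}\left[\left(\tfrac{33}{50} - 0.675\right)^2 + \left(\tfrac{38}{50} - 0.675\right)^2 + \left(\tfrac{31}{50} - 0.675\right)^2 + \left(\tfrac{33}{50} - 0.675\right)^2\right] = 0.000892$$

95 % confidence interval: $0.68 \pm 1.96\sqrt{0.000892} = 0.68 \pm 0.06 = (0.62,\ 0.74)$

4.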
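As a check on the arithmetic of task 3 above, the two-stage ratio estimator and the PPS estimator can be evaluated directly. This is a minimal sketch assuming the data quoted in the solution (cluster sizes 550, 610, 740, 480; 50 sampled individuals per cluster with 33, 38, 31 and 33 "yes" answers) and N = 50 clusters, as implied by $\bar M_U = M_0/N = 32150/50$. The function name `ratio_estimate` is hypothetical.

```python
import math

# Task 3 data as quoted in the solution.
M = [550, 610, 740, 480]   # cluster sizes M_i
m = [50, 50, 50, 50]       # subsample sizes m_i
yes = [33, 38, 31, 33]     # "yes" answers per subsample
N, M0 = 50, 32150          # assumed: N clusters, M0 individuals in the population

def ratio_estimate(M, m, yes, N, Mbar):
    """Two-stage cluster ratio estimate of a proportion and its estimated variance."""
    n = len(M)
    p = [y / mi for y, mi in zip(yes, m)]
    p_r = sum(Mi * pi for Mi, pi in zip(M, p)) / sum(M)
    s_r2 = sum((Mi * pi - Mi * p_r) ** 2 for Mi, pi in zip(M, p)) / (n - 1)
    # For a 0/1 variable, s_i^2 / m_i = p_i (1 - p_i) / (m_i - 1).
    within = sum(Mi**2 * (1 - mi / Mi) * pi * (1 - pi) / (mi - 1)
                 for Mi, mi, pi in zip(M, m, p))
    v = ((1 - n / N) * s_r2 / n + within / (n * N)) / Mbar**2
    return p_r, s_r2, v

p_r, s_r2, v_s = ratio_estimate(M, m, yes, N, sum(M) / len(M))  # Mbar = 595
_, _, v_u = ratio_estimate(M, m, yes, N, M0 / N)                 # Mbar_U = 643

# PPS (with-replacement formula as used in d): average of cluster proportions.
p = [y / mi for y, mi in zip(yes, m)]
p_pps = sum(p) / len(p)
v_pps = sum((pi - p_pps) ** 2 for pi in p) / (len(p) * (len(p) - 1))

print(f"3a: p_r = {p_r:.4f}, s_r^2 = {s_r2:.2f}, V = {v_s:.7f}, "
      f"99% half-width = {2.576 * math.sqrt(v_s):.2f}")
print(f"3a (Mbar_U = 643): V = {v_u:.6f}")
print(f"3d: p_pps = {p_pps:.3f}, V = {v_pps:.6f}, "
      f"95% half-width = {1.96 * math.sqrt(v_pps):.2f}")
```

Passing the mean cluster size as a parameter makes it easy to compare the two choices of $\bar M$ discussed in a); both lead to the same rounded interval.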
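a) Ratio estimate (with suitable values given in thousands of SEK):

$$\hat t_{yr} = \hat B\, t_x = \frac{\sum_{i\in S} y_i}{\sum_{i\in S} x_i}\, t_x = \frac{1085}{28080} \cdot 10000 \cdot 313100 \text{ SEK} = 120.98 \cdot 10^6 \text{ SEK}$$

$$\hat V(\hat t_{yr}) = \left(1-\frac{n}{N}\right)\left(\frac{t_x}{\bar x}\right)^2\frac{s_e^2}{n} = \left(1-\frac{n}{N}\right)\left(\frac{t_x}{\bar x}\right)^2\frac{1}{n}\cdot\frac{1}{n-1}\sum_{i\in S}\left(y_i - \hat B x_i\right)^2$$

$$= \left(1-\frac{n}{N}\right)\left(\frac{t_x}{\bar x}\right)^2\frac{1}{n}\cdot\frac{1}{n-1}\left[\sum_{i\in S} y_i^2 + \hat B^2\sum_{i\in S} x_i^2 - 2\hat B\sum_{i\in S} y_i x_i\right]$$

and, since $\bar y = \hat B\bar x$,

$$= \left(1-\frac{n}{N}\right)\left(\frac{t_x}{\bar x}\right)^2\frac{1}{n}\left[s_y^2 + \hat B^2 s_x^2 - 2\hat B r s_y s_x\right]$$

$$= \left(1-\frac{90}{10000}\right)\left(\frac{3131000}{28080/90}\right)^2\frac{1}{90}\left[5.160^2 + \left(\frac{1085}{28080}\right)^2 38.300^2 - 2 \cdot \frac{1085}{28080} \cdot 0.77 \cdot 5.160 \cdot 38.300\right] = 18913037.4 \ (1000 \text{ SEK})^2$$

95 % confidence interval: $120.98 \cdot 10^6 \text{ SEK} \pm 1.96\sqrt{18913037.4} \cdot 1000 \text{ SEK} = 120.98 \cdot 10^6 \pm 8.52 \cdot 10^6$ SEK

b) Post-stratified estimate:

$$\bar y_{post} = \frac{N_1}{N}\bar y_{1,R} + \frac{N_2}{N}\bar y_{2,R} = 0.75 \cdot 9300 + 0.25 \cdot 19100 = 11750 \text{ SEK}$$

$$\hat t_{post} = N\bar y_{post} = 10000 \cdot 11750 \text{ SEK} = 117.5 \cdot 10^6 \text{ SEK}$$

c) In a), the non-response is assumed to be MCAR; in b), it is assumed to be at least MAR (given the post-strata).

d) The salaries may be used as an explanatory variable in a regression imputation model, since their correlation with savings is rather high ($r = 0.77$). Still, we need to assume that the non-response is at least MAR.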
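As a check on task 4 a) and b), the ratio and post-stratified estimates can be recomputed from the quoted summaries. This is a minimal sketch assuming that 1085 and 28080 are the sample sums of savings (y) and salary (x) in thousands of SEK, as the formula $\hat B = \sum y_i / \sum x_i$ implies, and that $t_x = 10000 \cdot 313100$ SEK. Variable names are illustrative only.

```python
import math

# Task 4 summary values as quoted in the solution; money in thousands of SEK.
N, n = 10_000, 90
sum_y, sum_x = 1085, 28080        # assumed sample sums of savings (y) and salary (x)
t_x = N * 313.1                   # population salary total: 10000 * 313 100 SEK
s_y, s_x, r = 5.160, 38.300, 0.77

B = sum_y / sum_x                 # ratio estimate B-hat
t_yr = B * t_x                    # estimated total savings, thousands of SEK
xbar = sum_x / n
s_e2 = s_y**2 + B**2 * s_x**2 - 2 * B * r * s_y * s_x   # sample variance of y - B x
v_yr = (1 - n / N) * (t_x / xbar) ** 2 * s_e2 / n        # (1000 SEK)^2
half = 1.96 * math.sqrt(v_yr)                           # thousands of SEK

# b) Post-stratified estimate: weights N_h / N and respondent means in SEK.
weights = [0.75, 0.25]
ybar_R = [9300, 19100]
ybar_post = sum(w * y for w, y in zip(weights, ybar_R))
t_post = N * ybar_post            # SEK

print(f"4a: {t_yr / 1000:.2f} +/- {half / 1000:.2f} million SEK (V = {v_yr:.1f})")
print(f"4b: ybar_post = {ybar_post:.0f} SEK, t_post = {t_post / 1e6:.1f} million SEK")
```

Small differences in the last digits of the variance against the hand calculation come only from rounding of intermediate values.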