 

Suggested solutions to tasks in Exam in 732A28 Survey Sampling,
2011-11-09
1.
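a)
$$\hat{t}_y = N \cdot \bar{y}_S = 5500 \cdot 31\ \text{mg} = 170500\ \text{mg} = 170.50\ \text{g}$$
$$SE(\hat{t}_y) = N \cdot \sqrt{\left(1 - \frac{n}{N}\right) \cdot \frac{s^2}{n}} = 5500 \cdot \sqrt{\left(1 - \frac{50}{5500}\right) \cdot \frac{0.31^2}{50}}$$
95 % confidence interval:
$$170500 \pm 1.96 \cdot 5500 \cdot \sqrt{\left(1 - \frac{50}{5500}\right) \cdot \frac{0.31^2}{50}}\ \text{mg} = 170.50 \pm 0.47\ \text{g}$$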
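b)
$$e = \frac{\text{desired width of C.I. for mean}}{2} = \frac{800/5500}{2} = \frac{8}{110} = 0.0727 \Rightarrow$$
$$n = \frac{1.96^2 \cdot 0.31^2}{\left(\dfrac{8}{110}\right)^2 + \dfrac{1.96^2 \cdot 0.31^2}{5500}} = 68.92 \Rightarrow \text{Choose } n = 69$$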
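The sample size in b) follows from solving the half-width equation for n. A small sketch of that step, assuming the half-width e refers to the mean and that s = 0.31 is used as a planning value; `srs_sample_size` is a hypothetical helper name.

```python
import math

def srs_sample_size(N, s, e, z=1.96):
    """Sample size giving half-width e for the mean under SRS, with finite population correction."""
    n0 = (z * s) ** 2
    return n0 / (e**2 + n0 / N)

e = (800 / 5500) / 2             # half-width for the mean, as in task 1 b)
n_raw = srs_sample_size(N=5500, s=0.31, e=e)
n = math.ceil(n_raw)             # always round up so the precision requirement is met
print(round(n_raw, 2), n)
```

Rounding up rather than to the nearest integer is the usual choice here, since rounding down would give an interval slightly wider than requested.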
2.
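a)
$$\hat{t}_{str} = N_1 \cdot \bar{y}_1 + N_2 \cdot \bar{y}_2 = 5500 \cdot 31 + 4100 \cdot 45 = 355000\ \text{mg} = 355.00\ \text{g}$$
$$SE(\hat{t}_{str}) = \sqrt{\left(1 - \frac{n_1}{N_1}\right) \cdot N_1^2 \cdot \frac{s_1^2}{n_1} + \left(1 - \frac{n_2}{N_2}\right) \cdot N_2^2 \cdot \frac{s_2^2}{n_2}}$$
$$= \sqrt{\left(1 - \frac{50}{5500}\right) \cdot 5500^2 \cdot \frac{0.31^2}{50} + \left(1 - \frac{40}{4100}\right) \cdot 4100^2 \cdot \frac{0.35^2}{40}} = 329.530$$
$$\Rightarrow 99\ \%\ \text{C.I.}: 355000 \pm 2.576 \cdot 329.530\ \text{mg} = 355.00 \pm 0.85\ \text{g}$$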
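The stratified estimate in a) can be reproduced with a few lines. This is a sketch under the assumption of independent SRS within each consignment; the dictionary layout and variable names are illustrative only.

```python
import math

# One entry per stratum (consignment); measurements in mg
strata = [
    {"N": 5500, "n": 50, "ybar": 31, "s": 0.31},
    {"N": 4100, "n": 40, "ybar": 45, "s": 0.35},
]

# Stratified total: sum of the stratum totals
t_str = sum(h["N"] * h["ybar"] for h in strata)

# Variance: sum of the stratum variances, each with its own fpc
var_str = sum(
    h["N"] ** 2 * (1 - h["n"] / h["N"]) * h["s"] ** 2 / h["n"] for h in strata
)
se_str = math.sqrt(var_str)
half = 2.576 * se_str            # 99 % normal quantile

print(t_str, round(se_str, 2), round(half / 1000, 2))
```

The point estimate comes out as 355000 mg, with a standard error of about 329.53 mg and a half-width of about 0.85 g.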
b) Find a 95 % C.I. for the difference $\bar{y}_{1,U} - \bar{y}_{2,U}$:
$$Var(\bar{y}_1 - \bar{y}_2) = \{\text{independently taken samples assumed}\} = Var(\bar{y}_1) + Var(\bar{y}_2) = \left(1 - \frac{n_1}{N_1}\right) \cdot \frac{S_1^2}{n_1} + \left(1 - \frac{n_2}{N_2}\right) \cdot \frac{S_2^2}{n_2}$$
Thus, the standard error of the difference in sample means becomes
$$SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\left(1 - \frac{n_1}{N_1}\right) \cdot \frac{s_1^2}{n_1} + \left(1 - \frac{n_2}{N_2}\right) \cdot \frac{s_2^2}{n_2}} = \sqrt{\left(1 - \frac{50}{5500}\right) \cdot \frac{0.31^2}{50} + \left(1 - \frac{40}{4100}\right) \cdot \frac{0.35^2}{40}} = 0.0703$$
and a 95 % confidence interval for the difference in population means becomes
$$\bar{y}_1 - \bar{y}_2 \pm 1.96 \cdot SE(\bar{y}_1 - \bar{y}_2) = 31 - 45 \pm 1.96 \cdot 0.0703 = -14 \pm 0.14$$
Hence, since the interval does not contain zero, there is a significant difference in the mean MDMA content between the two
consignments.
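The interval for the difference can be checked as follows. A minimal sketch, assuming the two samples are drawn independently; `var_mean` is a hypothetical helper name.

```python
import math

def var_mean(N, n, s):
    """Estimated variance of a sample mean under SRS without replacement."""
    return (1 - n / N) * s**2 / n

diff = 31 - 45                                     # difference in sample means, mg
se_diff = math.sqrt(var_mean(5500, 50, 0.31) + var_mean(4100, 40, 0.35))
lower, upper = diff - 1.96 * se_diff, diff + 1.96 * se_diff

# The difference is significant at the 5 % level if zero lies outside the interval
significant = not (lower <= 0 <= upper)
print(round(se_diff, 4), round(lower, 2), round(upper, 2), significant)
```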
c) Optimal allocation will here be the same as Neyman-allocation since there are no
specifications about cost.
$$n_h = n \cdot \frac{N_h \cdot s_h}{N_1 \cdot s_1 + N_2 \cdot s_2} \Rightarrow n_1 = 90 \cdot \frac{5500 \cdot 0.31}{5500 \cdot 0.31 + 4100 \cdot 0.35} \approx 49 \Rightarrow n_2 = 41$$
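Neyman allocation distributes the total sample in proportion to N_h · s_h. The sketch below illustrates this for the two consignments; `neyman_allocation` is a hypothetical function name, and the rounding rule (round the first stratum, give the remainder to the second) is an assumption made so that the sizes add up to 90.

```python
def neyman_allocation(n, sizes, sds):
    """Unrounded Neyman allocation: n_h proportional to N_h * s_h."""
    weights = [N_h * s_h for N_h, s_h in zip(sizes, sds)]
    total = sum(weights)
    return [n * w / total for w in weights]

raw = neyman_allocation(90, sizes=[5500, 4100], sds=[0.31, 0.35])
n1 = round(raw[0])
n2 = 90 - n1                     # remainder goes to the second stratum
print([round(x, 2) for x in raw], n1, n2)
```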
3.
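a) Ratio estimate:
$$\hat{p}_r = \hat{\bar{y}}_r = \frac{\sum_{i \in S} M_i \cdot \bar{y}_i}{\sum_{i \in S} M_i} = \frac{\sum_{i \in S} M_i \cdot \hat{p}_i}{\sum_{i \in S} M_i} = \frac{550 \cdot \frac{33}{50} + 610 \cdot \frac{38}{50} + 740 \cdot \frac{31}{50} + 480 \cdot \frac{33}{50}}{550 + 610 + 740 + 480} = \frac{1602.2}{2380} = 0.6732 \approx 0.67 = 67\ \%$$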
99 % confidence interval:
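$$\hat{V}(\hat{\bar{y}}_r) = \frac{1}{\bar{M}^2} \left[ \left(1 - \frac{n}{N}\right) \cdot \frac{s_r^2}{n} + \frac{1}{n \cdot N} \sum_{i \in S} M_i^2 \cdot \left(1 - \frac{m_i}{M_i}\right) \cdot \frac{s_i^2}{m_i} \right]$$
$$s_r^2 = \frac{\sum_{i \in S} \left( M_i \cdot \bar{y}_i - M_i \cdot \hat{\bar{y}}_r \right)^2}{n - 1}$$
$$= \frac{\left(550 \cdot \frac{33}{50} - 550 \cdot 0.6732\right)^2 + \left(610 \cdot \frac{38}{50} - 610 \cdot 0.6732\right)^2 + \left(740 \cdot \frac{31}{50} - 740 \cdot 0.6732\right)^2 + \left(480 \cdot \frac{33}{50} - 480 \cdot 0.6732\right)^2}{3} = 1482.04$$
$$s_i^2 = \frac{m_i}{m_i - 1} \cdot \hat{p}_i \cdot (1 - \hat{p}_i) \Rightarrow \frac{s_i^2}{m_i} = \frac{\hat{p}_i \cdot (1 - \hat{p}_i)}{m_i - 1}$$
$$\bar{M} = (550 + 610 + 740 + 480) / 4 = 595$$
$$\hat{V}(\hat{\bar{y}}_r) = \frac{1}{595^2} \left[ \left(1 - \frac{4}{50}\right) \cdot \frac{1482.04}{4} + \frac{1}{4 \cdot 50} \left( 550^2 \left(1 - \frac{50}{550}\right) \frac{\frac{33}{50} \left(1 - \frac{33}{50}\right)}{49} + 610^2 \left(1 - \frac{50}{610}\right) \frac{\frac{38}{50} \left(1 - \frac{38}{50}\right)}{49} + 740^2 \left(1 - \frac{50}{740}\right) \frac{\frac{31}{50} \left(1 - \frac{31}{50}\right)}{49} + 480^2 \left(1 - \frac{50}{480}\right) \frac{\frac{33}{50} \left(1 - \frac{33}{50}\right)}{49} \right) \right] = 0.000991$$
$\Rightarrow$ 99 % confidence interval
$$0.67 \pm 2.576 \cdot \sqrt{0.000991} = 0.67 \pm 0.08 = (0.59, 0.75)$$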
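The ratio estimate and its variance for the two-stage sample can be traced step by step in code. This is a sketch only: N = 50 clusters is read off from the ratio 32150/50 used in this solution, and the variable names (`hits`, `within`, and so on) are hypothetical.

```python
import math

N, n = 50, 4                     # clusters in the population (assumed) and in the sample
M = [550, 610, 740, 480]         # cluster sizes M_i
m = [50, 50, 50, 50]             # second-stage sample sizes m_i
hits = [33, 38, 31, 33]          # sampled individuals with the studied attribute

p = [k / mi for k, mi in zip(hits, m)]                       # cluster proportions
p_r = sum(Mi * pi for Mi, pi in zip(M, p)) / sum(M)          # ratio estimate

# Between-cluster component
s_r2 = sum((Mi * pi - Mi * p_r) ** 2 for Mi, pi in zip(M, p)) / (n - 1)

# Within-cluster component, using s_i^2 / m_i = p_i (1 - p_i) / (m_i - 1)
within = sum(
    Mi**2 * (1 - mi / Mi) * pi * (1 - pi) / (mi - 1)
    for Mi, mi, pi in zip(M, m, p)
)

M_bar = sum(M) / n
var_r = ((1 - n / N) * s_r2 / n + within / (n * N)) / M_bar**2
half = 2.576 * math.sqrt(var_r)  # 99 % normal quantile

print(round(p_r, 4), round(s_r2, 2), var_r, round(half, 2))
```

The sketch reproduces 0.6732 for the ratio estimate and about 1482.04 for the between-cluster term. Evaluated exactly as the formula is written, it returns a variance of roughly 0.00105, a little above the 0.000991 reported in the solution; either value gives the same half-width of 0.08 after rounding.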
Alternatively, we can use $M_0$ to compute the mean cluster size:
$$\bar{M} = 32150 / 50 = 643 \Rightarrow \hat{V}(\hat{\bar{y}}_r) = \frac{1}{643^2} \cdot 595^2 \cdot 0.000991 = 0.000849$$
$$\Rightarrow 0.67 \pm 2.576 \cdot \sqrt{0.000849} = 0.67 \pm 0.08 = (0.59, 0.75)$$
b) A ratio estimate should be more efficient since the sizes of the living areas (clusters) vary
substantially. However, there is always a bias with the ratio estimate and the smaller the
sample size the larger the bias.
c) No. For a two-stage cluster sample with SRS in both stages to be self-weighting, we
require the ratio $m_i / M_i$ to be at least approximately constant. This ensures that each sampled
individual represents (approximately) the same number of individuals in the population.
Here $m_i = 50$ in every sampled cluster while $M_i$ varies between 480 and 740, so the ratio is not constant.
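The condition in c) can be inspected directly from the cluster sizes used in a). A short sketch, with illustrative variable names:

```python
M = [550, 610, 740, 480]         # cluster sizes M_i from part a)
m = [50, 50, 50, 50]             # second-stage sample sizes m_i

# Second-stage sampling fractions; a self-weighting design needs these (nearly) equal
fractions = [mi / Mi for mi, Mi in zip(m, M)]

# Ratio of largest to smallest fraction; 1.0 would mean exactly self-weighting
spread = max(fractions) / min(fractions)
print([round(f, 4) for f in fractions], round(spread, 2))
```

The fractions range from about 0.068 to about 0.104, so a sampled individual in the largest cluster stands for roughly one and a half times as many people as one in the smallest cluster.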
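d) PPS-sampling:
$$\hat{p}_{pps} = \frac{1}{n} \sum_{i \in S} \hat{p}_i = \frac{1}{4} \left( \frac{33}{50} + \frac{38}{50} + \frac{31}{50} + \frac{33}{50} \right) = 0.675 \approx 0.68 = 68\ \%$$
$$\hat{V}(\hat{p}_{pps}) = \frac{1}{n \cdot (n - 1)} \sum_{i \in S} \left( \hat{p}_i - \hat{p}_{pps} \right)^2$$
$$= \frac{1}{4 \cdot 3} \left[ \left(\frac{33}{50} - 0.675\right)^2 + \left(\frac{38}{50} - 0.675\right)^2 + \left(\frac{31}{50} - 0.675\right)^2 + \left(\frac{33}{50} - 0.675\right)^2 \right] = 0.000892$$
$\Rightarrow$ 95 % confidence interval
$$0.68 \pm 1.96 \cdot \sqrt{0.000892} = 0.68 \pm 0.06 = (0.62, 0.74)$$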
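Under PPS sampling with replacement the estimator is simply the mean of the cluster proportions, which makes the computation short. A minimal sketch with illustrative variable names:

```python
import math

p = [33 / 50, 38 / 50, 31 / 50, 33 / 50]   # cluster proportions from part a)
n = len(p)

p_pps = sum(p) / n                                          # mean of cluster proportions
var_pps = sum((pi - p_pps) ** 2 for pi in p) / (n * (n - 1))
half = 1.96 * math.sqrt(var_pps)                            # 95 % normal quantile

print(round(p_pps, 3), round(var_pps, 6), round(half, 2))
```

Note that with only four clusters the normal quantile is a rough approximation; the solution uses 1.96, and the sketch follows it.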
4.
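a) Ratio estimate (when suitable values have been given in thousands of SEK):
$$\hat{t}_{yr} = \hat{B} \cdot t_x = \frac{\sum_{i \in S} y_i}{\sum_{i \in S} x_i} \cdot t_x = \frac{1085}{28080} \cdot 10000 \cdot 313100\ \text{SEK} = 120.98 \cdot 10^6\ \text{SEK}$$
$$\hat{V}(\hat{t}_{yr}) = \left(1 - \frac{n}{N}\right) \cdot \left(\frac{t_x}{\bar{x}}\right)^2 \cdot \frac{s_e^2}{n} = \left(1 - \frac{n}{N}\right) \cdot \left(\frac{t_x}{\bar{x}}\right)^2 \cdot \frac{1}{n} \cdot \frac{1}{n - 1} \sum_{i \in S} \left( y_i - \hat{B} \cdot x_i \right)^2$$
$$= \left(1 - \frac{n}{N}\right) \cdot \left(\frac{t_x}{\bar{x}}\right)^2 \cdot \frac{1}{n} \cdot \frac{1}{n - 1} \left( \sum_{i \in S} y_i^2 + \hat{B}^2 \cdot \sum_{i \in S} x_i^2 - 2 \cdot \hat{B} \cdot \sum_{i \in S} y_i \cdot x_i \right)$$
$$= \left(1 - \frac{n}{N}\right) \cdot \left(\frac{t_x}{\bar{x}}\right)^2 \cdot \frac{1}{n} \cdot \left( s_y^2 + \hat{B}^2 \cdot s_x^2 - 2 \cdot \hat{B} \cdot r \cdot s_y \cdot s_x \right)$$
$$= \left(1 - \frac{90}{10000}\right) \cdot \left(\frac{3131000}{28080 / 90}\right)^2 \cdot \frac{1}{90} \cdot \left( 5.160^2 + \left(\frac{1085}{28080}\right)^2 \cdot 38.300^2 - 2 \cdot \frac{1085}{28080} \cdot 0.77 \cdot 5.160 \cdot 38.300 \right) (1000\ \text{SEK})^2 = 18913037.4\ (1000\ \text{SEK})^2$$
$\Rightarrow$ 95 % confidence interval:
$$120.98 \cdot 10^6 \pm 1.96 \cdot \sqrt{18913037.4} \cdot 1000\ \text{SEK} = (120.98 \pm 8.52) \cdot 10^6\ \text{SEK}$$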
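The ratio estimate of the total and its variance can be evaluated from the summary statistics alone. The sketch below works in thousands of SEK throughout, as the solution does; the variable names are illustrative.

```python
import math

N, n = 10000, 90
sum_y, sum_x = 1085, 28080       # sample totals of savings (y) and salaries (x), thousand SEK
t_x = 3131000                    # population total of x: 10000 * 313.1 thousand SEK
s_y, s_x, r = 5.160, 38.300, 0.77

B = sum_y / sum_x                # estimated ratio
t_yr = B * t_x                   # ratio estimate of the total, thousand SEK

x_bar = sum_x / n
# Residual variance expressed through s_y, s_x and the correlation r
s_e2 = s_y**2 + B**2 * s_x**2 - 2 * B * r * s_y * s_x
var_t = (1 - n / N) * (t_x / x_bar) ** 2 * s_e2 / n
half = 1.96 * math.sqrt(var_t)   # thousand SEK

print(round(t_yr / 1000, 2), round(var_t, 1), round(half / 1000, 2))
```

Dividing by 1000 converts thousand SEK to million SEK, giving about 120.98 for the estimate and about 8.52 for the half-width.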
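b) Post-stratified estimate:
$$\bar{y}_{post} = \frac{N_1}{N} \cdot \bar{y}_{1,R} + \frac{N_2}{N} \cdot \bar{y}_{2,R} = 0.75 \cdot 9300 + 0.25 \cdot 19100 = 11750\ \text{SEK}$$
$$\Rightarrow \hat{t}_{post} = 10000 \cdot 11750\ \text{SEK} = 117.5 \cdot 10^6\ \text{SEK}$$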
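The post-stratified estimate is a weighted mean of the respondent means, with the known population shares as weights. A minimal sketch with illustrative variable names:

```python
N = 10000
shares = [0.75, 0.25]            # known population shares N_h / N of the post-strata
resp_means = [9300, 19100]       # respondent means per post-stratum, SEK

y_post = sum(w * yb for w, yb in zip(shares, resp_means))   # post-stratified mean
t_post = N * y_post                                          # estimated total, SEK

print(y_post, t_post / 1e6)
```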
c) In a) the non-response is assumed to be MCAR (missing completely at random); in b) it is assumed to be at least MAR (missing at random).
d) The salaries may be used as an explanatory variable in a model for regression
imputation, since their correlation with savings is rather high. Still, we need to assume that the
non-response is at least MAR.