STT 825 F01 HOMEWORK #6 SOLUTIONS
Due Wednesday, December 5
1. (19 points) We have a population with 8 clusters, and each cluster is composed of two strata. Consider the
psu's as brokers and the ssu's as accounts, which are stratified by type of account (I or II). Here yij = fee per account.
(Broker)        Number of Accounts
psu       Mi      Type I    Type II
1         100     50        50
2         100     50        50
3         100     60        40
4         100     30        70
5         200     100       100
6         200     100       100
7         200     120       80
8         200     40        160
We will take a with-replacement PPS sample of 2 brokers (psu's), then within each selected broker a stratified
random sample (without replacement) of 10 accounts, proportionally allocated.
a.
i. If Broker #8 (psu #8) is selected, how many accounts (ssu's) would be selected from Type I and
Type II, respectively?
Select (40/200) x 10 = 2 accounts from I, select 8 from II.
ii. What is the probability that Broker #7 is selected for the sample?
ψ7 = M7/K = 200/1200 = 1/6 = .1667, so the selection probability = 1 - (1 - 1/6)^2 = 11/36 = .3056
iii. Given that broker #7 is selected, we would then take a stratified random sample of 6 and 4 accounts
from Types I and II, respectively. Given that broker #7 is selected, what is the probability that account
#26 of Type I is then selected for the sample?
P(acct #26, Type I selected | Broker #7 selected) = 6/120 = 1/20 = .05
iv. Give the PPS probabilities for each broker (psu).
Broker    1      2      3      4      5      6      7      8
ψi        1/12   1/12   1/12   1/12   1/6    1/6    1/6    1/6
or        .083   .083   .083   .083   .167   .167   .167   .167
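The arithmetic in parts (ii) and (iv) can be checked with a short sketch. The names M, psi and inclusion_prob are illustrative choices, not notation from the assignment; the account counts are those in the Problem 1 table.

```python
from fractions import Fraction

# Accounts per broker (M_i), taken from the table in Problem 1.
M = {1: 100, 2: 100, 3: 100, 4: 100, 5: 200, 6: 200, 7: 200, 8: 200}
K = sum(M.values())  # total number of accounts in the population

# PPS draw probabilities: psi_i = M_i / K.
psi = {i: Fraction(m, K) for i, m in M.items()}


def inclusion_prob(psi_i, n):
    """P(a psu is drawn at least once in n independent with-replacement draws)."""
    return 1 - (1 - psi_i) ** n


pi_7 = inclusion_prob(psi[7], 2)
print(K, psi[3], psi[7], pi_7, float(pi_7))
```

Exact fractions are used so that 11/36 appears without the rounding that enters when .1666 is squared by hand.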
b. Suppose we took the sample and selected brokers #3 and #5. The data and descriptive statistics are
given below:
Row    y3Ii     y3IIi    y5Ii     y5IIi
1      20.40    26.79    15.59    17.12
2      12.37    19.05    19.28    17.63
3      17.28    36.39    20.28    16.21
4       9.14    20.27    33.46    25.01
5      10.50             30.97    22.52
6      15.57
Descriptive Statistics: y3Ii, y3IIi, y5Ii, y5IIi
Variable   N    Mean     Median   TrMean   StDev   SE Mean
y3Ii       6    14.21    13.97    14.21    4.30    1.76
y3IIi      4    25.63    23.53    25.63    7.94    3.97
y5Ii       5    23.92    20.28    23.92    7.82    3.50
y5IIi      5    19.70    17.63    19.70    3.85    1.72
i. Estimate the total fees for Broker #3 (use stratified sampling methods).
Broker #3: 60(14.21) + 40 (25.63) = 852.6 + 1025.2 = 1877.8
ii. Estimate the total fees for Broker #5 (use stratified sampling methods).
Broker #5: 100(23.92) + 100 (19.70) = 4362
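As a cross-check, the stratified totals can be recomputed from the raw fees rather than from the rounded means. This is a minimal sketch; stratified_total is a hypothetical helper name, and the stratum sizes are the Type I / Type II account counts for brokers #3 and #5.

```python
from statistics import mean


def stratified_total(strata):
    """Stratified estimate of a total: sum of N_h * (sample mean in stratum h).

    `strata` is a list of (N_h, sample_values) pairs.
    """
    return sum(N_h * mean(sample) for N_h, sample in strata)


# (stratum size, sampled fees) for Type I and Type II accounts.
broker3 = [(60, [20.40, 12.37, 17.28, 9.14, 10.50, 15.57]),
           (40, [26.79, 19.05, 36.39, 20.27])]
broker5 = [(100, [15.59, 19.28, 20.28, 33.46, 30.97]),
           (100, [17.12, 17.63, 16.21, 25.01, 22.52])]

t3 = stratified_total(broker3)
t5 = stratified_total(broker5)
print(round(t3, 1), round(t5, 1))
```

With unrounded means this gives about 1877.6 and 4361.4; the hand values 1877.8 and 4362 differ only because the means were rounded to two decimals first.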
c.
i. Now you have estimated totals for Brokers #3 and #5 (from part (b)). Get an estimate of the total fees
for all brokers by combining the 2 estimates using unequal probability sampling formulas.
t̂_ψ = ½{t̂3/ψ3 + t̂5/ψ5} = ½{1877.8/(1/12) + 4362/(1/6)} = ½{22533.6 + 26172} = 24352.8
ii. Compute the standard error of the estimator in (i) using unequal probability sampling formulas.
Just find the sample variance of 22533.6 and 26172: s^2 = (2575)^2. Then the estimated
variance = ½(2575)^2 = 3,315,312.5. The SE = $1820.80
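The with-replacement PPS calculation in part (c) can be sketched as follows. The function name pps_wr_estimate is an illustrative choice; the inputs are the hand-computed broker totals 1877.8 and 4362 and the draw probabilities 1/12 and 1/6.

```python
from math import sqrt
from statistics import mean, variance


def pps_wr_estimate(totals, psis):
    """With-replacement PPS estimate of a population total and its SE.

    Each draw contributes u_i = t_i / psi_i; the estimate is the mean of
    the u_i and the estimated variance is (sample variance of u_i) / n.
    """
    u = [t / p for t, p in zip(totals, psis)]
    n = len(u)
    return mean(u), sqrt(variance(u) / n)


t_hat, se = pps_wr_estimate([1877.8, 4362.0], [1 / 12, 1 / 6])
print(round(t_hat, 1), round(se, 1))
```

Without intermediate rounding the SE comes out near 1819.2; the hand value of 1820.80 reflects rounding the sample standard deviation up to 2575.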
d.
i. Estimate the mean fee per account by the unbiased estimator.
= t̂_ψ / K = 24,352.8/1200 = $20.29
ii. Give the standard error of the estimator in (i).
SE = 1820.80/1200 = $1.52
iii. Estimate K by (1/n) times the sum of the observed Mi/ψi.
= ½{100/(1/12) + 200/(1/6)} = ½{1200 + 1200} = 1200. Note that the estimated K is equal to K here
because we have PPS.
iv. Estimate the mean fee per account by a ratio estimator (hint: use (iii)).
= $20.29 since estimated K is equal to K
e. i. Find πj|3, the probability that ssu #j is in the sample given psu #3 was selected.
πj|3 does not depend on the strata since we have exact proportional allocation.
= 6/60 = 4/40 = 1/10 = .10 for both I and II.
ii. Find πj|5.
= 5/100 = 1/20 = .05 for both I and II.
----------------------------------------------------------------------------------------------------
2. (6 points) Do text problem #13.
Have PPS sample (self weighting). t̂_ψ = (K/n) (sum of the observed psu means)
Unit   14     12     09     14     16     01     14     10     21     11
yi     1.75   1.25   .25    .75    1.0    2.25   1.0    1.25   2.0    5.5     Sum = 17.0
answer = (807/10) (17) = 1371.9
The estimated variance = (K^2/n) {sample variance of the yi's} = ((807)^2/10) (1.462)^2, which gives SE = 373.
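A quick numerical check of the self-weighting estimate and its standard error, as a sketch only. It assumes the ten observed psu means are the values listed in the solution (including 5.5 for unit 11) and that K = 807; the variable names are illustrative.

```python
from math import sqrt
from statistics import stdev

K = 807  # population size used in the solution
# Observed psu means for the ten draws (order does not affect the result).
y = [1.75, 1.25, 0.25, 0.75, 1.0, 2.25, 1.0, 1.25, 2.0, 5.5]
n = len(y)

t_hat = (K / n) * sum(y)        # self-weighting PPS estimate of the total
se = K * stdev(y) / sqrt(n)     # square root of (K^2 / n) * s^2

print(round(t_hat, 1), round(stdev(y), 3), round(se))
```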
----------------------------------------------------------------------------------------------------
3. (9 points) Consider a sample of 4 units as given below.
i      46      15      07      59
yi     17.9    18.4    14.1    15.8
a. Suppose the data were obtained from a simple random sample of size 4 from a population of size 80.
i      46      15      07      59
πi     .05     .05     .05     .05
wi     20      20      20      20
i. Fill in the blanks in the πi ( = probability that the ith unit is in the sample) row above.
ii. Fill in the blanks in the wi (= sampling weights) row above.
iii. Compute the Horvitz-Thompson estimator (show your work) of the population mean.
t̂_HT = sum of the weights multiplied by yi = 20(17.9 + 18.4 + 14.1 + 15.8) = 1324,
Estimated N = sum of the weights = 80,
The H-T estimator of the mean = 1324/80 = 16.55
iv. Compute the sample mean and compare with (iii).
Sample mean = (17.9 + 18.4 + 14.1 + 15.8)/4 = 16.55, the same as (iii).
b. Now suppose that the sample above is a stratified random sample where labels i=1-30 are from stratum #1
(size 30), and labels i=31-80 are from stratum #2 (size 50) (note the label numbers of the sample units to identify
their stratum).
i      46      15      07      59
πi     .04     .0667   .0667   .04
wi     25      15      15      25
i. Fill in the blanks in the πi ( = probability that the ith unit is in the sample) row above.
ii. Fill in the blanks in the wi (= sampling weights) row above.
iii. Compute the Horvitz-Thompson estimator (show your work) of the population mean.
Find estimate of N = sum of the weights = 80,
Then the H-T estimator of the mean = {25(17.9) + 25(15.8) + 15(18.4) + 15(14.1)}/80 = 16.625
iv. Compute the unbiased estimator of the population mean and compare with (iii).
ȳstr = (N1/N) ȳ1 + (N2/N) ȳ2 = (30/80) (16.25) + (50/80) (16.85) = 16.625, the same as (iii).
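Both parts of Problem 3 use the same recipe: weights are the reciprocals of the inclusion probabilities, and the mean estimate is the weighted total divided by the sum of the weights. A minimal sketch, with ht_mean as a hypothetical helper name:

```python
def ht_mean(y, pi):
    """Weighted (Horvitz-Thompson style) mean: sum(w*y) / sum(w), with w = 1/pi."""
    w = [1 / p for p in pi]
    t_hat = sum(wi * yi for wi, yi in zip(w, y))  # estimated population total
    n_hat = sum(w)                                # estimated population size
    return t_hat / n_hat


y = [17.9, 18.4, 14.1, 15.8]  # units 46, 15, 07, 59

# (a) SRS of 4 from 80: every unit has pi = 4/80.
mean_srs = ht_mean(y, [4 / 80] * 4)

# (b) Stratified: units 46 and 59 are in stratum 2 (2 of 50),
#     units 15 and 07 are in stratum 1 (2 of 30).
mean_str = ht_mean(y, [2 / 50, 2 / 30, 2 / 30, 2 / 50])

print(round(mean_srs, 3), round(mean_str, 3))
```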
----------------------------------------------------------------------------------------------------
4. (7 points) Consider the population of brokers and accounts in Problem #1. Suppose we took a without-
replacement PPS sample of n=2 brokers. The πi's are given below.
i      1       2       3       4       5       6       7       8
πi     .1732   .1732   .1732   .1732   .3276   .3276   .3276   .3276
a. Keep the 2nd stage sampling the same as in #1.
i. Find the sampling weights for broker #3 accounts.
w3 wj|3 = (1/.1732) (10) = 57.737
ii. Find the sampling weights for broker #5 accounts.
w5 wj|5 = (1/.3276) (20) = 61.05
b.
i. Compute the Horvitz-Thompson estimator of the total fees using weights formula.
t̂_HT = 57.737 (sum of broker 3 fees) + (61.05) (sum of broker 5 fees) = 24,158.89
ii. Use the with replacement formula to get the approximate standard error of your estimator in (i).
i      t̂i        πi/n                t̂i/(πi/n)
3      1877.8    .1732/2 = .0866     21683.6
5      4362      .3276/2 = .1638     26630
Find the sample variance of the 2 values 21683.6 and 26630: = 12,233,450
Estimated variance of t̂_HT = ½(1-2/8) (12,233,450) = 4,587,543.7, SE = 2142
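The approximate standard error in part (b)(ii) can be reproduced with a short sketch. The function name wr_approx_se is illustrative; the finite population factor (1 - n/N) is included because the solution applies it, with N = 8 brokers.

```python
from math import sqrt
from statistics import variance


def wr_approx_se(totals, pis, N):
    """Approximate SE of an HT total via the with-replacement formula.

    Each sampled psu contributes u_i = t_i / (pi_i / n); the estimated
    variance is (1 - n/N) * (sample variance of u_i) / n.
    """
    n = len(totals)
    u = [t / (p / n) for t, p in zip(totals, pis)]
    return sqrt((1 - n / N) * variance(u) / n)


se_ht = wr_approx_se([1877.8, 4362.0], [0.1732, 0.3276], N=8)
print(round(se_ht))
```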
c. Compute the Horvitz-Thompson estimator of the mean fee per account.
Must find the estimated K = sum of the weights = 57.737(10) + 61.05(10) = 1187.87.
HT estimator of the mean = 24,158.89/1187.87 = $20.34
----------------------------------------------------------------------------------------------------
5. (9 points) The target population is adults in a designated City. The City is first divided into 3 geographic
areas consisting of 100, 300, and 600 blocks each. A simple random sample of 40, 40, and 60 blocks is taken
from each area, respectively. Then for each selected block, a simple random sample of 10 households is taken.
For each household, one adult is selected at random for an interview.
a. A subset of the observations is given below. Give the sampling weight for each of the listed observational
units.
Area    Block   Number of    Household   Number of adults   Adult
Label   Label   Households   Label       in household       Label   Sampling Weight
1       35      120          95          3                  2       (2.5)(12)(3) = 90
2       297     56           06          4                  3       (7.5)(5.6)(4) = 168
3       488     52           20          1                  1       (10)(5.2)(1) = 52
b. Which geographic area has been OVERSAMPLED?
Proportional allocation: 10%, 30%, 60% of 140, or 14, 42, 84. Area 1 (n1=40) is oversampled.
c. Suppose we decided to post-stratify by type of housing, and this further subdivided the geographic areas
into single family/ multiple family housing. In Geographic area #1, there are 40 blocks of single family and 60
blocks of multiple family housing, and the sample of blocks from Geographic area #1 had 15 single family
blocks and 25 multiple. Block #35 (in part (a)) is multiple. Give a new sampling weight for this adult (Area #1,
Block #35, Household #95, Adult #2) with post stratification.
The post-stratification occurs at the psu (blocks) level. When you combine the geographic strata and the type
of housing strata, you get 6 strata. Then “Area 1 multiple” has 60 blocks in the population, and 25 in the
sample. The new weight for the stage 1 psu is then 60/25 = 2.4, which replaces the 2.5 in the first-line
product of part (a).
ans. (2.4) (12) (3) = 86.4
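Each weight in Problem 5 is the product of the inverse sampling fractions at the three stages. A minimal sketch, where sampling_weight and its parameter names are illustrative and 10 households per block is the design stated in the problem:

```python
def sampling_weight(blocks_pop, blocks_sampled, households_in_block,
                    adults_in_household, households_sampled=10):
    """Three-stage weight: (blocks) x (households within block) x (adults within household)."""
    stage1 = blocks_pop / blocks_sampled
    stage2 = households_in_block / households_sampled
    stage3 = adults_in_household  # one adult chosen at random per household
    return stage1 * stage2 * stage3


w_area1 = sampling_weight(100, 40, 120, 3)
w_area2 = sampling_weight(300, 40, 56, 4)
w_area3 = sampling_weight(600, 60, 52, 1)

# Post-stratified version for Area 1, multiple-family blocks: 60 in population, 25 sampled.
w_area1_post = sampling_weight(60, 25, 120, 3)

print(round(w_area1, 1), round(w_area2, 1), round(w_area3, 1), round(w_area1_post, 1))
```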