STT 825 F01
HOMEWORK #4
SOLUTIONS
Due Monday, Oct. 29, 2001
1. (7.5 points) We have a population of 500 accounts, stratified by balance of the account:
Stratum 1: balance $0 up to $500
Stratum 2: balance $500 up to $2000
Stratum 3: balance $2000 and up.
Yesterday's fees were measured on all accounts, and the population characteristics are

Stratum      Number of accounts   Mean fee      Standard deviation
1            150                  $2.05         $2.07
2            300                  $3.93         $2.75
3            50                   $8.22         $4.03
population   500                  $ (a) below   $3.21
a. Give optimal allocation (assuming sampling costs are equal across strata) of a sample of n=100 accounts.

h     NhSh                    allocation nh
1     150 x 2.07 = 310.5      (310.5/1337) x 100 = 23.2 (use n1 = 23)
2     300 x 2.75 = 825        (825/1337) x 100 = 61.7 (use n2 = 62)
3     50 x 4.03 = 201.5       (201.5/1337) x 100 = 15.1 (use n3 = 15)
sum   1337
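The allocation in (a) can be checked with a few lines of Python. This is only a sketch of the equal-cost optimal (Neyman) allocation, nh proportional to NhSh; the function name neyman_allocation is made up for illustration.

```python
def neyman_allocation(sizes, sds, n):
    """Unrounded n_h proportional to N_h * S_h (equal sampling costs assumed)."""
    products = [N_h * S_h for N_h, S_h in zip(sizes, sds)]
    total = sum(products)
    return [n * p / total for p in products]

sizes = [150, 300, 50]          # N_h from the table in problem 1
sds = [2.07, 2.75, 4.03]        # S_h from the table in problem 1

raw = neyman_allocation(sizes, sds, 100)
alloc = [round(x) for x in raw]  # round each stratum to a whole account
print([round(x, 1) for x in raw], alloc)
```

Rounding each stratum separately happens to give a total of exactly 100 here; in general the rounded values may need a small adjustment to hit n.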
b. Give the sampling weight for an observation from Stratum #1.
N1/n1 = 150/23 = 6.52
c. Suppose we had another population of accounts with strata sizes 225, 225, 50 and stratum standard deviations
$2.00, $2.00, $21.00.
i. Find the optimal allocation of 100 observations.

h     NhSh                   nh
1     225 x 2.00 = 450       (450/1950) x 100 rounds to 23
2     225 x 2.00 = 450       23
3     50 x 21.00 = 1050      54
sum   1950
ii. Compare n3 with N3. What allocation would you recommend as "optimal?"
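N3 < n3 (50 < 54), so we cannot sample 54 accounts from stratum 3. As optimal, take n3 = 50 (all of stratum 3), then split
the remaining 50 equally between strata 1 and 2, which have equal NhSh: n1 = n2 = 25.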
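One way to automate the adjustment in (c)(ii) is sketched below: any stratum whose optimal nh exceeds Nh is sampled completely, and the rest of the sample is re-allocated among the remaining strata in proportion to NhSh. The function name capped_allocation is hypothetical, and this is just one reasonable reading of "optimal" here, not a formula from the text.

```python
def capped_allocation(sizes, sds, n):
    """Neyman allocation that never asks for more than N_h units in a stratum."""
    alloc = {}
    active = list(range(len(sizes)))   # strata still to be allocated
    remaining = n                      # sample size still to be distributed
    while active:
        total = sum(sizes[h] * sds[h] for h in active)
        raw = {h: remaining * sizes[h] * sds[h] / total for h in active}
        over = [h for h in active if raw[h] > sizes[h]]
        if not over:
            alloc.update(raw)
            break
        for h in over:                 # take a census of oversubscribed strata
            alloc[h] = sizes[h]
            remaining -= sizes[h]
            active.remove(h)
    return [alloc[h] for h in range(len(sizes))]

result = capped_allocation([225, 225, 50], [2.00, 2.00, 21.00], 100)
print(result)
```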
--------------------------------------------------------------------------------------------------------------------------------------
2. (6.5 points) A company has 800 employees who travel on company business. The employees are classified into
200 Level I employees and 600 Level II employees. To audit the amount of mileage claimed last month, a stratified
random sample of 200 (70 from Level I and 130 from Level II) is taken.
a. What is the sampling weight for a Level I employee?
N1/n1 = 200/70 = 2.86
b. Note the sample statistics reported below; y-measurements are in units of 100 miles.

Level   sample size   sample mean   sample standard deviation
I       70            11.20         3.467
II      130           8.44          3.00
i. Give an unbiased estimate of the total mileage claimed by only the 200 Level I employees.
200 x 11.20 = 2240
ii. What is the standard error of your estimate in (i)?
Just use SRS theory on your stratum I sample.
Estimated variance = N^2 (1 - n/N) s^2/n = 200^2 (1 - 70/200) (3.467)^2/70 = 4464.6,
so the SE of the estimated total is 66.82.
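As a check on (b)(i) and (b)(ii), the SRS formulas applied to the Level I stratum alone can be evaluated as follows. This is a small sketch; the helper name srs_total_and_se is illustrative only, and all values are in units of 100 miles.

```python
import math

def srs_total_and_se(N, n, ybar, s):
    """Estimated total, its estimated variance, and SE under SRS without replacement."""
    t_hat = N * ybar
    var_hat = N**2 * (1 - n / N) * s**2 / n   # includes the finite population correction
    return t_hat, var_hat, math.sqrt(var_hat)

# Level I stratum: N1 = 200, n1 = 70, sample mean 11.20, sample SD 3.467
t_hat, var_hat, se = srs_total_and_se(200, 70, 11.20, 3.467)
print(round(t_hat, 1), round(var_hat, 1), round(se, 2))
```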
iii. Is there a problem with random n1 in question (i)?
No, Level I employees are a stratum, and n1 is not random.
c. Management later realized that the employees came from two different locations, A and B, and thought travel
amounts might differ by location. They decided to POST-Stratify their stratified random sample by location. The
numbers in the population and sample means are given below.
Level   Area   Number in Population   Number in Sample   Sample Mean
I       A      160                    56                 11.32
I       B      40                     14                 11.08
II      A      200                    20                 8.15
II      B      400                    110                8.49
i. Post-stratifying by location, estimate the total mileage claimed by the 800 employees (Do NOT compute
the standard error).
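Define 4 strata using the level*area combinations:

Stratum      Nh     ȳh       Nh x ȳh
1 = I, A     160    11.32    1811.2
2 = I, B     40     11.08    443.2
3 = II, A    200    8.15     1630
4 = II, B    400    8.49     3396
sum                          7280.4

The estimated total is 7280.4 (in units of 100 miles, i.e. 728,040 miles), and ȳstr = 7280.4/800 = 9.10 (units of 100 miles
per employee).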
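The post-stratified estimate is just a weighted sum of the four cell means, which the short sketch below reproduces. The dictionary layout is an arbitrary choice for illustration; the measurements are in units of 100 miles, as stated in part (b).

```python
# (level, area): (N_h, sample mean) taken from the table in part (c)
strata = {
    ("I", "A"): (160, 11.32),
    ("I", "B"): (40, 11.08),
    ("II", "A"): (200, 8.15),
    ("II", "B"): (400, 8.49),
}

N = sum(N_h for N_h, _ in strata.values())                  # 800 employees
t_hat = sum(N_h * ybar_h for N_h, ybar_h in strata.values())  # estimated total
ybar_str = t_hat / N                                        # estimated mean per employee
print(N, round(t_hat, 1), round(ybar_str, 2))
```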
ii. Is the number of Level I employees in Area A a random variable?
YES because you’re post-stratifying on Area.
--------------------------------------------------------------------------------------------------------------------------------------
3. (9 points) Do text problem #3, Chapter 5.
(Hint: yij = 1 if there is an error in the jth field of the ith claim, = 0 if no error, and ti = total number of field errors for the
ith claim. The error rate is a rate PER FIELD.)
a.
ti = total number of field errors in the ith claim. Note that yij = 1 or 0 (1 if the jth field of the ith claim is in error).
The sum of the ti over i in the sample = 37.
Thus, t̂ = 828 (37/85) = 360.42.
You also have to compute (via describe) st^2 = .558263, the sample variance of the 85 ti values observed.
The ERROR RATE = population mean at the ssu level (field level) because that's where the 0,1
measurements are taken. The estimated error rate is ŷ = t̂/K = 360.42/178020 = .002025, where K = 178,020 is the
total number of fields in the population.
SE( ŷ ) = (equation 5.6) = .000357
b.
t̂ = 828 (37/85) = 360.42, SE( t̂ ) = K (SE( ŷ )) = 63.55
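The numbers in (a) and (b) can be reproduced from the summary statistics alone. The sketch below assumes that .558263 is the sample variance of the 85 ti values (that reading is the one that reproduces the standard errors quoted above) and that every claim has 215 fields, so K = 828 x 215 = 178,020.

```python
import math

N, n, M = 828, 85, 215     # claims in population, claims sampled, fields per claim
sum_t = 37                 # total field errors found in the sampled claims
s2_t = 0.558263            # assumption: sample VARIANCE of the t_i

K = N * M                                        # total number of fields
t_hat = N * sum_t / n                            # estimated total number of errors
se_t = N * math.sqrt((1 - n / N) * s2_t / n)     # SE of the estimated total
rate = t_hat / K                                 # estimated error rate per field
se_rate = se_t / K                               # SE of the error rate
print(K, round(t_hat, 2), round(se_t, 2), round(rate, 6), round(se_rate, 6))
```

The SE of the total comes out a hair above the 63.55 quoted in (b), because 63.55 was obtained by multiplying K by the already rounded .000357.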
c. Using an SRS of 85 x 215 = 18,275 fields from a population of 178,020 fields, the estimate would be the same:
37 field errors/18,275 = .00202
The estimated variance for the SRS = (equation 2.16) = 9.92 x 10^-8.
The estimated variance in part (a) for cluster sampling is (.000357)^2 = 1.27 x 10^-7.
Est. var. cluster / est. var. SRS = 1.29, so that clustering increased the variance by 29%.
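The comparison in (c) can be sketched the same way. The SRS variance below uses the usual estimated variance of a sample proportion with a finite population correction, which is what equation 2.16 is taken to be here; the cluster variance is recomputed from the summary statistics, again treating .558263 as the sample variance of the ti.

```python
N, n, M = 828, 85, 215
errors = 37
s2_t = 0.558263                     # assumption: sample variance of the t_i

n_fields = n * M                    # 18,275 fields in the hypothetical SRS
N_fields = N * M                    # 178,020 fields in the population
p = errors / n_fields               # estimated error rate per field

# SRS of fields: (1 - n/N) * p(1 - p) / (n - 1)
var_srs = (1 - n_fields / N_fields) * p * (1 - p) / (n_fields - 1)

# One-stage cluster sample of claims, expressed per field
var_cluster = (N**2 * (1 - n / N) * s2_t / n) / N_fields**2

ratio = var_cluster / var_srs       # estimated design effect
print(var_srs, var_cluster, ratio)
```

The ratio lands between 1.28 and 1.29 depending on how much the intermediate values are rounded, so the conclusion (clustering inflates the variance by a bit under 30%) does not change.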
--------------------------------------------------------------------------------------------------------------------------------------
4. (19 points) Refer to your handout with the 4 cluster designs. In all designs, we will select enough clusters to get
16 ssu's. (We have already worked with Design #1 in class.)
a. Find V[ t̂ ], wij, and ICC for each of the following.
Formulas used: equation (5.2) for V[ t̂ ], wij = N/n, ICC = 1 - NM(pop. MSW)/((NM-1)S^2).
S^2 = population MSTot = 4308/39 = 110.46, the same for all cluster choices since it's an ssu
characteristic.
i. Design 2: M = 4, N = 10, n = 4
St^2 = M x MSBet = 4 (83) = 332
V[ t̂ ] = 4980
wij = 10/4 = 2.5
ICC = 1 - 40(119)/((39)(110.46)) = -.104
ii. Design 3: M = 8, N = 5, n = 2
St^2 = M x MSBet = 8 (173) = 1384
V[ t̂ ] = 10,380
wij = 5/2 = 2.5
ICC = 1 - 40(103)/((39)(110.46)) = 1 - .956 = .044
iii. Design 4: M = 2, N = 20, n = 8
St^2 = M x MSBet = 2 x 203.1 = 406.2
V[ t̂ ] = 12,186
wij = 20/8 = 2.5
ICC = 1 - 40(22.5)/((39)(110.46)) = 1 - .209 = .791
iv. SRS of size 16 ssu's: n = 16, N = 40
Using formula 2.13, and noting that S^2 = MSTot, which doesn't depend on the design,
V[ t̂ ] = 6,627.7
wij = 40/16 = 2.5
ICC = (make a good guess here) = 1 - 0 = 1, since M=1, MSW=0.
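The arithmetic for the three cluster designs and the SRS follows one pattern, sketched below. The MSBet and MSW values are the rounded population ANOVA figures quoted above, so the ICC for Design 2 can differ from -.104 in the third decimal; the function names are illustrative only.

```python
def var_total_cluster(N, n, M, msb):
    """V[t-hat] for one-stage cluster sampling with equal cluster sizes."""
    s2_t = M * msb                       # variance of cluster totals, St^2 = M x MSBet
    return N**2 * (1 - n / N) * s2_t / n

def icc(N, M, msw, S2):
    """ICC = 1 - NM(MSW) / ((NM - 1) S^2)."""
    return 1 - N * M * msw / ((N * M - 1) * S2)

S2 = 4308 / 39                           # population MSTot, same for every design

designs = {
    2: dict(N=10, n=4, M=4, msb=83, msw=119),
    3: dict(N=5, n=2, M=8, msb=173, msw=103),
    4: dict(N=20, n=8, M=2, msb=203.1, msw=22.5),
}

variances = {d: var_total_cluster(v["N"], v["n"], v["M"], v["msb"]) for d, v in designs.items()}
iccs = {d: icc(v["N"], v["M"], v["msw"], S2) for d, v in designs.items()}

# SRS of 16 ssu's from 40: N^2 (1 - n/N) S^2 / n
var_srs = 40**2 * (1 - 16 / 40) * S2 / 16

for d in designs:
    print(d, round(variances[d], 1), round(iccs[d], 3))
print("SRS", round(var_srs, 1))
```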
b. Refer to your answers in (a), and recall that for Design 1, V[ t̂ ] = 7,680. Rank the designs 1-4 and SRS
according to V[ t̂ ]. Which design gives the highest variance? the lowest variance?
Highest variance to lowest variance:
Design 4 (12,186), Design 3 (10,380), Design 1 (7,680), SRS (6,627.7), Design 2 (4,980)
c. Is S^2 affected by the choice of clusters? Is St^2 affected by the choice of clusters?
S^2 is NOT affected since it's a characteristic of the ssu's; St^2 is affected since it depends on the
cluster totals, which depend on the cluster choices.
d. For Design #2, we took an SRS of n=4 clusters and selected clusters #3,4,7,10.
i. Use the computer to get the ANOVA on this sample.
One-way ANOVA: yij versus cluster3

Analysis of Variance for yij
Source      DF     SS     MS      F       P
cluster3     3     15      5   0.05   0.985
Error       12   1260    105
Total       15   1275

Level    N    Mean   StDev
3        4   57.08    6.77
4        4   58.78   12.15
7        4   56.22   12.98
10       4   58.10    7.61

Pooled StDev = 10.25

[Plot: Individual 95% CIs For Mean, Based on Pooled StDev; axis from 49.0 to 70.0]
ii. What is the total yield for Cluster #3?
t3 = 4 x 57.08 =228.3
iii. What is the variance in acre yields for the 4 acres in Cluster #3?
s3^2 = (6.77)^2 = 45.8
iv. Use the data to estimate the total yield and its standard error.
Using formula (5.1), t̂ = 2286.8,
SE( t̂ ) = (formula 5.3) = 11.97, where st^2 = M(sample MSBet)
v. Give an unbiased estimate of the ssu variance (variance in acre yields).
Use the formula {N(M-1)(sample MSW) + (N-1)(sample MSBet)}/(NM-1) = 83.4
Note that sample MSTot = 85.4 (not the same)
--------------------------------------------------------------------------------------------------------------------------------------
5. (8 points) Refer to the data sheet attached. We will take a systematic sample of size 25 from this population.
There are 200 claims for expenses by 8 sales representatives.
a. How many possible systematic samples are there (size 25)?
N/n = 200/25 = 8
b. What is the probability that unit # 17 is selected? What is the sampling weight for unit #17?
Probability = 1/8
sampling weight = 8
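Parts (a) and (b) can be confirmed by listing the possible samples. The sketch below numbers the units 1 to 200 and treats each possible starting point 1 through 8 as one systematic sample; every unit belongs to exactly one of them.

```python
N, n = 200, 25
k = N // n                                   # sampling interval (period) = 8

# One systematic sample per possible random start 1..k
samples = {start: list(range(start, N + 1, k)) for start in range(1, k + 1)}

containing_17 = [start for start, units in samples.items() if 17 in units]
prob_17 = len(containing_17) / k             # each start is equally likely
weight_17 = 1 / prob_17                      # sampling weight = 1 / inclusion probability

print(len(samples), containing_17, prob_17, weight_17)
print(samples[5][:4], samples[5][-1], len(samples[5]))
```

Unit 17 appears only in the sample that starts at unit 1 (1, 9, 17, ...), which is why its selection probability is 1/8.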
c. Using List #1, Circle the expenses in a systematic sample of size 25 (period of 8) which begins with unit #5.
Be sure to read across rows. (Used * to indicate measurement in the sample)
EXPENSES (List #1) (listed by date)
30.0
23.8
*20.6
24.1
33.0
34.1
*45.5
19.7
20.8
18.5
*17.6
12.8
8.9
24.6
*17.3
29.1
21.2
40.9
*37.7
35.2
19.0
20.4
25.1
23.3
30.1
47.2
35.5
34.6
25.0
13.9
9.3
22.3
18.4
14.0
7.5
35.7
46.6
28.7
30.4
24.4
32.2
*21.7
11.3
39.7
31.1
*35.1
28.7
32.8
29.5
*8.8
5.9
19.6
25.1
*6.3
14.7
32.9
31.2
*39.7
15.0
25.8
29.8
13.7
19.4
34.2
19.2
22.8
24.0
27.1
11.7
12.8
8.4
16.1
12.9
10.0
15.8
35.1
40.4
18.5
43.2
33.5
*43.7
19.2
21.8
24.2
33.2
18.4
30.6
37.4
29.8
31.6
15.2
*40.6
*28.5
13.0
25.9
23.1
38.5
31.4
59.9
35.0
49.4
34.2
22.2
*13.7
*17.8
16.8
22.7
14.0
12.5
20.9
15.0
13.2
9.2
14.8
11.6
*17.1
*17.5
15.8
22.4
27.9
23.5
17.9
13.7
7.4
11.4
39.5
28.3
*28.5
*33.6
19.2
29.1
25.3
39.6
34.1
31.4
37.9
32.9
36.8
32.5
*30.0
28.8
15.8
28.9
24.5
34.6
40.1
30.1
20.2
32.8
13.1
23.5
16.9
11.6
19.3
5.8
37.5
25.5
26.7
34.3
26.2
15.9
17.1
*16.2
22.7
21.2
14.8
*32.7
12.5
13.0
19.5
*11.8
20.5
21.1
16.2
*15.7
31.5
29.5
39.9
*33.1
45.8
28.1
25.6
25.5
33.6
26.8
31.5
22.5
13.6
9.6
17.5
20.0
22.0
21.8
24.2
21.3
25.6
40.5
30.9
45.6
22.4
d. Using list #2, circle the expenses in a systematic sample of size 25 (period of 8) which begins with unit #5. Be
sure to read across rows.
The sample is all measurements in column 5 (under rep 5)
Expenses listed by rep

Row   rep1   rep2   rep3   rep4   rep5   rep6   rep7   rep8
1     30.0   37.4   34.1   22.2   17.6   15.8   29.1   39.6
2     19.0   29.8   47.2   13.7    9.3   22.4   35.7   34.1
3     32.2   28.9   35.1   20.2    5.9   11.6   32.9   26.7
4     29.8   16.2   22.8   12.5    8.4   21.1   35.1   39.9
5     43.7   25.5   23.1   13.6   15.0   21.8   39.5   30.9
6     19.2   24.1   38.5   20.8   13.2   24.6   28.3   37.7
7     21.8   23.3   31.4   25.0    9.2   14.0   28.5   30.4
8     28.8   39.7   40.1   29.5   23.5    6.3   37.5   15.0
9     15.9   34.2   14.8   11.7   11.8   10.0   31.5   43.2
10    28.1   31.6   31.5   17.8   20.0   27.9   25.6   31.4
11    23.8   15.2   45.5   16.8   12.8   23.5   21.2   37.9
12    20.4   40.6   35.5   22.7   22.3   17.9   46.6   32.9
13    21.7   24.5   28.7   32.8   19.6   19.3   31.2   34.3
14    13.7   22.7   24.0   13.0   16.1   16.2   40.4   33.1
15    24.2   33.6   59.9    9.6   14.8   24.2   33.6   45.6
16    33.2   33.0   35.0   18.5   11.6   17.3   19.2   35.2
17    18.4   30.1   49.4   13.9   17.1    7.5   29.1   24.4
18    15.8   31.1   30.1    8.8   16.9   14.7   25.5   25.8
19    17.1   19.2   32.7   12.8   20.5   15.8   29.5   33.5
20    25.6   28.5   22.5   14.0   22.0   13.7   40.5   36.8
21    20.6   13.0   19.7   12.5    8.9    7.4   40.9   32.5
22    25.1   25.9   34.6   20.9   18.4   11.4   28.7   30.0
23    11.3   34.6   32.8   13.1   25.1    5.8   39.7   26.2
24    19.4   21.2   27.1   19.5   12.9   15.7   18.5   45.8
25    30.6   26.8   34.2   17.5   17.5   21.3   25.3   22.4
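Why the sample in (d) is exactly the rep 5 column can be seen from the unit numbering. The sketch below assumes List #2 is read across rows with 8 claims per row (one per rep), so unit u sits in column ((u - 1) mod 8) + 1; the 25 values are the rep5 entries for rows 1 to 25.

```python
# Units picked by a systematic sample with period 8 starting at unit 5
units = list(range(5, 201, 8))

# Column (rep) in which each sampled unit falls when reading across rows of 8
columns = {(u - 1) % 8 + 1 for u in units}

# rep5 expenses, rows 1 through 25 of List #2
rep5 = [17.6, 9.3, 5.9, 8.4, 15.0, 13.2, 9.2, 23.5, 11.8, 20.0,
        12.8, 22.3, 19.6, 16.1, 14.8, 11.6, 17.1, 16.9, 20.5, 22.0,
        8.9, 18.4, 25.1, 12.9, 17.5]

sample_mean = sum(rep5) / len(rep5)
print(columns, len(rep5), round(sample_mean, 3))
```

Every sampled unit comes from the same representative, so this particular sample mean reflects rep 5 only rather than the population of 200 claims.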
e. Comparing the population ANOVA's for List #1 and List #2, which list would give systematic samples more
representative of the population?
List #1. Its systematic samples (clusters) have means that are much more alike: all are between 21.19 and 27.52, whereas the
List #2 means are as low as 15.616 and as high as 33.212.
f. In designing a survey of the 200 claim amounts, and noting the population ANOVA for List #2 (by reps), would
you use sales representative as a CLUSTER or a STRATUM?
Sales representative would be useful as a criterion for stratification.