303_HW5solns

Homework 5 Solutions
303 Spring 2003
Steve Fienberg
Tuesday, March 11
Part III
a) Find the sampling distribution of the sample mean (y-bar) for:

A simple random sample of size 3 with replacement.
To do this, every possible sample needs to be enumerated. With replacement,
there are 512 equally likely ordered samples (8 choices for the first draw, 8
for the second and 8 for the third, so 8*8*8=512). To create these 512
samples in Minitab, go to Calc_Make Patterned Data_Arbitrary Set of Numbers.
Store patterned data in C1, arbitrary set of numbers 1,2,4,4,7,7,7,8, list each value
1 time, list whole sequence 64 times.
Store patterned data in C2, arbitrary set of numbers 1,2,4,4,7,7,7,8, list each value
8 times, list whole sequence 8 times.
Store patterned data in C3, arbitrary set of numbers 1,2,4,4,7,7,7,8, list each value
64 times, list whole sequence 1 times.
Next, get the average of each possible sample with Calc_Calculator: store the
result in C4, expression (C1+C2+C3)/3. To determine the frequency of each
average, choose Stat_Tables_Tally, select column C4, and check off Counts and
Percents. I got the following distribution.
C4         Count   Percent
1.00000        1      0.20
1.33333        3      0.59
1.66667        3      0.59
2.00000        7      1.37
2.33333       12      2.34
2.66667        6      1.17
3.00000       21      4.10
3.33333       33      6.45
3.66667       15      2.93
4.00000       47      9.18
4.33333       48      9.38
4.66667       12      2.34
5.00000       63     12.30
5.33333       57     11.13
5.66667       21      4.10
6.00000       57     11.13
6.33333       36      7.03
6.66667        6      1.17
7.00000       27      5.27
7.33333       27      5.27
7.66667        9      1.76
8.00000        1      0.20
N=           512
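
As a cross-check on the Minitab tally, the same enumeration can be sketched in Python. This is not part of the original solution: the variable names are illustrative, and Fraction is used only so that the thirds stay exact.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# The eight population values from the problem.
population = [1, 2, 4, 4, 7, 7, 7, 8]

# Every ordered sample of size 3 drawn with replacement (8*8*8 of them).
samples_wr = list(product(population, repeat=3))

# Tally the sample means.
tally_wr = Counter(Fraction(sum(s), 3) for s in samples_wr)

# Print mean, count and percent, in the same layout as the Minitab tally.
for m in sorted(tally_wr):
    count = tally_wr[m]
    print(f"{float(m):.5f}  {count:3d}  {100 * count / len(samples_wr):5.2f}")
```

Because product treats the two 4s and the three 7s as separate units, every one of the 512 rows is equally likely, which is the same assumption the Minitab columns rely on.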

A simple random sample of size 3 without replacement.
For a simple random sample without replacement, there are 336 equally likely
ordered samples (8*7*6=336). The way I created them is a bit contorted, but I
think it works. To create these samples, go to Calc_Make Patterned
Data_Arbitrary Set of Numbers.
Store data pattern in C6, Arbitrary set of numbers 1 2 4 4 7 7 7 8, list each value 1
time, list whole sequence 42 times
Store data pattern in C11, Arbitrary set of numbers 2 4 4 7 7 7 8, list each value 6
times, list whole sequence 1 time
Store data pattern in C12, Arbitrary set of numbers 1 4 4 7 7 7 8, list each value 6
times, list whole sequence 1 time
Store data pattern in C13, Arbitrary set of numbers 1 2 4 7 7 7 8, list each value 6
times, list whole sequence 2 times
Store data pattern in C14, Arbitrary set of numbers 1 2 4 4 7 7 8, list each value 6
times, list whole sequence 3 times
Store data pattern in C15, Arbitrary set of numbers 1 2 4 4 7 7 7, list each value 6
times, list whole sequence 1 time
Manip_Stack_Stack Columns, Stack the following columns C11 C12 C13 C14
C15 in a column of the current worksheet C7.
Store data pattern in C16, Arbitrary set of numbers 4 4 7 7 7 8, list each value 1
time, list whole sequence 1 times
Store data pattern in C17, Arbitrary set of numbers 2 4 7 7 7 8, list each value 1
time, list whole sequence 2 times
Store data pattern in C18, Arbitrary set of numbers 2 4 4 7 7 8, list each value 1
time, list whole sequence 3 times
Store data pattern in C19, Arbitrary set of numbers 2 4 4 7 7 7, list each value 1
time, list whole sequence 1 times
Store data pattern in C20, Arbitrary set of numbers 4 4 7 7 7 8, list each value 1
time, list whole sequence 1 times
Store data pattern in C21, Arbitrary set of numbers 1 4 7 7 7 8, list each value 1
time, list whole sequence 2 times
Store data pattern in C22, Arbitrary set of numbers 1 4 4 7 7 8, list each value 1
time, list whole sequence 3 times
Store data pattern in C23, Arbitrary set of numbers 1 4 4 7 7 7, list each value 1
time, list whole sequence 1 times
Store data pattern in C24, Arbitrary set of numbers 2 4 7 7 7 8, list each value 1
time, list whole sequence 1 times
Store data pattern in C25, Arbitrary set of numbers 1 4 7 7 7 8, list each value 1
time, list whole sequence 1 times
… and so on …
Then stack the appropriate columns the appropriate number of times to form the
third-draw column C8. With Calc_Calculator, store the result in variable C9,
expression (C6+C7+C8)/3. Get the counts and percentages as computed above.
C9         Count   Percent
2.33333       12      3.57
3.00000        6      1.79
3.33333       24      7.14
3.66667        6      1.79
4.00000       36     10.71
4.33333       48     14.29
4.66667       12      3.57
5.00000       36     10.71
5.33333       42     12.50
5.66667       18      5.36
6.00000       36     10.71
6.33333       36     10.71
7.00000        6      1.79
7.33333       18      5.36
N=           336
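
The without-replacement enumeration is awkward to build by hand, so a short Python sketch may be a useful check. It is an alternative to the Minitab stacking rather than a transcription of it, and the names are illustrative. It assumes, as the solution does, that the repeated values are distinct units.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

population = [1, 2, 4, 4, 7, 7, 7, 8]

# permutations() works on positions, so the two 4s and three 7s count as
# different units: this gives the 8*7*6 ordered samples without replacement.
samples_wor = list(permutations(population, 3))

# Tally the sample means exactly.
tally_wor = Counter(Fraction(sum(s), 3) for s in samples_wor)

for m in sorted(tally_wor):
    count = tally_wor[m]
    print(f"{float(m):.5f}  {count:3d}  {100 * count / len(samples_wor):5.2f}")
```

Means such as 1.00000 or 8.00000 never appear here, since they would require drawing the same unit more than once.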
b) Is the sample mean (y-bar) unbiased for each of the two sampling schemes? Explain
by using the information from your sampling distribution.
The quantity of interest is the population mean, (1+2+4+4+7+7+7+8)/8 = 40/8 = 5.
Note that, the way the samples were created, they are all equally likely. Thus, the
expected value of the sample mean under each sampling scheme is just the average of
the corresponding column (C4 or C9). To get the average of a column, use Calc_Column
Statistics, statistic Mean, input variable (the column number). Doing this, I see that
both of the columns have an average of 5, which equals the population mean. Thus, the
sample mean is unbiased under both sampling schemes.
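
The column-average argument can be reproduced with a minimal Python sketch. The helper name expected_sample_mean is made up for this illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import permutations, product

population = [1, 2, 4, 4, 7, 7, 7, 8]
pop_mean = Fraction(sum(population), len(population))

def expected_sample_mean(samples):
    """Average of the sample means over a list of equally likely samples."""
    means = [Fraction(sum(s), len(s)) for s in samples]
    return sum(means) / len(means)

# Plan i: with replacement; plan ii: without replacement.
e_with = expected_sample_mean(list(product(population, repeat=3)))
e_without = expected_sample_mean(list(permutations(population, 3)))

print(pop_mean, e_with, e_without)
```

Averaging the list of sample means directly is equivalent to weighting each distinct mean by its count in the tally, because every enumerated sample has the same probability.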
c) Draw histograms.
Graph_histogram variables C4 and C9.
For plan i)
[Histogram of C4: vertical axis Frequency (0 to 140), horizontal axis C4 (0 to 9)]
For plan ii)
[Histogram of C9: vertical axis Frequency (0 to 90), horizontal axis C9 (0 to 9)]
We can see that the distribution for plan ii) has a smaller spread. This is because the
“extreme” samples that repeat a unit cannot occur (e.g. 1,1,1 and 8,8,8 are not possible).
In addition, sampling with replacement (plan i) behaves like sampling from an infinite
population, while sampling without replacement (plan ii) is sampling from a finite
population. Thus, the improvement factor (the finite population correction) applies to
plan ii), giving it a smaller variance.
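
To put numbers on the difference in spread, here is a hedged Python sketch that computes the variance of the sample mean under each plan by brute force and compares it with the standard formulas: sigma^2/n with replacement, and sigma^2/n times (N-n)/(N-1) without replacement, where sigma^2 is the population variance with divisor N. The function and variable names are illustrative and are not taken from the solution.

```python
from fractions import Fraction
from itertools import permutations, product

population = [1, 2, 4, 4, 7, 7, 7, 8]
N, n = len(population), 3

mu = Fraction(sum(population), N)
# Population variance with divisor N.
sigma2 = sum((y - mu) ** 2 for y in population) / N

def variance_of_sample_mean(samples):
    """Variance of the sample mean over equally likely samples."""
    means = [Fraction(sum(s), len(s)) for s in samples]
    center = sum(means) / len(means)
    return sum((m - center) ** 2 for m in means) / len(means)

var_with = variance_of_sample_mean(list(product(population, repeat=n)))
var_without = variance_of_sample_mean(list(permutations(population, n)))

# Finite population correction for sampling without replacement.
fpc = Fraction(N - n, N - 1)

print("with replacement:   ", var_with, "formula:", sigma2 / n)
print("without replacement:", var_without, "formula:", sigma2 / n * fpc)
```

With N = 8 and n = 3 the correction factor is 5/7, so the without-replacement variance is 5/7 of the with-replacement one.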
d) Suppose that now you wish to draw samples of size 5. Explain why you do not need to do
any further calculations in order to demonstrate that the unbiasedness property holds for
this sample size. (I’m assuming this is for the without-replacement plan.)
This has to do with the symmetry of the problem. The population has 8 units, so picking
3 of them without replacement leaves 5 behind, and each set of 5 left behind is exactly
as likely as the sample of 3 that it complements. To determine the sum of the remaining
5 values, realize that the total of all the values is 40. Since 3*(Avg for the 3 sample)
is just the sum of the 3 sample, subtracting it from 40 gives the sum of the 5 sample,
and dividing by 5 gives the average of the 5 sample. Thus, the average of the remaining
5 values is (40-3*Avg for the 3 sample)/5. To get the expected value of this, we see that
E((40-3*Avg for the 3 sample)/5) = 1/5*E(40-3*Avg for the 3 sample)
= 1/5*(40-3*E(Avg for the 3 sample)) = 1/5*(40-3*5) (since we showed the average is
unbiased) = 1/5*25 = 5. Thus we know that the samples of size 5 would also be unbiased.
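
Although no further calculation is needed, the complement argument is easy to confirm numerically. The sketch below is only an illustration with made-up names. It works with unordered samples, which is harmless because every unordered sample corresponds to the same number of orderings.

```python
from fractions import Fraction
from itertools import combinations

population = [1, 2, 4, 4, 7, 7, 7, 8]
total = sum(population)          # total of all eight values
units = range(len(population))   # sample by position so ties stay distinct

# Mean of the 5 units left behind by each possible 3-unit sample.
complement_means = [
    Fraction(total - sum(population[i] for i in chosen), 5)
    for chosen in combinations(units, 3)
]

# Means of the 5-unit samples enumerated directly.
direct_means = [
    Fraction(sum(population[i] for i in chosen), 5)
    for chosen in combinations(units, 5)
]

e_complement = sum(complement_means) / len(complement_means)
e_direct = sum(direct_means) / len(direct_means)
print(e_complement, e_direct)
```

The two lists contain the same means, one per 5-unit sample, which is the symmetry the written argument relies on.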