Homework 5 Solutions
303 Spring 2003
Steve Fienberg
Tuesday, March 11

Part III

a) Find the sampling distribution of ȳ for:

A simple random sample of size 3 with replacement.

To do this, all of the possible simple random samples need to be enumerated. With replacement, there are 512 equally likely (ordered) samples: 8 choices for the first draw, 8 for the second and 8 for the third, so 8*8*8 = 512.

To create these 512 combinations in Minitab, go to Calc > Make Patterned Data > Arbitrary Set of Numbers, then:
Store patterned data in C1, arbitrary set of numbers 1,2,4,4,7,7,7,8, list each value 1 time, list whole sequence 64 times.
Store patterned data in C2, arbitrary set of numbers 1,2,4,4,7,7,7,8, list each value 8 times, list whole sequence 8 times.
Store patterned data in C3, arbitrary set of numbers 1,2,4,4,7,7,7,8, list each value 64 times, list whole sequence 1 time.

Next, get the average of each possible sample with Calc > Calculator: store result in C4, expression (C1+C2+C3)/3. To determine the frequency of each average, choose Stat > Tables > Tally, choose column C4, and check Counts and Percents. I got the following distribution:

C4        Count  Percent
1.00000       1     0.20
1.33333       3     0.59
1.66667       3     0.59
2.00000       7     1.37
2.33333      12     2.34
2.66667       6     1.17
3.00000      21     4.10
3.33333      33     6.45
3.66667      15     2.93
4.00000      47     9.18
4.33333      48     9.38
4.66667      12     2.34
5.00000      63    12.30
5.33333      57    11.13
5.66667      21     4.10
6.00000      57    11.13
6.33333      36     7.03
6.66667       6     1.17
7.00000      27     5.27
7.33333      27     5.27
7.66667       9     1.76
8.00000       1     0.20
N=          512

A simple random sample of size 3 without replacement.

For a simple random sample without replacement, there are 336 possible equally likely (ordered) samples (8*7*6 = 336). The way I created them is a bit contorted, but I think it works.
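Both enumerations can also be cross-checked outside Minitab. The Python sketch below is not part of the original solution; it assumes the population is the eight units 1, 2, 4, 4, 7, 7, 7, 8 and counts ordered samples, so its totals are comparable to the 512 and 336 used in the tallies. `itertools.product` plays the role of the patterned columns C1, C2, C3, and `itertools.permutations` (which works on positions, so the repeated 4s and 7s count as different units) replaces the stacked-column construction. `tally_means` and `print_tally` are hypothetical helper names.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product

# The eight population units; equal values are still distinct units.
POPULATION = [1, 2, 4, 4, 7, 7, 7, 8]


def tally_means(samples):
    """Count how often each sample mean occurs, keyed by exact fraction."""
    return Counter(Fraction(sum(s), len(s)) for s in samples)


# Plan i): ordered samples of size 3 WITH replacement, 8**3 = 512 of them.
with_repl = tally_means(product(POPULATION, repeat=3))

# Plan ii): ordered samples of size 3 WITHOUT replacement, 8*7*6 = 336 of them.
without_repl = tally_means(permutations(POPULATION, 3))


def print_tally(counts):
    """Print mean, count and percent in the same layout as Minitab's Tally."""
    total = sum(counts.values())
    print(f"{'mean':>8} {'count':>6} {'percent':>8}")
    for mean in sorted(counts):
        n = counts[mean]
        print(f"{float(mean):8.5f} {n:6d} {100 * n / total:8.2f}")
    print(f"N = {total}")


print_tally(with_repl)
print_tally(without_repl)
```

Using exact fractions as keys avoids the rounding in labels such as 1.33333; the counts should agree with the two Minitab tallies reported in this solution.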
To create these samples, go to Calc > Make Patterned Data > Arbitrary Set of Numbers. Columns C6, C7 and C8 will hold the first, second and third units drawn.

First draw (C6): store patterned data in C6, arbitrary set of numbers 1 2 4 4 7 7 7 8, list each value 42 times, list whole sequence 1 time. (Each unit is the first draw in 7*6 = 42 samples.)

Second draw (C7): for each possible first draw, list the 7 remaining units, each repeated 6 times (once for each possible third draw):
Store patterned data in C11, arbitrary set of numbers 2 4 4 7 7 7 8, list each value 6 times, list whole sequence 1 time.
Store patterned data in C12, arbitrary set of numbers 1 4 4 7 7 7 8, list each value 6 times, list whole sequence 1 time.
Store patterned data in C13, arbitrary set of numbers 1 2 4 7 7 7 8, list each value 6 times, list whole sequence 2 times.
Store patterned data in C14, arbitrary set of numbers 1 2 4 4 7 7 8, list each value 6 times, list whole sequence 3 times.
Store patterned data in C15, arbitrary set of numbers 1 2 4 4 7 7 7, list each value 6 times, list whole sequence 1 time.
(C13 and C14 are repeated 2 and 3 times because the values 4 and 7 occur 2 and 3 times in the population.)
Manip > Stack > Stack Columns: stack the columns C11 C12 C13 C14 C15 in a column of the current worksheet, C7.

Third draw (C8): for each (first, second) pair, list the 6 units that remain:
Store patterned data in C16, arbitrary set of numbers 4 4 7 7 7 8, list each value 1 time, list whole sequence 1 time.
Store patterned data in C17, arbitrary set of numbers 2 4 7 7 7 8, list each value 1 time, list whole sequence 2 times.
Store patterned data in C18, arbitrary set of numbers 2 4 4 7 7 8, list each value 1 time, list whole sequence 3 times.
Store patterned data in C19, arbitrary set of numbers 2 4 4 7 7 7, list each value 1 time, list whole sequence 1 time.
Store patterned data in C20, arbitrary set of numbers 4 4 7 7 7 8, list each value 1 time, list whole sequence 1 time.
Store patterned data in C21, arbitrary set of numbers 1 4 7 7 7 8, list each value 1 time, list whole sequence 2 times.
Store patterned data in C22, arbitrary set of numbers 1 4 4 7 7 8, list each value 1 time, list whole sequence 3 times.
Store patterned data in C23, arbitrary set of numbers 1 4 4 7 7 7, list each value 1 time, list whole sequence 1 time.
Store patterned data in C24, arbitrary set of numbers 2 4 7 7 7 8, list each value 1 time, list whole sequence 1 time.
Store patterned data in C25, arbitrary set of numbers 1 4 7 7 7 8, list each value 1 time, list whole sequence 1 time.
… and so on …
Then stack the appropriate columns, the appropriate number of times, into C8 (for example, the columns for a first draw of 4 are needed twice, once for each 4 in the population).

Use Calc > Calculator to store the result in variable C9, expression (C6+C7+C8)/3. Get the counts and percentages as above:

C9        Count  Percent
2.33333      12     3.57
3.00000       6     1.79
3.33333      24     7.14
3.66667       6     1.79
4.00000      36    10.71
4.33333      48    14.29
4.66667      12     3.57
5.00000      36    10.71
5.33333      42    12.50
5.66667      18     5.36
6.00000      36    10.71
6.33333      36    10.71
7.00000       6     1.79
7.33333      18     5.36
N=          336

b) Is ȳ unbiased for each of the two sampling schemes? Explain by using the information from your sampling distribution.

The quantity of interest is the population mean, (1+2+4+4+7+7+7+8)/8 = 40/8 = 5. Because of the way the samples were created, they are all equally likely, so the expected value of ȳ under each sampling scheme is just the average of the corresponding column (C4 or C9). To get the average of a column, use Calc > Column Statistics, statistic Mean, input variable (the column number). Doing this, I see that both columns have an average of 5. Thus ȳ is unbiased under both sampling schemes.

c) Draw histograms.

Graph > Histogram, variables C4 and C9.

For plan i): [Histogram of C4: frequency vs. sample mean, horizontal axis 0 to 9]

For plan ii): [Histogram of C9: frequency vs. sample mean, horizontal axis 0 to 9]

We can see that the distribution for plan ii) has a smaller spread. This is because the "extreme" values 1 and 8 each occur only once in the population, so without replacement they cannot be repeated (i.e. 1,1,1 and 8,8,8 are not possible). In addition, plan i) (with replacement) behaves as if the sample were drawn from an infinite population, while plan ii) samples from a finite population. The variance of ȳ under plan ii) is therefore reduced by the finite population correction factor (N-n)/(N-1) = (8-3)/(8-1) = 5/7, giving it a smaller variance.

d) Suppose that now you wish to draw samples of size 5. Explain why you do not need to do any further calculations in order to demonstrate that the unbiasedness property holds for this sample.
(I'm assuming this is for the without-replacement plan.) This has to do with the symmetry of the problem. The population has 8 units, and every sample of 3 units leaves behind exactly one set of 5 units, and vice versa. So drawing a simple random sample of 3 is equivalent to drawing a simple random sample of 5 (the 5 units not chosen), and every possible sample of 5 is equally likely.

To find the average of the remaining 5 values, note that the total of all the values is 40. If $\bar y_3$ is the average of the sample of 3, then $3\bar y_3$ is the sum of those three values, $40 - 3\bar y_3$ is the sum of the remaining five, and dividing by 5 gives the average of the sample of 5:

$$\bar y_5 = \frac{40 - 3\bar y_3}{5}.$$

Taking expectations, and using the fact shown in b) that $\bar y_3$ is unbiased, so $E(\bar y_3) = 5$:

$$E(\bar y_5) = E\left(\frac{40 - 3\bar y_3}{5}\right) = \frac{1}{5}\bigl(40 - 3E(\bar y_3)\bigr) = \frac{1}{5}(40 - 3\cdot 5) = \frac{25}{5} = 5.$$

Thus ȳ is also unbiased for samples of size 5.
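The claims in b), c) and d) can also be checked exactly with a short Python sketch, which is not part of the original Minitab solution. It assumes the same eight-unit population, treats every listed sample as equally likely, and uses exact fractions so nothing is rounded. The population variance sigma2 uses divisor N, the convention under which the with-replacement variance is sigma2/n and the correction factor is (N - n)/(N - 1). `mean_and_variance` and the other variable names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, permutations, product

POPULATION = [1, 2, 4, 4, 7, 7, 7, 8]
N = len(POPULATION)            # 8 units
TOTAL = sum(POPULATION)        # population total, 40
MU = Fraction(TOTAL, N)        # population mean, 5


def mean_and_variance(samples):
    """Exact E(ybar) and Var(ybar), treating every listed sample as equally likely."""
    means = [Fraction(sum(s), len(s)) for s in samples]
    expected = sum(means) / len(means)
    variance = sum((m - expected) ** 2 for m in means) / len(means)
    return expected, variance


# Parts b) and c): both plans for n = 3, using ordered samples as in C4 and C9.
e_with, var_with = mean_and_variance(list(product(POPULATION, repeat=3)))
e_without, var_without = mean_and_variance(list(permutations(POPULATION, 3)))

# Population variance with divisor N, and the finite population correction.
sigma2 = sum((y - MU) ** 2 for y in POPULATION) / N
fpc = Fraction(N - 3, N - 1)   # (N - n)/(N - 1) = 5/7

# Part d): every 3-unit sample (by position) leaves a distinct 5-unit complement,
# so the complements are exactly the C(8, 5) = 56 possible samples of size 5.
units = range(N)
complements = sorted(
    tuple(i for i in units if i not in trio) for trio in combinations(units, 3)
)
bijection_ok = complements == list(combinations(units, 5))

# E(ybar) for n = 5, averaged over all 56 equally likely 5-unit samples.
e_five, _ = mean_and_variance(
    [[POPULATION[i] for i in five] for five in combinations(units, 5)]
)

print("plan i:  E =", e_with, " Var =", var_with)
print("plan ii: E =", e_without, " Var =", var_without)
print("sigma^2/n =", sigma2 / 3, " times fpc =", sigma2 / 3 * fpc)
print("complements are all 5-samples:", bijection_ok, " E(ybar_5) =", e_five)
```

Running it should report E(ȳ) = 5 for both plans and for samples of size 5, and a variance of 2 with replacement versus 10/7 without, which is 2 multiplied by the correction factor 5/7.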