STAT 211 – 200
EXAM 5 – FORM A
SUMMER03
Although tea is the world’s most widely consumed beverage after water, little is known about its nutritional
value. Folacin is the only B vitamin present in any significant amount in tea, and recent advances in assay
methods have made accurate determination of folacin content feasible. Consider the data on folacin
content for randomly selected specimens of the four leading brands of green tea.
Brand   Observations
1       7.9  6.2  6.6  8.6  8.9  10.1  9.6
2       5.7  7.5  9.8  6.1  8.4
3       6.8  7.5  5.0  7.4  5.3  6.1
4       6.4  7.1  7.9  4.5  5.0  4.0
Let $\mu_1$, $\mu_2$, $\mu_3$, $\mu_4$ be the true average folacin content for Brand 1, Brand 2, Brand 3, and Brand 4,
respectively. The following are the MINITAB results for different analyses.
Analysis of Variance
Source   DF      SS     MS      F      P
Factor    3   23.50   7.83   3.75  0.028
Error    20   41.78
Total         65.27

Level     n    Mean   StDev
Brand 1   7   8.271   1.463
Brand 2   5   7.500   1.681
Brand 3   6   6.350   1.060
Brand 4   6   5.817   1.551

Pooled StDev = 1.445
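The sums of squares, F statistic, and pooled standard deviation in the output above can be reproduced from the raw observations. The following is a minimal sketch in plain Python, not MINITAB's own procedure; the variable names are illustrative assumptions.

```python
# One-way ANOVA computed by hand from the folacin observations listed above.
from statistics import mean

samples = {
    "Brand 1": [7.9, 6.2, 6.6, 8.6, 8.9, 10.1, 9.6],
    "Brand 2": [5.7, 7.5, 9.8, 6.1, 8.4],
    "Brand 3": [6.8, 7.5, 5.0, 7.4, 5.3, 6.1],
    "Brand 4": [6.4, 7.1, 7.9, 4.5, 5.0, 4.0],
}

all_values = [y for ys in samples.values() for y in ys]
grand_mean = mean(all_values)
k = len(samples)            # number of brands
n_total = len(all_values)   # total number of observations

# Between-brand (factor) and within-brand (error) sums of squares.
ss_factor = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in samples.values())
ss_error = sum((y - mean(ys)) ** 2 for ys in samples.values() for y in ys)

df_factor = k - 1
df_error = n_total - k
ms_factor = ss_factor / df_factor
ms_error = ss_error / df_error
f_stat = ms_factor / ms_error
pooled_sd = ms_error ** 0.5  # square root of the error mean square

print(f"SS(Factor) = {ss_factor:.2f}, SS(Error) = {ss_error:.2f}")
print(f"F = {f_stat:.2f}, pooled StDev = {pooled_sd:.3f}")
```

The P-value is not computed here because it requires the F distribution, which the Python standard library does not provide.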
Tukey's pairwise comparisons
Family error rate = 0.0500
Individual error rate = 0.0111
Critical value = 3.96
Intervals for (column level mean) - (row level mean)

           Brand1    Brand2    Brand3
Brand2     -1.598
            3.141
Brand3     -0.330    -1.301
            4.173     3.601
Brand4      0.203    -0.767    -1.803
            4.706     4.134     2.870
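The intervals above appear to follow the usual Tukey-Kramer form for unequal sample sizes: the difference in sample means plus or minus (critical value / sqrt(2)) x pooled StDev x sqrt(1/n_i + 1/n_j). A small sketch under that assumption, using the rounded summary values from the output (the function name `tukey_interval` is hypothetical):

```python
import math

# Summary values copied from the MINITAB output above (rounded as printed).
means = {"Brand1": 8.271, "Brand2": 7.500, "Brand3": 6.350, "Brand4": 5.817}
sizes = {"Brand1": 7, "Brand2": 5, "Brand3": 6, "Brand4": 6}
pooled_sd = 1.445
q_crit = 3.96  # studentized range critical value reported by MINITAB


def tukey_interval(col, row):
    """Interval for (column level mean) - (row level mean), Tukey-Kramer form."""
    diff = means[col] - means[row]
    half_width = (q_crit / math.sqrt(2)) * pooled_sd * math.sqrt(
        1 / sizes[col] + 1 / sizes[row]
    )
    return diff - half_width, diff + half_width


for col, row in [("Brand1", "Brand2"), ("Brand1", "Brand4"), ("Brand3", "Brand4")]:
    lo, hi = tukey_interval(col, row)
    print(f"{col} - {row}: ({lo:.3f}, {hi:.3f})")
```

Because the inputs are rounded, the last digit may differ slightly from the printed intervals. A pair of brands is declared different when its interval excludes 0.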
Test for Equal Variances
Bartlett's Test (normal distribution)
Test Statistic: 0.965
P-Value: 0.810
Levene's Test (any continuous distribution)
Test Statistic: 0.433
P-Value: 0.732
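Bartlett's statistic can be checked against the raw data with the textbook formula. The sketch below assumes that formula and uses a closed-form chi-square tail probability that is valid only for 3 degrees of freedom (four brands minus one); the helper name `chi2_sf_3df` is an assumption made for this illustration.

```python
import math
from statistics import variance

groups = [
    [7.9, 6.2, 6.6, 8.6, 8.9, 10.1, 9.6],  # Brand 1
    [5.7, 7.5, 9.8, 6.1, 8.4],             # Brand 2
    [6.8, 7.5, 5.0, 7.4, 5.3, 6.1],        # Brand 3
    [6.4, 7.1, 7.9, 4.5, 5.0, 4.0],        # Brand 4
]

k = len(groups)
n_total = sum(len(g) for g in groups)

# Pooled variance: weighted average of the sample variances.
pooled_var = sum((len(g) - 1) * variance(g) for g in groups) / (n_total - k)

numerator = (n_total - k) * math.log(pooled_var) - sum(
    (len(g) - 1) * math.log(variance(g)) for g in groups
)
correction = 1 + (sum(1 / (len(g) - 1) for g in groups) - 1 / (n_total - k)) / (
    3 * (k - 1)
)
bartlett_stat = numerator / correction


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi2_sf_3df(bartlett_stat)
print(f"Bartlett statistic = {bartlett_stat:.3f}, P-value = {p_value:.3f}")
```

A large P-value here gives no evidence against equal variances across the four brands.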
Comparison of two means:

Difference = mu Brand 1 - mu Brand 4
Estimate for difference: 2.455
95% CI for difference: (0.582, 4.328)
T-Test of difference = 0 (vs not =): T-Value = 2.92  P-Value = 0.015  DF = 10

Difference = mu Brand 1 - mu Brand 4
Estimate for difference: 2.455
95% CI for difference: (0.614, 4.296)
T-Test of difference = 0 (vs not =): T-Value = 2.93  P-Value = 0.014  DF = 11
Both use Pooled StDev = 1.50

Difference = mu Brand 2 - mu Brand 4
Estimate for difference: 1.683
95% CI for difference: (-0.583, 3.950)
T-Test of difference = 0 (vs not =): T-Value = 1.71  P-Value = 0.125  DF = 8

Difference = mu Brand 2 - mu Brand 4
Estimate for difference: 1.683
95% CI for difference: (-0.522, 3.889)
T-Test of difference = 0 (vs not =): T-Value = 1.73  P-Value = 0.118  DF = 9
Both use Pooled StDev = 1.61

Difference = mu Brand 3 - mu Brand 4
Estimate for difference: 0.533
95% CI for difference: (-1.235, 2.302)
T-Test of difference = 0 (vs not =): T-Value = 0.70  P-Value = 0.506  DF = 8

Difference = mu Brand 3 - mu Brand 4
Estimate for difference: 0.533
95% CI for difference: (-1.175, 2.242)
T-Test of difference = 0 (vs not =): T-Value = 0.70  P-Value = 0.503  DF = 10
Both use Pooled StDev = 1.33
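Each comparison above is reported twice: once without pooling the variances and once with a pooled standard deviation. The following sketch reproduces both T-values and degrees of freedom for Brand 1 versus Brand 4. It assumes the unpooled version uses the Welch-Satterthwaite approximation with the degrees of freedom truncated to an integer, which is consistent with the printed DF values.

```python
import math
from statistics import mean, variance

brand1 = [7.9, 6.2, 6.6, 8.6, 8.9, 10.1, 9.6]
brand4 = [6.4, 7.1, 7.9, 4.5, 5.0, 4.0]

n1, n4 = len(brand1), len(brand4)
v1, v4 = variance(brand1), variance(brand4)
diff = mean(brand1) - mean(brand4)

# Pooled-variance version (assumes equal population variances).
df_pooled = n1 + n4 - 2
pooled_sd = math.sqrt(((n1 - 1) * v1 + (n4 - 1) * v4) / df_pooled)
t_pooled = diff / (pooled_sd * math.sqrt(1 / n1 + 1 / n4))

# Unpooled (Welch) version with Satterthwaite degrees of freedom.
se_welch = math.sqrt(v1 / n1 + v4 / n4)
t_welch = diff / se_welch
df_welch = (v1 / n1 + v4 / n4) ** 2 / (
    (v1 / n1) ** 2 / (n1 - 1) + (v4 / n4) ** 2 / (n4 - 1)
)

print(f"Pooled:   T = {t_pooled:.2f}, DF = {df_pooled}, pooled StDev = {pooled_sd:.2f}")
print(f"Unpooled: T = {t_welch:.2f}, DF = {int(df_welch)}")
```

The same code applies to the other two comparisons after swapping in the Brand 2 or Brand 3 observations.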
The folacin data were found to be normally distributed. Answer the following 10 questions using all this information.
1. Which of the following is correct based on the information and analysis for the assumptions required
for the analysis?
(a) Only the normality assumption is satisfied for this data
(b) Only the constant variance assumption is satisfied for this data
(c) Both the normality assumption and the constant variance assumption are satisfied for this data
(d) Neither the normality assumption nor the constant variance assumption is satisfied for this data
2. Which of the following is the point estimate for $\mu_4 - \mu_2$?
(a) -5.817
(b) -1.683
(c) 1.683
(d) 5.817
(e) I need to know $\mu_2$ and $\mu_4$ to answer the question
3. I would like to test $H_0: \mu_2 - \mu_4 = 0$ versus $H_a: \mu_2 - \mu_4 \,\square\, 0$. Which of the following would be
the corresponding P-value?
(a) 0.0590
(b) 0.0625
(c) 0.1180
(d) 0.1250
(e) I do not have enough information to compute it
4. Which of the following is the MSE (Mean Squared Error) for the Analysis of Variance?
(a) 2.089
(b) 7.830
(c) 23.50
(d) 41.780
5. Are there significant differences among the true mean folacin contents of the four brands using $\alpha = 0.05$?
(a) Since the P-value on the corresponding test is less than 0.05, there are significant differences
among four brands.
(b) Since the P-value on the corresponding test is more than 0.05, there are significant differences
among four brands.
(c) Since the P-value on the corresponding test is less than 0.05, there are no significant differences
among four brands.
(d) Since the P-value on the corresponding test is more than 0.05, there are no significant differences
among four brands.
6. Which of the following is the total degrees of freedom for the Analysis of Variance?
(a) 20
(b) 21
(c) 23
(d) 24
(e) 25
7. Which of the following is the point estimate for the constant standard deviation in the analysis of
variance?
(a) 1.330
(b) 1.445
(c) 1.500
(d) 1.610
8. Look at the Tukey's pairwise comparisons and tell me which of the following is the right conclusion
using $\alpha = 0.05$?
(a) There are no significant differences between those four brands.
(b) Only Brands 1 and 4 are significantly different
(c) Only Brands 1,2 and 4 are significantly different
(d) Only Brands 1,3 and 4 are significantly different
(e) Those four brands are significantly different than one another
9. I believe that the true averages of Brand 1 and Brand 2 are significantly different from those of Brand 3 and
Brand 4. To test this belief, which of the following null hypotheses should you use?
(a) $\mu_1 = \mu_3$, $\mu_1 = \mu_4$, $\mu_2 = \mu_3$, $\mu_2 = \mu_4$
(b) $\mu_1 = 0.5(\mu_3 + \mu_4)$, $\mu_2 = 0.5(\mu_3 + \mu_4)$
(c) $\mu_1 \,\square\, \mu_2 \,\square\, \mu_3 \,\square\, \mu_4$
10. If you were comparing the true variances of Brand 1 and Brand 2, which of the following would be the
corresponding test statistic?
(a) I do not have enough information to answer this question
(b) 0.7575
(c) 0.8703
11. Which of the following cannot be dependent samples collected for the analysis?
(a) The new treatment will be compared to a current treatment by recording the change in cholesterol
readings over a 10 week period. The study will involve at most 30 participants.
(b) A study was designed to measure the effect of home environment on academic achievement of
12-year-old students. To control for the genetic differences that come with choosing different
people, thirty sets of identical twins were identified. One twin is assigned to the academic group
and the other to the nonacademic group.
(c) Two random samples of 6 rats each were exposed to different environments. One sample of rats
was held in a normal environment at 26°C; the other sample was held in a cold environment at 5°C.
Blood pressures for the rats are recorded for comparison purposes.
Infestation of crops by insects has long been of great concern to farmers and agricultural scientists. A certain
article reports normally distributed data on x = age of cotton plant (days) and y = % damaged squares. The
following are the data and the corresponding analysis done in MINITAB.
X:   9  12  12  15  18  18  21  21  27  30  30  33
Y:  11  12  23  30  29  52  41  65  60  72  84  93
Regressing Y on X

Predictor      Coef   SE Coef       T      P
Constant    -19.670     7.524   -2.61  0.026
X            3.2847    0.3440    9.55  0.000

S = 9.094   R-Sq = 90.1%   R-Sq(adj) = 89.1%

Analysis of Variance
Source           DF      SS      MS      F      P
Regression        1  7541.7  7541.7  91.19  0.000
Residual Error   10   827.0    82.7
Total            11  8368.7
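The coefficients and summary statistics in the regression output can be recovered from the twelve (X, Y) pairs with the ordinary least squares formulas. The sketch below uses only the standard library; the variable names are illustrative assumptions.

```python
import math

x = [9, 12, 12, 15, 18, 18, 21, 21, 27, 30, 30, 33]     # age of plant (days)
y = [11, 12, 23, 30, 29, 52, 41, 65, 60, 72, 84, 93]    # % damaged squares

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Total and residual sums of squares.
sst = sum((yi - y_bar) ** 2 for yi in y)
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

r_sq = 1 - sse / sst
s = math.sqrt(sse / (n - 2))   # residual standard deviation
se_slope = s / math.sqrt(sxx)  # standard error of the slope

print(f"Y = {intercept:.3f} + {slope:.4f} X")
print(f"S = {s:.3f}, R-Sq = {100 * r_sq:.1f}%, SE(slope) = {se_slope:.4f}")
```

A fitted value for a given age is `intercept + slope * age`, and a residual is the observed Y minus that fitted value.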
[Scatter plot of Y versus X: Y axis from 0 to 90, X axis from 10.0 to 35.0]
Answer the following 9 questions using this information.
12. Which of the following is a possible relationship between x and y?
(a) They are positively related
(b) They are negatively related
(c) They are unrelated
13. Which of the following is the regression equation for regressing y on x?
(a) X=-19.670+3.2847Y
(b) X=3.2847-19.670Y
(c) Y=-19.670+3.2847X
(d) Y=3.2847-19.670X
14. What is the predicted % damaged squares when the age of a cotton plant is 20 days?
(a) There is no way to predict this with the given information
(b) -390.1153
(c) 42.7393
(d) 46.024
(e) 49.3087
15. What is the corresponding residual in the regression analysis when the age of a cotton plant is 15 days?
(a) There is no way to compute this with the given information
(b) -29.6005
(c) -0.3995
(d) 0.3995
(e) 29.6005
16. Suppose it was previously believed that the expected change in % damaged squares is 3.5 for a 1-day
increase in the age of the cotton plant. Which of the following is the test statistic to see whether the data
support this belief?
(a) -0.6259
(b) -0.3225
(c) 0.3225
(d) 0.6259
(e) There is no way to compute this with the given information
17. If the computed P-value is larger than 0.20 for the belief in the previous question, which of the
following conclusions can be reached using $\alpha = 0.05$?
(a) The expected change in % damaged squares is 3.5 for 1 day increase in the age of the cotton
plant.
(b) The expected change in % damaged squares is not 3.5 for 1 day increase in the age of the cotton
plant.
18. Would you feel comfortable using the results on the output to predict % damaged squares when the age
of a cotton plant is 5 days?
(a) Yes
(b) No
19. What proportion of the observed variation in % damaged squares can be attributed to the simple linear
regression relationship between the age of a cotton plant and the % damaged squares?
(a) 0.6590
(b) 0.8118
(c) 0.9010
(d) 0.9492
(e) 0.9743
20. Which of the following is the correlation between x and y?
(a) -0.9010
(b) -0.8118
(c) 0.8118
(d) 0.9010
(e) 0.9492
21. Which of the following cannot be right?
(a) If you like to explore the linear relationship between two alphanumeric variables, you may
use simple linear regression
(b) If you like to compare more than two population means, you may use Analysis of Variance
(c) If you like to compare two variances, you may use F-test
(d) If you like to compare more than two population variances, you may use Bartlett’s test if the data
is normally distributed
A random sample of 5726 telephone numbers from a certain region taken in March 1992 yielded 1105 that
were unlisted, and 1 year later a sample of 5384 yielded 980 unlisted numbers. Let $p_1$ be the true proportion
of unlisted numbers in March 1992, $p_2$ be the true proportion of unlisted numbers in March 1993, X be the
number of unlisted numbers in the corresponding sample, and n be the sample size. The following are the results
from MINITAB.
Sample      X      n   Sample p
1        1105   5726   0.192979
2         980   5384   0.182021
Estimate for p(1) - p(2): 0.0109586
95% CI for p(1) - p(2): (-0.00355738, 0.0254746)
Test for p(1) - p(2) = 0 (vs not = 0): Z = 1.48 P-Value = 0.139
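The estimate, confidence interval, and Z statistic above can be recomputed from the two counts. This sketch assumes the unpooled standard error is used for both the interval and the test, which matches the printed values; the normal tail probability comes from `math.erfc`, and 1.959964 is the standard normal 97.5th percentile.

```python
import math

x1, n1 = 1105, 5726   # March 1992: unlisted count, sample size
x2, n2 = 980, 5384    # March 1993: unlisted count, sample size

p1_hat = x1 / n1
p2_hat = x2 / n2
diff = p1_hat - p2_hat

# Unpooled standard error of the difference in sample proportions.
se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

z = diff / se
p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * P(Z > |z|)

z_crit = 1.959964
ci_low, ci_high = diff - z_crit * se, diff + z_crit * se

print(f"Estimate = {diff:.7f}")
print(f"95% CI = ({ci_low:.6f}, {ci_high:.6f})")
print(f"Z = {z:.2f}, two-sided P-value = {p_two_sided:.3f}")
```

For a one-sided alternative in the direction of the observed difference, the tail probability is half of the two-sided value.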
Answer the following 4 questions using this information.
22. What is the point estimate for p1-p2?
(a) 0.005
(b) 0.011
(c) 0.139
(d) 0.182
(e) 0.193
23. Is there a difference in true proportions of unlisted numbers between the two years using $\alpha = 0.05$?
(a) Since the P-value $\le 0.05$ on the corresponding test, there is a difference in true proportions of
unlisted numbers between the two years
(b) Since the P-value $\le 0.05$ on the corresponding test, there is no difference in true proportions of
unlisted numbers between the two years
(c) Since the P-value $> 0.05$ on the corresponding test, there is no difference in true proportions
of unlisted numbers between the two years
(d) Since 0 does not fall in the corresponding interval, there is no difference in true proportions of
unlisted numbers between the two years
(e) Since 0 falls in the corresponding interval, there is a difference in true proportions of unlisted
numbers between the two years
24. Which of the following would be the P-value if you were testing whether there were more unlisted numbers in
March 1992 compared with March 1993?
(a) 0.0348
(b) 0.0695
(c) 0.139
(d) 0.278
25. If I claim that the true proportion of the unlisted numbers in March 1992 is higher than the true
proportion of the unlisted numbers in March 1993, which of the following is the valid alternative
hypothesis?
(a) 1   2
p1  p2
(c) p1  p2
(d) 1   2
(b)