MAS Applied Exam Sample Solutions, May 2014
1. (a) H0: μ1 = μ2
Ha: μ1 ≠ μ2
(b) Based on the box plots, there is no evidence that the population variances differ, so it is probably safe to use the
pooled sample variance and the exact t-test.
$$s_p^2 = \frac{(23-1)(0.124) + (34-1)(0.130)}{23 + 34 - 2} = 0.1276$$

$$t = \frac{5.176 - 5.124}{\sqrt{\frac{0.1276}{23} + \frac{0.1276}{34}}} = 0.539$$
Since |0.539| is not greater than t0.025, 55 = 2.005, we cannot reject H0. The two-tailed P-value is 2(0.296) = 0.592.
We cannot conclude that the mean length of the lake frogs differs from the mean length of the pond frogs.
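For those checking the arithmetic in software, here is a minimal Python sketch of the same computation from the summary statistics (the group labels are assumed from context; scipy is used only for the t tail probability):

```python
from math import sqrt
from scipy import stats

# Summary statistics quoted above (0.124 and 0.130 are sample variances)
n1, mean1, var1 = 23, 5.176, 0.124   # first sample (lake frogs, assumed)
n2, mean2, var2 = 34, 5.124, 0.130   # second sample (pond frogs, assumed)

# Pooled sample variance and the two-sample t statistic
sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
t = (mean1 - mean2) / sqrt(sp2 / n1 + sp2 / n2)

# Two-tailed P-value on n1 + n2 - 2 = 55 degrees of freedom
p = 2 * stats.t.sf(abs(t), df=n1 + n2 - 2)
print(f"sp^2 = {sp2:.4f}, t = {t:.3f}, P = {p:.3f}")  # 0.1276, 0.539, 0.592
```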
(c) The significance level is set by the researcher to be the maximum allowable probability of a Type I error
(i.e., rejecting H0 when it is in fact true). In this case, we don’t want to make the error of concluding the mean
lengths differ, if in fact they are the same, so we set α to be relatively small (0.05), so we know ahead of time
that there is only a 5% chance of making this error.
(d) Omitting the outliers is not a good idea at all, unless there was some reason the measurements were wrong
(like faulty equipment). In this case, omitting the outliers could have caused a big discrepancy in the results,
since most of the omitted values for the pond sample were on the small side. A better approach, if the outliers
were a concern, would be to base the inference on robust statistics, such as medians or trimmed means.
(e) A nonparametric alternative would be the Wilcoxon Rank-Sum test (aka Mann-Whitney test). The
necessary conditions for such a test are that the two independent samples come from populations that are
continuous and identical except for a possible difference in location (center). [Note: The samples could come
from different distributions but the null hypothesis would then be H0: P(X>Y) = P(X<Y).]
2. (a) Using the central limit theorem would give the following formula for the confidence interval:

$$(\hat{\pi}_1 - \hat{\pi}_2) \pm z_{\alpha/2} \sqrt{\frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}}$$
and traditionally $\hat{\pi}_1$ and $\hat{\pi}_2$ would be substituted for $\pi_1$ and $\pi_2$ (called a Wald interval). The Agresti-Caffo
solution is generally accepted as being better. It would use the following values

$$\tilde{y}_1 = y_1 + 0.25 z_{\alpha/2}^2, \quad \tilde{y}_2 = y_2 + 0.25 z_{\alpha/2}^2, \quad \tilde{n}_1 = n_1 + 0.5 z_{\alpha/2}^2, \quad \tilde{n}_2 = n_2 + 0.5 z_{\alpha/2}^2$$
in place of the $\hat{\pi}_i$ and $n_i$. Adding 1 to each $y_i$ and 2 to each $n_i$ is a common substitution as well and is
approximately the same for a 95% interval. (There is no penalty on this exam for using the traditional method,
but it will give an extra set of assumptions to check in (b).)
$$z_{\alpha/2} = 1.645, \quad \tilde{y}_1 = 116.6765, \quad \tilde{y}_2 = 79.6765, \quad \tilde{n}_1 = 151.353, \quad \tilde{n}_2 = 151.353$$

$$\tilde{\pi}_1 = 116.6765 / 151.353 = 0.7709, \quad \tilde{\pi}_2 = 79.6765 / 151.353 = 0.5264$$

$$(0.7709 - 0.5264) \pm 1.645 \sqrt{\frac{0.7709(0.2291)}{151.353} + \frac{0.5264(0.4736)}{151.353}} = 0.2445 \pm 0.0873 = (0.1572, 0.3318)$$
The traditional interval was very similar, (0.1591, 0.3342).
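A short Python sketch of both intervals; the raw counts y1 = 116, y2 = 79 and n1 = n2 = 150 are inferred from the adjusted values above:

```python
from math import sqrt
from scipy import stats

y1, n1 = 116, 150   # counts inferred from the adjusted values above
y2, n2 = 79, 150
z = stats.norm.ppf(0.95)  # z_{alpha/2} for a 90% interval, about 1.645

def diff_interval(y1, n1, y2, n2, z):
    p1, p2 = y1 / n1, y2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2 - z * se, p1 - p2 + z * se)

# Wald: plug the raw sample proportions into the CLT formula
print(diff_interval(y1, n1, y2, n2, z))

# Agresti-Caffo: adjust the counts, then apply the same formula
print(diff_interval(y1 + 0.25 * z**2, n1 + 0.5 * z**2,
                    y2 + 0.25 * z**2, n2 + 0.5 * z**2, z))
```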
(b) Any form of this interval assumes we have two independent binomial experiments. Given that we are not
sampling with replacement, the binomial part can at best be approximate (it would really be hypergeometric). To
make the approximation work we need the population to be much larger (say 100 times) than the sample. It
seems doubtful that there are 15,000 garage users and 15,000 parking lot users among the faculty! That the
two samples are independent seems like it should approximately hold (especially if the populations were large
enough for the first part).
The Agresti-Caffo interval has no other assumptions. The Wald interval would require that both $n_i \hat{\pi}_i$ and
$n_i(1 - \hat{\pi}_i)$ are at least 5.
(c) We are 90% confident that the difference in the proportion of those favoring the new offering between the
current garage users and current lot users is between 15.7% and 33.2% (so the garage users like it a lot more).
The 90% confidence means that approximately 90% of intervals constructed in this way will contain the true
difference in proportions. This particular interval either does, or it doesn’t.
(d) Hypothesis tests are always constructed under the assumption that the null hypothesis is true. Here, the null
hypothesis is that the two population proportions are equal, so we should plug the same estimated value in for
each. (So we pretend it is one large sample from a single population and estimate p for that).
(e) This would be a test for homogeneity. The table would be three parking groups by two opinions. A 3x2
table has (3-1)x(2-1)=2 degrees of freedom.
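As an illustration of the degrees of freedom (the cell counts below are invented; only the 3x2 shape matters):

```python
import numpy as np
from scipy import stats

# Hypothetical 3x2 table: rows = parking groups, columns = favor / oppose
table = np.array([[116, 34],
                  [ 79, 71],
                  [ 60, 90]])

chi2, p, dof, expected = stats.chi2_contingency(table)
print(dof)  # (3-1)*(2-1) = 2
```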
3. (a) Give the formal model being fit in this simple linear regression – including the model equation,
identification of any symbols used, and all necessary assumptions.
𝑦𝑖 = 𝛽0 + 𝛽1 𝑥𝑖 + 𝜀𝑖
Where 𝑦𝑖 is the average FCAT mathematics score in school i, 𝛽0 is the y-intercept, 𝛽1 is the slope, 𝑥𝑖 is the
percentage of students below poverty level in school i, and 𝜀𝑖 is the error for school i. One way of writing the
assumptions is that the 𝜀𝑖 are iid 𝑁(0, 𝜎𝜀²), where 𝜎𝜀² is a common error variance.
b) That the mean of the errors is zero can be examined using the residual vs. predicted (or fitted) plot. Moving
from left to right the errors seem to be fairly vertically symmetric around zero, so this assumption seems
reasonable.
The assumption of equal variances is checked from the same graph. Moving from left to right, the vertical
spread seems to decrease (a reversed fan shape), so this assumption seems violated.
We cheat in checking the normality of the errors by using a single q-q plot of the residuals (instead of one for
each x value or even range of x values). It is slightly heavy tailed in this case, but it is hard to tell with a small
sample. I can see an analyst going either way on this, but it might go away if the heteroscedasticity is dealt
with.
For checking independence, there is no indication that this is a random sample and there are many ways in
which the errors could be related (similar regions of the state, similar ethnic make-ups, etc…). Plots of
residuals against a number of variables that could relate the schools could be reassuring if they showed no
patterns.
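A minimal sketch of these diagnostic plots, with simulated stand-ins for the actual FCAT data:

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Simulated stand-ins for the FCAT variables
rng = np.random.default_rng(0)
poverty = rng.uniform(10, 90, 60)
score = 180 - 0.3 * poverty + rng.normal(0, 5, 60)

fit = sm.OLS(score, sm.add_constant(poverty)).fit()

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
ax1.scatter(fit.fittedvalues, fit.resid)   # check center (zero) and vertical spread
ax1.axhline(0, color="gray")
ax1.set(xlabel="fitted values", ylabel="residuals")
sm.qqplot(fit.resid, line="45", fit=True, ax=ax2)  # check normality of residuals
plt.tight_layout()
plt.show()
```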
(c) There is a statistically significant (p-value < 0.000003) relationship between the percent of students below the
poverty level and the average math score in schools at the 3rd grade level. Each additional percent below
poverty decreases the estimated expected average math score for the school by 0.30544 (the slope estimate).
This relationship is fairly strong, with the poverty rate explaining an estimated 67.31% (R-squared) of the
variability in average math scores between schools.
(d) The confidence interval for 𝐸(𝑦𝑛+1) gives us a range that we can be 95% confident the true underlying
regression line passes through at the corresponding 𝑥𝑛+1 value. The prediction interval is wider: given a value 𝑥𝑛+1,
we are 95% confident that a yet unobserved corresponding value 𝑦𝑛+1 will fall in that range.
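The contrast can be seen numerically in statsmodels (simulated data again; the obs_ci columns, the prediction interval, come out wider than the mean_ci columns):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
poverty = rng.uniform(10, 90, 60)                    # simulated x values
score = 180 - 0.3 * poverty + rng.normal(0, 5, 60)   # simulated y values

fit = sm.OLS(score, sm.add_constant(poverty)).fit()

x_new = np.array([[1.0, 50.0]])   # constant term plus the new x value
frame = fit.get_prediction(x_new).summary_frame(alpha=0.05)
print(frame[["mean_ci_lower", "mean_ci_upper"]])  # 95% CI for E(y) at x = 50
print(frame[["obs_ci_lower", "obs_ci_upper"]])    # 95% PI for a new y (wider)
```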
4. An experiment is conducted to compare the effectiveness of five different computer screen overlays for the
reduction of eye strain. The five covers were “clear”, “clear-anti-glare”, “tinted”, “tinted anti-glare”, and
“softening”. Ninety volunteers were gathered with approximately the same uncorrected vision, amount of time
spent on-line, and ambient light in their offices. They were randomly assigned an overlay (18 to each) to use
for an entire work week, and a measure of cumulative total eye strain (scale from 0=low to 50=high) was
collected for each subject.
(a) H0: μclear = μclear-anti-glare = μtinted = μtinted-anti-glare = μsoftening
vs. HA: at least one mean is different from the others
where the μ's are the mean total eye-strain for all users like the ones in our experiment.
(b) The omnibus test only tells us that the mean total eye-strain for at least one of the covers differs from the
mean total eye-strain of at least one other. It doesn’t tell us which ones are different or how they differ.
(c) L = ½μclear − ½μclear-anti-glare + ½μtinted − ½μtinted-anti-glare + 0·μsoftening
H0: L = 0, or (μclear + μtinted)/2 = (μclear-anti-glare + μtinted-anti-glare)/2
HA: L ≠ 0, or (μclear + μtinted)/2 ≠ (μclear-anti-glare + μtinted-anti-glare)/2
(d) We would determine which pairs of means were statistically significantly different from each other (so 10
different comparisons in this experiment) while maintaining a family-wise error rate at the specified level.
Typically this is displayed by showing the means from largest to smallest and then identifying which are not
significantly different (so, as a byproduct, it also shows us which are not statistically different from the largest
mean and which are not significantly different from the smallest mean).
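A sketch of Tukey's HSD in statsmodels (the eye-strain scores are simulated since the exam provides no raw data):

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(2)
overlay = np.repeat(["clear", "clear-AG", "tinted", "tinted-AG", "soft"], 18)
strain = rng.normal(25, 5, size=90)   # simulated eye-strain scores

# All 10 pairwise comparisons at a 5% family-wise error rate
print(pairwise_tukeyhsd(strain, overlay, alpha=0.05))
```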
(e) We could select one of the screen cover types as a control, and then see which of the other means were
statistically significantly different from it (so 4 different comparisons) while maintaining a family-wise error
rate at the specified level. As part of the process it would indicate which had a statistically significantly larger
mean and which were statistically significantly smaller. (As it does fewer comparisons, it would have more
power than Tukey’s HSD for any false null hypotheses that were common between the two.)
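A comparable sketch of Dunnett's procedure using scipy.stats.dunnett (available in SciPy 1.11 and later; the data, and the choice of the clear overlay as control, are invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
control = rng.normal(25, 5, 18)   # the "clear" overlay treated as control
others = [rng.normal(m, 5, 18) for m in (24, 23, 26, 22)]  # other 4 overlays

res = stats.dunnett(*others, control=control)  # 4 comparisons with the control
print(res.pvalue)  # family-wise adjusted p-values, one per comparison
```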
5. (a) The Two-Factor ANOVA model assumes the responses at each factor level combination have a normal
distribution. In this case, the assumption is not strictly met since the data are integer counts, but (especially
since the counts are relatively large) they still may be approximately normal. A natural distribution for the
counts would be a Poisson, but note that as the mean of the Poisson becomes large, the Poisson distribution
resembles the normal distribution. So the normality assumption in this analysis may in fact be fine to use.
(b) Something like an interaction plot of the mean number of gaskets (vertical axis, roughly 50 to 85) against
material (C, P, R), with a separate labeled line for each machine (I, II); or, equivalently, the same means plotted
against machine (I, II) with a separate labeled line for each material (P, R, C).
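A sketch of the first version in matplotlib; the cell means are invented to match the pattern described in (c) (machine I better for plastic and rubber, machine II better for cork):

```python
import matplotlib.pyplot as plt

materials = ["C", "P", "R"]
cell_means = {"I": [62, 82, 78], "II": [70, 73, 68]}  # hypothetical cell means

for machine, means in cell_means.items():
    plt.plot(materials, means, marker="o", label=f"machine {machine}")
plt.xlabel("material")
plt.ylabel("mean of gaskets")
plt.legend()
plt.show()
```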
(c) The analyst has ignored the fact that there is significant interaction here. He should not compare the
materials alone, nor should he compare the machines alone. A full conclusion would demand some inferential
comparisons among machine-material combinations, whether post hoc multiple comparisons or pre-planned
comparisons (contrasts). At first glance, it appears to be the case that machine I is better if producing Plastic or
Rubber gaskets, but machine II is better if producing Cork gaskets.
6. (a) ŷ = −39.920 + 0.716x1 + 1.295x2 − 0.152x3
The predicted stack loss for a plant having air flow 60, water temperature 25, and acid concentration 90 is:
ŷ = −39.920 + 0.716(60) + 1.295(25) − 0.152(90) = 21.735.
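As a quick arithmetic check of that prediction:

```python
# Plugging the fitted coefficients into the regression equation
b0, b1, b2, b3 = -39.920, 0.716, 1.295, -0.152
y_hat = b0 + b1 * 60 + b2 * 25 + b3 * 90
print(round(y_hat, 3))  # 21.735
```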
(b) H0: β1 = β2 = β3 = 0
Ha: Not all βi = 0, for i = 1, 2, 3
For this data set, we would reject H0, because of the very large ANOVA F-value of 59.9. (P-value < 0.0001).
(c) Multicollinearity refers to correlation between the values of the predictor variables in the data set. In this
data set, there is not a problem with multicollinearity since all of the Variance Inflation Factors (VIFs) are
relatively small (less than 3). VIFs greater than 5 or 10 or so would indicate problems with multicollinearity.
[Note that it is still possible to suffer the effects of multicollinearity with smaller VIFs.]
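A minimal sketch of the VIF computation in statsmodels (the design matrix below is a random stand-in for the actual air flow, water temperature, and acid concentration columns):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(4)
X = sm.add_constant(rng.normal(size=(21, 3)))  # stand-in design matrix: n=21, k=3

# One VIF per predictor (column 0 is the constant, so skip it)
for j in range(1, X.shape[1]):
    print(variance_inflation_factor(X, j))
```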
(d) Using the rule of thumb that an observation is influential if its |DFFITS| > [2(3 + 1)/21]^(1/2), we see that the
observations with |DFFITS| > 0.617 are 1, 3, 4, and 21. Using the alternate rule of thumb that calls an
observation influential if its |DFFITS| > 1, only observation 21 (with |DFFITS| = 2.1) is influential.
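The first cutoff is easy to verify:

```python
# Rule-of-thumb DFFITS cutoff [2(k+1)/n]^(1/2) quoted above
from math import sqrt

k, n = 3, 21                    # predictors and observations
cutoff = sqrt(2 * (k + 1) / n)
print(round(cutoff, 3))         # 0.617
```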
(e) H0: β2 = β3 = 0 (given that β1 is in the model)
Ha: Not both βi = 0, for i = 2, 3
(f) Based on the SAS output for the F-test, since F = 6.67 > 3.59 = F0.05, 2, 17, we reject H0 and conclude that the
model with only air flow may NOT be sufficient. (P-value = 0.0073)
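A sketch of this nested-model F-test in statsmodels (the data frame is simulated; the column names are placeholders for the real stack loss variables):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Simulated stand-in for the 21-plant data set
rng = np.random.default_rng(5)
df = pd.DataFrame({
    "airflow": rng.uniform(50, 80, 21),
    "watertemp": rng.uniform(17, 27, 21),
    "acid": rng.uniform(72, 93, 21),
})
df["stackloss"] = (-39.92 + 0.716 * df["airflow"] + 1.295 * df["watertemp"]
                   - 0.152 * df["acid"] + rng.normal(0, 3, 21))

reduced = smf.ols("stackloss ~ airflow", data=df).fit()
full = smf.ols("stackloss ~ airflow + watertemp + acid", data=df).fit()
print(anova_lm(reduced, full))  # F and p-value for adding watertemp and acid
```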
(g) H0: β3 = 0 (given that β1 and β2 are in the model)
Ha: β3 ≠ 0
This can be done with a t-test. From the SAS output, t = -0.97, and since |t| is not greater than t0.025, 17 = 2.11,
we fail to reject H0. The model with only air flow and water temperature is sufficient. (P-value = 0.344)