EXAM 5 – FORM A
STAT 211

FALL03
Possible critical values that may be needed are $t_{0.025;4} = 2.776$, $t_{0.05;4} = 2.132$, $F_{0.05;1,4} = 7.71$, $F_{0.05;4,25} = 2.76$, $z_{0.05} = 1.645$, $z_{0.025} = 1.96$, $z_{0.3015} = 0.52$, and $z_{0.0934} = 1.32$. Do not forget that the subscripts (0.05, 0.025, and so on) of the critical values are the areas to the right.
Consider the following three data sets, in which the variables of interest are x = commuting distance and y = commuting time.
All three data sets satisfy the normality assumption on the commuting time and on the errors.
x1   y1   x2    y2   x3   y3
15   42    5    16    5    8
16   35   10    32   10   16
17   45   15    44   15   22
18   42   20    45   20   23
19   49   25    63   25   31
20   46   50   115   50   60
We will use simple linear regression and regress y on x for each data set to estimate $Y = \beta_0 + \beta_1 x + e$, where n = 6 observations are included in each set. The parameters $\beta_0$ (true intercept) and $\beta_1$ (true slope) are constants whose "true" values are unknown and must be estimated from the data. The uncontrolled random error, $e$, associated with $Y$ is normally and independently distributed with mean 0 and constant variance $\sigma^2$.
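As a cross-check on the three outputs that follow, here is a minimal Python sketch of ordinary least squares computed directly from the data table. The helper name `fit_line` is hypothetical (it is not part of the exam or of Minitab); it uses the textbook formulas $\hat\beta_1 = S_{xy}/S_{xx}$, $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$ and $s = \sqrt{SSE/(n-2)}$.

```python
from math import sqrt

def fit_line(x, y):
    """Hypothetical helper: least-squares fit of y = b0 + b1*x.
    Returns (b0, b1, s, r_sq)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)   # total sum of squares (SST)
    b1 = sxy / sxx                             # slope estimate
    b0 = y_bar - b1 * x_bar                    # intercept estimate
    sse = syy - b1 * sxy                       # residual (error) sum of squares
    s = sqrt(sse / (n - 2))                    # estimate of sigma, "S" in the output
    r_sq = 1 - sse / syy                       # coefficient of determination
    return b0, b1, s, r_sq

# Data copied from the table above
x1 = [15, 16, 17, 18, 19, 20]; y1 = [42, 35, 45, 42, 49, 46]
x2 = [5, 10, 15, 20, 25, 50];  y2 = [16, 32, 44, 45, 63, 115]
x3 = [5, 10, 15, 20, 25, 50];  y3 = [8, 16, 22, 23, 31, 60]

for name, (x, y) in {"y1": (x1, y1), "y2": (x2, y2), "y3": (x3, y3)}.items():
    b0, b1, s, r_sq = fit_line(x, y)
    print(f"{name}: b0={b0:.4f} b1={b1:.5f} S={s:.3f} R-Sq={100 * r_sq:.1f}%")
```

The printed coefficients, S and R-Sq should match the Minitab outputs up to rounding.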
Regression Analysis: y1 versus x1

Predictor      Coef   SE Coef      T      P
Constant      13.67     16.96   0.81  0.465
x1           1.6857    0.9644   1.75  0.155

S = 4.034   R-Sq = 43.3%   R-Sq(adj) = 29.1%

Analysis of Variance
Source           DF       SS      MS      F      P
Regression        1    49.73   49.73   3.06  0.155
Residual Error    4    65.10   16.28
Total             5   114.83
Regression Analysis: y2 versus x2

Predictor      Coef   SE Coef      T      P
Constant      7.869     2.876   2.74  0.052
x2           2.1423    0.1132  18.93  0.000

S = 4.034   R-Sq = 98.9%   R-Sq(adj) = 98.6%

Analysis of Variance
Source           DF       SS       MS       F      P
Regression        1   5832.4   5832.4  358.36  0.000
Residual Error    4     65.1     16.3
Total             5   5897.5
Regression Analysis: y3 versus x3

y3 = 3.197 + 1.12656 x3

Predictor      Coef   SE Coef      T      P
Constant      3.197     1.356   2.36  0.078
x3          1.12656   0.05337  21.11  0.000

S = 1.903   R-Sq = 99.1%   R-Sq(adj) = 98.9%

Analysis of Variance
Source           DF       SS       MS       F      P
Regression        1   1612.9   1612.9  445.58  0.000
Residual Error    4     14.5      3.6
Total             5   1627.3

Obs     x3       y3   Predicted y3   Residual
  1    5.0    8.000          8.830     -0.830
  2   10.0   16.000         14.462      1.538
  3   15.0   22.000         20.095      1.905
  4   20.0   23.000         25.728     -2.728
  5   25.0   31.000         31.361     -0.361
  6   50.0   60.000         59.525      0.475
Answer the following 11 questions using this information.
1) In which of these three data sets would simple linear regression be least effective? (Use α = 0.05.)
(a) 1
Because of the very low R-Sq (43.3%) and failing to reject $H_0: \beta_1 = 0$ (P-value = 0.155 > 0.05), i.e., no significant relationship.
(b) 2
(c) 3
(d) both 1 and 2
(e) both 2 and 3
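The slope test behind this answer (and behind question 6) can be checked in a few lines. This sketch only divides each printed Coef by its SE Coef and compares |t| with $t_{0.025;4} = 2.776$; it assumes the printed, rounded values are accurate enough for the comparison.

```python
T_CRIT = 2.776  # t_{0.025;4}: two-sided test at alpha = 0.05 with n - 2 = 4 df

# dataset: (slope estimate, standard error), copied from the outputs
slopes = {
    "1": (1.6857, 0.9644),
    "2": (2.1423, 0.1132),
    "3": (1.12656, 0.05337),
}

def slope_test(b1, se, t_crit=T_CRIT):
    """Return the t ratio and whether H0: beta1 = 0 is rejected (two-sided)."""
    t = b1 / se
    return t, abs(t) > t_crit

for name, (b1, se) in slopes.items():
    t, reject = slope_test(b1, se)
    print(f"data set {name}: t = {t:.2f}, reject H0: {reject}")
```

Only data set 1 fails to reject, which is why it is the case where the regression is least effective.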
2) What change in y3 can be expected when x3 decreases by 100 units?
(a) -319.7
(b) -112.656
= -100(slope) = -100(1.12656) = -112.656; the change is negative because x3 decreases.
(c) -1.12656
(d) 1.12656
(e) 112.656
3) I have not attached the output for Pearson's correlation, but you have enough information to compute it from the
outputs. Which of the following is the Pearson correlation coefficient between x3 and y3?
(a) 0.982
(b) 0.991
(c) 0.996
=r= R  0.991 only in simple linear regression. It is positive because of positive slope
(d) I do not have enough information on any of those outputs to compute it
2
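A small sketch of both routes to r for x3 and y3: from the ANOVA table as $\sqrt{SSR/SST}$ carrying the sign of the slope, and, as a cross-check, Pearson's formula applied to the data table. The function name `pearson` is a hypothetical helper.

```python
from math import copysign, sqrt

# Route 1: from the y3-on-x3 output (SS Regression, SS Total, slope)
ssr, sst, slope = 1612.9, 1627.3, 1.12656
r_from_anova = copysign(sqrt(ssr / sst), slope)   # r takes the sign of the slope

# Route 2: Pearson's formula applied to the raw data
x3 = [5, 10, 15, 20, 25, 50]
y3 = [8, 16, 22, 23, 31, 60]

def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

print(round(r_from_anova, 3), round(pearson(x3, y3), 3))
```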
4) Which of the following is the point estimate for the constant standard deviation in regressing y3 on x3?
(a) 1.897
(b) 1.903
(c) 3.6
(d) 14.5
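$s = \sqrt{MSE} = \sqrt{SSE/(n-2)} \approx 1.903$, reported as S in the output ($\sqrt{3.6} = 1.897$ only because the printed MS is rounded).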
5) When you look at the output of regressing y3 on x3, which of the following can you say about the intercept of the
regression equation?
(a) The equation definitely needs an intercept using 0.05 significance
(b) Testing the null hypothesis of no intercept, we fail to reject the null hypothesis using 0.05 significance. We
may want to fit without the intercept and see if it is a better fit
since the P-value=0.078 > =0.05, we fail to reject H0
(c) Testing the null hypothesis of no intercept, we reject the null hypothesis using 0.05 significance. We do not need to
fit without the intercept and see if it is a better fit
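The same t-ratio check applies to the intercept. This sketch uses only the printed Constant row and $t_{0.025;4} = 2.776$, which is equivalent to comparing the printed P-value with α.

```python
T_CRIT = 2.776            # t_{0.025;4}
b0, se_b0 = 3.197, 1.356  # Constant row of the y3-on-x3 output

t0 = b0 / se_b0
reject_h0 = abs(t0) > T_CRIT   # two-sided test of H0: beta0 = 0
print(f"t = {t0:.2f}, reject H0: beta0 = 0? {reject_h0}")
```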
6) Which of the following is a possible relationship between x1 and y1? (Use α=0.05 and base your decision on testing
the slope)
(a) They are positively related
(b) They are negatively related
(c) There is no apparent relationship
the P-value=0.155 > α=0.05, then fail to reject H 0 : 1  0
7) What is the corresponding residual in the regression analysis of regressing y3 on x3 when x3=10?
(a) -6
(b) 1.54
When x3 = 10, y3 = 16 and predicted y3 = 14.462, so residual = y3 - predicted y3 = 16 - 14.462 = 1.538 ≈ 1.54.
(c) There is no way to compute it with the given information
8) Suppose =1.93 and consider x3=10, what is P(Y3>17)?
(a) 0.0934
P(Y3 > 17) = P(Z > (17 - 14.462)/1.93) = P(Z > 1.32) = 0.0934
(b) 0.3015
(c) 0.5200
(d) 0.6985
(e) 0.9066
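A sketch of the standardization step, using `math.erfc` for the standard normal tail, $P(Z > z) = \tfrac12\,\mathrm{erfc}(z/\sqrt2)$. As with a z table, z is rounded to two decimals before the tail area is computed; the helper name `upper_tail` is illustrative.

```python
from math import erfc, sqrt

def upper_tail(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * erfc(z / sqrt(2))

sigma = 1.93          # value given in the question
mean_y = 14.462       # fitted mean of Y3 at x3 = 10 (Predicted y3, Obs 2)
z = (17 - mean_y) / sigma

print(f"z = {z:.2f}, P(Y3 > 17) = {upper_tail(round(z, 2)):.4f}")
```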
9) Which of the following is the 95% confidence interval for the expected change in y3 associated with a 1 unit increase
in x3?
(a) (0.9784 , 1.2747)
C.I for the slope: 1.12656 t0.025;4 (0.05337) where t0.025;4 =2.776
(b) (1.0118 , 1.2394)
(c) (1.0219 , 1.2312)
(d) I do not have enough information to compute it.
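The interval arithmetic, spelled out as a short sketch using the rounded printed values and $t_{0.025;4} = 2.776$:

```python
T_CRIT = 2.776                 # t_{0.025;4}
b1, se_b1 = 1.12656, 0.05337   # x3 row of the y3-on-x3 output

margin = T_CRIT * se_b1        # half-width of the 95% interval
lower, upper = b1 - margin, b1 + margin
print(f"({lower:.4f}, {upper:.4f})")
```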
10) Would you feel comfortable predicting y3 when x3=92 using the fitted regression equation?
(a) Of course, the predicted y3 is 119.6322
(b) Not at all
x3 = 92 does not fall within the observed range of x3 (5 to 50), so this would be extrapolation.
11) How much of the total variation for y2 is explained by the model relationship with x2?
(a) 2.1423%
(b) 16.3%
(c) 18.93%
(d) 98.9%
By definition this is R-Sq = SSR/SST = 5832.4/5897.5 = 98.9%.
(e) Cannot be determined with the given information
Numerous factors contribute to the smooth running of an electric motor. In particular, it is desirable to keep motor noise
and vibration to a minimum. To study the effect that the brand of bearing has on motor vibration, five different motor
bearing brands (each with true mean $\mu_i = \mu + \alpha_i$, where $\mu$ is the overall mean and $\alpha_i$ is the $i$th treatment effect,
$i = 1, 2, 3, 4, 5$) were examined by installing each type of bearing on a different random sample of six motors. The amount of
motor vibration ($X_{ij}$, in microns) was recorded while each of the 30 motors was running. We will model these data by
single-factor ANOVA, where $X_{ij} = \mu + \alpha_i + \varepsilon_{ij}$ and the $\varepsilon_{ij}$ are errors that are normally distributed with mean 0 and
constant variance $\sigma^2$.
Analysis of Variance for vibration
Source    DF       SS      MS      F      P
Brand      4   30.855   7.714   8.44  0.000
Error     25   22.838   0.914
Total     29   53.694

Level   N     Mean   StDev
1       6   13.683   1.194
2       6   15.950   1.167
3       6   13.667   0.816
4       6   14.733   0.940
5       6   13.083   0.479

Pooled StDev = 0.956

[Chart: individual 95% CIs for each brand mean, based on pooled StDev; axis marked at 13.5, 15.0, 16.5]

Tukey's pairwise comparisons
Family error rate = 0.0500
Individual error rate = 0.00706
Critical value = 4.15

Intervals for (column level mean) - (row level mean)

             1                    2                    3                    4
2   (-3.8860, -0.6473)
3   (-1.6027,  1.6360)   ( 0.6640, 3.9027)
4   (-2.6693,  0.5693)   (-0.4027, 2.8360)   (-2.6860, 0.5527)
5   (-1.0193,  2.2193)   ( 1.2473, 4.4860)   (-1.0360, 2.2027)   (0.0307, 3.2693)

Bartlett's Test (normal distribution)
Test Statistic: 4.097
P-Value: 0.393
Answer the following 6 questions using this information.
12) Are there any significant differences between the true means of those five brands using 0.05 significance?
(a) Yes
Since the P-value=0 < =0.05 then reject H 0 : 1   2   3   4   5
(b) No
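The F statistic in the ANOVA table can be reproduced from the SS and DF columns. This sketch compares it with $F_{0.05;4,25} = 2.76$ from the list of critical values, which leads to the same decision as the printed P-value.

```python
F_CRIT = 2.76                    # F_{0.05;4,25}
ss_brand, df_brand = 30.855, 4
ss_error, df_error = 22.838, 25

ms_brand = ss_brand / df_brand   # 7.714 in the output
mse = ss_error / df_error        # 0.914 in the output
f_stat = ms_brand / mse
print(f"F = {f_stat:.2f}, reject H0: {f_stat > F_CRIT}")
```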
13) Look at the Tukey’s pairwise comparisons and tell me which of the following is the right conclusion using =0.05?
(a) I do not need to look at the Tukey’s pairwise comparisons because there are no differences between the true means
of those 5 brands.
(b) There are no significant differences between brand 1 and each of the others.
(c) There are no significant differences between brand 2 and each of the others.
(d) There are significant differences between brand 3 and brand 4
(e) There are significant differences between brand 4 and brand 5.
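The Tukey table can be rebuilt approximately from the summary statistics, which makes this question mechanical: a pair differs significantly exactly when its interval excludes 0. This sketch assumes equal group sizes (n = 6), so each interval is $(\bar x_i - \bar x_j) \pm q\sqrt{MSE/n}$ with the printed critical value q = 4.15. Because MSE and q are rounded, the endpoints agree with the output only to about three decimals.

```python
from itertools import combinations
from math import sqrt

means = {1: 13.683, 2: 15.950, 3: 13.667, 4: 14.733, 5: 13.083}
n_per_brand = 6
mse = 0.914
q_crit = 4.15                      # "Critical value" printed in the Tukey output

half_width = q_crit * sqrt(mse / n_per_brand)

significant = []
for i, j in combinations(sorted(means), 2):
    diff = means[i] - means[j]     # (column level mean) - (row level mean)
    lo, hi = diff - half_width, diff + half_width
    if lo > 0 or hi < 0:           # interval excludes 0
        significant.append((i, j))
    print(f"mu{i} - mu{j}: ({lo:.4f}, {hi:.4f})")

print("significant pairs:", significant)
```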
14) Which of the following is the point estimate of the 3rd brand's effect?
(a) -0.5402
(b) -0.5562
$\hat{\alpha}_3 = \bar{x}_{3\cdot} - \bar{x}_{\cdot\cdot} = 13.667 - 14.2232 = -0.5562$, where 14.2232 is the overall sample mean
(c) 13.667
(d) 14.223
(e) 14.733
15) Which of the following is the point estimate for the constant standard deviation in analysis of variance?
(a) 0.914
(b) 0.956
$\hat{\sigma} = \sqrt{MSE} = \sqrt{0.914} = 0.956$
(c) 7.714
(d) Unfortunately, the constant variance assumption is not satisfied at 0.05 significance, so the estimate cannot be computed.
16) Looking at Tukey's pairwise comparisons at the 0.05 significance level, I found that brand 3 is not different from brands 4
and 5. I decided to compare the true average of brand 3 with the combined average of brands 4 and 5.
Which of the following should be the null hypothesis to test this?
(a)  3   4 and  3   5
3   4 and  4   5
(c)  3   4   5
(d) 2 3   4   5
(b)
17) For the test in question 16, I have computed the P-value as 0.6186 where =0.05. Are there significant differences
between the true average of brand 3 and the combined average of brands 4 and 5?
(a) Yes
(b) No
Since the P-value > , fail to reject H0: no differences. Conclude that there are no differences
18) You will have taken 5 exams this semester in STAT 211. In one of the sections I am teaching, each of the first two
exams was taken by 100 students, and each of the remaining three exams by 90 students. We want to test for
significant differences among these 5 exams using single-factor ANOVA. Which of the following would
be the error degrees of freedom?
(a) 4
(b) 85
(c) 95
(d) 185
(e) 465
In total, 2(100) + 3(90) = 470 exam scores were recorded, so the error df = N - I = 470 - 5 = 465.
19) Let's say I have given you the boxplot and the normal probability plot of the residuals to check the normality
assumption for the analysis of variance. The boxplot indicates nonnormal residuals, whereas the normal probability plot indicates
normally distributed residuals. Which of these should you use to decide whether the normality assumption is satisfied?
(a) Only boxplot
(b) Only normal probability plot
20) Which of the following is not correct?
(a) If the data is normally distributed, we can use Bartlett's test to check the constant variance in different groups.
(b) If the data is continuous, we can use the Levene’s test to check the constant variance in different groups.
(c) If the data is not normally distributed, we can still use the Levene’s test to check the constant variance in
different groups for any other distribution.
Not correct: Levene's test applies only to continuous distributions, not to discrete distributions.
(d) If the data is normally distributed and the constant variance assumption is satisfied, we can use the F-test in
analysis of variance to check the same true mean in different groups.
(e) If the data is normally distributed we can use the t-test for the slope or F-test in regression to check the relationship
between two numeric random variables