STA 302/1001
Last Name (Print):
Fall 2015
Midterm A
First Name:
10/20/2015
Time Limit: 1h 40 min
Student Number:
This exam contains 8 pages (including this cover page) and 3 problems. Check to see if any pages
are missing. Enter all requested information on the top of this page.
• You may not use your books or notes on this exam.
Problem Points Score
• SLR stands for Simple Linear Regression.
• You are required to show your work on each problem on this exam. Please carry all possible precision [...], unless otherwise indicated.
• Do not write in the table to the right.
Some formulae:
SSR = Σ(Ŷi − Ȳ)^2 = b1^2 Σ(Xi − X̄)^2
1. (10 points) Multiple Choice
Answer the following questions by circling the best answer.
I. Which statement is not true about Maximum Likelihood Estimates (MLEs) in general?
A. They are unbiased
B. They are consistent
C. They are efficient
II. Race (0 = White, 1 = Asian, 2 = Other)
A. Categorical
B. Ordinal
C. Interval
D. Ratio
III.
C. Known random variable
D. Unknown random variable
IV. Suppose you have numerical variables {Y, X1, X2} in your global environment in R. Which of the following lines of R code will correctly fit the SLR model Y = β0 + β1 X1?
A. fit(Y ~ X1)
B. lm(Y ~ X1)
C. predict(Y ~ X1)
D. fit(df$Y ~ df$X1)
V. In SLR, when the coefficient of determination (R2 ) is high ...
A. The relationship is probably linear
D. The relationship between X and Y is a strong positive relationship
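For question IV, here is a minimal sketch in R of what fitting the SLR model looks like. The data are simulated purely for illustration: the seed, sample size, and the coefficient values 2 and 0.5 are assumptions, not part of the exam. `lm()` is the base R function for fitting linear models, while `predict()` operates on an already fitted model object.

```r
# Simulated data; the seed, n = 50 and the coefficients 2 and 0.5 are
# arbitrary illustration values, not taken from the exam.
set.seed(302)
X1 <- runif(50, 0, 10)
X2 <- runif(50, 0, 10)
Y  <- 2 + 0.5 * X1 + rnorm(50)

# lm() fits the linear model; the formula Y ~ X1 includes an intercept by default.
fit <- lm(Y ~ X1)
coef(fit)   # named vector with entries "(Intercept)" and "X1"

# predict() needs a fitted model object, not a bare formula.
predict(fit, newdata = data.frame(X1 = 5))
```

The object name `fit` is just a variable here; any name would do.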
Answer the following True or False questions by writing ’T’ or ’F’ in the blank
Do not write something ambiguous like ∓ or =!
F A Type II error is when we incorrectly reject the null hypothesis.
F In
T The Spearman correlation is simply Pearson's correlation of the rank-ordered values.
T Least squares and Maximum Likelihood Estimation give the same estimators for the slope and intercept.
T Prediction intervals are always wider than confidence intervals.
2. Consider the SLR model Yi = β0 + β1 Xi + εi. For these questions, you may take as given anything we proved in lecture about the ki, if you wish.
(a) (2 points) Show that ΣXi ei = 0
Solution: ΣXi ei = Σ(Xi − X̄)ei = Σ(Xi − X̄)(Yi − b0 − b1 Xi)
= Σ(Xi − X̄)(Yi − Ȳ) − b0 Σ(Xi − X̄) − b1 SSx
= SSXY − (SSXY / SSx) SSx = 0
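The identity ΣXi ei = 0 can also be checked numerically. The sketch below uses simulated data (every numeric value in it is an illustrative assumption) and confirms that both Σei and ΣXi ei vanish up to floating-point error for a least-squares fit with an intercept.

```r
# Simulated data; all numeric values here are illustrative assumptions.
set.seed(1)
X <- runif(30, 0, 10)
Y <- 1 + 2 * X + rnorm(30)

fit <- lm(Y ~ X)
e <- resid(fit)          # least-squares residuals e_i

sum_e  <- sum(e)         # zero up to floating-point error
sum_Xe <- sum(X * e)     # zero up to floating-point error
c(sum_e = sum_e, sum_Xe = sum_Xe)
```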
Solution: E[b1] = E[Σki Yi] = Σki E[Yi] = Σki (β0 + β1 Xi) = β0 Σki + β1 Σki Xi = β1, using Σki = 0 and Σki Xi = 1.
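The two facts about the ki that drive this result can be verified numerically. The sketch assumes the usual definition ki = (Xi − X̄)/SSx, which is not spelled out in the text above, and uses simulated data with arbitrary parameter values.

```r
# Simulated data; parameter values are illustrative assumptions.
set.seed(2)
X <- runif(25, 0, 10)
Y <- 3 + 0.7 * X + rnorm(25)

SSx <- sum((X - mean(X))^2)
k <- (X - mean(X)) / SSx     # assumed definition of the weights k_i

sum_k  <- sum(k)             # should be 0
sum_kX <- sum(k * X)         # should be 1
b1_k   <- sum(k * Y)         # slope written as a linear combination of the Y_i
b1_lm  <- unname(coef(lm(Y ~ X))["X"])
c(sum_k = sum_k, sum_kX = sum_kX, b1_k = b1_k, b1_lm = b1_lm)
```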
Consider a regression model Yi = β1 Xi + εi with all fixed constants Xi > 0 and the usual
G-M assumptions on the errors.
Solution: SSE = Σ(Yi − b1 Xi)^2
dSSE/db1 = −2 ΣXi (Yi − b1 Xi)
Setting the derivative to zero: ΣXi Yi − b1 ΣXi^2 = 0
b1 = ΣXi Yi / ΣXi^2
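As a quick check of the closed form b1 = ΣXi Yi / ΣXi^2, the sketch below compares it with the coefficient R reports for a no-intercept fit. The data are simulated and the true slope of 1.5 is an arbitrary assumption.

```r
# Simulated data; the true slope 1.5 is an illustrative assumption.
set.seed(3)
X <- runif(40, 1, 5)                   # positive X values, as in the question
Y <- 1.5 * X + rnorm(40)

b1_formula <- sum(X * Y) / sum(X^2)    # closed-form least-squares estimate
b1_lm <- unname(coef(lm(Y ~ 0 + X)))   # "0 +" removes the intercept
c(b1_formula = b1_formula, b1_lm = b1_lm)
```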
above is a consistent estimator of β1.
Recall that an estimator is consistent if it converges to its target parameter in the limit
as n → ∞. You may use the following hints without proof:
ΣXi^k / n → E[X^k] and ΣXi Yi / n → E[XY] by the Law of Large Numbers.
If two estimators each converge to a parameter, the ratio of those estimators converges to
the ratio of the parameters.
Solution: b1 = (ΣXi Yi / n) / (ΣXi^2 / n) → E[XY] / E[X^2] = E[X(β1 X + ε)] / E[X^2] = (β1 E[X^2] + E[Xε]) / E[X^2] = β1, since E[Xε] = 0.
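Consistency can be illustrated, though of course not proved, by simulation. In this sketch the true slope, the distribution of the X values, and the sample sizes are all assumptions chosen for illustration; the estimate should settle near the true slope as n grows.

```r
set.seed(4)
beta1 <- 1.5   # assumed true slope, for illustration only

# Regression-through-the-origin estimate for one simulated sample of size n.
b1_hat <- function(n) {
  X <- runif(n, 1, 5)
  Y <- beta1 * X + rnorm(n)
  sum(X * Y) / sum(X^2)
}

n_values  <- c(10, 100, 1000, 100000)
estimates <- sapply(n_values, b1_hat)
data.frame(n = n_values, b1 = estimates, abs_error = abs(estimates - beta1))
```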
3. (30 points) In a recent (1992) experiment, Hungarian food scientists measured the score
given by regular consumers as well as one given by experts, for a variety of fruit juices. We
will try to predict the expert score from the consumer score; some R output from a fitted SLR
model follows. You may assume all G-M assumptions are met.
> anova(fit)
> summary(fit)
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)        [D]     1.5308     [E] 1.38e-06
consumerScore   0.4685     0.1147   4.086      [F]
> apply(juice, 2, mean) # Means of Y and X
  expertScore consumerScore
     15.89231      13.10000
> apply(juice, 2, sd) # SDs of Y and X
(a) (9 points) Some values have been replaced with letters. Fill in those values. You do not
need to show any work for this part.
(A)
Solution:
(D)
(G)
(E)
(I)
(D) 9.7550
(E) 6.3725
(F) 0.0004
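The three surviving answers can be reproduced from the printed output alone. This sketch assumes 24 error degrees of freedom, which is the value used in the later solutions, and relies on the identities b0 = Ȳ − b1 X̄ and t = estimate / standard error.

```r
# Values copied from the R output above.
ybar <- 15.89231; xbar <- 13.10000
b1 <- 0.4685; se_b0 <- 1.5308; t_b1 <- 4.086
df <- 24                          # assumed n - 2, as used in the later solutions

D <- ybar - b1 * xbar             # intercept: b0 = Ybar - b1 * Xbar
E <- D / se_b0                    # t value for the intercept
F_val <- 2 * pt(-abs(t_b1), df)   # two-sided p-value for the slope
round(c(D = D, E = E, F = F_val), 4)
```

The name `F_val` is used instead of `F` only to avoid masking R's built-in shorthand for `FALSE`.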
the true regression slope β1.
Solution: 90% CI for β1: b1 ± t_{24,0.95} s{b1}
0.4685 ± (1.711)(0.1147)
0.4685 ± 0.1963, i.e. (0.2722, 0.6648)
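The same interval can be computed in R without a t table. This is a small sketch using only the numbers printed in the output above; `qt()` returns the t quantile.

```r
b1 <- 0.4685; se_b1 <- 0.1147; df <- 24

t_crit <- qt(0.95, df)            # upper 5% point, for a two-sided 90% interval
half_width <- t_crit * se_b1
ci <- c(lower = b1 - half_width, upper = b1 + half_width)
round(ci, 4)
```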
(c) (3 points)
Solution: H0: β1 = 0.6 vs. Ha: β1 ≠ 0.6
t* = (b1 − 0.6) / s{b1} = (0.4685 − 0.6) / 0.1147 = −1.1465 on 24 df
One-sided p-value in (0.1, 0.15), so the two-sided p-value is in (0.2, 0.3)
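With software, the p-value can be computed exactly rather than bracketed from a table. A minimal sketch using the printed estimate and standard error; `pt()` is the t cumulative distribution function.

```r
b1 <- 0.4685; se_b1 <- 0.1147; df <- 24
beta1_null <- 0.6                      # hypothesized slope under H0

t_star <- (b1 - beta1_null) / se_b1    # test statistic
p_one_sided <- pt(-abs(t_star), df)    # tail area beyond |t*|
p_two_sided <- 2 * p_one_sided
c(t_star = t_star, p_one_sided = p_one_sided, p_two_sided = p_two_sided)
```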
(d) (5 points) A consumer gives a score of 12 for a new juice that has just hit the market. Give
a point estimate for the corresponding expert score, and provide an appropriate interval
around this estimate.
Solution: Ŷh (predicted expert score) = 9.7543 + 0.4685(12) = 15.3763
s^2{pred} = 2.262 [1 + 1/26 + (12 − 13.1)^2 / 171.9356] = 2.3649
95% PI for Yh: Ŷh ± t_{24,0.975} s{pred}
15.3763 ± 2.064 √2.3649
15.3763 ± 3.1741
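The prediction interval arithmetic can be retraced in R. This sketch reads 2.262 as the MSE, 26 as the sample size, and 171.9356 as SSx; those labels are my interpretation of the numbers in the solution, not something the output states explicitly.

```r
b0 <- 9.7543; b1 <- 0.4685
mse <- 2.262; n <- 26              # assumed: MSE and sample size
xbar <- 13.1; SSx <- 171.9356      # assumed: SSx is the sum of squares of X
x_h <- 12                          # new consumer score

y_hat <- b0 + b1 * x_h
s2_pred <- mse * (1 + 1 / n + (x_h - xbar)^2 / SSx)   # prediction variance
half_width <- qt(0.975, n - 2) * sqrt(s2_pred)
pi_95 <- c(lower = y_hat - half_width, upper = y_hat + half_width)
round(c(y_hat = y_hat, s2_pred = s2_pred, half_width = half_width), 4)
```

With the original `juice` data available, `predict(fit, newdata = data.frame(consumerScore = 12), interval = "prediction")` should give the same interval, up to rounding of the inputs used here.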
(e) (1 point) Give an interpretation of the estimated slope in plain English, in the context of this question.
Solution: For each additional point in the consumer score, we expect the expert score to increase by 0.4685 points on average.
(f) (1 point)
Solution: We expect experts to give a score of 9.75 when consumers would rate the juice zero.
(g) (2 points) A colleague of yours suggests that there is no correlation between these two
variables.
Can you test this hypothesis with the information
given? If so, give the null
Solution: Sure, just use the t-test for slope as they are equivalent.
H0 : ρ = 0
p = 0.0004
Looks like very strong evidence that the correlation between these two variables is not
zero. The colleague is wrong.
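The equivalence between the slope t-test and the test of H0: ρ = 0 can be seen on any data set. The sketch below uses simulated data, since the juice data are not reproduced here; the sample size and parameter values are assumptions loosely modelled on the output above. `cor.test()` performs the Pearson correlation test by default.

```r
# Simulated stand-in for the juice data; all parameter values are assumptions.
set.seed(5)
x <- rnorm(26, mean = 13, sd = 2.6)
y <- 9.75 + 0.47 * x + rnorm(26, sd = 1.5)

slope_row <- summary(lm(y ~ x))$coefficients["x", ]
ct <- cor.test(x, y)               # Pearson test of H0: rho = 0

t_slope <- unname(slope_row["t value"])
t_cor   <- unname(ct$statistic)
p_slope <- unname(slope_row["Pr(>|t|)"])
p_cor   <- ct$p.value
c(t_slope = t_slope, t_cor = t_cor, p_slope = p_slope, p_cor = p_cor)
```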
A separate SLR model was fit, with X and Y switched.
> anova(fitInv)
Analysis of Variance Table
Response: consumerScore
> summary(fitInv)
Call:
lm(formula = consumerScore ~ expertScore, data = juice)
(h) (2 points) Derive an expression for the slope of this model in terms of the original slope b1 from the previous model. You can, of course, check your derivation using the numbers posted.
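Solution: b1 = SSxy / SSx = r √SSx √SSy / SSx = r √(SSy / SSx). By the same argument the new slope is b1' = SSxy / SSy = r √(SSx / SSy), so b1' = r^2 / b1.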
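The slopes of the two regressions multiply to r^2, so the slope of X on Y equals r^2 / b1 rather than 1 / b1. A small numerical sketch on simulated data (all parameter values are illustrative assumptions) shows the relationship and how far the naive inverse is from it.

```r
# Simulated stand-in data; parameter values are assumptions.
set.seed(6)
x <- rnorm(26, mean = 13, sd = 2.6)
y <- 9.75 + 0.47 * x + rnorm(26, sd = 1.5)

b1     <- unname(coef(lm(y ~ x))["x"])   # slope of Y on X
b1_inv <- unname(coef(lm(x ~ y))["y"])   # slope of X on Y
r      <- cor(x, y)
c(b1_inv = b1_inv, r2_over_b1 = r^2 / b1, naive_inverse = 1 / b1)
```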
(j) (2 points) In class I said that you cannot simply invert your original regression line when
making
Solution: When r = ±1, then b1' = 1/b1
(k) (2 points) Give a 95% CI for the true intercept β0 for this model.
Solution: 95% CI for β0: b0 ± t_{24,0.975} s{b0}
