# 2015F vAsol
STA 302/1001, Fall 2015
Midterm A
10/20/2015
Time Limit: 1h 40 min

Last Name (Print):
First Name:
Student Number:
Check one: STA302 STA1001

This exam contains 8 pages (including this cover page) and 3 problems. Check to see if any pages
are missing. Enter all requested information on the top of this page.
• You may not use your books or notes on this exam.
• You may use a scientific calculator, the formulae below, and the t-table on the last page.
• SLR stands for Simple Linear Regression.
• You are required to show your work on each problem on this exam. Please carry all possible
precision through a numerical question, and give your final answer to four (4) decimals, unless
they are trailing zeroes.
• You may use a benchmark of α = 5% for all inference, unless otherwise indicated.
• Do not write in the table to the right.

Problem   Points   Score
1         10
2         10
3         30
Total:    50

Some formulae:
$b_1 = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \dfrac{\sum X_i Y_i - n\bar{X}\bar{Y}}{\sum X_i^2 - n\bar{X}^2}$
$b_0 = \bar{Y} - b_1 \bar{X}$
$\operatorname{Var}(b_1) = \dfrac{\sigma^2}{\sum (X_i - \bar{X})^2}$
$\operatorname{Var}(b_0) = \sigma^2 \left[ \dfrac{1}{n} + \dfrac{\bar{X}^2}{\sum (X_i - \bar{X})^2} \right]$
$\operatorname{Cov}(b_0, b_1) = -\dfrac{\sigma^2 \bar{X}}{\sum (X_i - \bar{X})^2}$
$\operatorname{Var}(\hat{Y}_h) = \sigma^2 \left[ \dfrac{1}{n} + \dfrac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right]$
$\sigma^2\{pred\} = \operatorname{Var}(Y_h - \hat{Y}_h) = \sigma^2 \left[ 1 + \dfrac{1}{n} + \dfrac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right]$
$SSTO = \sum (Y_i - \bar{Y})^2$
$SSE = \sum (Y_i - \hat{Y}_i)^2$
$SSR = \sum (\hat{Y}_i - \bar{Y})^2 = b_1^2 \sum (X_i - \bar{X})^2$
$r = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}$

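To see how the formulae above fit together, the following is a minimal Python sketch. The toy data and the function names (`slr_fit`, `slope_raw_sums`, `anova_sums`) are hypothetical illustrations and are not part of the exam.

```python
# Hypothetical toy data, chosen only for illustration (not from the exam).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]


def slr_fit(x, y):
    """Return (b0, b1) using the deviation-form least squares formulae."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    return b0, b1


def slope_raw_sums(x, y):
    """Slope from the computational form using raw sums."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    num = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    den = sum(xi ** 2 for xi in x) - n * x_bar ** 2
    return num / den


def anova_sums(x, y):
    """Return (SSTO, SSR, SSE) for the fitted line."""
    b0, b1 = slr_fit(x, y)
    y_bar = sum(y) / len(y)
    fitted = [b0 + b1 * xi for xi in x]
    ssto = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return ssto, ssr, sse


b0, b1 = slr_fit(X, Y)
print(b0, b1, slope_raw_sums(X, Y), anova_sums(X, Y))
```

Both slope forms should agree up to floating-point error, and SSTO should equal SSR + SSE.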
1. (10 points) Multiple Choice
Answer the following questions by circling the best answer.
I. Which statement is not true about Maximum Likelihood Estimates (MLEs) in general?
A. They are unbiased
B. They are consistent
C. They are efficient
D. They tend to a Normal distribution
II. Race (0 = White, 1 = Asian, 2 = Other) is best modeled as what type of variable?
A. Categorical
B. Ordinal
C. Interval
D. Ratio
III. The errors εi in SLR are best modeled as a(n):
A. Unknown constant
B. Known constant
C. Known random variable
D. Unknown random variable
IV. Suppose you have numerical variables {Y, X1, X2} in your global environment in R. Which
of the following lines of R code will correctly fit the SLR model EY = β0 + β1 X1?
A. fit(Y ~ X1)
B. lm(Y ~ X1)
C. predict(Y ~ X1)
D. fit(df$Y ~ df$X1)
V. In SLR, when the coefficient of determination (R²) is high ...
A. The relationship is probably linear
B. You can make accurate predictions
C. A lot of variation in Y is explained by X
D. The relationship between X and Y is a strong positive relationship
Answer the following True or False questions by writing ’T’ or ’F’ in the blank.
Do not write something ambiguous like ∓ or =!
F   A Type II error is when we incorrectly reject the null hypothesis.
F   In an R data frame, all of the rows must be the same type of object.
T   The Spearman correlation is simply Pearson’s correlation of the rank-ordered values.
T   Least squares and Maximum Likelihood Estimation give the same estimators for the
    slope and intercept.
T   Prediction intervals are always wider than confidence intervals.
2. Consider the SLR model Yi = β0 + β1 Xi + εi. For these questions, you may take as known
anything we proved in lecture about the ki, if you wish.
(a) (2 points) Show that ΣXi ei = 0
Solution: $\sum X_i e_i = \sum (X_i - \bar{X}) e_i = \sum (X_i - \bar{X})(Y_i - b_0 - b_1 X_i)$
$= \sum (X_i - \bar{X})(Y_i - \bar{Y}) - b_0 \sum (X_i - \bar{X}) - b_1 SS_x = SS_{XY} - \dfrac{SS_{XY}}{SS_x} SS_x = 0$
(b) (2 points) Show that b1 is an unbiased estimator of β1.
Solution: E[b1] = E[Σki Yi] = Σki E[Yi] = Σki (β0 + β1 Xi) = β0 Σki + β1 Σki Xi = β1,
since Σki = 0 and Σki Xi = 1.
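The two results above can be checked numerically. The following is a small Python sketch on hypothetical toy data (the numbers are not from the exam); it verifies that the residuals satisfy Σei = 0 and ΣXi ei = 0, and that the weights ki = (Xi − X̄)/SSx satisfy Σki = 0 and Σki Xi = 1.

```python
# Hypothetical toy data, chosen only for illustration (not from the exam).
X = [2.0, 4.0, 5.0, 7.0, 9.0]
Y = [3.1, 4.8, 6.2, 8.9, 10.4]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n
ss_x = sum((xi - x_bar) ** 2 for xi in X)

# Weights k_i so that b1 = sum(k_i * Y_i).
k = [(xi - x_bar) / ss_x for xi in X]
b1 = sum(ki * yi for ki, yi in zip(k, Y))
b0 = y_bar - b1 * x_bar

# Residuals from the fitted line.
e = [yi - b0 - b1 * xi for xi, yi in zip(X, Y)]

sum_e = sum(e)                                   # should be ~0
sum_xe = sum(xi * ei for xi, ei in zip(X, e))    # should be ~0
sum_k = sum(k)                                   # should be ~0
sum_kx = sum(ki * xi for ki, xi in zip(k, X))    # should be ~1

print(sum_e, sum_xe, sum_k, sum_kx)
```

The printed values differ from 0 and 1 only by floating-point rounding.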
Consider a regression model Yi = β1 Xi + εi with all fixed constants Xi > 0 and the usual
G-M assumptions on the errors.
(c) (3 points) Derive b1, the least squares estimate for β1, by minimizing the sum of squared
residuals.
Solution: $SSE = \sum (Y_i - b_1 X_i)^2$
$\dfrac{dSSE}{db_1} = -2 \sum X_i (Y_i - b_1 X_i)$
Setting the derivative to zero: $\sum X_i Y_i - b_1 \sum X_i^2 = 0$
$b_1 = \dfrac{\sum X_i Y_i}{\sum X_i^2}$
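As a sanity check on this derivation, here is a minimal Python sketch of the through-the-origin estimator. The toy data and the helper names (`slope_through_origin`, `sse`) are hypothetical; the check confirms that the derivative of SSE vanishes at b1 and that nearby slopes give a larger SSE.

```python
# Hypothetical toy data with all X > 0 (not from the exam).
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.2, 3.9, 6.1, 8.0]


def slope_through_origin(x, y):
    """Least squares slope for the no-intercept model Y = b1 * X + error."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)


def sse(b, x, y):
    """Sum of squared residuals for a candidate slope b."""
    return sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))


b1 = slope_through_origin(X, Y)

# The derivative of SSE with respect to the slope, evaluated at b1.
derivative_at_b1 = -2 * sum(xi * (yi - b1 * xi) for xi, yi in zip(X, Y))

print(b1, derivative_at_b1, sse(b1, X, Y))
```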
(d) (3 points) Show that the estimator you derived above is a consistent estimator of β1.
Recall that an estimator is consistent if it converges to its target parameter in the limit
as n → ∞. You may use the following hints without proof:
$\dfrac{\sum X_i^k}{n} \to E[X^k]$ and $\dfrac{\sum X_i Y_i}{n} \to E[XY]$ by the Law of Large Numbers.
If two estimators each converge to a parameter, the ratio of those estimators converges to
the ratio of the parameters.
Solution: $b_1 = \dfrac{\sum X_i Y_i / n}{\sum X_i^2 / n} \to \dfrac{E[XY]}{E[X^2]} = \dfrac{E[X(\beta_1 X + \varepsilon)]}{E[X^2]} = \dfrac{\beta_1 E[X^2] + E[X\varepsilon]}{E[X^2]} = \dfrac{\beta_1 E[X^2]}{E[X^2]} = \beta_1$,
where $E[X\varepsilon] = 0$ because the errors have mean zero.
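Consistency can also be illustrated by simulation. The sketch below is only an illustration under stated assumptions: the true slope, the error standard deviation, the range of X, and the function name `simulate_b1` are all hypothetical choices, and X is drawn at random for convenience even though the exam treats the Xi as fixed constants.

```python
import random


def simulate_b1(n, beta1=2.0, sigma=1.0, seed=0):
    """Simulate Y = beta1 * X + error and return b1 = sum(XY) / sum(X^2)."""
    rng = random.Random(seed)
    sum_xy = 0.0
    sum_xx = 0.0
    for _ in range(n):
        x = rng.uniform(1.0, 5.0)              # assumed range, all X > 0
        y = beta1 * x + rng.gauss(0.0, sigma)  # mean-zero errors
        sum_xy += x * y
        sum_xx += x * x
    return sum_xy / sum_xx


# Estimates for increasing sample sizes; they should settle near beta1 = 2.0.
for n in (10, 1000, 200000):
    print(n, simulate_b1(n))
```

For the largest sample size the estimate should sit very close to the assumed true slope, which is what consistency predicts.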
3. (30 points) In a recent (1992) experiment, Hungarian food scientists measured the rating score
given by regular consumers as well as one given by experts, for a variety of fruit juices. We
will try to predict the expert score from the consumer score; some R output from a fitted SLR
model follows. You may assume all G-M assumptions are met.
> anova(fit)
Analysis of Variance Table

Response: expertScore
              Df Sum Sq Mean Sq F value    Pr(>F)
consumerScore  1    [A]     [B]  16.696 0.0004238
Residuals    [C] 54.278   2.262

> summary(fit)
Call:
lm(formula = expertScore ~ consumerScore, data = juice)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)        [D]     1.5308     [E] 1.38e-06
consumerScore   0.4685     0.1147   4.086      [F]

Residual standard error: [G] on 24 degrees of freedom
Multiple R-squared: [H]
F-statistic: 16.7 on 1 and [I] DF, p-value: 0.0004238

> apply(juice, 2, mean) # Means of Y and X
  expertScore consumerScore
     15.89231      13.10000
> apply(juice, 2, sd) # SDs of Y and X
  expertScore consumerScore
     2.622975      1.918734

(a) (9 points) Some values have been replaced with letters. Fill in those values. You do not
need to show any work for this part.
Solution:
(A) 37.7664   (B) 37.7664   (C) 24
(D) 9.7550    (E) 6.3725    (F) 0.0004
(G) 1.504     (H) 0.4103    (I) 24
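Although no work is required for this part, the values can be recovered from the printed output. The following Python sketch shows one way to do so; the variable names are hypothetical, and the only inputs are numbers visible in the R output. (F) is not computed: in SLR the t-test for the slope and the ANOVA F-test are equivalent, so it equals the printed F-test p-value 0.0004238.

```python
import math

# Numbers read directly from the R output.
f_value = 16.696
mse = 2.262
sse = 54.278
b1 = 0.4685
se_b0 = 1.5308
y_bar = 15.89231   # mean of expertScore
x_bar = 13.1       # mean of consumerScore

A = f_value * mse          # SSR = F * MSE, since the regression has 1 df
B = A / 1                  # MSR = SSR / 1
C = round(sse / mse)       # residual df = SSE / MSE
D = y_bar - b1 * x_bar     # intercept b0 = Ybar - b1 * Xbar
E = D / se_b0              # t value for the intercept
G = math.sqrt(mse)         # residual standard error
H = A / (A + sse)          # R-squared = SSR / SSTO
I = C                      # denominator df of the F-statistic

print(round(A, 4), round(B, 4), C, round(D, 4), round(E, 4),
      round(G, 3), round(H, 4), I)
```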
(b) (2 points) Give a 90% CI for the true regression slope β1.
Solution: 90% CI for β1: b1 ± t24,0.95 s{b1} [1 mark]
0.4685 ± (1.711)(0.1147)
0.4685 ± 0.1963 [1 mark]
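The arithmetic for this interval can be reproduced with a few lines of Python. This is only a sketch: the critical value 1.711 is taken from the solution above rather than computed, and the variable names are hypothetical.

```python
b1 = 0.4685      # estimated slope from the R output
se_b1 = 0.1147   # standard error of the slope from the R output
t_crit = 1.711   # t(0.95; 24 df), as used in the solution

half_width = t_crit * se_b1
ci_lower = b1 - half_width
ci_upper = b1 + half_width

print(round(half_width, 4), round(ci_lower, 4), round(ci_upper, 4))
```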
(c) (3 points) Test the null hypothesis that the slope of the regression line is equal to 0.6.
State the null hypothesis in terms of parameters, give the test statistic, and give the most
accurate p-value you can (a range is OK here).
Solution: H0: β1 = 0.6 vs. Ha: β1 ≠ 0.6
$t^* = \dfrac{b_1 - 0.6}{s\{b_1\}} = \dfrac{0.4685 - 0.6}{0.1147} = -1.1465$ on 24 df [1 mark]
One-sided p-value (0.1, 0.15)
Two-sided p-value (0.2, 0.3) [1 mark]
∴ We do not have enough evidence to reject the claim of β1 = 0.6 [1 mark]
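The test statistic and the p-value range can be reproduced as follows. This is a sketch under stated assumptions: the upper-tail critical values for 24 df are standard t-table values entered by hand (the exam's own table is not reproduced here), and the helper `bracket_one_sided_p` is a hypothetical name.

```python
b1 = 0.4685
se_b1 = 0.1147
beta1_null = 0.6

t_star = (b1 - beta1_null) / se_b1

# (upper-tail probability, critical value) pairs for t with 24 df,
# assumed from a standard t-table and sorted by critical value.
T24_TABLE = [(0.15, 1.059), (0.10, 1.318), (0.05, 1.711), (0.025, 2.064)]


def bracket_one_sided_p(t_abs, table):
    """Return (low, high) bounds on the one-sided p-value for |t| = t_abs."""
    low, high = 0.0, 0.5
    for tail, crit in table:
        if t_abs >= crit:
            high = tail   # |t| exceeds this critical value, so p <= tail
        else:
            low = tail    # |t| falls short of it, so p > tail
            break
    return low, high


one_sided = bracket_one_sided_p(abs(t_star), T24_TABLE)
two_sided = (2 * one_sided[0], 2 * one_sided[1])

print(round(t_star, 4), one_sided, two_sided)
```

The two-sided range is above 5%, which matches the conclusion not to reject H0.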
(d) (5 points) A consumer gives a score of 12 for a new juice that has just hit the market. Give
a point estimate for the corresponding expert score, and provide an appropriate interval
around this estimate.
Solution: $\widehat{\text{expert}} = 9.7543 + 0.4685(12) = 15.3763$ [1 mark]
$SS_x$ from $\operatorname{Var}(b_1)$: $SS_x = 171.9356$ [1 mark]
$s^2\{pred\} = MSE \left[ 1 + \dfrac{1}{n} + \dfrac{(X_h - \bar{X})^2}{SS_x} \right]$ [1 mark]
$= 2.262 \left[ 1 + \dfrac{1}{26} + \dfrac{(12 - 13.1)^2}{171.9356} \right] = 2.3649$ [1 mark]
95% PI for $Y_h$: $\hat{Y}_h \pm t_{24, 0.975}\, s\{pred\}$
$15.3763 \pm 2.064 \sqrt{2.3649}$
$15.3763 \pm 3.1741$ [1 mark]
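The steps of this prediction interval can be traced in a short Python sketch. All inputs are numbers from the R output or from the solution above (including the intercept 9.7543 and the critical value 2.064 as written there); the variable names are hypothetical.

```python
import math

b0 = 9.7543       # intercept as used in the solution
b1 = 0.4685       # slope from the R output
mse = 2.262       # residual mean square from the ANOVA table
se_b1 = 0.1147    # standard error of the slope
n = 26            # 24 residual df plus 2 estimated parameters
x_bar = 13.1      # mean consumer score
x_h = 12.0        # new consumer score
t_crit = 2.064    # t(0.975; 24 df), as used in the solution

y_hat = b0 + b1 * x_h

# s{b1}^2 = MSE / SSx, so SSx = MSE / s{b1}^2.
ss_x = mse / se_b1 ** 2

s2_pred = mse * (1 + 1 / n + (x_h - x_bar) ** 2 / ss_x)
half_width = t_crit * math.sqrt(s2_pred)
pi_lower = y_hat - half_width
pi_upper = y_hat + half_width

print(round(y_hat, 4), round(ss_x, 4), round(s2_pred, 4),
      round(half_width, 4), round(pi_lower, 4), round(pi_upper, 4))
```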
(e) (1 point) Give an interpretation of the estimated slope in plain English, in the context of
this question.
Solution: For each additional point in the rating score given by a consumer, we expect the
expert score to increase by 0.4685 on average.
(f) (1 point) Assuming there are some consumer scores near zero, give an interpretation of
the estimated intercept in plain English, in the context of this question.
Solution: We expect experts to give a score of 9.75 when consumers would rate the
juice zero.
(g) (2 points) A colleague of yours suggests that there is no correlation between these two
variables. Can you test this hypothesis with the information given? If so, give the null
hypothesis in symbols and the p-value of the test, and a conclusion in plain language. If
not, explain what you are missing for the test.
Solution: Sure, just use the t-test for slope as they are equivalent.
H0 : ρ = 0
p = 0.0004
Looks like very strong evidence that the correlation between these two variables is not
zero. The colleague is wrong.
A separate SLR model was fit, with X and Y reversed. Some of the output is given below.
> anova(fitInv)
Analysis of Variance Table

Response: consumerScore
            Df  Sum Sq Mean Sq F value    Pr(>F)
expertScore  1  70.566  70.566  16.696 0.0004238
Residuals   24 101.434   4.226

> summary(fitInv)
Call:
lm(formula = consumerScore ~ expertScore, data = juice)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.8155     3.4293  -0.238 0.814055
expertScore   0.8756     0.2143   4.086 0.000424

Residual standard error: 2.056 on 24 degrees of freedom

(h) (2 points) Derive an expression for b1′, the slope of the inverted model (in this case 0.8756)
in terms of the original slope b1 from the previous model. You can, of course, check your
derivation using the numbers posted.
Solution: $b_1 = \dfrac{SS_{xy}}{SS_x} = \dfrac{r \sqrt{SS_x SS_y}}{SS_x} = r \dfrac{\sqrt{SS_y}}{\sqrt{SS_x}} = r \dfrac{s_y}{s_x}$
$b_1' = r \dfrac{s_x}{s_y} = \left( b_1 \dfrac{s_x}{s_y} \right) \dfrac{s_x}{s_y} = b_1 \dfrac{s_x^2}{s_y^2}$
(i) (1 point) Under what conditions would the two models have the same slope?
Solution: If $s_x^2 = s_y^2$
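The relation between the two slopes can be checked against the posted numbers. In the Python sketch below, the sample variances are recovered from the total sums of squares in the two ANOVA tables (using the value 37.7664 filled in for [A]); the variable names are hypothetical.

```python
b1 = 0.4685                 # slope of expertScore on consumerScore
b1_inv_reported = 0.8756    # slope of consumerScore on expertScore
n = 26                      # 24 residual df plus 2 estimated parameters

# Total sums of squares: SSTO = SSR + SSE from each ANOVA table.
ss_y = 37.7664 + 54.278     # expertScore (Y in the original model)
ss_x = 70.566 + 101.434     # consumerScore (X in the original model)

var_x = ss_x / (n - 1)
var_y = ss_y / (n - 1)

# Derived relation: b1' = b1 * s_x^2 / s_y^2.
b1_inv = b1 * var_x / var_y

# Since b1 = r * s_y / s_x and b1' = r * s_x / s_y, their product is r^2.
r_squared = b1 * b1_inv_reported

print(round(b1_inv, 4), round(r_squared, 4))
```

The derived slope agrees with the reported 0.8756 up to rounding of the inputs, and the product of the two slopes agrees with the R-squared value 0.4103.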
(j) (2 points) In class I said that you cannot simply invert your original regression line when
making inverse predictions, except in a special case. Under what (minimal) conditions
would it be acceptable to do this?
Solution: When |r| = 1, then b1′ = 1/b1 (in general, b1 b1′ = r²).
(k) (2 points) Give a 95% CI for the true intercept β0 for this model.
Solution: 95% CI for β0: b0 ± t24,0.975 s{b0} [1 mark]
−0.8155 ± (2.064)(3.4293)
−0.8155 ± 7.0781 [1 mark]