STA 302/1001    Fall 2015    Midterm A    10/20/2015
Time Limit: 1h 40 min

Last Name (Print):
First Name:
Student Number:
Check one:   STA302   STA1001

This exam contains 8 pages (including this cover page) and 3 problems. Check to see if any pages are missing. Enter all requested information on the top of this page.

• You may not use your books or notes on this exam. You may use a scientific calculator, the formulae below, and the t-table on the last page.
• SLR stands for Simple Linear Regression.
• You are required to show your work on each problem on this exam. Please carry all possible precision through a numerical question, and give your final answer to four (4) decimals, unless they are trailing zeroes.
• You may use a benchmark of α = 5% for all inference, unless otherwise indicated.
• Do not write in the score table below.

    Problem   Points   Score
    1         10
    2         10
    3         30
    Total:    50
Some formulae:

$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{\sum X_i Y_i - n \bar{X} \bar{Y}}{\sum X_i^2 - n \bar{X}^2}$

$b_0 = \bar{Y} - b_1 \bar{X}$

$Var(b_1) = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}$

$Var(b_0) = \sigma^2 \left( \frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2} \right)$

$Cov(b_0, b_1) = -\frac{\sigma^2 \bar{X}}{\sum (X_i - \bar{X})^2}$

$Var(\hat{Y}_h) = \sigma^2 \left( \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right)$

$\sigma^2\{pred\} = Var(Y_h - \hat{Y}_h) = \sigma^2 \left( 1 + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right)$

$SSTO = \sum (Y_i - \bar{Y})^2$

$SSE = \sum (Y_i - \hat{Y}_i)^2$

$SSR = \sum (\hat{Y}_i - \bar{Y})^2 = b_1^2 \sum (X_i - \bar{X})^2$

$r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}$

1. (10 points) Multiple Choice: Answer the following questions by circling the best answer.

I. Which statement is not true about Maximum Likelihood Estimates (MLEs) in general?
   A. They are unbiased
   B. They are consistent
   C. They are efficient
   D. They tend to a Normal distribution

II. Race (0 = White, 1 = Asian, 2 = Other) is best modeled as what type of variable?
   A. Categorical
   B. Ordinal
   C. Interval
   D. Ratio

III. The errors $\varepsilon_i$ in SLR are best modeled as a(n):
   A. Unknown constant
   B. Known constant
   C. Known random variable
   D. Unknown random variable

IV. Suppose you have numerical variables {Y, X1, X2} in your global environment in R. Which of the following lines of R code will correctly fit the SLR model $E[Y] = \beta_0 + \beta_1 X_1$?
   A. fit(Y ~ X1)
   B. lm(Y ~ X1)
   C. predict(Y ~ X1)
   D. fit(df$Y ~ df$X1)

V. In SLR, when the coefficient of determination ($R^2$) is high ...
   A. The relationship is probably linear
   B. You can make accurate predictions
   C. A lot of variation in Y is explained by X
   D. The relationship between X and Y is a strong positive relationship

Answer the following True or False questions by writing 'T' or 'F' in the blank. Do not write something ambiguous like ∓ or =!

   __F__  A Type II error is when we incorrectly reject the null hypothesis.
   __F__  In an R data frame, all of the rows must be the same type of object.
   __T__  The Spearman correlation is simply Pearson's correlation of the rank-ordered values.
   __T__  Least squares and Maximum Likelihood Estimation give the same estimators for the slope and intercept.
   __T__  Prediction intervals are always wider than confidence intervals.

2. Consider the SLR model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$. For these questions, you may take as known anything we proved in lecture about the $k_i$, if you wish.

(a) (2 points) Show that $\sum X_i e_i = 0$

Solution: Since $\sum e_i = 0$,
$\sum X_i e_i = \sum (X_i - \bar{X}) e_i = \sum (X_i - \bar{X})(Y_i - b_0 - b_1 X_i)$
$= \sum (X_i - \bar{X})(Y_i - \bar{Y}) - b_0 \sum (X_i - \bar{X}) - b_1 SS_x$
$= SS_{XY} - SS_x \frac{SS_{XY}}{SS_x} = 0$

(b) (2 points) Show that $b_1$ is an unbiased estimator of $\beta_1$.

Solution: Using $\sum k_i = 0$ and $\sum k_i X_i = 1$,
$E[b_1] = E[\sum k_i Y_i] = \sum k_i E[Y_i] = \sum k_i (\beta_0 + \beta_1 X_i) = \beta_0 \sum k_i + \beta_1 \sum k_i X_i = \beta_1$

Consider a regression model $Y_i = \beta_1 X_i + \varepsilon_i$ with all fixed constants $X_i > 0$ and the usual G-M assumptions on the errors.

(c) (3 points) Derive $b_1$, the least squares estimate for $\beta_1$, by minimizing the sum of squared residuals.
Solution:
$SSE = \sum (Y_i - b_1 X_i)^2$
$\frac{dSSE}{db_1} = -2 \sum X_i (Y_i - b_1 X_i)$
Setting the derivative to zero: $\sum X_i Y_i - b_1 \sum X_i^2 = 0$
$b_1 = \frac{\sum X_i Y_i}{\sum X_i^2}$

(d) (3 points) Show that the estimator you derived above is a consistent estimator of $\beta_1$. Recall that an estimator is consistent if it converges to its target parameter in the limit as $n \to \infty$. You may use the following hints without proof:
   • $\frac{\sum X_i^k}{n} \to E[X^k]$ and $\frac{\sum X_i Y_i}{n} \to E[XY]$ by the Law of Large Numbers.
   • If two estimators each converge to a parameter, the ratio of those estimators converges to the ratio of the parameters.

Solution: Since $E[X \varepsilon] = 0$,
$b_1 = \frac{\sum X_i Y_i / n}{\sum X_i^2 / n} \to \frac{E[XY]}{E[X^2]} = \frac{E[X(\beta_1 X + \varepsilon)]}{E[X^2]} = \frac{\beta_1 E[X^2] + E[X \varepsilon]}{E[X^2]} = \frac{\beta_1 E[X^2]}{E[X^2]} = \beta_1$

3. (30 points) In a recent (1992) experiment, Hungarian food scientists measured the rating score given by regular consumers as well as one given by experts, for a variety of fruit juices. We will try to predict the expert score from the consumer score; some R output from a fitted SLR model follows. You may assume all G-M assumptions are met.

    > anova(fit)
    Analysis of Variance Table

    Response: expertScore
                   Df Sum Sq Mean Sq F value    Pr(>F)
    consumerScore   1    [A]     [B]  16.696 0.0004238
    Residuals     [C] 54.278   2.262

    > summary(fit)

    Call:
    lm(formula = expertScore ~ consumerScore, data = juice)

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept)        [D]     1.5308     [E] 1.38e-06
    consumerScore   0.4685     0.1147   4.086      [F]

    Residual standard error: [G] on 24 degrees of freedom
    Multiple R-squared: [H],  Adjusted R-squared: 0.3857
    F-statistic: 16.7 on 1 and [I] DF,  p-value: 0.0004238

    > apply(juice, 2, mean) # Means of Y and X
      expertScore consumerScore
         15.89231      13.10000
    > apply(juice, 2, sd) # SDs of Y and X
      expertScore consumerScore
         1.918734      2.622975

(a) (9 points) Some values have been replaced with letters. Fill in those values. You do not need to show any work for this part.

    (A) ____   (B) ____   (C) ____
    (D) ____   (E) ____   (F) ____
    (G) ____   (H) ____   (I) ____

Solution:
    (A) 37.7664   (B) 37.7664   (C) 24
    (D) 9.7550    (E) 6.3725    (F) 0.0004
    (G) 1.504     (H) 0.4103    (I) 24

(b) (2 points) Give a 90% CI for the true regression slope $\beta_1$.

Solution: 90% CI for $\beta_1$: $b_1 \pm t_{24,\,0.95}\, s\{b_1\}$
$0.4685 \pm (1.711)(0.1147)$
$0.4685 \pm 0.1963$

(c) (3 points) Test the null hypothesis that the slope of the regression line is equal to 0.6. State the null hypothesis in terms of parameters, give the test statistic, and give the most accurate p-value you can (a range is OK here).

Solution: $H_0: \beta_1 = 0.6$ vs. $H_a: \beta_1 \neq 0.6$
$t^* = \frac{b_1 - 0.6}{s\{b_1\}} = \frac{0.4685 - 0.6}{0.1147} = -1.1465$ on 24 df
One-sided p-value: (0.1, 0.15)
Two-sided p-value: (0.2, 0.3)
∴ We do not have enough evidence to reject the claim of $\beta_1 = 0.6$.

(d) (5 points) A consumer gives a score of 12 for a new juice that has just hit the market. Give a point estimate for the corresponding expert score, and provide an appropriate interval around this estimate.

Solution: $\widehat{expert} = 9.7543 + 0.4685(12) = 15.3763$
$SS_x = MSE / s^2\{b_1\} = 171.9356$ (from $\widehat{Var}(b_1)$)
$s^2\{pred\} = MSE \left( 1 + \frac{1}{n} + \frac{(X_h - \bar{X})^2}{SS_x} \right) = 2.262 \left( 1 + \frac{1}{26} + \frac{(12 - 13.1)^2}{171.9356} \right) = 2.3649$
95% PI for $Y_h$: $\hat{Y}_h \pm t_{24,\,0.975}\, s\{pred\}$
$15.3763 \pm 2.064 \sqrt{2.3649}$
$15.3763 \pm 3.1741$

(e) (1 point) Give an interpretation of the estimated slope in plain English, in the context of this question.

Solution: For each additional point in the rating score given by a consumer, we expect the expert score to increase by 0.4685.

(f) (1 point) Assuming there are some consumer scores near zero, give an interpretation of the estimated intercept in plain English, in the context of this question.

Solution: We expect experts to give a score of 9.75 when consumers would rate the juice zero.

(g) (2 points) A colleague of yours suggests that there is no correlation between these two variables.
Can you test this hypothesis with the information given? If so, give the null hypothesis in symbols and the p-value of the test, and a conclusion in plain language. If not, explain what you are missing for the test.

Solution: Sure, just use the t-test for the slope, as the two tests are equivalent.
$H_0: \rho = 0$
$p = 0.0004$
Looks like very strong evidence that the correlation between these two variables is not zero. The colleague is wrong.

A separate SLR model was fit, with X and Y reversed. Some of the output is given below.

    > anova(fitInv)
    Analysis of Variance Table

    Response: consumerScore
                Df  Sum Sq Mean Sq F value    Pr(>F)
    expertScore  1  70.566  70.566  16.696 0.0004238
    Residuals   24 101.434   4.226

    > summary(fitInv)

    Call:
    lm(formula = consumerScore ~ expertScore, data = juice)

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  -0.8155     3.4293  -0.238 0.814055
    expertScore   0.8756     0.2143   4.086 0.000424

    Residual standard error: 2.056 on 24 degrees of freedom

(h) (2 points) Derive an expression for $b_1'$, the slope of the inverted model (in this case 0.8756), in terms of the original slope $b_1$ from the previous model. You can, of course, check your derivation using the numbers posted.

Solution:
$b_1 = \frac{SS_{xy}}{SS_x} = \frac{r \sqrt{SS_x SS_y}}{SS_x} = r \frac{\sqrt{SS_y}}{\sqrt{SS_x}} = r \frac{s_y}{s_x}$
$b_1' = r \frac{s_x}{s_y} = \left( b_1 \frac{s_x}{s_y} \right) \frac{s_x}{s_y} = b_1 \frac{s_x^2}{s_y^2}$

(i) (1 point) Under what conditions would the two models have the same slope?

Solution: If $s_x^2 = s_y^2$

(j) (2 points) In class I said that you cannot simply invert your original regression line when making inverse predictions, except in a special case. Under what (minimal) conditions would it be acceptable to do this?

Solution: When $r = \pm 1$ (that is, $r^2 = 1$), then $b_1' = \frac{1}{b_1}$

(k) (2 points) Give a 95% CI for the true intercept $\beta_0$ for this model.

Solution: 95% CI for $\beta_0$: $b_0' \pm t_{24,\,0.975}\, s\{b_0'\}$, where $b_0'$ is the intercept of the inverted model
$-0.8155 \pm (2.064)(3.4293)$
$-0.8155 \pm 7.0781$
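
The numerical answers to Problem 3 can be re-checked from the printed R output alone. The following is a minimal Python sketch of that arithmetic, not part of the original exam. The function name check_juice_output and the dictionary keys are hypothetical, the critical values 1.711 and 2.064 are assumed to be read from a t-table at 24 df, and no model is refit because the juice data are not reproduced here. The last line uses $b_1 b_1' = r^2$, which follows from $b_1 = r s_y / s_x$ and $b_1' = r s_x / s_y$.

```python
import math


def check_juice_output():
    """Recompute Problem 3 quantities from the printed R output (no refit)."""
    # Numbers copied from the anova(fit) and summary(fit) output.
    sse, mse, f_value = 54.278, 2.262, 16.696
    b1, se_b1, se_b0 = 0.4685, 0.1147, 1.5308
    y_bar, x_bar = 15.89231, 13.10000
    df_resid = 24
    n = df_resid + 2                 # SLR estimates two coefficients

    # Part (a): the blanked-out entries.
    ssr = f_value * mse              # (A); equals (B) because regression df = 1
    b0 = y_bar - b1 * x_bar          # (D)
    t_b0 = b0 / se_b0                # (E)
    resid_se = math.sqrt(mse)        # (G)
    r2 = ssr / (ssr + sse)           # (H)

    # Assumed t-table critical values for 24 df (not computed here).
    t_95, t_975 = 1.711, 2.064

    ci90_half = t_95 * se_b1         # part (b): half-width of the 90% CI
    t_star = (b1 - 0.6) / se_b1      # part (c): test statistic for H0: beta1 = 0.6

    # Part (d): prediction interval half-width at X_h = 12.
    ss_x = mse / se_b1 ** 2          # from s^2{b1} = MSE / SS_x
    x_h = 12
    s2_pred = mse * (1 + 1 / n + (x_h - x_bar) ** 2 / ss_x)
    pi95_half = t_975 * math.sqrt(s2_pred)

    # Part (h): slope of the inverted model via b1 * b1' = r^2.
    b1_inv = r2 / b1

    return {
        "n": n, "ssr": ssr, "b0": b0, "t_b0": t_b0, "resid_se": resid_se,
        "r2": r2, "ci90_half": ci90_half, "t_star": t_star,
        "pi95_half": pi95_half, "b1_inv": b1_inv,
    }


if __name__ == "__main__":
    for name, value in check_juice_output().items():
        print(f"{name:>10s} = {value:.4f}")
```

Because the inputs are the rounded values printed by R, the results may differ from the solutions in the last decimal place or two.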