Stat 330 (Spring 2015): Homework 12
Due: May 1, 2015

Show all of your work, and please staple your assignment if you use more than one sheet. Write your name, the course number and the section on every sheet. Show all work to earn partial credit. Problems marked with * will be graded and one additional randomly chosen problem will be graded.

1. * A student interested in purchasing a multifunction inkjet printer wanted to study the relationship between the retail price (in dollars) and the printing speed (pages per minute) of a typical personal printer. He obtained data for 20 inkjet printers from a current PC magazine and calculated the following statistics:

$$n = 20, \quad \sum x_i^2 = 167.51, \quad \sum x_i = 56.3, \quad \sum y_i^2 = 658528, \quad \sum y_i = 3232.0, \quad \sum x_i y_i = 9918.3$$

(a) Estimate the linear regression equation predicting the price of an inkjet printer, $y$, based on the printing speed, $x$.
(b) Predict the price in dollars for a printer with a speed of 3.5 pages a minute.
(c) Calculate the sample correlation between $x$ and $y$ and the coefficient of determination, $R^2$. What does the $R^2$ value tell you about the fit of your line?
(d) Provide an estimate of the error variance $\sigma^2$.

Answer: We first compute $S_{xx}$, $S_{xy}$ and $S_{yy}$:

$$
\begin{aligned}
S_{xx} &= \sum_{i=1}^{n} x_i^2 - \Big(\sum_{i=1}^{n} x_i\Big)^2 / n = 167.51 - 56.3^2/20 = 9.0255 \\
S_{xy} &= \sum_{i=1}^{n} x_i y_i - \Big(\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i\Big) / n = 9918.3 - (56.3)(3232)/20 = 820.22 \\
S_{yy} &= \sum_{i=1}^{n} y_i^2 - \Big(\sum_{i=1}^{n} y_i\Big)^2 / n = 658525 - 3232^2/20 = 136233.8
\end{aligned}
$$

(a) The regression equation is $E(Y_i) = \beta_0 + \beta_1 x_i$, where $Y_i$ is the price for printer $i$ and $x_i$ is the speed of the printer. We estimate $\beta_0$ and $\beta_1$ using the following formulas:

$$
\begin{aligned}
\hat{\beta}_1 &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}} = \frac{820.22}{9.0255} = 90.8780677 \\
\hat{\beta}_0 &= \bar{y} - \hat{\beta}_1 \bar{x} = 3232/20 - 90.8780677 \times (56.3/20) = -94.22176057
\end{aligned}
$$

So the prediction equation is $\hat{y}_i = -94.2218 + 90.8781\, x_i$.
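The computation in (a) can be checked numerically. The following is a minimal sketch in Python, not part of the original solution; the function name `regression_from_sums` and the variable names are illustrative assumptions. It uses the value 658525 for the sum of squared prices, since that is the value the worked computation of Syy uses.

```python
# Summary statistics for the 20 printers.
# Assumption: sum of y^2 is taken as 658525, the value used in the
# worked computation of S_yy (the problem statement lists 658528).
n = 20
sum_x, sum_x2 = 56.3, 167.51
sum_y, sum_y2 = 3232.0, 658525.0
sum_xy = 9918.3


def regression_from_sums(n, sum_x, sum_x2, sum_y, sum_y2, sum_xy):
    """Return (S_xx, S_xy, S_yy, slope, intercept) computed from raw sums."""
    s_xx = sum_x2 - sum_x ** 2 / n        # corrected sum of squares of x
    s_xy = sum_xy - sum_x * sum_y / n     # corrected sum of cross products
    s_yy = sum_y2 - sum_y ** 2 / n        # corrected sum of squares of y
    slope = s_xy / s_xx                   # least squares estimate of beta_1
    intercept = sum_y / n - slope * sum_x / n  # estimate of beta_0
    return s_xx, s_xy, s_yy, slope, intercept


s_xx, s_xy, s_yy, slope, intercept = regression_from_sums(
    n, sum_x, sum_x2, sum_y, sum_y2, sum_xy
)
print(f"S_xx = {s_xx:.4f}, S_xy = {s_xy:.2f}, S_yy = {s_yy:.1f}")
print(f"y_hat = {intercept:.4f} + {slope:.4f} x")
```

Working from the raw sums mirrors the hand calculation, so each intermediate quantity can be compared directly with the values above.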
(b) To predict the price in dollars for a printer with a speed of 3.5 pages a minute, plug the speed $x_i = 3.5$ into the prediction equation:

$$\hat{y} = -94.2218 + 90.8781 \times 3.5 = 223.85 \text{ dollars.}$$

(c) We can estimate the correlation $r$ and $R^2$ using the following formulas:

$$
\begin{aligned}
r &= \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{820.22}{\sqrt{9.0255}\,\sqrt{136233.8}} = 0.74 \\
R^2 &= r^2 = 0.55
\end{aligned}
$$

The linear model explains only 55% of the variability in the price of inkjet printers, so this is only a weak to moderate fit to the data.

(d) An estimate of the error variance $\sigma^2$ is given by

$$\hat{\sigma}^2 = \frac{1}{n-2} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \frac{SSE}{n-2}.$$

So we first need to find $SSE = SST - SSR$. The regression sum of squares is

$$SSR = \hat{\beta}_1 \times S_{xy} = 90.8780677 \times 820.22 = 74540.0$$

and $SST = S_{yy} = 136233.8$. Thus $SSE = 136233.8 - 74540.0 = 61693.8$, giving

$$\hat{\sigma}^2 = 61693.8/(20 - 2) = 3427.43.$$

2. * (Baron's book): 9.10

Answer:

(c) The 90% confidence interval for $\sigma$ is

$$\left[ \sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2}}},\ \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}} \right] = \left[ \sqrt{\frac{(2)(400)}{5.99}},\ \sqrt{\frac{(2)(400)}{0.10}} \right] = [11.6,\ 89.4] \text{ (thousand dollars)}$$

9.10 (a) Find $\hat{p} = 24/200 = 0.12$. Then for $\alpha = 1 - 0.96 = 0.04$, find $z_{\alpha/2} = z_{0.02} = 2.054$ (the easiest way is to use Table A5 with $\infty$ degrees of freedom).

$$\hat{p} \pm z_{0.02} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.12 \pm (2.054) \sqrt{\frac{0.12(1-0.12)}{200}} = 0.12 \pm 0.047 \text{ or } [0.073,\ 0.167]$$

(b) Test $H_0: p \le 0.1$ (or $H_0: p = 0.1$) vs $H_A: p > 0.1$. Disproving the manufacturer's claim means rejecting $H_0$ in favor of this $H_A$. This is a one-sided test, therefore our two-sided confidence interval in (a) cannot be used to conduct this test. The observed test statistic is

$$Z = \frac{\hat{p} - p_0}{\sqrt{\hat{p}(1-\hat{p})/n}} = \frac{0.12 - 0.1}{\sqrt{0.12(1-0.12)/200}} = 0.8704.$$

In order to consider different significance levels, let us compute the P-value,

$$P = P\{Z > 0.8704\} = 1 - \Phi(0.8704) = 1 - 0.8078 = 0.1922,$$

from Table A4. The P-value exceeds both 0.04 and 0.15.
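The interval and the test for Exercise 9.10 can be reproduced with Python's standard library. This is a rough sketch rather than the textbook's method of table lookup; the function names `proportion_ci` and `right_tail_test` are illustrative assumptions. As in the worked solution, the standard error in the test uses the sample proportion.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution, mean 0 and sd 1


def proportion_ci(successes, n, confidence):
    """Two-sided large-sample confidence interval for a proportion."""
    p_hat = successes / n
    z = std_normal.inv_cdf(1 - (1 - confidence) / 2)  # critical value z_{alpha/2}
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def right_tail_test(successes, n, p0):
    """Z statistic and P-value for H0: p <= p0 vs HA: p > p0.

    Assumption: the standard error is based on the sample proportion,
    following the worked solution above.
    """
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p_hat * (1 - p_hat) / n)
    return z, 1 - std_normal.cdf(z)


low, high = proportion_ci(24, 200, 0.96)
z, p_value = right_tail_test(24, 200, 0.1)
print(f"96% CI: [{low:.3f}, {high:.3f}]")
print(f"Z = {z:.4f}, P-value = {p_value:.4f}")
```

Because the exact normal CDF is evaluated at 0.8704 instead of the tabulated value at 0.87, the computed P-value differs slightly from 0.1922, but it is still above both 0.04 and 0.15.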
Therefore, we do not have significant evidence, at the mentioned levels, to disprove the manufacturer's claim.

9.11 Test $H_0: p_1 = p_2$ vs $H_A: p_1 > p_2$. Higher quality means lower proportion of defective items.

Given $\hat{p}_1 = 0.12$ from a sample of size $n = 200$ and $\hat{p}_2 = 13/150 = 0.0867$ from a sample of size $m = 150$, we compute the pooled proportion

$$\hat{p}(\text{pooled}) = \frac{n\hat{p}_1 + m\hat{p}_2}{n + m} = \frac{24 + 13}{200 + 150} = 0.1057.$$

Then, the test statistic is

$$Z = \frac{0.12 - 0.0867}{\sqrt{(0.1057)(1 - 0.1057)\left(\frac{1}{200} + \frac{1}{150}\right)}} = 1.0027.$$

Finally, we compute the P-value

$$P = P\{Z > 1.0027\} = 1 - 0.8413 = 0.1587$$

(Table A4). It is rather large, and we conclude that there is no significant evidence that the quality of items produced by the new supplier is higher than the quality of items in Exercise 9.10.

3. * (Baron's book): 9.16

Answer:

Then the test statistic is

$$Z = \frac{0.6 - 0.59}{\sqrt{(0.5941)(1 - 0.5941)\left(\frac{1}{70} + \frac{1}{100}\right)}} = 0.1307.$$

The P-value equals

$$P = 2P\{Z > |0.1307|\} = 2(1 - 0.5517) = 0.8966$$

(from Table A4). This is a very high P-value, thus there is no significant difference between the support of the candidate in the two towns.

9.16 Here $n_1 = 250$, $n_2 = 300$, $\hat{p}_1 = 10/250 = 0.04$, and $\hat{p}_2 = 18/300 = 0.06$.

(a) A 98% confidence interval for $p_1 - p_2$ is

$$
\begin{aligned}
\hat{p}_1 - \hat{p}_2 \pm z_{0.02/2} \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}
&= 0.04 - 0.06 \pm 2.326 \sqrt{\frac{(0.04)(0.96)}{250} + \frac{(0.06)(0.94)}{300}} \\
&= -0.02 \pm 0.043 \text{ or } [-0.063,\ 0.023]
\end{aligned}
$$

(b) The null hypothesis $H_0: p_1 = p_2$ (that is, $p_1 - p_2 = 0$) is not rejected against the two-sided alternative $H_A: p_1 \ne p_2$ at the 2% level because the 98% confidence interval for $p_1 - p_2$ contains 0. No, there is no significant difference between the quality of the two lots.

9.17 For $\hat{p}_1 = 45\%$ support of candidate A, the margin of error is

$$z_{0.025} \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n}} = 1.96 \sqrt{\frac{(0.45)(0.55)}{900}} = 0.0325 \text{ or } 3.25\%$$

For $\hat{p}_2 = 35\%$ support of candidate B, the margin of error is