Stat 231 Exam III December 5, 1997 Prof. Vardeman Attached to this exam are 5 pages of Minitab regression analysis printout. Use them in answering the following questions. You need NOT compute by hand anything that you can get from the printout. (Indeed, you will be wise to avoid wasting time doing hand calculations for things that are obtainable from the printout.) The data used in creating the printout come from a study of a nitride etch process on a single wafer plasma etcher. The process variables studied were B" œ power applied to the cathode (W) B# œ pressure in the reaction chamber (mTorr) B$ œ gap between the anode and the cathode (cm) B% œ flow of the reactant gas C# F' and the response variable was C œ selectivity of the process (SiN/polysilicon) . The first 2 "# pages of the printout concern a simple linear regression analysis based on the model C3 œ "! "$ B$3 %3 . Until further notice, give answers based on (or related to) this SLR model. (a) What fraction of the raw variability in C is explained using B$ as a predictor variable? (b) What is the sample correlation between C and B$ ? (Give a number.) (c) There are two plots on page 2 of the printout. These are plots of standardized residuals versus sC and standardized residuals versus B$ . What difficulty with the simple linear regression model do they reveal? Explain. For purposes of answering the questions (d)-(h) ignore the difficulty discovered in part (c). (d) What is indicated by the plot on the top of page 3 on the printout? -1- (e) If one uses ! œ Þ!&, will one reject H! :.ClB$ œ "! "$ B$ based on a formal lack of fit test? Explain. Reject? yes Explanation: no (circle one) (f) Give a 90% two-sided confidence interval for the increase in mean value of selectivity (C) that accompanies a 1 cm increase in the gap between the anode and the cathode. (No need to simplify after plugging in.) (g) Give a 95% two-sided prediction interval for the next selectivity (C) that accompanies a .8 cm gap (B$ ). (h) As it turns out, the data have B$ œ Þ*'$' and =#B$ œ Þ!$!&%&. Use these facts and find a 95% confidence interval for the mean selectivity that accompanies a .9 cm gap. (No need to simplify.) Beginning in the middle of page 3 of the printout, there is a multiple linear regression analysis of the data. Note that there are both an "all possible subsets" regression and a regression of C on B" ß B# ß B$ and B% (and some plots and some sample correlations). Use these to answer questions (i) through (r). (i) Based on the "all possible subsets regression" output, what reduced model seems like one that should be investigated as a possible "simple" explanation of C? Explain in terms of V # ß = and G: . -2- Model: __________________ Explanation: (j) What degrees of freedom would be used in a formal test for lack of fit to the multiple linear regression model including all 4 predictors B" ß B# ß B$ and B% ? numerator .0 œ _____ denominator .0 œ _____ (k) Give the value of an F statistic, its degrees of freedom and the corresponding :-value for testing whether the variables B" ß B# ß B$ and B% together provide any ability to predict or explain C. 0 œ __________ .0 œ _____ , _____ :-value œ __________ (l) Find the value of an F statistic and its degrees of freedom for testing whether after taking B" and B$ into account, the variables B# and B% provide significant additional ability to explain or predict C. (This is not an easy question, but there is enough information on the printout to allow you to find this 0 .) 0 œ __________ .0 œ _____ , _____ (m) Find the value of a > statistic, its degrees of freedom and the corresponding :-value for testing whether after taking B" ß B# ß and B$ into account, B% adds important explanatory power to the multiple regression model. > œ __________ .0 œ __________ :-value œ __________ (n) Give a 97.5% lower prediction bound for the next value of C when B" œ $#&ß B# œ &!!ß B$ œ Þ) and B% œ #!!. -3- (o) Comment on the appearance of the plot of /‡ versus sC given at the bottom of page 4 of the printout. (p) The first plot on page 5 of the printout is a plot of C versus sC . It appears to be fairly linear. What sample correlation should be associated with this plot? (Give a number.) (q) Based on the MLR model involving all 4 B's, give a 90% two-sided confidence interval for increase in mean C that accompanies a 1 cm increase in gap (B$ ) if B" ß B# and B% are held fixed. (There is no need to simplify after plugging in.) (r) Notice that from the first (SLR) regression run ,$ œ "Þ!%&) while from the last (MLR) regression run ,$ œ "Þ!%"#. This difference is not simply numerical error in Minitab. Explain why you are not surprised that there is a difference in these two figures. (What do the correlations given on page 5 of the printout have to say about this?) -4- MTB > print c1-c5 Data Display Row y x1 x2 x3 x4 1 2 3 4 5 6 7 8 9 10 11 1.63 1.37 1.10 1.58 1.26 1.72 1.65 1.42 1.69 1.54 1.72 275 275 275 300 300 275 300 325 325 325 275 450 500 550 450 500 450 550 450 500 550 450 0.8 1.0 1.2 1.0 1.2 0.8 0.8 1.2 0.8 1.0 0.8 125 160 200 200 125 125 160 160 200 125 125 MTB> Name c6 = 'FITS1' c7 = 'SRES1' c8 MTB > Regress 'y' 1 'x3'; SUBC> Fits' FITS1'; SUBC> SResiduals 'SRES1'; SUBC> Constant; SUBC> Pure; SUBC> Predict 'x3'; SUBC> PFits 'PFIT1'. = 'PF1T1' Regression Analysis The regression equation is y = 2.52 - 1.05 x3 Predictor Constant x3 S = Coef 2.5242 -1.0458 0.09670 R-Sq StDev 0.1711 0.1750 = 79.9% T P 14.75 -5.98 0.000 0.000 R-Sq(adj) = 77.6% Analysis of Variance Source Regression Error Total OF 1 9 10 SS 0.33410 0.08416 0.41825 Unusual Observations Obs x3 y 3 1.20 1.1000 MS 0.33410 0.00935 F P 35.73 0.000 Fit StDev Fit 1.2692 0.0506 Residual -0.1692 St Resid -2.05R R denotes an observation with a large standardized residual Fit StDev Fit 1.6875 0.0409 1.4783 0.0298 1.2692 0.0506 1.4783 0.0298 1.2692 0.0506 1.6875 0.0409 1.6875 0.0409 1.2692 0.0506 1.6875 0.0409 1.4783 0.0298 1.6875 0.0409 Pure error test - F 95.0% CI 1.5950, 1.7800) 1.4108, 1.5459) 1.1547, 1.3837) 1.4108, 1.5459) 1.1547, 1.3837) 1.5950, 1.7800) 1.5950, 1.7800) 1.1547, 1.3837) 1.5950, 1.7800) 1.4108, 1.5459) 1.5950, 1.7800) = 0.14 P = 95.0% PI 1.4500, 1.9250) 1.2493, 1.7073) 1.0222, 1.5161) 1.2493, 1.7073) 1.0222, 1.5161) 1.4500, 1.9250) 1.4500, 1.9250) 1.0222, 1.5161) 1.4500, 1.9250) 1.2493, 1.7073) 1.4500, 1.9250) 0.7214 DF(pure error) -1- = 8 MTB > Plot 'SRES1' 'FITS1'; SUBC> Symbol '*'. Character Plot SRES1 1.5+ * * * 0.0+ 2 * * * * * -1.5+ * ----+---------+---------+---------+---------+---------+--FITS1 1.2BO 1.360 1.440 1.520 1.600 1.6BO MTB > Plot 'SRES1' 'x3'; SUBC> Symbo I '*, . Character Plot SRES1 * 1.5+ * 2 0.0+ * * * * * -1.5+ * * ----+---------+---------+---------+---------+---------+--x3 O.BOO O.BBO 0.960 1.040 1.120 1.200 MTB> SUBC> MTB > DATA> DATA> MTB > MTB > MTB > SUBC> MTB > SUBC> SUBC> SUBC> Sort 'SRES1' cB; By 'SRES1'. Set c9 1( 1 : 11 / 1 )1 End. sub .5 c9 c9 div c9 11.0 c9 InvCDF C9 C9; Normal 0.0 1.0. Plot CB C9; Symbo l '* ,; XLabel 'standardized residual quantile'; YLabel 'std normal quantile'. -2- Character Plot s t d * 1.5+ n * o r m a l 0.0+ * * * * * * * q * u -1.5+ a n t i * --------+---------+---------+---------+---------+--------1.40 -0.70 0.00 0.70 1.40 standardized residual quantile MTB > BReg 'y' 'x1' 'x2' 'x3' 'x4' ; SUBC> NVars 1 4; SUBC> Best 4. Best Subsets Regression Response is y Vars R-Sq R-Sq (adj) 1 1 1 1 2 2 2 2 3 3 3 3 4 79.9 17.5 6.7 0.6 87.6 85.3 80.4 20.5 95.7 89.1 85.4 24.8 96.4 77.6 26.6 0.096700 8.3 130.7 0.19583 0.0 148.7 0.20825 0.0 158.8 0.21489 84.5 15.7 0.080524 81.6 19.6 0.087782 75.5 27.7 0.10123 0.7 127.6 0.20384 93.9 4.1 0.050394 84.4 15.2 0.080755 79.1 21.4 0.093509 0.0 122.5 0.21197 94.0 5.0 0.050069 Cop S x x x x 1 234 X X X X X X X X X X X X X X X X X X X X X X X X X X X X MTB > Name c10 = 'FITS2' c11 = 'SRES2' MTB > Regress 'y' 4 'x1' 'x2' 'x3' 'x4'; SUBC> Fits 'FITS2'; SUBC> SResiduals 'SRES2'; SUBC> Constant; SUBC> Predict 'x1' 'x2' 'x3' 'x4'. Regression Analysis The regression equation is = 2.29 + 0.00327 x1 - 0.00133 x2 - 1.04 x3 -0.000532 x4 y -3- StOev 0.2544 0.0007621 0.0003811 0.09526 0.0005094 Predictor Coef 2.2896 Constant 0.0032704 x1 -0.0013315 x2 -1.04120 x3 -0.0005321 x4 S = 0.05007 = R-Sq 96.4% T 9.00 4.29 -3.49 -10.93 -1.04 R-Sq(adj) = P 0.000 0.005 0.013 0.000 0.337 94.0% Analysis of Variance Source Regression Error Total OF 4 6 10 SS 0.40321 0.01504 0.41825 Source x1 x2 x3 x4 OF 1 1 1 1 Seq SS 0.00267 0.08305 0.31476 0.00273 Fit StOev Fit 1.6903 0.0276 1.3968 0.0232 1.1007 0.0433 1.5239 0.0333 1.2890 0.0328 1.6903 0.0276 1.6203 0.0330 1.4187 0.0391 1.7473 0.0393 1.5124 0.0388 1.6903 0.0276 MS 0.10080 0.00251 95.0% CI 1.6228, 1.7578) 1.3401, 1.4536) 0.9949, 1.2066) 1.4424, 1.6054) 1.2088, 1.3692) 1.6228, 1.7578) 1.5396, 1.7010) 1.3230, 1.5144) 1.6511, 1.8435) 1.4175, 1.6074) 1.6228, 1.7578) F 40.21 P 0.000 95.0% PI 1.5504, 1.8302) 1.2618, 1.5319) 0.9388, 1.2627) 1.3767, 1.6710) 1.1425, 1.4355) 1.5504, 1.8302) 1.4735, 1.7670) 1.2632, 1.5742) 1.5915, 1.9031) 1.3574, 1.6674) 1.5504, 1.8302) MTB > Plot 'SRES2' 'FITS2'; SUBC> Symbol '*'; SUBC> XLabel 'yhat'; SUBC> YLabel 'standardized residual'. Character Plot s t a * 1.2+ n d * * a r d 0.0+ * i z e * * d 2 * -1.2+ r * e s i * +---------+---------+---------+---------+---------+-----1.08 1.20 1.32 1.44 1.68 1.56 yhat MTB > Plot 'y' 'FITS2'; SUBC> Symbol '*'; -4- SUBC> SUBC> Xlabel 'yhat'; Ylabel 'y'. Character Plot 2 1.60+ * y * * * * * 1.40+ * * 1.20+ * +---------+---------+---------+---------+---------+-----1.56 1.44 1.08 1.20 1.32 1.68 yhat MTB > Correlation 'x1' 'x2' 'x3' 'x4'. Correlations (Pearson) x2 x3 x4 x1 0.214 0.214 0.210 x2 x3 0.214 0.210 0.210 -5- Key I December 1 I i Prof. Vardeman The data used in creating the printout come from a study of a nitride etch process etcher. The process variables studied were :rl = power applied to the cathode (W) %2 = pressure in the reaction chamber (mTorr) %5 = gap between the anode and the cathode (em) x .• = flow of the reactant gas c,Fs and the response variable was i i The first I (e) If one uses 2l pages of the printout concern a simple linear regression Until further notice, give (a) What fraction 8'lSWerS Q = .05. will one reject Ro:!'vl" = f30 + J3:J:r,based on a formal lack or fit test? Explain. Reject? . yes ~ (circle one) Explanation: --\IA.t. f-I/~I,t..(. ~~~ l 'F<\e. ( analysis based on the model is in y is explained Xl (Ast- pluggingin.) as a predictor ,72. ,"I.;,,~ is "~r\ (Mle (f) Give a 90% two-sided confidence interval for the increase in mean value of selectivity (y) that accompanies a 1 em increase in the gap between the anode and the cathode. (No need to simplify after • using A 41- +"'S+ 1(., Iu~ !A~..t1- .6 3 based on (or related to) this SLR model. of the raw variability • -th 1-4" . OS) '11 is Yi=f30+f3J:r"+'i ~ on a single wafer plasma = selectivity of the process (SiN/polysilicon) y J Stat 231 Exam III - Attached to this exam are S pages of Mini tab regression analysis printout. Use them in answering the following questions. You need NOT compute by hand anything that you can get from the printout. (Indeed, you will be wise to avoid wasting time doing hand calculations for things that are obtainable from the printout.) I j :=.. 5. 1997 -t :t (~<J:It>f §fJ..).rl bs ') variable? j ',j (g) Give (:r,). I (b) What is the sample correlation h?, M1-~ -J{..t 'j I 1:>3 .~ .....- So between ,= - J y and X3? (Give a number.) '1~'hV~ - is l'v.~ ~ '! -;-.~ :ct I'I~ tN-M\ j(fl'Uro:. (h) As it turns out, the data have X3 = .9636 and s~.= .030545. Use these facts and find a 95% confidence interval for~he mean s~lectivity that accompanies a.9 em gap. (No need to simplify.) ~(:c.;i-.x:~) = ~-\)S~ =- 10(.030545')'" .305+5 For purposes of answering ~t lit Y..M'Ab·'\'I~ 11k .-x- ~~ ~ t>~{\f 1'h 'lID\ot~. It. ~~4- .79'3'> (c) There are two plots on page 2 oCthe printout These are plots of standardized standardized residuals versus X3. What difficulty with the simple linear regression Explain. vs- a 95% two-sided prediction interval for the next selectivity (y) that accompanies a .8 em gap SL-R. 1I\CN\ the questions 'S, I" Y-"Sr"'''s:e:. -nJ. ""'C;;V."'?l'l~s d~"'s. I\~<;;~t- cr ~ ~p~U'r~ 1D .be:. (d)-(h) ignore the difficulty discovered 1'= V:;2. residuals versusy and model do they reveal? - I.OS(.~) .t (-;'.?2-I.O;(.~)) Beginning in the middle oCpage 3 of the printout, there is a multiple linear regression analysis of the data. Note that there are both an "all possible subsets" regression and a regression ofy on X1.%2,%S and x" (and some plots and some sample correlations). Use these to answer questions (i) through (r). in part (c). ·1· ~ (i) Based on the "all possible subsets regression" be investigated as a possible "simple" explanation ·2· (n) Give a 97.5% lower prediction =200. output. what reduced model seems like one that should of1/? Explain in terms of R',6 and 011' bound for the next value or y when = 325, Xl %2 = 500,X3 = .8 and %~ t. 5>,,/5 (0) Comment on the appearance of the plot of e" versusy ~~ i6t (fv.p~~) ;.Jl. '\'k.•.f~ie.."t.e~ SlAf«.,' (.f,..O numerator ~ df = -±.... denominator df = I~ 2 (k) Give the value of an F statistic, its degrees of freedom and the corresponding p-value for testing whether the variables %11%'1 %s and together provide any ability to predict or explain y. %" f= "3. 40.2./ p-value= of page. 4 of the printout. ~ ~ 1'''''~~S: VI~ filL.~ 1M£..(' ~ . (P) The first plot on page 5 of the printout is a plot ofy versus y. It appears sample correlation should be associated with this plot? (Give a number.) to be fairly linear. What ..0~J:t1M~a~)= +~ . dl=~.~ given at the bottom I".~~I;(u t'("rt«,ttss: .. . .. .000 (1) Find the value of an F statistic and its degrees of freedom for testing whether after taking %1 and Xl into account, the variables X2 and provide significant additional ability to explain or predict y. (This is not an easy question, but there is enough information on the printout to allo, you to :find this f.) x" (q) Based on the MLR. model involving all 4 %'S, give a 90% two-sided confidence interval for increase in mean y that accompanies a 1 cm increase in gap (X3) if XII X, and Xf are held fixed. (There is no need to snnPIi1Yafter plugging in.) = r;",(X,,);<;3 '). SST"or = . 676>\·tI82.<;;)= .3""3;;' _ ~SR(&\\)- ssg(,..~.:R.)2/(~-e) (.4032\ - 3"6>39)/2FS'S&(.(l,.I\)/(r\-~-I) .01'30+/'=SSR (-:CD -;,:~) lA~ = b:!,:t i: (d:'£. sle('eRe{"f b,3 ) - (.04/'2. ± {,!:>43(."9:;;26 ) (r) Notice that from the first (SLR) regression run b3 = - 1.0458 while from the last (MLR) regression run b3 f= /.?h =- 1.0412. This difference is not simply numerical error in Minitab. printout .5\s (m) Find the value of a t statistic, its degrees of freedom and the corresponding ;;.-;..- whether after taking Xl,X2, and Xs into account, adds important explanatory regression model. %" p-value for testing power to the multiple have to say about this?) ~ t~((k~'" i". $6'Ml. ~(-= G...iNul\. ~ ""\A.rti?-ol\'·~ri~ i~i6b(o; (),.'rc: l f> 's d..r-c ,£.y.~.,.,e Th ?~..e. ~'~ -t.v..l'\~ MI. ''v-w~ I'" ,;. w..."e;. ( • t= -{.ot Explain why you arc not surprised that there is a difference in these two figures. (What do the correlations given on page 5 of the df = __ b__ ·3· p-value =~ -4. I\.4'X i~ <f'W:. :cero I $0 ?Ybb~. "sn~s. "f<S'V\. "k-t