Stat 231 Exam III December 5, 1997 Prof. Vardeman

advertisement
Stat 231 Exam III
December 5, 1997
Prof. Vardeman
Attached to this exam are 5 pages of Minitab regression analysis printout. Use them in answering the
following questions. You need NOT compute by hand anything that you can get from the printout.
(Indeed, you will be wise to avoid wasting time doing hand calculations for things that are obtainable
from the printout.)
The data used in creating the printout come from a study of a nitride etch process on a single wafer
plasma etcher. The process variables studied were
B" œ power applied to the cathode (W)
B# œ pressure in the reaction chamber (mTorr)
B$ œ gap between the anode and the cathode (cm)
B% œ flow of the reactant gas C# F'
and the response variable was
C œ selectivity of the process (SiN/polysilicon) .
The first 2 "# pages of the printout concern a simple linear regression analysis based on the model
C3 œ "!  "$ B$3  %3 .
Until further notice, give answers based on (or related to) this SLR model.
(a) What fraction of the raw variability in C is explained using B$ as a predictor variable?
(b) What is the sample correlation between C and B$ ? (Give a number.)
(c) There are two plots on page 2 of the printout. These are plots of standardized residuals versus sC and
standardized residuals versus B$ . What difficulty with the simple linear regression model do they reveal?
Explain.
For purposes of answering the questions (d)-(h) ignore the difficulty discovered in part (c).
(d) What is indicated by the plot on the top of page 3 on the printout?
-1-
(e) If one uses ! œ Þ!&, will one reject H! :.ClB$ œ "!  "$ B$ based on a formal lack of fit test? Explain.
Reject? yes
Explanation:
no
(circle one)
(f) Give a 90% two-sided confidence interval for the increase in mean value of selectivity (C) that
accompanies a 1 cm increase in the gap between the anode and the cathode. (No need to simplify after
plugging in.)
(g) Give a 95% two-sided prediction interval for the next selectivity (C) that accompanies a .8 cm gap
(B$ ).
(h) As it turns out, the data have B$ œ Þ*'$' and =#B$ œ Þ!$!&%&. Use these facts and find a 95%
confidence interval for the mean selectivity that accompanies a .9 cm gap. (No need to simplify.)
Beginning in the middle of page 3 of the printout, there is a multiple linear regression analysis of the data.
Note that there are both an "all possible subsets" regression and a regression of C on B" ß B# ß B$ and B%
(and some plots and some sample correlations). Use these to answer questions (i) through (r).
(i) Based on the "all possible subsets regression" output, what reduced model seems like one that should
be investigated as a possible "simple" explanation of C? Explain in terms of V # ß = and G: .
-2-
Model: __________________
Explanation:
(j) What degrees of freedom would be used in a formal test for lack of fit to the multiple linear regression
model including all 4 predictors B" ß B# ß B$ and B% ?
numerator .0 œ _____
denominator .0 œ _____
(k) Give the value of an F statistic, its degrees of freedom and the corresponding :-value for testing
whether the variables B" ß B# ß B$ and B% together provide any ability to predict or explain C.
0 œ __________
.0 œ _____ , _____ :-value œ __________
(l) Find the value of an F statistic and its degrees of freedom for testing whether after taking B" and B$
into account, the variables B# and B% provide significant additional ability to explain or predict C. (This is
not an easy question, but there is enough information on the printout to allow you to find this 0 .)
0 œ __________
.0 œ _____ , _____
(m) Find the value of a > statistic, its degrees of freedom and the corresponding :-value for testing
whether after taking B" ß B# ß and B$ into account, B% adds important explanatory power to the multiple
regression model.
> œ __________
.0 œ __________
:-value œ __________
(n) Give a 97.5% lower prediction bound for the next value of C when B" œ $#&ß B# œ &!!ß B$ œ Þ) and
B% œ #!!.
-3-
(o) Comment on the appearance of the plot of /‡ versus sC given at the bottom of page 4 of the printout.
(p) The first plot on page 5 of the printout is a plot of C versus sC . It appears to be fairly linear. What
sample correlation should be associated with this plot? (Give a number.)
(q) Based on the MLR model involving all 4 B's, give a 90% two-sided confidence interval for increase
in mean C that accompanies a 1 cm increase in gap (B$ ) if B" ß B# and B% are held fixed. (There is no need
to simplify after plugging in.)
(r) Notice that from the first (SLR) regression run ,$ œ  "Þ!%&) while from the last (MLR) regression
run ,$ œ  "Þ!%"#. This difference is not simply numerical error in Minitab. Explain why you are not
surprised that there is a difference in these two figures. (What do the correlations given on page 5 of the
printout have to say about this?)
-4-
MTB > print c1-c5
Data Display
Row
y
x1
x2
x3
x4
1
2
3
4
5
6
7
8
9
10
11
1.63
1.37
1.10
1.58
1.26
1.72
1.65
1.42
1.69
1.54
1.72
275
275
275
300
300
275
300
325
325
325
275
450
500
550
450
500
450
550
450
500
550
450
0.8
1.0
1.2
1.0
1.2
0.8
0.8
1.2
0.8
1.0
0.8
125
160
200
200
125
125
160
160
200
125
125
MTB> Name c6 = 'FITS1' c7 = 'SRES1' c8
MTB > Regress 'y' 1 'x3';
SUBC> Fits' FITS1';
SUBC> SResiduals 'SRES1';
SUBC> Constant;
SUBC> Pure;
SUBC> Predict 'x3';
SUBC>
PFits 'PFIT1'.
=
'PF1T1'
Regression Analysis
The regression equation is
y = 2.52 - 1.05 x3
Predictor
Constant
x3
S
=
Coef
2.5242
-1.0458
0.09670
R-Sq
StDev
0.1711
0.1750
=
79.9%
T
P
14.75
-5.98
0.000
0.000
R-Sq(adj)
=
77.6%
Analysis of Variance
Source
Regression
Error
Total
OF
1
9
10
SS
0.33410
0.08416
0.41825
Unusual Observations
Obs
x3
y
3
1.20
1.1000
MS
0.33410
0.00935
F
P
35.73
0.000
Fit StDev Fit
1.2692
0.0506
Residual
-0.1692
St Resid
-2.05R
R denotes an observation with a large standardized residual
Fit StDev Fit
1.6875
0.0409
1.4783
0.0298
1.2692
0.0506
1.4783
0.0298
1.2692
0.0506
1.6875
0.0409
1.6875
0.0409
1.2692
0.0506
1.6875
0.0409
1.4783
0.0298
1.6875
0.0409
Pure error test - F
95.0% CI
1.5950, 1.7800)
1.4108, 1.5459)
1.1547, 1.3837)
1.4108, 1.5459)
1.1547, 1.3837)
1.5950, 1.7800)
1.5950, 1.7800)
1.1547, 1.3837)
1.5950, 1.7800)
1.4108, 1.5459)
1.5950, 1.7800)
=
0.14 P
=
95.0% PI
1.4500, 1.9250)
1.2493, 1.7073)
1.0222, 1.5161)
1.2493, 1.7073)
1.0222, 1.5161)
1.4500, 1.9250)
1.4500, 1.9250)
1.0222, 1.5161)
1.4500, 1.9250)
1.2493, 1.7073)
1.4500, 1.9250)
0.7214 DF(pure error)
-1-
=
8
MTB > Plot 'SRES1' 'FITS1';
SUBC>
Symbol '*'.
Character Plot
SRES1
1.5+
*
*
*
0.0+
2
*
*
*
*
*
-1.5+
*
----+---------+---------+---------+---------+---------+--FITS1
1.2BO
1.360
1.440
1.520
1.600
1.6BO
MTB > Plot 'SRES1' 'x3';
SUBC>
Symbo I '*, .
Character Plot
SRES1
*
1.5+
*
2
0.0+
*
*
*
*
*
-1.5+
*
*
----+---------+---------+---------+---------+---------+--x3
O.BOO
O.BBO
0.960
1.040
1.120
1.200
MTB>
SUBC>
MTB >
DATA>
DATA>
MTB >
MTB >
MTB >
SUBC>
MTB >
SUBC>
SUBC>
SUBC>
Sort 'SRES1' cB;
By 'SRES1'.
Set c9
1( 1 : 11 / 1 )1
End.
sub .5 c9 c9
div c9 11.0 c9
InvCDF C9 C9;
Normal 0.0 1.0.
Plot CB C9;
Symbo l '* ,;
XLabel 'standardized residual quantile';
YLabel 'std normal quantile'.
-2-
Character Plot
s
t
d
*
1.5+
n
*
o
r
m
a
l
0.0+
* *
*
*
*
*
*
q
*
u
-1.5+
a
n
t
i
*
--------+---------+---------+---------+---------+--------1.40
-0.70
0.00
0.70
1.40
standardized residual quantile
MTB > BReg 'y' 'x1' 'x2' 'x3' 'x4' ;
SUBC> NVars 1 4;
SUBC> Best 4.
Best Subsets Regression
Response is y
Vars
R-Sq
R-Sq
(adj)
1
1
1
1
2
2
2
2
3
3
3
3
4
79.9
17.5
6.7
0.6
87.6
85.3
80.4
20.5
95.7
89.1
85.4
24.8
96.4
77.6 26.6 0.096700
8.3 130.7 0.19583
0.0 148.7 0.20825
0.0 158.8 0.21489
84.5
15.7 0.080524
81.6
19.6 0.087782
75.5 27.7 0.10123
0.7 127.6 0.20384
93.9
4.1 0.050394
84.4
15.2 0.080755
79.1 21.4 0.093509
0.0 122.5 0.21197
94.0
5.0 0.050069
Cop
S
x x x x
1 234
X
X
X
X
X
X
X
X
X
X
X X
X
X
X X
X
X X
X
X X
X
X
X
X
X
X
MTB > Name c10 = 'FITS2' c11 = 'SRES2'
MTB > Regress 'y' 4 'x1' 'x2' 'x3' 'x4';
SUBC>
Fits 'FITS2';
SUBC> SResiduals 'SRES2';
SUBC> Constant;
SUBC> Predict 'x1' 'x2' 'x3' 'x4'.
Regression Analysis
The regression equation is
= 2.29 + 0.00327 x1 - 0.00133 x2 - 1.04 x3 -0.000532 x4
y
-3-
StOev
0.2544
0.0007621
0.0003811
0.09526
0.0005094
Predictor
Coef
2.2896
Constant
0.0032704
x1
-0.0013315
x2
-1.04120
x3
-0.0005321
x4
S
=
0.05007
=
R-Sq
96.4%
T
9.00
4.29
-3.49
-10.93
-1.04
R-Sq(adj)
=
P
0.000
0.005
0.013
0.000
0.337
94.0%
Analysis of Variance
Source
Regression
Error
Total
OF
4
6
10
SS
0.40321
0.01504
0.41825
Source
x1
x2
x3
x4
OF
1
1
1
1
Seq SS
0.00267
0.08305
0.31476
0.00273
Fit StOev Fit
1.6903
0.0276
1.3968
0.0232
1.1007
0.0433
1.5239
0.0333
1.2890
0.0328
1.6903
0.0276
1.6203
0.0330
1.4187
0.0391
1.7473
0.0393
1.5124
0.0388
1.6903
0.0276
MS
0.10080
0.00251
95.0% CI
1.6228, 1.7578)
1.3401, 1.4536)
0.9949, 1.2066)
1.4424, 1.6054)
1.2088, 1.3692)
1.6228, 1.7578)
1.5396, 1.7010)
1.3230, 1.5144)
1.6511, 1.8435)
1.4175, 1.6074)
1.6228, 1.7578)
F
40.21
P
0.000
95.0% PI
1.5504, 1.8302)
1.2618, 1.5319)
0.9388, 1.2627)
1.3767, 1.6710)
1.1425, 1.4355)
1.5504, 1.8302)
1.4735, 1.7670)
1.2632, 1.5742)
1.5915, 1.9031)
1.3574, 1.6674)
1.5504, 1.8302)
MTB > Plot 'SRES2' 'FITS2';
SUBC> Symbol '*';
SUBC> XLabel 'yhat';
SUBC> YLabel 'standardized residual'.
Character Plot
s
t
a
*
1.2+
n
d
*
*
a
r
d
0.0+
*
i
z
e
*
*
d
2
*
-1.2+
r
*
e
s
i
*
+---------+---------+---------+---------+---------+-----1.08
1.20
1.32
1.44
1.68
1.56
yhat
MTB > Plot 'y' 'FITS2';
SUBC> Symbol '*';
-4-
SUBC>
SUBC>
Xlabel 'yhat';
Ylabel 'y'.
Character Plot
2
1.60+
*
y
*
*
*
*
*
1.40+
*
*
1.20+
*
+---------+---------+---------+---------+---------+-----1.56
1.44
1.08
1.20
1.32
1.68
yhat
MTB >
Correlation 'x1' 'x2' 'x3' 'x4'.
Correlations (Pearson)
x2
x3
x4
x1
0.214
0.214
0.210
x2
x3
0.214
0.210
0.210
-5-
Key
I
December
1
I
i
Prof. Vardeman
The data used in creating the printout come from a study of a nitride etch process
etcher. The process variables studied were
:rl = power applied to the cathode (W)
%2 = pressure in the reaction chamber (mTorr)
%5 = gap between the anode and the cathode (em)
x .• = flow of the reactant gas c,Fs
and the response variable was
i
i
The first
I
(e) If one uses
2l pages
of the printout
concern
a simple linear regression
Until further
notice, give
(a) What fraction
8'lSWerS
Q
= .05. will one reject Ro:!'vl" = f30 + J3:J:r,based on a formal lack or fit test? Explain.
Reject? . yes ~
(circle one)
Explanation:
--\IA.t. f-I/~I,t..(.
~~~
l 'F<\e. (
analysis based on the model
is
in
y
is explained
Xl
(Ast-
pluggingin.)
as a predictor
,72.
,"I.;,,~
is "~r\ (Mle
(f) Give a 90% two-sided confidence interval for the increase in mean value of selectivity (y) that
accompanies a 1 em increase in the gap between the anode and the cathode. (No need to simplify after
•
using
A 41- +"'S+
1(., Iu~
!A~..t1-
.6
3
based on (or related to) this SLR model.
of the raw variability
•
-th
1-4" . OS)
'11 is
Yi=f30+f3J:r"+'i
~
on a single wafer plasma
= selectivity of the process (SiN/polysilicon)
y
J
Stat 231 Exam III
-
Attached to this exam are S pages of Mini tab regression analysis printout. Use them in answering the
following questions.
You need NOT compute by hand anything that you can get from the printout.
(Indeed, you will be wise to avoid wasting time doing hand calculations for things that are obtainable from
the printout.)
I
j
:=..
5. 1997
-t
:t
(~<J:It>f §fJ..).rl bs ')
variable?
j
',j
(g) Give
(:r,).
I
(b) What is the sample correlation
h?,
M1-~ -J{..t
'j
I
1:>3
.~
.....-
So
between
,= - J
y
and
X3?
(Give a number.)
'1~'hV~ -
is
l'v.~ ~
'!
-;-.~
:ct
I'I~
tN-M\
j(fl'Uro:.
(h) As it turns out, the data have X3 = .9636 and s~.= .030545. Use these facts and find a 95%
confidence interval for~he mean s~lectivity that accompanies a.9 em gap. (No need to simplify.)
~(:c.;i-.x:~) = ~-\)S~
=- 10(.030545')'" .305+5
For purposes
of answering
~t
lit Y..M'Ab·'\'I~
11k
.-x- ~~
~
t>~{\f 1'h
'lID\ot~.
It.
~~4-
.79'3'>
(c) There are two plots on page 2 oCthe printout
These are plots of standardized
standardized
residuals versus X3. What difficulty with the simple linear regression
Explain.
vs-
a 95% two-sided prediction interval for the next selectivity (y) that accompanies a .8 em gap
SL-R.
1I\CN\
the questions
'S,
I" Y-"Sr"'''s:e:.
-nJ.
""'C;;V."'?l'l~s
d~"'s.
I\~<;;~t-
cr ~
~p~U'r~
1D .be:.
(d)-(h) ignore the difficulty discovered
1'= V:;2.
residuals versusy and
model do they reveal?
- I.OS(.~)
.t
(-;'.?2-I.O;(.~))
Beginning in the middle oCpage 3 of the printout, there is a multiple linear regression analysis of the data.
Note that there are both an "all possible subsets" regression and a regression ofy on X1.%2,%S and x" (and
some plots and some sample correlations). Use these to answer questions (i) through (r).
in part (c).
·1·
~
(i) Based on the "all possible subsets regression"
be investigated as a possible "simple" explanation
·2·
(n) Give a 97.5% lower prediction
=200.
output. what reduced model seems like one that should
of1/? Explain in terms of R',6 and 011'
bound for the next value or
y
when
= 325,
Xl
%2
= 500,X3 = .8 and
%~
t. 5>,,/5
(0) Comment
on the appearance
of the plot of e" versusy
~~
i6t (fv.p~~)
;.Jl. '\'k.•.f~ie.."t.e~
SlAf«.,'
(.f,..O
numerator
~
df
=
-±....
denominator df =
I~
2
(k) Give the value of an F statistic, its degrees of freedom and the corresponding
p-value for testing
whether the variables %11%'1 %s and
together provide any ability to predict or explain y.
%"
f=
"3.
40.2./
p-value=
of page. 4 of the printout.
~
~
1'''''~~S: VI~
filL.~ 1M£..('
~
.
(P) The first plot on page 5 of the printout is a plot ofy versus y. It appears
sample correlation should be associated with this plot? (Give a number.)
to be fairly linear.
What
..0~J:t1M~a~)=
+~
.
dl=~.~
given at the bottom
I".~~I;(u t'("rt«,ttss:
..
.
..
.000
(1) Find the value of an F statistic and its degrees of freedom for testing whether after taking %1 and Xl
into account, the variables X2 and
provide significant additional ability to explain or predict y. (This is
not an easy question, but there is enough information on the printout to allo, you to :find this f.)
x"
(q) Based on the MLR. model involving all 4 %'S, give a 90% two-sided confidence interval for increase in
mean y that accompanies a 1 cm increase in gap (X3) if XII X, and Xf are held fixed. (There is no need to
snnPIi1Yafter plugging in.)
= r;",(X,,);<;3 '). SST"or = . 676>\·tI82.<;;)= .3""3;;'
_ ~SR(&\\)- ssg(,..~.:R.)2/(~-e)
(.4032\ - 3"6>39)/2FS'S&(.(l,.I\)/(r\-~-I)
.01'30+/'=SSR (-:CD -;,:~)
lA~
=
b:!,:t i: (d:'£. sle('eRe{"f b,3 )
- (.04/'2.
±
{,!:>43(."9:;;26
)
(r) Notice that from the first (SLR) regression run b3 = - 1.0458 while from the last (MLR) regression
run b3
f=
/.?h
=-
1.0412.
This difference
is not simply numerical
error in Minitab.
printout
.5\s
(m) Find the value of a t statistic, its degrees of freedom and the corresponding
;;.-;..- whether after taking Xl,X2,
and Xs into account,
adds important explanatory
regression model.
%"
p-value for testing
power to the multiple
have to say about this?)
~
t~((k~'"
i".
$6'Ml.
~(-=
G...iNul\.
~
""\A.rti?-ol\'·~ri~
i~i6b(o;
(),.'rc:
l f> 's d..r-c ,£.y.~.,.,e Th ?~..e. ~'~
-t.v..l'\~ MI. ''v-w~
I'"
,;. w..."e;. ( •
t=
-{.ot
Explain why you arc not
surprised that there is a difference in these two figures. (What do the correlations given on page 5 of the
df
= __ b__
·3·
p-value
=~
-4.
I\.4'X
i~ <f'W:.
:cero I
$0
?Ybb~.
"sn~s.
"f<S'V\.
"k-t
Download