252regex2 9/21/99

advertisement
252regex2 9/21/99 (Open in Print Layout View)
MINITAB EXAMPLE
Multiple Regression
Explanation: The data set has already been prepared and stored as famdat.mtw.
Column 1 (C1) is labeled ‘Y’, C2 is labeled ‘X’, C3 is labeled ‘RESID’ and C4 is
labeled ‘PRED’. Two additional independent variables were added, ‘W’ in C5 and
‘H’ in C6. C1, C2, C5 and C6 contain the data and C3 and C4 are blank. The data
set is retrieved and printed out. Then the command for simple regression is
repeated from the previous use of this data set. In the command, the words ‘on
1’ indicate that there is only one independent variable.(In later regressions we
use ‘on 2’ and ‘on 3’ to indicate more independent variables.) The equation of
the regression line ( Y  b0  b1 X , where b0  0.833 and b1  0.667 ) is printed out.
This is followed by a short table that repeats, in the ‘Coef’ column, the
coefficients b0 and b1 . The quantities in the ‘Stdev’ column are the two
standard deviations sb
0
and sb . In the ‘t-ratio’ column, are the two ratios
1
H10 :  0  0
H :   0
b 0
b0  0
and t  1
, which are used to test 
and  20 1
. Finally,
sb0
sb1
H 21 : 1  0
H11 :  0  0
in the ‘p’ column are the p-values for the two null hypotheses. If we assume
that   .05 , since the first p-value is 0.117, which is above the significance
level, we do not reject the null hypothesis for b0 , and thus say that the
intercept is not significant at the 5% significance level. Similarly, since the
second p-value is below the significance level, we reject the null hypothesis
for b1 and say that the slope is significant.
t
’s’ is the standard error se , which is used in computing t-tests and
confidence intervals. ‘R-sq’ is, of course, R 2 .
A graph of ‘PRED’(which is the same as Ŷ ) against Y is now shown. If we
had a perfect prediction, this graph would be a 45 degree line. This graph is
repeated after each regression to give us a visual idea of our progress,
although better measures are R 2 and R 2 adjusted for degrees of freedom. (The
symbol for R 2 adjusted for degrees of freedom is Rk2 
n  1R 2  k ,
where k is the
n  k 1
number of independent variables.) In each regression, the t-ratio provides a
test of the null hypothesis that the corresponding coefficient is not
significantly different from zero. Note that in the second regression ( Y
against X and W )all the p-values are less than 5%, indicating that the
coefficients are all significant at the 5% significance level. However, in the
final regression ( Y against X , W and H ), the p-values for W and H are above
5%, indicating that the coefficients are not significant at the 5% level. All of
the analyses of variance show p-values below 5% indicating that the independent
variables are generally related to the dependent variable. The last regression
is not satisfactory because of the two insignificant coefficients and most
researchers would prefer the second regression.
252regex2 9/21/99 (Open in Print Layout View)
Minitab Output:
Worksheet size: 100000 cells
MTB > RETR 'C:\MINITAB\FAMDAT.MTW'.
Retrieving worksheet from file: C:\MINITAB\FAMDAT.MTW
Worksheet was saved on 4/ 1/1998
MTB > print 'Y''X''W''H'
Data Display
Row
Y
X
W
H
1
2
3
4
5
6
7
8
9
10
0
2
1
3
1
3
4
2
1
2
0
1
2
1
0
3
4
2
2
1
1
0
1
0
0
0
0
1
1
0
1
0
1
0
0
0
0
0
1
0
MTB > regress 'Y' on 1 'X''resid''pred'
Regression Analysis
The regression equation is
Y = 0.833 + 0.667 X
Predictor
Constant
X
Coef
0.8333
0.6667
s = 0.9014
Stdev
0.4751
0.2375
R-sq = 49.6%
t-ratio
1.75
2.81
p
0.117
0.023
R-sq(adj) = 43.3%
Analysis of Variance
SOURCE
Regression
Error
Total
DF
1
8
9
SS
6.4000
6.5000
12.9000
MS
6.4000
0.8125
F
7.88
p
0.023
MTB > plot 'pred'*'Y'
PRED
3
2
1
0
1
2
Y
3
4
252regex2 9/21/99 (Open in Print Layout View)
MTB > regress 'Y' on 2 'X''W''resid''pred'
Regression Analysis
The regression equation is
Y = 1.45 + 0.628 X - 1.40 W
Predictor
Constant
X
W
Coef
1.4535
0.6279
-1.3953
s = 0.5139
Stdev
0.3086
0.1357
0.3325
R-sq = 85.7%
t-ratio
4.71
4.63
-4.20
p
0.000
0.000
0.004
R-sq(adj) = 81.6%
Analysis of Variance
SOURCE
Regression
Error
Total
DF
2
7
9
SS
11.0512
1.8488
12.9000
SOURCE
X
W
DF
1
1
SEQ SS
6.4000
4.6512
MS
5.5256
0.2641
F
20.92
p
0.001
MTB > plot 'pred'*'Y'
4
PRED
3
2
1
0
0
1
2
Y
MTB > regress 'Y' on 3 'X''W''H''resid''pred'
Regression Analysis
3
4
252regex2 9/21/99 (Open in Print Layout View)
The regression equation is
Y = 1.51 + 0.595 X - 0.698 W - 0.937 H
Predictor
Constant
X
W
H
Coef
1.5079
0.5952
-0.6984
-0.9365
s = 0.4484
Stdev
0.2709
0.1198
0.4860
0.5239
R-sq = 90.6%
t-ratio
5.57
4.97
-1.44
-1.79
p
0.001
0.003
0.201
0.124
R-sq(adj) = 86.0%
Analysis of Variance
SOURCE
Regression
Error
Total
DF
3
6
9
SS
11.6937
1.2063
12.9000
SOURCE
X
W
H
DF
1
1
1
SEQ SS
6.4000
4.6512
0.6425
MS
3.8979
0.2011
Unusual Observations
Obs.
X
Y
4
1.00
3.000
8
2.00
2.000
Fit
2.103
2.000
F
19.39
Stdev.Fit
0.200
0.448
p
0.002
Residual
0.897
0.000
R denotes an obs. with a large st. resid.
X denotes an obs. whose X value gives it large influence.
MTB > plot 'pred'*'Y'
4
PRED
3
2
1
0
0
1
2
Y
MTB > Stop.
3
4
St.Resid
2.23R
* X
Download