March 3

Stat 462 March 3
Example: A study is done to compare three metal alloys used to make welds to join pipes together.
Y = a measure of the strength of the weld
X = diameter of weld
Alloy = type of alloy used to make the weld (Alloy 1, 2, or 3)
Graph of the Strength versus Diameter in which alloys are indicated by different symbols.
Note that Strength and Diameter are related and that there are differences among the alloys.
There appears to be interaction as the slopes differ for the three alloys. The slope is steeper for
alloy 2.
Alloy is categorical - the numerical codes 1, 2, 3 are arbitrary.
To put Alloy into a regression model, create two indicator variables
A1 = 1 if observation is alloy 1 and 0 otherwise.
A2 = 1 if observation is alloy 2 and 0 otherwise
General Rule: If a categorical variable has k categories, then k - 1 indicator variables will fully
describe the variable.
In our example, we could create A3 = 1 if observation is alloy 3 and 0 otherwise. But notice that
A1 + A2 + A3 = 1 for any observation. Thus, A3 = 1 - A1 - A2, meaning that A3 is perfectly
predictable from the values of A1 and A2, and it would be redundant as a predictor in a regression
equation.
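The redundancy of a third indicator can be checked directly. The following is a minimal Python sketch; the list of alloy labels is made up purely for illustration and is not the study data, and the helper name `indicator` is hypothetical.

```python
# Indicator (dummy) coding for a three-category variable.
# The alloy labels below are made-up illustrative values, not the study data.
alloys = [1, 2, 3, 2, 1, 3]


def indicator(values, category):
    """Return 1 where the value equals `category`, and 0 otherwise."""
    return [1 if v == category else 0 for v in values]


A1 = indicator(alloys, 1)
A2 = indicator(alloys, 2)
A3 = indicator(alloys, 3)

# Every observation belongs to exactly one alloy, so the indicators sum to 1.
assert all(a1 + a2 + a3 == 1 for a1, a2, a3 in zip(A1, A2, A3))

# Hence A3 is fully determined by A1 and A2 and carries no new information.
A3_from_others = [1 - a1 - a2 for a1, a2 in zip(A1, A2)]
assert A3_from_others == A3
```

Alloy 3 then acts as the baseline category: it is the case in which both A1 and A2 are 0.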
In the alloy problem, a “no interaction” model is
E(Y) = β₀ + β₁X + β₂A1 + β₃A2
Given the plot above, this model almost surely is wrong.
An interaction model is
E(Y) = β₀ + β₁X + β₂A1 + β₃A2 + β₄X*A1 + β₅X*A2.
Notice that the interaction terms involve multiplications of the Alloy indicators and X=diameter.
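To make the construction of the interaction terms concrete, here is a small Python sketch that builds the predictor values for one observation. The function name `design_row` and the diameter of 10 are hypothetical choices made only for illustration.

```python
def design_row(x, alloy):
    """Return [1, X, A1, A2, X*A1, X*A2] for one observation."""
    a1 = 1 if alloy == 1 else 0
    a2 = 1 if alloy == 2 else 0
    # The interaction columns are simply products of X with each indicator.
    return [1, x, a1, a2, x * a1, x * a2]


# Hypothetical diameter of 10, chosen only for illustration.
row_alloy1 = design_row(10, 1)
row_alloy2 = design_row(10, 2)
row_alloy3 = design_row(10, 3)
```

For alloy 3 both interaction entries are 0, so only the intercept and the X term contribute to E(Y).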
Understanding the meaning of the β coefficients
When indicator variables are present in the model, the data analyst must take care to interpret the
coefficients multiplying the predictors correctly. To do this:
Consider each category of a categorical variable separately.
For a specific category, determine the values of all indicator variables
Substitute these values into the equation (model for E(Y)) and reduce as far as possible.
When this is done for each category, compare the resulting equations to determine what the individual 
coefficients measure.
Alloy Example – No Interaction model
Model for average Y is E(Y) = β₀ + β₁X + β₂A1 + β₃A2
Alloy 1. For this alloy, A1 = 1 and A2 = 0. So,
E(Y) = β₀ + β₁X + β₂(1) + β₃(0)
= β₀ + β₂ + β₁X
Alloy 2. For this alloy, A1 = 0 and A2 = 1. So,
E(Y) = β₀ + β₁X + β₂(0) + β₃(1)
= β₀ + β₃ + β₁X
Alloy 3. For this alloy, A1 = 0 and A2 = 0. So,
E(Y) = β₀ + β₁X + β₂(0) + β₃(0)
= β₀ + β₁X

β₁ = the slope between Y and X, regardless of alloy. This is what “no interaction” is about: the
slope between Y and X is the same for each alloy. The model actually consists of three parallel
lines.
β₂ = difference between intercepts for alloys 1 and 3. More generally, it is the difference
between E(Y) for alloys 1 and 3 at any specified value of X.
β₃ = difference between intercepts for alloys 2 and 3. More generally, it is the difference
between E(Y) for alloys 2 and 3 at any specified value of X.
MINITAB RESULTS INCLUDING GRAPH OF ESTIMATED MODEL
The regression equation is
Y = - 57.3 + 6.04 X + 12.0 A1 + 29.8 A2
Predictor     Coef  SE Coef      T      P
Constant    -57.27    16.43  -3.49  0.004
X           6.0425   0.8956   6.75  0.000
A1          12.009    4.866   2.47  0.027
A2          29.798    4.597   6.48  0.000
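As a check on the interpretation of the coefficients, the estimates from the Minitab output above can be substituted into the no-interaction model. The Python sketch below does this; the function name `fitted_strength` and the diameters 10, 15, and 20 are assumptions made for illustration and are not taken from the data.

```python
# Coefficients copied from the Minitab output for the no-interaction model.
b0, b1, b2, b3 = -57.27, 6.0425, 12.009, 29.798


def fitted_strength(x, alloy):
    """Estimated E(Y) at diameter x for the given alloy (1, 2, or 3)."""
    a1 = 1 if alloy == 1 else 0
    a2 = 1 if alloy == 2 else 0
    return b0 + b1 * x + b2 * a1 + b3 * a2


# Intercept of each alloy's line; the slope is b1 for all three alloys.
intercepts = {1: b0 + b2, 2: b0 + b3, 3: b0}

# The gap between alloy 1 and alloy 3 equals b2 at any diameter.
# These diameters are arbitrary illustrative values.
gaps = [fitted_strength(x, 1) - fitted_strength(x, 3) for x in (10, 15, 20)]
```

Each entry of `gaps` equals the A1 coefficient (up to floating-point rounding), which is what three parallel lines imply.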
But, the graph of the data showed an interaction. A plot of residuals versus fits for the no interaction
model shows a V-shape so the model may not be right.
Alloy Example – Interaction model
Model for average Y is E(Y) = β₀ + β₁X + β₂A1 + β₃A2 + β₄X*A1 + β₅X*A2.
Alloy 1. For this alloy, A1 = 1 and A2 = 0. So,
E(Y) = β₀ + β₁X + β₂(1) + β₃(0) + β₄X*(1) + β₅X*(0)
= β₀ + β₂ + (β₁ + β₄)X
Alloy 2. For this alloy, A1 = 0 and A2 = 1. So,
E(Y) = β₀ + β₁X + β₂(0) + β₃(1) + β₄X*(0) + β₅X*(1)
= β₀ + β₃ + (β₁ + β₅)X
Alloy 3. For this alloy, A1 = 0 and A2 = 0. So,
E(Y) = β₀ + β₁X + β₂(0) + β₃(0) + β₄X*(0) + β₅X*(0)
= β₀ + β₁X
β₁ = the slope between Y and X, only for alloy 3.
β₂ = difference between intercepts for alloys 1 and 3.
β₃ = difference between intercepts for alloys 2 and 3.
β₄ = difference between slopes for alloys 1 and 3.
β₅ = difference between slopes for alloys 2 and 3.
MINITAB RESULTS INTERACTION MODEL
The regression equation is
Y = - 23.9 + 4.19 X + 12.3 A1 - 92.8 A2 - 0.269 X_A1 + 6.49 X_A2
Predictor      Coef  SE Coef      T      P
Constant     -23.91    14.10  -1.70  0.116
X            4.1892   0.7797   5.37  0.000
A1            12.26    16.89   0.73  0.482
A2           -92.82    19.61  -4.73  0.000
X_A1        -0.2692   0.9723  -0.28  0.787
X_A2          6.494    1.051   6.18  0.000
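The substitution steps for the interaction model can be applied to the fitted coefficients to recover a separate line for each alloy. The Python sketch below does so; the function name `alloy_line` is hypothetical, and the coefficient values are copied from the Minitab output above.

```python
# Coefficients copied from the Minitab output for the interaction model.
b0, b1, b2, b3, b4, b5 = -23.91, 4.1892, 12.26, -92.82, -0.2692, 6.494


def alloy_line(alloy):
    """Return (intercept, slope) of the fitted line for alloy 1, 2, or 3."""
    a1 = 1 if alloy == 1 else 0
    a2 = 1 if alloy == 2 else 0
    intercept = b0 + b2 * a1 + b3 * a2
    slope = b1 + b4 * a1 + b5 * a2
    return intercept, slope


lines = {alloy: alloy_line(alloy) for alloy in (1, 2, 3)}
```

The slopes for alloys 1 and 3 differ only by the small X_A1 coefficient, whereas the slope for alloy 2 is larger by the X_A2 coefficient. This agrees with the plot, in which alloy 2 has the steeper slope.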
GRAPH OF PREDICTED VALUES
Dropping the X_A1 term from the model gives
The regression equation is
Y = - 20.8 + 4.02 X + 7.62 A1 - 95.9 A2 + 6.67 X_A2
Predictor      Coef  SE Coef      T      P
Constant    -20.789    8.189  -2.54  0.025
X            4.0161   0.4490   8.94  0.000
A1            7.618    2.105   3.62  0.003
A2           -95.94    15.48  -6.20  0.000
X_A2         6.6673   0.8144   8.19  0.000

S = 3.232   R-Sq = 98.5%   R-Sq(adj) = 98.1%
What is the interpretation of this model?
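One way to approach this question is to compare each alloy with alloy 3 at a few diameters. The Python sketch below is offered only as an aid. The coefficients come from the reduced-model output above, while the function names and the diameters 10 and 20 are arbitrary illustrative assumptions that may or may not fall within the range of the data.

```python
# Coefficients copied from the Minitab output for the model without X_A1.
b0, b1, b2, b3, b5 = -20.789, 4.0161, 7.618, -95.94, 6.6673


def fitted_strength(x, alloy):
    """Estimated E(Y) at diameter x for the given alloy (1, 2, or 3)."""
    a1 = 1 if alloy == 1 else 0
    a2 = 1 if alloy == 2 else 0
    return b0 + b1 * x + b2 * a1 + b3 * a2 + b5 * x * a2


def gap_vs_alloy3(x, alloy):
    """Difference in estimated E(Y) between `alloy` and alloy 3 at diameter x."""
    return fitted_strength(x, alloy) - fitted_strength(x, 3)


diameters = (10, 20)  # arbitrary illustrative values
gaps_alloy1 = [gap_vs_alloy3(x, 1) for x in diameters]
gaps_alloy2 = [gap_vs_alloy3(x, 2) for x in diameters]
```

The alloy 1 gap equals the A1 coefficient at every diameter, so the lines for alloys 1 and 3 are parallel. The alloy 2 gap changes by the X_A2 coefficient for each unit of diameter, so the comparison of alloy 2 with alloy 3 depends on X.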