Interaction between quantitative predictors

• In a first-order model like the ones we have discussed, the association
between E(y) and a predictor xj does not depend on the value of the
other predictors in the model.
• See Fig. 4.1: the relation between E(y) and x1 is the same regardless of
the value of x2, so all the prediction lines are parallel.
• If, however, the association between response and one of the predictors
depends on the value of other predictors, then a first-order model is no
longer appropriate.
• We say that there is an interaction among predictors.
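The "parallel lines" property can be checked numerically. The sketch below uses hypothetical coefficients (not taken from any data set in these notes). It computes the change in E(y) per one-unit increase in x1 at several fixed values of x2, first for a first-order model and then for the same model with a product term x1·x2 added.

```python
# Hypothetical coefficients, chosen only to illustrate the idea
def mean_first_order(x1, x2, b0=1.0, b1=2.0, b2=0.5):
    # First-order model: E(y) = b0 + b1*x1 + b2*x2
    return b0 + b1 * x1 + b2 * x2

def mean_interaction(x1, x2, b0=1.0, b1=2.0, b2=0.5, b3=-0.1):
    # Same model plus a hypothetical interaction term b3*x1*x2
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

def slope_in_x1(mean_fn, x2, x1=0.0):
    # Change in E(y) when x1 increases by one unit while x2 is held fixed
    return mean_fn(x1 + 1.0, x2) - mean_fn(x1, x2)

for x2 in (0.0, 5.0, 10.0):
    print(x2, slope_in_x1(mean_first_order, x2), slope_in_x1(mean_interaction, x2))
```

In the first-order model the slope in x1 is the same at every x2 (parallel lines). With the product term it shrinks as x2 grows, so the lines are no longer parallel.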
Stat 328 - Fall 2004
Interaction (cont’d)
• Example: a company wishes to estimate the association between sales
of a beauty product (y) and two potential predictors of sales in each of
n markets:
– $ spent on daytime TV ads in the ith market (x1) and
– average number of years of education of females in the ith market (x2).
• Intuitively, this is what we would expect:
– Advertisement expenses will tend to increase sales (up to a point).
– In cities where women are highly educated (on the average), fewer of
them will be watching TV during the day.
– The effect of $ in ads on sales may then also depend on education
of potential consumers.
Interaction (cont’d)
• A figure to represent the association between ads and sales for different
levels of education will be drawn in class.
• How do we include an interaction term in the model?
• With k = 2 predictors:
$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{1i} x_{2i} + \epsilon_i$,
where the assumptions about the model are the same as before.
• An interaction between two predictors is a second-order term in the
model.
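In practice, including the interaction just means adding a column x1·x2 to the design matrix and fitting by ordinary least squares as usual. The minimal pure-Python sketch below shows this by solving the normal equations (X'X)b = X'y. The helper names (`design_row`, `solve`, `fit_interaction`) are hypothetical, and in practice JMP, SAS or a numerical library would do the fitting. The data are noise-free and generated from assumed "true" coefficients, so the fit should recover those coefficients.

```python
def design_row(x1, x2):
    # Columns: intercept, x1, x2, and the interaction (product) term x1*x2
    return [1.0, x1, x2, x1 * x2]

def solve(a, b):
    # Gaussian elimination with partial pivoting for a small square system a @ beta = b
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta

def fit_interaction(x1s, x2s, ys):
    # Least squares through the normal equations (X'X) b = X'y
    X = [design_row(a, b) for a, b in zip(x1s, x2s)]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(p)]
    return solve(xtx, xty)

# Noise-free illustrative data built from assumed "true" coefficients
true_beta = [5.0, 3.0, 0.5, -0.2]
x1s = [a for a in (1.0, 2.0, 3.0, 4.0) for _ in range(3)]
x2s = [b for _ in range(4) for b in (8.0, 10.0, 12.0)]
ys = [sum(c * v for c, v in zip(true_beta, design_row(a, b))) for a, b in zip(x1s, x2s)]

beta = fit_interaction(x1s, x2s, ys)
print([round(v, 6) for v in beta])
```

The normal equations are used here only because they are short to write. Numerical software usually relies on more stable decompositions (e.g. QR).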
Interaction (cont’d)
• In the sales example, we would expect that β3 < 0: as education increases
(and more women are out working), the strength of the association
between daytime TV ads and sales decreases. In other words, daytime
ads are expected to be more effective in markets where more women
are at home watching TV during the day than in markets where most
women are not watching TV.
• In general, with k predictors, we can include pairwise interactions
between any two, as appropriate.
• Higher-order interactions (e.g. $x_j x_l x_t$, denoting the three-way
interaction between the jth, lth and tth predictors) can also be included
in the model, but are much harder to interpret from a subject-matter
point of view.
Interaction (cont’d)
• When predictors interact, the interpretation of the β's attached to the
interacting predictors changes.
• If the model is
$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{1i} x_{2i} + \epsilon_i$,
– β0 is still interpreted as before.
– (β1 + β3x2) is the change in E(y) when x1 increases by one unit and x2
is held fixed.
– (β2 + β3x1) is the change in E(y) when x2 increases by one unit and x1
is held fixed.
• Association between E(y) and x1 depends on level of x2, unless β3 = 0,
in which case interaction does not exist.
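The slope expressions above follow from comparing E(y) at x1 + 1 and at x1 with x2 held fixed. The β0 and β2x2 terms cancel:

```latex
\begin{aligned}
E(y \mid x_1 + 1, x_2) - E(y \mid x_1, x_2)
  &= \beta_1 (x_1 + 1) + \beta_3 (x_1 + 1) x_2 - \beta_1 x_1 - \beta_3 x_1 x_2 \\
  &= \beta_1 + \beta_3 x_2 .
\end{aligned}
```

Exchanging the roles of x1 and x2 gives β2 + β3x1 in the same way. The result is free of x2 only when β3 = 0.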
Interaction (cont’d)
• In the sales example, suppose we find that b0 = 5, b1 = 3, b2 = 0.5 and
b3 = −0.2. Interpretation?
– The expected number of units sold changes by 3 − 0.2x2 when ad
expenditures increase by $1, with education x2 held fixed.
– The expected number of units sold changes by 0.5 − 0.2x1 when the
education of potential customers increases by one year, with ad
expenditures x1 held fixed.
• In a market with 12 years of average education, we expect that sales
will increase by 3 − 0.2(12) = 0.6 units if ad expenditures increase by $1.
• In a market with average education equal to 8 years, an additional $1
spent on daytime ads would be associated with an increase of about 1.4
units in expected sales.
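The conditional effects above are just b1 + b3x2 and b2 + b3x1 evaluated at particular values. A small sketch using the estimates from this example:

```python
# Estimates from the sales example
b0, b1, b2, b3 = 5.0, 3.0, 0.5, -0.2

def ad_effect(x2):
    # Estimated change in expected sales per extra $1 of ads, education x2 held fixed
    return b1 + b3 * x2

def education_effect(x1):
    # Estimated change in expected sales per extra year of education, ads x1 held fixed
    return b2 + b3 * x1

for years in (8, 12):
    print(years, round(ad_effect(years), 2))
```

This reproduces the 1.4 and 0.6 units quoted for markets with 8 and 12 years of average education.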
Interaction (cont’d)
• How do we draw inferences in models with interaction terms?
• Steps would be the same as in any multiple regression model:
1. Do a global F test of the utility of the model. The null hypothesis
in this case is
$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$,
where k counts every term in the model, including interaction terms. It is
tested against the alternative that at least one of the β's is different from 0.
2. If the F test leads to rejection of H0, then do a t test on each of the
β's associated with interaction terms.
3. If the interaction between xj and xl is significant, do not test hypotheses
for βj and βl: if the interaction is important, the individual x's must
be important too (some statisticians would argue differently here).
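The global F statistic in step 1 can be written in terms of the model's ordinary (unadjusted) R². The sketch below assumes k counts every term besides the intercept, so a model with x1, x2 and x1x2 has k = 3. The R² and n in the usage lines are made up purely to show the arithmetic. The resulting F would then be compared with a table value on k and n − k − 1 df.

```python
def global_f(r2, n, k):
    # Global F statistic for H0: beta_1 = ... = beta_k = 0,
    # from the unadjusted R^2 of a model with k terms besides the intercept
    df1, df2 = k, n - k - 1
    return (r2 / df1) / ((1.0 - r2) / df2), df1, df2

# Made-up illustration: interaction model with x1, x2, x1*x2 (k = 3) on n = 24 observations
f, df1, df2 = global_f(r2=0.6, n=24, k=3)
print(round(f, 2), df1, df2)
```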
Second order model with quadratic predictors
• Sometimes, the association between E(y) and xj is not linear but
quadratic.
• A second order model with one predictor is:
$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{1i}^2 + \epsilon_i$.
• If β2 > 0: association is concave upwards (bowl shape).
• If β2 < 0: concave downwards (mound shape).
• β2 is known as a rate of curvature parameter.
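Differentiating the mean function shows why β2 controls curvature. The slope itself changes with x1 at the constant rate 2β2:

```latex
\frac{dE(y)}{dx_1} = \beta_1 + 2\beta_2 x_1 ,
\qquad
\frac{d^2E(y)}{dx_1^2} = 2\beta_2 ,
\qquad
\text{turning point at } x_1 = -\frac{\beta_1}{2\beta_2} .
```

A positive second derivative (β2 > 0) gives the bowl shape and a negative one (β2 < 0) the mound shape.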
Quadratic predictors - Example
• Example 4.6, page 198.
• Data: y is immunoglobin in blood (indicator of immunity, in mg) and
x is maximum oxygen uptake (indicator of fitness, in ml/kg) measured
on 30 individuals.
• Range: x ∈ (32, 70).
• See scatter plot of data.
• Model:
$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \epsilon_i$,
with usual assumptions.
Quadratic predictors - Example
• Results: b0 = −1,464, b1 = 88.3 and b2 = −0.54, so that the
prediction equation is
$\hat{y} = -1464 + 88.3x - 0.54x^2$.
• $R_a^2 = 0.93$, so about 93% of the variability observed in immunoglobin
can be explained by its association with fitness.
• Interpretation of coefficients:
– The intercept has no practical meaning: x = 0 lies far outside the data range, and immunoglobin cannot be negative.
– b1 no longer has a simple interpretation. It is NOT the expected
change in y when x increases by one.
– The quadratic term b2 is negative: response curves downwards as x
increases.
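To see why b1 is not "the change in y per unit of x", compare ŷ(x + 1) with ŷ(x). The difference simplifies to b1 + b2(2x + 1), which depends on where x starts. A short sketch with the rounded estimates above:

```python
# Rounded estimates from the immunoglobin example
b0, b1, b2 = -1464.0, 88.3, -0.54

def y_hat(x):
    # Fitted quadratic prediction equation
    return b0 + b1 * x + b2 * x ** 2

def unit_change(x):
    # y_hat(x + 1) - y_hat(x) simplifies to b1 + b2 * (2x + 1)
    return b1 + b2 * (2 * x + 1)

for x in (40, 60):
    print(x, round(unit_change(x), 2), round(y_hat(x + 1) - y_hat(x), 2))
```

Both columns agree. The predicted gain per ml/kg is about 44.6 at x = 40 but only about 23.0 at x = 60, which is the downward curvature in action.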
Quadratic predictors - Example
• Be cautious with extrapolations! See Fig. 4.16. Because the fitted curve is
concave downward, for large enough x the predicted E(y) will begin to decrease.
• This makes no sense from a physiology point of view.
• Nonsensical predictions may occur if the model is used outside of the
range of the data!
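The point where the fitted parabola stops rising is x = −b1/(2b2). A quick check with the rounded estimates locates it relative to the observed range x ∈ (32, 70):

```python
b1, b2 = 88.3, -0.54          # rounded estimates from the example
x_peak = -b1 / (2 * b2)       # turning point of the fitted parabola
data_range = (32, 70)         # observed range of maximum oxygen uptake
print(round(x_peak, 1), data_range[0] <= x_peak <= data_range[1])
```

The peak falls at roughly 81.8 ml/kg, beyond the largest observed x. Inside the data the fitted curve is still increasing, and the implausible decrease appears only when extrapolating past that point.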
Quadratic predictors - Example
• The first test of hypotheses is the F-test for the entire model. We test
$H_0: \beta_1 = \beta_2 = 0$ against $H_a$: at least one of the two is $\neq 0$.
• In this example, F = 203.16 which we know will be larger than the
critical value even without looking at the table.
• We reject H0: maximal oxygen uptake contributes information about
immunoglobin levels in the blood.
• Next step is to decide whether curvature is important or not.
Quadratic predictors - Example
• We now test for significance of the quadratic effect: $H_0: \beta_2 = 0$
against $H_a: \beta_2 \neq 0$ (or we can do a one-tailed test too).
• The t-statistic is $t = b_2/\hat\sigma_{b_2} = -0.536/0.158 = -3.39$, which we compare
to a table value with α/2 = 0.025 and n − 3 = 27 degrees of freedom (2.052).
Since |t| exceeds this value, we reject H0.
• Interpretation: There is strong evidence that immunoglobin levels
increase more slowly per unit increase in maximal oxygen uptake in
individuals with high aerobic fitness than in those with low aerobic
fitness.
• If we had failed to reject H0: β2 = 0, we would conclude that there is
insufficient evidence of curvature, so a linear (first-order) model in x would suffice.
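The two-tailed t test for curvature reduces to one division and one comparison. The critical value 2.052 below is the t-table value for α/2 = 0.025 with 27 df used elsewhere in these notes.

```python
b2, se_b2 = -0.536, 0.158     # estimate and standard error of the quadratic term
t = b2 / se_b2                # t statistic for H0: beta_2 = 0
t_crit = 2.052                # t table value, alpha/2 = 0.025, n - 3 = 27 df
print(round(t, 2), abs(t) > t_crit)
```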
Estimation and prediction
• Same concepts as before. With the model we might wish to:
1. Estimate the mean value of the response at a certain value
of the predictor(s).
2. Predict a single response for some value of the predictor.
• In both cases, the point estimator (predictor) is $\hat{y} = b_0 + b_1 x_p + b_2 x_p^2$
for $x = x_p$.
• The standard error of ŷ depends on whether we predict a mean or a
single value.
• As before, $\hat\sigma_{(y-\hat y)} > \hat\sigma_{\hat y}$.
• Calculations are complex, so we use the computer to get these standard
errors and CIs.
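For reference, these are the standard least-squares formulas the software evaluates. Here s is the RMSE, X is the design matrix (columns 1, x, x²), and a is the row of predictor values at x_p. They also show why the single-response standard error is always the larger one:

```latex
\hat\sigma_{\hat y} = s\sqrt{\mathbf{a}^\top (X^\top X)^{-1}\mathbf{a}},
\qquad
\hat\sigma_{(y-\hat y)} = s\sqrt{1 + \mathbf{a}^\top (X^\top X)^{-1}\mathbf{a}},
\qquad
\mathbf{a} = (1,\; x_p,\; x_p^2)^\top ,
```

so that $\hat\sigma^2_{(y-\hat y)} = s^2 + \hat\sigma^2_{\hat y}$.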
Estimation and prediction
• In the example, suppose we wish to obtain
1. The mean immunoglobin level for people with oxygen
uptake of xp = 40 ml/kg.
2. The predicted immunoglobin level for a single person with xp = 40 ml/kg.
• In both cases, the point estimate (prediction) is
$\hat{y} = -1464.4 + 88.3(40) - 0.536(40)^2 = 1209.9$.
• JMP and SAS will give the 100(1 − α)% CI for the mean or the prediction
interval for a single response.
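The point estimate is the same arithmetic for the mean and for a single response. With the coefficients rounded as printed, the sketch below gives 1,210.0. The reported 1,209.9 presumably comes from the unrounded estimates.

```python
b0, b1, b2 = -1464.4, 88.3, -0.536   # coefficients as printed (rounded)
x_p = 40.0                           # oxygen uptake, ml/kg
y_hat = b0 + b1 * x_p + b2 * x_p ** 2
print(round(y_hat, 1))
```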
Estimation and prediction
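• From the CI we can derive $\hat\sigma_{\hat y}$ or $\hat\sigma_{(y-\hat y)}$, recalling that for a 100(1 − α)% interval
$\text{Lower bound} = \hat{y} - t_{\alpha/2,\,n-k-1} \times \text{std error}$.
Then
$\text{Std error} = \dfrac{\hat{y} - \text{Lower bound}}{t_{\alpha/2,\,n-k-1}}$.
• We can also derive the std errors using the upper bound of the CI as
follows:
$\text{Std error} = \dfrac{\text{Upper bound} - \hat{y}}{t_{\alpha/2,\,n-k-1}}$.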
Estimation and prediction
• In the example, the 95% CI for the mean immunoglobin at x = 40 ml/kg
is (1,156.2, 1,263.6). With $t_{0.025,27} = 2.052$:
$\hat\sigma_{\hat y} = \dfrac{1209.9 - 1156.2}{2.052} = 26.17$.
• Also, since the 95% prediction interval for a single response is (985.0, 1,434.8):
$\hat\sigma_{(y-\hat y)} = \dfrac{1209.9 - 985.0}{2.052} = 109.60$.
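Backing a standard error out of a reported interval is a single rearrangement. The sketch below applies it to the intervals reported for x = 40 ml/kg and checks that the lower and upper bounds give the same answer, as they must for a symmetric t interval. Dividing 1,209.9 − 985.0 = 224.9 by 2.052 gives about 109.6.

```python
def se_from_lower(y_hat, lower, t_crit):
    # lower = y_hat - t * se  =>  se = (y_hat - lower) / t
    return (y_hat - lower) / t_crit

def se_from_upper(y_hat, upper, t_crit):
    # upper = y_hat + t * se  =>  se = (upper - y_hat) / t
    return (upper - y_hat) / t_crit

t_crit = 2.052                                        # t table value, alpha/2 = 0.025, 27 df
y_hat = 1209.9
se_mean = se_from_lower(y_hat, 1156.2, t_crit)        # from the CI for the mean response
se_pred = se_from_lower(y_hat, 985.0, t_crit)         # from the interval for a single response
print(round(se_mean, 2), round(se_pred, 2))
print(round(se_from_upper(y_hat, 1263.6, t_crit), 2), round(se_from_upper(y_hat, 1434.8, t_crit), 2))
```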
More complex models: interaction + curvature
• Consider the following complete second-order model with two
predictors:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2 + \epsilon$.
See Fig. 4.19.
• A complete second-order model with three predictors includes 3 first-order
terms, 3 squared terms and 3 two-way interactions (plus the intercept); a
three-way product such as x1x2x3 is a third-order term.
• The number of terms in complete models grows quickly. Samples are often
not large enough to fit all possible terms.
• Use subject-matter knowledge to decide which terms to include.
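How fast the number of terms grows can be made concrete by listing them. The sketch below (the function name is hypothetical) builds the complete second-order term list for k predictors: intercept, first-order terms, squared terms and all two-way products. That is 1 + 2k + k(k − 1)/2 = (k + 1)(k + 2)/2 coefficients.

```python
from itertools import combinations

def complete_second_order_terms(names):
    # Intercept, first-order terms, squared terms, and all two-way products
    terms = ["1"]
    terms += list(names)
    terms += [f"{v}^2" for v in names]
    terms += [f"{a}*{b}" for a, b in combinations(names, 2)]
    return terms

for k in (2, 3, 5, 10):
    names = [f"x{i}" for i in range(1, k + 1)]
    print(k, len(complete_second_order_terms(names)))
```

With k = 2 this gives the 6 coefficients β0 to β5 of the model above. By k = 10 there are already 66 coefficients to estimate.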
More complex models: Example
• Example 4.7, page 213: study to determine whether the weight of a package
(x1) and the distance it is delivered (x2) are associated with shipping costs (y) in
a small regional express delivery service.
• See scatter plots.
• Complete second-order model fitted with JMP (data set "Express" on the
class web site).
• Results: See output.
More complex models: Example
• Interpretation of results:
– Since RMSE = 0.44, about 95% of shipping costs will fall within
$0.89 of their predicted values.
– $R_a^2 = 0.99$: almost all of the variability in shipping costs can be
explained by the model.
– F-statistic = 458.39 on 5 and 14 df. Highly significant, so the model is
useful.
– Weight is associated with cost both linearly and quadratically; distance
only linearly.
– The interaction between weight and distance is positive: the effect of weight on
cost is not independent of distance, and it grows as distance increases.