Stat 328 Regression Summary

Simple Linear Regression Model
The basic (normal) "simple linear regression" model says that a response/output variable
y depends on an explanatory/input/system variable x in a "noisy but linear" way. That is,
one supposes that there is a linear relationship between x and mean y,

   μ_{y|x} = β0 + β1·x

and that (for fixed x) there is around that mean a distribution of y that is normal. Further,
the model assumption is that the standard deviation of the response distribution is
constant in x. In symbols it is standard to write

   y = β0 + β1·x + ε

where ε is normal with mean 0 and standard deviation σ. This describes one y. Where
several observations y_i with corresponding values x_i are under consideration, the
assumption is that the y_i (the ε_i) are independent. (The ε_i are conceptually equivalent to
unrelated random draws from the same fixed normal continuous distribution.) The model
statement in its full glory is then

   y_i = β0 + β1·x_i + ε_i for i = 1, 2, ..., n
   ε_i for i = 1, 2, ..., n are independent normal (0, σ²) random variables

The model statement above is a perfectly theoretical matter. One can begin with it, and
for specific choices of β0, β1, and σ find probabilities for y at given values of x. In
applications, the real mode of operation is instead to take n data pairs
(x_1, y_1), (x_2, y_2), ..., (x_n, y_n) and use them to make inferences about the parameters
β0, β1, and σ and to make predictions based on the estimates (based on the empirically
fitted model).
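To make the "theoretical matter" concrete, one can simulate data from the model for chosen parameter values. The Python sketch below is purely illustrative; the parameter values β0 = 2, β1 = 0.5, σ = 1 and the grid of x's are invented, and the text itself works with JMP rather than code.

```python
# A minimal sketch: drawing n responses from the normal SLR model
# y_i = beta0 + beta1 * x_i + eps_i, with eps_i iid normal(0, sigma^2).
# The parameter values and x's below are invented for illustration.
import random

random.seed(328)

beta0, beta1, sigma = 2.0, 0.5, 1.0        # hypothetical "true" parameters
xs = [float(i) for i in range(1, 21)]      # n = 20 input values
ys = [beta0 + beta1 * x + random.gauss(0.0, sigma) for x in xs]

n = len(ys)
print(n)  # prints: 20
```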
Descriptive Analysis of Approximately Linear (x, y) Data
After plotting (x, y) data to determine that the "linear in x mean of y" model makes some
sense, it is reasonable to try to quantify "how linear" the data look and to find a line of
"best fit" to the scatterplot of the data. The sample correlation between y and x

   r = Σ(x_i − x̄)(y_i − ȳ) / sqrt( Σ(x_i − x̄)² · Σ(y_i − ȳ)² )

(with all sums running over i = 1, 2, ..., n) is a measure of strength of linear relationship
between x and y.
Calculus can be invoked to find a slope β1 and intercept β0 minimizing the sum of
squared vertical distances from data points to a fitted line, Σ(y_i − (β0 + β1·x_i))². These
"least squares" values are

   b1 = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)²

and

   b0 = ȳ − b1·x̄

It is further common to refer to the value of y on the "least squares line" corresponding to
x_i as a fitted or predicted value

   ŷ_i = b0 + b1·x_i

One might take the difference between what is observed (y_i) and what is "predicted" or
"explained" (ŷ_i) as a kind of leftover part or "residual" corresponding to a data value

   e_i = y_i − ŷ_i
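The defining formulas above translate directly into code. A minimal Python sketch, using a small invented (x, y) data set:

```python
# Sketch: the sample correlation r and the least squares slope b1 and
# intercept b0, computed directly from their defining formulas.
# The tiny data set here is made up purely for illustration.
from math import sqrt

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.0, 2.0, 5.0, 4.0]
n = len(xs)

xbar = sum(xs) / n
ybar = sum(ys) / n

sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
sxx = sum((x - xbar) ** 2 for x in xs)
syy = sum((y - ybar) ** 2 for y in ys)

r = sxy / sqrt(sxx * syy)   # sample correlation
b1 = sxy / sxx              # least squares slope
b0 = ybar - b1 * xbar       # least squares intercept

yhat = [b0 + b1 * x for x in xs]             # fitted values
resid = [y - yh for y, yh in zip(ys, yhat)]  # residuals e_i

print(round(b1, 3), round(b0, 3), round(r, 3))  # prints: 0.8 0.6 0.8
```

For this invented data set b1 = 0.8 and b0 = 0.6, so the least squares line is ŷ = 0.6 + 0.8·x.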
The sum Σ(y_i − ȳ)² is most of the sample variance of the n values y_i. It is a measure of
raw variation in the response variable. People often call it the "total sum of squares"
and write

   SSTot = Σ(y_i − ȳ)²

The sum of squared residuals Σe_i² is a measure of variation in response remaining
unaccounted for after fitting a line to the data. People often call it the "error sum of
squares" and write

   SSE = Σe_i² = Σ(y_i − ŷ_i)²

One is guaranteed that SSTot ≥ SSE. So the difference SSTot − SSE is a nonnegative
measure of variation accounted for in fitting a line to (x, y) data. People often
call it the "regression sum of squares" and write

   SSR = SSTot − SSE
The coefficient of determination expresses SSR as a fraction of SSTot and is

   R² = SSR/SSTot

which is interpreted as "the fraction of raw variation in y accounted for in the model
fitting process."
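The decomposition and R² can be checked numerically. A Python sketch on invented data (in SLR, R² also reproduces the square of the sample correlation r):

```python
# Sketch: the decomposition SSTot = SSR + SSE and R^2 = SSR/SSTot,
# checked on a made-up data set. In SLR, R^2 also equals r^2.
from math import sqrt

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.0, 2.0, 5.0, 4.0]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n

sxx = sum((x - xbar) ** 2 for x in xs)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
b1 = sxy / sxx
b0 = ybar - b1 * xbar

yhat = [b0 + b1 * x for x in xs]
sstot = sum((y - ybar) ** 2 for y in ys)             # total sum of squares
sse = sum((y - yh) ** 2 for y, yh in zip(ys, yhat))  # error sum of squares
ssr = sstot - sse                                    # regression sum of squares
r2 = ssr / sstot                                     # coefficient of determination

r = sxy / sqrt(sxx * sstot)  # sample correlation (note Σ(y_i - ȳ)^2 is sstot)
print(round(r2, 3))  # prints: 0.64
```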
Parameter Estimates for SLR
The descriptive statistics for (x, y) data can be used to provide "single number estimates"
of the (typically unknown) parameters of the simple linear regression model. That is, the
slope of the least squares line can serve as an estimate of β1,

   β̂1 = b1

and the intercept of the least squares line can serve as an estimate of β0,

   β̂0 = b0

The variance of y for a given x can be estimated by a kind of average of squared
residuals

   s² = (1/(n − 2)) Σe_i² = SSE/(n − 2)

Of course, the square root of this "regression sample variance" is s = sqrt(s²) and serves as
a single number estimate of σ.
Interval-Based Inference Methods for SLR
The normal simple linear regression model provides inference formulas for model
parameters. Confidence limits for σ are

   s·sqrt((n − 2)/χ²_upper)  and  s·sqrt((n − 2)/χ²_lower)

where χ²_upper and χ²_lower are upper and lower percentage points of the χ² distribution with
ν = n − 2 degrees of freedom. And confidence limits for β1 (the slope of the line
relating mean y to x ... the rate of change of average y with respect to x) are

   b1 ± t·s/sqrt(Σ(x_i − x̄)²)

where t is a quantile of the t distribution with ν = n − 2 degrees of freedom. The text
calls the ratio s/sqrt(Σ(x_i − x̄)²) the standard error of b1 and uses the symbol SE_b1 for it.
In these terms, the confidence limits for β1 are

   b1 ± t·SE_b1
Confidence limits for μ_{y|x} = β0 + β1·x (the mean value of y at a given value x) are

   (b0 + b1·x) ± t·s·sqrt( 1/n + (x − x̄)²/Σ(x_i − x̄)² )

One can abbreviate (b0 + b1·x) as ŷ, and the text calls s·sqrt(1/n + (x − x̄)²/Σ(x_i − x̄)²)
the standard error of the mean and uses the symbol SE_μ̂ for it. (JMP unfortunately calls
this the Std Err Pred y.) In these terms, the confidence limits for μ_{y|x} are

   ŷ ± t·SE_μ̂

(Note that by choosing x = 0, this formula provides confidence limits for β0, though this
parameter is rarely of independent practical interest.)
Prediction limits for an additional observation y at a given value x are

   (b0 + b1·x) ± t·s·sqrt( 1 + 1/n + (x − x̄)²/Σ(x_i − x̄)² )

The text (unfortunately) calls s·sqrt(1 + 1/n + (x − x̄)²/Σ(x_i − x̄)²) the standard error
for "estimating" an individual y and uses the symbol SE_ŷ for it. (JMP equally
unfortunately calls this "Std Err Individ y.") It is on occasion useful to notice that

   SE_ŷ = sqrt(s² + SE_μ̂²)

In this notation, prediction limits for an additional y at x are

   ŷ ± t·SE_ŷ
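A Python sketch computing SE_μ̂ and SE_ŷ at an arbitrary new x (invented data again), including a check of the identity SE_ŷ = sqrt(s² + SE_μ̂²):

```python
# Sketch: standard errors for estimating the mean (SE_mu) and for
# predicting an individual y (SE_yhat) at a new x, on made-up data,
# with a check that SE_yhat = sqrt(s^2 + SE_mu^2).
from math import sqrt

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.0, 2.0, 5.0, 4.0]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n

sxx = sum((x - xbar) ** 2 for x in xs)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
b0 = ybar - b1 * xbar
s2 = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
s = sqrt(s2)

x_new = 2.5   # arbitrary value of x at which to do inference
se_mu = s * sqrt(1.0 / n + (x_new - xbar) ** 2 / sxx)
se_yhat = s * sqrt(1.0 + 1.0 / n + (x_new - xbar) ** 2 / sxx)

print(se_yhat > se_mu)  # prints: True  (prediction limits are wider)
```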
Hypothesis Tests and SLR
The normal simple linear regression model supports hypothesis testing. H0: β1 = #
(# denoting the hypothesized value) can be tested using the test statistic

   T = (b1 − #) / ( s/sqrt(Σ(x_i − x̄)²) ) = (b1 − #)/SE_b1

and a t_{n−2} reference distribution. H0: μ_{y|x} = # can be tested using the test statistic

   T = ((b0 + b1·x) − #) / ( s·sqrt(1/n + (x − x̄)²/Σ(x_i − x̄)²) ) = (ŷ − #)/SE_μ̂

and a t_{n−2} reference distribution.
ANOVA and SLR
The breaking down of SSTot into SSR and SSE can be thought of as a kind of
"analysis of variance" in y. That enterprise is often summarized in a special kind of
table. The general form is as below.

   ANOVA Table (for SLR)
   Source       SS      df      MS                   F
   Regression   SSR     1       MSR = SSR/1          F = MSR/MSE
   Error        SSE     n − 2   MSE = SSE/(n − 2)
   Total        SSTot   n − 1

In this table the ratios of sums of squares to degrees of freedom are called "mean
squares." The mean square for error is, in fact, the estimate of σ² (i.e. MSE = s²).
As it turns out, the ratio in the "F" column can be used as a test statistic for the hypothesis
H0: β1 = 0. The appropriate reference distribution is the F_{1,n−2} distribution. As it turns
out, the value F = MSR/MSE is the square of the t statistic for testing this hypothesis,
and the F test produces exactly the same p-values as a two-sided t test.
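The claim that F is the square of the t statistic for H0: β1 = 0 is easy to verify numerically. A Python sketch on invented data:

```python
# Sketch: the SLR ANOVA F statistic equals the square of the t statistic
# for H0: beta1 = 0 (made-up data again).
from math import sqrt

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.0, 2.0, 5.0, 4.0]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n

sxx = sum((x - xbar) ** 2 for x in xs)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
b0 = ybar - b1 * xbar

sstot = sum((y - ybar) ** 2 for y in ys)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
ssr = sstot - sse

msr = ssr / 1        # regression mean square
mse = sse / (n - 2)  # error mean square (= s^2)
F = msr / mse

s = sqrt(mse)
t_stat = b1 / (s / sqrt(sxx))  # t statistic for H0: beta1 = 0

print(round(F, 4), round(t_stat ** 2, 4))  # prints: 5.3333 5.3333
```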
Standardized Residuals and SLR
The theoretical variances of the residuals turn out to depend upon their corresponding x
values. As a means of putting these residuals all on the same footing, it is common to
"standardize" them by dividing by an estimated standard deviation for each. This
produces standardized residuals

   e_i* = e_i / ( s·sqrt( 1 − 1/n − (x_i − x̄)²/Σ(x_j − x̄)² ) )

These (if the normal simple linear regression model is a good one) "ought" to look as if
they are approximately normal with mean 0 and standard deviation 1. Various kinds of
plotting with these standardized residuals (or with the raw residuals) are used as means of
"model checking" or "model diagnostics."
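Writing h_i = 1/n + (x_i − x̄)²/Σ(x_j − x̄)² for the quantity subtracted from 1 inside the square root, a Python sketch on invented data (the h_i are the "leverages," a term the text does not use; they sum to 2, the number of fitted coefficients in SLR):

```python
# Sketch: standardized residuals e_i* = e_i / (s * sqrt(1 - h_i)) with
# leverage h_i = 1/n + (x_i - xbar)^2 / Sxx, on made-up data.
from math import sqrt

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.0, 2.0, 5.0, 4.0]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n

sxx = sum((x - xbar) ** 2 for x in xs)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
b0 = ybar - b1 * xbar

resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
s = sqrt(sum(e * e for e in resid) / (n - 2))

h = [1.0 / n + (x - xbar) ** 2 / sxx for x in xs]           # leverages
std_resid = [e / (s * sqrt(1.0 - hi)) for e, hi in zip(resid, h)]

print(round(sum(h), 6))  # prints: 2.0  (leverages sum to 2 in SLR)
```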
Multiple Linear Regression Model
The basic (normal) "multiple linear regression" model says that a response/output
variable y depends on explanatory/input/system variables x1, x2, ..., xk in a "noisy but
linear" way. That is, one supposes that there is a linear relationship between
x1, x2, ..., xk and mean y,

   μ_{y|x1,x2,...,xk} = β0 + β1·x1 + β2·x2 + ⋯ + βk·xk

and that (for fixed x1, x2, ..., xk) there is around that mean a distribution of y that is
normal. Further, the standard assumption is that the standard deviation of the response
distribution is constant in x1, x2, ..., xk. In symbols it is standard to write

   y = β0 + β1·x1 + β2·x2 + ⋯ + βk·xk + ε

where ε is normal with mean 0 and standard deviation σ. This describes one y. Where
several observations y_i with corresponding values x_{1i}, x_{2i}, ..., x_{ki} are under
consideration, the assumption is that the y_i (the ε_i) are independent. (The ε_i are
conceptually equivalent to unrelated random draws from the same fixed normal
continuous distribution.) The model statement in its full glory is then

   y_i = β0 + β1·x_{1i} + β2·x_{2i} + ⋯ + βk·x_{ki} + ε_i for i = 1, 2, ..., n
   ε_i for i = 1, 2, ..., n are independent normal (0, σ²) random variables

The model statement above is a perfectly theoretical matter. One can begin with it, and
for specific choices of β0, β1, β2, ..., βk and σ find probabilities for y at given values of
x1, x2, ..., xk. In applications, the real mode of operation is instead to take n data
vectors (x_{11}, x_{21}, ..., x_{k1}, y_1), (x_{12}, x_{22}, ..., x_{k2}, y_2), ..., (x_{1n}, x_{2n}, ..., x_{kn}, y_n) and use
them to make inferences about the parameters β0, β1, β2, ..., βk and σ and to make
predictions based on the estimates (based on the empirically fitted model).
Descriptive Analysis of Approximately Linear (x1, x2, ..., xk, y) Data
Calculus can be invoked to find coefficients β0, β1, β2, ..., βk minimizing the sum of
squared vertical distances from data points in (k + 1) dimensional space to a fitted
surface, Σ(y_i − (β0 + β1·x_{1i} + β2·x_{2i} + ⋯ + βk·x_{ki}))². These "least squares" values
DO NOT have simple formulas (unless one is willing to use matrix notation). And in
particular, one can NOT simply somehow use the formulas from simple linear regression
in this more complicated context. We will call these minimizing coefficients
b0, b1, b2, ..., bk and need to rely upon JMP to produce them for us.
It is further common to refer to the value of y on the "least squares surface"
corresponding to x_{1i}, x_{2i}, ..., x_{ki} as a fitted or predicted value

   ŷ_i = b0 + b1·x_{1i} + b2·x_{2i} + ⋯ + bk·x_{ki}

Exactly as in SLR, one takes the difference between what is observed (y_i) and what is
"predicted" or "explained" (ŷ_i) as a kind of leftover part or "residual" corresponding to a
data value

   e_i = y_i − ŷ_i
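In matrix notation (which the text avoids), the least squares coefficients minimize ||y − Xb||² for the n × (k + 1) design matrix X whose first column is all 1's. A Python/numpy sketch standing in for JMP; the two-predictor data set is invented and built so that y = 1 + x1 + 2·x2 exactly:

```python
# Sketch: MLR least squares via the design matrix, using numpy rather
# than JMP. The data are invented so that y = 1 + x1 + 2*x2 exactly.
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0, 2.0])
y = 1.0 + x1 + 2.0 * x2

X = np.column_stack([np.ones_like(x1), x1, x2])  # columns: 1, x1, x2
b, *_ = np.linalg.lstsq(X, y, rcond=None)        # b = (b0, b1, b2)

yhat = X @ b      # fitted values
resid = y - yhat  # residuals e_i

print(np.round(b, 6))  # prints: [1. 1. 2.]
```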
The total sum of squares, SSTot = Σ(y_i − ȳ)², is (still) most of the sample variance
of the n values y_i and measures raw variation in the response variable. Just as in SLR,
the sum of squared residuals Σe_i² is a measure of variation in response remaining
unaccounted for after fitting the equation to the data. As in SLR, people call it the error
sum of squares and write

   SSE = Σe_i² = Σ(y_i − ŷ_i)²

(The formula looks exactly like the one for SLR. It is simply the case that now ŷ_i is
computed using all k inputs, not just a single x.) One is still guaranteed that
SSTot ≥ SSE. So the difference SSTot − SSE is a non-negative measure of
variation accounted for in fitting the linear equation to the data. As in SLR, people call it
the regression sum of squares and write

   SSR = SSTot − SSE

The coefficient of (multiple) determination expresses SSR as a fraction of SSTot and
is

   R² = SSR/SSTot

which is interpreted as "the fraction of raw variation in y accounted for in the model
fitting process." This quantity can also be interpreted in terms of a correlation, as it turns
out to be the square of the sample linear correlation between the observations y_i and the
fitted or predicted values ŷ_i.
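This correlation interpretation can be checked numerically. A Python/numpy sketch on invented (and deliberately noisy) two-predictor data:

```python
# Sketch: in MLR, R^2 equals the squared sample correlation between the
# observed y_i and the fitted yhat_i. Data invented, with some noise.
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0, 2.0])
y = np.array([4.2, 2.9, 8.3, 6.8, 12.1, 10.7])  # roughly 1 + x1 + 2*x2

X = np.column_stack([np.ones_like(x1), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ b

sse = np.sum((y - yhat) ** 2)
sstot = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - sse / sstot               # = SSR/SSTot

corr = np.corrcoef(y, yhat)[0, 1]    # sample correlation of y and yhat
print(np.isclose(r2, corr ** 2))  # prints: True
```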
Parameter Estimates for MLR
The descriptive statistics for (x1, x2, ..., xk, y) data can be used to provide "single
number estimates" of the (typically unknown) parameters of the multiple linear
regression model. That is, the least squares coefficients b0, b1, b2, ..., bk serve as
estimates of the parameters β0, β1, β2, ..., βk. The first of these is a kind of
high-dimensional "intercept," and in the case where the predictors are not functionally related,
the others serve as rates of change of average y with respect to a single x, provided the
other x's are held fixed.
The variance of y for fixed values x1, x2, ..., xk can be estimated by a kind of average
of squared residuals

   s² = (1/(n − k − 1)) Σe_i² = SSE/(n − k − 1)

The square root of this "regression sample variance" is s = sqrt(s²) and serves as a single
number estimate of σ.
Interval-Based Inference Methods for MLR
The normal multiple linear regression model provides inference formulas for model
parameters. Confidence limits for σ are

   s·sqrt((n − k − 1)/χ²_upper)  and  s·sqrt((n − k − 1)/χ²_lower)

where χ²_upper and χ²_lower are upper and lower percentage points of the χ² distribution with
ν = n − k − 1 degrees of freedom. Confidence limits for βj (the rate of change of
average y with respect to xj) are

   bj ± t·SE_bj

where t is a quantile of the t distribution with ν = n − k − 1 degrees of freedom and
SE_bj is a standard error of bj. There is no simple formula for SE_bj, and in particular, one
can NOT simply somehow use the formula from simple linear regression in this more
complicated context. It IS the case that this standard error is a multiple of s, but we will
have to rely upon JMP to provide it for us.
Confidence limits for μ_{y|x1,x2,...,xk} = β0 + β1·x1 + β2·x2 + ⋯ + βk·xk (the mean
value of y at a particular choice of the x1, x2, ..., xk) are

   ŷ ± t·SE_μ̂

where SE_μ̂ depends upon which values of x1, x2, ..., xk are under consideration. There
is no simple formula for SE_μ̂, and in particular one can NOT simply somehow use the
formula from simple linear regression in this more complicated context. It IS the case
that this standard error is a multiple of s, but we will have to rely upon JMP to provide it
for us.
Prediction limits for an additional observation y at a given vector (x1, x2, ..., xk)
are

   ŷ ± t·SE_ŷ

where it remains true (as in SLR) that SE_ŷ = sqrt(s² + SE_μ̂²), but otherwise there are no
simple formulas for it, and in particular one can NOT simply somehow use the formula
from simple linear regression in this more complicated context.
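For completeness, the standard errors the text delegates to JMP do have matrix expressions: SE_bj is s times the square root of the j-th diagonal entry of (X′X)⁻¹, and SE_μ̂ = s·sqrt(x0′(X′X)⁻¹x0) for the predictor vector x0 (with a leading 1). A Python/numpy sketch on invented data; this matrix formulation goes beyond the notation used in the text:

```python
# Sketch (matrix notation, beyond what the text uses): MLR standard
# errors come from (X'X)^{-1}. Data are invented.
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0, 2.0])
y = np.array([4.2, 2.9, 8.3, 6.8, 12.1, 10.7])

X = np.column_stack([np.ones_like(x1), x1, x2])
n, p = X.shape  # p = k + 1 = 3
b, *_ = np.linalg.lstsq(X, y, rcond=None)

resid = y - X @ b
s2 = resid @ resid / (n - p)  # SSE/(n - k - 1)
s = np.sqrt(s2)

xtx_inv = np.linalg.inv(X.T @ X)
se_b = s * np.sqrt(np.diag(xtx_inv))     # SE_b0, SE_b1, SE_b2

x0 = np.array([1.0, 2.5, 1.5])           # (1, x1, x2) at a point of interest
se_mu = s * np.sqrt(x0 @ xtx_inv @ x0)   # SE for the mean at x0
se_yhat = np.sqrt(s2 + se_mu ** 2)       # SE for an individual y at x0

print(se_yhat > se_mu)  # prints: True
```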
Hypothesis Tests and MLR
The normal multiple linear regression model supports hypothesis testing. H0: βj = # can
be tested using the test statistic

   T = (bj − #)/SE_bj

and a t_{n−k−1} reference distribution. H0: μ_{y|x1,x2,...,xk} = # can be tested using the test
statistic

   T = (ŷ − #)/SE_μ̂

and a t_{n−k−1} reference distribution.
ANOVA and MLR
As in SLR, the breaking down of SSTot into SSR and SSE can be thought of as a kind
of "analysis of variance" in y, and summarized in a special kind of table. The general
form for MLR is as below.

   ANOVA Table (for MLR Overall F Test)
   Source       SS      df          MS                        F
   Regression   SSR     k           MSR = SSR/k               F = MSR/MSE
   Error        SSE     n − k − 1   MSE = SSE/(n − k − 1)
   Total        SSTot   n − 1

(Note that as in SLR, the mean square for error is, in fact, the estimate of σ² (i.e.
MSE = s²).)
As it turns out, the ratio in the "F" column can be used as a test statistic for the hypothesis
H0: β1 = β2 = ⋯ = βk = 0. The appropriate reference distribution is the F_{k,n−k−1}
distribution.
"Partial F Tests" in MLR
It is possible to use ANOVA ideas to invent F tests for investigating whether some whole
group of β's (short of the entire set) are all 0. For example, one might want to test the
hypothesis

   H0: β_{l+1} = β_{l+2} = ⋯ = βk = 0

(This is the hypothesis that only the first l of the k input variables have any impact on
the mean system response ... the hypothesis that the first l of the x's are adequate to
predict y ... the hypothesis that after accounting for the first l of the x's, the others do not
contribute "significantly" to one's ability to explain or predict y.)
If we call the model for y in terms of all k of the predictors the "full model" and the
model for y involving only x1 through xl the "reduced model," then an F test of the
above hypothesis can be made using the statistic

   F = [(SSR_Full − SSR_Reduced)/(k − l)] / MSE_Full
     = [(R²_Full − R²_Reduced)/(1 − R²_Full)] · [(n − k − 1)/(k − l)]

and an F_{k−l,n−k−1} reference distribution. SSR_Full ≥ SSR_Reduced, so the numerator here is
non-negative.
Finding a p-value for this kind of test is a means of judging whether R² for the full model
is "significantly"/detectably larger than R² for the reduced model. (Caution here:
statistical significance is not the same as practical importance. With a big enough data set,
essentially any increase in R² will produce a small p-value.) It is reasonably common to
expand the basic MLR ANOVA table to organize calculations for this test statistic. This
is

   (Expanded) ANOVA Table (for MLR)
   Source                             SS                   df          MS                                 F
   Regression                         SSR_Full             k           MSR_Full = SSR_Full/k              F = MSR_Full/MSE_Full
     x1, ..., xl                      SSR_Red              l
     x_{l+1}, ..., xk | x1, ..., xl   SSR_Full − SSR_Red   k − l       (SSR_Full − SSR_Red)/(k − l)       [(SSR_Full − SSR_Red)/(k − l)]/MSE_Full
   Error                              SSE_Full             n − k − 1   MSE_Full = SSE_Full/(n − k − 1)
   Total                              SSTot                n − 1
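A Python/numpy sketch of the partial F statistic on invented data (full model: x1 and x2; reduced model: x1 alone, so k = 2 and l = 1), computed from both of the equivalent formulas above:

```python
# Sketch: the partial F statistic for H0: beta2 = 0 (full model: x1, x2;
# reduced model: x1 only), computed both ways. Data are invented.
import numpy as np

def sse_of_fit(X, y):
    """Least squares fit; returns the error sum of squares."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return resid @ resid

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0, 2.0])
y = np.array([4.2, 2.9, 8.3, 6.8, 12.1, 10.7])
n, k, l = len(y), 2, 1

X_full = np.column_stack([np.ones(n), x1, x2])
X_red = np.column_stack([np.ones(n), x1])

sse_full = sse_of_fit(X_full, y)
sse_red = sse_of_fit(X_red, y)
sstot = np.sum((y - y.mean()) ** 2)
ssr_full, ssr_red = sstot - sse_full, sstot - sse_red

mse_full = sse_full / (n - k - 1)
F = ((ssr_full - ssr_red) / (k - l)) / mse_full

# equivalent form in terms of the two R^2's
r2_full, r2_red = ssr_full / sstot, ssr_red / sstot
F_alt = ((r2_full - r2_red) / (1.0 - r2_full)) * ((n - k - 1) / (k - l))

print(np.isclose(F, F_alt))  # prints: True
```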
Standardized Residuals in MLR
As in SLR, people sometimes wish to standardize residuals before using them to do
model checking/diagnostics. While it is not possible to give a simple formula for the
"standard error of e_i" without using matrix notation, most MLR programs will compute
these values. The standardized residual for data point i is then (as in SLR)

   e_i* = e_i / (standard error of e_i)

If the normal multiple linear regression model is a good one, these "ought" to look as if
they are approximately normal with mean 0 and standard deviation 1.
Intervals and Tests for Linear Combinations of β's in MLR
It is sometimes important to do inference for a linear combination of MLR model
coefficients

   L = c0·β0 + c1·β1 + c2·β2 + ⋯ + ck·βk

(where c0, c1, ..., ck are known constants). Note, for example, that μ_{y|x1,x2,...,xk} is of this
form for c0 = 1, c1 = x1, c2 = x2, ..., and ck = xk. Note too that a difference in mean
responses at two sets of predictors, say (x1, x2, ..., xk) and (x1′, x2′, ..., xk′), is of this
form for c0 = 0, c1 = x1 − x1′, c2 = x2 − x2′, ..., and ck = xk − xk′.
An obvious estimate of L is

   L̂ = c0·b0 + c1·b1 + c2·b2 + ⋯ + ck·bk

Confidence limits for L are

   L̂ ± t·(standard error of L̂)

There is no simple formula for the "standard error of L̂." It IS the case that this standard
error is a multiple of s, but we will have to rely upon JMP to provide it for us.
(Computation of L̂ and its standard error is under the "Custom Test" option in JMP.)
H0: L = # can be tested using the test statistic

   T = (L̂ − #)/(standard error of L̂)

and a t_{n−k−1} reference distribution. Or, if one thinks about it for a while, it is possible to
find a reduced model that corresponds to the restriction that the null hypothesis places on
the MLR model and to use an "l = k − 1" partial F test (with 1 and n − k − 1 degrees of
freedom) equivalent to the t test for this purpose.
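Behind JMP's "Custom Test" sits a matrix formula: standard error of L̂ = s·sqrt(c′(X′X)⁻¹c), with c = (c0, c1, ..., ck). A Python/numpy sketch on invented data, with c chosen to encode a difference in mean responses at two predictor vectors (the second example of a linear combination above):

```python
# Sketch (matrix notation, beyond what the text uses): inference for
# L = c0*beta0 + ... + ck*betak via SE_Lhat = s * sqrt(c' (X'X)^{-1} c).
# Data invented; c encodes a difference in means at two predictor vectors.
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0, 2.0])
y = np.array([4.2, 2.9, 8.3, 6.8, 12.1, 10.7])

X = np.column_stack([np.ones_like(x1), x1, x2])
n, p = X.shape
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
s = np.sqrt(resid @ resid / (n - p))

# difference in mean response at (x1, x2) = (4, 2) versus (2, 1):
# c0 = 0, c1 = 4 - 2 = 2, c2 = 2 - 1 = 1
c = np.array([0.0, 2.0, 1.0])

L_hat = c @ b                                       # estimate of L
se_L = s * np.sqrt(c @ np.linalg.inv(X.T @ X) @ c)  # standard error of L_hat
T = L_hat / se_L                                    # test statistic for H0: L = 0

print(se_L > 0)  # prints: True
```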