Chapter 8. Multiple Regression
A. Multiple Regression Model
Many applications of regression analysis involve situations in which there is more than one
independent variable. A regression model that contains more than one independent variable is called a
multiple regression model.

Multiple Linear Regression Model
$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k + \varepsilon$
$Y$ is the dependent variable
$X_1, X_2, \ldots, X_k$ are the independent variables (known constants)
$\beta_0, \beta_1, \ldots, \beta_k$ are the unknown regression coefficients (parameters)
$\varepsilon$ is the random error

Assumptions on the random error $\varepsilon$
$\varepsilon \overset{iid}{\sim} Normal[0, \sigma^2]$, so that $Y \sim Normal[\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k, \sigma^2]$ ($\sigma^2$ unknown)
1) the mean of the error variable $\varepsilon$ is 0 ($E[\varepsilon] = 0$)
2) the variance of $\varepsilon$ is $\sigma^2$ ($V[\varepsilon] = \sigma^2$)
3) the errors are independent
4) the errors are normally distributed
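These assumptions can be made concrete with a small simulation sketch; the coefficient values, sample size, and $\sigma$ below are illustrative assumptions, not taken from the notes:

```python
import numpy as np

# Simulate n observations from Y = b0 + b1*X1 + b2*X2 + eps,
# with eps ~ iid Normal[0, sigma^2], as in the assumptions above.
rng = np.random.default_rng(0)
n = 200
b0, b1, b2, sigma = 2.0, 1.5, -0.8, 1.0   # illustrative values

X1 = rng.uniform(0, 10, n)
X2 = rng.uniform(0, 10, n)
eps = rng.normal(0.0, sigma, n)           # iid Normal[0, sigma^2] errors
y = b0 + b1 * X1 + b2 * X2 + eps          # the model equation

mean_eps = eps.mean()                     # sample analogue of E[eps] = 0
```

With a large enough sample, the sample mean of the simulated errors sits close to the assumed $E[\varepsilon] = 0$.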
B. Estimation
Suppose that we have n sets of observations $(x_{i1}, x_{i2}, \ldots, x_{ik}, y_i)$, $i = 1, \ldots, n$ to be used to estimate
$\beta_0, \beta_1, \ldots, \beta_k$ in a multiple linear regression model.
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i \quad (i = 1, 2, \ldots, n)$

Estimating the Coefficients by the Method of Least Squares
Choose the values of $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$ that minimize $SS$ as estimators of $\beta_0, \beta_1, \ldots, \beta_k$:
$SS = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2} - \cdots - \beta_k x_{ik})^2$
▪ Least Squares Estimates $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$
Find $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$ that satisfy
$\frac{\partial SS}{\partial \beta_j} = 0, \quad j = 0, 1, 2, \ldots, k$
▶ $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$ : let the computer produce these values
The fitted or estimated regression model is
$\hat y_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_k x_{ik} \quad (i = 1, 2, \ldots, n)$
▪ Residual $e_i$
$e_i = y_i - \hat y_i = y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_k x_{ik}$

Estimating $\sigma^2$
The residuals $e_i = y_i - \hat y_i$ are used to obtain an estimate of $\sigma^2$.
▪ SSE (error sum of squares)
$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat y_i)^2 = \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_k x_{ik})^2$
▪ MSE (mean square error, $\frac{SSE}{n-k-1}$) : $\hat\sigma^2$ (estimator of $\sigma^2$)
$E[MSE] = E\left[\frac{SSE}{n-k-1}\right] = \sigma^2$
▶ $SSE$, $MSE$ ($\hat\sigma^2$) : let the computer produce these values
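A sketch of how SSE and MSE estimate $\sigma^2$. The data are simulated with an assumed $\sigma = 2$ so the estimate can be checked against the truth:

```python
import numpy as np

# Simulated data with known error standard deviation sigma = 2.0 (an assumption)
rng = np.random.default_rng(2)
n, k, sigma = 500, 3, 2.0
X = rng.normal(0, 1, (n, k))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, sigma, n)

Xd = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(Xd, y, rcond=None)
e = y - Xd @ beta_hat

SSE = np.sum(e ** 2)            # error sum of squares
MSE = SSE / (n - k - 1)         # unbiased estimator of sigma^2
sigma_hat = np.sqrt(MSE)        # should land near the assumed sigma = 2.0
```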
C. Hypothesis Tests in Multiple Regression
In simple linear regression, the t-test and the F-test provide the same conclusion; that is, if the null
hypothesis is rejected, we conclude that $\beta_1 \neq 0$. In multiple regression, the t-test and the F-test have
different purposes.
1) The F-test is used to determine whether a significant relationship exists between the dependent
variable and the set of all the independent variables. (a test for overall significance)
2) The t-test is used to determine whether each of the individual independent variables is significant. A
separate t-test is conducted for each of the independent variables in the model. (a test for individual
significance)
Use of t-Tests
For each independent variable $X_1, X_2, \ldots, X_k$, we can test to determine whether there is
enough evidence of a linear relationship between it and the dependent variable $Y$ for
the entire population.
Hypothesis (Testing the significance of coefficients)
$H_0 : \beta_j = 0$ vs $H_1 : \beta_j \neq 0 \quad (j = 1, \ldots, k)$
Failure to reject $H_0 : \beta_j = 0$ is equivalent to concluding that there is no linear
relationship between $X_j$ and $Y$. Alternatively, rejecting $H_0 : \beta_j = 0$ implies that
there is enough evidence of a linear relationship between $X_j$ and $Y$.
Sampling Distribution of the Test Statistic (under $H_0 : \beta_j = 0$)
$T = \frac{\hat\beta_j - 0}{S.E.[\hat\beta_j]} \sim t[n-k-1]$
▶ $S.E.[\hat\beta_j]$ : let the computer produce these values
Rejection Rule
Critical Value Approach: Reject $H_0$ if $|T| = \left| \frac{\hat\beta_j}{S.E.[\hat\beta_j]} \right| \geq t_{\alpha/2, n-k-1}$
p-value Approach: Reject $H_0$ if p-value $< \alpha$
※ $100(1-\alpha)\%$ Confidence Interval on the slope $\beta_j$
$\left( \hat\beta_j - t_{\alpha/2, n-k-1} \cdot S.E.[\hat\beta_j],\ \hat\beta_j + t_{\alpha/2, n-k-1} \cdot S.E.[\hat\beta_j] \right)$
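The t statistic, two-sided p-value, and confidence interval above can be sketched as follows. The data are simulated, and the standard errors come from the usual $MSE \cdot (X'X)^{-1}$ formula, which the notes leave to the computer:

```python
import numpy as np
from scipy import stats

# Simulated data: beta_1 = 2 is a real effect, beta_2 = 0 by construction
rng = np.random.default_rng(3)
n, k = 60, 2
X = rng.normal(0, 1, (n, k))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(0, 1, n)

Xd = np.column_stack([np.ones(n), X])
beta_hat = np.linalg.lstsq(Xd, y, rcond=None)[0]
e = y - Xd @ beta_hat
MSE = np.sum(e ** 2) / (n - k - 1)
se = np.sqrt(MSE * np.diag(np.linalg.inv(Xd.T @ Xd)))   # S.E.[beta_hat_j]

j = 1                                                   # test H0: beta_1 = 0
T = beta_hat[j] / se[j]
p_value = 2 * stats.t.sf(abs(T), df=n - k - 1)          # two-sided p-value
t_crit = stats.t.ppf(0.975, df=n - k - 1)               # t_{alpha/2, n-k-1}, alpha = 0.05
ci = (beta_hat[j] - t_crit * se[j], beta_hat[j] + t_crit * se[j])
```

Here the true $\beta_1 = 2$, so the t-test rejects $H_0 : \beta_1 = 0$ decisively.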

The F-test

Hypothesis (Analysis of Variance Approach to Test Significance of Regression)
$H_0 : \beta_1 = \beta_2 = \cdots = \beta_k = 0$
$H_1 :$ At least one $\beta_j \neq 0 \quad (j = 1, 2, \ldots, k)$
If $H_0 : \beta_1 = \beta_2 = \cdots = \beta_k = 0$ is rejected, the test gives us sufficient statistical evidence
to conclude that one or more of the parameters is not equal to zero and that the overall
relationship between $Y$ and the set of independent variables $X_1, X_2, \ldots, X_k$ is
significant. However, if $H_0$ cannot be rejected, we do not have sufficient evidence to
conclude that a significant relationship is present.
Sampling Distribution of the Sum of Squares
The procedure partitions the total variability in the response variable into meaningful components
as the basis for the test.
- Decomposition of variation
$\sum_{i=1}^{n} (y_i - \bar y)^2 = \sum_{i=1}^{n} (\hat y_i - \bar y)^2 + \sum_{i=1}^{n} (y_i - \hat y_i)^2$
⇒ SST (total sum of squares) is partitioned into SSR (regression sum of squares)
and SSE (error sum of squares)
▶ $SST$, $SSR$ : let the computer produce these values
- Decomposition of Degrees of Freedom
$(n-1) = (k) + (n-k-1)$
⇒ SST's degrees of freedom is partitioned into SSR's degrees of freedom and SSE's
degrees of freedom
- $\frac{SSR}{\sigma^2}$, $\frac{SSE}{\sigma^2}$ are independently distributed chi-square random variables
with $k$, $(n-k-1)$ degrees of freedom (by Cochran's Theorem).
- we can show that
$E\left[\frac{SSE}{n-k-1}\right] = \sigma^2$ and, under $H_0 : \beta_1 = \beta_2 = \cdots = \beta_k = 0$, $E\left[\frac{SSR}{k}\right] = \sigma^2$.
If $H_0 : \beta_1 = \beta_2 = \cdots = \beta_k = 0$ is not true, $E\left[\frac{SSR}{k}\right] > \sigma^2$.
Test Statistic
$F = \frac{MSR}{MSE} = \frac{SSR/k}{SSE/(n-k-1)} \sim F[k, n-k-1]$
⇒ Under $H_0 : \beta_1 = \beta_2 = \cdots = \beta_k = 0$, $MSR \approx MSE$.
⇒ If $MSR > MSE$ significantly, then reject $H_0$ (Always Upper Tail)
Rejection Rule
Critical Value Approach: Reject $H_0$ if $F = \frac{MSR}{MSE} \geq F_{\alpha, k, n-k-1}$
p-value Approach: Reject $H_0$ if p-value $< \alpha$
Table Result

Source of     Sum of     Degrees of    Mean Square            F
Variation     Squares    Freedom
Regression    SSR        k             MSR = SSR/k            MSR/MSE
Error         SSE        n-k-1         MSE = SSE/(n-k-1)
Total         SST        n-1
D. Estimating Values of the Dependent Variable
As was the case with simple linear regression, we can use the multiple regression equation in two ways:
we can produce the prediction interval for a particular value of $y$, and we can produce the confidence
interval estimate of the expected value of $y$.
▶ $100(1-\alpha)\%$ Confidence Interval about the mean response $E[y_0]$, $100(1-\alpha)\%$ Prediction Interval
of a new observation $y_0$ : let the computer produce these values
E. Adequacy of the Regression Model

▪ Using $R^2$ in the Multiple Regression Model
Adding independent variables causes the prediction error to become smaller, thus reducing SSE.
When SSE becomes smaller, SSR becomes larger, causing $R^2 = \frac{SSR}{SST}$ to increase.
⇒ The $R^2$ value for a regression can be made arbitrarily high simply by including more and more
predictors in the model.
▪ Adjusted $R^2$
The Adjusted $R^2$ statistic essentially penalizes the analyst for adding terms to the model. It is an
easy way to guard against overfitting, that is, including independent variables that are not really
useful.
The Adjusted $R^2$ is given by
$Adj\,R^2 = 1 - \frac{MSE}{MST} = 1 - \frac{SSE/(n-k-1)}{SST/(n-1)}$
⇒ Because $SSE/(n-k-1)$ is the mean square error and $SST/(n-1)$ is a constant, $Adj\,R^2$
will only increase when a variable is added to the model if the new variable reduces the mean square
error.
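A sketch contrasting $R^2$ with Adjusted $R^2$: the second predictor below is pure noise by construction, so $R^2$ still cannot decrease when it is added, while Adjusted $R^2$ applies the penalty described above. The data are simulated:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 40
x1 = rng.normal(0, 1, n)
x2 = rng.normal(0, 1, n)                 # unrelated to y by construction
y = 1.0 + 2.0 * x1 + rng.normal(0, 1, n)

def fit_r2(cols):
    # fit OLS with an intercept and return (R^2, Adjusted R^2)
    Xd = np.column_stack([np.ones(n)] + cols)
    k = Xd.shape[1] - 1
    beta = np.linalg.lstsq(Xd, y, rcond=None)[0]
    SSE = np.sum((y - Xd @ beta) ** 2)
    SST = np.sum((y - y.mean()) ** 2)
    r2 = 1 - SSE / SST                               # = SSR / SST
    adj = 1 - (SSE / (n - k - 1)) / (SST / (n - 1))  # Adjusted R^2
    return r2, adj

r2_small, adj_small = fit_r2([x1])
r2_big, adj_big = fit_r2([x1, x2])       # extra useless predictor added
```

$R^2$ can only stay the same or grow when a predictor is added; Adjusted $R^2$ is always at most $R^2$ and rises only if the new variable reduces MSE.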
F. Regression Diagnostics (Residual Analysis)

▪ The Required Conditions for the Validity of Regression Analysis
Estimation of the model parameters requires that the errors are uncorrelated normal random variables
with mean zero and constant variance.

▪ Regression Diagnostics (examining the adequacy of the regression model)
Most departures from the required conditions can be diagnosed by examining the residuals
$e_i = y_i - \hat y_i \quad (i = 1, 2, \ldots, n)$.
1) Non-normality
⇒ Histogram of the residuals $e_i$
2) Heteroscedasticity
⇒ Residual Plots (plot the residuals $e_i$ against $\hat y_i$, $i = 1, 2, \ldots, n$)
3) Non-independence (Autocorrelation) of the error variable
⇒ Residual Plots (plot the residuals $e_i$ against $\hat y_i$ (or against the order $i$), $i = 1, 2, \ldots, n$)
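A numerical companion to these residual checks, on simulated data. The Durbin-Watson statistic used for the autocorrelation check is not defined in these notes; it is a standard summary of the residual-versus-order plot, with values near 2 suggesting independent errors:

```python
import numpy as np

# Simulated simple regression data with well-behaved errors
rng = np.random.default_rng(6)
n = 100
x = rng.uniform(0, 10, n)
y = 3.0 + 1.2 * x + rng.normal(0, 1, n)

Xd = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(Xd, y, rcond=None)[0]
e = y - Xd @ beta                                  # residuals, in observation order

mean_e = e.mean()                                  # ~0 by the normal equations (intercept included)
dw = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)      # Durbin-Watson statistic (near 2 if independent)
```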

▪ Patterns for Residual Plots (figure omitted)
G. Regression Diagnostics (Multicollinearity)
In most regression problems, we find that there are dependencies among the independent variables
$X_1, \ldots, X_k$. In situations where these dependencies are strong, we say that multicollinearity exists.

The Effect of Multicollinearity
The variance of the estimated coefficient $\hat\beta_j \; (j = 1, 2, \ldots, k)$ can be expressed as
$Var[\hat\beta_j] = \frac{\sigma^2}{S_{x_j x_j}(1 - R_j^2)}$, where $S_{x_j x_j} = \sum_{i=1}^{n} (x_{ij} - \bar x_j)^2$ and $R_j^2$ is the coefficient of
determination resulting from regressing $x_j$ on the other $k-1$ regressor variables.

The stronger the linear dependency of $x_j$ on the remaining regressor variables, and
hence the stronger the multicollinearity, the larger the value of $R_j^2$ will be. ⇒ the
variance of $\hat\beta_j$ is inflated by the quantity $\frac{1}{(1 - R_j^2)}$.
As a result, multicollinearity can have some negative effects on the estimates of the
regression coefficients:
- the individual coefficients are not statistically significant, even though the overall regression
equation is strong and the predictive ability good
- the relative magnitudes and even the signs of the coefficients may defy interpretation
- the values of the individual regression coefficients may change radically with the removal or
addition of a predictor variable in the equation

Detection of the Presence of Multicollinearity
1) The variance inflation factors $VIF(\beta_j)$ are very useful measures of multicollinearity. Some
authors have suggested that if any variance inflation factor $VIF(\beta_j)$, $j = 1, 2, \ldots, k$ exceeds 10,
multicollinearity is a problem.
▪ Variance Inflation Factor for $\beta_j$
$VIF(\beta_j) = \frac{1}{(1 - R_j^2)}, \quad j = 1, 2, \ldots, k$
2) If the F-test for significance of regression is significant, but tests on the individual regression
coefficients are not significant, multicollinearity may be present.
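A sketch of computing $VIF(\beta_j)$ from the auxiliary regressions of each $x_j$ on the remaining predictors. Here `x2` is constructed to depend strongly on `x1`, so both should exceed the rule-of-thumb cutoff of 10, while the independent `x3` should not:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
x1 = rng.normal(0, 1, n)
x2 = 0.95 * x1 + 0.05 * rng.normal(0, 1, n)        # strong linear dependency on x1
x3 = rng.normal(0, 1, n)                           # independent predictor
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    # regress column j on the remaining columns (with intercept) to get R_j^2
    others = np.delete(X, j, axis=1)
    Xd = np.column_stack([np.ones(len(X)), others])
    beta = np.linalg.lstsq(Xd, X[:, j], rcond=None)[0]
    resid = X[:, j] - Xd @ beta
    r2_j = 1 - np.sum(resid ** 2) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
    return 1 / (1 - r2_j)                          # VIF(beta_j) = 1 / (1 - R_j^2)

vifs = [vif(X, j) for j in range(X.shape[1])]
```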

Multiple Regression Examples

Example 1
In order to determine whether or not the sales volume of a company ($Y$ in millions of dollars) is
related to advertising expenditures ($X_1$ in millions of dollars) and the number of salespeople ($X_2$),
data were gathered for 10 years. Part of the regression results is shown below.
Predictor    Coefficient    Standard Error
Constant     7.0174         1.8972
X1           8.6233         2.3968
X2           0.0858         0.1845

Analysis of Variance
Source        Degrees of Freedom    Sum of Squares    Mean Square    F
Regression    ?                     321.11            ?              ?
Error         ?                     63.39             ?
a) Use the above results and write the regression equation that can be used to predict sales.
b) Estimate the sales volume for an advertising expenditure of 3.5 million dollars and 45 salespeople.
Give your answer in dollars.
c) At $\alpha = 0.05$, test to determine if the fitted equation developed in Part a) represents a significant
relationship between the independent variables and the dependent variable.
d) At $\alpha = 0.05$, test to see if $\beta_1$ is significantly different from zero.
e) Determine the multiple coefficient of determination.
f) Compute the adjusted coefficient of determination.
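A sketch of the arithmetic for Example 1, using only the figures given in the tables (with $n = 10$ and $k = 2$, the regression and error degrees of freedom are 2 and 7):

```python
from scipy import stats

# Example 1: SSR = 321.11, SSE = 63.39, n = 10, k = 2
n, k = 10, 2
SSR, SSE = 321.11, 63.39
SST = SSR + SSE

MSR = SSR / k                      # fills the missing Mean Square (Regression)
MSE = SSE / (n - k - 1)            # fills the missing Mean Square (Error)
F = MSR / MSE                      # part c): overall F statistic
F_crit = stats.f.ppf(0.95, k, n - k - 1)

R2 = SSR / SST                     # part e): multiple coefficient of determination
adj_R2 = 1 - MSE / (SST / (n - 1)) # part f): adjusted coefficient of determination

# part b): y-hat = 7.0174 + 8.6233*(3.5) + 0.0858*(45), in millions of dollars
y_hat = 7.0174 + 8.6233 * 3.5 + 0.0858 * 45

# part d): t statistic for beta_1 with S.E. = 2.3968
t1 = 8.6233 / 2.3968
t_crit = stats.t.ppf(0.975, n - k - 1)
```

The F statistic exceeds its critical value (reject $H_0$ overall), and the t statistic for $\beta_1$ exceeds its critical value as well.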

Example 2
The following is part of the results of a regression analysis involving sales ($Y$ in millions of
dollars), advertising expenditures ($X_1$ in thousands of dollars), and number of salespeople ($X_2$)
for a corporation:

Analysis of Variance
Source        Degrees of Freedom    Sum of Squares    Mean Square    F
Regression    2                     822.088           ?              ?
Error         7                     736.012           ?
a) At the $\alpha = 0.05$ level of significance, test to determine if the model is significant. That is,
determine if there exists a significant relationship between the independent variables and the
dependent variable.
b) Determine the multiple coefficient of determination.
c) Determine the adjusted multiple coefficient of determination.
d) What has been the sample size for this regression analysis?
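A sketch of the arithmetic for Example 2, using the figures given in the ANOVA table:

```python
from scipy import stats

# Example 2: SSR = 822.088 (df = 2), SSE = 736.012 (df = 7)
k, df_err = 2, 7
n = df_err + k + 1                 # part d): sample size from the df decomposition
SSR, SSE = 822.088, 736.012
SST = SSR + SSE

MSR, MSE = SSR / k, SSE / df_err
F = MSR / MSE                      # part a): overall F statistic
F_crit = stats.f.ppf(0.95, k, df_err)

R2 = SSR / SST                     # part b)
adj_R2 = 1 - MSE / (SST / (n - 1)) # part c)
```

Here the F statistic falls below its critical value, so the model is not significant at $\alpha = 0.05$.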

Example 3
Below you are given a partial computer output based on a sample of 12 observations relating the
number of personal computers sold by a computer shop per month ($Y$), unit price ($X_1$ in $1,000s)
and the number of advertising spots ($X_2$) used on a local television station.

Predictor    Coefficient    Standard Error
Constant     17.145         7.865
X1           -0.104         3.282
X2           1.376          0.250
a) Use the output shown above and write an equation that can be used to predict the monthly sales of
computers.
b) Interpret the coefficients of the estimated regression equation found in Part a).
c) If the company charges $2,000 for each computer and uses 10 advertising spots, how many
computers would you expect them to sell?
d) At $\alpha = 0.05$, test to determine if the price is a significant variable.
e) At $\alpha = 0.05$, test to determine if the number of advertising spots is a significant variable.
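A sketch of the arithmetic for Example 3, using the coefficients and standard errors from the output (note the units: a $2,000 price enters as $X_1 = 2$):

```python
from scipy import stats

# Example 3: n = 12, k = 2; price is in $1,000s
n, k = 12, 2
b0, b1, b2 = 17.145, -0.104, 1.376
se1, se2 = 3.282, 0.250

# part c): price $2,000 -> X1 = 2, advertising spots X2 = 10
y_hat = b0 + b1 * 2 + b2 * 10

# parts d), e): t statistics against t_{0.025, n-k-1}
t_crit = stats.t.ppf(0.975, n - k - 1)
t_price = b1 / se1
t_spots = b2 / se2
```

Price is not a significant variable, while the number of advertising spots is.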

Example 4
Below you are given a partial computer output relating the price of a company's stock ($Y$ in dollars),
the Dow Jones industrial average ($X_1$), and the stock price of the company's major competitor ($X_2$
in dollars).

Analysis of Variance
Source        Degrees of Freedom    Sum of Squares    Mean Square    F
Regression    ?                     ?                 ?              ?
Error         20                    40                ?
Total         ?                     800
a) What has been the sample size for this regression analysis?
b) At the $\alpha = 0.05$ level of significance, test to determine if the model is significant. That is,
determine if there exists a significant relationship between the independent variables and the
dependent variable.
c) Determine the multiple coefficient of determination.
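A sketch of the arithmetic for Example 4 ($k = 2$ predictors, so the error degrees of freedom $n - k - 1 = 20$ give the sample size):

```python
from scipy import stats

# Example 4: k = 2 predictors, error df = 20, SSE = 40, SST = 800
k, df_err = 2, 20
SSE, SST = 40.0, 800.0
n = df_err + k + 1                 # part a): sample size
SSR = SST - SSE                    # fills the missing Regression sum of squares

MSR, MSE = SSR / k, SSE / df_err
F = MSR / MSE                      # part b): overall F statistic
F_crit = stats.f.ppf(0.95, k, df_err)
R2 = SSR / SST                     # part c)
```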

Example 5
A microcomputer manufacturer has developed a regression model relating his sales ($Y$ in $10,000s)
with three independent variables. The three independent variables are price per unit (Price in
$100s), advertising (ADV in $1,000s) and the number of product lines (Lines). Part of the
regression results is shown below.

Predictor    Coefficient    Standard Error
Constant     1.0211         22.8752
Price        -0.1524        0.1411
ADV          0.8849         0.2886
Lines        -0.1463        1.5340

Analysis of Variance
Source        Degrees of Freedom    Sum of Squares    Mean Square    F
Regression    ?                     2708.61           ?              ?
Error         14                    2840.51           ?
a) Use the above results and write the regression equation that can be used to predict sales.
b) If the manufacturer has 10 product lines, advertising of $40,000, and the price per unit is $3,000,
what is your estimate of their sales? Give your answer in dollars.
c) Compute the coefficient of determination and fully interpret its meaning.
d) At $\alpha = 0.05$, test to see if there is a significant relationship between sales and unit price.
e) At $\alpha = 0.05$, test to see if there is a significant relationship between sales and the number of
product lines.
f) Is the regression model significant? (Perform an F-test.)
g) Fully interpret the meaning of the regression coefficient of the price per unit, that is, the slope for
the price per unit.
h) What has been the sample size for this analysis?
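A sketch of the arithmetic for Example 5 ($k = 3$ predictors and error degrees of freedom 14, so $n = 18$; note the units: a $3,000 price enters as Price = 30, $40,000 of advertising as ADV = 40):

```python
from scipy import stats

# Example 5: k = 3 (Price, ADV, Lines), error df = 14
k, df_err = 3, 14
n = df_err + k + 1                 # part h): sample size
SSR, SSE = 2708.61, 2840.51
SST = SSR + SSE

# part b): Price = 30 (in $100s), ADV = 40 (in $1,000s), Lines = 10
y_hat = 1.0211 - 0.1524 * 30 + 0.8849 * 40 - 0.1463 * 10   # in $10,000s
sales_dollars = y_hat * 10000

# parts c), f): R^2 and the overall F-test
R2 = SSR / SST
F = (SSR / k) / (SSE / df_err)
F_crit = stats.f.ppf(0.95, k, df_err)

# parts d), e): individual t-tests
t_crit = stats.t.ppf(0.975, df_err)
t_price = -0.1524 / 0.1411
t_lines = -0.1463 / 1.5340
```

The overall model is significant, yet neither the price slope nor the product-lines slope is individually significant, the pattern that section G flags as a possible sign of multicollinearity.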