Chapter 15 Multiple Regression Model Building

advertisement

Statistics for Managers

Using Microsoft® Excel

5th Edition

Chapter 15

Multiple Regression Model Building

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 15-1

Learning Objectives

In this chapter, you learn:

 To use quadratic terms in a regression model

 To measure the correlation among independent variables

 To build a regression model, using either the stepwise or best-subsets approach

 To avoid the pitfalls involved in developing a multiple regression model

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 15-2

Nonlinear Relationships

The relationship between the dependent variable and an independent variable may not be linear

Review the scatter plot to check for non-linear relationships

Example: Quadratic model

Y

i

β

0

β

1

X

1i

β

2

2

X

1i

ε

i

 The second independent variable is the square of the first variable

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 15-3

Y

Quadratic Regression Model

Y

i

β

0

β

1

X

1i

β

2

X

2

1i

ε

i

Quadratic models may be considered when the scatter diagram takes on one of the following shapes:

Y Y Y

β

β

1

2

< 0

> 0

X

1 β

1

β

2

> 0

> 0

X

1 β

1

β

2

< 0

< 0

β

1

β

2

= the coefficient of the linear term

= the coefficient of the squared term

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

X

1 β

1

β

2

> 0

< 0

X

1

Chap 15-4

Residual Analysis

Multiple Regression with X

1 and X

2

A non-linear relationship

X2 Residual Plot

 b

0

 b

1

X

1

 b

2

X

2

X

1

Residual plot was randomly distributed

0 2 4 6 8 10 12

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 15-5

Quadratic Model Worksheet

Item, i

1

2

3

4

4

1

Y i

1

3

X

1i

1

8

3

5

X

2i

3

5

2

6

Multiply X

2 1 i by X

2 to get X

Run regression with Y i

, X

1

* X

2

, , X

.

2

2

, X

2i

1

X

2

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

X 2

2i

9

25

4

36

Chap 15-6

Quadratic Regression

Equation

Collect data and use Excel to calculate the regression equation:

ˆ i

 b

0

 b

1

X

1i

 b

2

X 2

1i

 Test for Overall Relationship

H

H

0

1

: β

: β

1

1

= β

2

= 0 (no overall relationship between X and Y) and/or β

2

≠ 0 (there is a relationship between X and Y)

 F-test statistic =

MSR

MSE

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 15-7

Testing for Significance

Quadratic Effect

Testing the Quadratic Effect

Consider the quadratic regression equation

^

Y i

 b

0

 b

1

X

1i

 b

2

2

X

1i

Hypotheses

H

0

: β

2

= 0

H

1

: β

2

0

(The quadratic term does not improve the model)

(The quadratic term improves the model)

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 15-8

Testing for Significance

Quadratic Effect

Testing the Quadratic Effect

Hypotheses

H

0

: β

2

= 0

H

1

: β

2

0

(The quadratic term does not improve the model)

(The quadratic term improves the model)

The test statistic is t

 b

2

 β

2

S b

2 where: p -value=TDIST(t n-3

) b

2

= estimated slope

β

2

= hypothesized slope (zero) d.f.

 n

3

S b2

= standard error of the slope

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 15-9

Testing the Quadratic Term

(continued)

 A second test of the quadratic effect

Compare the adjusted r 2 from the simple regression to the adjusted r 2 from the quadratic model

 If the adjusted r 2 of the quadratic model is greater than the adjusted r 2 of the simple linear regression then the quadratic model has explained a larger percentage of the variability in Y

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 15-10

70

78

85

87

99

40

54

67

Purity

3

7

8

15

22

33

14

15

15

16

17

2

3

5

Filter

Time

1

Quadratic Regression

Example

Purity increases as filter time increases:

Purity vs. Time

100

80

7

8

60

10

12

13

40

20

0

0

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

5 10

Time

15 20

Chap 15-11

Intercept

Time

Quadratic Regression

Example

Simple (linear) regression results: t statistic, F statistic, and adjusted r 2 are all high, but the residuals are not random:

Coefficients

-11.28267

5.98520

Standard

Error

3.46805

0.30966

t Stat P-value

-3.25332

0.00691

19.32819

2.078E-10

Time Residual Plot

R Square

Regression Statistics

0.96888

Adjusted R Square 0.96628

Standard Error 6.15997

F

373.57904

Significance F

2.0778E-10

10

5

0

-5 0

-10

5 10 15

Time

20

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 15-12

Quadratic Regression

Example

Quadratic regression results:

Y = 1.539 + 1.565 Time + 0.245 (Time) 2

Intercept

Time

Time-squared

Coefficients

1.53870

1.56496

0.24516

Standard

Error

2.24465

0.60179

0.03258

t Stat P-value

0.68550

2.60052

0.50722

0.02467

7.52406

1.165E-05

Regression Statistics

F Significance F

R Square 0.99494

1080.7330

2.368E-13

Adjusted R Square

Standard Error

0.99402

2.59513

The quadratic term is significant and improves the model: adjusted r 2 is higher and S

YX lower, residuals are now random.

is

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

10

5

0

-5

0

Time Residual Plot

5 10

Time

15

10

5

0

-5

0

Time-squared Residual Plot

100 200

Time-squared

300

Chap 15-13

400

20

Collinearity

Collinearity: High correlation exists among two or more independent variables

The correlated variables contribute redundant information to the multiple regression model

 Including two highly correlated independent variables can adversely affect the regression results

 No new information provided

 Can lead to unstable coefficients (large standard error and low t-values)

 Coefficient signs may not match prior expectations

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 15-14

Some Indications of Strong

Collinearity

Incorrect signs on the coefficients

Large change in the value of a previous coefficient when a new variable is added to the model

A previously significant variable becomes insignificant when a new independent variable is added

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 15-15

Detecting Collinearity

Variance Inflationary Factor

VIF j is used to measure collinearity:

VIF j

1

1

R

2 j where R 2 j is the coefficient of determination from a regression model that uses X j as the dependent variable and all other X variables as the independent variables

If VIF j

> 10 , X j is highly correlated with the other independent variables

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 15-16

12

13

14

10

11

8

9

6

7

4

5

Week

1

2

3

340

300

440

450

430

470

450

490

Pie

Sales

350

460

350

430

350

380

Example: Pie Sales Model

7.20

7.90

5.90

5.00

4.50

6.40

7.00

5.00

Price

($)

5.50

7.50

8.00

8.00

6.80

7.50

3.5

3.2

4.0

3.5

3.0

3.7

3.5

4.0

Advertising

($100s)

3.3

3.3

3.0

4.5

3.0

4.0

Recall the multiple regression model of chapter 13:

Sales = b

0

+ b

1

(Price)

+ b

2

Advertising)

Chap 15-17

Detect Collinearity in PHStat

PHStat / regression / multiple regression …

Check the “variance inflationary factor (VIF)” box

Regression Analysis

Price and all other X

Regression Statistics

Output for the pie sales example:

 Since there are only two explanatory variables, only one

VIF is reported Multiple R

R Square

Adjusted R

Square

Standard Error

Observations

VIF

0.030438

0.000926

-0.075925

1.21527

15

1.000927

 VIF is < 10

 There is no evidence of collinearity between Price and

Advertising

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 15-18

Model Building

Goal is to develop a model with the best set of independent variables

Easier to interpret if unimportant variables are removed

Lower probability of collinearity

Stepwise regression procedure

 Provide evaluation of alternative models as variables are added

Best-subset approach

 Try all combinations and select the model with the highest adjusted r 2

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 15-19

Stepwise Regression

 Idea: develop the least squares regression equation in steps, adding one independent variable at a time and evaluating whether existing variables should remain or be removed

 The coefficient of partial determination is the measure of the marginal contribution of each independent variable, given that other independent variables are in the model

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 15-20

Best Subsets Regression

Idea: estimate all possible regression equations using all possible combinations of independent variables

Choose the best model by looking for the highest adjusted r 2

The model with the largest adjusted r 2 will also have the smallest S

YX

Stepwise regression and best subsets regression can be performed using Excel with PHStat add in

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 15-21

Alternative Best Subsets

Criterion

?

The C p

Statistic

C p

( 1

R k

2

1

)( n

R 2

T

T )

( n

2 ( k

1 ))

Where k = number of independent variables included in a particular regression model

T = total number of parameters to be estimated in the full regression model

R 2 k

= coefficient of multiple determination for model with k independent variables

R

2

T

= coefficient of multiple determination for full model with all T estimated parameters

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 15-22

8 Steps in Model Building

1. Choose independent variables to include in the model

2. Estimate full model and check VIFs and check if any VIFs > 10

 If no VIF > 10 , go to step 3

 If one VIF > 10 , remove this variable

 If more than one, eliminate the variable with the highest VIF and repeat step 2

3. Perform best subsets regression with remaining variables.

4. Sort all models by r 2

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 15-23

8 Steps in Model Building

5. Choose the best model.

 Consider parsimony.

 Do extra variables make a significant contribution?

6. Perform complete analysis with chosen model, including residual analysis.

7. Transform the model if necessary to deal with violations of linearity or other model assumptions.

8. Use the model for prediction and inference.

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 15-24

Model Validation

The final step in the model-building process is to validate the selected regression model.

Collect new data and compare the results.

Compare the results of the regression model to previous results.

If the data set is large, split the data into two parts and cross-validate the results.

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 15-25

Model Building Flowchart

Choose X

1

,X

2

,…,X k

Run regression to find VIFs

Any

VIF>5?

No

Yes

Remove variable with highest

VIF

Yes

More than one?

No

Remove this X

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Run best subsets regression to obtain

“best” models in terms of C p

Do complete analysis

Add quadratic term and/or transform variables as indicated

Perform predictions

Chap 15-26

Pitfalls and Ethical

Considerations

To avoid pitfalls and address ethical considerations:

Understand that interpretation of the estimated regression coefficients are performed holding all other independent variables constant

Evaluate residual plots for each independent variable

Evaluate interaction terms

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 15-27

Pitfalls and Ethical

Considerations

To avoid pitfalls and address ethical considerations:

Obtain VIFs for each independent variable before determining which variables should be included in the model.

Examine several alternative models using best-subsets regression or step-wise regression.

Use other methods when the assumptions necessary for least-squares regression have been seriously violated.

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 15-28

Chapter Summary

In this chapter, we have

Developed the quadratic regression model

Described collinearity

Discussed model building

 Stepwise regression

 Best subsets

Addressed pitfalls in multiple regression and ethical considerations

Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc.

Chap 15-29

Download