Statistics
Multiple Regression and Model Building
Multiple Regression 3-1
Lecture Outline
• The quadratic regression model
• Dummy variables
• Using transformations in regression models
• Collinearity
• Model building
• Pitfalls in multiple regression and ethical considerations
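The Linear Multiple Regression Model
1. Describes the linear relationship between one variable and several other variables:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$
• $Y_i$: dependent (response) variable
• $X_{1i}, \ldots, X_{ki}$: independent (explanatory) variables
• $\beta_0$: population Y-intercept
• $\beta_1, \ldots, \beta_k$: population slopes
• $\varepsilon_i$: random error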
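The Population Multiple Regression Model
Bivariate model (two explanatory variables):
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$ (observed Y)
$E(Y) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$ (expected value; the response plane)
[Figure: response plane over the (X1, X2) plane with intercept $\beta_0$; the observed Y at $(X_{1i}, X_{2i})$ differs from the plane by $\varepsilon_i$.]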
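The Sample Multiple Regression Model
Bivariate model:
$Y_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i} + \hat\varepsilon_i$ (observed Y)
$\hat Y_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i}$ (fitted response plane)
[Figure: fitted response plane over the (X1, X2) plane with intercept $\hat\beta_0$; the observed Y at $(X_{1i}, X_{2i})$ differs from the plane by the residual $\hat\varepsilon_i$.]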
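Interpretation of the Estimated Coefficients
1. The k-th slope coefficient ($\hat\beta_k$)
• Holding all other X variables fixed, the average of Y changes by $\hat\beta_k$ when $X_k$ changes by one unit.
• Example: if $\hat\beta_1 = 2$, then sales (Y) is expected to increase by 2 for each 1-unit increase in advertising ($X_1$), holding the number of sales ($X_2$) fixed.
2. The Y-intercept ($\hat\beta_0$)
• The average value of Y when all $X_k = 0$.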
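The "holding the other X fixed" reading of a slope can be checked numerically. The sketch below is only an illustration: the data are hypothetical and noise-free, generated from Y = 1 + 2·X1 + 3·X2, and it assumes NumPy is available.

```python
import numpy as np

# Hypothetical, noise-free data generated from Y = 1 + 2*X1 + 3*X2
# (not from the slides; chosen so the fitted slopes are easy to recognise).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = 1.0 + 2.0 * x1 + 3.0 * x2

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones_like(x1), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = b


def predict(x1_value, x2_value):
    """Fitted value from the estimated plane."""
    return b0 + b1 * x1_value + b2 * x2_value


# Raise X1 by one unit while X2 stays at 2: the prediction moves by b1.
change = predict(4.0, 2.0) - predict(3.0, 2.0)
print(f"b1 = {b1:.3f}, change in prediction = {change:.3f}")
```

With real, noisy data the same comparison holds for the fitted plane, although the estimates will no longer equal the generating values exactly.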
The Quadratic Regression Model
• The relationship between one response variable and one or more explanatory variables is a quadratic polynomial function
• Useful when the scatter diagram indicates a non-linear relationship
• Quadratic model:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \varepsilon_i$
• The second explanatory variable is the square of the first variable
Quadratic Regression Model (continued)
Quadratic models may be considered when the scatter diagram takes on one of the following shapes:
[Figure: four panels of Y against X1, two with $\beta_2 > 0$ and two with $\beta_2 < 0$.]
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \varepsilon_i$
$\beta_2$ = the coefficient of the quadratic term
Testing for Significance: Quadratic Model
• Testing for overall relationship
  • Similar to the test for the linear model
  • F test statistic = MSR / MSE
• Testing the quadratic effect
  • Compare the quadratic model
    $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \varepsilon_i$
    with the linear model
    $Y_i = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$
• Hypotheses
  • $H_0: \beta_2 = 0$ (no 2nd-order polynomial term)
  • $H_1: \beta_2 \neq 0$ (2nd-order polynomial term is needed)
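Ad Size and Response: Example 1
• You work in the advertising department of the Ming Chuan Times (銘傳時報). You want to find the effect of ad size (square centimeters) on the number of reader responses (in hundreds).
• You have collected the following data:

Response  Ad size  Circulation
1         1        2
4         8        8
1         3        1
3         5        7
2         6        4
4         10       6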
Ad Size and Response, Example 1: Residual Analysis
[Figure: ad size residual plot (residuals against ad size): no discernible pattern.]
[Figure: ad size line fit plot comparing observed and predicted ad response against ad size.]
Ad Size and Response, Example 1: t Test for Quadratic Model
• Testing the quadratic effect
  • Compare the quadratic model in size
    $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \varepsilon_i$
    with the linear model
    $Y_i = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$
• Hypotheses
  • $H_0: \beta_2 = 0$ (no quadratic term in size)
  • $H_1: \beta_2 \neq 0$ (quadratic term is needed in size)
Ad Size and Response, Example 1: Conclusion
Is a quadratic model in size needed for newspaper replies? Test at $\alpha = 0.05$.
$H_0: \beta_2 = 0$
$H_1: \beta_2 \neq 0$
df = 3
Critical values: $\pm 3.182$ (t distribution, area .025 in each rejection tail)
Test statistic: $t = 6.2 \times 10^{-15}$
Decision: Do not reject $H_0$ at $\alpha = 0.05$.
Conclusion: There is not sufficient evidence of the need to include a quadratic effect of size on replies.
Detailed Walkthrough Using PHStat
• PHStat | regression | multiple regression …
• EXCEL spreadsheet for 廣告與回應1 (Ad and Response 1).
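Ad Size and Response: Example 2
• You work in the advertising department of the Ming Chuan Times (銘傳時報). You want to find the effect of ad size (square centimeters) on the number of reader responses (in hundreds).
• You have collected the following data:

Response  Ad size  Circulation
1         1        2
4         8        8
1         3        1
3         5        7
2         6        4
4         10       6
5         28       9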
Ad Size and Response, Example 2: Residual Analysis
[Figure: ad size residual plot (residuals against ad size): discernible pattern.]
[Figure: ad size line fit plot comparing observed and predicted ad response against ad size.]
Ad Size and Response, Example 2: t Test for Quadratic Model
• Testing the quadratic effect
  • Compare the quadratic model in size
    $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \varepsilon_i$
    with the linear model
    $Y_i = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$
• Hypotheses
  • $H_0: \beta_2 = 0$ (no quadratic term in size)
  • $H_1: \beta_2 \neq 0$ (quadratic term is needed in size)
Ad Size and Response, Example 2: Solution
Is a quadratic model in size needed for newspaper replies? Test at $\alpha = 0.05$.
$H_0: \beta_2 = 0$
$H_1: \beta_2 \neq 0$
df = 4
Critical values: $\pm 2.776$ (t distribution, area .025 in each rejection tail)
Test statistic: $t = -2.848$
Decision: Reject $H_0$ at $\alpha = 0.05$.
Conclusion: There is sufficient evidence of the need to include a quadratic effect of size on replies.
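The two quadratic tests can be reproduced outside PHStat. The following is a minimal sketch assuming NumPy; `quadratic_t_test` is a helper name introduced here for illustration, and the arrays are the response and ad-size columns of the Example 1 and Example 2 data.

```python
import numpy as np


def quadratic_t_test(x, y):
    """Fit Y = b0 + b1*X + b2*X^2 by least squares.

    Returns (b2, t statistic for H0: beta2 = 0, error degrees of freedom).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x, x ** 2])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ b
    df = len(y) - X.shape[1]           # n minus number of coefficients
    mse = residuals @ residuals / df   # estimate of the error variance
    cov_b = mse * np.linalg.inv(X.T @ X)
    return b[2], b[2] / np.sqrt(cov_b[2, 2]), df


# Example 1: six observations (ad size, response).
size_1 = [1, 8, 3, 5, 6, 10]
resp_1 = [1, 4, 1, 3, 2, 4]
# Example 2: the same data plus the seventh observation (28, 5).
size_2 = size_1 + [28]
resp_2 = resp_1 + [5]

b2_1, t_1, df_1 = quadratic_t_test(size_1, resp_1)
b2_2, t_2, df_2 = quadratic_t_test(size_2, resp_2)
print(f"Example 1: t = {t_1:.3g}, df = {df_1}")
print(f"Example 2: t = {t_2:.3f}, df = {df_2}")
```

In Example 1 the quadratic coefficient is zero up to rounding error, which is why the reported t statistic is of order $10^{-15}$; adding the single observation at size 28 is what makes the quadratic term significant in Example 2.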
Detailed Walkthrough Using PHStat
• PHStat | regression | multiple regression …
• EXCEL spreadsheet for 廣告與回應2 (Ad and Response 2).
Heating Oil, Temperature, and Insulation Example: Heating Oil Example
Determine whether a quadratic model is needed for estimating heating oil used for a single family home in the month of January based on average temperature and amount of insulation in inches.

Oil (Gal)  Temp (°F)  Insulation
275.30     40         3
363.80     27         3
164.30     40         10
40.80      73         6
94.30      64         6
230.90     34         6
366.70     9          6
300.60     8          10
237.80     23         10
121.40     63         3
31.40      65         10
203.50     41         6
441.10     21         3
323.00     38         3
52.50      58         10
Heating Oil Example: Residual Analysis (continued)
[Figure: temperature residual plot and insulation residual plot (residuals against each explanatory variable). Slide annotations: "May be some nonlinear relationship" and "No discernible pattern".]
Heating Oil Example: t Test for Quadratic Model (continued)
• Testing the quadratic effect
  • Compare the quadratic model in insulation
    $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{2i}^2 + \varepsilon_i$
    with the linear model
    $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
• Hypotheses
  • $H_0: \beta_3 = 0$ (no quadratic term in insulation)
  • $H_1: \beta_3 \neq 0$ (quadratic term is needed in insulation)
Heating Oil Example: Example Solution
Is a quadratic model in insulation needed for monthly consumption of heating oil? Test at $\alpha = 0.05$.
$H_0: \beta_3 = 0$
$H_1: \beta_3 \neq 0$
df = 11
Critical values: $\pm 2.2010$ (t distribution, area .025 in each rejection tail)
Test statistic: $t = 1.6611$
Decision: Do not reject $H_0$ at $\alpha = 0.05$.
Conclusion: There is not sufficient evidence of the need to include a quadratic effect of insulation on oil consumption.
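The same test can be sketched in code for the heating oil data. This assumes NumPy, copies the 15 observations from the data slide, and takes the critical value 2.2010 from the slide rather than computing it.

```python
import numpy as np

# Heating oil data from the example (15 single-family homes in January).
oil = np.array([275.3, 363.8, 164.3, 40.8, 94.3, 230.9, 366.7, 300.6,
                237.8, 121.4, 31.4, 203.5, 441.1, 323.0, 52.5])
temp = np.array([40, 27, 40, 73, 64, 34, 9, 8, 23, 63, 65, 41, 21, 38, 58],
                dtype=float)
insulation = np.array([3, 3, 10, 6, 6, 6, 6, 10, 10, 3, 10, 6, 3, 3, 10],
                      dtype=float)

# Columns: intercept, temperature, insulation, insulation squared.
X = np.column_stack([np.ones_like(temp), temp, insulation, insulation ** 2])
b, *_ = np.linalg.lstsq(X, oil, rcond=None)

residuals = oil - X @ b
df = len(oil) - X.shape[1]                      # 15 - 4 = 11
mse = residuals @ residuals / df
se = np.sqrt(np.diag(mse * np.linalg.inv(X.T @ X)))

t_quadratic = b[3] / se[3]                      # tests H0: beta3 = 0
critical = 2.2010                               # two-tailed, alpha = 0.05, df = 11
reject_h0 = abs(t_quadratic) > critical
print(f"t = {t_quadratic:.4f}, df = {df}, reject H0: {reject_h0}")
```

The decision rule is the one on the slide: reject only when the t statistic falls outside ±2.2010.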
Detailed Walkthrough Using PHStat
• PHStat | regression | multiple regression …
• EXCEL spreadsheet for the heatingoil example.
Dummy Variable Models
• Categorical explanatory variable (dummy variable) with two or more levels:
  • Yes or no, on or off, male or female
  • Coded as 0 or 1
• Only intercepts are different
• Assumes equal slopes across categories
• The number of dummy variables needed is (number of levels - 1)
• Regression model has the same form:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$
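Example Using Only Dummy Variables: Dummy-Variable Models
The 銘統 supermarket chain wants to know whether the display location of merchandise affects sales of a pet toy. Within a store, display locations are classified as Front, Middle, and Rear. Six stores are randomly selected from the chain's 18 stores. The same pet toy is placed at one of the locations in each selected store and moved to a different location after each month, for three months per store, and the total sales for each month are recorded (in units of 10,000 yuan).
See the file: 複迴歸位置影響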
Example Using Only Dummy Variables: Dummy-Variable Models
Given: $\hat Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$
Y = Sales
$X_1$ = Front aisle: 1 if front aisle, 0 otherwise
$X_2$ = Middle aisle: 1 if middle aisle, 0 otherwise
Front aisle ($X_1 = 1$, $X_2 = 0$):
$\hat Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} = b_0 + b_1(1) + b_2(0) = b_0 + b_1$
Middle aisle ($X_1 = 0$, $X_2 = 1$):
$\hat Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} = b_0 + b_1(0) + b_2(1) = b_0 + b_2$
Rear aisle ($X_1 = 0$, $X_2 = 0$):
$\hat Y_i = b_0$ (mean of the rear aisle)
Example Using Only Dummy Variables: Detailed Walkthrough Using PHStat
• PHStat | regression | multiple regression …
• EXCEL spreadsheet for the 複迴歸位置影響 example.
Example Using Only Dummy Variables, Diagram 1: Dummy-Variable Models (continued)
[Figure: "Sales VS Location" chart of sales (0 to 10) by location, with series for Front, Middle, and Rear.]
Example Using Only Dummy Variables, Diagram 2: Dummy-Variable Models (continued)
[Figure: Y (sales) by location. The levels are $b_0 + b_1$ for Front, $b_0$ for Rear, and $b_0 + b_2$ for Middle; only the intercepts differ.]
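Example Using Only Dummy Variables: Interpretation
• Parameter estimates for $\hat Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$ (in units of 10,000 yuan): b0 = 3.733; b1 = 2.333; b2 = -1.667
• Mean sales for the rear location (the baseline for comparison): b0 = 3.733
• Mean sales for the front location: b0 + b1 = 6.066
• Mean sales for the middle location: b0 + b2 = 2.066
• These results agree with the analysis of variance results.
• The difference between the front and rear means is significant; the difference between the middle and rear means …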
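The link between dummy-variable coefficients and group means can be seen in a small sketch. The sales figures below are hypothetical (the slide data live in the spreadsheet and are not reproduced here), and NumPy is assumed.

```python
import numpy as np

# Hypothetical monthly sales by display location (NOT the slide data).
front = np.array([6.0, 7.0, 5.0])
middle = np.array([2.0, 3.0, 1.0])
rear = np.array([4.0, 3.0, 5.0])

sales = np.concatenate([front, middle, rear])
# Two dummies for three levels; the rear location is the baseline (both 0).
x1 = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0], dtype=float)  # 1 if front
x2 = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0], dtype=float)  # 1 if middle

X = np.column_stack([np.ones_like(sales), x1, x2])
b0, b1, b2 = np.linalg.lstsq(X, sales, rcond=None)[0]

print(f"b0 = {b0:.3f} (rear mean = {rear.mean():.3f})")
print(f"b0 + b1 = {b0 + b1:.3f} (front mean = {front.mean():.3f})")
print(f"b0 + b2 = {b0 + b2:.3f} (middle mean = {middle.mean():.3f})")
```

The intercept recovers the baseline group's mean and each dummy coefficient is the difference between its group's mean and the baseline, which is why the regression agrees with a one-way analysis of variance.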
Model with Dummy and Quantitative Variables
Given: $\hat Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$
Y = Assessed value of house
$X_1$ = Square footage of house
$X_2$ = Desirability of neighborhood: 0 if undesirable, 1 if desirable
Desirable ($X_2 = 1$):
$\hat Y_i = b_0 + b_1 X_{1i} + b_2(1) = (b_0 + b_2) + b_1 X_{1i}$
Undesirable ($X_2 = 0$):
$\hat Y_i = b_0 + b_1 X_{1i} + b_2(0) = b_0 + b_1 X_{1i}$
The two lines have the same slope.
Model with Dummy and Quantitative Variables: Diagram
[Figure: Y (assessed value) against X1 (square footage). Two parallel lines with the same slope $b_1$ and different intercepts, $b_0 + b_2$ and $b_0$.]
Model with Dummy and Quantitative Variables: Detailed Walkthrough Using PHStat
• PHStat | regression | multiple regression …
• EXCEL spreadsheet for the 房價與大小鄰居 example.
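Model with Dummy and Quantitative Variables: Interpreting Coefficients 1
It is reported that male college graduates entering the workforce have starting salaries about 2,000 yuan higher than those of comparable female graduates.
$\hat Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} = 23 + 1.5 X_{1i} + 2 X_{2i}$
Y: salary of college graduates (in thousands of yuan)
$X_1$: years of service; salary rises by 1.5 per additional year
$X_2$: 0 if female, 1 if male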
Model with Dummy and Quantitative Variables: Interpreting Coefficients 2
Given:
Y = Assessed value of the house ($1000)
$X_1$ = Square footage of the house
Style of the house = Split-level, Ranch, Condo (3 levels; need 2 dummy variables)
$X_2$ = 1 if split-level, 0 if not
$X_3$ = 1 if ranch, 0 if not
$\hat Y_i = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3$
Model with Dummy and Quantitative Variables: Interpreting Coefficients 2 (continued)
Given the estimated model:
$\hat Y_i = 20.43 + 0.045 X_{1i} + 18.84 X_{2i} + 23.53 X_{3i}$
For split-level ($X_2 = 1$): $\hat Y_i = 20.43 + 0.045 X_{1i} + 18.84$
For ranch ($X_3 = 1$): $\hat Y_i = 20.43 + 0.045 X_{1i} + 23.53$
For condo: $\hat Y_i = 20.43 + 0.045 X_{1i}$
• With the same footage, a split-level home will have an estimated average assessed value of 18.84 thousand dollars more than a condo.
• With the same footage, a ranch home will have an estimated average assessed value of 23.53 thousand dollars more than a condo.
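The estimated model above can be written as a small function to make the role of the baseline category explicit. The function name `assessed_value` and the 2,000 square-foot house are assumptions made for illustration only.

```python
def assessed_value(square_feet, style):
    """Estimated assessed value in $1000s from the fitted model on the slide.

    style is one of "split-level", "ranch", or "condo" (the baseline).
    """
    x2 = 1 if style == "split-level" else 0
    x3 = 1 if style == "ranch" else 0
    return 20.43 + 0.045 * square_feet + 18.84 * x2 + 23.53 * x3


# Hypothetical house size, held fixed across the three styles.
condo = assessed_value(2000, "condo")
split = assessed_value(2000, "split-level")
ranch = assessed_value(2000, "ranch")

print(f"split-level minus condo: {split - condo:.2f}")
print(f"ranch minus condo: {ranch - condo:.2f}")
```

Whatever square footage is chosen, the differences relative to a condo equal the dummy coefficients, because the footage term cancels.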
Interaction Regression Model
• Hypothesizes interaction between pairs of X variables
  • Response to one X variable varies at different levels of another X variable
• Contains two-way cross product terms
  $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i$
• Can be combined with other models
  • e.g.: Dummy variable model
Effect of Interaction
• Given:
  $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i$
• Without the interaction term, the effect of $X_1$ on Y is measured by $\beta_1$
• With the interaction term, the effect of $X_1$ on Y is measured by $\beta_1 + \beta_3 X_2$
• The effect changes as $X_2$ increases
Interaction Example
$Y = 1 + 2X_1 + 3X_2 + 4X_1X_2$
When $X_2 = 1$: $Y = 1 + 2X_1 + 3(1) + 4X_1(1) = 4 + 6X_1$
When $X_2 = 0$: $Y = 1 + 2X_1 + 3(0) + 4X_1(0) = 1 + 2X_1$
[Figure: the two lines plotted as Y (0 to 12) against X1 (0 to 1.5).]
The effect (slope) of $X_1$ on Y does depend on the value of $X_2$.
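The dependence of the slope on $X_2$ in this example can be evaluated directly. The function names below are introduced for illustration; the coefficients are those of the example.

```python
def y_hat(x1, x2):
    """Example model: Y = 1 + 2*X1 + 3*X2 + 4*X1*X2."""
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2


def slope_in_x1(x2):
    """Effect of a one-unit change in X1 at a given X2: beta1 + beta3 * X2."""
    return 2 + 4 * x2


for x2 in (0, 1):
    # Change in Y when X1 goes from 0.5 to 1.5 with X2 held fixed.
    change = y_hat(1.5, x2) - y_hat(0.5, x2)
    print(f"X2 = {x2}: slope = {slope_in_x1(x2)}, change in Y = {change}")
```

At $X_2 = 0$ a one-unit increase in $X_1$ raises Y by 2, and at $X_2 = 1$ by 6, matching the two lines $1 + 2X_1$ and $4 + 6X_1$.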
Creating the Interaction Cross-Product Term: Interaction Regression Model

Case, i  Yi  X1i  X2i  X1i X2i
1        1   1    3    3
2        4   8    5    40
3        1   3    2    6
4        3   5    6    30
:        :   :    :    :

Multiply X1 by X2 to get X1X2.
Run the regression with Y, X1, X2, X1X2.
Example of a Dummy-Variable Model with Interaction Terms
$Y = \beta_0 + \beta_1\,\mathrm{MALE} + \beta_2\,\mathrm{MARRIED} + \beta_3\,\mathrm{DIVORCED} + \beta_4\,\mathrm{MALE} \cdot \mathrm{MARRIED} + \beta_5\,\mathrm{MALE} \cdot \mathrm{DIVORCED}$
MALE = 0 if female and 1 if male
MARRIED = 1 if married; 0 if not
DIVORCED = 1 if divorced; 0 if not
MALE•MARRIED = 1 if male married; 0 otherwise = (MALE times MARRIED)
MALE•DIVORCED = 1 if male divorced; 0 otherwise = (MALE times DIVORCED)
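Example of a Dummy-Variable Model with Interaction Terms (continued)
$Y = \beta_0 + \beta_1\,\mathrm{MALE} + \beta_2\,\mathrm{MARRIED} + \beta_3\,\mathrm{DIVORCED} + \beta_4\,\mathrm{MALE} \cdot \mathrm{MARRIED} + \beta_5\,\mathrm{MALE} \cdot \mathrm{DIVORCED}$

         SINGLE               MARRIED                                  DIVORCED
FEMALE   $\beta_0$            $\beta_0 + \beta_2$                      $\beta_0 + \beta_3$
MALE     $\beta_0 + \beta_1$  $\beta_0 + \beta_1 + \beta_2 + \beta_4$  $\beta_0 + \beta_1 + \beta_3 + \beta_5$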
Example of a Dummy-Variable Model with Interaction Terms: Interpretation

           Female               Male                                     Difference
Single:    $\beta_0$            $\beta_0 + \beta_1$                      $\beta_1$
Married:   $\beta_0 + \beta_2$  $\beta_0 + \beta_1 + \beta_2 + \beta_4$  $\beta_1 + \beta_4$
Divorced:  $\beta_0 + \beta_3$  $\beta_0 + \beta_1 + \beta_3 + \beta_5$  $\beta_1 + \beta_5$

Main effects: MALE, MARRIED and DIVORCED
Interaction effects: MALE•MARRIED and MALE•DIVORCED
Testing the Interaction Term
• Hypothesize interaction between pairs of independent variables
• Contains 2-way product terms
  $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i$
• Hypotheses:
  • $H_0: \beta_3 = 0$ (no interaction between $X_1$ and $X_2$)
  • $H_1: \beta_3 \neq 0$ ($X_1$ interacts with $X_2$)
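Combined Application Example
• The Ming Chuan career center (銘傳就業輔導中心) wants to understand the salaries of its graduates. A survey of 12 alumni recorded their salary, years of service, and gender, as shown below.
• Test for and build an appropriate prediction model.

Salary  Years of service  Gender  Gender × Years
20710   1                 1       1
23160   3                 1       3
23210   3                 1       3
24140   4                 1       4
25760   5                 1       5
25590   5                 1       5
19510   1                 0       0
20440   2                 0       0
21340   3                 0       0
21760   3                 0       0
22750   4                 0       0
23200   5                 0       0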
Combined Application Example: Diagram
[Figure: scatter plot of salary (0 to 30,000) against years of service (0 to 6), with separate series and linear trend lines for men and for women.]
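Combined Application: Building the Model and Interpreting the Coefficients
Example:
$Y = \beta_0 + \beta_1 \cdot \mathrm{Years} + \beta_2 \cdot \mathrm{Gender} + \beta_3 \cdot \mathrm{Years} \cdot \mathrm{Gender}$
Y: salary, in yuan
Years (years of service): a quantitative variable
Gender: a dummy variable; 1 for men, 0 for women
Years × Gender: the interaction term; 0 for women, equal to years of service for men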
Combined Application Example: Detailed Walkthrough Using PHStat
• PHStat | regression | multiple regression …
• EXCEL spreadsheet for the 薪資與年資性別 example.
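Combined Application Example: Summary
$Y = \beta_0 + \beta_1 \cdot \mathrm{Years} + \beta_2 \cdot \mathrm{Gender} + \beta_3 \cdot \mathrm{Years} \cdot \mathrm{Gender}$
b0 = 18593; b1 = 969; b2 = 867; b3 = 260
• Average starting salary for women: about 18,593 yuan
• Annual raise for women: about 969 yuan
• Average starting salary for men: about 18593 + 867 = 19,460 yuan
• Annual raise for men: about 969 + 260 = 1,229 yuan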
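Combined Application Example: Testing the Interaction
Using $\alpha = 0.05$, test whether gender and years of service interact, that is, whether the annual raise (slope) is the same for men and women.
$H_0: \beta_3 = 0$
$H_1: \beta_3 \neq 0$
df = 8
Critical values: $\pm 2.306$ (t distribution, area .025 in each rejection tail)
Test statistic: $t = 2.988$
Decision: Reject $H_0$ at $\alpha = 0.05$.
Conclusion: There is sufficient evidence that the annual raise (slope) does differ between men and women.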
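The coefficient estimates and the interaction test for the salary example can be reproduced with ordinary least squares. This is a minimal sketch assuming NumPy; the arrays are copied from the 12-alumni data table, with gender coded 1 for men and 0 for women.

```python
import numpy as np

# Salary survey of 12 alumni (first six rows are men, last six are women).
salary = np.array([20710, 23160, 23210, 24140, 25760, 25590,
                   19510, 20440, 21340, 21760, 22750, 23200], dtype=float)
years = np.array([1, 3, 3, 4, 5, 5, 1, 2, 3, 3, 4, 5], dtype=float)
male = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0], dtype=float)

# Columns: intercept, years, gender dummy, years x gender interaction.
X = np.column_stack([np.ones_like(years), years, male, years * male])
b, *_ = np.linalg.lstsq(X, salary, rcond=None)

residuals = salary - X @ b
df = len(salary) - X.shape[1]                   # 12 - 4 = 8
mse = residuals @ residuals / df
se = np.sqrt(np.diag(mse * np.linalg.inv(X.T @ X)))
t_interaction = b[3] / se[3]                    # tests H0: beta3 = 0

print("b0, b1, b2, b3 =", np.round(b, 1))
print(f"t for interaction = {t_interaction:.3f}, df = {df}")
```

The estimates should round to the values in the summary (18593, 969, 867, 260), and the t statistic for the interaction term should be close to 2.988 with 8 degrees of freedom.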
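Combined Application Model: Diagram
$Y = \beta_0 + \beta_1 \cdot \mathrm{Years} + \beta_2 \cdot \mathrm{Gender} + \beta_3 \cdot \mathrm{Years} \cdot \mathrm{Gender}$
[Figure: Y (salary) against X1 (years of service). One line has intercept $b_0$ and slope $b_1$; the other has intercept $b_0 + b_2$ and slope $b_1 + b_3$. The intercepts differ by $b_2$ and the slopes also differ, by $b_3$.]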
Using Transformations to Satisfy Linear Regression
• Requires data transformation
• Either or both independent and dependent variables may be transformed
• Can be based on theory, logic or scatter diagrams
• Non-linear models that can be expressed in linear form
  • Can be estimated by least squares in linear form
  • Require data transformation
Transformed Multiplicative Model (Log-Log)
Original: $Y_i = \beta_0 X_{1i}^{\beta_1} X_{2i}^{\beta_2} \varepsilon_i$
Transformed: $\ln(Y_i) = \ln(\beta_0) + \beta_1 \ln(X_{1i}) + \beta_2 \ln(X_{2i}) + \ln(\varepsilon_i)$
[Figure: two panels of Y against X1 showing the curve shapes for different ranges of $\beta_1$; similarly for X2.]
Square Root Transformation
$Y_i = \beta_0 + \beta_1 \sqrt{X_{1i}} + \beta_2 \sqrt{X_{2i}} + \varepsilon_i$
[Figure: Y against X1 for $\beta_1 > 0$ and $\beta_1 < 0$; similarly for X2.]
Transforms one of the above models to one that appears linear. Often used to overcome heteroscedasticity.
Linear-Logarithmic Transformation
$Y_i = \beta_0 + \beta_1 \ln(X_{1i}) + \beta_2 \ln(X_{2i}) + \varepsilon_i$
[Figure: Y against X1 for $\beta_1 > 0$ and $\beta_1 < 0$; similarly for X2.]
Transformed from an original multiplicative model.
Exponential Transformation (Log-Linear)
Original model:
$Y_i = e^{\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}} \varepsilon_i$
[Figure: Y against X1 for $\beta_1 > 0$ and $\beta_1 < 0$.]
Transformed into:
$\ln Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \ln \varepsilon_i$
Interpretation of Coefficients After Transformation 1
• The dependent variable is logged
  • The coefficient of the independent variable $X_k$ can be approximately interpreted as: a 1-unit change in $X_k$ multiplies the estimated average of Y by $\exp(b_k)$, which is roughly a $100\,b_k$ percent change when $b_k$ is small
• The independent variable is logged
  • The coefficient of the independent variable can be approximately interpreted as: a 100 percent change in $X_k$ leads to an estimated $b_k \ln(2)$ unit change in the average of Y
Interpretation of Coefficients After Transformation 2 (continued)
• Both dependent and independent variables are logged
  • The coefficient of the independent variable $X_k$ can be approximately interpreted as: a 1 percent change in $X_k$ leads to an estimated $b_k$ percent change in the average of Y. Therefore $b_k$ is the elasticity of Y with respect to a change in $X_k$
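The three interpretations can be checked with a little arithmetic. In the sketch below every coefficient value is hypothetical, chosen only to illustrate the rules; none comes from the slides.

```python
import math

# Hypothetical coefficient values (assumptions, not estimates from the slides).
b_log_y = 0.05      # model: ln(Y) on X
b_log_x = 3.0       # model: Y on ln(X)
elasticity = 0.8    # model: ln(Y) on ln(X)

# 1. Dependent variable logged: +1 unit in X multiplies Y by exp(b),
#    which is close to a 100*b percent change when b is small.
multiplier = math.exp(b_log_y)
percent_change = 100 * (multiplier - 1)

# 2. Independent variable logged: doubling X (a 100 percent change)
#    adds b * ln(2) units to Y, since ln(2X) - ln(X) = ln(2).
change_when_x_doubles = b_log_x * math.log(2)

# 3. Both logged: a 1 percent increase in X changes Y by about b percent.
percent_change_loglog = 100 * (1.01 ** elasticity - 1)

print(f"log-linear: Y changes by {percent_change:.2f}% per unit of X")
print(f"linear-log: Y changes by {change_when_x_doubles:.3f} when X doubles")
print(f"log-log:    Y changes by {percent_change_loglog:.3f}% per 1% change in X")
```

The percent readings in cases 1 and 3 are approximations that are accurate for small coefficients and small changes; the multiplicative forms are exact.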
Interpretation of Coefficients After Transformation 3 (continued)
• If both Y and $X_k$ are measured in standardized form:
  $y_i = \dfrac{Y_i - \mu_Y}{\sigma_Y}$ and $x_{ki} = \dfrac{X_{ki} - \mu_k}{\sigma_k}$
• The $b_k$'s are called standardized coefficients
• They indicate the estimated number of standard deviations by which the average of Y will change when $X_k$ changes by one standard deviation
Collinearity (Multicollinearity)
1. High correlation between explanatory variables
2. The coefficient of multiple determination measures the combined effect of the correlated explanatory variables
3. Leads to unstable coefficients in the model (sign changes, large standard errors)
4. Usually present; it is only a matter of degree
5. Example: using both age and height in the same model
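Detecting Multicollinearity
1. Examine the correlation matrix
  • Look for cases where the correlation between a pair of X variables is stronger than the correlation between X and Y
2. Variance inflation factor (VIF)
  • If $VIF_j > 5$, multicollinearity exists
3. Some remedies
  • Collect new sample data, or delete one of the correlated X variables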
Correlation Matrix (SAS Output)
Correlation Analysis
Pearson Corr Coeff / Prob > |R| under H0: Rho = 0 / N = 6

           RESPONSE  ADSIZE   CIRC
RESPONSE   1.00000   0.90932  0.93117
           0.0       0.0120   0.0069
ADSIZE     0.90932   1.00000  0.74118
           0.0120    0.0      0.0918
CIRC       0.93117   0.74118  1.00000
           0.0069    0.0918   0.0

rY1 = 0.90932 (RESPONSE with ADSIZE); rY2 = 0.93117 (RESPONSE with CIRC); r12 = 0.74118 (ADSIZE with CIRC). The values on the diagonal are all 1.
Variance Inflation Factors: Computer Output

Variable  DF  Parameter Estimate  Standard Error  T for H0: Param=0  Prob > |T|
INTERCEP  1   0.0640              0.2599          0.246              0.8214
ADSIZE    1   0.2049              0.0588          3.484              0.0399
CIRC      1   0.2805              0.0686          4.089              0.0264

Variable  DF  Variance Inflation
INTERCEP  1   0.0000
ADSIZE    1   2.2190
CIRC      1   2.2190

$VIF_1 < 5$
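The parameter estimates and standard errors in this output can be reproduced from the six-observation ad data. The sketch assumes NumPy and reads RESPONSE, ADSIZE, and CIRC as the response, ad size, and circulation columns of Example 1.

```python
import numpy as np

# Six observations from the ad size example.
response = np.array([1, 4, 1, 3, 2, 4], dtype=float)
ad_size = np.array([1, 8, 3, 5, 6, 10], dtype=float)
circulation = np.array([2, 8, 1, 7, 4, 6], dtype=float)

# Columns: intercept, ADSIZE, CIRC.
X = np.column_stack([np.ones_like(ad_size), ad_size, circulation])
b, *_ = np.linalg.lstsq(X, response, rcond=None)

residuals = response - X @ b
df = len(response) - X.shape[1]                 # 6 - 3 = 3
mse = residuals @ residuals / df
se = np.sqrt(np.diag(mse * np.linalg.inv(X.T @ X)))
t_stats = b / se                                # each tests H0: parameter = 0

for name, est, err, t in zip(["INTERCEP", "ADSIZE", "CIRC"], b, se, t_stats):
    print(f"{name:9s} estimate = {est:.4f}  std error = {err:.4f}  t = {t:.3f}")
```

Each t statistic is simply the estimate divided by its standard error, so the printed table can be checked column against column.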
Venn Diagrams and Collinearity
[Figure: Venn diagram with three circles labelled Oil, Temp, and Insulation.]
• A large overlap between Temp and Insulation reflects collinearity between them.
• The large overlap in the variation of Temp and Insulation is used in explaining the variation in Oil but NOT in estimating $\beta_1$ and $\beta_2$.
Detecting Collinearity (Variance Inflationary Factor)
• $VIF_j$ is used to measure collinearity:
  $VIF_j = \dfrac{1}{1 - R_j^2}$
  where $R_j^2$ is the coefficient of multiple determination of the regression of $X_j$ on all the other explanatory variables
• If $VIF_j > 5$, $X_j$ is highly correlated with the other explanatory variables.
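The VIF definition translates directly into code. Below is a minimal sketch assuming NumPy; `vif` is a helper name introduced here, and the two explanatory variables are ad size and circulation from the ad example.

```python
import numpy as np


def vif(X, j):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j of X on all the other columns (plus an intercept)."""
    X = np.asarray(X, dtype=float)
    target = X[:, j]
    others = np.delete(X, j, axis=1)
    design = np.column_stack([np.ones(len(target)), others])
    fitted = design @ np.linalg.lstsq(design, target, rcond=None)[0]
    sse = np.sum((target - fitted) ** 2)
    sst = np.sum((target - target.mean()) ** 2)
    r_squared = 1 - sse / sst
    return 1 / (1 - r_squared)


ad_size = [1, 8, 3, 5, 6, 10]
circulation = [2, 8, 1, 7, 4, 6]
X = np.column_stack([ad_size, circulation])

vifs = [vif(X, j) for j in range(X.shape[1])]
print("VIF (ad size, circulation):", np.round(vifs, 4))
```

With only two explanatory variables, $R_j^2$ is the squared correlation between them, so both VIFs are equal; here $1 / (1 - 0.74118^2) \approx 2.219$, below the threshold of 5.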
Detecting Collinearity Using PHStat
• PHStat | regression | multiple regression …
  • Check the "variance inflationary factor (VIF)" box
• EXCEL spreadsheet for the heatingoil example
  • Since there are only two explanatory variables, only one VIF is reported in the Excel spreadsheet
  • No VIF is > 5
  • There is no evidence of collinearity
Model Building
• Goal is to develop a good model with the fewest explanatory variables
  • Easier to interpret
  • Lower probability of collinearity
• Stepwise regression procedure
  • Provides limited evaluation of alternative models
• Best-subset approach
  • Uses the $C_p$ statistic
  • Selects a model with small $C_p$ near p+1
Model Building Flowchart
1. Choose X1, X2, …, Xp.
2. Run regression to find VIFs.
3. Any VIF > 5?
  • Yes, and more than one: remove the variable with the highest VIF, then run the regression again.
  • Yes, but only one: remove this X, then run the regression again.
  • No: run subsets regression to obtain the "best" models in terms of $C_p$.
4. Do complete analysis.
5. Add curvilinear term and/or transform variables as indicated.
6. Perform predictions.
Overall Considerations for Multiple Regression Models 1
To avoid pitfalls and address ethical issues:
• Understand that the interpretation of each estimated regression coefficient is made holding all other independent variables constant
• Evaluate residual plots for each independent variable
• Evaluate interaction terms
Overall Considerations for Multiple Regression Models 2
To avoid pitfalls and address ethical issues:
• Obtain the VIF for each independent variable and remove variables that exhibit high collinearity with other independent variables before performing a significance test on each independent variable
• Examine several alternative models using best-subsets regression
• Use other methods when the assumptions necessary for least-squares regression have been seriously violated
Lecture Summary
• Described the quadratic regression model
• Addressed dummy variables
• Discussed using transformation in regression models
• Described collinearity
• Discussed model building
• Addressed pitfalls in multiple regression and ethical considerations