Heteroskedasticity III

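Chapter 11: Violations of Regression Assumptions and Remedies
Presenter: Lu-huei Chen (陳陸輝), Research Fellow, Election Study Center, National Chengchi University

Topics
I. Collinearity among explanatory variables
II. Heteroskedasticity
III. Autocorrelation (serial correlation) of the error term

I. Collinearity among explanatory variables
1. Defining multicollinearity
2. Methods for detecting collinearity
3. Effects of collinearity on estimation
4. Remedies

1. Defining multicollinearity

Perfect multicollinearity

$$Y_i = \beta_0 + \beta_1 X_{1,i} + \beta_2 X_{2,i} + \beta_3 X_{3,i} + \varepsilon_i \tag{11.2}$$

$$X_{1,i} = \alpha + 4X_{2,i} - X_{3,i} \tag{11.3}$$

$$Y_i = \beta_0^{*} + \beta_2^{*} X_{2,i} + \beta_3^{*} X_{3,i} + \varepsilon_i \tag{11.4}$$

Here $\alpha$ is a constant. Because (11.3) makes $X_1$ an exact linear function of $X_2$ and $X_3$, substituting it into (11.2) gives (11.4) with $\beta_0^{*} = \beta_0 + \alpha\beta_1$, $\beta_2^{*} = \beta_2 + 4\beta_1$, and $\beta_3^{*} = \beta_3 - \beta_1$. The data can identify only these combinations, not $\beta_1$, $\beta_2$, and $\beta_3$ individually.

Consequences of multicollinearity
When there is perfect multicollinearity among the explanatory variables:
- the coefficients of the explanatory variables cannot be estimated;
- their standard errors are infinite.

Degree of multicollinearity
In practice, collinearity is "a matter of degree," and high (though imperfect) multicollinearity is occasionally encountered. Taking equation (11.2) on p. 147 as an example, in the case with two regressors $X_2$ and $X_3$ (lowercase $x$ denotes deviations from the sample mean, and $r_{23}$ is the correlation between $X_2$ and $X_3$):

$$\operatorname{var}(\hat{b}_2) = \frac{\sigma^2}{\sum x_{2i}^2\,(1 - r_{23}^2)}, \qquad \operatorname{var}(\hat{b}_3) = \frac{\sigma^2}{\sum x_{3i}^2\,(1 - r_{23}^2)}$$

Variance-inflating factor (VIF)

$$\text{VIF} = \frac{1}{1 - r_{23}^2}, \qquad \operatorname{var}(\hat{b}_2) = \frac{\sigma^2}{\sum x_{2i}^2}\,\text{VIF}, \qquad \operatorname{var}(\hat{b}_3) = \frac{\sigma^2}{\sum x_{3i}^2}\,\text{VIF}$$

The degree of correlation and the effect of the VIF

Table 11.1 The Effect of Increasing $r_{23}$ on $\operatorname{var}(\hat{b}_2)$ and $\operatorname{cov}(\hat{b}_2, \hat{b}_3)$

| $r_{23}$ | VIF | $\operatorname{var}(\hat{b}_2)$ | $\operatorname{cov}(\hat{b}_2, \hat{b}_3)$ |
|---|---|---|---|
| 0.00 | 1.00 | A | 0 |
| 0.50 | 1.33 | 1.33 × A | 0.67 × B |
| 0.70 | 1.96 | 1.96 × A | 1.37 × B |
| 0.80 | 2.78 | 2.78 × A | 2.22 × B |
| 0.90 | 5.26 | 5.26 × A | 4.73 × B |
| 0.95 | 10.26 | 10.26 × A | 9.74 × B |
| 0.97 | 16.92 | 16.92 × A | 16.41 × B |
| 0.99 | 50.25 | 50.25 × A | 49.75 × B |
| 0.995 | 100.00 | 100.00 × A | 99.50 × B |
| 0.999 | 500.00 | 500.00 × A | 499.50 × B |

Note: $A = \sigma^2 / \sum x_{2i}^2$; $B = -\sigma^2 / \sqrt{\sum x_{2i}^2 \sum x_{3i}^2}$.
Source: Gujarati 1995: 329, Table 10.1.

2. Methods for detecting collinearity (p. 149)
(1) High $R^2$ but few significant t ratios.
(2) High pairwise correlations among regressors.
(3) Examination of partial correlations.
(4) Auxiliary regressions.
(5) Eigenvalues and the condition index.
(6) Tolerance and the variance inflation factor.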
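The VIF column of Table 11.1 follows directly from $\text{VIF} = 1/(1 - r_{23}^2)$, and because $\operatorname{cov}(\hat{b}_2, \hat{b}_3) = -r_{23}\sigma^2 / \big[(1 - r_{23}^2)\sqrt{\sum x_{2i}^2 \sum x_{3i}^2}\big]$, the covariance column is $r_{23} \times \text{VIF}$ times $-\sigma^2/\sqrt{\sum x_{2i}^2 \sum x_{3i}^2}$. The short Python sketch below recomputes both columns, plus the tolerance $\text{TOL} = 1/\text{VIF}$, for the table's values of $r_{23}$. It covers only the two-regressor case, and the function names `vif`, `tolerance`, and `cov_multiplier` are illustrative, not taken from any library. Computed directly, some values differ slightly from the printed table (for example, 100.25 rather than 100.00 at $r_{23} = 0.995$).

```python
def vif(r23: float) -> float:
    """Variance-inflating factor in the two-regressor case: 1 / (1 - r23**2)."""
    if not -1.0 < r23 < 1.0:
        # |r23| = 1 is perfect collinearity: the variance is infinite
        raise ValueError("VIF is undefined for |r23| >= 1")
    return 1.0 / (1.0 - r23 ** 2)


def tolerance(r23: float) -> float:
    """Tolerance is the reciprocal of the VIF, i.e. 1 - r23**2."""
    return 1.0 / vif(r23)


def cov_multiplier(r23: float) -> float:
    """Multiple of -sigma^2 / sqrt(sum x2^2 * sum x3^2) that gives cov(b2_hat, b3_hat)."""
    return r23 * vif(r23)


if __name__ == "__main__":
    print(f"{'r23':>6} {'VIF':>8} {'TOL':>7} {'cov mult.':>10}")
    for r in [0.0, 0.5, 0.7, 0.8, 0.9, 0.95, 0.97, 0.99, 0.995, 0.999]:
        print(f"{r:>6} {vif(r):>8.2f} {tolerance(r):>7.4f} {cov_multiplier(r):>10.2f}")
```

Reading down the output shows how quickly the VIF, and with it $\operatorname{var}(\hat{b}_2)$, grows once $r_{23}$ passes about 0.9.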
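Detection methods (4) to (6) can be computed together from the data matrix. The Python sketch below, which assumes NumPy is available, regresses each explanatory variable on all the others to get its auxiliary $R^2$, derives $\text{VIF} = 1/(1 - R^2)$ and $\text{TOL} = 1/\text{VIF}$, computes the F statistic $[R^2/(k-2)]/[(1-R^2)/(n-k+1)]$ for each auxiliary regression ($k$ counts the coefficients of the full model, intercept included), and computes a condition index. Two choices are assumptions of this sketch rather than something the lecture specifies: the eigenvalues are taken from $X'X$ after scaling every column (the constant included) to unit length, which is one common convention, and the data in the usage example are simulated. The helper names are hypothetical.

```python
import numpy as np


def r_squared(y, Z):
    """R^2 of an OLS fit of y on Z; Z must already include a constant column."""
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def collinearity_diagnostics(X):
    """Auxiliary-regression diagnostics and a condition index.

    X: (n, p) array of explanatory variables WITHOUT a constant column, p >= 2.
    Returns (list of dicts with r2, vif, tol, F for each column, condition index).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    k = p + 1  # coefficients in the original model, intercept included
    results = []
    for j in range(p):
        # Auxiliary regression: X_j on a constant and all the other regressors
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        r2 = r_squared(X[:, j], Z)
        vif = 1.0 / (1.0 - r2)
        f_stat = (r2 / (k - 2)) / ((1.0 - r2) / (n - k + 1))
        results.append({"r2": r2, "vif": vif, "tol": 1.0 / vif, "F": f_stat})

    # Condition index (assumed convention): scale each column of [1, X] to unit
    # length, then take sqrt(max eigenvalue / min eigenvalue) of X'X.
    Xc = np.column_stack([np.ones(n), X])
    Xs = Xc / np.linalg.norm(Xc, axis=0)
    eig = np.linalg.eigvalsh(Xs.T @ Xs)
    ci = float(np.sqrt(eig.max() / eig.min()))
    return results, ci


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    x2 = rng.normal(size=n)
    x3 = 0.99 * x2 + np.sqrt(1 - 0.99 ** 2) * rng.normal(size=n)  # nearly collinear with x2
    x4 = rng.normal(size=n)                                        # unrelated regressor
    diag, ci = collinearity_diagnostics(np.column_stack([x2, x3, x4]))
    for name, d in zip(["x2", "x3", "x4"], diag):
        print(name, {key: round(float(val), 3) for key, val in d.items()})
    print("condition index:", round(ci, 2))
```

With only two regressors the auxiliary $R^2$ equals $r_{23}^2$, so the VIF reported here reduces to the two-variable formula $1/(1 - r_{23}^2)$.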
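(1) High $R^2$ but few significant t ratios
If the model's $R^2$ is high but few of the coefficients have significant t tests, try dropping one or two variables and see whether the estimates change substantially.

(2) High correlations among explanatory variables
The explanatory variables in the model are themselves highly correlated; a pairwise correlation above 0.8 signals a serious problem (Gujarati 1995, 335). However, collinearity can occur even when pairwise correlations are low, and with more than two explanatory variables this rule is hard to apply. The problem is common in the social sciences and especially in election studies, where party identification, stance on unification versus independence, and candidate evaluations tend to be highly correlated.

(3) Examining partial correlations
When the model has more than two explanatory variables, examine the partial correlations among them.

(4) Auxiliary regressions
To check whether an explanatory variable is highly correlated with the others, regress it on all the other explanatory variables and obtain the $R^2$ of this auxiliary regression. Then test this auxiliary $R^2$ with an F test, or check whether it exceeds the $R^2$ of the original model:

$$F_j = \frac{R_j^2/(k-2)}{(1 - R_j^2)/(n-k+1)}$$

where $R_j^2$ is the auxiliary $R^2$ for regressor $X_j$, $n$ is the sample size, and $k$ is the number of coefficients in the original model, including the intercept.

(5) Eigenvalues and the condition index

$$k = \frac{\text{maximum eigenvalue}}{\text{minimum eigenvalue}}, \qquad \text{CI} = \sqrt{k}$$

A condition index below 10 indicates no problem; between 10 and 30, moderate to strong collinearity; above 30, severe collinearity.

(6) Tolerance and the variance inflation factor
A VIF above 10 corresponds to a correlation of about 0.949, since $1/(1 - r^2) = 10$ implies $r = \sqrt{0.9} \approx 0.949$.
$\text{TOL} = 1/\text{VIF}$. A tolerance close to 1 means the explanatory variable is unrelated to the others; a tolerance close to 0 means it is highly correlated with them.

3. Effects of collinearity on estimation
(1) Large variances of OLS estimators
(2) Wider confidence intervals
(3) Insignificant t ratios
(4) A high $R^2$ but few significant t ratios
(5) Sensitivity of OLS estimators and their standard errors to small changes in the data

4. Remedies
(1) A priori information
(2) Combining cross-sectional and time-series data
(3) Dropping a variable (or variables) and specification bias
(4) Transformation of variables
(5) Additional or new data
(6) Reducing collinearity in polynomial regressions
(7) Other methods: factor analysis or principal components

(1) A priori information
When you know the relationship among the explanatory variables, first estimate a model that includes only some of them, then use those estimates to derive the coefficients of the variables left out (Gujarati 1995, 340). For example, if prior knowledge says $\beta_3 = 0.1\beta_2$, regress $Y$ on $X_2 + 0.1X_3$ to estimate $\beta_2$, then set $\hat{\beta}_3 = 0.1\hat{\beta}_2$.

(2) Combining cross-sectional and time-series data
The aim is to increase the number of observations, but doing so also makes the results harder to interpret (for example, when studying economic growth or government spending across different regions of China).

(3) Dropping a variable
Drop the variable you consider the "troublemaker." However, this can leave the model under-specified (model under-specification), which causes specification bias.

(4) Transforming variables
In time-series data, subtract each variable's previous-period value from its current value, for both the explanatory and the dependent variables (first differences). This can greatly reduce the correlation among the explanatory variables, but the transformed error term may then become problematic (for example, serially correlated).

(5) Adding more data
Similar to method (2), but usually not very practical.

(6) Polynomial regressions
Express the explanatory variable as deviations from its mean before raising it to higher powers.

(7) Other methods
If several variables are highly correlated, use factor analysis or index construction to combine them into a single index.

II. Heteroskedasticity
1. Definition
2. Causes
3. Effects on estimation
4. Tests
5. Remedies

1. Definition
The variance of the error term is not constant across observations; for example, it may change with the size of the dependent variable.

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + \varepsilon_i \tag{11.1}$$

Definition: homoskedasticity

$$E(\varepsilon_i^2) = \sigma^2, \quad i = 1, 2, 3, \ldots, n \tag{11.16}$$

Definition: heteroskedasticity

$$\operatorname{var}(\varepsilon_i \mid X_{2i}, X_{3i}, \ldots, X_{ki}) = \sigma_i^2 \tag{11.17}$$

2. Causes of heteroskedasticity
- As people learn, their errors of behavior become smaller over time.
- The variances of the error terms are positively correlated with the independent variables.
- As data collecting techniques improve, the error variance is likely to decrease.
- Heteroskedasticity can also arise as a result of outliers in the data.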
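- Another source of heteroskedasticity is misspecification of the regression model.

(1) Learning reduces errors.
(2) The error variance is positively correlated with an explanatory variable: the higher a person's income, the more choices the person has about how much to save; the better a company's earnings, the more varied its dividend payouts.
(3) Data collection techniques improve.
(4) Outliers are present.
(5) The model is misspecified, for example by omitting important variables.

3. Effects on estimation
1) Heteroskedasticity, unbiasedness, and consistency
2) Heteroskedasticity and the standard errors of OLS coefficients
Under heteroskedasticity, the OLS estimator b is still unbiased and consistent, but it is no longer efficient (it does not have minimum variance), and the conventional estimate of its variance is incorrect. As a result, the conventional t and F test statistics are invalid.

4. Tests
Informal: graphical inspection
- Plot the squared residuals against the predicted values.
- Graphical inspection when there are more than two explanatory variables.
Formal:
1) The White test
2) The Breusch-Pagan-Godfrey test

Formal test: the White test
1. Estimate the original model and square its residuals.
2. Regress the squared residuals on the original explanatory variables, their squares, and their pairwise cross-products (interactions) (Gujarati 1995, 379-80).
3. Test statistic: $nR^2 \sim \chi^2_{df}$, where $R^2$ comes from the regression in step 2 and $df$ is the number of regressors in that regression, excluding the intercept.

5. Remedies
None of the remedies for heteroskedasticity is fully satisfactory. The main remedy weights the observations by a specific explanatory variable to reduce the influence of the unequal variances (weighted least squares). Maximum likelihood and other methods can also handle the problem, but they are not covered here.

III. Autocorrelation (serial correlation) of the error term
1. Definition
2. Causes
3. Effects on estimation
4. Tests
5. Remedies

1. Definition
The term autocorrelation (or serial correlation) may be defined as correlation between members of a series of observations ordered in time (as in time-series data) or space (as in cross-sectional data) (Gujarati 1995: 400-1).

2. Causes
Apparent serial correlation may indeed reflect autocorrelated errors, but it may also result from dynamic misspecification, parameter nonconstancy, incorrect functional form, and so on (Granato 1991: 124).

According to Kennedy (1992: 119), serial correlation arises for several reasons:
- Spatial autocorrelation
- Prolonged influence of shocks
- Inertia
- Data manipulation
- Misspecification

3. Effects on estimation
(1) No lagged endogenous variable
- The coefficient estimates are still unbiased.
- The standard errors of the coefficients are typically underestimated (for example, when both the errors and the regressors are positively autocorrelated).
- The probability of a Type I error in hypothesis tests therefore increases.
(2) Lagged endogenous variable
- OLS estimates become inconsistent.

4. Tests
See the textbook and the listed references.

5. Remedies
Study statistics carefully; see the explanation in Table 7.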
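For the heteroskedasticity part of this lecture, the three White test steps can be sketched in Python with NumPy as below. The function `white_test` is a hypothetical helper: it fits the original model by OLS, regresses the squared residuals on the regressors, their squares, and their pairwise cross-products, and returns $nR^2$ with its degrees of freedom. It assumes continuous regressors (for a 0/1 dummy the square duplicates the variable and should be dropped) and does not compute a p-value; instead, compare the statistic with a $\chi^2$ critical value, for example about 11.07 at the 5% level for 5 degrees of freedom. The data in the usage example are simulated, with an error standard deviation proportional to one regressor.

```python
import itertools

import numpy as np


def ols(y, Z):
    """OLS coefficients, residuals and R^2; Z includes the constant column."""
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, resid, r2


def white_test(y, X):
    """White test statistic n*R^2 and its degrees of freedom.

    X: (n, p) explanatory variables without a constant column.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    # Step 1: estimate the original model and square the residuals
    _, resid, _ = ols(y, np.column_stack([np.ones(n), X]))
    u2 = resid ** 2
    # Step 2: auxiliary regressors = originals, their squares, pairwise cross-products
    cols = [X[:, j] for j in range(p)]
    cols += [X[:, j] ** 2 for j in range(p)]
    cols += [X[:, i] * X[:, j] for i, j in itertools.combinations(range(p), 2)]
    _, _, r2_aux = ols(u2, np.column_stack([np.ones(n)] + cols))
    # Step 3: n*R^2 is asymptotically chi-square, df = number of auxiliary regressors
    return n * r2_aux, len(cols)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n = 500
    x2 = rng.uniform(1, 10, size=n)
    x3 = rng.normal(size=n)
    eps = 0.5 * x2 * rng.normal(size=n)  # error s.d. proportional to x2 (heteroskedastic)
    y = 1.0 + 2.0 * x2 + 0.5 * x3 + eps
    stat, df = white_test(y, np.column_stack([x2, x3]))
    print(f"n*R^2 = {stat:.2f}, df = {df}")
```

With two regressors the auxiliary regression has 2 + 2 + 1 = 5 regressors, which is why the degrees of freedom here are 5.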
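The weighting remedy mentioned for heteroskedasticity can be illustrated under one specific assumption: that the error variance is proportional to the square of a known regressor, $\operatorname{var}(\varepsilon_i) = \sigma^2 X_{ji}^2$. Dividing every term of the model, including the intercept, by $X_{ji}$ then gives homoskedastic errors, and OLS on the transformed data is weighted least squares. The sketch below (hypothetical helper `wls_by_regressor`, NumPy assumed, simulated data) shows only that transformation; if the assumed variance pattern is wrong, the weights are wrong too.

```python
import numpy as np


def wls_by_regressor(y, X, j):
    """WLS assuming var(e_i) = sigma^2 * X[i, j]**2 (an assumption, not a test result).

    Every term of the model, including the constant, is divided by X[:, j], so the
    returned coefficients keep their original meaning: [intercept, b_1, ..., b_p].
    X: (n, p) explanatory variables without a constant; X[:, j] must be nonzero.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    s = X[:, j]
    if np.any(s == 0):
        raise ValueError("weighting variable must be nonzero")
    Z = np.column_stack([np.ones(len(y)), X]) / s[:, None]
    beta, *_ = np.linalg.lstsq(Z, y / s, rcond=None)
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    n = 500
    x = rng.uniform(1, 10, size=n)
    y = 1.0 + 2.0 * x + 0.5 * x * rng.normal(size=n)  # true intercept 1, slope 2
    print("WLS estimates:", np.round(wls_by_regressor(y, x[:, None], 0), 3))
```

Keeping the divided constant column (rather than dropping it) is what lets the coefficients be read on the original scale of the model.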
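The consequences listed for autocorrelation with no lagged endogenous variable can be checked by simulation. The Python sketch below, assuming NumPy, holds a positively autocorrelated regressor fixed, draws AR(1) errors with $\rho = 0.8$ many times, and compares the average OLS slope with its true value, the actual spread of the slope estimates with the average conventional standard error, and the share of replications in which a nominal 5% t test rejects the true slope. The AR(1) setup, the parameter values, the helper names, and the critical value 1.98 (approximately the two-sided 5% value for 98 degrees of freedom) are all assumptions of this illustration.

```python
import numpy as np


def ar1(n, rho, rng):
    """Stationary AR(1) series u_t = rho * u_{t-1} + e_t with e_t ~ N(0, 1)."""
    e = rng.normal(size=n)
    u = np.empty(n)
    u[0] = e[0] / np.sqrt(1.0 - rho ** 2)  # start from the stationary distribution
    for t in range(1, n):
        u[t] = rho * u[t - 1] + e[t]
    return u


def simulate(n=100, rho=0.8, reps=2000, seed=0, t_crit=1.98):
    """Monte Carlo check of OLS with AR(1) errors and no lagged dependent variable."""
    rng = np.random.default_rng(seed)
    x = ar1(n, rho, rng)                      # regressor held fixed across replications
    X = np.column_stack([np.ones(n), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    slopes, ses = np.empty(reps), np.empty(reps)
    for r in range(reps):
        y = 1.0 + 2.0 * x + ar1(n, rho, rng)  # true slope = 2
        beta = XtX_inv @ (X.T @ y)
        resid = y - X @ beta
        s2 = (resid @ resid) / (n - 2)        # conventional error-variance estimate
        slopes[r] = beta[1]
        ses[r] = np.sqrt(s2 * XtX_inv[1, 1])  # conventional OLS standard error
    reject = np.mean(np.abs((slopes - 2.0) / ses) > t_crit)
    return {"mean_slope": slopes.mean(), "sd_slope": slopes.std(ddof=1),
            "mean_se": ses.mean(), "type1_rate": reject}


if __name__ == "__main__":
    for key, val in simulate().items():
        print(f"{key}: {val:.3f}")
```

In this setup the mean slope stays close to 2 (unbiasedness), while the actual standard deviation of the slopes exceeds the average conventional standard error, so the rejection rate of the true null rises well above the nominal 5%.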
