The Simple Linear Regression Model

Simple Linear Regression Model
    $y = \beta_0 + \beta_1 x + \varepsilon$
Simple Linear Regression Equation
    $E(y) = \beta_0 + \beta_1 x$
Estimated Simple Linear Regression Equation
    $\hat{y} = b_0 + b_1 x$

Slide 1

Least Squares Line (Best Prediction Line)

Among all straight lines through the data points of the scatter diagram, the one that minimizes the sum of squared prediction errors is called the least squares line, and the method used to find it is called the least squares method.

What is the sum of squared errors?
Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be n data points. If $\hat{y} = b_0 + b_1 x$ is the line used to predict Y from X, then at $X = x_1$ the difference $y_1 - \hat{y}_1$ between the observed value $y_1$ and the predicted value $\hat{y}_1 = b_0 + b_1 x_1$ is called the prediction error. The sum of squared errors is defined as
    $f(b_0, b_1) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$
The values of $b_0$ and $b_1$ that minimize f are found with the calculus method for locating maxima and minima.

Slide 2

Least Squares Line

Solving the simultaneous equations
    $\partial f(b_0, b_1) / \partial b_0 = 0$
    $\partial f(b_0, b_1) / \partial b_1 = 0$
gives
    $b_1 = \dfrac{\sum x_i y_i - (\sum x_i \sum y_i)/n}{\sum x_i^2 - (\sum x_i)^2/n}$
    $b_0 = \bar{y} - b_1 \bar{x}$
so the least squares line is
    $\hat{y} = b_0 + b_1 x = \bar{y} - b_1 \bar{x} + b_1 x = \bar{y} + b_1 (x - \bar{x})$

Slide 3

Example: Reed Auto Sales

Simple Linear Regression
Reed Auto periodically has a special week-long sale. As part of the advertising campaign, Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 6 previous sales are shown below.

    Number of TV Ads    Number of Cars Sold
    1                   14
    3                   24
    2                   18
    1                   17
    3                   27
    2                   22

Slide 4

Example: Reed Auto Sales

Slope for the Estimated Regression Equation
    b1 = [264 - (12)(122)/6] / [28 - (12)^2/6] = 20/4 = 5
y-Intercept for the Estimated Regression Equation
    b0 = 20.333 - 5(2) = 10.333
Estimated Regression Equation
    $\hat{y} = 10.333 + 5x$

Slide 5

Example: Reed Auto Sales

Scatter Diagram
[Figure: scatter diagram of Cars Sold (vertical axis, 0 to 30) against TV Ads (horizontal axis, 0 to 4)]

Slide 6

The Coefficient of Determination

Relationship Among SST, SSR, SSE
    SST = SSR + SSE
    $\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$
Coefficient of Determination
    $r^2$ = SSR/SST
where:
    SST = total sum of squares
    SSR = sum of squares due to regression
    SSE = sum of squares due to error

Slide 7

Coefficient of Determination

Definition: $r^2$ = SSR/SST
It expresses the proportion of the variation in Y that is explained by X.
• The larger $r^2$ is, the more closely the least squares line fits the data.
• $1 - r^2$ is the proportion of the total variation (SST) that cannot be explained by X.
Example: Reed Auto Sales
• $r^2$ = SSR/SST = 100/117.333 = .852273
• That is, 85.2% of the variation in the number of cars sold can be explained by the number of TV ads, and 14.8% cannot.

Slide 8

The Correlation Coefficient

Sample Correlation Coefficient
    $r_{xy} = (\text{sign of } b_1) \sqrt{\text{Coefficient of Determination}}$
    $r_{xy} = (\text{sign of } b_1) \sqrt{r^2}$
where:
    $b_1$ = the slope of the estimated regression equation $\hat{y} = b_0 + b_1 x$

Slide 9

Example: Reed Auto Sales

Sample Correlation Coefficient
    $r_{xy} = (\text{sign of } b_1) \sqrt{r^2}$
The sign of $b_1$ in the equation $\hat{y} = 10.333 + 5x$ is "+".
    $r_{xy} = +\sqrt{0.852273}$
    $r_{xy}$ = +.923186

Slide 10

Model Assumptions

Assumptions About the Error Term $\varepsilon$
• The error $\varepsilon$ is a random variable with a mean of zero.
• The variance of $\varepsilon$, denoted by $\sigma^2$, is the same for all values of the independent variable.
• The values of $\varepsilon$ are independent.
• The error $\varepsilon$ is a normally distributed random variable.

Slide 11

Testing for Significance

To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of $\beta_1$ is zero.
Two tests are commonly used:
• t Test
• F Test
Both tests require an estimate of $\sigma^2$, the variance of $\varepsilon$ in the regression model.

Slide 12

Testing for Significance

An Estimate of $\sigma^2$
The mean square error (MSE) provides the estimate of $\sigma^2$; the notation $s^2$ is also used.
    $s^2$ = MSE = SSE/(n - 2)
where:
    $\text{SSE} = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - b_0 - b_1 x_i)^2$

Slide 13

Testing for Significance

An Estimate of $\sigma$
• To estimate $\sigma$ we take the square root of $s^2$.
• The resulting s is called the standard error of the estimate.
    $s = \sqrt{\text{MSE}} = \sqrt{\dfrac{\text{SSE}}{n - 2}}$

Slide 14

Testing for Significance: t Test

Hypotheses
    $H_0: \beta_1 = 0$
    $H_a: \beta_1 \neq 0$
Test Statistic
    $t = \dfrac{b_1}{s_{b_1}}$ where $s_{b_1} = \dfrac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$
Rejection Rule
    Reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$,
    where $t_{\alpha/2}$ is based on a t distribution with n - 2 degrees of freedom.

Slide 15

Example: Reed Auto Sales

t Test
• Hypotheses
    $H_0: \beta_1 = 0$
    $H_a: \beta_1 \neq 0$
• Rejection Rule
    For $\alpha$ = .05 and d.f. = 4, $t_{.025}$ = 2.776
    Reject $H_0$ if t < -2.776 or t > 2.776
• Test Statistic
    t = 5/1.0408 = 4.804
• Conclusion
    Reject $H_0$
• P-value
    2P{T > 4.804} = 0.0086 < 0.05, so reject $H_0$

Slide 16

Confidence Interval for $\beta_1$

We can use a 95% confidence interval for $\beta_1$ to test the hypotheses just used in the t test.
$H_0$ is rejected if the hypothesized value of $\beta_1$ is not included in the confidence interval for $\beta_1$.
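The Reed Auto figures up to this point (slope, intercept, coefficient of determination, correlation, t statistic, and the interval for the slope) can be reproduced with a short script. What follows is a minimal sketch in plain Python, assuming the six data pairs from the example and taking the tabled value t.025 = 2.776 (4 d.f.) from the slides rather than computing it; all variable names are illustrative and not part of the original deck.

```python
import math

# Reed Auto sample from the slides: TV ads (x) and cars sold (y).
x = [1, 3, 2, 1, 3, 2]
y = [14, 24, 18, 17, 27, 22]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares slope and intercept, using the computational formulas:
#   b1 = [sum(xy) - sum(x)sum(y)/n] / [sum(x^2) - sum(x)^2/n],  b0 = y_bar - b1*x_bar
sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
sxx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Sums of squares and the coefficient of determination r^2 = SSR/SST.
y_hat = [b0 + b1 * xi for xi in x]
sst = sum((yi - y_bar) ** 2 for yi in y)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
ssr = sst - sse
r2 = ssr / sst

# Sample correlation coefficient: square root of r^2 carrying the sign of b1.
r_xy = math.copysign(math.sqrt(r2), b1)

# Standard error of the estimate, standard error of b1, and the t statistic.
s = math.sqrt(sse / (n - 2))
s_b1 = s / math.sqrt(sxx)
t_stat = b1 / s_b1

# 95% confidence interval for beta_1.
# ASSUMPTION: 2.776 is the tabled t value quoted on the slides, not computed here.
T_CRIT = 2.776
ci_low = b1 - T_CRIT * s_b1
ci_high = b1 + T_CRIT * s_b1

print(f"b1 = {b1:.3f}, b0 = {b0:.3f}")
print(f"SST = {sst:.3f}, SSR = {ssr:.3f}, SSE = {sse:.3f}")
print(f"r^2 = {r2:.6f}, r_xy = {r_xy:.6f}")
print(f"s = {s:.4f}, s_b1 = {s_b1:.4f}, t = {t_stat:.3f}")
print(f"95% CI for beta_1: {ci_low:.2f} to {ci_high:.2f}")
```

Since the two-sided test rejects H0 exactly when 0 falls outside this interval, comparing `abs(t_stat)` with `T_CRIT` and checking whether `ci_low > 0 or ci_high < 0` should always give the same decision.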
Slide 17

Confidence Interval for $\beta_1$

The form of a confidence interval for $\beta_1$ is:
    $b_1 \pm t_{\alpha/2} s_{b_1}$
where
    $b_1$ is the point estimate,
    $t_{\alpha/2} s_{b_1}$ is the margin of error, and
    $t_{\alpha/2}$ is the t value providing an area of $\alpha/2$ in the upper tail of a t distribution with n - 2 degrees of freedom.

Slide 18

Example: Reed Auto Sales

Rejection Rule
    Reject $H_0$ if 0 is not included in the confidence interval for $\beta_1$.
95% Confidence Interval for $\beta_1$
    $b_1 \pm t_{\alpha/2} s_{b_1}$ = 5 ± 2.776(1.0408) = 5 ± 2.89, or 2.11 to 7.89
Conclusion
    Reject $H_0$

Slide 19

Testing for Significance: F Test

Hypotheses
    $H_0: \beta_1 = 0$
    $H_a: \beta_1 \neq 0$
Test Statistic
    F = MSR/MSE
Rejection Rule
    Reject $H_0$ if $F > F_\alpha$,
    where $F_\alpha$ is based on an F distribution with 1 d.f. in the numerator and n - 2 d.f. in the denominator.

Slide 20

Example: Reed Auto Sales

F Test
• Hypotheses
    $H_0: \beta_1 = 0$
    $H_a: \beta_1 \neq 0$
• Rejection Rule
    For $\alpha$ = .05 and d.f. = 1, 4: $F_{.05}$ = 7.709
    Reject $H_0$ if F > 7.709
• Test Statistic
    F = MSR/MSE = 100/4.333 = 23.077
• Conclusion
    We can reject $H_0$.

Slide 21

Some Cautions about the Interpretation of Significance Tests

Rejecting $H_0: \beta_1 = 0$ and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y.
Just because we are able to reject $H_0: \beta_1 = 0$ and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between x and y.

Slide 22

Using the Estimated Regression Equation for Estimation and Prediction

Confidence Interval Estimate of $E(y_p)$
    $\hat{y}_p \pm t_{\alpha/2} s_{\hat{y}_p}$
Prediction Interval Estimate of $y_p$
    $\hat{y}_p \pm t_{\alpha/2} s_{ind}$
Here the confidence coefficient is $1 - \alpha$ and $t_{\alpha/2}$ is based on a t distribution with n - 2 d.f.
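$s_{\hat{y}_p}$ is the standard error of the estimate of $E(y_p)$, and $s_{ind}$ is the standard error of an individual estimate of $y_p$.

Slide 23

Standard Errors of Estimate of $E(y_p)$ and $y_p$

    $s_{\hat{y}_p} = s \sqrt{\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
    $s_{ind} = s \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$

Slide 24

Variances of the Estimators of $E(y_p)$ and $y_p$

Variance of $\bar{y}$: $\sigma^2 / n$
Variance of $b_1 (x_0 - \bar{x})$: $(x_0 - \bar{x})^2 \sigma_{b_1}^2 = \dfrac{(x_0 - \bar{x})^2 \sigma^2}{\sum (x_i - \bar{x})^2}$, where $\sigma_{b_1}^2 = \dfrac{\sigma^2}{\sum (x_i - \bar{x})^2}$
Variance of $\varepsilon$: $\sigma^2$

Variance of the estimator of $E(y_p) = \beta_0 + \beta_1 x_0$ (the two terms add because $\bar{y}$ and $b_1$ are uncorrelated):
    $\mathrm{Var}(\hat{y}) = \mathrm{Var}[\bar{y} + b_1 (x_0 - \bar{x})] = \dfrac{\sigma^2}{n} + \dfrac{(x_0 - \bar{x})^2 \sigma^2}{\sum (x_i - \bar{x})^2}$
Variance of the estimator of $y_p = \beta_0 + \beta_1 x_0 + \varepsilon$:
    $\mathrm{Var}(\hat{y}) + \sigma^2$

Slide 25

Example: Reed Auto Sales

Point Estimation
If 3 TV ads are run prior to a sale, we expect the mean number of cars sold to be:
    $\hat{y}$ = 10.333 + 5(3) = 25.333 cars
Confidence Interval for $E(y_p)$
The 95% confidence interval estimate of the mean number of cars sold when 3 TV ads are run is:
    25.333 ± 3.730, or 21.603 to 29.063 cars
Prediction Interval for $y_p$
The 95% prediction interval estimate of the number of cars sold in one particular week when 3 TV ads are run is:
    25.333 ± 6.878, or 18.455 to 32.211 cars

Slide 26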
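The F test and the two interval estimates at 3 TV ads can be checked the same way. The sketch below is self-contained plain Python and makes two assumptions worth flagging: the critical values F.05 = 7.709 and t.025 = 2.776 are copied from the slides rather than computed, and the variable names (`s_mean`, `s_ind`, `conf_int`, `pred_int`) are illustrative only.

```python
import math

# Reed Auto sample from the slides: TV ads (x) and cars sold (y).
x = [1, 3, 2, 1, 3, 2]
y = [14, 24, 18, 17, 27, 22]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares fit in deviation form.
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

# F test: F = MSR/MSE with 1 and n - 2 degrees of freedom.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
ssr = b1 ** 2 * sxx          # sum of (y_hat_i - y_bar)^2 for a fitted line
msr = ssr / 1
mse = sse / (n - 2)
f_stat = msr / mse
F_CRIT = 7.709               # ASSUMPTION: tabled F.05 for (1, 4) d.f., from the slides
reject_h0 = f_stat > F_CRIT

# Point estimate and standard errors at x0 = 3 TV ads.
s = math.sqrt(mse)
x0 = 3
y_hat_p = b0 + b1 * x0
leverage = 1 / n + (x0 - x_bar) ** 2 / sxx
s_mean = s * math.sqrt(leverage)       # standard error for E(y_p)
s_ind = s * math.sqrt(1 + leverage)    # standard error for an individual y_p

# 95% intervals.
T_CRIT = 2.776               # ASSUMPTION: tabled t.025 for 4 d.f., from the slides
conf_int = (y_hat_p - T_CRIT * s_mean, y_hat_p + T_CRIT * s_mean)
pred_int = (y_hat_p - T_CRIT * s_ind, y_hat_p + T_CRIT * s_ind)

print(f"F = {f_stat:.3f}, reject H0: {reject_h0}")
print(f"point estimate at x0 = {x0}: {y_hat_p:.3f}")
print(f"95% CI for E(y_p): {conf_int[0]:.3f} to {conf_int[1]:.3f}")
print(f"95% PI for y_p:    {pred_int[0]:.3f} to {pred_int[1]:.3f}")
```

The prediction interval is wider than the confidence interval because `s_ind` carries the extra `1 +` term for the variance of an individual week's error on top of the uncertainty in the estimated mean.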