Course: A0064 / Statistik Ekonomi (Economic Statistics)
Year: 2005
Version: 1/1

Session 22
Simple Linear Regression and Correlation, Part 2

Learning Outcomes
By the end of this session, students are expected to be able to:
• Draw conclusions from the computed results of a simple linear regression model for forecasting and decision making

Outline
• Hypothesis Tests about the Regression Relationship
• Coefficient of Determination
• Using the Regression Model for Forecasting

COMPLETE BUSINESS STATISTICS, 5th edition
McGraw-Hill/Irwin, Aczel/Sounderpandian, © The McGraw-Hill Companies, Inc., 2002

Covariance and Correlation

The covariance of two random variables X and Y:

$$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$$

where $\mu_X$ and $\mu_Y$ are the population means of X and Y respectively.

The population correlation coefficient:

$$\rho = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$$

The sample correlation coefficient*:

$$r = \frac{SS_{XY}}{\sqrt{SS_X \, SS_Y}}$$

*Note: If $\rho < 0$, $\beta_1 < 0$. If $\rho = 0$, $\beta_1 = 0$. If $\rho > 0$, $\beta_1 > 0$.

Example 10-1:

$$r = \frac{SS_{XY}}{\sqrt{SS_X \, SS_Y}} = \frac{51402852.4}{\sqrt{(40947557.84)(66855898)}} = \frac{51402852.4}{52321943.29} = 0.9824$$

Hypothesis Tests for the Correlation Coefficient

$H_0: \rho = 0$ (No linear relationship)
$H_1: \rho \neq 0$ (Some linear relationship)

Test statistic:

$$t_{(n-2)} = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$

Example 10-1:

$$t_{(n-2)} = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{0.9824}{\sqrt{\dfrac{1 - 0.9651}{25 - 2}}} = \frac{0.9824}{0.0389} = 25.25$$

$t_{0.005} = 2.807 < 25.25$, so $H_0$ is rejected at the 1% level.

10-6 Hypothesis Tests about the Regression Relationship

[Figure: three plots of Y against X, titled "Constant Y", "Unsystematic Variation", and "Nonlinear Relationship".]

A hypothesis test for the existence of a linear relationship between X and Y:

$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$

Test statistic for the existence of a linear relationship between X and Y:

$$t_{(n-2)} = \frac{b_1}{s(b_1)}$$

where $b_1$ is the least-squares estimate of the regression slope and $s(b_1)$ is the standard error of $b_1$. When the null hypothesis is true, the statistic has a t distribution with $n - 2$ degrees of freedom.
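As a quick numerical check of Example 10-1, the sample correlation and its t statistic can be recomputed from the sums of squares quoted on the slides. This is a minimal sketch: the function names are illustrative rather than taken from the textbook, and the critical value 2.807 is copied from the slide instead of being looked up in a t table.

```python
import math

# Sums of squares and sample size quoted in Example 10-1.
SS_XY = 51402852.4
SS_X = 40947557.84
SS_Y = 66855898.0
N = 25

# Critical value t_{0.005} with 23 degrees of freedom, copied from the slide.
T_CRITICAL = 2.807


def sample_correlation(ss_xy, ss_x, ss_y):
    """Sample correlation r = SS_XY / sqrt(SS_X * SS_Y)."""
    return ss_xy / math.sqrt(ss_x * ss_y)


def correlation_t_statistic(r, n):
    """t statistic with n - 2 degrees of freedom for H0: rho = 0."""
    return r / math.sqrt((1.0 - r * r) / (n - 2))


r = sample_correlation(SS_XY, SS_X, SS_Y)
t = correlation_t_statistic(r, N)

print(f"r = {r:.4f}")
print(f"t = {t:.2f}")
print("Reject H0 at the 1% level:", abs(t) > T_CRITICAL)
```

The printed values should agree with the slide figures (r of about 0.9824 and t of about 25.25) up to rounding.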
Hypothesis Tests for the Regression Slope

Example 10-4:

$H_0: \beta_1 = 1$
$H_1: \beta_1 \neq 1$

$$t_{(n-2)} = \frac{b_1 - 1}{s(b_1)} = \frac{1.24 - 1}{0.21} = 1.14$$

$t_{(0.05,\,58)} = 1.671 > 1.14$, so $H_0$ is not rejected at the 10% level. We may not conclude that the beta coefficient is different from 1.

Example 10-1:

$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$

$$t_{(n-2)} = \frac{b_1}{s(b_1)} = \frac{1.25533}{0.04972} = 25.25$$

$t_{(0.005,\,23)} = 2.807 < 25.25$, so $H_0$ is rejected at the 1% level and we may conclude that there is a relationship between charges and miles traveled.

10-7 How Good is the Regression?

The coefficient of determination, $r^2$, is a descriptive measure of the strength of the regression relationship, a measure of how well the regression line fits the data.

[Figure: a data point, the regression line, and the horizontal line at $\bar{y}$, with the total deviation split into an unexplained deviation and an explained deviation.]

$$(y - \bar{y}) = (y - \hat{y}) + (\hat{y} - \bar{y})$$

Total Deviation = Unexplained Deviation (Error) + Explained Deviation (Regression)

$$\sum (y - \bar{y})^2 = \sum (y - \hat{y})^2 + \sum (\hat{y} - \bar{y})^2$$

$$SST = SSE + SSR$$

$$r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

$r^2$ is the percentage of total variation explained by the regression.
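The two slope tests in Examples 10-4 and 10-1 above follow the same recipe: subtract the hypothesized slope from the estimate, divide by the standard error, and compare with a critical value. The sketch below replays both calculations. The helper names are hypothetical, and the critical values 1.671 and 2.807 are copied from the slides rather than computed.

```python
def slope_t_statistic(b1, se_b1, beta1_null=0.0):
    """t statistic (n - 2 degrees of freedom) for H0: beta_1 = beta1_null."""
    return (b1 - beta1_null) / se_b1


def reject_two_sided(t_stat, t_critical):
    """Two-sided decision rule: reject H0 when |t| exceeds the critical value."""
    return abs(t_stat) > t_critical


# Example 10-4: H0: beta_1 = 1, two-sided test at the 10% level (58 df).
t_ex4 = slope_t_statistic(1.24, 0.21, beta1_null=1.0)
reject_ex4 = reject_two_sided(t_ex4, 1.671)

# Example 10-1: H0: beta_1 = 0, two-sided test at the 1% level (23 df).
t_ex1 = slope_t_statistic(1.25533, 0.04972)
reject_ex1 = reject_two_sided(t_ex1, 2.807)

print(f"Example 10-4: t = {t_ex4:.2f}, reject H0: {reject_ex4}")
print(f"Example 10-1: t = {t_ex1:.2f}, reject H0: {reject_ex1}")
```

Passing the hypothesized slope as a parameter keeps one function usable for both the "no relationship" test (null value 0) and the "slope equals 1" test.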
The Coefficient of Determination

[Figure: three plots of Y against X showing how SST splits into SSE and SSR, for $r^2 = 0$, $r^2 = 0.50$, and $r^2 = 0.90$.]

Example 10-1:

$$r^2 = \frac{SSR}{SST} = \frac{64527736.8}{66855898} = 0.96518$$

[Figure: scatter plot of Dollars (vertical axis) against Miles (horizontal axis) for Example 10-1.]

10-8 Analysis of Variance and an F Test of the Regression Model

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F Ratio
Regression            SSR              1                    MSR           MSR/MSE
Error                 SSE              n - 2                MSE
Total                 SST              n - 1                MST

Example 10-1:

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F Ratio   p Value
Regression            64527736.8       1                    64527736.8    637.47    0.000
Error                 2328161.2        23                   101224.4
Total                 66855898.0       24

[Figure: template (partial output) that displays the Analysis of Variance and an F test of the regression model.]

10-9 Residual Analysis and Checking for Model Inadequacies

[Figure: four residual plots, described below.]

• Residuals against $x$ or $\hat{y}$. Homoscedasticity: residuals appear completely random. No indication of model inadequacy.
• Residuals against $x$ or $\hat{y}$. Heteroscedasticity: variance of residuals changes when x changes.
• Residuals against time. Residuals exhibit a linear trend with time.
• Residuals against $x$ or $\hat{y}$. Curved pattern in residuals resulting from underlying nonlinear relationship.
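The ANOVA table for Example 10-1 above can be rebuilt from just SSR, SSE, and n, which also makes the link between the F ratio and the coefficient of determination visible. This is a sketch with an illustrative function name and a plain dictionary as the return type; neither comes from the textbook.

```python
# Sums of squares and sample size quoted in the Example 10-1 ANOVA table.
SSR = 64527736.8
SSE = 2328161.2
N = 25


def anova_simple_regression(ssr, sse, n):
    """Build the ANOVA quantities for a simple (one-predictor) regression."""
    sst = ssr + sse          # total sum of squares
    df_regression = 1        # one slope parameter
    df_error = n - 2         # n observations minus two estimated coefficients
    msr = ssr / df_regression
    mse = sse / df_error
    return {
        "SST": sst,
        "MSR": msr,
        "MSE": mse,
        "F": msr / mse,
        "r2": ssr / sst,
    }


table = anova_simple_regression(SSR, SSE, N)
for name, value in table.items():
    print(f"{name:>3} = {value:,.4f}")
```

In simple regression the F ratio equals the square of the slope t statistic, which is why 637.47 is close to 25.25 squared.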
Normal Probability Plot of the Residuals

[Figure: normal probability plot of residuals that are flatter than normal.]
[Figure: normal probability plot of residuals that are more peaked than normal.]
[Figure: normal probability plot of residuals that are more positively skewed than normal.]
[Figure: normal probability plot of residuals that are more negatively skewed than normal.]

10-10 Use of the Regression Model for Prediction

• Point Prediction
  A single-valued estimate of Y for a given value of X, obtained by inserting the value of X in the estimated regression equation.
• Prediction Interval
  For a value of Y given a value of X:
    • Variation in regression line estimate
    • Variation of points around regression line
  For an average value of Y given a value of X:
    • Variation in regression line estimate

Errors in Predicting E[Y|X]

[Figure: two plots of Y against X. Panel 1, "Uncertainty about the slope of the regression line", shows the regression line with an upper and a lower limit on the slope. Panel 2, "Uncertainty about the intercept of the regression line", shows the regression line with an upper and a lower limit on the intercept.]

Prediction Interval for E[Y|X]

[Figure: regression line with the prediction band for E[Y|X] drawn around it.]

• The prediction band for E[Y|X] is narrowest at the mean value of X.
• The prediction band widens as the distance from the mean of X increases.
• Predictions become very unreliable when we extrapolate beyond the range of the sample itself.
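The shape of the band for E[Y|X] follows from its half-width, $t_{\alpha/2}\, s \sqrt{1/n + (x - \bar{x})^2 / SS_X}$, which is smallest when $x = \bar{x}$. The sketch below evaluates that half-width at a few X values using the Example 10-1 summary figures from this deck (n = 25, x̄ = 3,177.92, SS_X = 40,947,557.84, s = 318.16, t = 2.069). The function name and the chosen X values are illustrative assumptions.

```python
import math

# Summary figures quoted for Example 10-1 in this deck.
N = 25
X_BAR = 3177.92
SS_X = 40947557.84
S = 318.16        # standard error of estimate
T_CRIT = 2.069    # t_{0.025} with 23 df, copied from the slides


def mean_response_half_width(x):
    """Half-width of the 95% band for E[Y|X] at a given x."""
    return T_CRIT * S * math.sqrt(1.0 / N + (x - X_BAR) ** 2 / SS_X)


# Arbitrary X values chosen for illustration, including the sample mean.
for x in (1000.0, 2000.0, X_BAR, 4000.0, 5500.0, 8000.0):
    print(f"x = {x:8.2f}  half-width = {mean_response_half_width(x):8.2f}")
```

The smallest half-width appears at the sample mean of X, and the values grow in both directions away from it, most sharply for X values far outside the observed data.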
Additional Error in Predicting Individual Value of Y

[Figure: two plots of Y against X. Panel 3, "Variation around the regression line", shows points scattered around the regression line. The second panel shows the regression line with the prediction band for E[Y|X] inside the wider prediction band for Y.]

Prediction Interval for a Value of Y

A $(1 - \alpha)100\%$ prediction interval for Y:

$$\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{SS_X}}$$

Example 10-1 (X = 4,000):

$$\{274.85 + (1.2553)(4{,}000)\} \pm 2.069 \times 318.16 \sqrt{1 + \frac{1}{25} + \frac{(4{,}000 - 3{,}177.92)^2}{40{,}947{,}557.84}} = 5296.05 \pm 676.62 = [4619.43,\ 5972.67]$$

Prediction Interval for the Average Value of Y

A $(1 - \alpha)100\%$ prediction interval for $E[Y \mid X]$:

$$\hat{y} \pm t_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{SS_X}}$$

Example 10-1 (X = 4,000):

$$\{274.85 + (1.2553)(4{,}000)\} \pm 2.069 \times 318.16 \sqrt{\frac{1}{25} + \frac{(4{,}000 - 3{,}177.92)^2}{40{,}947{,}557.84}} = 5296.05 \pm 156.48 = [5139.57,\ 5452.53]$$

[Figure: template output with prediction intervals.]

10-11 The Solver Method for Regression

The Solver macro available in Excel can also be used to conduct a simple linear regression. See the text for instructions.

Closing
• Simple linear regression and correlation are essentially an approach/model for finding a (linear) cause-and-effect relationship between two variables, namely the independent variable (the influencing variable) and the dependent variable (the influenced variable), which can then be used for forecasting or estimation.
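As a worked check on the two prediction-interval calculations of Example 10-1 at X = 4,000, the sketch below computes both intervals from the figures quoted on the slides. The only difference between them is the extra "1 +" under the square root for an individual value of Y. Function and variable names are illustrative assumptions, and the critical value 2.069 is copied from the slides.

```python
import math

# Estimated regression line and summary figures quoted for Example 10-1.
B0, B1 = 274.85, 1.2553
N = 25
X_BAR = 3177.92
SS_X = 40947557.84
S = 318.16        # standard error of estimate
T_CRIT = 2.069    # t_{0.025} with 23 df, copied from the slides


def point_prediction(x):
    """Point prediction from the estimated regression equation."""
    return B0 + B1 * x


def half_width(x, individual):
    """Half-width of the 95% interval at x.

    individual=True  -> interval for a single value of Y (adds 1 under the root)
    individual=False -> interval for the average value E[Y|X]
    """
    extra = 1.0 if individual else 0.0
    return T_CRIT * S * math.sqrt(extra + 1.0 / N + (x - X_BAR) ** 2 / SS_X)


x0 = 4000.0
y_hat = point_prediction(x0)
hw_y = half_width(x0, individual=True)
hw_mean = half_width(x0, individual=False)

print(f"Point prediction:     {y_hat:.2f}")
print(f"Interval for Y:       [{y_hat - hw_y:.2f}, {y_hat + hw_y:.2f}]")
print(f"Interval for E[Y|X]:  [{y_hat - hw_mean:.2f}, {y_hat + hw_mean:.2f}]")
```

The interval for an individual Y is much wider because it adds the scatter of points around the line to the uncertainty in the line itself.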