Matakuliah: D0722 - Statistika dan Aplikasinya
Tahun: 2010

Regresi dan Korelasi (Regression and Correlation)
Pertemuan 10 (Session 10)

Learning Outcomes
• At the end of this session, students are expected to be able to:
1. relate two variables using simple linear regression and correlation analysis
2. show the relationship between variables based on the results of hypothesis tests

(The following slides are from Aczel/Sounderpandian, COMPLETE BUSINESS STATISTICS, 5th edition, McGraw-Hill/Irwin, © The McGraw-Hill Companies, Inc., 2002.)

The Simple Linear Regression Model

The population simple linear regression model:

    Y = β₀ + β₁X + ε

that is, a nonrandom or systematic component (β₀ + β₁X) plus a random component (ε), where:
• Y is the dependent variable, the variable we wish to explain or predict;
• X is the independent variable, also called the predictor variable;
• ε is the error term, the only random component in the model, and thus the only source of randomness in Y;
• β₀ is the intercept of the systematic component of the regression relationship;
• β₁ is the slope of the systematic component.

The conditional mean of Y:

    E[Y | X] = β₀ + β₁X

Picturing the Simple Linear Regression Model

[Regression plot: the line E[Y] = β₀ + β₁X, with intercept β₀, slope β₁, and an observed point Yᵢ lying off the line by the error εᵢ.]

The simple linear regression model gives an exact linear relationship between the expected or average value of Y, the dependent variable, and X, the independent or predictor variable:

    E[Yᵢ] = β₀ + β₁Xᵢ

Actual observed values of Y differ from the expected value by an unexplained or random error:

    Yᵢ = E[Yᵢ] + εᵢ = β₀ + β₁Xᵢ + εᵢ

10-3 Estimation: The Method of Least Squares

Estimation of a simple linear regression relationship involves finding estimated or predicted values of the intercept and slope of the linear regression line.
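Before turning to estimation, the population model can be illustrated by simulating data from it. This is a minimal sketch: the parameter values, sample size, and X range below are invented for demonstration (they merely echo the magnitudes seen later in Example 10-1) and are not from the slides.

```python
import numpy as np

# Illustrative simulation of the population model Y = beta0 + beta1*X + epsilon.
# All numbers here are invented for demonstration purposes.
rng = np.random.default_rng(0)

beta0, beta1 = 274.85, 1.2553            # systematic component
n = 25
x = rng.uniform(1500, 5500, size=n)      # predictor values
eps = rng.normal(0.0, 318.16, size=n)    # random component: the only source of randomness in Y
y = beta0 + beta1 * x + eps              # observed responses

# The conditional mean E[Y | X] is the systematic part alone:
conditional_mean = beta0 + beta1 * x
assert np.allclose(y - conditional_mean, eps)
```

Each observed Yᵢ differs from its conditional mean by exactly the error εᵢ, which is what the final assertion checks.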
The estimated regression equation:

    Y = b₀ + b₁X + e

where b₀ estimates the intercept of the population regression line, β₀; b₁ estimates the slope of the population regression line, β₁; and e stands for the observed errors, the residuals from fitting the estimated regression line b₀ + b₁X to a set of n points.

The estimated regression line:

    Ŷ = b₀ + b₁X

where Ŷ ("Y-hat") is the value of Y lying on the fitted regression line for a given value of X.

Fitting a Regression Line

[Four panels: the data; three errors from a fitted line; three errors from the least squares regression line; errors from the least squares regression line are minimized.]

Errors in Regression

[Plot: the observed data point (Xᵢ, Yᵢ), the fitted regression line Ŷ = b₀ + b₁X, the predicted value Ŷᵢ for Xᵢ, and the error eᵢ = Yᵢ − Ŷᵢ.]

Least Squares Regression

The sum of squared errors in regression is:

    SSE = Σᵢ eᵢ² = Σᵢ (yᵢ − ŷᵢ)²    (summing over i = 1, …, n)

The least squares regression line is the one that minimizes the SSE with respect to the estimates b₀ and b₁.
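The minimization property can be sketched numerically. The five data points below are invented for illustration; the closed-form estimates used here are the ones derived in the next section.

```python
import numpy as np

# Minimal sketch: the least squares line minimizes SSE = sum((y - yhat)^2).
# The data points are invented for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

def sse(b0, b1):
    """Sum of squared errors for the candidate line b0 + b1*x."""
    return float(np.sum((y - (b0 + b1 * x)) ** 2))

# Closed-form least squares estimates:
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# Perturbing either coefficient away from the least squares values
# can only increase the SSE:
assert sse(b0, b1) <= sse(b0 + 0.1, b1)
assert sse(b0, b1) <= sse(b0, b1 + 0.1)
```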
The normal equations:

    Σy  = n·b₀ + b₁·Σx
    Σxy = b₀·Σx + b₁·Σx²

[Plot of the SSE surface over (b₀, b₁): SSE is minimized with respect to b₀ and b₁ at the least squares values.]

Sums of Squares, Cross Products, and Least Squares Estimators

Sums of squares and cross products:

    SS_X  = Σ(x − x̄)²        = Σx² − (Σx)²/n
    SS_Y  = Σ(y − ȳ)²        = Σy² − (Σy)²/n
    SS_XY = Σ(x − x̄)(y − ȳ)  = Σxy − (Σx)(Σy)/n

Least squares regression estimators:

    b₁ = SS_XY / SS_X
    b₀ = ȳ − b₁x̄

Error Variance and the Standard Errors of Regression Estimators

Degrees of freedom in regression: df = n − 2 (n total observations, less one degree of freedom for each parameter estimated, b₀ and b₁).

Square and sum all regression errors to find SSE:

    SSE = Σ(Y − Ŷ)² = SS_Y − (SS_XY)²/SS_X = SS_Y − b₁·SS_XY

An unbiased estimator of σ², denoted by s², is the mean square error:

    MSE = SSE / (n − 2),    s = √MSE

Example 10-1:

    SSE = SS_Y − b₁·SS_XY = 66855898 − (1.255333776)(51402852.4) = 2328161.2
    MSE = 2328161.2 / 23 = 101224.4
    s = √101224.4 = 318.158

Standard Errors of Estimates in Regression

The standard error of b₀ (intercept):

    s(b₀) = s·√( Σx² / (n·SS_X) ),    where s = √MSE

The standard error of b₁ (slope):

    s(b₁) = s / √SS_X

Example 10-1:

    s(b₀) = 318.158·√( 293426944 / ((25)(40947557.84)) ) = 170.338
    s(b₁) = 318.158 / √40947557.84 = 0.04972

Confidence Intervals for the Regression Parameters

A (1 − α)·100% confidence interval for β₀:

    b₀ ± t_(α/2, n−2)·s(b₀)

A (1 − α)·100% confidence interval for β₁:

    b₁ ± t_(α/2, n−2)·s(b₁)

Least-squares point estimate: b₁ = 1.25533
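The estimator, standard-error, and confidence-interval formulas above can be checked against the summary statistics quoted for Example 10-1 (the raw data are not shown in this deck, so the sketch works entirely from n, SS_X, SS_Y, SS_XY, Σx², the quoted b₀, and the quoted critical value):

```python
import math

# Sketch reproducing the Example 10-1 computations from the summary
# statistics quoted on the slides; the raw data are not available here.
n = 25
SS_X = 40947557.84
SS_Y = 66855898.0
SS_XY = 51402852.4
sum_x_sq = 293426944.0        # sum of x^2, needed for s(b0)

b1 = SS_XY / SS_X             # least squares slope, about 1.25533
SSE = SS_Y - b1 * SS_XY       # error sum of squares
MSE = SSE / (n - 2)           # unbiased estimator of sigma^2
s = math.sqrt(MSE)            # standard error of the regression

s_b1 = s / math.sqrt(SS_X)                     # standard error of the slope
s_b0 = s * math.sqrt(sum_x_sq / (n * SS_X))    # standard error of the intercept

# 95% confidence intervals with t(0.025, 23) = 2.069; b0 = 274.85 as quoted:
t_crit = 2.069
b0 = 274.85
ci_b0 = (b0 - t_crit * s_b0, b0 + t_crit * s_b0)
ci_b1 = (b1 - t_crit * s_b1, b1 + t_crit * s_b1)

# Zero lies inside the interval for beta0 but outside the interval for beta1:
assert ci_b0[0] < 0 < ci_b0[1]
assert ci_b1[0] > 0
```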
Example 10-1, 95% confidence intervals:

    b₀ ± t_(0.025, 23)·s(b₀) = 274.85 ± (2.069)(170.338) = 274.85 ± 352.43 = [−77.58, 627.28]
    b₁ ± t_(0.025, 23)·s(b₁) = 1.25533 ± (2.069)(0.04972) = 1.25533 ± 0.10287 = [1.15246, 1.35820]

Since the interval for the slope excludes zero, 0 is not a possible value of the regression slope at 95%.

[Plot: the slope interval pictured as height = slope over length = 1.]

Correlation

The correlation between two random variables, X and Y, is a measure of the degree of linear association between the two variables. The population correlation, denoted by ρ, can take on any value from −1 to 1:

    ρ = −1       indicates a perfect negative linear relationship
    −1 < ρ < 0   indicates a negative linear relationship
    ρ = 0        indicates no linear relationship
    0 < ρ < 1    indicates a positive linear relationship
    ρ = 1        indicates a perfect positive linear relationship

The absolute value of ρ indicates the strength or exactness of the relationship.

Illustrations of Correlation

[Five scatter plots illustrating ρ = −1, ρ = −0.8, ρ = 0, ρ = 0.8, and ρ = 1.]

Covariance and Correlation

The covariance of two random variables X and Y:

    Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)]

where μ_X and μ_Y are the population means of X and Y, respectively.
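The covariance-based definition of correlation and the sums-of-squares form used on these slides agree on sample data, which can be verified numerically. The data points below are invented for illustration:

```python
import numpy as np

# Sketch relating the correlation definition to the sums-of-squares form
# r = SS_XY / sqrt(SS_X * SS_Y). The data are invented for illustration.
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([3.1, 5.2, 6.8, 9.4, 10.9])

SS_X = np.sum((x - x.mean()) ** 2)
SS_Y = np.sum((y - y.mean()) ** 2)
SS_XY = np.sum((x - x.mean()) * (y - y.mean()))

r = SS_XY / np.sqrt(SS_X * SS_Y)   # sample correlation coefficient

# The same value via numpy's built-in correlation matrix:
assert np.isclose(r, np.corrcoef(x, y)[0, 1])
assert -1.0 <= r <= 1.0
```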
The population correlation coefficient:

    ρ = Cov(X, Y) / (σ_X·σ_Y)

The sample correlation coefficient:

    r = SS_XY / √(SS_X·SS_Y)

Note: if ρ < 0 then β₁ < 0; if ρ = 0 then β₁ = 0; if ρ > 0 then β₁ > 0.

Example 10-1:

    r = 51402852.4 / √((40947557.84)(66855898)) = 51402852.4 / 52321943.29 = 0.9824

Hypothesis Tests for the Correlation Coefficient

    H₀: ρ = 0   (no linear relationship)
    H₁: ρ ≠ 0   (some linear relationship)

Test statistic:

    t_(n−2) = r / √( (1 − r²) / (n − 2) )

Example 10-1:

    t = 0.9824 / √( (1 − 0.9651) / (25 − 2) ) = 0.9824 / 0.0389 = 25.25
    t_(0.005) = 2.807 < 25.25, so H₀ is rejected at the 1% level.

Hypothesis Tests about the Regression Relationship

[Three panels in which no linear relationship would be found or the linear model would fail: constant Y, unsystematic variation, and a nonlinear relationship.]

A hypothesis test for the existence of a linear relationship between X and Y:

    H₀: β₁ = 0
    H₁: β₁ ≠ 0

Test statistic for the existence of a linear relationship between X and Y:

    t_(n−2) = b₁ / s(b₁)

where b₁ is the least-squares estimate of the regression slope and s(b₁) is the standard error of b₁. When the null hypothesis is true, the statistic has a t distribution with n − 2 degrees of freedom.

Hypothesis Tests for the Regression Slope

Example 10-1:

    H₀: β₁ = 0, H₁: β₁ ≠ 0
    t = b₁ / s(b₁) = 1.25533 / 0.04972 = 25.25
    t_(0.005, 23) = 2.807 < 25.25

H₀ is rejected at the 1% level, and we may conclude that there is a relationship between charges and miles traveled.

Example 10-4:

    H₀: β₁ = 1, H₁: β₁ ≠ 1
    t = (b₁ − 1) / s(b₁) = (1.24 − 1) / 0.21 = 1.14
    t_(0.05, 58) = 1.671 > 1.14

H₀ is not rejected at the 10% level. We may not conclude that the beta coefficient is different from 1.
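The correlation t test from Example 10-1 can be sketched directly from the quoted r and n, using the critical value t(0.005, 23) = 2.807 given on the slide:

```python
import math

# Sketch of the correlation t test from Example 10-1 (r = 0.9824, n = 25).
r, n = 0.9824, 25

# Test statistic with n - 2 degrees of freedom:
t_stat = r / math.sqrt((1 - r**2) / (n - 2))

# Compare against the two-sided 1% critical value quoted on the slide:
t_crit = 2.807   # t(0.005, 23)
reject_at_1pct = abs(t_stat) > t_crit
assert reject_at_1pct
```

Minor rounding aside (the slide rounds 1 − r² to 0.0349 before dividing), the statistic comes out near 25.25, far beyond the critical value, so the null of no linear relationship is rejected.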
How Good is the Regression?

The coefficient of determination, r², is a descriptive measure of the strength of the regression relationship, a measure of how well the regression line fits the data. Each deviation of y from its mean decomposes as:

    (y − ȳ) = (y − ŷ) + (ŷ − ȳ)
    Total deviation = Unexplained deviation (error) + Explained deviation (regression)

Squaring and summing over all observations gives the sums of squares:

    Σ(y − ȳ)² = Σ(y − ŷ)² + Σ(ŷ − ȳ)²
    SST = SSE + SSR

    r² = SSR/SST = 1 − SSE/SST

r² is the percentage of the total variation explained by the regression.

The Coefficient of Determination

[Three panels illustrating r² = 0, r² = 0.50, and r² = 0.90: as SSE shrinks relative to SST, a larger share of the variation is explained by the regression.]

Example 10-1:

    r² = SSR/SST = 64527736.8 / 66855898 = 0.96518

[Scatter plot of Dollars (1000 to 7000) against Miles (1500 to 5500) with the fitted regression line.]

Analysis of Variance and an F Test of the Regression Model

    Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F Ratio
    Regression            SSR              1                    MSR           MSR/MSE
    Error                 SSE              n − 2                MSE
    Total                 SST              n − 1                MST

Example 10-1:

    Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F Ratio   p-Value
    Regression            64527736.8       1                    64527736.8    637.47    0.000
    Error                 2328161.2        23                   101224.4
    Total                 66855898.0       24

Use of the Regression Model for Prediction

• Point Prediction: a single-valued estimate of Y for a given value of X, obtained by inserting the value of X in the estimated regression equation.
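The ANOVA table entries for Example 10-1 follow mechanically from SST, SSR, and n, as this sketch shows (it uses only the sums of squares quoted on the slides):

```python
# Sketch of the ANOVA decomposition for Example 10-1: SST = SSR + SSE,
# r^2 = SSR/SST, and F = MSR/MSE with (1, n - 2) degrees of freedom.
SST = 66855898.0
SSR = 64527736.8
SSE = SST - SSR          # about 2328161.2
n = 25

r_squared = SSR / SST    # fraction of total variation explained
MSR = SSR / 1            # regression mean square (1 df)
MSE = SSE / (n - 2)      # error mean square (n - 2 df)
F = MSR / MSE

assert abs(r_squared - 0.96518) < 1e-4
assert abs(F - 637.47) < 0.5
```

Note that for simple regression the F statistic is the square of the slope t statistic (25.25² ≈ 637.5), so the two tests are equivalent.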
• Prediction Interval
  • For a single value of Y given a value of X, the interval reflects both:
    • variation in the regression line estimate
    • variation of points around the regression line
  • For the average value of Y given a value of X, the interval reflects only:
    • variation in the regression line estimate

Prediction Interval for a Value of Y

A (1 − α)·100% prediction interval for Y:

    ŷ ± t_(α/2, n−2)·s·√( 1 + 1/n + (x − x̄)²/SS_X )

Example 10-1 (X = 4,000):

    {274.85 + (1.2553)(4,000)} ± (2.069)(318.16)·√( 1 + 1/25 + (4,000 − 3,177.92)²/40,947,557.84 )
    = 5,296.05 ± 676.62 = [4,619.43, 5,972.67]

Prediction Interval for the Average Value of Y

A (1 − α)·100% confidence interval for E[Y | X]:

    ŷ ± t_(α/2, n−2)·s·√( 1/n + (x − x̄)²/SS_X )

Example 10-1 (X = 4,000):

    {274.85 + (1.2553)(4,000)} ± (2.069)(318.16)·√( 1/25 + (4,000 − 3,177.92)²/40,947,557.84 )
    = 5,296.05 ± 156.48 = [5,139.57, 5,452.53]

RINGKASAN (Summary)

Regression:
• the form of the relationship between the independent variable and the dependent variable

Correlation:
• the strength and direction of the relationship between two variables
• hypothesis tests on the regression parameters
• hypothesis tests on the correlation
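To close, the two Example 10-1 intervals at X = 4,000 can be recomputed from the summary statistics quoted on the slides. The sketch makes the key structural point explicit: the prediction interval for a single Y carries an extra "1 +" term inside the square root, so it is always wider than the interval for the mean response.

```python
import math

# Sketch of the Example 10-1 intervals at X = 4,000, using the summary
# statistics quoted on the slides (the raw data are not shown in this deck).
n = 25
x_bar = 3177.92
SS_X = 40947557.84
b0, b1 = 274.85, 1.2553
s = 318.16
t_crit = 2.069   # t(0.025, 23)

x0 = 4000.0
y_hat = b0 + b1 * x0   # point prediction, about 5296.05

# Prediction interval half-width for a single new Y at x0
# (the "1 +" term adds the scatter of individual points around the line):
half_pred = t_crit * s * math.sqrt(1 + 1/n + (x0 - x_bar)**2 / SS_X)

# Confidence interval half-width for the mean E[Y | X = x0]
# (regression line uncertainty only):
half_mean = t_crit * s * math.sqrt(1/n + (x0 - x_bar)**2 / SS_X)

# The prediction interval is necessarily wider than the interval for the mean:
assert half_pred > half_mean
```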