Systems Engineering Program
Department of Engineering Management, Information and Systems

EMIS 7370/5370 STAT 5340: PROBABILITY AND STATISTICS FOR SCIENTISTS AND ENGINEERS

Correlation and Regression Analysis – An Application

Dr. Jerrell T. Stracener, SAE Fellow

Leadership in Engineering


Montgomery, Peck, and Vining (2001) present data concerning the performance of the 28 National Football League teams in 1976. It is suspected that the number of games won (y) is related to the number of yards gained rushing by opponents (x). The data are shown in the following table:

Team               Games Won (y)   Yards Rushing by Opponent (x)
Washington         10              2205
Minnesota          11              2096
New England        11              1847
Oakland            13              1903
Pittsburgh         10              1457
Baltimore          11              1848
Los Angeles        10              1564
Dallas             11              1821
Atlanta            4               2577
Buffalo            2               2476
Chicago            7               1984
Cincinnati         10              1917
Cleveland          9               1761
Denver             9               1709
Detroit            6               1901
Green Bay          5               2288
Houston            5               2072
Kansas City        5               2861
Miami              6               2411
New Orleans        4               2289
New York Giants    3               2203
New York Jets      3               2592
Philadelphia       4               2053
St. Louis          10              1979
San Diego          6               2048
San Francisco      8               1786
Seattle            2               2876
Tampa Bay          0               2560


Correlation Analysis

• Statistical analysis used to obtain a quantitative measure of the strength of the relationship between a dependent variable and one or more independent variables

Stracener_EMIS 7370/STAT 5340_Fall 08_11.18.08


Scatter Plot

[Figure: scatter plot of the data]


Sample correlation coefficient

$$\hat{\rho} = r = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\left[\left(n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right)\left(n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right)\right]^{1/2}}$$

Notes:
$-1 \le r \le 1$
$R = r^2 \times 100\%$ = coefficient of determination

For the data:

$$r = \frac{28 \times 386{,}127 - 59{,}084 \times 195}{\left[\left(28 \times 128{,}284{,}292 - 59{,}084^2\right)\left(28 \times 1{,}685 - 195^2\right)\right]^{1/2}}$$

$$r = -0.738$$

$$R = r^2 \times 100\% = 54.47\%$$


Correlation

To test for no linear association between x and y, calculate

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

where r is the sample correlation coefficient and n is the sample size.

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{-0.738 \times \sqrt{28-2}}{\sqrt{1-(-0.738)^2}} = -5.5766$$

Conclude no linear association if

$$-t_{\alpha/2,\,n-2} \le t \le t_{\alpha/2,\,n-2}$$

and in that case treat $y_1, y_2, \ldots, y_n$ as a random sample.

Take α = 0.05. From the t-table,

$$-t_{\alpha/2,\,n-2} = -t_{0.025,\,26} = -2.0555$$

Since t = -5.5766 < -2.0555, we conclude that there is linear association between x and y and proceed with regression analysis.


Linear Regression Model

Simple linear regression model

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

where
Y is the response (or dependent) variable,
$\beta_0$ and $\beta_1$ are the unknown parameters,
$\varepsilon \sim N(0, \sigma)$,
and the data are $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.


Least squares estimates of β0 and β1

$$b_1 = \hat{\beta}_1 = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$

$$b_0 = \hat{\beta}_0 = \frac{1}{n}\left[\sum_{i=1}^{n} y_i - b_1 \sum_{i=1}^{n} x_i\right]$$

Estimate of β1:

$$b_1 = \frac{28 \times 386{,}127 - 59{,}084 \times 195}{28 \times 128{,}284{,}292 - 59{,}084^2} = -0.00703$$

Estimate of β0:

$$b_0 = \frac{1}{28}\left[195 - (-0.00703) \times 59{,}084\right] = 21.7883$$


Least squares regression equation

The point estimate of the linear model $Y = \beta_0 + \beta_1 x + \varepsilon$ is

$$\hat{Y} = 21.78825 - 0.00703x$$


Regression Fitted Line Plot

[Figure: fitted regression line plot]


Point estimate of σ²

$$\hat{\sigma}^2 = S^2 = \frac{1}{n-2}\sum_{i=1}^{n}\left(y_i - \hat{Y}_i\right)^2 = \frac{1}{n-2}\left[\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2 - \frac{b_1}{n}\left(n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i\right)\right]$$

$$= \frac{1}{n-2}\left[\sum_{i=1}^{n} y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2 - \frac{b_1}{n}\left(n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i\right)\right] = 5.726$$

Stracener_EMIS 7370/STAT 5340_Fall 08_11.17.08


Interval Estimates for y intercept (β0)

A (1 - α)100% confidence interval for β0 is $(\beta_{0L}, \beta_{0U})$, where

$$\beta_{0L} = b_0 - t_{\alpha/2,\,n-2}\, S_{b_0} \quad\text{and}\quad \beta_{0U} = b_0 + t_{\alpha/2,\,n-2}\, S_{b_0}$$

where

$$S_{b_0} = S\left[\frac{\sum_{i=1}^{n} x_i^2}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\right]^{1/2}$$

Take α = 0.05. Then for the 95% confidence interval for β0:

$$S_{b_0} = 2.3929 \times \left[\frac{128{,}284{,}292}{28 \times 128{,}284{,}292 - 59{,}084^2}\right]^{1/2} = 2.696$$

Applying $S_{b_0}$ gives the lower and upper bounds for β0:

$$\beta_{0L} = b_0 - t_{\alpha/2,\,n-2}\, S_{b_0} = 21.7883 - 2.056 \times 2.696 = 16.246$$

$$\beta_{0U} = b_0 + t_{\alpha/2,\,n-2}\, S_{b_0} = 21.7883 + 2.056 \times 2.696 = 27.33$$


Interval Estimates for slope (β1)

A (1 - α)100% confidence interval for β1 is $(\beta_{1L}, \beta_{1U})$, where

$$\beta_{1L} = b_1 - t_{\alpha/2,\,n-2}\, S_{b_1} \quad\text{and}\quad \beta_{1U} = b_1 + t_{\alpha/2,\,n-2}\, S_{b_1}$$

where

$$S_{b_1} = \frac{S}{\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right]^{1/2}}$$

For the data:

$$S_{b_1} = \frac{2.3929}{\left[128{,}284{,}292 - \frac{59{,}084^2}{28}\right]^{1/2}} = 0.00126$$

$$\beta_{1L} = b_1 - t_{\alpha/2,\,n-2}\, S_{b_1} = -0.00703 - 2.056 \times 0.00126 = -0.00961$$

$$\beta_{1U} = b_1 + t_{\alpha/2,\,n-2}\, S_{b_1} = -0.00703 + 2.056 \times 0.00126 = -0.00444$$


Confidence interval for conditional mean of Y, given x = 2205

Given x = 2205, the confidence interval for the conditional mean of Y has lower limit

$$\hat{\mu}_L(x) = \hat{Y}(x) - t_{\alpha/2,\,n-2}\,\hat{\sigma}\left[\frac{1}{n} + \frac{n\left(x - \bar{x}\right)^2}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\right]^{1/2}$$

$$\hat{\mu}_L(x) = 6.298 - 2.056 \times 2.3929 \times \left[\frac{1}{28} + \frac{28 \times (2205 - 2110.1429)^2}{28 \times 128{,}284{,}292 - 59{,}084^2}\right]^{1/2} = 5.336$$

and upper limit

$$\hat{\mu}_U(x) = \hat{Y}(x) + t_{\alpha/2,\,n-2}\,\hat{\sigma}\left[\frac{1}{n} + \frac{n\left(x - \bar{x}\right)^2}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\right]^{1/2}$$

$$\hat{\mu}_U(x) = 6.298 + 2.056 \times 2.3929 \times \left[\frac{1}{28} + \frac{28 \times (2205 - 2110.1429)^2}{28 \times 128{,}284{,}292 - 59{,}084^2}\right]^{1/2} = 7.260$$

Here $(x - \bar{x})^2 = (2205 - 2110.1429)^2 = 8{,}997.878$ is the squared deviation of the given x from the mean, not the sum $\sum (x_i - \bar{x})^2 = 3{,}608{,}611$.


Prediction interval for a single future value of Y, given x

$$\hat{Y}_L(x) = \hat{Y}(x) - t_{\alpha/2,\,n-2}\,\hat{\sigma}\left[1 + \frac{1}{n} + \frac{n\left(x - \bar{x}\right)^2}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\right]^{1/2}$$

and

$$\hat{Y}_U(x) = \hat{Y}(x) + t_{\alpha/2,\,n-2}\,\hat{\sigma}\left[1 + \frac{1}{n} + \frac{n\left(x - \bar{x}\right)^2}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\right]^{1/2}$$


Prediction interval for a single future value of Y, given x = 2000

Given x = 2000,

$$\hat{Y}(2000) = 21.7883 - 0.00703 \times 2000 = 7.738$$

$$\hat{Y}_L(x) = 7.738 - 2.056 \times 2.3929 \times \left[1 + \frac{1}{28} + \frac{28 \times (2000 - 2110.1429)^2}{28 \times 128{,}284{,}292 - 59{,}084^2}\right]^{1/2} = 2.723$$

and

$$\hat{Y}_U(x) = 7.738 + 2.056 \times 2.3929 \times \left[1 + \frac{1}{28} + \frac{28 \times (2000 - 2110.1429)^2}{28 \times 128{,}284{,}292 - 59{,}084^2}\right]^{1/2} = 12.753$$


Excel Calculation

X      Y    XY      X^2        Y^2   Y^        (Y-Y^)^2  (x-xbar)^2
2205   10   22050   4862025    100   6.297905  13.70551  8997.878
2096   11   23056   4393216    121   7.063641  15.49492  200.0204
1847   11   20317   3411409    121   8.812891  4.783447  69244.16
1903   13   24739   3621409    169   8.419485  20.98112  42908.16
1457   10   14570   2122849    100   11.55268  2.410815  426595.6
1848   11   20328   3415104    121   8.805866  4.814226  68718.88
1564   10   15640   2446096    100   10.80099  0.641591  298272
1821   11   20031   3316041    121   8.995543  4.017847  83603.59
2577   4    10308   6640929    16    3.684567  0.099498  217955.6
2476   2    4952    6130576    4     4.394103  5.731727  133851.4
1984   7    13888   3936256    49    7.850452  0.723268  15912.02
1917   10   19170   3674889    100   8.321134  2.818592  37304.16
1761   9    15849   3101121    81    9.417049  0.17393   121900.7
1709   9    15381   2920681    81    9.782355  0.612079  160915.6
1901   6    11406   3613801    36    8.433535  5.922094  43740.73
2288   5    11440   5234944    25    5.714821  0.51097   31633.16
2072   5    10360   4293184    25    7.232243  4.982909  1454.878
2861   5    14305   8185321    25    1.689439  10.95981  563786.4
2411   6    14466   5812921    36    4.850734  1.320812  90515.02
2289   4    9156    5239521    16    5.707796  2.916568  31989.88
2203   3    6609    4853209    9     6.311955  10.96905  8622.449
2592   3    7776    6718464    9     3.579191  0.335462  232186.3
2053   4    8212    4214809    16    7.36572   11.32807  3265.306
1979   10   19790   3916441    100   7.885577  4.470783  17198.45
2048   6    12288   4194304    36    7.400846  1.962368  3861.735
1786   8    14288   3189796    64    9.241422  1.541128  105068.6
2876   2    5752    8271376    4     1.584062  0.173004  586537.2
2560   0    0       6553600    0     3.803994  14.47037  202371.4
SUM    59084 (X)  195 (Y)  386127 (XY)  128284292 (X^2)  1685 (Y^2)  195 (Y^)  148.872 ((Y-Y^)^2)  3608611 ((x-xbar)^2)

Summary cells:
x-bar                      2110.1429
n*SUM(XY) - SUM(X)*SUM(Y)  -709824
n*SUM(X^2) - SUM(X)^2      101041120
n*SUM(Y^2) - SUM(Y)^2      9155
sqrt of the product        961785.6
r                          -0.738027304
S^2                        5.725845085
S                          2.392873813
b1                         -0.007025
b0                         21.788251
Sb0                        2.696233
b0l                        16.2448
b0u                        27.33171
Sb1                        0.00126
Sb1l                       -0.00961
Sb1u                       -0.00444
Y(2205)                    6.2979048
Y(2000)                    7.7380503
mu-l                       5.336
mu-u                       7.260
y-l                        2.723
y-u                        12.753
Unlabeled cells            34.54949, 14.0723


Excel Regression Analysis Output

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.738027
R Square            0.544684
Adjusted R Square   0.527172
Standard Error      2.392874
Observations        28

ANOVA
             df   SS         MS         F          Significance F
Regression   1    178.0923   178.0923   31.10324   7.381E-06
Residual     26   148.872    5.725845
Total        27   326.9643

               Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%    Lower 95.0%   Upper 95.0%
Intercept      21.78825       2.696233         8.080996   1.46E-08   16.246064   27.3304377   16.2460641    27.33044
X Variable 1   -0.00703       0.00126          -5.57703   7.38E-06   -0.009614   -0.0044359   -0.0096143    -0.00444

RESIDUAL OUTPUT

Observation   Predicted Y   Residuals
1             6.297905      3.702095
2             7.063641      3.936359
3             8.812891      2.187109
4             8.419485      4.580515
5             11.55268      -1.55268
6             8.805866      2.194134
7             10.80099      -0.80099
8             8.995543      2.004457
9             3.684567      0.315433
10            4.394103      -2.3941
11            7.850452      -0.85045
12            8.321134      1.678866
13            9.417049      -0.41705
14            9.782355      -0.78235
15            8.433535      -2.43354
16            5.714821      -0.71482
17            7.232243      -2.23224
18            1.689439      3.310561
19            4.850734      1.149266
20            5.707796      -1.7078
21            6.311955      -3.31195
22            3.579191      -0.57919
23            7.36572       -3.36572
24            7.885577      2.114423
25            7.400846      -1.40085
26            9.241422      -1.24142
27            1.584062      0.415938
28            3.803994      -3.80399
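
The correlation, least squares, and interval formulas in these slides can be cross-checked with a short script. What follows is a minimal sketch in Python using only the standard library, not part of the original course material. The names `fit`, `interval`, `model`, and `T_CRIT` are illustrative assumptions, and the critical value 2.056 is taken from the t-table value used in the slides rather than computed.

```python
import math

# (yards rushing by opponent x, games won y) for the 28 teams, in table order
data = [
    (2205, 10), (2096, 11), (1847, 11), (1903, 13), (1457, 10), (1848, 11),
    (1564, 10), (1821, 11), (2577, 4), (2476, 2), (1984, 7), (1917, 10),
    (1761, 9), (1709, 9), (1901, 6), (2288, 5), (2072, 5), (2861, 5),
    (2411, 6), (2289, 4), (2203, 3), (2592, 3), (2053, 4), (1979, 10),
    (2048, 6), (1786, 8), (2876, 2), (2560, 0),
]

T_CRIT = 2.056  # t_{0.025, 26} as used in the slides (assumed, not computed)


def fit(points):
    """Correlation, t statistic, and least squares estimates from raw sums."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    syy = sum(y * y for _, y in points)
    num = n * sxy - sx * sy      # n*sum(xy) - sum(x)*sum(y)
    dx = n * sxx - sx ** 2       # n*sum(x^2) - (sum x)^2
    dy = n * syy - sy ** 2       # n*sum(y^2) - (sum y)^2
    r = num / math.sqrt(dx * dy)
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    b1 = num / dx
    b0 = (sy - b1 * sx) / n
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in points)
    s = math.sqrt(sse / (n - 2))              # point estimate of sigma
    sb0 = s * math.sqrt(sxx / dx)             # standard error of b0
    sb1 = s / math.sqrt(sxx - sx ** 2 / n)    # standard error of b1
    return {"n": n, "xbar": sx / n, "dx": dx, "r": r, "t": t,
            "b0": b0, "b1": b1, "s": s, "sb0": sb0, "sb1": sb1}


def interval(m, x, predict=False):
    """Interval for the conditional mean at x, or for a single future
    value of Y at x when predict=True."""
    yhat = m["b0"] + m["b1"] * x
    h = 1 / m["n"] + m["n"] * (x - m["xbar"]) ** 2 / m["dx"]
    if predict:
        h += 1
    half = T_CRIT * m["s"] * math.sqrt(h)
    return yhat - half, yhat + half


model = fit(data)
print("r, t:", model["r"], model["t"])
print("b0, b1:", model["b0"], model["b1"])
print("S, Sb0, Sb1:", model["s"], model["sb0"], model["sb1"])
print("mean interval at x=2205:", interval(model, 2205))
print("prediction interval at x=2000:", interval(model, 2000, predict=True))
```

The printed values of r, b0, b1, S, Sb0, and Sb1 should agree with the Excel regression output to the precision shown there. In `interval`, the term n(x - x-bar)^2 uses the squared deviation of the single given x from the sample mean, as in the interval formulas on the slides.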