Simple Linear Regression Continued

PROBABILITY AND STATISTICS FOR SCIENTISTS AND ENGINEERS
Correlation and Regression Analysis – An Application
Jerrell T. Stracener, Ph.D.
Montgomery, Peck, and Vining (2001) present data on the
performance of the 28 National Football League teams in
1976. It is suspected that the number of games won (y) is
related to the number of yards gained rushing by
opponents (x). The data are shown in the following table:
Team               Games Won (y)   Yards Rushing by Opponent (x)
Washington         10              2205
Minnesota          11              2096
New England        11              1847
Oakland            13              1903
Pittsburgh         10              1457
Baltimore          11              1848
Los Angeles        10              1564
Dallas             11              1821
Atlanta            4               2577
Buffalo            2               2476
Chicago            7               1984
Cincinnati         10              1917
Cleveland          9               1761
Denver             9               1709
Detroit            6               1901
Green Bay          5               2288
Houston            5               2072
Kansas City        5               2861
Miami              6               2411
New Orleans        4               2289
New York Giants    3               2203
New York Jets      3               2592
Philadelphia       4               2053
St. Louis          10              1979
San Diego          6               2048
San Francisco      8               1786
Seattle            2               2876
Tampa Bay          0               2560
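The calculations on the following slides rely on a handful of sums over the 28 teams. As a minimal sketch (not part of the original slides), the Python below transcribes the table and computes those sums; the names `nfl_1976`, `sum_xy`, and so on are assumptions chosen for this illustration.

```python
# Data transcribed from the table: (team, games won y, opponents' rushing yards x).
nfl_1976 = [
    ("Washington", 10, 2205), ("Minnesota", 11, 2096),
    ("New England", 11, 1847), ("Oakland", 13, 1903),
    ("Pittsburgh", 10, 1457), ("Baltimore", 11, 1848),
    ("Los Angeles", 10, 1564), ("Dallas", 11, 1821),
    ("Atlanta", 4, 2577), ("Buffalo", 2, 2476),
    ("Chicago", 7, 1984), ("Cincinnati", 10, 1917),
    ("Cleveland", 9, 1761), ("Denver", 9, 1709),
    ("Detroit", 6, 1901), ("Green Bay", 5, 2288),
    ("Houston", 5, 2072), ("Kansas City", 5, 2861),
    ("Miami", 6, 2411), ("New Orleans", 4, 2289),
    ("New York Giants", 3, 2203), ("New York Jets", 3, 2592),
    ("Philadelphia", 4, 2053), ("St. Louis", 10, 1979),
    ("San Diego", 6, 2048), ("San Francisco", 8, 1786),
    ("Seattle", 2, 2876), ("Tampa Bay", 0, 2560),
]

y = [row[1] for row in nfl_1976]  # games won
x = [row[2] for row in nfl_1976]  # yards rushing by opponent

n = len(nfl_1976)
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi * xi for xi in x)
sum_y2 = sum(yi * yi for yi in y)

print(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2)
# prints: 28 59084 195 386127 128284292 1685
```

These are the same totals that appear in the numerical substitutions for r, b1, and the interval estimates.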
Correlation Analysis
Statistical analysis used to obtain a quantitative measure of
the strength of the relationship between a dependent variable
and one or more independent variables.
Scatter Plot
Sample correlation coefficient
$$
\hat{\rho} = r = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}
{\left\{\left[n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right]\left[n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right]\right\}^{1/2}}
$$
Note: $-1 \le r \le 1$
For the NFL data (n = 28):
$$
r = \frac{28(386{,}127) - (59{,}084)(195)}
{\left\{\left[28(128{,}284{,}292) - (59{,}084)^2\right]\left[28(1{,}685) - (195)^2\right]\right\}^{1/2}}
$$
$$
r = -0.738
$$
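The substitution for r can be reproduced from the summary sums alone. The following is a minimal sketch, assuming only the totals quoted on the slide; the variable names are illustrative.

```python
import math

# Summary sums quoted on the slide (n = 28 teams).
n = 28
sum_x, sum_y = 59_084, 195
sum_xy = 386_127
sum_x2, sum_y2 = 128_284_292, 1_685

# Numerator and denominator of the computational formula for r.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
r = numerator / denominator

print(round(r, 3))  # -0.738
```

The negative sign indicates that teams whose opponents gained more rushing yards tended to win fewer games.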
Correlation
To test for no linear association between x and y, calculate
$$
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}
$$
where r is the sample correlation coefficient and n is the
sample size. For the NFL data,
$$
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{-0.738\sqrt{28-2}}{\sqrt{1-(-0.738)^2}} = -5.5766
$$
Correlation
Conclude that there is no linear association if
$$
-t_{\alpha/2,\,n-2} < t < t_{\alpha/2,\,n-2}
$$
and in that case treat $y_1, y_2, \ldots, y_n$ as a random sample.
Correlation
Taking α = 0.05, the t-table gives
$$
-t_{\alpha/2,\,n-2} = -t_{0.025,\,26} = -2.0555
$$
Since t = -5.5766 < -2.0555, we conclude that there is a linear
association between x and y. Therefore, proceed with
regression analysis.
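The test of no linear association can be sketched in a few lines. This example assumes the rounded r = -0.738 and the table value t(0.025, 26) = 2.0555 quoted on the slides; `reject_no_association` is a name introduced here for illustration.

```python
import math

r = -0.738        # sample correlation coefficient from the slides
n = 28            # sample size
t_crit = 2.0555   # t_{0.025, 26}, the table value quoted on the slides

# Test statistic for H0: no linear association between x and y.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)

# Two-sided decision rule: reject H0 when t falls outside (-t_crit, t_crit).
reject_no_association = abs(t_stat) > t_crit

print(f"t = {t_stat:.4f}, reject H0: {reject_no_association}")
```

The critical value is hard-coded rather than computed so that the sketch needs only the standard library.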
Linear Regression Model
Simple linear regression model
$$
Y = \beta_0 + \beta_1 X + \varepsilon
$$
where
Y is the response (or dependent) variable
$\beta_0$ and $\beta_1$ are the unknown parameters
$\varepsilon \sim N(0, \sigma)$ and data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
Least squares estimates of $\beta_0$ and $\beta_1$
$$
b_1 = \hat{\beta}_1 = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}
{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}
$$
$$
b_0 = \hat{\beta}_0 = \frac{1}{n}\left(\sum_{i=1}^{n} y_i - b_1\sum_{i=1}^{n} x_i\right)
$$
Estimate of $\beta_1$
$$
b_1 = \hat{\beta}_1 = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}
{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}
$$
$$
b_1 = \frac{28(386{,}127) - (59{,}084)(195)}{28(128{,}284{,}292) - (59{,}084)^2}
$$
$$
b_1 = -0.00703
$$
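Estimate of $\beta_0$
$$
b_0 = \frac{1}{n}\left(\sum_{i=1}^{n} y_i - b_1\sum_{i=1}^{n} x_i\right)
$$
$$
b_0 = \frac{1}{28}\left[195 - (-0.00703)(59{,}084)\right]
$$
$$
b_0 = 21.7883
$$
The value 21.7883 is obtained with the unrounded slope, $b_1 \approx -0.0070251$.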
Least squares regression equation
Point estimate of the linear model
$$
Y = \beta_0 + \beta_1 x + \varepsilon
$$
is
$$
\hat{Y} = 21.78825 - 0.00703x
$$
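The least squares estimates follow directly from the summary sums. Below is a minimal sketch under that assumption; `predict` is a hypothetical helper added for illustration and does not appear in the slides.

```python
# Summary sums quoted on the slides (n = 28 teams).
n = 28
sum_x, sum_y = 59_084, 195
sum_xy = 386_127
sum_x2 = 128_284_292

# Slope and intercept from the computational formulas.
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
b0 = (sum_y - b1 * sum_x) / n

def predict(x):
    """Fitted number of games won when opponents rush for x yards."""
    return b0 + b1 * x

print(f"b1 = {b1:.5f}, b0 = {b0:.4f}")
print(f"fitted value at x = 2000: {predict(2000):.3f}")
```

Carrying the unrounded slope through to the intercept reproduces b0 = 21.7883; substituting the rounded -0.00703 by hand gives a slightly larger intercept.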
Regression Fitted Line Plot
Point estimate of $\sigma^2$
$$
\begin{aligned}
\hat{\sigma}^2 = S^2 &= \frac{1}{n-2}\sum_{i=1}^{n}\left(y_i - \hat{Y}_i\right)^2 \\
&= \frac{1}{n-2}\left\{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2 - \frac{b_1}{n}\left[n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)\right]\right\} \\
&= \frac{1}{n-2}\left\{\sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n} - \frac{b_1}{n}\left[n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)\right]\right\} \\
&= 5.726
\end{aligned}
$$
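The value 5.726 can be checked with the last form of the formula, which needs only the summary sums. This is a minimal sketch; the names `syy`, `sse`, `s2`, and `s` are assumptions introduced for the example.

```python
import math

# Summary sums quoted on the slides (n = 28 teams).
n = 28
sum_x, sum_y = 59_084, 195
sum_xy, sum_y2 = 386_127, 1_685
sum_x2 = 128_284_292

# Least squares slope (unrounded).
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)

# Total corrected sum of squares of y.
syy = sum_y2 - sum_y**2 / n

# Residual sum of squares, then the variance estimate on n - 2 degrees of freedom.
sse = syy - (b1 / n) * (n * sum_xy - sum_x * sum_y)
s2 = sse / (n - 2)
s = math.sqrt(s2)

print(f"S^2 = {s2:.3f}, S = {s:.4f}")
```

The square root, S of about 2.3929, is the value substituted into the interval estimates that follow.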
Interval Estimates for y intercept ($\beta_0$)
A $(1 - \alpha)\cdot 100\%$ confidence interval for $\beta_0$ is $(\beta_{0L}, \beta_{0U})$, where
$$
\beta_{0L} = b_0 - t_{\alpha/2,\,n-2}\, S_{b_0}
$$
and
$$
\beta_{0U} = b_0 + t_{\alpha/2,\,n-2}\, S_{b_0}
$$
where
$$
S_{b_0} = S\left[\frac{\sum_{i=1}^{n} x_i^2}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\right]^{1/2}
$$
Interval Estimates for y intercept ($\beta_0$)
Take α = 0.05 for a 95% confidence interval for $\beta_0$. The standard error of $b_0$ is
$$
S_{b_0} = S\left[\frac{\sum_{i=1}^{n} x_i^2}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\right]^{1/2}
= 2.3929\left[\frac{128{,}284{,}292}{28(128{,}284{,}292) - (59{,}084)^2}\right]^{1/2}
= 2.696
$$
Interval Estimates for y intercept ($\beta_0$)
Substituting $S_{b_0}$ into the equations gives the lower and upper
bounds for $\beta_0$:
$$
\beta_{0L} = b_0 - t_{\alpha/2,\,n-2}\, S_{b_0} = 21.7883 - 2.056 \times 2.696 = 16.246
$$
$$
\beta_{0U} = b_0 + t_{\alpha/2,\,n-2}\, S_{b_0} = 21.7883 + 2.056 \times 2.696 = 27.33
$$
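The interval for the intercept can be sketched as follows. The example assumes the slide values b0 = 21.7883, S = 2.3929, and t(0.025, 26) = 2.056; the variable names are illustrative.

```python
import math

# Values quoted on the slides.
n = 28
sum_x = 59_084
sum_x2 = 128_284_292
b0 = 21.7883      # intercept estimate
s = 2.3929        # square root of the variance estimate S^2
t_crit = 2.056    # t_{0.025, 26} as rounded on the slides

# Standard error of the intercept.
s_b0 = s * math.sqrt(sum_x2 / (n * sum_x2 - sum_x**2))

# 95% confidence limits for beta_0.
lower = b0 - t_crit * s_b0
upper = b0 + t_crit * s_b0

print(f"S_b0 = {s_b0:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

Small differences in the third decimal place relative to the slide come from rounding S_b0 to 2.696 before multiplying.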
Interval Estimates for slope ($\beta_1$)
A $(1 - \alpha)\cdot 100\%$ confidence interval for $\beta_1$ is $(\beta_{1L}, \beta_{1U})$, where
$$
\beta_{1L} = b_1 - t_{\alpha/2,\,n-2}\, S_{b_1}
$$
and
$$
\beta_{1U} = b_1 + t_{\alpha/2,\,n-2}\, S_{b_1}
$$
where
$$
S_{b_1} = \frac{S}{\left[\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]^{1/2}}
$$
Interval Estimates for slope ($\beta_1$)
$$
S_{b_1} = \frac{S}{\left[\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]^{1/2}}
= \frac{2.3929}{\left[128{,}284{,}292 - \dfrac{(59{,}084)^2}{28}\right]^{1/2}}
= 0.00126
$$
$$
\beta_{1L} = b_1 - t_{\alpha/2,\,n-2}\, S_{b_1} = -0.00703 - 2.056 \times 0.00126 = -0.00961
$$
$$
\beta_{1U} = b_1 + t_{\alpha/2,\,n-2}\, S_{b_1} = -0.00703 + 2.056 \times 0.00126 = -0.00444
$$
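A corresponding sketch for the slope interval, again assuming the slide values (b1 = -0.00703, S = 2.3929, t = 2.056) and illustrative variable names:

```python
import math

# Values quoted on the slides.
n = 28
sum_x = 59_084
sum_x2 = 128_284_292
b1 = -0.00703     # slope estimate
s = 2.3929        # square root of the variance estimate S^2
t_crit = 2.056    # t_{0.025, 26} as rounded on the slides

# Standard error of the slope.
s_b1 = s / math.sqrt(sum_x2 - sum_x**2 / n)

# 95% confidence limits for beta_1.
lower = b1 - t_crit * s_b1
upper = b1 + t_crit * s_b1

print(f"S_b1 = {s_b1:.5f}, interval = ({lower:.5f}, {upper:.5f})")
```

Both limits are negative, so the interval excludes zero, which agrees with the earlier t-test conclusion that x and y are linearly associated.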
Confidence interval for conditional mean of Y,
given x = 2205
Given x = 2205, the confidence limits for the conditional mean of Y are
$$
\hat{\mu}_L(x) = \hat{Y}(x) - t_{\alpha/2,\,n-2}\,\hat{\sigma}\left[\frac{1}{n} + \frac{n\left(x - \bar{x}\right)^2}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\right]^{1/2}
$$
$$
\hat{\mu}_L(2205) = 6.298 - 2.056 \times 2.3929 \times \left[\frac{1}{28} + \frac{28 \times 8997.878}{28(128{,}284{,}292) - (59{,}084)^2}\right]^{1/2}
$$
$$
\hat{\mu}_L(2205) = 5.3254
$$
and
$$
\hat{\mu}_U(x) = \hat{Y}(x) + t_{\alpha/2,\,n-2}\,\hat{\sigma}\left[\frac{1}{n} + \frac{n\left(x - \bar{x}\right)^2}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\right]^{1/2}
$$
$$
\hat{\mu}_U(2205) = 6.298 + 2.056 \times 2.3929 \times \left[\frac{1}{28} + \frac{28 \times 8997.878}{28(128{,}284{,}292) - (59{,}084)^2}\right]^{1/2}
$$
$$
\hat{\mu}_U(2205) = 7.248
$$
Note: the bounds 5.3254 and 7.248 are centered on 6.287, the fitted value obtained with the rounded slope -0.00703. Centered on 6.298, the same half-width (about 0.962) gives approximately 5.336 and 7.260.
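The confidence limits for the conditional mean can be sketched as a small function. This example assumes the slide values S = 2.3929 and t = 2.056 and recomputes the fitted line from the summary sums with the unrounded slope; `mean_response_ci` is a hypothetical name. With the unrounded slope it gives roughly (5.34, 7.26), slightly wider apart from the slide's bounds only through the center of the interval.

```python
import math

# Summary sums and constants quoted on the slides.
n = 28
sum_x, sum_y = 59_084, 195
sum_xy = 386_127
sum_x2 = 128_284_292
s = 2.3929        # square root of the variance estimate S^2
t_crit = 2.056    # t_{0.025, 26} as rounded on the slides

# Fitted line, keeping the slope unrounded.
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
b0 = (sum_y - b1 * sum_x) / n

def mean_response_ci(x0):
    """95% confidence limits for the mean of Y at x = x0."""
    x_bar = sum_x / n
    y_hat = b0 + b1 * x0
    half_width = t_crit * s * math.sqrt(
        1 / n + n * (x0 - x_bar) ** 2 / (n * sum_x2 - sum_x**2)
    )
    return y_hat - half_width, y_hat + half_width

lower, upper = mean_response_ci(2205)
print(f"({lower:.3f}, {upper:.3f})")
```

The interval is narrowest at x equal to the sample mean of x (about 2110 yards) and widens as x moves away from it.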
Prediction interval for a single future value of Y,
given x
$$
\hat{Y}_L(x) = \hat{Y}(x) - t_{\alpha/2,\,n-2}\,\hat{\sigma}\left[1 + \frac{1}{n} + \frac{n\left(x - \bar{x}\right)^2}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\right]^{1/2}
$$
and
$$
\hat{Y}_U(x) = \hat{Y}(x) + t_{\alpha/2,\,n-2}\,\hat{\sigma}\left[1 + \frac{1}{n} + \frac{n\left(x - \bar{x}\right)^2}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\right]^{1/2}
$$
Prediction interval for a single future value of Y,
given x = 2000
Given x = 2000,
$$
\hat{Y}(2000) = 21.7883 - 0.00703 \times 2000 = 7.738
$$
(7.738 is the value obtained with the unrounded slope, $b_1 \approx -0.0070251$.)
$$
\hat{Y}_L(x) = \hat{Y}(x) - t_{\alpha/2,\,n-2}\,\hat{\sigma}\left[1 + \frac{1}{n} + \frac{n\left(x - \bar{x}\right)^2}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\right]^{1/2}
$$
$$
\hat{Y}_L(2000) = 7.738 - 2.056 \times 2.3929 \times \left[1 + \frac{1}{28} + \frac{28 \times 12130.82}{28(128{,}284{,}292) - (59{,}084)^2}\right]^{1/2}
$$
$$
\hat{Y}_L(2000) = 2.723
$$
and
$$
\hat{Y}_U(x) = \hat{Y}(x) + t_{\alpha/2,\,n-2}\,\hat{\sigma}\left[1 + \frac{1}{n} + \frac{n\left(x - \bar{x}\right)^2}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\right]^{1/2}
$$
$$
\hat{Y}_U(2000) = 7.738 + 2.056 \times 2.3929 \times \left[1 + \frac{1}{28} + \frac{28 \times 12130.82}{28(128{,}284{,}292) - (59{,}084)^2}\right]^{1/2}
$$
$$
\hat{Y}_U(2000) = 12.75
$$
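The prediction limits at x = 2000 can be reproduced with a short function. This is a minimal sketch assuming the slide values S = 2.3929 and t = 2.056, with the fitted line recomputed from the summary sums; `prediction_interval` is a hypothetical name introduced here.

```python
import math

# Summary sums and constants quoted on the slides.
n = 28
sum_x, sum_y = 59_084, 195
sum_xy = 386_127
sum_x2 = 128_284_292
s = 2.3929        # square root of the variance estimate S^2
t_crit = 2.056    # t_{0.025, 26} as rounded on the slides

# Fitted line, keeping the slope unrounded.
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
b0 = (sum_y - b1 * sum_x) / n

def prediction_interval(x0):
    """95% prediction limits for a single future Y at x = x0."""
    x_bar = sum_x / n
    y_hat = b0 + b1 * x0
    # The leading 1 accounts for the variance of the new observation itself.
    half_width = t_crit * s * math.sqrt(
        1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sum_x2 - sum_x**2)
    )
    return y_hat - half_width, y_hat + half_width

lower, upper = prediction_interval(2000)
print(f"({lower:.3f}, {upper:.3f})")
```

The extra 1 inside the square root is the only difference from the confidence interval for the conditional mean, and it is why the prediction interval is much wider.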
Prediction Interval