
Course: D0722 - Statistika dan Aplikasinya (Statistics and Its Applications)
Year: 2010
Regression and Correlation
Session 10
Learning Outcomes

By the end of this session, students are expected to be able to:
1. relate two variables using simple linear regression and correlation analysis
2. demonstrate the relationship between variables based on hypothesis-test results
COMPLETE BUSINESS STATISTICS, 5th edition
The Simple Linear Regression Model

The population simple linear regression model:

    Y = β0 + β1X + ε

where β0 + β1X is the nonrandom or systematic component and ε is the random component, and:

• Y is the dependent variable, the variable we wish to explain or predict
• X is the independent variable, also called the predictor variable
• ε is the error term, the only random component in the model, and thus the only source of randomness in Y
• β0 is the intercept of the systematic component of the regression relationship
• β1 is the slope of the systematic component

The conditional mean of Y:  E[Y | X] = β0 + β1X
McGraw-Hill/Irwin · Aczel/Sounderpandian · © The McGraw-Hill Companies, Inc., 2002
Picturing the Simple Linear Regression Model

[Regression plot: the line E[Y] = β0 + β1X with intercept β0 and slope β1; an observed point Yi at Xi lies off the line by the error εi.]

The simple linear regression model gives an exact linear relationship between the expected or average value of Y, the dependent variable, and X, the independent or predictor variable:

    E[Yi] = β0 + β1Xi

Actual observed values of Y differ from the expected value by an unexplained or random error:

    Yi = E[Yi] + εi = β0 + β1Xi + εi
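As a numerical sketch of the model (the coefficient values, error spread, and sample size below are illustrative assumptions, not from the text), simulating many draws of Y at a fixed x shows the sample average settling near the conditional mean E[Y | X = x] = β0 + β1x:

```python
import random

random.seed(0)
beta0, beta1, sigma = 5.0, 2.0, 1.0   # illustrative population parameters

def draw_y(x):
    """One realization of Y = beta0 + beta1*x + eps, with eps ~ N(0, sigma^2)."""
    return beta0 + beta1 * x + random.gauss(0.0, sigma)

x = 3.0
samples = [draw_y(x) for _ in range(100_000)]
mean_y = sum(samples) / len(samples)
print(mean_y)   # close to E[Y | X = 3] = beta0 + beta1*3 = 11.0
```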
10-3 Estimation: The Method of Least Squares

Estimation of a simple linear regression relationship involves finding estimated or predicted values of the intercept and slope of the linear regression line.

The estimated regression equation:

    Y = b0 + b1X + e

where b0 estimates the intercept of the population regression line, β0; b1 estimates the slope of the population regression line, β1; and e stands for the observed errors, the residuals from fitting the estimated regression line b0 + b1X to a set of n points.

The estimated regression line:

    Ŷ = b0 + b1X

where Ŷ (Y-hat) is the value of Y lying on the fitted regression line for a given value of X.
Fitting a Regression Line

[Four panels: the data; three errors from a fitted line; three errors from the least squares regression line; errors from the least squares regression line are minimized.]
Errors in Regression

[Scatter plot of the data around the fitted regression line Ŷ = b0 + b1X. For the observed data point (Xi, Yi), Ŷi is the predicted value of Y for Xi, and the error is the vertical distance ei = Yi − Ŷi between the point and the line.]
Least Squares Regression

The sum of squared errors in regression is:

    SSE = Σ ei² = Σ (yi − ŷi)²,  summed over i = 1, …, n

The least squares regression line is the one that minimizes the SSE with respect to the estimates b0 and b1.

The normal equations:

    Σ yi   = n·b0 + b1 Σ xi          (least squares b0)
    Σ xiyi = b0 Σ xi + b1 Σ xi²      (least squares b1)

At the solution of these equations, SSE is minimized with respect to both b0 and b1.
Sums of Squares, Cross Products, and Least Squares Estimators

Sums of squares and cross products:

    SSx  = Σ(x − x̄)²        = Σx² − (Σx)²/n
    SSy  = Σ(y − ȳ)²        = Σy² − (Σy)²/n
    SSxy = Σ(x − x̄)(y − ȳ)  = Σxy − (Σx)(Σy)/n

Least-squares regression estimators:

    b1 = SSxy / SSx
    b0 = ȳ − b1x̄
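The estimator formulas above can be sketched directly (the x/y data here are made up for illustration; they are not the Example 10-1 data):

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(xs)

sum_x, sum_y = sum(xs), sum(ys)
ss_x  = sum(x * x for x in xs) - sum_x ** 2 / n                  # SSx
ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n   # SSxy

b1 = ss_xy / ss_x                  # slope: SSxy / SSx
b0 = sum_y / n - b1 * sum_x / n    # intercept: y-bar - b1 * x-bar
print(b0, b1)
```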
Error Variance and the Standard Errors of Regression Estimators

Degrees of freedom in regression:

    df = n − 2  (n total observations, less one degree of freedom for each parameter estimated, b0 and b1)

Square and sum all regression errors to find SSE:

    SSE = Σ(Y − Ŷ)² = SSy − (SSxy)²/SSx = SSy − b1·SSxy

An unbiased estimator of σ², denoted by s², is the mean square error:

    MSE = SSE / (n − 2),    s = √MSE

Example 10-1:

    SSE = SSy − b1·SSxy = 66,855,898 − (1.255333776)(51,402,852.4) = 2,328,161.2
    MSE = SSE / (n − 2) = 2,328,161.2 / 23 = 101,224.4
    s = √MSE = √101,224.4 = 318.158
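The Example 10-1 arithmetic above can be checked line by line (inputs taken from the slide):

```python
import math

ss_y  = 66_855_898.0       # SSy from Example 10-1
ss_xy = 51_402_852.4       # SSxy
b1    = 1.255333776        # least-squares slope
n     = 25

sse = ss_y - b1 * ss_xy    # SSE = SSy - b1*SSxy
mse = sse / (n - 2)        # unbiased estimate of the error variance
s   = math.sqrt(mse)       # standard error of the regression
print(round(sse, 1), round(mse, 1), round(s, 3))
```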
Standard Errors of Estimates in Regression

The standard error of b0 (intercept):

    s(b0) = s·√(Σx² / (n·SSx)),  where s = √MSE

The standard error of b1 (slope):

    s(b1) = s / √SSx

Example 10-1:

    s(b0) = 318.158·√(293,426,944 / ((25)(40,947,557.84))) = 170.338
    s(b1) = 318.158 / √40,947,557.84 = 0.04972
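The standard-error formulas with the Example 10-1 inputs:

```python
import math

s      = 318.158            # standard error of the regression
sum_x2 = 293_426_944.0      # sum of squared x values
ss_x   = 40_947_557.84      # SSx
n      = 25

se_b0 = s * math.sqrt(sum_x2 / (n * ss_x))   # s(b0), intercept
se_b1 = s / math.sqrt(ss_x)                  # s(b1), slope
print(round(se_b0, 3), round(se_b1, 5))
```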
Confidence Intervals for the Regression Parameters

A (1 − α)100% confidence interval for β0:

    b0 ± t(α/2, n−2)·s(b0)

A (1 − α)100% confidence interval for β1:

    b1 ± t(α/2, n−2)·s(b1)

Example 10-1, 95% confidence intervals (least-squares point estimates b0 = 274.85, b1 = 1.25533):

    b0 ± t(0.025, 23)·s(b0) = 274.85 ± (2.069)(170.338) = 274.85 ± 352.43 = [−77.58, 627.28]
    b1 ± t(0.025, 23)·s(b1) = 1.25533 ± (2.069)(0.04972) = 1.25533 ± 0.10287 = [1.15246, 1.35820]

The interval for β1 excludes 0, so 0 is not a possible value of the regression slope at the 95% level.
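The interval arithmetic above, using the slides' point estimates, standard errors, and table value t(0.025, 23) = 2.069:

```python
b0, se_b0 = 274.85, 170.338
b1, se_b1 = 1.25533, 0.04972
t_crit = 2.069                       # t(0.025, 23) from the t table

ci_b0 = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)
ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
print(ci_b0)   # the intercept interval contains 0
print(ci_b1)   # the slope interval excludes 0
```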
Correlation

The correlation between two random variables, X and Y, is a measure of the degree of linear association between the two variables.

The population correlation, denoted by ρ, can take on any value from −1 to 1:

    ρ = −1        indicates a perfect negative linear relationship
    −1 < ρ < 0    indicates a negative linear relationship
    ρ = 0         indicates no linear relationship
    0 < ρ < 1     indicates a positive linear relationship
    ρ = 1         indicates a perfect positive linear relationship

The absolute value of ρ indicates the strength or exactness of the relationship.
Illustrations of Correlation

[Six scatter plots of Y against X illustrating ρ = −1, ρ = −0.8, ρ = 0 (two panels), ρ = 0.8, and ρ = 1.]
Covariance and Correlation

The covariance of two random variables X and Y:

    Cov(X, Y) = E[(X − μX)(Y − μY)]

where μX and μY are the population means of X and Y, respectively.

The population correlation coefficient:

    ρ = Cov(X, Y) / (σX·σY)

The sample correlation coefficient*:

    r = SSxy / √(SSx·SSy)

*Note: if ρ < 0 then b1 < 0; if ρ = 0 then b1 = 0; if ρ > 0 then b1 > 0.

Example 10-1:

    r = SSxy / √(SSx·SSy) = 51,402,852.4 / √((40,947,557.84)(66,855,898)) = 51,402,852.4 / 52,321,943.29 = 0.9824
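The sample-correlation computation from the sums of squares (values from Example 10-1):

```python
import math

ss_x  = 40_947_557.84
ss_y  = 66_855_898.0
ss_xy = 51_402_852.4

r = ss_xy / math.sqrt(ss_x * ss_y)   # r = SSxy / sqrt(SSx * SSy)
print(round(r, 4))
```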
Hypothesis Tests for the Correlation Coefficient

    H0: ρ = 0   (no linear relationship)
    H1: ρ ≠ 0   (some linear relationship)

Test statistic:

    t(n−2) = r / √((1 − r²)/(n − 2))

Example 10-1:

    t(n−2) = 0.9824 / √((1 − 0.9651)/(25 − 2)) = 0.9824 / 0.0389 = 25.25

Since t(0.005, 23) = 2.807 < 25.25, H0 is rejected at the 1% level.
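The test statistic with Example 10-1's r and n (the result differs from the slide's 25.25 only through intermediate rounding):

```python
import math

r, n = 0.9824, 25

t = r / math.sqrt((1 - r * r) / (n - 2))
print(round(t, 2))   # about 25.2; the slide's rounding gives 25.25
```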
Hypothesis Tests about the Regression Relationship

[Three panels showing cases in which β1 = 0: constant Y, unsystematic variation, and a nonlinear relationship.]

A hypothesis test for the existence of a linear relationship between X and Y:

    H0: β1 = 0
    H1: β1 ≠ 0

Test statistic for the existence of a linear relationship between X and Y:

    t(n−2) = b1 / s(b1)

where b1 is the least-squares estimate of the regression slope and s(b1) is the standard error of b1. When the null hypothesis is true, the statistic has a t distribution with n − 2 degrees of freedom.
Hypothesis Tests for the Regression Slope

Example 10-1:

    H0: β1 = 0
    H1: β1 ≠ 0

    t(n−2) = b1 / s(b1) = 1.25533 / 0.04972 = 25.25

Since t(0.005, 23) = 2.807 < 25.25, H0 is rejected at the 1% level, and we may conclude that there is a relationship between charges and miles traveled.

Example 10-4:

    H0: β1 = 1
    H1: β1 ≠ 1

    t(n−2) = (b1 − 1) / s(b1) = (1.24 − 1) / 0.21 = 1.14

Since t(0.05, 58) = 1.671 > 1.14, H0 is not rejected at the 10% level. We may not conclude that the beta coefficient is different from 1.
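Both slope tests above reduce to one line each (estimates and standard errors from the slides):

```python
# Example 10-1: H0: beta1 = 0
t1 = 1.25533 / 0.04972
# Example 10-4: H0: beta1 = 1, so the hypothesized value is subtracted first
t2 = (1.24 - 1.0) / 0.21
print(round(t1, 2), round(t2, 2))
```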
How Good is the Regression?

The coefficient of determination, r², is a descriptive measure of the strength of the regression relationship, a measure of how well the regression line fits the data.

The total deviation of an observed y from the mean splits into an unexplained part (error) and an explained part (regression):

    (y − ȳ) = (y − ŷ) + (ŷ − ȳ)
    Total deviation = Unexplained deviation (error) + Explained deviation (regression)

Squaring and summing over all observations:

    Σ(y − ȳ)² = Σ(y − ŷ)² + Σ(ŷ − ȳ)²
    SST = SSE + SSR

    r² = SSR/SST = 1 − SSE/SST

r² is the percentage of total variation explained by the regression.
The Coefficient of Determination

[Three scatter plots illustrating the split of SST into SSE and SSR for r² = 0, r² = 0.50, and r² = 0.90, and the Example 10-1 scatter plot of Dollars (1000 to 7000) against Miles (1000 to 5500).]

Example 10-1:

    r² = SSR/SST = 64,527,736.8 / 66,855,898 = 0.96518
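The r² computation from Example 10-1's sums of squares; up to rounding it also equals the square of the sample correlation 0.9824:

```python
ssr = 64_527_736.8     # explained (regression) sum of squares
sst = 66_855_898.0     # total sum of squares
sse = sst - ssr        # unexplained (error) sum of squares

r2 = ssr / sst
print(round(r2, 5))
```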
Analysis of Variance and an F Test of the Regression Model

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F Ratio
Regression            SSR              1                    MSR           MSR/MSE
Error                 SSE              n − 2                MSE
Total                 SST              n − 1                MST

Example 10-1:

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square    F Ratio   p-Value
Regression            64,527,736.8     1                    64,527,736.8   637.47    0.000
Error                 2,328,161.2      23                   101,224.4
Total                 66,855,898.0     24
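The F ratio in the Example 10-1 table follows from the sums of squares and degrees of freedom:

```python
ssr, sse = 64_527_736.8, 2_328_161.2
df_reg, df_err = 1, 23

msr = ssr / df_reg     # mean square for regression
mse = sse / df_err     # mean square error
f = msr / mse          # F ratio
print(round(f, 2))
```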
Use of the Regression Model for Prediction

• Point prediction: a single-valued estimate of Y for a given value of X, obtained by inserting the value of X into the estimated regression equation.
• Prediction interval for a value of Y given a value of X, which accounts for:
  • variation in the regression line estimate
  • variation of points around the regression line
• Prediction interval for the average value of Y given a value of X, which accounts only for:
  • variation in the regression line estimate
Prediction Interval for a Value of Y

A (1 − α)100% prediction interval for Y:

    ŷ ± t(α/2, n−2)·s·√(1 + 1/n + (x − x̄)²/SSx)

Example 10-1 (X = 4,000):

    {274.85 + (1.2553)(4,000)} ± (2.069)(318.16)·√(1 + 1/25 + (4,000 − 3,177.92)²/40,947,557.84)
    = 5,296.05 ± 676.62 = [4,619.43, 5,972.67]
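The prediction-interval arithmetic at X = 4,000 (all inputs from the slides):

```python
import math

b0, b1 = 274.85, 1.2553
s, n   = 318.16, 25
x_bar  = 3_177.92
ss_x   = 40_947_557.84
t_crit = 2.069          # t(0.025, 23)
x = 4_000.0

y_hat = b0 + b1 * x
half  = t_crit * s * math.sqrt(1 + 1 / n + (x - x_bar) ** 2 / ss_x)
print(round(y_hat - half, 2), round(y_hat + half, 2))
```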
Prediction Interval for the Average Value of Y

A (1 − α)100% prediction interval for E[Y | X]:

    ŷ ± t(α/2, n−2)·s·√(1/n + (x − x̄)²/SSx)

Example 10-1 (X = 4,000):

    {274.85 + (1.2553)(4,000)} ± (2.069)(318.16)·√(1/25 + (4,000 − 3,177.92)²/40,947,557.84)
    = 5,296.05 ± 156.48 = [5,139.57, 5,452.53]
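The same arithmetic for the narrower mean-response interval; only the leading "1 +" inside the square root drops out:

```python
import math

b0, b1 = 274.85, 1.2553
s, n   = 318.16, 25
x_bar  = 3_177.92
ss_x   = 40_947_557.84
t_crit = 2.069          # t(0.025, 23)
x = 4_000.0

y_hat = b0 + b1 * x
half  = t_crit * s * math.sqrt(1 / n + (x - x_bar) ** 2 / ss_x)
print(round(y_hat - half, 2), round(y_hat + half, 2))
```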
SUMMARY

Regression:
- the form of the relationship between the independent variable and the dependent variable

Correlation:
- the strength and direction of the relationship between two variables
- hypothesis tests on the regression parameters
- hypothesis tests on the correlation