Session 22: Simple Linear Regression and Correlation, Part 2

Course: A0064 / Economic Statistics
Year: 2005
Version: 1/1
Learning Outcomes
By the end of this session, students are expected to be able to:
• Draw conclusions from the results of a simple linear regression model for forecasting and decision making
Outline
• Hypothesis Tests about the Regression Relationship
• Coefficient of Determination
• Using the Regression Model for Forecasting
COMPLETE BUSINESS STATISTICS, 5th edition
Covariance and Correlation
The covariance of two random variables X and Y:
$$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$$
where $\mu_X$ and $\mu_Y$ are the population means of X and Y respectively.
The population correlation coefficient:
$$\rho = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$$
The sample correlation coefficient*:
$$r = \frac{SS_{XY}}{\sqrt{SS_X \, SS_Y}}$$
Example 10-1:
$$r = \frac{SS_{XY}}{\sqrt{SS_X \, SS_Y}} = \frac{51402852.4}{\sqrt{(40947557.84)(66855898)}} = \frac{51402852.4}{52321943.29} = 0.9824$$
*Note: If $\rho < 0$, $\beta_1 < 0$. If $\rho = 0$, $\beta_1 = 0$. If $\rho > 0$, $\beta_1 > 0$.
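The arithmetic in Example 10-1 can be checked with a short Python sketch. The function name `sample_correlation` is an illustrative assumption, not something taken from the textbook; the three sums of squares are the values quoted in the example.

```python
import math


def sample_correlation(ss_xy: float, ss_x: float, ss_y: float) -> float:
    """Sample correlation coefficient r = SS_XY / sqrt(SS_X * SS_Y)."""
    return ss_xy / math.sqrt(ss_x * ss_y)


# Sums of squares quoted in Example 10-1.
r = sample_correlation(ss_xy=51402852.4, ss_x=40947557.84, ss_y=66855898.0)
print(round(r, 4))  # prints 0.9824
```

Because r carries the sign of SS_XY, it also carries the sign of the fitted slope, which matches the note on the slide.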
McGraw-Hill/Irwin
Aczel/Sounderpandian
© The McGraw-Hill Companies, Inc., 2002
Hypothesis Tests for the
Correlation Coefficient
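$H_0: \rho = 0$ (No linear relationship)
$H_1: \rho \neq 0$ (Some linear relationship)
Test statistic:
$$t_{(n-2)} = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$
Example 10-1:
$$t_{(n-2)} = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{0.9824}{\sqrt{\dfrac{1 - 0.9651}{25 - 2}}} = \frac{0.9824}{0.0389} = 25.25$$
$t_{0.005} = 2.807 < 25.25$, so $H_0$ is rejected at the 1% level.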
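As a rough check of the test above, the following Python sketch evaluates the statistic for Example 10-1. The function name `correlation_t_statistic` is a hypothetical helper, and the critical value 2.807 is simply copied from the slide rather than computed. Starting from the rounded r = 0.9824 gives a value of about 25.2, slightly below the slide's 25.25, which presumably comes from unrounded intermediate figures.

```python
import math


def correlation_t_statistic(r: float, n: int) -> float:
    """t statistic with n - 2 degrees of freedom for H0: rho = 0."""
    return r / math.sqrt((1.0 - r * r) / (n - 2))


# Example 10-1: r = 0.9824 from a sample of n = 25 observations.
t_stat = correlation_t_statistic(r=0.9824, n=25)

# Two-tailed critical value t_0.005 with 23 degrees of freedom, as quoted on the slide.
t_critical = 2.807
reject_h0 = abs(t_stat) > t_critical

print(f"t = {t_stat:.2f}, reject H0 at the 1% level: {reject_h0}")
```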
10-6 Hypothesis Tests about the
Regression Relationship
[Figure: three panels plotting Y against X: Constant Y, Unsystematic Variation, Nonlinear Relationship]
A hypothesis test for the existence of a linear relationship between X and Y:
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
Test statistic for the existence of a linear relationship between X and Y:
$$t_{(n-2)} = \frac{b_1}{s(b_1)}$$
where $b_1$ is the least-squares estimate of the regression slope and $s(b_1)$ is the standard error of $b_1$.
When the null hypothesis is true, the statistic has a t distribution with n - 2 degrees of freedom.
Hypothesis Tests for the Regression
Slope
Example 10-4:
$H_0: \beta_1 = 1$
$H_1: \beta_1 \neq 1$
$$t_{(n-2)} = \frac{b_1 - 1}{s(b_1)} = \frac{1.24 - 1}{0.21} = 1.14$$
$t_{(0.05,\,58)} = 1.671 > 1.14$
$H_0$ is not rejected at the 10% level. We may not conclude that the beta coefficient is different from 1.
Example 10-1:
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
$$t_{(n-2)} = \frac{b_1}{s(b_1)} = \frac{1.25533}{0.04972} = 25.25$$
$t_{(0.005,\,23)} = 2.807 < 25.25$
$H_0$ is rejected at the 1% level and we may conclude that there is a relationship between charges and miles traveled.
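Both slope tests use the same statistic, only with a different hypothesized slope. A minimal Python sketch, assuming a hypothetical helper `slope_t_statistic` and taking the critical values directly from the slide instead of computing them:

```python
def slope_t_statistic(b1: float, se_b1: float, beta1_null: float = 0.0) -> float:
    """t statistic with n - 2 degrees of freedom for H0: beta_1 = beta1_null."""
    return (b1 - beta1_null) / se_b1


# Example 10-1: H0: beta_1 = 0, critical value t_(0.005, 23) = 2.807.
t_10_1 = slope_t_statistic(b1=1.25533, se_b1=0.04972)
reject_10_1 = abs(t_10_1) > 2.807

# Example 10-4: H0: beta_1 = 1, critical value t_(0.05, 58) = 1.671.
t_10_4 = slope_t_statistic(b1=1.24, se_b1=0.21, beta1_null=1.0)
reject_10_4 = abs(t_10_4) > 1.671

print(round(t_10_1, 2), reject_10_1)  # prints 25.25 True
print(round(t_10_4, 2), reject_10_4)  # prints 1.14 False
```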
10-7 How Good is the Regression?
The coefficient of determination, $r^2$, is a descriptive measure of the strength of the regression relationship, a measure of how well the regression line fits the data.
$$(y - \bar{y}) = (y - \hat{y}) + (\hat{y} - \bar{y})$$
Total Deviation = Unexplained Deviation (Error) + Explained Deviation (Regression)
$$\sum (y - \bar{y})^2 = \sum (y - \hat{y})^2 + \sum (\hat{y} - \bar{y})^2$$
SST = SSE + SSR
$$r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$
$r^2$ is the percentage of total variation explained by the regression.
[Figure: Y plotted against X, with the total deviation of one data point split into unexplained deviation and explained deviation]
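The decomposition SST = SSE + SSR can be verified numerically. The sketch below is only an illustration: the five data points are made up for the purpose (they are not from Example 10-1), and the helper names `fit_line` and `sums_of_squares` are assumptions.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for a simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sums_of_squares(xs, ys):
    """Return (SST, SSE, SSR) for the least-squares line through the data."""
    b0, b1 = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    fitted = [b0 + b1 * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)               # total deviation
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # unexplained (error)
    ssr = sum((f - y_bar) ** 2 for f in fitted)           # explained (regression)
    return sst, sse, ssr


# Made-up illustrative data, not taken from the textbook.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

sst, sse, ssr = sums_of_squares(xs, ys)
r_squared = ssr / sst
print(f"SST = {sst:.4f}, SSE + SSR = {sse + ssr:.4f}, r^2 = {r_squared:.4f}")
```

The identity holds only for the least-squares line; for any other line through the data, the cross-product term does not vanish and SSE + SSR differs from SST.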
The Coefficient of Determination
[Figure: three panels plotting Y against X and showing the split of SST into SSE and SSR for $r^2 = 0$, $r^2 = 0.50$, and $r^2 = 0.90$]
Example 10-1:
$$r^2 = \frac{SSR}{SST} = \frac{64527736.8}{66855898} = 0.96518$$
[Figure: scatterplot for Example 10-1, Dollars (vertical axis) against Miles (horizontal axis)]
10-8 Analysis of Variance and an F Test
of the Regression Model
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F Ratio
Regression            SSR              1                    MSR           MSR/MSE
Error                 SSE              n-2                  MSE
Total                 SST              n-1                  MST

Example 10-1
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F Ratio   p Value
Regression            64527736.8       1                    64527736.8    637.47    0.000
Error                 2328161.2        23                   101224.4
Total                 66855898.0       24
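The mean squares and the F ratio in the Example 10-1 table follow directly from the sums of squares and degrees of freedom. A small Python sketch, with `anova_f_ratio` as a hypothetical helper name:

```python
def anova_f_ratio(ssr: float, sse: float, n: int):
    """Return (MSR, MSE, F) for a simple linear regression on n observations."""
    msr = ssr / 1          # regression has 1 degree of freedom
    mse = sse / (n - 2)    # error has n - 2 degrees of freedom
    return msr, mse, msr / mse


# Sums of squares from the Example 10-1 table, n = 25.
msr, mse, f_ratio = anova_f_ratio(ssr=64527736.8, sse=2328161.2, n=25)
print(round(mse, 1), round(f_ratio, 2))  # prints 101224.4 637.47
```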
Template (partial output) that displays
Analysis of Variance and an F Test of the
Regression Model
10-9 Residual Analysis and Checking
for Model Inadequacies
[Figure: four residual plots, three against x or ŷ and one against time]
• Homoscedasticity: Residuals appear completely random. No indication of model inadequacy.
• Heteroscedasticity: Variance of residuals changes when x changes.
• Residuals exhibit a linear trend with time.
• Curved pattern in residuals resulting from underlying nonlinear relationship.
Normal Probability Plot of the Residuals
Flatter than Normal
Normal Probability Plot of the Residuals
More Peaked than Normal
Normal Probability Plot of the Residuals
More Positively Skewed than Normal
Normal Probability Plot of the Residuals
More Negatively Skewed than Normal
10-10 Use of the Regression Model for
Prediction
• Point Prediction
  A single-valued estimate of Y for a given value of X, obtained by inserting the value of X in the estimated regression equation.
• Prediction Interval
  For a value of Y given a value of X:
    • Variation in regression line estimate
    • Variation of points around regression line
  For an average value of Y given a value of X:
    • Variation in regression line estimate
Errors in Predicting E[Y|X]
[Figure: two panels plotting Y against X]
1) Uncertainty about the slope of the regression line: the regression line shown with an upper and a lower limit on slope.
2) Uncertainty about the intercept of the regression line: the regression line shown with an upper and a lower limit on intercept.
Prediction Interval for E[Y|X]
[Figure: Y against X, showing the regression line and the prediction band for E[Y|X]]
• The prediction band for E[Y|X] is narrowest at the mean value of X.
• The prediction band widens as the distance from the mean of X increases.
• Predictions become very unreliable when we extrapolate beyond the range of the sample itself.
Additional Error in Predicting
Individual Value of Y
[Figure: two panels plotting Y against X. The first shows 3) variation around the regression line; the second shows the regression line, the prediction band for E[Y|X], and the prediction band for Y]
Prediction Interval for a Value of Y
A $(1 - \alpha)100\%$ prediction interval for Y:
$$\hat{y} \pm t_{\alpha/2} \, s \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{SS_X}}$$
Example 10-1 (X = 4,000):
$$\{274.85 + (1.2553)(4{,}000)\} \pm 2.069 \times 318.16 \sqrt{1 + \frac{1}{25} + \frac{(4{,}000 - 3{,}177.92)^2}{40{,}947{,}557.84}}$$
$$= 5296.05 \pm 676.62 = [4619.43, 5972.67]$$
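The interval for an individual value of Y at X = 4,000 can be reproduced in a few lines of Python. This is a sketch under stated assumptions: `individual_prediction_interval` is a hypothetical helper name, and the inputs (intercept 274.85, slope 1.2553, t value 2.069, s = 318.16, n = 25, mean of X 3,177.92, SS_X 40,947,557.84) are the figures quoted in the example.

```python
import math


def individual_prediction_interval(b0, b1, x, t_crit, s, n, x_bar, ss_x):
    """Prediction interval for a single value of Y at the given x."""
    y_hat = b0 + b1 * x  # point prediction from the estimated regression line
    half_width = t_crit * s * math.sqrt(1.0 + 1.0 / n + (x - x_bar) ** 2 / ss_x)
    return y_hat, half_width, (y_hat - half_width, y_hat + half_width)


# Example 10-1 at X = 4,000, using the values quoted on the slide.
y_hat, half_width, (lower, upper) = individual_prediction_interval(
    b0=274.85, b1=1.2553, x=4000.0,
    t_crit=2.069, s=318.16, n=25, x_bar=3177.92, ss_x=40947557.84,
)
print(f"{y_hat:.2f} +/- {half_width:.2f} = [{lower:.2f}, {upper:.2f}]")
```

The result agrees with the slide's 5296.05 ± 676.62 to within rounding in the last digit.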
Prediction Interval for the Average Value
of Y
A $(1 - \alpha)100\%$ prediction interval for E[Y|X]:
$$\hat{y} \pm t_{\alpha/2} \, s \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{SS_X}}$$
Example 10-1 (X = 4,000):
$$\{274.85 + (1.2553)(4{,}000)\} \pm 2.069 \times 318.16 \sqrt{\frac{1}{25} + \frac{(4{,}000 - 3{,}177.92)^2}{40{,}947{,}557.84}}$$
$$= 5296.05 \pm 156.48 = [5139.57, 5452.53]$$
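The same example values can be used to check the interval for the average value of Y and to see how its width depends on the distance of x from the mean of X. In this Python sketch, `mean_response_half_width` is a hypothetical helper, and X = 6,000 is an arbitrary extra point chosen only to illustrate the widening; it is not part of the textbook example.

```python
import math


def mean_response_half_width(x, t_crit, s, n, x_bar, ss_x):
    """Half-width of the interval for E[Y|X] at the given x."""
    return t_crit * s * math.sqrt(1.0 / n + (x - x_bar) ** 2 / ss_x)


# Values quoted in Example 10-1.
T_CRIT, S, N, X_BAR, SS_X = 2.069, 318.16, 25, 3177.92, 40947557.84

y_hat = 274.85 + 1.2553 * 4000.0
half_at_4000 = mean_response_half_width(4000.0, T_CRIT, S, N, X_BAR, SS_X)
interval = (y_hat - half_at_4000, y_hat + half_at_4000)
print(f"{y_hat:.2f} +/- {half_at_4000:.2f} = [{interval[0]:.2f}, {interval[1]:.2f}]")

# The half-width is smallest at the mean of X and grows with distance from it.
half_at_mean = mean_response_half_width(X_BAR, T_CRIT, S, N, X_BAR, SS_X)
half_at_6000 = mean_response_half_width(6000.0, T_CRIT, S, N, X_BAR, SS_X)
print(f"{half_at_mean:.2f} < {half_at_4000:.2f} < {half_at_6000:.2f}")
```

Dropping the leading 1 under the square root is what makes this interval narrower than the one for an individual value of Y: only the uncertainty in the estimated line remains, not the scatter of points around it.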
Template Output with Prediction
Intervals
10-11 The Solver Method for Regression
The solver macro available in EXCEL can also be used to conduct a
simple linear regression. See the text for instructions.
Closing
• Simple linear regression and correlation are essentially an approach/model for finding a (linear) cause-and-effect relationship between two variables, namely the independent variable (the influencing variable) and the dependent variable (the influenced variable), which can then be used for forecasting or estimation