Session 21: Simple Linear Regression and Correlation, Part 1

Course  : A0064 / Statistik Ekonomi (Economic Statistics)
Year    : 2005
Version : 1/1
Learning Outcomes
By the end of this session, students are expected to be able to:
• relate two variables using simple linear regression and correlation analysis.
Outline
• The Simple Linear Regression Model
• Estimation: The Least Squares Method
• Correlation
10 Simple Linear Regression and Correlation
(Source: Aczel & Sounderpandian, Complete Business Statistics, 5th edition, McGraw-Hill/Irwin, © The McGraw-Hill Companies, Inc., 2002)
• Using Statistics
• The Simple Linear Regression Model
• Estimation: The Method of Least Squares
• Error Variance and the Standard Errors of Regression Estimators
• Correlation
• Hypothesis Tests about the Regression Relationship
• How Good is the Regression?
• Analysis of Variance Table and an F Test of the Regression Model
• Residual Analysis and Checking for Model Inadequacies
• Use of the Regression Model for Prediction
• Summary and Review of Terms
10-1 Using Statistics
This scatterplot locates pairs of observations of advertising expenditures on the x-axis and sales on the y-axis. We notice that:

[Figure: Scatterplot of Advertising Expenditures (X) and Sales (Y).]

• Larger (smaller) values of sales tend to be associated with larger (smaller) values of advertising.
• The scatter of points tends to be distributed around a positively sloped straight line.
• The pairs of values of advertising expenditures and sales are not located exactly on a straight line.
• The scatter plot reveals a more or less strong tendency rather than a precise linear relationship.
• The line represents the nature of the relationship on average.
Examples of Other Scatterplots
[Figure: six example scatterplots showing different possible patterns of association between X and Y.]
Model Building
The inexact nature of the relationship between advertising and sales suggests that a statistical model might be useful in analyzing the relationship. A statistical model separates the systematic component of a relationship from the random component.

Statistical model:  Data = Systematic component + Random errors

In ANOVA, the systematic component is the variation of means between samples or treatments (SSTR) and the random component is the unexplained variation (SSE). In regression, the systematic component is the overall linear relationship, and the random component is the variation around the line.
10-2 The Simple Linear Regression
Model
The population simple linear regression model:

Y = β₀ + β₁X + ε

where β₀ + β₁X is the nonrandom or systematic component and ε is the random component, and:

• Y is the dependent variable, the variable we wish to explain or predict;
• X is the independent variable, also called the predictor variable;
• ε is the error term, the only random component in the model, and thus the only source of randomness in Y;
• β₀ is the intercept of the systematic component of the regression relationship;
• β₁ is the slope of the systematic component.

The conditional mean of Y:  E[Y|X] = β₀ + β₁X
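To make the roles of the systematic and random components concrete, here is a minimal Python sketch (an addition to these notes, not part of the original slides) that simulates observations from the population model Y = β₀ + β₁X + ε. The parameter values, sample size, and X range are illustrative assumptions chosen only to mimic the scale of Example 10-1 later in this session.

```python
# Illustrative simulation of the population model Y = beta0 + beta1*X + epsilon.
# The parameter values below are assumptions for illustration, not textbook data.
import numpy as np

rng = np.random.default_rng(seed=1)

beta0, beta1, sigma = 274.85, 1.26, 318.0      # assumed "true" parameters
x = np.linspace(1200, 5500, 25)                # X treated as fixed, not random
epsilon = rng.normal(0.0, sigma, size=x.size)  # errors ~ N(0, sigma^2), uncorrelated
y = beta0 + beta1 * x + epsilon                # systematic component + random error

print(np.column_stack([x, y])[:3])             # first few simulated (X, Y) pairs
```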
Picturing the Simple Linear
Regression Model
[Figure: regression plot of the line E[Y] = β₀ + β₁X, showing the intercept β₀, the slope β₁ (the rise over a run of 1), and an observed point Yᵢ lying a distance εᵢ (the error) from the line at X = Xᵢ.]

The simple linear regression model gives an exact linear relationship between the expected or average value of Y, the dependent variable, and X, the independent or predictor variable:

E[Yᵢ] = β₀ + β₁Xᵢ

Actual observed values of Y differ from the expected value by an unexplained or random error:

Yᵢ = E[Yᵢ] + εᵢ = β₀ + β₁Xᵢ + εᵢ
Assumptions of the Simple Linear
Regression Model
• The relationship between X and Y is a straight-line relationship.
• The values of the independent variable X are assumed fixed (not random); the only randomness in the values of Y comes from the error term εᵢ.
• The errors εᵢ are normally distributed with mean 0 and variance σ², and are uncorrelated (not related) in successive observations. That is: ε ~ N(0, σ²).

[Figure: identical normal distributions of errors, all centered on the regression line E[Y] = β₀ + β₁X.]
10-3 Estimation: The Method of Least
Squares
Estimation of a simple linear regression relationship involves finding estimated or predicted values of the intercept and slope of the linear regression line.

The estimated regression equation:

Y = b₀ + b₁X + e

where b₀ estimates the intercept of the population regression line, β₀; b₁ estimates the slope of the population regression line, β₁; and e stands for the observed errors, the residuals from fitting the estimated regression line b₀ + b₁X to a set of n points.

The estimated regression line:

Ŷ = b₀ + b₁X

where Ŷ (Y-hat) is the value of Y lying on the fitted regression line for a given value of X.
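As a quick cross-check of the estimated regression equation above, the sketch below (an addition to these notes, not the book's spreadsheet template) fits a line to a small made-up data set with numpy.polyfit and forms the fitted values Ŷ and residuals e. The data values are purely illustrative.

```python
# Cross-check of the estimated regression equation Y-hat = b0 + b1*X using
# numpy.polyfit on a small illustrative data set (values are made up).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

b1, b0 = np.polyfit(x, y, deg=1)   # degree-1 fit returns (slope, intercept)
y_hat = b0 + b1 * x                # points on the estimated regression line
e = y - y_hat                      # residuals: observed minus fitted

print(b0, b1)
print(e, e.sum())                  # residuals sum to ~0 for a least-squares fit
```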
Fitting a Regression Line
[Figure: four panels showing (1) the data, (2) three errors from an arbitrary fitted line, (3) three errors from the least squares regression line, and (4) that errors from the least squares regression line are minimized.]
Errors in Regression
[Figure: the fitted regression line Ŷ = b₀ + b₁X with one observed data point above it.]

• Yᵢ is the observed data point.
• Ŷᵢ is the predicted value of Y for Xᵢ, i.e. the point on the fitted line at Xᵢ.
• The error is eᵢ = Yᵢ - Ŷᵢ.
Least Squares Regression
The sum of squared errors in regression is:

SSE = Σeᵢ² = Σ(yᵢ - ŷᵢ)²   (sums over i = 1, …, n)

The least squares regression line is the one that minimizes the SSE with respect to the estimates b₀ and b₁.

The normal equations:

Σy  = n·b₀ + b₁·Σx

Σxy = b₀·Σx + b₁·Σx²

[Figure: the SSE surface as a function of b₀ and b₁; at the least-squares values of b₀ and b₁, SSE is minimized.]
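The two normal equations can also be solved directly as a 2x2 linear system. The sketch below is an illustrative addition to these notes; it is algebraically equivalent to the closed-form estimators b₁ = SS_XY/SS_X and b₀ = ȳ - b₁x̄ given on the next slide.

```python
# Solving the normal equations as a 2x2 linear system:
#   sum(y)   = n*b0      + b1*sum(x)
#   sum(x*y) = b0*sum(x) + b1*sum(x^2)
import numpy as np

def solve_normal_equations(x, y):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    A = np.array([[n,       x.sum()],
                  [x.sum(), (x ** 2).sum()]])
    rhs = np.array([y.sum(), (x * y).sum()])
    b0, b1 = np.linalg.solve(A, rhs)   # least-squares intercept and slope
    return b0, b1
```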
Sums of Squares, Cross Products, and
Least Squares Estimators
Sums of squares and cross products:

SS_X  = Σ(x - x̄)²        = Σx²  - (Σx)²/n

SS_Y  = Σ(y - ȳ)²        = Σy²  - (Σy)²/n

SS_XY = Σ(x - x̄)(y - ȳ)  = Σxy - (Σx)(Σy)/n

Least-squares regression estimators:

b₁ = SS_XY / SS_X

b₀ = ȳ - b₁x̄
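These formulas translate directly into code. The small helper below is an addition to these notes (the function name is my own); the formulas are exactly those on this slide.

```python
# Sums of squares, cross products, and least-squares estimators.
import numpy as np

def least_squares_estimates(x, y):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    ss_x = (x ** 2).sum() - x.sum() ** 2 / n        # SS_X
    ss_y = (y ** 2).sum() - y.sum() ** 2 / n        # SS_Y
    ss_xy = (x * y).sum() - x.sum() * y.sum() / n   # SS_XY
    b1 = ss_xy / ss_x                               # slope estimate
    b0 = y.mean() - b1 * x.mean()                   # intercept estimate
    return b0, b1, ss_x, ss_y, ss_xy
```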
Example 10-1
 Miles    Dollars        Miles²    Miles × Dollars
  1211       1802     1,466,521          2,182,222
  1345       2405     1,809,025          3,234,725
  1422       2005     2,022,084          2,851,110
  1687       2511     2,845,969          4,236,057
  1849       2332     3,418,801          4,311,868
  2026       2305     4,104,676          4,669,930
  2133       3016     4,549,689          6,433,128
  2253       3385     5,076,009          7,626,405
  2400       3090     5,760,000          7,416,000
  2468       3694     6,091,024          9,116,792
  2699       3371     7,284,601          9,098,329
  2806       3998     7,873,636         11,218,388
  3082       3555     9,498,724         10,956,510
  3209       4692    10,297,681         15,056,628
  3466       4244    12,013,156         14,709,704
  3643       5298    13,271,449         19,300,614
  3852       4801    14,837,904         18,493,452
  4033       5147    16,265,089         20,757,851
  4267       5738    18,207,289         24,484,046
  4498       6420    20,232,004         28,877,160
  4533       6059    20,548,089         27,465,447
  4804       6426    23,078,416         30,870,504
  5090       6321    25,908,100         32,173,890
  5233       7026    27,384,289         36,767,058
  5439       6964    29,582,721         37,877,196
79,448    106,605   293,426,946        390,185,014

SS_X  = Σx² - (Σx)²/n = 293,426,946 - (79,448)²/25 = 40,947,557.84

SS_XY = Σxy - (Σx)(Σy)/n = 390,185,014 - (79,448)(106,605)/25 = 51,402,852.4

b₁ = SS_XY / SS_X = 51,402,852.4 / 40,947,557.84 = 1.255333776 ≈ 1.26

b₀ = ȳ - b₁x̄ = 106,605/25 - (1.255333776)(79,448/25) = 274.85
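To verify the hand computation, the sketch below (an addition to these notes) re-enters the miles and dollars data from the table above and reproduces SS_X, SS_XY, b₁, and b₀.

```python
# Reproducing the Example 10-1 computation from the raw miles/dollars data.
import numpy as np

miles = np.array([1211, 1345, 1422, 1687, 1849, 2026, 2133, 2253, 2400, 2468,
                  2699, 2806, 3082, 3209, 3466, 3643, 3852, 4033, 4267, 4498,
                  4533, 4804, 5090, 5233, 5439], dtype=float)
dollars = np.array([1802, 2405, 2005, 2511, 2332, 2305, 3016, 3385, 3090, 3694,
                    3371, 3998, 3555, 4692, 4244, 5298, 4801, 5147, 5738, 6420,
                    6059, 6426, 6321, 7026, 6964], dtype=float)

n = miles.size
ss_x = (miles ** 2).sum() - miles.sum() ** 2 / n
ss_xy = (miles * dollars).sum() - miles.sum() * dollars.sum() / n
b1 = ss_xy / ss_x                            # ~1.2553
b0 = dollars.mean() - b1 * miles.mean()      # ~274.85

print(f"SS_X  = {ss_x:,.2f}")                # ~40,947,557.84
print(f"SS_XY = {ss_xy:,.2f}")               # ~51,402,852.40
print(f"b1 = {b1:.4f}, b0 = {b0:.2f}")
```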
Template (partial output) that can be
used to carry out a Simple Regression
Template (continued) that can be used to
carry out a Simple Regression
Template (continued) that can be used to
carry out a Simple Regression
Residual Analysis. The plot shows the absence of a relationship
between the residuals and the X-values (miles).
Template (continued) that can be used to
carry out a Simple Regression
Note: The normal probability plot is approximately linear. This
would indicate that the normality assumption for the errors has not
been violated.
Total Variance and Error Variance
[Figure: two panels contrasting what you see when looking at the total variation of Y with what you see when looking along the regression line at the error variance of Y.]
10-4 Error Variance and the Standard
Errors of Regression Estimators
Degrees of freedom in regression:

df = n - 2   (n total observations, less one degree of freedom for each parameter estimated, b₀ and b₁)

SSE = Σ(Y - Ŷ)² = SS_Y - (SS_XY)²/SS_X = SS_Y - b₁·SS_XY

(Square and sum all regression errors to find SSE.)

An unbiased estimator of σ², denoted by s²:

MSE = SSE / (n - 2),   s = √MSE

Example 10-1:

SSE = SS_Y - b₁·SS_XY = 66,855,898 - (1.255333776)(51,402,852.4) = 2,328,161.2

MSE = SSE / (n - 2) = 2,328,161.2 / 23 = 101,224.4

s = √MSE = √101,224.4 = 318.158
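The same quantities can be checked numerically. The sketch below (an addition to these notes) plugs the sums of squares quoted above for Example 10-1 into the SSE, MSE, and s formulas.

```python
# SSE, MSE, and s for Example 10-1 from the quoted sums of squares.
import numpy as np

n = 25
ss_y, ss_xy, ss_x = 66_855_898.0, 51_402_852.4, 40_947_557.84

b1 = ss_xy / ss_x
sse = ss_y - b1 * ss_xy        # ~2,328,161.2
mse = sse / (n - 2)            # ~101,224.4
s = np.sqrt(mse)               # ~318.158

print(sse, mse, s)
```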
Standard Errors of Estimates
in Regression
The standard error of b₀ (intercept):

s(b₀) = s · √( Σx² / (n·SS_X) ),   where s = √MSE

The standard error of b₁ (slope):

s(b₁) = s / √SS_X

Example 10-1:

s(b₀) = 318.158 · √( 293,426,946 / ((25)(40,947,557.84)) ) = 170.338

s(b₁) = 318.158 / √40,947,557.84 = 0.04972
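A quick numerical check of these standard errors for Example 10-1 (an addition to these notes, using the values quoted above):

```python
# Standard errors of b0 and b1 for Example 10-1.
import numpy as np

s, sum_x2, ss_x, n = 318.158, 293_426_946.0, 40_947_557.84, 25

se_b0 = s * np.sqrt(sum_x2 / (n * ss_x))   # ~170.34
se_b1 = s / np.sqrt(ss_x)                  # ~0.0497

print(se_b0, se_b1)
```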
Confidence Intervals for the Regression
Parameters
A (1 - α)100% confidence interval for β₀:   b₀ ± t(α/2, n-2) · s(b₀)

A (1 - α)100% confidence interval for β₁:   b₁ ± t(α/2, n-2) · s(b₁)

Example 10-1, 95% confidence intervals:

b₀ ± t(0.025, 25-2) · s(b₀) = 274.85 ± (2.069)(170.338) = 274.85 ± 352.43 = [-77.58, 627.28]

b₁ ± t(0.025, 25-2) · s(b₁) = 1.25533 ± (2.069)(0.04972) = 1.25533 ± 0.10287 = [1.15246, 1.35820]

The least-squares point estimate of the slope is b₁ = 1.25533. Since 0 lies outside the interval [1.15246, 1.35820], zero is not a possible value of the regression slope at the 95% level.

[Figure: the slope shown as the height of the regression line over a horizontal run of length 1.]
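These intervals can be reproduced with the t critical value for n - 2 = 23 degrees of freedom. The sketch below is an addition to these notes and uses scipy.stats.t.ppf for the critical value.

```python
# 95% confidence intervals for beta0 and beta1 in Example 10-1.
from scipy import stats

n = 25
b0, se_b0 = 274.85, 170.338
b1, se_b1 = 1.25533, 0.04972

t_crit = stats.t.ppf(0.975, df=n - 2)                 # ~2.069

ci_b0 = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)    # ~(-77.6, 627.3)
ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)    # ~(1.152, 1.358)

print(ci_b0)
print(ci_b1)                                          # 0 is not in this interval
```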
Template (partial output) that can be used to obtain Confidence Intervals for β₀ and β₁
10-5 Correlation
The correlation between two random variables, X and Y, is a measure of the degree of linear association between the two variables.

The population correlation, denoted by ρ, can take on any value from -1 to 1:

ρ = -1       indicates a perfect negative linear relationship
-1 < ρ < 0   indicates a negative linear relationship
ρ = 0        indicates no linear relationship
0 < ρ < 1    indicates a positive linear relationship
ρ = 1        indicates a perfect positive linear relationship

The absolute value of ρ indicates the strength or exactness of the relationship.
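Although this slide defines the population correlation ρ, a sample estimate can already be computed for Example 10-1 from the sums of squares found earlier, using the usual estimator r = SS_XY / √(SS_X · SS_Y). The sketch below is an addition to these notes (this estimator is treated more fully in the next session).

```python
# Sample correlation for Example 10-1 from the sums of squares.
import numpy as np

ss_x, ss_y, ss_xy = 40_947_557.84, 66_855_898.0, 51_402_852.4
r = ss_xy / np.sqrt(ss_x * ss_y)
print(r)   # ~0.98: a strong positive linear association between miles and dollars
```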
Illustrations of Correlation
[Figure: six scatterplots illustrating ρ = -1, ρ = -0.8, ρ = 0 (two different patterns), ρ = 0.8, and ρ = 1.]
Closing
• The discussion of the material continues in Topic 22 (Simple Linear Regression and Correlation, Part 2).