Matakuliah
Tahun
: A0392 – Statistik Ekonomi
: 2006
Pertemuan 12
Korelasi dan Regresi Linier
1
Outline Materi :
Koefisien korelasi dan determinasi
Persamaan regresi
Regresi dan peramalan
2
Simple Correllation and Linear
Regression
• Types of Regression Models
• Determining the Simple Linear Regression
Equation
• Measures of Variation
• Assumptions of Regression and
Correlation
• Residual Analysis
• Measuring Autocorrelation
• Inferences about the Slope
3
Simple Correlation and…
Association
• Estimation of Mean Values and Prediction of Individual Values
• Pitfalls in Regression and Ethical Issues
4
Purpose of Regression
Analysis
• Regression Analysis is Used Primarily to
Model Causality and Provide Prediction
– Predict the values of a dependent (response) variable based on values of at least one independent (explanatory) variable
– Explain the effect of the independent variables on the dependent variable
5
Types of Regression Models
Positive Linear Relationship Relationship NOT Linear
Negative Linear Relationship No Relationship
6
Simple Linear Regression
Model
• Relationship between Variables is
Described by a Linear Function
• The Change of One Variable Causes the
Other Variable to Change
• A Dependency of One Variable on the
Other
7
Simple Linear Regression
Model
(continued)
Population regression line is a straight line that describes the dependence of the average value
(conditional mean) of one variable on the other
Population
Y Intercept
Y i
Population
Slope
X
Coefficient i
i
Random
Error
Dependent
(Response)
Variable
Population
Regression
Line
(Conditional Mean)
Independent
(Explanatory)
Variable
8
Simple Linear Regression
Model
(continued)
Y (Observed Value of Y ) = Y i
X i
i
Observed Value of Y
i
= Random Error
(Conditional Mean)
X i
X
9
Linear Regression Equation
Sample regression line provides an the population regression line as well as a predicted value of Y estimate of
Sample
Y Intercept
Y i
Sample
Slope b
0
b
1
X i
e i
Coefficient
Residual
Y
b
0
b X
1
Simple Regression Equation
(Fitted Regression Line, Predicted Value)
10
Linear Regression Equation
• b b
0 b b
(continued)
0 1 of and that minimize the sum of the
1 squared residuals i n
1
Y i
Y i
2
i n
1 e i
2
• b b
1
0 provides an estimate of
• provides an estimate of
11
Linear Regression Equation
Y
Y i
b
0
b
1
X i
e i e i b
0
Observed Value
i
Y i
X
(continued) i
i b
1
X i
Y
i
0 b X
1 i
X
12
Interpretation of the Slope and Intercept
•
|
0
is the average value of Y when the value of X is zero
•
1 change in
|
change in X measures the change in the average value of Y as a result of a one-unit change in X
13
•
Interpretation of the Slope and Intercept b
(continued)
is the estimated average value of Y when the value of X is zero
• b
1
is the estimated
X change in the average value of Y as a result of a one-unit change in X
14
Simple Linear Regression:
Example
You wish to examine the linear dependency of the annual sales of produce stores on their sizes in square footage.
Sample data for 7 stores were obtained.
Find the equation of the straight line that fits the data best.
Annual
Store Square Sales
Feet ($1000)
5
6
7
3
4
1 1,726
2 1,542
2,816
5,555
1,292
2,208
1,313
3,681
3,395
6,653
9,543
3,318
5,563
3,760
15
Scatter Diagram: Example
1 2 0 0 0
1 0 0 0 0
8 0 0 0
6 0 0 0
4 0 0 0
2 0 0 0
0
0
Excel Output
1 0 0 0 2 0 0 0 3 0 0 0 4 0 0 0
S q u a r e F e e t
5 0 0 0 6 0 0 0
16
Simple Linear Regression
Equation: Example
Y
ˆ i
0 b X
1 i
X i
From Excel Printout:
I n t e r c e p t
C o e ffi c i e n ts
1 6 3 6 . 4 1 4 7 2 6
X V a r i a b l e 1 1 . 4 8 6 6 3 3 6 5 7
17
Graph of the Simple Linear
Regression Equation: Example
1 2 0 0 0
1 0 0 0 0
8 0 0 0
6 0 0 0
4 0 0 0
2 0 0 0
0
0 1 0 0 0 2 0 0 0 3 0 0 0 4 0 0 0
S q u a r e F e e t
5 0 0 0 6 0 0 0
18
Interpretation of Results:
Example
Y i
X i
The slope of 1.487 means that for each increase of one unit in X, we predict the average of Y to increase by an estimated 1.487 units.
The equation estimates that for each increase of 1 square foot in the size of the store, the expected annual sales are predicted to increase by $1487 .
19
Simple Linear Regression in PHStat
• In Excel, use PHStat | Regression | Simple
Linear Regression …
• Excel Spreadsheet of Regression Sales on Footage
20
Measures of Variation:
The Sum of Squares
• SST = Total Sum of Squares
– Measures the variation of the Y i values
(continued)
• SSR = Regression Sum of Squares
– Explained variation attributable to the relationship between X and Y
• SSE = Error Sum of Squares
– Variation attributable to factors other than the relationship between X and Y
21
The Coefficient of Determination
• r
2
SSR
SST
• Measures the proportion of variation in Y that is explained by the independent variable X in the regression model
22
Venn Diagrams and
Explanatory Power of
Regression
r
2
SSR
SSR
S SE
23
Coefficients of Determination ( and Correlation ( r ) r 2 )
Y r 2 = 1, r = +1
Y i
^
= b
0
+ b
1
X i
X
Y r 2 = .81, r = +0.9
^ i
= b
0
+ b
1
X i
X
Y r 2 = 1,
^ i
r = -1
= b
0
+ b
1
X i
Y
X r 2 = 0, r = 0
^ i
= b
0
+ b
1
X i
X
24
•
Standard Error of Estimate
S
YX
SSE n
2
i n
1
Y
Y i
ˆ
2 n
2
• Measures the standard deviation
(variation) of the Y values around the regression equation
25
Measures of Variation:
Produce Store Example
Excel Output for Produce Stores
R e g r e ssi o n S ta ti sti c s
M u l t i p l e R 0 . 9 7 0 5 5 7 2
R S q u a r e 0 . 9 4 1 9 8 1 2 9
A d j u s t e d R S q u a r e 0 . 9 3 0 3 7 7 5 4
S t a n d a r d E r r o r 6 1 1 . 7 5 1 5 1 7 r 2 = .94
O b s e r va t i o n s n
7
94% of the variation in annual sales can be explained by the variability in the size of the store as measured by square footage.
S yx
26
Linear Regression
Assumptions
• Normality
– Y values are normally distributed for each X
– Probability distribution of error is normal
• Homoscedasticity (Constant Variance)
• Independence of Errors
27
Consequences of Violation of the Assumptions
• Violation of the Assumptions
– Non-normality (error not normally distributed)
– Heteroscedasticity (variance not constant)
• Usually happens in cross-sectional data
– Autocorrelation (errors are not independent)
• Usually happens in time-series data
• Consequences of Any Violation of the Assumptions
– Predictions and estimations obtained from the sample regression line will not be accurate
– Hypothesis testing results will not be reliable
• It is Important to Verify the Assumptions
28
Variation of Errors Around the Regression Line f(e)
• Y values are normally distributed around the regression line.
• For each X value, the “spread” or variance around the regression line is the same.
Y
X
2
X
X
1
Sample Regression Line
29
Purpose of Correlation
Analysis
(continued)
• Sample Correlation Coefficient r is an
Estimate of
and is Used to Measure the
Strength of the Linear Relationship in the
Sample Observations r
i n
1
X i
i
Y
i n
1
X i
X
i n
1
Y i
Y
2
30
Features of and r
• Unit Free
• Range between -1 and 1
• The Closer to -1, the Stronger the
Negative Linear Relationship
• The Closer to 1, the Stronger the Positive
Linear Relationship
• The Closer to 0, the Weaker the Linear
Relationship
31
Pitfalls of Regression Analysis
• Lacking an Awareness of the Assumptions
Underlining Least-Squares Regression
• Not Knowing How to Evaluate the
Assumptions
• Not Knowing What the Alternatives to
Least-Squares Regression are if a
Particular Assumption is Violated
• Using a Regression Model Without
Knowledge of the Subject Matter
32
Strategy for Avoiding the
Pitfalls of Regression
• Start with a scatter plot of X on Y to observe possible relationship
• Perform residual analysis to check the assumptions
• Use a histogram, stem-and-leaf display, box-and-whisker plot, or normal probability plot of the residuals to uncover possible non-normality
33
Strategy for Avoiding the
Pitfalls of Regression
(continued)
• If there is violation of any assumption, use alternative methods (e.g., least absolute deviation regression or least median of squares regression) to least-squares regression or alternative least-squares models (e.g., curvilinear or multiple regression)
• If there is no evidence of assumption violation, then test for the significance of the regression coefficients and construct confidence intervals and prediction intervals
34
Chapter Summary
• Introduced Types of Regression Models
• Discussed Determining the Simple Linear
Regression Equation
• Described Measures of Variation
• Addressed Assumptions of Regression and Correlation
• Discussed Residual Analysis
• Addressed Measuring Autocorrelation
35
Chapter Summary
(continued)
• Described Inference about the Slope
• Discussed Correlation - Measuring the
Strength of the Association
• Addressed Estimation of Mean Values and
Prediction of Individual Values
• Discussed Pitfalls in Regression and
Ethical Issues
36