
Course: A0392, Economic Statistics (Statistik Ekonomi)

Year: 2006

**Session 12**

**Correlation and Linear Regression**

Outline:

• Correlation and determination coefficients

• Regression equation

• Regression and forecasting

**Simple Correlation and Linear Regression**

• Types of Regression Models

• Determining the Simple Linear Regression Equation

• Measures of Variation

• Assumptions of Regression and Correlation

• Residual Analysis

• Measuring Autocorrelation

• Inferences about the Slope


**Simple Correlation and…**

• Correlation - Measuring the Strength of the Association

• Estimation of Mean Values and Prediction of Individual Values

• Pitfalls in Regression and Ethical Issues

**Purpose of Regression Analysis**

• Regression Analysis is Used Primarily to Model Causality and Provide Prediction

– Predict the values of a dependent (response) variable based on values of at least one independent (explanatory) variable

– Explain the effect of the independent variables on the dependent variable


**Types of Regression Models**

[Figure: four scatter plot panels: Positive Linear Relationship; Relationship NOT Linear; Negative Linear Relationship; No Relationship]

**Simple Linear Regression Model**

• Relationship between Variables is Described by a Linear Function

• The Change of One Variable Causes the Other Variable to Change

• A Dependency of One Variable on the Other

**Simple Linear Regression Model**

(continued)

Population regression line is a straight line that describes the dependence of the average value (conditional mean) of one variable on the other

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

• $Y_i$: Dependent (Response) Variable

• $\beta_0$: Population Y Intercept

• $\beta_1$: Population Slope Coefficient

• $X_i$: Independent (Explanatory) Variable

• $\varepsilon_i$: Random Error

• $\mu_{Y|X} = \beta_0 + \beta_1 X_i$: Population Regression Line (Conditional Mean)

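**Simple Linear Regression Model**

(continued)

[Figure: plot of Y against X showing the population regression line $\mu_{Y|X} = \beta_0 + \beta_1 X_i$ (Conditional Mean); at $X_i$, the Observed Value of Y is $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, where $\varepsilon_i$ = Random Error]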

**Linear Regression Equation**

Sample regression line provides an estimate of the population regression line as well as a predicted value of Y

$Y_i = b_0 + b_1 X_i + e_i$

• $b_0$: Sample Y Intercept

• $b_1$: Sample Slope Coefficient

• $e_i$: Residual

**Simple Regression Equation (Fitted Regression Line, Predicted Value):** $\hat{Y} = b_0 + b_1 X$

**Linear Regression Equation**

(continued)

• $b_0$ and $b_1$ are the values that minimize the sum of the squared residuals

$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} e_i^2$

• $b_0$ provides an *estimate* of $\beta_0$

• $b_1$ provides an *estimate* of $\beta_1$
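The minimization described on this slide has a standard closed-form solution, which the slides do not show. As a sketch, setting the partial derivatives of $\sum e_i^2$ with respect to $b_0$ and $b_1$ to zero gives the following, where $\bar{X}$ and $\bar{Y}$ (sample means) are notation introduced here:

```latex
b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
b_0 = \bar{Y} - b_1 \bar{X}
```

One consequence of the second formula is that the fitted line always passes through the point $(\bar{X}, \bar{Y})$.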

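**Linear Regression Equation**

(continued)

[Figure: plot of Y against X comparing the sample regression line $\hat{Y}_i = b_0 + b_1 X_i$ with the population regression line $\mu_{Y|X} = \beta_0 + \beta_1 X_i$; at $X_i$, the Observed Value is $Y_i = b_0 + b_1 X_i + e_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, with residual $e_i$ and random error $\varepsilon_i$]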

**Interpretation of the Slope and Intercept**

• $\beta_0 = E(Y \mid X = 0)$ is the average value of Y when the value of X is zero

• $\beta_1 = \dfrac{\text{change in } E(Y \mid X)}{\text{change in } X}$ measures the change in the average value of Y as a result of a one-unit change in X

**Interpretation of the Slope and Intercept**

(continued)

• $b_0 = \hat{E}(Y \mid X = 0)$ is the *estimated* average value of Y when the value of X is zero

• $b_1 = \dfrac{\text{change in } \hat{E}(Y \mid X)}{\text{change in } X}$ is the *estimated* change in the average value of Y as a result of a one-unit change in X

**Simple Linear Regression: Example**

You wish to examine the linear dependency of the annual sales of produce stores on their sizes in square footage.

Sample data for 7 stores were obtained.

Find the equation of the straight line that fits the data best.

| Store | Square Feet | Annual Sales ($1000) |
|---|---|---|
| 1 | 1,726 | 3,681 |
| 2 | 1,542 | 3,395 |
| 3 | 2,816 | 6,653 |
| 4 | 5,555 | 9,543 |
| 5 | 1,292 | 3,318 |
| 6 | 2,208 | 5,563 |
| 7 | 1,313 | 3,760 |

**Scatter Diagram: Example**

[Figure: **Excel Output**, scatter diagram of Annual Sales ($1000) (vertical axis, 0 to 12000) against Square Feet (horizontal axis, 0 to 6000)]

**Simple Linear Regression Equation: Example**

$\hat{Y}_i = b_0 + b_1 X_i = 1636.415 + 1.487 X_i$

*From Excel Printout:*

| | Coefficients |
|---|---|
| Intercept | 1636.414726 |
| X Variable 1 | 1.486633657 |
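The Excel coefficients can be cross-checked by hand. The following is a minimal sketch in plain Python, assuming the seven data points from the produce store example and the usual closed-form least-squares formulas; the variable names are illustrative and not part of the original slides.

```python
# Produce store data from the example (7 stores).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]  # in $1000

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n

# Sum of squares of X and sum of cross-products, both about the means.
ss_x = sum((x - x_bar) ** 2 for x in square_feet)
ss_xy = sum((x - x_bar) * (y - y_bar)
            for x, y in zip(square_feet, annual_sales))

b1 = ss_xy / ss_x        # sample slope
b0 = y_bar - b1 * x_bar  # sample Y intercept

print(f"b0 = {b0:.6f}, b1 = {b1:.9f}")
```

Up to rounding, the printed values should agree with the Intercept and X Variable 1 coefficients in the Excel printout.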

**Graph of the Simple Linear Regression Equation: Example**

[Figure: scatter diagram with the fitted regression line; vertical axis 0 to 12000, horizontal axis Square Feet, 0 to 6000]

**Interpretation of Results: Example**

$\hat{Y}_i = 1636.415 + 1.487 X_i$

*The slope of 1.487 means that for each increase of one unit in X, we predict the average of Y to increase by an estimated 1.487 units.*

*The equation estimates that for each increase of 1 square foot in the size of the store, the expected annual sales are predicted to increase by $1,487.*
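To make the interpretation concrete, here is a small sketch that evaluates the fitted equation for one store size. The function name `predict_sales` and the 4,000 square foot store are hypothetical choices for illustration; the coefficients are the rounded values from the Excel printout.

```python
b0 = 1636.415  # sample Y intercept, rounded from the Excel printout
b1 = 1.487     # sample slope, rounded from the Excel printout


def predict_sales(square_feet):
    """Predicted annual sales in $1000 for a store of the given size."""
    return b0 + b1 * square_feet


# Hypothetical store size, chosen inside the sampled range (1,292 to 5,555).
size = 4000
predicted = predict_sales(size)

# One extra square foot changes the prediction by exactly the slope.
slope_effect = predict_sales(size + 1) - predict_sales(size)

print(f"Predicted annual sales: {predicted:.3f} ($1000)")
print(f"Effect of one more square foot: {slope_effect:.3f} ($1000)")
```

Because sales are recorded in thousands of dollars, a slope of 1.487 corresponds to $1,487 per additional square foot. Predictions for sizes far outside the sampled range would be extrapolation and deserve extra caution.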

**Simple Linear Regression in PHStat**

• In Excel, use PHStat | Regression | Simple Linear Regression …

• Excel Spreadsheet of Regression Sales on Footage

**Measures of Variation: The Sum of Squares**

(continued)

• SST = Total Sum of Squares

– Measures the variation of the $Y_i$ values

• SSR = Regression Sum of Squares

– Explained variation attributable to the relationship between *X* and *Y*

• SSE = Error Sum of Squares

– Variation attributable to factors other than the relationship between *X* and *Y*
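The slide gives these three quantities in words only. As a sketch using the standard textbook definitions, with $\bar{Y}$ the sample mean of Y and $\hat{Y}_i$ the fitted value (notation assumed here rather than taken from this slide), they can be written as:

```latex
SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2, \qquad
SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2, \qquad
SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2, \qquad
SST = SSR + SSE
```

The decomposition $SST = SSR + SSE$ holds for a least-squares line fitted with an intercept.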

**The Coefficient of Determination**

• $r^2 = \dfrac{SSR}{SST}$

• Measures the proportion of variation in *Y* that is explained by the independent variable *X* in the regression model
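As an illustration of the ratio above, the following sketch computes SST, SSR, SSE and $r^2$ for the produce store data. It assumes the standard definitions of the three sums of squares and refits the line with the closed-form least-squares formulas; all variable names are illustrative.

```python
# Produce store data from the example (7 stores).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]  # in $1000

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n

# Least-squares slope and intercept.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales))
      / sum((x - x_bar) ** 2 for x in square_feet))
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in square_feet]

sst = sum((y - y_bar) ** 2 for y in annual_sales)                       # total
ssr = sum((y_hat - y_bar) ** 2 for y_hat in fitted)                     # explained
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(annual_sales, fitted))   # unexplained

r_squared = ssr / sst

print(f"SST = {sst:.1f}, SSR = {ssr:.1f}, SSE = {sse:.1f}")
print(f"r^2 = {r_squared:.6f}")
```

The result should be close to the R Square value of about 0.94 reported for this example, and SSR plus SSE should reproduce SST up to floating-point error.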

**Venn Diagrams and Explanatory Power of Regression**

$r^2 = \dfrac{SSR}{SSR + SSE}$

**Coefficients of Determination ($r^2$) and Correlation ($r$)**

[Figure: four scatter plot panels of Y against X, each with the fitted line $\hat{Y}_i = b_0 + b_1 X_i$]

• Panel 1: $r^2 = 1$, $r = +1$

• Panel 2: $r^2 = .81$, $r = +0.9$

• Panel 3: $r^2 = 1$, $r = -1$

• Panel 4: $r^2 = 0$, $r = 0$

**Standard Error of Estimate**

• $S_{YX} = \sqrt{\dfrac{SSE}{n-2}} = \sqrt{\dfrac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}}$

• Measures the standard deviation (variation) of the *Y* values around the regression equation
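The formula above can be checked numerically. This is a minimal sketch that assumes the produce store data from the earlier example and refits the least-squares line before computing $S_{YX}$; the variable names are illustrative.

```python
import math

# Produce store data from the example (7 stores).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]  # in $1000

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n

# Least-squares slope and intercept.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales))
      / sum((x - x_bar) ** 2 for x in square_feet))
b0 = y_bar - b1 * x_bar

# Error sum of squares: squared distances of observed Y from fitted Y.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(square_feet, annual_sales))

# Standard error of estimate, with n - 2 degrees of freedom.
s_yx = math.sqrt(sse / (n - 2))

print(f"S_YX = {s_yx:.4f} ($1000)")
```

The divisor is $n - 2$ rather than $n$ because two coefficients, $b_0$ and $b_1$, are estimated from the data. The result should be close to the Standard Error of about 611.75 reported in the Excel output for this example.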

**Measures of Variation: Produce Store Example**

**Excel Output for Produce Stores**

| Regression Statistics | |
|---|---|
| Multiple R | 0.9705572 |
| R Square | 0.94198129 |
| Adjusted R Square | 0.93037754 |
| Standard Error | 611.751517 |
| Observations | 7 |

R Square is $r^2 = .94$, Standard Error is $S_{YX}$, and Observations is $n$.

94% of the variation in annual sales can be explained by the variability in the size of the store as measured by square footage.

**Linear Regression Assumptions**

• Normality

– Y values are normally distributed for each *X*

– Probability distribution of error is normal

• Homoscedasticity (Constant Variance)

• Independence of Errors

**Consequences of Violation of the Assumptions**

• Violation of the Assumptions

– Non-normality (error not normally distributed)

– Heteroscedasticity (variance not constant)

• Usually happens in cross-sectional data

– Autocorrelation (errors are not independent)

• Usually happens in time-series data

• Consequences of Any Violation of the Assumptions

– Predictions and estimations obtained from the sample regression line will not be accurate

– Hypothesis testing results will not be reliable

• It is Important to Verify the Assumptions


**Variation of Errors Around the Regression Line**

• *Y* values are normally distributed around the regression line.

• For each X value, the “spread” or variance around the regression line is the same.

[Figure: error distributions *f(e)* drawn at $X_1$ and $X_2$ along the Sample Regression Line, with axes *X* and *Y*]

**Purpose of Correlation Analysis**

(continued)

• Sample Correlation Coefficient $r$ is an Estimate of $\rho$ and is Used to Measure the Strength of the Linear Relationship in the Sample Observations

$r = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$
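The sample correlation coefficient can be computed directly from its definition. The sketch below assumes the produce store data from the earlier example; the variable names are illustrative.

```python
import math

# Produce store data from the example (7 stores).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]  # in $1000

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n

# Sums of squares and cross-products about the means.
ss_xy = sum((x - x_bar) * (y - y_bar)
            for x, y in zip(square_feet, annual_sales))
ss_x = sum((x - x_bar) ** 2 for x in square_feet)
ss_y = sum((y - y_bar) ** 2 for y in annual_sales)

r = ss_xy / math.sqrt(ss_x * ss_y)  # sample correlation coefficient

print(f"r = {r:.7f}, r^2 = {r * r:.8f}")
```

In simple linear regression, $r$ carries the sign of the slope $b_1$ and its square equals the coefficient of determination, so the result should match the Multiple R and R Square values in the Excel output for this example.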

**Features of $\rho$ and $r$**

• Unit Free

• Range between -1 and 1

• The Closer to -1, the Stronger the Negative Linear Relationship

• The Closer to 1, the Stronger the Positive Linear Relationship

• The Closer to 0, the Weaker the Linear Relationship

**Pitfalls of Regression Analysis**

• Lacking an Awareness of the Assumptions Underlying Least-Squares Regression

• Not Knowing How to Evaluate the Assumptions

• Not Knowing What the Alternatives to Least-Squares Regression are if a Particular Assumption is Violated

• Using a Regression Model Without Knowledge of the Subject Matter

**Strategy for Avoiding the Pitfalls of Regression**

• Start with a scatter plot of Y against X to observe a possible relationship

• Perform residual analysis to check the assumptions

• Use a histogram, stem-and-leaf display, box-and-whisker plot, or normal probability plot of the residuals to uncover possible non-normality
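Residual analysis starts from the residuals themselves, $e_i = Y_i - \hat{Y}_i$. The following sketch computes them for the produce store data from the earlier example and lists them in order of store size, which is one simple way to look for a pattern before drawing any plot. The variable names are illustrative, and the listing is only a starting point, not a formal test of the assumptions.

```python
# Produce store data from the example (7 stores).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]  # in $1000

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n

# Least-squares slope and intercept.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales))
      / sum((x - x_bar) ** 2 for x in square_feet))
b0 = y_bar - b1 * x_bar

# Residual for each store: observed sales minus fitted sales.
residuals = [y - (b0 + b1 * x) for x, y in zip(square_feet, annual_sales)]

# List residuals by increasing store size to spot trends or changing spread.
for x, e in sorted(zip(square_feet, residuals)):
    print(f"X = {x:5d}  residual = {e:10.3f}")

print(f"Sum of residuals = {sum(residuals):.6f}")
```

For a least-squares line with an intercept the residuals sum to zero up to floating-point error, so a clearly nonzero sum would point to a computational mistake rather than to an assumption violation.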

**Strategy for Avoiding the Pitfalls of Regression**

(continued)

• If any assumption is violated, use alternatives to least-squares regression (e.g., least absolute deviation regression or least median of squares regression) or alternative least-squares models (e.g., curvilinear or multiple regression)

• If there is no evidence of assumption violation, then test for the significance of the regression coefficients and construct confidence intervals and prediction intervals

**Chapter Summary**

• Introduced Types of Regression Models

• Discussed Determining the Simple Linear Regression Equation

• Described Measures of Variation

• Addressed Assumptions of Regression and Correlation

• Discussed Residual Analysis

• Addressed Measuring Autocorrelation

**Chapter Summary**

(continued)

• Described Inference about the Slope

• Discussed Correlation - Measuring the Strength of the Association

• Addressed Estimation of Mean Values and Prediction of Individual Values

• Discussed Pitfalls in Regression and Ethical Issues