Pertemuan 12 Korelasi dan Regresi Linier – Statistik Ekonomi Matakuliah

advertisement

Matakuliah

Tahun

: A0392

– Statistik Ekonomi

: 2006

Pertemuan 12

Korelasi dan Regresi Linier

1

Outline Materi :

Koefisien korelasi dan determinasi

Persamaan regresi

Regresi dan peramalan

2

Simple Correllation and Linear

Regression

• Types of Regression Models

• Determining the Simple Linear Regression

Equation

• Measures of Variation

• Assumptions of Regression and

Correlation

• Residual Analysis

• Measuring Autocorrelation

• Inferences about the Slope

3

Simple Correlation and…

Association

• Estimation of Mean Values and Prediction of Individual Values

• Pitfalls in Regression and Ethical Issues

4

Purpose of Regression

Analysis

• Regression Analysis is Used Primarily to

Model Causality and Provide Prediction

– Predict the values of a dependent (response) variable based on values of at least one independent (explanatory) variable

– Explain the effect of the independent variables on the dependent variable

5

Types of Regression Models

Positive Linear Relationship Relationship NOT Linear

Negative Linear Relationship No Relationship

6

Simple Linear Regression

Model

• Relationship between Variables is

Described by a Linear Function

• The Change of One Variable Causes the

Other Variable to Change

• A Dependency of One Variable on the

Other

7

Simple Linear Regression

Model

(continued)

Population regression line is a straight line that describes the dependence of the average value

(conditional mean) of one variable on the other

Y

Population

Intercept

Y i

Population

Slope

X

Coefficient

i

i

Random

Error

Dependent

(Response)

Variable

Population

Regression

Line

(Conditional Mean)

Independent

(Explanatory)

Variable

8

Simple Linear Regression

Model

(continued)

Y

(Observed Value of

Y

) =

Y i

X i

i

Observed Value of

Y

i

= Random Error

(Conditional Mean)

X i

X

9

Linear Regression Equation

Sample regression line provides an predicted value of Y estimate the population regression line as well as a of

Sample

Y Intercept

Y i

b

0

b

1

X i

Sample

Slope

e i

Coefficient

Residual

Y

ˆ

b

0

b X

1

Simple Regression Equation

(Fitted Regression Line, Predicted Value)

10

Linear Regression Equation

b

0

b

0

b b

1 squared residuals

(continued) of and that minimize the sum of the

i n

1

Y i

Y i

ˆ

2

i n

1

e i

2

b

0

b

1 provides an

estimate

provides an

estimate

of of

11

Linear Regression Equation

Y

Y i

b

0

b

1

X i

e i e i b

0

Observed Value

i

Y i

X

(continued)

i

i b

1

Y i

ˆ

 

0

b X

1

i

X i

X

12

Interpretation of the Slope and Intercept

0

 is the average value of Y when the value of X is zero

1

 change in

| change in

X

 measures the change in the average value of Y as a result of a one-unit change in X

13

b

Interpretation of the Slope and Intercept

|

(continued)

0

 is the

estimated

average value of Y when the value of X is zero

b

1

change in

|

 is the

estimated

change in

X

change in the average value of Y as a result of a one-unit change in X

14

Simple Linear Regression:

Example

You wish to examine the linear dependency of the annual sales of produce stores on their sizes in square footage.

Sample data for 7 stores were obtained.

Find the equation of the straight line that fits the data best.

Store Square

Feet

5

6

7

3

4

1 1,726

2 1,542

2,816

5,555

1,292

2,208

1,313

Annual

Sales

($1000)

3,681

3,395

6,653

9,543

3,318

5,563

3,760

15

Scatter Diagram: Example

1 2 0 0 0

1 0 0 0 0

8 0 0 0

6 0 0 0

4 0 0 0

2 0 0 0

0

0

Excel Output

1 0 0 0 2 0 0 0 3 0 0 0 4 0 0 0

S q u a r e F e e t

5 0 0 0 6 0 0 0

16

Simple Linear Regression

Equation: Example

Y i

ˆ

 

0

b X

1

i

X i

From Excel Printout:

I n t e r c e p t

X V a r i a b l e 1

C o e ffi c i e n ts

1 6 3 6 . 4 1 4 7 2 6

1 . 4 8 6 6 3 3 6 5 7

17

Graph of the Simple Linear

Regression Equation: Example

1 2 0 0 0

1 0 0 0 0

8 0 0 0

6 0 0 0

4 0 0 0

2 0 0 0

0

0 1 0 0 0 2 0 0 0 3 0 0 0 4 0 0 0

S q u a r e F e e t

5 0 0 0 6 0 0 0

18

Interpretation of Results:

Example

Y i

 

X i

The slope of 1.487 means that for each increase of one unit in X, we predict the average of Y to increase by an estimated 1.487 units.

The equation estimates square foot that for each increase of 1 in the size of the store, the expected annual sales are predicted to increase by $1487 .

19

Simple Linear Regression in PHStat

• In Excel, use PHStat | Regression | Simple

Linear Regression …

• Excel Spreadsheet of Regression Sales on Footage

20

Measures of Variation:

The Sum of Squares

• SST = Total Sum of Squares

– Measures the variation of the

Y i

values

(continued)

• SSR = Regression Sum of Squares

– Explained variation attributable to the relationship between

X

and

Y

• SSE = Error Sum of Squares

– Variation attributable to factors other than the relationship between

X

and

Y

21

The Coefficient of Determination

r

2

SSR

SST

Regression Sum of Squares

Total Sum of Squares

• Measures the proportion of variation in

Y

that is explained by the independent variable

X

in the regression model

22

Venn Diagrams and

Explanatory Power of

Regression

Sales

r

2

Sizes

SSR

SSR

S SE

23

Coefficients of Determination ( and Correlation ( r

) r

2

)

Y

r

2

= 1, r = +1

Y i

^

= b

0

+ b

1

X i

X

Y

r

2

= .81, r = +0.9

^

i

= b

0

+ b

1

X i

X

Y

r

2

= 1,

^

i

r = -1

= b

0

+ b

1

X i

Y

X

r

2

= 0, r = 0

^

i

= b

0

+ b

1

X i

X

24

Standard Error of Estimate

S

YX

SSE n

2

i n

1

Y n

2

Y i

ˆ

2

• Measures the standard deviation

(variation) of the

Y

values around the regression equation

25

Measures of Variation:

Produce Store Example

Excel Output for Produce Stores

R e g r e ssi o n S ta ti sti c s

M u l t i p l e R 0 . 9 7 0 5 5 7 2

R S q u a r e

A d j u s t e d R S q u a r e

S t a n d a r d E r r o r

0 . 9 4 1 9 8 1 2 9

0 . 9 3 0 3 7 7 5 4

6 1 1 . 7 5 1 5 1 7

r

2

= .94

O b s e r va t i o n s n

7

94% of the variation in annual sales can be explained by the variability in the size of the store as measured by square footage.

S yx

26

Linear Regression

Assumptions

• Normality

– Y values are normally distributed for each

X

– Probability distribution of error is normal

• Homoscedasticity (Constant Variance)

• Independence of Errors

27

Consequences of Violation of the Assumptions

• Violation of the Assumptions

– Non-normality (error not normally distributed)

– Heteroscedasticity (variance not constant)

• Usually happens in cross-sectional data

– Autocorrelation (errors are not independent)

• Usually happens in time-series data

• Consequences of Any Violation of the Assumptions

– Predictions and estimations obtained from the sample regression line will not be accurate

– Hypothesis testing results will not be reliable

• It is Important to Verify the Assumptions

28

Variation of Errors Around the Regression Line

f(e)

Y values are normally distributed around the regression line.

For each X value, the “spread” or variance around the regression line is the same.

Y

X

2

X

X

1

Sample Regression Line

29

Purpose of Correlation

Analysis

(continued)

• Sample Correlation Coefficient

r

Estimate of

 is an and is Used to Measure the

Strength of the Linear Relationship in the

Sample Observations

r

i n

1

X i



i

Y

i n

1

X i

X n

 

i

1

Y i

Y

2

30

Features of

and r

• Unit Free

• Range between -1 and 1

• The Closer to -1, the Stronger the

Negative Linear Relationship

• The Closer to 1, the Stronger the Positive

Linear Relationship

• The Closer to 0, the Weaker the Linear

Relationship

31

Pitfalls of Regression Analysis

• Lacking an Awareness of the Assumptions

Underlining Least-Squares Regression

• Not Knowing How to Evaluate the

Assumptions

• Not Knowing What the Alternatives to

Least-Squares Regression are if a

Particular Assumption is Violated

• Using a Regression Model Without

Knowledge of the Subject Matter

32

Strategy for Avoiding the

Pitfalls of Regression

• Start with a scatter plot of X on Y to observe possible relationship

• Perform residual analysis to check the assumptions

• Use a histogram, stem-and-leaf display, box-and-whisker plot, or normal probability plot of the residuals to uncover possible non-normality

33

Strategy for Avoiding the

Pitfalls of Regression

(continued)

• If there is violation of any assumption, use alternative methods (e.g., least absolute deviation regression or least median of squares regression) to least-squares regression or alternative least-squares models (e.g., curvilinear or multiple regression)

• If there is no evidence of assumption violation, then test for the significance of the regression coefficients and construct confidence intervals and prediction intervals

34

Chapter Summary

• Introduced Types of Regression Models

• Discussed Determining the Simple Linear

Regression Equation

• Described Measures of Variation

• Addressed Assumptions of Regression and Correlation

• Discussed Residual Analysis

• Addressed Measuring Autocorrelation

35

Chapter Summary

(continued)

• Described Inference about the Slope

• Discussed Correlation - Measuring the

Strength of the Association

• Addressed Estimation of Mean Values and

Prediction of Individual Values

• Discussed Pitfalls in Regression and

Ethical Issues

36

Download