Session 12: Correlation and Linear Regression – Economic Statistics


Course: A0392 – Economic Statistics

Year: 2006

Session 12

Correlation and Linear Regression

Outline:

• Coefficients of correlation and determination

• Regression equations

• Regression and forecasting

Simple Correlation and Linear Regression

• Types of Regression Models

• Determining the Simple Linear Regression Equation

• Measures of Variation

• Assumptions of Regression and Correlation

• Residual Analysis

• Measuring Autocorrelation

• Inferences about the Slope

Simple Correlation and…

• Correlation: Measuring the Strength of the Association

• Estimation of Mean Values and Prediction of Individual Values

• Pitfalls in Regression and Ethical Issues

Purpose of Regression Analysis

• Regression Analysis Is Used Primarily to Model Causality and Provide Prediction

– Predict the values of a dependent (response) variable based on values of at least one independent (explanatory) variable

– Explain the effect of the independent variables on the dependent variable

Types of Regression Models

[Figure: four scatter plots illustrating a Positive Linear Relationship, a Relationship NOT Linear, a Negative Linear Relationship, and No Relationship]

Simple Linear Regression Model

• Relationship between Variables Is Described by a Linear Function

• The Change of One Variable Causes the Other Variable to Change

• A Dependency of One Variable on the Other

Simple Linear Regression Model (continued)

The population regression line is a straight line that describes the dependence of the average value (conditional mean) of one variable on the other.

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

• $Y_i$: dependent (response) variable

• $\beta_0$: population Y intercept

• $\beta_1$: population slope coefficient

• $X_i$: independent (explanatory) variable

• $\varepsilon_i$: random error

• $\mu_{Y|X} = \beta_0 + \beta_1 X_i$: population regression line (conditional mean)
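To make the conditional-mean idea concrete, the sketch below simulates observations from the population model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$. The parameter values $\beta_0 = 2$, $\beta_1 = 0.5$, $\sigma = 1$, the uniform X values, and the function names are hypothetical choices for illustration only. Individual Y values scatter around the line, but those whose X lies near 4 should average close to the conditional mean $\beta_0 + \beta_1 \cdot 4 = 4$.

```python
import random


def simulate_population_model(beta0, beta1, sigma, n, seed=0):
    """Draw n pairs (X_i, Y_i) from Y_i = beta0 + beta1 * X_i + eps_i.

    X_i is drawn uniformly from [0, 10] and eps_i ~ Normal(0, sigma^2);
    both choices are illustrative assumptions, not part of the slides.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


def conditional_mean(beta0, beta1, x):
    """Population regression line: the average value of Y when X = x."""
    return beta0 + beta1 * x


# Hypothetical population parameters.
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.0
xs, ys = simulate_population_model(BETA0, BETA1, SIGMA, n=10_000)

# The average of the Y values observed near X = 4 should be close to
# the conditional mean 2.0 + 0.5 * 4 = 4.0.
near_4 = [y for x, y in zip(xs, ys) if 3.9 <= x <= 4.1]
print(len(near_4), sum(near_4) / len(near_4), conditional_mean(BETA0, BETA1, 4.0))
```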

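Simple Linear Regression Model (continued)

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

[Figure: at $X = X_i$, the observed value of Y, $Y_i$, lies a vertical distance $\varepsilon_i$ (random error) from the population regression line $\mu_{Y|X} = \beta_0 + \beta_1 X_i$ (conditional mean).]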

Linear Regression Equation

The sample regression line provides an estimate of the population regression line as well as a predicted value of Y.

$$Y_i = b_0 + b_1 X_i + e_i$$

• $b_0$: sample Y intercept

• $b_1$: sample slope coefficient

• $e_i$: residual

$$\hat{Y} = b_0 + b_1 X$$

This is the simple regression equation (fitted regression line, predicted value).

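Linear Regression Equation (continued)

• $b_0$ and $b_1$ are obtained by finding the values of $b_0$ and $b_1$ that minimize the sum of the squared residuals:

$$\sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2 = \sum_{i=1}^{n} e_i^2$$

• $b_0$ provides an estimate of $\beta_0$

• $b_1$ provides an estimate of $\beta_1$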
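The slides state the least-squares criterion but not its solution. Setting the partial derivatives of $\sum e_i^2$ with respect to $b_0$ and $b_1$ to zero gives the standard closed form $b_1 = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / \sum (X_i - \bar{X})^2$ and $b_0 = \bar{Y} - b_1 \bar{X}$. The Python sketch below implements that formula; the function names and the four-point data set are hypothetical, and the last line checks that moving away from the fitted slope increases the sum of squared residuals.

```python
def least_squares(xs, ys):
    """Return (b0, b1) that minimize sum((Y_i - (b0 + b1 * X_i)) ** 2)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the X values must not all be equal")
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # the fitted line passes through (x_bar, y_bar)
    return b0, b1


def sum_squared_residuals(xs, ys, b0, b1):
    """Sum of e_i^2, where e_i = Y_i - (b0 + b1 * X_i)."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical four-point data set.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
b0, b1 = least_squares(xs, ys)
print(b0, b1)

# Moving the slope away from b1 increases the sum of squared residuals.
print(sum_squared_residuals(xs, ys, b0, b1) < sum_squared_residuals(xs, ys, b0, b1 + 0.01))
```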

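Linear Regression Equation (continued)

[Figure: an observed value $Y_i = b_0 + b_1 X_i + e_i$ plotted with the population regression line $\mu_{Y|X} = \beta_0 + \beta_1 X$, from which it differs by the random error $\varepsilon_i$, and the sample regression line $\hat{Y}_i = b_0 + b_1 X_i$, from which it differs by the residual $e_i$.]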

Interpretation of the Slope and Intercept

• $\beta_0 = \mu_{Y|X=0}$ is the average value of Y when the value of X is zero

• $\beta_1 = \dfrac{\text{change in } \mu_{Y|X}}{\text{change in } X}$ measures the change in the average value of Y as a result of a one-unit change in X

Interpretation of the Slope and Intercept (continued)

• $b_0 = \hat{\mu}_{Y|X=0}$ is the estimated average value of Y when the value of X is zero

• $b_1 = \dfrac{\text{change in } \hat{\mu}_{Y|X}}{\text{change in } X}$ is the estimated change in the average value of Y as a result of a one-unit change in X

Simple Linear Regression: Example

You wish to examine the linear dependency of the annual sales of produce stores on their sizes in square footage.

Sample data for 7 stores were obtained.

Find the equation of the straight line that fits the data best.

| Store | Square Feet | Annual Sales ($1000) |
|---|---|---|
| 1 | 1,726 | 3,681 |
| 2 | 1,542 | 3,395 |
| 3 | 2,816 | 6,653 |
| 4 | 5,555 | 9,543 |
| 5 | 1,292 | 3,318 |
| 6 | 2,208 | 5,563 |
| 7 | 1,313 | 3,760 |

Scatter Diagram: Example

[Figure: Excel scatter plot of the sample data; horizontal axis Square Feet (0 to 6,000), vertical axis 0 to 12,000]

Simple Linear Regression Equation: Example

$$\hat{Y}_i = b_0 + b_1 X_i$$

From Excel Printout:

| | Coefficients |
|---|---|
| Intercept | 1636.414726 |
| X Variable 1 | 1.486633657 |
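As a cross-check on the Excel printout, the sketch below recomputes $b_0$ and $b_1$ from the seven stores in the example table using the standard closed-form least-squares formulas, in plain Python rather than Excel or PHStat. The function name is hypothetical. The printed values should agree with the Intercept and X Variable 1 coefficients up to rounding.

```python
def least_squares(xs, ys):
    """Closed-form least-squares estimates (b0, b1) for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Produce-store sample from the example (stores 1 to 7).
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]  # in $1000

b0, b1 = least_squares(square_feet, annual_sales)
print(f"Intercept     {b0:.6f}")
print(f"X Variable 1  {b1:.9f}")
```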

Graph of the Simple Linear Regression Equation: Example

[Figure: fitted regression line drawn over the scatter plot of the sample data; horizontal axis Square Feet (0 to 6,000), vertical axis 0 to 12,000]

Interpretation of Results: Example

$$\hat{Y}_i = 1636.415 + 1.487 X_i$$

The slope of 1.487 means that for each increase of one unit in X, we predict the average of Y to increase by an estimated 1.487 units.

The equation estimates that for each increase of 1 square foot in the size of the store, the expected annual sales are predicted to increase by $1,487 (1.487 × $1000).
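The fitted equation can be used to forecast annual sales for a store of a given size. The sketch below plugs a hypothetical 4,000-square-foot store into the rounded equation $\hat{Y}_i = 1636.415 + 1.487 X_i$ and confirms that one extra square foot raises the prediction by 1.487 thousand dollars, i.e. $1,487. Because 4,000 lies inside the range of store sizes in the sample (1,292 to 5,555), this is interpolation; predicting far outside that range would be extrapolation and is less reliable.

```python
B0 = 1636.415  # intercept from the fitted equation (rounded), in $1000
B1 = 1.487     # slope: $1000 of annual sales per square foot (rounded)


def predict_sales(square_feet):
    """Predicted annual sales, in $1000, from Y-hat = B0 + B1 * X."""
    return B0 + B1 * square_feet


# A hypothetical 4,000-square-foot store.
print(predict_sales(4000))  # in $1000

# One extra square foot changes the prediction by B1 thousand dollars.
print((predict_sales(4001) - predict_sales(4000)) * 1000)  # in dollars
```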

Simple Linear Regression in PHStat

• In Excel, use PHStat | Regression | Simple Linear Regression …

• Excel spreadsheet: regression of Sales on Footage

Measures of Variation: The Sum of Squares

• SST = Total Sum of Squares

– Measures the variation of the $Y_i$ values around their mean $\bar{Y}$

• SSR = Regression Sum of Squares

– Explained variation attributable to the relationship between X and Y

• SSE = Error Sum of Squares

– Variation attributable to factors other than the relationship between X and Y
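The three sums of squares are usually defined as $SST = \sum (Y_i - \bar{Y})^2$, $SSR = \sum (\hat{Y}_i - \bar{Y})^2$ and $SSE = \sum (Y_i - \hat{Y}_i)^2$, and for a least-squares line with an intercept they satisfy $SST = SSR + SSE$. The sketch below computes all three on a small hypothetical data set so the identity can be checked numerically; the function name is illustrative.

```python
def sums_of_squares(xs, ys):
    """Return (SST, SSR, SSE) for a least-squares line fitted to (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]                   # Y-hat_i
    sst = sum((y - y_bar) ** 2 for y in ys)              # total variation
    ssr = sum((f - y_bar) ** 2 for f in fitted)          # explained by X
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # left unexplained
    return sst, ssr, sse


# Hypothetical data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
sst, ssr, sse = sums_of_squares(xs, ys)
print(sst, ssr, sse, ssr + sse)
```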

The Coefficient of Determination

• $r^2 = \dfrac{SSR}{SST} = \dfrac{\text{Regression Sum of Squares}}{\text{Total Sum of Squares}}$

• Measures the proportion of variation in Y that is explained by the independent variable X in the regression model
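A small sketch of $r^2 = SSR/SST$ on three hypothetical data sets sharing the same X values: points exactly on a line, points scattered slightly around a line, and points with no linear trend. The function name and the data are illustrative assumptions; the values move from 1 toward 0 as less of the variation in Y is explained by X.

```python
def r_squared(xs, ys):
    """Coefficient of determination r^2 = SSR / SST for a least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    sst = sum((y - y_bar) ** 2 for y in ys)
    if sst == 0:
        raise ValueError("r^2 is undefined when all Y values are equal")
    ssr = sum((b0 + b1 * x - y_bar) ** 2 for x in xs)
    return ssr / sst


xs = [1.0, 2.0, 3.0, 4.0]
print(r_squared(xs, [2.0, 4.0, 6.0, 8.0]))  # all points on a line: r^2 = 1
print(r_squared(xs, [2.1, 3.9, 6.2, 7.8]))  # small scatter: r^2 just below 1
print(r_squared(xs, [5.0, 3.0, 3.0, 5.0]))  # no linear trend: r^2 = 0
```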

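Venn Diagrams and Explanatory Power of Regression

[Figure: Venn diagram of overlapping circles for Sales and Sizes; the overlap is SSR, the rest of the Sales circle is SSE, and $r^2 = \dfrac{SSR}{SSR + SSE}$]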

Coefficients of Determination ($r^2$) and Correlation ($r$)

[Figure: four scatter plots of Y against X, each with a fitted line $\hat{Y}_i = b_0 + b_1 X_i$: $r^2 = 1$, $r = +1$; $r^2 = .81$, $r = +0.9$; $r^2 = 1$, $r = -1$; $r^2 = 0$, $r = 0$]

Standard Error of Estimate

$$S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2}{n-2}}$$

• Measures the standard deviation (variation) of the Y values around the regression equation
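A minimal sketch of $S_{YX} = \sqrt{SSE/(n-2)}$ on hypothetical data; the function name is illustrative. The divisor is $n - 2$ rather than $n$ because two parameters, $b_0$ and $b_1$, are estimated from the same data.

```python
import math


def standard_error_of_estimate(xs, ys):
    """S_YX = sqrt(SSE / (n - 2)) for a least-squares line fitted to (xs, ys)."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points so that n - 2 is positive")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    # Two parameters (b0 and b1) were estimated, leaving n - 2 degrees of freedom.
    return math.sqrt(sse / (n - 2))


xs = [1.0, 2.0, 3.0, 4.0]   # hypothetical data
ys = [2.1, 3.9, 6.2, 7.8]
print(standard_error_of_estimate(xs, ys))
```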

Measures of Variation: Produce Store Example

Excel Output for Produce Stores

| Regression Statistics | |
|---|---|
| Multiple R | 0.9705572 |
| R Square | 0.94198129 |
| Adjusted R Square | 0.93037754 |
| Standard Error | 611.751517 |
| Observations | 7 |

Here R Square is $r^2 = .94$, Standard Error is $S_{YX}$, and Observations is $n = 7$.

94% of the variation in annual sales can be explained by the variability in the size of the store as measured by square footage.
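The sketch below recomputes every entry of the Excel "Regression Statistics" block from the seven stores, in plain Python. It assumes the standard definitions for simple regression: Multiple R is $\sqrt{r^2}$ (which equals $|r|$), Standard Error is $S_{YX} = \sqrt{SSE/(n-2)}$, and Adjusted R Square is $1 - (1 - r^2)\frac{n-1}{n-2}$; that last formula is not stated in the slides. The function name is hypothetical, and the printed values should match the Excel output up to rounding.

```python
import math


def regression_statistics(xs, ys):
    """Recompute the 'Regression Statistics' block of Excel's regression output."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    sst = sum((y - y_bar) ** 2 for y in ys)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ys and xs, ys))
    r2 = 1.0 - sse / sst  # equals SSR / SST because SST = SSR + SSE
    return {
        "Multiple R": math.sqrt(r2),
        "R Square": r2,
        # One predictor, so the adjustment uses n - 1 and n - 2 degrees of freedom.
        "Adjusted R Square": 1.0 - (1.0 - r2) * (n - 1) / (n - 2),
        "Standard Error": math.sqrt(sse / (n - 2)),  # S_YX
        "Observations": n,
    }


square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]  # in $1000

for name, value in regression_statistics(square_feet, annual_sales).items():
    print(f"{name:<18} {value}")
```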

Linear Regression Assumptions

• Normality

– Y values are normally distributed for each X

– Probability distribution of error is normal

• Homoscedasticity (Constant Variance)

• Independence of Errors


Consequences of Violation of the Assumptions

• Violation of the Assumptions

– Non-normality (error not normally distributed)

– Heteroscedasticity (variance not constant)

• Usually happens in cross-sectional data

– Autocorrelation (errors are not independent)

• Usually happens in time-series data

• Consequences of Any Violation of the Assumptions

– Predictions and estimations obtained from the sample regression line will not be accurate

– Hypothesis testing results will not be reliable

• It is Important to Verify the Assumptions

Variation of Errors Around the Regression Line

• Y values are normally distributed around the regression line.

• For each X value, the “spread” or variance around the regression line is the same.

[Figure: normal error densities f(e) of equal spread centered on the sample regression line at $X_1$ and $X_2$]

Purpose of Correlation Analysis (continued)

• The Sample Correlation Coefficient r Is an Estimate of ρ and Is Used to Measure the Strength of the Linear Relationship in the Sample Observations

$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$
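The correlation formula translates directly into code. Applied to the produce-store data, r should agree with the "Multiple R" value 0.9705572 in the Excel output and $r^2$ with "R Square"; in simple regression r carries the sign of the slope $b_1$, which is positive here. Python 3.10 and later also provide `statistics.correlation`, but the sketch spells the formula out; the function name is illustrative.

```python
import math


def correlation(xs, ys):
    """Sample (Pearson) correlation coefficient r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]  # in $1000

r = correlation(square_feet, annual_sales)
print(r, r ** 2)
```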

Features of ρ and r

• Unit Free

• Range between -1 and 1

• The Closer to -1, the Stronger the Negative Linear Relationship

• The Closer to 1, the Stronger the Positive Linear Relationship

• The Closer to 0, the Weaker the Linear Relationship
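Two of these features can be checked numerically. In the sketch below (hypothetical data, illustrative function name), rescaling X to different units leaves r unchanged, reversing the direction of Y flips its sign, and r stays within [-1, 1].

```python
import math


def correlation(xs, ys):
    """Sample (Pearson) correlation coefficient r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
ys = [1.8, 4.1, 5.9, 8.2, 9.7]

r = correlation(xs, ys)
r_rescaled = correlation([1000 * x for x in xs], ys)  # change the units of X
r_flipped = correlation(xs, [-y for y in ys])          # reverse the direction of Y

print(r, r_rescaled, r_flipped)
```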

Pitfalls of Regression Analysis

• Lacking an Awareness of the Assumptions Underlying Least-Squares Regression

• Not Knowing How to Evaluate the Assumptions

• Not Knowing What the Alternatives to Least-Squares Regression Are if a Particular Assumption Is Violated

• Using a Regression Model Without Knowledge of the Subject Matter

Strategy for Avoiding the Pitfalls of Regression

• Start with a scatter plot of Y against X to observe a possible relationship

• Perform residual analysis to check the assumptions

• Use a histogram, stem-and-leaf display, box-and-whisker plot, or normal probability plot of the residuals to uncover possible non-normality
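The residual check in this strategy can be started in a few lines. The sketch computes the least-squares residuals for the produce-store data and pairs each ordered residual with a standard normal quantile; plotted against each other, points lying roughly on a straight line are consistent with normally distributed errors. The function names and the $(i - 0.5)/n$ plotting position are assumptions (other conventions exist), and with only seven stores such a plot can reveal only gross departures from normality.

```python
from statistics import NormalDist


def residuals(xs, ys):
    """Least-squares residuals e_i = Y_i - Y-hat_i."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def normal_probability_points(res):
    """Pairs (theoretical normal quantile, ordered residual) for a normal probability plot.

    Uses the plotting position (i - 0.5) / n, one common convention among several.
    """
    n = len(res)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i - 0.5) / n), e)
            for i, e in enumerate(sorted(res), start=1)]


square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]  # in $1000

res = residuals(square_feet, annual_sales)
for z, e in normal_probability_points(res):
    print(f"{z:7.3f}  {e:9.2f}")
```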

Strategy for Avoiding the Pitfalls of Regression (continued)

• If any assumption is violated, use alternatives to least-squares regression (e.g., least absolute deviation regression or least median of squares regression) or alternative least-squares models (e.g., curvilinear or multiple regression)

• If there is no evidence of assumption violation, test for the significance of the regression coefficients and construct confidence intervals and prediction intervals

Chapter Summary

• Introduced Types of Regression Models

• Discussed Determining the Simple Linear Regression Equation

• Described Measures of Variation

• Addressed Assumptions of Regression and Correlation

• Discussed Residual Analysis

• Addressed Measuring Autocorrelation

Chapter Summary (continued)

• Described Inference about the Slope

• Discussed Correlation: Measuring the Strength of the Association

• Addressed Estimation of Mean Values and Prediction of Individual Values

• Discussed Pitfalls in Regression and Ethical Issues
