

What is regression?

(Source: Roger Hadgraft, roger.hadgraft@eng.monash.edu.au)

 Fitting models to data sets

 Linear regression is most common

“Line of best fit”

Example

Look at the data

[Figure: scatter plot of fuel efficiency (vertical axis, 0 to 25) against mass in kg (horizontal axis, 800 to 1600)]

Chart | Add trendline

[Figure: fuel efficiency against mass (kg), with a fitted linear trendline]

Fuel efficiency: y = 0.0126x - 0.8763, $R^2 = 0.3427$
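As a quick illustration of how to read the trendline equation, the fitted value at a mass of 1200 kg works out as follows. The 1200 kg figure is chosen here only because it sits mid-range on the chart axis; it is not a data point taken from the slides.

```latex
\[
y = 0.0126 \times 1200 - 0.8763 = 15.12 - 0.8763 \approx 14.24
\]
```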

Some maths

 Assume we have paired data [x(i), y(i)]

 Simplest model is:

 Y(i) = b1.x(i) + b0

 Where Y is the model and y is the original data

 This notation matches Excel's

Residual or Error: e(i) = Y(i) - y(i)

Minimise $\sum_i e(i)^2$: the Least Squares approach
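The idea of minimising the sum of squared residuals can be sketched in a few lines of Python. The function name `sse` and the four data points are hypothetical, made up for illustration (they are not the fuel-efficiency data); the sketch only shows that the least-squares line for this small data set gives a smaller sum than two nearby candidate lines.

```python
def sse(b1, b0, xs, ys):
    """Sum of squared residuals e(i) = Y(i) - y(i) for the line Y = b1*x + b0."""
    return sum(((b1 * x + b0) - y) ** 2 for x, y in zip(xs, ys))


# Hypothetical illustrative data, not taken from the slides.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]

# For this data the least-squares line is b1 = 1.9, b0 = 0.0.
best = sse(1.9, 0.0, xs, ys)
steeper = sse(2.0, 0.0, xs, ys)   # slightly different slope
shifted = sse(1.9, 0.5, xs, ys)   # slightly different intercept

print(f"least-squares line: SSE = {best:.2f}")
print(f"steeper line:       SSE = {steeper:.2f}")
print(f"shifted line:       SSE = {shifted:.2f}")
```

Any other choice of slope or intercept for this data would also give a larger sum than the least-squares line.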

[Figure: the trendline chart again (y = 0.0126x - 0.8763, $R^2 = 0.3427$), with one residual e(i) marked between a data point and the fitted line]

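Regression coefficients

$$e_i^2 = (y_i - b_1 x_i - b_0)^2$$

SSE = Sum of the Squares of the Errors:

$$SSE = \sum_{i=1}^{n} e_i^2$$

Choose b1 and b0 to minimise SSE:

$$\frac{\partial (SSE)}{\partial b_0} = 0 = -2 \sum_{i=1}^{n} (y_i - b_1 x_i - b_0)$$

$$\frac{\partial (SSE)}{\partial b_1} = 0 = -2 \sum_{i=1}^{n} (y_i - b_1 x_i - b_0)\, x_i$$

Rearranging:

$$n\, b_0 + b_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$

$$b_0 \sum_{i=1}^{n} x_i + b_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$$

Thus:

$$b_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}$$

$$b_0 = \bar{y} - b_1 \bar{x}$$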
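The closed-form expressions for b1 and b0 translate directly into code. The following is a minimal sketch, assuming plain Python lists of paired values; the function name `fit_line` and the sample mass and fuel values are hypothetical and are not the data behind the chart in these slides.

```python
def fit_line(xs, ys):
    """Return (b1, b0) for the least-squares line Y = b1*x + b0."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b1 = (n * sum_xy - sum_x * sum_y) / denom
    b0 = sum_y / n - b1 * sum_x / n  # b0 = mean(y) - b1 * mean(x)
    return b1, b0


# Hypothetical illustrative data (mass in kg, fuel figure), not from the slides.
mass = [900.0, 1000.0, 1200.0, 1400.0, 1500.0]
fuel = [9.0, 12.0, 13.0, 18.0, 17.0]

b1, b0 = fit_line(mass, fuel)
print(f"slope b1 = {b1:.4f}, intercept b0 = {b0:.4f}")
```

A useful check on any implementation is that the residuals of the fitted line sum to zero, which is exactly what the first normal equation requires.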

Data Analysis | Regression

 We can do the calculations by hand, or we can use Excel's Data Analysis Toolpak

 Tools | Add-ins | Data Analysis

 Once only to activate it

 Tools | Data Analysis | Regression

 Demonstration

Example

[Figure: fuel efficiency against mass (kg), with the trendline y = 0.0126x - 0.8763 and $R^2 = 0.3427$]

Tools | Data Analysis | Regression

This means that 34% of the variance in fuel consumption is explained by vehicle mass. The remaining 66% is due to other factors (e.g. driver behaviour).

Is the model any good?

 $R^2$ = proportion of variance of y data explained by regression equation = SSR/SST

 SSR = explained variation; SSE = unexplained variation

Total Sum of Squares:

$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

Error Sum of Squares:

$$SSE = \sum_{i=1}^{n} e_i^2$$

Regression Sum of Squares:

$$SSR = SST - SSE$$
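The three sums of squares and the resulting R squared can be computed as in the sketch below. The function name `sums_of_squares` and the data are hypothetical; the coefficients b1 = 1.9 and b0 = 0.0 are the least-squares values for this made-up data set, which is what makes the shortcut SSR = SST - SSE valid.

```python
def sums_of_squares(ys, fitted):
    """Return (SST, SSE, SSR) for observed values ys and model values fitted."""
    mean_y = sum(ys) / len(ys)
    sst = sum((y - mean_y) ** 2 for y in ys)
    sse = sum((f - y) ** 2 for y, f in zip(ys, fitted))
    ssr = sst - sse  # holds for a least-squares line that includes an intercept
    return sst, sse, ssr


# Hypothetical illustrative data, not taken from the slides.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]
b1, b0 = 1.9, 0.0  # least-squares coefficients for this data
fitted = [b1 * x + b0 for x in xs]

sst, sse, ssr = sums_of_squares(ys, fitted)
r2 = ssr / sst
print(f"SST = {sst:.2f}, SSE = {sse:.2f}, SSR = {ssr:.2f}, R^2 = {r2:.4f}")
```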

Multiple R: $R = \sqrt{R^2}$

Adjusted R Square: compensates for the different number of model parameters (in multiple linear regression).

Text page 587
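As a rough worked check, one commonly used form of the adjustment is shown below, with k counting all fitted parameters including the intercept. The values n = 20 and k = 2 are assumptions inferred from the degrees of freedom quoted elsewhere in these slides (df1 = 1 and df2 = 18); they are not stated on this slide, and the formula given on text page 587 may be written differently.

```latex
\[
R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - k}
                 = 1 - (1 - 0.3427)\,\frac{19}{18}
                 \approx 0.306
\]
```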

Standard Error: standard deviation of the residuals (but divide by (n-2) rather than (n-1))
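The difference between dividing by (n-2) and by (n-1) can be seen in a short sketch. The function name `standard_error` and the data are hypothetical; the fitted values come from the least-squares line b1 = 1.9, b0 = 0.0 for this made-up data set.

```python
import math
import statistics


def standard_error(ys, fitted, n_params=2):
    """Square root of SSE / (n - n_params); n_params = 2 for a straight line."""
    n = len(ys)
    sse = sum((f - y) ** 2 for y, f in zip(ys, fitted))
    return math.sqrt(sse / (n - n_params))


# Hypothetical illustrative data, not taken from the slides.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]
fitted = [1.9 * x + 0.0 for x in xs]
residuals = [f - y for y, f in zip(ys, fitted)]

s = standard_error(ys, fitted)              # divides by n - 2
plain_sd = statistics.stdev(residuals)      # divides by n - 1
print(f"standard error = {s:.4f}, ordinary sample sd = {plain_sd:.4f}")
```

The standard error is the larger of the two because two degrees of freedom have already been used in estimating b1 and b0.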

Questions?

ANOVA = Analysis of Variance

Sums of squares (SS column): SSR, SSE and SST

Regression df = k-1

Total df = n-1

Residual df = (n-1)-(k-1) = n-k

where k = number of parameters and n = number of data points

Regression MS = SSR/df1

Residual MS = SSE/df2

where df1 is the regression df and df2 is the residual df

F = Regression MS / Residual MS

Significance F: the probability of obtaining an F statistic at least this large, given df1 = 1 and df2 = 18, if there were no relationship between x and y.

A small value is evidence against there being no relationship.
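The ANOVA quantities above can be assembled as in the following sketch. The function names `anova` and `f_from_r2` and the four data points are hypothetical. The second function relies on the identity F = (R²/df1) / ((1 - R²)/df2), which follows from SSR = R²·SST and SSE = (1 - R²)·SST, and is applied to the R² and degrees of freedom quoted in these slides. The sketch stops at F; turning F into a probability needs an F-distribution function from a statistics library, which is left out here.

```python
def anova(xs, ys, b1, b0, k=2):
    """ANOVA quantities for the line Y = b1*x + b0 fitted by least squares."""
    n = len(ys)
    mean_y = sum(ys) / n
    sst = sum((y - mean_y) ** 2 for y in ys)
    sse = sum(((b1 * x + b0) - y) ** 2 for x, y in zip(xs, ys))
    ssr = sst - sse
    df1, df2 = k - 1, n - k          # regression df, residual df
    ms_reg, ms_res = ssr / df1, sse / df2
    return {"SSR": ssr, "SSE": sse, "SST": sst, "df1": df1, "df2": df2,
            "MS_reg": ms_reg, "MS_res": ms_res, "F": ms_reg / ms_res}


def f_from_r2(r2, df1, df2):
    """F statistic recovered from R^2 and the two degrees of freedom."""
    return (r2 / df1) / ((1.0 - r2) / df2)


# Hypothetical illustrative data; b1 = 1.9, b0 = 0.0 is its least-squares line.
table = anova([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 8.0], 1.9, 0.0)
print({name: round(value, 4) for name, value in table.items()})

# Values quoted in the slides: R^2 = 0.3427, df1 = 1, df2 = 18.
f_slides = f_from_r2(0.3427, 1, 18)
print(f"F implied by the slide values = {f_slides:.2f}")
```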


Other regressions

 Multilinear regression

 Non-linear equations

 Transform the variables, e.g. logs, powers, etc.

 Use multilinear regression to determine the coefficients
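The transformation idea can be sketched for one common case: a power law y = a·x^b becomes the straight line ln y = ln a + b·ln x after taking logs of both variables. The function names and the data are hypothetical; the data were generated from y = 3x² purely so that the recovered coefficients are easy to check.

```python
import math


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through (xs, ys)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    return slope, sum_y / n - slope * sum_x / n


def fit_power_law(xs, ys):
    """Fit y = a * x**b by linear regression on (ln x, ln y); needs x, y > 0."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    b, ln_a = fit_line(log_x, log_y)
    return math.exp(ln_a), b


# Hypothetical data generated from y = 3 * x**2 for illustration.
xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0, 12.0, 48.0, 192.0]

a, b = fit_power_law(xs, ys)
print(f"a = {a:.4f}, b = {b:.4f}")
```

Note that least squares is then minimising errors in ln y rather than in y, so the fit weights the points differently from a direct non-linear fit.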
