Simple Regression I

Simple Regression


Correlation tells us how strongly Y and X are
related … but regression estimates the form
of this relationship
We’ll begin with simple regression, which
assumes the form:
$\hat{Y}_i = b_0 + b_1 X_i$

Y is the variable we want to predict

We believe X influences how Y behaves

Ŷi is the estimated value of Y at Xi

b0 is the Y-intercept in the equation

b1 is the slope of the regression line

Our goal: Find the straight line that best fits
the data we’ve collected

The best equation will be the one that
minimizes the error in fit

The equation is: $\hat{Y}_i = b_0 + b_1 X_i$

The fit error is thus: $e_i = Y_i - \hat{Y}_i$
[Figure: scatterplot of the data, with the + errors and the - errors marked]

The fit error for the ith point on the
scatterplot diagram is:
$e_i = Y_i - \hat{Y}_i$

We would like the sum of the + errors to be
the same as the sum of the – errors.

However, there are many lines that can make
this happen.


So, which of these solutions is the best one?
Select the line with the minimum sum of
squared error terms. This is called least-squares regression.
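As a rough illustration of why a zero sum of errors is not enough, the sketch below uses a small made-up data set (hypothetical values, not the mail-order data). Every line through the point of means has + and - errors that cancel, yet the sums of squared errors differ, and the least-squares slope (0.6 for this data) gives the smallest one. The helper names are assumptions for the example.

```python
from statistics import mean

# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

def line_through_means(b1):
    """Intercept that forces a line with slope b1 through (mean X, mean Y)."""
    return mean(ys) - b1 * mean(xs)

def errors(b1):
    """Fit errors e_i = Y_i - (b0 + b1 * X_i) for the line with slope b1."""
    b0 = line_through_means(b1)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

def sse(b1):
    """Sum of squared errors for the line with slope b1."""
    return sum(e * e for e in errors(b1))

# Three candidate slopes; 0.6 is the least-squares slope for this data.
for b1 in (0.0, 0.6, 1.2):
    print(f"slope {b1}: sum of errors = {abs(sum(errors(b1))):.4f}, "
          f"sum of squared errors = {sse(b1):.4f}")
```

All three candidate lines balance the + and - errors, so only the squared-error criterion separates them.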

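Intercept: $b_0 = \bar{Y} - b_1 \bar{X}$

Slope: $b_1 = \dfrac{SS_{xy}}{SS_x} = \dfrac{\mathrm{COVAR}(x, y)^{*}}{\mathrm{Var}(x)} \cdot \dfrac{n}{n-1}$

* Note: COVAR here is Excel's function, which calculates the
population covariance, not the sample covariance. The factor n/(n-1)
compensates for this, because Var(x) is the sample variance.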
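The two routes to the slope can be checked against each other with a short sketch. The data below is hypothetical and the variable names are assumptions. The population covariance is computed by hand (dividing by n, as the note says Excel's COVAR does), while `statistics.variance` gives the sample variance (dividing by n - 1).

```python
from statistics import mean, variance

# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)
x_bar, y_bar = mean(xs), mean(ys)

# Sums of squares and cross-products about the means.
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
ss_x = sum((x - x_bar) ** 2 for x in xs)

# Route 1: slope directly from the sums of squares, then the intercept.
b1 = ss_xy / ss_x
b0 = y_bar - b1 * x_bar

# Route 2: population covariance over sample variance, rescaled by n / (n - 1).
pop_covar = ss_xy / n        # divides by n
sample_var = variance(xs)    # divides by n - 1
b1_alt = (pop_covar / sample_var) * n / (n - 1)

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, b1 via covariance = {b1_alt:.4f}")
```

Both routes give the same slope. Without the n/(n - 1) factor, the second route would understate it.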




Some values can be calculated directly using
the means, variances, and covariances.
For one-variable (simple) regression, you can add
a trendline to a chart.
You can use the Data Analysis tool, Regression.
You can use the Excel function LINEST.
[Figure: scatterplot of Y against X with the line added by Excel's Trend Line function; fitted equation y = 0.0297x + 0.1912]
The LINEST function must be entered as an array
formula. For the example, highlight the cells E3:F7,
type the formula =LINEST(Orders,Weight,1,1), then
press Ctrl+Shift+Enter.



Remember the variables are X = weight of mail in
pounds and Y = orders in 1000s.
The estimated intercept (b0) tells us that if
there were no mail, we would still have a baseline
of (.1912)(1000) or 191.2 orders per day.
The estimated slope (b1) tells us that each additional
pound of mail tends to bring with it
(.0297)(1000) or 29.7 orders.
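The interpretation above can be turned into a small prediction helper. This is only a sketch: the coefficients come from the trendline equation y = 0.0297x + 0.1912, while the function name and the sample weights are assumptions for the example.

```python
# Coefficients from the trendline y = 0.0297x + 0.1912.
B0 = 0.1912   # intercept, in thousands of orders
B1 = 0.0297   # slope, in thousands of orders per pound of mail

def predicted_orders(weight_lb):
    """Predicted number of orders (not thousands) for a given mail weight."""
    return (B0 + B1 * weight_lb) * 1000

# Hypothetical weights, chosen only to show the calculation.
for w in (0, 200, 400):
    print(f"{w} lb -> about {predicted_orders(w):,.0f} orders")
```

At a weight of 0 the prediction is the intercept alone, and each extra pound adds 29.7 orders.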
There are two standard ways to judge how well the regression fits:
1. How much of the variation in the Y values
(orders) can be attributed to the different
values of X (weight of mail)?
2. In general, how small (or large) are the
errors in fit?

The Coefficient of Determination:

$R^2 = \dfrac{\text{The variation in } Y \text{ explained by the } X\text{-}Y \text{ relationship}}{\text{The variation in } Y}$

The R² value is:
◦ Always between 0 and 1
◦ The proportion of the variation explained by the
model
◦ The square of the correlation (for simple regression)

ANOVA table: Total variation in the Y values is
SST = 449.76

The amount of unexplained variation is
SSE = 12.12


The difference is thus the variation explained
by the regression equation or
SSR = 449.76 – 12.12 = 437.64
The ratio of explained to total is how we get
R² = 437.64/449.76 = .973
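The ANOVA arithmetic can be reproduced in a few lines. The sketch uses only the SST and SSE values quoted above; the variable names are assumptions.

```python
SST = 449.76   # total variation in the Y values
SSE = 12.12    # unexplained variation

SSR = SST - SSE          # variation explained by the regression equation
r_squared = SSR / SST    # equivalently 1 - SSE / SST

print(f"SSR = {SSR:.2f}, R^2 = {r_squared:.3f}")
```

Writing R² as 1 - SSE/SST gives the same number and makes it clear that a smaller unexplained variation means a larger R².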

For every observation i, its error is given by:
$e_i = Y_i - \hat{Y}_i$

To find the “typical error,” use this formula:

$S = \sqrt{\dfrac{\sum_{i=1}^{n} e_i^2}{n-2}}$

This is the “Standard Error”, also the $\sqrt{MSE}$.
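A minimal sketch of the standard error calculation follows, using hypothetical data and its least-squares line (intercept 2.2, slope 0.6). The function name is an assumption. The divisor is n - 2 because two coefficients, b0 and b1, are estimated from the data.

```python
import math

def standard_error(actual, predicted):
    """S = sqrt(sum of squared errors / (n - 2)) for a simple regression."""
    n = len(actual)
    if n <= 2:
        raise ValueError("need more than two observations")
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    return math.sqrt(sse / (n - 2))

# Hypothetical data and its least-squares line Y-hat = 2.2 + 0.6 X.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
fitted = [2.2 + 0.6 * x for x in xs]

print(f"S = {standard_error(ys, fitted):.4f}")
```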



The typical error (called the standard error of
prediction) for our regression model is: S = .7258
This means that we typically misestimate the actual
number of orders per day by (.7258)(1000) = 725.8.
That may sound like a lot, but consider
that we have between 5 and 20 thousand orders
each day, with an average of (13.22)(1000) = 13,220, so
the percentage error is only 725.8 / 13,220 = 5.5%.
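The percentage-error comparison is simple arithmetic. It is sketched here with the two figures quoted above, S = .7258 and an average of 13.22 thousand orders; the variable names are assumptions.

```python
S = 0.7258        # standard error, in thousands of orders
MEAN_Y = 13.22    # average daily orders, in thousands

typical_error = S * 1000          # in orders
average_orders = MEAN_Y * 1000    # in orders
relative_error = typical_error / average_orders

print(f"{typical_error:.1f} / {average_orders:,.0f} = {relative_error:.1%}")
```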