
Algebra Review


The equation of a straight line
y = mx + b
– m is the slope – the change in y
over the change in x – or rise
over run.
– b is the y-intercept – the value
where the line cuts the y axis.
[Figure: graph of the line y = 3x + 2, with x from -5 to 5 and y from -15 to 15.]
Review

y = 3x + 2
– At x = 0, y = 2 (the y-intercept).
– At x = 3, y = 11.
– The change in y (+9) divided by the
change in x (+3) gives the
slope, 3.
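The rise-over-run calculation above can be sketched in a few lines of Python. The function name `slope_and_intercept` is an illustrative choice, not something from the slides; the two points are the ones used in the review, (0, 2) and (3, 11).

```python
def slope_and_intercept(p1, p2):
    """Return (m, b) for the line y = m*x + b through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: slope is undefined")
    m = (y2 - y1) / (x2 - x1)  # change in y over change in x (rise over run)
    b = y1 - m * x1            # value of y where the line crosses x = 0
    return m, b


# The two points from the review of y = 3x + 2.
m, b = slope_and_intercept((0, 2), (3, 11))
print(m, b)  # 3.0 2.0
```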
Linear Regression

Example: Tar (mg) and CO
(mg) in cigarettes.
– y, Response: CO (mg).
– x, Explanatory: Tar (mg).
– Cases: 25 brands of cigarettes.
Correlation Coefficient

Tar and CO
$$r = \frac{\sum z_x z_y}{n - 1} = \frac{22.9796}{24}$$
r = 0.9575
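As a rough illustration of the z-score formula for r, here is a minimal Python sketch. The 25 cigarette measurements are not reproduced in these slides, so the lists `xs` and `ys` are hypothetical toy data; only the final line uses numbers from the slide (the sum 22.9796 and n - 1 = 24).

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """r = sum(z_x * z_y) / (n - 1), using sample standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # stdev divides by n - 1
    z_products = [
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ]
    return sum(z_products) / (n - 1)


# Hypothetical toy data, NOT the cigarette measurements.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(round(r, 4))  # 0.7746

# The slide's own arithmetic: sum of z-score products over n - 1.
print(round(22.9796 / 24, 4))  # 0.9575
```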
Linear Regression


There is a strong positive
linear association between tar
and CO.
What is the equation of the
line that models the
relationship between tar and
CO?
Linear Model


The linear model is the equation
of a straight line through the
data.
A point on the straight line
through the data gives a
predicted value of y, denoted ŷ.
Residual


The difference between the
observed value of y and the
predicted value of y, ŷ, is
called the residual.
$$\text{Residual} = y - \hat{y}$$
Line of “Best Fit”


There are lots of straight lines
that go through the data.
The line of “best fit” is the
line for which the sum of
squared residuals is the
smallest – the least squares
line.
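To make the "smallest sum of squared residuals" idea concrete, the sketch below compares a few candidate lines on hypothetical toy data (not the cigarette data). The least squares line for these five points works out to ŷ = 2.2 + 0.6x; the other candidates are arbitrary alternatives chosen for illustration.

```python
def sum_squared_residuals(xs, ys, b0, b1):
    """Sum of (observed y - predicted y)^2 for the line y_hat = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data, NOT the cigarette measurements.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

least_squares = (2.2, 0.6)  # (b0, b1) of the least squares line for this data
candidates = [(2.0, 0.7), (2.5, 0.5), (1.0, 1.0), least_squares]

for b0, b1 in candidates:
    ssr = sum_squared_residuals(xs, ys, b0, b1)
    print(f"b0={b0}, b1={b1}: SSR={ssr:.2f}")

# The least squares line has the smallest SSR among the candidates.
best = min(candidates, key=lambda line: sum_squared_residuals(xs, ys, *line))
print(best)  # (2.2, 0.6)
```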
Line of “Best Fit”
$$\hat{y} = b_0 + b_1 x$$
Least squares slope:
$$b_1 = r \frac{s_y}{s_x}$$
Least squares intercept:
$$b_0 = \bar{y} - b_1 \bar{x}$$
Summary of the Data
Tar, x: $\bar{x} = 12.216$ mg, $s_x = 5.6658$ mg
CO, y: $\bar{y} = 12.528$ mg, $s_y = 4.7397$ mg
Correlation: $r = 0.9575$
Least Squares Estimates
$$b_1 = 0.9575 \times \frac{4.7397}{5.6658} = 0.801$$
$$b_0 = 12.528 - 0.801(12.216) = 2.743$$
$$\hat{y} = 2.743 + 0.801x$$
Predicted CO = 2.743 + 0.801*Tar
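The estimates above can be reproduced from the summary statistics alone. This is a minimal sketch; the function name `least_squares_from_summary` is an illustrative choice, and the inputs are the means, standard deviations, and correlation reported for Tar and CO.

```python
def least_squares_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Return (b0, b1) using b1 = r * s_y / s_x and b0 = y_bar - b1 * x_bar."""
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Summary statistics from the slides: Tar is x, CO is y.
b0, b1 = least_squares_from_summary(
    x_bar=12.216, s_x=5.6658, y_bar=12.528, s_y=4.7397, r=0.9575
)
print(round(b1, 3), round(b0, 3))  # 0.801 2.743
```

The intercept here is computed from the unrounded slope; to three decimals it agrees with the slide, which uses the rounded slope 0.801.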
Interpretations


Slope – for every 1 mg increase in
tar, the CO content increases, on
average, by 0.801 mg.
Intercept – there is not a reasonable
interpretation of the intercept in
this context because one wouldn’t
see a cigarette with 0 mg of tar.
Predicted CO = 2.743 + 0.801*Tar
Prediction

Least squares line:
Predicted CO = 2.743 + 0.801*Tar
For Tar = 16.0 mg:
Predicted CO = 2.743 + 0.801(16)
Predicted CO = 15.56 mg
Residual




Tar, x = 16.0 mg
CO, y = 16.6 mg
Predicted, ŷ = 15.56 mg
Residual, y − ŷ = 16.6 − 15.56
= 1.04 mg
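The prediction and the residual for the brand with Tar = 16.0 mg and CO = 16.6 mg can be checked with a short Python sketch. The function names `predict_co` and `residual` are illustrative; the coefficients are the rounded least squares estimates from the slides.

```python
def predict_co(tar_mg, b0=2.743, b1=0.801):
    """Predicted CO (mg) from the least squares line in the slides."""
    return b0 + b1 * tar_mg


def residual(observed, predicted):
    """Residual = observed y minus predicted y."""
    return observed - predicted


y_hat = predict_co(16.0)
print(round(y_hat, 2))                  # 15.56
print(round(residual(16.6, y_hat), 2))  # 1.04
```

A positive residual means the observed CO is above the line, so the model underpredicts for this brand.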
Residuals


Residuals help us see if the
linear model makes sense.
Plot residuals versus the
explanatory variable.
– If the plot is a random scatter of
points, then the linear model is the
best we can do.
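As an illustration of what a non-random residual pattern looks like, the sketch below fits a least squares line to hypothetical curved data (y = x², not the cigarette data) and lists the residuals in order of x. The slope is computed as Sxy/Sxx, which is algebraically the same as r·s_y/s_x.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least squares (b0, b1); b1 = Sxy / Sxx is equivalent to r * s_y / s_x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical curved data: a straight line is the wrong shape for it.
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]

b0, b1 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(residuals)  # [2.0, -1.0, -2.0, -1.0, 2.0]
```

The residuals go positive, negative, positive as x increases. That U shape is a pattern rather than a random scatter, which suggests a straight line is not an adequate model for this toy data.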
Interpretation of the Plot

The residuals appear to have a
pattern. For values of Tar
between 0 and 20 the residuals
tend to increase. The brand
with Tar = 30 appears to have
a large residual.
$r^2$ or $R^2$
The square of the correlation
coefficient gives the proportion
of the variation in y that is
accounted for, or explained, by
the linear relationship with x.
Tar and CO



r = 0.9575
$r^2 = (0.9575)^2 = 0.917$, or 91.7%
91.7% of the variation in CO
content can be explained by the
linear relationship with Tar
content.
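The "variation explained" reading of r² can be checked numerically: for a least squares line, r² equals 1 minus the sum of squared residuals divided by the total variation in y. The sketch below verifies this on hypothetical toy data (not the cigarette data), then squares the correlation reported in the slides.

```python
from statistics import mean

# Hypothetical toy data, NOT the cigarette measurements.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)  # total variation in y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

r = sxy / (sxx * syy) ** 0.5
b1 = sxy / sxx                 # least squares slope
b0 = y_bar - b1 * x_bar        # least squares intercept

# Variation left over after fitting the line.
ssr = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
explained = 1 - ssr / syy

print(round(r ** 2, 3), round(explained, 3))  # 0.6 0.6

# The slide's value for Tar and CO.
print(round(0.9575 ** 2, 3))  # 0.917
```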
Regression Conditions



Quantitative variables – both
variables should be quantitative.
Linear model – does the scatter
plot show a reasonably straight
line?
Outliers – watch out for outliers
as they can be very influential.
Regression Cautions




Beware of extraordinary points.
Don’t extrapolate beyond the
data.
Don’t infer x causes y just
because there is a good linear
model relating the two variables.
Don’t choose a model based on
$R^2$ alone.