Simple Linear Regression
Deterministic Relationship
• A relationship is deterministic if the value of y (the dependent variable) is completely determined by the value of x (the independent variable), as in an equation of the form y = 2x + 10 or f(x) = 5x - 1.
• However, in most situations, the variables of interest are not deterministically related! For example, the value of y = 1st-year college GPA is certainly not determined solely by x = high school GPA.
Probabilistic Model
• A description of the relation between two variables x and y that are not deterministically related.
• The general form allows y to be larger or smaller than f(x) by a random amount, e.

y = (deterministic function of x) + (random deviation)
$y = f(x) + e$
Let x* denote the value of x….
$$
\begin{cases}
y > f(x^*) & \text{if } e > 0 \\
y < f(x^*) & \text{if } e < 0 \\
y = f(x^*) & \text{if } e = 0
\end{cases}
$$
• Without the random deviation e, all observed (x, y) points would fall exactly on the population regression line. The inclusion of e in the model equation recognizes that points will deviate from the line.
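The three cases can be checked numerically. The sketch below borrows y = 2x + 10 from the deterministic example as f(x); the x* value and the observed y values are made up purely for illustration.

```python
def f(x):
    """Deterministic part of the model (y = 2x + 10, borrowed from the earlier example)."""
    return 2 * x + 10


def random_deviation(x_star, y_observed):
    """Return e = y - f(x*), the deviation of an observed y from the line."""
    return y_observed - f(x_star)


def position(e):
    """Describe where the observed point sits relative to the line."""
    if e > 0:
        return "above the line"
    if e < 0:
        return "below the line"
    return "on the line"


x_star = 3  # hypothetical x value
for y_obs in (18.5, 14.0, 16.0):  # hypothetical observed y values
    e = random_deviation(x_star, y_obs)
    print(f"y = {y_obs}: e = {e:+.1f}, point is {position(e)}")
```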
Simple Linear Regression Model: $y = \alpha + \beta x + e$
Assumptions about the distribution of e
• Mean: $\mu_e = 0$
• St. Dev.: $\sigma_e$ is the same for any value of x.
• Distribution of e at any x value is normal.
• Random deviations $e_1, e_2, e_3, \ldots, e_n$ associated with different observations are independent of one another.
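To see what these assumptions look like in data, here is a minimal simulation sketch. The values of ALPHA, BETA, and SIGMA are hypothetical, chosen only for illustration; they do not come from these notes.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

ALPHA, BETA, SIGMA = 10.0, 2.0, 3.0  # hypothetical population values


def simulate_y(x):
    """Draw one y from y = alpha + beta*x + e, with e ~ Normal(0, SIGMA)."""
    e = random.gauss(0.0, SIGMA)  # independent draw; same SIGMA for every x
    return ALPHA + BETA * x + e


for x in (1, 5, 10):
    ys = [simulate_y(x) for _ in range(5000)]
    deviations = [y - (ALPHA + BETA * x) for y in ys]
    print(f"x = {x:2d}: mean of e = {statistics.mean(deviations):+.3f}, "
          f"st. dev. of e = {statistics.stdev(deviations):.3f}")
```

At each x the simulated deviations should average close to 0 and have a standard deviation close to SIGMA, which is what the first two assumptions say.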
Slope
$y = \alpha + \beta x + e$ (population regression line: $y = \alpha + \beta x$)
• The slope $\beta$ is the average change in y associated with a 1-unit increase in x.
• The point estimate of the slope is b. (The population value is $\beta$.)
• The point estimate of the y-intercept is a. (The population value is $\alpha$.)
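A short worked step, offered as a sketch, shows why the slope is read as the change per 1-unit increase in x: subtract the height of the population regression line at x from its height at x + 1.

```latex
\[
\bigl(\alpha + \beta (x + 1)\bigr) - \bigl(\alpha + \beta x\bigr) = \beta
\]
```

The same subtraction with a and b in place of $\alpha$ and $\beta$ gives b for the estimated line.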
Summary
• $b = \text{point estimate of } \beta = \dfrac{S_{xy}}{S_{xx}}$
• $a = \text{point estimate of } \alpha = \bar{y} - b\bar{x}$
where
$S_{xy} = \sum xy - \dfrac{(\sum x)(\sum y)}{n}$ and $S_{xx} = \sum x^2 - \dfrac{(\sum x)^2}{n}$
• The estimated regression line is then just the least-squares line $\hat{y} = a + bx$.
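The summary formulas translate directly into a few lines of code. This is a minimal sketch: the function name least_squares and the check data (points lying exactly on y = 2x + 10) are assumptions made for illustration.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    b = s_xy / s_xx                # point estimate of beta
    a = sum_y / n - b * sum_x / n  # point estimate of alpha: y-bar - b * x-bar
    return a, b


# Hypothetical check data lying exactly on y = 2x + 10
xs = [1, 2, 3, 4, 5]
ys = [12, 14, 16, 18, 20]
a, b = least_squares(xs, ys)
print(f"y-hat = {a:.2f} + {b:.2f}x")
```

Because the check points have no random deviation, the fitted line should recover the intercept 10 and slope 2.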
x* denotes a specified value of the predictor variable x ….
• So $a + bx^*$ has 2 different interpretations:
  ◦ It is a point estimate of the true mean y value when x = x*.
  ◦ It is a point predictor of an individual y value that would be observed when x = x*.
Find the point estimate of the mean y-value for the following:
x = mother's age (years)
y = birth weight (g)

Age (x)     15    17    18    15    16    19    17    16    18    20
Weight (y)  2289  3393  3271  2648  2897  3327  2970  2535  3138  3573
So what's the point estimate for an 18-year-old mom?
Point estimate and point prediction are identical; only the interpretation is different.
• Prediction: weight of a single baby whose mom is 18
• Estimate: average weight of all babies born to 18-year-olds
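One way to work this out without a calculator's regression menu is sketched below. It uses the age and weight table from these notes; the variable names are illustrative.

```python
ages = [15, 17, 18, 15, 16, 19, 17, 16, 18, 20]
weights = [2289, 3393, 3271, 2648, 2897, 3327, 2970, 2535, 3138, 3573]

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(weights) / n
s_xy = sum(x * y for x, y in zip(ages, weights)) - sum(ages) * sum(weights) / n
s_xx = sum(x * x for x in ages) - sum(ages) ** 2 / n

b = s_xy / s_xx        # estimated slope
a = y_bar - b * x_bar  # estimated y-intercept

x_star = 18
y_hat = a + b * x_star  # serves as both the point estimate and the point prediction
print(f"y-hat = {a:.2f} + {b:.2f}x")
print(f"a + b*x* at x* = {x_star}: {y_hat:.1f} g")
```

The single number printed for x* = 18 answers both readings: the mean weight for all 18-year-old mothers and the predicted weight for one such baby.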
Answer the following:
• Explain the slope in the context of the problem.
• Explain the y-intercept in the context of the problem.
Find SSResid.
• $\text{SSResid} = \sum (y - \hat{y})^2$
• On a calculator: every time you calculate a linear regression, it also calculates the residuals. Put them in list 3, square them, and add up the list.
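The calculator steps (store the residuals, square them, add the list) can be mirrored in code. This is a sketch using the age and weight table from these notes; fit_line is an illustrative helper name, not a library function.

```python
ages = [15, 17, 18, 15, 16, 19, 17, 16, 18, 20]
weights = [2289, 3393, 3271, 2648, 2897, 3327, 2970, 2535, 3138, 3573]


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    b = s_xy / s_xx
    return sum(ys) / n - b * sum(xs) / n, b


a, b = fit_line(ages, weights)

# "List 3": one residual (observed y minus predicted y) per observation
residuals = [y - (a + b * x) for x, y in zip(ages, weights)]

# Square each residual and add up the list
ss_resid = sum(r ** 2 for r in residuals)
print(f"SSResid = {ss_resid:.1f}")
```

A useful sanity check is that the residuals from a least-squares line sum to zero, up to rounding.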
The point estimate of $\sigma$ is $s_e$.
$s_e = \sqrt{\dfrac{\text{SSResid}}{n - 2}}$
• It represents the typical deviation in the y-variable from the least-squares line.
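As a sketch of the formula in use, the block below computes s_e for the age and weight table from these notes. The helper name fit_line is illustrative.

```python
import math

ages = [15, 17, 18, 15, 16, 19, 17, 16, 18, 20]
weights = [2289, 3393, 3271, 2648, 2897, 3327, 2970, 2535, 3138, 3573]


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    b = s_xy / s_xx
    return sum(ys) / n - b * sum(xs) / n, b


a, b = fit_line(ages, weights)
ss_resid = sum((y - (a + b * x)) ** 2 for x, y in zip(ages, weights))

n = len(ages)
s_e = math.sqrt(ss_resid / (n - 2))  # divide by n - 2, not n
print(f"s_e = {s_e:.1f} g")
```

The result is in the same units as y (grams here), which is what makes it readable as a typical deviation from the line.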
Find the residual for a mother who is 19.
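A possible way to compute this residual is sketched below, using the one mother aged 19 in the table from these notes (observed weight 3327 g). The helper name fit_line is illustrative.

```python
ages = [15, 17, 18, 15, 16, 19, 17, 16, 18, 20]
weights = [2289, 3393, 3271, 2648, 2897, 3327, 2970, 2535, 3138, 3573]


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    b = s_xy / s_xx
    return sum(ys) / n - b * sum(xs) / n, b


a, b = fit_line(ages, weights)

observed = weights[ages.index(19)]  # the only x = 19 observation in the table
predicted = a + b * 19
residual = observed - predicted     # residual = observed y minus predicted y
print(f"observed = {observed}, predicted = {predicted:.1f}, residual = {residual:.1f}")
```

A negative residual means the observed point lies below the least-squares line.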
Find the probability that a 19-year-old mother has a baby that weighs more than 3000 g.
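One rough approach, sketched here under stated assumptions, treats birth weight at x = 19 as normal with mean a + b(19) and standard deviation s_e. That plugs the sample estimates in for the unknown population values and ignores their sampling error, so the result is an approximation. The helper name fit_line is illustrative.

```python
import math
from statistics import NormalDist

ages = [15, 17, 18, 15, 16, 19, 17, 16, 18, 20]
weights = [2289, 3393, 3271, 2648, 2897, 3327, 2970, 2535, 3138, 3573]


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    b = s_xy / s_xx
    return sum(ys) / n - b * sum(xs) / n, b


a, b = fit_line(ages, weights)
ss_resid = sum((y - (a + b * x)) ** 2 for x, y in zip(ages, weights))
s_e = math.sqrt(ss_resid / (len(ages) - 2))

# Assumption: y at x = 19 is Normal(mean = a + 19b, sd = s_e)
weight_at_19 = NormalDist(mu=a + b * 19, sigma=s_e)
p_over_3000 = 1 - weight_at_19.cdf(3000)
print(f"P(weight > 3000 g at age 19) is roughly {p_over_3000:.3f}")
```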
Coefficient of determination ($r^2$)
• $\text{SSTot} = \sum (y - \bar{y})^2 = \sum y^2 - \dfrac{(\sum y)^2}{n}$
• $r^2 = 1 - \dfrac{\text{SSResid}}{\text{SSTot}}$
• It's the proportion of variation in the y-variable that can be explained by the least-squares line.
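As a sketch, both sums of squares and r² can be computed for the age and weight table from these notes. The helper name fit_line is illustrative.

```python
ages = [15, 17, 18, 15, 16, 19, 17, 16, 18, 20]
weights = [2289, 3393, 3271, 2648, 2897, 3327, 2970, 2535, 3138, 3573]


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    b = s_xy / s_xx
    return sum(ys) / n - b * sum(xs) / n, b


a, b = fit_line(ages, weights)
n = len(weights)

# SSTot via the computing formula: sum of y^2 minus (sum of y)^2 / n
ss_tot = sum(y * y for y in weights) - sum(weights) ** 2 / n
ss_resid = sum((y - (a + b * x)) ** 2 for x, y in zip(ages, weights))

r_squared = 1 - ss_resid / ss_tot
print(f"SSTot = {ss_tot:.1f}, SSResid = {ss_resid:.1f}, r^2 = {r_squared:.3f}")
```

Multiplying r² by 100 gives the percentage of the variation in birth weight that the line on mother's age accounts for.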
Homework
• Worksheet