Chapter 2: Simple Linear Regression
Regression and Model Building
• Regression analysis is a statistical
technique for investigating and modeling
the relationship between variables.
• Equation of a straight line (classical):
$y = mx + b$
We usually write this as
$y = \beta_0 + \beta_1 x$
Regression and Model Building
• Not all observations will fall exactly on a
straight line.
y = 0 + 1x + 
where  represents error
- it is a random variable that accounts for the
failure of the model to fit the data exactly.
-  is a random error
3
2.1 Simple Linear Regression
Model
• Single regressor, x; response, y
$y = \beta_0 + \beta_1 x + \varepsilon$   (population regression model)
• $\beta_0$ – intercept: if x = 0 is in the range of the data, then $\beta_0$ is the mean of the distribution of the response y when x = 0; if x = 0 is not in the range, then $\beta_0$ has no practical interpretation
• $\beta_1$ – slope: the change in the mean of the distribution of the response produced by a unit change in x
• We usually assume $\varepsilon \sim N(0, \sigma^2)$
2.1 Simple Linear Regression
Model
• The response, y, is a random variable
• There is a probability distribution for y at
each value of x
– Mean: $E[y \mid x] = \beta_0 + \beta_1 x$
– Variance: $\mathrm{Var}[y \mid x] = \mathrm{Var}[\beta_0 + \beta_1 x + \varepsilon] = \sigma^2$
2.2 Least-Squares Estimation of
the Parameters
• 0 and 1 are unknown and must be
estimated
yi  0  1 xi   i , i  1, 2,..., n
Sample
regression
model
• Least-squares estimation seeks to minimize the sum of squares of the differences between the observed responses, $y_i$, and the straight line:
$S(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$
2.2 Least-Squares Estimation of
the Parameters
• Let ˆ 0 , ˆ 1 represent the least squares
estimators of 0 and 1, respectively.
• These estimators must satisfy:
S
 0
S
1
ˆ0 , ˆ1
ˆ0 , ˆ1
 2 ( yi  ˆ0  ˆ1 xi )  0
i
 2 ( yi  ˆ0  ˆ1 xi )xi  0
i
7
2.2 Least-Squares Estimation of
the Parameters
• Solving the normal equations yields the ordinary least-squares estimators:
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
$\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n} y_i x_i - \dfrac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}$
2.2 Least-Squares Estimation of
the Parameters
• Simplifying yields the least-squares normal equations:
$n\hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$
$\hat{\beta}_0 \sum_{i=1}^{n} x_i + \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} y_i x_i$
2.2 Least-Squares Estimation of
the Parameters
• The fitted simple linear regression model:
$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$
• Sum-of-squares notation:
$S_{xx} = \sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} = \sum_{i=1}^{n} (x_i - \bar{x})^2$
$S_{xy} = \sum_{i=1}^{n} y_i x_i - \dfrac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i\right)}{n} = \sum_{i=1}^{n} y_i (x_i - \bar{x})$
2.2 Least-Squares Estimation of
the Parameters
• Solving the least-squares normal equations gives the estimates of the two coefficients:
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
$\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n} y_i x_i - \dfrac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}} = \dfrac{S_{xy}}{S_{xx}}$
• Now let's calculate the least-squares estimators for the heights data manually in R first.
• Then we can get the two estimators directly from the lm() function in R.
• Compare the two results.
Example 2.1 – The Heights Data
HUSBAND (cm)   WIFE (cm)
180            168
160            154
186            166
163            162
172            152
192            179
170            163
174            172
191            170
182            170
178            147
181            165
168            162
...            ...
Example 2.1 – The Heights Data
Let's find the coefficients of the linear model using the least-squares estimators.
Let's calculate Sxx and Sxy in R first (see the R code in file "Chaper2_simple_lin.txt"; a sketch follows below).
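Below is a minimal R sketch of this calculation. The data frame name (heights), the column names (husband, wife), and the values (only the thirteen rows shown on the data slide) are assumptions for illustration; the full data set in the course file has more observations, so these numbers will differ from the results quoted later.

# Hypothetical subset of the heights data (rows shown on the data slide only)
heights <- data.frame(
  husband = c(180, 160, 186, 163, 172, 192, 170, 174, 191, 182, 178, 181, 168),
  wife    = c(168, 154, 166, 162, 152, 179, 163, 172, 170, 170, 147, 165, 162)
)
x <- heights$husband
y <- heights$wife
n <- length(x)

# Corrected sums of squares and cross-products
Sxx <- sum(x^2) - sum(x)^2 / n           # same as sum((x - mean(x))^2)
Sxy <- sum(x * y) - sum(x) * sum(y) / n  # same as sum(y * (x - mean(x)))

# Manual least-squares estimates
b1 <- Sxy / Sxx
b0 <- mean(y) - b1 * mean(x)
c(b0 = b0, b1 = b1)

# The same estimates from lm(), for comparison
fit <- lm(wife ~ husband, data = heights)
coef(fit)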
Example 2.1 – The Heights Data
$\hat{\beta}_1 = \dfrac{S_{xy}}{S_{xx}}$
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
So the least-squares regression line is $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$.
2.2 Least-Squares Estimation of the Parameters
• Just because we can fit a linear model doesn't mean that we should. We need to consider the following seriously:
– How well does this equation fit the data?
– Is the model likely to be useful as a predictor?
– Are any of the basic assumptions (such as constant variance and uncorrelated errors) violated? If so, how serious is this?
2.2 Least-Squares Estimation of
the Parameters
• Residuals: $e_i = y_i - \hat{y}_i$
• Residuals will be used to determine the adequacy of the model
2.2 Least-Squares Estimation of the
Parameters
Computer Output (R) will be shown in
class.
• You may explore some of the properties of OLS estimation with the heights data, for example:
– the sum of all the fitted values
– the sum of the observed values
– the sum of the errors (residuals)
(see the sketch below)
• We will learn more in the next lecture.
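A quick check of these quantities in R, assuming the heights data frame from the earlier sketch:

fit  <- lm(wife ~ husband, data = heights)
yhat <- fitted(fit)       # fitted values
e    <- residuals(fit)    # residuals

sum(heights$wife)         # sum of the observed values
sum(yhat)                 # sum of the fitted values: equal to the line above
sum(e)                    # sum of the residuals: zero up to rounding error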
2.2.2 Properties of the Least-Squares
Estimators and the Fitted Regression Model
• The ordinary least-squares (OLS) estimator of the slope is a linear combination of the observations $y_i$:
$\hat{\beta}_1 = \dfrac{S_{xy}}{S_{xx}} = \sum_{i=1}^{n} c_i y_i$
where $c_i = (x_i - \bar{x})/S_{xx}$, $\sum_{i=1}^{n} c_i = 0$, $\sum_{i=1}^{n} c_i^2 = 1/S_{xx}$
(useful in showing expected value and variance properties)
2.2.2 Properties of the Least-Squares
Estimators and the Fitted Regression Model
• The least-squares estimators are unbiased estimators of their respective parameters:
$E[\hat{\beta}_1] = \beta_1, \qquad E[\hat{\beta}_0] = \beta_0$
• The variances are
$\mathrm{Var}[\hat{\beta}_1] = \dfrac{\sigma^2}{S_{xx}}, \qquad \mathrm{Var}[\hat{\beta}_0] = \sigma^2 \left( \dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}} \right)$
• The OLS estimators are Best Linear Unbiased Estimators (BLUE)
2.2.2 Properties of the Least-Squares Estimators and the Fitted Regression Model
• Useful properties of the least-squares fit:
1. $\sum_{i=1}^{n} (y_i - \hat{y}_i) = \sum_{i=1}^{n} e_i = 0$
2. $\sum_{i=1}^{n} y_i = \sum_{i=1}^{n} \hat{y}_i$
3. The least-squares regression line always passes through the centroid $(\bar{x}, \bar{y})$ of the data.
4. $\sum_{i=1}^{n} x_i e_i = 0$
5. $\sum_{i=1}^{n} \hat{y}_i e_i = 0$
2.2.3 Estimation of σ²
• Residual (error) sum of squares:
$SS_{Res} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2 - \hat{\beta}_1 S_{xy}$
• With $SS_T = \sum_{i=1}^{n} (y_i - \bar{y})^2$, this gives $SS_{Res} = SS_T - \hat{\beta}_1 S_{xy}$
2.2.3 Estimation of σ²
• Unbiased estimator of σ²:
$\hat{\sigma}^2 = \dfrac{SS_{Res}}{n-2} = MS_{Res}$
• The quantity n − 2 is the number of degrees of freedom for the residual sum of squares.
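A short R sketch of this estimate, assuming the heights fit from the earlier example:

fit    <- lm(wife ~ husband, data = heights)
n      <- nrow(heights)
SS_Res <- sum(residuals(fit)^2)
SS_Res / (n - 2)          # MS_Res, the unbiased estimate of sigma^2
summary(fit)$sigma^2      # the same value: lm()'s residual standard error, squared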
2.2.3 Estimation of σ²
• $\hat{\sigma}^2$ depends on the residual sum of squares. Therefore:
– Any violation of the assumptions on the
model errors could damage the
usefulness of this estimate
– A misspecification of the model can
damage the usefulness of this estimate
– This estimate is model dependent
2.3 Hypothesis Testing on the
Slope and Intercept
• Three assumptions are needed to apply procedures such as hypothesis testing and confidence intervals. The model errors, $\varepsilon_i$,
– are normally distributed
– are independently distributed
– have constant variance
i.e. $\varepsilon_i \sim NID(0, \sigma^2)$
2.3.1 Use of t-tests
Slope: $H_0\!: \beta_1 = \beta_{10}, \qquad H_1\!: \beta_1 \neq \beta_{10}$
• Standard error of the slope: $se(\hat{\beta}_1) = \sqrt{\dfrac{MS_{Res}}{S_{xx}}}$
• Test statistic: $t_0 = \dfrac{\hat{\beta}_1 - \beta_{10}}{se(\hat{\beta}_1)}$
• Reject $H_0$ if $|t_0| > t_{\alpha/2,\, n-2}$
Let's test the significance of linear regression using the heights data.
First we can calculate the test statistic and the critical value manually and compare them.
Alternatively, we can use the p-value approach.
Finally, we can compare the values calculated manually with the ones returned by the lm() function (a sketch is given below).
(In the R output, the "t value" column gives the test statistic t0 and "Pr(>|t|)" gives the p-value for testing β1 = 0.)
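A sketch of the manual calculation in R, assuming the heights fit from earlier and α = 0.05; compare the last two lines with the coefficient table from summary(fit):

fit <- lm(wife ~ husband, data = heights)
x   <- heights$husband
n   <- nrow(heights)

Sxx    <- sum((x - mean(x))^2)
MS_Res <- sum(residuals(fit)^2) / (n - 2)

b1     <- unname(coef(fit)[2])
se_b1  <- sqrt(MS_Res / Sxx)
t0     <- b1 / se_b1                                       # test statistic for H0: beta1 = 0
t_crit <- qt(0.975, df = n - 2)                            # critical value t_{alpha/2, n-2}
p_val  <- 2 * pt(abs(t0), df = n - 2, lower.tail = FALSE)  # two-sided p-value
c(t0 = t0, critical = t_crit, p.value = p_val)

summary(fit)$coefficients                                  # lm()'s t value and Pr(>|t|)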
2.3.2 Testing Significance of Regression
$H_0\!: \beta_1 = 0, \qquad H_1\!: \beta_1 \neq 0$
• This tests the significance of regression; that is, is there a linear relationship between the response and the regressor?
• Failing to reject $\beta_1 = 0$ implies that there is no linear relationship between y and x
2.3.1 Use of t-tests
Intercept
$H_0\!: \beta_0 = \beta_{00}, \qquad H_1\!: \beta_0 \neq \beta_{00}$
• Standard error of the intercept: $se(\hat{\beta}_0) = \sqrt{MS_{Res}\left(\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}\right)}$
• Test statistic: $t_0 = \dfrac{\hat{\beta}_0 - \beta_{00}}{se(\hat{\beta}_0)}$
• Reject $H_0$ if $|t_0| > t_{\alpha/2,\, n-2}$
• Can also use the P-value approach
Testing significance of regression (R output shown in class)
2.3.3 The Analysis of Variance Approach
• Partitioning of total variability:
$y_i - \bar{y} = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i)$
$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2 + 2\sum (\hat{y}_i - \bar{y})(y_i - \hat{y}_i)$
The cross-product term is zero, so
$\underbrace{\sum (y_i - \bar{y})^2}_{SS_T} = \underbrace{\sum (\hat{y}_i - \bar{y})^2}_{SS_R} + \underbrace{\sum (y_i - \hat{y}_i)^2}_{SS_{Res}}$
• It can be shown that $SS_R = \hat{\beta}_1 S_{xy}$
2.3.3 The Analysis of Variance
• Degrees of freedom:
$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$
$SS_T\!: n - 1, \qquad SS_R\!: 1, \qquad SS_{Res}\!: n - 2$
• Mean squares:
$MS_R = \dfrac{SS_R}{1}, \qquad MS_{Res} = \dfrac{SS_{Res}}{n-2}$
2.3.3 The Analysis of Variance
• ANOVA procedure for testing $H_0\!: \beta_1 = 0$
Source of Variation   Sum of Squares   DF      MS       F0
Regression            SSR              1       MSR      MSR/MSRes
Residual              SSRes            n - 2   MSRes
Total                 SST              n - 1
• A large value of F0 indicates that regression is significant; specifically, reject $H_0$ if $F_0 > F_{\alpha,\, 1,\, n-2}$
• Can also use the P-value approach
Analysis of Variance Table for the heights data
Source of Variation   Sum of Squares   DF   MS             F0
Regression            SSR = 4497.9     1    MSR = 4497.9   126.97 (MSR/MSRes)
Residual              SSRes = 3294.5   93   MSRes = 35.4
Total                 SST = 7792.4     94
So the proportion of the total variability in the response variable (WIFE) explained by the regression model is
R² = 4497.9/7792.4 ≈ 0.577, i.e. about 57.7%.
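The same table can be produced with anova() in R. This is a sketch using the fit from the earlier example; the numbers above come from the full heights data set, so a fit on the thirteen illustrative rows will not reproduce them exactly.

fit <- lm(wife ~ husband, data = heights)
anova(fit)                                 # Regression (husband) and Residuals rows

SS_R <- sum((fitted(fit) - mean(heights$wife))^2)
SS_T <- sum((heights$wife - mean(heights$wife))^2)
SS_R / SS_T                                # proportion of variability explained (R^2)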
2.3.3 The Analysis of Variance
Relationship between t0 and F0:
• For $H_0\!: \beta_1 = 0$, it can be shown that $t_0^2 = F_0$
So for testing significance of regression, the t-test and the ANOVA procedure are equivalent (this is only true in simple linear regression).
Confidence Interval of a true parameter
Confidence interval = sample statistic ± margin of error
Margin of error = critical value × standard deviation of the statistic (if you know the standard deviation)
Margin of error = critical value × standard error of the statistic (otherwise)
If you know the standard deviation of the statistic, use the first equation to compute the margin of error. Otherwise, use the second equation.
2.4 Interval Estimation in Simple
Linear Regression
• 100(1 − α)% confidence interval for the slope:
$\hat{\beta}_1 - t_{\alpha/2,\, n-2}\, se(\hat{\beta}_1) \le \beta_1 \le \hat{\beta}_1 + t_{\alpha/2,\, n-2}\, se(\hat{\beta}_1)$
• 100(1 − α)% confidence interval for the intercept:
$\hat{\beta}_0 - t_{\alpha/2,\, n-2}\, se(\hat{\beta}_0) \le \beta_0 \le \hat{\beta}_0 + t_{\alpha/2,\, n-2}\, se(\hat{\beta}_0)$
where
$se(\hat{\beta}_1) = \sqrt{\dfrac{MS_{Res}}{S_{xx}}}, \qquad se(\hat{\beta}_0) = \sqrt{MS_{Res}\left(\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}\right)}$
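In R, confint() gives both intervals directly; the slope interval is also computed manually for comparison (a sketch, assuming the heights fit from earlier):

fit <- lm(wife ~ husband, data = heights)
confint(fit, level = 0.95)                 # 95% CIs for the intercept and the slope

# Slope interval computed manually
n     <- nrow(heights)
b1    <- unname(coef(fit)[2])
se_b1 <- summary(fit)$coefficients[2, "Std. Error"]
b1 + c(-1, 1) * qt(0.975, df = n - 2) * se_b1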
2.4 Interval Estimation in Simple
Linear Regression
• 100(1 − α)% confidence interval for σ²:
$\dfrac{(n-2)\, MS_{Res}}{\chi^2_{\alpha/2,\, n-2}} \le \sigma^2 \le \dfrac{(n-2)\, MS_{Res}}{\chi^2_{1-\alpha/2,\, n-2}}$
• Refer to "chapter_sample_lin.txt" for how to calculate a 95% confidence interval for σ² (a sketch is also given below).
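A sketch of the calculation in R (the actual course script is in the file named above; this version assumes the heights fit from earlier):

fit    <- lm(wife ~ husband, data = heights)
n      <- nrow(heights)
MS_Res <- sum(residuals(fit)^2) / (n - 2)

lower <- (n - 2) * MS_Res / qchisq(0.975, df = n - 2)  # divide by the upper chi-square point
upper <- (n - 2) * MS_Res / qchisq(0.025, df = n - 2)  # divide by the lower chi-square point
c(lower = lower, upper = upper)                        # 95% CI for sigma^2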
2.4.2 Interval Estimation of the Mean
Response
• Let x0 be the level of the regressor variable at which we want to estimate the mean response, i.e. $E[y \mid x_0] = \mu_{y \mid x_0}$
– Point estimator of $E[y \mid x_0]$ once the model is fit:
$\widehat{E[y \mid x_0]} = \hat{\mu}_{y \mid x_0} = \hat{\beta}_0 + \hat{\beta}_1 x_0$
– In order to construct a confidence interval on the mean response, we need the variance of the point estimator.
2.4.2 Interval Estimation of the Mean
Response
• The variance of $\hat{\mu}_{y \mid x_0}$ is
$\mathrm{Var}(\hat{\mu}_{y \mid x_0}) = \mathrm{Var}[\bar{y} + \hat{\beta}_1 (x_0 - \bar{x})] = \mathrm{Var}(\bar{y}) + \mathrm{Var}[\hat{\beta}_1 (x_0 - \bar{x})]$
$= \dfrac{\sigma^2}{n} + \dfrac{\sigma^2 (x_0 - \bar{x})^2}{S_{xx}} = \sigma^2 \left( \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}} \right)$
2.4.2 Interval Estimation of the Mean
Response
• 100(1 − α)% confidence interval for $E(y \mid x_0)$:
$\hat{\mu}_{y \mid x_0} - t_{\alpha/2,\, n-2} \sqrt{MS_{Res}\left(\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}}\right)} \le E(y \mid x_0) \le \hat{\mu}_{y \mid x_0} + t_{\alpha/2,\, n-2} \sqrt{MS_{Res}\left(\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}}\right)}$
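In R this interval comes from predict() with interval = "confidence". The value x0 = 175 cm is an arbitrary illustration, and the fit is assumed from the earlier sketch:

fit <- lm(wife ~ husband, data = heights)
predict(fit, newdata = data.frame(husband = 175),
        interval = "confidence", level = 0.95)   # CI for the mean response at x0 = 175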
See pages 34–35 of the text.
2.5 Prediction of New
Observations
• Suppose we wish to construct a prediction interval on a future observation, y0, corresponding to a particular level of x, say x0.
• The point estimate would be: $\hat{y}_0 = \hat{\beta}_0 + \hat{\beta}_1 x_0$
• The confidence interval on the mean response
at this point is not appropriate for this situation.
Why?
2.5 Prediction of New
Observations
• Let the random variable ψ be $\psi = y_0 - \hat{y}_0$
• ψ is normally distributed with
– $E(\psi) = 0$
– $\mathrm{Var}(\psi) = \mathrm{Var}(y_0 - \hat{y}_0) = \sigma^2 \left( 1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}} \right)$
(since $y_0$ is independent of $\hat{y}_0$)
2.5 Prediction of New
Observations
• 100(1 − α)% prediction interval on a future observation, y0, at x0:
$\hat{y}_0 - t_{\alpha/2,\, n-2} \sqrt{MS_{Res}\left(1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}}\right)} \le y_0 \le \hat{y}_0 + t_{\alpha/2,\, n-2} \sqrt{MS_{Res}\left(1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}}\right)}$
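In R, the prediction interval comes from predict() with interval = "prediction" (same assumptions as before; x0 = 175 cm is arbitrary). It is always wider than the confidence interval for the mean response at the same x0.

fit <- lm(wife ~ husband, data = heights)
predict(fit, newdata = data.frame(husband = 175),
        interval = "prediction", level = 0.95)   # PI for a new observation at x0 = 175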
2.6 Coefficient of Determination
• R² – coefficient of determination:
$R^2 = \dfrac{SS_R}{SS_T} = 1 - \dfrac{SS_{Res}}{SS_T}$
• Proportion of variation explained by the regressor, x
• For the rocket propellant data, see the text.
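A quick check in R, assuming the heights fit from earlier:

fit <- lm(wife ~ husband, data = heights)
summary(fit)$r.squared                                   # R^2 reported by lm()
1 - sum(residuals(fit)^2) /
    sum((heights$wife - mean(heights$wife))^2)           # 1 - SS_Res / SS_T, same value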
2.6 Coefficient of Determination
• R² can be misleading!
– Simply adding more terms to the model will increase R²
– As the range of the regressor variable increases (decreases), R² generally increases (decreases).
– R² does not indicate the appropriateness of a linear model
2.7 Considerations in the Use of
Regression
• Extrapolating beyond the range of the data
• Extreme points will often influence the slope.
• Outliers can disturb the least-squares fit.
• A linear relationship does not imply a cause-and-effect relationship (interesting mental defective example in the book, pg 40).
• Sometimes the value of the regressor variable is unknown and must itself be estimated or predicted.
2.8 Regression Through the
Origin
• The no-intercept model is $y = \beta_1 x + \varepsilon$
• This model would be appropriate for situations where the origin (0, 0) has some meaning.
• A scatter diagram can aid in determining whether an intercept or no-intercept model should be used.
• In addition, the practitioner could fit both models and examine the t-tests and the residual mean squares (a sketch is given below).
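A sketch of both fits in R (assuming the heights data from earlier; whether a no-intercept model makes sense for heights is doubtful, so this is purely illustrative):

fit        <- lm(wife ~ husband, data = heights)       # intercept model
fit_origin <- lm(wife ~ husband - 1, data = heights)   # no-intercept model (through the origin)

summary(fit)$sigma^2          # residual mean square, intercept model
summary(fit_origin)$sigma^2   # residual mean square, no-intercept model
summary(fit_origin)           # t-test on the slope
# Note: R^2 is defined differently when there is no intercept, so compare MS_Res rather than R^2.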
2.9 Estimation by Maximum
Likelihood
• The method of least squares is not the only method of
parameter estimation that can be used for linear
regression models.
• If the form of the response (error) distribution is known,
then the maximum likelihood (ML) estimation method
can be used.
• In simple linear regression with normal errors, the MLEs for the regression coefficients are identical to those obtained by the method of least squares.
• Details of finding the MLEs are on page 47.
• The MLE of 2 is biased. (Not a serious problem here
though.)
The ML estimators of β0 and β1 are identical to the OLS estimators; the ML estimator of σ², $\tilde{\sigma}^2 = SS_{Res}/n$, is biased.
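A numerical sketch of the normal-errors MLE in R, using optim() on the negative log-likelihood rather than the book's closed-form derivation. It assumes the heights data from earlier; the coefficient MLEs should agree with lm(), while the variance MLE divides by n rather than n − 2.

negloglik <- function(par, x, y) {
  b0 <- par[1]; b1 <- par[2]; sigma <- exp(par[3])   # log(sigma) keeps sigma positive
  -sum(dnorm(y, mean = b0 + b1 * x, sd = sigma, log = TRUE))
}

x <- heights$husband
y <- heights$wife
mle <- optim(c(mean(y), 0, log(sd(y))), negloglik, x = x, y = y, method = "BFGS")

fit <- lm(wife ~ husband, data = heights)
mle$par[1:2]                          # ML estimates of beta0, beta1 (essentially coef(fit))
coef(fit)
exp(mle$par[3])^2                     # ML estimate of sigma^2, approximately SS_Res / n
sum(residuals(fit)^2) / length(y)     # SS_Res / n: biased (divides by n, not n - 2)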
2.10 Case Where the Regressor is
Random
• When x and y have an unknown joint distribution, all of the "fixed x's" regression results hold if:
– the conditional distribution of y given x is normal, with conditional mean β0 + β1x and conditional variance σ²
– the x's are independent random variables whose probability distribution does not involve β0, β1, or σ²
The sample correlation coefficient is related to the slope:
$r = \dfrac{S_{xy}}{\sqrt{S_{xx}\, SS_T}} = \hat{\beta}_1 \sqrt{\dfrac{S_{xx}}{SS_T}}$
Also, in simple linear regression $R^2 = r^2$.
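A quick numerical check of these relationships in R, assuming the heights data and fit from earlier:

fit  <- lm(wife ~ husband, data = heights)
x    <- heights$husband
y    <- heights$wife
Sxx  <- sum((x - mean(x))^2)
SS_T <- sum((y - mean(y))^2)

cor(x, y)                                  # sample correlation r
unname(coef(fit)[2]) * sqrt(Sxx / SS_T)    # beta1_hat * sqrt(Sxx / SS_T): the same value
cor(x, y)^2                                # equals summary(fit)$r.squared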