Chapter 2 Simple Linear Regression

Ray-Bing Chen
Institute of Statistics
National University of Kaohsiung
2.1 Simple Linear Regression Model
• y = 0 + 1 x + 
– x: regressor variable
– y: response variable
– 0: the intercept, unknown
– 1: the slope, unknown
– : error with E() = 0 and Var() = 2
(unknown)
• The errors are uncorrelated.
2
• Given x,
  E(y|x) = E(β₀ + β₁x + ε) = β₀ + β₁x
  Var(y|x) = Var(β₀ + β₁x + ε) = σ²
• Because the errors are uncorrelated, the responses are also uncorrelated.
• Regression coefficients: β₀, β₁
  – β₁: the change in E(y|x) produced by a unit change in x
  – β₀: E(y|x = 0)
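To make these moment formulas concrete, the following sketch (not part of the original slides; the values β₀ = 10, β₁ = 2, σ = 1.5, and x = 3 are arbitrary illustrative assumptions) simulates responses from the model at a fixed x and checks that their sample mean and variance approach β₀ + β₁x and σ².

```python
import numpy as np

# Hypothetical parameter values chosen only for illustration.
beta0, beta1, sigma = 10.0, 2.0, 1.5
x = 3.0                              # a fixed regressor value
rng = np.random.default_rng(0)

# Simulate many responses at the same x: y = beta0 + beta1*x + eps
eps = rng.normal(0.0, sigma, size=100_000)
y = beta0 + beta1 * x + eps

print("sample mean of y|x :", y.mean(), "  vs  beta0 + beta1*x =", beta0 + beta1 * x)
print("sample var  of y|x :", y.var(ddof=1), "  vs  sigma^2 =", sigma**2)
```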
2.2 Least-squares Estimation of the Parameters
2.2.1 Estimation of β₀ and β₁
• n pairs: (yᵢ, xᵢ), i = 1, …, n
• Method of least squares: minimize
  S(β₀, β₁) = Σᵢ₌₁ⁿ [yᵢ − (β₀ + β₁xᵢ)]²
• Differentiating S(β₀, β₁) with respect to β₀ and β₁ and setting the derivatives to zero gives the least-squares normal equations:
  n·β̂₀ + β̂₁ Σᵢ xᵢ = Σᵢ yᵢ
  β̂₀ Σᵢ xᵢ + β̂₁ Σᵢ xᵢ² = Σᵢ xᵢyᵢ
• The least-squares estimators:
  β̂₁ = Sxy/Sxx, where Sxy = Σᵢ (xᵢ − x̄)yᵢ and Sxx = Σᵢ (xᵢ − x̄)²
  β̂₀ = ȳ − β̂₁x̄
• The fitted simple regression model: ŷ = β̂₀ + β̂₁x
  – A point estimate of the mean of y for a particular x
• Residual: eᵢ = yᵢ − ŷᵢ
  – Residuals play an important role in investigating the adequacy of the fitted regression model and in detecting departures from the underlying assumptions.
• Example 2.1: The Rocket Propellant Data
– Shear strength is related to the age in weeks of
the batch of sustainer propellant.
– 20 observations
– From the scatter diagram, there is a strong linear relationship between shear strength (y) and propellant age (x).
– Assumption: y = β₀ + β₁x + ε
• Sxx = Σᵢ xᵢ² − n·x̄² = 1106.56
  Sxy = Σᵢ xᵢyᵢ − n·x̄·ȳ = −41,112.65
• β̂₁ = Sxy/Sxx = −37.15
  β̂₀ = ȳ − β̂₁x̄ = 2627.82
• The least-squares fit:
  ŷ = 2627.82 − 37.15x
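A minimal NumPy sketch of this computation; since the 20 rocket propellant observations are not reproduced on the slides, the arrays below hold synthetic stand-in data, so the printed estimates will only match the values above when the real measurements are substituted.

```python
import numpy as np

# Synthetic stand-in data; replace x and y with the actual 20 observations.
rng = np.random.default_rng(1)
x = rng.uniform(2, 25, size=20)                  # "age in weeks" stand-in
y = 2600 - 37 * x + rng.normal(0, 95, size=20)   # "shear strength" stand-in

n = len(x)
xbar, ybar = x.mean(), y.mean()

Sxx = np.sum(x**2) - n * xbar**2          # equivalently sum((x - xbar)**2)
Sxy = np.sum(x * y) - n * xbar * ybar     # equivalently sum((x - xbar) * y)

beta1_hat = Sxy / Sxx                     # slope estimate
beta0_hat = ybar - beta1_hat * xbar       # intercept estimate
print(f"y_hat = {beta0_hat:.2f} + ({beta1_hat:.2f}) x")
```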
• How well does this equation fit the data?
• Is the model likely to be useful as a predictor?
• Are any of the basic assumptions violated, and if so, how serious is this?
2.2.2 Properties of the Least-Squares Estimators and the Fitted Regression Model
• β̂₁ and β̂₀ are linear combinations of the yᵢ:
  β̂₁ = Σᵢ₌₁ⁿ cᵢyᵢ, where cᵢ = (xᵢ − x̄)/Sxx
• β̂₁ and β̂₀ are unbiased estimators.
• E(β̂₁) = E(Σᵢ cᵢyᵢ) = Σᵢ cᵢE(yᵢ) = Σᵢ cᵢ(β₀ + β₁xᵢ) = β₁
  E(β̂₀) = E(ȳ − β̂₁x̄) = β₀ + β₁x̄ − β₁x̄ = β₀
• Var(β̂₁) = Var(Σᵢ cᵢyᵢ) = Σᵢ cᵢ²Var(yᵢ) = σ² Σᵢ cᵢ² = σ² Σᵢ (xᵢ − x̄)²/Sxx² = σ²/Sxx
  Var(β̂₀) = σ²(1/n + x̄²/Sxx)
• The Gauss–Markov theorem: β̂₁ and β̂₀ are the best linear unbiased estimators (BLUE).
  – Among all estimators that are unbiased and are linear combinations of the yᵢ, they have the smallest variance.
• Some useful properties:
– The sum of the residuals in any regression model that contains an intercept β₀ is always zero, i.e.
  Σᵢ eᵢ = Σᵢ (yᵢ − ŷᵢ) = Σᵢ (yᵢ − ȳ − β̂₁(xᵢ − x̄)) = 0
– Σᵢ yᵢ = Σᵢ ŷᵢ
– The regression line always passes through the centroid of the data, (x̄, ȳ).
– Σᵢ xᵢeᵢ = Σᵢ xᵢ(yᵢ − ȳ − β̂₁(xᵢ − x̄)) = 0
– Σᵢ ŷᵢeᵢ = Σᵢ (ȳ + β̂₁(xᵢ − x̄))((yᵢ − ȳ) − β̂₁(xᵢ − x̄)) = 0
2.2.3 Estimator of σ²
• Residual sum of squares:
  SSRes = Σᵢ eᵢ² = Σᵢ (yᵢ − ŷᵢ)²
        = Σᵢ (yᵢ − ȳ − β̂₁(xᵢ − x̄))²
        = Σᵢ (yᵢ − ȳ)² − β̂₁Sxy
        = SST − β̂₁Sxy
• Since E(SSRes) = (n − 2)σ², an unbiased estimator of σ² is
  σ̂² = SSRes/(n − 2) = MSRes
  – MSRes is called the residual mean square.
  – This estimate is model-dependent.
• Example 2.2
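Continuing the same kind of sketch, this snippet computes SSRes directly from the residuals and through the shortcut SST − β̂₁Sxy, and then MSRes = SSRes/(n − 2); the data are again synthetic placeholders.

```python
import numpy as np

# Synthetic stand-in data; replace with the real observations.
rng = np.random.default_rng(1)
x = rng.uniform(2, 25, size=20)
y = 2600 - 37 * x + rng.normal(0, 95, size=20)
n = len(x)

# Least-squares fit
Sxx = np.sum((x - x.mean())**2)
Sxy = np.sum((x - x.mean()) * y)
beta1_hat = Sxy / Sxx
beta0_hat = y.mean() - beta1_hat * x.mean()

# Residual sum of squares, two equivalent ways
resid = y - (beta0_hat + beta1_hat * x)
SS_res_direct = np.sum(resid**2)
SS_T = np.sum((y - y.mean())**2)
SS_res_shortcut = SS_T - beta1_hat * Sxy

MS_res = SS_res_direct / (n - 2)          # unbiased estimator of sigma^2
print(SS_res_direct, SS_res_shortcut, MS_res)
```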
2.2.4 An Alternate Form of the Model
• The new regression model:
  yᵢ = β₀ + β₁(xᵢ − x̄) + β₁x̄ + εᵢ
     = (β₀ + β₁x̄) + β₁(xᵢ − x̄) + εᵢ
     = β₀′ + β₁(xᵢ − x̄) + εᵢ
• Normal equations:
  n·β̂₀′ = Σᵢ yᵢ
  β̂₁ Σᵢ (xᵢ − x̄)² = Σᵢ yᵢ(xᵢ − x̄)
• The least-squares estimators:
  β̂₀′ = ȳ and β̂₁ = Sxy/Sxx
• Some advantages:
  – The normal equations are easier to solve.
  – β̂₀′ = ȳ and β̂₁ = Sxy/Sxx are uncorrelated.
  – ŷ = ȳ + β̂₁(x − x̄)
2.3 Hypothesis Testing on the Slope and Intercept
• Assume the εᵢ are normally distributed.
• yᵢ ~ N(β₀ + β₁xᵢ, σ²)
2.3.1 Use of t-Tests
• Test on the slope:
  – H0: β₁ = β₁₀ vs. H1: β₁ ≠ β₁₀
  – β̂₁ ~ N(β₁, σ²/Sxx)
• If 2 is known, under null hypothesis,
Z0 
ˆ1  10
 2 / S xx
~ N (0,1)
• (n-2) MSE/2 follows a 2n-2
• If 2 is unknown,
t0 
ˆ1  10
MS E / S xx
ˆ1  10

~ t n2
se( ˆ1 )
• Reject H0 if |t0| > t/2, n-2
21
• Test on the intercept:
  – H0: β₀ = β₀₀ vs. H1: β₀ ≠ β₀₀
  – If σ² is unknown,
    t0 = (β̂₀ − β₀₀)/√(MSRes(1/n + x̄²/Sxx)) = (β̂₀ − β₀₀)/se(β̂₀) ~ tn−2
  – Reject H0 if |t0| > tα/2,n−2
2.3.2 Testing Significance of Regression
• H0: 1 = 0 v.s. H1: 1  0
• Accept H0: there is no linear relationship between
x and y.
23
• Reject H0: x is of value in explaining the
variability in y.
• t0 = β̂₁/se(β̂₁) ~ tn−2
• Reject H0 if |t0| > tα/2,n−2
• Example 2.3: The Rocket Propellant Data
  – Test for significance of regression.
  – β̂₁ = −37.15
  – MSRes = 9244.59
  – se(β̂₁) = √(MSRes/Sxx) = 2.89
  – The test statistic is t0 = β̂₁/se(β̂₁) = −12.85
  – t0.025,18 = 2.101
  – Since |t0| > t0.025,18, reject H0.
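The same test can be reproduced numerically from the summary quantities quoted above (n = 20, Sxx = 1106.56, β̂₁ = −37.15, MSRes = 9244.59); a sketch using scipy.stats for the t quantile:

```python
import numpy as np
from scipy import stats

# Summary quantities quoted on the slides for the rocket propellant fit.
n, Sxx = 20, 1106.56
beta1_hat = -37.15
MS_res = 9244.59
beta1_0 = 0.0                                  # hypothesized slope under H0

se_beta1 = np.sqrt(MS_res / Sxx)               # ~ 2.89
t0 = (beta1_hat - beta1_0) / se_beta1          # ~ -12.85
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)   # t_{0.025,18} ~ 2.101
p_value = 2 * stats.t.sf(abs(t0), df=n - 2)

print(f"se = {se_beta1:.2f}, t0 = {t0:.2f}, critical value = {t_crit:.3f}, p = {p_value:.2e}")
print("Reject H0" if abs(t0) > t_crit else "Fail to reject H0")
```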
2.3.3 The Analysis of Variance (ANOVA)
• Use an analysis of variance approach to test
significance of regression
– Σᵢ (yᵢ − ȳ)² = Σᵢ (ŷᵢ − ȳ)² + Σᵢ (yᵢ − ŷᵢ)²
– SST: the corrected sum of squares of the observations. It measures the total variability in the observations.
– SSRes: the residual or error sum of squares, the residual variation left unexplained by the regression line.
– SSR: the regression or model sum of squares, the amount of variability in the observations accounted for by the regression line.
– SST = SSR + SSRes
– SSR = β̂₁Sxy
– The degrees of freedom:
  • dfT = n − 1
  • dfR = 1
  • dfRes = n − 2
  • dfT = dfR + dfRes
– Test significance of regression by ANOVA:
  • SSRes/σ² = (n − 2)MSRes/σ² ~ χ²n−2
  • SSR/σ² = MSR/σ² ~ χ²1 (under H0)
  • SSR and SSRes are independent
  • F0 = (SSR/1)/(SSRes/(n − 2)) = MSR/MSRes ~ F1,n−2
• E(MSRes) = 2
• E(MSR) = 2 + 12 Sxx
• Reject H0 if F0 > F/2,1, n-2
– If 1 0, F0 follows a noncentral F with 1 and
n-2 degree of freedom and a noncentrality
parameter
 2S

1
xx
2
30
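A sketch of the ANOVA computation using the summary quantities quoted for the rocket propellant example (β̂₁ = −37.15, Sxy = −41,112.65, MSRes = 9244.59, n = 20); the resulting F0 should be approximately t0² from Example 2.3.

```python
from scipy import stats

# Summary quantities quoted on the slides for the rocket propellant fit.
n = 20
beta1_hat, Sxy = -37.15, -41112.65
MS_res = 9244.59

SS_R = beta1_hat * Sxy                 # regression sum of squares, 1 df
SS_res = (n - 2) * MS_res              # residual sum of squares, n-2 df
MS_R = SS_R / 1

F0 = MS_R / MS_res                     # ~ t0**2 ~ 165
F_crit = stats.f.ppf(0.95, dfn=1, dfd=n - 2)
p_value = stats.f.sf(F0, dfn=1, dfd=n - 2)
print(f"F0 = {F0:.1f}, F_0.05,1,18 = {F_crit:.2f}, p = {p_value:.2e}")
```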
• Example 2.4: The Rocket Propellant Data
• More about the t-test:
  – t0 = β̂₁/se(β̂₁) = β̂₁/√(MSRes/Sxx)
  – t0² = β̂₁²Sxx/MSRes = β̂₁Sxy/MSRes = MSR/MSRes = F0
  – The square of a t random variable with f degrees of freedom is an F random variable with 1 and f degrees of freedom.
2.4 Interval Estimation in Simple Linear Regression
2.4.1 Confidence Intervals on β₀, β₁, and σ²
• Assume that the εᵢ are normally and independently distributed.
• 100(1 − α)% confidence intervals on β₁ and β₀ are given by
  β̂₁ − tα/2,n−2·se(β̂₁) ≤ β₁ ≤ β̂₁ + tα/2,n−2·se(β̂₁)
  β̂₀ − tα/2,n−2·se(β̂₀) ≤ β₀ ≤ β̂₀ + tα/2,n−2·se(β̂₀)
• Interpretation of a C.I.: if many samples were taken and an interval constructed from each, 100(1 − α)% of those intervals would contain the true parameter.
• Confidence interval for σ²:
  (n − 2)MSRes/χ²α/2,n−2 ≤ σ² ≤ (n − 2)MSRes/χ²1−α/2,n−2
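A sketch of these interval computations for the slope and for σ², using the summary quantities quoted earlier for the rocket propellant fit (the interval for β₀ is analogous but needs x̄, which is not listed on the slides):

```python
import numpy as np
from scipy import stats

# Summary quantities quoted on the slides for the rocket propellant fit.
n, Sxx = 20, 1106.56
beta1_hat, MS_res = -37.15, 9244.59
alpha = 0.05
t_val = stats.t.ppf(1 - alpha / 2, df=n - 2)

# CI on the slope: beta1_hat +/- t * se(beta1_hat)
se_beta1 = np.sqrt(MS_res / Sxx)
ci_beta1 = (beta1_hat - t_val * se_beta1, beta1_hat + t_val * se_beta1)

# CI on sigma^2: (n-2)MS_res/chi2_{alpha/2} <= sigma^2 <= (n-2)MS_res/chi2_{1-alpha/2}
chi2_hi = stats.chi2.ppf(1 - alpha / 2, df=n - 2)
chi2_lo = stats.chi2.ppf(alpha / 2, df=n - 2)
ci_sigma2 = ((n - 2) * MS_res / chi2_hi, (n - 2) * MS_res / chi2_lo)

print("95% CI for beta1  :", ci_beta1)
print("95% CI for sigma^2:", ci_sigma2)
```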
• Example 2.5: The Rocket Propellant Data
2.4.2 Interval Estimation of the Mean Response
• Let x0 be the level of the regressor variable for
which we wish to estimate the mean response.
• x0 is in the range of the original data on x.
• An unbiased estimator of E(y|x0) is
  μ̂y|x0 = β̂₀ + β̂₁x0
• μ̂y|x0 follows a normal distribution with mean E(y|x0) and variance
  Var(μ̂y|x0) = σ²(1/n + (x0 − x̄)²/Sxx)
• A 100(1 − α)% confidence interval on the mean response at x0:
  μ̂y|x0 ± tα/2,n−2 √(MSRes(1/n + (x0 − x̄)²/Sxx))
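A small helper implementing this interval; the call at the bottom uses the slide's fitted values for the rocket propellant example, but xbar = 13.36 is an assumed illustrative value, so treat the printed interval as a demonstration of the formula rather than a quoted result.

```python
import numpy as np
from scipy import stats

def mean_response_ci(x0, beta0_hat, beta1_hat, xbar, Sxx, MS_res, n, alpha=0.05):
    """100(1-alpha)% CI on E(y|x0) for a fitted simple linear regression."""
    y0_hat = beta0_hat + beta1_hat * x0
    se = np.sqrt(MS_res * (1 / n + (x0 - xbar) ** 2 / Sxx))
    t_val = stats.t.ppf(1 - alpha / 2, df=n - 2)
    return y0_hat - t_val * se, y0_hat + t_val * se

# Illustrative call; xbar = 13.36 is an assumed value, the rest are slide quantities.
print(mean_response_ci(x0=10.0, beta0_hat=2627.82, beta1_hat=-37.15,
                       xbar=13.36, Sxx=1106.56, MS_res=9244.59, n=20))
```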
• Example 2.6: The Rocket Propellant Data
• The interval width is a minimum at x0 = x̄ and widens as |x0 − x̄| increases.
• Extrapolation: the interval becomes very wide for x0 outside the range of the original data, where the fitted model may not hold.
2.5 Prediction of New Observations
• ŷ0 = β̂₀ + β̂₁x0 is the point estimate of the new value of the response y0.
• ψ = y0 − ŷ0 follows a normal distribution with mean 0 and variance
  Var(ψ) = Var(y0 − ŷ0) = σ²[1 + 1/n + (x0 − x̄)²/Sxx]
• The 100(1 − α)% confidence interval on a future observation at x0 (a prediction interval for the future observation y0):
  ŷ0 ± tα/2,n−2 √(MSRes(1 + 1/n + (x0 − x̄)²/Sxx))
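A companion helper for the prediction interval; note the extra "1" inside the square root compared with the mean-response interval. As before, xbar = 13.36 is an assumed illustrative value.

```python
import numpy as np
from scipy import stats

def prediction_interval(x0, beta0_hat, beta1_hat, xbar, Sxx, MS_res, n, alpha=0.05):
    """100(1-alpha)% prediction interval for a new observation y0 at x0."""
    y0_hat = beta0_hat + beta1_hat * x0
    se = np.sqrt(MS_res * (1 + 1 / n + (x0 - xbar) ** 2 / Sxx))   # extra "1" vs. mean-response CI
    t_val = stats.t.ppf(1 - alpha / 2, df=n - 2)
    return y0_hat - t_val * se, y0_hat + t_val * se

# Illustrative call with the slide quantities; xbar = 13.36 is an assumed value.
print(prediction_interval(x0=10.0, beta0_hat=2627.82, beta1_hat=-37.15,
                          xbar=13.36, Sxx=1106.56, MS_res=9244.59, n=20))
```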
• Example 2.7:
• The 100(1 − α)% confidence interval on y0
2.6 Coefficient of Determination
• The coefficient of determination:
  R² = SSR/SST = 1 − SSRes/SST
• The proportion of variation in y explained by the regressor x.
• 0 ≤ R² ≤ 1
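R² can be recovered from the summary quantities already quoted for the rocket propellant fit, since SST = SSR + SSRes; a short sketch:

```python
# Summary quantities quoted on the slides for the rocket propellant fit.
n = 20
beta1_hat, Sxy, MS_res = -37.15, -41112.65, 9244.59

SS_R = beta1_hat * Sxy           # regression sum of squares
SS_res = (n - 2) * MS_res        # residual sum of squares
SS_T = SS_R + SS_res             # total (corrected) sum of squares

R2 = SS_R / SS_T                 # equivalently 1 - SS_res / SS_T
print(f"R^2 = {R2:.4f}")         # ~ 0.90, matching the value reported for Example 2.1
```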
• In Example 2.1, R² = 0.9018: about 90.18% of the variability in strength is accounted for by the regression model.
• R² can be increased by adding terms to the model.
• For a simple regression model,
  E(R²) ≈ β̂₁²Sxx/(β̂₁²Sxx + σ²)
• E(R²) increases (decreases) as Sxx increases (decreases).
• R² does not measure the magnitude of the slope of the regression line. A large value of R² does not imply a steep slope.
• R² does not measure the appropriateness of the linear model.
2.7 Some Considerations in the Use of Regression
• Regression models are only suitable for interpolation over the range of the regressors, not for extrapolation.
• Important: the disposition of the x values. The slope is strongly influenced by remote values of x.
• Outliers and bad values can seriously disturb the least-squares fit (affecting the intercept and the residual mean square).
• Regression does not imply a cause-and-effect relationship.
• ŷ = 4.582 + 2.204x₁
• The t statistic for testing H0: β₁ = 0 for this model is t0 = 27.312, and R² = 0.9842.
• Future values of x may be unknown. For example, consider predicting the maximum daily load on an electric power generation system from a regression model relating the load to the maximum daily temperature.
2.8 Regression Through the Origin
• A no-intercept model is
  y = β₁x + ε
• Given (yᵢ, xᵢ), i = 1, 2, …, n, the least-squares estimator is
  β̂₁ = Σᵢ xᵢyᵢ/Σᵢ xᵢ², with MSRes = Σᵢ (yᵢ − ŷᵢ)²/(n − 1)
• The 100(1 − α)% confidence interval on β₁:
  β̂₁ ± tα/2,n−1 √(MSRes/Σᵢ xᵢ²)
• The 100(1 − α)% confidence interval on E(y|x0):
  ŷ0 ± tα/2,n−1 √(x0²MSRes/Σᵢ xᵢ²)
• The 100(1 − α)% prediction interval on y0:
  ŷ0 ± tα/2,n−1 √(MSRes(1 + x0²/Σᵢ xᵢ²))
• Misuse: fitting a no-intercept model when the data lie in a region of x-space remote from the origin.
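A minimal sketch of the no-intercept fit and its slope interval on synthetic data generated to pass near the origin (all values are illustrative assumptions); note the n − 1 degrees of freedom, since only one parameter is estimated.

```python
import numpy as np
from scipy import stats

# Synthetic data that truly pass near the origin (illustrative values only).
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=15)
y = 3.0 * x + rng.normal(0, 1.0, size=15)
n = len(x)

beta1_hat = np.sum(x * y) / np.sum(x**2)          # LS estimator for y = beta1*x + eps
resid = y - beta1_hat * x
MS_res = np.sum(resid**2) / (n - 1)               # note n-1 df (only one parameter)

# 95% CI on beta1 for the no-intercept model
t_val = stats.t.ppf(0.975, df=n - 1)
se_beta1 = np.sqrt(MS_res / np.sum(x**2))
print(beta1_hat - t_val * se_beta1, beta1_hat + t_val * se_beta1)
```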
• The residual mean square, MSRes, is a better basis for comparing the intercept and no-intercept models.
• Generally R² is not a good comparative statistic for the two models.
  – For the intercept model,
    R² = Σᵢ (ŷᵢ − ȳ)²/Σᵢ (yᵢ − ȳ)²
  – For the no-intercept model,
    R0² = Σᵢ ŷᵢ²/Σᵢ yᵢ²
  – Occasionally R0² > R², but MS0,Res < MSRes.
• Example 2.8: The Shelf-Stocking Data
2.9 Estimation by Maximum Likelihood
• Assume that the errors are NID(0, σ²). Then yᵢ ~ N(β₀ + β₁xᵢ, σ²).
• The likelihood function:
  L(β₀, β₁, σ²) = (2πσ²)^(−n/2) exp[−(1/(2σ²)) Σᵢ (yᵢ − β₀ − β₁xᵢ)²]
• Maximizing ln L gives the maximum-likelihood estimators
  β̃₁ = Sxy/Sxx, β̃₀ = ȳ − β̃₁x̄, σ̃² = Σᵢ (yᵢ − β̃₀ − β̃₁xᵢ)²/n
  – The MLEs of β₀ and β₁ coincide with the least-squares estimators; the MLE of σ² divides by n rather than n − 2.
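A sketch that maximizes this likelihood numerically (via scipy.optimize on the negative log-likelihood, with synthetic data and arbitrary true parameters) and compares the result with the closed-form least-squares fit; the slope and intercept agree, while σ̃² = SSRes/n differs from MSRes = SSRes/(n − 2).

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic data with arbitrary illustrative parameters.
rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=30)
y = 5.0 + 1.2 * x + rng.normal(0, 0.8, size=30)
n = len(x)

def neg_log_lik(params):
    b0, b1, log_sigma = params
    sigma = np.exp(log_sigma)                       # keep sigma > 0
    resid = y - b0 - b1 * x
    return 0.5 * n * np.log(2 * np.pi * sigma**2) + np.sum(resid**2) / (2 * sigma**2)

res = minimize(neg_log_lik, x0=[0.0, 0.0, 0.0], method="Nelder-Mead")
b0_ml, b1_ml, sigma_ml = res.x[0], res.x[1], np.exp(res.x[2])

# Closed-form least-squares fit for comparison.
Sxx = np.sum((x - x.mean())**2)
Sxy = np.sum((x - x.mean()) * y)
b1_ls = Sxy / Sxx
b0_ls = y.mean() - b1_ls * x.mean()
ss_res = np.sum((y - b0_ls - b1_ls * x)**2)

print("MLE :", b0_ml, b1_ml, "   LSE:", b0_ls, b1_ls)
print("sigma2_MLE =", sigma_ml**2, "  SS_Res/n =", ss_res / n, "  MS_Res =", ss_res / (n - 2))
```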
• MLE vs. LSE:
  – In general, MLEs have better statistical properties than LSEs.
  – MLEs are unbiased (or asymptotically unbiased) and have minimum variance when compared to all other unbiased estimators.
  – They are also consistent estimators.
  – They are a set of sufficient statistics.
  – MLE requires more stringent statistical assumptions than LSE.
  – LSE needs only the second-moment assumptions.
  – MLE requires a full distributional assumption.
2.10 Case Where the Regressor x Is Random
2.10.1 x and y Jointly Distributed
• x and y are jointly distributed random variables, and this joint distribution is unknown.
• All of our previous results hold if
  – y|x ~ N(β₀ + β₁x, σ²)
  – the x's are independent random variables whose probability distribution does not involve β₀, β₁, σ²
2.10.2 x and y Jointly Normally Distributed: the Correlation Model
• Assume (x, y) has a bivariate normal distribution with means μx, μy, variances σx², σy², and correlation coefficient ρ.
• The conditional distribution of y given x is normal with
  E(y|x) = β₀ + β₁x, where β₁ = ρσy/σx and β₀ = μy − β₁μx
  Var(y|x) = σy²(1 − ρ²)
• The estimator of ρ is the sample correlation coefficient
  r = Sxy/√(Sxx·SST)
  – For simple linear regression, r² = R².
• Test on ρ (H0: ρ = 0 vs. H1: ρ ≠ 0):
  t0 = r√(n − 2)/√(1 − r²) ~ tn−2 under H0
• 100(1 − α)% C.I. for ρ, based on the Fisher z transform z = arctanh(r), which is approximately N(arctanh(ρ), 1/(n − 3)):
  tanh[arctanh(r) − zα/2/√(n − 3)] ≤ ρ ≤ tanh[arctanh(r) + zα/2/√(n − 3)]
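A sketch of these correlation-model calculations on synthetic bivariate data (the data and sample size are illustrative assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(size=40)
y = 0.7 * x + rng.normal(scale=0.7, size=40)   # correlated pair
n = len(x)

r = np.corrcoef(x, y)[0, 1]                    # sample correlation (= Sxy / sqrt(Sxx*SST))

# t-test of H0: rho = 0
t0 = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
p_value = 2 * stats.t.sf(abs(t0), df=n - 2)

# 95% CI for rho via Fisher's z transform
z = np.arctanh(r)
half = stats.norm.ppf(0.975) / np.sqrt(n - 3)
ci = np.tanh([z - half, z + half])
print(f"r = {r:.3f}, t0 = {t0:.2f}, p = {p_value:.4f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```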
• Example 2.9: The Delivery Time Data