Uploaded by Sơn Nguyễn

Slide FU - W9-Linear Regression

advertisement
PROBABILITY & STATISTICS
Empirical models
Simple Linear
Regression
Estimating σ2
Hypothesis
tests
Confidence
intervals
Prediction
Adequacy
Chapter 11: Simple Linear Regression and
Correlation
Learning objectives
1. Empirical Models
2. Simple Linear Regression
3. Correlation
Correlation
Summary
19/05/2021
Department of Mathematics
1/40
Empirical Models
Empirical models
models
Empirical
Simple Linear
Regression
Estimating σ2
Hypothesis
tests
Confidence
intervals
Prediction
Adequacy
Correlation
• Many problems in engineering and science involve
exploring the relationships between two or more
variables.
• Regression analysis is a statistical technique that is
very useful for these types of problems.
• For example, in a chemical process, suppose that the
yield of the product is related to the process-operating
temperature.
• Regression analysis can be used to build a model to
predict yield at a given temperature level.
Summary
19/05/2021
Department of Mathematics
2/40
Empirical Models
Empirical models
models
Empirical
Simple Linear
Regression
Estimating σ2
Hypothesis
tests
Confidence
intervals
Prediction
Adequacy
Correlation
Summary
19/05/2021
Department of Mathematics
3/40
Empirical Models
Empirical models
models
Empirical
Simple Linear
Regression
Estimating σ2
Hypothesis
tests
Confidence
intervals
Prediction
Adequacy
Correlation
Summary
19/05/2021
Figure 11-1 Scatter Diagram of oxygen purity versus
hydrocarbon level from Table 11-1.
Department of Mathematics
4/40
Empirical Models
Empirical models
models
Empirical
Simple Linear
Regression
Estimating σ2
Hypothesis
tests
Confidence
intervals
Prediction
Adequacy
Correlation
Based on the scatter diagram, it is probably reasonable
to assume that the mean of the random variable Y is
related to x by the following straight-line relationship:
E(Y | x) = µY | x = β0 + β1 x
regression coefficients.
The simple linear regression model is given by
Y = β 0 + β1 x + ε
Summary
random error
19/05/2021
Department of Mathematics
5/40
Empirical Models
Empirical models
models
Empirical
Simple Linear
Regression
Estimating σ2
Hypothesis
tests
Confidence
intervals
Prediction
Adequacy
Correlation
Summary
19/05/2021
Suppose that the mean and variance of ε are 0 and σ2,
respectively, then
E(Y | x) = E(β0 + β1x + ε ) = β0 + β1x + E(ε ) = β0 + β1x
The variance of Y given x is
V (Y | x) = V (β0 + β1x +ε ) =V (β0 + β1x) +V (ε ) = 0 +σ 2 = σ 2
The true regression model is a line of mean values:
µY | x = β 0 + β1 x
Department of Mathematics
6/40
Empirical Models
Empirical models
models
Empirical
Simple Linear
Regression
Estimating σ2
Hypothesis
tests
Confidence
intervals
Prediction
Adequacy
Correlation
Summary
19/05/2021
Figure 11-2 The distribution of Y for a given value of x
for the oxygen purity-hydrocarbon data.
Department of Mathematics
7/40
Simple Linear Regression
Empirical models
Simple Linear
Linear
Simple
Regression
Regression
Estimating σ2
Hypothesis
tests
Confidence
intervals
Prediction
Adequacy
Correlation
Summary
19/05/2021
• The case of simple linear regression considers a single
regressor or predictor x and a dependent or response
variable Y.
• The expected value of Y at each level of x is a random
variable:
E (Y | x) = β 0 + β1 x
• We assume that each observation, Y, can be described
by the model
Y = β 0 + β1 x + ε
Department of Mathematics
8/40
Simple Linear Regression
Empirical models
Simple Linear
Linear
Simple
Regression
Regression
Estimating σ2
Hypothesis
tests
Confidence
intervals
Prediction
Adequacy
Correlation
Summary
19/05/2021
Suppose that we have n pairs of observations (x1, y1), (x2,
y2), …, (xn, yn):
yi = β0 + β1xi +εi , i =1,...,n
Figure 11-3
Deviations of the
data from the
estimated
regression model.
Department of Mathematics
9/40
Simple Linear Regression
Empirical models
Simple Linear
Linear
Simple
Regression
Regression
Estimating σ2
Hypothesis
tests
Confidence
intervals
Prediction
Adequacy
Correlation
Summary
19/05/2021
The method of least squares is used to estimate the
parameters, β0 and β1 by minimizing the sum of the
squares of the vertical deviations in Figure 11-3.
Figure 11-3
Deviations of the
data from the
estimated
regression model.
Department of Mathematics
10/40
Simple Linear Regression
Empirical models
Simple Linear
Linear
Simple
Regression
Regression
Estimating σ2
Hypothesis
tests
Confidence
intervals
Prediction
Adequacy
The sum of the squares of the deviations of the observations
from the true regression line is
n
n
i =1
i =1
L = ∑ ε i2 = ∑ ( yi − β 0 − β1 x)
2
Correlation
Summary
19/05/2021
Department of Mathematics
11/40
Simple Linear Regression
Empirical models
Simplifying these two equations yields
Simple Linear
Linear
Simple
Regression
Regression
Estimating σ2
Hypothesis
tests
Confidence
intervals
Prediction
Adequacy
Correlation
Summary
19/05/2021
βˆ0 , βˆ1 = ?
Notation
 n  n 
 ∑ xi  ∑ yi 
n
n
S xy = ∑ ( yi − y )( xi − x ) = ∑ xi yi −  i =1  i =1 
n
i =1
i =1
Department of Mathematics
12/40
Simple Linear Regression
Empirical models
Simple Linear
Linear
Simple
Regression
Regression
Estimating σ2
Hypothesis
tests
Confidence
intervals
Prediction
Adequacy
Correlation
Theorem
The least squares estimates of the intercept and slope in
the simple linear regression model are
βˆ 0 = y − βˆ1 x
S xy
ˆ
β1 =
S xx
Estimated regression line is
yˆ = βˆ0 + βˆ1 x
Summary
19/05/2021
Department of Mathematics
13/40
Simple Linear Regression
Empirical models
Simple Linear
Linear
Simple
Regression
Regression
Estimating σ2
Hypothesis
tests
Confidence
intervals
Prediction
Adequacy
Correlation
Summary
19/05/2021
Department of Mathematics
14/40
Simple Linear Regression
Empirical models
Simple Linear
Linear
Simple
Regression
Regression
Example
We will fit a simple linear regression model to the oxygen
purity data in Table 11-1. The following quantities may be
computed:
Estimating σ2
Hypothesis
tests
Confidence
intervals
Prediction
Adequacy
Correlation
Summary
19/05/2021
Department of Mathematics
15/40
Simple Linear Regression
Empirical models
Simple Linear
Linear
Simple
Regression
Regression
Estimating σ2
Hypothesis
tests
Confidence
intervals
Prediction
Adequacy
Therefore, the least squares estimates of the slope and
intercept are
Correlation
Summary
19/05/2021
Department of Mathematics
16/40
Simple Linear Regression
Empirical models
The fitted simple linear regression model is
Simple Linear
Linear
Simple
Regression
Regression
Estimating σ2
Hypothesis
tests
Confidence
intervals
Prediction
Adequacy
Correlation
Summary
19/05/2021
Department of Mathematics
17/40
Simple Linear Regression
Empirical models
Simple Linear
Regression
Estimating σσ22
Estimating
Hypothesis
tests
Confidence
intervals
Prediction
Adequacy
Estimating σ2
Estimating σ2
The error sum of squares is
We have
E(SSE) = (n – 2)σ2.
SSE = SST − β̂1Sxy
Correlation
Summary
19/05/2021
Department of Mathematics
18/40
Estimating σ2
Simple Linear Regression
Empirical models
Simple Linear
Regression
Estimating σσ22
Estimating
Hypothesis
tests
Confidence
intervals
Prediction
Adequacy
Estimating σ2
Theorem
An unbiased estimator of σ2 is
σˆ 2 =
where
SS E
n−2
σˆ
Standard error
SSE = SST − β̂1Sxy
Correlation
Summary
19/05/2021
Department of Mathematics
19/40
Hypothesis Tests in Simple Linear Regression
Empirical models
Test on the β1
Simple Linear
Regression
Estimating σ2
Hypothesis
Hypothesis
tests
tests
Confidence
intervals
Prediction
Adequacy
Correlation
Summary
19/05/2021
Test statistic
has the t distribution with n - 2 degrees of freedom.
If |t0| > tα/2, n-2 : reject H0
If |t0| < tα/2, n-2 : fail to reject H0
Department of Mathematics
20/40
Hypothesis Tests in Simple Linear Regression
Empirical models
Simple Linear
Regression
Estimating σ2
Hypothesis
Hypothesis
tests
tests
Confidence
intervals
Prediction
Adequacy
Correlation
Test on the β1
An important special case
These hypotheses relate to the significance of regression.
Failure to reject H0 is equivalent to concluding that there
is no linear relationship between x and Y.
Summary
19/05/2021
Department of Mathematics
21/40
Hypothesis Tests in Simple Linear Regression
Empirical models
Test on the β1
Simple Linear
Regression
Estimating σ2
Hypothesis
Hypothesis
tests
tests
Confidence
intervals
Prediction
Adequacy
Correlation
Summary
19/05/2021
Figure 11-5 The hypothesis H0: β1 = 0 is not rejected.
Department of Mathematics
22/40
Hypothesis Tests in Simple Linear Regression
Empirical models
Test on the β1
Simple Linear
Regression
Estimating σ2
Hypothesis
Hypothesis
tests
tests
Confidence
intervals
Prediction
Adequacy
Correlation
Summary
19/05/2021
Figure 11-6 The hypothesis H0: β1 = 0 is rejected.
Department of Mathematics
23/40
Hypothesis Tests in Simple Linear Regression
Empirical models
Simple Linear
Regression
Estimating σ2
Hypothesis
Hypothesis
tests
tests
Confidence
intervals
Prediction
Adequacy
Example
We will test for significance of regression using the model
for the oxygen purity data from Table 11-1. The hypotheses
are
and we will use α = 0.01.
Recall
Correlation
Summary
19/05/2021
Test statistic
Department of Mathematics
24/40
Hypothesis Tests in Simple Linear Regression
Empirical models
Simple Linear
Regression
Estimating σ2
Hypothesis
Hypothesis
tests
tests
Confidence
intervals
Prediction
Adequacy
tα/2, n-2 = t0.005,18 = 2.88 < |t0|
Reject H0
Test on the β0
If |t0| > tα/2, n-2 : reject H0
If |t0| < tα/2, n-2 : fail to reject H0
Test statistic
Correlation
Summary
19/05/2021
Department of Mathematics
25/40
Confidence Intervals
Empirical models
Confidence Intervals on the Slope and Intercept
Simple Linear
Regression
Under the assumption that the observations are normally
and independently distributed, a 100(1-α)% confidence
interval on the slope β1 in simple linear regression is
Estimating σ2
Hypothesis
tests
Confidence
Confidence
intervals
intervals
Prediction
Adequacy
Correlation
Summary
19/05/2021
βˆ1 − tα / 2 , n − 2
σˆ 2
S xx
≤ β 1 ≤ βˆ1 + tα / 2 , n − 2
σˆ 2
S xx
Similarly, a 100(1-α)% confidence interval on the
intercept β0 is
βˆ0 − tα / 2,n −2
2
1 x2 


1
x
2
ˆ +t
ˆ
σˆ  +
β
β
σ
≤
≤
+
α / 2, n − 2
0
0



n
S
n
S
xx 
xx 


2
Department of Mathematics
26/40
Confidence Intervals
Empirical models
Simple Linear
Regression
Estimating σ2
Hypothesis
tests
Confidence
Confidence
intervals
intervals
Example
We will find a 95% confidence interval on the slope of the
regression line using the data in Table 11-1.
Recall
CI 95% for β1 :
Prediction
Adequacy
Correlation
Summary
19/05/2021
Department of Mathematics
27/40
Confidence Intervals
Empirical models
Confidence Interval on the Mean Response
Simple Linear
Regression
Estimating σ2
Hypothesis
tests
Confidence
Confidence
intervals
intervals
Prediction
Adequacy
Correlation
A 100(1-α)% confidence interval about the mean
response at the value of x=x0 is given by
µˆY|x − tα / 2,n−2
0
2


1
(
)
x
x
−
2
0
σˆ  +

S
n
xx


≤ µY|x0 ≤ µˆY |x0 + tα / 2,n−2
2


1
(
)
x
x
−
2
0
σˆ  +

n
S
xx


Summary
19/05/2021
Department of Mathematics
28/40
Confidence Intervals
Empirical models
Simple Linear
Regression
Estimating σ2
Hypothesis
tests
Confidence
Confidence
intervals
intervals
Example
We will find a 95% confidence interval about the mean
response for the data in Table 11-1.
The fitted model is
If we are interested in predicting mean oxygen
purity when x0 = 100% then
Prediction
Adequacy
Correlation
CI 95% on
Summary
19/05/2021
Department of Mathematics
29/40
Confidence Intervals
Empirical models
Simple Linear
Regression
Estimating σ2
Hypothesis
tests
Confidence
Confidence
intervals
intervals
Prediction
Adequacy
Correlation
Scatter diagram of
oxygen purity with
fitted regression line
and 95% confidence
limits on µY|x0.
Summary
19/05/2021
Department of Mathematics
30/40
Prediction of New Observations
Empirical models
Simple Linear
Regression
Estimating σ2
Hypothesis
tests
Confidence
intervals
Prediction
Prediction
Adequacy
A 100(1-α)% prediction interval on a future observation
Y0 at the value x0 is given by
yˆ 0 − tα / 2,n − 2
 1 ( x0 − x ) 2 
σˆ 1 + +

n
S
xx


2
≤ Y0 ≤ yˆ 0 + tα / 2,n − 2
 1 ( x0 − x ) 2 
σˆ 1 + +

n
S
xx


2
Return to Table 11.1, confidence interval 95% on
Y0 at x0 = 100%
Correlation
Summary
19/05/2021
Department of Mathematics
31/40
Prediction of New Observations
Empirical models
Simple Linear
Regression
Estimating σ2
Hypothesis
tests
Confidence
intervals
Prediction
Prediction
Adequacy
Correlation
Summary
19/05/2021
Scatter diagram of oxygen purity data from Table 11.1
with fitted regression line, 95% prediction limits, and
95% confidence limits on µY|x0.
Department of Mathematics
32/40
Adequacy of the Regression Model
Empirical models
Simple Linear
Regression
Estimating σ2
Hypothesis
tests
Confidence
intervals
Prediction
Adequacy
Adequacy
Correlation
•
Fitting a regression model requires several
assumptions.
1. Errors are uncorrelated random variables with
mean zero;
2. Errors have constant variance; and,
3. Errors be normally distributed.
•
The analyst should always consider the validity of
these assumptions to be doubtful and conduct
analyses to examine the adequacy of the model
Summary
19/05/2021
Department of Mathematics
33/40
Adequacy of the Regression Model
Empirical models
Coefficient of Determination (R2)
Simple Linear
Regression
• The quantity
Estimating σ2
Hypothesis
tests
Confidence
intervals
Prediction
Adequacy
Adequacy
Correlation
Summary
19/05/2021
is called the coefficient of determination and is often
used to judge the adequacy of a regression model.
• 0 ≤ R2 ≤ 1;
• We often refer to R2 as the amount of variability in the
data explained or accounted for by the regression model.
Department of Mathematics
34/40
Adequacy of the Regression Model
Empirical models
Simple Linear
Regression
Estimating σ2
Hypothesis
tests
Confidence
intervals
Prediction
Adequacy
Adequacy
Example
For the oxygen purity regression model,
R2 = SSR/SST
= 152.13/173.38
= 0.877
Thus, the model accounts for 87.7% of the variability in the
data.
Correlation
Summary
19/05/2021
Department of Mathematics
35/40
Correlation
Empirical models
Simple Linear
Regression
Estimating σ2
Hypothesis
tests
Confidence
intervals
Prediction
Adequacy
Definition
The sample correlation coefficient
n
R=
∑(X
i =1
n
∑(X
i =1
i
i
− X )(Yi − Y )
− X)
=
n
2
∑ (Y − Y )
i =1
2
S XY
S XX SST
i
Note that
Correlation
Correlation
Summary
19/05/2021
We may also write:
Department of Mathematics
36/40
Correlation
Empirical models
Simple Linear
Regression
Estimating σ2
Hypothesis
tests
Confidence
intervals
Prediction
Adequacy
Properties of the Linear Correlation Coefficient r
1. –1 ≤ r ≤ 1
2. The value of r does not change if all values of either
variable are converted to a different scale.
3. The value of r is not affected by the choice of x and y.
Interchange all x- and y-values and the value of r will
not change.
4. r measures strength of a linear relationship.
Correlation
Correlation
Summary
19/05/2021
Department of Mathematics
37/40
Correlation
y
y
r = −0.91
Empirical models
r = 0.88
Simple Linear
Regression
Estimating σ2
Hypothesis
tests
Confidence
intervals
Prediction
Adequacy
x
x
Strong negative correlation
y
Strong positive correlation
y
r = 0.07
r = 0.42
Correlation
Correlation
Summary
x
x
Weak positive correlation
19/05/2021
Nonlinear Correlation
Department of Mathematics
38/40
Correlation
Empirical models
Test on the ρ
Simple Linear
Regression
Estimating σ2
Hypothesis
tests
Confidence
intervals
Prediction
Adequacy
Correlation
Correlation
Test statistic
T0 =
R n−2
1− R2
has the t distribution with n - 2 degrees of freedom.
If |t0| > tα/2, n-2 : reject H0
If |t0| < tα/2, n-2 : fail to reject H0
Summary
19/05/2021
Department of Mathematics
39/40
Correlation
Empirical models
Test on the ρ
Simple Linear
Regression
Estimating σ2
Hypothesis
tests
Confidence
intervals
Prediction
Adequacy
Test statistic
Z 0 = (arctanh R − arctanh ρ0 ) n − 3
If |t0| > zα/2 : reject H0
If |t0| < zα/2 : fail to reject H0
Correlation
Correlation
Summary
19/05/2021
Department of Mathematics
40/40
SUMMARY
Empirical models
Simple Linear
Regression
Estimating σ2
Hypothesis
tests
Confidence
intervals
Prediction
Adequacy
We have studied:
1. Empirical Models
2. Simple Linear Regression
3. Correlation
Homework: Read slides of the next lecture.
Correlation
Summary
Summary
19/05/2021
Department of Mathematics
41/40
Download