Chapter 12: Linear Regression

Introduction
• Regression analysis and analysis of variance are the two most widely used statistical procedures.
• Regression analysis is used for:
– Description
– Prediction
– Estimation
12.1 Simple Linear Regression
• In (univariate) regression there is always a single "dependent" variable and one or more "independent" variables.
– Example: the number of nonconforming units may depend on the amount of time devoted to maintaining control charts.
• Simple denotes that a single independent variable is being used.
• Linear refers to the parameters, not to the independent variables.
$Y = \beta_0 + \beta_1 X + \varepsilon$   (12.1)
$Y = \beta_0' + \beta_1' X + \beta_{11} X^2 + \varepsilon$   (12.2)
• Y = 0 + 1X is the general form of the equation for a
straight line.
• Y = 𝛽0 + 1X +  indicates that there is not an exact
relationship between X and Y.
• Regression analysis is not used for variables that have an
9
exact linear relationship. 𝐹 = 5 𝐶 + 32
• 0 and 1 are generally unknown and must be estimated.
• The  is generally thought as an error term.
• Let Y denotes the number of non-conforming units
produced in each month, and X represents the amount of
time devoted to use QC charts each month.
Table 12.1 Quality Improvement Data

Month        Time Devoted to Quality Impr. (hours)   # of Nonconforming
January      56                                      20
February     58                                      19
March        55                                      20
April        62                                      16
May          63                                      15
June         68                                      14
July         66                                      15
August       68                                      13
September    70                                      10
October      67                                      13
November     72                                       9
December     74                                       8
[Figure 12.1 Scatter Plot]

[Figure 12.1a Scatter Plot]
• Regression equation: the line through the center of the points that minimizes the sum of the squared deviations from each point to the line (method of least squares).
• $\sum_{i=1}^{n} \varepsilon_i^2$ is to be minimized, where $\varepsilon_i = Y_i - \beta_0 - \beta_1 X_i$.
• The least-squares estimators are
$\hat{\beta}_1 = \dfrac{\sum XY - (\sum X)(\sum Y)/n}{\sum X^2 - (\sum X)^2/n}, \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$
• Carry enough digits in these sums to avoid round-off error.
• Prediction equation: $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$
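As a numerical check, here is a minimal sketch of these least-squares computations in Python with numpy, using the data of Table 12.1 (variable names are illustrative):

```python
import numpy as np

# Data from Table 12.1: hours devoted to quality improvement (X) and
# number of nonconforming units (Y), by month.
x = np.array([56, 58, 55, 62, 63, 68, 66, 68, 70, 67, 72, 74], dtype=float)
y = np.array([20, 19, 20, 16, 15, 14, 15, 13, 10, 13, 9, 8], dtype=float)
n = len(x)

# beta1-hat = (sum XY - (sum X)(sum Y)/n) / (sum X^2 - (sum X)^2/n)
b1 = (np.sum(x * y) - x.sum() * y.sum() / n) / (np.sum(x ** 2) - x.sum() ** 2 / n)
# beta0-hat = Ybar - beta1-hat * Xbar
b0 = y.mean() - b1 * x.mean()

print(f"Y-hat = {b0:.3f} + ({b1:.5f}) X")  # Y-hat = 55.923 + (-0.64067) X
```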
Regression output for the quality improvement data:

The regression equation is
Y = 55.9 - 0.641 X

Predictor   Coef      SE Coef   T        P
Constant    55.923    2.824     19.80    0.000
X           -0.64067  0.04332   -14.79   0.000

S = 0.888854   R-Sq = 95.6%   R-Sq(adj) = 95.2%

Analysis of Variance
Source          DF   SS       MS       F        P
Regression       1   172.77   172.77   218.67   0.000
Residual Error  10     7.90     0.79
Total           11   180.67
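Output of this form can be reproduced with standard regression software; the following is a sketch assuming the Python statsmodels library:

```python
import numpy as np
import statsmodels.api as sm

x = np.array([56, 58, 55, 62, 63, 68, 66, 68, 70, 67, 72, 74], dtype=float)
y = np.array([20, 19, 20, 16, 15, 14, 15, 13, 10, 13, 9, 8], dtype=float)

# Fit Y = b0 + b1*X by ordinary least squares; summary() reports the
# coefficients, standard errors, t statistics, R-sq, and the F test.
model = sm.OLS(y, sm.add_constant(x)).fit()
print(model.summary())
```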
• The prediction equation $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$ should only be used for values of X within the data range, or slightly outside that interval.
• Descriptive use:
– The fitted equation indicates a decrease of about 0.64 nonconforming units for every additional hour devoted to quality improvement.
12.2 Worth of the Prediction Equation
Obs    X      Y        Fit      SE Fit   Residual   St Resid
  1    56.0   20.000   20.046   0.464    -0.046     -0.06
  2    58.0   19.000   18.765   0.395     0.235      0.30
  3    55.0   20.000   20.687   0.500    -0.687     -0.93
  4    62.0   16.000   16.202   0.286    -0.202     -0.24
  5    63.0   15.000   15.561   0.270    -0.561     -0.66
  6    68.0   14.000   12.358   0.289     1.642      1.95
  7    66.0   15.000   13.639   0.261     1.361      1.60
  8    68.0   13.000   12.358   0.289     0.642      0.76
  9    70.0   10.000   11.077   0.338    -1.077     -1.31
 10    67.0   13.000   12.999   0.272     0.001      0.00
 11    72.0    9.000    9.795   0.400    -0.795     -1.00
 12    74.0    8.000    8.514   0.470    -0.514     -0.68
• Pure error: data points with the same X but different Y values constitute pure error, since a regression line cannot be vertical.
• Measure of the worth of the prediction equation:
$R^2 = 1 - \dfrac{\sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2}$   (12.4)
• $0 \le R^2 \le 1$
• Since $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$ and $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$,
$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X = (\bar{Y} - \hat{\beta}_1 \bar{X}) + \hat{\beta}_1 X = \bar{Y} + \hat{\beta}_1 (X - \bar{X})$
• If $\hat{\beta}_1 = 0$ (no linear relationship between X and Y), then $\hat{Y} = \bar{Y}$ and $R^2 = 0$.
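Equation (12.4) is easy to compute directly; a short sketch for the Table 12.1 data, assuming numpy (np.polyfit stands in for the hand computations above):

```python
import numpy as np

x = np.array([56, 58, 55, 62, 63, 68, 66, 68, 70, 67, 72, 74], dtype=float)
y = np.array([20, 19, 20, 16, 15, 14, 15, 13, 10, 13, 9, 8], dtype=float)

b1, b0 = np.polyfit(x, y, 1)          # least-squares slope and intercept
y_hat = b0 + b1 * x

# Equation (12.4): R^2 = 1 - SSE/SST
r_sq = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"R-sq = {r_sq:.3f}")           # about 0.956, matching the output above
```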
12.3 Assumptions
• The true relationship between X and Y can be adequately represented by the model
$Y = \beta_0 + \beta_1 X + \varepsilon$   (12.1)
• The errors should be independent.
• The errors are approximately normally distributed:
$\varepsilon \sim NID(0, \sigma^2)$
12.4 Checking Assumptions through Residual Plots
• The residuals should be plotted against
– X or $\hat{Y}$
– Time
– Any other potentially relevant variable
• In a well-behaved residual plot
– All points lie close to the midline
– The points form a tight cluster that can be enclosed in a rectangle
• If there are residual outliers, investigate them.
• If the error variance increases or decreases, the problem can be remedied by a transformation of X.
• If the plot is in the form of a parabola, an $X^2$ term would probably be needed.
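A minimal plotting sketch, assuming numpy and matplotlib; a well-behaved plot shows a patternless horizontal band centered on the midline:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([56, 58, 55, 62, 63, 68, 66, 68, 70, 67, 72, 74], dtype=float)
y = np.array([20, 19, 20, 16, 15, 14, 15, 13, 10, 13, 9, 8], dtype=float)

b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

plt.scatter(y_hat, y - y_hat)     # residuals plotted against fitted values
plt.axhline(0, linestyle="--")    # midline
plt.xlabel("Fitted value")
plt.ylabel("Residual")
plt.show()
```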
12.5 Confidence Intervals
• Assumption: normality of the error terms. Alternatives when normality is in doubt:
– Robust regression
– Nonparametric regression
• Confidence interval for $\beta_0$:
$\hat{\beta}_0 \pm t\,s\sqrt{\dfrac{\sum X^2}{n \sum (X - \bar{X})^2}}$
• Confidence interval for $\beta_1$:
$\hat{\beta}_1 \pm t\,\dfrac{s}{\sqrt{\sum (X - \bar{X})^2}}$
where $t = t_{\alpha/2,\,n-2}$ and $s = \sqrt{\dfrac{\sum (Y - \hat{Y})^2}{n-2}}$.
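A sketch of both intervals at the 95% level for the Table 12.1 data, assuming numpy and scipy:

```python
import numpy as np
from scipy import stats

x = np.array([56, 58, 55, 62, 63, 68, 66, 68, 70, 67, 72, 74], dtype=float)
y = np.array([20, 19, 20, 16, 15, 14, 15, 13, 10, 13, 9, 8], dtype=float)
n = len(x)

b1, b0 = np.polyfit(x, y, 1)
s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))  # residual std. dev.
sxx = np.sum((x - x.mean()) ** 2)
t = stats.t.ppf(1 - 0.05 / 2, df=n - 2)                  # t_{alpha/2, n-2}

hw0 = t * s * np.sqrt(np.sum(x ** 2) / (n * sxx))        # half-width for beta0
hw1 = t * s / np.sqrt(sxx)                               # half-width for beta1
print(f"beta0: {b0:.3f} +/- {hw0:.3f}")
print(f"beta1: {b1:.5f} +/- {hw1:.5f}")
```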
12.5 Hypothesis Test
• Hypothesis test of $\beta_1 = 0$:
$t = \dfrac{\hat{\beta}_1}{s_{\hat{\beta}_1}}$
where $s_{\hat{\beta}_1} = \dfrac{s}{\sqrt{\sum (X - \bar{X})^2}}$ and $s = \sqrt{\dfrac{\sum (Y - \hat{Y})^2}{n-2}}$.
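The same test as a short sketch (numpy and scipy assumed); the result should match the T and P columns of the regression output in Section 12.1:

```python
import numpy as np
from scipy import stats

x = np.array([56, 58, 55, 62, 63, 68, 66, 68, 70, 67, 72, 74], dtype=float)
y = np.array([20, 19, 20, 16, 15, 14, 15, 13, 10, 13, 9, 8], dtype=float)
n = len(x)

b1, b0 = np.polyfit(x, y, 1)
s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
s_b1 = s / np.sqrt(np.sum((x - x.mean()) ** 2))          # std. error of beta1

t_stat = b1 / s_b1
p = 2 * stats.t.sf(abs(t_stat), df=n - 2)                # two-sided p-value
print(f"t = {t_stat:.2f}, p = {p:.4f}")                  # t is about -14.79
```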
12.6 Prediction Interval for Y
$\hat{Y} \pm t\,s\sqrt{1 + \dfrac{1}{n} + \dfrac{(X_0 - \bar{X})^2}{\sum (X - \bar{X})^2}}$
where $t = t_{\alpha/2,\,n-2}$ and $s = \sqrt{\dfrac{\sum (Y - \hat{Y})^2}{n-2}}$.
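A sketch of this interval for the Table 12.1 data; the choice X0 = 65 is purely illustrative (a value inside the data range):

```python
import numpy as np
from scipy import stats

x = np.array([56, 58, 55, 62, 63, 68, 66, 68, 70, 67, 72, 74], dtype=float)
y = np.array([20, 19, 20, 16, 15, 14, 15, 13, 10, 13, 9, 8], dtype=float)
n = len(x)

b1, b0 = np.polyfit(x, y, 1)
s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
t = stats.t.ppf(0.975, df=n - 2)                # 95% interval

x0 = 65.0                                       # illustrative new X value
y0_hat = b0 + b1 * x0
hw = t * s * np.sqrt(1 + 1 / n + (x0 - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2))
print(f"prediction interval: {y0_hat:.2f} +/- {hw:.2f}")
```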
12.7 Regression Control Chart
• To monitor the dependent variable using a control chart approach.
• The center line is $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$.
• Control limits for Y:
$\hat{Y} \pm 2s$   (12.5)
$\hat{Y} \pm k\,s\sqrt{1 + \dfrac{1}{n} + \dfrac{(X_0 - \bar{X})^2}{\sum (X - \bar{X})^2}}$   (12.6)
where $k = 2$ or $3$ and $s = \sqrt{\dfrac{\sum (Y - \hat{Y})^2}{n-2}}$.
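A sketch of the limits in Equation (12.6) with k = 2, evaluated at a few illustrative X0 values for the Table 12.1 data:

```python
import numpy as np

x = np.array([56, 58, 55, 62, 63, 68, 66, 68, 70, 67, 72, 74], dtype=float)
y = np.array([20, 19, 20, 16, 15, 14, 15, 13, 10, 13, 9, 8], dtype=float)
n, k = len(x), 2

b1, b0 = np.polyfit(x, y, 1)
s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))

x0 = np.linspace(x.min(), x.max(), 5)            # illustrative X0 grid
center = b0 + b1 * x0                            # center line Y-hat
hw = k * s * np.sqrt(1 + 1 / n + (x0 - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2))
for xi, c, h in zip(x0, center, hw):
    print(f"X0 = {xi:5.1f}   LCL = {c - h:6.2f}   CL = {c:6.2f}   UCL = {c + h:6.2f}")
```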
12.8 Cause-Selecting Control Chart
• The general idea is to distinguish quality problems that occur at one stage in a process from problems that occur at a previous processing step.
• Let Y be the output from the second step and let X denote the output from the first step. The relationship between X and Y is then modeled.
12.9 Linear, Nonlinear, and Nonparametric Profiles
• Profile refers to the quality of a process or product being characterized by a (linear, nonlinear, or nonparametric) relationship between a response variable and one or more explanatory variables.
• A possible approach is to monitor each parameter in the model with a Shewhart chart.
– The independent variables must be fixed.
– A control chart for $R^2$ can also be used.
12.10 Inverse Regression
• An important application of simple linear regression in quality improvement is calibration.
• Assume two measuring tools are available: one is quite accurate but expensive to use, and the other is less expensive but also less accurate. If the measurements obtained from the two devices are highly correlated, then the measurement that would have been made with the expensive device can be predicted fairly well from the measurement made with the less expensive device.
• Let Y = measurement from the less expensive device,
X = measurement from the accurate device.
Classical estimation approach
• First regress Y on X to obtain $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$.
• Solve for X: $\hat{X} = (Y - \hat{\beta}_0)/\hat{\beta}_1$.
• For a known value $Y_c$ of Y, the estimate is $\hat{X}_c = (Y_c - \hat{\beta}_0)/\hat{\beta}_1$.
Inverse regression (X is regressed on Y)
• $\hat{X}^*_c = \hat{\beta}^*_0 + \hat{\beta}^*_1 Y_c$
• $\hat{X}_c = \hat{X}^*_c$ only if X and Y are perfectly correlated.
Example
Classical estimation approach
• Regressing Y on X gives $\hat{Y} = -0.1438 + 1.0208X$.
• $\hat{X}_c = (Y_c + 0.1438)/1.0208$
Inverse regression (X is regressed on Y)
• $\hat{X}^*_c = 0.1759 + 0.9655\,Y_c$
• At $Y_c = 2.2$: $\hat{X}_c = 2.296$ and $\hat{X}^*_c = 2.300$.

Y     X
2.3   2.4
2.5   2.6
2.4   2.5
2.8   2.9
2.9   3.0
2.6   2.7
2.4   2.5
2.2   2.3
2.1   2.2
2.7   2.7
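A short sketch reproducing both estimates from the ten calibration pairs (numpy assumed):

```python
import numpy as np

y = np.array([2.3, 2.5, 2.4, 2.8, 2.9, 2.6, 2.4, 2.2, 2.1, 2.7])  # cheap device
x = np.array([2.4, 2.6, 2.5, 2.9, 3.0, 2.7, 2.5, 2.3, 2.2, 2.7])  # accurate device

b1, b0 = np.polyfit(x, y, 1)    # Y on X: b0 = -0.1438, b1 = 1.0208
g1, g0 = np.polyfit(y, x, 1)    # X on Y: g0 =  0.1759, g1 = 0.9655

yc = 2.2
print(f"classical: {(yc - b0) / b1:.3f}")    # 2.296
print(f"inverse:   {g0 + g1 * yc:.3f}")      # 2.300
```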
12.11 Multiple Linear Regression
• In multiple regression, there is more than one
“independent” variable.
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$
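A minimal multiple-regression sketch assuming statsmodels; the two regressors and the response are made-up illustrative data, not from the text:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(30, 2))                    # regressors X1, X2
y = 5 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 1, 30)

# Fit Y = b0 + b1*X1 + b2*X2 by ordinary least squares.
model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.params)                                     # estimates of b0, b1, b2
```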
12.12 Issues in Multiple Regression
12.12.1 Variable Selection
• $R^2$ will virtually always increase when additional variables are added to a prediction equation.
• $Var(\hat{Y})$ also increases when new regressors are added.
• A commonly used statistic for determining the number of parameters is $C_p$:
$C_p = \dfrac{SSE_p}{\hat{\sigma}^2_{full}} - n + 2p$
where p is the number of parameters in the model, $SSE_p = \sum (Y - \hat{Y})^2$ is the residual sum of squares for the p-parameter model, and $\hat{\sigma}^2_{full}$ is the error variance estimate using all the available regressors.
• The idea is to look hard at those prediction equations for which $C_p$ is small and close to p.
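The Cp formula translates directly into code; in this sketch the argument values are made up for illustration, and in practice they would come from fitting the candidate model and the full model:

```python
def cp(sse_p: float, sigma2_full: float, n: int, p: int) -> float:
    """Mallows' Cp for a candidate model with p parameters."""
    return sse_p / sigma2_full - n + 2 * p

# Illustrative numbers: a 3-parameter candidate model on n = 20 observations.
print(cp(sse_p=14.2, sigma2_full=0.9, n=20, p=3))
```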
12.12.3 Multicollinear Data
• Problems occur when at least two of the regressors
are related in some manner.
• Solutions:
– Discard one or more variables causing the multicollinearity
– Use ridge regression
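A minimal ridge-regression sketch assuming scikit-learn; the nearly collinear regressors and the penalty alpha = 1.0 are illustrative (in practice alpha would be tuned, e.g. by cross-validation):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
x1 = rng.normal(size=50)
X = np.column_stack([x1, x1 + rng.normal(scale=0.01, size=50)])  # nearly collinear
y = 3 + x1 + rng.normal(scale=0.5, size=50)

# The ridge penalty shrinks the coefficients and stabilizes the estimates.
print(Ridge(alpha=1.0).fit(X, y).coef_)
```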
12.12.4 Residual Plots
• Residual plots are used extensively in multiple regression for checking the model assumptions.
• The residuals should generally be plotted against $\hat{Y}$, each of the regressors, time, and any potential regressor.
12.12.6 Transformations
• A regression model can often be improved by transforming one or more of the regressors, and possibly the dependent variable as well.
• Transformations can also often be used to convert a nonlinear regression model into a linear one.
• For example, $Y = \beta_0 \beta_1^X \varepsilon$ can be transformed into the linear model $\ln Y = \ln \beta_0 + X \ln \beta_1 + \ln \varepsilon$.
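A sketch of this linearization, assuming numpy; the data are generated from the model with illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(1, 10, 25)
y = 2.0 * 1.3 ** x * np.exp(rng.normal(scale=0.05, size=x.size))  # Y = b0 * b1^X * eps

# Regress ln(Y) on X; the intercept and slope estimate ln(b0) and ln(b1).
slope, intercept = np.polyfit(x, np.log(y), 1)
print(f"beta0 = {np.exp(intercept):.3f}, beta1 = {np.exp(slope):.3f}")  # near 2.0, 1.3
```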