Chapter 12: Linear Regression

Introduction

• Regression analysis and analysis of variance are the two most widely used statistical procedures.
• Regression analysis is used for:
  – Description
  – Prediction
  – Estimation

12.1 Simple Linear Regression

• In (univariate) regression there is always a single "dependent" variable and one or more "independent" variables.
  – For example, the number of nonconforming units may depend on the amount of time devoted to maintaining control charts.
• "Simple" denotes the fact that a single independent variable is being used.
• "Linear" refers to the parameters, not to the independent variables, so both of the following are linear models:

  Y = \beta_0 + \beta_1 X + \varepsilon                                   (12.1)
  Y = \beta_0' + \beta_1' X + \beta_{11} X^2 + \varepsilon                (12.2)

• Y = \beta_0 + \beta_1 X is the general form of the equation for a straight line.
• Y = \beta_0 + \beta_1 X + \varepsilon indicates that there is not an exact relationship between X and Y.
• Regression analysis is not used for variables that have an exact linear relationship, such as F = (9/5)C + 32.
• \beta_0 and \beta_1 are generally unknown and must be estimated.
• \varepsilon is generally thought of as an error term.
• Let Y denote the number of nonconforming units produced in a month, and let X represent the amount of time devoted to using QC charts that month.

Table 12.1 Quality Improvement Data

Month       Time Devoted to          Number of
            Quality Impr. (X)        Nonconforming Units (Y)
January            56                        20
February           58                        19
March              55                        20
April              62                        16
May                63                        15
June               68                        14
July               66                        15
August             68                        13
September          70                        10
October            67                        13
November           72                         9
December           74                         8

[Figure 12.1: Scatter plot of the data in Table 12.1]

[Figure 12.1a: Scatter plot]

• Regression equation: a line through the center of the points that minimizes the sum of the squared deviations from each point to the line (the method of least squares).
• That is, \sum_{i=1}^{n} \varepsilon_i^2 is minimized, where \varepsilon_i = Y_i - \beta_0 - \beta_1 X_i. The least squares estimators are

  \hat{\beta}_1 = \frac{\sum XY - (\sum X)(\sum Y)/n}{\sum X^2 - (\sum X)^2/n}
  \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}

• Carry enough digits in these computations to avoid round-off error.
• Prediction equation:  \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X
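As a numerical check, the following minimal sketch (Python with numpy; the language and library are this writeup's assumption, not part of the original slides) computes the least squares estimates from the Table 12.1 data. It reproduces the coefficients in the Minitab output shown next.

```python
# Least squares fit of the Table 12.1 quality improvement data.
# The computational formulas above are used directly rather than a
# library fitting routine.
import numpy as np

X = np.array([56, 58, 55, 62, 63, 68, 66, 68, 70, 67, 72, 74], dtype=float)
Y = np.array([20, 19, 20, 16, 15, 14, 15, 13, 10, 13, 9, 8], dtype=float)
n = len(X)

# Slope: (Sum XY - (Sum X)(Sum Y)/n) / (Sum X^2 - (Sum X)^2/n)
b1 = (np.sum(X * Y) - np.sum(X) * np.sum(Y) / n) / \
     (np.sum(X ** 2) - np.sum(X) ** 2 / n)
# Intercept: Ybar - b1 * Xbar
b0 = Y.mean() - b1 * X.mean()

# Goodness-of-fit quantities used later in the chapter
resid = Y - (b0 + b1 * X)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))   # residual standard error
r2 = 1 - np.sum(resid ** 2) / np.sum((Y - Y.mean()) ** 2)

print(f"Y-hat = {b0:.3f} + ({b1:.5f})X")        # 55.923 + (-0.64067)X
print(f"s = {s:.4f}, R-sq = {100 * r2:.1f}%")   # s ~ 0.8889, R-sq ~ 95.6%
```

(np.polyfit(X, Y, 1) would return the same two coefficients.)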
• Minitab regression output for the Table 12.1 data:

  The regression equation is
  Y = 55.9 - 0.641 X

  Predictor    Coef       SE Coef    T        P
  Constant     55.923     2.824      19.80    0.000
  X            -0.64067   0.04332    -14.79   0.000

  S = 0.888854    R-Sq = 95.6%    R-Sq(adj) = 95.2%

  Analysis of Variance
  Source           DF   SS       MS       F        P
  Regression        1   172.77   172.77   218.67   0.000
  Residual Error   10     7.90     0.79
  Total            11   180.67

• The prediction equation should only be used for values of X within the range of the data, or only slightly outside that interval.
• Used descriptively, \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X indicates a decrease of about 0.64 nonconforming units for every additional hour devoted to quality improvement.

12.2 Worth of the Prediction Equation

Obs    X      Y        Fit      SE Fit   Residual   St Resid
  1   56.0   20.000   20.046    0.464     -0.046      -0.06
  2   58.0   19.000   18.765    0.395      0.235       0.30
  3   55.0   20.000   20.687    0.500     -0.687      -0.93
  4   62.0   16.000   16.202    0.286     -0.202      -0.24
  5   63.0   15.000   15.561    0.270     -0.561      -0.66
  6   68.0   14.000   12.358    0.289      1.642       1.95
  7   66.0   15.000   13.639    0.261      1.361       1.60
  8   68.0   13.000   12.358    0.289      0.642       0.76
  9   70.0   10.000   11.077    0.338     -1.077      -1.31
 10   67.0   13.000   12.999    0.272      0.001       0.00
 11   72.0    9.000    9.795    0.400     -0.795      -1.00
 12   74.0    8.000    8.514    0.470     -0.514      -0.68

• Pure error: data points with the same X but different Y values constitute pure error, since the regression line cannot be vertical.
• A measure of the worth of the prediction equation is

  R^2 = 1 - \frac{\sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2}             (12.4)

• 0 \le R^2 \le 1.
• Since \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X and \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X},

  \hat{Y} = (\bar{Y} - \hat{\beta}_1 \bar{X}) + \hat{\beta}_1 X = \bar{Y} + \hat{\beta}_1 (X - \bar{X})

• If \hat{\beta}_1 = 0 (no linear relationship between X and Y), then \hat{Y} = \bar{Y} and R^2 = 0.

12.3 Assumptions

• The true relationship between X and Y can be adequately represented by the model in (12.1).
• The errors are independent.
• The errors are approximately normally distributed: \varepsilon \sim NID(0, \sigma^2).

12.4 Checking Assumptions through Residual Plots

• The residuals should be plotted against:
  – X or \hat{Y}
  – Time
  – Any other potentially relevant variable
• In a well-behaved residual plot:
  – All points lie close to the midline
  – The points form a tight cluster that can be enclosed in a rectangle
• Any residual outliers should be investigated.
• If the error variance increases or decreases with X, the problem can often be remedied by a transformation.
• If the residuals form a parabola, an X^2 term is probably needed in the model.

[Figure: residual plot]

12.5 Confidence Intervals

• These intervals assume normality of the error terms. If that assumption is untenable, alternatives include:
  – Robust regression
  – Nonparametric regression
• Confidence interval for \beta_0:

  \hat{\beta}_0 \pm t\, s\, \sqrt{\frac{\sum X^2}{n \sum (X - \bar{X})^2}}

• Confidence interval for \beta_1:

  \hat{\beta}_1 \pm t\, \frac{s}{\sqrt{\sum (X - \bar{X})^2}}

  where t = t_{\alpha/2,\, n-2} and s = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}}.

12.5 Hypothesis Test

• To test \beta_1 = 0, use

  t = \frac{\hat{\beta}_1}{s_{\hat{\beta}_1}}

  where s_{\hat{\beta}_1} = \frac{s}{\sqrt{\sum (X - \bar{X})^2}} and s = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}}.

12.6 Prediction Interval for Y

  \hat{Y} \pm t\, s\, \sqrt{1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum (X - \bar{X})^2}}

  where t = t_{\alpha/2,\, n-2} and s = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}}.

[Figure: prediction interval for Y]
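A minimal sketch of computing this interval for the quality improvement data (Python with numpy and scipy, both assumed available; the new value X0 = 65 is hypothetical, chosen only for illustration):

```python
# 95% prediction interval for a single future Y at X0, per the formula above.
import numpy as np
from scipy import stats

X = np.array([56, 58, 55, 62, 63, 68, 66, 68, 70, 67, 72, 74], dtype=float)
Y = np.array([20, 19, 20, 16, 15, 14, 15, 13, 10, 13, 9, 8], dtype=float)
n = len(X)

b1 = (np.sum(X * Y) - np.sum(X) * np.sum(Y) / n) / \
     (np.sum(X ** 2) - np.sum(X) ** 2 / n)
b0 = Y.mean() - b1 * X.mean()
s = np.sqrt(np.sum((Y - (b0 + b1 * X)) ** 2) / (n - 2))

X0 = 65.0                               # hypothetical new number of hours
t = stats.t.ppf(1 - 0.05 / 2, n - 2)    # t_{alpha/2, n-2}
half_width = t * s * np.sqrt(1 + 1 / n +
                             (X0 - X.mean()) ** 2 / np.sum((X - X.mean()) ** 2))
y0_hat = b0 + b1 * X0
print(f"95% PI: {y0_hat:.2f} +/- {half_width:.2f}")
```

The "1 +" under the square root is what makes this a prediction interval for an individual future observation; dropping it would give the narrower confidence interval for the mean response at X0.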
12.7 Regression Control Chart

• Used to monitor the dependent variable with a control chart approach.
• The center line is \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X.
• Control limits for Y:

  \hat{Y} \pm 2s                                                          (12.5)

  \hat{Y} \pm k\, s\, \sqrt{1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum (X - \bar{X})^2}}   (12.6)

  where k = 2 or 3 and s = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}}.

12.8 Cause-Selecting Control Chart

• The general idea is to distinguish quality problems that occur at one stage in a process from problems that occurred at a previous processing step.
• Let Y be the output from the second step and let X denote the output from the first step. The relationship between X and Y is then modeled.

12.9 Linear, Nonlinear, and Nonparametric Profiles

• "Profile" refers to the quality of a process or product being characterized by a (linear, nonlinear, or nonparametric) relationship between a response variable and one or more explanatory variables.
• One possibility is to monitor each parameter in the model with a Shewhart chart.
  – The independent variables must be fixed.
  – A control chart for R^2 can also be used.

12.10 Inverse Regression

• An important application of simple linear regression to quality improvement is calibration.
• Assume two measuring tools are available: one is quite accurate but expensive to use, and the other is less expensive but also less accurate. If the measurements obtained from the two devices are highly correlated, the measurement that would have been made with the expensive device can be predicted fairly well from the measurement made with the less expensive device.
• Let Y = the measurement from the less expensive device and X = the measurement from the accurate device.

Classical estimation approach:
• First regress Y on X to obtain \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X.
• Solve for X: \hat{X} = (Y - \hat{\beta}_0)/\hat{\beta}_1.
• For a known value Y_c of Y, the equation is \hat{X}_c = (Y_c - \hat{\beta}_0)/\hat{\beta}_1.

Inverse regression (X is regressed on Y):
• \hat{X}^*_c = \hat{\beta}^*_0 + \hat{\beta}^*_1 Y_c
• \hat{X}_c = \hat{X}^*_c only if X and Y are perfectly correlated.

Example. The ten paired measurements below are used:

  Y     X
  2.3   2.4
  2.5   2.6
  2.4   2.5
  2.8   2.9
  2.9   3.0
  2.6   2.7
  2.4   2.5
  2.2   2.3
  2.1   2.2
  2.7   2.7

• Classical estimation approach: regressing Y on X gives \hat{Y} = -0.1438 + 1.0208 X, so \hat{X}_c = (Y_c + 0.1438)/1.0208.
• Inverse regression: \hat{X}^*_c = 0.1759 + 0.9655 Y_c.
• At Y_c = 2.2, the two approaches give \hat{X}_c = 2.296 and \hat{X}^*_c = 2.300.

12.11 Multiple Linear Regression

• In multiple regression there is more than one "independent" variable:

  Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon

12.12 Issues in Multiple Regression

12.12.1 Variable Selection

• R^2 will virtually always increase when additional variables are added to a prediction equation.
• Var(\hat{Y}) also increases when new regressors are added.
• A commonly used statistic for determining the number of parameters is

  C_p = \frac{SSE_p}{\hat{\sigma}^2_{full}} - n + 2p

  where p is the number of parameters in the model, SSE_p = \sum (Y - \hat{Y})^2 is the residual sum of squares for the p-parameter model, and \hat{\sigma}^2_{full} is the error variance estimated using all the available regressors.
• The idea is to look hard at those prediction equations for which C_p is small and close to p.

12.12.3 Multicollinear Data

• Problems occur when at least two of the regressors are related in some manner.
• Solutions:
  – Discard one or more of the variables causing the multicollinearity
  – Use ridge regression

12.12.4 Residual Plots

• Residual plots are used extensively in multiple regression for checking the model assumptions.
• The residuals should generally be plotted against \hat{Y}, each of the regressors, time, and any potential regressor.

12.12.6 Transformations

• A regression model can often be improved by transforming one or more of the regressors, and possibly the dependent variable as well.
• A transformation can also often turn a nonlinear regression model into a linear one. For example,

  Y = \beta_0 \beta_1^X \varepsilon

  becomes linear after taking logarithms:

  \ln Y = \ln \beta_0 + X \ln \beta_1 + \ln \varepsilon
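A minimal sketch of this log transformation (Python with numpy; the data and the "true" parameter values 2.0 and 1.5 are simulated purely for illustration, since the slides give no data here):

```python
# Linearize Y = b0 * b1**X * e by taking logs, then fit by least squares.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(1.0, 10.0, 30)
# Multiplicative lognormal error, so ln(e) is normal as the linear model assumes
y = 2.0 * 1.5 ** x * np.exp(rng.normal(0.0, 0.05, x.size))

# ln Y = ln b0 + X ln b1 + ln e  is linear in the parameters ln b0 and ln b1
slope, intercept = np.polyfit(x, np.log(y), 1)
b0_hat, b1_hat = np.exp(intercept), np.exp(slope)
print(f"b0-hat = {b0_hat:.3f}, b1-hat = {b1_hat:.3f}")  # close to 2.0 and 1.5
```

Least squares on the log scale is appropriate here precisely because the error enters the model multiplicatively; if the error were additive, taking logs would not produce a model of the assumed linear form.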