Chapter 2: Simple Linear Regression

Regression and Model Building
• Regression analysis is a statistical technique for investigating and modeling the relationship between variables.
• Equation of a straight line (classical): y = mx + b. In regression we usually write this as y = β0 + β1x.
• Not all observations fall exactly on a straight line, so we write
  y = \beta_0 + \beta_1 x + \varepsilon
  where ε is a random error term: a random variable that accounts for the failure of the model to fit the data exactly.

2.1 Simple Linear Regression Model
• Single regressor x, response y. The population regression model is
  y = \beta_0 + \beta_1 x + \varepsilon
• β0 (intercept): if x = 0 is in the range of the data, β0 is the mean of the distribution of the response y when x = 0; if x = 0 is not in the range, β0 has no practical interpretation.
• β1 (slope): the change in the mean of the distribution of the response produced by a unit change in x.
• We usually assume ε ~ N(0, σ²).
• The response y is a random variable, so there is a probability distribution for y at each value of x:
  – Mean: E(y | x) = β0 + β1x
  – Variance: Var(y | x) = Var(β0 + β1x + ε) = σ²

2.2 Least-Squares Estimation of the Parameters
• β0 and β1 are unknown and must be estimated from the sample regression model
  y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, 2, \ldots, n
• Least-squares estimation minimizes the sum of squared differences between the observed responses y_i and the straight line:
  S(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2
• Let β̂0 and β̂1 denote the least-squares estimators of β0 and β1. They must satisfy
  \partial S / \partial \beta_0 \big|_{\hat{\beta}_0, \hat{\beta}_1} = -2 \sum_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0
  \partial S / \partial \beta_1 \big|_{\hat{\beta}_0, \hat{\beta}_1} = -2 \sum_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) x_i = 0
• Simplifying yields the least-squares normal equations:
  n \hat{\beta}_0 + \hat{\beta}_1 \sum_i x_i = \sum_i y_i
  \hat{\beta}_0 \sum_i x_i + \hat{\beta}_1 \sum_i x_i^2 = \sum_i x_i y_i
• Solving the normal equations yields the ordinary least-squares (OLS) estimators:
  \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
  \hat{\beta}_1 = \frac{\sum_i x_i y_i - (\sum_i x_i)(\sum_i y_i)/n}{\sum_i x_i^2 - (\sum_i x_i)^2/n}
• The fitted simple linear regression model is ŷ = β̂0 + β̂1x.
• Sum-of-squares notation:
  S_{xx} = \sum_i (x_i - \bar{x})^2 = \sum_i x_i^2 - (\sum_i x_i)^2/n
  S_{xy} = \sum_i y_i (x_i - \bar{x}) = \sum_i x_i y_i - (\sum_i x_i)(\sum_i y_i)/n
  so that β̂1 = S_xy / S_xx and β̂0 = ȳ − β̂1 x̄.

Example 2.1 – The Heights Data
• Let's calculate the least-squares estimators in R using the heights data manually first, then obtain the two estimators directly from the lm() function in R, and compare the two results (see the R sketch at the end of this section and the R code in the file "Chapter2_simple_lin.txt").

  HUSBAND (in cm) | WIFE (in cm)
  180             | 168
  160             | 154
  186             | 166
  163             | 162
  172             | 152
  192             | 179
  170             | 163
  174             | 172
  191             | 170
  182             | 170
  178             | 147
  181             | 165
  168             | 162
  ...             | ...

• Compute S_xx and S_xy in R, then
  \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x},
  which give the least-squares regression line ŷ = β̂0 + β̂1x.

2.2 Least-Squares Estimation of the Parameters (continued)
• Just because we can fit a linear model doesn't mean that we should. We need to consider the following seriously:
  – How well does this equation fit the data?
  – Is the model likely to be useful as a predictor?
  – Are any of the basic assumptions (such as constant variance and uncorrelated errors) violated, and if so, how serious is this?
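R sketch (Example 2.1). A minimal sketch of the manual least-squares calculation described above, using only the husband/wife pairs listed in the table; the vector names husband and wife are illustrative assumptions, and in class the full data set is read from the course file instead.

```r
# Hypothetical subset of the heights data: husband = regressor x, wife = response y.
husband <- c(180, 160, 186, 163, 172, 192, 170, 174, 191, 182, 178, 181, 168)
wife    <- c(168, 154, 166, 162, 152, 179, 163, 172, 170, 170, 147, 165, 162)

# Manual least-squares estimates via Sxx and Sxy
Sxx <- sum((husband - mean(husband))^2)              # Sxx = sum (x_i - xbar)^2
Sxy <- sum(wife * (husband - mean(husband)))         # Sxy = sum y_i (x_i - xbar)
beta1_hat <- Sxy / Sxx                               # slope estimate
beta0_hat <- mean(wife) - beta1_hat * mean(husband)  # intercept estimate
c(intercept = beta0_hat, slope = beta1_hat)

# The same estimates from lm(); the two sets of coefficients should agree
fit <- lm(wife ~ husband)
coef(fit)
```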
2.2 Least-Squares Estimation of the Parameters (continued)
• Residuals: e_i = y_i − ŷ_i.
• Residuals will be used to determine the adequacy of the model.
• Computer output (R) will be shown in class.
• You may explore some of the properties of OLS estimation with the heights data, for example:
  – the sum of all the fitted values,
  – the sum of the observed values,
  – the sum of the errors (residuals).
• We will learn more in the next lecture.

2.2.2 Properties of the Least-Squares Estimators and the Fitted Regression Model
• The ordinary least-squares (OLS) estimator of the slope is a linear combination of the observations y_i:
  \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \sum_{i=1}^{n} c_i y_i, \qquad c_i = \frac{x_i - \bar{x}}{S_{xx}}, \qquad \sum_{i=1}^{n} c_i = 0, \qquad \sum_{i=1}^{n} c_i x_i = 1
  This representation is useful in showing the expected value and variance properties.
• The least-squares estimators are unbiased estimators of their respective parameters:
  E(\hat{\beta}_1) = \beta_1, \qquad E(\hat{\beta}_0) = \beta_0
• The variances are
  Var(\hat{\beta}_1) = \frac{\sigma^2}{S_{xx}}, \qquad Var(\hat{\beta}_0) = \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right)
• The OLS estimators are Best Linear Unbiased Estimators (BLUE).
• Useful properties of the least-squares fit:
  1. Σ (y_i − ŷ_i) = Σ e_i = 0
  2. Σ y_i = Σ ŷ_i
  3. The least-squares regression line always passes through the centroid (x̄, ȳ) of the data.
  4. Σ x_i e_i = 0
  5. Σ ŷ_i e_i = 0

2.2.3 Estimation of σ²
• Residual (error) sum of squares:
  SS_{Res} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2 - \hat{\beta}_1 S_{xy} = SS_T - \hat{\beta}_1 S_{xy}
  where SS_T = Σ (y_i − ȳ)² is the total sum of squares.
• An unbiased estimator of σ² is
  \hat{\sigma}^2 = MS_{Res} = \frac{SS_{Res}}{n - 2}
• The quantity n − 2 is the number of degrees of freedom for the residual sum of squares.
• Because σ̂² depends on the residual sum of squares:
  – any violation of the assumptions on the model errors could damage the usefulness of this estimate,
  – a misspecification of the model can damage the usefulness of this estimate,
  – this estimate is model dependent.

2.3 Hypothesis Testing on the Slope and Intercept
• Three assumptions are needed to apply procedures such as hypothesis tests and confidence intervals. The model errors ε_i
  – are normally distributed,
  – are independently distributed,
  – have constant variance,
  i.e. ε_i ~ NID(0, σ²).

2.3.1 Use of t-Tests: Slope
• Standard error of the slope:
  se(\hat{\beta}_1) = \sqrt{\frac{MS_{Res}}{S_{xx}}}
• Reject H0 if |t0| > t_{α/2, n−2}, where t0 is the test statistic.
• Let's test the significance of linear regression using the heights data. First we calculate the test statistic and the critical value manually and compare them; alternatively we can use the P-value approach. Finally we compare the values we calculated manually with the ones returned by the lm() function (an R sketch follows below). Here t0 is the test statistic and the P-value is for testing β1 = 0.

2.3.2 Testing Significance of Regression
  H0: β1 = 0   versus   H1: β1 ≠ 0
• This tests the significance of regression; that is, is there a linear relationship between the response and the regressor?
• Failing to reject β1 = 0 implies that there is no linear relationship between y and x.
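R sketch (t-test for the slope). A minimal sketch of the manual test just described, continuing with the illustrative fit, husband, wife, and Sxx objects from the previous sketch; the same t statistic and P-value appear in the coefficient table of summary(fit).

```r
# Manual t-test of H0: beta1 = 0 (continues the previous sketch)
n      <- length(husband)
MSRes  <- sum(residuals(fit)^2) / (n - 2)       # estimate of sigma^2 = SS_Res / (n - 2)
se_b1  <- sqrt(MSRes / Sxx)                     # standard error of the slope
t0     <- unname(coef(fit)["husband"]) / se_b1  # test statistic for beta1 = 0
t_crit <- qt(0.975, df = n - 2)                 # critical value at alpha = 0.05
p_val  <- 2 * pt(abs(t0), df = n - 2, lower.tail = FALSE)
c(t0 = t0, critical = t_crit, p_value = p_val)

# Compare with the coefficient table that lm()/summary() return
summary(fit)$coefficients
```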
2.3.1 Use of t-Tests: Intercept
  H0: β0 = β00   versus   H1: β0 ≠ β00
• Standard error of the intercept:
  se(\hat{\beta}_0) = \sqrt{ MS_{Res} \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right) }
• Test statistic:
  t_0 = \frac{\hat{\beta}_0 - \beta_{00}}{se(\hat{\beta}_0)}
• Reject H0 if |t0| > t_{α/2, n−2}.
• We can also use the P-value approach.

2.3.3 The Analysis-of-Variance Approach
• Partitioning of the total variability:
  y_i - \bar{y} = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i)
  \sum_i (y_i - \bar{y})^2 = \sum_i (\hat{y}_i - \bar{y})^2 + \sum_i (y_i - \hat{y}_i)^2 + 2 \sum_i (\hat{y}_i - \bar{y})(y_i - \hat{y}_i)
  The cross-product term is zero, so
  \sum_i (y_i - \bar{y})^2 = \sum_i (\hat{y}_i - \bar{y})^2 + \sum_i (y_i - \hat{y}_i)^2, \qquad \text{i.e.} \qquad SS_T = SS_R + SS_{Res}
• It can be shown that SS_R = β̂1 S_xy.
• Degrees of freedom: SS_T has n − 1, SS_R has 1, and SS_Res has n − 2.
• Mean squares:
  MS_R = \frac{SS_R}{1}, \qquad MS_{Res} = \frac{SS_{Res}}{n - 2}
• ANOVA procedure for testing H0: β1 = 0:

  Source of Variation | Sum of Squares | DF    | MS     | F0
  Regression          | SS_R           | 1     | MS_R   | MS_R / MS_Res
  Residual            | SS_Res         | n − 2 | MS_Res |
  Total               | SS_T           | n − 1 |        |

• A large value of F0 indicates that the regression is significant; specifically, reject H0 if F0 > F_{α, 1, n−2}.
• We can also use the P-value approach.

Analysis-of-Variance Table for the Heights Data

  Source of Variation | Sum of Squares  | DF | MS            | F0
  Regression          | SS_R = 4497.9   | 1  | MS_R = 4497.9 | 126.97 (MS_R / MS_Res)
  Residual            | SS_Res = 3294.5 | 93 | MS_Res = 35.4 |
  Total               | SS_T = 7792.4   | 94 |               |

• The proportion of the total variability in the response variable (WIFE) explained by the regression model is
  R² = SS_R / SS_T = 4497.9 / 7792.4 ≈ 0.577, i.e. about 58%.
• Relationship between t0 and F0: for H0: β1 = 0 it can be shown that t0² = F0, so for testing significance of regression the t-test and the ANOVA procedure are equivalent (this is only true in simple linear regression).

Confidence Interval for a True Parameter
• Confidence interval = sample statistic ± margin of error, where
  – margin of error = critical value × standard deviation of the statistic (if you know the standard deviation),
  – margin of error = critical value × standard error of the statistic (otherwise).

2.4 Interval Estimation in Simple Linear Regression
• 100(1 − α)% confidence interval for the slope:
  \hat{\beta}_1 - t_{\alpha/2, n-2}\, se(\hat{\beta}_1) \le \beta_1 \le \hat{\beta}_1 + t_{\alpha/2, n-2}\, se(\hat{\beta}_1)
• 100(1 − α)% confidence interval for the intercept:
  \hat{\beta}_0 - t_{\alpha/2, n-2}\, se(\hat{\beta}_0) \le \beta_0 \le \hat{\beta}_0 + t_{\alpha/2, n-2}\, se(\hat{\beta}_0)
  where se(\hat{\beta}_1) = \sqrt{MS_{Res}/S_{xx}} and se(\hat{\beta}_0) = \sqrt{ MS_{Res} (1/n + \bar{x}^2/S_{xx}) }.
• 100(1 − α)% confidence interval for σ²:
  \frac{(n-2)\, MS_{Res}}{\chi^2_{\alpha/2, n-2}} \le \sigma^2 \le \frac{(n-2)\, MS_{Res}}{\chi^2_{1-\alpha/2, n-2}}
• Refer to "chapter_sample_lin.txt" for how to calculate a 95% confidence interval for σ² (an R sketch also follows below).
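R sketch (interval estimation). A minimal sketch of the confidence intervals above, again using the illustrative fit, n, MSRes, Sxx, and husband objects from the earlier sketches; confint() gives the coefficient intervals directly.

```r
# 95% confidence intervals for the slope and intercept, done manually
alpha <- 0.05
tcrit <- qt(1 - alpha / 2, df = n - 2)
se_b1 <- sqrt(MSRes / Sxx)
se_b0 <- sqrt(MSRes * (1 / n + mean(husband)^2 / Sxx))
unname(coef(fit)["husband"])     + c(-1, 1) * tcrit * se_b1  # CI for beta1
unname(coef(fit)["(Intercept)"]) + c(-1, 1) * tcrit * se_b0  # CI for beta0
confint(fit, level = 0.95)                                   # same intervals from R

# 95% confidence interval for sigma^2, based on the chi-square distribution
c(lower = (n - 2) * MSRes / qchisq(1 - alpha / 2, df = n - 2),
  upper = (n - 2) * MSRes / qchisq(alpha / 2,     df = n - 2))
```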
2.4.2 Interval Estimation of the Mean Response
• Let x0 be the level of the regressor variable at which we want to estimate the mean response, i.e. E(y | x0) = μ_{y|x0}.
• The point estimator of E(y | x0) once the model is fit is
  \hat{\mu}_{y|x_0} = \hat{\beta}_0 + \hat{\beta}_1 x_0
• In order to construct a confidence interval on the mean response, we need the variance of this point estimator. Writing the fitted value in terms of ȳ and β̂1 (which are independent),
  Var(\hat{\mu}_{y|x_0}) = Var(\hat{\beta}_0 + \hat{\beta}_1 x_0) = Var\!\left( \bar{y} + \hat{\beta}_1 (x_0 - \bar{x}) \right) = Var(\bar{y}) + (x_0 - \bar{x})^2 Var(\hat{\beta}_1) = \frac{\sigma^2}{n} + \frac{\sigma^2 (x_0 - \bar{x})^2}{S_{xx}} = \sigma^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right)
• 100(1 − α)% confidence interval for E(y | x0):
  \hat{\mu}_{y|x_0} \pm t_{\alpha/2, n-2} \sqrt{ MS_{Res} \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right) }
• See pages 34–35 of the text.

2.5 Prediction of New Observations
• Suppose we wish to construct a prediction interval on a future observation y0 corresponding to a particular level of x, say x0.
• The point estimate would be ŷ0 = β̂0 + β̂1 x0.
• The confidence interval on the mean response at this point is not appropriate for this situation. Why? Because y0 is a single future observation, not a mean, so its own variability must be accounted for as well.
• Let the random variable ψ = y0 − ŷ0. Then ψ is normally distributed with
  E(\psi) = 0, \qquad Var(\psi) = Var(y_0 - \hat{y}_0) = \sigma^2 \left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right)
  (since y0 is independent of ŷ0).
• 100(1 − α)% prediction interval on a future observation y0 at x0:
  \hat{y}_0 \pm t_{\alpha/2, n-2} \sqrt{ MS_{Res} \left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right) }
  (see the R sketch after Section 2.9 below).

2.6 Coefficient of Determination
• R², the coefficient of determination:
  R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_{Res}}{SS_T}
• It is the proportion of variation explained by the regressor x (see the rocket propellant example in the text).
• R² can be misleading!
  – Simply adding more terms to the model will increase R².
  – As the range of the regressor variable increases (decreases), R² generally increases (decreases).
  – R² does not indicate the appropriateness of a linear model.

2.7 Considerations in the Use of Regression
• Beware of extrapolating beyond the range of the data.
• Extreme points will often influence the slope.
• Outliers can disturb the least-squares fit.
• A linear relationship does not imply a cause-and-effect relationship (see the interesting mental-defectives example in the book, p. 40).
• Sometimes the value of the regressor variable is unknown and must itself be estimated or predicted.

2.8 Regression Through the Origin
• The no-intercept model is y = β1x + ε.
• This model would be appropriate for situations where the origin (0, 0) has some meaning.
• A scatter diagram can aid in determining whether an intercept or no-intercept model should be used.
• In addition, the practitioner could fit both models and examine the t-tests and residual mean squares.

2.9 Estimation by Maximum Likelihood
• The method of least squares is not the only method of parameter estimation that can be used for linear regression models.
• If the form of the response (error) distribution is known, then the maximum-likelihood (ML) estimation method can be used.
• In simple linear regression with normal errors, the MLEs of the regression coefficients are identical to those obtained by the method of least squares; the MLE of σ² is biased (not a serious problem here, though).
• Details of finding the MLEs are on page 47 of the text.
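R sketch (mean response and prediction intervals). Before moving on, here is a minimal sketch of the intervals from Sections 2.4.2 and 2.5, using predict() on the illustrative fit from the earlier sketches; x0 = 175 cm is an arbitrary value chosen for illustration.

```r
# Interval estimates at an illustrative regressor value x0 = 175 cm
x0 <- data.frame(husband = 175)
predict(fit, newdata = x0, interval = "confidence", level = 0.95)  # CI for E(y | x0)
predict(fit, newdata = x0, interval = "prediction", level = 0.95)  # PI for a new y0

# The prediction interval is wider: its variance contains the extra "1 +" term,
# MSRes * (1 + 1/n + (x0 - xbar)^2 / Sxx), for the variability of a single new observation.
```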
2.10 Case Where the Regressor x Is Random
• When x and y have an unknown joint distribution, all of the "fixed x's" regression results still hold if:
  – the conditional distribution of y given x is normal with conditional mean β0 + β1x and conditional variance σ², and
  – the x's are independent random variables whose probability distribution does not involve β0, β1, or σ².
• The sample correlation coefficient r is related to the slope:
  \hat{\beta}_1 = r \sqrt{\frac{SS_T}{S_{xx}}}, \qquad \text{equivalently} \qquad r = \hat{\beta}_1 \sqrt{\frac{S_{xx}}{SS_T}}
• Also, in simple linear regression the coefficient of determination equals the squared sample correlation: R² = r².
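R sketch (correlation and R²). A minimal check of the relationships above on the illustrative heights fit from the earlier sketches: the slope recovered from r should match coef(fit), and r² should match the R² reported by summary(fit).

```r
# Sample correlation and its link to the fitted slope and to R^2
r   <- cor(husband, wife)
SST <- sum((wife - mean(wife))^2)
c(slope_from_r  = r * sqrt(SST / Sxx),        # beta1_hat = r * sqrt(SST / Sxx)
  slope_from_lm = unname(coef(fit)["husband"]))
c(r_squared  = r^2,
  R2_from_lm = summary(fit)$r.squared)        # equal in simple linear regression
```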