Chapter 2 Simple Linear Regression
Ray-Bing Chen
Institute of Statistics, National University of Kaohsiung

2.1 Simple Linear Regression Model

• The model: $y = \beta_0 + \beta_1 x + \varepsilon$
  – x: regressor variable
  – y: response variable
  – $\beta_0$: the intercept, unknown
  – $\beta_1$: the slope, unknown
  – $\varepsilon$: error, with $E(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$ (unknown)
• The errors are uncorrelated.
• Given x,
  $$E(y|x) = E(\beta_0 + \beta_1 x + \varepsilon) = \beta_0 + \beta_1 x, \qquad \mathrm{Var}(y|x) = \mathrm{Var}(\beta_0 + \beta_1 x + \varepsilon) = \sigma^2$$
• Responses are also uncorrelated.
• Regression coefficients: $\beta_0$, $\beta_1$
  – $\beta_1$: the change in $E(y|x)$ for a unit change in x
  – $\beta_0$: $E(y|x=0)$

2.2 Least-Squares Estimation of the Parameters

2.2.1 Estimation of β0 and β1

• n pairs: $(y_i, x_i)$, $i = 1, \dots, n$
• Method of least squares: minimize
  $$S(\beta_0, \beta_1) = \sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_i)]^2$$
• Setting $\partial S/\partial \beta_0 = 0$ and $\partial S/\partial \beta_1 = 0$ gives the least-squares normal equations:
  $$n\hat\beta_0 + \hat\beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i, \qquad \hat\beta_0 \sum_{i=1}^{n} x_i + \hat\beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$$
• The least-squares estimators:
  $$\hat\beta_1 = \frac{S_{xy}}{S_{xx}}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x$$
  where $S_{xx} = \sum_i x_i^2 - n\bar x^2$ and $S_{xy} = \sum_i x_i y_i - n\bar x \bar y$.
• The fitted simple regression model: $\hat y = \hat\beta_0 + \hat\beta_1 x$
  – A point estimate of the mean of y for a particular x
• Residual: $e_i = y_i - \hat y_i$
  – Plays an important role in investigating the adequacy of the fitted regression model and in detecting departures from the underlying assumptions!
• Example 2.1: The Rocket Propellant Data
  – Shear strength is related to the age in weeks of the batch of sustainer propellant.
  – 20 observations
  – The scatter diagram shows a strong (negative) linear relationship between shear strength (y) and propellant age (x).
  – Assumption: $y = \beta_0 + \beta_1 x + \varepsilon$
  – $S_{xx} = \sum_i x_i^2 - n\bar x^2 = 1106.56$, $\quad S_{xy} = \sum_i x_i y_i - n\bar x\bar y = -41112.65$
  – $\hat\beta_1 = S_{xy}/S_{xx} = -37.15$, $\quad \hat\beta_0 = \bar y - \hat\beta_1 \bar x = 2627.82$
  – The least-squares fit: $\hat y = 2627.82 - 37.15\,x$
• How well does this equation fit the data?
• Is the model likely to be useful as a predictor?
• Are any of the basic assumptions violated, and if so, how serious is this?

2.2.2 Properties of the Least-Squares Estimators and the Fitted Regression Model

• $\hat\beta_1$ and $\hat\beta_0$ are linear combinations of the $y_i$:
  $$\hat\beta_1 = \sum_{i=1}^{n} c_i y_i, \qquad c_i = (x_i - \bar x)/S_{xx}$$
• $\hat\beta_1$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$ are unbiased estimators:
  $$E(\hat\beta_1) = E\Big(\sum_i c_i y_i\Big) = \sum_i c_i E(y_i) = \sum_i c_i(\beta_0 + \beta_1 x_i) = \beta_1$$
  (using $\sum_i c_i = 0$ and $\sum_i c_i x_i = 1$)
  $$E(\hat\beta_0) = E(\bar y - \hat\beta_1 \bar x) = \beta_0 + \beta_1\bar x - \beta_1\bar x = \beta_0$$
• Variances:
  $$\mathrm{Var}(\hat\beta_1) = \mathrm{Var}\Big(\sum_i c_i y_i\Big) = \sum_i c_i^2\,\mathrm{Var}(y_i) = \sigma^2 \frac{\sum_i (x_i - \bar x)^2}{S_{xx}^2} = \frac{\sigma^2}{S_{xx}}$$
  $$\mathrm{Var}(\hat\beta_0) = \sigma^2\Big(\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\Big)$$
• The Gauss-Markov theorem: $\hat\beta_1$ and $\hat\beta_0$ are the best linear unbiased estimators (BLUE).
• Some useful properties:
  – The sum of the residuals in any regression model that contains an intercept β0 is always 0:
    $$\sum_i e_i = \sum_i (y_i - \hat y_i) = \sum_i \big(y_i - \bar y - \hat\beta_1(x_i - \bar x)\big) = 0$$
  – $\sum_i y_i = \sum_i \hat y_i$
  – The regression line always passes through the centroid of the data, $(\bar x, \bar y)$.
  – $\sum_i x_i e_i = \sum_i x_i\big(y_i - \bar y - \hat\beta_1(x_i - \bar x)\big) = 0$
  – $\sum_i \hat y_i e_i = \sum_i \big(\bar y + \hat\beta_1(x_i - \bar x)\big)\big((y_i - \bar y) - \hat\beta_1(x_i - \bar x)\big) = 0$

2.2.3 Estimator of σ²

• Residual sum of squares:
  $$SS_{Res} = \sum_i e_i^2 = \sum_i (y_i - \hat y_i)^2 = \sum_i \big(y_i - \bar y - \hat\beta_1(x_i - \bar x)\big)^2 = \sum_i (y_i - \bar y)^2 - \hat\beta_1 S_{xy} = SS_T - \hat\beta_1 S_{xy}$$
• Since $E(SS_{Res}) = (n-2)\sigma^2$, an unbiased estimator of $\sigma^2$ is
  $$\hat\sigma^2 = \frac{SS_{Res}}{n-2} = MS_{Res}$$
  – MSRes is called the residual mean square.
  – This estimate is model-dependent.
• Example 2.2

2.2.4 An Alternate Form of the Model

• The new regression model, centered at $\bar x$:
  $$y_i = \beta_0 + \beta_1(x_i - \bar x) + \beta_1 \bar x + \varepsilon_i = (\beta_0 + \beta_1\bar x) + \beta_1(x_i - \bar x) + \varepsilon_i = \beta_0' + \beta_1(x_i - \bar x) + \varepsilon_i$$
• Normal equations:
  $$n\hat\beta_0' = \sum_i y_i, \qquad \hat\beta_1 \sum_i (x_i - \bar x)^2 = \sum_i y_i(x_i - \bar x)$$
• The least-squares estimators: $\hat\beta_0' = \bar y$ and $\hat\beta_1 = S_{xy}/S_{xx}$
• Some advantages:
  – The normal equations are easier to solve.
  – $\hat\beta_0' = \bar y$ and $\hat\beta_1 = S_{xy}/S_{xx}$ are uncorrelated.
  – $\hat y = \bar y + \hat\beta_1(x - \bar x)$
• A worked numerical least-squares fit is sketched in the code example below.
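To make the least-squares formulas of Sections 2.2.1 and 2.2.3 concrete, here is a minimal sketch in Python/NumPy. The small data arrays are synthetic stand-ins (not the rocket propellant data of Example 2.1); everything else follows the formulas for $S_{xx}$, $S_{xy}$, $\hat\beta_1$, $\hat\beta_0$, and $MS_{Res}$ given above.

```python
import numpy as np

# Synthetic illustrative data (not the rocket propellant data).
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0])
y = np.array([4.8, 4.1, 3.7, 3.0, 2.6, 2.1, 1.7, 1.1])

n = len(x)
x_bar, y_bar = x.mean(), y.mean()

# Corrected sums of squares and cross products (Section 2.2.1).
S_xx = np.sum(x**2) - n * x_bar**2          # = sum((x - x_bar)**2)
S_xy = np.sum(x * y) - n * x_bar * y_bar    # = sum(y * (x - x_bar))

# Least-squares estimators.
beta1_hat = S_xy / S_xx
beta0_hat = y_bar - beta1_hat * x_bar

# Fitted values, residuals, and residual mean square (Section 2.2.3).
y_hat = beta0_hat + beta1_hat * x
e = y - y_hat
SS_res = np.sum(e**2)                       # equals SS_T - beta1_hat * S_xy
MS_res = SS_res / (n - 2)                   # unbiased estimator of sigma^2

print(f"beta0_hat = {beta0_hat:.4f}, beta1_hat = {beta1_hat:.4f}")
print(f"sum of residuals = {e.sum():.2e} (zero, up to rounding)")
print(f"MS_res = {MS_res:.4f}")
```

The printed residual sum illustrates the first "useful property" above: for any model with an intercept, the residuals sum to zero, and the fitted line passes through $(\bar x, \bar y)$.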
2.3 Hypothesis Testing on the Slope and Intercept

• Assume the $\varepsilon_i$ are normally distributed.
• $y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$

2.3.1 Use of t Tests

• Test on the slope:
  – $H_0: \beta_1 = \beta_{10}$ vs. $H_1: \beta_1 \ne \beta_{10}$
  – $\hat\beta_1 \sim N(\beta_1, \sigma^2/S_{xx})$
• If $\sigma^2$ is known, under the null hypothesis,
  $$Z_0 = \frac{\hat\beta_1 - \beta_{10}}{\sqrt{\sigma^2/S_{xx}}} \sim N(0, 1)$$
• $(n-2)MS_{Res}/\sigma^2$ follows a $\chi^2_{n-2}$ distribution.
• If $\sigma^2$ is unknown,
  $$t_0 = \frac{\hat\beta_1 - \beta_{10}}{\sqrt{MS_{Res}/S_{xx}}} = \frac{\hat\beta_1 - \beta_{10}}{se(\hat\beta_1)} \sim t_{n-2}$$
• Reject H0 if $|t_0| > t_{\alpha/2,\,n-2}$.
• Test on the intercept:
  – $H_0: \beta_0 = \beta_{00}$ vs. $H_1: \beta_0 \ne \beta_{00}$
  – If $\sigma^2$ is unknown,
    $$t_0 = \frac{\hat\beta_0 - \beta_{00}}{\sqrt{MS_{Res}\,(1/n + \bar x^2/S_{xx})}} = \frac{\hat\beta_0 - \beta_{00}}{se(\hat\beta_0)} \sim t_{n-2}$$
  – Reject H0 if $|t_0| > t_{\alpha/2,\,n-2}$.

2.3.2 Testing Significance of Regression

• $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \ne 0$
• Failing to reject H0: there is no linear relationship between x and y.
• Rejecting H0: x is of value in explaining the variability in y.
• $$t_0 = \frac{\hat\beta_1}{se(\hat\beta_1)} \sim t_{n-2}$$
• Reject H0 if $|t_0| > t_{\alpha/2,\,n-2}$.
• Example 2.3: The Rocket Propellant Data
  – Test the significance of regression.
  – $\hat\beta_1 = -37.15$, $MS_{Res} = 9244.59$
  – $se(\hat\beta_1) = \sqrt{MS_{Res}/S_{xx}} = 2.89$
  – The test statistic is $t_0 = \hat\beta_1/se(\hat\beta_1) = -12.85$.
  – $t_{0.025,\,18} = 2.101$
  – Since $|t_0| > 2.101$, reject H0.

2.3.3 The Analysis of Variance (ANOVA)

• Use an analysis-of-variance approach to test significance of regression, based on the identity
  $$\sum_i (y_i - \bar y)^2 = \sum_i (\hat y_i - \bar y)^2 + \sum_i (y_i - \hat y_i)^2$$
  – SST: the corrected sum of squares of the observations; it measures the total variability in the observations.
  – SSRes: the residual or error sum of squares; the residual variation left unexplained by the regression line.
  – SSR: the regression or model sum of squares; the amount of variability in the observations accounted for by the regression line.
  – SST = SSR + SSRes
  – $SS_R = \hat\beta_1 S_{xy}$
  – The degrees of freedom:
    • dfT = n − 1
    • dfR = 1
    • dfRes = n − 2
    • dfT = dfR + dfRes
  – Test significance of regression by ANOVA:
    • $SS_{Res}/\sigma^2 = (n-2)MS_{Res}/\sigma^2 \sim \chi^2_{n-2}$
    • $SS_R/\sigma^2 \sim \chi^2_1$ under H0
    • SSR and SSRes are independent
    • $$F_0 = \frac{SS_R/1}{SS_{Res}/(n-2)} = \frac{MS_R}{MS_{Res}} \sim F_{1,\,n-2}$$
• $E(MS_{Res}) = \sigma^2$
• $E(MS_R) = \sigma^2 + \beta_1^2 S_{xx}$
• Reject H0 if $F_0 > F_{\alpha,\,1,\,n-2}$.
  – If $\beta_1 \ne 0$, F0 follows a noncentral F distribution with 1 and n − 2 degrees of freedom and noncentrality parameter
    $$\lambda = \frac{\beta_1^2 S_{xx}}{\sigma^2}$$
• Example 2.4: The Rocket Propellant Data
• More about the t test:
  $$t_0 = \frac{\hat\beta_1}{se(\hat\beta_1)}, \qquad t_0^2 = \frac{\hat\beta_1^2}{MS_{Res}/S_{xx}} = \frac{\hat\beta_1^2 S_{xx}}{MS_{Res}} = \frac{\hat\beta_1 S_{xy}}{MS_{Res}} = \frac{MS_R}{MS_{Res}} = F_0$$
  – The square of a t random variable with f degrees of freedom is an F random variable with 1 and f degrees of freedom.

2.4 Interval Estimation in Simple Linear Regression

2.4.1 Confidence Intervals on β0, β1 and σ²

• Assume that the $\varepsilon_i$ are normally and independently distributed.
• 100(1 − α)% confidence intervals on $\beta_1$ and $\beta_0$ are given by
  $$\hat\beta_1 \pm t_{\alpha/2,\,n-2}\, se(\hat\beta_1), \qquad \hat\beta_0 \pm t_{\alpha/2,\,n-2}\, se(\hat\beta_0)$$
• Interpretation of the C.I.: in repeated sampling, 100(1 − α)% of such intervals contain the true parameter.
• Confidence interval for $\sigma^2$:
  $$\frac{(n-2)MS_{Res}}{\chi^2_{\alpha/2,\,n-2}} \le \sigma^2 \le \frac{(n-2)MS_{Res}}{\chi^2_{1-\alpha/2,\,n-2}}$$
• Example 2.5: The Rocket Propellant Data

2.4.2 Interval Estimation of the Mean Response

• Let x0 be the level of the regressor variable for which we wish to estimate the mean response.
• x0 is in the range of the original data on x.
• An unbiased estimator of $E(y|x_0)$ is
  $$\hat\mu_{y|x_0} = \hat\beta_0 + \hat\beta_1 x_0$$
• $\hat\mu_{y|x_0}$ follows a normal distribution, with
  $$\mathrm{Var}(\hat\mu_{y|x_0}) = \sigma^2\Big(\frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}\Big)$$
• A 100(1 − α)% confidence interval on the mean response at x0:
  $$\hat\mu_{y|x_0} \pm t_{\alpha/2,\,n-2}\sqrt{MS_{Res}\Big(\frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}\Big)}$$
• Example 2.6: The Rocket Propellant Data
• The interval width is a minimum at $x_0 = \bar x$ and widens as $|x_0 - \bar x|$ increases.
• Extrapolation beyond the range of the data makes these intervals unreliable.

2.5 Prediction of New Observations

• $\hat y_0 = \hat\beta_0 + \hat\beta_1 x_0$ is the point estimate of the new value of the response $y_0$.
• $\psi = y_0 - \hat y_0$ follows a normal distribution with mean 0 and variance
  $$\mathrm{Var}(\psi) = \mathrm{Var}(y_0 - \hat y_0) = \sigma^2\Big[1 + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}\Big]$$
  because the future observation $y_0$ is independent of $\hat y_0$.
• The 100(1 − α)% prediction interval for the future observation $y_0$ at x0:
  $$\hat y_0 \pm t_{\alpha/2,\,n-2}\sqrt{MS_{Res}\Big(1 + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}\Big)}$$
• Example 2.7
• The 100(1 − α)% prediction interval on the mean of m future observations $\bar y_0$ at x0 replaces the leading 1 by 1/m:
  $$\hat y_0 \pm t_{\alpha/2,\,n-2}\sqrt{MS_{Res}\Big(\frac{1}{m} + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}\Big)}$$
• The significance test of Section 2.3 and the interval computations of Sections 2.4 and 2.5 are sketched numerically in the two code examples below.
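As a quick numerical illustration of Sections 2.3.1 and 2.3.3, this sketch computes the significance-of-regression t statistic and checks $t_0^2 = F_0$. It reuses only the summary quantities reported in the text for the rocket propellant data ($\hat\beta_1 = -37.15$, $MS_{Res} = 9244.59$, $S_{xx} = 1106.56$, n = 20); SciPy supplies the reference distributions.

```python
import numpy as np
from scipy import stats

# Summary quantities quoted in the rocket propellant example.
beta1_hat, MS_res, S_xx, n = -37.15, 9244.59, 1106.56, 20

se_beta1 = np.sqrt(MS_res / S_xx)      # standard error of the slope (~2.89)
t0 = beta1_hat / se_beta1              # t statistic for H0: beta1 = 0 (~-12.85)
F0 = t0**2                             # equals MS_R / MS_Res

t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)      # t_{0.025, 18} = 2.101
F_crit = stats.f.ppf(1 - 0.05, dfn=1, dfd=n - 2)  # F_{0.05, 1, 18}

print(f"se(beta1) = {se_beta1:.2f}, t0 = {t0:.2f}, reject H0: {abs(t0) > t_crit}")
print(f"F0 = {F0:.2f} vs F_crit = {F_crit:.2f} (note F_crit = t_crit**2)")
```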
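Similarly, here is a sketch of the interval formulas of Sections 2.4.2 and 2.5: a confidence interval on the mean response and a prediction interval for a new observation at the same x0. The fitted coefficients and summary quantities are the ones quoted in the text, but `x_bar` is an assumed illustrative value (the propellant ages themselves are not reproduced here), so treat the printed numbers as approximate.

```python
import numpy as np
from scipy import stats

# Fitted model and summary quantities; x_bar is an assumed value.
beta0_hat, beta1_hat = 2627.82, -37.15
MS_res, S_xx, n, x_bar = 9244.59, 1106.56, 20, 13.36

def mean_response_ci(x0, alpha=0.05):
    """100(1-alpha)% CI on E(y|x0), Section 2.4.2."""
    y0_hat = beta0_hat + beta1_hat * x0
    half = stats.t.ppf(1 - alpha / 2, n - 2) * np.sqrt(
        MS_res * (1 / n + (x0 - x_bar) ** 2 / S_xx))
    return y0_hat - half, y0_hat + half

def prediction_interval(x0, alpha=0.05):
    """100(1-alpha)% PI for a new observation y0 at x0, Section 2.5."""
    y0_hat = beta0_hat + beta1_hat * x0
    half = stats.t.ppf(1 - alpha / 2, n - 2) * np.sqrt(
        MS_res * (1 + 1 / n + (x0 - x_bar) ** 2 / S_xx))
    return y0_hat - half, y0_hat + half

print("95% CI on mean response at x0 = 10:", mean_response_ci(10.0))
print("95% PI for new observation at x0 = 10:", prediction_interval(10.0))
```

The prediction interval is always wider than the confidence interval at the same x0 because of the extra "1 +" term, and both widen as $|x_0 - \bar x|$ grows, which is the numerical face of the extrapolation warning above.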
2.6 Coefficient of Determination

• The coefficient of determination:
  $$R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_{Res}}{SS_T}$$
• The proportion of variation in y explained by the regressor x
• $0 \le R^2 \le 1$
• In Example 2.1, R² = 0.9018: 90.18% of the variability in strength is accounted for by the regression model.
• R² can always be increased by adding terms to the model.
• For a simple regression model, approximately
  $$E(R^2) \approx \frac{\beta_1^2 S_{xx}}{\beta_1^2 S_{xx} + \sigma^2}$$
• E(R²) increases (decreases) as Sxx increases (decreases).
• R² does not measure the magnitude of the slope of the regression line; a large value of R² does not imply a steep slope.
• R² does not measure the appropriateness of the linear model.

2.7 Some Considerations in the Use of Regression

• Regression models are only suitable for interpolation over the range of the regressors, not for extrapolation.
• Important: the disposition of the x values. The slope is strongly influenced by the remote values of x.
• Outliers and bad values can seriously disturb the least-squares fit (both the fitted coefficients and the residual mean square).
• Regression does not imply a cause-and-effect relationship.
  – Illustration: a fitted model $\hat y = 4.582 + 2.204\,x_1$ with t statistic $t_0 = 27.312$ for testing H0: β1 = 0 and R² = 0.9842. A strong, highly significant association of this kind still does not by itself establish causality.
• x may be unknown at prediction time. For example, consider predicting maximum daily load on an electric power generation system from a regression model relating the load to the maximum daily temperature.

2.8 Regression Through the Origin

• A no-intercept model is
  $$y = \beta_1 x + \varepsilon$$
• Given $(y_i, x_i)$, $i = 1, 2, \dots, n$, the least-squares estimator and residual mean square are
  $$\hat\beta_1 = \frac{\sum_i x_i y_i}{\sum_i x_i^2}, \qquad \hat\sigma^2 = MS_{Res} = \frac{\sum_i (y_i - \hat y_i)^2}{n-1}$$
• The 100(1 − α)% confidence interval on $\beta_1$:
  $$\hat\beta_1 \pm t_{\alpha/2,\,n-1}\sqrt{\frac{MS_{Res}}{\sum_i x_i^2}}$$
• The 100(1 − α)% confidence interval on $E(y|x_0)$:
  $$\hat y_0 \pm t_{\alpha/2,\,n-1}\sqrt{\frac{x_0^2\, MS_{Res}}{\sum_i x_i^2}}$$
• The 100(1 − α)% prediction interval on $y_0$:
  $$\hat y_0 \pm t_{\alpha/2,\,n-1}\sqrt{MS_{Res}\Big(1 + \frac{x_0^2}{\sum_i x_i^2}\Big)}$$
• Misuse: fitting a no-intercept model when the data lie in a region of x-space remote from the origin.
• The residual mean square MSRes is the appropriate basis for comparing the intercept and no-intercept models.
• Generally R² is not a good comparative statistic for the two models:
  – For the intercept model,
    $$R^2 = \frac{\sum_i (\hat y_i - \bar y)^2}{\sum_i (y_i - \bar y)^2}$$
  – For the no-intercept model,
    $$R_0^2 = \frac{\sum_i \hat y_i^2}{\sum_i y_i^2}$$
  – Occasionally R0² > R² even when the intercept model is the better fit, so the residual mean squares, not the R² values, should be compared.
• Example 2.8: The Shelf-Stocking Data

2.9 Estimation by Maximum Likelihood

• Assume that the errors are NID(0, σ²). Then $y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$.
• The likelihood function:
  $$L(\beta_0, \beta_1, \sigma^2) = (2\pi\sigma^2)^{-n/2}\exp\Big(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2\Big)$$
• Maximizing L gives MLEs of $\beta_0$ and $\beta_1$ identical to the least-squares estimators, with $\hat\sigma^2 = \sum_i (y_i - \hat y_i)^2/n$ (see the sketch at the end of the chapter).
• MLE vs. LSE:
  – In general MLEs have better statistical properties than LSEs.
  – MLEs are unbiased (or asymptotically unbiased) and have minimum variance when compared to all other unbiased estimators.
  – They are also consistent estimators.
  – They are a set of sufficient statistics.
  – However, MLE requires more stringent statistical assumptions than LSE.
  – LSE needs only second-moment assumptions.
  – MLE requires a full distributional assumption.

2.10 Case Where the Regressor x Is Random

2.10.1 x and y Jointly Distributed

• x and y are jointly distributed random variables, and this joint distribution is unknown.
• All of our previous results hold if
  – y|x ~ N(β0 + β1x, σ²), and
  – the x's are independent random variables whose probability distribution does not involve β0, β1, σ².

2.10.2 x and y Jointly Normally Distributed: the Correlation Model

• Assume that x and y have a bivariate normal distribution with means μ1, μ2, variances σ1², σ2², and correlation coefficient ρ.
• The conditional distribution of y given x is normal:
  $$y|x \sim N(\beta_0 + \beta_1 x,\ \sigma^2), \quad \text{with } \beta_0 = \mu_2 - \mu_1\rho\,\frac{\sigma_2}{\sigma_1},\ \ \beta_1 = \frac{\sigma_2}{\sigma_1}\rho,\ \ \sigma^2 = \sigma_2^2(1-\rho^2)$$
• The estimator of ρ is the sample correlation coefficient
  $$r = \frac{S_{xy}}{\sqrt{S_{xx}\, SS_T}}$$
  Note that $r^2 = R^2$, the coefficient of determination.
• Test on ρ: for H0: ρ = 0 vs. H1: ρ ≠ 0, use
  $$t_0 = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{n-2} \quad \text{under } H_0$$
• An approximate 100(1 − α)% C.I. for ρ, based on the Fisher z transform $z = \operatorname{arctanh}(r)$:
  $$\tanh\Big(\operatorname{arctanh}(r) - \frac{z_{\alpha/2}}{\sqrt{n-3}}\Big) \le \rho \le \tanh\Big(\operatorname{arctanh}(r) + \frac{z_{\alpha/2}}{\sqrt{n-3}}\Big)$$
• Example 2.9: The Delivery Time Data
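To close, here is a minimal numerical sanity check of Section 2.9 under the assumptions stated there: for the normal-errors model, maximizing the likelihood reproduces the least-squares slope and intercept, while the ML variance estimate divides by n instead of n − 2. The data are synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from a known line with normal errors (illustrative only).
n = 50
x = rng.uniform(0, 10, n)
y = 2.0 + 1.5 * x + rng.normal(0, 1.0, n)

# Closed-form MLEs of beta0, beta1: identical to the least-squares estimators.
x_bar, y_bar = x.mean(), y.mean()
S_xx = np.sum((x - x_bar) ** 2)
S_xy = np.sum((x - x_bar) * (y - y_bar))
beta1_hat = S_xy / S_xx
beta0_hat = y_bar - beta1_hat * x_bar

SS_res = np.sum((y - beta0_hat - beta1_hat * x) ** 2)
sigma2_mle = SS_res / n        # ML estimate of sigma^2 (biased)
sigma2_lse = SS_res / (n - 2)  # residual mean square (unbiased)

print(f"beta0_hat = {beta0_hat:.3f}, beta1_hat = {beta1_hat:.3f}")
print(f"sigma^2: MLE = {sigma2_mle:.3f} vs MS_res = {sigma2_lse:.3f}")
```

The gap between the two variance estimates shrinks as n grows, which is the asymptotic-unbiasedness point made in the MLE vs. LSE comparison above.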