S IMPLE L INEAR R EGRESSION We consider the model for response variable, Y , as a function of the predictor, X, observed to take the value x. Specifically we consider the model Y = β0 + β1 x + ² where β0 and β1 are the intercept and slope parameters respectively, and ² is a random variable with expectation zero and variance σ 2 . In this model E[Y |X = x] = β0 + β1 x. To estimate the parameters β0 and β1 from data (xi , yi ), i = 1, . . . , n, we use the least-squares criterion, b and β b to minimize the sum of squared errors and choose the values β 0 1 SSE(β0 , β1 ) = n X e2i = n X i=1 (yi − (β0 + β1 xi ))2 i=1 It can be shown that the parameter estimates depend on the following sample summary statistics: • Sample mean of x values: n 1X xi x= n i=1 • Sample mean of y values: n 1X y= yi n i=1 • Sum of Squares SSxx : SSxx n X = (xi − x)2 i=1 • Sum of Squares SSxy : SSxy n X = (xi − x)(yi − y) i=1 The least-squares estimates are: b = SSxy β 1 SSxx b =y−β b x β 0 1 yielding fitted-values b +β b xi ybi = β 0 1 and residual errors (or residuals) ebi = yi − ybi . An estimate of the residual error variance is given by σ b2 = b ,β b ) SSE(β 0 1 n−2 1 E XAMPLE : B LOOD V ISCOSITY AND PACKED C ELL V OLUME The following data are measurements of packed cell volume (PCV) and blood viscosity in samples taken from 32 hospital patients. We wish to model viscosity (y) as a function of PCV (x). Reference: Begg, C. B. and Hearns, J. B. (1966) Components of Blood Viscosity. The relative contributions of haematocrit, plasma fibrinogen and other proteins, Clinical Science, 31, 87-92. Unit 1 2 3 4 5 6 7 8 PCV x 40.00 40.00 42.50 42.00 45.00 42.00 42.50 47.00 Viscosity y 3.71 3.78 3.85 3.88 3.98 4.03 4.05 4.14 Unit 9 10 11 12 13 14 15 16 PCV x 46.75 48.00 46.00 47.00 43.25 45.00 50.00 45.00 Viscosity y 4.14 4.20 4.20 4.27 4.27 4.37 4.41 4.64 Unit PCV x 51.25 50.25 49.00 50.00 50.00 49.00 50.50 51.25 17 18 19 20 21 22 23 24 Viscosity y 4.68 4.73 4.87 4.94 4.95 4.96 5.02 5.02 Unit 25 26 27 28 29 30 31 32 PCV x 49.50 56.00 50.00 47.00 53.25 57.00 54.00 54.00 • Sample mean of x values: x = 47.938; sample mean of y values: y = 4.646 • Sums of Squares SSxx n X = (xi − x)2 = 615.75 SSxy i=1 n X = (xi − x)(yi − y) = 75.386 i=1 Thus b = SSxy = 0.122 β 1 SSxx The estimate of the residual error variance is σ b2 = b =y−β b x = −1.223 β 0 1 b ,β b ) SSE(β 2.721 0 1 = = 0.091 n−2 30 Blood Viscosity vs PCV: Least−squares fit ^ β1 = 0.122 4.0 4.5 Viscosity 5.0 5.5 ^ β0 = − 1.223 40 45 50 PCV 2 55 Viscosity y 5.12 5.15 5.17 5.18 5.38 5.77 5.90 5.90