Chapter 12, Section 1: Inference for Linear Regression

Students will be able to:
- check conditions for performing inference about the slope β (beta) of the population (true) regression line.
- interpret computer output from a least-squares regression analysis.
- construct and interpret a confidence interval for the slope β of the population (true) regression line.
- perform a significance test about the slope β of the population (true) regression line.

Inference for Linear Regression
- In the scatterplot on p. 739, the line drawn is the population regression line, because it uses all the observations in the population.
- If we take a sample of size n from the population, we still use the equation ŷ = a + bx (ŷ is "y-hat"), now for the sample regression line.
- The slope b of the sample line will most likely vary from sample to sample.
- The pattern of variation in the slope b is described by its sampling distribution.

Sampling Distribution of b
- Confidence intervals and significance tests about the slope of the population regression line are based on the sampling distribution of b, the slope of the sample regression line.
- Describing the approximate sampling distribution:
  - Shape: a strong linear pattern in the Normal probability plot of the sample slopes tells us that the approximate sampling distribution is close to Normal.
  - Center: calculate the mean of the sample slopes. It should be close to the slope β of the population regression line.
  - Spread: calculate the standard deviation of the sample slopes. The same check applies: it should be close to the standard deviation of the sampling distribution of b.

Conditions for Regression Inference
Conditions:
- Linear: the actual relationship between x and y is linear. For any fixed value of x, the mean response μ_y (mu) falls on the population (true) regression line μ_y = α + βx. The slope β and intercept α (alpha) are usually unknown parameters.
- Independent: individual observations are independent of each other (one does not affect another).
- Normal: for any fixed value of x, the response y varies according to a Normal distribution.
- Equal variance: the standard deviation of y (call it σ, sigma) is the same for all values of x. The common standard deviation σ is usually an unknown parameter.
- Random: the data come from a well-designed random sample or randomized experiment.

- What the regression model tells us: for any value of x, the regression line gives a predicted value of y (the mean response at that x).
- Remember to always check the conditions before doing inference about the regression model.
- See the example on pp. 743-744.

Estimating the Parameters
- When the conditions are met, we can proceed to estimating the unknown parameters.
- If we calculate the least-squares regression line, the slope b is an unbiased estimator of the true slope β, and the y intercept a is an unbiased estimator of the true y intercept α.
- The remaining parameter is the standard deviation σ, which describes the variability of the response y about the population (true) regression line.
- The residuals estimate how much y varies about the population line, so we estimate σ with the standard deviation of the residuals, using the formula at the top of p. 745.
- See the example on p. 745.
- It is possible to do inference about any of the three parameters. However, the slope β is usually the most important parameter in a regression problem, so we focus on it.

Sampling Distribution of b
- Spread: since we do not know σ, we estimate it with the standard deviation of the residuals.
- We then estimate the spread of the sampling distribution of b with the standard error of the slope (formula on p. 746).
- Rewriting that formula gives the middle formula on p. 746, which in turn leads to the last formula on that page (use this one).
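
The estimates described above can be sketched in code. This is only an illustration, assuming the standard formulas s = sqrt(sum of squared residuals / (n - 2)) for the standard deviation of the residuals and SE_b = s / sqrt(sum of (x - x-bar)^2) for the standard error of the slope (equivalent to s / (s_x * sqrt(n - 1))). The slides do not reproduce the textbook formulas, so treat the match with pp. 745-746 as an assumption. The data values and the function name `fit_line` are invented for the example.

```python
import math

def fit_line(xs, ys):
    """Return (a, b, s, se_b) for the least-squares line y-hat = a + b*x.

    Hypothetical helper written for this illustration.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of squared deviations of x, and the x-y cross-product sum
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx              # sample slope, estimates beta
    a = y_bar - b * x_bar      # sample intercept, estimates alpha
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    # Standard deviation of the residuals, estimates sigma (note n - 2)
    s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    # Standard error of the slope (assumed standard formula)
    se_b = s / math.sqrt(sxx)
    return a, b, s, se_b

# Made-up data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b, s, se_b = fit_line(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}, s = {s:.3f}, SE_b = {se_b:.3f}")
```

Computer regression output normally reports all four of these values, so in practice they are read off the output rather than calculated by hand.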
- To find the degrees of freedom, subtract 2 from n: df = n - 2. (We subtract 2 rather than 1 because the line estimates two parameters, α and β; the full explanation is more involved.)

Constructing a Confidence Interval for the Slope
- The slope β is the rate of change of the mean response as the explanatory variable increases: μ_y = α + βx.
- A confidence interval is more useful than the point estimate because it shows how precise the estimate b is likely to be.
- General form: statistic ± (critical value) · (standard deviation of statistic)
- For the slope: b ± t* · SE_b, where t* is the critical value from the t distribution with n - 2 degrees of freedom.
- See the yellow box on p. 747 and the example on pp. 747-748.

Performing a Significance Test for the Slope
- The null hypothesis has the general form H0: β = hypothesized value (β0).
- To do the test: test statistic = (statistic - parameter) / (standard deviation of statistic), so t = (b - β0) / SE_b.
- To find the P-value, use the t distribution with n - 2 degrees of freedom.
- See the yellow box on p. 751.
- See the remaining examples in this section for clarification.
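
The interval and test described above can be tried out with a short sketch. It assumes b and SE_b are already known (for example, from computer output). The values used here are made up, and the names `t_pdf`, `t_cdf`, `t_critical`, and `slope_inference` are hypothetical. The t distribution is approximated numerically (Simpson's rule plus bisection) only to avoid external libraries; in class you would use a t table or a calculator instead.

```python
import math

def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=1000):
    """Approximate P(T <= t) using Simpson's rule on [0, |t|] and symmetry."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area

def t_critical(conf_level, df):
    """Critical value t* with central area conf_level, found by bisection."""
    target = 1 - (1 - conf_level) / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def slope_inference(b, se_b, n, beta0=0.0, conf_level=0.95):
    """Return (confidence interval, t statistic, two-sided P-value) for the slope."""
    df = n - 2                                   # degrees of freedom
    t_star = t_critical(conf_level, df)
    interval = (b - t_star * se_b, b + t_star * se_b)   # b +/- t* SE_b
    t_stat = (b - beta0) / se_b                  # t = (b - beta0) / SE_b
    p_value = 2 * (1 - t_cdf(abs(t_stat), df))   # two-sided alternative
    return interval, t_stat, p_value

# Made-up values, for illustration only: b = 2.0, SE_b = 0.5, n = 10
interval, t_stat, p_value = slope_inference(2.0, 0.5, 10)
print(f"95% CI: ({interval[0]:.3f}, {interval[1]:.3f})")
print(f"t = {t_stat:.2f}, two-sided P-value = {p_value:.4f}")
```

The sketch uses a two-sided alternative. For a one-sided test, halve the two-sided P-value when the sample slope falls on the side predicted by the alternative hypothesis.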