Chapter 13 Simple Linear Regression and Correlation: Inferential Methods

13.1: Simple Linear Regression Model

Deterministic Relationship:

A description of the relationship between two variables that are not deterministically related can be given by a probabilistic model.

The equation for an additive probabilistic model is:

The simple linear regression model assumes that there is a line with y-intercept α and slope β, called the population regression line. When a value of the independent variable x is fixed and an observation on the dependent variable y is made,

[Figure: axes x and y, with x1 and x2 marked on the x-axis]

Basic Assumptions of the Simple Linear Regression Model
1.
2.
3.
4.

Let's look at the heights and weights of a population of adult women.

[Figure: Weight (vertical axis) versus Height (horizontal axis, 60 to 68)]

We use ŷ = a + bx to estimate the true population regression line.

Medical researchers have noted that adolescent females are much more likely to deliver low-birth-weight babies than are adult females. Because low-birth-weight babies have higher mortality rates, a number of studies have examined the relationship between birth weight and mother's age for babies born to young mothers. The following data are on x = maternal age (in years) and y = birth weight of baby (in grams).

x     15    17    18    15    16    19    17    16    18    19
y   2289  3393  3271  2648  2897  3327  2970  2535  3138  3573

The statistic for estimating the variance σ² is:

The estimate for the standard deviation σ is:

Why is the degree of freedom n - 2?

s_e =

r² =

Practice Problems:

1. Can the number of watts be used to help determine the price of a microwave? The table below contains data on microwave ovens from Target.

Power (watts)   1100   700   700  1200  1200  1200  1200  1000  1000  1000   700
Price ($)         80    80    50    90   100    90   110    90    75    80    63

a. Describe in words the X and Y variables and write a simple linear regression model in context.
b. Check that the basic assumptions are met.
c. Estimate the slope and intercept of the line using your calculator.
d. Interpret the slope of the line in context.
e.
Predict the price of a microwave that has a power of 900 watts.
f. For what range of power values are you comfortable using this model to predict price? Explain why.
g. Obtain the residuals: Go to [Stat] "1: Edit". Select L3 with the arrow keys. [Enter] [2nd] "list". Scroll down and select RESID. [Enter] [Enter] again. Fill in the residuals in the table below:

Power (watts)   1100   700   700  1200  1200  1200  1200  1000  1000  1000   700
Price ($)         80    80    50    90   100    90   110    90    75    80    63
RESID

h. Compute SSResid: Obtain 1-Var Stats for the RESID list; SSResid is Σx²:
i. What are the values of s_e and r²?
j. Interpret s_e and r² in the context of the problem.

Homework: 1, 3-5, 8, 11

13.2: Inferences About the Slope of the Population Regression Line

Properties of the Sampling Distribution of b

(Since β is almost always unknown, it must be estimated from independently selected observations. The slope b of the least-squares line gives a point estimate for β.)

When the four basic assumptions of the simple linear regression model are satisfied, the following statements are true:
1.
2.
3.

Confidence Interval for β

When the four basic assumptions of the simple linear regression model are satisfied, a confidence interval for β, the slope of the population regression line, has the form:

Is cardiovascular fitness (as measured by time to exhaustion from running on a treadmill) related to an athlete's performance in a 20-km ski race? The following data on x = treadmill time to exhaustion (in minutes) and y = 20-km ski time (in minutes) were taken from the article "Physiological Characteristics and Performance of Top U.S. Biathletes" (Medicine and Science in Sports and Exercise, 1995):

x    7.7   8.4   8.7   9.0   9.6   9.6  10.0  10.2  10.4  11.0  11.7
y   71.0  71.4  65.0  68.7  64.4  69.4  63.0  64.6  66.9  62.6  61.7

Find a 95% confidence interval for the slope of the true regression line.
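As a cross-check on the calculator result, the interval can also be worked out step by step. The sketch below is not part of the original notes; it assumes the usual textbook form b ± (t critical value)(s_b), with s_b = s_e / sqrt(Sxx) and df = n - 2. The function name slope_confidence_interval and the variable names are made up for illustration, and the critical value 2.262 is read from a t table for df = 9.

```python
import math

# Biathlete data from the example above.
treadmill_time = [7.7, 8.4, 8.7, 9.0, 9.6, 9.6, 10.0, 10.2, 10.4, 11.0, 11.7]
ski_time = [71.0, 71.4, 65.0, 68.7, 64.4, 69.4, 63.0, 64.6, 66.9, 62.6, 61.7]


def slope_confidence_interval(x, y, t_crit):
    """Return (b, s_b, lower, upper) for the least-squares slope.

    Hypothetical helper: assumes the interval b +/- t_crit * s_b,
    where s_b = s_e / sqrt(Sxx) and s_e is based on n - 2 df.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Sums of squares and cross-products about the means.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Least-squares slope and intercept.
    b = s_xy / s_xx
    a = y_bar - b * x_bar

    # Residual sum of squares and the estimate of sigma.
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(ss_resid / (n - 2))

    # Estimated standard deviation of the statistic b.
    s_b = s_e / math.sqrt(s_xx)

    margin = t_crit * s_b
    return b, s_b, b - margin, b + margin


# Assumption: t critical value for 95% confidence with df = 11 - 2 = 9,
# taken from a t table rather than computed.
T_CRIT_95_DF9 = 2.262

b, s_b, lower, upper = slope_confidence_interval(
    treadmill_time, ski_time, T_CRIT_95_DF9
)
print(f"b = {b:.4f}, s_b = {s_b:.4f}")
print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
```

The endpoints should agree with LinRegTInt up to rounding, roughly (-3.67, -1.00). Both endpoints are negative, which is worth keeping in mind when writing the interpretation.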
(LinRegTInt)

Interpretation:

Summary of Hypothesis Tests Concerning β

Null hypothesis:

Test Statistic:

Alternative Hypothesis:                    P-value:

For this test to be appropriate, the four basic assumptions of the simple linear regression model must be met:
1.
2.
3.
4.

A slope of zero:

The Model Utility Test for Simple Linear Regression

The model utility test for simple linear regression is the test of:

Test Statistic:

Biathletes Revisited: (LinRegTTest)

Hypotheses:
t =
P-value:
α:
df:
Conclusion:

Practice Problems:

2. How is resting body temperature dependent on heart rate? The Minitab output below is an analysis of body temperature data from the Journal of Statistics Education Data Archive using a simple linear regression model.

a. Perform a hypothesis test for the model utility, including stating the hypotheses, checking assumptions, setting a reasonable value for α, giving the formula for the test statistic, identifying the relevant values from the Minitab output, and making a conclusion.
b. Construct a 95% confidence interval for the slope of the regression line (include all necessary information and justifications!).

Homework: 13, 15, 18-21

13.3: Checking Model Accuracy

The simple linear regression model is

We must estimate these deviations using the residuals from the estimated line. Thus, we use the residuals to check our assumptions.

Residual Analysis:

Standardize the residuals to look at their magnitudes

A Look at Standardized Residual Plots

Biathletes Revisited:
r =
sr =

Normal Probability Plot:

Standardized Residual Plot:

Residual Plot:

Practice Problems:

3. The Federal Trade Commission evaluates cigarettes to determine their tar and carbon monoxide contents. An analysis of a random sample of 35 brands using a simple linear regression model had the following results:

a. State the simple linear regression model and its assumptions.
b. What is the equation of the estimated regression line?
c. Identify the values of s_e and r². How well does this model perform?
d.
Use the scatterplot to roughly check the model assumptions.
e. Examine the five plots below. Which of the plots shows an unusual observation? Which of the plots shows a potentially influential observation? Which of the plots shows evidence of a nonconstant variance? Which of the plots shows evidence of a curvilinear relationship?
f. Which of the following plots show evidence of non-normality? Explain.
g. Do the following residual plots of the cigarette data show any interesting features that lead you to feel the simple linear regression model assumptions have been met?

Homework: 27, 28, 31
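
As a rough companion to the residual analysis in Section 13.3, the sketch below computes residuals and standardized residuals for the biathlete data from Section 13.2. It is not part of the original notes. It assumes the common textbook standardization, residual / (s_e * sqrt(1 - 1/n - (x - x̄)² / Sxx)); the function name and the cutoff of 2 used to flag large values are illustrative choices, not rules stated in the notes.

```python
import math

# Biathlete data from Section 13.2.
treadmill_time = [7.7, 8.4, 8.7, 9.0, 9.6, 9.6, 10.0, 10.2, 10.4, 11.0, 11.7]
ski_time = [71.0, 71.4, 65.0, 68.7, 64.4, 69.4, 63.0, 64.6, 66.9, 62.6, 61.7]


def residual_analysis(x, y):
    """Return (residuals, standardized_residuals) for the least-squares line.

    Hypothetical helper: each residual is divided by its estimated
    standard deviation, s_e * sqrt(1 - 1/n - (x_i - x_bar)^2 / Sxx).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Least-squares line y-hat = a + b x.
    b = s_xy / s_xx
    a = y_bar - b * x_bar

    # Residual = observed y minus predicted y.
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s_e = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

    standardized = [
        e / (s_e * math.sqrt(1 - 1 / n - (xi - x_bar) ** 2 / s_xx))
        for xi, e in zip(x, residuals)
    ]
    return residuals, standardized


residuals, standardized = residual_analysis(treadmill_time, ski_time)

# Assumption: flag standardized residuals larger than 2 in magnitude.
CUTOFF = 2.0
for xi, e, z in zip(treadmill_time, residuals, standardized):
    flag = "  <-- large" if abs(z) > CUTOFF else ""
    print(f"x = {xi:5.1f}   residual = {e:7.3f}   standardized = {z:6.3f}{flag}")
```

Plotting the standardized values against x, or on a normal probability plot, gives the same pictures the calculator produces from the RESID list. The table printed here only makes the magnitudes easier to scan.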