Bell Ringer

A random sample of records of sales of homes from Feb. 15 to Apr. 30, 1993, from the files maintained by the Albuquerque Board of Realtors gives the Price and Size (in square feet) of 117 homes. A regression to predict Price (in thousands of dollars) from Size has r = 0.84. The residuals plot indicated that a linear model is appropriate.
a) What are the variables and units in this regression?
b) What units does the slope have?
c) Do you think the slope is positive or negative?

Linear Regression

Recall that a residual is the difference between an observed value and the predicted value:
$e = y - \hat{y}$

The standard deviation of the residuals measures how much the points spread around the regression line:
$s_e = \sqrt{\dfrac{\sum e^2}{n - 2}}$

• r is the correlation coefficient.
• If we square r, we get the portion of the variation in y accounted for by the variation in x:
$r^2$ = variation accounted for (AKA the "coefficient of determination")
$1 - r^2$ = variation left in the residuals

Example

The correlation between a cereal's fiber and potassium contents is r = 0.903. What fraction of the variability in potassium is accounted for by the amount of fiber that servings contain?
Since $r^2 = 0.903^2 \approx 0.815$, about 81.5% of the variability in potassium content is accounted for by the model.

The regression model for fiber (in grams) and potassium content (in mg) based on 77 breakfast cereals is $\widehat{\text{Potassium}} = 38 + 27\,\text{Fiber}$. What does it mean if $s_e = 30.77$?
The true potassium content of cereals varies from the predicted values with a standard deviation of about 30.77 milligrams.

The notation typically used is $R^2$. We express $R^2$ as a percent between 0% and 100%.

A random sample of records of sales of homes from February 15 to April 30, 1993, from the files maintained by the Albuquerque Board of Realtors gives the Price and Size (in square feet) of 117 homes. A regression to predict Price (in thousands of dollars) from Size has an R-squared of 71.4%. The residuals plot indicated that a linear model is appropriate.
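The residual, $s_e$, and $r^2$ formulas from the Linear Regression slide can be checked numerically. The sketch below is illustrative only: the five (x, y) pairs are hypothetical toy data invented to exercise the formulas, and the helper names (`fit_line`, `pearson_r`, `residual_sd`) are made up for this example. It fits a least-squares line, computes the residuals and $s_e$, and confirms that $r^2$ and $1 - r^2$ split the variation in y between the model and the residuals.

```python
import math
from statistics import fmean


def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y-hat = b0 + b1 * x."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def pearson_r(xs, ys):
    """Correlation coefficient r."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def residual_sd(res):
    """s_e = sqrt(sum(e^2) / (n - 2))."""
    return math.sqrt(sum(e * e for e in res) / (len(res) - 2))


# Hypothetical toy data, made up only to exercise the formulas.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
e = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]  # residuals e = y - y-hat
s_e = residual_sd(e)

r = pearson_r(xs, ys)
r_sq = r ** 2                                      # variation accounted for
sse = sum(v * v for v in e)
sst = sum((y - fmean(ys)) ** 2 for y in ys)
left_over = sse / sst                              # variation left in residuals
print(f"s_e = {s_e:.3f}, r^2 = {r_sq:.3f}, 1 - r^2 = {left_over:.3f}")

# Cereal example from the slides: r = 0.903.
print(f"cereal r^2 = {0.903 ** 2:.3f}")  # 0.815, i.e. about 81.5%
```

For a least-squares line the residuals always sum to zero, and `r_sq + left_over` equals 1 up to rounding.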
a) What are the variables and units in the regression?
The explanatory variable (x) is Size, measured in square feet, and the response variable (y) is Price, measured in thousands of dollars.
b) What units does the slope have?
The units of the slope are thousands of dollars per square foot.
c) Do you think the slope is positive or negative?
The slope of the regression line predicting Price from Size should be positive: bigger homes are expected to cost more.

From the bell ringer example: A regression to predict Price (in thousands of dollars) from Size has an R-squared of 71.4%. The residuals plot indicated that a linear model is appropriate.
a) What is the correlation between Size and Price?
The correlation between Size and Price is $r = \sqrt{0.714} \approx 0.845$. The positive square root is used, since the relationship is believed to be positive.
b) What would you predict about the Price of a home 1 standard deviation above average in Size?
The price of a home that is one standard deviation above the mean size would be predicted to be 0.845 standard deviations (in other words, r standard deviations) above the mean price.
c) What would you predict about the Price of a home 2 standard deviations below average in Size?
The price of a home that is two standard deviations below the mean size would be predicted to be $2 \times 0.845 = 1.69$ standard deviations below the mean price.

Engine sizes (called displacement) measure the volume of the cylinders in cubic inches. The regression analysis of gasoline use and displacement is shown.
The constant is the y-intercept of the regression line.
The independent (explanatory) variable is paired with the slope of the regression line.
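The Size/Price follow-up questions rest on two facts: r is the square root of $R^2$ with the sign of the slope, and a point $z_x$ standard deviations from the mean of x has a predicted value $\hat{z}_y = r \cdot z_x$ standard deviations from the mean of y. A minimal sketch of both steps; the function names are hypothetical, chosen for this illustration.

```python
import math


def r_from_r_squared(r_squared, positive):
    """Recover r from R^2; r takes the sign of the slope."""
    r = math.sqrt(r_squared)
    return r if positive else -r


def predicted_z(r, z_x):
    """Predicted z-score of y for an x that is z_x SDs from its mean."""
    return r * z_x


r = r_from_r_squared(0.714, positive=True)  # Price rises with Size
print(f"r = {r:.3f}")                                          # 0.845
print(f"+1 SD in Size -> {predicted_z(r, 1):+.3f} SD in Price")   # +0.845
print(f"-2 SD in Size -> {predicted_z(r, -2):+.3f} SD in Price")  # -1.690
```

Because |r| < 1, each prediction is closer to the mean (in SD units) than the x value it came from.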
The equation of the regression line:
$\widehat{\text{fuel economy}} = 34.9799 - 0.066196 \cdot \text{engine size}$

The only other information we need at this time: n and R-squared (take the square root for r, giving it the sign of the slope).
1) How many cars were included in this analysis?
2) What is the correlation between engine size and fuel economy?
3) A car you are thinking of buying is available with two different size engines, 190 cubic inches or 240 cubic inches. How much difference might this make in your gas mileage?

Answers:
1) 89
2) r = −0.78 (negative, because the slope is negative)
3) 22.4 mpg for 190 cubic inches versus 19.1 mpg for 240 cubic inches, a difference of about 3.3 mpg ($50 \times 0.066196 \approx 3.3$)

Today's Assignment: Be sure to read Chapter 8.
Add to HW: p. 192 #8, 10, 16, 18, 20, 22
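Answer 3 of the engine-size questions comes from plugging each displacement into the fitted line. A short sketch using the intercept and slope from the regression output on the slides; `predict_mpg` is a hypothetical helper name.

```python
# Coefficients from the regression output on the slides.
B0 = 34.9799      # intercept (constant), mpg
B1 = -0.066196    # slope, mpg per cubic inch of displacement


def predict_mpg(displacement):
    """Predicted fuel economy (mpg) for an engine size in cubic inches."""
    return B0 + B1 * displacement


small = predict_mpg(190)
large = predict_mpg(240)
print(f"190 in^3: {small:.1f} mpg")            # 22.4
print(f"240 in^3: {large:.1f} mpg")            # 19.1
print(f"difference: {small - large:.1f} mpg")  # 3.3
```

The difference depends only on the slope: 50 extra cubic inches times 0.066196 mpg per cubic inch.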