Linear Regression One Double Whopper with cheese provides 53 grams of protein, 1020 calories, and 65 grams of fat. The correlation between Fat and Protein for 30 of the items on the Burger King menu is 0.83 The fat content and calories for a Double Whopper are extreme, but are they an outlier? The association between protein and fat is a positive linear association with a fairly strong correlation of 0.83 If you want 25 grams of protein from a BK item, how much fat should you expect to consume? We can use a linear model to predict values. Regression Line π¦ = π0 + π1 π₯ π1 = π ππππ; π0 = y-intercept (Also known as the least squares line or line of best fit) Slope b1 ο½ r sy sx y-intercept b 0 ο½ y ο b1 x “Putting a hat on it” is notation to indicate that something has been predicted by a model. π¦ ππ ππππ ππ "π¦ βππ‘" Before using a regression model, we need to check the same conditions as we do for correlation: οΌQuantitative variables οΌStraight pattern οΌCheck for outliers This model says that our predictions follow a straight line. The equation is given in slope-intercept form. For the association of fat and protein of Burger King items, the estimated linear model is: πππ‘ = 6.8 + 0.97ππππ‘πππ Example πππ‘ = 6.8 + 0.97ππππ‘πππ What is the predicted amount of fat for the BK Broiler chicken sandwich, which has 30 grams of protein? πππ‘ = 6.8 + 0.97 30 = 35.9π If we convert the data into z-scores, the scatterplot shifts to the origin. The origin is where both z-scores are 0. A zscore of 0 would happen at the mean. When the variables are standardized, the slope of the line turns out to be r. Moving one standard deviation away from the mean in one variable moves our estimate r standard devations from the mean in the other variable. Example A scatterplot of house price (in thousands of dollars) vs. house size (in thousands of square feet) for houses sold recently in Saratoga, NY shows a relationship that is straight, with only moderate scatter and no outliers. The correlation between Price and Size is 0.77. 1)You go to an open house and find that the house is 1 standard deviation above the mean in size. What would you guess about its price? 2)You read an ad for a house priced 2 standard deviations below the mean. What would you guess about its size? 3)A friend tells you about a house whose size in square meters is 1.5 standard deviations above the mean. What would you guess about its size in square feet? Answers 1)You should expect the price to be 0.77 standard deviations above the mean. 2)You should expect the size to be 2(0.77) = 1.54 standard deviations below the mean. 3)The home is 1.5 standard deviations above the mean in size no matter how it is measured. Residuals We predict that a BK Broiler chicken sandwich with 30 grams of protein should have 36 grams of fat, but it actually only has 25 grams of fat. The difference between the observed value and the predicted value is called the residual. π¦ − π¦ = 25 − 36 = −11 π οΆ A negative residual means the observed value is below the prediction on the regression line. οΆ A positive residual means the observed value is above the prediction on the regression line. If we keep the x-values and replace the y-values with the residuals, the resulting scatterplot has no pattern or direction. Example The linear model for Saratoga homes uses the Size and Price: πππππ = −3.117 + 94.454πππ§π. 1)What does the slope of 94.454 mean? 2)What are the units of the slope? 3)Your house is 2000 sq ft bigger than your neighbor’s house. How much more do you expect it to be worth? 4)Is the y-intercept of -3.117 meaningful? Answers 1)An increase in home size of 1000 sq ft is associated with an increase in price of $94,454 2)Units are in thousands of dollars per thousand square feet 3)About $188,908 4)No… You can’t have a zero square ft. house Example The linear model for Saratoga homes uses the Size and Price: πππππ = −3.117 + 94.454πππ§π. Suppose you’re thinking of buying a home there. 1)Would you prefer to find a home with a negative or positive residual? Explain. 2)You plan to look for a home of about 3000 square feet. How much should you expect to have to pay? 3)You find a nice home that size which is selling for $300,000. What’s the residual? Answers 1)Negative; that indicated it’s priced lower than a typical home of its size. 2)$280,250 3)$19,755 Today’s Assignment ο± Add to Homework #5: p. 192 #1-6, 11, 12