Activity - Least Squares Regression and Residual Plots

How many iPhones will be sold?

Here is the data of all iPhone sales during their opening weekends:

iPhone          Year    Units Sold (millions)
Original        2007    0.5
3G              2008    1
3Gs             2009    1
4               2010    1.7
4S              2011    4
5               2012    5
5C, 5S          2013    9
6, 6 Plus       2014    10
6S, 6S Plus     2015    13

1. Use stapplet.com (Two Quantitative Variables) to create a scatterplot of the data with year as the explanatory variable and units sold as the response. Sketch the scatterplot in the space above.

2. Describe the form of the relationship. Circle one: Linear / Nonlinear

3. Use stapplet.com (Two Quantitative Variables) to find the least squares regression line. Write the equation below and graph it on your scatterplot above.
   ŷ = 1.605x - 3222.6328

4. Use the least squares regression line to calculate the residual for 2007. Interpret the residual.
   Predicted: ŷ = 1.605(2007) - 3222.6328 = -1.3978
   Residual = actual - predicted = 0.5 - (-1.3978) = 1.8978
   The actual number of units sold in 2007 was about 1.9 million greater than the number predicted by the regression line.

7. For which points was the actual greater than the predicted? Which were less than predicted? Identify these on the graph.

8. Do you think the regression line is a good fit for the data? Why or why not? Explain using the residual plot.
   Not particularly. The residual plot shows a clear curved pattern (positive residuals at both ends, negative residuals in the middle), which indicates that the relationship is nonlinear.

Fueleconomy.gov gives the city and highway fuel economy for all makes and models of vehicles back to 1984. The table gives the city and highway fuel economy (mpg) for a random sample of ten 2021 vehicles.

City fuel economy (mpg)    Highway fuel economy (mpg)
14.4                       25.5
24.3                       37.4
27.2                       36.5
29.9                       45.5
20.4                       28.7
28.8                       46.1
20.9                       33.6
23.2                       38.3
28.6                       41.3
25.4                       35.3

a. Calculate the equation of the least-squares regression line.
   ŷ = 1.264x + 6.084, where x is city fuel economy and ŷ is predicted highway fuel economy.

b. Make a residual plot for the linear model in part (a).

c. What does the residual plot indicate about the appropriateness of the linear model? Explain your answer.
   The residuals are scattered around 0 with no leftover curved pattern, so the linear model is appropriate for describing the positive association between city and highway fuel economy.
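The regression lines and residuals in the two activities above can be checked by hand with the least-squares formulas: slope b = Sxy/Sxx and intercept a = ȳ - b*x̄. The following is a minimal Python sketch of that calculation, not the method stapplet.com uses internally. The helper names lsrl and residuals are made up for illustration, and the data lists are copied from the two tables.

```python
def lsrl(x, y):
    """Return (intercept a, slope b) of the least-squares line y-hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def residuals(x, y, a, b):
    """Residual = actual - predicted, one value per data point."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# iPhone opening-weekend sales (millions) by year, from the first table
years = list(range(2007, 2016))
units = [0.5, 1, 1, 1.7, 4, 5, 9, 10, 13]
a_phone, b_phone = lsrl(years, units)
phone_resid = residuals(years, units, a_phone, b_phone)

# City and highway fuel economy (mpg), from the second table
city = [14.4, 24.3, 27.2, 29.9, 20.4, 28.8, 20.9, 23.2, 28.6, 25.4]
highway = [25.5, 37.4, 36.5, 45.5, 28.7, 46.1, 33.6, 38.3, 41.3, 35.3]
a_mpg, b_mpg = lsrl(city, highway)
mpg_resid = residuals(city, highway, a_mpg, b_mpg)

print(f"iPhone: y-hat = {b_phone:.3f}x + ({a_phone:.4f})")
for year, res in zip(years, phone_resid):
    print(f"  {year}: residual = {res:+.3f}")

print(f"Fuel economy: y-hat = {b_mpg:.3f}x + {a_mpg:.3f}")
for x_val, res in sorted(zip(city, mpg_resid)):
    print(f"  city = {x_val}: residual = {res:+.2f}")
```

A residual plot is these residuals plotted against the explanatory variable. A curved pattern suggests the form is nonlinear, while random scatter around 0 supports a linear model.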
Outliers for Scatterplots

How do outliers affect the LSRL?

1. Use the Correlation and Regression applet at www.tinyurl.com/regressionapplet
   ● Click on the graphing area to add 10 points in the lower-left corner so that the correlation is about r = 0.50.
   ● Check the boxes to show the LSRL and the mean X and Y lines.
   ● Sketch it below.

2. For each of the following situations, add the point to the scatterplot and decide if the slope, y-intercept, and correlation will increase or decrease.
   a. If a point is added on the far right side of the graph on the horizontal line for the mean of Y.
      Slope: Decrease    y-intercept: Increase    Correlation: Decrease
   b. If a point is added on the far left side of the graph on the horizontal line for the mean of Y.
      Slope: Decrease    y-intercept: Increase    Correlation: Decrease
   c. If a point is added below the LSRL on the vertical line for the mean of X.
      Slope: Same    y-intercept: Decrease    Correlation: Decrease
   d. If a point is added above the LSRL on the vertical line for the mean of X.
      Slope: Same    y-intercept: Increase    Correlation: Decrease

3. Which outliers had the greatest impact on the LSRL, vertical or horizontal outliers?
   Horizontal outliers

Check Your Understanding:

You’ve probably heard the saying “Practice makes perfect!”, but does practice also help you complete a task faster? A study was conducted to find out. A random sample of 15 high school students were taught how to solve a Rubik’s cube. Then they were each randomly assigned a number of times to practice this new skill. After they completed their assigned number of practices, they were timed solving the Rubik’s cube. Here is a scatterplot of the results along with the least-squares regression line.

a. Describe the influence the student who was assigned to practice following the steps to solve a Rubik’s cube 14 times has on the equation of the least-squares regression line.
   It forces the slope closer to 0 and decreases the y-intercept.

b. Describe the influence the student who was assigned to practice following the steps to solve a Rubik’s cube 14 times has on the standard deviation of the residuals and r².
   Because this point has a large residual, it makes the standard deviation of the residuals greater and r² smaller.

c. The mean and standard deviation of the number of practices are x̄ = 8 practices and s_x = 4.47 practices. The mean and standard deviation of time are ȳ = 7.71 minutes and s_y = 1.20 minutes. The correlation between number of practices and time to solve the Rubik’s cube is r = –0.793. Find the equation of the least-squares regression line for predicting time to solve the Rubik’s cube from the number of practices.
   b = r*s_y/s_x = (-0.793)(1.20)/4.47 ≈ -0.2129
   a = ȳ - b*x̄ = 7.71 - (-0.2129)(8) ≈ 9.41
   ŷ = a + bx
   ŷ = 9.41 - 0.2129x, where x is the number of practices and ŷ is the predicted time in minutes.
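The calculation in part (c) uses only summary statistics, so it can be checked with a few lines of arithmetic. The sketch below assumes the standard formulas b = r*s_y/s_x and a = ȳ - b*x̄. The function name lsrl_from_summary is hypothetical, chosen only for this illustration.

```python
def lsrl_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Return (intercept a, slope b) of the least-squares line from summary statistics."""
    b = r * s_y / s_x        # slope: correlation times ratio of standard deviations
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b


# Summary statistics from the Rubik's cube study in part (c)
a, b = lsrl_from_summary(x_bar=8, s_x=4.47, y_bar=7.71, s_y=1.20, r=-0.793)

print(f"slope b = {b:.4f} minutes per practice")
print(f"intercept a = {a:.2f} minutes")

# Sanity check: the line should predict y_bar when x equals x_bar
print(f"prediction at x = 8: {a + b * 8:.2f} minutes")
```

Because the slope is negative, subtracting b*x̄ adds to ȳ, so the intercept comes out larger than the mean time of 7.71 minutes.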