Least Squares Regression and Residual Plots & Outliers for Scatterplots

Activity - Least Squares Regression and Residual Plots
How many iPhones will be sold?
Here is the data of all iPhone sales during their opening weekends:
iPhone       | Year | Units Sold (millions)
Original     | 2007 | 0.5
3G           | 2008 | 1
3Gs          | 2009 | 1
4            | 2010 | 1.7
4S           | 2011 | 4
5            | 2012 | 5
5C, 5S       | 2013 | 9
6, 6 Plus    | 2014 | 10
6S, 6S Plus  | 2015 | 13
1. Use stapplet.com (Two Quantitative Variables) to create a scatterplot of the data with year as the
explanatory variable and units sold as the response. Sketch the scatterplot in the space above.
2. Describe the form of the relationship. Circle one: Linear/Nonlinear
3. Use the stapplet.com (Two Quantitative Variables) to find the least squares regression line.
Write the equation below and graph it on your scatterplot above.
y^ = 1.605x - 3222.6328
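The equation above can be checked without the applet. The following is a minimal sketch in plain Python, assuming the usual least squares formulas (slope = Sxy/Sxx, intercept = ȳ - slope·x̄); the helper name least_squares is made up for this illustration and is not part of stapplet.com.

```python
# Opening-weekend iPhone sales from the table in this activity.
years = [2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015]
units = [0.5, 1, 1, 1.7, 4, 5, 9, 10, 13]  # millions of units


def least_squares(x, y):
    """Return (slope, intercept) of the line y^ = intercept + slope * x.

    Hypothetical helper for illustration only.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = least_squares(years, units)
print(f"y^ = {slope:.3f}x + ({intercept:.4f})")
```

The large negative intercept comes from using the calendar year as x: it is the predicted value at year 0, far outside the data, so it has no practical meaning here.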
4. Use the least squares regression line to calculate the residual for 2007. Interpret the residual.
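Predicted: y^ = 1.605(2007) - 3222.6328 = -1.3978 million.
Residual = actual - predicted = 0.5 - (-1.3978) = 1.8978 million.
The actual number of units sold in 2007 was about 1.9 million greater than the number predicted by the regression line.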
7. For which points was the actual greater than the predicted? Which were less than predicted?
Identify these on the graph.
8. Do you think the regression line is a good fit for the data? Why or why not? Explain using the
residual plot.
Not particularly. The residual plot shows a curved pattern, which indicates that the relationship between year and units sold is nonlinear.
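The residuals behind that plot can be listed directly. This is a small sketch that reuses the coefficients from question 3 and computes residual = actual - predicted for every year; the variable names are chosen for this illustration only.

```python
# Data from the iPhone table; coefficients from the equation in question 3.
years = [2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015]
units = [0.5, 1, 1, 1.7, 4, 5, 9, 10, 13]  # millions of units
slope, intercept = 1.605, -3222.6328

# residual = actual - predicted
residuals = [u - (slope * yr + intercept) for yr, u in zip(years, units)]

# One character per year: "+" if actual > predicted, "-" otherwise.
signs = "".join("+" if res > 0 else "-" for res in residuals)

for yr, res in zip(years, residuals):
    print(f"{yr}: residual = {res:+.4f}")
print("sign pattern:", signs)
```

The signs run positive, then negative, then positive again (2007-2008 above the line, 2009-2012 below it, 2013-2015 above it). A run of signs like this is the curved pattern a residual plot shows when a straight line is fitted to nonlinear data.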
Fueleconomy.gov gives the city and highway fuel economy for all makes and models of vehicles back
to 1984. The table gives the city and highway fuel economy (mpg) for a random sample of ten 2021
vehicles.
City fuel economy (mpg)    | 14.4 | 24.3 | 27.2 | 29.9 | 20.4 | 28.8 | 20.9 | 23.2 | 28.6 | 25.4
Highway fuel economy (mpg) | 25.5 | 37.4 | 36.5 | 45.5 | 28.7 | 46.1 | 33.6 | 38.3 | 41.3 | 35.3
a. Calculate the equation of the least-squares regression line.
y^ = 1.264x + 6.084
b. Make a residual plot for the linear model in part a.
c. What does the residual plot indicate about the appropriateness of the linear model? Explain
your answer.
The residuals are scattered around 0 with no clear curved pattern, so a linear model is
appropriate for the relationship between city and highway fuel economy.
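For parts a through c, the fit and its residuals can be reproduced from the table. This is a sketch under the assumption that the first table row is city mpg (explanatory) and the second is highway mpg (response), paired by position; the names city, highway, and residuals are illustrative.

```python
# Fuel economy sample from the table (mpg), paired by position.
city = [14.4, 24.3, 27.2, 29.9, 20.4, 28.8, 20.9, 23.2, 28.6, 25.4]
highway = [25.5, 37.4, 36.5, 45.5, 28.7, 46.1, 33.6, 38.3, 41.3, 35.3]

n = len(city)
x_bar = sum(city) / n
y_bar = sum(highway) / n

# Least squares slope and intercept.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(city, highway))
sxx = sum((x - x_bar) ** 2 for x in city)
slope = sxy / sxx
intercept = y_bar - slope * x_bar
print(f"y^ = {slope:.3f}x + {intercept:.3f}")

# Residuals (actual - predicted), listed in order of increasing city mpg
# so that any leftover pattern would be easy to spot.
residuals = [y - (intercept + slope * x) for x, y in zip(city, highway)]
for x, res in sorted(zip(city, residuals)):
    print(f"city = {x:4.1f}  residual = {res:+.2f}")
```

Reading the printed residuals from smallest to largest city mpg is a text version of the residual plot: a linear model is reasonable when the signs alternate without a systematic curve.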
Outliers for Scatterplots
How do outliers affect the LSRL?
1. Use the Correlation and Regression applet at www.tinyurl.com/regressionapplet
● Click on the graphing area to add 10 points in the lower-left corner so that the
correlation is about r = 0.50.
● Check the boxes to show the LSRL and the mean X and Y lines.
● Sketch it below.
2. For each of the following situations add the point to the scatterplot and decide if the
slope, y-intercept and correlation will increase or decrease.
a. If a point is added on the far right side of the graph on the horizontal line for the
mean of Y.
Slope: Decrease
y-intercept: Increase
Correlation: Decrease
b. If a point is added on the far left side of the graph on the horizontal line for the mean
of Y.
Slope: Decrease
y-intercept: Increase
Correlation: Decrease
c. If a point is added below the LSRL on the vertical line for the mean of X.
Slope: Same
y-intercept: Decrease
Correlation: Decrease
d. If a point is added above the LSRL on the vertical line for the mean of X.
Slope: Same
y-intercept: Increase
Correlation: Decrease
3. Which outliers had the greatest impact on the LSRL, vertical or horizontal outliers?
Horizontal outliers
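The applet experiment can be imitated numerically. The sketch below uses ten made-up points (an assumption, not the points from the applet) with a moderate positive correlation, then adds one point far to the right on the mean-of-Y line and, separately, one point below the LSRL on the mean-of-X line. The function name fit is hypothetical.

```python
from math import sqrt


def fit(points):
    """Return (slope, intercept, r) for a list of (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    syy = sum((y - y_bar) ** 2 for _, y in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar, sxy / sqrt(sxx * syy)


# Hypothetical data: ten points in the lower-left corner of the graph.
base = [(1, 1), (1, 2), (2, 1), (2, 3), (3, 2),
        (3, 3), (3, 4), (4, 2), (4, 4), (5, 3)]
x_bar = sum(x for x, _ in base) / len(base)  # 2.8
y_bar = sum(y for _, y in base) / len(base)  # 2.5

slope_0, icept_0, r_0 = fit(base)

# Situation a: far right, on the horizontal line for the mean of Y.
slope_a, icept_a, r_a = fit(base + [(20, y_bar)])

# Situation c: below the LSRL, on the vertical line for the mean of X.
slope_c, icept_c, r_c = fit(base + [(x_bar, 0)])

for label, s, i, r in [("original", slope_0, icept_0, r_0),
                       ("far right", slope_a, icept_a, r_a),
                       ("below center", slope_c, icept_c, r_c)]:
    print(f"{label:13s} slope = {s:.3f}  intercept = {i:.3f}  r = {r:.3f}")
```

With these points, the far-right point flattens the slope and raises the intercept, while the point at the mean of X leaves the slope unchanged and only shifts the line down. Both lower the correlation, but only the horizontal outlier changes the tilt of the line.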
Check Your Understanding:
You’ve probably heard the saying “Practice makes perfect!”, but does practice also help you
complete a task faster? A study was conducted to find out. A random sample of 15 high school
students were taught how to solve a Rubik’s cube. Then they were each randomly assigned a
number of times to practice this new skill. After they completed their assigned number of practices
they were timed solving the Rubik’s cube. Here is a scatterplot of the results along with the
least-squares regression line.
a. Describe the influence the student who was
assigned to practice following the steps to
solve a Rubik’s cube 14 times has on the
equation of the least-squares regression line.
It forces the slope closer to 0 and decreases the
y-intercept
b. Describe the influence the student who was assigned to practice following the steps to
solve a Rubik’s cube 14 times has on the standard deviation of the residuals and r².
Because it has a large residual, it makes the standard deviation of the residuals larger and r² smaller.
c. The mean and standard deviation of the number of practices are x̄ = 8 practices and
s_x = 4.47 practices. The mean and standard deviation of time are ȳ = 7.71 minutes
and s_y = 1.20 minutes. The correlation between number of practices and time to
solve the Rubik’s cube is r = –0.793. Find the equation of the least-squares
regression line for predicting time to solve the Rubik’s cube from the number of
practices.
b = r*s_y/s_x
a = ȳ - b*x̄
y^ = a + bx
b = (-0.793)(1.20)/4.47 = -0.2129
a = 7.71 - (-0.2129)(8) = 7.71 + 1.7031 = 9.4131
y^ = 9.413 - 0.213x, where x is the number of practices and y^ is the predicted time in minutes.
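The arithmetic for part c can be sketched in a few lines of Python. It uses the standard relationships b = r·s_y/s_x and a = ȳ - b·x̄ with the summary statistics given in the question; the variable names are chosen for this illustration.

```python
# Summary statistics given in part c.
x_bar, s_x = 8, 4.47      # number of practices
y_bar, s_y = 7.71, 1.20   # time to solve (minutes)
r = -0.793                # correlation between practices and time

# Slope and intercept of the least-squares regression line.
b = r * s_y / s_x
a = y_bar - b * x_bar

print(f"b = {b:.4f}")
print(f"a = {a:.4f}")
print(f"y^ = {a:.3f} + ({b:.3f})x")

# Sanity check: the LSRL always passes through the point (x_bar, y_bar).
predicted_at_mean = a + b * x_bar
print(f"prediction at x = {x_bar}: {predicted_at_mean:.2f} minutes")
```

Because the least-squares line passes through (x̄, ȳ), plugging in x = 8 should return 7.71 minutes. A line that predicts a very different time at the mean number of practices signals a sign error in the intercept.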