Correlation and Linear Regression
Analyzing Bivariate Data With Fathom*

CFU 3102.5.10 Using technology with a set of contextual linear data to examine the line of best fit; determine and interpret the correlation coefficient.

Andy Wilson – APSU – wilsona@apsu.edu
Jackie Vogel – APSU – vogelj@apsu.edu

*Adapted from Preparing to Teach Mathematics with Technology by Lee, Hollebrands & Wilson, 2012

Open the 2006Vehicle.ftm file. Is there a relationship between City mpg and Hwy mpg? Click on the 2006 Vehicles collection icon and then drag a table to the workspace. Be sure “Prevent Changing Values in Graphs” is selected in the Collection menu. Explore the data set. Drag two blank graphs to the workspace and create dot plots for each attribute by dragging each column heading in the table to the x-axis of one of the graphs.

Your screen should look something like this.

Click on a point in one of the graphs and notice what happens in the other graph and the table. You can also select several cases by shift-clicking or by clicking and dragging a rectangle around multiple points. What do you anticipate might be a reasonable relationship between City mpg and Hwy mpg? In the upper right corner of each graph, use the pull-down menu to display box plots. Click on the lower whisker in one box plot and notice the location of the highlighted cases in the other box plot. Repeat for other cases. What do you notice?

Drag the label (Hwy) from the x-axis and drop it on the y-axis. Change the window sizes and orient the two box plots as shown.

Drag a new graph to the workspace and drag Hwy from the table to the y-axis and City to the x-axis to make a scatterplot as shown.

Choose the graph window and then Plot Value Mean (City) and Plot Function Mean (Hwy). Match up the scales by dragging.

Use form (linear, quadratic, exponential, etc.), direction (positive or negative) and strength (weak, moderate or strong) to describe the relationship between City and Hwy mpg. Describe the location of the data points in relation to the mean City and mean Hwy mpg. What does this tell you about the general trend of the data? The correlation coefficient, r, is given below. Use the scatterplot to discuss the sign of r (positive or negative) for the mpg data.

$$ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}} $$

Open the Correlation.ftm file. Move the slider back and forth and notice how the scatterplot changes with the value of r. Use the slider to create a scatterplot that can help you estimate a value of the correlation coefficient for the relationship between City mpg and Hwy mpg.

The linear correlation coefficient, r, measures the strength and direction of the linear association between two variables.
• r is between -1 and +1
• Positive, negative, or no association

Go back to the 2006Vehicle.ftm file. Drag an empty Summary object to the workspace. Click and drag the City and Hwy attributes as shown. The default measure that will be displayed is the correlation between the two attributes. Compare the calculated correlation coefficient with the one you estimated using the Correlation.ftm file. What does this value of the correlation coefficient imply about the relationship between City and Hwy mpg?

Since the correlation is high, it makes sense to try to use a linear function to model the vehicle data. Click on the scatterplot and under the Graph menu, select Add Movable Line. Dragging the line by its middle changes the intercept (translates the line) while dragging by either end changes the slope (rotates the line). Note that the equation is displayed at the bottom of the window.
Adjust your line so that you feel it best models the data. Interpret the slope and y-intercept in the equation of your linear model. Use your equation to predict the Hwy mpg for a vehicle with a City mpg of 31. Is the slope of your line the same as the value of r? Should they be the same? Why or why not?

One common method for finding a linear model is to minimize the deviations of the actual data points from the predicted values. Visually, these deviations, called residuals, are the vertical distances between the actual data points and the line. The linear model that minimizes the sum of the squares of these residuals is called the least-squares regression line.

Double-click each of the equations below the scatterplot that contain mean (Hwy) and mean (City), delete the expression and click OK. This removes the lines for the means, which is necessary for the next step. Click on the graph with the movable line and select Show Squares from the Graph menu. Notice that the sum of the squares is computed and displayed below the equation of the line. Manipulate the movable line to minimize the sum of squares.

Click on the graph and select Least-Squares Line from the Graph menu. Compare the function rule for the least-squares line with the function rule for your estimated linear model (movable line). Also compare the sums of squares. From the Graph menu, choose Remove Movable Line. Use the regression line to find a predicted Hwy mpg for the Ford Ranger Manual and the Ford Ranger Automatic. Is the least-squares line a good model for the 2006 Vehicle data?
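
For readers who want to check the ideas above outside Fathom, the following is a minimal Python sketch, not part of the original activity. It computes the correlation coefficient r, the least-squares line, and the sum of squared residuals that Show Squares displays. The function names are hypothetical, and the six (City, Hwy) pairs are made-up illustrative values, not data from 2006Vehicle.ftm.

```python
import math


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def correlation(xs, ys):
    """Correlation coefficient r: sum of products of deviations,
    divided by the square root of the product of the sums of squared deviations."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept of the line that minimizes the sum of squared residuals."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the line passes through (x_bar, y_bar)
    return slope, intercept


def sum_of_squares(xs, ys, slope, intercept):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical mpg values for illustration only (NOT from 2006Vehicle.ftm).
city = [15, 18, 20, 22, 25, 30]
hwy = [21, 24, 28, 29, 33, 38]

r = correlation(city, hwy)
slope, intercept = least_squares_line(city, hwy)

print("r =", round(r, 4))
print("least-squares line: Hwy =", round(slope, 4), "* City +", round(intercept, 4))
print("sum of squares:", round(sum_of_squares(city, hwy, slope, intercept), 4))

# Any other line, such as a hand-placed movable line, has a larger sum of squares.
print("nudged line:", round(sum_of_squares(city, hwy, slope + 0.1, intercept), 4))
```

In this sketch the slope and r share the same numerator but have different denominators, so they agree in sign but are generally not equal. The slope carries units (Hwy mpg per City mpg), while r is unitless.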