3.5 The Line of Best Fit
When gathering data in the real world, a plot of the data often reveals a “linear trend,”
but the data don’t fall precisely on a single line. In this case, we seek to find a linear
model that approximates the data. Let’s begin by looking at an extended example.
Aditya and Tami are lab partners in Dr. Mills’ physics class. They are hanging
masses from a spring and measuring the resulting stretch in the spring. See Table 1
for their data.
m (mass in grams)    x (stretch in cm)
10                   6.8
20                   10.2
30                   13.9
40                   21.2
50                   24.2
Table 1. Aditya and Tami’s data set.
The goal is to find a model that describes the data, in both the form of a graph
and of an equation. The first step is to plot the data. Recall some of the guidelines
provided in the first section of the current chapter.
Guidelines. When plotting real data, we follow these guidelines.
1. You don’t want small graphs. It’s best to scale your graph so that it fills a full
sheet of graph paper. This will make it much easier to read and interpret the
graph.
2. You may have different scales on each axis, but once chosen, you must remain
consistent.
3. You want to choose a scale which facilitates our first objective, but which also
makes the data easy to plot.
Aditya and Tami are free to choose the masses which they hang on the spring.
Hence, the mass m is the independent variable. Consequently, we will scale the horizontal axis to accommodate the mass. The distance the spring stretches depends upon
the amount of mass that is hanging from the spring, so the distance stretched x is
the dependent variable. We will scale the vertical axis to accommodate the distance
stretched.
On the horizontal axis, we need to fit the masses 10, 20, 30, 40, and 50 grams. To
avoid a smallish graph, we will let every 5 boxes represent 10 grams. On the vertical
axis, we need to fit distances ranging from 6.8 centimeters up to and including 24.2
centimeters. Making each box represent 1 cm gives a nice sized graph and will allow
for easy plotting of our data points, which we’ve done in Figure 1(a).
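The scale choice can be checked with a little arithmetic. The sketch below, in Python, counts how many grid boxes each axis needs under the scales chosen above; the helper name `boxes_needed` is our own and is only illustrative.

```python
import math

# Data from Table 1.
masses_g = [10, 20, 30, 40, 50]
stretches_cm = [6.8, 10.2, 13.9, 21.2, 24.2]

# Scales chosen in the text: 5 boxes per 10 g, and 1 box per cm.
BOXES_PER_GRAM = 5 / 10
BOXES_PER_CM = 1

def boxes_needed(largest_value, boxes_per_unit):
    """Hypothetical helper: whole boxes needed to reach the largest value."""
    return math.ceil(largest_value * boxes_per_unit)

horizontal_boxes = boxes_needed(max(masses_g), BOXES_PER_GRAM)
vertical_boxes = boxes_needed(max(stretches_cm), BOXES_PER_CM)
print(horizontal_boxes, vertical_boxes)  # 25 25
```

A grid of roughly 25 boxes by 25 boxes is large enough to read easily, which is the point of the first guideline.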
Note the linear trend displayed by the data in Figure 1(a). It’s not possible to
draw a single line that will pass through every one of the data points, so a linear model
will not exactly “fit” the data. However, the data are “approximately linear,” so let’s
try to draw a line that “nearly fits” the data.
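One way to see that no single line passes through all five points is to compare the slopes between neighboring data points. If the points were exactly collinear, these slopes would all be equal. The following Python sketch uses only the data from Table 1; the variable names are our own.

```python
# Data from Table 1.
masses_g = [10, 20, 30, 40, 50]
stretches_cm = [6.8, 10.2, 13.9, 21.2, 24.2]

points = list(zip(masses_g, stretches_cm))

# Slope (cm per gram) of the segment joining each pair of neighboring points.
segment_slopes = [
    (x2 - x1) / (m2 - m1)
    for (m1, x1), (m2, x2) in zip(points, points[1:])
]
print([round(s, 2) for s in segment_slopes])

# Exactly collinear points would give a single repeated slope.
is_exactly_linear = len({round(s, 6) for s in segment_slopes}) == 1
print(is_exactly_linear)  # False
```

The slopes are similar in size but not equal, which is what "approximately linear" means here.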
It is not our goal here to try to draw a line that passes through as many data points
as possible. If we do, then we are essentially saying that the points through which
Version: Fall 2007
Chapter 3
Linear Functions
the line does not pass have no influence on the overall model, nor do they have any
influence on any predictions we might make with our model. This is not a reasonable
assumption.
Indeed, the goal is to draw a line that comes as close to as many points as possible.
Some points will lie above the line, some will lie below, and what we’ll try to do is
“balance” the overestimates and the underestimates in an attempt to minimize the
overall error. The best way to do this is to take a clear plastic ruler, something you can
see through, and rotate and shift the ruler until you think you have a line that balances
the overestimates and underestimates. We’ve done this for you in Figure 1(b). The
resulting line is called the “line of best fit.”
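The idea of "balancing" can be made concrete by computing, for each data point, the difference between the measured stretch and the stretch a candidate line predicts. As a minimal sketch, the Python code below uses as its candidate the line this section later derives by hand in equation (4), x = (11/24)m + 3/2; the function and variable names are our own.

```python
# Data from Table 1.
masses_g = [10, 20, 30, 40, 50]
stretches_cm = [6.8, 10.2, 13.9, 21.2, 24.2]

def candidate_line(mass_g):
    """Candidate line, taken from equation (4) later in this section."""
    return 11 / 24 * mass_g + 3 / 2

# Positive difference: the point lies above the line; negative: below it.
differences = [x - candidate_line(m) for m, x in zip(masses_g, stretches_cm)]

points_above = sum(1 for d in differences if d > 0)
points_below = sum(1 for d in differences if d < 0)

print([round(d, 2) for d in differences])
print(points_above, points_below, round(sum(differences), 2))
```

Two points lie above this line and three lie below, and the differences nearly cancel when added, which is what a balanced line should look like.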
Figure 1. (a) Scaling the axes and plotting the data. (b) Drawing the “line of best fit.” In both panels the horizontal axis is m (g) and the vertical axis is x (cm).
We can use the “line of best fit” in Figure 1(b) to make predictions. For example,
if we wanted to predict how much the spring will stretch when Aditya and Tami attach
a 22 gram mass, then we would locate 22 grams on the horizontal axis, draw a vertical
line upward to the “line of best fit,” followed by a horizontal line to the vertical axis,
as shown in Figure 2(a). Note that the x-value on the vertical axis appears to be
approximately 11.6 centimeters.
Alternatively, we will develop an equation model. First, select two points on the
“line of best fit” using the following guidelines.
Guidelines.
1. Pick two points on the “line of best fit” that are not data points.
2. Try to pick points where the line passes through a lattice point of the grid. This makes the coordinates of the point much easier to read.
3. The further apart the two selected points, the better the accuracy. Don’t pick
points that are too close together.
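The third guideline can be illustrated numerically. Suppose, purely as an assumption for illustration, that the vertical coordinate of each selected point might be misread by as much as 0.25 cm. In the worst case one point is read too high and the other too low, so the rise is off by twice that amount while the run is unchanged. The Python sketch below (with a hypothetical helper name) shows how the resulting slope error shrinks as the points move apart.

```python
def worst_case_slope_error(reading_error_cm, separation_g):
    """Hypothetical helper: largest slope error when each point's vertical
    coordinate may be off by reading_error_cm in opposite directions."""
    return 2 * reading_error_cm / separation_g

# Assumed reading error of 0.25 cm; separations of 4 g, 12 g, and 24 g.
for separation_g in (4, 12, 24):
    error = worst_case_slope_error(0.25, separation_g)
    print(separation_g, round(error, 4))
```

Under this assumption, points 24 grams apart give a slope error six times smaller than points only 4 grams apart.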
Figure 2. (a) Predicting the stretch when the mass is 22 grams; the reading on the vertical axis is about 11.6 cm. (b) Pick two points on the line that are not data points; here, P(12, 7) and Q(36, 18). In both panels the horizontal axis is m (g) and the vertical axis is x (cm).
In Figure 2(b), we’ve selected points P (12, 7) and Q(36, 18). The first point indicates that a mass of 12 grams stretches the spring 7 centimeters. The interpretation
for the second point is similar. We can find the slope of the line through the points P
and Q with the slope formula.
$$
m = \frac{\Delta x}{\Delta m} = \frac{18\ \text{cm} - 7\ \text{cm}}{36\ \text{g} - 12\ \text{g}} = \frac{11\ \text{cm}}{24\ \text{g}}.
$$
The slope of the line is the rate at which the distance stretched is changing with respect
to how the mass is changing. In this case, for every additional 24 grams of mass that
is hung, the spring stretches an additional 11 centimeters.
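The slope calculation can be reproduced with exact fractions. This is a small Python sketch using the standard `fractions` module and the points P and Q from Figure 2(b); the variable names are our own.

```python
from fractions import Fraction

# Points read from the line of best fit: (mass in g, stretch in cm).
P = (12, 7)
Q = (36, 18)

# Slope = change in stretch divided by change in mass.
slope = Fraction(Q[1] - P[1], Q[0] - P[0])
print(slope)         # 11/24
print(float(slope))  # about 0.4583 cm per gram
```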
The next step is to use the point-slope formula to determine the equation of the
line.
$$
y - y_0 = m(x - x_0) \tag{1}
$$
Let’s use point P (12, 7). That is, set (x0 , y0 ) = (12, 7). Substitute m = 11/24, x0 = 12,
and y0 = 7 into equation (1) to obtain
$$
y - 7 = \frac{11}{24}(x - 12). \tag{2}
$$
In the spring-mass application, the dependent variable is x, not y, and the independent
variable is m, not x. Replace the y on the left-hand side of equation (2) with x, then
replace x on the right-hand side of equation (2) with m to obtain
$$
x - 7 = \frac{11}{24}(m - 12). \tag{3}
$$
Solve equation (3) for x.
$$
\begin{aligned}
x - 7 &= \frac{11}{24}m - \frac{132}{24} \\
x &= \frac{11}{24}m - \frac{132}{24} + 7 \\
x &= \frac{11}{24}m - \frac{132}{24} + \frac{168}{24} \\
x &= \frac{11}{24}m + \frac{36}{24}
\end{aligned}
$$
Reduce 36/24 to 3/2 to obtain
$$
x = \frac{11}{24}m + \frac{3}{2}.
$$
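The algebra above can be double-checked with exact arithmetic. The Python sketch below rearranges the point-slope form into slope-intercept form using point P(12, 7) and then confirms that point Q(36, 18) also lies on the resulting line; the variable names are our own.

```python
from fractions import Fraction

slope = Fraction(11, 24)
m0, x0 = 12, 7  # point P: (mass in g, stretch in cm)

# x - x0 = slope * (m - m0)  rearranges to  x = slope * m + (x0 - slope * m0).
intercept = x0 - slope * m0
print(slope, intercept)  # 11/24 3/2

# Point Q(36, 18) was also chosen on the line, so it should satisfy the equation.
stretch_at_36 = slope * 36 + intercept
print(stretch_at_36)  # 18
```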
Recall that x represents the distance stretched and m represents the amount of
mass hung from the spring. That is, x is a function of m. We can use function notation
to write the last equation as follows.
$$
x(m) = \frac{11}{24}m + \frac{3}{2} \tag{4}
$$
We can use the model in equation (4) to determine the amount of stretch when a
mass of 22 grams is attached to the spring. Substitute m = 22 in equation (4), then
use a calculator to approximate the stretch in the spring.
$$
x(22) = \frac{11}{24}(22) + \frac{3}{2} \approx 11.6\ \text{cm}
$$
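In place of a calculator, the same evaluation can be sketched in Python. The function name `stretch_cm` is our own; it simply encodes equation (4).

```python
from fractions import Fraction

def stretch_cm(mass_g):
    """Model from equation (4): stretch in cm for a mass in grams."""
    return Fraction(11, 24) * mass_g + Fraction(3, 2)

exact = stretch_cm(22)
print(exact, round(float(exact), 1))  # 139/12 11.6
```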
Note the agreement with the graphical solution found in Figure 2(a). Readers should
understand that this kind of accuracy is not the usual norm. There are a number of
factors that can introduce error.
• Aditya and Tami might not have taken accurate measurements in the lab, so the data could be flawed to begin with.
• There could be errors made when we scale the axes and plot the data.
• The “eyeball” line of best fit that we drew was very subjective. A slight rotation or translation of the ruler during the drawing of the supposed “line of best fit” can produce different results.
• Our calculations could contain mistakes and round-off error.
Using the Graphing Calculator to Find the Line of Best Fit
Statisticians have developed a particular method, called the “method of least squares,”
which is used to find a “line of best fit” for a set of data that shows a linear trend. The
algorithm seeks the line that minimizes the total error, measured as the sum of the squares
of the vertical distances between the data points and the line. This algorithm is
programmed into the graphing calculator and is available for student use.
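To make "minimizes the total error" concrete, here is a minimal Python sketch using the standard closed-form formulas for the least-squares slope and intercept, applied to the data in Table 1. The text does not say how the calculator carries out the computation internally, so this is an illustration of the method rather than a description of the calculator. The function names are our own.

```python
# Data from Table 1.
masses_g = [10, 20, 30, 40, 50]
stretches_cm = [6.8, 10.2, 13.9, 21.2, 24.2]

def sum_squared_error(slope, intercept):
    """Total error: sum of squared vertical distances from points to the line."""
    return sum((x - (slope * m + intercept)) ** 2
               for m, x in zip(masses_g, stretches_cm))

def least_squares_line():
    """Closed-form least-squares slope and intercept."""
    n = len(masses_g)
    mean_m = sum(masses_g) / n
    mean_x = sum(stretches_cm) / n
    s_mx = sum((m - mean_m) * (x - mean_x)
               for m, x in zip(masses_g, stretches_cm))
    s_mm = sum((m - mean_m) ** 2 for m in masses_g)
    slope = s_mx / s_mm
    return slope, mean_x - slope * mean_m

slope, intercept = least_squares_line()
best = sum_squared_error(slope, intercept)

# Nudging the line up, down, or to a different tilt should not lower the error.
for d_slope, d_intercept in [(0.01, 0), (-0.01, 0), (0, 0.2), (0, -0.2)]:
    assert sum_squared_error(slope + d_slope, intercept + d_intercept) >= best

print(round(slope, 3), round(intercept, 2))
```

The perturbation loop is only a spot check at a few nearby lines, not a proof of minimality.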
To use the graphing calculator to determine the line of best fit, the first thing you
have to learn how to do is load the data from Table 1 into your calculator.
• Locate and push the STAT button on your keyboard, which will open the menu shown in Figure 3(a).
• Select 1:Edit from this menu, which will open the edit window shown in Figure 3(b).¹
• Enter the data from Table 1 into lists L1 and L2, as shown in Figure 3(c).
Figure 3. Enter the data from Table 1 into lists L1 and L2 in your graphing calculator.
The next step is to plot the data you’ve entered into lists L1 and L2 .
• Press the 2ND key, followed by STAT PLOT (located above the Y= menu). This opens the window shown in Figure 4(a).
• Select 1:Plot1 to open the plot selection window shown in Figure 4(b).
• In the plot selection window of Figure 4(b), there are several things you need to
check.
1. Use the arrow keys to place the cursor over the word “On” and press the ENTER
key to highlight this selection.
2. There are six “Types” of plots: scatterplot, lineplot, histogram, modified box
plot, box plot, and normal probability plot. These choices are arranged in two
rows of three plots. Move your cursor to the first plot of the first row, the
scatterplot, then press the ENTER key to highlight your selection.
3. The next selection is the XList. This is the list that goes on the horizontal axis.
In the case of Table 1, we want to place the mass data on the horizontal axis.
We entered the mass data in list L1 , so enter 2ND L1 (L1 is located above the 1
on the keyboard).
4. The next selection is the Ylist. Enter 2ND L2 (L2 is located above the 2 on the
keyboard). This list holds the distances stretched, which will be placed on the vertical
axis.
5. The last item is the marker. Choose the first one with the arrow keys (it’s the
easiest to see) and press the ENTER key to highlight this choice.
• Push the ZOOM button on the first row of keys on your keyboard. Use the arrow keys
to scroll the menu downward until you can select 9:ZoomStat. This will produce
the image shown in Figure 4(c).
¹ You may have to clear out existing data sets. The easiest way to do this is to use the arrow keys on your calculator to move the cursor into the header of the column, press the CLEAR button on your keyboard, followed by the ENTER key. This should clear the data out of the corresponding column.
Figure 4. Plotting the data points from Table 1.
The final step is to calculate and plot the line of best fit.
• Press the STAT button again, but then use the right-arrow to select the CALC submenu highlighted in Figure 5(a).
• Select 4:LinReg(ax+b) from the CALC submenu.² This places the command LinReg(ax+b) on your home screen, as shown in Figure 5(b). You must then type 2ND L1, a comma (located on its own key just above the 7 key), then 2ND L2, as shown in Figure 5(b).
• Press the ENTER key to execute the command LinReg(ax+b) L1, L2, which produces the equation of the line of best fit shown in Figure 5(c).
Figure 5. Finding the equation of the line of best fit.
The screen in Figure 5(c) is quite informative. It tells us two things.
1. The equation of the line of best fit is y = ax + b.
2. The slope is a = 0.458 and the y-intercept is b = 1.52.
Substituting a = 0.458 and b = 1.52 into the equation y = ax + b gives us the
equation of the line of best fit.
$$
y = 0.458x + 1.52 \tag{5}
$$
We can superimpose the plot of the line of best fit on our data set in two easy steps.
• Press the Y= key and enter the equation 0.458*X+1.52 in Y1, as shown in Figure 6(a).
• Press the GRAPH button on the top row of keys on your keyboard to produce the line of best fit in Figure 6(b).
² The technical name of the process for finding the line of best fit is linear regression. Hence, the abbreviation LinReg.
Figure 6. Superimpose the line of best fit on the scatterplot of the data from Table 1.
On the left-hand side of equation (5), replace y with x (the distance stretched);
on the right-hand side, replace x with m (amount of mass). This leads to the result
$$
x = 0.458m + 1.52 \tag{6}
$$
You might recall that our hand calculation produced equation (4), which we repeat
here for convenience.
$$
x = \frac{11}{24}m + \frac{3}{2}.
$$
Note that 11/24 ≈ 0.4583 and 3/2 = 1.5, so equation (6) agrees closely with our
hand-calculated equation of the line of best fit.
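How closely the two models agree can be quantified by comparing their predictions at each mass in Table 1. The following Python sketch encodes equations (4) and (6); the function names are our own.

```python
masses_g = [10, 20, 30, 40, 50]

def hand_line(mass_g):
    """Equation (4): the hand-calculated line of best fit."""
    return 11 / 24 * mass_g + 3 / 2

def calculator_line(mass_g):
    """Equation (6): the line reported by the graphing calculator."""
    return 0.458 * mass_g + 1.52

# Largest disagreement (in cm) between the two models over the measured masses.
largest_gap = max(abs(hand_line(m) - calculator_line(m)) for m in masses_g)
print(round(largest_gap, 3))
```

Over the masses in Table 1, the two predictions differ by less than 0.02 cm, far smaller than the scatter in the data itself.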
It is rather unusual to have a hand-calculated line of best fit agree so closely with
the sophisticated and very accurate result produced by the graphing calculator. So,
don’t be disappointed when your homework results don’t match as nicely as they have
in this example. If you are in the ballpark with your hand-calculated equation for the
line of best fit, that will usually be good enough. However, if your hand-calculated
equation is not even close to what your calculator produces, it’s “back to the drawing
board.” Recheck your plot and your calculations. Be stubborn! Don’t be satisfied with
your results until you have reasonable agreement.