Modeling Nonlinear Data: Logarithmic and Power Transformations

AP Statistics – Section 4.1
I. Logarithmic Transformations and Exponential Functions
In this section, we will be looking at modeling data that is nonlinear. We begin with some background material as a review.
Definition
An exponential function is a function of the form y = ab^x, where a and b are constants and b ≠ 1.
As your text points out, "A variable grows exponentially if it is multiplied by a fixed number greater than 1 in each equal time
period. Exponential decay occurs when the factor is less than 1."
This is what we call the add-multiply property of exponential functions, which we will illustrate below. Before we do this,
notice what happens when we take a logarithm of both sides of the equation y = ab^x. Here we'll use a common logarithm
(base 10), although a natural logarithm (base e) or a logarithm of any other base would work just as well.
y = ab^x                      original exponential function
log y = log(ab^x)             by taking logs of both sides
log y = log a + log b^x       by laws of logs: log of a product = sum of logs
log y = log a + x log b       by laws of logs: log of power = exponent multiplied by log of base
Since log a and log b are both constants, the final equation above is a linear function expressing log y in terms of
x. The next example will make use of this important fact.
But first, we've seen two of the three laws of logs in the derivation above. What is the third?
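The linearization derived above can be checked numerically. The following is a minimal Python sketch, not part of the original handout; the constants a = 3 and b = 1.5 are arbitrary values chosen only for illustration.

```python
import math

# Hypothetical constants chosen for illustration only (not from the handout).
a, b = 3.0, 1.5

xs = [0, 1, 2, 3, 4, 5]
ys = [a * b**x for x in xs]              # y = a * b^x

# Left side of the final equation: log y
log_ys = [math.log10(y) for y in ys]

# Right side of the final equation: log a + x log b
line = [math.log10(a) + x * math.log10(b) for x in xs]

# For equally spaced x values, a linear function changes by a constant
# amount at each step. Here that constant step should be log b.
steps = [log_ys[i + 1] - log_ys[i] for i in range(len(log_ys) - 1)]

for x, left, right in zip(xs, log_ys, line):
    print(x, round(left, 4), round(right, 4))
```

The two printed columns should agree, and every entry of `steps` should equal log b: adding 1 to x adds a fixed amount to log y, which is the same as multiplying y by a fixed factor.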
Example 1 (IPS): Exact Exponential Growth and Grains of Rice on a Chess Board
A clever courtier, offered a reward by an ancient king of Persia, asked for a grain of rice on the first square of a chess board,
2 grains on the second square, then 4, 8, 16, and so on.
a. Make a table of the numbers of grains on each of the first 10 squares of the chess board.
b. Plot the number of grains on each square against the number of the square for the first 10 squares and connect the
   points with a smooth curve. This is an exponential curve.
c. How many grains of rice should the king deliver for the 64th (and final) square?
d. Take the logarithm of each of your numbers of grains from (a). Plot these numbers against the numbers from 0 to 9
   or 1 to 10. You should get a straight line.
e. Let x = the number of the square on the chess board and y = the number of grains of rice on that square. Calculate
   the regression equation which expresses log ŷ in terms of x. Remember that the equation should begin with log ŷ.
f. Calculate the correlation coefficient and check the residual plot for this regression. How good a fit is this equation to
   the data (log y vs. x)?
g. We would now like to find the actual equation that shows the relationship between the original variables x and y. Do
   this by solving the equation you found in (e) for ŷ.

We have just performed what is called a logarithmic transformation on our original set of data. This was done in order to
give us a data set whose scatterplot was approximately linear in shape. That, in turn, allowed us to use our techniques for
linear regression to find an equation that would allow us to predict the value of y for a given value of x.
Note: There is a technique called exponential regression that could have given us the result from part (g) in less time and with
less effort. As a matter of fact, it can be done with the TI graphing calculators. We're not covering it at this time because the
only regression topic on the AP exam is linear regression.
We shall summarize now.
Logarithmic Transformation
If a scatterplot of the ordered pairs (x, y) in a data set has an approximately exponential shape, then a scatterplot of the ordered
pairs (x, log y) will have an approximately linear shape. The equation of this line can be approximated using
linear regression and the resulting equation can be solved for ŷ using algebra.
Steps Used in a Logarithmic Transformation
1. Graph the original data set. If the shape is approximately exponential, proceed to step #2.
2. Plot the ordered pairs (x, log y). The shape should be approximately linear if we want to use the linear regression
   procedure.
3. Find the linear regression equation for log ŷ in terms of x. Remember that the answer your calculator gives you is of
   the form log ŷ = ax + b. Check the correlation coefficient and the residual plot to verify that the equation is a fairly
   good fit for the data.
4. Take the antilogarithm of both sides (put each side in the exponent) of this equation to solve for ŷ. Use the
   properties of exponents from algebra to simplify the right side of the equation.

Note: We can use any type of logarithm in a log transformation. The most common types are log (base 10) and ln (base e).
Example 2
Consider the data set shown below:
x     1     2     3     4     5
y     6    18    54   162   486
a. Construct a scatterplot of this data and describe its shape.
b. Construct a scatterplot of log y vs. x. Describe the shape.
c. Find the linear regression equation of log ŷ in terms of x. Don't forget that your result should be written in the form
   log ŷ = ax + b. Find the correlation coefficient and check the residual plot to verify that the equation is a "good fit."
d. Use the equation you found in (c) to find the value of ŷ when x = 6.
e. Find the value of x when y = 781.
f. Solve the equation you found in (c) for ŷ.


We continue with a problem from your text that illustrates a very important concept in computer science.
Example 3 (modified from Yates et al.): Moore's Law
Gordon Moore, one of the founders of Intel Corporation, predicted in 1965 that the number of transistors on an integrated
circuit chip would double every 18 months. This is "Moore's Law," one way to measure the revolution in computing. Here are
the data on the dates and number of transistors for Intel microprocessors:
Processor      Date    Number of Transistors
4004           1971             2,250
8008           1972             2,500
8080           1974             5,000
8086           1978            29,000
286            1982           120,000
386            1985           275,000
486 DX         1989         1,180,000
Pentium        1993         3,100,000
Pentium II     1997         7,500,000
Pentium III    1999        24,000,000
Pentium 4      2000        42,000,000
a. Examine this data graphically and sketch your scatterplot. Does the pattern appear to be closer to linear growth or
   exponential growth?
b. Now calculate the logarithms of the numbers of transistors and make a scatterplot of the logarithm of the number of
   transistors vs. time. Calculate the LSRL and add it to your graph. Note the correlation coefficient and use it to assess the fit.
c. During which years was growth slower than the overall trend? Faster?
d. Solve the equation for ŷ to express Moore's Law.
e. How many transistors would our form of Moore's Law predict would be on an Intel processor in 2006?
f. How did we do? The book only gives so much information, so let's see what has happened since. Consider the
   following graph, taken from Intel's web site (http://www.intel.com/technology/mooreslaw/index.htm) on 9/24/2006:
   How good was our prediction for 2006?
Example 4 (IPS): Vehicles in the U.S.
The number of motor vehicles (cars, trucks, and buses) registered in the United States has grown as follows (vehicle counts in
millions):
Year          1940   1945   1950   1955   1960   1965   1970   1975   1980   1985   1990   1995
# Vehicles    32.4   31     49.2   62.7   73.9   90.4   108.8  132.9  155.8  171.7  188.8  203.1
a. Plot the number of vehicles against time. Also plot the logarithm of the number of vehicles against time.
b. Using the data from 1950 to 1980, find the equation of the LSRL of the logarithm of number of vehicles against
   time. Solve for ŷ.
c. Compare what your model tells you about 1990 to what really happened. Discuss.
II. Power Regression
We now look at a slight twist on what we've been doing.
Definition: A power function is a function of the form y = ax^b.
Obviously, if b ≠ 1, then this function will not be linear. But if we take logs of both sides of the equation, we get:
y = ax^b                      original power function
log y = log(ax^b)             by taking logs of both sides
log y = log a + log x^b       by laws of logs: log of a product = sum of logs
log y = log a + b log x       by laws of logs: log of power = exponent multiplied by log of base
The result is a linear equation expressing log y in terms of log x. So we can find the power function that best fits a set of data
by first using linear regression to find an equation which expresses log y in terms of log x. We can then use algebra and the
laws of logarithms to find an equation which expresses y in terms of x.
Example 1 (Yates et al.)
Imagine that you have been put in charge of organizing a fishing tournament in which prizes will be given for the heaviest fish
caught. You know that many of the fish caught during the tournament will be measured and released. You are also aware that
trying to weigh a fish that is flipping around in a boat using delicate scales will probably not yield very reliable results. It
would be much easier to measure the length of the fish on the boat. What you need is a way to convert the length of the fish
to its weight. You contact the nearby marine research laboratory and it provides the average length and weight catch data for
the Atlantic Ocean rockfish Sebastes mentella. The lab also advises you that the model relationship between body length and
weight has been found to be accurate for most fish species growing under normal feeding conditions.
Here is the data:
Age (years)    Length (cm)    Weight (g)
 1              5.2              2
 2              8.5              8
 3             11.5             21
 4             14.3             38
 5             16.8             69
 6             19.2            117
 7             21.3            148
 8             23.3            190
 9             25.0            264
10             26.7            293
11             28.2            318
12             29.6            371
13             30.8            455
14             32.0            504
15             33.0            518
16             34.0            537
17             34.9            651
18             36.4            719
19             37.1            726
20             37.7            810
a. Does the data appear to have an exponential relationship?
b. If we let x = the length of the fish in cm and y = the weight of the fish in g, would it make sense if the point (0,0)
   were in our data set? (This is often one way people verify that their data can be approximated using a power
   function.)
c. Make a scatterplot of y vs. x. Comment on the shape of the plot.
d. Make a scatterplot of log y vs. log x. Comment on the shape of the plot.
e. Find the LSRL of log y vs. log x. Remember that your equation will be of the form "log ŷ = a log x + b." Comment
   on how well this equation fits the data set by examining the correlation coefficient and the residual plot.
f. Suppose your catch measured 36 cm. What would your equation predict its weight to be?
g. Solve the equation from (e) for ŷ.

Example 2 (from chapter review in Yates et al.): Intensity of Light Bulbs
In a physics lab, the intensity of a 100-watt bulb was measured by a sensing device at various distances from the light source.
The following data were collected. Note that I is the symbol used for intensity in physics and a candela (cd) is an
international unit of luminous intensity.
Distance (m)    Intensity (cd)
1.0             .2965
1.1             .2522
1.2             .2055
1.3             .1746
1.4             .1534
1.5             .1352
1.6             .1145
1.7             .1024
1.8             .0923
1.9             .0832
2.0             .0734
a. Plot the data before and after various transformations. Based on the pattern of points, propose a model for the data.
   Then use a transformation followed by a linear regression and then an inverse transformation to construct a model.
b. Report the equation and plot the original data with the model on the same axes.
c. Describe the relationship between the intensity and the distance from the light source.
Homework: #4.6, 4.10, 4.11, 4.13-4.16, 4.25
III. Some Vocabulary to Know
We now consider a few important terms that are mentioned in this section of your book and that you will likely encounter during your
college career (especially during your study of calculus). We first consider monotonicity.
Definition: Monotonic
A monotonic function f(t) moves in one direction as its argument t increases.
There are two subclasses of monotonic functions:

For a monotonic increasing function, if a > b, then ___________________________________.

For a monotonic decreasing function, if a > b, then ___________________________________.
Finally, we consider concavity. Let's illustrate these concepts:
[Figure: two sketched curves, one labeled Concave Up and one labeled Concave Down]
A function can be concave up in some intervals and concave down in others. The point at which concavity changes is known
as an inflection point. My way of remembering these is to think about whether the function "holds water": if it does, it's
concave up; if not, it's concave down.
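For a table of values at equally spaced inputs, both ideas can be checked with simple differences. The Python sketch below is an illustration only; the function names are invented here, and second differences are a numerical stand-in for the calculus definition of concavity.

```python
import math

def is_increasing(values):
    """True if each value is larger than the one before it."""
    return all(later > earlier for earlier, later in zip(values, values[1:]))

def is_decreasing(values):
    """True if each value is smaller than the one before it."""
    return all(later < earlier for earlier, later in zip(values, values[1:]))

def concavity(values):
    """Classify equally spaced samples by the sign of their second differences."""
    second = [values[i + 1] - 2 * values[i] + values[i - 1]
              for i in range(1, len(values) - 1)]
    if all(d > 0 for d in second):
        return "concave up"
    if all(d < 0 for d in second):
        return "concave down"
    return "mixed"

growth = [2 ** x for x in range(8)]              # exponential growth
logs = [math.log10(x) for x in range(1, 9)]      # logarithm
cubic = [x ** 3 for x in range(-3, 4)]           # changes concavity at x = 0

print(is_increasing(growth), concavity(growth))
print(is_increasing(logs), concavity(logs))
print(is_increasing(cubic), concavity(cubic))
```

All three samples are increasing, yet they differ in concavity, which shows that the two properties are independent of each other.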
IV. Questions to Ask About this Section (for you to consider while studying)
1. Will a scatterplot of my data set always be enough to tell me whether my function should be exponential or power?
2. Which function (power or exponential) yields a linear relationship between log y and x and which yields a linear
   relationship between log y and log x?
3. Could I write down a set of steps for each procedure so that I know what I am doing and can see the differences
   between the two procedures?
Summary of Transformation Procedures
Data models an exponential function
Data models a power function