Modeling Nonlinear Data: Logarithmic and Power Transformations
AP Statistics – Section 4.1

I. Logarithmic Transformations and Exponential Functions

In this section, we will be looking at modeling data that is nonlinear. We begin with some background material as a review.

Definition: An exponential function is a function of the form $y = ab^x$, where a and b are constants, b > 0, and b ≠ 1.

As your text points out, "A variable grows exponentially if it is multiplied by a fixed number greater than 1 in each equal time period. Exponential decay occurs when the factor is less than 1." This is what we call the add-multiply property of exponential functions: adding a fixed amount to x multiplies y by a fixed factor. We will illustrate it below.

Before we do this, notice what happens when we take a logarithm of both sides of the equation $y = ab^x$. Here we'll use a common logarithm (base 10), although a natural logarithm (base e) or a logarithm with any other base would work just as well.

$$
\begin{aligned}
y &= ab^x && \text{original exponential function} \\
\log y &= \log(ab^x) && \text{taking logs of both sides} \\
\log y &= \log a + \log b^x && \text{law of logs: log of a product = sum of logs} \\
\log y &= \log a + x \log b && \text{law of logs: log of a power = exponent times log of base}
\end{aligned}
$$

Since log a and log b are both constants, the final line, $\log y = \log a + (\log b)x$, is a linear function expressing log y in terms of x, with slope log b and intercept log a. The next example will make use of this important fact. But first: we've seen two of the three laws of logs in the derivation above. What is the third?

Example 1 (IPS): Exact Exponential Growth and Grains of Rice on a Chess Board
A clever courtier, offered a reward by an ancient king of Persia, asked for a grain of rice on the first square of a chess board, 2 grains on the second square, then 4, 8, 16, and so on.
a. Make a table of the numbers of grains on each of the first 10 squares of the chess board.
b. Plot the number of grains on each square against the number of the square for the first 10 squares and connect the points with a smooth curve. This is an exponential curve.
c. How many grains of rice should the king deliver for the 64th (and final) square?
d. Take the logarithm of each of your numbers of grains from (a). Plot these numbers against the numbers from 0 to 9 or 1 to 10. You should get a straight line.
e. Let x = the number of the square on the chess board and y = the number of grains of rice on that square. Calculate the regression equation that expresses $\log \hat{y}$ in terms of x. Remember that the equation should begin with $\log \hat{y}$.
f. Calculate the correlation coefficient and check the residual plot for this regression. How well does this equation fit the data (log y vs. x)?
g. We would now like to find the actual equation that shows the relationship between the original variables x and y. Do this by solving the equation you found in (e) for $\hat{y}$.

We have just performed what is called a logarithmic transformation on our original data set. This was done to give us a data set whose scatterplot was approximately linear in shape. That, in turn, allowed us to use our techniques for linear regression to find an equation that predicts the value of y for a given value of x.

Note: There is a technique called exponential regression that could have given us the result from part (g) in less time and with less effort. In fact, it can be done with the TI graphing calculators. We're not covering it at this time because the only regression topic on the AP exam is linear regression.

We shall summarize now.

Logarithmic Transformation
If the ordered pairs (x, y) in a data set display a graph with an approximately exponential shape, then the graph of the ordered pairs (x, log y) will have an approximately linear shape. The equation of this line can be approximated using linear regression, and the resulting equation can be solved for $\hat{y}$ using algebra.

Steps Used in a Logarithmic Transformation
1. Graph the original data set. If the shape is approximately exponential, proceed to step 2.
2. Plot the ordered pairs (x, log y). The shape should be approximately linear if we want to use the linear regression procedure.
3. Find the linear regression equation for $\log \hat{y}$ in terms of x. Remember that the answer your calculator gives you is of the form $\log \hat{y} = ax + b$ (here a and b are the slope and intercept of the fitted line, not the constants in $y = ab^x$). Check the correlation coefficient and the residual plot to verify that the equation is a fairly good fit for the data.
4. Take the antilogarithm of both sides of this equation (put each side in the exponent) to solve for $\hat{y}$. For base-10 logs, $\log \hat{y} = ax + b$ becomes $\hat{y} = 10^{ax+b} = 10^b (10^a)^x$. Use the properties of exponents from algebra to simplify the right side of the equation.

Note: We can use any type of logarithm in a log transformation. The most common types are log (base 10) and ln (base e).

Example 2
Consider the data set shown below:

x    y
1    6
2    18
3    54
4    162
5    486

a. Construct a scatterplot of these data and describe its shape.
b. Construct a scatterplot of log y vs. x. Describe the shape.
c. Find the linear regression equation of $\log \hat{y}$ in terms of x. Don't forget that your result should be written in the form $\log \hat{y} = ax + b$. Find the correlation coefficient and check the residual plot to verify that the equation is a "good fit."
d. Use the equation you found in (c) to find the value of $\hat{y}$ when x = 6.
e. Find the value of x when y = 781.
f. Solve the equation you found in (c) for $\hat{y}$.

We continue with a problem from your text that illustrates a very important concept in computer science.

Example 3 (modified from Yates et al.): Moore's Law
Gordon Moore, one of the founders of Intel Corporation, predicted in 1965 that the number of transistors on an integrated circuit chip would double every 18 months. This is "Moore's Law," one way to measure the revolution in computing.
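If Moore's prediction held exactly, it would by itself determine the growth factor in $y = ab^x$. The short Python sketch below is an illustration added to these notes, not part of the original exercise; the names DOUBLING_TIME_YEARS, annual_factor, log_slope, and predicted_count are hypothetical. Assuming x is measured in years, it converts "doubles every 18 months" into the annual factor b and into the slope that a plot of log y vs. year would have, which you may want to compare with the slope of your LSRL in part (b).

```python
import math

# Moore's Law as stated in the text: the count doubles every 18 months (1.5 years).
DOUBLING_TIME_YEARS = 1.5

# In the form y = a * b**x with x in years, the annual growth factor b
# must satisfy b**1.5 = 2, so b = 2**(1/1.5).
annual_factor = 2 ** (1 / DOUBLING_TIME_YEARS)

# After a base-10 log transformation, log y = log a + x * log b,
# so the slope of log y against x is log10(b).
log_slope = math.log10(annual_factor)


def predicted_count(start_count: float, years_elapsed: float) -> float:
    """Count predicted by exact 18-month doubling from a starting count."""
    return start_count * annual_factor ** years_elapsed


print(f"annual growth factor b = {annual_factor:.4f}")
print(f"slope of log10(y) vs. year = {log_slope:.4f}")
```

Running it prints an annual factor of about 1.5874 and a log-scale slope of about 0.2007; a fitted slope noticeably different from 0.2007 means the data doubled faster or slower than every 18 months.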
Here are the data on the dates and numbers of transistors for Intel microprocessors:

Processor      Date   Number of Transistors
4004           1971   2,250
8008           1972   2,500
8080           1974   5,000
8086           1978   29,000
286            1982   120,000
386            1985   275,000
486 DX         1989   1,180,000
Pentium        1993   3,100,000
Pentium II     1997   7,500,000
Pentium III    1999   24,000,000
Pentium 4      2000   42,000,000

a. Examine these data graphically and sketch your scatterplot. Does the pattern appear to be closer to linear growth or exponential growth?
b. Now calculate the logarithms of the numbers of transistors and make a scatterplot of log(number of transistors) vs. time. Calculate the LSRL and add it to your graph. Note the correlation coefficient and use it to assess the fit.
c. During which years was growth slower than the overall trend? Faster?
d. Solve the equation for $\hat{y}$ to express Moore's Law.
e. How many transistors would our form of Moore's Law predict would be on an Intel processor in 2006?
f. How did we do? The book only gives so much information, so let's see what has happened since. Consider the following graph, taken from Intel's web site (http://www.intel.com/technology/mooreslaw/index.htm) on 9/24/2006:
How good was our prediction for 2006?

Example 4 (IPS): Vehicles in the U.S.
The number of motor vehicles (cars, trucks, and buses) registered in the United States has grown as follows (vehicle counts in millions):

Year   Vehicles (millions)
1940   32.4
1945   31
1950   49.2
1955   62.7
1960   73.9
1965   90.4
1970   108.8
1975   132.9
1980   155.8
1985   171.7
1990   188.8
1995   203.1

a. Plot the number of vehicles against time. Also plot the logarithm of the number of vehicles against time.
b. Using the data from 1950 to 1980, find the equation of the LSRL of the logarithm of the number of vehicles against time. Solve for $\hat{y}$.
c. Compare what your model tells you about 1990 to what really happened. Discuss.

II. Power Regression
We now look at a slight twist on what we've been doing.
Definition: A power function is a function of the form $y = ax^b$.

If b ≠ 0 and b ≠ 1, this function will not be linear. But if we take logs of both sides of the equation, we get:

$$
\begin{aligned}
y &= ax^b && \text{original power function} \\
\log y &= \log(ax^b) && \text{taking logs of both sides} \\
\log y &= \log a + \log x^b && \text{law of logs: log of a product = sum of logs} \\
\log y &= \log a + b \log x && \text{law of logs: log of a power = exponent times log of base}
\end{aligned}
$$

The result is a linear equation expressing log y in terms of log x. So we can find the power function that best fits a set of data by first using linear regression to find an equation that expresses log y in terms of log x. We can then use algebra and the laws of logarithms to find an equation that expresses y in terms of x.

Example 1 (Yates et al.)
Imagine that you have been put in charge of organizing a fishing tournament in which prizes will be given for the heaviest fish caught. You know that many of the fish caught during the tournament will be measured and released. You are also aware that trying to weigh a fish that is flipping around in a boat using delicate scales will probably not yield very reliable results. It would be much easier to measure the length of the fish on the boat. What you need is a way to convert the length of a fish to its weight. You contact the nearby marine research laboratory, and it provides the average length and weight catch data for the Atlantic Ocean rockfish Sebastes mentella. The lab also advises you that the model relationship between body length and weight has been found to be accurate for most fish species growing under normal feeding conditions. Here are the data:

Age (years)   Length (cm)   Weight (g)
1             5.2           2
2             8.5           8
3             11.5          21
4             14.3          38
5             16.8          69
6             19.2          117
7             21.3          148
8             23.3          190
9             25.0          264
10            26.7          293
11            28.2          318
12            29.6          371
13            30.8          455
14            32.0          504
15            33.0          518
16            34.0          537
17            34.9          651
18            36.4          719
19            37.1          726
20            37.7          810

a. Do the data appear to have an exponential relationship?
b. If we let x = the length of the fish in cm and y = the weight of the fish in g, would it make sense if the point (0, 0) were in our data set? (This is often one way people verify that their data can be approximated using a power function.)
c. Make a scatterplot of y vs. x. Comment on the shape of the plot.
d. Make a scatterplot of log y vs. log x. Comment on the shape of the plot.
e. Find the LSRL of log y vs. log x. Remember that your equation will be of the form "$\log \hat{y} = a \log x + b$." Comment on how well this equation fits the data set by examining the correlation coefficient and the residual plot.
f. Suppose your catch measured 36 cm. What would your equation predict its weight to be?
g. Solve the equation from (e) for $\hat{y}$.

Example 2 (from chapter review in Yates et al.): Intensity of Light Bulbs
In a physics lab, the intensity of a 100-watt bulb was measured by a sensing device at various distances from the light source. The following data were collected. Note that I is the symbol used for intensity in physics and a candela (cd) is an international unit of luminous intensity.

Distance (m)   Intensity I (cd)
1.0            0.2965
1.1            0.2522
1.2            0.2055
1.3            0.1746
1.4            0.1534
1.5            0.1352
1.6            0.1145
1.7            0.1024
1.8            0.0923
1.9            0.0832
2.0            0.0734

a. Plot the data before and after various transformations. Based on the pattern of points, propose a model for the data. Then use a transformation followed by a linear regression and then an inverse transformation to construct a model.
b. Report the equation and plot the original data with the model on the same axes.
c. Describe the relationship between the intensity and the distance from the light source.

Homework: #4.6, 4.10, 4.11, 4.13-4.16, 4.25

III. Some Vocabulary to Know
We consider a few important terms that your book mentions in this section and that you will likely encounter during your college career (especially in your study of calculus). We first consider monotonicity.
Definition: Monotonic
A monotonic function f(t) moves in one direction as its argument t increases. There are two subclasses of monotonic functions:
For a monotonic increasing function, if a > b, then ___________________________________.
For a monotonic decreasing function, if a > b, then ___________________________________.

Finally, we consider concavity. Let's illustrate these concepts:
[Figure: a concave-up curve and a concave-down curve]
A function can be concave up in some intervals and concave down in others. The point at which concavity changes is known as an inflection point. My way of remembering these is to think about whether the function "holds water": if it does, it's concave up; if not, it's concave down.

IV. Questions to Ask About this Section (for you to consider while studying)
1. Will a scatterplot of my data set always be enough to tell me whether my function should be exponential or power?
2. Which function (power or exponential) yields a linear relationship between log y and x, and which yields a linear relationship between log y and log x?
3. Could I write down a set of steps for each procedure so that I know what I am doing and can see the differences between the two procedures?

Summary of Transformation Procedures
Data models an exponential function    |    Data models a power function