""Each of us is a statistical impossibility around which hover a million other lives that were never destined to be born." -- Loren Eiseley Chapter 10: Re-expressing Data: Get it Straight! (pages 220 – 243) OVERVIEW: What happens if our scatterplot is not linear? We make it linear. If a scatter plot shows a curved pattern, it can perhaps be conveniently modeled by an exponential growth or decay function of the form or a power function of the form In these situations, we can linearize the data by making use of logarithms. Among the advantages of using logarithms is the fact that use of logarithms produces smaller numbers, making graphical displays more convenient to construct. Definition: log b x y if and only if b y x . [x > 0, b > 0 and b is not equal to 1] Rules for logarithms 1. log(AB) = 2. log(A/B) = 3. log A p Note that y = abx (exponential function) ==> ==> This is a linear relationship between the variables x and logy since loga and logb are constants. Also, y = axb (power function) ==> ==> This is a linear relationship between the variables logx and logy. Let’s take a closer look by linearizing some data using both models. Ex. The table shows the temperature of an instrument measured as its distance from a heat source is varied. Distance (cm) 1 2 3 4 5 6 7 8 Temperature (Fo) 130 105 95 87 83 80 78 77 1. Plot scatterplot 2. Run Linear Regression ŷ Looks like a curve r 3. Residual Plot Definite pattern r2 Let’s try the exponential model (x , logy) 1. Linearize data by taking log of y( L2 )and loading into L3 . 2. Scatterplot ( L1 , L3 ) Still a curve 3. Run Linear Regression*( L1 , L3 ) 4. Residual Plot ( L1 , RESID) ŷ r Still a definite pattern r 2 TIME OUT: Normally, we would not proceed b/c the residual plot says this is not a good model. We will continue with the process for the sake of keeping the notes organized. So far we have transformed exponential pattern to a linear pattern and then we ran regression on the transformed data. 
Now, if there is no pattern in the residual plot, we must perform the inverse transformation on the linear equation to find a curve of best fit on the original data, that is, solve for ŷ.

5. Transform back in order to plot the curve on the original data. We have $\log_{10} \hat{y} = 2.09 - 0.03x$. Because $\log_b x = y$ iff $b^y = x$, we know $b^{\log_b x} = x$, so $\hat{y} = 10^{2.09 - 0.03x}$ is what our transformed equation ends up looking like.
6. TI-83: turn off Y1 / load Y2 with the transformed equation.
7. Stat Plot on original data (L1, L2).

Let's try the power model (log x, log y):
1. Linearize data by taking log of x (L1) and loading into L3, and log of y (L2) and loading into L4.
2. Scatterplot (L3, L4). (Much more linear.)
3. Run Linear Regression* (L3, L4).   ŷ = ______   r = ______   r² = ______
4. Residual Plot (L1, RESID). (Less pattern than the others.)
5. Transform back in order to plot the curve on the original data. Here $a$ and $b$ are the intercept and slope of the regression line on the transformed data. We have
   $\log \hat{y} = a + b \log x$
   $10^{\log \hat{y}} = 10^{a + b \log x}$
   $\hat{y} = (10^a)(10^{\log x})^b$
   $\hat{y} = 10^a (x^b)$
   which is what our transformed equation looks like.
6. TI-83: turn off Y1 / load Y2 with the transformed equation.
7. Stat Plot on original data (L1, L2).

*We have used the common log, which has base 10, in our example, but you may also use the natural log (ln).

Shortcut to Identify Exponential Growth
A variable grows exponentially if it is multiplied by a fixed number greater than 1 in each equal time period, that is, if the ratio $y_n / y_{n-1}$ of successive values is constant.
1. Test to see if you have common ratios among the data.
2. Transform points into a linear pattern (x, log y) and look at the differences between successive entries of L3, which holds log y. Approximately constant ratios in y (equivalently, approximately constant differences in log y) are good evidence that the scatterplot of log y on x is linear.

Don't forget residuals. These are the most useful tool for determining the best model to fit to a data set.

Stick with me. I know it may look confusing, but if you just practice the keystrokes on the calculator with a few examples, you will have it in no time.

Complete a regression analysis for the following age and income data as indicated.

Age (years)        20     25     30     35     40     45     50     55      60
Income ($1,000)    18.5   23.6   29.8   38.5   49.0   64.1   78.5   102.0   130.8

1. Construct and label a scatterplot of the data.
2. Perform a linear regression on the data and plot the regression line on the scatterplot.
3. Discuss the goodness of fit of the linear regression, referencing the correlation coefficient and the residual plot.
4. Perform the exponential and power transformations and show the linear regression on both sets of data.

                   Exponential     Power
   ŷ =             ______          ______
   r =             ______          ______
   r² =            ______          ______
   Residual Plot

5. Plot each one of the models on the original scatterplot (Exponential, Power).
6. Which of the three models fits the data best, and which would you therefore use to make predictions?
7. What income would you predict for a 47-year-old person based on this data?

Type of Model    Transform and Run Regression On    Equation of Curve to Plot on Original Data
Linear           ______                             ______
Exponential      ______                             ______
Power            ______                             ______
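As a way to check calculator work, the three-model workflow from these notes can be sketched in Python: regress on the appropriate (possibly logged) lists, then transform back so each model predicts on the original scale. The function names `least_squares`, `fit_models`, and `sum_squared_residuals` are hypothetical helpers written for this sketch. Comparing sums of squared residuals is an added convenience that the notes do not use; the notes rely on r, r², and the residual plot. The sketch runs on the distance/temperature example rather than the exercise data.

```python
import math

def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_models(xs, ys):
    """Fit all three models; return {name: predictor on the original scale}."""
    log_x = [math.log10(x) for x in xs]
    log_y = [math.log10(y) for y in ys]
    a_lin, b_lin = least_squares(xs, ys)          # regress y on x
    a_exp, b_exp = least_squares(xs, log_y)       # regress log y on x
    a_pow, b_pow = least_squares(log_x, log_y)    # regress log y on log x
    return {
        "linear": lambda x: a_lin + b_lin * x,
        "exponential": lambda x: 10 ** (a_exp + b_exp * x),   # y-hat = 10^(a + bx)
        "power": lambda x: (10 ** a_pow) * x ** b_pow,        # y-hat = 10^a * x^b
    }

def sum_squared_residuals(predict, xs, ys):
    """Sum of squared residuals of a predictor, measured on the original data."""
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))

distance = [1, 2, 3, 4, 5, 6, 7, 8]
temperature = [130, 105, 95, 87, 83, 80, 78, 77]

models = fit_models(distance, temperature)
sse = {name: sum_squared_residuals(f, distance, temperature) for name, f in models.items()}
for name in models:
    print(name, round(sse[name], 2))
```

Calling `fit_models` on the age and income lists works the same way, and the chosen predictor can then be evaluated at a specific age. The residual plot should still have the final say on which model to trust.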