Chapter 10: Re-expressing Data (Get it Straight)
Jami Copeland, Shriya Varma
Semester Project

Straightening Relationships
• To compare two variables with a linear regression model, the relationship between them must be linear.
• To re-express data, you can use square roots, reciprocals, or logarithms.

Goals of Re-expression
• Make the distribution of a variable more symmetric.
• Make the spread of several groups (as seen in side-by-side boxplots) more alike.
• Make the form of a scatterplot more nearly linear.
• Make the scatter in a scatterplot spread out evenly rather than following a fan shape.

The Ladder of Powers
• The Ladder of Powers helps you decide how to re-express data.
• It offers a range of options and indicates when each option should be used.
• It also orders the re-expressions by strength: the farther a power is from 1 (the raw data), the stronger its effect on the data.

Power   Name                          Comment
2       Square of data values         Try with unimodal distributions that are skewed to the left.
1       Raw data                      Data with positive and negative values and no bounds are less likely to benefit from re-expression.
1/2     Square root of data values    Counts often benefit from a square root re-expression.
“0”     Logarithm (used for power 0)  Measurements that cannot be negative often benefit from a log re-expression.
-1/2    Reciprocal square root        An uncommon re-expression, but sometimes useful.
-1      Reciprocal of the data        Ratios of two quantities (e.g., mph) often benefit from a reciprocal.

Plan B: Attack of the Logarithms
• If the Ladder of Powers doesn't straighten the relationship, you can try logarithms.
• When none of the data values is zero or negative, logarithms can be applied to x, to y, or to both:

Model Name    x-axis    y-axis    Comment
Exponential   x         log(y)    This model is the “0” power in the ladder approach, useful for values that grow by percentage increases.
Logarithmic   log(x)    y         A wide range of x-values, or a scatterplot descending rapidly at the left but leveling off toward the right, may benefit from trying this model.
Power         log(x)    log(y)    The Goldilocks model: when one of the ladder's powers is too big and the next is too small, this one may be just right.

Problem 13: Planet Distances and Order
• Let's look again at the pattern in the locations of the planets in our solar system, shown in the table.
– Use re-expressed data to create a model for the distance from the sun based on the planet's position.
– There is some debate among astronomers as to whether Pluto is truly a planet or actually a large member of the Kuiper Belt of comets and other icy bodies. Does your model suggest that Pluto may not belong in the planet group? Explain.

Planet    Position Number    Distance from Sun (millions of miles)
Mercury   1                  36
Venus     2                  67
Earth     3                  93
Mars      4                  142
Jupiter   5                  484
Saturn    6                  887
Uranus    7                  1784
Neptune   8                  2796
Pluto     9                  3666

Problem 13a
• Re-express the distance from the sun using base-10 logarithms on your calculator to find the new values, log(y).

Planet    Position Number    Distance from Sun (millions of miles)    log(y)
Mercury   1                  36                                        1.5563
Venus     2                  67                                        1.8261
Earth     3                  93                                        1.9685
Mars      4                  142                                       2.1523
Jupiter   5                  484                                       2.6848
Saturn    6                  887                                       2.9479
Uranus    7                  1784                                      3.2514
Neptune   8                  2796                                      3.4465
Pluto     9                  3666                                      3.5642

Problem 13b
• This problem asks whether Pluto should be counted as a planet.
• The linear model for log(distance), fit to the eight planets from Mercury through Neptune, predicts that Pluto (position 9) should be about 5741 million miles from the sun, but its actual distance is only 3666 million miles.
• Pluto therefore doesn't fit the pattern set by the other planets, which supports the claim that Pluto doesn't behave like a planet.

Problem 15: Quaoar
• Caltech astronomers discovered a large body orbiting the Sun beyond Pluto, named Quaoar. Quaoar orbits at a distance of about 4 billion miles. It is classified as a member of the Kuiper Belt rather than as a planet. There are many reasons for suspecting that Pluto is unlike the other planets.
• Omit Pluto from your count of planets, and consider Quaoar as a candidate for the new ninth planet.
– Based on its position, how does Quaoar's distance from the sun compare with the prediction made by your model?
– Refit the model using Quaoar's distance and position instead of Pluto's. Now how well does your model fit the re-expressed distances and positions?

Problem 15a
• Use the re-expressed linear regression model to predict the log-distance for position 9. The model from Problem 13b predicts about 3.76 (the 5741 million miles found there). Pluto's log-distance is 3.564, and Quaoar's is 3.602 (the log of 4000 million miles). Both lie below the prediction, but Quaoar is closer to it, so Quaoar fits the model better than Pluto does.

Problem 15b
• To refit the model, replace Pluto's data in the original table with Quaoar's (position 9, log-distance 3.602). R^2 rises slightly, from about 98.2% with Pluto to about 98.4% with Quaoar, so the model fits a little better.
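The calculations in Problems 13 and 15 can be reproduced with a short script. This is a minimal sketch, not the presenters' method (they worked on a calculator). It assumes base-10 logarithms, an ordinary least-squares line, and Quaoar's distance taken as 4000 million miles from "about 4 billion miles." The helper names `reexpress`, `fit_line`, and `fit_bodies` are hypothetical. `reexpress` applies one rung of the Ladder of Powers, with power 0 standing in for the logarithm, as in the ladder table.

```python
import math

# Positions and distances from the sun (millions of miles), from the Problem 13 table.
PLANETS = [
    ("Mercury", 1, 36), ("Venus", 2, 67), ("Earth", 3, 93),
    ("Mars", 4, 142), ("Jupiter", 5, 484), ("Saturn", 6, 887),
    ("Uranus", 7, 1784), ("Neptune", 8, 2796),
]
PLUTO = ("Pluto", 9, 3666)
QUAOAR = ("Quaoar", 9, 4000)  # assumption: "about 4 billion miles" read as 4000 million


def reexpress(values, power):
    """Apply one rung of the Ladder of Powers; power 0 means log base 10.

    Logarithms (and negative powers) need values that are all positive,
    so math.log10 raises ValueError on zero or negative data.
    """
    if power == 0:
        return [math.log10(v) for v in values]
    return [v ** power for v in values]


def fit_line(xs, ys):
    """Ordinary least-squares fit y = b0 + b1*x; returns (b0, b1, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1, sxy ** 2 / (sxx * syy)


def fit_bodies(bodies):
    """Fit log10(distance) against position for (name, position, distance) tuples."""
    xs = [pos for _, pos, _ in bodies]
    ys = reexpress([dist for _, _, dist in bodies], 0)
    return fit_line(xs, ys)


# Problem 13b: fit the eight planets, then predict position 9.
b0, b1, _ = fit_bodies(PLANETS)
predicted_log = b0 + b1 * 9
predicted_miles = 10 ** predicted_log  # back-transform to millions of miles

# Problem 15a: residuals (actual minus predicted) in log units.
pluto_resid = math.log10(PLUTO[2]) - predicted_log
quaoar_resid = math.log10(QUAOAR[2]) - predicted_log

# Problem 15b: compare R^2 with Pluto vs. Quaoar in the ninth position.
_, _, r2_pluto = fit_bodies(PLANETS + [PLUTO])
_, _, r2_quaoar = fit_bodies(PLANETS + [QUAOAR])

print(f"log(distance) = {b0:.3f} + {b1:.3f} * position")
print(f"predicted log distance at position 9: {predicted_log:.3f} "
      f"(about {predicted_miles:.0f} million miles)")
print(f"residuals: Pluto {pluto_resid:.3f}, Quaoar {quaoar_resid:.3f}")
print(f"R^2 with Pluto: {r2_pluto:.3f}, with Quaoar: {r2_quaoar:.3f}")
```

Because coefficients are rounded differently by hand than in code, the last decimal place of the predicted values may differ slightly from calculator results; for example, using the rounded coefficients 1.203 and 0.284 gives 3.759, which back-transforms to about 5741 million miles.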