Chapter 10
• In this chapter, we have learned how to re-express data and why re-expression is needed. The goals of re-expression are:
• To make the distribution of a variable more symmetric.
• To make the spread of several groups more alike.
• To make the form of a scatterplot more nearly linear.
• To make the scatter in a scatterplot spread out evenly rather than following a fan shape.
• The ladder of powers gives an easier way to decide which power to try.
• Powers on the ladder: 2, 1, 1/2, 0 (logarithm), -1/2, -1
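As a rough illustration of moving along the ladder, the Python sketch below applies each power to a small data set and reports a simple skewness measure, so the rung that gives the most symmetric distribution can be spotted. The function names `reexpress` and `skewness` and the sample values are assumptions made for this example, not part of the chapter. Negating the negative powers so that the ordering of the values is preserved is a convention assumed here.

```python
import math

LADDER = [2, 1, 0.5, 0, -0.5, -1]  # power 0 stands for the logarithm

def reexpress(values, power):
    """Re-express strictly positive values using one rung of the ladder."""
    if any(v <= 0 for v in values):
        raise ValueError("this sketch assumes strictly positive data")
    if power == 0:
        return [math.log10(v) for v in values]
    if power < 0:
        # Negate so larger original values still map to larger results.
        return [-(v ** power) for v in values]
    return [v ** power for v in values]

def skewness(values):
    """Moment-based skewness; values near 0 suggest a symmetric shape."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed data, invented for illustration only.
data = [1, 2, 2, 3, 4, 6, 9, 15, 40]

for power in LADDER:
    print(power, round(skewness(reexpress(data, power)), 3))
```

A skewness number is only a screening aid in this sketch; a histogram of the re-expressed values is still the better check.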
• When the ladder of powers does not work because the curvature is more stubborn, and we are not satisfied with the residual plots, logarithms can be helpful (only if none of the data values is zero or negative).
• Try taking the log of both variables first, then try the other combinations: log of x only, or log of y only.
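The log combinations described above could be tried systematically, as in the following minimal Python sketch. It builds the three re-expressions, fits a least-squares line to each, and prints the residuals so their pattern can be inspected. The helper names (`fit_line`, `residuals`, `log_combinations`) and the data are hypothetical, chosen only for illustration.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def residuals(xs, ys):
    """Observed minus predicted values for the least-squares line."""
    slope, intercept = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def log_combinations(xs, ys):
    """Return the three log re-expressions as (new_x, new_y) pairs."""
    if min(xs) <= 0 or min(ys) <= 0:
        raise ValueError("logarithms need strictly positive data")
    log_x = [math.log10(x) for x in xs]
    log_y = [math.log10(y) for y in ys]
    return {
        "log x, log y": (log_x, log_y),
        "log x, y": (log_x, list(ys)),
        "x, log y": (list(xs), log_y),
    }

# Hypothetical data following y = 3 * x^2, invented for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [3 * x ** 2 for x in xs]

for name, (new_x, new_y) in log_combinations(xs, ys).items():
    # Residuals are on different scales for each combination,
    # so compare their patterns rather than their sizes.
    print(name, [round(r, 3) for r in residuals(new_x, new_y)])
```

For data that follow a power relationship, as the invented values here do, the log-log combination should leave residuals that are essentially zero, while the other two leave a curved pattern.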
• When a clearly curved pattern shows up, we can use “curves of best fit” instead of trying to find a way to straighten the plot.
• Calculations are more difficult, though.
• Do not expect your model to be perfect.
“All models are wrong, but some are useful”
• Do not choose a model based on R^2 alone.
Plot the residuals to judge whether the model is appropriate.
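• Watch out for scatterplots that turn around.
Re-expression cannot straighten a relationship that goes up and then down (or down and then up), so do not try to re-express such data.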
• Watch out for negative values.
It is impossible to re-express negative data values by any power that is not a whole number on the ladder of powers.
If possible, add a small constant to every value so that all values are positive before re-expressing.
• Do not stray too far from the ladder.
Taking the data to an extreme power may artificially inflate R^2 without giving a useful or meaningful model.
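The earlier caution against choosing a model by R^2 can be illustrated with a short Python sketch. It fits a straight line to clearly curved data, then prints both R^2 and the residuals. The data (y = x^2) and the helper name `fit_line` are assumptions made for this example only.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical curved data, invented for illustration only.
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)
res = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

mean_y = sum(ys) / len(ys)
total_ss = sum((y - mean_y) ** 2 for y in ys)
r2 = 1 - sum(r ** 2 for r in res) / total_ss

print("R^2:", round(r2, 3))
print("residuals:", [round(r, 1) for r in res])
```

In this sketch R^2 comes out at about 0.95, yet the residuals are positive at both ends and negative in the middle. That bend is the sign that a straight line is the wrong model, however high R^2 looks.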
• Re-expressing data: Applying a function such as the logarithm, square root, or reciprocal to the data values to make distributions more symmetric, spreads more alike, and relationships more nearly linear.
• Ladder of Powers: An ordered collection of re-expressions that suggests which power to try first.
The table below shows stopping distance in feet for a car tested 3 times at each of 5 speeds. We hope to create a model that predicts stopping distance from the speed of the car.
Speed (mph)    Stopping distance (ft)
20             64, 62, 59
30             114, 118, 105
40             153, 171, 165
50             231, 203, 238
60             317, 321, 276
a) Explain why a linear model is not appropriate.
b) Re-express the data to straighten the scatterplot.
c) Create an appropriate model.
d) Estimate the stopping distance for a car traveling 55 mph.
e) Estimate the stopping distance for a car traveling 70 mph.
f) How much confidence do you place in these predictions? Why?
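One possible way to work through parts b) to e) is sketched below in Python. It assumes the square root of stopping distance as the re-expression, which is only one candidate from the ladder of powers and should be checked against a residual plot. The helper names `fit_line` and `predict_distance` are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

speeds_mph = [20, 30, 40, 50, 60]
distances_ft = [
    [64, 62, 59],
    [114, 118, 105],
    [153, 171, 165],
    [231, 203, 238],
    [317, 321, 276],
]

# One (speed, distance) pair per trial.
xs = [s for s, trials in zip(speeds_mph, distances_ft) for _ in trials]
ys = [d for trials in distances_ft for d in trials]

# Assumed re-expression: square root of stopping distance.
sqrt_ys = [math.sqrt(d) for d in ys]
slope, intercept = fit_line(xs, sqrt_ys)

def predict_distance(speed_mph):
    """Predict sqrt(distance) from speed, then square to get feet."""
    return (intercept + slope * speed_mph) ** 2

print("sqrt(distance) =", round(intercept, 3), "+", round(slope, 3), "* speed")
print("55 mph:", round(predict_distance(55)), "ft")
print("70 mph:", round(predict_distance(70)), "ft")
```

The 55 mph estimate lies within the range of tested speeds. The 70 mph estimate is an extrapolation beyond the data, which bears on part f).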
Ball players have been signing ever larger contracts. The highest salaries (in millions of dollars per season) for some notable players are given in the following table.
Player    Year    Salary ($ millions)
NR        1980    1
GF        1982    2.04
KP        1990    3
JC        1990    4.7
RC        1991    5.3
KG        1996    8.5
AB        1997    11
PM        1998    12.5
MP        1999    12.5
MV        1999    13.3
KB        1999    15
CD        2001    17
AR        2001    25.2
a) Re-express the data to straighten the scatterplot.
b) Create an appropriate model for the trend in salary.
c) Predict a superstar salary for 2005.
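A possible approach to this exercise is sketched below in Python. It assumes the logarithm of salary as the re-expression (an exponential growth model), which is only one candidate and should be checked against a residual plot. Measuring time as years since 1980 is a convenience chosen here, and the helper names `fit_line` and `predict_salary` are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

years = [1980, 1982, 1990, 1990, 1991, 1996, 1997,
         1998, 1999, 1999, 1999, 2001, 2001]
salaries = [1, 2.04, 3, 4.7, 5.3, 8.5, 11,
            12.5, 12.5, 13.3, 15, 17, 25.2]  # millions of dollars

# Assumed re-expression: log10 of salary against years since 1980.
since_1980 = [y - 1980 for y in years]
log_salaries = [math.log10(s) for s in salaries]
slope, intercept = fit_line(since_1980, log_salaries)

def predict_salary(year):
    """Predict log10(salary), then undo the log to get millions of dollars."""
    return 10 ** (intercept + slope * (year - 1980))

print("log10(salary) =", round(intercept, 4), "+",
      round(slope, 4), "* (year - 1980)")
print("2005 prediction:", round(predict_salary(2005), 1), "million dollars")
```

The 2005 prediction is an extrapolation past the last year in the table, so it deserves the same caution as any prediction outside the range of the data.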