Chapter 10: Re-expressing Data (Get it Straight)

Jami Copeland
Shriya Varma
Semester Project
Straightening Relationships
• To describe two variables with a linear regression model, the relationship between them must be linear
• If it is not, you can re-express the data using square roots, reciprocals, or logarithms
Goals of Re-expression
• Make the distribution of a variable more symmetric
• Make the spread of several groups (as seen in side-by-side boxplots) more alike
• Make the form of a scatterplot more nearly linear
• Make the scatter in a scatterplot spread out evenly
rather than following a fan shape.
The Ladder Of Powers
• The Ladder of Powers helps to decide how to
re-express data
• It offers a range of options and when each
option should be used
• It also orders the effects of the re-expressions, from weakest to strongest
| Power | Name | Comment |
|-------|------|---------|
| 2 | Square of data values | Try with unimodal distributions that are skewed to the left. |
| 1 | Raw data | Data with positive and negative values and no bounds are less likely to benefit from re-expression. |
| 1/2 | Square root of data values | Counts often benefit from a square root re-expression. |
| “0” | We’ll use logarithms here | Measurements that cannot be negative often benefit from a log re-expression. |
| -1/2 | Reciprocal square root | An uncommon re-expression, but sometimes useful. |
| -1 | The reciprocal of the data | Ratios of two quantities (e.g., mph) often benefit from a reciprocal. |
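As a rough sketch of how the ladder can be applied, the Python snippet below re-expresses a list of values for a chosen power, treating the “0” rung as a base-10 logarithm. The function name `re_express` and the sample values are illustrative assumptions, not part of the chapter.

```python
import math

def re_express(values, power):
    """Re-express data using one rung of the Ladder of Powers.

    power == 0 stands for the logarithm rung; any other power
    (2, 1, 1/2, -1/2, -1) is applied directly. Hypothetical helper.
    """
    if power == 0:
        return [math.log10(v) for v in values]
    return [v ** power for v in values]

sample = [1, 4, 100]  # made-up positive values, for illustration only
for p in (2, 1, 0.5, 0, -0.5, -1):
    print(p, re_express(sample, p))
```

The logarithm and the negative powers only make sense for positive values, which is why the sample avoids zeros and negatives.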
Plan B: Attack of the Logarithms
• If a stronger re-expression is needed, you can use logarithms
• When none of the data values are zero or negative, logarithms can be applied to x, to y, or to both
| Model Name | X-axis | Y-axis | Comment |
|------------|--------|--------|---------|
| Exponential | x | log(y) | This model is the “0” power in the ladder approach, useful for values that grow by percentage increases. |
| Logarithmic | log(x) | y | A wide range of x-values, or a scatterplot descending rapidly at the left but leveling off toward the right, may benefit from trying this model. |
| Power | log(x) | log(y) | The Goldilocks model: When one of the ladder’s powers is too big and the next is too small, this one may be just right. |
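To make the three models concrete, here is a minimal Python sketch that applies each re-expression and reports how straight the result is, using the squared correlation as a rough gauge. The helper names `transform` and `r_squared` and the data are assumptions made up for illustration; the data grow by 50% per step, so the exponential model should straighten them best.

```python
import math

def transform(model, xs, ys):
    """Return the (x, y) columns to plot for a given model name."""
    if model == "exponential":   # x vs log(y)
        return xs, [math.log10(y) for y in ys]
    if model == "logarithmic":   # log(x) vs y
        return [math.log10(x) for x in xs], ys
    if model == "power":         # log(x) vs log(y)
        return [math.log10(x) for x in xs], [math.log10(y) for y in ys]
    raise ValueError(f"unknown model: {model}")

def r_squared(xs, ys):
    """Squared correlation between two equal-length columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# Made-up data growing by 50% per step (hypothetical, not from the chapter).
xs = list(range(1, 9))
ys = [5 * 1.5 ** x for x in xs]

for model in ("exponential", "logarithmic", "power"):
    tx, ty = transform(model, xs, ys)
    print(model, round(r_squared(tx, ty), 4))
```

A high squared correlation alone does not prove linearity, so a residual plot is still worth checking after picking a model.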
Problem 13: Planet distances and order
• Let’s look again at the pattern in the locations of the
planets in our solar system seen in the given table.
– Use re-expressed data to create a model for the distance from
the sun based on the planet’s position
– There is some debate among astronomers as to whether Pluto
is truly a planet or actually a large member of the Kuiper Belt of
comets and other icy bodies. Does your model suggest that
Pluto may not belong in the planet group? Explain.
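| Planet | Position Number | Distance from Sun (millions of miles) |
|--------|-----------------|---------------------------------------|
| Mercury | 1 | 36 |
| Venus | 2 | 67 |
| Earth | 3 | 93 |
| Mars | 4 | 142 |
| Jupiter | 5 | 484 |
| Saturn | 6 | 887 |
| Uranus | 7 | 1784 |
| Neptune | 8 | 2796 |
| Pluto | 9 | 3666 |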
Problem 13a
• Re-express the “Distance from the sun” data by taking the base-10 logarithm of each distance with your calculator. The re-expressed values are log(y).
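| Planet | Position Number | Distance from Sun (millions of miles) | log(y) |
|--------|-----------------|---------------------------------------|--------|
| Mercury | 1 | 36 | 1.5563 |
| Venus | 2 | 67 | 1.8261 |
| Earth | 3 | 93 | 1.9685 |
| Mars | 4 | 142 | 2.1523 |
| Jupiter | 5 | 484 | 2.6848 |
| Saturn | 6 | 887 | 2.9479 |
| Uranus | 7 | 1784 | 3.2514 |
| Neptune | 8 | 2796 | 3.4465 |
| Pluto | 9 | 3666 | 3.5642 |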
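The calculator step can also be sketched in Python. The snippet below takes the base-10 logarithm of each distance from the Problem 13 table and rounds to four decimals; the variable names are illustrative assumptions.

```python
import math

# Distances from the Sun in millions of miles, from the Problem 13 table.
distances = {
    "Mercury": 36, "Venus": 67, "Earth": 93, "Mars": 142, "Jupiter": 484,
    "Saturn": 887, "Uranus": 1784, "Neptune": 2796, "Pluto": 3666,
}

# Re-express each distance as log10(distance), rounded to four decimals.
log_distances = {planet: round(math.log10(d), 4) for planet, d in distances.items()}

for planet, value in log_distances.items():
    print(f"{planet:8s} {value:.4f}")
```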
Problem 13b
• This part asks whether Pluto should be counted as a planet.
• The linear model of log(distance) on position predicts that Pluto will be 5741 million miles away, while the data show it is only 3666 million miles away.
• Pluto falls well short of the prediction, so it does not fit the pattern very well. This supports the claim that Pluto doesn’t behave like a planet.
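The slides do not show the fitted equation. One reading that comes out at about 5741 million miles is a least-squares line for log(distance) against position, fitted to the first eight planets only, with coefficients rounded to three decimals. That reading is an assumption, and the helper name `fit_line` is hypothetical.

```python
import math

positions = [1, 2, 3, 4, 5, 6, 7, 8]                 # Mercury through Neptune
distances = [36, 67, 93, 142, 484, 887, 1784, 2796]  # millions of miles
log_d = [math.log10(d) for d in distances]

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

intercept, slope = fit_line(positions, log_d)

# Assumption: coefficients are rounded to three decimals before predicting.
b0, b1 = round(intercept, 3), round(slope, 3)

predicted_log = b0 + b1 * 9             # position 9 is Pluto's slot
predicted_miles = 10 ** predicted_log   # back-transform to millions of miles
residual = 3666 - predicted_miles       # actual minus predicted

print(f"log(distance) = {b0} + {b1} * position")
print(f"predicted: {predicted_miles:.0f} million miles, residual: {residual:.0f}")
```

Predicting with unrounded coefficients gives a somewhat smaller distance, so the exact figure is sensitive to rounding. Either way the residual is large and negative.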
Problem 15: Quaoar Planet
• Caltech astronomers discovered Quaoar, a new large body orbiting the Sun a billion miles beyond Jupiter. Quaoar orbits at a distance of about 4 billion miles. It is classified as a member of the Kuiper Belt rather than as a planet. There are many reasons for suspecting that Pluto is unlike other planets.
• Omit Pluto from your count of planets, and consider
Quaoar as a candidate for the new planet.
– Based on its position, how does Quaoar’s distance from
the sun compare with the prediction made by your
model?
– Refit the model using Quaoar’s distance and position in
the model instead of Pluto’s. Now how well does your
model predict the re-expressed distance and position?
Problem 15a
• To solve this, find the predicted value from the linear regression model fitted to the re-expressed data. The predicted log(distance) is 3.635. Pluto’s log(distance) is 3.564 and Quaoar’s is 3.602. Quaoar lies closer to the prediction and is therefore a better fit to the model.
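The comparison comes down to two residuals on the log scale. A minimal sketch, taking the predicted value 3.635 as quoted above and assuming distances in millions of miles (variable names are illustrative):

```python
import math

predicted_log = 3.635           # predicted log(distance), as quoted on the slide
pluto_log = math.log10(3666)    # Pluto: 3666 million miles
quaoar_log = math.log10(4000)   # Quaoar: about 4 billion miles

# Residual = observed minus predicted, on the log scale.
pluto_residual = pluto_log - predicted_log
quaoar_residual = quaoar_log - predicted_log

print(f"Pluto:  log = {pluto_log:.3f}, residual = {pluto_residual:+.3f}")
print(f"Quaoar: log = {quaoar_log:.3f}, residual = {quaoar_residual:+.3f}")
```

The body with the smaller absolute residual sits closer to the line.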
Problem 15b
• To refit the model, replace Pluto’s data in the original table with Quaoar’s data. This raises the R^2 value to 99.5%, which indicates a better fit.