Chp 10

Statistics Review

Chapter 10

Important Ideas

• In this chapter, we have learned how to re-express data and why re-expression is needed.

Goals

• To make the distribution of a variable more symmetric.

• To make the spread of several groups more alike.

• To make the form of a scatterplot more nearly linear.

• To make the scatter in a scatterplot spread out evenly rather than following a fan shape.
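As a rough illustration of the second goal, the sketch below compares group spreads before and after a square-root re-expression, using the stopping distances from Question 9 in these notes. The square root is only one rung that could be tried, and the helper name `spread_ratio` is made up for this example.

```python
import math
from statistics import stdev

# Stopping distances (ft) from Question 9, grouped by speed (mph).
groups = {
    20: [64, 62, 59],
    30: [114, 118, 105],
    40: [153, 171, 165],
    50: [231, 203, 238],
    60: [317, 321, 276],
}

def spread_ratio(transform):
    """Largest group standard deviation divided by the smallest."""
    sds = [stdev([transform(d) for d in dists]) for dists in groups.values()]
    return max(sds) / min(sds)

ratio_raw = spread_ratio(lambda d: d)
ratio_sqrt = spread_ratio(math.sqrt)

# A ratio closer to 1 means the group spreads are more alike.
print(f"raw data: {ratio_raw:.1f}, square roots: {ratio_sqrt:.1f}")
```

The ratio drops after re-expression, so the spreads are more alike, although they are still not equal.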

The ladder of powers

• An easy way to decide which power to try.

• 2, 1, 1/2, 0 (log), -1/2, -1
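A minimal sketch of how the rungs could be applied to a list of values. The function name `reexpress` and the choice of base-10 logs for the 0 rung are assumptions of this example, not something the chapter prescribes.

```python
import math

# The ladder of powers, from the top rung to the bottom.
LADDER = [2, 1, 0.5, 0, -0.5, -1]

def reexpress(values, power):
    """Re-express each value using one rung of the ladder.

    Power 0 stands for the logarithm (base 10 here, by assumption).
    Rungs other than 1 and 2 require strictly positive data.
    """
    if power not in (1, 2) and min(values) <= 0:
        raise ValueError("this rung needs strictly positive values")
    if power == 0:
        return [math.log10(v) for v in values]
    return [v ** power for v in values]

# Walk down the ladder for a small made-up sample.
sample = [1, 4, 9, 100]
for power in LADDER:
    print(power, [round(v, 3) for v in reexpress(sample, power)])
```

Moving down the ladder pulls in large values more and more strongly, which is why lower rungs help with data that are skewed to the right.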

Logarithms

• When the ladder of powers does not work because the curvature is too stubborn and the residuals plots are still unsatisfactory, logarithms can help (only if none of the data values is zero or negative).

• Take logs of both variables first, then try other combinations: log of y against x, or y against log of x.
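To see why logs on both sides can straighten a stubborn curve, the sketch below uses made-up data that follow y = 3x^2 exactly. On the log-log scale the relationship is log(y) = log(3) + 2 log(x), a straight line. The helper `fit_line` is written out here for illustration only.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

# Made-up data following y = 3 * x**2 (illustration only, not from the text).
xs = [1, 2, 4, 8, 16]
ys = [3 * x ** 2 for x in xs]

# Take logs of both variables, then fit a line to the re-expressed data.
log_x = [math.log10(x) for x in xs]
log_y = [math.log10(y) for y in ys]
intercept, slope = fit_line(log_x, log_y)

# The slope recovers the power and 10**intercept recovers the multiplier.
print(round(slope, 3), round(10 ** intercept, 3))
```

With real data the fit will not be exact, so the residuals of the re-expressed model still need to be plotted and checked.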

Use a curve instead of a linear model

• When a clearly curved pattern shows up, we can use “curves of best fit” instead of trying to find a way to straighten the plot.

• Calculations are more difficult, though.
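One example of a curve of best fit is a least-squares quadratic. The sketch below solves the three normal equations by elimination, which gives a sense of the extra calculation involved. The function name `fit_quadratic` and the data are made up for illustration, and a quadratic is only one of many curves that could be used.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2; returns (a, b, c)."""
    # Power sums needed for the normal equations.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented matrix [A | t] with A[i][j] = sum of x**(i + j).
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))

# Made-up data following y = 1 + 2x + 0.5x^2 (illustration only).
xs = [0, 1, 2, 3, 4]
ys = [1 + 2 * x + 0.5 * x ** 2 for x in xs]
a, b, c = fit_quadratic(xs, ys)
print(round(a, 3), round(b, 3), round(c, 3))
```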

What can go wrong?

• Do not expect your model to be perfect.

“All models are wrong, but some are useful”

• Do not choose a model based on R^2 alone.

Plot residuals.

• Watch out for scatterplots that turn around.

Re-expression cannot straighten a plot that goes up and then down (or down and then up), so do not try.

• Watch out for negative values.

It is impossible to re-express negative data values by any power that is not a whole number on the ladder of powers.

Add a small constant to make all the values positive, if possible.

• Do not stray too far from the ladder.

Extreme powers can artificially inflate R^2 without giving a useful model.
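The "add a small constant" cure for negative values can be sketched as follows. The function name `shift_then_sqrt` and its `margin` parameter are hypothetical, and the data are made up. Note that the shift becomes part of the model, so it has to be undone when interpreting predictions.

```python
import math

def shift_then_sqrt(values, margin=1.0):
    """Add a constant so the smallest value equals `margin`, then take roots.

    Returns the shift that was used together with the re-expressed values.
    No shift is applied if every value is already at least `margin`.
    """
    shift = max(0.0, margin - min(values))
    return shift, [math.sqrt(v + shift) for v in values]

# Made-up data containing a negative value and a zero.
shift, roots = shift_then_sqrt([-3, 0, 5, 12])
print(shift, roots)
```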

Terms

• Re-express data: By taking the logarithm, a root, the reciprocal, or another function of the data values, we make distributions more symmetric, group spreads more alike, and scatterplots straighter.

• Ladder of Powers: An ordered collection of re-expressions that suggests which power to try first.

Question #9

The table below shows stopping distance in feet for a car tested 3 times at each of 5 speeds. We hope to create a model that predicts stopping distance from the speed of the car.

SPEED (mph)    STOPPING DISTANCE (ft)
20             64, 62, 59
30             114, 118, 105
40             153, 171, 165
50             231, 203, 238
60             317, 321, 276

a) Explain why a linear model is not appropriate.

b) Re-express the data to straighten the scatterplot.

c) Create an appropriate model.

d) Estimate the stopping distance for a car traveling 55 mph.

e) Estimate the stopping distance for a car traveling 70 mph.

f) How much confidence do you place in these predictions? Why?
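One possible approach to parts (b) through (e), sketched under the assumption that the square root of stopping distance is the re-expression tried. The question does not say which rung to use, so this is an illustration rather than the official solution. The helpers `fit_line` and `predict_distance` are written for this example.

```python
import math

speeds = [20, 30, 40, 50, 60]          # mph
distances = [                          # ft, three runs per speed
    [64, 62, 59],
    [114, 118, 105],
    [153, 171, 165],
    [231, 203, 238],
    [317, 321, 276],
]

# One (speed, sqrt(distance)) pair per test run.
xs = [s for s, runs in zip(speeds, distances) for _ in runs]
ys = [math.sqrt(d) for runs in distances for d in runs]

def fit_line(xs, ys):
    """Ordinary least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

intercept, slope = fit_line(xs, ys)

def predict_distance(speed):
    """Predict on the square-root scale, then square to get feet."""
    return (intercept + slope * speed) ** 2

# Residuals on the re-expressed scale; these should be plotted and checked.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

print(f"sqrt(distance) = {intercept:.3f} + {slope:.4f} * speed")
print(f"55 mph: {predict_distance(55):.0f} ft")
print(f"70 mph: {predict_distance(70):.0f} ft")
```

The 55 mph estimate lies inside the range of tested speeds, while the 70 mph estimate is an extrapolation beyond them, which bears on part (f).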

Question #11

Ball players have been signing ever larger contracts. The highest salaries (in millions of dollars per season) for some notable players are given in the following table.

Player    Year    Salary (millions of dollars)
NR        1980    1
GF        1982    2.04
KP        1990    3
JC        1990    4.7
KB        1991    5.3
CD        1996    8.5
AR        1997    11
RC        1998    12.5
KG        1999    12.5
AB        1999    13.3
PM        1999    15
MP        2001    17
MV        2001    25.2

a) Re-express the data to straighten the scatterplot.

b) Create an appropriate model for the trend in salary.

c) Predict a superstar salary for 2005.
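One possible approach, sketched under the assumption that the logarithm of salary is the re-expression tried. The question leaves the choice open, so this is an illustration rather than the official solution. Years are counted from 1980 to keep the numbers small, and the helpers `fit_line` and `predict_salary` are written for this example.

```python
import math

years = [1980, 1982, 1990, 1990, 1991, 1996, 1997,
         1998, 1999, 1999, 1999, 2001, 2001]
salaries = [1, 2.04, 3, 4.7, 5.3, 8.5, 11,
            12.5, 12.5, 13.3, 15, 17, 25.2]   # millions of dollars

xs = [y - 1980 for y in years]                # years since 1980
ys = [math.log10(s) for s in salaries]        # log of salary

def fit_line(xs, ys):
    """Ordinary least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

intercept, slope = fit_line(xs, ys)

def predict_salary(year):
    """Predict log10(salary), then undo the log to get millions of dollars."""
    return 10 ** (intercept + slope * (year - 1980))

print(f"log10(salary) = {intercept:.3f} + {slope:.4f} * (year - 1980)")
print(f"2005: about {predict_salary(2005):.0f} million dollars")
```

The 2005 prediction is an extrapolation past the last year in the table, so it deserves some caution, and the residuals of the log model should still be plotted.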
