Sec. 4.1 Part 1 Blank Notes

Section 4.1 – Transforming Relationships

Linear regression using the LSRL is not the only model for describing data.


Some data are not best described linearly.

In some cases the removal of _________________from data may cause a _____________________
so that linear no longer does a satisfactory job of describing the data.

In some cases: the bulk of the data may not be linear at all.
Non-linear relationships between two quantitative variables can sometimes be changed into linear relationships
by: transforming one or both variables.

Transforming can be thought of as: re-expressing the data.

We may want to transform either the _________________________, or the _________________________ in a
scatterplot, or maybe even both.

We will call the transformed variable ______ when talking about the transforming in general.

Many variables take only 0 or positive values, so: we are particularly interested in how functions behave for positive
values of t.

You should be familiar with the shape and equation of the following common functions. In each of
these models, t > 0.
The following scatterplot represents brain weight against body weight for 96 species of mammals.

The scatterplot is not very satisfactory because most mammals are so small relative to elephants and hippos.

The lower-left corner of the plot shows that most of the species overlap, forming a “blob.”

The correlation with all 96 species is________, but removing the elephant, ________

To get a closer look at the observations that are in the lower-left corner, the 4 outliers were removed.

This scatterplot represents the 92 observations with the 4 outliers removed.

Instead of a linear relationship, you can see that as body weight increases, the graph bends to the right
which is representative of a logarithmic function.

The following plot includes the original 96 observations, but instead of plotting the y-value against the x-value,
the logarithm of the brain weight (y-value) is plotted against the logarithm of the body weight (x-value).

There are no longer any extreme outliers or very influential observations, and the pattern is very linear with
r = .96.
The ladder of power functions is in the form:
Concavity of Power Functions:

Power transformations $t^p$ for powers p greater than 1 are: concave up
o They have the shape
o The transformations _____________________________
of a distribution and ____________________________
o This effect gets stronger as the power p moves up away from 1
Power transformations $t^p$ for powers p less than 1 are: concave down
o They have the shape
o The transformations ________________________
of a distribution and ________________________
o This effect gets stronger as the power p moves down away from 1
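
One way to check the concavity claims numerically is to look at second differences: over equally spaced values of t, they are positive for a concave-up curve and negative for a concave-down curve. The sketch below is only an illustration; the sample powers (3, 2, and 0.5) and the grid of t values are assumptions chosen for the demonstration, not part of the notes.

```python
def second_differences(f, ts):
    """Second differences of f over equally spaced points ts."""
    vals = [f(t) for t in ts]
    return [vals[i - 1] - 2 * vals[i] + vals[i + 1] for i in range(1, len(vals) - 1)]


def describe(d2):
    """Classify a curve from the sign of its second differences."""
    if all(d > 0 for d in d2):
        return "concave up"
    if all(d < 0 for d in d2):
        return "concave down"
    return "mixed"


ts = [1, 2, 3, 4, 5, 6]  # equally spaced positive values of t (illustrative)
powers = [3, 2, 0.5]     # sample powers chosen for illustration only

shapes = {}
for p in powers:
    # p=p freezes the current power inside the lambda
    shapes[p] = describe(second_differences(lambda t, p=p: t ** p, ts))
    print(f"t^{p}: {shapes[p]}")
```

Only these three powers are checked here; the sketch does not cover every rung of the ladder.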
Exponential Growth:

Exponential growth occurs when a variable is multiplied by a fixed number in each time period.

Ex. – consider a population of bacteria in which each bacterium splits into two each hour. Beginning
with 1, we have ___ after one hour, ___ after two hours, ___ after three hours, ___ after four hours,
____________ and so on. After one day of doubling there are _____________________ bacteria in the
population.

Exponential growth _________________________________________________

linear growth _________________________________________________

If a variable grows exponentially, its logarithm grows linearly.
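
The bacteria example can be worked through in a few lines. This is a minimal sketch, assuming a start of 1 bacterium at hour 0 and base-10 logarithms; it shows that consecutive ratios of the counts are constant (exponential growth) while consecutive differences of their logarithms are constant (linear growth).

```python
import math

# Population at hours 0..24, starting from 1 bacterium and doubling every hour.
counts = [2 ** h for h in range(25)]

print(counts[1:5])        # counts after 1, 2, 3 and 4 hours
print(f"{counts[24]:,}")  # count after one day (24 doublings)

# Exponential growth: each count is a fixed multiple of the previous one.
ratios = [counts[h + 1] / counts[h] for h in range(24)]

# The logarithm grows linearly: each hour adds the same amount, log10(2).
log_steps = [math.log10(counts[h + 1]) - math.log10(counts[h]) for h in range(24)]

print(set(ratios))
print(round(min(log_steps), 6), round(max(log_steps), 6))
```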
Steps in Transforming Data:
Review Properties of Logarithms:
Example 1 – Growth of Cell Phone Use
The cell phone industry enjoyed substantial growth in the 1990s. One way to measure cell phone growth is to look at
the number of subscribers. Find a linear model to predict the number of subscribers in the year 2000.
Year    Subscribers (thousands)
1990     5,283
1993    16,009
1994    24,134
1995    33,786
1996    44,043
1997    55,312
1998    69,209
1999    86,047


While the curve may appear to be exponential growth, we can’t simply depend on what our eyes see.
If you suspect exponential growth, first calculate the ratios of consecutive terms to see if they are the same fixed
percentage of the previous total.
o To avoid overflow in the calculator, it is good practice to code the years (let 1990 = 1)
   o Don’t use 0, since you can’t take the log of 0
Year    Subscribers    Ratios    Log(y)
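
The columns of this table can be computed as in the sketch below. It assumes base-10 logarithms and the coding 1990 = 1; the variable names are illustrative. Each ratio divides a subscriber count by the count in the previous row.

```python
import math

years = [1990, 1993, 1994, 1995, 1996, 1997, 1998, 1999]
subscribers = [5283, 16009, 24134, 33786, 44043, 55312, 69209, 86047]  # thousands

coded = [year - 1989 for year in years]  # code the years so that 1990 = 1

# Ratio of each count to the previous row's count; the first row has none.
ratios = [None] + [subscribers[i] / subscribers[i - 1] for i in range(1, len(subscribers))]

# Base-10 logarithm of each subscriber count (assumed base).
logs = [math.log10(s) for s in subscribers]

print("Year  Subscribers  Ratio  Log(y)")
for i in range(len(years)):
    ratio_text = "  n/a" if ratios[i] is None else f"{ratios[i]:5.2f}"
    print(f"{coded[i]:4d}  {subscribers[i]:11d}  {ratio_text}  {logs[i]:.3f}")
```

Note that the data skip from 1990 to 1993, so the first ratio covers three years and is not directly comparable with the one-year ratios that follow.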

Now that you have verified that the ratios are similar, the next step is to apply a mathematical transformation
that changes exponential growth into linear growth.

We had hypothesized that an exponential model of the form $y = ab^x$ represented the cell phone growth;
therefore we need to use properties of logarithms to transform:

Since $\log y = \log a + (\log b)x$ looks _____________________ we can plot ______________________ and if
________________________ we would have better reason to believe that the cell phone growth is exponential.
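
For reference, the transformation of the exponential model can be written out step by step, using the product rule and then the power rule for logarithms. The symbols a, b, x, and y are those of the model in the notes; the intermediate lines are a sketch of the standard derivation.

```latex
\begin{align*}
y      &= a b^{x} \\
\log y &= \log\left(a b^{x}\right) \\
       &= \log a + \log\left(b^{x}\right) && \text{(product rule)} \\
       &= \log a + (\log b)\,x            && \text{(power rule)}
\end{align*}
```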

The plot appears slightly _____________________, but certainly more linear than the original scatterplot.

Applying the least squares regression we get:

$\log y = 3.66 + 0.134x$

$r^2 = 0.982$.

This means that 98.2% of the variation in $\log y$ is explained by the least-squares regression of $\log y$ on
x.
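
The fit can be reproduced with a short least-squares calculation. This is a sketch, assuming base-10 logarithms and years coded so that 1990 = 1; the helper `least_squares` is a hypothetical name written for this illustration rather than a calculator or library routine.

```python
import math


def least_squares(xs, ys):
    """Return (intercept, slope, r_squared) for the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope, sxy ** 2 / (sxx * syy)


x = [1, 4, 5, 6, 7, 8, 9, 10]  # coded years, 1990 = 1
subscribers = [5283, 16009, 24134, 33786, 44043, 55312, 69209, 86047]  # thousands
log_y = [math.log10(s) for s in subscribers]  # base-10 logarithm (assumed)

intercept, slope, r2 = least_squares(x, log_y)
print(f"log y = {intercept:.2f} + {slope:.3f} x,  r^2 = {r2:.3f}")
```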

Although the model appears to be useful for prediction purposes because $r^2$ is so high, you should always
check the residual plot.
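
A residual check can be sketched numerically as well. The code below refits the line of log y on x (assuming base-10 logarithms and 1990 = 1) and prints each residual, observed log y minus predicted log y, instead of drawing the plot. The helper `fit_line` is a hypothetical name for this illustration.

```python
import math


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


x = [1, 4, 5, 6, 7, 8, 9, 10]  # coded years, 1990 = 1
subscribers = [5283, 16009, 24134, 33786, 44043, 55312, 69209, 86047]  # thousands
log_y = [math.log10(s) for s in subscribers]  # base-10 logarithm (assumed)

intercept, slope = fit_line(x, log_y)

# Residual = observed log y minus the value predicted by the fitted line.
residuals = [ly - (intercept + slope * xi) for xi, ly in zip(x, log_y)]

for xi, res in zip(x, residuals):
    print(f"x = {xi:2d}  residual = {res:+.4f}")
```

When reading the output, look at the signs in order of x: a systematic run of signs, rather than a random scatter around 0, suggests that the line is missing some curvature.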

The purpose of finding a linear model is to be able to predict the number of subscribers in 2000. One approach
would be to discard the first 4 data points since they are the oldest and furthest removed from the year 2000.

By removing the first 4 points, the $r^2$ improves __________________ which is even better than the first.

The LSRL is represented by:
$\log(\text{NewY}) = 3.966 + 0.097(\text{NewX})$

Now that we have the linear model, we can use it to predict the number of subscribers in the year 2000 by:
substituting an 11 in for “NewX” and then “undoing” the logarithm
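
The prediction step can be sketched as follows: refit the line on the last four data points, substitute NewX = 11, and undo the logarithm by raising 10 to the result. This assumes base-10 logarithms and the coding 1990 = 1 (so 2000 = 11); `fit_line`, `new_x`, and `new_log_y` are illustrative names.

```python
import math


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


# Last four observations only (1996-1999), coded so that 1990 = 1.
new_x = [7, 8, 9, 10]
new_subscribers = [44043, 55312, 69209, 86047]  # thousands
new_log_y = [math.log10(s) for s in new_subscribers]  # base-10 logarithm (assumed)

intercept, slope = fit_line(new_x, new_log_y)
print(f"log NewY = {intercept:.3f} + {slope:.3f} (NewX)")

# Predict for the year 2000 (NewX = 11), then undo the logarithm.
predicted_log = intercept + slope * 11
predicted_subscribers = 10 ** predicted_log  # back on the original scale, in thousands
print(f"predicted subscribers in 2000: about {predicted_subscribers:,.0f} thousand")
```

Because the model was fit to log y, the back-transformed value is in the same units as the original data (thousands of subscribers).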