http://www.tufts.edu/~gdallal/logs.htm
Logarithms
[When I started to write this note, I thought, "Why reinvent the wheel?" so I
searched the World Wide Web for logarithm. I found some nice web pages--at Oak
Road Systems and SCT BOCES, for example. However, they tend to be variations
of the standard presentations found in most textbooks. If that type of presentation
was sufficient for a general adult audience, then there wouldn't be so many people
who were uncomfortable with logarithms! Here's my attempt to approach the
subject in a different way. Call it Logarithms: Part One. For Part Two, search the World
Wide Web. There are some excellent discussions out there!]
Let's talk about transformations. Some transformations are so commonplace it
seems strange to give them a name as formidable as transformations--things like
centimeters to inches, pounds to kilograms, Fahrenheit to Celsius, and currency
conversions. Transformations like these are linear transformations. If you take a set of
data, transform them, and plot the transformed values against the originals, the
points will lie exactly on a straight line.
One characteristic of linear transformations is that they preserve relative spacings.
Values that are evenly spaced before transformation remain evenly spaced after
transformation. Values that are spaced twice as far apart as other values before
transformation remain twice as far apart after transformation.
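If you'd like to see this with numbers, here is a small Python sketch. The function name and the sample temperatures are my own illustrative choices, not part of the original discussion. It converts evenly spaced Fahrenheit values to Celsius and compares the gaps before and after.

```python
def fahrenheit_to_celsius(f):
    """Linear transformation: Celsius = (Fahrenheit - 32) * 5 / 9."""
    return (f - 32) * 5 / 9

# Illustrative, evenly spaced Fahrenheit values (10 degrees apart).
fahrenheit = [32, 42, 52, 62]
celsius = [fahrenheit_to_celsius(f) for f in fahrenheit]

# Gaps between neighbouring values before and after the transformation.
gaps_f = [b - a for a, b in zip(fahrenheit, fahrenheit[1:])]
gaps_c = [b - a for a, b in zip(celsius, celsius[1:])]

print(gaps_f)  # every gap is 10 degrees Fahrenheit
print(gaps_c)  # every gap is the same size in Celsius (about 5.56)
```

The gaps change size (the units changed), but equal gaps stay equal.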
There are common transformations that are not linear. For example, a 100-mile
journey can be described by the time it takes (duration) or by the speed of the trip.
Since speed is defined as distance divided by duration (or, speed = distance / time),
a 1 hour trip is a 100 mph trip, a 2 hour trip is a 50 mph trip, and so on. A plot of
speed against duration gives a curve that is demonstrably nonlinear, but this is a
transformation nonetheless. Each speed corresponds to a particular duration, and
vice-versa. Nonlinear transformations do not preserve relative spacings. For
example, consider the equally spaced durations of 0.5 hours, 1 hour, and 1.5 hours.
When expressed as speeds, they are 200 mph, 100 mph, and 66.7 mph.
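The same check can be sketched in Python for the 100-mile trip. The names below are illustrative assumptions. The code turns the three equally spaced durations into speeds and looks at the gaps between them.

```python
DISTANCE_MILES = 100  # the 100-mile journey from the text

def speed_mph(duration_hours):
    """Nonlinear transformation: speed = distance / duration."""
    return DISTANCE_MILES / duration_hours

durations = [0.5, 1.0, 1.5]  # equally spaced, half an hour apart
speeds = [speed_mph(d) for d in durations]

# Gaps between neighbouring speeds; these are no longer equal.
speed_gaps = [b - a for a, b in zip(speeds, speeds[1:])]

print(speeds)
print(speed_gaps)
```

The first gap is 100 mph and the second is only about 33 mph, even though the durations were evenly spaced.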
The logarithm is another nonlinear transformation. Got it? In the spirit of the
late Lenny Bruce, let's repeat it so that the word logarithm loses some of its shock
value.



The logarithm is just a transformation!
The logarithm is just a transformation!
The logarithm is just a transformation!
To keep things simple, we'll stick with the kind called common logarithms and use the
informal name common logs. Common logs have the following fascinating property--if
you multiply something by 10 in the original scale, you add 1 unit to its value in the
log scale. If you divide something by 10 in the original scale, you subtract 1 unit
from its value in the log scale. As we move from 0.1 to 1 to 10 on the original scale,
we move from -1 to 0 to 1 on the logarithmic scale.
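For readers who prefer a keyboard to a calculator, here is a minimal Python sketch of that property using the standard library's math.log10. The starting value 37 is an arbitrary choice of mine.

```python
import math

# Moving along the original scale by factors of 10...
values = [0.1, 1, 10]
logs = [math.log10(v) for v in values]
print(logs)  # ...moves along the log scale in steps of 1

# Multiplying or dividing any positive number by 10 shifts its log by 1.
x = 37  # arbitrary illustrative value
shift_up = math.log10(x * 10) - math.log10(x)
shift_down = math.log10(x / 10) - math.log10(x)
print(shift_up, shift_down)
```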
There are three reasons why logarithms should interest us.

First, many statistical techniques work best with data that are single-peaked
and symmetric (symmetry).


Second, when comparing different groups of subjects, many techniques work
best when the variability is roughly the same within each group
(homoscedasticity).
Third, it is easier to describe the relationship between variables when it's
approximately linear (linearity).
When these conditions are not true in the original data, they can often be achieved
by applying a logarithmic transformation.
Symmetry
A logarithmic transformation will reduce positive skewness because it compresses
the upper end (tail) of the distribution while stretching out the lower end. This is
because the distances between 0.1 and 1, 1 and 10, 10 and 100, and 100 and 1000
are the same in the logarithmic scale. This is illustrated by the histogram of folate
levels in a sample of healthy adults. In the original scale, the data are long-tailed to
the right, but after a logarithmic transformation is applied, the distribution is
symmetric. The lines between the two histograms connect original values with their
logarithms to demonstrate the compression of the upper tail and stretching of the
lower tail.
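The folate data are not reproduced here, so the following Python sketch uses a made-up, right-skewed set of numbers purely for illustration. The skewness helper is my own simple moment-based version, not a library routine. It compares skewness before and after taking common logs.

```python
import math

def skewness(data):
    """Moment-based skewness: third central moment over (variance ** 1.5)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    return m3 / m2 ** 1.5

# Hypothetical values with a long right tail (each is double the previous one).
raw = [1, 2, 4, 8, 16, 32, 64]
logged = [math.log10(v) for v in raw]

skew_raw = skewness(raw)
skew_log = skewness(logged)

print(skew_raw)  # clearly positive: long tail to the right
print(skew_log)  # essentially zero: symmetric after the log transformation
```

These values were chosen so the logs are perfectly evenly spaced. Real data will rarely be this tidy, but the direction of the effect is the same.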
Homoscedasticity
Often groups that tend to have larger values also tend to have greater within-group
variability.
A logarithmic transformation will often make the within-group variability more
similar across groups. The figure shows the serum progesterone levels in subjects
randomly assigned to receive estrogen and (separately) progesterone. In the original
scale, variability increases dramatically with the typical response. However, the
within-group variability is nearly constant after a logarithmic transformation is
applied. The data also tell a simpler story in the log scale: the effect of
progesterone is the same whether or not a subject is taking estrogen, and the
effect of estrogen is the same whether or not a subject is taking progesterone.
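Here is a small Python sketch of the idea with two invented groups (these are not the progesterone data). The second group is simply ten times the first, so its spread in the original scale is ten times larger, yet the spread of the logs is the same.

```python
import math
import statistics

# Hypothetical groups: the second has values ten times as large as the first.
low_group = [8, 10, 12.5]
high_group = [80, 100, 125]

sd_raw_low = statistics.stdev(low_group)
sd_raw_high = statistics.stdev(high_group)

sd_log_low = statistics.stdev([math.log10(v) for v in low_group])
sd_log_high = statistics.stdev([math.log10(v) for v in high_group])

print(sd_raw_low, sd_raw_high)  # very different in the original scale
print(sd_log_low, sd_log_high)  # the same in the log scale
```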
Linearity
Logarithmic transformations are sometimes used when constructing statistical
models to describe the relationship between two measurements. Consider
homocysteine. It's bad stuff, a sulphur-based amino acid that indicates risk of heart
disease. Lately, it's been hard to escape advertising that tells you to drink your
orange juice because orange juice is a good source of folate, which lowers your
homocysteine.
A plot of homocysteine against folate shows a nonlinear relationship with most of
the data bunched in the lower left hand portion of the display. When logarithmic
transformations are applied to both variables, the association appears to be linear.
The fitted equation is
log(homocysteine) = 1.14 - 0.23 log(folate) .
If someone has folate levels of 20, her logged folate levels are log(20) or 1.301. Her
logged homocysteine value will be estimated to be 1.14 - 0.23 * 1.301 or 0.8408
units. If logged homocysteine is 0.8408, homocysteine itself is 10^0.8408 or 6.93 units.
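The arithmetic above can be sketched as a short Python function. The function name is my own. The coefficients 1.14 and -0.23 are the ones in the fitted equation, and common logs are assumed throughout.

```python
import math

def predicted_homocysteine(folate):
    """Apply log(homocysteine) = 1.14 - 0.23 log(folate), then take the antilog."""
    logged_homocysteine = 1.14 - 0.23 * math.log10(folate)
    return 10 ** logged_homocysteine  # transform back to the original scale

print(round(predicted_homocysteine(20), 2))  # about 6.93
```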
Some things to notice and/or do

The common log of a number is the power to which 10 is raised in order to
obtain the number. This makes the logs of some numbers easy to calculate.
For example,
log(1) = 0, because 10^0 = 1
log(10) = 1, because 10^1 = 10
log(100) = 2, because 10^2 = 100








Every positive number has a logarithm. You can get the logarithms of
numbers that aren't integer powers of 10 from tables or a calculator. For
example, log(48) = 1.6812 and log(123) = 2.0899. What is log(480)?
The larger the number, the larger its logarithm. Thus, 123 > 48, so log(123) >
log(48).
Only positive numbers can have logarithms. Why? Hint: Think about powers
of 10.
Can logarithms themselves be negative? Yes. Give an example of a number
whose logarithm is negative. Hints: What number has a logarithm of 0? The
smaller the number, the smaller its logarithm.
Use a calculator to obtain the common log of some number. Use the
calculator to transform back from the logarithm to the original number.
"Transforming back" is known as taking the antilogarithm.
Use a calculator to obtain the common log of some number. Add 1 to the
logarithm. Take the antilog of the result. What do you get? How is it related
to the number you started with?
Use a calculator to obtain the common log of some number. Add 0.3010 to
the logarithm. Take the antilog. What number do you get? How is it related
to the number you started with? Think about it. If your head hurts, try the
next exercise!
Use the calculator to get the antilogarithm of 0.3010. Hmmm . . . The
previous paragraph demonstrates that the sum of two logarithms is equal to
the logarithm of their product. We took the log of a number, added the log
of 2, and obtained the log of twice the original number! But this is getting
way too technical.
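If you don't have a calculator handy, the last few exercises can be tried in Python. This is only a sketch: the helper name antilog10 and the starting value 37 are my own choices.

```python
import math

def antilog10(y):
    """Transform back from a common log to the original scale."""
    return 10 ** y

x = 37  # arbitrary illustrative value
log_x = math.log10(x)

back = antilog10(log_x)               # the number we started with
times_ten = antilog10(log_x + 1)      # adding 1 multiplies by 10
doubled = antilog10(log_x + 0.3010)   # adding log(2) roughly doubles
two = antilog10(0.3010)               # 0.3010 is (approximately) log(2)

print(back, times_ten, doubled, two)
```

The doubling is only approximate because 0.3010 is log(2) rounded to four decimal places.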
Ratios
There are two commonly used ways to summarize a difference between two groups.
The first is the algebraic difference--for example, changing to this diet will lower
your blood pressure 20 mm. The second is the relative change--for example, this
diet will lower your cholesterol by 15%. Relative changes are often expressed in
terms of ratios, one treatment's response divided by another.
One problem with ratios is their lack of symmetry. Consider the ratio of A to B,
that is, A/B. If A produces values greater than B, the ratio can theoretically take
any value greater than 1. However, if A produces values less than B, the ratio is
restricted to the range of 0 to 1. To put it another way, if we change the way we
define our ratio--switching to B/A--values in the range 1 to infinity move into the
range 0 to 1 while values in the range 0 to 1 get switched into the range 1 to infinity.
Logged ratios solve this problem. Again consider the ratio, A/B. When their effects
are the same, their ratio is 1 and the log of the ratio is 0. Also, log(A/B) = -log(B/A),
so symmetry is restored. That is, when B is some multiple of A, the log of the ratio
has the same magnitude as when A is that same multiple of B, except that
the sign is different. You can use your calculator to check this for various choices of
A and B. This is why I rarely analyze ratios but often analyze logged ratios. I might
analyze the ratios directly if they are tightly grouped around 1, say, 0.9 to 1.1. There
may still be some asymmetry, but it will be minor (1/1.1 = 0.909), and a fair cost for
sparing the audience from dealing with logarithms.
The last thing to bring into the discussion is the logarithm's property that the log of a
ratio is the difference of the logs, that is, log(A/B) = log(A) - log(B). Many statistical
techniques work best when they are describing the algebraic difference between two
quantities. Therefore, when it is natural to think of some quantity in terms of ratios
rather than simple differences, it is common for analysts to begin with a logarithmic
transformation of the data and perform a formal analysis on the logarithms.
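Both properties of logged ratios are easy to check in Python. The values of A and B below are arbitrary illustrative choices.

```python
import math

a, b = 3.0, 12.0  # hypothetical responses for treatments A and B

ratio_ab = a / b  # squeezed into the range 0 to 1
ratio_ba = b / a  # anywhere from 1 upward

log_ab = math.log10(ratio_ab)
log_ba = math.log10(ratio_ba)
print(log_ab, log_ba)  # same magnitude, opposite sign

# The log of a ratio is the difference of the logs.
difference_of_logs = math.log10(a) - math.log10(b)
print(difference_of_logs)
```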
Logarithms also play an important role in analyzing probabilities. Statisticians have
developed many techniques for fitting straight-line models to predict a variety of
outcomes. There is a problem when using these methods to model probabilities.
The estimated probabilities can be less than 0 or greater than 1, which are
impossible values. Logistic regression models the log odds
(odds = probability/(1 - probability)) instead. While probabilities must lie between 0 and 1 (with a neutral
value of 1/2), odds are ratios that lie between 0 and infinity (with a neutral value of
1). It follows from the discussion two paragraphs above, that log odds can take on
any value, with a neutral value of 0 and the log odds in favor of an event being equal
in magnitude and opposite in sign to the same odds against the event. Whew!
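Here is a minimal Python sketch of log odds. I use common logs to match the rest of this note, which is an assumption on my part. Logistic regression software conventionally works with natural logs, which changes the numbers by a constant factor but not the symmetry.

```python
import math

def log_odds(probability):
    """Common log of the odds, probability / (1 - probability), for 0 < probability < 1."""
    odds = probability / (1 - probability)
    return math.log10(odds)

for p in (0.2, 0.5, 0.8):
    print(p, log_odds(p))
```

A probability of 0.5 gives log odds of 0, and probabilities of 0.2 and 0.8 give log odds of equal size and opposite sign.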
Different types of logarithms
Just as length can be measured in feet, inches, meters, kilometers, centimeters, or
whatever, logarithms can be defined in many ways according to what happens in the
original scale when there is a one unit change in the log scale. Common logs are
defined so that a 1 unit increase in the log scale is equivalent to multiplying by 10 in
the original scale. One could define logarithms so that a one unit increase in the log
scale is equivalent to multiplying by 2 in the original scale. These would be called logs
to base 2. The value by which a number is multiplied in the original scale when its
logarithm is increased by 1 is known as the base of the logarithm. Any positive
number different from 1 can be used as a base.
Mathematicians are fond of natural logarithms. A 1 unit increase in this log scale is
equivalent to multiplying in the original scale by a factor known as Euler's number,
e (approximately 2.71828). Mathematicians like natural logs because they have
properties that are not shared by other types of logarithms. For example, if you
apply any logarithmic transformation to a set of data, the mean (average) of the logs
is approximately equal to the log of the original mean, whatever type of logarithms
you use. However, only for natural logs is the measure of spread called the standard
deviation (SD) approximately equal to the coefficient of variation (the ratio of the
SD to the mean) in the original scale.
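The claim about the standard deviation can be checked with a few lines of Python. The measurements below are invented for illustration and have a small spread relative to their mean, which is when the approximation works well.

```python
import math
import statistics

# Hypothetical measurements with modest spread (not data from this note).
values = [95, 100, 105, 98, 102]

# Coefficient of variation in the original scale.
cv = statistics.stdev(values) / statistics.mean(values)

# Standard deviation of the natural logs and of the common logs.
sd_natural = statistics.stdev([math.log(v) for v in values])
sd_common = statistics.stdev([math.log10(v) for v in values])

print(cv, sd_natural, sd_common)
```

The SD of the natural logs lands very close to the coefficient of variation, while the SD of the common logs does not.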
However, as already mentioned, the different types of logs are like different units
for measuring height. You can't report a height of 71. Is it the 71 inches that might
be appropriate for an adult male, or is it the 71 cm that might be appropriate for a
toddler? Similarly, you can't report a logarithm of 2. Is it the common log
corresponding to a value of 100 in the original scale or a natural log corresponding
to a value of 7.39?
Pick a number and write it down. A one or two digit number will do.



Enter the number into your calculator.
  o Press the LOG key. Note the result.
  o Press the 10^x key. Note the result.
Enter the number into your calculator.
  o Press the LN key. Note the result.
  o Press the e^x key. Note the result.
Enter the number into your calculator.
  o Press the LOG key. Note the result.
  o Press the e^x key. Note the result.
Enter the number into your calculator.
  o Press the LN key. Note the result.
  o Press the 10^x key. Note the result.
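The four key sequences above can also be tried in Python, where math.log10 plays the role of the LOG key and math.log the role of the LN key. The starting number 25 is just an example.

```python
import math

x = 25  # any one or two digit number will do

log_then_10x = 10 ** math.log10(x)       # LOG, then 10^x: matching pair
ln_then_ex = math.exp(math.log(x))       # LN, then e^x: matching pair
log_then_ex = math.exp(math.log10(x))    # LOG, then e^x: mismatched pair
ln_then_10x = 10 ** math.log(x)          # LN, then 10^x: mismatched pair

print(log_then_10x, ln_then_ex)   # both return the starting number
print(log_then_ex, ln_then_10x)   # neither returns the starting number
```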
It really doesn't matter what kind of logarithms you use. It's like choosing units of
length. If you measure something in feet and later want it in inches, just multiply by
12. You can also switch between different kinds of logarithms by multiplying by the
proper constant*.
A comment about notation. For nonmathematicians there are at most two kinds of
logarithms--common logs, denoted by log(x), and maybe natural logs, denoted
ln(x). Mathematicians like general notation and write logarithms as log_b(x), where b
denotes the base. Thus, common logs are written log_10. If mathematicians were
entirely consistent, natural logs would be written log_e. However, mathematicians use
natural logs almost to the exclusion of all others, so 'log' written without a base is
understood to stand for natural logs. I do this myself when I am writing
mathematics.
It is quite different when I am describing the results of an analysis to a general
audience. I go out of my way to avoid using the word logarithm. If an analysis
demands logs, I often write, "Data in tables and graphs are displayed in the original
scale of measurement. However, <for these stated reasons> a logarithmic
transformation was applied to the data prior to formal analysis." If I had to discuss
logged data formally with a general audience, I would write log(x) for common log
and ln(x) for natural log, but I'd do my best to avoid using natural logs. Logarithms
of any kind place huge demands on a general audience. They risk confusion and
greatly increase the chance the audience will tune out and the message will get lost.
If logarithms must be used, it is essential to do it in a way that causes the least
amount of discomfort for the audience--common logs denoted by 'log'.
Summary
Logarithms are just another transformation. We use them because sometimes it's
easier to analyze or describe something in terms of log transformed data than in
terms of the original values.
----------------------
* To transform common logs (base 10) to natural logs (base e), multiply the common logs by
2.3026, the natural log of 10. Try it with your calculator. Take the common log of 253. It is
2.4031. Multiply it by 2.3026 to get 5.5334. Now press the e^x key to get 253! To transform
natural logs (base e) to common logs (base 10), the constant is 0.4343, the common log of e. In
general,
log_b(x) = log_b(a) * log_a(x) , and
log_b(x) = log_a(x) / log_a(b)
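The conversion can be checked in Python as well. This is a sketch under my own naming: log_base is a hypothetical helper that applies the second formula with common logs playing the role of base a.

```python
import math

LN_OF_10 = math.log(10)          # about 2.3026
LOG10_OF_E = math.log10(math.e)  # about 0.4343

x = 253
common = math.log10(x)
natural_from_common = common * LN_OF_10        # common log -> natural log
common_from_natural = math.log(x) * LOG10_OF_E  # natural log -> common log

def log_base(value, base):
    """Logarithm to any base, computed from common logs."""
    return math.log10(value) / math.log10(base)

print(natural_from_common, math.log(x))
print(common_from_natural, common)
print(log_base(8, 2))  # log to base 2 of 8
```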
Copyright © 1999 Gerard E. Dallal
Last modified: 09/20/2004 14:18:03.