Normalization

from Encyclopedia of Social Science Research Methods (Lewis-Beck, Bryman, and Liao, eds.). Thousand Oaks, CA: Sage, 2003.
by Paul T. von Hippel
In many popular statistical models, we assume that some component of a variable Y has a
NORMAL DISTRIBUTION.
For example, in the LINEAR REGRESSION model $Y = \alpha + \beta X + \varepsilon$,
we typically assume that the error term $\varepsilon$ is normal. Although minor departures from
normality may be acceptable, distributions with heavier-than-normal tails can
compromise statistical estimates. In such cases, it may be preferable to TRANSFORM Y so
that the pertinent component is closer to normality. Transforming a variable in this way is
called normalization.
If the pertinent component of Y has one heavy tail (SKEW), then we often apply a
power transformation. True to their name, power transformations raise Y to some power
p (i.e., they transform Y into $Y^p$). Powers greater than 1 reduce negative skew; an
example is the quadratic transformation $Y^2$ (p = 2). Powers between 0 and 1 reduce
positive skew; an example is the square-root transformation $\sqrt{Y}$ or $Y^{1/2}$
(p = .5), which is common when Y represents counts or frequencies. For a power of 0, the
power transformation is defined to be log(Y), which reduces positive skew in much the
same way as a very small power. Negative powers have the same effect as positive powers
applied to the reciprocal $1/Y$, and are used when the reciprocal has a natural
interpretation, as when Y is a rate (events per unit time), so that $1/Y$ is the time
between events.
In sum, the family of power transformations can be written as follows:

$$
t(Y; p) =
\begin{cases}
Y^p & \text{if } p \neq 0 \\
\ln(Y) & \text{if } p = 0
\end{cases}
$$
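To make the family concrete, here is a minimal Python sketch (the function name power_transform and the NumPy details are illustrative, not part of the original entry):

```python
import numpy as np

def power_transform(y, p):
    """Power transformation t(Y; p): Y**p if p != 0, ln(Y) if p == 0.

    Assumes Y is strictly positive, as the family requires.
    """
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Y must be positive; add a constant first if needed.")
    return np.log(y) if p == 0 else y ** p

# Example: the square-root transformation (p = 0.5) applied to counts
print(power_transform([1, 4, 9, 16], 0.5))  # [1. 2. 3. 4.]
```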
Power transformations assume that Y is positive; if Y can be zero or negative, we
commonly make Y positive by adding a constant. There are formal procedures for
estimating the best constant to add, as well as the power p that yields the best
approximation to normality (Box & Cox, 1964). However, the optimal power and
additive constant are usually treated only as rough guidelines.
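In practice, software can estimate the power. A sketch using SciPy's stats.boxcox (note that SciPy uses the Box-Cox parameterization $(Y^\lambda - 1)/\lambda$, a rescaled version of the plain power $Y^p$, so the estimated power plays the same role):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.lognormal(mean=0.0, sigma=1.0, size=500)  # positively skewed, positive data

# With no power supplied, boxcox returns the transformed data and the
# maximum-likelihood estimate of the power (lambda).
y_transformed, lam = stats.boxcox(y)
print(f"estimated power: {lam:.2f}")  # near 0, i.e., roughly a log transform
```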
If the pertinent component of Y has two heavy tails (excess KURTOSIS), we may
use a modulus transformation (John & Draper, 1980),

$$
t(Y; p) =
\begin{cases}
\operatorname{sign}(Y) \, \dfrac{(|Y| + 1)^p - 1}{p} & \text{if } p \neq 0 \\[1ex]
\operatorname{sign}(Y) \, \ln(|Y| + 1) & \text{if } p = 0
\end{cases}
$$
which is a modified power transformation applied to each tail separately. Non-negative
powers p less than 1 reduce kurtosis, while powers greater than 1 increase kurtosis.
Again, there are formal procedures for estimating the optimal power p (John & Draper,
1980). If Y is symmetric around 0, then the modulus transformation will change the
kurtosis without introducing skew. If Y is not centered at 0, it may be advisable to add a
constant before applying the modulus transformation.
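A direct Python translation of John and Draper's formula above (my own sketch; the source gives only the equation):

```python
import numpy as np

def modulus_transform(y, p):
    """Modulus transformation (John & Draper, 1980).

    Applies a shifted power transformation to each tail separately;
    powers 0 <= p < 1 pull both heavy tails toward the center.
    """
    y = np.asarray(y, dtype=float)
    if p == 0:
        return np.sign(y) * np.log(np.abs(y) + 1)
    return np.sign(y) * ((np.abs(y) + 1) ** p - 1) / p

# Example: a symmetric, heavy-tailed variable centered at 0
print(modulus_transform([-50.0, -2.0, 0.0, 2.0, 50.0], 0.5))
```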
Other normalizations are typically used if Y represents proportions between 0 and
1: the arcsine or angular transformation $\sin^{-1}(\sqrt{Y})$, the logit or logistic
transformation $\ln\left(\frac{Y}{1 - Y}\right)$, and the probit transformation
$\Phi^{-1}(Y)$, where $\Phi^{-1}$ is the inverse of the cumulative standard normal
distribution function. The logit and probit are better normalizations than the arcsine.
On the other hand, the arcsine is defined when Y = 0 or 1, whereas the logit and probit
transformations are not.
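The three transformations side by side in Python (a sketch; scipy.stats.norm.ppf supplies the inverse normal CDF used for the probit):

```python
import numpy as np
from scipy.stats import norm

p = np.array([0.05, 0.25, 0.50, 0.75, 0.95])  # proportions strictly inside (0, 1)

arcsine = np.arcsin(np.sqrt(p))  # also defined at 0 and 1
logit = np.log(p / (1 - p))      # undefined at 0 and 1
probit = norm.ppf(p)             # undefined at 0 and 1 (returns -inf/+inf)

print(arcsine, logit, probit, sep="\n")
```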
Even the best transformation may not provide an adequate approximation to
normality. Moreover, a transformed variable may be hard to interpret, and conclusions
drawn from it may not apply to the original, untransformed variable (Levin, Liukkonen,
& Levine, 1996). Fortunately, modern researchers often have good alternatives to
normalization. When working with non-normal data, we can use a GENERALIZED LINEAR
MODEL that assumes a different type of distribution. Or we can make weaker assumptions
by using statistics that are “distribution-free” or NONPARAMETRIC.
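As one illustration of the generalized linear model alternative (a sketch using statsmodels; the choice of a Poisson family for a count outcome is my example, not the entry's):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 2, size=200)
y = rng.poisson(np.exp(0.5 + 0.8 * x))  # skewed count outcome

# Rather than normalizing y (e.g., by a square root), model it directly
# with a distribution suited to counts.
model = sm.GLM(y, sm.add_constant(x), family=sm.families.Poisson()).fit()
print(model.params)  # estimates of the intercept and slope
```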
In addition to the definition given here, the word normalization is sometimes used
when a variable is STANDARDIZED. It is also used when constraints are imposed to ensure
that a system of SIMULTANEOUS EQUATIONS is IDENTIFIED (e.g., Greene, 1997).
Paul T. von Hippel
References
Box, G. E. P., & Cox, D. R. (1964). An analysis of transformations. Journal of the
Royal Statistical Society (B), 26(2), 211–252.
Cook, R. D., & Weisberg, S. (1996). Applied regression including computing and
graphics. New York: Wiley.
Greene, W. H. (1997). Econometric analysis (3rd ed.). Upper Saddle River, NJ:
Prentice Hall.
John, J. A., & Draper, N. R. (1980). An alternative family of transformations.
Applied Statistics, 29(2), 190–197.
Levin, A., Liukkonen, J., & Levine, D. W. (1996). Equivalent inference using
transformations. Communications in Statistics, Theory and Methods, 25(5), 1059–1072.
Yeo, I.-K., & Johnson, R. A. (2000). A new family of power transformations to
improve normality or symmetry. Biometrika, 87, 954–959.