from Encyclopedia of Social Science Research Methods (Lewis-Beck, Bryman, and Liao, eds.). Thousand Oaks, CA: Sage, 2003.

Normalization

Paul T. von Hippel

In many popular statistical models, we assume that some component of a variable Y has a NORMAL DISTRIBUTION. For example, in the LINEAR REGRESSION model $Y = \alpha + \beta X + \varepsilon$, we typically assume that the error term $\varepsilon$ is normal. Although minor departures from normality may be acceptable, distributions with heavier-than-normal tails can compromise statistical estimates. In such cases, it may be preferable to TRANSFORM Y so that the pertinent component is closer to normality. Transforming a variable in this way is called normalization.

If the pertinent component of Y has one heavy tail (SKEW), then we often apply a power transformation. True to their name, power transformations raise Y to some power p (i.e., they transform Y into $Y^p$). Powers greater than 1 reduce negative skew; an example is the quadratic transformation $Y^2$ (p = 2). Powers between 0 and 1 reduce positive skew; an example is the square-root transformation $\sqrt{Y}$, or $Y^{1/2}$ (p = .5), which is common when Y represents counts or frequencies. For a power of 0, the power transformation is defined to be $\log(Y)$, which reduces positive skew in much the same way as a very small power. Negative powers have the same effect as positive powers applied to the reciprocal $1/Y$ and are used when the reciprocal has a natural interpretation, as when Y is a rate (events per unit time), so that $1/Y$ is the time between events.

In sum, the family of power transformations can be written as follows:

$$t(Y; p) = \begin{cases} Y^p & \text{if } p \neq 0 \\ \ln(Y) & \text{if } p = 0. \end{cases}$$

Power transformations assume that Y is positive; if Y can be zero or negative, we commonly make Y positive by adding a constant. There are formal procedures for estimating the best constant to add, as well as the power p that yields the best approximation to normality (Box & Cox, 1964). However, the optimal power and additive constant are usually treated only as rough guidelines.

If the pertinent component of Y has two heavy tails (excess KURTOSIS), we may use a modulus transformation (John & Draper, 1980),

$$t(Y; p) = \begin{cases} \operatorname{sign}(Y)\,\dfrac{(|Y| + 1)^p - 1}{p} & \text{if } p \neq 0 \\ \operatorname{sign}(Y)\ln(|Y| + 1) & \text{if } p = 0, \end{cases}$$

which is a modified power transformation applied to each tail separately. Non-negative powers p less than 1 reduce kurtosis, while powers greater than 1 increase kurtosis. Again, there are formal procedures for estimating the optimal power p (John & Draper, 1980). If Y is symmetric around 0, then the modulus transformation will change the kurtosis without introducing skew. If Y is not centered at 0, it may be advisable to add a constant before applying the modulus transformation.

Other normalizations are typically used if Y represents proportions between 0 and 1: the arcsine or angular transformation $\sin^{-1}(\sqrt{Y})$, the logit or logistic transformation $\ln\left(\frac{Y}{1 - Y}\right)$, and the probit transformation $\Phi^{-1}(Y)$, where $\Phi^{-1}$ is the inverse of the standard normal cumulative distribution function. The logit and probit are better normalizations than the arcsine. On the other hand, the arcsine is defined when Y = 0 or 1, whereas the logit and probit transformations are not.
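The transformations above are simple to compute. As a minimal sketch of the power family in Python, assuming numpy and scipy are available (the function name power_transform and its shift argument are illustrative conveniences, not part of the article):

```python
import numpy as np
from scipy import stats

def power_transform(y, p, shift=0.0):
    """Power-family normalization t(Y; p): Y^p if p != 0, ln(Y) if p == 0.

    `shift` is an optional constant added to make Y strictly positive,
    as the article suggests for zero or negative values.
    """
    y = np.asarray(y, dtype=float) + shift
    if np.any(y <= 0):
        raise ValueError("Y must be positive; add a larger constant.")
    return np.log(y) if p == 0 else y ** p

# Square-root transform (p = 0.5), common for counts or frequencies
counts = np.array([1.0, 4.0, 9.0, 16.0, 25.0])
print(power_transform(counts, 0.5))  # [1. 2. 3. 4. 5.]

# A Box & Cox (1964) style maximum-likelihood estimate of the power;
# note that scipy fits the scaled variant (Y^p - 1)/p, a monotone
# relabeling of the same family.
z, p_hat = stats.boxcox(counts)
```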
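The modulus transformation can be sketched the same way, applying the John and Draper (1980) formula given above; again, the function name is illustrative:

```python
import numpy as np

def modulus_transform(y, p):
    """Modulus transformation of John & Draper (1980).

    Applies a power-style transformation to each tail separately:
        sign(Y) * ((|Y| + 1)^p - 1) / p   if p != 0
        sign(Y) * ln(|Y| + 1)             if p == 0
    """
    y = np.asarray(y, dtype=float)
    if p == 0:
        return np.sign(y) * np.log(np.abs(y) + 1.0)
    return np.sign(y) * ((np.abs(y) + 1.0) ** p - 1.0) / p

# A power below 1 pulls in both heavy tails of a variable that is
# symmetric around 0, without introducing skew.
heavy_tailed = np.array([-50.0, -2.0, 0.0, 2.0, 50.0])
print(modulus_transform(heavy_tailed, 0.3))
```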
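For proportions, the three transformations are one-liners; the sketch below also illustrates the boundary behavior noted in the text (the arcsine is defined at Y = 0 and 1, while the logit and probit are not):

```python
import numpy as np
from scipy.stats import norm

def arcsine(y):
    # Angular transformation sin^{-1}(sqrt(Y)); defined for Y in [0, 1].
    return np.arcsin(np.sqrt(y))

def logit(y):
    # ln(Y / (1 - Y)); diverges at Y = 0 and Y = 1.
    return np.log(y / (1.0 - y))

def probit(y):
    # Inverse of the standard normal CDF; also diverges at Y = 0 and Y = 1.
    return norm.ppf(y)

props = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
print(arcsine(props))                            # finite at the endpoints
print(logit(props[1:-1]), probit(props[1:-1]))   # interior values only
```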
Even the best transformation may not provide an adequate approximation to normality. Moreover, a transformed variable may be hard to interpret, and conclusions drawn from it may not apply to the original, untransformed variable (Levin, Liukkonen, & Levine, 1996). Fortunately, modern researchers often have good alternatives to normalization. When working with non-normal data, we can use a GENERALIZED LINEAR MODEL that assumes a different type of distribution. Or we can make weaker assumptions by using statistics that are “distribution-free” or NONPARAMETRIC.

In addition to the definition given here, the word normalization is sometimes used when a variable is STANDARDIZED. It is also used when constraints are imposed to ensure that a system of SIMULTANEOUS EQUATIONS is IDENTIFIED (e.g., Greene, 1997).

Paul T. von Hippel

References

Box, G. E. P., & Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society (B), 26(2), 211–252.

Cook, R. D., & Weisberg, S. (1996). Applied regression including computing and graphics. New York: Wiley.

Greene, W. H. (1997). Econometric analysis (3rd ed.). Upper Saddle River, NJ: Prentice Hall.

John, J. A., & Draper, N. R. (1980). An alternative family of transformations. Applied Statistics, 29(2), 190–197.

Levin, A., Liukkonen, J., & Levine, D. W. (1996). Equivalent inference using transformations. Communications in Statistics, Theory and Methods, 25(5), 1059–1072.

Yeo, I.-K., & Johnson, R. A. (2000). A new family of power transformations to improve normality or symmetry. Biometrika, 87(4), 954–959.