Nonlinear Transformations and Variance Stabilizing Transformations

Consider a random variable $Y$ whose variance depends on its mean: $E[Y] = \mu$ and $\mathrm{Var}[Y] = g(\mu)$, so we can write $Y = \mu + e$ where $E[e] = 0$ and $\mathrm{Var}[e] = g(\mu)$. Make a nonlinear transformation of $Y$ to $h(Y)$. Then, using a first-order Taylor expansion,

$$h(Y) = h(\mu + e) \approx h(\mu) + e\,h'(\mu)$$

where $h'(\mu)$ is the first derivative of $h$. This approximation is good to the extent that $g(\mu)$ is small. Thus

$$E[h(Y)] \approx h(\mu); \qquad \mathrm{Var}[h(Y)] \approx [h'(\mu)]^2\,g(\mu) \qquad (*)$$

This methodology has many uses. We might, for example, have fitted a model, estimated a parameter, and obtained its standard error, and then want to know about some function of the parameter, for example its log. Equation (*) gives us a way to get the standard error of the function (see the first code sketch at the end of this section).

The variance stabilizing transformation consists of finding the function $h$ that will make $\mathrm{Var}[h(Y)]$ approximately constant. To do this, solve

$$[h'(\mu)]^2\,g(\mu) = 1, \quad \text{i.e.} \quad h'(\mu) = \frac{1}{\sqrt{g(\mu)}}, \quad \text{so} \quad h(\mu) = \int \frac{dt}{\sqrt{g(t)}}.$$

Then making the transformation from $Y$ to $h(Y)$ will make the variance approximately 1.

Particular examples of variance-stabilizing transformations (each verifiable directly from (*); see the sketches at the end of this section) are:

1. $\mathrm{Var}[Y] = A\mu^2$: $h(Y) = \ln(Y)$ will have variance approximately $A$, a constant.
2. $\mathrm{Var}[Y] = A\mu$: $h(Y) = \sqrt{Y}$ will have variance approximately $A/4$, a constant.
3. $\mathrm{Var}[Y] = A\mu^4$: $h(Y) = 1/Y$ will have variance approximately $A$, a constant.
4. $\mathrm{Var}[Y] = A + B\mu^2$: $h(Y) = \ln\left[Y + \sqrt{Y^2 + C}\right]$, where $C = A/B$, will have variance approximately $B$, a constant.
5. $\mathrm{Var}[Y] = A\mu(1-\mu)$: $h(Y) = \arcsin\sqrt{Y}$ will have variance approximately $A/4$, a constant.

Situations 2 and 5 come up with Poisson data and with binomial proportions, respectively. You may consider variance stabilizing transformations if you see a plot of residuals vs fitted values with shapes like the megaphone (1), car headlight (2), trumpet (3), megaphone with mouthpiece (4), and Big Mac thinning to nothing at the edges (5).

The next page shows some plots of residuals vs fitted values that follow some standard shapes. The corresponding plot on the right shows what happens when you make the variance stabilizing transformation and fit the regression to the transformed variable. In all three, we have done a decent job of stabilizing the variance. But in the case of the trumpet data set, this came at the cost of destroying linearity, a terrible tradeoff. The other two look good, so the transformation fixed the heteroscedasticity problem without damaging the linearity that we have to have.
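To make equation (*) concrete, here is a minimal sketch (in Python with NumPy; the function name and the numerical values are illustrative, not from the text) of the standard-error calculation for the log of an estimated parameter, with a Monte Carlo check:

```python
import numpy as np

# Delta-method standard error of a function of an estimated parameter,
# per equation (*): Var[h(theta_hat)] ~ [h'(theta)]^2 Var[theta_hat],
# so SE[h(theta_hat)] ~ |h'(theta_hat)| * SE[theta_hat].
def delta_method_se(theta_hat, se_theta, h_prime):
    return abs(h_prime(theta_hat)) * se_theta

theta_hat, se = 2.5, 0.3                  # illustrative estimate and standard error
# For h(t) = log(t), the derivative is h'(t) = 1/t:
print(delta_method_se(theta_hat, se, lambda t: 1.0 / t))   # 0.3 / 2.5 = 0.12

# Monte Carlo check: simulate the estimator and look at the spread of its log.
rng = np.random.default_rng(0)
draws = rng.normal(theta_hat, se, size=200_000)
print(np.log(draws).std())                # close to 0.12, since se/theta_hat is small
```

The approximation is good here precisely because the standard error is small relative to the estimate, matching the "g(μ) is small" condition above.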
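The five transformations in the list can be checked mechanically against (*): for each pair, $[h'(\mu)]^2 g(\mu)$ should simplify to a constant. A sketch using SymPy (assumed available; symbol names are illustrative):

```python
import sympy as sp

mu, A, B = sp.symbols('mu A B', positive=True)
C = A / B

# (g(mu), h(mu)) pairs for cases 1-5 above.
cases = [
    (A * mu**2,          sp.log(mu)),                       # 1: megaphone
    (A * mu,             sp.sqrt(mu)),                      # 2: car headlight (Poisson)
    (A * mu**4,          1 / mu),                           # 3: trumpet
    (A + B * mu**2,      sp.log(mu + sp.sqrt(mu**2 + C))),  # 4: megaphone with mouthpiece
    (A * mu * (1 - mu),  sp.asin(sp.sqrt(mu))),             # 5: Big Mac (binomial)
]

for g, h in cases:
    # [h'(mu)]^2 g(mu) should reduce to A, A/4, A, B, A/4 respectively.
    print(sp.simplify(sp.diff(h, mu)**2 * g))
```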
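Finally, a quick simulation (sample sizes and parameter values are illustrative) of the two most common cases, Poisson counts and binomial proportions: the raw variance moves with the mean, while the transformed variance stays nearly constant.

```python
import numpy as np

rng = np.random.default_rng(1)

# Case 2, Poisson counts: Var[Y] = mu (so A = 1), and Var[sqrt(Y)] should be
# roughly 1/4 whatever the mean is.
for mu in (5, 20, 80):
    y = rng.poisson(mu, size=100_000)
    print(f"mu={mu:2d}  Var[Y]={y.var():6.2f}  Var[sqrt(Y)]={np.sqrt(y).var():.3f}")

# Case 5, binomial proportions: Var[p_hat] = p(1-p)/n (so A = 1/n), and
# Var[arcsin(sqrt(p_hat))] should be roughly 1/(4n) = 0.005 whatever p is.
n = 50
for p in (0.1, 0.3, 0.5):
    phat = rng.binomial(n, p, size=100_000) / n
    t = np.arcsin(np.sqrt(phat))
    print(f"p={p}  Var[p_hat]={phat.var():.5f}  Var[arcsin sqrt]={t.var():.5f}")
```

As with the residual plots, the approximation is visibly rougher at small means (Poisson with mu = 5) than at large ones, which is the "g(μ) is small" caveat showing up in practice.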