Variance Stabilizing Transformations
Variance is Related to Mean
• Usual Assumption in ANOVA and Regression is that the variance of each observation is the same
• Problem: In many cases, the variance is not constant, but is related to the mean.
– Poisson Data (Counts of events): E(Y) = V(Y) = m
– Binomial Data (and Percents): E(Y) = n p, V(Y) = n p(1-p)
– General Case: E(Y) = m, V(Y) = W(m)
– Power relationship: $V(Y) = s^2 = a^2 m^{2b} \Rightarrow s = a m^b \Rightarrow \ln(s) = \ln(a) + b \ln(m) = a^* + b m^*$, where $a^* = \ln(a)$ and $m^* = \ln(m)$
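The Poisson case above (E(Y) = V(Y) = m) can be checked by simulation. The following is a minimal sketch, not part of the original slides: the helper names `poisson_draw` and `mean_and_variance`, the sample size, and the seed are illustrative assumptions, and the sampler is Knuth's multiplication method written with the standard library only.

```python
import math
import random
import statistics

def poisson_draw(rng, mean):
    """Draw one Poisson count using Knuth's multiplication method."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    # Keep multiplying uniforms until the running product drops to exp(-mean).
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def mean_and_variance(mean, n=20000, seed=1):
    """Return (sample mean, sample variance) of n simulated Poisson counts."""
    rng = random.Random(seed)
    sample = [poisson_draw(rng, mean) for _ in range(n)]
    return statistics.fmean(sample), statistics.variance(sample)

# The sample variance should track the sample mean as m grows.
for m in (2, 8, 32):
    ybar, s2 = mean_and_variance(m)
    print(f"m = {m:2d}: sample mean = {ybar:.2f}, sample variance = {s2:.2f}")
```

In the notation of the power relationship, Poisson data correspond to a = 1 and b = 1/2, since V(Y) = m.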
Transformation to Stabilize Variance (Approximately)
• $V(Y) = s^2 = W(m)$. Then let: $f(m) = \int \frac{1}{[W(m)]^{1/2}} \, dm \Rightarrow V[f(Y)] \approx \text{constant}$
This results from a Taylor Series expansion:
$f(Y) \approx f(m) + (Y - m) f'(m)$
$\Rightarrow [f(Y) - f(m)]^2 \approx (Y - m)^2 [f'(m)]^2$
$\Rightarrow V(f(Y)) \approx W(m) \left[ \frac{1}{(W(m))^{1/2}} \right]^2 = 1$
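The approximation V(f(Y)) ≈ 1 can be checked numerically. This is a sketch under stated assumptions rather than anything from the slides: it uses gamma-distributed data with a fixed shape, for which the standard deviation is proportional to the mean (W(m) = a^2 m^2 with a = 1/sqrt(shape)), so that integrating 1/(a m) gives f(y) = ln(y)/a. The function name, shape value, sample size, and seeds are hypothetical choices.

```python
import math
import random
import statistics

def variance_after_transform(mean, shape=25.0, n=20000, seed=7):
    """Simulate gamma data with sd = a * mean, where a = 1/sqrt(shape).

    Returns (variance of Y, variance of f(Y)) with f(y) = ln(y) / a.
    """
    rng = random.Random(seed)
    a = 1.0 / math.sqrt(shape)
    scale = mean / shape  # gamma mean = shape * scale
    y = [rng.gammavariate(shape, scale) for _ in range(n)]
    f_y = [math.log(v) / a for v in y]
    return statistics.variance(y), statistics.variance(f_y)

# The raw variance grows with the mean; the transformed variance stays near 1.
for i, m in enumerate((10.0, 100.0, 1000.0)):
    raw, stabilized = variance_after_transform(m, seed=i)
    print(f"m = {m:6.0f}: V(Y) = {raw:10.2f}, V(f(Y)) = {stabilized:.3f}")
```

Because the Taylor expansion is only first order, the transformed variance is close to 1 rather than exactly 1.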
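Special Case: $s^2 = a^2 m^{2b}$
Case 1: $b \neq 1$: $f(m) = \int \frac{1}{[W(m)]^{1/2}} \, dm = \int \frac{1}{a m^b} \, dm = \frac{1}{a} \cdot \frac{m^{1-b}}{1-b} = c \, m^{1-b}$
Case 2: $b = 1$: $f(m) = \int \frac{1}{[W(m)]^{1/2}} \, dm = \int \frac{1}{a m} \, dm = \frac{1}{a} \ln(m)$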
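The two cases can be collected into one small helper. This is an illustrative sketch: the function name `stabilizing_transform` is hypothetical, and it assumes a > 0 and positive data so that the power and logarithm are defined.

```python
import math

def stabilizing_transform(a, b):
    """Return f for the power relationship s = a * m**b (a > 0).

    b != 1: f(y) = c * y**(1 - b), with c = 1 / (a * (1 - b))
    b == 1: f(y) = ln(y) / a
    """
    if a <= 0:
        raise ValueError("a must be positive")
    if math.isclose(b, 1.0):
        return lambda y: math.log(y) / a
    c = 1.0 / (a * (1.0 - b))
    return lambda y: c * y ** (1.0 - b)

# b = 1/2 gives a square-root transformation, b = 1 a logarithm,
# and b = 2 a (negative) reciprocal.
sqrt_like = stabilizing_transform(1.0, 0.5)
log_like = stabilizing_transform(1.0, 1.0)
recip_like = stabilizing_transform(1.0, 2.0)
print(sqrt_like(9.0), log_like(math.e), recip_like(4.0))
```

In practice the constant c only rescales the result, so the transformations are usually applied as the square root, log, or reciprocal of Y without the constant.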
Estimating b From Sample Data
• For each group in an ANOVA (or similar X levels in Regression), obtain the sample mean and standard deviation
• Fit a simple linear regression, relating the log of the standard deviation to the log of the mean
• The regression coefficient of the log of the mean is an estimate of b
• For large n, one can instead fit a regression of the squared residuals on predictors expected to be related to the variance
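The first three steps can be sketched as follows. The function name `estimate_b` and the simulated groups are assumptions made for illustration: the groups are gamma samples whose standard deviation is proportional to the mean, so the true value is b = 1.

```python
import math
import random
import statistics

def estimate_b(groups):
    """Regress ln(sd) on ln(mean) across groups; return (intercept, slope)."""
    log_means = [math.log(statistics.fmean(g)) for g in groups]
    log_sds = [math.log(statistics.stdev(g)) for g in groups]
    x_bar = statistics.fmean(log_means)
    y_bar = statistics.fmean(log_sds)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(log_means, log_sds))
    sxx = sum((x - x_bar) ** 2 for x in log_means)
    slope = sxy / sxx  # estimate of b
    intercept = y_bar - slope * x_bar  # estimate of ln(a)
    return intercept, slope

# Hypothetical data: five groups with sd = mean / 4, so b = 1 and ln(a) = ln(1/4).
rng = random.Random(3)
groups = [[rng.gammavariate(16.0, m / 16.0) for _ in range(200)]
          for m in (5, 10, 20, 40, 80)]
intercept, slope = estimate_b(groups)
print(f"estimated ln(a) = {intercept:.3f}, estimated b = {slope:.3f}")
```

With few groups or small group sizes the slope estimate can be noisy, so it is usually rounded to a convenient value such as 1/2 or 1 before choosing a transformation.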
Example - Bovine Growth Hormone
[Figure: Bovine Growth Hormone Data. Horizontal axis: Mean (50 to 500); vertical axis ticks run from 0 to 70.]
Example - Bovine Growth Hormone
ln(mean)   ln(sd)
5.7807     3.6687
6.0684     4.0993
5.7900     3.6661
5.7621     3.9703
5.7838     3.8351
5.7930     3.7612
4.9972     3.1946
5.3799     3.4751
4.9416     2.9755
5.0239     3.4340
4.9904     3.0910
5.0239     3.0204
            Coefficients   Standard Error
Intercept   -1.0553        0.5373
ln(mean)     0.8396        0.0984
Estimated b = 0.84 ≈ 1, so a logarithmic transformation of the data should give approximately constant variance
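The reported coefficients can be reproduced from the twelve (ln(mean), ln(sd)) pairs listed in the example. This is a minimal sketch using ordinary least squares with the standard library; the variable and function names are illustrative, and the results should agree with the reported intercept and slope up to rounding.

```python
import statistics

# (ln(mean), ln(sd)) pairs from the Bovine Growth Hormone example.
ln_mean = [5.7807, 6.0684, 5.7900, 5.7621, 5.7838, 5.7930,
           4.9972, 5.3799, 4.9416, 5.0239, 4.9904, 5.0239]
ln_sd = [3.6687, 4.0993, 3.6661, 3.9703, 3.8351, 3.7612,
         3.1946, 3.4751, 2.9755, 3.4340, 3.0910, 3.0204]

def simple_ols(x, y):
    """Least-squares fit of y on x; returns (intercept, slope)."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

b0, b1 = simple_ols(ln_mean, ln_sd)
print(f"intercept = {b0:.4f}, slope (estimated b) = {b1:.4f}")
```

The slope is within about two standard errors (0.0984) of 1, which is consistent with treating b as 1 and using the log transformation.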