VARIANCE-BASED SENSITIVITY ANALYSIS IN
UNBALANCED FACTORIAL DESIGNS
O. Eriksson*
Linköping University, Sweden
oleri@mai.liu.se
Variance-based sensitivity analysis has been extensively discussed for models in which the inputs can be
regarded as independent, uniformly distributed random variables. In this case, there are generally recognized
methods for attributing the variance to main effects, higher order terms and random variation. Here, we will
address the case of nominal inputs for an unbalanced, possibly incomplete, design for a model without
interaction.
The correlation ratio
$$\frac{\operatorname{Var}(E[Y \mid x])}{\operatorname{Var}(E[Y \mid x]) + E[\operatorname{Var}(Y \mid x)]}$$
is a widely used measure of variance-based sensitivity. We
examine three estimators of this ratio in a simple case—a balanced one-factor design. The three approaches are:
(i) to estimate each of the three components of the correlation ratio, without bias, from ANOVA mean squares,
(ii) to estimate the variation between factor levels by use of margin means, and (iii) to estimate the correlation
ratio by use of the coefficient of multiple determination. We discuss the bias, if any, in the components and
correlation ratios obtained with the three approaches.
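
The three approaches can be illustrated with a minimal Python sketch for a balanced one-factor design with k levels and n replicates per level. The function name, the return keys and the exact form of approach (ii) are assumptions made for illustration: here (ii) is read as taking the plain variance of the level means as the between-level component and the within-level mean square as the random component. The abstract itself does not give these formulas.

```python
import numpy as np

def correlation_ratio_estimates(y):
    """Three estimates of the correlation ratio for a balanced one-factor design.

    y has shape (k, n): k factor levels, n replicates per level.
    (Hypothetical helper; names and layout are illustrative assumptions.)
    """
    y = np.asarray(y, dtype=float)
    k, n = y.shape
    level_means = y.mean(axis=1)
    grand_mean = y.mean()

    ssb = n * np.sum((level_means - grand_mean) ** 2)  # between-level sum of squares
    ssw = np.sum((y - level_means[:, None]) ** 2)      # within-level sum of squares
    msb = ssb / (k - 1)
    msw = ssw / (k * (n - 1))

    # (i) Solve the components from the expected mean squares.
    # Each component is unbiased; the between part can be negative if msb < msw.
    between_i = (k - 1) / (k * n) * (msb - msw)
    ratio_i = between_i / (between_i + msw)

    # (ii) Assumed reading: variance of the level (margin) means as the
    # between component. It overestimates by (k - 1) / (k * n) * sigma^2.
    between_ii = np.mean((level_means - grand_mean) ** 2)
    ratio_ii = between_ii / (between_ii + msw)

    # (iii) Coefficient of multiple determination, R^2 = SSB / SST.
    ratio_iii = ssb / (ssb + ssw)

    return {"anova": ratio_i, "margin_means": ratio_ii, "r_squared": ratio_iii}
```

Under this reading, the between-level component in (ii) is never smaller than the one in (i), because the two differ by the non-negative term (k - 1) / (k n) times the within-level mean square.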
When dealing with unbalanced designs, we assume that the imbalance reflects an intention to give the
factor levels different weights, based on logical considerations or a known true representation. Given that
assumption, we define theoretical measures of variation and combine them into correlation ratios in unbalanced
one-way designs and multi-way designs for which the number of observations in each cell is proportional to the
product of the margin numbers. We also show that the component-wise unbiased approach is easily extended
from the perfectly balanced case to the latter form of imbalance.
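
For an unbalanced one-way design, the weighting idea above can be sketched as follows. The sketch assumes that level i receives weight n_i / N, so that the theoretical between-level variation is the weighted variance of the level means around the weighted grand mean. This formalization and the function name are assumptions for illustration, not definitions taken from the abstract. The between-level estimate uses the standard result that the expected between-level sum of squares equals the sum of n_i times the squared level deviations plus (k - 1) times the error variance.

```python
import numpy as np

def weighted_correlation_ratio(groups):
    """Component-wise estimate of the correlation ratio, unbalanced one-way design.

    groups: list of 1-D sequences, one per factor level; sizes may differ.
    Level weights are assumed to be n_i / N (illustrative assumption).
    Returns (between component, within component, ratio).
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    sizes = np.array([g.size for g in groups])
    k, total = len(groups), sizes.sum()

    means = np.array([g.mean() for g in groups])
    grand = np.sum(sizes * means) / total  # weighted grand mean

    ssb = np.sum(sizes * (means - grand) ** 2)
    ssw = sum(np.sum((g - m) ** 2) for g, m in zip(groups, means))
    msb = ssb / (k - 1)
    msw = ssw / (total - k)  # requires more observations than levels

    # Unbiased for sum_i (n_i / N) * (mu_i - mu)^2 under the assumed weights.
    between = (k - 1) * (msb - msw) / total
    return between, msw, between / (between + msw)
```

With equal group sizes, N = k n and the between-level estimate reduces to the usual balanced expression, (k - 1) / (k n) times the difference between the two mean squares.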
In the case of a truly unbalanced two-factor design, we argue that there are only three sources of
variation: effects in the first factor, effects in the second factor, and random variation. The imbalance, however,
makes it difficult to separate these sources, as discussed by Saltelli et al. [1]. The theoretical measures of
variation must be redefined in order to achieve the desired properties of correlation ratios, i.e. ratios summing to
one and being non-negative. For the redefined measures, we prove that the technique of solving variation
components from ANOVA mean squares cannot be used, and at the same time we also show that the coefficient
of multiple determination cannot be used for estimating the correlation ratio. We suggest a method for
estimating variation components by replacing the theoretical factor level effects by their estimators and then
combining them into estimators of correlation ratios. This design and the suggested method do not require any
thorough discussion of sampling and integration techniques; the method uses only well-known ANOVA
calculus and simple calculations based on margin means or estimated effects. We discuss mainly two-way
designs, but the method can easily be extended to multi-way designs. It is not component-wise unbiased, but the
bias decreases with increasing sample size.
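
The plug-in idea described above can be sketched in Python for an additive two-factor model. The abstract does not state the redefined measures, so the sketch makes a clearly labelled assumption: each factor's variation is the variance of its estimated effects weighted by the observed margin proportions, and the random variation is the residual mean square. The function name and this choice of measures are hypothetical. The effects are estimated by ordinary least squares, which requires a connected design with more observations than parameters.

```python
import numpy as np

def plug_in_ratios(a, b, y):
    """Plug-in correlation ratios for an additive, unbalanced two-factor design.

    a, b: factor labels per observation; y: responses.
    Assumed measures (not taken from the abstract): margin-weighted variances
    of the estimated effects, plus the residual mean square.
    Returns (ratio for A, ratio for B, ratio for random variation).
    """
    y = np.asarray(y, dtype=float)
    a_levels, a_idx = np.unique(np.asarray(a), return_inverse=True)
    b_levels, b_idx = np.unique(np.asarray(b), return_inverse=True)
    n, ka, kb = y.size, a_levels.size, b_levels.size

    # Design matrix: intercept, then dummies for all but the first level
    # of each factor (no interaction columns).
    X = np.zeros((n, ka + kb - 1))
    X[:, 0] = 1.0
    for i in range(1, ka):
        X[a_idx == i, i] = 1.0
    for j in range(1, kb):
        X[b_idx == j, ka - 1 + j] = 1.0

    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    alpha = np.concatenate([[0.0], coef[1:ka]])  # effects relative to first level
    beta = np.concatenate([[0.0], coef[ka:]])

    resid = y - X @ coef
    mse = resid @ resid / (n - X.shape[1])

    # Margin proportions serve as the level weights (assumption).
    w_a = np.bincount(a_idx, minlength=ka) / n
    w_b = np.bincount(b_idx, minlength=kb) / n
    var_a = np.sum(w_a * (alpha - np.sum(w_a * alpha)) ** 2)
    var_b = np.sum(w_b * (beta - np.sum(w_b * beta)) ** 2)

    total = var_a + var_b + mse
    return var_a / total, var_b / total, mse / total
```

By construction the three ratios are non-negative and sum to one, which are the properties the text requires. The weighted variances do not depend on which level is used as the reference, since shifting all effects of a factor by a constant leaves them unchanged.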
The methods discussed have been successfully applied to data sets representing outputs from models of
road traffic emissions. We have studied a selection of pollutants for gasoline and diesel-powered private cars.
Emission data were generated by the COPERT III model [2] using a new computer program [3] that can quickly
compute emissions for each input value over a grid of input combinations. This program can read data from and
write data to text files for subsequent use in statistical software.
References
[1] Saltelli A, Tarantola S, Campolongo F, Ratto M: Sensitivity analysis in practice, 2004
[2] http://lat.eng.auth.gr/copert
[3] Eriksson O: Licentiate thesis, Linköping University, Linköping, 2004