VARIANCE-BASED SENSITIVITY ANALYSIS IN UNBALANCED FACTORIAL DESIGNS

O. Eriksson*
Linköping University, Sweden
oleri@mai.liu.se

Variance-based sensitivity analysis has been extensively discussed for models in which the inputs can be regarded as independent, uniformly distributed random variables. In this case, there are generally recognized methods for attributing the variance to main effects, higher-order terms and random variation. Here, we address the case of nominal inputs for an unbalanced, possibly incomplete, design for a model without interaction.

The correlation ratio

$$\frac{\operatorname{Var}(E(Y \mid x))}{\operatorname{Var}(E(Y \mid x)) + E(\operatorname{Var}(Y \mid x))}$$

is a widely used measure of variance-based sensitivity. We examine three estimators of this ratio in a simple case, namely a balanced one-factor design. The three approaches are: (i) to estimate each of the three components of the correlation ratio, without bias, from ANOVA mean squares, (ii) to estimate the variation between factor levels by use of margin means, and (iii) to estimate the correlation ratio by use of the coefficient of multiple determination. We discuss the bias, if any, in the components and correlation ratios obtained with the three approaches.

When dealing with unbalanced designs, we assume that the imbalance reflects an intention to give the factor levels different weights, based on logical considerations or a known true representation. Given that assumption, we define theoretical measures of variation and combine them into correlation ratios for unbalanced one-way designs and for multi-way designs in which the number of observations in each cell is proportional to the product of the margin numbers. We also show that the component-wise unbiased approach is easily extended from the perfectly balanced case to the latter form of imbalance.

In the case of a truly unbalanced two-factor design, we argue that there are only three sources of variation: effects in the first factor, effects in the second factor, and random variation.
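As a rough illustration, and not as the implementation used in this work, the three estimators for the balanced one-factor design could be sketched in Python as follows. The sketch assumes fixed, equally weighted factor levels, so that Var(E(Y | x)) is the mean squared level effect. The function name `correlation_ratio_estimates`, the array layout, the choice to pair the margin-mean estimate with the within-level mean square, and the example data are all assumptions made for the illustration.

```python
import numpy as np


def correlation_ratio_estimates(y):
    """Three correlation-ratio estimates for a balanced one-factor design.

    y is a 2-D array of shape (a, n): a factor levels with n replicates each.
    Hypothetical helper written for illustration; it assumes a >= 2, n >= 2
    and data that are not all identical.
    """
    y = np.asarray(y, dtype=float)
    a, n = y.shape
    level_means = y.mean(axis=1)
    grand_mean = y.mean()

    # Standard one-way ANOVA sums of squares and mean squares.
    ss_between = n * np.sum((level_means - grand_mean) ** 2)
    ss_within = np.sum((y - level_means[:, None]) ** 2)
    ms_between = ss_between / (a - 1)
    ms_within = ss_within / (a * (n - 1))

    # (i) Component-wise unbiased: solve the components from the expected
    # mean squares E[MS_between] = sigma^2 + n * sum(alpha_i^2) / (a - 1)
    # and E[MS_within] = sigma^2.
    v_between_unbiased = (a - 1) / (a * n) * (ms_between - ms_within)
    eta_anova = v_between_unbiased / (v_between_unbiased + ms_within)

    # (ii) Margin means: plug the observed level means in for the effects.
    # This overestimates the between-level variation on average.
    v_between_means = np.mean((level_means - grand_mean) ** 2)
    eta_means = v_between_means / (v_between_means + ms_within)

    # (iii) Coefficient of multiple determination, SS_between / SS_total.
    r_squared = ss_between / (ss_between + ss_within)

    return {"anova": eta_anova, "margin_means": eta_means, "r_squared": r_squared}


# Made-up example data: 3 factor levels, 3 replicates each.
example = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
estimates = correlation_ratio_estimates(example)
for name, value in estimates.items():
    print(f"{name}: {value:.4f}")
```

Approach (i) can return a negative between-level component when the between-level mean square falls below the within-level mean square; the sketch leaves that case untreated.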
In such a truly unbalanced design, however, the imbalance makes it difficult to separate the factor effects and the random variation, as discussed by Saltelli et al. [1]. The theoretical measures of variation must be redefined in order to achieve the desired properties of correlation ratios, i.e. non-negative ratios that sum to one. For the redefined measures, we prove that the technique of solving for variation components from ANOVA mean squares cannot be used, and we also show that the coefficient of multiple determination cannot be used to estimate the correlation ratio. We suggest a method that estimates the variation components by replacing the theoretical factor level effects with their estimators and then combines them into estimators of correlation ratios.

This design and the suggested method do not require any thorough discussion of sampling and integration techniques; the method uses only well-known ANOVA calculations and simple computations based on margin means or estimated effects. We discuss mainly two-way designs, but the method can easily be extended to multi-way designs. It is not component-wise unbiased, but the bias decreases with increasing sample size.

The methods discussed have been successfully applied to data sets representing outputs from models of road traffic emissions. We have studied a selection of pollutants for gasoline- and diesel-powered private cars. Emission data were generated by the COPERT III model [2] using a new computer program [3] that can quickly compute emissions for each input value over a grid of input combinations. This program can read and write data as text files for subsequent use in statistical software.

References
[1] Saltelli A, Tarantola S, Campolongo F, Ratto M: Sensitivity analysis in practice, 2004
[2] http://lat.eng.auth.gr/copert
[3] Eriksson O: Licentiate thesis, Linköping University, Linköping, 2004