RisCenter

1.1 Subproject full title: Graphical Models for Decision Under Risk
1.2 Subproject acronym: GMDUR
1.3 RISKMAN research area: Area 1, Integrated approach to risk management
2.1 Proposing organization: Delft University of Technology
2.2 Contact person: Prof. Dr. R.M. Cooke
2.3 Address: Mekelweg 4, 2600GA Delft, The Netherlands
2.4 Tel: 31 15278 2548
2.5 Fax: 31 15278 7255
2.6 Email: r.m.cooke@its.tudelft.nl
2.7 Web site: http://ssor.twi.tudelft.nl/~risk/
2.8 Participating organisations:
    U. of Strathclyde, United Kingdom
    George Washington Univ., USA
    National Aeronautics Lab, The Netherlands
    HKV, The Netherlands

3. Proposal summary:
Graphical models are becoming the tool of choice for modeling decision problems involving uncertainty and risk. The most popular are Bayesian belief nets (bbns) (Smith, 1988; Lauritzen and Spiegelhalter, 1988; Lauritzen, 1996; Bedford and Cooke, 2001). By capturing “influence” relationships between uncertain variables, these models enable engineers and problem owners to specify their problem in a language that is easy to understand and communicate. Using available software packages, these models provide a user interface to drive underlying computational models. Bbns in particular allow the effects of different decisions to be captured in a perspicuous manner.
In spite of their growing popularity, the current generation of graphical models poses challenging problems: the assessment and computational burdens are very heavy. As normally used, these models require the specification of a great many conditional distributions, and require variables to be discretized drastically. Continuous variables can be used only in very special cases. This research applies a new graphical modeling tool, called regular vines, to model dependence in high dimensional continuous distributions. Regular vines allow influence to be captured as conditional rank correlation, and support sampling on the fly and updating with arbitrary continuous marginal distributions.
4. Objectives:
Regular vines provide a graphical modeling tool for representing the correlation structure of high dimensional distributions. Specifically, they specify sets of conditional or partial rank correlations which are algebraically independent and which are sufficient to determine the rank correlation matrix. The graphical structure also provides a sampling scheme allowing these distributions to be sampled on the fly. Recent research has shown how a distribution specified by a regular vine can be translated into a belief net or an independence graph. Using well-known recursive relations for partial correlations, regular vines can be transformed in such a way as to allow simple conditionalization and updating. The univariate marginal distributions need not be normal. The goal of this research is to develop regular vines into a practical modeling tool with a software implementation.

5. Deliverables:
The research will result in a software implementation. The uncertainty analysis package UNICORN has recently been supplied with a vine dependence module. This module will be extended so as to enable a bbn interface to drive the UNICORN Monte Carlo sampling engine. Users will specify a set of continuous univariate distributions and a bbn. The software will translate this specification into a regular vine for sampling and updating. UNICORN’s graphical output tools and report generator will then be available for analyzing the results.

6. Justification and potential impact:
Given the widespread use of bbns and the evident problems which users encounter, the potential impact of this research is great. Bbns have been applied at the TU Delft, for example, in analysing airport safety, underground construction, and emergency planning.
Enhancing the flexibility of modeling problems with continuous variables, together with tractable computational algorithms, would have an immediate impact in improving decision making whenever substantial uncertainties are present. The work will result in a continuous bbn front end user interface which drives the UNICORN engine, and as such will be immediately available to problem owners. However, the theoretical work can also be implemented by other uncertainty analysis and belief net codes, of which Europe has several. In this way this research contributes to competitiveness and employment in Europe. Finally, a significant improvement in the modeling of decision making under uncertainty will contribute in a straightforward way to health, safety and quality of life.

7. Description of the work:

Introduction
High dimensional probabilistic models are often formulated as Bayesian belief nets (bbns), that is, as directed acyclic graphs with nodes representing random variables and arcs representing “influence”. Bbns are conditionalized on incoming information to support probabilistic inference in real-time modelling, accident management and emergency planning. For continuous random variables, an adequate theory of bbns exists only for the joint normal distribution. In general, an arbitrary correlation matrix is not compatible with arbitrary marginals, and conditionalization is quite intractable. Transforming the marginal distributions to normals does not help, as the joint distribution is not thereby joint normal. Moreover, the joint normal cannot realize exactly a specified rank correlation matrix.
A continuous belief net can be represented as a regular vine, where an arc from node i to node j is associated with a (conditional) rank correlation between i and j. Using the elliptical copula and the partial correlation transformation properties, it is very easy to conditionalize the distribution on the value of any node, and hence to update the bbn.
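The statement that the joint normal cannot realize every rank correlation matrix can be checked numerically. The following is a minimal sketch and not part of UNICORN. It assumes Pearson's relation rho = 2 sin(pi r / 6) between the rank correlation r and the product moment correlation rho of a bivariate normal pair; the function names and the example matrix are hypothetical and chosen only for illustration. A positive definite candidate rank correlation matrix is converted entry by entry, and the resulting matrix is tested for positive definiteness.

```python
import math


def normal_pm_from_rank(r):
    """Product moment correlation of a bivariate normal pair with rank
    correlation r (Pearson's relation; an assumption of this sketch)."""
    return 2.0 * math.sin(math.pi * r / 6.0)


def is_positive_definite(m):
    """Try a Cholesky factorization; it succeeds only if m is positive definite."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True


def to_normal_correlations(rank_matrix):
    """Apply Pearson's relation to every off-diagonal entry."""
    n = len(rank_matrix)
    return [[1.0 if i == j else normal_pm_from_rank(rank_matrix[i][j])
             for j in range(n)] for i in range(n)]


# Hypothetical target: variable 1 is strongly rank correlated with 2 and 3,
# while 2 and 3 are uncorrelated.
rank_matrix = [[1.0, 0.7, 0.7],
               [0.7, 1.0, 0.0],
               [0.7, 0.0, 1.0]]
pm_matrix = to_normal_correlations(rank_matrix)

print(is_positive_definite(rank_matrix))  # True: the target itself is positive definite
print(is_positive_definite(pm_matrix))    # False: no joint normal has these rank correlations
```

In this example the product moment correlations required of a joint normal (about 0.717 in place of 0.7) no longer form a positive definite matrix, so the specified rank correlations cannot be reproduced exactly by transforming to normals.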
We wish to associate nodes in a bbn with continuous univariate random variables and to interpret “influence” in terms of correlation. This should ideally be done in a way that does not impose intractable constraints, and which supports conditionalization. Simply associating arcs with correlations is unsuitable for two reasons: (i) the compatibility of marginal distributions and a specified product moment correlation is not easily determined, and (ii) the correlation matrix must be positive definite.
One option is to represent influence as rank correlation. Any rank correlation in the interval [-1, 1] is compatible with arbitrary continuous invertible marginals. We could transform the variables to standard normal, induce a product moment correlation structure using well known methods, and transform back to the original variables. The rank correlation thus induced would not be exactly equal to the specified rank correlation, but would be close to it. Exact replication of a given rank correlation matrix could be obtained with this method if the joint normal could realize every rank correlation matrix. This is not the case; indeed the rank correlation matrices of joint normal distributions are very sparse in the set of correlation matrices. Of course, the problem of positive definiteness noted above would still be present. Vines offer a more promising solution.

Vines
A vine on N variables is a nested set of trees, where the edges of tree j are the nodes of tree j+1, j = 1,…,N-2, and each tree has the maximum number of edges (Cooke 1997; Bedford and Cooke 2001). A regular vine on N variables is a vine in which two edges in tree j are joined by an edge in tree j+1 only if these edges share a common node, j = 1,…,N-2. There are (N-1)+(N-2)+…+1 = N(N-1)/2 edges in a regular vine on N variables. Figure 1 shows a regular vine on 5 variables.
The four nested trees are distinguished by the line style of the edges; tree 1 has solid lines, tree 2 has dashed lines, etc. The conditioned (before |) and conditioning (after |) sets associated with each edge are determined as follows: the variables reachable from a given edge are called the constraint set of that edge. When two edges are joined by an edge of the next tree, the intersection of the respective constraint sets gives the conditioning variables, and the symmetric difference of the constraint sets gives the conditioned variables. The regularity condition ensures that the symmetric difference of the constraint sets always contains two variables. Note that each pair of variables occurs once as conditioned variables.
We recall two generic vines, the D-vine D(1,2,…,n) and the C-vine C(1,2,…,n), shown in Figures 1 and 2.

[Figure 1. Edges by tree: tree 1: 12, 23, 34, 45; tree 2: 13|2, 24|3, 35|4; tree 3: 14|23, 25|34; tree 4: 15|234]
Figure 1: The D-vine on 5 variables D(1,2,3,4,5) showing conditioned and conditioning sets.

[Figure 2. Edges by tree: tree 1: 12, 13, 14; tree 2: 23|1, 24|1; tree 3: 34|12]
Figure 2: The C-vine on 4 variables C(1,2,3,4) showing conditioned and conditioning sets.

Each edge in a regular vine may be associated with a constant conditional rank correlation (for tree j = 1 the conditions are vacuous) and, using minimum information copulae, a unique joint distribution satisfying the vine-copulae specification with minimum information can be constructed and sampled on the fly (Cooke 1997). Moreover, the (constant conditional) rank correlations may be chosen arbitrarily in the interval [-1,1].
The edges of a regular vine may also be associated with partial correlations, with values chosen arbitrarily in the interval (-1,1). Using the well known recursive formulae it can be shown that each such partial correlation regular vine uniquely determines the correlation matrix, and every full rank correlation matrix can be obtained in this way (Bedford and Cooke 2002).
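The passage from a partial correlation vine to a correlation matrix can be sketched for the C-vine of Figure 2. The code below is illustrative only and is not the UNICORN implementation. It assumes the standard recursion for partial correlations, solved for the lower order term: rho_{ij;L} = rho_{ij;Lk} * sqrt((1 - rho_{ik;L}^2)(1 - rho_{jk;L}^2)) + rho_{ik;L} * rho_{jk;L}. The function names, the zero-based indexing and the numerical edge values are hypothetical.

```python
import math


def cvine_to_correlation(partial):
    """Correlation matrix from a partial correlation C-vine.

    partial[k][j] (k < j, zero-based) is the partial correlation of
    variables k and j given variables 0..k-1, i.e. the label of an edge in
    tree k+1 of a C-vine with roots 0, 1, 2, ... in that order.
    Entries on or below the diagonal are ignored.
    """
    n = len(partial)
    corr = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            rho = partial[i][j]
            # Strip the conditioning variables i-1, ..., 0 one at a time.
            for k in range(i - 1, -1, -1):
                a, b = partial[k][i], partial[k][j]
                rho = rho * math.sqrt((1 - a * a) * (1 - b * b)) + a * b
            corr[i][j] = corr[j][i] = rho
    return corr


def partial_given_one(corr, i, j, k):
    """First order partial correlation of i and j given k (forward recursion)."""
    num = corr[i][j] - corr[i][k] * corr[j][k]
    den = math.sqrt((1 - corr[i][k] ** 2) * (1 - corr[j][k] ** 2))
    return num / den


# Hypothetical edge values for C(1,2,3,4); index 0 stands for variable 1.
#   tree 1: 12, 13, 14   tree 2: 23|1, 24|1   tree 3: 34|12
partial = [[0.0, 0.8, 0.5, 0.3],
           [0.0, 0.0, 0.4, -0.2],
           [0.0, 0.0, 0.0, 0.6],
           [0.0, 0.0, 0.0, 0.0]]
corr = cvine_to_correlation(partial)

for row in corr:
    print(["%.4f" % x for x in row])

# Round trip: the partial correlation 23|1 recovered from the matrix.
print(round(partial_given_one(corr, 1, 2, 0), 10))
```

Every edge value can be set freely in (-1,1) without checking consistency with the others, which is the practical advantage of the vine parametrization over specifying correlation matrix entries directly.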
A regular vine thus provides a bijective mapping from $(-1,1)^{N(N-1)/2}$ into the set of positive definite matrices with 1's on the diagonal. One verifies that $\rho_{ij}$ can be computed from the sub-vine generated by the constraint set of the edge whose conditioned set is {i, j} using the following recursive formula:

\[
\rho_{12;3,\dots,n} = \frac{\rho_{12;3,\dots,n-1} - \rho_{1n;3,\dots,n-1}\,\rho_{2n;3,\dots,n-1}}{\sqrt{\left(1-\rho_{1n;3,\dots,n-1}^{2}\right)\left(1-\rho_{2n;3,\dots,n-1}^{2}\right)}}
\]

When (X,Y) and (X,Z) are joined by the elliptical copula (Kurowicka et al. 2000) and the conditional copula (Y,Z|X) does not depend on X, then the conditional correlation of (Y,Z) given X does not depend on X, and the conditional product moment correlation of Y,Z given X is equal to the partial correlation (Kurowicka and Cooke 2001).

Figure 3: The scatter plot of the elliptical copula with correlation 0.8.

Moreover, the conditional distribution has a very compact functional form under the elliptical copula. Using the elliptical copula, the vine structure can be uniquely associated with a full rank correlation matrix and can be converted into an on-the-fly sampling routine.
For a regular vine-rank correlation specification with the elliptical copula, updating with information is very simple. Since partial and conditional product moment correlations are equal for elliptical copulae, the recursive formula allows us to convert any partial correlation vine into any other partial correlation vine. We convert the given vine to the C-vine whose root is the variable we observe, say variable 1. Since the conditional correlations do not depend on the value of variable 1, we drop the "1"s from all conditions and obtain a C-vine with variable 2 as root. We can convert this "updated" vine to any other regular vine, recalculate the conditional rank correlations, and sample the "updated" vine if desired.

Belief Nets
A finite valued belief net is a directed acyclic graph, together with an associated set of probability tables. The graph consists of nodes and arcs. The nodes represent variables, which can be discrete or continuous.
The arcs represent causal/influential or functional relationships between variables.

[Figure 4: two graphs, (a) and (b), each on nodes 1, 2 and 3]
Figure 4: A simple example of bbns.

The graph in Figure 4a tells us that variables 2 and 3 are conditionally independent given variable 1. The message of the graph in Figure 4b is that 2 and 3 are independent and the distribution of 1 given 2 and 3 is arbitrary. If variables 1, 2 and 3 in Figure 4b take values “True” or “False”, then two 2x2 probability tables must be specified (an example is shown in the tables below).

             1
  2          True    False
  True       0.1     0.5
  False      0.9     0.5

             1
  3          True    False
  True       0.6     0.7
  False      0.4     0.3

Even for such a simple example, determining the right probabilities in the probability tables requires some work (e.g. statistical data or experts’ opinions). For a large net with many dependences and nodes that can take more values this is extremely difficult, and often produces superficial quantification.
The main use of bbns is in situations that require learning. If we know which events have actually been observed, we might want to infer the probabilities of other events which have not yet been observed. Using Bayes' theorem it is then possible to update the values of all the other probabilities in the bbn. Updating bbns is very complex, involving arc reversal and addition, but with the algorithm proposed by Lauritzen and Spiegelhalter (1988) it is possible to perform fast updating in large bbns.
It has recently been shown (Kurowicka and Cooke 2002) that bbns can be translated into regular vines, where influence in the bbn is interpreted as conditional rank correlation. This enables a specification of belief nets with arbitrary continuous marginal distributions and arbitrary conditional rank correlations. Such a specification is always consistent. By translating to regular vines, the joint distribution may be efficiently sampled and updated. The following figure shows a simple bbn whose edges are associated with conditional rank correlations.
From the theory sketched above, it follows that these correlations uniquely determine the correlation matrix and are consistent with arbitrary continuous univariate marginals. Further, the regular vine representation leads to an efficient sampling algorithm.

[Figure 5. Nodes 1 to 5; arcs labelled with the rank correlations (1,2), (1,3), (2,4), (3,4|2), (3,5), (5,4|3)]
Figure 5: A Bayesian belief net with conditional rank correlations.

Anyone who has used bbns will appreciate how dramatically the use of conditional rank correlations and vines reduces the assessment and computational burden. The user need specify only the marginal distributions and the conditional rank correlations as indicated in Figure 5. Extensive experience with eliciting such correlations from experts has been accumulated.
The work involves:
RTD: developing the algorithms for bbn-vine conversions, and developing the software to effect a handshake between a graphical bbn and the UNICORN Monte Carlo engine.
DEM: in the last phase of this work, prototype applications involving problems supplied by GW University and HKV will be used to demonstrate this approach.

Previous work
The TU Delft has developed two commercial codes, EXCALIBUR and UNICORN. UNICORN is a generic uncertainty analysis program which incorporates the regular vine method for dependence modeling in high dimensional distributions. It has been sold to research institutions involved in risk modeling, including RIVM, ABS Consulting and GW University. A previous PhD project at the TU Delft (Kurowicka, 2001) contributed substantially to the theoretical developments underlying the present proposal. Information regarding these programs, as well as light versions, can be downloaded free of charge from the website http://ssor.twi.tudelft.nl/~risk/. An experienced programmer at the TU Delft is available to assist with the software implementation.

Milestones
The first year will focus on extending the theoretical developments. Algorithms for translating bbns into regular vines will be extended and optimized. Prof. Bedford (U.
Strathclyde) will be consulted during this phase. The second year will be devoted to building a graphical software front end to drive the UNICORN engine. In the third year, prototype case studies will be performed under the supervision of Prof. van Noortwijk and Dr. Kok (HKV) and Prof. Mazzuchi (GWU, TU Delft). The fourth year will be concerned with write-up and dissemination.

8. Partners involved
  University of Strathclyde       Prof. T.J. Bedford
  George Washington University    Prof. T.A. Mazzuchi
  HKV consulting                  Prof. J. van Noortwijk, dr. M. Kok

Bedford has participated in the theoretical developments underlying this proposal, and will help supervise the implementation of this work. Mazzuchi has extensive experience in real time risk modelling; he will contribute a practical viewpoint and will provide prototype applications for testing the software implementation. Similarly, Kok and Van Noortwijk have extensive experience in risk modelling in water management. They will also oversee prototype applications.

9. Resources
Total resources for this project are:

  Item                  Amount
  PhD position          180,000 euro
  Travel, PhD             7,000 euro
  Travel Bedford          8,000 euro
  Personal computer       2,000 euro
  Total                 197,000 euro

Travel for Mazzuchi is not anticipated, as he is employed at the TU Delft for 6 months per year.

10. Duration
The duration of the subproject is four years.

11. Financial Plan
Investment from the TU Delft and partner institutions in this project is:

  Item                                % committed per year
  Project Leader (Cooke, TUD)         0.1
  Assist. Prof. (Kurowicka) (TUD)     0.2
  Prof. Bedford (U. Strathclyde)      0.05
  Programmer (Kritchallo) (TUD)       0.3
  Prof. Mazzuchi (GWU)                0.1
  Prof. van Noortwijk (HKV)           0.05
  Total                               0.8

12. Other Issues: none

References
Bedford, T.J. and Cooke, R.M. (2001) Probabilistic Risk Analysis, Foundations and Methods, Cambridge University Press.
Bedford, T.J. and Cooke, R.M. (2002) Vines - a new graphical model for dependent random variables, Ann. of Statistics, vol. 30, no. 4, 1031-1068.
Bedford, T.J. and Cooke, R.M. (2001) Probability density decomposition for conditionally dependent random variables modelled by vines, Annals of Mathematics and Artificial Intelligence, 32, 245-268.
Cooke, R.M. (1997) Markov and entropy properties of tree and vines-dependent variables, Proceedings of the ASA Section of Bayesian Statistical Science.
Kurowicka, D. (2001) Techniques for Representing High Dimensional Distributions, PhD Thesis, Department of Information, Technology and Systems, T.U. Delft.
Kurowicka, D., Misiewicz, J. and Cooke, R.M. (2000) Elliptical copulae, Proc. of International Conference on Simulations - Monte Carlo, 201-214.
Kurowicka, D. and Cooke, R.M. (2001) Conditional, Partial and Rank Correlation for Elliptical Copula; Dependence Modeling in Uncertainty Analysis, Proc. of ESREL 2001.
Kurowicka, D. and Cooke, R.M. (2002) The vine copula method for representing high dimensional dependent distributions: Application to continuous belief nets, Proceedings of the 2002 Winter Simulation Conference, E. Yücesan, C.-H. Chen, J.L. Snowdon and J.M. Charnes, eds.
Lauritzen, S.L. (1996) Graphical Models, Clarendon Press, Oxford.
Lauritzen, S.L. and Spiegelhalter, D.J. (1988) Local computations with probabilities on graphical structures and their application to expert systems, Journal of the Royal Statistical Society, Series B, vol. 50, pp. 157-224.
Smith, J.Q. (1988) Statistical principles on graphs, in Proceedings Influence Diagrams for Decision Analysis, Inference and Prediction, Smith and Oliver, eds., Wiley, Chichester.