6. Probabilistic Inversion: Definitions and Theorems

Roger M. Cooke
Short Course on Expert Judgment
National Aerospace Institute
April 15-16, 2008

This note gives mathematical definitions of probabilistic inversion (PI) and provides references to the principal mathematical results. It is written for mathematicians. Probabilistic inversion may be introduced from either a measure theoretic or a random variable viewpoint. Each has its own appeal, so we present both, following [45, 51].

Measure theoretic approach

Let $(M, \mathcal{B}(M), \lambda)$ and $(N, \mathcal{B}(N), \nu)$ be two Borel probability spaces, where $M$ and $N$ are compact nonempty subsets of $\mathbb{R}^m$ and $\mathbb{R}^n$ respectively. Let $T : \mathbb{R}^m \to \mathbb{R}^n$ be a continuous mapping. The measure $\gamma = \lambda \circ T^{-1}$ is called the push forward of $\lambda$ under $T$, and similarly $\lambda$ is called the pull back of $\gamma$ under $T$. In the following, $\lambda$ plays the role of a background measure. A measure $\mu$ on $(M, \mathcal{B}(M))$ is called an inverse of $T$ at $\nu$ if $\mu$ is the pull back of $\nu$ under $T$; that is:

\[ \forall B \in \mathcal{B}(N): \quad \mu \circ T^{-1}(B) = \nu(B) \tag{1} \]

The problem of probabilistic inversion can then be formulated as:

Definition (Probabilistic inversion problem) Given $(M, \mathcal{B}(M), \lambda)$ and $(N, \mathcal{B}(N), \nu)$, with $T : \mathbb{R}^m \to \mathbb{R}^n$ continuous, find a measure $\mu$ on $(M, \mathcal{B}(M))$, absolutely continuous with respect to $\lambda$ ($\mu \ll \lambda$), such that $\mu$ is an inverse of $T$ at $\nu$.

Such problems may be infeasible, and if feasible may have many solutions. Under certain conditions a measure $\mu$ solving (1) can be found. If $\nu \ll \gamma$, the Radon-Nikodym derivative $g : N \to \mathbb{R}$ exists, is non-negative, is unique up to sets of $\gamma$-measure zero, and satisfies

\[ \forall B \in \mathcal{B}(N): \quad \nu(B) = \int_B g(y)\, d\gamma(y). \]

If in addition $g$ is continuous, $g(y) > 0$ for $y \in N$, and $\int_N g(y) \log g(y)\, d\gamma(y) < \infty$, then define $f^* := g \circ T$, and define $\mu^*$ as a new probability measure:

\[ \forall A \in \mathcal{B}(M),\ A = T^{-1}(B): \quad \mu^*(A) := \int_{T^{-1}(B)} f^*(x)\, d\lambda(x) \tag{2} \]

It is easy to check that $\mu^*$ is an inverse of $T$ at $\nu$:

\[ \mu^* \circ T^{-1}(B) = \int_{T^{-1}(B)} f^*(x)\, d\lambda(x) = \int_{T^{-1}(B)} g \circ T(x)\, d\lambda(x) = \int_B g(y)\, d\gamma(y) = \nu(B). \]
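The construction in (2) can be illustrated on a finite sample space, where integrals become sums and the Radon-Nikodym derivative is a ratio of point masses. The following is a minimal sketch under that assumption, not part of the original note; the function names `push_forward` and `min_information_inverse` and the toy data are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def push_forward(measure, T):
    """Return the push forward measure o T^{-1} as a dict on the image points."""
    out = defaultdict(Fraction)  # Fraction() is zero
    for x, mass in measure.items():
        out[T(x)] += mass
    return dict(out)


def min_information_inverse(lam, T, nu):
    """Discrete analogue of (2): mu*(x) = g(T(x)) * lam(x), with g = d(nu)/d(gamma)."""
    gamma = push_forward(lam, T)
    # nu << gamma is required: any point carrying nu-mass must carry gamma-mass.
    for y, mass in nu.items():
        if mass > 0 and gamma.get(y, 0) == 0:
            raise ValueError("nu is not absolutely continuous w.r.t. gamma")
    # g is only needed where gamma is positive.
    g = {y: nu.get(y, Fraction(0)) / gy for y, gy in gamma.items() if gy > 0}
    return {x: (g[T(x)] * mass if mass > 0 else Fraction(0))
            for x, mass in lam.items()}


# Hypothetical toy problem: M = {0,...,5} with uniform lambda, T(x) = x // 2.
lam = {x: Fraction(1, 6) for x in range(6)}
T = lambda x: x // 2
nu = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}

mu_star = min_information_inverse(lam, T, nu)
# mu* is an inverse of T at nu: its push forward under T reproduces nu.
assert push_forward(mu_star, T) == nu
```

Exact rational arithmetic is used here only so that the check of (1) holds with equality; floating point with a tolerance would serve equally well.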
It can be shown that $\mu^*$ is the unique measure that satisfies equation (1) and minimizes the relative information with respect to $\lambda$ in the class of measures satisfying equation (1) [45].

Random variable approach

Let $G : \mathbb{R}^n \to \mathbb{R}^m$. If $G(X) \in \{Y \mid Y \in C\}$, where $C$ is a subset of random vectors on $\mathbb{R}^m$, then $X$ is called a probabilistic inverse of $G$ at $C$. $X$ is sometimes termed the input to model $G$, and $Y$ the output. If the problem is feasible it may have many solutions, and we require a preferred solution; if it is infeasible we seek a random vector $X$ for which $G(X)$ is "as close as possible" to $C$.

Typical Application

A typical application takes the following form. We wish to model the transport of pollution through an ecosystem as a homogeneous set of first order differential equations [9, 25],

\[ \frac{dA_i(t)}{dt} = \sum_{j=1}^{m} x_j A_j(t), \quad i = 1, \ldots, m \tag{3} \]

where the coefficients $x_j$ are unknown. We wish to capture our uncertainty in the functions $A_i(t)$ by treating $X = (X_1, \ldots, X_m)$ as a random vector. Direct measurements are not possible. We can obtain data in the form of quantiles of the quantities $A_i(t_k)$ for selected values of $i = 1, \ldots, r$ and at some time points $t_k$, $k = 1, \ldots, s$. These become the components of a random vector $Y = (Y_1, \ldots, Y_n)$, where $n = r \times s$. Solving (3), we can express $Y$ as a function of $X$. We now seek a distribution for $X$ that complies with the quantile constraints on $Y$. There may also be physical constraints on the possible values of $X$ which should be taken into account. One feature of such problems may be surmised at once: there may be no solution. In such cases we must quantify and attempt to reduce the infeasibility.

Numerical Algorithms

Until very recently, applications of PI involved sophisticated optimization routines. Duality theory was used extensively to try to reduce infeasibility, and methods for dealing with infeasibility were crude. Recently, simple iterative methods have been discovered which at once render PI easier and mathematically well founded.
We believe that these techniques can now move into a wide field of application. The idea is based on sample re-weighting using the Iterative Proportional Fitting algorithm. A variation on this algorithm exhibits excellent convergence behavior, even in case of infeasibility. These developments are sketched below. We first define the simplex, the strict simplex, margins and the relative information:

Definition (Simplex, strict simplex, margins, relative information)

\[ S^K = \{P \in \mathbb{R}^K \mid p_k \ge 0,\ \textstyle\sum_{k=1}^{K} p_k = 1\} \]
\[ S^{*K} = \{P \in \mathbb{R}^K \mid p_k > 0,\ \textstyle\sum_{k=1}^{K} p_k = 1\} \]

For $P \in S^{K \times K}$,

\[ P_{\cdot j} = \sum_{i=1}^{K} p_{ij}, \qquad P_{i \cdot} = \sum_{j=1}^{K} p_{ij}. \]

For $P, Q \in S^K$ such that $p_i = 0 \Rightarrow q_i = 0$, $i = 1, \ldots, K$, the relative information of $Q$ with respect to $P$ is:

\[ I(Q \mid P) = \sum_{i=1}^{K} q_i \ln(q_i / p_i). \]

It is not difficult to show that:

Proposition 1 [16] Let $P^m \in S^K$, $m = 1, \ldots, M$. Then

\[ \operatorname{argmin}_{P \in S^K} \sum_{m=1}^{M} I(P^m \mid P) = \frac{1}{M} \sum_{m=1}^{M} P^m. \]

Proposition 2 [19] [5] For $a, b \in S^{*K}$, let $\mathcal{Q}^{(1)} = \{Q \in S^{K \times K} \mid Q_{i \cdot} = a_i\}$ and $\mathcal{Q}^{(2)} = \{Q \in S^{K \times K} \mid Q_{\cdot j} = b_j\}$, and let $P \in S^{K \times K}$; then

\[ \Big(\operatorname{argmin}_{Q \in \mathcal{Q}^{(1)}} I(Q \mid P)\Big)_{ij} = a_i \frac{P_{ij}}{P_{i \cdot}}, \qquad \Big(\operatorname{argmin}_{Q \in \mathcal{Q}^{(2)}} I(Q \mid P)\Big)_{ij} = b_j \frac{P_{ij}}{P_{\cdot j}}. \]

Propositions 1 and 2 assert that the argmins exist and are unique. The argmins in Proposition 2 are absolutely continuous with respect to $P$. The main results for the iterative numerical algorithms are sketched for the case of two marginal distributions (Proposition 2) to simplify notation, but can be readily generalized.

Definition (Iterative Proportional Fitting IPF, PARFUM) With the notation of Proposition 2, $P^0, P^1, P^2, \ldots$ is a sequence generated from $P = P^0$ and $a, b \in S^{*K}$ by Iterative Proportional Fitting if

\[ P^k_{ij} = a_i \frac{P^{k-1}_{ij}}{P^{k-1}_{i \cdot}} \quad (k \text{ odd}), \qquad P^k_{ij} = b_j \frac{P^{k-1}_{ij}}{P^{k-1}_{\cdot j}} \quad (k \text{ even}). \]

$P^0, P^1, P^2, \ldots$ is a sequence generated from $P = P^0$ and $a, b \in S^{*K}$ by PARFUM if

\[ P^k_{ij} = \frac{1}{2} \left( a_i \frac{P^{k-1}_{ij}}{P^{k-1}_{i \cdot}} + b_j \frac{P^{k-1}_{ij}}{P^{k-1}_{\cdot j}} \right). \]

IPF was introduced by Kruithof [47] and re-discovered independently by Deming and Stephan [23] and many others. It has been studied extensively [7, 11, 17, 23, 31, 32, 39, 47, 49, 50, 69, 71].
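The IPF and PARFUM updates defined above can be sketched in a few lines for the two-margin case. This is a minimal illustration rather than a reference implementation: the function names, the fixed iteration count, and the starting matrix `P0` with margins `a` and `b` are all assumptions chosen for the example, and the starting matrix is strictly positive so that the problem is feasible.

```python
import math


def row_margins(P):
    return [sum(row) for row in P]


def col_margins(P):
    return [sum(col) for col in zip(*P)]


def fit_rows(P, a):
    """Minimum information projection onto the row constraint: a_i * P_ij / P_i."""
    r = row_margins(P)
    return [[a[i] * p / r[i] if r[i] > 0 else 0.0 for p in row]
            for i, row in enumerate(P)]


def fit_cols(P, b):
    """Minimum information projection onto the column constraint: b_j * P_ij / P_j."""
    c = col_margins(P)
    return [[b[j] * p / c[j] if c[j] > 0 else 0.0 for j, p in enumerate(row)]
            for row in P]


def ipf(P, a, b, iterations=500):
    """Alternate the two projections: rows on odd steps, columns on even steps."""
    for k in range(1, iterations + 1):
        P = fit_rows(P, a) if k % 2 == 1 else fit_cols(P, b)
    return P


def parfum(P, a, b, iterations=500):
    """Average the two projections at every step."""
    for _ in range(iterations):
        R, C = fit_rows(P, a), fit_cols(P, b)
        P = [[0.5 * (x + y) for x, y in zip(r_row, c_row)]
             for r_row, c_row in zip(R, C)]
    return P


def relative_information(Q, P):
    """I(Q | P) for matrices; assumes Q is absolutely continuous w.r.t. P."""
    return sum(q * math.log(q / p)
               for q_row, p_row in zip(Q, P)
               for q, p in zip(q_row, p_row) if q > 0)


# Hypothetical feasible example: strictly positive start, target margins a, b.
P0 = [[0.20, 0.10, 0.10],
      [0.10, 0.10, 0.10],
      [0.05, 0.05, 0.20]]
a = [0.5, 0.3, 0.2]
b = [0.2, 0.3, 0.5]

P_ipf = ipf(P0, a, b)
P_par = parfum(P0, a, b)
print("IPF row margins:   ", row_margins(P_ipf))
print("PARFUM row margins:", row_margins(P_par))
print("I(IPF | P0)    =", relative_information(P_ipf, P0))
print("I(PARFUM | P0) =", relative_information(P_par, P0))
```

In this feasible example both results should match the target margins to within rounding, and the relative information of the IPF result with respect to `P0` should not exceed that of the PARFUM result. A fixed iteration count keeps the sketch short; a practical version would more likely stop when the margins are within a chosen tolerance.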
The fundamental result was proved by Csiszar [19, 20, 21], though a more general result was proved by Bregman and published in Russian [10, 13]. Specialized to the case of Proposition 2, Csiszar's result is:

Theorem 1: The sequence $P^0, P^1, P^2, \ldots$ generated from $P = P^0$ and $a, b \in S^{*K}$ by IPF converges to $P^* \in S^{K \times K}$ if and only if there exists $R \in S^{K \times K}$, $R \ll P$, such that $R \in \mathcal{Q}^{(1)} \cap \mathcal{Q}^{(2)}$. In this case

\[ P^* = \operatorname{argmin}_{Q \in \mathcal{Q}^{(1)} \cap \mathcal{Q}^{(2)}} I(Q \mid P). \]

In other words, if the problem is feasible, IPF converges to the unique minimum information solution, relative to the starting measure $P$ (uniqueness of the solution follows trivially from the convexity of $\mathcal{Q}^{(1)} \cap \mathcal{Q}^{(2)}$). The solution $P^*$ is a measure satisfying the marginal constraints given by $a, b \in S^{*K}$. In the case of two marginal constraints, Csiszar and Tusnady [20] proved that the even and odd subsequences of IPF iterates converge; hence if the sequence does not converge, then it oscillates asymptotically between the limits of the even and odd subsequences. In higher dimensions nothing is known.

For PARFUM the main results are [24, 59]:

Theorem 2: If the sequence $P^0, P^1, P^2, \ldots$ is generated from $P = P^0$ and $a, b \in S^{*K}$ by PARFUM, then $P^k$ converges, the sequence

\[ J_k = I\!\left( \Big( a_i \frac{P^{k-1}_{ij}}{P^{k-1}_{i \cdot}} \Big)_{ij} \,\Big|\, P^k \right) + I\!\left( \Big( b_j \frac{P^{k-1}_{ij}}{P^{k-1}_{\cdot j}} \Big)_{ij} \,\Big|\, P^k \right) \]

is monotonically decreasing, and $I(P^{k+1} \mid P^k) \to 0$ monotonically. If there exists $R \in S^{K \times K}$, $R \ll P$, such that $R \in \mathcal{Q}^{(1)} \cap \mathcal{Q}^{(2)}$, then any $R \ll P$ satisfying

\[ R_{ij} = \frac{1}{2} \left( a_i \frac{R_{ij}}{R_{i \cdot}} + b_j \frac{R_{ij}}{R_{\cdot j}} \right) \]

belongs to $\mathcal{Q}^{(1)} \cap \mathcal{Q}^{(2)}$.

The convergence of $P^k$ in Theorem 2 was proved by Matus [59] but is still unpublished. Similar convergence results hold when the arithmetic average in PARFUM is replaced by more general averages. If the problem is feasible, the IPF and PARFUM solutions are not identical, but are close [80]. The term $J_k$ is known to physicists as the Jensen-Shannon divergence [57].

Many questions remain open at this point, including:
What is the behavior of IPF if the problem is infeasible?
What is the relation between IPF and PARFUM solutions when they exist?
How does PARFUM depend on the starting distribution?
Are there other iterative algorithms with good convergence behavior?

References

1. Anderson, S.P., A. de Palma and J-F Thissen, 1996 Discrete Choice Theory of Product Differentiation, MIT Press, Cambridge.
2. Bart, D. 2006. Integrating local ecological knowledge and manipulative experiments to find the causes of environmental change. Frontiers in Ecology and the Environment 4: 541-546.
3. Beggs, S., Cardell, S. and Hausman, J. 1981 Assessing the Potential Demand for Electric Cars, J. Econometrics 16: 1-19.
4. Ben-Akiva, M. and S. R. Lerman, 1985. Discrete Choice Analysis: Theory and Application to Travel Demand. The MIT Press, Cambridge, MA.
5. Ben-Akiva, M., T. Morikawa, and F. Shiroishi, 1991 Analysis of the Reliability of Preference Ranking Data, Journal of Business Research 23(3): 253-68.
6. Berkson, J. 1944, Application of the Logistic Function to Bioassay, Journal of the American Statistical Association 39: 357-365.
7. Bishop Y.M.M. 1967 Multidimensional contingency tables: cell estimates. Ph.D. dissertation, Harvard Univ.
8. Bradley, R. 1953 “Some statistical methods in taste testing and quality evaluation” Biometrica, vol. 9, 22-38.
9. Bradley, R. and Terry, M. 1952 “Rank analysis of incomplete block designs” Biometrica, vol. 39, 324-345.
10. Bregman, L.M. 1967 The relaxation method to find the common point of convex sets and its applications to the solution of problems in convex programming, USSR Computational Mathematics and Mathematical Physics, 7, 200-217.
11. Brown, D.T. 1959 A note on approximations to discrete probability distributions. Inform. and Control, 2:386-392.
12. Brownstone, D. and Train, K. 1999 Forecasting New Product Penetration with Flexible Substitution Patterns, J. Econometrics, Vol. 89: 109-129.
13. Censor, Y. and Lent, A. 1981. An iterative row-action method for interval convex programming, Journal of Optimization Theory and Applications, vol. 34, No. 3.
14. Cooke R.M. and Misiewicz, J. 2007 Discrete Choice with Probabilistic Inversion: Application to energy policy choice and wiring failure, Mathematical Methods in Reliability.
15. Cooke R.M. and Goossens L.H.J. 2000 Procedures guide for structured expert judgement in accident consequence modelling. Radiation Protection Dosimetry, 90(3):303-309.
16. Cooke R.M. 1994 Parameter Fitting for uncertain models: modelling uncertainty in small models. Reliability Engineering and System Safety, 44:89-102.
17. Cooke, R.M., Nauta, M., Havelaar, A.H. and van der Fels, H.J. 2006 "Probabilistic inversion for chicken processing lines" Reliability Engineering and System Safety, 91, pp 1364-1372.
18. Covich, A.P., M.C. Austen, F. Barlocher, E. Chauvet, B.J. Cardinale, C.L. Biles, P. Inchausti, O. Dangles, M. Solan, M.O. Gessner, B. Statzner, and B. Moss. 2004. The role of Biodiversity in the functioning of freshwater and marine benthic ecosystems. Bioscience 54: 767-775.
19. Csiszar I. 1975 I-divergence geometry of probability distributions and minimization problems. Ann. of Probab., 3:146-158.
20. Csiszar, I. and G. Tusnady, 1984 Information geometry and alternating minimization procedures. Statistics & Decisions, 1:205-237.
21. Csiszar, I. (undated) Information Theoretic Methods in Probability and Statistics.
22. David, H.A. 1957 The Method of Paired Comparisons, Charles Griffin, London, 1963.
23. Deming, W.E., and Stephan, F.F. 1944. On a least squares adjustment to sample frequency tables when the expected marginal totals are known, Ann Math. Statist. 40, 11, 427-44.
24. Du, C., Kurowicka D. and Cooke R.M. 2006 Techniques for generic probabilistic inversion, Comp. Stat. & Data Analysis (50), 1164-1187.
25. Van der Fels-Klerx, H.J., Cooke, R.M., Nauta, M.J., Goossens, L.H.J., Havelaar, A.H. 2005 "A Structured Expert Judgement Study For A Model of Campylobacter Transmission During Broiler Chicken Processing" Risk Analysis 25 No. 1, pp 109-124.
26. Fienberg S.E., 1970 An iterative procedure for estimation in contingency tables. Ann. of Math. Statist., 41:907-917.
27. Fraser, D.J., T. Coon, M.R. Prince, R. Dion, and L. Bernatchez. 2006. Integrating traditional and evolutionary knowledge in biodiversity conservation: a population level case study. Ecology and Society 11: 4.
28. Fraser, E.D.G., A.J. Dougill, W.E. Mabee, M. Reed, and P. McAlpine. 2006. Bottom up and top down: Analysis of participatory processes for sustainability indicator identification as a pathway to community empowerment and sustainable environmental management. Journal of Environmental Management 78: 114-127.
29. Gilchrist, G., M. Mallory, and F. Merkel. 2005. Can local ecological knowledge contribute to wildlife management? Case studies of migratory birds. Ecology and Society 10: 20.
30. Girardin V. and Limnios N., 2001 Probabilités en vue des applications. Vuibert, Paris.
31. Haberman S.J., 1974 An Analysis of Frequency Data. Univ. Chicago Press.
32. Haberman S.J., 1984 Adjustment by minimum discriminant information. Ann. of Statist., 12:971-988.
33. Halpern, B.S., K.A. Selkoe, F. Micheli, and C.V. Cappel. In press. Evaluating and ranking global and regional threats to marine ecosystems. Conservation Biology.
34. Harper, F., Goossens, L.H.J., Cooke, R.M., Hora, S., Young, M., Pasler-Ssauer, J., Miller, L., Kraan, B.C.P., Lui, C., McKay, M., Helton, J. and Jones A., 1994 Joint USNRC CEC consequence uncertainty study: Summary of objectives, approach, application, and results for the dispersion and deposition uncertainty assessment. Technical Report VOL. III, NUREG/CR-6244, EUR 15755 EN, SAND94-1453.
35. Hausman, J. A., and D. A. Wise, 1978 A Conditional Probit Model for Qualitative Choice: Discrete Decisions Recognizing Interdependence and Heterogeneous Preferences, Econometrica, Vol. 46, No. 2, pp. 403-426.
36. Hausman, J. A., and P.A. Ruud, 1987 Specifying and Testing Econometric Models for Rank-Ordered Data, J. Econometrics 34: 83-104.
37. Hector, A. and R. Bagchi. 2007. Biodiversity and ecosystem multifunctionality. Nature 448: 188-191.
38. Hiddink, J.G., S. Jennings, and M.J. Kaiser. 2007. Assessing and predicting the relative ecological impacts of disturbance on habitats with different sensitivities. Journal of Applied Ecology 44: 405-413.
39. Ireland C.T. and Kullback S. 1968 Contingency tables with given marginals. Biometrika, 55:179-188.
40. Keough, H.L. and D.J. Blahna. 2006. Achieving integrative, collaborative ecosystem management. Conservation Biology 20: 1373-1382.
41. Kind, P. 1996 "Deriving cardinal scales from ordinal preference data: the analysis of time trade-off data using pair-wise judgement models", Paper presented to HESG, Brunel University.
42. Koop, G. and Poirier, D.J. 1994 "Rank-ordered logit models: an empirical analysis of Ontario voter preferences" Journal of Applied Econometrics 9, 369-388.
43. Kraan B.C.P. and Cooke R.M. 2000 Processing expert judgements in accident consequence modeling. Radiation Protection Dosimetry, 90(3):311-315.
44. Kraan B.C.P. and Cooke R.M. 2000 Uncertainty in compartmental models for hazardous materials - a case study. J. of Hazardous Materials, 71:253-268.
45. Kraan, B.C.P. and Bedford, T.J. 2005 Probabilistic inversion of expert judgements in the quantification of model uncertainty. Management Science, 51(6):995-1006.
46. Kraan, B.C.P. 2002 Probabilistic Inversion in Uncertainty Analysis and related topics. PhD dissertation, TU Delft, Dept. Mathematics.
47. Kruithof J. 1937 Telefoonverkeersrekening. De Ingenieur, 52(8):E15-E25.
48. Kullback S. 1959 Information theory and Statistics. John Wiley and Sons, New York.
49. Kullback S. 1968 Probability densities with given marginals. The Annals of Mathematical Statistics, 39(4):1236-1243.
50. Kullback S. 1971 Marginal homogeneity of multidimensional contingency tables. The Annals of Mathematical Statistics, 42(2):594-606.
51. Kurowicka D. and Cooke R.M. 2006. Uncertainty Analysis with High Dimensional Dependence Modelling. Wiley.
52. Lancaster, Kelvin J. 1966. "A New Approach to Consumer Theory." Journal of Political Economy, 132-157.
53. Latour, R.J., M.J. Brush, and C.F. Bonzek. 2003. Toward ecosystem-based fisheries management: Strategies for multispecies modeling and associated data requirements. Fisheries 28: 10-22.
54. Luce R. D. and P. Suppes, 1965, Preference, Utility, and Subjective Probability. In Handbook of Mathematical Psychology, vol. 3, ed. R. D. Luce, R. Bush, and E. Calanter. New York, Wiley.
55. Luce, R. D. 1959, Individual Choice Behavior: A Theoretical Analysis. New York, Wiley.
56. M.E.A. (Millennium Ecosystem Assessment). 2005. Ecosystems and Human Well-Being: Synthesis Report. Washington, D.C.
57. Majtey, A.P., Lamberti, P.W., and Prato, D.P. 2005 Jensen-Shannon divergence as a measure of distinguishability between mixed quantum states, Phys. Rev. A, 72, 052310-1 to 052310-6.
58. Marschak, J. 1960, Binary-Choice Constraints and Random Utility Indicators, in Mathematical Methods in the Social Sciences, ed. K. Arrow, S. Karlin, and P. Suppes. Stanford, Stanford University Press.
59. Matus, F. 2007 On iterated averages of I-projections, Statistiek und Informatik, Universität Bielefeld, Bielefeld, Germany, matus@utia.cas.cz.
60. May, K.D. 1952 'A set of necessary and sufficient conditions for simple majority decisions' Econometrica 20, 680-684.
61. McCabe, C., Brazier, J., Gilks, P., Tauchiya, A., Roberts, J., O'Hagan, A. and Stevens, K. 2004 "Estimating population cardinal health state valuation models from ordinal (rank) health state preference data" Sheffield Health Economics Group, Discussion Paper Series, Ref. 04/2.
62. McFadden D. and K. Train, 2000 Mixed MNL Models for Discrete Response, J. Applied Econometrics 15: 447-470.
63. McFadden, D. 1974, ‘Conditional logit analysis of qualitative choice behavior’, in P. Zarembka, ed., Frontiers in Econometrics, Academic Press, New York, pp. 105-142.
64. McFadden, D. 1987, ‘Regression-based specification tests for the multinomial logit model’, Journal of Econometrics 34, 63-82.
65. McFadden, D., 1981 Econometric Models of Probabilistic Choice, in “Structural Analysis of Discrete Data with Econometric Applications” (C.F. Manski and D. McFadden, Eds.), Cambridge University Press.
66. Mosteller, F. 1952 “Remarks on the method of paired comparisons: the least squares solution assuming equal standard deviations and equal correlations” Psychometrica, vol. 16, no. 1, 3-9.
67. Plaganyi, E.E. and D.S. Butterworth. 2004. A critical look at the potential of ecopath with ECOSIM to assist in practical fisheries management. African Journal of Marine Science 26: 261-287.
68. R.G. Chapman and R. Staelin, 1982 Exploiting Rank Ordered Choice Set Data within the Stochastic Utility Model, J. Marketing Res. 19: 288-301.
69. Rüschendorf L. 1995 Convergence of the iterative proportional fitting procedure. The Annals of Statistics, 23(4):1160-1174.
70. Revelt D. and Train, K. 1998 Mixed Logit with Repeated Choice: Households’ Choices of Appliance Efficiency Level, Review of Economics and Statistics 80: 647-657.
71. Salomon, J.A. 2004 "The use of ordinal ranks in health state valuations" IHEA Conference, USA, San Francisco.
72. Siikamäki, J. and D. F. Layton, 2007 Discrete Choice Survey Experiments: A Comparison Using Flexible Methods, Journal of Environmental Economics and Management, Vol. 53, pp. 127-139.
73. Stringer, L.C., A.J. Dougill, E. Fraser, K. Hubacek, C. Prell, and M.S. Reed. 2006. Unpacking "participation" in the adaptive management of social ecological systems: A critical review. Ecology and Society 11: 39.
74. Thurstone, L. 1927 “A law of comparative judgment” Psychol. Rev. vol. 34, 273-286.
75. Torgerson, W. 1958 Theory and Methods of Scaling, Wiley, New York.
76. Torrance, G.W., Feeny, D.H., Furlong, W.J., Barr, R.D., Xhang, Y., and Wang, Q. 1996 "A multi-attribute utility function for a comprehensive health status classification system: Health Utilities Mark 2" Medical Care 34 (7) 702-722.
77. Train K.E. 2003 “Discrete Choice Methods with Simulation” Cambridge University Press.
78. Train, K. 1998 Recreation Demand Models with Taste Differences over People, Land Economics 74: 230-39.
79. Train, K. 2001 A Comparison of Hierarchical Bayes and Maximum Simulated Likelihood for Mixed Logit, Department of Economics, University of California, Berkeley.
80. Vomlel, J. 1999 Methods of Probabilistic Knowledge Integration. PhD thesis, Czech Technical University, Faculty of Electrical Engineering.