6. Probabilistic Inversion
Definitions and Theorems
Roger M Cooke
Short Course on Expert Judgment
National Aerospace Institute
April 15-16, 2008
This note gives mathematical definitions of probabilistic inversion (PI) and provides references
to the principal mathematical results. It is written for mathematicians. Probabilistic inversion
may be introduced either from a measure theoretic or from a random variable viewpoint. Each
has its own appeal, so we present both, following [45, 51].
Measure theoretic approach
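Let $(M, \mathcal{B}(M), \lambda)$ and $(N, \mathcal{B}(N), \nu)$ be two Borel probability spaces, where $M$ and $N$ are compact nonempty subsets of $\mathbb{R}^m$ and $\mathbb{R}^n$ respectively. Let $T : \mathbb{R}^m \to \mathbb{R}^n$ be a continuous mapping. The measure $\gamma = \lambda \circ T^{-1}$ is
called the push forward of $\lambda$ under $T$, and similarly $\lambda$ is called the pull back of $\gamma$ under $T$. In the
following $\lambda$ plays the role of a background measure. A measure $\mu$ on $(M, \mathcal{B}(M))$ is called an inverse
of $T$ at $\nu$ if $\mu$ is the pull back of $\nu$ under $T$; that is:
$$\forall B \in \mathcal{B}(N): \quad \mu \circ T^{-1}(B) = \nu(B). \tag{1}$$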
The problem of probabilistic inversion can then be formulated as:
Definition (Probabilistic inversion problem) Given $(M, \mathcal{B}(M), \lambda)$ and $(N, \mathcal{B}(N), \nu)$, with $T : \mathbb{R}^m \to \mathbb{R}^n$ continuous, find a measure $\mu$ on $(M, \mathcal{B}(M))$, absolutely continuous with respect to $\lambda$ ($\mu \ll \lambda$),
such that $\mu$ is an inverse of $T$ at $\nu$.
Such problems may be infeasible, and if feasible may have many solutions. Under certain
conditions a measure $\mu$ solving (1) can be found. If $\nu \ll \gamma$, the Radon-Nikodym derivative $g : N \to \mathbb{R}$ exists, is non-negative, unique up to sets of $\gamma$-measure zero, and satisfies
$$\forall B \in \mathcal{B}(N): \quad \nu(B) = \int_B g(y)\, d\gamma(y).$$
If in addition $g$ is continuous, $g(y) > 0$ for all $y \in N$, and $\int_N g(y) \log g(y)\, d\gamma(y) < \infty$, then define $f^* := g \circ T$,
and define $\mu^*$ as a new probability measure:
$$\forall A \in \mathcal{B}(M),\ A = T^{-1}(B): \quad \mu^*(A) := \int_{T^{-1}(B)} f^*(x)\, d\lambda(x). \tag{2}$$
It is easy to check that $\mu^*$ is an inverse of $T$ at $\nu$:
$$\mu^* \circ T^{-1}(B) = \int_{T^{-1}(B)} f^*(x)\, d\lambda(x) = \int_{T^{-1}(B)} g \circ T(x)\, d\lambda(x) = \int_B g(y)\, d\gamma(y) = \nu(B).$$
It can be shown that the measure $\mu^*$ is the unique measure that satisfies equation (1) and
minimizes the relative information with respect to $\lambda$ in the class of measures satisfying
equation (1) [45].
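The construction of $\mu^*$ can be illustrated in a finite setting, where integrals become sums. The following is a minimal sketch, not part of the cited results: the six-point space, the map `T`, and the numerical values of `lam` and `nu` are all invented for illustration. It computes the push forward $\gamma$, the density $g = d\nu/d\gamma$, the density $f^* = g \circ T$, and checks equation (1) numerically.

```python
import numpy as np

# Hypothetical finite setting (assumed for illustration): M has 6 points, N has 3.
lam = np.array([0.10, 0.20, 0.10, 0.25, 0.15, 0.20])  # background measure lambda on M
T = np.array([0, 0, 1, 1, 2, 2])                      # T sends point x of M to point T[x] of N
nu = np.array([0.5, 0.3, 0.2])                        # target measure nu on N

# Push forward gamma = lambda o T^{-1}: sum lambda over each preimage T^{-1}({y}).
gamma = np.bincount(T, weights=lam, minlength=nu.size)

# The Radon-Nikodym derivative g = d(nu)/d(gamma) requires nu << gamma.
if np.any((gamma == 0) & (nu > 0)):
    raise ValueError("nu is not absolutely continuous with respect to gamma")
g = np.divide(nu, gamma, out=np.zeros_like(nu), where=gamma > 0)

# f* = g o T, and mu*({x}) = f*(x) * lambda({x}) as in equation (2).
f_star = g[T]
mu_star = f_star * lam

# Equation (1): the push forward of mu* under T should reproduce nu.
pushed = np.bincount(T, weights=mu_star, minlength=nu.size)
print(np.allclose(pushed, nu))
```

In this discrete analogue, $\mu^*$ redistributes the mass $\nu(\{y\})$ over the preimage of $y$ in proportion to $\lambda$, which is why it stays absolutely continuous with respect to the background measure.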
Random variable approach
If $G : \mathbb{R}^n \to \mathbb{R}^m$ and $G(X) \in \{Y \mid Y \in C\}$, where $C$ is a subset of random vectors on $\mathbb{R}^m$, then $X$ is called
a probabilistic inverse of $G$ at $C$. $X$ is sometimes termed the input to model $G$, and $Y$ the output.
If the problem is feasible it may have many solutions and we require a preferred solution; if it is
infeasible we seek a random vector X for which G(X) is "as close as possible" to C.
Typical Application
A typical application takes the following form: We wish to model the transport of pollution
through an ecosystem as a homogeneous set of first order differential equations [9, 25],
$$\frac{dA_i(t)}{dt} = \sum_{j=1}^{m} x_j A_j(t), \qquad i = 1, \dots, m, \tag{3}$$
where the coefficients $x_i$ are unknown. We wish to capture our uncertainty in the functions $A_i(t)$
by treating $X = (X_1, \dots, X_m)$ as a random vector. Direct measurements are not possible. We can obtain
data in the form of quantiles of the quantities $A_i(t_k)$ for selected values of $i = 1, \dots, r$ and at some time
points $t_k$, $k = 1, \dots, s$. These become the components of the random vector $Y = (Y_1, \dots, Y_n)$, where $n = r \times s$.
Solving (3), we can express Y as a function of X. We now seek a distribution for X that complies
with the quantile constraints on Y. There may also be physical constraints on the possible
values of X which should be taken into account.
One feature of such problems may be surmised at once: there may be no solution. In such cases
we must quantify and attempt to reduce the infeasibility.
Numerical Algorithms
Until very recently, applications of PI involved sophisticated optimization routines. Duality
theory was used extensively to try to reduce infeasibility, and methods for dealing with
infeasibility were crude. Recently, simple iterative methods have been discovered which at
once render PI easier and mathematically well founded. We believe that these techniques can
now move into a wide field of application. The idea is based on sample re-weighting using the
Iterative Proportional Fitting algorithm. A variation on this algorithm exhibits excellent
convergence behavior, even in case of infeasibility. These developments are sketched below.
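The sample re-weighting idea can be sketched for a single output quantity. This is only an illustration under stated assumptions: the model `G`, the uniform starting distribution, and the elicited 5%, 50% and 95% quantile values are all hypothetical. The quantiles split the range of $Y$ into four cells with target probabilities 0.05, 0.45, 0.45 and 0.05, and the samples are re-weighted so that the weighted mass in each cell matches its target. With one output this is a single proportional adjustment; with several outputs the same adjustment would be cycled over the outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def G(x):
    # Hypothetical model, chosen only for illustration.
    return x[:, 0] * np.exp(x[:, 1])

# Samples of the input X from an assumed starting distribution, pushed through G.
n_samples = 10_000
X = rng.uniform(0.0, 1.0, size=(n_samples, 2))
Y = G(X)

# Assumed elicited 5%, 50% and 95% quantiles of Y (invented for this sketch).
quantile_values = np.array([0.2, 0.8, 2.0])
cell_probs = np.array([0.05, 0.45, 0.45, 0.05])  # target mass of each interquantile cell

# Assign each sample to its interquantile cell (index 0..3) and count cell sizes.
cells = np.searchsorted(quantile_values, Y)
counts = np.bincount(cells, minlength=cell_probs.size)
if np.any(counts == 0):
    raise ValueError("some interquantile cell contains no samples: infeasible for this sample")

# Re-weight: every sample in a cell receives an equal share of the cell's target mass.
weights = cell_probs[cells] / counts[cells]

# The weighted sample of X is then a discrete approximation to a probabilistic inverse.
weighted_cell_mass = np.bincount(cells, weights=weights, minlength=cell_probs.size)
print(weighted_cell_mass)
```

An empty cell is the sample-based symptom of infeasibility: no re-weighting of the available samples can place mass where the model output never lands.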
We first define the simplex, the strict simplex, margins and the relative information:
Definition (Simplex, strict simplex, margins, relative information)
$$S^K = \{P \in \mathbb{R}^K \mid p_k \ge 0,\ \sum_{k=1}^{K} p_k = 1\}$$
$$S^{*K} = \{P \in \mathbb{R}^K \mid p_k > 0,\ \sum_{k=1}^{K} p_k = 1\}$$
For $P \in S^{K \times K}$, the margins are $P_{\cdot j} = \sum_{i=1}^{K} p_{ij}$ and $P_{i \cdot} = \sum_{j=1}^{K} p_{ij}$.
For $P, Q \in S^K$ such that $p_i = 0 \Rightarrow q_i = 0$, $i = 1, \dots, K$, the relative information of $Q$ with respect to $P$ is:
$$I(Q \mid P) = \sum_{k=1}^{K} q_k \ln(q_k / p_k).$$
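As a concrete reading of the last definition, here is a minimal sketch of the relative information for discrete distributions. The function name `relative_information` is ours, and the convention $0 \ln 0 = 0$ is assumed for cells where $q_k = 0$.

```python
import numpy as np

def relative_information(q, p):
    """I(Q | P) = sum_k q_k ln(q_k / p_k), with the convention 0 ln 0 = 0.

    Works for vectors and for tables (all cells are summed).
    """
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    # The definition requires p_k = 0 => q_k = 0 (Q absolutely continuous w.r.t. P).
    if np.any((p == 0) & (q > 0)):
        raise ValueError("Q is not absolutely continuous with respect to P")
    support = q > 0  # cells with q_k = 0 contribute nothing
    return float(np.sum(q[support] * np.log(q[support] / p[support])))

# Example: Q uniform on two points, P = (1/4, 3/4).
value = relative_information([0.5, 0.5], [0.25, 0.75])
print(value)
```

The quantity is zero when $Q = P$ and positive otherwise, and it is not symmetric in its two arguments.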
It is not difficult to show that
Proposition 1 [16] Let $P^m \in S^K$, $m = 1, \dots, M$. Then
$$\operatorname*{argmin}_{P \in S^K} \sum_{m=1}^{M} I(P^m \mid P) = \frac{1}{M} \sum_{m=1}^{M} P^m.$$
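Proposition 2 [19] [5] For $a, b \in S^{*K}$, let $Q^{(1)} = \{Q \in S^{K \times K} \mid Q_{i \cdot} = a_i,\ i = 1, \dots, K\}$ and $Q^{(2)} = \{Q \in S^{K \times K} \mid Q_{\cdot j} = b_j,\ j = 1, \dots, K\}$, and let $P \in S^{K \times K}$; then
$$\Big(\operatorname*{argmin}_{Q \in Q^{(1)}} I(Q \mid P)\Big)_{ij} = a_i \frac{P_{ij}}{P_{i \cdot}}, \qquad \Big(\operatorname*{argmin}_{Q \in Q^{(2)}} I(Q \mid P)\Big)_{ij} = b_j \frac{P_{ij}}{P_{\cdot j}}.$$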
Propositions 1 and 2 assert that the argmins exist and are unique. The argmins in Proposition 2
are absolutely continuous with respect to P. The main results for the iterative numerical
algorithms are sketched for the case of two marginal distributions (Proposition 2) to simplify
notation, but can be readily generalized.
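The first formula of Proposition 2 can be checked with a short calculation, sketched here under the assumptions that all row margins $P_{i\cdot}$ are positive and that $Q \ll P$ (otherwise $I(Q \mid P)$ is infinite). The symbol $Q^*$ is our shorthand for the claimed argmin.

```latex
\begin{align*}
\text{Let } q^*_{ij} &:= a_i \frac{P_{ij}}{P_{i\cdot}} . \quad
\text{For any } Q \text{ with row margins } \sum_{j} q_{ij} = a_i \text{ and } Q \ll P: \\
I(Q \mid P)
  &= \sum_{i,j} q_{ij} \ln\frac{q_{ij}}{q^*_{ij}}
   + \sum_{i,j} q_{ij} \ln\frac{q^*_{ij}}{P_{ij}}
   = I(Q \mid Q^*) + \sum_{i} \Big(\sum_{j} q_{ij}\Big) \ln\frac{a_i}{P_{i\cdot}} \\
  &= I(Q \mid Q^*) + \sum_{i} a_i \ln\frac{a_i}{P_{i\cdot}} .
\end{align*}
```

The last sum does not depend on $Q$, and $I(Q \mid Q^*) \ge 0$ vanishes only at $Q = Q^*$, so $Q^*$ is the unique minimizer. The column case follows by exchanging the roles of $i$ and $j$.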
Definition (Iterative Proportional Fitting (IPF), PARFUM)
With the notation of Proposition 2, $P^0, P^1, P^2, \dots$ is a sequence generated from $P = P^0$, $a, b \in S^{*K}$ by
Iterative Proportional Fitting if
$$P^k_{ij} = a_i \frac{P^{k-1}_{ij}}{P^{k-1}_{i \cdot}}, \quad k \text{ odd};$$
$$P^k_{ij} = b_j \frac{P^{k-1}_{ij}}{P^{k-1}_{\cdot j}}, \quad k \text{ even}.$$
$P^0, P^1, P^2, \dots$ is a sequence generated from $P = P^0$, $a, b \in S^{*K}$ by PARFUM if
$$P^k_{ij} = \tfrac{1}{2} \Big( a_i \frac{P^{k-1}_{ij}}{P^{k-1}_{i \cdot}} + b_j \frac{P^{k-1}_{ij}}{P^{k-1}_{\cdot j}} \Big).$$
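The two recursions can be written down almost verbatim. The following is a minimal sketch, assuming a starting table with no all-zero row or column so that every division is defined. The function names, the fixed iteration count, and the example table and margins are ours and are chosen only for illustration.

```python
import numpy as np

def fit_rows(P, a):
    """Rescale each row i of P so that it sums to a[i] (projection onto Q^(1))."""
    return P * (a / P.sum(axis=1))[:, None]

def fit_cols(P, b):
    """Rescale each column j of P so that it sums to b[j] (projection onto Q^(2))."""
    return P * (b / P.sum(axis=0))[None, :]

def ipf(P0, a, b, n_iter=500):
    """Iterative Proportional Fitting: row fit for odd k, column fit for even k."""
    P = np.array(P0, dtype=float)
    for k in range(1, n_iter + 1):
        P = fit_rows(P, a) if k % 2 == 1 else fit_cols(P, b)
    return P

def parfum(P0, a, b, n_iter=500):
    """PARFUM: arithmetic average of the row fit and the column fit at each step."""
    P = np.array(P0, dtype=float)
    for _ in range(n_iter):
        P = 0.5 * (fit_rows(P, a) + fit_cols(P, b))
    return P

# Illustrative strictly positive starting table and target margins (assumed values).
P0 = np.array([[0.20, 0.10, 0.10],
               [0.10, 0.20, 0.10],
               [0.05, 0.05, 0.10]])
a = np.array([0.5, 0.3, 0.2])  # target row margins
b = np.array([0.2, 0.3, 0.5])  # target column margins

P_ipf = ipf(P0, a, b)
P_parfum = parfum(P0, a, b)
print(P_ipf.sum(axis=1), P_ipf.sum(axis=0))
print(P_parfum.sum(axis=1), P_parfum.sum(axis=0))
```

A fixed number of iterations keeps the sketch short; a practical implementation would more likely stop when the margins are matched to a tolerance. Both updates only rescale cells, so zeros of the starting table stay zero in every iterate.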
IPF was introduced by Kruithof [47] and re-discovered independently by Deming and Stephan [23] and
many others. It has been studied extensively [7, 11, 17, 23, 31, 32, 39, 47, 49, 50, 69, 71]. The fundamental result was
proved by Csiszar [19, 20, 21], though a more general result was proved by Bregman and published in
Russian [10, 13]. Specialized to the case of Proposition 2, Csiszar’s result is:
Theorem 1: The sequence $P^0, P^1, P^2, \dots$ generated from $P = P^0$, $a, b \in S^{*K}$ by IPF
converges to $P^* \in S^{K \times K}$ if and only if there exists $R \in S^{K \times K}$, $R \ll P$, such that $R \in Q^{(1)} \cap Q^{(2)}$. In this
case $P^* = \operatorname*{argmin}_{Q \in Q^{(1)} \cap Q^{(2)}} I(Q \mid P)$.
In other words, if the problem is feasible, IPF converges to the unique minimum information
solution, relative to the starting measure $P$ (uniqueness of the solution follows trivially from the
convexity of $Q^{(1)} \cap Q^{(2)}$). The solution $P^*$ is a measure satisfying the marginal constraints given
by $a, b \in S^{*K}$. In the case of 2 marginal constraints, Csiszar and Tusnady [20] proved that the even
and odd subsequences of IPF iterates converge; hence if the sequence does not converge, then it
oscillates asymptotically between the limits of the even and odd subsequences. In higher
dimensions nothing is known.
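The oscillation in the infeasible case can be seen on a small example of our own construction. The starting table has a zero in cell (1,2), so any $R \ll P$ with row margins $a$ has $R_{11} = a_1 = 0.7$, which exceeds the required first column margin $b_1 = 0.5$; hence no feasible $R$ exists. The sketch below records the last odd (row-fitted) and last even (column-fitted) IPF iterates.

```python
import numpy as np

# Assumed starting table with a structural zero, and margins that it cannot satisfy.
P = np.array([[0.50, 0.00],
              [0.25, 0.25]])
a = np.array([0.7, 0.3])  # target row margins
b = np.array([0.5, 0.5])  # target column margins

odd_iterate = None
even_iterate = None
for k in range(1, 201):
    if k % 2 == 1:
        # Odd step: rescale rows to match a.
        P = P * (a / P.sum(axis=1))[:, None]
        odd_iterate = P.copy()
    else:
        # Even step: rescale columns to match b.
        P = P * (b / P.sum(axis=0))[None, :]
        even_iterate = P.copy()

print(odd_iterate)   # satisfies the row margins a
print(even_iterate)  # satisfies the column margins b
```

In this example the off-diagonal cell (2,1) shrinks geometrically, so the odd iterates approach the diagonal table with entries 0.7 and 0.3 while the even iterates approach the diagonal table with entries 0.5 and 0.5; the full sequence does not converge.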
For PARFUM the main results are [24, 59]:
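Theorem 2: If the sequence $P^0, P^1, P^2, \dots$ is generated from $P = P^0$, $a, b \in S^{*K}$ by PARFUM, then $P^k$
converges, and the sequence
$$J_k = I\Big( a_i \frac{P^{k-1}_{ij}}{P^{k-1}_{i \cdot}} \,\Big|\, P^k \Big) + I\Big( b_j \frac{P^{k-1}_{ij}}{P^{k-1}_{\cdot j}} \,\Big|\, P^k \Big)$$
is monotonically decreasing and $I(P^{k+1} \mid P^k) \to 0$ monotonically. If there exists $R \in S^{K \times K}$, $R \ll P$, such
that $R \in Q^{(1)} \cap Q^{(2)}$, then any $R \ll P$ satisfying
$$R_{ij} = \tfrac{1}{2} \Big( a_i \frac{R_{ij}}{R_{i \cdot}} + b_j \frac{R_{ij}}{R_{\cdot j}} \Big)$$
belongs to $Q^{(1)} \cap Q^{(2)}$.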
The convergence of $P^k$ in Theorem 2 is proved by Matus [59] but is still unpublished. Similar
convergence results hold when the arithmetic average in PARFUM is replaced by more general
averages. If the problem is feasible, the IPF and PARFUM solutions are not identical, but are
close [80]. The term $J_k$ is known to physicists as the Jensen-Shannon divergence [57].
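The convergence of PARFUM on an infeasible problem can be illustrated with a small sketch. The table and margins are our own example: the zero in cell (1,2) forces $R_{11} = a_1 = 0.7 > b_1 = 0.5$ for any $R \ll P$ with row margins $a$, so no feasible table exists. The code tracks the largest cell-wise change between successive iterates.

```python
import numpy as np

# Assumed infeasible example: structural zero in cell (1,2), margins a and b as below.
P = np.array([[0.50, 0.00],
              [0.25, 0.25]])
a = np.array([0.7, 0.3])  # target row margins
b = np.array([0.5, 0.5])  # target column margins

step_sizes = []
for _ in range(1000):
    row_fit = P * (a / P.sum(axis=1))[:, None]  # rows rescaled to a
    col_fit = P * (b / P.sum(axis=0))[None, :]  # columns rescaled to b
    P_next = 0.5 * (row_fit + col_fit)          # PARFUM update: arithmetic average
    step_sizes.append(np.abs(P_next - P).max())
    P = P_next

print(P)
print(P.sum(axis=1), P.sum(axis=0))
print(step_sizes[-1])
```

Working the fixed-point equation by hand for this table gives the diagonal table with entries 0.6 and 0.4 as the limit. Its margins match neither $a$ nor $b$ but lie between them, which is one way to read the claim that PARFUM behaves well even when the problem is infeasible.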
Many questions remain open at this point, including
• What is the behavior of IPF if the problem is infeasible?
• What is the relation between IPF and PARFUM solutions when they exist?
• How does PARFUM depend on the starting distribution?
• Are there other iterative algorithms with good convergence behavior?
References
1. Anderson, S.P., A. de Palma and J-F Thissen, 1996 Discrete Choice Theory of Product
Differentiation, MIT Press, Cambridge.
2. Bart, D. 2006. Integrating local ecological knowledge and manipulative experiments to find
the causes of environmental change. Frontiers in Ecology and the Environment 4: 541-546.
3. Beggs, S., Cardell, S. and Hausman, J. 1981 Assessing the Potential Demand for Electric Cars, J.
Econometrics 16: 1-19.
4. Ben-Akiva, M. and S. R. Lerman, 1985. Discrete Choice Analysis: Theory and Application to
Travel Demand. The MIT Press, Cambridge, MA.
5. Ben-Akiva, M. T. Morikawa, and F. Shiroishi, 1991 Analysis of the Reliability of Preference
Ranking Data, Journal of Business Research 23(3): 253-68 .
6. Berkson, J. 1944, Application of the Logistic Function to Bioassay, Journal of the American
Statistical Association 39: 357-365.
7. Bishop Y.M.M. 1967 Multidimensional contingency tables: cell estimates. Ph.D. dissertation,
Harvard Univ, 1967.
8. Bradley, R. 1953 “Some statistical methods in taste testing and quality evaluation”
Biometrics, vol. 9, 22-38.
9. Bradley, R. and Terry, M. 1952 “Rank analysis of incomplete block designs” Biometrika, vol.
39, 324-345.
10. Bregman, L.M. 1967 The relaxation method to find the common point of convex sets and its
applications to the solution of problems in convex programming, USSR Computational
Mathematics and Mathematical Physics, 7, 200-217.
11. Brown. D.T. 1959 A note on approximations to discrete probability distributions. Inform. and
Control, 2:386-392.
12. Brownstone, D. and Train, K. 1999 Forecasting New Product Penetration with Flexible
Substitution Patterns, J. Econometrics, Vol. 89: 109-129.
13. Censor, Y. and Lent, A 1981. An iterative row-action method for interval convex
programming, Journal of Optimization Theory and Applications, vol. 34, No. 3.
14. Cooke R.M. and Misiewicz, J. 2007 Discrete Choice with Probabilistic Inversion:
Application to energy policy choice and wiring failure, Mathematical Methods in Reliability.
15. Cooke R.M. and Goossens L.H.J. 2000 Procedures guide for structured expert judgement in
accident consequence modelling. Radiation Protection Dosimetry, 90(3):303-309.
16. Cooke R.M. 1994 Parameter Fitting for uncertain models: modelling uncertainty in small
models. Reliability Engineering and System Safety, 44:89-102
17. Cooke, R.M., Nauta, M., Havelaar, A.H. and van der Fels, H.J. 2006 "Probabilistic inversion
for chicken processing lines" Reliability Engineering and System Safety, 91, pp 1364-1372.
18. Covich, A.P., M.C. Austen, F. Barlocher, E. Chauvet, B.J. Cardinale, C.L. Biles, P. Inchausti,
O. Dangles, M. Solan, M.O. Gessner, B. Statzner, and B. Moss. 2004. The role of Biodiversity
in the functioning of freshwater and marine benthic ecosystems. Bioscience 54: 767-775.
19. Csiszar I. 1975 I-divergence geometry of probability distributions and minimization
problems. Ann. of Probab., 3:146-158.
20. Csiszar, I and G. Tusnady, 1984 Information geometry and alternating minimization
procedures. Statistics & Decisions, 1:205-237.
21. Csiszar, I. (undated) Information Theoretic Methods in Probability and Statistics.
22. David, H.A. 1957 The Method of Paired Comparisons, Charles Griffin, London, 1963.
23. Deming, W.E., and Stephan, F.F. 1940. On a least squares adjustment to sample frequency
tables when the expected marginal totals are known, Ann. Math. Statist. 11, 427-444.
24. Du, C. Kurowicka D. and Cooke R.M. 2006 Techniques for generic probabilistic inversion,
Comp. Stat. & Data Analysis (50), 1164-1187.
25. Van der Fels-Klerx, H.J., Cooke, R.M., Nauta, M.J., Goossens, L.H.J., Havelaar, A.H. 2005 "A
Structured Expert Judgement Study For A Model of Campylobacter Transmission During
Broiler Chicken Processing" Risk Analysis 25 No. 1, pp 109-124.
26. Fienberg S.E., 1970 An iterative procedure for estimation in contingency tables. Ann. Of
Math. Statist., 41:907-917.
27. Fraser, D.J., T. Coon, M.R. Prince, R. Dion, and L. Bernatchez. 2006. Integrating traditional
and evolutionary knowledge in biodiversity conservation: a population level case study.
Ecology and Society 11: 4
28. Fraser, E.D.G., A.J. Dougill, W.E. Mabee, M. Reed, and P. McAlpine. 2006. Bottom up and
top down: Analysis of participatory processes for sustainability indicator identification as a
pathway to community empowerment and sustainable environmental management.
Journal of Environmental Management 78: 114-127.
29. Gilchrist, G., M. Mallory, and F. Merkel. 2005. Can local ecological knowledge contribute to
wildlife management? Case studies of migratory birds. Ecology and Society 10: 20
30. Girardin V. and Limnios N. , 2001 Probabilités en vue des applications. Vuibert, Paris.
31. Haberman S.J., 1974 An Analysis of Frequency Data. Univ. Chicago Press.
32. Haberman S.J. , 1984 Adjustment by minimum discriminant information. Ann. of Statist.,
12:971-988.
33. Halpern, B.S., K.A. Selkoe. F. Micheli, and C.V. Cappel. In press. Evaluating and ranking
global and regional threats to marine ecosystems. Conservation Biology.
34. Harper, F. Goossens, L.H.J. Cooke, R.M. Hora, S. Young, M. Pasler-Ssauer, J.. Miller, L
Kraan, B.C.P. Lui, C. McKay, M. Helton, J. and Jones A. , 1994 Joint USNRC CEC
consequence uncertainty study: Summary of objectives, approach, application, and results
for the dispersion and deposition uncertainty assessment. Technical Report VOL. III,
NUREG/CR-6244, EUR 15755 EN, SAND94-1453.
35. Hausman, J. A., and D. A. Wise, 1978 A Conditional Probit Model for Qualitative Choice:
Discrete Decisions Recognizing Interdependence and Heterogeneous Preferences,
Econometrica, Vol. 46, No. 2, pp. 403-426.
36. Hausman, J. A. and P. A. Ruud, 1987 Specifying and Testing Econometric Models for Rank-Ordered Data, J. Econometrics 34: 83-104.
37. Hector, A. and R. Bagchi. 2007. Biodiversity and ecosystem multifunctionality. Nature 448:
188-191.
38. Hiddink, J.G., S. Jennings, and M.J. Kaiser. 2007. Assessing and predicting the relative
ecological impacts of disturbance on habitats with different sensitivities. Journal of Applied
Ecology 44: 405-413.
39. Ireland C.T. and Kullback. S. 1968 Contingency tables with given marginals. Biometrika,
55:179-188.
40. Keough, H.L. and D.J. Blahna. 2006. Achieving integrative, collaborative ecosystem
management. Conservation Biology 20: 1373-1382.
41. Kind, P. 1996 "Deriving cardinal scales from ordinal preference data: the analysis of time
trade-off data using pair-wise judgement models", Paper presented to HESG, Brunel
University.
42. Koop, G. and Poirier, D.J. 1994 "Rank-ordered logit models: an empirical analysis of Ontario
voter preferences", Journal of Applied Econometrics 9, 369-388.
43. Kraan B.C.P. and Cooke. R.M. 2000 Processing expert judgements in accident consequence
modeling. Radiation Protection Dosimetry, 90(3):311-315.
44. Kraan B.C.P. and Cooke. R.M. 2000 Uncertainty in compartmental models for hazardous
materials - a case study. J. of Hazardous Materials, 71:253-268.
45. Kraan, B.C.P and Bedford. T.J. 2005 Probabilistic inversion of expert judgements in the
quantification of model uncertainty. Management Science, 51(6):995-1006.
46. Kraan. B.C.P. Probabilistic Inversion in Uncertainty Analysis and related topics. 2002 PhD
dissertation, TU Delft, Dept. Mathematics.
47. Kruithof J. 1937 Telefoonverkeersrekening. De Ingenieur, 52(8):E15-E25.
48. Kullback S. 1959 Information theory and Statistics. John Wiley and Sons, New York.
49. Kullback S. 1968 Probability densities with given marginals. The Annals of Mathematical
Statistics, 39(4):1236-1243.
50. Kullback S. 1971 Marginal homogeneity of multidimensional contingency tables. The Annals
of Mathematical Statistics, 42(2):594-606.
51. Kurowicka D. and Cooke R.M. 2006. Uncertainty Analysis with High Dimensional Dependence
Modelling. Wiley.
52. Lancaster, Kelvin J 1966. "A New Approach to Consumer Theory.'' Journal of Political
Economy, 132-157.
53. Latour, R.J., M.J. Brush, and C.F. Bonzek. 2003. Toward ecosystem-based fisheries
management: Strategies for multispecies modeling and associated data requirements.
Fisheries 28: 10-22.
54. Luce R. D. and P. Suppes, 1965, Preference, Utility, and Subjective Probability. In Handbook of
Mathematical Psychology, vol. 3, ed. R. D. Luce, R. Bush, and E. Galanter. New York, Wiley.
55. Luce, R. D. 1959, Individual Choice Behavior; A Theoretical Analysis. New York, Wiley.
56. M.E.A. (Millennium Ecosystem Assessment). 2005. Ecosystems and Human Well-Being:
Synthesis Report. Washington, D.C.
57. Majtey, A.P., Lamberti, P.W., and Prato, D.P. 2005 Jensen-Shannon divergence as a measure
of distinguishability between mixed quantum states, Phys. Rev. A, 72, 052310-1 052310-6.
58. Marschak, J. 1960, Binary-Choice Constraints and Random Utility Indicators, in Mathematical
Methods in the Social Sciences, ed. K. Arrow, S. Karlin, and P. Suppes. Stanford. Stanford
University Press.
59. Matus, F. 2007 On iterated averages of I-projections, Statistiek und Informatik, Universität
Bielefeld, Bielefeld, Germany, matus@utia.cas.cz.
60. May, K.D. 1952 "A set of necessary and sufficient conditions for simple majority
decisions" Econometrica 20, 680-684.
61. McCabe, C., Brazier, J., Gilks, P., Tsuchiya, A., Roberts, J., O'Hagan, A. and Stevens, K. 2004
"Estimating population cardinal health state valuation models from ordinal (rank) health
state preference data" Sheffield Health Economics Group, Discussion Paper Series, Ref. 04/2.
62. McFadden D. and K. Train, 2000 Mixed MNL Models for Discrete Response, J. Applied
Econometrics 15: 447-470.
63. McFadden, D. 1974, ‘Conditional logit analysis of qualitative choice behavior’, in P.
Zarembka, ed., Frontiers in Econometrics, Academic Press, New York, pp. 105– 142.
64. McFadden, D. 1987, ‘Regression-based specification tests for the multinomial logit model’,
Journal of Econometrics 34, 63–82.
65. McFadden, D., 1981 Econometric Models of Probabilistic Choice, in “Structural Analysis of
Discrete Data with Econometric Applications” (C.F. Manski and D. McFadden, Eds.), Cambridge
University Press.
66. Mosteller, F. 1952 “Remarks on the method of paired comparisons: the least squares
solution assuming equal standard deviations and equal correlations” Psychometrika, vol. 16,
no. 1, 3-9.
67. Plaganyi, E.E. and D.S. Butterworth. 2004. A critical look at the potential of ecopath with
ECOSIM to assist in practical fisheries management. African Journal of Marine Science 26:
261-287.
68. R.G. Chapman and R. Staelin, 1982 Exploiting Rank Ordered Choice Set Data within the
Stochastic Utility Model, J. Marketing Res. 19: 288-301.
69. Rüschendorf L. 1995 Convergence of the iterative proportional fitting procedure. The
Annals of Statistics, 23(4):1160-1174.
70. Revelt D. and Train, K, 1998 Mixed Logit with Repeated Choice: Households’ Choices of
Appliance Efficiency Level, Review of Economics and Statistics 80: 647-657.
71. Salomon, J.A. 2004 "The use of ordinal ranks in health state valuations" IHEA Conference,
USA, San Francisco.
72. Siikamäki, J. and D. F. Layton, 2007 Discrete Choice Survey Experiments: A Comparison
Using Flexible Methods, Journal of Environmental Economics and Management, Vol. 53, pp.
127-139.
73. Stringer, L.C., A.J. Dougill, E. Fraser, K. Hubacek, C. Prell, and M.S. Reed. 2006. Unpacking
"participation" in the adaptive management of social ecological systems: A critical review.
Ecology and Society 11: 39
74. Thurstone, L. 1927 “A law of comparative judgment” Psychol. Rev. vol. 34, 273-286.
75. Torgerson, W. 1958 Theory and Methods of Scaling, Wiley, New York, 1958.
76. Torrance, G.W., Feeny, D.H., Furlong, W.J., Barr, R.D., Zhang, Y., and Wang, Q. 1996 "A
multi-attribute utility function for a comprehensive health status classification system:
Health Utilities Mark 2", Medical Care 34 (7) 702-722.
77. Train K.E. 2003 “Discrete Choice Methods with Simulation” Cambridge University Press.
78. Train, K. 1998 Recreation Demand Models with Taste Differences over People, Land Economics
74: 230-39.
79. Train, K. 2001 A Comparison of Hierarchical Bayes and Maximum Simulated Likelihood for
Mixed Logit, Department of Economics, University of California, Berkeley.
80. Vomlel, J. 1999 Methods of Probabilistic Knowledge Integration, PhD thesis, Czech Technical
University, Faculty of Electrical Engineering.