Stochastic Order Induced by a Measurable Preorder

by David C. Nachman
Department of Finance, J. Mack Robinson College of Business, Georgia State University, Atlanta, Georgia 30303-3083
Phone: 404-651-1696 Fax: 404-651-2630 E-mail: dnachman@gsu.edu

November, 2005

Abstract. Kamae et al. [8, Theorem 1] present a general characterization of the partial ordering of probability measures induced by a closed partial ordering on the underlying Polish state space. A preorder is a reflexive and transitive, but not necessarily antisymmetric, relation. This paper presents a similar characterization of the preordering of probability measures induced by a measurable preordering on the underlying Polish space. We then apply this result to obtain a characterization of stochastic majorization, the preorder induced by the widely applied majorization preorder on Euclidean space. We show that a multifunction associated with this preorder is compact and convex valued and continuous, and hence satisfies the hypotheses of our characterization. The continuity properties of the majorization ordering and the induced stochastic majorization ordering have not been widely recognized and are of interest in their own right.

Kamae et al. [8, Theorem 1] present a general characterization of the partial ordering of probability measures induced by a closed partial ordering on the underlying Polish state space. A preorder is a reflexive and transitive, but not necessarily antisymmetric, relation, and so is a weaker kind of relation than a partial order. The intention here is to present a similar characterization of the preordering of probability measures induced by a measurable preordering on the underlying Polish space. Marshall and Olkin [9, Ch. 17.C] already hint that Kamae et al. [8, Proposition 1] is “. . . a much more general version involving general preorders.” Specifically, we exploit the properties of the preorder and theorems of Strassen [11] and Himmelberg and Van Vleck [6] to provide a characterization of the induced stochastic preorder in the style of Kamae et al. We then apply this result to obtain a characterization of stochastic majorization, Marshall and Olkin [9, Ch. 11], the preorder induced by the widely applied majorization preorder on Euclidean space. We show that a multifunction associated with this preorder is compact and convex valued and continuous, and hence satisfies the hypotheses of our characterization. The continuity properties of the majorization ordering and the induced stochastic majorization ordering have not been widely recognized and are of interest in their own right.

A Borel space $S$ is a Borel subset of a Polish space, that is, of a complete separable metric space. A multifunction $F$ from $S$ to a set $T$ is a function with domain $S$ whose value $F(s)$ is a nonempty subset of $T$, for each $s \in S$. It is Borel measurable if $F^{-1}(B) = \{s \in S : F(s) \cap B \neq \emptyset\}$ is a Borel set in $S$ for each closed subset $B$ of $T$.

Let $R$ be a Polish space. Throughout, $R$ is assumed to be endowed with a preorder, denoted by $\preceq$, a reflexive and transitive relation on $R$. For each $x \in R$, let $\Gamma(x) = \{y \in R : y \preceq x\}$. By reflexivity, $x \in \Gamma(x)$, and by transitivity, $y \in \Gamma(x)$ implies that $\Gamma(y) \subset \Gamma(x)$. Thus $\Gamma$ is a multifunction from $R$ to $R$. We say the preorder $\preceq$ is a measurable preorder if the multifunction $\Gamma$ is Borel measurable. We assume that $\Gamma$ is compact valued and Borel measurable. For compact valued multifunctions, Borel measurability has many equivalent definitions. See [7, Theorem 3].

Let $P(R)$, $P(R \times R)$ denote the set of probability measures on $R$, respectively on $R \times R = R^2$, endowed with the topology of weak convergence. In this case, $P(R)$ and $P(R \times R)$ are both Polish spaces (in the Prohorov metric, [2, Theorem 6.8]). For $\mu \in P(R)$ denote by $\mathrm{supp}\,\mu$ the support of $\mu$, the smallest closed subset of $R$ with $\mu$ measure one, [3, p. 18].
For each $x \in R$, let $\Delta(x) = \{\mu \in P(R) : \mathrm{supp}\,\mu \subset \Gamma(x)\}$ and let $\delta_x \in P(R)$ be the probability measure such that $\mathrm{supp}\,\delta_x = \{x\}$.

Theorem 1. For each $x \in R$, $\delta_x \in \Delta(x)$, and $y \in \Gamma(x)$ implies that $\Delta(y) \subset \Delta(x)$. $\Delta$ is a compact and convex valued Borel measurable multifunction from $R$ to $P(R)$.

Proof. The first statement follows from the same properties of $\Gamma$. Thus $\Delta$ is a multifunction. For each $x \in R$, $\Delta(x)$ is convex, closed and tight. By Prohorov’s theorem, [2, Theorem 5.1], $\Delta(x)$ is compact. That $\Delta$ is Borel measurable follows from Himmelberg and Van Vleck [6, Theorem 3(ii)].

Let $\mathcal{F}$ denote the collection of real valued Borel measurable functions on $R$ that are increasing (nondecreasing) in the preorder $\preceq$ on $R$, i.e., for $f : R \to \mathbb{R}$, $\mathbb{R}$ the real line, $f$ Borel measurable, $f \in \mathcal{F}$ if and only if for $x, y \in R$ and $x \preceq y$, $f(x) \le f(y)$.

We extend the preorder $\preceq$ to $P(R)$ as follows. For $\mu, \nu \in P(R)$, say that $\mu$ is larger in this extended preorder than $\nu$, and write $\nu \preceq \mu$, if and only if $\int f\,d\nu \le \int f\,d\mu$ for every $f \in \mathcal{F}$ for which both these integrals exist. Then our extension of $\preceq$ to $P(R)$ is truly an extension since for $x, y \in R$, $x \preceq y$ in $R$ if and only if $\delta_x \preceq \delta_y$ in $P(R)$.

Intuitively, $\nu \preceq \mu$ if $\nu$ puts more weight on elements that are less extreme in the relation $\preceq$ on $R$ than does $\mu$. This intuition is formalized by the characterization of $\preceq$ on $P(R)$ given below (Theorem 2). This definition of $\preceq$ on $P(R)$ is typical in the literature on stochastic orderings ([8] and [9, 17.A.3]), but there are many others. See [10, Chs. 1, 4]. There are many definitions given in terms of $R$ valued random quantities, say $X$ and $Y$. For example, one version of stochastic majorization of interest here is the relation $E_1$ in [9, Ch. 11]. This relation, denoted $\prec_{E_1}$, is stated as $X \prec_{E_1} Y$, $Y$ stochastically majorizes $X$ in the sense of $E_1$, if $E f(X) \le E f(Y)$ for all $f \in \mathcal{F}$ for which these expectations exist. In [9, Ch. 11], $\mathcal{F}$ is the cone of Borel measurable Schur convex functions.
It is easy to see that this is equivalent to the above definition, since these expectations are given by integration with respect to the distributions in $R$ of these random quantities and, given these distributions, there are $R$ valued random quantities with these distributions. There is another definition of stochastic majorization that Marshall and Olkin [9, pp. 282-283] call $P_1$ that implies $E_1$ and appears ostensibly to be stronger than $E_1$. There $X \prec_{P_1} Y$ if $f(X) \le_{st} f(Y)$ for all $f \in \mathcal{F}$, where $\le_{st}$ is the typical meaning of stochastically larger, [10, 1.A]. Clearly $P_1$ implies $E_1$ since stochastically larger random variables have larger expectations. It turns out that in this particular case we also have $E_1$ implies $P_1$. See the argument in [9, top of p. 283]. We will use this argument to show one part of the characterization of the relation $\preceq$ in $P(R)$ defined above. Here the orders $\prec_{E_1}$ and $\prec_{P_1}$ are the orderings as defined above, but in the abstract setting of this paper.

Let $\mathcal{B}$ denote the Borel subsets of $R$. A Markov kernel on $R$ is a map $m : R \times \mathcal{B} \to [0,1]$ such that for each set $B \in \mathcal{B}$ the map $x \mapsto m(x, B)$ is Borel measurable and for $x \in R$ fixed $m(x) = m(x, \cdot) \in P(R)$. For such a Markov kernel $m$ and a probability measure $\mu \in P(R)$ denote by $\mu \otimes m$ the element of $P(R^2)$ defined by $(\mu \otimes m)(A \times B) = \int_A m(x, B)\,\mu(dx)$, for measurable rectangles, $A, B \in \mathcal{B}$. We say that the first marginal of $\mu \otimes m$ is $\mu$ and denote the second marginal by $\mu m$. Finally, we say that a set $B \subset R$ is increasing if its indicator function belongs to $\mathcal{F}$ (necessarily, then, $B \in \mathcal{B}$). These designations are borrowed from [8, pp. 899-900]. The following is the desired characterization of $\preceq$ on $P(R)$ and fleshes out the intuition given above.

Theorem 2. For $\mu, \nu \in P(R)$ the following are equivalent:
(i) $\nu \preceq \mu$;
(ii) There exists a Markov kernel $m$ on $R$ such that $\nu = \mu m$ and $m(x) \in \Delta(x)$, $\mu$ almost every $x \in R$;
(iii) There exists a probability measure $\lambda \in P(R^2)$ with $\lambda(K) = 1$, where $K = \{(x, y) \in R^2 : y \in \Gamma(x)\}$ is the graph of $\Gamma$, with first marginal $\mu$ and second marginal $\nu$;
(iv) There exists a real valued random variable $Z$ and two measurable functions $f, g : \mathbb{R} \to R$ with $f \preceq g$ (i.e., $f(t) \preceq g(t)$, $t \in \mathbb{R}$) such that the distribution of $f(Z)$ is $\nu$ and the distribution of $g(Z)$ is $\mu$;
(v) There exist $R$ valued random variables $Y$ and $X$ such that $X \prec_{P_1} Y$ and the distribution of $X$ is $\nu$ and the distribution of $Y$ is $\mu$;
(vi) $\nu(B) \le \mu(B)$ for every increasing set $B \in \mathcal{B}$.

Proof: The key equivalence is (i) and (ii). The rest follow easily. Let $\mu, \nu \in P(R)$ and assume that (i) holds. For every bounded continuous function $z : R \to \mathbb{R}$ define $h(x, z) = \sup\{\int z\,d\gamma : \gamma \in \Delta(x)\}$. By Theorem 1 and [7, Theorem 2], $h(\cdot, z)$ is Borel measurable in $x$, and for each $x \in R$, $z(x) \le h(x, z) \le \sup z$ since $\delta_x \in \Delta(x)$ and $\int z\,d\delta_x = z(x)$. Thus $h(\cdot, z)$ is bounded as well, so all integrals below exist. Finally, $h(\cdot, z) \in \mathcal{F}$, for if $x, y \in R$ and $x \preceq y$ then by Theorem 1, $\Delta(x) \subset \Delta(y)$, and hence $h(x, z) \le h(y, z)$. It follows that $\int z\,d\nu \le \int h(x, z)\,\nu(dx) \le \int h(x, z)\,\mu(dx)$, the last inequality from (i). Condition (ii) then follows from [11, Theorem 3].

Assume (ii) and let $\lambda = \mu \otimes m$. Since $\Gamma$ is Borel measurable, its graph $K$ is a Borel subset of $R^2$, [7, Theorem 3]. Then $\lambda(K) = \int m(x, \Gamma(x))\,\mu(dx) = 1$, since the $x$-section $K_x = \Gamma(x)$ for every $x$. This gives (iii).

Therefore assume (iii). The construction in [8, Theorem 1(iii)] goes through here as well and this gives (iv).

Assuming (iv) let $X = f(Z)$ and $Y = g(Z)$. Then clearly $X \prec_{E_1} Y$, and (v) follows from the fact that $X \prec_{E_1} Y$ implies $X \prec_{P_1} Y$, since $\mathcal{F}$ satisfies [9, (3) p. 283].

Therefore assume (v). If $B \in \mathcal{B}$ and the indicator $I_B \in \mathcal{F}$, then $\nu(B) = E\,I_B(X) \le E\,I_B(Y) = \mu(B)$, which is (vi), where the inequality follows from the fact that $I_B(X) \le_{st} I_B(Y)$.

It remains to show that (vi) implies (i). Assume (vi) and let $f \in \mathcal{F}$. Since $f \in \mathcal{F}$, $f^+ = \max(f, 0) \in \mathcal{F}$, and so $I_{\{x \in R : f^+(x) > t\}} \in \mathcal{F}$ for all real $t$. It follows from (vi) that $\nu\{x \in R : f^+(x) > t\} \le \mu\{x \in R : f^+(x) > t\}$, and hence
$$\int f^+\,d\nu = \int_0^\infty \nu\{x \in R : f^+(x) > t\}\,dt \le \int_0^\infty \mu\{x \in R : f^+(x) > t\}\,dt = \int f^+\,d\mu$$
(the equalities here hold by [1, (4), p. 223]). Also $f \in \mathcal{F}$ implies that $-f^- \in \mathcal{F}$, where $f^- = \max(-f, 0)$. The set $\{x \in R : f^-(x) < t\} = \{x \in R : -f^-(x) > -t\}$ is therefore increasing, and taking complements in (vi) we get that $\mu\{x \in R : f^-(x) \ge t\} \le \nu\{x \in R : f^-(x) \ge t\}$. As a result,
$$\int f^-\,d\mu = \int_0^\infty \mu\{x \in R : f^-(x) \ge t\}\,dt \le \int_0^\infty \nu\{x \in R : f^-(x) \ge t\}\,dt = \int f^-\,d\nu$$
(the equalities here again by [1, (4), p. 223]). If both the integrals $\int f\,d\nu$, $\int f\,d\mu$ exist, it follows that $\int f\,d\nu = \int f^+\,d\nu - \int f^-\,d\nu \le \int f^+\,d\mu - \int f^-\,d\mu = \int f\,d\mu$, which is (i).
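The step from a downward kernel, as in Theorem 2(ii), to the increasing-set inequality of Theorem 2(vi) can be checked by hand on a finite state space. The following is a minimal sketch under stated assumptions: the four-point space, the relation `LEQ`, the measure `MU`, the kernel `M`, and all function names are hypothetical illustrations and do not come from the paper. Exact rational arithmetic is used so that the comparisons are not affected by rounding.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical four-point state space (an illustration, not from the paper).
STATES = [0, 1, 2, 3]

# Pairs (y, x) meaning y ≼ x. States 0 and 1 are equivalent, so the relation
# is a preorder but not a partial order; 2 and 3 are incomparable.
LEQ = {(s, s) for s in STATES} | {(0, 1), (1, 0), (0, 2), (1, 2), (0, 3), (1, 3)}


def is_preorder():
    """Check that LEQ is reflexive and transitive."""
    reflexive = all((s, s) in LEQ for s in STATES)
    transitive = all((a, c) in LEQ
                     for (a, b) in LEQ for (b2, c) in LEQ if b == b2)
    return reflexive and transitive


def gamma(x):
    """Gamma(x) = {y : y ≼ x}, the set of states below x."""
    return {y for y in STATES if (y, x) in LEQ}


def is_increasing(B):
    """B is increasing if x in B and x ≼ y together imply y in B."""
    return all(y in B for (x, y) in LEQ if x in B)


def increasing_sets():
    """Enumerate every increasing subset of STATES."""
    subsets = (set(c) for r in range(len(STATES) + 1)
               for c in combinations(STATES, r))
    return [B for B in subsets if is_increasing(B)]


def second_marginal(mu, m):
    """nu = mu m, i.e. nu(y) = sum over x of mu(x) * m(x, y)."""
    return [sum(mu[x] * m[x][y] for x in STATES) for y in STATES]


def is_downward(m):
    """Each row m(x, .) must put all of its mass on Gamma(x)."""
    return all(sum(m[x][y] for y in gamma(x)) == 1 for x in STATES)


def dominated(nu, mu):
    """Condition (vi): nu(B) <= mu(B) for every increasing set B."""
    return all(sum(nu[s] for s in B) <= sum(mu[s] for s in B)
               for B in increasing_sets())


# Hypothetical data: a uniform mu and a kernel whose rows live on Gamma(x).
MU = [Fraction(1, 4)] * 4
M = [
    [Fraction(1, 2), Fraction(1, 2), 0, 0],
    [0, 1, 0, 0],
    [Fraction(1, 3), 0, Fraction(2, 3), 0],
    [0, Fraction(1, 2), 0, Fraction(1, 2)],
]
NU = second_marginal(MU, M)
```

With these inputs `dominated(NU, MU)` holds while `dominated(MU, NU)` does not, which matches the direction of the relation in Theorem 2. The finite setting sidesteps every measurability issue, so the sketch illustrates only the easy direction of the theorem, not the Strassen-based implication from (i) to (ii).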
The crucial implication (i) implies (ii) in Theorem 2 relies on Theorem 3 of Strassen [11]. In obtaining essentially the same implication, Kamae et al. [8, Theorem 1, (i) implies (ii)] and [9, 17.B.1] use Theorem 11 of Strassen [11], which requires that the graph $K$ of the multifunction $\Gamma$ be closed. Our assumption of measurability of $\Gamma$ yields the weaker condition that $K$ is a measurable set. Of course if $K$ is closed, as it is in the majorization application to follow, then for the measure $\lambda$ in Theorem 2(iii), $\mathrm{supp}\,\lambda \subset K$. All that is required to use Theorem 3 of [11] is that the preorder be sufficiently regular to give Borel measurability in $x$ of the function $h(x, z)$, defined above in the first paragraph of the proof of Theorem 2. Some measurability of the multifunction $\Gamma$ seems essential to this result, but weaker conditions than compact valuedness of $\Gamma$ may give the result. See for example [5, Proposition 3, p. 60]. Theorem 2 of [7], however, is very handy. Transitivity of the preorder gives monotonicity of $h(\cdot, z)$ in the preorder. Reflexivity of the preorder gives $\delta_x \in \Delta(x)$, and $\Delta(x)$ is convex whether $\Gamma(x)$ is or not. This convexity is essential to apply [11, Theorem 3], but it comes at no cost.

The result Theorem 2(ii) formalizes our intuition expressed above that $\nu \preceq \mu$ if $\nu$ puts more weight on elements that are less extreme in the relation $\preceq$ on $R$ than does $\mu$. The Markov kernel $m$ of Theorem 2(ii) is such that for $\mu$ almost every $x$, $m(x, \Gamma(x)) = 1$. In this sense $m$ shifts weight of $\mu$ to elements less extreme in the relation $\preceq$ on $R$. Borrowing from the language suggested in [8, top of p. 900], the kernel $m$ might be termed “downward.”

Let us now consider the application to majorization. Let $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$ be n-tuples of real numbers and let $(x_{[1]}, \ldots, x_{[n]})$ and $(y_{[1]}, \ldots, y_{[n]})$ denote the vectors $x$ and $y$ with coordinates rearranged in decreasing order, i.e., $x_{[1]} \ge \cdots \ge x_{[n]}$ and $y_{[1]} \ge \cdots \ge y_{[n]}$. The vector $x$ is majorized by the vector $y$ (or $y$ majorizes $x$), written $x \prec y$, if for each $k = 1, \ldots, n$, $\sum_{i=1}^{k} x_{[i]} \le \sum_{i=1}^{k} y_{[i]}$, with equality holding for $k = n$, [9, A.1, p. 7].

In words, $x$ is majorized by $y$ if the components of $x$ are more evenly spread out than the components of $y$, or the components of $y$ are more concentrated than the components of $x$. This intuition is reinforced by noting the following. Let $e = (1, \ldots, 1)$, the n-tuple whose coordinates are all equal to one. Then for a vector $x$ the inner product $x \cdot e$ is the sum of the components of $x$. Let $\bar{x} = (x \cdot e / n, \ldots, x \cdot e / n)$ and let $x^k = (0, \ldots, 0, x \cdot e, 0, \ldots, 0)$, where $x \cdot e$ appears in the $k$th component. The vectors $x$, $\bar{x}$, and $x^k$ all have the same total sum of components, but the components of $\bar{x}$ are more evenly spread out than those of $x$. Clearly $x^k$ concentrates this sum in one component. In this sense, $\bar{x}$ is the most evenly spread of this sum of components and $x^k$ is the most concentrated of this sum. Indeed, we have that $\bar{x} \prec x$ and, when the components of $x$ are nonnegative, $x \prec x^k$, $k = 1, \ldots, n$.

We note that the majorization relation is reflexive and transitive (established below in Lemma 5) and hence is a preordering. It is not a partial ordering, however, since it is not antisymmetric. Indeed, $x \prec x'$ and $x' \prec x$ whenever $x'$ is any permutation of $x$.

Let $\mathbb{R}^n$ denote n-dimensional Euclidean space. All topological properties in the sequel will be with respect to the usual metric on $\mathbb{R}^n$. Let $\Pi$ denote the set of $n \times n$ permutation matrices and let $\mathcal{D}$ denote the set of $n \times n$ doubly stochastic matrices. Then $M \in \Pi$ if and only if there is exactly one entry equal to one in each row and each column of $M$ and all other entries are zero. Similarly, $M \in \mathcal{D}$ if and only if the entries in $M$ are nonnegative and each row and each column sums to one.

Theorem 3. For $x, y \in \mathbb{R}^n$, the following are equivalent:
(i) $x \prec y$;
(ii) $x = yD$, some $D \in \mathcal{D}$;
(iii) $x = \sum_i \alpha_i\, y \Pi_i$, for some $\alpha_i \ge 0$, $\sum_i \alpha_i = 1$, and some $\Pi_i \in \Pi$.

Proof: The equivalence of (i) and (ii) is due to Hardy, Littlewood and Polya. See [9, Theorem 2.B.2]. The equivalence of (ii) and (iii) is due to Birkhoff. See [9, Theorem 2.A.2].

For each $x \in \mathbb{R}^n$ let $\Gamma(x) = \{y \in \mathbb{R}^n : y \prec x\}$, the set of n-tuples that are majorized by $x$.
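The partial-sum definition of majorization and the doubly stochastic form in Theorem 3(ii) are easy to test numerically. The sketch below is an illustration only: the function names `is_majorized`, `is_doubly_stochastic` and `row_times_matrix`, and the tolerance parameter `tol`, are hypothetical choices and do not come from the paper or from [9].

```python
from itertools import accumulate


def is_majorized(x, y, tol=1e-12):
    """Return True if x is majorized by y (x ≺ y), up to the tolerance tol."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if not x:
        return True
    # Partial sums of the coordinates rearranged in decreasing order.
    px = list(accumulate(sorted(x, reverse=True)))
    py = list(accumulate(sorted(y, reverse=True)))
    # Inequality for k = 1, ..., n - 1.
    if any(a > b + tol for a, b in zip(px[:-1], py[:-1])):
        return False
    # Equality of the total sums for k = n.
    return abs(px[-1] - py[-1]) <= tol


def is_doubly_stochastic(D, tol=1e-12):
    """Nonnegative entries, every row and every column summing to one."""
    n = len(D)
    if any(len(row) != n for row in D):
        return False
    nonnegative = all(v >= -tol for row in D for v in row)
    rows_ok = all(abs(sum(row) - 1.0) <= tol for row in D)
    cols_ok = all(abs(sum(D[i][j] for i in range(n)) - 1.0) <= tol
                  for j in range(n))
    return nonnegative and rows_ok and cols_ok


def row_times_matrix(y, D):
    """Row vector times matrix, the product yD of Theorem 3(ii)."""
    n = len(y)
    return [sum(y[i] * D[i][j] for i in range(n)) for j in range(n)]
```

For example, with $x = (3, 1, 2)$ the checks `is_majorized([2, 2, 2], [3, 1, 2])` and `is_majorized([3, 1, 2], [6, 0, 0])` both succeed, while `is_majorized([2, -1], [1, 0])` fails, which shows that the comparison with the fully concentrated vector needs nonnegative components. Averaging the first two coordinates of $y = (3, 1, 2)$ with the doubly stochastic matrix having rows $(1/2, 1/2, 0)$, $(1/2, 1/2, 0)$, $(0, 0, 1)$ gives $yD = (2, 2, 2)$, which is majorized by $y$ as Theorem 3 predicts.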
For a picture of this set in the case n = 3 see [9, Figure 3, p. 9]. Let $K = \{(x, y) \in \mathbb{R}^n \times \mathbb{R}^n : y \prec x\}$, the graph of the multifunction $\Gamma$. In the following, the notions of upper and lower semicontinuity in [6] are the same as the notions of upper and lower hemi-continuity in [5], which gives convenient characterizations of these notions in terms of sequences in $\mathbb{R}^n$. We will use the terminology of [6], but refer to these convenient characterizations in [5]. A multifunction is continuous if it is both upper and lower semicontinuous. The following are properties of $\Gamma$.

Theorem 4. $\Gamma$ is a compact convex valued continuous multifunction in $\mathbb{R}^n$. Consequently $K$ is closed in $\mathbb{R}^{2n}$.

Proof: Clearly for each $x \in \mathbb{R}^n$, $x \in \Gamma(x)$, so $\Gamma$ is a multifunction. $\Gamma(x)$ is convex, by Theorem 3(ii) (convex combinations of doubly stochastic matrices are doubly stochastic), and compact since, by Theorem 3(iii), it is the convex polyhedron generated by the finite number of permutations of $x$. Suppose $x \in \mathbb{R}^n$ and $x_k \in \mathbb{R}^n$ with $x = \lim_k x_k$. If $y \in \Gamma(x)$, by Theorem 3(ii), $y = xD$, some $D \in \mathcal{D}$. Then again by Theorem 3(ii), $y_k = x_k D \in \Gamma(x_k)$ and $y = \lim_k y_k$, so $\Gamma$ is lower semicontinuous, [5, Theorem 2, p. 27]. Let $y_k = x_k D_k \in \Gamma(x_k)$ with $D_k \in \mathcal{D}$ arbitrary. By Birkhoff’s theorem $\mathcal{D}$ is the convex polyhedron generated by the permutation matrices and hence is compact in $\mathbb{R}^{n^2}$. Thus there is a subsequence of the $D_k$ that converges to an element $D \in \mathcal{D}$. For this subsequence indexed by $k'$, $\lim_{k'} y_{k'} = xD \in \Gamma(x)$. Thus $\Gamma$ is upper semicontinuous, [5, Theorem 1, p. 24]. The closure of $K$ then follows by the same result.

As obvious as these properties of $\Gamma$ are, except for convexity of $\Gamma(x)$, they appear nowhere prominently in the literature on majorization to this author’s knowledge. Eaton and Perlman [4] do exploit the upper semicontinuity of $\Gamma$ in the proof of their Lemma 4.1. The following result establishes the transitivity of majorization and is needed later for the characterization of stochastic majorization.

Lemma 5. For $x, y \in \mathbb{R}^n$, if $y \in \Gamma(x)$, then $\Gamma(y) \subset \Gamma(x)$.
Proof: For $x, y \in \mathbb{R}^n$, suppose $y \in \Gamma(x)$ and $z \in \Gamma(y)$. Then by Theorem 3(ii) $z = y\hat{D}$ and $y = xD$, some $D, \hat{D} \in \mathcal{D}$. But then $z = xD\hat{D}$ and $D\hat{D} \in \mathcal{D}$, [9, 2.A.3, p. 20]. Again by Theorem 3(ii), $z \in \Gamma(x)$.

As in the abstract case, for each $x \in \mathbb{R}^n$, let $\Delta(x) = \{\mu \in P(\mathbb{R}^n) : \mathrm{supp}\,\mu \subset \Gamma(x)\}$, and let $\{(x, \mu) \in \mathbb{R}^n \times P(\mathbb{R}^n) : \mu \in \Delta(x)\}$ be the graph of the multifunction $\Delta$.

Theorem 6. $\Delta$ is a compact and convex valued continuous multifunction from $\mathbb{R}^n$ to $P(\mathbb{R}^n)$. The graph of $\Delta$ is closed in $\mathbb{R}^n \times P(\mathbb{R}^n)$.

Proof: For $x \in \mathbb{R}^n$, $\delta_x \in \Delta(x)$. This result is then Theorem 1 above, but specialized to the example here. By [6, Theorem 3(i)], $\Delta$ inherits the continuity of $\Gamma$ established in Theorem 4. The closure of the graph of $\Delta$ follows from [5, Theorem 1, p. 24].

The functions $f : \mathbb{R}^n \to \mathbb{R}$ that are increasing (non-decreasing) in the majorization relation are called Schur-convex. See [9, Ch. 1.D, Ch. 3] for the origins of this terminology and the characterizations of this class of functions. Denote by $SC$ the class of Borel measurable Schur-convex functions. The measurability requirement is a restriction, [9, 3.C.4, p. 70]. We can extend the relation $\prec$ in $\mathbb{R}^n$ to a relation $\prec$ in $P(\mathbb{R}^n)$ as we did above by taking $\mathcal{F} = SC$. This relation in $P(\mathbb{R}^n)$ is the version of stochastic majorization $E_1$ studied in [9, Ch. 11]. Here the relations $\prec_{E_1}$ and $\prec_{P_1}$ are the relations in [9, Ch. 11]. Let $\mathcal{B}^n$ denote the Borel sets in $\mathbb{R}^n$ and call $B \in \mathcal{B}^n$ Schur convex if $I_B \in SC$. The following characterization of $\prec$ in $P(\mathbb{R}^n)$ results.

Theorem 7. For $\mu, \nu \in P(\mathbb{R}^n)$ the following are equivalent:
(i) $\nu \prec \mu$;
(ii) There exists a Markov kernel $m$ on $\mathbb{R}^n$ such that $\nu = \mu m$ and $m(x) \in \Delta(x)$, $\mu$ almost every $x \in \mathbb{R}^n$;
(iii) There exists a probability measure $\lambda \in P(\mathbb{R}^{2n})$ with $\mathrm{supp}\,\lambda \subset K$ with first marginal $\mu$ and second marginal $\nu$;
(iv) There exists a real valued random variable $Z$ and two measurable functions $f, g : \mathbb{R} \to \mathbb{R}^n$ with $f \prec g$ (i.e., $f(t) \prec g(t)$, $t \in \mathbb{R}$) such that the distribution of $f(Z)$ is $\nu$ and the distribution of $g(Z)$ is $\mu$;
(v) There exist $\mathbb{R}^n$ valued random variables $Y$ and $X$ such that $X \prec_{P_1} Y$ and the distribution of $X$ is $\nu$ and the distribution of $Y$ is $\mu$;
(vi) $\nu(B) \le \mu(B)$ for every Schur convex set $B \in \mathcal{B}^n$.

Proof: This result follows from Theorem 2 by noting that by Theorem 4, $\Gamma$ is compact valued and the graph $K$ is closed and hence is a Borel set, which by [7, Theorem 3] implies that the multifunction $\Gamma$ is Borel measurable.

References

1. P. Billingsley, “Convergence of Probability Measures,” Wiley, New York, 1968.
2. P. Billingsley, “Convergence of Probability Measures,” 2nd ed., Wiley, New York, 1999.
3. J. L. Doob, “Measure Theory,” Springer-Verlag, New York, 1994.
4. M. L. Eaton and M. D. Perlman, Reflection groups, generalized Schur functions, and the geometry of majorization, Annals of Probability, 5 (1977), 829-860.
5. W. Hildenbrand, “Core and Equilibria of a Large Economy,” Princeton University Press, Princeton, 1974.
6. C. J. Himmelberg and F. S. Van Vleck, Multifunctions with values in a space of probability measures, Journal of Mathematical Analysis and Applications, 50 (1975), 108-112.
7. C. J. Himmelberg, T. Parthasarathy, and F. S. Van Vleck, Optimal plans for dynamic programming problems, Mathematics of Operations Research, 1 (1976), 390-394.
8. T. Kamae, U. Krengel, and G. L. O’Brien, Stochastic inequalities on partially ordered spaces, Annals of Probability, 5 (1977), 899-912.
9. A. W. Marshall and I. Olkin, “Inequalities: Theory of Majorization and Its Applications,” Academic Press, New York, 1979.
10. M. Shaked and J. G. Shanthikumar, “Stochastic Orders and Their Applications,” Academic Press, New York, 1994.
11. V. Strassen, The existence of probability measures with given marginals, Annals of Mathematical Statistics, 36 (1965), 423-439.