Stochastic Majorization: A Characterization

by David C. Nachman
Department of Finance
J. Mack Robinson College of Business
Georgia State University
Atlanta, Georgia 30303-3083

September, 2005

Abstract

Stochastic majorization is a pre-order on the space of probability measures on a finite dimensional Euclidean space induced by the majorization pre-order on this underlying space. Taking advantage of techniques used in mathematical economics and the continuity properties of the majorization relation, we provide a general characterization of stochastic majorization.

Keywords: Majorization, stochastic majorization, Schur-convex functions and sets, continuous correspondences.

1. Introduction

The applications of majorization and various notions of stochastic majorization are extensive in mathematics, especially in probability and statistics. The excellent treatise by Marshall and Olkin (1979) presents the theory of majorization and an extensive display of its applications and extensions. See this treatise for appropriate references to the original work. In particular, Marshall and Olkin, 1979, Chapter 11, presents and characterizes various notions of stochastic majorization. The one of main import here is the one induced by the cone of Schur-convex functions.

Kamae et al., 1977, Theorem 1, present a general characterization of the partial ordering of probability measures induced by a partial ordering on the underlying space. Majorization is a pre-order but not a partial order since it is not antisymmetric (Marshall and Olkin, 1979, 1.B). In this paper, we exploit the continuity properties of majorization and theorems of Strassen (1965) and Himmelberg and Van Vleck (1975) to provide a characterization of stochastic majorization in the style of Kamae et al. The basics, including the continuity properties, are presented in section 2. To this author's knowledge, these continuity properties have not been noticed.
The characterization of stochastic majorization is presented in section 3. To this author's knowledge, this characterization is new as well. We borrow much from Marshall and Olkin (1979).

2. Majorization

Let $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ be n-tuples of real numbers and let $x_{\downarrow}$ and $y_{\downarrow}$ denote the vectors rearranged in decreasing order, i.e., with coordinates $x_{[1]} \ge \dots \ge x_{[n]}$ and $y_{[1]} \ge \dots \ge y_{[n]}$. The vector $y$ is majorized by the vector $x$ (or $x$ majorizes $y$), written $y \prec x$, if for each $k = 1, \dots, n$,

$$\sum_{i=1}^{k} y_{[i]} \le \sum_{i=1}^{k} x_{[i]},$$

with equality holding for $k = n$ (Marshall and Olkin, 1979, A.1, p. 7).

In words, $y$ is majorized by $x$ if the components of $y$ are more evenly spread out than the components of $x$, or the components of $x$ are more concentrated than the components of $y$. This intuition is reinforced by noting the following. Let $e = (1, \dots, 1)$, the n-tuple whose coordinates are all equal to one. Then for a vector $x$ the inner product $x \cdot e$ is the sum of the components of $x$. Let $\bar{x} = (x \cdot e / n, \dots, x \cdot e / n)$ and let $x^k = (0, \dots, x \cdot e, 0, \dots, 0)$, where $x \cdot e$ appears in the $k$th component. The vectors $x$, $\bar{x}$, and $x^k$ all have the same total sum of components, but the components of $\bar{x}$ are more evenly spread out than those of $x$. Clearly $x^k$ concentrates this sum in one component. In this sense, $\bar{x}$ is the most evenly spread of this sum of components and $x^k$ is the most concentrated of this sum. Indeed, we have that $\bar{x} \prec x \prec x^k$, $k = 1, \dots, n$, where the second relation requires the components of $x$ to be nonnegative.

We note that the majorization relation is reflexive and transitive (established below in Lemma 3) and hence is a pre-ordering. It is not a partial ordering, however, since it is not antisymmetric.

Let $R^n$ denote n-dimensional Euclidean space. All topological properties in the sequel will be with respect to the usual metric on $R^n$. Let $\Pi$ denote the set of $n \times n$ permutation matrices and let $D$ denote the set of $n \times n$ doubly stochastic matrices. Then $M \in \Pi$ if and only if there is exactly one entry equal to one in each row and each column of $M$ and all other entries are zero.
Similarly, $M \in D$ if and only if the entries in $M$ are nonnegative and each row and each column sum to one.

Theorem 1. For $x, y \in R^n$, the following are equivalent:
i. $y \prec x$;
ii. $y = xP$, some $P \in D$;
iii. $y = \sum_i \alpha_i\, x \Pi_i$, for some $\alpha_i \ge 0$, $\sum_i \alpha_i = 1$, and some $\Pi_i \in \Pi$.

Proof: The equivalence of i and ii is due to Hardy, Littlewood and Polya. See Marshall and Olkin, 1979, Theorem 2.B.2. The equivalence of ii and iii is due to Birkhoff. See Marshall and Olkin, 1979, Theorem 2.A.2. $\square$

For each $x \in R^n$ let $\gamma(x) = \{ y \in R^n : y \prec x \}$, the set of n-tuples that are majorized by $x$. For a picture of this set in the case n = 3 see Marshall and Olkin, 1979, Figure 3, p. 9. Let $\Gamma = \{ (y, x) \in R^n \times R^n : y \prec x \}$, the graph of the relation $\prec$. The following are properties of this relation.

Theorem 2. $\gamma$ is a compact convex valued continuous correspondence in $R^n$. Consequently $\Gamma$ is closed in $R^n \times R^n$.

Proof: Clearly for each $x \in R^n$, $x \in \gamma(x)$, so $\gamma$ is a correspondence in terms of Hildenbrand, 1974, p. 5. $\gamma(x)$ is convex, by Theorem 1.ii (convex combinations of doubly stochastic matrices are doubly stochastic), and compact since, by Theorem 1.iii, it is the convex polyhedron generated by the finite number of permutations of $x$.

Suppose $x \in R^n$ and $x_k \in R^n$ with $x = \lim_k x_k$. If $y \in \gamma(x)$, by Theorem 1.ii, $y = xP$, some $P \in D$. Then again by Theorem 1.ii, $y_k = x_k P \in \gamma(x_k)$ and $y = \lim_k y_k$, so $\gamma$ is lower hemi-continuous (Hildenbrand, 1974, Theorem 2, p. 27).

Let $y_k = x_k P_k \in \gamma(x_k)$ with $P_k \in D$ arbitrary. By Birkhoff's theorem $D$ is the convex polyhedron generated by the permutation matrices and hence is compact in $R^{n^2}$. Thus there is a subsequence of the $P_k$ that converges to an element $P \in D$. For this subsequence indexed by $k'$, $\lim_{k'} y_{k'} = xP \in \gamma(x)$. Thus $\gamma$ is upper hemi-continuous (Hildenbrand, 1974, Theorem 1, p. 24). The closure of $\Gamma$ then follows by the same result. $\square$

As obvious as these properties of $\gamma$ are, except for convexity of $\gamma(x)$, they appear nowhere in the literature on majorization to this author's knowledge.
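The partial-sum definition of majorization and the doubly stochastic representation of Theorem 1 can be checked numerically on a small case. The following is only a minimal sketch, assuming NumPy is available; the function names `majorizes` and `is_doubly_stochastic`, the tolerance, and the sample vectors are illustrative choices and do not come from the paper.

```python
import numpy as np

def majorizes(x, y, tol=1e-9):
    """Return True if y is majorized by x, using the partial-sum definition."""
    xs = np.sort(np.asarray(x, dtype=float))[::-1]  # x in decreasing order
    ys = np.sort(np.asarray(y, dtype=float))[::-1]  # y in decreasing order
    if xs.shape != ys.shape:
        raise ValueError("x and y must have the same length")
    px, py = np.cumsum(xs), np.cumsum(ys)
    # Inequality for k = 1..n-1, equality of total sums for k = n.
    return bool(np.all(py[:-1] <= px[:-1] + tol) and abs(py[-1] - px[-1]) <= tol)

def is_doubly_stochastic(P, tol=1e-9):
    """Nonnegative entries, every row and every column summing to one."""
    P = np.asarray(P, dtype=float)
    return bool(np.all(P >= -tol)
                and np.allclose(P.sum(axis=0), 1.0, atol=tol)
                and np.allclose(P.sum(axis=1), 1.0, atol=tol))

# Illustrative data: a convex combination of permutation matrices (Theorem 1.iii)
# is doubly stochastic, and y = xP is then majorized by x (Theorem 1.ii).
x = np.array([4.0, 1.0, 0.0])
identity = np.eye(3)
swap = identity[[1, 0, 2]]    # permutation exchanging the first two coordinates
cycle = identity[[1, 2, 0]]   # cyclic permutation
P = 0.5 * identity + 0.3 * swap + 0.2 * cycle
y = x @ P

x_bar = np.full(3, x.sum() / 3)          # most evenly spread vector
x_conc = np.array([x.sum(), 0.0, 0.0])   # sum concentrated in one component

print(is_doubly_stochastic(P), majorizes(x, y), majorizes(y, x))
print(majorizes(x, x_bar), majorizes(x_conc, x))
```

Sorting and cumulative sums keep the check at O(n log n) and mirror the definition directly; the tolerance only guards against floating-point rounding in the equality of total sums.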
The following result establishes the transitivity of majorization and will be used later in the characterization of stochastic majorization.

Lemma 3. For $x, y \in R^n$, if $y \prec x$, then $\gamma(y) \subset \gamma(x)$.

Proof: For $x, y \in R^n$, suppose $y \prec x$ and $z \prec y$. Then by Theorem 1.ii $z = y\hat{P}$ and $y = xP$, some $P, \hat{P} \in D$. But then $z = xP\hat{P}$ and $P\hat{P} \in D$ (Marshall and Olkin, 1979, 2.A.3, p. 20). Again by Theorem 1.ii, $z \prec x$. $\square$

We are interested in probability measures on $R^n$. Let $\mathcal{B}^n$ denote the Borel sets in $R^n$. Let $M_1(R^n)$ denote the set of probability measures on $\mathcal{B}^n$ endowed with the topology of weak convergence (Hildenbrand, 1974, pp. 48-53). For the rest of the paper we drop the reference space and just write $M_1$. However, the reference space is different and will be mentioned explicitly for one part of the characterization of stochastic majorization in Section 3.

For $\mu \in M_1$ denote by $\mathrm{supp}(\mu)$ the support of $\mu$, the smallest closed subset of $R^n$ with measure one (Chung, 1974, p. 31). For each $x \in R^n$, let $\hat{\gamma}(x) = \{ \mu \in M_1 : \mathrm{supp}(\mu) \subset \gamma(x) \}$. Let $\hat{\Gamma} = \{ (x, \mu) \in R^n \times M_1 : \mu \in \hat{\gamma}(x) \}$, the graph of the correspondence $\hat{\gamma}$.

Theorem 4. $\hat{\gamma}$ is a compact convex valued continuous correspondence in $M_1$. The graph $\hat{\Gamma}$ is closed in $R^n \times M_1$.

Proof: Let $x \in R^n$ and denote by $\delta_x$ the probability measure with $\mathrm{supp}(\delta_x) = \{x\}$. Then $\delta_x \in \hat{\gamma}(x)$. Let $\mu, \nu \in \hat{\gamma}(x)$ and let $\alpha \in [0, 1]$. Then $\alpha \mu(\gamma(x)) + (1 - \alpha) \nu(\gamma(x)) = 1$, so $\mathrm{supp}(\alpha \mu + (1 - \alpha) \nu) \subset \gamma(x)$. Thus $\hat{\gamma}$ is convex valued. By Himmelberg and Van Vleck, 1975, Theorem 3.i, $\hat{\gamma}$ inherits the continuity and compact valuedness of $\gamma$ established in Theorem 2. The closure of $\hat{\Gamma}$ follows from Hildenbrand, 1974, Theorem 1, p. 24. $\square$

We use the correspondence $\hat{\gamma}$ to characterize stochastic majorization in the next section.

3. Stochastic Majorization

The functions $f : R^n \to R$ that are increasing (non-decreasing) in the majorization relation are called Schur-convex. See Marshall and Olkin, 1979, Ch. 1.D, Ch. 3, for the origins of this terminology and the characterizations of this class of functions. Denote by $SC$ the class of Borel measurable Schur-convex functions.
The measurability requirement is a restriction (Marshall and Olkin, 1979, 3.C.4, p. 70).

We can extend the relation $\prec$ in $R^n$ to a relation in $M_1$. For $\mu, \nu \in M_1$ we say that $\nu$ majorizes $\mu$ (or that $\mu$ is majorized by $\nu$), and write $\mu \prec \nu$, if and only if

$$\int f \, d\mu \le \int f \, d\nu$$

for every $f \in SC$ for which both these integrals exist. The range of integration is all of $R^n$ unless specifically mentioned. This is truly an extension since for $x, y \in R^n$, $y \prec x$ in $R^n$ if and only if $\delta_y \prec \delta_x$ in $M_1$. Intuitively, $\mu \prec \nu$ if $\nu$ puts more weight on vectors that are extreme in the relation $\prec$ on $R^n$ than does $\mu$.

The relation $\prec$ in $M_1$ is the version of stochastic majorization $E_1$ studied in Marshall and Olkin, 1979, Ch. 11. There, definitions are given in terms of $R^n$ valued random vectors, say $X$ and $Y$. The relation $E_1$ is then stated as $Y \prec_{E_1} X$ ($X$ stochastically majorizes $Y$, or $Y$ is stochastically majorized by $X$, in the sense of $E_1$) if $E f(Y) \le E f(X)$ for all $f \in SC$ for which these expectations exist. It is easy to see that this is equivalent to the above definition, since these expectations are given by integration with respect to the distributions in $R^n$ of these random vectors and, given these distributions, there are $R^n$ valued random variables with these distributions.

There is another definition of stochastic majorization, which Marshall and Olkin, 1979, pp. 282-283, call $P_1$, that implies $E_1$ and appears ostensibly to be stronger than $E_1$. There $Y \prec_{P_1} X$ if $f(Y) \le_{st} f(X)$ for all $f \in SC$, where $\le_{st}$ is the typical meaning of stochastically larger (Marshall and Olkin, 1979, 17.A.1). Clearly $P_1 \Rightarrow E_1$ since stochastically larger random variables have larger expectations. It turns out that in this particular case we also have $E_1 \Rightarrow P_1$. See the argument in Marshall and Olkin, 1979, top of p. 283. We will use this argument to show one part of the characterization of the relation $\prec$ in $M_1$ defined above.

A Markov kernel on $R^n$ is a map $m : R^n \times \mathcal{B}^n \to [0, 1]$ such that for each set $B \in \mathcal{B}^n$ the map $x \mapsto m(x, B)$ is Borel measurable and for $x \in R^n$ fixed $m_x = m(x, \cdot) \in M_1$.
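For such a Markov kernel $m$ and a probability measure $\nu \in M_1$ denote by $\nu \otimes m$ the element of $M_1(R^{2n})$ defined by

$$(\nu \otimes m)(A \times B) = \int_A m(x, B) \, \nu(dx)$$

for measurable rectangles $A \times B$, $A, B \in \mathcal{B}^n$. We say that the first marginal of $\nu \otimes m$ is $\nu$ and denote the second marginal by $\nu m$. Finally, we say that a set $B \in \mathcal{B}^n$ is Schur-convex if its indicator function is Schur-convex. These designations are borrowed from Kamae et al., 1977, pp. 899-900.

The following characterization of the relation $\prec$ on $M_1$ is new and fleshes out the intuition given above.

Theorem 5. For $\mu, \nu \in M_1$ the following are equivalent:
i. $\mu \prec \nu$;
ii. There exists a Markov kernel $m$ on $R^n$ such that $\mu = \nu m$ and $m_x \in \hat{\gamma}(x)$, $\nu$-almost every $x \in R^n$;
iii. There exists a probability measure $\lambda \in M_1(R^{2n})$ with $\mathrm{supp}(\lambda) \subset \Gamma$, with first marginal $\mu$ and second marginal $\nu$;
iv. There exist a real valued random variable $Z$ and two measurable functions $f, g : R \to R^n$ with $f \prec g$ ($f(t) \prec g(t)$, $t \in R$) such that the distribution of $f(Z)$ is $\mu$ and the distribution of $g(Z)$ is $\nu$;
v. There exist $R^n$ valued random variables $Y$ and $X$ such that $Y \prec_{P_1} X$ and the distribution of $Y$ is $\mu$ and the distribution of $X$ is $\nu$;
vi. $\mu(B) \le \nu(B)$ for every Schur-convex set $B \in \mathcal{B}^n$.

Proof: The key equivalence is i. and ii. The rest follow easily. Let $\mu, \nu \in M_1$ and assume that ii holds. Let $f \in SC$ be such that the integrals $\int f \, d\mu$, $\int f \, d\nu$ exist. Then

$$\int f \, d\mu = \int \int f(y) \, m(x, dy) \, \nu(dx) \le \int f \, d\nu,$$

since $\int f(y) \, m(x, dy) \le f(x)$, $\nu$-almost every $x \in R^n$, because $m_x$ is supported in $\gamma(x)$ and $f(y) \le f(x)$ for $y \in \gamma(x)$. This establishes i.

Therefore assume i. For every bounded continuous function $z : R^n \to R$ define

$$h(x, z) = \sup \left\{ \int z \, d\eta : \eta \in \hat{\gamma}(x) \right\}.$$

By Theorem 4 and Hildenbrand, 1974, Corollary p. 30, $h(\cdot, z)$ is continuous in $x$ and for each $x \in R^n$, $z(x) \le h(x, z) \le \sup z$, since $\delta_x \in \hat{\gamma}(x)$ and $\int z \, d\delta_x = z(x)$. Thus $h(\cdot, z)$ is bounded as well, so all integrals below exist. Finally, $h(\cdot, z)$ is also Schur-convex in $x$. For if $x, y \in R^n$ and $y \prec x$ then by Lemma 3 $\gamma(y) \subset \gamma(x)$, implying that $\hat{\gamma}(y) \subset \hat{\gamma}(x)$, and hence $h(y, z) \le h(x, z)$. It follows that

$$\int z \, d\mu \le \int h(x, z) \, \mu(dx) \le \int h(x, z) \, \nu(dx),$$

the last inequality from i. Condition ii then follows from Strassen, 1965, Theorem 3.

Assume ii and let $\lambda$ be the measure $\nu \otimes m$ with its coordinates written in the order $(y, x)$. Then $\lambda(\Gamma) = \int m(x, \Gamma_x) \, \nu(dx) = 1$, since the $x$-section $\Gamma_x = \gamma(x)$ for every $x$.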
From Theorem 2, $\Gamma$ is closed in $R^{2n}$, so $\mathrm{supp}(\lambda) \subset \Gamma$. This gives iii.

Therefore assume iii. The construction in Kamae et al., 1977, Theorem 1 (iii), goes through here as well and this gives iv.

Assuming iv let $Y = f(Z)$ and $X = g(Z)$. Then clearly $Y \prec_{E_1} X$, and v. follows from the fact that $Y \prec_{E_1} X \Rightarrow Y \prec_{P_1} X$ (Marshall and Olkin, 1979, top of p. 283).

Therefore assume v. If $B \in \mathcal{B}^n$ is Schur-convex, then $\mu(B) = E\, I_B(Y) \le E\, I_B(X) = \nu(B)$, where $I_B$ is the indicator of the set $B$ and the inequality follows from the fact that $I_B \in SC$ and so $I_B(Y) \le_{st} I_B(X)$.

It remains to show that vi implies i. Assume vi. For $f \in SC$, $I_{\{x \in R^n : f(x) > t\}} \in SC$ for all real $t$. It follows from vi that $\mu\{x \in R^n : f(x) > t\} \le \nu\{x \in R^n : f(x) > t\}$ and hence i. follows from Marshall and Olkin, 1979, 17.A.1. $\square$

Kamae et al. (1977) use a theorem of Strassen (1965) to characterize stochastic orderings induced by a partial order on the underlying space. Their result, Kamae et al., 1977, Theorem 1, is the model for Theorem 5 above, but the relevant theorem of Strassen used in the proof of Theorem 5 is not the one used by Kamae et al. (1977). We emphasize here that the relation $\prec$ on $R^n$ is not a partial order.

The crucial implication i. implies ii. in Theorem 5 relies on Theorem 3 of Strassen (1965), and this theorem applies more generally. It can be used to obtain the same implication for any pre-order on any Polish space that is sufficiently regular for the function $h(x, z)$, defined above in the second paragraph of the proof of Theorem 5, to be Borel measurable in $x$. Weaker conditions than those of Theorem 2 above suffice for this function to be Borel measurable. See for example Hildenbrand, 1974, Proposition 3, p. 60. The result Hildenbrand, 1974, Corollary p. 30, is referred to in the mathematical economics literature as the maximum theorem and is used there to establish continuity of consumer demand in various situations. Transitivity of the pre-order gives monotonicity of $h(\cdot, z)$ in the pre-order.
Reflexivity of the pre-order gives $\delta_x \in \hat{\gamma}(x)$, and $\hat{\gamma}(x)$ is convex whether $\gamma(x)$ is or not. This convex valuedness is essential to apply Strassen, 1965, Theorem 3, but comes at no cost.

Theorem 5 is reminiscent of the characterization of dilations as given for example in Phelps, 1966, Ch. 13. In this case, a dilation moves probability weight toward extreme points of a compact convex set. Here the Markov kernel $m$ of Theorem 5.ii moves probability weight away from the point $x$ (which is an extreme point of $\gamma(x)$) to less extreme points, in the sense of majorization, in $\gamma(x)$. Borrowing with a little license from the terminology in Kamae et al., 1977, p. 900, we could call the Markov kernel $m$ of Theorem 5.ii downward.

As in the case of dilations, it is natural to ask about maximal measures for the relation $\prec$ on $M_1(X)$, where $X$ is a compact convex subset of $R^n$, and the support of these measures if they exist. This of course is complicated by the fact that $\prec$ is only a pre-order and not a partial order. This is a project for future research.

REFERENCES

Chung, K. L. (1974), A Course in Probability Theory (Academic Press, New York, 2nd ed.).

Hildenbrand, W. (1974), Core and Equilibria of a Large Economy (Princeton University Press, Princeton).

Himmelberg, C. J. and Van Vleck, F. S. (1975), Multifunctions with values in a space of probability measures. J. Math. Anal. Appls. 50, 108-112.

Kamae, T., Krengel, U. and O'Brien, G. L. (1977), Stochastic inequalities on partially ordered spaces. Ann. Probab. 5, 899-912.

Marshall, A. W. and Olkin, I. (1979), Inequalities: Theory of Majorization and Its Applications (Academic Press, New York).

Phelps, R. R. (1966), Lectures on Choquet's Theorem (Van Nostrand, Princeton).

Strassen, V. (1965), The existence of probability measures with given marginals. Ann. Math. Statist. 36, 423-439.
Derivation of Theorem 5.iv from Theorem 5.iii (taken from the derivation of Theorem 1 (iii) from Theorem 1 (ii) in Kamae et al., 1977).

The probability space $(K, \lambda)$, where $K \subset \Gamma$ carries the measure $\lambda$ of Theorem 5.iii, is isomorphic mod 0 to $(B, \mathcal{B}, P)$, where $B$ is a Borel subset of $R^1$, $\mathcal{B}$ is the collection of Borel subsets of $B$, and $P$ is a probability measure on $(B, \mathcal{B})$. The reference for this result is given by Kamae et al., 1977, p. 900. Let $Z : K \to B$ be the isomorphism and let $f = p_1 \circ Z^{-1}$ and $g = p_2 \circ Z^{-1}$, where $p_1$ and $p_2$ are the projections of $R^n \times R^n$ onto the first and second factors. This defines $f$ and $g$ on $B$. For $t \in R$ but $t \notin B$, take $f(t) = g(t) = 0 \in R^n$. For each $t \in B$, $Z^{-1}(t) = (y, x) \in K$, and $f(t) = p_1 Z^{-1}(t) = y$ and $g(t) = p_2 Z^{-1}(t) = x$, and $y \prec x$. Also $f \circ Z(y, x) = y$ and $g \circ Z(y, x) = x$, and thus the distribution of $f(Z)$ is $\mu$ and the distribution of $g(Z)$ is $\nu$.
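A finite analogue of this construction may help fix ideas. In the sketch below, which assumes NumPy and uses made-up data, the coupling $\lambda$ sits on three pairs $(y_t, x_t)$ with $y_t \prec x_t$, the random variable $Z$ takes the values 0, 1, 2 with the given weights, and $f(t) = y_t$, $g(t) = x_t$. The pairs, the weights, the helper names (`majorizes`, `expect`) and the three test functions are all hypothetical choices for illustration. The code then checks the inequalities of Theorem 5.i and 5.vi for two Schur-convex functions (sum of squares and largest component) and for the indicator of one Schur-convex set.

```python
import numpy as np

def majorizes(x, y, tol=1e-9):
    """Return True if y is majorized by x (partial sums of decreasing rearrangements)."""
    xs = np.sort(np.asarray(x, dtype=float))[::-1]
    ys = np.sort(np.asarray(y, dtype=float))[::-1]
    px, py = np.cumsum(xs), np.cumsum(ys)
    return bool(np.all(py[:-1] <= px[:-1] + tol) and abs(py[-1] - px[-1]) <= tol)

# Hypothetical finite coupling: atoms (y_t, x_t) of lambda, each with y_t majorized by x_t.
pairs = [
    (np.array([2.0, 2.0, 2.0]), np.array([3.0, 2.0, 1.0])),
    (np.array([3.0, 2.0, 1.0]), np.array([6.0, 0.0, 0.0])),
    (np.array([1.0, 1.0, 0.0]), np.array([2.0, 0.0, 0.0])),
]
weights = np.array([0.5, 0.3, 0.2])  # law of Z on the index set {0, 1, 2}

def f(t):
    """First coordinate of the atom indexed by t; f(Z) has distribution mu."""
    return pairs[t][0]

def g(t):
    """Second coordinate of the atom indexed by t; g(Z) has distribution nu."""
    return pairs[t][1]

def expect(h, component):
    """Exact expectation of h(component(Z)) under the finite law of Z."""
    return float(sum(w * h(component(t)) for t, w in enumerate(weights)))

# Symmetric convex functions are Schur-convex; the third is the indicator of
# the Schur-convex set {v : max(v) > 2.5}.
test_functions = {
    "sum of squares": lambda v: float(np.sum(v ** 2)),
    "largest component": lambda v: float(np.max(v)),
    "indicator of {max > 2.5}": lambda v: float(np.max(v) > 2.5),
}

coupling_ok = all(majorizes(g(t), f(t)) for t in range(len(pairs)))
results = {name: (expect(h, f), expect(h, g)) for name, h in test_functions.items()}

print("f(t) majorized by g(t) for every t:", coupling_ok)
for name, (lhs, rhs) in results.items():
    print(f"{name}: integral against mu = {lhs:.3f}, against nu = {rhs:.3f}")
```

Because $f(t) \prec g(t)$ holds atom by atom, every Schur-convex function satisfies the inequality pointwise in $t$, and taking expectations preserves it; the exact finite sums above avoid any sampling error.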