Stochastic Majorization: A Characterization
by
David C. Nachman
Department of Finance
J. Mack Robinson College of Business
Georgia State University
Atlanta, Georgia 30303-3083
September, 2005
Abstract
Stochastic majorization is a pre-order on the space of probability measures on a finite
dimensional Euclidean space induced by the majorization pre-order on this underlying
space. Taking advantage of techniques used in mathematical economics and the
continuity properties of the majorization relation, we provide a general characterization
of stochastic majorization.
Keywords: Majorization, stochastic majorization, Schur-convex functions and sets,
continuous correspondences.
1. Introduction
The applications of majorization and various notions of stochastic majorization are
extensive in mathematics, especially in probability and statistics. The excellent treatise by
Marshall and Olkin (1979) presents the theory of majorization and an extensive display
of its applications and extensions. See this treatise for appropriate references to the
original work. In particular, Marshall and Olkin, 1979, Chapter 11, presents and
characterizes various notions of stochastic majorization. The major one of import here is
the one induced by the cone of Schur-convex functions.
Kamae et al., 1977, Theorem 1, present a general characterization of the partial
ordering of probability measures induced by a partial ordering on the underlying space.
Majorization is a pre-order but not a partial order since it is not antisymmetric (Marshall
and Olkin, 1979, 1.B). In this paper, we exploit the continuity properties of majorization
and theorems of Strassen (1965) and Himmelberg and Van Vleck (1975) to provide a
characterization of stochastic majorization in the style of Kamae et al. The basics, including the
continuity properties, are presented in Section 2. To this author’s knowledge, these
continuity properties have not been noticed before. The characterization of stochastic
majorization is presented in Section 3. To this author’s knowledge, this characterization is
new as well. We borrow much from Marshall and Olkin (1979).
2. Majorization
Let $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ be n-tuples of real numbers and let $(x_{[1]}, \dots, x_{[n]})$ and $(y_{[1]}, \dots, y_{[n]})$ denote the vectors $x$ and $y$ with coordinates rearranged in decreasing order, i.e., $x_{[1]} \ge \dots \ge x_{[n]}$ and $y_{[1]} \ge \dots \ge y_{[n]}$. The vector $y$ is majorized by the vector $x$ (or $x$ majorizes $y$), written $y \prec x$, if for each $k = 1, \dots, n$,
$\sum_{i=1}^{k} y_{[i]} \le \sum_{i=1}^{k} x_{[i]}$, with equality holding for $k = n$ (Marshall and Olkin, 1979, A.1, p. 7).
k
In words, $y$ is majorized by $x$ if the components of $y$ are more evenly spread out than the components of $x$ or the components of $x$ are more concentrated than the components of $y$. This intuition is reinforced by noting the following. Let $e = (1, \dots, 1)$, the n-tuple whose coordinates are all equal to one. Then for a vector $x$ the inner product $x \cdot e$ is the sum of the components of $x$. Let $\bar{x} = (x \cdot e / n, \dots, x \cdot e / n)$ and let $x^k = (0, \dots, x \cdot e, 0, \dots, 0)$, where $x \cdot e$ appears in the $k$th component. The vectors $x$, $\bar{x}$, and $x^k$ all have the same total sum of components, but the components of $\bar{x}$ are more evenly spread out than those of $x$. Clearly $x^k$ concentrates this sum in one component. In this sense, $\bar{x}$ is the most evenly spread of this sum of components and, among vectors with nonnegative components, $x^k$ is the most concentrated of this sum. Indeed, $\bar{x} \prec x$ for every $x$, and $x \prec x^k$, $k = 1, \dots, n$, whenever the components of $x$ are nonnegative.
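The partial-sum definition and the two bounds just described can be checked mechanically. The following Python sketch is illustrative only: the helper names `is_majorized`, `mean_vector`, and `concentrated_vector` are hypothetical and are introduced here, not taken from the references. Exact rational arithmetic is used so that the equality required at $k = n$ is tested without rounding.

```python
from fractions import Fraction


def is_majorized(y, x):
    """Return True when y is majorized by x (partial-sum definition)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    xs = sorted(x, reverse=True)  # x_[1] >= ... >= x_[n]
    ys = sorted(y, reverse=True)  # y_[1] >= ... >= y_[n]
    sum_x = sum_y = 0
    for a, b in zip(xs, ys):
        sum_x += a
        sum_y += b
        if sum_y > sum_x:  # a partial sum of y exceeds that of x
            return False
    return sum_x == sum_y  # totals must agree (the case k = n)


def mean_vector(x):
    """The vector whose coordinates all equal the average of x."""
    return [Fraction(sum(x), len(x))] * len(x)


def concentrated_vector(x, k):
    """The vector carrying the whole sum of x in coordinate k (0-based)."""
    v = [0] * len(x)
    v[k] = sum(x)
    return v


x = [5, 1, 3]
print(is_majorized(mean_vector(x), x))  # True: the mean vector lies below x
print(all(is_majorized(x, concentrated_vector(x, k)) for k in range(3)))  # True
print(is_majorized([2, -1], [1, 0]))  # False: a negative entry breaks the upper bound
```

The last line shows why the upper bound is stated for nonnegative components: the vector (2, -1) has sum one but its largest component exceeds one.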
We note that the majorization relation is reflexive and transitive (established
below in Lemma 3) and hence is a pre-ordering. It is not a partial ordering, however,
since it is not antisymmetric.
Let $R^n$ denote n-dimensional Euclidean space. All topological properties in the
sequel will be with respect to the usual metric on $R^n$. Let $\Pi$ denote the set of $n \times n$
permutation matrices and let $D$ denote the set of $n \times n$ doubly stochastic matrices. Then
$M \in \Pi$ if and only if there is exactly one entry equal to one in each row and each column of $M$ and all other
entries are zero. Similarly, $M \in D$ if and only if the entries in $M$ are nonnegative and
each row and each column sum to one.
Theorem 1. For $x, y \in R^n$, the following are equivalent:
i. $y \prec x$;
ii. $y = xP$, some $P \in D$;
iii. $y = x \left( \sum_i \alpha_i \Pi_i \right)$, for some $\alpha_i \ge 0$, $\sum_i \alpha_i = 1$, and some $\Pi_i \in \Pi$.
Proof: The equivalence of i and ii is due to Hardy, Littlewood and Polya. See Marshall
and Olkin, 1979, Theorem 2.B.2. The equivalence of ii and iii is due to Birkhoff. See
Marshall and Olkin, 1979, Theorem 2.A.2. ∎
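Theorem 1 suggests a direct way to produce vectors majorized by a given $x$: mix a few permutation matrices with convex weights and multiply. The sketch below does this with exact fractions. It is only an illustration; the helper names, the three permutations, and the weights 1/2, 1/3, 1/6 are arbitrary choices made here.

```python
from fractions import Fraction


def permutation_matrix(perm):
    """Matrix with a one in row i, column perm[i], and zeros elsewhere."""
    n = len(perm)
    return [[1 if perm[i] == j else 0 for j in range(n)] for i in range(n)]


def convex_combination(weights, matrices):
    """Entrywise sum of weights[i] * matrices[i]."""
    n = len(matrices[0])
    return [[sum(w * M[i][j] for w, M in zip(weights, matrices))
             for j in range(n)] for i in range(n)]


def row_times_matrix(x, P):
    """The row vector xP."""
    n = len(x)
    return [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]


def is_doubly_stochastic(P):
    n = len(P)
    nonnegative = all(P[i][j] >= 0 for i in range(n) for j in range(n))
    rows = all(sum(P[i]) == 1 for i in range(n))
    cols = all(sum(P[i][j] for i in range(n)) == 1 for j in range(n))
    return nonnegative and rows and cols


def is_majorized(y, x):
    """Partial-sum test for y majorized by x."""
    xs, ys = sorted(x, reverse=True), sorted(y, reverse=True)
    sum_x = sum_y = 0
    for a, b in zip(xs, ys):
        sum_x += a
        sum_y += b
        if sum_y > sum_x:
            return False
    return sum_x == sum_y


# Condition iii: a convex combination of permutation matrices.
weights = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
perms = [(0, 1, 2), (0, 2, 1), (1, 0, 2)]
P = convex_combination(weights, [permutation_matrix(p) for p in perms])

x = [5, 1, 3]
y = row_times_matrix(x, P)  # condition ii: y = xP

print(is_doubly_stochastic(P))  # True
print(y == [Fraction(13, 3), Fraction(7, 3), Fraction(7, 3)])  # True
print(is_majorized(y, x))  # True, as condition i requires
```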
For each $x \in R^n$ let $\gamma(x) = \{ y \in R^n : y \prec x \}$, the set of n-tuples that are majorized
by $x$. For a picture of this set in the case n = 3 see Marshall and Olkin, 1979, Figure 3, p.
9. Let $K = \{ (y, x) \in R^n \times R^n : y \in \gamma(x) \}$, the graph of the relation $\prec$. The following are
properties of this relation.
Theorem 2. $\gamma$ is a compact convex valued continuous correspondence in $R^n$.
Consequently $K$ is closed in $R^n \times R^n$.
Proof: Clearly for each $x \in R^n$, $x \in \gamma(x)$, so $\gamma$ is a correspondence in the terms of
Hildenbrand, 1974, p. 5. $\gamma(x)$ is convex, by Theorem 1.ii (convex combinations of
doubly stochastic matrices are doubly stochastic) and compact since, by Theorem 1.iii, it
is the convex polyhedron generated by the finite number of permutations of $x$.
Suppose $x \in R^n$ and $(x_k) \subset R^n$ with $x = \lim_k x_k$. If $y \in \gamma(x)$, by Theorem 1.ii,
$y = xP$, some $P \in D$. Then again by Theorem 1.ii, $y_k = x_k P \in \gamma(x_k)$ and $y = \lim_k y_k$, so
$\gamma$ is lower hemi-continuous (Hildenbrand, 1974, Theorem 2, p. 27).
Let $y_k = x_k P_k \in \gamma(x_k)$ with $P_k \in D$ arbitrary. By Birkhoff’s theorem $D$ is the
convex polyhedron generated by the permutation matrices and hence is compact in $R^{n^2}$.
Thus there is a subsequence of the $P_k$ that converges to an element $P \in D$. For this
subsequence indexed by $k'$, $\lim_{k'} y_{k'} = xP \in \gamma(x)$. Thus $\gamma$ is upper hemi-continuous
(Hildenbrand, 1974, Theorem 1, p. 24). The closure of $K$ then follows by the same
result. ∎
As obvious as these properties of $\gamma$ are, except for convexity of $\gamma(x)$, they
appear nowhere in the literature on majorization to this author’s knowledge. The
following result establishes the transitivity of majorization and will be used later in the
characterization of stochastic majorization.
Lemma 3. For $x, y \in R^n$, if $y \in \gamma(x)$, then $\gamma(y) \subset \gamma(x)$.
Proof: For $x, y \in R^n$, suppose $y \in \gamma(x)$ and $z \in \gamma(y)$. Then by Theorem 1.ii $z = y\hat{P}$ and
$y = xP$, some $P, \hat{P} \in D$. But then $z = xP\hat{P}$ and $P\hat{P} \in D$ (Marshall and Olkin, 1979,
2.A.3, p. 20). Again by Theorem 1.ii, $z \in \gamma(x)$. ∎
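The argument of Lemma 3 rests on the product of two doubly stochastic matrices being doubly stochastic. The short Python sketch below illustrates this on one example; the two averaging matrices and the vector are arbitrary choices made here, and the helper names are hypothetical.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Ordinary product of two square matrices of the same size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def row_times_matrix(x, P):
    """The row vector xP."""
    n = len(x)
    return [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]


def is_doubly_stochastic(P):
    n = len(P)
    nonnegative = all(P[i][j] >= 0 for i in range(n) for j in range(n))
    rows = all(sum(P[i]) == 1 for i in range(n))
    cols = all(sum(P[i][j] for i in range(n)) == 1 for j in range(n))
    return nonnegative and rows and cols


half = Fraction(1, 2)
P = [[half, half, 0], [half, half, 0], [0, 0, 1]]      # averages coordinates 1 and 2
P_hat = [[1, 0, 0], [0, half, half], [0, half, half]]  # averages coordinates 2 and 3

x = [6, 2, 1]
y = row_times_matrix(x, P)      # y = xP, so y is majorized by x
z = row_times_matrix(y, P_hat)  # z = y P_hat, so z is majorized by y

product = mat_mul(P, P_hat)
print(is_doubly_stochastic(product))        # True
print(z == row_times_matrix(x, product))    # True: z = x (P P_hat)
```

Since `z` is `x` times a doubly stochastic matrix, Theorem 1.ii places `z` among the vectors majorized by `x`, which is the inclusion the lemma asserts.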
We are interested in probability measures on $R^n$. Let $\mathcal{B}^n$ denote the Borel sets in
$R^n$. Let $M^1(R^n)$ denote the set of probability measures on $\mathcal{B}^n$ endowed with the
topology of weak convergence (Hildenbrand, 1974, pp. 48-53). For the rest of the paper we
drop the reference space and just write $M^1$. However, the reference space is different and
will be mentioned explicitly for one part of the characterization of stochastic majorization
in Section 3.
For $\mu \in M^1$ denote by $\mathrm{supp}(\mu)$ the support of $\mu$, the smallest closed subset of
$R^n$ with $\mu$ measure one (Chung, 1974, p. 31). For each $x \in R^n$, let
$\varphi(x) = \{ \mu \in M^1 : \mathrm{supp}(\mu) \subset \gamma(x) \}$. Let $\Phi = \{ (x, \mu) \in R^n \times M^1 : \mu \in \varphi(x) \}$, the graph of
the correspondence $\varphi$.
Theorem 4. $\varphi$ is a compact convex valued continuous correspondence in $M^1$. The graph
$\Phi$ is closed in $R^n \times M^1$.
Proof: Let $x \in R^n$ and denote by $\delta_x$ the probability measure with $\mathrm{supp}(\delta_x) = \{x\}$. Then
$\delta_x \in \varphi(x)$. Let $\mu, \nu \in \varphi(x)$ and let $\alpha \in [0,1]$. Then
$(\alpha\mu + (1-\alpha)\nu)(\gamma(x)) = \alpha\mu(\gamma(x)) + (1-\alpha)\nu(\gamma(x)) = 1$, so
$\mathrm{supp}(\alpha\mu + (1-\alpha)\nu) \subset \gamma(x)$.
Thus $\varphi$ is convex valued. By Himmelberg and Van Vleck, 1975, Theorem 3.i, $\varphi$ inherits
the continuity and compact valuedness of $\gamma$ established in Theorem 2. The closure of $\Phi$
follows from Hildenbrand, 1974, Theorem 1, p. 24. ∎
We use the correspondence $\varphi$ to characterize stochastic majorization in the next
section.
3. Stochastic Majorization
The functions $f : R^n \to R$ that are increasing (non-decreasing) in the
majorization relation, that is, $f(y) \le f(x)$ whenever $y \prec x$, are called Schur-convex. See Marshall and Olkin, 1979, Ch. 1.D,
Ch. 3, for the origins of this terminology and the characterizations of this class of
functions. Denote by $SC$ the class of Borel measurable Schur-convex functions. The
measurability requirement is a restriction (Marshall and Olkin, 1979, 3.C.4, p. 70). We
can extend the relation $\prec$ in $R^n$ to a relation in $M^1$. For $\mu, \nu \in M^1$ we say that $\nu$
majorizes $\mu$ (or that $\mu$ is majorized by $\nu$), and write $\mu \prec \nu$, if and only if
$\int f \, d\mu \le \int f \, d\nu$
for every $f \in SC$ for which both these integrals exist. The range of
integration is all of $R^n$ unless specifically mentioned. This is truly an extension since for
$x, y \in R^n$, $y \prec x$ in $R^n$ if and only if $\delta_y \prec \delta_x$ in $M^1$.
Intuitively, $\mu \prec \nu$ if $\nu$ puts more weight on vectors that are extreme in the
relation $\prec$ on $R^n$ than does $\mu$. The relation $\prec$ in $M^1$ is the version of stochastic
majorization $E_1$ studied in Marshall and Olkin, 1979, Ch. 11. There definitions are given
in terms of $R^n$ valued random vectors, say $X$ and $Y$. The relation $E_1$ is then stated as
$Y \prec_{E_1} X$ ($X$ stochastically majorizes $Y$ or $Y$ is stochastically majorized by $X$ in the
sense of $E_1$) if $E(f(Y)) \le E(f(X))$ for all $f \in SC$ for which these expectations exist.
It is easy to see that this is equivalent to the above definition since these expectations are
given by integration with respect to the distributions in $R^n$ of these random vectors and
given these distributions there are $R^n$ valued random variables with these distributions.
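Since both definitions are phrased through Schur-convex test functions, it may help to see the defining inequality checked numerically. The Python sketch below searches for a pair with $y = xP$, $P$ doubly stochastic, at which a candidate function decreases from $y$ to $x$. It is a sanity check rather than a proof; the helper names, the seed, and the sampling scheme are assumptions made here. The sum of squares and the largest component are standard examples of Schur-convex functions; the first coordinate is not Schur-convex because it is not symmetric.

```python
import random
from fractions import Fraction


def random_doubly_stochastic(n, terms, rng):
    """Convex combination of `terms` random permutation matrices."""
    raw = [rng.randint(1, 10) for _ in range(terms)]
    total = sum(raw)
    P = [[Fraction(0)] * n for _ in range(n)]
    for w in raw:
        perm = list(range(n))
        rng.shuffle(perm)
        for i in range(n):
            P[i][perm[i]] += Fraction(w, total)
    return P


def row_times_matrix(x, P):
    n = len(x)
    return [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]


def sum_of_squares(x):
    return sum(v * v for v in x)


def largest(x):
    return max(x)


def first_coordinate(x):
    return x[0]


def find_violation(f, trials=200, n=4, seed=0):
    """Return a pair (y, x) with y = xP and f(y) > f(x), or None if none is found."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.randint(-5, 5) for _ in range(n)]
        y = row_times_matrix(x, random_doubly_stochastic(n, 3, rng))
        if f(y) > f(x):
            return y, x
    return None


print(find_violation(sum_of_squares))  # None
print(find_violation(largest))         # None

# An explicit failure for a non-symmetric function: (1, 0) is a permutation of (0, 1).
x_swap, y_swap = [0, 1], [1, 0]
print(first_coordinate(y_swap) > first_coordinate(x_swap))  # True
```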
There is another definition of stochastic majorization that Marshall and Olkin,
1979, pp. 282-283, call $P_1$ that implies $E_1$ and appears ostensibly to be stronger than $E_1$.
There $Y \prec_{P_1} X$ if $f(Y) \le_{st} f(X)$ for all $f \in SC$, where $\le_{st}$ has the typical meaning of
stochastically larger (Marshall and Olkin, 1979, 17.A.1). Clearly $P_1 \Rightarrow E_1$ since
stochastically larger random variables have larger expectations. It turns out that in this
particular case we also have $E_1 \Rightarrow P_1$. See the argument in Marshall and Olkin,
1979, top of p. 283. We will use this argument to show one part of the characterization
of the relation $\prec$ in $M^1$ defined above.
A Markov kernel on $R^n$ is a map $m : R^n \times \mathcal{B}^n \to [0,1]$ such that for each set
$B \in \mathcal{B}^n$ the map $x \mapsto m(x, B)$ is Borel measurable and for $x \in R^n$ fixed
$m(x) = m(x, \cdot) \in M^1$. For such a Markov kernel $m$ and a probability measure $\nu \in M^1$
denote by $\nu \otimes m$ the element of $M^1(R^{2n})$ defined by $(\nu \otimes m)(A \times B) = \int_A m(x, B) \, \nu(dx)$, for
measurable rectangles $A \times B$, $A, B \in \mathcal{B}^n$. We say that the first marginal of $\nu \otimes m$ is $\nu$ and denote
the second marginal $\nu m$. Finally, we say that a set $B \in \mathcal{B}^n$ is Schur-convex if its indicator
function is Schur-convex. These designations are borrowed from Kamae et al., 1977, pp.
899-900.
The following characterization of the relation $\prec$ on $M^1$ is new and fleshes out
the intuition given above.
Theorem 5. For $\mu, \nu \in M^1$ the following are equivalent:
i. $\mu \prec \nu$;
ii. There exists a Markov kernel $m$ on $R^n$ such that $\mu = \nu m$ and $m(x) \in \varphi(x)$, $\nu$ almost
every $x \in R^n$;
iii. There exists a probability measure $\lambda \in M^1(R^{2n})$ with $\mathrm{supp}(\lambda) \subset K$ with first
marginal $\mu$ and second marginal $\nu$;
iv. There exists a real valued random variable $Z$ and two measurable functions
$f, g : R \to R^n$ with $f \prec g$ ($f(t) \prec g(t)$, $t \in R$) such that the distribution of $f(Z)$ is
$\mu$ and the distribution of $g(Z)$ is $\nu$;
v. There exist $R^n$ valued random variables $Y$ and $X$ such that $Y \prec_{P_1} X$ and the
distribution of $Y$ is $\mu$ and the distribution of $X$ is $\nu$;
vi. $\mu(B) \le \nu(B)$ for every Schur-convex set $B \in \mathcal{B}^n$.
Proof: The key equivalence is i. and ii. The rest follow easily. Let $\mu, \nu \in M^1$ and assume
that ii holds. Let $f \in SC$ be such that the integrals $\int f \, d\mu$, $\int f \, d\nu$ exist. Then
$\int f \, d\mu = \int \int f(y) \, m(x, dy) \, \nu(dx) \le \int f \, d\nu$,
since $\int f(y) \, m(x, dy) \le f(x)$, $\nu$ almost every $x \in R^n$. This establishes i.
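The step from a kernel to the integral inequality can be illustrated with finitely supported measures. In the Python sketch below, measures are dictionaries mapping points to probabilities, and the kernel keeps a point with probability 1/2 or replaces it by its mean vector with probability 1/2. Both outcomes are majorized by the starting point, so the kernel takes its values in the set of measures supported on the vectors majorized by that point. The kernel, the measure `nu`, and all function names are hypothetical choices made for this example.

```python
from fractions import Fraction


def is_majorized(y, x):
    """Partial-sum test for y majorized by x."""
    xs, ys = sorted(x, reverse=True), sorted(y, reverse=True)
    sum_x = sum_y = 0
    for a, b in zip(xs, ys):
        sum_x += a
        sum_y += b
        if sum_y > sum_x:
            return False
    return sum_x == sum_y


def averaging_kernel(x):
    """m(x): keep x with probability 1/2, else move to its mean vector."""
    half = Fraction(1, 2)
    mean = Fraction(sum(x), len(x))
    law = {}
    for y in (tuple(x), (mean,) * len(x)):
        law[y] = law.get(y, 0) + half  # merges the two atoms if they coincide
    return law


def second_marginal(nu, kernel):
    """The measure obtained by drawing x from nu and then y from kernel(x)."""
    mu = {}
    for x, weight in nu.items():
        for y, p in kernel(x).items():
            mu[y] = mu.get(y, 0) + weight * p
    return mu


def integrate(f, measure):
    return sum(p * f(point) for point, p in measure.items())


def sum_of_squares(x):
    return sum(v * v for v in x)


nu = {(4, 0, 2): Fraction(1, 2), (3, 3, 0): Fraction(1, 4), (1, 1, 1): Fraction(1, 4)}
mu = second_marginal(nu, averaging_kernel)

# The kernel only moves mass to majorized points.
print(all(is_majorized(y, x) for x in nu for y in averaging_kernel(x)))  # True
# Integrals of Schur-convex functions do not increase.
print(integrate(sum_of_squares, nu) >= integrate(sum_of_squares, mu))  # True
print(integrate(max, nu) >= integrate(max, mu))  # True
```

In this example the integral of the sum of squares drops from 61/4 under `nu` to 25/2 under `mu`.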
Therefore assume i. For every bounded continuous function $z : R^n \to R$ define
$h(x, z) = \sup \{ \int z \, d\lambda : \lambda \in \varphi(x) \}$. By Theorem 4 and Hildenbrand, 1974, Corollary p. 30,
$h(\cdot, z)$ is continuous in $x$ and for each $x \in R^n$, $z(x) \le h(x, z) \le \sup z$ since $\delta_x \in \varphi(x)$
and $\int z \, d\delta_x = z(x)$. Thus $h(\cdot, z)$ is bounded as well, so all integrals below exist. Finally,
$h(\cdot, z)$ is also Schur-convex in $x$. For if $x, y \in R^n$ and $y \prec x$ then by Lemma 3
$\gamma(y) \subset \gamma(x)$, implying that $\varphi(y) \subset \varphi(x)$, and hence $h(y, z) \le h(x, z)$. It follows that
$\int z \, d\mu \le \int h(x, z) \, \mu(dx) \le \int h(x, z) \, \nu(dx)$,
the last inequality from i. Condition ii then
follows from Strassen, 1965, Theorem 3.
Assume ii and let $\lambda$ be the joint measure on $R^{2n}$ generated by $\nu$ and the kernel $m$, with coordinates written in the order $(y, x)$, that is, $\lambda(B \times A) = \int_A m(x, B) \, \nu(dx)$, so that the first marginal of $\lambda$ is $\mu = \nu m$ and the second marginal is $\nu$. Then $\lambda(K) = \int m(x, K_x) \, \nu(dx) = 1$, since the $x$-section
$K_x = \gamma(x)$ for every $x$. From Theorem 2, $K$ is closed in $R^{2n}$. This gives iii. Therefore
assume iii. The construction in Kamae et al., 1977, Theorem 1 (iii), goes through here as
well and this gives iv. Assuming iv let $Y = f(Z)$ and $X = g(Z)$. Then clearly $Y \prec_{E_1} X$
and v. follows from the fact that $Y \prec_{E_1} X \Leftrightarrow Y \prec_{P_1} X$ (Marshall and Olkin, 1979, top of p.
283).
Therefore assume v. If $B \in \mathcal{B}^n$ is Schur-convex, then
$\mu(B) = E(I_B(Y)) \le E(I_B(X)) = \nu(B)$, where $I_B$ is the indicator of the set $B$ and the
inequality follows from the fact that $I_B \in SC$ and so $I_B(Y) \le_{st} I_B(X)$.
It remains to show that vi implies i. Assume vi. For $f \in SC$, $I_{\{x \in R^n : f(x) > t\}} \in SC$ for
all real $t$. It follows from vi that $\mu\{x \in R^n : f(x) > t\} \le \nu\{x \in R^n : f(x) > t\}$ for all real $t$ and
hence i. follows from Marshall and Olkin, 1979, 17.A.1. ∎
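The passage from level sets to integrals can be made concrete for finitely supported measures and a nonnegative test function, where the integral equals the area under the tail function $t \mapsto$ mass of $\{f > t\}$. The Python sketch below compares the tails of two measures at every jump point and then recovers both integrals from the tails. The two measures are hypothetical: `mu` was obtained from `nu` by moving half of the mass of each atom to its mean vector, and the function names are introduced here for illustration.

```python
from fractions import Fraction


def sum_of_squares(x):
    return sum(v * v for v in x)


def tail(measure, f, t):
    """Mass of the Schur-convex level set {x : f(x) > t}."""
    return sum(p for point, p in measure.items() if f(point) > t)


def layer_cake(measure, f, grid):
    """Integral of a nonnegative f as the area under its tail function.

    `grid` must list, in increasing order, every value f takes on the support.
    """
    total, previous = 0, 0
    for value in grid:
        total += tail(measure, f, previous) * (value - previous)
        previous = value
    return total


nu = {(4, 0, 2): Fraction(1, 2), (3, 3, 0): Fraction(1, 4), (1, 1, 1): Fraction(1, 4)}
mu = {(4, 0, 2): Fraction(1, 4), (2, 2, 2): Fraction(3, 8),
      (3, 3, 0): Fraction(1, 8), (1, 1, 1): Fraction(1, 4)}

# The tails are step functions, so it suffices to compare them at the values
# taken by f and at one point below the smallest value.
grid = sorted({sum_of_squares(p) for p in list(mu) + list(nu)})
checkpoints = [grid[0] - 1] + grid
dominated = all(tail(nu, sum_of_squares, t) >= tail(mu, sum_of_squares, t)
                for t in checkpoints)

print(dominated)  # True
print(layer_cake(nu, sum_of_squares, grid) >= layer_cake(mu, sum_of_squares, grid))  # True
```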
Kamae et al. (1977) use a theorem of Strassen (1965) to characterize stochastic
orderings induced by a partial order on the underlying space. Their result, Kamae et al.,
1977, Theorem 1, is the model for Theorem 5 above, but the relevant theorem of Strassen
used in the proof of Theorem 5 is not the one used by Kamae et al. (1977). We
emphasize here that the relation $\prec$ on $R^n$ is not a partial order.
The crucial implication i. implies ii. in Theorem 5 relies on Theorem 3 of Strassen
(1965) and this theorem applies more generally. It can be used to obtain the same
implication for any pre-order on any Polish space that is sufficiently regular for the
function $h(x, z)$, defined above in the second paragraph of the proof of Theorem 5, to be
Borel measurable in $x$. Weaker conditions than those of Theorem 2 above suffice for this
function to be Borel measurable. See for example Hildenbrand, 1974, Proposition 3, p. 60.
The result Hildenbrand, 1974, Corollary p. 30, is referred to in the mathematical
economics literature as the maximum theorem and is used there to establish continuity of
consumer demand in various situations. Transitivity of the pre-order gives monotonicity
of $h(\cdot, z)$ in the pre-order. Reflexivity of the pre-order gives $\delta_x \in \varphi(x)$ and $\varphi(x)$ is
convex whether $\gamma(x)$ is or not. This convex valuedness is essential to apply Strassen,
1965, Theorem 3, but comes at no cost.
Theorem 5 is reminiscent of the characterization of dilations as given for example
in Phelps, 1966, Ch. 13. In this case, a dilation moves probability weight toward extreme
points of a compact convex set. Here the Markov kernel $m$ of Theorem 5.ii moves
probability weight away from the point $x$ (which is an extreme point of $\gamma(x)$) to
less extreme points, in the sense of majorization, in $\gamma(x)$. Borrowing with a little license
from the terminology in Kamae et al., 1977, p. 900, we could call the Markov kernel $m$
of Theorem 5.ii downward.
As in the case of dilations, it is natural to ask about maximal measures for the
relation $\prec$ on $M^1(X)$, where $X$ is a compact convex subset of $R^n$, and the support of
these measures if they exist. This of course is complicated by the fact that $\prec$ is only a
pre-order and not a partial order. This is a project for future research.
REFERENCES
Chung, K. L. (1974), A Course in Probability Theory (Academic Press,
New York, 2nd ed.).
Hildenbrand, W. (1974), Core and Equilibria of a Large Economy
(Princeton University Press, Princeton).
Himmelberg, C. J. and Van Vleck, F. S. (1975),
Multifunctions with values in a space of probability measures. J. Math. Anal. Appl. 50,
108-112.
Kamae, T., Krengel, U. and O’Brien, G. L. (1977), Stochastic
inequalities on partially ordered spaces. Ann. Probab. 5, 899-912.
Marshall, A. W. and Olkin, I. (1979), Inequalities: Theory of
Majorization and Its Applications (Academic Press, New York).
Phelps, R. R. (1966), Lectures on Choquet’s Theorem (Van Nostrand,
Princeton).
Strassen, V. (1965), The existence of probability measures with given
marginals. Ann. Math. Statist. 36, 423-439.
Derivation of Theorem 5.iv from Theorem 5.iii (taken from Kamae et al., 1977,
Theorem 1 (iii) from Theorem 1 (ii)). Here $K$ is the graph of the relation $\prec$ and $\lambda$ is the measure of Theorem 5.iii. The probability space $(K, \lambda)$ is isomorphic mod 0
to $(B, \mathcal{B}, P)$ where $B$ is a Borel subset of $R^1$, $\mathcal{B}$ is the collection of Borel subsets of $B$,
and $P$ is a probability measure on $(B, \mathcal{B})$. The reference for this result is given by
Kamae et al., 1977, p. 900. Let $Z : K \to B$ be the isomorphism and let $f = p_1(Z^{-1})$ and
let $g = p_2(Z^{-1})$, where $p_1$ and $p_2$ are the projections of $R^n \times R^n$ onto the first and
second factors. This defines $f$ and $g$ on $B$. For $t \in R$, but $t \notin B$, take
$f(t) = g(t) = 0 \in R^n$. For each $t \in B$, $Z^{-1}(t) = (y, x) \in K$ and $f(t) = p_1(Z^{-1}(t)) = y$
and $g(t) = p_2(Z^{-1}(t)) = x$ and $y \prec x$. Also $f(Z) = y$ and $g(Z) = x$ on $K$ and thus the
distribution of $f(Z)$ is $\mu$ and the distribution of $g(Z)$ is $\nu$.