Investments
Lecture 0: Mathematical Review
1. (Set and Function.)
A well-defined collection of objects (called elements) is a set. A set of sets is usually referred to as a collection, and a set of collections is usually referred to as a family. A set that contains nothing is called the empty set and is denoted ∅. A set that contains exactly one element is a singleton. If two sets A and B are such that B contains each and every element of A, then we say that A is a subset of B and B a superset of A, and we write A ⊂ B; if furthermore B contains an element y which is not contained in A, then A is a proper subset of B. Two sets A and B are equal if A ⊂ B and B ⊂ A. If one has a universe set X in mind (so that all sets under consideration are subsets of X), then define the complement of A ⊂ X to be the set containing exactly those x contained by X but not by A. The complement of A is denoted by A^c. Denote by 2^X the collection containing all subsets of X, which will be referred to as the power set of X.
2. Given two sets A and B, an assigning rule f is a (single-valued) function from A into B if ∀a ∈ A, there is exactly one b ∈ B such that f(a) = b. In this case, we write f : A → B, and A and B are called the domain and range of f respectively. If A is itself a collection of sets, then we say that f is a set function. If instead B is a collection of sets, then f is a multi-valued function (or a correspondence).¹ The function f : A → B is surjective if B ⊂ f(A), injective if x, y ∈ A, x ≠ y ⇒ f(x) ≠ f(y), and bijective if it is both surjective and injective. A bijective function is also called a one-to-one correspondence. Given f : A → B, and given any C ⊂ A and D ⊂ B, the sets f(C) = {f(a) : a ∈ C} and f^{-1}(D) = {a ∈ A : f(a) ∈ D} are called respectively the image of C and the pre-image of D under f.²
3. A bijective function f : A → B has an inverse function f^{-1} : B → A which assigns to each b ∈ B "the" a ∈ A that satisfies f(a) = b. (Depending on the context, the notation f^{-1} should not create confusion with the pre-image operator.)

Footnote 1: In general we would write f : A → ⋃_{b∈B} b for the correspondence instead.

Footnote 2: Let A = B be the set of real numbers. Let f(x) = x² + 1, C = (−1, 0], and D = [0, 1/2]. What is f^{-1}(D)? What is f^{-1}(f(C))? Is the latter a subset or a superset of C?
4. Two sets A and B are of the same cardinality if we can find a one-to-one correspondence between them. A set A is finite if it is empty or there is a one-to-one correspondence between A and the set {1, 2, · · · , n} for some n ∈ Z+ (and we say in the latter case that the cardinality of A, denoted |A| or #(A), is n), where Z+ stands for the set of strictly positive integers.³ A set A is countably infinite (denumerable) if there is a one-to-one correspondence between A and Z+. A set which is neither finite nor countably infinite is uncountable. A set is countable if it is not uncountable.⁴
5. Given an indexed family of sets {Ab ; b ∈ β}, where β is an arbitrary non-empty index set, the Cartesian product of these sets, denoted by Π_{b∈β} Ab, is the set containing each and every function f : β → ⋃_{b∈β} Ab satisfying that for all b ∈ β, f(b) ∈ Ab.⁵ The union of the sets {Ab ; b ∈ β}, denoted ⋃_{b∈β} Ab, contains exactly those x contained by some Ab. The intersection of these sets, denoted ⋂_{b∈β} Ab, contains exactly those x contained by all Ab's. Two sets are disjoint if they have an empty intersection.
6. A collection {Ab ; b ∈ β} of subsets of X is called a partition of X if ⋃_{b∈β} Ab = X and ∀a, b ∈ β, a ≠ b ⇒ Aa ∩ Ab = ∅.
7. DeMorgan's law says that

[⋃_{b∈β} Ab]^c = ⋂_{b∈β} (Ab)^c,   [⋂_{b∈β} Ab]^c = ⋃_{b∈β} (Ab)^c.   (1)
Footnote 3: The notation Z+ is taken from James R. Munkres, 1975, Topology: A First Course, New Jersey: Prentice-Hall. It represents the set of natural numbers. We shall interchangeably use the notation N to stand for the same set, as in Kai Lai Chung, 1974, A Course in Probability Theory.

Footnote 4: The following three statements are apparently equivalent: (i) A is countable; (ii) there is a surjective function f : Z+ → A; (iii) there is an injective function g : A → Z+. Because of (ii), a subset of a countable set is apparently countable. Because of (iii), A = Z+² is countable: simply define g(n, m) = 2^n 3^m. Now since the set of positive rationals has a one-to-one correspondence with Z+², the former is countable. The following three statements are also equivalent, but less apparent: (a) A is infinite; (b) there exists an injective function f : Z+ → A; (c) there exists a bijective function of A with a proper subset of itself.

Footnote 5: For example, suppose that β = {1, 2}, A1 = {x, y}, A2 = {a, b, c}. Then Π_{b∈β} Ab = A1 × A2 = {(x, a), (x, b), (x, c), (y, a), (y, b), (y, c)}, where, for example, (x, a) is a shorthand notation for the function f with f(1) = x and f(2) = a. Similarly, if instead β = {2, 1}, then Π_{b∈β} Ab = A2 × A1 = {(a, x), (a, y), (b, x), (b, y), (c, x), (c, y)}.
8. The following facts are well known. A countable union or a finite product of countable sets is countable. An infinite product of countable sets need not be countable; in particular, the power set of Z+ is uncountable.
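To make the injection in Footnote 4 concrete, here is a minimal Python sketch (the grid size and the function name are my own choices) checking that g(n, m) = 2^n 3^m never collides on a finite grid, which is the mechanism behind the countability of Z+²:

```python
# Check on a finite grid that g(n, m) = 2**n * 3**m is injective,
# illustrating the injection used in Footnote 4 to show Z+ x Z+ is countable.
def g(n, m):
    return 2 ** n * 3 ** m

seen = {}
for n in range(1, 51):
    for m in range(1, 51):
        v = g(n, m)
        assert v not in seen, f"collision between {(n, m)} and {seen[v]}"
        seen[v] = (n, m)
print(f"{len(seen)} distinct values for 2500 pairs -- no collisions")
```

(Injectivity in fact holds on all of Z+² by unique prime factorization; the script only spot-checks it.)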
9. (Linear Space or Vector Space.)
10. A set V is a linear space over ℜ, where ℜ is the set of real numbers, if it is equipped with and closed under operations of vector addition and scalar multiplication together with a number of well-defined operational (such as commutative and associative) laws. Elements of V are called vectors. Given a finite number of elements x1, x2, · · · , xn in V, an element y ∈ V is a linear combination of these elements if there exist real numbers a1, a2, · · · , an such that y = Σ_{i=1}^n ai xi. A subset B of V is said to span V if all vectors in V are linear combinations of elements of B. If no proper subset of B can span B, then B is linearly independent, and in this case B is a basis for V if B spans V. The cardinality of (the number of elements in) a basis (any basis) is called the dimension of the linear space V. V may be finite-dimensional or infinite-dimensional depending on whether the cardinality of B is finite or infinite.
11. Let V be the set of continuous functions f : [0, 1] → ℜ. For any f, g ∈ V, define f + g as the function h : [0, 1] → ℜ such that h(x) = f(x) + g(x) for all x ∈ [0, 1]. For any f ∈ V and any r ∈ ℜ, define rf as the function g : [0, 1] → ℜ such that g(x) = r · f(x) for all x ∈ [0, 1]. Then, clearly, for all f, g ∈ V and for all r ∈ ℜ, f + g ∈ V and rf ∈ V. Thus V is an infinite-dimensional real linear space.
12. A subset A of the linear space V is a convex set if for all x, y ∈ A and
for all real numbers λ ∈ [0, 1], λx + (1 − λ)y ∈ A. The intersection of
a collection of convex sets is itself convex.
13. (Metric Space and Topology.)
Given any non-empty set X, a function d : X × X → R+ (where R+ or interchangeably ℜ+ denotes the set of non-negative real numbers) is a metric if
(i) d(x, y) = 0 ⇔ x = y, ∀x, y ∈ X;
(ii) d(x, y) = d(y, x) ≥ 0, ∀x, y ∈ X;
(iii) d(x, y) + d(y, z) ≥ d(x, z), ∀x, y, z ∈ X.
The pair (X, d) is then a metric space. For example, let X = ℜ, the set of real numbers, and define for all x, y ∈ X, d′(x, y) = 1 if x ≠ y and d′(x, y) = 0 otherwise. Also define d(x, y) = |x − y| for all x, y ∈ X. Then (X, d) and (X, d′) are both metric spaces, but they are different metric spaces.
14. A function x : Z+ → X is called a sequence in (X, d). It is said to converge to z ∈ X if for all real ε > 0 there exists n(ε) ∈ Z+ such that n ≥ n(ε) ⇒ d(x(n), z) < ε. A sequence is Cauchy if for all ε > 0, there exists n(ε) ∈ Z+ such that m, n ≥ n(ε) ⇒ d(x(m), x(n)) < ε.
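To see how convergence depends on the metric, here is a minimal Python sketch (the tail cutoffs are my own choices) suggesting that x(n) = 1/n has small tails under d(x, y) = |x − y| but not under the discrete metric d′ of item 13:

```python
# x(n) = 1/n: Cauchy under the usual metric d, but not under the discrete metric d'.
def d(x, y):
    return abs(x - y)

def d_prime(x, y):                 # the discrete metric from item 13
    return 0.0 if x == y else 1.0

x = lambda n: 1.0 / n
tail = range(1000, 1100)
print(max(d(x(m), x(n)) for m in tail for n in tail))        # tiny: about 1e-4
print(max(d_prime(x(m), x(n)) for m in tail for n in tail))  # always 1 for m != n
```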
15. Given two metric spaces (X, d) and (Y, d′), a function f : X → Y is continuous if {f(x(n))} is a convergent sequence in (Y, d′) whenever {x(n)} is a convergent sequence in (X, d).
16. Given a metric space (X, d), for all z ∈ X and e ∈ R++,⁶ the subsets B(z, e) = {x ∈ X : d(x, z) < e} are called open balls in X. A subset G of X is open if it can be represented as a (perhaps empty) union of open balls. A subset F of X is closed if F^c is open. The closure of G ⊂ X is the smallest (with respect to the order relation "inclusion" on X) closed set containing G, and the interior of F is the largest open set contained in F. A point x ∈ X is a limit point of A ⊂ X if for all e > 0, B(x, e) ∩ A ∩ {x}^c ≠ ∅. The set of limit points of A, denoted A′, is the derived set of A. A point x ∈ A is an isolated point if for some e > 0, B(x, e) ∩ A ∩ {x}^c = ∅. A closed subset A of X is perfect if it contains no isolated points. The boundary of A is the intersection of the closure of A and the closure of A^c in (X, d). The collection τ of all the open sets on X is called a topology on X. The pair (X, τ) is called a topological space.

Footnote 6: Here R++, or interchangeably ℜ++, denotes the set of strictly positive real numbers.
17. A set A contained in the metric space (X, d) is bounded if for some
x ∈ X and e ∈ R++ , A ⊂ B(x, e).
18. Take the linear space (ℜ, ‖·‖) for example, where for all x ∈ ℜ, the norm or the length of the vector x, denoted by ‖x‖, is such that ‖x‖ = |x|, and we can define a metric from the norm by d(x, y) = |x − y|, which generates the so-called standard topology on ℜ. In this metric space (ℜ, d), an open ball is an open interval, and an open set is a countable union of pair-wise disjoint open intervals. (Why countable, and why pair-wise disjoint?)
• Let A = {x} for some x ∈ ℜ. Is x a limit point of A? Is x an isolated point of A? Is A a closed set?
• Suppose that {xn ; n ∈ Z+} is a sequence in ℜ, which converges to x0 ∈ ℜ. Let X contain exactly all the xn's in the convergent sequence. Is x0 a limit point of X? If this is not always true, then offer a condition that ensures this conclusion.
19. Let (X, τ) be a topological space. A subset γ of τ is an open covering of X if X ⊂ ⋃_{U∈γ} U. X is compact if every open covering of X contains a finite sub-covering of X; that is, if for each open covering γ of X, there exists a finite number of member sets U1, U2, · · · , Un ∈ γ such that X ⊂ ⋃_{i=1}^n Ui.
Note that ℜ with the standard topology is not compact, for the open covering {(n, n + 2) : n ∈ Z} does not have a finite sub-covering of ℜ. (We shall see that this is because ℜ is not bounded.) Similarly, the interval (0, 1) is not a compact subset of ℜ under the standard topology, for the collection {(1/(n + 2), 1/n) : n ∈ Z+} is an open covering of (0, 1) which does not have a finite sub-covering. (We shall see that this is because (0, 1) is not closed.) It can be shown that compact subsets of ℜ under the standard topology are exactly those subsets which are both bounded and closed.
20. The following statements are true.
• A closed subset of a compact space is compact.
• A (finite or infinite) Cartesian product of compact spaces is compact.
• Consider the metric space (ℜ^n, d), where the Euclidean metric d is such that for x, y ∈ ℜ^n, with x = (x1, x2, · · · , xn) and y = (y1, y2, · · · , yn), we have d(x, y) = [Σ_{i=1}^n (xi − yi)²]^{1/2}. Then under the metric topology generated by d, A ⊂ ℜ^n is compact if and only if it is both closed and bounded.
• Is the first orthant in ℜ² closed? Is it bounded?
• Given two topological spaces (X, τx) and (Y, τy) and a continuous function f : (X, τx) → (Y, τy), f(A) is compact in (Y, τy) whenever A is compact in (X, τx). That is, a continuous image of a compact space is compact.
21. (Real Sequence and Real Function.)
If A = Z+ and B = ℜ, then (the image of) f : A → B is called a real (real-number) sequence, and we write f(n) as fn for all n ∈ Z+. If h : Z+ → Z+ is strictly increasing and x : Z+ → ℜ, then {x(h(n))} is called a subsequence of {x(n)}.
22. We say that l ∈ ℜ is a limit of the sequence {xn} if for all ε > 0 there is an N(ε) ∈ Z+ such that

n ∈ Z+, n ≥ N(ε) ⇒ |xn − l| < ε.

In this case we say that {xn} converges to l and call {xn} a convergent sequence. The limit of a sequence so defined is unique.
23. A set A ⊂ ℜ is bounded above by u ∈ ℜ if for all x ∈ A, x ≤ u. In this case, u is called an upper bound of the set A. The least upper bound of A, if it exists, is denoted by sup A. Lower bounds are similarly defined. The greatest lower bound of A, if it exists, is denoted by inf A. The set A is bounded if it has an upper and a lower bound in ℜ.
24. Given a sequence {xn} in ℜ, define its limit inferior as lim inf xn = sup_{n} inf_{k≥n} xk and its limit superior as lim sup xn = inf_{n} sup_{k≥n} xk, which will be shown to always exist in ℜ ∪ {−∞, +∞}, and we can show that they are equal if and only if the real sequence {xn} converges.
25. A set I ⊂ ℜ is an interval if it is not a singleton (a one-point set) and if x, y ∈ I and x < z < y imply that z ∈ I.
26. A function g : I → ℜ, where I is some interval, is continuous if for all x ∈ I and all {xn} converging to x in I, {g(xn)} is a sequence converging to g(x). The derivative of g(·) at x ∈ I, if it exists (i.e., it is a finite number), is the (common) limit of [g(xn) − g(x)]/(xn − x) for all {xn} converging to x with xn ≠ x. If the derivative of g is defined for all x ∈ I, written g′(x), we say that g is differentiable on I. In this case, g′(·) itself is a mapping from I into ℜ. If g′ is a continuous function, we say that g is continuously differentiable on I.
27. (Axiom of Continuity.) If the real sequence {xn} is increasing (i.e., xn+1 ≥ xn for all n ∈ Z+), and if xn ≤ M ∈ ℜ for all n ∈ Z+, then {xn} converges to some l ≤ M and xn ≤ l for all n ∈ Z+.
28. (Nested Interval Theorem.) Suppose that {[an, bn]; n ∈ Z+} is a sequence of closed intervals such that for all n, [an+1, bn+1] ⊂ [an, bn] and lim_{n→+∞}(bn − an) = 0. Then ⋂_{n∈Z+} [an, bn] is a singleton.
29. (Least Upper Bound Property of ℜ.) Suppose that A is a non-empty subset of ℜ, and A is bounded above by some u ∈ ℜ. Then sup A exists. Similarly, if A is non-empty and bounded below, then inf A exists.
30. Define the extended real line as ℜ ∪ {+∞, −∞}. Define sup ∅ = −∞ and inf ∅ = +∞. If A ⊂ ℜ is unbounded above (there exists no upper bound for A in ℜ), then define sup A = +∞. If A ⊂ ℜ is unbounded below (there exists no lower bound for A in ℜ), then define inf A = −∞. In this way, all subsets of ℜ have well-defined suprema and infima in the extended real line.
31. Suppose that {xn ; n ∈ Z+} is a real sequence. Then its limit inferior

lim inf xn ≡ sup_{n∈Z+} inf_{k≥n} xk

and its limit superior

lim sup xn ≡ inf_{n∈Z+} sup_{k≥n} xk

both exist in the extended real line.
32. For any two real sequences {xn} and {yn}, we have

lim inf xn + lim inf yn ≤ lim inf(xn + yn) ≤ lim inf xn + lim sup yn ≤ lim sup(xn + yn) ≤ lim sup xn + lim sup yn.
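A small Python sketch (finite tails only approximate the infinite ones, so this is merely suggestive) illustrating the chain in item 32 with xn = (−1)^n and yn = (−1)^{n+1}:

```python
# Approximate lim inf / lim sup by the extremes of a late tail of the sequence.
N, TAIL = 2000, 1000
x = [(-1) ** n for n in range(1, N + 1)]            # lim inf -1, lim sup 1
y = [(-1) ** (n + 1) for n in range(1, N + 1)]      # lim inf -1, lim sup 1
s = [a + b for a, b in zip(x, y)]                   # identically 0

liminf = lambda seq: min(seq[-TAIL:])   # crude stand-in for sup_n inf_{k>=n} x_k
limsup = lambda seq: max(seq[-TAIL:])
print(liminf(x) + liminf(y), liminf(s), liminf(x) + limsup(y),
      limsup(s), limsup(x) + limsup(y))             # -2 <= 0 <= 0 <= 0 <= 2
```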
33. (Bolzano-Weierstrass Theorem.) Every bounded sequence in ℜ contains a convergent subsequence.
34. A function f : A ⊂ ℜ → ℜ is continuous at x ∈ A if for all ε > 0, there exists δ(x, ε) > 0 such that

∀y ∈ A, |x − y| < δ(x, ε) ⇒ |f(x) − f(y)| < ε.

The function is uniformly continuous on A if δ can be chosen to be independent of x. We say f is a continuous function on A if it is continuous at x for all x ∈ A. We can show that if f : [a, b] → ℜ is continuous on [a, b], then it is uniformly continuous on [a, b].
35. (Intermediate Value Theorem.) If f : I → ℜ is continuous, where I is an interval, then f(I) is either a singleton or an interval.
36. (Extreme Value Theorem.) If f : [a, b] → ℜ is continuous, then f([a, b]) is either a singleton or a bounded closed interval. In particular, there exist m, M ∈ [a, b] such that for all x ∈ [a, b], f(m) ≤ f(x) ≤ f(M).
37. (Concave Maximization.)
38. A (real-valued) symmetric matrix A_{n×n} is positive definite (or PD) if for all x_{n×1} ≠ 0_{n×1}, we have x′Ax > 0. A symmetric matrix A_{n×n} is negative definite (or ND) if −A is positive definite. A symmetric matrix A_{n×n} is positive semi-definite (or PSD) if for all x_{n×1} ∈ ℜ^n, we have x′Ax ≥ 0. A symmetric matrix A_{n×n} is negative semi-definite (or NSD) if −A is positive semi-definite.
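Since a symmetric matrix is PD (resp. PSD, ND, NSD) exactly when all of its eigenvalues are positive (resp. non-negative, negative, non-positive), a quick numeric classifier can be sketched as follows (the function name, tolerance, and test matrices are my own):

```python
import numpy as np

def classify(A, tol=1e-10):
    """Classify a symmetric matrix as PD/PSD/ND/NSD/indefinite via eigenvalues."""
    assert np.allclose(A, A.T), "definiteness is defined here for symmetric A"
    w = np.linalg.eigvalsh(A)          # real eigenvalues of a symmetric matrix
    if np.all(w > tol):   return "PD"
    if np.all(w >= -tol): return "PSD"
    if np.all(w < -tol):  return "ND"
    if np.all(w <= tol):  return "NSD"
    return "indefinite"

print(classify(np.array([[2.0, 1.0], [1.0, 2.0]])))   # PD: x'Ax > 0 for x != 0
print(classify(np.array([[1.0, 1.0], [1.0, 1.0]])))   # PSD only: it is singular
```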
39. Consider a twice differentiable function f : R^n → R. Let Df : R^n → R^n be the vector function

Df = (∂f/∂x1, ∂f/∂x2, · · · , ∂f/∂xn)′,

which will be referred to as the gradient of f. Let D²f : R^n → R^{n×n} be the matrix function

D²f = [ ∂²f/∂x1∂x1   ∂²f/∂x1∂x2   · · ·   ∂²f/∂x1∂xn ]
      [ ∂²f/∂x2∂x1   ∂²f/∂x2∂x2   · · ·   ∂²f/∂x2∂xn ]
      [     ···           ···      · · ·      ···     ]
      [ ∂²f/∂xn∂x1   ∂²f/∂xn∂x2   · · ·   ∂²f/∂xn∂xn ],

which will be referred to as the Hessian of f.
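For intuition, a minimal central-difference sketch (the step sizes and the test function are my own choices) approximates Df and D²f numerically:

```python
import numpy as np

def grad(f, x, h=1e-5):
    """Central-difference approximation of the gradient Df(x)."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def hessian(f, x, h=1e-4):
    """Central-difference approximation of the Hessian D^2 f(x)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        ei = np.zeros(n); ei[i] = h
        for j in range(n):
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

f = lambda x: x[0] ** 2 + 3 * x[0] * x[1]   # Df = (2x1 + 3x2, 3x1)'
print(grad(f, [1.0, 2.0]))                   # approx [8, 3]
print(hessian(f, [1.0, 2.0]))                # approx [[2, 3], [3, 0]]
```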
40. A function f : R^n → R is concave if for all x, y ∈ R^n and all λ ∈ [0, 1], f(λx + (1 − λ)y) ≥ λf(x) + (1 − λ)f(y). A concave function is said to be strictly concave if the above defining inequality is strict whenever x ≠ y and λ ∈ (0, 1). A function f is (strictly) convex if −f is (strictly) concave. A function is affine if it is both concave and convex. An affine function is linear if it passes through the origin; that is, if f(0_{n×1}) = 0. Note that by definition, f is concave if f is strictly concave.

Theorem 1 A twice differentiable function f : U ⊂ R^n → R, where U is an open convex subset of R^n, is concave if and only if D²f is a negative semi-definite matrix at each and every x ∈ U; and f is strictly concave if D²f is a negative definite matrix at each and every x ∈ U.⁷

Suppose that n = 1 in the above theorem. Then f″ ≤ 0 if f is concave, and if x̃ is a random variable with finite expectation, then we have f(E[x̃]) ≥ E[f(x̃)]. (This is the so-called Jensen's inequality.)

Footnote 7: If f(x) = −x⁴, then f″(0) = 0 (so that the Hessian of f at zero is negative semi-definite but not negative definite), yet f is strictly concave. Hence the "only if" part for strict concavity in general does not hold. However, if f : U ⊂ ℜ → ℜ is twice continuously differentiable and concave (so that U becomes an open interval), and if f″ is not constantly zero over any subinterval of U, then f is strictly concave; see the following link: people.exeter.ac.uk/dgbalken/BME05/LectTwo.pdf.
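A quick Monte Carlo illustration of Jensen's inequality (the distribution and the concave function are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)   # a random variable with finite mean
f = np.log1p                                    # f(t) = log(1 + t) is concave

print(f(x.mean()))        # f(E[x]) -- the larger of the two
print(f(x).mean())        # E[f(x)] -- smaller, consistent with Jensen
```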
41. Exercise 1 Define the function

f(x, y) = x^a y^{1−a}, ∀x, y ∈ (0, +∞),

where a is a constant with 0 < a < 1. Clearly f : (0, +∞) × (0, +∞) → ℜ is twice continuously differentiable. Let us verify that f is concave. Note that

D²f = a(1 − a) x^{a−2} y^{−1−a} [ −y²   xy ]
                                [  xy  −x² ].

Hence for any (k, h)′ ∈ ℜ², we have

(k, h) D²f (k, h)′ = a(1 − a) x^{a−2} y^{−1−a} (−k²y² + 2khxy − h²x²) = −a(1 − a) x^{a−2} y^{−1−a} (ky − hx)² ≤ 0.

This proves that D²f is negative semi-definite for all x, y ∈ (0, +∞), and hence f(·, ·) is concave. Note, however, that the quadratic form vanishes whenever ky = hx (for example, at (k, h) = (x, y)), so D²f is not negative definite: f is homogeneous of degree one, hence linear along rays from the origin, and is therefore concave but not strictly concave.
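A numeric companion to Exercise 1 (the parameter values are mine): the Hessian above has one negative and one zero eigenvalue, confirming negative semi-definiteness without definiteness.

```python
import numpy as np

a, x, y = 0.4, 1.5, 2.0
# The Hessian of f(x, y) = x**a * y**(1 - a), in the factored form above.
H = a * (1 - a) * x ** (a - 2) * y ** (-1 - a) * np.array([[-y * y, x * y],
                                                           [x * y, -x * x]])
print(np.linalg.eigvalsh(H))          # one negative eigenvalue, one (near-)zero
v = np.array([x, y])
print(v @ H @ v)                      # the form vanishes at (k, h) = (x, y)
```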
Exercise 2 Define the function

f(x) = √x, ∀x > 0.

Clearly f : (0, +∞) → ℜ is twice continuously differentiable. Note that

D²f = d²f/dx² = −1/(4x^{3/2}),

so that for all k ∈ ℜ, we have

k × D²f × k = −k²/(4x^{3/2}) ≤ 0,

and the last inequality is strict unless k = 0. This proves that D²f is negative definite for all x > 0, and hence f(·) is strictly concave.
Exercise 3 Define the function

f(x, y) = g(x) + h(y),

where g, h : ℜ → ℜ are two strictly concave twice-differentiable functions. Then f(·, ·) is a strictly concave function also. Indeed, we have in this case

D²f = [ g″(x)    0    ]
      [   0    h″(y)  ],

so that given any (a, b)′ ∈ ℜ², we have

(a, b) D²f (a, b)′ = a² g″(x) + b² h″(y) ≤ 0,

and the last inequality holds strictly except in the case where a = b = 0, provided that g″ < 0 and h″ < 0 everywhere; in that case D²f is negative definite, and hence f(·, ·) is strictly concave. (If g″ or h″ vanishes somewhere, as footnote 7 allows, the strict concavity of f still follows directly from the defining inequalities for g and h.)
The preceding example shows that a twice-differentiable additively separable function

f(x1, x2, · · · , xn) = f1(x1) + f2(x2) + · · · + fn(xn)

is strictly concave if for all j, fj : ℜ → ℜ is strictly concave and twice differentiable.
Theorem 2 Suppose that f : ℜ^n → ℜ is twice differentiable and concave. Then for all x, a ∈ ℜ^n, we have⁸

f(x) − f(a) ≤ Df(a)′(x − a).

The above inequality becomes strict if the Hessian of f is everywhere negative definite and x ≠ a.

Footnote 8: The reader should draw a graph for these inequalities for the case of n = 1.
Proof. Recall the Theorem of Taylor Expansion with a Remainder: if f : ℜ^n → ℜ is twice continuously differentiable, then for each x, a ∈ ℜ^n, there exists some y lying on the line segment connecting x and a such that

f(x) = f(a) + Df(a)′(x − a) + (1/2)(x − a)′ D²f(y) (x − a).

Now suppose that f is concave, so that D²f(y) is negative semi-definite. Then we have

(1/2)(x − a)′ D²f(y) (x − a) ≤ 0,

so that

f(x) − f(a) ≤ Df(a)′(x − a).

If D²f(y) is negative definite, then given x ≠ a, we have

(1/2)(x − a)′ D²f(y) (x − a) < 0,

implying that

f(x) − f(a) < Df(a)′(x − a).

Q.E.D.
42. According to the Theorem of Taylor Expansion with a Remainder, when moving from a to x ∈ B(a, r) in f's domain of definition, the functional value of f would increase if

Df(a)^T (x − a) + (1/2)(x − a)^T D²f(y) (x − a) > 0.

Note that when r is small enough, the preceding inequality holds if and only if⁹

Df(a)^T (x − a) = ‖Df(a)‖ ‖x − a‖ cos(θ) > 0,

where θ is the angle between the two vectors Df(a) and x − a, which must be acute to ensure the above inequality.¹⁰

Footnote 9: Imagine that we replace (x − a) by ε(x − a), and observe that when ε > 0 is a tiny number, ε|Df(a)^T (x − a)| is much greater than ε²|(1/2)(x − a)^T D²f(y)(x − a)|. Thus ε Df(a)^T (x − a) + ε²[(1/2)(x − a)^T D²f(y)(x − a)] is positive (respectively, negative) if and only if Df(a)^T (x − a) is positive (respectively, negative).
43. We shall from now on confine attention to twice differentiable functions.
A necessary condition for x* ∈ ℜ^n to solve the following maximization program (P1)

max_{x∈ℜ^n} f(x)

is that Df(x*) = 0, which will be referred to as the first-order conditions for the optimal solution x*. This necessary condition is also sufficient if f is concave.
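As a sketch (the quadratic and its coefficients are my own), for the concave objective f(x) = −(1/2)x′Ax + b′x with A symmetric positive definite, the first-order condition Df(x*) = b − Ax* = 0 both finds and certifies the maximizer:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # positive definite, so f is strictly concave
b = np.array([1.0, 4.0])
f = lambda x: -0.5 * x @ A @ x + b @ x

x_star = np.linalg.solve(A, b)           # solves the FOC: A x* = b
print(x_star)
# Nearby points give smaller values, as the FOC plus concavity guarantee:
print(f(x_star), f(x_star + np.array([0.1, -0.1])))
```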
44. Consider the following maximization program (P2):

max_{x∈ℜ^n} f(x)

subject to

gi(x) = 0, ∀i = 1, 2, · · · , m,

where m < n.

Theorem 3 (Lagrange Theorem) Suppose that x* solves (P2) and {Dgj(x*); j ∈ {1, 2, · · · , m}} is a set of linearly independent gradient vectors. (This is known as a constraint qualification condition.) Then, there must exist m constants (called the Lagrange multipliers) π1, π2, · · · , πm such that
(i) ∀i = 1, 2, · · · , m, gi(x*) = 0; and

(ii) Σ_{i=1}^m πi Dgi(x*) = Df(x*).

Conversely, if f is concave and all the gi's are affine, then if x* satisfies (i) and (ii), then x* solves (P2). If f is strictly concave, such an x* is unique.

Footnote 10: Recall the following Law of Cosines: Let ∆ABC be a triangle whose sides a, b, c are such that a is opposite A, b is opposite B, and c is opposite C. Let D be the point on the line passing through A and C such that the two line segments BD and CD are orthogonal to each other. Then, since the length of BD is a sin C and the length of AD is |b − a cos C|, we have c² = a² + b² − 2ab cos C. Now, given two vectors v, w and the angle θ between them, if we let a = ‖v‖, b = ‖w‖ and c = ‖v − w‖, then it follows from the Law of Cosines that

‖v‖² + ‖w‖² − 2(v · w) = (v − w) · (v − w) = c² = a² + b² − 2ab cos θ = ‖v‖² + ‖w‖² − 2‖v‖‖w‖ cos(θ),

implying that the following Cosine Formula for the Dot Product must hold:

v · w = ‖v‖‖w‖ cos(θ).
To see the role of the constraint qualification condition in the above necessary condition, consider the following example with n = m = 2: f(x1, x2) = x1 + x2, g1(x1, x2) = x1² + x2² − 1, g2(x1, x2) = x1² + (x2 − 2)² − 1. There is only one feasible solution in this case, namely, (x1, x2) = (0, 1), which is apparently the optimal solution. Thus x* = (0, 1). However, there do not exist π1, π2 ∈ ℜ such that Df(x*) = π1 Dg1(x*) + π2 Dg2(x*). This happens because Dg1(x*) and Dg2(x*) are linearly dependent!
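This failure can be checked directly; a small numpy sketch of the example above:

```python
import numpy as np

x_star = np.array([0.0, 1.0])
Df  = np.array([1.0, 1.0])                            # gradient of f(x) = x1 + x2
Dg1 = np.array([2 * x_star[0], 2 * x_star[1]])        # (0, 2)
Dg2 = np.array([2 * x_star[0], 2 * (x_star[1] - 2)])  # (0, -2)

# Dg1 and Dg2 span only the x2-axis, so Df = (1, 1) cannot be written
# as pi1*Dg1 + pi2*Dg2: the constraint qualification fails at x*.
print(np.linalg.matrix_rank(np.column_stack([Dg1, Dg2])))   # 1, not 2
```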
45. Now, let us prove the sufficiency part of the Lagrange Theorem. Suppose that there exists x* satisfying

(i) ∀i = 1, 2, · · · , m, gi(x*) = 0; and

(ii) Σ_{i=1}^m πi Dgi(x*) = Df(x*)

for some π1, π2, · · · , πm ∈ ℜ. Consider any other x satisfying

∀i = 1, 2, · · · , m, gi(x) = 0.

We must show that

f(x*) ≥ f(x).

To see that this is true, first note that for all i = 1, 2, · · · , m, by the fact that gi is affine, we have ai^T x + bi = gi(x) = 0 = gi(x*) = ai^T x* + bi, where ai is the gradient of gi, so that

0 = gi(x) − gi(x*) = ai^T (x − x*) = Dgi(x*)^T (x − x*).

Now, observe that

f(x) − f(x*) ≤ Df(x*)^T (x − x*) = Σ_{i=1}^m πi Dgi(x*)^T (x − x*) = 0,

where the inequality follows from Theorem 2 and the concavity of f, and hence x* is indeed an optimal solution to (P2). Finally, if f is strictly concave and both x* and x** are optimal solutions to (P2) with x* ≠ x**, then x0 ≡ (1/2)(x* + x**) must be feasible also (i.e., gi(x0) = 0 for all i, the gi's being affine), and yet by strict concavity (Jensen's inequality) we have

f(x0) > (1/2)[f(x*) + f(x**)] = f(x*),

which is a contradiction. Hence the optimal solution must be unique when f is strictly concave.
46. Consider the following maximization program (P3):

max_{x∈ℜ^n} f(x)

subject to

gi(x) ≤ 0, ∀i = 1, 2, · · · , m.

Theorem 4 (Kuhn-Tucker Theorem) Suppose that there exists some x̂ such that gi(x̂) < 0 for all i = 1, 2, · · · , m. (This is called the Slater Condition.)¹¹ Then if x* is a solution to (P3), there must exist m non-negative constants (called the Lagrange multipliers for the m constraints) π1, π2, · · · , πm such that

(i) Σ_{i=1}^m πi Dgi(x*) = Df(x*); and

(ii) (complementary slackness) for all i = 1, 2, · · · , m, πi gi(x*) = 0.

Conversely, if f is concave and for all i = 1, 2, · · · , m, gi : ℜ^n → ℜ is convex, and if x* satisfies the above (i) and (ii), then x* is a solution to the above program (P3). If f is strictly concave, such an x* is unique.

Footnote 11: The Slater condition is a constraint qualification condition, and it applies mainly to the case where f is concave and the gi's are convex. There are several other constraint qualification conditions. For example, one of them requires that there exist x̂ such that, for all i = 1, 2, · · · , m, Dgi(x*)^T x̂ < 0 whenever gi(x*) = 0. Let us call this condition A.
To see the role of the Slater Condition, consider the following example with m = n = 2: f(x1, x2) = x1 + x2, g1(x1, x2) = x1² + x2² − 1, g2(x1, x2) = x1² + (x2 − 2)² − 1. There is only one feasible solution in this case, namely, (x1, x2) = (0, 1), which is apparently the optimal solution. Thus x* = (0, 1). However, there do not exist π1, π2 ∈ ℜ+ such that Df(x*) = π1 Dg1(x*) + π2 Dg2(x*). This happens because we cannot find some x̂ such that g1(x̂) < 0 and g2(x̂) < 0!

To see why Df(x*) must be contained in the polyhedral cone generated by {Dgi(x*); i = 1, 2, · · · , m}, note that in the opposite case we would be able to find some tiny vector a ≡ x − x* such that a and each Dgi(x*) together create an obtuse angle but a and Df(x*) together create an acute angle. This implies that we can find another feasible solution x close to x* such that f(x) > f(x*), which is a contradiction. I shall demonstrate this fact in class using a graph.¹²

Footnote 12: The necessity that Df(x*) must be contained in the polyhedral cone generated by {Dgi(x*); i = 1, 2, · · · , m} is actually a consequence of the following Farkas Lemma: if a1, a2, · · · , am, b ∈ ℜ^n are such that b · x ≤ 0 whenever ai · x ≤ 0 for all i = 1, 2, · · · , m, where x ∈ ℜ^n, then it must be that b ∈ P ≡ {Σ_{i=1}^m ti ai : ti ∈ ℜ+, ∀i = 1, 2, · · · , m}. We say that P is the polyhedral cone generated by a1, a2, · · · , am. The idea is that P is non-empty, closed and convex, and if P does not contain b, then there exists a hyperplane that separates strictly P and b; see section 11 of the lecture note Separating Hyperplane Theorem. That is, there exist some y ∈ ℜ^n and α ∈ ℜ such that p · y < α for all p ∈ P and yet b · y > α. Note that if p ∈ P is such that p · y > 0, then α > 0, but since tp ∈ P for any t ∈ ℜ+, we must have tp · y > α for sufficiently large t, a contradiction. Since 0 ∈ P, we actually have α > 0 · y = 0. Thus we must have p · y ≤ 0 for all p ∈ P and yet b · y > 0. Now, if we let b = Df(x*) and ai = Dgi(x*) for all i = 1, 2, · · · , m, and if we suppose that the necessity fails but condition A holds, then for ε, e > 0 small enough, gi(x* + ε[(1 − e)y + e x̂]) ≤ gi(x*) ≤ 0 but f(x* + ε[(1 − e)y + e x̂]) > f(x*), which is a contradiction.
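For contrast, here is a hedged sketch (the problem instance and the use of scipy's SLSQP solver are my own choices) of a well-behaved instance of (P3) where the Slater condition holds and the Kuhn-Tucker conditions can be read off at the numerical optimum:

```python
from scipy.optimize import minimize

# (P3): maximize f(x) = x1 + x2 subject to g(x) = x1^2 + x2^2 - 1 <= 0.
# Slater holds (e.g., x_hat = (0, 0)); the solution is x* = (1/sqrt(2), 1/sqrt(2)).
res = minimize(lambda x: -(x[0] + x[1]), x0=[0.0, 0.0],
               constraints=[{"type": "ineq",
                             "fun": lambda x: 1 - x[0] ** 2 - x[1] ** 2}])
x = res.x
print(x)                        # approx [0.7071, 0.7071]
# Stationarity: Df = (1, 1) = pi * Dg = pi * (2x1, 2x2) with pi = 1/(2x1) > 0,
# and g(x*) = 0, so complementary slackness pi * g(x*) = 0 holds as well.
print(1.0 / (2 * x[0]))         # the multiplier, approx 0.7071
```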
47. Let me prove the sufficiency part of the Kuhn-Tucker Theorem. Suppose that f and all the −gi are concave. Suppose that there exist non-negative π1, π2, · · · , πm such that at x*,

(i) Σ_{i=1}^m πi Dgi(x*) = Df(x*); and

(ii) (complementary slackness) for all i = 1, 2, · · · , m, πi gi(x*) = 0.

We must show that given any x such that gi(x) ≤ 0 for all i = 1, 2, · · · , m, we have

f(x) − f(x*) ≤ 0.
To this end, note that

f(x) − f(x*) ≤ Df(x*)^T (x − x*) = Σ_{i=1}^m πi Dgi(x*)^T (x − x*) ≤ Σ_{i=1}^m πi [gi(x) − gi(x*)] = Σ_{i=1}^m πi gi(x) ≤ 0,

where the first inequality follows from Theorem 2 and the fact that f is concave, the first equality follows from the assumption that the Kuhn-Tucker condition holds at x*, the second inequality follows from the fact that πi ≥ 0 and gi is convex for all i = 1, 2, · · · , m, the last equality follows from the assumption that complementary slackness holds at x*, and the last inequality follows from the facts that πi ≥ 0 and gi(x) ≤ 0 for all i = 1, 2, · · · , m. Thus x* is indeed an optimal solution to (P3). Finally, if f is strictly concave and both x* and x** are optimal solutions to (P3) with x* ≠ x**, then x0 ≡ (1/2)(x* + x**) must be feasible also (i.e., by Jensen's inequality gi(x0) ≤ 0 for all i, each gi being convex), and yet by Jensen's inequality again we have

f(x0) > (1/2)[f(x*) + f(x**)] = f(x*),

which is a contradiction. Hence the optimal solution must be unique when f is strictly concave.
48. Theorem 5 (Kuhn-Tucker Theorem with Both Inequality and Equality Constraints.) Consider the following maximization problem, which generalizes the problems stated in Theorem 3 and Theorem 4:

max_{x∈ℜ^n} f(x)

subject to

gi(x) ≤ 0, ∀i = 1, 2, · · · , m;
hj(x) = 0, ∀j = 1, 2, · · · , l.

This maximization problem involves m inequality constraints and l equality constraints. The associated Slater condition is as follows: there exists x̂ ∈ ℜ^n such that

gi(x̂) < 0, ∀i = 1, 2, · · · , m;
hj(x̂) = 0, ∀j = 1, 2, · · · , l.

When the Slater condition holds, the following Kuhn-Tucker necessary conditions must hold at an optimal solution x* at which {Dhj(x*); j = 1, 2, · · · , l} consists of a set of linearly independent vectors: there must exist (µ1, µ2, · · · , µm)^T ∈ ℜ^m_+ and (λ1, λ2, · · · , λl)^T ∈ ℜ^l such that

(Stationarity) Df(x*) = Σ_{i=1}^m µi Dgi(x*) + Σ_{j=1}^l λj Dhj(x*);

hj(x*) = 0, ∀j = 1, 2, · · · , l;

(Complementary Slackness) µi gi(x*) = 0, ∀i = 1, 2, · · · , m.

Conversely, when the Slater condition holds, f is concave, the gi's are convex, and the hj's are affine, if x*, (µ1, µ2, · · · , µm)^T ∈ ℜ^m_+, and (λ1, λ2, · · · , λl)^T ∈ ℜ^l satisfy the above Kuhn-Tucker necessary conditions, then x* must be an optimal solution.¹³
49. Let us work through some exercises.

Exercise 4 Consider the differentiable function f : ℜ^n → ℜ defined by

f(x_{n×1}) = a′x, ∀x ∈ ℜ^n,

where a_{n×1} is a given vector. Show that Df(x) = a for all x ∈ ℜ^n.¹⁴

Exercise 5 Consider the differentiable function f : ℜ^n → ℜ defined by

f(x_{n×1}) = x′Ax, ∀x ∈ ℜ^n,

where A_{n×n} is a given square matrix. Show that¹⁵ for all x ∈ ℜ^n,

Df(x) = Ax + A′x

and

D²f(x) = A + A′.

Footnote 13: Visit for example https://en.wikipedia.org/wiki/Karush%E2%80%93Kuhn%E2%80%93Tucker_conditions to see a list of regularity conditions (or constraint qualification conditions) including the Slater condition. After Harold W. Kuhn and Albert W. Tucker published the Kuhn-Tucker conditions in 1951, scholars discovered that these necessary conditions had been stated by William Karush in his master's thesis in 1939. Hence nowadays people also refer to the Kuhn-Tucker conditions as the Karush-Kuhn-Tucker conditions.

Footnote 14: By definition, f(x_{n×1}) = a′x = Σ_{j=1}^n aj xj, so that ∂f/∂xj = aj, which by definition is the j-th element of Df.

Footnote 15: (Step 1.) First recall that the j-th element of A_{m×n} x_{n×1} is simply the inner product of x and the j-th row of A:

Ax = [α1^T x, α2^T x, · · · , αm^T x]^T,

where αj^T is the j-th row of A; this follows when we treat A as a column of rows and x as a scalar (check conformability!). (Step 2.) Now, by definition, we have

f(x_{n×1}) = x′Ax = Σ_{i=1}^n Σ_{k=1}^n xi aik xk,

so that

∂f/∂xj = Σ_{k=1}^n ajk xk + Σ_{i=1}^n xi aij,

where the first sum collects the terms with i = j and the second the terms with k = j. This is the inner product of x and the j-th row of A + A^T! Hence we know from Step 1 that Df(x) = (A + A^T)x. Now, observe that the j-th row of D²f is simply the transpose of the gradient of ∂f/∂xj, or equivalently, it is the transpose of the gradient of the j-th element of Df. Since the j-th element of Df is simply the inner product of x and the j-th row of A + A^T, its gradient, by the preceding exercise, is simply the j-th row of A + A^T (written as a column). Since A + A^T is a symmetric matrix, we conclude that the j-th row of D²f is the j-th row of A + A^T, or equivalently, D²f = A + A^T.
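A numeric spot check of Exercise 5 (the matrix and the evaluation point are my own choices), comparing a central-difference gradient with (A + A′)x:

```python
import numpy as np

A = np.array([[1.0, 2.0], [0.0, 3.0]])      # deliberately non-symmetric
x = np.array([0.5, -1.0])
f = lambda v: v @ A @ v

h = 1e-6
num_grad = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(2)])
print(num_grad)                 # matches the analytic gradient below
print((A + A.T) @ x)            # Df(x) = (A + A')x
```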
Exercise 6 Let 1 be the n-vector of ones. Let e and V be respectively some given n-vector and n × n symmetric positive definite matrix. Solve the following maximization problem: given a constant k > 0,

max_{x∈ℜ^n} x′e − (k/2) x′Vx,

subject to the constraint

x′1 = 1.

(Hint: First check whether the objective function f(x) ≡ x′e − (k/2) x′Vx is concave and whether the constraint function g(x) ≡ x′1 − 1 is affine. Then apply the Lagrange Theorem.)
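A sketch of the solution in Python (the numbers are my own; the code just instantiates the Lagrange conditions, whose closed form here is x* = (1/k)V^{-1}(e − π1) with π chosen so that 1′x* = 1):

```python
import numpy as np

k = 2.0
e = np.array([0.10, 0.08, 0.12])                    # a made-up mean vector
V = np.array([[0.04, 0.01, 0.00],
              [0.01, 0.03, 0.01],
              [0.00, 0.01, 0.05]])                  # symmetric positive definite
one = np.ones(3)

Vinv_e   = np.linalg.solve(V, e)
Vinv_one = np.linalg.solve(V, one)
pi = (one @ Vinv_e - k) / (one @ Vinv_one)          # multiplier pinned by 1'x = 1
x = (Vinv_e - pi * Vinv_one) / k                    # x* = (1/k) V^{-1}(e - pi*1)

print(x, x.sum())                                   # the weights sum to 1
# Stationarity check: Df(x*) = e - k V x* equals pi * Dg(x*) = pi * 1:
print(e - k * V @ x, pi)
```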
Exercise 7 Consider the following utility maximization problem:

max_{x,y∈ℜ+} f(x, y) = x^{1/3} y^{2/3},

subject to the budget constraint

Px x + Py y ≤ I,

where Px, Py, I > 0 are given constants. The idea is that the decision-maker is a consumer endowed with I dollars, who wishes to maximize the utility f(x, y) from consuming x units of good X and y units of good Y, where the prices of the two goods are respectively Px and Py.

(i) Show that f(x, y) = 0 if xy = 0, and yet that there exists (x′, y′) satisfying x′ > 0, y′ > 0, f(x′, y′) > 0, and Px x′ + Py y′ < I. (Try for example x′ = I/(3Px) and y′ = I/(3Py).) Thus we conclude that (1) the Slater condition holds; and (2) the decision-maker can focus on those (x, y) satisfying x > 0, y > 0, and Px x + Py y ≤ I.
(ii) Denote by ℜ²++ the set {(x, y) ∈ ℜ² : x > 0, y > 0}. Show that on the restricted domain ℜ²++, f(x, y) is such that

D²f = [ −(2/9) x^{−5/3} y^{2/3}     (2/9) x^{−2/3} y^{−1/3} ]
      [  (2/9) x^{−2/3} y^{−1/3}   −(2/9) x^{1/3} y^{−4/3}  ],

and that for any (h, k)′ ∈ ℜ²,

(h, k) D²f (h, k)′ = −(2/9) x^{−5/3} y^{−4/3} (hy − kx)² ≤ 0, ∀(x, y) ∈ ℜ²++.

Conclude that f : ℜ²++ → ℜ is a concave function.

(iii) Define g(x, y) = Px x + Py y − I. Show that D²g is a positive semi-definite matrix at all (x, y) ∈ ℜ²++.

(iv) Use the Kuhn-Tucker theorem to show that the optimal solution is

(x*, y*) = (I/(3Px), 2I/(3Py)).
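A numeric sanity check of part (iv) (the prices and income are my own choices), comparing the claimed optimum with a brute-force search along the budget line, where the constraint binds because f is strictly increasing:

```python
import numpy as np

Px, Py, I = 2.0, 3.0, 12.0
f = lambda x, y: x ** (1 / 3) * y ** (2 / 3)

x_star, y_star = I / (3 * Px), 2 * I / (3 * Py)      # claimed optimum (2, 8/3)
print(x_star, y_star, f(x_star, y_star))

# Brute force over the budget line Px*x + Py*y = I:
xs = np.linspace(1e-6, I / Px - 1e-6, 100_001)
ys = (I - Px * xs) / Py
vals = f(xs, ys)
print(xs[vals.argmax()], ys[vals.argmax()])           # approx (2.0, 2.667)
```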
50. If Ã_{m×n} is a two-dimensional array of mn random variables (i.e., each element ãij is a random variable), then Ã is called a random matrix and E(Ã) is a non-random matrix of the same size as Ã, with its (i, j)-th element being

E(ãij), ∀i = 1, 2, · · · , m; ∀j = 1, 2, · · · , n.

We shall call E(Ã) the expected value of the random matrix Ã.

If n = 1, Ã_{m×n} = r̃_{m×1} is called a random vector. The variance-covariance matrix of the random vector r̃ is defined as the m × m non-random matrix V with vij = cov(r̃i, r̃j). Since for any two random variables x̃ and ỹ, cov(x̃, ỹ) = cov(ỹ, x̃) (verify!), the matrix V is symmetric. Moreover, since for any random variable x̃, cov(x̃, x̃) = var(x̃) (verify!), the main diagonal elements of V are, respectively, the variances of the random variables in the random vector r̃!
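A simulation sketch of these facts (the distribution parameters are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
r = rng.multivariate_normal(mean=[0.0, 1.0], cov=[[2.0, 0.5], [0.5, 1.0]],
                            size=200_000)            # 200k draws of a 2-vector
V = np.cov(r, rowvar=False)                           # sample variance-covariance
print(V)                                              # close to the true matrix
print(np.allclose(V, V.T))                            # True: cov(x,y) = cov(y,x)
print(V[0, 0], r[:, 0].var(ddof=1))                   # diagonal entries = variances
```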
51. Theorem 6. Let V be the variance-covariance matrix of the random vector r̃. The following statements are true.

(i) Denote E[r̃] by e. Then V = E[(r̃ − e)(r̃ − e)′].

(ii) V is positive semi-definite. V is positive definite if and only if there exists no non-zero m-vector x such that the variance of x^T r̃ equals zero. In case m = 2, such a non-zero x exists if and only if r̃1 and r̃2 are perfectly correlated.

(iii) If ỹ has zero variance, then the event {ỹ ≠ E[ỹ]} may occur only with zero probability; that is, ỹ is essentially a (non-random) constant.

(iv) Given any m-vectors w, w1 and w2, the expected value of w^T r̃ is w′e, the variance of w^T r̃ is w′Vw, and the covariance of w1^T r̃ and w2^T r̃ is w1′Vw2.
Proof. Consider part (i). For all k, j ∈ {1, 2, · · · , m}, the (k, j)-th element of the matrix E[(r̃ − e)(r̃ − e)′] is

E[(r̃k − E[r̃k])(r̃j − E[r̃j])] = cov(r̃k, r̃j),

and hence V and E[(r̃ − e)(r̃ − e)′] are the same matrix.

Next, consider part (ii). Pick any x_{m×1}, and observe that

x′Vx = x′E[(r̃ − e)(r̃ − e)′]x = E[x′(r̃ − e)(r̃ − e)′x] = E[(x′r̃ − x′e)(r̃′x − e′x)] = E[(x′r̃ − E[x′r̃])²] = var[x′r̃] ≥ 0.

Thus by definition, V is positive semi-definite. Moreover, we see that V is positive definite if and only if

var[x′r̃] = x′Vx > 0

for all x_{m×1} ≠ 0_{m×1}.

Now, if m = 2, we have

V_{2×2} = [ var[r̃1]        cov(r̃1, r̃2) ]
          [ cov(r̃1, r̃2)   var[r̃2]      ].

Recall that a positive semi-definite matrix V is singular if and only if V is not positive definite, and the latter is true if and only if there exists x_{m×1} ≠ 0_{m×1} such that x′Vx = 0. Thus there exists a vector x_{m×1} ≠ 0_{m×1} such that x′Vx = 0 if and only if the determinant of V is zero, or

cov(r̃1, r̃2)² / (var[r̃1] var[r̃2]) = 1,

implying that the coefficient of correlation for (r̃1, r̃2) equals either 1 or −1.
For part (iii), suppose that var[ỹ] = 0. A mathematical fact says that if var[ỹ] is a non-negative real number, then E[ỹ] is a real number. By definition,

0 = var[ỹ] = E[(ỹ − E[ỹ])²],

and the above right-hand side is the Lebesgue integral of the non-negative function (ỹ − E[ỹ])². Since the integral is equal to zero, the event that (ỹ − E[ỹ])² > 0 may occur only with zero probability. Equivalently, we conclude that

{ỹ ≠ E[ỹ]}

is a zero-probability event; that is, ỹ is essentially a constant.

An example is as follows. Let z̃ ∼ N(0, 1) be a standard normal random variable, and define a new random variable ỹ = f(z̃) as follows:

f(x) = 0 if x is not a rational number; f(x) = 1 if x is a rational number.

Then we have E[ỹ] = 0 = var[ỹ], and although the realization of ỹ is not always equal to zero, those non-zero realizations together may occur only with zero probability.
Finally, consider part (iv). By definition, we have

E[w′r̃] = E[Σ_{j=1}^m wj r̃j] = Σ_{j=1}^m wj E[r̃j] = w′e.

Also, note that

w1′Vw2 = E[(w1′r̃ − w1′e)(w2′r̃ − w2′e)′] = E[(w1′r̃ − w1′e)(w2′r̃ − w2′e)] = cov(w1′r̃, w2′r̃).

In particular, if w1 = w2 = w, then we have

w′Vw = cov[w′r̃, w′r̃] = var[w′r̃].

This finishes the proof. Q.E.D.
52. (Multi-variate Gaussian Random Variables.) Suppose that V is positive definite. We say that the m ≥ 1 random variables in r̃ are multi-variate normal if the joint density function of (r̃1, r̃2, · · · , r̃m) is

f(r) = (2π)^{−m/2} |V|^{−1/2} exp[−(1/2)(r − e)′ V^{−1} (r − e)], ∀r ∈ ℜ^m.

In this case, we simply write¹⁶

r̃ ∼ N(e, V).

More generally, given a vector

z̃_{(m+n)×1} = [ x̃_{n×1} ]
              [ ỹ_{m×1} ]

of multi-variate normal random variables, where

[ x̃_{n×1} ]  ∼  N( [ a_{n×1} ] ,  [ V_{n×n}   C_{n×m} ] )
[ ỹ_{m×1} ]        [ b_{m×1} ]    [ C′_{m×n}  U_{m×m} ],

the conditional expectation of x̃ given ỹ = y is

E[x̃|y] = a + C U^{−1} (y − b),

and the conditional covariance matrix of x̃ given ỹ = y is

var(x̃|y) = V − C U^{−1} C′.

Footnote 16: Some related results are as follows. Given any non-random matrix A_{n×m}, the random vector Ar̃ is multivariate normal if r̃ is multivariate normal. In fact, an equivalent condition for r̃ to be multivariate normal is that b^T r̃ is a normal random variable for all b_{m×1} ∈ ℜ^m. Note that if r̃ is multivariate normal, then each random variable r̃i is itself a normal random variable. Conversely, suppose that for all i ∈ {1, 2, · · · , m}, r̃i is a normal random variable. Then r̃ is multivariate normal if the m random variables (r̃1, r̃2, · · · , r̃m) are mutually independent.
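A simulation check of the conditional-moment formulas (all parameters are my own choices; with scalar x̃ and ỹ, E[x̃|ỹ] = a + (C/U)(ỹ − b) is just the population regression line):

```python
import numpy as np

rng = np.random.default_rng(2)
a, b = 1.0, -2.0
V, C, U = 2.0, 0.8, 1.5        # var(x), cov(x, y), var(y); scalar case n = m = 1
z = rng.multivariate_normal([a, b], [[V, C], [C, U]], size=500_000)
x, y = z[:, 0], z[:, 1]

# Condition on y falling near some value y0 and compare with the formulas:
y0 = 0.0
sel = np.abs(y - y0) < 0.05
print(x[sel].mean(), a + (C / U) * (y0 - b))        # both approx 2.067
print(x[sel].var(), V - C * C / U)                  # both approx 1.573
```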
53. Now we review conditional moments and the law of iterated expectations.
Suppose that x̃ has two possible outcomes, 0 and 30, and that ỹ has two possible outcomes, 20 and −20. The joint probabilities for the 4 possible pairs of outcomes of x̃ and ỹ are summarized in the following table.

x̃ \ ỹ    −20    20
 0        0.1    0.2
 30       0.3    0.4

The probabilities of the following 4 events define the joint distribution of x̃ and ỹ:

prob.(x̃ = 0, ỹ = −20) = 0.1; prob.(x̃ = 0, ỹ = 20) = 0.2;
prob.(x̃ = 30, ỹ = −20) = 0.3; prob.(x̃ = 30, ỹ = 20) = 0.4.

From the joint distribution of the two random variables, we can derive the marginal distribution of each random variable. More precisely, we immediately have the marginal distribution of x̃, defined by the probabilities of the following two events:

prob.(x̃ = 0) = prob.(x̃ = 0, ỹ = −20) + prob.(x̃ = 0, ỹ = 20) = 0.1 + 0.2 = 0.3;
prob.(x̃ = 30) = prob.(x̃ = 30, ỹ = −20) + prob.(x̃ = 30, ỹ = 20) = 0.3 + 0.4 = 0.7.

We can similarly obtain the marginal distribution of ỹ, defined by the probabilities of the following two events:

prob.(ỹ = −20) = prob.(x̃ = 0, ỹ = −20) + prob.(x̃ = 30, ỹ = −20) = 0.1 + 0.3 = 0.4;
prob.(ỹ = 20) = prob.(x̃ = 0, ỹ = 20) + prob.(x̃ = 30, ỹ = 20) = 0.2 + 0.4 = 0.6.

Now we can obtain the expected value and variance of each random variable, using that random variable's marginal distribution. Verify that

E[x̃] = 0 × 0.3 + 30 × 0.7 = 21, E[ỹ] = −20 × 0.4 + 20 × 0.6 = 4,
var(x̃) = (0 − 21)² × 0.3 + (30 − 21)² × 0.7 = 189,
var(ỹ) = (−20 − 4)² × 0.4 + (20 − 4)² × 0.6 = 384.

Finally, we can use the joint distribution of the two random variables to compute their covariance:

cov(x̃, ỹ) = (0 − 21) × (−20 − 4) × 0.1 + (0 − 21) × (20 − 4) × 0.2 + (30 − 21) × (−20 − 4) × 0.3 + (30 − 21) × (20 − 4) × 0.4 = −24.
We shall need the concepts of conditional distribution and conditional expectation of a random variable. The former is the probability distribution of one random variable conditional on knowing the other random variable's outcome. For example, the conditional distribution of x̃, after we know that the outcome of ỹ is −20, is defined by the probabilities of the following two events:

prob.(x̃ = 0 | ỹ = −20) = prob.(x̃ = 0, ỹ = −20) / prob.(ỹ = −20) = 0.1/(0.1 + 0.3) = 1/4;
prob.(x̃ = 30 | ỹ = −20) = prob.(x̃ = 30, ỹ = −20) / prob.(ỹ = −20) = 0.3/(0.1 + 0.3) = 3/4.

Now, after we know that the outcome of ỹ is −20, with the conditional distribution of x̃, we can compute the conditional expectation of x̃:

E[x̃ | ỹ = −20] = 0 × (1/4) + 30 × (3/4) = 90/4.

Similarly, we can find the conditional distribution of x̃ after we know that the outcome of ỹ is 20. Again, this is defined by the probabilities of two events:

prob.(x̃ = 0 | ỹ = 20) = prob.(x̃ = 0, ỹ = 20) / prob.(ỹ = 20) = 0.2/(0.2 + 0.4) = 1/3;
prob.(x̃ = 30 | ỹ = 20) = prob.(x̃ = 30, ỹ = 20) / prob.(ỹ = 20) = 0.4/(0.2 + 0.4) = 2/3.

Hence we have

E[x̃ | ỹ = 20] = 0 × (1/3) + 30 × (2/3) = 20.

These computations tell us that, if a high outcome of x̃ would make us better off, then a lower realization of ỹ is good news.
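The whole worked example, including the law of iterated expectations previewed in Theorem 7 below, fits in a few lines of Python:

```python
import numpy as np

# Joint probabilities: rows are x in {0, 30}, columns are y in {-20, 20}.
P = np.array([[0.1, 0.2],
              [0.3, 0.4]])
xv = np.array([0.0, 30.0])
yv = np.array([-20.0, 20.0])

px, py = P.sum(axis=1), P.sum(axis=0)            # marginals: [0.3, 0.7], [0.4, 0.6]
Ex, Ey = xv @ px, yv @ py                         # 21 and 4
print(Ex, Ey, ((xv - Ex) ** 2) @ px, ((yv - Ey) ** 2) @ py)    # 21, 4, 189, 384
print(((xv[:, None] - Ex) * (yv[None, :] - Ey) * P).sum())     # cov = -24

E_x_given_y = (xv @ P) / py                       # [22.5, 20.0]
print(E_x_given_y)
print(E_x_given_y @ py)                           # LIE: 0.4*22.5 + 0.6*20 = 21
```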
54. Theorem 7. (Law of Iterated Expectations; LIE.) The formula

E[E[x̃|ỹ]] = E[x̃],

which says that the expected value of the conditional expectations of x̃ is equal to the original (unconditional) expected value of x̃, holds whenever E[x̃] is a finite real number, and is referred to as the law of iterated expectations.

Note that E[x̃|ỹ] is the expected value of x̃ after seeing the realization of ỹ, and hence it naturally varies with the realization of ỹ. In the above example, we know from the marginal distribution of ỹ that ỹ may equal −20 with probability 0.4 and equal 20 with probability 0.6, and E[x̃|ỹ] is the random variable that equals 90/4 when ỹ = −20 and equals 20 when ỹ = 20. Thus the expected value of these conditional expectations is

0.4 × (90/4) + 0.6 × 20 = 9 + 12 = 21 = E[x̃],

confirming that, indeed, E[E[x̃|ỹ]] = E[x̃].
55. Suppose that a : ℜ → ℜ, b : ℜ → ℜ, and f : ℜ² → ℜ are all continuously differentiable. Define

F(x) ≡ ∫_{b(x)}^{a(x)} f(x, t) dt, ∀x ∈ ℜ.

Then we have (Leibniz Rule)

F′(x) = −b′(x) f(x, b(x)) + a′(x) f(x, a(x)) + ∫_{b(x)}^{a(x)} (∂f/∂x)(x, t) dt.
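A numeric verification of the Leibniz Rule (the particular a, b, f are my own choices), using scipy quadrature and a central difference:

```python
import numpy as np
from scipy.integrate import quad

a  = lambda x: x * x                     # upper limit, a'(x) = 2x
b  = lambda x: np.sin(x)                 # lower limit, b'(x) = cos(x)
f  = lambda x, t: np.exp(-x * t)
fx = lambda x, t: -t * np.exp(-x * t)    # partial derivative of f in x

F = lambda x: quad(lambda t: f(x, t), b(x), a(x))[0]

x0, h = 1.3, 1e-5
lhs = (F(x0 + h) - F(x0 - h)) / (2 * h)                   # numeric F'(x0)
rhs = (-np.cos(x0) * f(x0, b(x0)) + 2 * x0 * f(x0, a(x0))
       + quad(lambda t: fx(x0, t), b(x0), a(x0))[0])       # Leibniz Rule
print(lhs, rhs)                                            # agree closely
```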
56. A function f : U ⊂ ℜ^n → ℜ is quasi-concave if for all r ∈ ℜ, the set

{x ∈ U : f(x) ≥ r}

is a convex subset of ℜ^n, where we assume that U is itself open and convex. When n = 1, any monotone function f : ℜ → ℜ is quasi-concave. The function f : ℜ²++ → ℜ defined by f(x, y) = xy is not concave, but is still quasi-concave.
57. A function f : ℜ^n → ℜ is strictly increasing if for all ε > 0, for all x ∈ ℜ^n, and for all j = 1, 2, · · · , n, f(x + εuj) > f(x), where uj has its j-th element equal to one and all other elements equal to zero.
58. Consider the following maximization problem (P4), where f : ℜ^n → ℜ is strictly increasing and quasi-concave, g : ℜ^n → ℜ is strictly increasing and convex, and both f and g are twice continuously differentiable:

max_{x∈ℜ^n} f(x)

subject to

g(x) ≤ 0.

Suppose that there exist x0, x̂ ∈ ℜ^n such that g(x̂) < 0 < g(x0). Then (P4) has a solution x* such that g(x*) = 0 and, for some µ > 0,

Df(x*) = µ Dg(x*).