ERROR ESTIMATE AND CONVERGENCE ANALYSIS OF
MOMENT-PRESERVING DISCRETE APPROXIMATIONS OF
CONTINUOUS DISTRIBUTIONS∗
KEN’ICHIRO TANAKA† AND ALEXIS AKIRA TODA‡
Abstract. The maximum entropy principle is a powerful tool for solving underdetermined inverse problems. In this paper we consider the problem of finding a discrete approximation of a continuous distribution, which arises in various applied fields. We obtain the approximating distribution by minimizing the Kullback-Leibler information of the unknown discrete distribution relative to the known continuous distribution (evaluated at given discrete points) subject to some moment constraints. We study the theoretical error bound and the convergence property of this approximation method as the number of discrete points increases. The order of the theoretical error bound of the expectation of any bounded continuous function with respect to the approximating discrete distribution is never worse than that of the integration formula we start with, and we prove the weak convergence of the discrete distribution to the given continuous distribution. Moreover, we present some numerical examples that show the advantage of the method and apply it to numerically solve an optimal portfolio problem.
Key words. probability distribution, discrete approximation, generalized moment, integration
formula, Kullback-Leibler information, Fenchel duality, error estimate, convergence analysis
AMS subject classifications. 41A25, 41A29, 62E17, 62P20, 65D30, 65K99
1. Introduction. This paper has two goals. First, we propose a numerical
method to approximate continuous probability distributions by discrete ones. Second,
we study the convergence property of the method and derive error estimates. Discrete
approximations of continuous distributions are important in applied numerical analysis.
To motivate the problem, we list a few concrete examples, many of which come from
economics.
Optimal portfolio problem. Suppose that there are J assets indexed by j = 1, . . . , J. Asset j has gross return $R_j$ (a random variable), which means that a dollar invested in asset j will give a total return of $R_j$ dollars over the investment horizon. Let $\theta_j$ be the fraction of an investor's wealth invested in asset j (so $\sum_{j=1}^J \theta_j = 1$) and $\theta = (\theta_1, \ldots, \theta_J)$ be the portfolio. Then the gross return on the portfolio is the weighted average of the asset returns, $R(\theta) := \sum_{j=1}^J R_j \theta_j$. Assume that the investor wishes to maximize the risk-adjusted expected return

$\mathrm{E}\left[\frac{1}{1-\gamma} R(\theta)^{1-\gamma}\right]$,

where $\gamma > 0$ is the degree of relative risk aversion.^1 Since in general this expectation has no closed-form expression in the parameter $\theta$, in order to maximize it we need to carry out a numerical integration. This problem is equivalent to finding an approximate discrete distribution of the returns $(R_1, \ldots, R_J)$.
Optimal consumption-portfolio problem. In the above example the portfolio choice was a one-time decision. But we can consider a dynamic version, where the investor
chooses the optimal consumption and portfolio over time (Samuelson [31] and Merton
∗ June 2, 2014
† Corresponding author: School of Systems Information Science, Future University Hakodate (ketanaka@fun.ac.jp).
‡ Department of Economics, University of California San Diego (atoda@ucsd.edu).
1 Readers unfamiliar with economic concepts need not worry here. The point is that we want to maximize an expectation that is a function of some parameter. The case γ = 1 corresponds to the log utility E[log R(θ)].
[28] are classic examples that admit closed-form solutions. Almost all practical problems, however, admit no closed-form solutions). Since we need to numerically solve a
portfolio problem for each time period, we face one more layer of complexity.
General equilibrium problem. Instead of solving the optimization problem of a
single investor, we can consider a model of the whole economy, and might want to
determine the asset prices that make demand equal to supply. This is called a general
equilibrium problem in economics. Since we need to solve a dynamic optimal portfolio
problem given asset prices, and we need to find asset prices that clear markets, we
face yet another layer of complexity. Such problems are especially computationally
intensive [18, 4, 25], and it is important to find a discrete approximation of a continuous distribution with only a small number of support points in order to mitigate the
‘curse of dimensionality’.
Simulating stochastic processes. In many fields, including option pricing, it is
necessary to simulate many sample paths of diffusion or autoregressive processes. To
carry out the simulation, one needs to approximate them by a finite state Markov
chain [35].
The above examples can be generalized as follows. We want to compute an expectation E[g(X)] for a function $g : \mathbb{R}^K \to \mathbb{R}$ and a random variable X with a given continuous distribution, say with a density f. If the function g is explicitly known, it suffices to use some integration formula

(1.1)  $\mathrm{E}[g(X)] = \int_{\mathbb{R}^K} g(x) f(x)\,dx \approx \sum_{i=1}^M w_{i,M}\, g(x_{i,M}) f(x_{i,M})$
for the computation, where M is the number of integration points and wi,M is the
weight on the point xi,M . In many cases, especially in economics (as the above
portfolio problem), the function g may contain some unknown parameter θ that we
want to determine from some given conditions for E[g(X, θ)], say
$\theta^* = \arg\max_{\theta} \mathrm{E}[g(X, \theta)]$.
In addition, the values g(x, θ) or f (x) may be available only on some prescribed
discrete set of x’s, say D ⊂ RK . Such a restriction may arise when g(·, θ) or f is
obtained by some statistical estimate from real data, or when the application requires
the discrete set D to be a particular set, say lattice points. In such a situation, we
need to compute E[g(X, θ)] many times for different parameter values θ using only
the discrete points x’s in D. Thus it is desirable to use a highly accurate integration
formula with light computing load, say small M in (1.1).
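To make this computational pattern concrete, here is a minimal Python sketch of ours (not taken from the paper): the set D, the density, and the function g below are all hypothetical. Once the points and weights of (1.1) are fixed, each new parameter value θ costs only one weighted sum.

```python
import numpy as np

# Fixed once: prescribed points x_i in D and combined weights w_i f(x_i) as in (1.1).
x = np.linspace(-3.0, 3.0, 25)                       # hypothetical prescribed set D
wf = (x[1] - x[0]) * np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def expected_g(theta):
    """Approximate E[g(X, theta)] by the fixed formula (1.1)."""
    g = np.log1p(np.exp(theta * x))                  # a hypothetical g(x, theta)
    return np.sum(wf * g)

# Maximizing over theta only requires re-evaluating the cheap weighted sum.
thetas = np.linspace(0.0, 1.0, 101)
theta_star = thetas[np.argmax([expected_g(t) for t in thetas])]
```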
Some popular formulas, such as the Newton-Cotes type or the Gauss type formulas
(see [13] for a standard textbook treatment), are suitable for such a purpose. These
formulas, however, are not necessarily available if the integration points xi,M ’s are
restricted as {xi,M } ⊂ D. If that is the case, we need some recipe for approximating
an expectation as in (1.1). Note that such an approximation is equivalent to finding
a discrete distribution {wi,M f (xi,M )} that approximates the given distribution f in
the sense of the weak topology.
Several methods for discrete approximations of continuous distributions have been
proposed in the literature. Given a continuous probability distribution, Tauchen [35]
and Adda and Cooper [3] adopt simple partitions of the domain of the distribution
function and assign the true probability to a representative point of each partitioned
domain. Although their methods are intuitive, simple, and work in any dimension, they are not very accurate, for they generate discrete distributions with only approximate moments. Miller and Rice [29] and DeVuyst and Preckel [15] discretize
the density function using the weights and the points of the Gaussian integration
and its generalization to multi-dimensions, respectively. Although their methods are
often more accurate and can match prescribed polynomial moments exactly, they do
not allow for the restriction {xi,M } ⊂ D and cannot be applied to non-polynomial
moments. Furthermore, the multi-dimensional method by DeVuyst and Preckel [15] is
computationally intensive and does not have a theoretical guarantee for the existence
of the discretization, error bounds, or convergence.
As a remedy, in Tanaka and Toda [34] we proposed an approximation method
based on the maximum entropy principle (MaxEnt) that matches prescribed moments
exactly. Starting from any integration formula, we “fine-tune” the given probabilities (chosen proportionally to {wi,M f (xi,M )}) by minimizing the Kullback-Leibler
information (relative entropy) of the unknown probabilities subject to some moment
constraints. In that paper we proved the existence and the uniqueness of the solution
of the minimization problem, showed that the solution can be easily computed by
solving the dual problem, and presented some numerical examples that show that
the approximation method is satisfactorily accurate. The method is computationally
very simple and works on any discrete set D of any dimension with any prescribed
moments (not necessarily polynomials). However, up to now the theoretical approximation error and the convergence property of this method have remained unknown.
This paper gives a theoretical error bound for this approximation method and
shows its convergence property. We first evaluate the theoretical error of our proposed method. It turns out that the order of the theoretical error estimate is at most
that of the initial integration formula, and actually improves if the integrand is sufficiently smooth. Thus our proposed method does not compromise the order of the
error at the expense of matching moments. Second, as a theoretical consequence of the
error estimate, we show the weak convergence of the discrete distribution generated
by the method to the given continuous distribution. This means that for any bounded
continuous function g, the expectation of g with respect to the approximating discrete distribution converges to the exact one with respect to the given distribution as
the number of integration points increases. This convergence property is practically
important because it guarantees that the approximation method never generates a pathological discrete distribution that has exact moments but assigns probabilities very different from those of the given distribution on some region, at least when the discrete set is large enough. In addition, we present some numerical examples (including a
numerical solution to an optimal portfolio problem) that show the advantage of our
proposed method.
The idea of using the maximum entropy principle to obtain a solution to underdetermined inverse problems (such as the Hausdorff moment problem) is similar to
that of Jaynes [20] and Mead and Papanicolaou [27] (Junk [22] studies the existence
of maximum entropy solutions with unbounded domains). There is an important
distinction, however. In typical inverse problems, one studies the convergence of the
approximating solution to the true one when the number of moment constraints tends
to infinity. In contrast, in this paper we study the convergence when the number
of approximating points tends to infinity, fixing the moments. Thus the two problems are quite different. The literature on the foundations, implementations, and the
applications of maximum entropy methods is immense: any literature review is necessarily partial. The maximum entropy principle (as an inference method in general,
not necessarily restricted to physics) was proposed by Jaynes [19]. For axiomatic
approaches, see Shore and Johnson [33], Jaynes [21], Caticha and Giffin [11], and
Knuth and Skilling [24]. For the relation to Bayesian inference, see Van Campenhout
and Cover [37] and Csiszár [12]. For the duality theory of entropy maximization,
see [7, 14, 17]. For numerical algorithms for computing maximum entropy densities,
see [1, 2, 5]. Budišić and Putinar [10] study the opposite problem of ours, namely
transforming a probability measure to one that is absolutely continuous with respect
to the Lebesgue measure. Applications of maximum entropy methods can be found
in economics [16, 36], statistics and econometrics [6, 23, 38], finance [9], and kinetic theory [32, 30], among many other fields.
The rest of the paper is organized as follows. In Section 2 we review the approximation method proposed in [34]. Section 3, which is the main part of the paper,
presents the theoretical error estimate and the convergence property of the method.
Section 4 shows some numerical examples, one of which is an optimal portfolio problem. Section 5 contains the proofs.
2. The approximation method. In this section, we review the discrete approximation method of continuous distributions proposed in [34].
Let f be a probability density function on $\mathbb{R}^K$ and assume that some generalized moments

(2.1)  $\bar{T} = \int_{\mathbb{R}^K} f(x) T(x)\,dx$

are given, where $T : \mathbb{R}^K \to \mathbb{R}^L$ is a continuous function. (Below, we sometimes refer to this function as the "moment defining function".) For instance, if the first and second polynomial moments are given, T should be defined by

$T(x) = (x_1, \ldots, x_K, x_1^2, \ldots, x_k x_l, \ldots, x_K^2)$.

In this case, we are prescribed K expectations, K variances, and $K(K-1)/2$ covariances. Therefore, the total number of moment constraints (the dimension of the range space of T) is

$L = K + K + \frac{K(K-1)}{2} = \frac{K(K+3)}{2}$.

In general, the components of T(x) need not be polynomials.
Moreover, for each positive integer M, assume that a finite discrete set

$D_M = \{x_{i,M} \mid i = 1, \ldots, M\} \subset \mathbb{R}^K$

is given. An example of $D_M$ is the lattice

$D_M = \{(n_1 h, n_2 h, \ldots, n_K h) \mid n_1, n_2, \ldots, n_K = 0, \pm 1, \ldots, \pm N\}$,

where h > 0 is the grid size and N a positive integer, in which case $M = (2N+1)^K$.
Our aim is to find a discrete probability distribution
PM = {p(xi,M ) | xi,M ∈ DM }
on DM with exact moments T̄ that approximates f (in the sense of the weak topology,
that is, convergence in distribution).
To match the moments T̄ with PM = {p(xi,M) | xi,M ∈ DM}, it suffices to assign the p(xi,M)'s such that

(2.2)  $\sum_{i=1}^M p(x_{i,M}) T(x_{i,M}) = \bar{T}$.

Note that the solution to this equation is generally underdetermined because the number of unknowns p(xi,M), namely M, is typically much larger than the number of equations (moments), L + 1.^2 To obtain PM satisfying (2.2) and approximating f, we first choose a numerical integration formula by setting positive weights wi,M (i = 1, 2, . . . , M):

(2.3)  $\int_{\mathbb{R}^K} f(x) g(x)\,dx \approx \sum_{i=1}^M w_{i,M} f(x_{i,M}) g(x_{i,M})$,

where g is an arbitrary function whose expectation with respect to the density f we want to compute. For instance, if K = 1 and DM = {nh | n = 0, ±1, . . . , ±N} for h > 0, we can choose the (2N + 1)-point trapezoidal formula for a univariate function on R by setting wi,M = h if 1 < i < M and wi,M = h/2 otherwise. In the following, we do not address how to choose the integration formula (2.3) but take it as given.
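As a concrete illustration of this trapezoidal choice (a minimal sketch of ours, not code from the paper; the function name is hypothetical), the nodes and weights for K = 1 can be generated as follows.

```python
import numpy as np

def trapezoid_nodes_weights(h, N):
    """Nodes D_M = {nh : n = 0, ±1, ..., ±N} and trapezoidal weights, M = 2N + 1."""
    x = h * np.arange(-N, N + 1)   # the discrete set D_M
    w = np.full(x.size, h)         # interior weights w_{i,M} = h
    w[0] = w[-1] = h / 2           # endpoint weights h/2
    return x, w
```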
Now the approximation method is defined as follows. We obtain the approximate discrete distribution PM = {p(xi,M) | xi,M ∈ DM} as a solution to the following optimization problem:

(P)  $\min_{\{p(x_{i,M})\}} \sum_{i=1}^M p(x_{i,M}) \log \frac{p(x_{i,M})}{w(x_{i,M}) f(x_{i,M})}$
subject to  $\sum_{i=1}^M p(x_{i,M}) T(x_{i,M}) = \bar{T}, \quad \sum_{i=1}^M p(x_{i,M}) = 1, \quad p(x_{i,M}) \ge 0$.
The problem (P) is equivalent to the minimization problem of the Kullback-Leibler information (also known as the relative entropy) of PM relative to the discrete distribution proportional to {w(xi,M)f(xi,M) | xi,M ∈ DM}. Note that the problem (P) has a unique solution if T̄ ∈ co T(DM), where co T(DM) is the convex hull of T(DM) defined by

(2.4)  $\operatorname{co} T(D_M) = \left\{ \sum_{i=1}^M \alpha_{i,M} T(x_{i,M}) \,\middle|\, \sum_{i=1}^M \alpha_{i,M} = 1 \text{ and } \alpha_{i,M} \ge 0 \right\}$,

because in that case the constraint set is nonempty, compact, and convex, and the objective function is continuous (by adopting the convention 0 log 0 = 0) and strictly convex.
To characterize the solution of (P), we consider the Fenchel dual^3 of (P), which can be written as

(D)  $\bar{\lambda}_M = \arg\min_{\lambda \in \mathbb{R}^L} \left[ -\langle \lambda, \bar{T} \rangle + \log\left( \sum_{i=1}^M w_{i,M} f(x_{i,M}) e^{\langle \lambda, T(x_{i,M}) \rangle} \right) \right]$,

2 The "+1" comes from accounting for the constraint that the probabilities add up to one, $\sum_{i=1}^M p(x_{i,M}) = 1$.
3 See [7] for an application of the Fenchel duality to entropy-like minimization problems.
where ⟨ · , · ⟩ denotes the usual inner product in RL . Note the simplicity of the dual
problem (D) compared to the primal problem (P): the dual (D) is an unconstrained
optimization problem with typically a small number of unknowns (L) whereas the primal problem (P) is a constrained optimization problem with typically a large number
of unknowns (M ).
The following theorems in [34] show that we can obtain the solution of (P) in the
form of fine-tuned values of w(xi,M )f (xi,M ). These theorems are routine exercises
in convex duality theory (see for example [8] for a textbook treatment): we present
them nevertheless in order to make the paper self-contained.
Theorem 1. Suppose that T̄ ∈ co T(DM). Then the set of probabilities PM = {p(xi,M) | xi,M ∈ DM} given by

(2.5)  $p(x_{i,M}) = \frac{w_{i,M} f(x_{i,M})\, e^{\langle \bar{\lambda}_M, T(x_{i,M}) \rangle}}{\sum_{i=1}^M w_{i,M} f(x_{i,M})\, e^{\langle \bar{\lambda}_M, T(x_{i,M}) \rangle}}$

is the solution of (P), where $\bar{\lambda}_M$ is the minimizer in (D).
Theorem 1 indicates that the solution of (P) can be determined by (2.5) if a
solution λ̄M of (D) exists. Theorem 2 below guarantees the existence of a solution
λ̄M of (D). Here, in order to guarantee the uniqueness of the solution as well, we
adopt a stronger assumption T̄ ∈ int(co T (DM )) than T̄ ∈ co T (DM ) in Theorem 1,
where “int” denotes the set of the interior points of a region.
Theorem 2. Suppose that T̄ ∈ int(co T (DM )). Then
1. the objective function in (D) is continuous and strictly convex, and
2. the solution λ̄M uniquely exists.
Based on Theorems 1 and 2, in [34] we showed by some numerical examples that
our method generates accurate discrete approximations of continuous distributions.
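To illustrate how Theorems 1 and 2 translate into computation, the following Python sketch solves (D) with a generic convex optimizer and then applies (2.5). It is a minimal sketch of ours, not the paper's MATLAB implementation; all names are hypothetical, and no numerical safeguards (e.g., a log-sum-exp rescaling of the exponentials) are included.

```python
import numpy as np
from scipy.optimize import minimize

def maxent_weights(x, w, f, T, Tbar):
    """Fine-tune the weights w_i f(x_i) to match the moments Tbar via the dual (D).

    x: (M,) nodes; w: (M,) integration weights; f: density function;
    T: map from a node to its (L,) moment vector; Tbar: (L,) target moments.
    Returns the probabilities p(x_i) of (2.5).
    """
    q = w * f(x)                                   # initial weights w_i f(x_i)
    TX = np.array([T(xi) for xi in x])             # (M, L) moment matrix
    dual = lambda lam: -lam @ Tbar + np.log(q @ np.exp(TX @ lam))  # objective of (D)
    lam_bar = minimize(dual, np.zeros(TX.shape[1])).x  # unconstrained convex problem
    u = q * np.exp(TX @ lam_bar)
    return u / u.sum()                             # formula (2.5)
```

For instance, with x, w from the trapezoidal sketch above, f the standard normal density, T = lambda t: np.array([t, t**2]), and Tbar = np.array([0.0, 1.0]), the returned probabilities match the first two moments of N(0, 1) up to the optimizer's tolerance.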
3. Error bound and convergence property. In this section we give the theoretical error estimate of the approximation method introduced in Section 2.
Let $g : \mathbb{R}^K \to \mathbb{R}$ be a bounded continuous function. Under appropriate assumptions, we first estimate the error

(3.1)  $E_{g,M} = \left| \int_{\mathbb{R}^K} f(x) g(x)\,dx - \sum_{i=1}^M p(x_{i,M}) g(x_{i,M}) \right|$,

where the p(xi,M)'s are determined by (2.5). Next, we show the weak convergence of PM to f, i.e., $E_{g,M} \to 0$ (M → ∞) for any g. Throughout this paper, ⟨ · , · ⟩ and ∥ · ∥ denote the usual inner product and the Euclidean norm of $\mathbb{R}^L$, respectively.
Since f(x) is a probability density function, the moment condition (2.1) is equivalent to

$\int_{\mathbb{R}^K} f(x)\left(T(x) - \bar{T}\right) dx = 0$.

Hence by redefining T(x) − T̄ as T(x), without loss of generality we may assume T̄ = 0. We keep this convention throughout the remainder of this section.
We consider the error estimate and the convergence analysis under the following two assumptions. The first assumption states that the moment defining function T has no degenerate components and that the moment T̄ = 0 can also be expressed as an expectation on the discrete set DM.

Assumption 1. The components of the moment defining function T are affinely independent as functions both on $\mathbb{R}^K \cap \operatorname{supp} f$ and on $D_M \cap \operatorname{supp} f$ for any positive integer M. Namely, for any $0 \ne (\lambda, \mu) \in \mathbb{R}^L \times \mathbb{R}$, there exists $x_{i,M} \in D_M$ such that $\langle \lambda, T(x_{i,M}) \rangle + \mu \ne 0$. Furthermore, $0 \in \operatorname{int}(\operatorname{co} T(D_M))$ for any positive integer M.
The second assumption concerns the convergence property of the initial integration formula (2.3).

Assumption 2. The integration formula converges: for any bounded continuous function g on $\mathbb{R}^K$, we have

(3.2)  $\lim_{M \to \infty} \sum_{i=1}^M w_{i,M} f(x_{i,M}) g(x_{i,M}) = \int_{\mathbb{R}^K} f(x) g(x)\,dx$.

Furthermore, the integration formula applies to ∥T(x)∥ as well:

(3.3)  $\lim_{M \to \infty} \sum_{i=1}^M w_{i,M} f(x_{i,M}) \|T(x_{i,M})\| = \int_{\mathbb{R}^K} f(x) \|T(x)\|\,dx < \infty$.
Since (3.2) merely states that the integration formula converges to the true value, (3.3) is the only essential assumption. Note that since the moments $\bar{T} = \int f(x) T(x)\,dx$ exist to begin with, and the Lebesgue integral $\int f(x) T(x)\,dx$ exists if and only if the integral $\int f(x) \|T(x)\|\,dx$ does, the finiteness assumption in (3.3) is not an additional restriction.
We start with the following estimate of the error $E_{g,M}$, obtained by the triangle inequality:

(3.4)  $E_{g,M} \le E^{(a)}_{g,M} + E^{(b)}_{g,M} + E^{(c)}_{g,M}$,

where

(3.5)  $E^{(a)}_{g,M} = \left| \int_{\mathbb{R}^K} f(x) g(x)\,dx - \sum_{i=1}^M w_{i,M} f(x_{i,M}) g(x_{i,M}) \right|$,

(3.6)  $E^{(b)}_{g,M} = \left| \sum_{i=1}^M w_{i,M} f(x_{i,M}) g(x_{i,M}) - \sum_{i=1}^M w_{i,M} f(x_{i,M}) e^{\langle \bar{\lambda}_M, T(x_{i,M}) \rangle} g(x_{i,M}) \right|$,

(3.7)  $E^{(c)}_{g,M} = \left| \sum_{i=1}^M w_{i,M} f(x_{i,M}) e^{\langle \bar{\lambda}_M, T(x_{i,M}) \rangle} g(x_{i,M}) - \sum_{i=1}^M p(x_{i,M}) g(x_{i,M}) \right|$.

$E^{(a)}_{g,M}$ is the error of the integration formula for the integrand fg, which depends on the choice of the formula and is assumed to converge to 0 as M → ∞ by Assumption 2. Hence we focus on the other errors $E^{(b)}_{g,M}$ and $E^{(c)}_{g,M}$.
To bound these two errors, we use the errors

(3.8)  $E^{(a)}_{1,M} = \left| \int_{\mathbb{R}^K} f(x)\,dx - \sum_{i=1}^M w_{i,M} f(x_{i,M}) \right| = \left| 1 - \sum_{i=1}^M w_{i,M} f(x_{i,M}) \right|$,

(3.9)  $E^{(a)}_{\|T\|,M} = \left| \int_{\mathbb{R}^K} f(x) \|T(x)\|\,dx - \sum_{i=1}^M w_{i,M} f(x_{i,M}) \|T(x_{i,M})\| \right|$,

which are the values of $E^{(a)}_{g,M}$ in (3.5) for the special cases g = 1 and g = ∥T(·)∥, respectively. The integral

(3.10)  $I_{\|T\|} = \int_{\mathbb{R}^K} f(x) \|T(x)\|\,dx$

is finite and the error $E^{(a)}_{\|T\|,M}$ tends to 0 as M → ∞ by (3.3). The errors $E^{(b)}_{g,M}$ and $E^{(c)}_{g,M}$ are bounded as shown in the following lemmas, which we prove in Section 5. For a bounded function $g : \mathbb{R}^K \to \mathbb{R}$, let $\|g\|_\infty = \sup_{x \in \mathbb{R}^K} |g(x)|$ be the sup norm.
Lemma 3. Let Assumptions 1 and 2 be satisfied. Then, for any bounded continuous function g, we have

(3.11)  $E^{(b)}_{g,M} \le 3 \|g\|_\infty \left( I_{\|T\|} + E^{(a)}_{\|T\|,M} \right) \|\bar{\lambda}_M\|$.

Lemma 4. Let Assumptions 1 and 2 be satisfied. Then, for any bounded continuous function g, we have

(3.12)  $E^{(c)}_{g,M} \le \|g\|_\infty \left( E^{(a)}_{1,M} + 3 \left( I_{\|T\|} + E^{(a)}_{\|T\|,M} \right) \|\bar{\lambda}_M\| \right)$.
According to these lemmas, we need to estimate the solution $\bar{\lambda}_M$ of (D) in order to complete the estimate of $E_{g,M}$. To obtain an estimate of $\bar{\lambda}_M$, we additionally use an error $E^{(a)}_{T,M}$ defined by

(3.13)  $E^{(a)}_{T,M} = \left\| \int_{\mathbb{R}^K} f(x) T(x)\,dx - \sum_{i=1}^M w_{i,M} f(x_{i,M}) T(x_{i,M}) \right\|$

and a constant $C_\alpha$ defined by

(3.14)  $C_\alpha = \inf_{\lambda \in \mathbb{R}^L,\, \|\lambda\| = 1} \frac{1}{2} \int_{\mathbb{R}^K} f(x) \left( \max\{0, \min\{\langle \lambda, T(x) \rangle, \alpha\}\} \right)^2 dx$.

Note that by Assumption 1 we have $C_\alpha > 0$ for large enough α. Furthermore, since $|T_l(x)| \le \|T(x)\|$ for every component l = 1, 2, . . . , L, by (3.3) and the Dominated Convergence Theorem the error $E^{(a)}_{T,M}$ tends to 0 as M → ∞.
The following lemma gives an estimate of $\bar{\lambda}_M$, which we also prove in Section 5.

Lemma 5. Let Assumptions 1 and 2 be satisfied and α > 0 be large enough that $C_\alpha > 0$. Then, for any ε with $0 < \varepsilon < C_\alpha$, there exists a positive integer $M_\varepsilon$ such that for any M with M ≥ $M_\varepsilon$, we have

(3.15)  $\|\bar{\lambda}_M\| \le \frac{1}{C_\alpha - \varepsilon} E^{(a)}_{T,M}$.
Combining Lemmas 3, 4, and 5, we immediately obtain the following theorem for the estimate of $E_{g,M}$.

Theorem 6. Let Assumptions 1 and 2 be satisfied, g be a bounded continuous function, and α > 0 be large enough that $C_\alpha > 0$. Then, for any ε with $0 < \varepsilon < C_\alpha$, there exists a positive integer $M_\varepsilon$ such that for any M with M ≥ $M_\varepsilon$, we have

(3.16)  $E_{g,M} \le E^{(a)}_{g,M} + \|g\|_\infty \left( E^{(a)}_{1,M} + \frac{6}{C_\alpha - \varepsilon} \left( I_{\|T\|} + E^{(a)}_{\|T\|,M} \right) E^{(a)}_{T,M} \right)$.
Proof. Combining (3.4), (3.11), (3.12), and (3.15), we obtain the estimate

$E_{g,M} \le E^{(a)}_{g,M} + E^{(b)}_{g,M} + E^{(c)}_{g,M} \le E^{(a)}_{g,M} + \|g\|_\infty \left( E^{(a)}_{1,M} + 6 \left( I_{\|T\|} + E^{(a)}_{\|T\|,M} \right) \|\bar{\lambda}_M\| \right) \le E^{(a)}_{g,M} + \|g\|_\infty \left( E^{(a)}_{1,M} + \frac{6}{C_\alpha - \varepsilon} \left( I_{\|T\|} + E^{(a)}_{\|T\|,M} \right) E^{(a)}_{T,M} \right)$,

which is (3.16).
Note that $E_{g,M}$ is bounded by a formula consisting of $E^{(a)}_{g,M}$, $E^{(a)}_{1,M}$, $E^{(a)}_{T,M}$, and $E^{(a)}_{\|T\|,M}$, which are the errors of the integration formula for the given functions g, 1, T, and ∥T∥. Since all of them tend to zero as M → ∞, it follows from (3.16) that

(3.17)  $E_{g,M} = O\left( \max\left\{ E^{(a)}_{g,M}, E^{(a)}_{1,M}, E^{(a)}_{T,M} \right\} \right) \quad (M \to \infty)$.

The equality (3.17) shows that the error $E_{g,M}$ is at most of the same order as the error of the initial integration formula. Thus our method does not compromise the order of the error at the expense of matching moments.
Using Theorem 6, we can prove our main result, the weak convergence^4 of the approximating discrete distribution PM = {p(xi,M)} to f, as shown in the following theorem.

Theorem 7. Let Assumptions 1 and 2 be satisfied. Then, for any bounded continuous function g, we have

(3.18)  $\lim_{M \to \infty} \sum_{i=1}^M p(x_{i,M}) g(x_{i,M}) = \int_{\mathbb{R}^K} f(x) g(x)\,dx$,

i.e., the discrete distribution PM weakly converges to the exact continuous distribution f.
Proof. It follows from the definition of the error $E_{g,M}$ in (3.1), Theorem 6, and Assumptions 1 and 2 that

$\left| \int_{\mathbb{R}^K} f(x) g(x)\,dx - \sum_{i=1}^M p(x_{i,M}) g(x_{i,M}) \right| = E_{g,M} = O\left( \max\left\{ E^{(a)}_{g,M}, E^{(a)}_{1,M}, E^{(a)}_{T,M} \right\} \right) \to 0$

as M → ∞, which is (3.18).
Theorem 6 shows that the order of the theoretical error of our approximation method is at most that of the initial integration formula, but it does not say that the error actually improves. Next we show that our method improves the accuracy of the integration in some appropriate situations.

For simplicity, we consider the following fundamental case. Let [a, b] ⊂ R be a real interval, f : [a, b] → R be a density function on [a, b], and g : [a, b] → R be a bounded continuous function. We match the polynomial moments, so the moment defining function $T : [a, b] \to \mathbb{R}^L$ is $T(x) = (1, x, \ldots, x^{L-1})'$.

4 For readers unfamiliar with probability theory, a sequence of probability measures $\{\mu_n\}$ is said to weakly converge to µ if $\lim_{n \to \infty} \int g\,d\mu_n = \int g\,d\mu$ for every bounded continuous function g (in which case the equation holds for any bounded measurable function g as well). In particular, by choosing g as an indicator function, we have $\mu_n(B) \to \mu(B)$ for any Borel set B, so the probability distribution $\mu_n$ approximates µ.
Theorem 8. Let

$E_{M,\alpha,\varepsilon} = E^{(a)}_{1,M} + \frac{6}{C_\alpha - \varepsilon} \left( I_{\|T\|} + E^{(a)}_{\|T\|,M} \right) E^{(a)}_{T,M}$

be the error term in Theorem 6. If $g \in C^L[a, b]$, then

(3.19)  $E_{g,M} \le \frac{(b-a)^L}{L!} \left( E^{(a)}_{\tilde{g}_L, M} + \|g^{(L)}\|_\infty E_{M,\alpha,\varepsilon} \right)$,

where

(3.20)  $\tilde{g}_L(x) = \frac{L!}{(b-a)^L} \left( g(x) - \sum_{l=0}^{L-1} \frac{g^{(l)}(a)}{l!} (x-a)^l \right)$

is a bounded continuous function with $\|\tilde{g}_L\|_\infty \le \|g^{(L)}\|_\infty$.
Proof. Since $g \in C^L[a, b]$, we can consider the Taylor polynomial of g(x) at x = a:

$q_{L-1}(x) = \sum_{l=0}^{L-1} \frac{g^{(l)}(a)}{l!} (x-a)^l$.

Estimating the remainder term of the Taylor series, we have

$\|g - q_{L-1}\|_\infty \le \frac{(b-a)^L}{L!} \|g^{(L)}\|_\infty$.

Dividing both sides by $(b-a)^L / L!$, we get $\|\tilde{g}_L\|_\infty \le \|g^{(L)}\|_\infty$. Noting that

$\int_a^b f(x) q_{L-1}(x)\,dx = \sum_{i=1}^M p(x_{i,M}) q_{L-1}(x_{i,M})$

because moments of order up to L − 1 are exact, by Theorem 6 we obtain

$E_{g,M} = E_{g - q_{L-1}, M} \le E^{(a)}_{g - q_{L-1}, M} + \|g - q_{L-1}\|_\infty E_{M,\alpha,\varepsilon} \le \frac{(b-a)^L}{L!} E^{(a)}_{\tilde{g}_L, M} + \frac{(b-a)^L}{L!} \|g^{(L)}\|_\infty E_{M,\alpha,\varepsilon}$,

which is the conclusion.
Theorem 8 shows that our approximation method improves the error by the factor $(b-a)^L / L!$ when the integrand is sufficiently smooth.
4. Numerical experiments. In this section, we present some numerical examples that compare the accuracy of the approximate expectations computed by an
initial integration formula and its modifications by our proposed method. All computations in this section are done by MATLAB programs with double precision floating
point arithmetic on a PC.
4.1. Gaussian and beta distributions. We choose the Gaussian distribution
N (0, 1) and the beta distribution Be(2, 4) as examples of continuous distributions and
adopt the trapezoidal formula as an initial integration formula. In the following, let
M be an odd integer with M = 2N + 1 (N = 1, 2, . . . , 12).
For the Gaussian distribution, we set

$f_1(x) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{x^2}{2} \right) \quad (x \in (-\infty, \infty))$,

$x^{(1)}_{i,M} = (i - \lceil M/2 \rceil)\, h^{(1)}_M \quad (i = 1, \ldots, M), \qquad w^{(1)}_{i,M} = \begin{cases} h^{(1)}_M & (i \ne 1, M) \\ h^{(1)}_M / 2 & (i = 1, M), \end{cases}$

where $h^{(1)}_M = 1/\sqrt{N}$ for N = (M − 1)/2. This means that we let

$D = \left\{ n h^{(1)}_M \mid n = 0, \pm 1, \ldots, \pm N \right\}$

and approximate the integral $\int_{-\infty}^{\infty}$ by $\int_{-\sqrt{N}}^{\sqrt{N}}$. For the test function, we pick

(4.1)  $g_1(x) = \begin{cases} e^x & (|x| \le 10) \\ 0 & (|x| > 10). \end{cases}$

For X ∼ N(0, 1), the exact expectation is

(4.2)  $\mathrm{E}[g_1(X)] = \frac{1}{2} e^{1/2} \left\{ \operatorname{erf}\left( \frac{9}{\sqrt{2}} \right) + \operatorname{erf}\left( \frac{11}{\sqrt{2}} \right) \right\}$,

where

(4.3)  $\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x \exp(-t^2)\,dt$

is the error function. For the beta distribution, we set

$f_2(x) = x(1-x)^3 / B(2, 4) \quad (x \in [0, 1])$,

$x^{(2)}_{i,M} = (i - 1)\, h^{(2)}_M \quad (i = 1, \ldots, M), \qquad w^{(2)}_{i,M} = \begin{cases} h^{(2)}_M & (i \ne 1, M) \\ h^{(2)}_M / 2 & (i = 1, M), \end{cases}$

where B( · , · ) is the beta function and $h^{(2)}_M = 1/(M - 1)$. For the test function, we pick $g_2(x) = e^x$ for x ∈ [0, 1]. For X ∼ Be(2, 4), the exact expectation is

$\mathrm{E}[g_2(X)] = 20(49 - 18e)$.
For numerical experiments, we compute each of
• E[g1(X)] for X ∼ N(0, 1), and
• E[g2(X)] for X ∼ Be(2, 4)
using five formulas: the trapezoidal formula

(4.4)  $\mathrm{E}[g_k(X)] \approx \sum_{i=1}^M w^{(k)}_{i,M} f_k(x^{(k)}_{i,M})\, g_k(x^{(k)}_{i,M}) \quad (k = 1, 2)$,

its modifications by our proposed method with exact polynomial moments E[X^l] up to 2nd order (l = 1, 2), 4th order (l = 1, . . . , 4), and 6th order (l = 1, . . . , 6), and Simpson's formula with the number of grid points M = 2N + 1 (N = 1, 2, . . . , 12). Here, we intend to observe the relative errors of the computed values for small M's.
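As a hedged illustration of the Gaussian baseline (our Python sketch, not the paper's MATLAB code; all variable names are ours), the trapezoidal estimate (4.4) and the exact value (4.2) can be compared as follows.

```python
import numpy as np
from math import erf, exp, sqrt, pi

N = 4; M = 2 * N + 1; h = 1 / sqrt(N)            # grid size h_M^(1) = 1/sqrt(N)
x = h * np.arange(-N, N + 1)                     # D = {nh | n = 0, ±1, ..., ±N}
w = np.full(M, h); w[0] = w[-1] = h / 2          # trapezoidal weights
f1 = np.exp(-x**2 / 2) / sqrt(2 * pi)            # standard normal density
g1 = np.where(np.abs(x) <= 10, np.exp(x), 0.0)   # test function (4.1)
approx = np.sum(w * f1 * g1)                     # trapezoidal estimate (4.4)
exact = 0.5 * exp(0.5) * (erf(9 / sqrt(2)) + erf(11 / sqrt(2)))  # (4.2)
print(abs(approx - exact) / exact)               # relative error
```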
We numerically solve the dual problem (D) as follows. First, in order for (P) to have a solution, it is necessary that there are at least as many unknown variables (the p(xi,M)'s, so M in total) as the number of constraints (L moment constraints plus one for the probabilities to add up to 1, so L + 1). Thus we need M ≥ L + 1.^5 A sufficient condition for the existence of a solution is T̄ ∈ co T(D) (Theorem 1), which we can easily verify in the current application.

Second, note that (D) is equivalent to the minimization of

(4.5)  $J_M(\lambda) = \sum_{i=1}^M w_{i,M} f(x_{i,M})\, e^{\langle \lambda, T(x_{i,M}) - \bar{T} \rangle}$,

which is a strictly convex function of λ. In order to minimize $J_M$, we apply a variant of the Newton-Raphson algorithm. Starting from $\lambda_0 = 0$, we iterate

(4.6)  $\lambda_{n+1} = \lambda_n - \left[ \kappa I + J''_M(\lambda_n) \right]^{-1} J'_M(\lambda_n)$

over n = 0, 1, . . . , where κ > 0 is a small number, I is the L-dimensional identity matrix, and $J'_M$, $J''_M$ denote the gradient and the Hessian matrix of $J_M$. Such an algorithm is advocated in [26]. The Newton-Raphson algorithm corresponds to setting κ = 0 in (4.6). Since the Hessian $J''_M$ is often nearly singular, the presence of κ > 0 stabilizes the iteration (4.6). Below we set κ = 10^{-7} and terminate the iteration (4.6) when $\|\lambda_{n+1} - \lambda_n\| < 10^{-10}$.
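The iteration (4.6) is straightforward to implement; the following Python sketch is ours (the paper uses MATLAB), with hypothetical names and no safeguards beyond the κ-regularization described above.

```python
import numpy as np

def solve_dual_newton(q, TX, Tbar, kappa=1e-7, tol=1e-10, max_iter=200):
    """Regularized Newton iteration (4.6) for minimizing J_M in (4.5).

    q:    (M,) array of w_{i,M} f(x_{i,M})
    TX:   (M, L) array whose i-th row is T(x_{i,M})
    Tbar: (L,) array of target moments
    """
    S = TX - Tbar                        # rows T(x_i) - Tbar
    lam = np.zeros(S.shape[1])           # lambda_0 = 0
    I = np.eye(S.shape[1])
    for _ in range(max_iter):
        u = q * np.exp(S @ lam)          # the M terms of J_M(lambda)
        grad = S.T @ u                   # gradient J'_M(lambda)
        hess = (S * u[:, None]).T @ S    # Hessian J''_M(lambda)
        lam_new = lam - np.linalg.solve(kappa * I + hess, grad)
        if np.linalg.norm(lam_new - lam) < tol:
            return lam_new
        lam = lam_new
    return lam

# The fine-tuned probabilities of (2.5) are then p = q * np.exp(TX @ lam); p /= p.sum().
```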
The results for E[g1(X)] and E[g2(X)] are shown in Figures 4.1(a) and 4.1(b), respectively.

[Figure 4.1 about here: two panels, (a) X ∼ N(0, 1) and (b) X ∼ Be(2, 4), plotting the relative error (log scale) against the number of grid points M.]

Fig. 4.1. Relative errors of the computed values of E[g(X)], where $g(x) = e^x 1_{[-10,10]}(x)$ for X ∼ N(0, 1) and $g(x) = e^x$ for X ∼ Be(2, 4). The legends "Trapezoidal" and "Simpson" represent the relative errors by the trapezoidal and Simpson's formulas, and "2nd order", "4th order", and "6th order" represent those by our method with exact polynomial moments E[X^l] up to 2nd order (l = 1, 2), 4th order (l = 1, . . . , 4), and 6th order (l = 1, . . . , 6), respectively.

5 Since the beta density is zero at x = 0, 1, which are included in the $x^{(2)}_{i,M}$'s, we necessarily have $p(x^{(2)}_{i,M}) = 0$ for i = 1, M. Thus, the number of unknown variables is M − 2, so we need M − 2 ≥ L + 1 ⟺ M ≥ L + 3 in the case of the beta distribution.
Both results for E[g1(X)] and E[g2(X)] show that our proposed method outperforms the trapezoidal and Simpson's formulas in accuracy. The errors basically decrease as the order of the matched moments increases, consistent with Theorem 8.
The reason why our proposed method does not necessarily give accurate results for very small M is that when the number of constraints L + 1 is large relative to the number of unknown variables M, the method generates pathological probability distributions at the expense of matching many moments. For instance, Figures 4.2(a) and 4.2(b) show the discrete distributions that match polynomial moments of order up to 6 to those of the Gaussian distribution N(0, 1) and the beta distribution Be(2, 4) with M = 9 grid points. Clearly the discrete approximations do not resemble the continuous counterparts. This pathological behavior is rarely an issue, however. As long as there are twice as many grid points as constraints (M ≥ 2(L + 1)), the discrete approximation is well-behaved.
[Figure 4.2 about here: two panels, (a) N(0, 1) and (b) Be(2, 4), plotting the discrete probabilities against x.]

Fig. 4.2. 6th order discrete approximation with M = 9 grid points.
4.2. Uniform distribution. Theorem 7 can be used to obtain an integration formula by setting the density f(x) to be uniform on the unit interval [0, 1]. As before we start from the trapezoidal formula and match polynomial moments of order up to 6. The test functions are $g_1(x) = x^{3/2}$, $g_2(x) = x^{1/2}$, $g_3(x) = \frac{1}{1+x}$, and $g_4(x) = \sin(\pi x)$.

Figure 4.3 shows the numerical results. In all cases, our method outperforms the trapezoidal formula. When the integrand is smooth, as in the cases of g3 and g4, the improvement in accuracy is significant (of the order 10^{-4}), consistent with Theorem 8. For $g_1(x) = x^{3/2}$, which has a bounded first derivative but an unbounded second derivative, the improvement is more modest, and even more so for $g_2(x) = x^{1/2}$, which has an unbounded first derivative. For these non-smooth cases, our method is not necessarily more accurate than Simpson's rule when the number of integration points is large. However, note that we chose the trapezoidal rule as the initial integration formula in Assumption 2; when we start from Simpson's formula, our method outperforms Simpson's rule (not shown in the figure).
According to Figures 4.1 and 4.3, the rate of convergence (the slope of the relative error with respect to the number of grid points M) seems to be the same for the initial integration formula and our method. This observation is consistent with Theorem 8, which shows that the error estimate is of the same order as the integration formula but improves by the factor $(b-a)^L / L!$. Therefore our method seems to be particularly suited for "fine-tuning" an integration formula with a small number of integration points. For instance, we can construct a highly accurate compound rule by subdividing the interval and applying our method to each subinterval; a sketch of this idea follows.
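One possible realization of this compound-rule idea is the following Python sketch (ours, not the paper's; it reuses the hypothetical helper maxent_weights from the sketch in Section 2 and assumes the moment matching succeeds on each subinterval).

```python
import numpy as np

def compound_rule(g, subintervals, M, L):
    """Approximate the integral of g by fine-tuning M trapezoidal weights for the
    uniform density on each subinterval so that moments up to order L - 1 are exact."""
    total = 0.0
    for a, b in subintervals:
        x = np.linspace(a, b, M)                    # M equally spaced nodes on [a, b]
        w = np.full(M, (b - a) / (M - 1))
        w[0] = w[-1] = (b - a) / (2 * (M - 1))      # trapezoidal weights
        f = lambda t: np.ones_like(t) / (b - a)     # uniform density on [a, b]
        T = lambda t: np.array([t**l for l in range(1, L)])
        Tbar = np.array([(b**(l + 1) - a**(l + 1)) / ((l + 1) * (b - a))
                         for l in range(1, L)])     # exact uniform moments E[X^l]
        p = maxent_weights(x, w, f, T, Tbar)        # hypothetical helper (Section 2 sketch)
        total += (b - a) * np.sum(p * g(x))         # (b - a) * E_uniform[g] = integral over [a, b]
    return total

# Example: compound_rule(np.sin, [(0, 0.5), (0.5, 1)], M=9, L=5) approximates 1 - cos(1).
```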
[Figure 4.3 about here: four panels, (a) $g_1(x) = x^{3/2}$, (b) $g_2(x) = x^{1/2}$, (c) $g_3(x) = 1/(1+x)$, (d) $g_4(x) = \sin(\pi x)$, plotting the relative error (log scale) against the number of grid points M.]

Fig. 4.3. Relative errors of the computed values of $\int_0^1 g(x)\,dx$. The legends "Trapezoidal" and "Simpson" represent the relative errors by the trapezoidal and Simpson's formulas, and "2nd order", "4th order", and "6th order" represent those by our method with exact polynomial moments E[X^l] up to 2nd order (l = 1, 2), 4th order (l = 1, . . . , 4), and 6th order (l = 1, . . . , 6), respectively.
4.3. Optimal portfolio problem. In this section we numerically solve the optimal portfolio problem briefly discussed in the introduction (see [34] for more details). Suppose that there are two assets, stock and bond, with gross returns R1, R2. Asset 1 (stock) is stochastic and lognormally distributed: log R1 ∼ N(µ, σ²), where µ is the expected return and σ is the volatility. Asset 2 (bond) is risk-free and log R2 = r, where r is the (continuously compounded) interest rate. The optimal portfolio θ is determined by the optimization

(4.7)  $U = \max_\theta \frac{1}{1 - \gamma} \mathrm{E}\left[ (R_1 \theta + R_2 (1 - \theta))^{1-\gamma} \right]$,

where γ > 0 is the relative risk aversion coefficient. We set the parameters as γ = 3, µ = 0.07, σ = 0.2, and r = 0.01. We numerically solve the optimal portfolio problem (4.7) in two ways, applying the trapezoidal formula and our proposed method. (We also tried Simpson's formula, but the results were similar to the trapezoidal formula's.) To approximate the lognormal distribution, let M = 2N + 1 be the number of grid points (N is the number of positive grid points) and D = {nh | n = 0, ±1, . . . , ±N}, where $h = 1/\sqrt{N}$ is the grid size. Let p(x) be the approximating discrete distribution of N(0, 1) as in the previous subsection (trapezoidal or the proposed method with various moments). Then we put probability p(x) on the point $e^{\mu + \sigma x}$ for each x ∈ D to obtain the approximate stock return R1.
Table 4.1 shows the optimal portfolio θ and its relative error for various maximum moment orders L and numbers of points M = 2N + 1. The result is somewhat surprising. Even with 3 approximating points (N = 1), our proposed method derives an optimal portfolio that is off by only 0.5% from the true value, whereas the trapezoidal method is off by 127%. While the proposed method virtually attains the true value with 9 points (N = 4, especially when the 4th moment is matched), the trapezoidal method still has 23% error.
Table 4.1
Optimal portfolio and relative error for the trapezoidal method and our method. N: number of positive grid points, M = 2N + 1: total number of grid points, L: maximum order of moments.

# of grid points      L = 0 (trapezoidal)      L = 2                   L = 4
N         M           θ        Error (%)       θ        Error (%)      θ        Error (%)
1² = 1    3           1.5155   127             0.6717   0.54           –        –
2² = 4    9           0.8246   23.4            0.6694   0.20           0.6680   −0.015
3² = 9    19          0.6830   2.24            0.6684   0.044          0.6681   0
4² = 16   33          0.6687   0.088           0.6682   0.015          0.6681   0
5² = 25   51          0.6681   0               0.6681   0              0.6681   0
The reason why the trapezoidal method gives poor results when the number of approximating points is small is that the moments are not matched. To see this, taking the first-order condition for the optimal portfolio problem (4.7), we obtain $\mathrm{E}[(\theta X + R_2)^{-\gamma} X] = 0$, where $X = R_1 - R_2$ is the excess return on the stock. Using the Taylor approximation

$x^{-\gamma} \approx a^{-\gamma} - \gamma a^{-\gamma - 1} (x - a)$

for $x = \theta X + R_2$ and $a = \mathrm{E}[\theta X + R_2]$ and solving for θ, after some algebra we get

$\theta = \frac{R_2\, \mathrm{E}[X]}{\gamma \operatorname{Var}[X] - \mathrm{E}[X]^2}$.

Therefore the (approximate) optimal portfolio depends on the first and second moments of the excess return X. Our method is accurate precisely because we match the moments. In complex economic problems, oftentimes we cannot afford to use many integration points, in which case our method might be useful to obtain accurate results.
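For completeness, here is a hedged Python sketch of this experiment (ours, not the paper's MATLAB code; the names and the bracketing interval for θ are our assumptions). The fine-tuning step of (2.5) can be inserted where indicated to reproduce the "L = 2" and "L = 4" columns of Table 4.1.

```python
import numpy as np
from scipy.optimize import minimize_scalar

gamma, mu, sigma, r = 3.0, 0.07, 0.2, 0.01       # parameters of Section 4.3

N = 4; M = 2 * N + 1; h = 1 / np.sqrt(N)
x = h * np.arange(-N, N + 1)                     # D = {nh | n = 0, ±1, ..., ±N}
w = np.full(M, h); w[0] = w[-1] = h / 2          # trapezoidal weights
p = w * np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # discrete approximation of N(0, 1)
p /= p.sum()                                     # (fine-tune p here via (2.5) for our method)

R1 = np.exp(mu + sigma * x)                      # lognormal stock return on D
R2 = np.exp(r)                                   # risk-free gross return

def neg_utility(theta):
    """Minus the objective of (4.7) under the discrete distribution p."""
    portfolio = theta * R1 + (1 - theta) * R2
    return -np.sum(p * portfolio**(1 - gamma)) / (1 - gamma)

theta_star = minimize_scalar(neg_utility, bounds=(0.0, 2.0), method="bounded").x
print(theta_star)                                # compare with Table 4.1
```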
5. Proofs.

5.1. Proof of Lemma 3. Since by assumption T̄ = 0, the objective function of the problem (D) becomes

(5.1)  $\log\left( \sum_{i=1}^M w_{i,M} f(x_{i,M})\, e^{\langle \lambda, T(x_{i,M}) \rangle} \right)$.

Hence the problem (D) is equivalent to the minimization of

(5.2)  $J_M(\lambda) = \sum_{i=1}^M w_{i,M} f(x_{i,M})\, e^{\langle \lambda, T(x_{i,M}) \rangle}$.

It follows from Assumption 1 and Theorem 2 that the function $J_M$ is strictly convex and has the unique minimizer $\bar{\lambda}_M$. We use the function $J_M$ in (5.2) throughout the rest of this section.
It suffices to consider the case $\bar{\lambda}_M \ne 0$. Using the definition of the error $E^{(b)}_{g,M}$ in (3.6), we first note the estimate

(5.3)  $E^{(b)}_{g,M} \le \|g\|_\infty E^{(b')}_M$,

where

(5.4)  $E^{(b')}_M = \sum_{i=1}^M w_{i,M} f(x_{i,M}) \left| e^{\langle \bar{\lambda}_M, T(x_{i,M}) \rangle} - 1 \right|$.

In the following, we give an estimate of $E^{(b')}_M$. By the mean value theorem, for each i there exists $s^*_{i,M}$ with $0 < s^*_{i,M} < 1$ such that

(5.5)  $e^{\langle \bar{\lambda}_M, T(x_{i,M}) \rangle} - 1 = \langle \bar{\lambda}_M, T(x_{i,M}) \rangle\, e^{\langle \bar{\lambda}_M, T(x_{i,M}) \rangle s^*_{i,M}}$.

To simplify the notation, let

$a_{i,M} = w_{i,M} f(x_{i,M}) \langle \bar{\lambda}_M, T(x_{i,M}) \rangle$,
$I^+ = \left\{ i \mid \langle \bar{\lambda}_M, T(x_{i,M}) \rangle \ge 0 \right\}$,
$I^- = \left\{ i \mid \langle \bar{\lambda}_M, T(x_{i,M}) \rangle < 0 \right\}$.
Using $a_{i,M}$ and $s^{\max}_M = \max_{1 \le i \le M} s^*_{i,M}$, it follows from (5.4) and (5.5) that

(5.6)  $E^{(b')}_M = \sum_{i \in I^+} a_{i,M}\, e^{\langle \bar{\lambda}_M, T(x_{i,M}) \rangle s^*_{i,M}} - \sum_{i \in I^-} a_{i,M}\, e^{\langle \bar{\lambda}_M, T(x_{i,M}) \rangle s^*_{i,M}}$
$\le \sum_{i \in I^+} a_{i,M}\, e^{\langle \bar{\lambda}_M, T(x_{i,M}) \rangle s^{\max}_M} + \sum_{i \in I^-} |a_{i,M}|$
$= \sum_{i=1}^M a_{i,M}\, e^{\langle \bar{\lambda}_M, T(x_{i,M}) \rangle s^{\max}_M} + \sum_{i \in I^-} \left( |a_{i,M}| - a_{i,M}\, e^{\langle \bar{\lambda}_M, T(x_{i,M}) \rangle s^{\max}_M} \right)$
$\le \sum_{i=1}^M a_{i,M}\, e^{\langle \bar{\lambda}_M, T(x_{i,M}) \rangle s^{\max}_M} + 2 \sum_{i \in I^-} |a_{i,M}|$.
Noting the definition of $J_M$ in (5.2), we can rewrite the first term of the rightmost side of (5.6) as

(5.7)  $\sum_{i=1}^M a_{i,M}\, e^{\langle \bar{\lambda}_M, T(x_{i,M}) \rangle s^{\max}_M} = \sum_{i=1}^M w_{i,M} f(x_{i,M}) \langle \bar{\lambda}_M, T(x_{i,M}) \rangle e^{\langle \bar{\lambda}_M, T(x_{i,M}) \rangle s^{\max}_M} = \left. \frac{d}{ds} J_M(s \bar{\lambda}_M) \right|_{s = s^{\max}_M}$.
By Assumption 1 and $\bar{\lambda}_M \ne 0$, the univariate function $s \mapsto J_M(s \bar{\lambda}_M)$ is strictly convex and its minimizer is s = 1. Therefore, the derivative of $J_M(\,\cdot\,\bar{\lambda}_M)$ is negative and monotonically increasing on the interval [0, 1). Noting that $0 < s^{\max}_M < 1$, we have

(5.8)  $0 > \left. \frac{d}{ds} J_M(s \bar{\lambda}_M) \right|_{s = s^{\max}_M} > \left. \frac{d}{ds} J_M(s \bar{\lambda}_M) \right|_{s = 0} = \sum_{i=1}^M w_{i,M} f(x_{i,M}) \langle \bar{\lambda}_M, T(x_{i,M}) \rangle$.
Combining (5.7) and (5.8), by the Cauchy-Schwarz inequality we obtain

(5.9)  $\left| \sum_{i=1}^M a_{i,M}\, e^{\langle \bar{\lambda}_M, T(x_{i,M}) \rangle s^{\max}_M} \right| \le \|\bar{\lambda}_M\| \sum_{i=1}^M w_{i,M} f(x_{i,M}) \|T(x_{i,M})\|$.
As for the second term of the rightmost side of (5.6), we have

(5.10)  $\sum_{i \in I^-} |a_{i,M}| \le \sum_{i=1}^M |a_{i,M}| = \sum_{i=1}^M w_{i,M} f(x_{i,M}) \left| \langle \bar{\lambda}_M, T(x_{i,M}) \rangle \right| \le \|\bar{\lambda}_M\| \sum_{i=1}^M w_{i,M} f(x_{i,M}) \|T(x_{i,M})\|$.
Consequently, it follows from (5.3), (5.6), (5.9), and (5.10) that

(5.11)  $E^{(b)}_{g,M} \le 3 \|g\|_\infty \|\bar{\lambda}_M\| \sum_{i=1}^M w_{i,M} f(x_{i,M}) \|T(x_{i,M})\| \le 3 \|g\|_\infty \left( I_{\|T\|} + E^{(a)}_{\|T\|,M} \right) \|\bar{\lambda}_M\|$,

which is the conclusion.
5.2. Proof of Lemma 4. Noting the definition of the p(xi,M)'s in (2.5), $w_{i,M} \ge 0$, and $f(x) \ge 0$, and letting $G = \|g\|_\infty$, we obtain

$E^{(c)}_{g,M} \le \left| 1 - \frac{1}{\sum_{i=1}^M w_{i,M} f(x_{i,M})\, e^{\langle \bar{\lambda}_M, T(x_{i,M}) \rangle}} \right| \left| \sum_{i=1}^M w_{i,M} f(x_{i,M})\, e^{\langle \bar{\lambda}_M, T(x_{i,M}) \rangle} g(x_{i,M}) \right|$
$\le G \left| 1 - \frac{1}{\sum_{i=1}^M w_{i,M} f(x_{i,M})\, e^{\langle \bar{\lambda}_M, T(x_{i,M}) \rangle}} \right| \sum_{i=1}^M w_{i,M} f(x_{i,M})\, e^{\langle \bar{\lambda}_M, T(x_{i,M}) \rangle}$
$= G \left| \sum_{i=1}^M w_{i,M} f(x_{i,M})\, e^{\langle \bar{\lambda}_M, T(x_{i,M}) \rangle} - 1 \right|$
$\le G \left| \sum_{i=1}^M w_{i,M} f(x_{i,M}) \left( e^{\langle \bar{\lambda}_M, T(x_{i,M}) \rangle} - 1 \right) \right| + G \left| \sum_{i=1}^M w_{i,M} f(x_{i,M}) - 1 \right|$,

which can be reduced to Lemma 3 in the case g = 1. Thus we obtain the conclusion.
5.3. Proof of Lemma 5. We estimate $\|\bar{\lambda}_M\|$ noting that $\bar{\lambda}_M$ is the unique minimizer of

$J_M(\lambda) = \sum_{i=1}^M w_{i,M} f(x_{i,M})\, e^{\langle \lambda, T(x_{i,M}) \rangle}$.
First, let us show the inequality

(5.12)  $e^z \ge 1 + z + \frac{1}{2} \left( \max\{0, \min\{z, a\}\} \right)^2$

for any z, a ∈ R. To see this, note that

$\max\{0, \min\{z, a\}\} = \begin{cases} 0 & (z \le 0 \text{ or } a \le 0) \\ z & (0 \le z \le a) \\ a & (0 \le a \le z), \end{cases}$

so (5.12) follows from $e^z \ge 1 + z$ if z ≤ 0 and $e^z \ge 1 + z + \frac{1}{2} z^2$ if z ≥ 0.
Let $z = \langle \lambda, T(x_{i,M}) \rangle$ and $a = \|\lambda\| \alpha$ for α > 0 in (5.12) and multiply by $w_{i,M} f(x_{i,M}) \ge 0$. Letting $\lambda^* = \lambda / \|\lambda\|$ and summing over i, we obtain

(5.13)  $J_M(\lambda) \ge A_M + \langle \lambda, B_M \rangle + C_{M,\alpha}(\lambda^*) \|\lambda\|^2$,

where

$A_M = \sum_{i=1}^M w_{i,M} f(x_{i,M}) \in \mathbb{R}$,
$B_M = \sum_{i=1}^M w_{i,M} f(x_{i,M}) T(x_{i,M}) \in \mathbb{R}^L$,
$C_{M,\alpha}(\lambda^*) = \frac{1}{2} \sum_{i=1}^M w_{i,M} f(x_{i,M}) \left( \max\{0, \min\{\langle \lambda^*, T(x_{i,M}) \rangle, \alpha\}\} \right)^2$.

By Assumptions 1 and 2, for large enough α > 0 we have

$C_{M,\alpha}(\lambda^*) \to \frac{1}{2} \int_{\mathbb{R}^K} f(x) \left( \max\{0, \min\{\langle \lambda^*, T(x) \rangle, \alpha\}\} \right)^2 dx \ge C_\alpha > 0$

as M → ∞, where $C_\alpha$ is defined by (3.14). Let $M_\varepsilon$ be such that $C_{M,\alpha} \ge C_\alpha - \varepsilon > 0$ whenever M ≥ $M_\varepsilon$. Then it follows from $J_M(0) = A_M$, (5.13), and the Cauchy-Schwarz inequality that

$\bar{\lambda}_M \in \{ \lambda \mid J_M(\lambda) \le A_M \} \subset \left\{ \lambda \mid A_M + \langle \lambda, B_M \rangle + C_{M,\alpha}(\lambda^*) \|\lambda\|^2 \le A_M \right\} \subset \left\{ \lambda \mid A_M - \|B_M\| \|\lambda\| + (C_\alpha - \varepsilon) \|\lambda\|^2 \le A_M \right\}$,

so we obtain

(5.14)  $\|\bar{\lambda}_M\| \le \frac{\|B_M\|}{C_\alpha - \varepsilon}$.

But since by assumption $\int_{\mathbb{R}^K} f(x) T(x)\,dx = 0$, it follows from (5.14) that

$\|B_M\| = \left\| \int_{\mathbb{R}^K} f(x) T(x)\,dx - \sum_{i=1}^M w_{i,M} f(x_{i,M}) T(x_{i,M}) \right\| = E^{(a)}_{T,M} \to 0$,

so (3.15) holds.
Acknowledgments. We thank Jonathan Borwein, seminar participants at the
33rd International Workshop on Bayesian Inference and Maximum Entropy Methods
in Science and Engineering (MaxEnt 2013), and two anonymous referees for comments
and feedback that greatly improved the paper. KT was partially supported by JSPS
KAKENHI Grant Number 24760064.
REFERENCES
[1] Rafail V. Abramov. An improved algorithm for the multidimensional moment-constrained
maximum entropy problem. Journal of Computational Physics, 226(1):621–644, September
2007.
[2] Rafail V. Abramov. The multidimensional maximum entropy moment problem: A review of
numerical methods. Communications in Mathematical Sciences, 8(2):377–392, 2010.
[3] Jérôme Adda and Russel W. Cooper. Dynamic Economics: Quantitative Methods and Applications. MIT Press, Cambridge, MA, 2003.
[4] S. Rao Aiyagari. Uninsured idiosyncratic risk and aggregate saving. Quarterly Journal of
Economics, 109(3):659–684, 1994.
[5] Graham W. Alldredge, Cory D. Hauck, Dianne P. O’Leary, and André L. Tits. Adaptive
change of basis in entropy-based moment closures for linear kinetic equations. Journal of
Computational Physics, 258(1):489–508, February 2014.
[6] Andrew R. Barron and Chyong-Hwa Sheu. Approximation of density functions by sequences
of exponential families. Annals of Statistics, 19(3):1347–1369, 1991.
[7] Jonathan M. Borwein and Adrian S. Lewis. Duality relationships for entropy-like minimization
problems. SIAM Journal on Control and Optimization, 29(2):325–338, March 1991.
[8] Jonathan M. Borwein and Adrian S. Lewis. Convex Analysis and Nonlinear Optimization:
Theory and Examples. Canadian Mathematical Society Books in Mathematics. Springer,
New York, 2nd edition, 2006.
[9] Peter W. Buchen and Michael Kelly. The maximum entropy distribution of an asset inferred
from option prices. Journal of Financial and Quantitative Analysis, 31(1):143–159, 1996.
[10] Marko Budišić and Mihai Putinar. Conditioning moments of singular measures for entropy
optimization. I. Indagationes Mathematicae, 23(4):848–883, December 2012.
[11] Ariel Caticha and Adom Giffin. Updating probabilities. In Ali Mohammad-Djafari, editor,
Bayesian Inference and Maximum Entropy Methods in Science and Engineering, volume
872 of AIP Conference Proceedings, pages 31–42, 2006.
[12] Imre Csiszár. Sanov property, generalized I-projection and a conditional limit theorem. Annals
of Probability, 12(3):768–793, August 1984.
[13] Philip J. Davis and Philip Rabinowitz. Methods of Numerical Integration. Academic Press,
Orlando, FL, second edition, 1984.
[14] Andrée Decarreau, Danielle Hilhorst, Claude Lemaréchal, and Jorge Navaza. Dual methods in
entropy maximization. Application to some problems in crystallography. SIAM Journal
on Optimization, 2(2):173–197, 1992.
[15] Eric A. DeVuyst and Paul V. Preckel. Gaussian cubature: A practitioner’s guide. Mathematical
and Computer Modelling, 45(7-8):787–794, April 2007.
[16] Duncan K. Foley. A statistical equilibrium theory of markets. Journal of Economic Theory,
62(2):321–345, April 1994.
[17] Cory D. Hauck, C. David Levermore, and André L. Tits. Convex duality and entropy-based
moment closures: Characterizing degenerate densities. SIAM Journal on Control and
Optimization, 47(4):1977–2015, 2008.
[18] Mark Huggett. The risk-free rate in heterogeneous-agent incomplete-insurance economies. Journal of Economic Dynamics and Control, 17(5-6):953–969, September-November 1993.
[19] Edwin T. Jaynes. Information theory and statistical mechanics. Physical Review, 106(4):620–
630, May 1957.
[20] Edwin T. Jaynes. On the rationale of maximum-entropy methods. Proceedings of the IEEE,
70(9):939–952, 1982.
[21] Edwin T. Jaynes. Probability Theory: The Logic of Science. Cambridge University Press,
Cambridge, U.K., 2003. Edited by G. Larry Bretthorst.
[22] Michael Junk. Maximum entropy for reduced moment problems. Mathematical Models and
Methods in Applied Sciences, 10(7):1001–1025, 2000.
[23] Yuichi Kitamura and Michael Stutzer. An information-theoretic alternative to generalized
method of moments estimation. Econometrica, 65(4):861–874, 1997.
[24] Kevin H. Knuth and John Skilling. Foundations of inference. Axioms, 1(1):38–73, 2012.
[25] Per Krusell and Anthony A. Smith, Jr. Income and wealth heterogeneity in the macroeconomy.
Journal of Political Economy, 106(5):867–896, October 1998.
[26] David G. Luenberger and Yinyu Ye. Linear and Nonlinear Programming. International Series
in Operations Research and Management Science. Springer, NY, third edition, 2008.
[27] Lawrence R. Mead and Nikos Papanicolaou. Maximum entropy in the problem of moments.
Journal of Mathematical Physics, 25(8):2404–2417, August 1984.
[28] Robert C. Merton. Optimum consumption and portfolio rules in a continuous-time model.
Journal of Economic Theory, 3(4):373–413, December 1971.
[29] Allen C. Miller, III and Thomas R. Rice. Discrete approximations of probability distributions.
Management Science, 29(3):352–362, March 1983.
[30] Vincent Pavan. General entropic approximations for canonical systems described by kinetic
equations. Journal of Statistical Physics, 142(4):792–827, 2011.
[31] Paul A. Samuelson. Lifetime portfolio selection by dynamic stochastic programming. Review
of Economics and Statistics, 51(3):239–246, August 1969.
[32] Jacques Schneider. Entropic approximation in kinetic theory. ESAIM: Mathematical Modelling
and Numerical Analysis, 38(03):541–561, May 2004.
[33] John E. Shore and Rodney W. Johnson. Axiomatic derivation of the principle of maximum
entropy and the principle of minimum cross-entropy. IEEE Transactions on Information
Theory, IT-26(1):26–37, January 1980.
[34] Ken’ichiro Tanaka and Alexis Akira Toda. Discrete approximations of continuous distributions
by maximum entropy. Economics Letters, 118(3):445–450, March 2013.
[35] George Tauchen. Finite state Markov-chain approximations to univariate and vector autoregressions. Economics Letters, 20(2):177–181, 1986.
[36] Alexis Akira Toda. Existence of a statistical equilibrium for an economy with endogenous offer
sets. Economic Theory, 45(3):379–415, 2010.
[37] Jan M. Van Campenhout and Thomas M. Cover. Maximum entropy and conditional probability.
IEEE Transactions on Information Theory, IT-27(4):483–489, July 1981.
[38] Ximing Wu. Calculation of maximum entropy densities with application to income distribution.
Journal of Econometrics, 115(2):347–354, August 2003.