ERROR ESTIMATE AND CONVERGENCE ANALYSIS OF MOMENT-PRESERVING DISCRETE APPROXIMATIONS OF CONTINUOUS DISTRIBUTIONS∗ KEN’ICHIRO TANAKA† AND ALEXIS AKIRA TODA‡ Abstract. The maximum entropy principle is a powerful tool to solve underdetermined inverse problems. In this paper we consider the problem of finding a discrete approximation of a continuous distribution, which arises in various applied fields. We obtain the approximating distribution by minimizing the Kullback-Leibler information of the unknown discrete distribution relative to the known continuous distribution (evaluated at given discrete points) subject to some moment constraints. We study the theoretical error bound and the convergence property of this approximation method as the number of discrete points increases. The order of the theoretical error bound of the expectation of any bounded continuous function with respect to the approximating discrete distribution is never worse than the integration formula we start with, and we prove the weak convergence of the discrete distribution to the given continuous distribution. Moreover, we present some numerical examples that show the advantage of the method and apply to numerically solving an optimal portfolio problem. Key words. probability distribution, discrete approximation, generalized moment, integration formula, Kullback-Leibler information, Fenchel duality, error estimate, convergence analysis AMS subject classifications. 41A25, 41A29, 62E17, 62P20, 65D30, 65K99 1. Introduction. This paper has two goals. First, we propose a numerical method to approximate continuous probability distributions by discrete ones. Second, we study the convergence property of the method and derive error estimates. Discrete approximations of continuous distributions is important in applied numerical analysis. To motivate the problem, we list a few concrete examples, many of which come from economics. Optimal portfolio problem. Suppose that there are J assets indexed by j = 1, . . . , J. Asset j has gross return Rj (which is a random variable), which means that a dollar invested in asset j will give a total return of Rj dollars over the investment horizon. Let θj be the fraction of an investor’s wealth invested in asset j (so ∑J j=1 θj = 1) and θ = (θ1 , . . . , θJ ) be the portfolio. Then the gross return on the port∑J folio is the weighted average of each asset return, R(θ) := j=1 Rj θj . Assume that [ ] 1 the investor wishes to maximize the risk-adjusted expected returns E 1−γ R(θ)1−γ , where γ > 0 is the degree of relative risk aversion.1 Since in general this expectation has no closed-form expression in the parameter θ, in order to maximize it we need to carry out a numerical integration. This problem is equivalent to finding an approximate discrete distribution of the returns (R1 , . . . , RJ ). Optimal consumption-portfolio problem. In the above example the portfolio choice was a one time decision. But we can consider a dynamic version, where the investor chooses the optimal consumption and portfolio over time (Samuelson [31] and Merton ∗ June 2, 2014 † Corresponding author: School of Systems Information Science, Future University Hakodate (ketanaka@fun.ac.jp). ‡ Department of Economics, University of California San Diego (atoda@ucsd.edu). 1 Readers unfamiliar with economic concepts need not worry here. The point is that we want to maximize an expectation that is a function of some parameter. The case γ = 1 corresponds to the log utility E[log R(θ)]. 
1 2 KEN’ICHIRO TANAKA AND ALEXIS AKIRA TODA [28] are classic examples that admit closed-form solutions. Almost all practical problems, however, admit no closed-form solutions). Since we need to numerically solve a portfolio problem for each time period, we face one more layer of complexity to the problem. General equilibrium problem. Instead of solving the optimization problem of a single investor, we can consider a model of the whole economy, and might want to determine the asset prices that make demand equal to supply. This is called a general equilibrium problem in economics. Since we need to solve a dynamic optimal portfolio problem given asset prices, and we need to find asset prices that clear markets, we face yet another layer of complexity. Such problems are especially computationally intensive [18, 4, 25], and it is important to find a discrete approximation of a continuous distribution with only a small number of support points in order to mitigate the ‘curse of dimensionality’. Simulating stochastic processes. In many fields, including option pricing, it is necessary to simulate many sample paths of diffusion or autoregressive processes. To carry out the simulation, one needs to approximate them by a finite state Markov chain [35]. The above examples can be generalized as follows. We want compute an expectation E[g(X)] for a function g : RK → R and a random variable X with a given continuous distribution, say with a density f . If the function g is explicitly known, it suffices to use some integration formula ∫ (1.1) E[g(X)] = RK g(x)f (x) dx ≈ M ∑ wi,M g(xi,M )f (xi,M ) i=1 for the computation, where M is the number of integration points and wi,M is the weight on the point xi,M . In many cases, especially in economics (as the above portfolio problem), the function g may contain some unknown parameter θ that we want to determine from some given conditions for E[g(X, θ)], say θ∗ = arg max E[g(X, θ)]. θ In addition, the values g(x, θ) or f (x) may be available only on some prescribed discrete set of x’s, say D ⊂ RK . Such a restriction may arise when g(·, θ) or f is obtained by some statistical estimate from real data, or when the application requires the discrete set D to be a particular set, say lattice points. In such a situation, we need to compute E[g(X, θ)] many times for different parameter values θ using only the discrete points x’s in D. Thus it is desirable to use a highly accurate integration formula with light computing load, say small M in (1.1). Some popular formulas, such as the Newton-Cotes type or the Gauss type formulas (see [13] for a standard textbook treatment), are suitable for such a purpose. These formulas, however, are not necessarily available if the integration points xi,M ’s are restricted as {xi,M } ⊂ D. If that is the case, we need some recipe for approximating an expectation as in (1.1). Note that such an approximation is equivalent to finding a discrete distribution {wi,M f (xi,M )} that approximates the given distribution f in the sense of the weak topology. Several methods for discrete approximations of continuous distributions have been proposed in the literature. Given a continuous probability distribution, Tauchen [35] and Adda and Cooper [3] adopt simple partitions of the domain of the distribution function and assign the true probability to a representative point of each partitioned MOMENT-PRESERVING APPROXIMATIONS OF DISTRIBUTIONS 3 domain. 
Although their methods are intuitive, simple, and work in any dimension, their methods are not so accurate, for they generate discrete distributions with only approximate moments. Miller and Rice [29] and Devuyst and Preckel [15] discretize the density function using the weights and the points of the Gaussian integration and its generalization to multi-dimensions, respectively. Although their methods are often more accurate and can match prescribed polynomial moments exactly, they do not allow for the restriction {xi,M } ⊂ D and cannot be applied to non-polynomial moments. Furthermore, the multi-dimensional method by Devuyst and Preckel [15] is computationally intensive and does not have a theoretical guarantee for the existence of the discretization, error bounds, or convergence. As a remedy, in Tanaka and Toda [34] we proposed an approximation method based on the maximum entropy principle (MaxEnt) that matches prescribed moments exactly. Starting from any integration formula, we “fine-tune” the given probabilities (chosen proportionally to {wi,M f (xi,M )}) by minimizing the Kullback-Leibler information (relative entropy) of the unknown probabilities subject to some moment constraints. In that paper we proved the existence and the uniqueness of the solution of the minimization problem, showed that the solution can be easily computed by solving the dual problem, and presented some numerical examples that show that the approximation method is satisfactorily accurate. The method is computationally very simple and works on any discrete set D of any dimension with any prescribed moments (not necessarily polynomials). However, up to now the theoretical approximation error and the convergence property of this method remains unknown. This paper gives a theoretical error bound for this approximation method and shows its convergence property. We first evaluate the theoretical error of our proposed method. It turns out that the order of the theoretical error estimate is at most that of the initial integration formula, and actually improves if the integrand is sufficiently smooth. Thus our proposed method does not compromise the order of the error at the expense of matching moments. Second, as a theoretical consequence of the error estimate, we show the weak convergence of the discrete distribution generated by the method to the given continuous distribution. This means that for any bounded continuous function g, the expectation of g with respect to the approximating discrete distribution converges to the exact one with respect to the given distribution as the number of integration points increases. This convergence property is practically important because it guarantees that the approximation method never generates a pathological discrete distribution with exact moments which has extremely different probability from the given distribution on some domain, at least when the discrete set is large enough. In addition, we present some numerical examples (including a numerical solution to an optimal portfolio problem) that show the advantage of our proposed method. The idea of using the maximum entropy principle to obtain a solution to underdetermined inverse problems (such as the Hausdorff moment problem) is similar to that of Jaynes [20] and Mead and Papanicolaou [27] (Junk [22] studies the existence of maximum entropy solutions with unbounded domains). There is an important distinction, however. 
In typical inverse problems, one studies the convergence of the approximating solution to the true one when the number of moment constraints tends to infinity. In contrast, in this paper we study the convergence when the number of approximating points tends to infinity, fixing the moments. Thus the two problems are quite different. The literature on the foundations, implementations, and the applications of maximum entropy methods is immense: any literature review is nec- 4 KEN’ICHIRO TANAKA AND ALEXIS AKIRA TODA essarily partial. The maximum entropy principle (as an inference method in general, not necessarily restricted to physics) was proposed by Jaynes [19]. For axiomatic approaches, see Shore and Johnson [33], Jaynes [21], Caticha and Giffin [11], and Knuth and Skilling [24]. For the relation to Bayesian inference, see Van Campenhout and Cover [37] and Csiszár [12]. For the duality theory of entropy maximization, see [7, 14, 17]. For numerical algorithms for computing maximum entropy densities, see [1, 2, 5]. Budišić and Putinar [10] study the opposite problem of ours, namely transforming a probability measure to one that is absolutely continuous with respect to the Lebesgue measure. Applications of maximum entropy methods can be found in economics [16, 36], statistics and econometrics [6, 23, 38], finance[9], kinetic theory [32, 30], among many other fields. The rest of the paper is organized as follows. In Section 2 we review the approximation method proposed in [34]. Section 3, which is the main part of the paper, presents the theoretical error estimate and the convergence property of the method. Section 4 shows some numerical examples, one of which is an optimal portfolio problem. Section 5 contains the proofs. 2. The approximation method. In this section, we review the discrete approximation method of continuous distributions proposed in [34]. Let f be a probability density function on RK and assume that some generalized moments ∫ (2.1) T̄ = f (x)T (x) dx RK are given, where T : RK → RL is a continuous function. (Below, we sometimes refer to this function as the “moment defining function”.) For instance, if the first and second polynomial moments are given, T should be defined by T (x) = (x1 , . . . , xK , x21 . . . , xk xl , . . . , x2K ). In this case, we are prescribed K expectations, K variances, and K(K−1) covariances. 2 Therefore, the total number of moment constraints (the dimension of the range space of T ) is L=K +K + K(K − 1) K(K + 3) = . 2 2 In general, the components of T (x) need not be polynomials. Moreover, for each positive integer M , assume that a finite discrete set DM = {xi,M | i = 1, . . . , M } ⊂ RK is given. An example of DM is the lattice DM = {(n1 h, n2 h, . . . , nK h) | n1 , n2 . . . , nK = 0, ±1, . . . , ±N } , where h > 0 is the grid size and N a positive integer, in which case M = (2N + 1)K . Our aim is to find a discrete probability distribution PM = {p(xi,M ) | xi,M ∈ DM } on DM with exact moments T̄ that approximates f (in the sense of the weak topology, that is, convergence in distribution). MOMENT-PRESERVING APPROXIMATIONS OF DISTRIBUTIONS 5 To match the moments T̄ with PM = {p(xi,M ) | xi,M ∈ DM }, it suffices to assign p(xi,M )’s such that M ∑ (2.2) p(xi,M )T (xi,M ) = T̄ . 
i=1 Note that the solution to this equation is generally underdetermined because the number of unknowns p(xi,M )’s, namely M , is typically much larger than the number of equations (moments), L + 1.2 To obtain PM with (2.2) approximating f , we first choose a numerical integration formula by setting positive weights wi,M (i = 1, 2, . . . , M ): ∫ (2.3) RK f (x)g(x) dx ≈ M ∑ wi,M f (xi,M )g(xi,M ), i=1 where g is an arbitrary function that we want to compute the expectation with respect to the density f . For instance, if K = 1 and DM = {nh | k = 0, ±1, . . . , ±N } for h > 0, we can choose the (2N + 1)-point trapezoidal formula for a univariate function on R by setting wi,M = h if 1 < i < M and wi,M = h/2 otherwise. In the following, we do not address how to choose the integration formula (2.3) but take it as given. Now the approximation method is defined as follows. We obtain the approximate discrete distribution PM = {p(xi,M ) | xi,M ∈ DM } as a solution to the following optimization problem: (P) min {p(xi,M )} M ∑ p(xi,M ) log i=1 subject to M ∑ p(xi,M ) w(xi,M )f (xi,M ) p(xi,M )T (xi,M ) = T̄ , i=1 ∑ p(xi,M ) = 1, p(xi,M ) ≥ 0. i=1 The problem (P) is equivalent to the minimization problem of the Kullback-Leibler information (also known as the relative entropy) of PM relative to the discrete distribution proportional to {w(xi,M )f (xi,M ) | xi,M ∈ DM }. Note that the problem (P) has a unique solution if T̄ ∈ co T (DM ), where co T (DM ) is the convex hull of T (DM ) defined by M {M } ∑ ∑ αi,M T (xi,M ) (2.4) co T (DM ) = αi,M = 1 and αi,M ≥ 0 , i=1 i=1 because in that case the constraint set is nonempty, compact, convex, and the objective function is continuous (by adopting the convention 0 log 0 = 0) and strictly convex. To characterize the solution of (P), we consider the Fenchel dual3 of (P), which can be written as [ (M )] ∑ ⟨ ⟩ (D) λ̄M = arg min − λ, T̄ + log wi,M f (xi,M )e⟨λ,T (xi,M )⟩ , λ∈RL 2 The 3 See i=1 ∑ “+1” comes from accounting the probabilities M i=1 p(xi,M ) = 1. [7] for an application of the Fenchel duality to entropy-like minimization problems. 6 KEN’ICHIRO TANAKA AND ALEXIS AKIRA TODA where ⟨ · , · ⟩ denotes the usual inner product in RL . Note the simplicity of the dual problem (D) compared to the primal problem (P): the dual (D) is an unconstrained optimization problem with typically a small number of unknowns (L) whereas the primal problem (P) is a constrained optimization problem with typically a large number of unknowns (M ). The following theorems in [34] show that we can obtain the solution of (P) in the form of fine-tuned values of w(xi,M )f (xi,M ). These theorems are routine exercises in convex duality theory (see for example [8] for a textbook treatment): we present them nevertheless in order to make the paper self-contained. Theorem 1. Suppose that T̄ ∈ co T (DM ). Then the set of probabilities PM = {p(xi,M ) | xi,M ∈ DM } given by (2.5) wi,M f (xi,M )e⟨λ̄M ,T (xi,M )⟩ p(xi,M ) = ∑M ⟨λ̄M ,T (xi,M )⟩ i=1 wi,M f (xi,M )e is the solution of (P), where λ̄M is the minimizer in (D). Theorem 1 indicates that the solution of (P) can be determined by (2.5) if a solution λ̄M of (D) exists. Theorem 2 below guarantees the existence of a solution λ̄M of (D). Here, in order to guarantee the uniqueness of the solution as well, we adopt a stronger assumption T̄ ∈ int(co T (DM )) than T̄ ∈ co T (DM ) in Theorem 1, where “int” denotes the set of the interior points of a region. Theorem 2. Suppose that T̄ ∈ int(co T (DM )). Then 1. 
the objective function in (D) is continuous and strictly convex, and 2. the solution λ̄M uniquely exists. Based on Theorems 1 and 2, in [34] we showed by some numerical examples that our method generates accurate discrete approximations of continuous distributions. 3. Error bound and convergence property. In this section we give the theoretical error estimate of the approximation method introduced in Section 2. Let g : RK → R be a bounded continuous function. Under appropriate assumptions, we first estimate the error ∫ M ∑ (3.1) Eg,M = f (x)g(x) dx − p(xi,M )g(xi,M ) , RK i=1 where p(xi,M )’s are determined by (2.5). Next, we show the weak convergence of PM to f , i.e., Eg,M → 0 (M → ∞) for any g. Throughout this paper, ⟨ · , · ⟩ and ∥ · ∥ denote the usual inner product and the Euclidean norm of RL , respectively. Since f (x) is a probability density function, the moment condition (2.1) is equivalent to ∫ f (x)(T (x) − T̄ ) dx = 0. RK Hence by redefining T (x) − T̄ as T (x), without loss of generality we may assume T̄ = 0. We keep this convention throughout the remainder of this section. We consider the error estimate and the convergence analysis under the following two assumptions. The first assumption states that the moment defining function T has no degenerate components and the moment T̄ = 0 can also be expressed as an expectation on the discrete set DM . Assumption 1. The components of the moment defining function T are affine independent as functions both on RL ∩ supp f and DM ∩ supp f for any positive in- MOMENT-PRESERVING APPROXIMATIONS OF DISTRIBUTIONS 7 teger M . Namely, for any 0 ̸= (λ, µ) ∈ RL × R, there exists xi,M ∈ DM such that ⟨λ, T (xi,M )⟩ + µ ̸= 0. Furthermore, 0 ∈ int(co T (DM )) for any positive integer M . The second assumption concerns the convergence property of the initial integration formula (2.3). Assumption 2. The integration formula converges: for any bounded continuous function g on RK , we have ∫ M ∑ f (x)g(x) dx. (3.2) lim wi,M f (xi,M )g(xi,M ) = M →∞ RK i=1 Furthermore, the integration formula applies to ∥T (x)∥ as well: ∫ M ∑ f (x) ∥T (x)∥ dx < ∞. (3.3) lim wi,M f (xi,M ) ∥T (xi,M )∥ = M →∞ RK i=1 Since (3.2) merely states that the integration formula converges to the true ∫ value, (3.3) is the only essential assumption. Note that since the moments T̄ = f (x)T (x) dx ∫ exists to∫begin with and the Lebesgue integral f (x)T (x) dx exists if and only if the integral f (x) ∥T (x)∥ dx does, the finiteness assumption in (3.3) is not an additional restriction. We start with the following estimate of the error Eg,M obtained by the triangle inequality: (a) (b) (c) Eg,M ≤ Eg,M + Eg,M + Eg,M , (3.4) where (3.5) (a) Eg,M (3.6) Eg,M (3.7) Eg,M (b) (c) ∫ M ∑ = f (x)g(x) dx − wi,M f (xi,M )g(xi,M ) , RK i=1 M M ∑ ∑ = wi,M f (xi,M )g(xi,M ) − wi,M f (xi,M )e⟨λ̄M ,T (xi,M )⟩ g(xi,M ) , i=1 i=1 M M ∑ ∑ λ̄M ,T (xi,M )⟩ ⟨ = wi,M f (xi,M )e g(xi,M ) − p(xi,M )g(xi,M ) . i=1 i=1 (a) Eg,M is the error of the integration formula for the integrand f g, which depends on the choice of the formula and is assumed to converge to 0 as M → ∞ by Assumption (b) (c) 2. Hence we focus on the other errors Eg,M and Eg,M . To bound these two errors, we use the errors ∫ M M ∑ ∑ (a) (3.8) E1,M = f (x) dx − wi,M f (xi,M ) = 1 − wi,M f (xi,M ) , RK i=1 i=1 ∫ M ∑ (a) (3.9) E∥T ∥,M = f (x) ∥T (x)∥ dx − wi,M f (xi,M ) ∥T (xi,M )∥ , RK i=1 (a) which are the values of Eg,M in (3.5) for the special cases corresponding to g = 1 and g = ∥T (·)∥, respectively. 
The integral ∫ f (x) ∥T (x)∥ dx (3.10) I∥T ∥ = RK 8 KEN’ICHIRO TANAKA AND ALEXIS AKIRA TODA (a) (b) is finite and the error E∥T ∥,M tends to 0 as M → ∞ by (3.3). The errors Eg,M and (c) Eg,M are bounded as shown in the following lemmas, which we prove in Section 5. For a bounded function g : RK → R, let ∥g∥∞ = supx∈RK |g(x)| be the sup norm. Lemma 3. Let Assumptions 1 and 2 be satisfied. Then, for any bounded continuous function g, we have ) ( (b) (a) λ̄M . (3.11) E ≤ 3 ∥g∥ I∥T ∥ + E ∞ g,M ∥T ∥,M Lemma 4. Let Assumptions 1 and 2 be satisfied. Then, for any bounded continuous function g, we have ( ( ) ) (c) (a) (a) (3.12) Eg,M ≤ ∥g∥∞ E1,M + 3 I∥T ∥ + E∥T ∥,M λ̄M . According to these lemmas, we need to estimate the solution λ̄M of (D) in order to complete the estimate of Eg,M . To obtain an estimate of λ̄M , we additionally use (a) an error ET,M defined by (3.13) (a) ET,M ∫ M ∑ = f (x)T (x) dx − wi,M f (xi,M )T (xi,M ) RK i=1 and a constant Cα defined by (3.14) Cα = 1 λ∈RL ,∥λ∥=1 2 ∫ 2 inf RK f (x) (max {0, min {⟨λ, T (x)⟩ , α}}) dx. Note that by Assumption 1 we have Cα > 0 for large enough α. Furthermore, since |Tl (x)| ≤ ∥T (x)∥ for any component l = 1, 2, . . . , L, by (3.3) and the Dominated (a) Convergence Theorem the error ET,M tends to 0 as M → ∞. The following lemma gives an estimate of λ̄M , which we also prove in Section 5. Lemma 5. Let Assumptions 1 and 2 be satisfied and α > 0 be large enough such that Cα > 0. Then, for any ε with 0 < ε < Cα , there exists a positive integer Mε such that for any M with M ≥ Mε , we have (3.15) λ̄M ≤ 1 (a) E . Cα − ε T,M Combining Lemmas 3, 4, and 5, we immediately obtain the following theorem for the estimate of Eg,M . Theorem 6. Let Assumptions 1 and 2 be satisfied, g be a bounded continuous function, and α > 0 be large enough such that Cα > 0. Then, for any ε with 0 < ε < Cα , there exists a positive integer Mε such that for any M with M ≥ Mε , we have ) ( ( ) 6 (a) (a) (a) (a) (3.16) Eg,M ≤ Eg,M + ∥g∥∞ E1,M + I∥T ∥ + E∥T ∥,M ET,M . Cα − ε MOMENT-PRESERVING APPROXIMATIONS OF DISTRIBUTIONS 9 Proof. Combining (3.4), (3.11), (3.12), and (3.15), we obtain the estimate (a) (b) (c) Eg,M ≤ Eg,M + Eg,M + Eg,M ( ( ) ) (a) (a) (a) ≤ Eg,M + ∥g∥∞ E1,M + 6 I∥T ∥ + E∥T ∥,M λ̄M ( ) ( ) 6 (a) (a) (a) (a) ≤ Eg,M + ∥g∥∞ E1,M + I∥T ∥ + E∥T ∥,M ET,M , Cα − ε which is (3.16). (a) (a) (a) Note that Eg,M is bounded by a formula consisting of Eg,M , E1,M , ET,M , and (a) E∥T ∥,M , which are the errors of the integration formula for the given functions g, 1, T , and ∥T ∥. Since all of them tend to zero as M → ∞, it follows from (3.16) that }) ( { (a) (a) (a) Eg,M = O max Eg,M , E1,M , ET,M (M → ∞). (3.17) The equality (3.17) shows that the error Eg,M is at most of the same order as the error of the initial integration formula. Thus our method does not compromise the order of the error at the expense of matching moments. Using Theorem 6, we can prove our main result, the weak convergence4 of the approximating discrete distribution PM = {p(xi,M )} to f , as shown in the following theorem. Theorem 7. Let Assumptions 1 and 2 be satisfied. Then, for any bounded continuous function g, we have (3.18) lim M →∞ M ∑ i=1 ∫ p(xi,M )g(xi,M ) = f (x)g(x) dx, RK i.e., the discrete distribution PM weakly converges to the exact continuous distribution f. Proof. It follows from the definition of the error Eg,M in (3.1), Theorem 6, and Assumptions 1 and 2 that ∫ M ∑ f (x)g(x) dx − p(xi,M )g(xi,M ) RK i=1 ( { }) (a) (a) (a) = Eg,M = O max Eg,M , E1,M , ET,M →0 as M → ∞, which is (3.18). 
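To make the quantities that drive the bound (3.16) concrete, the following small numerical check (our own illustration, not taken from the paper) evaluates E^(a)_{1,M}, E^(a)_{T,M}, and E^(a)_{∥T∥,M} for a simple trapezoidal formula applied to the standard normal density with the centered moment function T(x) = (x, x² − 1) (so that T̄ = 0, as in the convention above). The grid anticipates the one used in Section 4.1; all names in the sketch are ours.

import numpy as np

def pdf(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def formula_errors(N):
    # trapezoidal rule on the grid {n / sqrt(N) : |n| <= N}, M = 2N + 1 points
    h = 1 / np.sqrt(N)
    x = h * np.arange(-N, N + 1)
    w = np.full(x.size, h); w[[0, -1]] = h / 2
    wf = w * pdf(x)
    T = np.column_stack([x, x**2 - 1.0])          # centered moment function, Tbar = 0
    normT = np.linalg.norm(T, axis=1)             # ||T(x)||
    e1 = abs(1.0 - wf.sum())                      # E^(a)_{1,M}
    eT = np.linalg.norm(wf @ T)                   # E^(a)_{T,M} (the true moments are 0)
    # reference value of I_{||T||} = integral of f(x) ||T(x)|| dx by a fine Riemann sum
    z = np.linspace(-12.0, 12.0, 240001); dz = z[1] - z[0]
    I_normT = np.sum(pdf(z) * np.sqrt(z**2 + (z**2 - 1.0)**2)) * dz
    enT = abs(I_normT - wf @ normT)               # E^(a)_{||T||,M}
    return e1, eT, enT

for N in (1, 4, 9, 16, 25):
    print(N, formula_errors(N))

As M grows, one can check that all three quantities shrink, which is exactly what Assumption 2 and the bound (3.16) require of the initial integration formula.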
Theorem 6 shows that the order of the theoretical error of our approximation method is at most that of the initial integration formula, but does not say that the error actually improves. Next we show that our method improves the accuracy of the integration in some appropriate situations.

For simplicity, we consider the following fundamental case. Let [a, b] ⊂ R be a real interval, f : [a, b] → R be a density function on [a, b], and g : [a, b] → R be a bounded continuous function. We match the polynomial moments, so the moment defining function T : [a, b] → R^L is T(x) = (1, x, . . . , x^{L−1})′.

Footnote 4: For readers unfamiliar with probability theory, a sequence of probability measures {µ_n} is said to weakly converge to µ if lim_{n→∞} ∫ g dµ_n = ∫ g dµ for every bounded continuous function g. In particular, by choosing g to approximate indicator functions, we have µ_n(B) → µ(B) for every Borel set B whose boundary has µ-measure zero, so the probability distribution µ_n approximates µ.

Theorem 8. Let

E_{M,α,ε} = E^(a)_{1,M} + (6/(C_α − ε)) (I_{∥T∥} + E^(a)_{∥T∥,M}) E^(a)_{T,M}

be the error term in Theorem 6. If g ∈ C^L[a, b], then

(3.19)    E_{g,M} ≤ ((b − a)^L / L!) (E^(a)_{g̃_L,M} + ∥g^{(L)}∥_∞ E_{M,α,ε}),

where

(3.20)    g̃_L(x) = (L! / (b − a)^L) (g(x) − ∑_{l=0}^{L−1} g^{(l)}(a) (x − a)^l / l!)

is a bounded continuous function with ∥g̃_L∥_∞ ≤ ∥g^{(L)}∥_∞.

Proof. Since g ∈ C^L[a, b], we can consider the Taylor polynomial of g(x) at x = a:

q_{L−1}(x) = ∑_{l=0}^{L−1} g^{(l)}(a) (x − a)^l / l!.

Estimating the remainder term of the Taylor expansion, we have

∥g − q_{L−1}∥_∞ ≤ ((b − a)^L / L!) ∥g^{(L)}∥_∞.

Dividing both sides by (b − a)^L / L! and noting that g̃_L = (L! / (b − a)^L)(g − q_{L−1}), we get ∥g̃_L∥_∞ ≤ ∥g^{(L)}∥_∞. Noting that

∫_a^b f(x) q_{L−1}(x) dx = ∑_{i=1}^M p(x_{i,M}) q_{L−1}(x_{i,M})

because moments of order up to L − 1 are exact, by Theorem 6 we obtain

E_{g,M} = E_{g−q_{L−1},M} ≤ E^(a)_{g−q_{L−1},M} + ∥g − q_{L−1}∥_∞ E_{M,α,ε}
        ≤ ((b − a)^L / L!) E^(a)_{g̃_L,M} + ((b − a)^L / L!) ∥g^{(L)}∥_∞ E_{M,α,ε},

which is the conclusion.

Theorem 8 shows that our approximation method improves the error by the factor (b − a)^L / L! when the integrand is sufficiently smooth.

4. Numerical experiments. In this section, we present some numerical examples that compare the accuracy of the approximate expectations computed by an initial integration formula and by its modifications through our proposed method. All computations in this section are done by MATLAB programs with double precision floating point arithmetic on a PC.

4.1. Gaussian and beta distributions. We choose the Gaussian distribution N(0, 1) and the beta distribution Be(2, 4) as examples of continuous distributions, and adopt the trapezoidal formula as the initial integration formula. In the following, let M be an odd integer with M = 2N + 1 (N = 1, 2, . . . , 12). For the Gaussian distribution, we set

f_1(x) = (1/√(2π)) exp(−x²/2)    (x ∈ (−∞, ∞)),
x^(1)_{i,M} = (i − ⌈M/2⌉) h^(1)_M    (i = 1, . . . , M),
w^(1)_{i,M} = h^(1)_M if i ≠ 1, M, and w^(1)_{i,M} = h^(1)_M / 2 if i = 1, M,

where h^(1)_M = 1/√N for N = (M − 1)/2. This means that we let

D = {n h^(1)_M | n = 0, ±1, . . . , ±N}

and approximate the integral over (−∞, ∞) by the integral over [−√N, √N]. For the test function, we pick

(4.1)    g_1(x) = e^x if |x| ≤ 10, and g_1(x) = 0 if |x| > 10.

For X ∼ N(0, 1), the exact expectation is

(4.2)    E[g_1(X)] = (1/2) e^{1/2} {erf(9/√2) + erf(11/√2)},

where

(4.3)    erf(x) = (2/√π) ∫_0^x exp(−t²) dt

is the error function.
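As a small check of this setup, the following NumPy/SciPy sketch (ours; the paper's own computations were done in MATLAB) evaluates the trapezoidal approximation of E[g_1(X)] on the grid D above and compares it with the exact value (4.2). The function and variable names are our own.

import numpy as np
from scipy.special import erf

def trapezoidal_expectation(N):
    h = 1.0 / np.sqrt(N)                             # h_M^(1) = 1/sqrt(N), M = 2N + 1
    x = h * np.arange(-N, N + 1)                     # D = {nh : n = 0, +-1, ..., +-N}
    w = np.full(x.size, h); w[[0, -1]] = h / 2.0     # trapezoidal weights
    f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)       # N(0,1) density f_1
    g = np.where(np.abs(x) <= 10, np.exp(x), 0.0)    # test function g_1 of (4.1)
    return np.sum(w * f * g)

exact = 0.5 * np.exp(0.5) * (erf(9 / np.sqrt(2)) + erf(11 / np.sqrt(2)))   # (4.2)
for N in (1, 4, 9, 16, 25):
    approx = trapezoidal_expectation(N)
    print(N, approx, abs(approx - exact) / exact)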
For the beta distribution, we set

f_2(x) = x(1 − x)³ / B(2, 4)    (x ∈ [0, 1]),
x^(2)_{i,M} = (i − 1) h^(2)_M    (i = 1, . . . , M),
w^(2)_{i,M} = h^(2)_M if i ≠ 1, M, and w^(2)_{i,M} = h^(2)_M / 2 if i = 1, M,

where B(·, ·) is the beta function and h^(2)_M = 1/(M − 1). For the test function, we pick g_2(x) = e^x for x ∈ [0, 1]. For X ∼ Be(2, 4), the exact expectation is E[g_2(X)] = 20(49 − 18e).

For the numerical experiments, we compute each of
• E[g_1(X)] for X ∼ N(0, 1), and
• E[g_2(X)] for X ∼ Be(2, 4)
using five formulas: the trapezoidal formula

(4.4)    E[g_k(X)] ≈ ∑_{i=1}^M w^(k)_{i,M} f_k(x^(k)_{i,M}) g_k(x^(k)_{i,M})    (k = 1, 2),

its modifications by our proposed method with exact polynomial moments E[X^l] up to 2nd order (l = 1, 2), 4th order (l = 1, . . . , 4), and 6th order (l = 1, . . . , 6), and Simpson's formula, with the number of grid points M = 2N + 1 (N = 1, 2, . . . , 12). Here, we intend to observe the relative errors of the computed values for small M's.

We numerically solve the dual problem (D) as follows. First, in order for (P) to have a solution, it is necessary that there are at least as many unknown variables (the p(x_{i,M})'s, so in total M) as the number of constraints (L moment constraints and one more for the probabilities to add up to 1, so L + 1). Thus we need M ≥ L + 1.⁵ A sufficient condition for the existence of a solution is T̄ ∈ co T(D) (Theorem 1), which we can easily verify in the current application. Second, note that (D) is equivalent to the minimization of

(4.5)    J_M(λ) = ∑_{i=1}^M w_{i,M} f(x_{i,M}) e^{⟨λ, T(x_{i,M}) − T̄⟩},

which is a strictly convex function of λ. In order to minimize J_M, we apply a variant of the Newton-Raphson algorithm. Starting from λ_0 = 0, we iterate

(4.6)    λ_{n+1} = λ_n − [κI + J_M''(λ_n)]^{−1} J_M'(λ_n)

over n = 0, 1, . . . , where κ > 0 is a small number, I is the L-dimensional identity matrix, and J_M', J_M'' denote the gradient and the Hessian matrix of J_M. Such an algorithm is advocated in [26]. The Newton-Raphson algorithm corresponds to setting κ = 0 in (4.6). Since the Hessian J_M'' is often nearly singular, the presence of κ > 0 stabilizes the iteration (4.6). Below we set κ = 10^{−7} and terminate the iteration (4.6) when ∥λ_{n+1} − λ_n∥ < 10^{−10}. (A short code sketch of this iteration is given below, after Figure 4.1.)

⁵ Since the beta density is zero at x = 0, 1, which are included among the x^(2)_{i,M}'s, we necessarily have p(x^(2)_{i,M}) = 0 for i = 1, M. Thus the number of unknown variables is M − 2, so we need M − 2 ≥ L + 1, i.e., M ≥ L + 3, in the case of the beta distribution.

The results for E[g_1(X)] and E[g_2(X)] are shown in Figures 4.1(a) and 4.1(b), respectively.

[Figure 4.1 about here: panel (a) X ∼ N(0, 1), panel (b) X ∼ Be(2, 4); relative error plotted against M on a logarithmic scale.]
Fig. 4.1. Relative errors of the computed values of E[g(X)], where g(x) = e^x 1_{[−10,10]}(x) for X ∼ N(0, 1) and g(x) = e^x for X ∼ Be(2, 4). The legends "Trapezoidal" and "Simpson" represent the relative errors of the trapezoidal and Simpson's formulas, and "2nd order", "4th order", and "6th order" represent those of our method with exact polynomial moments E[X^l] up to 2nd order (l = 1, 2), 4th order (l = 1, . . . , 4), and 6th order (l = 1, . . . , 6), respectively.

Both results for E[g_1(X)] and E[g_2(X)] show that our proposed method outperforms the trapezoidal and Simpson's formulas in accuracy.
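The following NumPy sketch (ours; the paper's computations were in MATLAB) implements the damped Newton iteration (4.6) for the dual problem in the form (4.5) and recovers the probabilities (2.5). The function name maxent_discretization and the usage example at the bottom are our own; the defaults κ = 10⁻⁷ and tolerance 10⁻¹⁰ follow the text.

import numpy as np

def maxent_discretization(w, f, T, Tbar, kappa=1e-7, tol=1e-10, max_iter=200):
    """Fine-tune the weights w_i f(x_i) so that the moments Tbar are matched.

    w, f : (M,) integration weights and density values at the grid points
    T    : (M, L) matrix whose i-th row is T(x_i);  Tbar : (L,) prescribed moments
    Returns p : (M,) probabilities with sum(p) = 1 and p @ T = Tbar (up to the tolerance).
    """
    q = w * f                                   # initial weights w_i f(x_i)
    Tc = T - Tbar                               # center T so that the target becomes 0
    lam = np.zeros(Tc.shape[1])
    for _ in range(max_iter):
        e = q * np.exp(Tc @ lam)                # w_i f(x_i) exp(<lam, T(x_i) - Tbar>)
        grad = Tc.T @ e                         # J_M'(lam)
        hess = Tc.T @ (e[:, None] * Tc)         # J_M''(lam)
        step = np.linalg.solve(hess + kappa * np.eye(lam.size), grad)
        lam -= step                             # iteration (4.6)
        if np.linalg.norm(step) < tol:
            break
    p = q * np.exp(Tc @ lam)
    return p / p.sum()                          # probabilities (2.5)

# Hypothetical usage: match E[X] = 0 and E[X^2] = 1 of N(0,1) on the 9-point grid (N = 4).
N = 4; h = 1 / np.sqrt(N)
x = h * np.arange(-N, N + 1)
w = np.full(x.size, h); w[[0, -1]] = h / 2
f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
p = maxent_discretization(w, f, np.column_stack([x, x**2]), np.array([0.0, 1.0]))
print(p @ x, p @ x**2)                           # should be close to 0 and 1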
The errors basically decrease as the order of the matched moments increases, consistent with Theorem 8. The reason why our proposed method does not necessarily give accurate results for very small M is that when the number of constraints L + 1 is large relative to the number of unknown variables M, the method generates pathological probability distributions at the expense of matching many moments. For instance, Figures 4.2(a) and 4.2(b) show the discrete distributions that match polynomial moments of order up to 6 to those of the Gaussian distribution N(0, 1) and the beta distribution Be(2, 4) with M = 9 grid points. Clearly the discrete approximations do not resemble the continuous counterparts. This pathological behavior is rarely an issue, however. As long as there are twice as many grid points as constraints (M ≥ 2(L + 1)), the discrete approximation is well-behaved.

[Figure 4.2 about here: panel (a) N(0, 1), panel (b) Be(2, 4); probabilities plotted against x.]
Fig. 4.2. 6th order discrete approximation with M = 9 grid points.

4.2. Uniform distribution. Theorem 7 can be used to obtain an integration formula by setting the density f(x) to be uniform on the unit interval [0, 1]. As before we start from the trapezoidal formula and match polynomial moments of order up to 6. The test functions are g_1(x) = x^{3/2}, g_2(x) = x^{1/2}, g_3(x) = 1/(1 + x), and g_4(x) = sin(πx). Figure 4.3 shows the numerical results. In all cases, our method outperforms the trapezoidal formula. When the integrand is smooth, as in the cases of g_3 and g_4, the improvement in accuracy is significant (of the order 10^{−4}), consistent with Theorem 8. For g_1(x) = x^{3/2}, which has a bounded first derivative but an unbounded second derivative, the improvement is more modest, and even more so for g_2(x) = x^{1/2}, which has an unbounded first derivative. For these non-smooth cases, our method is not necessarily more accurate than Simpson's rule when the number of integration points is large. However, note that we chose the trapezoidal rule as the initial integration formula in Assumption 2; when we start from Simpson's formula, our method outperforms Simpson's rule (not shown in the figure).

According to Figures 4.1 and 4.3, the rate of convergence (the slope of the relative error with respect to the number of grid points M) seems to be the same for the initial integration formula and our method. This observation is consistent with Theorem 8, which shows that the error estimate is of the same order as the integration formula but improves by the factor (b − a)^L / L!. Therefore our method seems to be particularly suited for "fine-tuning" an integration formula with a small number of integration points. For instance, we can construct a highly accurate compound rule by subdividing the interval and applying our method to each subinterval. (A code sketch of the uniform-distribution experiment is given after Figure 4.3 below.)

[Figure 4.3 about here: panel (a) g_1(x) = x^{3/2}, panel (b) g_2(x) = x^{1/2}, panel (c) g_3(x) = 1/(1 + x), panel (d) g_4(x) = sin(πx); relative error plotted against M on a logarithmic scale.]
Fig. 4.3. Relative errors of the computed values of ∫_0^1 g(x) dx. The legends "Trapezoidal" and "Simpson" represent the relative errors of the trapezoidal and Simpson's formulas, and "2nd order", "4th order", and "6th order" represent those of our method with exact polynomial moments E[X^l] up to 2nd order (l = 1, 2), 4th order (l = 1, . . . , 4), and 6th order (l = 1, . . . , 6), respectively.
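The following sketch (ours) reproduces a scaled-down version of this experiment: it matches the moments E[X^l] = 1/(l + 1) of the uniform distribution on [0, 1] (here up to order 4 for brevity) on a 9-point trapezoidal grid and integrates g_3(x) = 1/(1 + x), whose exact integral is log 2. For brevity the dual problem (D) is solved here with SciPy's BFGS routine instead of the damped Newton iteration (4.6); all names are our own.

import numpy as np
from scipy.optimize import minimize

M, L = 9, 4
x = np.linspace(0.0, 1.0, M)
h = 1.0 / (M - 1)
w = np.full(M, h); w[[0, -1]] = h / 2.0            # trapezoidal weights
q = w * np.ones(M)                                 # w_i f(x_i) with f = 1 on [0, 1]
T = np.column_stack([x**l for l in range(1, L + 1)])
Tbar = np.array([1.0 / (l + 1) for l in range(1, L + 1)])
Tc = T - Tbar

res = minimize(lambda lam: q @ np.exp(Tc @ lam),   # dual objective (4.5)
               np.zeros(L),
               jac=lambda lam: Tc.T @ (q * np.exp(Tc @ lam)),
               method="BFGS", options={"gtol": 1e-10})
p = q * np.exp(Tc @ res.x)
p /= p.sum()                                       # probabilities (2.5)

g3 = 1.0 / (1.0 + x)
exact = np.log(2.0)
print("trapezoidal   :", abs(w @ g3 - exact) / exact)
print("moment-matched:", abs(p @ g3 - exact) / exact)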
4.3. Optimal portfolio problem. In this section we numerically solve the optimal portfolio problem briefly discussed in the introduction (see [34] for more details). Suppose that there are two assets, a stock and a bond, with gross returns R_1, R_2. Asset 1 (the stock) is stochastic and lognormally distributed: log R_1 ∼ N(µ, σ²), where µ is the expected return and σ is the volatility. Asset 2 (the bond) is risk-free and log R_2 = r, where r is the (continuously compounded) interest rate. The optimal portfolio θ is determined by the optimization

(4.7)    U = max_θ (1/(1 − γ)) E[(R_1 θ + R_2 (1 − θ))^{1−γ}],

where γ > 0 is the relative risk aversion coefficient. We set the parameters to γ = 3, µ = 0.07, σ = 0.2, and r = 0.01.

We numerically solve the optimal portfolio problem (4.7) in two ways, applying the trapezoidal formula and our proposed method. (We also tried Simpson's method, but it was similar to the trapezoidal method.) To approximate the lognormal distribution, let M = 2N + 1 be the number of grid points (N is the number of positive grid points) and D = {nh | n = 0, ±1, . . . , ±N}, where h = 1/√N is the grid size. Let p(x) be the approximating discrete distribution of N(0, 1) as in the previous subsection (trapezoidal or the proposed method with various moments). Then we put probability p(x) on the point e^{µ+σx} for each x ∈ D to obtain the approximate stock return R_1.

Table 4.1 shows the optimal portfolio θ and its relative error for various maximum moment orders L and numbers of points M = 2N + 1. The result is somewhat surprising. Even with 3 approximating points (N = 1), our proposed method derives an optimal portfolio that is off by only 0.5% from the true value, whereas the trapezoidal method is off by 127%. While the proposed method virtually obtains the true value with 9 points (N = 4, especially when the 4th moment is matched), the trapezoidal method still has 23% error.

Table 4.1
Optimal portfolio and relative error for the trapezoidal method and our method. N: number of positive grid points, M = 2N + 1: total number of grid points, L: maximum order of moments.

  # of grid points      L = 0 (trapezoidal)      L = 2                   L = 4
  N          M          θ        Error (%)       θ        Error (%)      θ        Error (%)
  1² = 1     3          1.5155   127             0.6717   0.54           --       --
  2² = 4     9          0.8246   23.4            0.6694   0.20           0.6680   -0.015
  3² = 9     19         0.6830   2.24            0.6684   0.044          0.6681   0
  4² = 16    33         0.6687   0.088           0.6682   0.015          0.6681   0
  5² = 25    51         0.6681   0               0.6681   0              0.6681   0

The reason why the trapezoidal method gives poor results when the number of approximating points is small is that the moments are not matched. To see this, taking the first-order condition for the optimal portfolio problem (4.7), we obtain

E[(θX + R_2)^{−γ} X] = 0,

where X = R_1 − R_2 is the excess return on the stock. Using the Taylor approximation x^{−γ} ≈ a^{−γ} − γ a^{−γ−1}(x − a) for x = θX + R_2 and a = E[θX + R_2] and solving for θ, after some algebra we get

θ = R_2 E[X] / (γ Var[X] − E[X]²).

Therefore the (approximate) optimal portfolio depends on the first and second moments of the excess return X. Our method is accurate precisely because we match the moments.
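The following sketch (ours, in Python rather than the authors' MATLAB) puts the pieces together for the portfolio problem (4.7): it discretizes the standard normal on the grid described above, optionally fine-tunes the probabilities to match the first two moments by solving the dual (D) (here with SciPy's BFGS routine for brevity), and maximizes the expected utility over θ with a bounded scalar optimizer. The parameter values follow the text; the search interval (0, 1.5) and all function names are our own choices.

import numpy as np
from scipy.optimize import minimize, minimize_scalar

gamma, mu, sigma, r = 3.0, 0.07, 0.2, 0.01
R2 = np.exp(r)

def discretize_normal(N, match_moments):
    h = 1 / np.sqrt(N)
    x = h * np.arange(-N, N + 1)
    w = np.full(x.size, h); w[[0, -1]] = h / 2
    q = w * np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # trapezoidal weights w_i f(x_i)
    if not match_moments:
        return x, q / q.sum()
    Tc = np.column_stack([x, x**2 - 1.0])            # T(x) - Tbar with Tbar = (0, 1)
    res = minimize(lambda lam: q @ np.exp(Tc @ lam), np.zeros(2),
                   jac=lambda lam: Tc.T @ (q * np.exp(Tc @ lam)), method="BFGS")
    p = q * np.exp(Tc @ res.x)
    return x, p / p.sum()                            # probabilities (2.5)

def optimal_theta(N, match_moments):
    x, p = discretize_normal(N, match_moments)
    R1 = np.exp(mu + sigma * x)                      # discretized gross stock return
    def neg_utility(theta):                          # minus the objective of (4.7)
        return -p @ (theta * R1 + (1 - theta) * R2) ** (1 - gamma) / (1 - gamma)
    return minimize_scalar(neg_utility, bounds=(0.0, 1.5), method="bounded").x

for N in (1, 4, 9):
    print(N, optimal_theta(N, False), optimal_theta(N, True))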
In complex economic problems, oftentimes we cannot afford to use many integration points, in which case our method might be useful to obtain accurate results. 5. Proofs. 5.1. Proof of Lemma 3. Since by assumption T̄ = 0, the objective function of the problem (D) becomes ( (5.1) log M ∑ i=1 ) ⟨λ,T (xi,M )⟩ wi,M f (xi,M )e . 16 KEN’ICHIRO TANAKA AND ALEXIS AKIRA TODA Hence the problem (D) is equivalent to the minimization of (5.2) JM (λ) = M ∑ wi,M f (xi,M )e⟨λ,T (xi,M )⟩ . i=1 It follows from Assumption 1 and Theorem 2 that the function JM is strictly convex and has the unique minimizer λ̄M . We use the function JM in (5.2) throughout the rest of this section. (b) It suffices to consider the case λ̄M ̸= 0. Using the definition of the error Eg,M in (3.6), we first note the estimate (b′ ) (b) Eg,M ≤ ∥g∥∞ EM , (5.3) where (b′ ) (5.4) EM = M ∑ wi,M f (xi,M ) e⟨λ̄M ,T (xi,M )⟩ − 1 . i=1 (b′ ) In the following, we give an estimate of EM . By the mean value theorem, for each i, there exists s∗i,M with 0 < s∗i,M < 1 such that ⟨ ⟩ ∗ e⟨λ̄M ,T (xi,M )⟩ − 1 = λ̄M , T (xi,M ) e⟨λ̄M ,T (xi,M )⟩si,M . (5.5) To simplify the notation, let ⟨ ⟩ ai,M = wi,M f (xi,M ) λ̄M , T (xi,M ) , { ⟨ ⟩ } I + = i | λ̄M , T (xi,M ) ≥ 0 , { ⟨ ⟩ } I − = i | λ̄M , T (xi,M ) < 0 . ∗ Using ai,M and smax M = max1≤i≤M si,M , it follows from (5.4) and (5.5) that (5.6) (b′ ) EM = ∑ ai,M e⟨λ̄M ,T (xi,M )⟩si,M − ∗ ≤ = ai,M e⟨λ̄M ,T (xi,M )⟩ smax M ∗ ∑ + |ai,M | i∈I + i∈I − M ∑ ∑( max ai,M e⟨λ̄M ,T (xi,M )⟩sM + M ∑ |ai,M | − ai,M e⟨λ̄M ,T (xi,M )⟩sM max ) i∈I − i=1 ≤ ai,M e⟨λ̄M ,T (xi,M )⟩si,M i∈I − i∈I + ∑ ∑ max ai,M e⟨λ̄M ,T (xi,M )⟩sM + 2 ∑ |ai,M | . i∈I − i=1 Noting the definition of JM in (5.2), we can rewrite the first term of the rightmost side in (5.6) as (5.7) M ∑ i=1 ai,M e⟨λ̄M ,T (xi,M )⟩sM = max M ∑ ⟨ ⟩ max wi,M f (xi,M ) λ̄M , T (xi,M ) e⟨λ̄M ,T (xi,M )⟩sM i=1 d = JM (sλ̄M ) . ds s=smax M MOMENT-PRESERVING APPROXIMATIONS OF DISTRIBUTIONS 17 By Assumption 1 and λ̄M ̸= 0, the univariate function s 7→ JM (sλ̄M ) is strictly convex and its minimizer is s = 1. Therefore, the derivative of JM ( · λ̄M ) is negative and monotone increasing on the interval [0, 1). Noting that 0 < smax M < 1, we have d d (5.8) 0> JM (sλ̄M ) > JM (sλ̄M ) ds ds s=smax s=0 M = M ∑ ⟨ ⟩ wi,M f (xi,M ) λ̄M , T (xi,M ) . i=1 Combining (5.7) and (5.8), by the Cauchy-Schwarz inequality we obtain M M ∑ ∑ max wi,M f (xi,M ) ∥T (xi,M )∥ . (5.9) ai,M e⟨λ̄M ,T (xi,M )⟩sM ≤ λ̄M i=1 i=1 As for the second term of the rightmost side in (5.6), we have ∑ (5.10) |ai,M | ≤ i∈I − M ∑ |ai,M | = i=1 M ∑ ⟨ ⟩ wi,M f (xi,M ) λ̄M , T (xi,M ) i=1 M ∑ ≤ λ̄M wi,M f (xi,M ) ∥T (xi,M )∥ . i=1 Consequently, it follows from (5.3), (5.6), (5.9), and (5.10) that (b) Eg,M (5.11) ≤ 3 ∥g∥∞ M ∑ λ̄M wi,M f (xi,M ) ∥T (xi,M )∥ ≤ 3 ∥g∥∞ ( λ̄M I∥T ∥ + E (a) i=1 ∥T ∥,M ) , which is the conclusion. 5.2. Proof of Lemma 4. Noting the definition of p(xi,M )’s in (2.5), wi,M ≥ 0, and f (x) ≥ 0, letting G = ∥g∥∞ we obtain (c) Eg,M M ∑ 1 λ̄M ,T (xi,M )⟩ ⟨ ≤ 1 − ∑M wi,M f (xi,M )e g(xi,M ) λ̄M ,T (xi,M )⟩ ⟨ i=1 i=1 wi,M f (xi,M )e M ∑ 1 wi,M f (xi,M )e⟨λ̄M ,T (xi,M )⟩ ≤ G 1 − ∑M λ̄M ,T (xi,M )⟩ ⟨ w f (x )e i,M i=1 i=1 i,M M ∑ = G wi,M f (xi,M )e⟨λ̄M ,T (xi,M )⟩ − 1 i=1 M M ∑ ∑ ( ) λ̄M ,T (xi,M )⟩ ⟨ ≤ G wi,M f (xi,M ) e − 1 + G wi,M f (xi,M ) − 1 , i=1 i=1 which can be reduced to Lemma 3 in the case g = 1. Thus we obtain the conclusion. 18 KEN’ICHIRO TANAKA AND ALEXIS AKIRA TODA 5.3. Proof of Lemma 5. We estimate λ̄M noting that λ̄M is the unique minimizer of JM (λ) = M ∑ wi,M f (xi,M )e⟨λ,T (xi,M )⟩ . 
i=1 First, let us show the inequality ez ≥ 1 + z + (5.12) 1 2 (max {0, min {z, a}}) 2 for any z, a ∈ R. To see this, note that 0, (z ≤ 0 or a ≤ 0) max {0, min {z, a}} = z, (0 ≤ z ≤ a) a, (0 ≤ a ≤ z) so (5.12) follows by ez ≥ 1 + z if z ≤ 0 and ez ≥ 1 + z + 12 z 2 if z ≥ 0. Let z = ⟨λ, T (xi,M )⟩ and a = ∥λ∥ α for α > 0 in (5.12) and multiply by wi,M f (xi,M ) ≥ 0. Letting λ∗ = λ/ ∥λ∥ and summing over i, we obtain JM (λ) ≥ AM + ⟨λ, BM ⟩ + CM,α (λ∗ ) ∥λ∥ , 2 (5.13) where AM = M ∑ wi,M f (xi,M ) ∈ R, i=1 BM = M ∑ wi,M f (xi,M )T (xi,M ) ∈ RL , i=1 1∑ 2 wi,M f (xi,M ) (max {0, min {⟨λ∗ , T (xi,M )⟩ , α}}) . 2 i=1 M CM,α (λ∗ ) = By Assumptions 1 and 2, for large enough α > 0 we have ∫ 1 2 ∗ CM,α (λ ) → f (x) (max {0, min {⟨λ∗ , T (x)⟩ , α}}) dx ≥ Cα > 0 2 as M → ∞, where Cα is defined by (3.14). Let Mε be such that CM,α ≥ Cα − ε > 0 whenever M ≥ Mε . Then it follows from JM (0) = AM , (5.13), and the CauchySchwarz inequality that λ̄M ∈ {λ | JM (λ) ≤ AM } { } 2 ⊂ λ AM + ⟨λ, BM ⟩ + CM,α (λ∗ ) ∥λ∥ ≤ AM { } 2 ⊂ λ AM − ∥BM ∥ ∥λ∥ + (Cα − ε) ∥λ∥ ≤ AM , so we obtain (5.14) λ̄M ≤ ∥BM ∥ . Cα − ε MOMENT-PRESERVING APPROXIMATIONS OF DISTRIBUTIONS But since by assumption ∫ RK 19 f (x)T (x) dx = 0, it follows from (5.14) that ∫ M ∑ (a) ∥BM ∥ = f (x)T (x) dx − wi,M f (xi,M )T (xi,M ) = ET,M → 0, RK i=1 so (3.15) holds. Acknowledgments. We thank Jonathan Borwein, seminar participants at the 33rd International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering (MaxEnt 2013), and two anonymous referees for comments and feedback that greatly improved the paper. KT was partially supported by JSPS KAKENHI Grant Number 24760064. REFERENCES [1] Rafail V. Abramov. An improved algorithm for the multidimensional moment-constrained maximum entropy problem. Journal of Computational Physics, 226(1):621–644, September 2007. [2] Rafail V. Abramov. The multidimensional maximum entropy moment problem: A review of numerical methods. Communications in Mathematical Sciences, 8(2):377–392, 2010. [3] Jérôme Adda and Russel W. Cooper. Dynamic Economics: Quantitative Methods and Applications. MIT Press, Cambridge, MA, 2003. [4] S. Rao Aiyagari. Uninsured idiosyncratic risk and aggregate saving. Quarterly Journal of Economics, 109(3):659–684, 1994. [5] Graham W. Alldredge, Cory D. Hauck, Dianne P. O’Leary, and André L. Tits. Adaptive change of basis in entropy-based moment closures for linear kinetic equations. Journal of Computational Physics, 258(1):489–508, February 2014. [6] Andrew R. Barron and Chyong-Hwa Sheu. Approximation of density functions by sequences of exponential families. Annals of Statistics, 19(3):1347–1369, 1991. [7] Jonathan M. Borwein and Adrian S. Lewis. Duality relationships for entropy-like minimization problems. SIAM Journal on Control and Optimization, 29(2):325–338, March 1991. [8] Jonathan M. Borwein and Adrian S. Lewis. Convex Analysis and Nonlinear Optimization: Theory and Examples. Canadian Mathematical Society Books in Mathematics. Springer, New York, 2nd edition, 2006. [9] Peter W. Buchen and Michael Kelly. The maximum entropy distribution of an asset inferred from option prices. Journal of Financial and Quantitative Analysis, 31(1):143–159, 1996. [10] Marko Budišić and Mihai Putinar. Conditioning moments of singular measures for entropy optimization. I. Indagationes Mathematicae, 23(4):848–883, December 2012. [11] Ariel Caticha and Adom Giffin. Updating probabilities. 
In Ali Mohammad-Djafari, editor, Bayesian Inference and Maximum Entropy Methods in Science and Engineering, volume 872 of AIP Conference Proceedings, pages 31–42, 2006. [12] Imre Csiszár. Sanov property, generalized I-projection and a conditional limit theorem. Annals of Probability, 12(3):768–793, August 1984. [13] Philip J. Davis and Philip Rabinowitz. Methods of Numerical Integration. Academic Press, Orlando, FL, second edition, 1984. [14] Andrée Decarreau, Danielle Hilhorst, Claude Lemaréchal, and Jorge Navaza. Dual methods in entropy maximization. Application to some problems in crystallography. SIAM Journal on Optimization, 2(2):173–197, 1992. [15] Eric A. DeVuyst and Paul V. Preckel. Gaussian cubature: A practitioner’s guide. Mathematical and Computer Modelling, 45(7-8):787–794, April 2007. [16] Duncan K. Foley. A statistical equilibrium theory of markets. Journal of Economic Theory, 62(2):321–345, April 1994. [17] Cory D. Hauck, C. David Levermore, and André L. Tits. Convex duality and entropy-based moment closures: Characterizing degenerate densities. SIAM Journal on Control and Optimization, 47(4):1977–2015, 2008. [18] Mark Huggett. The risk-free rate in heterogeneous-agent incomplete-insurance economies. Journal of Economic Dynamics and Control, 17(5-6):953–969, September-November 1993. [19] Edwin T. Jaynes. Information theory and statistical mechanics. Physical Review, 106(4):620– 630, May 1957. 20 KEN’ICHIRO TANAKA AND ALEXIS AKIRA TODA [20] Edwin T. Jaynes. On the rationale of maximum-entropy methods. Proceedings of the IEEE, 70(9):939–952, 1982. [21] Edwin T. Jaynes. Probability Theory: The Logic of Science. Cambridge University Press, Cambridge, U.K., 2003. Edited by G. Larry Bretthorst. [22] Michael Junk. Maximum entropy for reduced moment problems. Mathematical Models and Methods in Applied Sciences, 10(7):1001–1025, 2000. [23] Yuichi Kitamura and Michael Stutzer. An information-theoretic alternative to generalized method of moments estimation. Econometrica, 65(4):861–874, 1997. [24] Kevin H. Knuth and John Skilling. Foundations of inference. Axioms, 1(1):38–73, 2012. [25] Per Krusell and Anthony A. Smith, Jr. Income and wealth heterogeneity in the macroeconomy. Journal of Political Economy, 106(5):867–896, October 1998. [26] David G. Luenberger and Yinyu Ye. Linear and Nonlinear Programming. International Series in Operations Research and Management Science. Springer, NY, third edition, 2008. [27] Lawrence R. Mead and Nikos Papanicolaou. Maximum entropy in the problem of moments. Journal of Mathematical Physics, 25(8):2404–2417, August 1984. [28] Robert C. Merton. Optimum consumption and portfolio rules in a continuous-time model. Journal of Economic Theory, 3(4):373–413, December 1971. [29] Allen C. Miller, III and Thomas R. Rice. Discrete approximations of probability distributions. Management Science, 29(3):352–362, March 1983. [30] Vincent Pavan. General entropic approximations for canonical systems described by kinetic equations. Journal of Statistical Physics, 142(4):792–827, 2011. [31] Paul A. Samuelson. Lifetime portfolio selection by dynamic stochastic programming. Review of Economics and Statistics, 51(3):239–246, August 1969. [32] Jacques Schneider. Entropic approximation in kinetic theory. ESAIM: Mathematical Modelling and Numerical Analysis, 38(03):541–561, May 2004. [33] John E. Shore and Rodney W. Johnson. Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. 
IEEE Transactions on Information Theory, IT-26(1):26–37, January 1980. [34] Ken’ichiro Tanaka and Alexis Akira Toda. Discrete approximations of continuous distributions by maximum entropy. Economics Letters, 118(3):445–450, March 2013. [35] George Tauchen. Finite state Markov-chain approximations to univariate and vector autoregressions. Economics Letters, 20(2):177–181, 1986. [36] Alexis Akira Toda. Existence of a statistical equilibrium for an economy with endogenous offer sets. Economic Theory, 45(3):379–415, 2010. [37] Jan M. Van Campenhout and Thomas M. Cover. Maximum entropy and conditional probability. IEEE Transactions on Information Theory, IT-27(4):483–489, July 1981. [38] Ximing Wu. Calculation of maximum entropy densities with application to income distribution. Journal of Econometrics, 115(2):347–354, August 2003.