Duality for Entropy Optimization and Its Applications

Xingsi Li, Shaohua Pan
Department of Engineering Mechanics, Dalian University of Technology, Dalian 116024, P.R. China

Abstract

In this paper we present the dual formulations of two entropy optimization principles, Jaynes' maximum entropy principle and Kullback-Leibler's minimum cross-entropy principle, together with some applications to the development of efficient algorithms for various optimization problems, including minimax, complementarity and nonlinear programming. Our presentation consists of three parts: dual formulations of entropy optimization, a smoothing technique for the min-max problem with applications to optimization problems, and Lagrangian perturbations.

(This work is supported by the Special Fund for Basic Research, G1999032805.)

1. Dual Formulations of Entropy Optimization

Entropy optimization principles were developed to establish inference criteria for predicting probabilities from incomplete information. The maximum entropy principle claims: "in making inference on the basis of partial information we must use that probability distribution which has maximum entropy subject to whatever is known. This is the only unbiased assignment we can make." Mathematically, it is stated as the following optimization problem (E1):

$\max\ S(p) := -\sum_{i=1}^{n} p_i \ln p_i$
$\text{s.t.}\ \sum_{i=1}^{n} p_i f_{ji} = E[f_j], \quad j = 1, 2, \ldots, m$
$\qquad\ \sum_{i=1}^{n} p_i = 1$    (1)
$\qquad\ p_i \ge 0, \quad i = 1, 2, \ldots, n$

where the vector $p$ stands for the probability distribution to be assigned, $E[f_j]$ denotes the $j$-th moment known from some probabilistic experiments, and $S(p)$ is the Shannon entropy measure. It is easily verified that problem (E1) is a convex program and has an unconstrained dual program of the form (DE1):

$\min\ D(\lambda) := \ln \sum_{i=1}^{n} \exp\Big( \sum_{j=1}^{m} \lambda_j f_{ji} \Big) - \sum_{j=1}^{m} \lambda_j E[f_j]$    (2)

where $\lambda$ is the vector of Lagrange multipliers.

If one has a prior probability $q = (q_1, \ldots, q_n)$ in addition to the moment constraints in (E1), the probability $p$ should be assigned according to the minimum cross-entropy principle. Mathematically, this leads to the following entropy optimization problem (E2):

$\min\ D(p, q) := \sum_{i=1}^{n} p_i \ln (p_i / q_i)$
$\text{s.t.}\ \sum_{i=1}^{n} p_i f_{ji} = E[f_j], \quad j = 1, 2, \ldots, m$
$\qquad\ \sum_{i=1}^{n} p_i = 1$    (3)
$\qquad\ p_i \ge 0, \quad i = 1, 2, \ldots, n$

where $D(p, q)$ stands for the Kullback-Leibler cross-entropy, or relative entropy. Problem (E2) is also convex in $p$ and has an unconstrained dual program (DE2):

$\max\ D_q(\lambda) := -\ln \sum_{i=1}^{n} q_i \exp\Big( \sum_{j=1}^{m} \lambda_j f_{ji} \Big) + \sum_{j=1}^{m} \lambda_j E[f_j]$    (4)

where the prior probability $q$ enters only as a parameter vector.

Suppose that no information (no moment constraints) is available; then problem (E1) produces $p_i = 1/n$ and (E2) gives $p = q$. This means that the maximum entropy principle chooses the probability $p$ as close as possible to the uniform distribution, while the minimum cross-entropy principle chooses $p$ as close as possible to the prior probability $q$, subject to the given information.

The unconstrained nature of the dual programs not only makes it possible to solve entropy optimization problems by unconstrained optimization algorithms, but also lends itself to various applications. In developing our optimization algorithms, we exploit this feature and artificially construct entropy optimization problems.
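To make the unconstrained dual concrete, the following sketch minimizes $D(\lambda)$ of (DE1) numerically for a small moment problem and recovers the maximum entropy distribution $p_i \propto \exp(\sum_j \lambda_j f_{ji})$ from the optimal multipliers. The data matrix `F`, the moments `Ef` and the use of SciPy's BFGS routine are illustrative choices, not part of the paper.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative data (not from the paper): n = 4 states, m = 2 moment constraints.
F = np.array([[1.0, 2.0, 3.0, 4.0],    # f_{1i}: values of f_1 at the n states
              [1.0, 0.0, 1.0, 0.0]])   # f_{2i}: values of f_2 at the n states
Ef = np.array([2.5, 0.4])              # prescribed moments E[f_j]

def dual(lam):
    # D(lambda) = ln sum_i exp(sum_j lambda_j f_{ji}) - sum_j lambda_j E[f_j]   (DE1)
    z = F.T @ lam                      # z_i = sum_j lambda_j f_{ji}
    zmax = z.max()                     # shift for numerical stability
    return zmax + np.log(np.exp(z - zmax).sum()) - lam @ Ef

lam = minimize(dual, x0=np.zeros(2), method="BFGS").x

# Recover the maximum entropy distribution: p_i proportional to exp(sum_j lambda_j f_{ji}).
z = F.T @ lam
p = np.exp(z - z.max())
p /= p.sum()
print("multipliers:", lam)
print("p:", p, "reproduced moments:", F @ p)   # F @ p should match Ef
```

Since (DE1) is smooth and convex in $\lambda$, any standard unconstrained method suffices; the cross-entropy dual (DE2) can be handled in the same way by weighting each exponential term with $q_i$.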
2. Smoothing Technique for Min-Max Problem

The finite min-max problem is usually expressed as (MMP):

$\min_{x}\ \Phi(x) := \max_{1 \le i \le m} \{ g_1(x), g_2(x), \ldots, g_m(x) \}$    (5)

This is a typical non-smooth optimization problem because of the non-differentiability of the objective (max) function $\Phi(x)$. Many algorithms have been devised for this problem owing to the special role it plays in numerical analysis and optimization. They either transform the original problem (MMP) into an equivalent nonlinear program or seek a smooth approximation to the non-differentiable $\Phi(x)$. Our methodology belongs to the latter class, and the smooth functions are derived from a continuous estimation of the Lagrange multipliers.

For problem (MMP), the Lagrangian function has the form

$L(x, \lambda) := \sum_{i=1}^{m} \lambda_i g_i(x)$    (6)

where $\lambda \in \Lambda := \{ \lambda \in R^m \mid \sum_{i=1}^{m} \lambda_i = 1,\ \lambda_i \ge 0,\ i = 1, 2, \ldots, m \}$. Based on our interpretation that each Lagrange multiplier $\lambda_i$ represents the probability of the corresponding component function $g_i$ attaining the maximum at $x$, we introduce the Shannon entropy and the Kullback-Leibler cross-entropy, respectively, into the Lagrangian $L(x, \lambda)$ and construct the following entropy optimization problems (PE1) and (PE2):

$\max_{\lambda \in \Lambda}\ L_p(x, \lambda) := \sum_{i=1}^{m} \lambda_i g_i(x) - p^{-1} \sum_{i=1}^{m} \lambda_i \ln \lambda_i$    (7)

and

$\max_{\lambda \in \Lambda}\ L_p(x, \lambda, \mu) := \sum_{i=1}^{m} \lambda_i g_i(x) - p^{-1} \sum_{i=1}^{m} \lambda_i \ln (\lambda_i / \mu_i)$    (8)

where $\mu \in \operatorname{int} \Lambda$ denotes the Lagrange multiplier vector obtained from the last iteration. It is easily shown that these entropy optimization problems can be solved analytically, and the original problem (MMP) is thereby transformed into the following smooth unconstrained optimization problems:

$\min_{x}\ \Phi_p(x) := p^{-1} \ln \sum_{i=1}^{m} \exp\big( p\, g_i(x) \big)$    (9)

$\min_{x}\ \Phi_p(x, \mu) := p^{-1} \ln \sum_{i=1}^{m} \mu_i \exp\big( p\, g_i(x) \big)$    (10)

It can be proven that $\Phi_p(x)$ and $\Phi_p(x, \mu)$ uniformly approximate the maximum function $\Phi(x)$ from above and from below, respectively; that is, $\Phi_p(x, \mu) \le \Phi(x) \le \Phi_p(x)$. Furthermore, the smooth function $\Phi_p(x)$ obeys the error bound $0 \le \Phi_p(x) - \Phi(x) \le \ln m / p$.
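These bounds are easy to verify numerically. The sketch below implements the aggregate function $\Phi_p(x)$ of (9) in a numerically stable way and checks that $0 \le \Phi_p(x) - \Phi(x) \le \ln m / p$ at a few sample points; the component functions $g_i$ are made up for the demonstration and are not taken from the paper.

```python
import numpy as np

def phi_p(g_vals, p):
    # Aggregate (smoothed max) function of Eq. (9):
    # Phi_p(x) = (1/p) * ln( sum_i exp(p * g_i(x)) ), evaluated with a max-shift for stability.
    g_vals = np.asarray(g_vals)
    gmax = g_vals.max()
    return gmax + np.log(np.exp(p * (g_vals - gmax)).sum()) / p

# Illustrative component functions g_i(x) on R (not from the paper).
g = [lambda x: x**2 - 1.0,
     lambda x: -x + 0.5,
     lambda x: np.sin(3.0 * x)]
m, p = len(g), 50.0

for x in np.linspace(-2.0, 2.0, 9):
    vals = [gi(x) for gi in g]
    exact = max(vals)          # Phi(x) = max_i g_i(x)
    smooth = phi_p(vals, p)    # Phi_p(x), approximates Phi(x) from above
    assert 0.0 <= smooth - exact <= np.log(m) / p + 1e-12
    print(f"x={x:+.2f}  Phi={exact:+.4f}  Phi_p={smooth:+.4f}")
```

Larger $p$ tightens the approximation (the gap is at most $\ln m / p$) but makes the exponentials numerically stiffer, which is why the evaluation above shifts by the largest component before exponentiating.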
2-1. Nonlinear Programming (NLP):

$\min\ f(x)$
$\text{s.t.}\ g_i(x) \le 0, \quad i = 1, 2, \ldots, m$    (11)

The inequality constraints present the main difficulty in the solution of (NLP). However, the original problem is equivalent to the following singly-constrained one:

$\min\ f(x)$
$\text{s.t.}\ \Phi(x) := \max_{1 \le i \le m} g_i(x) \le 0$    (12)

The non-smooth constraint can be replaced by the smooth function $\Phi_p(x)$, and an optimal solution of the (NLP) problem can be found by solving the following problem:

$\min\ f(x)$
$\text{s.t.}\ \Phi_p(x) \le 0$    (13)

Similarly, the smooth function $\Phi_p(x)$ can be applied to the non-smooth $L_1$ and $L_\infty$ exact penalty functions

$\Phi_1(x) = f(x) + \sum_{i=1}^{m} \max\{ 0, g_i(x) \}$,
$\Phi_\infty(x) = f(x) + \max\{ 0, g_1(x), \ldots, g_m(x) \}$,

to smooth the max-type functions.

2-2. Complementarity Problem:

Consider the following vertical complementarity problem (VNCP):

$x \ge 0, \quad F_1(x) \ge 0, \ \ldots, \ F_m(x) \ge 0, \quad x_i \prod_{j=1}^{m} F_{ji}(x) = 0, \quad i = 1, \ldots, n$    (14)

where $F_j : R^n \to R^n$, $1 \le j \le m$, are vector-valued functions and $F_{ji}(x)$ denotes the $i$-th component of $F_j(x)$. The problem (VNCP) is equivalent to the following system of non-smooth equations:

$\min\{ x_i, F_{1i}(x), \ldots, F_{mi}(x) \} = -\max\{ -x_i, -F_{1i}(x), \ldots, -F_{mi}(x) \} = 0, \quad i = 1, \ldots, n$    (15)

Again, one can replace the maximum operations above by the smoothing approximation $\Phi_p$ in a proper form. In the special case $m = 1$, the problem (VNCP) reduces to the nonlinear complementarity problem (NCP)

$x \ge 0, \quad F_1(x) \ge 0, \quad x^{\mathrm T} F_1(x) = 0,$

and Eq. (15) then reduces to

$\min\{ x_i, F_{1i}(x) \} = -\max\{ -x_i, -F_{1i}(x) \} = 0, \quad i = 1, \ldots, n.$

2-3. Box Constrained Variational Inequality Problem (BVIP):

This problem is to find an $x \in [l, u]$ such that

$(y - x)^{\mathrm T} F(x) \ge 0, \quad \forall\, y \in [l, u]$    (16)

where $[l, u]$ is a box in $R^n$ with $l \le u$. It is easy to see that the problem (BVIP) is equivalent to the system of equations

$x - \operatorname{mid}\{ l, u, x - F(x) \} = \operatorname{mid}\{ x - l, x - u, F(x) \} = 0$    (17)

where the componentwise mid operator $\operatorname{mid}\{a, b, c\}$ can be represented by

$\operatorname{mid}\{a, b, c\} = a + b + c - \min\{a, b, c\} - \max\{a, b, c\}$    (18)

Once again, the max and min operators can be replaced by the smoothing approximation $\Phi_p$ in proper forms.

2-4. Global Optimization

The smooth approximation $\Phi_p(x)$ can be generalized to the infinite (continuous) case, i.e.,

$\sup_{x \in X} f(x) \approx p^{-1} \ln \int_{X} \exp[ p f(x) ]\, dx$    (19)

which provides a framework for devising global optimization algorithms. In particular, we can apply (19) to the above variational inequality problem (BVIP) and obtain a regularized gap function as follows. For (BVIP), Auslender defined the gap function

$g(x) = \sup_{y \in X} F(x)^{\mathrm T} (x - y)$    (20)

Due to the non-smoothness of $g(x)$, Fukushima defined a regularized gap function of the form

$g_\alpha(x) = \sup_{y \in X} \Big\{ F(x)^{\mathrm T} (x - y) - \frac{\alpha}{2} \| x - y \|^2 \Big\}, \quad \alpha > 0$    (21)

By applying Eq. (19) directly to (20), we obtain a new regularized gap function:

$g_p(x) = p^{-1} \ln \int_{y \in X} \exp\big[ p\, F(x)^{\mathrm T} (x - y) \big]\, dy$    (22)

For $X = [l, u]$, the integrand separates coordinatewise into exponentials of linear functions, so the above integration can be easily calculated.

3. Lagrangian Perturbations

The Lagrangian function has played an important role in both the theoretical and algorithmic development of optimization. For the NLP problem (11), the Lagrangian function takes the form

$L(x, \lambda) := f(x) + \sum_{i=1}^{m} \lambda_i g_i(x)$    (23)

The weak duality theorem can be stated as

$\min_{x} \max_{\lambda \ge 0} L(x, \lambda) \ \ge\ \max_{\lambda \ge 0} \min_{x} L(x, \lambda)$    (24)

which offers two routes to solving the original problem (11). Usually one starts from the right-hand side of the above inequality; that is, the minimization of $L(x, \lambda)$ in the $x$-space is performed for a given $\lambda$ and repeated with updated $\lambda$ until convergence. Such dual algorithms are effective only for certain structured problems. We make our contribution from the left-hand side of (24); that is,

$\min_{x} \max_{\lambda \ge 0}\ L(x, \lambda) := f(x) + \sum_{i=1}^{m} \lambda_i g_i(x)$    (25)

It is well known that the maximization of $L(x, \lambda)$ in the $\lambda$-space for a given $x$ is difficult because $L(x, \lambda)$ is linear in $\lambda$. The Lagrangian perturbation is a special regularization technique through which the Lagrange multipliers can be estimated in terms of the primal variables. In this paper we employ the Shannon entropy and the Kullback-Leibler cross-entropy as our perturbing functions, respectively; that is, we solve

$\max_{\lambda \ge 0}\ L_p(x, \lambda) := f(x) + \sum_{i=1}^{m} \lambda_i g_i(x) - p^{-1} \sum_{i=1}^{m} \lambda_i \ln \lambda_i$    (26)

and

$\max_{\lambda \ge 0}\ L_p(x, \lambda, \mu) := f(x) + \sum_{i=1}^{m} \lambda_i g_i(x) - p^{-1} \sum_{i=1}^{m} \lambda_i \ln( \lambda_i / \mu_i )$    (27)

where $p > 0$ is a controlling parameter and $\mu > 0$ denotes the last estimate of $\lambda$. The entropy functions are chosen because they are convex and bounded below for $\lambda \ge 0$, and because the regularized maximization problems (26) and (27) can be solved analytically. Substituting their solutions to eliminate $\lambda$ from the perturbed Lagrangians, we obtain

$L_p(x, \lambda(x)) = f(x) + p^{-1} \sum_{i=1}^{m} \exp[ p\, g_i(x) - 1 ]$    (28)

$L_p(x, \lambda(x), \mu) = f(x) + p^{-1} \sum_{i=1}^{m} \mu_i \exp[ p\, g_i(x) - 1 ]$    (29)

It should be recognized that (28) and (29) are exponential penalty functions without and with Lagrange multipliers, respectively.
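As an illustration of how the penalty function (29) can drive a simple method, the sketch below minimizes $L_p(x, \mu)$ for a small hypothetical constrained problem and then re-estimates the multipliers from the analytic maximizer of (27), $\lambda_i = \mu_i \exp[p\, g_i(x) - 1]$. The test problem, the multiplier update loop and the schedule for $p$ are illustrative assumptions, not an algorithm prescribed by the paper.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative problem (not from the paper):
#   min  f(x) = (x1 - 2)^2 + (x2 - 1)^2
#   s.t. g1(x) = x1 + x2 - 2 <= 0,   g2(x) = -x1 <= 0
f  = lambda x: (x[0] - 2.0) ** 2 + (x[1] - 1.0) ** 2
gs = [lambda x: x[0] + x[1] - 2.0, lambda x: -x[0]]

def L_p(x, mu, p):
    # Exponential penalty with multiplier estimates, Eq. (29):
    # L_p(x, mu) = f(x) + (1/p) * sum_i mu_i * exp(p * g_i(x) - 1)
    return f(x) + sum(m_i * np.exp(p * g(x) - 1.0) for m_i, g in zip(mu, gs)) / p

x, mu, p = np.array([0.0, 0.0]), np.ones(len(gs)), 5.0

for _ in range(20):
    x = minimize(lambda v: L_p(v, mu, p), x, method="BFGS").x
    # The analytic maximizer of (27) is lambda_i = mu_i * exp(p * g_i(x) - 1);
    # take it as the next multiplier estimate (one natural choice, assumed here).
    mu = np.array([m_i * np.exp(p * g(x) - 1.0) for m_i, g in zip(mu, gs)])
    p = min(1.5 * p, 100.0)    # illustrative schedule; cap p to avoid overflow

print("x ~", x, " multipliers ~", mu)  # KKT point of the toy problem is x = (1.5, 0.5)
```

With the cap on $p$, the constraint violation at the final iterate is of order $1/p$; the inactive constraint's multiplier estimate decays toward zero, as expected.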
By using entropy perturbations we reveal a link between traditional optimization methods and entropy regularization techniques. In fact, the entropy functions may be replaced by other convex perturbing functions to derive further penalty functions. From the above derivations one should note that the estimation of the Lagrange multipliers has been embedded in the derived penalty functions. All of these discussions reflect the important role played by the duality of entropy optimization in mathematical programming. Of course, since entropy optimization itself originates from many different fields, the potential of this duality should not be limited to the applications presented here.

References

1. E. T. Jaynes (1957): "Information Theory and Statistical Mechanics", Physical Review, 106, 620-630.
2. S. Kullback and R. A. Leibler (1951): "On Information and Sufficiency", Annals of Mathematical Statistics, 22, 79-86.
3. A. B. Templeman and Li Xingsi (1985): "Entropy Duals", Engineering Optimization, 9, 107-119.
4. Li Xingsi (1991): "An Aggregate Function Method for Non-linear Programming", Science in China (Series A), 34, 1467-1473.
5. Li Xingsi (1992): "An Entropy-based Aggregate Method for Minimax Optimization", Engineering Optimization, 18, 277-285.
6. Li Xingsi (1994): "An Efficient Approach to a Class of Non-smooth Optimization Problems", Science in China (Series A), 37, 323-330.
7. Li Xingsi and Fang Shu-Cherng (1997): "On the Entropic Regularization Method for Solving Min-Max Problems with Applications", Mathematical Methods of Operations Research, 46, 119-130.