Journal of Inequalities in Pure and Applied Mathematics
http://jipam.vu.edu.au/
Volume 2, Issue 1, Article 11, 2001

ON SOME APPLICATIONS OF THE AG INEQUALITY IN INFORMATION THEORY

B. MOND AND J. PEČARIĆ

School of Mathematics, La Trobe University, Bundoora 3086, Victoria, Australia. b.mond@latrobe.edu.au

Faculty of Textile Technology, University of Zagreb, Zagreb, Croatia. pecaric@hazu.hr

Received 3 November, 2000; accepted 11 January, 2001. Communicated by A. Lupaş.

ABSTRACT. Recently, S.S. Dragomir used the concavity property of the log mapping and the weighted arithmetic mean-geometric mean inequality to develop new inequalities that were then applied to Information Theory. Here we extend these inequalities and their applications.

Key words and phrases: Arithmetic-Geometric Mean, Kullback-Leibler Distances, Shannon's Entropy.

2000 Mathematics Subject Classification. 26D15.

ISSN (electronic): 1443-5756. © 2001 Victoria University. All rights reserved.

1. Introduction

One of the most important inequalities is the arithmetic-geometric means inequality:

Let $a_i, p_i$, $i = 1, \dots, n$, be positive numbers and $P_n = \sum_{i=1}^{n} p_i$. Then

(1.1)
\[
\prod_{i=1}^{n} a_i^{p_i/P_n} \le \frac{1}{P_n} \sum_{i=1}^{n} p_i a_i,
\]

with equality iff $a_1 = \cdots = a_n$.

It is well known that using (1.1) we can prove the following generalization of another well-known inequality, namely Hölder's inequality:

Let $p_{ij}, q_i$ ($i = 1, \dots, m$; $j = 1, \dots, n$) be positive numbers with $Q_m = \sum_{i=1}^{m} q_i$. Then

(1.2)
\[
\sum_{j=1}^{n} \prod_{i=1}^{m} (p_{ij})^{q_i/Q_m} \le \prod_{i=1}^{m} \left( \sum_{j=1}^{n} p_{ij} \right)^{q_i/Q_m}.
\]

In this note, we show that using (1.1) we can improve some recent results which have applications in information theory.

2. An Inequality of I.A. Abou-Tair and W.T. Sulaiman

The main result from [1] is: Let $p_{ij}, q_i$ ($i = 1, \dots, m$; $j = 1, \dots, n$) be positive numbers. Then

(2.1)
\[
\sum_{j=1}^{n} \prod_{i=1}^{m} (p_{ij})^{q_i/Q_m} \le \frac{1}{Q_m} \sum_{i=1}^{m} \sum_{j=1}^{n} p_{ij} q_i.
\]

Moreover, setting $n = m$, $p_i = q_i$, $a_i = \sum_{j=1}^{n} p_{ij}$ in (1.1), we have

(2.2)
\[
\prod_{i=1}^{m} \left( \sum_{j=1}^{n} p_{ij} \right)^{q_i/Q_m} \le \frac{1}{Q_m} \sum_{i=1}^{m} \left( \sum_{j=1}^{n} p_{ij} \right) q_i.
\]

Now (1.2) and (2.2) give

(2.3)
\[
\sum_{j=1}^{n} \prod_{i=1}^{m} (p_{ij})^{q_i/Q_m}
\le \prod_{i=1}^{m} \left( \sum_{j=1}^{n} p_{ij} \right)^{q_i/Q_m}
\le \frac{1}{Q_m} \sum_{i=1}^{m} \sum_{j=1}^{n} p_{ij} q_i,
\]

which is an interpolation of (2.1).

Moreover, the generalized Hölder inequality was obtained in [1] as a consequence of (2.1). This is not surprising, since (2.1), for $n = 1$, becomes

\[
\prod_{i=1}^{m} (p_{i1})^{q_i/Q_m} \le \frac{1}{Q_m} \sum_{i=1}^{m} p_{i1} q_i,
\]

which is, in fact, the A-G inequality (1.1) (set $m = n$, $p_{i1} = a_i$ and $q_i = p_i$).

Theorem 3.1 in [1] is the well-known Shannon inequality: Given $\sum_{i=1}^{n} a_i = a$ and $\sum_{i=1}^{n} b_i = b$, then

\[
a \ln \frac{a}{b} \le \sum_{i=1}^{n} a_i \ln \frac{a_i}{b_i}; \qquad a_i, b_i > 0.
\]

It was obtained from (2.1) through the special case

(2.4)
\[
\prod_{i=1}^{n} \left( \frac{b_i}{a_i} \right)^{a_i/a} \le \frac{b}{a}.
\]

Let us note that (2.4) is again a direct consequence of the A-G inequality: indeed, setting $a_i \to b_i/a_i$, $p_i \to a_i$, $i = 1, \dots, n$, in (1.1) gives (2.4).

Theorem 3.2 from [1] is Rényi's inequality: Given $\sum_{i=1}^{m} a_i = a$ and $\sum_{i=1}^{m} b_i = b$, then for $\alpha > 0$, $\alpha \ne 1$,

\[
\frac{1}{\alpha - 1} \left( a^{\alpha} b^{1-\alpha} - a \right) \le \frac{1}{\alpha - 1} \sum_{i=1}^{m} \left( a_i^{\alpha} b_i^{1-\alpha} - a_i \right); \qquad a_i, b_i \ge 0.
\]

In fact, in the proof given in [1], it was shown that Hölder's inequality is a consequence of (2.1). As we have noted, Hölder's inequality is also a consequence of the A-G inequality.
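For a reader who wishes to verify the ordering in (2.3) on concrete data, the following short numerical sketch is offered purely as an illustration (it is not part of the results above); it assumes Python with NumPy, and the variable names are ours. It evaluates the three expressions of (2.3) for randomly generated positive $p_{ij}$ and $q_i$.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 6
pij = rng.uniform(0.1, 5.0, size=(m, n))   # positive numbers p_{ij}
q = rng.uniform(0.1, 3.0, size=m)          # positive weights q_i
Qm = q.sum()

# Left-hand side of (2.3): sum_j prod_i p_{ij}^{q_i/Q_m}
lhs = np.sum(np.prod(pij ** (q[:, None] / Qm), axis=0))

# Middle term: prod_i (sum_j p_{ij})^{q_i/Q_m}, i.e. the Hölder bound (1.2)
mid = np.prod(pij.sum(axis=1) ** (q / Qm))

# Right-hand side: (1/Q_m) sum_i sum_j p_{ij} q_i, i.e. the bound (2.1)
rhs = np.sum(pij.sum(axis=1) * q) / Qm

print(lhs <= mid <= rhs)   # expected: True
print(lhs, mid, rhs)
\end{verbatim}

Any choice of positive data should reproduce the ordering, up to floating-point rounding.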
3. On Some Inequalities of S.S. Dragomir

The following theorems were proved in [2]:

Theorem 3.1. Let $a_i \in (0,1)$ and $b_i > 0$ ($i = 1, \dots, n$). If $p_i > 0$ ($i = 1, \dots, n$) is such that $\sum_{i=1}^{n} p_i = 1$, then

(3.1)
\[
\exp\left[ \sum_{i=1}^{n} p_i \frac{a_i^2}{b_i} - \sum_{i=1}^{n} p_i a_i \right]
\ge \exp\left[ \sum_{i=1}^{n} p_i \left( \frac{a_i}{b_i} \right)^{a_i} - 1 \right]
\ge \prod_{i=1}^{n} \left( \frac{a_i}{b_i} \right)^{p_i a_i}
\ge \exp\left[ 1 - \sum_{i=1}^{n} p_i \left( \frac{b_i}{a_i} \right)^{a_i} \right]
\ge \exp\left[ \sum_{i=1}^{n} p_i a_i - \sum_{i=1}^{n} p_i b_i \right],
\]

with equality iff $a_i = b_i$ for all $i \in \{1, \dots, n\}$.

Theorem 3.2. Let $a_i \in (0,1)$ ($i = 1, \dots, n$) and $b_j > 0$ ($j = 1, \dots, m$). If $p_i > 0$ ($i = 1, \dots, n$) is such that $\sum_{i=1}^{n} p_i = 1$ and $q_j > 0$ ($j = 1, \dots, m$) is such that $\sum_{j=1}^{m} q_j = 1$, then we have the inequality

(3.2)
\[
\exp\left[ \sum_{i=1}^{n} p_i a_i^2 \sum_{j=1}^{m} \frac{q_j}{b_j} - \sum_{i=1}^{n} p_i a_i \right]
\ge \exp\left[ \sum_{i=1}^{n} \sum_{j=1}^{m} p_i q_j \left( \frac{a_i}{b_j} \right)^{a_i} - 1 \right]
\ge \frac{\prod_{i=1}^{n} a_i^{a_i p_i}}{\prod_{j=1}^{m} b_j^{\,q_j \sum_{i=1}^{n} p_i a_i}}
\ge \exp\left[ 1 - \sum_{i=1}^{n} \sum_{j=1}^{m} p_i q_j \left( \frac{b_j}{a_i} \right)^{a_i} \right]
\ge \exp\left[ \sum_{i=1}^{n} p_i a_i - \sum_{j=1}^{m} q_j b_j \right].
\]

The equality holds in (3.2) iff $a_1 = \cdots = a_n = b_1 = \cdots = b_m$.

First we give an improvement of the second and third inequalities in (3.1).

Theorem 3.3. Let $a_i, b_i$ and $p_i$ ($i = 1, \dots, n$) be positive real numbers with $\sum_{i=1}^{n} p_i = 1$. Then

(3.3)
\[
\exp\left[ \sum_{i=1}^{n} p_i \left( \frac{a_i}{b_i} \right)^{a_i} - 1 \right]
\ge \sum_{i=1}^{n} p_i \left( \frac{a_i}{b_i} \right)^{a_i}
\ge \prod_{i=1}^{n} \left( \frac{a_i}{b_i} \right)^{p_i a_i}
\ge \left[ \sum_{i=1}^{n} p_i \left( \frac{b_i}{a_i} \right)^{a_i} \right]^{-1}
\ge \exp\left[ 1 - \sum_{i=1}^{n} p_i \left( \frac{b_i}{a_i} \right)^{a_i} \right],
\]

with equality iff $a_i = b_i$, $i = 1, \dots, n$.

Proof. The first inequality in (3.3) is a simple consequence of the well-known elementary inequality

(3.4)
\[
e^{x-1} \ge x \quad \text{for all } x \in \mathbb{R},
\]

with equality iff $x = 1$. The second inequality is a simple consequence of the A-G inequality: in (1.1), set $a_i \to (a_i/b_i)^{a_i}$, $i = 1, \dots, n$. The third inequality is again a consequence of (1.1). Namely, for $a_i \to (b_i/a_i)^{a_i}$, $i = 1, \dots, n$, (1.1) becomes

\[
\prod_{i=1}^{n} \left( \frac{b_i}{a_i} \right)^{p_i a_i} \le \sum_{i=1}^{n} p_i \left( \frac{b_i}{a_i} \right)^{a_i},
\]

which is equivalent to the third inequality. The last inequality is again a consequence of (3.4).

Theorem 3.4. Let $a_i \in (0,1)$ and $b_i > 0$ ($i = 1, \dots, n$). If $p_i > 0$, $i = 1, \dots, n$, is such that $\sum_{i=1}^{n} p_i = 1$, then

(3.5)
\[
\exp\left[ \sum_{i=1}^{n} p_i \frac{a_i^2}{b_i} - \sum_{i=1}^{n} p_i a_i \right]
\ge \exp\left[ \sum_{i=1}^{n} p_i \left( \frac{a_i}{b_i} \right)^{a_i} - 1 \right]
\ge \sum_{i=1}^{n} p_i \left( \frac{a_i}{b_i} \right)^{a_i}
\ge \prod_{i=1}^{n} \left( \frac{a_i}{b_i} \right)^{p_i a_i}
\ge \left[ \sum_{i=1}^{n} p_i \left( \frac{b_i}{a_i} \right)^{a_i} \right]^{-1}
\ge \exp\left[ 1 - \sum_{i=1}^{n} p_i \left( \frac{b_i}{a_i} \right)^{a_i} \right]
\ge \exp\left[ \sum_{i=1}^{n} p_i a_i - \sum_{i=1}^{n} p_i b_i \right],
\]

with equality iff $a_i = b_i$ for all $i = 1, \dots, n$.

Proof. The theorem follows from Theorems 3.1 and 3.3.

Theorem 3.5. Let $a_i, p_i$ ($i = 1, \dots, n$) and $b_j, q_j$ ($j = 1, \dots, m$) be positive numbers with $\sum_{i=1}^{n} p_i = \sum_{j=1}^{m} q_j = 1$. Then

(3.6)
\[
\exp\left[ \sum_{i=1}^{n} \sum_{j=1}^{m} p_i q_j \left( \frac{a_i}{b_j} \right)^{a_i} - 1 \right]
\ge \sum_{i=1}^{n} \sum_{j=1}^{m} p_i q_j \left( \frac{a_i}{b_j} \right)^{a_i}
\ge \frac{\prod_{i=1}^{n} a_i^{a_i p_i}}{\prod_{j=1}^{m} b_j^{\,q_j \sum_{i=1}^{n} p_i a_i}}
\ge \left[ \sum_{i=1}^{n} \sum_{j=1}^{m} p_i q_j \left( \frac{b_j}{a_i} \right)^{a_i} \right]^{-1}
\ge \exp\left[ 1 - \sum_{i=1}^{n} \sum_{j=1}^{m} p_i q_j \left( \frac{b_j}{a_i} \right)^{a_i} \right].
\]

Equality in (3.6) holds iff $a_1 = \cdots = a_n = b_1 = \cdots = b_m$.

Proof. The first and the last inequalities are simple consequences of (3.4). The second is also a simple consequence of the A-G inequality. Namely, we have

\[
\frac{\prod_{i=1}^{n} a_i^{a_i p_i}}{\prod_{j=1}^{m} b_j^{\,q_j \sum_{i=1}^{n} p_i a_i}}
= \prod_{i=1}^{n} \prod_{j=1}^{m} \left( \frac{a_i}{b_j} \right)^{a_i p_i q_j}
\le \sum_{i=1}^{n} \sum_{j=1}^{m} p_i q_j \left( \frac{a_i}{b_j} \right)^{a_i},
\]

which is the second inequality in (3.6). By the A-G inequality, we have

\[
\prod_{i=1}^{n} \prod_{j=1}^{m} \left( \frac{b_j}{a_i} \right)^{a_i p_i q_j}
\le \sum_{i=1}^{n} \sum_{j=1}^{m} p_i q_j \left( \frac{b_j}{a_i} \right)^{a_i},
\]

which gives the third inequality in (3.6).
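As a quick sanity check of Theorem 3.3, the following sketch, again an illustrative Python/NumPy addition that is not part of the original argument, evaluates the five members of the chain (3.3) for random positive $a_i$, $b_i$ and weights $p_i$ summing to one.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
n = 5
a = rng.uniform(0.1, 3.0, size=n)     # positive a_i
b = rng.uniform(0.1, 3.0, size=n)     # positive b_i
p = rng.uniform(0.1, 1.0, size=n)
p /= p.sum()                          # weights with sum p_i = 1

s1 = np.sum(p * (a / b) ** a)         # sum_i p_i (a_i/b_i)^{a_i}
s2 = np.sum(p * (b / a) ** a)         # sum_i p_i (b_i/a_i)^{a_i}

t1 = np.exp(s1 - 1.0)                 # exp[ sum p_i (a_i/b_i)^{a_i} - 1 ]
t2 = s1                               # sum p_i (a_i/b_i)^{a_i}
t3 = np.prod((a / b) ** (p * a))      # prod (a_i/b_i)^{p_i a_i}
t4 = 1.0 / s2                         # [ sum p_i (b_i/a_i)^{a_i} ]^{-1}
t5 = np.exp(1.0 - s2)                 # exp[ 1 - sum p_i (b_i/a_i)^{a_i} ]

print(t1 >= t2 >= t3 >= t4 >= t5)     # expected: True
\end{verbatim}

With $a_i = b_i$ for all $i$, all five numbers coincide at $1$, in accordance with the equality case of (3.3).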
Then " n # " n m # a i X m n X X XX q a j i (3.7) exp − p i ai ≥ exp −1 pi a2i p i qj bj bj i=1 j=1 i=1 i=1 j=1 a i n X m X ai ≥ p i qj bj i=1 j=1 n Q ≥ aai i pi i=1 m Q q bj j Pni=1 pi ai j=1 ai #−1 bj ≥ p i qj ai i=1 j=1 " a i # n X m X bj ≥ exp 1 − p i qj ai i=1 j=1 " n # m X X ≥ exp p i ai − qj b j . " n X m X i=1 j=1 Equality holds in (3.7) iff a1 = · · · = an = b1 = · · · = bm . Proof. The theorem is a simple consequence of Theorems 3.2 and 3.5. 4. S OME I NEQUALITIES FOR D ISTANCE F UNCTIONS In 1951, Kullback and Leibler introduced the following distance function in Information Theory (see [4] or [5]) (4.1) KL(p, q) := n X i=1 J. Inequal. Pure and Appl. Math., 2(1) Art. 11, 2001 pi log pi , qi http://jipam.vu.edu.au/ 6 B. M OND AND J. P E ČARI Ć provided that p, q ∈ Rn++ := {x = (x1 , . . . , xn )∈ Rn , xi > 0, i = 1, . . . , n}. Another useful distance function is the χ2 -distance given by (see [5]) Dχ2 (p, q) := (4.2) n X p2 − q 2 i i qi i=1 , where p, q ∈ Rn++ . S.S. Dragomir [2] introduced the following two new distance functions (4.3) P2 (p, q) := n pi X pi i=1 qi −1 and (4.4) n pi X qi P1 (p, q) := − +1 , p i i=1 provided p, q ∈ Rn++ . The following inequality connecting all the above four distance functions holds. Theorem 4.1. Let p, q ∈ Rn++ with pi ∈ (0, 1). Then we have the inequality: Dχ2 (p, q) + Qn − Pn ≥ P2 (p, q) 1 ≥ n ln P2 (p, q) + 1 n ≥ KL(p, q) 1 ≥ −n ln − P1 (p, q) + 1 n ≥ P1 (p, q) ≥ Pn − Qn , P P where Pn = ni=1 pi = 1, Qn = ni=1 qi . Equality holds in (4.5) iff pi = qi (i = 1, . . . , n). (4.5) Proof. Set in (3.5), pi = 1/n, ai = pi , bi = qi (i = 1, . . . , n) and take logarithms. After multiplication by n, we get (4.5). Corollary 4.2. Let p, q be probability distributions. Then we have (4.6) Dχ2 (p, q) ≥ P2 (p, q) 1 ≥ n ln P2 (p, q) + 1 n ≥ KL(p, q) 1 ≥ −n ln 1 − P1 (p, q) n ≥ P1 (p, q) ≥ 0. Equality holds in (4.6) iff p = q. Remark 4.3. Inequalities (4.5) and (4.6) are improvements of related results in [2]. J. Inequal. Pure and Appl. Math., 2(1) Art. 11, 2001 http://jipam.vu.edu.au/ A PPLICATION OF THE AG I NEQUALITY TO I NFORMATION T HEORY. 7 5. A PPLICATIONS FOR S HANNON ’ S E NTROPY The entropy of a random variable is a measure of the uncertainty of the random variable, it is a measure of the amount of information required on the average to describe the random variable. Let p(x), x ∈ χ be a probability mass function. Define the Shannon’s entropy f of a random variable X having the probability distribution p by X 1 (5.1) H(X) := p(x) log . p(x) x∈χ In the above definition we use the convention (based on continuity arguments) that 0 log 0q = 1 0 and p log p0 = ∞. Now assume that |χ| (card(χ) = |χ|) is finite and let u(x) = |χ| be the uniform probability mass function in χ. It is well known that [5, p. 27] X p(x) (5.2) KL(p, q) = p(x) log = log |χ| − H(X). q(x) x∈χ The following result is important in Information Theory [5, p. 27]: Theorem 5.1. Let X, p and χ be as above. Then H(X) ≤ log |χ|, (5.3) with equality if and only if X has a uniform distribution over χ. In what follows, by the use of Corollary 4.2, we are able to point out the following estimate for the difference log |χ| − H(X), that is, we shall give the following improvement of Theorem 9 from [2]: Theorem 5.2. Let X, p and χ be as above. 
5. Applications for Shannon's Entropy

The entropy of a random variable is a measure of its uncertainty; it is a measure of the amount of information required on average to describe the random variable. Let $p(x)$, $x \in \mathcal{X}$, be a probability mass function. Define the Shannon entropy of a random variable $X$ having the probability distribution $p$ by

(5.1)
\[
H(X) := \sum_{x \in \mathcal{X}} p(x) \log \frac{1}{p(x)}.
\]

In the above definition we use the convention (based on continuity arguments) that $0 \log \frac{0}{q} = 0$ and $p \log \frac{p}{0} = \infty$. Now assume that $|\mathcal{X}|$ ($\mathrm{card}(\mathcal{X}) = |\mathcal{X}|$) is finite and let $u(x) = \frac{1}{|\mathcal{X}|}$ be the uniform probability mass function on $\mathcal{X}$. It is well known that [5, p. 27]

(5.2)
\[
KL(p, u) = \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{u(x)} = \log |\mathcal{X}| - H(X).
\]

The following result is important in Information Theory [5, p. 27]:

Theorem 5.1. Let $X$, $p$ and $\mathcal{X}$ be as above. Then

(5.3)
\[
H(X) \le \log |\mathcal{X}|,
\]

with equality if and only if $X$ has a uniform distribution over $\mathcal{X}$.

In what follows, by the use of Corollary 4.2, we point out the following estimate for the difference $\log |\mathcal{X}| - H(X)$, that is, the following improvement of Theorem 9 from [2]:

Theorem 5.2. Let $X$, $p$ and $\mathcal{X}$ be as above. Then

(5.4)
\[
|\mathcal{X}| E(X) - 1
\ge \sum_{x \in \mathcal{X}} \left[ |\mathcal{X}|^{p(x)} [p(x)]^{p(x)} - 1 \right]
\ge |\mathcal{X}| \ln\left\{ \frac{1}{|\mathcal{X}|} \sum_{x \in \mathcal{X}} |\mathcal{X}|^{p(x)} [p(x)]^{p(x)} \right\}
\ge \ln |\mathcal{X}| - H(X)
\ge -|\mathcal{X}| \ln\left\{ \frac{1}{|\mathcal{X}|} \sum_{x \in \mathcal{X}} |\mathcal{X}|^{-p(x)} [p(x)]^{-p(x)} \right\}
\ge \sum_{x \in \mathcal{X}} \left[ 1 - |\mathcal{X}|^{-p(x)} [p(x)]^{-p(x)} \right]
\ge 0,
\]

where $E(X)$ is the informational energy of $X$, i.e., $E(X) := \sum_{x \in \mathcal{X}} p^2(x)$. The equality holds in (5.4) iff $p(x) = \frac{1}{|\mathcal{X}|}$ for all $x \in \mathcal{X}$.

Proof. The proof is obvious by Corollary 4.2 on choosing $q(x) = u(x) = \frac{1}{|\mathcal{X}|}$, $x \in \mathcal{X}$.

6. Applications for Mutual Information

We consider mutual information, which is a measure of the amount of information that one random variable contains about another random variable. It is the reduction in the uncertainty of one random variable due to the knowledge of the other [6, p. 18]. To be more precise, consider two random variables $X$ and $Y$ with a joint probability mass function $r(x, y)$ and marginal probability mass functions $p(x)$ and $q(y)$, $x \in \mathcal{X}$, $y \in \mathcal{Y}$. The mutual information is the relative entropy between the joint distribution and the product distribution, that is,

\[
I(X; Y) = \sum_{x \in \mathcal{X}, \, y \in \mathcal{Y}} r(x, y) \log \frac{r(x, y)}{p(x) q(y)} = KL(r, pq).
\]

The following result is well known [6, p. 27].

Theorem 6.1 (Non-negativity of mutual information). For any two random variables $X$, $Y$,

(6.1)
\[
I(X; Y) \ge 0,
\]

with equality iff $X$ and $Y$ are independent.

In what follows, by the use of Corollary 4.2, we point out the following estimate for the mutual information, that is, the following improvement of Theorem 11 of [2]:

Theorem 6.2. Let $X$ and $Y$ be as above. Then we have the inequality

\[
\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} \frac{r^2(x, y)}{p(x) q(y)} - 1
\ge \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} \left[ \left( \frac{r(x, y)}{p(x) q(y)} \right)^{r(x, y)} - 1 \right]
\ge |\mathcal{X}| \, |\mathcal{Y}| \ln\left[ \frac{1}{|\mathcal{X}| \, |\mathcal{Y}|} \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} \left( \frac{r(x, y)}{p(x) q(y)} \right)^{r(x, y)} \right]
\ge I(X; Y)
\ge -|\mathcal{X}| \, |\mathcal{Y}| \ln\left[ \frac{1}{|\mathcal{X}| \, |\mathcal{Y}|} \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} \left( \frac{p(x) q(y)}{r(x, y)} \right)^{r(x, y)} \right]
\ge \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} \left[ 1 - \left( \frac{p(x) q(y)}{r(x, y)} \right)^{r(x, y)} \right]
\ge 0.
\]

The equality holds in all inequalities iff $X$ and $Y$ are independent.

References

[1] I.A. ABOU-TAIR AND W.T. SULAIMAN, Inequalities via convex functions, Internat. J. Math. and Math. Sci., 22 (1999), 543–546.

[2] S.S. DRAGOMIR, An inequality for logarithmic mapping and applications for the relative entropy, RGMIA Res. Rep. Coll., 3(2) (2000), Article 1, http://rgmia.vu.edu.au/v3n2.html

[3] S. KULLBACK AND R.A. LEIBLER, On information and sufficiency, Ann. Math. Statist., 22 (1951), 79–86.

[4] S. KULLBACK, Information Theory and Statistics, J. Wiley, New York, 1959.

[5] A. BEN-TAL, A. BEN-ISRAEL AND M. TEBOULLE, Certainty equivalents and information measures: duality and extremal principles, J. Math. Anal. Appl., 157 (1991), 211–236.

[6] T.M. COVER AND J.A. THOMAS, Elements of Information Theory, John Wiley and Sons, Inc., 1991.