Quantum Computing (Fall 2013)
Instructor: Shengyu Zhang

Lecture 10. Quantum information 2: entropy and mutual information

1. Review of classical information theory

Let's first review some basic concepts of information theory. Suppose that $X$ is a discrete random variable on the sample space $\mathcal{X}$, with probability mass function $p(x) = \Pr[X = x]$. Following the standard notation of information theory, we also use the capital letter $X$ to denote the distribution, and by writing $x \leftarrow X$ we mean drawing a sample $x$ from the distribution $X$.

Entropy. The entropy $H(X)$ of a discrete random variable $X$ is defined by
$$H(X) = -\sum_{x \in \mathcal{X}} p(x) \log_2 p(x).$$
Here we use the convention that $0 \log_2 0 = 0$. One intuitive reading of this definition is that entropy measures the amount of uncertainty in a random variable: the more entropy $X$ has, the more uncertain it is. In particular, $H(X) = 0$ iff $X$ is fixed/deterministic. At the other extreme, the maximum entropy of any random variable on $\mathcal{X}$ is $\log_2 |\mathcal{X}|$, achieved by the uniform distribution. Thus the range of the entropy is
$$0 \le H(X) \le \log_2 |\mathcal{X}|.$$
Another interpretation of entropy is as a compression rate, which will be the subject of the next lecture.

A special case is the binary entropy, when $X$ takes only 2 possible values. Suppose $\Pr[X = 1] = p$; then the binary entropy is
$$H(p) = p \log_2 \frac{1}{p} + (1-p) \log_2 \frac{1}{1-p}.$$
As a function of $p$, it vanishes at $p \in \{0, 1\}$, is concave, and reaches its maximum value $1$ at $p = 1/2$.

Joint distribution. Sometimes we have more than one random variable under consideration. If $(X, Y)$ is a pair of random variables on $\mathcal{X} \times \mathcal{Y}$, distributed according to a joint distribution $p(x, y)$, then the total entropy is simply the following, treating $(X, Y)$ as one random variable on the bigger sample space $\mathcal{X} \times \mathcal{Y}$:
$$H(X, Y) = -\sum_{x \in \mathcal{X},\, y \in \mathcal{Y}} p(x, y) \log_2 p(x, y).$$

Marginal distribution and conditional distribution. The joint variable $(X, Y)$ has a marginal distribution $p(x)$ over $\mathcal{X}$ defined by $p(x) = \sum_y p(x, y)$. For any fixed $y \in \mathcal{Y}$ with $p(y) = \sum_x p(x, y) > 0$, there is a conditional distribution $X \mid (Y = y)$ defined by $p(x \mid y) = p(x, y)/p(y)$. The conditional entropy is defined as the average entropy of the conditional distribution:
$$H(X \mid Y) = \mathbb{E}_{y \leftarrow Y}[H(X \mid Y = y)].$$
From this definition it is easily seen that $H(X \mid Y) \ge 0$. In general, conditioning reduces entropy:
$$H(X \mid Y) \le H(X).$$
Knowing something else ($Y$) can never increase the uncertainty about the target ($X$).

Chain rule. Another basic fact is the chain rule:
$$H(X, Y) = H(X) + H(Y \mid X).$$
Think of it this way: the uncertainty of $(X, Y)$ is that of one variable $X$, plus that of the other variable $Y$ after $X$ is known. The chain rule also works with conditional entropy,
$$H(X, Y \mid Z) = H(X \mid Z) + H(Y \mid X, Z),$$
and with more variables:
$$H(X_1, \dots, X_n \mid Y) = H(X_1 \mid Y) + H(X_2 \mid X_1, Y) + \cdots + H(X_n \mid X_1, \dots, X_{n-1}, Y) \le H(X_1 \mid Y) + H(X_2 \mid Y) + \cdots + H(X_n \mid Y),$$
where the inequality is usually referred to as the subadditivity of entropy.

Relative entropy. Another important concept is the relative entropy between two distributions $p$ and $q$:
$$H(p \| q) = \sum_{x \in \mathcal{X}} p(x) \log_2 \frac{p(x)}{q(x)}.$$
Relative entropy can serve as a (non-symmetric) measure of the distance between two distributions. A basic property is that it is nonnegative.

Fact. $H(p \| q) \ge 0$, with equality iff $p = q$.

Proof. Use the fact that $\log_2 x = \frac{\ln x}{\ln 2} \le \frac{x-1}{\ln 2}$ in the definition of $H(p \| q)$.
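The classical quantities above are easy to compute numerically. Below is a minimal Python/numpy sketch, not from the lecture itself; the function names entropy, conditional_entropy and relative_entropy are ad hoc, and the joint distribution is passed in as a matrix of probabilities indexed as pxy[x, y].

import numpy as np

def entropy(p):
    # Shannon entropy H(p) in bits, with the convention 0*log(0) = 0.
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def conditional_entropy(pxy):
    # H(X|Y) = E_{y<-Y} H(X | Y = y), with the joint distribution as a matrix pxy[x, y].
    py = pxy.sum(axis=0)                       # marginal of Y
    h = 0.0
    for j, pj in enumerate(py):
        if pj > 0:
            h += pj * entropy(pxy[:, j] / pj)  # weight by Pr[Y = y], entropy of X | Y = y
    return h

def relative_entropy(p, q):
    # H(p||q) in bits; assumes q(x) > 0 wherever p(x) > 0.
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

# Example: X is a uniform bit and Y = X (perfectly correlated).
pxy = np.array([[0.5, 0.0],
                [0.0, 0.5]])
px = pxy.sum(axis=1)
print(entropy(px))                                # H(X)   = 1.0
print(conditional_entropy(pxy))                   # H(X|Y) = 0.0 <= H(X): conditioning reduces entropy
print(entropy(pxy.ravel()))                       # H(X,Y) = 1.0 = H(Y) + H(X|Y), the chain rule
print(relative_entropy([0.5, 0.5], [0.9, 0.1]))   # positive, as the Fact above guarantees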
Mutual information. For a joint distribution $(X, Y) \sim p(x, y)$, the mutual information between $X$ and $Y$ is
$$I(X; Y) = H(p(x, y) \,\|\, p(x)p(y)) = H(X) - H(X \mid Y) = H(X) + H(Y) - H(X, Y).$$
It is a good exercise to verify the above equalities. The second one has a clear interpretation: we mentioned earlier that conditioning always reduces the entropy; by how much does it reduce it? Exactly by the mutual information. So the mutual information is the uncertainty of $X$ minus the uncertainty of $X$ once $Y$ is known; in this way, it measures how much information about $X$ is contained in $Y$. It is not hard to verify that it is symmetric,
$$I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X),$$
and we have implicitly said that it is nonnegative, $I(X; Y) \ge 0$ (this is just "conditioning reduces entropy"). Since conditional entropies are also nonnegative,
$$I(X; Y) \le \min\{H(X), H(Y)\}.$$
One can also condition on a third variable. The conditional mutual information is defined by
$$I(X; Y \mid Z) = \mathbb{E}_{z \leftarrow Z}[I(X; Y \mid Z = z)].$$
It is not hard to see that
$$I(X; Y \mid Z) \le \min\{H(X \mid Z), H(Y \mid Z)\} \le \log_2 |\mathcal{X}|$$
and
$$I(X; Y \mid Z) = H(X \mid Z) - H(X \mid Y, Z).$$

Data processing inequality. A sequence of random variables $X_1, X_2, \dots$ forms a Markov chain, denoted $X_1 \to X_2 \to \cdots$, if
$$\Pr[X_n = x_n \mid X_{n-1} = x_{n-1}] = \Pr[X_n = x_n \mid X_1 = x_1, \dots, X_{n-1} = x_{n-1}].$$
Namely, $X_n$ depends only on $X_{n-1}$, not on the "earlier history".

Fact. If $X \to Y \to Z$, then $I(X; Z) \le I(X; Y)$.

Intuitively, this says that processing data can never increase the information it contains.

2. Quantum entropy

We can extend the classical Shannon entropy to the quantum case. The von Neumann entropy of a quantum state $\rho$ is defined by
$$S(\rho) = -\mathrm{tr}(\rho \log \rho).$$
Equivalently, if the eigenvalues of $\rho$ are $\lambda = (\lambda_1, \dots, \lambda_n)$, then $S(\rho) = H(\lambda)$. (Note that $\mathrm{tr}(\rho) = 1$ and $\rho \succeq 0$, so $\lambda$ is a distribution.) Just as in the classical case, the quantum entropy is a measure of the uncertainty of a quantum state.

Fact. $S(\rho) \ge 0$, with equality iff $\rho$ is a pure state.

Relative entropy. The quantum relative entropy is defined by
$$S(\rho \| \sigma) = \mathrm{tr}(\rho \log \rho - \rho \log \sigma).$$
Let's first verify that this extends the classical relative entropy. When $\rho$ and $\sigma$ commute, they are diagonal in a common eigenbasis with eigenvalues $\{p_i\}$ and $\{q_i\}$, and
$$S(\rho \| \sigma) = \sum_i (p_i \log p_i - p_i \log q_i) = \sum_i p_i \log \frac{p_i}{q_i},$$
consistent with the classical case. The relative entropy can be viewed as a distance measure between states.

Theorem (Klein's inequality). $S(\rho \| \sigma) \ge 0$, with equality iff $\rho = \sigma$.

Another "distance" property is that taking a partial trace reduces relative entropy.

Fact. $S(\rho_A \| \sigma_A) \le S(\rho_{AB} \| \sigma_{AB})$.

Klein's inequality can be used to prove the following result.

Fact. Projective measurements can only increase entropy. More precisely, suppose $\{P_i\}$ is a complete set of orthogonal projectors and $\rho$ is a density operator. Then $S(\sum_i P_i \rho P_i) \ge S(\rho)$, with equality iff $\sum_i P_i \rho P_i = \rho$.

Conditional entropy. Suppose that a joint quantum system $(A, B)$ is in the state $\rho_{AB}$. For convenience, we sometimes use $(A, B)$ to denote $\rho_{AB}$. We define the conditional entropy as
$$S(A \mid B) = S(A, B) - S(B).$$
Note that classically the conditional entropy can also be defined as the average of $H(A \mid Y = y)$ over $y$, but we cannot do this for the quantum entropy. In fact, unlike the classical case, the quantum conditional entropy can be negative. For example, for an EPR pair the joint state is pure while each half is fully mixed, so $S(A \mid B) = S(A, B) - S(B) = 0 - 1 = -1$. An average-style definition could never give this: measuring the pair in the computational basis yields $00$ or $11$ with equal probability, and an average of classical entropies is always nonnegative. On the other hand, the quantum conditional entropy cannot be too negative:
$$|S(A) - S(B)| \le S(A, B) \le S(A) + S(B).$$

Concavity. Suppose the $\rho_i$ are quantum states and the $p_i$ form a distribution. Then
$$\sum_i p_i S(\rho_i) \le S\Big(\sum_i p_i \rho_i\Big) \le \sum_i p_i S(\rho_i) + H(p),$$
i.e. the von Neumann entropy is concave, and mixing can increase it by at most $H(p)$.
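As an illustration, not from the lecture itself, here is a small numpy sketch that computes the von Neumann entropy from the eigenvalues of a density matrix and reproduces the EPR-pair example above, $S(A \mid B) = S(A, B) - S(B) = -1$; the helper names von_neumann_entropy and partial_trace are ad hoc.

import numpy as np

def von_neumann_entropy(rho):
    # S(rho) = -tr(rho log2 rho), computed from the eigenvalues of rho.
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]                # drop (numerically) zero eigenvalues: 0 log 0 = 0
    return -np.sum(evals * np.log2(evals))

def partial_trace(rho_AB, dA, dB, keep):
    # Partial trace of a (dA*dB) x (dA*dB) density matrix; keep is 'A' or 'B'.
    r = rho_AB.reshape(dA, dB, dA, dB)          # indices (a, b, a', b')
    if keep == 'A':
        return np.trace(r, axis1=1, axis2=3)    # sum over b = b'
    return np.trace(r, axis1=0, axis2=2)        # sum over a = a'

# EPR pair |phi> = (|00> + |11>)/sqrt(2)
phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
rho_AB = np.outer(phi, phi)
rho_B = partial_trace(rho_AB, 2, 2, keep='B')   # equals I/2

S_AB = von_neumann_entropy(rho_AB)              # 0, since the joint state is pure
S_B = von_neumann_entropy(rho_B)                # 1, a fully mixed qubit
print(S_AB - S_B)                               # S(A|B) = -1 (up to numerical error)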
Mutual information. The quantum mutual information is defined analogously:
$$I(A; B) = S(A) + S(B) - S(A, B) = S(A) - S(A \mid B) = S(B) - S(B \mid A).$$
Classically the mutual information is upper bounded by the individual entropies, i.e. $I(A; B) \le \min\{H(A), H(B)\}$, but this bound fails for quantum mutual information. It still holds up to a factor of 2:
$$I(A; B) \le \min\{2S(A), 2S(B)\}.$$

Strong subadditivity.
$$S(A, B, C) + S(B) \le S(A, B) + S(B, C).$$

Notes

The definitions and theorems about classical information theory in this lecture can be found in standard textbooks such as [CT06] (Chapter 2). The quantum part can be found in [NC00] (Chapter 11).

References

[CT06] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory, Second Edition. Wiley-Interscience, 2006.

[NC00] Michael A. Nielsen and Isaac L. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, 2000.

Exercises

1. What is the von Neumann entropy of the following quantum states?
   (a) $|\psi\rangle = |0\rangle$
   (b) $|\psi\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$
   (c) $\rho = \frac{1}{3}\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$

2. What is the mutual information between the two parts of an EPR state?
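As a quick numerical sanity check for Exercise 1 (a sketch added here, reusing the ad hoc von_neumann_entropy helper from the sketch in Section 2; Exercise 2 can be checked similarly with the partial_trace helper there):

import numpy as np

# Assumes von_neumann_entropy from the sketch in Section 2 is in scope.
ket0 = np.array([1.0, 0.0])
plus = np.array([1.0, 1.0]) / np.sqrt(2)

rho_a = np.outer(ket0, ket0)            # |0><0|
rho_b = np.outer(plus, plus)            # |+><+|
rho_c = np.array([[2.0, 1.0],
                  [1.0, 1.0]]) / 3.0    # the mixed state in Exercise 1(c)

for rho in (rho_a, rho_b, rho_c):
    print(von_neumann_entropy(rho))     # pure states give 0; rho_c gives H(lambda) of its eigenvalues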