A Refined Analysis of the Poisson Channel in the HighPhoton-Efficiency Regime The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation Wang, Ligong, and Gregory W. Wornell. “A Refined Analysis of the Poisson Channel in the High-Photon-Efficiency Regime.” IEEE Trans. Inform. Theory 60, no. 7 (July 2014): 4299–4311. As Published http://dx.doi.org/10.1109/TIT.2014.2320718 Publisher Institute of Electrical and Electronics Engineers (IEEE) Version Author's final manuscript Accessed Thu May 26 00:31:42 EDT 2016 Citable Link http://hdl.handle.net/1721.1/91054 Terms of Use Creative Commons Attribution-Noncommercial-Share Alike Detailed Terms http://creativecommons.org/licenses/by-nc-sa/4.0/ 1 A refined analysis of the Poisson channel in the high-photon-efficiency regime arXiv:1401.5767v2 [cs.IT] 23 Apr 2014 Ligong Wang, Member, IEEE and Gregory W. Wornell, Fellow, IEEE Abstract—We study the discrete-time Poisson channel under the constraint that its average input power (in photons per channel use) must not exceed some constant E . We consider the wideband, high-photon-efficiency extreme where E approaches zero, and where the channel’s “dark current” approaches zero proportionally with E . Improving over a previously obtained firstorder capacity approximation, we derive a refined approximation, which includes the exact characterization of the second-order term, as well as an asymptotic characterization of the thirdorder term with respect to the dark current. We also show that pulse-position modulation is nearly optimal in this regime. Index Terms—Optical communication, Poisson channel, channel capacity, wideband, low SNR. I. I NTRODUCTION W E consider the discrete-time memoryless Poisson channel whose input x is in the set R+ 0 of nonnegative reals and whose output y is in the set Z+ of nonnegative integers. 0 Conditional on the input X = x, the output Y has a Poisson distribution of mean (λ + x), where λ ≥ 0 is called the “dark current” and is a constant that does not depend on the input X. We denote the Poisson distribution of mean ξ by Poisξ (·) so Poisξ (y) = e−ξ ξy , y! y ∈ Z+ 0. (1) output y is the actual number of photons that are detected in the pulse duration. The dark current λ is the average number of extraneous counts that appear in y. We note that, although the name “dark current” is traditionally used in the Poissonchannel literature, these extraneous counts usually include both the detector’s “dark clicks” (which are where the name “dark current” comes from) and photons in background radiation. We impose an average-power constraint2 on the input E[X] ≤ E for some E > 0. In applications like free-space optical communications, the cost of producing and successfully transmitting photons is high, hence high photon-information efficiency—amount of information transmitted per photon, which we henceforth call simply “photon efficiency”—is desirable. As we later demonstrate, this can be achieved in the wideband regime, where the pulse duration of the input approaches zero and, assuming that the continuous-time average input power is fixed, where E approaches zero proportionally with the pulse duration. Note that in this regime the average number of detected background photons or dark clicks also tends to zero proportionally with the pulse duration. Hence we have the linear relation With this notation the channel law W (·|·) is W (y|x) = Poisλ+x (y), x∈ R+ 0 ,y λ = cE, ∈ Z+ 0. (2) This channel models pulse-amplitude modulated optical communication where the transmitter sends light signals in coherent states (which are usually produced using laser devices), and where the receiver employs direct detection (i.e., photon counting) [1]. The channel input x describes the expected number of signal photons (i.e., photons that come from the input light signal rather than noise) to be detected in the pulse duration, and is proportional to the light signal’s intensity,1 the pulse duration, the channel’s transmissivity, and the detector’s efficiency; see [1] for details. The channel This work was supported in part by the DARPA InPho program under Contract No. HR0011-10-C-0159, and by AFOSR under Grant No. FA955011-1-0183. This paper was presented in part at the 2012 IEEE Information Theory Workshop (ITW) in Lausanne, Switzerland, and will be presented in part at the 2014 IEEE International Symposium on Information Theory (ISIT) in Honolulu, HI, USA. L. Wang is with the Massachusetts Institute of Technology, Cambridge, MA 02139 USA (e-mail: wlg@mit.edu). G.W. Wornell is with the Massachusetts Institute of Technology, Cambridge, MA 02139 USA (e-mail: gww@mit.edu). 1 We assume that the pulses are square, which is usually the case in practice. If they are not square, the light signal’s intensity should be averaged over the pulse duration. (3) (4) where c is some nonnegative constant that does not change with E. In practice, asymptotic results in this regime are useful in scenarios where E is small and where λ is comparable to or much smaller than E. Scenarios where E is small but λ is much larger is better captured by the model where λ stays constant while E tends to zero; see [2, Proposition 2]. We denote the capacity (in nats per channel use) of the channel (2) under power constraint (3) with dark current (4) by C(E, c), then C(E, c) = sup I(X; Y ), (5) E[X]≤E where the mutual information is computed from the channel law (2) and is maximized over input distributions satisfying (3), with dark current λ given by (4). As we shall see, our results on the asymptotic behavior of C(E, c) hold irrespectively of whether a peak-power constraint X ≤A with probability 1 (6) 2 Here “power” is in discrete time, means expected number of detected signal photons per channel use, and is proportional to the continuous-time physical power times the pulse duration. 2 is imposed or not, as long as A is positive and does not approach zero together with E. We now formally define the maximum achievable photon efficiency CPE (E, c), where the subscript notation services a reminder that this quantity is photon efficiency and not capacity: C(E, c) CPE (E, c) , . (7) E Various capacity results for the discrete-time Poisson channel have been obtained [2]–[6]. Among them, [2, Proposition 1] considers the same scenario as the present paper and asserts that C(E, c) lim = 1, c ∈ [0, ∞). (8) E↓0 E log 1 E In other words, the maximum photon efficiency satisfies 1 1 , c ∈ [0, ∞). (9) CPE (E, c) = log + o log E E Furthermore, the proof of [2, Proposition 1] shows that the limit (8) is achievable using on-off keying. In the Appendix we provide a proof for the converse part of (8) that is much simpler than the original proof in [2].3 The approximation in (9) can be compared to the maximum photon efficiency achievable on the pure-loss bosonic channel [7]. This is a quantum optical communication channel that attenuates the input optical signal, but that does not add any noise to it. The transmitter and the receiver of this channel may employ any structure permitted by quantum physics. When the transmitter is restricted to sending coherent optical states, and when the receiver is restricted to ideal direct detection (with no dark counts), the pure-loss bosonic channel becomes the Poisson channel (2) with λ = 0. We denote by CPE-bosonic (E) the maximum photon efficiency of the pureloss bosonic channel under an average-power constraint that is equivalent to (3). The value of CPE-bosonic (E) can be easily computed using the explicit capacity formula [7, Eq. (4)], which yields 1 + 1 + o(1). (10) E Comparing (9) and (10) shows the following: • For the pure-loss bosonic channel in the wideband regime, coherent-state inputs and direct detection are optimal up to the first-order term in photon efficiency (or, equivalently, in capacity). For example, they achieve infinite capacity per unit cost [8], [9]. • The dark current does not affect this first-order term. In the present paper, we refine the analysis in [2] in two aspects. First, we provide a more accurate approximation for CPE (E, c) that contains higher-order terms, and that reflects the influence of the dark current, i.e., of the constant c. Second, we identify near-optimal modulation schemes that facilitate code design for this channel. CPE-bosonic (E) = log 3 For the achievability part of (8), we note that its proof can be simplified by letting the decoder ignore multiple photons, rather than considering all possible values of Y as in [2]. This can be seen via the achievability proofs in the current paper, which do ignore multiple photons at the receiver, and which yield stronger results than those in [2]. Some progress has been made in improving the approximation (9). It is noticed in [10] that the photon efficiency achievable on the Poisson channel (2) with λ = 0 may be of the form 1 1 (11) log − log log + O(1), E E which means that restriction to coherent-state inputs and direct detection may induce a loss in the second-order term in the photon efficiency on the pure-loss bosonic channel. For practical values of E, this second-order term can be significant. For example, for E = 10−5 , the difference between the firstorder term in (9) and the first- and second-order terms in (11) is about 20%. The analysis in [10] (whose main focus is not on the Poisson channel itself) is based on certain assumptions on the input distribution.4 It is therefore unclear from [10] if (11) is the maximum photon efficiency achievable on the Poisson channel (2) with λ = 0 subject to constraint (3) alone, i.e., if (11) is the correct expression for CPE (E, 0). In the present paper, we prove that this is indeed the case. Recent works such as [10], [11] often ignore the dark current in the Poisson channel. It has been unknown to us whether the dark current, i.e., whether the constant c influences the second term in (11) and, if not, whether it influences the next term in photon efficiency. We show that the constant c does not influence the second-order term in CPE (E, c), but does influence the third-order, constant term. It has long been known that that infinite photon efficiency on the Poisson channel with zero dark current can be achieved using pulse-position modulation (PPM) combined with an outer code [3], [4]. Further, [11] observes that PPM can achieve (11) on such a channel. PPM greatly simplifies the coding task for this channel, since one can easily apply existing codes, such as a Reed-Solomon code, to the PPM “super symbols”; while the on-off keying scheme that achieves (11) has a highly skewed input distribution and is hence difficult to code. The question then arises: how useful is PPM when there is a positive dark current? This question has two parts. First, is PPM still near optimal in terms of capacity (or photon efficiency) when there is a positive dark current? Second, does PPM still simplify coding when there is a dark current? We answer the first part of the question in the affirmative in our main results. We cannot fully answer the second question in this paper, but we shall discuss it in the concluding remarks in Section VII. The rest of this paper is arranged as follows. We introduce some notation and formally define PPM in Section II. We state and discuss our main results in Section III. We then prove the achievability parts of these results in Sections IV and V, and prove the converse parts in Section VI. We conclude the paper with numerical comparison of the bounds and some remarks in Section VII. II. N OTATION AND PPM We usually use a lower-case letter like x to denote a constant, and an upper-case letter like X to denote a random variable. 4 According to conversation with the authors. 3 We use natural logarithms, and measure information in nats. We use the usual o(·) and O(·) notation to describe behaviors of functions of E in the limit where E approaches zero with other parameters, if any, held fixed. Specifically, given a reference function f (·) (which might be the constant 1), a function described as o(f (E)) satisfies lim E↓0 o(f (E)) = 0, f (E) and a function described as O(f (E)) satisfies O(f (E)) < ∞. lim E↓0 f (E) (12) (13) We emphasize that, in particular, we do not use o(·) and O(·) to describe how functions behave with respect to c. We adopt the convention 0 log 0 = 0. (14) We next formally describe what we mean by PPM. On the transmitter side: • The channel uses are divided into frames of equal lengths; • In each frame, there is only one channel input that is positive, which we call the “pulse”, while all the other inputs are zeros; • The pulses in all frames have the same amplitude. On the receiver side, we distinguish between two cases, which we call simple PPM and soft-decision PPM, respectively. In simple PPM, the receiver records at most one pulse in each frame; if more than one pulse is detected in a frame, then that frame is recorded as an “erasure”, as if no pulse is detected at all. In soft-decision PPM, the receiver records up to two pulses in each frame; frames containing no or more than two detected pulses are treated as erasures. III. M AIN R ESULTS AND D ISCUSSIONS The following theorem provides an approximation for the photon efficiency CPE (E, c) in the high-photon-efficiency regime. Theorem 1. The maximum photon efficiency CPE (E, c) achievable on the Poisson channel (2) subject to constraint (3) and with dark current (4), which we define in (7), satisfies 1 1 − log log + O(1), c ∈ [0, ∞). (15) E E Furthermore, denote 1 1 (16) K(E, c) , CPE (E, c) − log + log log , E E then K(E, c) K(E, c) lim lim = lim lim = −1. (17) c→∞ E↓0 log c c→∞ E↓0 log c CPE (E, c) = log Our second theorem shows that PPM is nearly optimal in the regime of interest. Theorem 2. The asymptotic expression on the right-hand side of (15) is achievable with simple PPM. The limits in (17) are achievable with soft-decision PPM. The achievability part of Theorem 1 follows from Theorem 2. To prove the latter, we need to show two things: first, that the largest photon efficiency that is achievable with simple PPM, which we henceforth denote by CPE-PPM (E, c), satisfies CPE-PPM (E, c) ≥ log 1 1 −log log +O(1), E E c ∈ [0, ∞); (18) and second, that the maximum photon efficiency that is achievable with soft-decision PPM, which we henceforth denote by CPE-PPM(SD) (E, c), satisfies CPE-PPM(SD) (E, c) − log lim lim log c c→∞ E↓0 1 1 + log log E E ≥ −1. (19) To prove the converse part of Theorem 1, we note that the capacity of the channel is monotonically decreasing in c [2], and hence so is the photon efficiency. It thus suffices to show two things: first, that, in the absence of dark current, the largest photon efficiency achievable with any scheme satisfies CPE (E, 0) ≤ log 1 1 − log log + O(1); E E (20) and second, that 1 1 + log log E E ≤ −1. log c CPE (E, c) − log lim lim c→∞ E↓0 (21) We prove (18) in Section IV, (19) in Section V, and (20) and (21) in Section VI. To better understand the capacity results in Theorem 1, we make the following remarks. • Choosing c = 0 in (15) confirms that (11) is the correct asymptotic expression up to the second-order term for CPE (E, 0). Compared to (10) this means that, on the pure-loss bosonic channel, for small E, restricting the receiver to using direct detection induces a loss in photon efficiency of about log log E1 nats per photon. Note that the capacity of the pure-loss bosonic channel can be achieved using coherent input states only [7], so this loss is indeed due to direct detection, and not due to coherent input states. Attempts to overcome this loss by employing other feasible detection techniques have so far been unsuccessful [10], [12]. • The expression on the right-hand side of (15) does not depend on c, so the value of c affects neither the firstorder term nor the second-order term in CPE (E, c). In particular, these two terms do not depend on whether c is zero or positive. • The first term in CPE (E, c) that c does affect is the third, constant term. Indeed, though we have not given an exact expression for the constant term, the asymptotic property (17) shows that c affects the constant term in such a way that, for large c, the constant term is approximately − log c. • Theorem 1 suggests the approximation CPE (E, c) ≈ log 1 1 − log log − log c. E E (22) 4 The terms constituting the approximation error, roughly speaking, are either vanishing for small E, or small compared to log c for large c. Note that if we fix the dark current, i.e., if we fix the product cE, then the first and third terms on the right-hand side of (22) cancel. This is intuitively in agreement with [2, Proposition 2], which asserts that, for fixed dark current, photon efficiency scales like some constant times log log(1/E) and hence not like log(1/E). We note, however, that (17) cannot be derived directly using (15) together with [2, Proposition 2], because we cannot change the order of the limits. In (17) we do not let E tend to zero and c tend to infinity simultaneously. Instead, we first let E tend to zero to close down onto the constant term in CPE (E, c) with respect to E, and then let c tend to infinity to study the asymptotic behavior of this constant term with respect to c. • The approximation (22) is good for large c, but diverges as c tends to zero. We hence need a better approximation for the constant term, which behaves like − log c for large c, but which does not diverge for small c. As both the nonasymptotic bounds and the numerical simulations we show later will suggest, − log(1 + c) is a good approximation: 1 1 (23) CPE (E, c) ≈ log − log log − log(1 + c). E E For modulation and coding considerations, we make some remarks following Theorem 2. • Theorem 2 shows that PPM is optimal up to the secondorder term in photon efficiency when c = 0. Hence, compared to the restriction to the receiver using direct detection, the further restriction to PPM induces only a small additional loss in photon efficiency. • Furthermore, for the third-order, constant term, (softdecision) PPM is also not far from optimal, in the sense that it achieves the optimal asymptotic behavior of this term for large c. • In Section IV we show that • CPE-PPM (E, c) 1 3 1 ≥ log − log log − c − log(1 + c) − + o(1). (24) E E 2 A careful analysis will confirm that the bound (24) is tight in the regime of interest, in the sense that simple PPM cannot achieve a constant term that is better than linear in c (while being second-order optimal). This is in contrast to soft-decision PPM, which can achieve a constant term that is logarithmic in c. In particular, simple PPM is clearly not third-order optimal.5 In the PPM schemes that achieve (18) and (19), which we describe in Sections IV and V, the pulse has amplitude 1/ log(1/E) , which depends on E, and which tends to zero as E tends to zero. An on-off keying scheme having the same amplitude for its “on” signals can also achieve 5 A scheme between simple and soft-decision PPM is the following. When detecting multiple pulses in a frame, the receiver randomly selects and records one of the positions (possibly together with a “quality” flag). This scheme outperforms simple PPM in photon efficiency, but its third-order term is still linear in c: it scales like −c/2 instead of −c. • these lower bounds, because the dependence between the channel inputs introduced by PPM can only reduce the total mutual information between channel inputs and outputs. These (PPM and on-off keying schemes) are different from the on-off keying scheme used in [2] where the “on” signal has a fixed amplitude that does not depend on E. The latter on-off keying scheme, as well as any PPM scheme with a fixed pulse amplitude (with respect to E), is not second-order optimal on the Poisson channel. Because in the PPM schemes to achieve (18) and (19) the pulse tends to zero as E tends to zero, we know that, as claimed in Section I, both our theorems hold if a constant (i.e., not approaching zero together with E) peak-power constraint as in (6) is imposed on X in addition to (3). We next proceed to prove our main results, and leave further discussions to Section VII. IV. P ROOF OF THE S IMPLE -PPM L OWER B OUND (18) Assume that E is small enough so that E log 1 <1 E E < e−c . (25a) (25b) Consider the following simple PPM scheme: Scheme 1. • The channel uses are divided into frames of length b, so each frame contains b input symbols x1 , . . . , xb and b corresponding output symbols y1 , . . . , yb . We set 1 . (26) b= E log E1 • Within each length-b frame, there is always one input that equals η, and all the other (b − 1) inputs are zeros. Each frame is then fully specified by the position of its unique nonzero symbol, i.e., its pulse position. We consider each frame as a “super input symbol” x̃ that takes value in {1, . . . , b}. Here x̃ = i, i ∈ {1, . . . , b}, means xi = η xj = 0, (27a) j ∈ {1, . . . , b}, j 6= i. (27b) To meet the average-power constraint (3) with equality, we require η = bE. • (28) The b output symbols y1 , . . . , yb are mapped to one “super output symbol” ỹ that takes value in {1, . . . , b, ?} in the following way: ỹ = i, i ∈ {1, . . . , b}, if yi is the unique nonzero term in {y1 , . . . , yb }; and ỹ = ? if there is more than one or no nonzero term in {y1 , . . . , yb }. We have the following lower bound on the photon efficiency achieved by the above scheme. 5 Proposition 1. Under the conditions (25), Scheme 1 achieves photon efficiency CPE-PPM (E, c) η cE ≥ 1− log b − cη log b − 1 + log(1 + c) 2 η η o cE n log(1 − cη) + log 1 − + 1+ η 2 cE (29) −1− η W̃ (?|i) = 1 − p0 − (b − 1)p1 , Denote the capacity of this super channel by C̃(E, c, b, η), then C̃(E, c, b, η) = max I(X̃; Ỹ ). and simplify (29) to obtain CPE-PPM (E, c) 1 1 3 ≥ log − log log − c − log(1 + c) − + o(1) (31) E E 2 1 1 (32) = log − log log + O(1). E E When proving Proposition 1, as well as Propositions 2 and 3, we frequently use the inequalities summarized in the following lemma. Lemma 1. For all a ≥ 0, 1−e −a (33) ≤ a, (34) 1 1 − e−a ≥ a − a2 . (35) 2 Proof of Proposition 1: We compute the transition matrix of the PPM “super channel” as follows: W̃ (i|i) = Pr[Ỹ = i|X̃ = i] (36) = Pr [ Yi ≥ 1| Xi = η] b Y Pr [ Yk = 0 |Xk = 0] (37) k=1 k6=i b−1 = 1 − W (0|η) W (0|0) (39) =e , p0 , −e i ∈ {1, . . . , b}; (40) (41) W̃ (j|i) = Pr[Ỹ = j|X̃ = i] = Pr [ Yi = 0| Xi = η] Pr [ Yj ≥ 1| Xj = 0] b Y · Pr [ Yk = 0| Xk = 0] (42) = (1 − e −η−cE )e −(b−1)cE −(b−1)cE −η−bcE (38) (43) =e −η−cE (1 − e −cE )e −(b−2)cE = e−η−(b−1)cE − e−η−bcE , p1 , i, j ∈ {1, . . . , b}, i 6= j; Note that the total input power (i.e., expected number of detected signal photons) in each frame equals η. Therefore we have the following lower bound on CPE-PPM (E, c): C̃(E, c, b, η) . (51) η It can be easily verified that the optimal input distribution for (50) is the uniform distribution 1 PX̃ (i) = , i ∈ {1, . . . , b}, (52) b which induces the following marginal distribution on Ỹ : CPE-PPM (E, c) ≥ p0 + (b − 1)p1 , i ∈ {1, . . . , b}, b PỸ (?) = 1 − p0 − (b − 1)p1 . PỸ (i) = (44) (45) (46) (47) (53a) (53b) We can now use the above joint distribution on (X̃, Ỹ ) to lower-bound C̃(E, c, b, η) as follows: C̃(E, c, b, η) = I(X̃; Ỹ ) = H(Ỹ ) − H(Ỹ |X̃) (54) (55) 1 1 − p0 − (b − 1)p1 b + (p0 + (b − 1)p1 ) log p0 + (b − 1)p1 1 − (1 − p0 − (b − 1)p1 ) log 1 − p0 − (b − 1)p1 1 1 − (b − 1)p1 log (56) − p0 log p0 p1 bp0 bp1 = p0 log + (b − 1)p1 log p0 + (b − 1)p1 p0 + (b − 1)p1 (57) p0 + (b − 1)p1 = p0 log b − p0 log p | {z 0 } = (1 − p0 − (b − 1)p1 ) log ≤log 1+ bp1 p0 p0 + (b − 1)p1 − (b − 1)p1 log bp {z 1 } | =log 1+ k=1 k∈{i,j} / b−2 = W (0|η) 1 − W (0|0) W (0|0) (50) PX̃ Proof of (18) using Proposition 1: We plug (26) and (28) into (29), note that 1 1 = + O(1) (30) 1 E log E E log E1 (48) We now have, irrespective of the distribution of X̃, 1 1 + (b − 1)p1 log H(Ỹ |X̃) = p0 log p0 p1 1 . + 1 − p0 − (b − 1)p1 log 1 − p0 − (b − 1)p1 (49) where b and η are given in (26) and (28), respectively. log(1 + a) ≤ a, i ∈ {1, . . . , b}. p0 −p1 bp1 ≥ p0 log b − p0 log 1 + ≥ p0 log b − p0 log 1 + ≤ (58) p0 −p1 bp1 bp1 p0 − bp1 p0 − p0 . b−1 (p0 − p1 ) b } | {z } | {z ≤1 (59) ≤p0 (60) 6 Next note that p0 can be upper-bounded as p0 = e−(b−1)cE − e−η−bcE =e −(b−1)cE 1−e −η−cE ≤ 1 − e−η−cE ≤ η + cE, (61) (62) (63) (64) where the last step follows from (34). It can also be lowerbounded as p0 = e−(b−1)cE − e−η−bcE ≥ e−bcE − e−η−bcE (65) (66) − e−η ) = e|−bcE {z } (1 | {z } (67) ≥1−bcE ≥η− 12 η 2 1 ≥ (1 − bcE) η − η 2 , 2 (68) where the last inequality follows from (34) and (35), and from the fact that the first multiplicand on its right-hand side is (in fact, both multiplicands are) nonnegative for E satisfying (25) and other parameters chosen accordingly. Also note that p1 can be upper-bounded as p1 = e−η−(b−1)cE − e−η−bcE e−cE = |e−η−(b−1)cE {z } |1 −{z } ≤1 (69) (70) ≤cE ≤ cE. (71) Using (64), (68) and (71) we can continue the chain of inequalities (60) to further lower-bound C̃(E, c, b, η) as C̃(E, c, b, η) 1 ≥ (1 − bcE) η − η 2 log b 2 − (η + cE) log 1 + − η − cE bcE (1 − bcE) η − 21 η 2 ! Here, (73) follows from the fact that, for all α ≥ 0 and β ≥ 1, log(1 + αβ) ≤ log(β + αβ) = log β + log(1 + α), (75) with the choices β= 1 , (1 − bcE) 1 − 21 η V. P ROOF OF THE S OFT-D ECISION -PPM L OWER B OUND (19) Consider the following soft-decision PPM scheme: Scheme 2. The transmitter performs the same PPM as in Scheme 1. The receiver maps the b output symbols to a “super symbol” ŷ that takes value in {1, . . . , b, ?} ∪ {i, j} ⊂ {1, . . . , b} . The mapping rule is as follows. Take ŷ = i if yi is the unique nonzero term in {y1 , . . . , yb }; take ŷ = {i, j} if yi and yj are the only two nonzero terms in {y1 , . . . , yb }; if there are more than two or no nonzero term in {y1 , . . . , yb }, take ŷ = ?. Proposition 2. Under the conditions (25), Scheme 2 achieves photon efficiency CPE-PPM(SD) (E, c) cE η log b − cη log b − 1 + log(1 + c) ≥ 1− 2 η η o cE cE n log(1 − cη) + log 1 − −1− + 1+ η 2 η c2 E 2 b η (b − 1) cE − (1 − cη) log + 1− 2 2 2 c2 − η − c(η + cE) (78) 2 where b and η are given in (26) and (28), respectively. (72) bcE 1 2 ≥ (1 − bcE) η − η log b − (η + cE) log 1 + 2 η n η o + (η + cE) log(1 − bcE) + log 1 − 2 − η − cE (73) 1 2 bcE ≥ η − η log b − bcEη log b − (η + cE) log 1 + 2 η n η o + (η + cE) log(1 − bcE) + log 1 − 2 − η − cE. (74) bcE , α= η which indeed satisfy α ≥ 0 and β ≥ 1; and (74) follows by dropping the term 21 bcEη 2 log b, which is positive. Combining (28), (51), and (74) yields (29). Proof of (19) using Proposition 2: Simplifying (78) we obtain 1 1 lim CPE-PPM(SD) (E, c) − log + log log E E E↓0 3 ≥ − log(1 + c) − . (79) 2 This immediately yields (19). Proof of Proposition 2: We compute the transition matrix of the super channel that results from Scheme 2. We first note Ŵ (i|i) = p0 , Ŵ (j|i) = p1 , where p0 and p1 are given in (41) and (47), respectively. For the remaining elements of the transition matrix we have Ŵ {i, j}i = Pr[Yi ≥ 1|Xi = η] Pr[Yj ≥ 1|Xj = 0] · (76) (77) i, j ∈ {1, . . . , b}, i 6= j, (80) (81) b Y Pr[Yℓ = 0|Xℓ = 0] (82) ℓ=1 ℓ∈{i,j} / = (1 − e−η−cE )(1 − e−cE ) e−(b−2)cE , p2 , {i, j} ⊂ {1, . . . , b}; (83) (84) 7 Ŵ {j, k}i = Pr[Yi = 0|Xi = η] Pr[Yj ≥ 1|Xj = 0] · Pr[Yk ≥ 1|Xk = 0] b Y · Pr[Yℓ = 0|Xℓ = 0] (85) = e−η−cE (1 − e−cE )2 e−(b−3)cE , p3 , {i, j, k} ⊂ {1, . . . , b}; (86) (87) ℓ=1 ℓ∈{i,j,k} / Ŵ (?|i) = 1 − p0 − (b − 1)p1 − (b − 1)p2 (b − 1)(b − 2) p3 − 2 , p4 , i ∈ {1, . . . , b}. (88) (89) bp2 2p2 + (b − 2)p3 b = (b − 1)p2 log − (b − 1)p2 2 (b − 1)p2 log log | p2 + b−2 2 p3 p2 {z } (97) (b−2)p (b−2)p =log 1+ 2p 3 ≤ 2p 3 b (b − 1)(b − 2) p3 ≥ (b − 1)p2 log − 2 2 2 b b ≥ (b − 1)p2 log − p3 . 2 2 2 2 (98) (99) We lower-bound the fourth term on the right-hand side of (96) as We now have, irrespective of the distribution of X̃, 1 1 + (b − 1)p1 log p0 p1 1 (b − 1)(b − 2) 1 + (b − 1)p2 log + p3 log p2 2 p3 1 + p4 log . (90) p4 H(Ŷ |X̃) = p0 log Choosing a uniform X̃ (which is optimal as in Section IV), we have the following distribution on Ŷ p0 + (b − 1)p1 , i ∈ {1, . . . , b}; (91) b 2p2 + (b − 2)p3 {i, j} = , {i, j} ⊂ {1, . . . , b}; (92) b PŶ (?) = p4 . (93) PŶ (i) = PŶ side of (96) as Therefore (b − 1)(b − 2) bp3 p3 log 2 2p2 + (b − 2)p3 2p2 + (b − 2)p3 (b − 1)(b − 2) p3 log =− 2 bp | {z 3 } (100) 2(p2 −p3 ) 2p 2(p2 −p3 ) =log 1+ ≤ bp2 ≤ bp bp 3 3 3 (b − 1)(b − 2) 2p2 ≥− · 2 b ≥ −bp2 . (101) (102) Using (34) and (35) we have the upper bound on p2 p2 = (1 − e−η−cE ) (1 − e−cE ) e|−(b−2)cE {z } {z } | {z } | ≤η+cE (103) ≤1 ≤cE ≤ (η + cE)cE, (104) the lower bound on p2 H(Ŷ ) b = p0 + (b − 1)p1 log p0 + (b − 1)p1 b (b − 1)(b − 2) p3 log + (b − 1)p2 + 2 2p2 + (b − 2)p3 1 (94) + p4 log . p4 p2 = (1 − e−η−cE) (1 − e−cE) |e−(b−2)cE {z } | {z } | {z } ≥1−e−η ≥η− η 2 ≥cE− c 2 E2 2 ≥1−bcE 2 η2 c2 E 2 ≥ η− cE − (1 − bcE), 2 2 p3 = |e−η−cE e−cE ) e|−(b−3)cE {z } |(1 −{z {z } } ≤1 I(X̃; Ŷ ) = H(Ŷ ) − H(Ŷ |X̃) bp0 = p0 log p0 + (b − 1)p1 (95) bp1 + (b − 1)p1 log p0 + (b − 1)p1 bp2 + (b − 1)p2 log 2p2 + (b − 2)p3 (b − 1)(b − 2) bp3 + . (96) p3 log 2 2p2 + (b − 2)p3 At this point, note that the first two summands on the righthand side of (96) constitute I(X̃; Ỹ ), which we analyzed in Section IV. It remains to find lower bounds on the last two summands. We lower-bound the third term on the right-hand (106) and the upper bound on p3 2 Using (90) and (94) we have (105) 2 2 ≤c E . ≤cE (107) ≤1 (108) From (96), (99), (102), (104), (106), and (108) we obtain a lower bound on the additional mutual information that is gained by considering output frames with two detection positions: I(X̃; Ŷ ) − I(X̃; Ỹ ) b b2 (109) ≥ (b − 1)p2 log − p3 − bp2 2 2 c2 E 2 b η2 cE − (1 − bcE) log ≥ (b − 1) η − 2 2 2 b2 − c2 E 2 − b(η + cE)cE. (110) 2 8 Dividing the above by η and plugging in (28) we obtain a lower bound on the additional photon efficiency: I(X̃; Ŷ ) − I(X̃; Ỹ ) η c2 E 2 b η (b − 1) cE − (1 − cη) log ≥ 1− 2 2 2 c2 − η − c(η + cE). (111) 2 Adding the right-hand side of (111) to the right-hand side of (29) yields (78). VI. P ROOF OF THE U PPER B OUNDS (20) AND (21) Proposition 3. Assume that E is small enough so that E < e−1 where the supremum is taken over all allowed input distributions. We choose R(·) to be the following distribution: R(y) = D W (·|x)R(·) ∞ X = Poisx+cE (y) · log (112a) (112b) 1 log E1 , y ≥ 1. (120) = e−(x+cE) log − ∞ X (121) 1 1 − (1 + c)E Poisx+cE (y) y=1 (112c) 1 1 − log log − log(1 + c) + 2 + log 13 E E 1 1 log − (1 + c)E + E 1 − (1 + c)E 1 1 c2 + E log log + (1 + c) log . (113) 1 2 E 1− log E1 CPE (E, c) ≤ log Proof of (20) using Proposition 3: The last three summands on the right-hand side of (113) are all of the form o(1). Setting c = 0 in (113) we have (114) (115) Proof of (21) using Proposition 3: From (113) we have 1 1 lim CPE (E, c) − log + log log E↓0 E E ≤ − log(1 + c) + 2 + log 13. (116) Hence 1 1 + log log E E lim lim c→∞ E↓0 log c − log(1 + c) + 2 + log 13 ≤ lim c→∞ log c = −1. 1 log E1 y−1 1 − H(Y |X = x) R(y) y=0 then 1 1 CPE (E, 0) ≤ log − log log + 2 + log 13 + o(1) E E 1 1 = log − log log + O(1). E E (1 + c)E 1 − y=0 For any x ≥ 0 we have 1 − 1−2 e−1 1 e E log < E 1+c 4 1 144 E log < , E (1 + c)2 c)E, 1 − (1 + CPE (E, c) − log ( · log (1 + c)E 1 − 1 log E1 1 log E1 y−1 ) − H(Y |X = x) = e−(x+cE) log (122) ∞ X 1 1 Poisx+cE (y)y log log + 1 − (1 + c)E y=1 E | {z } =E[Y |X=x]=x+cE + ∞ X ! Poisx+cE (y) y=1 | {z =1−e−(x+cE) } 1 1 1 + log 1 (1 + c)E log E 1− 1 log E − H(Y |X = x) 1 1 = e−(x+cE) log + (x + cE) log log 1 − (1 + c)E E + (1 − e−(x+cE)) · log 1 1 1 + log 1 (1 + c)E log E 1− log E1 − H(Y |X = x). · log (123) (124) We can lower-bound H(Y |X = x) as (117) (118) Proof of Proposition 3: Like in [2], we use the duality bound [13] which states that, for any distribution R(·) on the output, the channel capacity satisfies C ≤ sup E D W (·|X)kR(·) , (119) H(Y |X = x) ≥ Hb (e−(x+cE) ) = e−(x+cE)(x + cE) + (1 − e−(x+cE)) log (125) 1 . (126) 1 − e−(x+cE) Using this and (124), we upper-bound D W (·|x)R(·) as 9 ≤ E + log 1 − (1 + c)E 1 − (1 + c)E 1 + (x + cE) log log + (x + cE) log E D W (·|x)R(·) 1 1 + (x + cE) log log 1 − (1 + c)E E + (1 − e−(x+cE) ) ≤ e−(x+cE) log −e · log −(x+cE) 1 1 1 + log 1 (1 + c)E log E 1− 1 log E (x + cE) − (1 − e −(x+cE) ) log 1 1 − e−(x+cE) (127) 1 1 + (x + cE) log log = e−(x+cE) log 1 − (1 + c)E E 1 − e−(x+cE) (x + cE) + (1 − e−(x+cE) ) log 1 | {z } | {z } 1− ≤x+cE ≥cE log E1 {z } | ≥0 1 − e−(x+cE) + (1 − e−(x+cE) ) log (1 + c)E log E1 1 − cE log ≤ |e−(x+cE) {z } 1 − (1 + c)E ≤1 | {z } (128) ≥(1+c)E−cE≥0 + (x + cE) log log 1 + (x + cE) log E (1 − e−(x+cE) ) {z } | + log =(1−e−cE )+(e−cE −e−(x+cE) ) + (e −cE −e −(x+cE) 1− 1 log E1 1 − e−(x+cE) (1 + c)E log E1 1 − cE 1 − (1 + c)E 1 + (x + cE) log log + (x + cE) log E ≤ log 1 (129) 1 1− 1 log E1 1 − e−(x+cE) (1 + c)E log E1 c + (1 − e−cE ) log | {z } (1 + c) log E1 2 | {z } ≥cE− c E 2 (130) 1 1− 1 log E1 + e−cE (1 − e−x ) log 2 ≤− log log + (1 − e−cE ) log | {z } | ≤cE 1 E 1 − e−(x+cE) {zcE } ≤log x+cE cE x ≤ cE 1− 1 log E1 1 − e−(x+cE) + e−cE (1 − e−x ) log (1 + c)E log E1 c2 1 − cE − E 2 log log + x. 2 E (132) Using (119) and (132) together with constraint (3) we have C(E, c) ≤ sup E D W (·|X)kR(·) 1 ≤ E + log − (1 + c)E 1 − (1 + c)E 1 + (1 + c)E log log + (1 + c)E log E (133) 1 1− 1 log E1 1 c2 2 − cE − E log log + E 2 E 1 − e−(X+cE) −cE −X +e sup E (1 − e ) log (134) (1 + c)E log E1 1 = E log log + 2E E 1 − (1 + c)E + log 1 − (1 + c)E 1 1 c2 + E 2 log log + (1 + c)E log 1 2 E 1− log E1 1 − e−(X+cE) −cE −X +e sup E (1 − e ) log . (135) (1 + c)E log E1 Hence the maximum achievable photon efficiency satisfies 1 − e−(x+cE) ) log (1 + c)E log E1 1 − e−(x+cE) + (1 − e−cE ) log (1 + c)E log E1 1 = E + log − (1 + c)E 1 − (1 + c)E 1 + (x + cE) log log + (x + cE) log E 1 (131) CPE (E, c) C(E, c) (136) = E 1 1 1 ≤ log log + 2 + log − (1 + c)E E E 1 − (1 + c)E 2 1 1 c + E log log + (1 + c) log 1 2 E 1− log E1 1 − e−(X+cE) e−cE −X sup E (1 − e ) log . (137) + E (1 + c)E log E1 We next upper-bound the expectation on the right-hand side of (137) as follows 1 − e−(X+cE) E (1 − e−X ) log (1 + c)E log E1 1 − e−X 1 − e−(X+cE) =E X· log (138) X (1 + c)E log E1 X + cE 1 − e−X log ≤E X· (139) X (1 + c)E log E1 ≤ E · sup φ(x) (140) 10 where Plugging (149), (153) and (154) into (143) we have φ(x) , 1−e x −x log x + cE , (1 + c)E log E1 x ≥ 0. (141) We claim that, if E is small enough to satisfy all of the conditions in (112), then 12 sup φ(x) = φ(x ) for some x ≤ . log E1 ∗ ∗ 1 x + cE 1 dφ(x) ≤ − log (155) 1 + x dx 6 (1 + c)E log E 1 1 1 1 ≤− + log(1 + c) − log log x | {z } 6 6 6 E ≥log 12−log log (142) + To show this, we compute the derivative of φ(x) (with respect to x) as −x −1 dφ(x) < 0, dx 12 x> . log E1 (145) (146) Note that this case need not be considered if E ≥ e−12 . In this case the logarithmic term in the derivative is positive: log x x + cE 1 ≥ log (1 + c)E log E (1 + c)E log E1 12 > log 2 (1 + c)E log E1 > 0, (149) where the last step follows because (112a) and (112c) together imply 2 1 1 12 e− 2 E log < . (150) E 1+c Next, using Taylor’s theorem with Cauchy remainder, we have ′ 1 1 (3 − x′ ) e−x 2 (1 + x) e−x − 1 = − + x − x x2 2 3 24{z | } (151) ≥0 1 1 x ≤ − + |{z} 2 3 (152) ≤1 1 ≤− , (153) 6 where x′ ∈ [0, x]. The underbrace in (151) follows because x′ ≤ x < 1. We also have x 1 1 − e−x ≤ ≤ . x(x + cE) x(x + cE) x (154) (157) (158) (159) where the last step follows from (112c). We have now shown dφ(x) < 0, dx 12 < x < 1. log E1 (160) Case 2. Consider x ≥ 1. (161) In this case the logarithmic term in the derivative is still positive: 1 x + cE > 0, 1 ≥ log (1 + c)E log E (1 + c)E log E1 (162) where the last step follows from (112b). We bound the other terms as (1 + x) e−x − 1 1 − 2 e−1 ≤− (163) 2 x x2 which is negative, and (147) (148) 1 E < 0, log 12 < x < 1. log E1 (156) 1 1+c 1 1 1 1 log + log log − log 6 12 3 E 12 E 4 ! 1 1 (1 + c)2 = log E log 12 144 E We do this separately for two cases. Case 1. Consider 1 x |{z} ≤ log dφ(x) = 0. (144) dx To prove (142), it thus suffices to show that, under the assumptions (112), 1 1 log log + 6 E 1 ≤ 12 log −x 1−e x + cE + x(x + cE) (1 + c)E log E1 (143) which exists and is continuous for all x > 0. Further note that it is negative for large enough x, and is positive for small enough x. Hence x∗ must be achieved at a point where (1 + x) e dφ(x) = dx x2 1 E 1 − e−x 1 ≤ 2. x(x + cE) x (164) Using (162), (163) and (164) together with (143) we have dφ(x) 1 1 − 2 e−1 ≤− log(x + cE) + log 1 {z } | dx x2 (1 + c)E log E ≥0 1 + 2 x 1 − 2 e−1 1 ≤− log x2 (1 + c)E log E1 1 − 2 e−1 1 + · x2 1 − 2 e−1 =− < 0, (165) (166) − 1−21e−1 e 1 − 2 e−1 log 2 x (1 + c)E log E1 (167) (168) where the last step follows from (112b). Hence we have shown dφ(x) < 0, x ≥ 1. (169) dx Combining (160) and (169) proves (142) for all E satisfying (112). 11 We now proceed to upper-bound sup φ(x): sup φ(x) = φ(x∗ ) ∗ 1 − e−x x∗ + cE = log ∗ (1 + c)E log E1 | x{z } (140), and is given by (170) (171) ≤1 x∗ + cE (1 + c)E log E1 12 + cE log E1 ≤ max 0, log . (1 + c)E log E1 ≤ max 0, log (172) 1 . log E1 where φ(·) is given in (141) and sup φ(x) is computed numerically. The on-off-keying lower bound is obtained by computing the mutual information of the channel with “on” signal equaling η, and with the receiver ignoring multiple detected photons. This is given by (174) CPE-OOK (E, c) E 1 E Hb (q) − · Hb (e−η−cE ) − 1 − Hb (e−cE ) , ≥ E η η (181) Here (176) follows because conditions (112a) and (112c) imply (150). Combining (137), (140) and (177), and noting e−cE ≤ 1 (together with the fact that the right-hand side of (177) is positive), we obtain CPE (E, c) 1 1 1 log − (1 + c)E ≤ log log + 2 + E E 1 − (1 + c)E c2 1 1 + E log log + (1 + c) log 1 2 E 1− log E1 1 1 + log − 2 log log − log(1 + c) + log 13 (178) E E 1 1 = log − log log − log(1 + c) + 2 + log 13 E E 1 1 log − (1 + c)E + E 1 − (1 + c)E c2 1 1 + E log log + (1 + c) log . (179) 1 2 E 1− log E1 AND (180) (173) We can now continue the chain of inequalities (173) as ) ( 13 (175) sup φ(x) ≤ max 0, log 2 (1 + c)E log E1 13 = log (176) 2 (1 + c)E log E1 1 1 = log − 2 log log − log(1 + c) + log 13. (177) E E VII. N UMERICAL C OMPARISON R EMARKS 1 1 1 log − (1 + c)E ≤ log log + 2 + E E 1 − (1 + c)E 1 1 c2 + E log log + (1 + c) log 1 2 E 1− log E1 + e−cE sup φ(x), Now note that, due to (112b), we have cE ≤ CPE (E, c) C ONCLUDING We numerically compare the approximation (23) with nonasymptotic upper and lower bounds on photon efficiency. Specifically, the plotted upper bound is based on (137) and where Hb (·) denotes the binary entropy function Hb (a) = a log 1 1 + (1 − a) log , a 1−a a ∈ [0, 1], (182) and where E E −η−cE e−cE . + 1− q, e η η (183) The simple-PPM lower bound is computed using (51) and (57): 1 bp0 CPE-PPM (E, c) ≥ p0 log η p0 + (b − 1)p1 bp1 . (184) + (b − 1)p1 log p0 + (b − 1)p1 The soft-decision-PPM lower bound is computed using (96) and is given by CPE-PPM(SD) (E, c) 1 bp0 ≥ p0 log η p0 + (b − 1)p1 bp1 p0 + (b − 1)p1 bp2 + (b − 1)p2 log 2p2 + (b − 2)p3 bp3 (b − 1)(b − 2) . (185) p3 log + 2 2p2 + (b − 2)p3 + (b − 1)p1 log In all these lower bounds, the parameters b and η are chosen as in (26) and (28), and p0 , p1 , p2 , p3 , and p4 are given in (41), (47), (84), (87), and (89), respectively. We plot these bounds for c = 0.1, c = 1, and c = 10 in Figure 1. The figure shows the following. • The approximation (23) is reasonably accurate for small enough E. • The on-off-keying and the soft-decision-PPM bounds are consistently close to the approximation (23). 12 As c increases, the simple-PPM bound deviates significantly from the other lower bounds, and hence also from the actual value of CPE (E, c). The capacity bounds as well as the asymptotic results in this paper provide insights to how technical restrictions and device or channel imperfections influence the communication rates in optical channels in the wideband regime. Our results and techniques can also be applied to secret communication and key distribution over optical channels [14]. It would be interesting to see if these are further extendable, e.g., to multiple-mode optical channels and optical networks. As we have demonstrated, PPM is nearly optimal on the Poisson channel in the high-photon-efficiency regime. When c is small, the simple-PPM super channel has high erasure probability but low “error” (by which we mean the receiver detects a single pulse at a position that is different from the transmitted signal) probability. In this case, Reed-Solomon codes can perform rather close to the theoretical limit. However, when c > 1, Reed-Solomon codes can no longer achieve any positive rates on this channel. Nevertheless, we believe that, for c > 1, PPM still has its advantages over on-off keying in terms of code design. This is because the optimal input distribution for (both simple and soft-decision) PPM is uniform, whereas the optimal input distribution for on-off keying for this channel is highly skewed. The uniformity of PPM inputs allows the usage of more structured codes, in particular linear codes. One possible direction, for instance, is to employ the idea of multilevel codes [15], [16] on this channel. • 20 Photon Efficiency (nats/photon) Upper bound 18 Approximation (23) On-off keying 16 Soft-decision PPM 14 Simple PPM 12 10 8 6 4 2 0 −8 10 −7 10 −6 10 −5 E −4 10 10 −3 10 (a) Case c = 0.1. Note that the lowest three curves almost overlap. Photon Efficiency (nats/photon) 20 Upper bound 18 Approximation (23) On-off keying 16 Soft-decision PPM 14 Simple PPM 12 10 8 6 4 2 0 −8 10 −7 10 −6 10 −5 E −4 10 10 −3 10 (b) Case c = 1. The on-off keying and soft-decision PPM curves overlap. A PPENDIX In this appendix we provide a proof for the converse part of (8) that is much simpler than the one given in [2]. Using the same arguments as in [2], we know that C(E, c) is monotonically increasing in c. Hence it suffices to show C(E, 0) ≤ 1. (186) lim E↓0 E log 1 E This can be verified as follows: C(E, 0) = max I(X; Y ) E[X]≤E Photon Efficiency (nats/photon) 20 Upper bound 18 On-off keying 16 (188) = max H(Y ) (189) = (1 + E) log(1 + E) − E log E, (190) E[Y ]≤E Soft-decision PPM 14 ≤ max H(Y ) E[X]≤E Approximation (23) Simple PPM (187) 12 which yields the desired asymptotic bound. Here, (190) follows from the well-known fact that, for a nonnegative discrete random variable under a first-moment constraint, maximum entropy is achieved by the geometric distribution, and is given by the right-hand side of (190). Note that the right-hand side of (190) is exactly the capacity of the pure-loss bosonic channel [7]. 10 8 6 4 2 0 −8 10 −7 10 −6 10 −5 E 10 −4 10 −3 10 (c) Case c = 10. Fig. 1. Comparing the approximation (23) to nonasymptotic upper bound, and to lower bounds for simple PPM, soft-decision PPM, and on-off keying for three cases: c = 0.1, c = 1, and c = 10. ACKNOWLEDGMENTS The authors thank Hye Won Chung, Yuval Kochman, Amos Lapidoth, Arya Mazumdar, and Lizhong Zheng for helpful discussions, and the anonymous reviewers for their careful reading of the manuscript. 13 R EFERENCES [1] J. H. Shapiro, “The quantum theory of optical communications,” IEEE J. Select. Top. Quantum Electron., vol. 15, no. 6, pp. 1547–1569, 2009. [2] A. Lapidoth, J. H. Shapiro, V. Venkatesan, and L. Wang, “The discretetime Poisson channel at low input powers,” IEEE Trans. Inform. Theory, vol. 57, no. 6, pp. 3260–3272, June 2011. [3] J. Pierce, “Optical channels: practical limits with photon counting,” IEEE Trans. Commun., vol. 26, no. 12, pp. 1819–1821, Dec. 1978. [4] J. L. Massey, “Capacity, cutoff rate, and coding for direct-detection optical channel,” IEEE Trans. Commun., vol. 29, no. 11, pp. 1615–1621, Nov. 1981. [5] S. Shamai (Shitz), “Capacity of a pulse amplitude modulated direct detection photon channel,” in Proc. IEE, vol. 137, pt. I (Communications, Speech and Vision), no. 6, Dec. 1990, pp. 424–430. [6] A. Lapidoth and S. M. Moser, “On the capacity of the discrete-time Poisson channel,” IEEE Trans. Inform. Theory, vol. 55, no. 1, pp. 303– 322, Jan. 2009. [7] V. Giovannetti, S. Guha, S. Lloyd, L. Maccone, J. H. Shapiro, and H. P. Yuen, “Classical capacity of the lossy bosonic channel: the exact solution,” Phys. Rev. Lett., vol. 92, no. 2, p. 027902(4), 2004. [8] R. G. Gallager, “Energy limited channels: Coding, multiaccess, and spread spectrum,” Laboratory for Information and Decision Systems, MIT, Tech. Rep. 1714, Nov. 1987. [9] S. Verdú, “On channel capacity per unit cost,” IEEE Trans. Inform. Theory, vol. 36, pp. 1019–1030, Sept. 1990. [10] H. W. Chung, S. Guha, and L. Zheng, “On capacity of optical channels with coherent detection,” in Proc. IEEE Int. Symp. Inform. Theory, Saint Petersburg, Russia, July 31–August 5 2011. [11] Y. Kochman and G. W. Wornell, “On high-efficiency optical communication and key distribution,” in Proc. Inf. Theory and Appl. Workshop, San Diego, CA, USA, Feb. 5–10 2012. [12] S. Dolinar, K. M. Birnbaum, B. M. Erkmen, and B. Moision, “On approaching the ultimate limit of photon-efficient and bandwidth efficient optical communication,” in Int. Conf. Space Optical Sys.and App., Santa Monica, CA, USA, May 11—13 2011. [13] A. Lapidoth and S. M. Moser, “Capacity bounds via duality with applications to multiple-antenna systems on flat fading channels,” IEEE Trans. Inform. Theory, vol. 49, no. 10, pp. 2426–2467, Oct. 2003. [14] Y. Kochman, L. Wang, and G. W. Wornell, “Toward photon-efficient key distribution over optical channels,” 2013, subm. to IEEE Trans. Inform. Theory. [15] H. Imai and S. Hirakawa, “A new multilevel coding method using error correcting codes,” IEEE Trans. Inform. Theory, vol. 23, pp. 371–377, 1977. [16] U. Wachsmann, R. F. H. Fischer, and J. B. Huber, “Multilevel codes: Theoretical concepts and practical design rules,” IEEE Trans. Inform. Theory, vol. 45, no. 5, pp. 1361–1391, July 1999.