Asynchronous Convolutional-Coded Physical-Layer Network Coding

Qing Yang, Student Member, IEEE, and Soung Chang Liew, Fellow, IEEE

Abstract—This paper investigates the decoding process of asynchronous convolutional-coded physical-layer network coding (PNC) systems. Specifically, we put forth a layered decoding framework for convolutional-coded PNC consisting of three layers: a symbol-realignment layer, a codeword-realignment layer, and a joint channel-decoding network coding (Jt-CNC) decoding layer. Our framework can deal with phase asynchrony (phase offset) and symbol arrival-time asynchrony (symbol misalignment) between the signals simultaneously transmitted by multiple sources. A salient feature of this framework is that it can handle both fractional and integral symbol misalignments. For the decoding layer, previously proposed PNC decoding algorithms (e.g., XOR-CD and reduced-state Viterbi algorithms) can also be used with our framework instead of Jt-CNC to deal with general symbol misalignments. Our Jt-CNC algorithm, based on belief propagation (BP), is BER-optimal for synchronous PNC and near-optimal for asynchronous PNC. Extending beyond convolutional codes, we further generalize the Jt-CNC decoding algorithm to all cyclic codes. Our simulations show that Jt-CNC outperforms the previously proposed XOR-CD algorithm and reduced-state Viterbi algorithm by 2 dB for synchronous PNC. For both phase-asynchronous and symbol-asynchronous PNC, Jt-CNC performs better than the other two algorithms. Importantly, for real wireless network experimentation, we implemented our decoding algorithm in a PNC prototype built on the USRP software radio platform. Our experiment shows that the proposed Jt-CNC decoder works well in practice.

Keywords—Physical-layer network coding; convolutional codes; symbol misalignment; phase offset; joint channel-decoding and network coding; cyclic codes.

I. INTRODUCTION

This paper investigates the use of convolutional codes in asynchronous physical-layer network coding (PNC) systems to ensure reliable communication. In particular, we focus on the decoding problem when simultaneous signals from multiple transmitters arrive at a PNC receiver with asynchronies between them. PNC was first proposed in [1] as a way to exploit network coding [2], [3] at the physical layer. In the simplest PNC setup, two users exchange information via a relay in a two-way relay network (TWRN). The two users transmit their messages simultaneously to the relay; the relay then maps the overlapped signals to a network-coded message and broadcasts it to the two users; and each user recovers the other user's message from the network-coded message and its knowledge of its own message. PNC can potentially boost the throughput of TWRN by 100% compared with a traditional relay system [1].

The authors are with the Department of Information Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong (email: {yq010, soung}@ie.cuhk.edu.hk). This work is partially supported by AoE grant E-02/08 and the General Research Funds Project No. 414911, established under the University Grants Committee of the Hong Kong Special Administrative Region, China. This work is also partially supported by the China 973 Program, Project No. 2012CB315904, and the China NSFC grant, Project No. 61271277.

Our paper focuses on PNC decoding as applied to TWRN. To ensure reliable transmission, communication systems use channel coding to protect information from noise and fading. In channel-coded PNC, the goal of the relay is to decode the simultaneously received signals not into the individual messages of the two users, but into a network-coded message. This process is referred to as the channel-decoding network coding (CNC) process in [4].
In addition to channel coding, there is the issue that, in practice, the signals from the two users may be asynchronous: there may be a relative symbol arrival-time asynchrony (symbol misalignment), a phase asynchrony (phase offset), and other asynchronies between the two signals received at the relay. Such PNC systems are referred to as asynchronous PNC (APNC) systems [5]. Both [4] and [5] assume the use of repeat-accumulate (RA) codes. Our paper, on the other hand, focuses on convolutional codes. A main motivation is that convolutional codes are commonly adopted in communication systems (e.g., the channel code in IEEE 802.11 is a convolutional code [6]). Convolutional codes have been well studied, and there are many good designs for their encoding and decoding in the conventional communication setting. Given this backdrop, whether these designs are still applicable to PNC, and what additional considerations and modifications PNC requires, are issues of great interest. This paper is an attempt to address these issues. Our main contributions are as follows:
• We put forth a layered decoding framework for asynchronous PNC systems. The framework can handle synchronous PNC as well as asynchronous PNC with relative phase offset and general symbol misalignment. By general symbol misalignment, we mean that the arrival times of the two users' signals at the relay are offset by $(\tau_I + \tau_F)$ symbol durations, where $\tau_I$ is an integral offset and $\tau_F$ is a fractional offset smaller than one. With our framework, previously proposed decoding algorithms can also be used for asynchronous PNC.
• We design a joint channel-decoding network coding (Jt-CNC) decoder for convolutional-coded PNC. The Jt-CNC decoder, based on belief propagation (BP), is optimal in terms of bit error rate (BER) for synchronous PNC. We analyze the BER of our Jt-CNC decoder mathematically and derive an approximate expression for it.
• We implement the Jt-CNC decoder in a real PNC system built on the USRP software radio platform. Our experiment shows that the Jt-CNC decoder works well over real wireless channels.
• Building on insights from our study of convolutional-coded PNC, we propose an algorithm that handles general symbol misalignment in cyclic-coded PNC; the algorithm is applicable to all cyclic codes, not just convolutional codes.

The remainder of this paper is organized as follows. Section II overviews related work. Section III describes the PNC system model. Section IV puts forth our Jt-CNC framework, focusing on synchronous PNC. Section V extends the Jt-CNC framework to asynchronous PNC; the Appendix further shows how the framework applies to general cyclic-coded PNC. Section VI presents simulation and experimental results together with the BER analysis of the Jt-CNC decoder. Section VII concludes this work.

II. RELATED WORK

A. Synchronous PNC with Convolutional Codes

The first implementation of a TWRN based on the principle of PNC was recently reported in [7], [8]. This system employs the convolutional code defined in the 802.11 standard and adopts OFDM modulation to eliminate symbol misalignment [9]. In [7], [8], the log-likelihood ratio (LLR) of the XORed channel-coded bits is computed first; this soft information is then fed to a conventional Viterbi decoder. We refer to this decoding strategy as the soft XOR and channel decoding (XOR-CD) scheme [10]. A detailed explanation and interpretation of the XOR-CD algorithm is given in Appendix III-A. The experiment shows that XOR-CD, thanks to its simplicity, is feasible and practical for convolutional-coded PNC systems.
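The first XOR-CD step, computing the LLR of each XORed channel-coded bit from a single received sample, can be sketched as follows. This is a minimal illustration, not the implementation of [7], [8]: it assumes BPSK, perfect CSI, equiprobable bits, the bit mapping 0 to +1 and 1 to −1, and the synchronous sample model $y_n = h^A x^A_n + h^B x^B_n + w_n$. The function name `xor_llr` is hypothetical. The code structure is deliberately ignored, which is exactly the information loss attributed to XOR-CD.

```python
import numpy as np


def xor_llr(y, h_a, h_b, sigma):
    """Symbol-by-symbol soft PNC mapping (first step of XOR-CD) for BPSK.

    For each sample y_n = h_a x^A_n + h_b x^B_n + w_n, returns
    log Pr(c^A xor c^B = 0 | y_n) - log Pr(c^A xor c^B = 1 | y_n),
    assuming equiprobable bits and the mapping 0 -> +1, 1 -> -1.
    """
    y = np.asarray(y, dtype=complex)

    def log_lik(xa, xb):
        # Gaussian log-likelihood of one symbol pair, up to a common constant.
        return -np.abs(y - h_a * xa - h_b * xb) ** 2 / (2 * sigma ** 2)

    same = np.logaddexp(log_lik(+1, +1), log_lik(-1, -1))  # XOR bit = 0
    diff = np.logaddexp(log_lik(+1, -1), log_lik(-1, +1))  # XOR bit = 1
    return same - diff


# Usage: one noiseless sample for each of the four symbol pairs.
h_a, h_b = 1.0, 0.8 * np.exp(1j * 0.9)
pairs = [(+1, +1), (-1, -1), (+1, -1), (-1, +1)]
y = np.array([h_a * xa + h_b * xb for xa, xb in pairs])
llr = xor_llr(y, h_a, h_b, sigma=0.5)  # first two positive, last two negative
```

In XOR-CD, these LLRs would then be deinterleaved and passed to a conventional soft-input Viterbi decoder. When $h^A = h^B$, the pairs (+1, −1) and (−1, +1) both produce $y_n = 0$ without noise, yet they map to the same XOR bit, which is one reason the relay targets the XOR rather than the individual bits.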
The acronym XOR-CD refers to a two-step process. First, prior to channel decoding and without considering the correlations that the channel code induces among the received symbols, we apply symbol-by-symbol PNC mapping to the received symbols to obtain estimates of the successive XORed bits. Then, we perform channel decoding on the XORed bits to obtain the XORed source bits. XOR-CD is suboptimal because the PNC mapping in the first step loses information [4]. Furthermore, only linear channel codes can be correctly decoded in the second step. Jt-CNC, on the other hand, performs channel decoding and network coding as one integrated process rather than two disjoint steps. Jt-CNC can be ML (maximum-likelihood) optimal, depending on which variant of Jt-CNC is used and whether the underlying PNC system is synchronous or asynchronous. Within the class of Jt-CNC algorithms, there are two possible decoding targets for optimality: (i) the ML XORed packet; (ii) the ML XORed bits. By analogy, in conventional single-user point-to-point communication with convolutional codes, the Viterbi algorithm [11] obtains the ML packet, while the BCJR algorithm [12] obtains the ML bits. In PNC systems, the aim is to obtain the network-coded packet or the network-coded bits instead. A Jt-CNC algorithm for finding the ML XORed packet was proposed in [13]. However, as discussed later, finding the ML XORed packet requires an exhaustive search whose complexity can be prohibitively high. Therefore, [13] adopts the log-max approximation and simplifies (approximates) the ML algorithm with a full-state Viterbi (FSV) algorithm. A detailed explanation and interpretation of the full-state Viterbi algorithm is given in Appendix III-B. The term "full-state" comes from the fact that this algorithm combines the trellises of both end nodes into a virtual decoder.
By searching for the best path on the combined trellis with the Viterbi algorithm, [13] attempts to decode the ML pair of packets of the two end nodes. To further reduce complexity, [13] simplifies the full-state Viterbi algorithm to a "reduced-state" Viterbi algorithm. Reference [13], however, did not benchmark its approximate algorithm against the optimal one. As we will show later, the algorithm proposed in this paper can outperform that of [13]. In this paper, we aim to find the ML XORed bits within the source packet rather than the overall ML XORed packet. In Section IV, we show that our algorithm is XOR bit-optimal for synchronous PNC. Finding the ML XORed bits turns out to have much lower complexity than finding the ML XORed packet. This is quite different from conventional point-to-point communication, in which the simple Viterbi algorithm can decode the ML packet and BCJR (slightly more complex than the Viterbi algorithm) can decode the ML bits. XOR bit-optimal decoding for synchronous PNC was investigated in our previously published work [14].

B. Asynchronous PNC with Convolutional Codes

In asynchronous PNC systems, the signals from the two end nodes may arrive at the relay with symbol misalignment and relative phase offset [5]. To the best of our knowledge, no existing Jt-CNC decoder for convolutional codes can deal with integral-plus-fractional symbol misalignment. In [15], a convolutional decoding scheme based on the XOR-CD algorithm was proposed to deal with integral symbol misalignment. As pointed out in [15], symbol misalignment entangles the channel-coded bits of the trellises of the two encoders in such a way that ordinary Viterbi decoding, based on just one of the trellises, is not applicable. Therefore, the XOR-CD algorithm for synchronous PNC can no longer be applied in the presence of integral symbol misalignment.
The solution in [15] is to rearrange the transmit order of the channel-coded bits into blocks and pad $D_{\max}$ zeros between adjacent blocks. The zero padding acts as a guard interval between blocks that prevents the entanglement of channel-coded bits and facilitates Viterbi decoding. However, this scheme can only deal with integral symbol misalignment of at most $D_{\max}$ symbols. In addition, it incurs a code-rate loss due to the zero padding between blocks.

C. Asynchronous PNC with Other Channel Codes

The use of LDPC codes in asynchronous PNC systems has been considered previously. In [5], the authors designed a Jt-CNC decoder for the RA code that can deal with fractional symbol misalignment (i.e., symbol misalignment of less than one symbol duration) and phase offset. Our decoding framework adopts the oversampling technique proposed in [5] to address fractional symbol misalignment. To deal with asynchrony in PNC, our decoding framework consists of three layers: a symbol-realignment layer, a codeword-realignment layer, and a joint channel-decoding network coding (Jt-CNC) layer. The first two layers, symbol realignment and codeword realignment, counter fractional and integral symbol misalignments, respectively; the third layer, Jt-CNC, decodes the ML XORed bits. Other decoding schemes (e.g., XOR-CD and reduced-state Viterbi) can also be used in the third layer of the framework. We further show that our decoding framework is applicable not only when convolutional codes are adopted but also when general cyclic codes are used. Besides convolutional codes, an important class of cyclic codes is cyclic LDPC codes. The Jt-CNC LDPC decoder proposed in [5] was extended in [16] to deal with general asynchrony using cyclic LDPC codes. However, the decoder proposed in [16] discards the non-overlapped part of the received signal, losing useful information that could potentially enhance performance.
Therefore, for the decoder in [16], the larger the symbol misalignment, the worse the performance. By contrast, our framework makes full use of the non-overlapped portion of the signal, so that a larger symbol misalignment can enhance performance.

D. Two-Way Relay Network with Other Techniques

The goal of the Jt-CNC decoder proposed in this paper is to decode the XOR of the end nodes' messages in the uplink phase. The relay then broadcasts the XORed message to both end nodes in the downlink phase. Each end node XORs the XORed message with its own message to obtain the other end node's message. For two-way relay networks, there are rate-improving techniques other than PNC, such as superposition coding and hierarchical modulation [17], [18]. Both superposition coding and hierarchical modulation are downlink techniques. If they are used, the relay has to decode the individual messages $U^A$ and $U^B$, rather than their XOR, in the uplink. With FSV and Jt-CNC, we can also obtain the individual messages and then use superposition coding or hierarchical modulation, rather than network coding (NC), in the downlink. With both superposition coding and hierarchical modulation, however, the end nodes do not use their self-information when decoding in the downlink. In other words, some available information is not exploited. As a result, the overall achievable rates (from an information-theoretic viewpoint) are not as good as those achieved by the NC schemes.

III. SYSTEM MODEL

We consider the application of PNC in a two-way relay network (TWRN), as shown in Fig. 1. In this model, nodes A and B exchange information with the help of relay node R. We assume that all nodes are half-duplex and that there is no direct link between A and B. With PNC, nodes A and B exchange one packet with each other in two time slots. The first time slot corresponds to the uplink phase, in which nodes A and B transmit their channel-coded packets simultaneously to relay R.
Relay R then constructs a network-coded packet based on the simultaneously received signals from A and B. This operation is referred to as the channel-decoding network coding (CNC) process [10], because the received signals are decoded into a network-coded message rather than the individual messages from A and B. The second time slot corresponds to the downlink phase, in which relay R channel-encodes the network-coded message and broadcasts it to both A and B. Upon receiving the network-coded packet, A (B) attempts to recover the original packet transmitted by B (A) in the uplink phase using its self-information [1]. This paper focuses on the design of the CNC algorithm in the uplink phase; the downlink phase is similar to conventional point-to-point transmission and does not require special treatment [10].

As shown in Fig. 1, in the uplink phase, the source packets of nodes A and B each go through a convolutional encoder, an interleaver, and a modulator. Our framework can accommodate all types of convolutional codes; in the main text of this paper we adopt zero-tail convolutional codes.¹ We denote the source packets of nodes A and B by two K-bit binary sequences:

$$U^i = \left(u^i_1, u^i_2, \ldots, u^i_K\right), \quad i \in \{A, B\}, \tag{1}$$

where $u^i_k$ is the input bit of end node i's source packet at time k. The source packets are encoded into two M-bit channel-coded binary sequences. We assume that nodes A and B use the same convolutional code with code rate r. In the following presentation, we take r = 1/R, where R is an integer, as an example.

¹ The use of other convolutional codes (e.g., tail-biting [19] and recursive codes) is discussed in Appendix II.

Fig. 1. System model of two-way relay network operated with physical-layer network coding (uplink: encoder, interleaver Π(·), and modulator M(·) at nodes A and B; demodulator M⁻¹(·), deinterleaver Π⁻¹(·), and decoder C⁻¹(·) at relay R, which outputs Û^R).
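The uplink transmit chain of Fig. 1 (encoder, block interleaver, modulator) can be sketched for R = 3 as follows. This is a minimal illustration under stated assumptions: the generator polynomials (octal 7, 5, 3), all helper names, and the BPSK bit mapping are hypothetical choices not taken from the paper, and the zero tail is assumed to occupy the last MEM of the K source bits so that M = 3K. The interleaver implements the grouping of the jth coded bits into blocks given in (3).

```python
import numpy as np

G = (0b111, 0b101, 0b011)  # hypothetical generators (octal 7, 5, 3): rate r = 1/3
MEM = 2                    # encoder memory; zero-tail termination needs MEM zero bits


def conv_encode(u):
    """Feedforward convolutional encoder returning [c_{1,1}, c_{1,2}, c_{1,3}, c_{2,1}, ...].

    Zero-tail assumption: the last MEM source bits are zeros, so M = 3K.
    """
    assert all(b == 0 for b in u[-MEM:]), "zero tail expected"
    state, out = 0, []
    for bit in u:
        reg = (bit << MEM) | state  # shift register with the newest bit in the MSB
        out.extend(bin(reg & g).count("1") % 2 for g in G)  # parity of tapped bits
        state = reg >> 1            # keep the MEM most recent input bits
    return out


def block_interleave(c, R=3):
    """Permutation (3): block j collects the j-th coded bit of every time step."""
    c = np.asarray(c)
    return np.concatenate([c[j::R] for j in range(R)])


def block_deinterleave(ct, R=3):
    """Inverse of block_interleave."""
    return np.asarray(ct).reshape(R, -1).T.reshape(-1)


def bpsk(bits):
    return 1.0 - 2.0 * np.asarray(bits, dtype=float)  # assumed mapping 0 -> +1, 1 -> -1


# Usage: K = 6 source bits, the last two forming the zero tail.
u = [1, 0, 1, 1, 0, 0]
c = conv_encode(u)             # M = 18 coded bits in the order of (2)
x = bpsk(block_interleave(c))  # N = 18 BPSK symbols in the order of (3)
```

Since A and B apply the same permutation, the relay can undo it with a single deinterleaver, which is the role of Π⁻¹(·) in Fig. 1.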
For concreteness, let us consider R = 3; thus, M = 3K. The two channel-coded packets are

$$C^i = \left(c^i_1, c^i_2, \ldots, c^i_M\right) = \left(\bar c^i_1, \bar c^i_2, \ldots, \bar c^i_K\right) = \left(c^i_{1,1}, c^i_{1,2}, c^i_{1,3}, c^i_{2,1}, c^i_{2,2}, \ldots, c^i_{K,1}, c^i_{K,2}, c^i_{K,3}\right), \quad i \in \{A, B\}, \tag{2}$$

where $c^i_{k,j}$ is the jth channel-coded bit of end node i's channel-coded packet at time k, and the 3-bit tuple $\bar c^i_k = (c^i_{k,1}, c^i_{k,2}, c^i_{k,3})$ is the output of node i's convolutional encoder at time k. Then, $C^A$ and $C^B$ are fed into their respective block interleavers, which realize the same permutation, to produce

$$\tilde C^i = \left(c^i_{1,1}, c^i_{2,1}, \ldots, c^i_{K,1};\; c^i_{1,2}, \ldots, c^i_{K,2};\; c^i_{1,3}, \ldots, c^i_{K,3}\right), \quad i \in \{A, B\}. \tag{3}$$

After the permutation, the jth coded bits of all the source bits are grouped into one block, giving three blocks altogether. Finally, $\tilde C^i$ are modulated to produce two sequences of N complex symbols:

$$X^i = \left(x^i_1, x^i_2, \ldots, x^i_N\right), \quad i \in \{A, B\}. \tag{4}$$

Throughout this paper, we focus on BPSK and QPSK modulation; our framework can easily be extended to higher-order constellations [20]–[22]. For BPSK, N = 3K and $x^i_n \in \{1, -1\}$. For QPSK, N = 3K/2 and $x^i_n \in \frac{1}{\sqrt{2}}\{1+j, -1+j, 1-j, -1-j\}$. The complex symbol sequences $X^A$ and $X^B$ are shaped with a pulse-shaping function p(t) of symbol duration T and transmitted. Without loss of generality, we assume that p(t) is the rectangular pulse throughout this paper. Let $h^A$ and $h^B$ denote the coefficients of the channels from node A and node B to relay R, respectively. Both are complex numbers, and their phase difference $\phi = \angle(h^B/h^A)$ is the relative phase offset between node A and node B. We assume that the channel state information (CSI) $h^A$ and $h^B$ can be estimated at relay R using preambles. Nodes A and B use different pseudo-noise (PN) sequences with good cross-correlation properties (e.g., Gold sequences) as their preambles. Upon receiving the superposed packet, the relay cross-correlates the received signal with node A's preamble to estimate the channel coefficient $h^A$; because the two nodes use different PN sequences, the cross-correlation suppresses the influence of node B's signal. The relay estimates $h^B$ in the same way. The received complex baseband signal at the relay is

$$y^R(t) = \sum_{n=1}^{N} \left[ h^A x^A_n\, p(t - nT) + h^B x^B_n\, p(t - nT - \tau T) \right] + w^R(t), \tag{5}$$

where $\tau T$ is the symbol misalignment (i.e., the signal of B arrives $\tau T$ later than the signal of A). The relay can estimate the symbol misalignment using the two PN preambles. First, the relay cross-correlates the received signal with node A's preamble to locate the first sample of A's packet; it then cross-correlates the received signal with node B's preamble to locate the first sample of B's packet; finally, it takes the difference of the two positions to estimate τ. Thanks to the good cross-correlation properties of the PN preambles, this method works even if the end nodes' preambles partially overlap. The noise term $w^R(t)$ is assumed to be circularly symmetric complex Gaussian with variance $\sigma^2$. We decompose the symbol misalignment into an integral part $\tau_I \in \mathbb{N}$ and a fractional part $\tau_F \in [0, 1)$, so that $\tau = \tau_I + \tau_F$.

IV. SYNCHRONOUS CONVOLUTIONAL-CODED PNC

Let us first focus on synchronous convolutional-coded PNC, in which the signals of nodes A and B are symbol-aligned (τ = 0); Section V discusses the asynchronous case. We first derive the XOR packet-optimal Jt-CNC algorithm, which aims to find the ML XORed source packet, and show that it has prohibitively high complexity. We then introduce our XOR bit-optimal Jt-CNC algorithm, which finds the ML XORed bits with much lower complexity.

A. XOR Packet-Optimal Decoding of Synchronous Convolutional-Coded PNC

In synchronous convolutional-coded PNC, the received baseband signal at relay R is obtained by setting the symbol misalignment τ to zero in (5):

$$y^R(t) = \sum_{n=1}^{N} \left[ h^A x^A_n\, p(t - nT) + h^B x^B_n\, p(t - nT) \right] + w^R(t). \tag{6}$$

After matched filtering [5], the received baseband samples at relay R are

$$Y^R = \left(y^R_1, y^R_2, \ldots, y^R_N\right), \tag{7}$$

where

$$y^R_n = h^A x^A_n + h^B x^B_n + w^R_n. \tag{8}$$

The ML XORed source packet $\hat U^R = (\hat u^R_1, \hat u^R_2, \ldots, \hat u^R_K)$ (i.e., the ML estimate of the XOR of the source packets of nodes A and B) is given by

$$\hat U^R = \arg\max_{U^R} \sum_{U^A, U^B:\; U^A \oplus U^B = U^R} \exp\!\left(-\mathcal{M}\left(X^A, X^B\right)\right), \tag{9}$$

where ⊕ denotes the bitwise XOR operator; $X^A$ and $X^B$ are the convolutionally encoded and modulated baseband signals of $U^A$ and $U^B$, respectively; and $\mathcal{M}(X^A, X^B)$ is the distance metric

$$\mathcal{M}\left(X^A, X^B\right) = \sum_{n=1}^{N} \frac{\left| y^R_n - h^A x^A_n - h^B x^B_n \right|^2}{2\sigma^2} = \frac{\left\| Y^R - h^A X^A - h^B X^B \right\|_2^2}{2\sigma^2}. \tag{10}$$

For source packets $U^A$ and $U^B$ of length K, the functional mapping from $(U^A, U^B)$ to the XORed source packet $U^R$ can be expressed as

$$f_{\text{packet}}: \{0,1\}^K \times \{0,1\}^K \to \{0,1\}^K. \tag{11}$$

The mapping in (11) is $2^K$-to-1; that is, $2^K$ possible pairs $(U^A, U^B)$ produce any particular $U^R$. This is where the complexity of (9) lies. For each possible source packet $U^R$, we need to examine $2^K$ combinations of $(U^A, U^B)$, each associated with one pair of channel-coded signals $(X^A, X^B)$. The Viterbi algorithm is a shortest-path algorithm that finds a single path in the joint trellis of $(U^A, U^B)$, whereas each $U^R$ is associated with $2^K$ paths in the trellis whose contributions must be summed. There is no known exact method for computing (9) other than exhaustively summing over all combinations of $(U^A, U^B)$ for each $U^R$.

We now consider the computational complexity of the XOR packet-optimal decoding algorithm. For each possible $(U^A, U^B)$, we need to sum N terms in (10) to compute $\mathcal{M}(X^A, X^B)$.
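To make the cost of (9) concrete, the exhaustive search can be sketched for a toy code: the code enumerates every pair $(U^A, U^B)$, evaluates the metric (10), and accumulates $\exp(-\mathcal{M})$ (in the log domain, to avoid underflow) into the score of $U^R = U^A \oplus U^B$. The rate-1/3 generators, the omission of the zero tail, the BPSK mapping, and all names are illustrative assumptions. The point is the count of metric evaluations, which grows as $2^{2K} = 4^K$.

```python
import numpy as np
from itertools import product

G = (0b111, 0b101, 0b011)  # hypothetical rate-1/3 code with memory 2
MEM = 2


def encode_bpsk(u):
    """Convolutionally encode u (no zero tail here) and map bits 0/1 to +1/-1."""
    state, out = 0, []
    for bit in u:
        reg = (bit << MEM) | state
        out.extend(bin(reg & g).count("1") % 2 for g in G)
        state = reg >> 1
    return 1.0 - 2.0 * np.array(out, dtype=float)


def xor_packet_ml(y, h_a, h_b, sigma, K):
    """Brute-force evaluation of (9): returns (estimated U^R, number of metric evaluations)."""
    words = {u: encode_bpsk(u) for u in product((0, 1), repeat=K)}
    log_score, evaluations = {}, 0
    for ua, xa in words.items():
        for ub, xb in words.items():
            metric = np.sum(np.abs(y - h_a * xa - h_b * xb) ** 2) / (2 * sigma ** 2)  # (10)
            ur = tuple(a ^ b for a, b in zip(ua, ub))
            log_score[ur] = np.logaddexp(log_score.get(ur, -np.inf), -metric)
            evaluations += 1
    return max(log_score, key=log_score.get), evaluations


# Usage: noiseless superposition of two K = 4 packets.
h_a, h_b = 1.0, 0.8 * np.exp(1j * 0.9)
ua, ub = (1, 0, 1, 1), (0, 1, 1, 0)
y = h_a * encode_bpsk(ua) + h_b * encode_bpsk(ub)
ur_hat, n_eval = xor_packet_ml(y, h_a, h_b, sigma=0.3, K=4)  # n_eval = 4**4 = 256
```

Doubling K from 4 to 8 raises the number of evaluations from 256 to 65,536, which illustrates why this search is impractical for realistic packet lengths.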
For a rate-r code and M-QAM modulation, $N = K/[r \log_2(M)]$. Computing each term in (10) takes six complex operations, and the summation takes N − 1 operations. Hence, the complexity for one combination of $(U^A, U^B)$ is $7K/[r \log_2(M)] - 1$. Moreover, finding the maximum in (9) requires $2^K - 1$ comparisons. Since there are $2^K$ possible $U^R$, from which we want to find the optimal $\hat U^R$, and each requires $2^K$ combinations of $(U^A, U^B)$, the overall complexity is $2^{2K}\left(7K/[r \log_2(M)] - 1\right) + 2^K - 1$. In Big-O notation, the complexity is $O(K 2^{2K})$. This contrasts sharply with regular point-to-point communication, in which the Viterbi algorithm finds the ML codeword with only polynomial complexity. For PNC systems, the complexity of the XOR packet-optimal decoding algorithm is exponential in the packet length K.

B. XOR Bit-Optimal Decoding of Synchronous Convolutional-Coded PNC

To reduce complexity, we consider an XOR bit-optimal Jt-CNC decoder based on the belief propagation (BP) framework. The proposed decoder aims to find the ML XORed source bits rather than the ML XORed source packet. We establish two important results: (i) the proposed Jt-CNC decoder is optimal in terms of BER performance; and (ii) its complexity is linear in the packet length K. Unlike finding ML XORed packets, for which the Viterbi algorithm is of little use, the BP algorithm (similar to BCJR) can find the ML XORed source bits without exponential growth in complexity. We first explain why before describing the BP algorithm in detail. The kth ML XORed source bit $\hat u^R_k$, k = 1, 2, ..., K, is given by

$$\hat u^R_k = \arg\max_{u^R_k} \sum_{u^A_k, u^B_k:\; u^A_k \oplus u^B_k = u^R_k} \Pr\!\left(u^A_k, u^B_k \mid Y^R, \mathcal{C}\right), \tag{12}$$

where $\Pr(u^A_k, u^B_k \mid Y^R, \mathcal{C})$ denotes the a posteriori probability of $(u^A_k, u^B_k)$ given the received signal $Y^R$ and the codebook $\mathcal{C}$. We can use the BP algorithm to calculate this probability. Fortunately, finding the ML XORed bits in PNC systems has much lower complexity, because the functional mapping from $(u^A_k, u^B_k)$ to $u^R_k$ can be expressed as

$$f_{\text{bit}}: \{0,1\} \times \{0,1\} \to \{0,1\}. \tag{13}$$

The mapping in (13) is 2-to-1; hence, for each possible realization of the XOR bit $u^R_k = u^A_k \oplus u^B_k$, we need to examine only two possible realizations of the pair of source bits $(u^A_k, u^B_k)$. Importantly, the BP algorithm can compute $\Pr(u^A_k, u^B_k \mid Y^R, \mathcal{C})$ easily, from which $\Pr(u^A_k \oplus u^B_k \mid Y^R, \mathcal{C})$ can readily be obtained through the 2-to-1 mapping in (13).

Fig. 2. Tanner graph of the Jt-CNC decoder on which the BP algorithm operates: $s_k$ is the joint state variable that captures the states of the convolutional codes of the two users at time k; $\bar u_k$ is the pair of source bits of the two users at time k; $\bar c_k$ is the group of channel-coded bits of the two users at time k; $f_k$ is the factor node that represents the state transition function of the Jt-CNC decoder; $\gamma(\bar c_k)$ is the likelihood function of $\bar c_k$.

We now explain the details of the BP algorithm that implements our XOR bit-optimal Jt-CNC decoder. BP is a general framework for deriving inference algorithms on graphical models, which contain two kinds of nodes: variable nodes and factor nodes. Each variable node represents a variable, such as the state variable of the convolutional encoder; each factor node encodes the relationship among all variable nodes connected to it. For example, the state transition function of a convolutional encoder is represented by a factor node. The goal of BP is to compute the marginal probability distributions $\Pr(u^A_k, u^B_k \mid Y^R, \mathcal{C})$ for all k. This is achieved by means of a sum-product message-passing algorithm [23].

Fig. 2 shows the Tanner graph of our bit-optimal Jt-CNC decoder. Unlike the conventional point-to-point convolutional decoder for single-user systems with only one transmitter, the Jt-CNC decoder combines the states and the trellises of both transmitters A and B. Note that nodes A and B may use different convolutional codes with the same code rate. In Fig. 2, the vector $\mathbf{S} = (s_0, s_1, \ldots, s_K)$ represents the state variables, where state $s_k$ combines the states of both end nodes; the vector $\mathbf{U} = (\bar u_1, \bar u_2, \ldots, \bar u_K)$, where $\bar u_k = (u^A_k, u^B_k)$, represents the "virtual" source packet consisting of the pair of source packets from nodes A and B; similarly, the vector $\mathbf{C} = (\bar c_1, \bar c_2, \ldots, \bar c_K)$, where $\bar c_k = (\bar c^A_k, \bar c^B_k)$ (as defined in (2), $\bar c^i_k$ denotes the group of channel-coded bits of node i at time k), represents the "virtual" channel-coded packet. The behavior of the decoder is defined by the factor-node functions $f_k(s_{k-1}, \bar u_k, \bar c_k, s_k)$, which represent the state transition rule of the trellis. For a trellis transition $e = (s_{k-1}, \bar u_k, \bar c_k, s_k)$, $f_k(e) = 1$ if e is a valid transition and $f_k(e) = 0$ otherwise. For example, if input $\bar u_k$ causes a state transition from $s_{k-1}$ to $s_k$ with output $\bar c_k$, then $f_k(e) = 1$; if instead input $\bar u_k$ causes a transition from $s_{k-1}$ to a state other than $s_k$, or the output is not $\bar c_k$, then $f_k(e) = 0$.

Fig. 3. The messages being passed around a factor node during the operation of the sum-product algorithm.

The goal of the Jt-CNC decoder is to find the maximum-likelihood XOR bit $u^R_k$ through the a posteriori probability (APP) $\Pr(\bar u_k \mid Y^R, \mathcal{C})$:

$$\Pr\!\left(u^R_k \mid Y^R, \mathcal{C}\right) = \sum_{\bar u_k:\; u^A_k \oplus u^B_k = u^R_k} \Pr\!\left(\bar u_k \mid Y^R, \mathcal{C}\right), \tag{14}$$

where $\Pr(\bar u_k \mid Y^R, \mathcal{C})$ can be computed exactly by the sum-product message-passing algorithm thanks to the tree structure of the Tanner graph associated with convolutional codes [24].
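The sum-product computation of $\Pr(\bar u_k \mid Y^R, \mathcal{C})$ on the joint trellis, followed by the 2-to-1 mapping of (14), can be sketched as below. This is a BCJR-style illustration under assumptions, not the authors' implementation: synchronous BPSK, perfect CSI, samples already deinterleaved into the order of (2), a hypothetical rate-1/3, memory-2 code used by both nodes, and zero-tail termination. The initialization, recursions, and termination follow (15a)–(18) as stated in the text of this section; per-step rescaling of α and β is an added implementation choice for numerical stability that leaves the normalized APPs unchanged.

```python
import numpy as np
from itertools import product

G = (0b111, 0b101, 0b011)  # hypothetical rate-1/3 generators (octal 7, 5, 3)
MEM = 2                    # memory 2 -> 4 states per user, 16 joint states
NS = 1 << MEM


def step(state, u):
    """One step of the feedforward encoder: returns (next_state, [3 coded bits])."""
    reg = (u << MEM) | state
    return reg >> 1, [bin(reg & g).count("1") % 2 for g in G]


def bpsk(bits):
    return 1.0 - 2.0 * np.asarray(bits, dtype=float)  # assumed mapping 0 -> +1, 1 -> -1


def jt_cnc_bcjr(y, h_a, h_b, sigma):
    """Returns delta[k, uA, uB] = Pr(u^A_k, u^B_k | Y^R) and Pr(u^R_k = 1) for each k."""
    K = len(y) // 3
    # Joint trellis branches (s_{k-1}, uA, uB, s_k, x^A, x^B) with s = sA * NS + sB.
    branches = []
    for sa, sb, ua, ub in product(range(NS), range(NS), (0, 1), (0, 1)):
        na, ca = step(sa, ua)
        nb, cb = step(sb, ub)
        branches.append((sa * NS + sb, ua, ub, na * NS + nb, bpsk(ca), bpsk(cb)))
    # gamma[k, b]: likelihood (16) of branch b at step k, up to a constant factor.
    gamma = np.empty((K, len(branches)))
    for k in range(K):
        yk = y[3 * k:3 * k + 3]
        for b, (_, _, _, _, xa, xb) in enumerate(branches):
            dist = np.sum(np.abs(yk - h_a * xa - h_b * xb) ** 2)
            gamma[k, b] = np.exp(-dist / (2 * sigma ** 2))
    alpha = np.zeros((K + 1, NS * NS))
    beta = np.zeros((K + 1, NS * NS))
    alpha[0, 0] = 1.0  # (15a): zero-tail code starts in the zero state
    beta[K, 0] = 1.0   # (15b): ... and ends in the zero state
    for k in range(K):  # forward recursion (17a)
        for b, (s, _, _, s2, _, _) in enumerate(branches):
            alpha[k + 1, s2] += alpha[k, s] * gamma[k, b]
        alpha[k + 1] /= alpha[k + 1].sum()  # rescale to avoid underflow
    for k in range(K - 1, -1, -1):  # backward recursion (17b)
        for b, (s, _, _, s2, _, _) in enumerate(branches):
            beta[k, s] += beta[k + 1, s2] * gamma[k, b]
        beta[k] /= beta[k].sum()
    delta = np.zeros((K, 2, 2))  # termination (18)
    for k in range(K):
        for b, (s, ua, ub, s2, _, _) in enumerate(branches):
            delta[k, ua, ub] += alpha[k, s] * gamma[k, b] * beta[k + 1, s2]
        delta[k] /= delta[k].sum()
    return delta, delta[:, 0, 1] + delta[:, 1, 0]  # 2-to-1 XOR mapping (14)


def encode(u):
    """Encode a zero-tail packet with the hypothetical code and BPSK-map it."""
    s, out = 0, []
    for bit in u:
        s, c = step(s, bit)
        out += c
    return bpsk(out)


# Usage: noiseless superposition; the last MEM bits of each packet are the zero tail.
h_a, h_b = 1.0, 0.8 * np.exp(1j * 0.9)
u_a, u_b = [1, 0, 1, 1, 0, 1, 0, 0], [0, 1, 1, 0, 1, 1, 0, 0]
delta, p_xor1 = jt_cnc_bcjr(h_a * encode(u_a) + h_b * encode(u_b), h_a, h_b, sigma=0.3)
u_r_hat = [int(p > 0.5) for p in p_xor1]  # hard decisions on the XORed source bits
```

The per-step work is proportional to the number of joint-trellis branches (here 16 joint states times 4 input pairs), so the total work grows linearly in K, consistent with the O(K) complexity derived in this section.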
The sum-product algorithm, when applied to decode convolutional codes, is the well-known BCJR algorithm [12]. The difference here is that we are not decoding the source bit of one source. Instead, we are decoding the bit duple $\bar u_k = (u^A_k, u^B_k)$ from the two sources. We now explain our sum-product algorithm in detail.

Fig. 3 depicts the messages being passed around a factor node within the overall Tanner graph of Fig. 2. We follow the notation of the original paper on the BCJR algorithm [12]. In the forward direction, the message from $s_{k-1}$ to $f_k$ is denoted by $\alpha(s_{k-1})$, and the message from $f_k$ to $s_k$ is denoted by $\alpha(s_k)$. In the backward direction, the message from $s_k$ to $f_k$ is denoted by $\beta(s_k)$, and the message from $f_k$ to $s_{k-1}$ is denoted by $\beta(s_{k-1})$. In addition, $\gamma(\bar c_k)$ denotes the message from $\bar c_k$ to $f_k$, and $\delta(\bar u_k)$ denotes the message from $f_k$ to $\bar u_k$. Note that $\delta(\bar u_k)$ is the APP $\Pr(\bar u_k \mid Y^R, C)$, and the goal here is to compute it.

The Tanner graph of the Jt-CNC decoder is cycle-free. The sum-product algorithm therefore consists of two natural recursions, one for each direction of message flow in the graph:
- a forward recursion that computes $\alpha(s_k)$ from $\alpha(s_{k-1})$ and $\gamma(\bar c_k)$;
- a backward recursion that computes $\beta(s_{k-1})$ from $\beta(s_k)$ and $\gamma(\bar c_k)$.

The calculation of $\Pr(\bar u_k \mid Y^R, C)$ has three steps: initialization, forward/backward recursion, and termination. We present these three steps in detail below.

Initialization: As usual in a cycle-free Tanner graph, the sum-product algorithm begins at the leaf nodes. A zero-tail convolutional code is used, so the encoders of both end nodes start and end in the zero state. The messages $\alpha(s_0)$ and $\beta(s_K)$ are therefore initialized as

$$\alpha(s_0) = \begin{cases} 1 & \text{if } s_0 = 0, \\ 0 & \text{otherwise,} \end{cases} \tag{15a}$$

$$\beta(s_K) = \begin{cases} 1 & \text{if } s_K = 0, \\ 0 & \text{otherwise.} \end{cases} \tag{15b}$$

The message $\gamma(\bar c_k)$ is the likelihood function of $\bar c_k$ based on the evidence $Y^R$. For example, suppose the code rate is 1/3 and BPSK modulation is used. Then $\bar c_k = (c^A_{k,1}, c^A_{k,2}, c^A_{k,3}, c^B_{k,1}, c^B_{k,2}, c^B_{k,3})$. Node A maps its coded bits to the BPSK symbols $(x^A_{3k-2}, x^A_{3k-1}, x^A_{3k})$, and node B maps its coded bits to $(x^B_{3k-2}, x^B_{3k-1}, x^B_{3k})$. Given the overlapped signal $Y^R$ at the relay node, the likelihood of $\bar c_k$ is

$$\gamma(\bar c_k) \propto \prod_{j=1}^{3} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{\left| y^R_{3(k-1)+j} - h^A x^A_{3(k-1)+j} - h^B x^B_{3(k-1)+j} \right|^2}{2\sigma^2} \right\}. \tag{16}$$

Forward/backward recursion: After the leaf-node messages are initialized, the messages $\alpha(s_k)$ and $\beta(s_k)$ are computed recursively with the following message update rule [24]:

$$\alpha(s_k) = \sum_{s_{k-1},\, \bar u_k,\, \bar c_k} f_k(s_{k-1}, \bar u_k, \bar c_k, s_k)\, \alpha(s_{k-1})\, \gamma(\bar c_k), \tag{17a}$$

$$\beta(s_{k-1}) = \sum_{s_k,\, \bar u_k,\, \bar c_k} f_k(s_{k-1}, \bar u_k, \bar c_k, s_k)\, \beta(s_k)\, \gamma(\bar c_k). \tag{17b}$$

Termination: The algorithm ends by computing $\delta(\bar u_k)$, which gives the APP of the source bit duple $\bar u_k$:

$$\delta(\bar u_k) = \sum_{s_{k-1},\, s_k,\, \bar c_k} f_k(s_{k-1}, \bar u_k, \bar c_k, s_k)\, \alpha(s_{k-1})\, \gamma(\bar c_k)\, \beta(s_k). \tag{18}$$

The summation in (18) runs over the trellis transitions $e = (s_{k-1}, \bar u_k, \bar c_k, s_k)$ with $\bar u_k$ fixed. Here $f_k(e) = 1$ if $e$ is a valid transition and $f_k(e) = 0$ otherwise.

We now consider the computational complexity of the XOR bit-optimal decoding algorithm. Let $S$ be the number of decoder states. The steps cost the following:
- Initialization, i.e., evaluating (16), takes $9K/[r\log_2(M)]$ complex operations.
- The forward/backward recursions take $6\cdot 2^{2/r}KS^2$ operations.
- The termination step, i.e., computing (18), takes $4\cdot 2^{2/r}KS^2$ operations.
- Finding a single ML XORed source bit in (12) takes two summations and one comparison. Finding all the ML XORed source bits therefore takes $3K$ operations on top of the BP algorithm that computes $\Pr(u^A_k, u^B_k \mid Y^R, C)$, $k = 1, 2, \ldots, K$.

The total complexity of finding the ML XORed source bits of a length-$K$ packet is therefore $9K/[r\log_2(M)] + 6\cdot 2^{2/r}KS^2 + 4\cdot 2^{2/r}KS^2 + 3K$. In Big-O notation, the XOR bit-optimal decoding algorithm is $O(K)$. Its complexity is much lower than that of the XOR packet-optimal decoding algorithm, which makes it more feasible in practice.

V. ASYNCHRONOUS CONVOLUTIONAL-CODED PNC

In this section, we present our three-layer decoding framework for asynchronous convolutional-coded PNC. Asynchrony creates challenges that the synchronous decoder of Section IV cannot handle. As shown in Fig. 4, the signals of nodes A and B may arrive at the relay at different times, so their symbols can be misaligned. The symbol misalignment has two parts, an integral part $\tau_I$ and a fractional part $\tau_F$, and each causes a different problem:
- The fractional misalignment makes adjacent symbols overlap. The symbol-boundary-preserving sampling of (8) is then no longer valid.
- The integral misalignment entangles the channel-coded bits of nodes A and B. The decoding scheme of Section IV can then no longer be applied.

To address these challenges, we add two layers to the Jt-CNC decoder, forming the integrated framework in Fig. 5:
- The symbol-realignment layer handles the fractional misalignment. It uses a BP algorithm at the relay to "realign" the soft information of the symbols.
- The codeword-realignment layer handles the integral misalignment with an interleaver/deinterleaver set-up.

With these two layers, the three-layer framework can deal with integral-plus-fractional symbol misalignment.

Fig. 4. Symbol misalignment in PNC: a general symbol misalignment consists of an integral part $\tau_I = 2$ and a fractional part $\tau_F = 0.7$.

Fig. 5. Decoding framework for asynchronous convolutional-coded PNC systems. This framework can deal with an integral-plus-fractional symbol misalignment. [Figure blocks: symbol-realignment layer (input: demodulated samples; output: $\Pr(x^A_n, x^B_n \mid Y^R)$), codeword-realignment layer (deinterleaver), and Jt-CNC decoder.]

A. Symbol-Realignment Layer: Addressing Fractional Symbol Misalignment

For simplicity, as in [5], we assume that a rectangular pulse carries the modulated signal in the analog domain. As illustrated in Fig. 4, the fractional symbol misalignment $\tau_F$ causes inter-symbol interference between the signals of A and B. For example, $x^B_1$ overlaps with part of $x^A_3$ and part of $x^A_4$. The scheme in [25] uses suboptimal sampling. In terms of Fig. 4, it samples only the overlap of $x^B_1$ and $x^A_3$ and discards the useful signal in the overlap of $x^B_1$ and $x^A_4$. Our method, by contrast, is an optimal maximum-likelihood (ML) oversampling method based on the BP algorithm. The relay R integrates (matched-filters) the overlapped symbols over alternating durations $\tau_F$ and $(1 - \tau_F)$, producing $(2N + 1)$ samples. This oversampling uses all of the overlapped signal and is ML optimal.

Fig. 6. Tanner graph of the symbol-realignment layer. The variable node $x_{i,j}$ corresponds to the joint symbol $(x^A_i, x^B_j)$; nodes $\psi_o$ and $\psi_e$ are the factor nodes that constrain the relationships among the variable nodes. The likelihood probabilities $\Pr(y^R_{2n-1} \mid x_{n,n-1})$ and $\Pr(y^R_{2n} \mid x_{n,n})$ are the evidences from observation $Y^R$.
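The Jt-CNC decoder in Fig. 5 runs the recursions (15)-(18). The Python sketch below illustrates them under simplifying assumptions that are ours, not the paper's:
- synchronous PNC;
- real-valued BPSK with $h^A = h^B = 1$;
- both nodes use the zero-tail (5, 7) code;
- no interleaver.

The sketch builds the joint trellis of the two encoders and runs normalized forward/backward recursions. It then reads $\Pr(u^A_k \oplus u^B_k = 0 \mid Y^R)$ off $\delta(\bar u_k)$. The names `enc_step`, `EDGES`, `gamma`, and `jt_cnc` are illustrative only.

```python
import itertools
import math

BITS = (0, 1)

def enc_step(state, u):
    """One step of the (5,7) encoder; state = (u_{k-1}, u_{k-2})."""
    s1, s2 = state
    return (u, s1), (u ^ s2, u ^ s1 ^ s2)   # octal 5 = 101, octal 7 = 111

# Joint-trellis edges e = (s_{k-1}, u_k, cA_k, cB_k, s_k) with s = (sA, sB), u = (uA, uB).
EDGES = []
for sa, sb in itertools.product(itertools.product(BITS, BITS), repeat=2):
    for ua, ub in itertools.product(BITS, BITS):
        (na, ca), (nb, cb) = enc_step(sa, ua), enc_step(sb, ub)
        EDGES.append(((sa, sb), (ua, ub), ca, cb, (na, nb)))

def gamma(y_pair, ca, cb, sigma2):
    """Eq. (16) for rate 1/2, real BPSK x = 1 - 2c, hA = hB = 1 (constants dropped)."""
    return math.exp(-sum((y - (1 - 2 * a) - (1 - 2 * b)) ** 2
                         for y, a, b in zip(y_pair, ca, cb)) / (2 * sigma2))

def jt_cnc(y, n_info, sigma2):
    """Return Pr(uA_k XOR uB_k = 0 | Y) for k = 1..n_info (two zero-tail steps appended)."""
    K, zero = n_info + 2, ((0, 0), (0, 0))
    # Branch metrics per step; during the two tail steps only zero inputs are allowed.
    g = [{e: gamma(y[2 * k:2 * k + 2], e[2], e[3], sigma2)
          for e in EDGES if k < n_info or e[1] == (0, 0)} for k in range(K)]
    alpha = [{zero: 1.0}]                                       # (15a)
    for k in range(K):
        nxt = {}
        for e, w in g[k].items():
            if e[0] in alpha[k]:
                nxt[e[4]] = nxt.get(e[4], 0.0) + alpha[k][e[0]] * w          # (17a)
        z = sum(nxt.values())
        alpha.append({s: v / z for s, v in nxt.items()})        # normalize against underflow
    beta = [None] * K + [{zero: 1.0}]                           # (15b)
    for k in range(K - 1, -1, -1):
        prv = {}
        for e, w in g[k].items():
            if e[4] in beta[k + 1]:
                prv[e[0]] = prv.get(e[0], 0.0) + beta[k + 1][e[4]] * w       # (17b)
        z = sum(prv.values())
        beta[k] = {s: v / z for s, v in prv.items()}
    out = []
    for k in range(n_info):
        delta = dict.fromkeys(itertools.product(BITS, BITS), 0.0)
        for e, w in g[k].items():                                               # (18)
            delta[e[1]] += alpha[k].get(e[0], 0.0) * w * beta[k + 1].get(e[4], 0.0)
        out.append((delta[(0, 0)] + delta[(1, 1)]) / sum(delta.values()))
    return out
```

Normalizing $\alpha$ and $\beta$ at each step only rescales the messages. The ratio returned for each $k$ is unchanged, and long packets do not underflow.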
In this subsection we ignore the integral part of the symbol misalignment and consider only the fractional part (i.e., $\tau < 1$). When the integral part is nonzero, the non-overlapping parts of the received signal follow different factor-graph equations. These parts are "clear" signals, so we sample them with the traditional technique and compute their soft information directly. We therefore focus only on the Tanner graph of the overlapped signal. We also assume $|h^A| = |h^B| = \sqrt{P}$, where $P$ is the transmission power of the end nodes. The design of this layer is similar to the decoding method for asynchronous non-channel-coded PNC in [5], [26]. We include it for completeness and to show how this layer connects to the upper layer.

With oversampling of the received signal $y^R(t)$, each packet yields $(2N + 1)$ samples, where $N$ is the number of symbols per packet for both users A and B. The relay uses these samples to compute the soft information $\Pr(x^A_n, x^B_n \mid Y^R)$. Here, instead of the expression in (7), $Y^R = (y^R_1, y^R_2, \ldots, y^R_{2N+1})$ denotes the $(2N + 1)$ samples. As far as the soft information passed to the upper layer is concerned, the fractional misalignment is thus removed and the symbols are realigned. This realignment of soft information is a key step. Once it is done, the channel decoding algorithm for synchronous PNC from Section IV can be applied.

After normalization, the samples obtained at the relay R are

$$y^R_{2n-1} = x^A_n + x^B_{n-1} e^{j\phi} + w^R_{2n-1}, \tag{19a}$$

$$y^R_{2n} = x^A_n + x^B_n e^{j\phi} + w^R_{2n}, \tag{19b}$$

where $n = 1, 2, \ldots, N$, $x^B_0 = 0$, and $y^R_{2N+1} = x^B_N e^{j\phi} + w^R_{2N+1}$. The noise terms $w^R_{2n-1}$ and $w^R_{2n}$ are zero-mean complex Gaussian. Their variances per dimension are $\sigma^2/(\tau_F P)$ and $\sigma^2/((1 - \tau_F)P)$, respectively [5].
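The sketch below shows where (19) comes from. It integrates a noiseless superposition of rectangular pulses over alternating windows of length $\tau_F$ and $1-\tau_F$, then divides each integral by the window length. It is only an illustration, not the paper's receiver: the unit symbol period, midpoint-rule integration, omitted noise, and the name `oversample` are our own choices. The noise variances in (19) follow the same logic. Averaging white noise over a shorter window leaves more residual noise, so the variance is inversely proportional to the window length.

```python
import cmath
import math

def oversample(xa, xb, tau_f, phi, steps=64):
    """Noiseless relay samples y_1..y_{2N+1} for rectangular pulses.

    Node A's n-th symbol occupies [n-1, n); node B's stream is delayed by
    tau_f (0 < tau_f < 1) and rotated by phi. Each sample is the integral of
    the received signal over one window, divided by the window length.
    """
    n = len(xa)
    rot = cmath.exp(1j * phi)

    def r(t):  # superposed baseband signal at time t
        ia, ib = math.floor(t), math.floor(t - tau_f)
        a = xa[ia] if 0 <= ia < n else 0
        b = xb[ib] if 0 <= ib < n else 0
        return a + b * rot

    # Window edges 0, tau_f, 1, 1 + tau_f, ..., N, N + tau_f -> 2N + 1 windows.
    edges = [e for k in range(n) for e in (k, k + tau_f)] + [n, n + tau_f]
    samples = []
    for lo, hi in zip(edges, edges[1:]):
        h = (hi - lo) / steps                      # midpoint-rule integration
        total = sum(r(lo + (i + 0.5) * h) for i in range(steps)) * h
        samples.append(total / (hi - lo))
    return samples
```

Under these assumptions, the odd-indexed outputs equal $x^A_n + x^B_{n-1}e^{j\phi}$ and the even-indexed outputs equal $x^A_n + x^B_n e^{j\phi}$. The last output equals $x^B_N e^{j\phi}$. These are the noiseless parts of (19a) and (19b).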
A BP algorithm computes the soft information $\Pr(x^A_n, x^B_n \mid Y^R)$ from the $(2N + 1)$ samples. Fig. 6 shows the associated Tanner graph. In this graph, $x_{i,j} \triangleq (x^A_i, x^B_j)$ are the variable nodes, and $\psi_o$ and $\psi_e$ are the compatibility functions of the factor nodes. The compatibility functions model the correlation between two adjacent symbols:

$$\psi_o(x_{n,n-1}, x_{n,n}) = \begin{cases} 1 & \text{if the values of } x^A_n \text{ in } x_{n,n-1} \text{ and } x_{n,n} \text{ are equal,} \\ 0 & \text{otherwise,} \end{cases} \tag{20a}$$

$$\psi_e(x_{n,n}, x_{n+1,n}) = \begin{cases} 1 & \text{if the values of } x^B_n \text{ in } x_{n,n} \text{ and } x_{n+1,n} \text{ are equal,} \\ 0 & \text{otherwise.} \end{cases} \tag{20b}$$

The Tanner graph in Fig. 6 is a tree, so the BP algorithm finds the "exact" a posteriori probability $\Pr(x_{n,n} \mid Y^R)$ for $n = 1, \ldots, N$. Moreover, one iteration of the message-passing algorithm suffices [24].

We now describe the message-passing algorithm in detail. The messages are:
- $\alpha_L(x_{i,j})$: the forward message from the factor node ($\psi_o$ or $\psi_e$) to variable node $x_{i,j}$;
- $\alpha_R(x_{i,j})$: the forward message from variable node $x_{i,j}$ to the factor node;
- $\beta_R(x_{i,j})$: the backward message from the factor node to variable node $x_{i,j}$;
- $\beta_L(x_{i,j})$: the backward message from variable node $x_{i,j}$ to the factor node;
- $\gamma(x_{i,j})$: the evidence of variable node $x_{i,j}$ from the observation.

Consider first the variable node $x_{n,n}$. The forward messages, from left to right, are

$$\alpha_L(x_{n,n}) = \sum_{x_{n,n-1}} \alpha_R(x_{n,n-1})\, \psi_o(x_{n,n-1}, x_{n,n}), \tag{21a}$$

$$\alpha_R(x_{n,n}) = \alpha_L(x_{n,n})\, \gamma(x_{n,n}). \tag{21b}$$

The backward messages, from right to left, are

$$\beta_R(x_{n,n}) = \sum_{x_{n+1,n}} \beta_L(x_{n+1,n})\, \psi_e(x_{n,n}, x_{n+1,n}), \tag{22a}$$

$$\beta_L(x_{n,n}) = \beta_R(x_{n,n})\, \gamma(x_{n,n}). \tag{22b}$$

The forward/backward messages of variable node $x_{n+1,n}$ are computed in the same way, with the factor nodes changed accordingly. The likelihood probabilities $\Pr(y^R_{2n+1} \mid x_{n+1,n})$ and $\Pr(y^R_{2n} \mid x_{n,n})$ are the evidences $\gamma(x_{n+1,n})$ and $\gamma(x_{n,n})$, respectively, from observation $Y^R$.
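Because the graph in Fig. 6 is a chain, one forward pass and one backward pass are enough. The sketch below assumes BPSK symbols, $P = 1$, a known phase offset $\phi$, and the complex-Gaussian evidences given in (23). It orders the variable nodes as $x_{1,0}, x_{1,1}, x_{2,1}, \ldots, x_{N,N}, x_{N+1,N}$, with $x^B_0 = 0$ and no $x^A_{N+1}$. The factor between consecutive nodes therefore alternates between $\psi_o$ and $\psi_e$. The helper names `evidence` and `realign` are hypothetical. Messages are normalized after every step, which does not change the final posteriors.

```python
import cmath
import itertools
import math

PM = (1, -1)  # BPSK alphabet

def evidence(y, a, b, phi, var):
    """Complex-Gaussian likelihood Pr(y | xA=a, xB=b), per-dimension variance var."""
    d = y - a - b * cmath.exp(1j * phi)
    return math.exp(-abs(d) ** 2 / (2 * var)) / (2 * math.pi * var)

def realign(y, tau_f, sigma2, phi=0.0):
    """Return [Pr(xA_n, xB_n | Y^R) as a dict] for n = 1..N from the 2N+1 samples y."""
    n_sym = (len(y) - 1) // 2
    nodes = []  # (domain, evidences) for x_{1,0}, x_{1,1}, x_{2,1}, ..., x_{N,N}, x_{N+1,N}
    for i, yi in enumerate(y):
        if i == 0:
            dom = [(a, 0) for a in PM]            # xB_0 = 0
        elif i == 2 * n_sym:
            dom = [(0, b) for b in PM]            # no xA_{N+1}
        else:
            dom = list(itertools.product(PM, PM))
        var = sigma2 / tau_f if i % 2 == 0 else sigma2 / (1 - tau_f)
        nodes.append((dom, [evidence(yi, a, b, phi, var) for a, b in dom]))

    def psi(i, u, v):
        # Factor between chain nodes i and i+1: psi_o ties xA (i even), psi_e ties xB (i odd).
        k = 0 if i % 2 == 0 else 1
        return 1.0 if u[k] == v[k] else 0.0

    L = len(nodes)
    alpha_l = [[1.0] * len(nodes[0][0])] + [None] * (L - 1)
    for i in range(1, L):
        (dp, ep), a_prev = nodes[i - 1], alpha_l[i - 1]
        a_r = [m * e for m, e in zip(a_prev, ep)]                                  # (21b)
        msg = [sum(m * psi(i - 1, u, v) for m, u in zip(a_r, dp)) for v in nodes[i][0]]  # (21a)
        s = sum(msg)
        alpha_l[i] = [m / s for m in msg]
    beta_r = [None] * (L - 1) + [[1.0] * len(nodes[-1][0])]
    for i in range(L - 2, -1, -1):
        (dn, en), b_next = nodes[i + 1], beta_r[i + 1]
        b_l = [m * e for m, e in zip(b_next, en)]                                  # (22b)
        msg = [sum(m * psi(i, u, v) for m, v in zip(b_l, dn)) for u in nodes[i][0]]  # (22a)
        s = sum(msg)
        beta_r[i] = [m / s for m in msg]
    out = []
    for n in range(1, n_sym + 1):
        i = 2 * n - 1                                     # chain position of x_{n,n}
        dom, ev = nodes[i]
        p = [a * b * e for a, b, e in zip(alpha_l[i], beta_r[i], ev)]             # (24)
        s = sum(p)
        out.append({d: q / s for d, q in zip(dom, p)})
    return out
```

On short packets, the output can be checked by brute-force enumeration of all $(X^A, X^B)$ pairs. The two agree up to floating-point error because the graph is a tree.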
The evidences are computed as

$$\gamma(x_{n+1,n}) = \Pr(y^R_{2n+1} \mid x_{n+1,n}) = \Pr(y^R_{2n+1} \mid x^A_{n+1}, x^B_n) = \frac{1}{2\pi\sigma^2/\tau_F} \exp\left(-\frac{\left|y^R_{2n+1} - x^A_{n+1} - x^B_n e^{j\phi}\right|^2}{2\sigma^2/\tau_F}\right) \tag{23a}$$

and

$$\gamma(x_{n,n}) = \Pr(y^R_{2n} \mid x_{n,n}) = \Pr(y^R_{2n} \mid x^A_n, x^B_n) = \frac{1}{2\pi\sigma^2/(1-\tau_F)} \exp\left(-\frac{\left|y^R_{2n} - x^A_n - x^B_n e^{j\phi}\right|^2}{2\sigma^2/(1-\tau_F)}\right). \tag{23b}$$

Finally, the soft information is obtained, up to normalization, as

$$\Pr(x^A_n, x^B_n \mid Y^R) \propto \alpha_L(x_{n,n})\, \beta_R(x_{n,n})\, \gamma(x_{n,n}). \tag{24}$$

Let $N$ be the number of symbols per packet and $M$ the modulation order. The forward messages (21a) and (21b) take $2NM^2(2M^2 - 1)$ and $2NM^2$ complex operations, respectively. The backward messages (22a) and (22b) take the same amount. The evidences (23a) and (23b) take $7(2N + 1)M^2$ complex operations, and (24) takes another $2NM^2$. The symbol-realignment layer therefore needs $4NM^2(2M^2 - 1) + 4NM^2 + 7(2N + 1)M^2 + 2NM^2$ complex operations in total, which is $O(N)$. Since the complexity is linear in the packet size $N$, the symbol-realignment layer adds little overhead.

B. Codeword-Realignment Layer: Countering Integral Symbol Misalignment

The symbol-realignment layer has already removed the fractional part of the symbol misalignment, so this subsection considers only the integral part. Recall that Section IV used (16) to compute the message $\gamma(\bar c_k)$. Equation (16) requires the modulated symbols of end nodes A and B to be aligned symbol by symbol, i.e., $x^A_n$ must align with $x^B_n$. With an integral symbol misalignment $\tau_I$, however, $x^A_n$ aligns with $x^B_{n-\tau_I}$, and the algorithm of Section IV no longer applies. The codeword-realignment layer solves this problem with a specially designed interleaver/deinterleaver at the end and relay nodes.
Both end nodes use the same block interleaver with $R$ rows and $M/R$ columns. Here $r = 1/R$ is the code rate and $M$ is the number of bits in the codeword. The channel-coded bits are written into the interleaver column by column and read out row by row. Let $\Pi$ denote the interleaving operation. The interleaving process is

$$\Pi(C^A) = \tilde C^A, \quad \Pi(C^B) = \tilde C^B. \tag{25}$$

The interleaved packets $\tilde C^A$ and $\tilde C^B$ are modulated and transmitted simultaneously to the relay. The relay processes the overlapped, misaligned signal in three steps:
1. It removes the fractional symbol misalignment with the algorithm of Section V-A, leaving only the integral misalignment $\tau_I$.
2. It wraps the non-overlapped signal at the tail back to the head. The overlapped signal becomes $\tilde C^A + \tilde C^B_{(\tau_I)}$, where $\tilde C^B_{(\tau_I)}$ is the $\tau_I$-bit circular-shifted version of $\tilde C^B$.
3. It deinterleaves the overlapped signal with the same block deinterleaver:

$$\Pi^{-1}\left(\tilde C^A + \tilde C^B_{(\tau_I)}\right) = \Pi^{-1}(\tilde C^A) + \Pi^{-1}\left(\tilde C^B_{(\tau_I)}\right) = C^A + C^B_{(\tau_I R)}. \tag{26}$$

As an example, consider a code of rate 1/3, BPSK modulation, and an integral symbol misalignment $\tau_I = 2$. In this subsection, only the integral part of $\tau$ is considered. As specified in Section III, the channel-coded packets of nodes A and B are $C^A$ and $C^B$, respectively. Each packet is bit-interleaved with a block interleaver of 3 rows and $M/3$ columns. BPSK modulation of the interleaved packets $\tilde C^i$, $i \in \{A, B\}$, produces the transmitted signals

$$X^i = \left(x^i_{1,1}, x^i_{2,1}, \ldots, x^i_{K,1}, x^i_{1,2}, \ldots, x^i_{K,2}, x^i_{1,3}, \ldots, x^i_{K,3}\right), \quad i \in \{A, B\}, \tag{27}$$

where $x^i_{k,j} = 1 - 2c^i_{k,j}$ and $c^i_{k,j}$ is the $j$th channel-coded bit of node $i$ at time $k$, as defined in (3). The received signal samples are the superposition of the following two sequences:

$$\begin{array}{cccccccc}
(x^A_{1,1}) & (x^A_{2,1}) & x^A_{3,1} & x^A_{4,1} & \cdots & x^A_{K,3} & & \\
 & & x^B_{1,1} & x^B_{2,1} & \cdots & x^B_{K-2,3} & (x^B_{K-1,3}) & (x^B_{K,3})
\end{array} \tag{28}$$

The relay first wraps back the non-overlapped signals, shown in parentheses in (28): $x^A_{1,1}\, x^A_{2,1}$ at the head and $x^B_{K-1,3}\, x^B_{K,3}$ at the tail. After the wrap-back, the realigned sequences are

$$\begin{array}{ccccccc}
(x^A_{1,1}) & (x^A_{2,1}) & x^A_{3,1} & x^A_{4,1} & x^A_{5,1} & \cdots & x^A_{K,3} \\
(x^B_{K-1,3}) & (x^B_{K,3}) & x^B_{1,1} & x^B_{2,1} & x^B_{3,1} & \cdots & x^B_{K-2,3}
\end{array} \tag{29}$$

The relay then deinterleaves the signal in (29) to restore the transmission order of node A. After deinterleaving, the received packet is the superposition of the following sequences:

$$\underbrace{\begin{matrix} (x^A_{1,1}) & x^A_{1,2} & x^A_{1,3} \\ (x^B_{K-1,3}) & x^B_{K-1,1} & x^B_{K-1,2} \end{matrix}}_{\text{block-1}}\;\;
\underbrace{\begin{matrix} (x^A_{2,1}) & x^A_{2,2} & x^A_{2,3} \\ (x^B_{K,3}) & x^B_{K,1} & x^B_{K,2} \end{matrix}}_{\text{block-2}}\;\;
\underbrace{\begin{matrix} x^A_{3,1} & x^A_{3,2} & x^A_{3,3} \\ x^B_{1,1} & x^B_{1,2} & x^B_{1,3} \end{matrix}}_{\text{block-3}}\;\cdots \tag{30}$$

We group the received packet into $K$ blocks, where block-$k$ holds the $R = 3$ coded symbols of the $k$th input. The signal in (30) is the superposition of the modulated signals of $C^A$ and $C^B_{(6)}$. Except in the first two blocks, $C^B_{(6)}$ is the 6-bit right circular-shifted version of $C^B$. The symbols of the first two blocks are out of order because the misalignment is larger than one symbol. This disorder does not hinder decoding, because the likelihoods of the first and second blocks can still be computed. For example, the likelihood of the first block is

$$\begin{aligned}
\gamma\left(\bar c^A_1 \bar c^B_{K-1}\right) &= \gamma\left(x^A_{1,1} x^A_{1,2} x^A_{1,3} x^B_{K-1,1} x^B_{K-1,2} x^B_{K-1,3}\right) \\
&\propto \gamma\left(x^A_{1,1}\right) \gamma\left(x^B_{K-1,3}\right) \gamma\left(x^A_{1,2} x^B_{K-1,1}\right) \gamma\left(x^A_{1,3} x^B_{K-1,2}\right).
\end{aligned} \tag{31}$$

To compute (31), we first compute the likelihoods in its second line. The symbol-realignment layer of Fig. 6 provides $\gamma(x^A_{1,1})$, $\gamma(x^B_{K-1,3})$, $\gamma(x^A_{1,2} x^B_{K-1,1})$, and $\gamma(x^A_{1,3} x^B_{K-1,2})$. Normalizing their product gives the likelihood of block-1, $\gamma(\bar c^A_1 \bar c^B_{K-1})$.

TABLE I. DIFFERENT TECHNIQUES TO MAKE THE CONVOLUTIONAL CODE'S INITIAL STATE AND TERMINAL STATE THE SAME.

Code            Technique
Non-recursive   Zero tailing or tail biting
Recursive       Zero tailing
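The index bookkeeping in (25) through (30) can be checked mechanically. The sketch below tracks only node B's symbol labels $(k, j)$, not signal values. It interleaves them as in (27), models the wrap-back as a right circular shift by $\tau_I$, and then deinterleaves. The function names are hypothetical, and the example assumes $R = 3$, $K = 5$, and $\tau_I = 2$.

```python
def interleave(c, R):
    """Fill an R x K block column-wise (one column per input step), read it row-wise."""
    K = len(c) // R
    return [c[k * R + j] for j in range(R) for k in range(K)]

def deinterleave(ct, R):
    """Inverse of interleave: write row-wise, read column-wise."""
    K = len(ct) // R
    return [ct[j * K + k] for k in range(K) for j in range(R)]

def rshift(seq, t):
    """Right circular shift by t positions (the tail is wrapped back to the head)."""
    t %= len(seq)
    return list(seq[-t:]) + list(seq[:-t]) if t else list(seq)

R, K, tau_i = 3, 5, 2
c_b = [(k, j) for k in range(1, K + 1) for j in range(1, R + 1)]   # labels of c^B_{k,j}
received_b = rshift(interleave(c_b, R), tau_i)   # node B's stream after wrap-back, cf. (29)
realigned_b = deinterleave(received_b, R)        # cf. (30)
```

Here `realigned_b[:6]` is `[(4, 3), (4, 1), (4, 2), (5, 3), (5, 1), (5, 2)]`, which matches blocks 1 and 2 of (30). Also, `realigned_b[6:] == rshift(c_b, 6)[6:]`. From block 3 onward, node B's stream is the $\tau_I R = 6$-bit right circular shift of $C^B$. Node A's stream is unaffected because `deinterleave(interleave(c, R), R) == c`.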
The likelihood of block-2 is computed in the same manner. For $k > \tau_I$, the likelihood of block-$k$ follows from (16). The likelihoods of the $K$ blocks are then passed to the upper layer, the Jt-CNC decoder, as the messages $\gamma(\bar c_k)$.

Appendix I proves the following: if a convolutional code $C$ has the same initial and terminal state, then $C_{(\tau_I R)}$, the $\tau_I R$-bit circular-shifted version of $C$, can be decoded to $U_{(\tau_I)}$. To make the initial state equal to the terminal state, we use different techniques for different kinds of convolutional codes, as listed in Table I. Consequently, under integral symbol misalignment, our XOR bit-optimal decoding algorithm outputs $U^R = U^A \oplus U^B_{(\tau_I)}$. Node A (B) can still decode the information of node B (A). In the downlink phase, the relay broadcasts the XOR message $U^R$ together with the value of $\tau_I$ to both end nodes. Node A first XORs $U^R$ with its own packet to obtain $U^B_{(\tau_I)}$, then left-shifts the result by $\tau_I$ bits to restore $U^B$. Node B first right-shifts its own packet to obtain $U^B_{(\tau_I)}$, then XORs it with $U^R$ to obtain $U^A$.

Fig. 7. BER performances of Jt-CNC, XOR-CD Viterbi (XOR-CDV), and full-state Viterbi (FSV) algorithms for PNC. The channel codes are (5, 7) and (13, 15, 17) convolutional codes. We use BPSK modulation and assume AWGN channel.

VI. NUMERICAL RESULTS

We evaluate the performance of the proposed PNC decoding framework through extensive simulations and analysis. First, we compare the BER performances of the Jt-CNC, XOR-CD Viterbi, and full-state Viterbi algorithms in synchronous PNC, under both the AWGN channel and the Rayleigh fading channel.
Second, we investigate the effect of phase offset on our Jt-CNC decoder. Third, we present the performance of the three algorithms under symbol asynchrony. Finally, we implement the three algorithms in a real PNC prototype built on the USRP software radio platform and test them in an indoor environment.

A. BER Performance Comparison and Analysis

1) AWGN Channel: We compare the BER performances of Jt-CNC, XOR-CD Viterbi (XOR-CDV), and full-state Viterbi (FSV) in synchronous PNC under the AWGN channel. The XOR-CD Viterbi and full-state Viterbi algorithms were introduced in Section II and elaborated in Appendix III. The simulations use BPSK modulation over the AWGN channel and two convolutional codes: the rate-1/2 (5, 7) code and the rate-1/3 (13, 15, 17) code. We plot the BER curve of the full-state Viterbi algorithm as a benchmark for the reduced-state Viterbi algorithm in [13]. We tried to replicate the reduced-state Viterbi algorithm exactly as specified in [13], but could not reproduce its simulation results.² Our results are somewhat better than those reported in [13]. To avoid misrepresenting their results, we compare only full-state Viterbi with Jt-CNC here. As shown in Fig. 7, Jt-CNC has slightly better BER performance than FSV.

² We believe that there are errors in equation (12) and Fig. 3 in [13]. We suspect that in [13] the SNR was not normalized correctly. Our attempt to contact the authors of [13] by email received no reply.

Fig. 8. Frame error rate (FER) performances of Jt-CNC, XOR-CD Viterbi (XOR-CDV), and full-state Viterbi (FSV) algorithms for synchronous PNC. The channel code used is (5, 7) convolutional code. We use BPSK modulation and assume AWGN channel.
As explained in Section II-A and Appendix III-B, FSV approximates the XOR packet-optimal decoding algorithm by means of the log-max approximation in (III.51). This simplification omits some combinations of $\{U^A, U^B\}$ that yield the same XOR packet $U^R$. FSV is therefore only an approximation to XOR packet-optimal decoding, not an exact implementation of it. There is thus no guarantee that FSV outperforms Jt-CNC, even when the XOR-packet error rate is the performance metric. When the XOR-bit error rate is the metric, as in Fig. 7, Jt-CNC is better than FSV. The reason is that Jt-CNC targets bit optimality and is an exact bit-optimal algorithm for synchronous PNC. In [13], a 2 dB gap was observed between reduced-state Viterbi and full-state Viterbi. If this gap is 2 dB, then the gap between Jt-CNC and reduced-state Viterbi is at least 2 dB. Fig. 7 also shows that Jt-CNC outperforms XOR-CD Viterbi by 2 dB for both the rate-1/2 and the rate-1/3 codes. As described previously, XOR-CD loses information in the XOR mapping, so this 2 dB gap is expected.

2) Frame Error Rate: We compare the frame error rate (FER) performances of Jt-CNC, XOR-CD Viterbi, and full-state Viterbi in synchronous PNC. As discussed in Section IV, Jt-CNC is optimal in terms of BER but not necessarily in terms of FER. As shown in Fig. 8, the FER performances of Jt-CNC and FSV are quite close, and both are 1 dB better than XOR-CDV. We also investigated the FER performances of the three decoders in asynchronous PNC and observed the same relative gaps as in synchronous PNC. Moreover, the dB gaps between the decoders do not differ substantially between the BER and FER results. Henceforth, we present only BER results.
3) Fading Channel: We compare the BER performances of Jt-CNC, XOR-CDV, and FSV in synchronous PNC under a fading channel. The simulation assumes block Rayleigh fading and white noise. As shown in Fig. 9, all three decoding algorithms lose 3 dB under fading compared with their BER performances under the AWGN channel. The degradation is caused by the phase offset and the unbalanced channel coefficients of the fading channel.

Fig. 9. BER performances of Jt-CNC, XOR-CD Viterbi (XOR-CDV), and full-state Viterbi (FSV) algorithms for PNC under fading channel. The channel code used is (5, 7) convolutional code. We use BPSK modulation and assume block Rayleigh fading channel.

4) BER Analysis: We now analyze the bit error rate (BER) of our Jt-CNC decoder under the AWGN channel. A closed-form BER expression is difficult to obtain for the Jt-CNC decoder, which uses the BCJR decoding algorithm. However, the BER of the FSV (full-state Viterbi) algorithm is a good approximation for Jt-CNC, for two reasons. First, the simulation in Section VI-A1 shows that the BER performances of Jt-CNC and FSV are quite close. Second, Jt-CNC and FSV should perform nearly the same in the high-SNR regime. We next derive an approximate BER expression for Jt-CNC and FSV with BPSK modulation.

Let $\mathrm{BER}_{XOR}$ denote the BER of the XORed source packet $U^R$, and $\mathrm{BER}_{AB}$ the BER of the joint source packet $\{U^A, U^B\}$ of the two end nodes. It is easy to show that $\mathrm{BER}_{XOR} < \mathrm{BER}_{AB}$. An error in $U^R$ implies an error in $\{U^A, U^B\}$, but not vice versa. For example, if $U^A$ and $U^B$ each have one bit error in the same position, their XOR is still correct. There is no simple closed-form expression for $\mathrm{BER}_{AB}$, but the union bound [27] gives the approximation (upper bound)

$$\mathrm{BER}_{AB} \approx \sum_{k=d}^{\infty} c_k P_k. \tag{32}$$

In (32), $d$ is the free distance of the FSV decoder, and $c_k$ is the total number of bit errors over all paths at distance $k$ from the correct path. We compute $c_k$ by differentiating the transfer function of the FSV decoder [27]. $P_k$ is the pairwise error probability that an incorrect path at distance $k$ from the correct path is decoded. To compute $P_k$, consider an incorrect path $\{\dot X^A, \dot X^B\}$ that merges with the correct path $\{X^A, X^B\}$ at some step, with $k$ incorrect symbols and all other symbols correct. This path may be chosen only if its distance metric, as defined in (10), is smaller than that of the correct path:

$$M(\dot X^A, \dot X^B) < M(X^A, X^B). \tag{33}$$

The paths $\{\dot X^A, \dot X^B\}$ and $\{X^A, X^B\}$ differ in exactly $k$ symbols, so the pairwise error probability is

$$\begin{aligned}
P_k &= \Pr\left\{ M(X^A, X^B) - M(\dot X^A, \dot X^B) > 0 \right\} \\
&= \Pr\left\{ \sum_{i=1}^{k} \frac{\left(y^R_i - x^A_i - x^B_i\right)^2}{2\sigma^2} - \sum_{i=1}^{k} \frac{\left(y^R_i - \dot x^A_i - \dot x^B_i\right)^2}{2\sigma^2} > 0 \right\} \\
&= \Pr\left\{ \sum_{i=1}^{k} \left[ 2y^R_i\left(\dot x^A_i + \dot x^B_i - x^A_i - x^B_i\right) + \left(x^A_i + x^B_i\right)^2 - \left(\dot x^A_i + \dot x^B_i\right)^2 \right] > 0 \right\}.
\end{aligned} \tag{34}$$

Without loss of generality, assume the correct path $\{X^A, X^B\}$ corresponds to the all-zero source packet [28], so $x^A_i = x^B_i = +1$ for all $i$. Also assume exactly one symbol error in each overlapped symbol $y^R_i$, so either $\dot x^A_i = +1, \dot x^B_i = -1$ or $\dot x^A_i = -1, \dot x^B_i = +1$. Then

$$P_k = \Pr\left\{ \sum_{i=1}^{k} \left[ 2y^R_i(0 - 2) + 4 \right] > 0 \right\} = \Pr\left\{ \sum_{i=1}^{k} y^R_i < k \right\}. \tag{35}$$

The $y^R_i$ are independent Gaussian random variables with variance $\sigma^2$ and mean $x^A_i + x^B_i$, where $x^A_i$ and $x^B_i$ are the symbols actually transmitted by the end nodes. The sum $Z = \sum_{i=1}^{k} y^R_i$ is therefore Gaussian with mean $2k$ and variance $k\sigma^2$, and

$$P_k = \Pr\{Z < k\} = 1 - Q\left(\frac{k - 2k}{\sqrt{k}\,\sigma}\right) = Q\left(\frac{\sqrt{k}}{\sigma}\right). \tag{36}$$

Consequently,

$$\mathrm{BER}_{AB} \approx \sum_{k=d}^{\infty} c_k\, Q\left(\frac{\sqrt{k}}{\sigma}\right). \tag{37}$$

As a concrete example, assume both end nodes use the (5, 7) convolutional code, as in Section VI-A1. As illustrated in Section III-B, the trellis of the FSV decoder combines the trellises of nodes A and B.
The FSV decoder therefore has the (octal) generator matrix

$$\begin{bmatrix} 5 & 7 & 0 & 0 \\ 0 & 0 & 5 & 7 \end{bmatrix}.$$

We compute the free distance $d$ and the coefficients $c_k$ of this FSV decoder with the method of [29]. This gives $d = 5$, and substituting the values of $c_k$ into (37) yields

$$\begin{aligned}
\mathrm{BER}_{AB} &\approx 2P_5 + 8P_6 + 24P_7 + 64P_8 + 160P_9 + \cdots \\
&= 2Q\left(\frac{\sqrt{5}}{\sigma}\right) + 8Q\left(\frac{\sqrt{6}}{\sigma}\right) + 24Q\left(\frac{\sqrt{7}}{\sigma}\right) + 64Q\left(\frac{\sqrt{8}}{\sigma}\right) + 160Q\left(\frac{\sqrt{9}}{\sigma}\right) + \cdots.
\end{aligned} \tag{38}$$

We omit the higher-order terms in (38), since $P_k$ is negligible for $k > 12$. We then plot the approximate BER curve together with the simulation results of Section VI-A1. As shown in Fig. 10, the BER curves of Jt-CNC and FSV are close to the approximate curve of (38) and approach it in the high-SNR regime.

Fig. 10. The approximate BER curve derived in (38) and the simulated BER curves of Jt-CNC and full-state Viterbi (FSV) algorithms. The channel code used is (5, 7) convolutional code. We use BPSK modulation and assume AWGN channel.

B. Effects of Phase Offset

We next evaluate the effect of phase offset on Jt-CNC with QPSK modulation; higher-order QAM can also be used. Phase offset does not present a challenge to BPSK systems (see [4], [5]). First, we compare the BER performances of the three decoding algorithms at phase offset $\phi = 0$ (phase synchronous) and $\phi = \pi/4$ (the worst case for QPSK [5]). As shown in Fig. 11a, a phase offset of $\pi/4$ degrades the BER performances of Jt-CNC, FSV, and XOR-CDV by 3 dB, 3 dB, and 5 dB, respectively. This severe penalty arises because the messages computed in (16) have poor confidence when the phase offset is $\pi/4$. One way to improve the confidence is to randomize the phase offset. Symbols with a small phase offset can then help symbols with a large phase offset during the BP process.

To make our system more resilient against phase offset, we apply random-phase precoding at the transmitter of one end node. Specifically, node B rotates the phase of its transmitted signal by a pseudo-random phase sequence $\Phi^B = (\phi^B_1, \ldots, \phi^B_N)$. Each $\phi^B_n$ is chosen at random between zero and $\pi/4$. We assume that the relay knows this sequence and incorporates it into the decoding process. As shown in Fig. 11b, random-phase precoding reduces the phase penalty, relative to the synchronous case, to 1 dB, 1 dB, and 3 dB for Jt-CNC, FSV, and XOR-CDV, respectively.

Fig. 11. Effects of phase offset on Jt-CNC, XOR-CD Viterbi (XOR-CDV), and full-state Viterbi (FSV). QPSK modulation and the (13, 15, 17) convolutional code are used in the simulation. We assume the symbols are aligned and the relative phase offset is $\pi/4$. In (a), both nodes transmit their signals directly; in (b), node B precodes its transmit signal with a pseudo-random phase sequence. [Panels: (a) without random-phase precoding; (b) with random-phase precoding.]

C. Effects of Symbol Misalignment

A major advantage of the proposed decoding framework is that it handles general symbol misalignment with different decoding algorithms. We evaluate Jt-CNC under varying degrees of symbol misalignment and phase offset, without random-phase precoding. In the simulation, both end nodes transmit 1000-bit source packets, which correspond to 1500 QPSK symbols at code rate 1/3. Fig. 12 shows that the fractional symbol misalignment (the curve with $\tau = 0.5$, $\phi = 0$) degrades the BER performance by 0.5 dB. The integral symbol misalignment (the curve with $\tau = 100.5$, $\phi = 0$), however, slightly improves the BER performance. With integral misalignment, the head and tail of the signals do not overlap and thus yield cleaner information without mutual interference.

Fig. 12. BER performance of Jt-CNC decoder under general symbol misalignment, with (13, 15, 17) convolutional code and QPSK modulation. [Curves: $\tau \in \{0, 0.5, 100.5\}$ with $\phi = 0$ and $\phi = \pi/8$.]

We next compare the BER performances of the Jt-CNC, XOR-CD Viterbi, and full-state Viterbi algorithms under symbol misalignment larger than one symbol. As shown in Fig. 13, a symbol misalignment of $\delta = 9.5$ with phase offset $\phi = 0$ degrades the BER performances of Jt-CNC, FSV, and XOR-CDV by 0.5 dB, 0.5 dB, and 3 dB, respectively. Within the framework, Jt-CNC and full-state Viterbi are both robust to symbol misalignment, whereas XOR-CD Viterbi is quite sensitive to it.

Fig. 13. BER performances of Jt-CNC, XOR-CD Viterbi (XOR-CDV), and full-state Viterbi (FSV) decoding algorithms under general symbol misalignment ($\delta = 9.5$, $\phi = 0$), with (13, 15, 17) convolutional code and QPSK modulation.

D. Software Radio Experiment

To evaluate the proposed algorithm in a real communication system, we built an OFDM PNC prototype on USRP N210 and implemented all three decoding algorithms in it. The prototype uses BPSK modulation with 2 MHz bandwidth and a 2.58 GHz carrier frequency. We used the (5, 7) convolutional code and followed the frame format design in [8]. The experiments took place in an indoor office environment. We measured the BER performances of Jt-CNC, XOR-CDV, and full-state Viterbi at different SNRs. The powers of the end nodes were balanced, and each node transmitted 100 packets to the relay. Each packet consisted of 204 OFDM symbols: 4 preamble symbols and 200 data symbols. The PC used for this experiment has 32 GB RAM and an Intel Core i7 processor. Typical processing times to decode one packet were 0.5255 s for XOR-CDV, 2.0596 s for FSV, and 1.4946 s for Jt-CNC. As shown in Fig. 14, Jt-CNC and full-state Viterbi achieve nearly the same BER in the real indoor environment. Relative to the simulation results in Fig. 7, all three algorithms lose 5 dB in the real system. This loss is due to real-system imperfections such as imperfect channel estimation, carrier-frequency offsets, and frequency-selective channels.

Fig. 14. BER performances of Jt-CNC, FSV, and XOR-CDV in an indoor environment. We tested the three algorithms on a practical OFDM PNC prototype implemented on USRP N210. The PNC prototype adopts BPSK modulation and (5, 7) convolutional code.

VII. CONCLUSION

We have proposed a three-layer decoding framework for asynchronous convolutional-coded PNC systems. The framework handles general (integral plus fractional) symbol misalignment in convolutional-coded PNC systems. We also designed a Jt-CNC algorithm that achieves BER-optimal decoding of convolutional codes in synchronous PNC. Building on the study of convolutional codes, we generalized the Jt-CNC decoding algorithm to all cyclic codes (Appendix II), which provides a new way to counter symbol asynchrony.
Simulation shows that, for synchronous PNC, our Jt-CNC algorithm outperforms the previous decoding algorithms (XOR-CD and reduced-state Viterbi) by 2 dB. Jt-CNC also outperforms the two previously proposed algorithms in both phase-asynchronous and symbol-asynchronous PNC. Importantly, we have implemented the proposed Jt-CNC decoder in a real PNC prototype built on a software radio platform, and our experiment shows that it works well in practice.

APPENDIX I
CORRECT DECODING OF CIRCULAR-SHIFTED SOURCE BITS

Theorem 1: Let $C$ denote the codeword of a convolutional code with code rate $r = L/R$, where $L$ and $R$ are positive integers. Suppose the encoder starts and ends in the same state. Then decoding $C_{(kR)}$, the $kR$-bit right circular-shifted version of $C$, yields $U_{(kL)}$, the $kL$-bit right circular-shifted version of $U$.

Fig. 15. Tanner graph of a convolutional code that has the same initial state $s_0$ and terminal state $s_K$. We merge the initial state and terminal state; hence the Tanner graph in Fig. 2 becomes a ring.

Proof: The encoding and decoding process of the convolutional code can be represented by the Tanner graph in Fig. 2. Because the initial and terminal states are the same, they can be merged into one node, as shown in Fig. 15. For a general code of rate $L/R$, the source message $\bar u_k$ is an $L$-bit tuple and the coded message $\bar c_k$ is an $R$-bit tuple. Let $C_{(kR)} = (\bar c_{K-k+1}, \bar c_{K-k+2}, \ldots, \bar c_K, \bar c_1, \ldots, \bar c_{K-k})$ be the $kR$-bit right circular-shifted version of codeword $C$. To decode $C_{(kR)}$, the decoding algorithm starts with the first tuple $\bar c_{K-k+1}$ and ends with the last tuple $\bar c_{K-k}$.
Because the Tanner graph has a ring structure, the decoder output is
$$U_{(kL)} = \left(\bar{u}_{K-k+1}, \bar{u}_{K-k+2}, \ldots, \bar{u}_K, \bar{u}_1, \ldots, \bar{u}_{K-k}\right),$$
which is the $kL$-bit right circular-shifted version of $U$.

Remark 1: Both zero-tail convolutional codes and tail-biting convolutional codes have the property in Theorem 1, because their initial state and terminal state are the same. For a recursive convolutional code, we can append tail bits to the input packet to force the terminal state of the encoder to the zero state. Recursive convolutional codes can then also be used with the proposed Jt-CNC decoder.

APPENDIX II
HANDLING SYMBOL MISALIGNMENT IN TAIL-BITING CONVOLUTIONAL CODES AND CYCLIC CODES

Theorem 2: For a tail-biting convolutional code with code rate $1/R$, $R \in \mathbb{N}^+$, let $U$ denote the source packet of the channel-coded packet $C$, and let $C_{(kR)}$ denote the $kR$-bit right circular-shifted version of $C$. The source packet corresponding to $C_{(kR)}$ is $U_{(k)}$, the $k$-bit right circular-shifted version of $U$.

Proof: Let $m$ denote the memory length of this convolutional encoder. The generator matrix of the convolutional code is
$$G = \begin{bmatrix}
g_0 & g_1 & g_2 & \cdots & g_m & & & \\
 & g_0 & g_1 & \cdots & g_{m-1} & g_m & & \\
 & & \ddots & & & & \ddots & \\
 & & & g_0 & g_1 & g_2 & \cdots & g_m \\
g_m & & & & g_0 & g_1 & \cdots & g_{m-1} \\
g_{m-1} & g_m & & & & g_0 & \cdots & g_{m-2} \\
\vdots & & \ddots & & & & \ddots & \vdots \\
g_1 & g_2 & \cdots & g_m & & & & g_0
\end{bmatrix}, \tag{II.39}$$
where blank entries are zero, $g_b = [g_0\ g_1\ \cdots\ g_m]$ is the basis generator matrix of the convolutional code, and each entry $g_i$ is an $R$-bit vector
$$g_i = \left[g_i^{(1)}\ g_i^{(2)}\ \cdots\ g_i^{(R)}\right], \tag{II.40}$$
where $g_i^{(r)}$ equals 1 or 0 depending on whether the $i$th stage of the shift register contributes (connects) to the $r$th output. The basis generator matrix $g_b$ can therefore be regarded as the "impulse response" of the convolutional encoder. For example, the basis generator matrix of the (5, 7) convolutional code shown in Fig. 16 is $[11\ 01\ 11]$. The encoding process is simply
$$C = UG. \tag{II.41}$$
The right circular-shifted codeword can be represented by
$$C_{(kR)} = U G_{(k)}, \tag{II.42}$$
where $G_{(k)}$ is obtained by right circularly shifting the columns of $G$ by $kR$ positions. Since $G$ is (block) circulant, a circular shift of its columns by $kR$ positions can equivalently be realized as a circular shift of its rows by $k$ positions, which can be absorbed into $U$:
$$C_{(kR)} = U G_{(k)} = U_{(k)} G. \tag{II.43}$$
Therefore the source packet of $C_{(kR)}$ is $U_{(k)}$, the $k$-bit right circular-shifted version of $U$.

Remark 2: Theorem 2 is also valid for tail-biting convolutional codes with a general code rate $L/R$, but the resulting source packet will be $U_{(kL)}$, the $kL$-bit right circular-shifted version of $U$. The proof is the same except that each entry of the basis generator matrix $g_b$ is an $L \times R$ matrix:
$$g_i = \begin{bmatrix}
g_{1,i}^{(1)} & g_{1,i}^{(2)} & \cdots & g_{1,i}^{(R)} \\
g_{2,i}^{(1)} & g_{2,i}^{(2)} & \cdots & g_{2,i}^{(R)} \\
\vdots & \vdots & \ddots & \vdots \\
g_{L,i}^{(1)} & g_{L,i}^{(2)} & \cdots & g_{L,i}^{(R)}
\end{bmatrix}, \tag{II.44}$$
where $g_{l,i}^{(r)}$ equals 1 or 0 depending on whether the $i$th stage of the shift register for the $l$th input contributes (connects) to the $r$th output.

Theorem 2 also indicates that tail-biting convolutional codes are quasi-cyclic with period $R$. Inspired by the quasi-cyclic property of tail-biting convolutional codes [19], [30], [31], we attempted to generalize the results to general quasi-cyclic codes (as opposed to just convolutional codes). Unfortunately, a general quasi-cyclic code³ may not have the property in Theorem 2. However, building on the insight obtained from our study of convolutional-coded PNC, we propose an algorithm that can deal with general symbol misalignment when cyclic codes (as opposed to quasi-cyclic codes) are used. That is, our asynchronous PNC decoding framework can incorporate not just convolutional codes, but all cyclic codes.

Fig. 16. Convolutional encoder of the (5, 7) convolutional code. $u_k$ is the input source bit at time $k$; $c_{k,1}$ and $c_{k,2}$ are the first and second output bits of the encoder at time $k$, respectively.

Since cyclic codes have a period of one, we do not need the interleaver/deinterleaver here. Let $\mathcal{C}(\cdot)$ and $\mathcal{C}^{-1}(\cdot)$ denote the encoding function and decoding function of a particular linear cyclic code (e.g., a BCH code), respectively. The encoding process at the end nodes is then $C^i = \mathcal{C}(U^i)$, $i \in \{A, B\}$. To ease presentation, we assume BPSK modulation and a symbol misalignment of $\tau_I$. The received signal at the relay is the overlap of the following two signals:
$$\begin{array}{ccccccccc}
x_1^A & \cdots & x_{\tau_I}^A & x_{\tau_I+1}^A & x_{\tau_I+2}^A & \cdots & x_N^A & & \\
 & & & x_1^B & x_2^B & \cdots & x_{N-\tau_I}^B & x_{N-\tau_I+1}^B & \cdots\ x_N^B
\end{array} \tag{II.45}$$
Upon receiving the overlapped signal, the relay first aligns the last $\tau_I$ symbols with the first $\tau_I$ symbols to obtain a new overlapped signal
$$\begin{array}{ccccccc}
x_1^A & \cdots & x_{\tau_I}^A & x_{\tau_I+1}^A & x_{\tau_I+2}^A & \cdots & x_N^A \\
x_{N-\tau_I+1}^B & \cdots & x_N^B & x_1^B & x_2^B & \cdots & x_{N-\tau_I}^B
\end{array} \tag{II.46}$$
The result in (II.46) is the signal $X^A + X_{(\tau_I)}^B$, where $X_{(\tau_I)}^B$ is the $\tau_I$-symbol right circular-shifted version of node B's signal. The relay can then map the signal in (II.46) to $C^A \oplus C_{(\tau_I)}^B$, where $C_{(\tau_I)}^B$ is the $\tau_I$-bit right circular-shifted version of $C^B$. Note that $C_{(\tau_I)}^B$ is also a valid codeword due to the cyclic property of the code. Let $\tilde{U}^B$ denote the source packet corresponding to $C_{(\tau_I)}^B$, i.e., $\tilde{U}^B = \mathcal{C}^{-1}(C_{(\tau_I)}^B)$. Because the XOR operator preserves the linearity of codes, the relay decodes the XORed packet as
$$U^R = \mathcal{C}^{-1}\left(C^A \oplus C_{(\tau_I)}^B\right) = \mathcal{C}^{-1}\left(C^A\right) \oplus \mathcal{C}^{-1}\left(C_{(\tau_I)}^B\right) = U^A \oplus \tilde{U}^B, \tag{II.47}$$
and then broadcasts this packet to both end nodes. After decoding $U^R$, node A first XORs $U^R$ with its own information $U^A$ to obtain $\tilde{U}^B$; node A then re-encodes $\tilde{U}^B$ to obtain $C_{(\tau_I)}^B = \mathcal{C}(\tilde{U}^B)$, and left circular-shifts $C_{(\tau_I)}^B$ by $\tau_I$ bits to obtain $C^B$; finally, node A decodes $U^B$ from $C^B$. Node B first right circular-shifts its own codeword $C^B$ to produce $C_{(\tau_I)}^B$ and decodes $C_{(\tau_I)}^B$ to obtain $\tilde{U}^B$; node B then XORs $U^R$ with $\tilde{U}^B$ to obtain $U^A$.

³A recent paper [32] investigates the use of quasi-cyclic LDPC codes to deal with symbol misalignment in PNC without requiring the validity of Theorem 2.
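The circular-shift arguments of this appendix can be checked numerically. The sketch below is a minimal, noise-free illustration under stated assumptions, not the authors' implementation. Part 1 checks Theorem 2 for a tail-biting (5, 7) code with $g_b = [11\ 01\ 11]$. Part 2 runs the relay realignment of (II.45) and (II.46), the XOR decoding of (II.47), and node A's recovery of $U^B$ for a cyclic code. The (7, 4) cyclic code with generator $g(x) = 1 + x + x^3$ (which divides $x^7 - 1$, so every circular shift of a codeword is again a codeword), the non-systematic encoding $c(x) = u(x)g(x)$, the division-based stand-in for $\mathcal{C}^{-1}$ (valid only without noise), unit channel gains, the BPSK map 0 to +1 and 1 to -1, and all function names are assumptions chosen for illustration. The folding step assumes the relay simply adds the last $\tau_I$ received samples to the first $\tau_I$ samples.

```python
# Hypothetical helpers illustrating Appendix II (not the authors' implementation).

def rshift(seq, n):
    """Right circular shift of a list by n positions (the last n items move to the front)."""
    n %= len(seq)
    return seq[-n:] + seq[:-n] if n else list(seq)

# Part 1: Theorem 2 for the tail-biting (5, 7) code, basis generator g_b = [11 01 11].
G_B = [(1, 1), (0, 1), (1, 1)]  # g_0, g_1, g_2; each is an R = 2 bit vector

def tb_encode_57(u):
    """Tail-biting encoding C = UG, i.e. c_k = XOR over i of u_{(k-i) mod K} * g_i."""
    K = len(u)
    c = []
    for k in range(K):
        out = [0, 0]
        for i, g_i in enumerate(G_B):
            for r in range(2):
                out[r] ^= u[(k - i) % K] & g_i[r]
        c.extend(out)
    return c

# Part 2: an assumed (7, 4) cyclic code with generator g(x) = 1 + x + x^3.
N, G_POLY = 7, 0b1011  # bit i of an int holds the coefficient of x^i

def gf2_mul(a, b):
    """Carry-less (GF(2)) polynomial multiplication."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a, b = a << 1, b >> 1
    return r

def gf2_divmod(a, b):
    """GF(2) polynomial division; returns (quotient, remainder)."""
    q = 0
    while a.bit_length() >= b.bit_length():
        s = a.bit_length() - b.bit_length()
        q ^= 1 << s
        a ^= b << s
    return q, a

def encode(u):
    """Non-systematic cyclic encoding c(x) = u(x) g(x)."""
    return gf2_mul(u, G_POLY)

def decode(c):
    """Noise-free stand-in for C^{-1}: divide by g(x) (valid only for exact codewords)."""
    q, rem = gf2_divmod(c, G_POLY)
    if rem:
        raise ValueError("not a codeword")
    return q

def bits(c):
    return [(c >> i) & 1 for i in range(N)]

def from_bits(b):
    return sum(bit << i for i, bit in enumerate(b))

def cyc_shift(c, t):
    """Right circular shift of an N-bit codeword by t positions (multiply by x^t mod x^N - 1)."""
    return from_bits(rshift(bits(c), t))

def relay_decode(u_a, u_b, tau):
    """Overlap as in (II.45), realign as in (II.46), map to XOR bits, decode as in (II.47)."""
    x_a = [1 - 2 * b for b in bits(encode(u_a))]  # assumed BPSK map: 0 -> +1, 1 -> -1
    x_b = [1 - 2 * b for b in bits(encode(u_b))]
    y = [0.0] * (N + tau)  # noise-free, unit channel gains (assumption)
    for n in range(N):
        y[n] += x_a[n]
        y[n + tau] += x_b[n]  # node B arrives tau symbols late
    z = y[:N]
    for n in range(tau):
        z[n] += y[N + n]  # add the last tau samples onto the first tau samples
    c_r = from_bits([1 if abs(v) < 1e-9 else 0 for v in z])  # x_A + x_B = 0 iff bits differ
    return decode(c_r)

def node_a_recover(u_r, u_a, tau):
    """Node A: U^R xor U^A, re-encode, left-shift by tau, then decode U^B."""
    c_b_shifted = encode(u_r ^ u_a)
    return decode(cyc_shift(c_b_shifted, N - tau))
```

With these helpers, `tb_encode_57(rshift(u, k)) == rshift(tb_encode_57(u), 2 * k)` holds for any bit list `u` and shift `k` (Theorem 2 with $R = 2$), and `node_a_recover(relay_decode(u_a, u_b, tau), u_a, tau)` returns `u_b` for every pair of 4-bit messages and every `0 <= tau < 7`. For this non-systematic encoder, $\tilde{U}^B$ = `decode(cyc_shift(encode(u_b), tau))` is in general not a simple circular shift of $U^B$, which is why node A re-encodes and shifts the codeword rather than the source bits.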
APPENDIX III
EXPLANATIONS OF XOR-CD AND FULL-STATE VITERBI ALGORITHMS

In this appendix, we explain and provide interpretations for the XOR-CD algorithm and the full-state Viterbi (FSV) algorithm. To ease presentation, we consider BPSK modulation and synchronous PNC.

A. XOR-CD

As pointed out in Section II, XOR-CD refers to a two-step process: (i) symbol-by-symbol PNC mapping; (ii) channel decoding. In the first step, the received symbol $y_n^R$ is mapped to the XORed coded bit $c_n^R = c_n^A \oplus c_n^B$ for $n = 1, \ldots, N$. Upon receiving the overlapped symbols, the demodulator at the relay computes the likelihood
$$\Pr\left(y_n^R \mid x_n^A, x_n^B\right) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\left|y_n^R - h^A x_n^A - h^B x_n^B\right|^2}{2\sigma^2}\right). \tag{III.48}$$
The likelihood in (III.48) is then mapped to XORed coded bits by
$$\Pr\left(y_n^R \mid c_n^R\right) = \sum_{x_n^A, x_n^B :\, c_n^A \oplus c_n^B = c_n^R} \Pr\left(y_n^R \mid x_n^A, x_n^B\right). \tag{III.49}$$
After the mapping, we obtain $C^R = C^A \oplus C^B$, which is fed into the channel decoder in the second step. Based on the likelihood in (III.49), we can make a hard decision on $c_n^R$, which is called "hard" XOR-CD; alternatively, we can pass the probabilities directly to the decoder, which is called "soft" XOR-CD (the variant used in our main text).

In the second step, an ordinary point-to-point channel decoder can be used to decode the XORed source bits. Since a convolutional code is a linear code and XOR is a linear operator, decoding $C^R$ gives
$$\mathcal{C}^{-1}\left(C^R\right) = \mathcal{C}^{-1}\left(C^A \oplus C^B\right) = \mathcal{C}^{-1}\left(C^A\right) \oplus \mathcal{C}^{-1}\left(C^B\right) = U^A \oplus U^B. \tag{III.50}$$
Two points are noteworthy: (i) not only convolutional codes but any linear code can be used with XOR-CD; (ii) the symbols of $X^A$ and $X^B$ must be aligned symbol by symbol; otherwise (III.50) is invalid.

B. Full-State Viterbi

In Section IV-A, we show that the complexity of finding the ML XORed source packet is prohibitively high. The full-state Viterbi algorithm was proposed in [13] to reduce this complexity.
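The first XOR-CD step, the symbol-by-symbol PNC mapping of (III.48) and (III.49), can be sketched as follows. This is a hedged illustration, not the implementation used in our experiments: it assumes real-valued channel gains and noise (matching the $1/\sqrt{2\pi\sigma^2}$ normalization), BPSK with 0 mapped to +1 and 1 mapped to -1, and equiprobable coded bits, so that the two sums in (III.49) can be normalized into a posterior probability. All function names are hypothetical.

```python
import math

def pair_likelihood(y, x_a, x_b, h_a, h_b, sigma2):
    """Eq. (III.48): Gaussian likelihood of y given the two transmitted BPSK symbols."""
    d = y - h_a * x_a - h_b * x_b
    return math.exp(-abs(d) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def xor_likelihoods(y, h_a, h_b, sigma2):
    """Eq. (III.49): sum the pair likelihoods over (c_A, c_B) with c_A XOR c_B = c_R."""
    lik = [0.0, 0.0]  # index = value of the XORed coded bit c_R
    for c_a in (0, 1):
        for c_b in (0, 1):
            x_a, x_b = 1 - 2 * c_a, 1 - 2 * c_b  # assumed BPSK map: 0 -> +1, 1 -> -1
            lik[c_a ^ c_b] += pair_likelihood(y, x_a, x_b, h_a, h_b, sigma2)
    return lik

def xor_bit_probability(y, h_a, h_b, sigma2):
    """Soft output Pr(c_R = 1 | y), assuming equiprobable coded bits."""
    lik0, lik1 = xor_likelihoods(y, h_a, h_b, sigma2)
    return lik1 / (lik0 + lik1)

def hard_xor_cd(ys, h_a, h_b, sigma2):
    """'Hard' XOR-CD front end: threshold each soft output at 1/2."""
    return [int(xor_bit_probability(y, h_a, h_b, sigma2) > 0.5) for y in ys]
```

With $h^A = h^B = 1$, a received value near 0 indicates that the two BPSK symbols cancelled (so $c_n^R = 1$), while values near $\pm 2$ indicate equal symbols ($c_n^R = 0$). In soft XOR-CD, these per-symbol probabilities (or the equivalent log-likelihood ratios) would be passed to an ordinary point-to-point soft-input decoder; a practical implementation would compute them in the log domain to avoid underflow when $|y|$ is large.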
Equation (9) is simplified using the log-max approximation to
$$\hat{U}^R = \arg\max_{U^R} \log \sum_{U^A, U^B :\, U^A \oplus U^B = U^R} \exp\left(-M\left(X^A, X^B\right)\right) \approx \arg\min_{U^R} \min_{U^A, U^B :\, U^A \oplus U^B = U^R} M\left(X^A, X^B\right). \tag{III.51}$$
Computing (III.51) consists of two steps. First, we find the best pair of packets $\hat{U}^A$ and $\hat{U}^B$ (equivalently, the best pair of codewords) such that
$$\left(\hat{U}^A, \hat{U}^B\right) = \arg\min_{U^A, U^B} M\left(X^A, X^B\right). \tag{III.52}$$
Computing (III.52) is equivalent to finding the shortest path on the joint trellis of node A's and node B's encoders, and the Viterbi algorithm is a well-known algorithm for solving this problem. Since the state space of the joint trellis is the combination of node A's and node B's state spaces, we call this decoding algorithm the "full-state" Viterbi algorithm. Second, we obtain $\hat{U}^R$ by XORing $\hat{U}^A$ with $\hat{U}^B$ (i.e., $\hat{U}^R = \hat{U}^A \oplus \hat{U}^B$).

REFERENCES

[1] S. Zhang, S. C. Liew, and P. P. Lam, "Hot topic: physical-layer network coding," in Proceedings of Mobicom 2006. ACM, 2006, pp. 358–365.
[2] R. Ahlswede, N. Cai, S.-Y. Li, and R. W. Yeung, "Network information flow," IEEE Trans. Inf. Theory, vol. 46, no. 4, pp. 1204–1216, 2000.
[3] S.-Y. Li, R. W. Yeung, and N. Cai, "Linear network coding," IEEE Trans. Inf. Theory, vol. 49, no. 2, pp. 371–381, 2003.
[4] S. Zhang and S. C. Liew, "Channel coding and decoding in a relay system operated with physical-layer network coding," IEEE J. Sel. Areas Commun., vol. 27, no. 5, pp. 788–796, 2009.
[5] L. Lu and S. C. Liew, "Asynchronous physical-layer network coding," IEEE Trans. Wireless Commun., vol. 11, no. 2, pp. 819–831, 2012.
[6] IEEE-SA Standards Board, "Wireless LAN medium access control (MAC) and physical layer (PHY) specifications," IEEE Std 802.11 part 11, 2003.
[7] L. Lu, T. Wang, S. C. Liew, and S. Zhang, "Implementation of physical-layer network coding," Physical Communication, 2012.
[8] L. Lu, L. You, Q. Yang, T. Wang, M. Zhang, S. Zhang, and S. C. Liew, "Real-time implementation of physical-layer network coding," in Proceedings of the 2nd Workshop on Software Radio Implementation Forum. ACM, 2013, pp. 71–76.
[9] F. Rossetto and M. Zorzi, "On the design of practical asynchronous physical layer network coding," in IEEE 10th Workshop on SPAWC '09. IEEE, 2009, pp. 469–473.
[10] S. C. Liew, S. Zhang, and L. Lu, "Physical-layer network coding: Tutorial, survey, and beyond," Physical Communication, vol. 6, pp. 4–42, 2013.
[11] A. Viterbi, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," IEEE Trans. Inf. Theory, vol. 13, no. 2, pp. 260–269, 1967.
[12] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Trans. Inf. Theory, vol. 20, no. 2, pp. 284–287, 1974.
[13] D. To and J. Choi, "Convolutional codes in two-way relay networks with physical-layer network coding," IEEE Trans. Wireless Commun., vol. 9, no. 9, pp. 2724–2729, 2010.
[14] Q. Yang and S. C. Liew, "Optimal decoding of convolutional-coded physical-layer network coding," accepted by WCNC 2014, 2014.
[15] D. Wang, S. Fu, and K. Lu, "Channel coding design to support asynchronous physical layer network coding," in Proceedings of Globecom 2009. IEEE, 2009.
[16] X. Wu, C. Zhao, and X. You, "Joint LDPC and physical-layer network coding for asynchronous bi-directional relaying," IEEE J. Sel. Areas Commun., vol. 31, no. 8, pp. 1446–1454, 2013.
[17] S. Vanka, S. Srinivasa, Z. Gong, P. Vizi, K. Stamatiou, and M. Haenggi, "Superposition coding strategies: Design and experimental evaluation," IEEE Trans. Wireless Commun., vol. 11, no. 7, pp. 2628–2639, 2012.
[18] H. Jiang and P. A. Wilford, "A hierarchical modulation for upgrading digital broadcast systems," IEEE Trans. Broadcast., vol. 51, no. 2, pp. 223–229, 2005.
[19] H. Ma and J. Wolf, "On tail biting convolutional codes," IEEE Trans. Commun., vol. 34, no. 2, pp. 104–111, 1986.
[20] V. Namboodiri, K. Venugopal, and B. Rajan, "Physical layer network coding for two-way relaying with QAM," IEEE Trans. Wireless Commun., vol. 12, no. 10, pp. 5074–5086, October 2013.
[21] H. J. Yang, Y. Choi, and J. Chun, "Modified high-order PAMs for binary coded physical-layer network coding," IEEE Commun. Lett., vol. 14, no. 8, pp. 689–691, 2010.
[22] T. Koike-Akino, P. Popovski, and V. Tarokh, "Optimized constellations for two-way wireless relaying with physical network coding," IEEE J. Sel. Areas Commun., vol. 27, no. 5, pp. 773–787, 2009.
[23] J. Pearl, Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, 1988.
[24] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, "Factor graphs and the sum-product algorithm," IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 498–519, 2001.
[25] S. Zhang, S. C. Liew, and P. P. Lam, "On the synchronization of physical-layer network coding," in Information Theory Workshop 2006. IEEE, 2006, pp. 404–408.
[26] L. Lu, S. C. Liew, and S. Zhang, "Optimal decoding algorithm for asynchronous physical-layer network coding," in Proceedings of ICC 2011. IEEE, 2011.
[27] A. J. Viterbi, "Convolutional codes and their performance in communication systems," IEEE Trans. Commun. Tech., vol. 19, no. 5, pp. 751–772, 1971.
[28] B. Sklar, Digital communications. Prentice Hall NJ, 2001, vol. 2.
[29] M. L. Cedervall and R. Johannesson, "A fast algorithm for computing distance spectrum of convolutional codes," IEEE Trans. Inf. Theory, vol. 35, no. 6, pp. 1146–1159, 1989.
[30] M. Esmaeili, T. A. Gulliver, N. P. Secord, and S. A. Mahmoud, "A link between quasi-cyclic codes and convolutional codes," IEEE Trans. Inf. Theory, vol. 44, no. 1, pp. 431–435, 1998.
[31] G. Solomon and H. Tilborg, "A connection between block and convolutional codes," Journal on Applied Mathematics, vol. 37, no. 2, pp. 358–369, 1979.
[32] P.-C. Wang, Y.-C. Huang, and K. R. Narayanan, "Asynchronous compute-and-forward/integer-forcing with quasi-cyclic codes," arXiv preprint arXiv:1312.4003, 2013.
Qing Yang received his B.Eng. degree in electronics and information engineering from the Huazhong University of Science and Technology, Wuhan, China, in 2010. Since then, he has been a Ph.D. student at the Department of Information Engineering, The Chinese University of Hong Kong. His research interests include physical-layer network coding, multiuser MIMO, and software-defined radio.

Soung Chang Liew received his S.B., S.M., E.E., and Ph.D. degrees from the Massachusetts Institute of Technology. From 1984 to 1988, he was at the MIT Laboratory for Information and Decision Systems, where he investigated Fiber-Optic Communications Networks. From March 1988 to July 1993, he was at Bellcore (now Telcordia), New Jersey, where he engaged in Broadband Network Research. He has been a Professor at the Department of Information Engineering, The Chinese University of Hong Kong (CUHK), since 1993. Prof. Liew is currently the Division Head of the Department of Information Engineering and a Co-Director of the Institute of Network Coding at CUHK. He is also an Adjunct Professor of Peking University and Southeast University, China. Prof. Liew's research interests include wireless networks, Internet protocols, multimedia communications, and packet switch design. Prof. Liew's research group won the best paper awards in IEEE MASS 2004 and IEEE WLN 2004. Separately, TCP Veno, a version of TCP proposed by Prof. Liew's research group to improve TCP performance over wireless networks, has been incorporated into a recent release of Linux OS. In addition, Prof. Liew initiated and built the first inter-university ATM network testbed in Hong Kong in 1993. More recently, Prof. Liew's research group pioneered the concept of Physical-layer Network Coding (PNC). Publications of Prof. Liew can be found at www.ie.cuhk.edu.hk/soung.