Coding for the Correction of Synchronization Errors
ASJ Helberg
CUHK, Oct 2010

Content
• Background
  – Synchronization errors and their effects
• Previous approaches
  – Resynchronization
  – Concatenation
  – Error correction
• Algebraic insertion/deletion correction
  – Single error correcting
  – Multiple error correcting
• Problems and applications

Synchronization errors
• Due to timing or other noise and inaccuracies
• Manifest as the insertion or deletion of symbols
• Examples:
  – PPM (pulse position modulation) in optical fibres
  – Terabit per square inch magnetic recording
  – Optical disc recording, due to ISI and ITI
  – Multipath effects in radio

Types of synchronization errors
• Insertion or deletion, excluding additive errors
  – Additive errors are a special case: a deletion and an insertion, in the same position, of bits of opposite value
  – Repetition/duplication error: copies a bit
  – Bit/peak shift: 01 becomes 10
  – Bit/peak shift of size a: 0^a 1 becomes 1 0^a

Effects
• A single synchronization error causes a catastrophic burst of additive errors
  Tx: 0001010011011110
  Rx: 0001100001110
• Boundaries of data blocks are unknown to the receiver, e.g. 001100 becomes 0000

Synchronizable codes
• Comma-free codes
• Prefix synchronised codes
• Bounded synchronisation delay
• Synchronisation with timing
• Marker codes
• Corrupted blocks are discarded!!
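The burst effect described on the Effects slide can be reproduced in a few lines. The following Python sketch deletes a single bit from the transmitted example word and counts how many positions then disagree when the receiver compares the words bit by bit. The helper names `delete_bit` and `additive_errors` and the choice of deletion position are illustrative assumptions, not part of the talk.

```python
def delete_bit(bits: str, pos: int) -> str:
    """Return `bits` with the symbol at index `pos` removed (one deletion)."""
    return bits[:pos] + bits[pos + 1:]


def additive_errors(tx: str, rx: str) -> int:
    """Count disagreeing positions when rx is compared with tx bit by bit.

    Only the overlapping prefix is compared, which models a receiver that
    does not know a symbol was lost and keeps its original framing.
    """
    return sum(t != r for t, r in zip(tx, rx))


tx = "0001010011011110"      # transmitted word from the Effects slide
rx = delete_bit(tx, 4)       # assumed: one deletion at index 4 (0-based)

print("Tx:", tx)
print("Rx:", rx)
print("apparent additive errors:", additive_errors(tx, rx))
```

Every bit after the deleted position is shifted by one place, so a single deletion shows up as several apparent bit errors spread over the rest of the block.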
Sync error correcting codes
• Binary algebraic block codes
• Nonbinary and perfect codes
• Bursts of sync errors
• Weak synchronization errors
• Convolutional codes
• Expurgated codes (Reed-Muller / LDPC)

Binary algebraic block codes
• Varshamov-Tenengolts construction: Σ i·x_i ≡ a mod (n+1), summed over i = 1, ..., n
  – One asymmetric error
• Levenshtein codes: Σ i·x_i ≡ a mod (m)
  – With 2n > m ≥ n + 1: s = 1 correcting code
  – With m ≥ 2n: s = 1 and t = 1 correcting code
• Partition a = 0 was proven to have the maximum cardinality
• Largest common subword obtained from two valid codewords is

Example (n = 4): Σ i·x_i ≡ a mod (n+1)

  i =  4 3 2 1    Σ i·x_i    a
  x =  0 0 0 0       0       0
  x =  0 0 0 1       1       1
  x =  0 0 1 0       2       2
  x =  0 0 1 1       3       3
  x =  0 1 0 0       3       3
  x =  0 1 0 1       4       4
  x =  0 1 1 0       5       0
  x =  0 1 1 1       6       1
  x =  1 0 0 0       4       4
  x =  1 0 0 1       5       0
  x =  1 0 1 0       6       1
  x =  1 0 1 1       7       2
  x =  1 1 0 0       7       2
  x =  1 1 0 1       8       3
  x =  1 1 1 0       9       4
  x =  1 1 1 1      10       0

Hamming distance properties of Levenshtein codes
• Proposition 1: A Levenshtein code C has only one code word of either weight w = 0 or weight w = 1.
• Proposition 2: In a Levenshtein code there is a minimum Hamming distance d_min ≥ 2 between any two code words.
• Proposition 3: Code words in a Levenshtein code have d_min ≥ 4 if they have the same weight.
• Proposition 4: Levenshtein code words that differ in one unit of weight have d_min ≥ 3.

Weight distance diagram
[Figure: weight classes 0, 2, 3, ..., n-3, n-2, n. Labels: d_min = 2 between weights 0 and 2 and between weights n-2 and n; d_min = 3 between adjacent weight classes; d_min = 4 within a weight class.]

Generalised structure
Proposition 5
• Code words of weights w = 0, 1, 2, ..., s do not occur together in an s-correcting code.
Proposition 6
• The minimum Hamming distance of an s-insertion/deletion correcting code is d_min ≥ s + 1.
• Again, the proof of Propositions 5 and 6 is straightforward when considering the resulting subwords after s deletions.
Proposition 7
• Any two number theoretic s-insertion/deletion correcting code words which differ in weight by i, 0 ≤ i ≤ s, have a Hamming distance of d ≥ 2(s + 1) - i.
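The Levenshtein code properties above can be checked by brute force for small n. The following Python sketch builds a partition from the checksum Σ i·x_i ≡ a mod (m), using the weights n, ..., 1 from left to right as in the example table, then tests two things: that no two code words share a subword after one deletion, and that the distance bounds of Propositions 2 to 4 hold. All function names are illustrative assumptions, and exhaustive enumeration is only practical for short words.

```python
from itertools import combinations, product


def levenshtein_code(n, a=0, m=None):
    """Binary words of length n whose checksum sum(i * x_i) is congruent
    to a modulo m. Weights run n, ..., 1 from left to right; m defaults
    to n + 1."""
    m = n + 1 if m is None else m
    weights = range(n, 0, -1)
    return [x for x in product((0, 1), repeat=n)
            if sum(w * b for w, b in zip(weights, x)) % m == a % m]


def deletion_ball(x):
    """All words obtained from x by deleting exactly one symbol."""
    return {x[:k] + x[k + 1:] for k in range(len(x))}


def corrects_single_deletion(code):
    """True if no two code words share a subword after one deletion."""
    return all(deletion_ball(x).isdisjoint(deletion_ball(y))
               for x, y in combinations(code, 2))


def hamming(x, y):
    """Hamming distance between two equal-length words."""
    return sum(p != q for p, q in zip(x, y))


def satisfies_propositions(code):
    """Check Propositions 2 to 4 on every pair of code words."""
    for x, y in combinations(code, 2):
        d = hamming(x, y)
        weight_gap = abs(sum(x) - sum(y))
        if d < 2:                       # Proposition 2
            return False
        if weight_gap == 0 and d < 4:   # Proposition 3
            return False
        if weight_gap == 1 and d < 3:   # Proposition 4
            return False
    return True


code = levenshtein_code(4, a=0)
print(code)
print(corrects_single_deletion(code), satisfies_propositions(code))
print([len(levenshtein_code(4, a)) for a in range(5)])
```

For n = 4 the last line prints the size of each partition a = 0, ..., 4, which lets the claim about partition a = 0 be compared against the others.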
With w1 = w2 - Δw, Δw = i, and x the number of positions in which both code words have a 1:
  d = (w2 - x) + (w1 - x)
    = w2 + w1 - 2x
    = w2 + w2 - Δw - 2x
    = 2(w2 - x) - Δw
From Proposition 6, d_min ≥ s + 1 corresponds to the number of 1s by which w2 differs from w1, i.e. (w2 - x) ≥ s + 1; thus d ≥ 2(s + 1) - i.

Weight-distance diagram
[Figure: weight classes 0, s+1, s+2, ..., n-(s+2), n-(s+1), n. Labels: d_min = s+1 between weights 0 and s+1 and between weights n-(s+1) and n; d_min = 2s+1 between adjacent weight classes; d_min = 2(s+1) - i between classes differing by i; d_min = 2(s+1) within a weight class.]

Bounds for the general algebraic construction
• Lower bound on s correction capability
• Upper bound on correction capability

Upper bound on cardinality
• Hamming-type upper bound

General algebraic construction
Modified Fibonacci weights: v_i = 1 + v_(i-1) + ... + v_(i-s), with v_i = 0 for i ≤ 0
• s = 1: 1, 2, 3, 4, 5, 6, 7, …
• s = 2: 1, 2, 4, 7, 12, 20, 33, …
• s = 3: 1, 2, 4, 8, 15, 28, 52, …
• s = 4: 1, 2, 4, 8, 16, 31, 60, …
• Partitioning the 2^n words by residue a mod (m); thus, in the limit, cardinality bounded by
• 2^n / m (non-empty partitions)

Example (n = 4, s = 2, m = 12): Σ v_i·x_i ≡ a mod (m)

  v =  1 2 4 7    Σ v_i·x_i    a
  x =  0 0 0 0        0        0
  x =  0 0 0 1        7        7
  x =  0 0 1 0        4        4
  x =  0 0 1 1       11       11
  x =  0 1 0 0        2        2
  x =  0 1 0 1        9        9
  x =  0 1 1 0        6        6
  x =  0 1 1 1       13        1
  x =  1 0 0 0        1        1
  x =  1 0 0 1        8        8
  x =  1 0 1 0        5        5
  x =  1 0 1 1       12        0
  x =  1 1 0 0        3        3
  x =  1 1 0 1       10       10
  x =  1 1 1 0        7        7
  x =  1 1 1 1       14        2

Cardinalities

  Word length n   4   5   6   7   8   9  10  11  12  13  14
  s = 2           2   2   3   4   5   6   8   9  11  15  18
  s = 3           2   2   2   2   3   4   4   5   6   8   8
  s = 4           -   2   2   2   2   2   3   4   4   4   5
  s = 5           -   -   2   2   2   2   2   2   3   4   4

Problems
• Very low cardinalities
• Does not scale well
• No decoding algorithm
• Codeword boundaries assumed
• Validity not proven in general

Cardinality bounds
• Levenshtein

Lower bounds on the capacity of the binary deletion channel
A. Kirsch and E. Drinea, "Directly lower bounding the information capacity for channels with i.i.d. deletions and duplications," IEEE Transactions on Information Theory, vol. 56, no. 1, January 2010, pp. 86-102.

Connection with network coding?
• Synchronization in NC environments is assumed
• Especially on physical layer NC
• "Pruned/punctured" codes may be useful?
• Superimposed codes that are also sync error correcting?
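The general construction and the small cardinalities it yields can be explored with a short brute-force sketch in Python. It assumes the weights follow the recurrence v_i = 1 + v_(i-1) + ... + v_(i-s), which reproduces, for example, the s = 2 row 1, 2, 4, 7, 12, 20, 33, and it assumes the modulus is the next weight, m = v_(n+1), as in the example with m = 12. The function names are illustrative, and the weights are applied left to right in increasing order.

```python
from itertools import combinations, product


def modified_fibonacci(s, count):
    """First `count` weights with v_i = 1 + (sum of the previous s weights)."""
    v = []
    for _ in range(count):
        v.append(1 + sum(v[-s:]))   # fewer than s terms at the start
    return v


def partitions(n, s):
    """Group all binary words of length n by sum(v_i * x_i) mod m,
    with m assumed to be the (n + 1)-th weight."""
    v = modified_fibonacci(s, n + 1)
    weights, m = v[:n], v[n]
    classes = {}
    for x in product((0, 1), repeat=n):
        a = sum(w * b for w, b in zip(weights, x)) % m
        classes.setdefault(a, []).append(x)
    return classes


def subwords(x, s):
    """All words obtained from x by deleting exactly s symbols."""
    n = len(x)
    return {tuple(x[k] for k in keep)
            for keep in combinations(range(n), n - s)}


def corrects_deletions(code, s):
    """True if no two code words share a subword after s deletions."""
    return all(subwords(x, s).isdisjoint(subwords(y, s))
               for x, y in combinations(code, 2))


classes = partitions(4, 2)
print(modified_fibonacci(2, 7))
print(classes[0])
print(max(len(c) for c in classes.values()))
print(all(corrects_deletions(c, 2) for c in classes.values()))
```

Taking the largest partition over all residues a gives the cardinality for a given n and s under these assumptions; the exhaustive search grows as 2^n, so it is only a check for short word lengths.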