UNIQUE DECIPHERABILITY AND SYNCHRONIZABILITY OF CODES

SOME DEFINITIONS
SOURCE ALPHABET: A finite set of symbols. The elements are called source symbols. Ex: S = {A, B, C, D}.
CODE ALPHABET: A finite set of symbols. The elements are called code symbols. Ex: L = {0, 1, 2}.
CODE WORD: Concatenation of a finite number of code symbols. Ex: 0110, 102, 3.
CODE: Finite number of code words, each representing a source symbol.
Examples:
Code A = [S: 00, C: 01, R: 10, F: 11]
Code B = [S: 0, C: 110, R: 10, F: 1110]
Code C = [A: 0, B: 11, C: 00, D: 01]
Code A is a block code. Code B and code C are variable-length codes.
Code A uses 2 binary digits per source symbol. If the source symbols are equally probable, this block code is adequate.
But suppose the probabilities are p(S) = 1/2, p(C) = p(F) = 1/8, and p(R) = 1/4. In this case
code B is better than code A, since the average length per source symbol for code B is:
Σi L(si) p(si) = 1 p(S) + 3 p(C) + 2 p(R) + 4 p(F) = 1/2 + 3/8 + 2/4 + 4/8 = 15/8. Compare this with 2 (code A).
CODED MESSAGE: String obtained by concatenating code words, without spacing or any other punctuation.
Examples:
a) 1101011011001110 is a coded message in code B, corresponding to the source string CRCCSF.
b) 0000 is a coded message in code C, corresponding to AAAA, AAC, ACA, CAA, or CC.
c) 0000 is a coded message in code B, corresponding to SSSS.
d) 0000 is a coded message in code A, corresponding to SS.
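The four examples above can be checked by enumerating every way a coded message splits into code words. The recursive helper `parses` below is a hypothetical name for this sketch, and it assumes that no code word is empty.

```python
def parses(message, code):
    """Return every source string whose encoding equals `message`."""
    if message == "":
        return [""]
    results = []
    for symbol, word in code.items():
        if message.startswith(word):
            # Strip one code word and decompose the remainder recursively.
            results += [symbol + rest for rest in parses(message[len(word):], code)]
    return results

code_a = {"S": "00", "C": "01", "R": "10", "F": "11"}
code_b = {"S": "0", "C": "110", "R": "10", "F": "1110"}
code_c = {"A": "0", "B": "11", "C": "00", "D": "01"}

print(parses("1101011011001110", code_b))  # ['CRCCSF']
print(sorted(parses("0000", code_c)))      # ['AAAA', 'AAC', 'ACA', 'CAA', 'CC']
print(parses("0000", code_b))              # ['SSSS']
print(parses("0000", code_a))              # ['SS']
```

A message with more than one parse, as in example b), is exactly what the later definition of unique decodability rules out.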
NONSINGULAR CODE: If all code words are distinct, the code is called nonsingular. [Some authors use the term
“code” only for nonsingular codes.] If there is at least one duplication of code words, the code is called singular.
THE nth EXTENSION OF A CODE: If the source alphabet of a code has q symbols, S = {s1, s2, ..., sq}, the nth
extension of the code has q^n source symbols, σ1, σ2, ..., σ(q^n). Each σi corresponds to some sequence of n of
the sj.
Example. Second extension of code C:
AA: 00, AB: 011, AC: 000, AD: 001, BA: 110, BB: 1111, BC: 1100, BD: 1101, CA: 000, CB: 0011, CC: 0000, CD: 0001,
DA: 010, DB: 0111, DC: 0100, DD: 0101.
UNIQUE DECODABILITY (UNIQUE DECIPHERABILITY) (UD): A code is said to be uniquely decodable (UD) if
and only if the nth extension of the code is nonsingular for every finite n.
Examples: a) Code C is clearly not UD, since its second extension is singular (AC: 000 and CA: 000).
b) 2nd and 3rd extensions of code B are nonsingular, but this is not sufficient for code B to be UD.
This shows that we need a method for testing a code for unique decodability.
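For small n the definition can be checked by brute force. The sketch below builds the nth extension and tests it for duplicates. The names `extension` and `is_nonsingular` are illustrative, and source symbols are assumed to be single characters.

```python
from itertools import product

def extension(code, n):
    """Map each length-n source sequence to its concatenated code words."""
    return {"".join(seq): "".join(code[s] for s in seq)
            for seq in product(code, repeat=n)}

def is_nonsingular(mapping):
    """True if no two source sequences share the same code sequence."""
    words = list(mapping.values())
    return len(words) == len(set(words))

code_b = {"S": "0", "C": "110", "R": "10", "F": "1110"}
code_c = {"A": "0", "B": "11", "C": "00", "D": "01"}

ext2 = extension(code_c, 2)
print(ext2["AC"], ext2["CA"])                  # 000 000
print(is_nonsingular(ext2))                    # False
print(is_nonsingular(extension(code_b, 2)))    # True
print(is_nonsingular(extension(code_b, 3)))    # True
```

Since the nth extension has q^n entries, this check cannot be run for every n, which is why a finite test procedure is needed.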
The above definition of UD assures that any two sequences of source symbols lead to distinct sequences of code
symbols. For this we have the following theorem [notation: the sequence of source symbols Si corresponds to the
sequence of code symbols Xi; L denotes the length; “Si → Xi” is used for “Si corresponds to Xi”]:
THEOREM: If the code is UD, then any two distinct sequences of source symbols lead to (i.e. correspond to) distinct
sequences of code words.
Proof
a) L(S1) = L(S2) = k. Consider the kth extension, which is nonsingular by hypothesis.
Then clearly [S1 ≠ S2] ⇒ [X1 ≠ X2].
b) L(S1) ≠ L(S2). Assume the contrary, i.e. assume that S1 → X and also S2 → X.
Then S1S2 → XX, and S2S1 → XX. Hence, if L(S1S2) = L(S2S1) = m,
[S1S2 → XX] AND [S2S1 → XX] AND [L(S1S2) = L(S2S1) = m] ⇒ the mth extension is singular, whereas by hypothesis it should be nonsingular.
Contradiction.
Example. Consider code L = [A: 0, B: 01, C: 1010]. We note that both AC and BBA correspond to 01010, hence the
code is clearly non-UD. The second, third, and fourth extensions are nonsingular. For example the 2nd extension is:
[AA: 00, AB: 001, AC: 01010, BA: 010, BB: 0101, BC: 011010, CA: 10100, CB: 101001, CC: 10101010].
But L(AC) + L(BBA) = 5, and the 5th extension is bound to be singular: ACBBA → 0101001010 and BBAAC → 0101001010.
Hence AC → 01010 and BBA → 01010. The code is not UD.
With the above theorem, we see that our definition of UD (“nth extension is nonsingular for any finite n”) and the
definition in Kohavi (“every coded message can be decomposed into a sequence of code words in only one way”) are
essentially the same. So, if a code is UD, then any (whole) coded message can be deciphered in a unique way.
UNIQUE DECODABILITY WITH FINITE DELAY (UDF-μ): A code is said to be uniquely decodable with delay μ
iff μ is the least integer such that the knowledge of the first μ symbols of the coded message suffices to determine its first
code word.
If a code is UDF, then we can start deciphering without waiting for the end of the message. Hence we can
decode “as we receive the message”, with some delay, μ being a measure of the decoding delay.
Example. Code K = [A: 1, B: 10, C: 001]. This code is known to be UDF-4. Hence there exists at least one message of
length 3 which does not allow us to uniquely identify the first code word. Consider the coded message 100:
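First code word: 10 (B) or 1 (A)?
   1 0 0     read as B (10) followed by the first symbol of C (0...)
   1 0 0     read as A (1) followed by the first two symbols of C (00...)
But four symbols suffice: if the fourth symbol is 0 then the first code word is 10 (B), and if the fourth symbol is 1 then
the first code word is 1 (A):
   1 0 0 0   B (10), then the first two symbols of C (00...)
   1 0 0 1   A (1), then C (001)
UDF-4: For all possible coded messages of length four, the first code word can be determined uniquely.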
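The claim that three symbols are not enough while four are can be cross-checked by exhaustive search over all prefixes. This is a brute-force sketch, not the test described in these notes. The helper names (`extendable`, `first_words`, `decoding_delay`) and the search bound `max_delay` are assumptions made for illustration.

```python
from itertools import product

def extendable(s, words):
    """True if s is a prefix of some concatenation of code words."""
    if any(w.startswith(s) for w in words):
        return True
    return any(s.startswith(w) and extendable(s[len(w):], words) for w in words)

def first_words(prefix, words):
    """Code words that can open some coded message beginning with `prefix`."""
    return {w for w in words
            if w.startswith(prefix)
            or (prefix.startswith(w) and extendable(prefix[len(w):], words))}

def decoding_delay(words, max_delay=8):
    """Least d such that every d-symbol prefix fixes the first code word, else None."""
    alphabet = sorted(set("".join(words)))
    for d in range(1, max_delay + 1):
        prefixes = ("".join(p) for p in product(alphabet, repeat=d))
        if all(len(first_words(p, words)) <= 1 for p in prefixes):
            return d
    return None

code_k = ["1", "10", "001"]
print(first_words("100", code_k))   # both '1' and '10' remain possible
print(first_words("1000", code_k))  # only '10'
print(first_words("1001", code_k))  # only '1'
print(decoding_delay(code_k))       # 4
```

The search grows exponentially with the prefix length and returns None when no delay up to `max_delay` works, so it says nothing certain about codes with a larger delay.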
Testing a code for UD and for UDF is analogous to IL and ILF tests.
EVEN’S TEST FOR UD AND UDF (Explained on code L = [A: 0, B: 01, C:1010] and K = [A: 1, B: 10, C: 001])
1. Given a code, insert a separation symbol S at the beginning and at the end of each code word. In addition, in
every code word representing the source symbol N, insert the symbol Ni between its ith symbol and its (i+1)st
symbol.
2. Each code symbol xk is now situated between two separation symbols. We say that the separation symbol Ri to the right
of the code symbol xk is the xk-successor of the separation symbol to its left.
Code L with separation symbols inserted:
A: S 0 S
B: S 0 B1 1 S
C: S 1 C1 0 C2 1 C3 0 S
For example, S is a 0-successor of S, S is the 0-successor of C3, S is the 1-successor of B1, and C3 is the 1-successor of C2.
3. Two successors Ri and Rj are compatible if
a) S xk Ri and S xk Rj occur, or
b) Rp xk Ri and Rq xk Rj occur, and Rp and Rq are compatible. In such a case, (Ri Rj) is said to be a
compatible pair implied by (Rp Rq) under xk [compatible pairs are unordered pairs].
4. Testing table (for UD)
a) Column headings are the symbols of the code alphabet.
b) The first row heading is S. Other row headings are compatible pairs.
c) The entries in row (Rp Rq), column xk, are the compatibles implied by (Rp Rq) under xk.
Testing table for code L:
            0                1
------------------------------------
S           (S B1)           -
(S B1)      -                (C1 S)
(C1 S)      (C2 S)(C2 B1)    -
(C2 S)      -                (C3 C1)
(C2 B1)     -                (C3 S)
(C3 C1)     (S C2)           -
(C3 S)      (S B1)(S S)      -
------------------------------------
5. A necessary and sufficient condition for a code to be UD is that the pair (S S) is not generated in the testing table.
Since (S S) is generated in the testing table for code L, it is not UD. We don’t proceed further.
6. If the code is not UD, trace back from (S S) to S in order to find an ambiguous message:
S --0--> (S B1) --1--> (C1 S) --0--> (C2 B1) --1--> (C3 S) --0--> (S S)
The arc labels spell the ambiguous message 01010, which decodes as BBA (01, 01, 0) or as AC (0, 1010).
7. If no (S S) appears in the testing table, the code is UD. Then, proceed to construct the testing graph. In the graph
there is one vertex for S, and one for each compatible pair; an arc labeled xk leads from a vertex to each compatible pair it
implies under xk. If the graph is loop-free, then the code is UDF-μ with μ = L + 1,
where L is the length of the longest path.
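Steps 1 to 7 can be sketched in Python as follows. This is an illustrative reading of the procedure, not a reference implementation. The function names, the tuple representation of compatible pairs, and the returned status strings are all assumptions. The code is assumed to be nonsingular with non-empty code words.

```python
from itertools import combinations

def successors(code):
    """succ[R][x] = set of separation symbols that follow R across code symbol x."""
    succ = {}
    for sym, word in code.items():
        seps = ["S"] + [f"{sym}{i}" for i in range(1, len(word))] + ["S"]
        for i, x in enumerate(word):
            succ.setdefault(seps[i], {}).setdefault(x, set()).add(seps[i + 1])
    return succ

def evens_test(code):
    """Return ('not UD', None), ('UD, not UDF', None) or ('UDF', delay)."""
    succ = successors(code)
    alphabet = sorted({x for word in code.values() for x in word})
    edges, todo = {}, ["S"]          # vertex -> set of implied compatible pairs
    while todo:
        v = todo.pop()
        if v in edges:
            continue
        edges[v] = set()
        if v == ("S", "S"):          # step 5: no need to expand (S S)
            continue
        for x in alphabet:
            if v == "S":             # rule 3a: two x-successors of S
                new = set(combinations(sorted(succ["S"].get(x, ())), 2))
            else:                    # rule 3b: pairs implied by (Rp Rq) under x
                new = {tuple(sorted((a, b)))
                       for a in succ[v[0]].get(x, ())
                       for b in succ[v[1]].get(x, ())}
            edges[v] |= new
            todo.extend(new)
    if ("S", "S") in edges:
        return ("not UD", None)

    depth = {}                       # longest path (in arcs) starting at each vertex
    def longest(v, on_path):
        if v in on_path:
            raise ValueError("loop in testing graph")
        if v not in depth:
            depth[v] = max((1 + longest(w, on_path | {v}) for w in edges[v]), default=0)
        return depth[v]

    try:
        return ("UDF", longest("S", frozenset()) + 1)
    except ValueError:
        return ("UD, not UDF", None)

print(evens_test({"A": "0", "B": "01", "C": "1010"}))  # ('not UD', None)
print(evens_test({"A": "1", "B": "10", "C": "001"}))   # ('UDF', 4)
```

Pairs are stored as sorted tuples so that (Ri Rj) and (Rj Ri) count as the same vertex, matching the remark that compatible pairs are unordered.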
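Example. Code K = [A: 1, B: 10, C: 001]
Code K with separation symbols inserted:
A: S 1 S
B: S 1 B1 0 S
C: S 0 C1 0 C2 1 S
Testing table for code K:
            0          1
------------------------------
S           -          (S B1)
(S B1)      (C1 S)     -
(C1 S)      (C2 C1)    -
(C2 C1)     -          -
------------------------------
Testing graph for code K:
S --1--> (S B1) --0--> (C1 S) --0--> (C2 C1)
No loops, so the code is UDF. L = 3, μ = L + 1 = 4.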
DECIPHERING A CODED MESSAGE
Example. Code Q = [A: 11, B: 011, C: 001, D: 01, E: 00].
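Testing table for code Q:
            0                                             1
------------------------------------------------------------------
S           (B1 C1)(B1 D1)(B1 E1)(C1 D1)(C1 E1)(D1 E1)    -
(B1 C1)     -                                             -
(B1 D1)     -                                             (S B2)
(B1 E1)     -                                             -
(C1 D1)     -                                             -
(C1 E1)     (S C2)                                        -
(D1 E1)     -                                             -
(S B2)      -                                             (S A1)
(S C2)      -                                             (S A1)
(S A1)      -                                             (S A1)
------------------------------------------------------------------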
No (S S) in the table: code Q is UD. Construct the graph. The graph contains a loop: (S A1) --1--> (S A1).
Hence code Q is not UDF.
In order to find an arbitrarily long coded message which does not allow the first code word to be identified, consider the
paths starting from S and reaching this loop. There are two such paths:
S --0--> (C1 E1) --0--> (S C2) --1--> (S A1)
and
S --0--> (B1 D1) --1--> (S B2) --1--> (S A1).
Hence, the two cases for which arbitrarily long coded messages do not allow the first code word to be identified are:
a) 0 0 1 1 1 1 1 . . . , which may be read as E A A . . . (00, 11, 11, . . .) or as C A A . . . (001, 11, 11, . . .).
No matter how long you keep receiving 1111 ...,
the ambiguity about the first code word being C or E
will never be resolved.
b) 0 1 1 1 1 1 1 . . . , which may be read as D A A A . . . (01, 11, 11, 11, . . .) or as B A A . . . (011, 11, 11, . . .).
Here the ambiguity about the first code word being D or B is never resolved.
But, since the code is UD, we must be able to decipher uniquely any message, if the end of the message is known.
For example, if we know that the whole message is 0011111, it corresponds to CAA (i.e. the first code word is 001=C).
On the other hand if the whole message is 001111, this corresponds to EAA (i.e. the first code word is 00=E).
In general, given any coded (whole) message in this code, we can decipher it working in forward and in backward
directions, analogous to finding the input sequence of an IL machine. This is shown in the following example.
Suppose we want to decode the message 0011101100011010011 which is encoded using code Q. We scan the message
in forward direction (left to right) and insert a lower comma whenever a sequence which corresponds to a legitimate
code word is detected. For example the first lower comma from the left follows the initial 00, since 00 is a word in the
code. Next a comma follows the 1 since the sequence 001 is also a word in the code, and so on. Although the tenth and
eleventh symbols are 0’s, no lower comma is inserted between the eleventh and the twelfth symbols, because there is no
comma between the ninth and the tenth symbol, and a new word cannot start unless a comma indicates the end of the
preceding word. Next, we scan the coded message in backward direction (right to left) and insert an upper comma
whenever a sequence which corresponds to the inverse of a legitimate code word is scanned. The inverses of the code
words in our example are [11, 110, 100, 10, 00]. If the code is UD, the message can be decoded by retaining only those
commas that occur in the upper and the lower spaces simultaneously (listed below as the common commas).
Message: 0 0 1 1 1 0 1 1 0 0 0 1 1 0 1 0 0 1 1
Lower commas follow symbols 2, 3, 4, 5, 7, 8, 10, 12, 13, 15, 17, 18.
Upper commas follow symbols 1, 3, 5, 6, 8, 9, 10, 11, 13, 15, 16, 17.
Common commas follow symbols 3, 5, 8, 10, 13, 15, 17, which gives
001 (C), 11 (A), 011 (B), 00 (E), 011 (B), 01 (D), 00 (E), 11 (A),
i.e. the message decodes as CABEBDEA.
Although the above procedure will in general require keeping track of a number of sequences and the locations of the
various commas, it is in principle a simple procedure that can be carried out by a finite-state machine (simple
algorithm).
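The forward and backward scans can be sketched as two passes of the same routine, the second one run on the reversed message with reversed code words. The names `comma_positions` and `decode` are hypothetical, and positions are counted as "number of symbols to the left of the comma". The sketch assumes the code is UD. For a non-UD code the retained commas need not delimit code words.

```python
def comma_positions(message, words):
    """Positions p such that message[:p] is a concatenation of code words."""
    reachable = {0}
    for p in range(len(message)):
        if p in reachable:               # a word may start only right after a comma
            for w in words:
                if message.startswith(w, p):
                    reachable.add(p + len(w))
    return reachable

def decode(message, code):
    """Decode a whole message by keeping the commas found in both scan directions."""
    words = list(code.values())
    n = len(message)
    lower = comma_positions(message, words)
    if n not in lower:
        raise ValueError("message is not a concatenation of code words")
    # Backward scan: reverse the message and the words, then map positions back.
    upper = {n - p for p in comma_positions(message[::-1], [w[::-1] for w in words])}
    cuts = sorted(lower & upper)
    inverse = {word: sym for sym, word in code.items()}
    return "".join(inverse[message[a:b]] for a, b in zip(cuts, cuts[1:]))

code_q = {"A": "11", "B": "011", "C": "001", "D": "01", "E": "00"}
print(decode("0011101100011010011", code_q))  # CABEBDEA
print(decode("0011111", code_q))              # CAA
print(decode("001111", code_q))               # EAA
```

The set `lower` includes position 0 and the end of the message, which the comma figure leaves implicit.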
SYNCHRONIZABILITY OF CODES
SYNCHRONIZABILITY: A code is said to be synchronizable of order μ if μ is the least integer such that the
knowledge of any μ consecutive code symbols is sufficient to determine a separation of code words within these
symbols. We shall restrict our attention to synchronizable codes which are UDF.
TEST FOR SYNCHRONIZABILITY
The problem of testing a code for synchronizability is analogous to the problem of testing a machine for finite output
memory (FOM). In fact, since in both cases the objective is to specify the sequence at some point, we can use the same
testing procedure.
Testing table for synchronizability:
Upper table: The row headings consist of all the separation symbols. The entries in row Ri, column xk are the
xk-successors of Ri.
Lower table: The row headings are all pairs of separation symbols. The entries in row (Ri Rj), column xk, are the pairs
implied by (Ri Rj) under xk.
Testing graph for synchronizability: There is a vertex for each row in the lower table. A directed arc labeled xk leads
from vertex (Ri Rj) to vertex (Rp Rq), where p ≠ q, if and only if (Rp Rq) is the xk-successor of (Ri Rj).
THEOREM. A code is synchronizable of order μ if and only if it is UD and its testing graph for synchronizability is
loop-free, the order being μ = L + 1, where L is the length of the longest path in the graph.
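The graph part of this test can be sketched as below. The function names are illustrative, and pairs are represented as sorted tuples. The sketch checks only the loop-free condition and the longest path. Unique decodability is assumed to have been established separately.

```python
from itertools import combinations

def successors(code):
    """succ[R][x] = set of separation symbols that follow R across code symbol x."""
    succ = {}
    for sym, word in code.items():
        seps = ["S"] + [f"{sym}{i}" for i in range(1, len(word))] + ["S"]
        for i, x in enumerate(word):
            succ.setdefault(seps[i], {}).setdefault(x, set()).add(seps[i + 1])
    return succ

def sync_order(code):
    """Return L + 1 if the pair graph is loop-free, otherwise None."""
    succ = successors(code)
    alphabet = sorted({x for word in code.values() for x in word})

    def arcs(pair):
        # Pairs implied by (Ri Rj) under any code symbol; (R R) is not a vertex.
        p, q = pair
        return {tuple(sorted((a, b)))
                for x in alphabet
                for a in succ[p].get(x, ())
                for b in succ[q].get(x, ())
                if a != b}

    depth = {}                       # longest path (in arcs) starting at each vertex
    def longest(v, on_path):
        if v in on_path:
            raise ValueError("loop in testing graph")
        if v not in depth:
            depth[v] = max((1 + longest(w, on_path | {v}) for w in arcs(v)), default=0)
        return depth[v]

    pairs = combinations(sorted(succ), 2)   # one vertex per row of the lower table
    try:
        return 1 + max((longest(v, frozenset()) for v in pairs), default=0)
    except ValueError:
        return None

print(sync_order({"A": "1", "B": "10", "C": "001"}))  # 5
```

Unlike the UDF graph, every pair of separation symbols is a vertex here, because the received symbols may begin anywhere inside the message.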
Example. Code K = [A: 1, B: 10, C: 001]. This code is UD and UDF-4 as shown above.
Code K with separation symbols inserted:
A: S 1 S
B: S 1 B1 0 S
C: S 0 C1 0 C2 1 S
Testing table for synchronizability:
            0          1
------------------------------
S           C1         (S B1)
B1          S          -
C1          C2         -
C2          -          S
------------------------------
(S B1)      (S C1)     -
(S C1)      (C1 C2)    -
(S C2)      -          (S B1)
(B1 C1)     (S C2)     -
(B1 C2)     -          -
(C1 C2)     -          -
------------------------------
Graph:
(B1 C1) --0--> (S C2) --1--> (S B1) --0--> (S C1) --0--> (C1 C2)
(B1 C2) is an isolated vertex.
The graph is loop-free. L = 4, hence μ = L + 1 = 5.
Four symbols are not enough (for at least one case). Consider the received symbols 0 1 0 0:
If the fifth symbol is 0 (0 1 0 0 0), the separation points are 0 | 1 0 | 0 0: the first 0 ends a code word, 10 is B, and 00 begins C.
If the fifth symbol is 1 (0 1 0 0 1), the separation point is 0 1 | 0 0 1: the final 001 is C.
Example. Code P = [A: 0, B: 11, C: 101]. This code is UDF-2.
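Code P with separation symbols inserted:
A: S 0 S
B: S 1 B1 1 S
C: S 1 C1 0 C2 1 S
Testing table for UD:
            0          1
------------------------------
S           -          (B1 C1)
(B1 C1)     -          -
------------------------------
Testing graph for UDF:
S --1--> (B1 C1)
L = 1, μ = L + 1 = 2: UDF-2.
Testing table for synchronizability:
            0          1
------------------------------------
S           S          (B1 C1)
B1          -          S
C1          C2         -
C2          -          S
------------------------------------
(S B1)      -          (S B1)(S C1)
(S C1)      (S C2)     -
(S C2)      -          (S B1)(S C1)
(B1 C1)     -          -
(B1 C2)     -          (S S)
(C1 C2)     -          -
------------------------------------
Testing graph for synchronizability:
(S B1) --1--> (S B1)
(S B1) --1--> (S C1) --0--> (S C2) --1--> (S B1)
(S C2) --1--> (S C1)
(B1 C1), (B1 C2) and (C1 C2) have no outgoing arcs.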
There are three loops. Code P is not synchronizable.
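Some of the arbitrarily long messages which do not allow a separation point to be identified are given below.
Corresponding to the loop (S B1) → (S B1):
the message . . . 1 1 1 1 1 1 . . . consists of B's (11), but the separation points may fall after the odd-numbered 1's or after the even-numbered 1's.
Corresponding to the loop (S B1) → (S C1) → (S C2) → (S B1):
the message . . . 1 0 1 1 0 1 1 0 1 . . . may be read as . . . C C C . . . (101, 101, 101), or as a 1 closing a B, followed by A B A B A (0, 11, 0, 11, 0) and a 1 opening the next code word. The two readings share no separation point.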