Introduction of Codings

Binary coding
Binary coding represents each symbol as a string of digits drawn from two possible values, “1” and “0”.
Eight such digits can be arranged in 256 different ways, so an eight-digit string can stand for any one
of 256 symbols. For example, the eight digits 01100010 represent the lowercase letter b in ASCII.
In computing and telecommunication, binary code is used in a variety of methods of encoding data,
such as a series of bytes. Binary coding can be fixed width or variable width. In a fixed-width binary
code, each letter, digit, or other character is represented by a binary string of the same length. In a
variable-width binary code, different characters may have strings of different widths.
One way in which binary coding is implemented in practice is using lasers to send binary strings.
Using a laser to send the decimal number 5, the binary string is 101: ‘1’ represents the laser being on
and ‘0’ represents the laser being off, so the laser is switched on, then off, then on. Each number is
transformed into binary and sent down an optic fibre with a laser, then converted back into the
number at the other end. Modern computers can switch a laser beam on and off extremely fast, so
large quantities of numbers can be sent very quickly.
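A minimal Python sketch of the on/off scheme described above (the `laser_pulses` helper name is made up for illustration):

```python
# Sketch: turning a decimal number into the on/off laser states described above.
def laser_pulses(n):
    """Convert a decimal number to its binary string, then describe each
    digit as a laser state: '1' = laser on, '0' = laser off."""
    bits = format(n, "b")          # e.g. 5 -> "101"
    states = ["on" if b == "1" else "off" for b in bits]
    return bits, states

bits, states = laser_pulses(5)
print(bits)    # "101"
print(states)  # ['on', 'off', 'on']
```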
Suppose 𝑙𝑖 is the binary length of character 𝑖 and 𝑃𝑖 is its probability. Then the average length
<𝑙> = βˆ‘_{𝑖=1}^{π‘˜} 𝑃𝑖 𝑙𝑖.
For example, consider four characters with the following codes and probabilities:

Letter   Code   Length 𝑙𝑖   Probability 𝑃𝑖
A        0      1           1/2
B        10     2           1/4
C        11     2           1/8
D        110    3           1/8

The average length <𝑙> = 1 · 1/2 + 2 · 1/4 + 2 · 1/8 + 3 · 1/8 = 1.625
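The calculation above can be checked with a short Python sketch:

```python
# Sketch: computing the average code length <l> = sum(P_i * l_i) for the
# example code above (A->0, B->10, C->11, D->110).
code = {"A": "0", "B": "10", "C": "11", "D": "110"}
prob = {"A": 1/2, "B": 1/4, "C": 1/8, "D": 1/8}

avg_len = sum(prob[ch] * len(code[ch]) for ch in code)
print(avg_len)  # 1.625
```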
Before coding: the length of the string of characters = 𝐿
After coding: the length after binary coding = 𝐿𝑏
If 𝑛𝑖 is the number of occurrences of character 𝑖, then 𝑃𝑖 = 𝑛𝑖/𝐿, and
𝐿𝑏 = βˆ‘_{𝑖=1}^{π‘˜} 𝑛𝑖 𝑙𝑖 = 𝐿 βˆ‘_{𝑖=1}^{π‘˜} (𝑛𝑖/𝐿) 𝑙𝑖 = 𝐿 βˆ‘_{𝑖=1}^{π‘˜} 𝑃𝑖 𝑙𝑖 = 𝐿 <𝑙>
∴ 𝐿𝑏 = 𝐿 <𝑙>
Before coding: the entropy per character = S
After coding: the entropy per binary digit = S𝑏
The total information 𝐼 is unchanged by the coding:
𝐼 = S Β· 𝐿 = S𝑏 Β· 𝐿𝑏 = S𝑏 Β· 𝐿 Β· <𝑙>
∴ S = S𝑏 <𝑙>
Since the entropy of an alphabet of π‘˜ symbols satisfies S ≤ logβ‚‚ π‘˜, the binary alphabet gives
S𝑏 ≤ logβ‚‚ 2 = 1, and therefore S ≤ <𝑙>. The length of the coding cannot be less than the entropy
according to Shannon’s theorem, i.e. <𝑙> ≥ S. The above formulas are all derived from formulae
previously mentioned in the introduction.
It is vital that the coding is uniquely decodable, i.e. invertible. Unique decodability does not by
itself allow instantaneous decoding, as the following example shows.
For example, we have:
𝑆1 → 0
𝑆2 → 01
𝑆3 → 11
The recipient receives the message 0111111..... When decoding this message, it could be read either as
(i) 0.11.11.11..... or (ii) 01.11.11.....
If the message contains an even number of 1s, as in case (i) above, the decoding is 𝑆1 . 𝑆3 . 𝑆3 . 𝑆3 .....,
and if it contains an odd number of 1s, as in case (ii) above, the decoding is 𝑆2 . 𝑆3 . 𝑆3 . 𝑆3 .....
Depending on which reading applies, two different messages are received. Therefore, one cannot begin
to decode until one has the complete sequence.
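A Python sketch of this parity-based decoding, assuming the complete message is available (the `decode` helper name is made up for illustration):

```python
# Sketch: decoding a complete message of the form 0111... with the code
# S1->0, S2->01, S3->11. The parity of the number of 1s decides between
# the two readings, so the whole message must be seen before decoding.
def decode(msg):
    assert msg[0] == "0" and set(msg[1:]) <= {"1"}
    ones = len(msg) - 1
    if ones % 2 == 0:                           # even: 0 | 11 | 11 | ...
        return ["S1"] + ["S3"] * (ones // 2)
    return ["S2"] + ["S3"] * ((ones - 1) // 2)  # odd: 01 | 11 | ...

print(decode("01111"))   # ['S1', 'S3', 'S3']  (four 1s: even)
print(decode("011111"))  # ['S2', 'S3', 'S3']  (five 1s: odd)
```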
Prefix coding
A prefix code is one in which no code word is the beginning of another code word.
Fixed-width codes, such as fixed-width binary code, are automatically prefix codes. These fixed-width
codes are the easiest and most obvious choice, but if speed and efficiency matter, variable-width
prefix codes are the ones to look into.
π‘Šπ‘– is not the beginning of π‘Šπ‘— for any 𝑖, 𝑗 i.e. π‘Šπ‘– ≠ π‘Šπ‘— π‘Š where π‘Š is some other string. However, π‘Šπ‘–
is a prefix of π‘Šπ‘— if π‘Šπ‘— = π‘Šπ‘– π‘Š.
For example, π‘Š1 = 0, π‘Š2 = 01, π‘Žπ‘›π‘‘ π‘Š3 = 11, then π‘Š1 is a prefix of π‘Š2 .
If π‘Š1 = 0, π‘Š2 = 10, π‘Žπ‘›π‘‘ π‘Š3 = 11, we can see that this is not a prefix coding.
Prefix coding and instantaneous coding are equivalent: one direction of this equivalence is obvious,
the other not so obvious.
Prefix (binary) codes can be constructed using a tree diagram as below.

[Tree diagram: starting from the empty string ∅ at “length 0”, each node branches into 0 and 1; the
rows show “length 1” (0, 1), “length 2” (00, 01, 10, 11), and “length 3” (000, 001, 010, 011, 100,
101, 110, 111). Bold lines mark the branches used by the first coding below.]
The table below summarizes a coding picked without any evaluation and the optimal coding, which has
been carefully picked. The ordinary coding was picked by starting at the bottom of the tree diagram.
0 was chosen to represent 𝑆1, which means that the whole branch below 0 can no longer be used. Going
along branch 1, 10 was chosen to represent 𝑆2, so this branch can also no longer be used. Following
the same method, 110 was chosen to represent 𝑆3 and 111 to represent 𝑆4. This leaves nothing
available to choose for 𝑆5, and therefore this coding is inefficient. The branches used are shown by
the bold lines in the tree diagram above.
[Tree diagram repeated, with dashed lines marking the branches used by the optimal coding: 00, 01,
10, 110, 111.]
The optimal code is obtained by selecting more carefully from the same tree diagram. This time,
starting from the row of “length 2”, choose 00 to represent 𝑆1, 01 to represent 𝑆2, 10 to represent
𝑆3, 110 to represent 𝑆4, and 111 to represent 𝑆5. Since a codeword has now been derived for all 5
letters, this coding works and is taken to be optimal. The branches used are shown by the dashed
lines in the tree diagram above.
Letter   Coding   Length   Optimal coding   Length
𝑆1       0        1        00               2
𝑆2       10       2        01               2
𝑆3       110      3        10               2
𝑆4       111      3        110              3
𝑆5       (none)   (none)   111              3
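One way to see why the first assignment leaves nothing for 𝑆5 is to add up the fraction of the tree each codeword occupies, a quantity known as the Kraft sum (a standard tool, not introduced in the text above): a codeword of length l occupies 2^-l of the tree. A Python sketch using the two codes from the table:

```python
# Sketch: each codeword of length l occupies a fraction 2**-l of the binary
# tree (the Kraft sum). A sum of 1.0 means the tree is completely used up.
def kraft_sum(words):
    return sum(2 ** -len(w) for w in words)

print(kraft_sum(["0", "10", "110", "111"]))         # 1.0: full after only 4 codewords
print(kraft_sum(["00", "01", "10", "110", "111"]))  # 1.0: full, but fits all 5
```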
Country calling codes
When making a telephone call to another country there is an international dialling prefix, for
example “00” or “011”, dialled before the country calling code. So if somebody wanted to call a
mobile phone in the UK from abroad, they would dial 00 44 7XXX XXXXXX (7 followed by nine digits),
i.e. 00 being the international prefix, 44 being the country code for the UK, and 7 being the first
digit of every UK mobile number once the leading 0 is dropped.
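Country calling codes behave like a prefix code: a dialled string can be split unambiguously by matching known prefixes. A Python sketch with a tiny made-up prefix table (only two international prefixes and two country codes):

```python
# Sketch: splitting a dialled string by longest-known-prefix matching.
# The prefix tables here are made up for illustration.
def split_number(dialled, intl_prefixes=("011", "00"), country_codes=("44", "1")):
    for p in intl_prefixes:
        if dialled.startswith(p):
            rest = dialled[len(p):]
            for cc in country_codes:
                if rest.startswith(cc):
                    return p, cc, rest[len(cc):]
    return None  # unknown prefix or country code

print(split_number("00447123456789"))   # ('00', '44', '7123456789')
print(split_number("01112025550100"))   # ('011', '1', '2025550100')
```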
Huffman coding
The Huffman coding method is named after D.A Huffman. He developed the coding in the 1950s.
Huffman coding is a compression that transforms characters into variable length bit strings. The
characters that occur most frequently have the shortest bit string. The characters that do not occur so
frequently have longer bit strings. This code will be explained in detail later on in the project.
Huffman coding is used to obtain prefix codes. Prefix codes are known as Huffman codes even if it
was not formed from a Huffman code. http://en.wikipedia.org/wiki/Prefix_code
Each symbol has a variable length depending on the number of times the symbol occurs.
• Each codeword has a certain number of bits.
• It is a prefix code.
• It is a variable-length code.
• Codewords are created by expanding a Huffman tree.
Huffman coding is the most illustrative for our project, even though there are many other coding
methods.
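As a preview of the method described above, a minimal Python sketch of one common way to build Huffman codes with a priority queue; the symbol frequencies are made up for illustration, and the full treatment comes later in the project:

```python
import heapq

# Sketch: building Huffman codes by repeatedly merging the two least
# frequent subtrees. Frequencies below are made up for illustration.
def huffman_codes(freqs):
    # Each heap entry: (frequency, tie-breaker, {symbol: code-so-far})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

codes = huffman_codes({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125})
# The most frequent symbol receives the shortest codeword: len(codes["a"]) == 1
```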
Arithmetic coding
Arithmetic coding is an entropy coding. This sort of coding was invented by Elias, Rissanen and
Pasco and was subsequently made practical by Witten et al. The coding does not use a discrete
number of bits to compress. However it is slow speed. Each symbol is assigned an interval. The
coding can provide compression that is close to optimal. The coding is optimal for long coding.
So far we have only been concerned with coding, not worried about if words or symbols can come
often. In the main body of the project will pay more attention to the fact that some symbols come up
more often and this will motivate the concept of entropy which will be explained in detail.