Chapter 17: Binary Codes

MAT 105 Spring 2008

A binary code is a system for encoding data
using only 0’s and 1’s

Examples
• Postnet (tall = 1, short = 0)
• UPC (dark = 1, light = 0)
• Morse code (dash = 1, dot = 0)
• Braille (raised bump = 1, flat surface = 0)
• Movie ratings (thumbs up = 1, thumbs down = 0)

CD, MP3, and DVD players, digital TV, cell
phones, the Internet, space probes, etc. all
represent data as strings of 0’s and 1’s rather
than digits 0-9 and letters A-Z

Whenever information needs to be digitally
transmitted from one location to another, a
binary code is used

What are some problems that can occur when
data is transmitted from one place to
another?

The two main problems are
• transmission errors: the message sent is not the
  same as the message received
• security: someone other than the intended
  recipient receives the message

Suppose you were looking at a newspaper ad for a job,
and you see the sentence “must have bive years
experience”

We detect the error since we know that “bive” is not a
word

Can we correct the error?

Why is “five” a more likely correction than “three”?

Why is “five” a more likely correction than “nine”?

Suppose NASA is directing one of the Mars rovers by
telling it which crater to investigate

There are 16 possible signals that NASA could send, and
each signal represents a different command

NASA uses a 4-digit binary code to represent this
information
0000  0100  1000  1100
0001  0010  0101  0110
1001  1010  1101  1110
0011  0111  1011  1111

The problem with this method is that if there
is a single digit error, there is no way that the
rover could detect or correct the error

If the message sent was “0100” but the rover
receives “1100”, the rover will never know a
mistake has occurred
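
A minimal Python sketch of why the rover cannot notice the mistake, assuming messages are stored as strings of "0" and "1". The helper name flip is illustrative and not part of the slides. Because all 16 four-digit strings are valid commands, flipping any one digit always produces another valid command.

```python
from itertools import product

# All 16 four-digit binary strings are valid commands.
valid = {"".join(bits) for bits in product("01", repeat=4)}

def flip(message, position):
    """Return message with the digit at the 0-based position flipped."""
    flipped = "1" if message[position] == "0" else "0"
    return message[:position] + flipped + message[position + 1:]

sent = "0100"
received = flip(sent, 0)   # noise changes the first digit
print(received)            # 1100
print(received in valid)   # True: the rover cannot tell anything went wrong
```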

This kind of error – called “noise” – occurs all
the time

One way to try to avoid these errors is to send
the same message twice

This would allow the rover to detect the error,
but not correct it (since it has no way of knowing
if the error occurs in the first copy of the
message or the second)
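
The "send it twice" idea can be sketched in Python as follows. The function names send_twice and check_twice are hypothetical, chosen only for this illustration. The check reports that the two copies disagree, but it gives no hint about which copy is the correct one.

```python
def send_twice(message):
    """Encode a message by repeating it."""
    return message + message

def check_twice(received):
    """Return True if the two halves agree (no error detected)."""
    half = len(received) // 2
    return received[:half] == received[half:]

codeword = send_twice("0100")   # "01000100"
garbled = "1" + codeword[1:]    # first digit flipped: "11000100"
print(check_twice(codeword))    # True
print(check_twice(garbled))     # False: detected, but was it 1100 or 0100?
```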

There is a better way to allow the rover to detect
and correct these errors, and it requires only 3
additional digits

The original message is four digits long

We will call these digits I, II, III, and IV

We will add three new digits, V, VI, and VII

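Draw three intersecting circles
[Figure: three intersecting circles. The first circle
contains digits I, II, III, and V; the second contains
I, III, IV, and VI; the third contains II, III, IV,
and VII.]

Digits V, VI, and VII should be chosen so that each
circle contains an even number of ones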
The message we want to send is “0100”

Digit V should be 1 so that the first circle has two
ones

Digit VI should be 0 so that the second circle has
zero ones (zero is even!)

Digit VII should be 1 so that the last circle has
two ones

Our message is now 0100101
[Figure: the three circles filled in with the digits
of 0100101]
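
The encoding step can be sketched in Python. This is an illustration, not a definitive implementation: the function names parity and encode are made up for the sketch, and the assignment of digits to circles (I, II, III, V in the first; I, III, IV, VI in the second; II, III, IV, VII in the third) is read off from this worked example and the check digit rules used in this chapter.

```python
def parity(*digits):
    """Return 1 if the digits sum to an odd number, 0 if even."""
    return sum(digits) % 2

def encode(message):
    """Append check digits V, VI, VII to a 4-digit message string."""
    i, ii, iii, iv = (int(d) for d in message)
    v = parity(i, ii, iii)      # first circle: I, II, III, V
    vi = parity(i, iii, iv)     # second circle: I, III, IV, VI
    vii = parity(ii, iii, iv)   # third circle: II, III, IV, VII
    return message + f"{v}{vi}{vii}"

print(encode("0100"))  # 0100101
```

Choosing each check digit as the parity of the other three digits in its circle is what forces every circle to contain an even number of ones.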

Now watch what happens when there is a single digit error

We transmit the message 0100101 and the rover receives
0101101

The rover can tell that the second and third circles have
odd numbers of ones, but the first circle is correct

So the error must be in the digit that is
in the second and third circles, but not
the first: that’s digit IV

Since we know digit IV is wrong, there is
only one way to fix it: change it from 1 to 0,
which recovers the original message 0100101
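
A hedged Python sketch of the correction step, assuming at most one digit was changed in transit. The names CIRCLES and correct are illustrative. Positions are 0-based in the order I through VII, and the circle memberships are the ones used in the example above.

```python
# Which digit positions (0-based, in the order I..VII) lie in each circle.
CIRCLES = [
    (0, 1, 2, 4),  # first circle: I, II, III, V
    (0, 2, 3, 5),  # second circle: I, III, IV, VI
    (1, 2, 3, 6),  # third circle: II, III, IV, VII
]

def correct(received):
    """Fix at most one flipped digit in a 7-digit received string."""
    digits = [int(d) for d in received]
    # Indices of the circles whose number of ones is odd.
    odd = {n for n, circle in enumerate(CIRCLES)
           if sum(digits[p] for p in circle) % 2 == 1}
    if not odd:
        return received  # every circle is even: nothing to fix
    # The wrong digit is the one lying in exactly the odd circles.
    for position in range(7):
        inside = {n for n, circle in enumerate(CIRCLES) if position in circle}
        if inside == odd:
            digits[position] ^= 1
            break
    return "".join(str(d) for d in digits)

print(correct("0101101"))  # 0100101
```

This works because each of the seven digits lies in a different combination of circles, so the set of odd circles points to exactly one digit.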

Encode the message 1110 using this method

You have received the message 0011101.
Find and correct the error in this message.

This method only allows us to encode 16
possible messages, which isn’t even enough
to represent the alphabet!

However, if we use more digits, we won’t be
able to use the circle method to detect and
correct errors

We’ll have to come up with a different
method that allows for more digits

The circle method is a specific example of a
“parity check sum”

The “parity” of a number is 1 if the number is
odd and 0 if the number is even

For example, digit V is 0 if I + II + III is even,
and 1 if I + II + III is odd

Instead of using Roman numerals, we’ll use a1
to represent the first digit of the message, a2
to represent the second digit, and so on

We’ll use c1 to represent the first check digit,
c2 to represent the second, etc.

Using this notation, our rules for our check
digits become
• c1 = 0 if a1 + a2 + a3 is even
• c1 = 1 if a1 + a2 + a3 is odd
• c2 = 0 if a1 + a3 + a4 is even
• c2 = 1 if a1 + a3 + a4 is odd
• c3 = 0 if a2 + a3 + a4 is even
• c3 = 1 if a2 + a3 + a4 is odd
[Figure: the three circles labeled with a1, a2, a3, a4
and c1, c2, c3]

If we want to have a system that has enough
code words for the entire alphabet, we need
to have 5 message digits: a1, a2, a3, a4, a5

We will also need more check digits to help us
decode our message: c1, c2, c3, c4

We can’t use the circles to determine the
check digits for our new system, so we use the
parity notation from before
• c1 is the parity of a1 + a2 + a3 + a4
• c2 is the parity of a2 + a3 + a4 + a5
• c3 is the parity of a1 + a2 + a4 + a5
• c4 is the parity of a1 + a2 + a3 + a5

Using 5 digits in our message gives us 32
possible messages; we’ll use the first 26 to
represent letters of the alphabet
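
The 9-digit code can be generated from the four parity rules. The sketch below is illustrative: the name encode5 is hypothetical, and it assumes that "the first 26" messages means the letters are numbered A = 0, B = 1, and so on, with each number written as a 5-digit binary string.

```python
from string import ascii_uppercase

def encode5(message):
    """Append check digits c1..c4 to a 5-digit message string."""
    a1, a2, a3, a4, a5 = (int(d) for d in message)
    c1 = (a1 + a2 + a3 + a4) % 2
    c2 = (a2 + a3 + a4 + a5) % 2
    c3 = (a1 + a2 + a4 + a5) % 2
    c4 = (a1 + a2 + a3 + a5) % 2
    return message + f"{c1}{c2}{c3}{c4}"

# Assumption: letters are numbered A = 0, B = 1, ... and written in binary.
code = {letter: encode5(format(n, "05b"))
        for n, letter in enumerate(ascii_uppercase)}

print(code["B"])  # 000010111
print(code["K"])  # 010100001
```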

On the next slide you’ll see the code itself:
each letter together with the 9-digit code
representing it
Letter  Code       Letter  Code
A       000000000  N       011010101
B       000010111  O       011101100
C       000101110  P       011111011
D       000111001  Q       100001011
E       001001101  R       100011100
F       001011010  S       100100101
G       001100011  T       100110010
H       001110100  U       101000110
I       010001111  V       101010001
J       010011000  W       101101000
K       010100001  X       101111111
L       010110110  Y       110000100
M       011000010  Z       110010011

Now that we have our code, using it is simple

When we receive a message, we simply look it up
on the table

But what happens when the message we receive
isn’t on the list?

Then we know an error has occurred, but how do
we fix it? We can’t use the circle method
anymore

Using this new system, how do we decode
messages?

Simply compare the (incorrect) message with the
list of possible correct messages and pick the
“closest” one

What should “closest” mean?

The distance between the two messages is the
number of digits in which they differ

What is the distance between 1100101 and
1010101?
• The messages differ in the 2nd and 3rd digits, so the
  distance is 2

What is the distance between 1110010 and
0001100?
• The messages differ in all but the 7th digit, so the
  distance is 6
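
Counting the differing digits is easy to sketch in Python. The function name distance is chosen for this illustration, and the length check is an added assumption, since the slides only compare messages of equal length.

```python
def distance(x, y):
    """Count the positions in which two equal-length strings differ."""
    if len(x) != len(y):
        raise ValueError("messages must have the same length")
    return sum(1 for a, b in zip(x, y) if a != b)

print(distance("1100101", "1010101"))  # 2
print(distance("1110010", "0001100"))  # 6
```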

The nearest neighbor decoding method
decodes a received message as the code word
that agrees with the message in the most
positions

Suppose that, using our alphabet code, we
receive the message 010100011

We can check and see that this message is not
on our list

How far away is it from the messages on our
list?
Code       Distance  Code       Distance
000000000  4         011010101  5
000010111  4         011101100  5
000101110  4         011111011  3
000111001  4         100001011  4
001001101  6         100011100  8
001011010  6         100100101  4
001100011  2         100110010  4
001110100  6         101000110  6
010001111  3         101010001  6
010011000  5         101101000  6
010100001  1         101111111  6
010110110  3         110000100  5
011000010  3         110010011  3

Since 010100001 was closest to the message that
we received, we know that this is the most likely
actual transmission

We can look this corrected message up in our
table and see that the transmitted message was
(probably) “K”
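
Nearest neighbor decoding of this example can be sketched in Python. The code words are copied from the alphabet table. The names CODES, TABLE, distance, and nearest_neighbor are illustrative only.

```python
# The 26 code words from the alphabet table, in order A..Z.
CODES = (
    "000000000 000010111 000101110 000111001 001001101 001011010 "
    "001100011 001110100 010001111 010011000 010100001 010110110 "
    "011000010 011010101 011101100 011111011 100001011 100011100 "
    "100100101 100110010 101000110 101010001 101101000 101111111 "
    "110000100 110010011"
).split()
TABLE = dict(zip(CODES, "ABCDEFGHIJKLMNOPQRSTUVWXYZ"))

def distance(x, y):
    """Count the positions in which two strings differ."""
    return sum(a != b for a, b in zip(x, y))

def nearest_neighbor(received):
    """Return (letter, code word, distance) for the closest code word."""
    best = min(TABLE, key=lambda word: distance(received, word))
    return TABLE[best], best, distance(received, best)

print(nearest_neighbor("010100011"))  # ('K', '010100001', 1)
```

If two code words are equally close, min keeps whichever comes first in table order. The slides do not say how ties should be broken, so that behavior is an assumption of this sketch.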

This might still be incorrect, but other errors can
be corrected using context clues or check digits