
Telecommunications Industry Association
TR-30.1/99-10-036
(TIA)
Columbia, MD Oct 13, 1999
COMMITTEE CONTRIBUTION
Technical Committee TR-30 Meetings
SOURCE:
Hughes Network Systems
CONTACT:
Jeff Heath
Hughes Network Systems
10450 Pacific Center Court
San Diego, CA 92121
Phone:
(619) 452-4826
Fax:
(619) 597-8979
E-mail:
jheath@hns.com
TITLE:
Upgrading V.42bis to LZJH
PROJECT:
PN-xxxx
DISTRIBUTION:
Members of TR-30 and TR-30.1 and meeting attendees
ABSTRACT
This paper describes the implementation changes necessary to upgrade a soft modem from V.42bis to a
similar ITU Recommendation based upon the LZJH data compression algorithm. The paper describes
the changes at a fairly low level and is intended to give the reader an appreciation of the relative
ease of such an upgrade.
Copyright Statement
The contributor grants a free, irrevocable license to the Telecommunications Industry Association (TIA) to
incorporate text contained in this contribution and any modifications thereof in the creation of a TIA
standards publication; to copyright in TIA's name any TIA standards publication even though it may
include portions of this contribution; and at TIA's sole discretion to permit others to reproduce in whole or
in part the resulting TIA standards publication.
Intellectual Property Statement
The individual preparing this contribution knows of patents, the use of which may be essential to a
standard resulting in whole or in part from this contribution.
1.
Algorithm Differences
1. Dictionary Structure
2. Code Words vs. Single Characters
3. String Extension
4. Dictionary Entry Reuse
5. Dictionary Re-initialization
6. Transparent Mode
2.
Dictionary Structure
A 4096 entry dictionary has 4096 unique code words (0 - 4095) available to represent strings of
characters. Both algorithms reserve code words 0, 1, and 2 for control purposes.
2.1.
V.42bis
The encoder and decoder dictionaries are identical. Code words 3 through 258 are reserved for the 256
combinations of an 8 bit byte and each has a DOWN index from which to create strings.
Each of the remaining 3837 dictionary entries (code words 259 through 4095) has the following:
• DOWN index - index to the dictionary entry having the next character in a string.
• RIGHT index - index to the next dictionary entry having the same previous string as this entry.
• UP index - index to the dictionary entry having the previous character in a string.
• CHAR - the character represented by this dictionary entry.
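As a rough illustration of how the four fields form a tree, the following Python sketch models the dictionary as a flat list of entries linked by index. It is not taken from the Recommendation: the names V42Node, find_child and add_child are hypothetical, and the mapping of byte value b to code word 3 + b is an assumption based on code words 3 through 258 being reserved for the 256 byte values.

```python
from dataclasses import dataclass
from typing import List, Optional

DICT_SIZE = 4096
FIRST_NODE = 259          # first entry available for strings


@dataclass
class V42Node:
    down: Optional[int] = None    # first child (next character in a string)
    right: Optional[int] = None   # next sibling (same previous string)
    up: Optional[int] = None      # parent (previous character in a string)
    char: Optional[int] = None    # character represented by this entry


def make_dictionary() -> List[V42Node]:
    return [V42Node() for _ in range(DICT_SIZE)]


def find_child(d: List[V42Node], parent: int, ch: int) -> Optional[int]:
    """Follow DOWN once, then RIGHT links, looking for an entry whose CHAR is ch."""
    idx = d[parent].down
    while idx is not None:
        if d[idx].char == ch:
            return idx
        idx = d[idx].right
    return None


def add_child(d: List[V42Node], parent: int, ch: int, new_index: int) -> None:
    """Chain a one character extension onto parent, at the head of its child list."""
    d[new_index] = V42Node(down=None, right=d[parent].down, up=parent, char=ch)
    d[parent].down = new_index


d = make_dictionary()
code_a = 3 + ord("A")                       # assumed code word for the byte 'A'
add_child(d, code_a, ord("B"), FIRST_NODE)  # entry 259 represents "AB"
```

Looking up a longer string is then a matter of calling find_child once per input character, starting from the reserved entry of the first character.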
2.2.
LZJH
Unlike V.42bis, the encoder dictionary is different from the decoder dictionary.
2.2.1. Encoder Dictionary
The encoder dictionary is similar to the V.42bis encoder dictionary but has 3 parts as follows:
Root Dictionary - 256 DOWN indexes for the 256 possible combinations of an 8 bit byte.
Node Dictionary - Each of the 4093 entries (code words 3 through 4095) has the following:
• DOWN index - index to the dictionary entry having the next character(s) in a string.
• RIGHT index - index to the next dictionary entry having the same previous string as this entry.
• LOCATION index - index into the Character Dictionary where the first character of the one or more
characters (string extension characters) defined by this entry is located.
• COUNT - number of characters defined by this entry (the first of which is at LOCATION).
Character Dictionary - an array of variable length that contains all input characters received since the
dictionaries were initialized. For a 4096 dictionary this should be about 25,000 bytes. The LOCATION
index in the Node Dictionary points to character(s) within this dictionary.
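The three parts can be pictured with the following minimal Python sketch. The class and method names (EncNode, LzjhEncoderDict, extension_chars) are hypothetical, and the 25,000 byte capacity is simply the figure quoted above; nothing here is taken from a draft Recommendation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EncNode:
    down: Optional[int] = None   # entry holding the next character(s)
    right: Optional[int] = None  # next entry with the same previous string
    location: int = 0            # index of this entry's first character
    count: int = 0               # number of characters defined by this entry


class LzjhEncoderDict:
    def __init__(self, size: int = 4096, char_capacity: int = 25000):
        self.root: List[Optional[int]] = [None] * 256     # Root Dictionary
        self.nodes = [EncNode() for _ in range(size)]     # 0..2 unused (control)
        self.chars = bytearray()                          # Character Dictionary
        self.char_capacity = char_capacity
        self.next_code = 3

    def extension_chars(self, code: int) -> bytes:
        """Characters defined by one Node Dictionary entry."""
        n = self.nodes[code]
        return bytes(self.chars[n.location:n.location + n.count])


enc = LzjhEncoderDict()
enc.chars += b"ABCDEFGH"
enc.nodes[3] = EncNode(location=1, count=1)    # 'B' chained to root 'A'
enc.nodes[10] = EncNode(location=2, count=6)   # six character extension
```

Because an entry stores only LOCATION and COUNT, a single entry can stand for many characters without copying them out of the Character Dictionary.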
2.2.2. Decoder Dictionary
Node Dictionary - Each of the 4093 entries (code words 3 through 4095) has the following:
• LOCATION index - index into the Character Dictionary where the last character of the string defined
by this entry is located.
• DEPTH - the total number of characters in the string defined by this entry.
2/16/2016
2
Character Dictionary - an array, the same size as the encoder Character Dictionary, that contains all
input characters received since the dictionaries were initialized. The LOCATION index in the Node
Dictionary points to characters within this dictionary.
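Given LOCATION (the last character) and DEPTH (the total length), recovering a string is a single slice of the Character Dictionary. A minimal Python sketch, with the hypothetical helper name decode_string:

```python
def decode_string(chars: bytes, location: int, depth: int) -> bytes:
    """Return the depth characters that end at index location (inclusive)."""
    start = location - depth + 1
    return bytes(chars[start:location + 1])


history = b"ABCDEFGH"                    # characters received so far
two_chars = decode_string(history, 1, 2)
whole = decode_string(history, 7, 8)
```

Here two_chars is the string ending at index 1 with depth 2, and whole is the full 8 character string ending at index 7.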
3.
Code Words vs. Single Characters
V.42bis always sends code words. With a 4096 dictionary the code word size, after reaching the
maximum, remains at 12 bits until the dictionary is re-initialized. Thus a single character requires
12 bits on output even though only 8 of them are significant.
LZJH sends code words, single characters, and string extension tokens. A bit is output prior to a code
word or single character indicating which type follows. Thus code words require 13 bits (1 bit plus 12 bit
code word) and single characters 9 bits (1 bit plus 8 bit character). The disadvantage of sending an extra
bit for code words is offset by the advantage of sending 9 bits for single characters. An additional
advantage is, since the decoder can distinguish between code words and single characters, 256 entries of
the Node Dictionary are not reserved for the 256 combinations of an 8 bit character (hence the Root
Dictionary).
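The trade-off can be checked with a little arithmetic. The Python sketch below assumes the 12 bit maximum code word size and ignores string extension tokens; the function names are hypothetical.

```python
def v42bis_bits(codewords: int, singles: int, codeword_bits: int = 12) -> int:
    # Every output, including a single character, is a full code word.
    return (codewords + singles) * codeword_bits


def lzjh_bits(codewords: int, singles: int, codeword_bits: int = 12) -> int:
    # One flag bit precedes each code word and each 8 bit character.
    return codewords * (1 + codeword_bits) + singles * (1 + 8)


# 13c + 9s < 12c + 12s reduces to c < 3s, so the flag bit pays for itself
# whenever fewer than 3 code words are sent per single character.
break_even = (v42bis_bits(3, 1), lzjh_bits(3, 1))
all_singles = (v42bis_bits(0, 8), lzjh_bits(0, 8))
```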
4.
String Extension
4.1.
V.42bis
Each time a repeating string of characters is encountered the encoder (and decoder) adds the next input
character to the end of the string (creating a dictionary entry). Thus a 10 character string has to repeat 9
times before the encoder has created an entry (code word) representing the entire string. The first time
the encoder chained the 2nd character to the 1st to create a 2 character string. Each dictionary entry
defines a one character extension to the previous string.
4.2.
LZJH
LZJH does the same as V.42bis the first time a repeating string of characters is encountered (i.e. it chains
the 2nd character to the 1st to create a 2 character string). In fact both algorithms always add the next
character to process to a single character which is not part of a string. However, the second time a
repeating string of 3 or more characters is encountered, the LZJH encoder immediately extends the
string to its maximum length (up to the maximum string parameter). Using the LOCATION index in the
dictionary entry for the 2nd character of the string, it compares the bytes following the first instance of the
string to the next bytes to process following the second instance of the string (the first 2 characters of
which have already been encoded). If the encoder finds that 1 or more characters match in both instances of
the string, i.e. it is a 3 or more character string, it sends a string extension signal to the decoder to
extend the string by the number of characters that match (subject to the maximum string parameter).
Node Dictionary entries define one or many characters. The LOCATION index into the Character
Dictionary and COUNT define the sequence of string extension characters defined by each entry in the
Node Dictionary.
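The comparison step can be sketched in Python as follows. This is only an illustration: the function name, the parameter names and the way the maximum string parameter is applied (255 characters in total, 2 of which are already matched) are assumptions.

```python
def extension_length(chars: bytes, first_pos: int, pending: bytes,
                     max_string: int = 255, already_matched: int = 2) -> int:
    """Count how many bytes after the first instance match the pending input.

    chars     -- Character Dictionary contents
    first_pos -- index of the byte following the first instance of the string
    pending   -- next bytes to process after the second instance
    """
    limit = min(len(pending), len(chars) - first_pos,
                max_string - already_matched)
    n = 0
    while n < limit and chars[first_pos + n] == pending[n]:
        n += 1
    return n


history = b"ABCDEFGH"
full_match = extension_length(history, 2, b"CDEFGH")     # 'AB' already encoded
partial_match = extension_length(history, 2, b"CDEXYZ")
```

A non-zero result would be sent as the string extension and recorded as the COUNT of a new Node Dictionary entry.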
5.
Dictionary Entry Reuse
V.42bis reuses dictionary entries when the dictionary becomes full. For a 4096 dictionary, 3837 entries of
which are available for string extensions, the dictionary becomes full after about 3900 bytes have been
processed. From that moment until the dictionary is RESET (which happens rarely in most V.42bis
implementations) dictionary entries are reused. Starting at the first available entry (number 259), the
encoder and decoder take the first entry they can find that does not have a DOWN index (i.e. it is the last
character in a chained string) and delete it from the chain. The entry is then reused on the next input
character to create a string extension.
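A minimal Python sketch of the search and unlink step follows. It scans from entry 259 as described above, uses parallel lists for the DOWN, RIGHT and UP indexes, and treats an entry with no UP index as unused; the function name and these representation choices are assumptions.

```python
def detach_leaf(down, right, up, start=259, size=4096):
    """Find the first in-use entry with no DOWN index and unlink it."""
    for idx in range(start, size):
        if up[idx] is None or down[idx] is not None:
            continue                      # unused, or not the end of a string
        parent = up[idx]
        if down[parent] == idx:           # first child: parent skips to sibling
            down[parent] = right[idx]
        else:                             # otherwise unlink from the RIGHT chain
            sib = down[parent]
            while right[sib] != idx:
                sib = right[sib]
            right[sib] = right[idx]
        up[idx] = right[idx] = None
        return idx                        # entry is now free for reuse
    return None


size = 4096
down, right, up = [None] * size, [None] * size, [None] * size
root_a = 3 + ord("A")                     # assumed reserved entry for 'A'
down[root_a], up[259] = 259, root_a       # 259 = "AB"
down[259], up[260] = 260, 259             # 260 = "ABC"
first_freed = detach_leaf(down, right, up)
second_freed = detach_leaf(down, right, up)
```

Entry 260 is freed first because 259 still has a DOWN index; once 260 is gone, 259 becomes a leaf and is freed next.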
LZJH does not reuse dictionary entries. When either the Node Dictionary or Character Dictionary
becomes full, the dictionaries are immediately re-initialized. Since LZJH can reference many characters
in a dictionary entry, it can process many more characters than V.42bis before the dictionary fills, hence
the 25K Character Dictionary for a 4096 dictionary. With 4 to 1 compression, LZJH will process over 20K
characters before the 4092 available entries are all used. The Character Dictionary should be large
enough that the Node Dictionary fills first, so that the compression ratio is not restricted.
6.
Dictionary Initialization (Re-initialization)
6.1.
V.42bis
The V.42bis dictionaries are initialized (with a 4096 dictionary) by:
• putting a null index into the DOWN index of the 256 reserved entries and 3837 node entries of the
dictionary. Alternatively, just set all 4096 DOWN indices to illegal values.
• putting a null index into the UP index of the 3837 node entries of the dictionary.
• setting the "next dictionary entry to be used" variable to its initial value (i.e. 259).
• setting the code word size to its initial value (i.e. 9).
6.2.
LZJH
The LZJH dictionaries are initialized (with a 4096 dictionary) by:
• putting a null index into the DOWN index of the 256 entries in the encoder's Root Dictionary.
• setting the "next dictionary entry to be used" variable to its initial value (i.e. 3).
• setting the code word size to its initial value (i.e. 6).
• setting the next location index into the Character Dictionary to its initial value (i.e. 0).
• NOTE that neither the encoder's nor the decoder's Node Dictionary indices need to be initialized.
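The difference in initialization work can be seen by writing both procedures side by side. The Python sketch below uses hypothetical dictionary keys and, for V.42bis, takes the simpler alternative of clearing all 4096 slots; slots_written is an illustrative helper, not part of either algorithm.

```python
def init_v42bis(size: int = 4096) -> dict:
    return {
        "down": [None] * size,      # null DOWN index for every entry
        "up": [None] * size,        # null UP index (only nodes strictly need it)
        "next_entry": 259,
        "code_bits": 9,
    }


def init_lzjh() -> dict:
    return {
        "root_down": [None] * 256,  # only the encoder's Root Dictionary
        "next_entry": 3,
        "code_bits": 6,
        "next_location": 0,
    }


def slots_written(state: dict) -> int:
    """Rough count of values stored during initialization."""
    return sum(len(v) if isinstance(v, list) else 1 for v in state.values())


v42bis_cost = slots_written(init_v42bis())
lzjh_cost = slots_written(init_lzjh())
```

Presumably the Node Dictionary can be left untouched because an entry's fields are all written when it is allocated, and no entry at or beyond the "next entry" value is reachable.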
7.
Transparent Mode
7.1.
V.42bis
V.42bis transparent mode has 3 command characters, one of which is an ESC character that is modified
each time it appears in the input data. The ESC character precedes the other two command characters
(RESET, ECM) and itself, which makes moderate expansion of the data possible while in transparent
mode. The RESET is used to re-initialize the dictionaries and the ECM to go back to compression mode.
While in transparent mode the encoder dictionary is filled with data that does not compress, and the
decoder must actually encode the transparent data to keep its dictionary current with that of the encoder.
7.2.
LZJH
Although LZJH could have a transparent mode with command characters, including an ESC character, the
proposed implementation is much simpler than that of V.42bis. Every N input characters (N is currently
200) it processes and compresses, LZJH checks to see if compression was successful. If so, it stays in
compression mode and sends the compressed output. If not, it places an ETM (enter transparent mode) code into the output (same as
V.42bis) followed by the N transparent characters and re-initializes its dictionaries. The ETM indicates to
the decoder to go into transparent mode, process exactly N characters transparently, then re-initialize the
dictionary and return to compression mode. The encoder, with dictionary re-initialized, processes N more
input characters sending the compressed output if they compress or an ETM followed by the N
transparent characters if they don’t compress. The decoder, in compression mode, either sees
compressed data or an ETM indicating a switch to transparent mode for N characters.
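The per-block decision can be sketched as below. The compressor is a stand-in: ToyRle is a trivial run-length coder used only so the example runs, not LZJH. Also assumed are the success test (compressed output shorter than the input), the compress/reset interface, and the handling of a final block shorter than N, none of which are specified in this paper.

```python
class ToyRle:
    """Stand-in compressor (run-length coding), NOT the LZJH algorithm."""

    def __init__(self):
        self.resets = 0

    def reset(self):
        self.resets += 1          # a real encoder would re-initialize here

    def compress(self, block: bytes) -> bytes:
        out, i = bytearray(), 0
        while i < len(block):
            j = i
            while j < len(block) and j - i < 255 and block[j] == block[i]:
                j += 1
            out += bytes([j - i, block[i]])   # (run length, byte value)
            i = j
        return bytes(out)


def encode_blocks(data: bytes, compressor, n: int = 200):
    """Emit ('COMPRESSED', bits) or ('ETM', raw block) for each n characters."""
    frames = []
    for i in range(0, len(data), n):
        block = data[i:i + n]
        packed = compressor.compress(block)
        if len(packed) < len(block):
            frames.append(("COMPRESSED", packed))
        else:
            frames.append(("ETM", block))     # send the block transparently
            compressor.reset()                # then re-initialize dictionaries
    return frames


rle = ToyRle()
frames = encode_blocks(b"A" * 200 + bytes(range(200)), rle)
```

The first block compresses and is sent as is; the second does not, so it is sent behind an ETM and the dictionaries are reset. The decoder needs no other signalling because N is known to both ends.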
8.
Examples
The following pages describe the difference between the V.42bis and LZJH encoders and how the
dictionary is used. A 4096 dictionary that has already stepped up to the maximum 12 bit code
word is assumed. The examples show how each processes an 8 character string ‘ABCDEFGH’ which is repeated 8
times in the input data. For simplicity, intervening characters between each instance of the example
string are ignored. This illustrates how the V.42bis encoder builds its dictionary one character at a time
and how it must see 7 instances of an 8 character string before it has built a code word that represents the
entire string, such that it can encode the string with one code word on the 8th instance. In contrast, LZJH
builds a code word to represent the entire string the 2nd time it is seen and encodes the string with a single
code word on the 3rd instance. Not only does LZJH build only 9 dictionary entries (compared to 18 for
V.42bis), but it uses only 169 bits to encode the 8 instances of the string (compared to 312 bits), which is
about a 46% savings (put the other way, V.42bis uses about 85% more bits).
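The totals and the two ways of expressing the comparison can be checked directly from the per-instance bit counts shown in the examples. A short Python check, with hypothetical variable names:

```python
# Bits per instance, as listed in the example figures.
v42bis = [8 * 12, 4 * 12, 4 * 12, 3 * 12, 2 * 12, 2 * 12, 2 * 12, 1 * 12]
lzjh = [8 * 9, 13 + 6, 1 * 13] + [13] * 5

v42bis_total = sum(v42bis)
lzjh_total = sum(lzjh)

saving = 1 - lzjh_total / v42bis_total    # fraction of bits LZJH avoids
extra = v42bis_total / lzjh_total - 1     # fraction more bits V.42bis sends

print(f"V.42bis {v42bis_total} bits, LZJH {lzjh_total} bits")
print(f"LZJH saving {saving:.1%}, V.42bis overhead {extra:.1%}")
```

The 85% figure corresponds to the ratio 312/169, while the reduction relative to V.42bis is just under 46%.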
Note that the savings are greater as the length of the string increases. LZJH can encode any sized string
(up to the string length maximum of 255) the 2nd time it sees it and send a single code word on the 3rd
instance.
The V.42bis decoder builds exactly the same dictionary as the encoder. Thus, on the 8th instance of the
string, when the decoder receives code word 276, it traverses the tree backwards through the UP indexes
until it gets to an entry that is a single character (i.e. less than 259). It then must copy the characters into
the output in reverse order to re-create the string. In contrast, the LZJH decoder dictionary just
maintains an index into the Character Dictionary to the last character of a string and its total length.
When it receives code word a (hexadecimal for 10), it just uses the index and the total length to get to the beginning of a
previous instance of the string within the Character Dictionary and does a copy to the output.
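The backwards traversal on the V.42bis side can be sketched as follows. The list-based representation, the function name and the mapping of reserved code word c to byte c - 3 are assumptions made for illustration.

```python
def v42bis_expand(code: int, up, char, first_node: int = 259) -> bytes:
    """Walk UP indexes to a reserved entry, collecting characters last-first."""
    out = bytearray()
    while code >= first_node:
        out.append(char[code])
        code = up[code]
    out.append(code - 3)      # assumed: code words 3..258 are byte value + 3
    out.reverse()             # characters were gathered in reverse order
    return bytes(out)


size = 4096
up, char = [None] * size, [None] * size
up[259], char[259] = 3 + ord("A"), ord("B")   # 259 = "AB"
up[266], char[266] = 259, ord("C")            # 266 = "ABC"
expanded = v42bis_expand(266, up, char)
```

Each character costs one index dereference plus a reversal, whereas the LZJH decoder replaces the whole loop with one block copy from the Character Dictionary.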
Legend:
• Shown is the input string, each of the 8 instances in the case of V.42bis.
• Also shown is the output of the encoder and the number of bits.
• New nodes built during each instance of the string are gray shaded.
• V.42bis nodes have the character itself within the dictionary entry.
• LZJH encoder nodes have an index to the character(s) and the number of characters referenced. For the 2nd
instance, when the LZJH encoder found the string ‘AB’ and output code word 3, it compared the
characters in the Character Dictionary following the 1st instance of ‘AB’ with the next characters to
process which followed the 2nd instance of ‘AB’. Since 6 characters in both instances matched, it
sent a signal to the decoder to extend the string referenced by code word 3 by 6 additional characters
and built a dictionary entry (a) for 6 characters.
• LZJH decoder nodes have an index to the last character and the total number of characters in the string.
V.42bis Encoder and Decoder
instance  input     output             bits
1         ABCDEFGH  A B C D E F G H    8 X 12 = 96
2         ABCDEFGH  259 261 263 265    4 X 12 = 48
3         ABCDEFGH  266 262 264 H      4 X 12 = 48
4         ABCDEFGH  269 268 H          3 X 12 = 36
5         ABCDEFGH  272 271            2 X 12 = 24
6         ABCDEFGH  274 265            2 X 12 = 24
7         ABCDEFGH  275 H              2 X 12 = 24
8         ABCDEFGH  276                1 X 12 = 12
                                       total = 312
[Figure: dictionary tree after each instance, showing the reserved entries and node entries 259 through 276 with their characters.]
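The V.42bis outputs above can be reproduced with a simplified LZW-style simulation. The Python sketch below is an approximation under stated assumptions: the dictionary is a plain mapping from strings to code words rather than the DOWN/RIGHT/UP tree, single characters are printed as themselves rather than as code words 3 through 258, and each instance is encoded separately to model the ignored intervening characters.

```python
FIRST_NODE = 259
CODE_BITS = 12


def encode_instance(text: str, dictionary: dict) -> list:
    """Encode one instance, adding a one character extension after each match."""
    out, i = [], 0
    while i < len(text):
        s = text[i]
        i += 1
        # Extend the match while the longer string is already in the dictionary.
        while i < len(text) and s + text[i] in dictionary:
            s += text[i]
            i += 1
        out.append(dictionary[s] if len(s) > 1 else s)
        if i < len(text):
            # New entry: matched string plus the next input character.
            dictionary[s + text[i]] = FIRST_NODE + len(dictionary)
    return out


dictionary = {}
outputs = [encode_instance("ABCDEFGH", dictionary) for _ in range(8)]
total_bits = sum(len(o) for o in outputs) * CODE_BITS

for n, o in enumerate(outputs, start=1):
    print(n, *o, f"({len(o)} X {CODE_BITS} = {len(o) * CODE_BITS})")
print("total =", total_bits, "entries =", len(dictionary))
```

The simulation builds 18 entries (259 through 276) and emits 26 code words in total, matching the figure.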
LZJH Encoder
instance  input     output                                 bits
1         ABCDEFGH  A B C D E F G H (single characters)    8 X 9 = 72
2         ABCDEFGH  code word 3, string extension 6        13 + 6 = 19
3         ABCDEFGH  code word a                            1 X 13 = 13
4 - 8     ABCDEFGH  one code word per instance             5 X 13 = 65
                                                           total = 169
[Figure: Root Dictionary, Node Dictionary entries (with LOCATION and COUNT) and Character Dictionary contents after each instance.]
LZJH Decoder
[Figure: decoder Node Dictionary entries (LOCATION of the last character and DEPTH) and Character Dictionary contents for the inputs above; each instance is output as ABCDEFGH.]