V.42bis Patent infringement analysis

This detailed analysis shows how the data compression methods used in the V.42bis standard infringe on the Holtz Patent 4,366,551. It includes a short history of data compression methods, the methods used in V.42bis, and a step-by-step analysis of claims 1 and 2 of the Holtz Patent 4,366,551. The analysis will show that all the patented steps are indeed used in the V.42bis standard. A list of references is provided to aid further research.
Introduction
The internationally recognized V.42bis data compression standard was developed by
British Telecom and published in 1990 in a document:
ITU INTERNATIONAL TELECOMMUNICATION UNION
CCITT
V.42bis
THE INTERNATIONAL TELEGRAPH AND TELEPHONE
CONSULTATIVE COMMITTEE.
DATA COMMUNICATION OVER THE TELEPHONE NETWORK.
DATA COMPRESSION PROCEDURES FOR DATA CIRCUIT TERMINATING
EQUIPMENT (DCE) USING ERROR CORRECTION PROCEDURES.
Recommendation V.42bis
Geneva 1990
The document is available for sale from the ITU in the “Blue Book”.
The ITU can be contacted through:
Mr. Th. Irmer
Director ITU.TSB
2 rue de Varembe CH 1211
Geneve 20, Switzerland
Email: marie.bercher@itu.ch
The ITU requires that all parties which claim patent rights for a standard file a formal claim letter stating that they will license their patents in a reasonable and non-discriminatory manner.
The following four parties have filed such claims for the V.42bis standard:
1. Omni Dimensional Networks, Klaus Holtz, Tel./Fax 415 474-4860, 631 O’Farrell, Suite 710 (now Suite 1208), San Francisco, CA 94109-7427, USA
2. UNISYS Ltd, Stonebridge Park, London NW10 8LS, United Kingdom, Tel. +44 714 535247, Fax +44 719 612252
3. BT plc (British Telecom), Mr. Richard Williams, Tel. +44 473 646020
4. IBM Europe, Mr. J. Peckels, Tel. +33 931 84518
All four parties issue licenses for the use of their patents in V.42bis.
A short history of Data Compression methods.
Three basic data compression methods have evolved from three separate lines of research. The early Shannon-Fano-Huffman codes are based on character frequency sorting algorithms. Research into mathematical “learning” in 1974 resulted in the discovery of self-learning Autosophy networks by Klaus Holtz. This research has since evolved into a new Autosophy information theory, which claims to be the mathematical foundation of all known lossless data compression techniques. Later research into dictionary-based data compression by Jacob Ziv and Abraham Lempel, in 1977 and 1978, eventually led to the Ziv-Lempel-Welch (LZW) codes now used in the V.42bis standard.
Huffman Codes
Claude Shannon published “A Mathematical Theory of Communication” in 1948, suggesting a method for lossless data compression. In this method the (ASCII) character set is first sorted according to relative frequency in the data record (e.g. English text). More frequently used characters are assigned shorter codes than less frequently used characters. This idea was refined by Huffman in 1952 (Patent 3,694,813, now expired). Though a favorite of academics and mathematicians, the Huffman codes rarely achieved commercially viable data compression ratios except for very predictable data records. The Huffman codes are not implemented in the V.42bis standard.
Self-learning Autosophy Tree Networks
The data compression scheme used in the V.42bis standard is based on self-learning tree networks invented by Klaus Holtz in 1974. Self-learning tree networks are a fallout from more basic theoretical research into self-learning, brain-like machines. In addition to the self-learning network later re-invented by Terry Welch, there are now seven other known types of self-learning networks, which are used to build brain-like self-learning Autosophers. Only the “Serial Omni Dimensional Networks” are presently used in computer file compression and in the V.42bis communications standard. A “Parallel Omni Dimensional Network” version is now used in image compression. All modern data compression methods are based on a new information theory and a mathematical theory of learning developed by Klaus Holtz.
Figure 1: A serial self-learning Autosophy data tree, Patent: Holtz 4 366 551.
A tree-like data structure (Fig. 1) consists of individual nodes, each of which contains a
GATE, (a text character) and a POINTER pointing backwards toward the previous node in the
tree network. The combination of a GATE (character) and a POINTER is called a MATRIX.
Starting from a pre-assigned SEED node, the network will grow from the input data, like a “Data
Tree” or a “Data Crystal”, in a computer memory. The first input character is combined with the
SEED pointer to obtain a first MATRIX. The memory is then searched to find a matching
MATRIX already stored in the library. If a matching MATRIX is found, then the memory
ADDRESS in which it was found becomes the new POINTER in the MATRIX register. This
new POINTER is then combined with the next input character to form a new MATRIX. If a
matching MATRIX is not found in the memory, then the MATRIX is stored into a next empty
memory location and the memory ADDRESS where it was stored becomes the next POINTER
in the MATRIX. This procedure is repeated for entire input strings, such as text words.
In the example above, the text word READY is encoded into ADDRESS code 14 for transmission. The first character R is combined with the SEED pointer 0 to obtain the first matrix R-0. The memory is searched and this matrix is found already stored in ADDRESS 1. ADDRESS 1 is combined with the next character E to obtain the second matrix E-1, and the search continues. If a new matrix is not found in the memory, then it is stored into the next empty memory location, and the memory ADDRESS into which it was stored becomes the POINTER for the next matrix. The procedure is repeated for each character in the string until the string is terminated, either by an “End of Sequence” code (such as an ASCII “Space”), by a fixed string length limit, or by a “first not found node” (see LZW code) as used in the V.42bis standard.
The ADDRESS code 14 is then transmitted or stored in a disc file, representing the input
string READY in a compressed form. Retrieval is done in reverse order starting from
ADDRESS 14. Address 14 is used as a memory address in an identical library memory to fetch
a first GATE=Y and a POINTER=13. The GATE character Y is stored in a First-In-Last-Out
(FILO) stack buffer. The POINTER is used as the next memory address to fetch the next
MATRIX from the library. The procedure is repeated until the POINTER is equal to the SEED
address. The output data string is then turned around and retrieved from the FILO stack buffer.
Each input string of arbitrary length is either recognized and found already stored or
automatically learned by the addition of new nodes. The text string characters are used as
GATES to select a trail through the tree maze. Each unique text string is finally represented by a
unique final ADDRESS code which is the “tip” of each branch. These final ADDRESS codes
can be transmitted as compressed data codes. The ADDRESS codes can later be used to retrieve
the original input strings.
According to the new Autosophy information theory, true information depends on the data “content” rather than on the data “volume”, as in the old Shannon information theory. The transmitted codes are address tokens, called “tips”, where each tip may represent any amount of data depending on the amount of “knowledge” stored in the libraries. The more knowledge is stored in the libraries, the more efficient the communication and the higher the compression ratios. Knowledge is measured in “engrams”, where each tree node contains an engram of true
knowledge. In the V.42bis standard a tree library is grown from the transmitted data where each
transmission (with a few exceptions) creates a new node in both the transmitter and the receiver
libraries. Data compression is achieved because already learned data strings in the library are
represented by tip codes, which may represent many data characters. The V.42bis standard uses
the tree networks as the basic data compression method, but many additions are required to
implement the basic method in a practical system.
Figure 2: String encoding and retrieval algorithm according to Patent 4 366 551.
The string encoding and retrieval algorithm (Fig. 2) generates the generic “Serial Omni
Dimensional Network” according to the Autosophy information theories. This network is only
one of eight known self-learning networks which have been described in the literature. The
string length is finite and limited by an “End of Sequence” code, such as an ASCII “Space”
character or a maximum defined string length count. If the network is educated with written text,
then each string will represent whole text words of arbitrary length. The library is very compact
because each text word is only learned once and similar words may share their initial network
nodes. String retrieval is terminated either by a pre-defined “Stop” code or by recognizing the
SEED ADDRESS in the POINTER. The output data is retrieved in reverse order and must be
turned around by a First-In-Last-Out (FILO) stack buffer. Storing or encoding the input data
words in reverse order will eliminate the need for a stack buffer.
The generic data trees were first discovered by Klaus Holtz in 1974. A first patent
application was filed on June 16, 1975. This patent describes the tree algorithms in Claims 1 and 2 and includes possible applications for data compression and encryption. With slight modifications and additions these networks are now implemented in the V.42bis standard.
Instead of showing formal flow charts, as in Fig. 2, modern methods use
algorithms to explain the same procedure. Algorithms are plain language steps which explain a
process. The algorithms for the generic serial networks are shown below.
SERIAL NETWORK GENERATION ROUTINE
MATRIX: [ GATE | POINTER ]
Start: Set POINTER = SEED = 0.
Loop: Move the next input character into the GATE.
If the GATE is an “end of sequence” (an ASCII space) then
use the POINTER as output code; Goto Start.
Else search the library memory for a matching MATRIX.
If a matching MATRIX is found then
move the ADDRESS where it was found to the POINTER; Goto Loop.
Else, if a matching MATRIX is not found, then
store the present MATRIX into a next empty memory ADDRESS;
Move the memory ADDRESS where it was stored to the POINTER; Goto Loop.
SERIAL NETWORK RETRIEVAL ROUTINE
MATRIX: [ GATE | POINTER ]
Start: Move the input code to the POINTER.
Loop: Use the POINTER as a memory ADDRESS to fetch a new MATRIX from the library.
Store, push the new GATE into a First-In-Last-Out (FILO) stack.
If the new POINTER = SEED = 0 then pull the output data from the FILO stack
Goto Start.
Else Goto Loop.
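The two routines above can be sketched in runnable Python. This is a minimal illustration, assuming a dict-based library with in-memory addresses; the names SEED, GATE, POINTER, and MATRIX follow the text, while the function names and address numbering are illustrative.

```python
SEED = 0  # pre-assigned SEED node

def encode(text, library=None):
    """Grow the data tree from the input and emit one tip ADDRESS per word."""
    if library is None:
        library = {}                  # MATRIX (GATE, POINTER) -> ADDRESS
    codes = []
    pointer = SEED
    for gate in text:
        if gate == " ":               # "end of sequence": use POINTER as output
            codes.append(pointer)
            pointer = SEED
            continue
        matrix = (gate, pointer)
        if matrix in library:         # matching MATRIX found in the library
            pointer = library[matrix]
        else:                         # store MATRIX into the next empty ADDRESS
            library[matrix] = len(library) + 1
            pointer = library[matrix]
    if pointer != SEED:
        codes.append(pointer)
    return codes, library

def retrieve(code, library):
    """Follow backward POINTERs from a tip; a FILO stack reverses the output."""
    by_address = {addr: matrix for matrix, addr in library.items()}
    stack = []                        # First-In-Last-Out (FILO) stack buffer
    pointer = code
    while pointer != SEED:
        gate, pointer = by_address[pointer]
        stack.append(gate)
    return "".join(reversed(stack))

codes, lib = encode("READY")
# In a fresh library READY occupies addresses 1..5, so the tip code is 5;
# the figure's ADDRESS 14 assumes other words were learned first.
assert retrieve(codes[0], lib) == "READY"
```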
From the Ziv-Lempel codes to the Welch (LZW) algorithm
Though the basic data trees were discovered back in 1974, the V.42bis compression
method evolved from a later line of research. The tree networks were in effect re-discovered ten
years later by Terry Welch and described as the Ziv-Lempel-Welch (LZW) code in a paper “A
Technique for High-Performance Data Compression”. The basic Ziv-Lempel code (Fig. 3) was
first published in May of 1977 as: “A Universal Algorithm for Sequential Data Compression”.
This was one month after the first publication of the self-learning networks by K. Holtz. This
method is now known by such names as the Ziv-Lempel 1 (LZ-1) code or the LZ-77 code
(published in 1977). Ziv and Lempel did not apply for patents for this invention. Much later
Douglas Whiting received important improvement patents (5,003,307 and 5,016,009) which are
now licensed by STAC Electronics.
Figure 3: The original Ziv-Lempel method: LZ-1 or LZ-77
The input data is accumulated in a library memory, such as a shift register. For
subsequent transmissions the input data strings are compared with the data stored in the library
memory. The “Longest matching String” section found in the memory is then identified with a
“String Start Address” and a “String Length Code” and transmitted. The receiver would retrieve
the string from its own library memory and add the retrieved data to the memory so that both
libraries would remain identical at all times. A string of any length, if repeated in the transmission, could be compressed into a short code, but the longer the string, the less likely its repetition.
The basic Ziv-Lempel code is not very practical and was not chosen in the V.42bis standard.
The main problems are low compression ratios due to the inefficient library, and the need for a
string length code which is avoided in V.42bis. This algorithm is however still the most
common data compression method.
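The LZ-1/LZ-77 scheme above can be sketched as follows. This is a naive illustration, not Ziv and Lempel's exact formulation: matches are found by brute-force search over a sliding window, and each output triple carries the offset back to the “String Start Address”, the “String Length Code”, and the next literal character. All names and the window size are illustrative.

```python
def lz77_encode(data, window=255):
    """Emit (offset back, match length, next character) triples."""
    out = []
    i = 0
    while i < len(data):
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):     # naive longest-match search
            k = 0
            # the guard keeps one literal character available after the match
            while i + k + 1 < len(data) and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_off = k, i - j
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out

def lz77_decode(triples):
    """Copy byte by byte so overlapping matches reproduce correctly."""
    buf = []
    for off, length, ch in triples:
        for _ in range(length):
            buf.append(buf[-off])
        buf.append(ch)
    return "".join(buf)

assert lz77_decode(lz77_encode("ABABABABAC")) == "ABABABABAC"
```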
Figure 4: Improved Ziv-Lempel code: LZ-2 or LZ-78
In September of 1978 Ziv and Lempel published an improved data compression method:
“Compression of Individual Sequences via Variable-Rate Coding”. This method, shown in Fig.
4, is now known as the LZ-2 (the second Ziv Lempel code) or the LZ-78 code (published in
1978). Again, no patents were filed on this method. This method will assemble a more compact
library from previously transmitted data. Each single data character not found in the library is
entered into the library and then transmitted as a single character code. In subsequent
transmissions the library is searched to find a “Longest matching string” in the library. The
library ADDRESS of this longest matching string is appended with the next “Non matching
EXTENSION character” and stored as a new library entry. The ADDRESS of the “Longest
matching string” and the next “Non matching EXTENSION character” are then transmitted. The
receiver will output the “Longest matching string”, recovered by a library look-up from the
received ADDRESS code, and then add the “non matching EXTENSION character” from the
same transmission. The receiver will store the received ADDRESS and the “EXTENSION
character” in a new library entry so that both libraries will remain identical at all times. The
receiver uses the transmitted codes to construct its own identical library. The libraries will grow
indefinitely during a transmission to absorb more and more string fragments. As the library size
increases, longer and longer string fragments will be found, leading to increased data
compression. The string length in each memory location is undefined and may require
increasingly long data words for storage. This makes this theoretical data
compression scheme very difficult to implement in real hardware systems. Many improvements
were required to implement this theoretical method in a real application.
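The LZ-78 procedure just described can be sketched as follows; a minimal illustration assuming a dict-based library in which ADDRESS 0 holds the empty string, with all function names illustrative.

```python
def lz78_encode(data):
    """LZ-78: emit (ADDRESS of longest matching string, extension character)."""
    dictionary = {"": 0}              # string -> ADDRESS; 0 is the empty string
    output = []
    phrase = ""
    for ch in data:
        if phrase + ch in dictionary:
            phrase += ch              # keep extending the matching string
        else:
            output.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)   # new library entry
            phrase = ""
    if phrase:                        # flush a trailing match at end of data
        output.append((dictionary[phrase[:-1]], phrase[-1]))
    return output

def lz78_decode(pairs):
    """The receiver rebuilds the identical dictionary from the transmission."""
    strings = [""]                    # ADDRESS -> string
    out = []
    for address, ch in pairs:
        entry = strings[address] + ch
        strings.append(entry)
        out.append(entry)
    return "".join(out)

assert lz78_decode(lz78_encode("TOBEORNOTTOBE")) == "TOBEORNOTTOBE"
```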
In August 1981 Eastman, Ziv, and Lempel applied for patent 4,464,650, granted in 1984 (now owned by AT&T). Eastman solved the main problems associated with the Ziv-Lempel
(LZ-78) code by constructing a network tree in which each node consists of a POINTER and a
single CHARACTER. Instead of storing a new string in a next “Empty” memory location,
Eastman employed a tree algorithm, involving multiplication, to assign network branches. The
forward-pointing branching scheme would assign the memory location for the next node
according to the input character and a reservation for all following nodes. The Eastman
invention, involving the creation of many complex look-up tables, was not very practical. This
code is not used in the V.42bis standard.
In June 1983 Terry Welch applied for patent 4,558,302, which is mostly an extension of
the Eastman patent. Welch used a self-learning network for data transmission by including
Hash code memory searching and a special code transmission scheme. These codes, now known
as the Ziv-Lempel-Welch (LZW) codes, have found applications for file data compression and in
the V.42bis standard. The only relevant difference between the self-learning Holtz networks and
the Ziv-Lempel-Welch codes is that in the latter the input string is terminated by the first “Not
Found” matrix or node. Each “Not Found” node is first stored in a next empty memory location
and then transmitted by a PREFIX (Pointer) and an EXTENSION (character) code. The receiver
will use the transmitted PREFIX-EXTENSION code to create a new node in its own library,
before retrieving the data string analogous to the HOLTZ algorithm. Because of “Delayed
Innovation” used in the V.42bis standard there is no perfect match between the LZW code and
the methods used in V.42bis. Data strings which are already found in the library or which are
terminated by a string length limit are not covered in the Welch patent.
Figure 5: The Ziv-Lempel-Welch code.
The Ziv-Lempel-Welch code shown in Fig. 5 is a self-learning network previously
discovered by Holtz but modified for data communications. The first 256 network memory
locations are pre-loaded with an 8 bit binary address pattern to represent the first ASCII
character in the sequence. This in effect creates a separate tree for each of the 256 ASCII
characters. The first input character of the sequence leads to a network node corresponding to its
8 bit ASCII code. The second input character is combined with the POINTER to the previous
node and the memory is searched to locate such a MATRIX. If such a MATRIX is
found, then the memory address where it was found is used as a new POINTER in the MATRIX.
Each character in the input string will cause a new memory search until the MATRIX (or leaf) is
not found in the memory. This “Not found” branch will terminate the input string.
The last found POINTER (or PREFIX) and the last “non matching” input CHARACTER
are then stored in a next “Empty” memory location. The POINTER (PREFIX) and the last input
character (EXTENSION CODE) are transmitted to a receiver. The receiver stores this PREFIX-EXTENSION CODE in its own memory section so that both dictionary memories remain
identical at all times. The receiver retrieves the data string, in reverse order, by storing the
EXTENSION CODE (ASCII character) in the First-In-Last-Out Stack and then using the
PREFIX code to follow the POINTER trail back to the first 256 memory locations.
LZW ENCODING ROUTINE
MATRIX: [ GATE | POINTER ]
Start: Reserve the first 256 memory locations (0 to 255) for the first character SEED.
Loop: Clear the POINTER in the MATRIX.
Loop1: Move the next input character to the GATE.
If the POINTER = clear then move the GATE to the POINTER; Goto Loop1
Else search the library memory to locate a matching MATRIX.
If a matching MATRIX is found then
move the memory ADDRESS to the POINTER; Goto Loop1.
Else store the MATRIX into a next empty memory ADDRESS;
Use the present MATRIX as the output transmission code; Goto Loop.
LZW RETRIEVAL ROUTINE
MATRIX: [ GATE | POINTER ]
Start: Reserve the first 256 memory locations (0 to 255) as final SEED nodes.
Loop: Move the received input code into the MATRIX.
Store the present MATRIX into a next empty memory ADDRESS.
Push the GATE into a First-In-Last-Out (FILO) stack.
Loop1: Use the POINTER as a memory ADDRESS to fetch a new MATRIX from the library.
Push the new GATE into the FILO stack.
If the new POINTER is less than 256 then push the new POINTER into the FILO stack;
Retrieve, pull the output data from the FILO stack; Goto Loop.
Else Goto Loop1.
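The two LZW routines above can be sketched in Python. This follows the text's description, in which each “not found” node is transmitted as a PREFIX pointer plus an EXTENSION character, rather than the code-only LZW variant found in most modern implementations. A trailing match is flushed as a bare PREFIX, and all function names are illustrative.

```python
def lzw_encode(data):
    """Emit (PREFIX pointer, EXTENSION character code) per not-found node."""
    library = {}                      # (GATE, POINTER) -> ADDRESS >= 256
    output = []
    pointer = None
    for ch in data:
        byte = ord(ch)
        if pointer is None:           # first character: one of the 256 SEEDs
            pointer = byte
            continue
        matrix = (byte, pointer)
        if matrix in library:         # longest matching string keeps growing
            pointer = library[matrix]
        else:                         # store in next empty ADDRESS, transmit
            library[matrix] = 256 + len(library)
            output.append((pointer, byte))
            pointer = None            # restart with the next input character
    if pointer is not None:           # flush a trailing match (no new node)
        output.append((pointer, None))
    return output

def lzw_decode(pairs):
    """Rebuild the identical dictionary and the original data."""
    strings = {i: chr(i) for i in range(256)}   # ADDRESS -> decoded string
    out = []
    for prefix, ext in pairs:
        if ext is None:               # trailing flush
            out.append(strings[prefix])
        else:
            entry = strings[prefix] + chr(ext)
            strings[len(strings)] = entry       # same ADDRESS as the encoder
            out.append(entry)
    return "".join(out)

pairs = lzw_encode("ABABABA")   # [(65, 66), (256, 65), (66, 65)]
assert lzw_decode(pairs) == "ABABABA"
```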
Overall the Ziv-Lempel-Welch code suffers from data expansion for short files, limited data compression ratios, and error propagation. Because the Welch method starts from a mostly empty dictionary memory, short files will not find any matching strings. This will cause “Data Expansion”, in which the compressed files turn out to be larger than the original data files. Ziv-Lempel-Welch codes achieve only modest data compression because of the long PREFIX-EXTENSION CODE transmission. Because text words are chopped into random fragments, the library is very disorganized. This is known as a “Greedy” algorithm. Error propagation is a problem because any transmission error will cause the two dictionaries to become different. This may cause catastrophic error propagation which destroys the entire remaining data file. Ziv-Lempel-Welch codes will, though, automatically adapt to the user's data type and language, requiring no prior exchange of network libraries. Theoretically, the Ziv-Lempel-Welch library could grow strings of any length, but the V.42bis standard imposes a maximum string length (Paragraph 6.4, Procedure for adding strings to the dictionary). This limited string length in the V.42bis standard requires special steps which are not covered in the Welch Patent 4,558,302. These special steps are however covered in the Holtz Patent 4,366,551, so that a license from both patent owners is required to implement the V.42bis standard.
The V.42bis Data Compression Method
British Telecom developed the V.42bis standard by combining several patents and adding
innovative features. After much testing and evaluation the final version shown in Fig. 6 was
chosen. Other data compression methods, such as STAC Electronics' original Ziv-Lempel code (LZ-1 or LZ-77), the HAYES compressor, or MICROCOM's class 5 and 7 codes, proved inferior and
were rejected.
Figure 6.
The V.42bis compression method using delayed innovation
The international V.42bis compression standard (Fig. 6) improves the LZW code through
“delayed innovation” and a limited recycling library memory. Only the POINTER code is
transmitted, while the next or unmatched character is used to start the next string. The transmitter
stores the not-found node in a next empty memory ADDRESS and then transmits the POINTER
only. The receiver retrieves the string from the POINTER code and remembers the POINTER in
a buffer. The next POINTER transmission contains as its first character the missing GATE,
which is combined with the POINTER in the buffer to create a new node in the receiver library.
A rare problem exists because the transmitter adds a library node one transmission before the
receiver can create the same node. If the transmitter uses the new node in encoding the next
string, then the algorithm can crash because the receiver does not yet have that node. The V.42bis
standard has precautionary steps to avoid that problem. Instead of an endlessly growing library,
the V.42bis standard has a fixed length library in which the “least recently used” node address is
cleared and recycled to store the next node. A few addresses above the first 256 nodes are used
as special command messages rather than data codes.
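The delayed-innovation scheme, including a one-transmission embargo that prevents the crash described above, can be sketched as follows. This is a simplified illustration (no control codes, no library size limit, node ADDRESSes from 259 up as in the standard's code space); it is not the V.42bis wire format, and all names are illustrative.

```python
def v42_encode(data):
    """Transmit POINTERs only; the unmatched character starts the next string."""
    library = {}                  # (character code, POINTER) -> node ADDRESS
    output = []
    pointer = None
    embargo = None                # node the receiver has not yet created
    for ch in data:
        b = ord(ch)
        if pointer is None:       # first character of a string
            pointer = b
            continue
        addr = library.get((b, pointer))
        if addr is not None and addr != embargo:
            pointer = addr        # keep extending the matched string
        else:
            output.append(pointer)                # transmit the POINTER only
            if addr is None:                      # node created one step early
                embargo = library[(b, pointer)] = 259 + len(library)
            else:                 # node existed but was embargoed: no new node
                embargo = None
            pointer = b           # unmatched character starts the next string
    if pointer is not None:
        output.append(pointer)
    return output

def v42_decode(codes):
    """The first character of each string completes the previous node."""
    library = {}                  # same (character code, POINTER) -> ADDRESS map
    text = {}                     # ADDRESS -> decoded string
    out = []
    prev = None
    for code in codes:
        s = text[code] if code >= 259 else chr(code)
        if prev is not None:      # delayed innovation
            matrix = (ord(s[0]), prev)
            if matrix not in library:
                addr = 259 + len(library)
                library[matrix] = addr
                text[addr] = (text[prev] if prev >= 259 else chr(prev)) + s[0]
        out.append(s)
        prev = code
    return "".join(out)

assert v42_decode(v42_encode("TOBEORNOTTOBE")) == "TOBEORNOTTOBE"
```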
The ETM (256) code switches the compression engine off. The compression efficiency is
continuously monitored by an embedded microprocessor. If the compression ratio becomes too
low, then the compression engine is switched off and replaced by clear 8 bit character
transmissions. The compression engine may be switched back on later when the data becomes
compressible again. This avoids most problems with data expansion.
The FLUSH (257) code is used to clear the entire library to allow a new library to grow
from the following transmissions.
The STEPUP (258) command increases the POINTER bit length by one bit. Starting with
a 9 bit POINTER code (node addresses 259 to 511), the bit length is increased whenever the
library doubles at 512, 1k, 2k and 4k locations until the fixed library size limit. The POINTER
code length then remains constant while the library addresses are recycled in a “least recently
used” order. The V.42bis standard stuffs either POINTER codes or 8 bit character codes into
packets for transmission. The packets contain error checking codes, and defective packets are retransmitted until they are received without errors. This avoids the problem of error propagation.
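The STEPUP growth can be illustrated with a small helper. The 9-to-12-bit range and the doubling points follow the text; the function name and the incremental check are illustrative.

```python
def code_length(next_address, start_bits=9):
    """Bits needed to transmit `next_address`; each increase is one STEPUP."""
    bits = start_bits
    while next_address >= (1 << bits):
        bits += 1             # transmitter sends STEPUP (258) before this code
    return bits

# 9 bits carry addresses up to 511; the length steps up at 512, 1k, and 2k,
# reaching 12 bits for the last 4k library locations.
assert [code_length(a) for a in (259, 511, 512, 1024, 4095)] == [9, 9, 10, 11, 12]
```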
Figure 7.
Sussenguth tree searching in V.42bis
The V.42bis standard uses a forward pointing tree searching algorithm (shown in Fig. 7)
to increase encoding speed. This algorithm was first described by Edward H. Sussenguth Jr. in a
1963 publication “Use of Tree Structures for Processing Files”. There is no known patent for this
searching method. Instead of having to search the entire library memory, this algorithm limits the
search to a maximum of 256 steps. Both the Sussenguth and the LZW trees are combined in the
same node. While the Sussenguth tree accelerates encoding speed in the transmitter, the LZW
tree accelerates retrieval speed in the receiver. The GATE field contains the character which is
shared by both trees. A DEPENDENT forward pointer points to the next node ADDRESS if the
input character matches the GATE code. If the input character does not match the GATE code,
then the search would follow the SIBLING pointer to the next node ADDRESS. The algorithm
would follow the SIBLING pointer trail until a GATE matching the input character is found. If
no matching GATE is found at the end of the trail, indicated by an empty SIBLING pointer, then
a new node is created at a next empty or recycled memory ADDRESS. The previous node
SIBLING pointer is loaded with the new node ADDRESS to extend the SIBLING trail. Starting
from a “Root” node, a forward search is made through the memory to find a stored data string.
Each node contains the following four fields:
A CHARACTER field contains an 8 bit ASCII character or GATE, which is shared by
both the Autosophy (LZW) and the Sussenguth trees.
A DEPENDENT forward pointer points at the NODE ADDRESS where the next
character in the string may be found. This pointer branch is taken if the CHARACTER field or
GATE matches the input text character. A dependent pointer refers to all the nodes depending on
the present node address, or the children of the node.
A SIBLING forward pointer is a pointer to nodes on the same level, or the brothers and
sisters of the node. The pointer branch is taken if the CHARACTER field does not match the
input text character. The SIBLING search trail is followed until a node with a matching
CHARACTER field is found. If no matching CHARACTER is found at the end of the trail, then
a new node is created for the input text character in a next empty or recycled memory
ADDRESS. The algorithm then loads the new node ADDRESS into the empty SIBLING field
of the previous node to extend the SIBLING node search trail. The PARENT backward pointer
is not part of the Sussenguth tree but is required in the receiver for data retrieval.
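The DEPENDENT/SIBLING search described above can be sketched as follows. This is a minimal illustration: the class and function names are invented, and the PARENT pointer is kept only to show the retrieval direction.

```python
class Node:
    """One library node with the four fields named in the text."""
    def __init__(self, gate, parent):
        self.gate = gate          # CHARACTER shared by both trees
        self.dependent = None     # forward pointer to the first child
        self.sibling = None       # forward pointer to the next same-level node
        self.parent = parent      # backward pointer, used for retrieval

def find_or_add(node, ch):
    """Follow the SIBLING trail among node's children for GATE == ch;
    create a new node at the end of the trail if no GATE matches."""
    child = node.dependent
    if child is None:                     # no children yet
        node.dependent = Node(ch, node)
        return node.dependent, False
    while True:
        if child.gate == ch:
            return child, True            # DEPENDENT branch taken next
        if child.sibling is None:         # end of trail: extend it
            child.sibling = Node(ch, node)
            return child.sibling, False
        child = child.sibling             # SIBLING branch taken

root = Node(None, None)
node = root
for ch in "CAT":                          # learn the string CAT
    node, _ = find_or_add(node, ch)
node = root
for ch in "CAR":                          # CAR shares the C-A prefix;
    node, _ = find_or_add(node, ch)       # R becomes a SIBLING of T
```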
Figure 8.
Holtz, LZW tree transformation for use in V.42bis
Figure 8 shows how the original Holtz tree network was transformed into the V.42bis
standard using many additions and modifications. Adding modifications or additions to a system
does not protect from patent infringement as long as all the original patented steps are
implemented in an application. The original Holtz tree network only contains the essential steps
of storing a MATRIX consisting of a GATE (a character) and a backward POINTER in a
memory ADDRESS. The Holtz tree contains facilities for both encoding data strings into
compression codes and for retrieving the original data strings (in reverse order) from the
compression codes.
A first modification was made in the V.42bis standard by splitting the nodes into a
transmitter portion and a receiver portion. Each portion contains the identical GATE and is
stored in an identical node ADDRESS. Duplicating the GATE portion for the transmitter and
receiver is an addition which does not protect from patent infringement.
In theory, a node is defined by its node ADDRESS. Even if the node content is split into two portions, separated by a modem and telephone line, the node remains
the same as long as it is identified by the same ADDRESS. A complete node is the combination
of both portions located in the transmitter library and in the receiver library.
The second addition was to add the Sussenguth search tree in the transmitter. In the Holtz
data tree all the memory locations are searched to locate a matching MATRIX. The Sussenguth
tree accelerates the search by jumping from node to node so that a maximum of 256 steps are
required to find a MATRIX. Implementing or adding a specific search method does not protect
from patent infringement. The Holtz compression method would work with or without the
additional Sussenguth search algorithm.
The backward POINTER is generated by the transmitter as the last found matching node
ADDRESS. This backward (LZW) POINTER need not be stored in the transmitter library. The
POINTER is then transmitted to the receiver. The backward POINTER is the compressed
“transmission code”. The receiver would use the POINTER code to retrieve a data string, in
reverse order, as specified in the Holtz patent claim 2. The POINTER code is then stored in a
buffer. The next POINTER code transmission would contain the next data string. The first
character (GATE) of this next data string is combined with the POINTER in the buffer to create
a new node in the receiver (see delayed innovation). The final result is that a node containing a
GATE (character) and a back POINTER is stored in an ADDRESS as claimed in the Holtz
patent claim 1. Neither “delayed innovation”, nor the splitting of the node into two portions, nor the addition of the Sussenguth search algorithm will protect from patent infringement.
The following features are implemented in V.42bis.
1. The compression library implements the self-learning tree networks in Patent Holtz
4,366,551. All data sequences which exceed the string length (V.42bis standard paragraph 6.4)
or which are already found stored in the library are processed according to the Holtz algorithms.
Data retrieval from a node address uses the Holtz retrieval algorithm in claim 2.
2. Strings are terminated by the first “Not found” node according to the Welch Patent
4,558,302 (now owned by UNISYS). The only relevant variation from the Holtz algorithms is
that each string is terminated by the first “Not Found” node. This allows for automatic library
growth during a transmission. Terminating a string in this manner is still within the parameter
claimed in the Holtz patent.
3. The V.42bis compression standard uses Delayed Dictionary Learning. As shown in Fig.
6, only the POINTER is transmitted. Delayed innovation was anticipated in the Holtz patent as
“Word Fragmentation”.
4. The V.42bis standard employs an algorithm to recycle memory nodes once all memory
locations are used up. This algorithm will separate “Leaf” nodes from “Branch” nodes. Branch
nodes must be preserved because they are used as reference POINTERs by other nodes. Leaf
nodes are cleared and reused in all further encoding. The method and algorithm are claimed in
the Miller Patent 4,814,746 (owned by IBM).
5. The efficiency of the compression is monitored by an internal processor. If the
compression ratio becomes too low or results in data expansion, then a switch is made to normal
clear transmission. The compression algorithm is switched on or off during a transmission
depending on the data type.
6. The bit length of the transmitted ADDRESS is adjusted by STEPUP commands. Library
ADDRESSES in the first 512 library locations require only a 9 bit code, while ADDRESSES in
the last 4k library locations require a 12 bit code. The data bits are stuffed into 8 bit octets for
transmission and combined into data packets. Each packet contains an error checking code to
initiate packet re-transmission in case of transmission errors.
Summary
The V.42bis standard is a state-of-the-art algorithm, implemented in virtually all new modems. It can compress most types of data to some extent. It avoids most data expansion and eliminates the danger of error propagation by retransmitting defective packets. The algorithm is usually implemented by embedded microprocessors. The library size, packet size, and baud rate are negotiated by the modems before each transmission. The Sussenguth search tree allows data transmission speeds sufficient for telephone modems and ISDN, though not much beyond that. Truly high-speed transmissions via local or wide area networks (including fiber optic transmissions and real-time file compression in a computer) are not possible with today's microprocessors. Encrypted data cannot be compressed, and no provisions for data encryption are provided in the standard. Because of the inefficient “string chop suey” library in the LZW code, the compression performance remains mediocre.
Future enhancements may include:
1.
Using pre-loaded generic dictionaries, either downloaded from software or stored as
generic Read Only Memories (ROM) in the hardware. This may increase initial
data compression and provide near unbreakable data encryption. Using “String Fragmentation”
in a fixed library system may remove the danger of “Error Propagation”.
2.
Using an “End of Sequence” code (an ASCII “Space”), similar to the method in the
HOLTZ algorithm, may result in more compact libraries which are less “greedy” and provide
higher compression.
3.
Using Content Addressable Memories (CAM) such as the AMD 99C10 chips may
increase encoding speed to more than 30 million characters per second. This may open the way
into other applications, such as local and wide area digital networks.
4.
Using the new Content Addressable Read Only Memory (CAROM Patent 5,576,985)
may result in single chip compression systems operating at bus speeds.
5.
Other self-learning networks, like the “Parallel” networks (Patents Holtz 4,992,868 and
5,113,505), are now being developed for image compression in applications such as computer
graphics, teleconferencing and High Definition Television transmission.
U.S. Patent 4,366,551
The following analysis will show that the data compression method in the V.42bis
standard uses all the steps in the Holtz patent. Unlike the Welch Patent 4,558,302, this claim
applies even to strings that exceed the string limit or that are already found in the library. The
Holtz patent provides the generic data trees used in the V.42bis standard. Manufacturing or
selling V.42bis equipment infringes claims 1 and 2 of Holtz U.S. Patent 4,366,551.
Claim 1:
A machine implemented method of conversion of data sequence into a single address
number of an addressable storage location in a storage region, comprising the steps of:
In the V.42bis standard compression method, data sequences (strings) are converted into
pointer codes for compressed communication or storage. The pointer codes (single address
numbers) represent address (storage) locations in a tree library (storage region). The method is
“machine implemented”, applying both to hardware and to software-only implementations.
(a)
Providing storage regions having a plurality of machine addressable storage
locations for storing machine readable data characters;
All lossless data compression methods require a library memory or “storage region”,
which can be implemented as a Random Access Memory (RAM), a serial shift register
memory, or a Content Addressable Memory (CAM), as outlined in the specifications. The
V.42bis standard implements and requires such a library memory.
(b)
directing the first data character of a data sequence and a starting number into
respective first and second position of a two-position machine operable buffer region, the
data character and the starting number, when received in respective positions of the buffer
region, defining a machine readable input character matrix;
The two-position MATRIX or storage word is the very foundation of this invention. In
the Holtz patent the two positions are named Data Character -- Starting Number. Later the same
two-position memory pattern is called GATE -- POINTER, and the combination of the two values
is called a MATRIX. In the Welch patent the same two values are referred to as Extension
Character -- Prefix. Both patents require starting the network generation at a pre-defined starting
address. In V.42bis, this starting number is an address in the pre-loaded first 256 memory
locations, corresponding to the first input character of the string.
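Step (b) can be sketched as follows. The function names are illustrative, not taken from the standard: the first 256 library addresses are pre-loaded with the single characters, and the two-position buffer is filled with the first input character and the address of its pre-loaded root entry (the starting number).

```python
# Minimal sketch of the two-position buffer (GATE-POINTER matrix) of step (b):
# the first position holds the input character, the second holds the starting
# number, i.e. the address of one of the 256 pre-loaded root entries.
# Names here (preload_roots, initial_matrix) are illustrative.

def preload_roots():
    """Library addresses 0..255 are reserved for the single characters."""
    return {(None, bytes([c])): c for c in range(256)}

def initial_matrix(library, first_char):
    pointer = library[(None, first_char)]   # starting number for this character
    return (first_char, pointer)            # two-position buffer contents

lib = preload_roots()
print(initial_matrix(lib, b"A"))   # (b'A', 65): character plus starting number
```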
(c)
machine addressing the storage locations of the storage region and machine reading
the content of each storage location to determine if the input character matrix in the buffer
region is stored in the storage region;
Searching the memory for a matching matrix is required in the V.42bis method. This
memory searching may be implemented in different ways; step (c) does not demand
any specific searching method. In the Holtz patent, the memory searching method is not further
defined, except that in Fig. 4 and Fig. 5 the memory is defined as “Either Random, Serial or
Content Addressable Mass Storage”. The Sussenguth search method used in V.42bis is an
additional feature that speeds memory searching. It is obvious that once a matching matrix is
found no further search is required. Any type of search is included in this step.
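Because step (c) covers any search method, the lookup can be shown with a plain hash table. This is an illustration of the question being asked, not the Sussenguth tree actually used in V.42bis:

```python
# Step (c) does not mandate a particular search method; a hash table lookup
# (shown here) and the Sussenguth tree used in V.42bis are both ways of asking
# the same question: is the (pointer, character) matrix already stored, and if
# so, at which address?

def find_matrix(library, pointer, char):
    """Return the address where the matrix is stored, or None if not found."""
    return library.get((pointer, char))

library = {(None, "a"): 97, (97, "b"): 256}   # "a" preloaded, "ab" learned
print(find_matrix(library, 97, "b"))   # 256: matrix found at address 256
print(find_matrix(library, 97, "c"))   # None: matrix not stored
```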
(d)
if the input character matrix in the buffer region is not stored in said storage region,
storing the input character matrix in a free storage location of said storage region and
directing the address number of said free storage location into the second position of the
buffer region;
This storage of “not found” tree nodes is implemented in the V.42bis standard. Once all
storage locations are used up, old nodes may be eliminated and recycled. This modification
will not protect V.42bis users from patent infringement. Note that the Holtz patent specifies
storing the matrix in “a” free storage location, which includes any type of address
scrambling. In the V.42bis standard, the resulting pointer or address code is also transmitted to a
receiver station. This is an additional step in the algorithm.
(e)
if the character matrix in the buffer region is stored in a storage location of said
storage region directing the address number of the storage location at which the input
character matrix is stored into the second position of the buffer region;
The memory address at which the GATE-POINTER matrix is found is used as a backward
POINTER in the V.42bis standard. The backward pointer is generated in the transmitter
as the “last matching” node address. The backward POINTER is stored in the receiver library.
(f)
directing the next data character of said data sequence into said first position of
buffer region to cause the next data character and the address number in respective
positions of the buffer region to form another machine readable input character matrix;
Combining the next character input with the POINTER to form a new matrix is
implemented in the V.42bis standard.
(g)
repeating the steps c through f for each of the next and remaining data characters of
the data sequence; and
This looping for each data character in the string is implemented in the V.42bis method.
(h)
terminating the repeating step after the input character matrix corresponding to the
last data character of the data sequence has been stored or found to be stored in said
storage region, the address number of the storage location which identifies the last input
character matrix being operable to represent the data sequence.
The V.42bis method generates a single address code for transmission which identifies an
input sequence of any length. In the Holtz patent the end of the data sequence is not further
specified and is left to the specific application. In the Welch patent and in the V.42bis standard, the
end of a data sequence is reached when an input matrix is not found in the memory. The
Holtz patent claim includes any method of terminating the data sequence, including using a “not
found” node or, alternately, recognizing special “End of String” codes such as the ASCII “Space”.
The Welch algorithm uses the first “not found” node to terminate the data string encoding. This
special case still falls within the parameters of this claim. Strings that are terminated by a string
length limit or that are already stored (V.42bis standard paragraph 6.4) are not covered by the
Welch patent. These conditions are, however, still covered by this claim.
Claim 2
Claim 2 shows how the data is retrieved in reverse order from the pointer codes. Even if
the encoding method in the V.42bis standard is thought to be different, retrieving the data output
in reverse order will still make V.42bis users liable for patent infringement. All steps are
precisely implemented in V.42bis.
Claim 2: A machine implemented method as set forth in claim 1, including retrieving, in
reverse order, a data sequence stored in said storage region and represented by an address
number, said number identifying the storage location in the storage region at which the last
character input matrix of the data sequence is stored, comprising the steps of:
(i) machine reading said last input character matrix from the storage region and
directing it into the buffer region to present an output character matrix formed of a
machine readable output data character and a machine readable next address number,
(j) machine reading the output data character in said buffer region and directing it
to an output station,
(k) machine reading the input character matrix from the storage location of the
storage region identified by the next address number and directing it into the buffer region
to present another output character matrix, and
(l) repeating steps j and k until the next address number is the starting number.
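Steps (i) through (l) can be sketched as a walk along the backward pointers. The library contents below are a hypothetical example; the stop condition here (reaching a pre-loaded root) stands in for claim 2's "until the next address number is the starting number":

```python
# Sketch of claim 2, steps (i)-(l): starting from the address that represents
# the whole sequence, each stored matrix yields an output character (step j)
# and a next address (step k); following the backward pointers emits the data
# sequence in reverse order until a starting number is reached (step l).

def retrieve_reversed(nodes, address):
    """nodes maps an address to its (next_address, character) matrix."""
    out = []
    while address is not None:              # step (l): stop at the root
        next_address, char = nodes[address] # steps (i) and (k)
        out.append(char)                    # step (j): character to output
        address = next_address
    return "".join(out)

# Hypothetical library storing "abc": address 0 is the pre-loaded root for 'a'.
nodes = {0: (None, "a"), 256: (0, "b"), 257: (256, "c")}
print(retrieve_reversed(nodes, 257))   # "cba": the sequence in reverse order
```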
References
1
Klaus Holtz, Eric Holtz, Diana Holtz, “Autosophy Information Theory provides lossless
data and video compression based on the data content”. EUROPTO’96, Digital
Compression Technologies and Systems for Video Communication,
SPIE Volume 2952, October 1996, Berlin Germany
2
Klaus Holtz, “Data Compression in Network Design”.
Tutorial at Design SuperCon’96, Santa Clara CA, Jan 1996
3
Klaus Holtz, “Digital Image and Video Compression for Packet Networks”.
Tutorial at SuperCon’96, Santa Clara CA.
4
Klaus Holtz, “Autosophy Data Compression Accelerates and Encrypts Network
Communications”. WESCON/95, Session C3, Paper 4. Nov. 7, 1995
5
Klaus Holtz, “Autosophy Image Compression for Packet Network Television”
WESCON/95, Session C2, 1995
6
Klaus Holtz, “Packet Video Transmission on the Information Superhighway Using
Image Content Dependent Autosophy Video Compression”.
IS&T’s 48th Annual Conference, Washington DC, May 1995
7
Klaus Holtz, “Fast Data Compression Speeds Data Communications”.
Design SuperCon ‘95, Santa Clara CA, 1995
8
Klaus Holtz, “Lossless data-compression techniques advance beyond Shannon’s limits”.
Personal Engineering Magazine, Dec. 1994
9
WESCON/94, Session W23, Advanced Information Management,
Sept. 29, 1994, 5 papers, Anaheim CA.
10
WESCON/94, Session W9, Digital Video Compression. Sept. 28, 1994, Anaheim CA.
11
Colin Johnson, “Data know thyself” OEM Magazine, May 1994, Page 94
12
Klaus Holtz, “Hyperspace storage compression for Multimedia systems”,
IS&T / SPIE Electronic Imaging Science and Technology, Paper 2188-40,
Feb. 8, 1994, San Jose CA
13
WESCON/93, Session 7, Applications for lossless data compression.
Sept. 28, 1993, San Francisco, CA. 4 papers.
14
Klaus Holtz, “Autosophy Networks yield Self-learning Robot Vision”.
WESCON/93, Session S2, Paper 5, San Francisco, CA, Sept. 28, 1993
15
Klaus Holtz, “Self-aligning and compressed autosophy video databases”.
SPIE Vol. 1908, 1993 San Jose, CA, Storage and Retrieval for Image and Video
Databases
16
Klaus Holtz, “Lossless Image Compression with Autosophy Networks”.
SPIE Vol. 1903, 1993 San Jose, CA, Image and Video Processing
17
Klaus Holtz, “HDTV and Multimedia Image Compression with Autosophy Networks”
WESCON/92, Nov. 1992, Anaheim, CA, page 414
18
Klaus Holtz, “Autosophy image compression and vision for aerospace sensing”.
SPIE-92, Vol. 1700-39, Orlando, FL, April. 24, 1992
19
Colin Johnson, “Storage technique can learn”. Electronic Engineering Times, Jan.6,1992
20
NORTHCON-91, Session D4, Learning Networks: An Alternative to Data Processing.
Portland OR, Oct.1991
21
Klaus Holtz, E. Holtz, “True Information Television (TITV) breaks Shannon Bandwidth
Barrier”. IEEE Transactions on Consumer Electronics, May 1990, Volume 36, Number 2
22
Klaus Holtz, “Text Compression and Encryption with self-learning Networks”.
IEEE GLOBECOM-85 New Orleans
23
Klaus Holtz, “Build a self-learning no programming Computer with your
Microprocessor”. Dr. Dobbs Journal, Number 33, March 1979, Vol. 4
24
Klaus Holtz, E. Langheld, “Der selbstlernende und programmier-freie
Assoziationscomputer”. ELEKTRONIK magazin, Dec. 1978, Volume 14 and 15,
Franzis Verlag Abt. Zeitschriften Vertrieb, Postfach 37 01 20, 8000 Munich 37, Germany
25
Klaus Holtz, “Here comes the brain-like self-learning no-programming computer of the
future”. The First West Coast Computer Faire 1977
26
A. Lettieri “Text compression and encryptography with self-learning infinite
dimensional networks”. Master Degree Thesis, University of Dayton Ohio 1983
27
C.E. Shannon, “A Mathematical Theory of Communication”,
Bell System Technical Journal, Vol. 27, July and Oct. 1948
28
J. Ziv and A. Lempel, “A Universal Algorithm for Sequential Data Compression”,
IEEE Information Theory, IT-23, No. 3, May 1977
29
J. Ziv and A. Lempel, “Compression of Individual Sequences via Variable-Rate Coding”
IEEE Information Theory, IT-24, Sept. 1978
30
J. Ziv and A. Lempel. “Coding Theorems for Individual Sequences”
IEEE Information Theory. Vol. IT-24, No.4, July 1978
31
T. A. Welch, “A Technique for High Performance Data Compression”
IEEE Computer, June 1984 (Patent 4,558,302)
32
V. S. Miller, M. N. Wegman, “Variations on a Theme by Ziv and Lempel”
IBM Papers, Combinatorial Algorithms on Words, 1985
33
Edward H. Sussenguth, “Use of Tree Structures for Processing Files”
Communications of the ACM, Volume 6, Number 5, May 1963
34
E.U. Cohler and J.E. Storer “Functional Parallel Architecture for Array Processor”
IEEE Computer, Vol 14, No. 9, Sept 1981
35
J. Rissanen “A Universal Data Compression System”
IEEE Information Theory, Vol. IT-29, No. 5, Sept. 1983
36
D.I. Dance “An Adaptive On-Line Data Compression System”
The Computer Journal, Vol. 19, Aug. 1976
37
J.B. Seery, J. Ziv “A Universal Data Compression Algorithm: Description and
Preliminary Results”. Technical Memorandum, March 23, 1977
38
J.B. Seery, J. Ziv “Further Results on Universal Data Compression”
Technical Memorandum, June 20, 1978
39
A. Lempel, G. Seroussi, J. Ziv “On the Power of Straight Line Computation in
Finite Fields”. IEEE Information Theory, Vol. IT-28, No. 6, Nov. 1982
40
A. Lempel, J. Ziv “On the Complexity of Finite Sequences”
IEEE Information Theory, Vol. IT-22, No. 1, Jan. 1976
41
A. Lempel, J. Ziv “Compression of Two-Dimensional Data”
IEEE Information Theory, Vol. IT-32, No. 1, Jan. 1986
42
A. Lempel, J. Ziv “Compression of Two-Dimensional Images”
Combinatorial Algorithms on Words, 1985
43
J. Ziv “On Universal Quantization”
IEEE Information Theory, Vol. IT-31, No. 3, May 1985
44
J.A. Storer “Data Compression: Methods and Complexity Issues”
Ph.D. Dissertation, Dept. of Electrical Eng. and Computer Science, Princeton University, 1979
45
J.A. Storer “Data Compression: Methods and Theory”, 1988
46
J.A. Storer, T.G. Szymanski “Data Compression via Textual Substitution”
Journal of the Association for Computing Machinery, Vol. 29, No. 4, Oct. 1982
47
J.A. Storer, T.G. Szymanski “The Macro Model for Data Compression”
Tenth Annual ACM Symposium on Theory of Computing, 1978
48
G. Held, T. Marshall “Data Compression: Techniques and Applications, Hardware
and Software Considerations” (2nd ed.), pp. 1-206, 1983, 1987
49
S. Even, M. Rodeh “Economical Encoding of Commas Between Strings”
Communications of the ACM, Vol. 21, No. 4, Apr. 1978
50
M. Rodeh, V.R. Pratt, S. Even “Linear Algorithms for Data Compression via String
Matching”. Journal of the ACM, Vol. 28, No. 1, Jan. 1981
52
J.K. Gallant “String Compression Algorithms”
Ph.D. Dissertation, Dept. of Electrical Eng. and Computer Science, Princeton University, 1982
53
D.A. Huffman “A Method for the Construction of Minimum Redundancy Codes”
Proceedings of the I.R.E., Sept. 1952
54
D.E. Knuth “Sorting and Searching: The Art of Computer Programming”, Vol. 3, 1973
55
G.G. Langdon “A Note on the Ziv-Lempel Model for Compressing Individual
Sequences”. IEEE Information Theory, Vol. IT-29, No. 2, March 1983
56
M. Wells “File Compression Using Variable Length Encoding”
The Computer Journal, Vol. 15, No. 4, Nov. 1972
57
M.J. Bianchi, J.J. Kato, D.J. Van Maren “Data Compression in a Half-Inch
Reel-to-Reel Tape Drive”. Hewlett-Packard Journal, June 1989
58
“How compressed data travel faster and cheaper”. Business Week, Nov. 1982
59
G. Pastrick “Double-No-Triple Your Hard Disc Space with On-the-fly Data
Compression”. PC Magazine, Jan. 1992
60
M. Cohn “Performance of Lempel-Ziv Compression with Deferred Innovation”
Technical Report TR-88-132, April 1988
61
Mark Nelson “The Data Compression Book”
M&T Books, 411 Borel Avenue, San Mateo, CA 94402
62
European Patent No. 0 127 815 and Patent Miller 4,814,746 (related patents)
63
German Patent No. A 65 665 and Patent Heinz 4,491,934 (related patents)
64
Japanese Patent No. 19857 / 82, 101937 / 82
U.S. Patents on data compression
Huffman 3,694,813 - The Huffman codes
Barnes 3,914,747
Betz 3,976,844
Hoerning 4,021,782
Schippers 4,038,652
Jackson 4,054,951
Schippers 4,152,582
Holtz 4,366,551 - V.42bis basic tree networks (Omni Dimensional Networks)
Moll 4,412,306
Storer 4,436,422
Eastman 4,464,650 - Primitive tree network (AT&T)
Welch 4,558,302 - V.42bis LZW code (UNISYS)
Miller 4,814,746 - V.42bis delayed innovation and recycling library (IBM)
Van Maren 4,870,415 - (HP)
Holtz 4,992,868 - Image compression using parallel networks
Whiting 5,003,307 - LZ-1 code (STAC Electronics)
Whiting 5,016,009 - LZ-1 code
Katz 5,051,745 - PKZIP relative LZ-1 (PKWARE)
Holtz 5,113,505 - Image compression with pyramidal addressing
Holtz 5,576,985 - Content addressable memory for high speed compression