Novel Algorithms for Index & Vertex Data Compression and Decompression
Alex Berliner, Brian Estes, Samuel Lerner
School of Electrical Engineering and Computer
Science, University of Central Florida, Orlando,
Florida, 32816-2450
Abstract — Modern graphics cards are performing
tremendous amounts of work to maintain the expected quality
of current-generation graphical applications. A bottleneck
exists where the speed of the bus connecting the graphics card
and a computer’s processor is not sufficient to quickly
transmit draw data, known as index and vertex data. The goal
of this project was to research, implement, and identify lossless
algorithms that would be able to compress this data efficiently.
Over the course of the project, the group evaluated many
algorithms including run-length encoding, LZO compression,
floating point masks, and Golomb encoding.
Index Terms — Index, vertex, graphics, compression,
decompression, lossless.
I. INTRODUCTION
Modern graphics cards perform tremendous amounts of work to maintain the visual fidelity expected of
current-generation graphical applications. Although it is
important to improve the hardware that supports these
cards, it is equally important to optimize software to make
as much use of the existing hardware as possible. The
compression of data before transfer through a hardware bus
is a shining example of this method of optimization.
Efficient compression algorithms can be used to deliver an
overall faster system than what can be accomplished with
hardware optimizations alone.
The goal of this project was to implement efficient
lossless compression and decompression algorithms for use
on the modern graphics pipeline. The algorithms will
compress the data that go into both the vertex and index
buffers. This reduction in size of information being
transferred through the buffer will allow a higher effective
throughput of previously uncompressed objects. After data
is fetched from a buffer it will be quickly decompressed on
the graphics processing unit (GPU) and then used normally.
The implementation of these algorithms will increase the speed and efficiency at which a graphics card can operate by allowing the card to spend less time waiting for new information to transfer into the buffer from the computer's main memory. The transfer rate is increased not by increasing the size of the transfer buffer, but by decreasing the size of the data being sent through the buffer. If the algorithms that the group creates generate a compression ratio of C, the overall throughput will change from “one unit of data per transfer” to “C units of data per transfer”.
II. DATA TYPES
The first of the data types to be compressed is vertex data.
This data can potentially contain both decimal and integer
values and depending on the situation can contain
numerous different structures of data. Each vertex contains information that describes a single point of the object. By connecting these points and using the contained data, the 3D object is drawn.
The second type of data to be compressed is index data, which consists of only integer values. Each integer is a single data point that is used similarly to how indices are used in an array of variables in programming. The number points to a section of vertex data that together describes a single vertex of the object. By using index information, each vertex only has to be defined once and can then be reused when drawing the object.
Fig 1 displays how two triangles are drawn using an index and vertex buffer. The arrows between the buffers show how each index points to a section in the vertex buffer. The vertex buffer is formatted as Position(x,y,z) and Color(r,g,b). It can be seen that within the index buffer, vertices 1 and 3 are reused to draw both triangles.
Fig. 1. Interaction between index and vertex information.
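To make this layout concrete, the sketch below defines a vertex structure matching the Position(x,y,z) and Color(r,g,b) format of Fig 1 and an index buffer in which vertices 1 and 3 appear in both triangles. The struct layout, coordinate values, and colors are illustrative assumptions rather than the exact contents of the figure.

```c
#include <stdint.h>

/* One vertex in the format described above: Position(x,y,z) then Color(r,g,b). */
typedef struct {
    float x, y, z;   /* position */
    float r, g, b;   /* color    */
} Vertex;

/* Four unique vertices are enough to describe two adjacent triangles. */
static const Vertex vertex_buffer[4] = {
    { 0.0f, 0.0f, 0.0f,   1.0f, 0.0f, 0.0f },  /* vertex 0 */
    { 1.0f, 0.0f, 0.0f,   0.0f, 1.0f, 0.0f },  /* vertex 1 */
    { 0.0f, 1.0f, 0.0f,   0.0f, 0.0f, 1.0f },  /* vertex 2 */
    { 1.0f, 1.0f, 0.0f,   1.0f, 1.0f, 0.0f }   /* vertex 3 */
};

/* The index buffer references vertices 1 and 3 in both triangles instead of
 * storing those vertices a second time. */
static const uint32_t index_buffer[6] = {
    0, 1, 3,   /* first triangle  */
    1, 2, 3    /* second triangle */
};
```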
III. REQUIREMENTS
The group recognized that the most influential requirement for the project was that the algorithm must be lossless. Many algorithms modify or
truncate the data that is being compressed to attempt to save
space with little loss in meaningful information. This
method works in some situations, such as audio and video
compression, but when the data is vertex information, small
changes may lead to large discrepancies in rendered
appearance. This means that the information that is
compressed cannot be changed in any form for
transmission.
The second requirement for this project was that the algorithm should be able to perform decompression quickly while still maintaining an acceptable level of compression.
Since the decompression sequence is supposed to be
performed in real time, while the compression sequence can
be performed in advance, low decompression time was
favored over low compression time and high compression
rate. Although an ideal compression algorithm would be
one highly efficient in terms of compression ratio,
compression time, and decompression time, a real solution
can only be so good in one area before acting to the detriment of one or more of the others.
IV. SPECIFICATIONS
A. Compression Requirements
Having a compression algorithm that can execute quickly is, however, still valuable. If the compression algorithm that is used happens to be fast, it can be put to use by having the CPU compress assets before they are sent through the graphics pipeline. A situation like this might occur in a game that was not built with these optimizations in mind. If the assets in the project were not compressed when they were built, they could still benefit from the compression/decompression system through on-the-fly compression.
B. Decompression
The compression algorithms that are created must be
made in such a way that they support some amount of
random access capability. The contents of a buffer being sent to a GPU contain many objects, and the GPU may not
want to access these objects in the order that they are
presented. If the compression algorithm is written in such a
way that the blocks it creates must all be decompressed at
the same time or in sequence, then significant overhead will
be incurred when trying to access a chunk that is in the
middle.
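One way to provide this random access capability, sketched below, is to compress the data in independent chunks and keep a small directory of byte offsets so that any single chunk can be located and decompressed on its own. The structure and field names here are hypothetical illustrations, not taken from the group's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Directory entry describing where one independently compressed chunk lives. */
typedef struct {
    uint64_t compressed_offset;  /* byte offset of the chunk in the compressed stream */
    uint32_t compressed_size;    /* size of the chunk after compression               */
    uint32_t original_size;      /* size of the chunk before compression              */
} ChunkEntry;

/* Locate the chunk containing a given uncompressed byte offset, assuming every
 * chunk (except possibly the last) holds chunk_size uncompressed bytes. */
static const ChunkEntry *find_chunk(const ChunkEntry *directory, size_t chunk_count,
                                    size_t chunk_size, uint64_t uncompressed_offset)
{
    size_t index = (size_t)(uncompressed_offset / chunk_size);
    return (index < chunk_count) ? &directory[index] : NULL;
}
```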
V. TESTING ENVIRONMENT
The group recognized that it was going to be difficult to maintain a standardized method of code generation, testing, and data output without taking coordination precautions. To prevent these code fragmentation issues, the group first worked to create a testing environment for their algorithms to run in that provided standardized tools and output procedures, such as standardized file reading, buffer management, data formatting, checksum validation, and time & space metric keeping. The testing environment was written in C and was used when comparing different implementations and optimizations of the group's algorithms with previous attempts. By developing the environment in C, the group hoped to avoid the complications of using a shader program, which would have introduced more complexity than needed to test the algorithms.
VI. INDEX COMPRESSION
A. Delta
Delta Encoding is an encoding and decoding method that, when run on a list of integers, generates a list of deltas: the differences between each value in the list and the previous value. The first value in the list is kept in its original form and is named the anchor point. This list encodes the original list of integers as potentially smaller numbers, since storing the differences will generally result in less space used.
When decoding the list, these deltas are used to calculate the original numbers by adding each delta to the previous item in the list. The decoder runs through the list one item at a time, and the resulting list will be identical to the original.
This process can be seen in Fig 2, which shows an example of Delta Encoding. The encoded data contains the anchor point 5; the delta between that anchor point and the next value, 4, is stored in the encoded list as -1. The decoded list is generated similarly by taking the anchor point and adding it to the next delta, in this case 5 + (-1), which results in the original value of 4.
Due to the nature of how Delta Encoding works, integer data that does not vary much from one value to the next offers the highest potential compression. Because of this, index data is a prime candidate for this encoding: when an object is created, its indices tend to vary little from their neighbors in the buffer.
Fig. 2. Example of Delta Encoding.
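A minimal sketch of this encode/decode pair, matching the description above, is shown below. It operates in place on an array of 32-bit indices and keeps the first element as the anchor point; the function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Replace each element (except the anchor at position 0) with its delta
 * from the previous original value. */
static void delta_encode(int32_t *values, size_t count)
{
    if (count == 0)
        return;
    int32_t previous = values[0];           /* the anchor point is kept as-is */
    for (size_t i = 1; i < count; i++) {
        int32_t current = values[i];
        values[i] = current - previous;     /* store only the difference */
        previous = current;
    }
}

/* Reverse the process: add each delta to the previously reconstructed value,
 * e.g. the encoded pair {5, -1} decodes back to {5, 4} as in Fig 2. */
static void delta_decode(int32_t *values, size_t count)
{
    for (size_t i = 1; i < count; i++)
        values[i] += values[i - 1];
}
```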
B. Run Length
Run length encoding is a simple compression algorithm
that turns consecutive appearances of a single character, a
“run”, into a pairing of the number of times that the
character appears followed by the character being
compressed.
As can be seen in Fig 3, a run of 5 a’s in a row would take
up 5 individual characters when uncompressed. The
compression algorithm will turn this into “5a”, which takes
up a mere 2 characters. The algorithm must also recognize when not to use this technique in situations where doing so would increase the file size. For example, with the last character being encoded, “z”, compressing it into “1z” would double its size, so it is left alone. Decompressing a run length
encoded file is simply the process of re-inflating the
elements from their counted forms by inserting the
specified element the denoted number of times.
Fig. 3. Example of Run Length Encoding.
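The sketch below illustrates the scheme described above on a character string: runs longer than one character become a count followed by the character, while single characters pass through unchanged. For simplicity it assumes the input contains no digit characters, so the decoder can distinguish counts from literals; this is an illustrative format, not the exact byte layout of the group's Delta-RLE implementation.

```c
#include <stdio.h>
#include <string.h>

/* Turn runs of two or more identical characters into "<count><char>"
 * (e.g. "aaaaa" -> "5a") and copy single characters through unchanged,
 * as in Fig 3.  Assumes the input contains no digit characters. */
static void rle_encode(const char *in, char *out)
{
    while (*in) {
        size_t run = 1;
        while (in[run] == in[0])
            run++;
        if (run > 1)
            out += sprintf(out, "%zu%c", run, in[0]);
        else
            *out++ = in[0];
        in += run;
    }
    *out = '\0';
}

/* Re-inflate each "<count><char>" pair; bare characters appear exactly once. */
static void rle_decode(const char *in, char *out)
{
    while (*in) {
        size_t run = 0;
        while (*in >= '0' && *in <= '9')
            run = run * 10 + (size_t)(*in++ - '0');
        if (run == 0)
            run = 1;
        memset(out, *in++, run);
        out += run;
    }
    *out = '\0';
}
```

For example, rle_encode turns "aaaaabbbz" into "5a3bz", and rle_decode restores the original string.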
C. Golomb-Rice
Golomb-Rice coding is an algorithm designed by
Solomon Golomb and iterated upon by Robert Rice. It takes
in an integer and translates it into a binary sequence. It is
based on integer division, with a divisor that is decided
upon before runtime. It works by dividing the integer being
compressed by the chosen divisor and writing the quotient
and remainder as a single sequence.
The quotient from this division is written in unary notation. Unary is essentially a base-1 number system: each integer is written as a single symbol repeated to match the quantity the integer represents. For example, the integer three is written as 111 followed by a space. Since a space cannot be expressed in a binary sequence, it is instead represented by a 0 in our program.
The remainder from the result of the division operation is
simply written in binary. A unary sequence requires a lot
more digits to represent an integer than a binary sequence.
Because of this, choosing a large divisor when using
Golomb-Rice Compression is encouraged.
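The following sketch shows one value being encoded this way, under the assumption that the divisor is a power of two 2^k (the Rice special case), so the remainder fits in exactly k bits. For clarity the output bits are written as '0'/'1' characters rather than packed into bytes, mirroring the convention above of representing the unary terminator as a 0.

```c
#include <stdint.h>

/* Encode one value with Rice coding: the quotient is written in unary
 * (quotient ones followed by a terminating 0) and the remainder in k binary
 * bits, where the divisor is 2^k.  Bits are emitted as '0'/'1' characters
 * into `out` purely for readability. */
static void rice_encode(uint32_t value, unsigned k, char *out)
{
    uint32_t quotient  = value >> k;                /* value / 2^k   */
    uint32_t remainder = value & ((1u << k) - 1u);  /* value mod 2^k */

    while (quotient--)
        *out++ = '1';                               /* unary quotient   */
    *out++ = '0';                                   /* unary terminator */

    for (int bit = (int)k - 1; bit >= 0; bit--)     /* k-bit binary remainder */
        *out++ = ((remainder >> bit) & 1u) ? '1' : '0';
    *out = '\0';
}

/* Example: rice_encode(19, 4, buf) uses a divisor of 16, giving quotient 1
 * and remainder 3, so buf holds "100011". */
```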
VII. VERTEX COMPRESSION
There are numerous research papers that describe
attempts to create effective vertex compression algorithms.
Some of these algorithms work at the time of vertex data creation, when the actual 3D object is built, instead of at the time of data transfer. There are also some algorithms proposed for vertex compression that are lossy, used with the assumption that the programs drawing the 3D objects do not need the precision that 32-bit vertex float data would offer. These, however, were unacceptable for this project, as any changes in the data would result in graphical errors or loss of important data in scientific or engineering simulations. [1]
A. Burtscher-Ratanaworabhan
Burtscher-Ratanaworabhan, also labeled BR encoding, is
a predictive, hash-based vertex compression algorithm. The
hope with this algorithm is to save space by using previously recorded data entries over the course of compressing a file to predict what the next value is going to be, and then only storing the difference between the predicted value and the actual value. It works by sequentially predicting each value in a data file, performing an XOR operation on the actual value and the predicted value to increase data uniformity, and then finally performing leading zero compression on the result of the XOR operation.
The hash tables that are used for value prediction are
called the DFCM and FCM hash tables, which stand for (Differential) Finite Context Method. An FCM uses a two-level prediction table to predict the next value that will appear in a sequence. The first level stores the history of recently viewed values, known as a context, and has an individual history for each location of the program counter of the program it is running in. The second level stores the value that is most likely to follow the current one, using each context as a hash index. After a value is predicted from the table, the table is updated to reflect the real result of the context. DFCM prediction works in a similar fashion; instead of storing each actual value encountered as in a normal FCM, only the difference between consecutive values is stored. [2]
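The core loop of such a predictor can be sketched as follows: a single hash table indexed by a context of recent values supplies a prediction, the prediction is XORed with the actual value, and the number of leading zero bytes in the residual is counted so that only the remaining bytes need to be stored. This is a simplified, single-table illustration of the FCM idea rather than the group's implementation or the full compressor of [2]; the table size and hash function are arbitrary choices.

```c
#include <stdint.h>

#define FCM_TABLE_SIZE 4096u   /* illustrative table size (power of two) */

static uint64_t fcm_table[FCM_TABLE_SIZE];   /* last value seen for each context */
static uint64_t fcm_hash;                    /* rolling hash over recent values  */

/* Predict the next 64-bit value from the table, XOR it with the actual value,
 * update the predictor, and report how many leading zero bytes the residual
 * has.  A good prediction leaves a residual that is mostly zeros, so only the
 * trailing non-zero bytes need to be stored. */
static uint64_t br_residual(uint64_t actual, unsigned *leading_zero_bytes)
{
    uint64_t predicted = fcm_table[fcm_hash];
    uint64_t residual  = predicted ^ actual;

    /* Update the predictor: remember the actual value for this context and
     * fold it into the rolling context hash. */
    fcm_table[fcm_hash] = actual;
    fcm_hash = ((fcm_hash << 6) ^ (actual >> 48)) & (uint64_t)(FCM_TABLE_SIZE - 1u);

    /* Count how many of the most significant bytes of the residual are zero. */
    unsigned zeros = 0;
    while (zeros < 8 && ((residual >> (56 - 8 * zeros)) & 0xFFu) == 0u)
        zeros++;
    *leading_zero_bytes = zeros;
    return residual;
}
```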
B. Lempel-Ziv-Oberhumer
Lempel-Ziv-Oberhumer, or LZO, describes a family of
compression algorithms based on the LZ77 compressor, which also underlies other popular formats such as the DEFLATE compression used in PNG files. LZO algorithms focus on decompression speed while still achieving acceptable levels of compression.
LZO compresses a block of data into “matches” using a sliding window. A small memory allocation stores a “window” ranging in size from 4 to 64 kilobytes; this window holds a section of data that is slid across the input to see whether it matches the current block. When a match is found, it is replaced by a reference to the original block's location. Blocks that do not match the current “window” of data are stored as is, which creates runs of non-matching literals in between those that matched. For this project LZO1-1 was implemented, as this version focuses more on decompression speed than on compression rate.
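As a point of reference, the widely used miniLZO distribution exposes this variant as LZO1X-1; a sketch of how a buffer might be compressed and decompressed through that library is shown below, assuming minilzo.h and its source file are part of the build.

```c
#include "minilzo.h"   /* reference miniLZO implementation of the LZO1X-1 variant */

/* Aligned work memory required by the LZO1X-1 compressor. */
static lzo_align_t wrkmem[(LZO1X_1_MEM_COMPRESS + sizeof(lzo_align_t) - 1)
                          / sizeof(lzo_align_t)];

/* Compress in_len bytes from `in` into `out`; `out` should be at least
 * in_len + in_len / 16 + 64 + 3 bytes to cover incompressible input.
 * Returns the compressed length, or 0 on failure. */
static lzo_uint compress_block(unsigned char *in, lzo_uint in_len,
                               unsigned char *out)
{
    lzo_uint out_len = 0;
    if (lzo_init() != LZO_E_OK)                 /* library sanity check */
        return 0;
    if (lzo1x_1_compress(in, in_len, out, &out_len, wrkmem) != LZO_E_OK)
        return 0;
    return out_len;
}

/* Decompress into `out`; *out_len holds the capacity of `out` on entry and
 * the number of bytes produced on success.  Returns 1 on success, 0 on error. */
static int decompress_block(unsigned char *in, lzo_uint in_len,
                            unsigned char *out, lzo_uint *out_len)
{
    return lzo1x_decompress_safe(in, in_len, out, out_len, NULL) == LZO_E_OK;
}
```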
Fig 5 displays a comparison between BR and LZO compression rates from our tests. Unlike the results of the index compression algorithms, there is a clearly better algorithm here. LZO has a consistently higher compression ratio and almost double the compression rate when compared to BR.
VIII. RESULTS
All tests were done using our testing environment. Each test was run on a computer containing an Intel Core i7-4785T @ 2.20 GHz with 8 GB of RAM, running Windows 8.1 Pro. Each test was run 10 times per file and the results were averaged.
The Delta-RLE integer compression algorithm is able to compress 46.25% of the index buffer on average when run on sequential data. It has an average compression time of 0.83 milliseconds and an average decompression time of 0.76 milliseconds. It is able to compress data at an average of 400 MB/second and decompress data at 250 MB/second.
The Golomb-Rice integer compression algorithm is able to compress 42.01% of the index buffer on average. It has an average compression time of 14 milliseconds and an average decompression time of 14 milliseconds. It is able to compress data at 28 MB/second and decompress data at 16 MB/second on average.
The LZO1-1 float compression algorithm is able to compress 32.58% of the vertex buffer on average. It has an average compression time of 5.1 milliseconds and an average decompression time of 2.9 milliseconds. It is able to compress data at 500 MB/second and decompress data at 600 MB/second.
The BR float compression algorithm is able to compress 14.03% of the vertex buffer on average. It has an average compression time of 9 milliseconds and an average decompression time of 7.6 milliseconds. It is able to compress data at a rate of 200 MB/second and decompress data at 200 MB/second.
Fig 4 displays a comparison between Delta-RLE and Golomb-Rice compression rates from our tests. It is important to note that the compression rates of both algorithms remain relatively comparable throughout the tests. However, due to Golomb-Rice's slow decompression speed, it was deemed the less fit algorithm for our project.
Fig. 4. Comparison between Delta-RLE encoding and Golomb-Rice compression rates.
Fig. 5. Comparison between LZO and BR compression rates.
IX. CONCLUSION
The four algorithms described in this paper were implemented and tested, and the resulting data shows that
for index buffer compression, Golomb-Rice is significantly
less efficient than Delta-RLE. Delta-RLE can compress and
decompress 16 times faster than Golomb-Rice. Using
Golomb-Rice is only advantageous when run on random
data, as it maintains a consistent compression ratio.
Conversely, the Delta-RLE algorithm has far lower
compression results when run on random data.
For vertex buffer compression, LZO was the better-fitting algorithm for this project: it provides 18.55% more compression than BR encoding, compresses at around twice the normalized rate (MB/s) of BR, and decompresses at three times the normalized rate of BR. This shows that LZO and other LZ77-based algorithms are recommended moving forward when attempting to compress index and vertex data for graphics cards.
ACKNOWLEDGEMENT
The authors wish to acknowledge the assistance and
support of Todd Martin and Mangesh Nijasure from Advanced Micro Devices, as well as Dr. Sumanta Pattanaik
and Dr. Mark Heinrich from the University of Central
Florida.
REFERENCES
[1] P. H. Chou and T. H. Meng, "Vertex data compression through vector quantization," IEEE Trans. Vis. Comput. Graphics, vol. 8, no. 4, pp. 373–382, 2002.
[2] M. Burtscher and P. Ratanaworabhan, "FPC: A high-speed compressor for double-precision floating-point data," IEEE Trans. Comput., vol. 58, no. 1, pp. 18–31, Jan. 2009.