Summary of “NET-FLi: On-the-fly Compression, Archiving and Indexing of Streaming Network Traffic”
Fusco et al. introduce a high-performance solution for high-speed archival, indexing and retrieval of
network traffic flow information. This information can be used in a number of different network
applications, such as network forensics, network troubleshooting and behaviour analysis. It may also be
used in solutions that do not work with network data, such as archiving and searching over numerical
data.
The NET-FLi storage solution comprises two logical components: the archiving backend and the
compressed bitmap index. The system captures a stream of flow records, each with 12 attributes.
The flow records are first reordered using an online Locality Sensitive Hashing (oLSH)
algorithm, and the resulting columns are then processed separately. Each column’s
information is compressed using the Lempel-Ziv-Oberhumer (LZO) algorithm. This algorithm allows fast
compression and decompression of the data. They also take the most important columns and index
them using a COMpressed Adaptive indeX (COMPAX). The COMPAX compression scheme first
compresses each bitmap with a small variation of Word Aligned Hybrid (WAH) that encodes only 0-fill
words and dirty (literal) words. These WAH-compressed bitmaps are then compressed further by encoding
sequences of words that form the patterns LFL or FLF. In these patterns, L words are dirty words that
contain a single dirty byte, and F words are 0-fill words short enough to be packed alongside them. Each
such sequence of three words is thus reduced to a single word. The resulting COMPAX compressed bitmap
index is “prepended” with a header that contains pointers to the beginning of each compressed bit
array.
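The first COMPAX pass described above can be illustrated with a minimal sketch. It assumes 31-bit groups, as in classic WAH on 32-bit words, and keeps the output as unpacked `("fill", n)` and `("lit", value)` tuples rather than real bit-packed words. The names `encode`, `decode` and `GROUP_BITS` are hypothetical, and the second pass that merges LFL and FLF sequences is not shown.

```python
from typing import List, Tuple

GROUP_BITS = 31  # assumed payload bits per word, as in classic 32-bit WAH

# ("fill", number of all-zero groups) or ("lit", 31-bit group value)
Word = Tuple[str, int]


def encode(bits: List[int]) -> List[Word]:
    """Encode a bit list into 0-fill and literal (dirty) words."""
    # Pad with zeros to a whole number of 31-bit groups.
    padded = bits + [0] * (-len(bits) % GROUP_BITS)
    words: List[Word] = []
    for start in range(0, len(padded), GROUP_BITS):
        value = 0
        for bit in padded[start:start + GROUP_BITS]:
            value = (value << 1) | bit
        if value == 0 and words and words[-1][0] == "fill":
            # Extend the current run of all-zero groups.
            words[-1] = ("fill", words[-1][1] + 1)
        elif value == 0:
            words.append(("fill", 1))
        else:
            # Any group containing a set bit is kept verbatim.
            words.append(("lit", value))
    return words


def decode(words: List[Word], n_bits: int) -> List[int]:
    """Invert encode(), dropping the zero padding."""
    bits: List[int] = []
    for kind, payload in words:
        if kind == "fill":
            bits.extend([0] * (payload * GROUP_BITS))
        else:
            bits.extend(
                (payload >> shift) & 1
                for shift in range(GROUP_BITS - 1, -1, -1)
            )
    return bits[:n_bits]
```

For a sparse 200-bit bitmap with only bits 5 and 150 set, `encode` yields a literal, a 0-fill, a literal and a final 0-fill. The first three words form the kind of L, F, L sequence that the second COMPAX pass would try to merge.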
Each stream of network flows may be reordered using oLSH. This type of sorting reduces disk
consumption for index and archives alike. The oLSH method uses a hash-based buffer that groups flow
records of similar content by using Locality Sensitive Hashing (LSH) based functions. The basic premise of
LSH is that vectors that are close, according to a distance function, hash to the same bucket with high
probability. The chains of the hash-based buffer are kept ordered using insertion sort.
Whenever a chain reaches the maximum chain length, it is removed from the hash
and its content is flushed to a block.
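The buffering scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the bucket function simply samples a fixed subset of attributes (a basic LSH family for Hamming distance), which may differ from the hash functions used in the paper, and the names `OnlineLSHBuffer`, `sampled_attrs` and `max_chain_len` are hypothetical.

```python
import bisect
from typing import Dict, List, Tuple

Record = Tuple[int, ...]
BucketKey = Tuple[int, ...]


class OnlineLSHBuffer:
    """Hash-based buffer that groups similar records into sorted chains."""

    def __init__(self, sampled_attrs: Tuple[int, ...], max_chain_len: int) -> None:
        self.sampled_attrs = sampled_attrs   # attribute positions used for hashing
        self.max_chain_len = max_chain_len   # chain length that triggers a flush
        self.chains: Dict[BucketKey, List[Record]] = {}
        self.blocks: List[List[Record]] = []  # stands in for blocks written to disk

    def _bucket(self, record: Record) -> BucketKey:
        # Assumed LSH function: records that agree on the sampled
        # attributes collide in the same bucket.
        return tuple(record[i] for i in self.sampled_attrs)

    def insert(self, record: Record) -> None:
        key = self._bucket(record)
        chain = self.chains.setdefault(key, [])
        # Insert in sorted position, i.e. one step of an insertion sort.
        bisect.insort(chain, record)
        if len(chain) >= self.max_chain_len:
            # A full chain leaves the hash and is flushed as one block.
            self.blocks.append(self.chains.pop(key))

    def flush_all(self) -> None:
        """Flush the remaining chains, for example at end of stream."""
        for key in list(self.chains):
            self.blocks.append(self.chains.pop(key))
```

With `sampled_attrs=(0,)` and `max_chain_len=2`, two records sharing the same first attribute end up adjacent and sorted in the same block, even if an unrelated record arrived between them.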
Fusco et al. evaluate the performance of their solution with two datasets. One dataset consists of traces
of access traffic from a large hosting environment (HE) for six days. The other consists of internal and
external traces for a two-month period of an average-sized enterprise production network (PN). They
measure the disk consumption of their methodology for both the archive and the COMPAX encoded
bitmap index. The space required to store flow files compressed with LZO is greater than that of the
same flow files compressed with gzip. However, when they add oLSH to their solution, they reduce disk
consumption by up to 40% for the PN dataset, and up to 30% for the HE dataset.
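The effect of reordering on a column-wise archive can be illustrated with a small experiment on synthetic data. The sketch makes several assumptions: `zlib` from the Python standard library stands in for LZO, a full sort stands in for the approximate grouping that oLSH performs online, and the three-attribute records are invented for the example. It shows only the direction of the effect and does not reproduce the figures reported in the paper.

```python
import random
import zlib
from typing import List, Tuple

Record = Tuple[int, int, int]


def compressed_column_size(records: List[Record]) -> int:
    """Compress each column separately and return the total size in bytes."""
    total = 0
    for col in range(len(records[0])):
        # Serialise one column as fixed-width 4-byte integers.
        raw = b"".join(r[col].to_bytes(4, "big") for r in records)
        total += len(zlib.compress(raw))  # zlib is a stand-in for LZO
    return total


# Synthetic flows: (source host, destination host, destination port).
rng = random.Random(0)
hosts = [rng.getrandbits(32) for _ in range(20)]
records = [
    (rng.choice(hosts), rng.choice(hosts), rng.choice([80, 443, 53]))
    for _ in range(5000)
]

arrival_size = compressed_column_size(records)          # arrival order
grouped_size = compressed_column_size(sorted(records))  # similar records adjacent
```

Grouping similar records produces long runs of repeated values within each column, which a general-purpose compressor can exploit, so `grouped_size` comes out smaller than `arrival_size` on this data.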
For the index size, they compare the disk space savings on the bitmap index created when using
COMPAX as opposed to WAH encoding. COMPAX-compressed bitmap indices are 30% smaller for the PN
dataset and 40% smaller for the HE dataset than their WAH-compressed equivalents. Furthermore,
when oLSH is enabled for COMPAX, the disk consumption of the index is further reduced by up to 60%,
making it smaller still than its WAH counterpart.
Finally, because NET-FLi was designed to archive and build the index for high-speed streams of flow
records, Fusco et al. evaluate the average sustainable insertion rate in flows per second (f/s). They find
that the processing rates of their system when building the archive and index are highest when using
COMPAX, followed by WAH and lastly COMPAX with oLSH. However, when evaluating the time it takes
to query the indices, they find that COMPAX with oLSH is faster than COMPAX alone, which is in turn
faster than WAH. More specifically, the reduction in total query time when using
COMPAX with oLSH comes not from the index search time but from the archive search time of the
query.