Summary of “NET-FLi: On-the-fly Compression, Archiving and Indexing of Streaming Network Traffic”

Fusco et al. introduce a high-performance solution for archiving, indexing and retrieving network traffic flow information at high speed. This information supports a number of network applications, such as network forensics, network troubleshooting and behaviour analysis. The approach may also be applied outside networking, for example to archive and search over other numerical data.

The NET-FLi storage solution comprises two logical components: the archiving backend and the compressed bitmap index. Fusco et al. capture flow records with 12 attributes each. The flow records are first reordered using an online Locality Sensitive Hashing (oLSH) algorithm, after which each column of the flow records is processed separately. Each column is compressed using the Lempel-Ziv-Oberhumer (LZO) algorithm, which allows fast compression and decompression. The most important columns are also indexed using a COMPressed Adaptive indeX (COMPAX).

COMPAX first compresses each bitmap with a small variation of Word-Aligned Hybrid (WAH) encoding that encodes only 0-fill words and dirty words. The WAH-compressed bitmaps are then compressed further by encoding three-word sequences that match the patterns LFL or FLF, where L words are dirty words that contain a single dirty byte and F words are 0-fill words whose run length fits in a single byte. Each such sequence of three words is thus reduced to a single word. The resulting COMPAX-compressed bitmap index is prepended with a header that contains pointers to the beginning of each compressed bit array.

Reordering each stream of network flows with oLSH is optional. This type of sorting reduces disk consumption for the index and the archive alike.
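The two-stage COMPAX idea described above can be illustrated with a small symbolic sketch. It is not the paper's bit-level word layout: the token representation, the function names (wah_tokens, merge_patterns) and the byte test are assumptions made for illustration, and the merged patterns are kept as tuples rather than packed into a single 32-bit word.

```python
from typing import List, Tuple

GROUP_BITS = 31  # payload bits per 32-bit word, as in WAH


def wah_tokens(bits: List[int]) -> List[Tuple[str, int]]:
    """Split a bitmap into 31-bit groups and emit ("F", run) or ("L", value).

    Only runs of all-zero groups become fill tokens; every other group,
    including an all-ones group, is kept as a dirty (literal) token.
    """
    tokens: List[Tuple[str, int]] = []
    for start in range(0, len(bits), GROUP_BITS):
        group = bits[start:start + GROUP_BITS]
        group = group + [0] * (GROUP_BITS - len(group))  # zero-pad last group
        value = int("".join(map(str, group)), 2)
        if value == 0:
            if tokens and tokens[-1][0] == "F":
                tokens[-1] = ("F", tokens[-1][1] + 1)  # extend the 0-fill run
            else:
                tokens.append(("F", 1))
        else:
            tokens.append(("L", value))
    return tokens


def _mergeable(token: Tuple[str, int]) -> bool:
    """L needs exactly one non-zero byte; F needs a run length below 256."""
    kind, value = token
    if kind == "F":
        return value < 256
    dirty_bytes = sum(1 for i in range(4) if (value >> (8 * i)) & 0xFF)
    return dirty_bytes == 1


def merge_patterns(tokens: List[Tuple[str, int]]) -> list:
    """Collapse qualifying L-F-L and F-L-F triples into one symbolic token."""
    out: list = []
    i = 0
    while i < len(tokens):
        window = tokens[i:i + 3]
        kinds = "".join(kind for kind, _ in window)
        if kinds in ("LFL", "FLF") and all(_mergeable(t) for t in window):
            out.append((kinds, tuple(value for _, value in window)))
            i += 3
        else:
            out.append(tokens[i])
            i += 1
    return out
```

For a bitmap with one set bit, two empty groups, and another set bit, wah_tokens yields three tokens (L, F, L) and merge_patterns collapses them into a single LFL token, mirroring the three-words-to-one reduction.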
The oLSH method uses a hash-based buffer that groups flow records of similar content by means of Locality Sensitive Hashing (LSH) functions. The basic premise of LSH is that vectors that are close, according to a distance function, collide in the same bucket with high probability. The chains of the hash-based buffer are kept ordered using an InsertionSort algorithm. Whenever a chain reaches the maximum chain length, it is removed from the hash and its content is flushed to a block.

Fusco et al. evaluate the performance of their solution with two datasets. One dataset consists of six days of access traffic traces from a large hosting environment (HE). The other consists of two months of internal and external traces from an average-sized enterprise production network (PN).

They measure the disk consumption of their methodology for both the archive and the COMPAX-encoded bitmap index. The space required to store flow files compressed with LZO is greater than that of the same flow files compressed with gzip. However, when they add oLSH to their solution, they reduce disk consumption by up to 40% for the PN dataset and by up to 30% for the HE dataset. For the index size, they compare the disk space of the bitmap index under COMPAX encoding with that under WAH encoding. COMPAX-compressed bitmap indices are 30% smaller for the PN dataset and 40% smaller for the HE dataset than their WAH-compressed equivalents. Furthermore, when oLSH is enabled for COMPAX, the disk consumption of the index is reduced by up to a further 60%, making it smaller still relative to its WAH counterpart.

Finally, because NET-FLi was designed to archive and index high-speed streams of flow records, Fusco et al. evaluate the average sustainable insertion rate in flows per second (f/s). They find that the processing rate of their system when building the archive and index is highest when using COMPAX, followed by WAH and lastly COMPAX with oLSH.
However, when evaluating the time it takes to query the indices, they find that COMPAX with oLSH is faster than plain COMPAX, which in turn is faster than WAH. More specifically, the reduction in total query time with oLSH comes not from the index search time but from the archive search time of the query.
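The oLSH buffering scheme summarised earlier (hash buckets whose chains are kept sorted and flushed to a block once they reach a maximum length) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name OnlineLSHBuffer, its parameters, and the choice of hash function are assumptions. The hash simply projects each record onto a fixed subset of attribute positions, a sampling-style LSH stand-in under which records that agree on most attributes tend to collide; in practice the positions would be chosen at random.

```python
import bisect
from typing import Dict, List, Sequence, Tuple

Record = Tuple[int, ...]  # one flow record as a tuple of attribute values


class OnlineLSHBuffer:
    """Hypothetical sketch of a hash-based reordering buffer."""

    def __init__(self, positions: Sequence[int], max_chain_len: int) -> None:
        self.positions = tuple(positions)  # attribute indices used by the hash
        self.max_chain_len = max_chain_len
        self.chains: Dict[Tuple[int, ...], List[Record]] = {}
        self.blocks: List[List[Record]] = []  # flushed, locally sorted blocks

    def _bucket(self, record: Record) -> Tuple[int, ...]:
        # Records that agree on the sampled attributes share a bucket.
        return tuple(record[p] for p in self.positions)

    def insert(self, record: Record) -> None:
        key = self._bucket(record)
        chain = self.chains.setdefault(key, [])
        bisect.insort(chain, record)  # one insertion-sort step keeps order
        if len(chain) >= self.max_chain_len:
            # Chain is full: remove it from the hash and flush it as a block.
            self.blocks.append(self.chains.pop(key))

    def flush_all(self) -> None:
        # Drain the remaining chains, e.g. at the end of the stream.
        for key in list(self.chains):
            self.blocks.append(self.chains.pop(key))


buf = OnlineLSHBuffer(positions=(0, 1), max_chain_len=2)
for rec in [(10, 80, 7), (99, 22, 1), (10, 80, 3)]:
    buf.insert(rec)
```

After the third insert, the two records that share their first two attributes sit next to each other in one flushed block, which is the kind of locality that helps both the column compressor and the run-length-based bitmap index.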