DS Project2
A High Level Description of the Design of JWCompress
高嘉蔚 Gao Jiawei 09302010076

1 Introduction
JWCompress (you can pronounce it "Jiawei Compress") is a Java application for compressing and decompressing files. Features:
- Compression and decompression of plain text files.
- Compression and decompression of arbitrary binary files.
- Compression and decompression of folders, including nested folders.

2 Compression and Decompression
At its core, JWCompress compresses a file in two stages: first, it builds a Huffman tree for the file; then it encodes the file with the resulting Huffman codes.

2.1 Building the Huffman Tree
First, read the whole input file and count how many times each byte value appears. Then build the Huffman tree from these occurrence counts as follows:
1) Create a node for every byte that appears, storing its occurrence count.
2) Insert every node into a minimum priority queue keyed on occurrence count.
3) While there is more than one node in the queue, choose the two nodes x1 and x2 with the smallest occurrence counts.
4) Remove x1 and x2 from the queue.
5) Add a new node x3 to the queue, with x1 and x2 as its children. The occurrence count of x3 is the sum of the counts of x1 and x2.
6) If there is still more than one node in the queue, go to step 3.
7) Once the tree is built, the Huffman code of a leaf node is the path from the root to that leaf, written as 0 for each step to a left child and 1 for each step to a right child.
Note that if there is initially only one node in the tree, its Huffman code is "0" (or "1") rather than "" (the empty string).

2.2 Storage of the Huffman Code
The decompressor needs the Huffman code to reconstruct the data, so the code must be stored in the compressed file. There are four ways to store it:
1) Store the Huffman codes as characters.
2) Store the Huffman codes in binary.
3) Store the Huffman tree itself, by storing the left and right children of every node.
4) Store the occurrence count of every byte.
Methods 1 and 2 consume a lot of space and are difficult to program. Between methods 3 and 4, I chose method 4, the simplest of all: from the same counts, the decompressor rebuilds the same tree by running the same tree-building procedure, provided both sides break ties between equal counts in the same way.

2.3 Storage of Data
How are the encoded bits written to the file?
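The byte counting and tree construction of Section 2.1, the code assignment of step 7, and the lowest-bit-first packing that this section goes on to describe can be sketched together in Java. This is a minimal, hypothetical sketch, not the JWCompress source: the class HuffmanSketch and all of its method names are invented for illustration, and it uses java.util.PriorityQueue in place of the project's own Queue.java. Ties between equal counts are broken in whatever order the PriorityQueue produces, so a real compressor and decompressor would need to agree on one deterministic tie-breaking rule for method 4 of Section 2.2 to work.

```java
import java.nio.charset.StandardCharsets;
import java.util.PriorityQueue;

// Hypothetical sketch of the encoder steps (Sections 2.1 to 2.3); not the actual JWCompress code.
public class HuffmanSketch {

    // Tree node: leaves carry a byte value 0..255, internal nodes use -1.
    public static final class Node {
        final int value;
        final long count;
        final Node left, right;
        Node(int value, long count, Node left, Node right) {
            this.value = value; this.count = count; this.left = left; this.right = right;
        }
        boolean isLeaf() { return left == null && right == null; }
    }

    // Count occurrences; b & 0xFF maps Java's signed bytes (-128..127) to indices 0..255.
    public static int[] countBytes(byte[] data) {
        int[] freq = new int[256];
        for (byte b : data) freq[b & 0xFF]++;
        return freq;
    }

    // Steps 1-6: repeatedly merge the two least frequent nodes. Returns null for empty input.
    public static Node buildTree(int[] freq) {
        PriorityQueue<Node> queue = new PriorityQueue<>((a, b) -> Long.compare(a.count, b.count));
        for (int v = 0; v < 256; v++) {
            if (freq[v] > 0) queue.add(new Node(v, freq[v], null, null));
        }
        while (queue.size() > 1) {
            Node x1 = queue.poll();
            Node x2 = queue.poll();
            queue.add(new Node(-1, x1.count + x2.count, x1, x2)); // x3 = parent of x1 and x2
        }
        return queue.poll();
    }

    // Step 7: 0 for a left edge, 1 for a right edge; a lone leaf gets "0" instead of "".
    public static String[] assignCodes(Node root) {
        String[] codes = new String[256];
        if (root == null) return codes;
        if (root.isLeaf()) { codes[root.value] = "0"; return codes; }
        fill(root, "", codes);
        return codes;
    }

    private static void fill(Node node, String path, String[] codes) {
        if (node.isLeaf()) { codes[node.value] = path; return; }
        fill(node.left, path + "0", codes);
        fill(node.right, path + "1", codes);
    }

    // Section 2.3: fill each byte from its lowest bit to its highest; the last byte may be partial.
    public static byte[] packBitsLsbFirst(String bits) {
        byte[] out = new byte[(bits.length() + 7) / 8];
        for (int i = 0; i < bits.length(); i++) {
            if (bits.charAt(i) == '1') out[i / 8] |= (byte) (1 << (i % 8));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = "abracadabra".getBytes(StandardCharsets.US_ASCII);
        String[] codes = assignCodes(buildTree(countBytes(data)));
        StringBuilder bits = new StringBuilder();
        for (byte b : data) bits.append(codes[b & 0xFF]);
        System.out.println(bits.length() + " bits -> " + packBitsLsbFirst(bits.toString()).length + " bytes");
        int packed = packBitsLsbFirst("10001101")[0] & 0xFF;
        System.out.println(Integer.toBinaryString(packed)); // prints 10110001
    }
}
```

Because the packing is lowest-bit-first, the sequence "10001101" ends up as the byte 10110001, which is the reversed order the text describes. The sketch also keeps the single-leaf rule from step 7, so a file containing only one distinct byte still gets a one-bit code.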
I use the WriteByte() method, with a variable named "buffer" (of type byte) that accumulates bits. Whenever the buffer holds 8 bits, it is written to the output file.
Note: I fill the buffer from the lowest bit to the highest. That is, if an encoded binary sequence is "10001101", it is stored in the buffer as the binary number 10110001. Because of this "little-endian" bit order, the bits of each byte as seen in the binary file appear in the reverse order of the encoded sequence.
To know where the encoded sequence ends, there are several possible approaches, two of which are as follows:
Method 1: Add a "dummy" node representing "end of file" and give it an occurrence count of 0, so that it has the least impact on the size of the compressed file. When decoding, stop as soon as the Huffman code of the dummy node is read.
Method 2: Store the length of the binary code before the binary code itself. When decoding, read the length first, then decode exactly that many bits.
I use method 2.

2.4 Reading Data
Reading the stored occurrence counts and rebuilding the tree work the same way as the tree-building method in the Compress class. Reading data mirrors writing data and readily recovers the binary code of the file, so the details are omitted here.

2.5 Decoding Data
Given a binary sequence, start at the root of the Huffman tree.
1) For each bit si: if si = 0, move to the left child; otherwise, move to the right child.
2) Repeat step 1 until a leaf node is reached.
3) Output the byte that this leaf represents, return to the root, and go back to step 1.

3 Recursive Folders
Compressing and decompressing nested folders are additional features of the program.

3.1 Compression
To compress a file recursively:
- If it is a folder: write its name and the number of children it has, then compress each child recursively.
- Otherwise, it is a regular file: write its name and -1, then compress it with the single-file compression method.
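The recursion of Section 3.1 can be sketched in Java as below. This is a hypothetical illustration, not the RecCompress source: the class RecCompressSketch, the choice of DataOutputStream.writeUTF for names and writeInt for the child count, and the stand-in compressSingleFile (which writes the length and raw bytes instead of running the Huffman coder) are all assumptions made to keep the example self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;

// Hypothetical sketch of recursive folder compression (Section 3.1); not the RecCompress source.
public class RecCompressSketch {

    // Assumed record layout: name (writeUTF), then child count or -1, then the payload.
    public static void compress(File f, DataOutputStream out) throws IOException {
        out.writeUTF(f.getName());
        if (f.isDirectory()) {
            File[] children = f.listFiles();
            if (children == null) throw new IOException("cannot list " + f);
            Arrays.sort(children); // listFiles() order is unspecified; sort for a stable layout
            out.writeInt(children.length);
            for (File child : children) compress(child, out); // recurse into every child
        } else {
            out.writeInt(-1); // -1 marks a regular file rather than a folder
            compressSingleFile(f, out);
        }
    }

    // Stand-in for the single-file Huffman compressor: writes the length, then the raw bytes.
    static void compressSingleFile(File f, DataOutputStream out) throws IOException {
        byte[] data = Files.readAllBytes(f.toPath());
        out.writeInt(data.length);
        out.write(data);
    }

    public static void main(String[] args) throws IOException {
        File root = Files.createTempDirectory("jw").toFile();
        File sub = new File(root, "sub");
        if (!sub.mkdir()) throw new IOException("mkdir failed");
        Files.write(new File(sub, "a.txt").toPath(), "hello".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            compress(root, out);
        }
        System.out.println("archive size: " + buf.size() + " bytes");
    }
}
```

Sorting the children is an extra assumption: File.listFiles() guarantees no particular order, and a fixed order makes the archive layout reproducible. The decompressor of Section 3.2 only has to read the same fields back in the same order.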
3.2 Decompression
To decompress a file recursively, first read its name and its number of children:
- If the number of children is -1, it is a regular file, not a folder: decompress it with the single-file decompression method.
- Otherwise, it is a folder: decompress each of its children recursively.

4 UI Design
The window has a menu bar, a text area, and a single-line text field. The menu bar provides commands for opening files, compressing and decompressing, and help. The text area logs everything that happens during compression or decompression. The single-line text field at the bottom of the window shows how many bytes have been compressed or decompressed so far.

5 Additions
Some tricky problems:
- I count how often each byte appears by storing the counts in an array indexed 0 to 255. However, Java has no unsigned byte type: a byte ranges from -128 to 127, so it cannot be used directly as an index. Therefore, except in I/O statements, I convert every byte to an int in the range 0 to 255 inside my functions.
- During decompression, even if the Huffman tree is empty, the "datasize" field that follows it must still be read.

Appendix: Structure of JWCompress
Core classes:
- Compress.java: the class for compression.
- Decompress.java: the class for decompression.
- RecCompress.java (extends Compress.java): the class for recursive compression.
- RecDecompress.java (extends Decompress.java): the class for recursive decompression.
Utility classes:
- Queue.java: the priority queue.
- Node.java: a node of the Huffman tree.
UI classes:
- MainUI.java