DS Project 2
A High-Level Description of the Design of JWCompress
高嘉蔚 Gao Jiawei 09302010076
1 Introduction
JWCompress (you can pronounce it “Jiawei Compress”) is a Java application for compressing and decompressing files.
Features:
• Compression and decompression of plain text files.
• Compression and decompression of arbitrary binary files.
• Compression and decompression of folders, including nested subfolders.
2 Compression and Decompression
The core of the compression function in JWCompress is: first, build a Huffman tree for the file; then, use the Huffman codes to encode the file.
2.1 Building the Huffman Tree
First, read the whole input file and count how many times each byte value appears.
Then build the Huffman tree from these occurrence counts, following these steps:
1) Construct a node for every byte value that occurs, storing its occurrence count.
2) Store every node in a minimum priority queue.
3) While there is more than one node in the queue, choose the two nodes x1, x2 with the smallest occurrence counts.
4) Delete x1 and x2 from the queue.
5) Add a new node x3, whose children are x1 and x2, to the queue. The occurrence count of x3 is the sum of the occurrence counts of x1 and x2.
6) If there is still more than one node in the queue, go to step 3.
7) After building the tree, define the Huffman code of a leaf node as the path from the tree root to the node, writing 0 for each step to a left child and 1 for each step to a right child.
Note that if initially there is only one node in the tree, its Huffman code is “0” (or “1”) instead of “” (the empty string); otherwise a file consisting of a single repeated byte value would be encoded as zero bits.
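The steps above can be sketched in Java as follows. This is a minimal illustration, not the code of JWCompress: it uses java.util.PriorityQueue in place of the author's own Queue.java, and the names HNode, HuffmanSketch and the seq tie-breaker are hypothetical. It counts byte occurrences, builds the tree only from bytes that occur, assigns 0 to left branches and 1 to right branches, and maps the single-node case to “0”.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Hypothetical Huffman tree node: leaves hold a byte value 0..255, internal nodes hold -1. */
final class HNode {
    final int symbol;
    final long weight;   // occurrence count
    final int seq;       // tie-breaker so equal weights are always ordered the same way
    final HNode left, right;

    HNode(int symbol, long weight, int seq, HNode left, HNode right) {
        this.symbol = symbol;
        this.weight = weight;
        this.seq = seq;
        this.left = left;
        this.right = right;
    }

    boolean isLeaf() { return left == null && right == null; }
}

final class HuffmanSketch {
    /** Counts how many times each byte value occurs; & 0xFF maps Java's signed byte to 0..255. */
    static long[] countBytes(byte[] data) {
        long[] counts = new long[256];
        for (byte b : data) counts[b & 0xFF]++;
        return counts;
    }

    /** Steps 1-6: returns the root, or null if no byte occurs (empty file). */
    static HNode buildTree(long[] counts) {
        PriorityQueue<HNode> queue = new PriorityQueue<>(
                Comparator.comparingLong((HNode n) -> n.weight).thenComparingInt(n -> n.seq));
        for (int b = 0; b < 256; b++) {
            if (counts[b] > 0) queue.add(new HNode(b, counts[b], b, null, null));
        }
        int nextSeq = 256;
        while (queue.size() > 1) {
            HNode x1 = queue.poll();   // least occurrence
            HNode x2 = queue.poll();   // second least occurrence
            queue.add(new HNode(-1, x1.weight + x2.weight, nextSeq++, x1, x2));
        }
        return queue.poll();
    }

    /** Step 7: code = path from the root (0 = left, 1 = right); a lone leaf gets "0". */
    static String[] buildCodes(HNode root) {
        String[] codes = new String[256];
        if (root == null) return codes;
        if (root.isLeaf()) {
            codes[root.symbol] = "0";
        } else {
            assign(root, "", codes);
        }
        return codes;
    }

    private static void assign(HNode node, String path, String[] codes) {
        if (node.isLeaf()) {
            codes[node.symbol] = path;
            return;
        }
        assign(node.left, path + "0", codes);
        assign(node.right, path + "1", codes);
    }
}
```

For the input “aaaabbc” (a: 4, b: 2, c: 1), this sketch yields a = “1”, b = “01”, c = “00”. Because the tree has to be rebuilt identically during decompression, ties between equal counts must be broken deterministically, which the seq field does here.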
2.2 Storage of Huffman Code
We need to store the Huffman code so that it can be reconstructed when the file is decompressed.
There are four ways to store the Huffman code:
1) Store the Huffman codes as characters;
2) Store the Huffman codes in binary;
3) Store the Huffman tree by storing the left and right children of every node;
4) Store the statistics of the occurrence of every byte.
Methods 1 and 2 take more space and are harder to program. Between methods 3 and 4, I chose method 4, the simplest of all: the decompressor rebuilds the Huffman tree from the stored occurrence counts with the same tree-building procedure.
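A minimal sketch of method 4, assuming a compact header layout that this document does not specify: the number of distinct byte values as an int, followed by one (byte value, count) pair for each byte that occurs. The class name FreqTable is hypothetical. Since the decompressor runs the same tree-building procedure on the same counts, it arrives at the same tree, provided ties are broken the same way on both sides.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical header for method 4: only the byte values that actually occur are stored. */
final class FreqTable {
    static void write(DataOutputStream out, long[] counts) throws IOException {
        int distinct = 0;
        for (long c : counts) if (c > 0) distinct++;
        out.writeInt(distinct);                 // number of (byte, count) pairs that follow
        for (int b = 0; b < 256; b++) {
            if (counts[b] > 0) {
                out.writeByte(b);               // byte value, stored in one byte
                out.writeLong(counts[b]);       // its occurrence count
            }
        }
    }

    static long[] read(DataInputStream in) throws IOException {
        long[] counts = new long[256];
        int distinct = in.readInt();
        for (int i = 0; i < distinct; i++) {
            int b = in.readUnsignedByte();      // 0..255, avoids Java's signed byte
            counts[b] = in.readLong();
        }
        return counts;
    }
}
```

Storing only the occurring bytes keeps the header small for text files; a fixed table of 256 counts would be even simpler to read but always costs the full size.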
2.3 Storage of data
How do we write bits into a file? Java writes whole bytes, so I use the writeByte() method together with a “buffer” variable (of type byte) that collects bits. When the “buffer” holds 8 bits, it is written to the output file.
Note: I store bits in the buffer from the lowest bit to the highest bit. That is, if an encoded binary sequence is “10001101”, it is stored in the buffer as the binary number (10110001)₂. Because of this “little-endian” bit order, the bytes that appear in the binary file do not read as the encoded bit sequence itself.
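The lowest-bit-first order can be sketched as follows. This is an illustrative sketch, not the author's code: the class name BitWriter is hypothetical, it collects bytes in a ByteArrayOutputStream rather than writing to a file, and it keeps the pending bits in an int instead of a byte to sidestep sign problems. The i-th bit of the encoded sequence goes to bit position i of the current byte.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical bit buffer: fills each byte from its lowest bit (position 0) upward. */
final class BitWriter {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private int buffer = 0;   // pending bits (an int here, to sidestep Java's signed byte)
    private int count = 0;    // number of bits currently in the buffer

    void writeBit(int bit) {
        buffer |= (bit & 1) << count;   // the i-th bit of the sequence goes to position i
        if (++count == 8) flushByte();  // 8 bits collected: emit one byte
    }

    void writeBits(String bits) {
        for (char c : bits.toCharArray()) writeBit(c == '1' ? 1 : 0);
    }

    private void flushByte() {
        out.write(buffer);              // writes the low 8 bits of buffer
        buffer = 0;
        count = 0;
    }

    /** Emits a last partial byte (its unused high bits are 0) and returns all bytes. */
    byte[] finish() {
        if (count > 0) flushByte();
        return out.toByteArray();
    }
}
```

Writing “10001101” produces the single byte (10110001)₂ = 0xB1, matching the example in the text.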
The last byte is usually only partly filled, so the decoder must know where the encoded sequence ends; otherwise it would decode the padding bits as data. There are several possible ways to mark the end, two of which are as follows:
Method 1: Add a “dummy” node representing “end of file”. Assign the dummy node an occurrence count of 0, so that it has the least impact on the size of the compressed file. During decoding, stop as soon as the Huffman code of the “dummy” node is read.
Method 2: Store the length of the binary code (in bits) before the binary code itself. During decoding, read the length first, then decode exactly that many bits.
I use method 2.
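A minimal sketch of method 2, assuming (the document does not say) that the length is written as an 8-byte big-endian long in front of the packed bits; LengthPrefixed is a hypothetical name. The bits are packed lowest bit first, as described in section 2.3, and the unused high bits of the last byte are left as 0.

```java
import java.nio.ByteBuffer;

/** Hypothetical layout: 8-byte big-endian bit count, then the bits packed lowest bit first. */
final class LengthPrefixed {
    static byte[] encode(String bits) {
        int n = bits.length();
        ByteBuffer buf = ByteBuffer.allocate(8 + (n + 7) / 8);   // header + ceil(n / 8) bytes
        buf.putLong(n);                       // method 2: store the length before the code
        byte[] out = buf.array();
        for (int i = 0; i < n; i++) {
            if (bits.charAt(i) == '1') {
                out[8 + i / 8] |= (byte) (1 << (i % 8));   // bit i -> byte i/8, position i%8
            }
        }
        return out;                           // padding bits in the last byte stay 0
    }
}
```

Encoding “101” gives 9 bytes: the header holding 3, then the byte 5. Without the header, the five zero padding bits in that last byte could not be told apart from real data.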
2.4 Reading data
Reading the stored occurrence counts and rebuilding the tree work the same way as the tree-building method in the Compress class. Reading data mirrors writing data: read each byte and extract its bits from the lowest to the highest, which yields the binary code of the file. The details are therefore omitted here.
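For illustration only, reading the data back can look like this, assuming the bit count is stored first (here as an 8-byte long, an assumption) and followed by bits packed lowest bit first. BitReader is a hypothetical name; it collects bits into a String for clarity, whereas a real decoder would feed each bit straight into the tree walk of section 2.5.

```java
import java.io.DataInputStream;
import java.io.IOException;

/** Hypothetical reader for a bit count (8-byte long) followed by bits packed lowest bit first. */
final class BitReader {
    static String readBits(DataInputStream in) throws IOException {
        long n = in.readLong();                 // the stored length, in bits
        StringBuilder bits = new StringBuilder();
        int current = 0;
        for (long i = 0; i < n; i++) {
            if (i % 8 == 0) current = in.readUnsignedByte();   // next byte, as 0..255
            bits.append((current >> (i % 8)) & 1);             // bit i%8, lowest first
        }
        return bits.toString();                 // padding bits beyond n are ignored
    }
}
```

The reader consumes exactly ceil(n / 8) data bytes, so whatever follows in the stream is left untouched.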
2.5 Decoding data
Given the binary sequence s1 s2 … sn, where n is the stored length, start at the root of the Huffman tree.
1) For every bit si, if si = 0, go to the left subtree; otherwise, go to the right subtree.
2) Continue step 1 until reaching a leaf node.
3) Output the byte this leaf represents, return to the root, and begin step 1 again with the next bit.
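The three steps can be sketched as below. The nested Node class and the Decoder name are hypothetical (the appendix lists a Node.java, but its fields are not given), and the input is a string of '0'/'1' characters for readability. A tree whose root is a leaf is handled separately, since every stored bit then stands for the same byte, consistent with the note in section 2.1.

```java
import java.io.ByteArrayOutputStream;

final class Decoder {
    /** Hypothetical tree node: symbol is 0..255 for leaves, -1 for internal nodes. */
    static final class Node {
        final int symbol;
        final Node left, right;

        Node(int symbol) { this(symbol, null, null); }
        Node(Node left, Node right) { this(-1, left, right); }
        private Node(int symbol, Node left, Node right) {
            this.symbol = symbol;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() { return left == null && right == null; }
    }

    static byte[] decode(Node root, String bits) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (root.isLeaf()) {                       // single-node tree: each bit is one byte
            for (int i = 0; i < bits.length(); i++) out.write(root.symbol);
            return out.toByteArray();
        }
        Node node = root;
        for (int i = 0; i < bits.length(); i++) {
            node = bits.charAt(i) == '0' ? node.left : node.right;   // step 1
            if (node.isLeaf()) {                                     // step 2
                out.write(node.symbol);                              // step 3: output the byte
                node = root;                                         // and restart at the root
            }
        }
        return out.toByteArray();
    }
}
```

With the codes a = “1”, b = “01”, c = “00”, the bit string “10100” decodes to “abc”.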
3 Recursive Folders
Compressing and decompressing folders recursively is an additional feature of the program.
3.1 Compression
Recursively compress a file:
    If the file is a folder:
        Write its name and the number of children it has.
        Recursively compress each of its children.
    Otherwise (it is a regular file):
        Write its name and -1.
        Compress it with the single-file compression method.
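A minimal sketch of this recursion, with loudly stated assumptions: Entry is a hypothetical in-memory stand-in for java.io.File, names are written with DataOutputStream.writeUTF, counts with writeInt, and compressSingleFile is a hypothetical placeholder that stores the raw bytes where JWCompress would run its Huffman compressor.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.List;

/** Hypothetical in-memory stand-in for java.io.File: children == null means a regular file. */
final class Entry {
    final String name;
    final List<Entry> children;   // null for a file
    final byte[] data;            // null for a folder

    Entry(String name, List<Entry> children, byte[] data) {
        this.name = name;
        this.children = children;
        this.data = data;
    }

    boolean isFolder() { return children != null; }
}

final class RecCompressSketch {
    static void compress(Entry e, DataOutputStream out) throws IOException {
        out.writeUTF(e.name);                    // its name
        if (e.isFolder()) {
            out.writeInt(e.children.size());     // how many children it has
            for (Entry child : e.children) compress(child, out);
        } else {
            out.writeInt(-1);                    // -1 marks a regular file
            compressSingleFile(e.data, out);
        }
    }

    /** Placeholder for the single-file Huffman compressor: here just length + raw bytes. */
    static void compressSingleFile(byte[] data, DataOutputStream out) throws IOException {
        out.writeInt(data.length);
        out.write(data);
    }
}
```

A real implementation would walk the folder with java.io.File.listFiles() (or java.nio.file.Files.list) instead of the in-memory Entry tree.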
3.2 Decompression
Recursively decompress a file:
    Read the filename and its number of children.
    If the number of children is -1, it is a file, not a folder:
        Decompress it with the single-file decompression method.
    Otherwise, it is a folder:
        Create the folder, then recursively decompress each of its children.
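The matching reader can be sketched as follows, assuming the same hypothetical layout (writeUTF name, writeInt child count or -1). DecodedEntry and RecDecompressSketch are hypothetical names, and decompressSingleFile is a placeholder that reads raw bytes where JWCompress would run its Huffman decompressor; the sketch builds an in-memory tree instead of writing to disk.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory result: children == null means a regular file. */
final class DecodedEntry {
    final String name;
    final List<DecodedEntry> children;
    final byte[] data;

    DecodedEntry(String name, List<DecodedEntry> children, byte[] data) {
        this.name = name;
        this.children = children;
        this.data = data;
    }
}

final class RecDecompressSketch {
    static DecodedEntry decompress(DataInputStream in) throws IOException {
        String name = in.readUTF();              // the filename
        int childCount = in.readInt();           // its number of children, or -1
        if (childCount == -1) {                  // a file, not a folder
            return new DecodedEntry(name, null, decompressSingleFile(in));
        }
        List<DecodedEntry> children = new ArrayList<>();
        for (int i = 0; i < childCount; i++) children.add(decompress(in));
        return new DecodedEntry(name, children, null);
    }

    /** Placeholder for the single-file Huffman decompressor: length + raw bytes. */
    static byte[] decompressSingleFile(DataInputStream in) throws IOException {
        byte[] data = new byte[in.readInt()];
        in.readFully(data);
        return data;
    }
}
```

On disk, the folder branch would first create the directory (for example with java.nio.file.Files.createDirectories) and the file branch would write the decompressed bytes inside it.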
4 UI Design
The window has a menu bar, a text area and a single-line text field.
The menu bar is for opening files, compression/decompression commands and help.
The text area displays every event that occurs during the compression/decompression process.
The single-line text field at the bottom of the window shows how many bytes the program
has already compressed/decompressed.
5 Additions
Some tricky problems:
• I counted every byte of the input file by storing its occurrence count in an array with indices 0 to 255. But Java has no unsigned byte type: a byte ranges from -128 to 127, so it cannot be used directly as an array index. So, except in I/O statements, I convert every byte to an int in the range 0 to 255 inside my functions.
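A small illustration of the issue, with hypothetical names: a plain (int) cast keeps the sign, so (int) (byte) 0xFF is -1, while masking with 0xFF gives 255. InputStream.read() already returns an int in 0..255 (or -1 at end of stream), which is why I/O calls need no conversion.

```java
import java.io.IOException;
import java.io.InputStream;

final class ByteCounts {
    /** InputStream.read() returns 0..255, or -1 at end of stream, so it indexes the array directly. */
    static long[] count(InputStream in) throws IOException {
        long[] counts = new long[256];
        int b;
        while ((b = in.read()) != -1) counts[b]++;
        return counts;
    }

    /** A Java byte is signed (-128..127); masking with 0xFF recovers the value 0..255. */
    static int unsigned(byte b) {
        return b & 0xFF;   // equivalent to Byte.toUnsignedInt(b)
    }
}
```

For large files, wrap the stream in a BufferedInputStream, because single-byte read() calls on an unbuffered FileInputStream are slow.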
• During decompression, even if the Huffman tree is empty, the “datasize” field that follows it must still be read; otherwise every subsequent read starts at the wrong position.
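A sketch of why this matters, assuming a per-file header made of the number of distinct bytes, the (byte, count) pairs, and then datasize; this layout, the meaning of datasize as a bit count, and the name HeaderSketch are all assumptions. If the reader returned early on an empty tree, the 8 bytes of datasize would be left unread and the next read, such as the next entry of a compressed folder, would start at the wrong position.

```java
import java.io.DataInputStream;
import java.io.IOException;

/** Hypothetical per-file header: distinct-byte count, (byte, count) pairs, then datasize. */
final class HeaderSketch {
    final long[] counts;
    final long dataSize;      // the "datasize" field, assumed here to be the number of encoded bits
    final boolean emptyTree;

    private HeaderSketch(long[] counts, long dataSize, boolean emptyTree) {
        this.counts = counts;
        this.dataSize = dataSize;
        this.emptyTree = emptyTree;
    }

    static HeaderSketch read(DataInputStream in) throws IOException {
        long[] counts = new long[256];
        int distinct = in.readInt();             // 0 for an empty file, i.e. an empty tree
        for (int i = 0; i < distinct; i++) {
            int b = in.readUnsignedByte();
            counts[b] = in.readLong();
        }
        long dataSize = in.readLong();           // read even when the tree is empty
        return new HeaderSketch(counts, dataSize, distinct == 0);
    }
}
```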
Appendix
Structure of JWCompress
Core classes:
Compress.java: The class for compression.
Decompress.java: The class for decompression.
RecCompress.java extends Compress.java: The class for recursive compression.
RecDecompress.java extends Decompress.java: The class for recursive decompression.
Utility classes:
Queue.java: The priority queue.
Node.java: Node of Huffman tree.
UI classes:
MainUI.java