4. 1100010110111010001110110100111110001001
Worth: 5%
DUE: Sunday August 17, 11:55pm; submitted on MOODLE
4.1. Task
The goal is to compress data such that they take up less space, but can be uncompressed without losing any information. Huffman Coding is to be used as the compression mechanism. The data will be compressed with an encoding process, and uncompressed with a decoding process.
Objectives:
Create
HuffmanLeafandHuffmanInternalclasses that implementHuffmanNodeThese will help with the creation of the Huffman Codes
Create a
HuffmanNodeComparatorclass that implements theComparatorinterfaceThis is what defines the order of the
HuffmanNodeobjects
Create a
HuffmanCodeclass that will encode and decode stringsThe
HuffmanCodeobject will be used in themainto:Encode strings
Decode strings
Measure the compression percentage
Test the implementation with test classes
4.2. Provided Files
A complete
HuffmanNodeinterfaceA nearly empty
HuffmanLeafclass with aHuffmanLeafTestclass containing empty testsA nearly empty
HuffmanInternalclass with aHuffmanInternalTestclass containing empty testsA nearly empty
HuffmanNodeComparatorclass with a completeHuffmanNodeComparatorTestclassA nearly empty
HuffmanCodeclass with aHuffmanCodeTestclass containing empty testsAn
Asn4class with an emptymainA text file that may be used to seed the Huffman Code
All of this can be downloaded from hereThis is a compressed IntelliJ project
Just unzip, put it in the desired location on the computer, and open the project through IntelliJ
Warning
When opening the project, IntelliJ may mention a missing JDK. If this is the case, simply select the download link in the notification to download and install the missing JDK.
4.3. Part 0 — Read the Assignment
Read the assignment description in its entirety before starting.
4.4. Part 1 — Read Up on Huffman Codes
Read up on Huffman Codes. Be sure to dig deeper than just how it is used to compress data. Having a strong understanding of how it works will greatly help with completing the assignment. The Wikipedia page gives details but it is suggested to find additional resources to learn from. Perhaps YouTube or other online resources will prove helpful.
Things of note related to Huffman Codes to look out for:
How data is stored within nodes in a binary tree structure
How a priority queue can be used
How the nodes are ordered
How to encode
How to decode
4.5. Part 2 — Complete Huffman Nodes
The Huffman nodes are an important piece of the Huffman Codes. There are two types of nodes:
Nodes that store a character and it’s weight (leaf nodes)
Nodes that have a left and right subtree and a weight (internal nodes)
Both types of nodes have a weight; leaf nodes have a weight equal to the number of times a given character is contained within some string, and the internal nodes’ weight is the sum of their left and right subtrees’ weights.
4.5.1. Implementing the Classes
A complete HuffmanNode interface is provided. This interface will be implemented by the HuffmanLeaf and
HuffmanInternal classes. Notice that the interface is very short — getWeight().
Both the HuffmanLeaf and HuffmanInternal will have a weight field, but the HuffmanLeaf will also need to
keep track of which character it has. Have the constructor for the leaf nodes always convert the character to lower
case. Additionally, the HuffmanInternal needs fields for referencing its left and right subtrees (HuffmanNode
objects).
These classes will be immutable; only write getters as the data within the fields will not need to be updated after the objects are initialized.
To help with testing, also write a toString method.
4.5.2. Implementing the Unit Tests
There are skeletal test classes with incomplete test methods for each of the node types. The method names should provide an idea of what each method should be testing. Complete the unit tests and ensure the implemented node classes are working correctly.
4.6. Part 3 — Complete HuffmanNodeComparator
Although comparable was used when implementing the ArraySortedBag and LinkedBinarySearchTree, here a
Comparator is to be used.
A Comparator is a very similar idea to comparable, but it allows for more flexibility when needing to define
more complex orderings.
4.6.1. Implementing the Class
Write the int compare(HuffmanNode huff1, HuffmanNode huff2) method in the provided HuffmanNodeComparator class.
The compare method must return a negative integer if huff1 < huff2, zero if huff1 == huff2, and a positive
integer if huff1 > huff2.
Make sure to have a good sense of the order the HuffmanNode objects must have based on the Huffman Code compression
strategy:
Nodes with smaller weights come first
If a leaf and an internal node have the same weight, the leaf comes first
If two leaf nodes have the same weight, the one with the smaller ASCII value comes first
If two internal nodes have the same weight, the order does not matter
The instanceof operator can be used to check if a given reference variable is referencing an object of a specific
type. For example, huff1 instanceof HuffmanLeaf results in true if huff1 is referencing something of type
(or subtype) of HuffmanLeaf.
4.6.2. Run the Unit Tests
Since testing Comparator objects can be quite tedious, a complete set of unit tests for the class is provided. If
everything is working properly in the HuffmanNodeComparator class, the tests should pass.
Make sure to take time investigating these unit tests. One thing of note is that the primary way the
HuffmanNodeComparator is tested is by sorting a list based on the order defined by the HuffmanNodeComparator.
In the method compare_unsortedList_sortsList:
A list of
HuffmanNodeobjects in their proper order, referenced bysorted, is createdA copy of that list is made and referenced by
unsortedThe
unsortedlist is shuffled (to make it actually unsorted)The
unsortedlist is then sorted based on the ordering of theHuffmanNodeComparator(classUnderTest)
The idea here is, if the HuffmanNodeComparator sorts the whole list such that the elements return to their correct
sorted order as laid out in sorted, then the HuffmanNodeComparator should be correct.
4.7. Part 4 — Complete HuffmanCode
The HuffmanCode class will make use of the HuffmanNode and HuffmanNodeComparator class to implement the
compression. Ultimately, this class will be responsible for building the Huffman Tree based on some seed string and
encoding and decoding strings.
4.7.1. Notes About the Implementation
A
PriorityQueueshould be used, but instead of implementing one, use the one from java.utilTake care to notice the methods as this implementation does not call the add/remove
enqueue/dequeue
Make use of factory methods for instantiating the objects
Refer to the
Mazeclass from assignments 2 and 3Make the constructor private
An already “complete”
fromFilefactory method is providedAlthough, it delegates all the work to the
fromStringmethod that needs to be implemented
Under the hood, this class is creating and storing a binary tree of
HuffmanNodeobjectsIt is recommended to have a field within the class that stores each character’s encoding in a
Map<Character, String>Do not be afraid to make use of private helper methods as needed
4.7.2. Implementing the Class
The class should have at least one field — a reference to the root of the Huffman Tree. As discussed above, it is recommended to also include another field — a map storing each character’s bitstring encoding.
Complete the public static HuffmanCode fromString(String seed) method. This factory method will do all the setup for
the class and return a reference to a HuffmanCode object. Expect this method to:
Count the number of times each character exists in the seed string
Initialize the priority queue
Build the tree
Populate the character encoding map
Return a new
HuffmanCode
It is recommended to make use of a private helper method to populate the character encoding map. A recursive traversal of the tree would work well in this scenario.
Complete the public String encode(String string) method which takes a string and returns a bitstring encoding. The
encoding is based on the encoding of each character.
Complete the public String decode(String bits) method for decoding the bitstring. This is the inverse operation of
encode. With the use of a private helper method, decode the string recursively.
4.7.3. Implementing the Unit Tests
There is a skeletal test class with incomplete test methods for the HuffmanCode class. The method names should
provide an idea of what each method should be testing. Complete the unit tests and ensure the class is working
correctly.
4.8. Part 5 — Putting it Together
Once everything is done, write the main method to compress some strings. Like the previous assignments, this method
will likely be short.
This method must:
Create a
HuffmanCodeinstance from a string or a fileEncode some string
Decode the string
Print out the original, encoded, and decoded strings
Print out the percentage the message got compressed
To calculate the compression percentage, make a few assumptions:
Assume that a given character takes up 1 byte (8 bits)
If the string has 10 characters, then we will assume it takes up 80 bits
In reality, characters may take up more space than 8 bits, but ignore this here
Assume that the 0s and 1s in the encoded message are each 1 bit
If the encoded bitstring has 20 characters, then assume it takes up 20 bits
In reality, the 0s and 1s are being stored in a string, meaning each is actually a character that takes up more space, but ignore this here
4.9. Part 6 — Testing
The correctness of the classes may have already been verified by completing and running their test classes. If not, do it!
If they have already been verified, for good measure, re-run all the tests provided and the ones completed for the assignment. If they all pass, one should be pretty confident that everything is working correctly.
There is no test provided for the Asn4 class, but that’s nothing to worry about. Get a sense that it is working
correctly by
Running the program on different seeds
Encoding and decoding various strings
4.10. Some Hints
Work on one method at a time
Get each method working perfectly before you go on to the next one
Test each method as you write it
This is a really nice thing about programming; you can call your methods and see what result gets returned
Mentally test before you even write — what does this method do? What problem is it solving?
If you need help, ask
Drop by office hours
4.11. Some Marking Details
Warning
Just because your program produces the correct output, that does not necessarily mean that you will get perfect, or even that your program is correct.
Below is a list of both quantitative and qualitative things we will look for:
Correctness?
Did you follow instructions?
Comments?
Variable Names?
Style?
Did you do just weird things that make no sense?
4.12. What to Submit to Moodle
Make sure your NAME, SCHOOL EMAIL, and STUDENT NUMBER appear in a comment at the top of the classes
Submit the completed .java files to Moodle
HuffmanLeaf.java and HuffmanLeafTest.java
HuffmanInternal.java and HuffmanInternalTest.java
HuffmanNodeComparator.java
HuffmanCode.java and HuffmanCodeTest.java
Asn4.java
Do not submit the provided test classes
Do not submit the seed file
Do not submit the .class files
Do not compress the files
Warning
Verify that your submission to Moodle worked. If you submit incorrectly, you will get a 0.