4. 1100010110111010001110110100111110001001

  • Worth: 5%

  • DUE: December 6, 11:55pm; submitted on MOODLE

4.1. Task

The goal is to compress data such that they take up less space, but can be uncompressed without losing any information. Huffman Coding is to be used as the compression mechanism. The data will be compressed with an encoding process, and uncompressed with a decoding process.

Objectives:

  • Create HuffmanLeaf and HuffmanInternal classes that implement HuffmanNode

    • These will help with the creation of the Huffman Codes

  • Create a HuffmanNodeComparator class that implements the Comparator interface

    • This is what defines the order of the HuffmanNode objects

  • Create a HuffmanCode class that will encode and decode strings

  • The HuffmanCode object will be used in the main to:

    • Encode strings

    • Decode strings

    • Measure the compression percentage

  • Test the implementation with test classes

4.2. Provided Files

  • A complete HuffmanNode interface

  • A nearly empty HuffmanLeaf class with a HuffmanLeafTest class containing empty tests

  • A nearly empty HuffmanInternal class with a HuffmanInternalTest class containing empty tests

  • A nearly empty HuffmanNodeComparator class with a complete HuffmanNodeComparatorTest class

  • A nearly empty HuffmanCode class with a HuffmanCodeTest class containing empty tests

  • An Asn4 class with an empty main

  • A text file that may be used to seed the Huffman Code

  • All of this can be downloaded from here

    • This is a compressed IntelliJ project

    • Just unzip, put it in the desired location on the computer, and open the project through IntelliJ

Warning

When opening the project, IntelliJ may mention a missing JDK. If this is the case, simply select the download link in the notification to download and install the missing JDK.

4.3. Part 0 — Read the Assignment

Read the assignment description in its entirety before starting.

4.4. Part 1 — Read Up on Huffman Codes

Read up on Huffman Codes. Be sure to dig deeper than just how it is used to compress data. Having a strong understanding of how it works will greatly help with completing the assignment. The Wikipedia page gives details but it is suggested to find additional resources to learn from. Perhaps YouTube or other online resources will prove helpful.

Things of note related to Huffman Codes to look out for:

  • How data is stored within nodes in a binary tree structure

  • How a priority queue can be used

  • How the nodes are ordered

  • How to encode

  • How to decode

4.5. Part 2 — Complete Huffman Nodes

The Huffman nodes are an important piece of the Huffman Codes. There are two types of nodes:

  • Nodes that store a character and it’s weight (leaf nodes)

  • Nodes that have a left and right subtree and a weight (internal nodes)

Both types of nodes have a weight; leaf nodes have a weight equal to the number of times a given character is contained within some string, and the internal nodes’ weight is the sum of their left and right subtrees’ weights.

4.5.1. Implementing the Classes

A complete HuffmanNode interface is provided. This interface will be implemented by the HuffmanLeaf and HuffmanInternal classes. Notice that the interface is very short — getWeight().

Both the HuffmanLeaf and HuffmanInternal will have a weight field, but the HuffmanLeaf will also need to keep track of which character it has. Have the constructor for the leaf nodes always convert the character to lower case. Additionally, the HuffmanInternal needs fields for referencing its left and right subtrees (HuffmanNode objects).

These classes will be immutable; only write getters as the data within the fields will not need to be updated after the objects are initialized.

To help with testing, also write a toString method.

4.5.2. Implementing the Unit Tests

There are skeletal test classes with incomplete test methods for each of the node types. The method names should provide an idea of what each method should be testing. Complete the unit tests and ensure the implemented node classes are working correctly.

4.6. Part 3 — Complete HuffmanNodeComparator

Although comparable was used when implementing the ArraySortedBag and LinkedBinarySearchTree, here a Comparator is to be used. A Comparator is a very similar idea to comparable, but it allows for more flexibility when needing to define more complex orderings.

4.6.1. Implementing the Class

Write the int compare(HuffmanNode huff1, HuffmanNode huff2) method in the provided HuffmanNodeComparator class.

The compare method must return a negative integer if huff1 < huff2, zero if huff1 == huff2, and a positive integer if huff1 > huff2.

Make sure to have a good sense of the order the HuffmanNode objects must have based on the Huffman Code compression strategy:

  • Nodes with smaller weights come first

  • If a leaf and an internal node have the same weight, the leaf comes first

  • If two leaf nodes have the same weight, the one with the smaller ASCII value comes first

  • If two internal nodes have the same weight, the order does not matter

The instanceof operator can be used to check if a given reference variable is referencing an object of a specific type. For example, huff1 instanceof HuffmanLeaf results in true if huff1 is referencing something of type (or subtype) of HuffmanLeaf.

4.6.2. Run the Unit Tests

Since testing Comparator objects can be quite tedious, a complete set of unit tests for the class is provided. If everything is working properly in the HuffmanNodeComparator class, the tests should pass.

Make sure to take time investigating these unit tests. One thing of note is that the primary way the HuffmanNodeComparator is tested is by sorting a list based on the order defined by the HuffmanNodeComparator.

In the method compare_unsortedList_sortsList:

  • A list of HuffmanNode objects in their proper order, referenced by sorted, is created

  • A copy of that list is made and referenced by unsorted

  • The unsorted list is shuffled (to make it actually unsorted)

  • The unsorted list is then sorted based on the ordering of the HuffmanNodeComparator (classUnderTest)

The idea here is, if the HuffmanNodeComparator sorts the whole list such that the elements return to their correct sorted order as laid out in sorted, then the HuffmanNodeComparator should be correct.

4.7. Part 4 — Complete HuffmanCode

The HuffmanCode class will make use of the HuffmanNode and HuffmanNodeComparator class to implement the compression. Ultimately, this class will be responsible for building the Huffman Tree based on some seed string and encoding and decoding strings.

4.7.1. Notes About the Implementation

  • A PriorityQueue should be used, but instead of implementing one, use the one from java.util

  • Make use of factory methods for instantiating the objects

    • Refer to the Maze class from assignments 2 and 3

    • Make the constructor private

    • An already “complete” fromFile factory method is provided

      • Although, it delegates all the work to the fromString method that needs to be implemented

  • Under the hood, this class is creating and storing a binary tree of HuffmanNode objects

  • It is recommended to have a field within the class that stores each character’s encoding in a Map<Character, String>

  • Do not be afraid to make use of private helper methods as needed

4.7.2. Implementing the Class

The class should have at least one field — a reference to the root of the Huffman Tree. As discussed above, it is recommended to also include another field — a map storing each character’s bitstring encoding.

Complete the public static HuffmanCode fromString(String seed) method. This factory method will do all the setup for the class and return a reference to a HuffmanCode object. Expect this method to:

  • Count the number of times each character exists in the seed string

  • Initialize the priority queue

  • Build the tree

  • Populate the character encoding map

  • Return a new HuffmanCode

It is recommended to make use of a private helper method to populate the character encoding map. A recursive traversal of the tree would work well in this scenario.

Complete the public String encode(String string) method which takes a string and returns a bitstring encoding. The encoding is based on the encoding of each character.

Complete the public String decode(String bits) method for decoding the bitstring. This is the inverse operation of encode. With the use of a private helper method, decode the string recursively.

4.7.3. Implementing the Unit Tests

There is a skeletal test class with incomplete test methods for the HuffmanCode class. The method names should provide an idea of what each method should be testing. Complete the unit tests and ensure the class is working correctly.

4.8. Part 5 — Putting it Together

Once everything is done, write the main method to compress some strings. Like the previous assignments, this method will likely be short.

This method must:

  • Create a HuffmanCode instance from a string or a file

  • Encode some string

  • Decode the string

  • Print out the original, encoded, and decoded strings

  • Print out the percentage the message got compressed

To calculate the compression percentage, make a few assumptions:

  • Assume that a given character takes up 1 byte (8 bits)

    • If the string has 10 characters, then we will assume it takes up 80 bits

    • In reality, characters may take up more space than 8 bits, but ignore this here

  • Assume that the 0s and 1s in the encoded message are each 1 bit

    • If the encoded bitstring has 20 characters, then assume it takes up 20 bits

    • In reality, the 0s and 1s are being stored in a string, meaning each is actually a character that takes up more space, but ignore this here

4.9. Part 6 — Testing

The correctness of the classes may have already been verified by completing and running their test classes. If not, do it!

If they have already been verified, for good measure, re-run all the tests provided and the ones completed for the assignment. If they all pass, one should be pretty confident that everything is working correctly.

There is no test provided for the Asn4 class, but that’s nothing to worry about. Get a sense that it is working correctly by

  • Running the program on different seeds

  • Encoding and decoding various strings

4.10. Some Hints

  • Work on one method at a time

  • Get each method working perfectly before you go on to the next one

  • Test each method as you write it

    • This is a really nice thing about programming; you can call your methods and see what result gets returned

    • Mentally test before you even write — what does this method do? What problem is it solving?

  • If you need help, ask

    • Drop by office hours

4.11. Some Marking Details

Warning

Just because your program produces the correct output, that does not necessarily mean that you will get perfect, or even that your program is correct.

Below is a list of both quantitative and qualitative things we will look for:

  • Correctness?

  • Did you follow instructions?

  • Comments?

  • Variable Names?

  • Style?

  • Did you do just weird things that make no sense?

4.12. What to Submit to Moodle

  • Make sure your NAME, SCHOOL EMAIL, and STUDENT NUMBER appear in a comment at the top of the classes

  • Submit the completed .java files to Moodle

    • HuffmanLeaf.java and HuffmanLeafTest.java

    • HuffmanInternal.java and HuffmanInternalTest.java

    • HuffmanNodeComparator.java

    • HuffmanCode.java and HuffmanCodeTest.java

    • Asn4.java

    • Do not submit the provided test classes

    • Do not submit the seed file

    • Do not submit the .class files

    • Do not compress the files

Warning

Verify that your submission to Moodle worked. If you submit incorrectly, you will get a 0.

4.13. Assignment FAQ