Package-merge algorithm

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Oravec (talk | contribs) at 15:04, 7 November 2006 (Added Reference Section). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

 ??? 


The Package-Merge Algorithm is an O(nL)-time algorithm for finding an optimal length-limited Huffman code for a given distribution on a given alphabet of size n, where no code string is permitted to have length more than L. It is greedy algorithm which is a generalization of Huffman's original algorithm. Package-merge works by reducing the problem to the binary Coin Collector's problem.

Coin Collector's Problem

Suppose a coin collector has a number of coins of various denominations, each of which has a numismatic value. The coin collector has run out of money and needs to use some of his coin collection to buy something of cost N. He wishes to select a subset of coins from his collection of minimum numismatic value whose denominations total N.

The binary version of this problem is that all denominations are powers of 2, that is, 1, 1/2, 1/4, etc. dollars.

Description of the Algorithm

Assume that the largest denomination is 1 dollar, and that N is an integer. (The algorithm works even if these assumptions do not hold, by trivial modifications.) The coin collector first separates his coins into lists, one for each denomination, sorted by numismatic value. He then packages the smallest denomination coins in pairs, starting from the pair of smallest total numismatic value. If there is one coin left over, it will be the coin of highest numismatic value of that denomination, and it is set aside and ignored henceforth. These packages are then merged into the list of coins of the next smallest denomination, again in order of numismatic value. The items in that list are then packaged in pairs, and merged into the next smallest list, and so forth.

Finally, there is a list of items, each of which is a 1 dollar coin or a package consisting of two or more smaller coins whose denominations total 1 dollar. They are also sorted in order of numismatic value. The coin collector then selects the least value N of them.

Note that the time of the algorithm is linear in the number of coins.

The Reduction

Let L be the maximum length of any code string is permitted to have. Let p1, ... pn be the frequencies of the symbols of the alphabet that needs to be encoded. We first sort the symbols so that pi <= pi+1. Create L coins for each symbol, of denominations 2-1 ... 2-L, each of numismatic value pi. Use the package-merge algorithm to select the set of coins of minimum numismatic value whose denominations total n-1. Let hi be the number of coins of numismatic value pi selected.

The optimal length-limited Huffman code will encode symbol i with a bit string of length hi. The actual Huffman tree can easily be constructed by a simple bottom-up greedy method, given that the hi are known.

External Links

References

  • L. L. Larmore and D. S. Hirschberg. A fast algorithm for optimal length-limited Huffman codes. Journal of the Association for Computing Machinery, V 37 No. 3:464--473, 1990.