Algorithms for Infinite Huffman-Codes (Extended Abstract)

Mordecai J. Golin, Dept. of Computer Science, Hong Kong U.S.T., golin@cs.ust.hk
Kin Keung Ma, Dept. of Computer Science, Hong Kong U.S.T., kkma@cs.ust.hk

This work was supported by Hong Kong CERG grants HKUST6162/00E, HKUST6082/01E and HKUST6206/02E.

Abstract

Optimal (minimum cost) binary prefix-free codes for infinite sources with geometrically distributed frequencies, e.g., $P = \{p^i(1-p)\}_{i=0}^\infty$, $0 < p < 1$, were first (implicitly) suggested by Golomb over thirty years ago in the context of run-length encodings. Ten years later Gallager and Van Voorhis exhibited such optimal codes for all values of $p$. Just recently Merhav, Seroussi and Weinberger extended this further to find optimal binary prefix-free codes for two-sided geometric distributions. These codes were derived by cleverly "guessing" optimal codes for finite sources, validating these guesses by using the sibling property of Huffman encoding, and then showing that the finite codes converge in a very specific sense to an optimal infinite one.

In this paper we describe the first algorithmic approach to constructing optimal prefix-free infinite codes. Our approach is to define an infinite weighted graph with the property that the least-cost infinite path in the graph corresponds to the optimal code. We then show that even though the graph is infinite, the least-cost infinite path has a repetitive structure and that it is therefore possible not only to find this path but to find it relatively efficiently. This approach will work for even more complicated generalizations of geometric sources where solutions can't be guessed, as well as in extensions of Huffman coding for which the Huffman algorithm no longer works, e.g., non-uniform cost encoding alphabet characters and/or other restrictions on the codewords. We illustrate our approach by deriving an algorithm for constructing optimal prefix-free codes with a geometric source for the telegraph channel. We also implement our algorithm and show what the constructed codes look like in this case.

1 Introduction

In this paper we present the first algorithmic approach to constructing minimal-cost prefix codes for (generalized) geometric distributions. Due to space considerations we do not develop the approach in its full generality. Instead, we illustrate the technique by developing an algorithm that works for one variant of Huffman coding and then sketch how this can be modified to work for many other variants. Some proofs are also omitted.

1.1 Background

Consider a distribution¹ $P = \{p_i\}_{i=0}^{n-1}$, $p_0 \ge p_1 \ge p_2 \ge \cdots \ge p_{n-1}$, on a set of $n$ letters. The Huffman Encoding problem is to associate a prefix-free set of $n$ binary words $\{w_i\}_{i=0}^{n-1} \subseteq \{0,1\}^*$ with $P$ such that the expected word length $\sum_{i=0}^{n-1} p_i \cdot \mathrm{length}(w_i)$ is minimized, where $\mathrm{length}(w_i)$ is the number of bits in $w_i$, e.g., $\mathrm{length}(0110) = 4$. A prefix-free set is one in which $\forall i \ne j$, $w_i$ is not a prefix of $w_j$. It is well known that finding such a code is equivalent to finding a tree with $n$ leaves such that, when $l_i$ is the length of the $i$-th highest leaf in the tree, the expected external path length $\sum_{i=0}^{n-1} p_i l_i$ (achieved by placing the $i$-th letter ($p_i$) at the $i$-th highest leaf) is minimized. Such a tree may easily be found using the well-known Huffman Encoding Algorithm [13].

¹ In this paper a distribution is a sequence such that $\forall i$, $p_i > 0$ and $\sum_i p_i < \infty$. We do not require $\sum_i p_i = 1$.
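For concreteness, here is a minimal sketch (ours, not from the paper; all names are illustrative) of that classical finite-source construction, repeatedly merging the two smallest weights on a binary heap:

    import heapq

    def huffman_code(probs):
        """Build an optimal binary prefix-free code for a finite distribution.

        probs: list of positive weights (need not be sorted or sum to 1).
        Returns one codeword per weight, in the input order.
        """
        n = len(probs)
        if n == 1:
            return ["0"]
        # Heap items are (weight, tiebreaker, node); a leaf is an int index,
        # an internal node is a pair of subtrees.
        heap = [(w, i, i) for i, w in enumerate(probs)]
        heapq.heapify(heap)
        counter = n
        while len(heap) > 1:
            w1, _, t1 = heapq.heappop(heap)    # the two smallest weights...
            w2, _, t2 = heapq.heappop(heap)
            heapq.heappush(heap, (w1 + w2, counter, (t1, t2)))  # ...are merged
            counter += 1
        codes = [None] * n
        def walk(node, prefix):
            if isinstance(node, int):
                codes[node] = prefix
            else:
                walk(node[0], prefix + "0")
                walk(node[1], prefix + "1")
        walk(heap[0][2], "")
        return codes

As the next paragraph explains, this greedy merge has no analogue for infinite sources, which is the paper's starting point.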
Suppose now that the situation is modified slightly to permit infinite sources, i.e., $P = \{p_i\}_{i=0}^\infty$, $p_0 \ge p_1 \ge p_2 \ge \cdots$. In this case the problem of finding a prefix-free code, or equivalently, an infinite tree labelled with the $p_i$, with minimum weighted external path length, is not nearly as well understood. It was proven² in [17] that optimal trees (codes) exist if and only if the entropy $-\sum_i p_i \log p_i$ of $P$ is bounded, but there is no algorithm for constructing optimal codes that works for all such $P$ with bounded entropy. The reason that the basic Huffman algorithm can not be modified to work in such a case is that the Huffman algorithm starts by identifying the two smallest probabilities in the distribution and merging them; in the infinite source case, there is no smallest probability.

² Since the set of codes is infinite, the existence of an optimal code is not a priori obvious.

Certain restricted cases of the infinite source problem are better understood, though. The best known and earliest such case studied is that of the infinite binary codes (e.g., using only 0s and 1s) for the infinite geometric source. This is the source that fixes some $p$, $0 < p < 1$, and then defines $P_p = \{(1-p)p^i\}_{i=0}^\infty$. As was noted by Golomb [11], such a source arises, for example, in the description of run-length encoding. Suppose we have a string of As and Bs in which each character occurs independently of every other one, Bs occurring with probability $p$ and As with probability $1-p$. Now, for $i = 0, 1, 2, \ldots$ set $X_i = \underbrace{B B \cdots B B}_{i}\,A$, i.e., $i$ Bs followed by a single A. Every infinite string can be written uniquely as the concatenation of different $X_i$s, with the probability of $X_i$ occurring being $(1-p)p^i$. We thus have a situation in which strings are composed of words from an infinite source with given distribution $P_p$. Other problems that can be recast as finding a min-cost infinite tree with distribution $P_p$ arise in operations research [12] and group testing [14, 19].

This special case of $P = P_p$ was studied by Gallager and Van Voorhis [6], who exhibited an optimal tree for every $p$. Their technique was to first define a countable sequence of finite sources $P_p^0, P_p^1, P_p^2, P_p^3, \ldots$ that were better and better approximations to $P_p$. They then "guessed" the structure of the optimum Huffman code for these sources, verified the correctness of their guess by using the "sibling" property³ of Huffman trees, and then showed that these codes "converge" to an infinite tree that is optimal for the infinite source. Their result can be stated as:

³ Let $T$ be a binary tree with $n$ leaves labelled with the $n$ probabilities $p_1, \ldots, p_n$. The weight of a leaf will be its assigned probability; the weight of an internal node is recursively defined to be the sum of the weights of its children. $T$ has the sibling property if the weights of its nodes, read left to right, top to bottom, are nonincreasing. The essential observation is that a tree with the sibling property can be produced by the Huffman algorithm and therefore represents an optimal code. The subtle point is that the sibling property does not construct optimal codes; it verifies that guessed codes are optimal.

Theorem 1. (Gallager and Van Voorhis [6]) Given $p$, let $m$ be the unique integer that satisfies $p^m + p^{m+1} \le 1 < p^m + p^{m-1}$. Let a tree $T$ be described as a sequence $I_i$, $i = 0, 1, 2, 3, \ldots$, where $I_i$ is the number of internal nodes on level $i$. Then the tree described by

(1.1)   $I_0, I_1, I_2, I_3, \ldots = 1, 2, 4, \ldots, 2^{\lfloor \log_2 m \rfloor}, m, m, m, \ldots$

is optimal for $P_p$.

If we use $M_m$ to denote the $m$th such tree then Figure 1 contains the tops of $M_1$, $M_2$, $M_3$, $M_4$ and $M_5$.

[Figure 1: The tops of the infinite "optimal" trees $M_1$, $M_2$, $M_3$, $M_4$ and $M_5$. In this figure a filled-in circle represents an internal node while a square represents a leaf. (The drawings are not reproduced here.)]
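To make Theorem 1 concrete, a small sketch (ours; the function names are illustrative) that computes $m$ directly from the defining inequality and then emits the level sequence (1.1):

    def golomb_parameter(p):
        """The unique integer m with p**m + p**(m+1) <= 1 < p**m + p**(m-1)."""
        assert 0 < p < 1
        m = 1
        while p ** m + p ** (m + 1) > 1:   # decreasing in m, so stop at first hit
            m += 1
        return m

    def level_sequence(m, levels):
        """First `levels` terms of (1.1): 1, 2, 4, ..., 2**floor(log2 m), m, m, ..."""
        seq, k = [], 0
        while 2 ** k <= m:                 # the doubling prefix
            seq.append(2 ** k)
            k += 1
        while len(seq) < levels:
            seq.append(m)                  # then m internal nodes on every level
        return seq[:levels]

    # Example: p = 0.7 gives m = 2, so the sequence is 1, 2, 2, 2, ...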
For later reference we point out that if $q_m$ is the unique real root of $1 - p^m - p^{m-1} = 0$ in $(0, 1)$ then $\{q_m\}_{m=1}^\infty$ monotonically increases and converges to 1. Another way of viewing the theorem is that the intervals $(q_m, q_{m+1}]_{m=1}^\infty$ partition $(0, 1)$ into a countable set of intervals and that for $p \in (q_m, q_{m+1}]$ the optimal tree for $P_p$ is $M_m$. These optimal codes (trees) were later shown to be the unique optimal codes for the given sources in [8].

Gallager and Van Voorhis' basic technique was later expanded upon and extended by [1] and [16], who showed that it could be applied to other families of infinite sources to show the existence of the optimal trees. [1] also showed how to extend the theorem to cover the ternary case, in which every parent can have three children rather than two. This work was recently pushed even further by [18], who constructed optimal codes for two-sided geometric distributions (which have become useful in lossless image compression schemes). These are distributions $P_{p,d}$ consisting of the infinite sequence

$p_{x,d} = p^{|x+d|}, \quad x = 0, \pm 1, \pm 2, \pm 3, \ldots,$

where $0 < p < 1$ and $d$ is fixed⁴. They showed that the real $(p, d)$ plane can be subdivided into regions such that, in each region, all of the $P_{p,d}$ have the same optimal tree. A comprehensive survey of the latest results may be found in [2].

⁴ Note that the $p_x$ are not necessarily sorted in decreasing order.

All of these results (except for [8], which was proving uniqueness) share the same basic approach in that they construct an optimal code/tree for the infinite source by (i) constructing a sequence of finite sources that are better and better approximations to the infinite source $P$, (ii) "guessing" the structure of the optimum Huffman code for these finite sources, (iii) verifying the correctness of the guess by using the "sibling" property of Huffman trees, and (iv) showing that the finite codes/trees "converge" to an infinite tree that must be optimal for the infinite source.
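The partition points $q_m$ are easy to compute numerically. A small sketch (ours), using bisection, since $f(x) = 1 - x^m - x^{m-1}$ is strictly decreasing on $(0, 1)$ for $m \ge 2$:

    def q(m, tol=1e-12):
        """Root of x**m + x**(m-1) = 1 in (0, 1), for m >= 2."""
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if 1 - mid ** m - mid ** (m - 1) > 0:
                lo = mid        # f(mid) > 0: the root lies to the right
            else:
                hi = mid
        return (lo + hi) / 2

    # q(2) ~ 0.6180, q(3) ~ 0.7549, q(4) ~ 0.8192, ..., increasing toward 1.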
1.2 Our new results

The advantage of the approach used in the previous papers is that, by guessing the code structures, it at one stroke reports the optimal prefix-free codes for all of the distributions in the class. A major disadvantage is that the approach only works if it is possible to validate the guessed code structure. There are many variants of prefix-free coding for which the Huffman algorithm does not work, e.g., when the encoding alphabet has unequal letter costs [15, 10] or there are restrictions on the codewords, e.g., the one-ended codes of [3] or the restricted-zero codes of [5] ([2] has a nice survey of other variations); the Huffman algorithm does not work for any of these variations, so it is impossible to verify guessed trees and the technique above can not be employed. Another disadvantage is that, even for normal prefix codes, it can be quite difficult to guess and verify the optimal codes for complicated sources. The two-sided geometric distributions of [18] seem to be as far as the technique can be pushed.

In this paper we develop the first algorithmic technique for constructing optimal prefix codes. It will permit unequal cost encoding alphabets that `charge' more to transmit a `0' than to transmit a `1' and will also permit placing restrictions on the structures of the codewords, e.g., all codewords must end with a `1'; the technique will actually permit any restriction of the type "all codewords must belong to a given regular language L". The technique will also allow many generalizations of geometric distributions that use finite memory. For example, it will allow distributions of the type

(1.2)   $1,\ p^{\epsilon_1}, p^{\epsilon_2}, \ldots, p^{\epsilon_i},\ p,\ p^{1+\epsilon_1}, p^{1+\epsilon_2}, \ldots, p^{1+\epsilon_i},\ p^2,\ p^{2+\epsilon_1}, p^{2+\epsilon_2}, \ldots, p^{2+\epsilon_i},\ p^3,\ p^{3+\epsilon_1}, p^{3+\epsilon_2}, \ldots, p^{3+\epsilon_i},\ \ldots$

where $0 \le \epsilon_1 \le \epsilon_2 \le \cdots \le \epsilon_i < 1$, a natural generalization of the two-sided distribution of [18]. The input to the algorithm will be the costs of the encoding letters, the restriction L on the codewords, and the distribution. The output will be a description of the optimal infinite coding tree.

There are two major ideas behind our algorithm. The first is the fact, developed for finite-source unequal-cost coding in [10] and later modified for restricted coding in [4], that prefix-code trees can be represented by paths in a weighted graph; a minimal-cost tree corresponds to a min-cost path. In our problem this means that we are looking for a minimum cost infinite-link path in some infinite weighted graph. The second idea generalizes the observation [8] that optimum Huffman trees for the geometric source must be cyclic. That is, the structure of the optimum tree must be some finite head, followed by a finite part that infinitely cycles; Figure 1 illustrates the tops of the first 5 optimal infinite trees. We can translate this into our graph description to show that a minimum cost infinite-link path must have a particular finite cyclic structure, i.e., a "head" followed by an infinitely repeated cycle, and then modify standard shortest path algorithms to look for a finite path that generates a minimum cost infinite cycle.
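To make (1.2) concrete, a small sketch (ours; the function name and parameters are illustrative) that enumerates a finite prefix of such a source:

    def generalized_geometric(p, eps, blocks):
        """Prefix of (1.2): for j = 0, 1, 2, ... emit p**j followed by
        p**(j + e) for each e in eps, where 0 <= eps[0] <= ... < 1."""
        out = []
        for j in range(blocks):
            out.append(p ** j)
            out.extend(p ** (j + e) for e in eps)
        return out

    # Example: generalized_geometric(0.5, [0.25, 0.5], 2) gives
    # [1, 0.5**0.25, 0.5**0.5, 0.5, 0.5**1.25, 0.5**1.5].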
2 (1,2)-lopsided trees for geometric distributions

As mentioned at the beginning of this abstract, due to space considerations we do not develop the approach in its full generality. Instead, we restrict ourselves to developing an algorithm that works for the special case of (1,2)-lopsided trees for geometric distributions. This example illustrates all of the basic ideas of the general technique.

In the general unequal-cost or lopsided tree problem, we are given weights $\alpha, \beta$ such that $\mathrm{length}(0) = \alpha$ and $\mathrm{length}(1) = \beta$. The length of a word $w$ is the sum of the lengths of its characters. For example, if $\mathrm{length}(0) = 1$ and $\mathrm{length}(1) = 2$ (this is sometimes known as the telegraph channel [7]) then $\mathrm{length}(1000) = 5$. Given $\alpha, \beta$, the $(\alpha, \beta)$-lopsided tree problem is to find a prefix-free set of codewords $w_i$ such that $\sum_i p_i \cdot \mathrm{length}(w_i)$ is minimized. Equivalently, by setting the length of a left edge to be $\alpha$ and that of a right edge to be $\beta$ (Figure 5), the problem is equivalent to finding a tree such that, when $l_i$ is the length of the $i$-th highest leaf in the tree, the expected external path length $\sum_i p_i l_i$ of the tree is minimized. For finite source distributions, even though the Huffman algorithm does not work, there are algorithms for solving this problem exactly, e.g., [15, 10], and approximately, e.g., [9]; it is still unknown, though, whether the problem is NP-hard, polynomial time solvable, or lies somewhere else.

For the geometric source, there is no known algorithm for finding the tree. The previously developed techniques can not be applied because they require using the Huffman algorithm to verify guessed codes, and the Huffman algorithm does not work for lopsided trees.

In what follows we describe an algorithmic technique for finding the optimal (1,2)-lopsided tree for a given geometric distribution $P_p$. The technique can easily be generalized to work for any $(\alpha, \beta)$-lopsided tree when $\alpha/\beta$ is a rational number. We first derive theorems describing the structure of the minimum cost (1,2)-lopsided trees for the geometric distribution $P_p$. We then use these theorems to develop an algorithm which, given fixed $p$, will output the corresponding optimal tree for $P_p$. Finally, the section concludes with the output of our program for various values of $p$.

2.1 Notation for describing optimal trees

In this section we generalize the notation introduced in [8] for infinite Huffman trees so that it can represent infinite (1,2)-lopsided trees. See Figure 2 for an example.

Definition 2.1. A tree $T = \{(I_l, C_l, E_l)\}_{l=0}^\infty$ is an infinite sequence of tuples of non-negative integers satisfying $(I_0, C_0, E_0) = (1, 0, 0)$ and

(2.3)   $\forall l \ge 0, \quad E_{l+2} + I_{l+2} = I_{l+1} + I_l, \qquad C_{l+1} = I_l.$

The $I_l$, $C_l$ and $E_l$ are called internal nodes, cross nodes and external (or leaf) nodes respectively. Intuitively, $I_l$ and $E_l$ are the number of internal nodes and leaves at level $l$; $C_l$ counts the number of tree edges (of length 2) that pass through level $l$ without stopping at level $l$. Given a tree $T$ we define $\forall l \ge 0$, $I_l(T) = I_l$, $C_l(T) = C_l$, $E_l(T) = E_l$. We also define $A_l(T) = \sum_{j \le l} E_j(T)$, where $A_l$ is the number of leaves on or above level $l$. When $T$ is understood we will simply write $A_l$.

We can now define $\mathrm{Cost}(\cdot, \cdot)$ in a way that corresponds to the natural cost of a tree given a source.

Definition 2.2. Let $T$ be a tree and $d_i(T)$, $i = 0, 1, 2, \ldots$ be the depth of the $i$th leaf in $T$, breaking ties arbitrarily. Thus, $d_i(T) = \min\{l : i < A_l(T)\}$. Let $P = \{p_i\}_{i=0}^\infty$ be a nonincreasing sequence of nonnegative reals. The cost of $T$ labelled by $P$ is the external path length of $T$ when its leaves, sorted by increasing depth, are labelled with the $p_i$, i.e.,

$\mathrm{Cost}(T, P) = \sum_{i=0}^\infty p_i\, d_i(T) = \sum_{i=0}^\infty p_i \min\{l : i < A_l(T)\}.$
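The following small sketch (ours; names are illustrative) evaluates Definition 2.2 numerically on a finite truncation of a tree's level description:

    def cost(E, probs):
        """External path length per Definition 2.2, truncated to len(E) levels.

        E[l] is the number of leaves at level l; probs is a nonincreasing
        list p_0 >= p_1 >= ... at least as long as the total number of
        leaves described by E. The truncation makes this a lower bound on
        the cost of the infinite tree."""
        total, i = 0.0, 0
        for l, e_l in enumerate(E):
            for _ in range(e_l):         # the next e_l leaves sit at depth l
                total += probs[i] * l    # d_i(T) = l for these leaves
                i += 1
        return total

    # First tree of Figure 2 below: E = [0, 0, 1, 1, 1, ...], so the sum
    # begins 2*p_0 + 3*p_1 + 4*p_2 + ...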
    First tree:
      l:  0  1  2  3  4  5  6  7  ...
      I:  1  1  1  1  1  1  1  1  ...
      C:  0  1  1  1  1  1  1  1  ...
      E:  0  0  1  1  1  1  1  1  ...
      A:  0  0  1  2  3  4  5  6  ...

    Second tree:
      l:  0  1  2  3  4  5  6  7  8  9  10 ...
      I:  1  1  2  1  2  1  2  1  2  1  2  ...
      C:  0  1  1  2  1  2  1  2  1  2  1  ...
      E:  0  0  0  2  1  2  1  2  1  2  1  ...
      A:  0  0  0  2  3  5  6  8  9  11 12 ...

Figure 2: Two examples of infinite trees and their corresponding $\{(I_l, C_l, E_l)\}_{l=0}^\infty$ descriptions (the tree drawings are not reproduced here). For easy reference we also provide $A_j = \sum_{l \le j} E_l$.

As examples note that the first tree in Figure 2 has cost

$2p_0 + 3p_1 + 4p_2 + \cdots = \sum_{i=0}^\infty (i+2)\, p_i$

while the second tree has cost

$3p_0 + 3p_1 + 4p_2 + 5p_3 + 5p_4 + 6p_5 + 7p_6 + \cdots = \sum_{i=0}^\infty \left[ (2i+3)(p_{3i} + p_{3i+1}) + (2i+4)\, p_{3i+2} \right].$

Definition 2.3. A tree $T$ is optimal for $P_p$ if, for all trees $T'$, $\mathrm{Cost}(T, P_p) \le \mathrm{Cost}(T', P_p)$.

Our problem is, given $p$, to find an optimal tree for $P_p$.

2.2 The Structure of Optimal Trees

In [8] it is proven that optimal (1,1) trees for geometric distributions have bounded width. The same idea can be applied to (1,2) trees as well. We sketch the proofs here:

Lemma 2.2. (Optimal trees have bounded width) Let $p$ be fixed and $B(p) = \min\{k : p^k < \frac{1-p}{2}\}$. If $T$ is any optimal tree for $P_p$, then $\forall l$, $I_l(T) \le B(p)$. Furthermore, $\forall l$, $E_l(T) \le 2B(p)$ and $C_l(T) \le B(p)$.

Proof. (Sketch) If there is some optimal $T$ and $l$ such that $I_l(T) > B(p)$, simply replace $I_l(T) - B(p)$ of the internal nodes on level $l$ with leaves. It is a relatively straightforward calculation to show that the resulting tree has lower cost than $T$, contradicting optimality. The second part of the lemma follows from the fact that $\forall l > 1$, $E_l(T) \le I_{l-1}(T) + I_{l-2}(T)$ and $C_l(T) = I_{l-1}(T)$.

Note that for the geometric distribution $P_p$ the cost has a particularly simple form (we write the cost with $p_i = p^i$; this drops the normalizing factor $1-p$ of $P_p$, which does not change which trees are optimal):

Lemma 2.1. If $p$ is fixed then

(2.4)   $\mathrm{Cost}(T, P_p) = \sum_{i=0}^\infty p^i \min\{l : i < A_l(T)\} = \sum_{l \ge 0} \sum_{j \ge A_l} p^j = \frac{1}{1-p} \sum_{l \ge 0} p^{A_l(T)}.$

Returning to the same examples, the first tree in Figure 2 has cost

$\frac{1}{1-p}\left(p^0 + p^0 + p^1 + p^2 + p^3 + \cdots\right) = \frac{1}{1-p}\left(1 + \frac{1}{1-p}\right).$

Similarly, the second tree has cost

$\frac{1}{1-p}\left(p^0 + p^0 + p^0 + p^2 + p^3 + p^5 + p^6 + p^8 + \cdots\right) = \frac{1}{1-p}\left(3 + \frac{p^2 + p^3}{1 - p^3}\right).$

An immediate consequence of Lemma 2.2 is that there are at most $2B^3(p)$ different possible $(I_l, C_l, E_l)$ triples that can appear in any optimal tree for $P_p$. This means that somewhere in the first $2B^3(p)$ levels of an optimal tree $T$, some level must repeat.

Theorem 2. (Cycle Theorem) Let $p$ be fixed. Let $T$ be an optimal tree for $P_p$. Let $t$ be the smallest index such that there exists an $i$ such that

(2.5)   $(I_t(T), C_t(T), E_t(T)) = (I_{t+i}(T), C_{t+i}(T), E_{t+i}(T)).$

For this $t$ let $i$ be the smallest $i$ such that (2.5) is satisfied. Now define tree $T'$ to be the tree that has the same first $t$ levels as $T$ and then infinitely repeats the next $i$ levels. That is,

$(I_l(T'), C_l(T'), E_l(T')) = (I_l(T), C_l(T), E_l(T))$  if $l < t$;
$(I_l(T'), C_l(T'), E_l(T')) = (I_{t+(l-t) \bmod i}(T),\ C_{t+(l-t) \bmod i}(T),\ E_{t+(l-t) \bmod i}(T))$  if $l \ge t$.

Then $T'$ is also an optimal tree for $P_p$.

Proof. (Sketch) Split $T$ into three parts; levels $0, \ldots, t-1$ are $\mathrm{Head}(T)$, levels $t, \ldots, t+i-1$ are $\mathrm{Mid}(T)$ and levels $t+i, \ldots$ are $\mathrm{Tail}(T)$. Let $T^j$ be the infinite tree composed of $\mathrm{Head}(T)$ followed by $j$ copies of $\mathrm{Mid}(T)$ followed by $\mathrm{Tail}(T)$ (it is not difficult to see that $T^j$ satisfies (2.3) and is therefore really a tree). The optimality of $T$ can be shown to imply that each $T^j$ is also optimal; otherwise $T$ could be modified to build a lower cost tree. The trees $T^j$ are all optimal and, as a sequence, converge levelwise to tree $T'$. Standard convergence theorems [17, 8] then imply that $T'$ is optimal.
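A small sketch (ours; names are illustrative) of the head-plus-cycle descriptions that Theorem 2 says suffice; it streams $(I_l, C_l, E_l)$ tuples and checks (2.3) on a prefix, with the second tree of Figure 2 as the example data:

    from itertools import count, islice

    def cyclic_tree(head, cycle):
        """Yield (I_l, C_l, E_l) tuples: the head levels once, then the
        cycle levels repeated forever (the tree T' of Theorem 2)."""
        for l in count():
            if l < len(head):
                yield head[l]
            else:
                yield cycle[(l - len(head)) % len(cycle)]

    def satisfies_2_3(levels):
        """Check constraint (2.3) on a finite prefix of level tuples."""
        ok = levels[0] == (1, 0, 0)
        for l in range(len(levels) - 2):
            I_l, _, _ = levels[l]
            I_l1, C_l1, _ = levels[l + 1]
            I_l2, _, E_l2 = levels[l + 2]
            ok = ok and E_l2 + I_l2 == I_l1 + I_l and C_l1 == I_l
        return ok

    # The second tree of Figure 2: a head of three levels, a cycle of two.
    prefix = list(islice(
        cyclic_tree([(1, 0, 0), (1, 1, 0), (2, 1, 0)],
                    [(1, 2, 2), (2, 1, 1)]), 11))
    assert satisfies_2_3(prefix)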
The theorem and Lemma 2.1 together tell us, at least theoretically, that our problem is solvable. Given any $p$ we can simply construct the exponentially huge but bounded number of cyclic trees restricted to $I_l \le B(p)$ and compare all of their costs for that value of $p$, returning the minimum one. In the next section we will see a much more efficient procedure. Before doing so we first remark on one almost immediate result on the structure of the solution space of optimal trees.

Corollary 2.3. The cost of any optimal tree $T$ for $p$ is a rational function of $p$ in which the degrees of the numerator and denominator are upper-bounded by a function of $B(p)$.

Proof. Let $M = A_{t+si} - A_{t+(s-1)i} = A_{t+i} - A_t$ (for any $s \ge 1$), the number of leaves added by each repetition of the cycle. From equation (2.4) and Theorem 2,

$\mathrm{Cost}(T, P_p) = \frac{1}{1-p} \sum_{l \ge 0} p^{A_l} = \frac{1}{1-p} \left( \sum_{l < t} p^{A_l} + \frac{1}{1 - p^M} \sum_{l=t}^{t+i-1} p^{A_l} \right).$

Since $E_l \le 2B(p)$ we have $A_l \le 2B(p)\,l$, and $t + i \le 2B^3(p)$, so every exponent appearing above is bounded. Putting this all together we see that both the degrees of the numerator and denominator are upper-bounded by $8B^4(p)$.

Now, note that given $\bar p$, the number of cyclic trees that could possibly be optimal trees for any $p \in (0, \bar p]$ is bounded by a function of $\bar p$. The cost of the optimal tree is therefore the lower envelope of a finite number of rational functions in $p$. The numerators and denominators of the rational functions are also bounded by some universal constant in $\bar p$, so any pair of such functions only intersects a constant number of times. This implies that the lower envelope of these functions has complexity bounded by a function of $\bar p$. Putting all of this together we see that the interval $(0, \bar p]$ can be subdivided into a finite number of subintervals, each with an associated tree which is optimal for every $p$ in the subinterval. This implies

Theorem 3. (Partition theorem) There exists a monotonically increasing sequence of real numbers $\{q_i\}_{i=0}^\infty$ such that $q_0 = 0$ and $\lim_{i \to \infty} q_i = 1$, and a countable sequence of trees $\{T_i\}_{i=0}^\infty$, such that tree $T_i$ is optimal for $p \in (q_i, q_{i+1}]$.

As mentioned earlier, Gallager and Van Voorhis' result (Theorem 1) implies this result for the standard Huffman case. Their proof was constructive. We have just seen that the result is not specific to the Huffman case but is also correct for the (1,2)-lopsided case (and, as discussed, can be extended to all of the other generalizations mentioned).
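The rational form in the proof is straightforward to evaluate; a small sketch (ours; the argument layout is an assumption), with the second tree of Figure 2 as a check:

    def cyclic_cost(p, A_head, A_cycle, M):
        """Cost (2.4) of an eventually cyclic tree, in Theorem 2's form.

        A_head : A_0, ..., A_{t-1} (leaf counts on or above each head level)
        A_cycle: A_t, ..., A_{t+i-1}
        M      : leaves added by one repetition of the cycle, A_{t+i} - A_t
        """
        head = sum(p ** a for a in A_head)
        cycle = sum(p ** a for a in A_cycle) / (1 - p ** M)
        return (head + cycle) / (1 - p)

    # Second tree of Figure 2: head A = (0, 0, 0), cycle A = (2, 3), M = 3, so
    # cyclic_cost(p, (0, 0, 0), (2, 3), 3) == (3 + (p**2 + p**3)/(1 - p**3)) / (1 - p).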
2.3 Algorithm

In this section we will present an algorithm that, given $p$, finds an optimal (1,2)-lopsided tree for $P_p$. The idea behind our algorithm will be to modify the top-down algorithm developed in [10] that found a min-cost lopsided tree for a finite source by finding a min-cost path in a directed graph. That algorithm worked by starting from a node corresponding to the tree "root" and building trees top down. An edge in the graph corresponded to adding a level to the bottom of a tree; the cost of the edge corresponded to the "contribution" of that level to the tree's cost. Thus every path in the graph starting at the "root node" corresponded to some tree (a length-$k$ path corresponding to a depth-$k$ tree), with the cost of a path equaling the cost of the corresponding tree. After defining an appropriate destination node, the algorithm for finding a min-cost lopsided tree was simply to find a min-cost root-destination path in the graph.

Although the trees, and associated paths, that we are now constructing are infinite, we know from the previous subsection that optimal trees have the pattern of a head followed by cycles. Therefore our search will be for a finite path that describes such a head and cycle. We work in a weighted directed graph $G$ whose nodes are tuples $(m; e, c)$: with a tree $T = \{(I_l, C_l, E_l)\}_{l=0}^\infty$ we associate the path $\{(m_l; e_l, c_l)\}_{l=1}^\infty$ in $G$ given by

$\forall l \ge 1, \quad (m_l; e_l, c_l) = (A_{l-1},\ I_l + E_l,\ C_l).$

In the other direction, given a path $\{(m_l; e_l, c_l)\}_{l=1}^\infty$ in $G$ with $(m_1; e_1, c_1) = (0; 1, 1)$, we can associate a tree such that $(I_0, C_0, E_0) = (1, 0, 0)$ and for $l \ge 1$,

$(I_l, C_l, E_l) = (c_{l+1},\ c_l,\ m_{l+1} - m_l).$

Furthermore, using Lemma 2.1 we can work out that the cost of a tree for $P_p$ will be exactly equal to the cost of the associated infinite path. Thus, our problem of trying to find a minimum-cost tree is equivalent to finding a minimum-cost infinite path in $G$ that starts at node $(0; 1, 1)$.

Of course, since the graph is infinite, we can not explicitly run a shortest path algorithm. Instead we use the structural properties from Section 2.2 to restrict our search. The first thing to note is that Lemma 2.2 permits us to restrict $e, c \le B(p)$. After doing this we run Dijkstra's shortest path algorithm starting at node $(0; 1, 1)$, building a shortest path tree in the graph. The main observation is that the paths that we are building in the shortest-path tree can be considered as prefixes of an infinite path (these prefixes correspond to the top levels of an infinite tree). From Theorem 2 (the cycle theorem) we know that if some path $\{(m_l; e_l, c_l)\}_{l=1}^n$ is a finite prefix of an optimal path and, for some $i < n$, $(e_i, c_i) = (e_n, c_n)$, then the path must cycle starting from $n+1$ and we can figure out exactly what the corresponding cyclic tree is; applying Lemma 2.1 will give its cost.

This motivates us to modify Dijkstra's algorithm so that, at the time we add node $(m; e, c)$ into the tree, we walk backwards in the tree from $(m; e, c)$ until we either reach some other node $(m'; e, c)$, if such a node exists, or the root. If we find the root we just continue normally with Dijkstra's algorithm. If we find a node $(m'; e, c)$ then we have found a cycle corresponding to a candidate cyclic tree. We calculate the cost of this tree; if it has the minimum cyclic tree cost found so far we keep it as the new minimum, otherwise we throw it away. In neither case do we continue processing $(m; e, c)$ by updating its neighbors' costs, since we know that it can not contribute to any other candidate trees.

Note that the algorithm must terminate since any path that gets longer than $2B^2(p)$ edges will contain some repeated $(e, c)$ value. Furthermore, given any tree path $\{(m_l; e_l, c_l)\}_{l=1}^n$ we have $m_n \le \sum_{l=1}^n e_l \le n \cdot 2B(p)$. Since $n \le 2B^2(p)$ we have that $m_n \le 4B^3(p)$, so we can use this restriction to make our graph finite. Figure 3 gives the pseudocode for the algorithm.

To prove that the algorithm gives the correct answer we must show that it finds an optimal cyclic tree. Let $p$ be fixed and let $\{(m_l; e_l, c_l)\}_{l=1}^\infty$ correspond to some optimal cyclic tree $T$. Let $(t, i)$ be the minimum pair (lexicographically) such that $(e_t, c_t) = (e_{t+i}, c_{t+i})$. Since the prefix of a shortest path is a shortest path, we know that $\{(m_l; e_l, c_l)\}_{l=1}^{t+i}$ is a shortest path. If this is the path to $(m_{t+i}; e_{t+i}, c_{t+i})$ found by the algorithm then the algorithm will, by definition, find $T$. A difficulty might arise if there are two possible shortest paths to $(m_{t+i}; e_{t+i}, c_{t+i})$ and the algorithm did not find $\{(m_l; e_l, c_l)\}_{l=1}^{t+i}$. It is possible to prove by contradiction, though, that if this happens, then the algorithm will find another cyclic tree that has the same cost as $T$ (proof omitted in this extended abstract), and therefore it will always find some optimal cyclic tree.
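Before the pseudocode in Figure 3 below, a small sketch (ours; names are illustrative) of the two correspondences just given, useful for testing:

    def tree_to_path(levels):
        """{(I_l, C_l, E_l)} for l = 0, 1, ... -> {(m_l; e_l, c_l)} for l >= 1,
        with (m_l; e_l, c_l) = (A_{l-1}, I_l + E_l, C_l)."""
        path, A = [], 0
        for l, (I, C, E) in enumerate(levels):
            if l >= 1:
                path.append((A, I + E, C))   # A currently holds A_{l-1}
            A += E
        return path

    def path_to_tree(path):
        """Inverse direction: (I_l, C_l, E_l) = (c_{l+1}, c_l, m_{l+1} - m_l)."""
        levels = [(1, 0, 0)]
        for (m, _, c), (m2, _, c2) in zip(path, path[1:]):
            levels.append((c2, c, m2 - m))
        return levels

    # Round trip on the first tree of Figure 2:
    levels = [(1, 0, 0), (1, 1, 0), (1, 1, 1), (1, 1, 1), (1, 1, 1)]
    assert path_to_tree(tree_to_path(levels)) == levels[:len(levels) - 1]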
    Input: p

    1. /* Initialization */
       Set COST[m; e, c] := ∞ for all entries with m <= 4B^3(p) and e, c <= B(p).
       Set COST[0; 1, 1] := 0.
       PUSH(Q, (0; 1, 1)).
       Set min_cost := ∞.   /* the current minimum cost of an infinite path */

    2. while Q != ∅ do
    2a.    (m; e, c) := EXTRACT_MIN(Q).
           /* Check for repetition of (e, c) */
           for each vertex (m'; e', c') on the path from the root to (m; e, c) do
               if (e', c') = (e, c) then
                   /* Calculate the cyclic tree cost */
    2b.            cost := COST[m'; e', c'] + (COST[m; e, c] - COST[m'; e', c']) / (1 - p^(m - m')).
                   if min_cost > cost then
                       min_cost := cost.
                       minpath := the list of vertices on the path.
                   Goto 2.
           end for
           /* Relaxation */
    2c.    for q := 0 to e do
    2d.        Let (m'; e', c') := (m + e - q; c + q, q).
    2e.        new_cost := COST[m; e, c] + p^(m') / (1 - p).
               if COST[m'; e', c'] > new_cost then
                   COST[m'; e', c'] := new_cost.
                   if (m'; e', c') is not in Q and COST[m'; e', c'] < min_cost then
                       PUSH(Q, (m'; e', c')).
           end for
       end while

    3. Extract the tree from minpath.

Figure 3: Modified Dijkstra algorithm for finding the optimal infinite tree.

2.4 Results

As an illustration of the technique we implemented the algorithm of the previous section for finding optimal (1,2)-lopsided trees and ran it for $p$ ranging from 0.001 to 0.960 in increments of 0.001. Figure 6 shows the ranges such that all $p$ sampled in the same range share the same optimal tree. Note that this seems to correspond to the statement of the Partition theorem, Theorem 3 (but see the comment at the end of the next section). Figure 4 shows the path expansions of the optimal trees found and Figure 5 gives realizations of the actual trees (when we write $P_{0.755}$ this denotes that our sampling indicates the boundary of the interval to be close to 0.755, but we do not know the exact boundary). Note that there are many (uncountably many) ways to realize the trees given the corresponding path expansions. We chose the presented ones because of their easily observable recursive structures.
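For completeness, a compact executable sketch (ours, not the authors' implementation) of the modified Dijkstra search of Figure 3; the vertex bounds are deliberately looser than the ones quoted above, and costs omit the constant first-level term $1/(1-p)$, which is the same for every tree:

    import heapq

    def B_of(p):
        """B(p) = min{ k : p**k < (1-p)/2 } from Lemma 2.2."""
        k = 1
        while p ** k >= (1 - p) / 2:
            k += 1
        return k

    def optimal_cyclic_tree(p):
        """Modified Dijkstra over vertices (m, e, c), following Figure 3.

        Returns (cost, head, cycle): the vertex lists of the head and of
        the infinitely repeated cycle of an optimal path."""
        B = B_of(p)
        M_MAX, EC_MAX = 4 * B ** 3, 3 * B       # generous versions of the bounds
        start = (0, 1, 1)
        COST, parent = {start: 0.0}, {start: None}
        heap = [(0.0, start)]
        best = (float("inf"), None, None)
        while heap:
            d, v = heapq.heappop(heap)
            if d > COST[v]:
                continue                        # stale heap entry
            m, e, c = v
            u = parent[v]                       # walk back: repeated (e, c)?
            while u is not None and (u[1], u[2]) != (e, c):
                u = parent[u]
            if u is not None:                   # cycle u -> ... -> v found
                M = m - u[0]                    # leaves added per repetition
                if M > 0:                       # a repeating cycle must add leaves
                    cyc = COST[u] + (d - COST[u]) / (1 - p ** M)
                    if cyc < best[0]:
                        best = (cyc, u, v)
                continue                        # never extend such a vertex (step 2b)
            for q in range(e + 1):              # q = internal nodes on this level
                w = (m + e - q, c + q, q)       # next vertex (m'; e', c')
                nd = d + p ** w[0] / (1 - p)    # next level contributes p**m'/(1-p)
                if (w[1] >= 1                   # a level with no nodes ends the tree
                        and w[0] <= M_MAX and max(w[1], w[2]) <= EC_MAX
                        and nd < COST.get(w, float("inf")) and nd < best[0]):
                    COST[w], parent[w] = nd, v
                    heapq.heappush(heap, (nd, w))
        cost, u, v = best                       # a cycle is always found
        path, x = [], v
        while x is not None:                    # unwind parents: root ... u ... v
            path.append(x)
            x = parent[x]
        path.reverse()
        cut = path.index(u)
        return cost, path[:cut], path[cut:-1]   # v only restates u's (e, c)

    # Example: optimal_cyclic_tree(0.3) yields head [] and cycle
    # [(0, 1, 1), (1, 1, 0)], matching the first row of Figure 4.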
3 Extensions, conclusions and open problems

Previous work on constructing minimum-cost prefix-free codes for infinite source alphabets was non-algorithmic and very restricted in its applicability. In this extended abstract we developed the first algorithmic technique for constructing minimum-cost prefix-free codes. This new technique is applicable to many generalizations of Huffman coding, e.g., unequal cost letters and restrictions on codewords, that the previously known technique could not work on. This new technique also works for a larger set of generalizations of geometric sources than could previously be solved.

In this abstract we only described an algorithm for constructing an optimum (1,2)-lopsided tree for a standard geometric source. Modifying the algorithm to work for other $(\alpha, \beta)$-lopsided trees only requires changing the tuple definition of a tree given in Definition 2.1 to reflect the $(\alpha, \beta)$, i.e., instead of only saying how many edges are crossing a level, the tuple has to encode how many edges are crossing the level at 1 unit into their length, how many at 2 units into their length, etc. Similarly, the technique can be modified to deal with regular language restrictions on the codewords (for regular languages described by deterministic finite automata) by modifying the tuple definition to indicate how many nodes in each DFA state there are per level. Once these definition modifications are made, all of the other lemmas and theorems can be modified to work as well.

Similarly, for modifications of the geometric source such as (1.2), we can have the tuple encode at which $\epsilon_t$ the labellings of the leaves on a particular level start, and modify everything else accordingly.

The big open question in this area is to design a technique that would work for non-geometric-like sources. Unfortunately, this problem has been open for a long time and nothing seems to be known about how to attack it [2].

Another, more technical, problem would be to improve the Partition Theorem (Theorem 3). This theorem states that, for all the generalizations of Huffman coding discussed, the unit interval can be decomposed into a countable union of subintervals such that if $p, p'$ are in the same subinterval then the same tree is optimal for both $p$ and $p'$. This was a generalization of a theorem due to Gallager and Van Voorhis (Theorem 1) for the standard Huffman case. Gallager and Van Voorhis's result actually implies something much stronger for the standard Huffman case, which is that if tree $T$ is optimal for both $p$ and $p'$ with $p < p'$ then $T$ is also optimal for all $q \in [p, p']$. The question is whether this statement holds for the general cases addressed by the partition theorem. If it did then our algorithm's results would be much stronger. For example, in Section 2.4 we presented the optimal trees for sampled values of $p$ and found that all sampled $p$ in the given intervals had the same optimal tree. If the statement above held in the general case then we would know that those trees were optimal for all $p$ in the given intervals, not just the sampled ones.

    Interval             Expansion
    P0.001 - P0.500      (0; 1, 1) (1; 1, 0) (1; 1, 1)
    P0.500 - P0.683      (0; 1, 1) (0; 2, 1) (1; 2, 1)
    P0.683 - P0.755      (0; 1, 1) (0; 2, 1) (0; 3, 2) (2; 3, 1) (3; 3, 2)
    P0.755 - P0.809      (0; 1, 1) (0; 2, 1) (0; 3, 2) (1; 4, 2) (3; 4, 2)
    P0.809 - P0.838      (0; 1, 1) (0; 2, 1) (0; 3, 2) (0; 5, 3) (3; 5, 2) (5; 5, 3)
    P0.838 - P0.864      (0; 1, 1) (0; 2, 1) (0; 3, 2) (0; 5, 3) (2; 6, 3) (5; 6, 3)

Figure 4: Signature expansions for the first 6 intervals. Underlined signatures are where level repetitions occur (the underlining is not reproduced here; in each row the final signature repeats the $(e, c)$ of an earlier one).

[Figure 5: Optimal trees for the first 6 intervals. The two horizontal lines show the two levels that form the repetition. (The tree drawings are not reproduced here.)]

[Figure 6: Partition of (0, 1) so that all sampled $p$'s in the same interval share the same tree. (Not reproduced here.)]

References

[1] Julia Abrahams, "Huffman-Type Codes for Infinite Source Distributions," Journal of the Franklin Institute, 331B(3) (1994) 265-271.
[2] Julia Abrahams, "Code and Parse Trees for Lossless Source Encoding," Communications in Information and Systems, 1(2) (April 2001) 113-146.
[3] Toby Berger and Raymond W. Yeung, "Optimum '1'-Ended Binary Prefix Codes," IEEE Transactions on Information Theory, 36(6) (November 1990) 1435-1441.
[4] S.L. Chan and M. Golin, "A Dynamic Programming Algorithm for Constructing Optimal '1'-ended Binary Prefix-Free Codes," IEEE Transactions on Information Theory, 46(4) (July 2000) 1637-1644.
[5] S. Dolev, E. Korach and D. Yukelson, "The Sound of Silence: Guessing Games for Saving Energy in Mobile Environment," Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies (IEEE INFOCOM'99), (1999) 768-775.
[6] Robert G. Gallager and David C. Van Voorhis, "Optimal Source Codes for Geometrically Distributed Integer Alphabets," IEEE Transactions on Information Theory, (March 1975) 228-230.
[7] E.N. Gilbert, "Coding with Digits of Unequal Cost," IEEE Transactions on Information Theory, 41(2) (March 1995) 596-600.
[8] Mordecai Golin, "A Combinatorial Approach to Golomb Forests," Theoretical Computer Science, 263(1-2) (July 2001) 283-304.
[9] Mordecai Golin, Claire Kenyon, and Neal Young, "Huffman Coding with Unequal Letter Costs," Proceedings of the 34th ACM Symposium on Theory of Computing (STOC 2002), (May 2002) 785-791.
[10] M. Golin and G. Rote, "A Dynamic Programming Algorithm for Constructing Optimal Prefix-Free Codes for Unequal Letter Costs," IEEE Transactions on Information Theory, 44(5) (September 1998) 1770-1781.
[11] S.W. Golomb, "Run Length Encodings," IEEE Transactions on Information Theory, IT-12 (July 1966) 399-401.
[12] R. Hassan, "A Dichotomous Search for a Geometric Random Variable," Operations Research, 32(2) (1984) 423-439.
[13] D.A. Huffman, "A Method for the Construction of Minimum-Redundancy Codes," Proceedings of the IRE, 40 (September 1952) 1098-1101.
[14] F.K. Hwang, "On Finding a Single Defective in Binomial Group Testing," Journal of the American Statistical Association, 69(345) (1974) 146-150.
[15] R.M. Karp, "Minimum-Redundancy Coding for the Discrete Noiseless Channel," IRE Transactions on Information Theory, 7 (1961) 27-39.
[16] A. Kato, Te Sun Han and H. Nagaoka, "Huffman Coding with an Infinite Alphabet," IEEE Transactions on Information Theory, 42(3) (May 1996) 977-984.
[17] Tamas Linder, Vahid Tarokh and Kenneth Zeger, "Existence of Optimal Prefix Codes for Infinite Source Alphabets," IEEE Transactions on Information Theory, 43(6) (November 1997) 2026-2028.
[18] N. Merhav, G. Seroussi and M.J. Weinberger, "Optimal Prefix Codes for Sources with Two-Sided Geometric Distributions," IEEE Transactions on Information Theory, 46(1) (January 2000) 229-236.
[19] Y.C. Yao and F.K. Hwang, "On Optimal Nested Group Testing Algorithms," Journal of Statistical Planning and Inference, 24 (1990) 167-175.