On Distributions Computable by Random Walks on Graphs
Guy Kindler
Abstract We answer a question raised by Donald E. Knuth and Andrew C. Yao, concerning the class of polynomials on [0, 1] that can be realized as the distribution function of a random variable, whose binary expansion is the output of a finite state automaton driven by unbiased coin tosses. The polynomial distribution functions which can be obtained in this way are precisely those with rational coefficients, whose derivative has no irrational roots on [0, 1]. We also show, strengthening a result of Knuth and Yao, that all smooth distribution functions which can be obtained by such automata are polynomials. 1 Intro duction In a 1976 paper, Donald E. Knuth and Andrew C. Yao laid the foundations for a complexity theory of probability distribution functions. They defined a computability class of distribution functions that can be "computed" by a random walk on an edgelabelled graph (this can also be thought of as a finitestate automaton driven by a sequence of random bits). They called such a graph a finite-state generator, or f.s.g. Formally, an f.s.g. is a finite directed graph whose vertices are called states, with one designated state called the initial state. Some of the edges in the graph are labelled with output strings, which are finite binary strings. The output of the f.s.g. is the random sequence of bits 1 2 3 . . . obtained by performing a simple random walk on its states, starting from the initial state, and writing down sequentially the output strings that are encountered
 The Einstein Institute of Mathematics, The Hebrew University of Jerusalem, Jerusalem, Israel. email: puzne@math.huji.ac.il  Department of Mathematics, The Weizmann Institute of Science, Rehovot 76100, Israel. email: romik@wisdom.weizmann.ac.il

Dan Romik
along the way. We identify the output with the real-valued random variable 0  X  1 whose binary expansion is the output sequence 1 2 3 . . . , namely n n X= 2n =1 A distribution function F (x) supported on [0, 1] (that is, F (0-) = 0 and F (1) = 1) is called computable by an f.s.g., or just computable, if it can be realized as the distribution function of a random variable X generated by an f.s.g. A natural question is to identify all computable distribution functions. Clearly there is a countable number of such functions, so the class of computable distribution functions is rather small. However, since the set of such distributions contains many Cantor-like distributions, and other singular distributions which do not have a simple description, one soon realizes that this question is (probably) too general to possess a meaningful answer. On the other hand, if the discussion is limited to "nice" distributions, e.g. piecewise smooth distribution functions, then a beautiful algebraic connection is revealed. Knuth and Yao showed that if F is a computable distribution function, and F is realanalytic in an interval (a, b)  [0, 1] then it must be a polynomial with rational coefficients there. (Theorem 1.2 below shows that it is enough to require that F be smooth in (a, b).) They constructed a family of polynomial distribution functions which are computable, but left open the question ([3], question (v) on page 427) of precisely which polynomials are distribution functions that can be computed by an f.s.g. The question was raised again by Yao [5], who gave some necessary conditions. The purpose of this paper is to show that Yao's necessary conditions are sufficient. Our main result is Theorem 1.1. A polynomial Q(x) which is mono-

Copyright Š 2004 by the Association for Computing Machinery, Inc. and the Society for industrial and Applied Mathematics. All Rights reserved. Printed in The United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Association for Computing Machinery, 1515 Broadway, New York, NY 10036 and the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA 19104-2688

131


tone increasing on [0, 1] and satisfies Q(0) = 0, Q(1) = 1, can be realized as the distribution function of a random variable that is generated by an f.s.g., if and only if 1. Q(x) has rational coefficients; 2. Q (x) has no irrational roots in [0, 1]. We prove two additional results. The next theorem further substantiates Knuth and Yao's claim that polynomials form the main class of interesting computable distribution functions, by showing that if a computable distribution function is smooth, then it is a polynomial. This strengthens Theorem 7.4 of [3], which shows the same for analytic computable distribution functions.

In [3, 5] it was required that the outdegree of each vertex in the graph be 2 (this restriction is natural when an f.s.g. is interpreted as a coin-tossing automaton). In Section 3 below, we use another equivalent variation on the f.s.g. model. Our paper was inspired by the recent work of Mossel and Peres [4], which deals with questions somewhat similar to ours. Mossel and Peres characterize the class of functions f : (0, 1)  (0, 1) for which there exists a finite state automaton whose input is a sequence of random bits with bias p and whose output is a single random bit with bias f (p). Those functions are precisely the rational functions of p with rational coefficients.

Structure of the pap er. In the next section we prove Theorem 1.1. The "only if " part was Theorem 1.2. Let F be a computable distribution function. If F is infinitely differentiable on an already proved in [3] and [5]. For the "if " part, interval (a, b)  [0, 1], then F is a polynomial there. we rely essentially on Knuth and Yao's construction involving the order statistics of uniform random The last theorem investigates some structural variables. It is amusing that order statistics should properties of f.s.g.'s that compute non-smooth dis- play a distinguished role in this problem, and that tributions. Recall that any distribution function F in fact by taking scalings and rational mixtures of polynomials constructed using order statistics can be decomposed into a mixture one obtains the most general class of constructible polynomials. (1.1) F = Fac + (1 - )Fsing , 01 In section 3 we prove Theorem 1.3. In section of an absolutely continuous distribution function Fac 4 we prove Theorem 1.2. In section 5 we give and a singular distribution function Fsing (for the an example of a computable distribution function purpose of this paper we include the atomic part of which is absolutely continuous but whose density is F in Fsing ­ see also the comment in Section 5).  everywhere locally unbounded, and discuss related is uniquely determined, and if 0 <  < 1, namely if open problems. F is not purely singular or absolutely continuous, then Fac and Fsing are also uniquely determined 2 Pro of of Theorem 1.1 (otherwise, one of them is trivially not). It will be convenient, in the proof of Theorem 1.1, to deal with density functions rather than cumulative Theorem 1.3. Let F (x) be a computable distribudistribution functions. Let D be the set of piecetion function, let F = Fac + (1 - )Fsing be the wise polynomial density functions on [0, 1]. Let C be decomposition of F as in (1.1), and assume that those elements q (x)  D such that the corresponding x 0 <  < 1. Then  is rational, and Fac and Fsing cumulative distribution function Q(x) = 0 q (t)dt are both computable. is computable. The elements of C are called comIn the proof of Theorem 1.3 it is shown that putable (piecewise polynomial) densities. The following theorem summarizes Knuth and essentially, the contributions to the absolutely continuous and singular parts, respectively, come from Yao's constructions of computable densities: different parts of the f.s.g. which do not interact. Theorem 2.1. ([3]) (i) If 0  a < b  1 are ratioRemarks. The above definition of an f.s.g. is nal, then the uniform density on [a, b] is computable. a slight variation on those of [3, 5], but is easily (ii) If 0  a < b  1 are rational, then the density seen to be equivalent, in the sense that the class (n + 1)! (x - a)k (b - x)n-k 1[a,b] (x) of computable distribution functions is the same. k !(n - k )!(b - a)n+1

132


Proposition 2.1 implies our claim that qi is computable. To see this, let f be a polynomial density on [ri , ri+1 ] which has the form (2.3) (note that not only h, but also qi is a mixture of such polynomials, by (2.2)). We prove that f is computable by showing that its restriction to each subinterval [tj , tj +1 ] (normalized to have integral 1) is a computable density. Let q  D bex a polynomial density function On [tj , tj +1 ], write f as such that Q(x) = 0 q (t)dt satisfies the conditions of Theorem 1.1. In terms of q , this simply means that f (x) = c [(x - tj ) + (tj - t1 )]v1 [(x - tj ) + (tj - t2 )]v2 . . . q has rational coefficients, and no irrational roots in v v v [0, 1]. Our aim is to show that q is computable. Let (x-tj ) j (tj +1 -x) j+1[(tj +1 -x)+(tj +2 -tj +1 )] j+2 . . . 0 = r0 < r1 < r2 < ˇ ˇ ˇ < rk-1 < rk = 1 be the roots [(tj +1 - x) + (tm - tj +1 )]vm of q in [0, 1], together with 0 and 1 if they are not roots. In view of Theorem 2.1(iii), it is enough to Now expand out the products, observing that tj - t1 , tj - t2 , . . . , tj - tj -1 , tj +2 - tj +1 , . . . , tm - tj +1 are show that each of the densities all positive rational numbers. This gives a repre1 sentation of f as a rational mixture of polynomials q (x)1[ri ,ri+1 ] (x), qi (x) = ri+1 q (t)dt proportional to (x - tj ) (tj +1 - x) , hence by Theri orem 2.1(ii),(iii), the restriction of f to [tj , tj +1 ] is (i = 0, 1, 2, . . . , k - 1) computable. (the density q conditioned on the interval [ri , ri+1 ]) Our goal is now to prove Proposition 2.1. We is computable. This is because q is then a mixture start with the following lemma, which considers of the qi with rational coefficients. Now fix i, 0  i  k - 1. The density qi is 0 the representations of non-negative polynomials on outside the interval [ri , ri+1 ]. Inside this interval qi an interval by convex combinations of polynomial densities which are not necessarily rational. has the form of the (k + 1)'th order statistic of n + 1 independent random variables distributed uniformly on [a, b], is computable. (iii) If f1 , f2 , . . . , fn are computable densitien , then s any rational mixturei of the form f = i=1 ai fi where 0 < ai  Q, ai = 1, is also computable. (2.2) qi (x) = c(x - ri )j (ri+1 - x)l h(x), Lemma 2.1. Let Cn [a, b] be the set of non-negative polynomials of degree at most n on the interval [a, b], which integrate to 1 there. Then Cn [a, b] is a compact set, and its extreme points are precisely the polynomials in Cn [a, b] of degree exactly n which have the form (2.3) for some a  t1 < t2 < ˇ ˇ ˇ < tm  b and positive even v1 , v2 , . . . , vm (again, with the exception that if t1 = a, v1 can be odd, and if tm = b, vm can be odd). As was indicated to us by one of the referees, a proof of Lemma 2.1 appears in the 1953 paper [1] by Karlin and Shapley (Theorem 9.2, page 28). We include the proof here for completeness. Proof. Recall that a bounded closed set within a finite-dimensional normed space is compact. The space of n-degree polynomials is finite dimensional, and it can be equipped with the norm defined by b ||f || = a |f (x)|dx. The set Cn [a, b] is bounded with respect to this norm (all of its elements have norm 1), and it is obviously closed, hence it is compact.

where c  Q  (0, ), j, l  0, and h(x) is a polynomial with rational coefficients that is strictly positive on [ri , ri+1 ], and integrates to 1 there. Our claim now relies on the following statement. Proposition 2.1. h(x) can be expressed as a rational mixture (namely a convex combination with rational coefficients) of polynomials which have the form (2.3) c(x - t1 )v1 (x - t2 )v2 . . . (x - tm-1 )vm-1 (-x + tm )vm for some rational ri  t1 < t2 < ˇ ˇ ˇ < tm  ri+1 , and which integrate to 1 on [ri , ri+1 ] (the constant c is therefore necessarily rational). The powers v1 , v2 , . . . , vm above must be even, with the exception that if t1 = ri then v1 can be odd, and if tm = ri+1 , vm can be odd (this is why the last term in (2.3) is written differently than the other terms).

133


Now let f  Cn [a, b] be a polynomial of degree n with all n roots (counting multiplicities) in the interval [a, b] (the evenness of the multiplicities is automatic from the non-negativity requirement), and suppose f = g + (1 - )h, where g , h  Cn [a, b], and 0 <  < 1. From positivity we have that wherever f vanishes, g and h must also vanish with at least the same order, so they share the same n roots as f and are therefore equal to it, since they are both of degree at most n and integrate to 1. Thus f is an extreme point of Cn [a, b]. Conversely, if f  Cn [a, b] does not have n roots in the interval [a, b], then it can be represented as f (x) = c(x - t1 )v1 (x - t2 )v2 . . . (-x + tm )vm ˇ g (x) =: w(x) ˇ g (x), where a  t1 < t2 < ˇ ˇ ˇ i tm  b, the sum of the < multiplicities deg w = vi is strictly less than n, the constant c > 0 is chosen so that g  Cn [a, b], and g has no roots in [a, b]. Now, either of two cases must hold: if g is a constant, then deg f = deg w < n, and bt then, setting  = a b-a f (t)dt, the equation -a  f (x) =  
a

polynomials of the form (2.3), without insisting on a rational mixture: this is since for a linear system of equations with rational coefficients, the set of rational solutions is dense in the set of real solutions. Now, the idea of the proof is to first use Lemma 2.1 to represent h(x) as a convex combination of polynomials of the form (2.3), with ri  t1 < t2 < ˇ ˇ ˇ < tm  ri+1 not necessarily rational. Then the ti 's are slightly perturbed to make them rational. Proposition 2.1 follows from the three lemmas below as follows. First, note that since h(x) has no roots, it is actually an interior point of Cn [ri , ri+1 ], where n = deg h (we consider Cn [ri , ri+1 ] as a subset of the affine vector space of polynomials of degree at most n that integrate to 1 on [ri , ri+1 ]). By Lemma 2.2, this implies that h(x) is also in the interior of the convex hull of some finite set P of polynomials of the form (2.3). According to Lemma 2.3 the polynoimals in P may be perturbed slightly while maintaining h(x) in the interior of their convex hull. Finall, Lemma 2.4 implies that these perturbation can be chosen so that the roots of the polynomials become rational. Lemma 2.2. Let K be a compact convex set in a finite-dimensional vector space V , and let n = dim(V ). Then for every interior point x  K  there exist extreme points y1 , . . . , ym of K , such that x  Conv (y1 , . . . , ym ). The number of points, m, is at most 2n. Lemma 2.3. Let x, y1 , . . . , yn be points in a finitedimensional vector space V . Suppose that x  Conv (y1 , .., yn ) (where B  denotes the interior of B ). Then there exists a neighborhood U of 0  V with the fol lowing property. If z1 , . . . , zn  V satisfy zi - yi  U for al l i, then x  Conv (z1 , .., zn ). Lemma 2.4. The set of extreme points in Cn [ri , ri+1 ] al l of whose roots are rational is dense in the set of extreme points of Cn [ri , ri+1 ] (with the obvious topology).

b

t-a f (t)dt b-a b-t f (t)dt b-a

-1

 x-a f (x) b-a  b-x f (x) b-a

 +(1 - ) 
a

b

-1

represents f as a convex combination of two unequal polynomials in Cn [a, b]. Otherwise, deg g  1, in which case, letting = minx[a,b] g (x), the equation b ˇ w(t)(g (t) - )dt w(x)(g (x) - ) a f (x) = b 2 w(t)(g (t) - )dt
a

b +
a

w(t)(g (t) + )dt 2

ˇ

w(x)(g (x) + ) b w(t)(g (t) + )dt a

Lemmas 2.3 and 2.4 are obvious, hence we only represents f as a convex combination of two polynomials in Cn [a, b] which (because deg g  1) are prove Lemma 2.2. Note that the bound 2n on the not equal. Therefore f is not an extreme point of number of required extreme points in Lemma 2.2 is tight, as can be seen by taking x = 0 and Cn [a, b]. K = Conv(ąe1 , . . . , ąen ). Proof of Proposition 2.1. First, note that it is enough Proof of Lemma 2.2. Assume that K  = , so that to show that h(x) can be expressed as a mixture of there will be something to prove. Without loss of

134


generality, assume that x = 0. We choose a basis y1 , . . . , yn for V whose elements are extreme points of K , as follows. Take y1 to be any extreme point of K (y1 = 0). Having chosen y1 , . . . , yi for i < n, we set Hi = span(y1 , . . . , yi ). Since K contains a neighborhood of 0, it cannot be contained in Hi . Therefore there exists an extreme point yi+1 of K , satisfying yi+1  Hi (for example, there exists an extreme point maximizing the convex function dist(ˇ, Hi ), where dist is computed according to some norm on V . Recall that a convex function defined on a compact convex set always attains its maximum on some extreme point). This process obviously yields a basis for V . Take z to be the intersection of the boundary of K with the ray {t ˇ (-y1 - y2 - ˇ ˇ ˇ - yn ) : t > 0}. Obviously, the convex hull of y1 , . . . , yn , z contains a neighborhood of 0. Now let Hz be an affine hyperplane supporting K at z . The intersection of K with Hz is a convex body in a vector space of dimension  n - 1, and therefore by Carath´odory's e theorem (see [2]) z is a convex combination of at most n extreme points yn+1 , . . . , ym in it. Since these are also extreme points of K , and since obviously Conv(y1 , . . . , yn , z )  Conv(y1 , . . . , ym ), the proof is complete.

two sub-stochastic matrices with rational entries A = A0 + A1 where A0 has non-zero entries for those edges whose output label is "0", and A1 has non-zero entries for those edges with output label "1". Specifying the f.s.g. is equivalent to specifying the matrices A0 , A1 and the initial state s0 . Let F = Fac +(1-)Fsing be as in Theorem 1.3, and suppose that S is the set of states of a given f.s.g. that computes F , with initial state s0  S . For any state s  S , let F s be the distribution function generated by the same f.s.g. with the initial state replaced by s. Thus, F = F s0 . Thinking of the F s as measures on [0, 1], we denote for any Borel subset B  [0, 1] B F (B ) = dF (x). A state s  S is said to be of absolutely continuous (a.c.) type, if F s is an absolutely continuous measure. Call s of singular type (or just singular) if F s is a singular measure. Call s pure if it is either absolutely continuous or singular.

Lemma 3.1. 1. If s  S is pure, and s  S is a state such that there exists a path in the graph of the f.s.g. leading from s to s , then s is pure and of the same type as s. 2. If the graph of the f.s.g. is strongly connected 3 Pro of of Theorem 1.3 (namely there is a path from any state to any other In the next two sections, we modify slightly our state), then al l the states are pure (and are therefore model of finite state generators to an equivalent of the same type by part 1). model. In the modified model, the outgoing edges Proof. Let ľ = (F s )sS be the vector-valued meaare labelled with transition probabilities, which are sure whose coordinates are the measures F s . The arbitrary rational numbers in (0, 1] (and which sum definition of the f.s.g. and the measures F s can be to 1 for any given state). The random walk which translated into the following system of equations satis performed is then a weighted random walk with isfied by ľ: For any Borel subset B  [0, 1] and any these transition probabilities. We also require every state s  S , edge to be labelled with a single output bit. s s( The equivalence of the two models is simple, and F s (B ) = 2B  [0, 1]) ss F was noted in [3], p. 421-422. 0 -s p  Let S be the set of states of such a modified f.s.g. s s( An alternative description of the f.s.g. is in terms + (2B - 1)  [0, 1]), ss F of the matrix of transition probabilities, which we 1 -s p  denote by  m with s - s eaning that s has an outgoing edge to  A = (pss )s,s S . s , labelled by the output bit . In matrix notation, this can be written as The matrix A is a Markov transition matrix with rational entries, and is decomposed as the sum of (3.4) ľ(B ) = A0 ľ(2B [0, 1])+A1 ľ((2B -1)[0, 1])

135


where ľ is thought of as a column vector. Now let s be an a.c. state, and let s be a state  such that s - s , with  being either 0 or 1. Then  for any Borel set B  [0, 1] which has Lebesgue measure 0, we have (3.5) 0 = F s ((B + )/2)  pss
i F

s

(

B ).

Therefore F s s also a.c. Similarly, if s is singular, then, taking C  [0, 1] a set of Lebesgue measure 0 such that F s (C ) = 1, and B = [0, 1] \ (2C - ), again (3.5) holds. This proves that s is singular. For part 2 of the Lemma, observe first that (3.4) uniquely determines a vector ľ = (F s )sS of probability measures on [0, 1] ­ this is equivalent to saying that the output of the f.s.g. is a well-defined random variable. Now, for any state s  S , let s F s = (s)Fac + (1 - (s))Fssing be the decomposition s of F into a mixture of an a.c. probability measure and a singular probability measure. We claim that, when the graph of the f.s.g. is strongly connected, the coefficients (s) in these decompositions are all equal. This is because, by (3.4), (s) is a harmonic function on this (finite) graph and is therefore constant (take as the subset B in (3.4) the union of the supports of all the measures Fssing ). So if 0 <  = (s) < 1 then we have shown that

since the measures Fs for the sub-f.s.g. are the same as for the original one. Call a strongly connected component with pure states either a.c. or singular, according to the type of its states. The above discussion leads to an identification of the mixture coefficient (s0 ): it is simply the probability that the random walk eventually ends up in one of the a.c. terminal components. This probability is clearly rational, as it can be represented as the solution of a (well-posed) system of linear equations with rational coefficients. From the discussion it is also easy to see how to build an f.s.g. that computes Fac : simply delete any edges going into singular components, and renormalize the transition probabilities so that the sum of the probabilities of outgoing edges for any state is 1. (In other words, the new f.s.g. is the old f.s.g. conditioned never to go into a singular component.) A similar construction replacing the words "singular" and "a.c." computes Fsing .

4 Pro of of Theorem 1.2 Let F be a distribution function, computable by a given f.s.g. with state set S and initial state s0 , which is infinitely differentiable on an interval (a, b)  [0, 1]. Let x  (a, b) be a dyadic number, i.e. of the form x = k /2m for some integers m  1, 0  ľ = ľac + (1 - )ľsing k < 2m . For every n  m, we shall apply (3.4) n times repeatedly starting with the set where ľac and ľsing are vector-valued measures each S x 1 coordinate of which is a probability measure. But B = ,x + n then, both ľac and ľsing are easily seen to be 2 solutions of (3.4), and therefore we have found two ome notation will help: If the binary expansion of different (in fact, mutually singular) solutions to x is x = 0.1 2 . . . n (the last n - m digits are 0), (3.4), in contradiction to the fact that (3.4) has exactly one solution. Therefore  must be 0 or 1, and for   {0, 1} we denote by T the set operation and all the states are pure. T (C ) = 2C - , C  [0, 1]

then applying (3.4) successively gives the vector s0 Corollary 3.1.  = (s0 ) is rational, and Fac , equation string s0 Fsing are computable. ľ(B ) = A1 ľ(T1 (B )) = Proof. The states of the f.s.g. decompose into = A1 A2 ľ(T2  T1 (B )) = . . . strongly connected components. Call a strongly = A1 A2 . . . An ľ(Tn  ˇ ˇ ˇ  T1 (B )) connected component terminal, if it has no edges = (A1 . . . Am )(Am+1 . . . An )ľ([0, 1]) going out to other strongly connected components. = (A1 . . . Am )An-m ľ([0, 1]) 0 Clearly, with probability one the random walk on n =: Ax An-m ľ([0, 1]) = Ax A0 -m 1. 0 the states must end up in a terminal component. Looking at a terminal component as a sub-f.s.g., Here, 1 is the vector of all ones (1)sS , and Ax is, as Lemma 3.1 implies that its states must be pure, above, the matrix with rational entries obtained by

136


(al+1 , bl+1 )  (a, b) where the (l + 1)'th derivative of F is nonzero. The l'th derivative is strictly monotone on (al+1 , bl+1 ), and hence it crosses zero at most once. Hence there is a subsegment (al , bl )  (al+1 , bl+1 ) where both the l'th derivative and the (l + 1)'th derivative are nonzero. Continuing by induction one where 1s0 is the state vector all of whose coordinates obtains an interval (a , b )  (a, b) where all deriva11 are 0 except the s0 -th coordinate, which is 1. Now tives up to (l + 1) are non-zero. This is a conobserve that, since F is infinitely differentiable at tradiction to the assumption that F has at most x, then for any j the left-hand side of (4.6) has the l nonzero derivatives in every point of D (since asymptotic expansion as n   D  (a1 , b1 ) = ). - x ( 1 F x) 1 1 F (x) = F (x) ˇ n + ˇ 2n + . . . F +n 2 2 2 2 5 Op en problems Several natural questions arise from the paper: F 1 F (j ) (x) 1 + ˇ jn + O j! 2 2(j +1)n 1. Our proof of Theorem 1.1, which is presented in a somewhat abstract form, can easily be transor the right-hand side of (4.6), on the other hand, lated into an algorithm for constructing an f.s.g. we can write down a complete expansion in terms that computes a given polynomial distribution of the eigenvalues 1 , 2 , . . . , l of the matrix A0 : function F . The resulting algorithm, however, clearly it must be of the form seems to generate extremely large f.s.g.'s, as a l function of the degree of the given polynomial i and the denominators of its coefficients. ci n pi (n) i multiplying A0 's and A1 's corresponding to the m bits in the binary expansion of x. Taking the s0 -th coordinate in the above equation we obtain - x 1 F (x) = 1T0 Ax An-m 1 (4.6) F (B ) = F +n s 0 2
=1

for some constants ci and polynomials pi (t) derived from x, the matrices Ax , A0 and the vectors 1s0 , 1 (the polynomials pi appear when A0 is not diagonalizable). Equating the two expansions as n  , we conclude the following. Lemma 4.1. At any dyadic x  (a, b), F can have at most |S | nonzero derivatives. The proof of Theorem 1.2 will be complete once we prove the following simple lemma: Lemma 4.2. Let F be an infinitely differentiable function on an interval (a, b), and let D  (a, b) be a dense subset, such that in every point x  D, F has at most l nonzero derivatives. Then F is a polynomial on (a, b) of degree at most l. Proof. Suppose for the sake of contradiction that F is not a polynomial of degree at most l. Then there exists a point x  (a, b) where its (l + 1)'th derivative is nonzero. By continuity, there exists a subsegment

It is interesting to determine the complexity class of finding the smallest f.s.g. that computes a given polynomial. Another interesting question is to give a sharp bound on the number of states required to compute a polynomial of given parameters. 2. One may consider the same questions that are discussed here, in the case of pushdown automata. Partial results in this direction are given in [5]. 3. It may be of interest to investigate the computable distribution functions among the absolutely continuous (and not necessarily smooth) distributions. This class contains some peculiar specimens, such as the distribution computed by the f.s.g. in Figure 1 below. This distribution is absolutely continuous, yet its density function is nowhere locally bounded. 4. A sufficient condition for the distribution function F computed by a given f.s.g. to be a.c. is that any terminal component of the graph (considered as a sub-f.s.g.) outputs a uniform distri-

137


"0"/0.4

"0"/0.99

"0"/0.5

   R R R " "/0.2 " "/0.01 s0      "1"/0.4 "1"/0.5

Figure 1: An f.s.g. generating a nowhere bounded density

bution on [0, 1] starting from any of its states. Is this condition necessary? 5. Characterize all the atomic computable distributions. 6 Acknowledgments We would like to thank our anonymous referees for their careful reading of our paper and their helpful remarks. References
[1] S. Karlin, L. S. Shapley, Geometry of Moment Spaces, Memoirs of the Amer. Math. Soc. 12, 1953. [2] P. J. Kelly, M. L. Weiss, Geometry and Convexity, Wiley, 1979. [3] D. E. Knuth, A. C. Yao, The complexity of nonuniform random number generation. Algorithms and Complexity: New Directions and Recent Results, ed. J. F. Traub, Addison-Wesley, 1976. [4] E. Mossel, Y. Peres, New coins from old: computing with unknown bias. To appear. [5] A. C. Yao, Context-free grammars and random number generation. Combinatorial Algorithms on Words, ed. A. Apostolico and Z. Galil, NATO ASI SERIES, Springer-Verlag 1985.

138