Almost Tight Upper Bound for Finding Fourier Coefficients of Bounded Pseudo-Boolean Functions Sung-Soon Choi Random Graph Research Center Yonsei University Seoul, 120-749 Korea ss.choi@yonsei.ac.kr Kyomin Jung Department of Mathematics MIT Cambridge, MA02139, USA kmjung@mit.edu Jeong Han Kim Department of Mathematics and Random Graph Research Center Yonsei University Seoul, 120-749 Korea jehkim@yonsei.ac.kr Abstract A pseudo-Boolean function is a real-valued function defined on {0, 1}n . A k -bounded function is a pseudo-Boolean function that can be expressed as a sum of subfunctions each of which depends on at most k input bits. The k -bounded functions for constant k play an important role in a number of research areas including molecular biology, biophysics, and evolutionary computation. In this paper, we consider the problem of finding the Fourier coefficients of k -bounded functions with a series of function evaluations at any input strings. Suppose that a k -bounded function f with m non-zero Fourier coefficients is given. Our main result is to present an adaptive randomized algorithm to find the Fourier coefficients of f with high probability in O (m log n) function evaluations for constant k . Up to date, the best known upper bound is O ((n, m)m log n), where (n, m) is between 1 n 2 and n depending on m. Thus, our bound imn1 . proves the previous bound by a factor of 2 Also, it is almostmight w. th respect to the known t i log n To obtain the main relower bound log m sult, we first show that the problem of finding the Fourier coefficients of a k -bounded function is reduced to the problem of finding a k -bounded hypergraph with a certain type of queries under an oracle with one-sided error. For this, we devise a method to test with one-sided error whether there is a dependency within some set of input bits among a collection of sets of input bits. Then, we give a randomized algorithm for the hypergraph finding problem and obtain the desired bound by analyzing the algorithm based on a large deviation result for a sum of independent random variables. This work was supported by the Korea Science and Engineering Foundation (KOSEF) grant funded by the Korea government (MOST) (No. R16-2007-075-01000-0). This work was partially supported by Yonsei University Research Funds 2006-1-0078 and 2007-1-0025, and by the second stage of the Brain Korea 21 Project in 2007, and by the Korea Research Foundation Grant funded by the Korean Government (MOEHRD) (KRF-2006-312-C00455). 1 Introduction A pseudo-Boolean function is a real-valued function defined on the set of binary strings of fixed length. If a pseudoBoolean function can be expressed as a sum of subfunctions each of which depends on at most k input bits, it is called k -bounded. Given a 2-SAT formula, for example, the number of clauses an assignment satisfies is a 2-bounded pseudoBoolean function of the assignment. Note that a k -bounded pseudo-Boolean function is a polynomial of Boolean variables of degree k or less, and vice versa. In this paper, we consider the problem of finding the Fourier coefficients of k -bounded pseudo-Boolean functions. In the problem, we assume the oracle that, given any binary string, returns the function value at the string. Our main concern is the query complexity to solve the problem, i.e., the number of function evaluations required to find the Fourier coefficients of k -bounded pseudo-Boolean functions. (Unless otherwise specified, a k -bounded function means a k -bounded pseudoBoolean function in this paper.) The k -bounded functions have played an important role in molecular biology and biophysics. In those areas, a number of mathematical models have been proposed to study the evolution of a population of organisms (or biological objects) [Ewe79, FL70, KL87, Lew74, MP89]. In many of the models including the NK model [Kau89], k -bounded functions have been used to measure the fitness of an organism in an environment. In the NK model [Kau89], each subfunction represents the contribution of a gene of the organism to the overall fitness, interacting with a fixed number of other genes. Hence, a k -bounded function may be regarded as a sum of subfunctions each of which depends on at most k genes. The k -bounded functions with small k in the NK model induce the fitness landscapes of reasonable evolvability and complexity, which were used for describing the evolution of living systems [Kau93]. They were also used as a benchmark for comparing the landscapes arising in RNA folding [FSBB+ 93]. In this regard, k -bounded functions with small k have been paid attention. The k -bounded functions have been also used as testbed problems for comparing the performance of heuristic algorithms in the area of evolutionary computation [CC06, HG97, MM99, MG99, PG00]. The problem of maximizing arbitrary k -bounded functions is NP-hard even for k = 2 as it is at least as hard as the MAX-2-SAT problem [GJS76]. The larger the value of k is, the higher is the degree of the depen- dency among the input bits in a k -bounded function. By controlling the degree of the dependency (the value of k ), in general, we may control the difficulty of the problem of maximizing the k -bounded functions. There are good heuristic algorithms to approximate the maximum of a k -bounded function when the dependency among the input bits are known [dBIV97, Gol89, MM99, PGCP00, Str04]. Fourier transform is a formal approach to define the dependency among the input bits of a pseudo-Boolean function. There have been a number of papers addressing the problem of finding the Fourier coefficients of a k -bounded function f : {0, 1}n R with constant k . Kargupta and Park [KP01] presented a deterministic algorithm using O(nk ) function evaluations. Later, Heckendorn and Wright [HW03, HW04] proposed a randomized algorithm for the problem. They analyzed the algorithm to show that, with negligible error probability, it finds the Fourier coefficients in O(n2 log n) function evaluations on average for the k -bounded functions with O(n) non-zero Fourier coefficients generated from a random model. For the k -bounded functions with m nonzero Fourier coefficients, Choi, Jung, and Moon [CJM08] m f o proved that any randomized algorithm requires lolg g n m unction evaluations to find the Fourier coefficients with error probability at most a given constant. By analyzing the algorithm of Heckendorn and Wright, they also proved that O((n, m)m log n) function evaluations, where (n, m) is 1 between n 2 and n depending on m, are enough to find the Fourier coefficients. Recently, for 2-bounded functions of which non-zero Fourier coefficients are between n-a and nb in absolute value for some positive constants a and b, Choi and Kim [CK08] shmwed that there exists a deterministic o f o algorithm using O lolg g n unction evaluations, provided m that m n for any constant > 0. This algorithm is nonadaptive while the previous algorithms are adaptive.1 However, an explicit construction of the algorithm is unknown. Our main result is Theorem 1 Suppose that f is a k -bounded function defined on {0, 1}n for constant k and that f has m non-zero Fourier coefficients. Then, there exists an adaptive algorithm to find the Fourier coefficients of f in1O.(m log n) function evaluations with probability 1 - O n We prove Theorem 1 by showing an explicit construction of the desired algorithm. This result improves the best known n1 a upper bound O ((n, m)m log n) by a factor of 2 nd . m o it is almost tight with respect to the lower bound lolg g n m We should note that there have been a number of papers addressing the problem of finding the Fourier coefficients of Boolean functions [BJT04, BT96, Jac97, KM93, Man94]. The KM algorithm [KM93] is one of the most famous algorithms for the problem and most of the subsequent algorithms have been based on the algorithm. These algorithms for Boolean functions can be extended to pseudo-Boolean functions. However, the extensions of the algorithms do not 1 An algorithm is called adaptive if the algorithm uses a sequence of queries in which some queries depend on the previous queries. Otherwise, it is called non-adaptive. give a good bound for k -bounded pseudo-Boolean functions. One of the main reasons is that their query complexities depend on the values of the target function. For example, for a k -bounded function f , (to the best of our knowledge) the most efficient extension [BJT04] among those has the query rB2, complexity of where r is the number of input bits on which f depends, B is the maximum absolute value of f , and is the minimum absolute value of the non-zero Fourier coefficients of f . Thus, the query complexity may be made arbitrarily large depending on B and .2 The query complexity of our algorithm is independent of the values of the target function. To prove Theorem 1, we first show that the problem of finding the Fourier coefficients of a k -bounded function is reduced to the problem of finding a k -bounded hypergraph3 (with a certain type of queries under a probabilistic oracle). For a pseudo-Boolean function f defined on {0, 1}n , we consider the hypergraph representing the dependency among the input bits as follows. Suppose that H is a subset of [n], where [n] is the set of the integers from 1 to n. We say that there is a linkage among the input bits in H if, for any additive exi pression of f , f = fi , there is j such that H is included in the support set of fj .4 The linkage graph of f is a hypergraph Gf = ([n], E ), where each bit in [n] represents a vertex and a subset H of [n] belongs to the edge set E if and only if there is a linkage among the bits in H . For example, consider the following function: f (x1 , x2 , x3 , x4 , x5 ) = 5x1 x2 - 3x2 x3 x4 . If we let f1 (x1 , x2 ) = 5x1 x2 and f2 (x2 , x3 , x4 ) = -3x2 x3 x4 , f can be represented as an additive expression, f = f1 + f2 . In this expression, each subfunction of f has a support set of which size is at most three and so f is 3-bounded. It can be shown that the support sets of f1 and f2 , {1, 2} and {2, 3, 4}, are hyperedges of Gf . By definition of linkage, the nonempty subsets of {1, 2} and {2, 3, 4} are also hyperedges of Gf . Generally, if a set of vertices is a hyperedge of Gf , then any non-empty subset of the set is also a hyperedge of Gf . We call this property the hierarchical property among hyperedges. The linkage graph Gf has nine hyperedges: {1}, {2}, {3}, {4}, {1, 2}, {2, 3}, {2, 4}, {3, 4}, and {2, 3, 4}. There is no hyperedge containing 5 since f does not depend on x5 . It is known that, for a k -bounded function with constant k , the problem of finding the Fourier coefficients is asymptotically equivalent to the problem of finding the linkage graph in terms of the number of function evaluations [HW03, HW04]. 2 To see a typical behavior of the complexity, we may consider the NK model [Kau89]. The NK model with parameters N = n and K = k - 1 generates a class of k-bounded functions that are expressed as a sum of n subfunctions. When f is a function randomly generated from the NK model with parameters N = n and K = k - 1 for constant k, it is not difficult to show that r = (n), B = (n), and = O(1) with high probability. ` hu´ , the query Ts complexity of the algorithm [BJT04] for f is n3 with high probability while the query complexity of our algorithm for f is O (n log n). 3 A hypergraph is k-bounded if the order of each hyperedge is at most k. 4 The term linkage is from genetics and it means the interaction among the genes. (The description of the asymptotic equivalence is provided in Section 2.) In a hypergraph, we say that a hyperedge crosses among certain disjoint sets of vertices if the number of the sets is equal to the order of the hyperedge and each of the sets contains exactly one vertex in the hyperedge. (By definition, a hyperedge of order one crosses among any set of vertices including the hyperedge.) Our main contribution is to show that, given a collection of disjoint sets of vertices, the existence of a hyperedge of the linkage graph crossing among those sets is testable with one-sided error by using a constant number of function evaluations. Theorem 2 Suppose that f is a k -bounded function defined on {0, 1}n and S1 , . . . , Sj are j disjoint subsets of [n]. Then, we can use 2j function evaluations of f to test the existence of a hyperedge in the linkage graph Gf crossing among Si 's, where the test result is correct with probability at least 21k 2 if such a hyperedge exists and it is correct with probability 1 otherwise. Theorem 2 is an extension of a previous theorem of Heckendorn and Wright [HW04] (Proposition 1 in Section 2), which holds only for the case when each of Si 's is a singleton set of vertices. To prove Theorem 2, we devise a random perturbation method for testing the existence of a hyperedge. It tests the existence of a hyperedge by flipping a randomly generated string at certain bit positions and evaluating the function values at the flipped strings. We obtain the desired result by analyzing the method. The analysis extensively uses the properties of basis functions in the Fourier transform of a k -bounded function. Theorem 2 implies that the problem of finding the linkage graph of a k -bounded function is reduced to the following graph finding problem. Suppose that a hypergraph G has n vertices and m hyperedges and the hyperedges of G are unknown. A cross-membership query asks the existence of a hyperedge crossing among certain disjoint sets of vertices. We assume the oracle with one-sided error as follows. Given a cross-membership query, the oracle correctly answers with probability at least 1 - if the true answer for the query is YES and it correctly answers with probability 1 otherwise. The problem is to find the hyperedges of G by using as few queries to the oracle as possible. In fact, it is enough for our purpose to consider the hypergraph finding problem for the k -bounded hypergraphs with the hierarchical property. Since we think that the problem is of self interest, however, we consider the problem for arbitrary k -bounded hypergraphs. We present an adaptive randomized algorithm for the problem to show Theorem 3 Suppose that G is an unknown k -bounded hypergraph with n vertices and m edges for constant k . Then, for any constant 0 < 11 the hyperedges of G can be ,b found with probability 1-O n y using O (m log n) crossmembership queries under the oracle with one-sided error . (The number of cross-membership queries is 2O(k) in k .) Our algorithm for Theorem 3 iteratively uses binary search to find the hyperedges. In this sense, it is analogous to the algorithm of Angluin and Chen [AC04, AC05, AC06] for the hypergraph finding problem with edge-detecting queries or to the algorithm of Reyzin and Srivastava [RS07] for the graph finding problem with edge-counting (or additive) queries.5 On the other hand, since the answers of the oracle may contain errors in our situation, we need to handle the error bound more carefully, which is the main task in proving Theorem 3. A large deviation result for a sum of independent random variables with geometric distribution is crucially used for the task. (There have been a number of papers addressing the problem of finding a graph or a hypergraph by using various types of queries. For example, see [AA04, AA05, ABK+ 02, ABK+ 04, AC06, BAA+ 01, BGK05, CK08, GK00].) Theorem 1 is obtained from Theorems 2 and 3 and the equivalence between the problems of finding the Fourier coefficients and the linkage graph. The remainder of the paper is organized as follows. In Section 2, we review some basic facts and previous results for the problem of finding the Fourier coefficients of k -bounded functions. In Section 3, we prove Theorem 2, which states the linkage testability of a linkage graph, by proving relevant lemmas. Section 4 deals with the graph finding problem with cross-membership queries under the probabilistic oracle as an independent problem. In the section, we give a randomized algorithm for the problem and analyze it to obtain Theorem 3. In Section 5, some remarks on the query and time complexity of the proposed algorithm are provided along with a factor of improving the complexity. Finally, concluding remarks closes the paper in Section 6. 2 Preliminaries 2.1 Linkage Test Function Munetomo and Goldberg [MG99] proposed a perturbation method to test for the existence of linkage in a 2-subset of [n]. Given a 2-subset S and a string x, it checks the nonlinearity between the two bits in H by flipping the two bits of x individually and simultaneously and adding/subtracting the function values at the flipped strings. Heckendorn and Wright [HW04] generalized the method to detect linkage for subsets of any order. Suppose that f is a pseudo-Boolean function defined on {0, 1}n , S is a subset of [n], and x is a string in {0, 1}n . They considered the linkage test function L depending on f , S , and x as follows: A L(f , S, x) = (-1)|A| f (x 1A ) . S Here, 1A represents the string consisting of ones in the bit positions of A and zeros in the rest. For two strings x, y {0, 1}n , x y means the bitwise addition modulo 2 of x and y . The linkage test function L performs a series of function evaluations at x and the strings obtained by flipping x in order to detect the existence of the linkage among the bits in S . Heckendorn and Wright [HW04] proved the following theorem, which shows the usefulness of the linkage test function in finding hyperedges of Gf . 5 An edge-detecting query asks the existence of an edge (or a hyperedge) in a set of vertices while an edge-counting query asks the number of edges (or hyperedges) in a set of vertices. Proposition 1 Suppose that f is a k -bounded function defined on {0, 1}n . Then, the followings hold: (a) A subset S of [n] is a hyperedge of Gf if and only if L(f , S, x) = 0 for some string x {0, 1}n . (b) For a hyperedge S of order j in Gf , the probability that L(f , S, x) = 0 for a string x chosen uniformly at random from {0, 1}n is at least 2k1 j . - Proposition 1 indicates that the linkage test function determines the existence of a hyperedge with one-sided error. Thus, by repeatedly evaluating the linkage test function for randomly chosen strings, we can make the error arbitrarily small. In particular, when k is a constant, this implies that a constant number of linkage tests (consequently, a constant number of function evaluations) is enough for determining the existence of a hyperedge with error probability at most a given constant. The hierarchical property among hyperedges implies that, for j 2, a j -subset H can be a hyperedge only if every (j - 1)-subset of H is a hyperedge. Based on this observation, Heckendorn and Wright [HW04] proposed a randomized algorithm that performs linkage test only for such a hyperedge candidate: The algorithm first detects the hyperedges of order one by investigating all the singleton subsets of [n]. Then, for j from 2 to k , it detects the hyperedges of order j by performing linkage test for the hyperedge candidates of order j that have been identified from the information of the hyperedges of lower order. Recently, the performance of the algorithm was fully analyzed by Choi et al. [CJM08]. Given a k -bounded function f with m hyperedges and a constant > 0, they showed that the algorithm finds the linkage graph Gf in O ((n, m)m log n) function evaluations with error probability at most , where (n, m) 1 is between n 2 and n depending on m. 2.2 A Fourier Transform Walsh transform is a Fourier transform for the space of pseudoBoolean functions in which a pseudo-Boolean function is represented as a linear combination of 2n basis functions called Walsh functions [Wal23]. For each subset H of [n], the Walsh function corresponding to H , H : {0, 1}n R, is defined as P H (x) = (-1) iH x[i] , where x[i] represents the ith bit value in x. If we define an inner product of two pseudo-Boolean functions f and g as x f (x) · g (x) f, g = , 2n n {0,1} Heckendorn and Wright [HW04] provided a number of results to show the relation between the linkage test function and the Fourier coefficients. Some of them are summarized in the following proposition. Proposition 2 Suppose that f is a pseudo-Boolean function defined on {0, 1}n . Then, the followings hold: (a) For a subset H of [n], f (H ) is a maximal non-zero Fourier coefficient of f if and only if H is a maximal hyperedge of Gf . (b) For a maximal hyperedge H [n], n f (H ) = L(f , H, 0 ) . |H | 2 (c) For a subset H of [n], n H f (H ) = L(f , H, 0 ) - (H ). 2|H | Hf (d) For subsets H and H A L(f , H , 0n ) = o f [n] with H H \H , (-1)|A| L(f , H, 1A ). H Proposition 2 (a) says that the subsets of [n] with maximal non-zero Fourier coefficients of f are the maximal hyperedges in the linkage graph of f . Thus, from Proposition 2 (b), the maximal non-zero Fourier coefficients of f are found by evaluating the linkage test function at the zero string for each maximal hyperedge. Once the maximal nonzero Fourier coefficients are found, the Fourier coefficients corresponding to the subsets of lower orders can be found by successively applying Proposition 2 (c). Proposition 2 (d) implies that no additional function evaluations are required for finding the Fourier coefficients corresponding to the subsets of lower orders. Hence, if f is k -bounded for constant k and m is the number of hyperedges in Gf , O(m) additional function evaluations are enough to m nd thef Fourier fi o coefficients of f . On the other hand, lolg g n unction m evaluations are required for finding the linkage graph of a k -bounded function for constant k as shown in [CJM08]. Thus, the problem of finding the Fourier coefficients of a k -bounded function for constant k is equivalent to the problem of finding the linkage graph in terms of the number of function evaluations required up to a constant factor. 3 Generalized Linkage Test 3.1 Generalized Linkage Test Function Let f be a pseudo-Boolean function defined on {0, 1}n , S be a collection of disjoint subsets of [n], and x be a string in {0, 1}n . We define the generalized linkage test function L depending on f , S , and x as follows: x A . S |S | L (f , S , x) = (-1) f A S the set of Walsh functions, {H | H [n]}, becomes an orthonormal basis of the space of pseudo-Boolean functions. Hence, a pseudo-Boolean function f can be represented as H f (H ) · H , f= [n] where f (H ) = f, H is called the Fourier coefficient corresponding to H . Specifically, if f (H ) = 0 and f (H ) = 0 Hf for any H , (H ) is called a maximal non-zero Fourier coefficient of f . We refer to [HW99] for surveys of the properties of Walsh functions and Walsh transform in the space of pseudo-Boolean functions. S 1 If we let SH = {{a} | a H } for a subset H of [n], we see that L (f , SH , x) = L(f , H, x) for any x {0, 1}n . The following lemmas describes the basic properties of the generalized linkage test function. Lemma 4 Suppose that S is a collection of disjoint subsets of [n]. Then, the followings hold: (a) (Linearity) If f1 , . . . , f are pseudo-Boolean functions defined on {0, 1}n and c1 , . . . , c are constants, i =i L =1 n crossing among R. This implies that f (H ) = 0 for all H such that H A = for all A R, by definition of Gf and Proposition 2. Thus, by Lemma 4 (a), L (f , R, x) = H f (H )L (H , R, x) , ci fi , S , x =1 ci L (fi , S , x) for all x {0, 1} . (b) (Recursion) If f is a pseudo-Boolean function defined on {0, 1}n , L (f , S , x) = L (f , S \ {A}, x) - L (f , S \ {A}, x 1A ) for any A S and any x {0, 1}n . Proof: Omitted. Lemma 5 Suppose that f is a pseudo-Boolean function defined on {0, 1}n and S is a collection of disjoint subsets of [n]. If the support set of f is disjoint with some A S , L (f , S , x) = 0 for all x {0, 1}n . Proof: Omitted. 3.2 Linkage Test Theorem A collection of disjoint subsets of [n], R = {R1 , . . . , Rj }, is called a setwise subcollection of S if Ri Si for all 1 i j . In this case, we denote by R S. Note that a setwise subcollection R of S is allowed to contain multiple empty sets from the definition. We consider a random model (S ) that generates a setwise subcollection of S as follows: For each Si S , we select each element in Si independently and with probability 1 and put it into Ri . Then, we build a 2 setwise subcollection R of S by letting R = {Ri | 1 i j }. In the following, str(R) denotes the set of the strings, x's, such that x has the same bit value in the bit positions in Ri for all 1 i j : str(R) = {x {0, 1}n | x[a] = x[b] for all a, b such that a, b Ri for some i with 1 i j } . Theorem 6 Suppose that f is a k -bounded function and S is a collection of disjoint subsets of [n]. Then, the followings hold: (a) The linkage graph Gf contains a hyperedge crossing among S if and only if there exist R S and x str(R) such that L (f , R, x) = 0. (b) If Gf contains a hyperedge crossing among S , the probability that L (f , R, x) = 0 for a randomly generated R from (S ) and a string x chosen uniformly at random from str(R) is at least 21k . 2 Theorem 6 implies Theorem 2 and provides an efficient method to test for the existence of a hyperedge crossing among a given collection of sets of vertices in the linkage graph. Proof: Since (b) implies the only-if part of (a), we first prove the if part of (a) and then prove (b). Suppose that Gf does not contain any hyperedge crossing among S . Let R be a setwise subcollection of S and let x be a string in {0, 1}n . Since Gf does not contain any hyperedge crossing among S , Gf does not contain any hyperedge where the summation is over the subsets, H 's, such that H A = for some A R. Since the support set of H is H for any H [n], L (H , R, x) = 0 for H 's in the summation by Lemma 5 and so L (f , R, x) = 0. Now, consider the proof of (b). Let S = {S1 , . . . , Sj }. For a setwise subcollection R of S , let R = {R1 , . . . , Rj }, where Ri Si for all 1 i j . Let ri = |Ri | and j r = i=1 ri . For each Ri , set a distinct bit position ai i [n - r + j ] and, for each i [n] \ ( Ri ), set a distinct bit position bi [n - r + j ]. For each H [n], define H,R : {0, 1}n-r+j R as follows: If |H Ri | is odd for all 1 i j , H,R (y ) = (-1) Pj i=1 y [ai ]+ P S i H \( i Ri ) y [ bi ] for any y {0, 1}n-r+j . Otherwise, H,R is the zero function that assigns zero value to all input strings y {0, 1}n-r+j . For each x str(R), assign the string yx,R {0, 1}n-r+j such that yx,R [ai ] = x[a] for some a Ri for all 1 i j i and yx,R [bi ] = x[i ] for all i [n] \ ( Ri ). Note that {yx,R |x str(R)} = {0, 1}n-r+j and the sets str(R) and {yx,R |x str(R)} are in one-to-one correspondence. Letting SR = {{ai } | 1 i j }, we have Claim 7 For any H [n], L (H , R, x) = L (H,R , SR , yx,R ) for all x str(R). Proof: Suppose that |H Ri | is odd for all 1 i j . Let ui be u bit position in H Ri for 1 i j . For all x str(R), a H Ri x[u] = |H Ri | · x[ui ] = x[ui ] (mo d 2) for all 1 i j and so H (x) = = (-1) (-1) PP i uH Ri x[u]+ P S i H \( i Ri ) x[i ] P i x[ui ]+ y P S i H \( i Ri ) x[i ] = (-1) i x,R = H,R (yx,R ). P [ai ]+ P S i H \( i Ri ) yx,R [bi ] A f Let xR = x or R R. i f x str(R), I R 1A . xR str(R) and yxR ,R = yx,R :Ri R 1{ai } Hence, for all x str(R), L (H , R, x) = R | (-1)|R H = R x A A R 1 R =R = = R = S (-1)|R H (xR R | ) (-1)|R H,R R | y xR |H Sl | for all 1 l i - 1. Let Ai be a set consisting of an element in H Si and let Bi = (H Si ) \ Ai . Let R = {R1 , . . . , Rj }, where Ri Si for all 1 i j . Since i i Ai Bi = H Si and |Ai Bi | = |H Si | |H | k , the probability that Ri Ai and Ri Bi for all 1 i j is at least 21 . k Consider the condition that Ri Ai and Ri Bi for all 1 i j . Denote H = {H [n] | f (H ) = 0, i H (i Bi ) (H \ (i Si )) , and |H Si | = |H Si | for all i}. {ai } :Ri R 1 ,R y (-1) R |{ai |Ri R }| H,R y x,R B x,R (-1) S R |S | H,R S 1 B L (H,R , SR , yx,R ). Now, suppose that |H Ri | is even for some i. For all x {0, 1}n , L (H , R \ {Ri }, x) = L (H , R \ {Ri }, x 1Ri ) and so L (H , R, x) = 0 by Lemma 4 (b). Since H,R is the zero function, on the other hand, L (H,R , SR , y ) = 0 for all y {0, 1}n-r+j . Hence, L (H , R, x) = L (H,R , SR , yx,R ) for all x str(R). Define the pseudo-Boolean function gf ,R : {0, 1}n-r+j R by H f (H ) · H,R . gf ,R = [n] It is clear that H H . Given the condition, if H,R = H ,R , H should be in H . Thus, in the Walsh transform of gf ,R , the Walsh coefficient corresponding to the Walsh funcHf tion H ,R is equal to (H ), where the summation is over H 's such that H H and (H Si \ Bi ) Ri for all i. Since H was chosen in a maximal sense as mentioned, for any H H , |H Si \ Bi | = 1 for all 1 i j . Thus, when we choose each element in Si \ (Ai Bi ) independently and 1 with probability 2 and put it into Ri , the conditional probaHf bility that (H ) = 0, where the summation is over H 's such that H H and (H Si \ Bi ) Ri for all i, is at 1 least 2j . In this case, H ,R may be expressed as H for H [n - r + j ] such that H = {ai | 1 i j } {bi | i (i Bi ) (H \ (i Si ))} Claim 8 For all x str(R), L (f , R, x) = L (gf ,R , SR , yx,R ). Proof: By Lemma 4 (a) and Claim 7, H f (H ) · L (H , R, x) L (f , R, x) = [n] and the Walsh coefficient corresponding to H in the Walsh transform of gf ,R is non-zero. At this time, the linkage graph of gf ,R has the j -hyperedge crossing among SR = {{ai } | 1 i j }. Therefore, the probability that the linkage graph of gf ,R has the hyperedge crossing among SR for a setwise subcollection R randomly generated from (S ) is at least 2k1 j and + the proof is completed. Since f is a k -bounded function, gf ,R is also k -bounded. Thus, when the linkage graph of gf ,R has the hyperedge crossing among SR , the probability that L (gf ,R , SR , y ) = 0 for a string y chosen uniformly at random from {0, 1}n-r+j is at least 2k1 j by Proposition 1 (b). Hence, by Claim 9, the - probability that L (gf ,R , SR , y ) = 0 for a setwise subcollection R randomly generated from (S ) and a string y chosen uniformly at random from {0, 1}n-r+j is at least 21k . Since 2 the sets str(R) and {0, 1}n-r+j = {yx,R | x str(R)} are in one-to-one correspondence, we have the part (b) of the theorem by Claim 8. = H [n] f (H ) · L (H,R , SR , yx,R ) = L (gf ,R , SR , yx,R ), for all x str(R). Suppose that Gf contains a hyperedge crossing among S. Claim 9 Suppose that a setwise subcollection R is randomly generated from (S ). Then, the probability that the linkage graph of gf ,R has the hyperedge crossing among SR is at least 2k1 j . + Proof: Since Gf contains a hyperedge crossing among S , there exist subsets H 's such that f (H ) = 0 and H Si = for all Si S . Among those subsets, we choose a maximal subset H in viewpoint of the size of intersection with Si 's: For each 1 i j , |H Si | |H Si | for any H such that f (H ) = 0, H Si = for all Si S , and |H Sl | = 4 Finding Graphs with Cross-Membership Queries In this section, we focus on the problem to find an unknown hypergraph with cross-membership queries under the oracle with one-sided error . Recall that, given a cross-membership query, the oracle with one-sided error correctly answers with probability at least 1 - if the true answer for the query is YES and it correctly answers with probability 1 otherwise. Section 4.1 presents a randomized algorithm for the graph finding problem. The algorithm is analyzed in Section 4.2, which induces Theorem 3. GRAPHFINDINGALGORITHM(n,k,) // Ej : the set of the hyperedges of order j found so far // Q : the set of the vertices in the hyperedges of order j found so far // W : the set of the vertices v such that all the hyperedges of order j containing v have been found by the algorithm for j from 1 to k Q , W ; Ej ; repeat (Si )j=1 C H E C K E X I S T E N C E(,W ,j ); i if (Si )j=1 = NULL, break; i v B I NA RY S E A R C H((Si )j=1 ,1); i Q Q {v }; while Q \ W = choose a vertex v in Q \ W ; Ev,j F I N D H Y P E R E D G E S({v },W ,j ); Ej Ej EH,j ; v ; QQ Ev,j H W W {v }; k E j =1 Ej ; return E ; Figure 1: Main procedure of the algorithm G FA (The output of G FA is the set of the hyperedges of the input graph that have been found. For the subprocedures, C H E C K E X I S T E N C E, B I NA RY S E A R C H, and F I N D H Y P E R E D G E S, see Figures 2, 3, and 4, respectively.) 4.1 Algorithm for Finding Graphs algorithm. The variable Ej contains the hyperedges of order j found so far. To check the existence of a new connected component of two or more vertices in the subgraph consisting of the hyperedges of order j , G R A P H F I N D I N G A L G O R I T H M calls the subprocedure C H E C K E X I S T E N C E. Given sets of vertices U and W and a positive integer j , the procedure C H E C K E X I S T E N C E performs a randomized test for whether there is a hyperedge of order j that contains all the vertices in U and does not contain the vertices in W . For the purpose, it iteratively generates a collection of disjoint sets of vertices (Si )j=1 for a cross-membership query i as follows. Letting U = {v1 , . . . , v|U | }, the set Si is fixed with Si = {vi } for 1 i |U |. The sets S|U |+1 , . . . , Sj are generated as a uniform random partition of vertices in [n] \ (U W ). If the oracle answers YES for the crossmembership query with some (Si )j=1 , there is a hyperedge i of order j crossing among Si 's, which contains the vertices in U and does not contain the vertices in W . In this case, C H E C K E X I S T E N C E returns the generated sets (Si )j=1 . If the i oracle answers NO for all the generated collections of disjoint sets, C H E C K E X I S T E N C E returns NULL regarding that there is no such a hyperedge. If C H E C K E X I S T E N C E returns NULL, G R A P H F I N D I N G A L G O R I T H M regards that there is no hyperedge of order j and continues to find the hyperedges of order j + 1. If C H E C K E X I S T E N C E returns a (non-NULL) collection of disjoint sets of vertices, this implies that there is a hyperedge of order j . To find a vertex in the hyperedge, G R A P H F I N D I N G A L G O R I T H M calls the subprocedure B I NA RY S E A R C H . Given a collection of disjoint sets of vertices (Si )j=1 and a i positive integer r between 1 and j , the procedure B I NA RY- In this section, we present the algorithm to find an unknown hypergraph with cross-membership queries under the oracle with one-sided error , the Graph Finding Algorithm (GFA). The algorithm GFA takes three arguments: The number of vertices of the unknown hypergraph n, the order of the hypergraph k , and the error bound for the answer of the oracle 0 < 1. It returns the set of the hyperedges of the hypergraph that have found. The algorithm GFA consists of the main procedure G R A P H F I N D I N G A L G O R I T H M (Figure 1) and the three subprocedures C H E C K E X I S T E N C E (Figure 2), B I NA RY S E A R C H (Figure 3), and F I N D H Y P E R E D G E S (Figure 4). In the pseudocode, the values of n, k , and can be accessed by any procedure. All other variables are local to the given procedure. Suppose that G is an unknown hypergraph given to GFA and let Gj be the induced subgraph of G consisting of the hyperedges of order j for 1 j k . The algorithm GFA successively finds the hyperedges of G1 , G2 , and so on. After the algorithm finally finds the hyperedges of Gk , it returns all the hyperedges found so far. To find the hyperedges of Gj for j = 1, . . . , k , the algorithm iteratively checks whether there is a hyperedge of order j that has not been found and, if such a hyperedge exists, the algorithm finds all the hyperedges in the connected component that the hyperedge belongs to. It continues this process until there is no more hyperedge that can be found. In the main procedure G R A P H F I N D I N G A L G O R I T H M, the variable Q contains the vertices in the hyperedges found so far. The variable W contains the vertices v such that all the hyperedges of order j containing v have been found by the CHECKEXISTENCE(U ,W ,j ) label the vertices in U as v1 , . . . , v|U | ; for i from 1 to |U | Si {vi }; for i from |U | + 1 to j Si ; j j+ repeat e 1- 1 log n times for each v [n] \ (U W ) choose i uniformly at random from {|U | + 1, . . . , j }; Si Si {v }; if CMQ(S1 , . . . , Sj ) = YES return (Si )j=1 ; i return NULL; Figure 2: Procedure to check the existence of a hyperedge of order j that contains all the vertices in U and does not contain the vertices in W (Here, CMQ((Si )j=1 ) is the answer of the oracle for the cross-membership query (Si )j=1 .) i i B I NA RY S E A R C H((Si )j=1 ,r) i if |Sr | = 1, return the vertex in Sr ; + repeat 6(j-1) log n times 1 choose a subset Sr of Sr uniformly at random among the subsets of order |Sr | ; 2 if CMQ(S1 , . . . , Sr-1 , Sr , Sr+1 , . . . , Sj ) = YES, r Sr S ; if |Sr | = 1, return the vertex in Sr ; return a vertex in Sr ; Figure 3: Procedure to search a vertex in Sr that is contained in a hyperedge of order j crossing among S1 , . . . , Sj (Here, CMQ((Si )j=1 ) is the answer of the oracle for the cross-membership query (Si )j=1 .) i i S E A R C H returns a vertex that is in Sr and in one of the hyperedges crossing among Si 's. Among the subsets of Sr of order |Sr | , it chooses a subset Sr uniformly at random. For 2 the sets of vertices (Si )j=1 in which Sr is replaced with Sr , i it asks the cross-membership query to check whether there is a hyperedge crossing among the sets. If the answer of the oracle is YES, i.e., if it turns out that there is a hyperedge crossing among the sets, it replaces Sr with Sr . The procedure B I NA RY S E A R C H repeats this process at most a specified number of times until there remains one vertex in Sr . If there remains one vertex in Sr before the specified number of iterations, B I NA RY S E A R C H returns the vertex. Otherwise, it fails to exactly search the desired vertex and returns an arbitrary vertex in Sr . Once a vertex in the new connected component is found by B I NA RY S E A R C H, G R A P H F I N D I N G A L G O R I T H M puts the vertex into Q and repeats the following process while Q \ W = . It chooses a vertex v in Q \ W and finds all the hyperedges of order j containing v by calling the subprocedure F I N D H Y P E R E D G E S. Given two sets of vertices U and W and a positive integer j , F I N D H Y P E R E D G E S returns the set of the hyperedges of order j that contain the vertices in U and do not contain the vertices in W . In the procedure F I N D H Y P E R E D G E S, the variable A contains the vertices such that the desired hyperedges of order j containing the vertices in A have been found. Initially, A is set to be empty. If |U | = j , U is the only hyperedge of order j containing the vertices in U and F I N D H Y P E R E D G E S returns the set consisting of U . Otherwise, it recursively finds the desired hyperedges of order j as follows. By calling C H E C K E X I S T E N C E, It first checks whether there is a hyperedge of order j that contains the vertices in U and does not contain the vertices in W . If C H E C K E X I S T E N C E returns NULL, F I N D H Y P E R E D G E S regards that there is no such a hyperedge and returns the set of the hyperedges found so far. Otherwise, it chooses a vertex v in the hyperedge by calling B I NA RY S E A R C H. Then, it finds the hyperedges of order j that contain the vertices in U {v } and does not contain the vertices in W A by calling F I N D H Y P E R E D G E S recursively. After that, it puts v into A and continues to find the desired hyperedges of order j not containing the vertices in A. After all the hyperedges of order j containing v are found, they are put into Ej . The vertices contained in the hyperedges are put into Q to mark that they are in the connected component being searched. The vertex v is put into W to prevent the hyperedges of order j containing v from being searched again. 4.2 Algorithm Analysis In this section, we analyze the algorithm GFA to obtain Theorem 3. We first analyze the number of cross-membership FINDHYPEREDGES(U ,W ,j ) if |U | = j , return {U }; EU,j , A ; repeat (Si )j=1 C H E C K E X I S T E N C E(U ,W A,j ); i if (Si )j=1 = NULL, break; i v B I NA RY S E A R C H((Si )j=1 ,|U | + 1); i EU,j EU,j F I N D H Y P E R E D G E S(U {v },W A,j ); A A {v }; return EU,j ; Figure 4: Procedure to find the hyperedges of order j that contain all the vertices in U and do not contain the vertices in W queries used in GFA. Lemma 10 Suppose that G is an unknown k -bounded hypergraph with n vertices and m hyperedges for constant k . Then, for any constant 0 < 1, GFA uses O (m log n) cross-membership queries for G under the oracle with onesided error . Proof: Omitted. To analyze the error probability of GFA, we need a large deviation result for a sum of independent random variables following geometric distributions. A random variable X follows the geometric distribution with parameter p if, for a coin of which HEAD appears with probability p, X is the number of coin tosses until the first HEAD appears. It is easy to show 1 that the expectation of X is p . We obtain the desired result by using the Chernoff bound as follows [Che52, MR95]. 3 Suppose that, for some 0 < p 1, X1 , . . . , X re independent random variables such that Pr[Xi = 1] = p and Pr[Xi = 0] = 1 - p for all 1 i . Let X = i =1 Xi . Then, for any 0 < 1, . - E[X ]2 Pr [X (1 - )E[X ]] exp 2 P a roposition Now, we present the result for a sum of independent random variables following geometric distributions. Lemma 11 Suppose that, for some 0 < p 1, X1 , . . . , X a re independent random variables each of which follows the i geometric distribution with parameter p. Let X = =1 Xi . Then, for any > 0, - . 2 2 Pr [X > (1 + )E[X ]] exp (1 + ) Proof: Omitted. Lemma 12 Suppose that G is an unknown k -bounded hypergraph with n vertices and m hyperedges for constant k . Then, for any 0 < 1, GFA correctly finds the hyper1u edges of G with probability 1 - O n nder the oracle with one-sided error . Proof: We will show that the probab1lity that GFA does not if find all the hyperedges of Gj is O n or each j with 1 j k . Then, the lemma follows by the union bound. We first consider the probability that C H E C K E X I S T E N C E performs incorrectly for given arguments U , W , and j . Suppose that there is no hyperedge of order j in G that contains the vertices in U and does not contain the vertices in W . In this case, C H E C K E X I S T E N C E returns NULL and the probability of C H E C K E X I S T E N C E being incorrect is zero. Suppose that there is a hyperedge of order j in G that contain the vertices in U and does not contain the vertices in W . Let U = {v1 , . . . , v|U | } and let the hyperedge of order j be {v1 , . . . , v|U | , v|U |+1 , . . . , vj }. The probability that ( v|U |+1 , . . . , vj are put into different Si 's is (j -j|-|U |)!|U | . When U |) j - v|U |+1 , . . . , vj are put into different Si 's, the probability that the oracle answers YES for the cross-membership query (Si )j=1 i is at least 1 - . Thus, for each iteration of the repeat loop in C H E C K E X I S T E N C E, the probability that the hyperedge is not ( detected is at most 1 - (j -j|-|U |)!|U | (1 - ). Hence, the probU |) j - j+ i ability that the hyperedge is not detected for e 1- 1 log n terations of the repeat loop is at most j+ 1 ej1- 1 log n (j - |U |)! . - (1 - ) (j - |U |)j -|U | j By using the fact that 1 - x e-x for any real x, this value is at most - . (j - |U |)!ej j + 1 exp log n (j - |U |)j -|U | ( j! After some calculation using the facts that (j -j|-|U |)!|U | j j U |)j - jj 1 and j ! > 2 j e e 12j+1 , we have - (j - |U |)!ej j + 1 exp log n (j - |U |)j -|U | exp (-(j + 1) log n) 1 = j +1 . n Thus, the probability of C H E C K E X I S T E N C E being incorrect is at most nj1 1 . + Now, we bound the probability that B I NA RY S E A R C H performs incorrectly for given arguments (Si )j=1 and r. To i this end, we consider an imaginary procedure B S' that is the same as B I NA RY S E A R C H except that, in the procedure B S', the repeat loop continues until the size of Sr becomes one. In the repeat loop of B S', Sr is iteratively halved and updated. Suppose that the size of Sr becomes one after Sr is halved and updated t times. For 1 i t, let Xi be the number of iterations of the repeat loop between the (i - 1)th update and the ith update of Sr . Let v be a vertex of a hyperedge crossing among Si 's that is in the initial Sr . When v is in the (i - 1) times updated Sr , the probability that v is 1 chosen as an element of Sr is at least 3 . (The extreme case is when the order of Sr is three.) Thus, Xi follows a geometric distribution with the parameter at least 1 (1 - ). If we let 3 t X = i=1 Xi , by linearity of expectation, E [X ] Thus, X = Pr 6(j + 1) log n 1- X 2 (j + 1) log n Pr > t X 2 (j + 1) log n Pr > t > 3t . 1- some v U W A. Thus, it must be found by a recursive call of F I N D H Y P E R E D G E S later. Returning to the main procedure G R A P H F I N D I N G A L G O R I T H M , for each vertex v [n], the hyperedges of order j containing v are found by F I N D H Y P E R E D G E S in the while loop and so all the hyperedges of Gj are found by GFA. It is clear that the set of the hyperedges of order j returned by GFA is included in the set of the hyperedges of Gj . Thus, GFA finds the hyperedges of Gj correctly, given the condition that C H E C K E X I S T E N C E and B I NA RY S E A R C H correctly perform. Therefore, GFA correctly finds the hyperedges of 1. Gj with probability 1 - O n w a Theorem 3 follows from Lemmas 10 and 12. Here, 2 e m mention that it is more straightforward to obtain O log n lgorithm for ahe hypergraph finding problem (and hence t m2 O log n lgorithm for finding the Fourier coefficients) by querying the oracle (log n) times for each cross-membership query to make the error probability O (1/poly(n)). For the k -bounded hypergraph finding problem, it is not difficult to show that any randomized algorithm requires (m log n) cross-membership queries for constant k to make the error probability at most a given constant, provided that m nk- for any constant > 0. (To obtain the lower bound, we may use Yao's minimax principle [Yao77] and the information-theoretic arguments based on the fact that, for a cross-membership query, the oracle returns one of two values.) Thus, GFA is optimal up to a constant factor, provided that m nk- for any constant > 0. Note that this does not mean the optimality of the proposed algorithm for the problem of finding Fourier coefficients. While the oracle for the hypergraph finding problem gives binary values, function evaluations for the problem of finding Fourier coefficients give real values that may give more information about the Fourier coefficients. 3 t 1- E . [X ] Since Xi 's are independent, letting 1 + = 2(j +1t) log n , we apply Lemma 11 to the above inequality to obtain X - 6(j + 1) 2 t Pr > log n exp 1- 2(1 + ) exp (-(j + 1) log n) 1 = . j +1 n Thus, the probability of B I NA RY S E A R C H performing incorrectly is at most nj1 1 as it is at most the probability of X + (j +1) being more than 6 1- log n . The number of C H E C K E X I S T E N C E and B I NA RY S E A R C H being called for GFA to find the hyperedges of Gj are at most j 2 m, respectively. Thus, in the process of GFA finding the hyperedges of Gj , the probability that C H E C K E X I S T E N C E or B I NA RY S E A R C H incorrectly perform once or s 1 2 2j m n j2 more times is at most 2jj+1 2jj+1 = 2n , which is O n n n ince j k, for constant k . This means that, with probability 1 1 - O n C H E C K E X I S T E N C E and B I NA RY S E A R C H performs correctly throughout the process of GFA finding the hyperedges of Gj . Suppose the condition that C H E C K E X I S T E N C E and B I NA RY S E A R C H correctly perform throughout the process of GFA finding the hyperedges of Gj . We show that, given U , W , and j , F I N D H Y P E R E D G E S correctly return the set of the hyperedges of order j containing the vertices in U and not containing the vertices in W . Suppose that, for any u A, the hyperedges of order j containing the vertices in U {u} and not containing the vertices in W have been found by F I N D H Y P E R E D G E S. At this time, any hyperedge that has not been found is a hyperedge containing the vertices in U {v } and not containing the vertices in W A for 5 Remarks on Query and Time Complexity Suppose that we are given a k -bounded function f defined on {0, 1}n with m non-zero Fourier coefficients. To find the Fourier coefficients of f , we first find the hyperedges of the linkage graph of f . From Theorem 6, we have the oracle with one-sided error = 1 - 21k that gives the answer for a cross2 membership query by using 2k function evaluations. Since f has m non-zero Fourier coefficients, the linkage graph of f has at most 2k m hyperedges. Given a k -bounded hypergrap( with n vertices and at most 2k m hyperedges, GFA uses h c )k 3.5 O 2e1-k m log n ross-membership queries as shown in the proof of Lemma 10. Thus, we can find the hyperedges of the linkage graph off f (with high probability) by ( using O 16e)k k 3.5 m log n unction evaluations. Once the linkage graph of f is ob2ained,athe Fourier cot efficients can be found by using O k m dditional function evaluations from Proposition 2. Thus, the overall query complexity of findi( g the Fourier coefficients of f (with high n . probability) is O 16e)k k 3.5 m log n This is O (m log n) for constant k and Theorem 1 follows. Another important issue in practical applications is the time complexity of the algorithm. From the pseudocode of the proposed algorithm, we can check that the time complexity of the algorithm is O (nm log n) for constant k . (It is exponential in k .) We should note that GFA does not assume the hierarchical property among the hyperedges. The query complexity of GFA can be improved for the restricted class of the k bounded hypergraphs with the hierarchical property. Thus, the query complexity of finding the Fourier coefficients of a k -bounded function can be improved for general k . More concretely, to find the hyperedges of order j , we consider only the subsets of order j that contain some hyperedge of order j - 1 that have been already found. This reduces it j t to O 1- log n he number of iterations of the repeat loop in C H E C K E X I S T E N C E for checking the existence of a hyperedge of order j . (It also reduces the number of C H E C K E X I S T E N C E and B I NA RY S E A R C H being called to O (k m).) By this modification, the query complexity of GFA for finding a k -bounded hypergraph with 2n vertices and at most . k2 k 2k m hyperedges is reduced to O 1- m log n If we use this modified version of GFA, the query complexity of find( f ing the Fourier coefficients is to be O 16)k k 2 m log n or a k -bounded function defined on {0, 1}n with m non-zero Fourier coefficients. 6 Conclusion In this paper, we showed that the Fourier coefficients of a k bounded function with m non-zero Fourier coefficients can be found in O (m log n) function evaluations for constant k . To this end, we first showed that the problem of finding the Fourier coefficients of a k -bounded function is reduced to the problem of finding a k -bounded hypergraph with crossmembership queries under the oracle with one-sided error. Then, we gave a randomized algorithm for the hypergraph finding problem and analyzed it to obtain the desired bound. As shown in the previous section, the query (and time) complexity of the proposed algorithm is exponential in k . Although the main concern of this paper is the case when k is constant, it would be worth trying to find an algorithm with better query (and time) complexity for general k . References N. Alon and V. Asodi. Learning a hidden subgraph. In Proceedings of the 31st International Colloquium on Automata, Languages and Programming (ICALP 2004), pages 110­ 121, 2004. [AA05] N. Alon and V. Asodi. Learning a hidden subgraph. SIAM Journal on Discrete Mathematics, 18(4):697­712, 2005. [ABK+ 02] N. Alon, R. Beigel, S. Kasif, S. Rudich, and B. Sudakov. Learning a hidden matching. In Proceedings of the 43rd Annual IEEE Symposium on Foundations of Computer Science (FOCS 2002), pages 197­206, 2002. [ABK+ 04] N. Alon, R. Beigel, S. Kasif, S. Rudich, and B. Sudakov. Learning a hidden matching. SIAM Journal on Computing, 33(2):487­501, 2004. [AC04] D. Angluin and J. Chen. Learning a hidden graph using O(log n) queries per edge. In Proceedings of the 17th Annual Conference on [AA04] Learning Theory (COLT 2004), pages 210­ 223, 2004. [AC05] D. Angluin and J. Chen. Learning a hidden hypergraph. In Proceedings of the 18th Annual Conference on Learning Theory (COLT 2005), pages 561­575, 2005. [AC06] D. Angluin and J. Chen. Learning a hidden hypergraph. Journal of Machine Learning Research, 7:2215­2236, 2006. [BAA+ 01] R. Beigel, N. Alon, M. S. Apaydin, L. Fortnow, and S. Kasif. An optimal procedure for gap closing in whole genome shotgun sequencing. In Proceedings of the Fifth Annual International Conference on Computational Molecular Biology (RECOMB 2001), pages 22­30, 2001. [BGK05] M. Bouvel, V. Grebinski, and G. Kucherov. Combinatorial search on graphs motivated by bioinformatics applications: A brief survey. In the 31st International Workshop on GraphTheoretic Concepts in Computer Science (WG 2005), pages 16­27, 2005. [BJT04] N. H. Bshouty, J. C. Jackson, and C. Tamon. More efficient PAC-learning of DNF with membership queries under the uniform distribution. Journal of Computer and System Sciences, 68(1):205­234, 2004. [BT96] N. H. Bshouty and C. Tamon. On the Fourier spectrum of monotone functions. Journal of the ACM, 43(4):747­770, 1996. [CC06] D. J. Coffin and C. D. Clack. gLINC: Identifying composability using group perturbation. In Proceedings of the Genetic and Evolutionary Computation Conference, pages 1133­1140, 2006. [Che52] H. Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Annals of Mathematical Statistics, 23:493­509, 1952. [CJM08] S. S. Choi, K. Jung, and B. R. Moon. Lower and upper bounds for linkage discovery. IEEE Trans. on Evolutionary Computation, 2008. In press. [CK08] S. S. Choi and J. H. Kim. Optimal query complexity bounds for finding graphs. In Proceedings of the 40th Annual ACM Symposium on Theory of Computing (STOC 2008), 2008. To appear. [dBIV97] J. S. de Bonet, C. L. Isbell, Jr., and P. Viola. MIMIC: Finding optima by estimating probability densities. In Proceedings of the Advances in Neural Information Processing Systems, volume 9, pages 424­430. The MIT Press, 1997. [Ewe79] W. Ewens. Mathematical Population Genetics. Springer Verlag, 1979. [FL70] I. Franklin and R. Lewontin. Is the gene the unit of selection ? Genetics, 65:707­734, 1970. [FSBB+ 93] W. Fontana, P. Stadler, E. Bornberg-Bauer, T. Griesmacher, I. Hofacker, M. Tacker, P. Tarazona, E. Weinberger, and P. Schuster. RNA [GJS76] [GK00] [Gol89] [HG97] [HW99] [HW03] [HW04] [Jac97] [Kau89] [Kau93] [KL87] [KM93] [KP01] [Lew74] [Man94] [MG99] folding and combinatory landscapes. Physical Review E, 47(3):2083­2099, 1993. M. R. Garey, D. S. Johnson, and L. Stockmeyer. Some simplified NP-complete graph problems. Theoretical Computer Science, 1:237­267, 1976. V. Grebinski and G. Kucherov. Optimal reconstruction of graphs under the additive model. Algorithmica, 28:104­124, 2000. D. E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison Wesley, 1989. G. R. Harik and D. E. Goldberg. Learning linkage. In Foundations of Genetic Algorithms, volume 4, pages 247­262. Morgan Kaufmann, 1997. R. B. Heckendorn and D. Whitley. Predicting epistasis directly from mathematical models. Evolutionary Computation, 7(1):69­101, 1999. R. B. Heckendorn and A. H. Wright. Efficient linkage discovery by limited probing. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2003), pages 1003­1014, 2003. R. B. Heckendorn and A. H. Wright. Efficient linkage discovery by limited probing. Evolutionary Computation, 12(4):517­545, 2004. J. Jackson. An efficient membership-query algorithm for learning DNF with respect to the uniform distribution. Journal of Computer and System Sciences, 55(3):42­65, 1997. S. A. Kauffman. Adaptation on rugged fitness landscapes. In D. Stein, editor, Lectures in the Sciences of Complexity, pages 527­618. Addison Wesley, 1989. S. A. Kauffman. The Origins of Order: SelfOrganization and Selection in Evolution. Oxford University Press, 1993. S. A. Kauffman and S. Levin. Towards a general theory of adaptive walks on rugged landscapes. Journal of Theoretical Biology, 128:11­45, 1987. E. Kushilevitz and Y. Mansour. Learning decision trees using the Fourier spectrum. SIAM Journal on Computing, 22(6):1331­ 1348, 1993. H. Kargupta and B. Park. Gene expression and fast construction of distributed evolutionary representation. Evolutionary Computation, 9(1):1­32, 2001. R. Lewontin. The Genetic Basis of Evolutionary Change. Columbia University Press, 1974. Y. Mansour. Learning Boolean functions via the Fourier transform. In V. Roychowdhury, K. Y. Siu, and A. Orlitsky, editors, Theoretical Advances in Neural Computation and Learning, pages 391­424. Kluwer Academic, 1994. M. Munetomo and D. E. Goldberg. Identifying linkage groups by nonlinearity/non- [MM99] [MP89] [MR95] [PG00] [PGCP00] [RS07] [Str04] [Wal23] [Yao77] monotonicity detection. In Proceedings of the Genetic and Evolutionary Computation Conference, pages 433­440, 1999. ¨ H. Muhlenbein and T. Mahnig. FDA ­ A scalable evolutionary algorithm for the optimization of additively decomposed functions. Evolutionary Computation, 7(1):45­68, 1999. C. A. Macken and A. S. Perelson. Protein evolution on rugged landscapes. In Proceedings of the National Academic of Science, USA, volume 86, pages 6191­6195, 1989. R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, 1995. M. Pelikan and D. E. Goldberg. Hierarchical problem solving by the Bayesian optimization algorithm. In Proceedings of the Genetic and Evolutionary Computation Conference, pages 267­274, 2000. ´ M. Pelikan, D. E. Goldberg, and E. Cantu-Paz. Linkage problem, distribution estimation, and Bayesian networks. Evolutionary Computation, 8(3):311­340, 2000. L. Reyzin and N. Srivastava. Learning and verifying graphs using queries with a focus on edge counting. In Proceedings of the 18th International Conference on Algorithmic Learning Theory (ALT 2007), pages 285­297, 2007. M. J. Streeter. Upper bounds on the time and space complexity of optimizing additively separable functions. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2004), pages 186­197, 2004. J. L. Walsh. A closed set of orthogonal functions. American Journal of Mathematics, 55:5­ 24, 1923. A. C. Yao. Probabilistic computations: Toward a unified measure of complexity. In Proceedings of the 18th Annual IEEE Symposium on Foundations of Computer Science, pages 222­ 227, 1977.