A Characterization of Easily Testable Induced Subgraphs
Noga Alon
Abstract
Let H be a fixed graph on h vertices. We say that a graph G is induced H -free if it does not contain any induced copy of H . Let G be a graph on n vertices and suppose that at least n2 edges have to be added to or removed from it in order to make it induced H -free. It was shown in [5] that in this case G contains at least f ( , h)nh induced copies of H , where 1/f ( , h) is an extremely fast growing function in 1/ , that is independent of n. As a consequence, it follows that for every H , testing induced H -freeness with one-sided error has query complexity independent of n. A natural question, raised by the first author in [1], is to decide for which graphs H the function 1/f ( , H ) can be bounded from above by a polynomial in 1/ . An equivalent question is for which graphs H , can one design a one-sided error property tester for testing induced H -freeness, whose query complexity is polynomial in 1/ . We settle this question almost completely by showing that, quite surprisingly, for any graph other than the paths of lengths 1,2 and 3, the cycle of length 4, and their complements, no such property tester exists. We further show that a similar result also applies to the case of directed graphs, thus answering a question raised by the authors in [9]. We finally show that the same results hold even in the case of two-sided error property testers. The proofs combine combinatorial, graph theoretic and probabilistic arguments with results from additive number theory.


Asaf Shapira


1

Preliminaries

1.1 Definitions All graphs and directed graphs (=digraphs) considered here are finite and have no loops and no parallel edges. Let P be a property of graphs, that is, a family of graphs closed under isomorphism. A graph G with n vertices is -far from satisfying P if no ~ graph G with the same vertex set, which differs from G in at most n2 places, (i.e., can be constructed from G by adding and removing at most n2 edges), satisfies P . An -tester, or property tester, for P is a randomized
 Scho ols

of Mathematics and Computer Science, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv, Israel. Email: noga@math.tau.ac.il. Research supported in part by a USA-Israeli BSF grant, by the Israel Science Foundation and by the Hermann Minkowski Minerva Center for Geometry at Tel Aviv University.  Scho ol of Computer Science, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv, Israel. Email: asafico@math.tau.ac.il. This work forms part of the author's Ph.D. Thesis.

algorithm which, given the quantity n and the ability to make queries whether a desired pair of vertices of an input graph G with n vertices are adjacent or not, dis2 tinguishes with probability at least 3 between the case of G satisfying P and the case of G being -far from satisfying P . Such an -tester is a one-sided -tester if when G satisfies P the -tester determines that this is the case (with probability 1). The -tester is a two-sided -tester if it may determine that G does not satisfy P even if 2 G satisfies it. Obviously, the probability 3 appearing above can be replaced by any constant smaller than 1, by repeating the algorithm an appropriate number of times. The property P is called strongly-testable, if for every fixed > 0 there exists a one-sided -tester for P whose total number of queries is bounded only by a function of , which is independent of the size of the input graph. This means that the running time of the algorithm is also bounded by a function of only, and is independent of the input size. In what follows we denote by P2 , P3 and P4 the paths of lengths 1,2 and 3 (which have 2,3 and 4 vertices respectively), and by C4 , the cycle of length 4. We measure query-complexity by the number of vertices sampled, assuming we always examine all edges spanned  by them. For a fixed graph H , let PH denote the property of being induced H -free. Therefore, G satisfies  PH if and only if it contains no induced subgraph isomorphic to H . We define PH to be the property of being (not necessarily induced) H -free. Therefore, G satisfies PH if and only if it contains no copy of H . Thus, for example, for H = C4 , any clique of size at  least 4 satisfies PH but does not satisfy PH .
1.2 Related work The general notion of property testing was first formulated by Rubinfeld and Sudan [29], who were motivated mainly by its connection to the study of program checking. The study of the no-

Copyright © 2004 by the Association for Computing Machinery, Inc. and the Society for industrial and Applied Mathematics. All Rights reserved. Printed in The United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Association for Computing Machinery, 1515 Broadway, New York, NY 10036 and the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA 19104-2688

942


tion of testability for combinatorial ob jects, and mainly for labelled graphs, was introduced by Goldreich, Goldwasser and Ron [23], who showed that all graph properties describable by the existence of a partition of a certain type, and among them k -colorability, have efficient -testers. The fact that k -colorability is strongly testable is, in fact, implicitly proven already in [12] for k = 2 and in [27] (see also [2]) for general k , using the Regularity Lemma of Szemer´di [30], but in the cone text of property testing it is first studied in [23], where far more efficient algorithms are described. These have been further improved in [7]. The notion of property testing has been investigated in other contexts as well, including the context of regular languages, [6], functions [21], [8], [3], [19], [20], computational geometry [16], [4], graph and hypergraph coloring [15], [8], [11] and other contexts. See [22], [28] and [18] for surveys on the topic. 2 The Main Results

tester for P whose query complexity is polynomial in 1/ . In [1] it is shown that for an undirected graph H , PH is easily testable if and only if H is bipartite. The authors of [9] obtain a precise characterization of all the directed graphs H for which PH is easily testable. As is evident from this characterization, recognizing these digraphs is rather difficult. Indeed, it is shown in [9] that deciding whether for a digraph H , PH is easily testable, is N P -complete. The next natural steps, suggested in [1] and [9], are therefore to give characterizations  of the graphs and digraphs H for which PH is easily testable. In this paper we give nearly complete such characterizations in both cases. 2.2 The new results Our first new result here is the following: Theorem 2.1. Let H be a fixed undirected graph other than P2 , P3 , P4 , C4 and their complements. Then, there exists a constant c = c(H ) > 0 such that the query complexity of any one-sided error -tester for PH is at least 1
log(1/ )

2.1 Background In [5] it is shown that every first order graph property without a quantifier alternation of type "" has -testers whose query complexity is independent of the size of the input graph. It follows from the main result of [5] that for every fixed H ,  the property PH is strongly testable. Although the query complexity is independent of n, it has a huge dependency on 1/ (the fourth function in the Ackerman Hierarchy, which is a tower of towers of exponents). In [2], it was shown, using Szemer´di's Regularity Lemma, e that for every fixed H , the property PH is also strongly testable. This result was generalized to the case of directed graphs (=digraphs) in [9], by first proving a directed version of the regularity lemma. In the above two cases the query complexity is also huge, a tower of 2's of height polynomial in 1/ . For some graphs, however, there are obviously much more efficient property testers than the ones guaranteed by the above general results. For example for the case of H being an edge, there is obviously a one-sided error property tester  for both PH and PH , whose query complexity is (1/ ). A natural question is therefore, to decide for which graphs H can one design a one-sided error property  tester for PH or PH , whose query complexity would be bounded by a polynomial of 1/ . We call a property P easily testable if there is a one-sided error property

c

.

As P2 -freeness can obviously be tested with query complexity (1/ ), the following theorem, together with the above theorem, supplies a complete characterization  for the graphs H for which PH is easily testable, except for the case of P4 , C4 and its complement (the complement of P4 is also P4 ). Theorem 2.2. There is a one-sided error property tester for testing P3 -freeness, with query complexity O(log(1/ )/ ). We also prove the following theorem, which is analogous to Theorem 2.1, only with respect to digraphs. Theorem 2.3. Let H be a fixed digraph on at least 5 vertices. Then, there exists a constant c = c(H ) > 0 such that the query-complexity of any one-sided error  -tester for PH is at least 1 log(1/ ) . c

943


We can actually prove a super-polynomial (in 1/ )  lower bound for the query complexity of PH for some of the digraphs H on at most 4 vertices as well, see subsection 4.1. We finally show that Theorems 2.1 and 2.3 can also be extended to the cases of two-sided error property testers. Theorem 2.4. The lower bounds of Theorems 2.1 and 2.3 hold for two-sided error property testers as wel l. 2.3 Organization The (relatively short) proof of Theorem 2.2 appears in section 3. The lower bound proved by Theorem 2.1 is established in section 4. To prove this result we have to construct, for any graph H (other than the ones mentioned in the theorem) and any small > 0, a graph G which is -far from being induced H -free and yet contains relatively few induced copies of H . The proof of this part, described in Section 4, uses the approaches of [1] and [9], but requires several additional ideas. It applies certain constructions in additive number theory, based on (simple variants of ) the construction of Behrend [10] of dense subsets of the first n integers without three-term arithmetic progressions. The proof of Theorem 2.3 also appears in section 4. In Section 5 we give the proof of Theorem 2.4 which extends the lower bounds of Theorems 2.1 and 2.3 to the more general cases of two-sided error property-testers. The final section, Section 6, contains some concluding remarks and open problems. Throughout this paper we assume, whenever this is needed, that the number of vertices n of the graph or digraph G considered is sufficiently large, and that the error parameter is sufficiently small. In order to simplify the presentation, we omit all floor and ceiling signs whenever these are not crucial, and make no attempt to optimize the absolute constants. In order to make the presentation simple, from here on, when we refer to a copy of H in G, we always refer to an induced copy of H in G. Also, when we speak of the property of being H -free, we mean the property of being induced H -free. 3 Easily Testable Graphs

picks a random subset of, say, 10 log(1/ )/ vertices, and checks if there is an induced copy of P3 spanned by the set. It declares G to be P3 -free if and only if it finds no induced copy of P3 . If G is P3 -free, the algorithm clearly always answers correctly. We therefore only have to show that if G is -far from being P3 -free, the algorithm finds an induced copy of P3 with probability at least 2/3. Let High denote the set of vertices of such a graph G whose degree is at least 4 n. Note that intuitively, the vertices that belong to H ig h have the highest contribution to G being -far from P3 -free. We formulate this intuition as follows: Claim 3.1. Let W  V (G) contain al l but at most 4 n of the vertices of H ig h. Then the induced subgraph of G on W is at least 2 -far from being P3 -free. Proof. Assume this is not the case. Then we can make less than 2 n2 changes within W and get a graph that contains no induced copy of P3 within W . We then remove all the edges that touch a vertex not in H ig hW (as these vertices do not belong to High, there are at most n · 4 n such edges), and any edge that touches a vertex in H ig h \ W (there are at most 4 n · n such edges as by the assumption |H ig h \ W |  4 n). We thus get a graph that is P3 -free. As altogether we make less than n2 changes in G, this contradicts the assumption that G is -far from being P3 -free. We call a set A  V (G) Good if all but at most 4 n of the vertices that belong to High have a neighbor in A. Claim 3.2. A randomly chosen subset A  V (G) of size 8 log(1/ )/ is Good with probability at least 7/8. Proof. Let A be a randomly chosen subset of size 8 log(1/ )/ , and consider a vertex v  H ig h. The probability that A does not contain any neighbor of v is at most (1 - 4 )8 log(1/ )/  2  /32 (here we assume that < 1/32). As H ig h is of size at most n, we conclude that the expected number of vertices that belong to H ig h and have no neighbor in A, is at most 3 2 n. By Markov's inequality, with probability at least 7/8, the number of these vertices is at most 4 n.

In this section we describe the proof of Theorem 2.2. The property-tester for P3 -freeness works as follows: it

944


We will use the following simple and well known following lemma (a similar one appears in [17]): observation about the structure of P3 -free graphs: A graph is P3 -free if and only if it is the disjoint union of Lemma 4.1. For every positive integer m, there exists an h-sum-free subset X  M = {1, 2, . . . , m} of size at cliques. least m  |X |  (4.1) Pro of of Theorem 2.2 We first choose a random e10 log h log m subset A of size 8 log(1/ )/ . Assume that A is Good. Proof. The main idea of the proof is to construct the If A contains an induced copy of P3 we are done. set X by representing a set of numbers in base d, make Otherwise, by the comment above there is a unique sure that there is no carry in their (base d) addition, and partition of A into cliques C1 , . . . , Cr . If a vertex v  A finally use a convexity argument. Let d be an integer / has a neighbor u  A that belongs to Ci , it follows that (to be chosen later) and define if G can partitioned into cliques D1 , . . . , Dk , where for k , i ik 1  i  r, Ci  Di , then v would have to belong to Di . d i 2 X= xi d | xi < (0  i  k )  xi = B For each vertex v  A that is connected to u  Ci  A, / h =0 =0 assign v the number i. If u is connected to vertices in A that belong to different Ci , then pick any of these where k = log m/ log d - 1 and B is chosen to numbers. This numbering induces a partition of all the maximize the cardinality of X . Assume there are x, y , z  X that satisfy the vertices of G that have a neighbor in A into r subsets. As G is by assumption -far from being P3 -free, and A equation ax + by = cz , where a, b, c  h are positive is by assumption Good, Claim 3.1 implies that there are integers, a + b = c, and at least 2 n2 pairs of vertices u, v  V (G) \ A, such that either u and v should belong to the same Di , but u and v are not connected, or u and v should belong to different subsets Di , yet u and v are connected. Therefore, choosing a set B of 8/ randomly chosen pairs of vertices finds such a violating pair with probability at least 7 . 8 The probability of A failing to be Good is at most 1 , 8 and the probability of B not containing any of the required pairs of vertices is also at most 1 . Hence, with 8 3 probability at least 4 the induced subgraph on A  B is not P3 -free. As |A| + |B | = O(log(1/ )/ ) the proof is complete. 4 Hard to Test Graphs and Digraphs x= ik
=0

xi di ,

y=

ik
=0

yi di ,

z=

ik
=0

zi di .

As xi , yi , zi < d/h, and a + b = c we have for every i, 0  i  k axi + byi = czi . By the convexity of the function f (z ) = z 2 this implies that 2 2 ax2 + byi  czi , i and the inequality is strict unless all three numbers xi , yi and zi are equal. However, if for some i the inequality is strict, we have a ik
=0

In this section we give the proofs of Theorems 2.1 and 2.3. The approach uses a construction in additive number theory, which uses the technique of Behrend [10], used to construct dense sets of integers with no three-term arithmetic progressions. A set X  M = {1, 2, . . . , m} is called h-sum-free if for every three positive integers a, b, c  h such that a + b = c, if x, y , z  X satisfy the equation ax + by = cz then x = y = z . That is, whenever a + b = c, and a, b, c  h, the only solution to the equation that uses values from X , is one of the |X | trivial solutions. We need the

x2 + b i

ik
=0

2 yi > c

ik
=0

2 zi

which is impossible as by definition of X ik
=0

x2 = i

ik
=0

2 yi =

ik
=0

2 zi = B .

Thus, xi = yi = zi for all i, and X has no nontrivial solution to the above equation. We now aim at giving an upper bound for m/|X |. We lose a factor of at most d2 of the numbers {1, . . . , m},

945


due to taking k = log m/ log d -1. As we restrict xi < d/h, we lose a factor of h per digit, for a total of hk+1 . k As xi < d/h, we have 0  i=0 x2  (k + 1)d2 /h2 . As i we chose B to maximize |X |, we can't lose more than the average, hence, this is a factor of at most (k + 1)d2 /h2 . We conclude that |X |  m d2 hk+1 (k
d + 1) h2
2

vertices which is -far from being induced H -free, and yet contains at most c log (1/ ) nh induced copies of H . Proof. Given a small > 0, let m be the largest integer satisfying 1  (4.2) .  4 e10 log m log h h It is easy to check that this m satisfies 1 (4.3) m c
log(1/ )

.

for an appropriate c = c(h) > 0. Let X  {1, 2, . . . , m} be as in Lemma 4.1. Call the vertices of  . |X |  H v1 , . . . , vh , and let V1 , V2 , . . . Vh be pairwise disjoint e10 log m log h sets of vertices, where |Vi | = im and we denote the We proceed with the proofs of Theorems 2.1 and 2.3. vertices of Vi by {1, 2, . . . , im}, where, with a slight It is convenient to start the discussion with digraphs and abuse of notation, we think on the sets Vi as being then obtain the results for undirected graphs as a special pairwise disjoint. We now construct a graph F whose vertex set is the union of the sets V1 , . . . , Vh . For each case, (as they can be viewed as symmetric digraphs). An s-blow-up of a digraph H = (V (H ), E (H )) on j , 1  j  m, for each x  X and for each directed edge h vertices is the digraph obtained from H by replacing (vp , vq ) of H , let j + (p - 1)x  Vp have an outgoing each vertex vi  V (H ) by an independent set Ii of size edge pointed to j + (q - 1)x  Vq . In other words, for s, and each directed edge (vi , vj )  E (H ), by a complete each 1  j  m and x  X , the graph F contains a bipartite directed subgraph whose vertex classes are Ii copy of H , denoted by Hj,x , which is spanned by the and Ij , and whose edges are directed from Ii to Ij . Note vertices j, j + x, j + 2x, . . . , j + (h - 1)x. Note that each that if we take an s-blow-up of H , we get a graph on of these m|X | copies of H is spanned by a set of vertices sh vertices that contains at least sh induced copies of that forms an arithmetic progression whose first element H , where each vertex of the copy belongs to a different is j and whose difference is x. A crucial implication is blow-up of a vertex from H (simply pick one vertex that F contains m|X | copies of H , such that each pair of from each independent set). We call these copies the copies have at most one common vertex. In what follows special copies of the blow-up. As each pair of vertices we call these m|X | copies of H in F , the essential copies in the blow-up is contained in at most sh-2 special of H in F . Finally, define = 2 n copies of H , it follows that adding or removing an a n s= edge from the graph can destroy at most sh-2 copies |V (F )| h(h + 1)m of H . We conclude that one must add or remove at nd let G be the s-blow-up of F (together with some least sh /sh-2 = s2 edges from the blow-up in order to isolated vertices, if needed, to make sure that the destroy all its special copies of H . For the proofs of Theorems 2.1 and 2.3, we will need number of vertices is precisely n). We now turn to show that G is indeed -far from the following lemma, in which a triangle in a digraph is simply three vertices u, v , w, such that there is at least being H -free. Consider two essential copies of H in F , H1 and H2 . As was noted above, H1 and H2 one edge between each of the three pairs. share at most one vertex vi in F . It follows that their Lemma 4.2. For every fixed digraph H on h vertices, corresponding blow-ups in G will share at most one that contains at least one triangle, there is a constant common independent set Ii , which replaces the vertex c = c(H ) > 0, such that for every positive < 0(H) vi from F . Now, consider the blow-ups of H1 and H2 and every integer n > n0 ( ), there is a digraph G on n in G, denoted T1 and T2 . As T1 and T2 share at most m

 Taking d = e log m log h , we conclude (with room to spare) that

946


one common independent set, we conclude that adding or removing an edge from G, can either destroy special copies of H that belong to T1 , or special copies of H that belong to T2 (or not destroy any copies at all). As was explained above, in order to destroy all the special copies of an s-blow-up of H , one needs to add or remove at least s2 edges from the blow-up. As G contains m|X | blow-ups of essential copies of H , we conclude that one has to add or delete at least (4.4) s2 m|X |  |X |n2 n2    n2 h4 m 4 e10 log m log h h

of this triangle are t  Vi , t + (j - i)x  Vj , t + (j - i)x + (k - j )y  Vk .

As this is a triangle, there must also be some z  X such that t + (k - i)z = t + (j - i)x + (k - j )y . We conclude that the following equation in positive coefficients, whose values are at most h, holds (j - i)x + (k - j )y = (k - i)z . As X is h-sum free, it follows that x = y = z , hence Vi , Vj , Vk span precisely m|X | triangles of the form t + ( i - 1) x  V i , t + (j - 1)x  Vj , t + (k - 1)x  Vk ,

edges in order to make G H -free. The second inequality follows from the lower bound on |X | guaranteed by Lemma 4.1, and the third from (4.2). We conclude that G is indeed -far from being H -free. To complete the proof, we have to show that G contains at most c log (1/ ) nh induced copies of H . Note, that as H contains at least one triangle, and each n  h-3 triangle belongs to at most h-3 n copies of H , it is enough to show that G contains at most c log (1/ ) n3 triangles. Consider a partition of the vertices of G into h subsets U1 , . . . , Uh , where Ui contains the im independent sets that resulted from the blow-ups of the im vertices that belonged to Vi in F . Notice that if we show that the induced subgraph of G on any three l of the subsets U1 , . . . , Uh contains at most c og (1/ ) n3 triangles, then the total number of triangles in G is at h c log (1/ ) 3 most 3 n , which is still at most c log (1/ ) n3 . Fix any three subsets Ui , Uj , Uk such that i < j < k , and recall that they are the result of an s-blow-up of Vi , Vj , Vk . As there are no edges within these sets any triangle spanned by them must have exactly one vertex in each set. Note, that if the sets span a triangle whose vertices belong to the independent sets Ix  Ui , Iy  Uj , Iz  Uz , then the vertices x  Vi , y  Vj , z  Vk in F must also span a triangle. It follows that the number of triangles spanned by Ui , Uj , Uk is exactly s3 times the number of triangles spanned by Vi , Vj , Vk . If the vertices vi , vj , vk , do not span a triangle in H , then by the definition of F , Vi , Vj , Vk do not span a triangle, and so do Ui , Uj , Uk in G, and we are done. If vi , vj , vk span a triangle in H , then by the definition of F for any triangle spanned by Vi , Vj , Vk , there are x, y  X and 1  t  im, such that the three vertices

for every 1  t  m and x  X . We conclude that Ui , Uj , Uk span at most m|X |s3 < m2 (n/m)3  n3 /m triangles. As by (4.3), m  (1/ )c log(1/ ) , the proof is complete. The proofs of Theorems 2.1 and 2.3 now follow easily from the above lemma. Pro of of Theorem 2.1: Let H be a fixed graph on h vertices. A simple yet crucial observation is that  for every graph H testing PH is equivalent to testing  PH , where H is the complement of H . Note, that this relation does not hold for testing PH . It follows that in  order to prove a lower bound for testing PH , we may  prove a lower bound for testing PH . Given a one-sided error -tester for testing H freeness we may assume, without loss of generality, that it queries about all pairs of a randomly chosen set of vertices (otherwise, as explained in [5], every time the algorithm queries about a vertex pair we make it query also about all pairs containing a vertex of the new pair and a vertex from previous queries. This may only square the number of queries. See also [24] for a more detailed proof of this statement.) As the algorithm is a one-sided-error algorithm, it can report that G is not H -free only if it finds a copy of H in it. Observe, that if the tester picks a random subset of x vertices, and

947


an input graph contains only c log (1/ ) nh copies of H , then the expected number of copies of H spanned by x is at most xh c log (1/ ) , which is far smaller than 1 l unless x exceeds (1/ )c og(1/ ) for some c = c (H ) > 0. It follows by Markov's inequality that the tester finds a copy of H with negligible probability. It is therefore enough to show that for any undirected graph H , other than P2 , P3 , P4 , C4 and their complements, there is a graph G on n vertices which is -far from being H -free, yet contains only c log (1/ ) nh copies of H . If h  6, then it follows from the simplest result in Ramsey Theory (c.f., e.g., [25], page 1) that either H or H must contain a triangle. Hence, assuming that H contains a triangle, we can use Lemma 4.2 to construct a graph G on n vertices which is -far from being H -free and yet contains at most c log (1/ ) nh copies of H . For h = 5, the only graph H , such that neither H nor H contain a triangle is C5 (the cycle of length 5, whose complement is also C5 ). In this case we can use the  fact that C5 is the core of itself to prove that PC5 is not easily testable. See subsection 4.1 for more details. As for h = 2, 3, 4 the only graphs H for which H and H are triangle-free are P2 , P3 , P4 , C4 and their complements, the proof is complete. Pro of of Theorem 2.3: The proof is similar to the proof of Theorem 2.1. One only has to note again that for every digraph H , on at least 6 vertices, either H or H contains a triangle, and that the only digraph on 5 vertices which does not have this property is the digraph D5 obtained from C5 , by replacing each undirected edge with two anti-parallel directed edges. We discuss this special case in subsection 4.1. Though the theorem does not explicitly state it, we can also conclude from Lemma 4.2 that the same lower bound applies for any digraph H on 3 or 4 vertices such that either H or H contains a triangle. In subsection 4.1 we discuss some more digraphs for which we can obtain similar bounds.

edges of K , i.e. (u, v )  E (H )  ((u), (v ))  E (K ). The core of a digraph H , is the smallest subgraph of H (with respect to number of edges) to which there is a homomorphism from H . In [9] the authors establish a lower bound similar to those of Theorems 2.1 and 2.2 for testing PH for any digraph H whose core contains at least one cycle of length at least 3. As in the proof of Lemma 4.2 the main ingredient of the proof (Lemma 8 in [9]) is a construction of a digraph that is -far from being H -free and yet contains relatively few copies of H . Though it is not explicitly stated in [9], in case H is the core of itself, the constructed graph is actually also -far from being induced H -free, and contains relatively few induced copies of H . Thus we can use the result of [9] to obtain similar lower bounds for any digraph H on 3,4 or 5 vertices such that either H or H is the core of itself and contains a cycle of length at least 3. This in particular holds for D5 , and therefore also for C5 , as testing PC5 is a special case of testing PD5 . As was noted in [9], any directed cycle C that contains a non equal number of forward edges and backward edges is the core of itself. Thus, any digraph on 4 vertices that contains such a cycle of length 4 (e.g. a Hamilton cycle) is the core of itself, and we can use the result of [9] to obtain a lower bound for this case as well. 5 Two-Sided Error Prop erty-Testers

For the proof of Theorem 2.4 we apply Yao's principle [31], by constructing, for every fixed graph H , for which a lower bound was established in Theorems 2.1 and 2.3, two distributions D1 and D2 , where D1 consists of graphs which are -far from being H -free with probability 1 - o(1) (where the o(1) term tends to 0 as tends to zero), while D2 consists of graphs which are H -free. We then show that any deterministic algorithm, which makes a small number of queries (adaptively) cannot distinguish with non-negligible probability between D1 and D2 . We prove Theorem 2.4 for the case of digraphs, as it is clear that the case of undirected graphs follows as a special case. For the case of H being the graph 4.1 Graphs which are cores of themselves In obtained from C5 by replacing each edge by a cycle of this subsection we briefly argue how to use the results length 2, we can use the fact that this graph is the core of [9] in order to obtain lower bounds for some digraphs of itself (as we did for one sided error in subsection 4.1) on 3,4 and 5 vertices. We first need some definitions.  to prove that PC5 has no two-sided -tester with query A homomorphism from digraph H to digraph K is a complexity polynomial in 1/ . We thus assume that H mapping  : V (H )  V (K ) that maps edges of H to

948


S has the property that it does not contain more than two vertices from any one of the essential copies of H . If this property holds, then each edge spanned by S is contained in a different essential copy of H . Therefore, each edge has probability 1/|E (H )| of being in F1 , and these probabilities are mutually independent. Pro of of Theorem 2.4: Let H be a fixed digraph Similarly, each such edge has probability 1/|E (H )| of which contains at least one triangle. Given n and , let being in F2 and these probabilities are also mutually X , m and the sets Vi be as in the proof of Lemma 4.2. independent. It follows that in this case, sampling a Construct the digraph F just as in the proof of Lemma digraph G from D1 , and looking at the induced digraph 4.2, and remember that it consists of m|X | pairwise edge on a set S with the above property, has exactly the disjoint copies of H which we called the essential copies same distribution as sampling a digraph G from D2 , of H in F (though it may well contain additional copies and looking at the induced digraph on S . of H ). To complete the proof we have to show that no To construct D1 which consists of digraphs that deterministic algorithm can distinguish between the are -far from being H -free with high probability, we distributions D1 and D2 with constant probability. first construct F1 by removing each of the m|X | esTo this end, it is clearly enough to show that with sential copies of H , randomly and independently, with probability 1 - o(1), any deterministic algorithm that probability 1 - 1/|E (H )|. We then create G1 by tak- v l looks at a digraph spanned by less than (1/ )c og 1/ ing an s blow up of F1 , adding isolated vertices, if ertices, has exactly the same probability of seeing any needed. Finally, D1 consists of all randomly permuted digraph regardless of the distribution from which the copies of such digraphs G1 . It follows from a standigraph was chosen. By the discussion in the previous dard Chernoff bound, that with probability 1 - o(1), paragraph, this can be proved by establishing that, with at least m|X |/2|E (H )| essential copies of H are left in high probability, a small set of vertices does not contain F1 , where the o(1) term tends to 0, as tends to 0. Simthree vertices from the same essential copy of H . For ilar to the derivation of (4.4), it is easy to show that if a fixed ordered set of three vertices in S , consider the m|X |/2|E (H )| of these copies of H are left in F1 , the event that they all belong to the same essential copy graph G1 is -far from being H free. It follows that with of H . The first two vertices determine all the vertices probability 1 - o(1), a member of D1 is -far from being of one of these copies uniquely. Now, the conditional induced H -free. probability that the third vertex is also a vertex of the The distribution D2 of digraphs that are H -free, same copy is h/|V (F )|  h/m. By the union bound, is defined by first constructing F2 by randomly, indethe probability that the required property is violated, pendently and uniformly picking from each of the m|X | assuming |S | = D, is at most essential copies of H a single edge, and removing all the hD3 /m  hD3 c log 1/ . other edges of that copy. We then create G2 by taking an s blow up of F2 adding isolated vertices, if needed. Finally, D2 consists of all randomly permuted copies of such digraphs G2 . The main argument of Lemma 4.2, states that the graph F defined in the lemma contains only triangles whose three edges belong to one of the essential copies of H . Hence, keeping a single edge from each of these copies results in a triangle free graph, and in particular all the graphs in G2 are H -free. Now consider a set of vertices S in G1 (or G2 ) and its natural pro jection to a subset of V (F ), which we also denote by S with a slight abuse of notation. Suppose This quantity is o(1) as long as D = o((1/ ) 3 log 1/ ), where here we applied the lower bound on the size of m given in (4.3). Therefore, if the algorithm has query l complexity o((1/ )c og 1/ ) for some absolute positive constant c , it has probability 1 - o(1) of looking at a subset on which the distributions D1 and D2 are identical, thus, the probability that it distinguishes between D1 and D2 is o(1). A slightly more complicated argument than the above can give two distributions D1 and D2 , such that
c

is a graph on at least 6 vertices. As in the proofs of  Theorems 2.1 and 2.3, testing PH with two-sided error  has the same query complexity as testing PH , thus we assume that H contains at least one triangle.

949


the graphs in D1 are always -far from being induced H -free, while the graphs in D2 are always H -free. The idea is to first partition the m|X | essential copies of H into groups of size |E (H )| assuming for simplicity that |E (H )| divides m|X |. To create D1 , we randomly pick from each group of |E (H )| copies of H a single copy, and delete all its edges. To create D2 , we do exactly the same as we did in the proof of Theorem 2.4. It is easy to appropriately modify the proof above in order to show that any deterministic algorithm with query complexity o((1/ )c log 1/ ) can not distinguish between D1 and D2 . As this argument has no qualitative advantage, we described the simpler one given above. 6 Concluding Remarks and Op en Problems · As in the case of PH , there is a huge gap between  the general upper bounds for testing PH that were established in [5], and the lower bounds in this paper. It would be very interesting, and probably challenging, to improve any of these bounds. Even in the seemingly simplest case of H being a triangle, we do not know how to improve these bounds.

two anti-parallel edges, and the other by a single edge, and the graph obtained from P3 by replacing both edges with two anti-parallel edges. As all the digraphs on at least 5 vertices are hard to test, the only remaining unclassified digraphs are the digraphs H such that neither H nor H contains a triangle, and neither H nor H contains a cycle of length 4 and is the core of itself (e.g. the graph obtained from either C4 or P4 by replacing each edge with two anti-parallel edges). It will be interesting to classify these cases as well. · There is an interesting possible connection between  the problem of graph isomorphism and testing PH . It is known (see [13]) that for any graph H  {P2 , P3 , P4 , C4 }, the graph isomorphism problem can be solved in polynomial time for H -free graphs. Moreover, for any other H , any instance of the graph isomorphism problem can be reduced to an instance that is H -free. Thus, in some sense, the problem on H -free graphs, for H other than P1 , P2 , P3 and C4 , is isomorphism hard. It might be interesting to understand if this connection is indeed meaningful.

· Another interesting open problem is to complete the characterizations of easily testable properties  PH for undirected graphs H , by solving the cases References of H = P4 , C4 (recall that testing the complement of C4 is equivalent to testing C4 ). The case [1] N. Alon, Testing subgraphs in large graphs, Proc. 42nd of testing P4 -freeness seems the simplest one to IEEE FOCS, IEEE (2001), 434-441. resolve, since there are known structural results, [2] N. Alon, R. A. Duke, H. Lefmann, V. R¨dl and o that characterize P4 -free graphs. These graphs R. Yuster, The algorithmic aspects of the Regularity Lemma, Proc. 33rd IEEE FOCS, Pittsburgh, IEEE are also known as Complement Reducible graphs, (1992), 473-481. Also: J. of Algorithms 16 (1994), 80or Cographs for short, and they are precisely the 109. graphs formed from a single vertex under the [3] N. Alon, W. F. de la Vega, R. Kannan and M. closure of the operations of union and complement, Karpinski, Random Sampling and Approximation of see [14] and [26]. Cographs arise naturally in such MAX-CSP Problems, Proc. of the 34th ACM STOC, application areas as examination scheduling and ACM Press (2002), 232-239. automatic clustering of index terms. Cographs [4] N. Alon, S. Dar, M. Parnas and D. Ron, Testing of have a unique tree representation called a Cotree. clustering, Proc. 41st IEEE FOCS, IEEE (2000), 240It might be possible to use this characterization, 250. and the unique tree representation in order to [5] N. Alon, E. Fischer, M. Krivelevich and M. Szegedy, design an efficient tester for P4 -freeness. Efficient testing of large graphs, Proc. of 40th FOCS, · Combining Theorem 2.3 and subsection 4.1, the only unclassified digraphs on 3 vertices are the graph obtained from P3 by replacing one edge with
New York, NY, IEEE (1999), 656­666. Also: Combinatorica 20 (2000), 451-476. [6] N. Alon, M. Krivelevich, I. Newman and M. Szegedy, Regular languages are testable with a constant number

950


[7] [8]

[9]

[10]

[11]

[12]

[13]

[14]

[15] [16]

[17]

[18]

[19]

[20]

[21]

of queries, Proc. 40th FOCS, New York, NY, IEEE (1999), 645­655. Also: SIAM J. on Computing 30 (2001), 1842-1862. N. Alon and M. Krivelevich, Testing k-colorability, SIAM J. Discrete Math., 15 (2002), 211-227. N. Alon and A. Shapira, Testing satisfiability, Proc. of the 13th Annual ACM-SIAM SODA, ACM Press (2002), 645-654. Also: Journal of Algorithms, 47 (2003), 87-103. N. Alon and A. Shapira, Testing subgraphs in directed graphs, Proc. of the 35th Annual Symp. on Theory of Computing (STOC), San Diego, California, 2003, 700­ 709. F. A. Behrend, On sets of integers which contain no three terms in arithmetic progression, Proc. National Academy of Sciences USA 32 (1946), 331­332. A. Bogdanov, K. Obata and L. Trevisan, A Lower Bound for Testing 3-Colorability in Bounded-degree Graphs, Proc. 43rd IEEE FOCS, IEEE (2002), 93-102. B. Bollob´s, P. Erd¨s, M. Simonovits and E. Szemer´di, a o e Extremal graphs without large forbidden subgraphs, Annals of Discrete Mathematics 3 (1978), 29­41. D. G. Corneil, H. Lerchs, L. Stewart Burlingham, Complement Reducible Graphs, Discrete Applied Mathematics 3 (1981), 163­174. D. G. Corneil, Y. Perl and L. K. Stewart, A linear recognition algorithm for cographs, SIAM J. Comput. 14 (1985), 926­934. A. Czuma j and C. Sohler, Testing hypergraph coloring, Proc. of ICALP 2001, 493-505. A. Czuma j and C. Sohler, Property testing in computational geometry, Proceedings of the 8th Annual European Symposium on Algorithms (2000), 155­166. P. Erd¨s, P. Frankl and V. R¨dl, The asymptotic o o number of graphs not containing a fixed subgraph and a problem for hypergraphs having no exponent, Graphs Combin. 2 (1986) 113-121. E. Fischer, The art of uninformed decisions: A primer to property testing, The Computational Complexity Column of The Bulletin of the European Association for Theoretical Computer Science 75 (2001), 97-126. E. Fischer, G. Kindler, D. Ron, S. Safra, and A. Samorodnitsky, Testing juntas, Proc. of The 43rd FOCS (2002), 103-112. E. Fischer, E. Lehman, I. Newman, S. Raskhodnikova, R. Rubinfeld and A. Samorodnitsky, Monotonicity testing over general poset domains, Proc. of The 34th STOC (2002), 474-483. A. Frieze and R. Kannan, Quick approximation to

[22]

[23]

[24]

[25]

[26] [27]

[28]

[29]

[30]

[31]

matrices and applications, Combinatorica 19 (1999), 175-220. O. Goldreich, Combinatorial property testing - a survey, In: Randomization Methods in Algorithm Design (P. Pardalos, S. Ra jasekaran and J. Rolim eds.), AMSDIMACS (1998), 45-60. O. Goldreich, S. Goldwasser and D. Ron, Property testing and its connection to learning and approximation, Proceedings of the 37th Annual IEEE FOCS (1996), 339­348. O. Goldreich and L. Trevisan, Three theorems regarding testing graph properties, Proc. 42nd IEEE FOCS, IEEE (2001), 460-469. R. L. Graham, B. L. Rothschild and J. H. Spencer, Ramsey Theory, Second Edition, Wiley, New York, 1990. T. A. McKee and F.R. McMorris, Topics in Intersection Graph Theory, SIAM, Philadelphia, PA, 1999. V. R¨dl and R. Duke, On graphs with small subgraphs o of large chromatic number, Graphs and Combinatorics 1 (1985), 91­96. D. Ron, Property testing, in: P. M. Pardalos, S. Ra jasekaran, J. Reif and J. D. P. Rolim, editors, Handbook of Randomized Computing, Vol. II, Kluwer Academic Publishers, 2001, 597­649. R. Rubinfeld and M. Sudan, Robust characterization of polynomials with applications to program testing, SIAM J. on Computing 25 (1996), 252­271. E. Szemer´di, Regular partitions of graphs, In: Proc. e Col loque Inter. CNRS (J. C. Bermond, J. C. Fournier, M. Las Vergnas and D. Sotteau, eds.), 1978, 399­401. A. C. Yao, Probabilistic computation, towards a unified measure of complexity, Proc. of the 18th IEEE FOCS (1977), 222-227.

951