Exponential Bounds for DPLL Below the Satisfiability Threshold Dimitris Achlioptas Paul Beame Michael Molloy Abstract For each k 4, we give rk > 0 such that a random k -CNF formula F with n variables and rk n clauses is satisfiable with high probability, but ordered-dll takes exponential time on F with uniformly positive probability. Using results of [2], this can be strengthened to a high probability result for certain natural backtracking schemes and extended to many other DPLL algorithms. 1 Previous work In the last twenty years a significant amount of work has been devoted to the study of randomly generated satisfiability instances and the performance of different algorithms on them. Historically, a ma jor motivation for studying random instances has been the desire to understand the hardness of "typical" instances. Indeed, some of the better practical ideas in use today come from insights gained by studying the performance of algorithms on random k -SAT instances (defined below). Let Ck (n) denote the set of all possible disjunctions of k distinct, non-complementary literals (k -clauses) from some canonical set of n Boolean variables. A random k -CNF formula Fk (n, m) is formed by selecting uniformly, independently, and with replacement m clauses from Ck (n) and taking their conjunction. We will say that a sequence of random events En occurs with high probability (w.h.p.) if limn Pr[En ] = 1 and with uniformly positive probability if lim inf n Pr[En ] > 0. It is widely believed that for each k 3, there exists a constant ck such that Fk (n, m = cn) is w.h.p. satisfiable if c < ck and w.h.p. unsatisfiable if c > ck . Currently, the best general bounds are 2k ln 2 - O(k ) < ck < 2k ln 2 - O(1) , where by ck < c we mean that Fk (n, cn) is w.h.p. unsatisfiable (analogously for ck > c). Let res(F ) denote the size of the minimal resolution refutation of a formula F (we define res(F ) to be infinite when F is satisfiable). A celebrated result of Chvītal a and Szemerīdi [5] asserts that for all k 3 and every e Research. Email: optas@microsoft.com. Science & Engineering, Univ. of Washington. Email: beame@cs.washington.edu. Supp orted by NSF research grant CCR-0098066. Computer Science, Univ. of Toronto. Email: molloy @cs.toronto.edu, Supp orted by NSERC and a Sloan Fellowship. Computer Microsoft constant c > 0, w.h.p. res(Fk (n, cn)) = 2(n) . Since res(F ) is always a lower bound on the time it takes a DPLL algorithm to determine that F is unsatisfiable, this implies that every such algorithm w.h.p. requires exponential time on Fk (n, cn) when c > ck . In [2] we studied the behaviour of DPLL algorithms on random 3-CNF formulas when c is somewhat below the conjectured value for c3 ( 4.2). Using standard techniques it is not hard to show that many natural DPLL algorithms when applied to such formulas, generate at least one unsatisfiable subproblem consisting of a random mixture of m2 = (1 - )n 2-clauses and m3 = n 3-clauses. Our main contribution was the following theorem, implying that in that case the algorithm must spend an exponential amount of time before it can resolve such a subproblem and backtrack. Theorem 1.1. [2] For every , > 0, if F is the union of (1 - )n random 2-clauses and n random 3-clauses then w.h.p. res(F ) = 2(n) . Theorem 1.1 extends the result of Chvītal and Szea merīdi [5] to accommodate random 2-clauses (such as e those present in the subproblems generated by DPLL running on random k -CNF formulas). It asserts that such clauses do not significantly reduce the resolution complexity when m2 (1 - )n (for such m2 w.h.p. the 2-clauses by themselves are satisfiable). Thus, we proved that certain natural DPLL algorithms require exponential time significantly below the generally accepted range for the random 3-SAT threshold. As an example, for ordered-dll (which performs unit-clause propagation but, otherwise, sets variables in an a priori fixed random order/sign) we proved Theorem 1.2. [2] For c 3.81, ordered-dll requires time 2(n) on F3 (n, cn) with uniformly positive probability. Indeed, we proved that Theorem 1.2 also holds for a few other (more intelligent) DPLL algorithms with 3.81 replaced by slightly larger values (but still significantly below 4.2). Moreover, we showed that for certain natural and effective forms of backtracking, "with uniformly positive probability" can be replaced with "w.h.p." in Theorem 1.2. Indeed, it appears that this should be true for every form of backtracking, but technical reasons make a general analysis rather unwieldy. Copyright Đ 2004 by the Association for Computing Machinery, Inc. and the Society for industrial and Applied Mathematics. All Rights reserved. Printed in The United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Association for Computing Machinery, 1515 Broadway, New York, NY 10036 and the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA 19104-2688 139 2 Our new contributions The most obvious drawback of Theorem 1.2 is that it holds for values of c that are only conjectured (but not proven) to be in the "satisfiable regime". In this paper, we rectify this problem by showing that for all k 4, the analogue to Theorem 1.2 holds for values of c that are proven to be in the satisfiable regime. Moreover, the gap between the density at which the algorithm begins to require exponential time, and the greatest density for which formulas are known to be satisfiable w.h.p. is large. In fact, the two densities are not even of the same asymptotic order in k . Specifically, we prove that Lemma 3.2. [4] Let A be any DPLL algorithm extending uc. If during the first t steps of an execution of A on a random k -CNF formula no backtracking has occurred, then for each i, the set of i-clauses in Ft is uniformly random conditional on its size. Definition 3.1. For any fixed integers k , i 2 and any real c > 0 define fk : 2 , 1] R as i [0 fi (x) = c i-k i (1 - x)i xk-i . Lemma 3.3. [4, 1] Assume that k , c, x0 are such that f2 (x) < 1 - x for al l x [0, x0 ]. Fix x [0, x0 ] and let t = xn . If we run any DPLL algorithm extending Theorem 2.1. With uniformly positive probability uc on a random k -CNF formula with cn clauses then ordered-dll requires time 2(n) on Fk (n, cn) if k = 4 with uniformly positive probability al l of the fol lowing hold after exactly t steps: and c 7.5 or k 5 and c (11/k ) 2k-2 . a) No backtracking has occurred, b) No 0- or 1-clauses The values of c in Theorem 2.1 include values in the are present in Ft , c) For every 2 i k , the number satisfiable regime by the following recent result of [3]. of i-clauses in Ft is fi (x) · n + o(n). Theorem 2.2. [3] Fk (n, cn) is w.h.p. satisfiable if k = Using Lemmata 3.1 and 3.3 we prove the following which readily implies Theorem 2.1. 4 and c < 7.91 or k 5 and c < 2k ln 2 - (k + 4)/2. Lemma 3.4. We note that Theorem 2.1 is, perhaps, only the · Let F be a random 4-CNF formula on n variables simplest theorem that our techniques can deliver. with cn clauses, where c 7.5. There exists t ordered-dll is an extension to a full DPLL algorithm such that with uniformly positive probability, Ft is of the non-backtracking uc algorithm [4]. We can, unsatisfiable and res(Ft ) = 2(n) . for example, prove analogues of Theorem 2.1 for any DPLL algorithm extending many other "myopic" [1] al· For fixed k 5, let F be a random k -CNF formula gorithms, that are much more sophisticated than uc. on n variables with cn clauses, c (11/k ) 2k-2 . The arguments are technically more complex but in There exists t such that with uniformly positive spirit they completely parallel the proof of Theorem 2.1. probability, Ft is unsatisfiable and res(Ft ) = 2(n) . 3 The pro of Our strategy is to prove that, even for a satisfiable formula F , the algorithm will w.h.p. reach an unsatisfiable subformula F where res(F ) = 2(n) . Thus, the algorithm will take 2(n) steps just to determine that F is unsatisfiable before backtracking away from it. a roof Elements. For k = 4, take x4 = 0.375, t = x4 n P nd observe that for c = 7.5, f2 (x) < (1 - x) for all x < x0 0.378. To prove that Ft is unsatisfiable, let ri = 0.999 Ũ fi (x4 )/(1 - x4 ) and apply Lemma 3.1. For k 5, take xk = k-4 , t = xk n and argue analogously k-1 for c = (11/k ) 2k-2 . For all k , the result for larger c follows by a monotonicity argument. Lemma 3.1. Let F be a random CNF formula on n References variables with mi ri n clauses of length i. If = i -i ln 2 + ri ln(1 - 2 ) < 0 then F is unsatisfiable w.h.p. [1] D. Achlioptas. Lower bounds for random 3-SAT via Proof. If X is the expected number of satim ying assf i1 i signments of F then E[X ] = 2n - 2-i . Thus, Pr[X > 0] E[X ] exp(n). W e will say that a DPLL algorithm makes a "step" every time it assigns a value to a variable. Thus, the number of steps is non-decreasing (it stays constant as variables are unassigned during backtracking). Given an input formula F , we will denote the residual formula after t steps by Ft . differential equations. TCS, (1-2):159-185, 2001. [2] D. Achlioptas, P. Beame, and M. Molloy. A sharp threshold in proof complexity. STOC'01, pp. 337­346. [3] D. Achlioptas and Y. Peres. The random k-SAT threshold is 2k ln 2 - O(k). STOC'03, pp. 223­231. [4] M.-T. Chao, J. Franco. Probabilistic analysis of a generalization of the unit-clause literal selection heuristics for the k-satisfiability problem. Inform. Sci., 51(3):289­314, 1990. [5] V. Chvītal, E. Szemerīdi. Many hard examples for a e resolution. JACM, 35(4):759­768, 1988. 140