Locally satisfiable formulas Daniel Kr´l' a Abstract A CNF formula is k-satisfiable if each k clauses of can be satisfied simultaneously. Let k be the largest real number such that for each k-satisfiable formula with variables xi , there are probabilities pi with the following property: If each variable xi is chosen randomly and independently to be true with the probability pi , then each clause of is satisfied with the probability at least k . We determine the numbers k and design a linear-time algorithm which given a formula either outputs that is not k-satisfiable or finds probabilities pi such that each clause of is satisfied with the probability at least k . Our approach yields a robust linear-time deterministic algorithm which finds for a k-satisfiable formula a truth assignment satisfying at least the fraction of k of the clauses. A related parameter is rk which is the largest ratio such that for each k-satisfiable CNF formula with m clauses, there is a truth assignment which satisfies at least rk m clauses. It was known that k = rk for k = 1, 2, 3. We compute the ratio r4 and show 4 = r4 . We also design a linear-time algorithm which finds a truth assignment satisfying at least the fraction r4 of the clauses for 4-satisfiable formulas. 1 Intro duction. CNF formulas have a prominent position in computer science because of their essential role in many hardness reductions and constructions, see [1, 2, 6, 11]. We study an extremal problem which relates local and global satisfiability of CNF formulas and which is based on a notion of k -satisfiable formulas: Definition 1.1. A CNF formula is k -satisfiable if any k clauses of it can be simultaneously satisfied. This notion was introduced by Lieberherr and Specker [7, 8]; a separate section (20.6) is devoted to this concept in a recent monograph on extremal combinatorics by Jukna [5]. Note also that if each clause of a formula 2 contains at least l literals, then the formula is l - 1 satisfiable (consider a random truth assignment). One of the problems which we address is the following: Let be a k -satisfiable formula with the variables x1 , . . . , xn . What is the largest number ( ) for which there are probabilities p1 , . . . , pn such that if each xi is for Theoretical Computer Science, Charles University, Malostransk´ n´mst´ 25, 118 00 Prague 1, Czech Republic. eaei E-mail: kral@kam.mff.cuni.cz. Institute for Theoretical Computer Science (ITI) is supported by Ministry of Education of Czech Republic as pro ject LN00A056. Institute chosen randomly and independently to be true with the probability pi , then each clause of is satisfied with the probability at least ( )? Note that ( ) = 1 iff is satisfiable. Note that optimal probabilities pi may also be viewed as an optimal fractional truth assignment for . Let k be the largest number such that ( ) k for each k -satisfiable formula . In this paper, we compute the numbers k and discover a surprising connection between them and the voting paradox studied by Usiskin [10]. Based on our structural results, we design a deterministic linear-time algorithm which for a k -satisfiable formula with m clauses constructs a truth assignment satisfying at least k · m clauses. The ratios of satisfied clauses of our algorithm dominate the ratios of the previous algorithm (using a similar technique) for the problem by Trevisan [9]. By the definition of k , our algorithm is optimal in sense that no algorithm based on fractional truth assignments can guarantee a larger fraction of satisfied clauses in the class of k -satisfiable formulas. In addition our algorithm is robust, i.e., its input may be any formula and the algorithm either constructs a truth assignment or outputs that the input formula is not k -satisfiable. In the latter case, the algorithm also finds k clauses of the input formula which cannot be satisfied simultaneously. The numbers i are actually related in this paper to the voting paradox studied by Usiskin [10] which is described in the next: Let Xi for i = 1, . . . , k be independent real-valued random variables. We define µ(X1 , . . . , Xk ) to be the following quantity: min{P(X1 > X2 ), . . . , P(Xk-1 > Xk ), P(Xk > X1 )} The k -th Usiskin number bk is the largest real number such that there exist random variables X1 , . . . , Xk with µ(X1 , . . . , Xk ) bk . The values of bk were determined by Usiskin [10] who also showed that limk bk = 3/4. We prove that bk+1 = k for all k 1. Hence, the values of k are determined for all k (they were unknown for k 4) and limk k = 3/4 (however, this was previously proved by Trevisan [9]). So far, another modification of the problem (which is also considered in this paper) was mainly studied: Let rk be the largest real number such that each k -satisfiable CNF formula with m (not necessarily distinct) clauses Copyright © 2004 by the Association for Computing Machinery, Inc. and the Society for industrial and Applied Mathematics. All Rights reserved. Printed in The United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Association for Computing Machinery, 1515 Broadway, New York, NY 10036 and the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA 19104-2688 330 Note that the function f1 is actually an identity, i.e., k behavior of rk was completely determined by Trevisan f1 (x) = x. Observe that the recursive definition of the [9] by showing lim rk = 3/4. We compute the value of functions fk can be reversed, in particular, the following k holds for all k 0: r . Namely, we prove: 4 has a truth assignment which satisfies at least rk m clauses. Clearly, r1 = 1/2 and k rk . The latter inequality is because the expected number of satisfied clauses for optimal probabilities pi is at least k m and hence there is a truth assignment which satisfies at least k m clauses. We briefly survey known results on the values of rk ; the reader is also welcomed to see the section 20.6 in the monograph [5]. The study of the problem was tarted s by Lieberherr and Specker. They showed r2 = 5-1 2 0.6180 [7] and they consequently established r3 = 2/3 [8]. Later, Yannakakis [12] simplified proofs of the lower bounds for r2 and r3 using a probabilistic argument. Huang and Lieberherr [4] studied the asymptotic behavior of rk and proved lim rk 3/4. The asymptotic in Section 3 (Theorems 3.1 and 3.2). Next, we establish several identities for the claimed value of r4 in Section 4, in particular the equality r4 = 1/(2 - p2 ) where p0 is 0 a root of the equation p3 + p2 - 1 = 0. The matching 0 0 lower and upper bounds on r4 are proved in Section 5 and Section 6, respectively. Finally, we utilize the obtained results to design our algorithms in Section 7. 2 Usiskin's numb ers. Let us first define recursively functions fk for all nonnegative integers k : 1 for k = 0 and (2.1) fk (x) = 1 - fk1-xx) otherwise. -1 ( r4 = 5+ 3 3 3 69-11 2 - 3 3 0.6992 69+11 2 (2.2) fk (x) = 1-x 1 - fk+1 (x) In addition, our proof gives a linear-time algorithm for 4-satisfiable formulas which finds an assignment satisfying at least the fraction r4 of the clauses. We remark that the case of rk for k = 4 substantially differs from the cases k < 4: In the cases of k = 1, 2, 3, Yannakakis found for each k -satisfiable formula a probability distribution such that each clause of was satisfied with probability at least rk (and hence k = rk for k < 4). Thus the expected weight of the satisfied clauses with respect to such a distribution is at least rk w( ) regardless the actual choice of the weight function. However such a distribution does not exist in general for 4-satisfiable formulas (hence 4 = r4 ). The just introduced problem is actually studied for CNF formulas where non-negative weights are assigned to the clauses. This is the same problem because the weights of the clauses can be simulated (with a negligible error) by repeating the clauses several times in the formula (allowing an exponential blow-up in the size). If is a formula, let w( ) be the sum of the weights of all its clauses and w0 ( ) be the maximum weight of the clauses of which can be simultaneously satisfied. Let k be the set of all k -satisfiable CNF formulas. Using the just defined notation, we can write: rk = inf w0 ( ) w( ) k = inf ( ) k Usiskin [10] showed that the number bk , k 2, is the only solution x with x 1/2, 3/4) of the following equation: (2.3) fk (x) = 0 which satisfies in addition that fi (x) > 0 for all i = 1, . . . , k - 1. It can be shown that bk < 3/4 for each k and limk bk = 3/4. Observe that the equality of (2.3) can be rewritten to the following form: (2.4) 1 - bk = fk-1 (bk ) Based on (2.3), it is possible to construct an equation of degree k/2 whose root is the number bk [10]. Then, some of the Usiskin's numbers can be expr sed es explicitly, g., b2 = 1/2, b3 = ( 5 - 1)/2, b5 = 2/2 e. and b10 = 3 - 1. We now state several lemmas on the values fi (bk ). We decided to include their short proofs because they relieve digesting the definition of the function fk (x): Lemma 2.1. Let k 2 be an integer. Then: 1 = f0 (bk ) > f1 (bk ) > . . . > fk-1 (bk ) > fk (bk ) = 0 Proof. By the definition of the function f0 , we have f0 (bk ) = 1. On the other hand, the equality of (2.3) implies that fk (bk ) = 0. We now show that fi (bk ) > fi+1 (bk ) for each i = 0, . . . , k - 1: fi (bk ) > fi+1 (bk ) 1 - bk fi (bk ) > 1 - fi (bk ) k The paper is organized as follows. We recall certain equalities and inequalities which hold for Usiskin's numbers bk in Section 2. Then, we show that bk+1 = k fi (bk )2 - fi (bk ) + 1/4 > 1/4 - (1 - bk ) (fi (bk ) - 1/2)2 > bk - 3/4 331 The last inequality holds because its left-hand side is non-negative and its right-hand side is negative (recall that bk < 3/4). Lemma 2.2. Let i, j and k be non-negative integers such that i + j k - 1 and k 2. Then: fi (bk ) · fj (bk ) 1 - bk Moreover, the equality holds only if i + j = k - 1. Proof. By Lemma 2.1, it is enough to prove that fi (bk ) · fj (bk ) = 1 - bk for i + j = k - 1. The proof proceeds by induction on i. If i = 0, then fi (bk ) = f0 (bk ) = 1 and fj (bk ) = fk-1 (bk ) = 1 - bk (the latter follows from the equality of (2.4) ). Hence, let us assume that i 1 and we need to show that fi (bk ) · fk-1-i (bk ) = 1 - bk . By the equations of (2.1) and (2.2), it is enough to prove: 1 · 1 - bk 1 - bk = 1 - bk - fi-1 (bk ) 1 - fk-i (bk ) 1 - bk 1- = 1 - fk-i (bk ) fi-1 (bk ) 1 - bk fk-i (bk ) = fi-1 (bk ) fi-1 (bk ) · fk-i (bk ) = 1 - bk Proof. If i = j , then the claim trivially holds. Hence, assume that i < j in the rest. In particular, fi (bk ) > fj (bk ) by Lemma 2.1. Let us first rewrite the inequality from the statement to a little different form: fi (bk ) · fj -1 (bk ) fi-1 (bk ) · fj (bk ) 1 - bk 1 - bk fi (bk ) · · fj (bk ) 1 - fj (bk ) 1 - fi (bk ) fi (bk ) - fi (bk )2 fi (bk ) - fj (bk ) 1 fi (bk ) + fk-i (bk ) fj (bk ) - fj (bk )2 fi (bk )2 - fj (bk )2 fi (bk ) + fj (bk ) fi (bk ) + fj (bk ) The equality 1 = fi (bk ) + fk-i (bk ) holds by Lemma 2.3. The last inequality above follows from Lemma 2.1 and the fact that k - i j . Lemma 2.5. Let i, j and k be non-negative integers such that i + j k - 1 and k 2. Then: fi (bk ) · fj (bk ) fi+j (bk ) Proof. We can assume that i j . If i = 0, the inequality follows from the fact that f0 (bk ) = 1. Otherwise, Lemma 2.4 implies that: fi (bk ) · fj (bk ) fi-1 (bk )fj +1 (bk ) . . . The last equality holds by the induction hypothesis. Lemma 2.3. Let i and k be non-negative integers such that k 2. Then: f0 (bk )fi+j (bk ) = fi+j (bk ) 3 The Lower and Upp er Bounds on k . We first introduce some notation used in this section fi (bk ) + fk-i (bk ) = 1 and Section 7. The true value is denoted by 1 and the 1 Proof. The proof proceeds by induction i. If i = 0, false value by 0. Following this notation, x denotes the 0 then fi (bk ) = f0 (bk ) = 1 and fj (bk ) = fk (bk ) = 0 (the literal x and x the literal ¬x. Let now be a formula latter is from the definition of bk based on the equality and x a variable contained in it. Then, cl (x) is the cardinality of the smallest set C of clauses of such of (2.3) ). Let us suppose now that i > 0: that there is no truth assignment which satisfies all the fi (bk ) + fk-i (bk ) = 1 clauses of C and which assigns 1 - to the variable x. If there is not such a set C , then cl (x) = . We now 1 - bk 1 - bk 1- + =1 state two simple lemmas on the values of cl (x) for a fi-1 (bk ) 1 - fk-i+1 (bk ) k -satisfiable formula : 1 - bk 1 - bk = fk-i+1 (bk ) 1 - fi-1 (bk ) Lemma 3.1. Let be a k -satisfiable formula. Then, 1 - fi-1 (bk ) = fk-i+1 (bk ) the fol lowing holds for each variable x contained in : fi-1 (bk ) + fk-i+1 (bk ) = 1 cl0 (x) + cl1 (x) k + 1 Now, the last equality follows from the induction hyMoreover, if cannot be satisfied, then each cl (x) is pothesis. finite. Lemma 2.4. Let i, j and k be positive integers such that Proof. If cannot be satisfied, then a set C of all i j and i + j k . Then: the clauses of satisfies the properties required by the fi (bk ) · fj -1 (bk ) fi-1 (bk ) · fj (bk ) definition of cl (x). Then, each of cl (x) is finite. 332 both by Lemma 3.1. A variable xi is called neutral if pi = 1/2, i.e., both cl0 (xi ) and cl1 (xi ) are at least k /2. Choose each xi randomly and independently to be true with the probability pi . By Lemmas 2.1 and 2.3, if cl (xi ) k /2, then xi is chosen to be with the probability greater than 1/2. We show that each clause Lemma 3.2. Let be a k -satisfiable formula and let of is satisfied with the probability at least bk+1 . Fix (x1 . . . xn ) be a clause of . Then, the fol lowing a clause of the formula . If contains at least two n 1 neutral variables, then it is satisfied with the probability holds: at least 3/4 bk+1 . Hence, we can assume that cl1-1 (x1 ) + . . . + cl1-n (xn ) k contains at most one neutral variable. We first consider the case that the clause contains Proof. Assume that cl1-1 (x1 ) + . . . + cl1-n (xn ) < k , 1-i a neutral variable. If also contains a literal x for xi i in particular, all the numbers cl (xi ) are finite. Let with cl (xi ) k /2, then the clause is satisfied with 1-i Ci be a set of the size cl (xi ) of clauses which can be the probability at least 3/4 > bk+1 . Hence, it is enough satisfied only when the value of xi is 1 - i . Let C be to consider only the case that the clause is of the form: now the set containing the clause (x1 . . . xn ) and all n 1 0 1 the clauses of the sets Ci . If there is a truth assignment = (x0 x1 . . . xll ) i i i which satisfies all the clauses of C , then the value of xi must be 1 - i (because of the clauses of Ci ). But, then where p = 1/2 and cl1-j (x ) < k /2 for j = 1, . . . , l. i0 ij the clause (x1 . . . xn ) is not satisfied. Hence, there n 1 By Lemma 2.3 and the definition of pi , xijj is chosen to is no truth assignment which satisfies all the clauses of be 1 - j with the probability equal to: the set C , but |C | k . We are now ready to prove our lower bound on the ratio k . Let us briefly sketch the main idea of the proof. The constructed probability distribution is based on the values cl (x). Essentially, a small value of cl (x) (smaller than k /2) represents that the probability of x being must be large (greater than 1/2). Lemma 3.1 then states that for a variable x at most one of cl0 (x) and cl1 (x) is large in this sense. Suitable probabilities of x being true/false are chosen as values of the functions fi from the previous section. fcl1-j (x ) (bk+1 ) j Assume that there is a variable x such that cl0 (x) + cl1 (x) k . Let C be a set of the size cl (x) of the clauses which can be satisfied only when the value of the variable x is . Then, the clauses of C 0 C 1 cannot be simultaneously satisfied, but |C 0 C 1 | k . Since cl0 (xi0 ) (k + 1)/2, we have that cl1-1 (xi1 ) + . . . + cl1-l (xil ) (k + 1)/2 - 1 by the definition of cl0 (xi0 ). Lemmas 2.1 and 2.5 bound the probability that each variable xijj is chosen to be 1 - j for all j = 1, . . . , l as follows: jl =1 fcl1-j (x ) (bk+1 ) f j (k+1)/2 -1 (bk+1 ) Theorem 3.1. Let be a k -satisfiable formula. Then, Observe that f (k+1)/2 (bk+1 ) 1/2 by Lemmas 2.1 bk+1 ( ). Hence, bk+1 k . and 2.3. Thus, we can ma jorize using Lemmas 2.2 and 2.3 the upper bound 1/2 · f (k+1)/2 -1 (bk+1 ) on the Proof. Let x1 , . . . , xn be the variables of the formula probability that the clause is not satisfied as follows: . If the formula can be satisfied, then there are probabilities p1 , . . . , pn such that each clause is 1/2 · f (k+1)/2 -1 (bk+1 ) satisfied with the probability equal to one (consider the probabilities pi derived from a satisfying truth (1 - f (k+1)/2 (bk+1 )) · f (k+1)/2 -1 (bk+1 ) = assignment). Assume in the rest that the formula fk+1- (k+1)/2 (bk+1 ) · f (k+1)/2 -1 (bk+1 ) = 1 - bk+1 is unsatisfiable. In particular, all the numbers cl (xi ) are finite by Lemma 3.1. The final case is that the clause contains no We now define the probabilities pi : neutral variables. Let us assume that the clause is of the following form: fcl1 (xi ) (bk+1 ) if cl1 (xi ) k /2, 1 = (x1 . . . xll ) pi = fk-cl0 (xi ) (bk+1 ) if cl0 (xi ) k /2, i i 1/2 otherwise. If there are distinct j and j such that clj (xij ) k /2 The probabilities pi are well-defined because the in- and cl j (xij ) k /2, then each of xj and xj is chosen 0 1 equalities cl (xi ) k /2 and cl (xi ) k /2 cannot hold to be j and j , respectively, with the probability at 333 least 1/2. Hence, is satisfied with the probability at least 3/4 > bk+1 . Let us consider now the case that there is exactly one j such that clj (xij ) k /2. We can assume that such j is equal to 1. The variable xi1 is chosen to be 1 with the probability fcl1 (xi ) (bk+1 ) due to the 1 definition of pi1 . By the assumption that no variable of 1- is neutral, we have cl j (xij ) k /2 for j = 1. The 1 definition of cl (xi1 ) now implies that: cl1-2 (xi2 ) + . . . + cl1-l (xil ) cl1 (xi1 ) - 1 The variable for j = 2, . . . , l is chosen to be 1 - j with the probability: xijj The matching upper bound on k is presented in the next theorem. Actually, the proof of this theorem led us to definition of the functions fi (as defined in the previous section) and later we became aware of the paper [10]: Theorem 3.2. Let be the fol lowing formula: = (x1 ) (¬x1 x2 ) (¬x2 x3 ) . . . (¬xk-1 xk ) (¬xk ) The formula is k -satisfiable and bk+1 ( ). Hence, bk+1 k for al l k 1. Proof. Let k be a fixed integer. We first show that the formula is k -satisfiable, i.e., that any formula fcl1-j (x ) (bk+1 ) j obtained from by removing a single clause can be satisfied. If the missing clause is (x1 ) or (¬xk ), By Lemmas 2.1 and 2.5, we can now bound the probj then set all variables to be false or true, respectively. ability that each variable xij is chosen to be 1 - j for Otherwise, the missing clause is (¬xi xi+1 ) for some all j = 2, . . . , l: i, 1 i k - 1. Then, set all the variables xi for i < i to be true and all the others to be false. jl Let p1 , . . . , pk be fixed probabilities. We prove that fcl1-j (x ) (bk+1 ) fcl1 (xi )-1 (bk+1 ) 1 j if each variable xi is chosen randomly and independently =2 to be true with the probability pi , then contains a Hence, the clause is satisfied with the probability clause which is satisfied with the probability at most which is at least: bk+1 . Assume that all the clauses are satisfied with the probability at least bk+1 . We show that then each clause 1 - (1 - fcl1 (xi )-1 (bk+1 )) · fcl1 (xi )-1 (bk+1 ) = 1 1 is satisfied with the probability exactly bk+1 . We prove by induction on i that pi fi (bk+1 ) and 1 - fk+1-cl1 (xi )-1 (bk+1 ) · fcl1 (xi )-1 (bk+1 ) = bk+1 1 1 that if pi = fi (bk+1 ), then pi = fi (bk+1 ) for all i < i. The first equality follows from Lemma 2.3 and the last If i = 1, then p1 f1 (bk+1 ) = bk+1 because the clause (x1 ) is satisfied with the probability at least bk+1 . Let one from Lemma 2.2. Finally, if there is no j such that clj (xij ) k /2, i 2. Since the clause (¬xi-1 xi ) is satisfied with the probability at least bk+1 , we have: 1- we have cl j (xij ) k /2 for all j because no variable j of is neutral. The variable xij is chosen to be 1 - j 1 - pi-1 (1 - pi ) bk+1 with the probability equal to: By the induction hypothesis and the equality of (2.1), fcl1-j (x ) (bk+1 ) we get: j Lemma 3.2 implies the following: cl1-1 (xi1 ) + . . . + cl1-l (xil ) k By Lemmas 2.1 and 2.5, we can now bound the proba bility that each variable xijj is set to value 1 - j for all j = 1, . . . , l: jl =1 1 - fi-1 (bk+1 )(1 - pi ) bk+1 1 - bk+1 fi-1 (bk+1 )(1 - pi ) 1 - bk+1 1 - pi fi-1 (bk+1 ) 1 - bk+1 pi 1 - fi-1 (bk+1 ) pi fi (bk+1 ) fcl1-j (x ) (bk+1 ) fk (bk+1 ) = 1 - bk+1 j Thus, the clause is satisfied with the probability at least bk+1 . In addition, if pi = fi (bk+1 ), then all the inequalities are strict, in particular, pi-1 = fi-1 (bk+1 ). Hence, pi = fi (bk+1 ) for all i < i. 334 So, pk fk (bk+1 ). Since the clause (¬xk ) is The claimed value of r4 can be determined from the valsatisfied with the probability at least bk+1 , we have ues of p0 and q0 : 1 - fk (bk+1 ) 1 - pk bk+1 . Thus, pk = fk (bk+1 ). 1 1 1 This implies that pi = fi (bk+1 ) for all i and each r4 = 3 = 2 - p2 = 2 - q = 1 + p0 0 clause is satisfied with the probability exactly bk+1 by 0 Lemma 2.2. 3 3 3 (4.8) 0.6992 Remark 3.1. As pointed out by one of the referees, 69-11 69+11 5+ 3 -3 2 2 Theorem 3.2 can be also deduced directly from Usiskin's results mentioned in Introduction. Fix k and consider 5 The Lower Bound on r4 . probabilities pi , i = 1, . . . , k , such that the variable xi is chosen to be true with the probability pi . Let us now We prove that r4 is at least the value given by (4.8) in the next theorem. In particular, 4 < r4 and define random variables X0 , . . . , Xk as fol lows: hence, unlike in the case of k -satisfiable formulas for X0 = 0 with the probability 1, k = 1, 2, 3, there is no universal probability distribution Xi = i - k - 1 with the probability pi and for 4-satisfiable formulas in general. i with the probability 1 - pi . Theorem 5.1. Let be a 4-satisfiable CNF formula. Observe that X0 > X1 holds with the probability p1 . Then: This is exactly the probability that the clause (x1 ) is w0 ( ) 1 satisfied. Observe now that Xi > Xi+1 holds with the w( ) 2 - p2 0 probability (1 - pi )pi+1 for i = 1, . . . , k - 1 which is exactly the probability that the clause (¬xi xi+1 ) is Thus, the fol lowing holds: satisfied. Final ly, Xk > X0 holds with the probability 1 r4 1 - pk which is the probability that the clause (¬xk ) is 2 - p2 0 satisfied. Hence, we can conclude: ( ) = min{P(X0 > X1 ), . . . , P(Xk > X0 )} Then, ( ) must be at most bk+1 . It seems to be interesting to try to find a proof of Theorem 3.1 which also directly uses properties of Usiskin's numbers bk . 4 Prop erties of r4 . In this section, we establish several identities which are satisfied by the claimed value of r4 . Let q0 be the unique real solution of the following equation: (4.5) 3 q0 = (1 - q0 )2 Using Cardano's formula, we have: 3 3 69 - 11 1 69 + 11 3 3 + 0.5698 - q0 = 54 3 54 Let us define p0 = q0 0.7547. It is easy to check from (4.5) that p0 and q0 satisfy the following equation: (4.6) 2 p4 = q0 = p0 (1 - q0 ) 0 The first and the last part of (4.6) together with the definition p0 = q0 gives: (4.7) p3 + p2 - 1 = 0 0 0 Proof. Due to space limitations, we only explain the main ideas of the proof and omit (quite technical) computing the expected weight of satisfied clauses with respect to considered probability distributions. Consider a fixed 4-satisfiable formula . It is possible to assume that no clause of contains simultaneously a positive and a negative occurrence of the same variable (such a clause is always satisfied and removing it from can only decrease the ratio w0 ( )/w( )). If does not contain a clause of size one, then set each variable randomly and independently to be true with the probability 1/2. The expected weight of the satisfied clauses is at least 3w( )/4 (each clause is satisfied with the probability at least 3/4) and hence there is a truth assignment which satisfies clauses of the total weight at least 3w( )/4 w( )/(2 - p2 ). 0 Assume in the rest that contains clauses of size one. It is also possible to assume that all the occurrences of variables in clauses of size one are positive: If this is not true and contains a clause (¬x), then change all positive occurrences of x to negative and vice versa. Observe that no variable can appear both positively and negatively in clauses of size one because is 4satisfiable. Now, let X be the set of variables which occur in the clauses of size one. In addition, assume that the sum of the weights of clauses containing a literal x for x X is equal to one (this can be assured by multiplying all the weights of all the clauses 335 by the same suitable constant which clearly preserves the ratio w0 ( )/w( )). We can also assume that there is no clause (¬x ¬y ) for x X and any variable y . If there is such a clause, change all negative occurrences of y to positive and vice versa (note that y X because is 4-satisfiable). If this does not help, then contains a clause (¬x y ) and a clause (¬x ¬y ) for some x, x X . But then, the clauses (x), (x ), (¬x y ) and (¬x ¬y ) of contradict the 4-satisfiability of the formula . Let Y be the set of variables y which occur in the clauses of type (¬x y ) and which are not contained in X , i.e., Y is the set of all the variables which are not contained in X and which are contained in a clause of size two together with a literal of the form ¬x for x X . Finally, let Z be the set of the remaining variables contained in , i.e., the variables of included neither to X nor to Y . Let be the sum of the weights of the following four types of clauses: · clauses containing only literals of the form ¬x for x X, · clauses which contain a literal ¬x for x X and a literal y for y Y but no literal x for x X , · clauses containing only literals of the form ¬x for x X and literals of the form ¬y for y Y and · clauses containing only literals of the type ¬y for y Y and literals with the variables of Z . Let us consider the following probability distribution: Set a variable of X to be true with the probability p0 , a variable of Y with the probability q0 and a variable from Z with the probability 1/2. It can be shown under the assumption w( )/p3 , that the expected weight 0 of the satisfied clauses with respect to this distribution is at least w( )/(2 - p2 ). We omit technical details of 0 this computation. If > w( )/p3 , then two probability distributions 0 need to be considered. The first one is the same as in the previous paragraph. The second one is simpler: All the variables of X and Y are set to be false and the variables of Z are set to be true with the probability 1/2. Now, the expected sums of weights of the satisfied clauses with respect to these two distributions are computed and it is shown that one of them is at least w( )/(2 - p2 ). 0 Again, we omit details of this. 6 The Upp er Bound on r4 . We first present a construction of a certain 4-satisfiable formula. Next, we show that any truth assignment of it behaves in some sense almost like a random assignment (Lemma 6.2) and then we prove the upper bound by a careful choice of parameters in the construction of the formula. The formula SAT4 (n, , , ) is defined as follows: It has n variables xi , 1 i n, and nk variables ya where a ranges over all ordered k -tuples of numbers 1, . . . , n for k = n1/3 . We say that two ordered k tuples a and b have a common entry if there is i which is an entry both of a and b; we do not require that i has the same position in a and b, e.g., i may be the first entry of a and the last one of b. The clauses of the formula SAT4 (n, , , ) are the following: · n clauses (xi ) for 1 i n each of weight 1/n. · k nk clauses (¬xi ya ) for all pairs i and a such that i is contained in a. If i is contained in a several times, then the formula contains as many clauses (¬xi ya ) as is the number of occurrences of i in a. Each of these k nk clauses has the weight equal to /(k nk ), i.e., the sum of their weights is equal to . · clauses (¬ya ¬yb ) for all ordered pairs of k -tuples a and b which do not have a common entry. Note that each clause of this type is included to the formula twice because the pairs of a and b are considered to be ordered. All the clauses of this type have the same weight chosen in such a way that the sum of their weights is equal to . nc ·4 lauses (¬xi1 ¬xi2 ¬xi3 ¬xi4 ) for all unordered quadruples i1 , i2 , i3 and i4 of different numbers between 1 and n. Aln t,hese clauses have l the same weight equal to / 4 i.e., the sum of their weights is equal to . We state two lemmas on formulas SAT4 (n, , , ). Note that Lemma 6.2 says that any truth assignment behaves for SAT4 (n, , , ) like a random assignment. Observe that p + (1 - p + pq ) + (1 - q 2 ) + (1 - p4 ) is the expected weight of the satisfied clauses of SAT4 (n, , , ) when the variables xi are set to be true with the probability p and the variables ya with the probability q . Lemma 6.1. The SAT4 (n, , , ) is 4-satisfiable for al l n, , and . Proof. Let W be a minimal set of clauses of a formula SAT4 (n, , , ) which cannot be simultaneously satisfied. Assume for the sake of contradiction that |W | 4. By the minimality of W , any variable either does not appear in the clauses of W at all or it has both a positive and a negative occurrence in the clauses of W . Since 336 |W | 4, W cannot contain a clause of size four. Moreover, W must contain at least one clause of the type (¬ya ¬yb ) (otherwise, it contains only the clauses of types (xi ) and (¬xi ya ) which can be satisfied by setting all the variables to be true). If W contains a clause (¬ya ¬yb ), it must contain also clauses (¬xi ya ) for some i contained in a and (¬xj yb ) for some j contained in b. Since the clause (¬ya ¬yb ) is present in SAT4 (n, , , ), we have i = j . But then, W must contain also the clauses (xi ) and (xj ) which implies that |W | > 4. Lemma 6.2. The weight of the satisfied clauses of a formula SAT4 (n, , , ) is equal to p + (1 - p + pq + O(n-1/12 )) + (1 - q 2 + O(n-1/3 )) + (1 - p4 + O(n-1 )) for al l n, , and where p is equal to the fraction of variables xi set to true and q is the fraction of variables ya set to true. The constants in the functions O(n-1/12 ), O(n-1/3 ) and O(n-1 ) are independent of n, , , , p and q and the actual terms estimated by them can be both negative and positive. Proof. Let k = n1/3 be the size of the tuples a as in the construction of SAT4 (n, , , ). We deal with each of the four types of the clauses contained in SAT4 (n, , , ) separately: to q + (1 - q )(1 - p + O(n-1/12 )) + e-(n 1 - p + pq + O(n-1/12 ). 1/6 ) = · (¬ya ¬yb ) for a and b with no common entry Consider all ordered pairs of literals ¬ya and ¬yb (including those with a = b); exactly 1 - q 2 of them contain a satisfied literal. There are n2k such pairs. However only some of them correspond to real clauses. The number of the clauses of the type (¬ya ¬yb ) is at least nk (n - k )k . Hence the error made by approximating the ratio of the satisfied kk clauses by 1 - q 2 is of the order 1 - (n-k ) = n O(k 2 /n) = O(n-1/3 ). · (¬xi1 ¬xi2 ¬xi3 ¬xi4 ) Consider all n4 ordered quadruples of (not necessarily distinct) literals ¬xi ; exactly (1 - p4 )n4 of them contain a satisfied literal. The number of ordered quadruples n or( esponding to the clauses of cr the formula is 4! 4 4! quadruples correspond to each clause). Hence at most n4 - n(n - 1)(n - 2)(n - 3) = (n3 ) of the quadruples do not correspond to the clauses and the fraction of the satisfied clauses of this type is 1 - p4 + O(n-1 ). Now, the upper bound is presented. We remark that actually the equations (6.9)­(6.11) from this proof led us to computing the ratio r4 . · (xi ) for 1 i n There are exactly pn clauses xi , 1 i n satisfied. Theorem 6.1. For each > 0, there exists a 4Hence the weight of the satisfied clauses of this type satisfiable formula such that w0 ( ) w( )/(2 - p2 ) + 0 is exactly p. . · (¬xi ya ) for i contained in a Proof. Set , and to be the unique solution of Let (a) be the number of entries of a which the following system of equations: correspond to the variables xi set to be true by the p 4 + p 4 + p 4 - p0 = 0 assignment (counting multiplicity if i is contained (6.9) 0 0 0 several times in a). Chernoff bounds [3] can be (6.10) 1 - + q0 - 4 p3 = 0 0 used to bound the number of ordered k -tuples a for (6.11) p 0 - 2 q 0 = 0 which (a) differs significantly from the "average value" k p: The values of , and are approximately equal to 1.234, 0.819 and 0.272, respectively. 2 |{a, (a) (1 + )pk }| e- pk/3 nk Consider a formula SAT4 (n, , , ) for a suffi-2 pk/2 k ciently large number n, i.e., such n that the er|{a, (a) (1 - )pk }| e n ror term O(n-1/12 ) + O(n-1/3 ) + O(n-1 ) from Set = n1/4 /pk n-1/12 . Observe that except Lemma 6.2 is smaller than the considered . The for1/6 for at most e-(n ) nk tuples a, all tuples a satisfy mula SAT4 (n, , , ) is 4-satisfiable by Lemma 6.1. that the number of their entries i which correspond The value w0 (SAT4 (n, , , )) is equal up to an adto the variables xi set to true is within the difference ditive error of to the maximum4 of the function p + 2 of n1/4 from pk . If ya is false and the number of (1 - p + pq ) + (1 - q ) + (1 - p ) for 0 p, q 1 by such i's is within the difference of n1/4 from pk , Lemma 6.2 and the choice of n. The first partial derivatives according to p and q are then there are satisfied (1 - p + O(n1/4-1/3 ))k = -1/12 the following: (1 - p + O(n ))k clauses out of all the k clauses (¬xi ya ) containing the literal ya . Hence the ratio = p + (1 - p + pq ) + (1 - q 2 ) + (1 - p4 ) of the satisfied clauses of the type (¬xi ya ) is equal p 337 1 - + q - 4 p3 p + (1 - p + pq ) + (1 - q 2 ) + (1 - p4 ) q p - 2 q = Input: a formula Output: the numbers im (x) unmark all literals x set all im (x) to for each clause (x ) set im (x) to 1 while there is an unmarked literal x such that im1- (x) < do choose an unmarked x1 1 with im1-1 (x1 ) as small as possible mark the literal x1 1 for all clauses (x0 . . . xl ) consisting 0 l only of unmarked literal x0 0 set im0 (x0 ) to min{im0 (x0 ), 1 + im1-1 (x1 ) + . . . + im1-l (xl )} endfor endwhile Hence, the function p + (1 - p + pq ) + (1 - q 2 ) + (1 - p4 ) attains its maximum for one of the following pairs of values (p, q ): (0, 0), (0, 1), (1, 0), (1, 1), (1, /2 ) and (p0 , q0 ). A pair (p0 , q0 ) is one of the candidates because , and satisfy (6.10) and (6.11). It is easy to check that the values of the function p + (1 - p + pq ) + (1 - q 2 ) + (1 - p4 ) for (p, q ) = (0, 0) and for (p, q ) = (p0 , q0 ) dominate all the others. By (4.6) and (6.9), the value of p + (1 - p + pq ) + (1 - q 2 ) + (1 - p4 ) is equal to + + both for (p, q ) = (0, 0) and (p, q ) = (p0 , q0 ). The maximum of the function p + (1 - p + pq ) + (1 - q 2 ) + (1 - p4 ) is equal to + + and it is within the error of from w0 (SAT4 (n, , , )). We have w(SAT4 (n, , , )) = 1 + + + due Figure 1: A linear time algorithm computing the num to the construction of SAT4 (n, , , ). The desired bers im (x). estimate on the ratio now follows from (4.8) and (6.9): w0 (SAT4 (n, , , )) ++ += w(SAT4 (n, , , )) 1+++ 7 1 1 += + 1 + p3 2 - p2 0 0 · If there is no restriction on the value of im (x), set im (x) to . The third property on the numbers (x) is satisfied by the definition of im (x). Observe that if im (x) is finite, then there is a set C of clauses of the size im (x) such that the clauses of C can be satisfied only if the value of x is . Thus, cl (x) im (x). We can conclude: If is k -satisfiable, then the numbers im (x) have all the three properties needed in the proof of Theorem 3.1. The numbers im (x) may also have the properties for a non-k -satisfiable formula. Trevisan [9] considered similarly defined numbers but he required in his paper instead that of 0 (x0 ) 1 1 1 + -1 (x1 ) + . . . + -l (xl ) that 0 (x0 ) 1 + 1-l 1-1 max{ (x1 ), . . . , (xl )}. Hence Trevisan's numbers correspond to depths of certain "derivations" that x is and our numbers im (x) to their sizes. This causes that the ratios of the algorithm obtained by Trevisan are worse, but on the other hand, our analysis is less straightforward. The numbers im (x) can be computed in linear time by the algorithm from Figure 1. Once the numbers im (x) are computed, one can check (in linear time) whether the numbers im (x) have the first two proper ties mentioned above (the third one follows directly from their definition). If this is not the case, then the input formula is not k -satisfiable. In addition, if one of the two condition, let us say the first one, is not satisfied, i.e., im0 (x) + im1 (x) k for some x, then it is possible The Algorithms. One can check that the numbers cl (x) can be replaced in the proof of Theorem 3.1 by any numbers (x) which have the following three properties: 0 1 1. (x) + (x) k + 1 for each variable x of , 1 1 2. -1 (x1 ) + . . . + -l (xl ) k for each clause 1 l (x . . . x ) of and 1 1 3. 0 (x0 ) 1 + -1 (x1 ) + . . . + -l (xl ) for each 0 l clause (x . . . x ). Moreover, all the numbers (x) need not to be finite (this assumption only simplified several arguments in the proof ). The numbers cl (x) have the first two of the above properties by Lemmas 3.1 and 3.2 and the third one followed from their definition. Instead of cl (x), our algorithms use the numbers im (x) which are defined to be the (largest) numbers which have the following properties: · If contains a clause (x ), then im (x) = 1. · If contains a clause (x0 . . . xl ), then 1 1 im0 (x0 ) 1 + im-1 (x1 ) + . . . + im-l (xl ). 338 to find k clauses which cannot be satisfied simultaneously. Simply, the im0 (x) clauses which forces x to be false and im1 (x) clauses which forces x to be true are such k clauses. This set of clauses can be constructed in linear time. If the formula is k -satisfiable, then we can define the probabilities pi as in the proof of Theorem 3.1. In particular, each clause of with respect to the probabilities pi is satisfied with the probability at least bk+1 : Theorem 7.1. There is a linear time algorithm which for a given formula with variables xi either outputs that is not k -satisfiable (and it finds k clauses which cannot be simultaneously satisfied) or finds probabilities pi such that each clauses is satisfied with the probability at least k if a variable xi is chosen to be true with the probability pi . Using the algorithm from Theorem 7.1, we can find a truth assignment which satisfies at least the fraction of bk+1 of the clauses using a derandomization method proposed by Yannakakis [12]: Theorem 7.2. There is a deterministic linear time algorithm which for a given formula with m clauses either outputs that is not k -satisfiable (and it finds k clauses which cannot be simultaneously satisfied) or finds a truth assignment such that at least bk+1 · m clauses are satisfied. The same method can be applied to the result presented in Section 5: Theorem 7.3. There is a deterministic linear time algorithm which for a given formula with m clauses either outputs that is not 4-satisfiable (and it finds four clauses which cannot be simultaneously satisfied) or finds a truth assignment such that at least r4 · m clauses are satisfied. Acknowledgement. The author would to thank Gerhard Woeginger for attracting his attention to the problem and for pointing out the paper [10]. He would also like to thank the referees for their valuable comments on the results. References [1] M. R. Garey, D. S. Johnson: Computers and intractability, A guide to the theory of NP-completeness, Freeman, San Francisco, California, 1979. [2] J. H°stad: Some optimal inapproximability results, a Proc. 28th ACM Symposium on Theory of Computing 1997, pp. 1­10. [3] T. Hagerup, Ch. Rub: A guided tour Chernoff bounds, ¨ Inform. Process. Letters, 33 (1989), pp. 305­308. [4] M. A. Huang, K. Lieberherr: Implications of forbidden structures for extremal algorithmic problems, Theoretical Computer Science, 40 (1985), pp. 195­210. [5] S. Jukna: Extremal combinatorics with applications in computer science, Springer, Heidelberg, 2001. [6] H. Karloff, U. Zwick: A 7/8-approximation algorithm for MAX 3SAT?, Proc. 38th IEEE Symposium on Foundations of Computer Science 1997, pp. 406­415. [7] K. Lieberherr, E. Specker: Complexity of partial satisfaction, Journal of the ACM, 28(2) (1981), pp. 411­ 422. [8] K. Lieberherr, E. Specker: Complexity of partial satisfaction II, Technical Report 293, Dept. of EECS, Princeton University, 1982. [9] L. Trevisan: On local versus global satisfiability, to appear in SIAM Journal on Discrete Mathematics. A preliminary version is available as ECCC report TR9712. [10] Z. Usiskin: Max-min probabilities in the voting paradox, Ann. Math. Stat., 35 (1963), pp. 857­862. [11] I. Wegener: The complexity of Boolean functions, John Wiley and Sons, Stuttgart, 1987. [12] M. Yannakakis: On the approximation of maximum satisfiability, Journal of Algorithms, 17 (1994), pp. 475­502. A preliminary version appeared in Proc. 3rd Annual ACM/SIGACT-SIAM Symposium on Discrete Algorithms 1992, pp. 1­9. 339