A certifying algorithm for the consecutive-ones property Ross M. McConnell Abstract We give a forbidden substructure characterization of set families that have the consecutive-ones property, and a linear time algorithm to find the forbidden substructure if a set family does not have the property. The forbidden substructure has size O(n), where n is the size of the domain. The PQ tree is a well-known data structure for representing all consecutive-ones orderings. We show that it is given by a substitution decomposition of arbitrary set families that has not been described previously. This observation gives a generalization of the PQ tree to arbitrary set families, and we give a linear-time algorithm to compute it. a b c d e fg h ijk 11 11 1 1111 111111 1 1111 1111 11 1 Intro duction A 0-1 matrix as the consecutive-ones property if it is possible to order the columns so that, in every row, the 1's form a consecutive interval. Equivalently, a family F of subsets of a domain V has the consecutive-ones property if it is possible to assign a linear order to V where each member of F is an interval. The problem of finding a consecutive-ones ordering, when one exists, is the key step of Booth and Lueker's well-known algorithm for recognizing interval graphs [1]. To do this, Booth and Lueker use the PQ tree, which gives an implicit representation of all the consecutiveones orderings of F . Interval graphs and variations of them have applications in many optimization problems and in molecular biology [7, 2, 13]. PQ trees also have applications to finding planar embeddings of planar graphs [10]. The way the PQ tree represents consecutive-ones orderings is illustrated in Figure 1. The nodes are of two types: P nodes and Q nodes. The children of a P node are unordered, but the children of a Q node are linearly ordered. The consecutive-ones orderings of the matrix are the leaf orders that result from assigning an arbitrary order to children of each P node, and reversing the order of children of some of the Q nodes. Booth and Lueker's algorithm runs in time linear in the size of the input. It either finds a consecutive-ones ordering or determines that the matrix does not have the consecutive-ones property. A practical problem with rmm@cs.colostate.edu, Computer Science Department, Colorado State University, Fort Collins, CO, 80523-1873 U.S.A. f de a b c g h i j k Figure 1: The PQ tree is a way of representing all consecutive-ones orderings of the columns of a matrix. The leaf order of the tree gives a consecutive-ones ordering. Permuting the order of children of a P node (black points) gives a new order. For instance, by permuting the order of the left child of the root, we see that (d, a, b, c, e, f , g , h, i, j, k ) is a consecutiveones ordering. Reversing the order of children of a Q node (horizontal bars) gives a new consecutive-ones order. For instance, by reversing the order of the right child of the root, we see that (a, b, c, d, e, f , k , j, h, i, g ) is a consecutive-ones ordering. The consecutive-ones orderings of the matrix are leaf orders that can be obtained by these operations. Copyright © 2004 by the Association for Computing Machinery, Inc. and the Society for industrial and Applied Mathematics. All Rights reserved. Printed in The United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Association for Computing Machinery, 1515 Broadway, New York, NY 10036 and the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA 19104-2688 768 this is that, although the algorithm has been proven correct, a program that implements it may have bugs. When a program claims that a given matrix does not have the consecutive-ones property, the user cannot tell whether an error has occurred. A certifying algorithm is an algorithm for a decision problem that provides a piece of evidence (a certificate) that allows its answer to be easily checked. More discussion of the practical importance of certifying algorithms can be found in [17, 14, 9]. A variety of algorithms for the consecutive-ones problem have previously appeared in the literature. All of these give a certificate in the form of a consecutiveones ordering when a matrix has the property, but reject it without supporting evidence when it does not. Tucker [16] gives a forbidden substructure characterization of consecutive-ones matrices in terms of asteroidal triples, but does not give an algorithm for finding such a forbidden structure when the family fails to have the consecutive-ones property. We give an alternative characterization, and a linear-time algorithm for finding it. The size of the certificate is O(n). Our certificate is derived as follows. Given a family F of subsets of a domain V , a consecutive-ones ordering (x1 , x2 , ..., xn ) of V defines a consecutive-ones relation R = {(xi , xj )|i < j }. Suppose a consecutive-ones ordering is not known. For a, b V , (a, b) and (b, a) cannot appear in the same consecutive-ones relation ­ they are incompatible. Suppose that a, b, c V and there exists X F such that a, c X and b X . Then (a, b) and (b, c) cannot appear in the same consecutive-ones relation, since this would imply that b goes between a and c in the corresponding consecutive-ones ordering, which would imply that X is not consecutively ordered, a contradiction. Therefore, (a, b) and (b, c) are incompatible in this case. We define the incompatibility graph to be an undirected graph whose vertices are the elements of {(a, b)|a, b V and a = b}, and whose edges are the pairs of vertices that are incompatible by the above observations. (See Figure 2). A consecutive-ones relation must have no incompatible pairs, so it is an independent set in this graph that consists of half of the vertices. The reverse of a consecutive-ones ordering is also a consecutive-ones ordering, so the remaining vertices of the incompatibility graph must also be an independent set. Therefore, the incompatibility graph must be bipartite whenever the matrix has a consecutive-ones ordering. We show that this is also a sufficient condition: the incompatibility graph fails to be bipartite whenever a set family fails to have the consecutive-ones property. It follows that when a set family does not have the (a,b) (a,c) (a,d) (b,c) (b,d) (c,d) (b,a) (c,a) (d,a) (c,b) (d,b) (d,c) (a,b) (a,c) (a,d) (b,c) (b,d) (c,d) (b,a) (c,a) (d,a) (c,b) (d,b) (d,c) Figure 2: The incompatibility graph IC (F ) for F = {{a, b}, {b, c, d}, {a, b, c}}. The upper figure shows the incompatibilities induced by just the first set, {a, b}, as well as the rule that (x, y ) and (y , x) are incompatible. The lower figure shows the whole graph, and is obtained by adding incompatibilities induced by the other sets. F has the consecutive-ones property iff its incompatibility graph is bipartite. consecutive-ones property, the incompatibility graph must have an odd cycle. Our algorithm returns an odd cycle of length O(n). Each edge of the cycle is labeled with a set from the family that documents the incompatibility. The size of the incompatibility graph is not necessarily linear in the sum of cardinalities of the set family, but it is unnecessary to construct it explicitly, either to find the odd cycle or to verify the certificate. We also describe a type of substitution decomposition of arbitrary 0-1 matrices and set families. This gives a generalization of PQ trees to arbitrary 0-1 matrices or set families. We give a linear-time algorithm to compute this generalized PQ tree. The algorithm is based on elementary operations such as radix sorting, a simple partition-refinement operation, off-line least common ancestors, and a graph traversal algorithm such as DFS. 2 Preliminaries The problem of consecutively ordering columns of a matrix reduces to that of ordering the domain V of a set family F so that each member of F is an interval on the ordering. Let n = |V | and let m = S iz e(F ) denote the sum of cardinalities of members of F . An algorithm on F is 769 A strongly partitive family is weakly partitive, so it has a decomposition tree. A weakly partitive family is strongly partitive iff its decomposition tree has no linear nodes. A transitive orientation of an undirected graph is an assignment of orientations to its edges such that whenever (a, b) and (b, c) are directed edges into and out of b, then (a, c) is also a directed edge. Lemma 2.1. The Hasse diagram of the subset relation The presence of in a set family is irrelevant to the on a tree-like family F on domain V is a tree, and if consecutive ones property or the decomposition that we X is a nonempty subset of V and does not overlap any describe. For simplicity, we will assume that F member of F , then X is a union of one or more children except where stated otherwise. of a node in this tree. Let us call the Hasse diagram of such a family the family's inclusion tree. F is a weakly partitive family if: 3 Partitive families and the PQ tree Definition 3.1. If F is a family of subsets of a universe V , then F 's non-overlapping family, denoted · V F , F , and for all v V , {v } F N (F ) is the family of nonempty subsets of V that do · For all X, Y F , if X and Y overlap, then not overlap with any member of F . X Y F , X Y F , X - Y F , and Y - X F . Theorem 3.1. If F is an arbitrary set family, then N (F ) is a strongly partitive set family. Theorem 3.2. [8] If F has the consecutive-ones property, the PQ tree is the decomposition tree of N (F ), where the prime nodes are interpreted as the Q nodes and the degenerate nodes are interpreted as the P nodes. linear if it runs in O(n + m) time. The symmetric difference of two sets is AB = (A B ) - (A B ). Two sets X and Y overlap if they intersect but neither is a subset of the other. That is, they overlap if X - Y , X Y , and Y - X are all nonempty. If X and Y are disjoint sets, a X and b Y , then {a, b} goes between X and Y , and (a, b) goes from X to Y. F is tree-like if F , V F , {x} F for all x V , and for all X, Y F , X and Y do not overlap. · V F , F , and for all v V , {v } F · For all X, Y F , if X and Y overlap, then X Y F , X Y F and X Y F . A member of a weakly partitive family F is strong if it overlaps with no other member of F . Though F is not a tree-like family, its strong members are. Let T (F ) denote their inclusion tree. Theorem 2.1. [3, 15] Let F be a weakly partitive family, let X be an internal node of T (F ), and let In [8], Theorem 3.1 is given for sets that have S1 , S2 , ..., Sk be the children of X . Then X is of one the consecutive-ones property, but the proof is general of the fol lowing three types: to arbitrary set families. This characterization of the PQ tree in Theorem 3.2 is not completely satisfactory · Degenerate: i For every I {1, . . . k } such that because it says nothing about the ordering of children 1 < |I | < k , I Si F of Q nodes. We now give a related characterization that · Prime: For every I {1, . . . k } such that 1 < captures this property also. i |I | < k , I Si F Definition 3.2. If F is an arbitrary set family on · Linear: There exists an ordering of {1, 2, ..., k } domain V such that F , let its weak closure, suich that if I {1, . . . k } and 1 < |I | < k , then W (F ), and strong closure, S (F ), denote the smal lest weakly partitive family and strongly partitive family, I Si F iff the members of I are consecutive respectively, that contain F as a subfamily. in the ordering. By Lemma 2.1 and Theorem 2.1, if F is a weak partitive family, Y V is a member of F iff it is either a node of T (F ), the union of a set of children children of a degenerate node of T (F ), or the union of a consecutive set of children in the ordering of a linear node. By F 's decomposition tree, we denote T (F ), the labeling of its internal nodes as prime, degenerate, or linear, and the linear ordering of children of linear nodes. This gives a representation of F in O(n) space. F is a strongly partitive family if: Lemma 3.1. If F is an arbitrary set family on domain V , then W (F ) and S (F ) are unique. Proof. We prove the lemma for the weak closure; the proof for the strong closure is similar. Let A := F {V } {{x}|x V }. While A is not a partitive family, execute the following closure operation: · Select overlapping members X and Y of A such that {X Y , X Y , X - Y , Y - X } are not all in A. Let A := A {X Y , X Y , X - Y , Y - X }. 770 form a strongly partitive family, and their decomposition tree is the one given by Theorem 2.1. Suppose P is a partition of the vertex set where every partition class is a module, and S1 and S2 are two sets, each consisting of one representative from each Theorem 3.3. If F is an arbitrary set family on do- member of P . It is easily seen that the subgraphs main V , then the decomposition tree of S (F ) is obtained induced by S1 and S2 are isomorphic. Let the modular from the decomposition tree of N (F ) by relabeling each quotient G/P denote this induced subgraph; the choice prime node as degenerate and vice versa. of representatives is irrelevant. The quotient may be interpreted as a graph where each member of P is a An interesting observation is that if F is itself vertex and two members of P are adjacent iff there are a strongly partitive family, then S (F ) = F , hence edges between them in G. A subgraph of G induced by N (N (F )) = F . This associates each strongly partitive a member of P is a factor. G can easily be reconstructed family F with dual N (F ), and this relationship is from the quotient and factors by substituting each factor symmetric. A partitive family is its own dual iff its in the place of the corresponding vertex of the quotient. decomposition tree is binary. This is the composition operation. This decomposition satisfies the following: Theorem 3.4. If F is an arbitrary set family on domain V , then the decomposition tree of S (F ) is obtained The Autonomy Rule: If X V is a module, then the modules of G that are subsets of X are the from the decomposition tree of W (F ) by relabeling al l same as the modules of G[X ]. linear nodes as degenerate. The Quotient Rule: If P is a quotiX t and X P , en Theorem 3.5. If F is a set family on domain V that then X is a module of G/P iff is a module of has the consecutive-ones property, then its PQ tree G. is the decomposition tree of W (F ), where the prime Since the modules are a partitive family, they have nodes are interpreted as P nodes and the linear nodes, together with the linear ordering on their children, are a decomposition tree. If X is an internal node and C are its children, then G[X ]/C is a modular quotient. interpreted as Q nodes. Collectively these quotients give a representation of G, Definition 3.3. The generalized PQ tree of an arbi- which can be obtained by working inductively from the trary set family F is the decomposition tree of W (F ). leaves to the root, performing substitution operations on these quotients. Theorem 3.6. A set family has the consecutive-ones M¨hring [15] surveys substitution decomposition in o property if and only if its generalized PQ tree has no a variety of domains, such as boolean functions, k -ary degenerate nodes. relations, and set families. What these substitution decompositions have in common is a definition of moduleBy Theorem 3.5, the standard PQ tree is just the like structures that form a strongly or weakly partispecial case of a generalized PQ tree in the case that tive family, definitions of the quotients and factors that the set family has the consecutive-ones property. have properties analogous to the autonomy and quotient Theorems 3.3, 3.4, and 3.5 give an algorithm for rules, and a substitution operation for reconstructing finding the generalized PQ tree of an arbitrary set the original structure from quotients and factors. family F : find the decomposition tree of N (F ), swap We now give a new substitution decomposition on the roles of prime and degenerate nodes to obtain the set families whose decomposition tree is the generalized decomposition of S (F ), and then determine which of PQ tree. The module-like structures are just the the degenerate nodes in this tree are linear in the members of N (F ). Note that if P is a partition of V decomposition of W (F ), as well as the linear order where every member of P is a member of N (F ), then on children of these nodes. This is the basis of our every member of F is either a subset of some member algorithm. of P or a union of members of P . Definition 4.1. Let F be an arbitrary set family on 4 PQ trees as a substitution decomp osition domain V , and let P be a partition of V where every The best-known example of a substitution decomposimember of P is a member of N (F ). tion is defined by the modules of a graph. A module of an undirected graph G = (V , E ) is a set X of vertices 1. If X P , the factor induced by X is the family of that all have the same neighbors in V - X . The modules members of F that are subsets of X . When this procedure halts, the family is weakly partitive, and by induction on the number of iterations, it is contained in every other weakly partitive family that contains F 771 2. The quotient F /P is the familyXQ = {X |X is a nonempty subfamily of P and F }. The quotient can be represented by Q = {X S |X F }, where S consists of one representative element from each member of P . The choice of representatives is irrelevant as they all give isomorphic set families. It is easily seen that, given these definitions, the autonomy and quotient rules are valid. Therefore, we may represent an arbitrary set family F with its decomposition tree, together with a labeling of the nodes of the tree with quotients. Figure 3 gives an illustration on a family that has the consecutive-ones property. This substitution decomposition we give above differs from previous ones. For instance, M¨hring defines o one on set families whose composition operation works as follows. Let the quotient C = {{a, b}, {b, c}, {c, d}} and the factors be Ca = {{1, 2}{2, 3}}, Cb = {{4, 5}}, Cc = {{6}}, and Cd = {{7, 8}, {8, 9, 10}}. To perform the composition on {a, b} and its factors, one must generate unions derived from the Cartesian product of the factors Ca and Cb , namely {1, 2, 4, 5} and {2, 3, 4, 5}. Doing this to all members of C yields {{1, 2, 4, 5}, {2, 3, 4, 5}, {4, 5, 6}, {6, 7, 8}, {6, 8, 9, 10}}. The composition operation on most examples of substitution decomposition, including modular decomposition, use a composition operation based on a Cartesian product. By contrast, the one we give above just replaces a set in the quotient with the union of domains of the corresponding factors, as well as including each member of a factor in the composition. 5 Constructing the generalized PQ tree We now describe a linear-time algorithm for finding the generalized PQ tree in linear time. By Theorem 3.6, this gives the PQ tree for a consecutive-ones matrix in linear time, but it also computes the generalized PQ tree when it is not. The algorithm consists of two parts: finding the nodes of the PQ tree and finding the ordering of children at the linear (Q) nodes. 5.1 Finding the no des of the generalized PQ tree If F is a set family, let its overlap graph Go (F ) be the graph that has one vertex for each member of F and an edge between two vertices iff the corresponding members of F overlap. Given a connected componeCt C of Go (F ), let BC n C be an equivalence relation on , where if x, y , then xB y iff the family of members of C that contain x is the same as the family of members of C that contains y . Let C 's blocks be the equivalence classes of BC . Theorem 5.1. If F is a set family on domain V such that F , then X V is a node of the decomposition M: a b c d ef gh ijk 11 1 2 31 1 4 1 5 6 7 8 1111 111111 1 1111 1111 11 11 V W X a b c Y: de f Y gZ h i j k V: Wf Y 11 1 0 20 1 1 W: Xd e Z: X: a bc 31 10 40 11 g Zjk 5 0 1 11 6 1 1 10 7 0 0 11 hi 81 1 Figure 3: The PQ tree gives a substitution decomposition of an arbitrary set family into quotient families. Here, the family has the consecutive-ones property and is displayed in matrix form. 772 tree of N (F ) iff it is one of the fol lowing: 1. V or a one-element subset of V ; C 2. for some connected component C of F 's overlap graph; 3. A block of a connected component of F 's overlap graph. The following theorem, due to Dahlhaus, is critical for our time bound: His algorithm, when run on a set family H, begins creating a list L of the members of H, sorted in ascending order of cardinality. Then, for each member X of H, if X overlaps with some member of H that lies to the right of X in L, he computes the rightmost member Y of L that X overlaps with. Let us denote Y by M ax(X ). M ax(X ) is undefined if Y does not exist. If C H and M ax(C ) is defined, then C and M ax(C ) are adjacent in the overlap graph. A key observation is the following: Lemma 5.1. If C, D H, C D = , and D lies between C and M ax(C ) in L, then D overlaps either Theorem 5.2. [5] It takes O(n + m) time to find F 's C or M ax(C ). overlap components. By this lemma, D can be safely placed in the same Dahlhaus's very clever algorithm uses only radix connected component as C and M ax(C ) without finding sorting, off-line least-common ancestors [4], and a graph which of C and M ax(C ) overlaps D. traversal algorithm such as depth-first search. His algorithm creates a surrogate graph GC (H) We may then find the blocks of the connected that is based on this lemma and whose connected components as follows. Let k be the number of overlap components are the same as the connected components components. Number them from 1 to k . Number the of the overlap graph of H. For x V , let Lx = x x x members of each overlap component C from 1 to |C |. C1 , C2 ...Ck be the subsequence of L defined by those x x Number the elements of V from 1 to |V |. For each sets that contain x. For all x V , Ci and Ci+1 such x x member X of F , and each element y X , create a that there exists j i such that either M ax(Cj ) = Ci+1 x x xx triple (y , i, j ) where i is the component number of X or M ax(Cj ) lies to the right of Ci+1 in L, then Ci Ci+1 x and j is X 's number label. Radix sorting all such triples is an edge of GC (H). In this case, let Cj be a witness with y as the primary key, i as the secondary key, and for the edge. j as the tertiary key gives, for each element y of V and Dahlhaus shows that GC (H) can be constructed in each overlap component, a sorted list (j1 , j2 , ..., jk ) of linear time. Let us modify the algorithm so that it labels members of the component that y belongs to. Making y each edge of GC (H) with a witness. The algorithm the "owner" of this list and prepending the component knows the witness when the edge is installed, so this number to each of these lists and then putting them does not affect the time bound. in lexical order makes the blocks consecutive, and the To find a spanning tree of a component C , we trablocks are the owners of each group of identical lists in verse the component of GC (F ) using a graph traversal the ordering. algorithm such as depth-first search. While doing this, By Theorem 5.1, this gives the set N of nodes of we grow a spanning tree of the overlap component. We the PQ tree. To construct the inclusion tree, radix sort add some nodes before we actually visit them in GC (F ), the members of N by size. Then, create a list, for each and whenever we visit a node, we add it to the spanning x V , of the members of N that contain x, in ascending tree if it has not already been added. order of size. This can be accomplished by visiting each An observation that we will use is that if U is Y N in descending order of size, and for each x Y , a set and W is a set such that |W | |U |, it takes inserting a pointer to Y to the front of x's list. Then, O(|U |) time to find whether W and U overlap using an visit each member x of V , putting a parent pointer from initialized array of size |V | that is left initialized after each member of x's list to its successor in x's list if there the operation. isn't already one, as these are the chain of ancestors of Suppose by induction that we have visited k nodes {x}. and successfully added them to a partial spanning tree of the overlap component. It suffices to describe how 5.2 Finding a spanning tree of each overlap to add a node D to the partial spanning tree when comp onent Dahlhaus's algorithm [5] finds the over- we travel to it along an edge C D from a node C that lap components without actually computing the overlap is already part of the spanning tree. Let A be the graph or even spanning trees of its components. How- witness on C D and let B = M ax(A). Note that ever, it is not hard to modify the algorithm to get a |A| C, D B = |M ax(A)|. spanning tree of each component, as we show next. If neither A nor B is in the spanning tree yet, find 773 let Q be the result of removing V and its one-element subsets. Clearly, Q has a single overlap component. The algorithm performs one pivot on each X Q. Once a pivot has been performed on X , it has been processed and is not used again. The algorithm keeps an linearly ordered family P of disjoint subsets of V . That is, the members of P are kept in a linearly ordered list. Let P S=V- be those members of V that are not in any processed set. By Theorem 5.1, Q has a single overlap component. By Theorem 5.3, we may compute a spanning tree T of this overlap component in linear time. The purpose of each pivot is to refine the partition P and to remove members of S to create a new class of P . It does this while maintaining the invariant that there exists a consecutive-ones ordering such that whenever a is a member of an earlier partition class of P than b, a is earlier than b in the consecutive-ones ordering. That is, Theorem 5.3. It takes O(n + m) time to find a spanthe ordering on P is consistent with a consecutive-ones ning tree of each overlap component of an arbitrary set ordering. family F . The following are the invariants that the algorithm maintains: 5.3 Finding the orderings of children of Q no des We now have the decomposition tree of N (F ). 1. The processed members of Q induce a subtree of By Theorem 3.3, we get the decomposition tree of the the spanning tree T of the overlap component. strong closure S (F ) of F by relabeling every prime 2. The members of P are the blocks of the processed node degenerate and every degenerate node prime. By members of F Theorem 3.4, we may obtain the generalized PQ tree by finding which degenerate nodes should be relabeled 3. The ordering of members of P is consistent with linear in the decomposition tree of W (F ), and the a consecutive-ones ordering on the processed memlinear order on children of these nodes. This gives the bers of Q. generalized PQ tree. Let a set family be simple if its generalized PQ tree 4. The ordering of members of P and its reverse are has only one internal node, the root. A simple family the only possible orderings of P that are consistent is degenerate, linear, or prime, depending on whether with a consecutive-ones ordering on the processed this root is degenerate, linear, or prime, respectively. members of Q. A simple family is prime iff very member is either its The initial pivot occurs on arbitrary X Q, and domain or a singleton subset, so these families can be creates an initial partition P = (X ), with S = V - X . recognized in linear time. Each degenerate node X of the decomposition tree This establishes the invariants initially. Suppose that after i - 1 pivots, the invariants of S (F ) is either a degenerate or a linear node in the decomposition tree of W (F ). We must distinguish these hold. Let P = (X1 , X2 , ..., Xk ). We select Z to be two cases. By the autonomy and quotient rules, X 's an unprocessed neighbor of a processed node in the quotient Q in the tree is simple and either degenerate or spanning tree T , and let H be the already processed linear. The various quotients are restrictions of disjoint members of Q. Then, by invariant 4, H {Z } can subfamilies of F , so it suffices to give a linear-time only have the consecutive-ones property if one of the algorithm that, given a simple family, determines its following cases occurs: type. We solve the problem by giving an algorithm 1. SZ is empty and Xi+1 ...Xj -1 are contained in Z , to distinguish a simple degenerate from a simple linear family. 2. SZ is nonempty, i = 1, and X1 , ..., Xj -1 are We use a partitioning procedure that is related to contained in Z , one given in [11, 12] for finding a transitive orientation of a graph. 3. SZ is nonempty, j = k , and Xi+1 , ..., Xj are Let Q be a simple degenerate or linear family, and contained in Z . whether C overlaps B ; if it does, establish C B as a spanning-tree edge, and if it does not, establish C A as a spanning-tree edge. This takes O(|B |) time. Then establish AB as a spanning-tree edge in O(1) time. This brings A and B into the spanning tree. If one of A and B is already in the spanning tree, just establish AB as a spanning-tree edge in O(1) time. Now find whether D overlaps A in O(|D|) time. If it does, establish DA as a spanning-tree edge, and if it does not, establish DB as a spanning-tree edge. The total time is O(1) plus time proportional to the sum of cardinalities of the sets that it brings into the spanning tree. Therefore, the total time to find the spanning tree of the component is O(k ), where k is the sum of cardinalities of the members of the component, in addition to a one-time O(n) initialization that does not need to be repeated for each component. This gives the following: 774 If these conditions are not met, we declare Q degenerate. Otherwise, the pivot on Z in case 2 consists of prepending SZ as a new partition class to the beginning of (X1 , X2 , ..., Xk ) to obtain (SZ , X1 , X2 , ..., Xk ), and then replacing Xj in the ordering with Xj Z and Xj - Z , in that order, to obtain (SZ , X1 , X2 , ..., Xj -1 , Xj Z, Xj - Z, ..., Xk ). Case 3 is handled in a left-right symmetric way. Case 1 is handled by splitting Xi as in Case 3 and Xj as in Case 2. Any empty members of P are then discarded. If Q is a simple linear family, then after all members have been processed, S = and every member of P contains a single element of V . The linear ordering on members of P gives the linear ordering of the domain of Q by Invariant Q This gives the linear ordering of 3. children of X = in the generalized PQ tree. For finding the generalized PQ tree of a set family F , we spend linear time on each quotient, and since the quotients are restrictions of disjoint sets of members of F , it takes linear time to find the whole tree. 6 A forbidden substructure characterization Gallai [6] describes a relation on the edges of an undirected graph that models the constraints that determine whether the graph has a transitive orientation. He described a close relationship that it has to the modular decomposition of the graph. We now describe an analogous relation for the consecutive-ones problem on pairs of elements of the domain of F , and show that it has a similar relationship to the generalized PQ tree. Definition 6.1. Let F be an arbitrary set family on domain V . Let AF = {(a, b)|a, b V and a = b}. The incompatibility graph IC (F ) of F is the undirected graph that is defined as fol lows: (See Figure 2.) · AF is the vertex set of IC (F ). · The edge set of IC (G) are pairs of the fol lowing forms: The introduction explains why the incompatibility graph must be bipartite when F has the consecutiveones property. In the rest of this section, we show the converse of this, which is that it fails to be bipartite when F does not have the consecutive-ones property. Lemma 6.1. If Y N (F ), (a, b) Y and c Y , then (a, b) is compatible with (b, c). Proof. Suppose (a, b) and (b, c) are incompatible. Then there exists X F such that a, c X and b X . But then X overlaps Y , contradicting Y 's membership in N (F ). This means that any connected component of the incompatibility graph consists of pairs that all have the same least-common ancestor in the decomposition tree of N (F ). Moreover, if this ancestor is degenerate, then they all go between some pair of children of this ancestor. To prove Theorem 6.1, we modify our algorithm for producing the generalized PQ tree so that it returns an odd cycle of length at most n + 2 when it discovers that F does not have the consecutive-ones property. We represent the odd cycle in the form of a cyclic sequence of elements of the form ((a, b), (b, c), X ), where X is a member of F that contains a and c but does not contain b. To check the answer, the user may verify that the cycle has odd length and that the witnesses are valid, that is, that a witness X contains a and c but does not contain b. This takes O(n) tests of membership of an element of V in a member of F , which can be carried out in O(n log n) time if each set is implemented with a sorted array, in O(n) time if the sets are represented with a 0-1 matrix, or in O(n) expected time if they are implemented with a hash table. This is sublinear, so it constitutes what is called a strong certificate [9]. Let us make use of a concept that is dual to the incompatibility graph, namely, a forcing relation. Definition 6.2. If ((a, b), (b, c)) is an edge of the incompatibility graph, then (a, b) forces (c, b) and (b, a) ­ {(a, b), (b, a)} forces (b, c). Let F 's forcing graph be the undirected ­ {(a, b), (b, c)}| a, b, c V and there exists graph that has AF as its vertex set and the forcing reX F such that a, c X and b X . lation as its edge set. The following is the basis for our certificate when Conceptually, if one ordered pair forces another, F does not have the consecutive-ones property. then either they both appear in the tournament of a consecutive-ones ordering or neither does. Theorem 6.1. Let F be an arbitrary set family on domain V . Then F has the consecutive-ones property Lemma 6.2. Let X and Y be overlapping members of if and only if its incompatibility graph is bipartite, and F . If (a, b) and (c, d) are two pairs that each go if it does not have the consecutive-ones property, the from from an earlier to a later member of the sequence incompatibility graph has an odd cycle of length at most (X - Y , X Y , Y - X ), then there is a path of length at n + 2. most three from (a, b) to (c, d) in F 's forcing graph. 775 Proof. Suppose a X - Y , b X Y . If d Y - X , then, because of Y , (a, b) forces (a, d) and because of X , (a, d) forces (c, d). If d X Y , then c X - Y . Let e Y - X . Then (a, b) forces (a, e) because of Y , (a, e) forces (c, e) because of X , and (c, e) forces (c, d) because of Y . The only case that doesn't map to the foregoing by renaming of elements is a, c X - Y and b, d Y - X . Because of X , (a, b) forces (c, b) and (c, b) forces (c, d). Corollary 6.1. During execution of the partitioning algorithm of Section 5.3, let F be the set of processed sets, and let X1 , X2 , ...FXk be the sequence of blocks , . that are contained in Let (a, b), (c, d) be two pairs that each go from an earlier to a later member of this sequence. Then there is a path of length at most k - 1 from (a, b) to (c, d) in the forcing graph. Proof. By induction on the number of pivots. The base case is given by Lemma 6.2. The new ordered pairs that must be considered are those that go between two halves of a set that is split by a pivot on Z . Let Xi , Xj , and P be as in the description of the algorithm. If Z splits set Xi into Xi - Z and Xi Z , then there is a path of length 2 in the forcing graph connecting any pair of members of (Xi - Z ) (Xi Z ) by Lemma 6.2. For u Xi - Z and w Xi Z , y Z Xi+1 , (u, w) is forced by (u, y ), which has paths of of length k to the other pairs that go from earlier to later classes. Splitting Xi yields k + 1 classes and (u, w) has a path of length k + 1 to all other pairs that go from earlier to later classes. The same argument applies if Z splits S and S Z is placed at the beginning of the ordering of P . A symmetric argument applies if Xj is split or if S is split and S Z is placed at the end of the ordering. If two sets are split by the pivot, applying this argument in sequence to each of the splits yields k + 2 sets and a path of length k + 2. meet the required conditions, which means that there exist Xp , Xq , Xr with p < q < r such that Xp contains a Z , Xq contains b Z and Xr contains c Z . Therefore, ((a, b), (b, c)) is an edge of the incompatibility graph. There is a path ((a, b), (x2 , y2 ), (x3 , y3 ), (xk-1 , yk-1 ), (b, c)) of length at most k in the forcing graph from (a, b) to (b, c). This corresponds to a path ((a, b), (y2 , x2 ), (x3 , y3 ), (y4 , x4 ), ...) in the incompatibility graph, where every other element of the path is reversed. If k is even, the last element of the path is (b, c). If k is odd, the last element is (c, b) and it can be turned into an even-length path of length k + 1 by appending (b, c), since (c, b) and (b, c) are adjacent in the incompatibility graph. In either case, we get an even-length path of length at most k + 1 in the incompatibility graph between (a, b) and (b, c). Adding the incompatibility ((a, b)(b, c)) to this gives an odd cycle of length at most k + 2. Since Q is a quotient on F , which has domain of size n, this corresponds to a path of length at most k + 2 in F 's forcing relation. Since k n, Theorem 6.1 follows. The constructive proofs of Corollary 6.1 and Theorem 6.1 are easily turned into linear-time algorithms. As in the proof of the theorem, an incompatible pair (a, b), (b, c) is found during partitioning. The reader may easily verify that the partitioning algorithm can be modified so that, given two pairs (a, b) and (b, c) that each go from earlier to later partition classes, it finds a path in the forcing graph between them of the claimed length, using the construction of the proof of Corollary 6.1. Running the partitioning algorithm a second time to find a forcing path from (a, b) to (b, c) gives a forcing path, from which an odd cycle in the incompatibility graph is obtained as in the proof of the theorem. Using the foregoing, it is not hard to show that References the connected component of the incompatibility graph is either the pairs of elements of V that go between two [1] S. Booth and G. S. Lueker, Testing for the consecutive children of a prime node of the generalized PQ tree, or ones property, interval graphs, and graph planarity the set of pairs that have a single linear or degenerate using PQ-tree algorithms, J. Comput. Syst. Sci., 13 node as their least-common ancestor. This is analogous (1976), pp. 335­379. to the relationship of and the modular decomposition [2] A. Brandstaedt and V. B. Le and J. P. Spinrad, Graph described by Gallai in [6]. Classes: A Survey, SIAM Monographs on Discrete We now give the proof of Theorem 6.1. Mathematics, Philadelphia, 1999. Proof. By Lemma 6.1, we may focus on elements of AF that have the same degenerate least-common ancestor in the decomposition tree of W (F ). Suppose that the algorithm declares that Q does not have the consecutive-ones property. Then Z fails to [3] M. Chein and M. Habib and M. C. Maurer, Partitive hypergraphs, Discrete Mathematics, 37 (1981) pp. 35­ 50. [4] T. H. Cormen and C. E. Leiserson and R. L. Rivest and C. Stein, Introduction to Algorithms, McGraw Hill, Boston, 2001. 776 [5] E. Dahlhaus, Paral lel algorithms for hierarchical clustering, and applications to split decomposition and parity graph recognition, Journal of Algorithms, 36 (2000) pp. 205­240. [6] T. Gallai, Transitiv orientierbare Graphen, Acta Math. Acad. Sci. Hungar., 18, (1967), pp. 25­66. [7] M. C. Golumbic, Algorithmic Graph Theory and Perfect Graphs, Academic Press, New York, 1980. [8] W. L. Hsu and R. M. McConnell, PC trees and circularones arrangements, Theoretical Computer Science, 296 (2003), pp. 59­74. [9] D. Kratsch and R. M. McConnell and K. Mehlhorn and J. P. Spinrad, Certifying algorithms for recognizing interval graphs and permutation graphs, ACM-SIAM SODA, 14 (2003), pp. 866­875. [10] A. Lempel and S. Even and I. Cederbaum, An algorithm for planarity testing of graphs, in Theory of Graphs, P. Rosenstiehl (ed.), Gordon and Breach, New York, 1967. [11] R. M. McConnell and J. P. Spinrad, Linear-time modular decomposition and efficient transitive orientation of comparability graphs, ACM-SIAM SODA, 5 (1994), pp. 536­545. [12] R. M. McConnell and J. P. Spinrad, Modular decomposition and transitive orientation, Discrete Mathematics, 201 (1999), pp. 189­241. [13] T. A. McKee and F. R. McMorris, Topics in Intersection Graph Theory, SIAM, Philadelphia, 1999. [14] K. Mehlhorn and S. N¨eher, The LEDA Platform for a Combinatorial and Geometric Computing, Cambridge University Press, Cambridge", 1999. [15] R. H. M¨hring, R. H., Algorithmic aspects of the substio tution decomposition in optimization over relations, set systems and Boolean functions, Annals of Operations Research, 4 (1985), pp. 195­225. [16] A. Tucker, A structure theorem for the consecutive 1's property, J. Comb. Th. (B), 12 (1972), pp. 153­162. [17] H. Wasserman and M. Blum, Software reliability via run-time result-checking, J. ACM, 44 (1997), pp. 826­ 849. 777