Fast Estimation of First-Order Clause Coverage through Randomization and Maximum Likelihood

Ondřej Kuželka kuzelo1@fel.cvut.cz
Filip Železný zelezny@fel.cvut.cz
Czech Technical University in Prague, Technická 2, 166 27 Prague 6, Czech Republic

Abstract

In inductive logic programming, θ-subsumption is a widely used coverage test. Unfortunately, testing θ-subsumption is NP-complete, which represents a crucial efficiency bottleneck for many relational learners. In this paper, we present a probabilistic estimator of clause coverage based on a randomized restarted search strategy. Under a distribution assumption, our algorithm can estimate clause coverage without having to decide subsumption for all examples. We implement this algorithm in the program ReCovEr. On generated graph data and real-world datasets, we show that ReCovEr provides reasonably accurate estimates while achieving dramatic runtime improvements compared to a state-of-the-art algorithm.

Django is currently considered the fastest subsumption checker, outperforming traditional techniques (based on the Prolog unification mechanism) by orders of magnitude. Therefore we employ Django in comparative experiments later in this paper. Another stream of research dealt with incomplete heuristic algorithms for θ-subsumption. Sebag and Rouveirol (1997) presented a tractable approximation of θ-subsumption called stochastic matching. Arias et al. (2007) implemented a randomized table-based method. Unlike the mentioned incomplete heuristic algorithms, our approach uses a complete, albeit randomized, subsumption procedure that correctly decides both subsumption and non-subsumption if given sufficient finite time. Our ultimate estimation of the clause coverage (i.e. the number of subsumed examples) is, however, an approximation, rapidly achieved by restarting the subsumption procedure each time with a bounded runtime.
Subsequent restarts generate an integer sequence, from which the coverage is estimated by maximum likelihood. Randomized restarted strategies, exploited in our work, have been extensively studied in the past decade (Gomes et al., 2000). They have been demonstrated to be extremely useful for solving many hard combinatorial problems, such as satisfiability of Boolean formulas or constraint satisfaction problems. Reported reductions in runtime are often of several orders of magnitude. Randomized restarted strategies have also been used in inductive logic programming (Železný et al., 2006), however not for subsumption checking; rather, restarts were applied to the clause-search procedure.

This paper is organized as follows. In Section 2 we formalize subsumption and expose the basic algorithms employed as building blocks in our estimation approach. In Section 3 we conduct a preliminary motivating study of runtime distributions.

1. Introduction

In most inductive logic programming (ILP) algorithms, learned hypotheses are (sets of) first-order clauses. Usually, θ-subsumption is used to test whether a clause entails an example. Since ILP systems need to evaluate large numbers of clauses during hypothesis search, efficiency of the subsumption procedure is one of the crucial factors for the performance of learning. Unfortunately, deciding θ-subsumption is an NP-complete problem. One line of research has focused on developing algorithms for this problem using sophisticated heuristics from the field of constraint satisfaction problems (CSP). Maloberti and Sebag (2004) exploited the correspondence of θ-subsumption with CSP to develop the algorithm Django.

Appearing in Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 2008. Copyright 2008 by the author(s)/owner(s).
Algorithm 1 SubsumptionCheck(C, e): A simple subsumption check algorithm
Input: clause C, example e
if C ⊆ e then
    return YES
else
    Choose variable V from C using a heuristic function
    for S ∈ PossibleSubs(V, C, e) do
        C′ ← C with S substituted for V
        if there is no W ∈ Adjacency(V) with PossibleSubs(W, C′, e) = ∅ then
            SearchedNodes ← SearchedNodes + 1
            if SubsumptionCheck(C′, e) = YES then
                return YES
            end if
        end if
    end for
    return NO
end if

The function PossibleSubs(V, C, e) returns all terms S (in a random order) which can be substituted for V such that all literals l ∈ C remain consistent with e. The function prunes away a subset of the possible groundings for V whose inclusion in θ would imply Cθ ⊄ e. In general, though, not all such groundings are detected by the function.

3. Subsumption Test Runtimes

We first aimed at obtaining a domain-independent runtime distribution of the subsumption algorithm and thus conducted preliminary experiments with randomly generated hypotheses and examples from the domain of oriented colored graphs. In the clausal representation, each graph acquires the form of a definite clause h ← l1, l2, . . . where h is a fixed head and the li are first-order atoms, each being one of edge(t1, t2), black(t3), red(t4). In hypotheses, the ti are variables; in examples they are constants. For generality, we devised two different graph generators. The first generates Erdős–Rényi random graphs, where any two vertices are connected with a pre-set probability c (by an edge of a random orientation). The second produces scale-free ("small world") graphs. Here, the graph grows until some desired size is reached; at each step a vertex is added and connected to k vertices already present in the graph. An edge is attached to a vertex with probability increasing with the number of edges already connected to that vertex. In both generators, all vertices are colored black with probability 0.5 and red otherwise.
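For concreteness, the two generators just described can be sketched in Python as follows. This is our own illustrative code, not the authors' implementation; the literal encoding (`('edge', ('v0', 'v1'))` etc.) and the seed-vertex convention in the scale-free sampler are our assumptions.

```python
import random

def erdos_renyi_clause(n, c, rng=random):
    """Erdos-Renyi-style colored digraph as a set of ground literals:
    each vertex pair is joined with probability c (random orientation);
    each vertex is colored black or red with probability 0.5."""
    lits = {('black' if rng.random() < 0.5 else 'red', (f'v{v}',))
            for v in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < c:
                a, b = (i, j) if rng.random() < 0.5 else (j, i)
                lits.add(('edge', (f'v{a}', f'v{b}')))
    return lits

def scale_free_clause(n, k, rng=random):
    """Scale-free ("small world") graph: each newly added vertex connects to k
    existing vertices, chosen with probability proportional to their degree."""
    lits = set()
    degree = {0: 1}  # seed vertex with a virtual degree so sampling is possible
    for v in range(1, n):
        # degree-proportional sampling: list each vertex once per unit of degree
        pool = [u for u, d in degree.items() for _ in range(d)]
        targets = set()
        while len(targets) < min(k, len(degree)):
            targets.add(rng.choice(pool))
        for u in targets:
            a, b = (v, u) if rng.random() < 0.5 else (u, v)  # random orientation
            lits.add(('edge', (f'v{a}', f'v{b}')))
            degree[u] += 1
        degree[v] = len(targets)
    for v in range(n):
        lits.add(('black' if rng.random() < 0.5 else 'red', (f'v{v}',)))
    return lits
```

The parameters n and c (respectively n and k) correspond to the vertex count and connectivity settings varied in the experiments below.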
We will refer to the parameter c (respectively k) of a random uniform (respectively scale-free) graph as the connectivity of the graph. We subjected Algorithm 1 to experiments with random sets of hypotheses and examples, under various settings of n and c (respectively n and k), where n denotes the number of vertices in the underlying graph and c and k are the parameters of the random graphs explained above. Here, we review our principal findings about the respective runtime distributions, since they motivate the design of the estimation algorithm in the next section. Our first objective was to verify the presence of heavy tails in the runtime distributions F(t). For t > 0, the number F(t) is the probability that the tested algorithm resolves a random subsumption instance in no more than t units of time, corresponding to the number of explored search nodes. Informally, a heavy-

The estimation algorithm is then developed in Section 4. In Section 5, we compare our algorithm with Django on synthetic and on real-life data. Section 6 concludes the paper.

2. Preliminaries

2.1. Language

In the rest of the paper we assume, for simplicity, that hypotheses C are clauses without function and constant symbols and that examples e are ground clauses. When needed, clauses will be treated as sets of atoms; e.g., for two clauses C and D, C ⊇ D will denote that C contains all literals contained in D. θ-subsumption is defined as follows.

Definition. We say that clause C θ-subsumes clause D (denoted C ⪯θ D) iff there exists a substitution θ such that Cθ ⊆ D.

2.2. Subsumption Algorithm

We consider a simple heuristic algorithm (Algorithm 1) for verifying whether a clause C subsumes an example e. Similarly to Django (Maloberti & Sebag, 2004), this algorithm is inspired by the CSP framework. It is a backtracking search algorithm with forward checking, a variable selection heuristic and randomization.
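The backtracking-with-forward-checking scheme of Algorithm 1 can be sketched as follows. This is a minimal illustration in the spirit of Algorithm 1, not the authors' code: the variable-ordering heuristic and the searched-node counter are omitted, and the literal representation (variables as strings starting with `?`) is our own convention.

```python
def subsumes(clause, example, subst=None):
    """Backtracking theta-subsumption check: literals are (predicate, args)
    pairs, variables are strings starting with '?', examples are ground."""
    subst = subst or {}
    unbound = [a for _, args in clause for a in args
               if a.startswith('?') and a not in subst]
    if not unbound:  # fully ground: C(theta) must be a subset of e
        return all((p, tuple(subst.get(a, a) for a in args)) in example
                   for p, args in clause)
    v = unbound[0]
    for term in possible_subs(v, clause, example, subst):
        subst[v] = term
        if subsumes(clause, example, subst):
            return True
        del subst[v]  # backtrack
    return False

def possible_subs(v, clause, example, subst):
    """Candidate groundings for v: constants of e that keep every literal
    containing v matchable against some literal of e (forward checking)."""
    constants = {t for _, args in example for t in args}
    good = []
    for c in constants:
        trial = dict(subst)
        trial[v] = c
        ok = all(
            any(ep == p and len(eargs) == len(args) and
                all(g.startswith('?') or g == ea
                    for g, ea in zip((trial.get(a, a) for a in args), eargs))
                for ep, eargs in example)
            for p, args in clause if v in args)
        if ok:
            good.append(c)
    return good
```

As in the paper's version, the filter only prunes a subset of the hopeless groundings; full consistency is established by the ground check at the bottom of the recursion.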
The heuristic function aims at choosing variables whose substitution makes it likely that an inconsistency, if one exists, is detected soon. For a variable V, the function computes the sum of occurrences of variables in clause C that have already been grounded and that share at least one literal with V. This sum is then multiplied by 1 + 1/D, where D is an upper bound on the size of the domain of V, computed in the initialisation phase of the algorithm's run. The variable which maximizes this function is selected; in case of a tie, a random choice is made with uniform probability among the highest scoring variables.

Figure 1. Top: The runtime distributions for satisfiable instances with hypotheses built using the Erdős–Rényi random graph generator with n = 15 vertices and connectivity consecutively c ∈ {0.1, 0.15, 0.2, 0.25}. The graphs corresponding to examples had n = 50 vertices and connectivity p = 0.3. Bottom: The subsumption test runtime distributions for unsatisfiable instances with hypotheses with connectivity consecutively c ∈ {0.15, 0.2, 0.25, 0.3, 0.35, 0.4}. (Axes: nodes searched vs. 1 − F(x), log-log scale.)

Figure 2. Effect of the restarted strategy for satisfiable (top) and unsatisfiable instances (bottom). The random graphs corresponding to hypotheses had n = 15 vertices and connectivity c = 0.15. In both cases, the random graphs corresponding to examples had n = 50 vertices and connectivity c = 0.3. Both hypotheses and examples were randomly generated by the Erdős–Rényi generator.
tailed distribution indicates a non-negligible probability of subsumption instances on which the checking algorithm gets stuck for an extremely long runtime. For example, a heavy tail is exhibited if 1 − F(t) decays at a power-law rate, i.e. slower than standard distributions, which decay exponentially. The presence of a heavy tail in an empirically obtained runtime distribution F(t) is usually checked graphically, by plotting 1 − F(t) against t on a log-log scale. In the case of a power-law distribution, this plot acquires a linear shape (Gomes et al., 2000). A series of experiments in the phase transition framework (Giordana & Saitta, 2000), which we have performed, revealed a systematic progression from heavy-tailed regimes corresponding to configurations located in the YES region of the phase transition spectrum to non-heavy-tailed regimes corresponding to configurations located in the NO region. This observation agrees with a previous study (Gomes et al., 2005). The progression is shown in Fig. 1 for the Erdős–Rényi graph data. The same trends were observed for the small-world graph data. The plotted runtime distributions refer to subsumption checks between hypotheses with a fixed number of vertices and connectivity varying among the particular distributions, and examples with a fixed number of vertices and fixed connectivity. The runtime distributions plotted in the top panel of Fig. 1 refer to satisfiable problem instances, i.e. those where the hypotheses θ-subsume the examples. The distributions in the bottom panel of Fig. 1 refer to unsatisfiable problem instances. Due to the observed presence of heavy tails in a range of parameters, we next assessed the impact of restarts. To this end, we designed a complete restarted randomized subsumption algorithm, which repeatedly executes Algorithm 1. At each execution n = 1, 2, . . ., the number of search nodes in Algorithm 1 is bounded by some pre-defined number R(n).
This loop is terminated once the answer YES or NO is obtained from Algorithm 1. Completeness of this restarted approach is guaranteed by the assumption that R(n) → ∞ as n → ∞. Recall that randomization is facilitated by tie-breaking in the heuristic function used in Algorithm 1 and by randomization of the value ordering. The basic trends we observed for all tested parameter values are represented by Fig. 2: (i) restarts significantly reduce the runtime expectation for both satisfiable and unsatisfiable instances; (ii) unsatisfiable instances take much longer to prove in the restarted approach. Observation (i) alone motivates using the restarted variant of Algorithm 1 as a fast complete method for subsumption testing. We explore this idea elsewhere (Kuželka & Železný, 2009), whereas this paper addresses observation (ii). This observation is easily explained: while satisfiability can in principle be shown in any single restart, unsatisfiability can only be shown after n restarts making R(n) sufficiently high. We would like to avoid the runtime components corresponding to the R(n) series growing to excessive values.

Algorithm 2 ReCovEr(C, E, R, M, ε): Algorithm for coverage estimation
Input: clause C, set of examples E, integers R (`cutoff'), M, ε
tries ← 0
Unknown ← E
CoveredInIthTry ← []
repeat
    tries ← tries + 1
    CoveredInThisTry ← 0
    for e ∈ Unknown do
        Answer ← run SubsumptionCheck(C, e) with the number of searched nodes limited to R
        if Answer = YES then
            CoveredInThisTry ← CoveredInThisTry + 1
            Unknown ← Unknown \ {e}
        end if
    end for
    CoveredInIthTry[tries] ← CoveredInThisTry
until TerminationCondition
return LikelihoodEstimate(tries)

4. ReCovEr: A Restarted Coverage Estimator

We first explain the intuition underlying ReCovEr. We are given a clause C and an example set E, and we would like to estimate the coverage cov(C, E) = |{e ∈ E | C ⪯θ e}|.
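The complete restarted scheme can be sketched generically. Here `check` stands for any node-bounded decision procedure (such as a bounded run of Algorithm 1) that returns 'YES', 'NO', or None when the cutoff is hit; the doubling schedule R(n) = base · 2^n is our own choice of a schedule satisfying R(n) → ∞, and the names are ours.

```python
from itertools import count

def restarted_subsumption(check, base_cutoff=100):
    """Complete restarted strategy: re-run a node-bounded check with growing
    cutoffs R(n) = base_cutoff * 2**n. Since R(n) -> infinity, a complete
    bounded checker eventually answers, so the loop terminates."""
    for n in count():
        answer = check(base_cutoff * 2 ** n)
        if answer is not None:  # decided within the cutoff
            return answer
```

Randomization between restarts comes from the checker itself (tie-breaking and value ordering), so each restart explores a different portion of the search tree.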
Let us run Algorithm 1 on C and e, successively for all e ∈ E. For each e, however, we stop the algorithm if no decision has been made within R steps. Let E′ ⊆ E be the subset of examples proven to be subsumed by C in this experiment. Denote s1 = |E′|. We now remove all examples in E′ from E and repeat the experiment, obtaining an analogical number s2. Further such iterations generate numbers s3, s4, etc. Clearly, for the desired value cov(C, E), we have cov(C, E) = lim_{j→∞} S_j, where S_j = \sum_{i=1}^{j} s_i. Under a certain assumption, the series S_j is geometric rather than arbitrary. The main idea of ReCovEr is that the limit of S_j for j → ∞ can thus be estimated by extrapolating the series from its first few elements S_1, S_2, . . .. We thus achieve a coverage estimate without excessive effort spent refuting subsumption for the examples not subsumed by C. In order to precisely derive an estimation algorithm following the above idea, we first need to make the following assumption.

Assumption 4.1 Given a clause C and a set of examples E, the probability p that Algorithm 1 finds a solution (i.e. returns YES as its answer) before it explores more than R nodes of the search tree is the same for all e ∈ E such that C subsumes e.

In other words, we assume that properties of particular examples, such as their size, are not dramatically different. The assumption will be empirically validated in the next section. We assume a given clause C and we fix a constant cutoff value R. In the first step, for each e ∈ E we run SubsumptionCheck(C, e) (Algorithm 1), stopping it as soon as the number of searched nodes reaches R. Then, after |E| restarts (each time with a different e ∈ E), we can derive the probability that the algorithm has produced exactly m_1 `YES' responses in this first step. In particular, this probability P(m_1) is

P(m_1) = \binom{A}{m_1} p^{m_1} (1 - p)^{A - m_1}    (1)

where A = |{e ∈ E | C ⪯θ e}|.
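Under Assumption 4.1 the expectation of each pass count is E[s_i] = A·p·(1−p)^{i−1}, so E[S_j] = A·(1 − (1−p)^j), which converges geometrically to A; this is what makes extrapolation from the first few elements sound. The process can be simulated with a short sketch (illustrative code of ours, not part of the paper):

```python
import random

def simulate_pass_counts(A, p, passes, rng=None):
    """Simulate s_1, s_2, ... under Assumption 4.1: in each pass, every
    still-unproven covered example is proven within the cutoff w.p. p."""
    rng = rng or random.Random(0)
    remaining, counts = A, []
    for _ in range(passes):
        s = sum(1 for _ in range(remaining) if rng.random() < p)
        counts.append(s)
        remaining -= s  # proven examples are removed before the next pass
    return counts
```

Summing the returned counts gives S_j, which approaches the true coverage A while each pass only spends a bounded effort per example.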
In the next step, all m_1 examples shown to be subsumed in the first step are removed from E and the procedure is repeated with the remaining examples. In general, we can derive the probability that exactly m_i YES answers are generated in the i-th step. For i = 2, we obtain

P(m_2 | m_1) = \binom{A - m_1}{m_2} p^{m_2} (1 - p)^{A - m_1 - m_2}    (2)

and similarly, for an arbitrary i ≥ 1, we have

P(m_i | m_{i-1}, . . . , m_1) = \binom{A - \sum_{j=1}^{i-1} m_j}{m_i} p^{m_i} (1 - p)^{A - \sum_{j=1}^{i} m_j}    (3)

The probability of a sequence (m_1, . . . , m_k), where m_i is the number of examples for which YES was produced in the i-th step, is given by

P(m_1, . . . , m_k) = \prod_{i=1}^{k} P(m_i | m_{i-1}, . . . , m_1)    (4)

Substituting for P(m_i | m_{i-1}, . . . , m_1) from Eq. 3 and taking the logarithm of Eq. 4 results in

ln P(m_1, . . . , m_k) = \sum_{i=1}^{k} \left[ \ln \binom{A - \sum_{j=1}^{i-1} m_j}{m_i} + m_i \ln p + \left( A - \sum_{j=1}^{i} m_j \right) \ln(1 - p) \right]    (5)

The first termination condition stops generating the sequence when two subsequent estimates differ by less than some ε_e, specified as a parameter. The second termination condition stops generating the sequence when the estimate and the number of examples already shown to be covered by the clause differ by less than some ε_c, which ensures that the estimator will never overestimate the actual coverage by more than ε_c. A minimum length M of the sequence is, however, imposed in both cases, to avoid premature estimates coinciding by chance. Another degree of freedom in Algorithm 2 is the cutoff R, which may significantly affect the performance of the restarted algorithm. A heuristic method suggests itself that first tries to find a suitable cutoff. Unlike Algorithm 2, it starts with a base cutoff value and then doubles it after every single restart. If at any restart Algorithm 1 with cutoff set to R covers fewer examples than the same algorithm at the previous restart with cutoff set to R/2, then we can accept the cutoff R/2.
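Equation (5) can be evaluated directly with a few lines of Python (an illustrative sketch of ours; it requires 0 < p < 1 and sum(m) ≤ A):

```python
from math import comb, log

def log_likelihood(A, p, m):
    """ln P(m_1,...,m_k) of Eq. (5): step i draws m_i successes among the
    A - sum_{j<i} m_j still-unproven covered examples, each succeeding w.p. p."""
    ll, used = 0.0, 0
    for m_i in m:
        ll += log(comb(A - used, m_i)) + m_i * log(p) \
              + (A - used - m_i) * log(1 - p)
        used += m_i  # examples proven so far are removed from the pool
    return ll
```

By construction, exponentiating the result reproduces the chain of conditional binomials in Eq. (4).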
To find the parameters A and p for which P(m_1, . . . , m_k) is maximized, we take the partial derivative of Eq. 5 with respect to p and find its roots, yielding

p = \frac{\sum_{i=1}^{k} m_i}{\sum_{i=1}^{k} m_i + \sum_{i=1}^{k} \left( A - \sum_{j=1}^{i} m_j \right)}    (6)

Finding the global maximum of P(m_1, . . . , m_k) from Eq. 4 on the set

D = {(A, p) | A ∈ {1, 2, . . . , |E|}, p ∈ [0, 1]}    (7)

is now straightforward, since using (6) we can find the maximum on every line

L_i = {(i, p) | p ∈ [0, 1]}    (8)

The maximum on line L_i is located either at the value of p given by (6) or at one of the borders of L_i.

5. Experiments

In this section, we first investigate the sensitivity of ReCovEr to a violation of Assumption 4.1. Then we evaluate its performance and precision on graph data generated by the two random graph generators described in Section 3 and on real-world data from organic chemistry and from engineering. We compare the performance of ReCovEr with that of the state-of-the-art θ-subsumption algorithm Django.

5.1. Sensitivity Analysis

Here we address Assumption 4.1. Informally, we first want to verify (i) how far the assumption deviates from the empirical `truth', and subsequently, (ii) how much these deviations influence ReCovEr's precision. (i) According to Assumption 4.1, the probability p ∈ [0, 1] would be a constant. Dismissing this assumption, we treat p as a random variable with some distribution on [0, 1], which we would like to estimate. A standard approach to this task is to parameterize a Beta distribution on [0, 1] from empirical data. To obtain the data, we experimented with the settings from Section 3 with parameters of generated hypotheses c = 0.35, n = 10, parameters of generated examples c = 0.3, n = 50, and ReCovEr's cutoff R = 75. This resulted in Beta distributions with standard deviation extending up to about 0.25 (i.e. 25% of p's range). These distributions are plotted as dashed lines in Fig. 3. (ii) Next we investigate ReCovEr's sensitivity to the modeled deviations.
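Stepping back to the estimator of Eqs. (4)–(6): the maximum-likelihood estimate of A can be computed by scanning A over {1, . . . , |E|} and plugging the closed-form optimum of p from Eq. (6) into the log-likelihood of Eq. (5) on each line L_A. The sketch below is our own illustration; the borders of L_A are handled by a simple clamp of p away from 0 and 1.

```python
from math import comb, log

def estimate_coverage(m, n_examples):
    """ML estimate of the coverage A from the YES-count sequence m = (m_1..m_k)."""
    covered = sum(m)

    def loglik(A, p):  # Eq. (5)
        ll, used = 0.0, 0
        for m_i in m:
            ll += log(comb(A - used, m_i)) + m_i * log(p) \
                  + (A - used - m_i) * log(1 - p)
            used += m_i
        return ll

    best_A, best_ll = covered, float('-inf')
    for A in range(max(covered, 1), n_examples + 1):
        # Eq. (6): total successes over total Bernoulli trials across all passes
        trials = sum(A - sum(m[:i]) for i in range(len(m)))
        p = min(max(covered / trials, 1e-9), 1 - 1e-9)
        ll = loglik(A, p)
        if ll > best_ll:
            best_A, best_ll = A, ll
    return best_A
```

Note that A can never be estimated below sum(m), the number of examples already proven covered, which matches the second termination condition discussed above.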
It then suffices to evaluate (4) at these three points of L_i for every i (1 ≤ i ≤ |E|). The estimate of A then equals the index i of the line L_i on which the maximum is located. The described estimator is used in ReCovEr (Algorithm 2). The question of how to choose k, i.e. how long a sequence (m_1, . . . , m_k) should be generated as input to the estimator, is tackled iteratively: the sequence is extended until a termination condition is met. We have considered several termination conditions, of which two turned out to be quite useful.

Figure 3. Top: Beta distributions with mean µ = 0.5 and variance consecutively 0, 0.005, . . . , 0.06, which model the distribution of p (solid lines). Beta distributions fitted to actual probabilities are shown in dashed lines. Bottom: Dependence of the root mean square error of ReCovEr's estimates on the variance of p.

Figure 4. Precision of ReCovEr (Algorithm 2) presented as 1000 points with coordinates (estimated coverage, actual coverage). Hypotheses and examples were generated by the Erdős–Rényi random graph generator with c = 0.3, n = 15 for hypotheses and c = 0.3, n = 100 for examples. The 1000 estimates correspond to 1000 different hypotheses tested on a pre-fixed set of 100 examples. Top: Base value for cutoff is R = 100. Bottom: Base value for cutoff is R = 200.

We assume to have n = 100 examples, of which 50 were covered by a clause C. Further, the probabilities p_i that Algorithm 1 finds a solution for a covered example e_i in time less than R were sampled from Beta distributions with given mean µ = 0.5 and variance consecutively 0, 0.005, . . . , 0.06 (i.e. growing up to the 25% standard deviation).
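This sampling experiment can be sketched as follows (our illustrative code; Python's `random.betavariate` takes the two shape parameters, so the mean/variance pair is converted first):

```python
import random

def beta_shapes(mean, var):
    """Convert (mean, variance) to Beta(alpha, beta) shape parameters;
    requires 0 < var < mean * (1 - mean)."""
    nu = mean * (1 - mean) / var - 1
    return mean * nu, (1 - mean) * nu

def sample_pass_counts(n_covered, mean, var, passes, rng=None):
    """Draw a per-example success probability p_i from the Beta distribution,
    violating Assumption 4.1 whenever var > 0, and simulate the YES counts."""
    rng = rng or random.Random(0)
    a, b = beta_shapes(mean, var)
    probs = [rng.betavariate(a, b) for _ in range(n_covered)]
    counts = []
    for _ in range(passes):
        left, s = [], 0
        for p in probs:
            if rng.random() < p:
                s += 1           # proven within the cutoff; drop from the pool
            else:
                left.append(p)
        counts.append(s)
        probs = left
    return counts
```

Feeding such sequences to the estimator and comparing the estimates to the known n_covered is exactly the shape of the sensitivity experiment described here.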
Then, we simulated ReCovEr's estimation procedure on these data. The top panel of Fig. 3 displays the Beta distributions (solid lines) from which the probabilities p_i were sampled. We used the stopping condition based on the difference between the estimate and the lower bound, with parameters M = 3, ε = 1. The bottom panel of Fig. 3 displays the dependence of the root mean square error on the variance of the Beta distributions in the top panel. It is encouraging to see that the root mean square error grows roughly linearly with growing variance in p's distribution, indicating ReCovEr's robustness towards this variance. Of course, the ultimate judge of whether this dependence is acceptable is the extent to which a learning algorithm based on ReCovEr would be affected by the resulting estimation imprecision. This is studied below.

5.2. Experiments with Generated Graph Data

Figure 4 demonstrates the precision of ReCovEr on the graph data generated by the Erdős–Rényi generator by showing 1000 pairs (estimated coverage, actual coverage). Hypotheses and examples were generated with c = 0.3 for hypotheses and c = 0.3 for examples. The graphs corresponding to hypotheses had 15 vertices, and the graphs corresponding to examples had 100 vertices. The top panel refers to estimates obtained by Algorithm 2 enhanced by cutoff selection with base cutoff R = 100, while the bottom panel refers to estimates obtained by the same algorithm with base cutoff R = 200. A bias towards coverage under-estimation can be observed, as well as a positive effect of the higher base cutoff on estimation precision. Table 1 shows average runtimes of ReCovEr and Django. Table 2 shows average runtimes of Django and ReCovEr on artificial graph data with small-world topology produced by the scale-free generator described in Section 3. In this case, the graphs underlying the hypotheses had 15 vertices and their connectivity was k = 4.
The graphs underlying the examples had 100 vertices and connectivity k = 20. A dramatic speedup over Django's runtime is exhibited in both cases. Note that there is no immediate reason to avoid the under-estimation bias, because coverage is usually tested on two example sets (positive and negative). The two results are usually subtracted, thus (mostly) canceling the bias. Whether the observed estimation variance is tolerable for the task of clause ranking usual in inductive logic programming is the subject of the experiments in the next section.

Table 1. Average coverage test runtimes for the configuration from Fig. 4.

Algorithm        | Avg. Time [s]
ReCovEr, R = 100 | 6.7
ReCovEr, R = 200 | 12.5
Django           | 483.2

Table 2. Average coverage test runtimes for the configuration with small-world graph data.

Algorithm        | Avg. Time [s]
ReCovEr, R = 100 | 8.9
ReCovEr, R = 200 | 16.3
Django           | 519.8

Algorithm 3 Learner(⊥, p, BeamSize, Tries): Clause Learner
Input: most specific clause ⊥, real number p, integers BeamSize, MaxSearched
Beam ← {⊥}
BestClause ← ⊥
repeat
    Candidates ← Beam
    for h_i ∈ Beam do
        for i = 1 . . . BeamSize do
            c ← GenerateClause(h_i)
            C ← connected components of c
            Evaluate each c_i ∈ C
            Candidates ← Candidates ∪ C
        end for
    end for
    for h ∈ Candidates such that h is estimated to be better than BestClause do
        if h is shown to be better than BestClause by a deterministic subsumption algorithm then
            BestClause ← h
        end if
    end for
    Choose the BeamSize best hypotheses from Candidates and add them to Beam
    Explored ← Explored + 1
until Beam = {} or Explored = Tries

5.3. Experiments with Real-World Data

In order to assess performance in the conditions of a real-life learning setting, we decided not to generate clauses entirely randomly. Our intention was to simulate general principles of clause production in an inductive logic programming system, while avoiding an overfit to a specific clause search strategy (which would e.g.
be a result of adhering to a specific heuristic function for selecting literals). Thus we developed a simple relational learner, which we use for the further experiments with ReCovEr. The learner (Algorithm 3) is a randomized variation of specific-to-general beam search. It starts with the most specific clause and, at each search step, generates at least n · |Beam| new hypotheses by removing random subsets of literals from the hypotheses already present in Beam. The output of the algorithm is a single best clause, which is why we assess its quality through precision and recall. The first set of experiments conducted with Algorithm 3 deals with the Mutagenesis dataset (Srinivasan et al., 1996). This dataset consists of descriptions of 188 organic molecules, which are labeled according to their mutagenicity. In our experiments, we used only the information about atom-bond relationships and about types of atoms. We did not consider numerical parameters such as lumo or logP. Our relational-logic representation of these molecules consisted of ternary literals bond(at1, at2, bondType) for atomic bonds, unary literals representing the types of particular bonds, and unary literals for atom types. We have considered three variants of the relational logic description of the molecules, with growing complexity (size of examples). The first version, Muta-v1, uses a naive representation. Here, each molecular bond is represented by a single literal bond(at1, at2, bondType), thus imposing a bond orientation (atom order) chosen at random. The second source of imprecision of this representation is that two variables in a clause may represent the same (chemical) atom, which does not make intuitive sense. The second version, Muta-v2, deals with the first source of imprecision, as it represents every atomic bond with a pair of literals bond(at1, at2, bondType) and bond(at2, at1, bondType).
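The difference between the Muta-v1 and Muta-v2 encodings can be sketched as follows (our own illustrative code; atom names and the argument layout are our assumptions, and the unary bond-type and atom-type literals are omitted for brevity):

```python
def bond_literals(bonds, version):
    """Encode molecular bonds as ground literals.
    'v1': one bond/3 literal per bond, fixing an arbitrary orientation;
    'v2': both orientations, removing the orientation artifact."""
    lits = []
    for at1, at2, btype in bonds:
        lits.append(('bond', (at1, at2, btype)))
        if version == 'v2':
            lits.append(('bond', (at2, at1, btype)))  # mirrored literal
    return lits
```

Doubling the bond literals makes examples twice as large, which is exactly the growth in example size that separates Muta-v1 from Muta-v2 in the runtime comparison below.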
The third version, Muta-v3, solves the second source of imprecision by adding literals different(a, b) for all pairs of atom-representing constants a, b. The second set of experiments pertains to the class-labeled CAD data (product structures) described in (Žáková et al., 2007), consisting of 96 CAD examples, each containing several hundreds of first-order literals. The main observation provided by the experiments is that ReCovEr quickly becomes superior to Django as the example size grows, whereas the two algorithms do not significantly differ in terms of the training-set¹ accuracy of the discovered clauses. It is interesting to note that Django's poor runtime performance on the learning tasks with large examples (CAD data and Muta-v2) was often due to occasional subsumption cases. Clearly, this is a manifestation of the heavy tails present in Django's runtime distribution. Unlike Django, ReCovEr exhibited steady performance.

Table 3. Average runtimes of the learner (Algorithm 3, p = 0.75, Tries = 10) for real-world datasets.

Dataset | ReCovEr [s] | Django [s]
Muta-v1 | 42          | 29
Muta-v2 | 513         | 1627
Muta-v3 | 1695        | >5h
CAD     | 121         | >2h

In future work we mainly want to develop theoretical bounds for ReCovEr's estimation precision.

Acknowledgements. We are grateful to the ICML 2008 reviewers for their insightful comments and to the area chair for his extensive effort to understand and improve our initially unclear presentation. The authors are supported by the Grant Agency of the Czech Republic through the project 201/08/0486 Merging Machine Learning with Constraint Satisfaction.

References

Arias, M., Khardon, R., & Maloberti, J. (2007). Learning Horn expressions with Logan-H. Journal of Machine Learning Research, 8, 549–587.

Giordana, A., & Saitta, L. (2000). Phase transitions in relational learning. Machine Learning, 41, 217–251.

Gomes, C. P., Fernández, C., Selman, B., & Bessière, C. (2005). Statistical regimes across constrainedness regions.
Constraints, 10, 317–337.

Gomes, C. P., Selman, B., Crato, N., & Kautz, H. A. (2000). Heavy-tailed phenomena in satisfiability and constraint satisfaction problems. Journal of Automated Reasoning, 24, 67–100.

Kuželka, O., & Železný, F. (2009). A restarted strategy for efficient subsumption testing. Fundamenta Informaticae, spec. issue on multi-relational data mining. (Accepted).

Maloberti, J., & Sebag, M. (2004). Fast theta-subsumption with constraint satisfaction algorithms. Machine Learning, 55, 137–174.

Sebag, M., & Rouveirol, C. (1997). Tractable induction and classification in first-order logic via stochastic matching. IJCAI'97 (pp. 888–893). Morgan Kaufmann.

Srinivasan, A., Muggleton, S., Sternberg, M. J. E., & King, R. D. (1996). Theories for mutagenicity: A study in first-order and feature-based induction. Artificial Intelligence, 85, 277–299.

Železný, F., Srinivasan, A., & Page, D. (2006). Randomised restarted search in ILP. Machine Learning, 64, 183–208.

Žáková, M., Železný, F., Garcia-Sedano, J., Tissot, C. M., Lavrač, N., Křemen, P., & Molina, J. (2007). Relational data mining applied to virtual engineering of product designs. ILP'06 (pp. 439–453). Springer.

Table 4. Quality of learned hypotheses for ReCovEr.

Dataset | Avg. Precision | Avg. Recall
Muta-v1 | 0.84           | 0.61
Muta-v2 | 0.81           | 0.65
Muta-v3 | 0.83           | 0.84
CAD     | 0.92           | 0.7

Table 5. Quality of learned hypotheses for Django.

Dataset | Avg. Precision | Avg. Recall
Muta-v1 | 0.86           | 0.6
Muta-v2 | 0.82           | 0.65
Muta-v3 | n.a.           | n.a.
CAD     | n.a.           | n.a.

6. Conclusions

In this paper, we have introduced ReCovEr, an algorithm exploiting restarts for a maximum-likelihood-based estimation of clause coverage. ReCovEr avoids heavy tails as well as the laborious proving of certain unsatisfiable subsumption instances.
We have shown that ReCovEr provides favorable runtimes while achieving reasonable precision, as illustrated by experiments on synthetic graph data and on real-life data from organic chemistry and engineering.

¹ As this paper is not concerned with improving generalization performance, we did not measure accuracies on hold-out test sets.