Interactive Submodular Set Cover Andrew Guillory Jeff Bilmes University of Washington, Seattle, WA 98195, USA guillory@cs.washington.edu bilmes@ee.washington.edu Abstract We introduce a natural generalization of submodular set cover and exact active learning with a finite hypothesis class (query learning). We call this new problem interactive submodular set cover. Applications include advertising in social networks with hidden information. We give an approximation guarantee for a novel greedy algorithm and give a hardness of approximation result which matches up to constant factors. We also discuss negative results for simpler approaches and present encouraging early experimental results. snowboarding or people that listen to jazz music. If the members of the target group are unknown and we have no way of learning the members of the target group, there is little we can do except assume every member of the social network is a member of the target group. However, if we assume the group has some known structure and that we receive feedback from sending advertisements (e.g. in the form of ad clicks or survey responses), it may be possible to simultaneously discover the members of the group and find people that are influential in the group. We call problems like this learning and covering problems. In our example, the learning aspect of the problem is discovering the members of the target group (the people that like snowboarding), and the covering aspect of the problem is to select a small set of people that achieve a desired level of influence in the target group (the people to target with advertisements). Other applications have similar structure. For example, we may want to select a small set of representative documents about a topic of interest (e.g. about linear algebra). If we do not initially know topic labels for documents, this is a learning and covering problem. We propose a new problem called interactive submodular set cover that can be used to model many learning and covering problems. Besides addressing interesting new applications, interactive submodular set cover directly generalizes submodular set cover and exact active learning with a finite hypothesis class (query learning) giving new insight into many previous theoretical results. We derive and analyze a new algorithm that is guaranteed to perform approximately as well as any other algorithm. Our algorithm considers simultaneously the learning and covering parts of the problem. It is tempting to try to treat these two parts of the problem separately for example by first solving the learning problem and then solving the covering problem. We prove this approach and other simple approaches may perform much worse than the optimal algorithm. Some proofs are omitted for space but are in the technical report (Guillory & Bilmes, 2010). 1. Introduction As a motivating example, we consider viral marketing in a social network. In the standard version of the problem, the goal is to send advertisements to influential members of a social network such that by sending advertisements to only a few people our message spreads to a large portion of the network. Previous work (Kempe et al., 2005; 2003) has shown that, for many models of influence, the influence of a set of nodes can be modelled as a submodular set function. Therefore, selecting a small set of nodes with maximal influence can be posed as a submodular function maximization problem. The related problem of selecting a minimal set of nodes to achieve a desired influence is a submodular set cover problem. Both of these problems can be approximately solved via a simple greedy approximation algorithm. Consider a variation of this problem in which the goal is not to send advertisements to people that are influential in the entire social network but rather to people that are influential in a specific target group. For example, our target group could be people that like Appearing in Proceedings of the 27 th International Conference on Machine Learning, Haifa, Israel, 2010. Copyright 2010 by the author(s)/owner(s). Interactive Submodular Set Cover 2. Background 2.1. Submodular Set Cover A submodular function is a set function satisfying a natural diminishing returns property. We call a set function F defined over a ground set V submodular iff for all A B V and v V \ B F (A + v) - F (A) F (B + v) - F (B) (1) In other words, adding an element to A, a subset of B, results in a larger gain than adding the same element to B. F is called modular if Equation 1 holds with equality. F is monotone non-decreasing if for all A B V , F (A) F (B). In the submodular set cover problem the goal is to find a set S V minimizing a modular cost function c(S) = sS c(s) subject to the constraint F (S) = F (V ) for a monotone non-decreasing submodular F . This problem is closely related to the problem of submodular function maximization under a modular cost constraint c(S) < k for a constant k. A number of interesting real world applications can be posed as submodular set cover or submodular function maximization problems including influence maximization in social networks (Kempe et al., 2003), sensor placement and experiment design (Krause et al., 2008), and document summarization (Lin & Bilmes, 2010). In the sensor placement problem, for example, the ground set V corresponds to a set of possible locations. An objective function F (S) measures the coverage achieved by deploying sensors to the locations corresponding to S V . For many reasonable definitions of coverage, F (S) turns out to be submodular. Submodular set cover is a generalization of the set cover problem, and, as is the case for set cover, a greedy algorithm has approximation guarantees (Wolsey, 1982). In particular, if F is integer valued, then the greedy solution is within H(maxvV F ({v})) of the optimal solution where H(k) is the kth harmonic number. Up to lower order terms, this matches the hardness of approximation lower bound for set cover (Feige, 1998). We note a variation of submodular set cover uses a constraint F (S) for a fixed threshold . It can be shown this variation is equivalent (Krause et al., 2008; Narayanan, 1997), so without loss of generality or specificity, we use this variation. 2.2. Exact Active Learning In the exact active learning problem we have a known finite hypothesis class given by a set of objects H, and we want to identify an initially unknown target hypothesis h H. We identify h by asking questions. Define Q to be the known set of all possible questions. A question q maps an object h to a set of valid responses q(h) R with q(h) = where R qQ,hH q(h) is the set of all possible responses. We know the mapping for each q (i.e. we know q(h) for every q and h). Asking q reveals some element r q(h ) which may be chosen adversarially (chosen to impede the learning algorithm). Each question q Q has a positive cost c(q) defined by the modular cost function c. The goal of active learning is to ask a sequence of questions with small total cost that identifies h . By identifying h , we mean that for every h = h we have received some response r to a question q such that r q(h). Questions are chosen sequentially so that / the response from a previous question can be used to decide which question to ask next. In a typical exact learning problem, H is a set of different classifiers and h is a unique zero-error classifier. Questions in Q can, for example, correspond to label (membership) queries for data points. If we have a fixed data set consisting of data points xi , we can create a question qi corresponding to each xi and set qi (h) = {h(xi )}. We note that we assume that there is no noise in responses and that the hypothesis class is correct (h H). Questions can also correspond to more complicated queries. For example, a question can ask if any points in a set are positively labelled. The setting we have described allows for mixing arbitrary types of queries with different costs. ^ For a set of question-response pairs S, define the ver^ to be the subset of H consistent with sion space V (S) ^ ^ ^ S. More formally V (S) {h H : (q, r) S, r q(h)}. In terms of the version space, the goal of exact active learning is to ask a sequence of questions such ^ that |V (S)| = 1. Building on previous work (Balc´zar et al., 2007), Hana neke (2006) showed that a simple greedy active learning strategy is approximately optimal in the setting we have described. The greedy strategy selects the question which relative to cost distinguishes the greatest number of hypotheses from h . Hanneke (2006) shows this strategy incurs no more than ln |H| times the cost of any other question asking strategy. 3. Problem Statement We use notation similar to the exact active learning problem we described in the previous section. Assume we have a finite hypothesis class H containing an unknown target hypothesis h H. We again assume Interactive Submodular Set Cover there is a finite set of questions Q, a question q maps each object h to a set of valid responses q(h) R with q(h) = , and each question q Q has a positive cost c(q) defined by the modular cost function c. We also again assume that we know the mapping for each q (i.e. we know q(h) for every q and h). Asking q reveals some adversarially chosen element r q(h ). In the exact active learning problem the goal is to identify h through questions. In this work we consider a generalization in which the goal is instead to satisfy a submodular constraint that depends on h . We assume that for each object h there is a corresponding monotone non-decreasing submodular function Fh defined over subsets of Q×R (sets of questionresponse pairs). We repeatedly ask a question qi and ^ receive a response ri . Let the sequence of questions ^ ^ be Q = (^1 , q2 , . . . ) and sequence of responses be q ^ ^ = (^1 , r2 , . . . ). Define S = ^ R r ^ qi Q {(^i , ri )} to be ^ ^ q ^ the final set of question-response pairs corresponding to these sequences. Our goal is to ask a sequence of ^ ^ questions with minimal total cost c(Q) so Fh (S) for some threshold without knowing h beforehand. We call this interactive submodular set cover. about h and the feedback we receive from questions that allows us to model learning and covering. 3.1. Connection to Submodular Set Cover If we know h (e.g. if |H| = 1) and we assume |q(h)| = 1 q Q, h H (i.e. that there is only one valid response to every question), our problem reduces exactly to the standard submodular set cover problem. Under these assumptions, we can compute ^ Fh (S) for any set of questions without actually asking these questions. Krause et al. (2008) study a noninteractive version of interactive submodular set cover in which |q(h)| = 1 q Q, h H and the entire sequence of questions must be chosen before receiving any responses. This restricted version of the problem can also be reduced to standard submodular set cover Krause et al. (2008). 3.2. Connection to Active Learning Define ^ Fh (S) ^ ^ F (S) = |H \ V (S)| ^ where V (S) is again the version space (the set of hy^ potheses consistent with S). This objective is the number of hypotheses eliminated from the version space by ^ S and is submodular and monotone non-decreasing. For this objective, if we set = |H| - 1 we get the standard exact active learning problem: our goal is to identify h using a set of questions with small total cost. Note that in this case the objective Fh does not actually depend on h (i.e. Fh = Fh for all h, h H) but the problem still differs from standard submodu^ lar set cover because Fh (S) is defined over questionresponse pairs. We can also model other variations of active learning; ^ for example, instead of requiring |V (S)| = 1, with an ^ appropriate Fh we can require that every h V (S) has low error with respect to h . 3.3. Connection to Adaptive Submodularity In concurrent work, Golovin & Krause (2010) show results similar to ours for a different but related class of problems which also involve interactive (i.e. sequential, adaptive) optimization of submodular functions. What Golovin & Krause call realizations correspond to hypotheses in our work while items and states correspond to queries and responses respectively. Golovin & Krause consider both average-case and worst-case settings and both maximization and min-cost coverage problems. In contrast, we only consider worst-case, min-cost coverage problems. In this sense our results are less general. Interactive Submodular Set Cover Given: · Hypothesis class H containing an unknown target h · Query set Q and response set R with known q(h) R for every q Q, h H · Modular query cost function c defined over Q · Submodular monotone non-decreasing objective functions Fh for h H defined over Q × R · Objective threshold Repeat: Ask a question qi Q and receive a re^ sponse ri qi (h ) ^ ^ ^ ^ Until: Fh (S) where S = i {(^i , ri )} q ^ Objective: Minimize i c(^i ) q Note that although we know the hypothesis class H and the corresponding objective functions Fh , we do not initially know h . Information about h is only revealed as we ask questions and receive responses to questions. Responses to previous questions can be used to decide which question to ask next, so in this way the problem is "interactive." Furthermore, the objective function for each hypothesis Fh is defined over sets of question-response pairs (as opposed to, say, sets of questions), so when asking a new question we cannot predict how the value of Fh will change until after we receive a response. The only restriction on the response we receive is that it must be consistent with the initially unknown target h . It is this uncertainty Interactive Submodular Set Cover an ad. This is a variation of the minimum dominating set problem, and we use the following objective ^ Fh (S) I v VS or s VS : (v, s) E + |V \ Vh | ^ ^ vVh Figure 1. A cartoon example social network. However, in other ways our results are more general. The main greedy approximation guarantees shown by Golovin & Krause require that the problem is adaptive submodular ; adaptive submodularity depends not only on the objective but also on the set of possible realizations and the probability distribution over these realizations. In contrast we only require that for a fixed hypothesis the objective is submodular. Golovin & Krause call this pointwise submodularity. Pointwise submodularity does not in general imply adaptive submodularity (see the clustered failure model discussed by Golovin & Krause). Some other previous work also considers interactive versions of covering problems in the average-case (Asadpour et al., 2008; Goemans & Vondr´k, 2006). a where V and E are the nodes and edges in the social network, Vh is the set of nodes in group h, and VS is ^ the set of nodes corresponding to ads we have sent. ^ With this objective Fh (S) = |V | iff we have achieved the stated coverage goal. This objective is also submodular and monotone non-decreasing. Figure 1 shows a cartoon social network. For this example, assume the advertiser knows the target group is one of the four clusters shown (marked A, B, C, and D) but does not know which. This is our hypothesis class H. The node marked v is initially very useful for learning the members of the target group: if we send an ad to this node, no matter what response we receive we are guaranteed to eliminate two of the four clusters (either A and B or C and D). However, this node has only a degree of 2 and therefore sending an ad to this node does not cover very many nodes. On the other hand, the nodes marked x and w are connected to every node in clusters B and D respectively. x (resp. w) is therefore very useful for achieving the coverage objective if the target group is B (resp. D). An algorithm for learning and covering must choose between actions more beneficial for learning vs. actions more beneficial for covering (although sometimes an action can be beneficial for both to a certain degree). The interplay between learning and covering is similar to the exploration-exploitation trade-off in reinforcement learning. In this example an optimal strategy is to end an ad to v and then cover the remaining two clusters using two ads for a worst case cost of 3. 4. Example In the advertising application we described in the introduction, the target hypothesis h corresponds to the group of people we want to target with advertisements (e.g. the people that like snowboarding), and the hypothesis class H encodes our prior knowledge about h . For example, if we know the target group forms a small dense subgraph in the social network, then the hypothesis class H would be the set of all small dense subgraphs in the social network. The query set Q and response set R correspond to advertising actions and feedback respectively, and finally the objective function Fh measures advertising coverage within the group corresponding to h. To make the discussion concrete, assume the advertiser sends a single ad at a time and that after a person is sent an ad the advertiser receives a binary response indicating if that person is in the target group (i.e. likes snowboarding). Let qi correspond to sending an ad to user i (i.e. node i), and qi (h) = {1} if user i is in group h and qi (h) = {0} otherwise. For our coverage goal, assume the advertiser wants to ensure that every person in the target group either receives an ad or has a friend that receives an ad. We say a node is "covered" if it has received an ad or has a neighbor that has received 5. Greedy Approximation Guarantee We are interested in approximately optimal polynomial time algorithms for the interactive submodular set cover problem. We call a question asking strategy correct if it always asks a sequence of questions ^ ^ such that Fh (S) where S is again the final set of question-response pairs. A necessary and sufficient ^ condition to ensure Fh (S) for worst case choice ^ ^ of h is to ensure minhV (S) Fh (S) where V (S) is ^ the version space. Then a simple stopping condition which ensures a question asking strategy is correct is to ^ continue asking questions until minhV (S) Fh (S) . ^ We call a question asking strategy approximately optimal if it is correct and the worst case cost incurred by the strategy is not much worse than the worst case Interactive Submodular Set Cover Algorithm 1 Worst Case Greedy ^ 1: H H ^ 2: S ¯ ^ 3: while F (S) < do ¯ ^ 4: q argmaxqi Q minhV (S) minri qi (h) (F (S + ^ ^ ¯ ^ (qi , ri )) - F (S))/c(qi ) 5: Ask q and receive response r ^ ^ ^ ^ 6: S S + (^, r) q ^ 7: end while cost of any other strategy. As discussed informally in the previous section, it is important for a question asking strategy to balance between learning (identifying h ) and covering (increasing Fh ). Ignoring either aspect of the problem is in general suboptimal (we show this formally in Section 6). We propose a reduction which converts the problem over many objective functions Fh into a problem ¯ over a single objective function F that encodes the trade-off between learning and covering. We can then use a greedy algorithm to maximize this single objective, and this turns out to overcome the shortcomings of simpler approaches. This reduction is inspired by the reduction used by Krause et al. (2008) in the noninteractive setting to convert multiple covering constraints into a single covering constraint. Define ¯ ^ F (S) (1/|H|)( ^ hV (S) We now argue that Algorithm 1 is an approximately optimal algorithm for interactive submodular set cover. Note that although it is a simple greedy algorithm over a single submodular objective, the standard submodular set cover analysis doesn't apply: the objective function is defined over question-response pairs, and the algorithm cannot predict the actual objective function gain until after selecting and commiting to a question and receiving a response. We use an Extended Teaching Dimension style analysis (Hanneke, 2006) inspired by previous work in query learning. We are the first to our knowledge to use this kind of proof for a submodular optimization problem. Define an oracle (teacher) T RQ to be a function mapping questions to responses. As a short hand, for ^ a sequence of questions Q define ^ T (Q) {(^i , T (^i ))} q q qi Q ^ ^ ^ T (Q) is the set of question-response pairs received ^ when T is used to answer the questions in Q. We now define a quantity analogous to the General Identification Cost for exact active learning (Hanneke, 2006). Define the General Cover Cost, GCC GCC ^ ¯ ^ T RQ Q:F (T (Q)) max ( min ^ c(Q)) ^ ^ min(, Fh (S)) + |H \ V (S)|) ¯ ^ ^ ^ F (S) iff Fh (S) for all h V (S) so a ques¯ ^ tion asking strategy is correct iff it satisfies F (S) . This objective balances the value of learning and cov^ ering. The sum over h V (S) measures progress towards satisfying the covering constraint for hypotheses h in the current version space (covering). The second ^ term |H \ V (S)| measures progress towards identify ing h through reduction in version space size (learning). Note that the objective does not make a hard distinction between learning actions and covering actions. In fact, the objective will prefer actions that ^ ^ both increase Fh (S) for h V (S) and decrease the ^ Crucially, F retains submodularity. ¯ size of V (S). ¯ Lemma 1. F is submodular and monotone nondecreasing when every Fh is submodular and monotone non-decreasing. Algorithm 1 shows the worst case greedy algorithm which at each step picks the question qi that maximizes ¯ the worst case gain of F ^ hV (S) ri qi (h) GCC depends on H, Q, , c, and the objective functions Fh , but for simplicity of notation this dependence is suppressed. GCC can be viewed as the cost of sat¯ ^ isfying F (T (Q)) for worst case choice of T where ^ the choice of T is known to the algorithm selecting Q. Here the worst case choice of T is over all mappings between Q and R. There is no restriction that T answer questions in a manner consistent with any hypothesis h H. We first show that GCC is a lower bound on the op^ timal worst case cost of satisfying Fh (S) . Lemma 2. If there is a correct question asking strat^ egy for satisfying Fh (S) with worst case cost C then GCC C . Proof. Assume the lemma is false and there is a correct question asking strategy with worst case cost C and GCC > C . Using this assumption and the definition of GCC, there is some oracle T such that ^ ¯ ^ Q:F (T (Q)) min ^ c(Q) = GCC > C min ¯ ^ ¯ ^ min (F (S + (qi , ri )) - F (S))/c(qi ) When we use T to answer questions, any sequence ^ of questions Q with total cost less than or equal to Interactive Submodular Set Cover ¯ ^ ¯ ^ C must have F (S) < . F (S) < in turn im^ < for some target hypothesis choice plies Fh (S) ^ h V (S). This contradicts the assumption there is a correct strategy with worst case cost C . We now establish that when GCC is small, there must ¯ be a question which increases F . Lemma 3. For any initial set of questions-response ^ pairs S, there must be a question q Q such that ^ hV (S) rq(h) After some algebra we get ¯ ^ ¯ ^ - F (Si ) ( - F (Si-1 ))(1 - c(^i )/GCC) q Now using 1 - x < e-x q ¯ ^ ¯ ^ - F (Si ) (- F (Si-1 ))e-c(^i )/GCC = e-Ci /GCC min ¯ ^ ¯ ^ min F (S + (q, r)) - F (S) ¯ ^ c(q)( - F (S))/GCC ¯ ^ We have shown that the gap - F (Si ) decreases exponentially fast with the cost of the questions asked. The remainder of the proof proceeds by showing that (1) we can decrease the gap to 1/|H| using questions with at most GCC ln(|H|) cost and (2) we can decrease the gap from 1/|H| to 0 with one question with cost at most GCC. ¯ ^ Let j is the largest integer such that - F (Sj ) 1/|H| holds. Then 1/|H| e-Cj /GCC Solving for Cj we get Cj GCC ln(|H|). This completes (1). ¯ ^ ¯ ^ By Lemma 3, F (Si ) < F (Si+1 ) (we strictly increase the objective on each iteration). Because is an integer and for every h Fh is an integral function, we ¯ ^ ¯ ^ can conclude F (Si ) < F (Si+1 ) + 1/|H|. Then qj+1 will be the final question asked. By Lemma 3, qj+1 can have cost no greater than GCC. This completes (2). We can finally conclude the cost incurred by the greedy algorithm is at most GCC(1 + ln(|H|)) By combining Theorem 1 and Lemma 2 we get Corollary 1. For integer and integral monotone non-decreasing submodular Fh , the worst case cost of Algorithm 1 is within 1 + ln(|H|) of that of any other correct question asking strategy We have shown a result for integer valued and objective functions. We speculate that for more general non-integer objectives it should be possible to give results similar to those for standard submodular set cover (Wolsey, 1982). These approximation bounds typically add an additional normalization term. Proof. Assume the lemma is false and for every ques^ tion q there is some h V (S) and r q(h) such that ¯ ^ ¯ ^ ¯ ^ F (S + (q, r)) - F (S) < c(q)( - F (S))/GCC Define an oracle T which answers every question with a response satisfying this inequality. For example, one such T is T (q) ¯ ^ ¯ ^ argminr F (S + (q, r)) - F (S) By the definition of GCC ^ ¯ ^ Q:F (T (Q)) min ^ c(Q)) max ( ^ ¯ ^ T RQ Q:F (T (Q)) min ^ c(Q)) = GCC ^ ^ so there must be a sequence of questions Q with c(Q) ¯ (T (Q)) . Because F is mono^ ¯ GCC such that F ¯ ^ ^ tone non-decreasing, we also know F (T (Q) S) . ¯ , Using the submodularity of F ¯ ^ ^ F (T (Q) S) ¯ ^ F (S) + < ¯ ^ F (S) + ^ qQ ¯ ^ ¯ ^ (F (S {(q, T (q))}) - F (S)) ^ qQ ¯ ^ c(q)( - F (S))/GCC which is a contradiction. We can now show approximate optimality. Theorem 1. Assume that is an integer and, for any h H, Fh is an integral monotone non-decreasing submodular function. Algorithm 1 incurs at most GCC(1 + ln(n)) cost. Proof. Let qi be the question asked on the ith iter^ ^ ation, Si be the set of question-response pairs after asking qi and Ci be ji c(^j ). By Lemma 3 ^ q ¯ ^ ¯ ^ ¯ ^ F (Si ) - F (Si-1 ) c(^i )( - F (Si-1 ))/GCC q 6. Negative Results The method we propose for interactive submodular set cover simultaneously solves the learning problem and covering problem in parallel, only solving the learning problem to the extent that it helps solve the covering problem. A simpler strategy is to solve these two problems in series (i.e. first identify h using the standard greedy query learning algorithm and second solve the submodular set cover problem for Fh using the standard greedy set cover algorithm). We call this the Interactive Submodular Set Cover Learn then Cover approach. We can show that this approach and in fact any approach that identifies h exactly can perform very poorly. Therefore it is important to consider the learning problem and covering problem simultaneously. Theorem 2. Assume Fh is integral for all h and that is an integer. Any algorithm that exactly identifies h has approximation ratio at least (|H| maxi c(qi )/ mini c(qi )). Another simple approach is to ignore feedback and solve the covering problem for all h H. We call this the Cover All method. This method is an example of a non-adaptive method: a non-adaptive (i.e. non interactive) method is any method that does not use responses to previous questions in deciding which question to ask next. The adaptivity gap (Dean et al., 2004) for a problem characterizes how much worse the best non-adaptive method can perform as compared to to the best adaptive method. For interactive submodular set cover we define the adaptivity gap to be the maximum ratio between the cost of the optimal non-adaptive strategy and the optimal adaptive strategy. With this definition, we can show that, in contrast to related problems (Asadpour et al., 2008) where the adaptivity gap is a constant, the adaptivity gap for interactive submodular set cover is quite large. Theorem 3. The adaptivity gap for interactive submodular set cover is at least (|H|/ ln |H|). This result shows, even if we optimally solve the submodular set cover problem, the Cover All method can incur exponentially greater cost than the optimal adaptive strategy. We can finally also show that the 1 + ln(|H|) approximation factor achieved by the method we propose is in fact the best possible up to the constant factor assuming there are no slightly superpolynomial time algorithms for NP. The result and proof are very similar to those for the non-interactive setting (Krause et al., 2008). Theorem 4. Interactive submodular set cover cannot be approximated within a factor of (1 - ) max(ln |H|, ln ) in polynomial time for any > 0 unless NP has nO(log log n) time deterministic algorithms. The approximation factor we have shown for the greedy algorithm is 1+ln(|H|) = 1+ln +ln |H| < 1+2 max(ln |H|, ln ) so our hardness of approximation result matches up to the constant factor and lower order term. 7. Experiments We tested our method on the interactive dominating set problem described in Section 4. In this problem, we are given a graph and H is a set of possibly overlapping clusters of nodes. The goal is to find a small set of nodes which forms a dominating set of an initially unknown target group h H. After selecting each node, we receive feedback indicating if the selected node is in the target group. Our proposed method (Simultaneous Learning and Covering) simultaneously learns about the target group h and finds a dominating set for it. We compare to two baselines: a method which first exactly identifies h and then finds a dominating set for the target group (Learn then Cover) and a method which simply ignores feedback and finds a dominating set for the union of all clusters (Cover All). Note that Theorem 2 and Theorem 3 apply to Learn then Cover and Cover All respectively, so these methods do not have strong theoretical guarantees. However, we might hope however that for reasonable real world problems they perform well. We use real world network data sets with simple synthetic hypothesis classes designed to illustrate differences between the methods. The networks are from Jure Leskovec's collection of datasets available at http://snap.stanford.edu/data/index.html. We convert all the graphs into undirected graphs and remove self edges. Table 1 shows our results. Each reported result is the average number of queries over 100 trials. Bolded results are the best methods for each setting with multiple results bolded when differences are not statistically significant (within p = .01 with a paired t-test). In the first set of results (Clusters), we create H by using the METIS graph partition package 4 separate times partitioning the graph into 10, 20, 30, and 40 clusters. H is the combined set of 100 clusters, and these clusters overlap since they are taken from 4 separate partitions of the graph. The target h is chosen at random from H. With this hypothesis class, we've found that there is very little difference between the Simultaneous Learning and Covering and the Learn then Cover methods. The Cover All method performs significantly worse because without the benefit of feedback it must find a dominating set of the entire graph. In the second set of results, we use a hypothesis class designed to make learning difficult (Noisy Clusters). We start with H generated as before. We then add to H 100 additional hypotheses which are each very similar to h . Each of these hypotheses consists of the target group h with a random member removed. H is then the combined set of the 100 original hy- Interactive Submodular Set Cover Table 1. Average number of queries required to find a dominating set in the target group. Data Set / Hypothesis Class Simultaneous Learning and Covering Learn then Cover Cover All Enron / Clusters 156.64 161.81 3091.00 Physics / Clusters 175.97 177.88 3340.00 Physics Theory / Clusters 172.38 175.12 3170.00 Epinions / Clusters 774.81 779.23 15777.00 Slashdot / Clusters 709.30 715.39 15383.00 Enron / Noisy Clusters 179.00 231.03 3091.00 Physics / Noisy Clusters 186.13 225.02 3340.00 Physics Theory / Noisy Clusters 160.62 201.24 3170.00 Epinions / Noisy Clusters 788.52 788.06 15777.00 Slashdot / Noisy Clusters 804.87 804.86 15383.00 potheses and these 100 variations of h . For this hypothesis class, Learn then Cover performs significantly worse than our Simultaneous Learning and Covering method on 3 of the 5 data sets. Learn then Cover exactly identifies h , which is difficult because of the many hypotheses similar to h . Our method learns about h but only to the extent that it is helpful for finding a small dominating set. On the other two data sets Learn then Cover and Simultaneous Learning and Covering are almost identical. These are larger data sets, and we've found that when the covering problem requires many more queries than the learning problem, our method is nearly identical to Learn then Cover. This makes sense since when is large compared to ^ ¯ the sum over Fh (S) the second term in F dominates. Although we use real world graph data, the hypothesis classes and target hypotheses we use are very simple and synthetic, and as such these experiments are primarily meant to provide reasonable examples in support of our theoretical results. active learning. In ICML, 2006. Balc´zar, J.L., Castro, J., Guijarro, D., K¨bler, J., and a o Lindner, W. A general dimension for query learning. Journal of Computer and System Sciences, 73(6):924­ 940, 2007. Dean, B., Goemans, M., and Vondrak, J. Approximating the stochastic knapsack problem: The benefit of adaptivity. In FOCS, 2004. Feige, U. A threshold of ln n for approximating set cover. Journal of the ACM, 45(4), 1998. Goemans, M. and Vondr´k, J. Stochastic covering and a adaptivity. In LATIN, 2006. Golovin, D. and Krause, A. Adaptive submodularity: A new approach to active learning and stochastic optimization. In COLT, 2010. Guillory, Andrew and Bilmes, Jeff. Interactive submodular set cover, 2010. Technical Report. UWEETR-2010-0001. Hanneke, S. The cost complexity of active learning, 2006. Unpublished. //www.stat.cmu.edu/~shanneke/docs/2006/ cost-complexity-working-notes.pdf. interhttp: 8. Future Work We believe there are other interesting applications which can be posed as interactive submodular set cover. In some applications it may be difficult to com¯ pute F exactly because H may be very large or even infinite. In these cases, it may be possible to approximate this function by sampling from H. It's also important to consider methods that can handle misspecified hypothesis classes and noise within the learning. One approach could be to extend agnostic active learning (Balcan et al., 2006) results to a similar interactive optimization setting. ´ Kempe, D., Kleinberg, J., and Tardos, E. Maximizing the spread of influence through a social network. In KDD, 2003. ´ Kempe, D., Kleinberg, J., and Tardos, E. Influential nodes in a diffusion model for social networks. In ICALP, 2005. Krause, A., McMahan, H.B., Guestrin, C., and Gupta, A. Robust submodular observation selection. JMLR, 2008. Lin, Hui and Bilmes, Jeff. Multi-document summarization via budgeted maximization of submodular functions. In NAACL/HLT, 2010. Narayanan, H. Submodular Functions and Electrical Networks. North Holland, 1997. Wolsey, L.A. An analysis of the greedy algorithm for the submodular set covering problem. Combinatorica, 2(4), 1982. References Asadpour, A., Nazerzadeh, H., and Saberi, A. Stochastic submodular maximization. In Workshop on Internet and Network Economics, 2008. Balcan, M.F., Beygelzimer, A., and Langford, J. Agnostic