An Online Algorithm for Maximizing Submodular Functions

Matthew Streeter, Google, Inc., Pittsburgh, PA 15213, mstreeter@google.com
Daniel Golovin, Carnegie Mellon University, Pittsburgh, PA 15213, dgolovin@cs.cmu.edu

Abstract

We present an algorithm for solving a broad class of online resource allocation problems. Our online algorithm can be applied in environments where abstract jobs arrive one at a time, and one can complete the jobs by investing time in a number of abstract activities, according to some schedule. We assume that the fraction of jobs completed by a schedule is a monotone, submodular function of a set of pairs $(v, \tau)$, where $\tau$ is the time invested in activity $v$. Under this assumption, our online algorithm performs near-optimally according to two natural metrics: (i) the fraction of jobs completed within time $T$, for some fixed deadline $T > 0$, and (ii) the average time required to complete each job. We evaluate our algorithm experimentally by using it to learn, online, a schedule for allocating CPU time among solvers entered in the 2007 SAT solver competition.

1 Introduction

This paper presents an algorithm for solving the following class of online resource allocation problems. We are given as input a finite set $V$ of activities. A pair $(v, \tau) \in V \times \mathbb{R}_{>0}$ is called an action, and represents spending time $\tau$ performing activity $v$. A schedule is a sequence of actions. We use $\mathcal{S}$ to denote the set of all schedules. A job is a function $f : \mathcal{S} \to [0, 1]$, where for any schedule $S \in \mathcal{S}$, $f(S)$ represents the proportion of some task that is accomplished by performing the sequence of actions $S$. We require that a job $f$ have the following properties (here $\oplus$ is the concatenation operator):

1. (monotonicity) for any schedules $S_1, S_2 \in \mathcal{S}$, we have $f(S_1) \le f(S_1 \oplus S_2)$ and $f(S_2) \le f(S_1 \oplus S_2)$

2. (submodularity) for any schedules $S_1, S_2 \in \mathcal{S}$ and any action $a \in V \times \mathbb{R}_{>0}$, $f_a(S_1 \oplus S_2) \le f_a(S_1)$, where we define $f_a(S) \equiv f(S \oplus \langle a \rangle) - f(S)$.

We will evaluate schedules in terms of two objectives. The first objective, which we call benefit-maximization, is to maximize $f(S)$ subject to the constraint $\ell(S) \le T$, for some fixed $T > 0$, where $\ell(S)$ equals the sum of the durations of the actions in $S$. For example if $S = \langle (v_1, 3), (v_2, 3) \rangle$, then $\ell(S) = 6$. The second objective is to minimize the cost of a schedule, which we define as

$$c(f, S) = \int_{t=0}^{\infty} \left( 1 - f(S_{\langle t \rangle}) \right) dt$$

where $S_{\langle t \rangle}$ is the schedule that results from truncating schedule $S$ at time $t$. For example if $S = \langle (v_1, 3), (v_2, 3) \rangle$ then $S_{\langle 5 \rangle} = \langle (v_1, 3), (v_2, 2) \rangle$.¹ One way to interpret this objective is to imagine that $f(S)$ is the probability that some desired event occurs as a result of performing the actions in $S$. For any non-negative random variable $X$, we have $\mathbb{E}[X] = \int_{t=0}^{\infty} \mathbb{P}[X > t]\, dt$. Thus $c(f, S)$ is the expected time we must wait before the desired event occurs if we execute actions according to the schedule $S$.

¹More formally, if $S = \langle a_1, a_2, \ldots \rangle$, where $a_i = (v_i, \tau_i)$, then $S_{\langle t \rangle} = \langle a_1, a_2, \ldots, a_{k-1}, a_k, (v_{k+1}, \tau') \rangle$, where $k$ is the largest integer such that $\sum_{i=1}^{k} \tau_i < t$ and $\tau' = t - \sum_{i=1}^{k} \tau_i$.
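To make these definitions concrete, here is a minimal sketch (our illustration, not code from the paper) of a toy job of the kind used in Example 1 below. The activity names and the exponential run-length model are assumptions made purely for illustration; the sketch computes $\ell(S)$, the truncated benefit $f(S_{\langle T \rangle})$, and a truncated version of the cost integral (the untruncated integral can be infinite when a finite schedule never completes the job).

```python
import math

# Hypothetical activities and success rates (illustrative only).
RATES = {"solver_A": 0.5, "solver_B": 0.2}

def success_prob(v, tau):
    """Probability that a single run of activity v for time tau succeeds."""
    return 1.0 - math.exp(-RATES[v] * tau)

def f(schedule):
    """The job f(S): probability that at least one run in S succeeds."""
    fail = 1.0
    for v, tau in schedule:
        fail *= 1.0 - success_prob(v, tau)
    return 1.0 - fail

def length(schedule):
    """l(S): sum of the durations of the actions in S."""
    return sum(tau for _, tau in schedule)

def truncate(schedule, t):
    """S_<t>: the schedule that results from truncating S at time t."""
    out, elapsed = [], 0.0
    for v, tau in schedule:
        if elapsed + tau <= t:
            out.append((v, tau))
            elapsed += tau
        else:
            if t > elapsed:
                out.append((v, t - elapsed))
            break
    return out

def truncated_cost(schedule, T, dt=0.01):
    """Approximates the integral of 1 - f(S_<t>) over [0, T] by a Riemann sum."""
    steps = int(T / dt)
    return sum((1.0 - f(truncate(schedule, (k + 1) * dt))) * dt for k in range(steps))

S = [("solver_A", 3.0), ("solver_B", 3.0)]
print(length(S))                 # 6.0, so S is feasible for any deadline T >= 6
print(f(truncate(S, 5.0)))       # benefit of S truncated at T = 5
print(truncated_cost(S, 10.0))   # cost objective, truncated at T = 10
```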
The following example illustrates these definitions.

Example 1. Let each activity $v$ represent a randomized algorithm for solving some decision problem, and let the action $(v, \tau)$ represent running the algorithm (with a fresh random seed) for time $\tau$. Fix some particular instance of the decision problem, and for any schedule $S$, let $f(S)$ be the probability that one (or more) of the runs in the sequence $S$ yields a solution to that instance. So $f(S_{\langle T \rangle})$ is (by definition) the probability that performing the runs in schedule $S$ yields a solution to the problem instance in time $T$, while $c(f, S)$ is the expected time that elapses before a solution is obtained. It is clear that $f(S)$ is monotone, because adding runs to the sequence $S$ can only increase the probability that one of the runs is successful. The fact that $f$ is submodular can be seen as follows. For any schedule $S$ and action $a$, $f_a(S)$ equals the probability that action $a$ succeeds after every action in $S$ has failed, which can also be written as $(1 - f(S)) \cdot f(\langle a \rangle)$. This, together with the monotonicity of $f$, implies that for any schedules $S_1, S_2$ and any action $a$, we have $f_a(S_1 \oplus S_2) = (1 - f(S_1 \oplus S_2)) \cdot f(\langle a \rangle) \le (1 - f(S_1)) \cdot f(\langle a \rangle) = f_a(S_1)$.

In the online setting, an arbitrary sequence $f^{(1)}, f^{(2)}, \ldots, f^{(n)}$ of jobs arrives one at a time, and we must finish each job (via some schedule) before moving on to the next job. When selecting a schedule $S^{(i)}$ to use to finish job $f^{(i)}$, we have knowledge of the previous jobs $f^{(1)}, f^{(2)}, \ldots, f^{(i-1)}$ but we have no knowledge of $f^{(i)}$ itself or of any subsequent jobs. In this setting we aim to minimize regret, which measures the difference between the average cost (or average benefit) of the schedules produced by our online algorithm and that of the best single schedule (in hindsight) for the given sequence of jobs.

1.1 Problems that fit into this framework

A number of previously-studied problems can be cast as the task of computing a schedule $S$ that minimizes $c(f, S)$, where $f$ is of the form

$$f(S) = \frac{1}{n} \sum_{i=1}^{n} \left( 1 - \prod_{(v, \tau) \in S} \left( 1 - p_i(v, \tau) \right) \right).$$

This expression can be interpreted as follows: the job $f$ consists of $n$ subtasks, and $p_i(v, \tau)$ is the probability that investing time $\tau$ in activity $v$ completes the $i$th subtask. Thus, $f(S)$ is the expected fraction of subtasks that are finished after performing the sequence of actions $S$. Assuming $p_i(v, \tau)$ is a non-decreasing function of $\tau$ for all $i$ and $v$, it can be shown that any function $f$ of this form is monotone and submodular. PIPELINED SET COVER [11, 15] can be defined as the special case in which for each activity $v$ there is an associated time $\tau_v$, and $p_i(v, \tau) = 1$ if $\tau \ge \tau_v$ and $p_i(v, \tau) = 0$ otherwise. MIN-SUM SET COVER [7] is the special case in which, additionally, $\tau_v = 1$ or $\tau_v = \infty$ for all $v \in V$. The problem of constructing efficient sequences of trials [5] corresponds to the case in which we are given a matrix $q$, and $p_i(v, \tau) = q_{v,i}$ if $\tau \ge 1$ and $p_i(v, \tau) = 0$ otherwise.

The problem of maximizing $f(S_{\langle T \rangle})$ is a slight generalization of the problem of maximizing a monotone submodular set function subject to a knapsack constraint [14, 20] (which in turn generalizes BUDGETED MAXIMUM COVERAGE [12], which generalizes MAX k-COVERAGE [16]). The only difference between the two problems is that, in the latter problem, $f(S)$ may only depend on the set of actions in the sequence $S$, and not on the order in which the actions appear.
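The family above is easy to instantiate directly from the functions $p_i$. The sketch below (ours; the covering instance is an invented toy example) builds such a job and recovers PIPELINED SET COVER by making each $p_i(v, \cdot)$ a step function.

```python
# Minimal sketch of the family of jobs from Section 1.1 (toy numbers, ours).
# p is a list of functions: p[i](v, tau) = probability that spending time tau
# on activity v completes subtask i.

def make_job(p):
    """Return f(S) = (1/n) * sum_i [ 1 - prod_{(v,tau) in S} (1 - p_i(v, tau)) ]."""
    n = len(p)
    def f(schedule):
        total = 0.0
        for p_i in p:
            unfinished = 1.0
            for v, tau in schedule:
                unfinished *= 1.0 - p_i(v, tau)
            total += 1.0 - unfinished
        return total / n
    return f

def pipelined_set_cover_job(cover, times):
    """PIPELINED SET COVER: p_i(v, tau) = 1 iff tau >= tau_v and v covers element i."""
    n = max(max(elems) for elems in cover.values()) + 1
    p = [
        (lambda v, tau, i=i: 1.0 if tau >= times[v] and i in cover[v] else 0.0)
        for i in range(n)
    ]
    return make_job(p)

# Toy instance: activity "a" covers elements {0, 1} in time 2; "b" covers {1, 2} in time 1.
f = pipelined_set_cover_job({"a": {0, 1}, "b": {1, 2}}, {"a": 2.0, "b": 1.0})
print(f([("b", 1.0)]))                 # 2/3 of the subtasks are finished
print(f([("b", 1.0), ("a", 2.0)]))     # 1.0
```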
1.2 Applications

We now discuss three applications, the first of which is the focus of our experiments in §5.

1. Online algorithm portfolio design. An algorithm portfolio [9] is a schedule for interleaving the execution of multiple (randomized) algorithms and periodically restarting them with a fresh random seed. Previous work has shown that combining multiple heuristics for NP-hard problems into a portfolio can dramatically reduce average-case running time [8, 9, 19]. In particular, algorithms based on chronological backtracking often exhibit heavy-tailed run length distributions, and periodically restarting them with a fresh random seed can reduce the mean running time by orders of magnitude [8]. As illustrated in Example 1, our algorithms can be used to learn an effective algorithm portfolio online, in the course of solving a sequence of problem instances.

2. Database query processing. In database query processing, one must extract all the records in a database that satisfy every predicate in a list of one or more predicates (the conjunction of predicates comprises the query). To process the query, each record is evaluated against the predicates one at a time until the record either fails to satisfy some predicate (in which case it does not match the query) or all predicates have been examined. The order in which the predicates are examined affects the time required to process the query. Munagala et al. [15] introduced and studied a problem called PIPELINED SET COVER (discussed in §1.1), which entails finding an evaluation order for the predicates that minimizes the average time required to process a record. Our work addresses the online version of this problem, which arises naturally in practice.

3. Sensor placement. Sensor placement is the task of assigning locations to a set of sensors so as to maximize the value of the information obtained (e.g., to maximize the number of intrusions that are detected by the sensors). Many sensor placement problems can be optimally solved by maximizing a monotone submodular set function subject to a knapsack constraint [13], a special case of our benefit-maximization problem (see §1.1). Our online algorithms could be used to select sensor placements when the same set of sensors is repeatedly deployed in an unknown or adversarial environment.

1.3 Summary of results

We first consider the offline variant of our problem. As an immediate consequence of existing results [6, 7], we find that, for any $\epsilon > 0$, (i) achieving an approximation ratio of $4 - \epsilon$ for the cost-minimization problem is NP-hard and (ii) achieving an approximation ratio of $1 - \frac{1}{e} + \epsilon$ for the benefit-maximization problem is NP-hard. We then present a greedy approximation algorithm that simultaneously achieves the optimal approximation ratios (of 4 and $1 - \frac{1}{e}$) for these two problems, building on and generalizing previous work on special cases of these two problems [7, 20]. In the online setting we provide an online algorithm whose worst-case performance guarantees approach those of the offline greedy approximation algorithm asymptotically (as the number of jobs approaches infinity). We then show how to modify our online algorithm for use in several different "bandit" feedback settings. Finally, we prove information-theoretic lower bounds on regret. We conclude with an experimental evaluation.

2 Related Work

As discussed in §1.1, the offline cost-minimization problem considered here generalizes MIN-SUM SET COVER [7], PIPELINED SET COVER [11, 15], and the problem of constructing efficient sequences of trials [5]. Several of these problems have been considered in the online setting. Munagala et al. [15] gave an online algorithm for PIPELINED SET COVER that is asymptotically $O(\log |V|)$-competitive.
Babu et al. [3] and Kaplan et al. [11] gave online algorithms for PIPELINED SET COVER that are asymptotically 4-competitive, but only in the special case where the jobs are drawn independently at random from a fixed probability distribution (whereas our online algorithm is asymptotically 4-competitive on an arbitrary sequence of jobs). Our offline benefit-maximization problem generalizes the problem of maximizing a monotone submodular set function subject to a knapsack constraint. Previous work gave offline greedy approximation algorithms for this problem [14, 20], which generalized earlier algorithms for BUDGETED MAXIMUM COVERAGE [12] and MAX k-COVERAGE [16]. To our knowledge, none of these problems have previously been studied in an online setting. Note that our problem is quite different from online set covering problems (e.g., [1]) that require one to construct a single collection of sets that covers each element in a sequence of elements that arrive online.

In this paper we convert a specific greedy approximation algorithm into an online algorithm. Recently, Kakade et al. [10] gave a generic procedure for converting an $\alpha$-approximation algorithm into an online algorithm that is asymptotically $\alpha$-competitive. Their algorithm applies to linear optimization problems, but not to the non-linear problems we consider here. Independently of us, Radlinski et al. [17] developed a no-regret algorithm for the online version of MAX k-COVERAGE, and applied it to online ranking. As it turns out, their algorithm is a special case of the algorithm OGunit that we present in §4.1.

3 Offline Greedy Algorithm

In the offline setting, we are given as input a job $f : \mathcal{S} \to [0, 1]$. Our goal is to compute a schedule $S$ that achieves one of two objectives: either minimizing the cost $c(f, S)$ or maximizing $f(S)$ subject to the constraint $\ell(S) \le T$.² As already mentioned, this offline problem generalizes MIN-SUM SET COVER under the former objective and generalizes MAX k-COVERAGE under the latter objective, which implies the following computational complexity result [6, 7].

Theorem 1. For any $\epsilon > 0$, achieving a $4 - \epsilon$ (resp. $1 - \frac{1}{e} + \epsilon$) approximation ratio for the cost-minimization (resp. benefit-maximization) problem is NP-hard.

We now consider an arbitrary schedule $G$, whose $j$th action is $g_j = (v_j, \tau_j)$. Let $s_j = \frac{f_{g_j}(G_j)}{\tau_j}$, where $G_j = \langle g_1, g_2, \ldots, g_{j-1} \rangle$, and let $\epsilon_j = \max_{(v, \tau) \in V \times \mathbb{R}_{>0}} \left\{ \frac{f_{(v, \tau)}(G_j)}{\tau} \right\} - s_j$. We will prove bounds on the performance of $G$ in terms of the $\epsilon_j$ values. (Note that we can ensure $\epsilon_j = 0$ for all $j$ by greedily choosing $g_j = \arg\max_{(v, \tau) \in V \times \mathbb{R}_{>0}} \left\{ \frac{f_{(v, \tau)}(G_j)}{\tau} \right\}$, i.e., greedily appending actions to the schedule so as to maximize the resulting increase in $f$ per unit time.) A key property is stated in the following lemma, which follows from the submodularity assumption (for the proof, see [18]).

Lemma 1. For any schedule $S$, any positive integer $j$, and any $t > 0$, $f(S_{\langle t \rangle}) \le f(G_j) + t \cdot (s_j + \epsilon_j)$.

Using Lemma 1, together with a geometric proof technique developed in [7], we now show that the greedy algorithm achieves the optimal approximation ratio for the cost-minimization problem.

Theorem 2. Let $S^* = \arg\min_{S \in \mathcal{S}} c(f, S)$. If $\epsilon_j = 0$ for all $j$, then $c(f, G) \le 4 \cdot c(f, S^*)$. More generally, let $L$ be a positive integer, and let $T = \sum_{j=1}^{L} \tau_j$. For any schedule $S$, define $c_T(f, S) \equiv \int_{t=0}^{T} \left( 1 - f(S_{\langle t \rangle}) \right) dt$. Then $c_T(f, G) \le 4 \cdot c(f, S^*) + \sum_{j=1}^{L} E_j \epsilon_j$, where $E_j = \sum_{l=1}^{j} \tau_l$.

An analogous guarantee holds for the benefit-maximization objective.

Theorem 3. Let $L$ be a positive integer, and let $T = \sum_{j=1}^{L} \tau_j$. Then $f(G_{L+1}) \ge \left( 1 - \frac{1}{e} \right) \max_{S \in \mathcal{S}} f(S_{\langle T \rangle}) - \sum_{j=1}^{L} \tau_j \epsilon_j$.

²Given a set of jobs $\{f^{(1)}, f^{(2)}, \ldots, f^{(n)}\}$, we can optimize the average schedule cost (or benefit) simply by applying our offline algorithm to the job $f = \frac{1}{n} \sum_{i=1}^{n} f^{(i)}$ (since any convex combination of jobs is a job).
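A minimal sketch (ours, not the authors' implementation) of the greedy rule described above, restricted to a finite pool of candidate actions: repeatedly append the action that maximizes the increase in $f$ per unit time. The job argument can be any monotone submodular function of schedules, e.g. one built with the make_job sketch in §1.1.

```python
def greedy_schedule(job, actions, budget):
    """Greedy schedule of Section 3 over a finite action pool: repeatedly append
    the action maximizing the marginal benefit per unit time (f_a(G_j) / tau),
    stopping once the total duration reaches 'budget'.  Note this simple version
    may overshoot the budget by one action."""
    G, used = [], 0.0
    while used < budget:
        base = job(G)
        best, best_rate = None, 0.0
        for v, tau in actions:
            rate = (job(G + [(v, tau)]) - base) / tau
            if rate > best_rate:
                best, best_rate = (v, tau), rate
        if best is None:          # no remaining action increases f
            break
        G.append(best)
        used += best[1]
    return G

# Example, reusing the toy PIPELINED SET COVER job from the Section 1.1 sketch:
# G = greedy_schedule(f, [("a", 2.0), ("b", 1.0)], budget=3.0)
```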
4 Online Greedy Algorithm

In the online setting we are fed, one at a time, a sequence $f^{(1)}, f^{(2)}, \ldots, f^{(n)}$ of jobs. Prior to receiving job $f^{(i)}$, we must specify a schedule $S^{(i)}$. We then receive complete access to the function $f^{(i)}$. We measure performance using two different notions of regret. For the cost-minimization objective, we define

$$R_{\mathrm{cost}} = \frac{1}{n} \sum_{i=1}^{n} c_T\!\left(f^{(i)}, S^{(i)}\right) - 4 \cdot \min_{S \in \mathcal{S}} \left\{ \frac{1}{n} \sum_{i=1}^{n} c\!\left(f^{(i)}, S\right) \right\}$$

for some fixed $T > 0$. Here for any job $f$ and schedule $S$, we define $c_T(f, S) = \int_{t=0}^{T} \left( 1 - f(S_{\langle t \rangle}) \right) dt$ to be the value of $c(f, S)$ when the integral is truncated at time $T$. Some form of truncation is necessary because $c(f^{(i)}, S^{(i)})$ could be infinite, and without bounding it we could not prove any finite bound on regret (our regret bounds will be stated as a function of $T$). For the benefit-maximization objective, we define

$$R_{\mathrm{benefit}} = \left( 1 - \frac{1}{e} \right) \max_{S \in \mathcal{S}} \left\{ \frac{1}{n} \sum_{i=1}^{n} f^{(i)}\!\left(S_{\langle T \rangle}\right) \right\} - \frac{1}{n} \sum_{i=1}^{n} f^{(i)}\!\left(S^{(i)}\right).$$

Here we require that for each $i$, $\mathbb{E}\!\left[\ell\!\left(S^{(i)}\right)\right] \le T$, where the expectation is over the online algorithm's random bits. That is, we allow the online algorithm to treat $T$ as a budget in expectation, rather than a hard budget.

Our goal is to bound the worst-case expected values of $R_{\mathrm{cost}}$ and $R_{\mathrm{benefit}}$. For simplicity, we consider the oblivious adversary model, in which the sequence of jobs is fixed in advance and does not change in response to the decisions made by our online algorithm. We confine our attention to schedules that consist of actions that come from some finite set $\mathcal{A}$, and assume that the actions in $\mathcal{A}$ have integer durations (i.e., $\mathcal{A} \subseteq V \times \mathbb{Z}_{>0}$).

4.1 Unit-cost actions

In the special case in which each action takes unit time (i.e., $\mathcal{A} \subseteq V \times \{1\}$), our online algorithm OGunit is very simple. OGunit runs $T$ action-selection algorithms, $\mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_T$, where $T$ is the number of time steps for which our schedule is defined. The intent is that each action-selection algorithm is a no-regret algorithm such as randomized weighted majority (WMR) [4], which selects actions so as to maximize payoffs associated with the actions. Just before job $f^{(i)}$ arrives, each action-selection algorithm $\mathcal{E}_t$ selects an action $a^i_t$. The schedule used by OGunit on job $f^{(i)}$ is $S^{(i)} = \langle a^i_1, a^i_2, \ldots, a^i_T \rangle$. The payoff that $\mathcal{E}_t$ associates with action $a$ is $f^{(i)}_a\!\left(\langle a^i_1, a^i_2, \ldots, a^i_{t-1} \rangle\right)$.

Theorem 4. Algorithm OGunit has $\mathbb{E}[R_{\mathrm{benefit}}] = O\!\left(\sqrt{\frac{T \ln |\mathcal{A}|}{n}}\right)$ and $\mathbb{E}[R_{\mathrm{cost}}] = O\!\left(T \sqrt{\frac{T \ln |\mathcal{A}|}{n}}\right)$ in the worst case, when WMR [4] is the subroutine action-selection algorithm.

Proof. We will view OGunit as producing an approximate version of the offline greedy schedule for the job $f = \frac{1}{n} \sum_{i=1}^{n} f^{(i)}$. First, view the sequence of actions selected by $\mathcal{E}_t$ as a single meta-action $\tilde{a}_t$, and extend the domain of each $f^{(i)}$ to include the meta-actions by defining $f^{(i)}(S \oplus \langle \tilde{a}_t \rangle) = f^{(i)}(S \oplus \langle a^i_t \rangle)$ for all $S \in \mathcal{S}$ (note each $f^{(i)}$ remains monotone and submodular). Thus, the online algorithm produces a single schedule $\tilde{S} = \langle \tilde{a}_1, \tilde{a}_2, \ldots, \tilde{a}_T \rangle$ for all $i$. Let $r_t$ be the regret experienced by action-selection algorithm $\mathcal{E}_t$. By construction, $r_t = \max_{a \in \mathcal{A}} f_a(\tilde{S}_{t-1}) - f_{\tilde{a}_t}(\tilde{S}_{t-1})$. Thus OGunit behaves exactly like the greedy schedule $G$ for the function $f$, with $\epsilon_t = r_t$. Thus, Theorem 3 implies that $R_{\mathrm{benefit}} \le \sum_{t=1}^{T} r_t \equiv R$. Similarly, Theorem 2 implies that $R_{\mathrm{cost}} \le T \cdot R$. To complete the analysis, it remains to bound $\mathbb{E}[R]$. WMR has worst-case expected regret $O(\sqrt{G_{\max} \ln |\mathcal{A}|})$, where $G_{\max}$ is the maximum sum of payoffs for any single action.³ Because each payoff is at most 1 and there are $n$ rounds, $G_{\max} \le n$; since each $r_t$ is measured with respect to the averaged job $f$, this yields the trivial bound $\mathbb{E}[R] = O\!\left(T \sqrt{\frac{\ln |\mathcal{A}|}{n}}\right)$. In fact, the worst case is when $G_{\max} = \frac{n}{T}$ for all $T$ action-selection algorithms, leading to an improved bound of $\mathbb{E}[R] = O\!\left(\sqrt{\frac{T \ln |\mathcal{A}|}{n}}\right)$ (for details see [18]), which completes the proof.

³This bound requires $G_{\max}$ to be known in advance; however, the same guarantee can be achieved by guessing a value of $G_{\max}$ and doubling the guess whenever it is proven wrong.
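A minimal sketch (ours, not the authors' code) of OGunit in the full-information feedback setting. A simple multiplicative-weights learner stands in for WMR; the fixed learning rate and the job interface (a Python function mapping a list of actions to [0, 1]) are illustrative assumptions.

```python
import math
import random

class MWAlgorithm:
    """Multiplicative-weights action selection (stands in for WMR [4]);
    the fixed learning rate is an illustrative simplification."""
    def __init__(self, actions, eta=0.1):
        self.actions = list(actions)
        self.weights = {a: 1.0 for a in self.actions}
        self.eta = eta

    def select(self):
        total = sum(self.weights.values())
        r = random.uniform(0.0, total)
        for a in self.actions:
            r -= self.weights[a]
            if r <= 0:
                return a
        return self.actions[-1]

    def update(self, payoffs):
        # payoffs: dict mapping each action to its payoff in [0, 1] for this round
        for a in self.actions:
            self.weights[a] *= math.exp(self.eta * payoffs[a])

def og_unit(jobs, activities, T):
    """OGunit: T action-selection algorithms, one per time slot; each action takes unit time."""
    experts = [MWAlgorithm([(v, 1) for v in activities]) for _ in range(T)]
    schedules = []
    for f in jobs:                       # f maps a schedule (list of actions) to [0, 1]
        S = [E.select() for E in experts]
        schedules.append(S)
        # Full-information feedback: the payoff of action a for expert t is f_a(S_{t-1}),
        # the marginal benefit of appending a to the first t-1 chosen actions.
        for t, E in enumerate(experts):
            prefix = S[:t]
            base = f(prefix)
            E.update({a: f(prefix + [a]) - base for a in E.actions})
    return schedules
```

In the long run each $\mathcal{E}_t$ concentrates on the action with the largest average marginal benefit at slot $t$, which is exactly the offline greedy choice for the averaged job.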
4.2 From unit-cost actions to arbitrary actions

In this section we generalize the online greedy algorithm presented in the previous section to accommodate actions with arbitrary durations. Like OGunit, our generalized algorithm OG makes use of a series of action-selection algorithms $\mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_L$ (for $L$ to be determined). On each round $i$, OG constructs a schedule $S^{(i)}$ as follows: for $t = 1, 2, \ldots, L$, it uses $\mathcal{E}_t$ to choose an action $a^i_t = (v, \tau) \in \mathcal{A}$, and appends this action to $S^{(i)}$ with probability $\frac{1}{\tau}$. Let $S^{(i)}_t$ denote the schedule that results from the first $t$ steps of this process (so $S^{(i)}_t$ contains between 0 and $t$ actions). The payoff that $\mathcal{E}_t$ associates with an action $a = (v, \tau)$ equals $\frac{1}{\tau} f^{(i)}_a\!\left(S^{(i)}_{t-1}\right)$ (i.e., the increase in $f$ per unit time that would have resulted from appending $a$ to the schedule-under-construction).

As in the previous section, we view each action-selection algorithm $\mathcal{E}_t$ as selecting a single meta-action $\tilde{a}_t$. We extend the domain of each $f^{(i)}$ to include the meta-actions by defining $f^{(i)}(S \oplus \langle \tilde{a}_t \rangle) = f^{(i)}(S \oplus \langle a^i_t \rangle)$ if $a^i_t$ was appended to $S^{(i)}$, and $f^{(i)}(S \oplus \langle \tilde{a}_t \rangle) = f^{(i)}(S)$ otherwise. Thus, the online algorithm produces a single schedule $\tilde{S} = \langle \tilde{a}_1, \tilde{a}_2, \ldots, \tilde{a}_L \rangle$ for all $i$. Note that each $f^{(i)}$ remains monotone and submodular.

For the purposes of analysis, we will imagine that each meta-action $\tilde{a}_t$ always takes unit time (whereas in fact, $\tilde{a}_t$ takes unit time per job in expectation). We show later that this assumption does not invalidate any of our arguments. Let $f = \frac{1}{n} \sum_{i=1}^{n} f^{(i)}$, and let $\tilde{S}_t = \langle \tilde{a}_1, \tilde{a}_2, \ldots, \tilde{a}_t \rangle$. Thus $\tilde{S}$ can be viewed as a version of the greedy schedule from §3, with $\epsilon_t = \max_{(v, \tau) \in \mathcal{A}} \left\{ \frac{1}{\tau} f_{(v, \tau)}(\tilde{S}_{t-1}) \right\} - f_{\tilde{a}_t}(\tilde{S}_{t-1})$, where we are using the assumption that $\tilde{a}_t$ takes unit time. Let $r_t$ be the regret experienced by $\mathcal{E}_t$. Although $r_t \ne \epsilon_t$ in general, the two quantities are equal in expectation (proof omitted).

Lemma 2. $\mathbb{E}[\epsilon_t] = \mathbb{E}[r_t]$.

We now prove a bound on $\mathbb{E}[R_{\mathrm{benefit}}]$. Because each $f^{(i)}$ is monotone and submodular, $f$ is monotone and submodular as well, so the greedy schedule's approximation guarantees apply to $f$. In particular, by Theorem 3, we have $R_{\mathrm{benefit}} \le \sum_{t=1}^{L} \epsilon_t$. Thus by Lemma 2, $\mathbb{E}[R_{\mathrm{benefit}}] \le \mathbb{E}[R]$, where $R = \sum_{t=1}^{L} r_t$.

To bound $\mathbb{E}[R_{\mathrm{benefit}}]$, it remains to justify the assumption that each meta-action $\tilde{a}_t$ always takes unit time. First, note that the value of the objective function $f(\tilde{S})$ is independent of how long each meta-action $\tilde{a}_t$ takes. Thus, the only potential danger is that in making this assumption we have overlooked a constraint violation of the form $\mathbb{E}\!\left[\ell\!\left(S^{(i)}\right)\right] > T$. But by construction, $\mathbb{E}\!\left[\ell\!\left(S^{(i)}\right)\right] = L$ for each $i$, regardless of what actions are chosen by each action-selection algorithm. Thus if we set $L = T$ there is no constraint violation. Combining the bound on $\mathbb{E}[R]$ stated in the proof of Theorem 4 with the fact that $\mathbb{E}[R_{\mathrm{benefit}}] \le \mathbb{E}[R]$ yields the following theorem.

Theorem 5. Algorithm OG, run with input $L = T$, has $\mathbb{E}[R_{\mathrm{benefit}}] \le \mathbb{E}[R]$. If WMR [4] is used as the subroutine action-selection algorithm, then $\mathbb{E}[R] = O\!\left(\sqrt{\frac{T \ln |\mathcal{A}|}{n}}\right)$.
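A minimal sketch (ours) of the schedule-construction step of OG described above, reusing the MWAlgorithm class from the OGunit sketch. The payoff definition follows the text: the per-unit-time marginal benefit of an action given the schedule built so far.

```python
import random

def og_round(experts, f):
    """One round of OG: build S^(i) for job f using experts E_1, ..., E_L.
    Each expert proposes an action a = (v, tau), which is appended to the
    schedule with probability 1/tau.  Assumes full-information feedback."""
    S = []
    prefixes = []                     # prefixes[t] = S^(i)_{t-1} seen by expert t
    for E in experts:
        a = E.select()                # a = (v, tau)
        prefixes.append(list(S))
        if random.random() < 1.0 / a[1]:
            S.append(a)
    # Payoff for expert t and action a: (1/tau) * (f(S_{t-1} + [a]) - f(S_{t-1})),
    # the increase in f per unit time.  Durations are integers >= 1, so payoffs
    # stay in [0, 1].
    for E, prefix in zip(experts, prefixes):
        base = f(prefix)
        E.update({a: (f(prefix + [a]) - base) / a[1] for a in E.actions})
    return S

# Sketch of the full loop (L = T for the benefit objective, per Theorem 5):
# experts = [MWAlgorithm(actions) for _ in range(T)]
# schedules = [og_round(experts, f_i) for f_i in jobs]
```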
The argument bounding $\mathbb{E}[R_{\mathrm{cost}}]$ is similar, although somewhat more involved (for details, see [18]). One additional complication is that $\ell(S^{(i)})$ is now a random variable, whereas in the definition of $R_{\mathrm{cost}}$ the cost of a schedule is always calculated up to time $T$. This can be addressed by making the probability that $\ell(S^{(i)}) < T$ sufficiently small, which can be done by setting $L \gg T$ and applying concentration of measure inequalities. However, $\mathbb{E}[R]$ grows as a function of $L$, so we do not want to make $L$ too large. The (approximately) best bound is obtained by setting $L = T \ln n$.

Theorem 6. Algorithm OG, run with input $L = T \ln n$, has $\mathbb{E}[R_{\mathrm{cost}}] = O\!\left(T \ln n \cdot \mathbb{E}[R] + \frac{T}{n}\right)$. In particular, $\mathbb{E}[R_{\mathrm{cost}}] = O\!\left((\ln n)^{\frac{3}{2}}\, T \sqrt{\frac{T \ln |\mathcal{A}|}{n}}\right)$ if WMR [4] is used as the subroutine action-selection algorithm.

4.3 Dealing with limited feedback

Thus far we have assumed that, after specifying a schedule $S^{(i)}$, the online algorithm receives complete access to the job $f^{(i)}$. We now consider three more limited feedback settings that may arise in practice. In the priced feedback model, to receive access to $f^{(i)}$ we must pay a price $C$, which is added to our regret. In the partially transparent feedback model, we only observe $f^{(i)}\!\left(S^{(i)}_{\langle t \rangle}\right)$ for each $t > 0$. In the opaque feedback model, we only observe $f^{(i)}\!\left(S^{(i)}\right)$.

The priced and partially transparent feedback models arise naturally in the case where action $(v, \tau)$ represents running a deterministic algorithm $v$ for $\tau$ time units, and $f(S) = 1$ if some action in $S$ yields a solution to some particular problem instance, and $f(S) = 0$ otherwise. If we execute a schedule $S$ and halt as soon as some action yields a solution, we obtain exactly the information that is revealed in the partially transparent model. Alternatively, running each algorithm $v$ until it returns a solution would completely reveal the function $f^{(i)}$, but incurs a computational cost, as reflected in the priced feedback model.

Algorithm OG can be adapted to work in each of these three feedback settings; see [18] for the specific bounds. In all cases, the high-level idea is to replace the unknown quantities used by OG with (unbiased) estimates of those quantities. This technique has been used in a number of online algorithms (e.g., see [2]).
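As a generic illustration of this idea (ours, not the paper's exact construction; the estimator below is the standard importance-weighting trick from the bandit literature, e.g. [2]), an unbiased estimate of a full payoff vector can be recovered from a single observed payoff by dividing by the probability of the action that was played:

```python
import random

def estimated_payoffs(probs, observe_payoff):
    """Sample one action from the distribution 'probs', observe only its payoff,
    and return a payoff vector whose expectation equals the true payoff vector.
    A generic sketch of the unbiased-estimate technique, not the paper's exact
    estimator."""
    actions = list(probs)
    a = random.choices(actions, weights=[probs[x] for x in actions])[0]
    g = observe_payoff(a)                       # the only feedback available
    return {x: (g / probs[a] if x == a else 0.0) for x in actions}
```

Feeding such estimates to the action-selection subroutines preserves expected payoffs at the price of higher variance.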
4.4 Lower bounds on regret

We now state lower bounds on regret; for the proofs see the full paper [18]. Our proofs have the same high-level structure as that of the lower bound given in [4], in that we define a distribution over jobs that allows any online algorithm's expected performance to be easily bounded, and then prove a bound on the expected performance of the best schedule in hindsight. The upper bounds in Theorem 4 match the lower bounds in Theorem 7 up to logarithmic factors, although the latter apply to standard regret as opposed to $R_{\mathrm{benefit}}$ and $R_{\mathrm{cost}}$ (which include factors of $1 - \frac{1}{e}$ and 4).

Theorem 7. Let $X = \sqrt{\frac{T \ln |V|}{n \ln T}}$. Then any online algorithm has worst-case expected regret $\Omega(X)$ (resp. $\Omega(T X)$) for the online benefit-maximization (resp. cost-minimization) problem.

5 Experimental Evaluation on SAT 2007 Competition Data

The annual SAT solver competition (www.satcompetition.org) is designed to encourage the development of efficient Boolean satisfiability solvers, which are used as subroutines in state-of-the-art model checkers, theorem provers, and planners. The competition consists of running each submitted solver on a number of benchmark instances, with a per-instance time limit. Solvers are ranked according to the instances they solve within each of three instance categories: industrial, random, and hand-crafted.

We evaluated the online algorithm OG by using it to combine solvers from the 2007 SAT solver competition. To do so, we used data available on the competition web site to construct a matrix $X$, where $X_{i,j}$ is the time that the $j$th solver required on the $i$th benchmark instance. We used this data to determine whether or not a given schedule would solve an instance within the time limit $T$ (schedule $S$ solves instance $i$ if and only if, for some $j$, $S_{\langle T \rangle}$ contains an action $(h_j, \tau)$ with $\tau \ge X_{i,j}$, where $h_j$ denotes running the $j$th solver). As illustrated in Example 1, the task of maximizing the number of instances solved within the time limit, in an online setting in which a sequence of instances must be solved one at a time, is an instance of our online problem (under the benefit-maximization objective).

Within each instance category, we compared OG to the offline greedy schedule, to the individual solver that solved the most instances within the time limit, and to a schedule that ran each solver in parallel at equal strength. For these experiments, we ran OG in the full-information feedback model, after finding that the number of benchmark instances was too small for OG to be effective in the limited feedback models. Table 1 summarizes the results. In each category, the offline greedy schedule and the online greedy algorithm outperform all solvers entered in the competition as well as the naïve parallel schedule.

Table 1: Number of benchmark instances solved within the time limit.

  Category             Industrial   Random   Hand-crafted
  Offline greedy          147        350        114
  Online greedy           149        347        107
  Parallel schedule       132        302         95
  Top solver              139        257         98

References

[1] Noga Alon, Baruch Awerbuch, and Yossi Azar. The online set cover problem. In Proceedings of the 35th STOC, pages 100–105, 2003.

[2] Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48–77, 2002.

[3] Shivnath Babu, Rajeev Motwani, Kamesh Munagala, Itaru Nishizawa, and Jennifer Widom. Adaptive ordering of pipelined stream filters. In Proc. Intl. Conf. on Management of Data, pages 407–418, 2004.

[4] Nicolò Cesa-Bianchi, Yoav Freund, David Haussler, David Helmbold, Robert Schapire, and Manfred Warmuth. How to use expert advice. Journal of the ACM, 44(3):427–485, 1997.

[5] Edith Cohen, Amos Fiat, and Haim Kaplan. Efficient sequences of trials. In Proceedings of the 14th SODA, pages 737–746, 2003.

[6] Uriel Feige. A threshold of ln n for approximating set cover. Journal of the ACM, 45(4):634–652, 1998.

[7] Uriel Feige, László Lovász, and Prasad Tetali. Approximating min sum set cover. Algorithmica, 40(4):219–234, 2004.

[8] Carla P. Gomes and Bart Selman. Algorithm portfolios. Artificial Intelligence, 126:43–62, 2001.

[9] Bernardo A. Huberman, Rajan M. Lukose, and Tad Hogg. An economics approach to hard computational problems. Science, 275:51–54, 1997.

[10] Sham Kakade, Adam Kalai, and Katrina Ligett. Playing games with approximation algorithms. In Proceedings of the 39th STOC, pages 546–555, 2007.

[11] Haim Kaplan, Eyal Kushilevitz, and Yishay Mansour. Learning with attribute costs. In Proceedings of the 37th STOC, pages 356–365, 2005.

[12] Samir Khuller, Anna Moss, and Joseph (Seffi) Naor. The budgeted maximum coverage problem.
Information Processing Letters, 70(1):39–45, 1999.

[13] Andreas Krause and Carlos Guestrin. Near-optimal nonmyopic value of information in graphical models. In Proceedings of the 21st UAI, pages 324–331, 2005.

[14] Andreas Krause and Carlos Guestrin. A note on the budgeted maximization of submodular functions. Technical Report CMU-CALD-05-103, Carnegie Mellon University, 2005.

[15] Kamesh Munagala, Shivnath Babu, Rajeev Motwani, and Jennifer Widom. The pipelined set cover problem. In Proc. Intl. Conf. on Database Theory, pages 83–98, 2005.

[16] G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions. Mathematical Programming, 14(1):265–294, 1978.

[17] Filip Radlinski, Robert Kleinberg, and Thorsten Joachims. Learning diverse rankings with multi-armed bandits. In Proceedings of the 25th ICML, pages 784–791, 2008.

[18] Matthew Streeter and Daniel Golovin. An online algorithm for maximizing submodular functions. Technical Report CMU-CS-07-171, Carnegie Mellon University, 2007.

[19] Matthew Streeter, Daniel Golovin, and Stephen F. Smith. Combining multiple heuristics online. In Proceedings of the 22nd AAAI, pages 1197–1203, 2007.

[20] Maxim Sviridenko. A note on maximizing a submodular set function subject to a knapsack constraint. Operations Research Letters, 32:41–43, 2004.