Online Feature Elicitation in Interactive Optimization

Craig Boutilier Kevin Regan Paolo Viappiani Dept. of Computer Science, University of Toronto, Toronto, ON, CANADA

cebly@cs.toronto.edu kmregan@cs.toronto.edu paolo@cs.toronto.edu

Abstract
Most models of utility elicitation in decision support and interactive optimization assume a predefined set of "catalog" features over which user preferences are expressed. However, users may differ in the features over which they are most comfortable expressing their preferences. In this work we consider the problem of feature elicitation: a user's utility function is expressed using features whose definitions (in terms of "catalog" features) are unknown. We cast this as a problem of concept learning, but whose goal is to identify only enough about the concept to enable a good decision to be recommended. We describe computational procedures for identifying optimal alternatives w.r.t. minimax regret in the presence of concept uncertainty; and describe several heuristic query strategies that focus on reduction of relevant concept uncertainty.

that user preferences are specified over a predefined set of attributes. For instance, in consumer product configuration--say, choice of an automobile-- preferences are assumed to be defined in terms of product features and specifications (e.g., color, engine size, fuel economy, available options, etc.). We refer to such "universal" attributes in a specific domain as the catalog features.1 However, just as preferences can vary significantly from user to user, so too can the features over which their preferences are most naturally expressed. For example, some users may be concerned about the "degree of safety" of a car, but different users may have different notions of safety in mind, none of which are catalog features. Furthermore, the user-specific subjectivity of safety prevents one from adding a new feature to the catalog. In this paper, we develop a framework for eliciting subjective features in interactive optimization. We assume that subjective features can be defined in terms of the set of catalog features and that preferences are defined over both catalog and subjective features. As such, learning subjective feature definitions can be cast as a concept learning problem (Angluin, 1987; Haussler, 1988). However, unlike traditional concept learning, our aim is not to learn the concept definition per se; rather we want to learn just enough about it to make a good or optimal decision on the user's behalf. To illustrate, suppose we have positive and negative examples of safe cars, which constrain but do not fully specify (subjective) safety: as a result, only certain cars are consistent with possible realizations of the definition of safety. Other user preference information (e.g., utility tradeoffs between safety, performance and cost) may render any "safety consistent" car so undesirable that we will not recommend a safe car no matter how the concept definition is realized. Further elicitation (i.e., refinement of the version space) is therefore useless. Our focus in this paper is on feature elicitation. We
1 This is meant merely to be evocative of a product catalog; we do not assume that an explicit catalog exists.

1. Introduction
Many decision support settings are designed to help users effectively explore a space of possible alternatives (products, system configurations, plans, etc.) to find one that is optimal (or at least acceptable) with respect to the user's preferences. The ability to tailor recommendations to the needs and desires of particular users requires the incorporation of preference or utility elicitation into the navigation process. Considerable work in AI, decision analysis and operations research has been devoted to effective means of eliciting preferences; much of this work can be viewed as a form of interactive optimization (Boutilier et al., 2006; Salo & H皂甸甬nen, 2001). a aa Typical frameworks for preference elicitation assume
Appearing in Proceedings of the 26 th International Conference on Machine Learning, Montreal, Canada, 2009. Copyright 2009 by the author(s)/owner(s).

Online Feature Elicitation in Interactive Optimization

develop a framework in which other aspects of a user's utility function are known and only the feature definition is unspecified. We use minimax regret (Boutilier et al., 2006; Savage, 1954) as our decision criterion given concept uncertainty, allowing good or optimal decisions to be made without complete feature specification. We describe an integer program formulation for computation of minimax regret in the case of conjunctive concepts. We also present several heuristic techniques for querying concept definitions that reduce minimax regret quickly. In contrast to standard concept learning models, we reduce "relevant" concept uncertainty w.r.t. the utility model, rather than learn an accurate concept definition for its own sake. While simultaneously querying users about both utility and feature definitions is important, in this work we focus on feature elicitation itself (however, we briefly discuss an extension of our model to the simultaneous case).

in some space W . We use the minimax-regret decision criterion for making decisions in the face of such utility function uncertainty (Boutilier et al., 2006; Salo & H皂甸甬nen, 2001; Savage, 1954). Following Boutilier a aa et al. (2006) we have: Definition 1 Given utility space W , define the max regret of x  X, the minimax regret of W and the minimax optimal configuration as follows: MR(x; W ) = max max u(x ; w) - u(x; w)
wW x X

MMR(W ) = min MR(x, W )
xX

x = arg min MR(x, W ) W
xX

2. Regret-based Optimization and Elicitation
We assume a decision support task in which our system is charged with recommending an outcome to a user from some feasible set; for concreteness, we will will use terminology associated with consumer product configuration, but our model applies to any multiattribute decision domain with constraints on feasible outcomes. Products are characterized by a set of attributes X = {X1 , ...Xn }, each with finite domains Dom(Xi ). Let X  Dom(X ) denote the set of feasible configurations. For instance, attributes may correspond to the catalog features of various cars (engine size, fuel economy, cargo capacity, etc.) with X defined by constraints on attribute combinations, or an explicit database of feasible configurations. The user has a utility function u : Dom(X )  R, typically parameterized compactly (e.g, using linear, additive or generalized additive form (Braziunas & Boutilier, 2007; Fishburn, 1967)). The precise form of u is not critical, only that u(x; w) is linear in the parameters (or weights) w. Our system does not have direct access to the user's utility parameters w; but elicitation can be used to refine its knowledge of w. However, decisions will still generally be made without full knowledge of w for two key reasons (Boutilier et al., 2006). First, good or optimal decisions can often be made with little utility information. Second, the value of obtaining certain utility information--in terms of its impact on decision quality--is often not worth the cost of obtaining it. Assume a decision must be made, but the system only knows that w  W , i.e., the user's utility function lies

Intuitively, MR(x, W ) is the worst-case loss associated with recommending configuration x; i.e., by assuming an adversary will choose the user's utility function w from W to maximize the difference in utility between the optimal configuration (under w) and x. The minimax optimal configuration x minimizes this potenW tial loss. MR(x, W ) bounds the loss associated with x, and is zero iff x is optimal for all w  W . Minimax regret has been applied successfully to robust optimization given utility uncertainty in additive and generalized additive models (Boutilier et al., 2006; Braziunas & Boutilier, 2007) for decision problems involving large-scale mixed integer programs (MIPs). While regret-optimization requires the solution of a minimax problem with a quadratic objective, the application of Benders' decomposition, constraint generation, and various reformulations renders the problem feasible, converting it to a (linear) MIP (Boutilier et al., 2006). We will adapt these techniques below. Minimax regret has also proven to be effective as a means of utility elicitation (Boutilier et al., 2006; Braziunas & Boutilier, 2007). One especially effective heuristic strategy is the current solution strategy, where preference queries are asked that involve the current minimax-optimal configuration x and/or W the adversarial configuration (or witness). Unlike volumetric-based approaches to elicitation, regretbased elicitation reduces utility uncertainty only in the relevant regions of utility space, exploiting knowledge of which configurations are actually feasible. In contrast to Bayesian elicitation, regret models require no prior over utility functions, nor expensive, approximate probabilistic inference.

3. Subjective Feature Elicitation
In many cases, some of the attributes over which a user forms her preferences will not coincide with system

Online Feature Elicitation in Interactive Optimization

catalog features. We consider subjective features that are "objectively" definable using catalog attributes, but where the definition varies from user to user.2 For instance, the notion of a safe car may be different for a parent with small children, a young, single professional interested in high-performance vehicles, and a family that takes frequent trips to the mountains. The concept in question, namely, safety, has personalized definitions. The user also has preferences for safety (as she does for other car attributes) represented in the form of a utility function: it is both the user's utility function over this extended attribute space, as well as her personal definition of safety, that determines the optimal vehicle. As such, the decision support system must engage in both preference elicitation and feature elicitation to make a suitable recommendation. This leads to interesting tradeoffs in elicitation. One could engage in feature elicitation using well-known concept learning techniques (Angluin, 1987; Hellerstein et al., 1996), and then move to preference elicitation (e.g., using techniques mentioned above); but this could be wasteful. For instance, suppose we learn that safety requires attribute Xi to be true (e.g., have side airbags) but know nothing else about the concept. If we engaged in preference elicitation simultaneously and ascertained that no cars in the user's price range satisfy Xi --or that other more important features must be sacrificed to attain Xi --then the full concept definition is not needed for optimal allocation. For similar reasons, full preference elicitation followed by feature elicitation may be undesirable. This suggests that interleaved feature and utility elicitation can be much more effective. In this paper, we focus on the problem of feature elicitation in isolation, setting aside utility uncertainty. Apart from forming the foundation for work on joint utility and feature elicitation, this problem is interesting in its own right, as an extension of concept/query learning. In the remainder of this section, we describe a regret-based framework for pure feature elicitation, assuming a known utility function. We briefly discuss a more general framework that incorporates both utility and feature uncertainty in Sec. 5. 3.1. Feature Elicitation Model We have features X = {X1 , ...Xn }, which we take to be Boolean for ease of exposition, and a feasible set X  Dom(X ). A known reward r(x) for each x  X reflects user utility for x w.r.t. known product features.
2 Other subjective features may not be so definable (e.g., visual features); for this, data-intensive collaborative filtering techniques are more appropriate.

The user also has preference for configurations satisfying some target concept c. Concept c is an unknown Boolean function over X : c(x) = c(x1 , . . . , xn ).3 We suppose that c is drawn from a particular function class/hypothesis space H. We treat identification of c as a problem of concept learning (Angluin, 1987; Haussler, 1988; Hellerstein et al., 1996), with some query set Q used to refine the target concept. For instance, membership queries would be quite natural ("do you consider the following car to be safe?").4 Finally, a known bonus p is associated with any x s.t. c(x) holds, representing user utility for concept satisfaction. For given a target concept c and x  X, define the utility of x under concept c to be: u(x; c) = r(x) + pc(x) (We treat c(x) as an indicator function for concept satisfaction.) In other words, the utility of x is its reward, plus the bonus p if x satisfies c. The optimal configuration is x = arg max u(x; c). 3.2. Minimax Regret over Concepts If we do not know the target concept, we cannot generally identify x ; but we can still make a decision with partial concept information. Let version space V  H represent our current set of consistent hypotheses (Mitchell, 1977). We define minimax regret w.r.t. feature uncertainty: Definition 2 Given version space V , the max regret of x  X, the minimax regret of V and the minimax optimal configuration are: MR(x; V ) = max max u(x ; c) - u(x; c)
cV x X

(1) (2) (3)

MMR(V ) = min MR(x, V )
xX xX

x = arg min MR(x, V ) V

This provides a simple model of subjective feature uncertainty that abstracts away utility uncertainty. An adversary can cause us to regret recommendation x through suitable choice of concept c  V . If MMR(V ) = 0, then x is optimal for any realization V of the concept c  V . We ask queries from query class Q to refine knowledge of c to a point where minimax regret MMR(V ) is reduced to some acceptable tolerance  (possibly zero). Our goal is very different from that of classical concept learning. We aim not to learn the concept c itself, but
Allowing multivalued concepts is straightforward. Other query types (e.g., equivalence queries) are less natural in this domain, but may play a role in others.
4 3

Online Feature Elicitation in Interactive Optimization

simply learn enough about it to make a good decision. The reward model and feasibility constraints on X often allow us to make optimal recommendations with relatively weak concept knowledge. We can characterize the minimax optimal solution in terms of the structure of the version space. We use the standard general-specific version space lattice (Mitchell, 1977): concept c is at least as general as c (c  c ) iff c  c (we often treat concept c and its extension on X indistinguishably). For any concept c, let Xc = X  c be the set of feasible outcomes satisfying c. Define the best outcome satisfying c:
 x = arg max r(x); and rc = max r(x). c xXc xXc    Clearly rc  rc if c  c . The measure rc induces a natural ordering r on V that respects the generalspecific ordering:   Observation 1 If c  c then rc  rc .   We say c  V is reward optimal iff rc  rc , c  V . + Define X  X to be the set of configurations with highest reward (independent of V ) and r+ = r(X+ ). Finally, let r1 > r2 . . . > rm be an ordering of the  m (distinct) values in {rc : c  V }. Define Ci =  {c  V : rc = ri } and Si = Ci ; i.e., Si is the set of configurations in the intersection of all concepts in V with the ith-largest r value (Si may be empty).

Proposition 1 If x is not consistent with V , then V x  X + (and all x  X + have identical max regret). V Proposition 2 If x  X + , then: (a) x is consisV V tent with V ; (b) x  arg max{r(x) : x  S1 . . .Si } V for some i  1; and (c) either cw  C1 , or cw  Ci+1 . We also have: Observation 2 x  cw only if xw  cw . V
 Observation 3 If rc  r+ - p, then MMR(V ) = 0  w + and x = x = x . Thus if we have reduced V so that no V -consistent x  X has reward within p of r+ , then the (true) optimal choice does not satisfy the target concept.

3.3. Computing Regret: Conjunctive Concepts We assume that the underlying configuration problem is represented as a MIP maxxX r(x). When subjective feature uncertainty exists, minimax regret computation can often be directly incorporated into the MIP for particular concept and query classes. We illustrate this in the case of (nonmonotone) conjunctive concepts with membership queries. Assume target c is a conjunction of literals over variables Xj . Memberships queries ask whether x  c for a particular configuration. Let E + be the set of positive examples, E - the set of negative examples, and (nonempty) V the induced version space. Instead of representing V using most general and most specific concepts, we encode E + and E - directly in our MIP below (see, e.g., Hirsch (1992) who uses negative examples to represent most general concepts). Constraint Generation: We formulate the minimax problem Eq. 2 as a minimization with O(|V |) constraints. Let xc = arg maxxX u(x; c) and constant p(x, c) = p if c(x) and 0 otherwise. Let indicator variable I c , for each c  V , denote that configuration (X1 , � � � , Xn ) satisfies c; and write xj  c (xj  c) to denote that variable Xj occurs positively (negatively) in c. Then MMR(V ) is:
min  s.t.   r(xc )-r(X1 , � � � Xn )+p(xc , c)-pI c c  V I c  Xj c  V, xj  c I c  1 - Xj c  V, xj  c (4) (5) (6)

Let MMR(V ) be the current minimax regret level, x a minimax optimal solution, xw the adversarial V witness outcome, and cw the witness concept. Minimax regret can be viewed as a game between a player choosing x and an adversary choosing xw and cw . Clearly, if x is not consistent with V (i.e., x  c for all c  V ), then x  X + . So restrict attention to V consistent configurations. Suppose the player chooses a V -consistent x . If x  S1 , the adversary can choose a cw  C1 that excludes x and xw = xw , c thus maximizing regret by taking a highly rewarding configuration, obtaining the bonus, and preventing the player from getting the bonus. Regret will be strictly lower if the player chooses x  S1 itself: for the adversary to get the bonus and prevent the player from getting it, it must choose a concept outside C1 (hence a less rewarding xw ). The player can further limit the adversary by ensuring x  S2 (forcing the adversary to move to C3 if it wants to get the bonus while the player does not). Of course, this comes at a price: the player gets a lower reward by moving to an x  S1  S2 (if indeed this set is nonempty); this is traded off against the benefit of further restricting the adversary. This can be formalized to prove:

For any fixed concept c, the adversary maximizes the regret of (X1 , � � � , Xn ) with witness xc . The MIP above chooses a configuration that minimizes against the "worst-case" choice of the adversary (with (4) ensuring MMR is as great as regret given any c  V ; and (5, 6) encoding whether (X1 , � � � , Xn ) satisfies c). Regret constraints for most c  V will be inactive, so we use constraint generation to search through the

Online Feature Elicitation in Interactive Optimization

space of adversarial concepts.5 Let Gen  V ; we solve a relaxed MIP using only c  Gen. Let   and x be the solution to the relaxed MIP. We test for violated constraints by solving the max regret problem MR(x , V ), detailed below. If MR(x , V ) >   , concept c (produced as a witness in MR computation) offers larger regret for x than any c  Gen. So we add c to Gen and resolve. If MR(x , V ) =   , x is the optimal solution to MMR(V ). Generating Violated Constraints: We can compute the maximally violated constraint for MMR computation above by solving the max regret problem MR(x, V ) for the current relaxed solution x . This too can be formulated as a MIP. For each configuration variable Xj , let binary indicator variables I(xj ) (resp. I(xj )) denote that Xj is positive (resp. negative) in concept c. We also introduce binary variables B x and B w indicating that x and the witness allocation satisfy the chosen concept. Using x[j] to denote the jth literal of x, this MIP gives MR(x; V ):
max r(X1 , � � � , Xn ) - r(x) + pB w - pB x s.t. B w + I(xj )  Xj + 1.5 j  n B w + I(xj )  (1 - Xj ) + 1.5 j  n X X I(xj ) - Bx  1 -
j:x[j] positive j:x[j] negative

3.4. Query Strategies We now consider several heuristic query strategies that can quickly reduce MMR(V ). We focus on membership queries, since these are most natural in preference assessment. Query strategies from concept learning can be used directly to refine feature uncertainty. However, the motivation for many query strategies is very dependent on the learning model in question. For instance, the mistake-bound model (Angluin, 1987) provides strategies for certain hypothesis classes, such as conjunctions of literals, which offer a polynomial mistake bound. However, these minimize mistakes in prediction, not queries: even a mistake-free prediction requires user concept feedback. Our desire to provide worst-case guarantees on performance without distributional information over potential user-defined features prevents us from adopting a PAC-model (Haussler, 1988). The most natural connection to our model is with work on exact concept learning that attempts to minimize queries (Hellerstein et al., 1996). However, results for these models tend to be rather weak. For instance, conjunctive concepts cannot be learned without exponentially many membership queries in the worst case (Hellerstein et al., 1996). A halving strategy can be adapted to our utility-based, choice-oriented concept learning model: we ask random memberships queries until a positive example is found (in the mistake-bound model negative predictions would be used); then queries are asked by negating literals one by one in the (unique) most specific conjunctive hypothesis. Once a positive example is found, this converges to the true conjunctive concept using a number of queries linear in |X |. We need not identify the concept exactly however; we terminate once minimax regret reaches an acceptable level. We call this the halving strategy. Despite its exponential worst-case for exact concept learning, the requirement to simply reduce regret should give rise to better performance in our model. One problem with halving is its lack of regard for reward or configuration feasibility: it wastes effort learning about parts of a concept definition that are irrelevant to making a good "choice." So we consider an alternative heuristic based on using information in the current solution: either the minimax optimal x or V the witness xw (Boutilier et al., 2006). Intuitively, our best hope for (immediate) reduction in regret is to change the version space so that the status of at least one of x or xw changes w.r.t. V . We define the current solution strategy (CSS) by considering three distinct cases for suggesting membership queries. First consider the case where x , xw  cw : we know

(7) (8) I(xj ) (9) (10) (11) (12)

X
j

I(流[j]) = 0 I(流[j])  1

y  E + y  E -

X
j

(X1 , � � � , Xn )  X

Apart from the binary indicator variables B w , B x , I(xj ), I(xj ) described above, we have configuration variables Xj . (12) ensures that the configuration is feasible, and notation r(X1 , � � � , Xn ) loosely denotes the reward of any configuration (which is encoded using a linear parameterization of utility). (7) and (8) ensure that the adversary does not get the concept bonus p (i.e., cannot set B w = 1) unless (X1 , � � � , Xn ) satisfies the concept dictated by the I-variables. Similarly, (9) ensures that the input configuration x cannot be denied the bonus (i.e., the adversary cannot set B x = 0) unless x violates at least one conjunct in the chosen concept. Finally, (10, 11) restrict the conjunctive concept to be consistent with all positive and negative examples.6
5 V is exponential in |X | with conjunctive concepts; constraint generation is even more important in other hypothesis spaces, since they can have doubly exponential size. 6 The version space for conjunctions of literals can, of course, be encoded more succinctly, i.e., without constraints for each example.

Online Feature Elicitation in Interactive Optimization

cw  C1 must be reward-optimal and xw must be a V -consistent configuration with greatest reward. CSS asks a membership query xw (does xw satisfy the concept), which has great potential value: if the answer is positive, then we know xw is, in fact, optimal whatever the concept (and we reduce MMR(V ) to zero); and if the answer is negative, we rule out a reward-optimal concept cw from the version space, potentially reducing minimax regret, and strictly reducing max regret (ignoring ties in reward) for all x  cw . (Asking a membership query x has less impact in this case). Second suppose x  cw , xw  cw : we either have x  X+, in which case cw  C1 ; or x is V -consistent, in which case cw  Ci+1 (where x is chosen from S1  . . .  Si ). In this case, CSS asks a membership query x . A positive response reduces pairwise max regret between x and xw (and hence often reduces MMR(V )). A negative response does not reduce pairwise regret, but it rules out all concepts from the version space that include x and all concepts more general (if any) than cw , giving us additional flexibility in the choice of allocation without losing value. Finally, suppose x , xw  cw (recall, we cannot have x  cw , xw  cw ). If xw is not V -consistent, we know xw  X+ , so CSS asks a membership query x (the rationale is as in case 2 above). If xw is V -consistent, then xw  cw only if xw  c implies x  c, for all c  V (in which case, choosing a concept that satisfies xw has no impact on MMR(V )). In this case, CSS asks a membership query xw (as in case 1 above).7

40

35

CSS Halving Random

30

25 Max Regret

20

15

10

5

0 0

50

100 Number of Queries

150

200

Figure 1. Max regret vs. number of queries (20 constraints, p = 25%), 50 runs

. bility 0.25), negatively (0.25), or not at all (0.50). Minimax regret computation was reasonably fast (under 1 sec. using CPLEX 11 on a quad-core Intel machine). We first illustrate the performance of our three main strategies using some typical parameter settings: 20 binary constraints and a bonus p set at 25% of the maximum reward value. Fig. 1 shows reduction in MMR as a function of the number of membership queries by each strategy. The performance of CSS is considerably better than that of Halving. Key to the performance difference is the ability to exploit the current solution to identify which concepts have the greatest impact on performance given feasibility restrictions and reward values (in particular, r values). The relative density of feasible configuration space has a significant impact on the relative performance of Halving versus CSS. Intuitively, with a sparser solution space, fewer configurations satisfy a given concept  c (hence reducing rc ), and the value of considering most concepts is diminished. In such a case, CSS has an advantage over Halving. Conversely, with dense solution spaces, narrowing down the version space to get close to full concept identification becomes more important, making Halving potentially more attractive. Fig. 2 shows the number of membership queries needed to reduce minimax regret to 80% of its original level (prior to any queries) while varying the number of binary constraints. More constraints give a sparser solution space; and with 20 (or more) constraints, CSS has a significant advantage. Similarly, the relative magnitude of the feature bonus p to overall reward can have a dramatic impact on the number of required queries and the need to narrow down the feature definition more or less precisely. Intuitively, when p is relatively small, it has a rela-

4. Empirical Evaluation
We experimented with the two heuristic strategies, Halving and CSS, as well as Random, which asks random membership queries. Configuration problems with 30 Boolean variables were generated, each with random binary constraints to reflect the realistic assumption that the space of feasible products is relatively sparse (each constraint rules out (on average) half of the configurations). Rewards were generated using a linear utility function: each variable Xi was randomly assigned a reward ri  U [0, 10] and a parity (if positive, xi gets reward ri and xi reward 0; if negative, the opposite). Reward r(x) is the sum of the variable rewards. The bonus p for concept satisfaction was fixed at a certain percentage of the maximum reward. Conjunctive concepts were randomly drawn from a pool of ten variables, with each variable (independently) occurring in the concept positively (proba7 If CSS recommends x or xw when membership (or nonmembership) is logically certain given V , the other query is asked. (Both can't be certain unless MMR is 0.)

Online Feature Elicitation in Interactive Optimization
120

Queries until Max Regret Reduction of 80 %

CSS Halving
100

40

35

CSS without seed Halving without seed CSS with seed Halving with seed

80

30

60

25 Max Regret
0 5 10 15 20 Number of Binary Constraints 25

40

20

20

15

10
0

5

Figure 2. Sensitivity to number of binary constraints; bonus p = 25%, 50 runs.
70

0 0

20

40 60 Number of Queries

80

100

Queries until Max Regret Reduction of 80 %

CSS Halving
60

Figure 4. Max regret vs. number of queries with/without an initial positive seed (25 constraints, p = 25%), 50 runs.

50

40

30

20

10

over 35. Halving is able to reduce regret to 0 in just 10 queries (cf. no initial seed, where it needs 100 queries to get to regret 5). The CSS strategy, dominant in the setting without seeding, still compares favorably to Halving if a positive example is available.
0.05 0.1 0.25 Bonus as Percentage of Maximum Reward 0.5

0

5. Concluding Remarks
We have presented a model of subjective feature elicitation for use in preference elicitation. The use of minimax regret allows us to focus user queries on those aspects of a feature or concept that have the greatest impact on decision quality, thus reducing the effort required to learn the relevant parts of a concept. Related Work Our work has connections to work in concept learning and active learning. Our model differs from the mistake-bound model in our need to minimize queries (not prediction error), while the PACmodel emphasis on probabilistic correctness w.r.t. a fixed data distribution renders it inapplicable to our setting.8 Exact query learning (Hellerstein et al., 1996) bears the closest link to our model; but ours differs from all concept models in the aim to learn just enough about a concept to make a good decision, not the entire concept. However, query strategies from these models (like halving) can be adapted to the regret setting as we have seen. Connections between concept learning and preference elicitation have been explored by Blum et al. (2004), who deal with exact learning of a combinatorial valuation function. While
8 Another difference is in the emphasis on computational tractability: we generally deal with underlying optimization problems, like configuration, that are inherently NPhard. While the computation methods in this paper are very fast (all queries generated in well under a second), our primary concern is minimizing user burden.

Figure 3. Sensitivity to value of bonus p (20 random constraints), 50 runs.

tively small impact on utility and far fewer queries will be needed to reduce regret to zero (since even a loose feature definition will lead to little error). As the magnitude of the bonus increases, a greater need for more precise concept constraints will emerge. Fig. 3 shows this effect. We show the number of membership queries needed to reduce minimax regret to 80% of its original level (prior to any queries), varying the relative size of the bonus from 5% to 50% of r+ . While an average of roughly five queries suffice for the CSS strategy at p = 5%, about 25 queries are needed at p = 50%. By way of contrast, Halving needs significantly more queries (from nearly 30 to over 60). Finally, in certain elicitation settings, it is reasonable to assume that a user can provide a positive concept example (e.g., a "safe" car configuration). In the exact learning model, conjunctive concept identification is hampered by a potentially exponential number of membership queries with negative responses. Once a positive response is obtained, halving can be utilized. Fig. 4 shows a comparison of Halving in which an initial positive instance is assumed, with Halving (no initial seed) and CSS (with initial seed). We plot regret reduction as a function of the number of queries. The availability of a positive seed has a strong impact on regret, reducing the initial minimax regret to 15 from

Online Feature Elicitation in Interactive Optimization

their focus is on exact learning differs from ours, their valuation model should lend itself to approximation and regret analysis. Our work can also be viewed as a form of active learning (Cohn et al., 1994; Dasgupta et al., 2008; Freund et al., 1997). Indeed, the focus on regret reduction (and termination when regret reaches some ) is a non-Bayesian analog of the value of information criterion that underlies much work on active learning. We assume a realizable setting in this work, and our membership query selection technique falls within the framework of Cohn et al. (Cohn et al., 1994) (we only query points on which two c  V disagree). While most techniques from active learning rely on probabilistic data models (Dasgupta et al., 2008; Freund et al., 1997), there is certainly potential to adapt active query strategies to our setting (where the data is not drawn randomly, but the set X is implicitly available). The key will be to bias queries in a way that accounts for the contribution of the reward function to error (minimax regret).

References
Angluin, D. (1987). Queries and concept learning. Machine Learning, 2, 319�342. Blum, A., Jackson, J. C., Sandholm, T., & Zinkevich, M. (2004). Preference elicitation and query learning. Journal of Machine Learning Research, 5, 649�667. Boutilier, C., Patrascu, R., Poupart, P., & Schuurmans, D. (2006). Constraint-based optimization and utility elicitation using the minimax decision criterion. Artifical Intelligence, 170, 686�713. Braziunas, D., & Boutilier, C. (2007). Minimax regretbased elicitation of generalized additive utilities. Proc. 23rd Conf. on Uncertainty in AI (UAI-07) (pp. 25�32). Vancouver. Cohn, D., Atlas, L., & Ladner, R. (1994). Improving generalization with active learning. Machine Learning, 25, 201�221. Dasgupta, S., Hsu, D., & Monteleoni, C. (2008). A general agnostic active learning algorithm. Advances in Neural Information Processing Systems 20 (pp. 353�360). Cambridge, MA: MIT Press. Fishburn, P. C. (1967). Interdependence and additivity in multivariate, unidimensional expected utility theory. International Economic Review, 8, 335�342. Freund, Y., Seung, H. S., Shamir, E., & Tishby, N. (1997). Selective sampling using the query by committee algorithm. Machine Learning, 28, 133�168. Haussler, D. (1988). Quantifying inductive bias: Ai learning algorithms and valiant's learning framework. Artificial Intelligence, 36, 177�221. Hellerstein, L., Pillaipakkamnatt, K., Raghavan, V. V., & Wilkins, D. (1996). How many queries are needed to learn? Journal of the ACM, 43, 840�862. Hirsh, H. (1992). Polynomial-time learning with version spaces. Proc. 10th National Conf. on AI (AAAI-92) (pp. 117�122). San Jose. Mitchell, T. M. (1977). Version spaces: A candidate elimination approach to rule learning. Proc. 5th Intl. Joint Conf. on AI (IJCAI-77) (pp. 305�310). Cambridge. Salo, A., & H皂甸甬nen, R. P. (2001). Prefera aa ence ratios in multiattribute evaluation (PRIME)� elicitation and decision procedures under incomplete information. IEEE Trans. on Sys., Man and Cyber., 31, 533�545. Savage, L. J. (1954). The foundations of statistics. New York: Wiley.

Joint Preference and Feature Elicitation Space precludes a full discussion, but the MIP approach above can be generalized to account for simultaneous uncertainty in both feature definition and utility (i.e., reward function and bonus p). We assume general linear constraints relating the utility of two outcomes (e.g., a user comparison of two products will tell us that the utility of one is greater than the other). A key complication, compared to standard elicitation models, is that we do not simply get linear constraints on utility parameters w. We must encode a set of "conditional constraints" that relate the difference in reward between the two products to whether either or both satisfy the unknown concept. These can be linearized and encoded in single MIP. We defer a full investigation to future work.

Future Directions A number of important directions for future research remain. Among these are exploring whether existing active learning models and query strategies can be adapted to our choice-based, regret model. We are currently investigating new query types and practical models for richer hypothesis spaces. Generalizing the form of catalog and subjective features to real-valued domains is of interest, as is investigation of the conceptually straightforward extension to discrete, non-Boolean features. Finally, extending our approach to non-additive utility models (e.g., adapting techniques for GAI models (Braziunas & Boutilier, 2007)) is essential.