Identifying Optimal Sequential Decisions

A. Philip Dawid
Statistical Laboratory, Centre for Mathematical Sciences, Cambridge, UK
apd@statslab.cam.ac.uk

Vanessa Didelez
Department of Mathematics, University of Bristol, UK
vanessa.didelez@bristol.ac.uk

Abstract

We consider conditions that allow us to find an optimal strategy for sequential decisions from a given data situation. For the case where all interventions are unconditional (atomic), identifiability has been discussed by Pearl & Robins (1995). We argue here that an optimal strategy must be conditional, i.e. take the information available at each decision point into account. We show that the identification of an optimal sequential decision strategy is more restrictive, in the sense that conditional interventions might not always be identified when atomic interventions are. We further demonstrate that a simple graphical criterion for the identifiability of an optimal strategy can be given.

1 INTRODUCTION

Consider the case of a chronically ill patient who regularly sees their doctor in order to adjust their treatment to the individual development of their disease. For example, patients who receive anticoagulation treatment are subject to regular blood testing; depending on the outcome of such a test, and possibly on earlier blood tests, the dosage of the anticoagulant might be modified. An optimal strategy in this context is a rule that stipulates, for each point in time where an intervention is carried out, a dosage to be given so as to optimise the treatment outcome, e.g. to keep the coagulation measure stable within certain bounds. It is intuitively obvious that to achieve optimality such a rule will typically have to be a function of the individual patient's history: for a patient whose blood coagulates too fast the dose has to be increased, while for a patient whose blood coagulates too slowly it has to be decreased. Decision strategies where each intervention is allowed to depend on the individual history are called conditional, dynamic or adaptive treatment strategies. A particular feature of such strategies is that for a given patient it is not known what exact dose they will receive at future interventions, as this will depend on their future coagulation test results, which are subject to random variation and influenced by many factors other than treatment, such as diet and lifestyle.

Here we consider the question of what kind of data situation will allow us to construct, i.e. identify, an optimal decision strategy. This is particularly relevant when data is obtained from observational studies, but also informs us about the design of experimental studies. Essentially we need to be able to identify all conditional strategies over which we want to optimise. Identifiability of unconditional sequential strategies (atomic interventions) has been considered by Pearl & Robins (1995), where a graphical criterion is given to read identifiability off a causal diagram. Dawid & Didelez (2005) generalise their graphical criterion to the case where some or all of the interventions are allowed to depend on some or all of the observable previous information. Our main result here is that if all interventions are allowed to depend on all of the observable previous information, as would be required to find an optimal strategy, then the graphical check simplifies considerably and we only need to check what is called simple stability (Dawid & Didelez, 2005).

Motivated by the question of identifying conditional (sequential) interventions, the identification of conditional interventional distributions has been considered (e.g. Pearl, 2000, section 4.2; Tian, 2004; Shpitser & Pearl, 2006) within the context of causal diagrams where all hidden variables are represented implicitly using bidirected edges. In particular, Shpitser & Pearl (2006) give necessary and sufficient criteria for this identification problem. Our approach is slightly different as we use influence diagrams, where the interventions are made explicit by suitable decision nodes and where unobservable variables are also explicitly shown by individual nodes (Dawid, 2002).

We do not consider in this paper the actual estimation and optimisation procedure required to find an optimal strategy, which is a numerically demanding task and a topic of its own. A regret-based approach to this has been proposed by Murphy (2003), while Robins (2004) suggests structural nested models (cf. also the discussion by Moodie et al. (2007)). An application of the regret-based method to the anticoagulation problem described above can be found in Rosthøj et al. (2008).

The paper is organised as follows. In section 2 we introduce the notation used throughout. Optimal strategies are addressed in section 3. The general problem of identifying possibly conditional strategies is covered in section 4, where we present a simple sufficient criterion called simple stability as well as a more general criterion, and explain how both can be checked graphically using influence diagrams. The latter reduces to simple stability when we consider optimal strategies, as shown in section 5. We discuss and compare our approach in sections 6 and 7.

3 OPTIMAL STRATEGIES

If the distribution p(y; s) of Y when following strategy s is known, we can evaluate for any function k(·) the expectation E{k(Y); s}; typically k(·) would be a loss function. This calculation can be implemented recursively. Define f(a_j, l_i) = E{k(Y) | a_j, l_i; s}, where j = i or j = i - 1 and i = 1, . . . , N + 1. We have that f(a_N, l_{N+1}) = k(y) and f() = E{k(Y); s}, where starting with the former the latter can be obtained by iteratively applying the following operations for i = N + 1, . . .
, 1:

f(a_{i-1}, l_i) = Σ_{a_i} f(a_i, l_i) p(a_i | a_{i-1}, l_i; s),
f(a_{i-1}, l_{i-1}) = Σ_{l_i} f(a_{i-1}, l_i) p(l_i | a_{i-1}, l_{i-1}; s)

(with Y playing the role of L_{N+1}, so that at i = N + 1 only the second operation applies). For the identification question, define for i = 0, . . . , N the hybrid distribution

p_i(a, l, u, y) = p(a_{≤i}, l_{≤i}, u_{≤i}; o) p(a_{>i}, l_{>i}, u_{>i}, y | a_{≤i}, l_{≤i}, u_{≤i}; s).

Hence p_0(·) = p(·; s) is the joint distribution under the strategy s, while p_N(·) = p(·; o) is the joint distribution under the observational regime, where we exploit extended stability to obtain p(Y | A, L, U; s) = p(Y | A, L, U; o).

Theorem 1. Under extended stability, the strategy s is identified by G-recursion if

p_{i-1}(y | a_i, l_i) = p_i(y | a_i, l_i), i = 1, . . . , N.  (5)

If we express our background knowledge about the conditional independence structure among (A, L, U, Y) and the way we can intervene in A in graphical form, we can check the above conditions for identifiability by simple graphical checks. Two approaches are possible. Firstly, we can augment the graph with the intervention indicator σ, as advocated in Pearl (1993), Lauritzen (2001) and Dawid (2002); this augmented graph (influence diagram) will be denoted by D. Simple stability (3) wrt the observables (A, L, Y) can, for example, be checked on such influence diagrams as in Figures 1 and 2. Secondly, and as is common in much of the mainstream causal literature, we can take the interventions in A as implicit and formulate graphical conditions involving (A, L, U, Y) only, omitting σ. We denote the graph which implicitly assumes extended stability with respect to sequential interventions in A by D∗ (this is also called a causal graph with respect to A (Pearl, 2000)). The graphs D and D∗ only differ in that the former has the additional decision node σ with arrows into A. It is easy to see that simple stability can therefore be checked on D by assessing whether the corresponding separations from σ hold.

For the graphical check of (5) we first define the different parent sets for the actions Ai under the different regimes. Let pao(Ai) be the parent nodes (excluding σ) of Ai in D when σ = o, and let pas(Ai) be the parent nodes (excluding σ) of Ai in D when σ = s, i.e. the variables that the decision rule si is allowed to depend on; the parent set for Ai in D itself is the union of the parents under both regimes, together with σ.
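To make the G-recursion of section 3 concrete, here is a minimal numerical sketch for a hypothetical two-stage binary problem. The model (p_l1, p_l2, p_y), the decision rules s1, s2 and all numbers are our own illustrative assumptions, not the paper's anticoagulation example. The function unrolls the backward averaging operations for N = 2 and can be checked against brute-force enumeration of all histories.

```python
import itertools

# Hypothetical two-stage binary model: L1, A1, L2, A2, Y (all 0/1).
p_l1 = {0: 0.6, 1: 0.4}                       # p(l1)

def p_l2(l1, a1):                             # p(l2 | l1, a1)
    p = 0.2 + 0.5 * l1 + 0.2 * a1
    return {1: p, 0: 1 - p}

def p_y(l2, a2):                              # p(y | l2, a2)
    p = 0.1 + 0.3 * l2 + 0.4 * a2
    return {1: p, 0: 1 - p}

# A conditional (dynamic) strategy s: each action may depend on the past.
def s1(l1):          return l1
def s2(l1, a1, l2):  return l2

k = lambda y: y   # with k = identity, E{k(Y); s} = P(Y = 1; s)

def g_recursion():
    """E{k(Y); s} by backward averaging, unrolled for N = 2:
    average out Y (= L3), then plug in A2 = s2, average out L2,
    plug in A1 = s1, and finally average out L1."""
    total = 0.0
    for l1, w1 in p_l1.items():
        a1 = s1(l1)                            # A1 fixed by the strategy
        for l2, w2 in p_l2(l1, a1).items():
            a2 = s2(l1, a1, l2)                # A2 fixed by the strategy
            f = sum(w * k(y) for y, w in p_y(l2, a2).items())
            total += w1 * w2 * f
    return total

def brute_force():
    """Same quantity by enumerating all histories consistent with s."""
    total = 0.0
    for l1, a1, l2, a2, y in itertools.product((0, 1), repeat=5):
        w = (p_l1[l1] * (a1 == s1(l1)) * p_l2(l1, a1)[l2]
             * (a2 == s2(l1, a1, l2)) * p_y(l2, a2)[y])
        total += w * k(y)
    return total
```

Because s2 depends on L2, the second action is random when viewed from the start, which is exactly the feature of dynamic strategies discussed in the introduction.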
Such a graph, Di say, represents the factorisation of the distribution pi constructed in section 4.4 if Ai arises under σ = o, and of pi-1 if Ai is generated according to σ = s. Let [· ⊥ · | ·]_{Di} denote graph separation in Di; then (5) holds if

[Y ⊥ σ | A_i, L_i]_{Di}.  (6)

Proof: see appendix, and Dawid & Didelez (2005).

Property (5) can be paraphrased as the distribution of Y given a_i, l_i having to be the same regardless of whether a_i has arisen out of the strategy s or from observation o, when we know that future actions will follow the strategy s. A graphical check for (5) is more involved than for simple stability as it has to reflect the particular construction of the distributions pi. This will be addressed in the next section.

4.5 GRAPHICAL CHECKS

This procedure is illustrated in Figure 3 for the example of graph (a) from Figure 2, assuming an unconditional intervention in A2, i.e. pas(A2) = ∅. Hence there are no arrows into A2 in D1. We can easily see that [Y ⊥ σ | A1]_{D1} and [Y ⊥ σ | A1, A2, L2]_{D2}, showing that a strategy where A2 is chosen without taking past covariates into account is identifiable even though simple stability as investigated in Figure 2 is violated.

Figure 3: Same example as in Figure 2(a); here (a) shows D1 and (b) shows D2 with unconditional intervention in A2.

When constructing Di, those edges into Aj, j = i + 1, . . . , N, are deleted that the intervention sj does not depend on. This will always be the case for all edges from Uj into Aj, because the intervention can obviously not be a function of unobserved quantities. This is illustrated in Figure 5 for the same example as above with a conditional intervention at A2; we can see that [Y ⊥ A1]_{D1} is violated while [Y ⊥ A2 | A1, L2]_{D2} holds.

In contrast, in the same example, if we assume that the intervention s2 in A2 does depend on previous covariates, i.e.
pas(A2) = (A1, L2), then we have to modify D1 as shown in Figure 4. Now we find that [Y ⊥ σ | A1]_{D1} is violated and we cannot guarantee that such a conditional strategy is identifiable.

Figure 4: Same example as in Figure 3(a); D1 with conditional intervention in A2.

Figure 5: Pearl & Robins' check when s2 is conditional; (a) shows D1 and (b) shows D2.

5 IDENTIFIABILITY OF OPTIMAL STRATEGIES

Pearl & Robins (1995) show that, based on a causal diagram D∗ (that does not include a node σ) and for unconditional sequential interventions, sufficient graphical conditions are given by [Y ⊥ Ai | Ai-1, Li], checked in the graph with all edges out of Ai and all edges into Ai+1, . . . , AN removed. Comparing this with our procedure we can see that the idea is the same: deleting the edges out of Ai corresponds to retaining only the back-door paths from Ai, as it is only these that are relevant when checking (6) due to σ having only an arrow into Ai. Further, deleting every edge into Ai+1, . . . , AN corresponds to changing the parent sets of these variables to only include the parents under s. Note that if the interventions are unconditional then Aj, j > i, has no parents among (A, L, U), and the relevant conditional distributions can be identified by the back-door criterion as mentioned in section 4.5.

Due to the sequential nature of the problem some covariates will be affected by earlier actions, and hence we cannot assume p(l; s) = p(l; o). Tian (2004) gives an example where p(y | l; s) is identified while p(l; s) is not. Hence, identifiability of conditional sequential plans is not covered by the identifiability of conditional interventional distributions alone.

Our results do not provide a necessary criterion for identifiability. However, they do not require a semi-Markovian graph nor a causal model (which in our case would assume that we can intervene in any of L1, . . . , LN as well as in A) as long as the background knowledge can be encoded in a DAG on (A, L, U, Y, σ).
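The separation statements used throughout can be verified mechanically. Below is a small self-contained d-separation test based on the moralized-ancestral-graph criterion. The example DAG is a hypothetical two-action influence diagram with decision node sigma and an unobserved confounder U; the edge list is our own assumption, not the paper's Figure 2.

```python
from collections import defaultdict

def d_separated(edges, x, y, z):
    """True iff x is d-separated from y given z in the DAG `edges`,
    via the moralized-ancestral-graph criterion."""
    z = set(z)
    parents = defaultdict(set)
    for u, v in edges:
        parents[v].add(u)
    # 1. Restrict to the ancestral set of {x, y} union z.
    anc, stack = set(), [x, y, *z]
    while stack:
        n = stack.pop()
        if n not in anc:
            anc.add(n)
            stack.extend(parents[n])
    # 2. Moralize: drop edge directions and marry co-parents.
    nbr = defaultdict(set)
    for u, v in edges:
        if u in anc and v in anc:
            nbr[u].add(v); nbr[v].add(u)
    for v in anc:
        ps = sorted(p for p in parents[v] if p in anc)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                nbr[p].add(q); nbr[q].add(p)
    # 3. x, y are separated iff z blocks every undirected path.
    seen, stack = set(), [x]
    while stack:
        n = stack.pop()
        if n == y:
            return False
        if n in seen or n in z:
            continue
        seen.add(n)
        stack.extend(nbr[n])
    return True

# Hypothetical influence diagram: sigma intervenes in A1 and A2;
# U is an unobserved common cause of L2 and Y.
edges = [("sigma", "A1"), ("sigma", "A2"), ("A1", "L2"),
         ("U", "L2"), ("L2", "A2"), ("A2", "Y"), ("U", "Y")]

# Y is separated from sigma given the full observable past,
# but not given A1 alone (the path sigma -> A2 -> Y stays open).
assert d_separated(edges, "Y", "sigma", {"A1", "L2", "A2"})
assert not d_separated(edges, "Y", "sigma", {"A1"})
```

In this toy graph, conditioning on the whole observable history (A1, L2, A2) blocks every path from sigma to Y, loosely mirroring how simple stability is assessed stage by stage on an influence diagram.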
More importantly, as mentioned earlier, simple stability implies that the effect of each action on later covariates, p(l; s), as well as the conditional intervention distribution p(y | l; s) are identifiable, so that (8) applies.

7 DISCUSSION

We have addressed the question of identifying optimal sequential strategies within the framework of decision theory. Simple stability (3) provides a straightforward graphical check for identifiability on a single influence diagram, and the more involved check for the conditions in section 4.4 that extends the Pearl & Robins (1995) approach is in fact not more general. We would like to point out that even though the target of inference E{k(Y); s} can be constructed using the G-recursion, it is in practice not advisable to estimate an optimal strategy (when it is identified) by estimating the individual factors of the G-recursion formula (Robins & Wasserman, 1997). This has motivated alternative approaches such as those suggested by Murphy (2003) and Robins (2004). The reasoning regarding identifiability, however, remains valid.

References

Cowell, R.G., Dawid, A.P., Lauritzen, S.L., Spiegelhalter, D.J. (1999). Probabilistic Networks and Expert Systems. Springer.

Dawid, A.P. (1979). Conditional independence in statistical theory (with discussion). JRSSB, 41, pp. 1-31.

Dawid, A.P. (2002). Influence diagrams for causal modelling and inference. Int. Statist. Rev., 70, pp. 161-189.

Dawid, A.P. and Didelez, V. (2005). Identifying the consequences of dynamic treatment strategies. Research Report No. 262, Department of Statistical Science, University College London.

Didelez, V., Dawid, A.P., Geneletti, S. (2006). Direct and indirect effects of sequential decisions. In:

6 RELATION TO OTHER APPROACHES

It has been argued that conditional strategies can be identified if conditional intervention distributions can be identified (Pearl, 2000, section 4.2).
For the case of a non-stochastic conditional strategy s that fixes a = s(l) this is seen as follows. We have that

p(y; s) = Σ_l p(y | l; s) p(l; s)  (8)

(the recursive version of which is based on (2)). As l in p(y | l; s) is given, we have p(y | l; s) = p(y | l; a), where σ = a denotes an unconditional strategy and a = s(l). Hence we can identify p(y; s) if we can identify p(y | l; a) for every sequence l and every unconditional strategy σ = a. Shpitser & Pearl (2006) give a sound and complete algorithm to identify such conditional intervention distributions, like p(y | l; a), which outputs FAIL iff the problem is not identified. This takes a semi-Markovian graph as input, which is based on a causal model. However, for the whole p(y; s) to be identifiable by (8) we also need p(l; s) to be identifiable, where it is important to note that due to the sequential nature of the problem some covariates will be affected by earlier actions.

Proc. of the 22nd Conference on Uncertainty in Artificial Intelligence (UAI-06), pp. 138-146. AUAI Press.

Lauritzen, S.L. (2001). Causal inference from graphical models. In: O.E. Barndorff-Nielsen, D.R. Cox and C. Klüppelberg (eds.), Complex Stochastic Systems, pp. 63-107. Chapman and Hall.

Moodie, E.E.M., Richardson, T.S., Stephens, D.A. (2007). Demystifying optimal dynamic treatment regimes. Biometrics, 63, pp. 447-455.

Murphy, S.A. (2003). Optimal dynamic treatment regimes (with discussion). JRSSB, 65, pp. 331-366.

Pearl, J. (1993). Graphical models, causality and interventions. Statistical Science, 8, pp. 266-269.

Pearl, J. (1995). Causal diagrams for empirical research. Biometrika, 82, pp. 669-710.

Pearl, J. (2000). Causality: Models, Reasoning and Inference. Cambridge University Press.

Pearl, J. and Robins, J. (1995). Probabilistic evaluation of sequential plans from causal models with hidden variables. In: Proc. of the 11th Conference on Uncertainty in Artificial Intelligence (UAI-95), pp. 444-453. Morgan Kaufmann.

Raiffa, H. (1968).
Decision Analysis. Addison-Wesley.

Rosthøj, S., Fullwood, C., Henderson, R., Stewart, S. (2008). Estimation of optimal dynamic anticoagulation regimes from observational data: a regret-based approach. Statistics in Medicine, to appear.

Shpitser, I., Pearl, J. (2006). Identification of conditional interventional distributions. In: Proc. of the 22nd Conference on Uncertainty in Artificial Intelligence (UAI-06), pp. 437-444. AUAI Press.

Tian, J. (2004). Identifying conditional causal effects. In: Proc. of the 20th Conference on Uncertainty in Artificial Intelligence (UAI-04), pp. 561-568. AUAI Press.

Verma, T., Pearl, J. (1990). Causal networks: semantics and expressiveness. In: Proc. of the 4th Conference on Uncertainty in Artificial Intelligence (UAI-90), pp. 561-568. Elsevier Science.

APPENDIX: PROOF OF THEOREM 1

This is a special case of a more general result (Dawid and Didelez, 2005, § 7.1). The following argument is specialised to the current context, and assumes that events conditioned on have positive probability. Similar to section 3, define f (ai , li ) f (a