Identifying Optimal Sequential Decisions

A. Philip Dawid
Statistical Laboratory, Centre for Mathematical Sciences, Cambridge, UK
apd@statslab.cam.ac.uk

Vanessa Didelez
Department of Mathematics, University of Bristol, UK
vanessa.didelez@bristol.ac.uk

Abstract

We consider conditions that allow us to find an optimal strategy for sequential decisions from a given data situation. For the case where all interventions are unconditional (atomic), identifiability has been discussed by Pearl & Robins (1995). We argue here that an optimal strategy must be conditional, i.e. take the information available at each decision point into account. We show that the identification of an optimal sequential decision strategy is more restrictive, in the sense that conditional interventions might not always be identified when atomic interventions are. We further demonstrate that a simple graphical criterion for the identifiability of an optimal strategy can be given.

1 INTRODUCTION

Consider the case of a chronically ill patient who regularly sees their doctor in order to adjust their treatment to the individual development of their disease. For example, patients who receive anticoagulation treatment are subject to regular blood testing; depending on the outcome of such a test, and possibly on earlier blood tests, the dosage of the anticoagulant might be modified. An optimal strategy in this context is a rule that stipulates, for each point in time where an intervention is carried out, a dosage to be given so as to optimise the treatment outcome, e.g. to keep the coagulation measure stable within certain bounds. It is intuitively obvious that to achieve optimality such a rule will typically have to be a function of the individual patient's history: for a patient whose blood coagulates too fast the dose has to be increased, while for a patient whose blood coagulates too slowly it has to be decreased. Decision strategies where each intervention is allowed to depend on the individual history are called conditional, dynamic or adaptive treatment strategies. A particular feature of such strategies is that for a given patient it is not known what exact dose they will receive at future interventions, as this will depend on their future coagulation test results, which are subject to random variation and influenced by many factors other than treatment, such as diet and lifestyle.

Here we consider the question of what kind of data situation will allow us to construct, i.e. identify, an optimal decision strategy. This is particularly relevant when data is obtained from observational studies, but also informs us about the design of experimental studies. Essentially we need to be able to identify all conditional strategies over which we want to optimise. Identifiability of unconditional sequential strategies (atomic interventions) has been considered by Pearl & Robins (1995), where a graphical criterion is given to read identifiability off a causal diagram. Dawid & Didelez (2005) generalise their graphical criterion to the case where some or all of the interventions are allowed to depend on some or all of the observable previous information. Our main result here is that if all interventions are allowed to depend on all of the observable previous information, as would be required to find an optimal strategy, then the graphical check simplifies considerably and we only need to check what is called simple stability (Dawid & Didelez, 2005).

Motivated by the question of identifying conditional (sequential) interventions, the identification of conditional interventional distributions has been considered (e.g. Pearl, 2000, section 4.2; Tian, 2004; Shpitser & Pearl, 2006) within the context of causal diagrams where all hidden variables are represented implicitly using bidirected edges. In particular, Shpitser & Pearl (2006) give necessary and sufficient criteria for this identification problem. Our approach is slightly different as we use influence diagrams, where the interventions are made explicit by suitable decision nodes and where unobservable variables are also explicitly shown by individual nodes (Dawid, 2002).

We do not consider in this paper the actual estimation and optimisation procedure required to find an optimal strategy, which is a numerically demanding task and a topic of its own. A regret-based approach to this has been proposed by Murphy (2003), while Robins (2004) suggests structural nested models (cf. also the discussion by Moodie et al. (2007)). An application of the regret-based method to the anticoagulation problem described above can be found in Rosthøj et al. (2008).

The paper is organised as follows. In section 2 we introduce the notation used throughout. Optimal strategies are addressed in section 3. The general problem of identifying possibly conditional strategies is covered in section 4, where we present a simple sufficient criterion called simple stability as well as a more general criterion, and explain how both can be checked graphically using influence diagrams. The latter reduces to simple stability when we consider optimal strategies, as shown in section 5. We discuss and compare our approach in sections 6 and 7.

3 OPTIMAL STRATEGIES

If the distribution p(y; s) of Y when following strategy s is known, we can evaluate for any function k(·) the expectation E{k(Y); s}; typically k(·) would be a loss function. This calculation can be implemented recursively. Define f(a_j, l_i) = E{k(Y) | a_j, l_i; s}, where j = i or j = i - 1 and i = 1, . . . , N + 1. We have that f(a_N, l_{N+1}) = k(y) and f() = E{k(Y); s}, where starting with the former the latter can be obtained by iteratively applying the following operations for i = N + 1, . . .
, 1:

f(a_{i-1}, l_i) = Σ_{a_i} f(a_i, l_i) p(a_i | a_{i-1}, l_i; s),
f(a_{i-1}, l_{i-1}) = Σ_{l_i} f(a_{i-1}, l_i) p(l_i | a_{i-1}, l_{i-1}; s)

(with Y playing the role of L_{N+1}, so that at i = N + 1 only the second operation applies). For the identification question, define for i = 0, . . . , N the hybrid distribution

p_i(a, l, u, y) = p(a_{≤i}, l_{≤i}, u_{≤i}; o) p(a_{>i}, l_{>i}, u_{>i}, y | a_{≤i}, l_{≤i}, u_{≤i}; s).

Hence p_0(·) = p(·; s) is the joint distribution under the strategy s, while p_N(·) = p(·; o) is the joint distribution under the observational regime, where we exploit extended stability to obtain p(Y | A, L, U; s) = p(Y | A, L, U; o).

Theorem 1. Under extended stability, the strategy s is identified by G-recursion if

p_{i-1}(y | a_i, l_i) = p_i(y | a_i, l_i), i = 1, . . . , N.  (5)

If we express our background knowledge about the conditional independence structure among (A, L, U, Y) and the way we can intervene in A in graphical form, we can check the above conditions for identifiability by simple graphical checks. Two approaches are possible. Firstly, we can augment the graph with the intervention indicator σ, as advocated in Pearl (1993), Lauritzen (2001) and Dawid (2002); this augmented graph (influence diagram) will be denoted by D. Simple stability (3) wrt the observables (A, L, Y) can, for example, be checked on such influence diagrams as in Figures 1 and 2. Secondly, and as is common in much of the mainstream causal literature, we can take the interventions in A as implicit and formulate graphical conditions involving (A, L, U, Y) only, omitting σ. We denote the graph which implicitly assumes extended stability with respect to sequential interventions in A by D∗ (this is also called a causal graph with respect to A (Pearl, 2000)). The graphs D and D∗ only differ in that the former has the additional decision node σ with arrows into A. It is easy to see that simple stability can therefore be checked on D by assessing whether the corresponding separations from σ hold.

For the graphical check of (5) we first define the different parent sets for the actions Ai under the different regimes. Let pao(Ai) be the parent nodes (excluding σ) of Ai in D when σ = o, and let pas(Ai) be the parent nodes (excluding σ) of Ai in D when σ = s, i.e. the variables that the decision rule si is allowed to depend on; the parent set for Ai in D itself is the union of the parents under both regimes, together with σ.
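To make the G-recursion of section 3 concrete, here is a minimal numerical sketch for a hypothetical two-stage binary problem. The model (p_l1, p_l2, p_y), the decision rules s1, s2 and all numbers are our own illustrative assumptions, not the paper's anticoagulation example. The function unrolls the backward averaging operations for N = 2 and can be checked against brute-force enumeration of all histories.

```python
import itertools

# Hypothetical two-stage binary model: L1, A1, L2, A2, Y (all 0/1).
p_l1 = {0: 0.6, 1: 0.4}                       # p(l1)

def p_l2(l1, a1):                             # p(l2 | l1, a1)
    p = 0.2 + 0.5 * l1 + 0.2 * a1
    return {1: p, 0: 1 - p}

def p_y(l2, a2):                              # p(y | l2, a2)
    p = 0.1 + 0.3 * l2 + 0.4 * a2
    return {1: p, 0: 1 - p}

# A conditional (dynamic) strategy s: each action may depend on the past.
def s1(l1):          return l1
def s2(l1, a1, l2):  return l2

k = lambda y: y   # with k = identity, E{k(Y); s} = P(Y = 1; s)

def g_recursion():
    """E{k(Y); s} by backward averaging, unrolled for N = 2:
    average out Y (= L3), then plug in A2 = s2, average out L2,
    plug in A1 = s1, and finally average out L1."""
    total = 0.0
    for l1, w1 in p_l1.items():
        a1 = s1(l1)                            # A1 fixed by the strategy
        for l2, w2 in p_l2(l1, a1).items():
            a2 = s2(l1, a1, l2)                # A2 fixed by the strategy
            f = sum(w * k(y) for y, w in p_y(l2, a2).items())
            total += w1 * w2 * f
    return total

def brute_force():
    """Same quantity by enumerating all histories consistent with s."""
    total = 0.0
    for l1, a1, l2, a2, y in itertools.product((0, 1), repeat=5):
        w = (p_l1[l1] * (a1 == s1(l1)) * p_l2(l1, a1)[l2]
             * (a2 == s2(l1, a1, l2)) * p_y(l2, a2)[y])
        total += w * k(y)
    return total
```

Because s2 depends on L2, the second action is random when viewed from the start, which is exactly the feature of dynamic strategies discussed in the introduction.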
Such a graph, Di say, represents the factorisation of the distribution pi constructed in section 4.4 if Ai arises under σ = o, and of pi-1 if Ai is generated according to σ = s. Let [· ⊥ · | ·]_{Di} denote graph separation in Di; then (5) holds if

[Y ⊥ σ | A_i, L_i]_{Di}.  (6)

Proof: see appendix, and Dawid & Didelez (2005).

Property (5) can be paraphrased as the distribution of Y given a_i, l_i having to be the same regardless of whether a_i has arisen out of the strategy s or from observation o, when we know that future actions will follow the strategy s. A graphical check for (5) is more involved than for simple stability as it has to reflect the particular construction of the distributions pi. This will be addressed in the next section.

4.5 GRAPHICAL CHECKS

This procedure is illustrated in Figure 3 for the example of graph (a) from Figure 2, assuming an unconditional intervention in A2, i.e. pas(A2) = ∅. Hence there are no arrows into A2 in D1. We can easily see that [Y ⊥ σ | A1]_{D1} and [Y ⊥ σ | A1, A2, L2]_{D2}, showing that a strategy where A2 is chosen without taking past covariates into account is identifiable even though simple stability as investigated in Figure 2 is violated.

Figure 3: Same example as in Figure 2(a); here (a) shows D1 and (b) shows D2 with unconditional intervention in A2.

When constructing Di, those edges into Aj, j = i + 1, . . . , N, are deleted that the intervention sj does not depend on. This will always be the case for all edges from Uj into Aj, because the intervention can obviously not be a function of unobserved quantities. This is illustrated in Figure 5 for the same example as above with a conditional intervention at A2; we can see that [Y ⊥ A1]_{D1} is violated while [Y ⊥ A2 | A1, L2]_{D2} holds.

In contrast, in the same example, if we assume that the intervention s2 in A2 does depend on previous covariates, i.e.
pas(A2) = (A1, L2), then we have to modify D1 as shown in Figure 4. Now we find that [Y ⊥ σ | A1]_{D1} is violated and we cannot guarantee that such a conditional strategy is identifiable.

Figure 4: Same example as in Figure 3(a); D1 with conditional intervention in A2.

Figure 5: Pearl & Robins' check when s2 is conditional; (a) shows D1 and (b) shows D2.

5 IDENTIFIABILITY OF OPTIMAL STRATEGIES

Pearl & Robins (1995) show that, based on a causal diagram D∗ (that does not include a node σ) and for unconditional sequential interventions, sufficient graphical conditions are given by [Y ⊥ Ai | Ai-1, Li], checked in the graph with all edges out of Ai and all edges into Ai+1, . . . , AN removed. Comparing this with our procedure we can see that the idea is the same: deleting the edges out of Ai corresponds to retaining only the back-door paths from Ai, as it is only these that are relevant when checking (6) due to σ having only an arrow into Ai. Further, deleting every edge into Ai+1, . . . , AN corresponds to changing the parent sets of these variables to only include the parents under s. Note that if the interventions are unconditional then Aj, j > i, has no parents among (A, L, U), and the relevant conditional distributions can be identified by the back-door criterion as mentioned in section 4.5.

Due to the sequential nature of the problem some covariates will be affected by earlier actions, and hence we cannot assume p(l; s) = p(l; o). Tian (2004) gives an example where p(y | l; s) is identified while p(l; s) is not. Hence, identifiability of conditional sequential plans is not covered by the identifiability of conditional interventional distributions alone.

Our results do not provide a necessary criterion for identifiability. However, they do not require a semi-Markovian graph nor a causal model (which in our case would assume that we can intervene in any of L1, . . . , LN as well as in A) as long as the background knowledge can be encoded in a DAG on (A, L, U, Y, σ).
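The separation statements used throughout can be verified mechanically. Below is a small self-contained d-separation test based on the moralized-ancestral-graph criterion. The example DAG is a hypothetical two-action influence diagram with decision node sigma and an unobserved confounder U; the edge list is our own assumption, not the paper's Figure 2.

```python
from collections import defaultdict

def d_separated(edges, x, y, z):
    """True iff x is d-separated from y given z in the DAG `edges`,
    via the moralized-ancestral-graph criterion."""
    z = set(z)
    parents = defaultdict(set)
    for u, v in edges:
        parents[v].add(u)
    # 1. Restrict to the ancestral set of {x, y} union z.
    anc, stack = set(), [x, y, *z]
    while stack:
        n = stack.pop()
        if n not in anc:
            anc.add(n)
            stack.extend(parents[n])
    # 2. Moralize: drop edge directions and marry co-parents.
    nbr = defaultdict(set)
    for u, v in edges:
        if u in anc and v in anc:
            nbr[u].add(v); nbr[v].add(u)
    for v in anc:
        ps = sorted(p for p in parents[v] if p in anc)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                nbr[p].add(q); nbr[q].add(p)
    # 3. x, y are separated iff z blocks every undirected path.
    seen, stack = set(), [x]
    while stack:
        n = stack.pop()
        if n == y:
            return False
        if n in seen or n in z:
            continue
        seen.add(n)
        stack.extend(nbr[n])
    return True

# Hypothetical influence diagram: sigma intervenes in A1 and A2;
# U is an unobserved common cause of L2 and Y.
edges = [("sigma", "A1"), ("sigma", "A2"), ("A1", "L2"),
         ("U", "L2"), ("L2", "A2"), ("A2", "Y"), ("U", "Y")]

# Y is separated from sigma given the full observable past,
# but not given A1 alone (the path sigma -> A2 -> Y stays open).
assert d_separated(edges, "Y", "sigma", {"A1", "L2", "A2"})
assert not d_separated(edges, "Y", "sigma", {"A1"})
```

In this toy graph, conditioning on the whole observable history (A1, L2, A2) blocks every path from sigma to Y, loosely mirroring how simple stability is assessed stage by stage on an influence diagram.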
More importantly, as mentioned earlier, simple stability implies that the effect of each action on later covariates, p(l; s), as well as the conditional intervention distribution p(y | l; s) are identifiable, so that (8) applies.

7 DISCUSSION

We have addressed the question of identifying optimal sequential strategies within the framework of decision theory. Simple stability (3) provides a straightforward graphical check for identifiability on a single influence diagram, and the more involved check for the conditions in section 4.4 that extends the Pearl & Robins (1995) approach is in fact not more general. We would like to point out that even though the target of inference E{k(Y); s} can be constructed using the G-recursion, it is in practice not advisable to estimate an optimal strategy (when it is identified) by estimating the individual factors of the G-recursion formula (Robins & Wasserman, 1997). This has motivated alternative approaches such as those suggested by Murphy (2003) and Robins (2004). The reasoning regarding identifiability, however, remains valid.

References

Cowell, R.G., Dawid, A.P., Lauritzen, S.L., Spiegelhalter, D.J. (1999). Probabilistic Networks and Expert Systems. Springer.

Dawid, A.P. (1979). Conditional independence in statistical theory (with discussion). JRSSB, 41, pp. 1-31.

Dawid, A.P. (2002). Influence diagrams for causal modelling and inference. Int. Statist. Rev., 70, pp. 161-189.

Dawid, A.P. and Didelez, V. (2005). Identifying the consequences of dynamic treatment strategies. Research Report No. 262, Department of Statistical Science, University College London.

Didelez, V., Dawid, A.P., Geneletti, S. (2006). Direct and indirect effects of sequential decisions. In:

6 RELATION TO OTHER APPROACHES

It has been argued that conditional strategies can be identified if conditional intervention distributions can be identified (Pearl, 2000, section 4.2).
For the case of a non-stochastic conditional strategy s that fixes a = s(l) this is seen as follows. We have that

p(y; s) = Σ_l p(y | l; s) p(l; s)  (8)

(the recursive version of which is based on (2)). As l in p(y | l; s) is given, we have p(y | l; s) = p(y | l; a), where σ = a denotes an unconditional strategy and a = s(l). Hence we can identify p(y; s) if we can identify p(y | l; a) for every sequence l and every unconditional strategy σ = a. Shpitser & Pearl (2006) give a sound and complete algorithm to identify such conditional intervention distributions, like p(y | l; a), which outputs FAIL iff the problem is not identified. This takes a semi-Markovian graph as input, which is based on a causal model. However, for the whole p(y; s) to be identifiable by (8) we also need p(l; s) to be identifiable, where it is important to note that due to the sequential nature of the problem some covariates will be affected by earlier actions.

Proc. of the 22nd Conference on Uncertainty in Artificial Intelligence (UAI-06), pp. 138-146. AUAI Press.

Lauritzen, S.L. (2001). Causal inference from graphical models. In: O.E. Barndorff-Nielsen, D.R. Cox and C. Klüppelberg (eds.), Complex Stochastic Systems, pp. 63-107. Chapman and Hall.

Moodie, E.E.M., Richardson, T.S., Stephens, D.A. (2007). Demystifying optimal dynamic treatment regimes. Biometrics, 63, pp. 447-455.

Murphy, S.A. (2003). Optimal dynamic treatment regimes (with discussion). JRSSB, 65, pp. 331-366.

Pearl, J. (1993). Graphical models, causality and interventions. Statistical Science, 8, pp. 266-269.

Pearl, J. (1995). Causal diagrams for empirical research. Biometrika, 82, pp. 669-710.

Pearl, J. (2000). Causality: Models, Reasoning and Inference. Cambridge University Press.

Pearl, J. and Robins, J. (1995). Probabilistic evaluation of sequential plans from causal models with hidden variables. In: Proc. of the 11th Conference on Uncertainty in Artificial Intelligence (UAI-95), pp. 444-453. Morgan Kaufmann.

Raiffa, H. (1968).
Decision Analysis. Addison-Wesley.

Rosthøj, S., Fullwood, C., Henderson, R., Stewart, S. (2008). Estimation of optimal dynamic anticoagulation regimes from observational data: a regret-based approach. Statistics in Medicine, to appear.

Shpitser, I., Pearl, J. (2006). Identification of conditional interventional distributions. In: Proc. of the 22nd Conference on Uncertainty in Artificial Intelligence (UAI-06), pp. 437-444. AUAI Press.

Tian, J. (2004). Identifying conditional causal effects. In: Proc. of the 20th Conference on Uncertainty in Artificial Intelligence (UAI-04), pp. 561-568. AUAI Press.

Verma, T., Pearl, J. (1990). Causal networks: semantics and expressiveness. In: Proc. of the 4th Conference on Uncertainty in Artificial Intelligence (UAI-90), pp. 561-568. Elsevier Science.

APPENDIX: PROOF OF THEOREM 1

This is a special case of a more general result (Dawid and Didelez, 2005, § 7.1). The following argument is specialised to the current context, and assumes that events conditioned on have positive probability. Similar to section 3, define f (ai , li ) f (a