The Evaluation of Causal Effects in Studies with an Unobserved Exposure/Outcome Variable: Bounds and Identification

Manabu Kuroki
Department of Systems Innovation
Graduate School of Engineering Science
Osaka University
mkuroki@sigmath.es.osaka-u.ac.jp

Zhihong Cai
Department of Biostatistics
School of Public Health
Kyoto University
cai@pbh.med.kyoto-u.ac.jp

Abstract

This paper deals with the problem of evaluating the causal effect using observational data in the presence of an unobserved exposure/outcome variable, when cause-effect relationships between variables can be described as a directed acyclic graph and the corresponding recursive factorization of a joint distribution. First, we propose identifiability criteria for causal effects when an unobserved exposure/outcome variable is considered to contain more than two categories. Next, when unmeasured variables exist between an unobserved outcome variable and its proxy variables, we provide the tightest bounds based on the potential outcome approach. The results of this paper are helpful for evaluating causal effects in the many practical fields where it is difficult or expensive to observe an exposure/outcome variable.

1 INTRODUCTION

The evaluation of causal effects from observational studies is one of the central aims in many fields of practical science. For this purpose, many researchers have attempted to clarify cause-effect relationships and to evaluate the causal effect of an exposure variable on an outcome variable through observed data. Statistical causal analysis, which is one of the powerful tools for solving these problems, started with path analysis (Wright, 1923, 1934), and advanced to structural equation models (Wold, 1954; Bollen, 1989). It has also been modified in order to be applicable to categorical data (Goodman, 1973, 1974a, 1974b; Hagenaars, 1993). Recently, Pearl (2000) developed a new framework of causal modeling based on a directed acyclic graph and the corresponding nonparametric structural equation model.

In observational studies, there often exist unobserved variables, which make it difficult to evaluate reliable causal effects. Many researchers have proposed various useful approaches to evaluate causal effects when unobserved variables are confounding factors between an exposure variable and an outcome variable, such as the instrumental variable method and sensitivity analysis. In the context of graphical causal models, Pearl (2000) provided the mathematical definition of the causal effect. In addition, when both an exposure variable and an outcome variable are observed, Pearl (2000), Tian and Pearl (2002) and Shpitser and Pearl (2006) discussed several graphical identification conditions for causal effects, which enable us to recognize situations where the causal effects can be evaluated from observational data.

However, in some situations, the exposure or outcome variable itself is unobserved. For example, in a study to examine whether the socioeconomic gradient has an influence on low birth-weight, socioeconomic status is measured by some proxy variables such as income, wealth, education and occupation, since the true socioeconomic status is unobserved (Finch, 2003). Another example concerning an unobserved exposure is in occupational settings. Many epidemiological studies have addressed the question of carcinogenicity in workers exposed to diesel exhaust and coal mine dust, and most showed a low-to-medium increase in the risk of lung cancer. However, exposure measurement in these studies is mainly inferred on the basis of job classifications and may lead to misclassification (Hoffmann and Jockel, 2006). On the other hand, as an example concerning an unobserved outcome, Fleiss et al. (1976) reported a comparative clinical trial of ibuprofen, aspirin and placebo in the relief of postextraction pain. Since the true outcome (pain relief) is unobserved, they used the Ridit analysis (Bross, 1958) to divide patients into five categories of pain relief: none, poor, fair, good and very good. These examples show the importance of evaluating causal effects when an exposure/outcome variable is unobserved.

Kuroki et al. (2005) pointed out that it is difficult to apply the identification criteria proposed by Pearl and his colleagues to evaluate causal effects in such situations, and provided graphical identifiability criteria for the case where an unobserved exposure/outcome variable is continuous. In addition, Kuroki (2007) adapted the identification conditions proposed by Kuroki et al. (2005) to the case where an exposure/outcome variable is dichotomous. However, in many situations, researchers and practitioners are more interested in different exposure levels (e.g., none, low, medium and high) than in a pure binary exposure (exposed vs. unexposed), and are also more interested in response levels (e.g., none, poor, fair, good and very good) than in a simple binary response (improved vs. not improved).

The main purpose of this paper is therefore to provide identifiability criteria for causal effects from observational studies in the presence of an unobserved exposure/outcome variable with more than two categories. It will be shown that if we can observe some proxy variables that are affected by the unobserved variable, then the causal effect can be evaluated by using statistical causal analysis. More generally, we consider the case where there exist unmeasured variables between the unobserved exposure/outcome variable and its proxy variables. In such a situation, the causal effect is not identifiable but bounds on the causal effect can be derived. Finally, we illustrate our results with an example from social science.

2 PRELIMINARIES

2.1 BAYESIAN NETWORKS

Let $f(v_1, v_2, \ldots, v_n)$ be a strictly positive joint distribution of a set $V = \{V_1, V_2, \ldots, V_n\}$ of variables, $f(v_i|v_j)$ the conditional distribution of $V_i$ given $V_j = v_j$ ($V_i, V_j \in V$) and $f(v_i)$ the marginal distribution of $V_i$. Similar notations are used for other distributions. For graph theoretic terminology used in this paper, refer to Kuroki et al. (2005).

Suppose that a set $V$ of variables and a directed acyclic graph $G = (V, E)$ are given. When the joint distribution of $V$ is factorized recursively according to the graph $G$ as the following equation, the graph is called a Bayesian network:

\[
f(v_1, v_2, \ldots, v_n) = \prod_{i=1}^{n} f(v_i|pa(v_i)). \tag{1}
\]

When $pa(v_i)$ is an empty set, $f(v_i|pa(v_i))$ is the marginal distribution $f(v_i)$ of $v_i$.

If a joint distribution is factorized recursively according to the graph $G$, the conditional independencies implied by the factorization (1) can be obtained from the graph $G$ according to the d-separation criterion (Pearl, 1988), that is, if $Z_1$ d-separates $Z_2$ from $Z_3$ in a directed acyclic graph $G$ ($Z_1, Z_2, Z_3 \subset V$), then $Z_2$ is conditionally independent of $Z_3$ given $Z_1$ in the corresponding recursive factorization (1); see, for example, Geiger et al. (1990).

2.2 CAUSAL EFFECT

Pearl (2000) defined a causal effect as a distribution of an outcome variable when conducting an external intervention, where an `external intervention' means that a variable is forced to take on some fixed value, regardless of the values of other variables. If the distribution of the remaining variables represented in the directed acyclic graph remains essentially unchanged by such an external intervention, then the graph can be regarded as a causal diagram and the effect of the external intervention can be calculated from the joint factorized distribution. The exact definition is given as follows.

DEFINITION 1
Let $V = \{X, Y\} \cup Q$ ($\{X, Y\} \cap Q = \emptyset$) be a set of variables represented in a Bayesian network $G$. If the distribution of $Y$ after setting $X$ to a value $x$ is given by

\[
f(y|\mathrm{set}(X = x)) = \sum_{q} \frac{f(x, y, q)}{f(x|pa(x))}, \tag{2}
\]

then $G$ is called a causal diagram with regard to $X$ and equation (2) is called a causal effect of $X$ on $Y$. Here, $\mathrm{set}(X = x)$ means that $X$ is set to a value $x$ by an external intervention. □

If Definition 1 holds true with regard to all pairs of variables in the graph, then the whole graph is said to be causal. For more details about the relationship between Bayesian networks and causal diagrams, see Pearl (2000).

Given a causal diagram $G$, in order to evaluate the causal effect $f(y|\mathrm{set}(X = x))$ of $X$ on $Y$ from a joint factorized distribution of observed variables, it is required to observe not only $X$ and $Y$ but also a set $Z$ of other variables, such as confounders. Pearl (2000) provided `the back door criterion' as one of the graphical identifiability criteria for causal effects $f(y|\mathrm{set}(X = x))$, where `identifiable' means that $f(y|\mathrm{set}(X = x))$ can be determined uniquely from a joint distribution of observed variables.

DEFINITION 2
Suppose that $X$ is a non-descendant of $Y$ in a directed acyclic graph $G$. If a set $Z$ of vertices satisfies the following conditions relative to an ordered pair $(X, Y)$ of vertices, then $Z$ is said to satisfy the back door criterion relative to $(X, Y)$:
(i) no vertex in $Z$ is a descendant of $X$;
(ii) $Z$ blocks every path between $X$ and $Y$ that contains an arrow pointing to $X$. □

If a set $Z$ of variables satisfies the back door criterion relative to $(X, Y)$, then the causal effect $f(y|\mathrm{set}(X = x))$ of $X$ on $Y$ is identifiable through the observation of $Z \cup \{X, Y\}$ and is given by the formula

\[
f(y|\mathrm{set}(X = x)) = \sum_{z} f(y|x, z) f(z). \tag{3}
\]

When the back door criterion cannot be applied to evaluate causal effects, Pearl (2000) provided `the front door criterion', which is as follows:

DEFINITION 3
Suppose that $X$ is a non-descendant of $Y$ in a directed acyclic graph $G$. If a set $Z$ of variables satisfies the following conditions relative to an ordered pair $(X, Y)$ of variables, then $Z$ is said to satisfy the front door criterion relative to $(X, Y)$:
(i) $Z$ blocks all directed paths from $X$ to $Y$;
(ii) an empty set blocks every path between $X$ and $Z$ that contains an arrow pointing to $X$;
(iii) $X$ blocks every path between any vertex in $Z$ and $Y$. □

If a set $Z$ of variables satisfies the front door criterion relative to $(X, Y)$, then the causal effect $f(y|\mathrm{set}(X = x))$ of $X$ on $Y$ is identifiable through the observation of $Z \cup \{X, Y\}$ and is given by the formula

\[
f(y|\mathrm{set}(X = x)) = \sum_{x', z} f(y|x', z) f(z|x) f(x'). \tag{4}
\]
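As a small numerical illustration of equations (1) to (3), the following sketch builds a joint distribution from a recursive factorization and checks that the back door formula reproduces the causal effect of Definition 1. The three-variable model Z -> X, Z -> Y, X -> Y and all probabilities in it are hypothetical and chosen only for illustration; they do not come from the paper.

```python
import numpy as np

# Hypothetical binary model Z -> X, Z -> Y, X -> Y (all numbers are made up).
f_z = np.array([0.6, 0.4])                    # f(z)
f_x_z = np.array([[0.7, 0.3],                 # f(x|z), indexed [z, x]
                  [0.2, 0.8]])
f_y_xz = np.array([[[0.9, 0.1], [0.6, 0.4]],  # f(y|x,z), indexed [z, x, y]
                   [[0.5, 0.5], [0.2, 0.8]]])

# Recursive factorization (1): joint[z, x, y] = f(z) f(x|z) f(y|x,z).
joint = f_z[:, None, None] * f_x_z[:, :, None] * f_y_xz


def effect_by_definition(joint, f_x_z):
    """Equation (2): divide the joint by f(x|pa(x)) and sum out Q = {Z}."""
    return (joint / f_x_z[:, :, None]).sum(axis=0)      # indexed [x, y]


def effect_by_back_door(joint):
    """Equation (3): sum_z f(y|x,z) f(z), using the observed joint only."""
    f_z_obs = joint.sum(axis=(1, 2))
    f_xz_obs = joint.sum(axis=2)
    f_y_xz_obs = joint / f_xz_obs[:, :, None]
    return (f_y_xz_obs * f_z_obs[:, None, None]).sum(axis=0)


print(effect_by_back_door(joint))
```

Here {Z} satisfies the back door criterion relative to (X, Y), so both functions return the same table of f(y|set(X = x)) values; the second one needs only the observed joint distribution.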

3 IDENTIFICATION OF CAUSAL EFFECTS

In section 2, it is assumed that both an exposure variable and an outcome variable are observable. If either of them is unobserved, we cannot identify the causal effect of an exposure on an outcome even if a set of variables satisfying the back door criterion or the front door criterion is observed. In this section, we consider the case where an unobserved exposure/outcome variable is assumed to be discrete.

Let $X$ be an exposure variable and $Y$ be an outcome variable. Though $X$ or $Y$ is unobserved, researchers are interested in dividing them into $k$ categories. For example, when the domain of $Y$ is divided into $k = 3$ categories, $y_1$, $y_2$ and $y_3$ may represent the poor, fair and good response levels. Then, let $U$ be either $X$ or $Y$ which is an unobserved variable ($u \in \{u_1, \ldots, u_k\}$). In addition, let a set $S$ and a set $T$ be observed proxy variables that are affected by the unobserved variable $U$. Assume that we can select $k$ distinct vectors from the domains of a set $S$ and a set $T$ of variables, denoted as $t_1, \ldots, t_k$ and $s_1, \ldots, s_k$, respectively. A set $W$ and a set $Z$ are assumed to be continuous and/or discrete variables. Furthermore, let $P$ and $Q$ be $k$ dimensional matrices such that

\[
P = \begin{pmatrix}
1 & f(t_1|z) & \cdots & f(t_{k-1}|z) \\
f(s_1|z) & f(s_1, t_1|z) & \cdots & f(s_1, t_{k-1}|z) \\
\vdots & \vdots & \ddots & \vdots \\
f(s_{k-1}|z) & f(s_{k-1}, t_1|z) & \cdots & f(s_{k-1}, t_{k-1}|z)
\end{pmatrix}, \tag{5}
\]

\[
Q = \begin{pmatrix}
f(w|z) & f(w, t_1|z) & \cdots & f(w, t_{k-1}|z) \\
f(w, s_1|z) & f(w, s_1, t_1|z) & \cdots & f(w, s_1, t_{k-1}|z) \\
\vdots & \vdots & \ddots & \vdots \\
f(w, s_{k-1}|z) & f(w, s_{k-1}, t_1|z) & \cdots & f(w, s_{k-1}, t_{k-1}|z)
\end{pmatrix}. \tag{6}
\]

Then, the following theorem is obtained.

THEOREM 1
Given a causal diagram $G$ on $V$ with $S \cup T \cup \{U\} \cup Z \cup W$ ($\subset V$), suppose that
(i) $Z \cup \{U\}$ d-separates $S$ from $T$ and $W$ from $S \cup T$;
(ii) $f(u_1|z) < \cdots < f(u_k|z)$ holds true for any $z$;
(iii) for the matrices defined as equations (5) and (6), both $P$ and $Q$ are $k$ dimensional nonsingular matrices and $|Q - \lambda P| = 0$ has non-zero distinct solutions of $\lambda$ ($0 < \lambda_1 < \cdots < \lambda_k$) for any $z$ ($P \neq Q$),
then the distribution $f(u, w, z)$ is identifiable through the observation of $S \cup T \cup Z \cup W$. □

PROOF OF THEOREM
Let

\[
P_1 = \begin{pmatrix}
1 & f(t_1|u_1, z) & \cdots & f(t_{k-1}|u_1, z) \\
\vdots & \vdots & \ddots & \vdots \\
1 & f(t_1|u_k, z) & \cdots & f(t_{k-1}|u_k, z)
\end{pmatrix},
\qquad
P_2 = \begin{pmatrix}
1 & f(s_1|u_1, z) & \cdots & f(s_{k-1}|u_1, z) \\
\vdots & \vdots & \ddots & \vdots \\
1 & f(s_1|u_k, z) & \cdots & f(s_{k-1}|u_k, z)
\end{pmatrix},
\]

and let $\Lambda = \mathrm{diag}(f(w|u_1, z), \ldots, f(w|u_k, z))$ be the $k$ dimensional diagonal matrix of conditional probabilities of the observed variables $W$ given $U$ and $Z$. In addition, let $M = \mathrm{diag}(f(u_1|z), \ldots, f(u_k|z))$ be the $k$ dimensional diagonal matrix of conditional probabilities of $U$ given $Z$. Then, the following are derived:

\[
\begin{aligned}
f(w|z) &= \sum_{i=1}^{k} f(w|u_i, z) f(u_i|z), \\
f(t_j, s_l|z) &= \sum_{i=1}^{k} f(s_l|u_i, z) f(t_j|u_i, z) f(u_i|z), \\
f(w, t_j, s_l|z) &= \sum_{i=1}^{k} f(w|u_i, z) f(t_j|u_i, z) f(s_l|u_i, z) f(u_i|z)
\end{aligned} \tag{7}
\]

for $j, l = 1, 2, \ldots, k$. Then, we can obtain $P = P_2^{\top} M P_1$ and $Q = P_2^{\top} M \Lambda P_1$.

Thus, by noting that both $P$ and $Q$ are nonsingular matrices of conditional probabilities of observed variables, consider the following equation for $\lambda$:

\[
|Q - \lambda P| = |P_2^{\top} M \Lambda P_1 - \lambda P_2^{\top} M P_1| = |P_2^{\top}||M||\Lambda - \lambda I_k||P_1| = 0, \tag{8}
\]

where $I_k$ is a $k$ dimensional identity matrix. By solving equation (8), we can obtain the element $f(w|u_i, z)$ ($i = 1, 2, \ldots, k$) of $\Lambda$. Here, let $\lambda_i$ be a distinct solution of the above equation satisfying $0 < \lambda_1 < \cdots < \lambda_k$. This means that $\Lambda$ is identifiable if the order of $f(w|u_1, z), \ldots, f(w|u_k, z)$ is known.

Let $E_i = \mathrm{diag}(\alpha_{1(i)}, \ldots, \alpha_{k(i)})$ be a $k$ dimensional diagonal matrix ($i = 1, 2$) and $A_1 = P_1^{-1} E_1$ be a $k$ dimensional matrix. Since $\Lambda$, $E_1$ and $M$ are diagonal matrices and the elements of $\Lambda$ correspond to the solutions of equation (8), we can obtain

\[
Q A_1 = P A_1 \Lambda.
\]

This means that the matrix $A_1 = P_1^{-1} E_1$ is the solution of the characteristic equation

\[
(Q - \lambda P) x = 0_k
\]

for $x$, which indicates that $A_1$ is estimable ($\lambda \in \{\lambda_1, \ldots, \lambda_k\}$). Here, $0_k$ is a $k$ dimensional zero vector. Since $P_1 = E_1 A_1^{-1}$, letting $A_1^{-1} = (a_1^{i,j})$, we can obtain

\[
P_1 = \begin{pmatrix}
1 & f(t_1|u_1, z) & \cdots & f(t_{k-1}|u_1, z) \\
\vdots & \vdots & \ddots & \vdots \\
1 & f(t_1|u_k, z) & \cdots & f(t_{k-1}|u_k, z)
\end{pmatrix}
= E_1 A_1^{-1}
= \begin{pmatrix}
\alpha_{1(1)} a_1^{1,1} & \alpha_{1(1)} a_1^{1,2} & \cdots & \alpha_{1(1)} a_1^{1,k} \\
\alpha_{2(1)} a_1^{2,1} & \alpha_{2(1)} a_1^{2,2} & \cdots & \alpha_{2(1)} a_1^{2,k} \\
\vdots & \vdots & \ddots & \vdots \\
\alpha_{k(1)} a_1^{k,1} & \alpha_{k(1)} a_1^{k,2} & \cdots & \alpha_{k(1)} a_1^{k,k}
\end{pmatrix}.
\]

Then, $\alpha_{i(1)} = 1/a_1^{i,1}$ can be uniquely obtained ($i = 1, 2, \ldots, k$) from the first column, which indicates that $P_1$ is also estimable from $E_1 A_1^{-1}$ according to the order of $\lambda_1, \ldots, \lambda_k$, where $E_1 = \mathrm{diag}(1/a_1^{1,1}, \ldots, 1/a_1^{k,1})$.

On the other hand, letting $A_2 = P_2^{-1} E_2$, since $\Lambda$, $E_2$ and $M$ are diagonal matrices and the elements of $\Lambda$ correspond to the solutions of (8), we can obtain

\[
Q^{\top} A_2 = P^{\top} A_2 \Lambda.
\]

This means that the matrix $A_2 = P_2^{-1} E_2$ is the solution of the characteristic equation

\[
(Q^{\top} - \lambda P^{\top}) x = 0_k
\]

for $x$. Thus, $A_2$ is also estimable ($\lambda \in \{\lambda_1, \ldots, \lambda_k\}$). Since $P_2 = E_2 A_2^{-1}$, letting $A_2^{-1} = (a_2^{i,j})$, we can obtain

\[
P_2 = \begin{pmatrix}
1 & f(s_1|u_1, z) & \cdots & f(s_{k-1}|u_1, z) \\
\vdots & \vdots & \ddots & \vdots \\
1 & f(s_1|u_k, z) & \cdots & f(s_{k-1}|u_k, z)
\end{pmatrix}
= E_2 A_2^{-1}
= \begin{pmatrix}
\alpha_{1(2)} a_2^{1,1} & \alpha_{1(2)} a_2^{1,2} & \cdots & \alpha_{1(2)} a_2^{1,k} \\
\alpha_{2(2)} a_2^{2,1} & \alpha_{2(2)} a_2^{2,2} & \cdots & \alpha_{2(2)} a_2^{2,k} \\
\vdots & \vdots & \ddots & \vdots \\
\alpha_{k(2)} a_2^{k,1} & \alpha_{k(2)} a_2^{k,2} & \cdots & \alpha_{k(2)} a_2^{k,k}
\end{pmatrix}.
\]

Then, $\alpha_{i(2)} = 1/a_2^{i,1}$ can be uniquely obtained ($i = 1, 2, \ldots, k$) from the first column, which indicates that $P_2$ is also estimable from $E_2 A_2^{-1}$ according to the order of $\lambda_1, \ldots, \lambda_k$, where $E_2 = \mathrm{diag}(1/a_2^{1,1}, \ldots, 1/a_2^{k,1})$. From these results, we can obtain

\[
(P_2^{\top})^{-1} P P_1^{-1} = (P_2^{\top})^{-1} (P_2^{\top} M P_1) P_1^{-1} = M. \tag{9}
\]

Thus, we can obtain the element $f(u_i|z)$ ($i = 1, 2, \ldots, k$) of $M$ from equation (9), which is determined uniquely according to the order of the distinct solutions $\lambda_1 < \cdots < \lambda_k$ of equation (8). Conversely, since the order of the elements of $M$ is identifiable from condition (ii), the order of $\lambda_1, \ldots, \lambda_k$ is identifiable. Thus, the conditional distribution of $U$ given $z$ is estimable through the observation of $S \cup T \cup W \cup Z$. Then, since $f(u, z, w) = f(w|u, z) f(u|z) f(z)$, $f(u, z, w)$ is estimable through the observation of $S \cup T \cup W \cup Z$. Q.E.D.

Based on Theorem 1, the following corollary can be derived immediately.

COROLLARY 1
Suppose that one element of $\{X, Y\}$ is an unobserved variable $U$ and the other element is included in a set $Z \cup W$ of observed variables. Let $C$ be a subset of $Z \cup W \setminus \{X, Y\}$ that satisfies the identifiability criteria for the causal effect $f(y|\mathrm{set}(X = x))$. If a set $Z \cup W \cup S \cup T$ of observed variables satisfies conditions (i)-(iii) in Theorem 1, then the causal effect $f(y|\mathrm{set}(X = x))$ is identifiable. □
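The two matrix identities at the heart of the proof, equations (8) and (9), can be checked numerically. The sketch below does this for k = 3 and a single fixed z; every probability in it is a hypothetical value chosen for illustration, and the variable names are not from the paper.

```python
import numpy as np

# Hypothetical latent parameters for k = 3 and one fixed z (made-up numbers).
m = np.array([0.2, 0.3, 0.5])        # f(u_i|z), increasing as in condition (ii)
lam = np.array([0.1, 0.4, 0.7])      # f(w|u_i,z), distinct and non-zero

# Row i is (1, f(t_1|u_i,z), f(t_2|u_i,z)) and (1, f(s_1|u_i,z), f(s_2|u_i,z)).
P1 = np.array([[1.0, 0.6, 0.3],
               [1.0, 0.2, 0.5],
               [1.0, 0.1, 0.1]])
P2 = np.array([[1.0, 0.5, 0.2],
               [1.0, 0.3, 0.3],
               [1.0, 0.1, 0.6]])
M, Lam = np.diag(m), np.diag(lam)

# Observable matrices (5) and (6), built from the latent structure.
P = P2.T @ M @ P1
Q = P2.T @ M @ Lam @ P1

# Equation (8): the roots of |Q - lambda P| = 0 are the eigenvalues of P^{-1} Q.
eigs = np.sort(np.linalg.eigvals(np.linalg.solve(P, Q)).real)

# Equation (9): M is recovered once P1 and P2 are known.
M_rec = np.linalg.inv(P2.T) @ P @ np.linalg.inv(P1)

print(eigs)
print(np.diag(M_rec))
```

The eigenvalues coincide with the diagonal of Λ and the recovered matrix coincides with M, which is what conditions (i) and (iii) are used for in the proof.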

Fig. 1: Causal diagram (1)

Fig. 2: Causal diagram (2)

We use two examples to illustrate Corollary 1. First, consider the causal diagram shown in Fig. 1. Setting $W$ in Corollary 1 to $X$ in Fig. 1, we can recognize that $\{Z, Y\}$ d-separates $S$ from $T$ and $X$ from $\{S, T\}$. In addition, $C = \{Z\}$ satisfies the back door criterion relative to $(X, Y)$. Then, based on the proof of Theorem 1, the distribution of $X$, $Y$ and $Z$ can be constructed according to the distribution of $X$, $Z$, $S$ and $T$. Thus, if conditions (ii) and (iii) hold true, then the causal effect $f(y|\mathrm{set}(X = x))$ is identifiable through the observation of $X$, $S$, $T$ and $Z$. The closed form expression in the case where $Y$ is a dichotomous variable is provided in Kuroki (2007).

Next, consider the causal diagram shown in Fig. 2, where the back door criterion cannot be applied to identify the causal effect $f(y|\mathrm{set}(X = x))$, because there is a bi-directed arrow in Fig. 2 which indicates that there exist some unmeasured confounders between $X$ and $Y$. Letting $U$, $W$ and $Z$ in Corollary 1 be $Y$, $Z$ and $X$ in Fig. 2, we can recognize that $\{X, Y\}$ d-separates $S$ from $T$ and $Z$ from $\{S, T\}$. In addition, $C = \{Z\}$ satisfies the front door criterion relative to $(X, Y)$. Then, based on the proof of Theorem 1, the distribution of $X$, $Y$ and $Z$ can be constructed according to the distribution of $X$, $Z$, $S$ and $T$. Thus, if conditions (ii) and (iii) hold true, then the causal effect $f(y|\mathrm{set}(X = x))$ is identifiable through the observation of $X$, $S$, $T$ and $Z$. This example shows that our result can also be applied to situations where there is no variable that satisfies the back door criterion.

When identifying the causal effect using Theorem 1, it is required that $f(u_1|z) < f(u_2|z) < \cdots < f(u_k|z)$ holds true for any $z$. If such order information is not available, it is impossible to judge whether the causal effect is identifiable or not from Theorem 1. But we can evaluate the bounds of the causal effect. Consider the causal diagram shown in Fig. 1 as an example. By noting that $f(y|x, z) = f(x|y, z) f(y|z)/f(x|z)$ holds true, letting the diagonal elements of $M$ be $m_1, \ldots, m_k$ determined according to the order $\lambda_1 < \cdots < \lambda_k$, the bounds of the causal effect $f(y|\mathrm{set}(X = x))$ are

\[
\sum_{z} \frac{\min_i\{\lambda_i m_i\}}{f(x|z)} f(z) \le f(y|\mathrm{set}(X = x)) \le \sum_{z} \frac{\max_i\{\lambda_i m_i\}}{f(x|z)} f(z).
\]

4 BOUNDS ON CAUSAL EFFECT

4.1 POTENTIAL OUTCOME APPROACH

In this section, we consider the case where there exist unmeasured variables between an unobserved outcome variable and its proxy variables. In such a situation, it is impossible to identify the causal effect, but we can derive bounds on the causal effect by using the potential outcome approach. For simplicity, we only consider the case of an unobserved dichotomous outcome variable, though our result can be applied to the multi-categorical case directly.

Let $X$ and $Y$ be a dichotomous exposure variable ($x \in \{x_0, x_1\}$) and a dichotomous outcome variable ($y \in \{y_0, y_1\}$). Then, the $i$th of the $N$ subjects has both an outcome $Y_{x_1}(i)$ that would have resulted if he had been exposed to $x_1$, and an outcome $Y_{x_0}(i)$ that would have resulted if he had been exposed to $x_0$. When the $N$ subjects in the study are considered as a random sample from the target population, since $Y_{x_1}(i)$ and $Y_{x_0}(i)$ can be referred to as the values of random variables $Y_{x_1}$ and $Y_{x_0}$ respectively, the causal effect can be defined as the probability $P(Y_x = y) = f(y_x)$ of the potential outcome ($x \in \{x_0, x_1\}$). The potential outcome $Y_x$ is observed only if the subject receives exposure $x$ ($x \in \{x_1, x_0\}$). Thus, when a randomized experiment is conducted and compliance is perfect, the causal effect of $X$ on $Y$ is

\[
f(y_x) = f(y|\mathrm{set}(X = x)) = f(y|x), \tag{10}
\]

by using the consistency condition (Pearl, 2000)

\[
X = x \Longrightarrow Y_x = Y.
\]

On the other hand, when a randomized experiment is difficult to conduct and only observational data are available, we can still estimate the causal effect according to the strongly-ignorable-treatment-assignment (SITA) condition (Rosenbaum and Rubin, 1983). That is, for the exposure variable $X$, if there exists a set $Z$ of covariates such that $X$ is conditionally independent of $(Y_{x_1}, Y_{x_0})$ given $Z$, denoted as $X \perp\!\!\!\perp (Y_{x_1}, Y_{x_0}) \mid Z$, we shall say that treatment assignment is strongly ignorable given $Z$, or that $Z$ satisfies the SITA condition. Thus, $f(y|\mathrm{set}(X = x))$ is estimable by using $Z$ and is given as equation (3).


4.2 FORMULATION

In order to describe our problem, we consider the simple causal diagram shown in Fig. 3, where $X$, $S$ and $T$ are observed dichotomous variables ($x \in \{x_0, x_1\}$, $s \in \{s_0, s_1\}$, $t \in \{t_0, t_1\}$). In addition, $Y$ is an unobserved dichotomous variable ($y \in \{y_0, y_1\}$). Furthermore, there is no confounder between $X$ and $Y$, but there exist unmeasured variables between $Y$, $S$ and $T$.

Fig. 3: Causal diagram (3)

Then, the potential outcomes corresponding to this figure can be introduced as follows:

(i) Potential outcome $R_t$ in the context of $Y$ as an exposure and $T$ as an outcome:
$r_{t0}: (T_{y_0}, T_{y_1}) = (t_0, t_0)$, $r_{t1}: (T_{y_0}, T_{y_1}) = (t_0, t_1)$,
$r_{t2}: (T_{y_0}, T_{y_1}) = (t_1, t_0)$, $r_{t3}: (T_{y_0}, T_{y_1}) = (t_1, t_1)$,

(ii) Potential outcome $R_s$ in the context of $Y$ as an exposure and $S$ as an outcome:
$r_{s0}: (S_{y_0}, S_{y_1}) = (s_0, s_0)$, $r_{s1}: (S_{y_0}, S_{y_1}) = (s_0, s_1)$,
$r_{s2}: (S_{y_0}, S_{y_1}) = (s_1, s_0)$, $r_{s3}: (S_{y_0}, S_{y_1}) = (s_1, s_1)$,

(iii) Potential outcome $R_y$ in the context of $X$ as an exposure and $Y$ as an outcome:
$r_{y0}: (Y_{x_0}, Y_{x_1}) = (y_0, y_0)$, $r_{y1}: (Y_{x_0}, Y_{x_1}) = (y_0, y_1)$,
$r_{y2}: (Y_{x_0}, Y_{x_1}) = (y_1, y_0)$, $r_{y3}: (Y_{x_0}, Y_{x_1}) = (y_1, y_1)$.

Letting $q_{ijk} = P(r_{ti}, r_{sj}, r_{yk})$ be counterfactual probabilities ($i, j, k = 0, 1, 2, 3$), these parameters are constrained by the probabilistic equality

\[
\sum_{i=0}^{3} \sum_{j=0}^{3} \sum_{k=0}^{3} q_{ijk} = 1 \quad \text{and} \quad 0 \le q_{ijk} \le 1. \tag{11}
\]

Let $p_{ij \cdot k}$ be the observed conditional probabilities of $(T, S) = (t_i, s_j)$ given $X = x_k$, that is, $p_{ij \cdot k} = P(t_i, s_j|x_k)$. These observed conditional probabilities impose the following constraints, obtained by applying both the consistency condition and $X \perp\!\!\!\perp (S_{y_0}, S_{y_1}, T_{y_0}, T_{y_1}, Y_{x_0}, Y_{x_1})$ to the counterfactual probabilities:

\[
\begin{aligned}
p_{00 \cdot 1} &= \sum_{i=0,1} \sum_{j=0,1} \sum_{k=0,2} q_{ijk} + \sum_{i=0,2} \sum_{j=0,2} \sum_{k=1,3} q_{ijk}, \\
p_{01 \cdot 1} &= \sum_{i=0,1} \sum_{j=2,3} \sum_{k=0,2} q_{ijk} + \sum_{i=0,2} \sum_{j=1,3} \sum_{k=1,3} q_{ijk}, \\
p_{10 \cdot 1} &= \sum_{i=2,3} \sum_{j=0,1} \sum_{k=0,2} q_{ijk} + \sum_{i=1,3} \sum_{j=0,2} \sum_{k=1,3} q_{ijk}, \\
p_{11 \cdot 1} &= \sum_{i=2,3} \sum_{j=2,3} \sum_{k=0,2} q_{ijk} + \sum_{i=1,3} \sum_{j=1,3} \sum_{k=1,3} q_{ijk}, \\
p_{00 \cdot 0} &= \sum_{i=0,1} \sum_{j=0,1} \sum_{k=0,1} q_{ijk} + \sum_{i=0,2} \sum_{j=0,2} \sum_{k=2,3} q_{ijk}, \\
p_{01 \cdot 0} &= \sum_{i=0,1} \sum_{j=2,3} \sum_{k=0,1} q_{ijk} + \sum_{i=0,2} \sum_{j=1,3} \sum_{k=2,3} q_{ijk}, \\
p_{10 \cdot 0} &= \sum_{i=2,3} \sum_{j=0,1} \sum_{k=0,1} q_{ijk} + \sum_{i=1,3} \sum_{j=0,2} \sum_{k=2,3} q_{ijk}, \\
p_{11 \cdot 0} &= \sum_{i=2,3} \sum_{j=2,3} \sum_{k=0,1} q_{ijk} + \sum_{i=1,3} \sum_{j=1,3} \sum_{k=2,3} q_{ijk}.
\end{aligned} \tag{12}
\]

Then, the quantities we wish to bound are:

\[
f(y_1|\mathrm{set}(X = x_1)) = \sum_{i=0}^{3} \sum_{j=0}^{3} \sum_{k=1,3} q_{ijk}, \tag{13}
\]

\[
f(y_1|\mathrm{set}(X = x_0)) = \sum_{i=0}^{3} \sum_{j=0}^{3} \sum_{k=2,3} q_{ijk}. \tag{14}
\]

Optimizing the functions (13) and (14), subject to the equality constraints (11) and (12), defines a linear programming (LP) problem that lends itself to closed-form solution. Balke (1995) describes a computer program that takes a symbolic description of LP problems and returns symbolic expressions for the desired bounds. The program works by systematically enumerating the vertices of the constraint polygon of the dual problem. The bounds reported in this paper were produced by using Balke's program, and will be stated here without proofs; their correctness can be verified by manually enumerating the vertices as described in Balke (1995). These bounds are guaranteed to be sharp because the optimization is global.

Given the observed conditional probabilities, the constraints (11) and (12) induce the bounds [0, 1], which indicates that the causal knowledge available from Fig. 3 cannot provide a useful evaluation of the causal effect. However, if we adopt the monotonic assumption, which leads to

\[
q_{2jk} = q_{i2k} = q_{ij2} = 0, \qquad i, j, k = 0, 1, 2, 3,
\]

then we can obtain the tightest bounds on the causal effects:

\[
0 \le f(y_1|\mathrm{set}(X = x_0)) \le \min \left\{
\begin{array}{l}
p_{01 \cdot 0} + p_{10 \cdot 0} + p_{11 \cdot 0} + p_{00 \cdot 1} \\
p_{01 \cdot 0} + p_{11 \cdot 0} + p_{10 \cdot 1} + p_{00 \cdot 1} \\
p_{10 \cdot 0} + p_{11 \cdot 0} + p_{01 \cdot 1} + p_{00 \cdot 1} \\
p_{11 \cdot 0} + p_{00 \cdot 1} + p_{10 \cdot 1} + p_{01 \cdot 1}
\end{array}
\right\}, \tag{15}
\]

\[
\max \left\{
\begin{array}{l}
p_{00 \cdot 0} - p_{00 \cdot 1} \\
p_{11 \cdot 1} - p_{11 \cdot 0} \\
p_{00 \cdot 0} + p_{10 \cdot 0} - p_{00 \cdot 1} - p_{10 \cdot 1} \\
p_{00 \cdot 0} + p_{01 \cdot 0} - p_{00 \cdot 1} - p_{01 \cdot 1}
\end{array}
\right\} \le f(y_1|\mathrm{set}(X = x_1)) \le 1. \tag{16}
\]
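The mapping from counterfactual probabilities to observed probabilities in (12), and the closed-form bounds (15) and (16), can be sketched in a few lines. The code below is only an illustration: the function names, the integer encoding of response types and the randomly generated q are assumptions made here, not part of the paper or of Balke's program.

```python
import numpy as np


def response(r, parent):
    """Value taken under response type r (0..3) when the parent equals `parent`.

    Type r encodes the pair (value if parent=0, value if parent=1):
    0 -> (0, 0), 1 -> (0, 1), 2 -> (1, 0), 3 -> (1, 1).
    """
    return (r >> 1) if parent == 0 else (r & 1)


def observed_from_q(q):
    """Constraints (12): map q[i, j, k] = P(rt_i, rs_j, ry_k) to p[t, s, x] = P(t, s | x)."""
    p = np.zeros((2, 2, 2))
    for rt in range(4):
        for rs in range(4):
            for ry in range(4):
                for x in range(2):
                    y = response(ry, x)          # consistency: Y = Y_x
                    p[response(rt, y), response(rs, y), x] += q[rt, rs, ry]
    return p


def monotone_bounds(p):
    """Evaluate (15) and (16) as stated: (upper bound for x0, lower bound for x1)."""
    upper_x0 = min(
        p[0, 1, 0] + p[1, 0, 0] + p[1, 1, 0] + p[0, 0, 1],
        p[0, 1, 0] + p[1, 1, 0] + p[1, 0, 1] + p[0, 0, 1],
        p[1, 0, 0] + p[1, 1, 0] + p[0, 1, 1] + p[0, 0, 1],
        p[1, 1, 0] + p[0, 0, 1] + p[1, 0, 1] + p[0, 1, 1],
    )
    lower_x1 = max(
        p[0, 0, 0] - p[0, 0, 1],
        p[1, 1, 1] - p[1, 1, 0],
        p[0, 0, 0] + p[1, 0, 0] - p[0, 0, 1] - p[1, 0, 1],
        p[0, 0, 0] + p[0, 1, 0] - p[0, 0, 1] - p[0, 1, 1],
    )
    return upper_x0, lower_x1


# Hypothetical q obeying the monotonic assumption (no type-2 responses).
rng = np.random.default_rng(0)
q = rng.random((4, 4, 4))
q[2, :, :] = q[:, 2, :] = q[:, :, 2] = 0.0
q /= q.sum()

p = observed_from_q(q)
upper_x0, lower_x1 = monotone_bounds(p)
effect_x1 = q[:, :, [1, 3]].sum()    # equation (13)
effect_x0 = q[:, :, [2, 3]].sum()    # equation (14)
print(lower_x1, effect_x1, effect_x0, upper_x0)
```

For any q that satisfies the monotonic assumption, the true effects computed from (13) and (14) should fall inside the intervals returned by `monotone_bounds`, which gives a quick sanity check on a transcription of the formulas.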

It is noted that these formulas require two proxy variables $S$ and $T$. If only one proxy variable is available, the tightest bounds on the causal effects become [0, 1], which shows that one proxy variable provides no useful information for evaluating the causal effect.

Finally, we consider a more complicated situation in which there are confounders between $X$ and $Y$. In this case, if we can observe a set $Z$ of covariates that satisfies the SITA condition, and $X \perp\!\!\!\perp (S_{y_0}, S_{y_1}, T_{y_0}, T_{y_1}, Y_{x_0}, Y_{x_1}) \mid Z$ holds true, then by the same procedure as above, the bounds on the causal effects can be evaluated as

\[
0 \le f(y_1|\mathrm{set}(X = x_0)) \le \sum_{z} \min \left\{
\begin{array}{l}
p_{01 \cdot 0z} + p_{10 \cdot 0z} + p_{11 \cdot 0z} + p_{00 \cdot 1z} \\
p_{01 \cdot 0z} + p_{11 \cdot 0z} + p_{10 \cdot 1z} + p_{00 \cdot 1z} \\
p_{10 \cdot 0z} + p_{11 \cdot 0z} + p_{01 \cdot 1z} + p_{00 \cdot 1z} \\
p_{11 \cdot 0z} + p_{00 \cdot 1z} + p_{10 \cdot 1z} + p_{01 \cdot 1z}
\end{array}
\right\} P(z),
\]

\[
\sum_{z} \max \left\{
\begin{array}{l}
p_{00 \cdot 0z} - p_{00 \cdot 1z} \\
p_{11 \cdot 1z} - p_{11 \cdot 0z} \\
p_{00 \cdot 0z} + p_{10 \cdot 0z} - p_{00 \cdot 1z} - p_{10 \cdot 1z} \\
p_{00 \cdot 0z} + p_{01 \cdot 0z} - p_{00 \cdot 1z} - p_{01 \cdot 1z}
\end{array}
\right\} P(z) \le f(y_1|\mathrm{set}(X = x_1)) \le 1.
\]

By a similar procedure, we can also derive the bounds on the causal effect when there exist unmeasured variables between an unobserved exposure variable and its proxy variables.

5 EXAMPLE

We illustrate our results using the data from a political action study reported by Hagenaars (1993). He analyzed the data in order to evaluate the causal effect of education on political involvement. The variables of interest are the following:
X : education (1: some college; 2: less than college),
S : ideological level (1: ideologues; 2: nonideologues),
T : repression potential (1: low; 2: high),
Y : political involvement (1: high; 2: low).
Here, we concentrate our discussion on evaluating the causal effect of $X$ on $Y$, where $Y$ is unobserved. In order to help readers understand our results, we consider a submodel in Hagenaars (1993), which is shown in Fig. 3.

First, we consider the situation where there is no bidirected arrow in Fig. 3. Since $Y$ d-separates any two vertices in $\{S, T, X\}$, and there are no unmeasured variables between $X$ and $Y$, Theorem 1 can be used to achieve our aim. Because the real data of this model are not available, we generate hypothetical data according to Fig. 3, which are shown in Table 1.

Table 1. Hypothetical Data of the Example

              x1                  x0
         s1       s0         s1       s0
t1     0.0648   0.1392     0.1092   0.1568
t0     0.0432   0.0528     0.2478   0.1862

In this example, we suppose that $f(u_1) < f(u_2)$. Then, letting

\[
P = \begin{pmatrix} 1.000 & f(t_1) \\ f(s_1) & f(s_1, t_1) \end{pmatrix}
= \begin{pmatrix} 1.000 & 0.47 \\ 0.465 & 0.174 \end{pmatrix},
\qquad
Q = \begin{pmatrix} f(x_1) & f(t_1, x_1) \\ f(s_1, x_1) & f(s_1, t_1, x_1) \end{pmatrix}
= \begin{pmatrix} 0.3 & 0.204 \\ 0.108 & 0.0648 \end{pmatrix},
\]

the eigenvalues of $P^{-1} Q$ and $(P^{\top})^{-1} Q^{\top}$ are 0.533 and 0.109, and the corresponding eigenmatrices are

\[
A_1 = \begin{pmatrix} 0.196 & -0.625 \\ -0.981 & 0.781 \end{pmatrix},
\qquad
A_2 = \begin{pmatrix} 0.514 & -0.287 \\ -0.857 & 0.958 \end{pmatrix}
\]

for $P^{-1} Q$ and $(P^{\top})^{-1} Q^{\top}$ respectively. Thus, letting $E_i = \mathrm{diag}(\alpha_{1(i)}, \alpha_{2(i)})$ and

\[
P_1 = \begin{pmatrix} 1.000 & f(t_1|y_1) \\ 1.000 & f(t_1|y_2) \end{pmatrix},
\qquad
P_2 = \begin{pmatrix} 1.000 & f(s_1|y_1) \\ 1.000 & f(s_1|y_2) \end{pmatrix},
\]

since we can provide $E_1 = \mathrm{diag}(-0.588, -0.469)$ and $E_2 = \mathrm{diag}(0.257, 0.287)$, noting that $P_1 = E_1 A_1^{-1}$ and $P_2 = E_2 A_2^{-1}$, the following can be derived:

\[
P_1 = \begin{pmatrix} 1.000 & 0.800 \\ 1.000 & 0.200 \end{pmatrix},
\qquad
P_2 = \begin{pmatrix} 1.000 & 0.300 \\ 1.000 & 0.600 \end{pmatrix}.
\]

Thus, $M = \mathrm{diag}(f(u_1), f(u_2)) = (P_2^{\top})^{-1} P P_1^{-1} = \mathrm{diag}(0.45, 0.55)$ and the causal effects of $X$ on $Y$ are $f(y_1|\mathrm{set}(X = x_1)) = 0.533 \times 0.45/0.3 = 0.8$ and $f(y_1|\mathrm{set}(X = x_0)) = 1.000 - (1.000 - 0.109) \times 0.55/0.7 = 0.3$, respectively.

Second, we consider the situation where there are bidirected arrows shown in Fig. 3. In this situation, we cannot evaluate the causal effects of $X$ on $Y$ by Theorem 1. However, under the monotonic assumption, we can provide the tightest bounds on the causal effects as

\[
0.000 \le P(y_1|\mathrm{set}(X = x_0)) \le \min\{0.567, 0.549, 0.384, 0.478\} = 0.384,
\]

\[
\max\{0.205, -0.044, 0.151, 0.338\} = 0.338 \le P(y_1|\mathrm{set}(X = x_1)) \le 1.000,
\]

based on equations (15) and (16).
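The first part of the example can be retraced numerically from Table 1. The sketch below follows the steps of the proof of Theorem 1 for k = 2, with P1 built from T and P2 built from S as in Section 3; the helper `latent_rows` and the array layout are assumptions introduced here for illustration, not the authors' code.

```python
import numpy as np

# Table 1 as joint probabilities f(t, s, x); index 1 means level "1", index 0 level "0".
f = np.zeros((2, 2, 2))                                  # indexed [t, s, x]
f[1, 1, 1], f[1, 0, 1], f[1, 1, 0], f[1, 0, 0] = 0.0648, 0.1392, 0.1092, 0.1568
f[0, 1, 1], f[0, 0, 1], f[0, 1, 0], f[0, 0, 0] = 0.0432, 0.0528, 0.2478, 0.1862

# Observable matrices P and Q of the example (no covariate Z, W = X at level x1).
P = np.array([[1.0, f[1].sum()],
              [f[:, 1].sum(), f[1, 1].sum()]])
Q = np.array([[f[:, :, 1].sum(), f[1, :, 1].sum()],
              [f[:, 1, 1].sum(), f[1, 1, 1]]])


def latent_rows(A, B):
    """Eigen-decompose A^{-1} B; return the sorted eigenvalues and the latent
    matrix whose rows are rescaled so that the first column equals 1."""
    lam, vecs = np.linalg.eig(np.linalg.solve(A, B))
    order = np.argsort(lam.real)
    inv = np.linalg.inv(vecs.real[:, order])
    return lam.real[order], inv / inv[:, [0]]


lam, P1 = latent_rows(P, Q)           # rows (1, f(t1|u))
_, P2 = latent_rows(P.T, Q.T)         # rows (1, f(s1|u)), same eigenvalue order
m = np.diag(np.linalg.inv(P2.T) @ P @ np.linalg.inv(P1))   # equation (9)

order = np.argsort(m)                 # condition (ii): f(u1) < f(u2)
lam, m = lam[order], m[order]

f_x1 = f[:, :, 1].sum()
effect_x1 = lam[0] * m[0] / f_x1                  # f(y1|set(X = x1))
effect_x0 = (1.0 - lam[0]) * m[0] / (1.0 - f_x1)  # f(y1|set(X = x0))
print(lam, m, effect_x1, effect_x0)
```

Because Y is the only parent of the proxies and X has no confounders in this situation, the effects reduce to f(y1|x) = f(x|y1) f(y1)/f(x), which is what the last two lines compute.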

6 DISCUSSION

This paper derived the graphical identifiability criteria for causal effects based on causal modeling in observational studies with an unobserved multi-categorical exposure/outcome variable. In addition, when unmeasured variables exist between an unobserved outcome variable and its proxy variables, we provided the tightest bounds on causal effects by using Balke's LP program method. The results of this paper enable us to evaluate causal effects when it is difficult to observe an exposure/outcome variable.

ACKNOWLEDGEMENT

This research was partly supported by the Kurata Foundation, the Mazda Foundation and the Ministry of Education, Culture, Sports, Science and Technology of Japan.

REFERENCES

Balke, A. (1995). Probabilistic Counterfactuals: Semantics, Computation, and Applications. UCLA Cognitive Systems Laboratory, Technical Report.

Balke, A. and Pearl, J. (1997). Bounds on treatment effects from studies with imperfect compliance. Journal of the American Statistical Association, 92, 1171-1176.

Bollen, K. A. (1989). Structural equations with latent variables. John Wiley & Sons.

Bross, I. D. J. (1958). How to use ridit analysis. Biometrics, 14, 18-38.

Finch, B. K. (2003). Socioeconomic gradients and low birth-weight: empirical and policy considerations. Health Services Research, 38, 1819-1842.

Fleiss, J. L., Chilton, N. W. and Wallenstein, S. (1976). Ridit analysis in dental clinical studies. Journal of Dental Research, 58, 2080-2084.

Geiger, D., Verma, T. S. and Pearl, J. (1990). Identifying independence in Bayesian networks. Networks, 20, 507-34.

Goodman, L. A. (1973). The analysis of multidimensional contingency tables when some variables are posterior to others: a modified path analysis approach. Biometrika, 60, 179-92.

Goodman, L. A. (1974a). The analysis of systems of qualitative variables when some of the variables are unobservable: A modified path analysis approach. American Journal of Sociology, 79, 1179-259.

Goodman, L. A. (1974b). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61, 215-31.

Hagenaars, J. A. (1993). Loglinear models with latent variables. Sage Publications.

Hoffmann, B. and Jockel, K. H. (2006). Diesel exhaust and coal mine dust: Lung cancer risk in occupational settings. Annals of the New York Academy of Sciences, 1076, 253-65.

Kuroki, M. (2007). Graphical identifiability criteria for causal effects in studies with an unobserved treatment/response variable. Biometrika, accepted.

Kuroki, M., Cai, Z. and Motogaito, H. (2005). Graphical identifiability criteria for total effects by using surrogate variables. Proceeding of the 21st Conference on Uncertainty in Artificial Intelligence, 340-345.

Pearl, J. (1988). Probabilistic reasoning in intelligence systems. Morgan Kaufmann Publishers.

Pearl, J. (2000). Causality: models, reasoning, and inference. Cambridge University Press.

Rosenbaum, P. and Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41-55.

Shpitser, I. and Pearl, J. (2006). Identification of joint interventional distributions in recursive semi-Markovian causal models. Proceedings of the Twenty-First National Conference on Artificial Intelligence, 1219-1226.

Tian, J. and Pearl, J. (2002). A general identification condition for causal effects. Proceeding of the 18th National Conference on Artificial Intelligence, 567-73.

Wold, H. O. (1954). Causality and econometrics. Econometrica, 22, 162-177.

Wright, S. (1923). The theory of path coefficients: a reply to Niles' criticism. Genetics, 8, 239-255.

Wright, S. (1934). The method of path coefficients. Annals of Mathematical Statistics, 5, 161-215.