On Identifying Total Effects in the Presence of Latent Variables and Selection Bias

Zhihong Cai
Department of Biostatistics, School of Public Health, Kyoto University
cai@pbh.med.kyoto-u.ac.jp

Manabu Kuroki
Department of Systems Innovation, School of Engineering Science, Osaka University
mkuroki@sigmath.es.osaka-u.ac.jp

Abstract

Assume that cause-effect relationships between variables can be described as a directed acyclic graph and the corresponding linear structural equation model. We consider the identification problem of total effects in the presence of latent variables and selection bias between a treatment variable and a response variable. Pearl and his colleagues provided the back door criterion, the front door criterion (Pearl, 2000) and the conditional instrumental variable method (Brito and Pearl, 2002) as identifiability criteria for total effects in the presence of latent variables, but not in the presence of selection bias. To solve this problem, we propose new graphical identifiability criteria for total effects based on identifiable factor models. The results of this paper are useful for identifying total effects in observational studies and provide a new viewpoint on the identification conditions of factor models.

1 INTRODUCTION

The evaluation of total effects from observational studies is one of the central aims in many fields of practical science. In observational studies, there may exist latent variables, for example, a variable measured with error, or an unmeasured confounder. On the other hand, observational data may suffer from selection bias if a sample is selected according to some selection criterion. The existence of latent variables and selection bias hinders the evaluation of total effects from observational data.

Many researchers have provided approaches to deal with latent variables in observational studies (Brito and Pearl, 2002; Pearl, 2000; Stanghellini, 2004; Stanghellini and Wermuth, 2005; Tian, 2004). Recently, selection bias has attracted attention from epidemiologists (Greenland, 2003; Hernan et al., 2004), AI researchers (Cooper, 2000) and statisticians (Stanghellini and Wermuth, 2005; Kuroki and Cai, 2006).

In observational studies, it is not rare that both latent variables and selection bias exist in one dataset. However, most current results deal with either latent variables or selection bias separately; only a few take both into account at the same time. In the presence of latent common causes between a treatment and a response, Pearl and his colleagues provided the back door criterion, the front door criterion (Pearl, 2000) and the conditional instrumental variable (IV) method (Brito and Pearl, 2002) as identifiability criteria for total effects in the framework of linear structural equation models. In addition, in the framework of nonparametric structural equation models, Shpitser and Pearl (2006) and Huang and Valtorta (2006) solved the identification problem of causal effects and provided complete algorithms to derive the causal effects in the presence of latent variables. In general, these criteria are based on the idea that some observed variables which have no direct association with the latent common causes are used to evaluate the total effects. However, in many practical studies, such latent variables may have an effect on some important observed variables that would be used to identify total effects. In such situations, it is difficult to apply these identification criteria to evaluate the total effects.

On the other hand, when both confounding bias and selection bias may be at work, Spirtes et al. (1999) described the FCI algorithm (Spirtes et al., 2000) as a method to test whether there is a causal path from one variable to another. In addition, Richardson and Spirtes (2002) introduced ancestral graph models as a graphical model in the presence of latent variables and selection bias, and clarified some of their properties. However, these studies focused on the specification problem of causal structure, but not on the identification problem of total effects.

In this paper, we assume that cause-effect relationships between variables can be described as a directed acyclic graph and the corresponding linear structural equation model. Then, we consider the problem of identifying total effects from observational studies with latent variables and selection bias between a treatment variable and a response variable. Based on the theory of the identifiable factor model, we propose new graphical identifiability criteria to identify total effects in situations where it is difficult to use the identifiability criteria provided by Pearl and his colleagues and by Huang and Valtorta (2006). Unlike the identification problem of factor models, we are interested in evaluating the total effects but not the whole causal model. That is, it will be shown in Section 3 that there are some situations where the total effect is identifiable even when the whole causal model is not identifiable. These new criteria are useful for identifying total effects in observational studies, and they also provide a new viewpoint on the identification conditions of factor models.

2 PRELIMINARIES

In statistical causal analysis, a directed acyclic graph that represents cause-effect relationships is called a path diagram. A directed graph is a pair $G = (\boldsymbol{V}, \boldsymbol{E})$, where $\boldsymbol{V}$ is a finite set of vertices and the set $\boldsymbol{E}$ of arrows is a subset of the set $\boldsymbol{V} \times \boldsymbol{V}$ of ordered pairs of distinct vertices. For graph theoretic terminology used in this paper, see, for example, Lauritzen (1996).

Suppose a directed acyclic graph $G = (\boldsymbol{V}, \boldsymbol{E})$ with a set $\boldsymbol{V} = \{V_1, V_2, \dots, V_n\}$ of variables is given. The graph $G$ is called a path diagram when each child-parent family in the graph $G$ represents a linear structural equation model

$$V_i = \sum_{V_j \in \mathrm{pa}(V_i)} \alpha_{v_i v_j} V_j + \epsilon_{v_i}, \qquad i = 1, \dots, n, \tag{1}$$

where $\mathrm{pa}(V_i)$ is the set of parents of $V_i$. In this paper, unless stated otherwise, $\epsilon_{v_1}, \dots, \epsilon_{v_n}$ are assumed to be independent and normally distributed with mean 0. In addition, $\alpha_{v_i v_j}$ ($\neq 0$) is called a path coefficient.

The conditional independence induced from the set of equations (1) can be obtained from the graph $G$ according to d-separation (Pearl, 2000); that is, when $\boldsymbol{Z}$ d-separates $X$ from $Y$ in a path diagram $G$, $X$ is conditionally independent of $Y$ given $\boldsymbol{Z}$ in the corresponding linear structural equation model (e.g. Spirtes et al., 2000). In this paper, it is assumed that a path diagram $G$ and the corresponding joint distribution are faithful to each other; that is, the conditional independence relationship in the joint distribution is also reflected in $G$, and vice versa (Spirtes et al., 2000).

Here, we introduce some notation for further discussion. Let $\sigma_{xy\cdot z\underline{s}} = \mathrm{cov}(X, Y \mid \boldsymbol{Z} = \boldsymbol{z}, \boldsymbol{a} \le \boldsymbol{S} \le \boldsymbol{b})$, $\sigma_{yy\cdot z\underline{s}} = \mathrm{var}(Y \mid \boldsymbol{Z} = \boldsymbol{z}, \boldsymbol{a} \le \boldsymbol{S} \le \boldsymbol{b})$ and $\beta_{yx\cdot z\underline{s}} = \sigma_{xy\cdot z\underline{s}} / \sigma_{xx\cdot z\underline{s}}$ ($\underline{s}$ indicates that each element of $\boldsymbol{S}$ is conditioned by the interval). For disjoint sets $\boldsymbol{X}$, $\boldsymbol{Y}$, $\boldsymbol{Z}$ and $\boldsymbol{S}$, let $\Sigma_{xy\cdot z\underline{s}}$ be the conditional covariance matrix of $\boldsymbol{X}$ and $\boldsymbol{Y}$ given $\boldsymbol{Z} = \boldsymbol{z}$ and $\boldsymbol{a} \le \boldsymbol{S} \le \boldsymbol{b}$. We use the same notation in the case where either $\boldsymbol{X}$ or $\boldsymbol{Y}$ is univariate. In addition, let $\Sigma_{yy\cdot x\underline{s}}$ be the conditional covariance matrix of $\boldsymbol{Y}$ given $\boldsymbol{X} = \boldsymbol{x}$ and $\boldsymbol{a} \le \boldsymbol{S} \le \boldsymbol{b}$. When $\boldsymbol{S}$ or $\boldsymbol{Z}$ is an empty set, it is omitted from these arguments. Furthermore, let $B_{yx\cdot z} = \Sigma_{yx\cdot z} \Sigma_{xx\cdot z}^{-1}$ be the regression coefficient matrix of $\boldsymbol{x}$ in the regression model of $\boldsymbol{Y}$ on $\boldsymbol{x} \cup \boldsymbol{z}$. Similar notation is used for other parameters.

A total effect $\tau_{yx}$ of $X$ on $Y$ is defined as the total sum of the products of the path coefficients on the sequence of arrows along all directed paths from $X$ to $Y$. In this paper, it is assumed that the readers are familiar with the identifiability criteria for total effects, for example, the back door criterion, the front door criterion (Pearl, 2000) and the IV method (Bowden and Turkington, 1984; Brito and Pearl, 2002). When a total effect can be determined uniquely from the covariance parameters of observed variables, it is said to be identifiable; that is, it can be estimated consistently. When $\boldsymbol{Z}$ d-separates $X$ from $Y$ in a path diagram $G$, both $\sigma_{xy\cdot z} = \beta_{yx\cdot z} = 0$ and $B_{yz\cdot x} = B_{yz}$ hold true (e.g. Spirtes et al., 2000).

3 IDENTIFICATION OF TOTAL EFFECTS

3.1 LEMMA

To derive new graphical identifiability criteria for total effects, we first introduce the following lemmas.

LEMMA 1
When $\{X, Y\} \cup \boldsymbol{S} \cup \boldsymbol{T}$ are normally distributed,

$$\beta_{yx\cdot s} = \beta_{yx\cdot st} + B_{yt\cdot xs} B_{tx\cdot s}, \tag{2}$$

$$\sigma_{yy\cdot xs} = \sigma_{yy\cdot x} - B_{ys\cdot x} \Sigma_{ss\cdot x} B_{ys\cdot x}'. \tag{3}$$

Equations (2) and (3) are the results of Cochran (1938) and Whittaker (1990), respectively. In addition, the following lemma is given by Wermuth (1989).

LEMMA 2
When $\{X, Y\} \cup \boldsymbol{S} \cup \boldsymbol{T}$ are normally distributed, if $\boldsymbol{T}$ is conditionally independent of $X$ given $\boldsymbol{S}$, or $Y$ is conditionally independent of $\boldsymbol{T}$ given $\{X\} \cup \boldsymbol{S}$, then $\beta_{yx\cdot st} = \beta_{yx\cdot s}$ holds true. In addition, if $\boldsymbol{T}$ is conditionally independent of $Y$ given $\boldsymbol{S} \cup \{X\}$, then $\sigma_{yy\cdot xst} = \sigma_{yy\cdot xs}$ holds true.

3.2 DUALITY BETWEEN LATENT VARIABLES AND SELECTION BIAS

In this section, we consider two different situations: one where a latent variable exists, shown in Fig. 1(a); the other where selection bias exists, shown in Fig. 1(b). Here $\boldsymbol{X} = (X_1, \dots, X_p)$ is a set of observed variables, and $U$ is a latent variable which has an effect on $\boldsymbol{X}$. In addition, Fig. 1(b) indicates that the data have been observed according to the selection criterion $a \le S \le b$ (both $a$ and $b$ are possible values of $S$).

[Fig. 1: Graphical representation. (a) Latent Variable Case; (b) Selection Bias Case]

Regarding Fig. 1(a), the corresponding linear structural equation model can be provided as

$$X_i = \alpha_{x_i u} U + \epsilon_{x_i}, \qquad i = 1, \dots, p, \tag{4}$$

which is called a single factor model (with correlated errors). Then, the covariance matrix of $\boldsymbol{X}$ can be provided as

$$\Sigma_{xx} = \Sigma_{xx\cdot u} + B_{xu} B_{xu}' \sigma_{uu} = \Sigma_{xx\cdot u} + \lambda(u)\lambda(u)', \tag{5}$$

where $\lambda(u)$ is a $p$ dimensional vector, which is called a factor loading in this paper. Here, it should be noted that $\Sigma_{xx}$ can be observed but $\sigma_{uu}$, $\Sigma_{xx\cdot u}$ or $\Sigma_{xu}$ cannot be observed. On the other hand, regarding $\Sigma_{xx}^{-1}$, by the Sherman-Morrison-Woodbury formula for matrix inversion (Rao, 1973), the inverse matrix of $\Sigma_{xx}$ can be represented in the form

$$\Sigma_{xx}^{-1} = \Sigma_{xx\cdot u}^{-1} - \delta(u)\delta(u)', \tag{6}$$

where $\delta(u)$ is a $p$ dimensional vector.

Regarding Fig. 1(b), the corresponding linear structural equation model can be provided as

$$S = \sum_{j=1}^{p} \alpha_{s x_j} X_j + \epsilon_s. \tag{7}$$

Then, the covariance matrix of the selected population can be provided as

$$\Sigma_{xx\cdot\underline{s}} = \Sigma_{xx} - B_{xs} B_{xs}' (\sigma_{ss} - \sigma_{ss\cdot\underline{s}}) = \Sigma_{xx} - \gamma(s)\gamma(s)', \tag{8}$$

where $\gamma(s)$ is a $p$ dimensional vector (Johnson and Kotz, 1972). Here, $\tilde{\sigma}_{ss} = \sigma_{ss} - \sigma_{ss\cdot\underline{s}} \ge 0$, since $\sigma_{ss\cdot\underline{s}} = \mathrm{var}(S \mid a \le S \le b)$ is the variance of a doubly-truncated normal distribution. In the selected population, it should be noted that $\Sigma_{xx\cdot\underline{s}}$ can be observed but $\Sigma_{xx}$, $\sigma_{ss}$ or $B_{xs}$ cannot be observed. On the other hand, regarding $\Sigma_{xx\cdot\underline{s}}^{-1}$, we can obtain

$$\Sigma_{xx\cdot\underline{s}}^{-1} = \Sigma_{xx}^{-1} + \kappa(s)\kappa(s)', \tag{9}$$

where $\kappa(s)$ is a $p$ dimensional vector, which is also called a factor loading in this paper.

In addition, when we discuss latent variable problems and selection bias problems based on the conditional distribution given $\boldsymbol{Z}$, the following equations hold true:

$$\Sigma_{xx\cdot z} = \Sigma_{xx\cdot uz} + B_{xu\cdot z} B_{xu\cdot z}' \sigma_{uu\cdot z}$$

and

$$\Sigma_{xx\cdot z\underline{s}} = \Sigma_{xx\cdot z} - B_{xs\cdot z} B_{xs\cdot z}' \tilde{\sigma}_{ss\cdot z},$$

where $\tilde{\sigma}_{ss\cdot z} = \sigma_{ss\cdot z} - \sigma_{ss\cdot z\underline{s}} \ge 0$. From these equations, we can see that equations (5) and (6) take the same form as equations (9) and (8), respectively. In this paper, such relationships are called the duality between latent variables and selection bias. By using the dual relationships, we will show below that the identification conditions of factor models are useful for solving selection bias problems.

Let $G^{x\cdot u}_{cov}$ be the undirected graph obtained by connecting any two variables $X_i$ and $X_j$ ($i \neq j$) in $\boldsymbol{X}$ by an undirected edge only if the conditional covariance of $X_i$ and $X_j$ given $U$ is not equal to zero. Let $G^{x\cdot u}_{con}$ be the undirected graph obtained by connecting any two variables $X_i$ and $X_j$ ($i \neq j$) in $\boldsymbol{X}$ by an undirected edge only if the conditional covariance of $X_i$ and $X_j$ given $\{U\} \cup \boldsymbol{X} \setminus \{X_i, X_j\}$ is not equal to zero. When we are concerned with the covariance structure of $\boldsymbol{X}$ without conditioning on $U$, $U$ is omitted from these arguments. Then, Stanghellini and Wermuth (2005) provided the following lemma.

LEMMA 3
Equation (5) (or equation (6)) can be solved with respect to $\Sigma_{xx\cdot u}$ and $\lambda(u)\lambda(u)'$ (or $\Sigma_{xx\cdot u}^{-1}$ and $\delta(u)\delta(u)'$) if and only if one of the following conditions holds true:
(1) $\lambda(u) \neq 0$ and the structure of zeros in $\Sigma_{xx\cdot u}$ is such that every connectivity component of the complementary graph of $G^{x\cdot u}_{cov}$ contains an odd cycle;
(2) $\delta(u) \neq 0$ and the structure of zeros in $\Sigma_{xx\cdot u}^{-1}$ is such that every connectivity component of the complementary graph of $G^{x\cdot u}_{con}$ contains an odd cycle.

Similar results hold true for equations (8) and (9).

3.3 IDENTIFIABILITY CRITERION: LATENT VARIABLE CASE

It is well known that the graphical identifiability criteria proposed by Pearl and his colleagues are useful for evaluating total effects. However, Stanghellini (2004) pointed out that there are some situations where these identifiability criteria cannot be applied. As an example, we consider the problem of evaluating the total effect $\tau_{yx}$ of $X$ on $Y$ based on the path diagram shown in Fig. 2, where $U$ is an unobserved variable and $\{X, Y, Z, W\}$ is a set of observed variables.

[Fig. 2: Path Diagram (1)]

In Fig. 2, since $U$ is an unobserved variable, we cannot apply the back door criterion relative to $(X, Y)$ to evaluate the total effect $\tau_{yx}$. In addition, since we cannot observe a set of variables that satisfies the front door criterion relative to $(X, Y)$, the front door criterion cannot be applied either. Furthermore, since there are arrows pointing from the unobserved variable $U$ to every observed variable, the conditional IV method cannot be applied. In such a situation, new identifiability criteria different from current results are needed.

When we consider the linear structural equation model corresponding to the directed acyclic graph obtained by deleting from Fig. 2 the arrow pointing from $W$ to $Y$ (i.e., $\alpha_{yw} = 0$), since we can obtain the same covariance structure as the identifiable single factor model (e.g. Stanghellini, 1997), the total effect $\tau_{yx}$ is identifiable (Stanghellini, 2004). However, in Fig. 2, since there is an arrow pointing from $W$ to $Y$, the number of observed covariances is less than that of the path coefficients, which indicates that the whole linear structural equation model is not identifiable even if the variance information on $U$ is known (e.g. $\sigma_{uu} = 1$). Nevertheless, the path coefficient $\alpha_{yx}$ is identifiable. This result is summarized as follows (Kuroki, 2007):

THEOREM 1
Suppose that a set $\{X, Y, W, Z\} \cup \boldsymbol{T}$ of observed variables and an unobserved variable $U$ satisfy the following conditions in a directed acyclic graph $G$:
(1) $\{X, U\} \cup \boldsymbol{T}$ d-separates $Y$ from $Z$,
(2) $\{U\} \cup \boldsymbol{T}$ d-separates $\{X, Z\}$ from $W$, and
(3) $\{X\} \cup \boldsymbol{T}$ does not d-separate $Z$ from $W$.
When $X$ is a nondescendant of $Y$, if $\{U\} \cup \boldsymbol{T}$ satisfies the back door criterion relative to $(X, Y)$, then the total effect $\tau_{yx}$ of $X$ on $Y$ is identifiable and is given by the formula

$$\tau_{yx} = \frac{\sigma_{xw\cdot t}\sigma_{yz\cdot t} - \sigma_{zw\cdot t}\sigma_{xy\cdot t}}{\sigma_{xw\cdot t}\sigma_{zx\cdot t} - \sigma_{zw\cdot t}\sigma_{xx\cdot t}}. \tag{10}$$

PROOF OF THEOREM 1
Since $\{U\} \cup \boldsymbol{T}$ satisfies the back door criterion relative to $(X, Y)$, $\tau_{yx} = \beta_{yx\cdot ut}$ can be obtained. In addition, from Lemma 1, the following can be derived:

$$\sigma_{yz\cdot t} = \beta_{yz\cdot xut}\sigma_{zz\cdot t} + \beta_{yx\cdot uzt}\sigma_{xz\cdot t} + \beta_{yu\cdot xzt}\sigma_{uz\cdot t},$$
$$\sigma_{xy\cdot t} = \beta_{yx\cdot ut}\sigma_{xx\cdot t} + \beta_{yu\cdot xt}\sigma_{ux\cdot t},$$
$$\sigma_{xz\cdot t} = \beta_{zx\cdot tw}\sigma_{xx\cdot t} + \beta_{zw\cdot tx}\sigma_{wx\cdot t},$$
$$\sigma_{zw\cdot t} = \beta_{zw\cdot xt}\sigma_{ww\cdot t} + \beta_{zx\cdot tw}\sigma_{xw\cdot t}.$$

From condition (1), since $Y$ is conditionally independent of $Z$ given $\{X, U\} \cup \boldsymbol{T}$, $\beta_{yz\cdot xut} = 0$ can be obtained. In addition, by using Lemma 2, we can obtain $\beta_{yx\cdot uzt} = \beta_{yx\cdot ut}$ and $\beta_{yu\cdot xzt} = \beta_{yu\cdot xt}$. Noting these results, we have

$$\sigma_{yz\cdot t} = \beta_{yx\cdot ut}\sigma_{xz\cdot t} + \beta_{yu\cdot xt}\sigma_{uz\cdot t}.$$

Then, we can obtain

$$\sigma_{xw\cdot t}\sigma_{yz\cdot t} - \sigma_{zw\cdot t}\sigma_{xy\cdot t} = \beta_{yx\cdot ut}(\sigma_{xw\cdot t}\sigma_{xz\cdot t} - \sigma_{zw\cdot t}\sigma_{xx\cdot t}) + \beta_{yu\cdot xt}(\sigma_{xw\cdot t}\sigma_{uz\cdot t} - \sigma_{zw\cdot t}\sigma_{ux\cdot t}).$$

Here, since $\beta_{zw\cdot tx} \neq 0$ holds true from condition (3) and the faithfulness condition, from Lemma 1 we can obtain

$$\sigma_{xw\cdot t}\sigma_{xz\cdot t} - \sigma_{zw\cdot t}\sigma_{xx\cdot t} = -\beta_{zw\cdot tx}\sigma_{ww\cdot t}(\sigma_{xx\cdot t} - \beta_{xw\cdot t}\sigma_{wx\cdot t}) = -\beta_{zw\cdot tx}\sigma_{ww\cdot t}\sigma_{xx\cdot tw} \neq 0.$$

Furthermore, from Lemma 1,

$$\sigma_{xw\cdot t} = \beta_{xw\cdot ut}\sigma_{ww\cdot t} + \beta_{xu\cdot tw}\sigma_{uw\cdot t},$$
$$\sigma_{zw\cdot t} = \beta_{zw\cdot ut}\sigma_{ww\cdot t} + \beta_{zu\cdot tw}\sigma_{uw\cdot t}.$$

Thus, by noting that $\beta_{xw\cdot ut} = \beta_{zw\cdot ut} = 0$ can be obtained from condition (2) (and that $\beta_{xu\cdot tw} = \beta_{xu\cdot t}$ and $\beta_{zu\cdot tw} = \beta_{zu\cdot t}$ then follow from Lemma 2), we have

$$\sigma_{xw\cdot t}\sigma_{uz\cdot t} - \sigma_{zw\cdot t}\sigma_{ux\cdot t} = 0.$$

By noting these results, equation (10) can be derived. Q.E.D.

It should be noted that no assumption on the variance of the unobserved variable $U$ is required in Theorem 1, which is different from the identification condition of factor models (e.g. Stanghellini, 1997).

In the case where there is more than one unmeasured confounder, Kuroki (2007) pointed out that the identification condition for multi-factor models (e.g. Grzebyk et al., 2004) is also useful for identifying the total effects. The results can be summarized as follows.

THEOREM 2
Let $\boldsymbol{X} = \{X_1, \dots, X_p\}$ be a set of observed variables and $\boldsymbol{U} = \{U_1, \dots, U_k\}$ a set of unobserved variables in the path diagram $G$. When a linear structural equation model obtained by conditioning on $\{U_1, \dots, U_{i-1}\}$ and marginalizing on $\{U_{i+1}, \dots, U_k\}$ is regarded as a single factor model of $U_i$, if the single factor model of $U_i$ is identifiable for any $i$ ($1 \le i \le k$), then $\Sigma_{xx\cdot u}$ is also identifiable.
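As a numerical sanity check of formula (10) in Theorem 1, the following minimal sketch builds the population covariance matrix of a small linear structural equation model and evaluates (10) using only covariances among the observed variables. The diagram used here (U -> Z, U -> W, U -> X, U -> Y, Z -> X, W -> Y, X -> Y, with T empty), the coefficient values, and the helper names `sem_covariance` and `total_effect_eq10` are all illustrative assumptions, not taken from the paper; the diagram was chosen so that conditions (1) to (3) and the back door condition on {U} hold.

```python
import numpy as np

# Hypothetical variable order (a topological order of the assumed diagram).
U, Z, W, X, Y = range(5)


def sem_covariance(coef, noise_var):
    """Population covariance of V = coef @ V + eps for an acyclic linear SEM.

    coef[i, j] is the path coefficient on the arrow V_j -> V_i, and
    noise_var holds the variances of the independent error terms.
    """
    n = coef.shape[0]
    total = np.linalg.inv(np.eye(n) - coef)  # V = (I - coef)^{-1} eps
    return total @ np.diag(noise_var) @ total.T


def total_effect_eq10(cov, x, y, z, w):
    """Formula (10) with T empty; cov is indexed by variable position."""
    num = cov[x, w] * cov[y, z] - cov[z, w] * cov[x, y]
    den = cov[x, w] * cov[z, x] - cov[z, w] * cov[x, x]
    return num / den


# Assumed path coefficients (illustrative values only).
coef = np.zeros((5, 5))
coef[Z, U] = 0.8
coef[W, U] = 0.6
coef[X, U] = 0.5
coef[X, Z] = 0.7
coef[Y, U] = 0.9
coef[Y, W] = 0.4
coef[Y, X] = 0.6  # the total effect of X on Y that formula (10) should recover

cov = sem_covariance(coef, np.ones(5))

# Only entries among the observed variables X, Y, Z, W are used below.
tau_hat = total_effect_eq10(cov, X, Y, Z, W)
naive = cov[X, Y] / cov[X, X]  # unadjusted regression of Y on X, confounded by U

print("formula (10):", tau_hat)
print("unadjusted regression coefficient:", naive)
```

In this sketch the value from formula (10) agrees with the assumed coefficient on X -> Y up to floating-point error, while the unadjusted regression coefficient does not, and the variance of U never enters the computation.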
PROOF OF THEOREM 2
First, the covariance matrix corresponding to the linear structural equation model which is marginalized on $\{U_2, \dots, U_k\}$ is given by

$$\Sigma_{xx} = \Sigma_{xx\cdot u_1} + \frac{1}{\sigma_{u_1 u_1}} \Sigma_{x u_1} \Sigma_{x u_1}'.$$

Then, $\Sigma_{xx\cdot u_1}$ is identifiable from the assumption. Here, we assume that $\Sigma_{xx\cdot u_1 \cdots u_{i-1}}$ is identifiable for $i$ ($\ge 2$). Then, the covariance matrix corresponding to the linear structural equation model which is marginalized on $\{U_{i+1}, \dots, U_k\}$ and conditioned on $\{U_1, \dots, U_{i-1}\}$ is given by

$$\Sigma_{xx\cdot u_1 \cdots u_{i-1}} = \Sigma_{xx\cdot u_1 \cdots u_i} + \frac{1}{\sigma_{u_i u_i\cdot u_1 \cdots u_{i-1}}} \Sigma_{x u_i\cdot u_1 \cdots u_{i-1}} \Sigma_{x u_i\cdot u_1 \cdots u_{i-1}}'.$$

Thus, $\Sigma_{xx\cdot u_1 \cdots u_i}$ is identifiable from the assumption. By repeating this procedure, the result can be obtained. Q.E.D.

When Theorem 2 holds true for $\{X, Y\} \cup \boldsymbol{Z} \subset \boldsymbol{X}$, if $\boldsymbol{Z} \cup \boldsymbol{U}$ satisfies the back door criterion relative to $(X, Y)$, the total effect $\tau_{yx}$ is identifiable.

As an example, we consider the problem of evaluating the total effect $\tau_{yx}$ in the path diagram shown in Fig. 3. Although we cannot apply the identifiability criteria proposed by Pearl and his colleagues, since Theorem 2 holds true, the total effect $\tau_{yx}$ is identifiable and is given by the formula

$$\tau_{yx} = \frac{\sigma_{y z_1}\sigma_{z_2 w_1} - \sigma_{y w_1}\sigma_{z_1 z_2}}{\sigma_{x z_1}\sigma_{z_2 w_1} - \sigma_{x w_1}\sigma_{z_1 z_2}}.$$

It is interesting that the above equation does not include any covariance parameter involving $W_2$.

[Fig. 3: Path Diagram (2)]

3.4 IDENTIFIABILITY CRITERION: SELECTION BIAS CASE

Selection bias is another case in which the identifiability criteria proposed by Pearl and his colleagues cannot be applied to evaluate total effects. Consider the identification problem for the total effect $\tau_{yx}$ based on the path diagram shown in Fig. 4, which indicates that sample selection is conducted according to a criterion $a \le S \le b$. Then, $S$ is called a selection variable. In addition, $\{X, Y, Z, W\}$ is a set of observed variables.

[Fig. 4: Path Diagram (3)]

In Fig. 4, since a sample is selected from the population using a criterion such as $a \le S \le b$, the statistical dependencies among $\{X, Y, Z, W\}$ are biased. Thus, we cannot apply any of the identifiability criteria proposed by Pearl and his colleagues to identify the total effect. On the other hand, when we consider the linear structural equation model corresponding to the directed acyclic graph obtained by deleting from Fig. 4 the arrow pointing from $Y$ to $W$ (i.e. $\alpha_{wy} = 0$), since the number of observed covariances is equal to that of the path coefficients, the total effect $\tau_{yx}$ can be evaluated through the observed covariances. However, in Fig. 4, since the number of observed covariances is less than that of the path coefficients, the whole linear structural equation model is not identifiable. Nevertheless, the total effect $\tau_{yx}$ is identifiable through the following theorem.

THEOREM 3
Suppose a set $\{X, Y, W, Z\} \cup \boldsymbol{T}$ of observed variables and a selection variable $S$ satisfy the following conditions in a directed acyclic graph $G$:
(1) $\{X\} \cup \boldsymbol{T}$ d-separates $Y$ from $Z$,
(2) $\boldsymbol{T}$ d-separates $\{X, Z\}$ from $\{W\}$,
(3) $\{X\} \cup \boldsymbol{T}$ does not d-separate $S$ from $Z$, and
(4) $\boldsymbol{T}$ does not d-separate $S$ from $W$.
When $X$ is a nondescendant of $Y$, if $\boldsymbol{T}$ satisfies the back door criterion relative to $(X, Y)$, then the total effect $\tau_{yx}$ of $X$ on $Y$ is identifiable and is given by the formula

$$\tau_{yx} = \frac{\sigma_{xw\cdot t\underline{s}}\sigma_{yz\cdot t\underline{s}} - \sigma_{zw\cdot t\underline{s}}\sigma_{xy\cdot t\underline{s}}}{\sigma_{xw\cdot t\underline{s}}\sigma_{zx\cdot t\underline{s}} - \sigma_{zw\cdot t\underline{s}}\sigma_{xx\cdot t\underline{s}}}. \tag{11}$$

PROOF OF THEOREM 3
Since $\boldsymbol{T}$ satisfies the back door criterion relative to $(X, Y)$, $\tau_{yx} = \beta_{yx\cdot t}$ can be obtained. In addition, from Lemma 1 and condition (2),

$$\sigma_{yz\cdot t\underline{s}} = \beta_{yz\cdot xt}\sigma_{zz\cdot t} + \beta_{yx\cdot tz}\sigma_{xz\cdot t} - \beta_{ys\cdot t}\beta_{zs\cdot t}\tilde{\sigma}_{ss\cdot t},$$
$$\sigma_{xy\cdot t\underline{s}} = \beta_{yx\cdot t}\sigma_{xx\cdot t} - \beta_{xs\cdot t}\beta_{ys\cdot t}\tilde{\sigma}_{ss\cdot t},$$
$$\sigma_{zx\cdot t\underline{s}} = \sigma_{zx\cdot t} - \beta_{zs\cdot t}\beta_{xs\cdot t}\tilde{\sigma}_{ss\cdot t},$$
$$\sigma_{zw\cdot t\underline{s}} = -\beta_{zs\cdot t}\beta_{ws\cdot t}\tilde{\sigma}_{ss\cdot t},$$
$$\sigma_{xw\cdot t\underline{s}} = -\beta_{xs\cdot t}\beta_{ws\cdot t}\tilde{\sigma}_{ss\cdot t}.$$

From condition (1), since $Y$ is conditionally independent of $Z$ given $\{X\} \cup \boldsymbol{T}$, $\beta_{yz\cdot xt} = 0$ can be obtained. In addition, from Lemma 2, we can obtain $\beta_{yx\cdot zt} = \beta_{yx\cdot t}$. Using these results, we have

$$\sigma_{yz\cdot t\underline{s}} = \beta_{yx\cdot t}\sigma_{xz\cdot t} - \beta_{ys\cdot t}\beta_{zs\cdot t}\tilde{\sigma}_{ss\cdot t}.$$

Thus, we can obtain

$$\sigma_{xw\cdot t\underline{s}}\sigma_{yz\cdot t\underline{s}} - \sigma_{zw\cdot t\underline{s}}\sigma_{xy\cdot t\underline{s}} = \beta_{yx\cdot t}\beta_{ws\cdot t}\tilde{\sigma}_{ss\cdot t}\sigma_{xx\cdot t}(\beta_{zs\cdot t} - \beta_{xs\cdot t}\beta_{zx\cdot t}) = \beta_{yx\cdot t}\beta_{ws\cdot t}\tilde{\sigma}_{ss\cdot t}\sigma_{xx\cdot t}\sigma_{zs\cdot xt}/\sigma_{ss\cdot t}.$$

Here, since $\beta_{ws\cdot t}\sigma_{zs\cdot xt} \neq 0$ holds true from conditions (3) and (4) and the faithfulness condition, according to Lemma 1 we can obtain

$$\sigma_{xw\cdot t\underline{s}}\sigma_{xz\cdot t\underline{s}} - \sigma_{zw\cdot t\underline{s}}\sigma_{xx\cdot t\underline{s}} = \beta_{ws\cdot t}\sigma_{xx\cdot t}\tilde{\sigma}_{ss\cdot t}(\beta_{zs\cdot t} - \beta_{xs\cdot t}\beta_{zx\cdot t}) = \beta_{ws\cdot t}\tilde{\sigma}_{ss\cdot t}\sigma_{xx\cdot t}\sigma_{zs\cdot xt}/\sigma_{ss\cdot t} \neq 0.$$

By noting these results, equation (11) is derived. Q.E.D.

3.5 IDENTIFIABILITY CRITERION: LATENT VARIABLE AND SELECTION BIAS CASE

Finally, we consider the case where both latent variables $\boldsymbol{U}$ and selection bias according to a selection criterion $a \le S \le b$ exist. Let $\boldsymbol{X}$ ($\{X, Y\} \subset \boldsymbol{X}$) and $\boldsymbol{T}$ be sets of observed variables. The steps for judging whether or not the total effect is identifiable are as follows:

Step 1: Check whether or not the combination of a subset of $\boldsymbol{X} \setminus \{X, Y\}$ and $\boldsymbol{U} \cup \boldsymbol{T}$ satisfies the back door criterion relative to $(X, Y)$. If the answer is affirmative, go to Step 2.

Step 2: By noting that $\Sigma_{xx\cdot t\underline{s}} = \Sigma_{xx\cdot t} - B_{xs\cdot t} B_{xs\cdot t}' \tilde{\sigma}_{ss\cdot t}$, check whether or not the structure of zeros in $\Sigma_{xx\cdot t}$ (or $\Sigma_{xx\cdot t}^{-1}$) is such that every connectivity component of the complementary graph of $G^{x\cdot t}_{cov}$ (or $G^{x\cdot t}_{con}$) contains an odd cycle. If the answer is affirmative, $\Sigma_{xx\cdot t}$ is identifiable (Stanghellini and Wermuth, 2005); go to Step 3.

Step 3: Check whether Theorem 2 holds true for $\Sigma_{xx\cdot t}$ with regard to $\boldsymbol{U}$. If the answer is affirmative, $\Sigma_{xx\cdot tu}$ is identifiable, and we can evaluate the total effect $\tau_{yx}$ of $X$ on $Y$.

4 APPLICATION

The above results are applicable to the analysis of data from a study on setting up painting conditions of car bodies, reported by Okuno et al. (1986). The data were collected with the purpose of setting up the process conditions in order to increase transfer efficiency. The sample size is 38 and the variables of interest, each of which has zero mean and variance one, are the following:

Painting Condition: Dilution Ratio ($X_1$), Degree of Viscosity ($X_2$), Painting Temperature ($X_8$)
Spraying Condition: Gun Speed ($X_3$), Spray Distance ($X_4$), Atomizing Air Pressure ($X_5$), Pattern Width ($X_6$), Fluid Output ($X_7$)
Environment Condition: Temperature ($X_9$), Degree of Moisture ($X_{10}$)
Response: Transfer Efficiency ($Y$)

Concerning this process, Kuroki et al. (2003) presented the path diagram shown in Fig. 5 (for details, see Kuroki et al., 2003). Based on the path diagram, Kuroki et al. (2003) presented the estimated correlation matrix. We here provide a part of the correlation matrix in Table 1. From Table 1, we assume that the covariance information on $X_4$ and $X_{10}$ is not obtained. Although $X_1, \dots, X_6$ are considered to be controllable variables in Okuno et al. (1986), in this paper $X_2$ and $X_6$ are taken as treatment variables from the controllable variables in order to evaluate their total effects from nonexperimental data.

[Fig. 5: Path Diagram (Kuroki et al., 2003)]

Table 1: Estimated Correlation Matrix (Kuroki et al., 2003)

         X1      X2      X5      X6      X8      X9      Y
X1     1.000  -0.736   0.028  -0.042   0.216   0.283  -0.091
X2    -0.736   1.000  -0.063   0.095  -0.684  -0.635   0.326
X5     0.028  -0.063   1.000   0.291   0.076   0.099  -0.277
X6    -0.042   0.095   0.291   1.000  -0.114  -0.149  -0.250
X8     0.216  -0.684   0.076  -0.114   1.000   0.761  -0.493
X9     0.283  -0.635   0.099  -0.149   0.761   1.000  -0.475
Y     -0.091   0.326  -0.277  -0.250  -0.493  -0.475   1.000

Table 2: Estimates of Total Effects

treatment   covariates                                              total effect
X2          $Z = X_1$, $W = X_9$, $\boldsymbol{T} = \{X_8\}$        -0.116
X6          $Z = X_5$, $W = X_9$, $\boldsymbol{T} = \emptyset$      -0.465

Table 2 shows the selected variables for estimating total effects. The treatment variables of interest are listed in the first column. The second column shows the sets of covariates used for identifying total effects. The third column shows the estimates of total effects.

First, consider a situation where we wish to evaluate the total effect of $X_2$ on $Y$. The total effect cannot be evaluated based on the back door criterion or the conditional IV method, because the covariance information on $X_{10}$ cannot be obtained from Table 1. In addition, since $X_{10}$ exists in the back door path between $X_2$ and $X_7$, the front door criterion cannot be applied either. However, since the set of variables provided in the second column satisfies the conditions in Theorem 1, the total effect can be evaluated by using equation (10).

Next, consider a situation where we wish to evaluate the total effect of $X_6$ on $Y$. The total effect cannot be evaluated based on the back door criterion or the conditional IV method, because the covariance information on $X_4$ cannot be obtained from Table 1. In addition, there is no set of variables satisfying the front door criterion. However, since the set of variables provided in the second column satisfies the conditions in Theorem 1, the total effect can be evaluated by using equation (10).

5 DISCUSSION

This paper discussed identification problems for total effects based on causal modeling in observational studies with latent variables and selection bias. In order to derive the graphical identifiability criteria, we introduced the identification condition for factor models to the identification problem of total effects. In addition, we pointed out that there are some cases where the total effect is identifiable even when the identification condition for factor models does not hold true. Furthermore, we proposed new identification conditions for total effects, and provided closed form expressions of the identifiable total effects. The results of this paper help us judge from the graph structure whether the total effect can be evaluated from observational studies in the presence of latent variables and selection bias.

ACKNOWLEDGEMENT

This research was supported by the Ministry of Education, Culture, Sports, Science and Technology of Japan, the Kurata Foundation and the Mazda Foundation.

REFERENCES

Bowden, R. J. and Turkington, D. A. (1984). Instrumental Variables. Cambridge University Press.
Brito, C. and Pearl, J. (2002). Generalized instrumental variables. Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence, 85-93.
Cochran, W. G. (1938). The omission or addition of an independent variate in multiple linear regression. Supplement to the Journal of the Royal Statistical Society, 5, 171-176.
Cooper, G. F. (2000). A Bayesian method for causal modeling and discovery under selection. Proceedings of the Conference on Uncertainty in Artificial Intelligence, 16, 98-106.
Greenland, S. (2003). Quantifying biases in causal models: Classical confounding versus collider-stratification bias. Epidemiology, 14, 300-306.
Grzebyk, M., Wild, P. and Chouaniere, D. (2004). On identification of multi-factor models with correlated residuals. Biometrika, 91, 141-151.
Hernan, M. A., Hernandez-Diaz, S. and Robins, J. M. (2004). A structural approach to selection bias. Epidemiology, 15, 615-625.
Huang, Y. and Valtorta, M. (2006). Pearl's calculus of intervention is complete. Proceedings of the Conference on Uncertainty in Artificial Intelligence, 22, 437-444.
Johnson, N. L. and Kotz, S. (1972). Distributions in Statistics: Continuous Multivariate Distributions. New York: John Wiley & Sons.
Kuroki, M. (2007). Identifiability criteria for total effects in the presence of unmeasured confounders (in Japanese). Japanese Journal of Applied Statistics, 36, 71-85.
Kuroki, M. and Cai, Z. (2006). On recovering population's covariance matrix in the presence of selection bias. Biometrika, 93, 601-611.
Kuroki, M., Miyakawa, M. and Cai, Z. (2003). Joint causal effect in linear structural equation model and its application to process analysis. Proceedings of the Workshop on Artificial Intelligence and Statistics, 9, 70-77.
Lauritzen, S. L. (1996). Graphical Models. Clarendon Press, Oxford.
Okuno, T., Katayama, Z., Kamigori, N., Itoh, T., Irikura, N. and Fujiwara, N. (1986). Kougyou ni okeru Tahenryou Data no Kaiseki (in Japanese). Nikkagiren, Tokyo.
Pearl, J. (2000). Causality: Models, Reasoning, and Inference. Cambridge University Press.
Rao, C. R. (1973). Linear Statistical Inference and Its Applications. John Wiley & Sons.
Richardson, T. S. and Spirtes, P. (2002). Ancestral graph Markov models. Annals of Statistics, 30, 962-1030.
Shpitser, I. and Pearl, J. (2006). Identification of joint interventional distributions in recursive semi-Markovian causal models. Proceedings of the National Conference on Artificial Intelligence, 21, 1219-1226.
Spirtes, P., Glymour, C. and Scheines, R. (2000). Causation, Prediction, and Search, 2nd edition. MIT Press.
Spirtes, P., Meek, C. and Richardson, T. (1999). An algorithm for causal inference in the presence of latent variables and selection bias. In Glymour, C. and Cooper, G., editors, Computation, Causation, and Discovery, 211-252.
Stanghellini, E. (1997). Identification of a single-factor model using graphical Gaussian rules. Biometrika, 84, 241-244.
Stanghellini, E. (2004). Instrumental variables in Gaussian directed acyclic graph models with an unobserved confounder. Environmetrics, 15, 463-469.
Stanghellini, E. and Wermuth, N. (2005). On the identification of directed acyclic graph models with one hidden variable. Biometrika, 92, 337-350.
Tian, J. (2004). Identifying linear causal effects. Proceedings of the Nineteenth National Conference on Artificial Intelligence, 104-111.
Wermuth, N. (1989). Moderating effects in multivariate normal distributions. Methodika, 3, 74-93.
Whittaker, J. (1990). Graphical Models in Applied Multivariate Statistics. John Wiley and Sons.