Explanation Trees for Causal Bayesian Networks

Ulf H. Nielsen¹   uln@zurich.ibm.com
Jean-Philippe Pellet¹,²   jep@zurich.ibm.com
André Elisseeff¹   ael@zurich.ibm.com

¹ Business Optimization Group, IBM Research GmbH, 8803 Rüschlikon, Switzerland
² Machine Learning Group, Swiss Federal Institute of Technology Zurich, 8092 Zurich, Switzerland

Abstract

Bayesian networks can be used to extract explanations about the observed state of a subset of variables. In this paper, we explicate the desiderata of an explanation and confront them with the concept of explanation proposed by existing methods. The necessity of taking causal approaches into account when a causal graph is available is discussed. We then introduce causal explanation trees, based on the construction of explanation trees using the measure of causal information flow (Ay and Polani, 2006). This approach is compared to several other methods on known networks.

1 INTRODUCTION

A Bayesian network (BN, Pearl, 1988) is an algebraic tool to compactly represent the joint probability distribution of a set of variables V by exploiting conditional independence amongst variables. It represents all variables in a directed acyclic graph (DAG), where the absence of arcs between nodes denotes (conditional) independence. In addition to graphically representing the structure of the dependencies between the variables, BNs allow inference tasks to be solved more efficiently. In this paper, we discuss the extraction of explanations in causal BNs (Pearl, 2000; Spirtes et al., 2001), that is, BNs where the arcs depict direct cause-effect relationships between variables.

Generally, explanations in BNs can be classified in three categories (Lacave and Diez, 2002), depending on the focus of the explanation:

· Explanation of evidence. Given a subset of observed (instantiated) variables O ⊆ V, what is the state of (some of) the other variables V \ O that best explains the observed assignment O = o?

· Explanation of the reasoning process. When we have received some evidence and belief states are updated by probabilistic inference, what was the reasoning process by which we arrived at this state?

· Explanation of the model, which provides insight into the static components of a network such as (conditional) independence relationships, causal mechanisms, etc.

We shall focus our attention on the explanation of evidence: we wish to explain why variables in O took on specific observed values, using assignments in V \ O. To this purpose, we discuss in section 2 the requirements of such an explanation. In section 3, we list the standard approaches to evidence explanation as well as some recent methods to make explanations more concise, and explain some of their drawbacks. We then present causal explanation trees in section 4, and detail experiments and comparisons in section 5.

NOTATION. Boldface capitals denote sets of random variables or nodes in a graph, depending on the context. V is the set of all variables in the analysis. Italicized capitals such as X, Xi, Y ∈ V are random variables or nodes, and calligraphic capitals such as 𝒳, 𝒴 are their respective domains. Vectors are denoted by boldface lowercase letters, such as e or p; scalars are in italics. Unless otherwise stated, the scalars x, y are assumed to be a value of their respective uppercase random variable. The probability distribution of a random variable X is denoted p(X), and we write p(X = x) or p(x) for the probability of x. We only work with discrete variables.
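As a concrete illustration of the factorization a BN encodes (a sketch added here for clarity, not part of the original exposition), the following Python fragment stores a small hypothetical three-variable network Rain → Sprinkler, Rain → GrassWet, Sprinkler → GrassWet as conditional probability tables and evaluates the joint probability as the product of each node's CPT entry given its parents. The network and its numbers are purely illustrative.

```python
# Minimal sketch (not from the paper): a discrete Bayesian network stored as
# CPTs, with the joint probability obtained from the DAG factorization
#   p(x_1, ..., x_d) = prod_j p(x_j | pa_j).
# The three-node network and its numbers are purely illustrative.
from itertools import product

# Each variable maps (own value, tuple of parent values) -> probability.
parents = {"Rain": (), "Sprinkler": ("Rain",), "GrassWet": ("Rain", "Sprinkler")}
cpt = {
    "Rain":      {(True, ()): 0.2, (False, ()): 0.8},
    "Sprinkler": {(True, (True,)): 0.01, (False, (True,)): 0.99,
                  (True, (False,)): 0.40, (False, (False,)): 0.60},
    "GrassWet":  {(True, (True, True)): 0.99, (False, (True, True)): 0.01,
                  (True, (True, False)): 0.80, (False, (True, False)): 0.20,
                  (True, (False, True)): 0.90, (False, (False, True)): 0.10,
                  (True, (False, False)): 0.00, (False, (False, False)): 1.00},
}

def joint(assignment):
    """p(v) as the product of p(x_j | pa_j) over all nodes."""
    prob = 1.0
    for var, pars in parents.items():
        pa_vals = tuple(assignment[p] for p in pars)
        prob *= cpt[var][(assignment[var], pa_vals)]
    return prob

# Marginal p(GrassWet = True) by summing the joint over all assignments.
p_wet = sum(joint(dict(zip(parents, vals)))
            for vals in product([True, False], repeat=3)
            if dict(zip(parents, vals))["GrassWet"])
print(p_wet)
```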
2 AN IDEALIZED EXPLANATION

Even though many variables O may be observed, the explanation can be focused on only a specific subset E ⊆ O. The state E = e is then called the explanandum. The set of explanatory variables H ⊆ V can include both observed and unobserved variables, and an explanation is an assignment H = h (compatible with O = o for variables both in H and in O). Suppose we have an explanation H = h for E = e: the intuitive interpretation of this result is that manually setting H = h will be a favourable configuration to observe E = e.

[Figure: the sets involved in an explanation: within the set of all variables V, the explanandum E = e, the observed variables O = o, and the explanatory variables H.]

We insist on the distinction between the explanandum e and the observations o (Chajewska and Halpern, 1997). Observations are all our knowledge about the current state of a system, and this might not coincide exactly with what we want explained. Consider for example the case where we wish to know why the grass is wet while we know it has been raining. We do not seek an explanation for why it has rained, only for why the grass is wet. A perfectly valid explanation is that the grass is wet because it is raining, if no other factors can sufficiently explain the facts. An algorithm respecting this should then determine, for each variable in O, whether its observed state is relevant to explain e, and for each unobserved variable in V \ O, whether knowing its state adds explanatory power to the proposed explanation. This excludes methods which marginalize out O, preventing these variables from being part of an explanation.

To explain why a given system is observed in a given state, we must intuitively convey some information about the causal mechanisms that lead to the observation made. If we observe that it is raining and some explanation tells us that it rains because the grass is wet, we do not find it a good explanation, as it contradicts our understanding of how the system works: the grass being wet cannot make it rain. As Halpern and Pearl (2005) discuss, explanations need to be causal to be consistent with users' knowledge of the mechanisms of the system. It is therefore important that the explanations are given in a data-generating direction, such that users can infer interventional rules from the given explanations (for instance, "if I can make it rain somehow, then I know that the grass will be wet", as opposed to an impossible "let me make the grass wet so as to make it rain").

Causal methods are subject to the availability of causal information. In this paper, we extract the causal information from causal BNs, but in general, the approach is adaptable to any causal model that can predict the effect of interventions on certain variables. In addition to assuming that the relationships between the variables V can be represented by a fully oriented causal BN, we assume that the corresponding joint probability distribution is faithful and causally sufficient (Pearl, 2000; Spirtes et al., 2001). Faithfulness of the distribution ensures that there is a unique graph whose arcs depict all (conditional) dependencies of the distribution, and only those. Causal sufficiency forbids hidden common causes for variables in V, such that we can build a DAG whose arcs represent direct causation. Although most expert-designed BNs are naturally oriented causally, the output of structure learning algorithms is often a partially directed graph and may need additional expert knowledge to be fully oriented.

To summarize, we wish our explanations to give us causal information by detailing the mechanisms that lead to the explanandum, using all the available information we have about the state of the network.
3 EXISTING METHODS

This section reviews and discusses some of the major techniques to find explanations.

3.1 MOST PROBABLE EXPLANATION & VARIANTS

A common noncausal measure of explanatory power is the conditional probability of the explanatory variables H given the explanandum e. The most probable explanation (MPE) approach (Pearl, 1988) then considers as the best explanation h* = arg max_h p(h | e) (or, alternatively, looks for the k best explanations by maximizing this probability). The explanandum e is in the case of MPE equal to the full set of observations O = o, and the set H is V \ E. This list can be long and uninformative because of its lack of conciseness; moreover, it is hard to distinguish between long explanations, whose respective probabilities are low anyway and close to one another.

In the partial abduction approach (Shimony, 1991), the set of explanatory variables we look for is a strict subset H ⊂ V \ E. The set of variables X = V \ H \ E excluded from the explanation is then marginalized out before the maximum is computed: arg max_h Σ_x p(h, x | e). This is the maximum a posteriori (MAP) model approach. The excluded variables X are selected either by a user, or via automated analysis of the network; automatically selecting the relevant explanatory variables is a nontrivial issue (Shimony, 1991). Partial abduction is computationally more expensive than standard MPE, because it cannot be readily solved by message-passing algorithms, but approximations exist (e.g., Park, 2002). On the other hand, it globally leads to more concise explanations than MPE.

Further efforts to make explanations more concise include de Campos et al. (2001), where the k most probable explanations are found and then simplified based on relevance and probabilistic criteria; and Henrion and Druzdzel (1990), where partial assignments are also allowed, but only within a predefined tree that limits the set of possible explanations. An explanation is then a path from the root of the tree to a leaf, denoting variable assignments for each branch taken. This is known as scenario-based explanation. The best explanation is the one with the highest posterior probability.

There are several concerns with these approaches, MPE/MAP or scenario-based, which maximize some conditional probability of the explanatory variables (Chajewska and Halpern, 1997). First, they do not distinguish the explanandum and the observations, such that the additional state information that is not meant to be explained is excluded from a possible explanation. Furthermore, there is no distinction between observing an explanatory variable X in a certain state x and forcing it to have the value x.¹ Thus, depending on the choice of the explanatory variables, the intuitive interpretation (as described in the previous section) stating that setting H = h will be a favourable configuration for observing E = e does not hold. MPEs and, to a lesser extent, MAP model explanations are not robust: little changes in the network will often change the result of the analysis, even though the changes occur in parts of the network largely independent of the explanandum (Chan and Darwiche, 2006). Common to the methods in this subsection is that they order explanations by p(h, e) (this is equivalent to ordering by p(h | e), as p(e) is constant for a given e): this joint probability cannot be considered alone to determine the explanatory power of h on e. Some of these problems are illustrated by the experiments in section 5.

¹ The difference between observation and intervention is fundamental to causality and is best described with the example of Simpson's paradox in Pearl (2000), chap. 6.
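To make the two criteria above concrete, here is a small sketch (not from the paper) of MPE and MAP explanations found by brute-force enumeration over a made-up three-variable joint distribution; the variable names, domains, and probabilities are hypothetical, and real implementations use message passing rather than enumeration.

```python
# Minimal sketch (not from the paper): MPE and MAP by brute-force enumeration
# over a small made-up joint distribution p(V). Maximizing p(h, e) is
# equivalent to maximizing p(h | e), since p(e) is constant for a given e.
from itertools import product

variables = ["A", "B", "C"]          # hypothetical network variables
domains = {"A": [0, 1], "B": [0, 1], "C": [0, 1]}

def joint(a):                        # made-up joint p(A, B, C), sums to 1
    table = {(0, 0, 0): .05, (0, 0, 1): .10, (0, 1, 0): .05, (0, 1, 1): .20,
             (1, 0, 0): .15, (1, 0, 1): .05, (1, 1, 0): .25, (1, 1, 1): .15}
    return table[(a["A"], a["B"], a["C"])]

def assignments(vs):
    return [dict(zip(vs, vals)) for vals in product(*(domains[v] for v in vs))]

def mpe(evidence):
    """MPE: arg max over all non-evidence variables of p(h | e)."""
    hidden = [v for v in variables if v not in evidence]
    return max(assignments(hidden), key=lambda h: joint({**h, **evidence}))

def map_explanation(explanatory, evidence):
    """MAP: arg max_h sum_x p(h, x | e), marginalizing the excluded set X."""
    excluded = [v for v in variables
                if v not in explanatory and v not in evidence]
    def score(h):
        return sum(joint({**h, **x, **evidence}) for x in assignments(excluded))
    return max(assignments(explanatory), key=score)

print(mpe({"C": 1}))                       # full assignment to A and B
print(map_explanation(["A"], {"C": 1}))    # concise explanation over A only
```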
3.2 SE ANALYSIS

In SE analysis, Jensen (2001) additionally considers the sensitivity of an explanation h with respect to the explanandum. Less sensitive explanations ensure that little changes in the network's parameters will not lead to severely different explanations, so that the explanation is stable with respect to the specification of the network. SE analysis also works by comparing two explanations h_i and h_j, usually with Bayes' factor or the likelihood ratio (Jeffreys, 1961):

\text{Bayes' factor} = \frac{\text{posterior ratio}}{\text{prior ratio}} = \frac{p(h_i \mid e) \,/\, p(h_j \mid e)}{p(h_i) \,/\, p(h_j)} = \frac{p(e \mid h_i)}{p(e \mid h_j)}.

The empirical interpretation of Bayes' factor given by Jeffreys (1961) is that if it is less than 1, it is in favor of h_j; if less than 3, it is a slight support for h_i; if it is between 3 and 12, it is a positive support; and if higher than 12, it is a strong support for h_i.

In Yuan and Lu (2007), Bayes' factor is used to search for explanations consisting of only a few variables by ranking them by their Bayes' factor, computed as the ratio between the probability of the explanation given the explanandum and its opposite:

\text{Bayes' factor} = \frac{p(h \mid e)}{1 - p(h \mid e)}.

An exhaustive search is performed over all subsets of the hypothesis, and the explanations are shown to be more concise in a sample network than MPE, Shimony's (1991) MAP, and the simplifications described by de Campos et al. (2001). A similar criticism as before can be applied to these methods: additional observations are discarded, and the causal directionality is ignored in the selection of the relevant explanatory variables.
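The following sketch (not from the paper) evaluates both Bayes'-factor variants above on a made-up joint distribution over two binary variables; the distribution, names, and thresholds used for the qualitative labels are illustrative only, the latter following Jeffreys' (1961) scale as quoted above.

```python
# Minimal sketch (not from the paper): the two Bayes'-factor variants of
# Section 3.2 evaluated on a made-up joint distribution p(H, E) over two
# binary variables. Values and names are illustrative only.

# Joint probabilities p(H = h, E = e), indexed as (h, e).
joint = {(True, True): 0.30, (True, False): 0.10,
         (False, True): 0.15, (False, False): 0.45}

def p_h(h):                      # prior p(h)
    return sum(v for (hh, _), v in joint.items() if hh == h)

def p_h_given_e(h, e):           # posterior p(h | e)
    p_e = sum(v for (_, ee), v in joint.items() if ee == e)
    return joint[(h, e)] / p_e

def jeffreys_bf(h_i, h_j, e):
    """Posterior ratio divided by prior ratio, i.e. p(e | h_i) / p(e | h_j)."""
    posterior_ratio = p_h_given_e(h_i, e) / p_h_given_e(h_j, e)
    prior_ratio = p_h(h_i) / p_h(h_j)
    return posterior_ratio / prior_ratio

def odds_bf(h, e):
    """Yuan-and-Lu-style ratio p(h | e) / (1 - p(h | e))."""
    post = p_h_given_e(h, e)
    return post / (1.0 - post)

def jeffreys_label(bf):
    """Qualitative reading of the factor following Jeffreys (1961)."""
    if bf < 1:
        return "in favor of the alternative"
    if bf < 3:
        return "slight support"
    if bf <= 12:
        return "positive support"
    return "strong support"

bf = jeffreys_bf(True, False, e=True)
print(bf, jeffreys_label(bf), odds_bf(True, e=True))
```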
3.3 EXPLANATION TREES

The method of Flores (2005) constructs a set of best explanations while at the same time giving a preference for concise explanations, summarizing the results of the analysis in an explanation tree. We describe this method in more detail, as the causal explanation tree method (described in section 4) is based on a similar representation.

Definition 1 An explanation tree for an explanandum E = e is a tree in which every node X is an explanatory variable (with X ∈ V \ E), and each branch out of X is a specific instantiation x ∈ 𝒳 of X. A path from the root to a leaf is then a series of assignments X = x, Y = y, ..., Z = z, summarized as P = p, which constitutes a full explanation.

Flores's (2005) algorithm, summarized in Algorithm 1, builds such an explanation tree. Starting with an empty tree, the variable to use as the next node is selected to be the one that, given the explanandum, reduces the uncertainty the most in the rest of the explanatory variables according to some measure. The nodes that are on the path being grown are added to a conditioning set, so that the part of the explanatory space they already account for is taken into account. Two stopping criteria are used to determine when to stop growing the tree: the minimum posterior probability of the constructed path of the current branch, and the minimum amount of uncertainty reduction that must be achieved by adding a new variable. Among all explanations represented by the final tree, the best one is the one with the largest posterior probability p(p | e).

Algorithm 1 Flores's (2005) Explanation Tree
 1: function T = ExplanationTree(H, e, p; α, β)
    Input:   H : set of explanatory variables
             E = e : explanandum
             P = p : path of variable assignments
             α, β : stopping criteria (minimum uncertainty reduction, minimum path probability)
    Output:  T : an explanation tree
 2: X ← arg max_{X ∈ H} Σ_{Y ∈ H} Inf(X; Y | e, p)
 3: if max_{Y ∈ H\X} Inf(X; Y | e, p) < α or p(p | e) < β then
 4:   return ∅
 5: end if
 6: T ← new tree with root X
 7: for each x ∈ domain(X) do
 8:   T' ← ExplanationTree(H \ X, e, p ∪ {x})
 9:   add a branch x to T with subtree T' and
10:   assign it the label p(p, x | e)
11: end for
12: return T

The algorithm is parametrized with the measure of uncertainty reduction Inf(X; Y | e, p).² For our implementation, we used the conditional mutual information. The mutual information I(X; Y | z) is a symmetrical measure of how much reduction in uncertainty about Y we get by knowing X in the context Z = z, defined as:

I(X; Y \mid z) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x \mid z) \, p(y \mid x, z) \log \frac{p(y \mid x, z)}{p(y \mid z)}.   (1)

If X and Y are independent given Z = z, we have I(X; Y | z) = 0. If X fully determines Y, then knowing one is enough to know the other and full information is shared.

² See Flores (2005) for additional cases where the max at line 3 is replaced by min or avg, and Inf is the Gini index.
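As an illustration of (1), here is a sketch (not from the paper) that computes I(X; Y | z) from a table of p(X, Y | z) for one fixed context Z = z; the table and the choice of base-2 logarithms (bits) are assumptions made for the example.

```python
# Minimal sketch (not from the paper): conditional mutual information as in
# Eq. (1), computed from a table of p(x, y | z) for one fixed context Z = z.
# The table below is made up for illustration; base-2 logs give bits.
from math import log2

# p(X = x, Y = y | Z = z) for binary X, Y in a fixed context z.
p_xy_given_z = {(0, 0): 0.40, (0, 1): 0.10,
                (1, 0): 0.15, (1, 1): 0.35}

def cond_mutual_information(p_xy):
    """I(X; Y | z) = sum_x sum_y p(x|z) p(y|x,z) log [ p(y|x,z) / p(y|z) ]."""
    xs = {x for x, _ in p_xy}
    ys = {y for _, y in p_xy}
    p_x = {x: sum(p_xy[(x, y)] for y in ys) for x in xs}
    p_y = {y: sum(p_xy[(x, y)] for x in xs) for y in ys}
    info = 0.0
    for x in xs:
        for y in ys:
            p_y_given_x = p_xy[(x, y)] / p_x[x]
            if p_y_given_x > 0:
                info += p_x[x] * p_y_given_x * log2(p_y_given_x / p_y[y])
    return info

print(cond_mutual_information(p_xy_given_z))  # 0 iff X, Y independent given z
```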
Explanation trees are interesting in that they can present many mutually exclusive explanations in a compact form. Flores (2005) also argues that explanations as constructed by Algorithm 1 are reasonable and more sensible than (k-)MPE in the sense that, on simple networks, the returned explanations are those that we expect. Four elements, however, are subject to discussion.

First, on line 2 of Algorithm 1, variables are added to the tree in order of how much information they provide about the remaining variables in the set of explanatory variables. But this does not measure the information the added variables share with the explanandum. Moreover, the explanandum actually grows as the tree is constructed, since there is no difference between p and e at line 2. Thus, this maximization cannot be interpreted as selecting variables that reduce the uncertainty in the explanandum.

Second, the algorithm makes no distinction between explanandum and observations. To try to fix this, we could either additionally condition on the observations O = o, or marginalize out O altogether. The former case is no different from adding o to the explanandum e, and the latter case excludes all X ∈ O from explanations, such that both cases are unsatisfactory.

Third, the criterion to choose the best explanation is the probability of the explanation path given the evidence, p(p | e), and not how likely the system is to have produced the evidence we are trying to explain with a configuration p, that is, p(e | p). Both measures are linked, but since several explanations can cover almost an equal share of the explanation space and often only one will be included in the explanation tree, the criterion p(p | e) will miss explanations which could have explained the evidence well, but do not cover as large a fraction of the explanation space.

Fourth, causal considerations are ignored: there is no distinction between ancestors and descendants of a variable, such that we can get explanations of the type "it rains because the grass is wet."

From an end-user perspective, though, trees are a good solution for representing several competing explanations compactly and readably. We introduce in section 4 a modified approach, which can address the issues discussed here.

4 CAUSAL EXPLANATION TREES

Like the previous method, causal explanation trees take advantage of a tree representation. The tree is grown so as to ensure that explanations in any path are causal: variables can be selected as explanatory only if they causally influence the explanandum.

Before defining the causal criterion used in this approach, we need to define the concept of postintervention distribution (Pearl, 2000, p. 72). A standard conditional probability of the form p(e | x) gives the probability (or probability density) of e when X = x is observed. It does not represent, however, the probability of e if we manually force variable X to have value x. Causally, we are interested in the intervention on X, which we denote by do(X = x), rather than the observation of x. In causal BNs, the tool used to evaluate the effect of these conditionings is Pearl's (1995) do-calculus, which uses the structure of the causal graph to evaluate the postintervention distribution.

Definition 2 Given a causal Bayesian network B in the sense of Pearl (2000, p. 23) over variables V = {X_1, ..., X_d}, the postintervention distribution p(v | do(X_i = x_i')), also denoted p(v | x̂_i'), after an intervention do(X_i = x_i'), can be expressed as:

p(x_1, \ldots, x_d \mid \hat{x}'_i) = \begin{cases} \prod_{j \neq i} p(x_j \mid pa_j) & \text{if } x_i = x'_i, \\ 0 & \text{if } x_i \neq x'_i, \end{cases}   (2)

where pa_j is the values of the graphical parents (i.e., direct causes) of the node X_j in B.

The truncated factorization of (2) states that the probability distribution is computed as if the manipulated variable X_i had no incoming causal influence (i.e., no direct causes), and as if p(X_i = x_i') had probability one. This makes sense, as forcing X_i to have a certain value effectively ignores its direct causes and natural distribution.
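A small sketch of the truncated factorization (2) on a hypothetical two-node chain follows (not the paper's implementation; the network, names, and numbers are made up): the intervened variable's own CPT factor is dropped and its value is clamped to the forced one.

```python
# Minimal sketch (not from the paper): the post-intervention distribution of
# Eq. (2) by truncated factorization, p(v | do(X_i = x_i')): drop the CPT
# factor of the intervened variable and clamp it to the forced value.
# The chain Smoking -> Cancer and its numbers are made up for illustration.
from itertools import product

parents = {"Smoking": (), "Cancer": ("Smoking",)}
cpt = {
    "Smoking": {(True, ()): 0.3, (False, ()): 0.7},
    "Cancer":  {(True, (True,)): 0.20, (False, (True,)): 0.80,
                (True, (False,)): 0.02, (False, (False,)): 0.98},
}

def post_intervention(assignment, do):
    """p(x_1, ..., x_d | do) = prod over non-intervened j of p(x_j | pa_j),
    and 0 whenever the assignment contradicts the intervention."""
    if any(assignment[var] != val for var, val in do.items()):
        return 0.0
    prob = 1.0
    for var, pars in parents.items():
        if var in do:
            continue  # the intervened variable's own factor is dropped
        prob *= cpt[var][(assignment[var], tuple(assignment[p] for p in pars))]
    return prob

# p(Cancer = True | do(Smoking = True)): here it simply equals the CPT entry
# p(Cancer = True | Smoking = True), since Smoking is Cancer's only parent.
p = sum(post_intervention(dict(zip(parents, vals)), {"Smoking": True})
        for vals in product([True, False], repeat=2)
        if dict(zip(parents, vals))["Cancer"])
print(p)  # 0.20
```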
With this concept, we can now define the causal information flow (Ay and Polani, 2006), which will be our measure of the causal contribution of explanatory variables towards our explanandum.

Definition 3 The causal information flow from X to Y given the interventions do(Z = z), written I(X → Y | ẑ), is:

I(X \to Y \mid \hat{z}) = \sum_{x \in \mathcal{X}} p(x \mid \hat{z}) \sum_{y \in \mathcal{Y}} p(y \mid \hat{x}, \hat{z}) \log \frac{p(y \mid \hat{x}, \hat{z})}{p'(y \mid \hat{z})},   (3)

where p'(y \mid \hat{z}) = \sum_{x' \in \mathcal{X}} p(x' \mid \hat{z}) \, p(y \mid \hat{x}', \hat{z}).

The expression I(X → Y | ẑ) measures the amount of information flowing from X to Y if we intervene on Z, setting it to z (i.e., if we block the causal flow on all paths going through Z). Note that (3) is, in essence, similar to (1). For faithful probability distributions, I(X → Y | ẑ) = 0 if and only if all (if any) directed paths from X to Y go through Z in the corresponding causal graph. For binary variables, if Y is a deterministic function of X regardless of Z, then I(X → Y | ẑ) = 1.

Using the information flow criterion brings us two advantages over standard (conditional) mutual information: first, we automatically only consider variables that can causally influence the explanandum. Second, when selecting the i-th variable on a tree branch, we take into account the previously selected variables Y_1, ..., Y_{i-1} causally, as they enter the conditioning set of variables that have been intervened on.

In our application, we use the causal information flow to decide which variable should be added to the tree being built. This is shown in Algorithm 2. At line 2, we do-condition on the already built path p, and we allow inputting additional observed variables O in an additional conditioning set. We replace Y in (3) with the state of the explanandum e, suppress the corresponding summation, and divide by the prior probability p(e | o, p̂), so that we end up computing

I(X \to e \mid o, \hat{p}) = \sum_{x \in \mathcal{X}} \frac{p(x \mid o, \hat{p}) \, p(e \mid o, \hat{x}, \hat{p})}{p(e \mid o, \hat{p})} \log \frac{p(e \mid o, \hat{x}, \hat{p})}{\sum_{x' \in \mathcal{X}} p(x' \mid o, \hat{p}) \, p(e \mid o, \hat{x}', \hat{p})},

ensuring that the expected value Σ_{e ∈ ℰ} p(e | o, p̂) I(X → e | o, p̂) equals I(X → E | o, p̂).

Using this criterion, the explanation tree is then built as follows: the root node is selected as being arg max_X I(X → e | o), i.e., the node which has the maximum information flow to the state of the explanandum. The important part is that we may condition on o without confusing observation and explanandum. Furthermore, we also allow selection of an observed variable X ∈ O in addition to unobserved variables, consistently with our desiderata. When X ∈ O is observed and we know its value to be x, we must then compute the pointwise causal information flow from X = x:

I(x \to e \mid o', \hat{p}) = \log \frac{p(e \mid o', \hat{p}, \hat{x})}{\sum_{x' \in \mathcal{X}} p(x' \mid o', \hat{p}) \, p(e \mid o', \hat{x}', \hat{p})},

with o' being o without the observation X = x.

The tree is then grown recursively: for each possible value x ∈ 𝒳, a branch is added to the root. For each new leaf, the next explanatory variable is selected as arg max_Y I(Y → e | o, x̂), and so on, where the do-conditioning set always reflects the selected variable values from the root to the current leaf. We use only one stopping criterion, the minimum information flow γ we accept as a causal information contribution. The algorithm furthermore allows explicitly restricting the search set for explanatory variables (defaulting to H = V \ {E}). Finally, each leaf is labeled with log( p(e | o, p̂) / p(e | o) ) (where we make sure that variables selected in p are removed from o if needed). This measures how much performing the interventions p̂ changes the probability of the explanandum (given the observations) with respect to the prior probability of the explanandum. Higher values indicate better explanations; negative values indicate that the probability of the explanandum actually decreases with the proposed explanation.

Algorithm 2 Causal Explanation Tree
 1: function T = CausalExplTree(H, o, e, p̂; γ)
    Input:   H : set of explanatory variables
             O = o : observation set
             E = e : explanandum
             p̂ : path of interventions
             γ : stopping criterion
    Output:  T : a causal explanation tree
 2: X ← arg max_{X ∈ H} I(X → e | o, p̂)
 3: if I(X → e | o, p̂) < γ then return ∅
 4: T ← new tree with root X
 5: for each x ∈ domain(X) do
 6:   T' ← CausalExplTree(H \ {X}, o, e, p̂ ∪ {x̂})
 7:   add a branch x to T with subtree T' and
 8:   assign it the contribution log( p(e | o, p̂, x̂) / p(e | o) )
 9: end for
10: return T
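To make the criterion of line 2 concrete, here is a sketch (not the authors' implementation) of the causal information flow of (3) in the simplest possible setting: X is a root node, there are no observations o and no prior interventions p̂, so that p(e | x̂) reduces to p(e | x) and the pointwise and expected versions can be computed directly from two small tables. The numbers happen to equal p(Recovery | Sex) in the Drug network used later in Section 5, marginalizing over Drug; base-2 logarithms are assumed.

```python
# Minimal sketch (not from the paper): causal information flow I(X -> E) of
# Eq. (3) in the simplest case, where X is a root cause of E, Z is empty and
# nothing is observed, so that p(e | do(x)) = p(e | x).
from math import log2

p_x = {"male": 0.5, "female": 0.5}          # p(x): X has no parents
p_e_given_do_x = {                          # p(e | do(x)) for a binary E
    "male":   {"recovery": 0.625, "death": 0.375},
    "female": {"recovery": 0.275, "death": 0.725},
}

def information_flow(p_x, p_e_do_x):
    """I(X -> E) = sum_x p(x) sum_e p(e|do(x)) log [ p(e|do(x)) / p'(e) ],
    with p'(e) = sum_x' p(x') p(e|do(x'))."""
    states_e = next(iter(p_e_do_x.values())).keys()
    p_prime = {e: sum(p_x[x] * p_e_do_x[x][e] for x in p_x) for e in states_e}
    return sum(p_x[x] * p_e_do_x[x][e] * log2(p_e_do_x[x][e] / p_prime[e])
               for x in p_x for e in states_e)

def pointwise_flow(p_x, p_e_do_x, e):
    """Per-state version used at line 2 of Algorithm 2, here with empty o, p."""
    p_prime_e = sum(p_x[x] * p_e_do_x[x][e] for x in p_x)
    # Since X is a root and o, p are empty, p(e) = sum_x p(x) p(e|x) = p'(e).
    p_e = p_prime_e
    return sum(p_x[x] * p_e_do_x[x][e] / p_e * log2(p_e_do_x[x][e] / p_prime_e)
               for x in p_x)

print(information_flow(p_x, p_e_given_do_x))          # expected flow, in bits
print(pointwise_flow(p_x, p_e_given_do_x, "recovery"))  # flow to E = recovery
```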
In practice, computing a causal information flow of the type I(X → e | o, p̂) at line 2 of Algorithm 2 requires knowing the distributions p(X | o, p̂), p(E | o, p̂), and p(E | o, X̂, p̂) (i.e., p(E | o, x̂, p̂) for all x ∈ 𝒳). Additionally, p(e | o) is needed to label the leaves, but as it is only dependent on e and o, we compute it only once. We can further avoid unnecessary computations by using the graphical reachability criterion from a candidate node X to E, blocking paths going through O ∪ P. The inference steps were implemented using the factor-graph message-passing algorithm (Frey et al., 2001). The complexity of this algorithm, in terms of the number of calls to an inference engine per node in the constructed tree, is O(nd), where n = |H| is the number of explanatory variables and d is the average domain size of the variables, e.g., 2 for binary variables. For comparison, Flores's (2005) approach is O(n²d²).

5 EXPERIMENTS

We compare causal explanation trees (CET) with parameter γ = 0 to Most Probable Explanation (MPE), Bayes' factor (BF) following Yuan and Lu (2007), and standard (noncausal) explanation trees (ET) with parameters α = 0.02 and β = 0. We test the approach on three simple networks³ to compare the relevance of the explanations. A more extended version of these experiments and comments can be found in Nielsen (2007).

³ The conditional probability tables have been omitted in the figures of the two larger BNs due to lack of space and can be found at http://www.zurich.ibm.com/~uln/causalexpl/.

Drug (Figure 1). This network comes from Pearl (2000, chap. 6). It represents the outcome of an experiment designed to check the efficiency of a new drug on male and female patients. The males have a natural recovery rate of 70%; taking the drug decreases it to 60%. Similarly, 30% of females recover naturally, but only 20% when given the drug. Thus, both the absence of drug and being a male can explain a good recovery rate.

Figure 1: The Drug network, with CPTs:
p(Sex): f 0.5, m 0.5.
p(Drug | Sex): f: yes (d) 0.25, no (¬d) 0.75; m: yes (d) 0.75, no (¬d) 0.25.
p(Recovery | Sex, Drug): f, d: recovery 0.2, death 0.8; f, ¬d: recovery 0.3, death 0.7; m, d: recovery 0.6, death 0.4; m, ¬d: recovery 0.7, death 0.3.

In Figure 2, we try to explain a recovery. All approaches correctly realize that Sex = m largely accounts for the recovery. However, ET selects Sex = m, Drug = yes as the best explanation according to the leaves' labels, just like MPE. This contradicts the natural idea of explanation, since the drug has a negative impact on the recovery. CET labels the leaves more sensibly: branches where the drug was not given have a higher rank. Moreover, the branches where Sex = f have a negative label, indicating that they actually decrease the probability of recovery. Although the first two BF explanations are sensible, the third one mistakenly selects Drug = yes as an explanation.

Figure 2: Drug: explain Recovery = recovery.
(a) Causal explanation tree: Sex = female: -0.710 (then Drug = yes: -1.170, Drug = no: -0.585); Sex = male: 0.474 (then Drug = yes: 0.415, Drug = no: 0.637).
(b) Explanation tree: Sex = female: 0.306 (then Drug = yes: 0.056, Drug = no: 0.250); Sex = male: 0.694 (then Drug = yes: 0.500, Drug = no: 0.194).
(c) MPE: p(Sex = m, Drug = yes | Recovery = rec.) = 0.5.
(d) BF: BF(Sex = m) = 2.27; BF(Sex = m, Drug = no) = 1.68; BF(Drug = yes) = 1.25.
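As a worked check of the labels in Figure 2(a) (a sketch, not the authors' implementation), the following computes p(recovery | do(Sex), do(Drug)) directly from the CPTs of Figure 1 and labels each branch with the log-ratio to the prior p(recovery) = 0.45. Since Sex is a root and Drug's only parent is Sex, intervening on both reduces to a CPT lookup, and intervening on Sex alone averages over the Drug CPT; base-2 logarithms are assumed, which matches the reported values.

```python
# Worked sketch (not the authors' implementation): reproducing the causal
# explanation tree labels of Figure 2(a) for the Drug network of Figure 1.
# Branch/leaf labels are log( p(e | do(path)) / p(e) ), with base-2 logs.
from math import log2

p_sex = {"m": 0.5, "f": 0.5}
p_drug_given_sex = {"f": {"yes": 0.25, "no": 0.75},
                    "m": {"yes": 0.75, "no": 0.25}}
p_rec_given_sex_drug = {("f", "yes"): 0.2, ("f", "no"): 0.3,
                        ("m", "yes"): 0.6, ("m", "no"): 0.7}

# Prior p(Recovery = recovery), with no interventions: 0.45.
p_rec = sum(p_sex[s] * p_drug_given_sex[s][d] * p_rec_given_sex_drug[(s, d)]
            for s in p_sex for d in ("yes", "no"))

def p_rec_do(sex=None, drug=None):
    """p(recovery | do(Sex = sex), do(Drug = drug)); Sex is a root, so
    do(Sex = s) coincides with conditioning, and do(Drug = d) cuts the
    Sex -> Drug arc, leaving a lookup or an average over the Drug CPT."""
    if sex is not None and drug is not None:
        return p_rec_given_sex_drug[(sex, drug)]
    if sex is not None:
        return sum(p_drug_given_sex[sex][d] * p_rec_given_sex_drug[(sex, d)]
                   for d in ("yes", "no"))
    raise ValueError("intervene on at least Sex in this sketch")

for s in ("f", "m"):
    print(f"Sex = {s}: {log2(p_rec_do(sex=s) / p_rec):+.3f}")
    for d in ("yes", "no"):
        print(f"  Drug = {d}: {log2(p_rec_do(sex=s, drug=d) / p_rec):+.3f}")
# Expected: f -0.710 (yes -1.170, no -0.585); m +0.474 (yes +0.415, no +0.637)
```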
Academe (Figure 3). This network depicts the relationships between various marks given to students following a course. The Final mark is determined by Other outside factors and an intermediate mark (T.P. mark), which is in turn determined by the student's abilities in Theory and Practice as well as Extra curricular activities in the tested subject.

Figure 3: The Academe network. Nodes: Theory {good, average, bad}, Practice {good, average, bad}, Extra {yes, no}, T.P. mark {pass, fail}, Other {positive, negative}, Global mark {pass, fail}, Final mark {pass, fail}.

In Figure 4, the explanandum was set to Final mark = fail; i.e., we want to explain why a student failed the course. T.P. mark and Global mark have been excluded from the possible explanatory variables in the two tree algorithms, as they are modeling artifacts.

ET tells us that Theory = bad is the best explanation. We could have expected Practice to also be part of alternate explanations, as it influences the final mark very similarly to Theory. This is what CET does, including Practice to explain the final failure when Theory is average or good. MPE includes Practice = good in its long list of states, which does not seem intuitively likely. BF provides more concise explanations but, like ET, ignores Practice altogether, although a bad practice can account for failure equally well.

Figure 4: Academe: explain Final mark = fail.
(a) Causal explanation tree and (b) explanation tree (tree diagrams over Theory, Practice, and Other; numeric branch labels omitted).
(c) MPE: p(Theory = bad, T.P. mark = fail, Global mark = fail, Extra = no, Other = positive, Practice = good | Final mark = fail) = 0.208.
(d) BF: BF(Theory = bad) = 3.02; BF(Theory = bad, Extra = no) = 2.78; BF(Theory = bad, Other = negative) = 2.53.

Asia (Figure 5). This network (Lauritzen and Spiegelhalter, 1988) models the relationships between two indicators, X-ray results and dyspnea, of severe diseases for a person. Tuberculosis (more likely if a visit to Asia occurred) and lung cancer (more likely when the person smokes) both increase abnormal X-ray results and dyspnea; bronchitis also causes increased dyspnea. TbOrCa is a modeling artifact, excluded from the analysis in the two tree algorithms.

Figure 5: The Asia network. Nodes: Visit to Asia {yes, no}, Tuberculosis {yes, no}, Smoking {yes, no}, Lung Cancer {yes, no}, TbOrCa {yes, no}, Bronchitis {yes, no}, X-Ray {normal, abnormal}, Dyspnea {yes, no}.

In Figure 6, we try to explain abnormal X-ray results. Whereas both tree algorithms select Lung cancer as the best explanation, they differ on how to explain the abnormal X-rays when it is absent: CET selects, justifiably, Tuberculosis, but ET uses Dyspnea and then Bronchitis, which are not causes of X-ray and cannot explain it, especially not when we know that no lung cancer is present. MPE surprisingly excludes a visit to Asia, when this is expected to make abnormal X-rays more likely through Tuberculosis. BF, while still providing very concise explanations, is stuck on the middle node TbOrCa and, like ET, ignores the important Tuberculosis.

Figure 6: Asia: explain X-ray = abnormal.
(a) Causal explanation tree and (b) explanation tree (tree diagrams over Lung cancer, Tuberculosis, Dyspnea, and Bronchitis; numeric branch labels omitted).
(c) MPE: p(Bronchitis = yes, Dyspnea = yes, Lung cancer = yes, TbOrCa = yes, Smoker = yes, Tuberculosis = yes, Visit to Asia = no | X-ray = abnormal) = 0.24.
(d) BF: BF(TbOrCa = yes) = 19.60; BF(TbOrCa = yes, Visit to Asia = no) = 19.21; BF(TbOrCa = yes, Lung cancer = yes) = 16.42.
In Figure 7, we try to explain the presence of dyspnea for a smoker. While CET is still able to select Smoker as an explanatory variable, ET can only add this observation to the explanandum and thus cannot select it. Instead, the best explanation according to ET is normal X-rays, which does not seem very likely. Although ET does select the important Lung cancer and Tuberculosis, it ignores the largest factor according to BF and CET, namely Bronchitis. Here too, CET selects the more intuitively interpretable explanations.

Figure 7: Asia: explain Dyspnea = yes | Smoker = yes.
(a) Causal explanation tree and (b) explanation tree (tree diagrams over Smoking, X-ray result, Bronchitis, Lung cancer, and Tuberculosis; numeric branch labels omitted).
(c) MPE: p(Bronchitis = yes, X-ray = normal, Lung cancer = no, TbOrCa = no, Smoker = yes, Tuberculosis = no, Visit to Asia = no | Dyspnea = yes) = 0.46.
(d) BF: BF(Bronchitis = yes) = 6.14; BF(Bronchitis = yes, Visit to Asia = no) = 5.89; BF(Bronchitis = yes, Tuberculosis = no) = 5.84.

6 CONCLUSION

We have presented an approach to explanation in causal BNs: causal explanation trees. Explanations are presented as a tree, compactly representing several explanations and making the result more readable than a (possibly long) list. Assuming that the BN is causal allows us to use the causal information flow criterion to build the tree. This leads to more sensible explanations, in that we only explain a given state with variables that can causally influence it. The approach makes an explicit distinction between observation and explanandum. This lets the user input all available knowledge about the network as observations, while still focusing on explaining one of them, and allows the observed variables to be selected as part of a good explanation. The algorithm labels the leaves so as to reflect how a proposed explanation changes the probability of the explanandum, making the tree easy to interpret. Causal explanation trees, unlike other techniques, do not condition on the explanandum to maximize the probability of the explanatory variables p(h | e), but focus on p(e | h) instead, by means of the causal information flow. This allows them to compare favorably to MPE, Bayes' factor, and Flores's (2005) noncausal explanation trees on the tested networks: the returned explanations are intuitive, and those with a positive label in the tree are ensured to increase the probability of the explanandum.
References

Ay, N. & Polani, D. (2006). Information flows in causal networks. Technical report, Max Planck Institute for Mathematics in the Sciences.
Chajewska, U. & Halpern, J. Y. (1997). Defining explanation in probabilistic systems. In: UAI-97, pages 62-71. Morgan Kaufmann.
Chan, H. & Darwiche, A. (2006). On the robustness of most probable explanations. In: UAI-06.
de Campos, L. M., Gámez, J. A., & Moral, S. (2001). Simplifying explanations in Bayesian belief networks. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 9.
Flores, M. J. (2005). Bayesian Networks Inference: Advanced Algorithms for Triangulation and Partial Abduction. PhD thesis, Universidad de Castilla-La Mancha.
Frey, B. J., Kschischang, F. R., & Loeliger, H.-A. (2001). Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47.
Halpern, J. Y. & Pearl, J. (2005). Causes and explanations: A structural-model approach. Part II: Explanations. The British Journal for the Philosophy of Science.
Henrion, M. & Druzdzel, M. J. (1990). Qualitative propagation and scenario-based scheme for exploiting probabilistic reasoning. In: UAI, pages 17-32.
Jeffreys, H. (1961). Theory of Probability. Oxford University Press.
Jensen, F. V. (2001). Bayesian Networks and Decision Graphs. Springer.
Lacave, C. & Diez, F. J. (2002). A review of explanation methods for Bayesian networks. Knowledge Engineering Review, 17(2):107-127.
Lauritzen, S. L. & Spiegelhalter, D. J. (1988). Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society, Series B, 50:157-224.
Nielsen, U. H. (2007). On causal explanations in Bayesian networks. Master's thesis, IT University of Copenhagen.
Park, J. D. (2002). MAP complexity results and approximation methods. In: UAI-02, pages 388-396.
Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.
Pearl, J. (1995). Causal diagrams for empirical research. Biometrika, 82(4):669-709.
Pearl, J. (2000). Causality: Models, Reasoning, and Inference. Cambridge University Press.
Shimony, S. E. (1991). Explanation, irrelevance, and statistical independence. In: AAAI, pages 482-487.
Spirtes, P., Glymour, C., & Scheines, R. (2001). Causation, Prediction, and Search, Second Edition. The MIT Press.
Yuan, C. & Lu, T.-C. (2007). Finding explanations in Bayesian networks. In: The 18th International Workshop on Principles of Diagnosis.