Sensitivity analysis for finite Markov chains in discrete time

Gert de Cooman, Filip Hermans and Erik Quaeghebeur
SYSTeMS Research Group, Ghent University, Belgium
{gert.decooman,filip.hermans,erik.quaeghebeur}@UGent.be

Abstract

When the initial and transition probabilities of a finite Markov chain in discrete time are not well known, we should perform a sensitivity analysis. This is done by considering as basic uncertainty models the so-called credal sets that these probabilities are known or believed to belong to, and by allowing the probabilities to vary over such sets. This leads to the definition of an imprecise Markov chain. We show that the time evolution of such a system can be studied very efficiently using so-called lower and upper expectations. We also study how the inferred credal set about the state at time $n$ evolves as $n \to \infty$: under quite unrestrictive conditions, it converges to a uniquely invariant credal set, regardless of the credal set given for the initial state. This leads to a non-trivial generalisation of the classical Perron–Frobenius Theorem to imprecise Markov chains.

1 Setting the stage

One convenient way to model uncertain dynamical systems is to describe them as Markov chains. These have been studied in great detail, and their properties are well known. However, in many practical situations, it remains a challenge to accurately identify the transition probabilities in the Markov chain: the available information about physical systems is often imprecise and uncertain. Describing a real-life dynamical system as a Markov chain will therefore often involve unwarranted precision, and may lead to conclusions not supported by the available information. For this reason, it seems quite useful to perform probabilistic robustness studies, or sensitivity analyses, for Markov chains. This is especially relevant in decision-making applications.

Many researchers in Markov Chain Decision Making (White and Eldeib, 1994; Harmanec, 2002; Nilim and El Ghaoui, 2005; Itoh and Nakamura, 2007)--inspired by Satia and Lave's (1973) original work--have paid attention to this issue of 'imprecision' in Markov chains. Early work on the more mathematical aspects of modelling such 'imprecision' in Markov chains was done by Kozine and Utkin (2002). Armed with linear programming techniques, these authors also performed an experimental study of the limit behaviour of Markov chains with uncertain transition probabilities. More recently, Škulj (2006, 2007) has begun a formal study of the time evolution and limit behaviour of such systems.

All these approaches use sets of probabilities to deal with the imprecision in the transition probabilities. When these probabilities are not well known, they are assumed to belong to certain sets, and robustness analyses are performed by allowing the transition probabilities to vary over such sets. As we shall see, this approach leads to a number of computational difficulties, which we show can be overcome by tackling the same problem from another angle, using lower and upper expectations rather than sets of probabilities.

In the rest of this Introduction, we give an overview of the theory of classical Markov chains and formulate the classical Perron–Frobenius theorem. Then, in Sections 2 and 3, we introduce imprecise Markov chains and generalise many aspects of the classical theory. In Section 4, we generalise the Perron–Frobenius theorem. We discuss a number of theoretical and numerical examples in Section 5, and we give perspectives for further research in the Conclusions.

1.1 Analysis of classical Markov chains
Consider a finite Markov chain in discrete time, where at times $n = 1, 2, \dots, N$, with $N \in \mathbb{N}$, the state $X(n)$ of a system can assume any value in a finite set $\mathcal{X}$. Here $\mathbb{N}$ denotes the set of non-zero natural numbers, and $N$ is the time horizon. The time evolution of such a system can be modelled as if it traversed a so-called event tree (Shafer, 1996). An example of such a tree for $\mathcal{X} = \{a, b\}$ and $N = 3$ is given in Figure 1. The situations, or nodes, of the tree have the form $x_{1:k} := (x_1, \dots, x_k) \in \mathcal{X}^k$, $k = 0, 1, \dots, N$. For $k = 0$ there is some abuse of notation, as we let $\mathcal{X}^0 := \{\Box\}$, where $\Box$ is the so-called initial situation, or root of the tree. In the cuts $\mathcal{X}^n$ of the tree, the value of the state $X(n)$ at time $n$ is revealed.

Figure 1: The event tree for the time evolution of a system that can be in two states, $a$ and $b$, and can change state at time instants $n = 1, 2$. Also depicted are the respective cuts $\mathcal{X}^1$ and $\mathcal{X}^2$ where the states at times 1 and 2 are revealed.

In a classical analysis, it is generally assumed that we have: (i) a probability distribution over the initial state $X(1)$, in the form of a probability mass function $m_1$ on $\mathcal{X}$; and (ii) for each situation $x_{1:n}$ that the system can be in at time $n$, a probability distribution over the next state $X(n+1)$, in the form of a probability mass function $q(\cdot|x_{1:n})$ on $\mathcal{X}$. This means that in each non-terminal situation¹ $x_{1:n}$ of the event tree, we have a local probability model telling us about the probabilities of each of its child nodes. This turns the event tree into a so-called probability tree; see Shafer (1996, Chapter 3) and Kemeny and Snell (1976, Section 1.9).

¹ A non-terminal situation is a node of the tree that is not a leaf.

The probability tree for a Markov chain is special, because the Markov Condition states that when the system jumps from state $X(n) = x_n$ to a new state $X(n+1)$, where the system goes to will only depend on the state $X(n) = x_n$ the system was in at time $n$, and not on its states $X(k) = x_k$ at previous times $k = 1, 2, \dots, n-1$. In other words:
\[
q(\cdot|x_{1:n}) = q_n(\cdot|x_n), \quad x_{1:n} \in \mathcal{X}^n,\ n = 1, \dots, N-1, \tag{1}
\]
where $q_n(\cdot|x_n)$ is some probability mass function on $\mathcal{X}$. The Markov chain may be non-stationary, as the transition probabilities are allowed to depend explicitly on the time $n$. Figure 2 gives an example of a probability tree for a Markov chain with $\mathcal{X} = \{a, b\}$ and $N = 3$.

Figure 2: The probability tree for the time evolution of a Markov chain that can be in two states, $a$ and $b$, and can change state at each time instant $n = 1, 2$.

With the local probability mass functions $m_1$ and $q_n(\cdot|x_n)$ we associate the linear real-valued expectation functionals $E_1$ and $E_n(\cdot|x_n)$, given, for all real-valued maps $h$ on $\mathcal{X}$, by
\[
E_1(h) := \sum_{x_1 \in \mathcal{X}} h(x_1)\, m_1(x_1), \qquad E_n(h|x_n) := \sum_{x_{n+1} \in \mathcal{X}} h(x_{n+1})\, q_n(x_{n+1}|x_n).
\]

In any probability tree, probabilities and expectations can be calculated very efficiently using backwards recursion. Suppose that in situation $x_{1:n}$, we want to calculate the conditional expectation $E(f|x_{1:n})$ of some real-valued function $f$ on $\mathcal{X}^N$ that may depend on the values of the states $X(1)$ to $X(N)$. Let us indicate briefly how this is done, also taking into account the simplifications due to the Markov Condition (1).

A prominent part is played by the so-called transition operators $T_n$ and $\mathbb{T}_n$. Consider the linear space $\mathcal{L}(\mathcal{X})$ of all real-valued maps on $\mathcal{X}$. Then the linear operator $T_n \colon \mathcal{L}(\mathcal{X}) \to \mathcal{L}(\mathcal{X})$ is defined by
\[
T_n h(x_n) := E_n(h|x_n) = \sum_{x_{n+1} \in \mathcal{X}} h(x_{n+1})\, q_n(x_{n+1}|x_n) \tag{2}
\]
for all real-valued maps $h$ on $\mathcal{X}$. In other words, $T_n h$ is the real-valued map on $\mathcal{X}$ whose value $T_n h(x_n)$ in $x_n \in \mathcal{X}$ is the conditional expectation of the random variable $h(X(n+1))$, given that the system is in state $x_n$ at time $n$.
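On a finite state space, the operator of Eq. (2) is simply multiplication of the vector $h$ by a row-stochastic matrix. The following minimal Python sketch illustrates this; the numbers are hypothetical and not taken from the paper's examples.

```python
import numpy as np

# A minimal numerical sketch of Eq. (2); the numbers are hypothetical.
# States are indexed 0 = a, 1 = b; row x of Q holds the mass function q_n(.|x).
Q = np.array([[0.3, 0.7],
              [0.6, 0.4]])      # rows sum to 1

h = np.array([1.0, -2.0])       # a real-valued map h on X = {a, b}

# (T_n h)(x) = sum over y of h(y) q_n(y|x): a matrix-vector product.
Tn_h = Q @ h
print(Tn_h)  # conditional expectations of h(X(n+1)) given X(n) = a and X(n) = b
```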
More generally, we also consider the linear maps $\mathbb{T}_n$ from $\mathcal{L}(\mathcal{X}^{n+1})$ to $\mathcal{L}(\mathcal{X}^n)$, defined by
\[
\mathbb{T}_n f(x_{1:n}) := T_n f(x_{1:n}, \cdot)(x_n) = E_n(f(x_{1:n}, \cdot)|x_n) = \sum_{x_{n+1} \in \mathcal{X}} f(x_{1:n}, x_{n+1})\, q_n(x_{n+1}|x_n) \tag{3}
\]
for all $x_{1:n}$ in $\mathcal{X}^n$ and all real-valued maps $f$ on $\mathcal{X}^{n+1}$.

We begin with the expectation $E(f|x_{1:n})$ for $n = N-1$:
\[
E(f|x_{1:N-1}) = \sum_{x_N \in \mathcal{X}} f(x_{1:N-1}, x_N)\, q(x_N|x_{1:N-1}) = \sum_{x_N \in \mathcal{X}} f(x_{1:N-1}, x_N)\, q_{N-1}(x_N|x_{N-1}) = \mathbb{T}_{N-1} f(x_{1:N-1}),
\]
where the second equality follows from the Markov Condition (1), and the third from Eq. (3). Similar arguments for $n = N-2$ and the Law of Iterated Expectations yield:
\[
E(f|x_{1:N-2}) = E(E(f(x_{1:N-2}, \cdot, \cdot)|x_{1:N-2}, \cdot)|x_{1:N-2}) = \mathbb{T}_{N-2} \mathbb{T}_{N-1} f(x_{1:N-2}).
\]
Repeating this argument leads to the backwards recursion formulae (for $n = 1, \dots, N-1$):
\[
E(f|x_{1:n}) = \mathbb{T}_n \mathbb{T}_{n+1} \cdots \mathbb{T}_{N-1} f(x_{1:n}) \tag{4}
\]
\[
E(f) := E(f|\Box) = E_1(\mathbb{T}_1 \mathbb{T}_2 \cdots \mathbb{T}_{N-1} f). \tag{5}
\]
In these formulae, $f$ is any real-valued function on $\mathcal{X}^N$. If we let $f$ be the indicator functions $I_{\{x_{1:N}\}}$ of the singletons $\{x_{1:N}\}$, these formulae allow us for instance to calculate the joint probability mass function $p(x_{1:N}) = E(I_{\{x_{1:N}\}})$ for all the variables $X(1), \dots, X(N)$. We can also use them to find the conditional mass functions $p(x_{n+1:N}|x_n) = p(x_{n+1:N}|x_{1:n}) := E(I_{\{x_{1:N}\}}|x_{1:n})$.
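As a sanity check on the recursion (4)–(5), here is a short self-contained Python sketch, with hypothetical transition matrices, that computes $E(f)$ once by backwards recursion and once by brute-force summation over all paths; the two results coincide.

```python
import numpy as np
from itertools import product

# Backwards recursion (4)-(5) for a hypothetical two-state chain with N = 3.
m1 = np.array([0.5, 0.5])                  # initial mass function
Q = [np.array([[0.3, 0.7], [0.6, 0.4]]),   # q_1(.|x): rows indexed by x
     np.array([[0.2, 0.8], [0.9, 0.1]])]   # q_2(.|x)

def f(x1, x2, x3):                         # an arbitrary real-valued map on X^3
    return x1 + 2 * x2 - x3

# g starts as f on X^N and is contracted one time index per step, Eq. (4).
g = {x: f(*x) for x in product((0, 1), repeat=3)}
for n in (1, 0):                           # apply T_2, then T_1 (0-based times)
    g = {x: sum(Q[n][x[-1], y] * g[x + (y,)] for y in (0, 1))
         for x in product((0, 1), repeat=n + 1)}
E_f = sum(m1[x] * g[(x,)] for x in (0, 1)) # Eq. (5)

# Brute-force check using the joint mass function p(x_{1:3}).
E_check = sum(m1[x1] * Q[0][x1, x2] * Q[1][x2, x3] * f(x1, x2, x3)
              for x1, x2, x3 in product((0, 1), repeat=3))
print(E_f, E_check)                        # identical up to rounding
```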
1.2 The Perron–Frobenius Theorem for classical Markov chains

We are especially interested in the case of a stationary Markov chain, and in the (marginal) expectation $E_n(h)$ of a real-valued function $h$ (on $\mathcal{X}$) that depends only on the state $X(n)$ at time $n$. Here, Eq. (5) becomes
\[
E_n(h) := E_1(T^{n-1} h), \tag{6}
\]
where $T := T_1 = T_2 = \cdots = T_{N-1}$, and where we denote by $T^k$ the $k$-fold composition of $T$ with itself; in particular, $T^0$ is the identity operator on $\mathcal{L}(\mathcal{X})$. If we let $h = I_{\{x_n\}}$, this allows us to find the probability mass function $m_n(x_n) = E_n(I_{\{x_n\}})$ for the state $X(n)$. Under some restrictions on the transition operator $T$, the classical Perron–Frobenius Theorem then tells us that, as $n$ and the time horizon $N$ recede to infinity, this probability mass function converges to some limit, independently of the initial probability mass function $m_1$; see Kemeny and Snell (1976, Theorem 4.1.6) and Luenberger (1979, Chapter 6). In terms of expectation functionals and transition operators:

Theorem 1 (Classical Perron–Frobenius Theorem, Expectation Form). Consider a stationary Markov chain with finite state set $\mathcal{X}$ and transition operator $T$. Suppose that $T$ is regular, meaning that there is some $k > 0$ such that $\min T^k I_{\{x\}} > 0$ for all $x$ in $\mathcal{X}$. Then for every initial expectation operator $E_1$, the expectation operator $E_n = E_1 \circ T^{n-1}$ for the state at time $n$ converges point-wise to the same limit expectation operator $E_\infty$: for all $h \in \mathcal{L}(\mathcal{X})$,
\[
\lim_{n \to \infty} E_n(h) = \lim_{n \to \infty} E_1(T^{n-1} h) = E_\infty(h).
\]
Moreover, the limit expectation $E_\infty$ is the only $T$-invariant expectation on $\mathcal{L}(\mathcal{X})$, in the sense that $E_\infty = E_\infty \circ T$.

2 Towards imprecise Markov chains

The treatment above rests on the assumption that the initial probabilities and the transition probabilities are precisely known. If such is not the case, then it seems necessary to perform some kind of sensitivity analysis, in order to find out to what extent any conclusions we might reach using such a treatment depend on the actual values of these probabilities. A very general way of performing a sensitivity analysis for probabilities involves calculations with closed convex sets of probability mass functions, also called credal sets, rather than with single probability measures. Let $\Sigma_{\mathcal{X}}$ denote the set of all probability mass functions on $\mathcal{X}$, an $(|\mathcal{X}|-1)$-dimensional unit simplex in the $|\mathcal{X}|$-dimensional linear space $\mathbb{R}^{\mathcal{X}}$. Then, for instance, $\{m \in \Sigma_{\mathcal{X}} : (\forall x \in \mathcal{X})(m(x) \leq 1/2)\}$ is a credal set, but $\{m \in \Sigma_{\mathcal{X}} : (\exists x \in \mathcal{X})(m(x) \geq 1/2)\}$ is not, as it fails to be convex. There is a growing body of literature on this interesting and fairly new area of imprecise probabilities, starting with the publication of Walley's (1991) seminal work. We refer to the literature (Walley, 1991, 1996; Weichselberger, 2001; De Cooman and Miranda, 2007) for more details and discussion.

Specifying a closed convex set $\mathcal{P}$ of probability mass functions $p$ on a finite set $\mathcal{Y}$ is equivalent (Walley, 1991, Section 3.4.1) to specifying its lower and upper expectation (functionals) $\underline{E}_{\mathcal{P}} \colon \mathcal{L}(\mathcal{Y}) \to \mathbb{R}$ and $\overline{E}_{\mathcal{P}} \colon \mathcal{L}(\mathcal{Y}) \to \mathbb{R}$, defined by
\[
\underline{E}_{\mathcal{P}}(g) := \min\{E_p(g) : p \in \mathcal{P}\} \quad \text{and} \quad \overline{E}_{\mathcal{P}}(g) := \max\{E_p(g) : p \in \mathcal{P}\}
\]
for all real-valued maps $g$ on $\mathcal{Y}$, where $E_p(g) = \sum_{y \in \mathcal{Y}} g(y)\, p(y)$ is the expectation of $g$ associated with the probability mass function $p$. In a sensitivity analysis, such functionals are quite useful, because they give tight lower and upper bounds on the expectation of any real-valued map. Since the functionals $\underline{E}_{\mathcal{P}}$ and $\overline{E}_{\mathcal{P}}$ are conjugate in the sense that $\underline{E}_{\mathcal{P}}(g) = -\overline{E}_{\mathcal{P}}(-g)$ for all real-valued maps $g$ on $\mathcal{Y}$, one is completely determined if the other is known. Below, we concentrate on upper expectations.

What is the upshot of all this for the Markov chain problem we are considering here? First of all, in the initial situation $\Box$, corresponding to time $n = 0$, rather than a single initial probability mass function $m_1$, we now have a local credal set $\mathcal{M}_1$ of candidate mass functions $m_1$ for the state $X(1)$ that the system will be in at time $n = 1$. We denote by $\overline{E}_1$ the upper expectation associated with $\mathcal{M}_1$:
\[
\overline{E}_1(h) := \max\Big\{\sum_{x \in \mathcal{X}} h(x)\, m_1(x) : m_1 \in \mathcal{M}_1\Big\}
\]
for all $h \in \mathcal{L}(\mathcal{X})$. Also, in any situation $x_{1:n} \in \mathcal{X}^n$, corresponding to time $n = 1, 2, \dots, N-1$, instead of a single transition mass function $q_n(\cdot|x_n)$, we now have a local credal set $\mathcal{Q}_n(\cdot|x_n)$ of candidate conditional mass functions $q_n(\cdot|x_n)$ for the state $X(n+1)$ that the system will be in at time $n+1$. We denote by $\overline{E}_n(\cdot|x_n)$ the upper expectation associated with $\mathcal{Q}_n(\cdot|x_n)$, i.e., for all $h \in \mathcal{L}(\mathcal{X})$:
\[
\overline{E}_n(h|x_n) := \max\Big\{\sum_{x \in \mathcal{X}} h(x)\, q(x) : q \in \mathcal{Q}_n(\cdot|x_n)\Big\}. \tag{7}
\]
We call the resulting model an imprecise Markov chain. Figure 3 gives an example of the probability tree of an imprecise Markov chain. A classical, or precise, Markov chain is an imprecise one with credal sets that are singletons.

Figure 3: The tree for the time evolution of an imprecise Markov chain that can be in two states, $a$ and $b$, and can change state at each time instant $n = 1, 2$.
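For a credal set with finitely many extreme points, the maximum in Eq. (7) is attained at an extreme point, because a linear function attains its maximum over a polytope in a vertex. A minimal sketch, with hypothetical numbers:

```python
import numpy as np

# Upper expectation of Eq. (7) for a finitely generated credal set,
# computed over its extreme points (hypothetical numbers).
extreme_points = np.array([[0.6, 0.4],    # each row is a mass function on {a, b};
                           [0.9, 0.1]])   # the credal set is their convex hull

def upper_expectation(h, ext):
    # max over the credal set of sum_x h(x) m(x); since the objective is
    # linear in m, it suffices to check the extreme points.
    return float(np.max(ext @ h))

h = np.array([1.0, -1.0])
print(upper_expectation(h, extreme_points))   # 0.8, attained at m = (0.9, 0.1)
```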
How, then, is a sensitivity analysis typically performed (Kozine and Utkin, 2002; Škulj, 2006, 2007) for such an imprecise Markov chain? We choose, in each non-terminal situation $x_{1:k}$ of the above-mentioned event tree, a local transition probability mass function $q(\cdot|x_{1:k})$ in the set of possible candidates $\mathcal{Q}_k(\cdot|x_k)$.² For $k = 0$, we get the initial situation $\Box$, where we choose some element $m_1$ in the set of possible candidates $\mathcal{M}_1$. We then obtain a so-called compatible probability tree, for which we may calculate all (conditional) expectations and probability mass functions:
\[
E(f|x_{1:n}) = \sum_{x_{n+1:N} \in \mathcal{X}^{N-n}} f(x_{1:n}, x_{n+1:N}) \prod_{k=n}^{N-1} q(x_{k+1}|x_{1:k}), \tag{8}
\]
\[
E(f) = \sum_{x_{1:N} \in \mathcal{X}^N} f(x_{1:N})\, m_1(x_1) \prod_{k=1}^{N-1} q(x_{k+1}|x_{1:k}), \tag{9}
\]
for $n = 1, \dots, N-1$, and for all real-valued maps $f$ on $\mathcal{X}^N$. As we have just come to realise, the probability trees that are compatible with an imprecise Markov chain are no longer necessarily (precise) Markov chains themselves. It is still possible to calculate the $E(f|x_{1:n})$ and $E(f)$ in Eqs. (8) and (9) using backwards recursion (Shafer, 1996, Chapter 3), but the formulae for doing so will be more complicated than the ones for precise Markov chains given by Eqs. (4) and (5).

² These local transition probability mass functions themselves depend on the situation $x_{1:k}$ they are attached to, but the sets $\mathcal{Q}_k(\cdot|x_k)$ they are chosen from only depend on the last state $x_k$.

If we repeat this for every other choice of the $m_1$ in $\mathcal{M}_1$ and the $q(\cdot|x_{1:k})$ in $\mathcal{Q}_k(\cdot|x_k)$, we end up with an infinity of compatible probability trees, whose associated (conditional) expectations and probability mass functions turn out to constitute closed convex sets. We denote the corresponding upper expectation functionals on $\mathcal{L}(\mathcal{X}^N)$ by $\overline{E}(\cdot|x_{1:n})$ and $\overline{E}$. These upper expectations, and the conjugate lower expectations, are the final aim of our sensitivity analysis.

The procedure we have just described is computationally very complex. When the closed convex sets $\mathcal{M}_1$ and $\mathcal{Q}_k(\cdot|x)$ each have a finite number of extreme points (are polytopes), we can limit ourselves to working with these sets of extreme points, rather than with the infinite sets themselves. But even then, the computational complexity of this approach will generally be exponential in the number of time steps.

However, we prove in Section 3 that the upper expectations $\overline{E}$ and $\overline{E}(\cdot|x_{1:n})$ associated with the closed convex sets of (conditional) probability mass functions for the compatible probability trees of an imprecise Markov chain can be calculated in the same way as the expectations $E$ and $E(\cdot|x_{1:n})$ in a precise one: using counterparts of the backwards recursion formulae (4)–(6). Because of this, making inferences about the mass function of the state at time $n$, i.e., finding the upper envelope $\overline{E}_n$ of the $E_n$ given in Eq. (6), now has a complexity that is linear, rather than exponential, in the number of time steps $n$. This is our first contribution; the sketch below illustrates the contrast on a small example.

Our second contribution in this paper is a Perron–Frobenius Theorem for a special class of so-called regular stationary imprecise Markov chains: in Section 4 we prove a generalisation of Theorem 1, which tells us that under the fairly weak condition of regularity, the upper expectation operators $\overline{E}_n$ converge to limits that do not depend on the initial upper expectation operators $\overline{E}_1$.
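To make the complexity contrast tangible, here is a small Python sketch, for a hypothetical two-state stationary model whose credal sets are each generated by two extreme points. It computes the upper probability of a state at time $n$ once by enumerating the extreme-point choices over all compatible trees, and once by the backwards recursion of Section 3; the two values agree, but the first method scales exponentially in $n$ while the second is linear.

```python
import numpy as np
from itertools import product

# Hypothetical two-state stationary imprecise chain; the local credal sets
# are the convex hulls of two extreme transition matrices (rows: q(.|x)).
P = [np.array([[0.3, 0.7], [0.6, 0.4]]),
     np.array([[0.4, 0.6], [0.8, 0.2]])]
M1 = [np.array([0.5, 0.5]), np.array([0.7, 0.3])]  # extreme initial pmfs
h = np.array([1.0, 0.0])     # indicator of state a, so E(h) = P(X(n) = a)
n = 4                        # time at which we inspect the state

# Brute force: choose one extreme point in EVERY non-terminal situation
# x_{1:k} (the compatible tree need not be a Markov chain), then take the
# upper envelope. The number of choices grows exponentially with n.
sits = [s for k in range(1, n) for s in product((0, 1), repeat=k)]
upper = -np.inf
for m1, picks in product(M1, product((0, 1), repeat=len(sits))):
    c = dict(zip(sits, picks))
    def value(hist):         # expectation of h(X(n)) in the subtree at hist
        if len(hist) == n:
            return h[hist[-1]]
        Pc = P[c[hist]]
        return sum(Pc[hist[-1], y] * value(hist + (y,)) for y in (0, 1))
    upper = max(upper, sum(m1[x] * value((x,)) for x in (0, 1)))

# Backwards recursion with the upper transition operator: linear in n,
# because applying it is just an elementwise max over the extreme points.
g = h.copy()
for _ in range(n - 1):
    g = np.maximum(P[0] @ g, P[1] @ g)
upper_rec = max(float(m1 @ g) for m1 in M1)
print(upper, upper_rec)      # both give the same upper probability
```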
3 Sensitivity analysis of imprecise Markov chains

We are now ready to take our most important step: the backwards recursion formulae for the conditional and joint upper expectations in an imprecise Markov chain. We first define upper transition operators $\overline{T}_n$ and $\overline{\mathbb{T}}_n$. The operator $\overline{T}_n \colon \mathcal{L}(\mathcal{X}) \to \mathcal{L}(\mathcal{X})$ is defined by
\[
\overline{T}_n h(x_n) := \overline{E}_n(h|x_n) \tag{10}
\]
for all real-valued maps $h$ on $\mathcal{X}$, and all $x_n$ in $\mathcal{X}$. In other words, $\overline{T}_n h$ is the real-valued map on $\mathcal{X}$ whose value $\overline{T}_n h(x_n)$ in $x_n \in \mathcal{X}$ is the conditional upper expectation of the random variable $h(X(n+1))$, given that the system is in state $x_n$ at time $n$. More generally, we also consider the maps $\overline{\mathbb{T}}_n$ from $\mathcal{L}(\mathcal{X}^{n+1})$ to $\mathcal{L}(\mathcal{X}^n)$, defined by
\[
\overline{\mathbb{T}}_n f(x_{1:n}) := \overline{T}_n f(x_{1:n}, \cdot)(x_n) = \overline{E}_n(f(x_{1:n}, \cdot)|x_n) \tag{11}
\]
for all $x_{1:n}$ in $\mathcal{X}^n$ and all real-valued maps $f$ on $\mathcal{X}^{n+1}$. Of course, we can also consider lower expectations and lower transition operators, which are related to the upper expectations and upper transition operators by conjugacy.

The upper expectations $\overline{E}(\cdot|x_{1:n})$ and $\overline{E}$ on $\mathcal{L}(\mathcal{X}^N)$ can be calculated very easily by backwards recursion.

Theorem 2 (Concatenation Formula). For any $x_{1:n}$ in $\mathcal{X}^n$, $n = 1, \dots, N-1$, and for any real-valued map $f$ on $\mathcal{X}^N$:
\[
\overline{E}(f|x_{1:n}) = \overline{\mathbb{T}}_n \overline{\mathbb{T}}_{n+1} \cdots \overline{\mathbb{T}}_{N-1} f(x_{1:n}) \tag{12}
\]
\[
\overline{E}(f) = \overline{E}_1(\overline{\mathbb{T}}_1 \overline{\mathbb{T}}_2 \cdots \overline{\mathbb{T}}_{N-1} f). \tag{13}
\]

Call, for any non-empty subset $I$ of $\{1, \dots, N\}$, a real-valued map $f$ on $\mathcal{X}^N$ $I$-measurable if $f(x_{1:N}) = f(z_{1:N})$ for all $x_{1:N}$ and $z_{1:N}$ in $\mathcal{X}^N$ such that $x_k = z_k$ for all $k \in I$. In other words, an $I$-measurable $f$ only depends on the states $X(k)$ at times $k \in I$. As an example, an $\{n\}$-measurable map $h$ only depends on the state $X(n)$ at time $n$, and we identify it with a map on $\mathcal{X}$ (but remember that it acts on states at time $n$). The following proposition tells us that all upper conditional expectations satisfy a Markov condition.

Proposition 3 (Markov Condition). Consider an imprecise Markov chain with finite state set $\mathcal{X}$ and time horizon $N$. Fix $n \in \{1, \dots, N-1\}$. Let $x_{1:n-1}$ and $z_{1:n-1}$ be arbitrary elements of $\mathcal{X}^{n-1}$, and let $x_n \in \mathcal{X}$. Let $f$ be any $\{n, n+1, \dots, N\}$-measurable real-valued map on $\mathcal{X}^N$. Then it holds that $\overline{E}(f|x_{1:n-1}, x_n) = \overline{E}(f|z_{1:n-1}, x_n)$, and therefore we may write $\overline{E}(f|x_{1:n-1}, x_n) = \overline{E}_{|n}(f|x_n)$. The index '$|n$' is meant to make clear that we are considering an expectation conditional on $X(n) = x_n$.

If we apply the joint upper expectation $\overline{E}$ to maps $h$ that only depend on the state $X(n)$ at time $n$, we get the marginal upper expectation $\overline{E}_n(h) := \overline{E}(h)$, which is a model for the uncertainty about the state $X(n)$ at time $n$. More generally, taking into account Proposition 3, we use the notation $\overline{E}_{n|\ell}(h|x_\ell) := \overline{E}_{|\ell}(h|x_\ell)$ for the upper expectation of $h(X(n))$, conditional on $X(\ell) = x_\ell$, with $1 \leq \ell < n$. With the notations established in Eq. (7), $\overline{E}_{n+1|n}(h|x_n) = \overline{E}_n(h|x_n) = \overline{T}_n h(x_n)$. Such expectations can be found using simpler recursion formulae than Eqs. (12) and (13), as they are based on the simpler upper transition operators $\overline{T}_k$.

Corollary 4. For any real-valued map $h$ on $\mathcal{X}$, and for any $1 \leq \ell < n \leq N$ and all $x_\ell$ in $\mathcal{X}$:
\[
\overline{E}_{n|\ell}(h|x_\ell) = \overline{T}_\ell \overline{T}_{\ell+1} \cdots \overline{T}_{n-1} h(x_\ell), \quad \text{and in particular} \quad \overline{E}_n(h) = \overline{E}_1(\overline{T}_1 \overline{T}_2 \cdots \overline{T}_{n-1} h). \tag{14}
\]

This offers a reason for formulating our theory in terms of real-valued maps rather than events: suppose we want to calculate the upper probability $\overline{E}_n(A)$ that the state $X(n)$ at time $n$ belongs to the set $A$. According to Eq. (14), $\overline{E}_n(A) = \overline{E}_1(\overline{T}_1 \cdots \overline{T}_{n-1} I_A)$, and even though $\overline{T}_{n-1} I_A$ can still be calculated using upper probabilities only, it will generally assume values other than 0 and 1, and therefore will not be the indicator of some event. Already after one step, i.e., in order to calculate $\overline{T}_{n-2} \overline{T}_{n-1} I_A$, we need to leave the ambit of events and turn to the more general real-valued maps, even if we only want to calculate upper probabilities after $n$ steps. For joint upper and lower probability mass functions, however, we can remain within the ambit of events:

Proposition 5 (Chapman–Kolmogorov Equations). For an imprecise Markov chain, we have for all $1 \leq n < m \leq N$ and all $(x_n, x_{n+1:m}) \in \mathcal{X}^{m-n+1}$ that
\[
\overline{E}_{|n}(\{x_{n+1:m}\}|x_n) = \prod_{k=n}^{m-1} \overline{T}_k I_{\{x_{k+1}\}}(x_k), \tag{15}
\]
and for all $1 \leq m \leq N$ and all $x_{1:m} \in \mathcal{X}^m$ that
\[
\overline{E}(\{x_{1:m}\}) = \overline{E}_1(\{x_1\}) \prod_{k=1}^{m-1} \overline{T}_k I_{\{x_{k+1}\}}(x_k). \tag{16}
\]
Analogous expressions hold for the lower expectations.
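The product form of Proposition 5 is straightforward to evaluate. A minimal sketch of Eq. (16), under the same hypothetical two-extreme-point model used earlier, computing the upper probability of a specific path:

```python
import numpy as np

# Sketch of the Chapman-Kolmogorov product (16): the joint upper probability
# of a path is the upper probability of the initial state times one-step
# upper transition probabilities. Hypothetical two-state stationary model
# whose local credal sets have extreme transition matrices P[0] and P[1].
P = [np.array([[0.3, 0.7], [0.6, 0.4]]),
     np.array([[0.4, 0.6], [0.8, 0.2]])]

def upper_T(g):                        # upper transition operator on maps g
    return np.maximum(P[0] @ g, P[1] @ g)

def upper_prob_path(path, upper_E1):
    """Eq. (16): upper probability of observing the path (x_1, ..., x_m)."""
    p = upper_E1[path[0]]              # upper probability of the initial state
    for x, x_next in zip(path, path[1:]):
        p *= upper_T(np.eye(2)[x_next])[x]  # T_k applied to the indicator of x_{k+1}
    return p

upper_E1 = np.array([0.7, 0.5])        # hypothetical upper initial probabilities
print(upper_prob_path((0, 1, 1, 0), upper_E1))
```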
4 Convergence for imprecise Markov chains

Let us now consider a stationary imprecise Markov chain with infinite horizon, meaning that $\overline{T}_1 = \overline{T}_2 = \cdots = \overline{T}_n = \cdots =: \overline{T}$. Analogous to the precise case, we define the regularity condition for the upper transition operator $\overline{T}$. It will turn out to be a sufficient condition for convergence.

Definition 6 (Regularity for upper transition operators). We call an upper transition operator $\overline{T}$ regular if there is some $n \in \mathbb{N}$ such that $\min \overline{T}^n I_{\{y\}} > 0$ for all $y$ in $\mathcal{X}$. We call an upper expectation $\overline{E}_\infty$ on $\mathcal{L}(\mathcal{X})$ $\overline{T}$-invariant whenever $\overline{E}_\infty \circ \overline{T} = \overline{E}_\infty$, i.e., if $\overline{E}_\infty(\overline{T} h) = \overline{E}_\infty(h)$ for all $h \in \mathcal{L}(\mathcal{X})$.

With this definition, we can formulate the upper expectation form of the Perron–Frobenius theorem.

Theorem 7 (Perron–Frobenius Theorem, Upper Expectation Form). Consider a stationary imprecise Markov chain with finite state set $\mathcal{X}$ and an upper transition operator $\overline{T}$ that is regular. Then for every initial upper expectation $\overline{E}_1$, the upper expectation $\overline{E}_n = \overline{E}_1 \circ \overline{T}^{n-1}$ for the state at time $n$ converges point-wise to the same upper expectation $\overline{E}_\infty$:
\[
\lim_{n \to \infty} \overline{E}_n(h) = \lim_{n \to \infty} \overline{E}_1(\overline{T}^{n-1} h) =: \overline{E}_\infty(h)
\]
for all $h$ in $\mathcal{L}(\mathcal{X})$. Moreover, the limit upper expectation $\overline{E}_\infty$ is the only $\overline{T}$-invariant upper expectation on $\mathcal{L}(\mathcal{X})$.

The classical Perron–Frobenius Theorem (Theorem 1) is of course a special case of our Theorem 7. Škulj (2007) uses a different, credal set approach to prove a similar (but much weaker) result for imprecise Markov chains with regular lower transition operators $\underline{T}$. He also proves a convergence result for conservative (too large) approximations of the $\overline{E}_n$, in the special case that $\overline{T}$ is regular and 2-alternating; see Section 5.3 for further details.

5 Examples

In this section, we indicate how the above theory can be applied in a number of practical situations, where the upper expectations are of some special types described in the literature on imprecise probabilities. We present concrete and explicit examples, as well as a number of simulations.

5.1 Contamination models

Suppose we consider a precise stationary Markov chain, with transition operator $T$. We contaminate it with a vacuous model, i.e., we take a convex mixture with the vacuous upper transition operator $I_{\mathcal{X}} \max$. This leads to the upper transition operator $\overline{T}$, defined by
\[
\overline{T} h = (1 - \varepsilon)\, T h + \varepsilon\, I_{\mathcal{X}} \max h, \tag{17}
\]
for all $h \in \mathcal{L}(\mathcal{X})$, where $\varepsilon$ is some constant in the open real interval $(0, 1)$. The underlying idea is that we consider a specific convex neighbourhood of $T$. Since for all $x$ in $\mathcal{X}$,
\[
\min \overline{T} I_{\{x\}} = (1 - \varepsilon) \min T I_{\{x\}} + \varepsilon > 0,
\]
this upper transition operator is always regular, regardless of whether $T$ is! We infer from Theorem 7 that, whatever the initial upper expectation operator $\overline{E}_1$ is, the upper expectation operator $\overline{E}_n$ for the state $X(n)$ will always converge to the same $\overline{E}_\infty$.
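In code, the contaminated operator (17) is one line, and the regularity bound $\min \overline{T} I_{\{x\}} \geq \varepsilon$ can be checked directly. A sketch, using the transition matrix and $\varepsilon = 0.1$ of Example 5.3 below:

```python
import numpy as np

# The contamination model (17): (1 - eps) T h + eps (max h) on every state.
# T and eps are taken from Example 5.3 below.
T = np.array([[0.15, 0.85],
              [0.85, 0.15]])
eps = 0.1

def upper_T(h):   # the contaminated upper transition operator
    return (1 - eps) * (T @ h) + eps * np.max(h) * np.ones_like(h)

# Regularity is immediate with n = 1: applying the operator to any indicator
# gives values of at least (1 - eps) * min(T I_x) + eps >= eps > 0.
for x in range(2):
    print(np.min(upper_T(np.eye(2)[x])))   # both values are at least 0.1
```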
What is this $\overline{E}_\infty$ for given $T$ and $\varepsilon$? For any $n \geq 1$,
\[
\overline{T}^n h = (1 - \varepsilon)^n T^n h + \varepsilon\, I_{\mathcal{X}} \sum_{k=0}^{n-1} (1 - \varepsilon)^k \max T^k h,
\]
and therefore
\[
\overline{E}_{n+1}(h) = (1 - \varepsilon)^n \overline{E}_1(T^n h) + \varepsilon \sum_{k=0}^{n-1} (1 - \varepsilon)^k \max T^k h. \tag{18}
\]
If we now let $n \to \infty$, we see that the limit is indeed independent of the initial upper expectation $\overline{E}_1$:
\[
\overline{E}_\infty(h) = \varepsilon \sum_{k=0}^{\infty} (1 - \varepsilon)^k \max T^k h. \tag{19}
\]

Example 5.1 (Contaminating a cycle). Consider for instance $\mathcal{X} = \{a, b\}$, and let the precise Markov chain be the cycle with period 2, with transition operator $T$ given by $T h(a) = h(b)$ and $T h(b) = h(a)$. Then $T^{2n} h = h$ and $T^{2n+1} h = T h$, and therefore $\max T^{2n} h = \max T^{2n+1} h = \max h$, whence $\overline{E}_\infty(h) = \max h$.

Example 5.2 (Contaminating a random walk). Consider a random walk, where $\mathcal{X} = \{a, b\}$ and $T h = I_{\mathcal{X}} \frac{h(a) + h(b)}{2}$. Then we find that
\[
\overline{E}_\infty(h) = \varepsilon \max h + (1 - \varepsilon) \frac{h(a) + h(b)}{2}.
\]

Example 5.3 (Another contamination model). To illustrate the convergence properties of an imprecise Markov chain, let us look at a simple numerical example. Again consider $\mathcal{X} = \{a, b\}$ and let the stationary imprecise Markov chain be defined by an initial credal set
\[
\mathcal{M}_1 = \{m \in \Sigma_{\{a,b\}} : 0.6 \leq m(a) \leq 0.9\}
\]
and a contamination model of the type (17), with $\varepsilon = 0.1$, and for which the precise transition operator $T$ is defined by the Markov matrix
\[
T := \begin{pmatrix} q(a|a) & q(b|a) \\ q(a|b) & q(b|b) \end{pmatrix} = \begin{pmatrix} 0.15 & 0.85 \\ 0.85 & 0.15 \end{pmatrix}.
\]
In Figure 4 we plot the evolution of $\overline{E}_n(\{a\})$ and $\underline{E}_n(\{a\})$, the upper and lower probability of finding the system in state $a$ at time $n$, which can be calculated efficiently using Eq. (18). For comparison, we also plot the evolution of $E_n(\{a\})$, the probability of finding the system in state $a$ at time $n$, for a (precise) Markov chain defined by probability mass functions that lie on the boundaries of the credal sets defining the above imprecise Markov chain; to wit, its initial mass function is given by $\begin{pmatrix} m_1(a) & m_1(b) \end{pmatrix} = \begin{pmatrix} 0.9 & 0.1 \end{pmatrix}$, and its Markov matrix is
\[
\begin{pmatrix} 0.135 & 0.865 \\ 0.865 & 0.135 \end{pmatrix}.
\]
Here $E_\infty(\{a\}) = E_\infty(\{b\}) = 0.5$.

Figure 4: The time evolution of (i) the upper and lower probability of finding the imprecise Markov chain of Example 5.3 in the state $a$ (outer plot marks and connecting lines); and of (ii) the probability of finding the classical Markov chain of Example 5.3 in the state $a$ (inner plot marks and connecting lines). The filled area denotes the hull of the evolution of this probability, under the contamination model of Example 5.3, for all possible initial mass functions.

5.2 Belief function models

The contamination models we have just described are a special case of a more general and quite interesting class of models, based on Shafer's (1976) notion of a belief function. We can consider a number of subsets $F_j$, $j = 1, \dots, n$ of $\mathcal{X}$, and a convex mixture of the vacuous upper expectations relative to these subsets:
\[
\overline{E}(h) = \sum_{j=1}^{n} m(F_j) \max_{x \in F_j} h(x), \tag{20}
\]
with $m(F_j) \geq 0$ and $\sum_{j=1}^{n} m(F_j) = 1$. In Shafer's terminology, the sets $F_j$ are called focal elements, and the $m(F_j)$ constitute the basic probability assignment. We can now consider imprecise Markov chains where the local models, attached to the non-terminal situations in the tree, are of this type. The general backwards recursion formulae we have given in Section 3 can then be used in combination with the simple formulae of the type (20) for an efficient calculation of all conditional and joint upper and lower expectations in the tree. We leave this implicit, however, and move on to another example, which is rather more popular in the literature.
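Before doing so, a minimal sketch of Eq. (20): the upper expectation is a mass-weighted maximum over the focal elements. The focal elements and basic probability assignment below are hypothetical.

```python
# Belief-function upper expectation, Eq. (20), with hypothetical focal
# elements F_j and basic probability assignment m(F_j).
focal = [({'a'}, 0.5), ({'a', 'b'}, 0.3), ({'a', 'b', 'c'}, 0.2)]

def upper_E(h):                # h: dict mapping states to reals
    return sum(m * max(h[x] for x in F) for F, m in focal)

h = {'a': -1.0, 'b': 1.0, 'c': 2.0}
print(upper_E(h))              # 0.5*(-1) + 0.3*1 + 0.2*2 = 0.2
```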
5.3 Models with lower and upper mass functions

An intuitive way to introduce imprecise Markov chains (Kozine and Utkin, 2002; Campos et al., 2003; Škulj, 2006) goes by way of so-called probability intervals, studied in a paper by De Campos et al. (1994); see also Walley (1991, Section 4.6.1). It consists in specifying lower and upper bounds for mass functions. Let us explain how this is done in the specific context of Markov chains.

For the initial mass function $m_1$, we specify a lower bound $\underline{m}_1 \colon \mathcal{X} \to \mathbb{R}$, also called a lower mass function, and an upper bound $\overline{m}_1 \colon \mathcal{X} \to \mathbb{R}$, called an upper mass function. The credal set $\mathcal{M}_1$ attached to the initial situation, which corresponds to these bounds, is then given by
\[
\mathcal{M}_1 := \{m \in \Sigma_{\mathcal{X}} : (\forall x \in \mathcal{X})(\underline{m}_1(x) \leq m(x) \leq \overline{m}_1(x))\}.
\]
Similarly, in each non-terminal situation $x_{1:k} \in \mathcal{X}^k$, $k = 1, \dots, N-1$, we have a credal set $\mathcal{Q}_k(\cdot|x_k)$ that is defined in terms of conditional lower and upper mass functions $\underline{q}_k(\cdot|x_k)$ and $\overline{q}_k(\cdot|x_k)$. Here, for instance, $\underline{q}_k(x_{k+1}|x_k)$ gives a lower bound on the transition probability $q_k(x_{k+1}|x_k)$ to go from state $X(k) = x_k$ to state $X(k+1) = x_{k+1}$ at time $k$. Under some consistency conditions--see (De Campos et al., 1994) for more details--the upper expectation associated with $\mathcal{M}_1$ is then given on all subsets $A$ of $\mathcal{X}$ by
\[
\overline{E}_1(A) = \min\Big\{\sum_{z \in A} \overline{m}_1(z),\ 1 - \sum_{z \in \mathcal{X} \setminus A} \underline{m}_1(z)\Big\}.
\]
This $\overline{E}_1$ is 2-alternating: $\overline{E}_1(A \cup B) + \overline{E}_1(A \cap B) \leq \overline{E}_1(A) + \overline{E}_1(B)$ for all subsets $A$ and $B$ of $\mathcal{X}$. This implies (Walley, 1991, Section 3.2.4) that for all $h \in \mathcal{L}(\mathcal{X})$ the upper expectation $\overline{E}_1(h)$ can be found by Choquet integration:
\[
\overline{E}_1(h) = \min h + \int_{\min h}^{\max h} \overline{E}_1(\{z \in \mathcal{X} : h(z) \geq \alpha\})\, \mathrm{d}\alpha, \tag{21}
\]
where the integral is a Riemann integral. Similar considerations for the 2-alternating $\overline{E}_k(\cdot|x_k)$ lead to formulae for the upper transition operators $\overline{T}_k$: for all $x_k$ in $\mathcal{X}$,
\[
\overline{T}_k I_A(x_k) = \min\Big\{\sum_{z \in A} \overline{q}_k(z|x_k),\ 1 - \sum_{z \in \mathcal{X} \setminus A} \underline{q}_k(z|x_k)\Big\} \tag{22}
\]
\[
\overline{T}_k h(x_k) = \min h + \int_{\min h}^{\max h} \overline{T}_k I_{\{z \in \mathcal{X} : h(z) \geq \alpha\}}(x_k)\, \mathrm{d}\alpha. \tag{23}
\]

Example 5.4 (Close to a cycle). Consider a three-state stationary imprecise Markov model with $\mathcal{X} = \{a, b, c\}$ and with marginal and transition probabilities given by probability intervals. It follows from Eqs. (22) and (23) that the upper transition operator $\overline{T}$ is fully determined by the lower and upper Markov matrices:
\[
\begin{pmatrix} \underline{q}(a|a) & \underline{q}(b|a) & \underline{q}(c|a) \\ \underline{q}(a|b) & \underline{q}(b|b) & \underline{q}(c|b) \\ \underline{q}(a|c) & \underline{q}(b|c) & \underline{q}(c|c) \end{pmatrix} = \frac{1}{200} \begin{pmatrix} 9 & 9 & 162 \\ 144 & 18 & 18 \\ 9 & 162 & 9 \end{pmatrix},
\]
\[
\begin{pmatrix} \overline{q}(a|a) & \overline{q}(b|a) & \overline{q}(c|a) \\ \overline{q}(a|b) & \overline{q}(b|b) & \overline{q}(c|b) \\ \overline{q}(a|c) & \overline{q}(b|c) & \overline{q}(c|c) \end{pmatrix} = \frac{1}{200} \begin{pmatrix} 19 & 19 & 172 \\ 154 & 28 & 28 \\ 19 & 172 & 19 \end{pmatrix},
\]
where the numerical values are particular to this example. Similarly, the initial upper expectation $\overline{E}_1$ is completely determined by the row matrices
\[
\underline{M}_1 := \begin{pmatrix} \underline{m}_1(a) & \underline{m}_1(b) & \underline{m}_1(c) \end{pmatrix} \quad \text{and} \quad \overline{M}_1 := \begin{pmatrix} \overline{m}_1(a) & \overline{m}_1(b) & \overline{m}_1(c) \end{pmatrix}.
\]
In Figure 5, we plot conservative approximations of the credal sets $\mathcal{M}_n$ corresponding to the upper expectation operators $\overline{E}_n$. Each approximation is based on the constraints that can be found by calculating $\overline{E}_1(\overline{T}^{n-1} I_{\{x\}})$ and $\underline{E}_1(\underline{T}^{n-1} I_{\{x\}})$ using the backwards recursion method, for $x = a, b, c$. The $\mathcal{M}_n$ evolve clockwise through the simplex, which is not all that surprising, as the lower and upper Markov matrices are quite 'close' to the precise cyclic Markov matrix
\[
\begin{pmatrix} q(a|a) & q(b|a) & q(c|a) \\ q(a|b) & q(b|b) & q(c|b) \\ q(a|c) & q(b|c) & q(c|c) \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}.
\]
After a while, the $\mathcal{M}_n$ converge to a limit that is independent of the initial credal set $\mathcal{M}_1$, as can be predicted from the regularity of the upper transition operator.
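On a finite state space, the upper probabilities (22) and the Choquet integral (23) reduce to finite sums over the distinct values of $h$. The following sketch implements both for a single row of the interval model; the interval bounds used are those of row $a$ in Example 5.4, and the function names are merely illustrative.

```python
import numpy as np

# Eqs. (22)-(23) for one state x: upper probabilities from probability
# intervals, and the Choquet integral of a map h. The bounds below are
# row a of the lower and upper Markov matrices of Example 5.4.
q_low = np.array([0.045, 0.045, 0.81])
q_up  = np.array([0.095, 0.095, 0.86])

def upper_P(A):
    """Eq. (22): upper probability of a set A (boolean mask over states)."""
    return min(q_up[A].sum(), 1.0 - q_low[~A].sum())

def choquet(h):
    """Eq. (23): the Riemann integral collapses to a sum over level sets."""
    values = np.unique(h)[::-1]          # distinct values of h, descending
    result = values[-1]                  # min h
    for v, v_next in zip(values, values[1:]):
        result += (v - v_next) * upper_P(h >= v)
    return result

h = np.array([1.0, 0.0, 2.0])
print(choquet(h))   # 1.815 = 0.86*2 + 0.095*1 + 0.045*0 for this row
```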
Figure 5: Evolution in the simplex $\Sigma_{\{a,b,c\}}$ of the credal sets $\mathcal{M}_n$ for the near-cyclic transition operator from Example 5.4, for three different choices of the initial credal set $\mathcal{M}_1$; the panels show $n = 1, \dots, 6$, $n = 11$, $n = 22$ and $n = 1000$.

6 Conclusions

Regularity seems to be a reasonably weak condition on the upper transition operator $\overline{T}$ of a stationary imprecise Markov chain, but we have seen that it is strong enough to guarantee that the upper expectation for the state at time $n$ converges to a unique $\overline{T}$-invariant upper expectation $\overline{E}_\infty$, regardless of the initial upper expectation $\overline{E}_1$.

Even when the regularity condition is not satisfied, it is not so hard to see that any upper transition operator $\overline{T}$ is still non-expansive with respect to the supremum norm, given for every $h \in \mathcal{L}(\mathcal{X})$ by $\|h\|_\infty := \max |h|$:
\[
\|\overline{T} g - \overline{T} h\|_\infty \leq \|g - h\|_\infty.
\]
Moreover, the sequence $\overline{T}^n h$ is bounded, because $\|\overline{T}^n h\|_\infty \leq \|h\|_\infty$. It then follows from non-linear Perron–Frobenius theory (Sine, 1990; Nussbaum et al., 1998) that the sequence $\overline{T}^n h$ has a periodic limit cycle. More precisely, there is an $h^\ast \in \mathcal{L}(\mathcal{X})$ such that $\overline{T}^{\,p} h^\ast = h^\ast$, i.e., $h^\ast$ is a periodic point of $\overline{T}$ with (smallest) period $p$, and such that $\overline{T}^{\,np} h \to h^\ast$ (point-wise) as $n \to \infty$. It would be a very interesting topic for further research to study the nature of the periods and periodic points of upper transition operators.

In our discussions, for instance in Section 3, we have consistently used the sensitivity analysis interpretation of imprecise probability models such as upper expectations. Upper and lower expectations can also be given another, so-called behavioural interpretation, in terms of a subject's dispositions towards accepting risky transactions. This is, for instance, Walley's (1991) preferred approach. The results we have derived here remain valid under that alternative interpretation, and the concatenation formulae (12) and (13) can then be shown to be special cases of the so-called marginal extension procedure (Miranda and De Cooman, 2007), which provides the most conservative coherent (i.e., rational) inferences from the local predictive models $\overline{T}_k$ to general lower and upper expectations. In another paper (De Cooman and Hermans, 2008), we give more details on how to approach a theory of stochastic processes using imprecise probabilities under a behavioural interpretation.
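To see such a periodic limit cycle concretely: the precise two-state cycle of Example 5.1 is itself a (non-regular) upper transition operator, since precise operators are special cases of upper ones, and iterating it on any non-constant $h$ oscillates with period 2 instead of converging. A minimal sketch:

```python
import numpy as np

# The (non-regular) two-state cycle of Example 5.1: Th(a) = h(b), Th(b) = h(a).
# Iterating it yields a periodic limit cycle with period 2, not a fixed limit.
T = np.array([[0.0, 1.0],
              [1.0, 0.0]])
g = np.array([1.0, -2.0])     # any non-constant h is already a periodic point
for n in range(5):
    print(n, g)               # alternates between [1, -2] and [-2, 1]
    g = T @ g
```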
References

M. A. Campos, G. P. Dimuro, A. C. da Rocha Costa, and V. Kreinovich. Computing 2-step predictions for interval-valued finite stationary Markov chains. Technical Report UTEP-CS-03-20a, University of Texas at El Paso, 2003.

L. M. de Campos, J. F. Huete, and S. Moral. Probability intervals: a tool for uncertain reasoning. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2:167–196, 1994.

Gert de Cooman and Filip Hermans. Imprecise probability trees: Bridging two theories of imprecise probability. Artificial Intelligence, 2008. doi: 10.1016/j.artint.2008.03.001. In press.

Gert de Cooman and Enrique Miranda. Symmetry of models versus models of symmetry. In W. L. Harper and G. R. Wheeler, editors, Probability and Inference: Essays in Honor of Henry E. Kyburg, Jr., pages 67–149. King's College Publications, 2007.

D. Harmanec. Generalizing Markov decision processes to imprecise probabilities. Journal of Statistical Planning and Inference, 105:199–213, 2002.

H. Itoh and K. Nakamura. Partially observable Markov decision processes with imprecise parameters. Artificial Intelligence, 171:453–490, 2007.

John G. Kemeny and J. Laurie Snell. Finite Markov Chains. Undergraduate Texts in Mathematics. Springer-Verlag, New York, 1976. Reprint of the 1960 edition.

Igor O. Kozine and Lev V. Utkin. Interval-valued finite Markov chains. Reliable Computing, 8(2):97–113, 2002. doi: 10.1023/A:1014745904458.

David G. Luenberger. Introduction to Dynamic Systems. Theory, Models and Applications. John Wiley & Sons, New York, 1979.

Enrique Miranda and Gert de Cooman. Marginal extension in the theory of coherent lower previsions. International Journal of Approximate Reasoning, 46(1):188–225, 2007. doi: 10.1016/j.ijar.2006.12.009.

A. Nilim and L. El Ghaoui. Robust control of Markov decision processes with uncertain transition matrices. Operations Research, 53:780–798, 2005.

R. D. Nussbaum, M. Scheutzow, and S. M. Verduyn Lunel. Periodic points of nonexpansive maps and nonlinear generalizations of the Perron–Frobenius theory. Selecta Mathematica, 4:141–181, 1998.

J. K. Satia and R. E. Lave. Markovian decision processes with uncertain transition probabilities. Operations Research, 21:728–740, 1973.

G. Shafer. A Mathematical Theory of Evidence. Princeton University Press, Princeton, NJ, 1976.

G. Shafer. The Art of Causal Conjecture. The MIT Press, Cambridge, MA, 1996.

R. Sine. A nonlinear Perron–Frobenius theorem. Proceedings of the American Mathematical Society, 109(2):331–336, 1990.

D. Škulj. Finite discrete time Markov chains with interval probabilities. In J. Lawry, E. Miranda, A. Bugarin, S. Li, M. A. Gil, P. Grzegorzewski, and O. Hryniewicz, editors, Soft Methods for Integrated Uncertainty Modelling, pages 299–306. Springer, Berlin, 2006.

D. Škulj. Regular finite Markov chains with interval probabilities. In G. de Cooman, J. Vejnarová, and M. Zaffalon, editors, ISIPTA '07 – Proceedings of the Fifth International Symposium on Imprecise Probability: Theories and Applications, pages 405–413. SIPTA, 2007.

P. Walley. Statistical Reasoning with Imprecise Probabilities. Chapman and Hall, London, 1991.

P. Walley. Measures of uncertainty in expert systems. Artificial Intelligence, 83(1):1–58, 1996.

K. Weichselberger. Elementare Grundbegriffe einer allgemeineren Wahrscheinlichkeitsrechnung. I. Intervallwahrscheinlichkeit als umfassendes Konzept. Physica-Verlag, Heidelberg, 2001.

C. C. White and H. K. Eldeib. Markov decision processes with imprecise transition probabilities. Operations Research, 42:739–749, 1994.