Quantitative Stochastic Parity Games
Krishnendu Chatterjee
Abstract
We study perfect-information stochastic parity games. These are two-player nonterminating games which are played on a graph with turn-based probabilistic transitions. A play results in an infinite path and the conflicting goals of the two players are  -regular path properties, formalized as parity winning conditions. The qualitative solution of such a game amounts to computing the set of vertices from which a player has a strategy to win with probability 1 (or with positive probability). The quantitative solution amounts to computing the value of the game in every vertex, i.e., the highest probability with which a player can guarantee satisfaction of his own ob jective in a play that starts from the vertex. For the important special case of one-player stochastic parity games (parity Markov decision processes ) we give polynomial-time algorithms both for the qualitative and the quantitative solution. The running time of the qualitative solution is O(d · m3/2 ) for graphs with m edges and d priorities. The quantitative solution is based on a linearprogramming formulation. For the two-player case, we establish the existence of optimal pure memoryless strategies. This has several important ramifications. First, it implies that the values of the games are rational. This is in contrast to the concurrent stochastic parity games of de Alfaro et al.; there, values are in general algebraic numbers, optimal strategies do not exist, and -optimal strategies have to be mixed and with infinite memory. Second, the existence of optimal pure memoryless strategies together with the polynomial-time solution for one-player case implies that the quantitative two-player stochastic parity game problem is in NP  co-NP. This generalizes a result of Condon for stochastic games with reachability ob jectives. It also constitutes an exponential improvement over the best previous algorithm, which is based on a doubly exponential procedure of de Alfaro and Ma jumdar for concurrent stochastic parity games and provides only -approximations of the values.

Marcin Jurdzinski ´

Thomas A. Henzinger§

 This research was supp orted in part by the DARPA grant F33615-C-98-3614, the ONR grant N00014-02-1-0671, the NSF grants CCR-9988172 and CCR-0225610, the Polish KBN grant 4-7-T11C-042-25, and the EU RTN grant HPRN-CT-2002-00283.  EECS, UC Berkeley, email: c krish@eecs.berkeley.edu.  EECS, UC Berkeley, email: mju@eecs.berkeley.edu. § EECS, UC Berkeley, email: tah@eecs.berkeley.edu.

1 Intro duction Perfect-information stochastic games [5] are played on a graph (V , E ) by three players --Even, Odd, and Random-- who move a token from vertex to vertex so that an infinite path is formed. Given a partition (V , V , V ) of the set V of vertices, player Even moves if the token is in V , player Odd if the token is in V , and player Random if the token is in V . Player Random always moves by choosing a successor vertex uniformly at random. Thus his freedom of choice is limited and we refer to him as a "half player." Henceforth we refer to stochastic games as 2 1/2-player games, and to stochastic games with V =  or V =  as 1 1/2-player games, often called Markov decision processes [21, 13]. Stochastic games were introduced by Shapley [23] and have been studied extensively in several research communities; the book by Filar and Vrieze [13] provides a unified treatment of the theories of stochastic games and Markov decision processes [21]. The exact computational complexity of stochastic games is a fascinating open problem [22, 5]. In practice, the algorithms known so far are often prohibitively expensive. In their survey on the complexity of solving Markov decision processes, Littman et al. [18] call for investigating subclasses of the general model which admit efficient solution algorithms and are yet sufficiently expressive for modeling correctness or optimality ob jectives found in practice. Raghavan and Filar [22] list several classes of stochastic games whose structural properties bear a promise for efficient solution algorithms. They single out perfect-information stochastic games as one of the most promising, commenting that "despite its obvious appeal, this class has not been studied from an algorithmic point of view." Condon [5] has studied simple stochastic games and she has proved that they are log-space complete in the class of logarithmic-space randomized alternating Turing machines [5]. Condon's simple stochastic games can be seen as perfect-information [22, 13] (or "turn-based" [8]) recursive stochastic games [12] with 0-1 payoffs, or alternatively, as perfect-information stochastic games with reachability ob jectives [8]. We generalize the finitary reachability ob jectives of Condon's model by considering infinitary  -regular ob jectives [25], which are popular for the specification

Copyright © 2004 by the Association for Computing Machinery, Inc. and the Society for industrial and Applied Mathematics. All Rights reserved. Printed in The United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Association for Computing Machinery, 1515 Broadway, New York, NY 10036 and the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA 19104-2688

121


and verification of temporal properties of computational systems [14] and subsume reachability ob jectives as a very special case. On the other hand, our model of perfect-information stochastic parity games is a proper subclass of the general (or "concurrent") stochastic games [13, 19, 7, 9]. For brevity, throughout this paper we will omit the term "perfect information"; whenever we talk about stochastic parity games, we mean perfectinformation stochastic parity games. We will explicitly say "concurrent stochastic parity games" when referring to the more general model of de Alfaro, Henzinger, and Ma jumdar [7, 9]. An ob jective for a player in an infinite path-forming game is specified by the set of infinite paths that are winning for the player. A stochastic parity game is given by a game graph as above and a priority function p: V  N, which labels every vertex v  V with a priority p(v ). An infinite path is winning for the player Even if the smallest priority that occurs infinitely often is even; otherwise the path is winning for the player Odd. This way of specifying a winning ob jective is called the parity (Rabin chain) condition and it is a normal form for all  -regular conditions [25]. Two-player parity games have recently attracted considerable attention in the computer-aided verification community, because they can be used to solve model-checking and control problems for the µ-calculus [1] and its numerous modal and temporal sublogics [10, 25, 17, 7, 9, 14, 2]. Parity games are also a powerful theoretical tool for resolving difficult decidability and expressiveness questions in mathematical logic and automata theory [15, 10, 25, 28, 3, 1, 14]. The basic algorithmic problems for stochastic games are that of determining which of the players has a winning strategy, or more accurately, estimating the probability of winning that a player can ensure by following a certain strategy. Several qualitative winning criteria have been studied, such as almost-sure winning (i.e., winning with probability 1) or positive-probability winning [7, 4]. The quantitative solution [9] amounts to computing the value of a game, i.e., the highest probability of winning that a player can ensure against an arbitrary strategy of the opponent. In Section 2 we define stochastic parity games and the algorithmic problems that we study in this paper. In general, mixed strategies have to be considered when solving stochastic games [23], a common phenomenon in non-perfectinformation games [27]. A mixed strategy prescribes randomized moves, i.e., the player moves the token at random according to a probability distribution over the successor vertices. By contrast, a strategy is pure if it always prescribes deterministic moves, i.e., the player always moves the token deterministically to one of the

successor vertices. A strategy is memoryless if the moves it prescribes do not depend on the history of the play carried out so far, i.e., if it prescribes the same move at every visit to a vertex. In Section 3 we give a graph-theoretic characterization of the almost-sure and positive-probability winning sets in 1 1/2-player parity games, inspired by the work of Courcoubetis and Yannakakis [6]. From this characterization we derive the existence of pure memoryless optimal strategies for the qualitative and quantitative criteria in 1 1/2-player parity games, and we present efficient polynomial-time algorithms for solving both qualitative and quantitative 1 1/2-player parity games. Theorem 1.1. (Complexity of solving 1 1/2player parity games) Let m be the number of edges in the graph of a parity game, and let d be the number of different values assigned to vertices by the priority function. 1. Qualitative 1 1/2-player parity games can be solved in time O(d · m3/2 ). 2. Quantitative 1 1/2-player parity games can be solved in polynomial time by solving a linear program that can be constructed from the qualitative solution. Previously, an efficient O(m3/2 ) algorithm was known only for the qualitative solution of the special cases of 1 1/2-player reachability and Buchi games [4]. A poly¨ nomial-time algorithm for the qualitative solution of 1 1/2-player parity games can also be obtained by modifying a reduction [20, 16] to limiting-average (aka. meanpayoff [29]) Markov decision processes (MDP's) [21]. This approach, however, requires the solution of a linear program for limiting-average MDP's [21, 13], whereas we give a direct O(d · m3/2 ) algorithm which uses the solution for Buchi MDP's [4] as a subroutine. ¨ In Section 4 we use the existence of pure memoryless qualitatively optimal strategies in 1 1/2-player parity games to give a simplified proof of existence of pure memoryless qualitatively optimal strategies for 2 1/2-player parity games [4]. Then, in Section 5, we apply those results to prove the existence of pure memoryless optimal strategies for the quantitative 2 1/2-player games, which generalizes the result of Condon [5] for stochastic reachability games. Theorem 1.2. (Quantitative pure memoryless optimal strategies) Optimal pure memoryless strategies exist in quantitative 2 1/2-player parity games. This result is in sharp contrast with the concurrent parity games of de Alfaro et al. [8, 7, 9]. Even for qualitative concurrent games, in general only optimal strategies exist, and mixed strategies with

122


infinite memory are necessary. Theorems 1.1 and 1.2 yield an improvement of the computational complexity for solving quantitative 2 1/2-player parity games. Corollary 1.1. (Complexity of solving 2 1/2player parity games) The decision problems for qualitative and quantitative 2 1/2-player parity games are in NP  co-NP, and there is an exponential-time algorithm for computing the exact values. The values of quantitative 2 1/2-player parity games are rational. The latter result is again in contrast to concurrent parity games. Their values are in general algebraic numbers, and there are simple examples with irrational values [9]. The previously best complexity for the quantitative solution of 2 1/2-player parity games was based on a doubly exponential algorithm [9] for solving concurrent parity games; so our results yield an exponential improvement. Moreover, the algorithm of de Alfaro and Ma jumdar [9], which uses a decision procedure for the first-order theory of the reals with addition and multiplication [24], does not compute the exact values of a game but only -approximations. The existence of polynomial-time algorithms for non-stochastic parity games [11, 26], for limitingaverage games [22, 29], and for stochastic reachability games [5] are long-standing open questions. We believe that the existence of pure memoryless optimal strategies for stochastic parity games --i.e., Theorem 1.2-- makes it likely that algorithmic techniques that are developed for any of those classes of games can be carried over also to stochastic parity games. 2 Sto chastic Parity Games A stochastic game graph (2 1/2-player game graph ) G = (V , E , (V , V , V )) consists of a directed graph (V , E ) and a partition (V , V , V ) of the vertex set V . For technical convenience we assume that every vertex has at least one and at most two out-going edges. For a set U  V of vertices, we write G U for the subgraph of the graph (V , E ) induced by the set of vertices U . We say that G U is a subgame of the game G if the for all vertices u  U , we have: if u  V t hen (u, w)  E implies that w  U , and if u  V  V then there is an edge (u, w)  E , such that w  U . An infinite path in a game graph G is an infinite sequence v0 , v1 , v2 , . . . of vertices such that (vk , vk+1 )  E , for all k  N. We write  for the set of all infinite paths, and for every vertex s  V we write s for the set of all infinite paths starting from the vertex s. A nonterminating stochastic game G is a 2 1/2-player infiniteduration path-forming game played on the graph G. Player Even keeps moving a token along edges of the game graph from vertices in V , player Odd keeps mov-

ing the token from vertices in V , and player Random from vertices in V . Player Random always passes the token to one of its two successors with probability 1/2. Stochastic game graphs can be obviously generalized to non-binary graphs and the transition probabilities for the random vertices to arbitrary rational numbers. All results in this paper easily follow for those generalizations by minor modifications and hence for simplicity we stick to the basic simple model without loss of generality. Moreover, for all algorithmic problems we consider in this paper there is a straightforward translation of game graphs with n vertices and m edges into equivalent binary game graphs with O(m) vertices and O(m) edges. This yields our main results listed in Section 1 from the results for binary graphs in the following sections. The non-stochastic perfect-information 2-player game graphs are a special case of the 2 1/2-player parity game graphs with V = . The 1 1/2-player game graphs (Markov decision processes ) are a special case of the 2 1/2-player game graphs with V =  or V = . Strategies. For a countable set A, a probability distribuaion on the set A is a function  : A  [0, 1] such that t A  (a) = 1. We denote the set of probability distributions on the set A by D(A). A strategy for the player Even is a function  : V  V  D(V ), assigning a prob a obility distribution to every finite sequence v  V V f vertices, representing the history of the play so far. We say that the player Even uses (or fol lows ) the strategy  if in each move, given that the current history of the play is v, he chooses the next vertex according to the probability distribution  (v). A strategy must prescribe only available moves, i.e., for all w  V  , v  V , and u  V , if  (w · v )(u) = 0 then (v , u)  E . A strategy  is pure if for all w  V  and v  V , there is a vertex u  V such that  (w · v )(u) = 1. Strategies for player Odd are defined analogously. We write  and  for the sets of all strategies for players Even and Odd, respectively. A pure memoryless strategy is a pure strategy which does not depend on the whole history of the play but only on the current vertex. A pure memoryless strategy for player Even can be represented as a function  : V  V such that (v ,  (v ))  E , for all v  V . Analogously we define pure memoryless strategies for player Odd. Given a strategy    for the player Even, we write G for the game played as the game G with the constraint that the player Even uses the strategy  . Note that if  : V  V is a pure memoryless strategy then we can think of G as the subgraph of the game graph G obtained from G by removing the edges (v , w)  E , such that v  V and  (v ) = w. The definitions for

123


a strategy  for the player Odd are similar. Once a starting vertex s  V and strategies    and    for the two players are fixed, the play of the  game is a random path s , for which the probabilities of events are uniquely defined, where an event A  s is a measurable set of paths. For a vertex s  V and an event A  s , we write Pr, (A) for the probability s that a path belongs to A if the game starts from the vertex s, and the players use the strategies  and  , respectively.

a priority function. For brevity, we sometimes write G(p) to denote the game G(Even(p)). Similarly, a parity Markov decision process (1 1/2-player parity game) is a 1 1/2-player game with the ob jective of the player derived from a priority function p as above.

Value functions and optimal strategies. Consider a parity game G(p). We define the value functions Val G , Val  : V  [0, 1] for the players G Even and Odd, respectively, as follows: Val G (s) =  sup inf  Prs , (Even(p))(s), and Val  (s) = G , Winning ob jectives. A general way of specifying sup   inf   Prs (Odd(p))(s), for all vertices s  V . objectives of players in nonterminating stochastic games Sometimes, when the ob jective of a player is not clear is by providing the set of winning plays W   for from the context, or if it is different from the ob jecthe player. In this paper we study only zero-sum tive of the original parity game G(p) = G(Even(p)), games [22, 13] in which the ob jectives of the players are we make it explicit, e.g., Val G (Reach(T ))(s) stands for strictly competitive. In other words, it is implicit that sup ,   inf   Prs (Reach(T )s ). On the other hand, if the ob jective of a player is a set W then the ob jective if the game G is clear from the context, we sometimes of the other player is the set  \ W . Given a game write Val and Val  for the value functions Val G and graph G and an ob jective W  , we write G(W ) for Val  , respectively. G the game played on the graph G with the ob jective W for the player Even. Proposition 2.1. (Optimality conditions) For If n  N we write [n] for the set {0, 1, 2, . . . , n} and every v  V , if (v , w)  E then the fol lowing hold. [n]+ for the set {1, 2, . . . , n}. In this paper we consider 1. If v  V then Val G (v )  Val G (w). the following classes of winning ob jectives. Reachability. For a set T of target vertices, the reachability ob jective is defined as Reach(T ) = { v0 , v1 , v2 . . .   : vk  T , for some k }. 2. If v  V then Val G (v )  Val G (w). ( . G 1 3. If v  V then Val G (v ) = 2 v ,w)E Val (w )

Buchi. For a set T of target vertices, the Buchi (re- Similar conditions hold for the value function of the ¨ ¨ peated reachability ) ob jective is defined as Buchi(T ) = player Odd. ¨ { v0 , v1 , v2 . . .   : vk  T , for infinitely many k }. The following is the fundamental determinacy result for Parity. Let p be a function p : V  [d] assigning a stochastic parity games, due to de Alfaro and Ma jumpriority p(v ) to every vertex v  V , where d  N. dar [9]; it can be also derived from the determinacy For an infinite path  = v0 , v1 , . . .  , we define result of Martin [19] for the zero-sum stochastic games Inf ( ) = { v  V : vk = v for infinitely many k  0 }. with Borel measurable payoffs. The even parity objective is defined as Even(p) = {    : min(p(Inf ( ))) is even}, and the odd parity objective Theorem 2.1. (Determinacy [19, 9]) In every as Odd(p) = {    : min(p(Inf ( ))) is odd }. stochastic parity game we have Val (s) + Val  (s) = 1, For every s  V , we write Reach(T )s (resp. for al l vertices s  V . Buchi(T )s , Even(p)s , and Odd(p)s ) for the set ¨ if Reach(T )  s (resp. Buchi(T )  s , Even(p)  s , and A strategy  for the player Even is optimal G for all ¨ vertices s  V , we have: Val G (s) = Val  (s) = Odd(p)  s ). The reachability and Buchi ob jectives are ¨ ,  special cases of the parity ob jectives. A Buchi ob jective inf G  Prs (Even(p)s ); it is -optimal if we have: ¨ G , Buchi(T ) can be encoded as an even parity ob jective for Val (s) -   Val  (s) = inf  Prs (Even(p)s ); and ¨ y the priority function p : V  [1] with only two priorities it is qualitativelG optimal if the following two conditions hold: Val (s) = 1 implies Val G (s) = 1, and (0 and 1), such that p(s) = 0 if and only if s  T , for all G G s  V . A reachability ob jective is a special case of the Val (s) > 0 implies Val  (s) > 0. Optimal, -optimal, Buchi ob jective, where all vertices in the target set T and qualitatively optimal strategies for the player Odd ¨ are defined similarly. are absorbing, i.e., if s  T and (s, v )  E then s = v . The almost-sure winning set WG for the player A stochastic parity game (2 1/2-player parity game ) is a stochastic game G with the set Even(p) as the Even in a game G is defined by WG = { v  V : ob jective of the player Even, where p : V  [d] is Val (v ) = 1 } and the almost-sure winning set for player

124


 Odd is WG = { v  V : Val  (v ) = 1 }. The algorithmic problem of solving qualitative stochastic parity games is, given a game G(p), to compute the sets WG and  WG . The algorithmic problem of solving quantitative stochastic parity games is to compute the functions Val G and Val  . The quantitative (resp. qualitative) G decision problem for stochastic parity games is, given a vertex s  V and a rational number r  (0, 1] as a part of the input, to determine whether Val G (s)  r (resp. whether s  WG ).

Lemma 3.2. Let v  V be a c.w.r. vertex and let U  V be a witness for v . Then there is a strategy   , such  that for every vertex u  U , we have Pru (v ) = 1. u Proof. For every vertex u  U  V , we set  (u) to be a successor of u with the minimum distance to vertex v in the subgraph induced by the set U ; the minimum distance to v is well defined since by definition of a witness the set U is strongly connected. Every play starting from a vertex in set U and following the strategy  never leaves the set U because by the definition of a witness no random edges leave the set U . Moreover, in every step the distance to the vertex v is decreased with a fixed positive probability and hence from every vertex in the set U there is a fixed positive probability of reaching the vertex v in at most |U | - 1 steps. Therefore, the vertex v is visited infinitely many times with probability 1, and since p(v ) is even and by definition of a witness we have p(v ) = min(p(U )), we  obtain Pru (v ) = 1. u Proposition 3.1. If a vertex v  V is not c.w.r. and p(v ) is even then for al l strategies   , and for al l  starting vertices s  V , we have Prs (v ) = 0. s
 Proof. By Lemma 3.1 we have Prs (U ) = 0, for every s U  Fv , and for every strategy  U , because vertex v  U is not c.w.r. Observe that v = s Fv s , and hence  v Prs (s ) = 0, since the set Fv is finite.

3 Solving 1 1/2-Player Parity Games In this section we study 1 1/2-player parity games, i.e., a subclass of stochastic parity games, such that V = . We establish Theorem 1.1: first we develop an O(d · n3/2 ) algorithm for the qualitative solution of 1 1/2player parity games and we argue that the quantitative solution then reduces to the quantitative solution of the simpler 1 1/2-player reachability stochastic games for which linear programming formulations are known [6, 5]. We also argue that pure memoryless optimal strategies exist. Our technique is inspired by the work of Courcoubetis and Yannakakis [6]. The key concept underlying the proofs and our algorithm is that of control lably win recurrent vertices. For a vertex v  V we define Fv = { U  V : v  U and p(v ) = min(p(U )) }, i.e., the family of sets of vertices containing vertex v and in which v has the smallest priority. For s, v  V , we define v = {   s : Inf ( )  Fv }, and for U  V we s define U = {   s : Inf ( ) = U }. s Definition 3.1. (Controllably win recurrent vertex) A vertex v  V in a 1 1/2-player game (G, p) is controllably win recurrent (c.w.r.) if p(v ) is even and there is a strongly connected set of vertices U  V , such that U  Fv and there are no random edges out of set U in graph G (i.e., if u  U  V and (u, w)  E imply w  U ). The set of vertices U as above is a witness for the vertex v . We write Vcwr for the set of al l c.w.r. vertices.

Lemma 3.3. (Reach the winning set strategy) If W  W then Val  Val (Reach(W )). Proof. Let    be a strategy such that  Prw (Even(p)) = 1, for all vertices w  W , and let v    be a strategy such that Pr (Reach(W )) = Val (Reach(W )), for all vertices v  V . We define a strategy    as follows:  ( w) if w  (V \ W )+ ,  (w) =  (u) otherwise,

where u  V  is the longest suffix of the sequence w Lemma 3.1. For every vertex s  V , and for every starting from a vertex in the set W . set of vertices U  V , if there is a random edge out Fix a vertex s  V . Then we have the following: of the set U then for every strategy   , we have    Prs (Even(p)s )  Prs (Even(p)s  Reach(W )s ) Prs (U ) = 0. s   = Prs (Even(p)s | Reach(W )s ) · Prs (Reach(W )s )  Proof. Let (u, w)  E be a random edge out of U , i.e., = Prs (Reach(W )s ),  u  U  V and w  U . Each time the random path s   visits vertex u it then visits vertex w in the next step since Prs (Even(p)s | Reach(W )s ) = Prs (Even(p)s ) = 1. ( (  with probability 1/2. Therefore, if s visits vertex u Hence we get Val s)  Val Reach(W ))(s). infinitely many times then it visits vertex w infinitely  many times with probability 1. Hence, Inf (s ) = U Lemma 3.4. Let W be a set of vertices such that Vcwr   U W  W . Then we have Val = Val (Reach(W )). with probability 0, i.e., Prs (s ) = 0.

125


Proof. Observe that vfor every vertex s  V , we v have Even(p)s = Therefore for : p(v ) is even s .  every v trategy   , we have v rs (Even(p)s ) = s P   Prs ( : p(v) is even v ) = Prs ( Vcwr v ), where s s the last equality follows from Proposition 3.1. Note that for every v  V , we have v  Reach(v )s s v v and hence From the Vcwr s  Reach(Vcwr )s .  above equalities it then follows that Prs (Even(p)s )   Prs (Reach(Vcwr )s ), and hence we have that Val (s)  Val (Reach(Vcwr ))(s)  Val (Reach(W ))(s). On the other hand, by Lemma 3.3 we have that Val (s)  Val (Reach(W ))(s). Theorem 3.1. (Qualitative pure memoryless strategies [4]) Player Even in 1 1/2-player parity games has pure memoryless strategies for almost-sure win and for positive-probability win criteria.

if the set U is a witness for the c.w.r. vertex v  V then it has to be disjoint from the set Attr (G, P<p(v) ), since otherwise we have U  P<p(v) =  contradicting the p(v ) = min(p(U )) condition of a witness. Lemma 3.5. (If the smallest priority is odd) Let the smal lest priority k in a 1 1/2-player parity game G(p) be odd. Then the set of c.w.r. vertices in G(p) is equal to the set of c.w.r. vertices in the subgame (G \ Attr (G, Pk ))(p). Proof. Set A = Attr (G, Pk ). No vertex in Pk is c.w.r. because k is odd. For every vertex v  P>k , (from p(v ) > k ) it follows that Pk  P<p(v) , which implies A  Attr (G, P<p(v) ), and hence by Proposition 3.2 every c.w.r. vertex in the game G(p) is in the set V \ A, and moreover, all witnesses of c.w.r. vertices are disjoint from the set A. It follows that if a vertex is c.w.r. in the game G(p) then it is also c.w.r. in the subgame (G \ A)(p). Conversely, if a vertex is c.w.r. in the subgame (G \ A)(p) then it is also c.w.r. in the game G(p) because by the definition of an attractor there are no random edges entering the set A, so a witness in the subgame is also a witness in the game G(p).

Proof. For a vertex v  Vcwr , the strategy  described in Lemma 3.2 defines a pure memoryless almost-sure winning strategy. For other vertices v  V , we set  (v ) to be a successor of vertex v with the minimum distance to the vertex set Vcwr in the graph G. This defines a pure memoryless winning strategy for almost-sure and positive-probability win criteria. Algorithm 3.1. An O(d · n3/2 ) algorithm for the qualitative 1 1/2-player parity games. Definition 3.2. (Attractor) For a set T  V Input : 1 1/2-player parity game (G, p). of vertices in the 1 1/2-player game graph G = Output: The almost-sure winning set W for (V , E , (V , V )) we inductively define the attractor player Even Attr (G, T ) as fol lows. Set R0 = T , and for k  0, 1. W := ; G0 := G set Rk+1 = Rk  { v  V : (v , u)  E for some u  2. rep eat Rk }  { v  V k: u  Rk for al l (v , u)  E }. Then k := smallest priority of a vertex in G Attr (G, T ) = Rk . It is easy to see that the attractor Attr (G, T ) is the set of vertices from which player has a strategy to reach set T in the sense of a non-stochastic 2-player game [28, 25]. For every k  [d]+ , define P<k = p-1 ([k -1]), i.e., the set of vertices with priorities strictly smal ler than k ; Pk = p-1 (k ), i.e., the set of vertices with priority equal to k ; and P>k = V \ (P<k  Pk ), i.e., the set of vertices with priorities strictly bigger than k . The following observations lead to efficient algorithms for solving 1 1/2-player parity games, yielding a proof of Theorem 1.1.
if k is odd then G := G \ Attr (G, Pk ) if k is even then B := Buchi-almost-sure-win(G, Pk ) ¨ W := W  B A := Attr (G \ B , Pk \ B ) G := G \ (B  A) until G =  3. W := Reachability-almost-sure-win(G0 , W )

Lemma 3.6. (If the smallest priority is even) Let the smal lest priority k in a 1 1/2-player parity game G(p) be even, and let B be the set of almost-sure winning vertices in the 1 1/2-player Buchi game G(Pk ). Then al l ¨ vertices in set B are almost-sure winning in the parity Proposition 3.2. If a set U  V is a witness for a game G(p) and al l c.w.r. vertices of priority k in the c.w.r. vertex v  U then the set U is disjoint from the game G(p) are in the set B . Let A = Attr (G \ B , Pk \ set Attr (G, P<p(v) ). B ). No vertex in the set A is c.w.r. in the game G(p). If a vertex in the set V \ (B  A) is c.w.r. in the game Proof. By an easy induction on the stages Rk in the G(p) then it is c.w.r. in the subgame (G \ (B  A))(p). definition of the attractor Attr (G, T ) one can prove that if there are no random edges leaving a set U  V Proof. For the set B of almost-sure winning vertices in then u  U  Attr (G, T ) implies that U  T = . Hence the Buchi game G(Pk ) there is a strategy to visit vertices ¨

126


of priority k infinitely often, and hence it is an almostsure winning strategy for player Even in the game G(p). If a vertex of priority k is c.w.r. then by Lemma 3.2 there is an almost-sure winning strategy from the vertex, i.e., it belongs to the set B . Hence no vertex in the set Pk \ B is c.w.r. Note that there is no edge entering the almost-sure winning set B from the set V \ B since otherwise its source would also be almostsure winning and hence in the set B . Therefore, we have A  Attr (G, Pk \ B ), which by Proposition 3.2 implies that no vertex in the set A \ Pk is c.w.r. This concludes the proof that no vertex in the set A is c.w.r. in the game G(p). Let a vertex v  V \ (B  A) be c.w.r. in the game G(p). We have Pk \ B  P<p(v) because p(v ) > k , and hence A  Attr (G, Pk \ B )  Attr (G, P<p(v) ). Let a set U  V be a witness for the vertex v in the game G(p). Then Proposition 3.2 implies that the set U is disjoint from the set Attr (G, P<p(v) ), and hence the set U is disjoint from the set A. The set U is also disjoint from the set B , since otherwise Lemma 3.2 implies that v  B . It follows that U  V \ (B  A), and hence the vertex v is c.w.r. in the game (G \ (B  A))(p) with the set U as a witness. Lemma 3.7. The Algorithm 3.1 computes the almostsure winning set W for player Even in O(d · n3/2 ) time for binary 1 1/2-player parity game graphs with n vertices and d different priorities. Quantitative solution, i.e., the value function can be computed in polynomial time. Proof. It follows from Lemmas 3.5 and 3.6 that the set of vertices W computed in Step 2. satisfies that Vcwr  W  W . Correctness of the algorithm then follows from Lemma 3.4. The work done in every iteration is either O(n) if the smallest priority is odd (attractor computation can be easily implemented in O(n) time) or O(n3/2 ) if the smallest priority is even, using the O(n3/2 ) time algorithm for solving Buchi 1 1/2¨ player games of Chatterjee et al. [4]. As almost-sure reachability is a special case of the Buchi almost-sure ¨ reachability (with absorbing target vertices) Step 3. can also be computed in O(n3/2 ) time. There are at most d iterations of the algorithm hence the total complexity is O(d · n3/2 ). It follows from Lemma 3.4 that for every vertex s  V , we have Val (s) = Val (Reach(W ))(s), i.e., the value of vertex s in the 1 1/2-player parity game G(p) is its value in the reachability game G(Reach(W )) with the almost-sure winning set as the target set. The quantitative solution then amounts to computing the qualitative solution in time O(d · n3/2 ) and then quantitatively solving the reachability game G(Reach(W )) for which linear programming formulations are known [6, 5]; see

also Section 5 for more details, including the linear program. 4 Qualitative 2 1/2-Player Parity Games In this section we give an alternative, more elementary proof of our earlier result on solving qualitative stochastic parity games by a reduction to non-stochastic parity games [4]. The key consequence of this solution is the existence of pure memoryless qualitatively optimal strategies that we use in the next section to prove existence of pure memoryless optimal strategies. Theorem 4.1. (Qualitatively optimal pure memoryless strategies [4]) In 2 1/2-player stochastic parity games both players have qualitatively optimal pure memoryless strategies. First we recall the reduction [4]. Given a 2 1/2-player parity game (G = (V , E , (V , V , V )), p : V  [d]), we construct a 2-player parity game G with the same set [d] of priorities. For every vertex v  V  V , there is a vertex v  V with "the same" outgoing edges, i.e., (v , u)  E if and only if (v , u)  E . Each random vertex v  V is substituted by the gadget presented in Figure 1. More formally, the players play the following 3-step game in G from vertex v of priority p(v ). First, in vertex v player Odd chooses a successor (v, k ), each of priority p(v ), where k  [p(v ) + 1] is even. Then in vertex (v, k ) player Even chooses from at most two successors: vertex (v, k - 1) of priority k - 1 if k > 0, or vertex (v, k ) of priority k if k  p(v ). Finally, in a vertex (v, k ) the choice is between all vertices u such that (v , u)  E , and it belongs to player Even if k is odd, and to player Odd if k is even. Let U and U  be the winning sets for players Even and Odd, respectively, in the 2-player parity game G. Define sets U and U  of vertices in the 2 1/2-player parity game G by U = { v  V : v  U }, and U  = { v  V : v  U  }. By the determinacy of 2player parity games [10] we have that U  U  = V , and hence U  U  = V . Therefore, the following lemma yields the Theorem 4.1. Lemma 4.1. There are pure memoryless strategies  and  in the stochastic game G for the player Even and Odd, respectively, such that Val G (v ) = 1, for al l v  U , and Val   (v ) > 0, for al l v  U  . G Proof. For brevity, we say that a cycle (resp. strongly connected component, or s.c.c.) in a (stochastic) parity game graph is even (resp. odd) if the minimum priority of a vertex on the cycle (resp. in the s.c.c.) is even (resp. odd). By the pure memoryless determinacy of

127


p(v)

__ v

p(v) (~ ,0) v (^ ,0) v 0 1

p(v)

(~ ,2) v

(~ ,4) v

p(v)

(~ ,p(v)+1) v

p(v)

2

3

(^ ,4) v

4

(^ ,p(v)+1) p(v) v

succ(v)

succ(v)

succ(v)

succ(v)

succ(v)

succ(v)

Figure 1: The gadget for the reduction of a 2 1/2-player parity game to a 2-player parity game.

2-player parity games [10] there are pure memoryless strategies  and  in the game G for the player Even and Odd, respectively, such that all cycles in the subgraph U are even, and all cycles in the subgraph G G U  are odd [17]. We define a pure memoryless strategy  for the player Even in the game G as follows: for all v  V , if  (v ) = u then we set  (v ) = u. In a similar fashion we get a pure memoryless strategy  for the player Odd in the game G from the pure memoryless strategy  . Our goal is to establish that the player Even wins with probability 1 from the set U in the game G using strategy  , and that the player Odd wins with positive probability from the set U  in the game G using strategy  . By Theorem 3.1 it suffices to prove it for strategies  and  only against memoryless strategies. In other words, it suffices to establish the following.

vertex (v, 2 - 1) for < r is in G then the strategy  chooses at vertex v the successor leading to (v, 2 - 1). Otherwise the strategy  chooses at vertex v the successor leading to (v, 2r), and at vertex (v, 2r) it choses a successor shortening the distance to the fixed vertex of priority 2r - 1. C Consider an arbitrary cycle in the subgraph G  , where C is the set of vertices in the gadgets of vertices in C . There are two cases. If there is at least one vertex (v, 2 - 1) with < r on the cycle then the minimum priority on the cycle is odd. Otherwise, in all vertices choices shortening the distance to the vertex with priority 2r - 1 are taken and hence priority 2r - 1 is visited and all other priorities on the cycle are  2r - 1, so 2r - 1 is the minimum priority on the cycle. Now we argue that for every random vertex v  V  U , all of its successors are in U . Otherwise, the player Odd in the vertex v of the game G can choose the successor (v, 0) and then a successor to its winning set U  , which contradicts the assumption that the strategy  is a winning strategy for the player Even in the game G.

2. The proofs for the player Odd are similar. We only briefly sketch the key steps. Let C be an even terminal strongly connected component in G , U  , and let its minimum priority be 2r, for some r  N. We fix a strategy  for the player Even in the game G as follows. For vertices v  C  V we have  (v ) =  (v ). For vertices v  C  V we define the strategy as follows: if at a vertex v the strategy  chooses a vertex (v, 2 - 2), such that  r, then the strategy  chooses 1. For all pure memoryless strategies  , every terminal the successor (v , 2 - 2). Otherwise, > r and then ^ s.c.c. in the graph G, U is even. For every random the strategy  chooses a successor which shortens the vertex v  V  U , all of its successors are in the distance to a fixed vertex with priority 2r. This will set U . construct an even cycle in the subgraph G U  . 2. For all pure memoryless strategies  , every terminal Finally, for every random vertex v  V  U  , s.c.c. in the graph G , U  is odd. For every random at least one successor must be in the set U  , since vertex v  V  U  , at least one of its successors is in otherwise the strategy  cannot be a winning strategy the set U  . for the player Odd in the game G from the vertex v . 1. We prove that every terminal strongly connected i U component in the strategy subgraph G, s 5 Quantitative 2 1/2-Player Parity Games even. We argue that if there is an odd terminal In this section we prove that pure memoryless optimal t U strongly connected component in G, hen we strategies exist for quantitative stochastic parity games, can construct an odd cycle in the subgraph G U , i.e., we establish Theorem 1.2. which is impossible because  is a winning strategy for the player Even from the set U in the 2-player parity Lemma 5.1. (Subgame strategy) Let G = G U game G. be a subgame of the game G, and let r < 1. If Let C be an odd terminal strongly connected com- Val G (u) = 1, for al l u  U , and if Val G (w)  r, for ponent in G, U , and let its minimum priority be al l edges (u, w)  E going out of the set U , such that 2r - 1, for some r  N. We fix a strategy  for the u  V , then Val G (u)  r, for al l u  U . player Odd in the game G as follows. For any vertex v  C  V we have  (v ) =  (v ). For a vertex Proof. By the assumption that for all vertices u  U , v  C  V we define the strategy as follows: if the we have Val G (u) = 1, so by Theorem 4.1, there is a

128


pure memoryless strategy  for the player Even in the , game G , such that inf  Pru  (Even(p)) = 1. Let  be an -optimal strategy for the player Even in the game G. We define a strategy  for the player Even in the game G as follows:  ( v) if w · v  U + ,  (w · v ) =  (u · v ) otherwise, where u  V  is the longest suffix of the sequence w starting from a vertex which is not in the set U . Fix a vertex u  U . Define U = u · U  , the set of infinite sequences of vertices in the set U , starting from the vertex u. Set U = u \ U . Let    be a strategy for the player Odd in the game G. By , the assumption that inf  Pru  (Even(p)) = 1, we  , have that Pru (Even(p) | U ) = 1. For every play  = u = v0 , v1 , v2 , . . .  U , there is the smallest i  N, such that vi  U . Note that if in the play  the player Even uses the strategy  then by the -optimality  of the strategy  , we have Pru, (Even(p) | U )  r - . Therefore, for every strategy    of the player Odd, we have:
   Pru, (Even(p)) = Pru, (Even(p) | U ) · Pru, (U )   + Pru, (Even(p) | U ) · Pru, (U )    Pru, (U ) + (r - ) · Pru, (U )

  Note that Pru, (U ) + Pru, (U ) = 1, so we get , Pru (Even(p))  r - , for all strategies  for the player Odd. This implies that Val G (u)  r. Lemma 5.3. (Locally and qualitatively optimal strategies are optimal) If a pure memoryless stratDefinition 5.1. (Locally optimal strategy) We egy is local ly optimal and qualitatively optimal then it is say that a pure memoryless strategy  for a player optimal. P  { , } (i.e., for player Even and Odd, respectively) is locally optimal in the game G if for every vertex Proof. Let  be a pure memoryless strategy for player v  VP , we have Val P ( (v )) = Val P (v ). G G Odd that is locally optimal and qualitatively opti-

vertex v  V , that if Val  (v ) = 1 then Val  (v ) = 1, G G and that if Val  (v ) > 0 then Val  (v ) > 0. G G By Theorem 4.1 there is a pure memoryless qualitatively optimal strategy  for the player Odd in the game G. Note that there are no edges going out of the  set WG in the strategy subgraph G . Therefore, for  every edge (v , w)  E , such that v  WG , we have   Val G (v ) = Val G (w) = 1, and hence the strategy  is also a valid strategy in the game G , and for all vertices  v  WG , we have Val  (v ) = Val   (v ) = 1. G G We now prove the other claim which can be restated as follows: there is no vertex v  V , such that Val G (v ) < 1 and Val G (v) = 1. Define the number r = min{ Val G (v ) : Val G (v) = 1 }. For the sake of contradiction assume that r < 1. By Theorem 4.1 there is a pure memoryless qualitatively optimal strategy  for the player Even in the game G . Note that there are no edges going out of the set WG in the strategy subgraph G . It follows that G WG is a subgame of the game G. By definition of the set WG , we have Val G (v) = 1, for all vertices v  WG . Moreover, by the optimality condition for a vertex v  V in the game G we have that for every edge (v , u)  E going out of the set WG (and hence not present in the game graph G ) we must have Val  (v ) > Val  (u), i.e., Val G (u) > Val G (v )  r. ApG G plying Lemma 5.1 to the subgame G WG we get that Val G (v ) > r, for every vertex v  WG . This, however is a contradiction with the assumption that there is a vertex v  WG , such that Val G (v ) = r.

mal. The strategy subgraph G of the strategy  is a Markov decision process so by Lemma 3.4 we have G G G Lemma 5.2. (Locally and qualitatively optimal Val  (v ) = Val  (Reach(W  ))(v ), and those values strategies) Both players have pure memoryless qual- are the unique solution to the ,following linear prov sub ject to: V xv itatively optimal strategies that are also local ly optimal. gram [6, 5]: minimize ( v x =1 V xv Proof. We prove this and the next lemma for the player v ,w)E xw 2 =  (xv ) v  V Odd; the proofs for the player Even are similar. v xv  xw v  V , (v , w)  E Let G be the game obtained from the game G by xv  0 vV removing all the edges (v , w)  E , such that v  V , and xv = 1 v  WG Val  (v ) > Val  (w). Observe that the game G differs G G t from the game G only in that in the game G he player Odd can only use the locally optimal strategies in the It follows from the qualitative optimality of the stratG G game G, while the player Even can use all his strategies egy  that W  = W . The local optimality of  imin the game G. Therefore, it suffices to prove for every plies that for every vertex v  V , we have Val G (v ) = We establish the following strengthening of Theorem 4.1

129


Val G ( (v )). Therefore, the valuation xv := Val G (v ), for all v  V , is a feasible solution of the linear program, so we have Val G  Val G . This implies that  is an optimal strategy for player Odd in the game G. Theorem 1.2 follows from Lemmas 5.2 and 5.3. Proof. [of Corollary 1.1] By Theorem 1.2 both players have optimal pure memoryless strategies. There are only exponentially many pure memoryless strategies and once a strategy for a player is fixed we have a 1 1/2-player game whose values can be computed in polynomial time by Theorem 1.1. Hence we get the NP  co-NP upper bound and an exponential time algorithm to compute the exact values. By Theorem 1.2 the values are a solution of a system of linear equations with rational coefficients [5] and hence they are rational. Acknowledgments. We thank Rupak Ma jumdar for clarifying the complexity of the algorithm for solving concurrent stochastic parity games [9], and an anonymous referee for pointing out a polynomial-time solution of qualitative 1 1/2-player parity games by a reduction to limiting-average MDP's. References
[1] A. Arnold and D. Niwinski. Rudiments of µ-Calculus, ´ volume 146 of Studies in Logic and the Foundations of Mathematics. Elsevier, 2001. [2] A. Arnold, A. Vincent, and I. Walukiewicz. Games for synthesis of controllers with partial observation. Theoretical Computer Science, 303(1):7­34, 2003. [3] J. Bradfield. The modal µ-calculus alternation hierarchy is strict. Theoretical Computer Science, 195(2):133­153, 1998. [4] K. Chatterjee, M. Jurdzinski, and T. A. Henzinger. ´ Simple stochastic parity games. In CSL'03, volume 2803 of LNCS, pages 100­113. Springer, 2003. [5] A. Condon. The complexity of stochastic games. Information and Computation, 96:203­224, 1992. [6] C. Courcoubetis and M. Yannakakis. Markov decision processes and regular events. In ICALP'90, volume 443 of LNCS, pages 336­349. Springer, 1990. [7] L. de Alfaro and T. A. Henzinger. Concurrent  regular games. In LICS'00, pages 141­154. IEEE Computer Society Press, 2000. [8] L. de Alfaro, T. A. Henzinger, and O. Kupferman. Concurrent reachability games. In FOCS'98, pages 564­575. IEEE Computer Society Press, 1998. [9] L. de Alfaro and R. Ma jumdar. Quantitative solution of omega-regular games. In STOC'01, pages 675­683. ACM Press, 2001. Full version to appear in the Journal of Computer and Systems Sciences. Elsevier.

[10] E. A. Emerson and C. Jutla. Tree automata, µ-calculus and determinacy. In FOCS'91, pages 368­377. IEEE Computer Society Press, 1991. [11] E. A. Emerson, C. S. Jutla, and A. P. Sistla. On modelchecking for fragments of µ-calculus. In CAV'93, volume 697 of LNCS, pages 385­396. Springer, 1993. [12] H. Everett. Recursive games. In Contributions to the Theory of Games III, volume 39 of Annals of Mathematical Studies, pages 47­78. Princeton University Press, 1957. [13] J. Filar and K. Vrieze. Competitive Markov Decision Processes. Springer, 1997. [14] E. Gr¨del, W. Thomas, and T. Wilke, editors. Aua tomata, Logics, and Infinite Games. A Guide to Current Research, volume 2500 of LNCS. Springer, 2002. [15] Y. Gurevich and L. Harrington. Trees, automata, and games. In STOC'82, pages 60­65. ACM Press, 1982. [16] M. Jurdzinski. Deciding the winner in parity games ´ is in UP  co-UP. Information Processing Letters, 68(3):119­124, 1998. [17] M. Jurdzinski. Small progress measures for solving ´ parity games. In STACS'00, volume 1770 of LNCS, pages 290­301. Springer, 2000. [18] M. L. Littman, T. Dean, and L. P. Kaelbling. On the complexity of solving Markov decision problems. In UAI'95, pages 394­402. Morgan Kaufmann, 1995. [19] D. A. Martin. The determinacy of Blackwell games. Journal of Symbolic Logic, 63(4):1565­1581, 1998. [20] A. Puri. Theory of Hybrid Systems and Discrete Event Systems. PhD thesis, University of California, Berkeley, 1995. [21] M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, 1994. [22] T. E. S. Raghavan and J. A. Filar. Algorithms for stochastic games -- a survey. ZOR -- Methods and Models of Operations Research, 35:437­472, 1991. [23] L. S. Shapley. Stochastic games. Proceedings Nat. Acad. of Science USA, 39:1095­1100, 1953. [24] A. Tarski. A Decision Method for Elementary Algebra and Geometry. University of California Press, Berkeley and Los Angeles, 1951. [25] W. Thomas. Languages, automata, and logic. In G. Rozenberg and A. Salomaa, editors, Handbook of Formal Languages, volume 3, Beyond Words, chapter 7, pages 389­455. Springer, 1997. [26] J. V¨ge and M. Jurdzinski. o ´ A discrete strategy improvement algorithm for solving parity games. In CAV'00, volume 1855 of LNCS, pages 202­215. Springer, 2000. [27] J. von Neumann and O. Morgenstern. Theory of Games and Economic Behavior. Princeton University Press, 1953. [28] W. Zielonka. Infinite games on finitely coloured graphs with applications to automata on infinite trees. Theoretical Computer Science, 200:135­183, 1998. [29] U. Zwick and M. Paterson. The complexity of mean payoff games on graphs. Theoretical Computer Science, 158:343­359, 1996.

130