On the Hardness of Finding Symmetries in Markov Decision Processes
S H R AVA N M N @ G M A I L . C O M R AV I @ C S E . I I T M . AC . I N

Shravan Matthur Narayanamurthy Yahoo! Labs, Bangalore Balaraman Ravindran Dept. of Comp. Sci. and Engg., Indian Institute of Technology Madras, Chennai

Abstract
In this work we address the question of finding symmetries of a given MDP. We show that the problem is Isomorphism Complete, that is, the problem is polynomially equivalent to verifying whether two graphs are isomorphic. Apart from the theoretical importance of this result it has an important practical application. The reduction presented can be used together with any off-the-shelf Graph Isomorphism solver, which performs well in the average case, to find symmetries of an MDP. In fact, we present results of using NAutY (the best Graph Isomorphism solver currently available), to find symmetries of MDPs.

graphs, due to the additional structure introduced by MDPs. In this work we show that finding symmetries in MDPs is no harder than the problem of graph isomorphism. We also show that existing graph isomorphism solvers can be used to find symmetries in MDPs. We present some notation in the next section, and some related work in Section 3. In Section 4 we formally define the problem, and present a constructive algorithm in Section 5 for showing the equivalence to graph isomorphism. We discuss some results Section 6 and conclude in Section 7.

2. Homomorphisms and Symmetry Groups
Let B be a partition of a set X. For any x  X, [x]B denotes the block of B to which x belongs. Any function f from a set X to a set Y induces a partition (or equivalence relation) on X, with [x] f = [x ] f if and only if f (x) = f (x ) and x, x are f -equivalent written x  f x . Let B be a partition of Z  X × Y, where X and Y are arbitrary sets. The projection of B onto X is the partition B|X of X such that for any x, x  X, [x]B|X = [x ]B|X if and only if every block containing a pair in which x is a component also contains a pair in which x is a component or every block containing a pair in which x is a component also contains a pair in which x is a component. Definition 1. An MDP homomorphism h from an MDP i M = S, A, , P, R to an MDP M = S , A ,  , P , R s a surjection from  to  , defined by a tuple of surjections < f , { gs |s  S} >, with h((s, a)) = ( f (s), gs (a)), where f : S  S and gs : As  A f (s) for s  S, such that: s, s  S, a  As s P(s, a, s ) (1) P ( f (s), gs (a), f (s )) =
[s ] f

1. Introduction
Markov Decision Processes (MDPs) are widely employed to model sequential decision problems. But current solution techniques for MDPs do not scale well with the size of the MDPs, and hence are proving inadequate in solving large real-world problems. While building abstract models of real-world problems, it can be seen that a high degree of redundancy is present which can be exploited to reduce size of the model. This reduction in size could possibly lead to more efficient solution methods. One such notion of redundancy is a degree of symmetry that is present in any real-world problem. (Amarel, 1968) first looked at exploiting such symmetries in solving a missionaries and cannibals problem. In this work we use the notion of symmetries in MDPs introduced in (Ravindran, 2004). While it is widely believed that finding symmetries in MDPs is a hard problem, no one has investigated before exactly how hard this problem is. Intuitively this seems harder than finding symmetries in
Appearing in Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 2008. Copyright 2008 by the author(s)/owner(s).

R f (s), gs (a)) =
(

R(s, a)

(2)

We use the shorthand h(s, a) for h((s, a)). Often for convenience, we use < f , { gs } > to denote < f , { gs |s  S} >.


On the Hardness of Finding Symmetries in MDPs

Definition 2. Let M' be an image of the MDP M under homomorphism h =< f , { gs } >. For any s  S, g-1 (a ) des notes the set of actions that have the same image a  A f (s) under gs . Let  be a stochastic policy in M . Then  lifted to M is the policy M such that for any a  g-1 (a ), M (s, a) =  ( f (s), a )/| g-1 (a )| s s Definition 3. An MDP homomorphism h =< f , { gs } > from MDP M = S, A, , P, R to MDP M = i, S A ,  , P , R is an MDP isomorphism from M to M f and only if f and gs , are bijective. M is said to be isomorphic to M and vice versa. An MDP isomorphism from MDP M to itself is called an automorphism of M. Definition 4. The set of all automorphisms of an MDP M, denoted by AutM, forms a group under composition of homomorphisms. This group is the symmetry group of M. Let G be a subgroup of AutM. The image of M under G is called the G-reduced image of M. Definition 5. An MDP M is said to be a reduced model of an MDP M, iff there exists an MDP homomorphism h:MM.

(Ravindran, 2004) proposes a more generic framework based on the notion of MDP homomorphisms with statedependent action recoding as introduced in Section 2. This allows a greater reduction in problem size and aids in modeling many other notions of equivalence like symmetries. A polynomial time algorithm to find the reduced model under the notion of MDP homomorphisms is also proposed by extending the algorithm proposed by (Givan et al., 2003) and (Lee & Yannakakis, 1992). Again, the algorithm is polynomial in the number of block operations, the stability criterion is modified to suit the equivalence notion and the same process of iterative splitting is used. The notion of stability used is called the stochastic substitution property, which is an extension of the substitution property for finite state machines (Hartmanis, 1966). However, literature on MDP minimization using symmetries is sparse. (Zinkevich & Balch, 2001) define symmetries based on state-action equivalence but do not make any connections to group-theoretic concepts or minimization algorithms. Another dimension to analyze the literature is the approach to symmetry finding. Two main approaches exist: 1. To derive a set of necessary conditions for elements to be symmetric 2. Prove Isomorphism Completeness and use a graph isomorphism finding system Intuitively symmetries seem easier to identify than homomorphisms and we tried the first approach to find a polynomial time algorithm for symmetry finding, along the lines of the MDP homomorphism finding, with the motivation of finding better algorithms for MDP minimization. The MDP homomorphism definition allows for deriving this easily because, two state action pairs (s1 , a1 ), (s2 , a2 ) are homomorphically equivalent if P f (s1 ), gs1 (a1 ), f (s ) = P ( f (s2 ), gs2 (a2 ), f (s )) T(s1 , a1 , [s ]Bh |S ) = T(s2 , a2 , [s ]Bh |S )
( )

3. Related Work
MDP Minimization is a well studied problem. As stated earlier, in the model minimization approach, a reduced MDP that that preserves some key properties as the original MDP is found by combining "equivalent" states. The reduced MDP found depends on the notion of equivalence between states used in the aggregation. The notion of equivalence chosen will be fundamental in designing and analyzing algorithms for reducing MDPs. In (Dean & Givan, 1997) a minimization algorithm is proposed based on the notion of stochastic bi-simulation homogeneity. Informally, a partition of the state space for an MDP is said to be homogeneous if for each action, states in the same block have the same probability of transitioning to each other block. They also provide an algorithm for finding the coarsest homogeneous refinement of any partition of the state space of an MDP. The algorithm starts with an initial partition P0 and iteratively refines it by splitting the blocks until the coarsest homogeneous refinement of P0 is obtained. A notion of stability of a block with respect to another is defined and unstable blocks are split till all blocks of the partition are stable. The complexity of the algorithm is expressed in terms of the partition manipulation operations. Hence, the actual complexity depends on the underlying partition representation and manipulation algorithms. (Givan et al., 2003) discuss the application of the algorithm to solving factored MDP problems. Enumerating the state space is avoided by describing large blocks of equivalent states in factored form with the block descriptions being inferred directly from the original factored representation.

h(s1 , a1 )

= h(s2 , a2 )

for all s  S. This is the stochastic substitution property and it allows us to deal just with blocks without worrying about the actual functions. However, a similar attempt for symmetries still needs the symmetry f in the necessary condition as below: P( f (s1 ), gs1 (a1 ), f (s ) = P(s2 , a2 , f (s )) P(s1 , a1 , s ) = P(s2 , a2 , f (s ))
)

h(s1 , a1 ) =

(s2 , a2 )

(Flener et al., 2002) and (Crawford, 1992) point that symmetry finding for CSPs in general is Isomorphism Com-


On the Hardness of Finding Symmetries in MDPs

plete. However, there also exist results showing that symmetry finding is NP-complete (in case of geometric automorphism of graphs (Manning, 1990)). So we were still unclear whether symmetry finding for MDPs is Isomorphism Complete or NP-complete due to the presence of factorially many action recoding functions. A better understanding of the use of symmetries for abstraction in MDPs is the motivation for this work.

From (Mathon, 1979), (Read & Corneil, 1977), (Miller, 1977) we have, DGEN(G)  AGEN(G)  IMAP(G1 , G2 )  ISO(G1 , G2 ). We intend to prove that AMGEN(M) is Isomorphism Complete. We are done if we prove that AMGEN(M)  DGEN(GM ), where GM is a weighted graph constructed in polynomial time from M, that is, AMGEN(M)  DGEN(GM ) and DGEN(GM )  AMGEN(M). It is easy to see that DGEN(GM )  AMGEN(M) is true because we can always construct a degenerate MDP from a digraph. So we need to prove that AMGEN(M)  DGEN(GM ). 5.2. Isomorphism Completeness of the problem An MDP M can be considered as a pseudograph with states acting as vertices and actions acting as edges. Since there can be more than one action affecting the transition between 2 states, we need to represent this as a pseudograph. The transition probabilities and rewards can be thought of as weight functions. Next, we formally pose AMGEN(M) as a problem on a weighted pseudograph. Let GM =< a , V, E, WP , WR > be the pseudograph corresponding to M, where a : Alphabet for labelling corresponding to actions

4. Problem Definition
To exploit the power of abstraction using symmetries, we identify them and construct a reduced model by abstracting away the symmetric portions. As the reduced model can be significantly smaller, it can be easier to solve. We use the notion of automorphisms to model symmetries. So formally, given an MDP M, 1. Find the automorphism group, AutM and 2. Given the automorphism group, AutM find the corresponding reduced model, the AutM-Reduced Image

5. Finding Symmetries
5.1. Problem Simplification Let us consider the first part of our problem, i.e., given an MDP M, find the automorphism group of M, AutM. We know that a group can be specified using its generators. So we simplify the problem to finding the generators of AutM. Let AMGEN(M) denote the problem of finding the generators of AutM. We write A  B if a problem A is polynomially reducible to B. We say that problems A and B are polynomially equivalent iff A  B and B  A. We denote polynomial equivalence by  . Definition 6. A problem A is Isomorphism Complete iff A is polynomially equivalent to finding whether two graphs are isomorphic. Let G1 , G2 be two simple graphs unless otherwise mentioned. The following is a list of relevant Isomorphism Complete problems (Booth & Colbourn, 1977) on graphs: · ISO(G1 , G2 ): Isomorphism recognition for G1 and G2 · IMAP(G1 , G2 ): Isomorphism Map from G1 to G2 (if it exists), · AGEN(G1 ): Generators of the automorphism group, AutG1 · DGEN(G): Generators of the automorphism group, AutG, where G is a weighted digraph

V : Set of vertices corresponding to states E : Set of edges, where each edge is a triple (u, a, v) where, u, v  V and a  a corresponding to state transitions WP WR : : E  R corresponding to transition probabilities E  R corresponding to rewards with WR (u, a, v) = WR (u, a, v )  (u, a, v), (u, a, v )  E u Euv where, Euv = { (u , a, v )  E |u
=

Note, E

=

,vV

u and v

=

v}

AMGEN(M) can be formulated as finding the generators of the group of bijections h : V × a  V × a . h is defined by h(u, a) = ( f (u), gu (a)), where f gu : : V  V and a  a defined for each u  V are bijections s. t.

WP ( f (u), gu (a), f (v)) = WP (u, a, v) and WR ( f (u), gu (a), f (v)) = WR (u, a, v)  (u, a, v)  E


On the Hardness of Finding Symmetries in MDPs

These two components of each generator can be interpreted as follows: 1. f is a function that permutes the states/vertices 2. The set of functions { gu } defined for each state/vertex permutes the actions/edge labels. These are called the State-Dependent Action Recoding (SDAR) functions. 5.2.1. SET BIJECTIONS Let us assume, for a moment, that we have a procedure that constructs a weighted digraph W DM from GM . Now, solving DGEN(W DM ) gives the generators of W DM . Even if these were somehow same as the f s we are looking for, we still need a way to find the SDAR functions. To achieve this, we define the notion of a set bijection which represents a set of bijections very compactly. In the worst case, for each f , there can be factorially many SDAR functions. So a normal explicit representation cannot be used. We also define the operations of intersection between two set bijections to find the bijections that are common to both set bijections, composition between two set bijections and an inverse of a set bijection. All these operations can be done in time polynomial of the number of elements in the domain of a bijection belonging to the set bijection. Definition 7. Consider two finite sets A and B. Let UA = {UA1 , UA2 , . . . , UAk } and UB = {UB1 , UB2 , . . . , UBk } be partitions of A and B respectively. UA and UB are said to be similar iff |UA | = |UB | and for each UAi  UA there exists a unique UB j  UB such that |UAi | = |UB j |. We denote it by UA  U B . Note that, by definition the sets A and B will be of the same size. Definition 8. Let A and B be two finite sets and UA = {UA1 , UA2 , . . . , UAk } and UB = {UB1 , UB2 , . . . , UBk } be partitions of A and B respectively such that UA  UB . A bijective map X : UA  UB where X(UAi ) = UB j implies |UAi | = |UB j | for all UAi  UA is called a set bijection. Informally, a set bijection can be interpreted as representing a set of bijections from A to B. X(UAi ) = UB j represents all possible bijective mappings from elements in UAi to elements in UB j . A bijection from A to B in the set of bijections that represent the set bijection, can be formed by collating mappings from each X(UAi ) = UB j . The set bijection represents all mappings that can be formed by such collations. To formalize this notion, we define the interpretation function next. Let XAB { all bijections X : UA  UB such that UA and UB are similar partitions of A and B respectively } be the set of all set bijections. Let 2S|V| be the powerset set of ^ all permutations from A  B. Define, I : XAB  2S|V|

^ such that I(X : UA  UB ) = { all bijections l : A  ^ B | l(x  UAi )  X(UAi ) UAi  UA }. Evidently, I is only injective and not surjective as there exist sets of 2S|V| that cannot be represented by a set bijection. For example, consider the set of bijections, between {a, b, c} and {1, 2, 3}, L = {(a  1, b  2, c  3), (a  2, b  1, c  3), (a  2, b  3, c  1)}. Clearly there does not exist an X : ^ UA  UB such that I(X) = L. All we can say is that ^ there exists an X such that L  I(X). To get a bijective ^ interpretation function, we define, I : XAB  ima ge(I) ^(X : UA  UB ). Clearly I is such that I(X : UA  UB ) = I a bijection and we call this the interpretation function.
1 Definition 9. Let A be a finite set and let UA = 2 2 2 1 1 1 2 {UA1 , UA2 , . . . , UAk }, UA = {UA1 , UA2 , . . . , UAk } be two 1 2 partitions of A such that, UA  UA . We define the intersec1 2 tion of two similar partitions of a finite set as UA  UA = 2 2 1 1 2 1 2 1 {UAi  UA j | UAi  UA , UA j  UA and UAi  UA j }. 1 Definition 10. Let A and B be two finite sets and UA = 1 1 1 2 2 2 2 {UA1 , UA2 , . . . , UAk }, UA = {UA1 , UA2 , . . . , UAk } be two 1 2 1 1 1 partitions of A and UB = {UB1 , UB2 , . . . , UBk }, UB = 2 1 2 2 {UB1 , UB2 , . . . , UBk } be two partitions of B. Also let UA  1 2 2 UB and UA  UB . Let two set bijections X1 and X2 be 1 1 2 2 defined from UA to UB and from UA to UB respectively. If 1 2 1 2 (UA  UA )  (UB  UB ), we define the intersection between 1 the two set bijections X = X1  X2 as follows: UAi  1 2 2 1 2 1 2 UA , UA j  UA such that UAi  UA j , X(UAi  UA j ) =

1 2 1 2 1 2 X1 (UAi )  X2 (UA j ). Note that, X : UA  UA  UB  UB and it can be shown that I(X) = I(X1 )  I(X2 ).

1 Definition 11. Let A be a finite set. Let UA = 1 1 1 2 2 2 2 {UA1 , UA2 , . . . , UAk }, UA = {UA1 , UA2 , . . . , UAk } be two similar partitions of A. Let X be a set bijection defined 2 2 1 from UA to UA . We define the inverse of X as X-1 : UA  2 1 1 -1 2 1 UA such that X (UAi ) = UA j iff X(UA j ) = UAi .

Definition 12. Let A, B and C be three finite sets and UA = {UA1 , UA2 , . . . , UAk }, UB = {UB1 , UB2 , . . . , UBk } and UC = {UC1 , UC2 , . . . , UCk } be partitions of A, B and C respectively . Also let UA , UB and UC be pairwise similar to each other. Let two set bijections X1 and X2 be defined from UB to UC and UA to UB respectively. We define the composition of X1 and X2 , X = X1 X2 as the set bijection from UA to UC defined by X(UAi ) = X1 (X2 (UAi )), for each UAi  UA . It can be shown that for each l  I(X) there exist, l1  I(X1 ) and l2  I(X2 ) such that l = l1  l2 where  denotes normal function composition. 5 . 2 . 2 . V E C TO R - W E I G H T E D D I G R A P H We assume that a can be ordered and let O be such an ordering. Without loss of generality, we can assume that |Euv | =


On the Hardness of Finding Symmetries in MDPs

k, u, v  V because, we can always take maxu,vV |Euv | = k and if u, v  V such that (u, a, v)  E for some a  a and |Euv | < k, then add dummy labels (chosen from the remaining labels in a ) and zero weights to make |Euv | = k. This corresponds to the general assumption in MDPs that |As | = k, s  S. Let < a1 , a2 , . . . , ak > ordered as per O be the k-tuple representing the label of each edge in Euv . This being the same for all edges, we leave out labeling from the graph definition. Now we define the vector-weighted digraph corresponding to M, V W GM =< V, E , WP , WR >, as follows: E WP WR
=

WP (u, l(t ), v). This partition induces a corresponding parj u tition Uv = {1 , 2 , . . . , a } where ia = {l(t) | t  Nik }. a a a Since, each l sorts WP (u, v), they satisfy the property that l(x  Nik )  ia . Therefore, there exists a set bijection u u QPv : UNv  Uv such that, I(QPv ) = DPv . u u u a k Using a similar procedure, we can show that there exists set u u bijection QRv : UNv  Uv whose interpretation is the set u a k of permutations that sort WR (u, v). Let Quv = QPv  QRv . If Quv = , then there doesn't exist u u an automorphism for the MDP M. 5.2.5. WEIGHTED DIGRAPH Now we define the weighted digraph W GM V, E , W > as follows: W
:

: :

{(u, v) | a  a and (u, a, v)  E} E  Rk defined by WP (u, v) =< WP (u, a1 , v), . . . , WP (u, ak , v) >

=<

E  Rk defined by WR (u, v) =< WR (u, a1 , v), . . . , WR (u, ak , v) >

E


WRs (u, v)) where m is a bijection from R2k  R and . denotes concatenation

R such that W (u, v) = m(WPs (u, v).

where a1 , a2 , . . . , ak are ordered as per O. 5 . 2 . 3 . S O RT E D V E C T O R - W E I G H T E D D I G R A P H We define the sorted vector-weighted SV W GM =< V, E , WPs , WRs >, as follows: WP s : E


digraph,

Algorithm 1 Construction

Rk defined by

WPs (u, v) =< WP (u, puv (1), v), . . . , WP (u, puv (k), v) > where, puv : Nk  a such that WP (u, puv (1), v)  . . .  WP (u, puv (k), v) WR s : E


Rk defined by

WRs (u, v) =< WR (u, ruv (1), v), . . . , WR (u, ruv (k), v) > where, ruv : Nk  a such that WR (u, ruv (1), v)  . . .  WR (u, ruv (k), v) Note that, puv and ruv are not unique. So we choose them such that the order O is preserved. 5 . 2 . 4 . Set Bijections T H AT S O RT T H E V E C T O R - W E I G H T S Here we show that there exists a set bijection whose interpretation is the set of permutations that sort the vectorweights. Let Nk be the set of first k natural numbers. Let DPv { all permutations l : Nk  a | l sorts WP (u, v)} u be defined for each (u, v)  E . So, WPs (u, v) =< WP (u, l(1), v), . . . , WP (u, l(k), v) > and WP (u, l(1), v)  WP (u, l(2), v)  . . .  WP (u, l(k), v) . Clearly, Nk can be j y u partitioned into UNv = {N1 , N2 , . . . , Nk } such that, t  Nk , k k k WP (u, l(t), v) has the same value for each y = 1, 2, . . . , j y y+1 and if t  Nk and t  Nk then WP (u, l(t), v) <

1: Given M = S, A, , P, R 2 : Let SOLN be an empty set 3: Construct the pseudograph GM =< a , V, E, WP , WR > as defined in Section 5.2 4: Construct the vector-weighted digraph V W GM =< V, E , WP , WR > as defined in Section 5.2.2 5: Construct the sorted vector-weighted digraph SV W GM =< V, E , WPs , WRs > as defined in Section 5.2.3 6: for each (u, v)  E do 7: Compute QPv and QRv by finding the partition of Nk as u u described in Section 5.2.4 P R 8: Quv  Quv  Quv 9: if QPv  QRv does not exist then u u 10: exit 11: end if 12: end for 13: Construct the weighted digraph W GM =< V, E , W > using m as described in Section 5.2.5 14: F  DGEN(W GM ) where F is the set of generators of AutW GM 15: for each f  F do 16: for each (u, v)  E do 1 17: Guv  Q f (u) f (v) Q-v u 18: end for ^ 19: Let H f be an empty set 20: for each u  V do 21: Gu  Guv from some v  V 22: for each v  V do 23: Gu  Gu  Guv 24: end for ^ 25: Add Gu to H f 26: end for ^ 27: Add < f , H f > to SOLN 28: end for


On the Hardness of Finding Symmetries in MDPs

5 . 2 . 6 . C O N S T RU C T I O N The procedure for finding symmetries of an MDP M is given in Algorithm 1. The complexity of the algorithm is as follows. The construction steps in lines 3 to 5, are at most polynomial in |E|. Using a constant access time data structure like a hashtable, QPv and QRv can be constructed in O(|Euv |) time.The u u intersection takes O(|Euv |2 ) time. Since this runs for |E | iterations, computation of Quv is at most polynomial in |E|. Since m is known, the construction of weighted digraph in line 13, is polynomial in |E|. With the use of procedures that return at most |V | automorphisms of AutW GM (Mathon, 1979), the construction of Gu for each f , from lines 15 to 26, runs for at most |V | iterations. The most expensive part of the loop from lines 20 to 26 is the computation of |V |2 intersections. But this is still polynomial in |V ||E| time. Hence the algorithm takes polynomially more time than the solution time of DGEN. Also to extract a solution from SOLN, we need to extract |V | ^ SDAR functions from H f for each f , which takes |Euv | time if we use a constant access time data structure. So extraction of a solution takes O(|V |2 |E|) which is still polynomial in |V ||E|. While one can intuitively see that the reduction is indeed polynomial time, the proof is presented in an associated technical report (Narayanamurthy & Ravindran, 2008), due to lack of space. 5.3. Significance The above result is significant both theoretically and practically. Practically speaking, the reduction to Graph Isomorphism allows us to use any of the numerous off-the-shelf Graph Isomorphism solvers to find symmetries on MDPs. In fact, we use NAutY - No Automorphisms, Yes?, the best Graph Isomorphism solver currently available (Skiena, 1997) to find out symmetries in MDPs. NAutY solves AGEN(G). It uses backtracking and a refinement procedure to find the canonical labeling. If two different labelings lead to the same graph, then an automorphism can be found using these labelings (McKay, 1981). In the worst case it can take exponential time. So it allows the use of a variety of vertex invariants, which act like heuristics, to solve harder problems. However, for random graphs with n vertices and edge probability 0.5, average execution times for large n are about n2 nanosecs.We use NAutY in the fourteenth line in the construction, where we need to solve DGEN(G). We first convert the weighted digraph into an unweighted digraph using standard procedure. We then use NAutY to find the generators of the automorphism group of the so found digraph. From these we extract generators of AutW G as per the above procedure. We present some results in Section 6.

6. Results
The experiments were run on the following two domains. We describe results per domain. 6.1. Probabilistic GridWorld The domain is an N × N GridWorld with four probabilistic actions of going UP, DOWN, RIGHT and LEFT having a 90% success probability. The initial state was (0,0) and the goal states were {(0, N - 1), (N - 1, 0)}. We used Algorithm 1 to find the symmetries with NAutY being used as the DGEN solver. We then used the symmetries to find the partition of . We were able to find the partition corresponding to the symmetry group, that is, for a grid of size M × N, states (x,y), (y,x), (M-1-x,N-1-y) and (N-1,M-1-x) are equivalent. We present the time taken by the algorithm for GridWorlds of different sizes.
Value Iteration with Explicit Model Minimization on the Probabilistic GridWorld
45 No Reduction Without 2-reduction time Without 4-reduction time 35 With 2-reduction time With 4-reduction time 30
Time in sec

40

Nauty + 4-reduction time

25

20

15

10

5

0 20

22

24

26

28 30 Size of the Gridworld

32

34

36

38

Figure 1. Average running times of the value iteration algorithm with explicit model minimization on Probabilistic GridWorld vs size of the GridWorld. Each of the 3 sets should be compared with the graph for no reduction. Curves in a set represent different degrees of symmetry. Each set shows the time reduction with reduced model usage. First one discounts the time taken to find symmetries and for reduction. The next set includes the time for reduction but discounts time taken to find symmetries. The last one includes both the time taken to find symmetries using NAutY and time for reduction.

To complete the end-to-end approach, we ran the Greduced image algorithm, presented in (Ravindran, 2004), to find the reduced image and ran the Value Iteration algorithm on the reduced image. To show the efficiency of reduction, we show the time taken for reduction and solution separately. We also present the case of a handcrafted 2folded symmetry which is used with the G-reduced image algorithm and reduced model is used with Value Iteration. From Figure 1 it is evident that the reduced model con-


On the Hardness of Finding Symmetries in MDPs

struction is efficient and adds little overhead. However, the results of the end-to-end approach show a significant overhead due to symmetry finding. It cuts the saving by almost half. Still the results are significant because they double the size of the largest GridWorld that can be solved in some given time. 6.2. GridWorld Soccer The domain is a soccer-inspired grid domain. It is a slightly modified version of that described in (Bowling, 2003). We first describe the original version of the domain and then state the modification. It is an M × N grid with two agents. One is denoted the attacker (A) who holds the ball and the other as the defender (B) who tries to snatch the ball from the attacker. The center lines/grids(depending on whether M and N are even or odd) for both x-axis and y-axis are chosen naturally. The state is defined by the non-identical positions of the attacker and the defender. This defines the state space with (MN)2 - (MN) states. The actions are movements in the four compass directions: N, E, W, S and the hold action H. It is a single player game, in that, only the attacker chooses actions deliberately while the defender executes random actions. The action chosen by the attacker and the random action of the defender are executed in random order, which determines the next state. However if the defender tries to move into the attacker's location then the state is unchanged and if the attacker tries to move into the defender's location, the game is reset to the initial state which is shown in Figure 2. The right hand section of the grid is the attacker's half and the left hand section that of the defender. The goal is chosen to be situated beyond the first column of grids occupying one grid on each side of the y-axis central line/grid. A W action from the squares in front of the goal state leads to a goal with a reward of 1 and to the end of an episode. Everywhere else the reward is 0. A 5 × 4 domain is shown in Figure 2. Intuitively, the domain seems symmetric around the y-axis center line. However, the results of using Algorithm 1 on this domain showed us that the domain is not symmetric due to the existence of the reset action when the attacker tries to move into the defender's position. So we modified the domain to have symmetric reset, that is, reset happens to the initial state and its symmetric state around the y-axis center line with equal probability. This makes the domain symmetric as per intuition, which the algorithm confirms. Interestingly, the algorithm also finds that the existence of the hold action adds further symmetry. The grids along the border of the domain act as walls. For example, the northern wall stops the N action leaving the state unchanged which is the same result if the agent were to execute a H action. These additional symmetries which we did not think

A

WEST

GOAL

B

A

GOAL

A

B

0

.7

.1

0 .1
B

0.1

G OAL

GOAL

G OAL

B

A

0

A

B

Figure 2. Single Player grid soccer where agent B selects it actions randomly. The initial state is shown on the left and an example of transitions and associated probabilities are given for a particular state and action on the right. Notice that fifty percent of the time A's actions are executed first causing it to lose the ball and the game reset to the initial state. In addition, if B selects H or E it does not move and so A still loses the ball and returns to the initial state. The other outcomes are equiprobable.

of before running algorithm were found by the algorithm. This suggests that there might exist complicated symmetries that will be discovered by the algorithm, which are hard to find, even upon a close examination. Also in many cases, symmetries are size invariant. So we can use the algorithm on a relatively smaller version of the domain and find symmetries which might still hold on the larger version. We present the time taken by the algorithm for different sizes. An increment of one here means an increase of one along both axes. The presence of two agents, blows up the state space very rapidly and we hit the limit on the order of the graph imposed by NAutY very soon (for a 11 × 10 grid).To present similar graphs as in the probabilistic GridWorld case, we use the explicit model minimization approach with Value Iteration. The results are presented in Figure 3. In this case, we find that the overheads due to the construction and the G-reduced image algorithm is negligible. Though efficiency of the G-reduced image algorithm is expected, the efficiency of the construction can be possibly because of the structure of the domain yielding an easy graph to find automorphisms on.

7. Conclusion
In this work, we have provided a constructive proof for the Isomorphism Completeness of the problem of finding symmetries. We have also proposed the use of this constructive proof along with an efficient minimization algorithm to solve an MDP using symmetries and demonstrated it empirically. As part of future work, we are looking at adapting approximation algorithms for finding graph isomorphisms to finding approximate symmetries in MDPs.


On the Hardness of Finding Symmetries in MDPs
Value Iteration with Explicit Model Minimization on the GridWorld Soccer domain
12000 No Reduction Without 2-reduction time With 2-reduction time Nauty + 2-reduction time

Givan, R., Dean, T., & Greig, M. (2003). Equivalence notions and model minimization in markov decision processes. Artificial Intelligence, 147, 163­223. Hartmanis, J. (1966). Algebraic structure theory of sequential machines (prentice-hall international series in applied mathematics). Upper Saddle River, NJ, USA: Prentice-Hall, Inc. Lee, D., & Yannakakis, M. (1992). Online minimization of transition systems (extended abstract). STOC '92: Proceedings of the twenty-fourth annual ACM symposium on Theory of computing (pp. 264­274). New York, NY, USA: ACM Press. Manning, J. B. (1990). Geometric symmetry in graphs. Doctoral dissertation, Purdue University. Mathon, R. (1979). A note on the graph isomorphism counting problem. Information Processing Letters, 8, 131­132. McKay, B. D. (1981). Practical graph isomorphism. Congressus Numerantium, 30, 45­87. Miller, G. L. (1977). Graph isomorphism, general remarks. STOC '77: Proceedings of the ninth annual ACM symposium on Theory of computing (pp. 143­150). New York, NY, USA: ACM Press. Narayanamurthy, S. M., & Ravindran, B. (2008). On the hardness of finding symmetries in markov decision processes (Technical Report IITMCSETR-01-08). Indian Institute of Technology, Madras. Ravindran, B. (2004). An algebraic approach to abstraction in reinforcement learning. Doctoral dissertation, Department of Computer Science, University of Massachusetts Amherst. Read, R. C., & Corneil, D. G. (1977). The graph isomorphism disease. Journal of Graph Theory I, 339­363. Skiena, S. (1997). The stony brook algorithm repository. Zinkevich, M., & Balch, T. (2001). Symmetry in markov decision processes and its implications for single agent and multiagent learning. Proceedings of the ICML-01 (pp. 632­640). Morgan Kaufmann.

10000

8000
Time in milli mins

6000

4000

2000

0 1

1.5

2

2.5 Size of the Gridworld

3

3.5

4

Figure 3. Average running times of the value iteration algorithm with explicit model minimization on GridWorld Soccer domain vs size of the domain. Size of one represents the 5 × 4 grid. Thereafter an increment of one means an increment of one along both axes. Each graph should be compared with the graph for no reduction. The other graphs show the time reduction with reduced model usage. First one discounts the time taken to find symmetries and for reduction. The next one includes the time for reduction but discounts time taken to find symmetries. The last one includes both the time taken to find symmetries using NAutY and time for reduction.

Acknowledgements
We would like to thank the reviewers for their valuable comments and inputs. We would also like to thank Google Research for supporting Dr. Ravindran's participation in the conference.

References
Amarel, S. (1968). On representations of problems of reasoning about actions. In D. Michie (Ed.), Machine intelligence 3, vol. 3, 131­171. Amsterdam, London, New York: Elsevier/North-Holland. Amarel, S. Booth, K. S., & Colbourn, C. J. (1977). Problems polynomially equivalent to graph isomorphism (Technical Report). University of Waterloo. Bowling, M. (2003). Multiagent learning in the presence of agents with limitations. Doctoral dissertation, Carnegie Mellon University. Crawford, J. (1992). A theoretical analysis of reasoning by symmetry in first-order logic. Dean, T., & Givan, R. (1997). Model minimization in markov decision processes. AAAI/IAAI (pp. 106­111). Flener, P., Frisch, A., Hnich, B., Kiziltan, Z., Miguel, I., Pearson, J., & Walsh, T. (2002). Breaking row and column symmetries in matrix models.