Sampling First Order Logical Particles

Hannaneh Hajishirzi and Eyal Amir
Department of Computer Science
University of Illinois at Urbana-Champaign
{hajishir, eyal}@uiuc.edu

Abstract

Approximate inference in dynamic systems is the problem of estimating the state of the system given a sequence of actions and partial observations. High-precision estimation is fundamental in many applications such as diagnosis, natural language processing, tracking, planning, and robotics. In this paper we present an algorithm that samples possible deterministic executions of a probabilistic action sequence. The algorithm takes advantage of a compact representation (using first order logic) of actions and world states to improve the precision of its estimation. Theoretical and empirical results show that the algorithm's expected error is smaller than that of propositional sampling and Sequential Monte Carlo (SMC) sampling techniques.

1 Introduction

Important AI applications such as natural language processing, program verification, tracking, and robotics involve stochastic dynamic systems. The current state of such a system changes as actions are executed. Reasoning is the task of computing the posterior probability over the state of such a dynamic system given past actions and observations. Reasoning is difficult because the system's exact initial state and the effects of its actions are uncertain (e.g., there may be noise in the system, or its actions may fail). Exact reasoning (e.g., [11; 1]) is not tractable for long sequences of actions in complex systems, because domain features become correlated after a few steps even if the domain has much conditional-independence structure [4]. Therefore, approximate reasoning is of much interest.

One of the most commonly used classes of techniques for approximate reasoning is SMC sampling [5]. These methods are efficient, but they require many samples (exponential in the dimensionality of the domain) to yield a low error rate. Recently, [8] introduced a sampling approach that achieves higher precision than SMC techniques for a fixed number of samples. Still, it requires a large number of samples in complex domains because it represents domains in propositional logic.

In this paper we present a sampling-based filtering algorithm for a first order dynamic system called a Probabilistic Relational Action Model (PRAM). We show that our new algorithm takes fewer samples and yields better accuracy than previous sampling techniques. This improvement is possible because of the underlying deterministic structure of the transition system, the compact representation of domains in first order logic (FOL), and efficient subroutines for first order logical regression (e.g., [17]) and first order logical filtering [19; 14].

We model a PRAM (Section 2) using probabilistic situation calculus [17], extended with a first order probabilistic prior that combines FOL and probabilities in a single framework (e.g., [18; 3; 12; 15; 7; 20]). Transitions in a PRAM are modeled with probability distributions over possible deterministic executions of probabilistic actions (every transition model can be represented this way).

Our algorithm (Section 3) first samples sequences of deterministic actions, called first order (FO) particles, that are possible executions of the given probabilistic action sequence and are consistent with the observations. It also updates the current state of the system given each FO particle. Next, the algorithm computes the probability of the query given the updated current state. To do so, it applies logical regression to the query for each FO particle, computes a FOL formula that represents all the possible initial states, and computes the prior probability of that formula. Finally, the algorithm computes the posterior probability as the weighted sum of these derived prior probabilities.

This algorithm achieves superior precision with fewer samples than SMC sampling techniques [5]. The intuition behind this improvement is that each FO particle corresponds to exponentially many state sequences (particles) generated by earlier techniques. The algorithm is computationally efficient when FOL regression and filtering with the deterministic actions are efficient. We prove our claim about precision and verify the results empirically in several experiments (Section 4).

Our representation for PRAM differs from Dynamic Bayesian Networks (DBNs) (e.g., [13]). DBNs emphasize conditional independence among random variables; in contrast, our model represents the transition model as a distribution over deterministic actions. PRAM also uses a different model for observations: observations are received asynchronously, without a prediction of what will be observed. Both frameworks are universal and can represent each other, but each is more compact and natural in different scenarios.

The closest work to ours is [8], which also samples deterministic executions of the given probabilistic sequence. However, that work assumes that the deterministic sequences are always executable. In this paper we relax this assumption and avoid sampling sequences that are not executable by keeping track of the current state of the system. Also, our representation is more compact (it uses FOL), so it achieves higher precision with fewer samples. Recently, [10; 22] explored reasoning algorithms in relational HMMs. They use a different representation than ours for observations, and they do not use a compact representation for the prior distribution. The former algorithm trades efficiency of computation for precision, as it is an exact method. The latter introduces a sampling algorithm in which each sample represents a set of states and is generated from a probability distribution over disjoint sets of states; to update this distribution, new disjoint sets must be built at each time step, which leads to a large number of disjoint sets (exponential in the domain features) for long sequences. Our algorithm is different in that we do not sample states; we sample deterministic sequences, each of which corresponds to the set of state sequences derived by regressing the query through that deterministic sequence.
2 Probabilistic Relational Action Models

In this section we present our framework, the Probabilistic Relational Action Model (PRAM), for representing the dynamic system. The system is dynamic in the sense that its state changes as actions are executed. Actions have probabilistic effects that are represented with a probability distribution over possible deterministic executions. A PRAM consists of two main parts: (1) prior knowledge representing a probability distribution $P^0$ over initial world states in a relational model, and (2) a probability distribution $PA$ over deterministic executions of probabilistic actions. In what follows we define the basic building blocks of a PRAM. We first define the language $\mathcal{L}$ of a PRAM for representing the probability distributions:

Definition 1 (PRAM language). The language $\mathcal{L}$ of a PRAM is a tuple $(F, C, V, A, DA)$ consisting of:
- $F$, a finite set of predicate variables (called fluents) whose values change over time;
- $C$, a finite set of constants representing objects in the domain;
- $V$, a finite set of variables;
- $A$ and $DA$, finite sets of probabilistic and deterministic action names, respectively.

We define a fluent atom as a formula of the form $f(x_1, \ldots, x_k)$ (also written $f(\vec{x})$), where $x_1, \ldots, x_k \in V \cup C$ are either variables or constants. In a PRAM, grounding a fluent $f(\vec{x})$ means replacing each variable in $\vec{x}$ with a constant $c \in C$. Accordingly, a world state $s \in S$ is defined as a full assignment of $\{true, false\}$ to all the groundings of all the fluents in $F$.

Example 1 (Briefcase). Briefcase is a domain consisting of objects, locations, and a briefcase $B$. The variables $?o$ and $?l$ denote objects and locations, respectively. The fluents are In(?o), At(B, ?l), and At(?o, ?l). An agent interacts with the system by executing probabilistic actions: putting objects into the briefcase, PutIn(?o, B); taking objects out of the briefcase, TakeOut(?o, B); and moving the briefcase with the objects inside it, Move(B, ?l1, ?l2). Some deterministic actions in the domain are MvWithObj, PutInSucc, and PutInFail.

In a PRAM each deterministic action $da(\vec{x}) \in DA$ is specified by precondition and successor state axioms [17].

Definition 2 (Deterministic action axioms). Precondition axioms state the conditions under which the deterministic action $da$ is executable in a given state; Precond is a special predicate denoting the executability of a deterministic action:
$$\mathit{Precond}_{da}(\vec{x}) \equiv \Phi(\vec{x})$$
where $\Phi$ is a FOL formula. Successor state axioms enumerate all the ways in which the value of a particular fluent can change; a successor state axiom for a fluent $f$ has the form
$$\mathit{Precond}^t_{da}(\vec{x}) \Rightarrow \left(\mathit{Succ}^t_{f,da}(\vec{x}) \equiv f^{t+1}(\vec{x})\right)$$
where $\mathit{Succ}^t_{f,da}(\vec{x})$ is a FOL formula at time $t$. The effects of actions are easily derived from these axioms. For example, the precondition of the deterministic action MvWithObj(B, ?l1, ?l2) is At(B, ?l1) ∧ ¬At(B, ?l2), and its effect is ¬At(B, ?l1) ∧ At(B, ?l2) ∧ (∀?o In(?o) ⇒ At(?o, ?l2)).

The grounding of a deterministic action is defined as the grounding of all the fluent atoms appearing in the precondition and effect formulas $\mathit{Precond}^t_{da}(\vec{x})$ and $\mathit{Succ}^t_{f,da}(\vec{x})$. A grounded deterministic action is therefore a transition function $T : S \times DA \to S$. One can see from the example that grounding the action MvWithObj requires enumerating all possible combinations of objects inside the briefcase; this increases the dimensionality of the domain and, with it, the number of samples needed to achieve low error. Hereafter, for simplicity, we write fluents and actions without their arguments whenever the variables or domain objects need not be mentioned.
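To make Definition 2 concrete, the following minimal sketch encodes the briefcase axioms for MvWithObj, assuming a grounded world state is represented as the set of fluent atoms that hold. The function names (precond_mv_with_obj, apply_mv_with_obj) and the set-of-tuples encoding are ours, for illustration only.

```python
# A minimal sketch of Definition 2 for the briefcase domain of Example 1.
# A grounded state is the set of fluent atoms that are true in it.

def precond_mv_with_obj(state, l1, l2):
    # Precond_MvWithObj(B, l1, l2) = At(B, l1) and not At(B, l2)
    return ("At", "B", l1) in state and ("At", "B", l2) not in state

def apply_mv_with_obj(state, l1, l2):
    # Effect: not At(B, l1), At(B, l2), and every object inside the
    # briefcase moves with it: forall o, In(o) => At(o, l2).
    assert precond_mv_with_obj(state, l1, l2)
    new_state = set(state)
    new_state.discard(("At", "B", l1))
    new_state.add(("At", "B", l2))
    for atom in state:
        if atom[0] == "In":          # In(o) holds, so o moves too
            o = atom[1]
            new_state.discard(("At", o, l1))
            new_state.add(("At", o, l2))
    return new_state

s0 = {("In", "O"), ("At", "O", "L1"), ("At", "B", "L1")}
print(apply_mv_with_obj(s0, "L1", "L2"))
# {('In', 'O'), ('At', 'B', 'L2'), ('At', 'O', 'L2')}
```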
Definition 3 (Probability distribution for probabilistic actions). Let $\psi_1, \ldots, \psi_k$ be FOL formulas (called partitions) that divide the world states into mutually disjoint sets. Then $PA(da \mid a, s)$ is a probability distribution over the possible deterministic executions $da$ of the probabilistic action $a$ in a state $s$ that satisfies one of the partitions $\psi_i$ ($i \leq k$). More formally, when a state $s$ satisfies partition $\psi_i$, then $PA(da \mid a, s) = PA_i(da)$, where $PA_i$ is the probability distribution over the deterministic executions $da$ of action $a$ corresponding to partition $\psi_i$:

$$PA(da \mid a, s) = \begin{cases} PA_1(da) & s \models \psi_1 \\ PA_2(da) & s \models \psi_2 \\ \quad\vdots & \end{cases} \qquad (1)$$

We assume that replacing the variables in $a(\vec{x})$ with constants does not change $PA(da(\vec{x}) \mid a(\vec{x}), s)$. For example, for the probabilistic action TakeOut(?o), the deterministic action TakeOutSucc(?o), and $s \models \psi_1 = $ In(?o), we have $PA(da \mid a, s) = PA_1(\mathit{TakeOutSucc}(?o)) = 0.9$.

In a PRAM we assume a prior distribution over world states at time 0. Models like [18] can represent probabilities in relational settings. A knowledge engineer may define the semantics of the prior distribution using any of these models; we then use the inference algorithm defined by that model's semantics to compute the probability of formulas at time 0. Our filtering algorithm does not depend on how the initial knowledge is represented. We discuss this further in Section 3.1.3. In conclusion, we define a PRAM formally as follows:

Definition 4. A PRAM is a tuple $(\mathcal{L}, AX, PA, P^0)$ consisting of:
- a language $\mathcal{L} = (F, C, V, A, DA)$ (Definition 1);
- a set of deterministic action axioms $AX$ (Definition 2);
- a probability distribution $PA$ for each probabilistic action (Definition 3);
- a prior distribution $P^0$ over initial world states.

Given this dynamic system, PRAM, we define our stochastic filtering problem as computing the probability of a query given a sequence of probabilistic actions and observations. The query is a FOL formula $\varphi^T$ at the final time step $T$. Note that throughout the paper superscripts denote time. Each $a^t$ in the probabilistic action sequence $a^1, \ldots, a^T$ is the probabilistic action executed at time $t$. The observations $o^0, \ldots, o^T$ arrive asynchronously in time, without a prediction of what will be observed (thus, unlike in HMMs [16], no sensor model is given). Each observation $o^t$ is a FOL formula over fluents: when $o^t$ is observed at time $t$, the formula $o^t$ is true of the state of the world at time $t$. We assume that a deterministic execution $da$ of the probabilistic action $a^t$ in the sequence does not depend on future observations.
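Reaching back to Definition 3 and Equation (1), the following sketch shows how a state-dependent distribution over deterministic executions can be encoded and sampled. The 0.9 success probability follows the paper's TakeOut example; the complementary 0.1, the second partition, and all function names are our illustrative assumptions.

```python
# A sketch of Definition 3 / Equation (1), using the same set-of-atoms
# state encoding as the earlier MvWithObj sketch.
import random

def pa_take_out(state, o):
    """PA(da | TakeOut(o), s): distribution over deterministic executions."""
    # Partition psi_1 = In(o): the object is actually in the briefcase.
    if ("In", o) in state:
        return {("TakeOutSucc", o): 0.9, ("TakeOutFail", o): 0.1}
    # Complementary partition psi_2 = not In(o): taking out must fail.
    return {("TakeOutFail", o): 1.0}

def sample_execution(pa):
    das, probs = zip(*pa.items())
    return random.choices(das, weights=probs)[0]

s = {("In", "O"), ("At", "B", "L1")}
print(sample_execution(pa_take_out(s, "O")))   # TakeOutSucc w.p. 0.9
```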
3 Filtering Algorithm

In this section we present our filtering algorithm for computing the probability of a query $\varphi^T$ given a sequence of probabilistic actions $a^{1:T} = a^1, \ldots, a^T$ and observations $o^{0:T} = o^0, \ldots, o^T$ in a PRAM. The algorithm approximates this probability by generating samples from the possible deterministic executions of the given probabilistic action sequence, substituting those samples for the full enumeration of deterministic executions, and marginalizing over them. The exact computation is

$$P(\varphi^T \mid a^{1:T}, o^{0:T}) = \sum_i P(\varphi^T \mid DA_i, o^{0:T})\, P(DA_i \mid a^{1:T}, o^{0:T}) \qquad (2)$$

where $DA_i$ is a possible execution of the sequence $a^{1:T}$.

The first step of our approximate algorithm is to generate $N$ samples (called FO particles) from the possible deterministic executions of the given probabilistic sequence. Two different sampling algorithms are introduced in Section 3.2. The algorithms (illustrated in Figure 1) generate a weighted FO particle $DA_i$ with weight $w_i$, given the sequence $a^{1:T}$ and the observations $o^{0:T}$, from the probability distribution $P(DA_i \mid a^{1:T}, o^{0:T})$.

Figure 1: Sampling the FO particle $da^1_2, da^2_2, da^3_2$ (straight arrows) given the probabilistic sequence $a^1, a^2, a^3$ (curved arrows) and observations $o^0, o^1, o^2, o^3$. Each deterministic action $da^t$ is sampled from the distribution $P(da^t \mid a^t, da^{1:t-1}, o^{0:t})$. The thickness of an arrow represents the weight of the current FO particle.

The next step of the algorithm is to compute $P(\varphi^T \mid DA_i, o^{0:T})$ for each FO particle $DA_i$. It does so (Section 3.1) by updating the current state of the system and then computing the posterior probability conditioned on the updated current state. Finally, the algorithm substitutes the generated samples for $DA_i$ in Equation (2) and computes $\tilde{P}_N(\varphi^T \mid a^{1:T}, o^{0:T})$, an approximation of the posterior probability of the query $\varphi^T$ given the sequence $a^{1:T}$ and the observations $o^{0:T}$, by Monte Carlo integration [5]:

$$\tilde{P}_N(\varphi^T \mid a^{1:T}, o^{0:T}) = \sum_i w_i\, P(\varphi^T \mid DA_i, o^{0:T}) \qquad (3)$$

The details of each step of our first order filtering algorithm (FOFA, Figure 2) are explained next. We first present the step of computing $P(\varphi^T \mid DA_i, o^{0:T})$, because it is used as a subroutine by the sampling algorithms.

PROCEDURE FOFA(SampleAlg, φ^T, a^{1:T}, o^{0:T})
Input: sampling algorithm SampleAlg, probabilistic sequence a^{1:T}, observations o^{0:T}, and query φ^T
Output: P̃(φ^T | a^{1:T}, o^{0:T})
1. (DA_{1:N}, curF_{1:N}) ← SampleAlg(a^{1:T}, o^{0:T})
2. for each DA_i compute PFOF(φ^T, DA_i, curF_i)
3. return P̃(φ^T | a^{1:T}, o^{0:T}) using Equation (6)

Figure 2: FOFA: First Order Filtering Algorithm for approximating P(φ^t | a^{1:t}, o^{0:t}) given a sampling algorithm SampleAlg (either S/R-Actions, Figure 5, or S-Actions, Figure 6). Italics denote subroutines.

PROCEDURE PFOF(φ^t, DA, curF)
Input: FO particle DA and the current state formula curF
Output: P(φ^t | DA, o^{0:t})
1. φ^0 ← RegSeq(φ^t, DA)
2. curF^0 ← RegSeq(curF, DA)
3. p^0 ← Prior-FOF(φ^0 | curF^0)
4. return P(φ^t | DA, o^{0:t}) ← p^0

PROCEDURE Prior-FOF(φ^0, M_MLN)
Input: formula φ^0, Markov network M_MLN of the prior MLN
Output: P(φ^0)
1. C_i ← ConvertToCNF(φ^0)
2. define indicator functions as in Equation (4)
3. M ← ConstructNetwork(M_MLN, fluents of φ^0)
4. return P(φ^0) by performing inference on M

Figure 3: PFOF: Computing the probability P(φ^t | DA, o^{0:t}) of a FOL formula φ^t given a FO particle DA and the current state formula.

3.1 Probability of a FOL Formula at Time t

In this section we show how to compute the probability $P(\varphi^t \mid DA, o^{0:t})$ of a formula $\varphi^t$ given a FO particle DA and observations $o^{0:t}$. Executing the FO particle DA under the observations $o^{0:t}$ updates the current state of the system; the current state is derived by applying a FO progression subroutine (described below) at each time step. Afterwards, procedure PFOF (Figure 3) computes the probability of the query given the current state of the system for each FO particle. Its first step applies a FO regression subroutine to the query and to the current state formula, returning FOL formulas at time 0; this is possible because the actions are deterministic. The algorithm's second step computes the prior probability of the regressed query conditioned on the current state formula regressed through the FO particle. Recall that a FO particle is a sampled sequence of deterministic actions.
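The top-level structure of FOFA (Figure 2) and the Monte Carlo estimate of Equation (3) can be sketched as below. The sampler and PFOF are left abstract: sample_fo_particle stands for a sampling algorithm of Section 3.2 and pfof for procedure PFOF of Section 3.1; both names and the interface are our assumptions.

```python
# A high-level sketch of FOFA (Figure 2) computing Equation (3).

def fofa(sample_fo_particle, pfof, query, actions, observations, n_samples):
    """Approximate P(query | a^{1:T}, o^{0:T}) as in Equation (3)."""
    estimate = 0.0
    for _ in range(n_samples):
        # One weighted FO particle: a deterministic execution DA_i of the
        # probabilistic sequence, its weight w_i, and the current state
        # formula curF_i maintained during sampling.
        da_seq, weight, cur_f = sample_fo_particle(actions, observations)
        # P(query | DA_i, o^{0:T}) via regression to time 0 and the prior.
        estimate += weight * pfof(query, da_seq, cur_f)
    return estimate
```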
3.1.1 Progressing the Current State Formula

We define the current state formula as a FOL formula representing the set of states that are possible after executing a sequence of deterministic actions and receiving observations. In this section we present the algorithm Progress, which updates the current state formula given a deterministic action and an observation. In general, progressing a FOL formula $\varphi$ with a deterministic action $da$, Progress($\varphi$, $da$), results in a set of FOL formulas $\varphi'(p_{1:n})$, where $p_1, \ldots, p_n$ are atomic subformulas of $\varphi'$ and $\mathit{Precond}_{da} \wedge \varphi \models \varphi'(\mathit{Succ}_{p_1,da}, \ldots, \mathit{Succ}_{p_n,da})$ (see [19] for more details). Filtering a FOL formula with an observation $o$ yields the FOL formula $\varphi \wedge o$. In general, generating all such formulas $\varphi'$ is impossible because there are infinitely many of them. In this paper we assume that the progression of the current state formula curF with a deterministic action $da$ and observation $o$, Progress(curF, $da$, $o$), is representable as a FOL formula; it is then equal to $\bigwedge_i \varphi'_i(p_{1:n}) \wedge o$. Furthermore, [19; 14] provide special conditions under which the progression algorithm is polynomial and the representation of the progressed formula is compact. For example, progressing In(O) through the deterministic action MvWithObj(L1, L2) results in the formula At(B, L2) ∧ In(O) ∧ At(O, L2) ∧ (∀?o In(?o) ⇒ At(?o, L2)).

Figure 4: (a) Regressing the query φ^1 and curF through MvWithObj(L1, L2). Note that L1, L2, and O are constants, and there are only two possible locations L1 and L2, i.e., At(O, L2) ≡ ¬At(O, L1). (b) Computing P(φ^1 | da^1, o^{0:1}) using Lemma 1.

3.1.2 Regressing a FOL Formula

Procedure RegSeq takes a FOL formula $\varphi^t$ and a FO particle DA and returns another FOL formula $\varphi^0$. $\varphi^0$ represents the set of possible initial states, given that the final state satisfies $\varphi^t$ and the FO particle DA occurs. Thus, every state that satisfies $\varphi^0$ leads to a state satisfying $\varphi^t$ after DA occurs. For a deterministic action $da^t$ and a FOL formula $\varphi^t$, the regression Regress($\varphi^t$, $da^t$) of $\varphi^t$ through $da^t$ is a FOL formula $\varphi^{t-1}$ such that a state $s^{t-1}$ satisfies $\varphi^{t-1}$ iff the result of the transition function $T(s^{t-1}, da^t)$ satisfies $\varphi^t$. The regression RegSeq($\varphi^t$, DA) of $\varphi^t$ through the FO particle DA is computed recursively:

$$\mathrm{RegSeq}(\varphi^t, \langle da^1, \ldots, da^t \rangle) = \mathrm{RegSeq}(\mathrm{Regress}(\varphi^t, da^t), \langle da^1, \ldots, da^{t-1} \rangle).$$

Regression of the current state formula curF is defined in the same way, since the current state is also represented by a first order formula.

The algorithm for the regression Regress($\varphi^t$, $da$) of a formula $\varphi^t$ through a deterministic action $da$ works as follows (see [17]). Suppose $\varphi^t$ is an atomic fluent $f^t(\vec{x})$ with successor state axiom $\mathit{Succ}^{t-1}_{f,da}(\vec{x})$. Then the regression of $\varphi^t$ is derived by replacing $f^t(\vec{x})$ with $\mathit{Succ}^{t-1}_{f,da}(\vec{x})$. The regression of non-atomic formulas is derived inductively:
- Regress(¬φ^t) = ¬Regress(φ^t)
- Regress(φ^t_1 ∧ φ^t_2) = Regress(φ^t_1) ∧ Regress(φ^t_2)
- Regress((∀x φ)^t) = ∀x Regress(φ^t)

The efficiency of regression derives from the fact that we regress non-grounded fluents $f(\vec{x})$. In addition, one can apply approximations at each step (such as removing unnecessary clauses) to maintain the compactness of the regressed formulas.
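The inductive rules above can be implemented by a small recursive substitution over formula syntax trees. The sketch below assumes formulas are encoded as nested tuples and successor state axioms as a mapping from a fluent atom to the formula $\mathit{Succ}_{f,da}$; this encoding and the helper regress() are ours, for illustration only.

```python
# A minimal sketch of Regress (Section 3.1.2) over a tiny formula AST.

def regress(phi, succ):
    """Regress formula phi (time t) through a deterministic action whose
    successor state axioms are succ: fluent atom -> formula at time t-1."""
    op = phi[0]
    if op == "not":
        return ("not", regress(phi[1], succ))
    if op in ("and", "or"):
        return (op, regress(phi[1], succ), regress(phi[2], succ))
    if op in ("forall", "exists"):      # (quantifier, variable, body)
        return (op, phi[1], regress(phi[2], succ))
    # Atomic fluent: replace f^t(x) by Succ_{f,da}(x); fluents unaffected
    # by the action regress to themselves.
    return succ.get(phi, phi)

# Regressing At(O, L2) through MvWithObj(L1, L2): O is at L2 afterwards
# iff it was already there, or it was inside the briefcase being moved.
succ_mv = {("At", "O", "L2"): ("or", ("At", "O", "L2"),
                               ("and", ("In", "O"), ("At", "B", "L1")))}
print(regress(("At", "O", "L2"), succ_mv))
```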
We now summarize how to compute the probability $P(\varphi^t \mid DA, o^{0:t})$ by applying progression and regression. The algorithm first computes curF by progressing through the FO particle and the observations. Then it computes $\varphi^0$ and curF$^0$ by regressing $\varphi^t$ and curF. Finally, it evaluates $P(\varphi^t \mid DA, o^{0:t})$ as $P^0(\varphi^0 \mid \mathrm{curF}^0)$, as shown by the following lemma.

Lemma 1. Let $\varphi^t$ be the query and $o^{0:t}$ the observations. If curF is the current state formula, $\varphi^0 = \mathrm{RegSeq}(\varphi^t, DA)$, and $\mathrm{curF}^0 = \mathrm{RegSeq}(\mathrm{curF}, DA)$, then $P(\varphi^t \mid DA, o^{0:t}) = P^0(\varphi^0 \mid \mathrm{curF}^0)$.

Proof. The probability of a formula is a marginalization over states: $P(\varphi^t, o^{0:t} \mid DA) = \sum_s P(\varphi^t, \mathrm{curF} \mid s, DA)\, P(s \mid DA)$. For each $s$, $P(\varphi^t, \mathrm{curF} \mid s, DA) = 1$ if $s \models \varphi^0 \wedge \mathrm{curF}^0$, and 0 otherwise. The reason is that executing the deterministic sequence DA in state $s$ leads to a state at time $t$ that models $\varphi^t$ and is consistent with the observations $o^{0:t}$ iff $s \models \varphi^0 \wedge \mathrm{curF}^0$. Therefore, $P(\varphi^t, o^{0:t} \mid DA) = \sum_{s \models \varphi^0 \wedge \mathrm{curF}^0} P(s) = P^0(\varphi^0, \mathrm{curF}^0)$. The same computation applies to $P(o^{0:t} \mid DA)$. Therefore,
$$P(\varphi^t \mid DA, o^{0:t}) = \frac{P(\varphi^t, o^{0:t} \mid DA)}{P(o^{0:t} \mid DA)} = \frac{P^0(\varphi^0, \mathrm{curF}^0)}{P^0(\mathrm{curF}^0)} = P^0(\varphi^0 \mid \mathrm{curF}^0).$$

Figure 4 shows an example of computing $P(\varphi^t \mid DA, o^{0:t})$ using procedure PFOF. The next section shows how to compute $P^0(\varphi^0 \mid \mathrm{curF}^0)$ using the prior $P^0$. Note that $\varphi^0$ and curF$^0$ are FOL formulas over fluents at time 0.

3.1.3 Prior Probability of a FOL Formula

In this section we describe procedure Prior-FOF for computing the probability of a FOL formula at time 0. We assume that the prior distribution $P^0$ over world states of a PRAM with language $\mathcal{L} = (F, C, V, A, DA)$ is represented by a Markov Logic Network (MLN) [18]. We choose an MLN (called the prior MLN) to represent our prior probabilistic logic because it retains the expressive power of FOL for a fixed domain. Our prior MLN consists of a set of weighted FOL formulas over the fluents in the language $\mathcal{L}$ of the PRAM. The semantics of the MLN is that of a Markov network $M_{\mathrm{MLN}}$ [9] whose cliques correspond to the groundings of the formulas given the universe of objects $C \in \mathcal{L}$; the potential of a clique is the exponential of the weight of the corresponding formula when the grounding is true.

We introduce Prior-FOF (Figure 3) to compute the probability of a FOL formula $\varphi^0$ by slightly modifying the inference algorithm for MLNs. Procedure Prior-FOF first converts $\varphi^0$ to clausal form; note that, as in MLNs, existentially quantified formulas are replaced by the disjunction of their groundings. Then Prior-FOF assigns an indicator function $I_{g(C_i)}$ to every grounding $g$ of each clause $C_i$:

$$I_{g(C_i)}(f_{g(C_i)}) = \begin{cases} 1 & f_{g(C_i)} \models g(C_i) \\ 0 & \text{otherwise} \end{cases} \qquad (4)$$

The next step constructs a Markov network $M$ as the minimal subset of the original network $M_{\mathrm{MLN}}$ required to compute $P^0(\varphi^0)$. The nodes of $M$ are all the ground fluents $f_{g(C_i)}$ appearing in the clauses $C_i$ of $\varphi^0$, together with all the nodes of $M_{\mathrm{MLN}}$ that have a path to some fluent $f_{g(C_i)}$. The final step performs inference on this network: $P^0(\varphi^0)$ can be computed exactly by variable elimination (e.g., [9]) over
$$P^0(\varphi^0) \propto \sum_{f \in M} \prod_{j : Cl_j \in M} \phi_j(f_j) \prod_{i : C_i} I_{C_i}(f_i),$$
where the first product ranges over the cliques $Cl_j$ of $M$ and the second over the clauses $C_i$ of $\varphi^0$. Alternatively, one can employ any approximate inference algorithm, such as Gibbs sampling.

We are not restricted to MLNs as the framework for representing the prior probability distribution. We can use any other probabilistic logical framework ([3; 12; 15; 7; 20]), provided that the inference algorithm of that framework can compute the probability of the query. Note that the query $\varphi^0$ passed to that framework is derived by regressing the original query $\varphi^t$ through an FO particle. Our algorithms would also work for unbounded domains, simply by using a framework (e.g., [21]) that represents prior distributions over infinite sets of states.
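For intuition about the MLN semantics used by Prior-FOF, the following brute-force sketch computes $P^0$ of a ground query over a tiny grounded domain by enumerating worlds. It is not the paper's procedure: real Prior-FOF inference builds only the minimal sub-network $M$ instead of enumerating worlds. The atoms, the single weighted clause, and all names here are illustrative assumptions.

```python
# A brute-force sketch of an MLN-style prior P0 over a tiny ground domain.
from itertools import product
from math import exp

ATOMS = [("In", "O"), ("At", "O", "L1"), ("At", "O", "L2")]
# One weighted formula: In(O) => At(O, L1), with weight 1.5.
MLN = [(lambda w: (not w[("In", "O")]) or w[("At", "O", "L1")], 1.5)]

def prior_prob(query, evidence=lambda w: True):
    """P0(query | evidence) under exponential-family MLN semantics."""
    num = den = 0.0
    for bits in product([False, True], repeat=len(ATOMS)):
        world = dict(zip(ATOMS, bits))
        if not evidence(world):
            continue
        weight = exp(sum(wt for test, wt in MLN if test(world)))
        den += weight
        if query(world):
            num += weight
    return num / den

# P0(At(O, L1) | In(O)) under the toy prior above.
print(prior_prob(lambda w: w[("At", "O", "L1")],
                 evidence=lambda w: w[("In", "O")]))
```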
3.2 Sampling Algorithms

In this section we describe procedures S/R-Actions (Figure 5) and S-Actions (Figure 6), which generate $N$ samples (FO particles) given a sequence of probabilistic actions and observations. Each FO particle is a possible deterministic execution of the given probabilistic sequence. Both algorithms build every FO particle incrementally, sampling one deterministic action at a time. We discuss the deficiencies of the S/R-Actions algorithm at the end of this section.

Both S/R-Actions and S-Actions generate a FO particle $DA = da^1, \ldots, da^T$, given a sequence $a^{1:T}$ and observations $o^{0:T}$, from the distribution $P(DA \mid a^{1:T}, o^{0:T})$. We compute this probability distribution iteratively:

$$P(DA \mid a^{1:T}, o^{0:T}) = P(da^1 \mid a^1, o^{0:1}) \prod_{t} P(da^t \mid a^t, da^{1:t-1}, o^{0:t})$$

The exact value of $P(da^t \mid a^t, da^{1:t-1}, o^{0:t})$ is computed using the PFOF subroutine (Figure 3):

$$P(da^t \mid a^t, da^{1:t-1}, o^{0:t}) = \sum_i PA_i(da^t) \cdot \mathrm{PFOF}(\psi^{t-1}_{i,a^t}, da^{1:t-1}, \mathrm{curF}) \qquad (5)$$

This derivation allows iterative sampling of deterministic actions from the distribution $P(da^t \mid a^t, da^{1:t-1}, o^{0:t})$. Recall that a FO particle is a sequence of deterministic actions. Thus, at each time step the algorithms sample a deterministic action $da^t$ given the probabilistic action $a^t$, the previous deterministic actions $da^{1:t-1}$, and the previous observations $o^{0:t}$; note that the current deterministic action is independent of future observations. The algorithms then update the current state formula curF (Section 3.1.1) given the current deterministic action and the observation.

S/R-Actions and S-Actions differ in how they sample each deterministic action. In S/R-Actions, we adopt a sequential importance sampling approach: at each time step, S/R-Actions samples deterministic actions from a normalized importance function, assigns weights to the particles, and resamples if the weights have high variance. The importance function $\pi(da^t \mid a^t, da^{1:t-1})$ approximates $P(da^t \mid a^t, da^{1:t-1}, o^{0:t})$ by ignoring the effect of the current state curF in the PRAM $(\mathcal{L}, AX, PA, P^0)$:

$$\pi(da^t \mid a^t, da^{1:t-1}) = \sum_i PA_i(da^t) \cdot \mathrm{PFOF}(\psi^{t-1}_{i,a^t}, da^{1:t-1})$$

where $\psi_{i,a^t}$ is the $i$th partition of action $a^t$ (Definition 3). At time step $t$, S/R-Actions samples $N$ deterministic actions $da^t_{1:N}$ from the importance function $\pi(da^t \mid a^t, da^{1:t-1})$. It then extends the $n$th particle $DA_n$ by attaching the deterministic action $da^t_n$, i.e., $DA_n = da^1_n, \ldots, da^{t-1}_n, da^t_n$. The importance weight $w_n$ of the $n$th particle is
$$w_n = \frac{P(da^t_n \mid a^t, da^{1:t-1}_n, o^{0:t})}{\pi(da^t_n \mid a^t, da^{1:t-1}_n)},$$
and the normalized importance weight of the $n$th particle is $\bar{w}_n = w_n / \sum_{i=1}^N w_i$.

PROCEDURE S/R-Actions(a^{1:T}, o^{0:T})
Input: probabilistic sequence a^{1:T} and observations o^{0:T}
Output: N FO particles DA_{1:N}
1. for n ← 1 ... N: curF_n ← o^0
2. for t ← 1 ... T
3.   for n ← 1 ... N
4.     π(da^t_n) ← Σ_i PA_i · PFOF(ψ^{t-1}_i, da^{1:t-1})
5.     P(da^t_n) ← Σ_i PA_i · PFOF(ψ^{t-1}_i, da^{1:t-1}, curF_n)
6.     d̂a^t_n ← a sample from the distribution π
7.     curF_n ← Progress(curF_n, d̂a^t_n, o^t)
8.     w_n ← P(d̂a^t_n) / π(d̂a^t_n)
9.   w̄_1, ..., w̄_N ← Normalize(w_1, ..., w_N)
10.  (da^t_{1:N}, w_{1:N}) ← Resample(d̂a^t_{1:N}, w̄_{1:N})
11. return DA_{1:N} ← da^1_{1:N}, ..., da^T_{1:N}

Figure 5: S/R-Actions: the sampling/resampling algorithm for generating N FO particles given a sequence a^{1:T} and observations o^{0:T}.

PROCEDURE S-Actions(a^{1:T}, o^{0:T})
Input: probabilistic sequence a^{1:T} and observations o^{0:T}
Output: N FO particles DA_{1:N}
1. for n ← 1 ... N: curF_n ← o^0
2. for t ← 1 ... T
3.   for n ← 1 ... N
4.     P(da^t) ← Σ_i PA_i · PFOF(ψ^{t-1}_i, da^{1:t-1}, curF_n)
5.     da^t ← a sample from the distribution P
6.     curF_n ← Progress(curF_n, da^t, o^t)
7. return DA_{1:N} ← da^1_{1:N}, ..., da^T_{1:N}

Figure 6: S-Actions: the direct sampling algorithm for generating N FO particles given a sequence a^{1:T} and observations o^{0:T}.
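One S/R-Actions time step (lines 4 to 8 of Figure 5) can be sketched as below, assuming `importance` and `exact` hold the unnormalized values of the importance function $\pi$ and of $P(da^t \mid a^t, da^{1:t-1}, o^{0:t})$ from Equation (5), supplied as dictionaries over the candidate deterministic executions. The function name and interface are ours.

```python
# A sketch of extending one FO particle in S/R-Actions (Figure 5).
import random

def sr_step(importance, exact, particle, weight):
    """Sample one deterministic action from pi and reweight the particle."""
    das = list(importance)
    z = sum(importance.values())
    probs = [importance[da] / z for da in das]
    da = random.choices(das, weights=probs)[0]
    # Importance weight update: w <- w * P(da) / pi(da); the remaining
    # normalization is done jointly across all N particles afterwards.
    pi_da = importance[da] / z
    weight *= exact[da] / pi_da if pi_da > 0 else 0.0
    return particle + [da], weight
```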
S/R-Actions resamples the particles when the variance of the normalized importance weights becomes too high. The basic idea of resampling is to discard particles with very low weight and concentrate on particles with higher normalized weight. A standard estimate (see [5]) for detecting high variance among the weights is the effective number of particles, $N_{\mathrm{eff}} = 1 / \sum_{i=1}^N \bar{w}_i^2$. If $N_{\mathrm{eff}}$ falls below a threshold, procedure Resample draws new deterministic actions $da^t$ from the distribution given by the normalized weights $\bar{w}_{1:N}$ of the particles, and then assigns equal weights, $1/N$, to all the particles.

The second sampling algorithm, S-Actions (Figure 6), is derived by simplifying the above sampling algorithm. At each time step it generates samples only among executable deterministic actions: it samples every deterministic action in a FO particle from the exact distribution $P(da^t \mid a^t, da^{1:t-1}, o^{0:t-1})$ (Equation 5), instead of sampling from the importance function and assigning weights. Consequently, all the weights are equal to $1/N$.

S/R-Actions samples deterministic actions even when they are not executable and assigns weight zero to those particles, thereby decreasing the effective number of samples. The resampling step does not help much, because it resamples only the latest deterministic action in each FO particle; resampling earlier deterministic actions might yield a more effective set of FO particles, but that kind of resampling is not tractable. Therefore, unless a better resampling procedure is available, it makes more sense to maintain the current state formula curF and sample only executable deterministic actions, as S-Actions does. In the empirical results we examine the effects of using the S/R-Actions and S-Actions sampling algorithms within the FOFA filtering algorithm (Figure 2).
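The resampling test just described reduces to a few lines, assuming the weights passed in are already normalized. This sketch is ours; the $N_{\mathrm{eff}}$ criterion and the reset to equal weights $1/N$ follow the text above.

```python
# A sketch of the S/R-Actions resampling test based on N_eff.
import random

def maybe_resample(particles, weights, threshold):
    """Resample in proportion to the weights when N_eff drops too low."""
    n_eff = 1.0 / sum(w * w for w in weights)
    if n_eff >= threshold:
        return particles, weights
    n = len(particles)
    resampled = random.choices(particles, weights=weights, k=n)
    return resampled, [1.0 / n] * n   # reset to equal weights 1/N
```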
3.3 Correctness, Complexity, and Accuracy

The following theorem shows how FOFA with S-Actions and S/R-Actions computes the approximate posterior distribution $\tilde{P}_N(\varphi^T \mid a^{1:T}, o^{0:T})$.

Theorem 1. Let $\varphi^T$ be the query, $a^{1:T}$ the given probabilistic sequence, $DA_i$ the FO particles, and $o^{0:T}$ the observations. If curF$_i$ is the current state formula for the $i$th FO particle, $\varphi^0_i = \mathrm{RegSeq}(\varphi^T, DA_i)$, and $\mathrm{curF}^0_i = \mathrm{RegSeq}(\mathrm{curF}_i, DA_i)$, then

$$\tilde{P}_N(\varphi^T \mid a^{1:T}, o^{0:T}) = \sum_i w_i\, P^0(\varphi^0_i \mid \mathrm{curF}^0_i) \qquad (6)$$

where $w_i = 1/N$ for S-Actions. Moreover, for S-Actions,

$$\tilde{P}_N(\varphi^T \mid a^{1:T}, o^{0:T}) \xrightarrow{N \to \infty} P(\varphi^T \mid a^{1:T}, o^{0:T}). \qquad (7)$$

The proof follows from applying Monte Carlo integration to Equation (2) and using Lemma 1.

The running time $R_{\mathrm{FOFA}}$ of our filtering algorithm (Figure 2) is $O(N \cdot T \cdot (R_{\mathrm{RegSeq}} + R_{\mathrm{Progress}} + R_{\mathrm{Prior\text{-}FOF}}))$, where $N$ is the number of samples and $T$ is the length of the given sequence of probabilistic actions. The efficiency of FOFA follows from the efficiency of the underlying algorithms RegSeq, Progress, and Prior-FOF.

We evaluate the accuracy of our sampling algorithm FOFA by computing the expected KL-distance,¹ i.e., the expected value of the KL-distances between the exact distribution $P$ and the approximation $\tilde{P}$ derived by FOFA. FOFA has higher accuracy than SMC for a fixed number of samples. The intuition is that each FO particle generated by FOFA covers many particles generated by SMC.

Theorem 2. If FOFA(S-Actions) and SMC approximate the posterior distribution $P(\varphi^T \mid a^{1:T}, o^{0:T})$ with $N$ samples each, then Expected-KL$_{\mathrm{FOFA(S\text{-}Actions)}}$ ≤ Expected-KL$_{\mathrm{SMC}}$.

The proof is similar to the proof of Theorem 3.3 in [8]. Intuitively, we define a mapping $f$ from each set of FO particles of S-Actions to sets of particles of SMC. The mapping $f$ is defined such that it covers all the possible sets of particles of SMC; for two distinct sets of particles $z_i \neq z_j$, $f(z_i) \cap f(z_j) = \emptyset$; and $Pr_{\mathrm{FOFA}}(z) = Pr_{\mathrm{SMC}}(f(z))$. Furthermore, we prove that for every $y \in f(z)$, $KL(P, \tilde{P}^z_1) \leq KL(P, \tilde{P}^y_2)$, where $\tilde{P}_1$ and $\tilde{P}_2$ are the approximations returned by FOFA and SMC, respectively.

¹ $KL(P, \tilde{P}) = \sum_x P_x \log (P_x / \tilde{P}_x)$; $KL(P, \tilde{P}) = 0$ iff $P = \tilde{P}$.

4 Empirical Results

We implemented our algorithm FOFA (Figure 2) with both S/R-Actions (Figure 5) and S-Actions (Figure 6). Our algorithms take advantage of a different structure than that available in DBNs. Hence, we focused on planning-type domains: briefcase and depots, taken from the International Planning Competitions at AIPS-98 and AIPS-02.² We randomly assigned deterministic executions, and a probability distribution over them, to each action. For example, for the action PutIn we considered two executions, PutInSucc and PutInFail, with probabilities 0.9 and 0.1. Note that DBN representations (transitions between states) of these domains are not compact, because the independence assumptions among the state variables are not known.

We compared the accuracy of FOFA(S/R-Actions) and FOFA(S-Actions) with the SCAI [8] and SMC algorithms. Note that we grounded the domains to run SCAI and SMC. We ran each sampling algorithm 50 times with a fixed number of samples and computed the KL-distance between its approximation and the exact posterior; we averaged these KL-distances to approximate the expected KL-distance. SCAI assumes that deterministic actions are always executable. We enforce this assumption in the depots domain, where we compare against SCAI; in the briefcase domain, we compare our algorithms only with the SMC techniques.

Figures 7 and 8 show the expected KL-distances (on a logarithmic scale) vs. the number of samples in the depots and briefcase domains, respectively. As expected, the average KL-distance of FOFA(S-Actions) is always the lowest. We expect even greater improvement when there are more objects in the domain (here we use only 5 constants so that the exact distribution remains computable). In the briefcase domain (Figure 8), SMC has higher accuracy than FOFA(S/R-Actions) at 1000 samples. The reason is that the posterior distribution converges to the stationary distribution in this domain (4 constants, 256 states). Even the grounded domain is not very big, but for cases involving longer sequences and more states we could not compare against the exact posterior, since the implementation of the exact algorithm crashes.

Figure 7: Expected KL-distance (on a logarithmic scale) between the exact distribution and FOFA(S-Actions), FOFA(S/R-Actions), SCAI, and SMC vs. the number of samples (50 to 1000), for depots.

Figure 8: Expected KL-distance (on a logarithmic scale) between the exact distribution and FOFA(S-Actions), FOFA(S/R-Actions), and SMC vs. the number of samples (50 to 1000), for briefcase.

² Also available from ftp://ftp.cs.yale.edu/pub/mcdermott/domains/.
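The evaluation protocol above (50 runs per sample size, averaged KL-distance against the exact posterior) amounts to the following sketch. The KL definition follows footnote 1; `approximate` stands for any of the compared filtering algorithms and is our assumed interface.

```python
# A sketch of the expected KL-distance evaluation of Section 4.
from math import log

def kl(p, q):
    """KL(P, P~) = sum_x P_x log(P_x / P~_x); assumes q[x] > 0 where p[x] > 0."""
    return sum(px * log(px / q[x]) for x, px in p.items() if px > 0)

def expected_kl(exact, approximate, n_samples, runs=50):
    """Average KL over repeated runs of the approximate algorithm."""
    return sum(kl(exact, approximate(n_samples)) for _ in range(runs)) / runs
```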
5 Conclusions and Future Work

In this paper we presented a sampling algorithm that computes the posterior probability of a query given a sequence of actions and observations in a FO dynamic system. Our algorithm takes advantage of a compact representation and achieves higher accuracy than SMC sampling and earlier propositional sampling techniques. There are several directions in which this work can be continued: (1) apply sampling in FO Markov Decision Processes (MDPs) [2]; (2) learn the transition model of a PRAM; (3) apply the algorithm to text understanding; and (4) generalize the representation to continuous domains (e.g., by discretizing the real-valued variables or by combining with Rao-Blackwellised particle filtering [6]).

Acknowledgements

We would like to thank the anonymous reviewers for their helpful comments. This work was supported by DARPA SRI 27-001253 (PLATO project), NSF CAREER 05-46663, and UIUC/NCSA AESIS 251024 grants.

References

[1] F. Bacchus, J. Y. Halpern, and H. J. Levesque. Reasoning about noisy sensors and effectors in the situation calculus. Artificial Intelligence, 1999.
[2] C. Boutilier, R. Reiter, and B. Price. Symbolic dynamic programming for first-order MDPs. In IJCAI, 2001.
[3] V. S. Costa, D. Page, M. Qazi, and J. Cussens. CLP(BN): Constraint logic programming for probabilistic knowledge. In UAI, 2003.
[4] T. Dean and K. Kanazawa. Probabilistic temporal reasoning. In AAAI, 1988.
[5] A. Doucet, N. de Freitas, and N. Gordon. Sequential Monte Carlo Methods in Practice. Springer, 1st edition, 2001.
[6] A. Doucet, N. de Freitas, K. Murphy, and S. Russell. Rao-Blackwellised particle filtering for dynamic Bayesian networks. In UAI, 2000.
[7] N. Friedman, L. Getoor, D. Koller, and A. Pfeffer. Learning probabilistic relational models. In IJCAI, 1999.
[8] H. Hajishirzi and E. Amir. Stochastic filtering in a probabilistic action model. In AAAI, 2007.
[9] M. Jordan. Introduction to Probabilistic Graphical Models. 2007. Forthcoming.
[10] K. Kersting, L. De Raedt, and T. Raiko. Logical hidden Markov models. Journal of Artificial Intelligence Research, 2006.
[11] U. Kjaerulff. A computational scheme for reasoning in dynamic probabilistic networks. In UAI, 1992.
[12] S. Muggleton. Stochastic logic programs. In Workshop on ILP, 1995.
[13] K. Murphy. Dynamic Bayesian Networks: Representation, Inference and Learning. PhD thesis, 2002.
[14] M. Nance, A. Vogel, and E. Amir. Reasoning about partially observed actions. In AAAI, 2006.
[15] D. Poole. Probabilistic Horn abduction and Bayesian networks. Artificial Intelligence, 64:81-129, 1993.
[16] L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 1989.
[17] R. Reiter. Knowledge in Action. MIT Press, 2001.
[18] M. Richardson and P. Domingos. Markov logic networks. Machine Learning, 2006.
[19] A. Shirazi and E. Amir. First-order logical filtering. In IJCAI, 2005.
[20] B. Taskar, P. Abbeel, and D. Koller. Discriminative probabilistic models for relational data. In UAI, 2002.
[21] M. Welling, I. Porteous, and E. Bart. Infinite state Bayesian networks. In NIPS, 2007.
[22] L. S. Zettlemoyer, H. M. Pasula, and L. P. Kaelbling. Logical particle filtering. In Dagstuhl Seminar on Probabilistic, Logical, and Relational Learning, 2007.