WWW 2007 / Track: Semantic Web

Session: Query Languages and DBs

From SPARQL to Rules (and back) 
Axel Polleres
Universidad Rey Juan Carlos, Tulipán s/n, 28933 Móstoles, Madrid, Spain

axel@polleres.net

ABSTRACT
As the data and ontology layers of the Semantic Web stack have achieved a certain level of maturity in standard recommendations such as RDF and OWL, the current focus lies on two related aspects. On the one hand, the definition of a suitable query language for RDF, SPARQL, is close to recommendation status within the W3C. The establishment of the rules layer on top of the existing stack, on the other hand, marks the next step to be taken, where languages with their roots in Logic Programming and Deductive Databases are receiving considerable attention. The purpose of this paper is threefold. First, we discuss the formal semantics of SPARQL, extending recent results in several ways. Second, we provide translations from SPARQL to Datalog with negation as failure. Third, we propose some useful and easy-to-implement extensions of SPARQL, based on this translation. As it turns out, the combination serves for direct implementations of SPARQL on top of existing rules engines as well as a basis for more general rules and query languages on top of RDF.

Categories and Subject Descriptors
H.2.3 [Languages]: Query Languages; H.3.5 [Online Information Services]: Web-based services

General Terms
Languages, Standardization

Keywords
SPARQL, Datalog, Rules

1. INTRODUCTION

Now that the data and ontology layers of the Semantic Web stack have achieved a certain level of maturity in standard recommendations such as RDF and OWL, the query and the rules layers seem to be the next building blocks to be finalized. For the first part, SPARQL [18], W3C's proposed query language, seems to be close to recommendation, though the Data Access working group is still struggling with defining aspects such as a formal semantics or layering on top of OWL and RDFS. As for the second part, the RIF working group¹, which is responsible for the rules layer, is just producing first concrete results. Besides aspects like business rules exchange or reactive rules, deductive rules languages on top of RDF and OWL are of special interest to the RIF group. One such deductive rules language is Datalog, which has been successfully applied in areas such as deductive databases and thus might be viewed as a query language itself. Let us briefly recap our starting points:

Datalog and SQL. Analogies between Datalog and relational query languages such as SQL are well known and well studied. Both formalisms cover UCQ (unions of conjunctive queries), where Datalog adds recursion, particularly unrestricted recursion involving nonmonotonic negation (aka unstratified negation as failure). Still, SQL is often viewed as more powerful in several respects. On the one hand, the lack of recursion in SQL has been partly remedied in the 1999 version of the standard [20]. On the other hand, aggregates and external function calls are missing in pure Datalog. However, developments on the Datalog side are evolving as well: with recent extensions of Datalog towards Answer Set Programming (ASP), a logic programming paradigm extending and building on top of Datalog, many of these issues have been solved, for instance by defining a declarative semantics for aggregates [9] and external predicates [8].

The Semantic Web rules layer. Remarkably, logic programming dialects such as Datalog with nonmonotonic negation, which are covered by Answer Set Programming, are often viewed as a natural basis for the Semantic Web rules layer [7]. Current ASP systems offer extensions for retrieving RDF data and querying OWL knowledge bases from the Web [8]. Particular concerns in the Semantic Web community exist with respect to adding rules including nonmonotonic negation [3], which involve a form of closed world reasoning on top of RDF and OWL, which both adopt an open world assumption. Recent proposals for solving this issue suggest a "safe" use of negation as failure over finite contexts only for the Web, also called scoped negation [17].

The Semantic Web query layer: SPARQL. Since we base our considerations in this paper on the assumption that correspondences similar to those between SQL and Datalog can be established for SPARQL, we have to observe that SPARQL inherits a lot from SQL, but that substantial differences remain. On the one hand, SPARQL does not deal with nested queries or recursion, a detail which is indeed surprising given that SPARQL is a graph query language on RDF, where typical recursive queries such as the transitive closure of a property might seem very useful. Likewise, aggregation (such as count, average, etc.) of object values in RDF triples, which might appear useful, has not yet been included in the current standard. On the other hand, subtleties like blank nodes (aka bNodes) or optional graph patterns, which are similar but (as we will see) different to outer joins in SQL or relational algebra, are not straightforwardly translatable to Datalog.

The goal of this paper is to shed light on the actual relation between declarative rules languages such as Datalog and SPARQL, and by this also to provide valuable input for the ongoing discussions on the Semantic Web rules layer, in particular its integration with SPARQL, taking into account the likely direction that LP-style rules languages will play a significant role in this context. Although the SPARQL specification does not seem 100% stable at the current point, having just taken a step back from candidate recommendation to working draft, we think that it is not too early for this exercise, as we will gain valuable insights and positive side effects from our investigation. More precisely, the contributions of the present work are:

· We refine and extend a recent proposal to formalize the semantics of SPARQL from Pérez et al. [16], presenting three variants, namely c-joining, s-joining and b-joining semantics, where the latter coincides with [16] and can thus be considered normative. We further discuss how aspects such as compositionality or idempotency of joins are treated in these semantics.
· Based on the three semantic variants, we provide translations from a large fragment of SPARQL queries to Datalog, which give rise to implementations of SPARQL on top of existing engines.
· We provide some straightforward extensions of SPARQL such as a set difference operator MINUS, and nesting of ASK queries in FILTER expressions.
· Finally, we discuss an extension towards recursion by allowing bNode-free CONSTRUCT queries as part of the query dataset, which may be viewed as a lightweight, recursive rule language on top of RDF.

The remainder of this paper is structured as follows: In Sec. 2 we first give an overview of SPARQL, discuss some issues in the language (Sec. 2.1) and then define its formal semantics (Sec. 2.2). After introducing a general form of Datalog with negation as failure under the answer set semantics in Sec. 3, we proceed with the translations of SPARQL to Datalog in Sec. 4. We finally discuss the above-mentioned language extensions in Sec. 5, before we conclude in Sec. 6.

An extended technical report of this article is available at http://www.polleres.net/publications/.
1 http://www.w3.org/2005/rules/wg
Copyright is held by the International World Wide Web Conference Committee (IW3C2). Distribution of these papers is limited to classroom use, and personal use by others. WWW 2007, May 8-12, 2007, Banff, Alberta, Canada. ACM 978-1-59593-654-7/07/0005.

2. RDF AND SPARQL

In examples, we will subsequently refer to the two RDF graphs in Fig. 1, which give some information about Bob and Alice. Such information is common in FOAF files, which are gaining popularity for describing personal data. Similarities with existing examples in [18] are on purpose. We assume that the two RDF graphs are given in TURTLE [2] notation and are accessible via the IRIs ex.org/bob and alice.org.²

    # Graph: ex.org/bob
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix bob: <ex.org/bob#> .
    <ex.org/bob> foaf:maker _:a .
    _:a a foaf:Person ; foaf:name "Bob" ; foaf:knows _:b .
    _:b a foaf:Person ; foaf:nick "Alice" .
    <alice.org/> foaf:maker _:b .

    # Graph: alice.org
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix alice: <alice.org#> .
    alice:me a foaf:Person ; foaf:name "Alice" ; foaf:knows _:c .
    _:c a foaf:Person ; foaf:name "Bob" ; foaf:nick "Bobby" .

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?Y ?X
    FROM <alice.org>
    FROM <ex.org/bob>
    WHERE { ?Y foaf:name ?X . }

    ?X         ?Y
    "Bob"      _:a
    "Bob"      _:c
    "Alice"    alice.org#me

Figure 1: Two RDF graphs in TURTLE notation and a simple SPARQL query.

We assume the pairwise disjoint, infinite sets $I$, $B$, $L$ and $Var$, which denote IRIs, blank nodes, RDF literals, and variables, respectively. In this paper, an RDF graph is then a finite set of triples from $(I \cup B \cup L) \times I \times (I \cup B \cup L)$,³ dereferenceable by an IRI. A SPARQL query is a quadruple $Q = (V, P, DS, SM)$, where $V$ is a result form, $P$ is a graph pattern, $DS$ is a dataset, and $SM$ is a set of solution modifiers. We refer to [18] for syntactical details and will explain these in the following as far as necessary. In this paper, we will mostly ignore solution modifiers; thus we will usually write queries as triples $Q = (V, P, DS)$, and will use the syntax for graph patterns introduced below.

Result Forms. Since we will, to a large extent, restrict ourselves to SELECT queries, it is sufficient for our purposes to describe result forms by sets of variables. Other result forms will be discussed in Sec. 5. For instance, let $Q = (V, P, DS)$ denote the query from Fig. 1, then $V = \{?X, ?Y\}$. Query results in SPARQL are given by partial, i.e. possibly incomplete, substitutions of variables in $V$ by RDF terms. In traditional relational query languages, such incompleteness is usually expressed using null values. Using such null values, we will write solutions as tuples where the order of columns is determined by lexicographically ordering the variables in $V$. Given a set of variables $V$, let $\overline{V}$ denote the tuple obtained from lexicographically ordering $V$. The query from Fig. 1 with result form $\overline{V} = (?X, ?Y)$ then has the solution tuples ("Bob", _:a), ("Alice", alice.org#me), ("Bob", _:c). We write substitutions in square brackets, so these tuples correspond to the substitutions $[?X \to \text{"Bob"}, ?Y \to \text{\_:a}]$, $[?X \to \text{"Alice"}, ?Y \to \text{alice.org\#me}]$, and $[?X \to \text{"Bob"}, ?Y \to \text{\_:c}]$, respectively.

Graph Patterns. We follow the recursive definition of graph patterns $P$ from [16]:
· a tuple $(s, p, o)$ is a graph pattern where $s, o \in I \cup L \cup Var$ and $p \in I \cup Var$.⁴
· if $P$ and $P'$ are graph patterns then $(P \text{ AND } P')$, $(P \text{ OPT } P')$, $(P \text{ UNION } P')$, $(P \text{ MINUS } P')$ are graph patterns.⁵
· if $P$ is a graph pattern and $i \in I \cup Var$, then $(\text{GRAPH } i\ P)$ is a graph pattern.
· if $P$ is a graph pattern and $R$ is a filter expression then $(P \text{ FILTER } R)$ is a graph pattern.
For any pattern $P$, we denote by $vars(P)$ the set of all variables occurring in $P$. As atomic filter expressions, SPARQL allows the unary predicates BOUND, isBLANK, isIRI, isLITERAL, binary equality predicates '=' for literals, and other features such as comparison operators, data type conversion and string functions which we omit here; see [18, Sec. 11.3] for details. Complex filter expressions can be built using the connectives '$\neg$', '$\wedge$', '$\vee$'.

Datasets. The dataset $DS = (G, \{(g_1, G_1), \ldots, (g_k, G_k)\})$ of a SPARQL query is defined by a default graph $G$ plus a set of named graphs, i.e. pairs of IRIs and corresponding graphs. Without loss of generality (there are other ways to define the dataset, such as in a SPARQL protocol query), we assume $G$ given as the merge of the graphs denoted by the IRIs given in a set of FROM and FROM NAMED clauses. For instance, the query from Fig. 1 refers to the dataset which consists of the default graph obtained from merging alice.org and ex.org/bob, plus an empty set of named graphs. The relation between names and graphs in SPARQL is defined solely by the fact that the IRI identifies a resource which is represented by the respective graph. In this paper, we assume that the IRIs indeed represent network-accessible resources from which the respective RDF graphs can be retrieved. This view has also been taken e.g. in [17]. In particular, this treatment is not to be confused with so-called named graphs in the sense of [4]. We thus identify each IRI with the RDF graph available at this IRI and each set of IRIs with the graph merge [13] of the respective IRIs. This allows us to identify the dataset by a pair of sets of IRIs $DS = (G, G_n)$ with $G = \{d_1, \ldots, d_n\}$ and $G_n = \{g_1, \ldots, g_k\}$ denoting the (merged) default graph and the set of named graphs, respectively. Hence, the following set of clauses

    FROM <ex.org/bob>
    FROM NAMED <alice.org>

defines the dataset $DS = (\{\text{ex.org/bob}\}, \{\text{alice.org}\})$.

2.1 Assumptions and Issues
In this section we will discuss some important issues about the current specification, and how we will deal with them here. First, note that the default graph, if specified by name in a FROM clause, is not counted among the named graphs automatically [18, section 8, definition 1]. An unbound variable in the GRAPH directive means any of the named graphs in $DS$, but does NOT necessarily include the default graph.

Example 1. This issue becomes obvious in the following query with dataset $DS = (\{\text{ex.org/bob}\}, \emptyset)$, which has an empty solution set.

    SELECT ?N
    WHERE { ?G foaf:maker ?M .
            GRAPH ?G { ?X foaf:name ?N } }

We will sometimes find the following assumption convenient to avoid such arguably unintuitive effects:

Definition 1. (Dataset closedness assumption) Given a dataset $DS = (G, G_n)$, $G_n$ implicitly contains (i) all graphs mentioned in $G$ and (ii) all IRIs mentioned explicitly in the graphs corresponding to $G$.

Under this assumption, the previous query has both ("Alice") and ("Bob") in its solution set.

Some more remarks are in place concerning FILTER expressions. According to the SPARQL specification, "Graph pattern matching creates bindings of variables [where] it is possible to further restrict solutions by constraining the allowable bindings of variables to RDF Terms [with FILTER expressions]." However, it is not clearly specified how to deal with filter constraints referring to variables which do not appear in simple graph patterns. In this paper, for graph patterns of the form $(P \text{ FILTER } R)$ we tacitly assume safe filter expressions, i.e. that all variables used in a filter expression $R$ also appear in the corresponding pattern $P$. This corresponds to the notion of safety in Datalog (see Sec. 3), where the built-in predicates (which obviously correspond to filter predicates) do not suffice to make unbound variables safe. Moreover, the specification defines errors to avoid mistyped comparisons or the evaluation of built-in functions over unbound values, i.e. "any potential solution that causes an error condition in a constraint will not form part of the final results, but does not cause the query to fail." These errors propagate over the whole FILTER expression, also over negation, as shown by the following example.

Example 2. Assuming the dataset does not contain triples for the foaf:dummy property, the example query

    SELECT ?X
    WHERE { { ?X a foaf:Person .
              OPTIONAL { ?X foaf:dummy ?Y . } }
            FILTER ( ¬(isLITERAL(?Y)) ) }

would discard any solution for ?X, since the unbound value for ?Y causes an error in the isLITERAL expression and thus the whole FILTER expression returns an error.

We will take special care of these errors when defining the semantics of FILTER expressions later on.

2 For reasons of legibility and conciseness, we omit the leading 'http://' or other schema identifiers in IRIs.
3 Following SPARQL, we are slightly more general than the original RDF specification in that we allow literals in subject positions.
4 We do not consider bNodes in patterns as these can be semantically equivalently replaced by variables in graph patterns [6].
5 Note that AND and MINUS are not designated keywords in SPARQL, but we use them here for reasons of readability and in order to keep with the operator-style definition of [16]. MINUS is syntactically not present at all, but we will suggest a syntax extension for this particular keyword in Sec. 5.

2.2 Formal Semantics of SPARQL
The semantics of SPARQL is still not formally defined in its current version. This lack of formal semantics has been tackled by a recent proposal of Pérez et al. [16]. We build on this proposal, but suggest three variants thereof, namely (a) bravely joining, (b) cautiously joining, and (c) strictly joining semantics. In particular, our definitions differ from [16] in the way we define joining unbound variables. Moreover, we refine their notion of FILTER satisfaction in order to deal with error propagation properly.

We denote by $T_{null}$ the union $I \cup B \cup L \cup \{null\}$, where $null$ is a dedicated constant denoting the unknown value, not appearing in any of $I$, $B$, or $L$, as it is commonly introduced when defining outer joins in relational algebra. A substitution $\theta$ from $Var$ to $T_{null}$ is a partial function $\theta : Var \to T_{null}$. We write substitutions in postfix notation: for a triple pattern $t = (s, p, o)$ we denote by $t\theta$ the triple $(s\theta, p\theta, o\theta)$ obtained by applying the substitution to all variables in $t$. The domain of $\theta$, $dom(\theta)$, is the subset of $Var$ where $\theta$ is defined. For a substitution $\theta$ and a set of variables $D \subseteq Var$ we define the substitution $\theta^D$ with domain $D$ as follows:

\[ x\theta^D = \begin{cases} x\theta & \text{if } x \in dom(\theta) \cap D \\ null & \text{if } x \in D \setminus dom(\theta) \end{cases} \]

Let $\theta_1$ and $\theta_2$ be substitutions, then $\theta_1 \cup \theta_2$ is the substitution obtained as follows:

\[ x(\theta_1 \cup \theta_2) = \begin{cases} x\theta_1 & \text{if } x\theta_1 \text{ defined and } x\theta_2 \text{ undefined} \\ \text{else: } x\theta_1 & \text{if } x\theta_1 \text{ defined and } x\theta_2 = null \\ \text{else: } x\theta_2 & \text{if } x\theta_2 \text{ defined} \\ \text{else: undefined} \end{cases} \]

Thus, in the union of two substitutions, defined values in one substitution take precedence over null values in the other. For instance, given the substitutions $\theta_1 = [?X \to \text{"Alice"}, ?Y \to \text{\_:a}, ?Z \to null]$ and $\theta_2 = [?U \to \text{"Bob"}, ?X \to \text{"Alice"}, ?Y \to null]$ we get:

\[ \theta_1 \cup \theta_2 = [?U \to \text{"Bob"}, ?X \to \text{"Alice"}, ?Y \to \text{\_:a}, ?Z \to null] \]

Now, as opposed to [16], we define three notions of compatibility between substitutions:
· Two substitutions $\theta_1$ and $\theta_2$ are bravely compatible (b-compatible) when for all $x \in dom(\theta_1) \cap dom(\theta_2)$ either $x\theta_1 = null$ or $x\theta_2 = null$ or $x\theta_1 = x\theta_2$ holds, i.e., when $\theta_1 \cup \theta_2$ is a substitution over $dom(\theta_1) \cup dom(\theta_2)$.
· Two substitutions $\theta_1$ and $\theta_2$ are cautiously compatible (c-compatible) when they are b-compatible and for all $x \in dom(\theta_1) \cap dom(\theta_2)$ it holds that $x\theta_1 = x\theta_2$.
· Two substitutions $\theta_1$ and $\theta_2$ are strictly compatible (s-compatible) when they are c-compatible and for all $x \in dom(\theta_1) \cap dom(\theta_2)$ it holds that $x(\theta_1 \cup \theta_2) \neq null$.

Analogously to [16] we define join, union, difference, and outer join between two sets of substitutions $\Omega_1$ and $\Omega_2$ over domains $D_1$ and $D_2$, respectively, all except union parameterized by $x \in \{b, c, s\}$:

\[ \begin{aligned}
\Omega_1 \bowtie_x \Omega_2 &= \{\theta_1 \cup \theta_2 \mid \theta_1 \in \Omega_1, \theta_2 \in \Omega_2, \text{ are } x\text{-compatible}\} \\
\Omega_1 \cup \Omega_2 &= \{\theta \mid \exists \theta_1 \in \Omega_1 \text{ with } \theta = \theta_1^{D_1 \cup D_2} \text{ or } \exists \theta_2 \in \Omega_2 \text{ with } \theta = \theta_2^{D_1 \cup D_2}\} \\
\Omega_1 -_x \Omega_2 &= \{\theta \in \Omega_1 \mid \forall \theta_2 \in \Omega_2,\ \theta \text{ and } \theta_2 \text{ not } x\text{-compatible}\} \\
\Omega_1 =\!\!\bowtie_x \Omega_2 &= (\Omega_1 \bowtie_x \Omega_2) \cup (\Omega_1 -_x \Omega_2)
\end{aligned} \]

The semantics of a graph pattern $P$ over dataset $DS = (G, G_n)$ can now be defined recursively by the evaluation function returning sets of substitutions.

Definition 2. (Evaluation, extends [16, Def. 2]) Let $t = (s, p, o)$ be a triple pattern, $P, P_1, P_2$ graph patterns, $DS = (G, G_n)$ a dataset, $i \in G_n$, and $v \in Var$, then the x-joining evaluation $[\![\cdot]\!]^x_{DS}$ is defined as follows:

\[ \begin{aligned}
[\![t]\!]^x_{DS} &= \{\theta \mid dom(\theta) = vars(P) \text{ and } t\theta \in G\} \\
[\![P_1 \text{ AND } P_2]\!]^x_{DS} &= [\![P_1]\!]^x_{DS} \bowtie_x [\![P_2]\!]^x_{DS} \\
[\![P_1 \text{ UNION } P_2]\!]^x_{DS} &= [\![P_1]\!]^x_{DS} \cup [\![P_2]\!]^x_{DS} \\
[\![P_1 \text{ MINUS } P_2]\!]^x_{DS} &= [\![P_1]\!]^x_{DS} -_x [\![P_2]\!]^x_{DS} \\
[\![P_1 \text{ OPT } P_2]\!]^x_{DS} &= [\![P_1]\!]^x_{DS} =\!\!\bowtie_x [\![P_2]\!]^x_{DS} \\
[\![\text{GRAPH } i\ P]\!]^x_{DS} &= [\![P]\!]^x_{(i, \emptyset)} \\
[\![\text{GRAPH } v\ P]\!]^x_{DS} &= \{\theta \cup [v \to g] \mid g \in G_n,\ \theta \in [\![P[v \to g]]\!]^x_{(g, \emptyset)}\} \\
[\![P \text{ FILTER } R]\!]^x_{DS} &= \{\theta \in [\![P]\!]^x_{DS} \mid R\theta = \top\}
\end{aligned} \]

Let $R$ be a FILTER expression, $u, v \in Var$, $c \in I \cup B \cup L$. The valuation of $R$ on substitution $\theta$, written $R\theta$, takes one of the three values $\{\top, \bot, \varepsilon\}$⁶ and is defined as follows.

$R\theta = \top$, if:
(1) $R = \text{BOUND}(v)$ with $v \in dom(\theta) \wedge v\theta \neq null$;
(2) $R = \text{isBLANK}(v)$ with $v \in dom(\theta) \wedge v\theta \in B$;
(3) $R = \text{isIRI}(v)$ with $v \in dom(\theta) \wedge v\theta \in I$;
(4) $R = \text{isLITERAL}(v)$ with $v \in dom(\theta) \wedge v\theta \in L$;
(5) $R = (v = c)$ with $v \in dom(\theta) \wedge v\theta = c$;
(6) $R = (u = v)$ with $u, v \in dom(\theta) \wedge u\theta = v\theta \wedge u\theta \neq null$;
(7) $R = (\neg R_1)$ with $R_1\theta = \bot$;
(8) $R = (R_1 \vee R_2)$ with $R_1\theta = \top \vee R_2\theta = \top$;
(9) $R = (R_1 \wedge R_2)$ with $R_1\theta = \top \wedge R_2\theta = \top$.

$R\theta = \varepsilon$, if:
(1) $R = \text{isBLANK}(v)$, $R = \text{isIRI}(v)$, $R = \text{isLITERAL}(v)$, or $R = (v = c)$ with $v \notin dom(\theta) \vee v\theta = null$;
(2) $R = (u = v)$ with $u \notin dom(\theta) \vee u\theta = null \vee v \notin dom(\theta) \vee v\theta = null$;
(3) $R = (\neg R_1)$ and $R_1\theta = \varepsilon$;
(4) $R = (R_1 \vee R_2)$ and $(R_1\theta \neq \top \wedge R_2\theta \neq \top) \wedge (R_1\theta = \varepsilon \vee R_2\theta = \varepsilon)$;
(5) $R = (R_1 \wedge R_2)$ and $R_1\theta = \varepsilon \vee R_2\theta = \varepsilon$.

$R\theta = \bot$ otherwise.

We will now exemplify the three different semantics defined above, namely bravely joining (b-joining), cautiously joining (c-joining), and strictly joining (s-joining) semantics. When taking a closer look at the AND and MINUS operators, one will realize that the three semantics differ only when joining null. Indeed, the AND operator behaves as the traditional natural join operator $\bowtie$ in relational algebra when no null values are involved. Take for instance $DS = (\{\text{ex.org/bob}, \text{alice.org}\}, \emptyset)$ and $P = ((?X, \text{name}, ?Name) \text{ AND } (?X, \text{knows}, ?Friend))$. When viewing each solution set as a relational table with variables denoting attribute names, we can write:

    ?X             ?Name            ?X             ?Friend
    _:a            "Bob"       ⋈    _:a            _:b
    alice.org#me   "Alice"          alice.org#me   _:c
    _:c            "Bob"

    =

    ?X             ?Name     ?Friend
    _:a            "Bob"     _:b
    alice.org#me   "Alice"   _:c

Differences between the three semantics appear when joining over null-bound variables, as shown in the next example.
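The substitution union, the three compatibility notions and the x-join from this section can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's implementation: substitutions are modelled as dictionaries, `None` plays the role of null, RDF terms are encoded as plain strings, and the names `union`, `compatible` and `join` are assumptions made for the example.

```python
from itertools import product

NULL = None  # models the dedicated constant `null`


def union(t1, t2):
    """theta1 ∪ theta2: a non-null value takes precedence over null."""
    out = {}
    for x in set(t1) | set(t2):
        if x in t1 and (x not in t2 or t2[x] is NULL):
            out[x] = t1[x]
        else:
            out[x] = t2[x]
    return out


def compatible(t1, t2, mode):
    """mode: 'b' (bravely), 'c' (cautiously) or 's' (strictly) compatible."""
    shared = set(t1) & set(t2)
    if mode == "b":
        return all(t1[x] is NULL or t2[x] is NULL or t1[x] == t2[x] for x in shared)
    if mode == "c":
        return all(t1[x] == t2[x] for x in shared)
    if mode == "s":
        return all(t1[x] == t2[x] and t1[x] is not NULL for x in shared)
    raise ValueError(f"unknown mode: {mode!r}")


def join(omega1, omega2, mode):
    """x-join of two lists of substitutions, with duplicates removed."""
    seen, result = set(), []
    for t1, t2 in product(omega1, omega2):
        if compatible(t1, t2, mode):
            merged = union(t1, t2)
            key = frozenset(merged.items())
            if key not in seen:
                seen.add(key)
                result.append(merged)
    return result


# The union example from the text.
theta1 = {"?X": '"Alice"', "?Y": "_:a", "?Z": NULL}
theta2 = {"?U": '"Bob"', "?X": '"Alice"', "?Y": NULL}
merged = union(theta1, theta2)

# The null-free join of (?X, name, ?Name) and (?X, knows, ?Friend).
names = [
    {"?X": "_:a", "?Name": '"Bob"'},
    {"?X": "alice.org#me", "?Name": '"Alice"'},
    {"?X": "_:c", "?Name": '"Bob"'},
]
friends = [
    {"?X": "_:a", "?Friend": "_:b"},
    {"?X": "alice.org#me", "?Friend": "_:c"},
]
joined = {mode: join(names, friends, mode) for mode in "bcs"}
```

Because the two operand tables contain no null values, `joined["b"]`, `joined["c"]` and `joined["s"]` hold the same two rows; the three modes only diverge once a shared variable is bound to `NULL` on at least one side.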
6 $\top$ stands for "true", $\bot$ stands for "false" and $\varepsilon$ stands for errors, see [18, Sec. 11.3] and Example 2 for details.
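The three-valued valuation of FILTER expressions can be sketched as a small recursive evaluator. The encoding below is an assumption made for illustration only: filter expressions are nested tuples such as `("not", ("isLITERAL", "?Y"))`, RDF terms are tagged pairs such as `("lit", "Bob")`, variables are strings, and `None` stands for null. Conjunction follows the definition in Sec. 2.2, where an error in either operand yields an error.

```python
NULL = None  # models the dedicated constant `null`
TRUE, FALSE, ERROR = "true", "false", "error"
KIND = {"isBLANK": "bnode", "isIRI": "iri", "isLITERAL": "lit"}


def is_bound(v, theta):
    """v is in dom(theta) and not mapped to null."""
    return v in theta and theta[v] is not NULL


def valuate(r, theta):
    """Valuation of filter expression r on substitution theta."""
    op = r[0]
    if op == "BOUND":
        return TRUE if is_bound(r[1], theta) else FALSE
    if op in KIND:
        if not is_bound(r[1], theta):
            return ERROR
        return TRUE if theta[r[1]][0] == KIND[op] else FALSE
    if op == "=":
        values = []
        for arg in r[1:]:
            if isinstance(arg, str):  # a variable such as "?Y"
                if not is_bound(arg, theta):
                    return ERROR
                arg = theta[arg]
            values.append(arg)
        return TRUE if values[0] == values[1] else FALSE
    if op == "not":
        return {TRUE: FALSE, FALSE: TRUE, ERROR: ERROR}[valuate(r[1], theta)]
    if op not in ("or", "and"):
        raise ValueError(f"unknown operator: {op!r}")
    left, right = valuate(r[1], theta), valuate(r[2], theta)
    if op == "or":
        if TRUE in (left, right):
            return TRUE
        return ERROR if ERROR in (left, right) else FALSE
    if left == TRUE and right == TRUE:
        return TRUE
    return ERROR if ERROR in (left, right) else FALSE


# Example 2: ?Y stays unbound, so the negated isLITERAL test is an error.
theta = {"?X": ("iri", "alice.org#me"), "?Y": NULL}
example2 = valuate(("not", ("isLITERAL", "?Y")), theta)
```

Here `example2` evaluates to `ERROR`, so a FILTER built on it would drop the solution, whereas `("not", ("BOUND", "?Y"))` evaluates to `TRUE` on the same substitution, since BOUND never raises an error.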

Example 3. Let $DS$ be as before and assume the following query, which might be considered a naive attempt to ask for pairs of persons ?X1, ?X2 who share the same name and nickname, where both name and nickname are optional:

\[ P = (\ ((?X1, \text{a}, \text{Person}) \text{ OPT } (?X1, \text{name}, ?N)) \text{ AND } ((?X2, \text{a}, \text{Person}) \text{ OPT } (?X2, \text{nick}, ?N))\ ) \]

Again, we consider the tabular view of the resulting join:

    ?X1            ?N                ?X2            ?N
    _:a            "Bob"             _:a            null
    _:b            null       ⋈_x    _:b            "Alice"
    _:c            "Bob"             _:c            "Bobby"
    alice.org#me   "Alice"           alice.org#me   null

Now, let us see what happens when we evaluate the join $\bowtie_x$ with respect to the different semantics. The following result table lists in the last column which tuples belong to the result of the b-, c- and s-join, respectively.

    =

    ?X1            ?N        ?X2            semantics
    _:a            "Bob"     _:a            b
    _:a            "Bob"     alice.org#me   b
    _:b            null      _:a            b,c
    _:b            "Alice"   _:b            b
    _:b            "Bobby"   _:c            b
    _:b            null      alice.org#me   b,c
    _:c            "Bob"     _:a            b
    _:c            "Bob"     alice.org#me   b
    alice.org#me   "Alice"   _:a            b
    alice.org#me   "Alice"   _:b            b,c,s
    alice.org#me   "Alice"   alice.org#me   b

Leaving aside the question whether the query formulation was intuitively broken, we remark that only the s-join would have the expected result. At the very least we might argue that the liberal behavior of b-joins might be considered surprising in some cases. The c-joining semantics is a bit more cautious and lies in between the two, treating null values as normal values that are only unifiable with other null values.

Compared to how joins over incomplete relations are treated in common relational database systems, the s-joining semantics might be considered the intuitive behavior. Another interesting divergence (which would rather suggest adopting the c-joining semantics) shows up when we consider a simple idempotent join.

Example 4. Let us consider the following single-triple dataset $DS = (\{(\text{alice.org\#me}, \text{a}, \text{Person})\}, \emptyset)$ and the following simple query pattern:

\[ P = ((?X, \text{a}, \text{Person}) \text{ UNION } (?Y, \text{a}, \text{Person})) \]

Clearly, this pattern has the solution set $[\![P]\!]^x_{DS} = \{(\text{alice.org\#me}, null), (null, \text{alice.org\#me})\}$ under all three semantics. Surprisingly, $P' = (P \text{ AND } P)$ has different solution sets for the different semantics. First, $[\![P']\!]^c_{DS} = [\![P]\!]^x_{DS}$, but $[\![P']\!]^s_{DS} = \emptyset$, since null values are not compatible under the s-joining semantics. Finally, $[\![P']\!]^b_{DS} = \{(\text{alice.org\#me}, null), (null, \text{alice.org\#me}), (\text{alice.org\#me}, \text{alice.org\#me})\}$.

As shown by this example, under the reasonable assumption that the join operator is idempotent, i.e., $(P \bowtie P) \equiv P$, only the c-joining semantics behaves correctly. However, the brave b-joining behavior is advocated by the current SPARQL document, and we might also think of examples where this obviously makes a lot of sense, especially when considering not explicit joins but the implicit joins within the OPT operator:

Example 5. Let $DS = (\{\text{ex.org/bob}, \text{alice.org}\}, \emptyset)$ and assume a slight variant of a query from [5] which asks for persons and some names for these persons, where preferably the foaf:name is taken and, if not specified, foaf:nick.

\[ P = (((?X, \text{a}, \text{Person}) \text{ OPT } (?X, \text{name}, ?XNAME)) \text{ OPT } (?X, \text{nick}, ?XNAME)) \]

Only $[\![P]\!]^b_{DS}$ contains the expected solution (_:b, "Alice") for the bNode _:b.

All three semantics may be considered as variations of the original definitions in [16], for which the authors proved complexity results and various desirable features, such as semantics-preserving normal form transformations and compositionality. The following proposition shows that all these results carry over to the normative b-joining semantics:

Proposition 1. Given a dataset $DS$ and a pattern $P$ which does not contain GRAPH patterns, the solutions of $[\![P]\!]_{DS}$ as in [16] and $[\![P]\!]^b_{DS}$ are in 1-to-1 correspondence.

Proof. Given $DS$ and $P$, each substitution $\theta$ obtained by the evaluation $[\![P]\!]^b_{DS}$ can be reduced to a substitution $\theta'$ obtained from the evaluation $[\![P]\!]_{DS}$ in [16] by dropping all mappings of the form $v \to null$ from $\theta$. Likewise, each substitution $\theta'$ obtained from $[\![P]\!]_{DS}$ can be extended to a substitution $\theta = \theta'^{vars(P)}$ for $[\![P]\!]^b_{DS}$.

Following the definitions from the SPARQL specification and [16], the b-joining semantics is the only admissible definition. There are still advantages in gradually defining alternatives that move towards the traditional treatment of joins involving nulls. On the one hand, as we have seen in the examples above, the brave view on joining unbound variables might have partly surprising results; on the other hand, as we will see, the c- and s-joining semantics allow for a more efficient implementation in terms of Datalog rules. Let us now take a closer look at some properties of the three defined semantics.

Compositionality and Equivalences. As shown in [16], some implementations have a non-compositional semantics, leading to undesired effects such as non-commutativity of the join operator. A semantics is called compositional if for each sub-pattern $P'$ of $P$ the result of evaluating $P'$ can be used to evaluate $P$. Obviously, all three of the c-, s- and b-joining semantics defined here retain this property, since all three semantics are defined recursively and independently of the evaluation order of the sub-patterns. The following proposition summarizes equivalences which hold for all three semantics, showing some interesting additions to the results of Pérez et al.

Proposition 2 (extends [16, Prop. 1]). The following equivalences hold or do not hold in the different semantics as indicated after each law:
(1) AND, UNION are associative and commutative. (b,c,s)
(2) $(P_1 \text{ AND } (P_2 \text{ UNION } P_3)) \equiv ((P_1 \text{ AND } P_2) \text{ UNION } (P_1 \text{ AND } P_3))$. (b)
(3) $(P_1 \text{ OPT } (P_2 \text{ UNION } P_3)) \equiv ((P_1 \text{ OPT } P_2) \text{ UNION } (P_1 \text{ OPT } P_3))$. (b)
(4) $((P_1 \text{ UNION } P_2) \text{ OPT } P_3) \equiv ((P_1 \text{ OPT } P_3) \text{ UNION } (P_2 \text{ OPT } P_3))$. (b)
(5) $((P_1 \text{ UNION } P_2) \text{ FILTER } R) \equiv ((P_1 \text{ FILTER } R) \text{ UNION } (P_2 \text{ FILTER } R))$. (b,c,s)
(6) AND is idempotent, i.e. $(P \text{ AND } P) \equiv P$. (c)
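The row counts of Example 3 and the idempotence behaviour of Example 4, i.e. entry (6) of Proposition 2, can be replayed mechanically. The following Python sketch is illustrative only: substitutions are dictionaries, `None` stands for null, RDF terms are plain strings, and the helper names `join` and `frozen` are assumptions of this example.

```python
from itertools import product

NULL = None  # models the dedicated constant `null`

# Per-variable compatibility tests for the b-, c- and s-joining semantics.
COMPATIBLE = {
    "b": lambda a, b: a is NULL or b is NULL or a == b,
    "c": lambda a, b: a == b,
    "s": lambda a, b: a == b and a is not NULL,
}


def frozen(omega):
    """Turn a list of substitutions into a comparable set."""
    return {frozenset(t.items()) for t in omega}


def join(omega1, omega2, mode):
    """x-join of two lists of substitutions, returned as a set."""
    ok = COMPATIBLE[mode]
    result = set()
    for t1, t2 in product(omega1, omega2):
        if all(ok(t1[x], t2[x]) for x in t1.keys() & t2.keys()):
            merged = dict(t2)
            # A value from t1 wins where t2 is null or undefined.
            merged.update({x: v for x, v in t1.items() if t2.get(x) is NULL})
            result.add(frozenset(merged.items()))
    return result


# Example 3: the two operand tables of the join.
left = [
    {"?X1": "_:a", "?N": '"Bob"'},
    {"?X1": "_:b", "?N": NULL},
    {"?X1": "_:c", "?N": '"Bob"'},
    {"?X1": "alice.org#me", "?N": '"Alice"'},
]
right = [
    {"?X2": "_:a", "?N": NULL},
    {"?X2": "_:b", "?N": '"Alice"'},
    {"?X2": "_:c", "?N": '"Bobby"'},
    {"?X2": "alice.org#me", "?N": NULL},
]
sizes = {mode: len(join(left, right, mode)) for mode in "bcs"}

# Example 4: the solutions of P, joined with themselves.
me = "alice.org#me"
p = [{"?X": me, "?Y": NULL}, {"?X": NULL, "?Y": me}]
self_join = {mode: join(p, p, mode) for mode in "bcs"}
```

Under these assumptions `sizes` comes out as 11, 3 and 1 rows for the b-, c- and s-join, matching the result table of Example 3, and only `self_join["c"]` equals `frozen(p)`: the s-join is empty and the b-join gains the extra tuple that binds both variables.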

Proof Sketch. (1)-(5) for the b-joining semantics are proven in [16]. (1): for c-joining and s-joining this follows directly from the definitions. (2)-(4): the substitution sets $[\![P_1]\!]^{c,s} = \{[?X \to a, ?Y \to b]\}$, $[\![P_2]\!]^{c,s} = \{[?X \to a, ?Z \to c]\}$, $[\![P_3]\!]^{c,s} = \{[?Y \to b, ?Z \to c]\}$ provide counterexamples for the c-joining and s-joining semantics for all three equivalences (2)-(4). (5): The semantics of FILTER expressions and UNION is exactly the same for all three semantics; thus, the result for the b-joining semantics carries over to all three semantics. (6): follows from the observations in Example 4.

Ideally, we would like to identify a subclass of programs where the three semantics coincide. Obviously, this is the case for any query involving neither UNION nor OPT operators. Pérez et al. [16] define a bigger class of programs, including "well-behaving" optional patterns:

Definition 3. ([16, Def. 4]) A UNION-free graph pattern $P$ is well-designed if for every occurrence of a sub-pattern $P' = (P_1 \text{ OPT } P_2)$ of $P$ and for every variable $v$ occurring in $P$, the following condition holds: if $v$ occurs both in $P_2$ and outside $P'$ then it also occurs in $P_1$.

As may be easily verified by the reader, neither Example 3 nor Example 5, which are both UNION-free, satisfies the well-designedness condition. Since in the general case the equivalences of Prop. 2 do not hold, we also need to consider nested UNION patterns as a potential source of null bindings which might affect join results. We extend the notion of well-designedness, which directly leads us to another correspondence in the subsequent proposition.

Definition 4. A graph pattern $P$ is well-designed if the condition from Def. 3 holds and for every occurrence of a sub-pattern $P' = (P_1 \text{ UNION } P_2)$ of $P$ and for every variable $v$ occurring in $P'$, the following condition holds: if $v$ occurs outside $P'$ then it occurs in both $P_1$ and $P_2$.

Proposition 3. On well-designed graph patterns the c-, s-, and b-joining semantics coincide.
Proof Sketch. Follows directly from the observation that all variables which are re-used outside $P'$ must be bound to a value unequal to null in $P'$ due to well-designedness, and thus cannot generate null bindings which might carry over to joins.

Likewise, we can identify "dangerous" variables in graph patterns, which might cause semantic differences:

Definition 5. Let $P'$ be a sub-pattern of $P$ of either the form $P' = (P_1 \text{ OPT } P_2)$ or $P' = (P_1 \text{ UNION } P_2)$. Any variable $v$ in $P'$ which violates the well-designedness condition is called possibly-null-binding in $P$.

Note that, so far, we have only defined the semantics in terms of a pattern $P$ and dataset $DS$, but have not yet taken the result form $V$ of a query $Q = (V, P, DS)$ into account. We now define solution tuples, which were informally introduced in Sec. 2. Recall that by $\overline{V}$ we denote the tuple obtained from lexicographically ordering a set of variables $V$. The notation $\overline{V}[V' \to null]$ means that, after ordering $V$, all variables from a subset $V' \subseteq V$ are replaced by null.

Definition 6. (Solution Tuples) Let $Q = (V, P, DS)$ be a SPARQL query, and $\theta$ a substitution in $[\![P]\!]^x_{DS}$, then we call the tuple $\overline{V}[(V \setminus vars(P)) \to null]\theta$ a solution tuple of $Q$ with respect to the x-joining semantics.
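Definition 6 amounts to a small projection step, sketched below in Python. The function name `solution_tuple`, the use of `None` for null and the string encoding of variables and RDF terms are assumptions made for this illustration.

```python
NULL = None  # models the dedicated constant `null`


def solution_tuple(result_vars, pattern_vars, theta):
    """Order V lexicographically, null out V \\ vars(P), then apply theta."""
    ordered = sorted(result_vars)
    return tuple(
        theta.get(v, NULL) if v in pattern_vars else NULL
        for v in ordered
    )


# The first solution of the query from Fig. 1, with V = {?X, ?Y}.
fig1 = solution_tuple({"?Y", "?X"}, {"?X", "?Y"}, {"?X": '"Bob"', "?Y": "_:a"})

# A result variable ?Z that does not occur in the pattern becomes null.
padded = solution_tuple({"?X", "?Z"}, {"?X", "?Y"}, {"?X": '"Bob"', "?Y": "_:a"})
```

With these inputs `fig1` is the tuple `('"Bob"', '_:a')` and `padded` is `('"Bob"', None)`.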

Let us remark at this point that, in the discussion of the intuitiveness of the different join semantics in Examples 3-5, we did not yet consider combinations of different join semantics, e.g. using b-joins for OPT and c-joins for AND patterns. We leave this for further work.
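The well-designedness conditions of Definitions 3 and 4 can be checked by a recursive walk over the pattern. The sketch below reflects one reading of "occurs outside $P'$" and makes several assumptions for illustration: patterns are nested tuples, triple patterns are tagged with `"t"`, variables are strings starting with `?`, only AND, OPT and UNION are covered, and the names `vars_of` and `well_designed` are not taken from the paper.

```python
def vars_of(p):
    """All variables occurring in pattern p."""
    if p[0] == "t":
        return {x for x in p[1:] if isinstance(x, str) and x.startswith("?")}
    return vars_of(p[1]) | vars_of(p[2])


def well_designed(p, outside=frozenset()):
    """Check Defs. 3 and 4; `outside` holds the variables used outside p."""
    if p[0] == "t":
        return True
    op, p1, p2 = p
    v1, v2 = vars_of(p1), vars_of(p2)
    if op == "OPT":
        # Def. 3: v in P2 and outside P' implies v in P1.
        if any(v in outside and v not in v1 for v in v2):
            return False
    if op == "UNION":
        # Def. 4: v outside P' implies v in both P1 and P2.
        if any(v in outside and not (v in v1 and v in v2) for v in v1 | v2):
            return False
    # For each child, the sibling's variables also count as "outside".
    return well_designed(p1, outside | v2) and well_designed(p2, outside | v1)


example3 = (
    "AND",
    ("OPT", ("t", "?X1", "a", "Person"), ("t", "?X1", "name", "?N")),
    ("OPT", ("t", "?X2", "a", "Person"), ("t", "?X2", "nick", "?N")),
)
example5 = (
    "OPT",
    ("OPT", ("t", "?X", "a", "Person"), ("t", "?X", "name", "?XNAME")),
    ("t", "?X", "nick", "?XNAME"),
)
simple_opt = ("OPT", ("t", "?X", "a", "Person"), ("t", "?X", "name", "?N"))
```

Under this reading, `well_designed(example3)` and `well_designed(example5)` are both false, in line with the remark after Definition 3, while the single optional pattern `simple_opt` passes the check.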

3. DATALOG AND ANSWER SETS
In this paper we will use a very general form of Datalog commonly referred to as Answer Set Programming (ASP), i.e. function-free logic programming (LP) under the answer set semantics [1, 11]. ASP is widely proposed as a useful tool for various problem-solving tasks, e.g. in Knowledge Representation and Deductive Databases. ASP extends Datalog with useful features such as negation as failure, disjunction in rule heads, aggregates [9], external predicates [8], etc.⁷

Let $Pred$, $Const$, $Var$, $exPr$ be sets of predicate, constant, and variable symbols, and external predicate names, respectively. Note that we assume all these sets, except $Pred$ and $Const$ (which may overlap), to be disjoint. In accordance with common notation in LP and the notation for external predicates from [7], we will in the following assume that $Const$ and $Pred$ comprise sets of numeric constants, string constants beginning with a lower case letter, '"'-quoted strings, and strings of the form ⟨quoted-string⟩^^⟨IRI⟩ or ⟨quoted-string⟩@⟨valid-lang-tag⟩; $Var$ is the set of string constants beginning with an upper case letter. Given $p \in Pred$, an atom is defined as $p(t_1, \ldots, t_n)$, where $n$ is called the arity of $p$ and $t_1, \ldots, t_n \in Const \cup Var$. Moreover, we define a fixed set of external predicates

\[ exPr = \{rdf, isBLANK, isIRI, isLITERAL, =, \text{!=}\} \]

All external predicates have a fixed semantics and fixed arities, distinguishing input and output terms. The atoms $isBLANK[c](val)$, $isIRI[c](val)$, $isLITERAL[c](val)$ test the input term $c \in Const \cup Var$ (in square brackets) for being a valid string representation of a blank node, an IRI reference or an RDF literal, returning an output value $val \in \{t, f, e\}$, representing truth, falsity or an error, following the semantics defined in [18, Sec. 11.3].
For the r df predicate we write atoms as r df [i](s, p, o) to denote that i  C onst  V ar is an input term, whereas s, p, o  C onst  V ar are output terms which may b e b ound by the external predicate. The external atom r df [i](s, p, o) is true if (s, p, o) is an RDF triple entailed by the RDF graph which is accessibly at IRI i. For the moment, we consider simple RDF entailment [13] only. Finally, we write comparison atoms 't1 = t2 ' and 't1 != t2 ' in infix notation with t1 , t2  C onst  V ar and the obvious semantics of (lexicographic or numeric) (in)equality. Here, for = either t1 or t2 is an output term, but at least one is an input term, and for != b oth t1 and t2 are input terms. Definition 7. Finally, a rule is of the form h :- b1 , . . . , bm , not bm+1 , . . . not bn . (1)

where h and bi (1  i  n) are atoms, bk (1  k  m) are either atoms or external atoms, and not is the symb ol for negation as failure. We use H (r ) to denote the head atom h and B (r ) to denote the set of all b ody literals B + (r )B - (r ) of r , where B + (r ) = {b1 , . . . , bm } and B - (r ) = {bm+1 , . . . , bn }. We consider ASP, more precisely a simplified version of ASP with so-called HEX-programs [8] here, since it is up to date the most general extension of Datalog.
7

The notion of input and output terms in external atoms described above denotes the binding pattern. More precisely, we assume the following condition, which extends the standard notion of safety (cf. [21]) in Datalog with negation: each variable appearing in a rule must appear in $B^+(r)$ in an atom or as an output term of an external atom.

Definition 8. A (logic) program $\Pi$ is defined as a set of safe rules $r$ of the form (1).

The Herbrand base of a program $\Pi$, denoted $HB_\Pi$, is the set of all possible ground versions of atoms and external atoms occurring in $\Pi$ obtained by replacing variables with constants from $Const$. For our purposes, we define $Const$ as the union of the set of all constants appearing in $\Pi$ and the literals, IRIs, and distinct constants for each blank node occurring in each RDF graph identified^8 by one of the IRIs in the set $I$, where $I$ is defined as the recursive closure of all IRIs appearing in $\Pi$ and in all RDF graphs identified by IRIs in $I$.^9 As long as we assume that the Web is finite, the grounding of a rule $r$, $ground(r)$, is defined by replacing each variable with the possible elements of $HB_\Pi$, and the grounding of program $\Pi$ is $ground(\Pi) = \bigcup_{r \in \Pi} ground(r)$.

An interpretation relative to $\Pi$ is any subset $I \subseteq HB_\Pi$ containing only atoms. We say that $I$ is a model of atom $a \in HB_\Pi$, denoted $I \models a$, if $a \in I$. With every external predicate name $g \in exPr$ with arity $n$ we associate an $(n+1)$-ary Boolean function $f_g$ (called oracle function) assigning each tuple $(I, t_1, \ldots, t_n)$ either 0 or 1.^10 We say that $I \subseteq HB_\Pi$ is a model of a ground external atom $a = g[t_1, \ldots, t_m](t_{m+1}, \ldots, t_n)$, denoted $I \models a$, iff $f_g(I, t_1, \ldots, t_n) = 1$.

The semantics we use here generalizes the answer-set semantics [11]^11 and is defined using the FLP-reduct [9], which is more elegant than the traditional GL-reduct [11] of stable model semantics and ensures minimality of answer sets also in the presence of external atoms.

Let $r$ be a ground rule. We define (i) $I \models B(r)$ iff $I \models a$ for all $a \in B^+(r)$ and $I \not\models a$ for all $a \in B^-(r)$, and (ii) $I \models r$ iff $I \models H(r)$ whenever $I \models B(r)$. We say that $I$ is a model of a program $\Pi$, denoted $I \models \Pi$, iff $I \models r$ for all $r \in ground(\Pi)$. The FLP-reduct [9] of $\Pi$ with respect to $I \subseteq HB_\Pi$, denoted $\Pi^I$, is the set of all $r \in ground(\Pi)$ such that $I \models B(r)$. $I$ is an answer set of $\Pi$ iff $I$ is a minimal model of $\Pi^I$.

We did not consider further extensions common to many ASP dialects here, namely disjunctive rule heads and strong negation [11]. We note that for non-recursive programs, i.e. where the predicate dependency graph is acyclic, the answer set is unique. For the pure translation which we will give in Sec. 4, where we will produce such non-recursive programs from SPARQL queries, we could equally take other semantics such as the well-founded semantics [10] into account, which coincides with ASP on non-recursive programs.

^8 By "identified" we mean here that IRIs denote network accessible resources which correspond to RDF graphs.
^9 We assume the number of accessible IRIs to be finite.
^10 The notion of an oracle function reflects the intuition that external predicates compute (sets of) outputs for a particular input, depending on the interpretation. The dependence on the interpretation is necessary for instance for defining the semantics of external predicates querying OWL [8] or computing aggregate functions.
^11 In fact, we use slightly simplified definitions from [7] for HEX-programs, with the sole difference that we restrict ourselves to a fixed set of external predicates.
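The definition of answer sets via the FLP-reduct can be illustrated by a brute-force check on small ground programs. The following Python sketch is only an illustration under simplifying assumptions: it ignores external atoms, represents a ground rule as a hypothetical triple `(head, positive_body, negative_body)` with atoms as plain strings, and enumerates all interpretations, which is feasible only for toy programs.

```python
from itertools import combinations


def body_holds(rule, interp):
    """I |= B(r): all positive body atoms are in I, no negative body atom is."""
    _, pos, neg = rule
    return all(a in interp for a in pos) and all(a not in interp for a in neg)


def is_model(rules, interp):
    """I |= r for every rule r: the head holds whenever the body holds."""
    return all(rule[0] in interp for rule in rules if body_holds(rule, interp))


def flp_reduct(rules, interp):
    """Keep exactly the ground rules whose body is satisfied by I."""
    return [r for r in rules if body_holds(r, interp)]


def answer_sets(rules):
    """All I that are minimal models of the FLP-reduct of the program w.r.t. I."""
    atoms = sorted({r[0] for r in rules} | {a for r in rules for a in r[1] | r[2]})
    candidates = [frozenset(c) for n in range(len(atoms) + 1)
                  for c in combinations(atoms, n)]
    result = []
    for interp in candidates:
        reduct = flp_reduct(rules, interp)
        if not is_model(reduct, interp):
            continue
        # minimality: no proper subset of I may be a model of the reduct
        if any(sub < interp and is_model(reduct, sub) for sub in candidates):
            continue
        result.append(interp)
    return result


# p :- not q.   q :- not p.   (recursion through negation: two answer sets)
choice = [("p", frozenset(), frozenset({"q"})),
          ("q", frozenset(), frozenset({"p"}))]

# a.   b :- a, not c.   (non-recursive: a unique answer set)
nonrec = [("a", frozenset(), frozenset()),
          ("b", frozenset({"a"}), frozenset({"c"}))]
```

On `nonrec`, the only answer set is {a, b}, in line with the remark that non-recursive programs have a unique answer set, whereas `choice` has the two answer sets {p} and {q}.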

4. FROM SPARQL TO DATALOG
We are now ready to define a translation from SPARQL to Datalog which can serve straightforwardly to implement SPARQL within existing rules engines. We start with a translation for c-joining semantics, which we will extend thereafter towards s-joining and b-joining semantics.

Translation $\Pi^c_Q$. Let $Q = (V, P, DS)$, where $DS = (G, G_n)$ as defined above. We translate this query to a logic program $\Pi^c_Q$ defined as follows.

$$
\begin{aligned}
\Pi^c_Q ={} & \{\, \texttt{triple(S, P, O, default) :- rdf[}d\texttt{](S, P, O).} \mid d \in G \,\} \\
\cup{} & \{\, \texttt{triple(S, P, O, }g\texttt{) :- rdf[}g\texttt{](S, P, O).} \mid g \in G_n \,\} \\
\cup{} & \tau(V, P, \texttt{default}, 1)
\end{aligned}
$$

The first two rule sets serve to import the relevant RDF triples from the dataset into a 4-ary predicate triple. Under the dataset closedness assumption (see Def. 1) we may replace the second rule set, which imports the named graphs, by:

    triple(S, P, O, G) :- rdf[G](S, P, O), HU(G), isIRI(G).

Here, the predicate HU stands for "Herbrand universe", where we use this name a bit sloppily, with the intention to cover all the relevant part of $C$, recursively importing all possible IRIs in order to emulate the dataset closedness assumption. HU can be computed recursively over the input triples, i.e.

    HU(X) :- triple(X, P, O, D).
    HU(X) :- triple(S, X, O, D).
    HU(X) :- triple(S, P, X, D).
    HU(X) :- triple(S, P, O, X).

The remaining program $\tau(V, P, \texttt{default}, 1)$ represents the actual query translation, where $\tau$ is defined recursively as shown in Fig. 2. By $LT(\cdot)$ we mean the set of rules resulting from disassembling complex FILTER expressions (involving '¬', '∧', '∨') according to the rewriting defined by Lloyd and Topor [15], where we have to obey the semantics for errors, following Definition 2. In a nutshell, the rewriting LT-rewrite(·) proceeds as follows: Complex filters involving ¬ are transformed into negation normal form. Conjunctions of filter expressions are simply disassembled to conjunctions of body literals; disjunctions are handled by splitting the respective rule for both alternatives in the standard way. The resulting rules involve possibly negated atomic filter expressions in the bodies. Here, BOUND(v) is translated to v != null, and ¬BOUND(v) to v = null. isBLANK(v), isIRI(v), isLITERAL(v) and their negated forms are replaced by their corresponding external atoms (see Sec. 3) isBLANK[v](t) or isBLANK[v](f), etc., respectively.

$$
\begin{aligned}
\tau(V, (s, p, o), D, i) ={} & \mathtt{answer}_i(V, D) \text{ :- } \mathtt{triple}(s, p, o, D). && (1)\\
\tau(V, (P'\ \mathrm{AND}\ P''), D, i) ={} & \tau(vars(P'), P', D, 2 \cdot i) \cup \tau(vars(P''), P'', D, 2 \cdot i + 1)\ \cup \\
& \mathtt{answer}_i(V, D) \text{ :- } \mathtt{answer}_{2i}(vars(P'), D),\ \mathtt{answer}_{2i+1}(vars(P''), D). && (2)\\
\tau(V, (P'\ \mathrm{UNION}\ P''), D, i) ={} & \tau(vars(P'), P', D, 2 \cdot i) \cup \tau(vars(P''), P'', D, 2 \cdot i + 1)\ \cup \\
& \mathtt{answer}_i(V[(V \setminus vars(P')) \to \mathtt{null}], D) \text{ :- } \mathtt{answer}_{2i}(vars(P'), D). && (3)\\
& \mathtt{answer}_i(V[(V \setminus vars(P'')) \to \mathtt{null}], D) \text{ :- } \mathtt{answer}_{2i+1}(vars(P''), D). && (4)\\
\tau(V, (P'\ \mathrm{MINUS}\ P''), D, i) ={} & \tau(vars(P'), P', D, 2 \cdot i) \cup \tau(vars(P''), P'', D, 2 \cdot i + 1)\ \cup \\
& \mathtt{answer}_i(V[(V \setminus vars(P')) \to \mathtt{null}], D) \text{ :- } \mathtt{answer}_{2i}(vars(P'), D), \\
& \qquad \mathit{not}\ \mathtt{answer}'_{2i}(vars(P') \cap vars(P''), D). && (5)\\
& \mathtt{answer}'_{2i}(vars(P') \cap vars(P''), D) \text{ :- } \mathtt{answer}_{2i+1}(vars(P''), D). && (6)\\
\tau(V, (P'\ \mathrm{OPT}\ P''), D, i) ={} & \tau(V, (P'\ \mathrm{AND}\ P''), D, i) \cup \tau(V, (P'\ \mathrm{MINUS}\ P''), D, i) \\
\tau(V, (P\ \mathrm{FILTER}\ R), D, i) ={} & \tau(vars(P), P, D, 2 \cdot i)\ \cup \\
& LT(\mathtt{answer}_i(V, D) \text{ :- } \mathtt{answer}_{2i}(vars(P), D),\ R.) && (7)\\
\tau(V, (\mathrm{GRAPH}\ g\ P), D, i) ={} & \tau(V, P, g, i)\ \cup \quad \text{for } g \in V \cup I \\
& \mathtt{answer}_i(V, D) \text{ :- } \mathtt{answer}_i(V, g),\ \mathtt{isIRI}(g),\ \mathit{not}\ g = \mathtt{default}. && (8)\\[4pt]
\text{Alternate rules replacing (5)+(6):}\quad & \\
& \mathtt{answer}_i(V[(V \setminus vars(P')) \to \mathtt{null}], D) \text{ :- } \mathtt{answer}_{2i}(vars(P'), D), \\
& \qquad \mathit{not}\ \mathtt{answer}'_{2i}(vars(P'), D). && (5')\\
& \mathtt{answer}'_{2i}(vars(P'), D) \text{ :- } \mathtt{answer}_{2i}(vars(P'), D),\ \mathtt{answer}_{2i+1}(vars(P''), D). && (6')
\end{aligned}
$$

Figure 2: Translation $\Pi^c_Q$ from SPARQL queries to Datalog.

The resulting program $\Pi^c_Q$ implements the c-joining semantics in the following sense:

Proposition 4 (Soundness and completeness of $\Pi^c_Q$). For each atom of the form $\mathtt{answer}_1(\vec{s}, \mathtt{default})$ in the unique answer set $M$ of $\Pi^c_Q$, $\vec{s}$ is a solution tuple of $Q$ with respect to the c-joining semantics, and all solution tuples of $Q$ are represented by the extension of predicate $\mathtt{answer}_1$ in $M$.

Without giving a proof, we remark that the result follows if we convince ourselves that $\tau(V, P, D, i)$ emulates exactly the recursive definition of $[[P]]^x_{DS}$. Moreover, together with Proposition 3, we obtain soundness and completeness of $\Pi^c_Q$ for b-joining and s-joining semantics as well for well-designed query patterns.

Corollary 1. For $Q = (V, P, DS)$, if $P$ is well-designed, then the extension of predicate $\mathtt{answer}_1$ in the unique answer set $M$ of $\Pi^c_Q$ represents all and only the solution tuples for $Q$ with respect to the x-joining semantics, for $x \in \{b, c, s\}$.

Now, in order to obtain a proper translation for arbitrary patterns, we obviously need to focus our attention on the possibly-null-binding variables within the query pattern $P$. Let $vnull(P)$ denote the possibly-null-binding variables in a (sub)pattern $P$. We need to consider all rules in Fig. 2 which involve x-joins, i.e. the rules of the forms (2), (5) and (6).
Since rules (5) and (6) do not make this join explicit, we will replace them by the equivalent rules (5') and (6') for $\Pi^s_Q$ and $\Pi^b_Q$. The "extensions" to s-joining and b-joining semantics can be achieved by rewriting the rules (2) and (6'). The idea is to rename variables and add proper FILTER expressions to these rules in order to realize the b-joining and s-joining behavior for the variables in

$$V_N = vnull(P) \cap vars(P') \cap vars(P'').$$

Translation $\Pi^s_Q$. The s-joining behavior can be achieved by adding FILTER expressions

$$R^s = \bigwedge_{v \in V_N} BOUND(v)$$

to the rule bodies of (2) and (6'). The resulting rules are again subject to the LT-rewriting as discussed above for the rules of the form (7). This is sufficient to filter out any joins involving null values, thus achieving s-joining semantics, and we denote the program rewritten that way as $\Pi^s_Q$.

Translation $\Pi^b_Q$. Obviously, b-joining semantics is more tricky to achieve, since we now have to relax the allowed joins in order to allow null bindings to join with any other value. We will again achieve this result by modifying rules (2) and (6'), where we first do some variable renaming and then add respective FILTER expressions to these rules.

Step 1. We rename each variable $v \in V_N$ in the respective rule bodies to $v'$ or $v''$, in order to disambiguate the occurrences originally from sub-pattern $P'$ or $P''$, respectively. That is, for each rule (2) or (6'), we rewrite the body to:

$$\mathtt{answer}_{2i}(vars(P')[V_N \to V'_N], D),\ \mathtt{answer}_{2i+1}(vars(P'')[V_N \to V''_N], D).$$

Step 2. We now add the following FILTER expressions $R^b_{(2)}$ and $R^b_{(6')}$, respectively, to the resulting rule bodies, which "emulate" the relaxed b-compatibility:

$$
\begin{aligned}
R^b_{(2)} &= \bigwedge_{v \in V_N} \big( ((v = v') \wedge (v' = v'')) \vee ((v = v') \wedge \neg BOUND(v'')) \vee ((v = v'') \wedge \neg BOUND(v')) \big) \\
R^b_{(6')} &= \bigwedge_{v \in V_N} \big( ((v = v') \wedge (v' = v'')) \vee ((v = v') \wedge \neg BOUND(v'')) \vee ((v = v') \wedge \neg BOUND(v')) \big)
\end{aligned}
$$

The rewritten rules are again subject to the LT rewriting. Note that, strictly speaking, the filter expression introduced here does not fulfill the assumption of safe filter expressions, since it creates new bindings for the variable $v$. However, these can safely be allowed here, since the translation only creates valid input/output term bindings for the external Datalog predicate '='. The subtle difference between $R^b_{(2)}$ and $R^b_{(6')}$ lies in the fact that $R^b_{(2)}$ preferably "carries over" bound values from $v'$ or $v''$ to $v$, whereas $R^b_{(6')}$ always takes the value of $v'$. The effect of this becomes obvious in the translation of Example 5, which we leave as an exercise to the reader. We note that the potential exponential (with respect to $|V_N|$) blowup of the program size by unfolding the filter expressions into negation normal form during the LT rewriting^12 is not surprising, given the negative complexity results in [16].

In total, we obtain a program $\Pi^b_Q$ which reflects the normative b-joining semantics. Consequently, we get sound and complete query translations for all three semantics:

Corollary 2 (Soundness and completeness of $\Pi^x_Q$). Given an arbitrary graph pattern $P$, the extension of predicate $\mathtt{answer}_1$ in the unique answer set $M$ of $\Pi^x_Q$ represents all and only the solution tuples for $Q = (V, P, DS)$ with respect to the x-joining semantics, for $x \in \{b, c, s\}$.

In the following, we will drop the superscript $x$ in $\Pi_Q$ and implicitly refer to the normative b-joining translation/semantics.
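The recursive structure of $\tau$ in Fig. 2 can be sketched as a small rule generator. The Python below is a hedged illustration for the c-joining case only, covering triple patterns, AND, UNION, MINUS and OPT (rules (1)-(6)); FILTER, GRAPH and the s-/b-joining rewritings are left out. The tuple encoding of patterns, the function names, and the spelling `answer2p` for the primed predicate $\mathtt{answer}'_{2i}$ are assumptions made for this example, not the syntax of the prototype.

```python
def term(t):
    """Map a SPARQL variable '?x' to a Datalog variable 'X'; keep constants as-is."""
    return t[1].upper() + t[2:] if t.startswith("?") else t


def vars_of(pattern):
    """Sorted list of the variables occurring in a pattern."""
    if pattern[0] == "triple":
        return sorted({t for t in pattern[1:] if t.startswith("?")})
    return sorted(set(vars_of(pattern[1])) | set(vars_of(pattern[2])))


def answer(i, args, d, prime=False):
    """Atom answer_i(args, d); suffix 'p' is a hypothetical spelling of answer'_i."""
    name = f"answer{i}{'p' if prime else ''}"
    return f"{name}({', '.join([term(a) for a in args] + [d])})"


def nulled(V, keep):
    """V[(V minus keep) -> null]: variables outside keep become the constant null."""
    return [v if v in keep else "null" for v in V]


def tau(V, pattern, d, i):
    """Rules for pattern with output variables V, graph d and predicate index i."""
    kind = pattern[0]
    if kind == "triple":                                   # rule (1)
        s, p, o = (term(t) for t in pattern[1:])
        return [f"{answer(i, V, d)} :- triple({s}, {p}, {o}, {d})."]
    left, right = pattern[1], pattern[2]
    if kind == "opt":                                      # OPT = AND united with MINUS
        rules = tau(V, ("and", left, right), d, i) + tau(V, ("minus", left, right), d, i)
        return list(dict.fromkeys(rules))                  # set union, order preserved
    v1, v2 = vars_of(left), vars_of(right)
    rules = tau(v1, left, d, 2 * i) + tau(v2, right, d, 2 * i + 1)
    a1, a2 = answer(2 * i, v1, d), answer(2 * i + 1, v2, d)
    if kind == "and":                                      # rule (2)
        rules.append(f"{answer(i, V, d)} :- {a1}, {a2}.")
    elif kind == "union":                                  # rules (3), (4)
        rules.append(f"{answer(i, nulled(V, v1), d)} :- {a1}.")
        rules.append(f"{answer(i, nulled(V, v2), d)} :- {a2}.")
    elif kind == "minus":                                  # rules (5), (6)
        shared = [v for v in v1 if v in v2]
        aux = answer(2 * i, shared, d, prime=True)
        rules.append(f"{answer(i, nulled(V, v1), d)} :- {a1}, not {aux}.")
        rules.append(f"{aux} :- {a2}.")
    else:
        raise ValueError(f"unsupported pattern: {kind}")
    return rules


# Example: ?x name ?n OPT ?x mail ?m
opt_query = ("opt", ("triple", "?x", "name", "?n"), ("triple", "?x", "mail", "?m"))
opt_rules = tau(["?x", "?n", "?m"], opt_query, "default", 1)
```

For the example pattern, `opt_rules` contains five rules: the two triple-pattern rules for `answer2` and `answer3`, the join rule for `answer1`, and the two MINUS rules that produce a `null` binding for `?m` when no mail triple matches.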


5. POSSIBLE EXTENSIONS
As it turns out, the embedding of SPARQL in the rules world opens a wide range of possibilities for combinations. In this section, we will first discuss some straightforward extensions of SPARQL which come practically for free with the translation to Datalog provided before. We will then discuss the use of SPARQL itself as a simple RDF rules language^13 which allows to combine RDF fact bases with implicitly specified further facts, and discuss the semantics thereof briefly. We conclude this section with revisiting the open issue of entailment regimes covering RDFS or OWL semantics in SPARQL.

5.1 Additional Language Features
Set Difference. As mentioned before, set difference is not present in the current SPARQL specification syntactically, though hidden, and would need to be emulated via a combination of OPTIONAL and FILTER constructs. As we defined the MINUS operator here in a completely modular fashion, it could be added straightforwardly without affecting the semantics definition.

Nested queries. Nested queries are a distinct feature of SQL not present in SPARQL. We suggest a simple, but useful form of nested queries to be added: Boolean queries $Q_{ASK} = (\emptyset, P_{ASK}, DS_{ASK})$ with an empty result form (denoted by the keyword ASK) can be safely allowed within FILTER expressions as an easy extension fully compatible with our translation. Given a query $Q = (V, P, DS)$ with subpattern $(P_1\ \mathrm{FILTER}\ (\mathrm{ASK}\ Q_{ASK}))$, we can modularly translate such subqueries by extending $\Pi_Q$ with $\Pi_{Q'}$, where $Q' = (vars(P_1) \cap vars(P_{ASK}), P_{ASK}, DS_{ASK})$. Moreover, we have to rename predicate names $\mathtt{answer}_i$ to $\mathtt{answer}^{Q'}_i$ in $\Pi_{Q'}$. Some additional considerations are necessary in order to combine this within arbitrary complex filter expressions, and we probably need to impose well-designedness for variables shared between $P$ and $P_{ASK}$ similar to Def. 4. We leave more details as future work.

5.2 Result Forms and Solution Modifiers
We have covered only SELECT queries so far. As shown in the previous section, we can consider ASK queries equally. A limited form of the CONSTRUCT result form, which allows to construct new triples, could be emulated in our approach as well. Namely, we can allow queries of the form $Q_C = (\mathrm{CONSTRUCT}\ P_C, P, DS)$, where $P_C$ is a graph pattern consisting only of bNode-free triple patterns. We can model these by adding a rule

$$\mathtt{triple}(s, p, o, \mathtt{C}) \text{ :- } \mathtt{answer}_1(vars(P_C), \mathtt{default}). \qquad (2)$$

to $\Pi_Q$ for each triple $(s, p, o)$ in $P_C$. The result graph is then naturally represented in the answer set of the program extended that way, in the extension of the predicate triple.

5.3 SPARQL as a Rules Language
As it turns out with the extensions defined in the previous subsections, SPARQL itself may be viewed as an expressive rules language on top of RDF. CONSTRUCT statements have an obvious similarity with view definitions in SQL, and thus may be seen as rules themselves. Intuitively, in the translation of CONSTRUCT we "stored" the new triples in a new triple outside the dataset $DS$. We can imagine a similar construction in order to define the semantics of queries over datasets mixing such CONSTRUCT statements with RDF data in the same Turtle file. Let us assume such a mixed file containing CONSTRUCT rules and RDF triples web-accessible at IRI $g$, and a query $Q = (V, P, DS)$, with $DS = (G, G_n)$. The semantics of a query over a dataset containing $g$ may then be defined by recursively adding $\Pi_{Q_C}$ to $\Pi_Q$ for any CONSTRUCT query $Q_C$ in $g$, plus the rules (2) above with their head changed to triple(s, p, o, g). We further need to add a rule

    triple(s, p, o, default) :- triple(s, p, o, g).

for each $g \in G$, in order not to omit any of the implicit triples defined by such "CONSTRUCT rules". Analogously to the considerations for nested ASK queries, we need to rename the $\mathtt{answer}_i$ predicates and default constants in every subprogram $\Pi_{Q_C}$ defined this way. Naturally, the resulting programs possibly involve recursion, and, even worse, recursion over negation as failure. Fortunately, the general answer set semantics, which we use, can cope with this. For some important aspects of the semantics of such distributed rules and facts bases, we refer to [17], where we also outline an alternative semantics based on the well-founded semantics. A more in-depth investigation of the complexity and other semantic features of such a combination is on our agenda.
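The emulation of CONSTRUCT by rules of the form triple(s, p, o, C) :- answer1(vars(PC), default), and the variant with the head graph changed to g, can be sketched as a small generator. This Python fragment is a hedged illustration: the function names, the list-of-tuples encoding of the template $P_C$, and the default graph constant `constructed` are assumptions made for the example.

```python
def term(t):
    """Map a SPARQL variable '?x' to a Datalog variable 'X'; keep constants as-is."""
    return t[1].upper() + t[2:] if t.startswith("?") else t


def construct_rules(template, graph="constructed"):
    """One rule per bNode-free triple pattern of the CONSTRUCT template.

    template: list of (s, p, o) tuples; graph: constant naming the result graph
    (hypothetical default; pass the IRI constant g for 'CONSTRUCT rules' stored at g).
    """
    variables = sorted({t for tp in template for t in tp if t.startswith("?")})
    body = "answer1(" + ", ".join([term(v) for v in variables] + ["default"]) + ")"
    return [f"triple({term(s)}, {term(p)}, {term(o)}, {graph}) :- {body}."
            for (s, p, o) in template]


# Example template: CONSTRUCT { ?x knows ?y . ?y knows ?x }
symmetric = construct_rules([("?x", "knows", "?y"), ("?y", "knows", "?x")], graph="g")
```

Each generated rule reuses the same `answer1` body, so the constructed triples are exactly the template instances for the solutions of the query pattern.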
^12 Lloyd and Topor can avoid this potential exponential blowup by introducing new auxiliary predicates. However, we cannot do the same trick, mainly for reasons of preserving safety of external predicates as defined in Sec. 3.
^13 Thus, the "... (and back)" in the title of this paper!

5.4 Revisiting Entailment Regimes
The current SPARQL specification does not treat entailment regimes beyond RDF simple entailment. Strictly speaking, even RDF entailment is already problematic as a basis for SPARQL query evaluation; a simple query pattern like P = (?X, rdf:type, rdf:Property) would have infinitely many solutions even on the empty (sic!) dataset by matching the infinitely many axiomatic triples in the RDF(S) semantics. Finite rule sets which approximate the RDF(S) semantics in terms of positive Datalog rules [17] have been implemented in systems like TRIPLE^14 or JENA^15. Similarly, fragments and extensions of OWL [12, 3, 14] definable in terms of Datalog rule bases have been proposed in the literature. Such rule bases can be parametrically combined with our translations, implementing what one might call RDFS- or OWL-entailment at least. It remains to be seen whether the SPARQL working group will define such reduced entailment regimes. More complex issues arise when combining a nonmonotonic query language like SPARQL with ontologies in OWL. An embedding of SPARQL into a nonmonotonic rules language might provide valuable insights here, since it opens up a whole body of work done on combinations of such languages with ontologies [7, 19].
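As an illustration of such finite rule sets, the following minimal Python sketch applies two of the standard RDFS entailment rules (transitivity of rdfs:subClassOf and propagation of rdf:type along rdfs:subClassOf) by naive forward chaining, mirroring positive Datalog rules over the triple predicate. The function name, the string-tuple encoding of triples and the example data are assumptions made for illustration, and the two rules cover only a small fragment of RDFS.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def rdfs_closure(triples):
    """Naive fixpoint for two positive rules:

        triple(C, subClassOf, E) :- triple(C, subClassOf, D), triple(D, subClassOf, E).
        triple(X, type, D)       :- triple(X, type, C), triple(C, subClassOf, D).
    """
    closure = set(triples)
    while True:
        derived = set()
        for (s, p, o) in closure:
            for (s2, p2, o2) in closure:
                if p2 == SUBCLASS and o == s2:
                    if p == SUBCLASS:
                        derived.add((s, SUBCLASS, o2))
                    elif p == TYPE:
                        derived.add((s, TYPE, o2))
        if derived <= closure:      # nothing new: fixpoint reached
            return closure
        closure |= derived


graph = {
    (":tom", TYPE, ":Cat"),
    (":Cat", SUBCLASS, ":Mammal"),
    (":Mammal", SUBCLASS, ":Animal"),
}
entailed = rdfs_closure(graph)
```

Because both rules are positive and introduce no new constants, the fixpoint is finite for any finite input; in particular the closure of the empty graph is empty, in contrast to the infinitely many axiomatic triples of the full RDF(S) semantics.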


6. CONCLUSIONS & OUTLOOK
In this paper, we presented three possible semantics for SPARQL based on [16], which differ mainly in their treatment of joins, and their translations to Datalog rules. We discussed the intuitive behavior of these different joins in several examples. As it turned out, the s-joining semantics, which is close to the traditional treatment of joins over incomplete relations, and the c-joining semantics are nicely embeddable into Datalog. The b-joining semantics, which reflects the normative behavior as described by the current SPARQL specification, is most difficult to translate. We also suggested some extensions of SPARQL based on this translation. Further, we hope to have contributed to clarifying the relationships between the Query, Rules and Ontology layers of the Semantic Web architecture with the present work.

A prototype of the presented translation has been implemented on top of the dlvhex system, a flexible framework for developing extensions for the declarative Logic Programming Engine DLV^16. The prototype is available as a plugin at http://con.fusion.at/dlvhex/. The web page also provides an online interface for evaluation, where the reader can check translation results for various example queries, which we had to omit here for space reasons. We have currently implemented the c-joining and b-joining semantics, and we plan to gradually extend the prototype towards the features mentioned in Sec. 5, in order to query mixed RDF+SPARQL rule and fact bases. Implementation of further extensions, such as the integration of aggregates typical for database query languages, and recently defined for recursive Datalog programs in a declarative way compatible with the answer set semantics [9], is on our agenda. We are currently not aware of any other engine implementing the full semantics defined in [16].

7. ACKNOWLEDGMENTS
Special thanks go to Jos de Bruijn and Reto Krummenacher for discussions on earlier versions of this document, to Bijan Parsia, Jorge Pérez, and Andy Seaborne for valuable email discussions, to Roman Schindlauer for his help on the prototype implementation on top of dlvhex, and to the anonymous reviewers for various useful comments. This work is partially supported by the Spanish MEC under the project TIC-2003-9001 and by the EC funded projects TripCom (FP6-027324) and KnowledgeWeb (IST 507482).

^14 http://triple.semanticweb.org/
^15 http://jena.sourceforge.net/
^16 http://www.dlvsystem.com/

8. REFERENCES
[1] C. Baral. Knowledge Representation, Reasoning and Declarative Problem Solving. Cambr. Univ. Press, 2003.
[2] D. Beckett. Turtle - Terse RDF Triple Language. Tech. Report, 4 Apr. 2006.
[3] J. de Bruijn, A. Polleres, R. Lara, D. Fensel. OWL DL vs. OWL Flight: Conceptual modeling and reasoning for the semantic web. In Proc. WWW-2005, 2005.
[4] J. Carroll, C. Bizer, P. Hayes, P. Stickler. Named graphs. Journal of Web Semantics, 3(4), 2005.
[5] R. Cyganiak. A relational algebra for SPARQL. Tech. Report HPL-2005-170, HP Labs, Sept. 2005.
[6] J. de Bruijn, E. Franconi, S. Tessaris. Logical reconstruction of normative RDF. OWL: Experiences and Directions Workshop (OWLED-2005), 2005.
[7] T. Eiter, G. Ianni, A. Polleres, R. Schindlauer, H. Tompits. Reasoning with rules and ontologies. Reasoning Web 2006, 2006. Springer.
[8] T. Eiter, G. Ianni, R. Schindlauer, H. Tompits. A Uniform Integration of Higher-Order Reasoning and External Evaluations in Answer Set Programming. Int.l Joint Conf. on Art. Intelligence (IJCAI), 2005.
[9] W. Faber, N. Leone, G. Pfeifer. Recursive aggregates in disjunctive logic programs: Semantics and complexity. Proc. of the 9th European Conf. on Art. Intelligence (JELIA 2004), 2004. Springer.
[10] A. V. Gelder, K. Ross, J. Schlipf. Unfounded sets and well-founded semantics for general logic programs. 7th ACM Symp. on Principles of Database Systems, 1988.
[11] M. Gelfond, V. Lifschitz. Classical Negation in Logic Programs and Disjunctive Databases. New Generation Computing, 9:365-385, 1991.
[12] B. N. Grosof, I. Horrocks, R. Volz, S. Decker. Description logic programs: Combining logic programs with description logics. Proc. WWW-2003, 2003.
[13] P. Hayes. RDF semantics. W3C Recommendation, 10 Feb. 2004. http://www.w3.org/TR/rdf-mt/
[14] H. J. ter Horst. Completeness, decidability and complexity of entailment for RDF Schema and a semantic extension involving the OWL vocabulary. Journal of Web Semantics, 3(2), July 2005.
[15] J. W. Lloyd, R. W. Topor. Making Prolog more expressive. Journal of Logic Programming, 1(3):225-240, 1984.
[16] J. Pérez, M. Arenas, C. Gutierrez. Semantics and complexity of SPARQL. The Semantic Web - ISWC 2006, 2006. Springer.
[17] A. Polleres, C. Feier, A. Harth. Rules with contextually scoped negation. Proc. 3rd European Semantic Web Conf. (ESWC2006), 2006. Springer.
[18] E. Prud'hommeaux, A. S. (ed.). SPARQL Query Language for RDF, W3C Working Draft, 4 Oct. 2006. http://www.w3.org/TR/rdf-sparql-query/
[19] R. Rosati. Reasoning with Rules and Ontologies. Reasoning Web 2006, 2006. Springer.
[20] SQL-99. Information Technology - Database Language SQL - Part 3: Call Level Interface (SQL/CLI). Technical Report INCITS/ISO/IEC 9075-3, INCITS/ISO/IEC, Oct. 1999. Standard specification.
[21] J. D. Ullman. Principles of Database and Knowledge Base Systems. Computer Science Press, 1989.