Does SPARQL take a Closed World Assumption (CWA) and/or a Unique Name Assumption (UNA)?

I've seen people claim in various places that SPARQL has a Closed World Assumption (CWA ... anything unknown is false) and a Unique Name Assumption (UNA ... each "name" refers to some unique thing).

An example from this paper:

Querying Semantic Web Data with SPARQL. Marcelo Arenas, Jorge Perez. PODS 2011.

"Nevertheless, SPARQL has adopted a semantics based on a closed world assumption."

For evidence of CWA, people cite issues when considering SPARQL features like negation as failure (!BOUND/OPTIONAL or NOT EXISTS/MINUS) and counts (returns an exact value like "3", not "≥3").

The UNA of SPARQL has also come up in discussion with various academic types. People cite issues with two URIs that could refer to the same thing not being considered equal, or counts possibly counting the same thing twice if it is given two RDF term references.

The argument then is that the UNA and CWA of SPARQL are somehow incompatible with the No-UNA and OWA of RDF and OWL.

I take exception to this view of SPARQL adopting a CWA or a UNA. For me, SPARQL is defined in terms of lookups on data and data elements (RDF terms). Leaving completely aside SPARQL entailment and datatypes, SPARQL says nothing about what these data mean in relation to the real-world and SPARQL does not say anything about what things might refer to. In other words, SPARQL does not consider an interpretation of the data it indexes.

Thus, for example, when asking a count query, I don't see that as meaning, e.g., "how many presidents have there been in the United States" (SPARQL could only answer "≥0" or "≥1" in all such cases), I see this as asking "how many results/RDF terms for this query can I find for the data indexed" (which can be a non-trivial and actually useful answer). Similarly, in SPARQL I do not expect to ask "has there (not) been a female US president" ... I expect to ask "do these data (not) contain any binding for this query looking for female US presidents".
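To make the "lookup on data" reading concrete, here is a minimal sketch in plain Python (not a real SPARQL engine). The graph, the ex: names, and the helper count_subjects are all invented for illustration; the point is only that the count is over RDF terms found in the indexed data, with no claim about what those terms denote.

```python
# Hypothetical toy data: a graph is a set of (subject, predicate, object) triples.
# All ex: names are invented for illustration.
GRAPH = {
    ("ex:Obama", "rdf:type", "ex:USPresident"),
    ("ex:BarackObama", "rdf:type", "ex:USPresident"),  # a second name, possibly for the same person
    ("ex:Lincoln", "rdf:type", "ex:USPresident"),
}


def count_subjects(graph, predicate, obj):
    """Count distinct subject terms ?s with (?s, predicate, obj) in the graph.

    Roughly mirrors:
      SELECT (COUNT(DISTINCT ?s) AS ?n) WHERE { ?s predicate obj }
    """
    return len({s for (s, p, o) in graph if p == predicate and o == obj})


# The answer is a statement about terms in the data, not about the world:
# two URIs that may denote one person are simply two matching terms.
n = count_subjects(GRAPH, "rdf:type", "ex:USPresident")
print(n)
```

Read this way, the result says "three distinct terms match in these data", which is neither an upper nor a lower bound on the number of actual presidents.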

In summary, without something like interpretations, I don't see that SPARQL has anything to do with either assumption, let alone that it falls on one side or the other (CWA/OWA and UNA/No UNA).

Nonetheless, I've seen this claim that SPARQL has CWA and/or UNA multiple times and in fairly notable contexts. So I wonder if this is their misconception or mine?

In summary: Is it erroneous to say that SPARQL takes a CWA? Is it erroneous to say that SPARQL takes a UNA?

First, I would make a distinction between the "entailment layer" and the "query layer" in SPARQL, by which I mean that, conceptually, it is not the underlying graph itself that is queried but its entailment closure with regard to the entailment regime in use. The idea of SPARQL entailment regimes already existed in SPARQL 1.0, although the only entailment regime then was RDF simple entailment and thus almost "invisible". Now, with this distinction, we can split the question into separate questions for the entailment layer and for the query layer.

For the entailment layer, the answer is simply that the CWA or UNA holds iff it holds for the corresponding entailment regime, because the entailment layer is all about determining the entailment closure of the underlying graph w.r.t. the entailment regime. In the case of all the built-in SPARQL 1.1 entailment regimes (including the original SPARQL 1.0 entailment regime), the answer is "no" and "no", respectively. However, it would be possible to create a custom entailment regime for which the CWA or UNA or both hold. But I'm not going into this topic here.

Now to the query layer!

Concerning CWA:

In general, it isn't perfectly clear what the CWA is meant to be at all. Typically, when people talk about CWA vs. OWA, they come up with concrete examples, such as Negation-as-Failure vs. classical negation, or existential property restrictions as UML-style constraints vs. OWL-style entailment-enabling features. The point is that "CWA" isn't a well-defined term. The way I would give a vague CWA characterization would be to say: a (not "the") CWA holds if one can get specific results from the assumption that the dataset one operates on is all there is (and not only an excerpt of a possibly larger, unknown "world"). That's for example the case for negation-as-failure as well as for the UML-like constraints, because negation-as-failure holds if the searched-for information is not available either explicitly or via some form of calculation, and similarly for the constraint. But the two features are pretty independent of each other: one might be part of a particular CWA while the other isn't, or neither may hold but a third feature instead, so this vague characterization doesn't determine the actual features of a CWA.

What we can do is to introduce a monotony criterion for querying first, along the following lines: for some query language and protocol, given a query Q, a dataset D1, and a second dataset D2 that fully includes D1 as a subset, then the resultset from applying Q to D1 is a subset of the resultset from applying Q to D2.

I would expect that any form of CWA would be incompatible with this criterion, because with a CWA, I would expect to be able to find a dataset D1 and query Q, such that when I query D1 with Q, some D1-specific information would pop up in the resultset that would not appear when I use Q to query some super-dataset D2 of D1 - because for the first query application, D1 is "all there is", while for the second query application, this is not the case anymore.

Now, if we agree on such a monotony criterion, we can start to analyse SPARQL. If there were only basic graph patterns (BGP) in SPARQL, then, if I'm not mistaken, monotony would hold, and so no CWA would hold. However, with the negation-style features of SPARQL 1.1 (which were already implicitly available in SPARQL 1.0), we can create a query that produces a result R based on the fact that a certain triple T can /not/ be found in the queried graph. But if we then extend the queried graph by adding T, querying this extended graph will not provide the result R anymore (because now T can be found). Hence, our notion of monotony does /not/ hold in SPARQL, so a form of CWA does indeed hold for SPARQL.
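The argument can be sketched in a few lines of plain Python. This is only a toy model under stated assumptions: a graph is a set of triples, match_bgp stands in for a one-triple basic graph pattern, and subjects_without approximates a FILTER NOT EXISTS query. The function names and the ex: data are hypothetical, and none of this follows the actual SPARQL algebra.

```python
def match_bgp(graph, predicate):
    """Solutions of the one-triple pattern { ?s predicate ?o } as (s, o) pairs."""
    return {(s, o) for (s, p, o) in graph if p == predicate}


def subjects_without(graph, required, absent):
    """Subjects with some (?s, required, ?o) but no (?s, absent, ?x).

    Roughly mirrors:
      SELECT DISTINCT ?s WHERE {
        ?s required ?o FILTER NOT EXISTS { ?s absent ?x }
      }
    """
    has_absent = {s for (s, p, o) in graph if p == absent}
    return {s for (s, p, o) in graph if p == required and s not in has_absent}


# D1 is the smaller dataset; D2 extends it with the single triple T.
D1 = {("ex:alice", "ex:name", "Alice"), ("ex:bob", "ex:name", "Bob")}
T = ("ex:bob", "ex:email", "bob@example.org")
D2 = D1 | {T}

# A plain pattern is monotone here: every solution over D1 is also one over D2.
bgp_monotone = match_bgp(D1, "ex:name") <= match_bgp(D2, "ex:name")

# The negation-style query is not: ex:bob is a result over D1 only
# because T is absent, and it disappears once T is added.
r1 = subjects_without(D1, "ex:name", "ex:email")
r2 = subjects_without(D2, "ex:name", "ex:email")
negation_monotone = r1 <= r2

print(bgp_monotone, negation_monotone)
```

In this toy setting the result ex:bob plays the role of R: it is returned for D1 and lost for D2, which is exactly the failure of the monotony criterion described above.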

Concerning UNA:

Here, I would agree that the term UNA is not particularly useful for querying. We could talk about the relationship between the nodes in a graph pattern and the matched nodes in the queried graph. But what we have here is, on the one hand, URIs and literals that match exactly the same URIs and literals in the graph - so one could call this a kind of trivial UNA, if one likes. But then, we also have query variables, which generally will be assigned many different nodes in the queried graph within one query application. So if we take query variables into account, then the UNA would /not/ hold. Either way, it isn't a useful idea, AFAICT.

My answer would be that SPARQL takes neither a CWA nor a UNA, simply because these notions do not apply to a query language. I'll just talk about CWA here because Michael already said all that must be said about it. Beware, this is going into some logical considerations before getting to the point.

CWA applies to logical formalisms. It is hard to see SPARQL as one. Basically, a logical formalism (or a logic, for short) has a notion of logical formulas (e.g., RDF triples), formulas can be grouped into a logical theory (e.g., RDF graphs), and a logical theory can be interpreted as either true or false. Entailment is then obtained for free, as it is defined as "T1 entails T2 whenever all interpretations that make T1 true also make T2 true" (and that's exactly how RDF, RDFS, OWL, etc. entailments are defined).

Classical CWA says that if T1 does not entail T2, then T2 is necessarily false. It sometimes leads to strange conclusions. According to Wikipedia, there are several variants of CWA that avoid weird inferences, which confirms Michael's statement that there is not one CWA, but many.

Now, how does this apply to SPARQL? Is SPARQL a logic at all? What is a formula, and what's a theory in SPARQL? What does it mean that a SPARQL "theory" is interpreted as true? These questions don't seem to make much sense. But let us carry on the reasoning.

SELECT queries do not evaluate as either true or false. They define a function from RDF datasets to sets of answers. CONSTRUCT queries define a function from datasets to RDF graphs. ASK queries, however, evaluate as true or false, so that would probably be the logic defined by SPARQL. But actually, it's a function from datasets to {true,false,error}. Forgetting about errors, we can see an ASK query as a kind of logic programme where the triples in the datasets correspond to simple assertions and the patterns of the query are rules that entail answers, e.g.,

ans(?x,?y,?z) <- filter_not_bound(?x), triple(?y,<p>,?z), optional(triple(?x,<p>,?z))

This formalisation is possibly not quite right but we can imagine that a SPARQL ASK query against a dataset can be formulated as a logical theory in a certain logic. So, we have a logic; let us consider CWA. For sure, classical CWA does not apply because basic graph patterns follow OWA logic strictly. Following Wikipedia's list of CWA extensions, I'd say that neither GCWA nor EGCWA applies. CCWA and ECWA may apply, provided that only the SPARQL constructs, not the triple construct, are in the "given set". I haven't checked this carefully at all and may be wrong, but let us conclude that the logic induced by the ASK queries may have a certain form of CWA.

But all this looks like trying to square the circle by forcing CWA into SPARQL with tons of formal tricks. I prefer the short answer that says "no, SPARQL does not take a CWA".