Skolem syntax for blank nodes in SPARQL?

Signified · September 6, 2010, 10:00pm

The default "semantics" for blank nodes in SPARQL queries is to represent existential variables (essentially a normal variable whose value cannot be returned). As such, specifying blank-nodes in queries is pretty useless (though I may stand corrected) as normal variables can be used in their stead. I'm guessing SPARQL chose to keep an RDF-like semantics for blank-nodes.

Many systems take a skolemised view of blank-nodes as ground (instantiated) terms in their own right. Even so, if I pose a query to a SPARQL endpoint and get back a binding _:bnode1, I cannot subsequently ask for more information about _:bnode1 without reconstructing and extending the previous query, and/or using some swanky filters.

My question is:

Is there a specific reason why SPARQL has not offered a Skolem syntax for querying blank-nodes with a particular label? (See examples below)

As far as I can see, this would be a practical, useful and simple sprinkle of syntactic sugar to offer in SPARQL. I'm guessing there's a reason why it's not offered? Would it cause a split in the semantics on how blank-nodes should be supported? Anyone know if this was considered in the WG(s)?

(Question inspired by this one, and the discussion on CBDs, which seem [IMO] to be an awkward solution, specifically for DESCRIBE queries, to the above non-referenceability of blank-nodes. Why not tackle the root cause?)

Secondly:

Is there a non-standard (community) spec for creating globally unique blank-node Skolem names (based on a function of graph and local label) for Named Graphs? If not, would one be useful/appropriate?

If we're all skolemising blank-nodes, why not use the same rule-of-thumb?
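
To make the second question concrete, here is a minimal sketch of what such a shared rule-of-thumb could look like. It is not an existing spec: the function name `skolem_iri`, the base IRI `http://example.org/skolem/`, and the choice of a truncated SHA-1 of the graph name are all assumptions made up for illustration.

```python
import hashlib
from urllib.parse import quote

# Hypothetical base IRI; any namespace the publisher controls would do.
SKOLEM_BASE = "http://example.org/skolem/"


def skolem_iri(graph_iri: str, local_label: str) -> str:
    """Derive a stable Skolem IRI from a named-graph IRI and a blank-node label."""
    # Accept labels with or without the "_:" prefix.
    label = local_label[2:] if local_label.startswith("_:") else local_label
    # Hash the graph IRI so equal labels in different graphs do not collide.
    graph_key = hashlib.sha1(graph_iri.encode("utf-8")).hexdigest()[:16]
    # Percent-encode the label so the result is a syntactically safe IRI.
    return f"{SKOLEM_BASE}{graph_key}/{quote(label, safe='')}"


# The same local label in two different graphs yields two different names.
a = skolem_iri("http://example.org/graphs/g1", "_:bnode1")
b = skolem_iri("http://example.org/graphs/g2", "_:bnode1")
print(a)
print(b)
```

The only property that matters here is that the name is a deterministic function of (graph, label), so two systems applying the same rule to the same graph would agree on the name.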

Side note: did some tests/lookups for a few different SPARQL engines. They all support query blank-nodes as existential variables, and some support additional syntax for skolemising blank-nodes (can't get the links to test queries to work, so you'll have to trust me).

Supported query blank-node Skolems
- Virtuoso uses <nodeID://b196899188> syntax
- YARS2 uses <_:encodedcontextxxlabel> syntax
- ARQ/Jena (optionally?) also uses <_:bnodeN> syntax

Not supported (at least not in tested endpoint... maybe optional support)
- Sesame
- BigOWLIM
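
As a rough sketch of how a client might use the two simpler syntaxes listed above to ask a follow-up question about a returned blank node: the helper names `bnode_reference` and `follow_up_query` are made up, the engine keys are arbitrary, and the syntaxes are only as reported in this post (not checked against each engine's documentation). YARS2 is left out because its context encoding is not spelled out here.

```python
def bnode_reference(label: str, engine: str) -> str:
    """Format a blank-node label using the engine-specific syntax reported above."""
    # Accept labels with or without the "_:" prefix.
    name = label[2:] if label.startswith("_:") else label
    if engine == "virtuoso":
        return f"<nodeID://{name}>"   # as reported for Virtuoso
    if engine == "arq":
        return f"<_:{name}>"          # as reported for ARQ/Jena
    raise ValueError(f"no known Skolem syntax for engine: {engine}")


def follow_up_query(label: str, engine: str) -> str:
    """Build a query asking for every property of a previously returned blank node."""
    return f"SELECT ?p ?o WHERE {{ {bnode_reference(label, engine)} ?p ?o }}"


print(follow_up_query("_:b196899188", "virtuoso"))
print(follow_up_query("_:bnode1", "arq"))
```

The fact that the client needs a per-engine switch at all is the point of the question: a standard syntax would remove it.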

MichaelSchn · September 6, 2010, 10:00pm

I will try to address your first question. I won't discuss whether referring to concrete blank node identifiers in an RDF graph is good practice or not, but will only deal with some technical aspects here.

In RDF, the identifiers for blank nodes in the queried RDF graph are local to this graph. On the other hand, blank node identifiers occurring in a SPARQL query pattern are local to that pattern (even to their containing basic graph pattern), see Sec. 5.1.1. of the SPARQL specification. So, if I understand it correctly, with a SPARQL-compliant system it is not possible to refer to the identifier of a blank node in the queried graph from within a SPARQL query pattern.

Elaboration:

Let's assume that you know the identifier of some blank node, say "_:xyz", and that you use this same identifier in some query pattern. It turns out that a SPARQL-compliant system will distinguish the two identifiers, since they have disjoint scopes. They will be "standardized apart" into something like "_:xyz/graph" and "_:xyz/query", respectively, where "_:xyz/query" does not occur anywhere in the graph.

Now, if you interpret blank nodes as Skolem constants, your query will then not match anything (empty result set). The reason is that in order to match, it would be necessary to match the constant(!) "_:xyz/query" in the query pattern against the same constant in the graph, but that constant does not exist in the graph.

Note that the basic problem here stems from the locality of blank nodes (a syntactic aspect), not from the existential-vs-Skolem question (a semantic aspect), although it turns out that under existential semantics the problem of a non-matching query would not exist.
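
The argument above can be illustrated with a toy matcher. This is only a sketch under simplifying assumptions: terms are plain strings, the helper names `tag` and `match` are made up, and the matcher is a naive nested loop rather than anything a real SPARQL engine does. Blank nodes are first standardized apart by scope, then the pattern is matched once with query blank nodes treated as variables (existential reading) and once with them treated as constants (Skolem reading).

```python
def tag(triples, scope):
    """Standardize blank nodes apart by appending the scope to each label."""
    return [
        tuple(f"{term}/{scope}" if term.startswith("_:") else term for term in triple)
        for triple in triples
    ]


def match(pattern, graph, bnodes_are_variables):
    """Return all bindings under which every pattern triple occurs in the graph."""
    def is_var(term):
        return term.startswith("?") or (bnodes_are_variables and term.startswith("_:"))

    solutions = [{}]
    for triple_pattern in pattern:
        extended = []
        for binding in solutions:
            for triple in graph:
                candidate = dict(binding)
                for p_term, g_term in zip(triple_pattern, triple):
                    if is_var(p_term):
                        # Bind the variable, or check consistency with an earlier binding.
                        if candidate.setdefault(p_term, g_term) != g_term:
                            break
                    elif p_term != g_term:
                        # Constants must match exactly.
                        break
                else:
                    extended.append(candidate)
        solutions = extended
    return solutions


graph = tag([("_:xyz", "ex:name", '"Alice"')], "graph")
pattern = tag([("_:xyz", "ex:name", "?name")], "query")

existential = match(pattern, graph, bnodes_are_variables=True)
skolem = match(pattern, graph, bnodes_are_variables=False)
print(existential)  # one solution: "_:xyz/query" is bound like a variable
print(skolem)       # no solutions: the constant "_:xyz/query" is not in the graph
```

A real engine would also project the blank-node binding out of the existential result; it is left in here only to show what the pattern's blank node ended up matching.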

Bottom line (provided that I have correctly understood the SPARQL spec):

Skolem semantics paired with graph/pattern-locality of blank nodes would be disastrous for SPARQL, since it would effectively prevent query matching whenever there are blank nodes in a query pattern.

cygri · September 6, 2010, 10:00pm

This paper might be related.

CommentBot · September 6, 2010, 10:00pm

My Google powers are failing me on this one. As Michael says, blank nodes are tricky. The relevant issue in the SPARQL 1.0 work was bnodeRef, which provides a starting point. The phrase to look for more generally is 'told bnodes'.

The answer to your first question is the inevitable 'insufficient will given time and resources'. It's clearly not trivial: solutions had to be tried out, it interacts with entailments, etc.

I'm sure I saw this suggested for 1.1, but I find no sign.