I was thinking about ways of further optimising ASK queries in my SPARQL engine the other day and had an observation which seems fairly intuitive and obvious:
OPTIONAL clauses in ASK Queries can be ignored by a SPARQL engine
So I wondered whether I was correct and whether any existing engines use this optimisation? I have tried to explain my reasoning with a few examples below - counter-examples are welcomed - so any feedback on this would be appreciated.
Consider the following query:
ASK WHERE { ?s ?p ?o OPTIONAL { ?s a ?type } }
For the above query the OPTIONAL is surely completely irrelevant to whether the ASK evaluates to true, since only the first part of the query has to match?
Now let's consider some slightly less clear queries:
ASK WHERE { OPTIONAL { ?s <http://example.org/noSuchPredicate> ?o } }
In the above the OPTIONAL is still irrelevant, since the algebra will be LeftJoin(BGP({}), BGP({ ?s <http://example.org/noSuchPredicate> ?o }), true) and, since the LHS is the empty BGP, it always matches, so the result is always true.
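To sanity-check the empty-BGP case, here is a minimal sketch of the relevant algebra operators in Python. It is a toy evaluator written for illustration, not how any real engine works; the helper names (match_triple, eval_bgp, left_join, ask) and the sample triple are all made up, and the LeftJoin filter expression is assumed to be true.

```python
def match_triple(pattern, triple, binding):
    """Extend `binding` so `pattern` matches `triple`; return None on a mismatch."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # Variables must agree with any value they were already bound to.
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result

def eval_bgp(patterns, data):
    """Evaluate a basic graph pattern; the empty BGP yields one empty solution."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for sol in solutions
            for triple in data
            if (extended := match_triple(pattern, triple, sol)) is not None
        ]
    return solutions

def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(b[k] == v for k, v in a.items() if k in b)

def left_join(left, right):
    """LeftJoin with a filter of `true`: unmatched left solutions are kept."""
    out = []
    for l in left:
        merged = [{**l, **r} for r in right if compatible(l, r)]
        out.extend(merged or [l])
    return out

def ask(solutions):
    """ASK is true when at least one solution exists."""
    return len(solutions) > 0

# Hypothetical data: one triple that does not use the predicate in question.
data = {("ex:a", "ex:p", "ex:b")}
optional = [("?s", "http://example.org/noSuchPredicate", "?o")]

result = ask(left_join(eval_bgp([], data), eval_bgp(optional, data)))
result_empty_graph = ask(left_join(eval_bgp([], set()), eval_bgp(optional, set())))
```

Under these assumptions both result and result_empty_graph come out true, because the empty BGP on the left always contributes its single empty solution regardless of what the OPTIONAL side finds.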
Now where I think it starts to get really tricky is when you have OPTIONALs nested or in combination with other nested patterns:
ASK WHERE
{
{ ?s <http://example.org/noSuchPredicate> ?o }
UNION
{ OPTIONAL { ?s ?p ?o } }
}
Again this should be true, as the 2nd branch of the Union is equivalent to my 2nd example and as such should always result in true. Can anyone come up with an example involving nesting where ignoring the OPTIONAL would affect the result of the query?
Now this next example is one that I am unsure about, since in this query the later part of the query is reliant on the OPTIONAL matching (or it would be for a SELECT query):
PREFIX books: <http://example.org/book/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX vcard: <http://www.w3.org/2001/vcard-rdf/3.0#>
ASK WHERE
{
?book dc:title ?title
OPTIONAL { ?book dc:creator ?author }
?author vcard:FN ?name
}
Does the OPTIONAL have to be evaluated here or could it still be ignored? My suspicion is that it can still be ignored, since the two other patterns could still match the data in isolation and they can still be joined (disjoint solutions are always compatible).
The algebra for the above is:
Join(LeftJoin(BGP({?book dc:title ?title}), BGP({?book dc:creator ?author}), true), BGP({?author vcard:FN ?name}))
So my intuition is that simplifying the LeftJoin down to just BGP({?book dc:title ?title}) doesn't actually change the overall semantics of the query (please correct me if I'm wrong).
Final examples time: does having a FILTER in the OPTIONAL make any difference to this? Again my intuition is no, since a FILTER inside an OPTIONAL is only used to determine whether solutions found inside the OPTIONAL should be joined to those outside of it, e.g.
ASK WHERE
{
?s ?p ?o
OPTIONAL { ?s a ?type . FILTER(false) }
}
The only obvious scenario I can find where ignoring the OPTIONAL actually would make a difference is the pattern for checking that something does/doesn't exist (the OPTIONAL + FILTER(!BOUND(?var)) pattern):
PREFIX books: <http://example.org/book/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX vcard: <http://www.w3.org/2001/vcard-rdf/3.0#>
ASK WHERE
{
?book dc:creator ?author
OPTIONAL { ?author vcard:FN ?name }
FILTER (!BOUND(?name))
}
So in this case the OPTIONAL is actually essential to the result. From an optimisation point of view the algebra would be the following:
Filter(LeftJoin(BGP({?book dc:creator ?author}), BGP({?author vcard:FN ?name}), true), !BOUND(?name))
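To make the difference concrete, here is a small Python sketch that evaluates this algebra with and without the LeftJoin. The solution sets are hand-written for made-up data in which the only author has a name, and the helper names (compatible, left_join, not_bound, ask) are hypothetical and for illustration only.

```python
def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(b[k] == v for k, v in a.items() if k in b)

def left_join(left, right):
    """LeftJoin with a filter of `true`: unmatched left solutions are kept."""
    out = []
    for l in left:
        merged = [{**l, **r} for r in right if compatible(l, r)]
        out.extend(merged or [l])
    return out

def not_bound(var):
    """Build a predicate equivalent to FILTER(!BOUND(var))."""
    return lambda sol: var not in sol

def ask(solutions, condition=lambda sol: True):
    """ASK is true when at least one solution passes the filter condition."""
    return any(condition(sol) for sol in solutions)

# Assumed solutions of BGP({?book dc:creator ?author}) and BGP({?author vcard:FN ?name}).
creators = [{"?book": "book1", "?author": "alice"}]
names = [{"?author": "alice", "?name": "Alice"}]

with_optional = ask(left_join(creators, names), not_bound("?name"))
without_optional = ask(creators, not_bound("?name"))
print(with_optional, without_optional)  # prints "False True"
```

With the LeftJoin in place ?name is bound, so the filter rejects the only solution; with the LeftJoin dropped ?name is never bound and the filter accepts it, so the two versions of the query disagree.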
So perhaps we can go with the following optimisation rule:
For ASK queries a LeftJoin operator can always be simplified to just the LHS component of the operator, provided that the LeftJoin operator does not occur inside a Filter operator.
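As a rough sketch, the rule as worded above could be written as a rewrite over an algebra tree like this. The node classes and the simplify_for_ask function are hypothetical and not taken from any existing engine, and the code only encodes the rule as proposed; it says nothing about whether the rule is sound, which is the open question here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BGP:
    patterns: tuple

@dataclass(frozen=True)
class Join:
    left: object
    right: object

@dataclass(frozen=True)
class Union:
    left: object
    right: object

@dataclass(frozen=True)
class LeftJoin:
    left: object
    right: object
    expr: str = "true"

@dataclass(frozen=True)
class Filter:
    inner: object
    expr: str

def simplify_for_ask(node, inside_filter=False):
    """Replace each LeftJoin by its LHS unless it sits anywhere below a Filter."""
    if isinstance(node, Filter):
        # Everything beneath a Filter is left structurally intact.
        return Filter(simplify_for_ask(node.inner, True), node.expr)
    if isinstance(node, LeftJoin):
        if inside_filter:
            return LeftJoin(
                simplify_for_ask(node.left, True),
                simplify_for_ask(node.right, True),
                node.expr,
            )
        return simplify_for_ask(node.left, inside_filter)
    if isinstance(node, (Join, Union)):
        return type(node)(
            simplify_for_ask(node.left, inside_filter),
            simplify_for_ask(node.right, inside_filter),
        )
    return node  # BGPs are leaves

# First example: the OPTIONAL is dropped.
plain = LeftJoin(BGP(("?s ?p ?o",)), BGP(("?s a ?type",)))
simplified_plain = simplify_for_ask(plain)

# !BOUND example: the LeftJoin is kept because it occurs inside a Filter.
guarded = Filter(
    LeftJoin(BGP(("?book dc:creator ?author",)), BGP(("?author vcard:FN ?name",))),
    "!BOUND(?name)",
)
simplified_guarded = simplify_for_ask(guarded)
```

Here simplified_plain is just the BGP for ?s ?p ?o, while simplified_guarded is identical to the input tree.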
What do people think? Can you find any mistakes in my reasoning or is this generally a sound optimisation to make?