What's the best way to parameterize SPARQL queries?

I'm going to show an example with Jena, but I'd love to hear about other frameworks and other languages.

In SQL-based applications we often make queries that have parameters. Some SQL drivers support numbered or names in parameters, but most of us sometimes resort to 'Dynamic SQL' where we interpolate variables in SQL queries. Recently I wrote a Query like this in Jena...

        Query rawResultQuery=QueryFactory.create(
                "PREFIX sw: <http://skunkworks.example.com/redacted#> "
                + "SELECT ?a ?b ?c ?d "
                + "{"
                + "   ?rawHit sw:key <"+someKey+"> ."
                + "   ?rawHit sw:a ?a . "
                + "   ?rawHit sw:b ?b ."
                + "   ?rawHit sw:c ?c ."
                + "   ?rawHit sw:d ?d ."
                + "} ORDER BY DESC(d)");

Note that someKey is a Resource, so there might be some resistance to SPARQL injection, although one could easily want to interpolate a string or a number.

I've seen quite a bit of trouble caused by SQL injections in the last ten years and I'd like to avoid that problem for my next ten writing SPARQL. What's the best way I can do this?

In Sesame, you can do this by providing bindings on the prepared query object:

// parse the query string into a prepared query object
String queryString = "SELECT * WHERE {?X ?P ?Y .}";
TupleQuery query = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString);

// bind variable ?X to the URI 'foo:bar'
URI foobar = conn.getValueFactory().createURI("foo:bar");
query.setBinding("X", foobar);

As Andy noted since joining the Jena project I also added a ParameterizedSparqlString class to Jena which will aid users in building and parameterizing queries.

Example usage as follows (adapting from the example shown in the question):

ParameterizedSparqlString queryStr = new SparqlParameterizedString();
queryStr.setNSPrefix("sw", "http://skunkworks.example.com/redacted#");
queryStr.append("SELECT ?a ?b ?c ?d");
queryStr.append("{");
queryStr.append("   ?rawHit sw:key");
queryStr.appendNode(someKey);
queryStr.append(".");
queryStr.append("  ?rawHit sw:a ?a .");
queryStr.append("  ?rawHit sw:b ?b .");
queryStr.append("  ?rawHit sw:c ?c . ");
queryStr.append("  ?rawHit sw:d ?d .");
queryStr.append("} ORDER BY DESC(d)");

Query q = queryStr.asQuery();

It has various methods that can be used to treat it similar to a StringBuilder for just appending raw text, it also has a variety of appendNode() and appendLiteral() style methods for inserting appropriately escaped and formatted values into the query string.

You can also use it to parameterize a query, so you can use methods like setParam(), setIri() and setLiteral() to indicate that a given variable should be replaced with some constant when the query string is converted into a query.

I don't know how Jena deals with this, but some SPARQL implementations allow you to pre-bind particular variables in a query. So for example, you'd write your query like this:

PREFIX sw: <http://skunkworks.example.com/redacted#>
SELECT ?a ?b ?c ?d 
{
  ?rawHit sw:key ?key .
  ?rawHit sw:a ?a .
  ?rawHit sw:b ?b .
  ?rawHit sw:c ?c .
  ?rawHit sw:d ?d .
}
ORDER BY DESC(?d)

Then you'd pre-bind the variable ?key to whatever URI/literal you like before executing the query.

EDIT: which and how?

  • Jena's ARQ appears to have setInitialBindings on the com.hp.hpl.jena.query.QueryExecution interface, though not all implementations of the interface support it. For example, a query execution to a remote SPARQL protocol endpoint will not.
  • RDF::Query allows a named argument bind to be passed to the prepare or execute methods of a query. This is a hashref (i.e. a reference to an associative array) binding variables to values.

FYI SPIN has a mechanism called SPIN templates for the purpose of formalizing queries that take arguments (pre-bound variables). In the Jena implementation of the SPIN API, we use setInitialBindings(). The same mechanism is used in SPIN functions.

Talis' Kasabi data platform has extended the idea of introducing further variable bindings into a standardised HTTP interface. I like this idea (although their name, catchy as is it, 'SPARQL stored procedures' is a bit of a false analogy): http://www.slideshare.net/ldodds/creating-apis-over-rdf

dotNetRDF (disclaimer - I'm the lead developer) has an explicit SparqlParameterizedString class which is designed for exactly this. It works similarily to a SqlCommand for those familiar with the ADO.Net style of programming e.g.

//Create the Parameterized String
SparqlParameterizedString queryString = new SparqlParameterizedString();
queryString.Namespaces.AddNamespace("ex", new Uri("http://example.org/ns#"));
queryString.CommandText = "SELECT * WHERE { ?s ex:property @value }";

//Inject a Value for the parameter
queryString.SetUri("value", new Uri("http://example.org/value"));

//When we call ToString() we get the full command text with namespaces appended as PREFIX
//declarations and any parameters replaced with their declared values
Console.WriteLine(queryString.ToString());

You can also inject values for normal SPARQL variables as well as ADO.Net style parameters so you can create a query where you might inject a value in place of a variable or you might use it as is.

One feature of SPARQL is that query and update are separate languages - that reduces (does not remove) injection attacks because you can pass an expected query through the query parser and any update operations will be syntax errors. You can only turn a query into another query.