This question needs a bit of introduction, so please bear with me.
In RDF, resources are identified using URI references. The notion of URI reference was added in anticipation of the standardization of IRIs. The SPARQL spec builds on this and in fact adopts the IRI standard as part of its spec: RDF terms in SPARQL queries are identified using IRIs.
Unfortunately, there is an incompatibility: RDF URI references may contain "<", ">", '"' (double quote), space, "{", "}", "|", "\", "^", and "`", but these are not allowed in IRIs (see SPARQL's IRI syntax). The upshot of this is that while it is perfectly legal to have an RDF triple of the following form:
<http://example.org/a b> a ex:Foo .
you cannot query such a resource directly using SPARQL. For example,
SELECT * WHERE { <http://example.org/a b> ?P ?Y . }
is not a syntactically valid SPARQL query.
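To make the mismatch concrete, here is a minimal Python sketch that flags the characters in a URI reference which would keep it from being written as an IRI in a SPARQL query. It assumes the exclusion set is the one listed above plus control characters (U+0000 to U+0020), as in SPARQL's IRIREF production; the function names are hypothetical and not taken from any toolkit.

```python
import re

# Assumption: the excluded set mirrors SPARQL's IRIREF production, i.e.
# < > " { } | ^ ` \ plus everything from U+0000 up to and including space.
_FORBIDDEN = re.compile(r'[<>"{}|^`\\\x00-\x20]')


def forbidden_chars(uri_ref):
    """Return, in order, the characters in uri_ref that IRIREF excludes."""
    return _FORBIDDEN.findall(uri_ref)


def is_sparql_iri(uri_ref):
    """True if uri_ref contains none of the excluded characters."""
    return not forbidden_chars(uri_ref)


if __name__ == "__main__":
    for ref in ("http://example.org/a", "http://example.org/a b"):
        print(ref, is_sparql_iri(ref), forbidden_chars(ref))
```

A check like this only detects the problem in a dataset; it says nothing about how to repair or query the offending data.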
All of this is pretty well known, of course. I bring it up because I would like to ask some "best practice" questions related to this issue:
- Have you ever encountered this problem in practice? That is, have you ever had to work with a dataset that contained such incompatible URI references?
- How do you deal with this incompatibility? Do you query around it? Does your triplestore or parser toolkit of choice offer some kind of workaround? Or do you simply convert the offending data?
I'm not so much looking for theoretical solutions; I'm more interested in what has already been done out there in practice.
This question was inspired by a recent discussion on the Sesame mailing list, by the way, just in case you thought it looked familiar :)