Is Blueprints misrepresenting RDF in its documentation?

RobVesse · January 11, 2012, 11:00pm

I was investigating using the TinkerPop Blueprints API, which supports using arbitrary SPARQL endpoints as the source of the graph. However, when reading the documentation I found the following statement:

Any of these stores can be manipulated with Blueprints through their Sail interfaces. Here is a list of aspects of RDF that should be understood when dealing with Blueprints over an RDF store.

No duplicate edges: RDF considers two edges the same if they share the same subject, predicate, object, and graph. Thus, edges with these in common are, in fact, the same edge.

No indices: There are no indices as no elements have properties that can not be accessed simply by using referencing their id (i.e. their URI, blank node, or literal string).

Infinite vertices: RDF is edge based and as such, every possible vertex exists. It is only when finding outgoing or incoming edges to some vertex do you recognize it within a larger graph structure. For this reason, you can not iterate over all vertices in the graph.
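The first point follows from RDF's set semantics: a graph is a set of statements, so asserting the same statement twice adds nothing. Here is a minimal sketch that models a store as a plain Python set of (subject, predicate, object, graph) tuples. The IRIs are made-up examples, and this is not the Blueprints or Sail API:

```python
# Toy model: an RDF dataset as a set of quads (subject, predicate, object, graph).
# The example IRIs are hypothetical.
Quad = tuple[str, str, str, str]

store: set[Quad] = set()

edge = ("ex:alice", "foaf:knows", "ex:bob", "ex:g1")
store.add(edge)
store.add(edge)  # same subject, predicate, object and graph: no new edge

# The same triple in a different named graph is a distinct quad.
store.add(("ex:alice", "foaf:knows", "ex:bob", "ex:g2"))

print(len(store))  # 2
```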

Now, points 1 and 2 I completely agree with; point 3 is the one I have problems with. If you take a strict open-world view then perhaps you can argue that there are infinitely many vertices, but in an implementation sense this seems to be a fallacy, since whatever SPARQL endpoint you choose to use will hold a finite number of vertices.

My other problem with point 3 is the statement that RDF is edge based. Yes, RDF is concerned with expressing relationships between things, but implying that RDF lacks nodes seems like a gross misrepresentation of RDF to me. To me, the subjects and objects of triples are the nodes of the graph (or vertices, in Blueprints parlance), so I'm puzzled as to why anyone would consider RDF to be purely edge based.
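This reading can be made concrete: if the subjects and objects of triples are the vertices, then any finite set of triples yields a finite vertex set that can be enumerated. A sketch over a hypothetical in-memory triple list (literals are marked with quotes and counted as vertices, matching the quoted documentation's "URI, blank node, or literal string"):

```python
# Hypothetical toy triples (subject, predicate, object).
triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:alice", "foaf:name", '"Alice"'),
]

def vertices(triples):
    """Return every node that appears as a subject or object of some triple."""
    nodes = set()
    for s, _p, o in triples:
        nodes.add(s)
        nodes.add(o)
    return nodes

print(len(vertices(triples)))  # 4
```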

Question

So are they misrepresenting RDF in their documentation, or is this just RDF as represented in terminology relevant to a reader with a graph-centric view of the world?

GerritV · January 11, 2012, 11:00pm

So are they misrepresenting RDF in their documentation, or is this just RDF as represented in terminology relevant to a reader with a graph-centric view of the world?

No, this is just an explanation of how they represent RDF data as a graph. But the infinite vertices statement is the only possible way to do it, since it is impossible to have a vertex without edges in RDF (you can only have triples, not isolated subjects or objects). Hence you cannot add a vertex without edges to a Blueprints graph backed by RDF, and so Blueprints must assume that all possible vertices always exist.
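This argument can be sketched with a hypothetical edge-only graph wrapper. The class and method names below are illustrative, not the real Blueprints Sail implementation. Because the only thing the backing store can hold is a triple, "adding" a vertex stores nothing, and looking up any id has to succeed:

```python
class EdgeOnlyGraph:
    """Hypothetical graph view over a store that can only hold triples."""

    def __init__(self):
        self.triples = set()

    def add_vertex(self, vertex_id):
        # Nothing to store: RDF has no statement meaning "this node exists".
        return vertex_id

    def get_vertex(self, vertex_id):
        # Any id is a valid vertex, whether or not it appears in a triple.
        return vertex_id

    def add_edge(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def out_edges(self, vertex_id):
        # A vertex only becomes visible through the edges attached to it.
        return [t for t in self.triples if t[0] == vertex_id]


g = EdgeOnlyGraph()
g.add_vertex("ex:lonely")
print(len(g.triples))               # 0: the vertex left no trace
print(g.get_vertex("ex:anything"))  # ex:anything
g.add_edge("ex:alice", "foaf:knows", "ex:bob")
print(g.out_edges("ex:alice"))      # [('ex:alice', 'foaf:knows', 'ex:bob')]
```

Under this design, "iterate over all vertices" has no finite answer, because get_vertex accepts every possible id.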

database_animal · January 11, 2012, 11:00pm

I think different systems can and will treat RDF graphs with different semantics.

For one thing, I like to create spaces in which the Unique Name Assumption (UNA) is true because in the process of enforcing UNA, I get control of the identities of things and avoid pitfalls that people run headlong into with Linked Data.

In SPARQL you can write

 SELECT DISTINCT ?s { ?s ?p ?o }

and get a list of the vertices that appear as subjects of statements. People can make graph models that don't let you do this, or that do forms of reasoning taking into account statements that could possibly be made even if they haven't been made.

Just as anyone can say anything about anything, anyone can also decide how they want to interpret statements.

Ivan · January 11, 2012, 11:00pm

Rob,

to be honest, I am not sure I fully understand what the 3rd bullet point says. However, I could imagine a legal situation in which an infinite number of edges exist, namely if the triple store implements a complete RDFS expansion (at least in theory) on the local graphs. Remember that the "rdf:_i" series represents a, theoretically, infinite sequence of properties. (The upcoming SPARQL 1.1 entailment document restricts RDFS reasoning a little to avoid such infinite series, b.t.w.)
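To illustrate the infinite series: the RDFS axiomatic triples include "rdf:_n rdf:type rdfs:ContainerMembershipProperty" for every positive integer n, so a complete RDFS closure is infinite. A lazy Python generator is a simple way to model this. It is only a toy, not how any particular triple store implements entailment, and only a finite prefix can ever be materialised:

```python
from itertools import count, islice

def container_membership_axioms():
    """Yield rdf:_n rdf:type rdfs:ContainerMembershipProperty for n = 1, 2, 3, ...

    The sequence never terminates, mirroring the infinite RDFS closure.
    """
    for n in count(1):
        yield (f"rdf:_{n}", "rdf:type", "rdfs:ContainerMembershipProperty")

# Any real store can only materialise a finite prefix of the closure.
first_three = list(islice(container_membership_axioms(), 3))
for triple in first_three:
    print(triple)
```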

That being said, I do not know whether this is what TinkerPop is referring to.