Impractical features of the RDF stack

Is there any feature of RDF and related technologies (RDFS, OWL, SPARQL, the Linked Data practices, etc.) that you consider impractical? Something that just seems unnecessary and gets in the way of getting things done?

I'm not asking for things that you consider outright broken; I don't want to hear about reification here ;-) It's more about things that bug and annoy you frequently and where you wish that the language designers had made things a bit easier for you (even if you ultimately understand why it was designed the way it is).

An example that bugs me: if you use datatypes with RDF/XML, you have to repeat the full datatype URIs all over the place, for each individual value, and you can't even abbreviate them as QNames, even if the datatype has already been declared as the range of the property. I wish the design adhered a bit better to the “Don't Repeat Yourself” principle!
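To make the repetition concrete, here is a minimal Python sketch (standard library only) that writes two typed literals as RDF/XML. The ex: namespace and the ex:birthDate property are hypothetical. Because rdf:datatype takes an attribute value rather than an element or attribute name, it has to carry the full XSD URI every time; XML namespace prefixes are not expanded inside it.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/ns#"  # hypothetical vocabulary namespace

ET.register_namespace("rdf", RDF)
ET.register_namespace("ex", EX)


def describe_people(people):
    """Serialize (name, birth_date) pairs as RDF/XML with typed literals."""
    root = ET.Element(f"{{{RDF}}}RDF")
    for name, born in people:
        desc = ET.SubElement(root, f"{{{RDF}}}Description")
        desc.set(f"{{{RDF}}}about", EX + name)
        prop = ET.SubElement(desc, f"{{{EX}}}birthDate")  # hypothetical property
        # rdf:datatype holds a URI as an attribute *value*, so it must be
        # spelled out in full for every literal; "xsd:date" would not be expanded.
        prop.set(f"{{{RDF}}}datatype", XSD + "date")
        prop.text = born
    return ET.tostring(root, encoding="unicode")


xml_text = describe_people([("alice", "1980-01-01"), ("bob", "1975-06-30")])
print(xml_text)
```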

Which impractical feature of the RDF stack is bugging you?

As a software developer, the worst thing about Linked Data for me is trying to decide whether something is an Information Resource or not, and then minting identifiers and defining server-side behavior accordingly. httpRange-14 is dead, long live httpRange-14! I personally have come to prefer REST's laissez-faire approach to the nature of resources. URLs identify Resources. Resources can be anything. When you resolve a URL you get a Representation back. Does it really have to be more complicated than that?
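For readers unfamiliar with the server-side behavior in question, here is a minimal sketch of the widely used "303 See Other" pattern, written as a plain Python function rather than a real web server. The /id/ and /doc/ path layout is a hypothetical convention chosen for illustration: URIs for things are redirected to URIs for documents about them, and only the documents are served with 200 OK.

```python
from typing import Dict, Tuple

# Hypothetical URI layout: /id/... names things (people, places), which are
# not information resources; /doc/... names documents describing them.
ID_PREFIX = "/id/"
DOC_PREFIX = "/doc/"


def respond(path: str) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a GET on path, following the 303 pattern."""
    if path.startswith(ID_PREFIX):
        # A thing cannot be sent over the wire, so point to a document about it.
        name = path[len(ID_PREFIX):]
        return 303, {"Location": DOC_PREFIX + name}
    if path.startswith(DOC_PREFIX):
        # A document is an information resource: serve a representation directly.
        return 200, {"Content-Type": "text/turtle"}
    return 404, {}


print(respond("/id/alice"))
print(respond("/doc/alice"))
```

Every resource therefore needs two URIs and an extra round trip, which is exactly the bookkeeping the REST view above would avoid.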

It's more about things that bug and annoy you frequently and where you wish that the language designers had made things a bit easier for you (even if you ultimately understand why it was designed the way it is)

Blank nodes! Sure, I understand they can be useful in some cases, but in most cases they cause problems later on. It would be nice to have a common, standard mechanism for assigning something like GUIDs to blank nodes, to make them addressable and exchangeable across different triple stores.
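RDF 1.1 later described one such mechanism, "skolemization": replacing each blank node with a fresh, globally unique IRI, with a recommended /.well-known/genid/ path. Below is a minimal sketch, assuming triples are plain string tuples with blank nodes written as _:label and a hypothetical authority http://example.org; it is not a parser or a complete implementation.

```python
import uuid
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def skolemize(triples: List[Triple], authority: str) -> List[Triple]:
    """Replace each blank node label ("_:b0") with a fresh, globally unique IRI.

    All occurrences of the same label within this graph map to the same IRI.
    """
    mapping: Dict[str, str] = {}

    def convert(term: str) -> str:
        if not term.startswith("_:"):
            return term  # IRIs and literals pass through unchanged
        if term not in mapping:
            mapping[term] = f"<{authority}/.well-known/genid/{uuid.uuid4().hex}>"
        return mapping[term]

    return [(convert(s), convert(p), convert(o)) for s, p, o in triples]


graph = [
    ("<http://example.org/alice>", "<http://xmlns.com/foaf/0.1/knows>", "_:b0"),
    ("_:b0", "<http://xmlns.com/foaf/0.1/name>", '"Bob"'),
]
skolemized = skolemize(graph, "http://example.org")
print(skolemized)
```

Once skolemized, the former blank node can be referenced from another store or a later query, at the cost of no longer being "anonymous".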

Generally

  • Path to entry
  • Lack of documentation
  • High cost of adoption (particularly on developers)
  • Mixed messages, frequent conflation
  • Comparatively little focus / slow movement on the 'next steps'
  • Frequent ontology creation by people who are not domain experts

Inverted but related answers

  • Lack of standardisation on simple things such as a triple interface (think RDFa API for RDF)
  • Lack of standardised binary and JSON serializations of RDF
  • Lack of tooling, specifically for runtime reasoning and inference: there is lots of focus on publishing data that depends on a reasoner/inference engine, and almost no focus on providing the tools to consume and understand that data at runtime.
  • SPARQL / triplestore dependency (or assumption): a massive lack of consideration for the common masses who at best have a shared hosting environment for publishing Linked Data, with neither of the aforementioned available.
  • Key documents that need updating, which promote best practices that are perhaps not best practices at all ;)
  • Limited outreach to key working groups such as WebApps (this would require focus on the 'next steps' first, so it depends on that)
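As an illustration of the runtime reasoning a consumer might need, here is a naive forward-chaining sketch of two RDFS entailment rules: rdfs9 (instances inherit superclasses) and rdfs11 (rdfs:subClassOf is transitive). Terms are plain prefixed-name strings for brevity and the ex: data is made up; this is nowhere near a complete RDFS reasoner.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def rdfs_closure(triples: Set[Triple]) -> Set[Triple]:
    """Apply rdfs9 and rdfs11 repeatedly until nothing new is derived."""
    inferred = set(triples)
    while True:
        new: Set[Triple] = set()
        subclass = [(s, o) for s, p, o in inferred if p == SUBCLASS]
        # rdfs11: A subClassOf B, B subClassOf C  =>  A subClassOf C
        for a, b in subclass:
            for c, d in subclass:
                if b == c:
                    new.add((a, SUBCLASS, d))
        # rdfs9: x type A, A subClassOf B  =>  x type B
        for s, p, o in inferred:
            if p == RDF_TYPE:
                for a, b in subclass:
                    if o == a:
                        new.add((s, RDF_TYPE, b))
        if new <= inferred:  # fixpoint reached
            return inferred
        inferred |= new


data = {
    ("ex:rex", RDF_TYPE, "ex:Dog"),
    ("ex:Dog", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
}
closure = rdfs_closure(data)
print(sorted(t for t in closure if t[0] == "ex:rex"))
```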

Precise direct answers to the question posed

  • RDF documentation being entirely serialization-specific; we need RDF documentation that is not tied to any one serialization.
  • rdf:value (seriously, define it tightly or remove it): what's the point of having something so ambiguous in the core schema of something meant to disambiguate the world?
  • Tight coupling of RDFa to the DOM (and thus to browsers)
  • XSD types not being first-class members; IMHO XSD should be the 'default namespace' for all object values, so you could simply write ^^date
  • Overlap between XSD restrictions/facets and OWL datatype restrictions, with no clarity over which to use.
  • Basic data validation: where are the ontologies that include datatype restrictions so we can validate object values easily (see also the previous point), and where is the discussion about this?
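A sketch of what such value validation could look like, assuming a hypothetical in-memory Restriction type that mirrors an OWL 2 datatype restriction such as DatatypeRestriction( xsd:integer xsd:minInclusive "0"^^xsd:integer xsd:maxInclusive "150"^^xsd:integer ). Only xsd:integer range facets and a pattern facet are handled, and Python's re syntax only approximates the XSD regular-expression dialect.

```python
import re
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class Restriction:
    """Hypothetical in-memory form of an OWL 2 datatype restriction."""
    base: str                                # "xsd:integer" or "xsd:string" here
    min_inclusive: Optional[Decimal] = None  # xsd:minInclusive facet
    max_inclusive: Optional[Decimal] = None  # xsd:maxInclusive facet
    pattern: Optional[str] = None            # xsd:pattern facet (Python re syntax)


# Lexical space of xsd:integer: optional sign followed by digits.
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


def is_valid(lexical: str, r: Restriction) -> bool:
    """Check a literal's lexical form against the restriction's facets."""
    # XSD patterns are implicitly anchored, so use fullmatch.
    if r.pattern is not None and re.fullmatch(r.pattern, lexical) is None:
        return False
    if r.base == "xsd:integer":
        if INTEGER_LEXICAL.fullmatch(lexical) is None:
            return False
        value = Decimal(lexical)
        if r.min_inclusive is not None and value < r.min_inclusive:
            return False
        if r.max_inclusive is not None and value > r.max_inclusive:
            return False
    return True


age = Restriction("xsd:integer", min_inclusive=Decimal(0), max_inclusive=Decimal(150))
postcode = Restriction("xsd:string", pattern=r"[0-9]{5}")
print(is_valid("42", age), is_valid("200", age), is_valid("12345", postcode))
```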

... and all the obvious stuff we're not mentioning :)

Good question!

Just a niggle, but you can't copy/paste prefix definitions between SPARQL queries and Turtle documents because the syntax is annoyingly slightly different.
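The difference is small: Turtle writes @prefix foaf: <http://xmlns.com/foaf/0.1/> . (with a leading @ and a trailing dot), while SPARQL writes PREFIX foaf: <http://xmlns.com/foaf/0.1/>. A minimal Python sketch that converts Turtle prefix lines into SPARQL ones, assuming each @prefix declaration sits on its own line and ignoring every other line:

```python
import re

# Turtle:  @prefix foaf: <http://xmlns.com/foaf/0.1/> .
# SPARQL:  PREFIX foaf: <http://xmlns.com/foaf/0.1/>
TURTLE_PREFIX = re.compile(r"^\s*@prefix\s+([A-Za-z][\w.-]*)?:\s*<([^>]*)>\s*\.\s*$")


def turtle_prefixes_to_sparql(turtle: str) -> str:
    """Return the SPARQL PREFIX lines for every @prefix line in a Turtle text."""
    out = []
    for line in turtle.splitlines():
        m = TURTLE_PREFIX.match(line)
        if m:
            name = m.group(1) or ""  # the empty (default) prefix is allowed
            out.append(f"PREFIX {name}: <{m.group(2)}>")
    return "\n".join(out)


turtle = """@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix : <http://example.org/> .
<http://example.org/alice> foaf:name "Alice" ."""
print(turtle_prefixes_to_sparql(turtle))
```

For what it's worth, the 2014 Turtle Recommendation also accepts the SPARQL-style PREFIX form, which removes the friction when copying in the other direction.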

SPARQL: I can write a query to get all the ?x matching a given pattern, but I cannot easily get a complete description of those ?x. With SELECT, I have to name the properties I want returned, and with DESCRIBE, I do not know which resources in the returned RDF are the ?x (it can contain other resources).
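One workaround is to keep ?x in the result by selecting each matching resource together with all of its outgoing triples, e.g. SELECT ?x ?p ?o WHERE { ?x a foaf:Person . ?x ?p ?o }, and then grouping the rows by ?x on the client. The Python sketch below emulates that query over an in-memory list of triples (terms are plain strings, the data is made up) to show the shape of the result:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def describe_matches(triples: List[Triple], pred: str, obj: str) -> Dict[str, List[Tuple[str, str]]]:
    """Emulate SELECT ?x ?p ?o WHERE { ?x <pred> <obj> . ?x ?p ?o },
    grouping the rows by ?x so each match keeps its own description."""
    matches = {s for s, p, o in triples if p == pred and o == obj}
    result: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for s, p, o in triples:
        if s in matches:
            result[s].append((p, o))
    return dict(result)


data = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", '"Bob"'),
]
result = describe_matches(data, "rdf:type", "foaf:Person")
print(result)
```

Unlike a DESCRIBE result, ex:bob appears only as a value here, so it is clear which resources were the actual matches.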

In general, linked data is still too much like programming. The tools and technologies don't seem to have come of age yet. When this happens, things will be much simpler and easier for beginners (like myself) to get to grips with publishing linked data.

I find it bothersome that today's triple stores (so far as I can tell) store all triples in one big pile. You have little or no choice about what kind of indexes are built to support queries: you pay the (high) cost of building indexes for queries you don't run, and you don't get indexes that are efficient for the queries you actually run.

I find people in the RDF world are oblivious to this. At best, they sell a hybrid RDBMS/RDF product (Virtuoso or Oracle) and will advise you to use the RDBMS part when you want the choice of storage layout that an RDBMS gives you. On the other hand, a friend of mine who does data warehousing for a big-data health care company just laughed when I told him about the lack of control over physical partitioning in current RDF products.

To put it simply, if you load data into an index of any kind, you find that the time to insert an item (say, a triple) isn't constant but increases over time. The time to insert data into a B-tree index grows as log N, even before considering complicating factors such as what happens when you depend on a slower part of the memory hierarchy. Now, in an RDBMS, you can create new tables, and each table starts out with N = 0, so to some extent you can beat "index bloat" by physically partitioning the data.

I'd really like to see an RDF product that makes it possible to store named graphs in completely separate indexes, so that I can have a core of, say, 20 million triples that's crazy fast; if I also have 200 million triples that I only use occasionally, I can work with my 20 million without the index bloat of 220 million triples. To be fair, I'm not entirely clear what impact this has on inference.
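A toy sketch of the idea in Python: each named graph gets its own indexes, so every graph starts out at N = 0 (like a fresh RDBMS table) and a query scoped to the core graph never touches the larger one. The class names are hypothetical, and Python dicts are hash tables rather than B-trees, so this illustrates only the partitioning, not the log N insert cost or any effect on inference.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


class GraphIndex:
    """Hypothetical per-graph indexes: (s, p) -> objects and (p, o) -> subjects."""

    def __init__(self) -> None:
        self.by_sp: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
        self.by_po: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
        self.size = 0  # distinct triples in this graph only

    def add(self, s: str, p: str, o: str) -> None:
        if o not in self.by_sp[(s, p)]:
            self.by_sp[(s, p)].add(o)
            self.by_po[(p, o)].add(s)
            self.size += 1


class PartitionedStore:
    """Hypothetical store holding one independent GraphIndex per named graph."""

    def __init__(self) -> None:
        self.graphs: Dict[str, GraphIndex] = {}

    def add(self, graph: str, s: str, p: str, o: str) -> None:
        # A new graph gets fresh, empty indexes, like a new RDBMS table.
        if graph not in self.graphs:
            self.graphs[graph] = GraphIndex()
        self.graphs[graph].add(s, p, o)

    def objects(self, graph: str, s: str, p: str) -> Set[str]:
        # Only the named graph's indexes are consulted.
        g = self.graphs.get(graph)
        return set(g.by_sp.get((s, p), set())) if g is not None else set()

    def subjects(self, graph: str, p: str, o: str) -> Set[str]:
        g = self.graphs.get(graph)
        return set(g.by_po.get((p, o), set())) if g is not None else set()


store = PartitionedStore()
store.add("core", "ex:alice", "foaf:name", '"Alice"')
store.add("archive", "ex:doc1", "dc:title", '"Old report"')
print(store.objects("core", "ex:alice", "foaf:name"))
```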

Ordering stuff.

I like linked lists, but at the moment rdf:List is just hard to work with. Even if lists were well supported across the whole toolchain (including SPARQL), they would still be questionable semantics-wise and unnecessarily complicated to implement. OWL basically doesn't let you use them at all. Databases could work a lot faster with other data structures.
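To see why rdf:List feels heavy, here is a minimal Python sketch of how a three-item list is encoded as an RDF collection and read back. Triples are plain string tuples and blank node labels are generated locally; both are assumptions of the sketch.

```python
from itertools import count
from typing import List, Tuple

Triple = Tuple[str, str, str]
FIRST, REST, NIL = "rdf:first", "rdf:rest", "rdf:nil"

_ids = count()  # local counter for fresh blank node labels


def encode_list(items: List[str]) -> Tuple[str, List[Triple]]:
    """Encode items as an RDF collection; return (head node, triples)."""
    if not items:
        return NIL, []
    nodes = [f"_:l{next(_ids)}" for _ in items]
    triples: List[Triple] = []
    for i, (node, item) in enumerate(zip(nodes, items)):
        triples.append((node, FIRST, item))
        nxt = nodes[i + 1] if i + 1 < len(nodes) else NIL
        triples.append((node, REST, nxt))
    return nodes[0], triples


def decode_list(head: str, triples: List[Triple]) -> List[str]:
    """Walk rdf:first/rdf:rest from head to rdf:nil (assumes a well-formed list)."""
    first = {s: o for s, p, o in triples if p == FIRST}
    rest = {s: o for s, p, o in triples if p == REST}
    items: List[str] = []
    node = head
    while node != NIL:
        items.append(first[node])
        node = rest[node]
    return items


head, triples = encode_list(["ex:x", "ex:y", "ex:z"])
print(len(triples), decode_list(head, triples))
```

Three values cost six structural triples and three blank nodes, and reading them back means walking the rdf:rest chain. SPARQL 1.1 property paths (rdf:rest*/rdf:first) can fetch the members, but do not guarantee that they come back in list order.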

And saying that the property value of something is a list is not the same as saying that the property value of something is several things in a certain order. So basically I want something where I can use one and the same property to point either to a single value or to several values in a certain order, such that:

{ :a :b (:x :y :z) } => {:a :b :x, :y, :z}

(At least I think that's what I want. Of course you still need to be able to retrieve them in the right order.)
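The rule above can be sketched in Python as a forward-chaining step over plain string triples: every triple whose object is an rdf:List head also yields one triple per member, and the list itself is kept so the order can still be retrieved. This is an illustration of the wish under those assumptions, not an existing RDF or OWL entailment rule.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
FIRST, REST, NIL = "rdf:first", "rdf:rest", "rdf:nil"


def list_members(head: str, triples: Set[Triple]) -> List[str]:
    """Members of the RDF collection starting at head, in order.
    Returns [] if head is not a list node. Assumes the list is acyclic."""
    first = {s: o for s, p, o in triples if p == FIRST}
    rest = {s: o for s, p, o in triples if p == REST}
    members: List[str] = []
    node = head
    while node in first:
        members.append(first[node])
        node = rest.get(node, NIL)
    return members


def flatten_list_values(triples: Set[Triple]) -> Set[Triple]:
    """Apply { ?s ?p (m1 ... mn) } => { ?s ?p m1, ..., mn } to every list-valued
    triple, keeping the original list triples so the order stays retrievable."""
    derived = set(triples)
    for s, p, o in triples:
        if p in (FIRST, REST):
            continue  # the list's own structural triples are not list-valued statements
        for member in list_members(o, triples):
            derived.add((s, p, member))
    return derived


g = {
    ("ex:a", "ex:b", "_:l0"),
    ("_:l0", FIRST, "ex:x"), ("_:l0", REST, "_:l1"),
    ("_:l1", FIRST, "ex:y"), ("_:l1", REST, "_:l2"),
    ("_:l2", FIRST, "ex:z"), ("_:l2", REST, NIL),
}
flat = flatten_list_values(g)
print(sorted(t for t in flat if t[0] == "ex:a"))
print(list_members("_:l0", g))
```

After this step :a :b has four values (the list node plus its three members), which hints at why such a rule is hard to pin down semantically.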