Can RDF be used as the primary format for data?

In response to a question I asked about modelling, Cygri stated that

RDF is rarely used as the primary form in which data is managed. More typically, RDF is just produced as a “view” on some existing data that lives in some other format, and often in a relational database.

Assuming new data, what arguments are there for/against using RDF as the primary form for data?

Pros:

  • Extensibility. When you find you need to add new properties, either because you have integrated new information or because you now understand the problem better, you can just do so without having to redesign a schema and migrate legacy data.
  • Semi-structured. Copes well with situations where you have different levels of information on different entities; you don't have to force everything into a single record shape and mark lots of things as "missing".
  • Particularly well suited to representing arbitrary graph structures.
  • Helps you work with your data at the conceptual level. Just model your domain as an ontology and you can immediately create and manipulate instance data corresponding to that ontology. You don't get bogged down in the details of mapping the conceptual model to data layouts and access paths, which in turn means you have great flexibility of query and access paths.
  • Supports inference.
  • Can store the model information (ontologies) alongside the instance data in the same format, making it easier to discover the model for the data and keep things in sync.
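To make the extensibility and semi-structured points concrete, here is a minimal sketch that models RDF-style data as plain (subject, predicate, object) tuples in a Python set. This is a toy stand-in, not a real triple store, and every identifier in it (ex:book1, ex:title, ex:isbn, the match helper) is an assumption made up for illustration.

```python
# A toy in-memory "triple store": each fact is a (subject, predicate, object) tuple.
# All identifiers (ex:book1, ex:title, ...) are hypothetical examples.
triples = {
    ("ex:book1", "rdf:type", "ex:Book"),
    ("ex:book1", "ex:title", "Example title A"),
    ("ex:book2", "rdf:type", "ex:Book"),
    ("ex:book2", "ex:title", "Example title B"),
}


def match(store, s=None, p=None, o=None):
    """Return all triples matching the pattern, sorted; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Extensibility: a new property is just another triple, with no schema change
# and no migration of the existing data.
triples.add(("ex:book1", "ex:isbn", "hypothetical-isbn-0001"))

# Semi-structured: ex:book2 simply has no ex:isbn triple, rather than a column
# that has to be marked as "missing".
isbn_of_book2 = match(triples, s="ex:book2", p="ex:isbn")

# Flexible access paths: the same helper answers "everything about book1"
# and "all subjects of type Book" without any predefined index.
about_book1 = match(triples, s="ex:book1")
books = [s for s, _, _ in match(triples, p="rdf:type", o="ex:Book")]
```

A real store adds indexing, a query language and persistence, but the data model really is this uniform, which is where both the flexibility and the performance cost come from.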

Cons:

  • Performance - this is the flip side of not having to worry about data layouts and optimized access paths :)
  • Weakness of validation and integrity checking - the flip side of extensibility and the semi-structured model. Yes, you can do some integrity checking and there are good tools out there, but ultimately you can't have it both ways.
  • Relative immaturity of tools - sure, there are lots of fantastic RDF stores and associated tools out there, but the sheer level of investment in RDBMS tools, admin tooling, mapping layers etc. will take a little catching up with.
  • Lack of data entry tools, as pointed out by cygri. Data entry, update and review are all key tasks that are relatively less well supported by existing RDF tooling. There's no intrinsic reason this can't be solved and there are, for example, spreadsheet-based RDF data entry systems and form generators, but right now getting from a schema/ontology to a really usable entry/maintenance UI takes more work than it should.

An approach which can work well is to think about your data at the conceptual level using OWL + RDF and to use a generic triple store initially. Then, as the requirements settle down, you optimize and add additional integrity checking "under the hood". Whether that is best implemented as optimized indexes and processing around a general triple store, or by eventually moving to a different underlying store and projecting out an RDF view, will depend on the problem.
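As a rough illustration of what integrity checking added "under the hood" might look like once requirements have settled, the sketch below checks that every typed resource carries the properties required for its class. The REQUIRED table, the missing_properties function and all ex: identifiers are assumptions for the example; a real deployment would more likely lean on the validation tooling of its store.

```python
# Hypothetical constraint table: the properties each class is required to have.
REQUIRED = {"ex:Book": {"ex:title", "ex:author"}}


def missing_properties(triples, required=REQUIRED):
    """Map each typed subject to the set of required properties it lacks."""
    types = {}  # subject -> set of classes
    props = {}  # subject -> set of predicates used (other than rdf:type)
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
        else:
            props.setdefault(s, set()).add(p)

    problems = {}
    for s, classes in types.items():
        for cls in classes:
            missing = required.get(cls, set()) - props.get(s, set())
            if missing:
                problems.setdefault(s, set()).update(missing)
    return problems


data = [
    ("ex:book1", "rdf:type", "ex:Book"),
    ("ex:book1", "ex:title", "Example title A"),
    ("ex:book1", "ex:author", "ex:alice"),
    ("ex:book2", "rdf:type", "ex:Book"),
    ("ex:book2", "ex:title", "Example title B"),  # no ex:author triple
]

report = missing_properties(data)  # {"ex:book2": {"ex:author"}}
```

The check sits outside the data model: the store still accepts the incomplete ex:book2, and it is up to you whether to run this before accepting writes or as a periodic report.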

BTW You may find the analysis we did in this report helpful, it was done in the context of IT Systems Management but is fairly generic.

Dave has already mentioned that RDF deals very well with semi-structured data, i.e. circumstances where there are different levels of detail about entries.

Another perspective on that is that RDF is good for dealing with heterogeneous data, i.e. where there are many different types of object, each with their own schema and inter-relationships. Each object might be fully described, but there may be a wide range of different types and, importantly, the number of types evolves over time. This is a consequence of RDF's schema-less design.

For example, I found that ultimately RDF was a great data model for a publishing system because:

  1. There were a variety of different levels of data available from different publishers (semi-structured) AND
  2. There was an increasing variety of different types of content, and a number of different types of relationships between those items.

This also relates to Dave's point about extensibility.

What Dave said. Just one more thing:

There is a lack of good data entry UIs for RDF.

It's easy to enable users to put their data into spreadsheets, CMSes, simple relational databases and so on; but putting a decent data entry frontend over an RDF schema is still significantly more work at this point in time. This suggests using the spreadsheet/CMS/DB as the master store where the data “lives”, and adding RDF export (or an “RDF view”) on top of that master store.
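A minimal sketch of such an "RDF view", assuming the master store is a small relational table: the rows stay in SQLite, and N-Triples lines are generated from them on demand. The person table, the http://example.org/ namespace and the helper names are all hypothetical, and only plain string literals are handled.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical namespace for the exported data


def escape(text):
    """Escape a string for use inside an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def rows_to_ntriples(conn):
    """Render each row of the (hypothetical) person table as N-Triples lines."""
    lines = []
    query = "SELECT id, name, email FROM person ORDER BY id"
    for pid, name, email in conn.execute(query):
        subject = f"<{BASE}person/{pid}>"
        lines.append(f'{subject} <{BASE}name> "{escape(name)}" .')
        if email is not None:  # a NULL column simply yields no triple
            lines.append(f'{subject} <{BASE}email> "{escape(email)}" .')
    return lines


# The master store: an ordinary relational table that users edit through
# whatever frontend they already have.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT)"
)
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, "Alice", "alice@example.org"), (2, "Bob", None)],
)

ntriples = rows_to_ntriples(conn)
```

Because the export is derived, it can be regenerated whenever the master data changes; the trade-off is that the RDF side is read-only unless you also build a path back into the table.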

What we did recently in a project was to use a relational database to store our data while the users are editing it. The benefit was the easy integration with Hibernate, GWT, etc. Once an object is done (fully described) and the user decides to publish it, it is inserted into a Jena triplestore. This way we can stage the data and review it before publishing.
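The staging idea can be sketched roughly as follows. This is not the Hibernate/GWT/Jena setup described above: SQLite stands in for the editing database, a plain Python set stands in for the triplestore, and the draft table, the publish function and the ex: names are assumptions for illustration.

```python
import sqlite3


def publish(conn, store, item_id):
    """Copy one finished item from the staging table into the triple store.

    Returns True if the item was published, False if it is unknown,
    already published, or not yet fully described.
    """
    row = conn.execute(
        "SELECT title, body FROM draft WHERE id = ? AND published = 0",
        (item_id,),
    ).fetchone()
    if row is None:
        return False  # unknown id or already published
    title, body = row
    if not title or not body:
        return False  # incomplete, so it stays in staging
    subject = f"ex:item/{item_id}"
    store.add((subject, "ex:title", title))
    store.add((subject, "ex:body", body))
    conn.execute("UPDATE draft SET published = 1 WHERE id = ?", (item_id,))
    return True


# Staging database where users edit their drafts.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE draft (id INTEGER PRIMARY KEY, title TEXT, body TEXT, "
    "published INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO draft (id, title, body) VALUES (?, ?, ?)",
    [(1, "Finished", "Complete text"), (2, "Half done", None)],
)

store = set()  # stand-in for the published triplestore
results = [
    publish(conn, store, 1),  # complete draft: published
    publish(conn, store, 2),  # missing body: stays in staging
    publish(conn, store, 1),  # already published: nothing to do
]
```

Only complete, reviewed items ever reach the triple side, so consumers of the RDF never see half-edited data.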