What's missing from RDF databases?

KendallClark2 · August 26, 2010, 10:00pm

What features or capabilities do you want or need in an RDF db system that just aren't available presently? Seems like RDF systems still lag very far behind RDBMSes in terms of features and capabilities. What are some of the maturity vectors for RDF DBs?

tobyink · August 26, 2010, 10:00pm

Standardised full text search would be nice, as noted by another response. I'd also quite like to see support for a quad-based version of CONSTRUCT/DESCRIBE (perhaps outputting TriX or N-Quads).
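A quad-based CONSTRUCT is easy to picture outside any particular store. The sketch below assumes a template of (subject, predicate, object, graph) patterns plus a list of solution mappings, instantiates the quads, and serializes them as N-Quads. The class and function names are hypothetical, literals are simplified to plain strings, and this is not any engine's API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str  # plain string literal only; datatypes and language tags omitted


@dataclass(frozen=True)
class Var:
    name: str


Term = Union[IRI, Literal, Var]
Quad = Tuple[Term, Term, Term, Term]  # subject, predicate, object, graph


def construct_quads(template: List[Quad], solutions: List[Dict[str, Term]]) -> List[Quad]:
    """Instantiate a quad template once per solution, like CONSTRUCT with a graph slot."""
    out: List[Quad] = []
    for solution in solutions:
        for quad in template:
            terms = tuple(solution.get(t.name) if isinstance(t, Var) else t for t in quad)
            if None in terms:
                continue  # as in CONSTRUCT, skip template rows with unbound variables
            out.append(terms)
    return list(dict.fromkeys(out))  # the result is a set of quads; keep first-seen order


def _escape(s: str) -> str:
    # Escape backslash, double quote and line breaks inside a quoted literal.
    return (s.replace("\\", "\\\\").replace('"', '\\"')
             .replace("\n", "\\n").replace("\r", "\\r"))


def to_nquads(quads: List[Quad]) -> str:
    def term(t: Term) -> str:
        if isinstance(t, IRI):
            return f"<{t.value}>"
        if isinstance(t, Literal):
            return f'"{_escape(t.value)}"'
        raise ValueError(f"cannot serialize {t!r}")
    return "".join(" ".join(term(t) for t in q) + " .\n" for q in quads)
```

Putting the graph in the template lets one query write into several named graphs at once, which is exactly what a triple-only CONSTRUCT cannot express.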

Some sort of equivalent to SQL's views would be good. Imagine I have a nice CONSTRUCT query figured out: I'd like to be able to save that query as a particular named graph. Not just a snapshot of its results, though; I'd want the graph to always contain the latest results. Similarly, it would be nice to save a SELECT query as a view that could then be used in a SPARQL 1.1 subquery.
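A minimal sketch of the "view as named graph" idea, assuming a toy in-memory store: the store keeps the saved query itself (here a Python function standing in for a CONSTRUCT query) and re-evaluates it whenever the graph is read, so the view always reflects the latest data. `ViewStore` and `children_of` are hypothetical names.

```python
from typing import Callable, Dict, Set, Tuple

Triple = Tuple[str, str, str]
Query = Callable[["ViewStore"], Set[Triple]]


class ViewStore:
    """Hypothetical store where a named graph may be backed by a saved query."""

    def __init__(self) -> None:
        self._graphs: Dict[str, Set[Triple]] = {}
        self._views: Dict[str, Query] = {}

    def add(self, graph: str, triple: Triple) -> None:
        if graph in self._views:
            raise ValueError(f"{graph} is a view and cannot be written to")
        self._graphs.setdefault(graph, set()).add(triple)

    def define_view(self, graph: str, query: Query) -> None:
        # Save the query itself, not a snapshot of what it returns today.
        self._views[graph] = query

    def graph(self, name: str) -> Set[Triple]:
        if name in self._views:
            return self._views[name](self)  # re-evaluated on every read
        return set(self._graphs.get(name, set()))


def children_of(store: ViewStore) -> Set[Triple]:
    # Stands in for: CONSTRUCT { ?p ex:hasChild ?c } WHERE { ?c ex:hasParent ?p },
    # evaluated over the "data" graph.
    return {(p, "ex:hasChild", c)
            for (c, pred, p) in store.graph("data") if pred == "ex:hasParent"}
```

Re-evaluating on every read is the simplest way to guarantee fresh results, at the price of running the full query on each access; an incrementally maintained (materialized) view would be the obvious optimisation.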

But mostly, SPARQL's syntax and expressiveness are adequate; an important focus for future development should be letting people query more data, faster. Optimisation is of course part of that: many triple stores are thin layers over SQL databases, which is not necessarily the best way to store and index RDF data. But distributed SPARQL is part of it too: smart distributed queries should put a lot more data at your fingertips.
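To make "distributed queries" concrete, here is a deliberately naive sketch: a basic graph pattern is evaluated by sending each triple pattern to every source and joining the partial solutions, so a join can span data held in different places. In-memory sets stand in for remote endpoints, and the function names are hypothetical; a real federated engine would plan which sources to ask and push joins to them.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def extend(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Match one triple pattern against one triple, extending an existing binding."""
    out = dict(binding)
    for p, value in zip(pattern, triple):
        if is_var(p):
            if out.get(p, value) != value:
                return None  # variable already bound to a different value
            out[p] = value
        elif p != value:
            return None
    return out


def federated_bgp(patterns: List[Triple], sources: List[Set[Triple]]) -> List[Binding]:
    """Nested-loop join where every pattern is sent to every source."""
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        seen, next_solutions = set(), []
        for binding in solutions:
            for source in sources:  # a real planner would skip sources that cannot match
                for triple in source:
                    b = extend(pattern, triple, binding)
                    if b is None:
                        continue
                    key = frozenset(b.items())
                    if key not in seen:  # the same triple may live in several sources
                        seen.add(key)
                        next_solutions.append(b)
        solutions = next_solutions
    return solutions
```

For example, `federated_bgp([("?x", "foaf:knows", "?y"), ("?y", "foaf:name", "?n")], [a, b])` can find a name in source `b` for a person that only source `a` knows about.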

TimFinin · August 26, 2010, 10:00pm

Standard and practical ways to handle provenance, uncertainty and temporal qualifications.
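Without a standard, each application ends up encoding these qualifications its own way (for example with reification or one named graph per statement). The sketch below, assuming a flat record per statement, just shows the three dimensions side by side and a query that respects them: "what held at time T, with at least confidence C?". The `Statement` fields and `holds_at` are hypothetical names, not an existing vocabulary.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Statement:
    subject: str
    predicate: str
    obj: str
    source: str                      # provenance: who or what asserted it
    confidence: float                # uncertainty, from 0.0 to 1.0
    valid_from: date                 # temporal qualification, inclusive
    valid_to: Optional[date] = None  # exclusive; None means "still true"


def holds_at(statements: Iterable[Statement], when: date,
             min_confidence: float = 0.0) -> List[Statement]:
    """Statements valid at `when` whose confidence meets the threshold."""
    return [s for s in statements
            if s.valid_from <= when
            and (s.valid_to is None or when < s.valid_to)
            and s.confidence >= min_confidence]
```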

IvanMikhailov1 · August 26, 2010, 10:00pm

From my Virtuoso-centric (Virtuoso-holic?) perspective, the key missing part is an adequate working environment for application developers. While we keep adding advanced features, people continue to painfully debug trivial typos in the names of predicates, variables and other things; even trivial auto-completion is a problem.

So IDEs and textbooks, the more the better.
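Even a crude checker catches the kind of typos described above. The sketch below assumes you have a set of known prefixed names (for instance, collected from the vocabularies in use), flags unknown terms with a "did you mean" suggestion via `difflib.get_close_matches`, and warns about SELECT variables that never appear in WHERE. `lint_query` is a hypothetical name, and the regular expressions are a rough approximation, not a SPARQL parser.

```python
import difflib
import re
from typing import List, Set

PREFIXED_NAME = re.compile(r"\b[A-Za-z][\w-]*:[A-Za-z_][\w-]*")  # rough, not the full grammar
VARIABLE = re.compile(r"\?(\w+)")


def lint_query(query: str, known_terms: Set[str]) -> List[str]:
    """Flag unknown prefixed names and SELECT variables never used in WHERE."""
    warnings: List[str] = []
    for term in sorted(set(PREFIXED_NAME.findall(query))):
        if term not in known_terms:
            close = difflib.get_close_matches(term, sorted(known_terms), n=1)
            hint = f" (did you mean {close[0]}?)" if close else ""
            warnings.append(f"unknown term {term}{hint}")
    parts = re.split(r"\bWHERE\b", query, maxsplit=1, flags=re.IGNORECASE)
    if len(parts) == 2:
        head, body = parts
        for var in sorted(set(VARIABLE.findall(head)) - set(VARIABLE.findall(body))):
            warnings.append(f"?{var} is selected but never used in WHERE")
    return warnings
```

An IDE would do this properly against a real parser and live vocabulary data, but the principle (compare against known names, suggest the nearest one) is the same.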

The technical issues mentioned above are important, but they are already on their way to release, and they will cost us no more than months of work by me alone; that is much cheaper than the long and intensive work a proper IDE requires.

Jerven · August 26, 2010, 10:00pm

A standardized way to atomically mint new URIs from a pattern and a sequence (like MySQL's AUTO_INCREMENT). This is needed for apps that have to generate new primary keys. Bonus points for being able to randomly select a unique value from a range (e.g. random user IDs).

In essence, I do not know of a correct, standard function for generating unique IDs.
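The behaviour being asked for can be sketched in a single process: a lock makes "read the counter, then increment it" one atomic step, and a second minter draws random, never-repeated numbers from a fixed range. `UriMinter` and `RandomIdMinter` are hypothetical names, and the state here lives only in memory; a store would need to keep it durable and inside its own transactions.

```python
import random
import threading
from typing import Optional, Set


class UriMinter:
    """Mint URIs from a pattern and a sequence, e.g. "http://example.org/item/{}"."""

    def __init__(self, pattern: str, start: int = 1) -> None:
        self._pattern = pattern
        self._next = start
        self._lock = threading.Lock()

    def mint(self) -> str:
        with self._lock:  # read-and-increment must happen atomically
            n = self._next
            self._next += 1
        return self._pattern.format(n)


class RandomIdMinter:
    """Pick unused random numbers from [low, high], e.g. for random user IDs."""

    def __init__(self, pattern: str, low: int, high: int, seed: Optional[int] = None) -> None:
        self._pattern = pattern
        self._low, self._high = low, high
        self._rng = random.Random(seed)
        self._used: Set[int] = set()
        self._lock = threading.Lock()

    def mint(self) -> str:
        with self._lock:
            if len(self._used) > self._high - self._low:
                raise RuntimeError("id range exhausted")
            while True:  # retry until an unused value comes up
                n = self._rng.randint(self._low, self._high)
                if n not in self._used:
                    self._used.add(n)
                    return self._pattern.format(n)
```

Retrying random draws gets slow as the range fills up; for nearly full ranges, shuffling the remaining values once would be the better design.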

WilliamGreenly · August 26, 2010, 10:00pm

Better support for federation (that would be a nice kick in the b***s for RDBMSes)

Support for transactions, and even federated transactions (an even harder kick)

A wide range of supported Accept types (result formats) on the SPARQL endpoint
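Supporting many Accept types mostly comes down to content negotiation. A simplified sketch, assuming the endpoint offers four common SPARQL result media types in a fixed preference order: it parses the header's q-values, lets the most specific matching media range decide each type's q, and returns `None` when nothing is acceptable (the endpoint would then answer 406). The function names are hypothetical, and media-type parameters other than `q` are ignored.

```python
from typing import List, Optional, Sequence, Tuple

# Media types a SPARQL endpoint might offer for SELECT results, in server preference order.
SUPPORTED = (
    "application/sparql-results+json",
    "application/sparql-results+xml",
    "text/csv",
    "text/tab-separated-values",
)


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, q) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media, q))
    return ranges


def _specificity(media_range: str, media: str) -> int:
    """-1 if the range does not match; otherwise higher means more specific."""
    if media_range == "*/*":
        return 0
    if media_range.endswith("/*"):
        return 1 if media.split("/")[0] == media_range[:-2] else -1
    return 2 if media_range == media else -1


def negotiate(header: Optional[str], supported: Sequence[str] = SUPPORTED) -> Optional[str]:
    """Pick a result format; None means the endpoint should answer 406 Not Acceptable."""
    if not header:
        return supported[0]
    ranges = parse_accept(header)
    best, best_q = None, 0.0
    for media in supported:
        matches = []
        for media_range, q in ranges:
            spec = _specificity(media_range, media)
            if spec >= 0:
                matches.append((spec, q))
        if not matches:
            continue
        q = max(matches)[1]  # the most specific matching range decides the q-value
        if q > best_q:       # strict '>' lets server preference order break ties
            best, best_q = media, q
    return best
```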

database_animal · August 26, 2010, 10:00pm

I'd like to have more control over the physical layout of the data and indexes. In relational databases, we regularly see 100-1000x improvements when we get this right, and I'm sure the same could be done in the RDF world.

Just try this experiment.

(i) Load 300 million rows into a single MySQL table with some indexes. (ii) Load 3 million rows into each of 100 different MySQL tables. (iii) Compare. You'll find that (ii) completes much more quickly than (i).
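A scaled-down version of the experiment can be scripted with Python's built-in `sqlite3` instead of MySQL (a substitution, so absolute numbers will differ). It loads the same total number of randomly keyed rows either into one indexed table or into many smaller indexed tables and reports the elapsed time. The effect mostly appears once the single table's indexes no longer fit in cache, so small or in-memory runs may show little or no difference; no particular result is implied here.

```python
import os
import random
import sqlite3
import time
from typing import Iterator, Tuple

Row = Tuple[int, int, int]


def rows(n: int, seed: int) -> Iterator[Row]:
    # Random ids, so index inserts land at random B-tree positions instead of appending.
    rng = random.Random(seed)
    for _ in range(n):
        yield (rng.randrange(10**9), rng.randrange(100), rng.randrange(10**9))


def create_table(conn: sqlite3.Connection, name: str) -> None:
    # Table names are generated by this script, never user input, so f-strings are safe here.
    conn.execute(f"CREATE TABLE {name} (s INTEGER, p INTEGER, o INTEGER)")
    conn.execute(f"CREATE INDEX {name}_spo ON {name} (s, p, o)")
    conn.execute(f"CREATE INDEX {name}_pos ON {name} (p, o, s)")


def load(db_path: str, tables: int, rows_per_table: int) -> Tuple[float, int]:
    """Load `tables` indexed tables of `rows_per_table` rows; return (seconds, total rows)."""
    if db_path != ":memory:" and os.path.exists(db_path):
        os.remove(db_path)  # start from an empty database on every run
    conn = sqlite3.connect(db_path)
    for i in range(tables):
        create_table(conn, f"t{i}")
    start = time.perf_counter()
    for i in range(tables):
        # Row generation is timed too, but both layouts generate the same number of rows.
        conn.executemany(f"INSERT INTO t{i} VALUES (?, ?, ?)", rows(rows_per_table, seed=i))
    conn.commit()
    elapsed = time.perf_counter() - start
    total = sum(conn.execute(f"SELECT COUNT(*) FROM t{i}").fetchone()[0] for i in range(tables))
    conn.close()
    return elapsed, total


if __name__ == "__main__":
    # (i) one big table vs (ii) 100 small tables, same total row count (1/1000 scale).
    print("single:     ", load("single.db", tables=1, rows_per_table=300_000))
    print("partitioned:", load("partitioned.db", tables=100, rows_per_table=3_000))
```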

The paradigm of "load everything into one huge triple store" might work for a place like IBM or NASA, which can afford a 50-machine cluster and where there isn't any consequence if a project succeeds or fails. However, you might as well hang out a sign that says "lean startups need not apply", because it makes the cost of entry incredibly high. Many of us are getting results with very stupid methods, such as RDF -> relational mapping and batch processing, because we find that they deliver answers in 20 minutes on cheap hardware rather than taking a few days on an expensive cluster.

Just to start with, I'd like to see named graphs with completely separate physical storage and indexing. That is, building 100 named graphs with N triples each should cost 100 times what it costs to build 1 named graph with N triples. Beyond that, I'd like to see supplementary indexes that work on specific graph patterns, so I can build something like a multipart index in SQL.
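A logical sketch of both wishes, assuming a toy in-memory store: each named graph owns its own SPO and POS indexes, so adding to one graph never touches another's, and a supplementary "pattern index" precomputes the two-pattern join `{ ?x p1 ?y . ?y p2 ?z }` inside one graph, roughly the way a composite index serves a recurring SQL query. Python dicts say nothing about on-disk layout, and `PartitionedStore` and `build_path_index` are hypothetical names.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Set, Tuple

Index = DefaultDict[str, DefaultDict[str, Set[str]]]


def _new_index() -> Index:
    return defaultdict(lambda: defaultdict(set))


class GraphPartition:
    """One named graph with its own indexes; nothing is shared with other graphs."""

    def __init__(self) -> None:
        self.spo: Index = _new_index()  # subject -> predicate -> objects
        self.pos: Index = _new_index()  # predicate -> object -> subjects

    def add(self, s: str, p: str, o: str) -> None:
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)


class PartitionedStore:
    def __init__(self) -> None:
        self.graphs: Dict[str, GraphPartition] = {}
        self.pattern_indexes: Dict[Tuple[str, str, str], Dict[str, Set[str]]] = {}

    def add(self, g: str, s: str, p: str, o: str) -> None:
        # Only graph g's indexes are touched, so the cost depends on the size of g alone.
        if g not in self.graphs:
            self.graphs[g] = GraphPartition()
        self.graphs[g].add(s, p, o)

    def build_path_index(self, g: str, p1: str, p2: str) -> Dict[str, Set[str]]:
        """Precompute { ?x p1 ?y . ?y p2 ?z } within graph g as ?x -> {?z}.

        This is a snapshot; a real system would also maintain it on later inserts.
        """
        part = self.graphs[g]
        index: DefaultDict[str, Set[str]] = defaultdict(set)
        for y, xs in part.pos.get(p1, {}).items():     # ?x p1 ?y
            for z in part.spo.get(y, {}).get(p2, ()):  # ?y p2 ?z
                for x in xs:
                    index[x].add(z)
        self.pattern_indexes[(g, p1, p2)] = dict(index)
        return self.pattern_indexes[(g, p1, p2)]
```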

Signified · August 26, 2010, 10:00pm

Standardised syntax for keyword search in SPARQL.
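Whatever syntax ends up standardised, the engine needs something like an inverted index over literal values behind it. A minimal sketch, assuming lower-cased word tokens and AND semantics (every keyword must appear under the same subject and predicate); `KeywordIndex` is a hypothetical name, with no stemming, ranking or language handling.

```python
import re
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple


def tokenize(text: str) -> List[str]:
    # Lower-cased word tokens; no stemming, stop words, or language handling.
    return re.findall(r"\w+", text.lower())


class KeywordIndex:
    """Inverted index from word tokens to the (subject, predicate) pairs whose literals contain them."""

    def __init__(self) -> None:
        self._postings: DefaultDict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def add(self, subject: str, predicate: str, literal: str) -> None:
        for token in tokenize(literal):
            self._postings[token].add((subject, predicate))

    def search(self, keywords: str) -> Set[str]:
        """Subjects having every keyword under one predicate (AND semantics)."""
        tokens = tokenize(keywords)
        if not tokens:
            return set()
        hits = set.intersection(*(self._postings.get(t, set()) for t in tokens))
        return {subject for subject, _ in hits}
```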

harschware · August 26, 2010, 10:00pm

A standardized mechanism for the equivalent of RDBMS stored procedures. AllegroGraph provides a way to add your own Lisp code to the server, and with Jena you can extend the instantiation of Fuseki, intercept code at the right points, and inject your own Java into the query engine and underlying triple store... But I fear a repeat of the vendor-specific solutions à la Oracle, Sybase, etc., each devising a custom stored procedure language for its RDBMS. It would be really nice if the W3C could head this off at the pass and devise something that coexists well with the semantic web stack we are all used to. With something like that in hand, Jerven could solve his "atomically minting new URIs" issue.
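SPARQL's grammar already allows calling a function identified by an IRI inside an expression, so one store-neutral shape for "stored procedures" would be a server-side registry that maps such IRIs to code. The sketch below is hypothetical throughout (the registry class, the `http://example.org/fn#mintUri` IRI, and the in-memory counter) and uses Jerven's URI-minting case as the registered procedure.

```python
import itertools
import threading
from typing import Any, Callable, Dict


class ProcedureRegistry:
    """Hypothetical server-side registry mapping function IRIs to Python callables."""

    def __init__(self) -> None:
        self._procedures: Dict[str, Callable[..., Any]] = {}

    def register(self, iri: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
            self._procedures[iri] = fn
            return fn
        return decorator

    def call(self, iri: str, *args: Any) -> Any:
        # What a query engine would do when it evaluates <iri>(args...) in an expression.
        try:
            procedure = self._procedures[iri]
        except KeyError:
            raise LookupError(f"no procedure registered for <{iri}>") from None
        return procedure(*args)


registry = ProcedureRegistry()
_sequence = itertools.count(1)
_sequence_lock = threading.Lock()


@registry.register("http://example.org/fn#mintUri")
def mint_uri(prefix: str) -> str:
    # Atomically take the next number in the sequence and append it to the prefix.
    with _sequence_lock:
        n = next(_sequence)
    return f"{prefix}{n}"
```

The point of standardizing would be the registration and invocation contract, not the implementation language; the same IRI could be backed by Lisp in one store and Java in another.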