ACID in Triple Stores

tobyink · September 5, 2011, 10:00pm

What is the state of the art in terms of ACID implementation in triple stores? Do any support isolated transactions with rollback/commit?

Seems like SQL-backed engines could probably implement decent transactions quite easily. Do any SQL-backed triple stores expose the transactional facilities of their underlying databases?

And have any non-SQL-backed stores implemented transactions?

For any stores that do support transactions, how do they expose the functionality? Do they provide an extension to SPARQL along the lines of "BEGIN TRANSACTION", or do they expose it via another API?

It seems ACID might be an enterprisey trick that triple stores are missing right now.

GerritV · September 5, 2011, 10:00pm

Edit: this was originally a comment on Jerven's remark; I have converted it to an answer.

In response to Jerven's remark "BigData and Owlim both support transactions", I would like to add the following:

The transaction support in Owlim is kind of useless for OLTP. A connection can't see triples it has added until it commits. I.e. the following code will fail in Owlim: connection.add(subj, pred, obj); assertTrue(connection.hasStatement(subj, pred, obj));
Bigdata offers RW-transactions with Multiversion Concurrency Control (i.e. you have a consistent view of the repository state, even if other connections are mutating it).
Sesame Native and Sesame MemoryStore offer transactions with Read-Committed isolation, but connections that write will block each other for the entire transaction.
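
To make the difference concrete, here is a minimal sketch of read-committed isolation in which a connection does see its own uncommitted writes. It is written in Python purely for illustration; the Store and Connection classes and their method names are hypothetical and are not the API of Owlim, Bigdata or Sesame.

```python
class Store:
    """Toy in-memory triple store; holds only committed triples."""

    def __init__(self):
        self.committed = set()

    def connect(self):
        return Connection(self)


class Connection:
    """Hypothetical connection that buffers its own uncommitted changes."""

    def __init__(self, store):
        self.store = store
        self.added = set()    # triples added in the open transaction
        self.removed = set()  # triples removed in the open transaction

    def add(self, s, p, o):
        self.removed.discard((s, p, o))
        self.added.add((s, p, o))

    def remove(self, s, p, o):
        self.added.discard((s, p, o))
        self.removed.add((s, p, o))

    def has_statement(self, s, p, o):
        triple = (s, p, o)
        if triple in self.added:      # own uncommitted write is visible
            return True
        if triple in self.removed:    # own uncommitted delete is visible
            return False
        return triple in self.store.committed

    def commit(self):
        self.store.committed -= self.removed
        self.store.committed |= self.added
        self.rollback()  # clear the buffers now that they are applied

    def rollback(self):
        self.added.clear()
        self.removed.clear()


store = Store()
writer, reader = store.connect(), store.connect()
writer.add("ex:s", "ex:p", "ex:o")
assert writer.has_statement("ex:s", "ex:p", "ex:o")      # sees its own write
assert not reader.has_statement("ex:s", "ex:p", "ex:o")  # not committed yet
writer.commit()
assert reader.has_statement("ex:s", "ex:p", "ex:o")      # visible after commit
```

A store that consulted only the committed set in has_statement would show the behaviour described for Owlim above: the first assertion would fail.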

Additionally, I don't agree with his other remark, "Though the transaction per sesame connection is an awkward interface." In my opinion it is a good fit for an API that is supposed to work with many back-ends, and comparable to e.g. what JDBC offers on a database level. If more features are required (e.g. checkpoints), then we probably first need triplestores that offer such functionality. If you want another interface to manage transactions, then this is imho best left to third-party tools and should not be part of the core Sesame API. For instance, at Open Sahara we have developed an integration with the Spring framework that makes it possible to use @Transactional annotations on methods to control transaction scope. The Spring integration takes care of calling commit() and rollback() on Spring-managed Sesame connections (see https://dev.opensahara.com/projects/os/wiki/SesameExtensions).

AndyS · September 5, 2011, 10:00pm

Some stores offer ACID transactions, some offer transactions to applications but not all the way down to perfect handling of hardware failures, and some just don't offer transactions. Implementing transactions is a significant cost, so for stores aimed at situations where they're not needed, the cost may just not be worth it. As you say, it's certainly enterprisey.

As to exposing it, a key point is usage. APIs have transaction features. The standard SPARQL protocol is for use over the web, and transactions (and stateful interactions) aren't the main focus. Atomicity of each update operation is encouraged but not required. HTTP already has features that can be used, in conjunction with per-operation atomicity, to get similar effects.

Disclosure: we've just added full ACID transaction support to Jena's TDB (currently in user testing). The isolation level is serializable and it uses write-ahead logging. Jena's SDB has had transactions since the beginning, as it uses SQL databases.
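
As a rough illustration of why an SQL-backed store gets transactions almost for free, here is a minimal sketch in Python using the standard-library sqlite3 module. The single three-column triples table and the helper names (open_store, add_all, count) are assumptions made for the example; this is not SDB's actual schema or API.

```python
import sqlite3


def open_store():
    """Create an in-memory database with a naive one-table triple layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triples (s TEXT, p TEXT, o TEXT, UNIQUE (s, p, o))"
    )
    conn.commit()
    return conn


def add_all(conn, triples):
    """Insert a batch atomically: either every row is stored or none is."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)


def count(conn):
    return conn.execute("SELECT COUNT(*) FROM triples").fetchone()[0]


conn = open_store()
add_all(conn, [("ex:a", "ex:knows", "ex:b")])
try:
    # The second row violates the UNIQUE constraint, so the whole batch,
    # including the first (valid) row, is rolled back by the database.
    add_all(conn, [("ex:c", "ex:knows", "ex:d"), ("ex:a", "ex:knows", "ex:b")])
except sqlite3.IntegrityError:
    pass
print(count(conn))  # 1
```

The store code itself contains no rollback logic here; atomicity comes entirely from the underlying database's transaction.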

Jerven · September 5, 2011, 10:00pm

BigData and Owlim both support transactions. Though the transaction per sesame connection is an awkward interface. Nicer use of transaction objects would be preferable and I should ask for it as a Sesame feature. Finding out if a transaction failed is currently hard and depends on catching exceptions.

JeenBroekstra · September 5, 2011, 10:00pm

Sesame currently supports transactions at the API level (though the only isolation level is read-committed and, as GerritV mentions, it is multi-read but single-write in the default backend implementations). See the user manual for details.

Developing directly against the Sesame Repository APIs, you can group several operations together and then commit (or roll back) as a single transaction (the way this works from a user point of view is somewhat similar to a JDBC connection). If a transaction fails for whatever reason, an exception is thrown and you get the chance to gracefully exit the transaction (e.g. by taking care of your internal bookkeeping, then performing a rollback and/or a close on your connection).

In terms of SPARQL update operations, there is no explicit transaction support added at the language level. We simply reuse the current API mechanism: if the connection on which the SPARQL update is executed is in autocommit mode, each SPARQL update will be treated as a single transaction (and therefore as a single, atomic, operation). If it is not, it will group together multiple updates until you invoke commit.
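
The autocommit behaviour described above can be sketched as follows. This is an illustrative Python toy, not Sesame code: UpdateConnection and execute_update are hypothetical names, and an "update" is modelled as a plain function that mutates a set of triples rather than a parsed SPARQL update.

```python
class UpdateConnection:
    """Hypothetical connection; an 'update' is a callable mutating a set."""

    def __init__(self, triples, autocommit=True):
        self.triples = triples        # committed state
        self.autocommit = autocommit
        self._working = None          # private copy while a transaction is open

    def execute_update(self, update):
        if self._working is None:     # implicitly begin a transaction
            self._working = set(self.triples)
        try:
            update(self._working)
        except Exception:
            if self.autocommit:       # a failed single update leaves no trace
                self.rollback()
            raise
        if self.autocommit:           # each update is its own transaction
            self.commit()

    def commit(self):
        if self._working is not None:
            self.triples.clear()
            self.triples.update(self._working)
            self._working = None

    def rollback(self):
        self._working = None


data = set()
conn = UpdateConnection(data, autocommit=False)
conn.execute_update(lambda g: g.add(("ex:a", "ex:p", "ex:b")))
conn.execute_update(lambda g: g.add(("ex:b", "ex:p", "ex:c")))
assert data == set()   # nothing is visible until commit
conn.commit()
assert len(data) == 2  # both updates applied as one unit
```

With autocommit=True the same two calls would each become visible in data immediately after they return.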

Looking at ACID compliance for Sesame: each operation (either a single operation or grouped in a transaction) is atomic and consistent. In the Sesame Native store, this is ensured by use of a transaction log - in the event of power failure during transaction execution the database can be restored to a consistent state later (though this may not always be a fully automatic process, it may in rare cases require manual removal of a corrupted index). The in-memory store ensures atomicity and consistency by simply not writing anything to disk until after a transaction has completed (so if power fails during a transaction, your last transaction will simply be completely 'forgotten').

Isolation is ensured (though only a single isolation level is supported), meaning that multithreaded access without interference is supported. Finally, Durability. That is a feature of a particular store implementation, but as far as I'm aware the Sesame Native and RDBMS backends all guarantee that once a commit has succeeded, the database has in fact stored the result of the transaction. The in-memory store of course does not guarantee this: it does a scheduled disk-sync, but if power fails during that disk-sync this may result in a corrupted dump.

RobVesse · September 5, 2011, 10:00pm

I've done some work with Transactions in dotNetRDF (disclaimer - I am lead developer) but this is purely at the API level and primarily works by delaying the persistence of changes to the in-memory state to the underlying storage. We have two different implementations of this because we found there were two different ways we needed to use transactions:

In SPARQL Update you can have updates that query the data and so changes from one operation need to be made to the dataset immediately since if you are executing several operations within a transaction then subsequent operations may rely on data added/removed by previous ones. Thus to rollback a transaction you need to reverse the changes you already made while a commit can simply discard the change history as you won't need to rollback those changes.
In a scenario where you are holding a (partial) in-memory view of some underlying store that you want to make changes to we found it is safer to make the changes only in-memory and then persist/discard those changes when the user decides to do so.
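
The two approaches can be sketched roughly as below. This is a Python illustration of the general idea only; UndoLogTransaction and InMemoryView are hypothetical names and do not correspond to dotNetRDF's actual classes.

```python
class UndoLogTransaction:
    """Approach 1: apply changes at once, keep a history for rollback."""

    def __init__(self, triples):
        self.triples = triples
        self.history = []  # ("add" | "remove", triple) for changes that took effect

    def add(self, triple):
        if triple not in self.triples:
            self.triples.add(triple)
            self.history.append(("add", triple))

    def remove(self, triple):
        if triple in self.triples:
            self.triples.remove(triple)
            self.history.append(("remove", triple))

    def commit(self):
        self.history.clear()  # changes are already applied; drop the history

    def rollback(self):
        for action, triple in reversed(self.history):  # undo newest first
            if action == "add":
                self.triples.discard(triple)
            else:
                self.triples.add(triple)
        self.history.clear()


class InMemoryView:
    """Approach 2: change only an in-memory copy, persist or discard later."""

    def __init__(self, store):
        self.store = store      # stands in for the underlying storage
        self.view = set(store)  # in-memory working copy

    def add(self, triple):
        self.view.add(triple)

    def remove(self, triple):
        self.view.discard(triple)

    def persist(self):
        self.store.clear()
        self.store.update(self.view)

    def discard(self):
        self.view = set(self.store)


data = {("ex:a", "ex:p", "ex:b")}
tx = UndoLogTransaction(data)
tx.add(("ex:b", "ex:p", "ex:c"))
assert ("ex:b", "ex:p", "ex:c") in data  # visible to later operations at once
tx.rollback()
assert data == {("ex:a", "ex:p", "ex:b")}
```

In the first class, rollback does the work and commit is trivial; in the second it is the other way round, which matches the trade-off described in the two points above.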

In terms of enterprise grade triple stores both Virtuoso and Stardog also support transactions.

mhgrove · September 5, 2011, 10:00pm

Stardog supports transactions natively and offers ACID compliance, though durability is configurable when you create a database: you don't have to pay the overhead of logging the details of every transaction if you don't want to. The supported isolation level is read-committed, same as Sesame, and is implemented as a two-phase commit protocol via JTA. Future versions will expose the JTA support publicly so Stardog transactions can play nicely in the standard J2EE environment.

The transaction support is exposed in the SNARL API as well as the Jena & Sesame bindings for Stardog and is provided for both in-memory and disk-based databases. I'm told they even work very nicely via Gremlin using Blueprints if you want to use it outside of the Java world.

harschware · September 5, 2011, 10:00pm

Wow, pleasantly surprised to see so many triple stores supporting ACID. One more: a major, or the major, difference between AllegroGraph 3.3 and 4.0 is ACID compliance. See ACID Properties of AllegroGraph for more info.