Pros and cons for different Jena backends - SDB vs. TDB

BastianSpan · January 18, 2011, 11:00pm

I'm still using the old RDB backend in a project and want to upgrade now to either SDB or TDB.

Which one is better when the application is expected to be mostly writing, mostly reading, or both? Are transactions useful (SDB with the underlying DB's transactions)? Which one is better with regard to scale/clustering?

What are your experiences with these backends, and which one fits better for which types of situations/use cases?

castagna · January 18, 2011, 11:00pm

If you have a write-once (or write-rarely), read-many scenario, TDB is the better choice, especially on 64-bit machines with a lot of RAM.

To avoid concurrency problems, use a multiple-readers XOR single-writer (MRSW) locking scheme, as explained here:

http://openjena.org/how-to/concurrency.html
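
To make the multiple-readers XOR single-writer idea concrete, here is a minimal sketch that uses only the JDK's ReentrantReadWriteLock. The MrswStore class and its list of strings are hypothetical stand-ins for a real store, not part of Jena; the page linked above describes Jena's own locking mechanism, which is what you would use in practice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for an RDF store; NOT a Jena class.
class MrswStore {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> triples = new ArrayList<>();

    // Writer: takes the exclusive lock, so no reader or other writer
    // can be inside the store while it is being modified.
    void add(String triple) {
        lock.writeLock().lock();
        try {
            triples.add(triple);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Reader: takes the shared lock, so many readers may run at once,
    // but never at the same time as a writer.
    boolean contains(String triple) {
        lock.readLock().lock();
        try {
            return triples.contains(triple);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Reader: same shared lock as contains().
    int size() {
        lock.readLock().lock();
        try {
            return triples.size();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

The try/finally around every operation matters: a lock that is not released after an exception would block all later writers (or readers).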

If you need transactions and have a lot of updates from remote machines, SDB might be a better choice. But you have other options with TDB:

http://github.com/afs/TDB-BDB

IMHO, TDB is better considering scale/clustering.

See also:

(*) Experimental!

Have fun!

mhgrove · January 18, 2011, 11:00pm

TDB is a better bet than SDB, though both are slower than some of the other options out there, such as 4store or many of the various Sesame implementations.

CommentBot · January 18, 2011, 11:00pm

The typical SDB use case is that you have an existing relational database system and you wish to store RDF in it. In some institutions you may be required to use an SQL store, or it may have advantages such as guaranteed managed backups and transactions.

Use TDB if you need a fast, persistent RDF store. It's easier to set up, and quicker than SDB.

I've used both extensively and they've both been fine. I usually reach for TDB because of the simplicity and speed.

BastianSpan · January 18, 2011, 11:00pm

Recently there was a thread on the jena-dev group about SDB's write performance and best practices for getting the best results. Since it contains some useful information and insights, I thought I'd add the link here: http://tech.groups.yahoo.com/group/jena-dev/message/44329 (that's the summary post, but the rest of the thread can be explored from there).

harschware · January 18, 2011, 11:00pm

The Berlin SPARQL Benchmark includes some fairly rigorous performance metrics, and results are published for TDB and SDB (as well as Virtuoso and Sesame, among others).