I'm still using the old RDB backend in a project and want to upgrade now to either SDB or TDB.
Which one is better when the application is expected to be mostly writing, mostly reading, or both?
Are transactions useful (SDB with underlying DB transactions)?
Which one is better considering scale/clustering?
What are your experiences with these backends, and which one fits better for which kinds of situations/use cases?
If you have a write-once (or write-rarely), read-many scenario, TDB is the better choice, especially on 64-bit machines with a lot of RAM.
To avoid concurrency problems, use a multiple-reader XOR single-writer (MRSW) locking scheme, as explained here:
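For illustration only, here is a minimal sketch of the multiple-readers XOR single-writer idea using nothing but the JDK's `ReentrantReadWriteLock`. The `GuardedStore` class and its list of strings are hypothetical stand-ins for your own store wrapper, not Jena API. In a real application the guarded object would be your Jena model or dataset, and Jena models also offer their own `enterCriticalSection(Lock.READ)` / `leaveCriticalSection()` calls for the same purpose.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical wrapper: any number of concurrent readers XOR exactly one writer. */
public class GuardedStore {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Stand-in for the RDF store; assume a Jena model or dataset lives here instead.
    private final List<String> triples = new ArrayList<>();

    /** Write path: takes the exclusive lock, so no reader or other writer is active. */
    public void add(String triple) {
        lock.writeLock().lock();
        try {
            triples.add(triple);
        } finally {
            lock.writeLock().unlock(); // always release, even if the update throws
        }
    }

    /** Read path: takes the shared lock, so many readers can run at once. */
    public boolean contains(String triple) {
        lock.readLock().lock();
        try {
            return triples.contains(triple);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Another read-only operation under the shared lock. */
    public int size() {
        lock.readLock().lock();
        try {
            return triples.size();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

The important part is the `try`/`finally` shape: every access goes through one of the two locks, and the lock is released even when the guarded operation fails.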
If you need transactions and have a lot of updates from remote machines, SDB might be a better choice.
But you have other options with TDB:
IMHO, TDB is better considering scale/clustering.
TDB is a better bet than SDB, though both are slower than some of the other options out there like 4Store or many of the various Sesame implementations.
The typical SDB use case is that you have an existing relational database system and you wish to store RDF in it. In some institutions you may be required to use an SQL store, or it may have advantages such as guaranteed managed backups and transactions.
Use TDB if you need a fast, persistent RDF store. It's easier to set up, and quicker than SDB.
I've used both extensively and they've both been fine. I usually reach for TDB because of the simplicity and speed.
Recently there was a thread on the jena-dev group about SDB's write performance and best practices for getting the best results. Since it contains some useful information and insights, I thought I'd add the link here: http://tech.groups.yahoo.com/group/jena-dev/message/44329 (that's the summary post, but the rest of the thread can be explored from there).
The Berlin SPARQL benchmark includes some fairly rigorous performance metrics, and the results are published for TDB and SDB (as well as Virtuoso and Sesame, among others).