Triple store optimized for large transactions?

I've done some experimentation with Virtuoso, BigOWLIM, and Stardog.

With current products I see (at least) an order of magnitude better performance with bulk loads than I do with any attempt to update the store one triple at a time.

Assuming I want to get my jobs done in the time I have left on this Earth, I've got a strong motivation to avoid OLTP operation and to do everything with bulk loads.

Now, if that's the case, why shouldn't I be using a triple store that works more like Lucene... That is, something that's optimized for large transactions and that takes a "read often, write infrequently" approach? Such a system would work by using sorting and merging operations, and might give much better bulk load and query performance, although it would give up on the idea of using a triple store in an OLTP mode.
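
To make the idea concrete, here is a minimal sketch in Python of the sort-and-merge design I have in mind. The class and method names are hypothetical, everything is held in memory, and only a single (subject, predicate, object) ordering is indexed. It is an illustration of the approach, not a description of how Lucene or any of the stores above work internally.

```python
import heapq
from bisect import bisect_left


class SegmentedTripleIndex:
    """Hypothetical sketch: immutable sorted segments that are merged in bulk."""

    def __init__(self):
        # Each segment is a sorted list of (subject, predicate, object) tuples.
        self.segments = []

    def bulk_load(self, triples):
        # One large transaction: sort the batch once and keep it as a new segment.
        segment = sorted(set(triples))
        if segment:
            self.segments.append(segment)

    def merge(self):
        # K-way merge of the sorted segments into one, dropping duplicates.
        merged = []
        for triple in heapq.merge(*self.segments):
            if not merged or merged[-1] != triple:
                merged.append(triple)
        self.segments = [merged] if merged else []

    def match_subject(self, subject):
        # Binary-search each segment for the run of triples with this subject.
        results = set()
        for segment in self.segments:
            i = bisect_left(segment, (subject,))
            while i < len(segment) and segment[i][0] == subject:
                results.add(segment[i])
                i += 1
        return sorted(results)
```

Writes never touch an existing segment in place, so a load costs one sort plus an occasional merge, and reads pay for that by having to consult every segment until the next merge.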

Does this make sense? How much better performance could such a system get?

Stardog is optimized for reads over writes, and our bulk load performance is quite good. However, the recent 0.7.x releases include a significant change to the internal index structure that produces transactional load rates nearly on par with our bulk load speed. There is certainly now less than an order of magnitude difference in performance between the two, and in some cases they can be equivalent.

So if you have not tried one of the newer releases, I would suggest you give a new version a go.

Now, if that's the case, why shouldn't I be using a triple store that works more like Lucene...

...or perhaps a triple store that's built on top of Lucene? You might be interested in SIREn... an IR style index for RDF.

From the paper:

Our system supports full-text search, semi-structural queries and top-k query results while exhibiting a concise index and efficient incremental updates.

SIREn has been used to index tens of billions of triples in Sindice over a cluster of machines. The inevitable trade-off (edit: for SIREn to ensure efficient queries), however, is query expressivity. It doesn't support "path" joins (or indeed full SPARQL), but rather atomic lookups and "star-shaped/bushy" queries.
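
To illustrate the difference between the two query shapes, here is a rough Python sketch over a toy entity-centric index. The data and the function names are made up for illustration, and this is not SIREn's API or its index layout. The point is only that a star-shaped query can be answered by looking at one entity at a time, while a path query has to join across entities.

```python
from collections import defaultdict

# Toy data, invented for this example.
triples = [
    ("alice", "type", "Person"),
    ("alice", "worksFor", "acme"),
    ("bob", "type", "Person"),
    ("bob", "worksFor", "initech"),
    ("acme", "locatedIn", "Galway"),
]

# Entity-centric index: one "document" per subject holding its (predicate, object) pairs.
entity_docs = defaultdict(set)
for s, p, o in triples:
    entity_docs[s].add((p, o))


def star_query(patterns):
    # Every pattern shares the same subject, so each entity is checked in isolation.
    return sorted(
        s for s, doc in entity_docs.items()
        if all(pattern in doc for pattern in patterns)
    )


def path_query(p1, p2):
    # ?x p1 ?y . ?y p2 ?z : the object of one triple is joined to the subject of another.
    return sorted(
        (x, y, z)
        for x, doc in entity_docs.items()
        for (pa, y) in doc if pa == p1
        for (pb, z) in entity_docs.get(y, ()) if pb == p2
    )


print(star_query([("type", "Person"), ("worksFor", "acme")]))
print(path_query("worksFor", "locatedIn"))
```

The star query maps naturally onto a per-document lookup in an IR index, while the path query needs the second hop into another entity's document, which is the kind of join described above as unsupported.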