Jena vs Sesame: is there a serious, complete, up-to-date, unbiased, well informed, side by side, comparison between the two?

Is there a serious, complete (i.e. features, APIs, standards compliance, performance, support, documentation, extensions/contributions, etc.), up-to-date, unbiased (I am not. :-)), and well-informed side-by-side comparison between Jena and Sesame, covering the pros and cons, the good and the bad parts?

If there isn't one, can we make it all together? (Note: this is a community wiki.)

Triple Stores

Quick rundown of some information out there comparing the native triple store solutions of Jena and Sesame.

Performance

Summarising numbers crunched by Bizer et al. These are the most recent numbers I can find that include both Jena and Sesame. Like all evaluations, they should be considered informative rather than exhaustive; there's always a natural bias in evaluations:

Chris Bizer, Andreas Schultz. The Berlin SPARQL Benchmark. Int. J. Semantic Web Inf. Syst. 5(2): 1-24 (2009)

Results from 2009: Comparing Sesame (Native) and Jena TDB/Jena SDB.

  • To load 100m triples of evaluation data:
    • Sesame: 3 days 6 hours
    • Jena TDB: 1.5 hours
    • Jena SDB: 1 day 15 hours
  • Complex Query Mixes per Hour (rate of query answering -- higher = better) for 1M triples:
    • Sesame: 18,094
    • Jena TDB: 4,450
    • Jena SDB: 10,421
  • Complex Query Mixes per Hour for 25M triples:
    • Sesame: 1,343
    • Jena TDB: 353
    • Jena SDB: 968
  • Complex Query Mixes per Hour for 100M triples:
    • Sesame: 254
    • Jena TDB: 81
    • Jena SDB: 211
  • Simple Query Mixes per Hour for 1M triples:
    • Sesame: 38,727
    • Jena TDB: 15,842
    • Jena SDB: 15,692
  • Simple Query Mixes per Hour for 25M triples:
    • Sesame: 39,059
    • Jena TDB: 1,856
    • Jena SDB: 4,877
  • Simple Query Mixes per Hour for 100M triples:
    • Sesame: 3,116
    • Jena TDB: 459
    • Jena SDB: 584

Conclusion

  • Jena (esp. TDB) is much faster to load than Sesame.
  • But Sesame provides faster response times than the Jena solutions up to the evaluation scale of 100M triples.
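To put rough numbers on "much faster", here is a minimal sketch that turns the figures listed above into ratios. The class and method names are made up for illustration, and the inputs are simply the 2009 numbers quoted above, so the output is only as reliable as those.

```java
// Hypothetical helper: ratios computed from the BSBM 2009 figures quoted above.
class BsbmRatios {

    /** How many times larger a is than b. */
    static double ratio(double a, double b) {
        return a / b;
    }

    public static void main(String[] args) {
        // Load times for 100M triples, converted to hours.
        double sesameLoadHours = 3 * 24 + 6; // 3 days 6 hours
        double tdbLoadHours = 1.5;
        double sdbLoadHours = 24 + 15;       // 1 day 15 hours

        System.out.printf("Load time, Sesame vs TDB: %.1fx%n",
                ratio(sesameLoadHours, tdbLoadHours));
        System.out.printf("Load time, SDB vs TDB:    %.1fx%n",
                ratio(sdbLoadHours, tdbLoadHours));

        // Query mixes per hour (higher is better), in the order 1M, 25M, 100M triples.
        String[] sizes = {"1M", "25M", "100M"};
        double[] sesameComplex = {18094, 1343, 254};
        double[] tdbComplex = {4450, 353, 81};
        double[] sesameSimple = {38727, 39059, 3116};
        double[] tdbSimple = {15842, 1856, 459};

        for (int i = 0; i < sizes.length; i++) {
            System.out.printf("%s triples: Sesame vs TDB, complex %.1fx, simple %.1fx%n",
                    sizes[i],
                    ratio(sesameComplex[i], tdbComplex[i]),
                    ratio(sesameSimple[i], tdbSimple[i]));
        }
    }
}
```

On these figures, Sesame took 52 times as long as TDB to load the 100M triples, while answering complex query mixes roughly 3 to 4 times faster.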

Scalability

Can only find references to unverified claims on the ESW wiki:

  • Jena TDB (1.7B: UniProt V13.4) [2008?]
    • "...on a single machine with 64 bit hardware (36 hours, 12k triples/s)"
  • Jena SDB (650M: UniProt) [200?]
    • "Can load UniProt (650M). Uses PostgreSQL, MySQL, Oracle or MS SQL Server. Also, HSQLDB and Apache Derby"
  • Sesame (70M: LUBM) [2006]
    • "...should be taken as a minimum of what the store can handle... The machine used was a 2.8GhZ P4 (32-bits) with 1GB RAM"

Note that Sesame is demonstrated to load 100M in the previous evaluation, and the older claim of 70M is based on pretty underpowered hardware.
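As a quick sanity check on the TDB claim quoted above (1.7B triples, 36 hours, 12k triples/s), the sketch below computes the average throughput implied by the size and duration, and the duration implied by the size and rate. The class and method names are hypothetical; the inputs are only the quoted figures.

```java
// Hypothetical helper: cross-checks the quoted UniProt load figures for Jena TDB.
class LoadRateCheck {

    /** Average throughput in triples per second for a load of the given size and duration. */
    static double impliedRate(long triples, double hours) {
        return triples / (hours * 3600.0);
    }

    /** Hours needed to load the given number of triples at a constant rate. */
    static double hoursAtRate(long triples, double triplesPerSecond) {
        return triples / triplesPerSecond / 3600.0;
    }

    public static void main(String[] args) {
        long uniprotTriples = 1_700_000_000L;

        System.out.printf("Implied rate over 36 hours: %.0f triples/s%n",
                impliedRate(uniprotTriples, 36));
        System.out.printf("Duration at 12k triples/s:  %.1f hours%n",
                hoursAtRate(uniprotTriples, 12_000));
    }
}
```

The two results land close to the quoted values (about 13k triples/s and about 39 hours), so the claim is at least internally consistent, even if it remains unverified.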

Conclusion

  • Jena (esp. TDB) demonstrates better scale.
  • Hard to get a fix on upper bound for scale for Sesame... only reports of 100M or less...

Summary

  • Jena TDB offers the fastest load times and the best demonstrated scale, but generally the slowest query performance.
  • Sesame seems better all-round for low data sizes (<100M) assuming infrequent loads/low data churn.
  • Jena SDB sits somewhere in the middle, offering load times, query performance, and scalability between the above two.

Performance/scalability comparisons are highly useful, but only part of the story. First of all, both Sesame and Jena support multiple storage backends with very different performance characteristics. Second, part of the strength of Sesame (and I expect Jena as well) is its ability to provide a storage-independent API that allows you to easily switch backends without having to change your client code.

So, in my opinion, a useful part of a comparison would be to show how the two frameworks solve certain common tasks: e.g. if I want to load an RDF file and do a query on it, how would I do it in Sesame, and how would I do it in Jena? Having code examples for such simple tasks side-by-side seems useful to me. If someone were to set up something like this, I'd be happy to try and contribute some code examples.
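As a starting point for the Jena half of such a side-by-side, here is a minimal sketch of "load some RDF and query it". It assumes Apache Jena 3.x or later on the classpath (older releases used the `com.hp.hpl.jena` package prefix instead of `org.apache.jena`). The class name, the sample Turtle data, and the query are invented for illustration, and the data is read from a string rather than a file to keep the example self-contained. The Sesame equivalent is not shown.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.jena.query.Query;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

// Hypothetical example class; assumes Apache Jena 3.x+ is available.
class JenaLoadAndQuery {

    // Made-up sample data in Turtle syntax.
    static final String TURTLE =
            "@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n"
          + "<http://example.org/alice> foaf:name \"Alice\" .\n"
          + "<http://example.org/bob> foaf:name \"Bob\" .\n";

    static final String SPARQL =
            "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
          + "SELECT ?name WHERE { ?s foaf:name ?name } ORDER BY ?name";

    static List<String> queryNames() {
        // Load the RDF into an in-memory model.
        Model model = ModelFactory.createDefaultModel();
        model.read(new StringReader(TURTLE), null, "TURTLE");

        // Parse the query and run it against the model.
        Query query = QueryFactory.create(SPARQL);
        List<String> names = new ArrayList<>();
        try (QueryExecution qexec = QueryExecutionFactory.create(query, model)) {
            ResultSet results = qexec.execSelect();
            while (results.hasNext()) {
                names.add(results.next().getLiteral("name").getString());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(queryNames());
    }
}
```

To read from a file instead, the same `model.read` call accepts an input stream or a URL in place of the `StringReader`.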

Here's my take. I don't use either Jena or Sesame as a persistent triple store and don't expect to ever do so. There are so many other products out there (4store, Virtuoso, BigOWLIM, StarDog) and I see myself using Jena/Sesame as an API to access them.

On the other hand, I do have a need for something that handles a small number of triples (say 1000 typically and 1M in an extreme case) in RAM. I don't want to deal with setup and teardown time for a "big" triple store.

I do care about feature support, and I think Jena wins that hands down. Jena has strong SPARQL 1.1 support. It also supports much more reasoning than Sesame, even if it's not really good at answering T-Box questions. If you find the reasoning in Jena isn't good enough for you, you can hook it up to Pellet.

Sesame does put streaming I/O at your fingertips (essential if somebody hands you 500M triples and you don't want to load every single one into a store), but if you load the RIOT module and dig a little deeper into the docs, you can do this with Jena too.