I'm running OWLIM-SE inside the Sesame Workbench and trying to load a data set of about 50 M triples in N-Triples format, with a set of about 700 M to try next.
The instructions say I should use a form in the Sesame Workbench that lets me POST a chunk of RDF data -- the trouble is that the allowed chunk size is much smaller than my data set. Now, I could write a program that splits the RDF into chunks and does multiple POSTs, rather like the way Pytassium uploads data into the Kasabi platform. Before I write this program I'd like to ask: does this already exist?
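(For anyone who ends up writing it anyway: because N-Triples is line-oriented, a chunked uploader is only a few lines of Python. The sketch below is an assumption-laden outline, not a tested tool -- the repository URL follows the usual Sesame HTTP pattern `.../repositories/{id}/statements`, and the names `chunks`, `post_chunk`, `upload`, and `CHUNK_SIZE` are my own invention.)

```python
import itertools
import urllib.request

# Assumed Sesame HTTP repository endpoint -- adjust host and repository id.
REPO_URL = "http://localhost:8080/openrdf-sesame/repositories/myrepo/statements"
CHUNK_SIZE = 100_000  # triples per POST; tune to what the server accepts

def chunks(lines, size):
    """Yield successive batches of `size` items from an iterable."""
    it = iter(lines)
    while True:
        batch = list(itertools.islice(it, size))
        if not batch:
            return
        yield batch

def post_chunk(batch):
    """POST one batch of N-Triples lines to the repository."""
    data = "".join(batch).encode("utf-8")
    req = urllib.request.Request(
        REPO_URL,
        data=data,
        method="POST",
        headers={"Content-Type": "text/plain"},  # MIME type Sesame uses for N-Triples
    )
    urllib.request.urlopen(req)

def upload(path):
    """Stream an N-Triples file to the store in fixed-size chunks."""
    with open(path, encoding="utf-8") as f:
        for batch in chunks(f, CHUNK_SIZE):
            post_chunk(batch)
```

Splitting at line boundaries is safe for N-Triples since each statement occupies exactly one line; the same trick does not work for Turtle or RDF/XML.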
I wrote this piece of software to load data into a local OWLIM-SE store without going through the Sesame Workbench. It might be of interest.
Store Manager (disclaimer - I wrote this) supports any Sesame HTTP-based store (amongst various other stores) and can import data from a file/URI into the target store in small chunks.
The default chunk size is a rather weedy 1,000, but you can set it much higher (I can't remember offhand what the maximum limit is in the current release).
The only thing to be aware of is blank-node-heavy data. In order to guarantee that a store imports blank nodes correctly, all triples containing blank nodes are held back until the end of reading the data and then imported in one batch. This ensures that the store doesn't rewrite blank nodes: otherwise `_:node` in one batch might be treated as something completely different from `_:node` in the next, when they are actually the same node in the data. The upshot of this is that if you've got a lot of blank nodes, your memory usage may be rather high.
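That buffering strategy is simple to sketch: stream triples that contain no blank node in fixed-size batches, and defer any line mentioning a `_:` label to one final batch. This is an illustrative outline of the idea, not Store Manager's actual code; `batch_triples` is a name I made up, and the `"_:" in line` check is a crude heuristic for N-Triples input.

```python
def batch_triples(lines, batch_size):
    """Yield batches of N-Triples lines, deferring blank-node triples.

    Triples containing a blank node (a `_:` label) are buffered and
    emitted as one final batch, so the store sees every blank-node
    label in a single import and cannot rewrite the same label
    inconsistently across batches.
    """
    batch, deferred = [], []
    for line in lines:
        if "_:" in line:              # crude test: line mentions a blank node
            deferred.append(line)
        else:
            batch.append(line)
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch                   # remaining blank-node-free triples
    if deferred:
        yield deferred                # all blank-node triples in one batch
```

The memory cost the answer mentions falls out directly: `deferred` grows with the number of blank-node triples and is only released at the end of the file.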
It will also give you basic progress information as it runs.
I'm not aware of any command-line tools with this functionality, other than the odd store-specific tool, e.g. the Virtuoso bulk loader or pytassium.