Wicked fast rules engine?

database_animal · February 14, 2012, 11:00pm

I'm working on a scenario where a large number of facts are being processed, ordered on the ?s field of

?s ?p ?o .

so at any given moment I have anywhere from 5 to 500,000 triples for a given ?s. 500,000 is an extreme case -- 100 is a lot more likely. (Technically, the data is not in triple format, but it can be converted to triple format very easily.)
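To make the setup concrete, here is a minimal sketch of walking a stream that is already sorted on ?s and handing each same-subject run to a callback. The class name `SubjectGrouper`, the `String[3]` triple layout, and the callback shape are assumptions made for illustration, not anything from the actual data pipeline.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class SubjectGrouper {
    /**
     * Walks triples that are already sorted by subject and passes each
     * run of same-subject triples to the handler as one list.
     * Each triple is assumed to be a String[3] of {s, p, o}.
     */
    public static void forEachSubject(Iterator<String[]> sortedTriples,
                                      Consumer<List<String[]>> handler) {
        List<String[]> current = new ArrayList<>();
        String currentSubject = null;
        while (sortedTriples.hasNext()) {
            String[] t = sortedTriples.next();
            // A change of subject closes the current group.
            if (currentSubject != null && !currentSubject.equals(t[0])) {
                handler.accept(current);
                current = new ArrayList<>();
            }
            currentSubject = t[0];
            current.add(t);
        }
        // Flush the final group, if any.
        if (!current.isEmpty()) {
            handler.accept(current);
        }
    }

    public static void main(String[] args) {
        List<String[]> triples = Arrays.asList(
            new String[] {":a", "rdf:type", ":Actor"},
            new String[] {":a", ":name", "\"A\""},
            new String[] {":b", "rdf:type", ":Person"});
        forEachSubject(triples.iterator(),
            group -> System.out.println(group.get(0)[0] + ": " + group.size() + " triple(s)"));
    }
}
```

Only one group is held in memory at a time, so the 500,000-triple extreme case costs no more than that one group.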

I'm interested in applying rules to each of these sets with constant ?s to decide if I want to do further processing of this set. An example of the kind of rule would be

 AND(OR(?s a :Actor,?s a :Writer),NOT(?s a :Person)) -> ?s a :NonPersonAgent
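For comparison, this rule can be sketched in plain Java over the set of rdf:type values collected for one ?s. The class name `NonPersonAgentRule` and the `:Organization` type are hypothetical. Note the assumption that NOT means "no such triple in this set" (closed-world negation), which is not the same thing as an OWL complement.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class NonPersonAgentRule {
    /**
     * True when the types of one subject satisfy
     * AND(OR(:Actor, :Writer), NOT(:Person)).
     * NOT is read as "absent from this set" (closed world), an assumption.
     */
    public static boolean matches(Set<String> types) {
        boolean actorOrWriter = types.contains(":Actor") || types.contains(":Writer");
        return actorOrWriter && !types.contains(":Person");
    }

    public static void main(String[] args) {
        // ":Organization" is a made-up type used only for this example.
        Set<String> band = new HashSet<>(Arrays.asList(":Actor", ":Organization"));
        Set<String> human = new HashSet<>(Arrays.asList(":Writer", ":Person"));
        System.out.println(matches(band));   // true
        System.out.println(matches(human));  // false
    }
}
```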

In general I'd like to have RDFS/OWL capabilities available (restriction types, transitive closure, etc.) and also be able to do numerical comparisons and simple arithmetic on floats.

I've tried a few different things to process this data. The relative timings (lower is faster) are:

1.5 : raw processing of the incoming data with string operators and raw Java code

5 : convert to triples, load into Jena models with no inference, use Jena methods to implement the rules

25 : use the simplest RDFS profile in Jena and implement as much of the rules as possible in RDFS

75 : use the default RDFS profile in Jena

I wasn't patient enough to get timings for any of the OWL reasoners in Jena. Looking at the manual it also seemed that the complement operator is not implemented in Jena, and it would be really nice to have that.

Now what I am doing involves changing the rulebox, seeing what happens, and changing the rulebox again. The time it takes to apply the rules gets multiplied across many different trial ruleboxes, so speed is of the essence here.

The crazy question of the day is -- "is there some kind of rules engine I could use which is radically faster than RDFS/OWL in Jena?"

oesxyl · February 14, 2012, 11:00pm

Try Euler, aka EulerSharp. One potential problem with Euler is that it doesn't use RDF/XML, but this can be solved by converting to N3.

Benefit: if you are familiar with YAP Prolog, you can write a plugin to do something that is missing from Euler. :)

Jerven · February 14, 2012, 11:00pm

Have you tried SPIN? The speed of SPIN depends mostly on the Jena backing model. All rules are expressed as SPARQL, so development-wise it is very straightforward. I would not call it wicked fast, but it's decent enough. And very keen on data parallelism!

Signified · February 14, 2012, 11:00pm

You could maybe try a RETE rule engine like JESS. It's not directly tailored to RDF, so it will require some adaptation.

Otherwise, you could just try a usual suspect like OWLIM.

RyanKohl · February 14, 2012, 11:00pm

If your data can be chunked up, you could turn each chunk into a model, do your processing, and CONSTRUCT the results. The constructed models from each of the chunks could then be combined into one or more smaller models, where more processing could happen. It's essentially a map-reduce operation, so you can get a bit of parallel bang for your buck from your cores.
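A rough sketch of that map-reduce shape, with plain Java collections and parallel streams standing in for Jena models and CONSTRUCT queries (a deliberate simplification, not the approach described above). The class name `ParallelChunks`, the `String[3]` triple layout, and the hard-coded rule are all assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class ParallelChunks {
    /** Hypothetical per-chunk rule: Actor or Writer, but not Person. */
    static boolean isNonPersonAgent(Set<String> types) {
        return (types.contains(":Actor") || types.contains(":Writer"))
            && !types.contains(":Person");
    }

    /**
     * Chunks rdf:type triples by subject, applies the rule to each chunk
     * in parallel (map), and collects the matching subjects (reduce).
     * Each triple is assumed to be a String[3] of {s, p, o}.
     */
    public static Set<String> nonPersonAgents(List<String[]> typeTriples) {
        Map<String, Set<String>> typesBySubject = typeTriples.stream()
            .collect(Collectors.groupingBy(t -> t[0],
                     Collectors.mapping(t -> t[2], Collectors.toSet())));
        return typesBySubject.entrySet().parallelStream()
            .filter(e -> isNonPersonAgent(e.getValue()))
            .map(Map.Entry::getKey)
            .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<String[]> triples = Arrays.asList(
            new String[] {":a", "rdf:type", ":Actor"},
            new String[] {":b", "rdf:type", ":Writer"},
            new String[] {":b", "rdf:type", ":Person"});
        System.out.println(nonPersonAgents(triples)); // [:a]
    }
}
```

Because each chunk is judged independently, the per-chunk step needs no shared state, which is what lets the work spread across cores.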