I'm working on a scenario where a large number of facts are being processed, ordered on the ?s field of
?s ?p ?o .
so at any given moment I have anywhere from 5 to 500,000 triples for a given ?s. 500,000 is an extreme case; 100 is far more likely. (Technically, the data is not in triple format, but it can be converted to triple format very easily.)
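To make the setup concrete, here is a minimal sketch of how each set with constant ?s could be pulled off the sorted stream. It assumes the triples are available as plain strings; `Triple`, `SubjectGrouper` and `forEachSubject` are hypothetical names used for illustration and are not part of Jena.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: not a Jena class, just an illustration of the grouping step.
final class SubjectGrouper {

    /** A plain string triple; the real input may be in some other format. */
    static final class Triple {
        final String s, p, o;

        Triple(String s, String p, String o) {
            this.s = s;
            this.p = p;
            this.o = o;
        }
    }

    /**
     * Walks triples that are already sorted on s and hands each run of
     * triples sharing the same subject to the callback, one group at a time.
     */
    static void forEachSubject(Iterator<Triple> sorted, Consumer<List<Triple>> callback) {
        List<Triple> group = new ArrayList<>();
        while (sorted.hasNext()) {
            Triple t = sorted.next();
            // A change of subject closes the current group.
            if (!group.isEmpty() && !group.get(0).s.equals(t.s)) {
                callback.accept(group);
                group = new ArrayList<>();
            }
            group.add(t);
        }
        if (!group.isEmpty()) {
            callback.accept(group); // flush the final group
        }
    }
}
```

Only one group is held in memory at a time, so the 500,000-triple case is the upper bound on what a rule ever has to look at.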
I'm interested in applying rules to each of these sets with constant ?s to decide if I want to do further processing of this set. An example of the kind of rule would be
AND(OR(?s a :Actor, ?s a :Writer), NOT(?s a :Person)) -> ?s a :NonPersonAgent
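For reference, this is roughly what that rule looks like when written by hand in plain Java against the types asserted for one subject. It is only a sketch: `NonPersonAgentRule` and `matches` are made-up names, and the types are assumed to have been collected into a set of strings beforehand.

```java
import java.util.Set;

// Hypothetical sketch: the class and method names are illustrative only.
final class NonPersonAgentRule {

    /**
     * Evaluates AND(OR(?s a :Actor, ?s a :Writer), NOT(?s a :Person))
     * against the set of rdf:type values asserted for one subject.
     * NOT is read here as "no such triple is present in this set".
     */
    static boolean matches(Set<String> types) {
        boolean actorOrWriter = types.contains(":Actor") || types.contains(":Writer");
        boolean notPerson = !types.contains(":Person");
        return actorOrWriter && notPerson;
    }
}
```

Note that this treats NOT as a closed-world check over the current set (the triple is simply absent), which is not the same thing as a class complement under OWL's open-world semantics.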
In general I'd like to have RDFS/OWL capabilities available (restriction types, transitive closure, etc.) and also be able to do numerical comparisons and simple arithmetic on floats.
I've tried a few different approaches to processing this data; the relative timing of each is the number at the start of the line:
1.5 raw processing of the incoming data with string operators and raw Java code
5 convert to triples, load into Jena models with no inference, use Jena methods to implement the rules
25 use the simplest RDFS profile in Jena and implement as much of the rules as possible in RDFS
75 use the default RDFS profile in Jena
I wasn't patient enough to get timings for any of the OWL reasoners in Jena. Looking at the manual, it also seemed that the complement operator is not implemented in Jena, and it would be really nice to have that.
Now what I am doing involves changing the rulebox, seeing what happens, and changing the rulebox again, so the time it takes to apply the rules gets multiplied across many different trial ruleboxes. Speed is of the essence here.
The crazy question of the day is -- "is there some kind of rules engine I could use which is radically faster than RDFS/OWL in Jena?"