I've been doing some work lately on my dotNetRDF library, looking at how SPARQL performance can be improved by parallelizing certain operations. I've already seen good results from a couple of operations - namely Join and Filter - but poor results from others - namely Product.
I have a hit list of other parts of the engine that seem like candidates for parallelization, and I wondered if anyone:
- Has ideas for other parts I haven't thought of?
- Knows of existing triple stores or SPARQL engines that already parallelize evaluation in any way, shape or form?
So my current list is as follows:
- Join
- Left Join
- Product - this is just a Cartesian product so is inherently parallelizable; the aforementioned poor results from parallelizing it in dotNetRDF are down to internal implementation issues around thread synchronization on data structures
- Union
- Minus
- Exists/Not Exists - these boil down to a semi-lazy join, i.e. for each solution, find at least one compatible solution or ensure there are no compatible solutions
- Filter
- Extend i.e. BIND
- Having - this is essentially filtering over groups
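
To make the kind of parallelization I have in mind a bit more concrete, here is a rough Python sketch of Product and Exists/Not Exists. This is not dotNetRDF code: the function names (`parallel_product`, `parallel_exists`, `compatible`, `chunks`) and the representation of solutions as plain dicts are assumptions made up for illustration. The idea is that each worker builds its own local result list and the lists are only merged at the end, which sidesteps synchronization on a shared data structure. Note that Python threads will not give a real CPU speedup here; the sketch only shows the shape of the partitioning.

```python
from concurrent.futures import ThreadPoolExecutor


def compatible(a, b):
    """Two solutions are compatible if every variable they share has the same value."""
    return all(b[var] == val for var, val in a.items() if var in b)


def chunks(items, n):
    """Split a list into at most n contiguous slices (hypothetical helper)."""
    size = max(1, -(-len(items) // n))  # ceiling division, at least 1
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_product(left, right, workers=4):
    """Cartesian product: each worker handles one slice of the left-hand side."""
    def work(part):
        # Local list per worker, so no locking on a shared result set.
        return [{**l, **r} for l in part for r in right]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(work, chunks(left, workers)))
    # Merge once at the end; map() preserves the order of the slices.
    return [row for part in parts for row in part]


def parallel_exists(left, right, workers=4, negate=False):
    """Semi-join: keep left solutions that have (or, if negate, lack) a compatible right solution."""
    def work(part):
        # any() stops at the first compatible solution, which is the "semi-lazy" part.
        return [l for l in part if any(compatible(l, r) for r in right) != negate]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(work, chunks(left, workers)))
    return [row for part in parts for row in part]
```

Filter, Extend and Having would follow the same pattern under these assumptions: partition the input solutions, apply the expression to each slice independently, then concatenate the per-slice results.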