Extracting S P O tuple statements from SPARQL queries

Hi,

I have a set of queries and would like to compute some statistics about them. Specifically, I'd like to find the resources used most often in my queries (e.g. the most frequent subjects, the most frequent predicates, etc.).

In other words, I'm looking for a function f(SPARQLQuery) -> List<TriplesStatement>.

I'm trying to analyze these queries with the Sesame API, but I'm having a hard time doing so.

I'm looking for advice, a code snippet, a library, whatever might help me with this.

Thanks

In Sesame, you can do this by first parsing the query using the SPARQLParser. This produces a ParsedQuery object containing a TupleExpr, which is an algebraic query model, basically a tree of operators.

The next step then is to analyze this TupleExpr. You can do that by implementing your own QueryModelVisitor - this uses a standard Visitor pattern to traverse the operator tree.

Given that you are mostly interested in the use of particular resources, I would have your Visitor object implement meet(StatementPattern node) and do the necessary bookkeeping there: analyze the statement pattern, storing/counting its subject, predicate, and object (when they are fixed resources rather than variables). Or, if you prefer, just collect all statement patterns in a list for later analysis.

So, all in all, your code would look roughly like this (showing the variant with the visitor just collecting statement patterns for later analysis):

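import java.util.ArrayList;
import java.util.List;

import org.openrdf.query.MalformedQueryException;
import org.openrdf.query.algebra.StatementPattern;
import org.openrdf.query.algebra.helpers.QueryModelVisitorBase;
import org.openrdf.query.parser.ParsedQuery;
import org.openrdf.query.parser.sparql.SPARQLParser;

// parseQuery throws MalformedQueryException if queryString is not valid SPARQL;
// null means "no base URI"
SPARQLParser parser = new SPARQLParser();
ParsedQuery query = parser.parseQuery(queryString, null);

StatementPatternCollector collector = new StatementPatternCollector();
query.getTupleExpr().visit(collector);

List<StatementPattern> patterns = collector.getPatterns();

// etc. iterate over the patterns, count URI occurrences, etc.

...

// RuntimeException as the visitor's exception type means meet() and visit()
// don't have to declare checked exceptions
class StatementPatternCollector
                 extends QueryModelVisitorBase<RuntimeException>
{
   // initialized here; otherwise add() would throw a NullPointerException
   private final List<StatementPattern> statementPatterns =
           new ArrayList<StatementPattern>();

   @Override
   public void meet(StatementPattern node) {
       statementPatterns.add(node);
       // continue the default traversal into this node's children
       super.meet(node);
   }

   public List<StatementPattern> getPatterns() {
       return this.statementPatterns;
   }
}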
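If you'd rather do the bookkeeping directly in meet(StatementPattern), the counting variant could look roughly like the sketch below. It assumes Sesame 2.x (the same org.openrdf packages as above); the class name ResourceUsageCounter and its addQuery method are hypothetical. It relies on the fact that in Sesame's query algebra a constant in a statement pattern is a Var that carries a value, while a real variable such as ?s has no value.

```java
import java.util.HashMap;
import java.util.Map;

import org.openrdf.query.MalformedQueryException;
import org.openrdf.query.algebra.StatementPattern;
import org.openrdf.query.algebra.Var;
import org.openrdf.query.algebra.helpers.QueryModelVisitorBase;
import org.openrdf.query.parser.ParsedQuery;
import org.openrdf.query.parser.sparql.SPARQLParser;

/** Hypothetical helper: tallies the fixed resources used in each position of the statement patterns. */
class ResourceUsageCounter extends QueryModelVisitorBase<RuntimeException> {

    final Map<String, Integer> subjects = new HashMap<String, Integer>();
    final Map<String, Integer> predicates = new HashMap<String, Integer>();
    final Map<String, Integer> objects = new HashMap<String, Integer>();

    @Override
    public void meet(StatementPattern node) {
        count(subjects, node.getSubjectVar());
        count(predicates, node.getPredicateVar());
        count(objects, node.getObjectVar());
        super.meet(node); // keep the default traversal going
    }

    // Only Vars with a value (URIs or literals written in the query) are counted;
    // plain variables are skipped.
    private static void count(Map<String, Integer> counts, Var variable) {
        if (variable != null && variable.hasValue()) {
            String key = variable.getValue().stringValue();
            Integer old = counts.get(key);
            counts.put(key, old == null ? 1 : old + 1);
        }
    }

    /** Parses one query and adds its counts to the running totals (the maps are shared across calls). */
    void addQuery(String queryString) {
        try {
            ParsedQuery query = new SPARQLParser().parseQuery(queryString, null);
            query.getTupleExpr().visit(this);
        } catch (MalformedQueryException e) {
            // rethrown unchecked so a batch of queries can be processed in a simple loop
            throw new IllegalArgumentException("Not a valid SPARQL query: " + queryString, e);
        }
    }
}
```

Calling addQuery once per query and then sorting each map by its values gives you the most used subjects, predicates and objects. Note that only resources occurring in statement patterns are counted this way; constants that appear only inside FILTER or BIND expressions are separate nodes in the algebra and would need their own meet method.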

OK, you need to be looking at the Jena API (specifically ARQ, Jena's SPARQL engine):

http://openjena.org/ARQ/javadoc/index.html

In particular:

com.hp.hpl.jena.query.Query

com.hp.hpl.jena.sparql.syntax.*

The first class has getters for the various parts of a query, and the second package has fine-grained representations of the contents of those parts (for example, the elements of the query pattern).
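With ARQ, the counterpart of the Sesame visitor is to walk the query's syntax tree. The following is a minimal sketch, assuming an ARQ version that still uses the com.hp.hpl.jena packages (newer Jena releases moved them to org.apache.jena); TriplePathCollector is a hypothetical name. Basic graph patterns parsed as SPARQL 1.1 end up in ElementPathBlock elements, so that is the only visit method overridden here.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import com.hp.hpl.jena.query.Query;
import com.hp.hpl.jena.query.QueryFactory;
import com.hp.hpl.jena.sparql.core.TriplePath;
import com.hp.hpl.jena.sparql.syntax.ElementPathBlock;
import com.hp.hpl.jena.sparql.syntax.ElementVisitorBase;
import com.hp.hpl.jena.sparql.syntax.ElementWalker;

/** Hypothetical helper: collects every triple (path) pattern found in a query's WHERE clause. */
class TriplePathCollector extends ElementVisitorBase {

    final List<TriplePath> paths = new ArrayList<TriplePath>();

    // ElementWalker calls this for each block of triple patterns, including the
    // blocks nested inside OPTIONAL, UNION, GRAPH, etc.
    @Override
    public void visit(ElementPathBlock el) {
        Iterator<TriplePath> it = el.patternElts();
        while (it.hasNext()) {
            paths.add(it.next());
        }
    }

    static List<TriplePath> collect(String queryString) {
        Query query = QueryFactory.create(queryString);
        TriplePathCollector collector = new TriplePathCollector();
        ElementWalker.walk(query.getQueryPattern(), collector);
        return collector.paths;
    }
}
```

Each TriplePath exposes getSubject(), getPredicate() and getObject() as Node objects (getPredicate() is null when the predicate is a property path, which isTriple() lets you check), so they can be counted the same way as in the Sesame case. ElementWalker may not descend into sub-SELECTs in your ARQ version, so check how it handles ElementSubQuery if your queries use them.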

SPARQL 1.1 aggregates work quite well for this kind of analysis. For example, to get a count of the number of matches for a variable, use:

SELECT count(?s)
WHERE
{ ?s :someProp :someVal .
}
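
You can also project the aggregate as a variable for use in sub-selects, etc. Note that this AS form is what standard SPARQL 1.1 actually requires; the bare form above is only accepted by some engines:
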
SELECT (count(?s) AS ?myvar)
WHERE
{ ?s :someProp :someVal .
}

You can find more information on SPARQL aggregates at http://www.w3.org/TR/sparql11-query/#aggregates (some aggregates, count in particular, are already available as extensions in a number of existing SPARQL 1.0 engines).

You may try the online SPARQL to SPIN converter: this service parses SPARQL queries and converts them into an equivalent RDF-based representation. After converting your queries, you may upload them to a repository and perform the required analysis using (meta;-)SPARQL.