How to avoid 'double' results with a SPARQL 1.1 aggregate query

rafguns · May 22, 2011, 10:00pm

Hi,

I have a simple RDF graph of authors and their publications. Now I am trying to find which authors have collaborated most (i.e. have co-written most publications). My best shot with SPARQL 1.1 is this:

PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?author1 ?author2 (COUNT(?pub) AS ?count)
WHERE {
  ?pub dcterms:creator ?author1 ;
       dcterms:creator ?author2 .
  FILTER(?author1 != ?author2)
}
GROUP BY ?author1 ?author2
ORDER BY DESC(?count)

That works, but I get double results: every pair of authors shows up twice, once in each order:

author1                          author2                          count
<http://example.com/author/5917> <http://example.com/author/630>  173
<http://example.com/author/630>  <http://example.com/author/5917> 173
<http://example.com/author/868>  <http://example.com/author/4622> 155
<http://example.com/author/4622> <http://example.com/author/868>  155
...

Is there any way to avoid the double results? I am using Sesame 2.4.0 but would prefer standard SPARQL 1.1 if possible.
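To see where the doubles come from, here is a minimal Python sketch of the pattern for a single publication. It illustrates the query's semantics, not how Sesame evaluates it, and the creator list is hypothetical. The two dcterms:creator patterns bind ?author1 and ?author2 independently, so a publication with n creators yields n × n bindings. FILTER(?author1 != ?author2) removes only the n self-pairs, leaving every pair in both orders. Both orders come from the same publications, so GROUP BY gives them identical counts.

```python
from itertools import product

# Hypothetical creators of ONE publication (IRIs borrowed from the output
# above; grouping them into a single publication is invented for illustration).
creators = [
    "http://example.com/author/630",
    "http://example.com/author/5917",
    "http://example.com/author/868",
]

# "?pub dcterms:creator ?author1 ; dcterms:creator ?author2" binds the two
# variables independently: a cross product of the creator list with itself.
# FILTER(?author1 != ?author2) then drops only the self-pairs.
rows = [(a1, a2) for a1, a2 in product(creators, repeat=2) if a1 != a2]

print(len(rows))  # n * (n - 1) ordered pairs for n creators
for a1, a2 in rows:
    print(a1, a2)
```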

AB · May 22, 2011, 10:00pm

As a quick and dirty solution, you could impose a strict lexical ordering on the author IRIs (compared via str(), since SPARQL's < operator is not defined on IRIs themselves) by replacing:

FILTER(?author1 != ?author2)

with:

FILTER( str(?author1) < str(?author2) )
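The sketch below compares the two filters on toy data in plain Python. It rests on a few assumptions. The publications and authorship are invented. The nested loops stand in for the self-join plus GROUP BY/COUNT. Python's built-in < on str, which compares by code point, stands in for SPARQL's comparison of the str() values. With != each pair is counted under both orders. With < only the lexically smaller IRI can be bound to ?author1, so each pair is counted once, with the same count.

```python
from collections import Counter

# Hypothetical toy data standing in for "?pub dcterms:creator ?author"
# triples; publication IDs and authorship are invented for illustration.
CREATORS = {
    "pub1": ["http://example.com/author/630", "http://example.com/author/5917"],
    "pub2": ["http://example.com/author/5917", "http://example.com/author/630",
             "http://example.com/author/868"],
    "pub3": ["http://example.com/author/868", "http://example.com/author/4622"],
}


def count_pairs(creators_by_pub, keep):
    """Count publications per (author1, author2) binding that passes `keep`.

    Mimics the self-join on ?pub, the FILTER (played by `keep`), and
    GROUP BY ?author1 ?author2 with COUNT(?pub).
    """
    counts = Counter()
    for authors in creators_by_pub.values():
        for a1 in authors:
            for a2 in authors:
                if keep(a1, a2):
                    counts[(a1, a2)] += 1
    return counts


# FILTER(?author1 != ?author2): every pair shows up in both orders.
doubled = count_pairs(CREATORS, lambda a1, a2: a1 != a2)

# FILTER(str(?author1) < str(?author2)): only one order can pass, and
# self-pairs fail too because a string is never less than itself.
single = count_pairs(CREATORS, lambda a1, a2: a1 < a2)

for (a1, a2), n in single.most_common():
    print(a1, a2, n)
```

The ordering is lexical, not numeric: .../author/5917 sorts before .../author/630 because '5' < '6', so that pair is reported as (5917, 630). Which of the two orders survives does not matter. The filter only guarantees that exactly one does.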