Ordering in GROUP_CONCAT in SPARQL 1.1

I'm messing around with the idea of getting characteristic sets (a set of sets of properties used to describe the same subject) from RDF datasets using SPARQL 1.1. The idea is as follows:

    SELECT ?cs
    WHERE {
            SELECT ?s (GROUP_CONCAT(?p; separator="|") as ?cs)
            WHERE { ?s ?p ?o }
            GROUP BY ?s
            # ORDER BY ?p
    }

I want to enforce an ordering in the GROUP_CONCAT to ensure that subjects with the same property extension will produce the same string for ?cs. But the ORDER BY shown won't work since, I assume, it is applied as a solution modifier after the SELECT (i.e., effectively ignored).

Is there a way to enforce the ordering in GROUP_CONCAT?


My searches tell me that SQL has a dedicated ORDER BY clause that can be stuck in the brackets of GROUP_CONCAT but the SPARQL 1.1 grammar tells me that ORDER BY can only be a solution modifier. I also found the same question asked by @Jeen but without a resolution at that time.

(Another option might be to have another subselect sort all triples first, but that doesn't seem to guarantee that that ordering will hold later, plus it could be super super expensive.)

As Maurizio's answer points out, some implementations let you get ordered results by ordering the solutions in a subquery and then applying group_concat to them in an outer query. I've used that approach in Jena and had good luck with it, as shown in a Stack Overflow answer. However, this behavior isn't guaranteed. Even if multisets were ordered structures, group_concat's definition says that the order of concatenation is unspecified (emphasis added):

18.5.1.7 GroupConcat

GroupConcat is a set function which performs a string concatenation across the values of an expression with a group. The order of the strings is not specified. The separator character used in the concatenation may be given with the scalar argument SEPARATOR.

The documentation does provide an example:

For example, GroupConcat({"a", "b", "c"}, {"separator" → "."}) = "a.b.c".

but based on the text above, this is only one of the possible things that GroupConcat({"a", "b", "c"}) could return. We can see this clearly based on the formal definition of GroupConcat:

Definition: GroupConcat

literal GroupConcat(multiset M)

If the "separator" scalar argument is absent from GROUP_CONCAT then it is taken to be the "space" character, unicode codepoint U+0020.

The multiset of values, M passed as an argument is converted to a sequence S.

GroupConcat(M, scalarvals) = GroupConcat(Flatten(M), scalarvals("separator"))

GroupConcat(S, sep) = "", where |S| = 0

GroupConcat(S, sep) = CONCAT("", S0), where |S| = 1

GroupConcat(S, sep) = CONCAT(S0, sep, GroupConcat(S1..n-1, sep)), where |S| > 1

The last case of the definition is the important one. Since a multiset has no order, there's no way to say which element S0 will be. The first element could be any element of the multiset. Then, recursively, the second could be any (except the first that was included). Thus the order is undefined.
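To make that concrete, here is a minimal Python sketch of the recursive definition. It is not how any SPARQL engine implements it; the function name group_concat and the use of a Python list for the sequence S are my own choices for illustration. Since the spec doesn't say which sequence the multiset is converted to, the sketch simply tries every permutation:

```python
from itertools import permutations


def group_concat(seq, sep=" "):
    """GroupConcat(S, sep) as defined in 18.5.1.7, applied to a sequence S."""
    if len(seq) == 0:
        return ""
    if len(seq) == 1:
        return "" + seq[0]  # CONCAT("", S0)
    # CONCAT(S0, sep, GroupConcat(S1..n-1, sep))
    return seq[0] + sep + group_concat(seq[1:], sep)


# The multiset {"a", "b", "c"} can be turned into a sequence in any order,
# so each permutation is a candidate for S.
multiset = ["a", "b", "c"]
possible = sorted({group_concat(list(p), ".") for p in permutations(multiset)})
print(possible)
# ['a.b.c', 'a.c.b', 'b.a.c', 'b.c.a', 'c.a.b', 'c.b.a']
```

The spec's "a.b.c" is in that list, but so are five other strings that satisfy the same definition.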

I'm not sure I've understood the question correctly, and perhaps you already have this solution, but for those who may need it:

PREFIX dbpedia: <http://dbpedia.org/resource/>
SELECT ?s (GROUP_CONCAT(?p; separator="|") AS ?cs) {
    # Inner query: order the (?s, ?p) rows before the outer query groups them
    SELECT ?s ?p
    WHERE {
        VALUES (?s) { (dbpedia:Rome) (dbpedia:Milan) (dbpedia:Florence) } # restricts the test to a few entities; remove otherwise
        ?s ?p ?o
        FILTER(STRSTARTS(STR(?p), "http://dbpedia.org/ontology"))
    } ORDER BY DESC(?p)
} GROUP BY ?s

Please notice that, regarding the following consideration:

Another option might be to have another subselect sort all triples first

the proposed solution is not ordering all the triples first, only those belonging to the group.

Please also notice (referring here to the problem posed by @Jeen in his question cited by the OP) that according to the following example in the definition of GROUP_CONCAT:

GroupConcat({"a", "b", "c"}, {"separator" → "."}) = "a.b.c".

it seems that GROUP_CONCAT preserves the ordering. If an implementer does not respect the order, then she cannot guarantee to comply with the given normative example. I would expect "or any other permutation" in the W3C recommendation if any order could appear in the output of GroupConcat.
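If you'd rather not depend on how a particular endpoint orders the concatenation, one fallback is to canonicalize the ?cs strings on the client after the query returns. Below is a minimal Python sketch of that idea. The function name canonical_cs and the example.org IRIs are made up for illustration, and it assumes the separator never occurs inside a property IRI. It also de-duplicates, on the assumption that a characteristic set should list each property once (GROUP_CONCAT without DISTINCT repeats a property for every object value).

```python
def canonical_cs(cs, sep="|"):
    """Return the properties in cs as a sorted, de-duplicated, sep-joined string."""
    if cs == "":
        return ""
    return sep.join(sorted(set(cs.split(sep))))


# Hypothetical ?cs values for two subjects that use the same properties,
# concatenated by the endpoint in different orders.
a = "http://example.org/name|http://example.org/age"
b = "http://example.org/age|http://example.org/name"

print(canonical_cs(a) == canonical_cs(b))
# True
```

This moves the ordering out of SPARQL entirely, so it only helps when you can post-process the results rather than compare ?cs inside the query.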

SPARQL 1.2 is considering this: https://github.com/w3c/sparql-12/issues/9