Performance of SELECT vs CONSTRUCT

I recently ran some comparison timings of queries using SELECT and CONSTRUCT. I was quite surprised to see such a consistent and large difference in performance between the two query types:

Select: 262ms, 6ms, 7ms, 6ms, 5ms
Construct: 89ms, 45ms, 43ms, 47ms, 44ms

I've laid out my data so that specific entities, and related resources, are stored in a dedicated graph for (what I assumed would be) quick CONSTRUCT-based retrieval. For this test data, the CONSTRUCT pulls back more data than the SELECT query, but not by much. The SELECT query returns a SparqlResultSet containing 1 result with 7 bindings. The CONSTRUCT, on the other hand, returns a graph containing 13 triples. Either way, there's not a vast difference between the quantities of data in this test scenario.

The environment is: dotNetRdf 0.9, OWLIM-LITE 5.3.5777 running in Tomcat 7 as a service on top of Windows. All running locally on a (pretty old) development laptop.

Would you expect to see such performance differences on this platform? Is this something you've seen in your own work? Is the disparity caused by result parsing on the client side, or by the performance of the query within the triple store? How can I find out which of the two it is?

Here's the SELECT query (with square brackets substituted for angle brackets for readability):

PREFIX a: [http://industrialinference.com/2013/03/abstract/]
PREFIX w: [http://industrialinference.com/2013/03/work/]
PREFIX rdfs: [http://www.w3.org/2000/01/rdf-schema#]
SELECT DISTINCT ?this ?summary ?description ?dueDate ?estimate ?creator ?status
WHERE {
    GRAPH ?this {
        ?this a w:Task ;
            a:displayName ?summary ;
            w:timeEstimated ?estimate ;
            a:creator ?creator ;
            w:taskCurrentState ?status .
        OPTIONAL {?this rdfs:comment ?description .}
        OPTIONAL {?this a:to ?dueDate .}
        ?this a:partOf ?proj .
    }
    GRAPH ?proj {
        OPTIONAL {?proj a w:Project.}
        OPTIONAL {?proj a w:Project;  a:displayName ?projName .}
        OPTIONAL {?proj w:tenant ?tenantName .}
    }
    FILTER (?this = [http://industrialinference.com/2013/03/work/IdTask8])
    FILTER NOT EXISTS { ?this a:deleted true . }
}

And here's the CONSTRUCT query:

PREFIX a: [http://industrialinference.com/2013/03/abstract/]
PREFIX w: [http://industrialinference.com/2013/03/work/]
PREFIX rdfs: [http://www.w3.org/2000/01/rdf-schema#]
CONSTRUCT { ?this ?p ?o . }
WHERE {
    GRAPH ?this { ?this ?p ?o .
        OPTIONAL {?this a:partOf ?proj .}
    }
    GRAPH ?proj{
        OPTIONAL {?proj a w:Project.}
        OPTIONAL {?proj a w:Project;  a:displayName ?projName .}
        OPTIONAL {?proj w:tenant ?tenantName .}
    }
    FILTER (?this = [http://industrialinference.com/2013/03/work/IdTask8])
    FILTER NOT EXISTS { ?this a:deleted true . }
}

What's missing from the queries is one of several optional FILTERs relating to projects, used to filter on ?proj or on other properties of w:Task. (This is part of a Search API where you can search on a range of criteria, and the filter expressions are inserted into the base query at run time.) I guess the needless matching of project triples adds overhead, but it should be fairly constant between the two queries, so I'm ignoring it for now. Is that a fair assumption?

UPDATE

There's been some very constructive commentary around the specific queries provided above, which has given me some insight into the issues I need to remain aware of when writing queries like these. What I'd like to do is focus on the question of whether it is reasonable to expect that, all things being equal, a CONSTRUCT pulling back a whole graph should perform better than a SELECT that pulls back the same data with explicit variable bindings.

I'm keen to understand whether there are any idioms I can adopt that will neutralize the difference, since I really would prefer to use a CONSTRUCT query, to allow me to rehydrate a whole object graph with only one round-trip to the triple store. For example, is there a significant performance difference (in general) in processing the different available response formats possible from a CONSTRUCT query? Is my preoccupation with reducing network hops really necessary? Perhaps it is inappropriate - but when dealing with RDBMSs there is often a significant cost associated with connecting (establishing the connection, handshaking) to the database across a network.

Making this a community wiki question to try and incorporate all the suggestions from different people. Please feel free to edit and to add to or expand the discussion here.

Different Queries

The queries are different to start with, as @Signified, @Jerven and @Andrew have discussed.

We can go back and forth about which of the queries is harder, but with the WHERE clauses being different it is not really a fair comparison.

Use of DISTINCT

As @Jerven points out, use of DISTINCT in SPARQL queries is often expensive. It restricts the ability of the triple store to just firehose results at you as soon as it finds them, since it has to pass them through some mechanism for eliminating duplicates first.
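To make that cost concrete, here is a minimal Python sketch of one possible duplicate filter. It is an illustration only, not how OWLIM or any particular store implements DISTINCT, and the `distinct` helper is hypothetical. Every row has to be hashed and compared, and the set of rows already seen grows with the size of the result.

```python
def distinct(rows):
    """Yield each row the first time it is seen, remembering every row yielded so far."""
    seen = set()  # grows with the number of distinct rows
    for row in rows:
        if row not in seen:
            seen.add(row)
            yield row
```

With a single result, as in the test above, this overhead should be tiny; it matters more as result sets grow.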

Client Side Parsing

I would not expect to see a huge difference in parsing overhead for this small an amount of data. If the CONSTRUCT result is coming back as RDF/XML, that might add a bit of overhead, as the RDF/XML parser is slow compared to the other parsers, but for this little data I would hope it doesn't make much of a difference.

One way to eliminate the parsing from your measurements is to use SparqlRemoteEndpoint and call QueryRaw(), which gives you the HttpWebResponse directly. You can then time how long it takes simply to read the response stream, which removes the parsing overhead from the figure.
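The same idea as a rough, language-neutral sketch, written in Python rather than against the dotNetRDF API: time reading the raw response bytes separately from parsing them. The helper names `fetch_raw` and `time_phases` are hypothetical, and the sketch assumes the endpoint accepts SPARQL queries as a form-encoded POST.

```python
import time
import urllib.parse
import urllib.request


def fetch_raw(endpoint, query, accept):
    """POST a SPARQL query and return the unparsed response body as bytes."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    request = urllib.request.Request(endpoint, data=data, headers={"Accept": accept})
    with urllib.request.urlopen(request) as response:
        return response.read()


def time_phases(fetch, parse):
    """Return (fetch_ms, parse_ms, parsed_result) for one query execution."""
    start = time.perf_counter()
    raw = fetch()
    fetched = time.perf_counter()
    result = parse(raw)
    parsed = time.perf_counter()
    return (fetched - start) * 1000.0, (parsed - fetched) * 1000.0, result
```

You would pass `time_phases` a function that calls `fetch_raw` with your endpoint URL and query, plus whatever parser you want to measure. If the fetch time alone already shows the gap between SELECT and CONSTRUCT, the difference is on the store side rather than in the client's parser.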

CONSTRUCT has extra work to do

Regardless of the queries, CONSTRUCT always requires a store to do slightly more work than a SELECT. For a CONSTRUCT, the system must first evaluate the WHERE part as if it were a SELECT *, and then go through a phase of substituting the solutions into the template portion of the query.

Again, with a small amount of data I wouldn't expect this to make much of a difference.
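The substitution phase described above can be sketched in a few lines of Python. This is a simplified illustration, not any store's actual implementation: terms are plain strings, variables are strings starting with `?`, blank nodes in the template are ignored, and the `instantiate_template` name is hypothetical.

```python
def instantiate_template(template, solutions):
    """Substitute each WHERE solution into the CONSTRUCT template.

    template: list of (s, p, o) patterns; strings starting with '?' are variables.
    solutions: list of dicts mapping variable names to bound values.
    """
    graph = set()  # an RDF graph is a set, so duplicate triples collapse
    for solution in solutions:
        for pattern in template:
            triple = tuple(
                solution.get(term) if term.startswith("?") else term
                for term in pattern
            )
            # A triple with an unbound variable is left out of the output graph
            if None not in triple:
                graph.add(triple)
    return graph
```

The extra cost is one pass over the solutions per template pattern, which is why it should be negligible for a 13-triple graph.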

SELECT is the most common query

SELECT is by far the most common query type, and that code path is often the one best optimized by a triple store's developers.

I have filed a bug against one triple store in the past where a trivial CONSTRUCT query was woefully slow compared with the otherwise identical SELECT (25 seconds for the CONSTRUCT vs 75 milliseconds for the SELECT).

Overhead of connecting to a triple store

Depending on the store and scenario, this can be a significant factor. It all depends on the exact techniques and implementations that are used. In this case you connect via an HTTP request. On a local machine the cost of setting up an unsecured HTTP connection is on the order of 1ms, increasing to 10ms for a secured connection. This can be avoided by reusing the same HTTP connection for multiple requests, or by using a native connection, which in the case of OWLIM means Java code.
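If you want to measure the connection cost on your own machine rather than rely on those rough figures, a sketch along these lines compares opening a fresh connection per query with reusing one. It is written in Python for brevity, not against dotNetRDF; the `time_queries` and `local_connection` helpers, and the host and port, are assumptions for illustration.

```python
import http.client
import time
import urllib.parse


def time_queries(make_connection, path, query, runs, reuse_connection):
    """Return per-request timings in ms, with or without a shared HTTP connection."""
    body = urllib.parse.urlencode({"query": query})
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    shared = make_connection() if reuse_connection else None
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        conn = shared if shared is not None else make_connection()
        conn.request("POST", path, body=body, headers=headers)
        conn.getresponse().read()  # drain the body so the connection can be reused
        if shared is None:
            conn.close()
        timings.append((time.perf_counter() - start) * 1000.0)
    if shared is not None:
        shared.close()
    return timings


def local_connection():
    # Hypothetical host and port; adjust to wherever Tomcat is listening
    return http.client.HTTPConnection("localhost", 8080)
```

Running the same query both ways, with `reuse_connection` set to True and then False, and comparing the timings gives a direct estimate of what each extra round trip costs in your setup.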