Example SPARQL queries on LOD cloud

harschware · April 19, 2011, 10:00pm

I'm looking for SPARQL queries that aggregate data from the Linked Open Data cloud. Queries should not be isolated to a single dataset, and should utilize linkages between datasets and/or federate queries. Queries against LOD cache or FactForge are acceptable, since they contain large portions of LOD data. Queries that aggregate from two endpoints listed in CKAN via the SPARQL 1.1 SERVICE keyword are acceptable too. Queries also have to be expressible using SPARQL 1.0 or SPARQL 1.1 only, with no vendor extensions.

I found some demo queries at http://lod.openlinksw.com/demo_queries/ which are kind of good, but they don't really reach across datasets too well as far as I can tell. Some of the queries are mildly interesting but could perhaps have been answered from a single dataset (e.g. DBpedia), and most use OpenLink's geo-based extensions.

Any help appreciated.

similar: http://answers.semanticweb.com/questions/1580/what-sparql-queries-would-you-like-to-run-against-the-lod-cloud

similar: http://answers.semanticweb.com/questions/3700/sparql-query-repositories

Signified · April 19, 2011, 10:00pm

If your focus is more on demonstrating the aggregation of data from lots of sources, you're probably going to need to be able to handle (in some fashion) the semantics of owl:sameAs (although you might get away with just matching up labels).

Whenever I need similar examples, I always turn to old-school folk in the FOAF community, simply because they are the resources that pop up the most in different sources.

So, simple queries like "give me all relevant information about Tim Berners-Lee":

SELECT * WHERE {
  <http://www.w3.org/People/Berners-Lee/card#i> ?p ?o .
}

...should touch upon thousands of sources from a dozen or so domains (assuming you also consider [and trust] owl:sameAs links).

Same scenario for Dan Brickley.
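As a rough illustration of what "considering owl:sameAs" could look like in plain SPARQL 1.0, here is a minimal sketch that also pulls in descriptions of resources one owl:sameAs hop away, in either direction. It assumes the store being queried (e.g. a cache holding data from many sources) actually contains such links; it does not compute the full symmetric/transitive closure, which would need reasoning support.

```sparql
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?alias ?p ?o WHERE {
  {
    # triples stated directly about the URI (?alias stays unbound here)
    <http://www.w3.org/People/Berners-Lee/card#i> ?p ?o .
  } UNION {
    # triples about anything the URI is declared to be the same as
    <http://www.w3.org/People/Berners-Lee/card#i> owl:sameAs ?alias .
    ?alias ?p ?o .
  } UNION {
    # triples about anything declared to be the same as the URI
    ?alias owl:sameAs <http://www.w3.org/People/Berners-Lee/card#i> .
    ?alias ?p ?o .
  }
}
```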

After you've found some interesting resources that are described in numerous datasets/endpoints, you can start pimping the query a bit... ask for details about the resource, or related resources, such as the names of people they know, or images... or other information where properties are well agreed upon. To answer fancier queries, you may find that there isn't enough agreement on property/class URIs, and you're going to have to use ugly UNIONs to get the job done (a little reasoning may help here).
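To make the "ugly UNIONs" point concrete, here is a small sketch that asks for the names of people Tim Berners-Lee knows, accepting either foaf:name or rdfs:label as the naming property. The choice of these two properties is only an assumption for illustration; real data may well use others.

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?friend ?name WHERE {
  <http://www.w3.org/People/Berners-Lee/card#i> foaf:knows ?friend .
  # sources disagree on how to name things, so accept either property
  { ?friend foaf:name ?name }
  UNION
  { ?friend rdfs:label ?name }
}
```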

If your focus is more on demonstrating aggregation of data from a few sources (or on federated querying), you can probably go for more ambitious queries. My suggestion would be to pick some of the more popular datasets from the LOD cloud (like DBpedia, Freebase, LinkedMDB, DBTune, DBLP) and see if you can again find well-known resources described in more than one of them... then find a way of relating them (by owl:sameAs or common label)... then figure out what queries require the combination of knowledge from the different sources.
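One way to scout for such overlapping resources is to ask one endpoint for its owl:sameAs links into another dataset's namespace. The sketch below is meant to be run against a single endpoint such as DBpedia's; whether that endpoint actually holds links into the LinkedMDB namespace is an assumption you would need to check, and the LIMIT is only there to keep the sample small.

```sparql
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?resource ?other WHERE {
  ?resource owl:sameAs ?other .
  # keep only links pointing into the other dataset's namespace
  FILTER ( REGEX( STR(?other), "^http://data.linkedmdb.org/" ) )
}
LIMIT 20
```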

Lee's SPARQL by Example gives the following federated query for getting the birthdates (DBpedia) of folks who acted in Star Trek (LinkedMDB):

PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
PREFIX dbpedia: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?actor_name ?birth_date
FROM <http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf> # placeholder graph
WHERE {
  SERVICE <http://data.linkedmdb.org/sparql> {
    <http://data.linkedmdb.org/resource/film/675> movie:actor ?actor .
    ?actor movie:actor_name ?actor_name
  }
  SERVICE <http://dbpedia.org/sparql> {
    ?actor2 a dbpedia:Actor ;
            foaf:name ?actor_name_en ;
            dbpedia:birthDate ?birth_date .
    FILTER(STR(?actor_name_en) = ?actor_name)
  }
}

...which can be tried here (some patience required... after all, it's remote querying).
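One caveat about the query above: its FILTER sits inside the DBpedia SERVICE block, where ?actor_name is only bound if the engine pushes bindings from the first service into the second. For an engine that evaluates each SERVICE block independently (an assumption about the engine, not something checked here), a variant with the join condition moved outside the blocks might look like the sketch below. The trade-off is that the DBpedia block is then unconstrained, so it may be slow or cut off by endpoint result limits.

```sparql
PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
PREFIX dbpedia: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?actor_name ?birth_date
FROM <http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf> # placeholder graph
WHERE {
  SERVICE <http://data.linkedmdb.org/sparql> {
    <http://data.linkedmdb.org/resource/film/675> movie:actor ?actor .
    ?actor movie:actor_name ?actor_name
  }
  SERVICE <http://dbpedia.org/sparql> {
    ?actor2 a dbpedia:Actor ;
            foaf:name ?actor_name_en ;
            dbpedia:birthDate ?birth_date .
  }
  # join on the name once both remote result sets are available
  FILTER ( STR(?actor_name_en) = STR(?actor_name) )
}
```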

Trying to create a couple of novel examples following the above steps:

...get me the papers (DBLP) of Fellows of the British Computer Society (DBpedia).

PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?fellow_name ?paper_name
FROM <http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf> # placeholder graph
WHERE {
  SERVICE <http://dbpedia.org/sparql> {
    ?fellow dcterms:subject <http://dbpedia.org/resource/Category:Fellows_of_the_British_Computer_Society> ;
            owl:sameAs ?dblp_fellow .
    FILTER ( REGEX( STR(?dblp_fellow), "dblp"))
  }
  SERVICE <http://www4.wiwiss.fu-berlin.de/dblp/sparql> {
    ?dblp_fellow foaf:name ?fellow_name .
    ?paper dc:creator ?dblp_fellow ;
           rdfs:label ?paper_name .
  }
}

...which this time uses some owl:sameAs plumbing and which again can be tried here (shouldn't require as much patience).

Pushing the boat out for the number of sources, here you're trying to find which of your asthma (Diseasome) tablets (DrugBank) might have given you that slight dose of acidosis (Sider)... and get some info on the contraindications if available (DailyMed).

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX drugbank: <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/>
PREFIX sider: <http://www4.wiwiss.fu-berlin.de/sider/resource/sider/>
PREFIX dailymed: <http://www4.wiwiss.fu-berlin.de/dailymed/resource/dailymed/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?drug_name ?brand_name ?drug ?contraindication
FROM <http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf> # placeholder graph
WHERE {
  SERVICE <http://www4.wiwiss.fu-berlin.de/diseasome/sparql> {
    ?disease rdfs:label "Asthma" .
  }
  SERVICE <http://www4.wiwiss.fu-berlin.de/drugbank/sparql> {
    ?drug drugbank:possibleDiseaseTarget ?disease ;
          drugbank:dosageForm <http://www4.wiwiss.fu-berlin.de/drugbank/resource/dosageforms/tabletOral> ;
          rdfs:label ?drug_name ;
          drugbank:brandName ?brand_name .
  }
  SERVICE <http://www4.wiwiss.fu-berlin.de/sider/sparql> {
    ?siderdrug owl:sameAs ?drug ;
               sider:sideEffect ?sideeffect .
    ?sideeffect rdfs:label "Acidosis" .
  }
  SERVICE <http://www4.wiwiss.fu-berlin.de/dailymed/sparql> {
    OPTIONAL {
      ?moiety rdfs:label ?drug_name .
      ?branded_drug dailymed:activeMoiety ?moiety ;
                    dailymed:contraindication ?contraindication .
    }
  }
}

Again, you can try this here (again, with some patience). A cleaner query and cleaner results can be had by simply dropping the fourth service (the bit looking for warning labels from DailyMed is a bit messy).
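For reference, the trimmed three-service version would look something like the following sketch: the same query with the DailyMed block, the ?contraindication variable and the now-unused prefixes removed. The same caveats about endpoint availability and result correctness apply.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX drugbank: <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/>
PREFIX sider: <http://www4.wiwiss.fu-berlin.de/sider/resource/sider/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?drug_name ?brand_name ?drug
FROM <http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf> # placeholder graph
WHERE {
  # the disease (Diseasome)
  SERVICE <http://www4.wiwiss.fu-berlin.de/diseasome/sparql> {
    ?disease rdfs:label "Asthma" .
  }
  # oral tablets that target it (DrugBank)
  SERVICE <http://www4.wiwiss.fu-berlin.de/drugbank/sparql> {
    ?drug drugbank:possibleDiseaseTarget ?disease ;
          drugbank:dosageForm <http://www4.wiwiss.fu-berlin.de/drugbank/resource/dosageforms/tabletOral> ;
          rdfs:label ?drug_name ;
          drugbank:brandName ?brand_name .
  }
  # those with acidosis listed as a side effect (Sider)
  SERVICE <http://www4.wiwiss.fu-berlin.de/sider/sparql> {
    ?siderdrug owl:sameAs ?drug ;
               sider:sideEffect ?sideeffect .
    ?sideeffect rdfs:label "Acidosis" .
  }
}
```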

Anyways, you get the idea.

[DISCLAIMER: I make no claims about the correctness/completeness of the results. ;)]