How do RDF databases organize their triples to be easily queried?

On DBpedia.org I can easily query for all the universities it has in its database, simply by running this query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?uri ?name WHERE {
    ?uri rdf:type dbpedia-owl:University .
    ?uri foaf:name ?name
} LIMIT 1000 OFFSET offset   # "offset" is a placeholder for a number


I found that SPARQL snippet on some site, and I have two questions about it:

1) How does the user (the person typing the query) know that DBpedia stores all its universities under the dbpedia-owl:University type?

2) How does DBpedia keep all its universities under the same dbpedia-owl:University type? Say DBpedia wants to import a new RDF dataset that contains universities, but this new dataset uses its own type for universities, maybe foo:university. Does DBpedia have to convert them to dbpedia-owl:University before importing, or is there a better way to include and merge external RDF data of the same type?

Re 1): they don't, at least not initially. In the case of DBpedia, a user finds out which types and properties to use by doing exploratory querying (e.g. a SPARQL query to see which classes exist, then from there which properties each class has, and so on) and/or by manually clicking around in the DBpedia browsing interface. More generally, people find out how an ontology is structured either by reading its documentation, or by this kind of exploratory querying and browsing.
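For example, a pair of exploratory queries against the public endpoint (http://dbpedia.org/sparql, where common prefixes like rdf: and dbpedia-owl: are predefined) could look like the sketch below; run them one at a time:

# 1. Which classes appear in the data at all?
SELECT DISTINCT ?class WHERE {
    ?s rdf:type ?class
} LIMIT 100

# 2. Which properties do instances of a class of interest carry?
SELECT DISTINCT ?property WHERE {
    ?s rdf:type dbpedia-owl:University .
    ?s ?property ?o
} LIMIT 100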

Re 2): The problem you are referring to is generally known as ontology mapping or ontology alignment. I can't speak to how the DBpedia maintainers would solve this issue, but there are several possible approaches in general. Some simple strategies (both sketched in the example below) are:

  • doing a conversion from the source type to the target type when importing data;
  • making the source type an rdfs:subClassOf the target type and relying on inference to make sure the imported items are recognized as the target type.

Note that the second approach might already have been taken by the maintainers of the source you're importing: quite a few ontologies out there are defined as extensions of basic DBpedia types.
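To make the two strategies concrete, here is a minimal SPARQL Update sketch, reusing the hypothetical foo:university type from the question (the foo: namespace is made up for illustration):

PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX foo:  <http://example.org/foo#>   # hypothetical source vocabulary

# Strategy 1: convert at import time by materializing the target type
INSERT { ?s rdf:type dbpedia-owl:University }
WHERE  { ?s rdf:type foo:university } ;

# Strategy 2: assert a subclass axiom once; an RDFS reasoner will then
# classify every foo:university instance as a dbpedia-owl:University
INSERT DATA { foo:university rdfs:subClassOf dbpedia-owl:University }

The trade-off: the first strategy duplicates triples in the store, while the second adds nothing but depends on the store (or client) doing RDFS inference at query time.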

Whether any of these strategies is sufficient depends on the use case and environment; for example, in situations where you're exposing live, quickly changing data, you may need to rely on more sophisticated mapping techniques.

Hello, I will try to answer both questions in one shot:

DBpedia has a mapping system based on the infoboxes used in the original Wikipedia. I am going to walk you through the logical process I would follow to search for universities:

1) Search for a university in the ordinary Wikipedia. For example, who doesn't know Harvard?: - http://en.wikipedia.org/wiki/Harvard_University

Notice the last part of the URL, Harvard_University. With that fragment I can access the resource in DBpedia that represents Harvard University: - http://dbpedia.org/resource/Harvard_University

Fine, now we can resolve that URL and look at the rdf:type values. Among all the types you will find dbpedia-owl:University, which is the one we are interested in (we were searching for universities).
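If you prefer the endpoint over the browsing interface, the same check can be done with a small query (a sketch; rdf: is predefined on the DBpedia endpoint, but declared here for completeness):

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?type WHERE {
    <http://dbpedia.org/resource/Harvard_University> rdf:type ?type
}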

2) I don't really understand what you mean by "DBpedia importing a dataset of universities". I BELIEVE that all the resources that appear in DBpedia are imported from Wikipedia, but I KNOW that there is a mapping process for the infoboxes. That explains why all the universities have the type dbpedia-owl:University (though it doesn't explain why Harvard also has types like yago:UniversitiesAndCollegesInCambridge,Massachusetts. Maybe someone can enlighten us!)

You can find all the infobox mappings that DBpedia uses on the DBpedia Mappings Wiki.

Going further, we can check the mapping for the University infobox there as well.

Now you can see why Harvard, whose infobox is University, has the type dbpedia-owl:University, and you can assume that all universities will have it too. You can find this properly described in section 4.3 of the DBpedia documentation.