Alternatives to owl:sameAs for Linked Data

Many people have stated that owl:sameAs is problematic because it is being (ab)used without regard for its strong semantics.

SKOS defines the following:

  • skos:closeMatch
  • skos:exactMatch
  • skos:broadMatch
  • skos:narrowMatch
  • skos:relatedMatch

These seem useful and could be a basis for replacing owl:sameAs, but what other predicates should exist?

Note: this is not the same question as http://www.semanticoverflow.com/questions/312/mapping-ontologies-are-there-alternatives-to-owlsameas since that is about ontology mapping and my question is about instance data.

I don't know if it's a problem, but the SKOS mapping relations are ultimately sub-properties of skos:semanticRelation, whose rdfs:domain and rdfs:range are both skos:Concept. So if this pattern of linking with SKOS mapping properties were to catch on, anyone doing inferencing across linked data would end up with a lot of skos:Concepts.

skos:Concepts all the way down? :-)
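To make the inference concrete, here is a minimal sketch of the RDFS domain and range rules applied to a single mapping link. It uses plain Python tuples rather than a real RDF library, the schema triples are the ones I believe the SKOS Reference declares, and someOtherDatabase:Brussels_BE is just an illustrative identifier.

```python
# Triples are plain (subject, predicate, object) tuples; prefixed names are just strings.
SCHEMA = {
    # Domain and range are declared on skos:semanticRelation; the mapping
    # properties inherit them through rdfs:subPropertyOf.
    ("skos:semanticRelation", "rdfs:domain", "skos:Concept"),
    ("skos:semanticRelation", "rdfs:range", "skos:Concept"),
    ("skos:mappingRelation", "rdfs:subPropertyOf", "skos:semanticRelation"),
    ("skos:closeMatch", "rdfs:subPropertyOf", "skos:mappingRelation"),
}


def super_properties(prop, schema):
    """Return prop plus every property reachable via rdfs:subPropertyOf."""
    seen, todo = set(), [prop]
    while todo:
        p = todo.pop()
        if p in seen:
            continue
        seen.add(p)
        todo.extend(o for s, pr, o in schema if s == p and pr == "rdfs:subPropertyOf")
    return seen


def infer_types(data, schema):
    """Apply the RDFS domain and range rules to the data triples."""
    inferred = set()
    for s, p, o in data:
        for sp in super_properties(p, schema):
            for s2, pr, cls in schema:
                if s2 != sp:
                    continue
                if pr == "rdfs:domain":
                    inferred.add((s, "rdf:type", cls))
                elif pr == "rdfs:range":
                    inferred.add((o, "rdf:type", cls))
    return inferred


data = {("dbpedia:Brussels", "skos:closeMatch", "someOtherDatabase:Brussels_BE")}
types = infer_types(data, SCHEMA)
```

Both ends of the link come out typed as skos:Concept, which is the effect described above.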

I don't necessarily see this as a bad thing, given that a skos:Concept is "an idea or notion; a unit of thought", which IMHO includes pretty much anything. Isn't Linked Data all about our ideas and thoughts about the world anyhow?

I think skos:closeMatch would be a good one to encourage, since, unlike skos:exactMatch, it isn't transitive and so would lead to less semantic drift in linked data.
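A rough sketch of why transitivity matters, using made-up identifiers and plain Python sets rather than a reasoner: the same two links are read once as a transitive, symmetric property (how skos:exactMatch behaves) and once as symmetric only (how skos:closeMatch behaves).

```python
def transitive_symmetric_closure(pairs):
    """Return every ordered pair implied by treating `pairs` as symmetric and transitive."""
    neighbours = {}
    for a, b in pairs:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    closure = set()
    for start in neighbours:
        # Walk everything reachable from `start` through the links.
        seen, todo = {start}, [start]
        while todo:
            for nxt in neighbours[todo.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
        closure.update((start, other) for other in seen if other != start)
    return closure


def symmetric_only(pairs):
    """Return the pairs plus their inverses, with no chaining."""
    return {(a, b) for a, b in pairs} | {(b, a) for a, b in pairs}


# Hypothetical links: the second one is a sloppy match.
links = [("a:Paris", "b:Paris_France"), ("b:Paris_France", "c:Paris_Texas")]
as_exact = transitive_symmetric_closure(links)
as_close = symmetric_only(links)
```

Read transitively, the one sloppy link makes a:Paris match c:Paris_Texas; read as closeMatch, the error stays local to the link that introduced it.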

Personally I think link:uri ("One of possibly many URIs which identify something.") covers many use cases; rdfs:seeAlso covers another large proportion; and finally owl:sameAs is useful when you want the strong semantics. I can't find a use case which isn't covered by these three.

An alternative to owl:sameAs, perhaps a more generic term without the strong semantics, would be a good thing to have.

However, the problem with the SKOS properties is that they are quite subjective: how "close" does something have to be before it's "exact"ly the same? My view is that these refinements don't really add a great deal in practical terms. The distinctions may be useful in specific application scenarios like SKOS where some specific guidance can be given, but I'm not sure this scales particularly well outside of a community.

I think there's more mileage in providing people with the tools to choose amongst different views of equivalence. The best way to do that is to start factoring out equivalence links into separate link-sets.
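As a sketch of what choosing amongst link-sets could look like in an application, assume each link-set is just a named set of pairs (the names and identifiers below are invented for illustration) and the consumer decides which ones to trust:

```python
# Hypothetical link-sets, each published separately from the data it links.
LINKSETS = {
    "curated": {("ex:Brussels", "dbpedia:Brussels")},
    "auto-matched": {("ex:Brussels", "other:Brussels_Airport")},
}


def equivalents(resource, linksets, trusted):
    """Return the resources linked to `resource` in the link-sets the caller trusts."""
    found = set()
    for name in trusted:
        for a, b in linksets[name]:
            if a == resource:
                found.add(b)
            elif b == resource:
                found.add(a)
    return found


strict = equivalents("ex:Brussels", LINKSETS, ["curated"])
loose = equivalents("ex:Brussels", LINKSETS, ["curated", "auto-matched"])
```

The data itself never changes; only the consumer's choice of link-sets determines which view of equivalence applies.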

If we are to explore less strong forms of equivalence, then I'd like to see usage patterns such as:

  • rdfs:seeAlso -- for referring to relevant/related RDF data. This fulfills the "RDF hyperlinking" use case without having any strong ontological commitment. It can be specialised for specific linking use cases
  • ex:peer -- identify a "peer" resource in another dataset. This could either be a specialisation of rdfs:seeAlso, or a generalisation of owl:sameAs but without the same semantics
  • owl:sameAs -- as currently used

My feeling is that most current deployments actually identify "peer" resources in other datasets rather than items that are absolutely identical. The "peer" relationship would still allow linking and navigation between datasets, while letting applications treat the resources as identical if necessary, e.g. by application of local rules.

I found "When owl:sameAs isn’t the Same" a great summary of the problem. Section 5 lays out some varieties of same-ness.

As for some examples:

owl:equivalentClass
owl:equivalentProperty

With respect to their class-ness / property-ness they can be substituted, but not generally. Is this an instance of 'Same Thing As But Referentially Opaque'? SKOS is similar, although weaker, since it's not apparent when substitution is possible.

og:sameAs (proposal)

The idea here is that two things are talking about the same thing. I think ?x og:sameAs ?y would entail ?x foaf:primaryTopic ?p . ?y foaf:primaryTopic ?p . For example, the IMDB page about The Godfather is the same as the Godfather Wikipedia page.

foaf:isPrimaryTopicOf

Related to the previous, if I understand correctly Hayes and Halpin would grant a variety of same-ness here, in that one thing can stand in for another (such as an IMDB page for the film itself).

The needs differ for generic classes/properties and instances. We've been working [1] on the problem of recognizing that two RDF instances are intended to refer to the same object in the world -- e.g. that two foaf:Persons describe the same person. In some ways the problem is conceptually simpler -- there may be universal consensus that the "D. Knuth" cited as the author of one paper is the same individual as the "Donald E. Knuth" in another. But asserting owl:sameAs between the two foaf:Person instances can be problematic [2].

The solution we are currently exploring is to define a new property to assert that two RDF instances are co-referential when they are believed to describe the same object in the world. The two RDF descriptions might be incompatible because they are true at different times, or the sources disagree about some of the facts, or any number of reasons, so merging them with owl:sameAs may lead to contradictions. However, virtually merging the descriptions in a co-reference engine is fine -- both provide information that is useful in disambiguating future references as well as for many other purposes.

Our property (:coref) is a transitive, symmetric property that is a super-property of owl:sameAs. It is paired with another property, :notCoref, which is symmetric and generalizes owl:differentFrom.
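For illustration only, here is a minimal sketch of how a co-reference engine might use these two properties. It is not the implementation from [1]; it is a plain union-find over invented identifiers. Because owl:sameAs is a sub-property of :coref and owl:differentFrom a sub-property of :notCoref, both are folded in.

```python
class Clusters:
    """Union-find over identifiers; each cluster is one set of co-referring descriptions."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def coref_conflicts(triples):
    """Cluster by :coref (and owl:sameAs), then report :notCoref pairs inside one cluster."""
    clusters = Clusters()
    for s, p, o in triples:
        if p in (":coref", "owl:sameAs"):
            clusters.union(s, o)
    conflicts = [
        (s, o)
        for s, p, o in triples
        if p in (":notCoref", "owl:differentFrom")
        and clusters.find(s) == clusters.find(o)
    ]
    return clusters, conflicts


# Hypothetical descriptions of the same author from three sources.
triples = [
    ("paper1:D_Knuth", ":coref", "paper2:Donald_E_Knuth"),
    ("paper2:Donald_E_Knuth", "owl:sameAs", "dblp:Knuth"),
    ("paper1:D_Knuth", ":notCoref", "dblp:Knuth"),
]
clusters, conflicts = coref_conflicts(triples)
```

The descriptions are only grouped, never merged, so incompatible facts in the individual descriptions stay attached to their own source; a :notCoref assertion that lands inside a cluster is flagged for review.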

[1] http://ebiquity.umbc.edu/paper/html/id/471/

[2] http://ebiquity.umbc.edu/paper/html/id/473/

Depends on what you're trying to express, in what context.

By which I mean to challenge the idea that, as phrased in generality, the question is even important. In an actual context or use, as opposed to the abstract, it's usually easy to see what the proper relationship is. In my World Cup dataset, for example, the relationship from a Match to a dbpedia URI is "Link". For that data, that makes sense.

It doesn't really matter whether it's a standard predicate or not. The beauty of owl:sameAs (and owl:equivalentProperty) is that they can be used later, by some other user of the data in some other context, to fix up the semantics as necessary for that use. If my "Link" needs to be treated as "owl:sameAs" for your reasoner to hook up some of my stuff with some dbpedia stuff, then fine. In your dataset, which you combine on your servers with mine and dbpedia's, you can say:

ndl:matchLink owl:equivalentProperty owl:sameAs .
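A minimal sketch of what a consumer's reasoner would do with that one statement, written over plain tuples instead of a real OWL engine. The match and dbpedia identifiers are placeholders, and only a single step of equivalence is applied (no chaining between equivalent properties).

```python
def apply_equivalent_properties(triples):
    """Copy each triple onto every property declared equivalent to its predicate."""
    equiv = {}
    for s, p, o in triples:
        if p == "owl:equivalentProperty":
            # The declaration is symmetric, so record it in both directions.
            equiv.setdefault(s, set()).add(o)
            equiv.setdefault(o, set()).add(s)
    out = set(triples)
    for s, p, o in triples:
        for q in equiv.get(p, ()):
            out.add((s, q, o))
    return out


combined = [
    # From the publisher's dataset (placeholder identifiers).
    ("ndl:match42", "ndl:matchLink", "dbpedia:Some_Match"),
    # Added locally by the consumer who wants the stronger reading.
    ("ndl:matchLink", "owl:equivalentProperty", "owl:sameAs"),
]
result = apply_equivalent_properties(combined)
```

The publisher's data is untouched; the stronger semantics exist only in the store of whoever added the equivalence.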

owl:sameAs is frequently used when people want to link linked data; it's common to say

dbpedia:Brussels owl:sameAs someOtherDatabase:Brussels_BE .

The trouble is that owl:sameAs, interpreted conventionally, is too powerful. Any reasoner that supports owl:sameAs can be trashed by adding a single owl:sameAs statement to the T-Box. Even in the A-Box, owl:sameAs can cause errors that cascade, causing the identities of important topics to become confused.

To do better, it's necessary to reify subjects, in the sense that you take responsibility for them. You don't store

dbpedia:Brussels dbpedia-owl:postalCode "BE-BRU" .

but instead you store

myIdentifier:t7312 myPredicate:postalCode "BE-BRU" .

Now, in your system you need to keep track of how your identifiers map to other identifiers -- so you know which identifiers you should accept (e.g. Wikipedia redirects) and also which identifiers you should publish (the one that you believe is the most official). This system is a burden to maintain, but you need it if you're maintaining a specific point of view about what things exist. You'll need to do that if you want to get precision and recall higher than a certain level.
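One possible shape for that bookkeeping, as a small sketch. The class and method names are made up for this example, and dbpedia:Bruxelles stands in for a redirect.

```python
class IdentifierRegistry:
    """Maps external URIs onto local identifiers and remembers which URI to publish."""

    def __init__(self):
        self._local_for = {}  # external URI -> local identifier
        self._preferred = {}  # local identifier -> external URI to publish

    def accept(self, external, local, preferred=False):
        """Record that `external` is an accepted name for `local`."""
        self._local_for[external] = local
        if preferred or local not in self._preferred:
            self._preferred[local] = external

    def resolve(self, uri):
        """Return the local identifier for an incoming URI, or None if unknown."""
        return self._local_for.get(uri)

    def publish(self, local):
        """Return the URI to expose for a local identifier."""
        return self._preferred.get(local, local)


registry = IdentifierRegistry()
registry.accept("dbpedia:Bruxelles", "myIdentifier:t7312")  # hypothetical redirect
registry.accept("dbpedia:Brussels", "myIdentifier:t7312", preferred=True)
```

Facts are stored against myIdentifier:t7312 only, so a bad external link can be corrected by changing one row in the registry instead of untangling merged data.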

"Overloading OWL sameAs" is a nice summary and link collection of owl:sameAs related thoughts (created by Michael Uschold).