skos:exactMatch vs owl:sameAs

VladimirAlexiev · May 4, 2012, 10:00pm

I have some drawings by Rembrandt in a British Museum database and some paintings by Rembrandt in an RKD database. He's referred to as bm-people:Rembrandt in one and rkd-artists:rembrandt in the other. I use CIDOC CRM, so Rembrandt is related through crm:P14i_performed to the Production events of the drawings and paintings.

The current representation is:

bm-people:Rembrandt a skos:Concept;
  skos:inScheme bm-people: ; skos:prefLabel "Rembrandt".
rkd-artists:rembrandt a skos:Concept;      
  skos:inScheme rkd-artists: ; skos:prefLabel "Rembrandt".

Use cases:

The user can search for artists using a thesaurus auto-complete function. It relies on skos:inScheme and a further property of the ConceptScheme to collect all values for completion.
If the user selects the value "Rembrandt", which is correlated between the two thesauri, they should see all works by Rembrandt, no matter which Rembrandt URI the works relate to.
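A minimal Python sketch of the auto-complete lookup, assuming the thesaurus data sits in memory as a set of (subject, predicate, object) triples with CURIE strings standing in for URIs. The "further property of the ConceptScheme" isn't named here, so the sketch just takes an explicit set of enabled schemes; the function name `complete` is hypothetical.

```python
# Each triple is (subject, predicate, object); CURIEs stand in for full URIs.
TRIPLES = {
    ("bm-people:Rembrandt", "rdf:type", "skos:Concept"),
    ("bm-people:Rembrandt", "skos:inScheme", "bm-people:"),
    ("bm-people:Rembrandt", "skos:prefLabel", "Rembrandt"),
    ("rkd-artists:rembrandt", "rdf:type", "skos:Concept"),
    ("rkd-artists:rembrandt", "skos:inScheme", "rkd-artists:"),
    ("rkd-artists:rembrandt", "skos:prefLabel", "Rembrandt"),
}

LABEL_PROPS = ("skos:prefLabel", "skos:altLabel")


def complete(triples, schemes, prefix):
    """Hypothetical helper: return {label: set of concepts} for labels that
    start with `prefix` (case-insensitive), considering only concepts that
    are skos:inScheme one of `schemes`. Identical labels from different
    thesauri collapse into a single entry."""
    in_schemes = {s for (s, p, o) in triples
                  if p == "skos:inScheme" and o in schemes}
    hits = {}
    for (s, p, o) in triples:
        if s in in_schemes and p in LABEL_PROPS:
            if o.lower().startswith(prefix.lower()):
                hits.setdefault(o, set()).add(s)
    return hits


if __name__ == "__main__":
    print(complete(TRIPLES, {"bm-people:", "rkd-artists:"}, "rem"))
```

One "Rembrandt" entry maps to both URIs, so the second use case reduces to fetching works for every concept in the returned set.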

Aside: if you look in VIAF, you'll see Rembrandt correlated across 19 sources, including national libraries and Getty ULAN. There are a thousand names and a lot of extra information about him.

If you look at the VIAF RDF record, you'll see he's represented as a foaf:Person (with a bunch of names). There are also 19 skos:Concepts from the 19 sources (including more names as skos:altLabel), which link back to the main VIAF URI using foaf:focus:

<skos:Concept rdf:about="http://viaf.org/viaf/sourceID/BNF%7C11940484#skos:Concept">
  <foaf:focus rdf:resource="http://viaf.org/viaf/64013650"/>
</skos:Concept>

There are also some owl:sameAs links equating the foaf:Person to URIs in other known sources:

<owl:sameAs rdf:resource="http://dbpedia.org/resource/Rembrandt"/>
<owl:sameAs rdf:resource="http://d-nb.info/gnd/11859964X"/>
<owl:sameAs rdf:resource="http://libris.kb.se/resource/auth/197544"/>
<owl:sameAs rdf:resource="http://www.idref.fr/027341925/id"/>

It would have been great if the BM and RKD thesauri were correlated to VIAF, but they're not. (End of aside.)

The question: what is the best way to correlate skos:Concepts that represent the same resource?

I know that both the SKOS Primer and Reference say one should use skos:exactMatch and not owl:sameAs. They warn of "undesirable entailments that would follow from using owl:sameAs" but the example they give is a bit weak: "a concept cannot possess two different preferred labels in the same language". Frankly, that doesn't scare me much:

when both labels coincide, that doesn't matter
when you use all prefLabels and altLabels for auto-complete, it's actually a plus

If I use skos:exactMatch and not owl:sameAs, I'll face these difficulties:

Need to merge the labels of these concepts, because it would be silly to offer duplicate labels
Lose the OWLIM sameAs optimization
Need to implement custom inferences, for example:

crm:P14i_performed owl:propertyChainAxiom (skos:exactMatch crm:P14i_performed).
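The chain reads: if ?a skos:exactMatch ?b and ?b crm:P14i_performed ?e, then ?a crm:P14i_performed ?e. A minimal Python sketch of that entailment, forward-chained to a fixpoint over an in-memory triple set. It also applies the symmetry and transitivity that the SKOS Reference declares for skos:exactMatch, so works propagate in both directions. The production-event identifiers are hypothetical; a reasoner such as OWLIM would evaluate the axiom itself, so this only illustrates what it should derive.

```python
# Each triple is (subject, predicate, object); CURIEs stand in for full URIs.
EXACT = "skos:exactMatch"
PERFORMED = "crm:P14i_performed"

# Hypothetical production-event identifiers, for illustration only.
DATA = {
    ("bm-people:Rembrandt", EXACT, "rkd-artists:rembrandt"),
    ("bm-people:Rembrandt", PERFORMED, "bm:production/drawing-1"),
    ("rkd-artists:rembrandt", PERFORMED, "rkd:production/painting-1"),
}


def saturate(triples):
    """Apply exactMatch symmetry/transitivity and the property chain
    (exactMatch, P14i_performed) -> P14i_performed until nothing new is
    derived. Returns a new set; the input set is left unchanged."""
    kb = set(triples)
    while True:
        exact = {(s, o) for (s, p, o) in kb if p == EXACT}
        new = set()
        for (a, b) in exact:
            new.add((b, EXACT, a))              # symmetry
            for (c, d) in exact:
                if b == c:
                    new.add((a, EXACT, d))      # transitivity
        performed = {(s, o) for (s, p, o) in kb if p == PERFORMED}
        for (a, b) in exact:
            for (s, e) in performed:
                if s == b:
                    new.add((a, PERFORMED, e))  # the property chain
        if new <= kb:
            return kb
        kb |= new


def works_of(kb, concept):
    """Production events linked to `concept` by crm:P14i_performed."""
    return {o for (s, p, o) in kb if s == concept and p == PERFORMED}


if __name__ == "__main__":
    kb = saturate(DATA)
    print(sorted(works_of(kb, "bm-people:Rembrandt")))
```

Either Rembrandt URI now yields both the drawing and the painting, which is what the second use case asks for.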

What's your advice?

tobyink · May 4, 2012, 10:00pm

owl:sameAs has potentially unwelcome inferences. Imagine that I have a term "Monkey" which was added to my thesaurus today, and I claim it's owl:sameAs the concept "Monkey" in a dictionary that somebody else published many years ago.

{ thesaurus:Monkey a skos:Concept .
  thesaurus:Monkey skos:inScheme books:Thesaurus .
  thesaurus:Monkey skos:changeNote "Added 5 May 2012."@en .
  thesaurus:Monkey owl:sameAs dictionary:Monkey .
  dictionary:Monkey a skos:Concept .
  dictionary:Monkey skos:inScheme books:Dictionary . }
      => { dictionary:Monkey skos:inScheme books:Thesaurus .
           dictionary:Monkey skos:changeNote "Added 5 May 2012."@en . } .
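The rule above is one instance of owl:sameAs substitution: every statement about one name also holds for the other. A minimal Python sketch, assuming the same in-memory triple representation, copies statements across owl:sameAs links in both directions and in subject and object position until a fixpoint. It is not a full OWL reasoner, only enough to reproduce the two unwanted triples.

```python
# Each triple is (subject, predicate, object); CURIEs stand in for full URIs.
SAME = "owl:sameAs"

MONKEY = {
    ("thesaurus:Monkey", "rdf:type", "skos:Concept"),
    ("thesaurus:Monkey", "skos:inScheme", "books:Thesaurus"),
    ("thesaurus:Monkey", "skos:changeNote", '"Added 5 May 2012."@en'),
    ("thesaurus:Monkey", SAME, "dictionary:Monkey"),
    ("dictionary:Monkey", "rdf:type", "skos:Concept"),
    ("dictionary:Monkey", "skos:inScheme", "books:Dictionary"),
}


def same_as_closure(triples):
    """Substitute owl:sameAs-equal names in subject and object position,
    one position at a time, until no new triple appears."""
    kb = set(triples)
    while True:
        pairs = {(s, o) for (s, p, o) in kb if p == SAME}
        pairs |= {(o, s) for (s, o) in pairs}    # sameAs is symmetric
        new = set()
        for (a, b) in pairs:
            for (s, p, o) in kb:
                if s == a:
                    new.add((b, p, o))
                if o == a:
                    new.add((s, p, b))
        if new <= kb:
            return kb
        kb |= new


if __name__ == "__main__":
    for t in sorted(same_as_closure(MONKEY) - MONKEY):
        if t[0] == "dictionary:Monkey" and t[1] != SAME:
            print(t)
```

The closure also runs the other way: thesaurus:Monkey ends up skos:inScheme books:Dictionary.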

The skos:Concept for a monkey doesn't represent an actual monkey, or the class of all monkeys. Think of it more as representing an entry in a thesaurus, dictionary, encyclopaedia or catalogue.

database_animal · May 4, 2012, 10:00pm

Overall, I think owl:sameAs, with standard semantics, is not suitable for exchange across the boundaries between semantic systems. If, for instance, you're scraping triples off the floor and throwing them into a processing chain, owl:sameAs will definitely give you entailments you don't want.

Now, inside a perimeter that you control, the story is different. owl:sameAs behaves a particular way in your triple store and if you like what owl:sameAs does, then go ahead and use it.

I put it that way because you're using OWLIM, and OWLIM implements owl:sameAs in a way that may or may not be standards-correct or mathematically correct, but that is certainly "correct" for building real applications. The case for owl:sameAs is much weaker if you use other tools.

Personally, I've dealt with these problems by creating a wrapper that normalizes identifiers that cross the system perimeter, but this doesn't address all the problems involved when two concepts in the KB get merged.
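The wrapper itself isn't shown. One way to sketch the idea is a union-find that maps every incoming identifier to a single canonical identifier before its triples enter the store. The alias links would have to come from somewhere, e.g. owl:sameAs or skos:exactMatch statements read at import time (an assumption). The class name `Canonicalizer` and the "lexicographically smallest URI wins" policy are hypothetical choices made only for illustration.

```python
class Canonicalizer:
    """Hypothetical perimeter wrapper: union-find over identifiers, where each
    set is represented by its lexicographically smallest member (an arbitrary
    but deterministic policy)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path straight at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        """Record that identifiers a and b denote the same thing."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            lo, hi = sorted((ra, rb))
            self.parent[hi] = lo

    def canon(self, x):
        """Canonical form of x; unknown terms (e.g. literals) pass through."""
        return self.find(x) if x in self.parent else x

    def normalize(self, triple):
        s, p, o = triple
        return (self.canon(s), p, self.canon(o))


if __name__ == "__main__":
    c = Canonicalizer()
    c.union("bm-people:Rembrandt", "rkd-artists:rembrandt")
    print(c.normalize(("rkd-artists:rembrandt", "crm:P14i_performed",
                       "rkd:production/painting-1")))
```

Because only one identifier per entity ever reaches the store, no sameAs reasoning is needed inside it. As said above, though, this alone doesn't settle what happens to labels, notes and scheme memberships when two concepts already in the KB are merged.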

VladimirAlexiev · May 4, 2012, 10:00pm

Antoine, I am also comfortable with having a person be both Person and skos:Concept, but if I cannot use sameAs then there's little value in it being a skos:Concept.

It seems to me VIAF got it right.

First, they have a main URI that's a foaf:Person, and all URIs used in data (e.g. national library bibliographies) are given as owl:sameAs (written "=" in N3):

<http://viaf.org/viaf/99366184> a rdaEnt:Person, foaf:Person;
  = <http://dbpedia.org/resource/William_Temple_(archbishop)>,
    <http://d-nb.info/gnd/118756435>,
    <http://libris.kb.se/resource/auth/230284>,
    <http://www.idref.fr/033849587/id>;

This would mean that none of the source URIs is a skos:Concept.

Then they copy all labels from all the sources to foaf:name. Unfortunately, that loses the information about which label each source prefers, but I don't see how they could pick one source's prefLabel as globally preferred anyway.

  foaf:name "Temple, William",
    "Temple, William, 1881-1944",
    "Temple, William, Abp. of Canterbury, 1881-1944",
    "Temple, William, archevêque",
    "Temple, William, vesc. di Manchester, 1881-1944",
    "William Canterbury, Archbishop 1881-1944".

Finally, they have one skos:Concept per source, with foaf:focus pointing to the main URI:

<http://viaf.org/viaf/sourceID/BAV%7CADV11296505#skos:Concept> a skos:Concept;
  skos:inScheme <http://viaf.org/authorityScheme/BAV>;
  skos:prefLabel "Temple, William, vesc. di Manchester, 1881-1944";
  foaf:focus <http://viaf.org/viaf/99366184>.
<http://viaf.org/viaf/sourceID/BNF%7C12466527#skos:Concept> a skos:Concept;
  skos:inScheme <http://viaf.org/authorityScheme/BNF>;
  skos:prefLabel "Temple, William, 1881-1944";
  foaf:focus <http://viaf.org/viaf/99366184>.

So they use skos:Concept and foaf:focus only for "thesaurus bookkeeping info", but it seems the intention is to use the main URI (and sameAs source URIs) in business data.
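To use this pattern from a thesaurus concept, code follows foaf:focus to the main URI and then gathers the owl:sameAs aliases under which business data may be recorded. Here is a minimal Python sketch under the same in-memory triple assumption. The function name `business_uris` is hypothetical, and only two of the sameAs links from the record above are included.

```python
FOCUS = "foaf:focus"
SAME = "owl:sameAs"

BAV = "http://viaf.org/viaf/sourceID/BAV%7CADV11296505#skos:Concept"
BNF = "http://viaf.org/viaf/sourceID/BNF%7C12466527#skos:Concept"
MAIN = "http://viaf.org/viaf/99366184"

# Excerpt of the VIAF record above, as (subject, predicate, object) triples.
VIAF = {
    (BAV, FOCUS, MAIN),
    (BNF, FOCUS, MAIN),
    (MAIN, SAME, "http://dbpedia.org/resource/William_Temple_(archbishop)"),
    (MAIN, SAME, "http://d-nb.info/gnd/118756435"),
}


def business_uris(triples, concept):
    """Hypothetical helper: the foaf:focus target(s) of `concept` plus every
    URI reachable from them over owl:sameAs, followed in either direction."""
    focus = {o for (s, p, o) in triples if s == concept and p == FOCUS}
    result, todo = set(focus), list(focus)
    while todo:
        x = todo.pop()
        for (s, p, o) in triples:
            if p != SAME:
                continue
            other = o if s == x else (s if o == x else None)
            if other is not None and other not in result:
                result.add(other)
                todo.append(other)
    return result


if __name__ == "__main__":
    for uri in sorted(business_uris(VIAF, BNF)):
        print(uri)
```

The skos:Concept stays purely bookkeeping: it is the entry point for lookup, while queries over business data use the returned URIs.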

The BnF data model also uses foaf:focus.

So... the answer is: don't use skos:Concept in business data; use other URIs that are amenable to sameAs. Or else, implement extra rules that propagate business relations across skos:exactMatch.

Note: it seems to me we need to assume the following rule:

{?term1 foaf:focus ?entity. ?term2 foaf:focus ?entity} =>
{?term1 skos:exactMatch ?term2}.
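The rule can be sketched in Python over the same kind of triple set: group concepts by their foaf:focus target and emit skos:exactMatch between every ordered pair of distinct concepts in a group. Skipping the ?term1 = ?term2 case is a choice made in this sketch; the N3 rule as written would also derive each concept's exactMatch to itself. The function name is hypothetical.

```python
from collections import defaultdict
from itertools import permutations

FOCUS = "foaf:focus"
EXACT = "skos:exactMatch"

BAV = "http://viaf.org/viaf/sourceID/BAV%7CADV11296505#skos:Concept"
BNF = "http://viaf.org/viaf/sourceID/BNF%7C12466527#skos:Concept"
MAIN = "http://viaf.org/viaf/99366184"

DATA = {(BAV, FOCUS, MAIN), (BNF, FOCUS, MAIN)}


def exact_matches_from_focus(triples):
    """{?t1 foaf:focus ?e. ?t2 foaf:focus ?e} => {?t1 skos:exactMatch ?t2},
    restricted here to t1 != t2."""
    by_entity = defaultdict(set)
    for (s, p, o) in triples:
        if p == FOCUS:
            by_entity[o].add(s)
    derived = set()
    for terms in by_entity.values():
        # Ordered pairs, so both directions of the symmetric link appear.
        for t1, t2 in permutations(sorted(terms), 2):
            derived.add((t1, EXACT, t2))
    return derived


if __name__ == "__main__":
    for t in sorted(exact_matches_from_focus(DATA)):
        print(t)
```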