Inferring triples

We are building out a knowledge graph and, as is quite typical, we get data from multiple sources and have produced triples for each of them in the graph. We have an entity resolution process that maps those sources to a single entity. Imagine we got data about a single organization from three places: the entity resolution process will say that they are all the same thing. The next step for us is this: now that we know these three records are the same entity, which of the name values we have ingested across the sources is the correct name? How would you all approach this kind of question?

<source://1> <sdo:name> "Jane Doe" .
<source://2> <sdo:name> "Jane" .
<source://3> <sdo:name> "Jane D" .
<source://1> <owl:sameAs> <entity://a> .
<source://2> <owl:sameAs> <entity://a> .
<source://3> <owl:sameAs> <entity://a> .
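To make the problem concrete, here is a minimal Python sketch (standard library only) that reads triples shaped like the ones above and collects, for each resolved entity and predicate, every candidate value together with the source it came from. The regex is an assumption that only covers this simple line shape (IRIs in angle brackets, plain quoted literals with no escapes or language tags); real data should go through a proper RDF parser.

```python
import re
from collections import defaultdict

# Matches one simple triple line: <subject> <predicate> object, with an
# optional trailing "." Assumption: objects are either <IRI> or a plain
# "literal"; escapes, datatypes and language tags are not handled.
TRIPLE = re.compile(r'^(<[^>]*>)\s+(<[^>]*>)\s+(<[^>]*>|"[^"]*")\s*\.?\s*$')
SAME_AS = "<owl:sameAs>"


def candidates_by_entity(lines):
    """Group every non-sameAs value under the resolved entity of its subject."""
    triples = []
    for line in lines:
        match = TRIPLE.match(line.strip())
        if match:  # blank or unrecognised lines are skipped
            triples.append(match.groups())

    # First pass: source IRI -> resolved entity IRI, taken from owl:sameAs.
    resolved = {s: o for s, p, o in triples if p == SAME_AS}

    # Second pass: (entity, predicate) -> [(source, value), ...] in input order.
    grouped = defaultdict(list)
    for s, p, o in triples:
        if p != SAME_AS:
            grouped[(resolved.get(s, s), p)].append((s, o.strip('"')))
    return dict(grouped)
```

For the example data, the `("<entity://a>", "<sdo:name>")` key collects `Jane Doe`, `Jane` and `Jane D`, each tagged with its source, which is the input any selection rule needs.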

What are your criteria for deciding which one is correct?

Yeah, so for each predicate we are using different criteria.

There is no magic answer here. You have to get the criteria into a machine-readable form and then use them to drive the selection of the correct names.
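One way to do that, sketched in Python under stated assumptions: keep the per-predicate criteria as data (here a small JSON document), and have a dispatcher apply whichever rule is configured for a predicate to the candidate `(source, value)` pairs. The rule names `longest`, `source_priority` and `most_common`, the predicates they are attached to, and the source ranking are all hypothetical examples, not a recommendation for your data.

```python
import json
from collections import Counter

# Hypothetical criteria, kept as data so they can live in a config file
# rather than being hard-coded into the selection logic.
CRITERIA = json.loads("""
{
  "sdo:name":         {"rule": "longest"},
  "sdo:foundingDate": {"rule": "source_priority",
                       "order": ["source://1", "source://3", "source://2"]},
  "sdo:telephone":    {"rule": "most_common"}
}
""")


def longest(cands, spec):
    # Prefer the most complete-looking value; max() keeps the first on ties.
    return max(cands, key=lambda c: len(c[1]))[1]


def source_priority(cands, spec):
    # Prefer the value from the most trusted source; unlisted sources rank last.
    order = spec["order"]

    def rank(cand):
        return order.index(cand[0]) if cand[0] in order else len(order)

    return min(cands, key=rank)[1]


def most_common(cands, spec):
    # Majority vote across sources; ties go to the value seen first.
    return Counter(value for _, value in cands).most_common(1)[0][0]


STRATEGIES = {
    "longest": longest,
    "source_priority": source_priority,
    "most_common": most_common,
}


def select(predicate, cands):
    """Pick one value for a predicate from a list of (source, value) pairs."""
    spec = CRITERIA.get(predicate)
    if spec is None:
        raise KeyError(f"no selection criteria defined for {predicate}")
    return STRATEGIES[spec["rule"]](cands, spec)
```

With the example names, `select("sdo:name", [("source://1", "Jane Doe"), ("source://2", "Jane"), ("source://3", "Jane D")])` returns `"Jane Doe"` because the `longest` rule is configured for `sdo:name`. Because the criteria are data, they could also be stored as triples in the graph itself, so changing a rule would not require a code change.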

I wrote a blog entry about how to use SPARQL and Wikipedia to identify official company names when you are working with company nicknames; that might help: Normalizing company names with SPARQL and DBpedia

You should be aware of the Legal Entity Identifier (LEI), which, as well as being a unique identifier for organizations, comes with the legal name verified against business registries (so it is not subject to name variations). The LEI records also include legal information, child entities, and mappings to other identifiers such as ISINs.
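If some of your sources carry LEIs, it is worth checking that a value is well formed before using it as a join key to the verified legal name. A minimal Python sketch, assuming the ISO 17442 layout (18 alphanumeric characters followed by two numeric check digits) and the ISO 7064 MOD 97-10 check-digit scheme it uses (the same scheme as IBANs):

```python
import re

# ISO 17442 layout: 18 uppercase alphanumeric characters + 2 check digits.
LEI_PATTERN = re.compile(r"^[A-Z0-9]{18}[0-9]{2}$")


def is_valid_lei(lei: str) -> bool:
    """Check the LEI's shape and its ISO 7064 MOD 97-10 check digits."""
    lei = lei.strip()
    if not LEI_PATTERN.match(lei):
        return False
    # Digits stay as-is; letters become two-digit numbers (A=10, ..., Z=35).
    digits = "".join(str(int(ch, 36)) for ch in lei)
    # A correct LEI, read as one big integer, leaves remainder 1 mod 97.
    return int(digits) % 97 == 1
```

For example, `506700GE1G29325QX363` passes the check, while changing its last digit to `4` makes it fail. This only catches typos and malformed values; it does not tell you whether the LEI is actually registered, which needs a lookup against the published LEI data.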

See, for example, LEI Search 2.0.

As well as XML and CSV downloads from the site, the data is already published in knowledge graph form here.

Not all companies have an LEI, but one is required for various financial transactions, and over 2 million companies are registered.