What approaches have people used to bring multiple overlapping taxonomies together?
Imagine multiple parties have all constructed their own taxonomy for the same domain. These parties have come together and want to use these individual taxonomies to inform (or, better, to automate) the construction of a new, standard taxonomy that all can use. This standard taxonomy should minimise the collective ‘distance’ to all individual taxonomies, where ‘distance’ is some well-defined metric computable between any taxonomy pair.
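To make the objective concrete, it can be written as a single optimisation problem. The symbols are my own notation rather than anything standard: $T_1, \dots, T_n$ are the individual taxonomies, $d$ is the (yet to be designed) distance, and $\mathcal{T}$ is whatever space of candidate taxonomies we are willing to search.

```latex
T^{*} = \arg\min_{T \in \mathcal{T}} \sum_{i=1}^{n} d(T, T_i)
```

If $\mathcal{T}$ is restricted to the input taxonomies themselves, $T^{*}$ is simply the medoid of the inputs. Allowing $\mathcal{T}$ to contain new taxonomies is what makes the construction step hard.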
There seem to be three main things to consider, and I’m currently most interested in the first two:
- Designing the distance metric: when comparing two taxonomies, it should be possible to identify matches, overlaps and clashes at the node level, and also to quantify the overall ‘distance’ between the two taxonomies as a whole.
- Constructing the new standard taxonomy, as automatically as possible, so that it minimises the sum of the distances between the new standard and all individual taxonomies. This could be done manually, automatically through inference mechanisms and NLP, or perhaps in some embedding space with statistical techniques?
- Representing the new taxonomy: probably with a formal standard like SKOS or OWL, ideally with links back to the individual taxonomies (owl:sameAs, subclass, disjoint etc.). How far a mapping to the individual taxonomies is possible will depend heavily on the construction method: if it’s a statistical technique on embeddings, this may not be possible at all?
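To illustrate the first two points, here is a minimal Python sketch. It rests on several assumptions of mine: each taxonomy is a tree stored as a child -> parent dict, node labels have already been aligned across taxonomies (in practice this is the hard matching step), and the distance is the Jaccard distance between the sets of (ancestor, descendant) pairs. The function names are made up for illustration. The "construction" step here is only a baseline that picks the input taxonomy with the smallest total distance to the others (the medoid). It does not build a genuinely new taxonomy.

```python
def ancestor_pairs(parent_of):
    """Return all (ancestor, descendant) pairs implied by a child -> parent map.

    Assumes the map describes a tree or forest (no cycles); roots map to None.
    """
    pairs = set()
    for node in parent_of:
        ancestor = parent_of[node]
        while ancestor is not None:
            pairs.add((ancestor, node))
            ancestor = parent_of.get(ancestor)
    return pairs


def taxonomy_distance(a, b):
    """Jaccard distance between the ancestor-pair sets of two taxonomies."""
    pa, pb = ancestor_pairs(a), ancestor_pairs(b)
    union = pa | pb
    if not union:
        return 0.0
    return 1.0 - len(pa & pb) / len(union)


def node_report(a, b):
    """Node-level comparison: shared nodes, nodes unique to each side, clashes.

    A clash is a pair ordered one way in `a` and the opposite way in `b`.
    """
    pa, pb = ancestor_pairs(a), ancestor_pairs(b)
    return {
        "shared": set(a) & set(b),
        "only_a": set(a) - set(b),
        "only_b": set(b) - set(a),
        "clashes": {(x, y) for (x, y) in pa if (y, x) in pb},
    }


def medoid(taxonomies):
    """Return the name of the input taxonomy minimising the summed distance."""
    totals = {
        name: sum(taxonomy_distance(tax, other) for other in taxonomies.values())
        for name, tax in taxonomies.items()
    }
    return min(totals, key=totals.get), totals


# Toy data, invented purely for illustration.
t1 = {"animal": None, "mammal": "animal", "dog": "mammal", "cat": "mammal"}
t2 = {"animal": None, "mammal": "animal", "pet": "animal", "dog": "pet", "cat": "mammal"}
t3 = {"animal": None, "dog": "animal", "cat": "animal"}

best, totals = medoid({"t1": t1, "t2": t2, "t3": t3})
print(best, totals)
```

The Jaccard distance on sets is a true metric, which is convenient if the sum of distances is to be minimised, but it treats every ancestor-descendant pair as equally important. Weighting pairs by depth, or swapping in a tree edit distance, would be alternatives worth comparing.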