There are many methods and tools for producing a knowledge graph (KG) from a corpus of documents. One method, for example, is open information extraction (Open IE); others are based on language models such as BERT or GPT. Regardless of the method used, if I already have a KG with an ontology and entities, how can I make sure that the KG extracted from text integrates cleanly with it? That is, existing entities should be merged, and the extracted KG should follow my ontology.
A general answer to this would be very difficult to give, especially if you really mean "regardless of the method used". A given KG covers knowledge in a particular domain, and part of how well a given corpus can contribute to that knowledge depends on how well the tools you use can connect that corpus to knowledge in that domain. If the domain is biomedicine as opposed to jazz history, you're probably going to use both a different corpus and different tools.
I agree with Bob: the only way I see is to write your extraction so that it produces entities conforming to your existing KG.
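To make "conforming to your existing KG" a bit more concrete, here is a minimal sketch of a post-extraction step, assuming your extractor (Open IE or otherwise) emits raw `(subject, relation, object)` strings. Everything in it is hypothetical: the `example.org` IRIs, the alias table, the relation-to-property mapping and the domain/range table stand in for whatever your own KG and ontology provide. The idea is to link each surface form to an *existing* entity IRI (so it merges instead of creating a duplicate), map each relation phrase to an ontology property, and reject anything that cannot be linked or that violates the property's domain/range.

```python
from dataclasses import dataclass

EX = "http://example.org/"  # hypothetical namespace used only in this sketch

# Hypothetical slice of the existing KG: normalized alias -> entity IRI.
ALIASES = {
    "marie curie": EX + "kg/MarieCurie",
    "curie": EX + "kg/MarieCurie",
    "university of paris": EX + "kg/UniversityOfParis",
}

# Hypothetical rdf:type of each existing entity.
ENTITY_TYPE = {
    EX + "kg/MarieCurie": EX + "onto/Person",
    EX + "kg/UniversityOfParis": EX + "onto/Organization",
}

# Hypothetical mapping from extracted relation phrases to ontology properties.
PREDICATES = {
    "worked at": EX + "onto/employedBy",
    "was employed by": EX + "onto/employedBy",
}

# Hypothetical (domain, range) of each ontology property.
SIGNATURE = {
    EX + "onto/employedBy": (EX + "onto/Person", EX + "onto/Organization"),
}


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so surface forms match aliases."""
    return " ".join(text.lower().split())


def conform(raw_triples):
    """Link raw string triples to existing KG entities and ontology properties.

    Returns (accepted, rejected): accepted triples use only existing IRIs and
    respect the property's domain/range; everything else is set aside for review.
    """
    accepted, rejected = [], []
    for s, p, o in raw_triples:
        s_iri = ALIASES.get(normalize(s))
        o_iri = ALIASES.get(normalize(o))
        p_iri = PREDICATES.get(normalize(p))
        if not (s_iri and o_iri and p_iri):
            rejected.append((s, p, o))  # unknown entity or relation
            continue
        domain, range_ = SIGNATURE[p_iri]
        if ENTITY_TYPE.get(s_iri) != domain or ENTITY_TYPE.get(o_iri) != range_:
            rejected.append((s, p, o))  # violates the ontology's domain/range
            continue
        accepted.append(Triple(s_iri, p_iri, o_iri))
    return accepted, rejected


raw = [
    ("Marie  Curie", "worked at", "University of Paris"),
    ("Curie", "likes", "jazz"),
    ("University of Paris", "was employed by", "Marie Curie"),
]
accepted, rejected = conform(raw)
```

Here only the first triple survives; the second has an unlinkable object and relation, and the third has its subject and object the wrong way round for the property's domain. In practice the exact alias lookup would be replaced by a proper entity-linking step (fuzzy or embedding-based matching against your KG), and the exact type comparison would need subclass reasoning, but the shape of the pipeline stays the same.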
But I can point out several ontologies for representing text extraction results:
- NIF and its constituent ontologies can represent all kinds of NLP results: NER individuals, NER classes, POS tags, parses, sentiment, etc.
- NERD is a set of NER classes
- Web Annotation can represent NER and more generic associations (e.g. tagging, highlighting)
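As an illustration of the Web Annotation option, here is a minimal sketch that records one NER result as a W3C Web Annotation in JSON-LD: the target is a character span in a source document, and the body is the IRI of the matching entity in your existing KG. The document IRI, entity IRI and the `ner_annotation` helper are hypothetical; the `@context`, `motivation` value and selector types come from the Web Annotation vocabulary.

```python
import json


def ner_annotation(doc_iri: str, text: str, start: int, end: int, entity_iri: str) -> dict:
    """Build a Web Annotation (JSON-LD) linking text[start:end] to a KG entity.

    doc_iri and entity_iri are caller-supplied; this helper is hypothetical.
    """
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "identifying",  # the body identifies what the span refers to
        "body": entity_iri,           # an existing entity in your KG
        "target": {
            "source": doc_iri,
            # Two alternative selectors for the same span: offsets and exact quote.
            "selector": [
                {"type": "TextPositionSelector", "start": start, "end": end},
                {"type": "TextQuoteSelector", "exact": text[start:end]},
            ],
        },
    }


text = "Marie Curie worked at the University of Paris."
ann = ner_annotation(
    "http://example.org/docs/1", text, 0, 11, "http://example.org/kg/MarieCurie"
)
print(json.dumps(ann, indent=2))
```

NIF expresses the same information in RDF, with `nif:beginIndex`/`nif:endIndex` for the offsets and `itsrdf:taIdentRef` pointing at the linked entity, so the choice between the two is mostly about which tooling you already use.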