[Builder 200] Taxonomies, Ontologies, and Knowledge Graphs

Given the discussions in level 200 regarding taxonomies, ontologies, and knowledge graphs,
  • For Part 1 (3 Points): Based on an organization your are familiar with your own role or industry, how can they effectively leverage their existing taxonomies to build comprehensive and functional knowledge graphs. What are the potential benefits of this taxonomy-driven approach?

  • For Part 2 (2 Points): After submitting your initial response, select a peer’s response. Write a comment or review on your peer’s response. Your comment or review should address at least one challenge to help this peer gain a deeper understanding of the practical implications of leveraging existing taxonomies to build robust and functional knowledge graphs.

The organization I am familiar with understands the value of taxonomies and ontologies and developed its own Knowledge Graph Management System over a decade ago. This system serves as the SoT for various tools within the organization. For example, it provides centralized hierarchical data management for product management, such as grouping products under the same product family. It also supports hierarchical filtering of data for centralized metrics or dashboards and leverages expert systems to predict customer eligibility for bundled services based on product purchases or historical data.

A taxonomy-driven approach offers several advantages:

  • Hierarchical Organization: Data is arranged in a structured hierarchy. For instance, Wi-Fi technology can be organized as Networking Solutions → Mobile Internet → Wi-Fi.
  • Enhanced Search Capabilities: Customers can search for products using alternate names, such as model numbers, short names, or synonyms.
  • Ontology Integration: Combined with ontology, it enables mapping across different concepts or hierarchies, unlocking deeper insights and connections.
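To make the first two points concrete, here is a minimal sketch in plain Python. It assumes a tiny, hypothetical taxonomy: the concept names follow the Wi-Fi example above, while the alternate labels and the `path_to_root` and `resolve` helpers are invented for illustration and are not taken from the organization's actual system.

```python
# Hypothetical mini-taxonomy: each concept points to its broader (parent) concept.
BROADER = {
    "Wi-Fi": "Mobile Internet",
    "Mobile Internet": "Networking Solutions",
    "Networking Solutions": None,
}

# Alternate labels (synonyms, short names); these values are made up for the example.
ALT_LABELS = {
    "Wi-Fi": {"wifi", "wireless lan", "802.11"},
    "Mobile Internet": {"mobile broadband"},
}


def path_to_root(concept):
    """Return the hierarchy path from the top-level concept down to `concept`."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = BROADER[concept]
    return list(reversed(path))


def resolve(term):
    """Map a preferred or alternate label to its preferred concept name, or None."""
    needle = term.strip().lower()
    for concept in BROADER:
        if needle == concept.lower() or needle in ALT_LABELS.get(concept, set()):
            return concept
    return None


print(" → ".join(path_to_root("Wi-Fi")))
print(resolve("WiFi"))
```

A search for “WiFi” or “802.11” resolves to the same concept, and the path lookup gives the hierarchical filter a dashboard could use.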

For this question, I’m most familiar with a tool that I designed to parse image descriptions into graphs. In ImageSnippets, we use different notions of taxonomies and ontologies.

First, we use several ontologies created for the tool. One is LIO, the Lightweight Image Ontology. In this link, The Visual Guide to the Properties, you can see how the terms in LIO are used as the edges (also called properties or relations) that connect the object values (which can be chosen from taxonomies or linked data datasets) to the subject value of the image.

The properties defined by LIO help users build a comprehensive and functional knowledge graph in ImageSnippets. They constrain how ordinary keywords or annotations can relate to the image or to a region of the image, which reduces ambiguity.

But other ontologies can help too. For example, terms from the schema.org vocabulary can describe web content as structured data for SEO.

In some ways, linked data datasets can also be thought of as taxonomies. Information in datasets like DBpedia or Wikidata is organized hierarchically, with SKOS (Simple Knowledge Organization System) in DBpedia and with subclass hierarchies in Wikidata.

Controlled vocabularies or taxonomies help users structure data into categories, classes, or entities that have definitions, and they encourage agreement in the way that data is structured.

In our tool, users can also bring their own vocabularies to use for either the property terms or the object terms. For example, imagine that someone has a set of images of cars and also has their own taxonomy or set of intentional terms to use as the names that describe the images. These terms help everyone standardize how those entities appear in the dataset, thereby reducing the redundancy that comes from things like variant spellings.

This is a great answer. This may sound ridiculous and I probably know what you are talking about, but what did the ‘SoT’ acronym mean here?

When companies have unified vocabularies it can be immensely useful in having all operating units speak the same language, so to speak.

Thank you for your query. The acronym SoT stands for Source of Truth. The application I mentioned is a centralized hierarchy management platform, and downstream systems (dependent applications) consume its data through exposed APIs and database views.
Agreed, unified vocabularies play a vital role in ensuring data consistency and streamlining processes. By providing a single, consistent perspective of data across systems, they reduce discrepancies, enhance operational efficiency, and ensure that everyone is working with reliable and accurate information.

Thanks for sharing this. I’m curious about how the regions of an image (in the case of multi-object images) are stored as triples. Are you using image processing libraries like OpenCV to create bounding boxes around objects, extract their coordinates, and store them as attributes?

Hi Vijay,
Thanks - for whatever ridiculous reason, I don’t encounter that acronym often, even though that is EXACTLY how I talk about some of the value that images can provide.

In many cases, an image with clear provenance and metadata that is authentic can serve as a SoT for a topic.

As for the other question, I’d love to talk with you more about OpenCV and how we do the regions.

A taxonomy is a hierarchical system used to organize information into categories based on shared characteristics. It is a foundational tool that organizations can leverage to structure their data, making it easier to classify, retrieve, and analyze information. By adopting a taxonomy-driven approach, companies can organize their data in a way that aligns with their business needs and workflows, creating a more effective and accessible knowledge graph. This bottom-up approach ensures that classifications are meaningful and useful, reflecting the organization’s specific context and domain. A knowledge graph built from an existing taxonomy enables organizations to improve data interoperability, streamline decision-making, and uncover valuable insights by linking various pieces of information across departments.

In the trucking industry, taxonomy can be used to organize and classify key entities into structured categories. These entities are then linked within a Knowledge Graph (KG) to create a comprehensive, interconnected view of operations. For example:

1. Vehicles: Categorized by types (e.g., trucks, trailers), linked to drivers, routes, and maintenance records.
2. Drivers: Classified by roles (e.g., local, long-haul), linked to vehicles, routes, and delivery schedules.
3. Routes: Organized by regions or delivery areas, linked to vehicles, drivers, and freight shipments.
4. Freight: Classified by type (e.g., perishable, fragile), linked to routes, customers, and delivery schedules.
5. Customers: Organized by industry (e.g., retail, manufacturing), linked to freight shipments and delivery schedules.
6. Maintenance: Categorized by service types (e.g., oil change, tire replacement), linked to vehicles for tracking service history.
7. Fuel Usage: Categorized by vehicle types, routes, and operational costs.
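As a rough sketch of how the categories and links in this list could sit together, the snippet below keeps the taxonomy (category to parent category) separate from the graph edges (subject, predicate, object). Every identifier, predicate name, and category label here is hypothetical and chosen only to mirror the list; a production system would more likely use a graph database or an RDF store.

```python
# Taxonomy: each category points to its parent category (names are illustrative).
PARENT = {
    "Truck": "Vehicle",
    "Trailer": "Vehicle",
    "Long-haul Driver": "Driver",
    "Oil Change": "Maintenance",
}

# Graph edges as (subject, predicate, object); all identifiers are hypothetical.
TRIPLES = [
    ("truck-17", "is_a", "Truck"),
    ("trailer-04", "is_a", "Trailer"),
    ("driver-ana", "is_a", "Long-haul Driver"),
    ("driver-ana", "drives", "truck-17"),
    ("truck-17", "assigned_to", "route-midwest"),
    ("service-901", "is_a", "Oil Change"),
    ("service-901", "performed_on", "truck-17"),
]


def is_kind_of(category, ancestor):
    """True if `category` equals `ancestor` or sits below it in the taxonomy."""
    while category is not None:
        if category == ancestor:
            return True
        category = PARENT.get(category)
    return False


def instances_of(ancestor):
    """All entities whose category falls anywhere under `ancestor`."""
    return sorted(s for s, p, o in TRIPLES if p == "is_a" and is_kind_of(o, ancestor))


def service_history(vehicle):
    """Maintenance records linked to one vehicle."""
    return sorted(s for s, p, o in TRIPLES if p == "performed_on" and o == vehicle)


print(instances_of("Vehicle"))
print(service_history("truck-17"))
```

The taxonomy answers “what kind of thing is this?”, while the edges answer “what is it connected to?”; the knowledge graph is the two used together.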

This taxonomy serves as the foundation for the Knowledge Graph, organizing data into specific categories and relationships, which are then used to link and analyze the data across various departments. By structuring and categorizing the data in this way, the Knowledge Graph provides three main benefits: improved operational efficiency, enhanced cross-departmental collaboration, and better compliance and risk management.

  1. Improved Operational Efficiency: By structuring data through a taxonomy, trucking companies can streamline operations across departments. This enables quick access to critical information—like fleet status, routes, and maintenance schedules—leading to better decision-making and resource allocation.

  2. Enhanced Cross-Departmental Collaboration: Taxonomy-driven knowledge graphs break down data silos, allowing departments such as logistics, fleet management, and customer service to work together more seamlessly. This leads to improved coordination, reduced delays, and more accurate tracking of shipments and deliveries.

  3. Better Compliance and Risk Management: Taxonomy-driven knowledge graphs help track regulatory requirements and maintain accurate records. By linking entities such as vehicles, drivers, and routes with relevant regulations, trucking companies can ensure they stay compliant, avoid penalties, and manage operational risks more effectively.

In the renewable energy sector, simply operating and maintaining a power plant requires running analytics models on a wide array of data sets. Based on continuously updated forecasting data, operators must communicate with customers and market operators about their generation commitments. This takes place across many machines and locations. Additionally, there are extensive regulatory reporting requirements.

Extracting all this data is a challenge, and hierarchical, faceted taxonomies can help make it more efficient.

Extending taxonomies into knowledge graphs would add the ability to construct more complex queries across the enterprise. With RDF reasoning, plant operators could collect the data needed to make causal inferences across the asset lifecycle. Operators could then make more informed decisions about purchases, maintenance, and market participation.
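To illustrate what reasoning over such a graph might look like, here is a hand-rolled sketch of two standard RDFS entailment rules (subclass transitivity and type inheritance) in plain Python. It does not use an RDF library, and the `ex:` class and asset names are hypothetical wind-plant examples I made up; a real deployment would rely on a triple store with a reasoner.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def rdfs_closure(triples):
    """Apply two RDFS entailment rules until no new triples appear.

    rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    rdfs9:  (x type A), (A subClassOf B)       => (x type B)
    """
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_pairs = [(s, o) for s, p, o in inferred if p == SUBCLASS]
        new = set()
        for s, p, o in inferred:
            if p in (SUBCLASS, TYPE):
                for a, b in subclass_pairs:
                    if o == a:
                        new.add((s, p, b))
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred


# Hypothetical asset taxonomy plus one concrete asset.
asset_triples = {
    ("ex:PitchMotor", SUBCLASS, "ex:TurbineComponent"),
    ("ex:TurbineComponent", SUBCLASS, "ex:PlantAsset"),
    ("ex:motor-12", TYPE, "ex:PitchMotor"),
}

closure = rdfs_closure(asset_triples)
print(("ex:motor-12", TYPE, "ex:PlantAsset") in closure)
```

With the inferred triples in place, a query for every `ex:PlantAsset` also returns the pitch motor, even though nobody recorded that fact directly.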

Interesting discussion of ‘semantic’ image segmentation
[Image: sensor_vs_camera]
This image is from a Waymo car driving down the street, with real-time segmentation! Deep learning models are getting so good in this area that they are making some of the more expensive sensor equipment obsolete. But a real-time model has to be trained on huge amounts of image data, and this is where semantic image segmentation (SiS) is instrumental. This paper provides background on the evolution of SiS: [2302.06378] Semantic Image Segmentation: Two Decades of Research. It’s pretty fascinating how the academic work is trying to keep up with Google. It looks like the state of the art is trying to extend a semantically labeled data set with unlabeled data for ‘semi-supervised’ learning (Dual Attention SiS):
[Image]
Not only does the semantic labeling (derived from a structured vocabulary) help accelerate the model training (and the work of people collaborating on it), I bet it also makes for cleaner code and more performant deep learning models.

FYI, this is coming out of my current MS CS deep learning coursework. I’ve been working on some image classification, much more basic than driving a car! It’s cool to see LSTMs and GANs mentioned in the paper.


Hi all,

Just wanted to point others to some good conversation on LinkedIn with Heather and others on this topic.

Margaret

I worked as a machine learning engineer for smart aquaculture. Although I have not used any taxonomy-driven approaches, I can imagine the benefits: taxonomies are important in building comprehensive and functional knowledge graphs.

For example, it can define Key Entities and Relationships:
Entities: Fish species (e.g., flatfish), feed types, water quality parameters (e.g., salinity, temperature), and growth stages.
Relationships: “requires” (fish species → optimal feed), “impacts” (water quality → fish growth)

It also allows farmers to query the graph for insights, such as “What is the recommended feed quantity for flatfish at a specific growth stage under current water conditions?”
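Here is a small sketch of how such a query could be answered from the entities and relationships above. The feed name, growth stages, and feed rates are placeholders I invented for illustration, not real husbandry guidance, and water conditions are left out of the calculation to keep it short.

```python
# All values below are placeholders for illustration, not real husbandry guidance.
TRIPLES = [
    ("flatfish", "has_stage", "juvenile"),
    ("flatfish", "has_stage", "adult"),
    ("flatfish", "requires", "pellet-feed-A"),
    ("temperature", "impacts", "fish growth"),
    ("salinity", "impacts", "fish growth"),
]

# Feed rate keyed by (species, growth stage), as a fraction of body weight per day.
FEED_RATE = {
    ("flatfish", "juvenile"): 0.03,
    ("flatfish", "adult"): 0.01,
}


def objects(subject, predicate):
    """Everything `subject` points to via `predicate`."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


def subjects(predicate, obj):
    """Everything that points to `obj` via `predicate`."""
    return sorted(s for s, p, o in TRIPLES if p == predicate and o == obj)


def recommend_feed(species, stage, biomass_kg):
    """Look up the required feed and scale the stage's feed rate by tank biomass."""
    if stage not in objects(species, "has_stage"):
        raise ValueError(f"unknown growth stage {stage!r} for {species!r}")
    feed = objects(species, "requires")[0]
    return {"feed": feed, "kg_per_day": FEED_RATE[(species, stage)] * biomass_kg}


print(recommend_feed("flatfish", "juvenile", 100.0))
print(subjects("impacts", "fish growth"))
```

The second query lists the water quality parameters that the graph says affect growth, which is where a fuller version would adjust the recommendation.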

Very nice insights! I especially appreciate your mention of semantic labeling derived from a structured vocabulary and its impact on cleaner code and better collaboration.

However, I think that it might be hard to create and maintain high-quality taxonomies for semantic segmentation. While taxonomies can provide the foundation for clean and efficient models, their creation often requires much domain expertise and manual effort. Also, I was wondering whether the model has to be retrained when a label changes or new categories are added.

The keywords here are “existing taxonomies,” and unfortunately I haven’t personally worked at an EdTech organization that has one. From the videos, it starts with a non-redundant controlled vocabulary. I think this list of terms can be drafted by the central organization and given to decentralized departments for QA/QC and additions. Once that controlled vocabulary exists, a hierarchical structure can be built on it. Then comes the knowledge graph piece. I think in EdTech (and many other industries) there could be multiple use cases for knowledge graphs. One is a customer-facing graph covering which EdTech products to purchase, product format (synchronous or asynchronous trainings), availability, training topics, which instructor is assigned to which course, and more. Another is an internal graph for tracking KPIs and metrics like attendance rates, job placement, and training engagements. The benefits here would be transparency about company performance and insight into areas that need more training or resources in order to improve those KPIs. It could also feed an LLM for customers looking to buy the training products (chatbots with RAG built in).

Agriculture and marine life are a great use case for knowledge graphs. I do wonder how much real-time “de-duplication” is needed when species are added. I’m sure people report sightings of species that are already listed but under a more specific name, e.g. “blue white belly shark” vs. “blue shark” (not that this species exists, it’s just an example), and those reports would need to “roll up” into the existing species. There is also the question of what qualifies as a new species discovery. I’m sure species knowledge graphs can suffer from being super vast, with some duplication, and from being audience-specific (a scientist could search semantically in a scientific way, while a non-expert may search for just “blue shark”).

Your response does a great job showing how knowledge graphs can help renewable energy system operators. One challenge is that different systems often organize and label data differently. How do you think operators can make sure everything works together smoothly? Maybe industry-wide standards could help? I agree that once implemented uniformly, it could unlock much deeper insights across operators.

Folks, I’m catching up here, and my response is relatively brief. The document I turned in is here: https://docs.google.com/document/d/1fDfQUcOJmnXdKhxmFKcx6nVHZYhwirUb6epC_N-uGZQ/edit?usp=sharing

I’m reproducing that response here for the purpose of discourse.


Based on an organization your are familiar with your own role or industry,[sic] how can they effectively leverage their existing taxonomies to build comprehensive and functional knowledge graphs. What are the potential benefits of this taxonomy-driven approach?

First, all organizations need to rally around terminology consistently and logically. This is not a one-time effort. Drawing from my own experience with data ontologies in health insurance claims processing:
Given a wealth of existing taxonomies (FHIR, CPT codes, etc.), insurance companies and EHR vendors should have a relatively easy time agreeing on necessary taxonomies and ontologies. However, these enterprises lack the expertise and focus to organize complex taxonomies and ontologies. They may struggle with existing high-level ontologies (very general) or flounder in developing semantic tools for the more narrow use cases.

My projects have assisted clients in bringing subject matter experts (SMEs) into the process of establishing metadata of all kinds, including taxonomies. It’s a laborious task that pays dividends, but not immediately, so it’s an unpredictable ROI.

The primary advantage of this taxonomy-driven approach in health insurance processing is to prevent costly mistakes. Inconsistent terminology (in manuals, formularies, policies, EOBs, warnings, and interpretations of regulations) can lead to a) real harm to patients and b) inefficient processes across business units.

You’re asking the right kinds of questions that lead people up the semantic ladder to ontologies, where it’s easier (compared to taxonomies) to define entities that exist with different IDs and then (in OWL anyway) declare that one is to be treated as “the same as” the other.
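For anyone who wants to see the idea in code: in OWL, `owl:sameAs` is symmetric and transitive, so a set of sameAs statements partitions identifiers into groups that all denote one entity. The sketch below computes those groups with a union-find structure in plain Python. The identifiers are hypothetical, and this only illustrates the grouping step, not what an OWL reasoner does in full.

```python
def same_as_clusters(same_as_pairs):
    """Group identifiers linked by owl:sameAs into equivalence classes (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical IDs for the same customer held in three different systems.
pairs = [
    ("crm:cust-1", "billing:A9"),
    ("billing:A9", "claims:77"),
    ("crm:cust-2", "billing:B3"),
]

for cluster in same_as_clusters(pairs):
    print(cluster)
```

Note that `crm:cust-1` and `claims:77` end up in the same group even though no statement links them directly, which is the transitivity doing the work.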

The one that comes to mind for me is Amazon, which currently uses some taxonomies to bolster its sales and extend its reach. For a company with extensive software development teams, a wide-ranging global workforce, and an equally diverse consumer base, a well-designed product classification system supports maximum profit. On the sales side, listing products correctly in findable categories is the foundation of this work. There is then additional thought and development to “catch” incorrect searches. Their backend classification system also crawls products from independent businesses and restructures the results into the pattern that fits the UI template. However, one area to improve is their related “spinoff” services that are primarily digital; their taxonomy could be extended to include digital products. The lack of a knowledge graph is evident in the clunky interface between the Amazon website and the Kindle services. While Amazon does seem to understand the terminology and semantics well enough to route users successfully to those sites, the path back is not as successful. A knowledge graph that captures the common paths consumers take to find and buy a Kindle book, and the lifecycle of that relationship, would help maximize operational efficiency. Currently, the inability to surface recommended books from the Amazon account, or to generate a “book list” at the conclusion of a completed book, shows that this knowledge is not captured in their product taxonomy. It should be implemented as a knowledge graph pattern to help reach KPIs and maximize profits.
