[Builder 200] Taxonomies, Ontologies, and Knowledge Graphs

Given the discussions in level 200 regarding taxonomies, ontologies, and knowledge graphs,
  • For Part 1 (3 Points): Based on an organization you are familiar with, your own role, or your industry, how can existing taxonomies be effectively leveraged to build comprehensive and functional knowledge graphs? What are the potential benefits of this taxonomy-driven approach?

  • For Part 2 (2 Points): After submitting your initial response, select a peer’s response. Write a comment or review on your peer’s response. Your comment or review should address at least one challenge to help this peer gain a deeper understanding of the practical implications of leveraging existing taxonomies to build robust and functional knowledge graphs.

The organization I am familiar with understands the value of taxonomies and ontologies and developed its own Knowledge Graph Management System over a decade ago. This system serves as the SoT for various tools within the organization. For example, it provides centralized hierarchical data management for product management, such as grouping products under the same product family. It also supports hierarchical filtering of data for centralized metrics or dashboards and leverages expert systems to predict customer eligibility for bundled services based on product purchases or historical data.

A taxonomy-driven approach offers several advantages:

  • Hierarchical Organization: Data is arranged in a structured hierarchy. For instance, Wi-Fi technology can be organized as Networking Solutions → Mobile Internet → Wi-Fi.
  • Enhanced Search Capabilities: Customers can search for products using alternate names, such as model numbers, short names, or synonyms.
  • Ontology Integration: Combined with ontology, it enables mapping across different concepts or hierarchies, unlocking deeper insights and connections.
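
As a rough illustration of the first two points, here is a minimal Python sketch. Only the Networking Solutions → Mobile Internet → Wi-Fi path comes from the example above; the alternate labels and the function names `resolve` and `path_to_root` are assumptions made up for this sketch, not part of the system described.

```python
# Taxonomy as child -> parent. Only this one path is taken from the post.
PARENT = {
    "Wi-Fi": "Mobile Internet",
    "Mobile Internet": "Networking Solutions",
}

# Hypothetical alternate labels (synonyms, short names) -> preferred term.
ALT_LABELS = {
    "wifi": "Wi-Fi",
    "wlan": "Wi-Fi",
    "802.11": "Wi-Fi",
}


def resolve(term):
    """Map a search term to its preferred taxonomy term, or None if unknown."""
    key = term.strip().lower()
    for preferred in set(PARENT) | set(PARENT.values()):
        if preferred.lower() == key:
            return preferred
    return ALT_LABELS.get(key)


def path_to_root(term):
    """Return the hierarchy path from the top-level category down to the term."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return list(reversed(path))


print(" > ".join(path_to_root(resolve("WLAN"))))
# prints: Networking Solutions > Mobile Internet > Wi-Fi
```

A search for an alternate name lands on the preferred term first, and the hierarchy lookup then works the same way for every synonym.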

For this question, I’m most familiar with a tool that I designed to parse image descriptions into graphs. In ImageSnippets, we use different notions of taxonomies and ontologies.

First, we use different types of ontologies created for the tool. One is called LIO, the Lightweight Image Ontology. In the linked page, The Visual Guide to the Properties, you can see how the terms in LIO are used as the edges (or properties or relations) that connect the object values (which can be chosen from taxonomies or linked data datasets) to the subject value of the image.

The properties defined by LIO help users build a comprehensive and functional knowledge graph in ImageSnippets by setting constraints on users that help organize the ambiguity of how ordinary keywords or annotations can relate to the image or a region of the image.

Other ontologies can help too. For example, terms from the schema.org vocabulary can describe web content as structured data for SEO.

In some ways, linked data datasets can also be thought of as taxonomies. Information in datasets like DBpedia or Wikidata is organized either with SKOS (Simple Knowledge Organization System) in DBpedia or with subclass hierarchies in Wikidata.

Controlled vocabularies or taxonomies help users structure the data in categories, classes, or entities that have definitions, and they encourage agreement in the way that data is structured.

In our tool, users can also bring their own vocabularies to use for either the property terms or the object terms. For example, imagine that someone has a set of images of cars but also has their own taxonomy or set of intentional terms to use as the names to describe the images. These terms can help everyone standardize the ways those entities appear in the dataset, thereby reducing the redundancy that can occur from things like different spellings.
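
To make the spelling point concrete, here is a small Python sketch of normalizing free-text annotations against a user-supplied vocabulary. It is not how ImageSnippets works internally; the car terms, the variant spellings, and the function names are all hypothetical and only illustrate the idea.

```python
# Hypothetical user-supplied vocabulary: preferred term -> known variants.
VOCABULARY = {
    "Volkswagen Beetle": ["vw beetle", "volkswagon beetle", "beetle"],
    "Ford Mustang": ["mustang", "ford mustang gt"],
}


def build_lookup(vocabulary):
    """Index every preferred term and variant by its lowercase form."""
    lookup = {}
    for preferred, variants in vocabulary.items():
        lookup[preferred.lower()] = preferred
        for variant in variants:
            lookup[variant.lower()] = preferred
    return lookup


def normalize(annotations, lookup):
    """Swap known variants for preferred terms and drop duplicates.

    Terms that are not in the vocabulary are kept unchanged.
    """
    seen, result = set(), []
    for raw in annotations:
        cleaned = raw.strip()
        term = lookup.get(cleaned.lower(), cleaned)
        if term not in seen:
            seen.add(term)
            result.append(term)
    return result


lookup = build_lookup(VOCABULARY)
print(normalize(["VW Beetle", "Volkswagon Beetle", "mustang", "sunset"], lookup))
# prints: ['Volkswagen Beetle', 'Ford Mustang', 'sunset']
```

Two different spellings of the same car collapse into one preferred term, which is the kind of redundancy a shared vocabulary removes.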

This is a great answer. This may sound ridiculous and I probably know what you are talking about, but what did the ‘SoT’ acronym mean here?

When companies have unified vocabularies it can be immensely useful in having all operating units speak the same language, so to speak.

Thank you for your query. The acronym SoT stands for Source of Truth. The application I mentioned is a centralized hierarchy management platform, and downstream systems (dependent applications) consume its data through exposed APIs and database views.
Agreed, unified vocabularies play a vital role in ensuring data consistency and streamlining processes. By providing a single, consistent perspective of data across systems, they reduce discrepancies, enhance operational efficiency, and ensure that everyone is working with reliable and accurate information.

Thanks for sharing this. I’m curious about how the regions of an image (in the case of multi-object images) are stored as triples. Are you using image processing libraries like OpenCV to create bounding boxes around objects, extract their coordinates, and store them as attributes?

Hi Vijay,
Thanks - for whatever ridiculous reason, I don’t encounter that acronym often, even though that is EXACTLY how I talk about some of the value that images can provide.

In many cases, an image with clear provenance and authentic metadata can serve as a SoT for a topic.

As for the other question, I’d love to talk with you more about OpenCV and how we do the regions.

A taxonomy is a hierarchical system used to organize information into categories based on shared characteristics. It is a foundational tool that organizations can leverage to structure their data, making it easier to classify, retrieve, and analyze information. By adopting a taxonomy-driven approach, companies can organize their data in a way that aligns with their business needs and workflows, creating a more effective and accessible knowledge graph. This bottom-up approach ensures that classifications are meaningful and useful, reflecting the organization’s specific context and domain. A knowledge graph built from an existing taxonomy enables organizations to improve data interoperability, streamline decision-making, and uncover valuable insights by linking various pieces of information across departments.

In the trucking industry, taxonomy can be used to organize and classify key entities into structured categories. These entities are then linked within a Knowledge Graph (KG) to create a comprehensive, interconnected view of operations. For example:

1. Vehicles: Categorized by types (e.g., trucks, trailers), linked to drivers, routes, and maintenance records.
2. Drivers: Classified by roles (e.g., local, long-haul), linked to vehicles, routes, and delivery schedules.
3. Routes: Organized by regions or delivery areas, linked to vehicles, drivers, and freight shipments.
4. Freight: Classified by type (e.g., perishable, fragile), linked to routes, customers, and delivery schedules.
5. Customers: Organized by industry (e.g., retail, manufacturing), linked to freight shipments and delivery schedules.
6. Maintenance: Categorized by service types (e.g., oil change, tire replacement), linked to vehicles for tracking service history.
7. Fuel Usage: Categorized by vehicle types, routes, and operational costs.
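
The categories and links above can be sketched in a few lines of Python. This is only an illustration under assumed data: every identifier (`truck_17`, `driver_ana`, `route_north`, and so on), the predicate names, and the helper functions are hypothetical.

```python
# Taxonomy: instance or category -> broader category (all names hypothetical).
BROADER = {
    "truck_17": "Trucks",
    "Trucks": "Vehicles",
    "driver_ana": "Long-Haul Drivers",
    "Long-Haul Drivers": "Drivers",
    "shipment_42": "Perishable Freight",
    "Perishable Freight": "Freight",
}

# Knowledge graph edges as (subject, predicate, object) triples.
TRIPLES = [
    ("driver_ana", "drives", "truck_17"),
    ("truck_17", "assigned_to", "route_north"),
    ("shipment_42", "carried_on", "route_north"),
    ("truck_17", "had_service", "oil_change_2024_03"),
]


def is_a(entity, category):
    """True if the entity sits anywhere below the category in the taxonomy."""
    while entity in BROADER:
        entity = BROADER[entity]
        if entity == category:
            return True
    return False


def vehicles_carrying(freight_category):
    """Find vehicles assigned to routes that carry freight of a category."""
    routes = {
        o for s, p, o in TRIPLES
        if p == "carried_on" and is_a(s, freight_category)
    }
    return sorted(
        s for s, p, o in TRIPLES
        if p == "assigned_to" and o in routes and is_a(s, "Vehicles")
    )


print(vehicles_carrying("Perishable Freight"))
# prints: ['truck_17']
```

The taxonomy answers "what kind of thing is this?" while the triples answer "what is it connected to?", and the query needs both.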

This taxonomy serves as the foundation for the Knowledge Graph, organizing data into specific categories and relationships, which are then used to link and analyze the data across various departments. By structuring and categorizing the data in this way, the Knowledge Graph provides benefits of improved operational efficiency, cross-departmental collaboration and better compliance and risk management.

  1. Improved Operational Efficiency: By structuring data through a taxonomy, trucking companies can streamline operations across departments. This enables quick access to critical information—like fleet status, routes, and maintenance schedules—leading to better decision-making and resource allocation.

  2. Enhanced Cross-Departmental Collaboration: Taxonomy-driven knowledge graphs break down data silos, allowing departments such as logistics, fleet management, and customer service to work together more seamlessly. This leads to improved coordination, reduced delays, and more accurate tracking of shipments and deliveries.

  3. Better Compliance and Risk Management: Taxonomy-driven knowledge graphs help track regulatory requirements and maintain accurate records. By linking entities such as vehicles, drivers, and routes with relevant regulations, trucking companies can ensure they stay compliant, avoid penalties, and manage operational risks more effectively.

In the renewable energy sector, simply operating and maintaining a power plant requires running analytics models on a wide array of data sets. Based on continuously updated forecasting data, operators must communicate with customers and market operators about their generation commitments. This takes place across many machines and locations. Additionally, there are extensive regulatory reporting requirements.

Extracting all this data is a challenge that hierarchical, faceted taxonomies can help to make more efficient.

Extending taxonomies into knowledge graphs would add the ability to construct more complex queries across the enterprise. With RDF reasoning, plant operators could collect the data needed to make causal inferences across the asset lifecycle. Operators would then make more informed decisions about purchases, maintenance, and market participation.
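
As a small illustration of what reasoning over a taxonomy adds, the Python sketch below applies two standard RDFS entailment patterns (transitivity of `rdfs:subClassOf`, and inheriting `rdf:type` along it) to a toy graph. It is plain Python rather than a real triple store, and the `ex:` class and asset names are assumptions invented for the example.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

# Toy graph; the ex: names are hypothetical.
triples = {
    ("ex:WindTurbine", SUBCLASS, "ex:GenerationAsset"),
    ("ex:GenerationAsset", SUBCLASS, "ex:Asset"),
    ("ex:turbine_07", TYPE, "ex:WindTurbine"),
}


def rdfs_closure(graph):
    """Forward-chain subclass transitivity and type inheritance to a fixpoint."""
    inferred = set(graph)
    while True:
        new = set()
        for s, p, o in inferred:
            if p not in (SUBCLASS, TYPE):
                continue
            for s2, p2, o2 in inferred:
                # If o is a subclass of o2, then s relates to o2 the same way.
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, p, o2))
        if new <= inferred:
            return inferred
        inferred |= new


closure = rdfs_closure(triples)
print(("ex:turbine_07", TYPE, "ex:Asset") in closure)
# prints: True
```

The asserted data never says that the turbine is an asset, yet a query for all assets would now return it, which is what lets one query span the whole hierarchy.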

Interesting discussion of ‘semantic’ image segmentation
[Image: sensor_vs_camera]
This image is from a Waymo car driving down the street: real-time segmentation! Deep learning models are getting so good in this area that they are making some of the more expensive sensor equipment obsolete. But a real-time model has to be trained on huge amounts of image data, and this is where semantic image segmentation (SiS) is instrumental. This paper provides background on the evolution of SiS: [2302.06378] Semantic Image Segmentation: Two Decades of Research. It is pretty fascinating how the academic work is trying to keep up with Google. It looks like the state of the art is trying to extend a semantically labeled data set with unlabeled data for ‘semi-supervised’ learning (Dual Attention SiS). Not only does the semantic labeling (derived from a structured vocabulary) help accelerate the model training (and the work of people collaborating on it), I bet it also makes for cleaner code and more performant deep learning models.

FYI, this is coming out of my current MS CS deep learning coursework. I’ve been working on some image classification tasks, much more basic than driving a car! It is cool to see LSTMs and GANs mentioned in the paper.

Hi all,

Just wanted to point others to some good conversation on LinkedIn with Heather and others on this topic.

Margaret

I worked as a machine learning engineer for smart aquaculture. Although I have not used any taxonomy-driven approaches, I can imagine the benefits: taxonomies are important in building comprehensive and functional knowledge graphs.

For example, it can define Key Entities and Relationships:
Entities: Fish species (e.g., flatfish), feed types, water quality parameters (e.g., salinity, temperature), and growth stages.
Relationships: “requires” (fish species → optimal feed), “impacts” (water quality → fish growth)

It also allows farmers to query the graph for insights, such as “What is the recommended feed quantity for flatfish at a specific growth stage under current water conditions?”
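
A minimal Python sketch of these entities and relationships follows. It is only an assumption of how such a graph might look: `feed_A` is a placeholder name, no real feeding guidance is encoded, and the helper functions are invented for the example.

```python
# Entities and relationships as (subject, predicate, object) triples.
TRIPLES = [
    ("flatfish", "is_a", "fish_species"),
    ("flatfish", "requires", "feed_A"),  # hypothetical feed name
    ("salinity", "is_a", "water_quality_parameter"),
    ("temperature", "is_a", "water_quality_parameter"),
    ("salinity", "impacts", "fish_growth"),
    ("temperature", "impacts", "fish_growth"),
]


def objects(subject, predicate):
    """Everything the subject points to through the given relationship."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


def subjects(predicate, obj):
    """Everything that points to the object through the given relationship."""
    return sorted(s for s, p, o in TRIPLES if p == predicate and o == obj)


print(objects("flatfish", "requires"))
# prints: ['feed_A']
print(subjects("impacts", "fish_growth"))
# prints: ['salinity', 'temperature']
```

A question like the one above would combine both lookups: find the feed a species requires, then check the parameters that impact growth against current sensor readings.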

Very nice insights! I especially appreciate your mention of semantic labeling derived from a structured vocabulary and its impact on cleaner code and better collaboration.

However, I think that it might be hard to create and maintain high-quality taxonomies for semantic segmentation. While taxonomies can provide the foundation for clean and efficient models, their creation often requires a lot of domain expertise and manual effort. Also, I was wondering whether the model needs to be retrained when a label changes or new categories are added.