[Builder 100] Advantages and Challenges of Using Named Graphs and Transformational Workflows

KGCLearning · November 24, 2024, 4:41am

Considering the information presented in level 100 regarding the use of named graphs and transformational workflows, decide whether you want to explore (a) the advantages of Using Named Graphs and Transformational Workflows, or (b) the challenges of Implementing Named Graphs and Transformational Workflows. Note: You can only choose one category for your initial response.

For Part 1 (3 Points): Write a thoughtful and well-supported answer in your selected category. Be clear, concise, and specific in your explanation.
For Part 2 (2 Points): After submitting your initial response, select a peer’s response in the opposite category. Write a comment or review on your peer’s response. Your feedback should: offer additional insights, counterpoints, or questions to deepen the discussion.

Vijay · December 9, 2024, 4:50pm

I chose to address (a) the advantages of using named graphs and transformational workflows, and here are my insights.

Named graphs are a way of grouping triples together under an IRI/URI, which facilitates selective querying and updating of subsets of data without impacting the entire dataset. This also supports easy versioning of data subsets by providing context and provenance information, such as indicating the source and time of a given set of triples. Additionally, named graphs enable federated querying, allowing queries to be executed across multiple data sources without the need to physically merge them.
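
To make the grouping idea concrete, here is a minimal sketch in plain Python rather than a real triple store. The `dataset` dictionary, the helper names, and the `https://example.org/...` graph IRIs are all assumptions made up for illustration; the point is only that a graph name lets you read or replace one subset while leaving the rest alone.

```python
from collections import defaultdict

# Hypothetical in-memory dataset: graph IRI -> set of (subject, predicate, object) triples.
dataset = defaultdict(set)

def add_triple(graph_iri, s, p, o):
    """Add one triple to the named graph identified by graph_iri."""
    dataset[graph_iri].add((s, p, o))

def triples_in(graph_iri):
    """Selective query: return a copy of the triples in one named graph only."""
    return set(dataset.get(graph_iri, set()))

def replace_graph(graph_iri, new_triples):
    """Update one subset of the data without touching any other graph."""
    dataset[graph_iri] = set(new_triples)

# Assumed graph names, not taken from the course material.
HR = "https://example.org/graph/hr"
SALES = "https://example.org/graph/sales"

add_triple(HR, "ex:jane_doe", "ex:worksFor", "ex:org")
add_triple(SALES, "ex:jane_doe", "ex:leads", "ex:sales_department")

# Rewrite the HR graph; the sales graph is unaffected.
replace_graph(HR, [("ex:jane_doe", "ex:worksFor", "ex:new_org")])
```

In a real store the same operations would normally be expressed in SPARQL against the graph name; the dictionary here only stands in for that.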

Transformational workflows enhance data integration, providing a unified view and improving discovery through advanced search capabilities. They support scalability and flexibility, adapting to evolving needs while maintaining high data quality. This facilitates better decision-making and collaboration, leveraging comprehensive insights from connected data.

What are your thoughts on using the following two identifiers for the named graphs?

identifier - <class>.<sub_class>.<instance>
abbreviation (hierarchy) - <level_1_label>.<level_2_label>.<level_3_label>

Example:

identifier - org.employee.jane_doe
abbreviation - org.sales_department.lead.jane_doe
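
Since a named graph is ultimately named by an IRI, one possible way to use dotted identifiers like these is to map each one onto an IRI path. The sketch below is only an illustration: the base IRI `https://example.org/graph/` and the function name `graph_iri` are assumptions, and the function does not enforce a fixed number of levels (the abbreviation example above has four segments).

```python
from urllib.parse import quote

BASE = "https://example.org/graph/"  # assumed base IRI, not from the post

def graph_iri(dotted_id, base=BASE):
    """Turn a dotted identifier such as 'org.employee.jane_doe' into a path-style graph IRI."""
    parts = [p for p in dotted_id.split(".") if p]
    if not parts:
        raise ValueError("identifier must have at least one segment")
    # Percent-encode each segment so odd characters cannot break the IRI.
    return base + "/".join(quote(p, safe="") for p in parts)

print(graph_iri("org.employee.jane_doe"))
# https://example.org/graph/org/employee/jane_doe
```

Whether the class-based or the hierarchy-based identifier is the better graph name is a modelling choice; the mapping works the same way for both.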

thejellis · December 9, 2024, 4:58pm

Hi all -
I was hoping to see responses to some of the questions on the thread to confirm this, but I appear to be in the right place. If not, please let me know and I can adjust. I wanted to at least get something on here for the due date in a few minutes.

I want to respond about the advantages of using named graphs and transformational workflows. A very desirable aspect is the clarity they bring: using them for queries improves efficiency by creating modular units. Focusing a workflow on a subset of the graphs makes it more flexible and efficient as well. On top of efficiency and clarity, they help reduce errors, because control over exactly what is done is precise and the data is well organized.
Permissions can be set at specific levels and are a lot more granular than with other options.

The improvement in efficiencies without loss of control or structure makes them easily scalable for growing industries, allowing for very detailed queries and the flexibility to adjust for future uses. As data quality and resource usage become more important, these provide great opportunities for companies to utilize towards this end.

aspratle · December 15, 2024, 9:44pm

I will talk about the advantages as highlighted in the “2021 Knowledge Graph Seminar Session 6” YouTube video and the MasterClass on Transformational Workflows given to us as a resource.

Named Graph & Transformational Workflow Advantages:

The biggest plus is the added contextual layer, which fosters team collaboration and a shared understanding of the data (less time spent figuring out what each dataset is for and how it relates to other datasets).
RDF (Resource Description Framework) triples can also be reused across different environments (flexibility): because triples are built from URIs (uniform resource identifiers), the same triple can appear in different graphs, which is fantastic.
Transformational workflows can enhance data quality through standardization and de-duplication of data (no need to count Apple, Apple Inc, and APPLE as separate entities).
Transformational workflows are great as a form of version control (which I never thought of), in which changes in data, data definitions, schemas, etc. can be traced and rolled back when needed.
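
The de-duplication point can be sketched as a small normalization step. This is a simplified illustration, not how any particular workflow tool does it: the `LEGAL_SUFFIXES` set and both function names are assumptions, and real entity resolution needs far more than string cleanup.

```python
import re

# Assumed list of legal suffixes to ignore; a real workflow would need a fuller list.
LEGAL_SUFFIXES = {"inc", "corp", "ltd", "llc"}

def canonical_key(label):
    """Lowercase the label, strip punctuation, and drop a trailing legal suffix."""
    words = re.sub(r"[^\w\s]", " ", label).lower().split()
    if len(words) > 1 and words[-1] in LEGAL_SUFFIXES:
        words = words[:-1]
    return " ".join(words)

def deduplicate(labels):
    """Group raw labels under one canonical key each."""
    groups = {}
    for label in labels:
        groups.setdefault(canonical_key(label), []).append(label)
    return groups

print(deduplicate(["Apple", "Apple Inc", "APPLE", "Apple Inc."]))
# {'apple': ['Apple', 'Apple Inc', 'APPLE', 'Apple Inc.']}
```

All four spellings collapse to one key, so a later step could mint a single entity for them instead of four.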

mmw · December 18, 2024, 2:28am

I have to say, I found this question a bit disarming, a bit obscure and a little triggering as it seemed to me to be a very specific question about a very narrow and philosophical concept in RDF and just generally a strange starter question for this program.

As a result, I’d like to discuss a few aspects that are challenging about Named Graphs.

Notably, one challenge is that there are actually several different notions of ‘named graphs’, and the term is not officially defined in the actual RDF Specification.

The term ‘Named Graphs’ was first defined in a paper co-authored 20 years ago by Pat Hayes, Jeremy J. Carroll, Christian Bizer and Patrick Stickler:
Named Graphs, Provenance and Trust [https://lists.w3.org/Archives/Public/www-archive/2004Apr/att-0081/PID-FAFPGYHS-1081860211.pdf]

In Pat’s own words, there could not have been 4 authors who normally disagreed on everything else with each other, EXCEPT, that they all agreed on what ‘named graphs’ should be when they defined the term in this paper.

The original notion of ‘named graphs’ was simply meant to give a name to a graph using a URI so publishers could “communicate assertional intent and sign their graphs and information consumers could evaluate specific graphs using task-specific trust policies, and act on information from graphs they can accept” (paraphrased from the paper).

The paper gives an example of named graphs being used to describe a document like a warrant, which could be accepted when certain conditions of the named graph can be trusted. But in the 20 years since this paper was developed, the term ‘named graphs’ has come to mean a rather different ‘other thing’, such as the meaning described in the SPARQL specs, which is useful in the RDF*/reification space as well.

In this case, named graphs which are officially defined in the SPARQL specs become more like ‘subsets’ of a graph which can overlap and they do not have to be disjoint. They can be used in transformational workflows as described by Kurt Cagle, where SPARQL can perform updates on these named subsets.
In many cases, this usage might not matter, but because both definitions exist, it’s possible that it can be interpreted and used improperly and because it is not defined with semantics, representational problems can result.
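
The "overlapping subsets" reading can be illustrated with a small sketch. Everything here is hypothetical (the graph names, the `ex:` terms, and the two helper functions); it only shows that one triple can sit in two named graphs at once, and that taking the union of the graphs throws away which graph each triple came from.

```python
# Two hypothetical named graphs, each a set of (s, p, o) triples.
graphs = {
    "https://example.org/graph/hr": {
        ("ex:jane_doe", "ex:worksFor", "ex:org"),
        ("ex:jane_doe", "ex:title", "Lead"),
    },
    "https://example.org/graph/sales": {
        ("ex:jane_doe", "ex:worksFor", "ex:org"),   # same triple, second graph
        ("ex:jane_doe", "ex:leads", "ex:sales_department"),
    },
}

def graphs_containing(triple):
    """Return the names of every graph that holds this triple."""
    return sorted(name for name, triples in graphs.items() if triple in triples)

def union_graph():
    """All distinct triples across graphs; the graph names are lost here."""
    return set().union(*graphs.values())

shared = ("ex:jane_doe", "ex:worksFor", "ex:org")
print(graphs_containing(shared))   # both graph names
print(len(union_graph()))          # 3
```

Nothing in the data says what it means for the shared triple to be in both graphs, which is the kind of gap in semantics described above.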

It is, however, a classic case of how the usage of a term can drift, here over the past 20 years within the SPARQL community; and while it is now neither necessary nor worth the time and effort to change the historical usage, the current usage makes it challenging for the semantics to be defined based on this usage.

In one usage, the named graph can be used to reify the context of the graph; while in another, reification can be done on individual triples inconsistently.

Further, another area that can cause problems with the SPARQL updating version of a named graph is when considering the scope of the use of blank nodes.

Basically the original intent of named graphs has to do with interoperability, while the current usage is in using SPARQL to update subsets of graphs.

mmw · December 18, 2024, 2:36am

Hello,

Apologies for taking a minute to get to this question and taking a bit to have a dialog with others here about it. In my reply, I chose to point out some of the challenges that have to do with some fairly specific philosophical nuances about the history of the term ‘named graphs’ and the current usage.

However, I think you did an excellent job of describing the subsets of graphs as modular units that provide efficiency, flexibility in updating the graphs, and clarity.

In general, the ability to make subsets of graphs and then even copy those named subset graphs such that they can act as a type of a digital twin for the purpose of doing non-destructive editing and querying is also useful.
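
A rough sketch of that copy-then-edit pattern, using plain Python sets in place of a triple store. SPARQL 1.1 Update does have a `COPY` operation for graphs; the function below only mimics its effect, and the `urn:example:` graph names and helper names are assumptions for illustration.

```python
def copy_graph(dataset, source, target):
    """Mimic the effect of SPARQL Update's COPY: target becomes an independent copy of source."""
    dataset[target] = set(dataset[source])

def diff(dataset, original, edited):
    """Return the triples added to, and removed from, the edited copy."""
    added = dataset[edited] - dataset[original]
    removed = dataset[original] - dataset[edited]
    return added, removed

dataset = {"urn:example:live": {("ex:jane_doe", "ex:title", "Lead")}}
copy_graph(dataset, "urn:example:live", "urn:example:draft")

# Edit only the draft; the live graph is untouched.
dataset["urn:example:draft"].discard(("ex:jane_doe", "ex:title", "Lead"))
dataset["urn:example:draft"].add(("ex:jane_doe", "ex:title", "Director"))

added, removed = diff(dataset, "urn:example:live", "urn:example:draft")
```

The diff can be reviewed before anything is written back, which is what makes the editing non-destructive.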

kmc · December 18, 2024, 5:23pm

I’ll take a stab at addressing challenges of using named graphs as such.

First, there are different definitions of named graphs, the first relating to combining separate graphs (e.g. one from Wikidata, one from an enterprise model, a third from a separate open data initiative) and the second related to SPARQL, which IMO should be more clearly called “named subgraphs”. So that first challenge is 1) what do we mean by named graphs? and 2) how useful are SPARQL named subgraphs in transformational workflows? After viewing Kurt’s talk, I don’t have a clear sense of how one thing has anything much to do with the other.

Transformational workflows are always hard to implement for reasons folks have mentioned above. (Thanks Angelica)

Named graphs can be helpful with reification, but that’s still a hard problem for many use cases. (Thanks Margaret)

ankuku2002 · December 18, 2024, 8:28pm

The advantages of using Named Graphs and Transformational Workflows

Named Graphs allow for data partitioning, contextualization, and query isolation, enabling structured and efficient data organization. Transformational Workflows complement Named Graphs by extracting, processing, and updating relevant data, ensuring that each graph remains current and contextually enriched. Together, Named Graphs and Transformational Workflows create a powerful framework for managing and transforming data in complex systems. Their integration offers several advantages:

1. Improved Data Organization and Query Efficiency

Named Graphs enable logical segmentation of datasets within a graph database, improving organization.
Queries can target specific named graphs, reducing computational overhead and focusing on relevant data subsets.
This improves query performance and ensures more precise results.

2. Scalability and Version Control

Named Graphs facilitate scalability by allowing incremental updates to isolated datasets.
Transformational Workflows enable versioning, which simplifies tracking changes, rolling back when necessary, and maintaining historical data integrity.

3. Integration with Semantic Technologies

Named Graphs enhance the modeling of relationships and context in semantic web and linked data applications.
This integration supports advanced analytics and fosters interoperability across systems.

4. Enhanced Collaboration and Reusability

Associating meaningful names with graphs ensures datasets are easy to reference, share, and build upon without ambiguity.
Transformational Workflows establish consistent, repeatable processes, fostering team collaboration and minimizing errors during data manipulation.

For instance, in legal case management, Named Graphs could isolate legal cases by jurisdiction, while Transformational Workflows extract key entities like involved parties and relevant precedents to streamline case preparation.

The strategic use of Named Graphs and Transformational Workflows streamlines data management, enhances collaboration, and supports scalable, efficient analytics. Together, they provide a structured and flexible approach for transforming raw data into actionable insights across diverse domains.

ankuku2002 · December 18, 2024, 9:04pm

You, Angelica and Margaret raise an excellent point about the lack of standardization in defining named graphs, which indeed creates challenges in implementing these tools. The ambiguity also seems to demand specialized knowledge of graph databases and transformational logic along with domain expertise, making broader adoption harder.

Could you expand on your statement “After viewing Kurt’s talk, I don’t have a clear sense of how one thing has anything much to do with the other”? Named graphs and transformational workflows could work together to create more streamlined and efficient business processes, in my opinion.

It would be interesting to hear if anyone has ideas or examples of potential approaches for standardization?

anon72111884 · December 19, 2024, 8:18pm

Given my initial efforts at transformational workflows into a named graph (instantiating a KG built from a highly complex ontology with a text corpus processed with NER), I am pivoting to a more modular ‘named subgraphs’ approach, and I agree with the points above. I am seeing the advantages of modular design.

Lu · December 23, 2024, 2:00am

I choose to answer (a) the advantages of Using Named Graphs and Transformational Workflows.

Using named graphs and transformational workflows is beneficial in managing and processing data, particularly when working with complex knowledge graph systems. In general, they have 3 pros:

Enhanced Data Context and Modularity
Named graphs allow for organizing data into distinct, named subsets, adding clarity and context. For example, in a knowledge graph, each named graph can represent a specific domain or source, making data integration and provenance tracking easier.
Efficiency in Querying and Data Transformation
Transformational workflows simplify complex data processing by automating and streamlining repetitive tasks such as schema validation or data enrichment. Using tools like SPARQL and XSLT3 can ensure consistency and speed in these operations. For example, SPARQL queries can target specific named graphs instead of querying the entire dataset.
Facilitating Advanced Features and Compliance
Named graphs are invaluable for tracking changes, enabling version control, and maintaining compliance by preserving data provenance, which is more transparent and reproducible. For example, SHACL can validate schema integrity across named graphs, while workflows manage updates and maintain consistency over time.
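
As a small illustration of a query that targets one named graph, the sketch below builds the query text in Python. The `GRAPH` keyword is standard SPARQL; the function name, the variable names, and the example graph IRI are assumptions, and sending the query to an endpoint is left out.

```python
def select_from_graph(graph_iri, limit=10):
    """Build a SPARQL SELECT restricted to a single named graph."""
    # Reject characters that are not allowed inside <...> in SPARQL.
    if any(ch in graph_iri for ch in '<>" {}|\\^`'):
        raise ValueError("graph IRI contains characters not allowed in an IRI reference")
    return (
        "SELECT ?s ?p ?o\n"
        "WHERE {\n"
        f"  GRAPH <{graph_iri}> {{ ?s ?p ?o }}\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )

print(select_from_graph("https://example.org/graph/sales", limit=5))
```

Only triples inside the named graph are matched, so the rest of the dataset does not need to be scanned for this pattern.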

Lu · December 23, 2024, 2:21am

Awesome! Thank you for your detailed and well-researched comment! It’s interesting to see how the semantics of the same term can evolve over time. Another similar example is the relationship between deep learning and machine learning. Deep learning is a subset of machine learning. However, nowadays, when we talk about “machine learning”, we always talk about the non-deep learning parts of machine learning.

aspratle · December 26, 2024, 11:32pm

This is a great point. What can be a ‘named graph’ for one organization might not be a ‘named graph’ for another organization. I do agree that the challenge of adding semantic layers to named graphs is ongoing and is improving day by day with LLM packages like GraphRAG. I see how named graphs started with a focus on “interpretability” and not all named graphs are interpretable.

thejellis · December 28, 2024, 5:33pm

Hi Keith and others who replied here -

I’m taking this program from a perspective in education, understanding how these types of tools and concepts would be a part of a larger educational shift to incorporate, rather than a specialization.

Has there been any movement on an open standards process? I was on a similar task with Data Contracts, which led me down the rabbit hole to eventually make it here. The two seem similar in the sense of a process/product being in place before a standard had been put in place.

Why do you think graph databases are not a part of learning curriculums alongside relational and unstructured databases? At what point in their learning journey do you feel a person is ready to implement these tools, or to understand them well enough to get the most value from them?

John Ellis

kmc · January 1, 2025, 8:13pm

Margaret, this is what troubled me about the assignment: as you pointed out, the watered down concept of named graphs has become common usage in SPARQL. Much confusion could have been avoided if SPARQL defined a more specific case of “subgraph” as a use case for named graphs. Pat Hayes wrote the original definition, then later he co-authored the paper that settled the matter, and this included the description of named graphs within SPARQL queries. So moving forward I think any discussion of named graphs should be grounded in either the 2005 paper[1] or W3C efforts to move forward from that definition to a standard.

If there’s anything about the history of the terminology that I got wrong I would love to know more.

[I’m copying this citation from Wikipedia]

[1]
Carroll, Jeremy J.; Bizer, Christian; Hayes, Pat; Stickler, Patrick (2005). “Named graphs”. Journal of Web Semantics. 3 (4): 247–267. doi:10.1016/j.websem.2005.09.001.

kmc · January 1, 2025, 8:43pm

I welcome comments and questions on my reply to the question about challenges of using named graphs in transformational workflows.

TL;DR: Tough concept to grasp; support for full capabilities (i.e. pentuples) varies among vendors; and they make difficult workflows just as difficult as other solutions do. I didn’t go into any length about alternative solutions, as they are bespoke to organizations that deal with provenance, temporality, and other use cases without resorting to named graphs.

agreen · January 8, 2025, 12:38am

While I do think there is a lot of value in Named Graphs and Transformational Workflows, particularly in the ability to organize data and have reference points, I think that there is a challenge in having over-partitioned data that leads to a lot of assumptions within a workflow. For example, inconsistent internal semantics within a workplace, without established definitions for that work, may lead to errors due to a perceived internal checkmark. For instance, if teams were to restructure an organization by software development tasks, but some folks classified data development as data sourcing while others meant scripting or geoprocessing, they may not be able to access the correct personnel for QC within the organization. In this case a named graph in the form of a new organizational chart was a band-aid, where established semantics on workflows were what was needed for successful integration with the other semantic technologies mentioned.

agreen · January 8, 2025, 12:41am

While I spoke to the challenges, I agree that the flexibility and scalability of named graphs is entirely beneficial. If done correctly, they save time, and efficiency without a loss in structure is such a great point as to how different fields would be able to utilize them with a lower barrier to entry.

mmw · January 8, 2025, 3:00am

I can jump in here on the part where @kmc said he didn’t see how one thing had much to do with the other and at least add my thoughts.
The problem here is that a “transformational workflow” is just that. It is a workflow that takes one thing in one format/system/graph and transforms it to another format/system/graph. Transformational workflows are, by necessity, part of streamlined business processes. Named graphs might be just one of dozens or more mechanisms for achieving transformation. It is like asking how the concept of ‘basil’ and the concept of ‘recipes’ are good for cooking.
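
To illustrate the "one format in, another format out" idea, here is a minimal sketch of a transformation from CSV rows to triples. The column names, the `ex:` prefix, the `urn:example:staff` graph name, and the function itself are assumptions for illustration. The named graph is only where the output lands; the transformation is the loop.

```python
import csv
import io

def csv_to_graph(csv_text, graph_iri):
    """Transform CSV rows into (s, p, o) triples grouped under one graph name."""
    triples = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = "ex:" + row["id"]          # assumes every row has an 'id' column
        for column, value in row.items():
            if column != "id" and value:
                triples.add((subject, "ex:" + column, value))
    return {graph_iri: triples}

source = "id,name,department\njane_doe,Jane Doe,sales\n"
result = csv_to_graph(source, "urn:example:staff")
```

The same rows could just as well be written to a table or a JSON file, which fits the point that named graphs are one of many possible targets for a workflow.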

They are both kind of abstract concepts that aren’t necessarily related to each other nor would I consider either of them to have qualities of being ‘advantageous’ or ‘challenging’.

What is curious about the concept of named graphs is that there is the practice of using them, but this is not the same as the theoretical concept nor with the same intent as when the term was defined, so there is only the usage in SPARQL that creates any kind of standard usage.

KGCLearning · January 11, 2025, 9:42pm