In a graph database, how best to express sequential relationships between nodes that represent events in time?

abenodes · March 7, 2021, 8:56am

Clearly this kind of work is being done: In the Panama Papers case it would have been critical to establish chains of events like bank deposits and legal moves concerning shell corporations, etc.

I have a simpler use case in mind: Procedures. A Procedure would be one type of node, and a Step another type of node. Steps could have a BELONGS_TO relationship to Procedures. But we’d need to encode and work with the knowledge that Step 1 comes before Step 2, etc.

If you have an RDF answer in mind that’s fine. I am working with Neo4j and can import an RDF ontology. But I am also interested to know how to solve this in plain Neo4j, if anyone can tell me.

phil.taylor · March 8, 2021, 1:43pm

I’m not a Neo4j expert, but from a generalized graph perspective there are a number of ways to solve this. I am assuming from your description that you want to model the procedure in advance (and with apologies if I’m explaining something you already know at the concept level!).

The simplest approach is to add a directional ‘successor’ edge between two steps, A and B, such that ‘A->B’ denotes that B is the successor to A; an optional inverse ‘predecessor’ edge, ‘A<-B’, can also be added to support walking backwards efficiently. Although deceptively simple, this is quite powerful. You can form and traverse arbitrary-length sequences armed only with the knowledge that any two adjoining steps within the sequence must be joined by a successor edge. This also allows forks and joins to be modeled as part of the sequence, for example where you need to support parallel tasks as a sub-sequence. To take advantage of this, your traversal simply needs to handle a step that has more than one successor or more than one predecessor. Further, you can decompose steps into sub-steps using the same approach; for example, A = {A.1->A.2}.
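To make the successor-edge idea concrete, here is a minimal sketch in plain Python rather than Neo4j. The step names, the edge list and the function names are all hypothetical, chosen only for illustration. It builds both the successor and the inverse predecessor lookups, then orders the steps so that each one appears after all of its predecessors, which covers a fork (B to C1 and C2) and a join (C1 and C2 into D).

```python
from collections import defaultdict, deque

# Hypothetical steps; each pair (a, b) is a 'successor' edge a -> b.
SUCCESSOR_EDGES = [
    ("A", "B"),
    ("B", "C1"),  # fork: B has two successors
    ("B", "C2"),
    ("C1", "D"),  # join: D has two predecessors
    ("C2", "D"),
]


def build_graph(edges):
    """Return (successors, predecessors) adjacency maps for the edge list."""
    successors = defaultdict(list)
    predecessors = defaultdict(list)
    for a, b in edges:
        successors[a].append(b)
        predecessors[b].append(a)  # inverse edge, for walking backwards
    return successors, predecessors


def ordered_steps(edges):
    """Order steps so each comes after all of its predecessors (Kahn's algorithm)."""
    successors, predecessors = build_graph(edges)
    steps = sorted({step for edge in edges for step in edge})
    # Number of predecessors not yet emitted, per step.
    remaining = {step: len(predecessors[step]) for step in steps}
    ready = deque(step for step in steps if remaining[step] == 0)
    order = []
    while ready:
        step = ready.popleft()
        order.append(step)
        for nxt in successors[step]:
            remaining[nxt] -= 1
            if remaining[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(steps):
        raise ValueError("successor edges contain a cycle")
    return order


if __name__ == "__main__":
    print(ordered_steps(SUCCESSOR_EDGES))
```

In a graph database the same structure would presumably be stored as relationships between Step nodes, with the ordering done by a path traversal instead of in application code.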

To widen this out a little, we can add further temporal attributes to describe when the real-world events take place, how long they last, and so on. Here, you might also add constraints that describe any built-in delay or other rules that affect when a task can start or complete. The semantics will be down to your specific process requirements.
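As a rough sketch of how a temporal attribute can be combined with successor edges, the Python below assumes each step carries a single duration and that a step may start only once all of its predecessors have finished. The durations, step names and function name are invented for illustration, and the edges are assumed to contain no cycles.

```python
from collections import defaultdict

# Hypothetical durations in arbitrary time units, one per step.
DURATIONS = {"A": 2, "B": 3, "C1": 4, "C2": 1, "D": 2}
# Successor edges: (a, b) means b follows a.
EDGES = [("A", "B"), ("B", "C1"), ("B", "C2"), ("C1", "D"), ("C2", "D")]


def earliest_starts(durations, edges):
    """Earliest start per step, assuming it waits for every predecessor to finish."""
    predecessors = defaultdict(list)
    for a, b in edges:
        predecessors[b].append(a)

    starts = {}

    def start_of(step):
        # Memoised: a step starts at the latest finish time among its
        # predecessors, or at 0 if it has none.
        if step not in starts:
            starts[step] = max(
                (start_of(p) + durations[p] for p in predecessors[step]),
                default=0,
            )
        return starts[step]

    for step in durations:
        start_of(step)
    return starts


if __name__ == "__main__":
    for step, start in earliest_starts(DURATIONS, EDGES).items():
        print(step, "starts at", start, "and finishes at", start + DURATIONS[step])
```

Delays or other rules could be added as extra terms in the expression inside `start_of`, depending on what the process requires.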

In fact, what I just described is nothing new. It comes from a decades-old technique called the ‘Critical Path Method’ (CPM), which has been used within project management to describe complex schedules in manufacturing and construction. It can be used both for planning and for historical record, and it has a very natural ‘graph fit’. You can read more and see the basic visualization in the link below:

abenodes · March 9, 2021, 2:12am

Thanks so much for this, @phil.taylor !

This context is extremely helpful. I had run across CPM years ago but forgot about it completely, so your pointing out the connection is most welcome!

BobDuCharme · March 15, 2021, 1:37pm

This demo of how to use RDF lists may be handy: RDF lists and SPARQL

abenodes · March 20, 2021, 1:13am

Thanks, Bob, will investigate!