How to Ensure Consistency in a Triple Store?

biancanevo · April 27, 2011, 10:00pm

Is there any triple store that guarantees that inserted triples do not cause inconsistencies? As an example, consider this set of triples added to a triple store.

:a rdf:type :d .
:a rdfs:subClassOf :d .
:count rdfs:domain :a .
:count rdfs:range xsd:integer .
:x :count "mamma mia"^^xsd:string .

This is an example of two inconsistencies:

First, a node is declared both as an instance of a type and as a subclass of it.
Second, I declare the range of a property to be an integer and then add a triple with a string as its value, violating the range of the property.

In case of detected inconsistencies, is it possible to infer the missing triples? To remove the triples causing the inconsistencies? Or just to notify the submitter that the triples caused an inconsistency?

Regards.

scotthenninger · April 27, 2011, 10:00pm

First, it should be made clear that what you have is perfectly valid RDF. If by consistency you mean OWL consistency, then it is also consistent. Punning is allowed in OWL 2. Type mismatches are not detected as an inconsistency in OWL. In OWL, inconsistencies are basically contradictions. For example, if a resource is a member of two classes declared disjoint, or two resources are declared both the same as and different from each other, then the model is inconsistent.
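To make the "contradiction" idea concrete, here is a minimal sketch in plain Python. It is not a reasoner: it only looks at asserted rdf:type and owl:disjointWith triples, and the helper name find_disjointness_clashes and the :Cat/:Dog data are made up for illustration.

```python
# Triples are plain (subject, predicate, object) tuples. The prefixed names
# are ordinary strings here, not resolved IRIs.
TYPE = "rdf:type"
DISJOINT = "owl:disjointWith"

def find_disjointness_clashes(triples):
    """Return sorted (resource, class_a, class_b) tuples for resources that are
    asserted to belong to two classes declared disjoint (hypothetical helper)."""
    types = {}
    for s, p, o in triples:
        if p == TYPE:
            types.setdefault(s, set()).add(o)
    disjoint_pairs = [(s, o) for s, p, o in triples if p == DISJOINT]
    clashes = set()
    for resource, classes in types.items():
        for a, b in disjoint_pairs:
            if a in classes and b in classes:
                clashes.add((resource, *sorted((a, b))))
    return sorted(clashes)

# Illustrative data, not taken from the question.
example = [
    (":Cat", DISJOINT, ":Dog"),
    (":rex", TYPE, ":Cat"),
    (":rex", TYPE, ":Dog"),
    (":tom", TYPE, ":Cat"),
]
clashes = find_disjointness_clashes(example)
print(clashes)
```

Only :rex is reported, because it is the only resource typed with both disjoint classes. A real OWL reasoner would also take inferred types into account, which this sketch deliberately ignores.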

Also note that "In case of detected inconsistencies, is it possible to infer the missing triples?" is inconsistent with how RDFS/OWL reasoning works. Reasoners will infer certain types of missing triples. For example, :x in your example above is a member of :a because of the rdfs:domain statement on :count. OWL (actually RDFS) semantics dictate adding the triple {:x rdf:type :a}. It's not that this missing triple is an "inconsistency", but that the triple is entailed by the model and an RDFS or OWL reasoner will add it. Inconsistency is a different concept in RDFS/OWL, per the previous paragraph.
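The domain entailment can be sketched in a few lines of plain Python. This is only an illustration of the RDFS domain rule (rdfs2 in the RDF Semantics documents) applied once over tuples. The function name infer_domain_types is an assumption of mine, not part of any library.

```python
TYPE = "rdf:type"
DOMAIN = "rdfs:domain"

def infer_domain_types(triples):
    """Apply the RDFS domain rule once:
    (p rdfs:domain c) and (s p o)  =>  (s rdf:type c).
    Returns only the triples that were not already asserted."""
    domains = {}
    for s, p, o in triples:
        if p == DOMAIN:
            domains.setdefault(s, set()).add(o)
    inferred = set()
    for s, p, o in triples:
        for cls in domains.get(p, ()):
            inferred.add((s, TYPE, cls))
    return inferred - set(triples)

# The relevant triples from the question, with the literal kept as a string.
graph = [
    (":count", DOMAIN, ":a"),
    (":count", "rdfs:range", "xsd:integer"),
    (":x", ":count", '"mamma mia"^^xsd:string'),
]
entailed = infer_domain_types(graph)
print(entailed)
```

The only new triple is the one stating that :x is a member of :a. Nothing is flagged as an error, which is the point: the reasoner adds information rather than rejecting the data.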

If you want to apply general data sanitization rules, then one option is to use SPIN constraints, specifically section 2.1, Constraints. For example, the following will detect violations of the "can't be a type of a subclass" rule:

# resource can't be a type of a subclass
ASK WHERE {
    ?this a ?cls .
    ?this rdfs:subClassOf* ?cls .
}

SPIN constraints and rules are defined on a class and are applied to all members of that class (including rdfs:subClassOf entailments). The ?this variable refers to the member being inspected. For example, if you want to apply the above to all members of :d, then you would declare:

:d spin:constraint [ …spin rdf representation of above query ]

If you want to apply it to all classes in your model, then attach it to owl:Thing.

The following constraint will detect violations of the "range consistency" rule:

# property value must match rdfs:range value
ASK WHERE {
    ?this ?prop ?value .
    ?prop rdfs:range ?range .
    FILTER (datatype(?value) != ?range) .
}

These are short versions of the constraints. You could also use a CONSTRUCT to build constraint violation structures that include possible fixes, etc. See the SPIN documents for more.
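For readers without a SPIN engine at hand, the logic of the range check and the idea of a richer violation report can be approximated in plain Python. This is a sketch under my own assumptions: the Literal tuple, the range_violations function and the dictionary keys are illustrative and are not the SPIN constraint violation vocabulary.

```python
from typing import NamedTuple

class Literal(NamedTuple):
    """A typed literal: lexical form plus datatype name (illustrative model)."""
    lexical: str
    datatype: str

RANGE = "rdfs:range"

def range_violations(triples):
    """Report literal values whose datatype differs from the declared rdfs:range.
    Non-literal objects are skipped, much as datatype(?value) would not match them."""
    ranges = {}
    for s, p, o in triples:
        if p == RANGE:
            ranges.setdefault(s, set()).add(o)
    violations = []
    for s, p, o in triples:
        if not isinstance(o, Literal):
            continue
        for expected in sorted(ranges.get(p, ())):
            if o.datatype != expected:
                violations.append({
                    "subject": s,
                    "property": p,
                    "value": o.lexical,
                    "found": o.datatype,
                    "expected": expected,
                })
    return violations

data = [
    (":count", RANGE, "xsd:integer"),
    (":x", ":count", Literal("mamma mia", "xsd:string")),
    (":y", ":count", Literal("3", "xsd:integer")),
]
violations = range_violations(data)
for v in violations:
    print(v)
```

Unlike the ASK form, which only answers yes or no, the returned records say which subject, property and value are at fault, which is roughly the extra information a CONSTRUCT-based constraint would carry.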

harschware · April 27, 2011, 10:00pm

I'm not sure if any of the triplestores have this built in. What you want is the equivalent of RDBMS stored procedures or constraint checking (see also the question on this site, "Whats missing from RDF Databases?"). This can be accomplished by using SPIN. If it were me, I would try adding the SPIN reference API to my pipeline, define some SPARQL-based rules, and run them against the union of the incoming triples and the current dataset. If SPIN flags any errors, you could toss out the whole batch. Or perhaps SPIN would be granular enough to show you just the triples that are violating the constraints.
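The shape of that pipeline, independent of SPIN, might look like the following plain-Python sketch. It does not use the SPIN API. The names load_batch and type_and_subclass_violations are hypothetical, the store is just a set of tuples, and the single check only looks at direct rdfs:subClassOf statements.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def type_and_subclass_violations(triples):
    """Flag resources asserted to be both an instance of and a direct
    subclass of the same class (no transitive closure in this sketch)."""
    typed = {(s, o) for s, p, o in triples if p == TYPE}
    subclassed = {(s, o) for s, p, o in triples if p == SUBCLASS}
    return sorted(typed & subclassed)

def load_batch(store, batch, checks):
    """Run every check against the union of the store and the incoming batch.
    Returns (new_store, violations); the whole batch is rejected on any violation."""
    candidate = store | set(batch)
    violations = [v for check in checks for v in check(candidate)]
    if violations:
        return store, violations  # store unchanged, caller can notify the submitter
    return candidate, []

checks = [type_and_subclass_violations]
store = {(":d", TYPE, "rdfs:Class")}

bad_batch = [(":a", TYPE, ":d"), (":a", SUBCLASS, ":d")]
store_after_bad, bad_report = load_batch(store, bad_batch, checks)

good_batch = [(":b", TYPE, ":d")]
store_after_good, good_report = load_batch(store, good_batch, checks)
print(bad_report, len(store_after_good))
```

Checking the union rather than the batch alone matters because a violation may only appear when new triples are combined with ones already in the store.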

TopBraidLive server utilizes SPIN and has the ability to store triples using other vendor triple stores, so it may do what you want out of the box.