Expressing Constraints using RDF/OWL or something else?

JamesBranigan · August 10, 2010, 10:00pm

I'm pretty new to designing data models with RDF. I've got a design problem that I'm trying to solve with my RDF-based data model. I'm hoping someone here might have some pointers to a better solution than what I've come up with so far. I'm trying to create an OWL DL-compliant data model.

So the subjects in this problem are all tires. They are all instances of the tire class:

http://www.example.com/2010/tire

The tire class has several properties:

http://www.example.com/2010/manufacturer
http://www.example.com/2010/model
http://www.example.com/2010/serial-number
http://www.example.com/2010/pressure-reading
http://www.example.com/2010/installation-location
...

There are several constraints that I know how to express in English, but not with RDF-S or OWL:

Constraint 1) The tuple (manufacturer, serial number) globally defines a unique instance of a tire. Expressed another way, serial number defines the uniqueness of a tire instance within the context of a given manufacturer.

Constraint 2) Installation Location uniquely identifies a single install location within the context of a vehicle, but not globally (global uniqueness requires a tuple of the vehicle identifier and the installation location).

Constraint 3) Exactly one tire must be installed at each installation location on a vehicle. (i.e. you can't have 2 front right tires on your car)

My current solution is to encode these constraints in application code that interfaces to the data layer, but that means that they are really just conventions. Someone who just got the data and schema would not be able to tell that the constraints existed. So what I'm hoping is that there is a way to express this type of constraint in RDF-S, OWL or some other technology. This way, someone processing the data in another programming language without going through my application code would have the pointers they need to avoid introducing data that violates the constraints.
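
To make the "constraints as conventions in application code" approach concrete, here is a minimal Java sketch of a closed-world check for Constraint 1. It assumes the tire data has already been loaded into memory; the Tire and SerialKey records, their field names and the method name are hypothetical and not part of any RDF library.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SerialKeyCheck {
    // Hypothetical in-memory view of a tire; the field names are illustrative only.
    record Tire(String id, String manufacturer, String serialNumber) {}

    // Compound key for Constraint 1: (manufacturer, serial number).
    record SerialKey(String manufacturer, String serialNumber) {}

    /** Closed-world check: reports every tire whose (manufacturer, serial) pair is already taken. */
    static List<String> duplicateSerials(List<Tire> tires) {
        Map<SerialKey, String> firstOwner = new HashMap<>();
        List<String> violations = new ArrayList<>();
        for (Tire t : tires) {
            SerialKey key = new SerialKey(t.manufacturer(), t.serialNumber());
            // putIfAbsent returns the id already stored under this key, or null if the key is new.
            String earlier = firstOwner.putIfAbsent(key, t.id());
            if (earlier != null) {
                violations.add(t.id() + " duplicates " + earlier);
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        List<Tire> tires = List.of(
                new Tire("t1", "Acme", "123"),
                new Tire("t2", "Acme", "456"),
                new Tire("t3", "Bolt", "123"),  // same serial, different manufacturer: allowed
                new Tire("t4", "Acme", "123")); // same pair as t1: violation
        System.out.println(duplicateSerials(tires)); // prints [t4 duplicates t1]
    }
}
```

Note that this check treats the loaded data as complete and distinct ids as distinct tires, which is precisely the closed-world, unique-name reading that the answers below contrast with OWL's semantics.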

Thanks for any pointers,

James

Signified · August 10, 2010, 10:00pm

Hi James,

First off, some background: it's important to note that much of what might seem like constraints in RDFS/OWL are not actually constraints. The problem is that RDFS/OWL abide by two relevant principles:

The first is the Open World Assumption (OWA): this means that OWL never assumes that your data is complete. Even though in OWL you can say things like, e.g., that all Parents must have at least one value for hasChild, this cannot be interpreted as a constraint. If I say John type Parent, and don't mention anything about children, OWL figures that John must have a child, even if that child is not named in the data... OWL doesn't flag the data as invalid: it just figures that the data might be incomplete. If OWL had the Closed World Assumption (CWA) – which it doesn't – then it would expect the data to be complete, and expect the child to be named. With OWA, OWL is not suitable out-of-the-box for checking "completeness" of data.
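
A minimal Java sketch of the difference, assuming the data has been reduced to a hypothetical set of Parent names and a map from each person to their stated children: the closed-world check below reports John, whereas an OWL reasoner given the same facts would report nothing and simply assume an unnamed child.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ParentCheck {
    /**
     * Closed-world reading of "every Parent has at least one hasChild value":
     * reports each Parent with no stated child. Under OWL's open-world reading
     * the same data is not an error; a reasoner just assumes an unnamed child.
     */
    static List<String> parentsWithoutChildren(Set<String> parents,
                                               Map<String, Set<String>> hasChild) {
        return parents.stream()
                .filter(p -> hasChild.getOrDefault(p, Set.of()).isEmpty())
                .sorted()
                .toList();
    }

    public static void main(String[] args) {
        Set<String> parents = Set.of("John", "Mary");
        Map<String, Set<String>> hasChild = Map.of("Mary", Set.of("Ann"));
        System.out.println(parentsWithoutChildren(parents, hasChild)); // prints [John]
    }
}
```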

The second principle is the lack of a Unique Name Assumption (UNA): this means that OWL is uncertain as to whether two things are the same or not until proven one way or the other (or stated explicitly) – i.e., names are not necessarily distinct. Why is this important? Well, say you've stated that all Persons have exactly two values for hasBiologicalParent. Now you find out (maybe from a number of sources) that John type Person, John hasBiologicalParent William, John hasBiologicalParent Mary, and John hasBiologicalParent Bill. Again, OWL won't sniff a problem: instead it will figure that some pair of names in { William, Mary, Bill } refers to the same real-world entity. Thus, OWL is not suitable for checking that you might have, e.g., "overloaded" some property.
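
The same example as a Java sketch (the class, constant and method names are hypothetical): under a unique name assumption the three names are a violation, while OWL only concludes that at least one of the names must be an alias of another. If the three names were explicitly declared distinct (e.g. with owl:AllDifferent), OWL would instead report the data as inconsistent.

```java
import java.util.Set;

public class ParentCountCheck {
    // "All Persons have (at most) two values for hasBiologicalParent."
    static final int MAX_PARENTS = 2;

    /** With a Unique Name Assumption, distinct names are distinct people: more than two is an error. */
    static boolean violatesUnderUna(Set<String> parentNames) {
        return parentNames.size() > MAX_PARENTS;
    }

    /**
     * Without UNA (OWL's default) nothing is flagged. A reasoner instead concludes that the
     * names denote at most MAX_PARENTS individuals, so at least this many names must be
     * aliases of others; it cannot tell which ones unless other axioms say so.
     */
    static int minimumAliasesWithoutUna(Set<String> parentNames) {
        return Math.max(0, parentNames.size() - MAX_PARENTS);
    }

    public static void main(String[] args) {
        Set<String> johnsParents = Set.of("William", "Mary", "Bill");
        System.out.println(violatesUnderUna(johnsParents));         // prints true
        System.out.println(minimumAliasesWithoutUna(johnsParents)); // prints 1
    }
}
```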

For similar reasons in RDFS, things like rdfs:domain and rdfs:range, which masquerade as constraints, are open to confusion. Saying that hasChild rdfs:range Person does not mean that every value of hasChild must already be explicitly typed as Person: under OWA, any value of hasChild is simply inferred to be a Person, even if the data is incomplete (i.e., if x type Person is not explicitly given).
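
To illustrate, here is a Java sketch over a hypothetical list of (subject, predicate, object) triples with abbreviated names: typedByRange applies the RDFS range entailment (rule rdfs3 in the RDF Semantics specification), while untypedChildren is the closed-world "validation" reading that people often expect instead.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class RangeSemantics {
    // Hypothetical, simplified triple with abbreviated names instead of full IRIs.
    record Triple(String s, String p, String o) {}

    /** RDFS entailment: with hasChild rdfs:range Person, every hasChild value is inferred to be a Person. */
    static Set<String> typedByRange(List<Triple> data) {
        Set<String> persons = new TreeSet<>();
        for (Triple t : data) {
            if (t.p().equals("hasChild")) {
                persons.add(t.o());
            }
        }
        return persons;
    }

    /** Closed-world "validation" reading: hasChild values that are not explicitly typed Person. */
    static Set<String> untypedChildren(List<Triple> data) {
        Set<String> typed = new TreeSet<>();
        for (Triple t : data) {
            if (t.p().equals("type") && t.o().equals("Person")) {
                typed.add(t.s());
            }
        }
        Set<String> missing = new TreeSet<>(typedByRange(data));
        missing.removeAll(typed);
        return missing;
    }

    public static void main(String[] args) {
        List<Triple> data = List.of(
                new Triple("John", "hasChild", "Ann"),
                new Triple("John", "hasChild", "Bob"),
                new Triple("Ann", "type", "Person"));
        System.out.println(typedByRange(data));    // prints [Ann, Bob]
        System.out.println(untypedChildren(data)); // prints [Bob]
    }
}
```

An RDFS reasoner behaves like typedByRange: Bob silently becomes a Person rather than being reported.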

It should be noted that these are not omissions in OWL: rather, they are features that are sympathetic to the Web. On the Web, you cannot expect complete data, but rather pieces of a jigsaw that coalesce into a bigger picture. Similarly, you cannot expect everyone to agree on using the same terms off the bat: instead, let them use different terms and try to sort it all out later.

Anyway, to take an example of what you want:

Constraint 1) The tuple (manufacturer, serial number) globally defines a unique instance of a tire. Expressed another way, serial number defines the uniqueness of a tire instance within the context of a given manufacturer.

You can model something like this in OWL 2 using owl:hasKey: define a compound key (manufacturer, serialNumber) whose values together identify an instance of a class (a particular Tire). However, this will not flag the case where two tires have the same values for (manufacturer, serial number): if a reasoner finds two such tires, the lack of UNA means that OWL will conclude that those two names refer to the same tire.
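
A minimal Java sketch of what a reasoner may conclude from such a key (the Tire and Key records, the IRIs and the values are hypothetical): tires sharing both key values are grouped as owl:sameAs instead of being reported. OWL 2 keys only apply to named individuals that have values for all key properties, which the sketch approximates by skipping tires with a missing value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HasKeyInference {
    // Illustrative tire data; IRIs and field names are hypothetical.
    record Tire(String iri, String manufacturer, String serialNumber) {}
    record Key(String manufacturer, String serialNumber) {}

    /**
     * Mimics what owl:hasKey (ex:manufacturer ex:serial-number) lets a reasoner conclude:
     * tires that agree on both key values are the same individual (owl:sameAs).
     * Nothing is reported as an error.
     */
    static List<List<String>> inferredSameAs(List<Tire> tires) {
        Map<Key, List<String>> groups = new LinkedHashMap<>();
        for (Tire t : tires) {
            // Keys only apply when every key property has a value.
            if (t.manufacturer() == null || t.serialNumber() == null) {
                continue;
            }
            groups.computeIfAbsent(new Key(t.manufacturer(), t.serialNumber()),
                                   k -> new ArrayList<>())
                  .add(t.iri());
        }
        List<List<String>> sameAs = new ArrayList<>();
        for (List<String> group : groups.values()) {
            if (group.size() > 1) {
                sameAs.add(group);
            }
        }
        return sameAs;
    }

    public static void main(String[] args) {
        List<Tire> tires = List.of(
                new Tire("ex:tire1", "Acme", "123"),
                new Tire("ex:tire2", "Acme", "123"),
                new Tire("ex:tire3", "Acme", "456"));
        System.out.println(inferredSameAs(tires)); // prints [[ex:tire1, ex:tire2]]
    }
}
```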

It's perhaps worth noting that there is one way of emulating "constraints" in OWL using inconsistencies. For example, if you know that Tire disjointWith Person (meaning that nothing can be both) and someone says John type Tire and John type Person, then OWL definitely knows that something's up: UNA and CWA don't play a role.
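
The disjointness case can be sketched in Java as follows, assuming asserted types are available as a hypothetical map from individual to class names. Unlike a real reasoner, the sketch only looks at asserted types, not at types inferred through subclass, domain or range axioms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class DisjointnessCheck {
    // A pair of classes declared owl:disjointWith each other.
    record Disjoint(String a, String b) {}

    /**
     * Reports every individual asserted to belong to both classes of a disjoint pair.
     * OWL treats such data as inconsistent regardless of OWA or UNA, because a single
     * individual is explicitly typed with two classes that cannot share members.
     */
    static List<String> clashes(Map<String, Set<String>> typesOf, List<Disjoint> axioms) {
        List<String> found = new ArrayList<>();
        // TreeMap gives a deterministic order in the report.
        for (Map.Entry<String, Set<String>> e : new TreeMap<>(typesOf).entrySet()) {
            for (Disjoint d : axioms) {
                if (e.getValue().contains(d.a()) && e.getValue().contains(d.b())) {
                    found.add(e.getKey() + " is both " + d.a() + " and " + d.b());
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> typesOf = Map.of(
                "John", Set.of("Tire", "Person"),
                "tire1", Set.of("Tire"));
        System.out.println(clashes(typesOf, List.of(new Disjoint("Tire", "Person"))));
        // prints [John is both Tire and Person]
    }
}
```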

From what you need, it sounds like a solution involving inconsistency would be cumbersome at best, if it is possible at all. Someone else might be able to sketch a solution, but I doubt it.

On the other hand, I'm aware of some work that looks at using OWL under UNA and CWA (interpreting what look like constraints as constraints on local data). You might want to check out http://clarkparsia.com/weblog/2009/02/11/integrity-constraints-for-owl/ for details. I'm not knowledgeable about Pellet or the tool, but someone else might be able to give working examples for your constraints.

(It's also worth noting that yours is a commonly observed "problem/feature" of RDFS/OWL. E.g., see the start of an old panel discussion in which some senior researchers in the field bicker like old women about the topic. Also, to justify the long answer, I need this text for elsewhere, so comments are welcome ;))

HolgerKnubl · August 10, 2010, 10:00pm

In our experience, SPARQL is an attractive choice for representing constraints, and SPIN can be used to put those SPARQL constraints directly into the RDF model, associated with the classes where they belong. SPARQL gives you the "right" closed-world semantics and is well supported by efficient database engines. A good example, and the big picture around SPIN, can be found in a blog entry.

AntoineZimmermann · August 10, 2010, 10:00pm

Leaving out the debates on what is a constraint and what's not, this is a possible way of expressing what you describe.

Constraint 1: use keys for this one (a feature introduced in OWL 2 and already supported by some editors and reasoners). This is written this way:

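```turtle
@prefix ex:  <http://www.example.com/2010/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
ex:Tire a owl:Class ;
    owl:hasKey ( ex:manufacturer ex:serial-number ) .
```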

This exactly models what you say in English, that is, that a pair (manufacturer, serial-number) uniquely identifies one tire.

Constraint 2 and 3: you can say that a tire is uniquely identified by the vehicle it is installed on together with the location on the vehicle (assume a property on-vehicle that links tires to vehicles):

```turtle
ex:Tire owl:hasKey ( ex:installation-location ex:on-vehicle ) .
```

You can add that installation-location and on-vehicle are functional, so that a tire can only be installed at one place:

```turtle
ex:installation-location a owl:FunctionalProperty .
ex:on-vehicle a owl:FunctionalProperty .
```

You may have a class ex:InstallLocation which contains all the installation locations, a class ex:Vehicle, and a property ex:has-install-loc which relates vehicles to installation locations. This is how you can specify that, for a given vehicle, there must be exactly one tire per installation location:

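```turtle
@prefix ex:   <http://www.example.com/2010/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

ex:has-install-loc a owl:ObjectProperty ;
    rdfs:domain ex:Vehicle ;
    rdfs:range ex:InstallLocation .

# Every installation location of a vehicle is the installation-location of exactly one tire.
ex:Vehicle rdfs:subClassOf [
    a owl:Restriction ;
    owl:onProperty ex:has-install-loc ;
    owl:allValuesFrom [
        a owl:Restriction ;
        owl:onProperty [ owl:inverseOf ex:installation-location ] ;
        owl:cardinality "1"^^xsd:nonNegativeInteger
    ]
] .
```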
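
For contrast, here is a minimal Java sketch of the closed-world check that the restriction above does not perform, assuming hypothetical Installation and Slot records and a map from each vehicle to its declared installation locations. It reports both a missing tire and a doubly occupied location; under these axioms alone an OWL reasoner would flag neither (a missing tire is allowed by the OWA, and two tire names at one location are inferred to be the same tire because there is no UNA).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class OneTirePerLocation {
    // Hypothetical flattened view: a tire installed at a location on a vehicle.
    record Installation(String tire, String vehicle, String location) {}
    record Slot(String vehicle, String location) {}

    /**
     * Closed-world reading of "exactly one tire per installation location": for every
     * location declared for a vehicle, count the installed tires and report 0 or 2+.
     */
    static List<String> check(Map<String, List<String>> locationsByVehicle,
                              List<Installation> installs) {
        Map<Slot, Integer> counts = new HashMap<>();
        for (Installation i : installs) {
            counts.merge(new Slot(i.vehicle(), i.location()), 1, Integer::sum);
        }
        List<String> violations = new ArrayList<>();
        // TreeMap gives a deterministic vehicle order in the report.
        for (Map.Entry<String, List<String>> v : new TreeMap<>(locationsByVehicle).entrySet()) {
            for (String loc : v.getValue()) {
                int n = counts.getOrDefault(new Slot(v.getKey(), loc), 0);
                if (n != 1) {
                    violations.add(v.getKey() + "/" + loc + " has " + n + " tires");
                }
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        Map<String, List<String>> locations = Map.of("car1", List.of("frontLeft", "frontRight"));
        List<Installation> installs = List.of(
                new Installation("t1", "car1", "frontLeft"),
                new Installation("t2", "car1", "frontLeft"));
        System.out.println(check(locations, installs));
        // prints [car1/frontLeft has 2 tires, car1/frontRight has 0 tires]
    }
}
```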

There may be other solutions, maybe simpler ones, but I could not figure out a better way.

Again, remember that these axioms do not impose constraints on what is written in your data; they only define what must be true in the world.

JeffSchmitz · August 10, 2010, 10:00pm

Hi James, I'll second Holger's suggestion: SPIN is a flexible, object-oriented way to handle constraint checking in a model. Using SPIN, you attach such constraints to the class definition in your model to which they apply. For example, a check on the front-right installations on all the vehicles in a model (per your third constraint) would attach to your Vehicle class definition via the spin:constraint property and would look something like the query below (untested; I had to make some assumptions about your model, and there are probably better ways):

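```sparql
# One tire per location
PREFIX ex: <http://www.example.com/2010/>
ASK WHERE {
    {   # count the front-right tires of each vehicle (0 if there are none)
        SELECT ?this (COUNT(?tire) AS ?frCount)
        WHERE {
            ?this a ex:Vehicle .
            OPTIONAL { ?this ex:tire-installations ?tire .
                       ?tire ex:installation-location ex:frontRight . }
        }
        GROUP BY ?this
    }
    # the constraint is violated (ASK returns true) unless there is exactly one
    FILTER (?frCount != 1)
}
```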

Then, using Jena/SPIN, you can get a list of any constraint violations that exist in your model with the following code:

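```java
// myModel is the Jena Model to check; the second argument is an optional ProgressMonitor
List<ConstraintViolation> cvs = SPINConstraints.check(myModel, null);
```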

Here, ConstraintViolation is a SPIN-provided class that gives you access to all relevant information about a violation, and myModel is the Jena Model you are checking. Note that the comment is important: SPIN includes it in each ConstraintViolation it returns, available via getMessage(). You can then process the violation list as you need to, perhaps displaying the message to the user along with the vehicle on which the error occurs.

Also note the ?this variable: SPIN binds it to each instance of the class to which the constraint is attached, i.e., each Vehicle in this example.

jeff