Ontology version control systems

utapyngo · February 10, 2012, 11:00pm

When developing ontologies it is important to use a version control system. But ordinary version control systems such as Subversion, Git, Bazaar and Mercurial do not take the graph nature of ontologies into account. They track every change to the file, even a swap of two lines, which does not actually change the ontology.

Developing a separate version control system for ontologies is an interesting task, but it would be too much work (new GUI, new protocols, integration with bugtrackers). What is the easiest way to extend an existing version control system to make it take ontology semantics into account?

We know about OWLDiff. It could be used to extend an existing VCS. Detecting changes in OWL is easier than in RDF(S) because of the blank nodes problem (see the article).

Using N3 looks like a good temporary solution, but what about automatic commit messages? For example:

2 new classes: Class1, Class2 (subclass of Class1)
1 class removed: Class3
1 class changed:
    Class4: new superclass: "Class1 and hasProperty value Class2"
2 new individuals: Individual1 (a Class4), Individual2 (a Class2, a Class1)
1 new fact: Individual1 hasProperty Individual2

Such commit messages could even be structured (using OMV, for example) to allow advanced search over them. The question is which existing VCS can be extended in such a way.
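To make the idea of an automatic commit message concrete, here is a minimal sketch that compares two versions of an ontology as sets of triples and reports added and removed classes. It assumes triples are plain `(subject, predicate, object)` string tuples with prefixed names; the helper names `classes`, `count_label` and `summarize` are made up for illustration and do not come from any existing tool.

```python
RDF_TYPE = "rdf:type"
OWL_CLASS = "owl:Class"


def classes(triples):
    """Return the subjects that are declared as owl:Class."""
    return {s for (s, p, o) in triples if p == RDF_TYPE and o == OWL_CLASS}


def count_label(n, singular, plural):
    """Format a count with the matching singular or plural noun."""
    return f"{n} {singular if n == 1 else plural}"


def summarize(old, new):
    """Build a short commit message from two collections of triples."""
    added = classes(new) - classes(old)
    removed = classes(old) - classes(new)
    lines = []
    if added:
        label = count_label(len(added), "new class", "new classes")
        lines.append(f"{label}: {', '.join(sorted(added))}")
    if removed:
        label = count_label(len(removed), "class", "classes")
        lines.append(f"{label} removed: {', '.join(sorted(removed))}")
    return "\n".join(lines)


old_version = {("Class3", RDF_TYPE, OWL_CLASS)}
new_version = {
    ("Class1", RDF_TYPE, OWL_CLASS),
    ("Class2", RDF_TYPE, OWL_CLASS),
    ("Class2", "rdfs:subClassOf", "Class1"),
}
message = summarize(old_version, new_version)
print(message)
```

A real implementation would also need to handle changed classes, individuals and facts, and would have to deal with blank nodes, which this sketch ignores.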

Here is my attempt to implement the features I described in this question: http://code.google.com/p/ontovcs. It can be used together with existing version control systems, such as Git and Mercurial, and can produce a summary like the one shown above. It is faster than OWLDiff (probably because it does not use a reasoner). It also contains a simple three-way merge tool for OWL ontologies. It does not provide any search capabilities, though.
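For readers unfamiliar with three-way merging, the basic set-based idea can be sketched as follows. This is not how the tool above is implemented (it works on OWL ontologies, and its internals are not described here); it is only an illustration on plain triple sets, and the function name `three_way_merge` is hypothetical.

```python
def three_way_merge(base, ours, theirs):
    """Merge two edited versions of a triple set against their common ancestor.

    Anything added on either side is kept, and anything removed on either
    side is dropped. Semantic conflicts are not detected.
    """
    base, ours, theirs = set(base), set(ours), set(theirs)
    added = (ours - base) | (theirs - base)
    removed = (base - ours) | (base - theirs)
    return (base | added) - removed


base = {("A", "rdfs:subClassOf", "B"), ("B", "rdfs:subClassOf", "C")}
ours = base | {("D", "rdfs:subClassOf", "A")}   # we added a triple
theirs = {("A", "rdfs:subClassOf", "B")}        # they removed a triple
merged = three_way_merge(base, ours, theirs)
print(sorted(merged))
```

Because the merge is purely structural, two edits that are individually fine can still combine into an inconsistent ontology, so the result would normally be checked afterwards.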

Carl · February 10, 2012, 11:00pm

We use N3 notation: we post-process the file to put the statements in a fixed order and then check it into Subversion.
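The ordering step could look something like the sketch below. It is a simplification that assumes the ontology has been exported as N-Triples (one triple per line) rather than full N3, and that blank node labels are stable between exports; the function name `canonical_lines` is made up for this example.

```python
def canonical_lines(text):
    """Return N-Triples text with comments and blank lines dropped,
    duplicate statements removed, and the remaining lines sorted."""
    lines = {line.strip() for line in text.splitlines()}
    lines = {line for line in lines if line and not line.startswith("#")}
    return "\n".join(sorted(lines)) + "\n"


version_a = """\
<http://example.org/B> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://example.org/A> .
<http://example.org/A> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .
"""
# The same two statements, written in the opposite order.
version_b = "\n".join(reversed(version_a.splitlines())) + "\n"

same_after_sorting = canonical_lines(version_a) == canonical_lines(version_b)
print(same_after_sorting)
```

With this in place, swapping two statements no longer produces a diff in a line-based VCS, since both versions serialize to the same text.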

RobVesse · February 10, 2012, 11:00pm

Can't answer your question directly but this paper from ISWC 2009 might be of interest to you

On detecting high-level changes in RDF/S KBs by Papavassiliou, V. and Flouris, G. and Fundulaki, I. and Kotzinos, D. and Christophides, V.

Edit

If you have Blank Nodes then you may want to look at the approach taken by RDFSync: efficient remote synchronization of RDF models by Tummarello, G. and Morbidoni, C. and Bachmann-Gmur, R. and Erling, O.

RDF diffs are doable when the graph contains blank nodes; it just makes things more complex to compute. Essentially, the approach is to decompose the graph into MSGs (Minimum Self-contained Graphs), and you can then compare the lists of MSGs to discover differences.

Speaking from experience, implementing an RDF diff algorithm based on the approach outlined in their paper is relatively easy, though depending on the implementation you may also need a decent graph isomorphism algorithm, which is a whole other issue.
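The MSG decomposition described above can be sketched roughly as follows. This is a simplified reading of the idea, not the paper's reference implementation: triples are assumed to be string tuples, blank nodes are assumed to be written with a `_:` prefix, and the names `is_blank` and `decompose` are invented for the example. Triples without blank nodes each form their own group, while triples linked through shared blank nodes are grouped together.

```python
from collections import defaultdict


def is_blank(term):
    """Treat any term written with the '_:' prefix as a blank node."""
    return term.startswith("_:")


def decompose(triples):
    """Partition triples so that triples sharing a blank node
    (directly or transitively) end up in the same group."""
    triples = list(triples)
    by_bnode = defaultdict(list)
    for t in triples:
        for term in (t[0], t[2]):
            if is_blank(term):
                by_bnode[term].append(t)

    seen, groups = set(), []
    for t in triples:
        if t in seen:
            continue
        group, stack = set(), [t]
        while stack:
            cur = stack.pop()
            if cur in group:
                continue
            group.add(cur)
            # Pull in every triple that mentions one of this triple's blank nodes.
            for term in (cur[0], cur[2]):
                if is_blank(term):
                    stack.extend(by_bnode[term])
        seen |= group
        groups.append(frozenset(group))
    return groups


graph = [
    ("ex:a", "ex:p", "ex:b"),
    ("ex:a", "ex:q", "_:x"),
    ("_:x", "ex:r", "_:y"),
    ("_:y", "ex:s", "ex:c"),
    ("ex:d", "ex:q", "_:z"),
]
groups = decompose(graph)
print(len(groups))
```

Groups made only of ground triples can be compared between versions by simple set difference. Groups that contain blank nodes are where the isomorphism check comes in, and that part is deliberately left out here.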

RobertHoehn · February 10, 2012, 11:00pm

To identify changes in the semantics of an ontology, you could try OWL Diff. As far as I know, it does not interact with SVN or other version control systems (yet).

There is also a paper (which won the KR best paper award) about the problem of finding differences between DL-Lite ontologies.

tobyink · February 10, 2012, 11:00pm

Jeremy Carroll has documented a canonical way of assigning labels to blank nodes http://www.hpl.hp.com/techreports/2003/HPL-2003-142.pdf. This works with most graphs. Combined with canonical whitespace for N-Triples, it provides a canonical string form for most graphs.

Unfortunately, adding a single blank-node-containing triple to the graph has the potential to relabel every blank node. :-(

Really for RDF diffs the only solution is to skolemize each blank node on input (i.e. assign it a URI). I believe this is what Talis does. (Their RDF store has pretty good support for diffs and versioning.)
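As an illustration of skolemizing on input, the sketch below replaces each blank node label with a freshly minted URI the first time it is seen and reuses that URI afterwards. It says nothing about how any particular store does this: the `Skolemizer` class, the `example.org` base URI and the counter-based naming are all assumptions made for the example. The `/.well-known/genid/` path follows the convention later suggested in RDF 1.1 for skolem IRIs.

```python
import itertools


class Skolemizer:
    """Replace blank node labels with stable, freshly minted URIs."""

    def __init__(self, base="http://example.org/.well-known/genid/"):
        self.base = base          # hypothetical base URI for minted names
        self.mapping = {}         # blank node label -> minted URI
        self.counter = itertools.count(1)

    def term(self, t):
        """Return t unchanged unless it is a blank node label."""
        if not t.startswith("_:"):
            return t
        if t not in self.mapping:
            self.mapping[t] = f"<{self.base}{next(self.counter)}>"
        return self.mapping[t]

    def triple(self, triple):
        """Skolemize the subject and object of one triple."""
        s, p, o = triple
        return (self.term(s), p, self.term(o))


sk = Skolemizer()
first = sk.triple(("_:b0", "ex:p", "_:b1"))
second = sk.triple(("_:b0", "ex:q", "ex:o"))
print(first)
print(second)
```

Once every node has a URI, two versions of the graph can be compared as plain sets of triples. The catch is that the mapping must be applied once, when the data enters the store; re-importing the same file would mint new URIs for the same blank nodes.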

HolgerKnubl · February 10, 2012, 11:00pm

TopBraid Composer (which is built on Eclipse and therefore has convenient access to Git, SVN or CVS) has a little-known option in the I/O Preferences that will use a sorted Turtle writer. This algorithm should work well with line-based versioning systems, as it will sort the resources and then their properties and objects alphabetically.