Does anybody else start with the A-Box first?

database_animal · January 19, 2012, 11:00pm

One of the most common patterns I see in questions here is "I made an ontology in Protege and now I need to put instance data in... HELP!"

I've got a sneaking suspicion that many of these people aren't going to do any inference at all with their schemas... The only role the schema plays is human-readable documentation. In a lot of ways, I think their perception of RDFS is distorted by experience with relational databases and XML Schema.

On the other hand, what's unique about RDFS and OWL is that they can be used to define ruleboxes that connect different vocabularies. All of the projects where I've used RDFS and OWL have involved either (i) making sense of large A-Boxes such as DBpedia and Freebase, or (ii) making it possible for people to query a private vocabulary with well-known predicates such as foaf:maker and dcterms:creator. Is anybody else doing this?
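For readers less familiar with case (ii), here is a minimal sketch of how an RDFS "rulebox" lets a query for foaf:maker or dcterms:creator find data written with a private predicate. It assumes triples are plain (subject, predicate, object) tuples of prefixed names, and the ex: terms are hypothetical; it forward-chains only two RDFS entailment rules, rdfs5 (subPropertyOf is transitive) and rdfs7 (a triple using a sub-property also holds for its super-property). A real deployment would use a triple store or reasoner rather than this loop.

```python
# Minimal forward-chaining sketch of two RDFS entailment rules:
# rdfs5 (subPropertyOf is transitive) and rdfs7 (sub-property propagation).
# Triples are (subject, predicate, object) string tuples; ex: terms are hypothetical.
RDFS_SUBPROP = "rdfs:subPropertyOf"

def rdfs_subproperty_closure(triples):
    """Return the input triples plus everything entailed by rdfs5 and rdfs7."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        subprops = [(s, o) for (s, p, o) in closed if p == RDFS_SUBPROP]
        new = set()
        # rdfs5: if a subPropertyOf b and b subPropertyOf c, then a subPropertyOf c
        for (a, b) in subprops:
            for (c, d) in subprops:
                if b == c:
                    new.add((a, RDFS_SUBPROP, d))
        # rdfs7: if s p o and p subPropertyOf q, then s q o
        for (s, p, o) in closed:
            for (sub, sup) in subprops:
                if p == sub:
                    new.add((s, sup, o))
        if not new <= closed:
            closed |= new
            changed = True
    return closed

def query(triples, predicate):
    """Return sorted (subject, object) pairs for a single predicate."""
    return sorted((s, o) for (s, p, o) in triples if p == predicate)

# Hypothetical private vocabulary mapped onto well-known predicates.
tbox = {
    ("ex:authoredBy", RDFS_SUBPROP, "ex:madeBy"),
    ("ex:madeBy", RDFS_SUBPROP, "foaf:maker"),
    ("ex:authoredBy", RDFS_SUBPROP, "dcterms:creator"),
}
abox = {("ex:report42", "ex:authoredBy", "ex:alice")}

closed = rdfs_subproperty_closure(tbox | abox)
print(query(closed, "foaf:maker"))       # [('ex:report42', 'ex:alice')]
print(query(closed, "dcterms:creator"))  # [('ex:report42', 'ex:alice')]
```

The A-Box only ever says ex:authoredBy; the well-known predicates come entirely from the schema.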

lee · January 19, 2012, 11:00pm

I've got a sneaking suspicion that many of these people aren't going to do any inference at all with their schemas... The only role the schema plays is human-readable documentation.

Seems to be a bit of a false dichotomy to me. We use OWL all the time in contexts where we're not running inference, reasoning, or rules based on it, but also not primarily as human-readable documentation. We find that OWL is very useful as an expressive data modeling language, and we use it that way so that, e.g., many elements of an application's user interface can be driven by the semantics given in a schema or ontology.
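To make the "schema drives the UI" idea concrete, here is a rough sketch, not a description of the poster's actual system: it reads rdfs:domain, rdfs:range and rdfs:label from a small schema and turns them into form fields, with no reasoning at all. The ex: terms and the range-to-widget table are hypothetical assumptions.

```python
# Sketch: derive form fields for a class from its schema, with no reasoning involved.
# The schema is a list of (subject, predicate, object) tuples; all ex: terms are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class Field:
    prop: str
    label: str
    widget: str

# Hypothetical mapping from rdfs:range values to UI widgets.
WIDGETS = {"xsd:string": "text", "xsd:date": "date-picker", "xsd:integer": "number"}

def form_fields(schema, cls):
    """Return one Field per property whose rdfs:domain is cls."""
    def value(subject, predicate, default):
        for s, p, o in schema:
            if s == subject and p == predicate:
                return o
        return default

    props = sorted(s for s, p, o in schema if p == "rdfs:domain" and o == cls)
    fields = []
    for prop in props:
        rng = value(prop, "rdfs:range", None)
        # Ranges not listed in WIDGETS (e.g. classes such as foaf:Person)
        # fall back to a picker of existing resources.
        widget = WIDGETS.get(rng, "resource-picker")
        # Without an rdfs:label, the property name itself is used as the label.
        fields.append(Field(prop, value(prop, "rdfs:label", prop), widget))
    return fields

schema = [
    ("ex:title", "rdfs:domain", "ex:Report"),
    ("ex:title", "rdfs:range", "xsd:string"),
    ("ex:title", "rdfs:label", "Title"),
    ("ex:published", "rdfs:domain", "ex:Report"),
    ("ex:published", "rdfs:range", "xsd:date"),
    ("ex:published", "rdfs:label", "Publication date"),
    ("ex:author", "rdfs:domain", "ex:Report"),
    ("ex:author", "rdfs:range", "foaf:Person"),
]

for f in form_fields(schema, "ex:Report"):
    print(f.label, "->", f.widget)
```

Note that this treats rdfs:domain as a UI hint, which is not what RDFS semantics say it means (there, a domain declaration licenses inferring the subject's type). That gap is exactly the "modelling language without inference" usage described above.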

Signified · January 19, 2012, 11:00pm

One of the most common patterns I see in questions here is "I made an ontology in Protege and now I need to put instance data in... HELP!"

+1.

I've got a sneaking suspicion that many of these people aren't going to do any inference at all with their schemas... The only role the schema plays is human-readable documentation.

0. Maybe in some cases, but then I figure they're not using the standards appropriately. There are much more effective and aesthetically pleasing ways to document vocabularies that don't involve RDFS or OWL, like simple class/property-domain models, or HTML with nice-looking pictures. Further, RDFS and OWL are far too widely misunderstood to form the basis of human-readable documentation.

In a lot of ways I think their perception of RDFS is distorted by experience with relational databases and XML Schema.

+1.

On the other hand, what's unique about RDFS and OWL is that they can be used to define ruleboxes that connect different vocabularies. All of the projects where I've used RDFS and OWL have involved either (i) making sense of large A-Boxes such as DBpedia and Freebase, or (ii) making it possible for people to query a private vocabulary with well-known predicates such as foaf:maker and dcterms:creator. Is anybody else doing this?

If I rephrase it as "do people often start with data and then worry about the model?", then yes! Such a methodology would be particularly common for projects centred around legacy data (of whatever kind). For example, the DBpedia ontology/property model is a direct result of the raw Wikipedia infoboxes and structure.
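One way the "data first, model later" approach can start in practice is by profiling the existing triples to see which properties each class actually uses, then drafting the schema from that. This is a hypothetical sketch of such a step, not how DBpedia built its ontology; the ex: resources are made-up placeholders and triples are plain tuples.

```python
# Sketch of a "data first, model later" step: profile existing instance triples
# to see which properties each class actually uses, as input for drafting a schema.
from collections import Counter, defaultdict

RDF_TYPE = "rdf:type"

def profile_abox(triples):
    """Map each class to a Counter of the properties used by its instances."""
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)
    usage = defaultdict(Counter)
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        # Subjects with no rdf:type are grouped under a placeholder bucket.
        for cls in types.get(s, {"(untyped)"}):
            usage[cls][p] += 1
    return usage

# Hypothetical infobox-style data.
abox = [
    ("ex:cityA", RDF_TYPE, "ex:City"),
    ("ex:cityA", "ex:population", "1000"),
    ("ex:cityA", "ex:mayor", "ex:personX"),
    ("ex:cityB", RDF_TYPE, "ex:City"),
    ("ex:cityB", "ex:population", "2000"),
    ("ex:riverC", "ex:length", "50"),
]

for cls, counts in sorted(profile_abox(abox).items()):
    print(cls, counts.most_common())
```

The counts only suggest candidate properties (here, ex:population is used by every ex:City); deciding domains, ranges and names is still a modelling decision.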

Where legacy data does not exist, of course it makes sense to create a model to encourage the creation of such data; for example, FOAF, voiD, etc. go down this path. Such folks might have an idea of what kind of data they want to model, but the (hopefully generic and flexible) model comes first.

Of course, this is a very Linked-Data-esque perspective. People doing more traditional ontology modelling may be representing most of their knowledge base as T-Box. This is where a tool like Protege excels. Genuine use-cases for full-fledged OWL reasoning are, however, currently far rarer than the number of proposed use-cases suggests. Genuine use-cases should involve complex, rigorously defined legacy models: the types of models that are only prevalent in a few areas such as health care and life sciences, law, and maybe manufacturing.

AntoineZimmermann · January 19, 2012, 11:00pm

One of the first things you learn at school about information systems is that you must design the model of your system before implementing it and before putting data into it. This has nothing to do with relational databases or XML databases. It's a general rule. Indeed, how can you even write a triple if you don't know what predicate to use? Would you mint a new predicate URI for each triple you write, without knowing the relationships between those properties?

So, whether you do an Entity-Relationship model, a UML model, etc., you need some kind of schema (possibly borrowing an existing one, of course). But when you deal with RDF, it's perfectly natural to make your schema in RDFS or OWL, just as you make an XML Schema for XML data. DBpedia is no exception: they had to define a schema first in order to write the right triples. The properties and classes used by DBpedia did not define themselves out of the A-Box triples, still less the relationships between these terms. The fact that DBpedia datasets are built automatically from existing data does not mean that the A-Box was produced before the terms were decided.

All of the projects where I've used RDFS and OWL have involved either (i) making sense of large A-Boxes such as DBpedia and Freebase, or (ii) making it possible for people to query a private vocabulary with well-known predicates such as foaf:maker and dcterms:creator.

Regarding (i), it seems you are talking about using RDF data that already exist, not about producing the A-Box before making the schema, so I don't think it really supports your point. Regarding (ii), you are right: RDFS or OWL can be used to document the terms, and therefore help people who want to query the data. But it's more than that: it makes the definitions of the terms, their nature and their relationships explicit, to people and to software agents.
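As a small illustration of what "explicit to software agents" can mean, the sketch below applies the RDFS rules rdfs2 (domain) and rdfs3 (range): from a schema declaration alone, a program derives rdf:type facts that the data never states. It is a single-pass sketch under simplifying assumptions (tuples of prefixed names, objects are resources rather than literals, no subclass reasoning), and the ex: terms are hypothetical.

```python
# Sketch of RDFS rules rdfs2 (domain) and rdfs3 (range): a software agent can
# derive rdf:type triples from the schema without being told them explicitly.
# Single pass, no rdfs:subClassOf handling; assumes objects are IRIs, not literals.
RDF_TYPE = "rdf:type"

def infer_types(tbox, abox):
    """Return the rdf:type triples entailed by rdfs:domain / rdfs:range declarations."""
    domains = [(s, o) for s, p, o in tbox if p == "rdfs:domain"]
    ranges = [(s, o) for s, p, o in tbox if p == "rdfs:range"]
    inferred = set()
    for s, p, o in abox:
        for prop, cls in domains:
            if p == prop:
                inferred.add((s, RDF_TYPE, cls))  # rdfs2: subject gets the domain class
        for prop, cls in ranges:
            if p == prop:
                inferred.add((o, RDF_TYPE, cls))  # rdfs3: object gets the range class
    return inferred - set(abox)

# Hypothetical schema and data.
tbox = [
    ("ex:authoredBy", "rdfs:domain", "ex:Document"),
    ("ex:authoredBy", "rdfs:range", "foaf:Person"),
]
abox = [("ex:report42", "ex:authoredBy", "ex:alice")]

for t in sorted(infer_types(tbox, abox)):
    print(t)
```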