What are the theoretical underpinnings of the T-Box/A-Box split? It seems divorced from reality to have the description of the world separate from the world itself. Naively, I have heard it said that the situation is similar to the first-order/second-order logic divide, where second-order logic can talk about the logical propositions themselves. Is that a correct analogy?
The split is not necessary and is not really analogous to first-order/second-order logic.
The semantics of OWL (and related description logics) applies to the complete set of assertions and makes no separation of t-box from a-box. Indeed, constructs such as owl:hasValue and owl:oneOf rather blur the distinction anyway, allowing a-box-like assertions to be made using the t-box-like machinery. I don't think there is any mention of a-box/t-box in the OWL (1) semantics document.
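To make the owl:hasValue point concrete, here is a minimal sketch in plain Python, with triples written as tuples rather than loaded through any RDF library. All the ex: names (ex:ItalianWine, ex:madeIn, ex:Italy, ex:Country) are invented for illustration. It simply looks for class-defining triples whose value is an ordinary individual.

```python
# Illustrative triples only: every ex: name here is made up for this sketch.
RDF_TYPE = "rdf:type"

graph = {
    # "ItalianWine is the class of things whose madeIn value is Italy"
    ("ex:ItalianWine", "owl:equivalentClass", "_:r"),
    ("_:r", RDF_TYPE, "owl:Restriction"),
    ("_:r", "owl:onProperty", "ex:madeIn"),
    ("_:r", "owl:hasValue", "ex:Italy"),
    # Plain instance data about the same resource
    ("ex:Italy", RDF_TYPE, "ex:Country"),
}


def named_individuals(graph):
    """Resources typed with a non-OWL class, i.e. ordinary instance data."""
    return {s for (s, p, o) in graph if p == RDF_TYPE and not o.startswith("owl:")}


def class_definitions_naming_individuals(graph):
    """owl:hasValue triples (class machinery) whose value is an individual."""
    individuals = named_individuals(graph)
    return {t for t in graph if t[1] == "owl:hasValue" and t[2] in individuals}


blurred = class_definitions_naming_individuals(graph)
```

Here `blurred` contains the owl:hasValue triple: it is part of a class definition, yet it cannot be stated without naming the individual ex:Italy.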
Even though it is not required, it is sometimes convenient to think in terms of a split.
Firstly, when writing an ontology you are often mostly defining classes and properties that you expect to reuse in lots of different sets of instance data. Thus the ontology is often mostly t-box assertions and you can think of the different instance datasets as a-boxes which will be coupled to the t-box (either explicitly with owl:imports or implicitly in your processing system). However, it is pretty common, and perfectly reasonable, for such ontologies to have a few individuals present (e.g. as symbolic labels) and so not be "pure" t-boxes.
The other convenience revolves around reasoning technology. The tableaux reasoning algorithms used for DL are particularly well suited to reasoning about relationships between classes and originally tended to scale less well when dealing with individuals (e.g. use of nominals, owl:oneOf, was particularly expensive in earlier reasoners, I believe). Rule-based reasoners and databases, on the other hand, are good at dealing with lots of simple assertions about individuals but less good at reasoning with sets. So when checking the consistency of your big ontology you might use a tableaux reasoner, but then when "applying" the ontology to a pile of data you might use a rule reasoner.
This may be considered a nit-picking answer, but since you asked for "necessity" and "theoretic underpinnings", you might still be interested in the following technical aspect.
As Dave said, the split is not technically necessary, and I would like to add that it isn't even always possible in Semantic Web reasoning. You can always do the splitting in OWL DL and its fragments (OWL Lite, OWL EL, etc.), but not in RDFS. For example, the axiom
ex:c rdfs:subClassOf ex:d .
is clearly a TBox axiom in OWL DL but not an ABox axiom. However, in RDFS, this axiom does not only state a subsumption relationship between two sets, as OWL DL does, but it also relates the two resources (individuals!) ex:c and ex:d by the property(!) rdfs:subClassOf. Hence, it is also a property assertion and, therefore, an ABox axiom in RDFS.
In OWL DL, classes are sets, while in RDFS a class is an individual with an associated set, the so-called "class extension" of the individual. The RDF Semantics specification [1] defines several layers of semantic expressivity, where RDFS (the third layer) is a "semantic extension" of RDF (the second layer), and this means that the RDFS semantics inherits all semantic meaning of an RDF graph from the RDF semantics. Now, in the RDF semantics layer, the triple above is only an ordinary property assertion. Only the semantics of RDFS introduces the notion of a class and class-related semantics for the property rdfs:subClassOf. The semantic meaning defined by RDF is not removed from the triple above, though; it is fully retained. Hence, the above can be seen as a mixed ABox/TBox axiom. And, in general, you will /never/ find axioms that are solely TBox axioms in RDFS, since there will always be the additional RDF-inherited semantics for all the RDF triples that encode the axiom.
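The two readings of the triple can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: triples are tuples, the individual ex:x and the helper names `property_assertions` and `apply_rdfs9` are invented here, and the only RDFS rule applied is rdfs9 (type propagation along rdfs:subClassOf), not full RDFS entailment.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# ex:x is a made-up individual added so the schema reading has something to act on.
graph = {
    ("ex:c", SUBCLASS, "ex:d"),
    ("ex:x", RDF_TYPE, "ex:c"),
}


def property_assertions(graph, subject):
    """RDF-layer reading: each triple is just a property assertion about its subject."""
    return {(p, o) for (s, p, o) in graph if s == subject}


def apply_rdfs9(graph):
    """RDFS-layer reading: propagate rdf:type along rdfs:subClassOf to a fixpoint."""
    inferred = set(graph)
    changed = True
    while changed:
        changed = False
        for (c, p, d) in list(inferred):
            if p != SUBCLASS:
                continue
            for (x, q, k) in list(inferred):
                if q == RDF_TYPE and k == c and (x, RDF_TYPE, d) not in inferred:
                    inferred.add((x, RDF_TYPE, d))
                    changed = True
    return inferred


abox_view = property_assertions(graph, "ex:c")  # ex:c treated as an individual
closure = apply_rdfs9(graph)                    # ex:c treated as a class
```

The same triple answers the "what is asserted about the resource ex:c?" question and drives the class-level inference that ex:x is also an ex:d. Neither reading removes the other, which is the mixed ABox/TBox character described above.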
Btw, in the "RDF family" of ontology languages, this phenomenon of ABox/TBox mixture propagates to the level of OWL. For example, the axiom
ex:c owl:equivalentClass ex:d .
is only a property assertion (ABox) in RDFS, but it is a TBox axiom in OWL Full, which introduces class-related semantics for owl:equivalentClass. But since OWL Full is layered on top of RDFS, just in the way RDFS is layered on top of RDF, namely as a semantic extension, the axiom still keeps being an ABox axiom. So, in contrast to OWL DL, in OWL Full you also never have a clear separation between the ABox and the TBox.
Dave mentioned that such a split allows for re-using many A-Box sets of data against a T-Box, but this split also allows re-use in the opposite direction. There can often be many, and sometimes conflicting views of the semantics of any problem space. Having such a split allows you to easily see a single set of instance or A-Box data through the lenses of several different T-Box ontologies and see what the different views infer.
(1) You might want to use the same A-Box with several different T-Boxes or use the same T-Box with different A-Boxes.
For instance, you might want to add some facts to a T-Box to reflect your own point of view, or to support one particular operation. For instance, Freebase distinguishes between a :film.film_actor, a :tv.tv_actor, and a :theatre.theatre_actor. Maybe you don't want to make that distinction, and just union them to create a :UnisexActor. Now maybe you think an :Actor is a :UnisexActor who's male and an :Actress is a :UnisexActor who's female, and you can easily create these classes by applying restrictions.
Now maybe somebody else thinks you're a sexist pig because you use sexist language, but she's got her own disagreements with the Freebase data model, so she's free to make her own T-Box. You can both use the same A-Box, but process them in different ways because you used a different T-Box.
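As a rough sketch of "same A-Box, different T-Boxes", the plain-Python fragment below applies two tiny T-Boxes to one set of type assertions. It assumes a deliberately simple model: an A-Box is a set of (individual, class) pairs, a T-Box is a set of (subclass, superclass) pairs, and the union class is approximated by making each Freebase class a subclass of :UnisexActor. The individuals :alice and :bob and the second T-Box (:ScreenPerformer, :StagePerformer) are hypothetical.

```python
def infer_types(abox, tbox):
    """Return the A-Box plus every type implied by the subclass pairs in the T-Box."""
    types = set(abox)
    changed = True
    while changed:
        changed = False
        for (individual, cls) in list(types):
            for (sub, sup) in tbox:
                if sub == cls and (individual, sup) not in types:
                    types.add((individual, sup))
                    changed = True
    return types


# One shared A-Box (made-up individuals, Freebase-style class names from the text).
abox = {
    (":alice", ":film.film_actor"),
    (":bob", ":theatre.theatre_actor"),
}

# First point of view: lump all three kinds of actor together.
tbox_one = {
    (":film.film_actor", ":UnisexActor"),
    (":tv.tv_actor", ":UnisexActor"),
    (":theatre.theatre_actor", ":UnisexActor"),
}

# Second, hypothetical point of view: keep screen and stage apart.
tbox_two = {
    (":film.film_actor", ":ScreenPerformer"),
    (":tv.tv_actor", ":ScreenPerformer"),
    (":theatre.theatre_actor", ":StagePerformer"),
}

view_one = infer_types(abox, tbox_one)
view_two = infer_types(abox, tbox_two)
```

Under `tbox_one` both individuals come out as :UnisexActor, while under `tbox_two` they land in different classes, and the A-Box itself is never modified.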
On the other hand, a company like Amdocs might need to do some reasoning about a customer, and they might poll a number of enterprise databases to make A-Boxes about particular customers. They might put business rules in a T-Box, and apply the same T-Box to millions of different A-Boxes.
(2) Practically, OWL becomes undecidable if you let a resource be both a class and an instance. OWL DL is a subset of OWL Full that has a few restrictions, including this one. If you think of classes and instances as separate, that's another reason to think of a difference between the A-Box and the T-Box.
For particular applications, separation of classes and instances is easy to live with, but the separation doesn't make sense in terms of commonsense reasoning. For instance, a car parts database might think a :2010_Honda_Fit_Base_Model is an instance which is a member of :2010_Honda_Fit and :Compact_Car categories. On the other hand, the state's car registration database might think that :2010_Honda_Fit_Base_Model is a class and my car is an instance.
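The car example can be sketched in plain Python to show where the class/instance overlap appears. Triples are written as tuples, :my_car is a made-up identifier for "my car", and `class_and_instance` is a hypothetical helper that only looks at rdf:type triples.

```python
RDF_TYPE = "rdf:type"

# Car parts database: the base model is an instance of two categories.
parts_db = {
    (":2010_Honda_Fit_Base_Model", RDF_TYPE, ":2010_Honda_Fit"),
    (":2010_Honda_Fit_Base_Model", RDF_TYPE, ":Compact_Car"),
}

# Registration database: the base model is a class, and my car is an instance of it.
registration_db = {
    (":my_car", RDF_TYPE, ":2010_Honda_Fit_Base_Model"),
}


def class_and_instance(graph):
    """Resources used both as an instance (subject) and as a class (object) of rdf:type."""
    instances = {s for (s, p, o) in graph if p == RDF_TYPE}
    classes = {o for (s, p, o) in graph if p == RDF_TYPE}
    return instances & classes


overlap_separate = class_and_instance(parts_db) | class_and_instance(registration_db)
overlap_merged = class_and_instance(parts_db | registration_db)
```

Each database on its own keeps classes and instances apart, so `overlap_separate` is empty. Only the merged data contains a resource playing both roles, which is exactly the case the class/instance separation rules out.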