Common Semantic Web misconceptions you've encountered?

The question applies to misconceptions you may have encountered on this site, or elsewhere, or even have (had) yourself relating to RDF(S)/OWL/SPARQL/Syntaxes/Linked Data/etc.

I'm motivated by confusion relating to the Open World Assumption and "constraints" in RDFS/OWL. These are misconceptions I had myself when I first saw rdfs:domain and rdfs:range, and then again for owl:cardinality et al., and that I've seen since many times elsewhere. (I wanted to write up some stuff about this before, but got a bit stumped on what else to write about.)
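To make the rdfs:domain confusion concrete, here is a minimal sketch in plain Python. It is not any real RDF library, and the ex: names are made up for illustration. It applies the RDFS domain rule to a triple that a newcomer might expect to be rejected as a "constraint violation". Instead of an error, the rule yields a new inferred type.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"

def apply_domain_rule(triples):
    """RDFS rule rdfs2: (p rdfs:domain C) and (s p o) entail (s rdf:type C)."""
    # Collect (property, class) pairs from the domain declarations.
    domains = {(s, o) for s, p, o in triples if p == RDFS_DOMAIN}
    inferred = set(triples)
    for s, p, o in triples:
        for prop, cls in domains:
            if p == prop:
                inferred.add((s, RDF_TYPE, cls))
    return inferred

# Hypothetical data: ex:hasWife is declared with domain ex:Man,
# then used on something we never said was a man.
data = {
    ("ex:hasWife", RDFS_DOMAIN, "ex:Man"),
    ("ex:acme", "ex:hasWife", "ex:jane"),
}

result = apply_domain_rule(data)
for triple in sorted(result - data):
    print(triple)  # the newly inferred triple(s); nothing is rejected
```

Under the Open World Assumption nothing here is inconsistent: the data never said ex:acme is not an ex:Man, so the domain declaration simply adds that fact.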

Are there any core misunderstandings of RDF & Co. that you've commonly encountered and would like to see set straight? Something beginners get wrong or initially misinterpret? Maybe something the community gets wrong all of the time? Something technical? Something specific? Something general?

How about...

RDF is not XML

XML is a syntax, a mark-up language.
RDF is a model, or framework for describing stuff using triples.
RDF/XML is one RDF syntax based on XML. There are others not based on XML (e.g., Turtle).
RDF is not based on XML.
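As a rough illustration of the model/syntax split, the sketch below keeps one statement as a plain Python tuple and writes it out in two different syntaxes. The example.org URIs and both helper functions are made up for this example, and escaping is deliberately simplified.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical namespace and data: one (subject, predicate, object) triple.
EX = "http://example.org/"
triples = [
    (EX + "alice", EX + "name", "Alice"),  # the object is a plain literal
]

def to_ntriples(triples):
    """Write the triples as N-Triples, a line-based syntax with no XML in it.
    Literal escaping is omitted, so this only suits simple strings."""
    return "\n".join(f'<{s}> <{p}> "{o}" .' for s, p, o in triples)

def to_rdfxml(triples):
    """Write the same triples as RDF/XML.
    Assumes every predicate lives in the EX namespace and every object is a literal."""
    out = [
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"',
        f"         xmlns:ex={quoteattr(EX)}>",
    ]
    for s, p, o in triples:
        local = p[len(EX):]  # local part of the predicate URI
        out.append(f"  <rdf:Description rdf:about={quoteattr(s)}>")
        out.append(f"    <ex:{local}>{escape(o)}</ex:{local}>")
        out.append("  </rdf:Description>")
    out.append("</rdf:RDF>")
    return "\n".join(out)

print(to_ntriples(triples))
print(to_rdfxml(triples))
```

The tuple list is the RDF; the two strings are just different ways of writing it down.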

See: http://www.oreillynet.com/xml/blog/2005/09/the_difference_between_xml_and.html

In fact, some go as far as to say that RDF/XML should be brought round back and shot (or at least maybe deprecated): http://www.phildawes.net/blog/2005/01/07/time-to-deprecate-rdfxml/

Sure. This sounds like a fun game :) Here are a bunch of statements that in my experience are false but are frequently believed:

  • Semantic Web is about unstructured text

(Actually rather untrue, though technologies like NLP often have nice synergies with Semantic Web technologies.)

  • Semantic Web data integration is all about query federation (EII) and never about warehousing/ETL

(I could and likely will write a full blog post on this topic some day, as I've had this conversation countless times and believe this misconception sets unrealistic expectations and actively harms enterprise adoption of Semantic Web technologies.)

  • Linked Data is all about dereferenceable URIs.

(This may be a popular position that I'm claiming is not true, but almost every single Linked Data application I've seen is really a SPARQL application instead, and makes little-to-no use of dereferencing identifiers.)

  • OWL is only useful for reasoning

(Many people find it useful for things as pedestrian as declaratively driving parts of a user interface.)

  • Inference is all about OWL (or RDFS)

(Counterpoint of the above; many people believe inference means ontologies, but I'd suggest that inference can just as easily be accomplished via rules, which in turn can be expressed in something like SPARQL, a la SPIN.)

  • Ontologies require community-wide agreement.

(Ontologies can be developed top-down, bottom-up, or via an as-needed combination of both approaches. They're flexible!)
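On the point above that inference does not have to mean OWL or RDFS, here is a small sketch of rule-based inference in plain Python. The rule plays the role that a SPARQL CONSTRUCT query would play in a SPIN-style setup; the ex: property names and both functions are hypothetical.

```python
def grandparent_rule(triples):
    """If (a ex:parentOf b) and (b ex:parentOf c), derive (a ex:grandparentOf c)."""
    parent = [(s, o) for s, p, o in triples if p == "ex:parentOf"]
    return {
        (a, "ex:grandparentOf", c)
        for a, b in parent
        for b2, c in parent
        if b == b2
    }

def forward_chain(triples, rules):
    """Apply every rule repeatedly until no new triples appear (a fixpoint)."""
    known = set(triples)
    while True:
        new = set()
        for rule in rules:
            new |= rule(known) - known
        if not new:
            return known
        known |= new

facts = {
    ("ex:ann", "ex:parentOf", "ex:bob"),
    ("ex:bob", "ex:parentOf", "ex:cid"),
}
closure = forward_chain(facts, [grandparent_rule])
for triple in sorted(closure - facts):
    print(triple)
```

No ontology is involved here; the rule is just a pattern over triples that produces more triples.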

OK, I'm sure I could go on, but I'll just throw those out there.

Lee

Thanks to RDF/OWL/the Semantic Web, machines have access to the meaning of terms/documents.

While Semantic Web technologies may help computers do very useful and "meaningful" processing of data, there is no way to put the meaning of things into a computer. I often hear people say "ontologies describe the meaning of things". Maybe they can serve to convey the meaning of things to humans, but from a machine's perspective, an ontology is just a logical theory. It can be used to compute new information. It's like a programme. It does not make computers aware of what the terms refer to. There may even be Web ontologies that are not accurate but are still useful for processing data.
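One way to see this is the sketch below, which uses a toy class hierarchy and names invented for the example. It computes the transitive closure of subclass statements twice: once with readable names and once with every term replaced by an opaque symbol. The machine derives exactly the same conclusions either way, because it only ever manipulates the symbols.

```python
def subclass_closure(pairs):
    """Transitive closure of (sub, super) pairs, as in RDFS rule rdfs11."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        # Iterate over snapshots so the set can grow inside the loop.
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

# Hypothetical hierarchy with human-readable names.
readable = {("Dog", "Mammal"), ("Mammal", "Animal")}

# The same hierarchy with names that mean nothing to a human reader.
rename = {"Dog": "x1", "Mammal": "x2", "Animal": "x3"}
opaque = {(rename[a], rename[b]) for a, b in readable}

inferred_readable = subclass_closure(readable)
inferred_opaque = subclass_closure(opaque)

# Renaming the readable results gives exactly the opaque results.
print({(rename[a], rename[b]) for a, b in inferred_readable} == inferred_opaque)
```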

Unfortunately, the term "semantic", which refers to "meaning", is abused in the phrase "Semantic Web". To some extent, the Web of HTML can provide a notion of meaning to computers just as well, thanks to the googol of correlations between words that can be found in billions of interlinked textual documents. Hopefully, RDF, OWL, etc. can help achieve this, but it is in no way a feature of the so-called "Semantic Web".

Another misconception seems to arise from the term "RDF graph". It is sometimes misinterpreted to mean a graph in the computer science sense, where graph traversal algorithms are what you do with graphs. While an RDF graph is a graph in the traditional sense, it seems to defy some expectations.

Symptoms often include asking questions about whether the graph is directed, fully connected, cyclic, what tools are used to visualize the graph, finding the shortest path between nodes, etc.

The cure, from my experience, is twofold:

  1. The two major pieces of an RDF graph are the triple (the basic unit of data) and the URI, which identifies a resource. Two resources with the same URI are, by definition, the same thing. Hence a graph can be constructed with properties acting as graph edges. The triple is the basic building block, and any graph can be constructed from a set of triples.
  2. The way to "traverse" the graph is through the SPARQL query language. But instead of graph traversal, the paradigm is to define a "graph pattern" that matches sub-graphs in the data.
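The "graph pattern" idea in the two points above can be sketched in a few lines of plain Python. This is a toy matcher written for illustration, not how a real SPARQL engine is implemented, and the ex: names are made up. Terms starting with "?" are variables, and a query is a list of triple patterns that must all match under the same variable bindings.

```python
def is_var(term):
    """Pattern terms that start with '?' are variables."""
    return term.startswith("?")

def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None on failure."""
    binding = dict(binding)  # copy, so the caller's binding is left untouched
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind the variable, or check it agrees with an earlier binding.
            if binding.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return binding

def query(patterns, triples):
    """Return every binding under which all patterns match some triple."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions

# Hypothetical data: a set of triples is the whole graph.
data = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:bob", "ex:name", "Bob"),
]

# Roughly: SELECT ?x ?n WHERE { ex:alice ex:knows ?x . ?x ex:name ?n }
results = query([("ex:alice", "ex:knows", "?x"), ("?x", "ex:name", "?n")], data)
print(results)
```

Note that nothing here walks from node to node; the query describes the shape of a sub-graph and the matcher finds every place in the data where that shape occurs.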

The cure is often followed by the realization that RDF should be viewed as a data structure that one can query for information, just like one uses SQL to query relational tables. And this would be in alignment with Berners-Lee's original vision of moving from a "web of hyperlinks" to a "web of data". It's not just hyperlinks, but a queryable data structure.

'Semantic Web is too complicated' is one I often hear.

I can only really speak from the software development side of things, but in comparison to a lot of things software developers end up learning these days, the Semantic Web actually makes life easier in all sorts of ways. You can also be confident that what you are learning won't be deprecated as quickly as other things, and that it has the advantage of being standardised, vetted and, to an extent, futureproofed by a respected process and genuinely passionate people.

In the scheme of things, I actually think the success of HTML (above and beyond what it should have been) might have detrimentally impacted RDF, as it became ingrained as 'the representation' for the web, and accordingly practices, processes and ideas were based on slightly narrow-minded preconceptions of what the web was or could be.

If RDF and the Semantic Web had been presented as 'the web', and distinctions had not been made between the Semantic Web and the Web, then things might have been different. I know that if I were to teach people about the web, I wouldn't bother to distinguish between the two, and RDF, not HTML, would be predominant.