Are semantic technologies ready for commercial use?

Hi, I would like to introduce myself. I'm a student, and the thesis of my work is to show that semantic technologies are ready for commercial use. I'm creating this analysis in cooperation with a company which is interested in the technology, but for them it's necessary that semantic technologies be competitive.

Therefore, please compare the technology with the technologies in common use today (performance, effectiveness, stability, speed of development and price of e.g. relational databases (Oracle, MS SQL), Cassandra...) against semantic databases, etc. I know about many scientific products (DBpedia) and also some small business products (small meaning that maybe only 1% of the product runs on semantic technologies and the rest is still the old-fashioned kind). I want to find out whether the lack of interest in semantic technologies is due to low maturity and big performance issues, or due to programmers' lack of knowledge.

The company wants to build systems based on semantic technologies from the ground up. Not just a part of the system (e.g. only FOAF), but the whole data structure and everything around it. Maybe it is a bit restrictive, but the company has already chosen OWLIM to work with, so any experience with it is appreciated. Keep in mind that the main commodity for the company is money! They need a 100% stable system which can handle millions of triples and thousands of requests per day, plus some basic reasoning (especially forward inferencing, hence OWLIM); development has to be fast enough, and there has to be added value for the customer which will give the company an advantage on the market. Simply put, it has to have an equal or better ratio than current products/technologies. Thank you for your answers, Marek

Edit: I probably have to be more specific, because this discussion is going in a direction I didn't intend.

  1. You are mentioning publishing RDF data on the Web, but that was NOT my point. According to my quick research, the majority of webpages don't provide RDF data. Maybe they will in the future, but right now they don't.
  2. In my view, the SW has two major advantages over relational database systems: ease of integration, and reasoning, which can bring added value to stored data. But... there are some issues (mentioned later).
  3. You can do reasoning over data, which is a big advantage of the SW, but if you try LUBM 1000 reasoning with the OWL2-RL ruleset you will see what I'm talking about (hours of reasoning on an average machine). I know it is just a generic test and you don't need to use the OWL2-RL ruleset, but as a rough measurement it serves well.
  4. Integration is really faster and better, but if I have to spend hours on optimization to make things run fast, is it worth it?
  5. Maturity - e.g. I ran the SPARQL query SELECT (COUNT(*) AS ?c) WHERE { conditions } and it returned 12000. Then I ran INSERT { someTriple } WHERE { sameCondition }, checked it with SELECT (COUNT(*) AS ?c) WHERE { someTriple }, and it returned 11800. Why did 200 triples not get INSERTed when the condition for inserting is the same as the condition for counting? Bug. (A minimal sketch of this count-then-insert pattern appears after this list.)
  6. Maturity - Using MINUS in SPARQL takes an absolutely insane amount of time.
  7. Sesame Console on Java 7? I can't run it.
  8. According to the standard there were several ways to write some queries, but only a few worked correctly... etc.
  9. I'm mentioning only the bad experiences on purpose, just to explain that these basic features of the technology still don't behave absolutely bug-free. I DO NOT say everything is buggy, unstable, etc. This is just a counterpoint to some comments which make it seem as if everything is fine with the technology and I'm wrong for even asking such a question.
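
For reference, here is a minimal sketch of the count-then-insert pattern from point 5, written against an in-memory Apache Jena model with made-up example.org data rather than OWLIM, so it only illustrates the pattern, not the bug itself; in a correctly behaving store the two counts must agree:

```java
// Sketch only: in-memory Apache Jena model and invented example.org data,
// standing in for the real OWLIM repository and real conditions.
import org.apache.jena.query.*;
import org.apache.jena.rdf.model.*;
import org.apache.jena.update.UpdateAction;

public class CountInsertCheck {
    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();
        String ns = "http://example.org/";
        Property name = model.createProperty(ns, "name");
        model.createResource(ns + "a").addProperty(name, "Alice");
        model.createResource(ns + "b").addProperty(name, "Bob");
        model.createResource(ns + "c"); // no name, so it never matches

        // Count the distinct subjects that match the condition.
        String count = "PREFIX ex: <http://example.org/> "
                     + "SELECT (COUNT(DISTINCT ?s) AS ?c) WHERE { ?s ex:name ?n }";
        System.out.println("matches before insert: " + runCount(count, model));

        // Insert one flag triple for every subject matching the *same* condition.
        String insert = "PREFIX ex: <http://example.org/> "
                      + "INSERT { ?s ex:checked true } WHERE { ?s ex:name ?n }";
        UpdateAction.parseExecute(insert, model);

        // The number of flag triples should equal the count above.
        String verify = "PREFIX ex: <http://example.org/> "
                      + "SELECT (COUNT(*) AS ?c) WHERE { ?s ex:checked true }";
        System.out.println("flag triples after insert: " + runCount(verify, model));
    }

    static int runCount(String query, Model model) {
        QueryExecution qe = QueryExecutionFactory.create(query, model);
        int c = qe.execSelect().next().getLiteral("c").getInt();
        qe.close();
        return c;
    }
}
```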

Maybe it seems so, but I'm NOT against the technology. (I'm doing this because I truly like the way the SW works.) I just want to point out some things and start a productive discussion about the topic. Please don't consider it a flamewar, but a productive discussion of all the pros and cons, with some conclusion. Thank you for all the answers (especially Signified's and database_animal's).

Here are my thoughts.

I think SPARQL 1.1 was a necessary step for semantic web commercialization. Before 1.1, a SQL developer could easily look at SPARQL as being hopelessly inferior to SQL. Now that's not the case.
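
To make that concrete, here is a minimal sketch, with made-up example.org/FOAF data and an in-memory Apache Jena model standing in for whatever store you actually use, of the kind of aggregation query a SQL developer takes for granted and SPARQL 1.0 simply could not express:

```java
// Sketch only: invented data, in-memory Jena model instead of a real store.
import org.apache.jena.query.*;
import org.apache.jena.rdf.model.*;

public class Sparql11Aggregates {
    public static void main(String[] args) {
        Model m = ModelFactory.createDefaultModel();
        Property knows = m.createProperty("http://xmlns.com/foaf/0.1/", "knows");
        Resource a = m.createResource("http://example.org/alice");
        Resource b = m.createResource("http://example.org/bob");
        Resource c = m.createResource("http://example.org/carol");
        m.add(a, knows, b).add(a, knows, c).add(b, knows, c);

        // Aggregation + GROUP BY + ORDER BY: routine in SQL, new in SPARQL 1.1.
        String q = "PREFIX foaf: <http://xmlns.com/foaf/0.1/> "
                 + "SELECT ?person (COUNT(?friend) AS ?friends) "
                 + "WHERE { ?person foaf:knows ?friend } "
                 + "GROUP BY ?person ORDER BY DESC(?friends)";

        QueryExecution qe = QueryExecutionFactory.create(q, m);
        ResultSetFormatter.out(qe.execSelect());
        qe.close();
    }
}
```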

Compared to relational products, I think native triple stores are relatively strong for "data warehousing" applications and weak for OLTP. I work with billion-triple data sets, and if I wasn't doing batch processing and bulk loading I just wouldn't be able to do the stuff that I'm doing.

Products like OWLIM that use forward-chaining inference are very good at some things -- the BBC World Cup site is a great example. There are other, backward-chaining products such as Virtuoso and Revelytix that let you do SPARQL queries with inference on a live relational database... and that's compelling too.
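
If you have never seen what inference buys you, here is a minimal sketch using Jena's bundled RDFS reasoner (not OWLIM or Virtuoso) and an invented example.org schema loosely inspired by the World Cup case: the instance data never says the match is a SportsEvent, yet the inference model answers that it is.

```java
// Sketch only: Jena's generic RDFS reasoner and made-up example.org classes.
import org.apache.jena.rdf.model.*;
import org.apache.jena.vocabulary.RDF;
import org.apache.jena.vocabulary.RDFS;

public class RdfsInferenceSketch {
    public static void main(String[] args) {
        String ns = "http://example.org/";

        // Schema: every FootballMatch is a SportsEvent.
        Model schema = ModelFactory.createDefaultModel();
        Resource footballMatch = schema.createResource(ns + "FootballMatch");
        Resource sportsEvent = schema.createResource(ns + "SportsEvent");
        schema.add(footballMatch, RDFS.subClassOf, sportsEvent);

        // Data: the match is only declared to be a FootballMatch.
        Model data = ModelFactory.createDefaultModel();
        Resource match = data.createResource(ns + "match42");
        data.add(match, RDF.type, footballMatch);

        // The reasoner derives the extra rdf:type triple for us.
        InfModel inf = ModelFactory.createRDFSModel(schema, data);
        System.out.println("match42 is a SportsEvent: "
                + inf.contains(match, RDF.type, sportsEvent)); // true
    }
}
```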

What's great about RDF, RDFS and OWL is that they can be implemented with different technologies that have different performance characteristics. The standards mean that different products are compatible so you can choose one standard and have a lot of choices.

One delusion people labor under is the idea that they're going to have "a triple store." More likely, you're going to have a lot of triple stores. I use little in-memory triple stores (Jena) the same way people use hashtables in PHP. If I put all of my data into one big triple store, I'd have terrible data management problems. I'm very happy to use different products for different purposes -- it's better to say you're committed to RDF than to say you're committed to one particular triple store from one vendor.
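
Here is a minimal sketch of what I mean by using a small in-memory store like a hashtable, assuming a current Apache Jena on the classpath and made-up example.org data:

```java
// Sketch only: a throwaway in-memory Jena model used as an associative array.
import org.apache.jena.rdf.model.*;

public class ModelAsLookup {
    public static void main(String[] args) {
        Model m = ModelFactory.createDefaultModel();
        String ns = "http://example.org/";
        Property capital = m.createProperty(ns, "capital");

        // "put"
        m.createResource(ns + "France").addProperty(capital, "Paris");
        m.createResource(ns + "Germany").addProperty(capital, "Berlin");

        // "get"
        Statement s = m.getResource(ns + "France").getProperty(capital);
        System.out.println(s.getString()); // Paris

        // Unlike a hashtable, the same data can be merged with other models,
        // queried with SPARQL, or written out as RDF whenever that's useful.
        m.write(System.out, "TURTLE");
    }
}
```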

Other than that, I think a lot of software developers aren't that interested in trying new things. RDF is deceptively simple, which has a few consequences. A good RDBMS developer will look at RDF and see what's missing -- I know I did. Relational technology is entrenched because it's very well-developed. It's very easy to misread RDFS and OWL and not understand the real value of inference.

I think that many of the people who got into RDF from the academic side don't have a real appreciation of what it takes to make commercial products, and even a lot of people from the business world don't. A lot of people have the idea that they only need to do the 20% of the work that gets you 80% of the way there, and don't want to do the fit-and-finish work that makes a customer hand over money for your product because it "just works" and they don't need to pay developers to deal with B.S.

I met an aspiring actor at a supermarket in Hollywood and we talked about his work and some of my misadventures working with an amateur filmmaker. I told him about the corners I thought I could get away with cutting and he told me, no, you can't cut corners if you want to be a pro. He's right.

I'm wondering, how does one know that a technology is ready for commercial use? I believe that there are sufficient examples of companies using Semantic Web technologies to say that it is commercially ready. It's not only about using the triple stores and other software applications; it's also about using the formats (RDF(S), RDFa, OWL). Facebook, The New York Times, Bestbuy.com, Tesco, Volkswagen, the BBC, etc. publish RDF data online. You may think that this is just putting something in a certain format on the Web and that it does not qualify as "a commercial use of the technology". But think about it: if data is overwhelmingly put online in RDF and interlinked, then no matter what you think of the technologies, if you want to build a startup that processes Web data at large, you need to take advantage of the growing Linked Data Cloud. So the fact that big companies publish RDF certainly creates huge commercial opportunities for RDF technologies.

Starting from there, the comparison with "old school" technologies becomes meaningless. If you want to process RDF data, you have to use an RDF processor. Sure, you could store RDF data in a relational database, but benchmarks show that native RDF triple stores perform better on common SPARQL queries. In any case, I find it inappropriate to compare Semantic Web technologies to "traditional" database technologies. They are not competing against each other. A relational database is exactly what most companies need for their enterprise information systems. Semantic Web technologies are suitable for integrating, querying and processing data from multiple, mostly independent sources, or for publishing data online to increase integration opportunities. The BBC use case explains it well: it was simply too cumbersome to do what they do with traditional technologies.

If I may use a comparison, I would say comparing relational DBs with triple stores is like comparing the language C with JavaScript. Today, it doesn't make sense to write a C program to handle the dynamic behaviour of a Web page. Yet C is probably a more efficient, stable and better-known language than JavaScript.

A final remark regarding scalability. It's often said that semantic technologies will fail as long as they cannot scale. While scalability is indeed a very important issue for Semantic Web technologies, it's certainly not scalability alone that can prevent those technologies from succeeding. There are many successful commercial applications that are not scalable. Just a few examples: MP3 player software is not capable of loading a 1,000,000-track library; Web browsers are not able to load even a millionth of a per cent of the Web. But this is not a problem, because who needs to manage a 1,000,000-track music library? Who needs to load many thousands of Web pages simultaneously? Originally, RDF was meant to be used as a format for small metadata chunks to go with documents. Even though RDF has since become a universal data exchange format, there are certainly use cases remaining that simply ask for processing small data chunks. And small data chunks abound on the Web.

Before I add to the answer, it's worth looking at the success stories of the Semantic Web to see that many large organisations have been successfully employing Semantic Web technology:

http://answers.semanticweb.com/questions/1533/what-are-the-success-stories-of-the-semantic-weblinked-data

To further add to the above, try dereferencing any Facebook Graph URI using a text/turtle Accept header and you will get back RDF. So that's all of Facebook. (EDIT: Also worth noting is Experian's recent acquisition of Garlik, a massively Semantic Web-backed company, mainly for its cutting-edge technology.) The point is that big corporations are already using RDF, RDFa and SPARQL.
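
For anyone who wants to try that dereferencing trick, here is a rough sketch, assuming Apache Jena for the parsing and using a DBpedia URI purely as a stand-in (any Linked Data URI that honours content negotiation will do, and it obviously depends on the remote server being up):

```java
// Sketch only: generic Linked Data dereferencing with an Accept header;
// the DBpedia URI is just an illustrative stand-in.
import java.net.HttpURLConnection;
import java.net.URL;
import org.apache.jena.rdf.model.*;

public class DereferenceTurtle {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://dbpedia.org/resource/Berlin");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Accept", "text/turtle");
        conn.setInstanceFollowRedirects(true);

        // Parse whatever Turtle comes back into an in-memory model.
        Model m = ModelFactory.createDefaultModel();
        m.read(conn.getInputStream(), url.toString(), "TURTLE");
        System.out.println("Triples retrieved: " + m.size());
    }
}
```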

@database_animal makes some good points about SPARQL. SPARQL 1 wasn't really very useful; SPARQL 1.1, with all its pieces, is. The big problem is that parts of it are still working drafts and look pretty immature, and, more importantly, very few vendors have really implemented quite important parts (the federation extensions are awesomely useful, as would be a standardised update protocol).
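
To show why the federation extensions matter, here is a rough sketch of a SPARQL 1.1 SERVICE query, assuming Jena's ARQ engine on the client side and the public DBpedia endpoint being reachable (the endpoint and resource are just illustrative):

```java
// Sketch only: SPARQL 1.1 federation via SERVICE, executed locally with ARQ;
// the DBpedia endpoint is used purely as a public example.
import org.apache.jena.query.*;
import org.apache.jena.rdf.model.ModelFactory;

public class FederatedQuerySketch {
    public static void main(String[] args) {
        String q = "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
                 + "SELECT ?label WHERE { "
                 + "  SERVICE <http://dbpedia.org/sparql> { "
                 + "    <http://dbpedia.org/resource/Berlin> rdfs:label ?label . "
                 + "    FILTER (lang(?label) = 'en') "
                 + "  } "
                 + "} LIMIT 1";

        // The pattern inside SERVICE is shipped to the remote endpoint; the
        // results come back and are handled like any local result set.
        QueryExecution qe = QueryExecutionFactory.create(q, ModelFactory.createDefaultModel());
        ResultSetFormatter.out(qe.execSelect());
        qe.close();
    }
}
```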

So in my mind it is still early days, but the technology is definitely ready for adoption in almost any project I come across, and it can and should be used extensively to substitute for many traditional technologies (e.g. a triple store + SPARQL instead of an RDBMS). Back when the Web was starting out, there were lots of people saying it wouldn't work, and many software engineers who didn't care much for it were happy to carry on regardless using closed proprietary systems, etc. It took a while to become mainstream and bore the brunt of many cowboy builders and Machiavellian marketeers, many of whom still exist in the same industry. As database_animal states, it doesn't lend itself to people who want to cut corners, and in many projects that's just what happens.

There are other things you can draw similarities to. It took a long time for people to acknowledge Web accessibility guidelines, and even now these get ignored or are only enforced by law. The Semantic Web might go down the same route (public bodies by law having to make their data open and accessible). Added to that, the Semantic Web, although it isn't mainstream fashionable technology, always seems to improve and enhance fashionable technology, suggesting there's still nothing else that achieves the things it does right here and now (not to mention the things in store for the future). There is a funny report that comes out every so often (I can't remember the name) that lots of developers read, which basically tells people what they should and shouldn't be using. Not once have I seen the Semantic Web or RDF etc. on there, but for nearly everything in there I think the Semantic Web solves the problem.

I present the Semantic Web to people quite often in my (new) workplace or at user groups. I did this recently and was surprised by the results. Quite a few people didn't attend and some were disdainful. But those that did attend really learnt something. A few, to my complete surprise, actually went off and started to use RDF and SPARQL, without any intervention from myself, to the point that we almost got a project out of it (we might still do so). I was also surprised when someone who attended sent me the link below, which showed some really interesting stuff coming out of Microsoft (and, even more importantly, showed they appreciated some quite fascinating use cases of the Semantic Web):

http://www.youtube.com/watch?v=-SGPEUuG1I8

This says to me that the Semantic Web will also emerge from people's activities without them even knowing they are contributing to it. It also says to me that there is some really good stuff close to being released that will change the landscape completely.