I'm building a semantic search system over DBpedia data. I have nearly finished the algorithm work, but it turns out that the DBpedia linked data alone is not enough to answer many questions, for example: populations, Grammy winners, presidents of each country, etc.

So I searched around and found Freebase, LinkedMDB, and others on this page.

I was wondering what the most scalable way is to merge open linked data together. For example, my project currently supports searching literals and predicates from DBpedia. If I add Freebase, there might be redundant literals, or new predicates that did not exist before. I know that linked data is designed for integration, so what is the best practice for integrating new open linked databases?
I see two strategies.
You can do a data cleaning stage, like the data cleaning done in enterprise data warehousing. Then you've got a fairly normal database that you can do simple SPARQL queries against and get the right answer.
If you merged DBpedia and Freebase, for instance, you'd translate the two of them into a common vocabulary and reconcile any differences before you write queries. If you had two different population numbers from the two data sources, for instance, you'd pick one at ingestion time.
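As a rough sketch of that ingestion-time reconciliation (the source dictionaries, entity keys, and precedence rule below are all hypothetical, just standing in for real extracts from DBpedia and Freebase):

```python
# Hypothetical per-source extracts, already keyed by a shared entity ID
# and translated into one common vocabulary ("population", "capital").
dbpedia = {"Egypt": {"population": 104_000_000}}
freebase = {"Egypt": {"population": 102_500_000, "capital": "Cairo"}}

def merge_at_ingestion(primary, secondary):
    """Merge two sources into one record per entity; when both sources
    assert a value for the same predicate, the primary source wins.
    This is the simplest possible conflict rule -- pick one at load time."""
    merged = {}
    for entity in primary.keys() | secondary.keys():
        record = dict(secondary.get(entity, {}))  # start from secondary
        record.update(primary.get(entity, {}))    # primary overrides on conflict
        merged[entity] = record
    return merged

merged = merge_at_ingestion(dbpedia, freebase)
# merged["Egypt"] keeps DBpedia's population and gains Freebase's capital
```

After a pass like this you have one consistent record per entity, so ordinary SPARQL (or any query) over the merged store returns a single answer.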
The "clean up front" strategy is the most conservative and the safest bet for something you could do quickly with current technology.
An alternative approach is to "clean at query time": here you'd have a system that analyzes your query and recognizes that different paths could be required to get the data from Freebase and from DBpedia. It automatically tries both of those paths, and if it gets contradictory results, it reconciles them after running the query.
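A minimal sketch of the query-time approach might look like the following; the per-source predicate names and the "first path wins" resolution policy are assumptions for illustration, not real DBpedia/Freebase identifiers:

```python
# Hypothetical raw stores, each keeping its own vocabulary
# (no up-front cleaning; the predicate names differ per source).
dbpedia = {("Egypt", "populationTotal"): 104_000_000}
freebase = {("Egypt", "population_number"): 102_500_000}

# One "query path" per source: each path knows how to phrase the same
# logical question ("population of X?") in that source's vocabulary.
paths = [
    lambda e: dbpedia.get((e, "populationTotal")),
    lambda e: freebase.get((e, "population_number")),
]

def query_population(entity):
    """Try every path, collect all non-empty answers, then resolve any
    contradiction after the query (here: prefer the first path, but
    keep all candidates so the caller can see the disagreement)."""
    answers = [v for path in paths if (v := path(entity)) is not None]
    if not answers:
        return None, []
    return answers[0], answers

value, candidates = query_population("Egypt")
# value is the chosen answer; candidates shows both sources disagreed
```

The hard part this sketch glosses over is exactly what the answer says: deciding *which* path applies to a given query, and *how* to resolve the contradiction, generally needs its own knowledge base.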
The latter is an interesting research topic, but if you asked me how to do it, I'd tell you that it would probably need a large knowledge base to make decisions about what to do, and that knowledge base would overlap quite a bit with the cleaned DBpedia/Freebase merge needed for the other approach.