Anonymisation methods for RDF?

There seems to be a lot of research on anonymisation methods for graph-structured data to protect privacy in online social networks (Zhou et al. 2008), but where can I find similar methods for anonymising RDF, and for privacy in the Semantic Web in general?

It seems to me that privacy preservation in the Linked Data model is even more complex, given the following (I'm new to this, so please correct me):

  1. RDF and OWL are more sensitive to the information loss that typically occurs with clustering methods. Masking nodes and edges that carry sensitive attributes, and modifying the graph structure, significantly compromises the certainty of inferencing and description logic (DL) reasoning.
  2. Federated SPARQL queries and the Linked Data model give attackers even more leverage (and convenience) in using their background knowledge to target individuals.
  3. Some features of the RDF model, such as directionality (in graph terms), are not covered by current anonymisation methods.

For someone who wants to publish anonymised personal data as Linked Data, if only to create FOAF Persons and attributes (on behalf of my friends :), where do I turn?

Good question. My gut says that anonymisation at the infrastructural level of RDF/RDFS/OWL would be difficult, and that it would be up to domain experts to make informed decisions. My initial impression was that, as a crude analogy, it would be like asking how to anonymise JSON ... i.e., it depends on the JSON.

On the other hand, there's an interesting parallel between de-anonymisation techniques and "entity disambiguation" or "sameAs mining", which try to resolve identity by looking for potentially complex or approximate "keys" (e.g., birthday and full name, or username and site). So maybe you could take that research and invert it. :)
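
To make the "invert it" idea a bit more concrete, here is a rough sketch, not an established method: treat a set of predicates as a candidate "key" and flag every subject whose combination of values is shared by fewer than k subjects, since those are the ones a sameAs-style matcher could most easily pin down. Everything in it is an assumption for illustration: the triples are plain Python tuples rather than a real RDF store, the data is made up, `ex:city` is a hypothetical predicate, `risky_subjects` is a hypothetical helper, and each subject is assumed to have at most one value per predicate.

```python
from collections import defaultdict

# Toy triples as (subject, predicate, object). All identifiers and values
# are invented for illustration; "ex:city" is a hypothetical predicate.
TRIPLES = [
    ("_:p1", "foaf:birthday", "03-14"),
    ("_:p1", "ex:city", "Dublin"),
    ("_:p2", "foaf:birthday", "03-14"),
    ("_:p2", "ex:city", "Dublin"),
    ("_:p3", "foaf:birthday", "07-01"),
    ("_:p3", "ex:city", "Galway"),
]


def risky_subjects(triples, key_predicates, k=2):
    """Return subjects whose values for key_predicates are shared by
    fewer than k subjects, i.e. the easiest ones to re-identify.

    Assumes at most one value per (subject, predicate); a later value
    overwrites an earlier one. Subjects with none of the key predicates
    are ignored.
    """
    # Collect each subject's values for the candidate key predicates.
    values = defaultdict(dict)
    for s, p, o in triples:
        if p in key_predicates:
            values[s][p] = o

    # Group subjects by their combined key value.
    groups = defaultdict(list)
    for s, props in values.items():
        key = tuple(props.get(p) for p in key_predicates)
        groups[key].append(s)

    # Subjects in groups smaller than k stand out.
    return sorted(s for members in groups.values()
                  if len(members) < k for s in members)


flagged = risky_subjects(TRIPLES, ["foaf:birthday", "ex:city"])
print(flagged)
```

Here only `_:p3` has a birthday/city combination that nobody else shares, so it is the one flagged. This only checks attribute combinations; it says nothing about re-identification through graph structure or through linking to other datasets, which is the harder part of the original question.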

/2 cents

Wouldn't this be as easy as making your RDF data not public? And if you have a SPARQL endpoint, you would make that private as well.

Otherwise it doesn't make much sense to use Linked Data, which is all about collaboration and sharing data, if you're not willing to share.