How will the semantic web counter spam?

steevc · November 1, 2009, 11:00pm

The current web is heavily polluted with various forms of spam. Can semantic technologies help? If the spammers start using fake metadata then it could reduce the usefulness of what we currently have.

Could we make use of personal networks to assign reputation to sites and people? I've heard of people using their FOAF network as an email filter, i.e. only accepting incoming email from people in their extended network. Google's social search can give you search results using network data in your profile, which can be based on FOAF and XFN data.
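As a rough illustration of the FOAF-as-email-filter idea, here is a minimal Python sketch. It assumes the foaf:knows links have already been extracted into a plain dictionary; the names `KNOWS`, `extended_network`, `accept_mail` and the two-hop limit are made up for this example and do not come from any FOAF library.

```python
from collections import deque

# Hypothetical foaf:knows graph, keyed by mailbox URI.
# In practice this would be built by crawling FOAF documents.
KNOWS = {
    "mailto:alice@example.org": ["mailto:bob@example.org"],
    "mailto:bob@example.org": ["mailto:carol@example.org"],
    "mailto:carol@example.org": ["mailto:dave@example.org"],
}


def extended_network(knows, me, max_hops=2):
    """Return everyone reachable from `me` within `max_hops` knows-links."""
    seen = {me}
    queue = deque([(me, 0)])
    while queue:
        person, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in knows.get(person, ()):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    seen.discard(me)
    return seen


def accept_mail(sender, knows, me, max_hops=2):
    """Accept a message only if the sender is in the extended network."""
    return sender in extended_network(knows, me, max_hops)


if __name__ == "__main__":
    me = "mailto:alice@example.org"
    for sender in ("mailto:carol@example.org", "mailto:dave@example.org"):
        print(sender, accept_mail(sender, KNOWS, me))
```

Note that sender addresses are easy to forge, so a filter like this would only be meaningful when combined with some form of sender authentication.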

This question was partly inspired by the classic Metacrap article.

I know I keep mentioning FOAF and XFN, but those are the semantic technologies which I have played with most.

IanDavis · November 1, 2009, 11:00pm

I wrote recently about some of the vectors by which spam could attack the semantic web. I don't offer any solutions, but I think it's important to understand the vulnerable areas first. See http://iandavis.com/blog/2009/09/linked-data-spam-vectors

Andrew · November 1, 2009, 11:00pm

Perhaps the semantic web will be better able to accommodate advertising if the advertising has to be semantically relevant to gain linkage. The irritation about spam (discounting malware for a moment) is that the spam is frequently irrelevant. If it weren't irrelevant, I doubt people would be so irritated by it.

Obviously, there's nothing to stop spammers from, say, scanning the ontology referenced in some content and then randomly sprinkling the ontology's concepts all over their own advertisement. To detect that, you'd need something a bit more intelligent than a baseline semantic web application. Another evolutionary pressure towards general purpose AI.

I guess it depends what form of spam you're referring to:

unsolicited emails
comment spam
irrelevant wiki edits
semi-relevant answers that plug specific products...

Each of these occurs in a different context, and that context determines how relevant a contribution has to be in order to be acceptable.

MichelvanTol · November 1, 2009, 11:00pm

I believe one measure that is worth mentioning is the SHA-1 hash of the mailbox in FOAF (foaf:mbox_sha1sum). Because the hash is one-way, it hides the address from harvesters, but obviously it can only be used to check whether an address you already know corresponds to it. Whether it really does counter spam, I don't know...
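To make the "check a known value" point concrete, here is a small Python sketch using only the standard library. FOAF defines foaf:mbox_sha1sum as the SHA-1 of the mailbox URI, including the mailto: prefix; the function names `mbox_sha1sum` and `matches` are invented for this example, and the sketch deliberately does no case normalisation of the address, which is an assumption on my part.

```python
import hashlib


def mbox_sha1sum(email):
    """Hex SHA-1 of the mailbox URI, in the style of foaf:mbox_sha1sum."""
    email = email.strip()
    # The hash is taken over the full URI, so add the scheme if missing.
    uri = email if email.startswith("mailto:") else "mailto:" + email
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()


def matches(known_email, published_sum):
    """True if an address we already know hashes to the published value."""
    return mbox_sha1sum(known_email) == published_sum.strip().lower()


if __name__ == "__main__":
    # Hypothetical published value, computed here rather than hard-coded.
    published = mbox_sha1sum("alice@example.org")
    print(matches("alice@example.org", published))
    print(matches("mallory@example.org", published))
```

The published hash cannot be turned back into an address, but anyone holding a list of candidate addresses can test each one against it, so it slows harvesting rather than preventing it.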