Department

Computer Science

Document Type

Conference Proceeding

Publication Date

5-2005

Abstract

Web spamming, the practice of introducing artificial text and links into web pages to affect the results of searches, has been recognized as a major problem for search engines. It is also a serious problem for users because they are not aware of it and they tend to confuse trusting the search engine with trusting the results of a search [16]. The parallels between web spamming on the internet and propaganda in the real world suggest that we can use anti-propaganda techniques to educate users and develop tools to help them evaluate the reliability of the information they find online.

In this paper, we first analyze the effects that web spam has had on the evolution of search engines and its relationship to propagandistic techniques in society. Then, we examine the neighborhoods of untrustworthy sites, finding that the dense biconnected components (BCCs) containing a site provide a reasonable trust neighborhood that has parallels in social network theory. The fact that spammers employ propagandistic techniques enables us to design a heuristic that follows anti-propagandistic practices in order to recognize a spamming network. In society, recognition of an untrustworthy message (in the opinion of a particular person or other social entity) is a reason for questioning the entities that recommend the message. Entities that are found to strongly support several untrustworthy messages become untrustworthy themselves. In this way, social distrust is propagated backwards for a number of steps. Our heuristic simulates this behavior on the trust neighborhood of a spammer.
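
The backward propagation of distrust described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `propagate_distrust`, the parameters `min_bad_links` and `max_steps`, and the toy link graph are all assumptions made for illustration, and the restriction of the search to a dense biconnected component is omitted for brevity.

```python
from collections import defaultdict


def propagate_distrust(out_links, seed, min_bad_links=1, max_steps=3):
    """Return the set of sites distrusted after backward propagation.

    out_links: dict mapping each site to the set of sites it links to.
    seed: the single untrustworthy starting site.
    min_bad_links: hypothetical threshold; a site becomes distrusted when it
        links to at least this many already-distrusted sites.
    max_steps: hypothetical bound on how many steps distrust travels backwards.
    """
    # Reverse index: for each site, which sites recommend (link to) it.
    in_links = defaultdict(set)
    for src, targets in out_links.items():
        for dst in targets:
            in_links[dst].add(src)

    distrusted = {seed}
    frontier = {seed}
    for _ in range(max_steps):
        # Candidates are the recommenders of the most recently distrusted sites.
        candidates = set()
        for site in frontier:
            candidates |= in_links[site]
        candidates -= distrusted

        # A candidate is distrusted if it supports enough distrusted sites.
        newly = {
            c for c in candidates
            if len(out_links.get(c, set()) & distrusted) >= min_bad_links
        }
        if not newly:
            break
        distrusted |= newly
        frontier = newly
    return distrusted


# Toy graph (hypothetical site names).
toy_links = {
    "spam": {"farm1"},
    "farm1": {"spam", "farm2"},
    "farm2": {"spam", "farm1"},
    "blog": {"farm1", "news"},
    "news": {"portal"},
    "portal": set(),
}

one_step = propagate_distrust(toy_links, "spam", max_steps=1)
two_steps = propagate_distrust(toy_links, "spam", max_steps=2)
strict = propagate_distrust(toy_links, "spam", min_bad_links=2)
```

In this toy graph, one step reaches the two sites that link directly to the seed, a second step also reaches `blog`, and a stricter threshold of two links leaves only the seed distrusted. The threshold is the knob that distinguishes a site that happens to link to one bad page from one that "strongly supports" several.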

In our experiments, we examined the trust neighborhoods of web sites, both trustworthy and untrustworthy. Our findings suggest that spamming networks can be reliably recognized from their relationship to a single untrustworthy starting point by backward propagation of distrust. Further, nodes involved in a spamming network can be divided into two groups: those that have content similar to the starting site (known as “link farms”), and those that have dissimilar content (known as “mutual admiration societies”). Our tool explores thousands of nodes within minutes and could be deployed at the browser level, which would resolve the moral question of who should decide which spammers to weed out in favor of the end user.
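
The division of a spamming network into two groups by content can be sketched as follows. The abstract does not say how content similarity is measured, so the Jaccard overlap of term sets, the `threshold` value, and all site names and terms below are assumptions chosen only to make the idea concrete.

```python
def jaccard(a, b):
    """Jaccard similarity of two term sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def split_by_content(seed_terms, site_terms, threshold=0.5):
    """Split sites into (similar, dissimilar) relative to the starting site.

    seed_terms: set of terms found on the untrustworthy starting site.
    site_terms: dict mapping each distrusted site to its set of terms.
    threshold: hypothetical cutoff on Jaccard similarity.
    """
    similar, dissimilar = [], []
    for site, terms in sorted(site_terms.items()):
        if jaccard(seed_terms, terms) >= threshold:
            similar.append(site)      # candidate "link farm" member
        else:
            dissimilar.append(site)   # candidate "mutual admiration society" member
    return similar, dissimilar


# Hypothetical term sets for a starting site and two distrusted neighbors.
seed_terms = {"cheap", "pills", "online"}
neighbors = {
    "farm1": {"cheap", "pills", "online", "buy"},
    "partner": {"casino", "poker", "bonus"},
}

similar, dissimilar = split_by_content(seed_terms, neighbors)
```

Here `farm1` shares three of four terms with the starting site and lands in the similar group, while `partner` shares none and lands in the dissimilar group.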

[16] L. Graham and P. T. Metaxas. “Of course it’s true; I saw it on the internet!”: Critical thinking in the internet era. Commun. ACM, 46(5):70–75, 2003.

Comments

Appeared in Adversarial Information Retrieval (AIRWeb), WWW 2005 Conference, Chiba, Japan

Version

Pre-print
