By Rahul Sami

Rahul Sami is an associate professor at UMSI.

Nowadays, people often use aggregated information from other users to make judgments about products and people. For example, Amazon.com recommends books to buy; Intrade.com forecasts election outcomes; and TripAdvisor.com aggregates users’ hotel ratings. As Internet users increasingly rely on such guidance, people with vested interests have stronger incentives to subvert these aggregation systems to promote their own products or candidates. A number of recent cases have highlighted the danger that manipulative attacks pose to information aggregation systems.

One simple but often effective form of attack consists of ballot stuffing in online voting systems. For example, Time magazine annually holds an informal Internet poll to nominate the Time Person of the Year. This poll is often the target of ballot-stuffing campaigns, leading people to view the results with suspicion.

The most obvious instance of manipulation occurred in 2009, when members of the online forum 4chan successfully voted the 4chan founder to the top of this poll. The lack of credibility of this poll points to the main weakness in many information aggregation systems: It costs almost nothing to create multiple online identities and use them to vote for a chosen alternative, thereby allowing a few people to have a disproportionate influence on the outcome.
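To make this weakness concrete, here is a minimal sketch of how cheaply created identities can flip a simple plurality vote. The poll, account names, and vote counts are invented for illustration, not drawn from any real system:

```python
from collections import Counter

def plurality_winner(ballots):
    """Return the candidate with the most votes, given (voter_id, candidate) ballots.
    One vote per identity -- but nothing stops one person from holding many identities."""
    votes = Counter(candidate for _, candidate in ballots)
    return votes.most_common(1)[0][0]

# 100 genuine voters, split 60/40 between candidates A and B.
genuine = [(f"user{i}", "A" if i < 60 else "B") for i in range(100)]

# A single attacker registers 30 throwaway accounts, all voting for B.
sybils = [(f"sock{i}", "B") for i in range(30)]

print(plurality_winner(genuine))           # A -- the honest outcome
print(plurality_winner(genuine + sybils))  # B -- one person flips the poll
```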

In other systems, it is difficult for a single person to operate multiple credible online identities. However, it is still possible for multiple people to coordinate to attack the system. Moreover, the Internet enables new marketplaces for buying and selling influence over information systems. In 2007, journalist Annalee Newitz demonstrated that she could boost an article to the front page of Digg by buying votes through a third-party service called User/Submitter. In 2009, a Belkin employee used Amazon’s Mechanical Turk service to pay workers to write positive reviews of Belkin products on review websites.

Personalized recommender systems, such as those used by Amazon and Netflix, provide some defense against straightforward boosting of a single product, because rating profiles that are very different from a genuine user’s profile tend to have little influence on her recommendations. However, if each attack profile is disguised to look like that of a genuine user, a personalized recommender can be particularly vulnerable. Users with niche tastes have few genuine neighbors with similar profiles, so even a small number of look-alike attack profiles can have a large effect on their recommendations.
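As a rough illustration, the sketch below uses a toy user-based nearest-neighbor recommender with invented ratings (not Amazon’s or Netflix’s actual algorithm) to show how a handful of attack profiles that mimic a niche user’s ratings can take over her neighborhood:

```python
import math

def cosine(u, v):
    """Cosine similarity over the items two rating profiles share."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    nu = math.sqrt(sum(u[i] ** 2 for i in shared))
    nv = math.sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (nu * nv)

def predict(target, profiles, item, k=3):
    """Similarity-weighted average rating for `item` from the k most similar profiles."""
    neighbors = sorted(
        (p for p in profiles if item in p),
        key=lambda p: cosine(target, p),
        reverse=True,
    )[:k]
    num = sum(cosine(target, p) * p[item] for p in neighbors)
    den = sum(cosine(target, p) for p in neighbors)
    return num / den if den else None

# A user with niche tastes: high ratings for two obscure items.
niche_user = {"i1": 5, "i2": 5}

# Only one genuine user shares that taste, and she disliked the pushed product.
genuine = [{"i1": 5, "i2": 4, "pushed": 1}]

# Attack profiles disguised to match the niche taste, each boosting "pushed".
attack = [{"i1": 5, "i2": 5, "pushed": 5} for _ in range(3)]

print(predict(niche_user, genuine, "pushed"))           # 1.0 -- her real neighbor warns her off
print(predict(niche_user, genuine + attack, "pushed"))  # 5.0 -- disguised profiles fill her neighborhood
```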

As the threat of manipulation has become more widely recognized, a number of defensive techniques have been proposed and deployed. One approach is simply to require stronger identification to create online accounts; this makes attacks more difficult, but damages privacy online. A second direction is to detect anomalous behavior, such as bursts of coordinated rating or anomalous rating profiles. Once detected, the identified profiles can be thrown out of the information system.
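A simple version of the second approach is sketched below: flag items that receive a sudden, near-unanimous burst of ratings. The window size, burst size, and skew cutoff here are illustrative placeholders, not tuned values from any deployed system:

```python
from collections import defaultdict

def flag_rating_bursts(ratings, window=3600, min_burst=20, min_skew=0.9):
    """Flag items that receive a suspicious burst of near-unanimous high ratings.

    `ratings` is a list of (timestamp, item, score) tuples with scores on a
    1-5 scale; flagged items can then be audited and the offending profiles
    removed from the aggregate."""
    by_item = defaultdict(list)
    for ts, item, score in ratings:
        by_item[item].append((ts, score))

    flagged = set()
    for item, events in by_item.items():
        events.sort()                      # order each item's ratings by time
        start = 0
        for end in range(len(events)):
            # Shrink the window until it spans at most `window` seconds.
            while events[end][0] - events[start][0] > window:
                start += 1
            burst = events[start:end + 1]
            positive = sum(1 for _, s in burst if s >= 4)
            # A large, one-sided burst inside a single window is suspicious.
            if len(burst) >= min_burst and positive / len(burst) >= min_skew:
                flagged.add(item)
    return flagged
```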

When there is a large pool of users, another promising approach is to design incentive schemes under which honest users are rewarded for quickly correcting attacks, or under which the cost of mounting a successful attack outweighs the benefit the attacker derives.

At UMSI, building on incentive-design techniques from prediction markets, we are developing information aggregation algorithms that limit a profile’s influence based on its historical contribution of useful information. This ensures that a would-be attacker has to make equally valuable contributions to the system before she can carry out a damaging attack, so the net damage an attacker can inflict is limited.
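The details are beyond the scope of this piece, but the flavor of the idea can be sketched in a simplified toy, shown below. It is an illustration of reputation-capped influence, not the actual algorithm under development: each profile’s report moves the aggregate estimate at most a reputation-sized step, and reputation grows or shrinks depending on whether the report moved the estimate toward the eventual outcome.

```python
class InfluenceLimitedAggregator:
    """Toy influence limiter: each report moves the running estimate at most
    a reputation-sized step, and reputation is settled against ground truth.
    A simplified illustration, not the deployed algorithm."""

    def __init__(self, prior=0.5, initial_rep=0.05):
        self.estimate = prior
        self.reps = {}                  # profile id -> reputation in [0, 1]
        self.initial_rep = initial_rep  # newcomers start with little influence
        self.pending = []               # (profile, before, after), scored later

    def report(self, profile, value):
        rep = self.reps.setdefault(profile, self.initial_rep)
        before = self.estimate
        # Blend at most `rep` of the way from the current estimate to the report.
        self.estimate = (1 - rep) * before + rep * value
        self.pending.append((profile, before, self.estimate))

    def settle(self, outcome, rate=0.1):
        """Once the truth is known, reward reports that improved the estimate."""
        for profile, before, after in self.pending:
            improvement = abs(outcome - before) - abs(outcome - after)
            new_rep = self.reps[profile] + rate * improvement
            self.reps[profile] = min(1.0, max(0.0, new_rep))
        self.pending.clear()

agg = InfluenceLimitedAggregator()
agg.reps["veteran"] = 0.6      # reputation earned from past accurate reports
agg.report("newcomer", 1.0)    # fresh profile: estimate barely moves
print(round(agg.estimate, 3))  # 0.525
agg.report("veteran", 0.2)     # trusted profile: estimate moves substantially
print(round(agg.estimate, 3))  # 0.33
agg.settle(outcome=0.2)        # newcomer's reputation dips; veteran's grows
```

In this toy, a freshly created attack profile has almost no leverage until it has built a record of accurate reports, and any damage it later inflicts is roughly offset by the value it contributed while earning that record.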

As we defend against manipulation in information aggregation systems, we should not lose sight of an unavoidable tradeoff: The more skeptically we treat contributed information (whether by limiting influence, dropping anomalous profiles, or other techniques), the more likely we are to also disregard some honest information. This “collateral damage” can severely harm the systems we are trying to defend. Understanding and improving the tradeoffs we make between responsiveness and defense against attack is an exciting and important challenge for future research in this area.