Archive for February, 2012

Misinformation and Propaganda in Cyberspace

Sunday, February 26th, 2012

Dear Readers,

The following is a blog post that I wrote recently for a conference on “Truthiness in Digital Media” that the Berkman Center is organizing in March. It summarizes some of the research findings that have shaped my approach to the serious challenges that the propagation of misinformation poses in Cyberspace.

Do you have examples of misinformation or propaganda that you have seen on the Web or on Social Media? I would love to hear from you.

Takis Metaxas

 


Misinformation and Propaganda in Cyberspace

Since the early days of the discipline, Computer Scientists have always been interested in developing environments that exhibit well-understood and predictable behavior. If a computer system were to behave unpredictably, then we would look into the specifications and we would, in theory, be able to detect what went wrong, fix it, and move on. Accordingly, the World Wide Web, created by Tim Berners-Lee, was not expected to evolve into a system with unpredictable behavior. After all, the creation of the WWW was enabled by three simple ideas: the URL, a globally addressable system of files; HTTP, a very simple communication protocol that allowed a computer to request and receive a file from another computer; and HTML, a document-description language that simplified the development of documents easily readable by non-experts. Why, then, within a few years did we start to see technical papers that included terms such as “propaganda” and “trust”?

Soon after its creation the Web began to grow exponentially because anyone could add to it. Anyone could be an author, without any guarantee of quality. The exponential growth of the Web necessitated the development of search engines (SEs) that gave us the opportunity to locate information fast. They grew so successful that they became the main providers of answers to any question one may have. It does not matter that several million documents may all contain the keywords we include in our query; we expect a good search engine to give us all the important ones in its top-10 results. We have developed a deep trust in these search results because we have so often found them to be valuable (or, when they are not, we might not notice it).

As SEs became popular and successful, Web spammers appeared. These are entities (people, organizations, businesses) who realized that they could exploit the trust that Web users placed in search engines. They would game the search engines by manipulating the quality and relevance metrics so as to force their own content into the ever-important top-10 of a relevant search. The Search Engines noticed this and a battle with the web spammers ensued: For every good idea that search engines introduced to better index and retrieve web documents, the spammers would come up with a trick to exploit the new situation. When the SEs introduced keyword frequency for ranking, the spammers came up with keyword stuffing (lots of repeated keywords to give the impression of high relevance); for web site popularity, they responded with link farms (lots of interlinked sites belonging to the same spammer); in response to the descriptive nature of anchor text they detonated Google bombs (using irrelevant keywords as anchor text to target a web site); and for the famous PageRank, they introduced mutual admiration societies (collaborating spammers exchanging links to increase everyone’s PageRank). In fact, one can describe the evolution of search results ranking technology as a response to Web spamming tricks. And since for each spamming technique there is a corresponding propagandistic one, the spammers became the propagandists of cyberspace.
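To make the “mutual admiration society” trick concrete, here is a minimal sketch of PageRank computed by power iteration on a toy link graph. It is a textbook simplification, not how any real search engine ranks pages; the page names, the damping value of 0.85, and the two tiny graphs are assumptions chosen purely for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three honest pages link in a cycle; nobody links to "spam".
honest = {"a": ["b"], "b": ["c"], "c": ["a"], "spam": ["a"]}

# The same web after the spammer and two hypothetical accomplices exchange links.
boosted = {
    "a": ["b"], "b": ["c"], "c": ["a"],
    "spam": ["ally1", "ally2"],
    "ally1": ["spam", "ally2"],
    "ally2": ["spam", "ally1"],
}

if __name__ == "__main__":
    print("spam score before:", round(pagerank(honest)["spam"], 4))
    print("spam score after: ", round(pagerank(boosted)["spam"], 4))
```

In this toy setting the spam page's score rises from about 0.0375 to about 0.167 without a single honest page linking to it, which is why ranking algorithms had to learn to discount such closed clusters of links.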

Around 2004, the first elements of misinformation around elections started to appear, and political advisers recognized that, even though the Web was not a major component of electoral campaigns at the time, it would soon become one. If they could just “persuade” search engines to rank positive articles about their candidates highly, along with negative articles about their opponents, they could convince a few casual Web users that their message was more valid and get their votes. Elections in the US, after all, often depend on a small number of closely contested races.

Search Engines have certainly tried hard to limit the success of spammers, who are seen as exploiting this technology to achieve their goals. Search results were adjusted to be less easily spammable, even if this meant that some results were hand-picked rather than algorithmically produced. In fact, during the 2008 and the 2010 elections, searching the Web for electoral candidates would yield results that contained official entries first: The candidate’s campaign sites, the office sites, and Wikipedia entries topped the results, well above even well-respected news organizations. The embarrassment of being gamed and of the infamous “miserable failure” Google bomb would not be tolerated.

Around the same time we saw the development of the Social Web, networks that allow people to connect, exchange ideas, air opinions, and keep up with their friends. The Social Web created opportunities both for spreading political (and other) messages and for spreading misinformation through spamming. In our research we have seen several examples of propagation of politically-motivated misinformation. During the important 2010 Special Senatorial election in MA, spammers used Twitter in order to create a Google bomb that would bring their own messages to the third position of the top-10 results by frequently repeating the same tweet. They also created the first Twitter bomb targeting individuals interested in the MASEN elections with misinformation about one of the candidates, and created a pre-fab Tweet factory imitating a grass-roots campaign, attacking news organizations and reporters (a technique known as “astroturfing”).

Like propaganda in society, spam will stay with us in cyberspace. And as we increasingly look to the Web for information, it is important that we are able to detect misinformation. Arguably, now is the most important time for such detection, since we do not currently have a system of trusted editors in cyberspace like that which has served us well in the past (newspapers, publishers, institutions). What can we do?

* Retweeting reveals communities of like-minded people: Two large groups arise naturally when one considers the retweeting patterns of those tweeting during the 2010 MA special election. Sampling reveals that the smaller contains liberals and the larger conservatives. The larger one appears to consist of 3 different subgroups.

Some promising research in social media has shown potential in using technology to detect astroturfing. In particular, the following rules hold true most (though not all) of the time:

  1. The credibility of the information you receive is related to the trust you have in the original sender and in those who retweeted it.
  2. Not only do Twitter friends (those that you follow) reveal a like-minded community, but their retweeting patterns also make these communities stronger and more visible.
  3. While both truths and lies propagate in cyberspace, lies have shorter life-spans and are questioned more often.
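As a rough illustration of the second rule, the sketch below groups users by who retweets whom. It uses connected components of the retweet graph as a crude stand-in for community detection; this is an assumption made for brevity, not the method used in the research described here, and the user names are hypothetical.

```python
from collections import defaultdict


def retweet_communities(retweets):
    """Group users into connected components of the undirected retweet graph.

    `retweets` is an iterable of (retweeter, original_author) pairs.
    Returns a list of sets of users, largest group first.
    """
    graph = defaultdict(set)
    for retweeter, author in retweets:
        graph[retweeter].add(author)
        graph[author].add(retweeter)

    seen, communities = set(), []
    for user in sorted(graph):
        if user in seen:
            continue
        # Depth-first search collects everyone reachable through retweets.
        stack, members = [user], set()
        seen.add(user)
        while stack:
            current = stack.pop()
            members.add(current)
            for neighbor in graph[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        communities.append(members)
    return sorted(communities, key=len, reverse=True)


if __name__ == "__main__":
    # Hypothetical retweet pairs: (who retweeted, whose tweet it was).
    sample = [("ann", "bob"), ("bob", "cat"), ("dan", "eve")]
    for group in retweet_communities(sample):
        print(sorted(group))
```

Real retweet graphs are usually one large connected blob, so in practice one would need a proper clustering method; the point here is only that retweet pairs alone already carry group structure.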

While we might prefer an automatic way of detecting misinformation with the use of algorithms, this will not happen. Citizens of cyberspace must become educated about how to detect misinformation, be provided with tools that will help them question and verify information, and draw on the strengths of crowd sourcing through their own groups of trusted editors. This Berkman conference will help us push in this direction.

 

Social Experiments: People vs Machines and In-lab vs Online

Saturday, February 25th, 2012

Social Experiments: People vs Machines

Recently, I attended a couple of talks on conducting social experiments. I found them both very interesting for different reasons, and thought I would give you an overview in this post.

The first talk was the Dertouzos lecture at MIT, which was established after the death of MIT’s LCS Director Michael Dertouzos, who left us early but left behind a great legacy. Given Dertouzos’s strong interest in the interdisciplinary nature of Computer Science, Prof. Michael Kearns of UPenn was a particularly appropriate choice of speaker. Here is the abstract of Michael’s talk:

“What do the theory of computation, economics and related fields have to say about the emerging phenomena of crowd sourcing and social computing? Most successful applications of crowd sourcing to date have been on problems we might consider “embarrassingly parallelizable” from a computational perspective. But the power of the social computation approach is already evident, and the road cleared for applying it to more challenging problems. In part towards this goal, for a number of years we have been conducting controlled human-subject experiments in distributed social computation in networks with only limited and local communication. These experiments cast a number of traditional computational problems — including graph coloring, consensus, independent set, market equilibria, biased voting and network formation — as games of strategic interaction in which subjects have financial incentives to collectively “compute” global solutions. I will overview and summarize the many behavioral findings from this line of experimentation, and draw broad comparisons to some of the predictions made by the theory of computation and microeconomics.”

Michael is interested in exploring how well people can crowd source in the lab when presented with a variety of problems, from the computationally easy to the hard. Graph coloring is a hard problem for a computer (i.e., for any parallel or sequential algorithm we know so far). How well would 36 undergraduate students solve instances of graph coloring? Quite well, it turns out. See the video clip.
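For readers who have not met the problem, here is a minimal sketch of what a proper coloring is, together with a simple greedy heuristic. The greedy pass always produces a valid coloring but not necessarily one with the fewest colors; finding the minimum is the hard part. The function names and the five-node example are assumptions for illustration and are not taken from the experiments.

```python
def is_proper_coloring(edges, coloring):
    """True if no edge connects two nodes that share a color."""
    return all(coloring[u] != coloring[v] for u, v in edges)


def greedy_coloring(nodes, edges):
    """Color nodes in the given order, giving each the smallest unused color."""
    neighbors = {node: set() for node in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    coloring = {}
    for node in nodes:
        # Colors already taken by neighbors that have been colored so far.
        used = {coloring[n] for n in neighbors[node] if n in coloring}
        color = 0
        while color in used:
            color += 1
        coloring[node] = color
    return coloring


if __name__ == "__main__":
    # Hypothetical example: five nodes arranged in a ring.
    nodes = [0, 1, 2, 3, 4]
    edges = [(i, (i + 1) % 5) for i in nodes]
    coloring = greedy_coloring(nodes, edges)
    print(coloring, is_proper_coloring(edges, coloring))
```

Checking a proposed coloring is cheap, as `is_proper_coloring` shows; it is finding a coloring within a fixed budget of colors that becomes expensive as graphs grow.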

Finding consensus (e.g., having all nodes in a graph choose the same color) is an easy problem to solve by both sequential and parallel algorithms. Yet, when presented with a time limit, humans have trouble reaching consensus, as they are not able to come up consistently with a successful strategy: some will change colors often, trying to accommodate their neighbors; others will stick stubbornly to their color, expecting others to follow them; yet others will flip-flop a lot, giving up at the wrong moment, etc. Experience does not seem to help: Playing this game over and over seems to teach them little. See this video clip of 36 undergraduates finding consensus on a graph composed of highly interconnected tribes.
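To see why consensus is easy for an algorithm, consider one simple rule among many: in every round, each node adopts the smallest color it sees among itself and its neighbors. The sketch below is an illustrative assumption, not the protocol used in the experiments; on a connected graph it reaches agreement in at most as many rounds as the longest shortest path between two nodes.

```python
def reach_consensus(neighbors, colors):
    """Synchronous consensus: every node repeatedly takes the minimum
    color among itself and its neighbors.

    `neighbors` maps each node to its adjacent nodes; `colors` maps each
    node to a comparable color. Returns (final_colors, rounds_used).
    """
    colors = dict(colors)
    for rounds in range(len(colors)):
        if len(set(colors.values())) == 1:
            return colors, rounds
        # All nodes update at once, based on the previous round's colors.
        colors = {
            node: min([colors[node]] + [colors[n] for n in neighbors[node]])
            for node in colors
        }
    raise ValueError("no consensus reached; is the graph connected?")


if __name__ == "__main__":
    # Hypothetical example: four nodes in a line, one of them disagreeing.
    neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    colors = {"a": "red", "b": "red", "c": "red", "d": "blue"}
    print(reach_consensus(neighbors, colors))
```

The rule works because every node follows the same tie-breaking convention without hesitation, which is exactly what the human subjects, each with a private strategy and no agreed convention, struggled to do.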

The two video clips I recorded on my iPad during his talk are only a small teaser of the work Michael Kearns presented. If you are interested, you should take a closer look at his published papers.

Social experiments in the lab vs online

The second talk was at the Berkman Center for Internet and Society. Fellow Jerome Hergueux’s talk was entitled “The Promises of Web-based Social Experiments.” He is interested in exploring how closely the results of experiments conducted online match those conducted in the lab. Here is the abstract of his talk:

“The advent of the internet provides social scientists with a fantastic tool for conducting behavioral experiments online at a very large-scale and at an affordable cost. It is surprising, however, how little research has leveraged the affordances of the internet to set up such social experiments so far.  In this talk, Jerome Hergueux will introduce the audience to one of the first online platforms specifically designed for conducting interactive social experiments over the internet to date. He will present the preliminary results of a randomized experiment that compares behavioral measures of social preferences obtained both in a traditional University laboratory and online, with a focus on engaging the audience in a reflection about the specificities, limitations and promises of online experimental economics as a tool for social science research”

Jerome and his colleagues at the University of Paris tried to re-create online, as closely as possible, the environment of the labs that social scientists have used for a long time. They recruited subjects from the very same pool, and asked some of them to participate in experiments in a lab setting, while others were to participate in the very same experiments online. There were no interactions between the participants, though the ones in the lab would see who else had come for the experiments. What they found was that the results of the experiments differ! In particular they found that the online subjects seem to be significantly more social than those in the lab: more altruistic, showing higher trust, and being less risk averse. While this is still preliminary work, it seems quite promising in giving us a better understanding of the transformation we undergo when we go online. You can watch the full talk of Jerome Hergueux on the Berkman Center’s site.

We still have a lot to learn about conducting social experiments, but these two talks are definitely helping in this direction.