You are viewing a read-only archive of the Blogs.Harvard network.

Artificial Intelligence, your brain, and other things you cannot trust about politics


A few days ago the Center for Research on Computation and Society organized a workshop with the provocative title “Six Reasons Fake News is the End of the World as we Know It”. I call it provocative because whether “fake news” is a new thing or not has been discussed a lot lately. Not all of us agree on what it is, or how novel it is. Some point out that it is as old as newspapers; others see it as something that mainly appeared last year. Yet others doubt that it is even a phenomenon worth discussing and argue that, instead of fake news, we should talk about specific categories such as false news, misinformation, disinformation, and propaganda.

Accepting the challenge, I gave a talk with an equally provocative (I would like to believe) title: “Artificial Intelligence, your brain, and other things you cannot trust about politics”. You can follow my talk in the video below, but let me give you a list of the “things” that I discussed in the talk:


I hope you find it interesting and do your own thinking about what we can trust when it comes to politics. Importantly, we need to figure out how to solve the problems of online misinformation and propaganda that seem to be all around us these days.

Or, to learn how to live with them, which is what I think will happen.

The Real “Fake News”


The following is a blog post that Eni Mustafaraj has recently published in The Spoke. We reproduce it here with permission.


Fake news has always been with us, starting with The Great Moon Hoax in 1835. What is different now is the existence of a mass medium, the Web, that allows anyone to financially benefit from it.

Etymologists typically track the change of a word’s meaning over decades, sometimes even over centuries. Currently, however, they find themselves observing a new president and his administration redefine words and phrases on a daily basis. Case in point: “fake news.” One would have to look hard to find an American who hasn’t heard this phrase in recent months. The president loves to apply it as a label to news organizations that he doesn’t agree with.

But right before its most recent incarnation, the phrase “fake news” had a different meaning. It referred to factually incorrect stories appearing on websites with names such as or that mushroomed in the weeks leading up to the 2016 U.S. Presidential Election. One such story—”FBI agent suspected in Hillary email leaks found dead in apparent murder-suicide”—was shared more than a half million times on Facebook, despite being entirely false. The website that published it,, was operated by a man named Jestin Coler, who, when tracked down by persistent NPR reporters after the election, admitted to being a liberal who “enjoyed making a mess of the people that share the content”. He didn’t have any regrets.

Why did fake news flourish before the election? There are too many hypotheses to settle on a single explanation. Economists would explain it in terms of supply and demand. Initially, there were only a few such websites, but their creators noticed that sharing fake news stories on Facebook generated considerable pageviews (the number of visits to a page) for them. Their obvious conclusion: there was a demand for sensational political news from a sizeable portion of the web-browsing public. Because pageviews can be monetized by running Google ads alongside the fake stories, the response was swift: an industry of fake news websites grew quickly to supply fake content and feed the public’s demand. The creators of this content were scattered all over the world. As BuzzFeed reported, a cluster of more than 100 fake news websites was run by individuals in the town of Veles, in the Former Yugoslav Republic of Macedonia.

How did the people in Macedonia manage to spread their fake stories on Facebook and earn thousands of dollars in the process? In addition to creating a cluster of fake news websites, they also created fake Facebook accounts that looked like those of real people and then had these accounts join real Facebook groups, such as “Hispanics for Trump” or “San Diego Berniecrats”, where conversations about the election were taking place. Every time the fake news websites published a new story, the fictitious accounts would share it in the Facebook groups they had joined. The real people in the groups would then start spreading the fake news article among their Facebook followers, successfully completing the misinformation cycle. These misinformation-spreading techniques were already known to researchers, but not to the public at large. My colleague Takis Metaxas and I discovered and documented one such technique used on Twitter all the way back in the 2010 Massachusetts Senate election between Martha Coakley and Scott Brown.

There is an important takeaway here for all of us: fake news doesn’t become dangerous because it’s created or because it is published; it becomes dangerous when members of the public decide that the news is worth spreading. The most ingenious part of spreading fake news is the step of “infiltrating” groups of people who are most susceptible to the story and will fall for it. As explained in this news article, the Macedonians tried different political Facebook groups before finally settling on Trump supporters.

Once “fake news” entered Facebook’s ecosystem, it was easy for people who agreed with the story and were compelled by the clickbait nature of the headlines to spread it organically. Often these stories made it to Facebook’s Trending News list. The top 20 fake news stories about the election received approximately 8.7 million engagements (shares, reactions, and comments) on Facebook, 1.4 million more than the top 20 real news stories from 19 major news websites (CNN, New York Times, etc.), as an analysis by BuzzFeed News demonstrated. Facebook initially resisted the accusation that its platform had enabled fake news to flourish. However, after weeks of intense pressure from the media and its user base, it introduced a series of changes to its interface to mitigate the impact of fake news. These include involving third-party fact-checkers to assign a “Disputed” label to posts with untrue claims, suppressing posts with such a label (making them less visible and less spreadable), and allowing users to flag stories as fake news.

It’s too early to assess the effect these changes will have on the sharing behavior of Facebook users. In the meantime, the fake news industry is targeting a new audience: liberal voters. In March, the fake quote “It’s better for our budget if a cancer patient dies more quickly,” attributed to Tom Price, the Secretary of Health and Human Services, appeared on a website titled US Political News, operated by an individual in Kosovo. The story was shared over 80,000 times on Facebook.

Fake news has always been with us, starting with The Great Moon Hoax in 1835. What is different now is the existence of a mass medium, the Web, that allows anyone to monetize content through advertising. Since the cost of producing fake news is negligible, and the monetary rewards substantial, fake news is likely to persist. The journey that fake news takes only begins with its publication. We, the reading public who share these stories, triggered by headlines engineered to make us feel outraged or elated, are the ones who take the news on its journey. Let us all learn to resist such sharing impulses.

Two rumors about the downing of a Russian warplane by Turkey


News of a Turkish warplane shooting down a Russian one over the Turkish-Syrian border has dominated the news and social media lately. We investigated the rumor within hours after it appeared (24 Nov. 2015), and you can see the results of the analysis here:

This was not the first time a rumor of this kind had emerged. About a month and a half earlier (10 Oct. 2015) an identical rumor had appeared. We had investigated that rumor too, and you can see the results of our analysis here:

Russian jet downing rumors

As you can see, based on the crowd’s reaction to the rumors, TwitterTrails was able to determine that the October rumor was false while the November one was true. The false rumor did not spread much and had a lot of skeptical tweets questioning its validity. On the other hand, the true rumor spread much more widely and was undisputed in terms of skepticism.


This is something we see often in the stories we investigate on TwitterTrails. Our understanding of the way the “wisdom of the crowd” works is that, when unbiased, emotionally cool observers see a rumor that seems suspicious, they usually react in one of two ways: they either do not retweet it, reducing its spread, or they respond by questioning the validity of the rumor, resulting in higher skepticism.

When plotting the true and false rumors (after they have been verified through journalists’ work), the following image emerges:

It is not a 100% separation, but one can see that the false rumors (marked by red triangles) show low spread and high skepticism, while the true ones show high spread and low skepticism. The picture is, of course, muddled in the lower corner: a rumor that does not attract much attention has not had the opportunity to benefit from the “wisdom of the crowd”, and thus its veracity cannot be determined by our system.
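The decision rule implied by this plot can be sketched in a few lines of Python. This is a minimal illustration, not TwitterTrails code: the cut-offs `MIN_SPREAD`, `HIGH_SPREAD` and `HIGH_SKEPTICISM` are hypothetical placeholders (the post gives no numeric thresholds), and the `Rumor` record is an assumed input format.

```python
from dataclasses import dataclass

# Hypothetical cut-offs: the post gives no numbers, so these values are
# placeholders chosen only to make the sketch runnable.
MIN_SPREAD = 10          # below this, too little crowd reaction to judge
HIGH_SPREAD = 50         # spread at or above this counts as "high"
HIGH_SKEPTICISM = 0.5    # skepticism at or above this counts as "high"


@dataclass
class Rumor:
    name: str
    spread: float       # a propagation score for the rumor's tweets
    skepticism: float   # e.g. ratio of doubting to non-doubting tweets


def assess(r: Rumor) -> str:
    """Apply the spread-vs-skepticism heuristic described in the post."""
    if r.spread < MIN_SPREAD:
        # Little attention: the crowd has not weighed in, so no verdict.
        return "undetermined"
    if r.spread < HIGH_SPREAD and r.skepticism >= HIGH_SKEPTICISM:
        return "likely false"   # low spread, high skepticism
    if r.spread >= HIGH_SPREAD and r.skepticism < HIGH_SKEPTICISM:
        return "likely true"    # high spread, low skepticism
    return "undetermined"       # mixed signals


# Made-up inputs for illustration, not measurements from our data.
for r in [Rumor("low spread, doubted", 20, 1.2),
          Rumor("wide spread, undisputed", 120, 0.05),
          Rumor("barely noticed", 3, 0.0)]:
    print(r.name, "->", assess(r))
```

With these invented inputs the three rumors come out "likely false", "likely true" and "undetermined", the last one corresponding to the muddled lower corner of the plot.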


Note: This posting originally appeared on our TwitterTrails blog.

False rumors do not propagate like True ones


On Twitter, claims that receive higher skepticism and lower propagation scores are more likely to be false.
On the other hand, claims that receive lower skepticism and higher propagation scores are more likely to be true.

The above is a conjecture we stated in a recent paper entitled Investigating Rumor Propagation with TwitterTrails (currently under review). Feel free to take a look if you want to know more details about our system; here we highlight some of them.

As you may know if you have read our TwitterTrails blog before, we are developing a Web service that, starting from a tweet, a hashtag, or a set of keywords related to a story propagating on Twitter, investigates the story and automatically answers some of the basic questions about it. If you are not familiar with it, you may want to take a look at some of our earlier posts, or that can wait until you have read this one.

Recently we deployed a site containing the growing collection of stories and rumors that we investigate. Its front end looks like this:


This is the “condensed view”, which allocates one line per story, 20 stories per page. There are over 120 stories collected at this point. Clicking on a title brings you to the investigation page, with lots of details and visualizations about the story’s propagation, its originator, how it burst, who supports it and who refutes it.

Note that on the right side of the condensed view we automatically compute two metrics:

  • The propagation level of a story. This is a logarithmic scale based on the h-index of the story’s tweet collection, and it currently has 5 levels: Extensive, High, Moderate, Low and Insignificant.
  • The skepticism level of a story. This is the ratio of tweets with negated propagation over tweets with no negated propagation. It has four levels: Undisputed, Hesitant, Dubious and Extremely doubtful.
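The two metrics can be approximated as follows. This is a sketch under stated assumptions: we take the h-index of a collection as the largest h such that h of its tweets were each retweeted at least h times; we read "tweets with negated propagation" as tweets that deny or question the story (with those labels assumed to be given); and the level boundaries are hypothetical, not the ones TwitterTrails uses.

```python
def h_index(retweet_counts):
    """Largest h such that h tweets were each retweeted at least h times."""
    h = 0
    for i, c in enumerate(sorted(retweet_counts, reverse=True), start=1):
        if c >= i:
            h = i
        else:
            break
    return h


PROPAGATION_LEVELS = ["Insignificant", "Low", "Moderate", "High", "Extensive"]


def propagation_level(retweet_counts):
    """Map the h-index onto 5 levels on a (hypothetical) log scale."""
    h = h_index(retweet_counts)
    # (h + 1).bit_length() - 1 is floor(log2(h + 1)); each level spans
    # two doublings of h. These boundaries are illustrative only.
    idx = ((h + 1).bit_length() - 1) // 2
    return PROPAGATION_LEVELS[min(idx, len(PROPAGATION_LEVELS) - 1)]


# Hypothetical upper bounds on the skepticism ratio for the first 3 levels.
SKEPTICISM_LEVELS = [(0.05, "Undisputed"), (0.2, "Hesitant"), (0.5, "Dubious")]


def skepticism_level(n_negating, n_non_negating):
    """Ratio of negating to non-negating tweets, mapped onto 4 levels."""
    # max(..., 1) avoids dividing by zero when every tweet negates the story.
    ratio = n_negating / max(n_non_negating, 1)
    for bound, label in SKEPTICISM_LEVELS:
        if ratio < bound:
            return label
    return "Extremely doubtful"


print(h_index([10, 8, 5, 4, 3]))            # 4
print(propagation_level([10, 8, 5, 4, 3]))  # Low
print(skepticism_level(30, 100))            # Dubious
```

A logarithmic scale fits the h-index well because propagation varies over orders of magnitude: a story retweeted by thousands should sit only a few levels above one retweeted by dozens.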

The initial quote at the top of this post refers to these metrics.

There is also a more detailed “main view” of TwitterTrails:


In the main view there are additional tools to select stories based on time of collection, particular tags, levels of propagation and skepticism, or keywords.

A few weeks ago we gave a presentation of TwitterTrails at the Computation and Journalism 2014 symposium at Columbia University in NYC. There is a video of our presentation that you can view if interested. In this presentation we noted that false rumors have a different pattern of propagation on Twitter than true rumors. Below is a graph that shows that difference.


The graph displays propagation levels vs skepticism levels, and the data points are colored depending on whether a rumor was true (blue), false (red) or something else (green) that cannot be categorized as true or false (e.g., reference to an event or a tweet collection based on a hashtag). The vast majority of the false rumors show insignificant to low propagation while at the same time their level of skepticism ranges from dubious to extremely doubtful.

This is remarkable, but it may not be too surprising. As we write in the paper, “Intuitively, this conjecture can be explained as an example of the power of crowdsourcing. Since ancient times philosophers have argued that people will not willingly do bad unless they are guided by irrational impulses, such as anger, fear, confusion or hatred. Therefore, the more people see some false information, the more likely it is that they will either raise an objection or simply decide not to repeat it further.

We make the conjecture specific to Twitter because it may not hold for every social network. In particular, we rely on the user interface for promoting an objection to the same level as the false claim. Twitter’s interface does that; both the claim and its negation will get the same amount of real estate in a user’s Twitter client. On the other hand, this is not true for Facebook, where a claim gets much greater exposure than a comment, and a comment may quickly be hidden by follow-up comments. So, on Facebook most people may miss an objection to a claim.”

Take a look at and tell us what you think!
We would also be happy to run an investigation for you, if interested.

(This is a copy of a blog post on the site.)


Defending your Domain Name


I recently had the uninvited opportunity to defend my domain name,, and I am writing this post because it may be useful to others who find themselves in such a position. The good news: You do not need to be a lawyer to do it yourself and the arbitration system works reasonably well. The bad news: You need to do a bit of reading and writing.

I registered and own since 1999. Back then it was essential to own a domain name since, at that time, there were no services to upload and share your photos, no easy ways to have email addresses for you or your family members, and the only clouds around were still up in the sky unable to store any files you may need while away from your office.

As Web services expanded to cover everything under the sun, the above uses became secondary, and I started utilizing the domain name for other reasons. In 2004 I started a line of research to discover how Web Spammers succeeded in gaming search engines and placing their bogus postings on the top-10 page of relevant search results. Back then it was thought that the “PageRank” algorithm was like “42”: the answer to everything (related to the Web). We now know that search engines can be gamed, and that they spend considerable resources fighting Web Spam.

[Side note: My research led me to discover the reasons that search engines can be gamed: pretty much the same reasons that we humans can be fooled. Web Spammers were using techniques very similar to the propagandistic techniques that politicians, advertisers and financial criminals use to persuade us to vote for something, buy something or invest in something. I presented my work initially at AIRWeb 2005 (Web Spam, Propaganda and Trust), but if you are interested in reading more you should check the journal version (Web Spam, Social Propaganda and the Evolution of Search Engine Rankings).]
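To see why link-based ranking could be gamed, here is a textbook PageRank computation by power iteration (damping factor 0.85, with pages that have no outgoing links spreading their rank uniformly). It is a minimal sketch, not any search engine's production ranking; the toy web graph and the page names `a`, `b`, `c`, `spam` and `farm0` to `farm4` are made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook PageRank by power iteration over {page: [outgoing links]}."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page first receives the "random jump" share.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            out = links.get(p, [])
            if out:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[p] / len(out)
                for q in out:
                    new[q] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank


# A tiny honest web plus one spam page that nobody links to.
web = {"a": ["b"], "b": ["c"], "c": ["a"], "spam": ["a"]}
before = pagerank(web)["spam"]

# The spammer adds a "link farm": five throwaway pages all pointing at spam.
farmed = dict(web)
for i in range(5):
    farmed[f"farm{i}"] = ["spam"]
after = pagerank(farmed)["spam"]

print(f"spam before: {before:.4f}, after: {after:.4f}")
# prints: spam before: 0.0375, after: 0.0875
```

Nothing about the spam page's content changed; five pages that cost nothing to create more than doubled its share of the total rank. Engines counter this by discounting such link patterns, which is part of the "considerable resources" mentioned above.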

Anyway, to implement a technique for discovering Web Spam sites, I needed a method to evaluate the similarity between the contents of Web sites. It helped to have a Web site containing text that is unbiased towards any particular theme or product. Since there was nothing like that available online, I uploaded onto my own site a large collection of Associated Press news articles that were used by the TREC community. If you visited the top directory of you would be surprised to find huge files containing old news. But it was very unlikely that you would visit it. I did not include in any search engine listing, so it would never appear in your search results. Though I never planned to use as text repository, over the years it proved very useful. Many of my students used it to run their research projects, and after graduation some of them ended up working at Microsoft Bing and Google Search, fighting Web Spam on a regular basis.
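The post does not say which similarity measure was used. As one common choice, the sketch below compares pages by the cosine similarity of their TF-IDF vectors, with the IDF weights estimated from a neutral background corpus, which is the role a theme-neutral collection like the AP news can play. The tokenizer, the smoothing formula and the sample strings are assumptions for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on anything that is not a letter."""
    return re.findall(r"[a-z]+", text.lower())


def idf_from_background(background_docs):
    """Build an IDF function from a neutral background corpus."""
    n = len(background_docs)
    df = Counter()
    for doc in background_docs:
        df.update(set(tokenize(doc)))
    # Smoothed IDF: words common in the background get low weight,
    # rare or unseen words get high weight.
    return lambda word: math.log((n + 1) / (df[word] + 1)) + 1.0


def tfidf(text, idf):
    """Term frequency times background IDF, as a sparse dict vector."""
    return {w: c * idf(w) for w, c in Counter(tokenize(text)).items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


background = [
    "the senate passed the budget bill",
    "the team won the game on sunday",
    "stocks fell on monday",
]
idf = idf_from_background(background)
page1 = tfidf("cheap pills buy cheap pills online", idf)
page2 = tfidf("buy cheap pills online today", idf)
page3 = tfidf("the senate debated the bill", idf)
print(round(cosine(page1, page2), 3), cosine(page1, page3))
```

The two spam-like pages score well above zero, while the first spam page and the news-like page share no words and score 0.0. Using a neutral background for IDF keeps topic-specific spam vocabulary from being treated as ordinary filler.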

I continued to own the domain name paying the fees on time, so it was a complete surprise when I received a letter in early April about a dispute filed with the World Intellectual Property Organization (WIPO) by Remy-Cointreau Luxemburg, the well known liqueur producing company. They wanted to take over my domain name! Remy-Cointreau had bought a well-known Greek Spirits company, METAXA, and over the years they had started buying every domain name that contained the string “metaxa”  (including “metaxaswineestate”!) Given my research, I knew exactly why they were doing that: They wanted to “persuade” the search engines that any search of the term “metaxa” should lead to their official site only! They knew how to fool PageRank and were doing it legally too. Now they wanted to fool WIPO’s arbitrators. In fact, as they claimed in the Complaint they served me,

Indeed, the term METAXA® is only known in relation to the Complainant. It has no meaning whatsoever in English or in any other language. A Google search on the term METAXA® displays several results, all of them being related to the Complainant

Wow. Three lies in three sentences. Anyone who knows just a bit of modern Greek history or checks Wikipedia knows that the name Metaxas is not that rare. (In Greek, the female version of a name, or a reference to the family name itself, does not include the final “s”.) One can run metaxa through Google Translate to see that it means “raw silk” or “silk trader” in Greek, depending on the intonation. And the first page of Google search results does not constitute proof of unique association. (Yet even their own submitted screenshot included other METAXA references!) And these were not the only “inaccurate claims” or logical fallacies in the Complaint. There were about a dozen of them. You can see a more detailed (though not exhaustive) list in my Response to their Complaint.

I was doubly stunned. Using such claims ridiculed WIPO, the very body they were asking for support. How could they be so arrogant as to insult the intelligence of WIPO’s arbitrators? Didn’t they expect that the Respondent would point out their lies?

Probably not. My guess is that they expected that nobody would respond to their Complaint and they would win an uncontested case. You see, when we changed the domain record hosting company a couple of years ago, the WHOIS information was not updated correctly, and it showed that the administrator was to be reached at … It seemed reasonable for them to expect that whoever owned the domain could not be reached, and so they could snatch it without contest. It would cost them a few thousand dollars, but for a company with deep pockets, that would not be a problem.

Unfortunately for them, due to a billing inquiry, I did get informed. Legalese never having been one of my languages, I turned to the wonderful Berkman community for advice. And the advice poured in immediately. Several Berkmanites, most notably Prof. Jacques de Werra of the University of Geneva, a Faculty Associate at the Berkman Center this academic year, suggested literature, gave me references to other relevant cases, recommended experts in the field, and offered advice on my options. They even pointed out other unfair activities the company had been involved in previously. In particular, a song written for the company’s ad campaign was stolen from Berkmanite musician Erin McKeown; you can read all about it in the TechDirt case study “A Perspective On The Complexities Of Copyright And Creativity From A Victim Of Infringement”.

On to the technical part. It turns out that one can defeat a Complaint by convincing the arbitrators that any one of these elements is not present:

  1. your domain name is identical or confusingly similar to a trademark or service mark in which the complainant has rights;
  2. you have no rights or legitimate interests in respect of the domain name; and
  3. your domain name has been registered and is being used in bad faith.
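The burden described above can be expressed as a small decision function. This is a deliberately simplified model, not WIPO's procedure: real panels weigh evidence rather than booleans, and the element labels and the `decide` function are our own hypothetical names. It captures two points: the complainant must prove all three elements (paragraph 4(a) of the policy), and a panel can stop as soon as one is not proven.

```python
from typing import Optional, Tuple

# The three elements a complainant must prove (UDRP paragraph 4(a)),
# in the order the policy lists them. Labels are our own shorthand.
ELEMENTS = (
    "identical or confusingly similar to the complainant's mark",
    "registrant has no rights or legitimate interests",
    "registered and being used in bad faith",
)


def decide(proven: dict) -> Tuple[str, Optional[str]]:
    """Return the outcome and, if denied, the first element not proven."""
    for element in ELEMENTS:
        if not proven.get(element, False):
            # One unproven element is enough to deny the complaint,
            # so the analysis can stop here.
            return "denied", element
    return "transfer or cancellation", None


# Element 1 conceded, element 2 not proven: the panel need go no further.
outcome, stopped_at = decide({ELEMENTS[0]: True, ELEMENTS[1]: False})
print(outcome, "|", stopped_at)
# prints: denied | registrant has no rights or legitimate interests
```

This is also why a respondent should argue against every element it can: the respondent does not know in advance which element a panel will find decisive.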

The detailed policy and rules (called a UDRP, Uniform Domain-Name Dispute-Resolution Policy) are listed here:

In my case, the first element could not be countered: there were good reasons why my domain name was identical to theirs. But I could convince the arbitrators that the Complaint was wrong on both the second and the third elements. Showing that I have legitimate rights was straightforward: I just had to point out the lies within their claims. But you need to provide a complete counter-argument. It is not enough to point out a lie and expect that the arbitrators will go and verify it themselves. You have to provide the evidence yourself, with screenshots, excerpts, and clear arguments. And you had better be exhaustive in your arguments, because you may only get one shot. For example, even though I was quite sure that I could nullify the second element, I had better also nullify the third one, just in case. For the third element, I had to go digging for references and receipts showing that I had always been the owner of the account and had used it in good faith. Showing “good faith” was important, as this is a main reason why UDRPs exist: to curb the efforts of cyber-squatters who buy domain names only to sell them to the highest bidder among competing companies.

While I was at it, I wanted to point out that the Complaint itself was not filed in good faith. Remy-Cointreau really did not have a case; only through the dozen lies in the filing could they put a case together. I wanted the arbitrators to acknowledge that the Company had filed in bad faith. There are no penalties associated with such an acknowledgement, but future arbitrators may take it into account.

In the end the arbitrators denied the Remy Cointreau – Metaxa Complaint, stopping at the fact that the second element was not proven by the company. In addition to losing the case and a few thousand dollars, the company lost the opportunity of persuading search engines about the unique association of their trademark. Now that this is an officially recorded WIPO UDRP case, it may help reduce the number of future frivolous Complaints.

PS. I recently found a good guide on How to Choose the Right Domain Name, I hope you will find it useful.



Looking beyond “Big Data” analysis to discover those who make a difference


In an earlier post (Trusting Anonymous Twitter Users) I wrote about how ordinary citizens in Mexico are using Twitter to stay informed about areas of immediate risk in their cities. In our social media research we saw some anonymous Twitter accounts begin to amass large numbers of followers as they gained repute as trusted sources in the dissemination of information related to shootings, explosions and areas of danger in some Mexican cities. If you are not familiar with this earlier blog post you may want to take a look at it since I am about to describe the rest of the story as we discovered and recently published it (The Rise and the Fall of a Citizen Reporter) at the WebScience 2013 conference.

The limits of “Big Data”

While the data and the narrative we presented in the paper “Hiding in Plain Sight: A Tale of Trust and Mistrust Inside a Community of Citizen Reporters” were very interesting, my co-authors and I had the feeling that we had not discovered the full story. For one thing, who really was @GodFather, the person behind the pseudonym we had created for the prominent account in our data? Was it a real person? What if it was merely one of the successful tweet-bots that researchers have launched in the past? Or, maybe it was some guy tweeting from Scotland posing as a young woman living in Mexico. Importantly, what about the accusation that she was not really interested in the well-being of her community, but was instead working for the Zetas, the criminal drug cartel that has been accused of some of the more heinous crimes of the Mexican drug-war. Was there any truth to it?

Furthermore, there were several events that we had discovered but had not written about in the paper or the blog post. Looking at the aggregate data, my co-author Eni Mustafaraj and I discovered some important developments in the lives of these citizen reporters: shortly before the accusation against @GodFather appeared, the city had seen a lot of violence and the authorities had failed to act quickly. @GodFather had tried to organize an informant movement of “eagles” (aguilas) on Twitter to report on the actions of the “hawks” (halcones). Hawks is the name given to low-level cartel associates working on street corners, using cellphones to communicate with their bosses. These hawks are seen as important actors informing cartels about the movements of the Mexican Army and Navy so they can escape after an attack. Therefore, another distinct possibility was that @GodFather was accused because she was becoming annoying to a specific cartel.

The timeline of @GodFather’s activity in our data indicated a reduction in her activity in early April 2011. The activity of those mentioning and retweeting her also shows a similar pattern.


What was really happening? Was @GodFather one of the prominent citizen reporters informing the people about areas they should avoid on any given day? Was she a traitor working for the Zetas? Or perhaps a fake account? Why was she attacked, and why did she subsequently stop tweeting? Was she still tweeting from another account name or had she disappeared from the community?


Separating Retweets from Mentions

Another interesting data visualization separates retweets of @GodFather’s messages (in blue) from mentions of her name (in red). While in the first half of the graph her tweets (in green) seem to be echoed by the community, in the second half things change: at that time people are mainly talking about her, not echoing what she tweets.
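Separating the two kinds of echo is mostly a matter of classifying each tweet and counting per time period. The sketch below assumes a hypothetical input of `(date, author, text)` triples and treats a tweet as a retweet when it starts with the classic manual "RT @account" prefix; data from Twitter's API would instead mark native retweets explicitly. The sample tweets are made up.

```python
import re
from collections import Counter, defaultdict
from datetime import date


def classify(text, account):
    """Label a tweet as a retweet of `account`, a mention of it, or neither."""
    handle = "@" + account.lower()
    t = text.lower()
    if t.startswith("rt " + handle):
        return "retweet"
    # \b stops "@godfather" from matching a longer handle like "@godfather2".
    if re.search(re.escape(handle) + r"\b", t):
        return "mention"
    return None


def monthly_counts(tweets, account):
    """Count own tweets, retweets and mentions of `account` per month."""
    counts = defaultdict(Counter)
    for day, author, text in tweets:
        month = day.strftime("%Y-%m")
        if author.lower() == account.lower():
            counts[month]["own"] += 1        # green in the graph
        else:
            label = classify(text, account)  # blue = retweet, red = mention
            if label:
                counts[month][label] += 1
    return counts


sample = [
    (date(2011, 3, 5), "GodFather", "avoid downtown, shooting reported"),
    (date(2011, 3, 5), "user1", "RT @GodFather: avoid downtown, shooting reported"),
    (date(2011, 4, 20), "user2", "who is @GodFather really?"),
    (date(2011, 4, 21), "user3", "does anyone trust @GodFather?"),
]
for month, c in sorted(monthly_counts(sample, "GodFather").items()):
    print(month, dict(c))
```

In this toy sample, March shows her own tweet echoed by a retweet, while April shows only mentions: the same shift from echoing her to talking about her that the graph displays at a much larger scale.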


Using a Berkman talk to make the connection

Though we wanted to find out more, our big data analysis was not helping much. We needed verification on the ground. But we could not contact @GodFather directly (we figured that, “Hi, I am a researcher from the US and would like to verify your identity…” would not take us far). We knew that her account had been compromised in the past, so she had every reason to hide her identity. Moreover, there existed several accounts with similar-sounding names, some of them clearly belonging to trolls attacking her, and we did not want to end up talking to them by accident!

How could we uncover the truth? The Berkman Center and a measure of good luck helped us make a breakthrough. In July, 2012, the Berkman Center asked my co-author Andrés Monroy-Hernández and me to give a talk (“Narcotweets: Reporting on the Mexican Drug War using Social Media”) on our earlier work. I knew that Berkman talks are advertised, attended and tweeted widely online. Though not very likely, it was possible that some “tuiteros” from Mexico would follow our talk live. If I told them what we had discovered, even using pseudonyms, members of the citizen reporter community would certainly recognize the real identities to which the pseudonyms referred, and perhaps they would be willing to talk to us.

Indeed, by the end of the talk (available for viewing), Mariel Garcia, a Berkman intern from Mexico who was tweeting about the talk, showed me a couple of tuiteros’ accounts that had shown active interest in the talk. They were offering to answer any questions I might have. Of course I jumped at the opportunity; a few hours and many direct messages later I had established a connection with one of the prominent citizen reporters of the community.

From that citizen reporter Eni and I learned that we had missed an important point in the data analysis. One of the reasons that @GodFather had stopped tweeting was that her anonymity had been compromised in late July 2011. One of the trolls that had been attacking her throughout the year revealed her real name, her street address, and her picture. Now that we knew where to look, we went back to the data and found the relevant tweets. Her pictures had been deleted from the Web, but we were able to look through archives and locate several of them. Now that we knew a lot about Melissa Lotzer, the pseudonym used by the owner of the @GodFather account, all we needed was a way to contact her. We wanted to interview her about her motives and the threats she had received.

For reasons that will soon become apparent, we can reveal some details about the community we were studying. Our community of Twitter users is located in Monterrey, Mexico, and they have been using the tag #MTYfollow to stay informed about dangerous situations in their city. The prominent citizen reporter account, @trackMTY (aka @GodFather), was owned by a young woman who, like many such reporters, spent many hours a day informing and being informed by her sources. Melissa Lotzer (not her real name, but the one by which she is known in the community) became an active citizen reporter in March 2010, shortly after the #MTYfollow tag was adopted by the community. The drug war had hit the town of Comales, in the neighboring state of Tamaulipas, where a drug cartel was reportedly holding some citizens hostage. Melissa and some of her friends formed a Facebook group, Mexico Nueva Revolucion, and sent an open letter to President Calderón begging him to send the Army to free Comales. Following the discussion on various blogs, we see that Melissa and the MNR group received credit for their initiative.

But not everyone in the community was happy with these developments; Melissa’s accounts were attacked several times by trolls. But by early 2011, her reputation in the community was strong enough that Twitter shut down some of the trolling accounts after the outcry of the community. Her later initiative to organize the aguilas movement, however, was not as successful. While more than 80 aguilasMTY accounts were created within 2 days (!) ready to support her cause, many of her old friends did not follow her in this movement. Renewed troll attacks and troll collaboration with an editor of the famous Blog del Narco proved to be too strong for Melissa’s reputation to withstand.


Some of the aguilasMTY accounts that were created within a couple of days in late March 2011 at the call of trackMTY

We connected with Melissa and established a trusted two-way connection. We were able to verify her identity not only from the pointers of other citizen reporters, but also because we could go back and verify her claims through our tweet corpus. You can read more about our interviews with her in the later sections of the paper The Rise and the Fall of a Citizen Reporter, and you can find our slides from the WebScience 2013 talk online.

Communities of Citizen Reporters

In recognizing Melissa we recognize the thousands of other citizen reporters who spend long hours daily informing their fellow citizens about important and dangerous events unfolding in their cities and neighborhoods. Like most of the citizen reporters involved in supporting the communities of Monterrey, Saltillo, Reynosa, Veracruz and elsewhere, she is an idealist who wants to help others. Her experience has made her stronger despite the risk to which she has been exposed. Even after all her experience she would choose to do it all over again because, as she says:

I’m completely sure that trackmty was the reason why many people started using twitter. I receive comments daily by followers that are opening a twitter account to a family member just to follow me […] They tell me: please take care of my mom, she will be reading your tweets, she will not be reporting cases because she doesn’t know how to use a blackberry. Many similar cases like that happen every day.

Voice of Melissa Lotzer (@trackMTY).


PS. We also found out more about the identity of one of Melissa’s trolls: a young clerk at a local police station, inspired by WWF characters, with a hobby of posting photographs of prostitutes and gays on his blog.




Trusting Anonymous Twitter Users


Can we trust anonymous Twitter users? Before writing this paper with my colleagues Eni Mustafaraj, Samantha Finn and Andrés Monroy-Hernández, I would have thought it was not possible. But this is the theme of the paper that Andrés is presenting this week at ICWSM 2012:

Hiding in Plain Sight: A Tale of Trust and Mistrust inside a Community of Citizen Reporters

Below is a brief description of our findings. (It may look a bit impersonal because it is extracted from the contents of a poster we created, but you will get the idea.)

The contributions of this paper can be described as follows:

  1. To the best of our knowledge, this paper presents the first analysis of the practices of a community of Twitter citizen reporters in a life-threatening environment over an extended period of time (10 months).
  2. We discover that in this community, anonymity and trustworthiness coexist. Because these citizens live in a city troubled by the narco-wars that have plagued Mexico since 2006, it is a great example of a community where anonymity of active participants is crucial, while lack of anonymity may be fatal.
  3. We describe a series of network- and content-based features that allow us to understand the nature of this community, as well as to discover conflicts or changes in behavior.


The large volume of user-generated content on the Social Web places a heavy burden on participants to evaluate the accuracy and quality of content.
We usually rely on well-known, reputable news sources (NPR, NYT, BBC, Der Spiegel, etc.) to help us evaluate it. However, not every country has a free press or is willing or able to let the international press move freely. In some countries, like Mexico, journalists have been killed by organized crime or pressured by the authorities to stop reporting on certain events.

In the era of the Social Web, more citizens are reporting on newsworthy issues, gaining reputations as citizen reporters.
However, not everywhere in the world is there a right to, and protection of, free speech. In countries where the traditional media cannot report the truth, anonymity becomes a necessity for citizens who want to exercise their right to free speech in the service of their community.

Is it possible for anonymous individuals to become influential and gain the trust of a community? Here, we discuss the case of a community of citizen reporters who use Twitter to communicate, located in a Mexican city plagued by drug cartels fighting for control of territory.

Our analysis shows that the most influential individuals inside the community were anonymous accounts. Neither the Mexican authorities nor the drug cartels were happy about the real-time citizen reporting of crime or anti-crime operations in an open social network such as Twitter, and we discovered external pressures on this community and its influential players to change their reporting behavior.


When we read news, we usually choose our information sources based on the reputation of the media organization. We trust news organizations, and therefore we expect their reporting to be credible, though in the past there have been breaches of such trust, and all media organizations have an embedded bias that affects what they choose to report.

Social media platforms specializing in organizing humanitarian response to disasters, such as Ushahidi, rely on people on the ground to report on situations that need immediate attention. Anyone can be a reporter.

However, this poses a new problem: how do we assess the credibility of citizen reporting?
Citizen reporting lacks the inherent structures that help us evaluate the credibility of traditional media reporting. But sometimes citizen reporting might be the only source of information we have.
How can we use technology to help us verify the credibility of such reports?


To address this question we look at a particular community of citizen reporters gathered around Twitter accounts in a Mexican city plagued by drug-related violence.

Twitter has a unique feature that facilitates on-the-fly creation of communities: hyperlinked hashtags. While previous research has shown that the majority of Twitter hashtags have a very short half-life (Romero, Meeder, and Kleinberg 2011), in this paper we analyze the practices of a community of citizens who have been using the same hashtag since March 2010 to report dangerous events in their city.

We refer to the community with the obfuscated hashtag #ABC_city, which is a substitute for the hashtag present in the tweets of our corpus. We will also substitute the exact text of important tweets with a translation from Spanish to English, so that searching online or with the Twitter API will not lead to unique results.
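A minimal sketch of the hashtag substitution described above, assuming tweets are plain strings and a hand-made mapping from real tags to placeholders. The mapping keys (`#realcitytag`, `#oldrealtag`) are hypothetical stand-ins, not the real tags, and the name `obfuscate` is ours. Translating tweet text from Spanish to English remains a manual step and is not shown.

```python
import re

# Hypothetical mapping: the keys are invented stand-ins, not the real hashtags.
PLACEHOLDERS = {
    "#realcitytag": "#ABC_city",
    "#oldrealtag": "#old_ABC",
}

HASHTAG = re.compile(r"#\w+")  # '#' followed by letters, digits or '_'


def obfuscate(text, mapping):
    """Replace every mapped hashtag (matched case-insensitively) with its placeholder."""
    def swap(match):
        tag = match.group(0)
        return mapping.get(tag.lower(), tag)  # unmapped tags are left untouched
    return HASHTAG.sub(swap, text)


print(obfuscate("Shooting reported downtown #RealCityTag #news", PLACEHOLDERS))
# prints: Shooting reported downtown #ABC_city #news
```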


Through our research we traced the birth of the community defined by the hashtag #ABC_city. The first tweet mentioning #ABC_city, posted on March 19, 2010, by a not-particularly-active member, was the following:

#XYZ_city #ABC I propose #ABC_city to inform about news and important events in our city.

This user then reused the new hashtag many times in the following days, together with the #old_ABC hashtag and others, in order to spread its use:

@userA shootings are being reported in [address] (good source) #ABC #old_ABC #ABC_city #XYZ_city

On May 11, 2010, the same user who created the hashtag tweeted the following:

@Spammer101 Stop spamming #ABC_city. It’s only about important events that might affect our society.

Between May and November 2010 the hashtag was used only sparsely, with the old hashtags being used more often. An increase in the adoption of #ABC_city started on November 4th, only a week before the start of the collection period of the #ABC_city dataset.
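The adoption timeline above can be traced with a few lines of code. This is a sketch under stated assumptions: tweets are `(date, text)` pairs, hashtags are matched case-insensitively as whole tags, and the helper names and toy tweets are invented for illustration.

```python
import re
from collections import Counter
from datetime import date

HASHTAG = re.compile(r"#\w+")


def has_tag(text, tag):
    """True if `tag` appears in `text` as a whole hashtag (case-insensitive)."""
    return tag.lower() in {h.lower() for h in HASHTAG.findall(text)}


def first_use(tweets, tag):
    """Earliest date on which `tag` appears, or None if it never does."""
    dates = [d for d, text in tweets if has_tag(text, tag)]
    return min(dates) if dates else None


def monthly_usage(tweets, tag):
    """Number of tweets containing `tag` per (year, month)."""
    return Counter((d.year, d.month) for d, text in tweets if has_tag(text, tag))


# Toy tweets, invented for illustration.
toy = [
    (date(2010, 3, 19), "I propose #ABC_city for news in our city"),
    (date(2010, 3, 20), "shootings reported #ABC #old_ABC #ABC_city"),
    (date(2010, 11, 4), "#abc_city road blocked"),
    (date(2010, 2, 1), "#old_ABC only"),
]
print(first_use(toy, "#ABC_city"))  # 2010-03-19
print(monthly_usage(toy, "#ABC_city"))
```

Monthly buckets are a design choice for a sparse period like May to November 2010; daily buckets would be the natural switch once usage picks up.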


We used a basic dataset and a supplemental collection informed by our initial set of data.

The original dataset consists of 258,734 tweets written by 29,671 unique Twitter users, covering 286 days in the time interval November 2010 – August 2011.
In November 2010 we provided a set of keywords related to events in Mexico to the archival service. The collection was later divided into separate datasets according to the presence of certain hashtags.
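Splitting a keyword-based collection by hashtag, as described above, might look like the sketch below. It assumes tweets are plain strings and that a tweet carrying several hashtags of interest goes into each corresponding dataset; the function name is hypothetical.

```python
import re
from collections import defaultdict

HASHTAG = re.compile(r"#\w+")


def split_by_hashtag(tweets, tags):
    """Return {tag: [tweets containing tag]} for each tag of interest (case-insensitive)."""
    wanted = {t.lower() for t in tags}
    datasets = defaultdict(list)
    for text in tweets:
        present = {h.lower() for h in HASHTAG.findall(text)} & wanted
        for tag in sorted(present):  # sorted only to make the order deterministic
            datasets[tag].append(text)
    return dict(datasets)


collection = ["a #ABC_city", "b #XYZ_city #abc_city", "c #other"]
print(split_by_hashtag(collection, ["#ABC_city", "#XYZ_city"]))
```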

To supplement our limited original dataset, we performed a series of additional data collections in September 2011. In particular, we collected all social relations for the users in the current dataset, as well as their account information.
We collected all tweets from accounts created since 2009 with fewer than 3,200 tweets, in order to discover the history of the (anonymized) hashtag #ABC_city that defines the community we are studying.
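The account filter described above can be sketched as follows. We assume, without the source saying so, that the 3,200 threshold reflects the Twitter timeline API of the time, which returned at most the 3,200 most recent tweets of an account, so only for such accounts could the complete history be downloaded. The `Account` record and the January 1, 2009 cut-off are illustrative.

```python
from dataclasses import dataclass
from datetime import date

TIMELINE_CAP = 3200  # assumed limit on how far back an account's timeline can be read


@dataclass
class Account:  # hypothetical record; field names are ours
    screen_name: str
    created: date
    statuses_count: int


def fully_retrievable(accounts, since=date(2009, 1, 1), cap=TIMELINE_CAP):
    """Accounts created on or after `since` whose whole history is below `cap` tweets."""
    return [a for a in accounts if a.created >= since and a.statuses_count < cap]


toy = [
    Account("early_bird", date(2008, 6, 1), 150),
    Account("light_user", date(2010, 2, 3), 900),
    Account("heavy_user", date(2009, 5, 5), 26_340),
]
print([a.screen_name for a in fully_retrievable(toy)])  # ['light_user']
```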
We also made use of the dataset described in (O’Connor et al. 2010) to locate tweets archived in 2009.


While we would prefer to give further details on the collected data and use them freely in this paper, on ethical grounds we will protect this community under anonymity, due to the potential risk that our research can pose now or in the future. To illustrate the seriousness of the situation, we provide one example, out of the many documented in the press, of what the lack of anonymity can lead to.
On September 27, 2011, the Mexican authorities found the decapitated body of a woman in the town of Nuevo Laredo (near the Texas border) with a message apparently left by her executioners, which starts this way:

“OK, Nuevo Laredo en Vivo and social networking sites, I’m The Laredo Girl, and I’m here because of my reports, and yours, …”

Laredo Girl was the pseudonym used by the woman to participate in a local social network that enabled citizens to report criminal activities.

THE ACCOUNT @GodFather

Followee Relations

Out of 29,671 unique users in the corpus, we were able to collect followee information for 24,973 accounts that were active and public in September 2011 (84% of all users in the corpus). There are more than 8.5 million followee links, with an average of 336 followees per user and a median of 162 followees. The total number of unique followees is almost 1.7 million.
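The summary statistics above can be computed from a map of each user to the set of accounts they follow. This is a sketch with invented toy data; the function name is ours. With the real counts, 24,973 / 29,671 ≈ 0.842, which is the 84% coverage quoted above.

```python
from statistics import mean, median


def followee_stats(followees, corpus_size):
    """followees: {user: set of followed accounts} for users whose data was collected."""
    counts = [len(f) for f in followees.values()]
    return {
        "coverage": len(followees) / corpus_size,  # share of corpus users with data
        "links": sum(counts),                      # total followee links
        "mean": mean(counts),
        "median": median(counts),
        "unique_followees": len(set().union(*followees.values())),
    }


toy = {"a": {"x", "y", "z"}, "b": {"x"}, "c": {"y", "w"}}
print(followee_stats(toy, corpus_size=4))
```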

Ranking the followees based on the number of relations inside this #ABC_city community serves as an indicator of the attention that this community as a whole pays to other Twitter users. We inspected the top 100 accounts to understand the nature of their popularity. The top account was Mexico’s president, Felipe Calderon, followed by the city’s TV news program and an anonymous citizen reporter to whom we will refer as @GodFather. Four journalists, the city’s newspaper, a famous Mexican poet, and a comic character made up the rest of the top ten. Almost half of the accounts in the top 100 are entertainers of Mexican fame, with only a few international superstars such as Shakira or Lady Gaga in the mix. This statistic confirms the widespread perception that a large part of Twitter’s appeal derives from its use by celebrities, though it also indicates that each community is interested in its own celebrities. Twenty-five of the top 100 most followed accounts belong to local and national journalists and media organizations, compared to 10 for politicians at the state and federal level. In fact, the governor of the state in which ABC city is located (Mexico is a federation of 31 states) ranks 45th in the followees list, one place behind the account of Barack Obama.
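Ranking accounts by the attention the community pays them amounts to counting, for every followed account, how many community members follow it. A sketch with invented account names and a hypothetical function name:

```python
from collections import Counter


def rank_by_community_attention(followees, k=100):
    """Return the k accounts followed by the most community members, with their counts."""
    attention = Counter()
    for followed in followees.values():
        attention.update(followed)  # each member contributes at most 1 per account
    return attention.most_common(k)


toy = {
    "m1": {"president", "city_tv", "citizen_x"},
    "m2": {"president", "citizen_x"},
    "m3": {"president"},
}
print(rank_by_community_attention(toy, k=2))  # [('president', 3), ('citizen_x', 2)]
```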

To understand the appeal to the community of the top 100 ranked accounts, we inspected their Twitter profiles. The top-ranked citizen account, @GodFather, has 9,079 followers inside the community, or 36% of all active members. This amounts to 16% of his total audience of 57,127 followers. @GodFather is an anonymous citizen who has written the largest number of tweets in the corpus (6,675), which make up 25% of all his statuses (26,340).
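The percentages above follow directly from the raw counts. Here is a quick check. It assumes, as our own inference because it reproduces the quoted 36%, that "all active members" means the 24,973 accounts with followee data.

```python
def pct(part, whole):
    """Percentage, rounded to the nearest whole number."""
    return round(100 * part / whole)


active_members = 24_973          # accounts with followee data (assumed denominator)
followers_in_community = 9_079
followers_total = 57_127
tweets_in_corpus = 6_675
statuses_total = 26_340

print(pct(followers_in_community, active_members))   # 36
print(pct(followers_in_community, followers_total))  # 16
print(pct(tweets_in_corpus, statuses_total))         # 25
```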

A mutual-follow relation on Twitter (the friendship) is significant because it enables the involved accounts to send direct messages to one another. Direct messages offer some privacy to users, though if an account is hacked its messages are compromised (unless the user habitually deletes them). Communication through direct messages is not visible to researchers or the public and cannot be quantified. However, it is possible to quantify the extent to which such strong ties exist inside the community by discovering mutual links in the sets of followers and followees. As shown below, on average, 40% of user relations are reciprocated.
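Reciprocity can be measured by intersecting each user's followees with the same user's followers. This sketch uses toy data, and the helper name is ours.

```python
from statistics import mean


def reciprocity(followees, followers):
    """Share of each user's followees who also follow the user back."""
    return {
        user: len(out & followers.get(user, set())) / len(out)
        for user, out in followees.items()
        if out  # skip users who follow nobody (ratio undefined)
    }


followees = {"a": {"b", "c"}, "b": {"a"}}
followers = {"a": {"b"}, "b": {"a"}}
ratios = reciprocity(followees, followers)
print(ratios)                 # {'a': 0.5, 'b': 1.0}
print(mean(ratios.values()))  # 0.75
```

Averaging per-user ratios, as here, weights every member equally. Dividing all mutual links by all links would instead weight heavy followers more.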

Normal-like histogram of the distribution of reciprocal (mutual) friendship links in the network of the #ABC_city corpus.

The next figure shows the graph of all members with more than 75 friendship links, which reinforces the conclusion that this is a tightly connected community of users. (We limited the number of nodes for computational reasons.)

The graph of all members with more than 75 friendship links. Coloring is produced automatically by the Gephi modularity algorithm that finds communities in a network using the Louvain algorithm.
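Extracting the members with more than 75 friendship links takes two steps. First, keep only mutual follow pairs. Second, drop members whose number of such pairs is at or below the threshold. The sketch below runs on toy data with a threshold of 1 so the example stays small, and its function names are ours. The Louvain coloring itself is not reproduced here. NetworkX, for instance, offers a Louvain implementation as `networkx.community.louvain_communities`.

```python
from collections import Counter


def friendship_edges(followees):
    """Undirected edges {u, v} for every pair that follow each other."""
    edges = set()
    for u, out in followees.items():
        for v in out:
            if u != v and u in followees.get(v, set()):
                edges.add(frozenset((u, v)))
    return edges


def dense_subgraph(edges, min_links=75):
    """Edges among members who have more than `min_links` friendship links."""
    degree = Counter(node for edge in edges for node in edge)
    keep = {node for node, d in degree.items() if d > min_links}
    return {edge for edge in edges if edge <= keep}


toy = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
core = dense_subgraph(friendship_edges(toy), min_links=1)
print(sorted(sorted(edge) for edge in core))  # [['a', 'b'], ['a', 'c'], ['b', 'c']]
```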


Past research has shown that retweeting is indicative of agreement between the original sender and the retweeter (e.g., Metaxas and Mustafaraj 2010; Conover et al. 2011). Over time, retweets effectively provide information about a community of social media users who agree on specific issues. By contrast, the chance of a community member retweeting a message from an opposing political community is under 5%.

Since retweets involve a relation between two users, the original sender and the retweeting user, we can create a network of such relations for all retweets in the corpus. This retweet graph is shown below.

The retweet graph reveals a large component that is actively involved in retweeting, with smaller star-like components at the fringes. Closer examination reveals that the stars at the fringes were occasional retweeters of famous users (e.g., entertainers) and could easily be identified and excluded from our analysis. Node size reflects in-degree, that is, how often a node’s messages were retweeted, revealing a small number of accounts that rose to prominence in the community.
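The standard library alone is enough to build the retweet graph, compute the in-degree used for node size, and separate the fringe components. The sketch assumes retweets are available as `(retweeter, original_author)` pairs. To find components it ignores edge direction and uses union-find. Names and data are invented.

```python
from collections import Counter, defaultdict


def in_degree(retweets):
    """How many times each author was retweeted."""
    return Counter(author for _, author in retweets)


def components(retweets):
    """Connected components of the retweet graph, ignoring edge direction, largest first."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for retweeter, author in retweets:
        parent[find(retweeter)] = find(author)

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted(groups.values(), key=len, reverse=True)


toy = [("u1", "gf"), ("u2", "gf"), ("u3", "gf"), ("gf", "r2"),
       ("fan1", "star"), ("fan2", "star")]
print(in_degree(toy).most_common(2))  # [('gf', 3), ('star', 2)]
comps = components(toy)
print(len(comps), [len(c) for c in comps])  # 2 [5, 3]
```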

Zooming into this graph reveals the most influential nodes in the community, which we identified as the anonymous citizen reporters. The biggest node belongs to @GodFather.

A closer look at the core of the community reveals 13 nodes that have a larger share of their messages retweeted. The spatial proximity of these nodes determined by a force-directed algorithm indicates that they were also retweeting each other (as opposed to the nodes in the periphery of the retweet graph). The biggest node belongs to @GodFather.
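The observation that the core nodes also retweet each other can be checked directly, without a force-directed layout: take the most-retweeted accounts and count the retweets exchanged among them. This sketch relies on that substitution, and it selects the core by raw retweet count rather than by share of messages retweeted. `k` plays the role of the 13 core nodes, and the data are invented.

```python
from collections import Counter


def core_exchange(retweets, k):
    """Top-k accounts by times retweeted, and the retweets they exchange among themselves."""
    times_retweeted = Counter(author for _, author in retweets)
    core = {author for author, _ in times_retweeted.most_common(k)}
    internal = [(r, a) for r, a in retweets if r in core and a in core and r != a]
    return core, internal


toy = [("r1", "gf"), ("r2", "gf"), ("gf", "r1"), ("x", "gf"),
       ("y", "r1"), ("z", "r2"), ("r1", "r2"), ("fan", "star")]
core, internal = core_exchange(toy, k=3)
print(sorted(core))  # ['gf', 'r1', 'r2']
print(internal)      # [('r1', 'gf'), ('r2', 'gf'), ('gf', 'r1'), ('r1', 'r2')]
```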


Tweeting activity of three groups of users with different tweeting patterns, overlaid with the frequency of appearance of the word “balacera” (shooting). All three groups show increases in activity that match the peaks of the balacera distribution. There is only one discrepancy, in April-May 2011, related to an event explained in the next section.
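One way to quantify how well a group's activity tracks the word “balacera” is a Pearson correlation between two daily series. This is our illustration, not necessarily how the figure was produced. Names and toy data are invented, and keyword matching is a simple case-insensitive substring test.

```python
from collections import Counter
from datetime import date, timedelta
from math import sqrt


def daily_series(tweets, days, keyword=None):
    """Tweets per day (optionally only those containing `keyword`), aligned to `days`."""
    counts = Counter(
        d for d, text in tweets
        if keyword is None or keyword in text.lower()
    )
    return [counts.get(day, 0) for day in days]


def pearson(xs, ys):
    """Pearson correlation of two equally long, non-constant numeric series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


start = date(2011, 1, 1)
days = [start + timedelta(days=i) for i in range(3)]
group = [(days[0], "Balacera en el centro"), (days[0], "trafico lento"),
         (days[1], "otra balacera"), (days[1], "ok"), (days[1], "hola")]
activity = daily_series(group, days)               # [2, 3, 0]
shootings = daily_series(group, days, "balacera")  # [1, 1, 0]
print(round(pearson(activity, shootings), 3))
```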


Daily distribution of tweets by the anonymous account @GodFather and of its daily mentions in tweets by other members of the community. In April 2011, he was accused by newly created anonymous accounts of working for a criminal organization. After that event, he decreased his involvement in the community, and at the end of July he stopped tweeting altogether.
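The two daily curves in this figure can be counted as below: the account's own tweets, and tweets by others that mention it. This sketch assumes tweets are `(day, author, text)` triples. Mentions are detected by extracting `@handles` with a regular expression, so `@GodFather` does not match `@GodFather2`. Data and function names are invented.

```python
import re
from collections import Counter
from datetime import date

MENTION = re.compile(r"@(\w+)")


def activity_and_mentions(tweets, account):
    """Per-day counts of tweets by `account` and of tweets by others that mention it."""
    target = account.lower()
    own, mentions = Counter(), Counter()
    for day, author, text in tweets:
        if author.lower() == target:
            own[day] += 1
        elif target in {m.lower() for m in MENTION.findall(text)}:
            mentions[day] += 1
    return own, mentions


d1, d2 = date(2011, 4, 1), date(2011, 4, 2)
toy = [
    (d1, "GodFather", "shooting reported downtown"),
    (d1, "user1", "thanks @GodFather"),
    (d2, "user2", "@godfather is that confirmed?"),
    (d2, "user3", "@GodFather2 unrelated"),
]
own, mentions = activity_and_mentions(toy, "GodFather")
print(dict(own), dict(mentions))
```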


At a time when social networking platforms such as Facebook and Google+ are pushing users to assume their real-life identities on the Web, we think it is important to provide examples of communities of citizens for whom maintaining their anonymity inside such networks is essential. But being anonymous makes one more susceptible to denigration attacks from other anonymous accounts, leaving the other members of the community with the dilemma of whom to trust.

Inside a community, even anonymous individuals can establish recognizable identities that they can sustain over time. Such anonymous individuals can become trustworthy if their efforts to serve the interests of the community remain constant over time.

The Cost of Untrustworthiness


It is not only the lack of trust we have in one another these days, as I wrote previously. There is also an enormous lack of trust that the other European peoples have toward us. The previous governments, the democratically elected representatives of the Greek people, repeatedly lied to themselves, to the people, and to the rest of the world. These lies led to the low vote shares of their parties in the last elections. The new parties that emerged to win votes from the anger of the Greek people (SYRIZA, the Independent Greeks and Golden Dawn) do not appear able to win the trust of Europeans (or even of most Greeks).

Trustworthiness in human relations is a basic prerequisite for a society to function well. Think how hard it is for someone who has “gotten a bad name” to reverse it. The proverb “better to lose an eye than your good name” aims to stress exactly this cost of untrustworthiness.

I travel to Europe very often, and over the past year I have seen signs of this lack of trust toward Greeks. And I am sure that European leaders, like their citizens, take it seriously when planning their moves for the coming years. I think the die has been cast, and they are all looking for a way to disentangle themselves, at the lowest possible cost, from the one who “has gotten a bad name.” It will cost them, certainly, but as time goes by that cost probably seems smaller compared with the cost of maintaining an untrustworthy relationship.

One party that might be able to reverse this situation is SYRIZA. Mr. Tsipras is going to meet European leaders of parties he probably agrees with, hoping for their support. He will be disappointed when he realizes that the lawlessness and the political games we have grown so used to over the last 30 years are not acceptable there. And the chances of being led to disaster by panic moves even before we reach the elections are very high. (After the elections, unless we change our behavior, they are almost 100%.) Like the previous “leaders” of PASOK and ND, he will share the blame for this disaster.

We will all feel the consequences of the disaster, however. And I am not talking only about the economic ruin of a large part of the population (the “crooks” have already prepared for the second round of looting: they have taken their money out and are waiting to multiply it in drachmas). I am talking about the disaster that will follow from the untrustworthiness we have built through the gap between our words and our deeds.

All of us, Europeans and Greeks, will remember this for years.

The Cost of the Lack of Trust


“Widespread distrust in a society […] imposes a kind of tax on all forms of economic activity, a tax that high-trust societies do not have to pay.”

Thus spoke Francis Fukuyama (in Trust: The Social Virtues and the Creation of Prosperity), referring to the extra cost that a lack of trust imposes on a society. It came to mind because I have been in Athens these last few days, and I see the lack of trust that pervades my compatriots’ daily activities, transactions and conversations: from the locks on the doors and the iron bars on the balconies, to the extra travel time spent avoiding certain neighborhoods, to the time we spend in our conversations making sure that the person we are talking to really believes what he claims to believe.

I find that this lack of trust hurts me as much as the economic crisis, because it strikes at our hearts more than at our wallets.




Replication, Verification and Availability for Big Data



The next step in the evolution of Social Computing research: formal recognition, by the community, that Replication, Verification, and Availability of Big Data deserve credit.

In his response to my posting on Research Replication in Social Computing, Dr. Bernardo Huberman pointed to his letter to Nature on a related issue: verification of results. Here I expand the proposal to include an idea that I have heard others mention recently.

I totally agree, of course, that “Science is unique in that peer review, publication and replication are essential to its progress.” This is what I also propose above. And he focuses on the need for having accessible data so that people can verify claims. For those who may not have access to his letter, I reproduce the central paragraph here:

“More importantly, we need to recognize that these results will only be meaningful if they are universal, in the sense that many other data sets reveal the same behavior. This actually uncovers a deeper problem. If another set of data does not validate results obtained with private data, how do we know if it is because they are not universal or the authors made a mistake? Moreover, as many practitioners of social network research are starting to discover, many of the results are becoming part of a “cabinet de curiosites” devoid of much generality and hard to falsify.”

Let me add something further, which I heard mentioned by Noshir Contractor and Steffen Staab at the WebScience Track of the WWW2012 conference and which I think will complement the overall proposal: people who make their data available to others should get credit for it. After all, in science a lot of time is spent collecting and cleaning data, and those who do that and make their data available to other researchers for verification, meta-analyses and the study of other research questions should be rewarded for their contributions.

I believe the time is right to introduce formal credit for replication of results on comparable data sets, verification on the same data set, and making data accessible to others for further analysis and meta-analysis. I plan to devote much of my group’s research time to these issues this summer and to publish our findings afterwards.
