Media Cloud Technical Lead Position


Media Cloud is hiring a new technical lead for the project.

This position is dear to my heart, because the new hire will take over many of my duties on the project, including most importantly driving the technical direction of the project, which includes an ambitious slate of improvements to the system as we enter a new two-year grant cycle. The position calls for a talented geek with experience wrangling large amounts of text / data, and someone who is comfortable sitting at a table with the other members of our amazing team and working out together what is technically possible and wise.

The position is a great opportunity for:

* an academic geek with some social science chops who is looking for an opportunity to apply tech to our questions around the networked public sphere;
* a startup geek who is looking to move into academia to work on hard social problems with smart people;
* a text wrangling geek who is looking for ways to apply her skills to organizations working for social change;
* any smart, talented, experienced geek who would thrive in the atmosphere of Berkman/Harvard and CCM/MIT.

The full job description and application link are here:

If you know anyone who might be a fit for the job (or are such a person), please apply and/or email me with any questions.

Net Neutrality in the Networked Public Sphere


Earlier this week, I published a new paper on the net neutrality debate in the public sphere with my co-authors Rob Faris, Bruce Etling, Dalia Othman, and Yochai Benkler. In this paper, we use the Media Cloud Controversy Mapper to identify over 16,000 stories about net neutrality from 2013-12-01 to 2014-11-17. We use that data to perform a variety of types of analysis to explore the networks of language and influence within the controversy.

Our core finding is that the networked public sphere played a key role in changing FCC policy on net neutrality. Several key periods of debate during the controversy look a lot like traditional elite agenda setting (court makes a decision reported by national newspapers, FCC insider leaks upcoming ruling to the Wall Street Journal, etc.). But during the key months before Obama’s announcement of support for net neutrality, the dominant modes of agenda setting were a viral video, an online comment campaign, and an internet protest by pro-net neutrality activists.

Here are some of the network maps we drew for individual weeks:


Russian Blogs as an Alternative Public Sphere


I am proud to be releasing today with my co-authors Bruce Etling and Rob Faris an analysis of the Russian media ecology, in which we show that blogs serve as a critical alternative public sphere in Russia. Here’s one of the key visualizations from the paper:

This diagram compares words used in the same sentence as the word ‘egypt’ in Russian political blogs vs. mainstream media during the time of the Egyptian December 2010 revolution. You can see at a glance that the two spheres share only basic geography and some concern for tourists. Other than basic geography, the mainstream media are additionally interested in the attacks only through foreign ministry expressions of concern for Russian tourists. In strong contrast, the political blogs are interested in the events as a ‘revolution’ in the context of the larger topic of unrest in the Middle East.

Official blurb below.

Applying a combination of quantitative and qualitative methods, we investigate whether Russian blogs represent an alternative public sphere distinct from web-based Russian government information sources and the mainstream media. Based on data collected over a one-year period (December 2010 through December 2011) from thousands of Russian political blogs and other media sources, we compare the cosine similarity of the text from blogs, mainstream media, major TV channels, and official government websites. We find that, when discussing a selected set of major political and news topics popular during the year, blogs are consistently the least similar to government sources compared to TV and the mainstream media. We also find that the text of mainstream media outlets in Russia (primarily traditional and web-native newspapers) are more similar to government sources than one would expect given the greater editorial and financial independence of those media outlets, at least compared to largely state-controlled national TV stations. We conclude that blogs provide an alternative public sphere: a space for civic discussion and organization that differs significantly from that provided by the mainstream media, TV, and government.
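For readers who have not met cosine similarity before, here is a minimal sketch of the idea. It is not the pipeline used in the paper: it assumes a plain bag-of-words count with no stemming or stop-word handling, and the three toy texts and the helper names (`term_counts`, `cosine_similarity`) are invented purely for illustration.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count word occurrences (a plain bag of words)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy texts, invented for illustration only (not data from the study).
government = term_counts("ministry expresses concern for tourists in egypt")
mainstream = term_counts("ministry voices concern for russian tourists in egypt")
blogs = term_counts("revolution in egypt follows unrest across the middle east")

sim_mainstream = cosine_similarity(government, mainstream)
sim_blogs = cosine_similarity(government, blogs)
print(f"mainstream vs government: {sim_mainstream:.2f}")
print(f"blogs vs government:      {sim_blogs:.2f}")
```

A score near 1 means two sources use much the same vocabulary in much the same proportions, and a score near 0 means they share almost no words. In this toy example the mainstream text scores higher against the government text than the blog text does.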

Internet Censorship and Control


The Internet is and has always been a space where participants battle for control. The two core protocols that define the Internet – TCP and IP – are both designed to allow separate networks to connect to each other easily, so that networks that differ not only in hardware implementation (wired vs. satellite vs. radio networks) but also in their politics of control (consumer vs. research vs. military networks) can interoperate easily. It is a feature of the Internet, not a bug, that China – with its extensive, explicit censorship infrastructure – can interact with the rest of the Internet.

I’m proud to announce today the release of an open access collection of five peer reviewed papers on the topic of Internet Censorship and Control. These papers appear in the May issue of IEEE Internet Computing magazine, but today we also make them available as an open access collection. The collection was edited by Steven Murdoch and me.

The topics of the papers include a broad look at information controls, censorship of microblogs in China, new modes of online censorship, the balance of power in Internet governance, and control in the certificate authority model. These papers make it clear that there is no global consensus on what mechanisms of control are best suited for managing conflicts on the Internet, just as there is none for other fields of human endeavour. That said, there is optimism that with vigilance and continuing efforts to maintain transparency the Internet can remain a force for increasing freedom rather than a tool for more efficient repression.

Our Circumvention Research Does Not Support SOPA


Daniel Castro of the Information Technology and Innovation Foundation recently published a paper supporting the Stop Online Piracy Act (SOPA) currently being debated in Congress. In that report, he claims that research performed by us supports the domain name system (DNS) filtering mechanisms mandated by SOPA. This claim is a distortion of our work. We disagree with the use of our study to make the point that DNS-based Internet filtering works and that we should therefore use it as a means of stopping websites from distributing copyrighted content. The data we collected answer a completely different set of questions in a completely different context.

Among other provisions that seek to control the sharing of copyrighted material on the Internet, SOPA, if enacted, would call upon the U.S. government to require that Internet service providers remove from their DNS servers the names of any sites that either infringe copyright directly or merely “facilitate” copyright infringement. So, for example, the government could require that ISPs remove the name “” from their DNS servers if was not being sufficiently aggressive in preventing its users from tweeting information about places to download copyrighted materials. This practice is known as DNS filtering. DNS filtering is one of the most common modes of Internet-based censorship. As we and our collaborators in the OpenNet Initiative have shown over the past decade, practices of this sort are used extensively in autocratic countries, including China and Iran, to prevent access to a range of sites offensive to the governments of those countries.

Opponents of SOPA have argued that the DNS filtering, even though it will have a number of harmful effects on the technical and political structure of the Internet, will not be effective in preventing users from accessing the blocked sites. Mr. Castro cites our research as evidence that SOPA’s mandate to filter DNS will be effective. He quotes our finding that at most 3% of users in certain countries that substantially filter the Internet use circumvention tools and asserts that “presumably the desire for access to essential political, historical, and cultural information is at least equal to, if not significantly stronger than, the desire to watch a movie without paying for it. Yet only a small fraction of Internet users employ circumvention tools to access blocked information, in part because many users simply lack the skills or desire to find, learn and use these tools.”

In our report, we looked at three sets of censorship circumvention tools: complex, client-based tools like Tor; paid VPNs; and web proxies. We estimated usage of those three classes of tools, drawing on reports from the client tool developers, a survey of VPN operators, and Google Analytics data for the web proxy tools. Counting all three classes of tools, we estimated as many as 19 million users a month of circumvention tools. Given the large number of users in China, Iran, Saudi Arabia and other states where filtering is endemic, this represents a fairly small percentage of Internet users in those countries; 19 million people represents about 3% of the users in countries where Internet filtering is pervasive. We actually believe that 3% figure is high, as some of the tools we study are used by users in open societies to evade corporate or university firewalls, not just to evade government censorship.

We stand behind the findings in our study (with reservations that we detail in the paper), but we disagree with the way that Mr. Castro applies our findings to the SOPA debate. His presumption that people will work as hard or harder to access political content than they do to access entertainment content deeply misunderstands how and why most people use the Internet. Far more users in open societies use the Internet for entertainment than for political purposes; it is unreasonable to assume different behaviors in closed societies. Our research offers the depressing conclusion that comparatively few users are seeking blocked political information and suggests that the governments most successful in blocking political content ensure that entertainment and social media content is widely available online precisely because users get much more upset about blocking the ability to watch movies than they do about blocking specific pieces of political content.

Rather than using circumvention tool usage in closed societies to predict the activities of a given userbase, Mr. Castro would do better to consider the massive userbase of tools like BitTorrent clients, which would make for a far cleaner analogy to the problem at hand. Likewise, the long line of very popular peer-to-peer sharing tools that have been incrementally designed to circumvent the technical and political measures used to prevent sharing copyrighted materials is a stronger analogy than our study of users in authoritarian regimes seeking to access political content.

Second, our research has consistently shown that those who really wish to evade Internet filters can do so with relatively little effort. The problem is that these activities can be very dangerous in certain regimes. Even though our research shows that relatively few people in autocratic countries use circumvention tools, this does not mean that circumvention tools are not crucial to the dissident communities in those countries. 19 million people is not large in relation to the population of the Internet, but in absolute terms it is still a lot of people who have freer access to the Internet through the tools. We personally know many people in autocratic countries for whom these tools provide a crucial (though not perfect) layer of security for their activist work. Those people would be at much greater risk than they already are without access to the tools, but in addition to mandating DNS filtering, SOPA would make many circumvention tools illegal. The single biggest funder of circumvention tools has been and remains the U.S. government, precisely because of the role the tools play in online activism. It would be highly counter-productive for the U.S. government to both fund and outlaw the same set of tools.

Finally, our decade-long study of Internet filtering and circumvention has documented the many problems associated with Internet filtering, not its overall effectiveness. DNS filtering is by necessity either overbroad or underbroad; it either blocks too much or too little. Content on the Internet changes its place and nature rapidly, and DNS filtering is ineffective when it comes to keeping up with it. Worse, especially from a First Amendment perspective, DNS filtering ends up blocking access to enormous amounts of perfectly lawful information. We strongly resist the claim that our research, and that of our collaborators, makes the case in favor of DNS-based Internet filtering.


Mr. Castro’s report may be found here:

with the reference to our work on p. 8.

The study that is being misused by Mr. Castro is here:

The findings of our decade-long studies are documented in three books,
published by MIT Press and available freely online in their entirety at:

– John Palfrey, Jillian York, Rob Faris, Ethan Zuckerman, and Hal Roberts

Local Control: About 95% of Chinese Web Traffic is Local


While exploring the structure of national networks through our Mapping Local Internet Control project, we decided to combine our national network data with Google’s AdPlanner data to estimate the overall locality of web site traffic in individual countries. The most interesting result so far is that we estimate that 96% of all page views in China are to web sites hosted within China. This is a very interesting finding because of its implications for how to understand Internet control in China.

There are lots of ways to control the Internet, including blocking local users from viewing objectionable remote content, flooding or hacking objectionable sites, and monitoring the Internet usage of activists. But in many cases the most effective forms of Internet control are offline — threatening, fining, arresting, or killing activists because of their activity online. These forms of control are especially effective against content that is hosted within a country. There is no need to launch a DDoS attack against a dissident site that is hosted within the offended country when agents of the offended country can simply knock on the door of either the individual activist publishing the content or of the hosting provider that is hosting the objectionable content and use traditional methods of the state (fines, closing of businesses, jail) to control the content.

The extremely high proportion of local web traffic in China may be the result of the success of the Chinese government in blocking the international sites, like Facebook, YouTube, and Blogger, that are generally the biggest destinations in other countries. Or it might be because Chinese people like to read content written in Chinese by other Chinese about Chinese topics on sites run by Chinese people. It is likely some combination of the two factors. But the end result is the same. The most direct battleground in the fight over control of the Internet in China is local: it’s happening on the local Chinese services that are the source of almost all Chinese web traffic but are required to censor content by the government.


To generate this number, we took the existing database of countries to autonomous systems to IP address blocks from our Mapping Local Internet Control project (documented here) and combined it with Google AdPlanner’s list of the number of page views of the most popular 250 sites in China. We combined the datasets by looking up the IP address of each of the sites in the AdPlanner 250, looking up the autonomous system of each IP address in our database, and then looking up the country of registration for each of those autonomous systems. We then took the resulting list of the AdPlanner 250 sites and countries and computed the web locality number by dividing the total number of page views for sites hosted within the country by the total number of page views for all of the AdPlanner 250 sites. This approach is just an estimate. Some IP addresses may physically route to another country even though they are registered with a local autonomous system. The AdPlanner 250 sites are not necessarily representative of all web traffic. The AdPlanner stats themselves are only estimates, and they do not include numbers for itself.
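The calculation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not our actual code: the AS numbers, address blocks, site names, and page view counts are all invented (using reserved documentation ranges), the DNS lookup step is stubbed out with a fixed dictionary, and the function names `asn_for_ip` and `web_locality` are hypothetical.

```python
import ipaddress

# Hypothetical stand-ins for the real datasets. The AS numbers come from the
# private-use range and the address blocks from the documentation ranges.
AS_BLOCKS = {
    64500: ["192.0.2.0/24"],
    64501: ["198.51.100.0/24"],
}
AS_COUNTRY = {64500: "CN", 64501: "US"}

# Stub for the DNS lookup step: site name -> resolved IP address.
SITE_IP = {
    "site-a.example": "192.0.2.10",
    "site-b.example": "192.0.2.20",
    "site-c.example": "198.51.100.5",
}

# Stub for the AdPlanner list: site name -> estimated page views.
PAGE_VIEWS = {
    "site-a.example": 700,
    "site-b.example": 260,
    "site-c.example": 40,
}


def asn_for_ip(ip, as_blocks):
    """Return the AS whose address blocks contain `ip`, or None if unknown."""
    addr = ipaddress.ip_address(ip)
    for asn, blocks in as_blocks.items():
        if any(addr in ipaddress.ip_network(block) for block in blocks):
            return asn
    return None


def web_locality(country, page_views, site_ip, as_blocks, as_country):
    """Share of page views going to sites whose AS is registered in `country`."""
    local = 0
    total = 0
    for site, views in page_views.items():
        total += views
        asn = asn_for_ip(site_ip[site], as_blocks)
        if asn is not None and as_country.get(asn) == country:
            local += views
    return local / total if total else 0.0


locality = web_locality("CN", PAGE_VIEWS, SITE_IP, AS_BLOCKS, AS_COUNTRY)
print(f"estimated locality: {locality:.0%}")
```

The sketch counts a site as local based only on where its autonomous system is registered, so it inherits the caveats listed above: an address registered locally may still route somewhere else, and the top sites are only a sample of all traffic.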

Tor and Journalism Vulnerabilities


I was recently quoted in a story in the New Scientist about a new attack on Tor. The quote was a combination of somewhat sloppy wording on my part and a lack of context on the reporter’s part, so I’d like to provide context and more precise wording here. The quote is:

“There are lots of vulnerabilities in Tor, and Tor has always been open about the various vulnerabilities in its system,” says Hal Roberts at Harvard University, who studies censorship and privacy technologies. “Tor is far from perfect but better than anything else widely available.”

The basic idea of the attack described in the article is to use a rogue Tor exit node to insert an address owned by the attacker into a BitTorrent stream to fool the client into connecting to that address via UDP, which is not anonymized by Tor. So when the BitTorrent client connects to the UDP address, the attacker can discover the client’s real IP address. This sort of attack on Tor is well known; the paper’s authors call it a ‘bad apple’ attack. Tor’s core job is just to provide a secure TCP tunnel, but most real world applications do much more than just communicating via a single TCP connection. For example, in addition to HTTP requests for web pages, web browsers make DNS requests to look up host names, so any end user packaging of Tor has to make sure that DNS lookups happen over the Tor tunnel (as does TorButton). Tor does not ultimately control the applications that use its tunnels but relies on those applications to use its TCP tunnel exclusively to maintain the privacy of the user.

Tor’s conundrum is that at the end of the day what end users need is anonymous communications through applications, not secure TCP tunnels. So even though Tor can’t be responsible for making every application in existence behave nicely with it, to be actually useful it has to take some responsibility for the most common end user applications. To this end, Tor works closely with the Firefox developers to make Firefox work as well as possible with Tor, and Tor and associated folks have invested lots of effort into tools that improve the interface between the browser, the user, and Tor. But there’s only so much that Tor can do here in the world of all applications.

These attacks might not be considered ‘vulnerabilities in Tor’, as I say above, so I should have been more careful with my language (though most folks who do these press interviews struggle with the danger of any given sentence out of an hour long conversation not having precise language that can stand out of context of the rest of the conversation). But the basic point remains — there are lots of ways to break through the privacy of Tor as it is used in the real world, and Tor has been completely open about those in an effort to educate its user base and provide ‘open research questions’ (Roger Dingledine’s favorite phrase!) for its developer community. Roger’s response to the specific BitTorrent problem is simply to tell Tor users not to use BitTorrent over Tor because there’s no way that Tor itself can fix all of the broken BitTorrent clients in the world, but one of the core findings of the above paper is that lots of people do use BitTorrent clients over Tor. So that’s a really hard problem.

The attack described in the paper has a second component that is more directly a vulnerability of Tor than a ‘bad apple’ application attack. The second component is that Tor does not create a new circuit of nodes for every connection, but instead re-uses the same circuit for several connections from the same client to improve performance. This behavior makes it possible to identify the origin IP address of not just the one ‘bad apple’ connection (the BitTorrent connection in the paper’s attack) but also the origin IP address of other current connections by the same user. So a user who is using BitTorrent and browsing the web at the same time exposes not just her BitTorrent activities but also her web browsing activities to the attacker (the paper’s authors say ‘one bad apple spoils the bunch’).
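To make the circuit reuse problem concrete, here is a toy model of what a rogue exit node could do once one stream on a circuit has leaked its origin. It is only a sketch of the linking logic as described above, not code from the paper or from Tor: the stream log, the circuit ids, the host names, the leaked address, and the function name `attribute_streams` are all made up.

```python
from collections import defaultdict

# Hypothetical view from a rogue exit node: (circuit id, destination) for each
# stream it carries. The exit node sees which circuit a stream belongs to,
# but not who is at the other end of the circuit.
streams = [
    (1, "tracker.example:6969"),  # the BitTorrent stream
    (1, "webmail.example:443"),
    (1, "news.example:80"),
    (2, "forum.example:443"),
]

# Circuits for which the 'bad apple' trick revealed the client's real address.
leaked_origin = {1: "203.0.113.7"}


def attribute_streams(streams, leaked_origin):
    """Attach every stream on a leaked circuit to the leaked origin address."""
    exposed = defaultdict(list)
    for circuit_id, destination in streams:
        origin = leaked_origin.get(circuit_id)
        if origin is not None:
            exposed[origin].append(destination)
    return dict(exposed)


exposed = attribute_streams(streams, leaked_origin)
for origin, destinations in exposed.items():
    print(origin, "->", ", ".join(destinations))
```

Only the tracker stream leaked anything directly, but because the webmail and news streams share its circuit they are attributed to the same address. The stream on circuit 2 stays unlinked in this model.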

This attack can be more traditionally described as a ‘vulnerability in Tor.’ Claiming ‘lots’ of these is sloppy language, but there is certainly a whole class of timing / tagging attacks that allow an attacker who has control of an entry and an exit node to identify users (and I think the risk of these attacks is more than theoretical in a world in which one ISP in China controls about 63% of the country’s IP addresses).

So to return to the quote and story, I spoke to the author of the piece for about an hour, most of which I spent trying to convince him not to write a ‘TOR IS BROKEN!’ piece that hyped this attack as the one, new chink in Tor’s otherwise pristine armor. I walked through the above, trying to explain that Tor is intended to do a single specific thing (anonymize communication through a TCP tunnel) but that there are various attacks that exploit the layer between Tor and the applications that use it. And there are also attacks like the circuit association described above that are more properly vulnerabilities in Tor itself. But many examples of both of these sorts of attacks have been around for as long as Tor has been around, and Tor has been very vocal about them.

I was trying (unsuccessfully!) to steer the reporter toward explaining the vulnerability as an example of how it is important that users understand that even a project like Tor that is very strongly focused on anonymity over other properties can’t provide perfect privacy for its users, that there are some things it does well but not perfectly (setting up anonymous TCP tunnels) and other things it does not as well (automagically make any application using Tor anonymous). To borrow Roger’s favorite phrase, how to explain complex social / technical issues like this one to reporters is still an open research question for which I’m eager to hear solutions!

Update: The reporter who wrote the article reminded me nicely that the only contact he had with me for this article was a single email exchange, so evidently I made up the long conversation with the reporter in my mind. In my defense, I give a lot of interviews on circumvention related topics, and I can actually still (falsely!) remember standing in my house having this call with the reporter.

Independent Media Sites in Belarus Reportedly Hijacked During Election


Belarus is holding an election today. This election is particularly important because Aleksandr G. Lukashenko, sometimes referred to as the ‘last dictator of Europe,’ has allowed a fair degree of freedom throughout the campaign, including giving free airtime on national TV to opposition candidates, during which they were allowed to criticize him without censorship.

However, it appears that Belarus is continuing in its mixed record of allowing free access to opposition Internet sites during elections. I am getting reports from a digital activist whom I trust of DDoS attacks against a number of sites, which is common during times of crisis in authoritarian countries. I can verify that the following sites have been inaccessible at times this morning:,, He is also reporting that international connections to ports 443 and 465 are being blocked, which will prevent users from securely posting content to international sites like facebook and twitter and from sending mail through international carriers like gmail (the blocking is apparently for all international sites, though, not just ones that may be offensive to the government).

Most interestingly, he reports that BELPAK, the Belarussian national ISP, has been silently redirecting requests from independent media sites to copies of those sites presumably run by pro-government actors, if not the government itself. So when a user requests, the ISP hijacks the request and instead of returning the requested page returns a redirect for The fake site is almost identical to the originally requested site, and as of this post each fake site appears to contain all of the same stories as the original site. Presumably as election day goes on, though, the government will use the fake site to prevent publication of stories that it does not like (by merely not mirroring them onto the fake site). My source observed this behavior repeatedly this morning, but it has since stopped, so requests from within Belarus are currently going to the original sites. This behavior was reported for the following sites, with the following faked mirrors (which can be accessed as confirmation):

original site fake site

Here’s a zip file of screenshots of each of the above sites, in case the fake sites are taken down.

I cannot verify that this activity was or is happening, but the mere presence of the mirrored sites under almost identical names is strong evidence of bad behavior by someone. My source is working directly with many of the sites listed above and so can verify that those mirrored sites are not being run by the site owners (running such mirrored sites under similar domain names is a very common form of DDoS resistance).

This practice of using a complex combination of different methods for controlling the Internet, particularly during times of crisis like an election or a protest, is very common (we will shortly release a report on DDoS attacks against independent media which includes the finding that independent media sites often suffer from a range of different types of control rather than just filtering, just DDoS, just hijacking, etc). Note that several of the sites that have been subject to the hijacking described above have also been DDoS’d. It may or may not be the case that the actors DDoS’ing the sites are the same as the ones hijacking them (the hijacking is almost certainly the work of BELPAK, since they are the only ones with the ability to hijack requests as described above).

Update 2010-12-19:

All of the mirrors above are hosted on IP addresses owned by BELPAK: has address has address has address has address has address has address has address has address

This doesn’t necessarily mean that BELPAK itself is directly hosting the sites; it just means that BELPAK or one of its customers is hosting the mirror sites within its network. Nonetheless, this is further evidence of bad behavior.

Update 2010-12-21:

Radio Free Europe / Radio Liberty is reporting that one of the site mirrors changed the location of a protest (presumably to misdirect protesters).

Amazon’s Wikileaks Takedown


For the past year, I’ve been working on a study on distributed denial of service (ddos) attacks against independent media and human rights sites with colleagues at the Berkman Center. The resulting report will be out shortly, but one of the main conclusions is that independent media sites are not capable of independently defending themselves against large, network-based ddos attacks. There are many things an independent site can do to protect itself against smaller ddos attacks that target specific application vulnerabilities (including simply serving static content), but the problem with a large, network-based attack is that it will flood the link between the targeted site and the rest of the Internet, usually causing the hosting ISP to take the targeted site down entirely to protect the rest of its network.

Defending against these large network attacks requires massive amounts of bandwidth, specific and deep technical experience, and often connections to the folks running the networks where the attacks are originating. There are only a couple dozen organizations (ISPs, hypergiant websites, and content distribution networks) at the core of the Internet who have sufficient amounts of bandwidth, technical ability, and community connections to fight off the biggest of these attacks. Paying for services from those organizations is very expensive, though, starting at thousands of dollars per month without bandwidth costs and often going much, much higher. An alternative is to use one of a handful of hosting services like blogger that offer a high level of ddos protection at no financial cost. One of the recommendations we make in our report is that independent media sites that think they are likely to be attacked and want to be able to defend themselves either find the resources to pay for a ddos protection service or accept the compromises of hosting on a service like blogger in return for the free ddos protection.

We make this recommendation with a great deal of caution, however, because moving independent media sites to these core network actors trades more freedom from ddos attacks for more control by one of these large companies. It’s great to be able to withstand a 10Gbps ddos attack on youtube, but it’s not so great for youtube to take down your video at its sole discretion for violation of its terms of service. In general, these core companies have struggled in this genuinely difficult role. How is youtube supposed to judge what to do when it receives complaints about a violent video in Arabic posted from Egypt? Do videos of police brutality qualify as the ‘graphic or gratuitous violence’ that youtube disallows in its terms of service?

So with this context, I’ve been watching the Wikileaks attack with great interest. It has been suffering a pretty big network attack (Wikileaks claims about 10Gbps, which is big enough to take down all but a couple dozen or fewer ISPs in the world; Arbor claims about 2-4 Gbps, which is still big enough to cause the vast majority of ISPs in the world major disruption). The attack successfully took its site offline at its main hosting ISP. Wikileaks’ textbook response was to move to Amazon’s web services, one of those core Internet services capable of defending against big network attacks.

The move seemed to work for a couple of days, but then Amazon exercised its control, shutting the site down. Joe Lieberman claimed responsibility for Amazon’s decision to take the site down. But Amazon responded with a message claiming that it decided to take the site down on its own, based purely on its terms of service. The core of their argument is that Wikileaks was hosting content that it did not own and that it was putting human rights workers at risk:

for example, our terms of service state that “you represent and warrant that you own or otherwise control all of the rights to the content… that use of the content you supply does not violate this policy and will not cause injury to any person or entity.” It’s clear that WikiLeaks doesn’t own or otherwise control all the rights to this classified content. Further, it is not credible that the extraordinary volume of 250,000 classified documents that WikiLeaks is publishing could have been carefully redacted in such a way as to ensure that they weren’t putting innocent people in jeopardy. Human rights organizations have in fact written to WikiLeaks asking them to exercise caution and not release the names or identities of human rights defenders who might be persecuted by their governments.

If this is really how they made their decision, this is a worse process than merely succumbing to the political pressure of the US government. At least Lieberman is an elected official and therefore to some degree beholden to his constituents. Amazon is instead arguing dismissively that it made the decision based on its own interpretation of its terms of service. Without getting into the merits of either side, the questions of whether Wikileaks has the rights to the content and especially of what level of risk of harm merits censorship are very, very difficult and should clearly be decided by some sort of deliberative jurisprudence rather than arbitrarily and dismissively decided by a private actor.

This need for careful, structured, and public deliberation on these questions is obviously balanced by Amazon’s right to decide what to do with its own property. But as a society, we have reached a place where the only way to protect some sorts of speech on the Internet is through one of only a couple dozen core Internet organizations. Totally ceding decisions about control of politically sensitive speech to that handful of actors, without any legal process or oversight, is a bad idea. The problem is that an even worse option is to cede these decisions about what content gets to stay up to the owners of the botnets capable of executing large ddos attacks.

Filtering and Circumvention in Iran


Here’s a guest post I wrote yesterday for the MIT Technology Review about filtering and circumvention during the protests in Iran.