Media Cloud. It’s a tool. It’s a database. It’s a kind of mechanized form of content analysis. It’s chocolatey goodness.

It’s raining news
Developed by some of the clever folks at Berkman, Media Cloud takes the output of many, many, many (1,500 so far) news sources, from the New York Times to blogs of all persuasions, and parses them using an incredibly powerful free tool from Thomson Reuters with the lovely if mysterious name of Calais. After that, you can use it to ask questions about who’s covering which country, which topics are mentioned most in which media (“New York” is in the top ten for the Wall Street Journal and the Washington Post, but NOT for the New York Times? Hmm.) and which terms show up alongside specific terms (in non-Boston general-interest media, the top ten things mentioned with “Boston” almost always include the Celtics).
The stuff you can try out now on the site is just the beginning. The developers want your help imagining the possibilities.
As Ethan Zuckerman explains on his blog and in a video interview at Nieman Lab, one of the reasons Berkman decided to do this was to help people like Ethan (and me) prove to Yochai Benkler and others that the blogosphere’s power mostly does not lie in initiating original reporting.
I am compelled to put in my own tiny claim (success has a thousand mothers, or however it goes) to a place in the Media Cloud origin myth. Early on in my year at Berkman, I went to talk to Berkman guru Jonathan Zittrain about how to shape my Media Re:public project. He had two suggestions. One was an interesting and complicated idea about the political blogosphere and the presidential election, which I rejected instantly, not so much because it would have eaten the entire project and not been what Berkman and MacArthur had in mind, but because (although at the time I didn’t dare admit it) I find political blogs more boring than I can convey in polite language. The second thing Prof. Zittrain said was vaguer, but much more intriguing. He said, “Don’t just produce another boring white paper” (oops), “why don’t you make something for MacArthur that lasts beyond the project – a gift that keeps on giving. Some kind of tool or something.” I relayed this thought to Ethan Zuckerman, who’d already done some experiments in this area, and to Hal Roberts, who was able to conceive of how to do it on a much grander scale. They brought in other folks, including the multi-talented Steve Schultze, and the rest is history. Or at least weather. Go, play!
PS: It’s not just the name Calais that’s opaque. Try understanding the dense prose poetry below (you downstream reader, you!).
About the Thomson Reuters Calais Initiative
The Calais initiative supports the interoperability of content and advances Thomson Reuters mission to deliver intelligent information. It leverages the company’s substantial investment in semantic technologies and Natural Language Processing to offer free metadata generation services, developer tools and an open standard for the generation of semantic content. It also provides publishers with an automatic connection to the Linked Data cloud and introduces a global metadata transport layer that helps them leverage next-generation search engines, news aggregators and more to reach more downstream readers. For more information or to get started with the Calais API, visit OpenCalais.com.
Images:
2007_09_09_bos-iad-lhr069.JPG by doc searls
Uploaded September 15, 2007