The Longest Now

Wikipedia content *sometimes* available
Sunday October 29th 2006, 11:09 pm
Filed under: Not so popular

The last available dump of the entire English Wikipedia is 10 weeks old.  Many dumps have been attempted since then; all have failed.  There is currently no way for an individual to sign up for incremental updates from the site, so the only option for updating one’s local dump/mirror/archive/testing-ground is to retrieve the entire thing from

XML dumps are not available from Wikimedia servers; you have to get a massive tarball, even though many people and most researchers will convert this to XML before doing anything further with it.  Other useful but unavailable dumps include: randomized subsets of en:wp and other languages; a dump of all Commons images; a dump of all images used on en:wp or other language Wikipedias, whether on Commons or not; and a dump of any media uploaded in the past 11 months (latest dump here).  Worth noting: the images/ subdirectory on isn’t officially linked to from anywhere.
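Until randomized subsets are published, one can be drawn locally from a full dump once it has been converted to XML.  What follows is only a minimal sketch of reservoir sampling over a stream of pages, not anything Wikimedia provides.  It assumes each article sits in a `<page>` element with a `<title>` child; the names `sample_titles` and `make_demo` are hypothetical, made up for this illustration.

```python
import io
import random
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop an XML namespace prefix such as '{ns}page', leaving 'page'."""
    return tag.rsplit("}", 1)[-1]


def sample_titles(xml_stream, k, seed=None):
    """Reservoir-sample up to k page titles in one pass over the stream.

    Assumption: pages are <page> elements, each with a <title> child.
    """
    rng = random.Random(seed)
    reservoir = []
    seen = 0
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title = None
        for child in elem:
            if local_name(child.tag) == "title":
                title = child.text
                break
        seen += 1
        if len(reservoir) < k:
            # Fill the reservoir with the first k pages.
            reservoir.append(title)
        else:
            # Keep the new page with probability k / seen.
            j = rng.randrange(seen)
            if j < k:
                reservoir[j] = title
        # Drop the page's children so their text is not kept alive.
        elem.clear()
    return reservoir


def make_demo(n):
    """Build a tiny stand-in dump with n pages (hypothetical test data)."""
    pages = "".join("<page><title>P%d</title></page>" % i for i in range(n))
    return io.BytesIO(("<mediawiki>" + pages + "</mediawiki>").encode("utf-8"))
```

Calling `sample_titles(make_demo(100), 5)` returns five distinct titles drawn uniformly from the hundred.  The same function accepts a file opened in binary mode, so the whole dump never has to be held as a single string; for a dump of real size, the emptied page elements would also need to be detached from the root as they are processed.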

There are several prerequisites for making use of information.  One is a free copyright license, another is a free format, but before either of those comes discovery of, and access to, any version at all.  I hope these oversights are remedied soon.

Wiki is the knowledge base I know.

Comment by Rolf Bachmann 11.04.06 @ 3:33 am
