The Longest Now

Google’s 130M tomes (metadata only)
Thursday August 05th 2010, 8:21 pm
Filed under: international,metrics,wikipedia

While sometimes confusing “books whose metadata has been scanned by Google” with “books that exist in the world”, a recent post on the G-blog about the size of the Google Books repository is delightful in its details.  Thanks to Leonid Taycher for condensing that into a bit of light reading.

Sadly, no estimates are given on the long-tail number of works that are nowhere close to having their metadata scavenged; or the number of works in the world that have never been moved into a formal archive; or the average number of tomes per conceptual work.  So it’s hard to gauge from this list anything like ‘what % of scanned books are available in freely licensed digital form online’.

But at least the Internet Archive collection is within two orders of magnitude. Now if only finished Wikibooks would make it into that collection…  In related news, there are new docs posted for Open Library developers who want to dig into their archives.  Congrats to Raj and team for the update.

Thanks to Lars for the central correction.

I think the post said there are 129 million books (or 210 million, depending on how you count) in the world. Google has not scanned anywhere near that number. Their plan in 2004 was to scan 14 million books in ten years.

Comment by Lars Aronsson 08.05.10 @ 9:23 pm

Right you are, thank you. The author doesn’t seem to have any estimates on the upper bound of how many books there are in the world, only a lower bound. As I understand the post, he is reading off the total # of tomes that Google’s metadata-organizing algorithms think they have found metadata for, as of the end of last week.

This changes over time, but is not drawing from metadata about all books in the world, or even all libraries in the world. I would be surprised to learn they have yet pursued gathering index information from any libraries that have not digitized their card catalogs, for instance… not to mention the millions of personal libraries, many of which contain a non-zero number of unique or very rare works…

I’d love to see a back-of-napkin estimate of an upper bound on the total # of books! Even a confident estimate that it is 10x this number would be interesting to break down.

Comment by metasj 08.05.10 @ 10:30 pm

Sweden’s national union catalog, covering all (known) Swedish works and foreign works held by Swedish university libraries, has nearly 10 million records. Presumably half are Swedish, so 5 million. Assuming 20-50 times more (100-250 million) for the entire world is probably a low estimate. The 5 million includes all journals, but not all volumes or issues of all journals. It includes PhD theses, but not the B.A. or MSc final papers, which often look like books; some 20-50 thousand (?) of those are produced each year in Sweden. As the post points out, it all depends on what you want to count as a book.

Comment by Lars Aronsson 08.05.10 @ 11:15 pm
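
For anyone who wants to play with the napkin math in the comment above, here is a minimal sketch in Python. The inputs are the rough guesses from that comment (about 10 million catalog records, half of them Swedish, a world multiplier of 20-50x); the function and variable names are made up for illustration, and none of the numbers are measured values.

```python
def world_book_estimate(catalog_records, domestic_share, multipliers):
    """Scale one country's catalog up to a rough world total.

    All inputs are guesses taken from the comment thread, not measurements.
    """
    # Works originating in the country itself (e.g. half of the catalog).
    domestic_works = catalog_records * domestic_share
    # Apply each assumed "world is N times larger" multiplier.
    return [domestic_works * m for m in multipliers]


# Assumed figures: ~10 million records, half Swedish, world is 20-50x Sweden.
low, high = world_book_estimate(10_000_000, 0.5, (20, 50))

# Lower-bound count quoted from the Google post in the thread (~129 million).
google_count = 129_000_000

print(f"Estimated world total: {low:,.0f} to {high:,.0f} books")
print(f"Google's count falls inside that range: {low <= google_count <= high}")
```

Changing the assumed share or multipliers shows how sensitive the total is to what gets counted as a book, which is the point both the post and the comments keep returning to.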
