
The Longest Now

Identifiers and classifications : moving beyond ISBN and FRBR levels
Friday March 06th 2009, 1:48 am

A global identifier for works becomes increasingly valuable to the general public and to other creators, as we gain nuanced control over the mechanisms of publishing, and need to solve more advanced and interesting tasks in creating, versioning, adapting, and enjoying one another’s works.

Many library systems currently rely on ISBNs and, in recent years, on a notion of FRBR-style groupings of related works, without a universal and generalizable system.  So that we don’t get caught up in the details of ISBN and FRBR, which may need to change, I will use “Open Work Number” (OWN) and “Abstraction and Originality Level” in place of ISBN and FRBR level [1], as placeholders for future better-defined specifications.

I’m currently thinking about ways to use the Open Library [OL] as a repository of global archival records for all creative works, and what sorts of identification and connection information that project needs to support common uses.

There is a need for permanent identifiers and a maintained public authority file:
* an identifier for any named opus, work, version, historical item, &c. [work element?].  This includes identifiers for special items, for manifestations, for works, for special collections, and for any component or satellite materials notable enough to have a distinct name, version, or description.
* an identifier for every distinct digital file.  One can imagine multiple scans of an identical work, multiple files for one work element, multiple formats.
* an identifier for categories and equivalence classes of works [and files].  This is related to an identifier for each item in a restricted vocabulary.
* an authority file associating names with identifiers, and tracking alternate names / redirecting multiple equivalent names to a single record.
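
To make the last item concrete, here is a minimal sketch of an authority file in Python.  Everything in it is my own assumption for illustration: the class name AuthorityFile, the "OWN-" prefix and zero-padded serial format, and the normalization rule are hypothetical, not a proposed specification.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AuthorityFile:
    """Hypothetical in-memory authority file mapping names to permanent IDs."""

    # identifier -> canonical display name
    canonical: Dict[str, str] = field(default_factory=dict)
    # normalized name (canonical or alternate) -> identifier
    by_name: Dict[str, str] = field(default_factory=dict)
    next_serial: int = 1

    @staticmethod
    def _normalize(name: str) -> str:
        # Assumed rule: case-insensitive, runs of whitespace collapsed.
        return " ".join(name.lower().split())

    def register(self, name: str) -> str:
        """Return the identifier for a name, minting a new one if needed."""
        key = self._normalize(name)
        if key in self.by_name:
            return self.by_name[key]
        ident = f"OWN-{self.next_serial:08d}"  # hypothetical ID format
        self.next_serial += 1
        self.canonical[ident] = name
        self.by_name[key] = ident
        return ident

    def add_alternate(self, ident: str, alt_name: str) -> None:
        """Redirect an alternate name to an existing record."""
        if ident not in self.canonical:
            raise KeyError(ident)
        self.by_name[self._normalize(alt_name)] = ident

    def lookup(self, name: str) -> Optional[str]:
        """Resolve any known name to its identifier, or None if unknown."""
        return self.by_name.get(self._normalize(name))


authority = AuthorityFile()
play = authority.register("Pygmalion (play)")
authority.add_alternate(play, "Pygmalion (Shaw play)")
```

Registering the same name twice returns the same identifier, and an alternate name resolves to the record it redirects to.  A real authority file would also need persistence and a policy for merging records, which this sketch leaves out.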

If OL does this from scratch, using a new database, the format of the identifiers does not matter too much.  Third parties would simply want to be able to find the IDs they need for their own cataloguing.
— what’s the Work ID associated with this element?
— what is the set of elements that shares this Work ID?  What’s the set of elements that shares this Manifestation ID?
— where can I indicate a relationship between two elements or works?
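
The three questions above could be answered by a few lookup tables.  The following is only a sketch in Python under my own assumptions: the class ElementIndex, its method names, and the IDs "E1", "W1", "M1" and so on are hypothetical placeholders, not part of any existing Open Library interface.

```python
from collections import defaultdict


class ElementIndex:
    """Hypothetical lookup tables a third-party cataloguer might query."""

    def __init__(self):
        self.work_of = {}                          # element ID -> Work ID
        self.by_work = defaultdict(set)            # Work ID -> element IDs
        self.by_manifestation = defaultdict(set)   # Manifestation ID -> element IDs
        self.relations = set()                     # (source, label, target) triples

    def add_element(self, element_id, work_id, manifestation_id=None):
        self.work_of[element_id] = work_id
        self.by_work[work_id].add(element_id)
        if manifestation_id is not None:
            self.by_manifestation[manifestation_id].add(element_id)

    def work_id_for(self, element_id):
        """What's the Work ID associated with this element?"""
        return self.work_of[element_id]

    def elements_sharing_work(self, work_id):
        """What is the set of elements that shares this Work ID?"""
        return sorted(self.by_work.get(work_id, ()))

    def elements_sharing_manifestation(self, manifestation_id):
        """What's the set of elements that shares this Manifestation ID?"""
        return sorted(self.by_manifestation.get(manifestation_id, ()))

    def relate(self, source_id, label, target_id):
        """Record a relationship between two elements or works."""
        self.relations.add((source_id, label, target_id))

    def relations_of(self, ident):
        """List every recorded relationship that mentions this ID."""
        return sorted(r for r in self.relations if ident in (r[0], r[2]))


index = ElementIndex()
index.add_element("E1", work_id="W1", manifestation_id="M1")
index.add_element("E2", work_id="W1", manifestation_id="M1")
index.add_element("E3", work_id="W1", manifestation_id="M2")
index.relate("W2", "adaptation of", "W1")
```

Relationships are stored as plain labelled triples here so that the vocabulary of labels can stay open until a style guide settles it.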

There are some interesting ideas from similar projects managing O(10M) records. [2]

An example of the above, applied to the family tree for Pygmalion.  All of the following have a book, script, or other text associated with them, though I hope none of the above is specific to textual matter.  Every entry below needs an OWN.


Pygmalion (play) [plays, works by GB Shaw, works based on Greek myth]
Pygmalion (play performance) [<various years>]
Pygmalion (film) [film, works by GB Shaw, works by WP Lipscomb, works by Cecil Lewis]

My Fair Lady (musical) [musical play]
My Fair Lady (musical, 1956-19xx) [works by Alan Lerner, works by Frederick Loewe]
—> associated scripts, drafts, scores
My Fair Lady (Broadway musical, 1956-19xx) [performances directed by Moss Hart]
—> associated playbills, modified scripts, audio or video recordings
My Fair Lady (West End musical, 1958-19xx)
My Fair Lady (West End musical, 1979-19xx)

My Fair Lady (film, 1964) [works by Alan Lerner, by GB Shaw, by George Cukor]
My Fair Lady (film, 2010) [works by GB Shaw, by Emma Thompson]

Pygmalion and My Fair Lady (1975 Paperback) [works by Alan Lerner, by GB Shaw, by Richard Goldstone]
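
As a rough illustration of how entries like these might be loaded, here is a small Python sketch that reads a few of the lines above, gives each entry a number, and groups entries by the bracketed categories.  The parsing rule (a bare "by X" is shorthand for "works by X"), the sequential "OWN-0001" format, and the function name parse_entry are my assumptions for this example only.

```python
import re
from collections import defaultdict

# "Name (qualifier)" followed by an optional "[category, category, ...]".
ENTRY = re.compile(r"^(?P<name>.+?\))\s*(?:\[(?P<cats>.*)\])?$")


def parse_entry(line):
    """Split one listing line into (name, list of category labels)."""
    match = ENTRY.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized entry: {line!r}")
    categories = []
    for raw in (match.group("cats") or "").split(","):
        label = raw.strip()
        if not label:
            continue
        if label.startswith("by "):
            # Assumed shorthand: "by GB Shaw" means "works by GB Shaw".
            label = "works " + label
        categories.append(label)
    return match.group("name"), categories


LINES = [
    "Pygmalion (play) [plays, works by GB Shaw, works based on Greek myth]",
    "My Fair Lady (musical) [musical play]",
    "My Fair Lady (West End musical, 1958-19xx)",
    "My Fair Lady (film, 1964) [works by Alan Lerner, by GB Shaw, by George Cukor]",
]

own_for = {}                     # entry name -> hypothetical Open Work Number
by_category = defaultdict(list)  # category label -> entry names
for serial, line in enumerate(LINES, start=1):
    name, categories = parse_entry(line)
    own_for[name] = f"OWN-{serial:04d}"
    for category in categories:
        by_category[category].append(name)
```

Entries without bracketed categories still receive a number; they simply do not appear in any category listing.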


[1] FRBR : the levels it defines are helpful, but only to a point.  In particular, FRBR terms (aside from ‘item’) refer to relationships with other ideas and with steps in the creative process, not absolute status.  Every entry in the OL database, for instance, is a creative Work in its own right (and certainly the copyright office, like translators and printers coveting their careful layout of the latest Dover imprints, would say so!).

ISBN: these numbers are traditionally linked to commerce, and the system was designed to be replaced, with arbitrary barriers imposed on generating new numbers.  Authors don’t generate ISBNs for a work when it is published, only for the final manifestation someone wants to sell /if/ they choose to buy one; though both the original and the final manifestation actually go through multiple versions (even within the same printing), without anyone being the wiser.  There are many other reasons why ISBN cannot be a long-term archival identifier for creative works.

[2] For instance:
* Have a public queue of upcoming bulk additions.  [say, entries for each of the equivalence classes by Work published, and then open for public renaming]
* Start a style guide for how to name, merge, split, &c.  This can reference esteemed ones from major libraries if those aren’t too cumbersome.
* Plan for soft-security input control in the future; for instance, new additions could be invisible to the public but visible to all contributors until approved.
* Some elements are not notable enough to be kept and archived by a traditional library.  Digital storage is not paper; there may be value in a way to distinguish between major and minor IDs.  A four-inch review, or a page of program notes, that traditionally might have been archived clumsily as part of the encapsulating paper or playbill should be able to receive an appropriately granular ID without cluttering up a visible namespace.
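
The last two ideas, soft-security visibility and a major/minor distinction, might look something like the following Python sketch.  The Record fields, the Viewer roles, and the example IDs and names are hypothetical; this is one possible reading of the idea, not a design.

```python
from dataclasses import dataclass
from enum import Enum


class Viewer(Enum):
    PUBLIC = "public"
    CONTRIBUTOR = "contributor"


@dataclass
class Record:
    ident: str
    name: str
    approved: bool = False  # new additions start out unapproved
    minor: bool = False     # e.g. a short review or a page of program notes


def visible_records(records, viewer, include_minor=False):
    """Filter records by viewer role and by the major/minor distinction."""
    shown = []
    for record in records:
        # Soft security: unapproved additions are hidden from the public only.
        if not record.approved and viewer is Viewer.PUBLIC:
            continue
        # Minor IDs stay out of the visible namespace unless asked for.
        if record.minor and not include_minor:
            continue
        shown.append(record)
    return shown


records = [
    Record("ID-1", "Pygmalion (play)", approved=True),
    Record("ID-2", "Program notes (hypothetical)", approved=True, minor=True),
    Record("ID-3", "Newly added entry (hypothetical)"),
]

public_view = [r.ident for r in visible_records(records, Viewer.PUBLIC)]
contributor_view = [r.ident for r in visible_records(records, Viewer.CONTRIBUTOR)]
```

Minor records keep their identifiers and remain resolvable; they are only filtered out of default listings.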
