Over the past week, the technical development team has met with a number of digital library experts, including David Smith and R. Manmatha from U Mass, Sebastian Hammer of Index Data, Nasos Drosopoulos and Stefanos Kollias of MINT, and MacKenzie Smith.
Last week we talked with David Smith and R. Manmatha from U Mass about their work on identifying languages in scanned text. Their software can report what percentage of a work is in each of the supported languages (six at the moment), which would be very valuable information for the DPLA platform to make available. Indeed, when the metadata records for a work list its languages, a high percentage of “unknown” reported by their recognition software can indicate problems with the OCR.
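To make the idea concrete, here is a minimal sketch of how such a report might be consumed. It is not the U Mass software: the per-segment labels, the `unknown` label, the function names, and the 20% threshold are all assumptions made up for illustration.

```python
from collections import Counter

UNKNOWN = "unknown"  # hypothetical label for text the identifier cannot place


def language_breakdown(labels):
    """Return the percentage of segments assigned to each language label."""
    if not labels:
        return {}
    counts = Counter(labels)
    total = len(labels)
    return {lang: 100.0 * n / total for lang, n in counts.items()}


def ocr_looks_suspect(breakdown, declared_languages, max_unknown_pct=20.0):
    """Flag a work whose metadata declares languages but whose text is largely unrecognized.

    The threshold is an arbitrary illustrative value, not a recommendation.
    """
    if not declared_languages:
        return False  # no declared languages, so nothing to compare against
    return breakdown.get(UNKNOWN, 0.0) > max_unknown_pct


# Hypothetical per-segment labels for one scanned work.
labels = ["en", "en", "fr", "unknown", "en", "unknown", "unknown", "en", "en", "en"]
breakdown = language_breakdown(labels)
suspect = ocr_looks_suspect(breakdown, declared_languages=["en", "fr"])
```

In this sketch a work is flagged only when its metadata declares at least one language, mirroring the point above that the "unknown" share is most telling when the record says which languages to expect.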
Read the details in David Weinberger’s post on the DPLA Dev blog: Interesting Conversations