Here are the notes SJ Klein took at the end of our first Hackathon on April 5 when we each described what we had built. The original notes from the session are here.
Corey – The main thing I did was write a really hackish Ruby harvester that gets JSON back and parses it into columns with dates, URLs, data source, title, and record ID, then dumps that into a pipe-delimited file.
I played around with loading this into Viewshare, a LOC project implementing a lot of Simile and Exhibit tools that can import data and visualize it.
I’m looking at attempts to plot the word ‘exploration’ on a timeline based on pub date, and a pie chart showing that 30% of results for ‘exploration’ come from NPR YouTube and Biodev Library.
I did the same thing with monkeys and turtles. There were more monkeys in LOC than in the biodev heritage library…
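The harvesting step Corey describes (JSON in, pipe-delimited columns out) can be sketched roughly as follows. This is a Python sketch, not Corey's Ruby script, and the `docs` key and the field names in `COLUMNS` are assumptions made for illustration; the real response layout may differ.

```python
import csv
import io
import json

# Hypothetical column/field names, chosen to mirror the columns in the notes.
COLUMNS = ["date", "url", "source", "title", "id"]


def records_to_pipe_delimited(raw_json):
    """Parse a JSON search response and return pipe-delimited text."""
    # Assumption: the records live in a top-level "docs" list.
    docs = json.loads(raw_json).get("docs", [])
    out = io.StringIO()
    writer = csv.writer(out, delimiter="|", lineterminator="\n")
    writer.writerow(COLUMNS)
    for doc in docs:
        # Missing fields become empty columns rather than raising an error.
        writer.writerow([doc.get(col, "") for col in COLUMNS])
    return out.getvalue()


# Made-up sample records, only to show the shape of the output.
sample = json.dumps({"docs": [
    {"date": "1902", "url": "http://example.org/1", "source": "LOC",
     "title": "Arctic exploration", "id": "abc123"},
    {"title": "Monkeys | turtles", "id": "def456"},
]})
print(records_to_pipe_delimited(sample))
```

Using the `csv` module rather than a plain `"|".join(...)` means a title that itself contains a pipe is quoted instead of silently shifting the columns.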
David – User-created book covers for works for which DPLA doesn’t have covers (which right now is all books). I put in a search term, get results back from DPLA, and pick one. It goes to Flickr, taking the subject, description, author, and title from DPLA, mashing them together as a string, removing stopwords, separating the remaining words with commas, sending them to Flickr as tags, and getting the set of Flickr images. The user then chooses which one she wants to use as a cover, and the title and author are put on the image.
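The tag-building step David describes might look something like this. It is only a sketch: the stopword list is a tiny illustrative one, the function name is made up, and dropping repeated words is an extra step not mentioned in the notes.

```python
import re

# A tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "by", "with"}


def build_flickr_tags(subject, description, author, title):
    """Mash the fields into one string, drop stopwords, return comma-separated tags."""
    mashed = " ".join([subject, description, author, title]).lower()
    words = re.findall(r"[a-z0-9']+", mashed)
    tags = []
    for word in words:
        # Skip stopwords; also skip repeats (an assumption, see above).
        if word not in STOPWORDS and word not in tags:
            tags.append(word)
    return ",".join(tags)


# Made-up record values, for illustration only.
print(build_flickr_tags("Monkeys", "A study of the monkeys of Borneo",
                        "Jane Doe", "Monkeys in the wild"))
```

The resulting string is what would be handed to the Flickr search as its tags value.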
Andromeda – Timeline! On the backend, Python is querying DPLA, taking the JSON it gets, and munging it into this timeline tool, put out by Knight recently (http://timeline.verite.co/). It autogenerates a timeline that helps you see where things are in history. If it is a multimedia item, it will embed that by default. It doesn’t know how to handle NPR stuff yet, but it is a fun way to see an idea evolve over time.
User data could be dumped into this script, but it’s not yet clear how to wire those two pages together. In theory you could put in your own search terms. https://github.com/thatandromeda/DPLA
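A rough sketch of the munging step, not Andromeda's actual script (that is in the repository linked above). It assumes the timeline tool accepts a JSON object with a `timeline` key holding a `headline`, a `type`, and a `date` list of events; the input field names (`title`, `year`, `description`, `media_url`) are hypothetical simplifications of whatever DPLA returns.

```python
import json


def dpla_to_timeline(headline, docs):
    """Convert simplified DPLA-style records into a timeline JSON string."""
    events = []
    for doc in docs:
        year = doc.get("year")
        if year is None:
            continue  # an undated item cannot be placed on the timeline
        event = {
            "startDate": str(year),
            "headline": doc.get("title", "Untitled"),
            "text": doc.get("description", ""),
        }
        # Attach a media asset only when the record has one.
        if doc.get("media_url"):
            event["asset"] = {"media": doc["media_url"]}
        events.append(event)
    # Oldest first, so the timeline reads left to right.
    events.sort(key=lambda e: int(e["startDate"]))
    payload = {"timeline": {"headline": headline, "type": "default", "date": events}}
    return json.dumps(payload, indent=2)


# Made-up records, for illustration only.
print(dpla_to_timeline("monkey", [
    {"title": "Monkey recording", "year": 1950, "media_url": "http://example.org/b.mp3"},
    {"title": "Monkey engraving", "year": 1890},
]))
```

Wiring in user search terms would then mostly be a matter of passing the query string through as `headline` and feeding the script the matching records.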
Jason – I already showed what I was working on earlier. I just used the NPR data to embed an mp3 onto another page. Taking a blacklight app and using it… I did release the related gem, so that’s [progress].
Ralph – I’ve been trying to enhance the data in the db. So I tried a monkey search like Corey, and found an author, David Lipsky. I copy that author and throw it at this little API for looking up names: this comes back and tells me there are 3 David Lipskys it knows about, and gives me control IDs into various services that know about him. So we pick the one that is most known: the Natl Lib of Australia, LOC, and DNB all know about him. We throw that up into WorldCat and find out more about him.
This turns up things like related works, links to his WP article. I wanted to see if I could cook down the google refine API to query freebase and come out with better IDs all at once than by going through all of these services.
Dan – Working on Covered with Brad, we refined how this works: it lets you stack up a set of criteria in one search. Say “monkey”. This lets you paginate through the result set. This pulls in covers from openlibrary; if I click on a result, this will do a flickr search for terms in the title. If it doesn’t find anything, it shows nothing.
If I want to refine this down, I can find subsets of the matches. This is all done clientside.
James and Nate – Working on a map mashup, showing how one can generate lists from DPLA queries and place them on a map. Right now we have random locations in Boston; we picked 10 spots and 10 lists of books in those locations. These are the nearest local public libraries. It geolocates your position and finds nearby public libraries. One happens to be in the middle of a river… don’t mind that.
We can click on one of them, and you will see different books… these are live links to relevant media. Ideally this would be connected to a tool for making lists, and you could find out what summer reading lists people were making around you.
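Finding the nearby libraries from a geolocated position could be done along these lines. This is a sketch using the standard haversine great-circle distance, which is not necessarily what the mashup itself does; the library names and coordinates are placeholders, not real locations.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_libraries(position, libraries, count=3):
    """Return the `count` libraries closest to `position`, nearest first."""
    lat, lon = position
    return sorted(
        libraries,
        key=lambda lib: haversine_km(lat, lon, lib["lat"], lib["lon"]),
    )[:count]


# Placeholder entries with made-up coordinates in the Boston area.
libraries = [
    {"name": "Library A", "lat": 42.35, "lon": -71.08},
    {"name": "Library B", "lat": 42.40, "lon": -71.12},
    {"name": "Library C", "lat": 42.36, "lon": -71.06},
]
for lib in nearest_libraries((42.36, -71.06), libraries, count=2):
    print(lib["name"])
```

Each returned library could then carry its own list of books to show when it is clicked.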
James – This morning we worked on adding the DPLA API to the set of APIs that Zeega works with. Now you can ingest things from the DPLA API into Zeega and expose them there, for example to add extra metadata or geolocate them.
Reinhard and Ryan – I am amazed that Ryan was able to merge the things we were doing: “the world’s gnarliest merge in the past few minutes”. What we have here are several visualizations. You are seeing a treemap. This shows, with imperfect colors, several dimensions of data: all the subjects at the top, with bigger boxes being the ones with more items in the total search result set.
Colors, from white to green, show how many were present in the 20 items we actually retrieved; just to show how you could have multiple dimensions of data. At the bottom there’s a timeline bar graph: results by date from the 1800s to the current time. No labels on it yet.
And below that there is a tag cloud, another way of representing this data. We built all of this purely from results from the facets of the API response. I imagine we could make a visual way to drill down into results this way. For instance, you could click on one of these vis’s and requery.
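As an illustration of building a tag cloud from facet counts alone, here is a minimal sketch that scales font sizes linearly between a minimum and maximum. The input shape (a plain term-to-count mapping) and the pixel range are assumptions; the actual facet structure in the API response is not shown in these notes.

```python
def tag_cloud_sizes(facet_counts, min_px=12, max_px=36):
    """Map each facet term to a font size proportional to its count."""
    if not facet_counts:
        return {}
    lowest = min(facet_counts.values())
    highest = max(facet_counts.values())
    span = highest - lowest
    sizes = {}
    for term, count in facet_counts.items():
        if span == 0:
            # All terms equally frequent: give them all the smallest size.
            sizes[term] = min_px
        else:
            sizes[term] = round(min_px + (count - lowest) / span * (max_px - min_px))
    return sizes


# Made-up counts, for illustration only.
print(tag_cloud_sizes({"Monkeys": 40, "Turtles": 10, "Exploration": 25}))
```

The same counts could drive the treemap box areas, since both views are just different renderings of one term-to-count mapping.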
We got many results for people, who have birth and death dates. I looked at all creators, averaged their birth/death dates, to get a single value here.
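One way to read the averaging described here: take the midpoint of each creator's birth and death years, then average those midpoints into a single value. The sketch below follows that reading, which is an assumption, and the `birth`/`death` keys are hypothetical.

```python
def creator_midpoint_year(creators):
    """Average the birth/death midpoints of all creators into one year value."""
    midpoints = []
    for creator in creators:
        # Use whichever of the two years is known; skip creators with neither.
        years = [y for y in (creator.get("birth"), creator.get("death")) if y is not None]
        if years:
            midpoints.append(sum(years) / len(years))
    if not midpoints:
        return None
    return sum(midpoints) / len(midpoints)


# Made-up creators, for illustration only.
print(creator_midpoint_year([
    {"birth": 1800, "death": 1860},
    {"birth": 1900, "death": None},
]))
```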
Jay – I created a really simple python wrapper for the DPLA API – based on the Solr api someone else wrote. It is much simpler – it lets you query in different ways, and define facets and sort parameters. It is on github as dplapy. http://github.com/lbjay/dplapy
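To give a feel for what such a wrapper does underneath, here is a sketch that only builds a query URL with facets and a sort field. It is not dplapy's interface (see the repository for that); the endpoint path and the parameter names `q`, `facets`, and `sort_by` are assumptions, as are the example field names.

```python
from urllib.parse import urlencode

# Assumed endpoint path; check the API documentation for the real one.
BASE_URL = "http://api.dp.la/v2/items"


def build_query_url(q, facets=None, sort_by=None):
    """Build a search URL from a query string, optional facets, and a sort field."""
    params = [("q", q)]
    if facets:
        # Assumption: several facets are sent as one comma-separated value.
        params.append(("facets", ",".join(facets)))
    if sort_by:
        params.append(("sort_by", sort_by))
    return BASE_URL + "?" + urlencode(params)


# Hypothetical facet and sort field names.
print(build_query_url("monkey", facets=["subject.name"], sort_by="created"))
```

A real wrapper would go on to fetch that URL, attach whatever credentials the API requires, and parse the JSON response.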
Matt – I showed something this morning which I’ll show again very quickly: pulling a list of trending Twitter topics and seeing what DPLA matches we can get. Not too useful, but here are things trending in the last 5 minutes. You have to click around a bit to find matches, but there are things we match on for, say, Stenson. Monkey (as a failsafe, just to get a working link).
I can click on one of the matches and pull up a page on http://api.dp.la
This morning I quickly took Dan’s Covered app and included an “add to shlv.me” link, which should take you to a shlv.me page where you can fill in various required fields and add that dp.la item to my own shelf here. That’s it!
Paul – OCLC has an XID service, so during ingestion when an ISBN shows up, they can call out, get related items, and create a Work record (in FRBR speak) and then find all matching records in WorldCat.