CNI Spring Task Force Meeting – April 6-7, 2009

I attended the CNI Spring Task Force Meeting in Minneapolis, April 6-7, 2009. Below are some takeaways that I found noteworthy, especially as they relate to repositories.

Keynote Address – David Rosenthal, Chief Scientist, LOCKSS, Stanford University: David challenged some of the prevailing thinking on digital preservation regarding format obsolescence. He stated that incompatibility is not inevitable; rather, “creating incompatibility = reinventing the wheel”. He argued that format obsolescence never happens, and backed this up with evidence from the last few decades. The moral of the story: if we go ahead and just collect the bits, we will be fine. A rather freeing thought, given that the perceived complexities often make digital preservation a non-starter.

JPEG2000 is a viable alternative: Ryan Chute, from Los Alamos National Laboratory, demonstrated Djatoka (pronounced jay-too-kay), an open source JPEG2000 image server built with the Kakadu software library. The Djatoka server now has two client implementations (an IIP implementation at the Biodiversity Heritage Library, and OpenLayers at UNC). Conceivably, JPEG2000 could be used as both a presentation format and a preservation format (lossless compression is around 2:1 and visually lossless compression around 10:1, relative to TIFFs). The demonstration looked very sharp, but we will need to pay attention to how it performs in production environments. I discussed with Ryan the plans for integration with Fedora, and there are a few implementation paths to evaluate.
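
To put those ratios in perspective, here is a rough back-of-the-envelope sketch in Python. The 1,000 GB collection size is a made-up figure and the function names are my own. The 2:1 and 10:1 ratios are the approximate numbers quoted in the talk, and actual results will vary with image content.

```python
# Approximate compression ratios quoted in the talk, relative to TIFF masters.
LOSSLESS_RATIO = 2.0            # lossless JPEG2000, roughly 2:1
VISUALLY_LOSSLESS_RATIO = 10.0  # visually lossless JPEG2000, roughly 10:1


def compressed_size_gb(tiff_size_gb: float, ratio: float) -> float:
    """Estimate the size of a collection after compressing at ratio:1."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return tiff_size_gb / ratio


def estimate(tiff_size_gb: float) -> dict:
    """Return estimated storage (in GB) for each format option."""
    return {
        "tiff": tiff_size_gb,
        "jp2_lossless": compressed_size_gb(tiff_size_gb, LOSSLESS_RATIO),
        "jp2_visually_lossless": compressed_size_gb(
            tiff_size_gb, VISUALLY_LOSSLESS_RATIO
        ),
    }


if __name__ == "__main__":
    # Hypothetical collection: 1,000 GB of TIFF masters.
    for label, size in estimate(1000.0).items():
        print(f"{label}: {size:.0f} GB")
```

Under these assumptions, a collection of that size would shrink to roughly half as a lossless JPEG2000 preservation copy, and to roughly a tenth if visually lossless files were judged acceptable.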

Preservation services in the clouds, DuraSpace: Sandy Payette and Michele Kimpton discussed the joint venture between Fedora Commons and the DSpace Foundation. DuraSpace will be a service (eventually a set of services) as well as open source software. The initial use case will allow for a preservation-based service in the cloud. They have identified a few sites with which they will pilot these services. By Q1 2010, they expect to have extensions available for Fedora and DSpace to plug into these cloud services. I asked about a scenario where we might store preservation copies in the cloud and derivatives locally, with Fedora and Akubra brokering the data to the right store; they said this is a scenario they are planning for.

Cool Book Digitization Workflow at Northwestern: I attended a presentation by Claire Stewart and Steve DiDomenico from Northwestern on their web-based book digitization workflow, codename “crabcake”. They are digitizing books and ingesting them into Fedora. Their Fedora implementation is similar to ours, with an atomistic content model and use of METS for structural metadata. It is a very clean set of workflow tools. The most impressive part of their presentation was their GUI for manipulating the METS structure of a book digital object. This interface is built heavily with Ext JS. Their project is grant funded, and they will be releasing it as open source in the summer. From what I can tell, installing their tools may require some adoption of their local practices, at the very least their interpretation of METS. Regarding their digitization/QC process, they have a lot of throughput: they push things into Fedora with very little human intervention and fix problems later, in essence getting things online with very little impediment.

Trident project report: I gave an update on the Trident project. The presentation was well attended, and the project was well received. There was good discussion around the metadata application profile, its possible extension to different metadata schemas, and general use cases for the Editor. There was a general validation that our project continues to head in the right direction.

One thought on “CNI Spring Task Force Meeting – April 6-7, 2009”

phil.cryer says:

June 1, 2009 at 5:31 am

“Demonstration looked very sharp, will need to pay attention to how it performs in production environments.”

It is in production, and performing very well for us at the Biodiversity Heritage Library. I’ll save web stats for someone else to quote, but my implementation has run for weeks at a time without any intervention. The few issues we’ve come across have been addressed by Ryan (via the djatoka mailing list), and we’re happy to be a test case, not only to showcase the functionality of djatoka, but also to help drive this open source project to be more functional, stable and usable for others.

My initial implementation here:
http://www.fak3r.com/2009/01/27/howto-serve-jpeg2000-images-with-a-scalable-infrastructure/

I will be writing a follow-up detailing the scaling and failover methods I am going to put in place in a week’s time.

Thanks

P
