The Classical String Quartet, 1770-1840

August 10, 2009 Jill Katte Vermillion 3 Comments

Note: This is a guest post by Tom Moore, Head of the Music Library and Music Media Center at Duke. Tom is also the editor of the Music Library blog, Biddle Beat.

The award-winning Historic American Sheet Music Project of the Duke Libraries Digital Collections provides access to images of more than three thousand pieces of early American sheet music. Almost all of this music is popular vocal music intended for voice with piano accompaniment, and virtually none belongs to the genres of classical or concert music, which are also richly represented in the collections of the Duke Libraries. The Classical String Quartet, 1770-1840, begins to explore this area, and makes available the contents of about forty collections from the period when the string quartet was at its peak, when the works of the Viennese masters for the genre were created, many of them unavailable previously in any form since their original publication. Of particular interest are the various arrangements of operas for string quartet, including Joseph and his Brothers by Méhul, and the famous Magic Flute of Mozart. This resource will be highly valuable to scholars of the period, providing primary sources for study, and to string quartets, with a wealth of new repertoire.

AdViews, Announcements

AdViews: Don’t Touch That Dial!

July 21, 2009 Jill Katte Vermillion 5 Comments


The Duke Digital Collections team is excited to announce our newest project: AdViews, a digital archive of vintage television commercials. Our first batch of commercials went live in iTunes U last night (July 20, 2009), and we’ll continue to add thousands of historic commercials to the collection through the rest of 2009. By year’s end, the collection will contain over 10,000 digitized TV commercials from the archives, all available for FREE from Duke’s iTunes U site.

AdViews will provide students, teachers, and researchers access to a wide range of vintage brand advertising from the first four decades of mainstream commercial television. The collection will support interdisciplinary research, not only in marketing and advertising history, but also in visual studies, communication, women’s studies, public health, cultural anthropology, nutrition, technology, and more.

AdViews currently features commercials from the ad agency D’Arcy Masius Benton & Bowles (DMB&B), a New York advertising firm founded in 1929. The DMB&B archives are held at Duke in the Hartman Center for Sales, Advertising & Marketing History, a research center in the Rare Book, Manuscript, and Special Collections Library.

Stay tuned! We’ll be right back with more AdViews updates and behind-the-scenes information…

Announcements, Trident

You Know What We Did This Summer

July 15, 2009 Rich Murray 2 Comments

I’ve been working in academic libraries for fourteen years now, and I still haven’t been able to convince my grandmother that working for a university doesn’t mean you get the summers off. We certainly haven’t been taking the summer off in the Digital Collections Program here at the Duke University Libraries, even though you haven’t seen most of the results of our summer work yet.

We premiered the Duke Digital Collections iPhone app back in June, which has been getting positive and enthusiastic feedback (thanks!), but otherwise most of our work has been behind-the-scenes stuff that will pay off in the future. Among our projects:

The metadata phase of the Broadsides & Ephemera digital collection has begun in earnest, with a team of eight catalogers and archivists using our new metadata editor to describe these rare and valuable resources.

Work continues on Trident, our digital collections system. With a new repository, a new metadata editor, and all sorts of other new developments, we’ll be able to create and manage digital collections better, faster, and more seamlessly than ever before, and deliver content in new and exciting ways.

Our Digital Production Center continues digitizing materials for future collections at a furious rate. As usual, they’re very speedy and the rest of us sometimes feel like we’re trying to play catch-up with them….

We’ve introduced new ways to keep up with the Digital Collections Program, including a Facebook page (come be our friend!) and more frequent Twitter updates, where we’ve been tweeting highlights from the Duke Digital Collections since the spring. We’ve also been posting with our digital collections colleagues from across the state to the North Carolina Digital Collections Collaboratory blog.

Last but certainly not least, we’re about to launch a huge, fantastic, exciting, FUN new digital collection — hopefully next week — that we’re going to have to keep secret a bit longer. We hate to tease you … well, maybe we want to tease you a little bit. It’s completely different from anything we’ve done before in several ways that will become clear when it’s published. We’ve been working like fiends on this one, but we think it’s totally going to be worth it, and hope you will, too, when you see it. Stay tuned.

As always, thanks for reading, and for your support and interest. We hope you’re having as good a summer as we are. Don’t forget the sunscreen and the frosty beverage of your choice….

Interface Features

Library Digital Collections? There’s an App for That.

June 16, 2009 Sean Aery 10 Comments

Amid the excitement surrounding the new iPhone this month, we’ve got our own exciting announcement: an iPhone app for Duke Digital Collections! A mobile interface to search and browse 20 of our collections (over 32,000 images) is now included in the free DukeMobile app. [press release in ‘Duke Today’]

Here’s a 3-minute demo of the app:

Approach

Providing an iPhone interface to the collections helps us to reach an audience–whether at Duke or beyond–that is increasingly mobile.

Trident

Open Repositories 2009

May 26, 2009 Dave Kennedy

I attended the Open Repositories 2009 conference this past week. Overall it was a very informative conference on the open source repository platforms (Fedora, dSpace, ePrints, Zentity), current projects and developments using these platforms, and future directions of repositories. Below are some relevant notes from the conference.

Repository Workflow

There were a few presentations that discussed how institutions were managing their repositories, in particular, repositories built with Fedora. Two of these, eSciDoc and Hydra, had some very useful nuggets.

Hydra is a grant-funded collaboration between Hull University, University of Virginia and Stanford University to build a repository management toolkit to manage their three very different workflows, and be extensible to manage heterogeneous workflows around the Fedora community. There are a few practices or ideas that we might want to adopt from this project, as well as some possible points of convergence with Trident.

The idea to treat workflow processes discretely. Hull is using BPEL (Business Process Execution Language) to define and implement the processes. They are using Active Endpoints (was open source, may not be any longer), which provides a really nice GUI for defining and connecting workflow processes. Not sure if this tool is worth investigating, but I have seen it before, and have heard good things.
Stanford has a good design for representing the state of multiple workflows for an item. Items have workflow datastreams, which include a number of processes, each with an indicator of state. They then represent these workflow processes as a checklist in the management interface.
UVA, like us, is thinking RESTfully. A RESTful approach to workflow steps allows processes to be encapsulated nicely and reused in a variety of ways.
Repository API – This is a possible point of eventual convergence: Hydra will be creating a RESTful API layer on top of Fedora, similar in architecture to the one that we have developed for Trident.
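
The Stanford design of workflow datastreams with per-process state could be modeled roughly as follows. This is only a minimal sketch in Python, not Hydra's or Stanford's actual schema: the class names, the three state values, and the process names are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    # Hypothetical state values; the presentation did not list the real ones.
    WAITING = "waiting"
    COMPLETED = "completed"
    ERROR = "error"


@dataclass
class WorkflowDatastream:
    """One workflow attached to an item: named processes, each with a state."""
    name: str
    processes: dict[str, State] = field(default_factory=dict)

    def checklist(self) -> list[str]:
        """Render each process as a checklist line for a management screen."""
        marks = {State.COMPLETED: "[x]", State.ERROR: "[!]", State.WAITING: "[ ]"}
        return [f"{marks[state]} {process}" for process, state in self.processes.items()]

    def is_done(self) -> bool:
        """A workflow is done when every process has completed."""
        return all(state is State.COMPLETED for state in self.processes.values())


def example_workflow() -> WorkflowDatastream:
    # Hypothetical process names for a digitization workflow.
    return WorkflowDatastream(
        name="digitization",
        processes={
            "scan": State.COMPLETED,
            "quality-control": State.COMPLETED,
            "describe": State.WAITING,
        },
    )
```

An item with several workflows would simply carry several of these datastreams, and the management interface would print one checklist per workflow.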

eSciDoc is an eResearch environment built on top of Fedora.

They have a well-established object life cycle. An item’s stage in its life cycle determines who is allowed to do what to the item. For instance, pending (only the creator can access and modify, collaborators may be invited, item may be deleted), submitted (QC/editorial process, creator cannot modify any longer, metadata may still be enriched),…
They have a very tight versioning design in their Fedora repository. They use an atomistic approach to Fedora, with items and components as separate Fedora objects. With this approach, they can represent multiple versions of an item in their repository. They do this by creating a copy of the item fedora object with each version, and a copy of only the changed component with each version. Their item fedora objects contain all of the pointers to the components. A handle gets assigned to the published version.
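
To make the versioning idea concrete, here is a small sketch of the copy-on-change pattern as I understood it: each new version gets a fresh item object, but only the changed components are copied, and the item holds the pointers. This is not eSciDoc's code. The in-memory `Repository`, the identifier scheme, and all names are assumptions for illustration, and handle assignment on publication is left out.

```python
from dataclasses import dataclass


@dataclass
class Component:
    object_id: str
    content: str


@dataclass
class ItemVersion:
    object_id: str
    item_id: str
    version: int
    component_ids: dict[str, str]  # component name -> component object id


class Repository:
    """Hypothetical in-memory stand-in for a Fedora repository."""

    def __init__(self) -> None:
        self.objects: dict[str, object] = {}

    def _store_version(self, item_id: str, version: int,
                       pointers: dict[str, str], changed: dict[str, str]) -> ItemVersion:
        # Create a new component object only for each changed component.
        for name, content in changed.items():
            comp = Component(f"{item_id}:{name}:v{version}", content)
            self.objects[comp.object_id] = comp
            pointers[name] = comp.object_id
        # Every version gets its own item object holding all the pointers.
        item = ItemVersion(f"{item_id}:v{version}", item_id, version, pointers)
        self.objects[item.object_id] = item
        return item

    def create_item(self, item_id: str, components: dict[str, str]) -> ItemVersion:
        return self._store_version(item_id, 1, {}, components)

    def new_version(self, previous: ItemVersion, changed: dict[str, str]) -> ItemVersion:
        # Unchanged components keep pointing at their existing objects.
        pointers = dict(previous.component_ids)
        return self._store_version(previous.item_id, previous.version + 1, pointers, changed)
```

For example, creating an item with two components and then changing one of them leaves five objects in the repository: two item versions and three components.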

Cloud Storage

Sandy Payette and Michele Kimpton gave an update on the emerging DuraCloud services. They are currently in development, and will be tested with a few beta sites before general release. The DuraCloud services will definitely be worth Duke looking into; however, we will probably need to wait for more Akubra development before these services can be properly integrated into Fedora. For Duke’s repository, cloud storage should be evaluated for storage of preservation masters. Also on the topic of cloud storage, David Tarrant gave an update from ePrints, as well as a reminder, “Clouds do blow away.”

Smart storage underpinning repositories

ePrints has exactly what is needed. Their storage controller allows for rule based storage configuration. This is now in their current release.
Fedora is still developing Akubra. Some of the beginnings of this code are in version 3.2, but it is not implemented. From what I gather, if we have a use case, we need to implement it ourselves.
dSpace will be looking at incorporating Akubra into version 2 of dSpace.
Reagan Moore (UNC) and Bing Zhu (UCSD) gave a very detailed discussion on iRods. iRods has a very detailed architecture for rule-based storage. It defines many micro-services to be performed on objects. These micro-services can be chained together. iRods has a clean rule-based configuration for defining chains of micro-services and the conditions under which these workflow chains should be executed on an object. iRods allows for a good separation between remote storage layer and “metadata repository.” Bing discussed how iRods is integrated with Fedora. From what I understood, Fedora does not directly manage iRods; rather, datastreams are created in Fedora as external references to iRods, and iRods must be managed separately.
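
The general idea of rules that fire chains of micro-services under stated conditions can be sketched in a few lines. To be clear, this is plain Python and not iRods's own rule syntax or API. The `Rule` class, the two example services, and the object attributes are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable

DigitalObject = dict  # hypothetical: a plain dict of object attributes
MicroService = Callable[[DigitalObject], DigitalObject]


@dataclass
class Rule:
    condition: Callable[[DigitalObject], bool]  # when the chain should run
    chain: list[MicroService]                   # services applied in order


def apply_rules(obj: DigitalObject, rules: list[Rule]) -> DigitalObject:
    """Run every matching rule's chain, feeding each service's output to the next."""
    for rule in rules:
        if rule.condition(obj):
            for service in rule.chain:
                obj = service(obj)
    return obj


def add_checksum(obj: DigitalObject) -> DigitalObject:
    # Record a SHA-256 checksum of the object's bytes.
    return {**obj, "checksum": hashlib.sha256(obj["data"]).hexdigest()}


def replicate_offsite(obj: DigitalObject) -> DigitalObject:
    # Stand-in for a real replication step: just count the copies.
    return {**obj, "replicas": obj.get("replicas", 1) + 1}


RULES = [
    # Every object gets a checksum.
    Rule(condition=lambda o: True, chain=[add_checksum]),
    # Only preservation masters are replicated.
    Rule(condition=lambda o: o.get("kind") == "preservation-master",
         chain=[replicate_offsite]),
]
```

Calling `apply_rules({"kind": "derivative", "data": b"..."}, RULES)` would add a checksum but skip replication, since only the first rule's condition matches.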

JPEG2000

djatoka continues to impress me. It takes the math out of jpeg2000. Ryan Chute discussed how this can be integrated into Fedora, and the service definitions involved in doing so. He also showed some of the image viewers that have been built using djatoka. With djatoka, the primary use of jpeg2000 is as a presentation format. The integration with Fedora relies on a separate jpeg2000 “caching” server for serving up jpeg2000 services, which would live outside of Fedora. In this model, it may be that Fedora never even needs to hold a jpeg2000 file. I need a little more understanding on how the caching server gets populated, but will be investigating this in the coming months.

Islandora

UPEI has packaged an integration of Drupal and Fedora. It is a mixed bag as to which Drupal content is stored in Fedora and which content gets stored in Drupal. As new types of content are stored in Drupal, new content models need to be created in Fedora to support them. The presenter indicated that work still needs to be done on updates in Fedora being reflected in Drupal and vice versa. Without more than a presentation to base my opinions on, this seems like an extensible model, but one that also requires continued hand-tuning and management.

Complex object packaging

METS and OAI-ORE, or should it be METS vs OAI-ORE? There has been a lot more discussion and work around OAI-ORE in the last year. It is a much more flexible packaging model for complex objects than METS, and it is the medium on which SWORD and other similar models are based. With flexibility, though, comes programmatic complexity. Our repository model is based on a METS-centric view of digital repositories. We did, though, generalize item structure in such a way that we could conceivably change the underlying structure from METS to something like ORE. More to come on this.

Cool stuff

@mire showed off some authoring tools integrated into Microsoft Office as add-ins. I’m told these won’t be released for at least six months, but they showed some real possibilities and the value that repositories can add for authors. The authoring tools decomposed PowerPoint presentations and Word documents and stored them in the repository, and then allowed for searching of the repository (from within PowerPoint and Word) to include slides, images, text, etc. from the repository in the working document.

Peter Sefton showed off his Fascinator. It features click to create portals that could then be customized fairly easily. He also talked about work he is currently doing on a “desktop sucker upper” which extracts data from a laptop to store into a repository.

Programming notes

eSciDoc is using the same terminology as us, in terms of items and components. This is good, although I have not heard our terminology really used in other contexts. Also dSpace seems to be moving away from this terminology.
Enhanced content modeling – this development allows for more precise description of datastreams and more precise description of relationships. This is not incorporated into Fedora proper, although it should be because it adds a lot of value to the core.
There are others taking a RESTful approach to repositories, at least in representing the R in CRUD.
Others confirmed my belief that web services (RESTful ones) should be programmer friendly as well as computer friendly. In other words, the responses should display in web pages and give a programmer at least a rudimentary but helpful view of the data.
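
One way to get that "programmer friendly" behavior is to return the same resource either as JSON for programs or as a simple HTML table for a person looking at it in a browser. The function below is a rough sketch of that idea only. It is not Trident's API, the `render` name and resource fields are made up, and the Accept handling is deliberately naive (no q-values).

```python
import json
from html import escape


def render(resource: dict, accept: str = "text/html") -> tuple[str, str]:
    """Return (content_type, body) for a resource, chosen from the Accept header."""
    if "application/json" in accept:
        # Machine-friendly view.
        return "application/json", json.dumps(resource, sort_keys=True)
    # Human-friendly view: a bare table a programmer can read in a browser.
    rows = "".join(
        f"<tr><th>{escape(str(key))}</th><td>{escape(str(value))}</td></tr>"
        for key, value in sorted(resource.items())
    )
    return "text/html", f"<table>{rows}</table>"
```

A browser that asks for `text/html` gets the table, while a script that sends `Accept: application/json` gets the same data as JSON from the same URL.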

Fedora

FIZ Karlsruhe has done extensive performance testing and tuning of Fedora. They tested with data sets up to 40 million objects. In terms of scaling, performance was not affected by the size of the repository. They were also able to increase performance by tuning the database, as well as separating the database from the repository. They found that I/O was the limiting factor in all cases.

Fedora 3.2 highlights – beginnings of Akubra, SWORD integration, will be switching to new development environment (maven, OSGi/Spring DM)

dSpace

SWORD support, Shibboleth supported out of the box, new content model in dSpace 2.0 (based on entities and relationships)

metadata

“Wow! This job sure keeps us hopping!”

May 13, 2009 Rich Murray 2 Comments

There are many steps involved in creating and publishing a new digital collection — it’s truly a team effort that requires a lot of hard work and coordination of efforts from people across the libraries, with many different skill sets, working in many different departments, in buildings across Duke’s campus. People who aren’t familiar with the process often think that digitizing the materials is the most time-consuming part, and that once that’s done, the collection is ready to go. The truth, though, is that our colleagues in the Digital Production Center, who do the digitizing, are so fast and wildly productive on their scanners and cameras that the rest of us are constantly trying to catch up with them.

One of the most time-consuming parts of the digital collections process, and the part that people often don’t think about, is creating the metadata. Metadata is data about the materials we’ve digitized, and as part of the metadata process, we have to decide how to arrange the items in the digital collection, how to describe them, what information we need to collect about them, what kind of terminology to use so people can find them, and all sorts of other things. We have to decide how we want users to be able to find and interact with the digital objects, and what metadata is necessary to make that possible.

To make things even trickier, not only is metadata perhaps the most time-consuming part of the process, but up until this point we’ve had only a small number of staff working on it. Part of the problem has been that we haven’t had a good metadata creation/management tool, so the workflows and procedures we’ve concocted to get around that have been so unwieldy that it just didn’t make sense to throw tons of staff at them. But now that our new metadata editor Trident is getting closer and closer to becoming a reality, we can finally think about bringing nearly all our catalogers and archivists into the metadata process, which has been our goal all along. In early May, we brought two trainers in to teach a two-day metadata course for about 20 of our catalogers, archivists, and other staff to prepare them to do this work. We’ll soon be putting a subset of that group to work on the huge Broadsides project we’ve been talking about elsewhere on this blog, and then once we really get going, we’ll bring even more of them into this project and others.

Our goal is that digital collections work will become just one of the many things our catalogers and archivists do as a regular part of their jobs. These folks are already experts at describing, arranging, and providing access to the library’s collections, so now they’ll be applying that expertise to new types of materials. Even if they only work on digital collections as a small part of their jobs, bringing all these new staff members into the process will allow us to create metadata — and therefore create digital collections — much faster than we ever have before. And that means more images, more text, more audio, more video … more ideas and discoveries will be possible for users around the world than ever before. The best is yet to come ….

Assessment

Answering the important questions.

May 5, 2009 Noah Huffman 1 Comment

Recently we implemented Google Analytics to track usage of our digital collections. Sean has already contributed several great posts about our digital collections use statistics, but one thing I find particularly interesting (and amusing) is that Google Analytics allows us to see the types of keywords our users are entering into Google, Yahoo, and other search engines, and where those keywords lead them in our digital collections.

Not surprisingly, some search queries are common and reveal the subject strengths of our digital collections. For example, the top three queries that bring users to our collections are “sheet music,” “ad access,” and “history of advertising.”

After scanning through thousands of these search queries, several distinct categories emerge: the known-item query (an exact title in quotes), the URL as query (e.g. http://library.duke.edu/digitalcollections/adaccess/), and the format query (e.g. “diaries” or “manuscripts”), among others. The most entertaining category, however, is the query issued in the form of a question.
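
For the curious, a first pass at sorting queries into these categories could be automated along the following lines. This is a hypothetical sketch, not how we actually did it (we scanned the reports by eye). The format list contains only the two examples above, and the question-word list is a guess.

```python
import re

# Only the two format examples mentioned above; a real list would be longer.
FORMAT_TERMS = {"diaries", "manuscripts"}
# Hypothetical list of words that tend to start a question.
QUESTION_WORDS = ("what", "who", "where", "when", "why", "how",
                  "can", "will", "is", "are", "do", "does")


def classify_query(query: str) -> str:
    """Assign a search query to one of the categories described above."""
    q = query.strip()
    lowered = q.lower()
    if re.match(r"^(https?://|www\.)", lowered):
        return "url"
    if len(q) >= 2 and q[0] == '"' and q[-1] == '"':
        return "known-item"  # an exact title in quotes
    if lowered.endswith("?") or lowered.split(" ", 1)[0] in QUESTION_WORDS:
        return "question"
    if lowered in FORMAT_TERMS:
        return "format"
    return "other"
```

Anything that falls through to "other" would still need a human look, which is where most of the fun is anyway.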

Below are some of the important questions our users have asked with links to where they’ve found answers to those questions in our digital collections.

“what is electronics?”

“can you rent cars at Hertz for fun?”

“what to wear with a corset”

“what is a funeral chair?”

“what are 1955 war bonds worth today”

“what are chemical properties of Listerine?”

“will I be sick tomorrow?”

Uncategorized

CNI Spring Task Force Meeting – April 6-7, 2009

April 9, 2009 Dave Kennedy 1 Comment

I attended the CNI Spring Task Force Meeting in Minneapolis, April 6-7, 2009. Below are some takeaways that I found noteworthy, especially as they relate to repositories.

Keynote Address – David Rosenthal, Chief Scientist, LOCKSS, Stanford University: David challenged some of the prevailing thought on digital preservation regarding format obsolescence. He stated that incompatibility is not inevitable, rather that “creating incompatibility = reinventing the wheel”. He argued that format obsolescence never happens. He backed this up with evidence from the last few decades. The moral of the story: If we go ahead and just collect the bits, we will be fine. A rather freeing thought, given that the perceived complexities often make digital preservation a non-starter.

JPEG2000 is a viable alternative: Ryan Chute, from Los Alamos National Laboratory, demonstrated Djatoka (pronounced jay-too-kay), which is an open source JPEG2000 image server, built with the Kakadu software library. The Djatoka server now has two client implementations (IIP implementation at the Biodiversity Heritage Library, and Open Layers at UNC). Conceivably, JPEG2000 could be used as both a presentation format and as a preservation format (lossless compression around 2:1 and visually lossless compression around 10:1 from tiffs). The demonstration looked very sharp; we will need to pay attention to how it performs in production environments. I discussed with Ryan the plans for integration with Fedora, and there are a few implementation paths to evaluate.
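
To put those compression ratios in perspective, here is a back-of-the-envelope calculation. The ratios are the approximate figures quoted in the talk; the 1,000 GB collection size is a made-up number for illustration, not a measurement of our holdings.

```python
# Approximate ratios from the presentation (TIFF size : JPEG2000 size).
LOSSLESS_RATIO = 2.0
VISUALLY_LOSSLESS_RATIO = 10.0


def estimated_size_gb(tiff_gb: float, ratio: float) -> float:
    """Estimate JPEG2000 storage for a given amount of TIFF masters."""
    return tiff_gb / ratio


# Hypothetical collection of 1,000 GB of TIFF masters.
tiff_gb = 1000.0
lossless_gb = estimated_size_gb(tiff_gb, LOSSLESS_RATIO)
visually_lossless_gb = estimated_size_gb(tiff_gb, VISUALLY_LOSSLESS_RATIO)
print(f"lossless: {lossless_gb:.0f} GB, visually lossless: {visually_lossless_gb:.0f} GB")
```

Under those assumptions the same collection would need roughly half the space as lossless JPEG2000 and about a tenth as visually lossless JPEG2000.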

Preservation services in the clouds, Duraspace: Sandy Payette and Michele Kimpton discussed the joint venture between Fedora Commons and Dspace Foundation. Duraspace will be a service (eventually a set of services) as well as open source software. The initial use case will allow for a preservation based service in the cloud. They have identified a few sites that they will be piloting these services with. By Q1 2010, they expect to have extensions available for Fedora and Dspace to plug into these cloud services. I asked about a scenario where we might store preservation copies in the cloud and store derivatives locally, and have Fedora and Akubra broker the data to the right store; they said this is a scenario they are planning for.

Cool Book Digitization Workflow at Northwestern: I attended a presentation by Claire Stewart and Steve DiDomenico from Northwestern on their web-based book digitization workflow, codename “crabcake”. They are digitizing books and ingesting into Fedora. Their Fedora implementation is similar to ours with an atomistic content model and use of METS for structural metadata. Very clean set of workflow tools. The most impressive part of their presentation is their GUI for manipulating the METS structure for a book digital object. This interface is built heavily with Ext JS. Their project is grant funded, and they will be releasing it as open source in the summer. From what I can tell, installation of their tools may require some adoption of their local practices or, at the very least, their interpretation of METS. Regarding their digitization/QC process, they have a lot of throughput: they push things into Fedora with very little human intervention and fix problems later, in essence getting things online with very little impediment.

Trident project report: I gave an update on the Trident project. The presentation was well attended, and the project was well received. There was good discussion around the metadata application profile, its possible extension to different metadata schemas, and general use cases for the Editor. There was a general validation that our project continues to head in the right direction.

Presentations, Trident

CNI Spring Task Force Meeting 2009 – Presentation on Trident Project

April 8, 2009 Dave Kennedy 2 Comments

I gave a presentation yesterday to CNI on our Trident Project. The slides are below.

Books

My Own Frank Brown

April 5, 2009 Will Sexton 1 Comment

One tends to remember making major life-changing decisions on April Fool’s Day. So I can tell you that it was April 1, 1995 when I decided to get a master’s degree in Information or Library Science. Even now, I sometimes wonder, is this whole thing just a cosmic joke? Is some unseen trickster entity laughing at my feeble attempts to manufacture order where none can exist? Probably. But I may never know.

The most dangerous 16 months of my life began on that day. I had just missed the deadline for the next academic year, and would have to wait for the application period to roll around again. Meanwhile, I was living in Chapel Hill/Carrboro and working as a cook in various restaurants. Many opportunities for mischief would materialize. At one point, a housemate had just about convinced me to head for Alaska to work the salmon boats. It was that kind of a year. I was engaged in the most extravagant of all human behaviors, marking time.

Two things saved me from a career of wading through fish guts: the guitar and the library. It wasn’t the first time that I relied on the guitar to get me through a shaky patch, and it would not be the last. Not that I was ever very good at it — having a tin ear kind of limits a person’s musical potential — but looking at a year of waiting to fill out an application, I decided to do something I’d always wanted to do. I would learn to play fingerstyle.
