In January of 2007 I sent a post to the Web4lib list titled “Metadata tools that scale.” At Duke we were seeking opinions about a software platform to capture metadata for digital collections and finding databases. The responses to that inquiry suggested that what we were looking for didn’t exist.
About a year ago, an OCLC report on a survey of 18 member institutions, “RLG Programs Descriptive Metadata Practices Survey Results,” supported that basic conclusion. When asked about the tools that they used to “create, edit and store metadata descriptions” of digital and physical resources, a sizable majority responded “customized” or “homegrown” tool.
Since my initial inquiry, we launched a new installation of our digital collections. Yet we still lack a full-featured software platform for capturing descriptive metadata.
We did our own informal survey of peer institutions building digital collections, which further reinforced that familiar conclusion: there are lots of Excel spreadsheets, Access and FileMaker databases, etc., out there, but no enterprise-level solution.
We also articulated a detailed series of specifications for a metadata tool. The library has committed to hiring two programmers, each for a two-year appointment, for this purpose. The job description is here, and there are two openings for it.
When we thought through the specifications for a metadata tool, we went on and envisioned a full-featured platform to support digital collections, including digitization, asset management and a front end. I’m going to list in some detail the specifications from what we refer to as the Digitization and Description use cases.
* Supports identification – an itemized list representing each scan in the collection. This list includes 1st pass descriptive metadata, structural metadata (how the components of an item are related to one another) and location metadata (box #, folder #, album, chapter, etc.) identifiers, item type and dimensions.
* Checkout system – The ability to create an arbitrary grouping of consecutive components that can be “checked out” by a scanner operator during the scanning/QC process. These units include information about the physical location of the materials and any pertinent information needed to scan. Needs to support multiple users at the same time.
* Reporting interface – In conjunction with the checkout system above, the system needs to be able to report statistics such as the total number of items in a collection, the number of items left to scan and/or QC, and the average amount of time it takes to scan a “unit.”
* Pulls technical metadata from the image header – Must be able to extract technical metadata from image files.
* Student worker login access – Restricted read/write access through login credentials.
* Generates image derivatives according to dimension specification – Generation of derivatives via batch processing with the option to change dimensions to fit changing web displays. Must use color profiles and have options to control compression quality.
* Generates checksums – Creates checksums of files that move through the system to ensure that the files have not become altered.
* Supports Duke Core (multiple metadata schema) metadata creation – Duke Core, a modified version of qualified Dublin Core, is the standard metadata schema developed for digital collections at the Duke libraries by the Metadata Advisory Group.
* Authority lists — including sharing authority lists between similar projects, setting default dropdowns for all projects/items (e.g. Type), etc.
* Set mandatory fields and cardinality constraints.
* Assign values in mass to every item in a collection – Collections often have particular metadata that needs to be applied to every item in the collection (e.g., subject terms, creator, etc.).
* Find and edit existing records easily.
* Integrates with digitization workflow.
* See digital object while editing metadata. Users should be able to see the digital object while they are creating or editing the corresponding metadata. Does not have to be the highest-resolution image, but a working version.
* Displays record status – Allows catalogers to specify the state of a record, and lists records in a way that provides an at-a-glance overview of the work remaining within a collection.
* Handle item-level metadata-only records – Some of our digital collections are metadata only. The tool must allow users to create and edit metadata records that do not have an attached digital object.
* User Interface simple and intuitive, distributed system. Could be web-based – The interface should be simple and intuitive, and should allow multiple users to work at the same time, though they should not be able to edit the same record at the same time. A web-based tool would allow users to work on digital collections from anywhere and would not require them to use a computer with particular software installed.
* Supports UTF-8 universal character sets – Metadata for Duke’s digital collections often includes special characters (diacritics, non-Roman characters, etc.). The tool must accommodate UTF-8 character sets.
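To make the checksum requirement in the list above a little more concrete, here is a minimal sketch in Python of how a tool might record a checksum when a file enters the system and verify it later. The function names and the choice of SHA-256 are my own assumptions for illustration; the specification does not name an algorithm or an implementation.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks.

    Reading in chunks keeps memory use flat even for large master scans.
    The default algorithm (SHA-256) is an assumption, not part of the spec.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected, algorithm="sha256"):
    """Return True if the file still matches a previously recorded checksum."""
    return file_checksum(path, algorithm) == expected
```

In practice the recorded value would be stored alongside the item's technical metadata and re-checked whenever the file is moved or copied.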
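The requirement for mandatory fields and cardinality constraints can also be sketched briefly. The schema below is hypothetical: the field names and limits are placeholders chosen for illustration, not the actual Duke Core definitions, and a real tool would presumably load them from per-project configuration.

```python
# Hypothetical schema: field name -> (minimum, maximum) number of values.
# A maximum of None means the field is repeatable without limit.
SCHEMA = {
    "title": (1, 1),        # mandatory, exactly one
    "type": (1, 1),         # mandatory, exactly one
    "creator": (0, None),   # optional, repeatable
    "subject": (0, None),   # optional, repeatable
}


def validate_record(record, schema=SCHEMA):
    """Return a list of problems found in a record; an empty list means valid.

    `record` maps field names to lists of string values.
    """
    problems = []
    for field, (minimum, maximum) in schema.items():
        # Blank strings do not count as values.
        values = [v for v in record.get(field, []) if v.strip()]
        count = len(values)
        if count < minimum:
            problems.append(
                f"{field}: expected at least {minimum} value(s), found {count}"
            )
        if maximum is not None and count > maximum:
            problems.append(
                f"{field}: expected at most {maximum} value(s), found {count}"
            )
    # Flag fields that the schema does not define at all.
    for field in record:
        if field not in schema:
            problems.append(f"{field}: not defined in the schema")
    return problems
```

Returning a list of problems rather than raising on the first one lets an editing interface show a cataloger everything that needs attention in a record at once.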