In January of 2007 I sent a post to the Web4lib list titled “Metadata tools that scale.” At Duke we were seeking opinions about a software platform to capture metadata for digital collections and finding databases. The responses to that inquiry suggested that what we were looking for didn’t exist.
About a year ago, an OCLC report on a survey of 18 member institutions, “RLG Programs Descriptive Metadata Practices Survey Results,” supported that basic conclusion. When asked about the tools they used to “create, edit and store metadata descriptions” of digital and physical resources, a sizable majority responded with a “customized” or “homegrown” tool.
Since my initial inquiry, we launched a new installation of our digital collections. Yet we still lack a full-featured software platform for capturing descriptive metadata.
We did our own informal survey of peer institutions building digital collections, which further reinforced that familiar conclusion: there are lots of Excel spreadsheets, Access and FileMaker databases, and the like out there, but no enterprise-level solution.
We also articulated a detailed series of specifications for a metadata tool. The library has committed to hiring two programmers, each for a two-year appointment, to build it. The job description is here; there are two openings.
When we thought through the specifications for a metadata tool, we went on and envisioned a full-featured platform to support digital collections, including digitization, asset management and a front end. I’m going to list in some detail the specifications from what we refer to as the Digitization and Description use cases.
Digitization
* Supports identification – an itemized list representing each scan in the collection. This list includes first-pass descriptive metadata, structural metadata (how the components of an item are related to one another), location metadata (box #, folder #, album, chapter, etc.), identifiers, item type and dimensions.
* Checkout system – The ability to create an arbitrary grouping of consecutive components that can be “checked out” by a scanner operator during the scanning/QC process. These units include information about the physical location of the materials and any other information needed to scan them. Must support multiple simultaneous users.
* Reporting interface – In conjunction with the checkout system above, the system needs to report statistics such as the total number of items in a collection, the number of items left to scan and/or QC, and the average amount of time it takes to scan a “unit.”
* Pulls technical metadata from the image header – Must be able to extract technical metadata from image files.
* Student worker login access – Restricted read/write access through login credentials
* Generates image derivatives according to dimension specification – Generation of derivatives via batch processing with the option to change dimensions to fit changing web displays. Must use color profiles and have options to control compression quality.
* Generates checksums – Creates checksums of files that move through the system to verify that the files have not been altered.
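As a rough illustration of the checksum requirement above, fixity checking amounts to a few lines of code. This is only a sketch, not part of Duke’s actual platform; the choice of SHA-256 and the function names are assumptions:

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Compute a checksum by streaming the file in chunks,
    so large master TIFFs never need to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, recorded_checksum, algorithm="sha256"):
    """Return True if the file still matches the checksum
    recorded when it first moved through the system."""
    return file_checksum(path, algorithm) == recorded_checksum
```

The checksum would be recorded when a scan enters the system and re-verified whenever the file is moved or migrated.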
Description
* Supports Duke Core (multiple metadata schema) metadata creation – Duke Core, a modified version of qualified Dublin Core, is the standard metadata schema developed for digital collections at the Duke libraries by the Metadata Advisory Group.
* Authority lists — including sharing authority lists between similar projects, setting default dropdowns for all projects/items (e.g. Type), etc.
* Set mandatory fields and cardinality constraints.
* Assign values en masse to every item in a collection – Collections often have particular metadata that needs to be applied to every item in the collection (e.g., subject terms, creator, etc.).
* Find and edit existing records easily.
* Integrates with digitization workflow.
* See digital object while editing metadata. Users should be able to see the digital object while they are creating or editing the corresponding metadata. Does not have to be the highest-resolution image, but a working version.
* Displays record status – Allows catalogers to specify the state of a record. System should allow catalogers to specify this status and list records in a way that provides at-a-glance overview of work remaining within a collection.
* Handle item-level metadata-only records – Some of our digital collections are metadata only. The tool must allow users to create and edit metadata records that do not have an attached digital object.
* Simple, intuitive, distributed user interface; could be web-based – The interface should be simple and intuitive, and should allow multiple users to work at the same time, though not on the same record simultaneously. A web-based tool would let users work on digital collections from anywhere, without requiring a computer with particular software installed.
* Supports UTF-8 universal character sets – Metadata for Duke’s digital collections often includes special characters (diacritics, non-Roman characters, etc.). The tool must accommodate UTF-8 character sets.
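To make the batch-assignment and mandatory-field requirements above concrete, here is a minimal sketch. The record structure (a dict of repeatable fields) and the mandatory field names are assumptions for illustration, not actual Duke Core rules:

```python
# Assumed mandatory fields -- the real constraints would come
# from the Duke Core schema, not be hard-coded like this.
MANDATORY_FIELDS = {"title", "identifier"}


def assign_to_all(records, field, value):
    """Apply one metadata value to every record in a collection,
    appending rather than overwriting existing values."""
    for record in records:
        record.setdefault(field, [])
        if value not in record[field]:
            record[field].append(value)


def missing_mandatory(record):
    """Return the set of mandatory fields a record still lacks,
    treating empty lists as missing."""
    return {f for f in MANDATORY_FIELDS if not record.get(f)}
```

A record-status display like the one described above could then list, at a glance, every record for which `missing_mandatory` is non-empty.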
We’re not just using homegrown systems (Access, FileMaker, etc.); we’re also in the early stages of building a Web-based tool of the sort you’re describing here. However, we’d love to be able to beg, borrow, or steal one and not have to develop it ourselves.
The crux, though, is workflow support: different institutions have developed or evolved different kinds of workflow, and a data entry tool too closely bound to a particular way of doing things may make it difficult for other institutions to adopt without completely retooling their internal processes to fit the tool. This is probably unavoidable, as batch operations, authorization, collection assignment, etc. are necessary for efficient data entry, but are also tightly bound to particular models of workflow organization.
Your functional requirements above are similar to ours; here are some other features that may be useful:
– support batch import/export of objects
– allow a single object to belong to many (or no) collections
– provide for compound objects, where a first-class object’s children may (or may not) be other first-class objects.
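The compound-object and collection-membership points above could be modeled very simply. This is a sketch only; the class and field names are invented for illustration:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DigitalObject:
    """A first-class object: it may belong to many (or no)
    collections, and its children may themselves be
    first-class objects."""
    identifier: str
    collections: List[str] = field(default_factory=list)
    children: List["DigitalObject"] = field(default_factory=list)

    def descendants(self):
        """Yield every object nested below this one, depth-first."""
        for child in self.children:
            yield child
            yield from child.descendants()
```

Decoupling collection membership from the parent/child hierarchy is what lets a single object appear in several collections, or in none.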
Good luck with the project! Ours started as a “new data entry interface project”, but has evolved into “change the entire infrastructure” – the tail (in hindsight, necessarily) wagged the dog.
We have developed a system at The University of Texas at Austin that sounds quite a lot like what is being described. It is an open source project, meant to have a low barrier to entry for both users and developers. More info at the DASe Project. We plan a beta release soon, but adventurous users can install from an svn checkout. This post (and Peter Gorman’s comment) are very helpful in defining the needs we all seem to have. I plan a blog post shortly running down exactly how DASe addresses (or plans/hopes to address) each of the points articulated above.
Another open source project which might meet most of your requirements is OpenCollection (www.opencollection.org). Some work should be done to incorporate the workflow processes you described, but from a metadata management point of view, OC should be able to handle your needs.
Hi Will, all…
I was just alerted to this post by Derek Rodriguez (thank you, Derek!).
The RLG report is an interesting study. After it was released, however, I consulted with the authors, because I was very surprised they didn’t consider other research of a similar nature, in particular the AMeGA project’s final report.
The AMeGA project included an extensive survey with 214 participants, asking them not only about the metadata applications they are using but also about the functionalities they desire in those applications. The report was produced in conjunction with LC’s bibliographic control action plan, specifically to address section 4.2 and consider recommendations for automatic metadata generation. But the report includes a lot of other rich data specific to the ideas and concerns expressed in this blog on the topic of tools! 🙂
I very strongly encourage anyone interested in this topic to read the report, or at the very least the executive summary. The AMeGA task force included a number of top people in the area of metadata, who provided thoughtful feedback and commentary and helped to distribute the underlying survey.
The report is, in my “biased” opinion, much more extensive than RLG’s report, and it provides recommended functionalities for metadata applications. I am surprised that, even after dialog with the authors of the RLG report, they do not make reference to this work.
The AMeGA work, and the work of the Dublin Core Tools Community, is important to the future of tools development and to increasing the dialog among tool developers and users.
I am providing three links here that I believe are very important, given this discussion:
1. AMeGA final report: http://www.loc.gov/catdir/bibcontrol/lc_amega_final_report.pdf
Don’t let the title scare anyone away from this report; there is extensive research here, and real outcomes! (And I’d be pleased to hear from anyone about it.)
2. Metadata Tools for Digital Resource Repositories–
JCDL 2006 Workshop Report
http://www.dlib.org/dlib/july06/greenberg/07greenberg.html
This is the final report of an exciting workshop held in Chapel Hill during DC2006, bringing together tool developers and users.
3. Dublin Core Tools Community: http://dublincore.org/groups/tools/
Home page for the DC Tools community, a group that is near and dear to my heart, led by Thomas S. and Seth V-H, the last poster on this blog.
Thanks to anyone who has taken the time to read my response! I am passionate about this topic, because there is a lot of exciting work to be done in the area of metadata tool development and in bringing various communities together.
best wishes, jane (janeg@email.unc.edu)