Grand Metadata Tool Ideas

We’re embarking on a project to adopt or build a metadata tool at Duke University Libraries.  Before we’re immersed in architectures, designs, workflows, schedules, layers, platforms, capacities, etc., I’d like to indulge in some guilt-free big thinking, so I’ll put the question out there:  What are some of the big ideas that could inform the development of a metadata tool?

I invite conversation here and on the web4lib and code4lib lists, to which I’m sending an abridged version of this post.  Other conversations will occur in various venues over the next month or so.  I’ll try to pull together and post on anything I see, hear, read or say.  In the meantime, I’ll share one big idea that I’ve been considering; I’m not saying it’s THE big idea or even implying that we’ll follow through on it at Duke.  It’s just one way to bend our thinking about this project.  I’m interested in other ideas that can bend it in other directions.

The idea that I’m posing follows from a blog post that Lorcan Dempsey wrote in May, mentioning an example of a “shared cataloging environment”.  When I read it, I wondered, what if you take that idea to its logical (illogical?) extreme:  a metadata tool as a software-as-a-service (SaaS) platform.

In this scenario, there’s a negligible barrier to entry.  Let’s say that “anyone” can sign up, as with flickr or youtube, and use it to describe any collection of things, including things that have URLs (flickr images) and things that don’t (vinyl LPs on my shelf, bulbs planted in my flower bed).  As a user, nothing that I’m describing with my metadata actually resides on the server with the platform.  I’m just making records, which point to something online, or refer to something either on my hard drive or my shelf (or stuck to a contact sheet in a binder, or arranged in a box, etc.).

A Service-Oriented Architecture (SOA) allows me to embed online resources (flickr images, youtube videos) in the display of my metadata records.  Whenever I want, I can harvest my metadata in a variety of forms.  The platform also has features for creating and sharing custom metadata schemas and authority lists and publishing, sharing and organizing the contents of collections.   If you further imagine extending that SOA to embed the output of digitization workflows, then the digital library or digital collections applications enter the scenario.  But I’m going to provide a couple of usage scenarios not related to libraries, just for now.
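
To make “harvest my metadata in a variety of forms” a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function names, the two export formats (JSON and CSV), the field names and the example.org URL are mine, not part of any existing tool. The point is only that records with a URL and “metadata only” records can sit in the same collection and come out in more than one shape.

```python
import csv
import io
import json

# Hypothetical sample records: one points to something online, one does not.
records = [
    {"title": "Western bluebird", "url": "http://example.org/photo/1"},
    {"title": "Vinyl LP on my shelf"},  # "metadata only", no URL
]


def to_json(records):
    """Serialize the records as a JSON array."""
    return json.dumps(records, indent=2, sort_keys=True)


def to_csv(records):
    """Serialize the records as CSV, one column per field seen in any record."""
    fieldnames = sorted({key for record in records for key in record})
    buffer = io.StringIO()
    # restval fills in fields that a given record does not have.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


json_export = to_json(records)
csv_export = to_csv(records)
```

A real platform would presumably offer library-friendly formats as well; JSON and CSV are used here only because they need nothing beyond the standard library.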

Scenario 1:  OTRR, or stuff that’s not online (or is, either way)

A colleague of mine at Duke Libraries, Randy Riddle, collects old-time radio transcriptions.  He writes a blog about his collection, builds podcasts, and participates in a community, the Old Time Radio Researchers Group.  The community maintains a wiki site, and embedded on many of the pages are structured metadata records.  See, for example, the pages on “The Adventures of Superman.”

Randy sat down with me and provided a list of the metadata fields that he thinks the community might use to track and share their collections, and it’s fairly extensive:  Series Title, Episode Title, Alternative Title, Date of Original Broadcast, Episode Number, Personnel (writer, actor, producer), Synopsis, Publisher (network, local show, syndicating company), Running Time, Sponsor, and then a number of fields for information specific to the disc (Material, Matrix Number, Pressing Company) and to the generation of a recording or dub if it’s not a disc.

A collectors’ community like OTRR could collaborate on a metadata schema, implement it in our hypothetical SaaS metadata tool, and build their own authority lists.  They could maintain their individual collections, pool them, and potentially build other services around the metadata.  They could link to digitized versions of items in their collections if they exist, or enter “metadata only” records.
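
As a rough sketch of what a community-defined schema with an authority list might look like, here is a small Python example. It uses a handful of the fields Randy listed; which fields are required, the terms in the “Material” authority list, and the class and method names are all assumptions of mine for the sake of illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldDef:
    name: str
    required: bool = False
    # Controlled vocabulary for this field, or None for free text.
    authority: Optional[frozenset] = None


@dataclass
class Schema:
    name: str
    fields: list

    def validate(self, record):
        """Return a list of problems found in a record (empty if none)."""
        problems = []
        known = {f.name for f in self.fields}
        for f in self.fields:
            value = record.get(f.name)
            if value is None:
                if f.required:
                    problems.append(f"missing required field: {f.name}")
                continue
            if f.authority is not None and value not in f.authority:
                problems.append(f"{f.name}: {value!r} is not in the authority list")
        for key in record:
            if key not in known:
                problems.append(f"unknown field: {key}")
        return problems


# Hypothetical subset of the OTRR fields; the Material terms are made up.
otrr = Schema("OTRR", [
    FieldDef("Series Title", required=True),
    FieldDef("Episode Title"),
    FieldDef("Date of Original Broadcast"),
    FieldDef("Material", authority=frozenset({"lacquer", "vinyl", "shellac"})),
])

record = {"Series Title": "The Adventures of Superman", "Material": "tape"}
problems = otrr.validate(record)
```

Here the record is flagged because “tape” is not in the community’s list; the community could either correct the record or agree to add the term.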

Scenario 2:  birdwatchers, or stuff that’s online (or maybe not, either way)

Let’s say I enjoy birdwatching.  I go out on weekends to take photographs and upload them to flickr, where I share them with other birdwatchers.  There exist a number of such groups on flickr.  But flickr only gives my group a couple of metadata fields to work with, and besides, some of my friends shoot video and others record high-quality audio.  We’d like to be able to catalog these resources with a little more precision:  Species, Place, Date and Media Format.  Just those four fields would provide a lot of discovery power for our community.

So we post our pictures and videos to flickr and youtube, but we can also sign in to this metadata tool and create a collection that we’re all going to share.  We collaborate on our metadata profile and populate the authority lists with species names.  We might even practice “Wikithority” — linking terms to their entries on Wikipedia (“Sialia mexicana”:  http://en.wikipedia.org/wiki/Western_bluebird).  Whenever I want to add a new item to my personal collection — which is a member of the “Birdwatchers” community — it defaults to the “birdwatchers” metadata profile.
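
Here is one way the “Wikithority” idea and the default profile could be sketched in Python. The four field names and the Western bluebird link come from the scenario above; the function name, the extra “Species URI” key and the sample values are hypothetical.

```python
# The community's metadata profile: the four fields from the scenario.
BIRDWATCHERS_PROFILE = ("Species", "Place", "Date", "Media Format")

# Authority list mapping each term to its Wikipedia entry.
SPECIES_AUTHORITY = {
    "Sialia mexicana": "http://en.wikipedia.org/wiki/Western_bluebird",
}


def new_record(values, profile=BIRDWATCHERS_PROFILE, authority=SPECIES_AUTHORITY):
    """Build a record that defaults to the community profile.

    Fields the user did not fill in are left blank; a species found in the
    authority list gets a link to its entry.
    """
    record = {name: values.get(name, "") for name in profile}
    species = record.get("Species", "")
    if species in authority:
        record["Species URI"] = authority[species]  # assumed key name
    return record


item = new_record({"Species": "Sialia mexicana", "Place": "my back yard"})
```

The new item picks up all four profile fields without my having to ask for them, and the species term carries its Wikipedia link along with it.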

Let’s imagine further that the platform even performs some discovery functions:  keyword searching, faceted browsing, and maybe “advanced search.”  It has a front end that looks something like the “Tripod” system we developed for digital collections at Duke.  So when I’m under everyonesmetadatatool.org/groups/birdwatchers, I’ll see facets from the birdwatchers profile.
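
Faceted browsing over a community profile can be sketched in a few lines of Python. This is not how Tripod works; it is only an illustration of the idea, and the sample records, dates and function names are invented.

```python
from collections import Counter

# Invented sample records using the birdwatchers profile.
records = [
    {"Species": "Sialia mexicana", "Place": "ridge trail",
     "Date": "2009-06-06", "Media Format": "photo"},
    {"Species": "Sialia mexicana", "Place": "ridge trail",
     "Date": "2009-06-13", "Media Format": "video"},
    {"Species": "Cardinalis cardinalis", "Place": "back yard",
     "Date": "2009-06-13", "Media Format": "photo"},
]


def facet_counts(records, facets):
    """Count how many records carry each value of each facet."""
    return {
        facet: Counter(r[facet] for r in records if r.get(facet))
        for facet in facets
    }


def narrow(records, selections):
    """Keep only the records that match every selected facet value."""
    return [
        r for r in records
        if all(r.get(facet) == value for facet, value in selections.items())
    ]


# Click the "photo" facet, then recompute the remaining facets.
photos = narrow(records, {"Media Format": "photo"})
counts = facet_counts(photos, ["Species", "Place"])
```

Because the facets are just the fields of the community’s profile, a different group would see different facets with no change to the code.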

Meanwhile, libraries can use the tool for digitization projects by forming their own communities and importing records from their digitization workflows.  This scenario could use some fleshing out, possibly in a follow-up post.

What if the libraries developed a tool that provides this service for any online community?  Would it position the library in the midst of the social networking and online community-building culture where, we believe, research and the exchange of ideas are actually occurring?  It seems to me that this kind of environment puts the library in the position of staking its own claim on the basic idea of online collections, something that I think has been co-opted by the big SaaS platforms like flickr.

I’d appreciate any feedback on this idea, including, “You could never make it work, Duke!” (along with reasons why).  I’d also appreciate any other “big thinking” that folks might have to offer on the possibilities for a metadata tool platform.

10 thoughts on “Grand Metadata Tool Ideas”

  1. re: Use Case 2 – have you seen Ravelry? It is a well-developed community metadata/social platform where users can enrich images they’ve uploaded to flickr with all sorts of metadata on pattern sources, yarn types, stitches per inch, etc. and connect with each other. It is WILDLY popular and so interesting for info-science types to check out. I’d be happy to show you the site (log-ins required).

  2. We have 5,000 issues of newspapers dating from 1860 – 1923. Because they are poor scans of old microfilm, we get lousy search results from the built-in search tool in Adobe Acrobat (our images are PDFs). There are way too many images for our staff to tackle – so I’d like to create a database the community at large can use to help index this collection. Example: I was clicking through the collection one day and came upon a great local account of the dedication of Grant’s Tomb. A local reporter went to the event and wrote a fascinating article. If I had had the opportunity to index that article on that page of that newspaper – I would have.

    I have a Google search tool – but it produces poor results. I know I can never index all the great stories in 5,000 papers – but the community at large or interested historical societies could. Maybe one day. I’m very interested in minimum metadata fields that won’t overwhelm the casual, untrained indexer. Check it out at:
    http://www.atlanticlibrary.org/collections/digitized/newspapers/index.asp

  3. You might want to explore the use of linked-data and in particular RDF for sharing vocabularies for describing things in a grassroots way.

  4. I don’t know how “big” the following idea is, but if you are looking for something that would be very useful, I would like to suggest a portable Ajax-powered MARC and/or XML editor that could be plugged into any web-based development environment.

  5. Some random thoughts:

    Are you looking for a tool that helps people create metadata in a consistent way – build schemas etc? Or are you looking for a store in which people can ‘deposit’ their metadata?

    These are overlapping ideas, but I think they are distinct – for an important reason. I don’t think you should build anything that *relies* on a central store. I’m not saying you shouldn’t build a store, just don’t assume that all relevant metadata will end up there – it won’t – you won’t be the only community of birdwatchers etc. so you have to embrace the idea that other communities will be building relevant metadata outside your store.

    The idea of linking discovery tools intimately to your metadata store worries me – we’ve just realised this is a problem for libraries!

    I’m not sure the distinction between ‘available online’ and ‘not available online’ is very meaningful when talking about metadata – if you are only talking about a tool that deals with metadata, and that metadata points to the location (whether online or not), then the function is exactly the same – what differences do you see?

  6. If you are looking for a non-RDBMS platform to work with, I recommend Brainwave Platform, which has a semantic database, “Poseidon”, that is based on the RDF model and might help you.
