Category Archives: User Experience

508 Update

Web accessibility is something that I care a lot about. In the 15-some-odd years that I’ve been doing professional web work, it’s been really satisfying to see accessibility increasingly become an area of focus and importance. While we’re not there yet, I am more and more confident that accessibility and universal design will be embraced not just as an afterthought, but considered essential and integrated from the first steps of a project.

Accessibility interests have been making headlines this past year, such as with the lawsuit filed against edX (MIT and Harvard). Whereas the edX lawsuit focused on section 504 of the Rehabilitation Act of 1973, in the web world accessibility is usually synonymous with section 508. The current guidelines were enacted in 1998 and are badly in need of an update. In February of this year, the Access Board published a proposed update to the 508 standards; they will take a year or so to digest and evaluate all of the comments they have received. The new rule is expected to be published in the Federal Register around October of next year, and institutions will have six months to make sure they are compliant, which means everything needs to be ready to go around April of 2017.

I recently attended a webinar on the upcoming changes that was developed by the SSB BART Group. Key areas of interest to me were as follows.

WCAG 2.0 will be the base standard

The Web Content Accessibility Guidelines (WCAG) are in general a simpler yet stricter set of guidelines for making content available to all users than the existing 508 guidelines. The WCAG standard has been adopted around the world, so basing the updated section 508 rule on it gives the standards an international focus.

Focus on functional use instead of product type(s)

The rules will focus less on ‘prescriptive’ fixes and more on general approaches to making content accessible. The current rules are very detailed in terms of what sorts of devices need to do what. The new rule tends to favor user preferences in order to give users control, the goal being to enable the broadest range of users, including those with cognitive disabilities.

Non-web content is now covered

This applies to anything that will be publicly available from an institution, including things like PDFs, office documents, and so on. It also includes social media and email. One thing to note is that only the final document is covered, so working versions need not be accessible. Similarly, archival content is not covered unless it’s made available to the public.

Strengthened interoperability standards

These standards will apply to software and frameworks, as well as mobile and hybrid apps. However, it does not apply specifically to web apps, due to the WCAG safe harbor. But the end result should be that it’s easier for assistive technologies to communicate with other software.

Requirements for authoring tools to create accessible content

This means that editing tools like Microsoft Office and Adobe Acrobat will need to output content that is accessible by default. Currently it can take a great deal of effort after the fact to make a document accessible, and oftentimes content creators either lack the knowledge to do so or can’t invest the time needed. I think this change should end up benefiting a lot of users.


In general, the intent and purpose of these changes is to help the 508 standards catch up to the modern world of technology. The hoped-for outcome is that accessibility will be baked into content from the start, not just included as an afterthought. I think the biggest motivator to consider is that making content accessible doesn’t just benefit disabled users; it makes that content easier to use and find for everyone.

Zoomable Hi-Res Images: Hopping Aboard the OpenSeadragon Bandwagon

Our new W. Duke & Sons digital collection (released a month ago) stands as an important milestone for us: our first collection constructed in the (Hydra-based) Duke Digital Repository, which is built on a suite of community-built open source software. Among that software is a remarkable image viewer tool called OpenSeadragon. Its website describes it as:

“an open-source, web-based viewer for high-resolution zoomable images, implemented in pure Javascript, for desktop and mobile.”

OpenSeadragon viewer in action on W. Duke & Sons collection.
OpenSeadragon zoomed in, W. Duke & Sons collection.

In concert with tiled digital images (we use Pyramid TIFFs), an image server (IIPImage), and a standard image data model (IIIF: International Image Interoperability Framework), OpenSeadragon considerably elevates the experience of viewing our image collections online. Its greatest virtues include:

  • smooth, continuous zooming and panning for high-resolution images
  • open source, built on web standards
  • extensible and well-documented
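
To give a flavor of what this looks like in practice, here is a minimal sketch of wiring OpenSeadragon to a IIIF image endpoint. The URLs and element ID are placeholders, not our production configuration:

    // Minimal sketch: a deep-zoom viewer backed by a IIIF Image API endpoint.
    // OpenSeadragon understands IIIF info.json responses natively; behind that
    // URL, an image server like IIPImage serves tiles from a Pyramid TIFF.
    var viewer = OpenSeadragon({
      id: "viewer",                         // div that hosts the viewer (placeholder ID)
      prefixUrl: "/openseadragon/images/",  // default navigation button images
      tileSources: "https://example.edu/iiif/some-image/info.json",
      showNavigator: true                   // small overview panel while zoomed in
    });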

We can’t wait to share more of our image collections in the new platform.

OpenSeadragon Examples Elsewhere

Arthur C. Clarke’s Third Law states, “Any sufficiently advanced technology is indistinguishable from magic.” And looking at high-res images in OpenSeadragon feels pretty darn magical. Here are some of my favorite implementations from places that inspired us to use it:

  1. The Metropolitan Museum of Art. Zooming in close on this van Gogh self-portrait gives you a means to inspect the intense brushstrokes and texture of the canvas in a way that you couldn’t otherwise experience, even by visiting the museum in person.

    Self-Portrait with a Straw Hat (obverse: The Potato Peeler). Vincent van Gogh, 1887.
  2. Chronicling America: Historic American Newspapers (Library of Congress). For instance, zoom to read in the July 21, 1871 issue of “The Sun” (New York City) about my great-great-grandfather George Aery’s conquest: being crowned the Schuetzen King, sharpshooting champion, at a popular annual festival of marksmen.
    The sun. (New York [N.Y.]), 21 July 1871. Chronicling America: Historic American Newspapers. Lib. of Congress.
  3. Other GLAMs. See these other nice examples from The National Gallery of Art, The Smithsonian National Museum of American History, NYPL Digital Collections, and the Digital Public Library of America (DPLA).

OpenSeadragon’s Microsoft Origins


The software began with a company called Sand Codex, founded in Princeton, NJ in 2003. By 2005, the company had moved to Seattle and changed its name to Seadragon Software. Microsoft acquired the company in 2006 and positioned Seadragon within Microsoft Live Labs.

In March 2007, Seadragon founder Blaise Agüera y Arcas gave a TED Talk where he showcased the power of continuous multi-resolution deep-zooming for applications built on Seadragon. In the months that followed, we held a well-attended staff event at Duke Libraries to watch the talk. There was a lot of ooh-ing and aah-ing. Indeed, it looked like magic. But while it did foretell a real future for our image collections, at the time it felt unattainable and impractical for our needs. It was a Microsoft thing. It required special software to view. It wasn’t going to happen here, not when we were making a commitment to move away from proprietary platforms and plugins.

Sometime in 2008, Microsoft developed a more open Javascript-based version of Seadragon called Seadragon Ajax, and by 2009 had shared it as open-source software via a New BSD license. That removed many barriers to use; however, it still required a Microsoft server-side framework and Microsoft AJAX library. In the years since, the software has been re-engineered to be truly open and framework-agnostic, and has been rebranded as OpenSeadragon. Having a technology that’s this advanced–and so useful–be so open has been an incredible boon to cultural heritage institutions and, by extension, to the patrons we serve.

Setup

OpenSeadragon’s documentation is thorough, which helped us get up and running quickly with adding and customizing features. W. Duke & Sons cards were scanned front & back, and the albums are paginated, so we knew we had to support navigation within multi-image items.

Customizations

Some aspects of the interface weren’t quite as we needed them to be out-of-the-box, so we added and customized a few features.

  • Custom Button Binding. Created our own navigation menu to match our site’s more modern aesthetic.
  • Page Indicator / Jump to Page. Developed a page indicator and direct-input page jump box using the OpenSeadragon API (see the sketch below).
  • Styling. Revised the look & feel with additional CSS & Javascript.
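
Here is a rough sketch of how customizations like these can be wired up through the OpenSeadragon API. The element IDs and tile sources are hypothetical, not our actual code:

    // Sketch: bind our own navigation buttons and add a page indicator
    // for a multi-image item (e.g., the front and back of a card).
    var viewer = OpenSeadragon({
      id: "duke-viewer",                    // hypothetical container ID
      prefixUrl: "/openseadragon/images/",
      sequenceMode: true,                   // enables next/previous paging
      tileSources: ["/iiif/card-front/info.json", "/iiif/card-back/info.json"],
      zoomInButton: "osd-zoom-in",          // IDs of our custom menu buttons
      zoomOutButton: "osd-zoom-out",
      nextButton: "osd-next",
      previousButton: "osd-prev"
    });

    // Page indicator: refresh "Image x of y" whenever the page changes.
    viewer.addHandler("page", function (event) {
      document.getElementById("osd-page-indicator").textContent =
        "Image " + (event.page + 1) + " of " + viewer.tileSources.length;
    });

    // Direct-input page jump box.
    document.getElementById("osd-page-jump").addEventListener("change", function () {
      var page = parseInt(this.value, 10) - 1;   // users type 1-based page numbers
      if (page >= 0 && page < viewer.tileSources.length) {
        viewer.goToPage(page);
      }
    });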

Future Directions: Page-Turning & IIIF

OpenSeadragon does have some limitations, and we think that it alone won’t meet all our needs for image interfaces. When we have highly structured paginated items with associated transcriptions or annotations, we’ll need to implement something a bit more complex. Mirador (example) and Universal Viewer (example) are two open-source page-viewer tools built on top of OpenSeadragon. Both projects depend on “manifests” using the IIIF presentation API to model this additional data.

The Hydra Page Turner Interest Group recently produced a summary report that compares these page-viewer tools and features, and highlights strategies for creating the multi-image IIIF manifests they rely upon. Several Hydra partners are already off and running; at Duke we still have some additional research and development to do in this area.

We’ll be adding many more image collections in the coming months, including migrating all of our existing ones that predated our new platform. Exciting times lie ahead. Stay tuned.

Animated Demo


 

Who, Why, and What:  the three W’s of the Duke Digital Collections Mini-Survey

My colleague Sean wrote two weeks ago about the efforts a group of us in the library are making toward understanding the scholarly impacts of Duke Digital Collections. In this post, I plan to continue the discussion with details about the survey we are conducting, as well as share some initial results.

Surveying can be perilous work!

After reviewing the analytics and Google Scholar data Sean wrote about, our working group realized we needed more information. Our goal in this entire assessment process has been to pull together scholarly use data to inform our digitization decisions, priorities, and technological choices (features on the digital collections platform), and to help us understand whether and how we are meeting the needs of researcher communities. Analytics gave us clues, but we still didn’t know some of the fundamental facts about our patrons. After a fervent discussion with many whiteboard notes, the group decided that creating a survey would get us more of the data we were looking for. The resulting survey focuses on the elemental questions we have about our patrons: who are they, why are they visiting Duke Digital Collections, and what are they going to do with what they find here?

 

The Survey

Creating the survey itself was no small task, but after an almost endless process of writing, rewriting, and consultations with our assessment coordinator, we settled on 6 questions (a truly miniature survey). We considered the first three questions (who, why, what) to be the most important, and we intended the last three to provide us with additional information, such as Duke affiliation, and to allow a space for general feedback. None of the questions was “required,” so respondents could answer or skip whatever they wanted; we also included space for respondents to write in further details, especially when choosing the “other” option.

Our survey in its completed form.

The survey launched on April 30 and remains accessible by hovering over a “feedback” link on every Digital Collections webpage. Event-tracking analytics show that 0.29% of the patrons who hover over our feedback link click through to the survey, and an even smaller number actually submit responses. This has worked out to 56 responses, at an average rate of around one per day. Despite that low click-through rate, we have been really pleased with the number of responses we have had so far. The response rate remains steady, and we have already learned a lot from even this small sample of visitor data. We are not advertising or promoting the survey, because our target respondents are patrons who find us in the course of their research or general Internet browsing.

Hovering over the help us box reveals expectations and instructions for survey participants.

Initial Results

Before I start discussing our results, please note that what I’m sharing here is based on initial responses and my own observations. No one in digital collections has thoroughly reviewed or analyzed this data. Additionally, this information is drawn from responses submitted between April 30 and July 8, 2015. We plan to keep the survey online into the academic year to see if our responses change when classes are in session.

With that disclaimer now behind us, let’s review results by question.

Questions 1 and 4:  Who are you?

Since we are concerned with scholarly use more than other types in this exercise, the first question is intended to sort respondents primarily by academic status. In question 4, respondents are given the chance to further categorize their academic affiliation.

Question 1 Answers                      # of Responses   %
Student                                 14               25%
Educator                                10               18%
Librarian, Archivist or Museum Staff     5                9%
Other                                   26               47%
Total                                   55              100%

Of the respondents who categorized themselves as “other” in question 1, 11 clarified their otherness by writing their identities in the space provided. Of these 11, 4 associated themselves with music-oriented professions or hobbies, and 2 with the fine arts (photographer and filmmaker). The remaining 5 could not be grouped easily into categories.

As a follow-up later in the survey, question 4 asks respondents to categorize their academic affiliation (if they have one). The results showed that 3 respondents are affiliated with Duke, 12 with other colleges or universities, and 9 with a K-12 school. Of the write-in responses, 3 listed names of universities abroad, and 1 listed a school whose level has not been identified.

Question 2:  Why are you here?

We can tell from our analytics how people get to us (if they were referred to us via a link or sought us out directly), but this information does not address why visitors come to the site.  Enter question 2.

Question 2 Answers    # of Responses   %
Academic research     15               28%
Casual browsing       15               28%
Followed a link        9               17%
Personal research     24               44%
Other                  6               11%
Total respondents     54

(Respondents could select more than one answer, so the percentages total more than 100%.)

The survey asks those who select academic research, personal research, or other to write in their research topic or purpose. Academic research topics submitted so far revolve primarily around various historical subjects. Personal research topics reflect a high interest in music (specific songs or types of music), advertising, and other personal projects. It is interesting to note that local history topics have been submitted under all three categories (academic, personal, and other). Additionally, non-academic researchers seem to be more willing to share their specific topics; 19 of 24 personal researchers listed their topics, as compared to 7 of 15 academic researchers.

Question 3:  What will you do with the images and/or resources you find on this site?

To me, this question has the potential to provide some of the most illuminating information from our patrons. Knowing how they use the material helps us determine how to enhance access to the digitized objects and what kinds of technology we should be investing in. This can also shed light on our digitization process itself. For example, maybe the full-text version of an item will provide more benefit to more researchers than an illustrated or hand-written version of the same item (of course we would prefer to offer both, but I think you see where I am going with this).

In designing this question, the group decided it would be valuable to offer options for those who share items for their visual or subject appeal (for example, the Pinterest user), for the publication-minded researcher, and for a range of patron types in between.

 

Question 3 Answers                          # of Responses   %
Use for an academic publication              3                6%
Share on social media                       10               19%
Use them for homework                        8               15%
Use them as a teaching tool in my classes    5                9%
Personal use                                31               58%
Use for my job                               2                4%
Other                                       10               19%
Total respondents                           53

(Respondents could select more than one answer, so the percentages total more than 100%.)

The 10 “other” respondents all entered further details: they planned to share items with friends and family (in some way other than on social media), wanted to use the items they found as a reference, or were working on an academic pursuit that in their mind didn’t fit the listed categories.

Observations

As I said above, these survey results are cursory, as we plan to leave the survey up for several more months. But so far the data reveals that Duke Digital Collections serves a wide audience of academic and non-academic users with a range of purposes. For example, one respondent uses the outdoor advertising collections to get a glimpse of how their community has changed over time. Another is concerned with US history in the 1930s, and another is focused on music from the 1900s.

The next phase of the assessment group’s activities is to meet with researchers and instructors in person and talk with them about their experiences using digital collections (not just Duke’s) for scholarly research or instruction. We have also been collecting examples of instructors who have used digital collections in their classes, and we plan to create a webpage of these examples to encourage other instructors to do the same. The goal of both of these efforts is to increase academic use of digital collections (whether at the K-12 or collegiate level).

 

Just like this survey team, we stand at the ready, waiting for our chance to analyze and react to our data!

Of course, another next step is to keep collecting this survey data and analyze it further. All in all, it has been truly exciting to see the results thus far. As we study the data in more depth this fall, we plan to work with the Duke University Library Digital Collections Advisory Team to implement any new technical or policy-oriented decisions based on our conclusions. Our minds are already spinning with the possibilities.

The Tao of the DAO: Embedding digital objects in finding aids

Over the last few months, we’ve been doing some behind-the-scenes re-engineering of “the way” we publish digital objects in finding aids (aka “collection guides”).  We made these changes in response to two main developments:

  • The transition to ArchivesSpace for managing description of archival collections and the production of finding aids
  • A growing need to handle new types, or classes, of digital objects in our finding aid interface (especially born-digital electronic records)

Background

While the majority of items found in Duke Digital Collections are published and accessible through our primary digital collections interface (codename Tripod), we have a growing number of digital objects that are published (and sometimes embedded) in finding aids.

Finding aids describe the contents of manuscript and archival collections, and in many cases, we’ve digitized all or portions of these collections.  Some collections may contain material that we acquired in digital form.  For a variety of reasons that I won’t describe here, we’ve decided that embedding digital objects directly in finding aids can be a suitable, often low-barrier alternative to publishing them in our primary digital collections platform.  You can read more on that decision here.

Screenshot showing digital objects embedded in the Alexander H. Stephens Papers finding aid

 

EAD, ArchivesSpace, and the <dao>

At Duke, we’ve been creating finding aids in EAD (Encoded Archival Description) since the late 1990s. Prior to implementing ArchivesSpace (June 2015) and its predecessor Archivists’ Toolkit (2012), we created EAD through some combination of an XML editor (NoteTab, Oxygen), Excel spreadsheets, custom scripts, templates, and macros. Not surprisingly, the evolution of EAD authoring tools led to a good deal of inconsistent encoding across our EAD corpus. These inconsistencies were particularly apparent when it came to information encoded in the <dao> element, the EAD element used to describe “digital archival objects” in a collection.

As part of our ArchivesSpace implementation plan, we decided to get better control over the <dao>–both its content and its structure.  We wrote some local best practice guidelines for formatting the data contained in the <dao> element and we wrote some scripts to normalize our existing data before migrating it to ArchivesSpace.

Classifying digital objects with the “use statement.”

In June 2015, we migrated all of our finding aids and other descriptive data to ArchivesSpace. In total, we now have about 3,400 finding aids (resource records) and over 9,000 associated digital objects described in ArchivesSpace. Among these 9,000 digital objects are high-res master images, low-res use copies, audio files, video files, disk image files, and many other kinds of digital content. Further, the digital files are stored in several different locations–some accessible to the public and some restricted to staff.

In order for our finding aid interface to display each type of digital object properly, we developed a classification system of sorts that 1) clearly identifies each class of digital object and 2) describes the desired display behavior for that type of object in our finding aid interface.

In ArchivesSpace, we store that information consistently in the ‘Use Statement’ field of each Digital Object record. We’ve developed a core set of use statement values that we can easily maintain in a controlled value list in the ArchivesSpace application. In turn, when ArchivesSpace generates or exports an EAD file for any collection that contains digital objects, these use statement values are output in the DAO role attribute. A minor bug in the ArchivesSpace application currently prevents the use statement information from appearing in the <dao>; I fixed this by customizing the ArchivesSpace EAD serializer in a local plugin.

Screenshot from ArchivesSpace showing digital object record, file version, and use statement

 

Snippet of EAD generated from ArchivesSpace showing <dao> encoding

Every object its viewer/player

The values in the DAO role attribute tell our display interface how to render a digital object in the finding aid. For example, when the display interface encounters a DAO with role=”video-streaming”, it knows to queue up our embedded streaming video player. We have custom viewers and players for audio, batches of image files, PDFs, and many other content types.
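
A simplified sketch of that dispatch logic is below; the helper functions are hypothetical stand-ins for our actual viewers and players:

    // Sketch: pick a viewer/player based on the DAO role attribute.
    function renderDigitalObject(dao) {
      switch (dao.role) {
        case "video-streaming":
          return embedStreamingVideoPlayer(dao.href);  // hypothetical helper
        case "audio-streaming":
          return embedAudioPlayer(dao.href);           // hypothetical role & helper
        case "image-service":
          return embedImageViewer(dao.href);           // hypothetical role & helper
        default:
          return renderPlainLink(dao.href);            // fall back to a simple link
      }
    }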

Here are links to some finding aids with different classes of embedded digital objects, each with its own associated use statement and viewer/player.

The curious case of electronic records

The last example above illustrates the curious case of electronic records. The term “electronic records” can describe a wide range of materials, but it may include things like email archives, disk images, and other formats that are not immediately accessible on our website and must instead be used by patrons in the reading room on a secure machine. In these cases, we want to store information about these files in ArchivesSpace and provide a convenient way for patrons to request access to them in the finding aid interface.

Within the next few weeks, we plan to implement some improvements to the way we handle the description of and access to electronic records in finding aids.  Eventually, patrons will be able to view detailed information about the electronic records by hovering over a link in the finding aid.  Clicking on the link will automatically generate a request for those records in Aeon, the Rubenstein Library’s request management system.  Staff can then review and process those requests and, if necessary, prepare the electronic records for viewing on the reading room desktop.

Conclusion

While we continue to tweak our finding aid interface and learn our way around ArchivesSpace, we think we’ve developed a fairly sustainable and flexible way to publish digital objects in finding aids that both preserves the archival context of the items and provides an engaging user experience for interacting with the objects. As always, we’d love to hear how other libraries may have tackled this same problem. Please share your comments or experiences with handling digital objects in finding aids!

[Credit to Lynn Holdzkom at UNC-Chapel Hill for coining the phrase “The Tao of the DAO”]

The Value of Metadata in Digital Collections Projects

Before you let your eyes glaze over at the thought of metadata, let me familiarize you with the term and its invaluable role in the creation of the library’s online Digital Collections.  Yes, metadata is a rather jargony word librarians and archivists find themselves using frequently in the digital age, but it’s not as complex as you may think.  In the most simplistic terms, the Society of American Archivists defines metadata as “data about data.”  Okay, what does that mean?  According to the good ol’ trusty Oxford English Dictionary, it is “data that describes and gives information about other data.”  In other words, if you have a digitized photographic image (data), you will also have words to describe the image (metadata).

Better yet, think of it this way.  If that image were of a large family gathering and grandma lovingly wrote the date and names of all the people on the backside, that is basic metadata.  Without that information those people and the image would suddenly have less meaning, especially if you have no clue who those faces are in that family photo.  It is the same with digital projects.  Without descriptive metadata, the items we digitize would hold less meaning and prove less valuable for researchers, or at least be less searchable.  The better and more thorough the metadata, the more it promotes discovery in search engines.  (Check out the metadata from this Cornett family photo from the William Gedney collection.)

The term metadata was first used in the late 1960s in computer programming. With the advent of computing technology and the overabundance of digital data, metadata became a key element for describing and retrieving information in an automated way. The use of the word metadata in literature over the last 45 years shows a steep increase from 1995 to 2005, which makes sense: the term came into wider use as the technology grew more widespread. This is reflected in the graph below from Google’s Ngram Viewer, which scours over 5 million Google Books to track the usage of words and phrases over time.

Google Ngram Viewer for “metadata”

Because of its link with computer technology, metadata is widely used in a variety of fields that range from computer science to the music industry.  Even your music playlist is full of descriptive metadata that relates to each song, like the artist, album, song title, and length of audio recording.  So, libraries and archives are not alone in their reliance on metadata.  Generating metadata is an invaluable step in the process of preserving and documenting the library’s unique collections.  It is especially important here at the Digital Production Center (DPC) where the digitization of these collections happens.  To better understand exactly how important a role metadata plays in our job, let’s walk through the metadata life cycle of one of our digital projects, the Duke Chapel Recordings.

The Chapel Recordings project consists of digitizing over 1,000 cassette and VHS tapes of sermons and over 1,300 written sermons that were given at the Duke Chapel from the 1950s to 2000s.  These recordings and sermons will be added to the existing Duke Chapel Recordings collection online.  Funded by a grant from the Lilly Foundation, this digital collection will be a great asset to Duke’s Divinity School and those interested in hermeneutics worldwide.

Before the scanners and audio capture devices are even warmed up at the DPC, preliminary metadata is collected from the analog archival material.  Depending on the project, this metadata is created either by an outside collaborator or in-house at the DPC.  For example, the Duke Chronicle metadata is created in-house by pulling data from each issue, like the date, volume, and issue number.  I am currently working on compiling the pre-digitization metadata for the 1950s Chronicle, and the spreadsheet looks like this:

1950s Duke Chronicle preliminary metadata

As for the Chapel Recordings project, the DPC received an inventory from the University Archives in the form of an Excel spreadsheet.  This inventory contained the preliminary metadata already generated for the collection, which is also used in Rubenstein Library‘s online collection guide.

Chapel Recordings inventory metadata

The University Archives also supplied the DPC with an inventory of the sermon transcripts containing basic metadata compiled by a student.

Duke Chapel Records sermon metadata

Here at the DPC, we convert this preliminary metadata into a digitization guide, which is a fancy term for yet another Excel spreadsheet. Each digital project receives its own digitization guide (we like to call them digguides), which keeps all the valuable information for each item in one place. It acts as a central location for data entry, but also as a reference guide for the digitization process. Depending on the format of the material being digitized (image, audio, video, etc.), the digitization guide will need different categories. We add these new categories as columns in the original inventory spreadsheet, and it becomes a working document where we plug in our own metadata generated in the digitization process. For the Chapel Recordings audio and video, the metadata created looks like this:

Chapel Recordings digitization metadata

Once we have digitized the items, we run the recordings through several rounds of quality control. This generates even more metadata, which is, again, added to the digitization guide. As the Chapel Recordings have not gone through quality control yet, here is a look at the quality control data for the 1980s Duke Chronicle:

1980s Duke Chronicle quality control metadata

Once the digitization and quality control are completed, the DPC sends the digitization guide, filled with metadata, to the metadata archivist, Noah Huffman. Noah then makes further additions, edits, and deletions to match the spreadsheet metadata fields to fields accepted by the management software, CONTENTdm. During the process of ingesting all the content into the software, CONTENTdm links the digitized items to their corresponding metadata from the Excel spreadsheet, in preparation for placing the material online. For even more metadata adventures, see Noah’s most recent Bitstreams post.

In the final stage of the process, the compiled metadata and digitized items are published online at our Digital Collections website.  You, the researcher, history fanatic, or Sunday browser, see the results of all this work on the page of each digital item online.  This metadata is what makes your search results productive, and if we’ve done our job right, the digitized items will be easily discovered.  The Chapel Recordings metadata looks like this once published online:

Chapel Recordings metadata as viewed online

Further down the road, the Duke Divinity School wishes to enhance the current metadata to provide keyword searches within the Chapel Recordings audio and video.  This will allow researchers to jump to specific sections of the recordings and find the exact content they are looking for.  The additional metadata will greatly improve the user experience by making it easier to search within the content of the recordings, and will add value to the digital collection.

On this journey through the metadata life cycle, I hope you have been convinced that metadata is a key element in the digitization process. From preliminary inventories, to digitization and quality control, to uploading the material online, metadata has a big job to do. At each step, it forms the link between a digitized item and how we know what that item is. The life cycle of metadata in our digital projects at the DPC is sometimes long and tiring. But each stage of the process creates and utilizes the metadata in varied and important ways. Ultimately, all this arduous work pays off when a researcher in our digital collections hits gold.

A Look Under the Hood—and the Flaps—of the Anatomical Fugitive Sheets Collection

We have digitized some fairly complex objects over the years that have challenged our Digital Collections team to push the boundaries of typical digital library solutions for digitization and publication. It happens often: objects we want to digitize are sort of like something we’ve done for a previous project, but not quite, so we can’t simply mimic whatever we did before to get the new project done. We’re frequently flexing our creative muscles.  In many cases, our most successful projects ended up that way because we didn’t concede to the temptation of representing items digitally in an oversimplified manner, or, worse still, as something they are not.

Working with so many rare and unique items from the Rubenstein Library through the years, we’ve become unfazed by these representation challenges and time and again have simply pulled together our team’s brainpower (and willpower) to make something work. Dare I say it, we’ve been unflappable. But this year, we met our match and surely needed some help.

In March, we published ten anatomical fugitive sheets from the 1500s to 1600s. They’re printed illustrations from the Rubenstein Library’s History of Medicine Collections, depicting the human body using layers of paper flaps that can be lifted to reveal internal organs. They’re amazing. They’re distinctive. And they’re really complicated.

Fugitive Sheet example, accessible online at http://library.duke.edu/digitalcollections/rubenstein_fgsms01003/ (Photo Credit: Les Todd)

The complexity of this project necessitated enlisting help from beyond the library’s walls. Early on, Prof. Mark Olson in Duke’s Art, Art History & Visual Studies department was instrumental in helping us identify modern technical approaches for capturing and modeling such objects. We contracted out development work through local web firm Cuberis, who programmed the bulk of the UI. In-house, we handled digitization, metadata, and integration with our discovery & access application with a lot of collaborative creativity between the digital collections team, the collection curator, conservators, and rare materials cataloger.

In a moment, I’ll discuss what modern technologies make the Fugitive Sheets interface hum. But first, here’s a look at what others have done with flap-based items.

Flaps in the Wind, Er… Wild

There are a few examples of anatomical flap objects represented on the Web, both at Duke and beyond. Common approaches include:

  1. A Sequence of Images. Capture one image of the full item for every state of the flaps possible, then let a user navigate them as if viewing a paginated document or photo sequence.
  2. Video. Either film someone lifting the flaps, or make an auto-playing video of the image sequence above.
  3. Flash. Develop a Flash application and put a SWF file on the web.

The third approach is actually what powers Duke’s Four Seasons project, which remains one of the best interactive historical anatomy interfaces available today. Developed way back in 2000 by Educational Media Services, Four Seasons began as a Java program distributed on CD-ROM (gasp!) and in subsequent years found a home as a Flash application embedded on the library website.

Flash-based flap interface for The Four Seasons, available at http://library.duke.edu/rubenstein/history-of-medicine/four-seasons

Flash has fallen out of favor over the last decade for many reasons, most notably: 1) it won’t work on iOS devices, 2) it’s bad for accessibility, 3) it’s invisible to search engines, and most importantly, 4) most of what Flash used to do exclusively can now be done just as well using HTML5.

Anatomy of a Modern Flap Interface

The Web has made giant leaps forward in the past five years due to advances in HTML, CSS, and Javascript and the evolution of web browsers. Key specs for HTML5 and CSS3 have been supported by all major browsers for several years now.  Below are the vital bits (so to speak) in use by the Anatomical Fugitive Sheets. Many of these things would not have worked (or worked well) on the Web five years ago.

HTML5 Parts

1. SVG (scalable vector graphics). An <svg> element in HTML contains shape data for each flap using a coordinate system. The <path> holds a string with line instructions using shorthand (M, L, c, etc.) for tracing the contour: moveto, lineto, curveto, arcto. We duplicate the <path> with a transform attribute to render the shape of the back of the flap.

SVG coordinates in a <path> element representing the back of a flap.

2. Cross-window messaging API. Each fugitive sheet is rendered within an <iframe> on a page, and the clickable layer navigation lives in its parent page, so they’re essentially two separate web pages presented as if one. Having a click in one page do something in another is possible through the Javascript method postMessage, part of the HTML5 spec (a sketch of the receiving side follows the list below).

  • From parent page to iframe: frame.contentWindow.postMessage(message, '*');
  • From iframe to parent page: window.top.postMessage(message, '*');
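
The receiving document needs a corresponding message listener; a minimal sketch, with a hypothetical message format:

    // Sketch: handle messages relayed between the parent page and the iframe.
    window.addEventListener("message", function (event) {
      // In production, check event.origin instead of trusting any sender.
      var message = event.data;
      if (message.action === "openFlap") {   // hypothetical message shape
        openFlap(message.flapId);            // hypothetical flap-toggling function
      }
    });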

CSS3 Parts

  1. transition Property. Here’s where the flap animation action happens. The flap elements all have the style declaration transition: 1s ease-in-out. That ensures that when a flap property like height changes, it animates over the course of one second, slower at the start and end and quicker in the middle. Clicking to open a flap calls a Javascript function that simultaneously switches the height of the flap front to zero and the back to its full size (see the sketch after this list).
  2. transform Property. This scales down the figure and all its interactive components for display in the iframe, e.g., body.framed .flip-up-wrapper { transform: scale(.5); }. This scaling doesn’t apply in the full-size and zoomed-in views and thus enables the flaps to work identically at full or half resolution.
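
As a rough illustration of the flap-opening function described in item 1 (the class names and data attribute are invented for the example):

    // Sketch: with "transition: 1s ease-in-out" declared on both elements,
    // swapping the heights makes the flap animate open or closed.
    function toggleFlap(flap, open) {
      var front = flap.querySelector(".flap-front");   // hypothetical class names
      var back  = flap.querySelector(".flap-back");
      var full  = flap.dataset.fullHeight + "px";      // hypothetical data attribute
      front.style.height = open ? "0" : full;
      back.style.height  = open ? full : "0";
    }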

Capture & Encoding

Capture

Because the fugitive sheets are large and extremely fragile, our Digital Production Center staff and conservators worked carefully together to untangle and prop open each flap to be photographed separately. It often required two or more people to steady and flatten the flaps while being careful not to cast shadows on the layer being shot. I wasn’t there, but in my mind I imagine a game of library Twister.

Staff captured images with an overhead reproduction camera, placing white paper below each flap to make it easier to later determine and crop the contours. Unlike most images we digitize, the flaps’ derivative images are stored and delivered in PNG format to preserve transparency.

Encoding

As we do for all digital collections, we encode in an XML document the structural, administrative, and descriptive data about the digital objects using accepted library standards so that 1) the data can be preserved and ported between applications, and 2) we can use it to power our discovery & access interface. We use METS, a flexible Library of Congress standard for describing all kinds of digital objects.

METS worked pretty well for representing the flap data (see example), and we tapped into a few parts of the standard that we’ve never or rarely used for other items. Specifically, we:

  • added the LC MIX namespace for technical image metadata
  • used an amdSec to store flap heights & widths
  • used file/@GROUPID to divide flap images between figure 1, figure 2, etc.
  • used fptr/area/@COORDS to hold the SVG path coordinates for each flap

The descriptive metadata for the fugitive sheets posed its own challenges beyond our usual projects. All the information about the sheets existed as MARC catalog records, and crosswalking from MARC to anything else is more of an art than a science.

Looking Ahead

We’ll try to build on the accomplishments from the Fugitive Sheets Collection as we tackle new complex digitization projects. The History of Medicine Collections in particular are brimming with items that will be far more challenging than these sheets to model, like paginated flap books with fold-out pages and flaps that open in different directions. Undaunted, we’ll keep flapping our wings to stay aloft.

You’re going to lose: The inherent complexity, and near impossibility, of developing for digital collections

 

“Nobody likes you. Everybody hates you. You’re going to lose. Smile, you f*#~.”

Joe Hallenbeck, The Last Boy Scout

While I’m glad not to be living in a Tony Scott movie, on occasion I feel like Bruce Willis’ character near the beginning of “The Last Boy Scout.” Just look at some of the things they say about us.

Current online interfaces to primary source materials do not fully meet the needs of even experienced researchers. (DeRidder and Matheny)

The criticism, it cuts deep. But at least they were trying to be gentle, unlike this author:

[I]n use, more often than not, digital library users and digital libraries are in an adversarial position. (Saracevic, p. 9)

That’s gonna leave a mark. Still, it’s the little shots they take, the sidelong jabs, that hurt the most:

The anxiety over “missing something” was quite common across interviews, and historians often attributed this to the lack of comprehensive search tools for primary sources. (Rutner and Schonfeld, p. 16)

Item types in Tripod2.

I’m fond of saying that the YouTube developers have it easy. They support one content type – and until recently, it was Flash, for pete’s sake – minimal metadata, and then what? Comments? Links to some other videos? Wow, that’s complicated.

By contrast, we’ve developed for no fewer than fifteen different item types during the life of Tripod2, the platform that we’ve used to provide discovery and access for Duke Digital Collections since March 2011. You want a challenge? Try building an interface for flippable anatomical fugitive sheets. It’s one thing to create a feature allowing users to embed videos from a flat website structure; it’s quite another to allow it from a site loaded with heterogeneous content types, then extend it to include items nested within multiple levels of description in finding aids (for an example, see the “Southwest Georgia Voters Project” item here).

I think the problem set of developing tools for digitized primary sources is one of the most interesting areas in the field of librarianship, and for the digital collections team, it’s one of our favorite areas of work. However, the quotes that open this post (the ones not delivered by Bruce Willis, anyway) are part of a literature that finds significant disparity between the needs of the researchers who form our primary audience and the tools that we – collectively speaking, in the field of digital libraries – have built.

Our team has just begun work on our next-generation platform for digital collections, which we call Tripod3. It will be built on the Fedora/Hydra framework that our Digital Repository Services team is using to develop the Duke Digital Repository. As the project manager, I’m trying to catch up on the recent literature of assessment for digital collections, and consider how we can improve on what we’ve done in the past. It’s one of the main ways  we can engage with researchers, as I wrote about in a previous post.

One of the issues we need to address is the problem of archival context. It’s something that the users of digitized primary sources cite again and again in the studies I’ve read. It manifests itself in a few ways, and could be the subject of a lengthier piece, but I think Chassanoff gives a good sense of it in her study (pp. 470-1):

Overall, findings suggest that historians seem to feel most comfortable using digitized sources when an online environment replicates essential attributes found in archives. Materials should be obtained from a reputable repository, and the online finding aid should provide detailed description. Historians want to be able to access the entire collection online and obtain any needed information about an item’s provenance. Indeed, the possibility that certain materials are omitted from an online collection appears to be more of a concern than it is in person at an archives.

The idea of archival context poses what I think is the central design problem of digital collections. It’s a particular challenge because, while it’s clear that researchers want and require the ability to see an object in its archival context, they also don’t want it. By which I mean, they also want to be able to find everything in the same flat context that everything assumes with a retrieval service like Google.

Archival context implies hierarchy, using the arrangement of the physical materials to order the digital. We were supposed to have broken away from the tyranny of physical arrangement years ago. David Weinberger’s Everything is Miscellaneous trumpeted this change in 2007, and while we had already internalized what he called the “third order of order” by then, it is the unambiguous way of the world now.

With our Tripod2 platform, we built a shallow “digital collections miscellany” interface at http://library.duke.edu/digitalcollections/, and later started embedding items directly in finding aids. Examples of the latter include the Jazz Loft Project Records and the Alexander Stephens Papers. What we never did was integrate these two modes of publication for digitized primary sources. Items from finding aids do not appear in search results for the main digital collections site, and items on the main site do not generally link back to the finding aid for their parent collection, nor to the series in which they’re arranged.

While I might give us a passing grade for the subject of “Providing archival context,” it wouldn’t be high enough to get us into, say, Duke. I expect this problem to be at the center of our work on the next-generation platform.


Sources

 

Alexandra Chassanoff, “Historians and the Use of Primary Materials in the Digital Age,” The American Archivist 76, no. 2 (2013): 458-480.

Jody L. DeRidder and Kathryn G. Matheny, “What Do Researchers Need? Feedback on Use of Online Primary Source Materials,” D-Lib Magazine 20, no. 7/8 (2014), available at http://www.dlib.org/dlib/july14/deridder/07deridder.html

Jennifer Rutner and Roger C. Schonfeld, “Supporting the Changing Research Practices of Historians: Final Report from ITHAKA S+R” (2012), available at http://www.sr.ithaka.org/sites/default/files/reports/supporting-the-changing-research-practices-of-historians.pdf

Tefko Saracevic, “How Were Digital Libraries Evaluated?”, paper first presented at the DELOS WP7 Workshop on the Evaluation of Digital Libraries (2004), available at http://www.scils.rutgers.edu/~tefko/DL_evaluation_LIDA.pdf

Building a Kiosk for the Edge

Many months ago I learned that a new space, The Ruppert Commons for Research, Technology, and Collaboration, was going to be opening at the start of the calendar year. I was tasked with building an informational kiosk that would be situated in the entry area of the space. The schedule was a bit hectic and we ended up pruning some of the desired features, but in the end I think our first iteration has been working well. So, I wanted to share the steps I took to build it.

Setting Requirements

I first met with the Edge team at the end of August 2014. They had an initial ‘wish list’ of features that they wanted to be included in the kiosk. We went through the list and talked about the feasibility of those items, and tried to rank their importance. Our final features list looked something like this:

Primary Features:

  • Events list (both public and private events in the space)
  • Room reservation system
  • Interactive floor plan map
  • Staff lookup
  • Current Time
  • Contact information (chat, email, phone)

Secondary Features:

  • Display of computer availability
  • Ability to report printing / scanning problems
  • Book locations
  • Schedulable content on ‘home’ screen

Our deadline was the soft opening date of the space at the start of the new year, but with the approaching holidays (and other projects competing for time) this was going to be a pretty fast turnaround. My goal was to have a functional prototype ready for feedback by mid-October. I really didn’t start working on the UI side of things until early that month, so I ended up needing to kick that can down the road a few weeks, but that happens sometimes.

The Hardware

The Library had purchased two Dell 27″ XPS all-in-one touchscreen machines for the purpose of serving as an informational kiosk near the new/temporary main entrance of Perkins/Bostock. For various reasons, that project kept getting postponed, but with the desire to also have a kiosk in the Edge, we decided we could use one of the Dell machines for this purpose. The touchscreen display is great — very bright, reasonably accurate color reproduction, and responsive to touch inputs. It does pick up a lot of fingerprints, but that’s sort of unavoidable with a glossy display. The machine seems to run a little bit hot and the fan is far from silent, but in the space you don’t notice it at all. My favorite aspect of this computer is the stand. It’s really fantastic — it’s super easy to adjust, but also very sturdy. You can position it in a variety of ways, depending on the space you’re using it in, and be confident that it won’t slip out of adjustment even under constant use. I think in general we’re a little wary of using consumer-grade hardware in a 24/7 public environment, but for the 1.5 months it’s been deployed it seems to be holding up well enough.

The OS

The Dell XPS came from the factory with Windows 8. I was really curious about using Assigned Access Mode in Windows 8.1, but the need to use a local (non-domain) account necessitated a clean install of 8.1. That sounds annoying, but the process is so fast and effortless, at least compared to days of Windows yore, that it wasn’t a huge deal. I eventually configured the system as desired — it auto-boots into the local account on startup and then fires up the assigned Windows app (and limits the machine to that app).

I spent some time playing around with different approaches for a browser to use with assigned access. The goal was to have a browser that ran in a ‘kiosk’ mode, in that there was no ability for the user to interact with anything outside of the intended kiosk UI — meaning no browser chrome, bookmarks, etc. I also planned to use Microsoft’s Family Safety controls to limit access to URLs outside of the range of pages that would comprise the kiosk UI. I tried both Google Chrome and Microsoft IE 11 (which really is a good browser, despite pervasive IE hate), but I ended up having trouble with both of them in different ways. Eventually, I stumbled onto a free Windows Store app called KIOSK SP Browser. It does exactly what I want — it’s a simple, stripped-down, full-screen browser app. It also has some specific kiosk features (like timeout detection), but I’m only using it to load the kiosk homepage on startup.

The Backend

As several of the requirements necessitated data sources that live in the Drupal system that drives our main library site, I figured the path of least resistance would be to also build the kiosk interface in Drupal. Using the Delta module, I set up a version of our theme that stripped out most of the elements we wouldn’t be using for the kiosk (header, footer, etc.). I could then apply the delta to a small range of pages using the Context module. The pages themselves are quite simple by and large.

  • Events — I used a View to import an RSS feed from Yahoo Pipes (which combines events from our own Library system and the larger Duke system).
  • Reserve Spaces — this page loads in content from Springshare’s LibCal system using an iFrame.
  • Map — I drew a simplified map in Illustrator based on the architect’s floor plan, then saved it out as an SVG and added ID tags to the areas I wanted to make interactive (see the sketch below).
  • Staff — this page loads in content from a Google spreadsheet using a technique I outlined previously on Bitstreams.
  • Help — this page loads our LibraryH3LP chat widget and a Qualtrics email form.
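
For the interactive map, those ID tags make it straightforward to attach click handlers to the SVG regions. A minimal sketch, with invented region IDs and a hypothetical display function:

    // Sketch: make SVG floor-plan regions respond to touch/click by their IDs.
    ["room-101", "room-102", "project-room"].forEach(function (id) {
      var region = document.getElementById(id);
      region.style.cursor = "pointer";       // hint that the region is interactive
      region.addEventListener("click", function () {
        showRoomDetails(id);                 // hypothetical info-panel function
      });
    });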

The Frontend

When it comes time to design an interface, my first step is almost always to sketch on paper. For this project, I did some playing around and ended up settling on a circular motif for the main navigational interface. I based the color scheme and typography on a branding and style guide that was developed for the Edge. Many years ago I used to turn my sketches into high-fidelity mockups in Photoshop or Illustrator, but for the past couple of years I’ve tended to just dive right in and design on the fly with HTML/CSS. I created a special stylesheet just for this kiosk — it’s based on a fixed-pixel layout, as it is only ever intended to be used on that single Dell computer — and also assigned it to load using Delta. One important aspect of a kiosk is providing some hinting to users that they can indeed interact with it. In my experience, this is usually handled in the form of an attract loop.

I created a very simple motion design using my favorite NLE and rendered out an mp4 to use with the kiosk. I then set up the home page to show the video when it first loads and to hide it when the screen is touched. This helps the actual home page content appear to load very quickly (as it’s actually sitting beneath the video). I also included a script on every page to return to the homepage after a preset period of inactivity (see the sketch below). It’s currently set to three minutes, but we may tweak that. All in all, I’m pleased with how things turned out. We’re planning to spend some time evaluating the usage of the kiosk over the next couple of months and then make any necessary tweaks to improve the user experience. Swing by the Edge some time and try it out!
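
As promised above, the inactivity script boils down to resetting a timer on every touch. A minimal sketch, assuming a hypothetical kiosk home URL:

    // Sketch: return to the kiosk home page after three minutes of inactivity.
    var IDLE_LIMIT_MS = 3 * 60 * 1000;
    var idleTimer;

    function resetIdleTimer() {
      clearTimeout(idleTimer);
      idleTimer = setTimeout(function () {
        window.location.href = "/kiosk/";    // hypothetical kiosk home URL
      }, IDLE_LIMIT_MS);
    }

    // On a touchscreen, any touch or click counts as activity.
    ["touchstart", "click"].forEach(function (evt) {
      document.addEventListener(evt, resetIdleTimer);
    });
    resetIdleTimer();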

Indiana Jones and The Greek Manuscripts

One of my favorite movies as a youngster was Steven Spielberg’s “Raiders of the Lost Ark.” It’s non-stop action as the adventurous Indiana Jones criss-crosses the globe in an exciting yet dangerous race against the Nazis for possession of the Ark of the Covenant. According to the Book of Exodus, the Ark is a golden chest which contains the original stone tablets on which the Ten Commandments are inscribed, the moral foundation for both Judaism and Christianity. The Ark is so powerful that it single-handedly destroys the Nazis and then turns Steven Spielberg and Harrison Ford into billionaires. Countless sequels, TV shows, theme-park rides and merchandise follow.

Greek manuscript 94, binding consists of heavily decorated repoussé silver over leather.

Fast-forward several decades, and I am asked to digitize Duke Libraries’ Kenneth Willis Clark Collection of Greek Manuscripts. Although not quite as old as the Ten Commandments, this is an amazing collection of biblical texts dating all the way back to the 9th century. These are weighty volumes, hand-written using ancient inks, often on animal-skin parchment. The bindings are characterized as Byzantine, and often covered in leathers like goatskin, sometimes with additional metal ornamentation. Although I have not had to run from giant boulders, or navigate a pit of snakes, I do feel a bit like Indiana Jones when holding one of these rare, ancient texts in my hands. I’m sure one of these books must house a secret code that can bestow fame and fortune, in addition to the obvious eternal salvation.

Before digitization, Senior Conservator Erin Hammeke evaluates the condition of each Greek manuscript and rules out any that are deemed too fragile to digitize. Some are considered sturdy enough but still need repairs, so Erin makes the necessary fixes. Once a manuscript is given the green light for digitization, I carefully place it in our book cradle so that it cannot be opened beyond a 90-degree angle. This helps protect our fragile bound materials from unnecessary stress on the binding. Next, the aperture, exposure, and focus are carefully adjusted on our Phase One P65+ digital camera so that the numerical values of our X-Rite color calibration target, placed on top of the manuscript, match the numerical readings shown on our calibrated monitors.

Greek manuscript 101, with X-Rite color calibration target, secured in book cradle.

As the photography begins, each page of the manuscript is carefully turned by hand so that a new image can be made of the following page. This is a tedious process that requires careful concentration so the pages are consistently captured throughout each volume. Right-hand (recto) pages are captured first, in succession. Then the volume is turned over so that the left-hand (verso) pages can be captured. I can’t read Greek, but it’s fascinating to see the beauty of the calligraphy and view the occasional illustrations that appear on some pages. Sometimes, I discover that moths, beetles, or termites have bored through the pages over time. It’s interesting to speculate as to which century this invasive destruction may have occurred in. Perhaps the Nazis from the Indiana Jones movies traveled back in time and placed the insects there?

Greek manuscript 101, showing insect damage.

Once the photography is complete, the recto and verso images are processed and then interleaved to recreate the left-right page order of the original manuscript. Next, the images go through a quality-control process in which any extraneous background area is cropped out, and each page is checked for clarity and for consistent color and illumination. After that, another round of quality control ensures that no pages are missing or out of order. Finally, the images are converted to Pyramid TIFF files, which allow our website users to zoom out and see all the pages at once, or zoom in to see maximum detail of any selected page. Thirty-eight Greek manuscripts are ready for online viewing now, and many more are coming soon. Stay tuned for the exciting sequel: “Indiana Jones and Even More Greek Manuscripts.”

Embeds, Math & Beyond

This week, in conjunction with our H. Lee Waters Film Collection unveiling, we rolled out a handy new Embed feature for digital collections items.  The idea is to make it as easy as possible for someone to share their discoveries from our collections, with proper attribution, on other websites or blogs.

How To

It’s simple, really, and mimics the experience you’re likely to encounter getting embed code from other popular sites with videos, images, and the like. We modeled our approach loosely on the Internet Archive‘s video embed service (e.g., visit this video and click the Share icon, but only if you are unafraid of clowns).


Click the “Embed” link under an item from Duke Digital Collections, and copy the snippet of code that pops up. Paste it in your website, and you’re done!

Examples

I’ll paste a few examples below using different kinds of items. The embed code is short and nearly identical for all of these:

A Single Image

Paginated Item

A Video

Single-Track Audio

Multi-Track Audio

Document with Document Viewer

Technical Considerations

Building this feature required a little bit of math, some trial & error, and a few tricks. The steps were to:

  • Set up a service to return customized item pages at the path http://library.duke.edu/digitalcollections/embed/<itemid>/
  • Use CSS & JS to make the media as fluid as possible to fill whatever space it ends up in
  • Use a fixed height and overflow: auto on the attribution box so longer content will scroll
  • Use link rel=”canonical” to ensure the item’s embed page is associated with the real item page (especially to improve links / ranking signals for search engines).
  • Present the user a copyable HTML <iframe> element in the regular item page that has the correct height & width attributes to accommodate the item(s) to be embedded

This last point is where the math comes in. Take a single-image item, for example: with a landscape-orientation image, we need to give the user a different <iframe> height to copy than we would for a portrait. It gets even more complicated when we have to account for multiple tracks of audio or video, or combinations of the two.
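
As a concrete illustration of that math for the single-image case (the embed width and attribution-box height here are assumptions, not our real constants):

    // Sketch: suggest an <iframe> height for a single-image embed.
    var EMBED_WIDTH = 600;         // width users paste into their page (assumed)
    var ATTRIBUTION_HEIGHT = 80;   // fixed-height, scrollable attribution box (assumed)

    function suggestedEmbedHeight(imageWidth, imageHeight) {
      // Scale the image to the embed width, preserving aspect ratio,
      // then add room for the attribution box beneath it.
      var scaledHeight = Math.round(EMBED_WIDTH * (imageHeight / imageWidth));
      return scaledHeight + ATTRIBUTION_HEIGHT;
    }

    suggestedEmbedHeight(3000, 2000);  // landscape image -> 480
    suggestedEmbedHeight(2000, 3000);  // portrait image  -> 980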

Coming Soon

We’ll refine this feature a bit in the coming weeks, and work out any embed-bugs we discover. We’ll also be developing a similar feature for embedding digitized content found in our archival collection guides.