Category Archives: Behind the Scenes

Digital Tools for Civil Rights History

The One Person, One Vote Project is trying to do history a different way. Fifty years ago, young activists in the Student Nonviolent Coordinating Committee broke open the segregationist south with the help of local leaders. Despite rerouting the trajectories of history, historical actors rarely get to have a say in how their stories are told. Duke and the SNCC Legacy Project are changing that. The documentary website we’re building (One Person, One Vote: The Legacy of SNCC and the Struggle for Voting Right) puts SNCC veterans at the center of narrating their history.

SNCC field secretary and Editorial Board member Charlie Cobb.
SNCC field secretary and Editorial Board member Charlie Cobb. Courtesy of

So how does that make the story we tell different? First and foremost, civil rights becomes about grassroots organizing and the hundreds of local individuals who built the movement from the bottom up. Our SNCC partners want to tell a story driven by the whys and hows of history. How did their experiences organizing in southwest Mississippi shape SNCC strategies in southwest Georgia and the Mississippi Delta? Why did SNCC turn to parallel politics in organizing the Mississippi Freedom Democratic Party? How did ideas drive the decisions they made and the actions they took?

For the One Person, One Vote site, we’ve been searching for tools that can help us tell this story of ideas, one focused on why SNCC turned to grassroots mobilization and how they organized. In a world where new tools for data visualization, mapping, and digital humanities appear each month, we’ve had plenty of possibilities to choose from. The tools we’ve gravitated towards have some common traits; they all let us tell multi-layered narratives and bring them to life with video clips, photographs, documents, and music. Here are a couple we’ve found:

This StoryMap traces how the idea of Manifest Destiny progressed through the years and across the geography of the United States.
This StoryMap traces how the idea of Manifest Destiny progressed through the years and across the geography of the U.S.

StoryMap: Knightlab’s StoryMap tool is great for telling stories. But better yet, StoryMap lets us illustrate how stories unfold over time and space. Each slide in a StoryMap is grounded with a date and a place. Within the slides, creators can embed videos and images and explain the significance of a particular place with text. Unlike other mapping tools, StoryMaps progress linearly; one slide follows another in a sequence, and viewers click through a particular path. In terms of SNCC, StoryMaps give us the opportunity to trace how SNCC formed out of the Greensboro sit-ins, adopted a strategy of jail-no-bail in Rock Hill, SC, picked up the Freedom Rides down to Jackson, Mississippi, and then started organizing its first voter registration campaign in McComb, Mississippi.

Timeline.JS: We wanted timelines in the One Person, One Vote site to trace significant events in SNCC’s history but also to illustrate how SNCC’s experiences on the ground transformed their thinking, organizing, and acting. Timeline.JS, another Knightlab tool, provides the flexibility to tell overlapping stories in clean, understandable manner. Markers in Timeline.JS let us embed videos, maps, and photos, cite where they come from, and explain their significance. Different tracks on the timeline  give us the option of categorizing events into geographic regions, modes of organizing, or evolving ideas.

The history of Duke University as displayed by Timeline.JS.
The history of Duke University as displayed by Timeline.JS.

DH Press: Many of the mapping tools we checked out relied on number-heavy data sets, for example those comparing how many robberies took place on the corners of different city blocks. Data sets for One Person, One Vote come mostly in the form of people, places, and stories. We needed a tool that let us bring together events and relevant multimedia material and primary sources and represent them on a map. After checking out a variety of mapping tools, we found that DH Press served many of our needs.

DH Press project representing buildings and uses in Durham's Hayti neighborhood.
DH Press project representing buildings and uses in Durham’s Hayti neighborhood.

Coming out of the University of North Carolina – Chapel Hill’s Digital Innovation Lab, DH Press is a WordPress plugin designed specifically with digital humanities projects in mind. While numerous tools can plot events on a map, DH Press markers provide depth. We can embed the video of an oral history interview and have a transcript running simultaneously as it plays. A marker might include a detailed story about an event, and chronicle all of the people who were there. Additionally, we can customize the map legends to generate different spatial representations of our data.

Example of a marker in DH Press. Markers can be customized to include a range of information about a particular place or event.
Example of a marker in DH Press. Markers can be customized to include a range of information about a particular place or event.


These are some of the digital tools we’ve found that let us tell civil rights history through stories and ideas. And the search continues on.

Analog to Digital to Analog: Impact of digital collections on permission-to-publish requests

We’ve written many posts on this blog that describe (in detail) how we build our digital collections at Duke, how we describe them, and how we make them accessible to researchers.

At a Rubenstein Library staff meeting this morning one of my colleagues–Sarah Carrier–gave an interesting report on how some of our researchers are actually using our digital collections. Sarah’s report focused specifically on permission-to-publish requests, that is, cases where researchers requested permission from the library to publish reproductions of materials in our collection in scholarly monographs, journal articles, exhibits, websites, documentaries, and any number of other creative works. To be clear, Sarah examined all of these requests, not just those involving digital collections. Below is a chart showing the distribution of the types of publication uses.

Types of permission-to-publish requests, FY2013-2014
Types of permission-to-publish requests, FY2013-2014

What I found especially interesting about Sarah’s report, though, is that nearly 76% of permission-to-publish requests did involve materials from the Rubenstein that have been digitized and are available in Duke Digital Collections. The chart below shows the Rubenstein collections that generate the highest percentage of requests. Notice that three of these in Duke Digital Collections were responsible for 40% of all permission-to-publish requests:

Collections generating the most permission-to-publish requests, FY2013-2014
Collections generating the most permission-to-publish requests, FY2013-2014

So, even though we’ve only digitized a small fraction of the Rubenstein’s holdings (probably less than 1%), it is this 1% that generates the overwhelming majority of permission-to-publish requests.

I find this stat both encouraging and discouraging at the same time. On one hand, it’s great to see that folks are finding our digital collections and using them in their publications or other creative output. On the other hand, it’s frightening to think that the remainder of our amazing but yet-to-be digitized collections are rarely if ever used in publications, exhibits, and websites.

I’m not suggesting that researchers aren’t using un-digitized materials. They certainly are, in record numbers. More patrons are visiting our reading room than ever before. So how do we explain these numbers? Perhaps research and publication are really two separate processes. Imagine you’ve just written a 400 page monograph on the evolution of popular song in America, you probably just want to sit down at your computer, fire up your web browser, and do a Google Image Search for “historic sheet music” to find some cool images to illustrate your book. Maybe I’m wrong, but if I’m not, we’ve got you covered. After it’s published, send us a hard copy. We’ll add it to the collection and maybe we’ll even digitize it someday.

[Data analysis and charts provided by Sarah Carrier – thanks Sarah!]

Large-Scale Digitization and Lessons from the CCC Project

Back in February 2014, we wrapped up the CCC project, a collaborative three year IMLS-funded digitization initiative with our partners in the Triangle Research Libraries Network (TRLN). The full title of the project is a mouthful, but it captures its essence: “Content, Context, and Capacity: A Collaborative Large-Scale Digitization Project on the Long Civil Rights Movement in North Carolina.”

Together, the four university libraries (Duke, NC State, UNC-Chapel Hill, NC Central) digitized over 360,000 documents from thirty-eight collections of manuscripts relevant to the project theme. About 66,000 were from our David M. Rubenstein Rare Book & Manuscript Library collections.


So how large is “large-scale”? By comparison, when the project kicked off in summer 2011, we had a grand total of 57,000 digitized objects available online (“published”), collectively accumulated through sixteen years of digitization projects. That number was 69,000 by the time we began publishing CCC manuscripts in June 2012. Putting just as many documents online in three years as we’d been able to do in the previous sixteen naturally requires a much different approach to creating digital collections.

Traditional Digitization Large-Scale Digitization
Individual items identified during scanning No item-level identification: entire folders scanned
Descriptive metadata applied to each item Archival description only (e.g., at the folder level)
Robust portals for search & browse Finding aid / collection guide as access point

There are some considerable tradeoffs between document availability vs. discovery and access features, but going at this scale speeds publication considerably. Large-scale digitization was new for all four partners, so we benefited by working together.

Digitized documents accessed through an archival finding aid / collection guide with folder-level description.

Project Evaluation

CCC staff completed qualitative and quantitative evaluations of this large-scale digitization approach during the course of the project, ranging from conducting user focus groups and surveys to analyzing the impact on materials prep time and image quality control. Researcher assessments targeted three distinct user groups: 1) Faculty & History Scholars; 2) Undergraduate Students (in research courses at UNC & NC State); 3) NC Secondary Educators.

Here are some of the more interesting findings (consult the full reports for details):

  • Ease of Use. Faculty and scholars, for the most part, found it easy to use digitized content presented this way. Undergraduates were more ambivalent, and secondary educators had the most difficulty.
  • To Embed or Not to Embed. In 2012, Duke was the only library presenting the image thumbnails embedded directly within finding aids and a lightbox-style image navigator. Undergrads who used Duke’s interface found it easier to use than UNC or NC Central’s, and Duke’s collections had a higher rate of images viewed per folder than the other partners. UNC & NC Central’s interfaces now use a similar convention.
  • Potential for Use. Most users surveyed said they could indeed imagine themselves using digitized collections presented in this way in the course of their research. However, the approach falls short in meeting key needs for secondary educators’ use of primary sources in their classes.
  • Desired Enhancements. The top two most desired features by faculty/scholars and undergrads alike were 1) the ability to search the text of the documents (OCR), and 2) the ability to explore by topic, date, document type (i.e., things enabled by item-level metadata). PDF download was also a popular pick.


Impact on Duke Digitization Projects

Since the moment we began putting our CCC manuscripts online (June 2012), we’ve completed the eight CCC collections using this large-scale strategy, and an additional eight manuscript collections outside of CCC using the same approach. We have now cumulatively put more digital objects online using the large-scale method (96,000) than we have via traditional means (75,000). But in that time, we have also completed eleven digitization projects with traditional item-level identification and description.

We see the large-scale model for digitization as complementary to our existing practices: a technique we can use to meet the publication needs of some projects.


Do people actually use the collections when presented in this way? Some interesting figures:

  • Views / item in 2013-14 (traditional digital object; item-level description): 13.2
  • Views / item in 2013-14 (digitized image within finding aid; folder-level description): 1.0
  • Views / folder in 2013-14 (digitized folder view in finding aid): 8.5

It’s hard to attribute the usage disparity entirely to the publication method (they’re different collections, for one). But it’s reasonable to deduce (and unsurprising) that bypassing item-level description generally results in less traffic per item.

On the other hand, one of our CCC collections (The Allen Building Takeover Collection) has indeed seen heavy use–so much, in fact, that nearly 90% of TRLN’s CCC items viewed in the final six months of the project were from Duke. Its images averaged over 78 views apiece in the past year; its eighteen folders opened 363 times apiece. Why? The publication of this collection coincided with an on-campus exhibit. And it was incorporated into multiple courses at Duke for assignments to write using primary sources.

The takeaway is, sometimes having interesting, important, and timely content available for use online is more important than the features enabled or the process by which it all gets there.

Looking Ahead

We’ll keep pushing ahead with evolving our practices for putting digitized materials online. We’ve introduced many recent enhancements, like fulltext searching, a document viewer, and embedded HTML5 video. Inspired by the CCC project, we’ll continue to enhance our finding aids to provide access to digitized objects inline for context (e.g., The Jazz Loft Project Records). Our TRLN partners have also made excellent upgrades to the interfaces to their CCC collections (e.g., at UNC, at NC State) and we plan, as usual, to learn from them as we go.

On the Reels: Challenges in Digitizing Open Reel Audio Tape

The audio tapes in the recently acquired Radio Haiti collection posed a number of digitization challenges.  Some of these were discussed in this video produced by Duke’s Rubenstein Library:

In this post, I will use a short audio clip from the collection to illustrate some of the issues that we face in working with this particular type of analog media.

First, I present the raw digitized audio, taken from a tape labelled “Tambour Vaudou”:


As you can hear, there are a number of confusing and disorienting things going on there.  I’ll attempt to break these down into a series of discrete issues that we can diagnose and fix if necessary.

Tape Speed

Analog tape machines typically offer more than one speed for recording, meaning that you can change the rate at which the reels turn and the tape moves across the record or playback head.  The faster the speed, the higher the fidelity of the result.  On the other hand, faster speeds use more tape (which is expensive).  Tape speed is measured in “ips” (inches per second).  The tapes we work with were usually recorded at speeds of 3.75 or 7.5 ips, and our playback deck is set up to handle either of these.  We preview each tape before digitizing to determine what the proper setting is.

In the audio example above, you can hear that the tape speed was changed at around 10 seconds into the recording.  This accounts for the “spawn of Satan” voice you hear at the beginning.  Shifting the speed in the opposite direction would have resulted in a “chipmunk voice” effect.  This issue is usually easy to detect by ear.  The solution in this case would be to digitize the first 10 seconds at the faster speed (7.5 ips), and then switch back to the slower playback speed (3.75 ips) for the remainder of the tape.

The Otari MX-5050 tape machine
The Otari MX-5050 tape machine

Volume Level and Background Noise

The tapes we work with come from many sources and locations and were recorded on a variety of equipment by people with varying levels of technical knowledge.  As a result, the audio can be all over the place in terms of fidelity and volume.  In the audio example above, the volume jumps dramatically when the drums come in at around 00:10.  Then you hear that the person making the recording gradually brings the level down before raising it again slightly.  There are similar fluctuations in volume level throughout the audio clip.  Because we are digitizing for archival preservation, we don’t attempt to make any changes to smooth out the sometimes jarring volume discrepancies across the course of a tape.  We simply find the loudest part of the content, and use that to set our levels for capture.  The goal is to get as much signal as possible to our audio interface (which converts the analog signal to digital information that can be read by software) without overloading it.  This requires previewing the tape, monitoring the input volume in our audio software, and adjusting accordingly.

This recording happens to be fairly clean in terms of background noise, which is often not the case.  Many of the oral histories that we work with were recorded in noisy public spaces or in homes with appliances running, people talking in the background, or the subject not in close enough proximity to the microphone.  As a result, the content can be obscured by noise.  Unfortunately there is little that can be done about this since the problem is in the recording itself, not the playback.  There are a number of hum, hiss, and noise removal tools for digital audio on the market, but we typically don’t use these on our archival files.  As mentioned above, we try to capture the source material as faithfully as possible, warts and all.  After each transfer, we clean the tape heads and all other surfaces that the tape touches with a Q-tip and denatured alcohol.  This ensures that we’re not introducing additional noise or signal loss on our end.



While cleaning the Radio Haiti tapes (as detailed in the video above), we discovered that many of the tapes were comprised of multiple sections of tape spliced together.  A splice is simply a place where two different pieces of audio tape are connected by a piece of sticky tape (much like the familiar Scotch tape that you find in any office).  This may be done to edit together various content into a seamless whole, or to repair damaged tape.  Unfortunately, the sticky tape used for splicing dries out over time, becomes brittle, and loses it’s adhesive qualities.  In the course of cleaning and digitizing the Radio Haiti tapes, many of these splices came undone and had to be repaired before our transfers could be completed.

Tape ready for splicing
Tape ready for splicing

Our playback deck includes a handy splicing block that holds the tape in the correct position for this delicate operation.  First I use a razor blade to clean up any rough edges on both ends of the tape and cut it to the proper 45 degree angle.  The splicing block includes a groove that helps to make a clean and accurate cut.  Then I move the two pieces of tape end to end, so that they are just touching but not overlapping.  Finally I apply the sticky splicing tape (the blue piece in the photo below) and gently press on it to make sure it is evenly and fully attached to the audio tape.  Now the reel is once again ready for playback and digitization.  In the “Tambour Vaudou” audio clip above, you may notice three separate sections of content:  the voice at the beginning, the drums in the middle, and the singing at the end.  These were three pieces of tape that were spliced together on the original reel and that we repaired right here in the library’s Digital Production Center.

A finished splice.  Note that the splice is made on the shiny back of the tape, not on the matte side that audio signal is encoded on.
A finished splice. Note that the splice is made on the shiny back of the tape, not on the matte side that audio is recorded on.


These are just a few of many issues that can arise in the course of digitizing a collection of analog open reel audio tapes.  Fortunately, we can solve or mitigate most of these problems, get a clean transfer, and generate a high-quality archival digital file.  Until next time…keep your heads clean, your splices intact, and your reels spinning!


Focus: it’s about vision and teamwork

image of crossed eyes from an old advertisementSo much work to do, so little time. But what keeps us focused as we work to make a wealth of resources available via the web? It often comes down to a willingness to collaborate and a commitment to a common vision.

Staying focused through vision and values

When Duke University Libraries embarked on our 2012-2013 website redesign, we created a vision and values statement that became a guidepost during our decision making. It worked so well for that single project, that we later decided to apply it to current and future web projects. You can read the full statement on our website, but here are just a few of the key ideas:

  • Put users first.
  • Verify data and information, perpetually remove outdated or inaccurate data and content, & present relevant content at the point of need.
  • Strengthen our role as essential partners in research, teaching, and scholarly communication: be a center of intellectual life at Duke.
  • Maintain flexibility in the site to foster experimentation, risk-taking, and future innovation.

As we decide which projects to undertake, what our priorities should be, and how we should implement these projects, we often consider what aligns well with our vision and values. And when something doesn’t fit well, it’s often time to reconsider.

Team work, supporting and balancing one another

Vision counts, but having people who collaborate well is what really enables us to maintain focus and to take a coherent approach to our work.

A number of cross-departmental teams within Duke University Libraries consider which web-based projects we should undertake, who should implement them, when, and how. By ensuring that multiple voices are at the table, each bringing different expertise, we make use of the collective wisdom from within our staff.


The Web Experience Team (WebX) is responsible for the overall visual consistency and functional integrity of our web interfaces. It not only provides vision for our website, but actively leads or contributes to the implementation of numerous projects. Sample projects include:

  • The introduction of a new eBook service called Overdrive
  • The development of a new, Bento-style, version of our search portal to be released in August
  • Testing the usability of our web interfaces with patrons leading to changes such as the introduction of a selectable default search tab

Members of WebX are Aaron Welborn, Emily Daly, Heidi Madden, Jacquie Samples, Kate Collins, Michael Peper, Sean Aery, and Thomas Crichlow.


While we love to see the research community using our collections within our reading rooms, we understand the value in making these collections available online. The Advisory Committee for Digital Collections (ACDC) decides which collections of rare material will be published online. Members of ACDC are Andy Armacost, David Pavelich, Jeff Kosokoff, Kat Stefko, Liz Milewicz, Molly Bragg, Naomi Nelson, Valerie Gillispie, and Will Sexton.


The Digital Collections Implementation Team (DCIT) both guides and undertakes much of the work needed to digitize and publish our unique online collections. Popular collections DCIT has published include:
Man and woman standing next to each other as they each encircle one eye with their thumb and forefinger

Members of DCIT are Erin Hammeke, Mike Adamo, Molly Bragg, Noah Huffman, Sean Aery, and Will Sexton.

These groups have their individual responsibilities, but they also work well together. The teamwork extends beyond these groups as each relies on individuals and departments throughout Duke Libraries and beyond to ensure the success of our projects.

Most importantly, it helps that we like to work together, we value each other’s viewpoints, and we remain connected to a common vision.

Bento is Coming!

A unified search results page, commonly referred to as the “Bento Box” approach, has been an increasingly popular method to display search results on library websites. This method helps users gain quick access to a limited result set across a variety of information scopes while providing links to the various silos for the full results. NCSU’s QuickSearch implementation has been in place since 2005 and has been extremely influential on the approach taken by other institutions.

Way back in December of 2012, the DUL began investigating and planning for implementing a Bento search results layout on our website. Extensive testing revealed that users favor searching from a single box — as is their typical experience conducting web searches via Google and the like. Like many libraries, we’ve been using Summon as a unified discovery layer for articles, books, and other resources for a few years, providing an ‘All’ tab on our homepage as the entry point. Summon aggregates these various sources into a common index, presented in a single stream on search results pages. Our users often find this presentation overwhelming or confusing and prefer other search tools. As such, we’ve demoted the our ‘All’ search on our homepage — although users can set it as the default thanks to the very slick Default Scope search tool built by Sean Aery (with inspiration from the University of Notre Dame’s Hesburgh Libraries website):

Default Search Tool

The library’s Web Experience Team (WebX) proposed the Bento project in September of 2013. Some justifications for the proposal were as follows:

Bento boxing helps solve these problems:

  • We won’t have to choose which silo should be our default search scope (in our homepage or masthead)
  • Synthesizing relevance ranking across very different resources is extremely challenging, e.g., articles get in the way of books if you’re just looking for books (and vice-versa).
  • We need to move from “full collection discovery to full library discovery” – in the same search, users discover expertise, guides/experts, other library provisions alongside items from the collections. 1
  • “A single search box communicates confidence to users that our search tools can meet their information needs from a single point of entry.” 2


  1. Thirteen Ways of Looking at Libraries, Discovery, and the Catalog by Lorcan Dempsey.
  2. How Users Search the Library from a Single Search Box by Cory Lown, Tito Sierra, and Josh Boyer

Sean also developed this mockup of what Bento results could look like on our website and we’ve been using it as the model for our project going forward:

Bento Mockup

For the past month our Bento project team has been actively developing our own implementation. We have had the great luxury of building upon work that was already done by brilliant developers at our sister institutions (NCSU and UNC) — and particular thanks goes out to Tim Shearer at UNC Libraries who provided us with the code that they are using on their Bento results page, which in turn was heavily influenced by the work done at NCSU Libraries.

Our approach includes using results from Summon, Endeca, Springshare, and Google. We’re building this as a Drupal module which will make it easy to integrate into our site. We’re also hosting the code on GitHub so others can gain from what we’ve learned — and to help make our future enhancements to the module even easier to implement.

Our plan is to roll out Bento search in August, so stay tuned for the official launch announcement!


PS — as the 4th of July holiday is right around the corner, here are some interesting items from our digital collections related to independence day:


Digitization Details: Thunderbolts, Waveforms & Black Magic

The technology for digitizing analog videotape is continually evolving. Thanks to increases in data transfer-rates and hard drive write-speeds, as well as the availability of more powerful computer processors at cheaper price-points, the Digital Production Center recently decided to upgrade its video digitization system. Funding for the improved technology was procured by Winston Atkins, Duke Libraries Preservation Officer. Of all the materials we work with in the Digital Production Center, analog videotape has one of the shortest lifespans. Thus, it is high on the list of the Library’s priorities for long-term digital preservation. Thanks, Winston!

Thunderbolt is leaving USB in the dust.

Due to innovative design, ease of use, and dominance within the video and filmmaking communities, we decided to go with a combination of products designed by Apple Inc., and Blackmagic Design. A new computer hardware interface recently adopted by Apple and Blackmagic, called Thunderbolt, allows the the two companies’ products to work seamlessly together at an unprecedented data-transfer speed of 10 Gigabits per second, per channel. This is much faster than previously available interfaces such as Firewire and USB. Because video content incorporates an enormous amount of data, the improved data-transfer speed allows the computer to capture the video signal in real time, without interruption or dropped frames.

Blackmagic design converts the analog video signal to SDI (serial digital interface).

Our new data stream works as follows. Once a tape is playing on an analog videotape deck, the output signal travels through an Analog to SDI (serial digital interface) converter. This converts the content from analog to digital. Next, the digital signal travels via SDI cable through a Blackmagic SmartScope monitor, which allows for monitoring via waveform and vectorscope readouts. A veteran television engineer I know will talk to you for days regarding the physics of this, but, in layperson terms, these readouts let you verify the integrity of the color signal, and make sure your video levels are not too high (blown-out highlights) or too low (crushed shadows). If there is a problem, adjustments can be made via analog video signal processor or time-base corrector to bring the video signal within acceptable limits.

Blackmagic’s SmartScope allows for monitoring of the video’s waveform. The signal must stay between 0 and 700 (left side) or clipping will occur, which means you need to get that videotape to the emergency room, STAT!

Next, the video content travels via SDI cable to a Blackmagic Ultrastudio interface, which converts the signal from SDI to Thunderbolt, so it can now be recognized by a computer. The content then travels via Thunderbolt cable to a 27″ Apple iMac utilizing a 3.5 GHz Quad-core processor and NVIDIA GeForce graphics processor. Blackmagic’s Media Express software writes the data, via Thunderbolt cable, to a G-Drive Pro external storage system as a 10-bit, uncompressed preservation master file. After capture, editing can be done using Apple’s Final Cut Pro or QuickTime Pro. Compressed Mp4 access derivatives are then batch-processed using Apple’s Compressor software, or other utilities such as MPEG-Streamclip. Finally, the preservation master files are uploaded to Duke’s servers for long-term storage. Unless there are copyright restrictions, the access derivatives will be published online.

Video digitization happens in real time. A one-hour tape is digitized in, well, one hour, which is more than enough Bob Hope jokes for anyone.

Leveling Up Our Document Viewer

This past week, we were excited to be able to publish a rare 1804 manuscript copy of the Haitian Declaration of Independence in our digital collections website. We used the project as a catalyst for improving our document-viewing user experience, since we knew our existing platforms just wouldn’t cut it for this particular treasure from the Rubenstein Library collection. In order to present the declaration online, we decided to implement the open-source Diva.js viewer. We’re happy with the results so far and look forward to making more strides in our ability to represent documents in our site as the year progresses.

Haitian Declaration of Independence as seen in Diva.js document viewer with full text transcription.

Challenges to Address

We have had two glaring limitations in providing access to digitized collections to date: 1) a less-than-stellar zoom & pan feature for images and 2) a suboptimal experience for navigating documents with multiple pages. For zooming and panning (see example), we use software called OpenLayers, which is primarily a mapping application. And for paginated items we’ve used two plugins designed to showcase image galleries, Galleria (example) and Colorbox (example). These tools are all pretty good at what they do, but we’ve been using them more as stopgap solutions for things they weren’t really created to do in the first place. As the old saying goes, when all you have is a hammer, everything looks like a nail.

Big (OR Zoom-Dependent) Things

A selection from our digitized Italian Cultural Posters. Large derivative is 11,000 x 8,000 pixels, a 28MB JPG.
A selection from our digitized Italian Cultural Posters. The “large” derivative is 11,000 x 8,000 pixels, a 28MB JPG.

Traditionally as we digitize images, whether freestanding or components of a multi-page object, at the end of the process we generate three JPG derivatives per page. We make a thumbnail (helpful in search results or other item sets), medium image (what you see on an item’s webpage), and large image (same dimensions as the preservation master, viewed via the ‘all sizes’ link). That’s a common approach, but there are several places where that doesn’t always work so well. Some things we’ve digitized are big, as in “shoot them in sections with a camera and stitch the images together” big. And we’ve got several more materials like this waiting in the wings to make available. A medium image doesn’t always do these things justice, but good luck downloading and navigating a giant 28MB JPG when all you want to do is zoom in a little bit.

Likewise, an object doesn’t have to be large to really need easy zooming to be part of the viewing experience. You might want to read the fine print on that newspaper ad, see the surgeon general’s warning on that billboard, or inspect the brushstrokes in that beautiful hand-painted glass lantern slide.

And finally, it’s not easy to anticipate the exact dimensions at which all our images will be useful to a person or program using them. Using our data to power an interactive display for a media wall? A mobile app? A slideshow on the web? You’ll probably want images that are different dimensions than what we’ve stored online. But to date, we haven’t been able to provide ways to specify different parameters (like height, width, and rotation angle) in the image URLs to help people use our images in environments beyond our website.

A page from Mary McCornack Thompson's 1908 travel diary, underrepresented by its presentation via an image gallery.
A page from Mary McCornack Thompson’s 1908 travel diary, limited by its presentation via an image gallery.

Paginated Things

We do love our documentary photography collections, but a lot of our digitized objects are represented by more than just a single image. Take an 11-page piece of sheet music or a 127-page diary, for example. Those aren’t just sequences or collections of images. Their paginated orientation is pretty essential to their representation online, but a lot of what characterizes those materials is unfortunately lost in translation when we use gallery tools to display them.

The Intersection of (Big OR Zoom-Dependent) AND Paginated

Here’s where things get interesting and quite a bit more complicated: when zooming, panning, page navigation, and system performance are all essential to interacting with a digital object. There are several tools out there that support these various aspects, but very few that do them all AND do them well. We knew we needed something that did.

Our Solution: Diva.js

diva-logoWe decided to use the open-source Diva.js (Document Image Viewer with AJAX). Developed at the Distributed Digital Music Archives and Libraries Lab (DDMAL) at McGill University, it’s “a Javascript frontend for viewing documents, designed to work with digital libraries to present multi-page documents as a single, continuous item” (see About page). We liked its combination of zooming, panning, and page navigation, as well as its extensibility. This Code4Lib article nicely summarizes how it works and why it was developed.

Setting up Diva.js required us to add a few new pieces to our infrastructure. The most significant was an image server (in our case, IIPImage) that could 1) deliver parts of a digital image upon request, and 2) deliver complete images at whatever size is requested via URL parameters.

Our Interface: How it Works

By default, we present a document in our usual item page template that provides branding, context, and metadata. You can scroll up and down to navigate pages, use Page Up or Page Down keys, or enter a page number to jump to a page directly. There’s a slider to zoom in or out, or alternatively you can double-click to zoom in / Ctrl-double-click to zoom out. You can toggle to a grid view of all pages and adjust how many pages to view at once in the grid. There’s a really handy full-screen option, too.

Fulltext transcription presented in fullscreen mode, thumbnail view.
Fulltext transcription presented in fullscreen mode, thumbnail view.
Page 4, zoom level 4, with link to download.
Page 4, zoom level 4, with link to download.

It’s optimized for performance via AJAX-driven “lazy loading”: only the page of the document that you’re currently viewing has to load in your browser, and likewise only the visible part of that page image in the viewer must load (via square tiles). You can also download a complete JPG for a page at the current resolution by clicking the grey arrow.

We extended Diva.js by building a synchronized fulltext pane that displays the transcript of the current page alongside the image (and beneath it in full-screen view). That doesn’t come out-of-the-box, but Diva.js provides some useful hooks into its various functions to enable developing this sort of thing. We also slightly modified the styles.

image tile
A tile delivered by IIPImage server

Behind the scenes, we have pyramid TIFF images (one for each page), served up as JPGs by IIPImage server. These files comprise arrays of 256×256 JPG tiles for each available zoom level for the image. Let’s take page 1 of the declaration for example. At zoom level 0 (all the way zoomed out), there’s only one image tile: it’s under 256×256 pixels; level 1 is 4 tiles, level 2 is 12, level 3 is 48, level 4 is 176. The page image at level 5 (all the way zoomed in) includes 682 tiles (example of one), which sounds like a lot, but then again the server only has to deliver the parts that you’re currently viewing.

Every item using Diva.js also needs to load a JSON stream including the dimensions for each page within the document, so we had to generate that data. If there’s a transcript present, we store it as a single HTML file, then use AJAX to dynamically pull in the part of that file that corresponds to the currently-viewed page in the document.

Diva.js & IIPImage Limitations

It’s a good interface, and is the best document representation we’ve been able to provide to date. Yet it’s far from perfect. There are several areas that are limiting or that we want to explore more as we look to make more documents available in the future.

Out of the box, Diva.js doesn’t support page metadata, transcriptions, or search & retrieval within a document. We do display a synchronized transcript, but there’s currently no mapping between the text and the location within each page where each word appears, nor can you perform a search and discover which pages contain a given keyword. Other folks using Diva.js are working on robust applications that handle these kinds of interactions, but the degree to which they must customize the application is high. See for example, the Salzinnes Antiphonal: a 485-page liturgical manuscript w/text and music or a prototype for the Liber Usualis: a 2,000+ page manuscript using optical music recognition to encode melodic fragments.

Diva.js also has discrete zooming, which can feel a little jarring when you jump between zoom levels. It’s not the smooth, continuous zoom experience that is becoming more commonplace in other viewers.

With the IIPImage server, we’ll likely re-evaluate using Pyramid TIFFs vs. JPEG2000s to see which file format works best for our digitization and publication workflow. In either case, there are several compression and caching variables to tinker with to find an ideal balance between image quality, storage space required, and system performance. We also discovered that the IIP server unfortunately strips out the images’ ICC color profiles when it delivers JPGs, so users may not be getting a true-to-form representation of the image colors we captured during digitization.

Next Steps

Launching our first project using Diva.js gives us a solid jumping-off point for expanding our ability to provide useful, compelling representations of our digitized documents online. We’ll assess how well this same approach would scale to other potential projects and in the meantime keep an eye on the landscape to see how things evolve. We’re better equipped now than ever to investigate alternative approaches and complementary tools for doing this work.

We’ll also engage more closely with our esteemed colleagues in the Duke Collaboratory for Classics Computing (DC3), who are at the forefront of building tools and services in support of digital scholarship. Well beyond supporting discovery and access to documents, their work enables a community of scholars to collaboratively transcribe and annotate items (an incredible–and incredibly useful–feat!). There’s a lot we’re eager to learn as we look ahead.

Digitization Details: Before We Push the “Scan” Button

The Digital Production Center at the Perkins Library has a clearly stated mission to “create digital captures of unique, valuable, or compelling primary resources for the purpose of preservation, access, and publication.”  Our mission statement goes on to say, “Our operating principle is to achieve consistent results of a measurable quality. We plan and perform our work in a structured and scalable way, so that our results are predictable and repeatable, and our digital collections are uniform.”

That’s a mouthful!


What it means is the images have to be consistent not only from image to image within a collection but also from collection to collection over time.  And if that isn’t complex enough this has to be done using many different capture devices.  Each capture device has its own characteristics, which record and reproduce color in different ways.

How do we produce consistent images?

There are many variables to consider when solving the puzzle of “consistent results of a measurable quality.”  First, we start with the viewing environment, then move to monitor calibration and profiling, and end with capture device profiling.  All of these variables play a part in producing consistent results.

Full spectrum lighting is used in the Digital Production Center to create a neutral environment for viewing the original material.  Lighting that is not full spectrum often has a blue, magenta, green or yellow color shift, which we often don’t notice because our eyes are able to adjust effortlessly.  In the image below you can see the difference between tungsten lighting and neutral lighting.

Tungsten light (left) Neutral light (right)
Tungsten light (left) Neutral light (right)

Our walls are also painted 18 percent gray, which is neutral, so that no color is reflected from the walls onto the image while comparing it to the digital image.

Now that we have a neutral viewing environment, the next variable to consider is the computer monitors used to view our digitized images.  We use a spectrophotometer (straight out of the Jetsons, right?) made by xrite to measure the color accuracy, luminance and contrast of the monitor.  This hardware/software combination uses the spectrophotometer as it’s attached to the computer screen to read the brightness (luminance), contrast, white point and gamma of your monitor and makes adjustments for optimal viewing.  This is called monitor calibration.  The software then displays a series of color patches with known RGB values which the spectrophotometer measures and records the difference.  The result is an icc display profile.  This profile is saved to your operating system and is used to translate colors from what your monitor natively produces to a more accurate color representation.

Now our environment is neutral and our monitor is calibrated and profiled.  The next step in the process is to profile your capture device, whether it is a high-end digital scan back like the Phase One or BetterLight or an overhead scanner like a Zeutschel. From Epson flatbed scanners to Nikon slide scanners, all of these devices can be calibrated in the same way.  With all of the auto settings on your scanner turned off, a color target is digitized on the device you wish to calibrate.  The swatches on the color target are known values similar to the series of color patches used for profiling the monitor.  The digitized target is fed to the profiling software.  Each patch is measured and compared against its known value.  The differences are recorded and the result is an icc device profile.

Now that we have a neutral viewing environment for viewing the original material, our eyes don’t need to compensate for any color shift from the overhead lights or reflection from the walls.  Our monitors are calibrated/profiled so that the digitized images display correctly and our devices are profiled so they are able to produce consistent images regardless of what brand or type of capture device we use.

Gretag Macbeth color checker
Gretag Macbeth color checker

During our daily workflow we a Gretag Macbeth color checker to measure the output of the capture devices every day before we begin digitizing material to verify that the device is still working properly.

All of this work is done before we push the “scan” button to ensure that our results are predictable and repeatable, measurable and scalable.  Amen.

Mapping the Broadsides Collection, or, how to make an interactive map in 30 minutes or less

Ever find yourself with a pile of data that you want to plot on a map? You’ve got names of places and lots of other data associated with those places, maybe even images? Well, this happened to me recently. Let me explain.

A few years ago we published the Broadsides and Ephemera digital collection, which consists of over 4,100 items representing almost every U.S. state. When we cataloged the items in the collection, we made sure to identify, if possible, the state, county, and city of each broadside. We put quite a bit of effort into this part of the metadata work, but recently I got to thinking…what do we have to show for all of that work? Sure, we have a browseable list of place terms and someone can easily search for something like “Ninety-Six, South Carolina.” But, wouldn’t it be more interesting (and useful) if we could see all of the places represented in the Broadsides collection on one interactive map? Of course it would.

So, I decided to make a map. It was about 4:30pm on a Friday and I don’t work past 5, especially on a Friday. Here’s what I came up with in 30 minutes, a Map of Broadside Places. Below, I’ll explain how I used some free and easy-to-use tools like Excel, Open Refine, and Google Fusion Tables to put this together before quittin’ time.

Step 1: Get some structured data with geographic information
Mapping only works if your data contain some geographic information. You don’t necessarily need coordinates, just a list of place names, addresses, zip codes, etc. It helps if the geographic information is separated from any other data in your source, like in a separate spreadsheet column or database field. The more precise, structured, and consistent your geographic data, the easier it will be to map accurately. To produce the Broadsides Map, I simply exported all of the metadata records from our metadata management system (CONTENTdm) as a tab delimited text file, opened it in Excel, and removed some of the columns that I didn’t want to display on the map.

Step 2: Clean up any messy data..
For the best results, you’ll want to clean your data. After opening my tabbed file in Excel, I noticed that the place name column contained values for country, state, county, and city all strung together in the same cell but separated with semicolons (e.g. United States; North Carolina; Durham County (N.C.); Durham (N.C.)). Because I was only really interested in plotting the cities on the map, I decided to split the place name column into several columns in order to isolate the city values.

To do this, you have a couple of options. You can use Excel’s “text to columns” feature, instructing it to split the column into new columns based on the semicolon delimiter or you can load your tabbed file into Open Refine and use its “split columns into several columns” feature. Both tools work well for this task, but I prefer OpenRefine because it includes several more advanced data cleaning features. If you’ve never used OpenRefine before, I highly recommend it. It’s “cluster and edit” feature will blow your mind (if you’re a metadata librarian).

Step 3: Load the cleaned data into Google Fusion Tables
Google Fusion Tables is a great tool for merging two or more data sets and for mapping geographic data. You can access Fusion Tables from your Google Drive (formerly Google Docs) account. Just upload your spreadsheet to Fusion Tables and typically the application will automatically detect if one of your columns contains geographic or location data. If so, it will create a map view in a separate tab, and then begin geocoding the location data.


If Fusion Tables doesn’t automatically detect the geographic data in your source file, you can explicitly change a column’s data type in Fusion Tables to “Location” to trigger the geocoding process. Once the geocoding process begins, Fusion Tables will process every place name in your spreadsheet through the Google Maps API and attempt to plot that place on the map. In essence, it’s as if you were searching for each one of those terms in Google Maps and putting the results of all of those searches on the same map.

Once the geocoding process is complete, you’re left with a map that features a placemark for every place term the service was able to geocode. If you click on any of the placemarks, you’ll see a pop-up information window that, by default, lists all of the other metadata elements and values associated with that record. You’ll notice that the field labels in the info window match the column headers in your spreadsheet. You’ll probably want to tweak some settings to make this info window a little more user-friendly.


Step 4: Make some simple tweaks to add images and clickable links to your map
To change the appearance of the information window, select the “change” option under the map tab then choose “change info window.” From here, you can add or remove fields from the info window display, change the data labels, or add some custom HTML code to turn the titles into clickable links or add thumbnail images. If your spreadsheet contains any sort of URL, or identifier that you can use to reliably construct a URL, adding these links and images is quite simple. You can call any value in your spreadsheet by referencing the column name in braces (e.g. {Identifier-DukeID}). Below is the custom HTML code I used to style the info window for my Broadsides map. Notice how the data in the {Identifier-DukeID} column is used to construct the links for the titles and image thumbnails in the info window.


Step 5: Publish your map
Once you’re satisfied with you map, you can share a link to it or embed the map in your own web page or blog…like this one. Just choose tools->publish to grab the link or copy and paste the HTML code into your web page or blog.

To learn more about creating maps in Google Fusion Tables, see this Tutorial or contact the Duke Library’s Data and GIS Services.