We try to keep our posts pretty focussed on the important work at hand here at Bitstreams central, but sometimes even we get distracted (speaking of, did you know that you can listen to the Go-Go’s for hours and hours on Spotify?). With most of our colleagues in the library leaving for or returning from vacation, it can be difficult to think about anything but exotic locations and what to do with all the time we are not spending in meetings. So this week, dear reader, we give you a few snapshots of vacation adventures told through Duke Digital Collections.
Many of Duke’s librarians (myself included) head directly East for a few days of R&R at one of the many beautiful North Carolina beaches. Who can blame them? It seems like everyone loves the beach, including William Gedney, Deena Stryker, Paul Kwilecki and even Sidney Gamble. Luckily for North Carolinians, the beach is only a short trip away, but of course there are essentials that you must not forget even on such a short journey.
Of course many colleagues have ventured even farther afield to West Virginia, Minnesota, Oregon, Maine and even Africa!! Wherever our colleagues are, we hope they are enjoying some well-deserved time off. For those of us who have already had our time away or are looking forward to next time, we will just have to live vicariously through our colleagues’ and our collections’ adventures.
The audio tapes in the recently acquired Radio Haiti collection posed a number of digitization challenges. Some of these were discussed in this video produced by Duke’s Rubenstein Library:
In this post, I will use a short audio clip from the collection to illustrate some of the issues that we face in working with this particular type of analog media.
First, I present the raw digitized audio, taken from a tape labelled “Tambour Vaudou”:
As you can hear, there are a number of confusing and disorienting things going on there. I’ll attempt to break these down into a series of discrete issues that we can diagnose and fix if necessary.
Analog tape machines typically offer more than one speed for recording, meaning that you can change the rate at which the reels turn and the tape moves across the record or playback head. The faster the speed, the higher the fidelity of the result. On the other hand, faster speeds use more tape (which is expensive). Tape speed is measured in “ips” (inches per second). The tapes we work with were usually recorded at speeds of 3.75 or 7.5 ips, and our playback deck is set up to handle either of these. We preview each tape before digitizing to determine what the proper setting is.
In the audio example above, you can hear that the tape speed was changed at around 10 seconds into the recording. This accounts for the “spawn of Satan” voice you hear at the beginning. Shifting the speed in the opposite direction would have resulted in a “chipmunk voice” effect. This issue is usually easy to detect by ear. The solution in this case would be to digitize the first 10 seconds at the faster speed (7.5 ips), and then switch back to the slower playback speed (3.75 ips) for the remainder of the tape.
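To put a number on the effect: the pitch change follows directly from the ratio of playback speed to recording speed. The short Python sketch below is only an illustration. The function name is made up for this post, and it assumes the two speeds mentioned above (3.75 and 7.5 ips).

```python
import math

def playback_shift(recorded_ips, playback_ips):
    """Return (speed_ratio, pitch_shift_in_semitones) for a tape recorded at
    one speed and played back at another. Illustrative helper only."""
    ratio = playback_ips / recorded_ips
    # Halving the speed drops the pitch one octave (12 semitones);
    # doubling it raises the pitch one octave.
    semitones = 12 * math.log2(ratio)
    return ratio, semitones

# Recorded at 7.5 ips but played at 3.75 ips: half speed, one octave down
# (the "spawn of Satan" voice).
print(playback_shift(7.5, 3.75))
# Recorded at 3.75 ips but played at 7.5 ips: double speed, one octave up
# (the "chipmunk voice").
print(playback_shift(3.75, 7.5))
```

A half-speed passage also takes twice as long to play, which is another clue when previewing a tape.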
Volume Level and Background Noise
The tapes we work with come from many sources and locations and were recorded on a variety of equipment by people with varying levels of technical knowledge. As a result, the audio can be all over the place in terms of fidelity and volume. In the audio example above, the volume jumps dramatically when the drums come in at around 00:10. Then you hear that the person making the recording gradually brings the level down before raising it again slightly. There are similar fluctuations in volume level throughout the audio clip. Because we are digitizing for archival preservation, we don’t attempt to make any changes to smooth out the sometimes jarring volume discrepancies across the course of a tape. We simply find the loudest part of the content, and use that to set our levels for capture. The goal is to get as much signal as possible to our audio interface (which converts the analog signal to digital information that can be read by software) without overloading it. This requires previewing the tape, monitoring the input volume in our audio software, and adjusting accordingly.
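The idea of setting levels from the loudest passage can be sketched in a few lines of Python. This is not our capture software, just an illustration: it assumes the previewed audio is available as floating-point samples between -1.0 and 1.0, and the -3 dBFS target is a made-up example value rather than a documented setting.

```python
import math

def peak_dbfs(samples):
    """Peak level in dB relative to full scale (0 dBFS is the loudest
    value the interface can represent without clipping)."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")  # pure silence has no measurable level
    return 20 * math.log10(peak)

def gain_to_target(samples, target_dbfs=-3.0):
    """How many dB to raise (positive) or lower (negative) the input so the
    loudest moment lands at the target. The target is an assumed example."""
    return target_dbfs - peak_dbfs(samples)

# A quiet voice followed by loud drums: the drums alone determine the setting.
preview = [0.05, -0.04, 0.06, 0.9, -0.95, 0.8]
print(round(peak_dbfs(preview), 2))       # peak level of the loudest sample
print(round(gain_to_target(preview), 2))  # adjustment needed to reach the target
```

Because only the peak matters here, the quiet passages stay quiet in the archival file, which is the point: the level is set once and the recording's own dynamics are left alone.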
This recording happens to be fairly clean in terms of background noise, which is often not the case. Many of the oral histories that we work with were recorded in noisy public spaces or in homes with appliances running, people talking in the background, or the subject not in close enough proximity to the microphone. As a result, the content can be obscured by noise. Unfortunately there is little that can be done about this since the problem is in the recording itself, not the playback. There are a number of hum, hiss, and noise removal tools for digital audio on the market, but we typically don’t use these on our archival files. As mentioned above, we try to capture the source material as faithfully as possible, warts and all. After each transfer, we clean the tape heads and all other surfaces that the tape touches with a Q-tip and denatured alcohol. This ensures that we’re not introducing additional noise or signal loss on our end.
While cleaning the Radio Haiti tapes (as detailed in the video above), we discovered that many of the tapes were composed of multiple sections of tape spliced together. A splice is simply a place where two different pieces of audio tape are connected by a piece of sticky tape (much like the familiar Scotch tape that you find in any office). This may be done to edit together various content into a seamless whole, or to repair damaged tape. Unfortunately, the sticky tape used for splicing dries out over time, becomes brittle, and loses its adhesive qualities. In the course of cleaning and digitizing the Radio Haiti tapes, many of these splices came undone and had to be repaired before our transfers could be completed.
Our playback deck includes a handy splicing block that holds the tape in the correct position for this delicate operation. First I use a razor blade to clean up any rough edges on both ends of the tape and cut it to the proper 45 degree angle. The splicing block includes a groove that helps to make a clean and accurate cut. Then I move the two pieces of tape end to end, so that they are just touching but not overlapping. Finally I apply the sticky splicing tape (the blue piece in the photo below) and gently press on it to make sure it is evenly and fully attached to the audio tape. Now the reel is once again ready for playback and digitization. In the “Tambour Vaudou” audio clip above, you may notice three separate sections of content: the voice at the beginning, the drums in the middle, and the singing at the end. These were three pieces of tape that were spliced together on the original reel and that we repaired right here in the library’s Digital Production Center.
These are just a few of many issues that can arise in the course of digitizing a collection of analog open reel audio tapes. Fortunately, we can solve or mitigate most of these problems, get a clean transfer, and generate a high-quality archival digital file. Until next time…keep your heads clean, your splices intact, and your reels spinning!
The era in which libraries have digitized their collections and published them on the Internet is less than two decades old. As an observer and participant during this time, I’ve seen some great projects come online. For me, one stands out for its impact and importance: the Farm Security Administration/Office of War Information Black-and-White Negatives, the Library of Congress’s collection of 175,000 photographs taken by employees of the US government in the 1930s and 40s.
The FSA photographers produced some of the most iconic images of the past century. In the decades following the program, the images became known through those who journeyed to D.C. to select and reproduce them, then published them in monographs or displayed them in exhibits. But the entire collection, funded by the federal government, was as public as public domain gets. When the LoC took on the digitization of the collection, it became available en masse. All those years, it had been waiting for the Internet.
The FSA photographers covered the US. This wonderful site built by a team from Yale can help you determine whether they passed through your hometown. Between 1939 and 1940, Dorothea Lange, Marion Post Wolcott, and Jack Delano traveled through the town and the county where I live, and some 73 of their photos are now online. I’ve studied them, and also witnessed the wonderment of my friends and neighbors when they happen upon the pictures. The director of the FSA program, Roy Stryker, was one of the visionaries of the Twentieth Century, but it took the digital collection to make the scope and reach of his vision apparent.
Photography has been an emphasis of our own digital collections program over the years. At the same time that the FSA traveled to rural Chatham County on their mission of “introducing America to Americans,” anonymous photographers employed by the RC Maxwell Company shot their outdoor advertising installations in places like Atlantic City, New Jersey and Richmond, Virginia. Maybe they were merely “introducing advertising to advertisers,” but I like to think of them as our own mini-Langes and mini-Wolcotts, freezing scenes that others cruised past in their Studebakers.
Certainly the most important traveling photographer we’ve published has been Sidney Gamble, an American who visited Asia, particularly China, on four occasions between 1908 and 1932. As with the FSA photos, I’ve spent time studying the scenes of places known to me. I’ve never been to China or Siberia, but I did live in Japan for a while some years ago, and come back to photos of a few places I visited – or maybe didn’t – while I was there.
The first place is the Great Buddha at Kamakura. It’s a popular tourist site south of Tokyo; I visited with some friends in 1990. Our collection has four photographs by Gamble of the Daibutsu. I don’t find anything of particular interest in Gamble’s shots, just the unmistakable calm and grandeur of the same scene I saw 60+ years later.
More intriguing for me, however, is the photo that Gamble took of the YMCA* in Yokohama, probably in 1917. For a while during my stay in Japan, I lived a few train stops from Yokohama, and got involved in a weekly game of pickup basketball at the Y there. I don’t remember much about the exterior of the building, but I recall the interior as somewhat funky, with lots of polished wood and a sweet wooden court. It was very distinctive for Tokyo and environs – a city where most of the architecture is best described as transient and flimsy, designed to have minimum impact when flattened by massive forces like earthquakes or bombers. I’ve always wondered if the building in Gamble’s photo was the same that I visited.
So I began to construct a response to this question based entirely on my own fading memories, some superficial research, and a fractional comprehension of a series of YouTube videos on the history of the YMCA in Yokohama. To begin with, a screenshot of Google street view of the Yokohama YMCA in 2011 shows a building quite different from the original.
The YouTube video includes a photograph of a building, clearly the same as the one in Gamble’s photograph, that was built in 1884. There are shots of people playing basketball and table tennis, and the few details of the interior look a lot like the place I remember. Could it be the same?
But then we see the building damaged from the Great Kanto Earthquake of 1923. That the structure was standing at all would have been remarkable. You can easily search and find images of the astonishing devastation of that event, but I’ll let these harrowing words from a correspondent of The Atlantic convey the scale of it.
Yokohama, the city of almost half a million souls, had become a vast plain of fire, of red, devouring sheets of flame which played and flickered. Here and there a remnant of a building, a few shattered walls, stood up like rocks above the expanse of flame, unrecognizable. There seemed to be nothing left to burn. It was as if the very earth were now burning.
According to my understanding of the video, the YMCA moved into another building in 1926. Based on the photos of the interior, my guess is that it was the same building where I visited in the early 1990s. The shots of basketball and table tennis from earlier might have been taken inside this building, even if the members of the Y engaged in those activities in the original.
Still, I couldn’t help but ask – would the Japanese have played basketball in the original building, between the game’s invention in 1891 and the earthquake in 1923? It seemed anachronistic to me, until I looked into it a little further.
It’s not hard to imagine Ishikawa making a beeline from the ship when it docked at Yokohama to the YMCA. If so, it makes the building that Gamble shot one of the sanctified sites of the sport, like many shrines since ruined but replaced. Sure, it was impressive to gaze up at a Giant Buddha cast in bronze some 800 years prior, but what I really like to think about is how that sweet court I played on in Yokohama bears a direct line of descent from the origins of the game.
So much work to do, so little time. But what keeps us focused as we work to make a wealth of resources available via the web? It often comes down to a willingness to collaborate and a commitment to a common vision.
Staying focused through vision and values
When Duke University Libraries embarked on our 2012-2013 website redesign, we created a vision and values statement that became a guidepost during our decision making. It worked so well for that single project that we later decided to apply it to current and future web projects. You can read the full statement on our website, but here are just a few of the key ideas:
Put users first.
Verify data and information, perpetually remove outdated or inaccurate data and content, & present relevant content at the point of need.
Strengthen our role as essential partners in research, teaching, and scholarly communication: be a center of intellectual life at Duke.
Maintain flexibility in the site to foster experimentation, risk-taking, and future innovation.
As we decide which projects to undertake, what our priorities should be, and how we should implement these projects, we often consider what aligns well with our vision and values. And when something doesn’t fit well, it’s often time to reconsider.
Teamwork, supporting and balancing one another
Vision counts, but having people who collaborate well is what really enables us to maintain focus and to take a coherent approach to our work.
A number of cross-departmental teams within Duke University Libraries consider which web-based projects we should undertake, who should implement them, when, and how. By ensuring that multiple voices are at the table, each bringing different expertise, we make use of the collective wisdom from within our staff.
The Web Experience Team (WebX) is responsible for the overall visual consistency and functional integrity of our web interfaces. It not only provides vision for our website, but actively leads or contributes to the implementation of numerous projects. Sample projects include:
The introduction of a new eBook service called Overdrive
The development of a new, Bento-style, version of our search portal to be released in August
Members of WebX are Aaron Welborn, Emily Daly, Heidi Madden, Jacquie Samples, Kate Collins, Michael Peper, Sean Aery, and Thomas Crichlow.
While we love to see the research community using our collections within our reading rooms, we understand the value in making these collections available online. The Advisory Committee for Digital Collections (ACDC) decides which collections of rare material will be published online. Members of ACDC are Andy Armacost, David Pavelich, Jeff Kosokoff, Kat Stefko, Liz Milewicz, Molly Bragg, Naomi Nelson, Valerie Gillispie, and Will Sexton.
The Digital Collections Implementation Team (DCIT) both guides and undertakes much of the work needed to digitize and publish our unique online collections. Popular collections DCIT has published include:
Members of DCIT are Erin Hammeke, Mike Adamo, Molly Bragg, Noah Huffman, Sean Aery, and Will Sexton.
These groups have their individual responsibilities, but they also work well together. The teamwork extends beyond these groups as each relies on individuals and departments throughout Duke Libraries and beyond to ensure the success of our projects.
Most importantly, it helps that we like to work together, we value each other’s viewpoints, and we remain connected to a common vision.
A unified search results page, commonly referred to as the “Bento Box” approach, has been an increasingly popular method to display search results on library websites. This method helps users gain quick access to a limited result set across a variety of information scopes while providing links to the various silos for the full results. NCSU’s QuickSearch implementation has been in place since 2005 and has been extremely influential on the approach taken by other institutions.
Way back in December of 2012, the DUL began investigating and planning for implementing a Bento search results layout on our website. Extensive testing revealed that users favor searching from a single box, as is their typical experience conducting web searches via Google and the like. Like many libraries, we’ve been using Summon as a unified discovery layer for articles, books, and other resources for a few years, providing an ‘All’ tab on our homepage as the entry point. Summon aggregates these various sources into a common index, presented in a single stream on search results pages. Our users often find this presentation overwhelming or confusing and prefer other search tools. As such, we’ve demoted our ‘All’ search on our homepage, although users can set it as the default thanks to the very slick Default Scope search tool built by Sean Aery (with inspiration from the University of Notre Dame’s Hesburgh Libraries website):
The library’s Web Experience Team (WebX) proposed the Bento project in September of 2013. Some justifications for the proposal were as follows:
Bento boxing helps solve these problems:
We won’t have to choose which silo should be our default search scope (in our homepage or masthead)
Synthesizing relevance ranking across very different resources is extremely challenging, e.g., articles get in the way of books if you’re just looking for books (and vice-versa).
We need to move from “full collection discovery to full library discovery” – in the same search, users discover expertise, guides/experts, other library provisions alongside items from the collections. 1
“A single search box communicates confidence to users that our search tools can meet their information needs from a single point of entry.” 2
Sean also developed this mockup of what Bento results could look like on our website and we’ve been using it as the model for our project going forward:
For the past month our Bento project team has been actively developing our own implementation. We have had the great luxury of building upon work that was already done by brilliant developers at our sister institutions (NCSU and UNC) — and particular thanks goes out to Tim Shearer at UNC Libraries who provided us with the code that they are using on their Bento results page, which in turn was heavily influenced by the work done at NCSU Libraries.
Our approach includes using results from Summon, Endeca, Springshare, and Google. We’re building this as a Drupal module which will make it easy to integrate into our site. We’re also hosting the code on GitHub so others can gain from what we’ve learned — and to help make our future enhancements to the module even easier to implement.
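For readers curious about the general shape of a Bento results page, here is a rough sketch of the pattern: send one query to several sources, keep only a handful of results from each, and never let one slow or broken source spoil the page. It is written in Python purely for illustration (our module is Drupal-based), and every function and source name below is a hypothetical stand-in, not the real Summon, Endeca, Springshare, or Google APIs.

```python
from concurrent.futures import ThreadPoolExecutor

def bento_search(query, sources, per_box=3):
    """Query every source in parallel and return {source_name: top results}.

    `sources` maps a box label to any callable that takes a query string and
    returns a list of results. A source that raises yields an empty box.
    """
    def run(item):
        name, search = item
        try:
            return name, search(query)[:per_box]
        except Exception:
            return name, []  # a failing silo should not break the whole page

    with ThreadPoolExecutor(max_workers=len(sources) or 1) as pool:
        # pool.map preserves the order of `sources`, so boxes stay in place.
        return dict(pool.map(run, sources.items()))

# Stand-in search functions; real ones would call each service's own API.
def fake_articles(q):
    return [f"Article about {q} #{i}" for i in range(1, 11)]

def fake_catalog(q):
    return [f"Book about {q} #{i}" for i in range(1, 6)]

def broken_guides(q):
    raise TimeoutError("guides service did not respond")

boxes = bento_search("tape splicing", {
    "Articles": fake_articles,
    "Books & Media": fake_catalog,
    "Research Guides": broken_guides,
})
for label, results in boxes.items():
    print(label, results)
```

Each box would then link out to the full result set in its own silo, which is what lets the page stay short.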
Our plan is to roll out Bento search in August, so stay tuned for the official launch announcement!
PS — as the 4th of July holiday is right around the corner, here are some interesting items from our digital collections related to independence day:
Fifty years ago, hundreds of student volunteers headed south to join the Student Nonviolent Coordinating Committee’s (SNCC) field staff and local people in their fight against white supremacy in Mississippi. This week, veterans of Freedom Summer are gathering at Tougaloo College, just north of Jackson, Mississippi, to commemorate their efforts to remake American democracy.
The 50th anniversary events, however, aren’t only for movement veterans. Students, young organizers, educators, historians, archivists, and local Mississippians make up the nearly one thousand people flocking to Tougaloo’s campus this Wednesday through Saturday. We here at Duke Libraries, as well as members of the SNCC Legacy Project Editorial Board, are in the mix, making connections with both activists and archivists about our forthcoming website, One Person, One Vote: The Legacy of SNCC and the Fight for Voting Rights.
This site will bring together material created in and around SNCC’s struggle for voting rights in the 1960s and pair it with new interpretations of that history by the movement veterans themselves. To pull this off, we’ll be drawing on Duke’s own collection of SNCC-related material, as well as incorporating the wealth of material already digitized by institutions like the University of Southern Mississippi, the Wisconsin Historical Society’s Freedom Summer Collection, the Mississippi Department of Archives and History, as well as others.
What becomes clear while circling through the panels, films, and hallway conversations at Freedom Summer 50th events is how the fight for voting rights is really a story of thousands of local people. The One Person, One Vote site will feature these everyday people – Mississippians like Peggy Jean Connor, Fannie Lou Hamer, Vernon Dahmer, and SNCC workers like Hollis Watkins, Bob Moses, and Charlie Cobb. And the list goes on. It’s not every day that so many of these people come together under one roof, and we’re doing our share of listening to and connecting with the people whose stories will make up the One Person, One Vote site.
Many of us here at Duke have been excited about the Digital Public Library of America (DPLA) since their launch in April of 2013. DPLA’s mission is to bring together America’s cultural riches into one portal. Additionally, they provide a platform for accessing and sharing library data in technologically innovative and impactful ways via the DPLA API. If you are not familiar with DPLA, be sure to take a look at their website and watch their introductory video.
The North Carolina Digital Heritage Center (NCDHC) is our local service hub for DPLA and we met with them to understand requirements for contributing metadata as well as the nuts and bolts of exposing our records for harvesting. They have a system in place that is really easy for contributing libraries around the state, and we are very thankful for their efforts. On our side, we chose our first collection to share, updated rights statements for the items in that collection, and contacted NCDHC to let them know where to find our metadata (admittedly these tasks involved a bit more nitty gritty work than I am describing here, but it was still a relatively simple process).
In mid-June, NCDHC harvested metadata from our Broadsides and Ephemera digital collection and shortly thereafter, voilà, the records are available through DPLA!!
We plan to continue making more collections available to DPLA, but are still selecting materials. What collections do you think we should share? Let us know in the comments below or through Twitter or Facebook.
Thanks again to NCDHC for the wonderful work they do in helping us and other libraries across North Carolina participate in the ambitious mission of the Digital Public Library of America!
The technology for digitizing analog videotape is continually evolving. Thanks to increases in data transfer rates and hard drive write speeds, as well as the availability of more powerful computer processors at cheaper price points, the Digital Production Center recently decided to upgrade its video digitization system. Funding for the improved technology was procured by Winston Atkins, Duke Libraries’ Preservation Officer. Of all the materials we work with in the Digital Production Center, analog videotape has one of the shortest lifespans. Thus, it is high on the list of the Library’s priorities for long-term digital preservation. Thanks, Winston!
Because of their innovative design, ease of use, and dominance within the video and filmmaking communities, we decided to go with a combination of products designed by Apple Inc. and Blackmagic Design. A new computer hardware interface recently adopted by Apple and Blackmagic, called Thunderbolt, allows the two companies’ products to work seamlessly together at an unprecedented data-transfer speed of 10 Gigabits per second, per channel. This is much faster than previously available interfaces such as Firewire and USB. Because video content incorporates an enormous amount of data, the improved data-transfer speed allows the computer to capture the video signal in real time, without interruption or dropped frames.
Our new data stream works as follows. Once a tape is playing on an analog videotape deck, the output signal travels through an Analog to SDI (serial digital interface) converter. This converts the content from analog to digital. Next, the digital signal travels via SDI cable through a Blackmagic SmartScope monitor, which allows for monitoring via waveform and vectorscope readouts. A veteran television engineer I know will talk to you for days regarding the physics of this, but, in layperson terms, these readouts let you verify the integrity of the color signal, and make sure your video levels are not too high (blown-out highlights) or too low (crushed shadows). If there is a problem, adjustments can be made via an analog video signal processor or time-base corrector to bring the video signal within acceptable limits.
Next, the video content travels via SDI cable to a Blackmagic Ultrastudio interface, which converts the signal from SDI to Thunderbolt, so it can now be recognized by a computer. The content then travels via Thunderbolt cable to a 27″ Apple iMac utilizing a 3.5 GHz Quad-core processor and NVIDIA GeForce graphics processor. Blackmagic’s Media Express software writes the data, via Thunderbolt cable, to a G-Drive Pro external storage system as a 10-bit, uncompressed preservation master file. After capture, editing can be done using Apple’s Final Cut Pro or QuickTime Pro. Compressed MP4 access derivatives are then batch-processed using Apple’s Compressor software, or other utilities such as MPEG Streamclip. Finally, the preservation master files are uploaded to Duke’s servers for long-term storage. Unless there are copyright restrictions, the access derivatives will be published online.
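To give a feel for why transfer and write speeds matter, here is a back-of-the-envelope calculation in Python. The frame size, frame rate, and chroma sampling below are assumptions for the sake of the arithmetic (a standard-definition NTSC frame of 720×486 at about 29.97 frames per second with 4:2:2 sampling); the post does not state the actual capture settings, and real file formats add some padding and audio on top.

```python
def uncompressed_bitrate(width, height, fps, bit_depth=10, samples_per_pixel=2):
    """Approximate video bits per second for uncompressed capture.

    samples_per_pixel=2 models 4:2:2 sampling: one luma sample per pixel plus,
    on average, one chroma sample per pixel. All defaults are assumptions.
    """
    return width * height * fps * bit_depth * samples_per_pixel

def gigabytes_per_hour(bits_per_second):
    """Convert a bit rate to decimal gigabytes of storage per hour."""
    return bits_per_second * 3600 / 8 / 1e9

THUNDERBOLT_BPS = 10e9  # 10 Gigabits per second, per channel (from the post)

rate = uncompressed_bitrate(720, 486, 30000 / 1001)  # assumed NTSC SD frame
print(f"{rate / 1e6:.1f} Mbit/s")
print(f"{gigabytes_per_hour(rate):.1f} GB per hour of tape")
print(f"{rate / THUNDERBOLT_BPS:.1%} of one Thunderbolt channel")
```

Under these assumptions a single stream uses only a small fraction of the Thunderbolt channel, leaving plenty of headroom for sustained real-time capture, while the storage needed per hour of tape shows why the compressed access derivatives are made separately.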
Thanks for all you do throughout the year to make our lives better, brighter, and a bit more fun! From teaching us to fish to helping us move, fathers and father-figures have always been there to help children learn, grow and achieve. While parenting roles and identities continue to evolve, the love of family persists. So, this Father’s Day here is a Digital Collections salute to dads everywhere!
This past week, we were excited to be able to publish a rare 1804 manuscript copy of the Haitian Declaration of Independence in our digital collections website. We used the project as a catalyst for improving our document-viewing user experience, since we knew our existing platforms just wouldn’t cut it for this particular treasure from the Rubenstein Library collection. In order to present the declaration online, we decided to implement the open-source Diva.js viewer. We’re happy with the results so far and look forward to making more strides in our ability to represent documents in our site as the year progresses.
Challenges to Address
We have had two glaring limitations in providing access to digitized collections to date: 1) a less-than-stellar zoom & pan feature for images and 2) a suboptimal experience for navigating documents with multiple pages. For zooming and panning (see example), we use software called OpenLayers, which is primarily a mapping application. And for paginated items we’ve used two plugins designed to showcase image galleries, Galleria (example) and Colorbox (example). These tools are all pretty good at what they do, but we’ve been using them more as stopgap solutions for things they weren’t really created to do in the first place. As the old saying goes, when all you have is a hammer, everything looks like a nail.
Big (OR Zoom-Dependent) Things
Traditionally as we digitize images, whether freestanding or components of a multi-page object, at the end of the process we generate three JPG derivatives per page. We make a thumbnail (helpful in search results or other item sets), medium image (what you see on an item’s webpage), and large image (same dimensions as the preservation master, viewed via the ‘all sizes’ link). That’s a common approach, but there are several places where that doesn’t always work so well. Some things we’ve digitized are big, as in “shoot them in sections with a camera and stitch the images together” big. And we’ve got several more materials like this waiting in the wings to make available. A medium image doesn’t always do these things justice, but good luck downloading and navigating a giant 28MB JPG when all you want to do is zoom in a little bit.
Likewise, an object doesn’t have to be large to really need easy zooming to be part of the viewing experience. You might want to read the fine print on that newspaper ad, see the surgeon general’s warning on that billboard, or inspect the brushstrokes in that beautiful hand-painted glass lantern slide.
And finally, it’s not easy to anticipate the exact dimensions at which all our images will be useful to a person or program using them. Using our data to power an interactive display for a media wall? A mobile app? A slideshow on the web? You’ll probably want images that are different dimensions than what we’ve stored online. But to date, we haven’t been able to provide ways to specify different parameters (like height, width, and rotation angle) in the image URLs to help people use our images in environments beyond our website.
We do love our documentary photography collections, but a lot of our digitized objects are represented by more than just a single image. Take an 11-page piece of sheet music or a 127-page diary, for example. Those aren’t just sequences or collections of images. Their paginated orientation is pretty essential to their representation online, but a lot of what characterizes those materials is unfortunately lost in translation when we use gallery tools to display them.
The Intersection of (Big OR Zoom-Dependent) AND Paginated
Here’s where things get interesting and quite a bit more complicated: when zooming, panning, page navigation, and system performance are all essential to interacting with a digital object. There are several tools out there that support these various aspects, but very few that do them all AND do them well. We knew we needed something that did.
Our Solution: Diva.js
Setting up Diva.js required us to add a few new pieces to our infrastructure. The most significant was an image server (in our case, IIPImage) that could 1) deliver parts of a digital image upon request, and 2) deliver complete images at whatever size is requested via URL parameters.
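As a rough illustration of those two kinds of requests, the Python sketch below builds IIP-style URLs. The server address and image path are hypothetical placeholders. The parameter names (FIF for the file, JTL for a JPEG tile, WID/HEI for size, ROT for rotation, CVT for export format) are taken from the Internet Imaging Protocol as implemented by IIPImage, but check the IIPImage documentation for the options your server version supports.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; a real installation will have its own address.
IIP_BASE = "https://images.example.edu/iipsrv/iipsrv.fcgi"

def tile_url(image_path, level, tile_index):
    """URL for one JPEG tile at a given resolution level."""
    params = {"FIF": image_path, "JTL": f"{level},{tile_index}"}
    return IIP_BASE + "?" + urlencode(params, safe="/,")

def sized_url(image_path, width=None, height=None, rotation=None):
    """URL for a complete JPEG at a requested size (and optional rotation)."""
    params = {"FIF": image_path}
    if width:
        params["WID"] = width
    if height:
        params["HEI"] = height
    if rotation:
        params["ROT"] = rotation
    params["CVT"] = "jpeg"  # the export command goes last
    return IIP_BASE + "?" + urlencode(params, safe="/,")

print(tile_url("declaration/page1.tif", 5, 0))
print(sized_url("declaration/page1.tif", width=800))
print(sized_url("declaration/page1.tif", height=600, rotation=90))
```

The first kind of URL is what a tiled viewer asks for as you zoom and pan; the second is the kind of URL-driven sizing that lets other applications request images at the dimensions they need.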
Our Interface: How it Works
By default, we present a document in our usual item page template that provides branding, context, and metadata. You can scroll up and down to navigate pages, use Page Up or Page Down keys, or enter a page number to jump to a page directly. There’s a slider to zoom in or out, or alternatively you can double-click to zoom in / Ctrl-double-click to zoom out. You can toggle to a grid view of all pages and adjust how many pages to view at once in the grid. There’s a really handy full-screen option, too.
It’s optimized for performance via AJAX-driven “lazy loading”: only the page of the document that you’re currently viewing has to load in your browser, and likewise only the visible part of that page image in the viewer must load (via square tiles). You can also download a complete JPG for a page at the current resolution by clicking the grey arrow.
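The idea behind loading only the visible tiles can be sketched in a few lines. This is not Diva.js's actual code, just an illustration of the general technique: given the rectangle of the page that is currently on screen, work out which square tiles overlap it. The function name, the 256-pixel default tile size, and the page dimensions in the example are all assumptions.

```python
def visible_tiles(viewport_x, viewport_y, viewport_w, viewport_h,
                  image_w, image_h, tile_size=256):
    """Return (column, row) pairs for every tile that overlaps the viewport.

    All coordinates are in pixels of the page image at the current zoom level.
    """
    # Clip the viewport to the bounds of the image.
    left = max(0, viewport_x)
    top = max(0, viewport_y)
    right = min(image_w, viewport_x + viewport_w)
    bottom = min(image_h, viewport_y + viewport_h)
    if right <= left or bottom <= top:
        return []  # The viewport does not overlap the image at all.

    first_col, last_col = left // tile_size, (right - 1) // tile_size
    first_row, last_row = top // tile_size, (bottom - 1) // tile_size
    return [(col, row)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]


# An 800x600 window onto a hypothetical 5500x7800 page image.
tiles = visible_tiles(300, 300, 800, 600, 5500, 7800)
print(len(tiles))  # 12
```

In this made-up example the browser would need 12 tiles rather than every tile in the page image, which is where the performance gain comes from.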
We extended Diva.js by building a synchronized fulltext pane that displays the transcript of the current page alongside the image (and beneath it in full-screen view). That doesn’t come out-of-the-box, but Diva.js provides some useful hooks into its various functions to enable developing this sort of thing. We also slightly modified the styles.
Behind the scenes, we have pyramid TIFF images (one for each page), served up as JPGs by the IIPImage server. Each file comprises an array of 256×256 JPG tiles for each available zoom level of the image. Let’s take page 1 of the declaration, for example. At zoom level 0 (all the way zoomed out), there’s only one image tile, because the whole page fits within 256×256 pixels. Level 1 has 4 tiles, level 2 has 12, level 3 has 48, and level 4 has 176. The page image at level 5 (all the way zoomed in) includes 682 tiles, which sounds like a lot, but the server only has to deliver the parts that you’re currently viewing.
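Those tile counts follow from simple arithmetic: each step down the pyramid halves the width and height, and each level needs enough 256×256 tiles to cover what is left. Here is a rough sketch of that calculation. The 5500×7800 pixel dimensions are hypothetical, chosen only because they reproduce the counts above, and a real pyramid may round its level dimensions slightly differently.

```python
import math

TILE_SIZE = 256


def tiles_per_level(full_width, full_height, max_zoom):
    """Count the tiles needed at each zoom level, from 0 (zoomed out)
    to max_zoom (full resolution), halving the image at each step down."""
    counts = []
    for level in range(max_zoom + 1):
        halvings = max_zoom - level
        width = max(1, full_width >> halvings)
        height = max(1, full_height >> halvings)
        columns = math.ceil(width / TILE_SIZE)
        rows = math.ceil(height / TILE_SIZE)
        counts.append(columns * rows)
    return counts


# Hypothetical full-resolution page size (an assumption, not the real scan).
print(tiles_per_level(5500, 7800, 5))  # [1, 4, 12, 48, 176, 682]
```

At full resolution that works out to 22 columns by 31 rows of tiles, and each level below it needs roughly a quarter as many.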
Every item using Diva.js also needs to load a JSON stream including the dimensions for each page within the document, so we had to generate that data. If there’s a transcript present, we store it as a single HTML file, then use AJAX to dynamically pull in the part of that file that corresponds to the currently-viewed page in the document.
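As a rough illustration of the kind of per-document data involved, the sketch below assembles page dimensions into a JSON document. The field names and structure are hypothetical and do not reflect the exact format Diva.js expects. The dimensions are passed in directly, where a real script would read them from the image files.

```python
import json


def build_page_manifest(item_id, pages):
    """Assemble per-page dimensions into a JSON-ready dictionary.

    `pages` is a list of (filename, width, height) tuples.
    The keys used here are illustrative, not Diva.js's own schema.
    """
    return {
        "item": item_id,
        "page_count": len(pages),
        "pages": [
            {"file": filename, "width": width, "height": height}
            for filename, width, height in pages
        ],
    }


# Made-up item and page sizes, for illustration only.
manifest = build_page_manifest(
    "example-item",
    [("page001.tif", 5500, 7800), ("page002.tif", 5480, 7790)],
)
print(json.dumps(manifest, indent=2))
```

Generating this once per item and storing the result means the viewer can lay out every page before any image data has loaded.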
Diva.js & IIPImage Limitations
It’s a good interface, and is the best document representation we’ve been able to provide to date. Yet it’s far from perfect. There are several areas that are limiting or that we want to explore more as we look to make more documents available in the future.
Out of the box, Diva.js doesn’t support page metadata, transcriptions, or search & retrieval within a document. We do display a synchronized transcript, but there’s currently no mapping between the text and the location within each page where each word appears, nor can you perform a search and discover which pages contain a given keyword. Other folks using Diva.js are working on robust applications that handle these kinds of interactions, but the degree to which they must customize the application is high. See, for example, the Salzinnes Antiphonal, a 485-page liturgical manuscript with text and music, or a prototype for the Liber Usualis, a 2,000+ page manuscript that uses optical music recognition to encode melodic fragments.
Diva.js also has discrete zooming, which can feel a little jarring when you jump between zoom levels. It’s not the smooth, continuous zoom experience that is becoming more commonplace in other viewers.
With the IIPImage server, we’ll likely re-evaluate using pyramid TIFFs vs. JPEG2000s to see which file format works best for our digitization and publication workflow. In either case, there are several compression and caching variables to tinker with to find an ideal balance between image quality, storage space required, and system performance. We also discovered that the IIPImage server unfortunately strips out the images’ ICC color profiles when it delivers JPGs, so users may not be getting a true-to-form representation of the image colors we captured during digitization.
Launching our first project using Diva.js gives us a solid jumping-off point for expanding our ability to provide useful, compelling representations of our digitized documents online. We’ll assess how well this same approach would scale to other potential projects and in the meantime keep an eye on the landscape to see how things evolve. We’re better equipped now than ever to investigate alternative approaches and complementary tools for doing this work.
We’ll also engage more closely with our esteemed colleagues in the Duke Collaboratory for Classics Computing (DC3), who are at the forefront of building tools and services in support of digital scholarship. Well beyond supporting discovery and access to documents, their work enables a community of scholars to collaboratively transcribe and annotate items (an incredible–and incredibly useful–feat!). There’s a lot we’re eager to learn as we look ahead.
Notes from the Duke University Libraries Digital Projects Team