Behind the Scenes, Collections, Digital Collections, Digitization Expertise, Equipment

Digital Projects and Production Services’ “Best Of” List, 2015

December 11, 2015 Molly Bragg

It’s that time of year when all the year-end “best of” lists come out: best music, movies, books, etc. Well, we could not resist following suit this year, so… Ladies and gentlemen, I give you – in no particular order – the 2015 best of list for the Digital Projects and Production Services department (DPPS).

Metadata Architect
In 2015, DPPS welcomed a new staff member to our team; Maggie Dickson came on board as our metadata architect! She is already leading a team to whip our digital collections metadata into shape, and is actively consulting with the digital repository team and others around the library. Bringing metadata expertise into the DPPS portfolio ensures that collections are as discoverable, shareable, and re-purposable as possible.

King Intern for Digital Collections
DPPS started the year with two large University Archives projects on our plates: the ongoing Duke University Chronicle digitization and a grant to digitize hundreds of Chapel recordings. Thankfully, University Archives allocated funding for us to hire an intern, and what a fabulous intern we found in Jessica Serrao (the proof is in her wonderful blogposts). The internship has been an unqualified success, and we hope to be able to repeat such a collaboration with other units around the library.

Tripod 3
Our digital project developers have spent much of the year developing the new Tripod3 interface for the Duke Digital Repository. This process has been an excellent opportunity for cross-departmental collaborative application development and implementing Agile methodology with sprints, scrums, and stand-up meetings galore! We launched our first collection on the new platform in October and we will have a second one out the door before the end of this year. We plan on building on this success in 2016 as we migrate existing collections over to Tripod3.

Repository ingest planning
Speaking of Tripod3 and the Duke Digital Repository, we have been ingesting digital collections into the Duke Digital Repository since 2014. However, we have a plan to kick ingests up a notch (or 5). Although the real work will happen in 2016, the planning has been a long time coming and we are all very excited to be at this phase of the Tripod3 / repository process (even if it will be a lot of work). Stay tuned!

Digital Collections Promotional Card
This is admittedly a small achievement, but it is one that has been on my to-do list for 2 years so it actually feels like a pretty big deal. In 2015, we designed a 5 x 7 postcard to hand out during Digital Production Center (DPC) tours, at conferences, and to any visitors to the library. Also, I just really love to see my UNC fan colleagues cringe every time they turn the card over and see Coach K’s face. It’s really the little things that make our work fun.

New Exhibits Website
In anticipation of the opening of new exhibit spaces in the renovated Rubenstein Library, DPPS collaborated with the exhibits coordinator to create a brand new library exhibits webpage. This is your one-stop shop for all library exhibits information in all its well-designed glory.

Aggressive cassette rehousing procedures

Audio and Video Preservation
In 2014, the Digital Production Center bolstered workflows for preservation-based digitization. Unlike our digital collections projects, these preservation digitization efforts do not have a publication outcome so they often go unnoticed. Over the past year, we have quietly digitized around 400 audio cassettes in-house (this doesn’t count outsourced Chapel Recordings digitization), some of which need to be dramatically re-housed.

On the video side, efforts have been sidelined by digital preservation storage costs. However, some behind-the-scenes planning is in the works, which means we should be able to do more next year. Also, we were able to purchase a U-matic tape cleaner this year, which, while it doesn’t sound very glamorous to the rest of the world, thrills us to no end.

Revisiting the William Gedney Digital Collection
Fans of Duke Digital Collections are familiar with the current Gedney Digital Collection. Both the physical and digital collection have long needed an update. So in recent years, the physical collection has been reprocessed, and this fall we started an effort to digitize more materials in the collection, and to higher standards than were practical in the late 1990s.

DPC’s new work room

Expanding DPC
When the Rubenstein Library re-opened, our neighbor moved into the new building, and the DPC got to expand into his office! The extra breathing room means more space for our specialists and our equipment, which is not only more comfortable but also better for our digitization practices. The two spaces are separate for now, but we are hoping to be able to combine them in the next year or two.

2015 was a great year in DPPS, and there are many more accomplishments we could add to this list. One of our team mottos is: “great productivity and collaboration, business as usual”. We look forward to more of the same in 2016!

Conferences, Technology

Star Wars: The Fans Strike Back

December 3, 2015 Alex Marsh

At the recent Association of Moving Image Archivists conference in Portland, Oregon, I saw a lot of great presentations related to film and video preservation. As a Star Wars fan, I found one session particularly interesting. It was presented by Jimi Jones, a doctoral student at the University of Illinois at Urbana-Champaign, and is the result of his research into the world of fan edits.

This is a fairly modern phenomenon, whereby fans of a particular film, music recording or television show, often frustrated by the unavailability of that work on modern media, take it upon themselves to make it available, irrespective of copyright and/or the original creator’s wishes. Some fan edits appropriate the work, and alter it significantly, to make their own unique version. Neither Jimi Jones nor AMIA is advocating for fan edits, but merely exploring the sociological and technological implications they may have in the world of film and video digitization and preservation.

An example is the original 1977 theatrical release of “Star Wars” (later retitled Star Wars Episode IV: A New Hope), a movie I spent my entire 1977 summer allowance on as a child, because I was so awestruck that I went back to my local theater to see it again and again. The version that I saw then, free of more recently superimposed CGI elements like Jabba the Hutt, and in which Han Solo shoots Greedo in the Mos Eisley Cantina before Greedo can shoot Solo, is not commercially available today on any modern high-definition media such as Blu-ray Disc or HD streaming.

The last time most fans saw the original, unaltered Star Wars Trilogy, it was likely on VHS tape (as shown above). George Lucas, the creator of Star Wars, insists that his more recent “Special Editions” of the Star Wars Trilogy, with the added CGI and the more politically-correct, less trigger-happy Han Solo, are the “definitive” versions. Thus Lucas has refused to allow any other version to be legally distributed for at least the past decade. Many Star Wars fans, however, find this unacceptable, and they are striking back.

Armed with sophisticated video digitization and editing software, a network of Star Wars fans has collaborated to create “Star Wars: Despecialized Edition,” a composite of assorted pre-existing elements that accurately presents the 1977-1983 theatrical versions of the original Star Wars Trilogy in high definition for the first time. The project is led by an English teacher in the Czech Republic, who goes by the name of “Harmy” online and is referred to as a “guerrilla restorationist.” Using BitTorrent and other peer-to-peer networks, fans can now download “Despecialized,” burn it to Blu-ray, print out high-quality cover art, and watch it on their modern widescreen TV sets in high definition.

The fans, rightly or wrongly, claim these are the versions of the films they grew up with, and they have a right to see them, regardless of what George Lucas thinks. Personally, I never liked the changes Lucas later made to the original trilogy, and I agree that “Han Shot First,” or to paraphrase Johnny Cash, “I shot a man named Greedo, just to watch him die.” We all know Greedo was a scumbag who was about to kill Solo anyway, so Han’s preemptive shot in the original Star Wars makes perfect sense. I’m not endorsing piracy, but, as a fan, I certainly understand the pent-up demand for “Star Wars: Despecialized Edition.”

The psychology of nostalgia is interesting, particularly when fans desire something so intensely, they will go to great lengths, technologically, and otherwise, to satiate that need. Absence makes the heart, or fan, grow stronger. This is not unique to Star Wars. For instance, Neil Young, one of the best songwriters of his generation, released a major-label record in 1973 called “Time Fades Away,” which, to this day, has never been released on compact disc.

The album, recorded on tour while his biggest hit single, “Heart of Gold,” was topping the charts, is an abrupt shift in mood and approach, and the beginning of a darker, more desolate string of albums that fans refer to as “The Ditch Trilogy.” Regarding this period, Neil said: “Heart of Gold put me in the middle of the road. Traveling there soon became a bore, so I headed for the ditch. A rougher ride but I saw more interesting people there.” Many fans, myself included, regard the three records that comprise the ditch trilogy as his greatest achievement, due to their brutal honesty, and Neil’s absolute refusal to play it safe by coasting on his recent mainstream success. But for Neil, Time Fades Away brings up so many bad memories, particularly regarding the death of his guitarist, Danny Whitten, that he has long refused to release it on CD.

In 2005, Neil Young fans began gathering at least 14,000 petition signatures to get the album released on compact disc, but that yielded no results. So many took it upon themselves, using modern technology, to meticulously transfer mint-condition vinyl copies of “Time Fades Away” from their turntable to desktop computer using widely available professional audio software, and then burn the album to CD. Fans also scanned the original cover art from the vinyl record, and made compact disc covers and labels that closely approximate what it would look like if the CD had been officially released.

Other fans, using peer-to-peer networks, were able to locate a digital “test pressing” of the audio for a future CD release that was nixed by Neil before it went into production. Combining that test pressing audio, free of vinyl static, with professional artwork, the fans were essentially able to produce what Neil refused to allow: a pristine-sounding and professional-looking version of Time Fades Away on compact disc. Perhaps in response, Neil has, just in the last year, allowed Time Fades Away to be released in digital form via his high-resolution 192 kHz/24-bit music service, Pono Music.

It’s clear that the main intent of the fans of Star Wars, Time Fades Away and other works of art is not to profit off their hybrid creations, or to anger the original creators. It’s merely to finally have access to what they are so nostalgic about. Ironically, if it wasn’t for the unavailability of these works, a lot of this community, creativity, software mastery and “guerrilla restoration” would not be taking place. There’s something about the fact that certain works are missing from the marketplace, that makes fans hunger for them, talk about them, obsess about them, and then find creative ways of acquiring or reproducing them.

This is the same impulse that fuels the fire of toy collectors, book collectors, garage-sale hunters and eBay bidders. It’s this feeling that you had something, or experienced something magical when you were younger, and no one has the right to alter it, or take access to it away from you, not even the person who created it. If you can just find it again, watch it, listen to it and hold it in your hands, you can recapture that youthful feeling, share it with others, and protect the work from oblivion. It seems like just yesterday that I was watching Han Solo shoot Greedo first on the big screen, but that was almost 40 years ago. “’Cause you know how time fades away.”

Behind the Scenes, Digital Collections, Technology, User Experience

Zoomable Hi-Res Images: Hopping Aboard the OpenSeadragon Bandwagon

November 20, 2015 Sean Aery

Our new W. Duke & Sons digital collection (released a month ago) stands as an important milestone for us: our first collection constructed in the (Hydra-based) Duke Digital Repository, which is built on a suite of community-built open source software. Among that software is a remarkable image viewer tool called OpenSeadragon. Its website describes it as:

“an open-source, web-based viewer for high-resolution zoomable images, implemented in pure Javascript, for desktop and mobile.”

OpenSeadragon viewer in action on W. Duke & Sons collection.

OpenSeadragon zoomed in, W. Duke & Sons collection.

In concert with tiled digital images (we use Pyramid TIFFs), an image server (IIPImage), and a standard image data model (IIIF: International Image Interoperability Framework), OpenSeadragon considerably elevates the experience of viewing our image collections online. Its greatest virtues include:

smooth, continuous zooming and panning for high-resolution images
open source, built on web standards
extensible and well-documented

We can’t wait to share more of our image collections in the new platform.

OpenSeadragon Examples Elsewhere

Arthur C. Clarke’s Third Law states, “Any sufficiently advanced technology is indistinguishable from magic.” And looking at high-res images in OpenSeadragon feels pretty darn magical. Here are some of my favorite implementations from places that inspired us to use it:

The Metropolitan Museum of Art. Zooming in close on this van Gogh self-portrait gives you a means to inspect the intense brushstrokes and texture of the canvas in a way that you couldn’t otherwise experience, even by visiting the museum in-person.
Self-Portrait with a Straw Hat (obverse: The Potato Peeler). Vincent van Gogh, 1887.
Chronicling America: Historic American Newspapers (Library of Congress). For instance, zoom to read in the July 21, 1871 issue of “The Sun” (New York City) about my great-great-grandfather George Aery being crowned the Schuetzen King, or sharpshooting champion, at a popular annual festival of marksmen.

The sun. (New York [N.Y.]), 21 July 1871. Chronicling America: Historic American Newspapers. Lib. of Congress.
Other GLAMs. See these other nice examples from The National Gallery of Art, The Smithsonian National Museum of American History, NYPL Digital Collections, and Digital Public Library of America (DPLA).

OpenSeadragon’s Microsoft Origins

The software began with a company called Sand Codex, founded in Princeton, NJ in 2003. By 2005, the company had moved to Seattle and changed its name to Seadragon Software. Microsoft acquired the company in 2006 and positioned Seadragon within Microsoft Live Labs.

In March 2007, Seadragon founder Blaise Agüera y Arcas gave a TED Talk where he showcased the power of continuous multi-resolution deep-zooming for applications built on Seadragon. In the months that followed, we held a well-attended staff event at Duke Libraries to watch the talk. There was a lot of ooh-ing and aah-ing. Indeed, it looked like magic. But while it did foretell a real future for our image collections, at the time it felt unattainable and impractical for our needs. It was a Microsoft thing. It required special software to view. It wasn’t going to happen here, not when we were making a commitment to move away from proprietary platforms and plugins.

Sometime in 2008, Microsoft developed a more open JavaScript-based version of Seadragon called Seadragon Ajax, and by 2009 had shared it as open-source software via a New BSD license. That removed many barriers to use; however, it still required a Microsoft server-side framework and the Microsoft AJAX library. So in the years since, the software has been re-engineered to be truly open and framework-agnostic, and has thus been rebranded as OpenSeadragon. Having a technology that’s this advanced, and so useful, be so open has been an incredible boon to cultural heritage institutions and, by extension, to the patrons we serve.

Setup

OpenSeadragon’s documentation is thorough, which helped us get up and running quickly with adding and customizing features. W. Duke & Sons cards were scanned front & back, and the albums are paginated, so we knew we had to support navigation within multi-image items. These are the key features involved:

Sequence Mode. Previous/Next navigation through an image sequence.
Image Reference Strip. Clickable thumbnails.
Viewport Navigator. Small thumbnail showing current position in overall image.
IIIF Tile Sources. Provide the viewer with an array of IIP-generated IIIF info.json URLs: it does the rest.
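
As a rough illustration of how those four features fit together, here is a minimal sketch (written in Python purely for illustration; it is not our production code) that assembles the options object a page would hand to OpenSeadragon. The option names `sequenceMode`, `showReferenceStrip`, `showNavigator`, and `tileSources` are OpenSeadragon's own; the server URL, element id, and image identifiers are made-up placeholders.

```python
import json
from urllib.parse import quote

# Hypothetical base URL of an IIIF-compatible image server (an assumption,
# not the address of any real service).
IIIF_BASE = "https://images.example.org/iiif"


def info_json_urls(image_ids):
    """Return one IIIF info.json URL per image identifier, in page order."""
    # Identifiers are percent-encoded so that slashes or spaces in an id
    # cannot break the URL path.
    return [f"{IIIF_BASE}/{quote(image_id, safe='')}/info.json"
            for image_id in image_ids]


def viewer_options(element_id, image_ids):
    """Build the options to serialize as JSON and pass to OpenSeadragon()."""
    return {
        "id": element_id,            # DOM element that will hold the viewer
        "sequenceMode": True,        # previous/next navigation across images
        "showReferenceStrip": True,  # clickable thumbnail strip
        "showNavigator": True,       # small overview of the current viewport
        "tileSources": info_json_urls(image_ids),
    }


# Placeholder identifiers for the front and back of a single card.
options = viewer_options("viewer", ["card001_front", "card001_back"])
print(json.dumps(options, indent=2))
```

The printed JSON could be embedded in a page template, leaving the viewer to fetch each info.json and request tiles on its own.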

Customizations

Some aspects of the interface weren’t quite as we needed them to be out-of-the-box, so we added and customized a few features.

Custom Button Binding. Created our own navigation menu to match our site’s more modern aesthetic.
Page Indicator / Jump to Page. Developed a page indicator and direct-input page jump box using the OpenSeadragon API.
Styling. Revised the look & feel with additional CSS & JavaScript.
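
The logic behind a jump-to-page box is small but easy to get wrong by one. A minimal sketch, assuming the viewer numbers pages from zero while readers count from one (the function names here are hypothetical, and Python is used only for illustration):

```python
def page_index_from_input(raw, page_count):
    """Convert 1-based text input into a 0-based page index.

    Returns None when the input is not a whole number or is out of range,
    so the caller can leave the viewer on its current page.
    """
    try:
        page_number = int(raw.strip())
    except ValueError:
        return None
    if 1 <= page_number <= page_count:
        return page_number - 1
    return None


def page_indicator(index, page_count):
    """Format the label shown next to the jump box, e.g. '3 of 12'."""
    return f"{index + 1} of {page_count}"


print(page_index_from_input(" 3 ", 12))  # valid input, index for page 3
print(page_index_from_input("13", 12))   # out of range
print(page_indicator(2, 12))
```

In the browser, the returned index would be passed to the viewer's page-change call, and the indicator refreshed whenever the viewer reports a page change.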

Future Directions: Page-Turning & IIIF

OpenSeadragon does have some limitations where we think that it alone won’t meet all our needs for image interfaces. When we have highly-structured paginated items with associated transcriptions or annotations, we’ll need to implement something a bit more complex. Mirador and Universal Viewer are two open-source page-viewer tools that are built on top of OpenSeadragon. Both projects depend on “manifests” using the IIIF Presentation API to model this additional data.
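
To give a feel for what such a manifest contains, here is a minimal sketch that builds one for a two-page item. It follows our reading of version 2 of the IIIF Presentation API (a manifest holds a sequence, which holds one canvas per page, each painted with an image); the field names should be checked against the specification before use, and every URL and dimension below is a placeholder. Python is used only for illustration.

```python
import json

PRESENTATION_CONTEXT = "http://iiif.io/api/presentation/2/context.json"
IMAGE_CONTEXT = "http://iiif.io/api/image/2/context.json"
IMAGE_PROFILE = "http://iiif.io/api/image/2/level1.json"


def build_manifest(base_uri, label, pages):
    """Return a manifest dict; `pages` lists label, width, height, service."""
    canvases = []
    for number, page in enumerate(pages, start=1):
        canvas_id = f"{base_uri}/canvas/{number}"
        canvases.append({
            "@id": canvas_id,
            "@type": "sc:Canvas",
            "label": page["label"],
            "width": page["width"],
            "height": page["height"],
            "images": [{
                "@type": "oa:Annotation",
                "motivation": "sc:painting",
                "on": canvas_id,  # the annotation paints this canvas
                "resource": {
                    "@id": f"{page['service']}/full/full/0/default.jpg",
                    "@type": "dctypes:Image",
                    "service": {
                        "@context": IMAGE_CONTEXT,
                        "@id": page["service"],
                        "profile": IMAGE_PROFILE,
                    },
                },
            }],
        })
    return {
        "@context": PRESENTATION_CONTEXT,
        "@id": f"{base_uri}/manifest",
        "@type": "sc:Manifest",
        "label": label,
        "sequences": [{"@type": "sc:Sequence", "canvases": canvases}],
    }


# Placeholder item: the front and back of one card.
manifest = build_manifest(
    "https://example.org/iiif/item1",
    "Sample card",
    [
        {"label": "Front", "width": 2000, "height": 3000,
         "service": "https://images.example.org/iiif/card001_front"},
        {"label": "Back", "width": 2000, "height": 3000,
         "service": "https://images.example.org/iiif/card001_back"},
    ],
)
print(json.dumps(manifest, indent=2))
```

Transcriptions and annotations would hang off the same canvases, which is why the page-viewer tools need this structure rather than a bare list of images.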

The Hydra Page Turner Interest Group recently produced a summary report that compares these page-viewer tools and features, and highlights strategies for creating the multi-image IIIF manifests they rely upon. Several Hydra partners are already off and running; at Duke we still have some additional research and development to do in this area.

We’ll be adding many more image collections in the coming months, including migrating all of our existing ones that predated our new platform. Exciting times lie ahead. Stay tuned.

Animated Demo

Collections, Digitization Expertise, Uncategorized

William Gedney Wants Me To Build A Darkroom

November 13, 2015 Mike Adamo

The initial thought I had for this blog post was to describe a slice of my day that revolved around the work of William Gedney. I was going to spin a tale about being on the hunt for a light meter to take lux (illuminance) readings used to help calibrate the capture environment of one of our scanners. On my search for the light meter I bumped into the new exhibit of William Gedney’s handmade books displayed in the Chappell Family Gallery in the Perkins Library. I had digitized a number of these books a few months ago and enjoyed pretty much every image in the books. One of the books on display was opened to a particular photograph. To my surprise, I had just digitized a finished print of the same image that very morning while working on a larger project to digitize all of Gedney’s finished prints, proof prints, contact sheets and other material. Once the project is complete (a year or so from now) I will have personally seen, handled and digitized over 20,000 of Gedney’s photographs. Whoa! Would I be able to recognize Gedney images whenever one presented itself just like the book in the gallery? Maybe.

Once the collection is digitized and published through Duke Digital Collections the whole world will be able to see this amazing body of work. Instead of boring you with the details of that story I thought I would just leave you with a few images from the collection. For me, many of Gedney’s photographs have a kinetic energy to them. It seems as if I can almost feel the air. My imagination may be working overtime to achieve this and the reality of what was happening when the photograph was taken may be wholly different but the fact is these photographs spin up my imagination and transport me to the moments he has captured. These photographs inspire me to dust off my enlarger and set up a darkroom.

It may take some time to complete this particular project but there are other William Gedney related projects, materials and events available at Duke.

Behind the Scenes, Digital Collections, Projects

Onwards, Outwards: Remediating Metadata for Migration to Tripod3

November 6, 2015 Maggie Dickson

Duke University Libraries has been sharing its rich resources by creating and publishing digital collections for more than 20 years (remember the Scriptorium?), and to date the digital projects team stewards more than 100 collections consisting of 191,000+ items. Over the years the technologies and practices employed to deliver this content have changed often and drastically. Two weeks ago, we announced the latest iteration of our digital collections interface with the release of Tripod3. Currently Tripod3 only features one collection – W. Duke, Sons & Co. Advertising Materials, 1880-1910 – but in early 2016 all of the digital collections currently being delivered using Tripod2 (the predecessor to Tripod3) will be migrated to the new system.

And that’s where I come in… Hello! I’m Maggie – DUL’s newly hired Metadata Architect, and I’ve been here for about a month and a half. Now that I’ve got my sea legs, I’m embarking on a project to remediate all of the Tripod2 metadata in advance of its migration to Tripod3. I’m not going it alone, though – we’ve formed a task group to guide this process as well as make recommendations for the ongoing creation and management of metadata associated with all materials in the Duke Digital Repository (and beyond).

Back in the day – metadata in the time of the Scriptorium:

Back in 1995, when the first digital collections were being created, the focus was on providing access to those collections in a standalone way, and little thought was given to cross-collection and federated searching and browsing, because the capabilities just hadn’t evolved yet. We were still pretty excited about hypertext. Metadata standards and practices for digital collections were in their nascent stages, as well, and so their application was spotty and inconsistent. This resulted in the ‘silo-ization’ of our digital collections. Now, we have a robust, consolidated preservation and access system and the capability to share our collections much more broadly through aggregators such as the Digital Public Library of America. And LINKED DATA, y’all! But the discovery and access of our resources, even in the most sophisticated of systems, is only as good as the metadata used to describe them.

Just about yesterday – a Tripod2 metadata record:

And now, Tripod 3:

Remediating all of the legacy metadata is a big job – turns out you can create a LOT of metadata over the course of 20 years. Expressed as RDF, we have more than two million statements. And inevitably, as it’s been created over many years, by many people of varying backgrounds and experience, and according to many different practices and standards, it’s a mess. So, in the coming weeks and months, we’ll be tackling each and every one of the 85 (!) fields used in the creation of DUL’s digital collections, assessing usage and mappings and doing a whole heck of a lot of data munging (thank goodness for OpenRefine). And we’ll be diving into the world of linked data and reconciling our metadata against linked open data sets wherever possible.
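
To make “assessing usage and mappings” a little more concrete, here is a minimal sketch of the kind of check involved: count how often each field is actually used, then group values under Dublin Core terms and flag anything without a mapping. The field names, mappings, and sample records are invented for illustration and are not our actual 85 fields; Python is used only as an example, and in practice much of this happens in OpenRefine.

```python
from collections import Counter

# Hypothetical legacy-field-to-Dublin-Core mapping (illustrative only).
FIELD_TO_DC = {
    "title": "dcterms:title",
    "creator": "dcterms:creator",
    "photographer": "dcterms:creator",
    "date": "dcterms:date",
    "year": "dcterms:date",
    "subject": "dcterms:subject",
}


def field_usage(records):
    """Count how many records carry a non-empty value for each field."""
    usage = Counter()
    for record in records:
        for field, value in record.items():
            if value.strip():
                usage[field] += 1
    return usage


def remap(record):
    """Group values under Dublin Core terms; also list unmapped fields."""
    mapped, unmapped = {}, []
    for field, value in record.items():
        if not value.strip():
            continue  # skip empty values entirely
        term = FIELD_TO_DC.get(field.strip().lower())
        if term is None:
            unmapped.append(field)
        else:
            mapped.setdefault(term, []).append(value.strip())
    return mapped, unmapped


# Invented sample records showing inconsistent field names and stray spaces.
records = [
    {"Title": "Card 1", "Photographer": " Unknown ", "Color": "sepia"},
    {"Title": "Card 2", "Year": "1888", "Color": ""},
]

print(field_usage(records))
for record in records:
    print(remap(record))
```

Fields that turn up in the unmapped list, or that are used only a handful of times, are the ones that need a decision: map them, merge them, or retire them.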

A visualization of the current state of Tripod2 metadata fields and Dublin Core mappings:

Once we’re finished with this project, our metadata will not only be beautiful, it will lend itself to a much more comprehensible experience for our users, as well as the ability to effectively and efficiently share our materials broadly. We’ve only just begun this work and will report on our progress periodically. Please stay tuned!

Projects

The Duke-SLP Partnership Continues with the SNCC Digital Gateway

October 27, 2015 Kaley Deal

Content production is well underway here at the SNCC Digital Gateway, a continuation of the collaboration between Duke University, the SNCC Legacy Project, and Movement scholars that created the One Person, One Vote website. Our project room is piled high with books about the Movement, our walls covered with information about source documents and citation, and our workshop sessions are rich in conversation about who SNCC was, what SNCC did, and what SNCC’s legacy is today.

Over the past few months, the project has been working to lay the digital groundwork for the website. Before beginning the conversation with design contractors about the vision for the SNCC Digital Gateway, we first had to explore some of the challenges of working with a digital platform ourselves.

Lucky for us, the library has a wealth of knowledge about web development on the third floor of Bostock.

Unlike a book, there is no straightforward beginning, middle, and end to a website, and there are limits on the amount of text that we can put on a page. So, how do we present this material in a way that keeps the user engaged? How can we have multiple access points to this content while still keeping it grounded in the larger narrative? How will the users want to approach this material, and how do we hope to steer them?

Rather than following a linear exploration of this history, the SNCC Digital Gateway emphasizes the layering of ideas, people, and places. It recognizes the importance of chronology for contextual understanding but is not driven by it. It emphasizes the need to document not only the stories of those involved in the Movement, but also how they organized, the local landscape of where they organized, and the kinds of conversations they were having. It hopes to tie the narrative of SNCC and other Movement veterans to today’s struggles, exploring history to not only understand the roots of systemic oppression but to provide tools for organizing today.

Why, Brinck, we would love to design a website that works!

We asked ourselves, how do you start to organize all of this information? Well, why not pick up an Expo marker and start drawing on the walls (if you’re in The Edge, of course)? This is, at least, what the SNCC Digital Gateway team did this past spring.

Wireframe after wireframe, we began to piece together an information architecture for the site content. With a projected scope of hundreds of discrete pages, each with written content, embedded primary source documents, and audio/visual material, it was clear that this would have to be carefully planned so that the user wouldn’t get lost or overwhelmed.

We want users to be able to engage with the site differently each time they visit – following thematic threads through SNCC’s history, understanding the political landscape before SNCC came to the scene, delving into defining moments that spurred ideological shifts in the organization, seeing what the complexity of this narrative and these relationships meant to different people. And we want to do so in a way that even a 5th grader can understand.

Hopefully our site will be a little easier to follow than this affinity model found in _Information Architecture for the World Wide Web_.

Armed with book upon book about site design and navigation, we’ve tried to find a way to break with the typical hierarchical site structure and find one that is more suited to the fluidity of our content’s dimensions. We settled on having two main entry points into the content: chronological and thematic. We will continue to produce profiles that are tied to different geographic areas and have a section that explores SNCC’s internal and external network and relationships. But these will be connected to the thematic/chronological core of the site, so that the user can easily navigate between all different categories and types of content.

Without going into the nitty gritty, we’ve pulled together the skeleton for our site (just in time for Halloween) and have begun to flesh out this conversation with a design contractor. Not only is it important for us to think about how to tell the story of SNCC, it’s important for us to think about how to present it.

Conferences

Who Are you and Why are you Here: a Duke Digital Collections Poster

October 27, 2015 Molly Bragg

This week, my colleague Will Sexton and I (as well as several other Duke folks) are attending the Digital Library Federation conference in beautiful Vancouver, British Columbia. While here, we presented a poster on our work to assess scholarly use of digital collections. Please have a look at our poster below.


If you are interested in learning more about our assessment project, check out these previous blog posts:

We will also publish a report based on our survey findings sometime in the next few months – so stay tuned!

Announcements, Digital Collections, New Collections

Today is the New Future: The Tripod3 Project and our Next-Gen UI for Digital Collections

October 22, 2015 Will Sexton

Yesterday was Back to the Future day, and the Internet had a lot of fun with it. I guess now it falls to each and every one of us, to determine whether or not today begins a new future. It’s certainly true for Duke Digital Collections.

Today we roll out – softly – the first release of Tripod3, the next-generation platform for digital collections. For now, the current version supports a single, new collection, the W. Duke, Sons & Co. Advertising Materials, 1880-1910. We’re excited about both the collection – which Noah Huffman previewed in this blog almost exactly a year ago – and the platform, which represents a major milestone in a project that began nearly a year ago.

The next few months will see a great deal more work on the project. We have new collections scheduled for December and the first quarter of 2016, we’ll gradually migrate the collections from our existing site, and we’ll be developing the features and the look of the new site in an iterative process of feedback, analysis, and implementation. Our current plan is to have nearly all of the content of Duke Digital Collections available in the new platform by the end of March, 2016.

The completion of the Tripod3 project will mean the end of life for the current-generation platform, which we call, to no one’s surprise, Tripod2. However, we have not set an exact timeline for sunsetting Tripod2. During the transitional phase, we will do everything we can to make the architecture of Duke Digital Collections transparent, and our plans clear.

After the jump, I’ll spend the rest of this post going into a little more depth about the project, but want to express my pride and gratitude to an excellent team – you know who you are – who helped us achieve this milestone.


Projects, Technology

Using Community-Built, Open-Source Software to Build a New Digital Collections Platform

October 19, 2015 Cory Lown

The Library’s Digital Projects Services department has been working with Digital Repository Services on a software project that will eventually replace our existing Digital Collections platform. There will be future posts announcing the new way of discovering and accessing Duke’s Digital Collections, but I want to use this post to reflect on the tools and practices we’ve been using to build this new application.

There are a few important differences between this not-yet-released application and our current system. One is that Digital Collections will be part of the library’s Digital Repository, which includes a much broader range of digital items and collections. The second is that, because the repository is being developed using Project Hydra, we’re using a component of the Hydra stack, Project Blacklight, as the discovery and access layer for Digital Collections.

The Blacklight Wiki explains that:

Blacklight is an open source, Ruby on Rails Engine that provides a basic discovery interface for searching an Apache Solr index, and provides search box, facet constraints, stable document urls, etc., all of which is customizable via Rails (templating) mechanisms.

The Blacklight Development Google Group has posts going back to 2009, and the GitHub repository has commits back to 2009 as well. So, the project’s been actively developed and used for a while. The Project Blacklight website maintains a list of different implementations of the software, where you can see the range of interfaces it has been used to develop.

One of the benefits of using a widely adopted open source platform is access to a community of developers who use the same software. I was able to solve many problems just by searching the Blacklight Development Google Group for answers. Blacklight made it easy to get a basic interface up and running quickly and provided a platform to add local customizations. Because the basics were already in place, we were able to spend our time on more specialized features and local requirements. For example, specifying which search filters should appear for a collection and which metadata fields should be included in search was as easy as adding a few lines of configuration code to the application.
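Blacklight's own configuration is written in Ruby and is not reproduced here. As a rough, language-neutral illustration of what such configuration ultimately controls, the Python sketch below builds the kind of Apache Solr request parameters that correspond to a list of facet (filter) fields and a list of searchable fields. The field names, boosts, and query are hypothetical, and the function is an assumption for illustration rather than anything from the project's code.

```python
from urllib.parse import urlencode

# Hypothetical Solr field names; the real index schema is not given in the post.
FACET_FIELDS = ["collection_facet", "year_facet", "subject_facet"]
SEARCH_FIELDS = {"title_text": 5, "description_text": 2, "subject_text": 1}


def build_solr_params(query, facet_fields, search_fields):
    """Return a list of (name, value) pairs for a Solr select request."""
    params = [
        ("q", query),
        ("defType", "edismax"),
        # qf lists the fields to search, each with a boost
        ("qf", " ".join(f"{name}^{boost}" for name, boost in search_fields.items())),
        ("facet", "true"),
    ]
    # Solr accepts the facet.field parameter once per field
    params.extend(("facet.field", name) for name in facet_fields)
    return params


params = build_solr_params("tobacco cards", FACET_FIELDS, SEARCH_FIELDS)
query_string = urlencode(params)
print(query_string)
```

Adding a filter or a searchable field then amounts to adding one entry to a list, which is the sense in which a few lines of configuration go a long way.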

Even for some of the more specialized features, we’ve relied as much as possible on available add-ons and tools. Because of this, we’ve been able to add advanced features to the new application without a large amount of development time. For example, we’re using the Blacklight Range Limit Ruby Gem to add a visual date picker with a histogram for searching the collections by year.

We also used the Blacklight Gallery Ruby Gem to add an option to view search results as a gallery with larger thumbnails.

Both of these features were relatively easy to implement because we were able to make use of plugins shared with the Blacklight community.
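To give a sense of the data behind a year histogram like the one mentioned above, here is a minimal Python sketch that counts items per fixed-width year bin. It does not reproduce how the Range Limit gem computes its histogram; the function name, bin width, and sample years are all made up for illustration.

```python
from collections import Counter


def year_histogram(years, start, end, bin_width):
    """Count items per fixed-width year bin covering start..end inclusive."""
    counts = Counter()
    for year in years:
        if start <= year <= end:
            bin_start = start + ((year - start) // bin_width) * bin_width
            counts[bin_start] += 1
    # Emit every bin, including empty ones, so the chart has no gaps
    return [
        (b, min(b + bin_width - 1, end), counts[b])
        for b in range(start, end + 1, bin_width)
    ]


# Illustrative years only, not real collection data
sample_years = [1880, 1884, 1885, 1889, 1890, 1890, 1902, 1910]
for low, high, count in year_histogram(sample_years, 1880, 1910, 10):
    print(f"{low}-{high}: {'#' * count} ({count})")
```

Selecting a range in the date picker then corresponds to keeping only the items whose year falls between the chosen bounds.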

Another new (to us) tool we’re using is the IIPImage server for serving images to the application. Because the image server automatically creates and then returns the right size image based on parameters sent in a request, we don’t have to pre-generate thumbnails of various sizes to support different displays in the application. The image server can even crop images. Because the image server stores the images as Pyramid TIFFs, we’re able to provide very smooth and fast in-browser pan and zoom of images, which works similarly to Google Maps. To get a better idea of what this means for exploring high resolution images in your browser, you can explore some of the examples on the IIPImage site.
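As a sketch of what "parameters sent in a request" can look like, the Python below builds request URLs using commands from the Internet Imaging Protocol (IIP) that IIPImage implements: FIF names the image file, WID sets the output width, RGN selects a region as fractions of the full image, and CVT requests a JPEG. The server address and image path are hypothetical, and the helper function is an assumption for illustration, not the project's code.

```python
from urllib.parse import urlencode

# Hypothetical server endpoint, for illustration only
IIP_ENDPOINT = "https://images.example.edu/fcgi-bin/iipsrv.fcgi"


def iip_url(image_path, width=None, region=None):
    """Build an IIP request URL for a JPEG rendition of an image.

    region is (x, y, w, h) as fractions of the full image, 0.0 to 1.0.
    """
    params = [("FIF", image_path)]
    if width is not None:
        params.append(("WID", str(width)))
    if region is not None:
        params.append(("RGN", ",".join(str(v) for v in region)))
    # CVT is placed last here, following the usual ordering of IIP requests
    # (an assumption worth checking against the server documentation)
    params.append(("CVT", "jpeg"))
    # Keep "/" and "," readable instead of percent-encoding them
    return f"{IIP_ENDPOINT}?{urlencode(params, safe='/,')}"


# Hypothetical image path
thumb = iip_url("/srv/images/card001.tif", width=150)
crop = iip_url("/srv/images/card001.tif", width=600, region=(0.25, 0.25, 0.5, 0.5))
print(thumb)
print(crop)
```

The same source image serves both a small thumbnail and a cropped detail view, so no derivative files need to be generated ahead of time.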

To manage this project, we’ve been following Agile project management techniques, which for us meant taking an iterative approach to designing and building features of the application in two-week sprints. At the beginning of each sprint we decide what we’re going to work on from a backlog of user stories, and our goal by the end of the two weeks is to have a version of the code that is working and deployed with these features implemented. Each day we have a 15-minute stand-up meeting during which each person reviews what they worked on yesterday, explains what they’re going to work on today, and then notes anything that’s blocking their progress. These quick, daily meetings have helped keep the project moving by increasing communication and helping to focus our work.

We’re still putting some pieces in place, so our new platform for publishing Digital Collections isn’t available yet, but look for it soon along with more information about the project and its first published collection.

Digital Collections

Google Analytics and Digitized Cultural Heritage

October 9, 2015 Jessica Serrao

For centuries, cultural heritage institutions—like libraries and archives—monitored the use of their collections through varying means of counting and recording. From rare manuscripts used in special collections reading rooms to the copy of Moby Dick checked out at the circulation desk, we like to keep note of who is using what. But what about those digitized special collections that patrons use more and more often? How do we monitor use of materials when they live on websites and are accessed remotely by computers, tablets, and smartphones? That’s where web analytics comes into play.

Google Analytics is by far the largest analytics aggregator today, and it is what many cultural heritage institutions turn to for data on digital collections. We can now rely on pageviews and sessions, and a plethora of other metrics, to inform us how patrons are using materials online.

Recently, I began examining the use of Duke University Archives’ digital collections to see what I could find. I quickly found that I was lost. Google Analytics is so overwhelmingly abundant with data, what I’d venture to call a statistical minefield (or ninja warrior obstacle course?), that I found myself in a fog of confusion. Don’t get me wrong, these data sets can be extremely useful if you know what you’re doing. It just took me a while to get my bearings and slowly crawl out of the fog.

With that said, if you’re interested in learning more, use every resource available to wrap your head around what Google Analytics offers and how it can help your institution. Google provides a set of tutorials at Analytics Academy. Another site, Lynda.com, is a great subscription resource that may be accessible through institutional memberships. Don’t rule out YouTube either. I also learned a lot of the basics from Molly Bragg, my supervisor, who is on the Digital Library Federation Assessment Interest Group’s (DLF AIG) Analytics subcommittee. They’ve been working on a white paper to lay out digital library analytics best practices, which they hope will help steer cultural heritage institutions in the right direction.

In my own experience scouring usage data from the Duke Chapel Recordings collection, I found many rather predictable results: most users come from North Carolina, Durham in particular.

But then there were strange statistics that were harder to figure out. For example, why is Texas our third-highest state for traffic, with 7% of our sessions originating there?

Of Texas’ total sessions, 22% viewed webpages relating to Carlyle Marney’s sermons. For much of the 1970s, Marney was a visiting professor at Duke’s Divinity School, but this web traffic all originated in Austin, TX. Doing some internet digging, I found that in the 1940s and 1950s, Marney was a pastor and seminary professor in Austin. It is understandable why the interest in his sermons comes from a region in Texas that is likely familiar with his pastoral work.
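The two percentages above can be combined to estimate how much of the collection's overall traffic this one cluster of interest represents. The short Python sketch below does that arithmetic; the two shares come from the figures quoted here, while the total session count is a made-up placeholder and the helper function is purely illustrative.

```python
# Shares quoted in the post
texas_share_of_all = 0.07      # Texas sessions as a fraction of all sessions
marney_share_of_texas = 0.22   # Texas sessions that viewed Marney pages

# Fraction of all sessions that are Texas sessions viewing Marney pages
marney_texas_share_of_all = texas_share_of_all * marney_share_of_texas
print(f"{marney_texas_share_of_all:.2%}")


def estimated_sessions(total_sessions):
    """Estimate the number of Texas sessions that viewed Marney pages."""
    return round(total_sessions * marney_texas_share_of_all)


# Hypothetical total, not a figure from the collection's analytics
print(estimated_sessions(10_000))
```

So roughly one and a half percent of all sessions trace back to this single regional interest, small in absolute terms but enough to move Texas up the state rankings.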

I also found that referrals from our very own Bitstreams blog make up a portion of the traffic to the collection. That explains some of our spikes in pageviews, which correspond with blog post dates. This is proof that social media does generate traffic!

Once that disorienting fog has lifted, and you have navigated the statistical minefield, you might just find that analytics can be fun. By then it looks less like a minefield and more like a gold mine.

Have you found analytics useful at your cultural heritage institution? We’d love to hear from you!