Category Archives: Technology

Mapping Duke University Libraries’ Discovery System Environment

Just over one year ago, Duke University Libraries’ Web Experience team charged a new subgroup – the Discovery Strategy Team – with “providing cohesion for the Libraries’ discovery environment and facilitating discussion and activity across the units responsible for the various systems and policies that support discovery for DUL users”. Jacquie Samples, head of the Metadata and Discovery Strategy Department in our Technical Services Unit, and I teamed up to co-chair the group, and we were excited to take on this critical work along with eight of our colleagues from across the libraries.

Our first task was one that had long been recognized as a need by many people throughout the library – to create an up-to-date visualization of the systems that underpin DUL’s discovery environment, including the data sources, data flows, connections, and technical/functional ownership for each of these systems. Our goal was not to depict an ideal discovery landscape but rather to depict things as they are now (ideal could come later).

Before we could create a visualization of these systems and how they interacted, however, we realized we needed to identify what they were! This part of the process involved creating a giant laundry list of all of the systems in the form of a Google spreadsheet, so we could work on it collaboratively and iteratively. This spreadsheet became the foundation of the document we eventually produced, containing contextual information about the systems including:

  • Name(s) of the system
  • Description/Notes
  • Host
  • Path
  • Links to documentation
  • Technical & functional owners

Once we had our list of systems to work from, we began the process of visualizing how they work here at DUL. Each meeting of the team involved doing a lot of drawing on the whiteboard as we hashed out how a given system works – how staff & other systems interact with it, whether processes are automated or not, frequency of those processes, among other attributes. At the end of these meetings we would have a messy whiteboard drawing like this one:

We were very lucky to have the talented (and patient!) developer and designer Michael Daul on the team for this project, and his role was to take our whiteboard drawings and turn them into beautiful, legible visualizations using Lucidchart:

Once we had created visualizations that represented all of the systems in our spreadsheet, and shared them with stakeholders for feedback, we (ahem, Michael) compiled them into an interactive PDF using Adobe InDesign. We originally had high hopes of creating a super cool interactive and zoomable website where you could move in and out to create dynamic views of the visualizations, but ultimately realized this wouldn’t be easily updatable or sustainable. So, PDF it is, which may not be the fanciest of vehicles but is certainly easily consumed.

We’ve titled our document “Networked Discovery Systems at DUL”, and it contains two main sections: the visualizations that graphically depict the systems, and documentation derived from the spreadsheet we created to provide more information and context for each system. Users can click from a high-level view of the discovery system universe to documentation pages, to granular views of particular ‘constellations’ of systems. Anyone interested in checking it out can download it from this link.

We’ve identified a number of potential use cases for this documentation, and hope that others will surface:

  • New staff orientation
  • Systems transparency
  • Improved communication
  • Planning
  • Troubleshooting

We’re going to keep iterating and updating the PDF as our discovery environment shifts and changes, and hope that having this documentation will help us to identify areas for improvement and get us closer to achieving that ideal discovery environment.

Fun with Solr Queries

Apache Solr is behind many of our systems that provide a way to search and browse via a web application (such as the Duke Digital Repository, parts of our Bento search application, and the not yet public next generation TRLN Discovery catalog). It’s a tool for indexing data and provides a powerful query API. In this post I will document a few Solr querying techniques that might be interesting or useful. In some cases I won’t be able to provide live links to queries because we restrict direct access to Solr. However, many of these Solr querying techniques can be used directly in an application’s search box. In those cases, I will include live links to example queries in the Duke Digital Repository.

Find a list of items from their identifiers.

With this query you can specify exactly what items you want to appear in a search result from a list of identifiers.


id:"duke:448098" OR id:"duke:282429" OR id:"duke:142581"
Try it in the Duke Digital Repository

Find all records that have a value (any value) in a specific field.

This query will find all the items in the repository that have a value in the product field. (As with most of these queries, you must know the field name in Solr.)

product_tesim:[* TO *]
Try it in the Duke Digital Repository

Find all the items in the repository that are missing a field value.

You can find all items in the repository that don’t have any date metadata. Inquiring minds want to know.


-date_tesim:[* TO *]
Try it in the Duke Digital Repository

Find items using a begins-with (left-anchored) query.

I want to see all items that have a subject term that begins with “Soviet Union.” The example is a left-anchored query and will exactly match fields that begin with “Soviet Union.” (Note, the field must not be tokenized for this to work as expected.)


subject_facet_sim:/Soviet Union.*/
Try it in the Duke Digital Repository

Find items with an ends-with (right-anchored) query.

Again, this will only work as expected with an untokenized field.


subject_facet_sim:/.*20th century/
Try it in the Duke Digital Repository

Some of you might have noticed that these queries look a lot like regular expressions. And you’re right! Read more about Solr’s support for regular expression queries.

The following examples require direct access to Solr, which is restricted to authorized users and applications. Instead of providing live links, I’ll show the basic syntax, a complete example query using http://localhost:8983/solr/core/* as the sample URL for a Solr index, and a sample response from Solr.

Count instances of values in a field.

I want to know how many items in the repository have a workflow state of published and how many are unpublished. To do that I can write a facet query that will count instances of each value in the specified field. (This is another query that will only work as expected with an untokenized field.)

http://localhost:8983/solr/core/select?q=*:*&facet=true&facet.field=workflow_state_ssi&facet.limit=-1&rows=0

Solr Response (truncated)

<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="workflow_state_ssi">
<int name="published">484075</int>
<int name="unpublished">2228</int>

Collapse multiple records into one result based on a shared field value.

This one is somewhat advanced and likely only useful in particular circumstances. But if you have multiple records that are slight variants of each other, and you want to collapse each set of variants down to a single result, you can do that with a collapse query — as long as the records you want to collapse share a value.

http://localhost:8983/solr/core/select?q=*:*&fq={!collapse field=oclc_number nullPolicy=expand max=termfreq(institution,duke)}

  • !collapse instructs Solr to use the Collapsing Query Parser.
  • field=oclc_number instructs Solr to collapse records that share the same value in the oclc_number field.
  • nullPolicy=expand instructs Solr to include in the result set any document that has no value in the oclc_number field. If this is excluded, then records without an oclc_number value will be dropped from the results.
  • max=termfreq(institution,duke) instructs Solr, when collapsing multiple records, to select as the representative record the one that has the value “duke” in the institution field.

CSV response writer (or JSON, Ruby, etc.)

Solr has a number of tricks up its sleeve when it comes to returning results. By default it will return results as XML. You can also specify JSON or Ruby. You specify a response writer by adding the wt parameter to the URL (wt=json or wt=ruby, etc.).
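
For example, appending wt=json to any of the queries above returns the same results as JSON:

http://localhost:8983/solr/core/select?q=*:*&wt=json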

Solr will also return results as a CSV file, which can then be opened in an Excel spreadsheet — a useful feature for working with metadata.

For example, a query along these lines (the title field name here is illustrative) returns id and title columns as CSV:

http://localhost:8983/solr/core/select?q=sun&wt=csv&fl=id,title_tesim

Solr Response

duke:194006,Sun Bowl...Sun City...
duke:194002,Sun Bowl...Sun City...
duke:194009,Sun Bowl...Sun City.
duke:194019,Sun Bowl...Sun City.
duke:194037,"Sun City\, Sun Bowl"
duke:194036,"Sun City\, Sun Bowl"
duke:194030,Sun City
duke:194073,Sun City
duke:335601,Sun Control
duke:355105,Proved! Fast starts at 30° below zero!

This is just a small sample of useful ways you can query Solr.

Adventures in 4K

When it comes to moving image digitization, Duke Libraries’ Digital Production Center primarily deals with obsolete videotape formats like U-matic, Betacam, VHS and DV, which are in standard-definition (SD). We typically don’t work with high-definition (HD) or ultra-high-definition (UHD) video because that is usually “born digital,” and doesn’t need any kind of conversion from analog, or real-time migration from magnetic tape. It’s already in the form of a digital file.

However, when I’m not at Duke, I do like to watch TV at home, in high-definition. This past Christmas, the television in my living room decided to kick the bucket, so I set out to get a new one. I went to my local Best Buy and a few other stores, to check out all the latest and greatest TVs. The first thing I noticed is that just about every TV on the market now features 4K ultra-high-definition (UHD), and many have high dynamic range (HDR).

Before we dive into 4K, some history is in order. Traditional, standard-definition televisions offered 480 lines of vertical resolution, with a 4:3 aspect ratio, meaning the height of the image display is 3/4 the dimension of the width. This is how television was broadcast for most of the 20th century. Full HD television, which gained popularity at the turn of the millennium, has 1080 pixels of vertical resolution (over twice as much as SD), and an aspect ratio of 16:9, which makes the height barely more than 1/2 the size of the width.

16:9 more closely resembles the proportions of a movie theater screen, and this change in TV specification helped to usher in the “home theater” era. Once 16:9 HD TVs became popular, the emergence of Blu-ray discs and players allowed consumers to rent or purchase movies, watch them in full HD and hear them in theater-like high fidelity, by adding 5.1 surround sound speakers and subwoofers. Those who could afford it started converting their basements and spare rooms into small movie theaters.

The next step in the television evolution was 4K ultra-high-definition (UHD) TVs, which have flooded big box stores in recent years. 4K UHD has an astounding resolution of 3840 horizontal pixels and 2160 vertical pixels: twice the horizontal and vertical resolution of full HD (four times as many pixels overall), and almost five times the vertical resolution of SD. Gazing at the images on these 4K TVs in that Best Buy was pretty disorienting. The image is so sharp and finely detailed that it’s almost too much for your eyes and brain to process.

For example, looking at footage of a mountain range in 4K UHD feels like you’re seeing more detail than you would if you were actually looking at the same mountain range in person, with your naked eye. And high dynamic range (HDR) increases this effect, by offering a much larger palette of colors and more levels of subtle gradation from light to dark. The latter allows for more detail in the highlight and shadow areas of the image. The 4K experience is a textbook example of hyperreality, which is rapidly encroaching into every aspect of our modern lives, from entertainment to politics.

The next thing that dawned on me was: if I get a 4K TV, where am I going to get the 4K content? No television stations or cable channels are broadcasting in 4K, and my old Blu-ray player doesn’t play 4K. Fortunately, all 4K TVs will also display 1080p HD content beautifully, so that warmed me up to the purchase. It meant I didn’t have to immediately replace my Blu-ray player, or just stare at a black screen night after night, waiting for my favorite TV stations to catch up with the new technology.

The salesperson who was helping me alerted me to the fact that Best Buy also sells 4K UHD Blu-ray discs and 4K-ready Blu-ray players, and that some content providers, like Netflix, are streaming many shows in 4K and in HDR, like “Stranger Things,” “Daredevil” and “The Punisher,” to name a few. So I went ahead with the purchase and brought home my new 4K TV. I also picked up a 4K-enabled Roku, which allows anyone with a fast internet connection and a subscription to stream content from Netflix, Amazon and Hulu, as well as access most cable-TV channels via services like DirecTV Now, YouTube TV, Sling and Hulu.

I connected the new TV (a 55” Sony X800E) to my 4K Roku, ethernet, HD antenna and stereo system and sat down to watch. The 1080p broadcasts from the local HD stations looked and sounded great, and so did my favorite 1080p shows streaming from Netflix. I went with a larger TV than I had previously, so that was also a big improvement.

To get the true 4K HDR experience, I upgraded my Netflix account to the 4K-capable version, and started watching the new Marvel series, “The Punisher.” It didn’t look quite as razor sharp as the 4K images did in Best Buy, but that’s likely because the 4K Netflix content is compressed for streaming, whereas the TVs on the sales floor are playing in-house 4K video that has very little, if any, compression.

As a test, I went back and forth between watching The Punisher in 4K UHD, and watching the same Punisher episodes in HD, using an additional, older Roku through a separate HDMI port. The 4K version did have a lot more detail than its HD counterpart, but it was also more grainy, with horizons of clear skies showing additional noise, as if the 4K technology were trying too hard to bring detail out of something that is inherently a flat plane of the same color.

Also, because of the high dynamic range, the image loses a bit of overall contrast when displaying so many subtle gradations between dark and light. 4K streaming also requires a fast internet connection and it downloads a lot of data, so if you want to go 4K, you may need to upgrade your ISP plan, and make sure there are no data caps. I have a 300 Mbps fiber connection, with ethernet cable routed to my TV, and that works perfectly when I’m streaming 4K content.

I have yet to buy a 4K Blu-ray player and try out a 4K Blu-ray disc, so I don’t know how that will look on my new TV, but from what I’ve read, it more fully takes advantage of the 4K data than streaming 4K does. One reason I’m reluctant to buy a 4K Blu-ray player gets back to content. Almost all the 4K Blu-ray discs for sale or rent now are recently-made Hollywood movies. If I’m going to buy a 4K Blu-ray player, I want to watch classics like “2001: A Space Odyssey,” “The Godfather,” “Apocalypse Now” and “Vertigo” in 4K, but those aren’t currently available because the studios have yet to release them in 4K. This requires going back to the original film stock and painstakingly digitizing and restoring them in 4K.

Some older films may not have enough inherent resolution to take full advantage of 4K, but it seems like films such as “2001: A Space Odyssey,” which was originally shot in 65 mm, would really be enhanced by a 4K restoration. Filmmakers and the entertainment industry are already experimenting with 8K and 16K technology, so I guess my 4K TV will be obsolete in a few years, and we’ll all be having seizures while watching TV, because our brains will no longer be able to handle the amount of data flooding our senses.

Prepare yourself for 8K and 16K video.


Interactive Transcripts have Arrived!

This week Duke Digital Collections added our first set of interactive transcripts to one of our newest digital collections: the Silent Vigil (1968) and Allen Building Takeover (1969) collection of audio recordings.   This marks an exciting milestone in the accessibility efforts Duke University Libraries has been engaged in for the past 2.5 years. Last October, my colleague Sean wrote about our new accessibility features and the technology powering them, and today I’m going to tell you a little more about why we started these efforts as well as share some examples.

Interactive Transcript in the Silent Vigil (1968) and Allen Building Takeover (1969) Audio Recordings

Providing access to captions and transcripts is not new for digital collections.  We have been able to provide access to PDF transcripts and captions both in digital collections and finding aids for years. See items from the Behind the Veil and Memory Project digital collections for examples.

In recent years, however, we stepped up our efforts to create captions and transcripts. Our work began in response to a 2015 lawsuit brought against Harvard and MIT by the National Association of the Deaf. The lawsuit triggered many discussions in the library, and the Advisory Council for Digital Collections eventually decided that we would proactively create captions or transcripts for all new A/V digital collections, assuming it is feasible and reasonable to do so.  The feasible and reasonable part of our policy is key.  The Radio Haiti collection, for example, is composed of thousands of recordings primarily in Haitian Creole and French.  The cost to transcribe that volume of material in non-English languages makes it unreasonable (and not feasible) to transcribe. In addition to our work in the library, Duke has established campus-wide web accessibility guidelines that include captioning and transcription.  Therefore our work in digital collections is only one aspect of campus-wide accessibility efforts.

To create transcripts and captions, we have partnered with several vendors since 2015, and we have seen the costs for these services drop dramatically.  Our primary vendor right now is Rev, who also works with Duke’s Academic Media Services department.  Rev guarantees 99% accurate captions or transcripts for $1/minute.

Early on, Duke Digital Collections decided to center our captioning efforts around the WebVTT format, a time-coded, text-based file format and a W3C standard.  We use it for both audio and video captions when possible, but we can also accommodate legacy transcript formats like PDFs.  Transcripts and captions can easily be replaced with new versions if and when edits need to be made.

Examples from the Silent Vigil (1968) and Allen Building Takeover (1969) Audio Recordings

When WebVTT captions are present, they load in the interface as an interactive transcript.  This transcript can be used for navigation: click a line of text and playback jumps to that portion of the recording.

Click the image above to see the full item and transcript.

In addition to providing access to transcripts on the screen, we offer downloadable versions of the WebVTT transcript as a text file, a PDF, or the original WebVTT format.

An advantage of the WebVTT format is its support for voice (“v”) tags, which can be used to note changes in speaker and even add speakers’ names to the transcript.  This can require additional manual work if the names of the speakers are not obvious to the vendor, but we are excited to have this capability.
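
For example, a couple of cues from a hypothetical WebVTT transcript with voice tags might look like this (the names and timestamps are invented):

WEBVTT

00:00:01.000 --> 00:00:06.000
<v Jane Smith>Welcome, everyone, and thank you all for coming today.

00:00:06.500 --> 00:00:11.000
<v John Doe>It's a pleasure to be here.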

As Sean described in his blog post, we can also provide access to legacy pdf documents.  They cannot be rendered into an interactive version, but they are still accessible for download.

On a related note, we also have a new feature that links time codes listed in the description metadata field of an item to the corresponding portion of the audio or video file.  This enables librarians to describe specific segments of audio and/or video items.  The Radio Haiti digital collection is the first to utilize this feature, but the feature will be a huge benefit to the H. Lee Waters and Chapel Recordings digital collections as well as many others.

Click the image above to interact with linked time codes.

As mentioned at the top of this post, the Duke Vigil and Allen Building Takeover collection includes our first batch of interactive transcripts.  We plan to launch more this Spring, so stay tuned!!

A New & Improved Rubenstein Library Website

We kicked off the spring 2018 semester by rolling out a brand-new design for the David M. Rubenstein Library website.  The new site features updated imagery from the collections, better navigation, and more prominent presence for the exhibits currently on display.

Rubenstein website, Jan 2018
January 2018: New design for the Rubenstein Library homepage.


Much credit goes to Katie Henningsen and Kate Collins who championed the project.

Objectives for the Redesign

  • Make wayfinding from the homepage clearer (by reorganizing links into a primary dropdown navigation)
  • Dynamically feature Rubenstein Library exhibits that are currently on display
  • Improve navigation to key Rubenstein site pages from within research center / collection pages
  • Display larger images illustrative of the library’s distinctive and diverse collections
  • Retain aspects of the homepage that have been effective, e.g., hours and resource search boxes
  • Improve the site aesthetic

Internal Navigation

With a new primary navigation in hand on the Rubenstein homepage that links to key pages in the site, we began to explore ways to get visitors to those links in an unobtrusive way when they aren’t on the homepage.  Each research center within the library, e.g., the John W. Hartman Center for Sales, Advertising & Marketing History, has its own sub-site with its own secondary menus, which already contend a bit with the blue Duke Libraries menu in the masthead.  To avoid burying visitors in a Russian nesting doll of navigation, we decided to try dropping the RL menu down from the breadcrumb trail link so it’s tucked away, but still accessible when needed. We’re eager to learn whether or not this is effective.

Rubenstein primary navigation menu available from breadcrumb trail.

A Look Back

Depending on how you count, this is now the seventh or eighth homepage design for the Rubenstein Library (formerly the Rare Book, Manuscript, and Special Collections Library; formerly the Special Collections Library). I thought I’d take a quick stroll down memory lane, courtesy of the Internet Archive’s Wayback Machine, to reflect on how far we have come over the years.


Duke Special Collections Library website, 1996
October 1996 (explore)


  • prominent news, exhibits, and online collections
  • links to online SGML- and HTML-encoded finding aids (42 of them!)
  • a site search box powered by Excite!


April 1997 (explore)


  • two-column layout with a left-hand nav
  • digitized collections
  • a special collections newsletter called The Broadside
  • became the “Rare Book, Manuscript, and Special Collections Library” in 1997


December 2005 (explore Jan 2006)


  • color-coded navigation broken into three groups of links
  • image from the collections
  • featured exhibit with image
  • rounded corners and shadows
  • first use of a CMS (content management system named Cascade Server)*


August 2007 (explore)


  • first time sharing a masthead with rest of the Duke University Libraries
  • retained the lists of links, single collection image, and featured exhibit from previous iteration


September 2011 (explore)


  • renamed as the David M. Rubenstein Rare Book & Manuscript Library
  • first time with catalog and finding aids search boxes on the homepage
  • first appearance of social media & RSS icons
  • first iteration to display library hours
  • first news carousel appearance


January 2014 (explore)


  • new site in Drupal content management system
  • first responsive RL website (works well on mobile devices)
  • array of vertical image panels from the collections
  • extended color palette to match Duke University website styles (at the time)
  • gradients and rounded buttons with shadows
  • first time able to search digital collections from RL homepage
  • first site with Login button for Aeon (Special Collections request system)


January 2017 (explore)



Rubenstein website, Jan 2018
January 2018


  • lightened the overall aesthetic
  • featured image cycling from selections at random (diagonally sliced using CSS clip-path polygons)
  • prominent current exhibits feed with images
  • a primary nav with dropdown menus

How long will this latest edition of the Rubenstein Library homepage stick around? Only time will tell, but we’ll surely continue to iterate, learn from the past, and improve with each attempt. For now, we’re pleased with the new site, and hope you will be as well.

* Revised Feb 9, 2018 to reflect that the first version using a content management system was in 2005 rather than 2007.

A Year in the Life of Digital Collections

2017 has been an action-packed year for Digital Collections, full of exciting projects, interface developments and new processes and procedures. This blog post is an attempt to summarize just a few of our favorite accomplishments from the last year. Digital Collections is truly a cross-departmental collaboration here at Duke, and we couldn’t complete any of the work listed below without all our colleagues across the library – thanks to all!

New Digital Collections Portal

Regular visitors to Duke Digital Collections may have noticed that our old portal now redirects to our new homepage on the Duke Digital Repository (DDR) public interface. We are thrilled to make this change! But never fear, your favorite collections that have not been migrated to DDR are still accessible either on our Tripod2 interface or by visiting new placeholder landing pages in the Digital Repository.

Audiovisual Features

Supporting A/V materials in the Digital Repository has been a major software development priority throughout 2017. As a result, our A/V items are becoming more accessible and easier to share. Thanks to a year of hard work we can now do and support the following (we posted examples of these in a previous post):

  • Model, store and stream A/V derivatives
  • Share A/V easily through our embed feature (even on Duke WordPress sites- a long standing bug)
  • Finding aids can now display inline AV for DAOs from DDR
  • Clickable timecode links in item descriptions (example)
  • Display captions and interactive transcripts
  • Download and export captions and transcripts (as .pdf, .txt, or .vtt)
  • Display video thumbnails & poster frames

Rights Statements and Metadata

Bitstreams recently featured a review of all things metadata from 2017, many of which impact the digital collections program. We are especially pleased with our rights management work from the last year and our rights statements implementation ( We are still in the process of retrospectively applying the statements, but we are making good progress. The end result will give our patrons a clearer indication of the copyright status of our digital objects and how they can be used. Read more about our rights management work in past Bitstreams posts.

Also this year in metadata, we have been developing integrations between ArchivesSpace (the tool Rubenstein Library uses for finding aids) and the repository, a project that has been in the works since 2015. With these new features, Rubenstein’s archivist for metadata and encoding is in the process of reconciling metadata between ArchivesSpace and the Digital Repository for approximately 50 collections to enable bi-directional links between the two systems. Bi-directional linking helps our patrons move easily from a digital object in the repository to its finding aid or catalog record and vice versa. You can read about the start of this work in a blog post from 2016.

Multispectral Imaging

At the end of 2016, Duke Libraries purchased Multispectral Imaging (MSI) equipment, and members of Digital Collections, Data and Visualization Studies, Conservation Services, the Duke Collaboratory for Classics Computing, and the Rubenstein Library joined forces to explore how to best use the technology to serve the Duke community. The past year has been a time of research, development, and exploration around MSI and you can read about our efforts on Bitstreams. Our plan is to launch an MSI service in 2018. Stay tuned!

Ingest into the Duke Digital Repository (DDR)

With the addition of new colleagues focused on research data management, there have been more demands on and enhancements to our DDR ingest tools. Digital collections has benefited from more robust batch ingest features as well as the ability to upload more types of files (captions, transcripts, derivatives, thumbnails) through the user interface. We can also now ingest nested folders of collections. On the opposite side of the spectrum, we now have the ability to batch export sets of files or even whole collections.

Project Management

The Digital Collections Advisory Committee and Implementation Team are always looking for more efficient ways to manage our sprawling portfolio of projects and services. We started 2017 with a new call for proposals around the themes of diversity and inclusion, which resulted in 7 successful proposals that are now in the process of implementation.

In addition to a thematic call for proposals, we later rolled out a new process for our colleagues to propose smaller projects in response to faculty requests, events, or other needs; in other words, projects of a size and scope that don’t need to wait for a thematic call for proposals. The idea is that these projects can be implemented easily and therefore do not require extensive project management to complete. Our first completed “easy” project is the Carlo Naya photograph albums of Venice.

In 2016 (perhaps even back in 2015), the digital collections team started working with colleagues in Rubenstein to digitize the set of collections known as “Section A”. The history of this moniker is a little uncertain, so let me just say that Section A is a set of 3000+ small manuscript collections (1-2 folders each) boxed together; each Section A box holds up to 30 collections. Section A collections are highly used and are often the subject of reproduction requests, hence they are perfect candidates for digitization. Our goal has been to set up a mass-digitization pipeline for these collections that involves vetting rights, updating description, evaluating their condition, digitizing them, ingesting them into DDR, crosswalking metadata, and finally making them publicly accessible in the repository and through their finding aids. In 2017 we evaluated 37 boxes for rights restrictions, updated descriptions for 24 boxes, assessed the condition of 31 boxes, digitized 19 boxes, ingested 4 boxes, crosswalked metadata for 2 boxes, and box 1 is now online! Read more about the project in a May Bitstreams post. Although progress has felt slow given all the other projects we manage simultaneously, we really feel like our foot is on the gas now!

You can see the fruits of our digital collection labors in the list of new and migrated collections from the past year. We are excited to see what 2018 will bring!!

New Collections Launched in 2017

Migrated Collections

Yasak/Banned Kiosk

Recently I worked on a simple kiosk for a new exhibit in the library, Yasak/Banned: Political Cartoons from Late Ottoman and Republican Turkey. The interface presents users with three videos featuring the curators of the exhibit that explain the historical context and significance of the items in the exhibit space. We also included a looping background video that highlights many of the illustrations from the exhibit and plays examples of Turkish music to help envelop visitors into the overall experience.

With some of the things I’ve built in the past, I would set up different sections of an interface to view videos, but in this case I wanted to keep things as simple as possible. My plan was to open each video in an overlay using fancybox. However, I didn’t want the looping background audio to interfere with the curator videos. We also needed to include a ‘play/pause’ button to turn off the background music in case there was something going on in the exhibit space. And I wanted all these transitions to be as smooth as possible so that the experience wouldn’t be jarring. What seemed reasonably simple on the surface proved a little more difficult than I thought.

After trying a few different approaches with limited success, as usual, Stack Overflow revealed a promising direction to go in — the key turned out to be the jQuery .animate method.

The first step was to write functions to ‘play/pause’ – even though in this case we’d let the background video continue to play and only lower the volume to zero. The playVid function sets the volume to 0, then animates it back up to 100% over three seconds. pauseVid does the inverse, but more quickly. (In the snippets below, vid is the background video element, and playBtn and pauseBtn are references to the two buttons shown later.)

function playVid() {
  vid.volume = 0;
  $(vid).animate({
    volume: 1
  }, 3000); // 3 seconds = 'none'; = 'block';
}

function pauseVid() {
  // really just fading out music
  vid.volume = 1;
  $(vid).animate({
    volume: 0
  }, 750); // .75 seconds = 'block'; = 'none';
}

The play and pause buttons, which are positioned on top of each other using CSS, are set to display or hide in the functions above. They are also set to call the appropriate function using an onclick event. Their markup looks like this:

<button onclick="playVid()" type="button" id="play-btn">Play</button>
<button onclick="pauseVid()" type="button" id="pause-btn">Pause</button>

Next I added an onclick event calling our pause function to the link that opens our fancybox window with the video in it, like so:

<a data-fancybox-type="iframe" href="https://my-video-url" onclick="pauseVid();"><img src="the-video-thumbnail" /></a>

And finally I used the fancybox callback afterClose to ramp up the background video volume over three seconds:

afterClose: function () {
  vid.volume = 0;
  $(vid).animate({
    volume: 1
  }, 3000); // 3 seconds = 'none'; = 'block';
}

play/pause demo

You can view a demo version and download a zip of the assets in case it’s helpful. I think the final product works really well and I’ll likely use a similar implementation in future projects.

Accessible AV in the Duke Digital Repository

Over the course of 2017, we improved our capacity to support digital audiovisual materials in the Duke Digital Repository (DDR) by leaps and bounds. A little more than a year ago, I had written a Bitstreams blog post highlighting the new features we had just developed in the DDR to provide basic functionality for AV, especially in support of the Duke Chapel Recordings collection. What a difference a year makes.

This past year brought renewed focus on AV development, as we worked to bring the NEH grant-funded Radio Haiti Archive online (launched in June).  At the same time, our digital collections legacy platform migration efforts shifted toward moving our existing high-profile digital AV material into the repository.

Closed Captions

At Duke University Libraries, we take accessibility seriously. We aim to include captions or transcripts for the audiovisual objects made available via the Duke Digital Repository, especially to ensure that the materials can be perceived and navigated by people with disabilities. For instance, work is well underway to create closed captions for all 1,400 items in the Duke Chapel Recordings project.

Screenshot showing Charmin commercial from AdViews collection with caption overlay
Captioned video displays a CC button and shows captions as an overlay in the video player. Example from the AdViews collection, coming soon to the DDR.

The DDR now accommodates modeling and ingest for caption files, and our AV player interface (powered by JW Player) presents a CC button whenever a caption file is available. Caption files are encoded using WebVTT, the modern W3C standard for associating timed text with HTML audio and video. WebVTT is structured so as to be machine-processable, while remaining lightweight enough to be reasonably read, created, or edited by a person. It’s a format that transcription vendors can provide. And given its endorsement by W3C, it should be a viable captioning format for a wide range of applications and devices for the foreseeable future.

Example WebVTT captions
Text cues from a WebVTT caption file for an audio item in the Duke Chapel Recordings collection.

Interactive Transcripts

Displaying captions within the player UI is helpful, but it only gets us so far. For one, that doesn’t give a user a way to just read the caption text without requiring them to play the media. We also need to support captions for audio files, but unlike with video, the audio player doesn’t include enough real estate within itself to render the captions. There’s no room for them to appear.

So for both audio and video, our solution is to convert the WebVTT caption files on-the-fly into an interactive in-page transcript. Using the webvtt-ruby gem (developed by Coconut), we parse the WebVTT text cues into Ruby objects, then render them back on the page as HTML. We then use the JW Player JavaScript API to keep the media player and the HTML transcript in sync. Clicking on a transcript cue advances the player to the corresponding moment in the media, and the currently-playing cue gets highlighted as the media plays.
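
Conceptually, the player side of this sync looks something like the following sketch (the element id, cue class, and data attributes are assumptions for illustration, not our production code):

// Assumes each cue is rendered like:
// <span class="transcript-cue" data-start="12.5" data-end="15.0">...</span>
var player = jwplayer('av-player');
var cues = document.querySelectorAll('.transcript-cue');

// Clicking a cue advances the player to that cue's start time.
cues.forEach(function (cue) {
  cue.addEventListener('click', function () {;
  });
});

// As the media plays, highlight the cue for the current position.
player.on('time', function (event) {
  cues.forEach(function (cue) {
    var start = parseFloat(cue.dataset.start);
    var end = parseFloat(cue.dataset.end);
    cue.classList.toggle('highlighted', event.position >= start && event.position < end);
  });
});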

Screenshot of interactive audio transcript
Example interactive synchronized transcript for an audio item (rendered from a WebVTT caption file). From a collection coming soon to the DDR.


We also do some extra formatting when the WebVTT cues include voice tags (<v> tags), which can optionally indicate the name of the speaker (e.g., <v Jane Smith>). The in-page transcript is indexed by Google for search retrieval.

Transcript Documents

In many cases, especially for audio items, we may have only a PDF or other type of document with a transcript of a recording that isn’t structured or time-coded.  Like captions, these documents are important for accessibility. We have developed support for displaying links to these documents near the media player. Look for some new collections using this feature to become available in early 2018.

Screenshot of a transcript document menu above the AV player
Transcript documents presented above the media player. Coming soon to AV collections in the DDR.



A/V Embedding

The DDR web interface provides an optimal viewing or listening experience for AV, but we also want to make it easy to present objects from the DDR on other websites, too. When used on other sites, we’d like the objects to include some metadata, a link to the DDR page, and proper attribution. To that end, we now have copyable <iframe> embed code available from the Share menu for AV items.

Embed code in the Share menu for an audio item.
Copyable embed code from an audio recording in the Radio Haiti Archive.
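
The copied snippet is a standard iframe, roughly of this shape (the src URL and dimensions below are placeholders, not an actual DDR embed URL):

<iframe src="" width="640" height="400" frameborder="0" allowfullscreen></iframe>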

This embed code is also what we now use within the Rubenstein Library collection guides (finding aids) interface: it lets us present digital objects from the DDR directly from within a corresponding collection guide. So as a researcher browses the inventory of a physical archival collection, they can play the media inline without having to leave.

Screenshot of Rubenstein Library collection guide presenting a Duke Chapel Recordings video inline.
Embedded view of a DDR video from the Duke Chapel Recordings collection presented inline  in a Rubenstein Library archival collection guide.

Sites@Duke Integration
If your website or blog is one of the thousands of WordPress sites hosted and supported by Sites@Duke — a service of Duke’s Office of Information Technology (OIT) — we have good news for you. You can now embed objects from the DDR using WordPress shortcode. Sites@Duke, like many content management systems, doesn’t allow authors to enter <iframe> tags, so shortcode is the only way to get embeddable media to render.

Example of WordPress shortcode for DDR embedding on sites.
Sites@Duke WordPress sites can embed DDR media by using shortcode with the DDR item’s permalink.


And More!

Here are the other AV-related features we have been able to develop in 2017:

  • Access control: master files &  derivatives alike can be protected so access is limited to only authorized users/groups
  • Video thumbnail images: model, manage, and display
  • Video poster frames: model, manage, and display
  • Intermediate/mezzanine files: model and manage
  • Rights display: display icons and info from and Creative Commons, so it’s clear what users are permitted to do with media.

What’s Next

We look forward to sharing our recent AV development with our peers at the upcoming Samvera Connect conference (Nov 6-9, 2017 in Evanston, IL). Here’s our poster summarizing the work to date:

Poster presentation screenshot for Samvera Connect 2017
Poster about Duke’s AV development for Samvera Connect conference, Nov 6-9, 2017 (Evanston, IL)

Looking ahead to the next couple months, we aim to round out the year by completing a few more AV-related features, most notably:

  • Export WebVTT captions as PDF or .txt
  • Advance the player via linked timecodes in the description field in an item’s metadata
  • Improve workflows for uploading caption files and transcript documents

Now that these features are in place, we’ll be sharing a bunch of great new AV collections soon!

Nested Folders of Files in the Duke Digital Repository

Born digital archival materials present unique challenges to representation, access, and discovery in the DDR. A hard drive arrives at the archives, and we want to preserve and provide access to the files. In addition to the content of the files, it’s often important to preserve to some degree the organization of the material on the hard drive in nested directories.

One challenge to representing complex inter-object relationships in the repository is the repository’s relatively simple object model. A collection contains one or more items. An item contains one or more components. And a component has one or more data streams. There’s no accommodation in this model for complex groups and hierarchies of items. We tend to talk about this as a limitation, but it also makes it possible to provide search and discovery of a wide range of kinds and arrangements of materials in a single repository and forces us to make decisions about how to model collections in sustainable and consistent ways. But we still need to preserve and provide access to the original structure of the material.
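
Spelled out, the repository’s object model is a strict containment hierarchy:

Collection
└── Item
    └── Component
        └── Data stream (the file’s content)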

One approach is to ingest the disk image or a zip archive of the directories and files and store the content as a single file in the repository. This approach is straightforward, but makes it impossible to search for individual files in the repository or to understand much about the content without first downloading and unarchiving it.

As a first pass at solving this problem of how to preserve and represent files in nested directories in the DDR, we’ve taken a two-pronged approach. We will use a simple approach to modeling disk image and directory content in the repository. Every file is modeled in the repository as an item with a single component that contains the data stream of the file. This provides convenient discovery and access to each individual file from the collection in the DDR, but does not represent any folder hierarchies. The files are just a flat list of objects contained by a collection.

To preserve and store information about the structure of the files we add an XML METS structMap as metadata on the collection. In addition we store on each item a metadata field that stores the complete original file path of the file.

Below is a small sample of the kind of structural metadata that encodes the nested folder information on the collection. It encodes the structure and nesting, directory names (in the LABEL attribute), the order of files and directories, as well as the identifiers for each of the files/items in the collection.

<?xml version="1.0"?>
<mets xmlns="" xmlns:xlink="">
  <metsHdr>
    <agent ROLE="CREATOR">
      <name>REPOSITORY DEFAULT</name>
    </agent>
  </metsHdr>
  <structMap TYPE="default">
    <div LABEL="2017-0040" ORDER="1" TYPE="Directory">
      <div ORDER="1">
        <mptr LOCTYPE="ARK" xlink:href="ark:/99999/fk42j6qc37"/>
      </div>
      <div LABEL="RL11405-LFF-0001_Programs" ORDER="2" TYPE="Directory">
        <div ORDER="1">
          <mptr LOCTYPE="ARK" xlink:href="ark:/99999/fk4j67r45s"/>
        </div>
        <div ORDER="2">
          <mptr LOCTYPE="ARK" xlink:href="ark:/99999/fk4d50x529"/>
        </div>
        <div ORDER="3">
          <mptr LOCTYPE="ARK" xlink:href="ark:/99999/fk4086jd3r"/>
        </div>
      </div>
      <div LABEL="RL11405-LFF-0002_H1_Early-Records-of-Decentralization-Conference" ORDER="3" TYPE="Directory">
        <div ORDER="1">
          <mptr LOCTYPE="ARK" xlink:href="ark:/99999/fk4697f56f"/>
        </div>
        <div ORDER="2">
          <mptr LOCTYPE="ARK" xlink:href="ark:/99999/fk45h7t22s"/>
        </div>
      </div>
    </div>
  </structMap>
</mets>

Combining the 1:1 (item:component) object model with structural metadata that preserves the original directory structure of the files on the file system enables us to display a user interface that reflects the original structure of the content even though the structure of the items in the repository is flat.

There’s more to it of course. We had to develop a new ingest process that could take as its starting point a file path and then crawl it and its subdirectories to ingest files and construct the necessary structural metadata.

On the UI end of things a nifty Javascript plugin called jsTree powers the interactive directory structure display on the collection page.

Because some of the collections are very large and loading a directory tree structure of 100,000 or more items would be very slow, we implemented a small web service in the application that loads the jsTree data only when someone clicks to open a directory in the interface.
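
In jsTree terms, this lazy loading amounts to handing the tree a URL to fetch children from, rather than a complete data structure up front. A minimal sketch (the endpoint path and JSON shape here are illustrative, not the DDR’s actual API):

$('#directory-tree').jstree({
  core: {
    data: {
      url: function (node) {
        // '#' is jsTree's id for the hidden root node
        var id = ( === '#') ? 'root' :;
        return '/structure/children?node=' + encodeURIComponent(id);
      },
      dataType: 'json'
    }
  }
});

// Directories in the returned JSON set "children": true, so jsTree
// renders them closed and requests their contents only when opened.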

The file paths are also keyword searchable from within the public interface. So if a file is stored at the path “kitchen/fruits/bananas/this-banana.txt,” you would be able to find the file this-banana.txt by searching for “kitchen” or “fruit” or “banana.”

This new functionality to ingest, preserve, and represent files in nested folder structures in the Duke Digital Repository will be included in the September release of the Duke Digital Repository.