Category Archives: Behind the Scenes

Behind the Scenes, Duke Digital Repository, Technology, User Experience

Nested Folders of Files in the Duke Digital Repository

August 4, 2017 Cory Lown

Born digital archival material present unique challenges to representation, access, and discovery in the DDR. A hard drive arrives at the archives and we want to preserve and provide access to the files. In addition to the content of the files, it’s often important to preserve to some degree the organization of the material on the hard drive in nested directories.

One challenge to representing complex inter-object relationships in the repository is the repository’s relatively simple object model. A collection contains one or more items. An item contains one or more components. And a component has one or more data streams. There’s no accommodation in this model for complex groups and hierarchies of items. We tend to talk about this as a limitation, but it also makes it possible to provide search and discovery of a wide range of kinds and arrangements of materials in a single repository and forces us to make decisions about how to model collections in sustainable and consistent ways. But we still need to preserve and provide access to the original structure of the material.

One approach is to ingest the disk image or a zip archive of the directories and files and store the content as a single file in the repository. This approach is straightforward, but makes it impossible to search for individual files in the repository or to understand much about the content without first downloading and unarchiving it.

As a first pass at solving this problem of how to preserve and represent files in nested directories in the DDR we’ve taken a two-pronged approach. We will use a simple approach to modeling disk image and directory content in the repository. Every file is modeled in the repository as an item with a single component that contains the data stream of the file. This provides convenient discovery and access to each individual file from the collection in the DDR, but does not represent any folder hierarchies. The files are just a flat list of objects contained by a collection.

To preserve and store information about the structure of the files we add an XML METS structMap as metadata on the collection. In addition we store on each item a metadata field that stores the complete original file path of the file.

Below is a small sample of the kind of structural metadata that encodes the nested folder information on the collection. It encodes the structure and nesting, directory names (in the LABEL attribute), the order of files and directories, as well as the identifiers for each of the files/items in the collection.

<?xml version="1.0"?>
<mets xmlns="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <metsHdr>
    <agent ROLE="CREATOR">
      <name>REPOSITORY DEFAULT</name>
    </agent>
  </metsHdr>
  <structMap TYPE="default">
    <div LABEL="2017-0040" ORDER="1" TYPE="Directory">
      <div ORDER="1">
        <mptr LOCTYPE="ARK" xlink:href="ark:/99999/fk42j6qc37"/>
      </div>
      <div LABEL="RL11405-LFF-0001_Programs" ORDER="2" TYPE="Directory">
        <div ORDER="1">
          <mptr LOCTYPE="ARK" xlink:href="ark:/99999/fk4j67r45s"/>
        </div>
        <div ORDER="2">
          <mptr LOCTYPE="ARK" xlink:href="ark:/99999/fk4d50x529"/>
        </div>
        <div ORDER="3">
          <mptr LOCTYPE="ARK" xlink:href="ark:/99999/fk4086jd3r"/>
        </div>
      </div>
      <div LABEL="RL11405-LFF-0002_H1_Early-Records-of-Decentralization-Conference" ORDER="3" TYPE="Directory">
        <div ORDER="1">
          <mptr LOCTYPE="ARK" xlink:href="ark:/99999/fk4697f56f"/>
        </div>
        <div ORDER="2">
          <mptr LOCTYPE="ARK" xlink:href="ark:/99999/fk45h7t22s"/>
        </div>
      </div>
    </div>
  </structMap>
</mets>

Combining the 1:1 (item:component) object model with structural metadata that preserves the original directory structure of the files on the file system enables us to display a user interface that reflects the original structure of the content even though the structure of the items in the repository is flat.

There’s more to it of course. We had to develop a new ingest process that could take as its starting point a file path and then crawl it and its subdirectories to ingest files and construct the necessary structural metadata.

On the UI end of things a nifty Javascript plugin called jsTree powers the interactive directory structure display on the collection page.

Because some of the collections are very large and loading a directory tree structure of 100,000 or more items would be very slow, we implemented a small web service in the application that loads the jsTree data only when someone clicks to open a directory in the interface.

The file paths are also keyword searchable from within the public interface. So if a file is contained in a directory named “kitchen/fruits/bananas/this-banana.txt” you would be able to find the file this-banana.txt by searching for “kitchen” or “fruit” or “banana.”

This new functionality to ingest, preserve, and represent files in nested folder structures in the Duke Digital Repository will be included in the September release of the Duke Digital Repository.

Behind the Scenes, Technology

SNCC Digital Gateway Homepage Updates

July 14, 2017 Michael Daul 1 Comment

Earlier this summer I worked with the SNCC Digital Gateway team to launch a revised version of their homepage. The SNCC Digital Gateway site originally was launched in the Fall of 2016. Since then much more content has been incorporated into the site. The team and their advisory board wanted to highlight some of this new content on the homepage (by making it scrollable) while also staying true to the original design.

The previous version of the homepage included two main features:

a large black and white photograph that would randomly load (based on five different options) every time a user visited the page
a ‘fixed’ primary navigation in the footer

Rotating Background Images

In my experience, the ‘go to’ approach for doing any kind of image rotation is to use a Javascript library, probably one that relies on jQuery. My personal favorite for a long time has been jQuery Cycle 2 which I appreciated for it’s lightweight, flexible implementation, and price ($free!). With the new SNCC homepage, I wanted to figure out a way to both crossfade the background images and fade the caption text in and out elegantly. It was also critical that the captions match up perfectly with their associated images. I was worried that doing this with Cycle 2 was going to be overly complicated with respect to syncing the timing, as in some past projects I’d run into trouble keeping discrete carousels locked in sync after several iterations — for example, with leaving the page up and running for several minutes.

I decided to try and build the SNCC background rotation using CSS animations. In the past I’d shied away from using CSS animations for anything that was presented as a primary feature or that was complex as the browser support was spotty. However, the current state of browser support is better, even though it still has a ways to go. In my first attempt I tried crossfading the images as backgrounds in a wrapper div, as this was going to make things work with resizing the page much easier by using background-size: cover property. But I discovered that animating background images isn’t actually supported in the spec, even though it worked perfectly in Chrome and Opera. So instead I went with the approach where you stack the images on top of each other and change the opacity one at a time, like so:

<div class="bg-image-wrapper">
  <img src="image-1.jpg" alt="">
  <img src="image-2.jpg" alt="">
  <img src="image-3.jpg" alt="">
  <img src="image-4.jpg" alt="">
  <img src="image-5.jpg" alt="">
</div>

I setup the structure for the captions in a similar way:

<div id="home-caption">
  <li>caption 1</li>
  <li>caption 2</li>
  <li>caption 3</li>
  <li>caption 4</li>
  <li>caption 5</li>
</div>

I won’t bore you with the details of CSS animation, but in short they are based on keyframes that can be looped and applied to html elements. The one thing that proved to be a little tricky was the timing between the images and the captions, as the keyframes are represented in percentages of the entire animation. This was further complicated by the types of transitions I was using (crossfading the images and linearly fading the captions) and that I wanted to slightly stagger the caption animations so that they would come in after the crossfade completes and transition out just before the next crossfade starts, like so:

Crossfade illustration — As time moves from left to right, the images and captions have independent transitions

The SNCC team and I also discussed a few options for the overall timing of the transitions and settled on eight seconds per image. With five images in our rotation, the total time of the animation would be 40 seconds. The entire animation is applied to each image, and offset with a delay based on their position in the .bg-image-wrapper stack. The CSS for the images looks like this:

.bg-image-wrapper img {
  animation-name: sncc-fader;
  animation-timing-function: ease-in-out;
  animation-iteration-count: infinite;
  animation-duration: 40s;
}


@keyframes sncc-fader {
  0% {
    opacity:1;
  }
  16% {
    opacity:1;
  }
  21% {
    opacity:0;
  }
  95% {
    opacity:0;
  }
  100% {
    opacity:1;
  }
}

.bg-image-wrapper img:nth-of-type(1) {
  animation-delay: 32s;
}
.bg-image-wrapper img:nth-of-type(2) {
  animation-delay: 24s;
}
.bg-image-wrapper img:nth-of-type(3) {
  animation-delay: 16s;
}
.bg-image-wrapper img:nth-of-type(4) {
  animation-delay: 8s;
}
.bg-image-wrapper img:nth-of-type(5) {
  animation-delay: 0;
}

The resulting animation looks something like this:

The other piece of the puzzle was emulating the behavior of background: cover which resizes a background image to fill the entire width of a div and positions the image vertically in a consistent way. In general I really like using this attribute. I struggled to get things working on my own, but eventually came across a great code example of how to get things working. So I copied that implementation and it worked perfectly.

Fixed Nav

I was worried that getting the navigation bar to stay consistently positioned at the bottom of the page and allowing for scrolling — while also working responsively — was going to be a bit of a challenge. But in the end the solution was relatively simple.

The navigation bar is structured in a very typical way — as an unordered list with each menu element represented as a list item, like so:

<div id="navigation">
    <ul>
      <li><a href="url1">menu item 1</a></li>
      <li><a href="url2">menu item 2</a></li>
      <li><a href="url3">menu item 3</a></li>
    </ul>
</div>

To get it to ‘stick’ to the bottom of the page, I just placed it using position: absolute, gave it a fixed height, and set the width to 100%. Surprisingly, worked great just like that, and also allowed the page to be scrolled to reveal the content further down the page.

You can view the updated homepage by visiting snccdigital.org.

Behind the Scenes, Duke Digital Repository, Technology, User Experience

Turning on the Rights in the Duke Digital Repository

June 30, 2017 Sean Aery 3 Comments

As 2017 reaches its halfway point, we have concluded another busy quarter of development on the Duke Digital Repository (DDR). We have several new features to share, and one we’re particularly delighted to introduce is Rights display.

Back in March, my colleague Maggie Dickson shared our plans for rights management in the DDR, a strategy built upon using rights status URIs from RightsStatements.org, and in a similar fashion, licenses from Creative Commons. In some cases, we supplement the status with free text in a local Rights Note property. Our implementation goals here were two-fold: 1) use standard statuses that are machine-readable; 2) display them in an easily understood manner to users.

New rights display feature in action on a digital object.

What to Display

Getting and assigning machine-readable URIs for Rights is a significant milestone in its own right. Using that value to power a display that makes sense to users is the next logical step. So, how do we make it clear to a user what they can or can’t do with a resource they have discovered? While we could simply display the URI and link to its webpage (e.g., http://rightsstatements.org/vocab/InC-EDU/1.0/ ) the key info still remains a click away. Alternatively, we could display the rights statement or license title with the link, but some of them aren’t exactly intuitive or easy on the eyes. “Attribution-NonCommercial-NoDerivatives 4.0 International,” anyone?

Inspiration

Looking around to see how other cultural heritage institutions have solved this problem led us to very few examples. RightsStatements.org is still fairly new and it takes time for good design patterns to emerge. However, Europeana — co-champion of the RightsStatements.org initiative along with DPLA — has a stellar collections site, and, as it turns out, a wonderfully effective design for displaying rights statuses to users. Our solution ended up very much inspired by theirs; hats off to the Europeana team.

Image from Europeana site. — Europeana Collections UI.

Icons

Both Creative Commons and RightsStatements.org provide downloadable icons at their sites (here and here). We opted to store a local copy of the circular SVG versions for both to render in our UI. They’re easily styled, they don’t take up a lot of space, and used together, they have some nice visual unity.

Labels & Titles

We have a lightweight Rails app with an easy-to-use administrative UI for managing auxiliary content for the DDR, so that made a good home for our rights statuses and associated text. Statements are modeled to have a URI and Title, but can also have three additional optional fields: short title, re-use text, and an array of icon classes.

Editing rights info associated with each statement.

Displaying the Info

We wanted to be sure to show the rights status in the flow of the rest of an object’s metadata. We also wanted to emphasize this information for anyone looking to download a digital object. So we decided to render the rights status prominently in the download menu, too.

Rights status in download menu — Rights status displays in the download menu.

Rights status also displays alongside other metadata.

What’s Next

Our focus in this area now shifts toward applying these newly available rights statuses to our existing digital objects in the repository, while ensuring that new ingests/deposits get assessed and assigned appropriate values. We’ll also have opportunities to refine where and how the statuses get displayed. We stand to learn a lot from our peer organizations implementing their own rights management strategies, and from our visitors as they use this new feature on our site. There’s a lot of work ahead, but we’re thrilled to have reached this noteworthy milestone.

Behind the Scenes, MSI

Infrastructure and Multispectral Imaging in the Library

June 23, 2017 Mike Adamo

As we continue to work on our “standard” full color digitization projects such as Section A and the William Gedney Photograph Collection, both of which are multiyear projects, we are still hard at work with a variety of things related to Multispectral Imaging (MSI). We have been writing documentation and posting it to our Knowledgebase, building tools to track MSI requests and establishing a dedicated storage space for MSI image stacks. Below are some high-level details about these things and the kinks we are ironing out of the MSI process. As with any new venture, it can be messy in the beginning and tedious to put all the details in order but in the end it’s worth it.

MSI Knowledge Base

We established a knowledge base for documents related to MSI that cover a wide variety of subjects: How-To articles, to do lists, templates, notes taken during imaging sessions, technical support issues and more. These documents will help us develop sound guidelines and workflows which in turn will make our work in this area more consistent, efficient and productive.

Dedicated storage space

Working with other IT staff, a new server space has been established specifically for MSI. This is such a relief because, as we began testing the system in the early days, we didn’t have a dedicated space for storing the MSI image stacks and most of our established spaces were permissions restricted, preventing our large MSI group from using it. On top of this we didn’t have any file management strategies in place for MSI. This made for some messy file management. From our first demo, initial testing and eventual purchase of the system, we used a variety of storage spaces and a number of folder structures as we learned the system. We used our shared Library server, the Digital Production Center’s production server, Box and Google Drive. Files were all over the place! What a mess! In our new dedicated space, we have established standard folder structures and file management strategies and store all of our MSI image stacks in one place now. Whew!

The Request Queue

In the beginning, once the MSI system was up and running, our group had a brainstorming session to identify a variety of material that we could use to test with and hone our skills in using the new system. Initially this queue was a bulleted list in Basecamp identifying an item. As we worked through the list it would sometimes be confusing as to what had already been done and what item was next. This process became more cumbersome because multiple people were working through the list at the same time, both on capture and processing, with no specific reporting mechanism to track who was doing what. We have recently built an MSI Request Queue that tracks items to be captured in a more straightforward, clear manner. We have included title, barcode and item information along with the research question to be answered, it priority level, due date, requester information and internal contact information. The MSI group will use this queue for a few weeks then tweak it as necessary. No more confusion.

The Processing Queue

As described in a previous post, capturing with MSI produces lots of image stacks that contain lots of files. On average, capturing one page can produce 6 image stacks totaling 364 images. There are 6 different stages of conversion/processing that the image stack goes through before it might be considered “done”, and the fact that everyone on the MSI team has other job responsibilities makes it difficult to carve out a large enough block of time to convert and process the image stacks through all of the stages. This made it difficult to know what items had been completely processed or not. We have recently built an MSI Processing Queue that tracks what stage of processing each item is in. We have included root file names, flat field information, PPI and a column for each phase of processing to indicate whether or not an image stack has passed through a phase. As with the Request Queue, the MSI group will use this queue for a few weeks then tweak it as necessary. No more confusion.

Duke University East Campus Progress Picture #27

As with most blog posts, the progress described above has been boiled down and simplified as to not bore you to death, but this is a fair amount of work nonetheless. Having dedicated storage and a standardized folder structure simplifies the management of lots of files and puts them in a predictable structure. Streamlining the Request Queue establishes a clear path of work and provides enough information about the request in order to move forward with a clear goal in mind. Developing a Processing Queue that provides a snapshot of the state of processing across multiple requests and provides enough information so that any staff member familiar with our MSI process can complete a request. Establishing a knowledge base to document our workflows and guidelines ties everything together in an organized and searchable manner making it easier to find information about established procedures and troubleshoot technical problems.

It is important to put this infrastructure in place and build a strong foundation for Multispectral Imaging at the Library so it will scale in the future. This is only the beginning!

_______________________

Want to learn even more about MSI at DUL?

Watch an imaging Session
Read other MSI posts on Duke Libraries’ Bitstreams and Preservation Underground blogs

Behind the Scenes, Collections, Digital Collections, New Collections, Projects

A Summer Day in the Life of Digital Collections

June 16, 2017 Molly Bragg

A recent tweet from my colleague in the Rubenstein Library (pictured above) pretty much sums up the last few weeks at work. Although I rarely work directly with students and classes, I am still impacted by the hustle and bustle in the library when classes are in session. Throughout the busy Spring I found myself saying, oh I’ll have time to work on that over the Summer. Now Summer is here, so it is time to make some progress on those delayed projects while keeping others moving forward. With that in mind here is your late Spring and early Summer round-up of Digital Collections news and updates.

Radio Haiti

The long anticipated launch of the Radio Haiti Archives is upon us. After many meetings to review the metadata profile, discuss modeling relationships between recordings, and find a pragmatic approach to representing metadata in 3 languages all in the Duke Digital Repository public interface, we are now in preview mode, and it is thrilling. Behind the scenes, Radio Haiti represents a huge step forward in the Duke Digital Repository’s ability to store and play back audio and video files.

You can already listen to many recordings via the Radio Haiti collection guide, and we will share the digital collection with the world in late June or early July. In the meantime, check out this teaser image of the homepage.

Section A

My colleague Meghan recently wrote about our ambitions Section A digitization project, which will result in creating finding aids for and digitizing 3000+ small manuscript collections from the Rubenstein library. This past week the 12 people involved in the project met to review our workflow. Although we are trying to take a mass digitization and streamlined approach to this project, there are still a lot of people and steps. For example, we spent about 20-30 minutes of our 90 minute meeting reviewing the various status codes we use on our giant Google spreadsheet and when to update them. I’ve also created a 6 page project plan that encompasses both a high and medium level view of the project. In addition to that document, each part of the process (appraisal, cataloging review, digitization, etc.) also has their own more detailed documentation. This project is going to last at least a few years, so taking the time to document every step is essential, as is agreeing on status codes and how to use them. It is a big process, but with every box the project gets a little easier.

Status codes for tracking our evaluation, remediation, and digitization workflow.

Diversity and Inclusion Digitization Initiative Proposals and Easy Projects

As Bitstreams readers and DUL colleagues know, this year we instituted 2 new processes for proposing digitization projects. Our second digitization initiative deadline has just passed (it was June 15) and I will be working with the review committee to review new proposals as well as reevaluate 2 proposals from the first round in June and early July. I’m excited to say that we have already approved one project outright (Emma Goldman papers), and plan to announce more approved projects later this Summer.

We also codified “easy project” guidelines and have received several easy project proposals. It is still too soon to really assess this process, but so far the process is going well.

Transcription and Closed Captioning

Speaking of A/V developments, another large project planned for this Summer is to begin codifying our captioning and transcription practices. Duke Libraries has had a mandate to create transcriptions and closed captions for newly digitized A/V for over a year. In that time we have been working with vendors on selected projects. Our next steps will serve two fronts; on the programmatic side we need review the time and expense captioning efforts have incurred so far and see how we can scale our efforts to our backlog of publicly accessible A/V. On the technology side I’ve partnered with one of our amazing developers to sketch out a multi-phase plan for storing and providing access to captions and time-coded transcriptions accessible and searchable in our user interface. The first phase goes into development this Summer. All of these efforts will no doubt be the subject of a future blog post.

Testing VTT captions of Duke Chapel Recordings in JWPlayer

Summer of Documentation

My aspirational Summer project this year is to update digital collections project tracking documentation, review/consolidate/replace/trash existing digital collections documentation and work with the Digital Production Center to create a DPC manual. Admittedly writing and reviewing documentation is not the most exciting Summer plan, but with so many projects and collaborators in the air, this documentation is essential to our productivity, communication practices, and my personal sanity.

Late Spring Collection launches and Migrations

Over the past few months we launched several new digital collections as well as completed the migration of a number of collections from our old platform into the Duke Digital Repository.

New Collections:

A. Brouseau & Co. records: https://repository.duke.edu/dc/abrouseauco-002388080
Abolitionist Speech: https://repository.duke.edu/dc/abolitionistspeech-001099688
Frank Baker Collection of Wesleyana and British Methodism: https://repository.duke.edu/dc/frankbakercol
Wesley Family papers: https://repository.duke.edu/dc/wesleyfamilypapers
William Wilberforce papers: https://repository.duke.edu/dc/wilberf
Samuel Wilberforce papers: https://repository.duke.edu/dc/wilberforcesamuel

Migrated Collections:

Alexander H. Stephens Papers: https://repository.duke.edu/dc/ahstephens
Classical String Quartets: https://repository.duke.edu/dc/quartets
Duke Football Programs: https://repository.duke.edu/dc/dfp
History of Medicine Artifacts: https://repository.duke.edu/dc/homartifacts
Italian Cultural Posters: https://repository.duke.edu/dc/italianposters
Manuscripts and Woodcuts: Visions and Designs from Bloomsbury: https://repository.duke.edu/dc/bloomsbury
Marshall Meyer Papers: https://repository.duke.edu/dc/meyermarshall
Musee des Horreurs: https://repository.duke.edu/dc/museedeshorreurs
Paul Kwilecki Photographs: https://repository.duke.edu/dc/kwilecki
Protestant Children and Families: https://repository.duke.edu/dc/protfam
Ration Coupons on the Home Front: https://repository.duke.edu/dc/hfc
Russian Posters Collection: https://repository.duke.edu/dc/russianposters

…And so Much More!

In addition to the projects above, we continue to make slow and steady progress on our MSI system, are exploring using the FFv1 format for preserving selected moving image collections, planning the next phase of the Digital Collections migration into the Duke Digital Repository, thinking deeply about collection level metadata and structured metadata, planning to launch newly digitized Gedney images, integrating digital objects in finding aids and more. No doubt some of these efforts will appear in subsequent Bitstreams posts. In the meantime, let’s all try not to let this Summer fly by too quickly!

Behind the Scenes, Projects

The New and Improved SNCC Digital Gateway

May 31, 2017 Karlyn Forner

It may only be 6 months old, but as of May 31, the SNCC Digital Gateway is sporting a new look. Since going live in December 2016, we’ve been doing assessment, talking to contemporary activists and movement veterans and conducting user testing and student surveys. The feedback’s been overwhelmingly positive, but a few suggestions kept coming up. Give people a better sense of who SNCC was right from the homepage, and make it more active. Connect SNCC’s history to organizing today. As one of the young organizers put it, “What is it about SNCC’s legacy now that matters for people?” So we took those suggestions to heart and are proud to present a reworked, redesigned SNCC Digital Gateway. Keep reading for a breakdown of what’s new and why.

Today Section

The new Today section highlights important strategies and lessons from SNCC’s work and explores their usefulness to today’s struggles. Through short, engaging videos, contemporary activists talk about how SNCC’s work continues to be relevant to their organizing today. The nine framing questions and answers of today’s organizers speak to enduring themes at the heart of SNCC’s work: uniting with local people to build a grassroots movement for change that empowered Black communities and transformed the nation. Check out this example:

More Expansive Homepage

The new homepage is longer and gives visitors to the site more context and direction. It includes descriptions of who SNCC was and links users to The Story of SNCC, which tells an expansive but concise history of SNCC’s work. It features videos from the new Today section, and gives users a way to explore the site through themes like voting rights, the organizing tradition, and Black Power.

Themes

Want to know more about voting rights? Black Power? Or are you not as familiar with SNCC’s history and need an entry point? The theme buttons on the homepage give users a window into SNCC’s history through particular aspects of the organization’s work. Theme pages feature select profiles and events focused on a central component of SNCC’s organizing. From there, click through the documents or follow the links to dig deeper into the story.

Navigation Updates

To improve navigation for the site, we’ve changed the name of the History section to Timeline and the former Perspectives to Our Voices. We’ve also moved the About section to the footer to make space for the new Today section.

Have suggestions? Comments? We’re always interested in what you’re thinking. Add a comment or send us an e-mail to snccdigital@gmail.org.

Behind the Scenes, Duke Digital Repository

Going with the Flow: building a research data curation workflow

April 14, 2017 Moira Downey

Why research data? Data generated by scholars in the course of investigation are increasingly being recognized as outputs nearly equal in importance to the scholarly publications they support. Among other benefits, the open sharing of research data reinforces unfettered intellectual inquiry, fosters reproducibility and broader analysis, and permits the creation of new data sets when data from multiple sources are combined. Data sharing, though, starts with data curation.

In January of this year, Duke University Libraries brought on four new staff members–two Research Data Management Consultants and two Digital Content Analysts–to engage in this curatorial effort, and we have spent the last few months mapping out and refining a research data curation workflow to ensure best practices are applied to managing data before, during, and after ingest into the Duke Digital Repository.

What does this workflow entail? A high level overview of the process looks something like the following:

After collecting their data, the researcher will take what steps they are able to prepare it for deposit. This generally means tasks like cleaning and de-identifying the data, arranging files in a structure expected by the system, and compiling documentation to ensure that the data is comprehensible to future researchers. The Research Data Management Consultants will be on hand to help guide these efforts and provide researchers with feedback about data management best practices as they prepare their materials.

Depositors will then be asked to complete a metadata form and electronically sign a deposit agreement defining the terms of deposit. After we receive this information, someone from our team will invite the depositor to transfer their files to us, usually through Box.

As this stage, the Research Data Management Consultants will begin a preliminary review of the researcher’s data by performing a cursory examination for personally identifying or protected health information, inspecting the researcher’s documentation for comprehension and completeness, analyzing the submitted metadata for compliance with the research data application profile, and evaluating file formats for preservation suitability. If they have any concerns, they will contact the researcher to make some suggestions about ways to better align the deposit with best practices.

When the deposit is in good shape, the Research Data Management Consultants will notify the Digital Content Analysts, who will finalize the file arrangement and migrate some file formats, generate and normalize any necessary or missing metadata, ingest the files into the repository, and assign the deposit a DOI. After the ingest is complete, the Digital Content Analysts will carry out some quality assurance on the data to verify that the deposit was appropriately and coherently structured and that metadata has been correctly assigned. When this is confirmed, they will publish the data in the repository and notify the depositor.

Of course, this workflow isn’t a finished piece–we hope to continue to clarify and optimize the process as we develop relationships with researchers at Duke and receive more data. The Research Data Management Consultants in particular are enthusiastic about the opportunity to engage with scholars earlier in the research life cycle in order to help them better incorporate data curation standards in the beginning phases of their projects. All of us are looking forward to growing into our new roles, while helping to preserve Duke’s research output for some time to come.

Behind the Scenes, Duke Digital Repository, User Experience

Rights Management and the Duke Digital Repository

March 10, 2017 Maggie Dickson

Last spring, we were awfully excited to see the DPLA/Europeana release of RightStatements.org, a suite of standardized rights statements for describing the copyright and re-use status of digital resources. We have never had a comprehensive approach towards rights management for the Duke Digital Repository, but with the release of RightsStatements.org, we now feel we are equipped to wrestle that beast.

Managing and communicating rights statuses for digital collections has long been a challenge for us. The DDR currently allows for the application and display of Creative Commons licenses, which can be used for situations where the copyright holders themselves can assert the rights statuses for their own resources. RightsStatements.org fills a giant gap for us, in that it allow us to assign machine-readable rights to repository resources for which we know something about the rights status but do not hold the copyrights for. Additionally, these statements accommodate for the often fluid and ambiguous nature of copyrights for cultural heritage materials.

So, it’s been nearly a year since the statements were published, and during that time a community best practice has started to develop. The approach we have decided on for rights management in the Duke Digital Repository follows this emerging best practice, and involves using one field – Dublin Core Rights, as that is the metadata standard our repository uses – to store either a Creative Commons or RightsStatements.org URI, and nothing but that URI, and another field – a local property which we are calling ‘Rights Note’ – to store free text contextual information relating to the rights status of the resource (as long as it’s not in conflict with rights statement applied). Having machine-processable rights statuses means we will have a much better rights management strategy (we don’t currently have a way to report on the rights status of repository materials), as well as the ability to clearly communicate to users what they can and cannot do with resources they find.

Now that we’ve got a strategy for doing rights management, however, we need to develop a strategy for implementing it. We’ll tackle the low-hanging fruit first – collections that have a single, identifiable creator or for which the date ranges put them into the public domain – and then move on to the trickier stuff – for example, collections representing multiple or unidentified creators. Digital collections of archival materials present especially difficult challenges, as the the repository ‘itemness’ is frequently at the folder-level, meaning that the ‘item’, in these cases, might contain works by multiple creators of varying rights statuses (think of a folder of correspondence, for example).

The good news is, there are a lot of smart people working on addressing these challenges. Laura Capell and Elliott Williams of the University of Miami published a helpful poster, Assigning Rights Statements to Legacy Digital Collections describing the the decision matrix they developed to help them apply rights statements to their digital collections, and as I was writing this blog post, the Society of American Archivists circulated their Guide to Implementing Rights Statements from RightsStatements.org (nice timing, SAA!). I’m hoping to find some good nuggets of wisdom in its pages. We feel especially well-positioned to tackle rights management here at Duke, as Dave Hansen, who was deeply involved in the development of RightsStatements.org, joined us as our Director of Copyright and Scholarly Communications last year. We’d love to hear from other organizations as they develop their own local implementations – we know we’re not in this alone!

Behind the Scenes, Digitization Expertise, Technology

The Outer Limits of Aspect Ratios

February 16, 2017 Alex Marsh

“There is nothing wrong with your television set. Do not attempt to adjust the picture. We are controlling transmission. We will control the horizontal. We will control the vertical. We repeat: there is nothing wrong with your television set.”

That was part of the cold open of one of the best science fiction shows of the 1960’s, “The Outer Limits.” The implication being that by controlling everything you see and hear in the next hour, the show’s producers were about to blow your mind and take you to the outer limits of human thought and fantasy, which the show often did.

In regards to controlling the horizontal and the vertical, one of the more mysterious parts of my job is dealing with aspect ratios when it comes to digitizing videotape. The aspect ratio of any shape is the proportion of it’s dimensions. For example, the aspect ratio of a square is always 1 : 1 (width : height). That means, in any square, the width is always equal to the height, regardless of whether a square is 1-inch wide or 10-feet wide. Traditionally, television sets displayed images in a 4 : 3 ratio. So, if you owned a 20” CRT (cathode ray tube) TV back in the olden days, like say 1980, the broadcast image on the screen was 16” wide by 12” high. So, the height was 3/4 the size of the width, or 4 : 3. The 20” dimension was determined by measuring the rectangle diagonally, and was mainly used to categorize and advertise the TV.

Almost all standard-definition analog videotapes, like U-matic, Beta and VHS, have a 4 : 3 aspect ratio. But when digitizing the content, things get more complicated. Analog video monitors display pixels that are tall and thin in shape. The height of these pixels is greater than their width, whereas modern computer displays use pixels that are square in shape. On an analog video monitor, NTSC video displays at roughly 720 (tall and skinny) pixels per horizontal line, and there are 486 visible horizontal lines. If you do the math on that, 720 x 486 is not 4 : 3. But because the analog pixels display tall and thin, you need more of them aligned vertically to fill up a 4 : 3 video monitor frame.

When Duke Libraries digitizes analog video, we create a master file that is 720 x 486 pixels, so that if someone from the broadcast television world later wants to use the file, it will be native to that traditional standard-definition broadcast specification. However, in order to display the digitized video on Duke’s website, we make a new file, called a derivative, with the dimensions changed to 640 x 480 pixels, because it will ultimately be viewed on computer monitors, laptops and smart phones, which use square pixels. Because the pixels are square, 640 x 480 is mathematically a 4 : 3 aspect ratio, and the video will display properly. The derivative video file is also compressed, so that it will stream smoothly regardless of internet bandwidth limits.

“We now return control of your television set to you. Until next week at the same time, when the control voice will take you to – The Outer Limits.”

Behind the Scenes, Digitization Expertise, Equipment, MSI, Technology

Multispectral Imaging Through Collaboration

Image January 19, 2017 Mike Adamo

I am sure you have all been following the Library’s exploration into Multispectral Imaging (MSI) here on Bitstreams, Preservation Underground and the News & Observer. Previous posts have detailed our collaboration with R.B. Toth Associates and the Duke Eye Center, the basic process and equipment, and the wide range of departments that could benefit from MSI. In early December of last year (that sounds like it was so long ago!), we finished readying the room for MSI capture, installed the equipment, and went to MSI boot camp.

Obligatory before and after shot. In the bottom image, the new MSI system is in the background on the left with the full spectrum system that we have been using for years on the right. Other additions to the room are blackout curtains, neutral gray walls and black ceiling tiles all to control light spill between the two camera systems. Full spectrum overhead lighting and a new tile floor were installed which is standard for an imaging lab in the Library.

Well, boot camp came to us. Meghan Wilson, an independent contractor who has worked with R.B. Toth Associates for many years, started our training with an overview of the equipment and the basic science behind it. She covered the different lighting schemes and when they should be used. She explained MSI applications for identifying resins, adhesives and pigments and how to use UV lighting and filters to expose obscured text. We quickly went from talking to doing. As with any training session worth its salt, things went awry right off the bat (not Meghan’s fault). We had powered up the equipment but the camera would not communicate with the software and the lights would not fire when the shutter was triggered. This was actually a good experience because we had to troubleshoot on the spot and figure out what was going on together as a team. It turns out that there are six different pieces of equipment that have to be powered-up in a specific sequence in order for the system to communicate properly (tee up Apollo 13 soundtrack). Once we got the system up and running we took turns driving the software and hardware to capture a number of items that we had pre-selected. This is an involved process that produces a bunch of files that eventually produce an image stack that can be manipulated using specialized software. When it’s all said and done, files have been converted, cleaned, flattened, manipulated and variations produced that are somewhere in the neighborhood of 300 files. Whoa!

This is not your parents’ point and shoot—not the room, the lights, the curtains, the hardware, the software, the pricetag, none of it. But it is different in another more important way too. This process is team-driven and interdisciplinary. Our R&D working group is diverse and includes representatives from the following library departments.

The Digital Production Center (DPC) has expertise in high-end, full spectrum imaging for cultural heritage institutions along with a deep knowledge of the camera and lighting systems involved in MSI, file storage, naming and management of large sets of files with complex relationships.
The Duke Collaboratory for Classics Computing (DC3) offers a scholarly and research perspective on papyri, manuscripts, etc., as well as experience with MSI and other imaging modalities
The Conservation Lab brings expertise in the Libraries’ collections and a deep understanding of the materiality and history of the objects we are imaging.
Duke Libraries’ Data Visualization Services (DVS) has expertise in the processing and display of complex data.
The Rubenstein Library’s Collection Development brings a deep understanding of the collections, provenance and history of materials, and valuable contacts with researchers near and far.

To get the most out of MSI we need all of those skills and perspectives. What MSI really offers is the ability to ask—and we hope answer—strings of good questions. Is there ink beneath that paste-down or paint? Is this a palimpsest? What text is obscured by that stain or fire-damage or water damage? Can we recover it without having to intervene physically? What does the ‘invisible’ text say and what if anything does this tell us about the object’s history? Is the reflectance signature of the ink compatible with the proposed date or provenance of the object? That’s just for starters. But you can see how even framing the right question requires a range of perspectives; we have to understand what kinds of properties MSI is likely to illuminate, what kinds of questions the material objects themselves suggest or demand, what the historical and scholarly stakes are, what the wider implications for our and others’ collections are, and how best to facilitate human interface with the data that we collect. No single person on the team commands all of this.

Working in any large group can be a challenge. But when it all comes together, it is worth it. Below is a page from Jantz 723, one processed as a black and white image and the other a Principal Component Analysis produced by the MSI capture and processed using ImageJ and a set of tools created by Bill Christens-Barry of R.B. Toth Associates with false color applied using Photoshop. Using MSI we were able to better reveal this watermark which had previously been obscured.

I think we feel like 16-year-old kids with newly minted drivers’ licenses who have never driven a car on the highway or out of town. A whole new world has just opened up to us, and we are really excited and a little apprehensive!

What now?

Practice, experiment, document, refine. Over the next 12 (16? 18) months we will work together to hone our collective skills, driving the system, deepening our understanding of the scholarly, conservation, and curatorial use-cases for the technology, optimizing workflow, documenting best practices, getting a firm grip on scale, pace, and cost of what we can do. The team will assemble monthly, practice what we have learned, and lean on each other’s expertise to develop a solid workflow that includes the right expertise at the right time. We will select a wide variety of materials so that we can develop a feel for how far we can push the system and what we can expect day to day. During all of this practice, workflows, guidelines, policies and expectations will come into sharper focus.

As you can tell from the above, we are going to learn a lot over the coming months. We plan to share what we learn via regular posts here and elsewhere. Although we are not prepared yet to offer MSI as a standard library service, we are interested to hear your suggestions for Duke Library collection items that may benefit from MSI imaging. We have a long queue of items that we would like to shoot, and are excited to add more research questions, use cases, and new opportunities to push our skills forward. To suggest materials, contact Molly Bragg, Digital Collections Program Manager (molly.bragg at Duke.edu), Joshua Sosin, Associate Professor in Classical Studies & History (jds15 at Duke.edu) or Curator of Collections (andrew.armacost at Duke.edu).