Category Archives: Behind the Scenes

Change is afoot in Software Development and Integration Services

We’re experimenting with changing our approach to projects in Software Development and Integration Services (SDIS). There’s been much talk of Agile (see the Agile Manifesto) over the past few years within our department, but we’ve faced challenges implementing this as an approach to our work given our broad portfolio, relatively small team, and large number of internal stakeholders.

After some productive conversations among staff and managers in SDIS, where we reflected on our work over the past few years, we decided to commit to applying the Scrum framework to one or more projects.

Scrum Framework
Source: https://commons.wikimedia.org/wiki/File:Scrum_Framework.png

There are many resources available for learning about Agile and Scrum; a few in particular have been most useful to me so far in learning about the framework.

Scrum seems best suited to developing new products or software. It defines the roles, workflow, and artifacts that help a team make the most of its capacity to build the highest-value features first and deliver usable software on a regular and frequent schedule.

To start, we’ll be applying this process to a new project to build a prototype of a research data repository based on Hyrax. We’ve formed a small team, including a product owner, scrum master, and development team, to build the repository. So far, we’ve developed an initial backlog of requirements in the form of user stories in Jira, the software we use to manage projects. We’ve done some backlog refinement to prioritize the most important and highest-value features, and we’ve defined acceptance criteria for the ones that we’ll consider first. The development team has estimated story points (a relative measure of effort and complexity) for some of the user stories to help us with sprint planning and release projection. Our first two-week sprint will begin the week after Thanksgiving. By the end of January we expect to have completed four two-week sprints and to have a pilot ready, with a basic set of features implemented, for evaluation by internal stakeholders.

One of the important aspects of Scrum is that group reflection on the process itself is built into the workflow through retrospective meetings after each sprint. Done right, routine retrospectives reinforce what is working well and allow for adjustments to address things that aren’t. In the future we hope to adapt what we learn from applying the Scrum framework to the research data repository pilot to improve our approach to other aspects of our work in SDIS.

The Letter Compels You!

A few weeks ago at Duke Libraries, we had our 5th annual “Screamfest.” The event, which occurs on Halloween, is when the David M. Rubenstein Rare Book & Manuscript Library shows off unique holdings related to extrasensory perception, premature burial, 16th-century witches, devils (not just blue ones), creepy advertisements, eerie pulp fiction, scary zines and more. Attendees sometimes show up in costumes, and there is, of course, lots of candy. I always eat too much.

As I was looking through the various materials on display, there was one item in particular that seemed to draw me in. In fact, you could say I was compelled to read it, almost as if I were not in control of my actions! It’s a simple one-page letter, written in 1949 by Luther M. Schulze, a Lutheran pastor in Washington, D.C., addressed to J.B. Rhine, the scientist who founded parapsychology as a branch of psychology and started the Duke Parapsychology Laboratory, which operated at Duke University from 1930 until the 1960s. Parapsychology is the study of phenomena such as telepathy, clairvoyance, hypnosis, psychokinesis and other paranormal mysteries.

The 1949 letter from the Rev. Luther Schulze to J.B. Rhine.

The letter begins: “We have in our congregation a family who are being disturbed by poltergeist phenomena. It first appeared about January 15, 1949. The family consists of the maternal grandmother, a fourteen (year) old boy who is an only child, and his parents. The phenomena is present only in the boy’s presence. I had him in my home on the night of February 17-18 to observe for myself. Chairs moved with him and one threw him out. His bed shook whenever he was in it.” The letter also states that the family says “words appeared on the boy’s body” and he “has visions of the devil and goes into a trance and speaks in a strange language.”

As a fan of classic horror films, this letter immediately reminded me of what is generally regarded to be the scariest movie of all time, “The Exorcist.” I was too young to see the film when it was originally released in 1973, but got the chance to see the director’s cut on the big screen in 2000. It earns that reputation, but not because of gratuitous gore like you see in today’s monotonously sadistic slasher films. The Exorcist is the scariest movie ever because it expertly taps into one of the central fears within our Judeo-Christian collective subconscious: that evil isn’t just something we battle outside of ourselves. The most frightening evil of all is that which can take root within us.

It turns out there’s a direct link between this mysterious letter to J.B. Rhine and “The Exorcist.” William Peter Blatty, who wrote the 1971 novel and adapted it for the film, based his book on a real-life 1949 exorcism performed by Jesuit priests in St. Louis. The exorcism was performed on a 14-year-old boy given the pseudonym “Roland Doe,” the same boy Rev. Schulze refers to in his letter to J.B. Rhine at Duke. By the time Rhine received the letter, Roland’s family had taken him to St. Louis for the exorcism, having given up on conventional psychiatry. Blatty changed the gender and age of the child for his novel and screenplay, but many of the occurrences described in the letter are recognizable to anyone familiar with the book or movie.

The reply from J.B. Rhine to the Rev. Luther Schulze.

Unfortunately for this blog post, poltergeists or demons or psychosomatic illnesses (depending on your point of view) often vanish as unexpectedly as they show up, and that’s what happened in this case. After an initial reply to the letter from L.E. Rhine, his wife and lab partner, J.B. Rhine responded to Rev. Schulze that he was “deeply interested in this case,” and that “the most likely normal explanation is that the boy is, himself led to create the effect of being the victim of mysterious agencies or forces and might be sincerely convinced of it. Such movements as those of the chair and bed might, from your very brief account of them, have originated within himself.” Part of the reason Rhine was successful in his field is that he was an empirical skeptic. Rhine later visited Schulze in person, but by then, the exorcism had ended, and Roland’s condition had returned to normal.

According to subsequent research, Roland married, had children, and leads a quiet, ordinary life near Washington, D.C. He refuses to talk about the events of 1949, other than saying he doesn’t remember. In the mid-1960s, Duke and J.B. Rhine parted ways, and the Duke Parapsychology Lab closed. This was likely due in part to the fact that, despite Rhine’s extensive research and empirical testing, parapsychology was, and still is, considered a dubious pseudoscience. Duke probably realized the association wasn’t helping its reputation as a stellar academic institution. The Rhines continued their research, setting up the “Foundation for Research on the Nature of Man” independently of Duke. But the records of the Duke Parapsychology Laboratory are available for study at Duke Libraries. I wonder what other dark secrets might be discovered, brought to light and exorcized?

Yasak/Banned Kiosk

Recently I worked on a simple kiosk for a new exhibit in the library, Yasak/Banned: Political Cartoons from Late Ottoman and Republican Turkey. The interface presents users with three videos, featuring the curators of the exhibit, that explain the historical context and significance of the items in the exhibit space. We also included a looping background video that highlights many of the illustrations from the exhibit and plays examples of Turkish music to help envelop visitors in the overall experience.

With some of the things I’ve built in the past, I would set up different sections of an interface for viewing videos, but in this case I wanted to keep things as simple as possible. My plan was to open each video in an overlay using fancybox. However, I didn’t want the looping background audio to interfere with the curator videos. We also needed to include a ‘play/pause’ button to turn off the background music in case there was something going on in the exhibit space. And I wanted all these transitions to be as smooth as possible so that the experience wouldn’t be jarring. What seemed reasonably simple on the surface proved a little more difficult than I thought.

After trying a few different approaches with limited success, Stack Overflow, as usual, revealed a promising direction: the key turned out to be jQuery’s .animate() method.

The first step was to write functions to ‘play/pause’ – even though in this case we’d let the background video continue to play and only lower the volume to zero. The playVid function sets the volume to 0, then animates it back up to 100% over three seconds. pauseVid does the inverse, but more quickly.

// references to the background video and the play/pause buttons
// (element IDs match the markup shown below)
var vid = document.getElementById('myVideo');
var playBtn = document.getElementById('play-btn');
var pauseBtn = document.getElementById('pause-btn');

function playVid() {
  // fade the background music back in over three seconds
  vid.volume = 0;
  $('#myVideo').animate({
    volume: 1
  }, 3000); // 3 seconds
  playBtn.style.display = 'none';
  pauseBtn.style.display = 'block';
}

function pauseVid() {
  // really just fading out the music, quickly
  vid.volume = 1;
  $('#myVideo').animate({
    volume: 0
  }, 750); // .75 seconds

  playBtn.style.display = 'block';
  pauseBtn.style.display = 'none';
}

The play and pause buttons, which are positioned on top of each other using CSS, are shown or hidden by the functions above. They are also set to call the appropriate function using an onclick event. Their markup looks like this:

<button onclick="playVid()" type="button" id="play-btn">Play</button>
<button onclick="pauseVid()" type="button" id="pause-btn">Pause</button>

Next I added an onclick event calling our pause function to the link that opens our fancybox window with the video in it, like so:

<a data-fancybox-type="iframe" href="https://my-video-url" onclick="pauseVid();"><img src="the-video-thumbnail" /></a>

And finally I used the fancybox callback afterClose to ramp up the background video volume over three seconds:

// part of the options object passed to fancybox at initialization;
// this fades the background music back in after the overlay closes
afterClose: function () {
  vid.volume = 0;
  $('#myVideo').animate({
    volume: 1
  }, 3000); // 3 seconds
  playBtn.style.display = 'none';
  pauseBtn.style.display = 'block';
},
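For context, that callback lives in the options object handed to fancybox when the overlay is initialized; a sketch of that setup (the selector is an assumption, not taken from the kiosk code) might look like:

// illustrative initialization only
$('a[data-fancybox-type="iframe"]').fancybox({
  afterClose: function () {
    // ... as above ...
  }
});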

play/pause demo

You can view a demo version and download a zip of the assets in case it’s helpful. I think the final product works really well, and I’ll likely use a similar implementation in future projects.

New and Recently Migrated Digital Collections

In the past 3 months, we have launched a number of exciting digital collections!  Our brand new offerings are either available now or will be very soon.  They are:

  • Duke Property Plats: https://repository.duke.edu/dc/uapropplat
  • Early Arabic Manuscripts (included in the recently migrated Early Greek Manuscripts): https://repository.duke.edu/dc/earlymss
  • International Broadsides (added to migrated Broadsides and Ephemera collection): https://repository.duke.edu/dc/broadsides
  • Orange County Tax List Ledger, 1875: https://repository.duke.edu/dc/orangecountytaxlist
  • Radio Haiti Archive, second batch of recordings: https://repository.duke.edu/dc/radiohaiti
  • William Gedney Finished Prints and Contact Sheets (newly re-digitized with new and improved metadata): https://repository.duke.edu/dc/gedney
A selection from the William Gedney Photographs digital collection

In addition to the brand new items, the digital collections team is constantly chipping away at the digital collections migration. The latest collections to move from Tripod 2 to the Duke Digital Repository are either available now or will be very soon.

One of the Greek items in the Early Manuscripts Collection.

Regular readers of Bitstreams are familiar with our digital collections migrations project; we first started writing about it almost 2 years ago when we announced the first collection to be launched in the new Duke Digital Repository interface.  Since then we have posted about various aspects of the migration with some regularity.

What we hoped would be a speedy transition is still a work in progress 2 years later. This is due to a variety of factors, one of which is that the work itself is very complex. Before we can move a collection into the digital repository, it has to be reviewed, all digital objects fully accounted for, and all metadata remediated and crosswalked into the DDR metadata profile. Sometimes this process requires little effort. Other times, however, especially with older collections, we have items with no metadata, or metadata with no items, or the numbers in our various systems simply do not match. Tracking down the answers can require some major detective work on the part of my amazing colleagues.

Despite these challenges, we eagerly press on.  As each collection moves we get a little closer to having all of our digital collections under preservation control and providing access to all of them from a single platform.  Onward!

September scale-up: promoting the DDR and associated services to faculty and students

It’s September, and Duke students aren’t the only folks on campus in back-to-school mode. Here at the Duke Digital Repository, we are gearing up to begin promoting our research data curation services in earnest. Over the last eight months, our four new research data staff have been busy getting to know the campus and the libraries, getting to know the repository itself and the tools we’re working with, and establishing a workflow. Now we’re ready to begin actively recruiting research data depositors!

As our colleagues in Data and Visualization Services noted in a presentation just last week, we’re aiming to scale up our data services in a big way by engaging researchers at all stages of the research lifecycle, not just at the very end of a research project. We hope to make this a two-front effort. Through a series of ongoing workshops and consultations, the Research Data Management Consultants aspire to help researchers develop better data management habits and take the long-term preservation and re-use of their data into account when designing a project or applying for grants. On the back end of things, the Content Analysts will be able to carry out many of the manual tasks that facilitate that long-term preservation and re-use, and they are beginning to think about ways in which to tweak our existing software to better accommodate the needs of capital-D Data.

This past spring, the Data Management Consultants carried out a series of workshops intended to help researchers navigate the often muddy waters of data management and data sharing; topics ranged from available and useful tools to the occasionally thorny process of obtaining consent for, and the re-use of, data from human subjects.

Looking forward to the fall, the RDM consultants are planning another series of workshops to expand on the sessions given in the spring, covering new tools and strategies for managing research output. One of the tools we’re most excited to share is the Open Science Framework (OSF) for Institutions, which Duke joined just this spring. OSF is a powerful project management tool that helps promote transparency in research and allows scholars to associate their work and projects with Duke.

On the back end of things, much work has been done to shore up our existing workflows, and a number of policies, both internal and external, have been met with approval by the Repository Program Committee. The Content Analysts continue to become more familiar with the available repository tools while weighing in on ways in which we can make the software work better. The better part of the summer was devoted to collecting and analyzing requirements from research data stakeholders (among others), and we hope to put those needs in the development spotlight later this fall.

All of this is to say: we’re ready for it, so bring us your data!

Nested Folders of Files in the Duke Digital Repository

Born-digital archival materials present unique challenges for representation, access, and discovery in the DDR. A hard drive arrives at the archives and we want to preserve and provide access to the files. In addition to the content of the files, it’s often important to preserve, to some degree, the organization of the material on the hard drive in nested directories.

One challenge to representing complex inter-object relationships in the repository is the repository’s relatively simple object model. A collection contains one or more items. An item contains one or more components. And a component has one or more data streams. There’s no accommodation in this model for complex groups and hierarchies of items. We tend to talk about this as a limitation, but it also makes it possible to provide search and discovery of a wide range of kinds and arrangements of materials in a single repository and forces us to make decisions about how to model collections in sustainable and consistent ways. But we still need to preserve and provide access to the original structure of the material.

One approach is to ingest the disk image or a zip archive of the directories and files and store the content as a single file in the repository. This approach is straightforward, but makes it impossible to search for individual files in the repository or to understand much about the content without first downloading and unarchiving it.

As a first pass at solving this problem of how to preserve and represent files in nested directories in the DDR, we’ve taken a two-pronged approach. First, we use a simple model for disk image and directory content in the repository: every file is modeled as an item with a single component that contains the data stream of the file. This provides convenient discovery of and access to each individual file in the collection, but it does not represent any folder hierarchies; the files are just a flat list of objects contained by a collection.

Second, to preserve information about the structure of the files, we add an XML METS structMap as metadata on the collection, and we store on each item a metadata field containing the complete original file path of the file.
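As an illustration (the field names here are hypothetical, not the DDR’s actual schema; the identifier and directory names are borrowed from the sample below, and the file name is invented), a single ingested file might be represented something like this:

{
  "id": "ark:/99999/fk4j67r45s",
  "component": { "datastream": "content" },
  "original_file_path": "2017-0040/RL11405-LFF-0001_Programs/program-flyer.pdf"
}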

Below is a small sample of the kind of structural metadata that encodes the nested folder information on the collection. It encodes the structure and nesting, directory names (in the LABEL attribute), the order of files and directories, as well as the identifiers for each of the files/items in the collection.

<?xml version="1.0"?>
<mets xmlns="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <metsHdr>
    <agent ROLE="CREATOR">
      <name>REPOSITORY DEFAULT</name>
    </agent>
  </metsHdr>
  <structMap TYPE="default">
    <div LABEL="2017-0040" ORDER="1" TYPE="Directory">
      <div ORDER="1">
        <mptr LOCTYPE="ARK" xlink:href="ark:/99999/fk42j6qc37"/>
      </div>
      <div LABEL="RL11405-LFF-0001_Programs" ORDER="2" TYPE="Directory">
        <div ORDER="1">
          <mptr LOCTYPE="ARK" xlink:href="ark:/99999/fk4j67r45s"/>
        </div>
        <div ORDER="2">
          <mptr LOCTYPE="ARK" xlink:href="ark:/99999/fk4d50x529"/>
        </div>
        <div ORDER="3">
          <mptr LOCTYPE="ARK" xlink:href="ark:/99999/fk4086jd3r"/>
        </div>
      </div>
      <div LABEL="RL11405-LFF-0002_H1_Early-Records-of-Decentralization-Conference" ORDER="3" TYPE="Directory">
        <div ORDER="1">
          <mptr LOCTYPE="ARK" xlink:href="ark:/99999/fk4697f56f"/>
        </div>
        <div ORDER="2">
          <mptr LOCTYPE="ARK" xlink:href="ark:/99999/fk45h7t22s"/>
        </div>
      </div>
    </div>
  </structMap>
</mets>

Combining the 1:1 (item:component) object model with structural metadata that preserves the original directory structure of the files on the file system enables us to display a user interface that reflects the original structure of the content even though the structure of the items in the repository is flat.

There’s more to it of course. We had to develop a new ingest process that could take as its starting point a file path and then crawl it and its subdirectories to ingest files and construct the necessary structural metadata.
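The ingest code itself isn’t shown here, but the basic idea of the crawl can be sketched in a few lines of Node.js (purely illustrative, not the DDR’s actual implementation):

// illustrative sketch of a recursive directory crawl
const fs = require('fs');
const path = require('path');

function crawl(dir) {
  return fs.readdirSync(dir, { withFileTypes: true }).map(function (entry, i) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      // directories become labeled, ordered divs in the structMap
      return { label: entry.name, order: i + 1, type: 'Directory', children: crawl(full) };
    }
    // the real process would ingest the file here and record its new identifier (ARK)
    return { order: i + 1, originalFilePath: full };
  });
}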

On the UI end of things, a nifty JavaScript plugin called jsTree powers the interactive directory structure display on the collection page.

Because some of the collections are very large and loading a directory tree structure of 100,000 or more items would be very slow, we implemented a small web service in the application that loads the jsTree data only when someone clicks to open a directory in the interface.
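jsTree supports exactly this kind of lazy loading; a minimal configuration (the endpoint paths here are hypothetical) looks something like:

// '#' is jsTree's id for the root node; everything else asks the
// web service for just the children of the directory being opened
$('#directory-tree').jstree({
  core: {
    data: {
      url: function (node) {
        return node.id === '#' ? '/tree/root' : '/tree/children/' + node.id;
      }
    }
  }
});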

The file paths are also keyword searchable from within the public interface. So if a file is contained in a directory named “kitchen/fruits/bananas/this-banana.txt” you would be able to find the file this-banana.txt by searching for “kitchen” or “fruit” or “banana.”

This new functionality to ingest, preserve, and represent files in nested folder structures in the Duke Digital Repository will be included in the September release of the Duke Digital Repository.

SNCC Digital Gateway Homepage Updates

Earlier this summer I worked with the SNCC Digital Gateway team to launch a revised version of their homepage. The SNCC Digital Gateway site was originally launched in the Fall of 2016, and since then much more content has been incorporated into it. The team and their advisory board wanted to highlight some of this new content on the homepage (by making it scrollable) while also staying true to the original design.

The previous version of the homepage included two main features:

  • a large black and white photograph that would randomly load (based on five different options) every time a user visited the page
  • a ‘fixed’ primary navigation in the footer

Rotating Background Images

In my experience, the ‘go-to’ approach for doing any kind of image rotation is to use a JavaScript library, probably one that relies on jQuery. My personal favorite for a long time has been jQuery Cycle 2, which I’ve appreciated for its lightweight, flexible implementation and its price ($free!). With the new SNCC homepage, I wanted to figure out a way to both crossfade the background images and fade the caption text in and out elegantly. It was also critical that the captions match up perfectly with their associated images. I was worried that doing this with Cycle 2 was going to be overly complicated with respect to syncing the timing, as in some past projects I’d run into trouble keeping discrete carousels locked in sync after several iterations, for example when leaving the page up and running for several minutes.

I decided to try building the SNCC background rotation using CSS animations. In the past I’d shied away from using CSS animations for anything presented as a primary feature, or anything complex, because browser support was spotty. The current state of browser support is better, though it still has a ways to go. In my first attempt I tried crossfading the images as backgrounds of a wrapper div, since the background-size: cover property would make handling page resizing much easier. But I discovered that animating background images isn’t actually supported in the spec, even though it worked perfectly in Chrome and Opera. So instead I went with the approach of stacking the images on top of each other and changing their opacity one at a time, like so:

<div class="bg-image-wrapper">
  <img src="image-1.jpg" alt="">
  <img src="image-2.jpg" alt="">
  <img src="image-3.jpg" alt="">
  <img src="image-4.jpg" alt="">
  <img src="image-5.jpg" alt="">
</div>

I set up the structure for the captions in a similar way:

<div id="home-caption">
  <li>caption 1</li>
  <li>caption 2</li>
  <li>caption 3</li>
  <li>caption 4</li>
  <li>caption 5</li>
</div>

I won’t bore you with the details of CSS animation; in short, animations are based on keyframes that can be looped and applied to HTML elements. The one thing that proved to be a little tricky was the timing between the images and the captions, as the keyframes are expressed as percentages of the entire animation. This was further complicated by the types of transitions I was using (crossfading the images and linearly fading the captions), and by the fact that I wanted to slightly stagger the caption animations so that each caption comes in after the crossfade completes and transitions out just before the next crossfade starts, like so:

As time moves from left to right, the images and captions have independent transitions.

The SNCC team and I also discussed a few options for the overall timing of the transitions and settled on eight seconds per image. With five images in our rotation, the total time of the animation would be 40 seconds. The entire animation is applied to each image and offset with a delay based on its position in the .bg-image-wrapper stack. The CSS for the images looks like this:

.bg-image-wrapper img {
  animation-name: sncc-fader;
  animation-timing-function: ease-in-out;
  animation-iteration-count: infinite;
  animation-duration: 40s;
}


@keyframes sncc-fader {
  0% {
    opacity:1;
  }
  16% {
    opacity:1;
  }
  21% {
    opacity:0;
  }
  95% {
    opacity:0;
  }
  100% {
    opacity:1;
  }
}

.bg-image-wrapper img:nth-of-type(1) {
  animation-delay: 32s;
}
.bg-image-wrapper img:nth-of-type(2) {
  animation-delay: 24s;
}
.bg-image-wrapper img:nth-of-type(3) {
  animation-delay: 16s;
}
.bg-image-wrapper img:nth-of-type(4) {
  animation-delay: 8s;
}
.bg-image-wrapper img:nth-of-type(5) {
  animation-delay: 0s;
}
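The caption CSS isn’t included in the post; a hypothetical version of the staggered caption animation, reusing the same 40-second cycle and 8-second offsets, might look like:

/* hypothetical caption animation -- not the actual SNCC stylesheet */
#home-caption li {
  animation-name: sncc-caption-fader;
  animation-timing-function: linear;
  animation-iteration-count: infinite;
  animation-duration: 40s;
  opacity: 0;
}

@keyframes sncc-caption-fader {
  0%   { opacity: 0; }
  3%   { opacity: 1; } /* fade in after the crossfade completes */
  13%  { opacity: 1; }
  16%  { opacity: 0; } /* fade out before the next crossfade begins */
  100% { opacity: 0; }
}

/* captions get the same 8s stagger as their images, e.g. */
#home-caption li:nth-of-type(1) { animation-delay: 32s; }
#home-caption li:nth-of-type(5) { animation-delay: 0s; }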

The resulting animation looks something like this:

SNCC example rotation

The other piece of the puzzle was emulating the behavior of background-size: cover, which resizes a background image to fill the entire width of a div and positions the image vertically in a consistent way. In general I really like using this property. I struggled to get an equivalent working on my own for the stacked img elements, but eventually came across a great code example, copied that implementation, and it worked perfectly.
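As an aside, modern CSS can approximate the same behavior on img elements with object-fit; a hedged alternative sketch (not the implementation the site actually uses):

/* alternative approach using object-fit */
.bg-image-wrapper img {
  position: absolute;
  top: 0;
  left: 0;
  width: 100%;
  height: 100%;
  object-fit: cover; /* scale and crop the image like background-size: cover */
}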

Fixed Nav

I was worried that getting the navigation bar to stay consistently positioned at the bottom of the page and allowing for scrolling — while also working responsively — was going to be a bit of a challenge. But in the end the solution was relatively simple.

The navigation bar is structured in a very typical way — as an unordered list with each menu element represented as a list item, like so:

<div id="navigation">
    <ul>
      <li><a href="url1">menu item 1</a></li>
      <li><a href="url2">menu item 2</a></li>
      <li><a href="url3">menu item 3</a></li>
    </ul>
</div>

To get it to ‘stick’ to the bottom of the page, I placed it using position: absolute, gave it a fixed height, and set its width to 100%. Surprisingly, it worked great just like that, and it also allowed the page to be scrolled to reveal the content further down the page.
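In CSS terms, that amounts to something like the following (the height is a placeholder value):

/* sketch of the bottom-anchored nav */
#navigation {
  position: absolute;
  bottom: 0;
  width: 100%;
  height: 60px; /* hypothetical */
}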


You can view the updated homepage by visiting snccdigital.org.

Turning on the Rights in the Duke Digital Repository

As 2017 reaches its halfway point, we have concluded another busy quarter of development on the Duke Digital Repository (DDR). We have several new features to share, and one we’re particularly delighted to introduce is Rights display.

Back in March, my colleague Maggie Dickson shared our plans for rights management in the DDR, a strategy built upon using rights status URIs from RightsStatements.org and, in a similar fashion, licenses from Creative Commons. In some cases, we supplement the status with free text in a local Rights Note property. Our implementation goals here were two-fold: 1) use standard statuses that are machine-readable; 2) display them to users in an easily understood manner.

New rights display feature in action on a digital object.

What to Display

Getting and assigning machine-readable URIs for rights is a significant milestone in its own right. Using that value to power a display that makes sense to users is the next logical step. So, how do we make it clear to users what they can or can’t do with a resource they have discovered? While we could simply display the URI and link to its webpage (e.g., http://rightsstatements.org/vocab/InC-EDU/1.0/), the key info would still remain a click away. Alternatively, we could display the rights statement or license title with the link, but some of them aren’t exactly intuitive or easy on the eyes. “Attribution-NonCommercial-NoDerivatives 4.0 International,” anyone?

Inspiration

Looking around to see how other cultural heritage institutions have solved this problem led us to very few examples. RightsStatements.org is still fairly new and it takes time for good design patterns to emerge. However, Europeana — co-champion of the RightsStatements.org initiative along with DPLA — has a stellar collections site, and, as it turns out, a wonderfully effective design for displaying rights statuses to users. Our solution ended up very much inspired by theirs; hats off to the Europeana team.

Europeana Collections UI.

Icons

Both Creative Commons and RightsStatements.org provide downloadable icons at their sites. We opted to store local copies of the circular SVG versions of both to render in our UI. They’re easily styled, they don’t take up a lot of space, and used together they have some nice visual unity.

Circular icons from Creative Commons & RightsStatements.org.

Labels & Titles

We have a lightweight Rails app with an easy-to-use administrative UI for managing auxiliary content for the DDR, so that made a good home for our rights statuses and associated text. Statements are modeled to have a URI and Title, but can also have three additional optional fields: short title, re-use text, and an array of icon classes.

Editing rights info associated with each statement.
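As an illustration (the field names and optional values below are guesses based on the description above, not the DDR’s actual schema), a stored statement might look something like:

{
  "uri": "http://rightsstatements.org/vocab/InC-EDU/1.0/",
  "title": "In Copyright - Educational Use Permitted",
  "short_title": "In Copyright (Educational Use)",
  "reuse_text": "This item may be used for educational purposes.",
  "icon_classes": ["rs-icon-in-copyright", "rs-icon-education"]
}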

Displaying the Info

We wanted to be sure to show the rights status in the flow of the rest of an object’s metadata. We also wanted to emphasize this information for anyone looking to download a digital object. So we decided to render the rights status prominently in the download menu, too.

Rights status displays in the download menu.

Rights status also displays alongside other metadata.

What’s Next

Our focus in this area now shifts toward applying these newly available rights statuses to our existing digital objects in the repository, while ensuring that new ingests/deposits get assessed and assigned appropriate values. We’ll also have opportunities to refine where and how the statuses get displayed. We stand to learn a lot from our peer organizations implementing their own rights management strategies, and from our visitors as they use this new feature on our site. There’s a lot of work ahead, but we’re thrilled to have reached this noteworthy milestone.

Infrastructure and Multispectral Imaging in the Library

As we continue to work on our “standard” full color digitization projects, such as Section A and the William Gedney Photograph Collection, both of which are multiyear efforts, we are still hard at work on a variety of things related to Multispectral Imaging (MSI). We have been writing documentation and posting it to our knowledge base, building tools to track MSI requests, and establishing a dedicated storage space for MSI image stacks. Below are some high-level details about these things and the kinks we are ironing out of the MSI process. As with any new venture, it can be messy in the beginning and tedious to put all the details in order, but in the end it’s worth it.

MSI Knowledge Base

We established a knowledge base for documents related to MSI covering a wide variety of subjects: how-to articles, to-do lists, templates, notes taken during imaging sessions, technical support issues and more. These documents will help us develop sound guidelines and workflows, which in turn will make our work in this area more consistent, efficient and productive.

Dedicated storage space

Working with other IT staff, we have established a new server space specifically for MSI. This is a relief because, as we began testing the system in the early days, we didn’t have a dedicated space for storing the MSI image stacks, and most of our established spaces were permission-restricted, preventing our large MSI group from using them. On top of this, we didn’t have any file management strategies in place for MSI, which made for some messy file management. From our first demo, through initial testing, to the eventual purchase of the system, we used a variety of storage spaces and a number of folder structures as we learned the system. We used our shared library server, the Digital Production Center’s production server, Box and Google Drive. Files were all over the place! What a mess! In our new dedicated space, we have established standard folder structures and file management strategies, and we now store all of our MSI image stacks in one place. Whew!

The Request Queue

In the beginning, once the MSI system was up and running, our group had a brainstorming session to identify a variety of material we could use to test the system and hone our skills. Initially this queue was a bulleted list of items in Basecamp. As we worked through the list, it was sometimes confusing which items had already been done and which was next. The process became more cumbersome because multiple people were working through the list at the same time, on both capture and processing, with no specific reporting mechanism to track who was doing what. We have recently built an MSI Request Queue that tracks items to be captured in a more straightforward, clear manner. It includes title, barcode and item information, along with the research question to be answered, its priority level, due date, requester information and internal contact information. The MSI group will use this queue for a few weeks, then tweak it as necessary. No more confusion.

The Processing Queue

As described in a previous post, capturing with MSI produces lots of image stacks that contain lots of files. On average, capturing one page can produce 6 image stacks totaling 364 images. There are 6 different stages of conversion/processing that an image stack goes through before it might be considered “done,” and the fact that everyone on the MSI team has other job responsibilities makes it difficult to carve out a large enough block of time to convert and process the image stacks through all of the stages. This made it difficult to know which items had been completely processed. We have recently built an MSI Processing Queue that tracks what stage of processing each item is in. It includes root file names, flat field information, PPI and a column for each phase of processing to indicate whether an image stack has passed through that phase. As with the Request Queue, the MSI group will use this queue for a few weeks, then tweak it as necessary. No more confusion.

Duke University East Campus Progress Picture #27

As with most blog posts, the progress described above has been boiled down and simplified so as not to bore you to death, but it represents a fair amount of work nonetheless. Having dedicated storage and a standardized folder structure simplifies the management of lots of files and puts them in a predictable structure. Streamlining the Request Queue establishes a clear path of work and provides enough information about each request to move forward with a clear goal in mind. The Processing Queue provides a snapshot of the state of processing across multiple requests, with enough information that any staff member familiar with our MSI process can complete a request. And establishing a knowledge base to document our workflows and guidelines ties everything together in an organized and searchable manner, making it easier to find information about established procedures and troubleshoot technical problems.

It is important to put this infrastructure in place and build a strong foundation for Multispectral Imaging at the Library so it will scale in the future.  This is only the beginning!

_______________________

Want to learn even more about MSI at DUL?

A Summer Day in the Life of Digital Collections

A recent tweet from my colleague in the Rubenstein Library (pictured above) pretty much sums up the last few weeks at work. Although I rarely work directly with students and classes, I am still impacted by the hustle and bustle in the library when classes are in session. Throughout the busy Spring I found myself saying, oh, I’ll have time to work on that over the Summer. Now Summer is here, so it is time to make some progress on those delayed projects while keeping others moving forward. With that in mind, here is your late Spring and early Summer round-up of digital collections news and updates.

Radio Haiti

A preview of the soon-to-be-live Radio Haiti Archive digital collection.

The long-anticipated launch of the Radio Haiti Archive is upon us. After many meetings to review the metadata profile, discuss modeling relationships between recordings, and find a pragmatic approach to representing metadata in 3 languages in the Duke Digital Repository public interface, we are now in preview mode, and it is thrilling. Behind the scenes, Radio Haiti represents a huge step forward in the Duke Digital Repository’s ability to store and play back audio and video files.

You can already listen to many recordings via the Radio Haiti collection guide, and we will share the digital collection with the world in late June or early July. In the meantime, check out this teaser image of the homepage.

Section A

My colleague Meghan recently wrote about our ambitious Section A digitization project, which will result in creating finding aids for and digitizing 3000+ small manuscript collections from the Rubenstein Library. This past week the 12 people involved in the project met to review our workflow. Although we are trying to take a mass-digitization, streamlined approach to this project, there are still a lot of people and steps. For example, we spent about 20-30 minutes of our 90-minute meeting reviewing the various status codes we use on our giant Google spreadsheet and when to update them. I’ve also created a 6-page project plan that encompasses both a high- and medium-level view of the project. In addition to that document, each part of the process (appraisal, cataloging review, digitization, etc.) has its own more detailed documentation. This project is going to last at least a few years, so taking the time to document every step is essential, as is agreeing on status codes and how to use them. It is a big process, but with every box the project gets a little easier.

Status codes for tracking our evaluation, remediation, and digitization workflow.
Section A Project Plan Summary

Diversity and Inclusion Digitization Initiative Proposals and Easy Projects

As Bitstreams readers and DUL colleagues know, this year we instituted 2 new processes for proposing digitization projects. Our second digitization initiative deadline has just passed (it was June 15), and in June and early July I will be working with the review committee to review the new proposals, as well as to reevaluate 2 proposals from the first round. I’m excited to say that we have already approved one project outright (the Emma Goldman papers), and we plan to announce more approved projects later this Summer.

We also codified “easy project” guidelines and have received several easy project proposals. It is still too soon to really assess this approach, but so far it is going well.

Transcription and Closed Captioning

Speaking of A/V developments, another large project planned for this Summer is to begin codifying our captioning and transcription practices. Duke Libraries has had a mandate to create transcriptions and closed captions for newly digitized A/V for over a year, and in that time we have been working with vendors on selected projects. Our next steps will serve two fronts. On the programmatic side, we need to review the time and expense our captioning efforts have incurred so far and see how we can scale them to our backlog of publicly accessible A/V. On the technology side, I’ve partnered with one of our amazing developers to sketch out a multi-phase plan for storing captions and time-coded transcriptions and making them accessible and searchable in our user interface. The first phase goes into development this Summer. All of these efforts will no doubt be the subject of a future blog post.

Testing VTT captions of Duke Chapel Recordings in JWPlayer
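For readers who haven’t worked with the format, a WebVTT caption file is just plain text with timed cues; a minimal, made-up example looks like this:

WEBVTT

00:00:01.000 --> 00:00:04.500
Welcome to Duke Chapel.

00:00:05.000 --> 00:00:08.000
Our first reading today is from the Book of Psalms.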

Summer of Documentation

My aspirational Summer project this year is to update digital collections project tracking documentation; to review, consolidate, replace, or trash existing digital collections documentation; and to work with the Digital Production Center to create a DPC manual. Admittedly, writing and reviewing documentation is not the most exciting Summer plan, but with so many projects and collaborators in the air, this documentation is essential to our productivity, our communication practices, and my personal sanity.

Late Spring Collection launches and Migrations

Over the past few months we launched several new digital collections and completed the migration of a number of collections from our old platform into the Duke Digital Repository.

New Collections:

Migrated Collections:

…And so Much More!

In addition to the projects above, we continue to make slow and steady progress on our MSI system, are exploring the FFV1 format for preserving selected moving image collections, are planning the next phase of the digital collections migration into the Duke Digital Repository, are thinking deeply about collection-level and structured metadata, are preparing to launch newly digitized Gedney images, are integrating digital objects into finding aids, and more. No doubt some of these efforts will appear in subsequent Bitstreams posts. In the meantime, let’s all try not to let this Summer fly by too quickly!

Enjoy Summer while you can!