This post was written by Jen Jordan, a graduate student at Simmons University studying Library Science with a concentration in Archives Management. She is the Digital Collections intern with the Digital Collections and Curation Services Department. Jen will complete her master’s degree in December 2021.
The Digital Production Center (DPC) is thrilled to announce that work is underway on a three-year National Endowment for the Humanities (NEH) grant-funded project to digitize the entirety of Behind the Veil: Documenting African-American Life in the Jim Crow South, an oral history project that produced 1,260 interviews spanning more than 1,800 audio cassette tapes. Accompanying the 2,000-plus hours of audio is a sizable collection of visual materials (e.g., photographic prints and slides) that form a connection with the recorded voices.
We are here to summarize the logistical details relating to the digitization of this incredible collection. To learn more about its historical significance and the grant that is funding this project, titled “Documenting African American Life in the Jim Crow South: Digital Access to the Behind the Veil Project Archive,” please take some time to read the July announcement written by John Gartrell, Director of the John Hope Franklin Research Center and Principal Investigator for this project. Co-Principal Investigator of this grant is Giao Luong Baker, Digital Production Services Manager.
Digitizing Behind the Veil (BTV) will require, in part, the services of outside vendors to handle the audio digitization and subsequent captioning of the recordings. While the DPC regularly digitizes audio recordings, we are not equipped to do so at this scale (while balancing other existing priorities). The folks at Rubenstein Library have already been hard at work double-checking the inventory to ensure that each cassette tape and case is labeled with an identifier. The DPC then received the tapes, filling 48 archival boxes, along with a digitization guide (i.e., an Excel spreadsheet) containing detailed metadata for each tape in the collection. Upon receiving the tapes, DPC staff set to boxing them for shipment to the vendor. As of this writing, the boxes are snugly wrapped on a pallet in Perkins Shipping & Receiving, where they will soon begin their journey to a digital format.
The wait has begun! In eight to twelve weeks we anticipate receiving the digital files, at which point we will perform quality control (QC) on each one before sending them off for captioning. As the captions are returned, we will run through a second round of QC. From there, the files will be ingested into the Duke Digital Repository, at which point our job is complete. Of course, we still have the visual materials to contend with, but we’ll save that for another blog!
As we creep closer to the two-year mark of the COVID-19 pandemic and the varying degrees of restrictions that have come with it, the DPC will continue to focus on fulfilling patron reproduction requests, which have comprised the bulk of our work for some time now. We are proud to support researchers by facilitating digital access to materials, and we are equally excited to have begun work on a project of the scale and cultural impact that is Behind the Veil. When finished, this collection will be accessible for all to learn from and meditate on—and that’s what it’s all about.
In spite of the dumpster fire of 2020, Duke Digital Collections had a productive and action-packed year (maybe too action-packed at times).
As usual, we launched new digital collections and added content to existing ones (full list below). We are also wrapping up our mega-migration from our old digital collections system (Tripod2) to the Duke Digital Repository! This migration has been in process for 5 years, yes 5 years. We plan to celebrate this exciting milestone more in January, so stay tuned.
The Digital Production Center, in collaboration with the Rubenstein Library, shifted to a new folder level workflow for patron and instruction requests. This workflow was introduced just in time for the pandemic and the resulting unprecedented number of digitization requests. As a result of the demand for digital images, all project work has been put aside and the DPC is focusing on patron and instruction requests only. Since late June, the DPC has produced over 40,000 images!
Looking ahead to 2021, our priority will be the folder level digitization workflow for researcher and instruction requests. The DPC has received 200+ requests since June, and we need to get all those digitized folders moved into the repository. We are also experimenting with preserving scans created outside of the DPC. For example, Rubenstein Library staff created a huge number of access copies using reading room scanners, and we would like to make them available to others. Lastly, we have a few bigger digital collections to ingest and launch as well.
Thanks to everyone associated with Digital Collections for their incredible work this year!! Whew, it has been…a year.
Earlier this year and prior to the pandemic, Digital Production Center (DPC) staff piloted an alternative approach to digitize patron requests with the Rubenstein Library’s Research Services (RLRS) team. The previous approach was focused on digitizing specific items that instruction librarians and patrons requested, and these items were delivered directly to that person. The alternative strategy, the Folder Level digitization approach, involves digitizing the contents of the entire folder that the item is contained in, ingesting these materials to the Duke Digital Repository (to enable Duke Library staff to retrieve these items), and when possible, publishing these materials so that they are available to anyone with internet access. This soft launch prepared us for what is now an all-hands-on-deck-but-in-a-socially-distant-manner digitization workflow.
Since returning to campus for onsite digitization in late June, the DPC’s primary focus has been to perfect and ramp up this new workflow. It is important to note that the term “folder” in this case is more of a concept and that its contents and their conditions vary widely. Some folders may have 2 pages, while others have over 300. Some folders consist of pamphlets, notebooks, maps, papyri, and bound items. All this to say that a “folder” is a relatively loose term.
Like many initiatives at Duke Libraries, Folder Level Digitization is not just a DPC operation; it is a collaborative effort. This effort includes RLRS working with instructors and patrons to identify and retrieve the materials. RLRS also works with Rubenstein Library Technical Services (RLTS) to create starter digitization guides, which are the building blocks for our digitization guide. Lastly, RLRS vets the materials and determines their level of access. When necessary, Duke Libraries’ Conservation team steps in to prepare materials for digitization. After the materials are digitized, ingest and metadata work by the Digital Collections and Curation Services and RLTS teams ensures that the materials are preserved and available in our systems.
Doing this work in the midst of a pandemic requires that DPC work closely with the Rubenstein Library Access Services Reproduction Team (a section of RLRS) to track our workflow using a Google Doc. We track the point where the materials are identified by RLRS, through multiple quarantine periods, scanning, post processing, file delivery, to ingest. Also, DPC staff are digitizing in a manner that is consistent with COVID-19 guidelines. Materials are quarantined before and after they arrive at the DPC, machines and workspaces are cleaned before and after use, capture is done in separate rooms, and quality control is done off site with specialized calibrated monitors.
Since we started Folder Level digitization, the DPC has received close to 200 unique Instruction and Patron requests from RLRS. As of the publication of this post, 207 individual folders (an individual request may contain several folders) have been digitized. In total, we’ve scanned and quality controlled over 26,000 images since we returned to campus!
By digitizing entire folders, we hope to increase access to the materials without risking damage through physical handling. So far we anticipate that 80 new digital collections will be ingested to the Duke Digital Repository. This number will only grow as we receive more requests. Folder Level Digitization is an exciting approach to digital collection development, as it is directly responsive to instruction and researcher needs. With this approach, it is access for one, access for all!
The Coronavirus pandemic has me thinking about labor–as a concept, a social process, a political constituency, and the driving force of our economy–in a way that I haven’t in my lifetime. It’s become alarmingly clear (as if it wasn’t before) that we all need food, supplies, and services to survive past next week, and that there are real human beings out there working to produce and deliver these things. No amount of entrepreneurship, innovation, or financial sleight of hand will help us through the coming months if people are not working to provide the basic requirements for life as we know it.
This blog post draws from images in our digitized library collections to pay tribute to all of the essential workers who are keeping us afloat during these challenging times. As I browsed these photographs and mused on our current situation, a few important and oft-overlooked questions came to mind.
Who grows our food? Where does it come from and how is it processed? How does it get to us?
What kind of physical environment do we work in and how does that affect us?
How do we interact with machines and technology in our work? Can our labor be automated or performed remotely?
What equipment and clothing do we need to work safely and productively?
Are we paid fairly for our work? How do relative wages for different types of work reflect what is valued in our society?
How we think about and respond to these questions will inform how we navigate the aftermath of this ongoing crisis and whether or not we thrive into the future. As we celebrate International Workers’ Day on May 1 and beyond, I hope everyone will take some time to think about what labor means to them and to our society as a whole.
‘Tis the time of year for top 10 lists. Here at Duke Digital Collections HQ, we cannot just pick 10, because all our digital collections are tops! What follows is a list of all the digital collections we have launched for public access this calendar year.
Our newest collections include a range of formats and subject areas, from 19th-century manuscripts to African American soldiers’ photograph albums to Duke Men’s Basketball posters to our first multispectral images of papyrus to be ingested into the repository. We also added new content to 4 existing digital collections. Lastly, our platform migration is still ongoing, but we made some incredible progress this year, as you will see below. Our goal is to finish the migration by the end of 2020.
New Digital Collections
African American Soldiers Photo Albums (browse all 8 or 1 by 1 using the links below):
In 2020, we’ll be making significant changes to our systems supporting archival discovery and access. The main impetus for this shift is that our current platform has grown outdated and is no longer sustainable going forward. We intend to replace our platform with ArcLight, open source software backed by a community of peer institutions.
Finding Aids at Duke: Innovations Past
At Duke, we’re no strangers to pushing the boundaries of archival discovery through advances in technology. Way back in the mid 1990s, Duke was among pioneers rendering SGML-encoded finding aids into HTML. For most of the 90s and aughts we used a commercial platform, but we decided to develop our own homegrown finding aids front-end in 2007 (using the Apache Cocoon framework). We then replaced it in 2012 with another in-house platform built on the Django web framework.
Since going home-grown in 2007, we have been able to find some key opportunities to innovate within our platforms. Here are a few examples:
Our current platform was pretty good for its time, but a lot has changed in eight years. The way we build web applications today is much different than it used to be. And beyond desiring a modern toolset, there are major concerns going forward around size, search/indexing, and support.
We have some enormous finding aids. And we have added more big ones over the years. This causes problems of scale, particularly with an interface like ours that renders each collection as a single web page with all of the text of its contents written in the markup. One of our finding aids contains over 21,000 components; all told it is 9MB of raw EAD transformed into 15MB of HTML.
No amount of caching or server wizardry can change the fact that this is simply too much data to be delivered and rendered in a single webpage, especially for researchers in lower-bandwidth conditions. We need a solution that divides the data for any given finding aid into smaller payloads.
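To make the idea of "smaller payloads" concrete, here is a minimal Python sketch of one way to do it: serving a finding aid's components in fixed-size pages instead of all at once. This is only an illustration under stated assumptions, not how our platform or ArcLight actually works. The `paginate` function, the `page_size` of 500, and the made-up component records are all hypothetical.

```python
from typing import Iterator, List


def paginate(components: List[dict], page_size: int) -> Iterator[List[dict]]:
    """Yield successive slices of at most page_size components."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    for start in range(0, len(components), page_size):
        yield components[start:start + page_size]


# Hypothetical finding aid with 21,000 components (made-up records).
components = [{"id": f"c{n}", "title": f"Component {n}"} for n in range(21000)]

# Assumed page size; a real interface would tune this to its own needs.
pages = list(paginate(components, page_size=500))

print(len(pages))      # 42 pages instead of one enormous document
print(len(pages[0]))   # 500 components in the first payload
```

With a scheme like this, a researcher's browser only has to fetch and render the slice being viewed, which matters most on a slow connection.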
Google Custom Search does a pretty nice job of relevance ranking and highlighting where in a finding aid a term matches (after all, that’s Google’s bread-and-butter). However, when used to power search in an application like this, it has some serious limitations. It only returns a maximum of one hundred results per query. Google doesn’t index 100% of the text, especially for our larger finding aids. And some finding aids are just mysteriously omitted despite our best efforts optimizing our markup for SEO and providing a sitemap.
We need search functionality where we have complete control of what gets indexed, when, and how. And we need assurance that the entirety of the materials described will be discoverable.
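As a rough illustration of what "complete control of what gets indexed" means, here is a toy inverted index in Python. It is a sketch for explanation only: the finding aid IDs and text are invented, the tokenizer is deliberately simple, and a real platform would rely on a dedicated search engine rather than anything like this.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every token to the set of document IDs that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the IDs of documents containing every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


# Invented finding aid snippets, keyed by hypothetical IDs.
docs = {
    "fa-001": "Letters and diaries, 1861-1865",
    "fa-002": "Photograph albums and letters",
}
index = build_index(docs)

print(sorted(search(index, "letters")))             # ['fa-001', 'fa-002']
print(sorted(search(index, "photograph letters")))  # ['fa-002']
```

Because every word of every document goes through `build_index`, nothing is silently left out, and there is no cap on how many matches a query can return.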
This is a familiar story. Homegrown applications used for several years by organizations with a small number of developers and a large number of projects to support become difficult to sustain over time. We have only one developer remaining who can fix our finding aids platform when it breaks, or prevent it from breaking when the systems around it change. Many of the software components powering the system are at or nearing end-of-life and they can’t be easily upgraded.
Where to Go From Here?
It has been clear for a while that we would soon need a new platform for finding aids, but not as clear what platform we should pursue. We had been eyeing the progress of two promising open source community-built solutions emerging from our peer institutions: the ArchivesSpace Public UI (PUI), and ArcLight.
Over 2018-19, my colleague Noah Huffman and I co-led a project to install pilot instances of the ASpace PUI and ArcLight, index all of our finding aids in them, and then evaluate the platforms for their suitability to meet Duke’s needs going forward. The project involved gathering feedback from Duke archivists, curators, research services staff, and our digital collections implementation team. We looked at six criteria: 1) features; 2) ease of migration/customization; 3) integration with other systems; 4) data cleanup considerations; 5) impact on existing workflows; 6) sustainability/maintenance.
There’s a lot to like about both the ASpace PUI and ArcLight. Feature-wise, they’re fairly comparable. Both are backed by a community of talented, respected peers, and either would be a suitable foundation for a usable, accessible interface to archives. In the end, we recommended that Duke pursue ArcLight, in large part due to its similarity to so much of the other software in our IT portfolio.
Duke is certainly not alone in our desire to replace an outdated, unsustainable homegrown finding aids platform, or in our intention to use ArcLight as a replacement.
This fall, with tremendous leadership from Stanford University Libraries, five universities collaborated on developing the ArcLight software further to address shared needs. Over a nine week work cycle from August to October, we had the good fortune of working alongside Stanford, Princeton, Michigan, and Indiana. The team addressed needs on several fronts, especially: usability, accessibility, indexing, context/navigation, and integrations.
Three Duke staff members participated: I was a member of the Development Team, Noah Huffman a member of the Product Owners Team, and Will Sexton on the Steering Group.
The work cycle is complete and you can try out the current state of the core ArcLight demo application. It includes several finding aids from each of the participating partner institutions. Here are just a few highlights that have us excited about bringing ArcLight to Duke:
Here’s a final demo video (37 min) that nicely summarizes the work completed in the fall 2019 work cycle.
Lighting the Way
With some serious momentum from the fall ArcLight work cycle and plans taking shape to implement the software in 2020, the Duke Libraries intend to participate in the Stanford-led, IMLS grant-funded Lighting the Way project, a platform-agnostic National Forum on Archival Discovery and Delivery. Per the project website:
Lighting the Way is a year-long project led by Stanford University Libraries running from September 2019-August 2020 focused on convening a series of meetings focused on improving discovery and delivery for archives and special collections.
Coming in 2020: ArcLight Implementation at Duke
There’ll be much more to share about this in the new year, but we are gearing up now for a 2020 ArcLight launch at Duke. As good as the platform is now out-of-the-box, we’ll have to do additional development to address some local needs, including:
An efficient preview/publication workflow
Digital object viewing / repository integration
Some data cleanup
Building these local customizations will be time well-spent. We’ll also look for more opportunities to collaborate with peers and contribute code back to the community. The future looks bright for Duke with ArcLight lighting the way.
The ongoing tensions between academic institutions and publishers have been escalating the last few months, but those tensions have existed for many years. The term “Big Deal” has been coined to describe a long-standing, industry-wide practice of journal bundling that forces libraries to subscribe to unwanted and unneeded publications rather than paying more for a limited number of individual subscriptions. This is a practice you see in other industries – for example, cable packages that provide hundreds of channels, even if you only want one or two specific channels.
What is especially problematic in higher education is that academics produce and review the content that gets published in the journals (for free), and then the universities have to pay the publishers a subscription fee to access the content. Imagine if YouTube required a subscription fee to watch any videos, including the ones you had posted. It’s a system that makes research harder to access and inhibits global scientific progress, all so publishers can earn an enormous profit margin.
Right now, academic publishing is controlled by five publishers (the “Big Five”), an oligopoly that makes it very difficult for libraries to negotiate better deals. Only very large organizations or consortia, like the University of California, have been able to start pushing back against the system. It will likely take large shake-ups like this for any major changes to take hold, but in the meantime there may be ways to situate ourselves for making better purchasing decisions.
At Duke, we often review our usage of specific journal titles as we prepare to make purchasing decisions. Usage data comes in a variety of forms, but the most popular are counts of Duke views and downloads that come directly from the publishers and the number of times Duke authors publish in or cite a particular journal. There are many other kinds of data that might be of interest, however, including Duke participation on editorial boards, usage differences across disciplines, and even whether or not the journal is fully open access. Blending various data sources and optimizing the search decisions for a given budget cycle can be overwhelming.
Last fall, Duke University Libraries decided to propose a project for Duke’s Data+ summer program, a summer research experience in data science for undergraduate students. Our project, “Breaking the Bundle: Analyzing Duke’s Journal Subscriptions,” focuses on Duke’s subscriptions to journals published by Elsevier. The program is in its third week, and our team of two incredibly sharp undergraduates has been hard at work building and blending our datasets. Our goal by the end of summer is to have a proof-of-concept dashboard that lets collection managers adjust the weights of various usage measures to generate an ideal collection of journals for a particular budget.
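For readers curious about what "adjusting the weights" could look like, here is a small Python sketch. It is not the students' code, and the selection rule here (a greedy pick by weighted score per dollar) is our own assumption for illustration, not the team's method. The journal titles, prices, and usage figures are all made up, and the usage measures are assumed to be normalized to a 0-1 range.

```python
from typing import Dict, List


def score(journal: Dict, weights: Dict[str, float]) -> float:
    """Weighted sum of a journal's usage measures."""
    return sum(weight * journal[measure] for measure, weight in weights.items())


def select(journals: List[Dict], weights: Dict[str, float], budget: float) -> List[str]:
    """Greedily pick journals by weighted score per dollar until the budget runs out."""
    ranked = sorted(
        journals,
        key=lambda j: score(j, weights) / j["price"],
        reverse=True,
    )
    chosen, spent = [], 0.0
    for journal in ranked:
        if spent + journal["price"] <= budget:
            chosen.append(journal["title"])
            spent += journal["price"]
    return chosen


# Made-up journals with usage measures normalized to 0-1 (an assumption).
journals = [
    {"title": "Journal A", "downloads": 0.9, "citations": 0.8, "price": 3000.0},
    {"title": "Journal B", "downloads": 0.2, "citations": 0.9, "price": 1000.0},
    {"title": "Journal C", "downloads": 0.5, "citations": 0.1, "price": 2000.0},
]

balanced = select(journals, {"downloads": 0.5, "citations": 0.5}, budget=3000.0)
downloads_only = select(journals, {"downloads": 1.0, "citations": 0.0}, budget=3000.0)

print(balanced)        # ['Journal B', 'Journal C']
print(downloads_only)  # ['Journal A']
```

The same budget produces a different list once the weights change, which is the kind of trade-off a dashboard like this would let a collection manager explore. A greedy pick is simple but not guaranteed to be the best possible selection.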
It is still very early in the process, but the students have been hard at work and have made great progress. We decided it would be best to develop the analysis software and dashboard using R, a statistical computing project with a rich history and many helpful development tools. In addition to publisher-provided views and downloads, the students have been able to use websites and APIs to collect data on journal open access status, editorial boards, numbers of publications, and numbers of citations. All Data+ teams present publicly on the projects twice during the summer, and we hope to schedule a third talk for a library audience before the end of the program on August 2.
We look forward to seeing what the summer will bring! While this project is just one small step, automating the collection and analysis of journal usage will position us well, both for responsible purchases and for a hopefully-changing publishing landscape.
Looking for something to keep you company on your summer vacation? Why not direct your devices to Duke Digital Collections? Seriously! Here are a few of the compelling collections we debuted earlier this spring, and we have more coming in late June.
These maps and two-volume report document Durham’s Hayti-Elizabeth Street neighborhood infrastructure prior to the construction of the Durham Freeway, as well as the justifications for the redevelopment of the area. This is an excellent resource for folks studying Durham history and/or the urban renewal initiatives of the mid-20th century.
We launched 8 collections of photograph albums created by African American soldiers serving in the military across the world including Japan, Vietnam and Iowa. Together these albums help “document the complexity of the African American military experience” (Bennett Carpenter from his blog post, “War in Black and White: African American Soldiers’ Photograph Albums”).
This photograph album contains pictures taken by Sir Percy Molesworth Sykes during his travels in a mountainous region of Central Asia, now the Xinjiang Uyghur Autonomous Region of China, with his sister, Ella Sykes. According to the collection guide, the album’s “images are large, crisp, and rich with detail, offering views of a remote area and its culture during tensions in the decades following the Russo-Turkish War”.
Our work never stops, and we have several large projects in the works that are scheduled to launch by the end of June. These include the first batch of video recordings from the Memory Project. We are also busy migrating the incredible photographs from the Sidney Gamble collection into the digital repository. Finally, there is one last batch of Radio Haiti recordings on the way.
Keeping in touch
We launch new digital collections just about every quarter, and have been investigating new ways to promote our collections as part of an assessment project. We are thinking of starting a newsletter – would you subscribe? What other ways would you like to keep in touch with Duke Digital Collections? Post a comment or contact me directly.
Hello! This is my first blog as the new Digital Production Service Manager, and I’d like to take this opportunity to take you, the reader, through my journey of discovering the treasures that the Duke Digital Collections program offers. To personalize this task, I explored the materials related to my family’s journey to the United States. First, I should contextualize. After migrating from south China in the mid-1800s, my family fled Vietnam in the late 1970s and we left with the bare necessities – mainly food, clothes, and essential documents. All I have now are a few family pictures from that era and vividly told stories from my parents to help me connect the dots of my family’s history.
When I started delving into Duke’s Digital Collections, it was heartening to find materials of China, Vietnam, and even anti-war materials in the U.S. The following are some materials and collections that I’d like to highlight.
The Sidney D. Gamble Photographs offer over 5,000 photographs of China in the early 20th century. Images of everyday life in China and landscapes are available in this collection. The above image from the Gamble collection is that of a junk, or houseboat, photographed in the early 1900s. When my family fled Vietnam, fifty people crammed into a similar vessel and sailed in the dead of night along the Gulf of Tonkin. My parents spoke of how they were guided by the moonlight and how fearful they were of the junk catching fire from cooking rice.
The African American Soldier’s Vietnam War photograph album collection offers these gorgeous images of Vietnam. This is the country that was home for multiple generations for my family, and up until the war, it was a good life. I am astounded and grateful that these postcards were collected by an American soldier in the middle of war. Considering that I grew up in Los Angeles, California, I have no sense of the world that my parents inhabited, and these images help me appreciate their stories even more. On the other side of the planet, there were efforts to stop the war and it was intriguing to see a variety of digital collections depicting these perspectives through art and documentary photography. The image below is that of a poster from the Italian Cultural Posters collection depicting Uncle Sam and the Viet Cong.
In addition to capturing street scenes in London, the Ronald Reis Collection includes images of Vietnam during the war and of the anti-war effort in the United States. The image below is that of a demonstration in Bryant Park in New York City. I recognize that the conflict was fought on multiple fronts and am grateful for these demonstrations, as they ultimately led to the end of the war. Lastly, the James Karales Photos collection depicts Vietnam during the war. The image below, titled “Soldiers leaving on helicopter,” is one that reminds me of my uncle who left with the American soldiers and started a new life in the United States. In 1980, thanks to the Family Reunification Act, the aid of the American Red Cross, and my uncle’s sponsorship, we started a new chapter in America.
Perhaps this is typical of the immigrant experience, but it still is important to put into words. Not every community has the resources and the privilege to be remembered, and where there are materials to help piece those stories together, they are absolutely valued and appreciated. Thank you, Duke University Libraries, for making these materials available.
In anticipation of next Tuesday’s midterm elections, here is a photo gallery of voting-related images from Duke Digital Collections. Click on a photo to view more images from our collections dealing with political movements, voting rights, propaganda, activism, and more!
If you haven’t already taken advantage of early voting, we at Bitstreams encourage you to exercise your right on November 6!
Notes from the Duke University Libraries Digital Projects Team