Category Archives: Collections

Respectfully Yours: A Deep Dive into Digitizing the Booker T. Washington Collection

Post authored by Jen Jordan, Digital Collections Intern.

Hello, readers. This marks my third and final blog post as the Digital Collections intern, a position that I began in June of last year.* Over the course of this internship I have been fortunate to gain experience in nearly every step of the digitization and digital collections processes. One of the things I’ve come to appreciate most about the different workflows I’ve learned about is how well they accommodate the variety of collection materials that pass through. This means that when unique cases arise, there is space to consider them. I’d like to describe one such case, involving a pretty remarkable collection.

Cheyne, C.E. “Booker T. Washington sitting and holding books,” 1903. 2 photographs on 1 mount : gelatin silver print ; sheets 14 x 10 cm. In Washington, D.C., Library of Congress Prints and Photographs Division.

In early October I arrived to work in the Digital Production Center (DPC) and was excited to see the Booker T. Washington correspondence, 1903-1916, 1933 and undated was next up in the queue for digitization. The collection is small, containing mostly letters exchanged between Washington, W. E. B. DuBois, and a host of other prominent leaders in the Black community during the early 1900s. A 2003 article published in Duke Magazine shortly after the Washington collection was donated to the John Hope Franklin Research Center provides a summary of the collection and the events it covers. 

Arranged chronologically, the papers were stacked neatly in a small box, each letter sealed in a protective sleeve, presumably after undergoing extensive conservation treatments to remediate water and mildew damage. As I scanned the pages, I made a note to learn more about the relationship between Washington and DuBois, as well as the events the collection is centered around—the Carnegie Hall Conference and the formation of the short-lived Committee of Twelve for the Advancement of the Interests of the Negro Race. When I did follow up, I was surprised to find that remarkably little has been written about either.

As I’ve mentioned before, there is little time to actually look at materials when we scan them, but the process can reveal broad themes and tone. Many of the names in the letters were unfamiliar to me, but I observed extensive discussion between DuBois and Washington regarding who would be invited to the conference and included in the Committee of Twelve. I later learned that this collection documents what would be the final attempt at collaboration between DuBois and Washington.

Washington to Browne, 21 July 1904, South Weymouth, Massachusetts

Once scanned, the digital surrogates pass through several stages in the DPC before they are prepared for ingest into the Duke Digital Repository (DDR); you can read a comprehensive overview of the DPC digitization workflow here. Fulfilling patron requests is top priority, so after patrons receive the requested materials, it might be some time before the files are submitted for ingest to the DDR. Because of this, I was fortunate to be on the receiving end of the BTW collection in late January. By then I was gaining experience in the actual creation of digital collections—basically everything that happens with the files once the DPC signals that they are ready to move into long term storage. 

There are a few different ways that new digital collections are created. Thus far, most of my experience has been with the files produced through patron requests handled by the DPC. These tend to be smaller in size and have a simple file structure. The files are migrated into the DDR, into either a new or existing collection, after which file counts are checked, and identifiers assigned. The collection is then reviewed by one of a few different folks with RL Technical Services. Noah Huffman conducted the review in this case, after which he asked if we might consider itemizing the collection, given the letter-level descriptive metadata available in the collection guide. 

I’d like to pause for a moment to discuss the tricky nature of “itemness,” and how the meaning can shift between RL and DCCS. If you reference the collection guide linked in the second paragraph, you will see that the BTW collection received item-level description during processing, with each letter constituting an item in the collection. The physical arrangement of the papers does not reflect the itemized intellectual arrangement, as the letters are grouped together in the box they are housed in. When fulfilling patron reproduction requests, itemness is generally dictated by physical arrangement, in what is called the folder-level model; materials housed together are treated as a single unit. So in this case, because the letters were grouped together inside of the box, the box was treated as the folder, or item. If, however, each letter in the box were housed within its own folder, then each folder would be considered an item. To be clear, the papers were housed according to best practices; my intent is simply to describe how the processes between the two departments sometimes diverge.

Processing archival collections is labor intensive, so it’s increasingly uncommon to see item-level description. Collections can sit unprocessed in “backlog” for many years, and though the depth of that backlog varies by institution, even well-resourced archives confront the problem of backlog. Enter: More Product, Less Process (MPLP), introduced by Mark Greene and Dennis Meissner in a 2005 article as a means to address the growing problem. They called on archivists to prioritize access over meticulous arrangement and description.  

The spirit of folder-level digitization is quite similar to MPLP, as it enables the DPC to provide access to a broader selection of collection materials digitized through patron requests, and it also simplifies the process of putting the materials online for public access. Most of the time, the DPC’s approach to itemness aligns closely with the level of description given during processing of the collection, but the inevitable variance found between archival collections requires a degree of flexibility from those working to provide access to them. Numerous examples of digital collections that received item-level description can be found in the DDR, but those are generally tied to planned efforts to digitize specific collections. 

Because the BTW collection was digitized as an item, the digital files were grouped together in a single folder, which translated to a single landing page in the DDR’s public user interface. Itemizing the collection would give each item/letter its own landing page, with the potential to add unique metadata. Similarly, when users navigate the RL collection guide, embedded digital surrogates appear for each item. A moment ago I described the utility of More Product Less Process. There are times, however, when it seems right to do more. Given the research value of this collection, as well as its relatively small size, the decision to proceed with itemization was unanimous. 

Itemizing the collection was fairly straightforward. Noah shared a spreadsheet with metadata from the collection guide. There were 108 items, with each item’s title containing the sender and recipient of a correspondence, as well as the location and date sent. Given the collection’s chronological physical arrangement, it was fairly simple to work through the files and assign them to new folders. Once that was finished, I selected additional descriptive metadata terms to add to the spreadsheet, in accordance with the DDR Metadata Application Profile. Because there was a known sender and recipient for almost every letter, my goal was to identify any additional name authority records not included in the collection guide. This would provide an additional access point by which to navigate the collection. It would also help me to identify death dates for the creators, which determines copyright status. I think the added time and effort was well worth it.
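For readers curious what that regrouping step might look like if scripted, here is a minimal sketch in Python. It is not the process actually used for this collection: the file names, the item identifiers, and especially the per-item page counts are hypothetical, and the sketch assumes the scans are already in the same chronological order as the items in the spreadsheet.

```python
from itertools import islice


def plan_item_folders(scan_files, items):
    """Assign chronologically ordered scans to per-item folders.

    scan_files: file names in scan order (hypothetical names).
    items: (item_id, page_count) pairs in the same order as the scans.
           Page counts are an assumption; the source does not say they
           were recorded in the spreadsheet.
    """
    total_pages = sum(count for _, count in items)
    if total_pages != len(scan_files):
        # A mismatch means the plan cannot be trusted, so stop early.
        raise ValueError(f"expected {total_pages} scans, found {len(scan_files)}")
    remaining = iter(scan_files)
    # Each item takes the next `count` scans from the ordered stream.
    return {item_id: list(islice(remaining, count)) for item_id, count in items}


# Hypothetical example: five scans regrouped into three letters.
scans = [f"btw_{n:04d}.tif" for n in range(1, 6)]
items = [("item001", 2), ("item002", 1), ("item003", 2)]
plan = plan_item_folders(scans, items)
```

The function only builds a plan and never moves files, so the assignments can be reviewed against the collection guide before anything changes on disk.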

This isn’t the space for analysis, but I do hope you’re inspired to spend some time with this fascinating collection. Primary source materials offer an important path to understanding history, and this particular collection captures the planning and aftermath of an event that hasn’t received much analysis. There is more coverage of what came after; Washington and DuBois parted ways, after which DuBois became a founding member of the Niagara Movement. Though also short-lived, it is considered a precursor to the NAACP, which many members of the Niagara Movement would go on to join. A significant portion of W. E. B. DuBois’s correspondence has been digitized and made available to view through UMass Amherst. It contains many additional letters concerning the Carnegie Conference and Committee of Twelve, offering additional context and perspective, particularly in certain letters that were surely not intended for Washington’s eyes. What I found most fascinating, though, was the evidence of less public (and less adversarial) collaboration between the two men.

The additional review and research required by the itemization and metadata creation was such a fascinating and valuable experience. This is true on a professional level as it offered the opportunity to do something new, but I also felt moved to try to understand more about the cast of characters who appear in this important collection. That endeavor extended far beyond the hours of my internship, and I found myself wondering if this was what the obsessive pursuit of a historian’s work is like. In any case, I am grateful to have learned more, and also reminded that there is so much more work to do.

Click here to view the Booker T. Washington correspondence in the Duke Digital Repository.

*Indeed, this marks my final post in this role, as my internship concludes at the end of April, after which I will move on to a permanent position. Happily, I won’t be going far, as I’ve been selected to remain with DCCS as one of the next Repository Services Analysts!    

Sources

Cheyne, C.E. “Booker T. Washington sitting and holding books,” 1903. 2 photographs on 1 mount : gelatin silver print ; sheets 14 x 10 cm. In Washington, D.C., Library of Congress Prints and Photographs Division. Accessed April 5, 2022. https://www.loc.gov/pictures/item/2004672766/

Wars of Aliens, Men, and Women: or, Some Things we Digitized in the DPC this Year

Post authored by Jen Jordan, Digital Collections Intern. 

As another strange year nears its end, I’m going out on a limb to assume that I’m not the only one around here challenged by a lack of focus. With that in mind, I’m going to keep things relatively light (or relatively unfocused) and take you readers on a short tour of items that have passed through the Digital Production Center (DPC) this year. 

Shortly before the arrival of COVID-19, the DPC implemented a folder-level model for digitization. This model was not developed in anticipation of a life-altering pandemic, but it was well-suited to meet the needs of researchers who, for a time, were unable to visit the Rubenstein Library to view materials in person. You can read about the implementation of folder-level digitization and its broader impact here. To summarize, before spring of 2020 it was standard practice to fill patron requests by imaging only the item needed (e.g., a single page within a folder). Now, the default practice is to digitize the entire folder of materials. This has produced a variety of positive outcomes for stakeholders in the Duke University Libraries and broader research community, but for the purpose of this blog, I’d like to describe my experience interacting with materials in this way.

Digitization is time consuming, so the objective is to move as quickly as possible while maintaining a high level of accuracy. There isn’t much time for meaningful engagement with collection items, but context reveals itself in bits and pieces. Themes rise to the surface when working with large folders of material on a single topic, and sometimes the image on the page demands to be noticed. 

Even while working quickly, one would be hard-pressed to overlook this Vietnam-era anti-war message. One might imagine that was by design. From the Student Activism Reference collection: https://repository.duke.edu/dc/uastuactrc.

On more than one occasion I’ve found myself thinking about the similarities between scanning and browsing a social media app like Instagram. Stick with me here! Broadly speaking, both offer an endless stream of visual stimuli with little opportunity for meaningful engagement in the moment. Social media, when used strategically, can be world-expanding. Work in the DPC has been similarly world-expanding, but instead of an algorithm curating my experience, the information that I encounter on any given day is curated by patron requests for digitization. Also similar to social media is the range of internal responses triggered over the course of a work day, and sometimes in the span of a single minute. Amusement, joy, shock, sorrow—it all comes up.

I started keeping notes on collection materials and topics to revisit on my own time. Sometimes I was motivated by a stray fascination with the subject matter. Other times I encountered collections relating to prominent historical figures or events that I realized I should probably know a bit more about.

Image from the WSPU Scrapbook.

First wave feminism was one such topic that revealed itself. It was a movement I knew little about, but the DPC has digitized numerous items relating to women’s suffrage and other feminist issues at the turn of the 20th century. I was particularly intrigued by the radical leanings of the UK’s Women’s Social and Political Union (WSPU), organized by Emmeline Pankhurst to fight for the right to vote. When I started looking at newspaper clippings pasted into a scrapbook documenting WSPU activities, I was initially distracted by the amusing choice of words (“Coronation chair damaged by wild women’s bomb”). Curious to learn more, I went home and read about the WSPU. The following excerpt is from a speech by Pankhurst in which she provides justification for the militant tactics employed by the WSPU:

I want to say here and now that the only justification for violence, the only justification for damage to property, the only justification for risk to the comfort of other human beings is the fact that you have tried all other available means and have failed to secure justice. I tell you that in Great Britain there is no other way…

Pankhurst argued that men had to take the right to vote through war, so why shouldn’t women also resort to violence and destruction? And so they did.

As Rubenstein Library is home to the Sallie Bingham Center, it’s unsurprising that the DPC digitizes a fair amount of material on women’s issues. To share a few more examples, I appreciate the juxtaposition of the following two images, both of which I find funny, and yet sad.

Source collection: Young woman’s scrapbook, 1900-1905 and n.d.

The advertisement to the right is pasted inside a young woman’s scrapbook dated 1900—1905. It contains information on topics such as etiquette, how to manage a household, and how to be a good wife. Are we to gather that proper shade cloth is necessary to keep a man happy?

In contrast, the image below and to the left is from the book L’amour libre by French feminist Madeleine Vernet, which describes prostitution and marriage as the same kind of prison, with “free love” as the only answer. Some might call that a hyperbolic comparison, but after perusing the young woman’s scrapbook, I’m not so sure. I’m just thankful to have been born a woman near the end of the 20th century and not the start of it.

From the book L’amour libre by Madeleine Vernet

This may be difficult to believe, but I didn’t set out to write a blog so focused on struggle. The reality, however, is that our special collections are full of struggle. That’s not all there is, of course, but I’m glad this material is preserved. It holds many lessons, some of which we still have yet to learn. 

I think we can all agree that 2021 was, well, a challenging year. I’d be remiss not to close with a common foe we might all rally around. As we move into 2022 and beyond, venturing ever deeper into space, we may encounter this enemy sooner than we imagined…

Image from an illustrated 1906 French translation of H.G. Wells’s ‘The War of the Worlds’.

Sources:

Pankhurst, Emmeline. Why We Are Militant: A Speech Delivered by Mrs. Pankhurst in New York, October 21, 1913. London: Women’s Press, 1914. Print.

“‘Prayers for Prisoners’ and church protests.” Historic England, n.d., https://historicengland.org.uk/research/inclusive-heritage/womens-history/suffrage/church-protests/ 

Good News from the DPC: Digitization of Behind the Veil Tapes is Underway

This post was written by Jen Jordan, a graduate student at Simmons University studying Library Science with a concentration in Archives Management. She is the Digital Collections intern with the Digital Collections and Curation Services Department. Jen will complete her master’s degree in December 2021.

The Digital Production Center (DPC) is thrilled to announce that work is underway on a three-year National Endowment for the Humanities (NEH) grant-funded project to digitize the entirety of Behind the Veil: Documenting African-American Life in the Jim Crow South, an oral history project that produced 1,260 interviews spanning more than 1,800 audio cassette tapes. Accompanying the 2,000-plus hours of audio is a sizable collection of visual materials (e.g., photographic prints and slides) that form a connection with the recorded voices.

We are here to summarize the logistical details relating to the digitization of this incredible collection. To learn more about its historical significance and the grant that is funding this project, titled “Documenting African American Life in the Jim Crow South: Digital Access to the Behind the Veil Project Archive,” please take some time to read the July announcement written by John Gartrell, Director of the John Hope Franklin Research Center and Principal Investigator for this project. Co-Principal Investigator of this grant is Giao Luong Baker, Digital Production Services Manager.

Digitizing Behind the Veil (BTV) will require, in part, the services of outside vendors to handle the audio digitization and subsequent captioning of the recordings. While the DPC regularly digitizes audio recordings, we are not equipped to do so at this scale (while balancing other existing priorities). The folks at Rubenstein Library have already been hard at work double checking the inventory to ensure that each cassette tape and case is labeled with identifiers. The DPC then received the tapes, filling 48 archival boxes, along with a digitization guide (i.e., an Excel spreadsheet) containing detailed metadata for each tape in the collection. Upon receiving the tapes, DPC staff set to boxing them for shipment to the vendor. As of this writing, the boxes are snugly wrapped on a pallet in Perkins Shipping & Receiving, where they will soon begin their journey to a digital format.

The wait has begun! In eight to twelve weeks we anticipate receiving the digital files, at which point we will perform quality control (QC) on each one before sending them off for captioning. As the captions are returned, we will run through a second round of QC. From there, the files will be ingested into the Duke Digital Repository, at which point our job is complete. Of course, we still have the visual materials to contend with, but we’ll save that for another blog! 

As we creep closer to the two-year mark of the COVID-19 pandemic and the varying degrees of restrictions that have come with it, the DPC will continue to focus on fulfilling patron reproduction requests, which have comprised the bulk of our work for some time now. We are proud to support researchers by facilitating digital access to materials, and we are equally excited to have begun work on a project of the scale and cultural impact that is Behind the Veil. When finished, this collection will be accessible for all to learn from and meditate on—and that’s what it’s all about. 

2020 Highlights from Digital Collections

Welcome to the 2020 digital collections round up!

In spite of the dumpster fire of 2020, Duke Digital Collections had a productive and action packed year (maybe too action packed at times). 

As usual, we launched new digital collections and added content to existing ones (full list below). We are also wrapping up our mega-migration from our old digital collections system (Tripod2) to the Duke Digital Repository! This migration has been in process for 5 years (yes, 5 years). We plan to celebrate this exciting milestone more in January, so stay tuned.

A classroom and auditorium blueprint, digitized for a patron and launched this month.

The Digital Production Center, in collaboration with the Rubenstein Library, shifted to a new folder level workflow for patron and instruction requests. This workflow was introduced just in time for the pandemic and the resulting unprecedented number of digitization requests.  As a result of the demand for digital images, all project work has been put aside and the DPC is focusing on patron and instruction requests only. Since late June, the DPC has produced over 40,000 images!  

Another digital collections highlight from 2020 is the development of new features for our preservation and access interface, the Duke Digital Repository. We have wasted no time using these new features, especially “metadata only” and the DDR to CONTENTdm connection.

Looking ahead to 2021, our priority will be the folder level digitization workflow for researcher and instruction requests. The DPC has received 200+ requests since June, and we need to get all those digitized folders moved into the repository. We are also experimenting with preserving scans created outside of the DPC. For example, Rubenstein Library staff created a huge number of access copies using reading room scanners, and we would like to make them available to others. Lastly, we have a few bigger digital collections to ingest and launch as well.

Thanks to everyone associated with Digital Collections for their incredible work this year!!  Whew, it has been…a year. 

One of our newest digital collections features postcards from Greece: Salonica / Selanik / Thessaloniki
One of the Radio Haiti photographs launched recently.

Laundry list of 2020 Digital Collections

New Collections

Digital Collections Additions

Migrated Collections

Access for One, Access for All: DPC’s Approach towards Folder Level Digitization

Earlier this year and prior to the pandemic, Digital Production Center (DPC) staff piloted an alternative approach to digitize patron requests with the Rubenstein Library’s Research Services (RLRS) team. The previous approach was focused on digitizing specific items that instruction librarians and patrons requested, and these items were delivered directly to that person. The alternative strategy, the Folder Level digitization approach, involves digitizing the contents of the entire folder that the item is contained in, ingesting these materials to the Duke Digital Repository (to enable Duke Library staff to retrieve these items), and when possible, publishing these materials so that they are available to anyone with internet access. This soft launch prepared us for what is now an all-hands-on-deck-but-in-a-socially-distant-manner digitization workflow.

Giao Luong Baker assessing folders in the DPC.

Since returning to campus for onsite digitization in late June, the DPC’s primary focus has been to perfect and ramp up this new workflow. It is important to note that the term “folder” in this case is more of a concept and that its contents and their conditions vary widely. Some folders may have 2 pages, while others have over 300 pages. Some folders consist of pamphlets, notebooks, maps, papyri, and bound items. All this to say that a “folder” is a relatively loose term.

Like many initiatives at Duke Libraries, Folder Level Digitization is not just a DPC operation, it is a collaborative effort. This effort includes RLRS working with instructors and patrons to identify and retrieve the materials. RLRS also works with Rubenstein Library Technical Services (RLTS) to create starter digitization guides, which are the building blocks for our digitization guide. Lastly, RLRS vets the materials and determines their level of access. When necessary, Duke Library’s Conservation team steps in to prepare materials for digitization. After the materials are digitized, ingest and metadata work by the Digital Collections and Curation Services as well as the RLTS teams ensure that the materials are preserved and available in our systems.

Kristin Phelps captures a color target.

Doing this work in the midst of a pandemic requires that DPC work closely with the Rubenstein Library Access Services Reproduction Team (a section of RLRS) to track our workflow using a Google Doc. We track the point where the materials are identified by RLRS, through multiple quarantine periods, scanning, post processing, file delivery, to ingest. Also, DPC staff are digitizing in a manner that is consistent with COVID-19 guidelines. Materials are quarantined before and after they arrive at the DPC, machines and workspaces are cleaned before and after use, capture is done in separate rooms, and quality control is done off site with specialized calibrated monitors.

Since we started Folder Level digitization, the DPC has received close to 200 unique Instruction and Patron requests from RLRS. As of the publication of this post, 207 individual folders (an individual request may contain several folders) have been digitized. In total, we’ve scanned and quality controlled over 26,000 images since we returned to campus!

We hope that digitizing entire folders will allow for increased access to the materials without risking damage through their physical handling. So far we anticipate that 80 new digital collections will be ingested to the Duke Digital Repository. This number will only grow as we receive more requests. Folder Level Digitization is an exciting approach towards digital collection development, as it is directly responsive to instruction and researcher needs. With this approach, it is access for one, access for all!

Labor in the Time of Coronavirus

The Coronavirus pandemic has me thinking about labor–as a concept, a social process, a political constituency, and the driving force of our economy–in a way that I haven’t in my lifetime. It’s become alarmingly clear (as if it wasn’t before) that we all need food, supplies, and services to survive past next week, and that there are real human beings out there working to produce and deliver these things. No amount of entrepreneurship, innovation, or financial sleight of hand will help us through the coming months if people are not working to provide the basic requirements for life as we know it.

This blog post draws from images in our digitized library collections to pay tribute to all of the essential workers who are keeping us afloat during these challenging times. As I browsed these photographs and mused on our current situation, a few important and oft-overlooked questions came to mind.

Who grows our food? Where does it come from and how is it processed? How does it get to us?

Bell pepper pickers, 1984 June. Paul Kwilecki Photographs.
Prisoners at the county farm killing hogs, 1983 Mar. Paul Kwilecki Photographs.
Worker atop rail car loading corn from storage tanks, 1991 Sept. Paul Kwilecki Photographs.
Manuel Molina, mushroom farm worker, Kennett Square, PA 1981. Frank Espada Photographs.

What kind of physical environment do we work in and how does that affect us?

Maids in a room of the Stephen Decatur Hotel shortly before it was torn down, 1970. Paul Kwilecki Photographs
Worker in pit preparing to weld. Southeastern Minerals Co. Bainbridge, 1991 Aug. Paul Kwilecki Photographs
Loggers in the woods near Attapulgus, 1978 Feb. Paul Kwilecki Photographs.

How do we interact with machines and technology in our work? Can our labor be automated or performed remotely?

Worker signaling for more logs. Elberta Crate and Box. Co. Bainbridge, 1991 Sept. Paul Kwilecki Photographs.
Worker, Williamson-Dickie plant. Bainbridge, 1991 Sept. Paul Kwilecki Photographs.
Machine operator watching computer controlled lathe, 1991 July. Paul Kwilecki Photographs.
Worker at her machine. Elberta Crate and Box Co. Bainbridge, 1981 Nov. Paul Kwilecki Photographs.

What equipment and clothing do we need to work safely and productively?

Cotton gin worker wearing safety glass and ear plugs for noise protection. Decatur Gin Co., 1991 Sept. Paul Kwilecki Photographs.
Worker operating bagging machine. Flint River Mills. Bainbridge, 1991 July. Paul Kwilecki Photographs.
Worker at State Dock, 1992 July. Paul Kwilecki Photographs.

Are we paid fairly for our work? How do relative wages for different types of work reflect what is valued in our society?

No known title. William Gedney Photographs.

How we think about and respond to these questions will inform how we navigate the aftermath of this ongoing crisis and whether or not we thrive into the future. As we celebrate International Workers’ Day on May 1 and beyond, I hope everyone will take some time to think about what labor means to them and to our society as a whole.

Digital Collections 2019

‘Tis the time of year for top 10 lists. Here at Duke Digital Collections HQ, we cannot just pick 10, because all our digital collections are tops!  What follows is a list of all the digital collections we have launched for public access this calendar year.

Our newest collections include a range of formats and subject areas from 19th Century manuscripts to African American soldiers’ photograph albums to Duke Men’s Basketball posters to our first Multispectral Images of papyrus to be ingested into the repository. We also added new content to 4 existing digital collections. Lastly, our platform migration is still ongoing, but we made some incredible progress this year as you will see below. Our goal is to finish the migration by the end of 2020.

New Digital Collections

Additions to Existing Collections

Migrated Collections into the Duke Digital Repository


ArcLight at the End of the Tunnel

Archival collection guides—also known as finding aids—are a critical part of the researcher experience when finding and accessing materials from the David M. Rubenstein Rare Book & Manuscript Library and the Duke University Archives. At present, we have guides for nearly 4,000 collections with upwards of one million components that have some level of description. Our collection guides site is visited by researchers about 400 times per day.

An example collection guide.

In 2020, we’ll be making significant changes to our systems supporting archival discovery and access. The main impetus for this shift is that our current platform has grown outdated and is no longer sustainable going forward.  We intend to replace our platform with ArcLight, open source software backed by a community of peer institutions.

Finding Aids at Duke: Innovations Past

At Duke, we’re no strangers to pushing the boundaries of archival discovery through advances in technology. Way back in the mid 1990s, Duke was among pioneers rendering SGML-encoded finding aids into HTML.  For most of the 90s and aughts we used a commercial platform, but we decided to develop our own homegrown finding aids front-end in 2007 (using the Apache Cocoon framework). We then replaced it in 2012 with another in-house platform built on the Django web framework.

Since going home-grown in 2007, we have been able to find some key opportunities to innovate within our platforms. Here are a few examples:

Example archival component with inline embedded digital object and sticky navigation.

So, Why Migrate Now?

Our current platform was pretty good for its time, but a lot has changed in eight years. The way we build web applications today is much different than it used to be. And beyond desiring a modern toolset,  there are major concerns going forward around size, search/indexing, and support.

Size

We have some enormous finding aids. And we have added more big ones over the years. This causes problems of scale, particularly with an interface like ours that renders each collection as a single web page with all of the text of its contents written in the markup. One of our finding aids contains over 21,000 components; all told it is 9MB of raw EAD transformed into 15MB of HTML.

A large finding aid (JWT Competitive Ads): 15MB of HTML in a single page.

No amount of caching or server wizardry can change the fact that this is simply too much data to be delivered and rendered in a single webpage, especially for researchers in lower-bandwidth conditions. We need a solution that divides the data for any given finding aid into smaller payloads.
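As a rough illustration of what "smaller payloads" could mean, here is a minimal Python sketch that splits one finding aid's components into fixed-size pages. The `paginate` helper, the component dictionaries, and the page size of 500 are all hypothetical, chosen only to mirror the 21,000-component example above. This is not how ArcLight or our current platform delivers data.

```python
from typing import Iterator


def paginate(components: list[dict], page_size: int = 100) -> Iterator[list[dict]]:
    """Yield successive pages of at most page_size components."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    for start in range(0, len(components), page_size):
        yield components[start:start + page_size]


# Hypothetical finding aid with 21,000 components (made-up ids and titles).
components = [{"id": f"c{i:05d}", "title": f"Component {i}"} for i in range(21000)]

pages = list(paginate(components, page_size=500))
print(f"{len(pages)} pages, largest page holds {max(len(p) for p in pages)} components")
```

Each page could then be requested on demand as a researcher scrolls or navigates, rather than shipping the whole collection at once.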

Search

Google Custom Search does a pretty nice job of relevance ranking and highlighting where in a finding aid a term matches (after all, that’s Google’s bread-and-butter). However, when used to power search in an application like this, it has some serious limitations. It only returns a maximum of one hundred results per query. Google doesn’t index 100% of the text, especially for our larger finding aids. And some finding aids are just mysteriously omitted despite our best efforts optimizing our markup for SEO and providing a sitemap.

Search Results powered by Google Custom Search

We need search functionality where we have complete control of what gets indexed, when, and how. And we need assurance that the entirety of the materials described will be discoverable.
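To make "complete control of what gets indexed" a bit more concrete, here is a toy Python sketch of an inverted index built over individual components, so that every word of every component is searchable. The component ids and text are invented, and `tokenize`, `build_index`, and `search` are hypothetical helpers. A production system would rely on a real search engine rather than anything like this.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(components: dict[str, str]) -> dict[str, set[str]]:
    """Map every token to the set of component ids whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for component_id, text in components.items():
        for token in tokenize(text):
            index[token].add(component_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of components that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


# Made-up components standing in for entries in a finding aid.
components = {
    "c001": "Correspondence, 1903-1916",
    "c002": "Advertising proofs, automobile campaigns",
    "c003": "Photograph album, automobile travel",
}
index = build_index(components)
print(sorted(search(index, "automobile")))
print(sorted(search(index, "automobile album")))
```

Because nothing is sampled or skipped, a match anywhere in any component is guaranteed to be found, and there is no cap on the number of results.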

Support

This is a familiar story. Homegrown applications used for several years by organizations with a small number of developers and a large number of projects to support become difficult to sustain over time. We have only one developer remaining who can fix our finding aids platform when it breaks, or prevent it from breaking when the systems around it change. Many of the software components powering the system are at or nearing end-of-life and they can’t be easily upgraded.

Where to Go From Here?

It has been clear for a while that we would soon need a new platform for finding aids, but not as clear what platform we should pursue. We had been eyeing the progress of two promising open source community-built solutions emerging from our peer institutions: the ArchivesSpace Public UI (PUI), and ArcLight.

Over 2018-19, my colleague Noah Huffman and I co-led a project to install pilot instances of the ASpace PUI and ArcLight, index all of our finding aids in them, and then evaluate the platforms for their suitability to meet Duke’s needs going forward. The project involved gathering feedback from Duke archivists, curators, research services staff, and our digital collections implementation team. We looked at six criteria: 1) features; 2) ease of migration/customization; 3) integration with other systems; 4) data cleanup considerations; 5) impact on existing workflows; 6) sustainability/maintenance.

Comparison of the ASpace PUI and ArcLight, out-of-the-box UI.

There’s a lot to like about both the ASpace PUI and ArcLight. Feature-wise, they’re fairly comparable. Both are backed by a community of talented, respected peers, and either would be a suitable foundation for a usable, accessible interface to archives. In the end, we recommended that Duke pursue ArcLight, in large part due to its similarity to so much of the other software in our IT portfolio.

ArcLight is an extension to Blacklight, which is the key software component powering our library catalog, our Digital Collections / Digital Repository, and our Hyrax-based Research Data Repository. Our developers and operations staff have accumulated considerable experience working together to build, customize, and maintain Blacklight applications.

ArcLight Community Work Cycle: Fall 2019

Duke is certainly not alone in wanting to replace an outdated, unsustainable homegrown finding aids platform, or in intending to use ArcLight as the replacement.

This fall, with tremendous leadership from Stanford University Libraries, five universities collaborated on developing the ArcLight software further to address shared needs. Over a nine-week work cycle from August to October, we had the good fortune of working alongside Stanford, Princeton, Michigan, and Indiana. The team addressed needs on several fronts, especially usability, accessibility, indexing, context/navigation, and integrations.

Duke joined Stanford, Princeton, Michigan, and Indiana for ArcLight Community Work Cycle II in fall 2019.

Three Duke staff members participated: I was a member of the Development Team, Noah Huffman a member of the Product Owners Team, and Will Sexton on the Steering Group.

The work cycle is complete and you can try out the current state of the core ArcLight demo application. It includes several finding aids from each of the participating partner institutions. Here are just a few highlights that have us excited about bringing ArcLight to Duke:

Search results can be grouped by collection. Faceted navigation helps pinpoint items of interest from deep within a finding aid.
Components are individually discoverable and have their own pages. Integrations with online content viewers and request systems such as Aeon are possible.

Here’s a final demo video (37 min) that nicely summarizes the work completed in the fall 2019 work cycle.

Lighting the Way

With some serious momentum from the fall ArcLight work cycle and plans taking shape to implement the software in 2020, the Duke Libraries intend to participate in the Stanford-led, IMLS grant-funded Lighting the Way project, a platform-agnostic National Forum on Archival Discovery and Delivery. Per the project website:

Lighting the Way is a year-long project led by Stanford University Libraries running from September 2019-August 2020 focused on convening a series of meetings focused on improving discovery and delivery for archives and special collections.

Coming in 2020: ArcLight Implementation at Duke

There’ll be much more to share about this in the new year, but we are gearing up now for a 2020 ArcLight launch at Duke. As good as the platform is now out-of-the-box, we’ll have to do additional development to address some local needs, including:

  • Duke branding
  • An efficient preview/publication workflow
  • Digital object viewing / repository integration
  • Sitemap generation
  • Some data cleanup
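To give a flavor of one item on that list, here is a minimal Python sketch of sitemap generation using only the standard library. The `build_sitemap` helper and the example.org URLs are placeholders invented for illustration. The XML namespace is the one defined by the sitemaps.org protocol. How sitemaps will actually be generated for our ArcLight instance is still to be worked out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str]) -> str:
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Placeholder URLs, not real collection guide addresses.
sitemap_xml = build_sitemap([
    "https://example.org/findingaids/collection-a",
    "https://example.org/findingaids/collection-b",
])
print(sitemap_xml)
```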

Building these local customizations will be time well-spent. We’ll also look for more opportunities to collaborate with peers and contribute code back to the community. The future looks bright for Duke with ArcLight lighting the way.

Data Sciencing our Journal Subscriptions

The ongoing tensions between academic institutions and publishers have been escalating the last few months, but those tensions have existed for many years. The term “Big Deal” has been coined to describe a long-standing, industry-wide practice of journal bundling that forces libraries to subscribe to unwanted and unneeded publications rather than paying more for a limited number of individual subscriptions. This is a practice you see in other industries – for example, cable packages that provide hundreds of channels, even if you only want one or two specific channels.

What is especially problematic in higher education is that academics produce and review the content that gets published in the journals (for free), and then the universities have to pay the publishers a subscription fee to access the content. Imagine if YouTube required a subscription fee to watch any videos, including the ones you had posted. It’s a system that makes research harder to access and inhibits global scientific progress, all so publishers can earn an enormous profit margin.

Right now, academic publishing is controlled by five publishers (the “Big Five”), an oligopoly that makes it very difficult for libraries to negotiate better deals. Only very large organizations or consortia, like the University of California, have been able to start pushing back against the system. It will likely take large shake-ups like this for any large changes to take hold, but in the meantime there may be ways to situate ourselves for making better purchasing decisions.

At Duke, we often review our usage of specific journal titles as we prepare to make purchasing decisions. Usage data comes in a variety of forms, but the most popular are counts of Duke views and downloads that come directly from the publishers and the number of times Duke authors publish in or cite a particular journal. There are many other kinds of data that might be of interest, however, including Duke participation on editorial boards, usage differences across disciplines, and even whether or not the journal is fully open access. Blending various data sources and optimizing the search decisions for a given budget cycle can be overwhelming.

Last fall, Duke University Libraries decided to propose a project for Duke’s Data+ summer program – a summer research experience in data science for undergraduate students. Our project, “Breaking the Bundle: Analyzing Duke’s Journal Subscriptions“, focuses on Duke’s subscriptions to journals published by Elsevier. The program is in its third week, and our team of two incredibly-sharp undergraduates has been hard at work building and blending our datasets. Our goal by the end of summer is to have a proof-of-concept dashboard that lets collection managers adjust the weights of various usage measures to generate an ideal collection of journals for a particular budget.

It is still very early in the process, but the students have been hard at work and have made great progress. We decided it would be best to develop the analysis software and dashboard using R, a statistical computing project with a rich history and many helpful development tools. In addition to publisher-provided views and downloads, the students have been able to use websites and APIs to collect data on journal open access status, editorial boards, numbers of publications, and numbers of citations. All Data+ teams present publicly on the projects twice during the summer, and we hope to schedule a third talk for a library audience before the end of the program on August 2.
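To illustrate the basic idea of weighting usage measures against a budget, here is a small sketch. It is written in Python purely for illustration (the students are working in R), the journals and numbers are invented, and the greedy "best score per dollar first" rule is just one simple heuristic. It is an assumption for this example, not the method the team has settled on.

```python
from dataclasses import dataclass


@dataclass
class Journal:
    title: str
    cost: float          # annual subscription cost, must be greater than zero
    downloads: float     # usage measures, assumed pre-normalized to a 0-1 scale
    citations: float
    publications: float


def score(journal: Journal, weights: dict[str, float]) -> float:
    """Weighted sum of the usage measures named in weights."""
    return sum(weight * getattr(journal, name) for name, weight in weights.items())


def select_journals(journals: list[Journal], weights: dict[str, float],
                    budget: float) -> list[Journal]:
    """Greedy heuristic: take journals in order of score per dollar while they fit."""
    ranked = sorted(journals, key=lambda j: score(j, weights) / j.cost, reverse=True)
    chosen, remaining = [], budget
    for journal in ranked:
        if journal.cost <= remaining:
            chosen.append(journal)
            remaining -= journal.cost
    return chosen


# Entirely made-up journals and measures.
journals = [
    Journal("Journal A", 1000, 0.90, 0.80, 0.50),
    Journal("Journal B", 4000, 0.95, 0.90, 0.90),
    Journal("Journal C", 500, 0.20, 0.10, 0.10),
    Journal("Journal D", 1500, 0.60, 0.70, 0.30),
]
weights = {"downloads": 0.5, "citations": 0.3, "publications": 0.2}

selection = select_journals(journals, weights, budget=3000)
print([j.title for j in selection])
```

Changing the weights or the budget and rerunning the selection is, in miniature, what a collection manager would do with the sliders on a dashboard. A greedy pass like this is not guaranteed to find the best possible set, so a real tool might use a proper optimization routine instead.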

Just one of many files of R code generated for the project so far.

We look forward to seeing what the summer will bring! While this project is just one small step, automating the collection and analysis of journal usage will position us well, both for responsible purchases and for a hopefully-changing publishing landscape.

Just in time for Summer – New Digital Collections!

Looking for something to keep you company on your summer vacation? Why not direct your devices to Duke Digital Collections? Seriously! Here are a few of the compelling collections we debuted earlier this spring, and we have more coming in late June.

Hayti-Elizabeth Street renewal area

These maps and two-volume report document the infrastructure of Durham’s Hayti-Elizabeth Street neighborhood prior to the construction of the Durham Freeway, as well as the justifications for the redevelopment of the area. This is an excellent resource for folks studying Durham history and/or the urban renewal initiatives of the mid-20th century.

One map from “Hayti-Elizabeth Street renewal area : general neighborhood renewal plan, map 1”

African American Soldiers’ Photograph Albums

We launched 8 collections of photograph albums created by African American soldiers serving in the military across the world including Japan, Vietnam and Iowa.  Together these albums help “document the complexity of the African American military experience” (Bennett Carpenter from his blog post, “War in Black and White: African American Soldiers’ Photograph Albums”).  

One page from the African American soldier’s World War II photograph album of Munich, Germany

 

Sir Percy Molesworth Sykes Photograph Album

This photograph album contains pictures taken by Sir Percy Molesworth Sykes during his travels with his sister, Ella Sykes, in a mountainous region of Central Asia, now the Xinjiang Uyghur Autonomous Region of China. According to the collection guide, the album’s “images are large, crisp, and rich with detail, offering views of a remote area and its culture during tensions in the decades following the Russo-Turkish War”.

A Sidenote

Both the Hayti-Elizabeth and soldiers’ albums collections were proposed in response to our 2017 call for digitization proposals related to diversity and inclusion.  Other collections in that batch include the Emma Goldman papers, Josephine Leary papers, and the ReImagining collection.  

Coming soon

Our work never stops, and we have several large projects in the works that are scheduled to launch by the end of June. The first is the initial batch of video recordings from the Memory Project. We are also busy migrating the incredible photographs from the Sidney Gamble collection into the digital repository. Finally, there is one last batch of Radio Haiti recordings on the way.

Keeping in touch

We launch new digital collections just about every quarter, and have been investigating new ways to promote our collections as part of an assessment project. We are thinking of starting a newsletter. Would you subscribe? What other ways would you like to keep in touch with Duke Digital Collections? Post a comment or contact me directly.