
Two Years In: The Finish Line Approaches for Digitizing Behind the Veil

Behind the Veil Digitization intern Sarah Waugh and Digital Collections intern Kristina Zapfe’s efforts over the past year have focused on quality control of interviews transcribed by Rev.com. This post was authored by Sarah Waugh and Kristina Zapfe.

Introduction

The Digital Production Center (DPC) is proud to announce that we have reached a milestone in our work on Documenting African American Life in the Jim Crow South: Digital Access to the Behind the Veil Project Archive. We have completed digitization and are over halfway through our quality control of the audio transcripts! The project, funded by the National Endowment for the Humanities, will expand the Behind the Veil (BTV) digital collection, currently 410 audio files, to include the newly digitized copies of the original master recordings, photographic materials, and supplementary project files.

The collection derives from Behind the Veil: Documenting African-American Life in the Jim Crow South, an oral history project headed by Duke University’s Center for Documentary Studies from 1993 to 1995. It is currently housed in the David M. Rubenstein Rare Book and Manuscript Library and curated by the John Hope Franklin Research Center for African and African American History and Culture. The BTV collection documented and preserved the memory of African Americans who lived in the South from the 1890s to the 1950s, resulting in a culturally significant and extensive multimedia collection.

As interns, our work focused on ordering transcripts from Rev.com and performing quality control on transcripts for the digitized oral histories. July 2023 marked the halfway point of the oral history transcript quality control process. At the time of writing, we’ve checked 1,727 of 2,876 files after a year of initial planning and hard work. With over 1,666 hours’ worth of audio to review, 3 interns and 7 student workers in the DPC have so far contributed 849 combined hours to oral history transcript quality control. Because of their scope, transcription and quality control are the last pieces of the digitization puzzle before the collection is ingested and published in the Duke Digital Repository.

We are approaching the home stretch with the deadline for transcript quality control coming in December 2023, and the collection scheduled to launch in 2024. With that goal approaching, here is what we’ve completed and what remains to be done.

Digitization Progress

A graphic showing statistics for the Behind the Veil project alongside a tablet displaying the Duke Libraries Behind the Veil webpage. Audio: 1,577 tapes, 2,876 audio files, and 2,709 files transcribed. Admin Files: 27,737 image files. Prints and Negatives: 1,294 image files. Video: 14 tapes, 14 video files. Project Records: 9,328 image files. Photo Slides: 2,920 image files.

As the graphic above indicates, the BTV digitization project spans many media, including audio, video, prints, negatives, slides, and administrative and project-related documents, which together tell a fuller story of this endeavor. With these formats digitized, we look forward to finishing quality control and handing the files off to members of the Digital Collections and Curation Services department for ingest, metadata application, and launch for public access in 2024. We plan to send all 2,876 audio files to Rev.com by the end of August and to complete quality control on all of those transcripts by December 2023.

Developing the Transcription Quality Control Process

With 2,876 files to check within 19 months, the cross-departmental BTV team developed a process to perform quality control as efficiently as possible without sacrificing accuracy, accessibility, or our commitment to our stakeholders. We based our decisions on how we thought BTV interviewers and narrators would want their speech represented as text. Our workflow started from Columbia University’s Oral History Transcription Style Guide, which we adapted to suit our team and the project.

Some voices were difficult to transcribe because of problems with the original recording, such as a microphone placed too far from a speaker, interference from background noise, or errors on the tape. Since we did not have the resources to listen to entire interviews and check for every single mistake, we developed what we called the “spot-check” process. Given the BTV project’s original ethos and the history of marginalized people in archives, the team decided to prioritize making sure race-related language met our standards across every single interview.

A few decisions on standards were quick and unanimous, such as not transcribing speech phonetically. That choice avoided a pitfall of older oral histories of African Americans, like the WPA’s famous “Slave Narratives” project, which interviewed formerly enslaved people but often transcribed their words in non-standard phonetic spellings. Some BTV narrators who may have been familiar with the WPA transcripts specifically asked the project team not to use phonetic spelling.

Other choices took more discussion. We agreed on capitalizing “Black” when describing race, but we had to decide whether to capitalize other racial terms, including “White” and antiquated designations like “Colored.” Ultimately, we decided to capitalize all racial terms (with the exception of slurs). Capitalizing only some of them could lead users to read meaning into the difference between lowercase and uppercase terms, which the team wanted to avoid. Consistent capitalization provides clarity and aligns with BTV’s values of equality among all races.

Using Rev’s find-and-replace feature to standardize our top priorities during the spot-check freed up time to improve the transcripts in other ways. For instance, we also try to find and correct proper nouns, like street names or the names of important people in our narrators’ communities, so that users can make connections in their research. We correct phrases used mainly in the past or specific to certain regions, such as “Piccolo joint,” a name for a dance hall derived from an early jukebox brand. We also listen to passages the transcriptionist could not hear or understand and marked as “indistinct,” so we can add the dialogue later (assuming we are able to decipher what was said).
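The team did this standardization with Rev’s own find-and-replace feature. Purely to illustrate the same idea outside Rev’s editor, here is a minimal Python sketch that lists every lowercase occurrence of a few priority terms, plus every passage tagged as unclear, so a reviewer can go straight to them. The term list, the `[indistinct]` tag, and the `spot_check` function are hypothetical choices for this example, not the project’s actual tooling.

```python
import re

# Hypothetical priority terms, taken from the examples named in this post.
PRIORITY_TERMS = ("black", "white", "colored")
# Hypothetical tag for unclear speech; a vendor's actual tag may differ.
UNCLEAR_MARKER = "[indistinct]"

# Case-sensitive on purpose: terms that are already capitalized are not flagged.
TERM_PATTERN = re.compile(r"\b(" + "|".join(PRIORITY_TERMS) + r")\b")


def spot_check(transcript: str) -> list[tuple[int, str, str]]:
    """List (line number, reason, matched text) for a reviewer to inspect."""
    flags = []
    for number, line in enumerate(transcript.splitlines(), start=1):
        for match in TERM_PATTERN.finditer(line):
            flags.append((number, "lowercase term", match.group(1)))
        if UNCLEAR_MARKER in line:
            flags.append((number, "unclear speech", UNCLEAR_MARKER))
    return flags
```

The sketch deliberately flags instead of replacing: in a sentence like “She wore a white dress,” the word is a color, not a racial term, so a person still has to make the call.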

While we developed these methods to increase the pace of our quality control process, one of the biggest improvements came from working with Rev. If we were able to attain more accurate transcripts, our quality control process would be more efficient. Luckily, Rev’s suite of services provided us this option without straying too far from our transcription budget.

Improving Accuracy with Southern Accents Specialists

When deciding on what would be the best speech-to-text option for our project’s needs, we elected to order Transcript Services from Rev, rather than their Caption Services. This decision hinged on the fact that the Transcript Services option is their only service that allows us to request Rev transcriptionists who specialize in Southern accents. Many people who were interviewed for Behind the Veil spoke with Southern accents that varied in strength and dialect. We found that the Southern accent expertise of the specialists had a significant impact on the accuracy of the transcripts we received from Rev. 

This improvement in transcript quality has made a substantial difference in the time we spend on quality control for each interview: on average, it takes us only about 48 seconds of work for every 60 seconds of audio we check. We valued Rev’s Southern accent specialists enough to choose that service, even though it meant we then had to convert Rev’s plain-text output to the WebVTT file format for enhanced accessibility in the Duke Digital Repository.
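Put another way, 48 seconds of work per 60 seconds of audio is a ratio of 0.8. The hypothetical helper below turns that average into a rough planning estimate. It only applies to audio that is actually checked; because spot-checking does not cover every minute of every interview, it should not simply be multiplied across the full 1,666 hours.

```python
# Average reported in this post: about 48 seconds of work per 60 seconds of audio checked.
QC_SECONDS_PER_AUDIO_MINUTE = 48


def estimated_qc_minutes(audio_minutes_checked: float) -> float:
    """Rough reviewer time, in minutes, for a given amount of audio checked."""
    return audio_minutes_checked * QC_SECONDS_PER_AUDIO_MINUTE / 60
```

For example, checking a 90-minute stretch of audio would take roughly 72 minutes at this average.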

Optimizing Accessibility with WebVTT File Format

The WebVTT file format provides visual tracking that synchronizes the written transcript with the audio. This improvement in user experience and accessibility justified converting the interview transcripts to WebVTT. Below is a visual of the WebVTT format in our existing BTV collection in the Duke Digital Repository (DDR).

We have been collaborating with developer Sean Aery to convert transcript text files to WebVTT files so they will display properly in the Duke Digital Repository. He explained the conversion process that occurs after we hand off the transcripts in text file format.

“The .txt transcripts we received from the vendor are primarily formatted to be easy for people to read. However, they are structured well enough to be machine-readable as well. I created a script to batch-convert the files into standard WebVTT captions with long text cues. In WebVTT form, the caption files play nicely with our existing audiovisual features in the Duke Digital Repository, including an interactive transcript viewer, and PDF exports.”

–Sean Aery, Digital Projects Developer, Duke University Libraries
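Sean Aery’s script is not reproduced in this post. As a rough illustration of the idea, here is a minimal Python sketch that turns speaker turns in a plain-text transcript into long WebVTT cues. The input layout it expects (a line such as `Narrator (00:05):` followed by that speaker’s text) is an assumption made for this example, not the vendor’s documented format, and the function names are hypothetical.

```python
import html
import re

# Assumed input layout (hypothetical, for illustration only):
#   Narrator (00:05):
#   Text of what the narrator said...
TURN_HEADER = re.compile(r"^(?P<speaker>[^()]+?) \((?P<ts>\d{1,2}:\d{2}(?::\d{2})?)\):$")


def to_seconds(ts: str) -> int:
    """Convert 'mm:ss' or 'hh:mm:ss' to whole seconds."""
    total = 0
    for part in ts.split(":"):
        total = total * 60 + int(part)
    return total


def vtt_time(seconds: int) -> str:
    """Format whole seconds as a WebVTT timestamp (hh:mm:ss.mmm)."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.000"


def txt_to_vtt(transcript: str, duration: int) -> str:
    """Convert speaker turns into long WebVTT cues.

    Each cue runs from its turn's timestamp to the next turn's timestamp;
    the final cue ends at `duration` (total audio length in seconds).
    """
    turns = []  # each item: [speaker, start_seconds, list_of_text_lines]
    for raw in transcript.splitlines():
        line = raw.strip()
        header = TURN_HEADER.match(line)
        if header:
            turns.append([header["speaker"].strip(), to_seconds(header["ts"]), []])
        elif line and turns:
            turns[-1][2].append(line)

    out = ["WEBVTT", ""]
    for i, (speaker, start, lines) in enumerate(turns):
        end = turns[i + 1][1] if i + 1 < len(turns) else duration
        out.append(f"{vtt_time(start)} --> {vtt_time(end)}")
        # WebVTT reserves '<' and '&' in cue text, so escape them.
        text = html.escape(" ".join(lines), quote=False)
        out.append(f"<v {html.escape(speaker, quote=False)}>{text}")
        out.append("")
    return "\n".join(out)
```

Each cue covers one whole speaker turn, which matches the “long text cues” described in the quote; the `<v>` voice tag keeps the speaker’s name attached to the text for transcript viewers.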

Before conversion, we complete one more round of quality control using the spot-check process. We have even consulted other components of the Behind the Veil collection (the administrative and project files) to cross-check any changes to metadata for accuracy.

Connecting the Local and Larger Community

Throughout the project, team members have been working on outreach. One big accomplishment by project PI John Gartrell and former BTV outreach intern Brianna McGruder was “Behind the Veil at 30: Reflections on Chronicling African American Life in the Jim Crow South.” This two-day virtual conference convened former BTV interviewers and current scholars of the BTV collection to discuss their work and the impact the collection has had on their research.

We also recently presented at the Triangle Research Libraries Network annual meeting, where our presentation overlapped with some of what you’ve just read in this post. It was exciting to share our work publicly for the first time and answer questions from library staff across the region. We will also be presenting a poster about our BTV experience at the upcoming North Carolina Library Association conference in Winston-Salem in October.

An image of two people standing at a podium with a screen behind them. Four people in the front row look at them.
Sarah Waugh and Kristina Zapfe presenting at the 2023 TRLN Annual Conference.

As we’ve hoped to convey, this project relies heavily on collaboration among many library departments and external vendors, and there are more contributors than we can thoroughly include in this post. Behind the Veil is a large-scale, high-profile project that has impacted many people over its 30-year history, and this newest effort toward digital accessibility seeks to expand the reach of the collection. Two years on, we’ve built on the work of the many professionals who came before us in creating and developing Behind the Veil. We are honored to be part of this rewarding process. Look for more BTV stories when we cross the finish line in 2024.

’Tis the Season for New Beginnings

New Additions

Brief summaries of articles pulled from an issue of The Chronicle that will be digitized as part of the 1990s Duke Chronicle Digitization Project

The time has come for the temperature to drop, decadent smells to waft through the air, and eyes to become tired and bloodshot. Yep, it’s exam week here at Duke! As students fill every room, desk, and floor in the libraries, the Digital Collections team is working diligently on important projects.

One such project is the 1990s decade of The Duke Chronicle. By next week, we can look forward to the year 1991 being completely scanned. Although there are many steps involved before we can make this collection available to the public, it is nice to know that this momentous year is on its way to being accessible for all. While scanning several issues today, I noticed the last issue for the fall semester of 1991. It was the Exam Break Issue, and I was interested in the type of reading content published 26 years ago. What were the students of Duke browsing through before they scurried back home on December 16, 1991, you may ask…

  • There were several stories about students’ worst nightmares coming true, including a Physical Therapy graduate student who lost her research to a Greyhound bus and an undergraduate who went dumpster diving for an accidentally discarded notebook containing his final paper.
  • A junior weighed whether to drive 12 hours to his home in Florida or to fly after a previous debacle in the air; he drove home with no regrets.
  • A satirical column offered advice on how to survive exams. Two excellent gems: use an air horn instead of screaming, and stake out a study carrel to sell to the highest bidder.

This is merely a sprinkling of the hilarious yet horrifying anecdotes from that time period.

Updates to Existing Collections

Digital collections, originally located on the old Digital Collections website, now have new pages on the Repository website with a direct link to the content on the old website.

In addition to The Chronicle, the Emma Goldman Papers, and other new projects, there is a continued push to make already-digitized collections accessible on the Repository platform. Collections like Behind the Veil, the Duke Papyrus Archive, and AdViews were originally published on our old Digital Collections platform, but the need to provide access to them is just as relevant today as when they were first digitized.

As amazing as our current collections in the Repository are, we have some treasures from the past that must be brought forward. Accordingly, many of these older digital collections now have new records in the Repository! For now, the new Repository pages do not host the collections’ content; instead, they link directly to it.

Screenshots comparing a new Repository page with the corresponding old Digital Collections page.

The new pages will expose these collections to new researchers while letting returning researchers use the same features they relied on in the old platform. The Repository landing pages include brief descriptions, direct links to the collections, and access to any applicable finding aids.

Now that the semester has wound down into a semi-quiet lull of fattening foods, awkward but friendly functions, and mental recuperation, I urge everyone to take a moment to look not just at what has been done, but at all the good work you are planning to do.

Based on what I’ve observed so far, I’m looking forward to the new projects that Digital Collections will be bringing to the table for the Duke community next year.


References

Kueber, G. (1991). Beginning of exams signals end of a Monday, Monday era. The Duke Chronicle, p. 26.

Robbins, M. (1991). Driving or crying: Is air travel during the holidays worth it? The Duke Chronicle, p. 13.

The ultimate academic nightmares – and you thought you were going to have a bad week! (1991). The Duke Chronicle, pp. 4–5.

Digitization Details: Re-Formatting Audio Cassettes

A real live audio cassette!

The 310 oral histories that make up the newly published additions to the Behind the Veil digital collection were originally recorded in the 1990s on the now (nearly) obsolete compact cassette format, commonly called “tapes.” The beauty of the compact cassette was that it was small and portable (especially compared to the earlier reel-to-reel format), relatively durable thanks to its hard plastic outer shell, and, most of all, easy for non-professional users to record on at home. This made it perfect for oral historians who needed to record interviews in the field at low cost with minimal hassle.

Unfortunately, the compact cassette format hasn’t aged particularly well. Due to cheap materials, poor storage conditions, and normal mechanical wear and tear, many of these tapes are already borderline unplayable a short 40 years after their first introduction. This poses a number of challenges when converting the audio on the tapes into a digital file format that patrons can easily access online. I won’t detail our digitization process exhaustively here, but will touch on a few issues and how we dealt with them.

Our fearless audio digitization expert carefully inspects a tape.

Physical degradation and damage to tapes: We visually inspected each tape prior to digitization.  Any that were visibly broken or had twisted or jammed tape were rehoused in new outer shells.  At least with this collection, rehousing allowed us to successfully play back all of the tapes.

Poor quality of original recordings: We also did a brief audio inspection of each tape before digitization, which allowed us to identify issues with audio quality. We found that the interviews were recorded in a wide variety of locations, often with traffic, television, appliance, and conversation noise bleeding into the recording. There was no easy fix for this, as these issues are inherent in the recording. Our solution was to provide the best possible playback on a high-quality cassette deck, a direct and balanced signal path, and high-quality analog-to-digital conversion at the preservation standard of 24-bit, 96 kHz. This ensured that the digital copy faithfully reproduced the audio on the cassette, warts and all.
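To give a sense of what preservation-quality capture means for storage, here is a small Python sketch of the size of uncompressed PCM audio. It assumes 24-bit samples at 96 kHz and two channels (stereo capture is an assumption; the post does not say), and it ignores file-header overhead; `pcm_bytes` is a hypothetical helper.

```python
def pcm_bytes(seconds: float, sample_rate: int = 96_000, bit_depth: int = 24, channels: int = 2) -> int:
    """Size of uncompressed PCM audio in bytes, excluding any file header.

    Defaults assume 24-bit / 96 kHz stereo capture (channel count is an assumption).
    """
    return int(seconds * sample_rate * (bit_depth // 8) * channels)
```

At those settings, a 90-minute tape comes to `pcm_bytes(90 * 60)` = 3,110,400,000 bytes, or about 3.1 GB, for a single preservation file.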

Other errors in original recordings: There were some issues in the original recordings that we opted to fix via digital editing or processing in our files for patron use (while retaining the unaltered preservation files).

  • In cases where there was a significant gap of silence in the middle of a tape, we edited out the silence for continuity’s sake.
  • In cases where there were loud and abrasive clicks, pops, or microphone noise at the beginning or end of a tape side, we edited out these noises.
  • Several tapes were apparently recorded at the wrong speed, resulting in a “chipmunk voice” effect.  I used a Speed/Pitch function in our audio capture software to electronically slow these files down so that they play back intelligibly and as intended.
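The silent gaps were edited in audio software. As an illustration only, the sketch below shows one simple way to locate long silent stretches in a list of normalized sample values so they can be reviewed. The `find_silent_gaps` function, its threshold, and its minimum gap length are hypothetical; real tools usually measure loudness over short windows rather than sample by sample.

```python
def find_silent_gaps(samples, sample_rate, threshold=0.01, min_seconds=5.0):
    """Return (start, end) times in seconds of runs where every sample's
    magnitude stays below `threshold` for at least `min_seconds`."""
    min_len = int(min_seconds * sample_rate)
    gaps = []
    run_start = None  # index where the current quiet run began
    for i, value in enumerate(samples):
        if abs(value) < threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_len:
                gaps.append((run_start / sample_rate, i / sample_rate))
            run_start = None
    # A quiet run may continue to the end of the recording.
    if run_start is not None and len(samples) - run_start >= min_len:
        gaps.append((run_start / sample_rate, len(samples) / sample_rate))
    return gaps
```

Flagging gaps for review, rather than cutting them automatically, leaves the decision about what counts as a meaningful pause with the person doing the editing.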
Audio digitization deck

Another challenge, common to all time-based analog media, is the cassette tape’s “real-time” nature.  Unlike a digital file that can be copied nearly instantaneously, a 90-minute cassette tape actually takes 90 minutes to make a digital copy.  Currently we run two cassette decks simultaneously, allowing us to double our throughput.

As you can see, audio cassette digitization is more than just a matter of pressing “play”!

–post written by Zeke Graves

Still want to learn more about the Behind the Veil collection of oral histories? Check out coverage of the collection over at the Rubenstein Library blog, The Devil’s Tale.


Announcing 310 Newly Digitized Behind the Veil Interviews and a New Blog!

Duke Digital Collections is pleased to announce that we have published 310 newly digitized interviews in the Behind the Veil: Documenting African-American Life in the Jim Crow South digital collection! The new interviews focus specifically on North Carolina residents. Although several regions are represented, many interviews focus on the Charlotte, Durham, and Enfield areas of the state.


The North Carolina recordings were all digitized as part of the Triangle Research Libraries Network’s project “Content, Context and Capacity: A Collaborative Large-Scale Digitization Project on the Long Civil Rights Movement in North Carolina.” Publishing these recordings concludes this multi-year endeavor, which digitized special collections holdings from UNC Chapel Hill, NC Central University, and NC State, as well as Duke.

Before we published the new NC recordings, the Behind the Veil digital collection contained 100 recordings. Although we built on the existing collection without developing new technology, we essentially QUADRUPLED the number of interviews available online! The digital collection was created by digitizing the original audio cassettes and scanning any existing transcripts. The entire collection (over 1,200 interviews on audio cassettes) is available for research at the John Hope Franklin Research Center for African and African American History and Culture in the David M. Rubenstein Rare Book & Manuscript Library. Visit The Devil’s Tale (the David M. Rubenstein Rare Book & Manuscript Library blog) for more details.

Speaking of blogs, you are looking at the brand-new blog of Duke’s Digital Projects and Production Services Department. Visit Bitstreams to learn more about all the exciting and innovative digital projects at Duke University Libraries!