Category Archives: Conferences

Announcements, Behind the Scenes, Collections, Conferences, Digital Collections

Two Years In: The Finish Line Approaches for Digitizing Behind the Veil

July 31, 2023 Giao Baker

Behind the Veil Digitization intern Sarah Waugh and Digital Collections intern Kristina Zapfe’s efforts over the past year have focused on quality control of interviews transcribed by Rev.com. This post was authored by Sarah Waugh and Kristina Zapfe.

Introduction

The Digital Production Center (DPC) is proud to announce that we have reached a milestone in our work on Documenting African American Life in the Jim Crow South: Digital Access to the Behind the Veil Project Archive. We have completed digitization and are over halfway through our quality control of the audio transcripts! The project, funded by the National Endowment for the Humanities, will expand the Behind the Veil (BTV) digital collection, currently 410 audio files, to include the newly digitized copies of the original master recordings, photographic materials, and supplementary project files.

The collection derives from Behind the Veil: Documenting African-American Life in the Jim Crow South. This was an oral history project headed by Duke University’s Center for Documentary Studies from 1993 to 1995 and is currently housed in the David M. Rubenstein Rare Book and Manuscript Library and curated by the John Hope Franklin Research Center for African and African American History and Culture. The BTV collection documented and preserved the memory of African Americans who lived in the South from the 1890s to the 1950s, resulting in a culturally-significant and extensive multimedia collection.

As interns, our work focused on ordering transcripts from Rev.com and performing quality control on transcripts for the digitized oral histories. July 2023 marked our arrival at the halfway point of completing the oral history transcript quality control process. At the time of writing, we’ve checked 1727 of 2876 files after a year of initial planning and hard work. With over 1,666 hours’ worth of audio files to complete, 3 interns and 7 student workers in the DPC have contributed 849 combined hours to oral history transcript quality control so far. Because of their scope, transcription and quality control are the last pieces of the digitization puzzle before the collection moves on to be ingested and published in the Duke Digital Repository.

We are approaching the home stretch with the deadline for transcript quality control coming in December 2023, and the collection scheduled to launch in 2024. With that goal approaching, here is what we’ve completed and what remains to be done.

Digitization Progress

As the graphic above indicates, the BTV digitization project consists of many different media, such as audio, video, prints, negatives, slides, and administrative and project-related documents, that tell a fuller story of this endeavor. With these formats digitized, we look forward to finishing quality control and preparing the files for handoff to members of the Digital Collections and Curation Services department for ingest, metadata application, and launch for public access in 2024. We plan to send all 2876 audio files to Rev.com by the end of August and to perform quality control on all those transcripts by December 2023.

Developing the Transcription Quality Control Process

With 2876 files to check within 19 months, the cross-departmental BTV team developed a process to perform quality control as efficiently as possible without sacrificing accuracy, accessibility, and our commitment to our stakeholders. We made our decisions based on how we thought BTV interviewers and narrators would want their speech represented as text. Our choices in creating our quality control workflow began with Columbia University’s Oral History Transcription Style Guide and from that resource, we developed a workflow that made sense for our team and the project.

Some voices were difficult to transcribe due to issues with the original recording, such as a microphone being placed too far away from a speaker, the interference of background noise, or mistakes with the tape. Since we did not have the resources to listen to entire interviews and check for every single mistake, we developed what we called the “spot-check” process of checking these interviews. Given the BTV project’s original ethos and the history of marginalized people in archives, the team decided to prioritize making sure race-related language met our standards across every single interview.

A few decisions on standards were quick and unanimous—such as not transcribing speech phonetically. With that, we avoided pitfalls from older oral histories of African Americans, like the WPA’s famous “Slave Narratives” project, that interviewed formerly-enslaved people, but often transcribed their words in non-standard phonetic spellings. Some narrators in the BTV project who may have been familiar with the WPA transcripts specifically requested the BTV project team not to use phonetic spelling.

Other choices took more discussion: we agreed on capitalizing “Black” when describing race, but we had to decide whether to capitalize other racial terms, including “White” and antiquated designations like “Colored.” Ultimately, we decided to capitalize all racial terms (with the exception of slurs). The team did not want users to read a distinction into a mix of lowercase and uppercase terms, which could happen if we capitalized only some of them. Maintaining consistency with capitalization would provide clarity and align with BTV values of equality between all races.

Using Rev’s find-and-replace feature during spot-checks to standardize our top priorities saved us time, which we put toward improving the transcripts in other ways. For instance, we also try to find and correct proper nouns like street names or names of important people in our narrators’ communities, allowing users to make connections in their research. We corrected mistakes with phrases used mainly in the past or that are very specific to certain regions, such as calling a dance hall a “Piccolo joint” from an early jukebox brand name. We also listened to instances where the transcriptionist could not hear or understand a phrase and marked it as “indistinct,” so we can add in the dialogue later (assuming we are able to decipher what was said).
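For readers who want to try a similar spot-check outside of Rev's editor, here is a minimal Python sketch. It is an illustration under stated assumptions, not the team's actual tooling: the term list is a short illustrative sample, the "indistinct" marker format is assumed, and the function name `spot_check` is hypothetical. It flags passages for a person to review rather than replacing text automatically, since words like "black" and "white" can also describe colors.

```python
import re

# Illustrative sample only; the project's real list of terms is not given in the post.
RACIAL_TERMS = ("black", "white", "colored")

# Matches a lowercase term as a whole word: "black", but not "Black" or "blackboard".
TERM_PATTERN = re.compile(r"\b(" + "|".join(RACIAL_TERMS) + r")\b")

# Assumed marker for passages the transcriptionist could not make out.
INDISTINCT_PATTERN = re.compile(r"\[?indistinct\]?", re.IGNORECASE)


def spot_check(transcript: str) -> list[tuple[int, str, str]]:
    """Return (line_number, issue, matched_text) for each passage a reviewer should check."""
    findings = []
    for number, line in enumerate(transcript.splitlines(), start=1):
        for match in TERM_PATTERN.finditer(line):
            findings.append((number, "check capitalization", match.group(0)))
        for match in INDISTINCT_PATTERN.finditer(line):
            findings.append((number, "listen again", match.group(0)))
    return findings


sample = (
    "We went to the black church.\n"
    "She said [indistinct] on Main Street.\n"
    "Black families lived there."
)
for finding in spot_check(sample):
    print(finding)
```

A reviewer would still decide case by case whether a flagged word refers to race and should be capitalized.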

While we developed these methods to increase the pace of our quality control process, one of the biggest improvements came from working with Rev. If we were able to attain more accurate transcripts, our quality control process would be more efficient. Luckily, Rev’s suite of services provided us this option without straying too far from our transcription budget.

Improving Accuracy with Southern Accents Specialists

When deciding on what would be the best speech-to-text option for our project’s needs, we elected to order Transcript Services from Rev, rather than their Caption Services. This decision hinged on the fact that the Transcript Services option is their only service that allows us to request Rev transcriptionists who specialize in Southern accents. Many people who were interviewed for Behind the Veil spoke with Southern accents that varied in strength and dialect. We found that the Southern accent expertise of the specialists had a significant impact on the accuracy of the transcripts we received from Rev.

This improvement in transcript quality has made a substantial difference in the time we spend on quality control for each interview: on average, it only takes us about 48 seconds of work for every 60 seconds of audio we check. We appreciated Rev’s offering of Southern accent specialists enough that we chose that service, even though it meant that we had to then convert their text file format output to the WebVTT file format for enhanced accessibility in the Duke Digital Repository.
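To put that pace in perspective, the short Python sketch below turns the 48-seconds-per-minute average into a rough workload estimate. It is back-of-the-envelope arithmetic, not a project forecast: it assumes the average holds across the whole collection and that files are of roughly equal length, neither of which the figures above guarantee. The function name `estimate_qc_hours` is hypothetical.

```python
def estimate_qc_hours(audio_hours: float, work_seconds_per_audio_minute: float = 48.0) -> float:
    """Estimate staff hours needed to check a given number of audio hours."""
    ratio = work_seconds_per_audio_minute / 60.0  # 48 / 60 = 0.8 hours of work per audio hour
    return audio_hours * ratio


total_audio_hours = 1666  # "over 1,666 hours" of audio, as reported in this post
files_total = 2876
files_checked = 1727

full_estimate = estimate_qc_hours(total_audio_hours)

# Assumption: files are of roughly equal length, so the share of files
# remaining approximates the share of audio remaining.
share_remaining = (files_total - files_checked) / files_total
remaining_estimate = full_estimate * share_remaining

print(f"Whole collection: about {full_estimate:.0f} hours of QC work")
print(f"Remaining files:  about {remaining_estimate:.0f} hours of QC work")
```

Applied to the 1727 files already checked, the same arithmetic gives roughly 800 hours, which is in the same range as the 849 combined hours reported earlier in this post.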

Optimizing Accessibility with WebVTT File Format

The WebVTT file format provides visual tracking that coordinates the audio with the written transcript. This improvement in user experience and accessibility justified converting the interview transcripts to WebVTT format. Below is a visual of the WebVTT format in our existing BTV collection in the DDR. Click here to listen to the audio recording.

We have been collaborating with developer Sean Aery to convert transcript text files to WebVTT files so they will display properly in the Duke Digital Repository. He explained the conversion process that occurs after we hand off the transcripts in text file format.

“The .txt transcripts we received from the vendor are primarily formatted to be easy for people to read. However, they are structured well enough to be machine-readable as well. I created a script to batch-convert the files into standard WebVTT captions with long text cues. In WebVTT form, the caption files play nicely with our existing audiovisual features in the Duke Digital Repository, including an interactive transcript viewer, and PDF exports.” – Sean Aery, Digital Projects Developer, Duke University Libraries
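The script itself is not reproduced in this post, but the general idea can be sketched in a few lines of Python. This is a hedged illustration only: the input layout (a speaker name followed by a timestamp in parentheses and a colon, then the spoken text) is an assumption about the vendor's .txt format, and the names `transcript_to_vtt`, `to_seconds`, `vtt_time`, and `final_cue_seconds` are hypothetical. Each speaker turn becomes one long cue that ends where the next turn begins.

```python
import re

# Assumed speaker-turn header, e.g. "Interviewer (01:02:03):" or "Narrator (02:15):".
HEADER = re.compile(r"^(?P<speaker>.+?) \((?P<time>(?:\d+:)?\d{1,2}:\d{2})\):\s*$")


def to_seconds(stamp: str) -> int:
    """Convert "MM:SS" or "HH:MM:SS" to a number of seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def vtt_time(seconds: int) -> str:
    """Format seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.000"


def transcript_to_vtt(text: str, final_cue_seconds: int = 30) -> str:
    """Turn each speaker turn into one long WebVTT cue."""
    turns = []  # each entry: [start_seconds, speaker, list_of_text_lines]
    for line in text.splitlines():
        match = HEADER.match(line.strip())
        if match:
            turns.append([to_seconds(match["time"]), match["speaker"], []])
        elif turns and line.strip():
            turns[-1][2].append(line.strip())

    cues = ["WEBVTT"]
    for index, (start, speaker, lines) in enumerate(turns):
        # A cue ends where the next turn begins; the last cue gets an assumed length.
        end = turns[index + 1][0] if index + 1 < len(turns) else start + final_cue_seconds
        # Escape characters that have special meaning inside WebVTT cue text.
        body = " ".join(lines).replace("&", "&amp;").replace("<", "&lt;")
        cues.append(f"{vtt_time(start)} --> {vtt_time(end)}\n<v {speaker}>{body}")
    return "\n\n".join(cues) + "\n"


sample = "Interviewer (00:05):\nWhere were you born?\n\nNarrator (00:12):\nIn Durham.\nIn 1921.\n"
print(transcript_to_vtt(sample))
```

A production script would also need to handle details this sketch ignores, such as two turns that share the same timestamp (a WebVTT cue must end after it starts) and the true end time of the final turn.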

Before conversion, we complete one more round of quality control using the spot-checking process. We have even referred to other components of the Behind the Veil collection (Administrative and Project Files Administrative Files) to cross-reference any alterations to metadata for accuracy.

Connecting the Local and Larger Community

Throughout the project, team members have been working on outreach. One big accomplishment by project PI John Gartrell and former BTV outreach intern Brianna McGruder was “Behind the Veil at 30: Reflections on Chronicling African American Life in the Jim Crow South.” This 2-day virtual conference convened former BTV interviewers and current scholars of the BTV collection to discuss their work and the impact that this collection had on their research.

We also recently presented at the Triangle Research Libraries Network annual meeting, where our presentation overlapped with some of what you’ve just read in this post. It was exciting to share our work publicly for the first time and answer questions from library staff across the region. We will also be presenting a poster about our BTV experience at the upcoming North Carolina Library Association conference in Winston-Salem in October.

An image of two people standing at a podium with a screen behind them. Four people in the front row look out at them. — Sarah Waugh and Kristina Zapfe presenting at the 2023 TRLN Annual Conference.

As we’ve hoped to convey, this project heavily relies on collaboration from many library departments and external vendors, and there are more contributors than we can thoroughly include in this post. Behind the Veil is a large-scale and high-profile project that has impacted many people over its 30-year history, and this newest iteration of digital accessibility seeks to expand the reach of this collection. Two years on, we’ve built on the work of the many professionals who have come before us to create and develop Behind the Veil. We are honored to be part of this rewarding process. Look for more BTV stories when we cross the finish line in 2024.

Conferences, MSI

Assessment in Enhanced Imaging of Cultural Heritage

May 22, 2020 Henry Hebert

In what now seems like the way distant past, just before the library building closed, Ryan Baumann and I virtually presented on DUL’s work with multispectral imaging at the More than Meets the Eye conference, hosted by the University of Iowa. The three-day conference was a wonderful opportunity to hear examples from around the world on the application of enhanced digital imaging technologies in research on cultural heritage.

This month the library kicked off a weekly “Lunch & Learn” series, in which library staff give short (~20 min) presentations about their recent work or research interests. It provides an additional opportunity to connect with our colleagues and learn something new each week. Since I already had my slides from the Iowa conference, I volunteered to present in the first session, and today I would like to share those slides with Bitstreams readers:

Assessment in Enhanced Imaging of Cultural Heritage

I’ve included my script in the notes for each slide, so you can get the full context. There are also links to other talks the MSI team has done over the years and other work in enhanced imaging from my colleagues.

Have a safe Memorial Day weekend!

Conferences

Sharing data and research in a time of global pandemic

March 17, 2020 Moira Downey

[Header image from the New York Times Coronavirus Map, March 17th, 2020]

Just before Duke stopped travel for all faculty and staff last week, I was able to attend what will probably turn out to have been one of the last conferences of the spring in the Research Data Access and Preservation Association’s (RDAP) annual summit in Santa Fe, New Mexico. RDAP is a community of “data managers and curators, librarians, archivists, researchers, educators, students, technologists, and data scientists from academic institutions, data centers, funding agencies, and industry who represent a wide range of STEM disciplines, social sciences, and humanities,” and who are committed to creating, maintaining, and teaching best practices for the access and preservation of research data. While there were many interesting presentations and posters about the work being done in this area at various institutions around the country, the conference and RDAP’s work more broadly resonated with me in a very general and timely way, which did not necessarily stem from anything I heard during the week.

In a situation like the global pandemic we are now facing, open and unfettered access to research data is vital for treating patients, attempting to stem the course of the disease, and potentially developing life-saving vaccines.

A recent editorial in Science Translational Medicine argues that data-driven models and centralized data sharing are the best way to approach this kind of outbreak, stating “[w]e believe that scientific efforts need to include determining the values (and ranges) of the above key variables and identifying any other important ones. In addition, information on these variables should be shared freely among the scientific and the response and resilience communities, such as the Red Cross, other nongovernmental organizations, and emergency responders” [1]. As another article points out, sharing viral samples from around the world has allowed scientists to get a better picture of the disease’s genetic makeup: “[c]omparing those genomes allowed Bedford and colleagues to piece together a viral family tree. ‘We can chart this out on the map, then, because we know that this genome is connected to this genome by these mutations,’ he said. ‘And we can learn about these transmission links’” [2].

Scientists are also accelerating the research lifecycle by using preprint servers like arXiv, bioRxiv, and medRxiv to share their preliminary conclusions without waiting on the often glacial process of peer review. This isn’t a wholly unalloyed positive, and many preprints warrant the increased scrutiny that peer review represents. Moreover, scientific research often benefits from the kind of contextualization and unpacking that peer review and science journalism can occasionally provide. But in the acute crisis that the current outbreak presents, the rapid spread of information among scientific peer networks can undoubtedly save lives.

Continuing to develop and build the infrastructure—in terms of both technology and policy frameworks—needed to conduct the kind of data sharing we are seeing now remains a goal for the scientific community moving forward.

The Libraries, along with communities like RDAP, the Research Data Alliance, and the Data Curation Network, endorse and support this mission, and we will continue to play our role in preserving and providing persistent access to research data as best we can as we all move forward through this together. In the meantime, we hope everyone in the Duke community stays safe and healthy!

[1] Layne, S. P., Hyman, J. M., Morens, D. M., & Taubenberger, J. K. (2020, March 11). New coronavirus outbreak: Framing questions for pandemic prevention. Science Translational Medicine 12(534). https://doi.org/10.1126/scitranslmed.abb1469

[2] Sanders, L. (2020, February 13). Coronavirus’s genetic fingerprints are used to rapidly map its spread. Science News. https://www.sciencenews.org/article/coronavirus-genetic-fingerprints-are-used-to-rapidly-map-spread

Conferences, Uncategorized

Managing Problematic Metadata, Take Two

February 2, 2020 Maggie Dickson

Back in August I wrote a Bitstreams post about the various ways by which those of us who work with library metadata could attempt to tackle the issue of problematic descriptions and descriptive standards. One of the methods I mentioned was activism, and I highlighted the documentary ‘Change the Subject!’, which follows the story of students and librarians at Dartmouth College as they worked together to lobby the Library of Congress to stop using the term ‘illegal aliens’ to describe undocumented immigrants.

Recently, the Triangle Research Libraries Network offered a screening of this documentary to its constituent libraries, who were treated to a special viewing (and free popcorn!) at Durham’s iconic Carolina Theatre. I attended this screening and participated in a panel discussion following the film.

Image of the Carolina Theater in Durham, NC. — By Warren LeMay – Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=81937356

I found the documentary to be both encouraging and disheartening: encouraging, as the student activists’ vision, fortitude, and perseverance are inspiring, but disheartening, as their campaign to have the term ‘illegal aliens’ removed from the Library of Congress Subject Headings ultimately failed, due to intervention from Congress.

However, the panel discussion following the screening restored some of my faith that we could still manage problematic metadata with the tools at our disposal. Some of the ideas that were mentioned included:

Identifying alternative thesauri and vocabularies that better represent diversity, equity, and inclusion, and being proactive in mapping problematic metadata to preferred terms.
Working with library vendors to communicate that this is an issue we care about, and perhaps suggesting the use of more inclusive language in their products.
Working with students and student activist groups to collaborate on identifying and remediating areas for improvement in our descriptive practices (as well as library work and spaces in general).
Continuing to use SACO funnels – formal channels for submitting subject authority records to the Library of Congress – while recognizing that this is time consuming yet important work.

And, of course, we can use the technological solution we have already developed for suppressing problematic subject headings from the shared TRLN discovery layer (e.g., the Duke, UNC, and NCSU catalogs). Work has progressed on developing policies and governance to support workflows for implementing this solution, including the formation of a TRLN Discovery Metadata Team, which will focus on the shared discovery layer, and a more broadly focused TRLN Metadata Interest Group. Stay tuned!
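To make the idea of mapping problematic headings to preferred terms more concrete, here is a minimal Python sketch. It is not the TRLN implementation, which this post does not describe in detail; the mapping table, the choice of "Undocumented immigrants" as the display term, and the function name `remap_subjects` are all assumptions for illustration. The sketch changes only what is displayed and leaves the underlying record untouched.

```python
# Hypothetical mapping from flagged headings to the terms a library prefers to display.
PREFERRED_TERMS = {
    "Illegal aliens": "Undocumented immigrants",
}


def remap_subjects(subjects: list[str], mapping: dict[str, str] = PREFERRED_TERMS) -> list[str]:
    """Replace flagged headings for display, keeping order and dropping duplicates."""
    displayed = []
    for heading in subjects:
        # Subdivided headings look like "Illegal aliens--United States";
        # only the lead term is looked up in the mapping.
        lead, separator, rest = heading.partition("--")
        preferred = mapping.get(lead, lead) + separator + rest
        if preferred not in displayed:
            displayed.append(preferred)
    return displayed


record_subjects = [
    "Illegal aliens--United States",
    "Emigration and immigration law",
    "Undocumented immigrants--United States",
]
print(remap_subjects(record_subjects))
```

Keeping the mapping in a separate table means it can be reviewed and governed independently of the catalog records themselves.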

Behind the Scenes, Conferences

Something Good

January 24, 2019 Alex Marsh

One of the highlights of the Association of Moving Image Archivists’ annual conference is “Archival Screening Night,” where members of the AMIA community showcase recently-discovered and newly-restored film and video footage. The event usually takes place in a historic movie theater, with skilled projectionists who are able to present the film materials in their original format, on the big screen. At the most recent AMIA conference, in Portland, Oregon, there was a wide array of impressive material presented, but one film in particular left the audience speechless, and is a wonderful example of how archivists can unearth treasures that can alter our perspective on human history, and humanity itself.

The film, “Something Good – Negro Kiss” was made in 1898. It’s silent, black & white, and is less than a minute long. But it’s groundbreaking, in that it shows the earliest known depiction of an African-American couple kissing, and stands in opposition to the racist, minstrel-show portrayals of black people so common in the early days of American filmmaking. The couple embrace, kiss, and sway back and forth in a playful, spontaneous dance that comes across as genuine and heartwarming. Although it may not have been intentional, the short film seems to be free of negative racial stereotypes. You can watch it here:

Dino Everett, an archivist at the University of Southern California’s Hugh M. Hefner Moving Image Archive, recently discovered the 35mm nitrate film within a batch of silent films once owned by a Louisiana collector. Unique perforation marks helped him to identify the film’s age, and Allyson Nadia Field, of the University of Chicago, was able to help track down the history: where it was shot (Chicago), who the filmmaker was (William Selig), and what the original title of the film was (Something Good). The actors have been identified as Saint Suttle and Gertie Brown. The film has now been added to the Library of Congress’ National Film Registry.

The film is likely an homage to “The Kiss” (also known as the May Irwin Kiss), a film made in 1896, with a white couple kissing. It was one of the first films ever shown commercially, and is the very first kiss on film. Even though the couple was white, and the kissing is remarkably tame by today’s standards, it created a lot of controversy at the time, because kissing in public was prohibited by law. The Catholic Church and newspaper editorials denounced “The Kiss” and called for censorship and prosecution. Although there is no documented history yet about the public reaction to “Something Good – Negro Kiss,” one can only imagine the shock and scandal it must have caused, showing an African-American couple kissing each other, only two years later.

Behind the Scenes, Conferences

Sustaining Open

June 9, 2018 Moira Downey

On learning that this year’s conference on Open Repositories would be held in Bozeman, Montana, I was initially perplexed. What an odd, out-of-the-way corner of the world in which to hold an international conference on the work of institutional digital repositories. After touching down in Montana, however, it quickly became apparent how appropriate the setting would be to this year’s conference—a geographic metaphor for the conference theme of openness and sustainability. I grew up out west, but coastal California has nothing on the incomprehensibly vast and panoramic expanse of western Montana. I was fortunate enough to pass a few days driving around the state before the conference began, culminating in a long afternoon spent at Yellowstone National Park. As we wrapped up our hike that afternoon by navigating the crowds and the boardwalks hovering over the terraces of the Mammoth Hot Springs, I wondered about the toll our presence took on the park, what responsible consumption of the landscape looks like, and how we might best preserve the park’s beauty for the future.

Beaver Pond Loop Trail, Yellowstone National Park

Tuesday’s opening remarks from Kenning Arlitsch, conference host Montana State University’s Dean of Libraries, reflected these concerns, pivoting from a few words on what “open” means for library and information professionals to a lengthier consideration of the impact of “openness” on the uniqueness and precarity of the greater Yellowstone ecosystem. Dr. Arlitsch noted that “[w]e can always create more digital space, but we cannot create more of these wild spaces.” While I agree unreservedly with the latter part of his statement, as the conference progressed, I found myself re-evaluating the whole of that assertion. Although it’s true that we may be able to create more digital space with some ease (particularly as the strict monetary cost of digital storage becomes more manageable), it’s what we do with this space that is meaningful for the future. One of my chief takeaways from my time in Montana was that responsibly stewarding our digital commons and sustaining open knowledge for the long term is hard, complicated work. As the volume of ever more complex digital assets grows, finding ways to responsibly ensure access now and for the future is increasingly difficult.

“Research and Cultural Heritage communities have embraced the idea of Open; open communities, open source software, open data, scholarly communications, and open access publications and collections. These projects and communities require different modes of thinking and resourcing than purchasing vended products. While open may be the way forward, mitigating fatigue, finding sustainable funding, and building flexible digital repository platforms is something most of us are striving for.”

Many of the sessions I attended took the curation of research data in institutional repositories as their focus; in particular, a Monday workshop on “Engaging Liaison Librarians in the Data Deposit Workflow: Starting the Conversation” highlighted that research data curation is taking place through a wide array of variously resourced and staffed workflows across institutions. A good number of institutions do not have their own local repository for data, and even those larger organizations with broad data curation expertise and robust curatorial workflows (like Carnegie Mellon University, representatives from which led the workshop) may outsource their data publishing infrastructure to applications like Figshare, rather than build a local solution. Curatorial tasks tended to mean different things in different organizational contexts, and workflows varied according to staffing capacity. Our workshop breakout group spent some time debating the question of whether institutional repositories should even be in the business of research data curation, given the demanding nature of the work and the disparity in available resources among research organizations. It’s a tough question without any easy answers; while there are some good reasons for institutions to engage in this kind of work where they are able (maintaining local ownership of open data, institutional branding for researchers), it’s hard to escape the conclusion that many IRs are under-equipped from the standpoint of staff or infrastructure to sustainably process the on-coming wave of large-scale research data.

Mammoth Hot Springs, Yellowstone National Park

Elsewhere, from a technical perspective, presentations chiefly seemed to emphasize modularity, microservices, and avoiding reinventing the wheel. Going forward, it seems as though community development and shared solutions to problems held in common will be integral strategies to sustainably preserving our institutional research output and digital cultural heritage. The challenge resides in equitably distributing this work and in providing appropriate infrastructure to support maintenance and governance of the systems preserving and providing access to our data.

Behind the Scenes, Collections, Conferences, Projects

Charm City Sounds

May 18, 2018 Zeke Graves

Last week I had the opportunity to attend the 52nd Association for Recorded Sound Collections Annual Conference in Baltimore, MD. From the ARSC website:

Founded in 1966, the Association for Recorded Sound Collections, Inc. is a nonprofit organization dedicated to the preservation and study of sound recordings—in all genres of music and speech, in all formats, and from all periods.

ARSC is unique in bringing together private individuals and institutional professionals. Archivists, librarians, and curators representing many of the world’s leading audiovisual repositories participate in ARSC alongside record collectors, record dealers, researchers, historians, discographers, musicians, engineers, producers, reviewers, and broadcasters.

ARSC’s vitality springs from more than 1000 knowledgeable, passionate, helpful members who really care about sound recordings.

ARSC Annual Conferences encourage open sharing of knowledge through informative presentations, workshops, and panel discussions. Tours, receptions, and special local events heighten the camaraderie that makes ARSC conferences lively and enjoyable.

This quote highlights several of the things that have made ARSC resources valuable and educational to me as the Audio Production Specialist at Duke Libraries:

The group’s membership includes both professionals and enthusiasts from a variety of backgrounds and types of institutions.
Members’ interests and specialties span a broad array of musical genres, media types, and time periods.
The organization serves as a repository of knowledge on obscure and obsolete sound recording media and technology.

This year’s conference offered a number of presentations that were directly relevant to our work here in Digital Collections and Curation Services, highlighting audio collections that have been digitized and the challenges encountered along the way. Here’s a quick recap of some that stood out to me:

“Uncovering the Indian Neck Folk Festival Collection” by Maya Lerman (Folklife Center, Library of Congress). This presentation showcased a collection of recordings and related documentation from a small invitation-only folk festival that ran from 1961 to 2014 and included early performances from Reverend Gary Davis, Dave Van Ronk, and Bob Dylan. It touched on some of the difficulties in archiving optical and born-digital media (lack of metadata, deterioration of CD-Rs) as well as the benefits of educating prospective donors on best practices for media and documentation.
“A Garage in South Philly: The Vernacular Music Research Archive of Thornton Hagert” by David Sager and Anne Stanfield-Hagert. This presentation paid tribute to the massive jazz archive of the late Mr. Hagert, comprising over 125,000 items of printed music, 75,000 recordings, 5,500 books, and 2,000 periodicals. It spoke to the difficulties of selling or donating a private collection of this magnitude without splitting it up and undoing the careful, but idiosyncratic organizational structure as envisioned by the collector.
“Freedom is a Constant Struggle: The Golden State Mutual Sound Recordings” by Kelly Besser, Yasmin Dessem and Shanni Miller (UCLA Library). This presentation covered the audio material from the archive of an African American-owned insurance company founded in 1925 in Los Angeles. While audio was only a small part of this larger collection, the speakers demonstrated how it added context and depth to photographs, video, and written documents. They also showed how this kind of archival audio can be an important tool in telling the stories of previously suppressed or unheard voices.
“Sounds, Sights and Sites of Activism in ’68” by Guha Shankar (Library of Congress). This presentation examined a collection of recordings from “Resurrection City” in Washington, DC. This was an encampment that was part of the Poor People’s Campaign, a demonstration for human rights organized by Martin Luther King, Jr. prior to his assassination in 1968. The talk showed how these archival documents are being accessed and used to inform new forms of social and political activism and wider circulation via podcasts, websites, public lectures, and exhibitions.

The ARSC Conference also touched on my personal interests in American traditional and vernacular music, especially folk and blues from the early 20th Century. Presentations on the bluegrass scene in Baltimore, blues guitarist Johnny Shines, education outreach by the creators of PBS’s “American Epic” documentaries, and Hickory, NC’s own Blue Sky Boys provided a welcome break from favorite archivist topics such as metadata, workflows, and quality control. Other fun parts of the conference included an impromptu jam session, a silent auction of books & records, and posters documenting the musical history of Baltimore. True to the city’s nickname, I was charmed by my time in Baltimore and inspired by the amazingly diverse and dedicated work towards collecting and preserving our audio heritage by the ARSC community.

Conferences, Digital Collections, Duke Digital Repository, Technology, User Experience

Accessible AV in the Duke Digital Repository

October 24, 2017 Sean Aery 1 Comment

Over the course of 2017, we improved our capacity to support digital audiovisual materials in the Duke Digital Repository (DDR) by leaps and bounds. A little more than a year ago, I wrote a Bitstreams blog post highlighting the new features we had just developed in the DDR to provide basic functionality for AV, especially in support of the Duke Chapel Recordings collection. What a difference a year makes.

This past year brought renewed focus on AV development, as we worked to bring the NEH grant-funded Radio Haiti Archive online (launched in June). At the same time, our digital collections legacy platform migration efforts shifted toward moving our existing high-profile digital AV material into the repository.

Closed Captions

At Duke University Libraries, we take accessibility seriously. We aim to include captions or transcripts for the audiovisual objects made available via the Duke Digital Repository, especially to ensure that the materials can be perceived and navigated by people with disabilities. For instance, work is well underway to create closed captions for all 1,400 items in the Duke Chapel Recordings project.

Screenshot showing Charmin commercial from AdViews collection with caption overlay — Captioned video displays a CC button and shows captions as an overlay in the video player. Example from the AdViews collection, coming soon to the DDR.

The DDR now accommodates modeling and ingest for caption files, and our AV player interface (powered by JW Player) presents a CC button whenever a caption file is available. Caption files are encoded using WebVTT, the modern W3C standard for associating timed text with HTML audio and video. WebVTT is structured so as to be machine-processable, while remaining lightweight enough to be reasonably read, created, or edited by a person. It’s a format that transcription vendors can provide. And given its endorsement by W3C, it should be a viable captioning format for a wide range of applications and devices for the foreseeable future.

Example WebVTT captions — Text cues from a WebVTT caption file for an audio item in the Duke Chapel Recordings collection.
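
To give a feel for how lightweight the format is, here is a minimal sketch of reading WebVTT cues into simple objects. It is written in plain Python purely for illustration; the sample caption text, the Cue class, and the parse_vtt function are all made up for this example and are not DDR code. It also ignores parts of the spec such as NOTE and STYLE blocks and cue settings.

```python
import re
from dataclasses import dataclass

# Hypothetical sample captions, not taken from an actual DDR caption file.
SAMPLE_VTT = """WEBVTT

1
00:00:01.000 --> 00:00:04.500
Good morning, and welcome.

2
00:00:04.500 --> 00:00:09.250
Today's reading comes in two parts.
"""

# A cue timing line looks like "00:00:01.000 --> 00:00:04.500".
TIMING = re.compile(r"(\S+)\s+-->\s+(\S+)")


@dataclass
class Cue:
    start: float  # seconds from the beginning of the media
    end: float    # seconds from the beginning of the media
    text: str


def to_seconds(stamp: str) -> float:
    """Convert 'hh:mm:ss.ttt' or 'mm:ss.ttt' into seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_vtt(source: str) -> list[Cue]:
    """Return the cues found in a WebVTT string (simplified)."""
    cues = []
    # Blocks are separated by blank lines; the header block has no timing line.
    for block in re.split(r"\n\s*\n", source.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.match(line)
            if match:
                text = "\n".join(lines[i + 1:])
                cues.append(Cue(to_seconds(match.group(1)),
                                to_seconds(match.group(2)), text))
                break
    return cues


cues = parse_vtt(SAMPLE_VTT)
```

Each cue ends up as a start time, an end time, and a bit of text, which is all an application needs to show the right words at the right moment.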

Interactive Transcripts

Displaying captions within the player UI is helpful, but it only gets us so far. For one, that doesn’t give a user a way to just read the caption text without requiring them to play the media. We also need to support captions for audio files, but unlike with video, the audio player doesn’t include enough real estate within itself to render the captions. There’s no room for them to appear.

So for both audio and video, our solution is to convert the WebVTT caption files on-the-fly into an interactive in-page transcript. Using the webvtt-ruby gem (developed by Coconut), we parse the WebVTT text cues into Ruby objects, then render them back on the page as HTML. We then use the JW Player JavaScript API to keep the media player and the HTML transcript in sync. Clicking on a transcript cue advances the player to the corresponding moment in the media, and the currently-playing cue gets highlighted as the media plays.

Screenshot of interactive audio transcript — Example interactive synchronized transcript for an audio item (rendered from a WebVTT caption file). From a collection coming soon to the DDR.
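
The two halves of that approach (render the cues as HTML, then work out which cue matches the current playback position) can be sketched roughly as follows. This is an illustrative stand-in written in Python, not our Ruby and JavaScript code: the cue data, the "cue" class name, and the data-start attribute are assumptions invented for the example, and in the browser the playback position would come from the player rather than being passed in by hand.

```python
from bisect import bisect_right
from html import escape

# Hypothetical cues as (start_seconds, end_seconds, text) tuples.
CUES = [
    (1.0, 4.5, "Good morning, and welcome."),
    (4.5, 9.25, "Today's reading comes in two parts."),
    (12.0, 15.0, "Let us begin."),
]


def render_transcript(cues):
    """Render each cue as an HTML element that carries its start time."""
    rows = [
        f'<p class="cue" data-start="{start:.3f}">{escape(text)}</p>'
        for start, _end, text in cues
    ]
    return "\n".join(rows)


def active_cue_index(cues, position):
    """Return the index of the cue playing at `position` seconds, or None."""
    starts = [start for start, _end, _text in cues]
    # Find the last cue that starts at or before the playback position.
    i = bisect_right(starts, position) - 1
    if i >= 0 and position < cues[i][1]:
        return i
    return None  # before the first cue, or in a gap between cues


html = render_transcript(CUES)
```

A click handler would read the start time off the clicked element and seek the player there; a time-update handler would call something like active_cue_index to decide which cue to highlight.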

We also do some extra formatting when the WebVTT cues include voice tags (<v> tags), which can optionally indicate the name of the speaker (e.g., <v Jane Smith>). The in-page transcript is indexed by Google for search retrieval.
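
As a rough illustration of the voice tag handling, the sketch below pulls a speaker name out of a cue and formats it separately from the spoken text. It is plain Python rather than our production code, the function names and the "speaker" class are hypothetical, and it only handles the simple case of a single voice tag at the start of a cue.

```python
import re
from html import escape

# Matches a cue that opens with <v Speaker Name>; the closing </v> is optional.
VOICE_TAG = re.compile(r"^<v\s+([^>]+)>(.*?)(?:</v>)?$", re.DOTALL)


def split_speaker(cue_text):
    """Return (speaker, speech); speaker is None when there is no voice tag."""
    match = VOICE_TAG.match(cue_text.strip())
    if match:
        return match.group(1).strip(), match.group(2).strip()
    return None, cue_text.strip()


def format_cue(cue_text):
    """Render a cue as HTML, labeling the speaker when one is given."""
    speaker, speech = split_speaker(cue_text)
    if speaker is None:
        return f"<p>{escape(speech)}</p>"
    return (f'<p><span class="speaker">{escape(speaker)}:</span> '
            f"{escape(speech)}</p>")


example = format_cue("<v Jane Smith>Good morning.</v>")
```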

Transcript Documents

In many cases, especially for audio items, we may have only a PDF or other type of document with a transcript of a recording that isn’t structured or time-coded. Like captions, these documents are important for accessibility. We have developed support for displaying links to these documents near the media player. Look for some new collections using this feature to become available in early 2018.

Screenshot of a transcript document menu above the AV player — Transcript documents presented above the media player. Coming soon to AV collections in the DDR.

A/V Embedding

The DDR web interface provides an optimal viewing or listening experience for AV, but we also want to make it easy to present objects from the DDR on other websites. When used on other sites, we’d like the objects to include some metadata, a link to the DDR page, and proper attribution. To that end, we now have copyable <iframe> embed code available from the Share menu for AV items.

Embed code in the Share menu for an audio item. — Copyable embed code from an audio recording in the Radio Haiti Archive.
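
For readers curious what generating that kind of snippet involves, here is a small sketch in Python. Everything specific in it is an assumption for illustration: the URL is a placeholder rather than a real DDR embed address, and the attributes and dimensions are generic choices, not the markup the Share menu actually produces.

```python
from html import escape


def embed_code(item_url, title, width=640, height=360):
    """Build an <iframe> snippet for a (hypothetical) embeddable item URL."""
    return (
        f'<iframe src="{escape(item_url, quote=True)}" '
        f'title="{escape(title, quote=True)}" '
        f'width="{int(width)}" height="{int(height)}" '
        'frameborder="0" allowfullscreen></iframe>'
    )


# Placeholder URL and title, used only to show the shape of the output.
snippet = embed_code("https://repository.example.edu/embed/abc123",
                     'Radio "Haiti" clip')
```

Escaping the URL and title matters because item titles can contain quotes or angle brackets that would otherwise break the markup on the host page.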

This embed code is also what we now use within the Rubenstein Library collection guides (finding aids) interface: it lets us present digital objects from the DDR directly from within a corresponding collection guide. So as a researcher browses the inventory of a physical archival collection, they can play the media inline without having to leave.

Screenshot of Rubenstein Library collection guide presenting a Duke Chapel Recordings video inline. — Embedded view of a DDR video from the Duke Chapel Recordings collection presented inline in a Rubenstein Library archival collection guide.

Sites@Duke Integration

If your website or blog is one of the thousands of WordPress sites hosted and supported by Sites@Duke — a service of Duke’s Office of Information Technology (OIT) — we have good news for you. You can now embed objects from the DDR using WordPress shortcode. Sites@Duke, like many content management systems, doesn’t allow authors to enter <iframe> tags, so shortcode is the only way to get embeddable media to render.

Example of WordPress shortcode for DDR embedding on Sites@Duke.edu sites. — Sites@Duke WordPress sites can embed DDR media by using shortcode with the DDR item’s permalink.

And More!

Here are the other AV-related features we have been able to develop in 2017:

Access control: master files & derivatives alike can be protected so access is limited to only authorized users/groups
Video thumbnail images: model, manage, and display
Video poster frames: model, manage, and display
Intermediate/mezzanine files: model and manage
Rights display: display icons and info from RightsStatements.org and Creative Commons, so it’s clear what users are permitted to do with media.

What’s Next

We look forward to sharing our recent AV development with our peers at the upcoming Samvera Connect conference (Nov 6-9, 2017 in Evanston, IL). Here’s our poster summarizing the work to date:

Poster presentation screenshot for Samvera Connect 2017 — Poster about Duke’s AV development for Samvera Connect conference, Nov 6-9, 2017 (Evanston, IL)

Looking ahead to the next couple of months, we aim to round out the year by completing a few more AV-related features, most notably:

Export WebVTT captions as PDF or .txt
Advance the player via linked timecodes in the description field in an item’s metadata
Improve workflows for uploading caption files and transcript documents
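
To illustrate the linked timecodes idea from the list above, here is one way the text processing might work, sketched in Python. This feature is still on our to-do list, so treat this as a thought experiment: the function names and the data-seek attribute are hypothetical, and a real version would need to avoid treating clock times like "10:30" in a description as media timecodes.

```python
import re
from html import escape

# Matches h:mm:ss, hh:mm:ss, m:ss, or mm:ss.
TIMECODE = re.compile(r"\b(?:(\d{1,2}):)?(\d{1,2}):(\d{2})\b")


def timecode_seconds(match):
    """Convert a matched timecode into a whole number of seconds."""
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)


def link_timecodes(description):
    """Escape the text, then wrap each timecode in a link with its offset."""
    def replace(match):
        offset = timecode_seconds(match)
        return f'<a href="#" data-seek="{offset}">{match.group(0)}</a>'
    return TIMECODE.sub(replace, escape(description))


linked = link_timecodes("Sermon begins at 12:34.")
```

A script on the page could then listen for clicks on those links and seek the player to the stored offset.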

Now that these features are in place, we’ll be sharing a bunch of great new AV collections soon!

Conferences, Projects, Uncategorized

The Inaugural TRLN Institute – an Experiment in Consortial Collaboration

August 18, 2017 Maggie Dickson 1 Comment

In June of this year I was fortunate to have participated in the inaugural TRLN Institute. Modeled as a sort of Scholarly Communication Institute for TRLN (Triangle Research Libraries Network, a consortium located in the Triangle region of North Carolina), the Institute provided space (the magnificent Hunt Library on North Carolina State University’s campus), time (three full days), and food (Breakfast! Lunch! Coffee!) for groups of 4-6 people from member libraries to get together to exclusively focus on developing innovative solutions to shared problems. Not only was it productive, it was truly delightful to spend time with colleagues from member institutions who, although we are geographically close, don’t get together often enough.

Six projects were chosen from a pool of applicants who proposed topics around this year’s theme of Scholarly Communication:

Supporting Scholarly Communications in Libraries through Project Management Best Practices
Locating Research Data in an Age of Open Access
Clarifying Rights and Maximizing Reuse with RightsStatements.org
Building a Research Data Community of Practice in NC
Building the 21st Century Researcher Brand
Scholarship in the Sandbox: Showcasing Student Works

You can read descriptions of the projects as well as group membership here.

The 2017 TRLN Institute participants and organizers, a happy bunch.

Having this much dedicated and unencumbered time to thoughtfully and intentionally address a problem area with colleagues was invaluable. And the open schedule allowed groups to be flexible as their ideas and expectations changed throughout the course of the three-day program. My own group – Clarifying Rights and Maximizing Reuse with RightsStatements.org – was originally focused on developing practices for the application and representation of RightsStatements.org statements for TRLN libraries’ online digitized collections. Through talking as a group, however, we realized early on that some of the stickiest issues regarding the implementation of a new rights management strategy involve the work an institution has to do to identify appropriate staff, allocate resources, plan, and document the process.

So, we pivoted! Instead of developing a decision matrix for applying the RS.org statements in digital collections (which is what we originally thought our output would be), we spent our time drafting a report – a roadmap of sorts – that describes the following important components when implementing RightsStatements.org:

roles and responsibilities (including questions that a person in a role would need to ask)
necessary planning and documentation
technical decisions
example implementations (including steps taken and staff involved – perhaps the most useful section of the report)

This week, we put the finishing touches on our report: TRLN Rights Statements Report – A Roadmap for Implementing RightsStatements.org Statements (yep, yet another Google Doc). We’re excited to get feedback from the community, as well as hear about how other institutions are handling rights management metadata, especially as it relates to upstream archival information management. This is an area ripe for future exploration!

I’d say that the first TRLN Institute was a success. I can’t imagine my group having self-organized and produced a document in just over a month without having first had three days to work together in the same space and unencumbered by other responsibilities. I think other groups have found valuable traction via the Institute as well, which will result in more collaborative efforts. I look forward to seeing what future TRLN Institutes produce – this is definitely a model to continue!

Conferences, Duke Digital Repository

Rethinking Repositories at CNI Spring ’17

April 7, 2017 Will Sexton 1 Comment

One of the main areas of emphasis for the CNI Spring 2017 meeting was “new strategies and approaches for institutional repositories (IR).” A few of us at UNC and Duke decided to plug into the zeitgeist by proposing a panel to reflect on some of the ways that we have been rethinking – or even just thinking about – our repositories.
