Last week, an indefatigable team at Duke University Libraries released an upgraded version of the DukeSpace platform, completing the first phase of the critical project that I wrote about in this space in January. One member of the team remarked that we now surely have “one of the best DSpaces in the world,” and I dare anyone to prove otherwise.
DukeSpace serves as the Libraries’ open-access institutional repository, which makes it a key aspect of our mission to “partner in research,” as outlined in our strategic plan. As I wrote in January, the version of the DSpace platform that underlies the service had been stuck at 1.7, which was released during 2010 – the year the iPad came out, and Lady Gaga wore a meat dress. We upgraded to version 6.2, though the differences between the two versions are so great that it would be more accurate to call the project a migration.
That migration turned out to be one of the more complex technology projects we’ve undertaken over the years. The main complicating factor was the integration with Symplectic Elements, the Research Information Management System (RIMS) that powers the Scholars at Duke site. As far as we know, we are the first institution to integrate Elements with DSpace 6.2. It was a beast to do, and we are happy to share what we learned if it will help any of our peers out there trying to do the same thing.
Meanwhile, feel free to click on over and enjoy one of the best DSpaces in the world. And congratulations to one of the mightiest teams assembled since Spain won the World Cup!
Providing access to captions and transcripts is not new for digital collections. We have been able to provide access to pdf transcripts and captions both in digital collections and finding aids for years. See items from the Behind the Veil and Memory Project digital collections for examples.
In recent years, however, we have stepped up our efforts to create captions and transcripts. Our work began in response to a 2015 lawsuit brought against Harvard and MIT by the National Association of the Deaf. The lawsuit triggered many discussions in the library, and the Advisory Council for Digital Collections eventually decided that we would proactively create captions or transcripts for all new A/V digital collections, assuming it is feasible and reasonable to do so. The feasible and reasonable part of our policy is key. The Radio Haiti collection, for example, is composed of thousands of recordings primarily in Haitian Creole and French. The cost of transcribing that volume of material in non-English languages makes doing so neither reasonable nor feasible. In addition to our work in the library, Duke has established campus-wide web accessibility guidelines that include captioning and transcription. Our work in digital collections is therefore only one aspect of campus-wide accessibility efforts.
To create transcripts and captions, we have partnered with several vendors since 2015, and we have seen the costs for these services drop dramatically. Our primary vendor right now is Rev, who also works with Duke’s Academic Media Services department. Rev guarantees 99% accurate captions or transcripts for $1/minute.
Early on, Duke Digital Collections decided to center our captioning efforts on the WebVTT format, which is a time-coded, text-based format and a W3C standard. We use it for both audio and video captions when possible, but we can also accommodate legacy transcript formats like pdfs. Transcripts and captions can be easily replaced with new versions if and when edits need to be made.
Examples from the Silent Vigil (1968) and Allen Building Takeover (1969) Audio Recordings
When WebVTT captions are present, they load in the interface as an interactive transcript. This transcript can be used for navigation purposes; click the text and the file moves to that portion of the recording.
In addition to providing access to transcripts on the screen, we offer downloadable versions of the transcript as a text file, a pdf, or in the original WebVTT format.
An advantage of the WebVTT format is that it includes “v” tags, which can be used to note changes in speaker and even to add speaker names to the transcript. This can require additional manual work if the names of the speakers are not obvious to the vendor, but we are excited to have this opportunity.
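For readers unfamiliar with the format, here is a small, made-up WebVTT snippet showing what the time-coded cues and speaker “v” tags look like; the timestamps, speaker labels, and dialogue are invented purely for illustration:

```
WEBVTT

00:00:01.000 --> 00:00:07.500
<v Interviewer>Can you describe the first night of the vigil?

00:00:07.500 --> 00:00:15.000
<v Participant>We gathered on the quad just after dark,
and none of us knew how long we would be there.
```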
As Sean described in his blog post, we can also provide access to legacy pdf documents. They cannot be rendered into an interactive version, but they are still accessible for download.
On a related note, we also have a new feature that links time codes listed in the description metadata field of an item to the corresponding portion of the audio or video file. This enables librarians to describe specific segments of audio and/or video items. The Radio Haiti digital collection is the first to utilize this feature, but the feature will be a huge benefit to the H. Lee Waters and Chapel Recordings digital collections as well as many others.
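As a rough illustration of how a feature like this can work (a hypothetical sketch, not the repository’s actual implementation), timecodes in a description field can be detected with a regular expression, converted to seconds, and wrapped in markup that a media player can seek to:

```python
import re

# Matches timecodes like "2:15", "12:07", or "1:02:33" in a description field.
TIMECODE = re.compile(r"\b(?:(\d{1,2}):)?(\d{1,2}):(\d{2})\b")

def to_seconds(match: re.Match) -> int:
    """Convert a matched [hh:]mm:ss timecode to a number of seconds."""
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)

def link_timecodes(description: str) -> str:
    """Wrap each timecode in markup a player could seek to (the data-seek attribute is invented)."""
    return TIMECODE.sub(
        lambda m: f'<a href="#" data-seek="{to_seconds(m)}">{m.group(0)}</a>',
        description,
    )

print(link_timecodes("Interview begins at 2:15; discussion of the 1968 vigil at 1:02:33."))
```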
As mentioned at the top of this post, the Duke Vigil and Allen Building Takeover collection includes our first batch of interactive transcripts. We plan to launch more this Spring, so stay tuned!!
It’s a new year! And a new year means new priorities. One of the many projects DUL staff have on deck for the Duke Digital Repository in the coming calendar year is an upgrade to DSpace, the software application we use to manage and maintain our collections of scholarly publications and electronic theses and dissertations. As part of that upgrade, the existing DSpace content will need to be migrated to the new software. Until very recently, that existing content included a few research datasets deposited by Duke community members. But with the advent of our new research data curation program, research datasets are now published in the Fedora 3 part of the repository. Naturally, we wanted all of our research data content to be found in one place, so that meant migrating the few existing outliers. And given the ongoing upgrade project, we wanted to be sure to have it done and out of the way before the rest of the DSpace content needed to be moved.
Most of the datasets that required moving were relatively small – a handful of files, all of manageable size (under a gigabyte), that could be exported using DSpace’s web interface. However, a set of data associated with a project called The Integrated Precipitation and Hydrology Experiment (IPHEx) was a notable exception. There’s a lot of data associated with the IPHEx project (recorded daily for 7 years, along with some supplementary data files, and iterated over 3 different areas of coverage, the total footprint came to just under a terabyte, spread over more than 7,000 files), so this project needed some advance planning.
First, the size of the project meant that the data were too large to export through the DSpace web client, so we needed the developers to wrangle a behind-the-scenes dump of what was in DSpace to a local file system. Once we had everything we needed to work with (which included some previously unpublished updates to the data that we received last year from the researchers), we had to make some decisions on how to model it. The data model used in DSpace was a bit limiting, which resulted in the data being made available as a long list of files for each part of the project. In moving the data to our Fedora repository, we gained a little more flexibility in how we could arrange the files. We determined that we wanted to deviate slightly from the arrangement in DSpace, grouping the files by month and year.
This meant we would have to group all the files into subdirectories containing the data for each month – for over 7,000 files, that would have been extremely tedious to do by hand, so we wrote a script to do the sorting for us. That completed, we were able to carry out the ingest process as normal. The final wrinkle associated with the IPHEx project was making sure that the persistent identifiers each part of the project data had been assigned in DSpace still resolved to the correct content. One of our developers was able to set up a server redirect to ensure that each URL would still take a user to the right place. As of the new year, the IPHEx project data (along with our other migrated DSpace datasets) are available in their new home!
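For anyone curious, the sorting script mentioned above could be approached in many ways; the sketch below is one possibility, with an invented export directory and a YYYYMMDD filename stamp assumed, since the post does not give the actual naming convention:

```python
"""Hypothetical sketch of a script that groups daily data files by month and year."""
import re
import shutil
from pathlib import Path

EXPORT_DIR = Path("iphex_export")                 # assumed location of the DSpace dump
DATE_STAMP = re.compile(r"(\d{4})(\d{2})\d{2}")   # assumed YYYYMMDD stamp in each filename

for data_file in EXPORT_DIR.iterdir():
    if not data_file.is_file():
        continue
    match = DATE_STAMP.search(data_file.name)
    if not match:
        continue                                  # leave supplementary files for manual review
    year, month = match.groups()
    target_dir = EXPORT_DIR / f"{year}-{month}"   # one subdirectory per month of data
    target_dir.mkdir(exist_ok=True)
    shutil.move(str(data_file), str(target_dir / data_file.name))
```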
2017 has been an action-packed year for Digital Collections, full of exciting projects, interface developments, and new processes and procedures. This blog post is an attempt to summarize just a few of our favorite accomplishments from the last year. Digital Collections is truly a cross-departmental collaboration here at Duke, and we couldn’t complete any of the work listed below without all our colleagues across the library – thanks to all!
New Digital Collections Portal
Regular visitors to Duke Digital Collections may have noticed that our old portal (library.duke.edu/digitalcollections/) now redirects to our new homepage on the Duke Digital Repository (DDR) public interface. We are thrilled to make this change! But never fear, your favorite collections that have not been migrated to DDR are still accessible either on our Tripod2 interface or by visiting new placeholder landing pages in the Digital Repository.
Supporting A/V materials in the Digital Repository has been a major software development priority throughout 2017. As a result, our A/V items are becoming more accessible and easier to share. Thanks to a year of hard work, we can now support the following (we posted examples of these in a previous post):
Model, store and stream A/V derivatives
Share A/V easily through our embed feature (even on Duke WordPress sites – a long-standing bug)
Finding aids can now display inline AV for DAOs from DDR
Clickable timecode links in item descriptions (example)
Display captions and interactive transcripts
Download and export captions and transcripts (as .pdf, .txt, or .vtt)
Display Video thumbnails & poster frames
Rights Statements and Metadata
Bitstreams recently featured a review of all things metadata from 2017, many of which impact the digital collections program. We are especially pleased with our rights management work from the last year and our rights statements implementation (http://rightsstatements.org/en/). We are still in the process of retrospectively applying the statements, but we are making good progress. The end result will give our patrons a clearer indication of the copyright status of our digital objects and how they can be used. Read more about our rights management work in past Bitstreams posts.
Also this year in metadata, we have been developing integrations between ArchivesSpace (the tool the Rubenstein Library uses for finding aids) and the repository, a project that has been in the works since 2015. With these new features, Rubenstein’s archivist for metadata and encoding is in the process of reconciling metadata between ArchivesSpace and the Digital Repository for approximately 50 collections to enable bi-directional links between the two systems. Bi-directional linking helps our patrons move easily from a digital object in the repository to its finding aid or catalog record and vice versa. You can read about the start of this work in a blog post from 2016.
At the end of 2016, Duke Libraries purchased Multispectral Imaging (MSI) equipment, and members of Digital Collections, Data and Visualization Studies, Conservation Services, the Duke Collaboratory for Classics Computing, and the Rubenstein Library joined forces to explore how to best use the technology to serve the Duke community. The past year has been a time of research, development, and exploration around MSI and you can read about our efforts on Bitstreams. Our plan is to launch an MSI service in 2018. Stay tuned!
Ingest into the Duke Digital Repository (DDR)
With the addition of new colleagues focused on research data management, there have been more demands on and enhancements to our DDR ingest tools. Digital Collections has benefited from more robust batch ingest features as well as the ability to upload more types of files (captions, transcripts, derivatives, thumbnails) through the user interface. We can also now ingest nested folders of collections. On the opposite end of the spectrum, we now have the ability to batch export sets of files or even whole collections.
The Digital Collections Advisory Committee and Implementation Team are always looking for more efficient ways to manage our sprawling portfolio of projects and services. We started 2017 with a new call for proposals around the themes of diversity and inclusion, which resulted in 7 successful proposals that are now in the process of implementation.
In addition to the thematic call for proposals, we later rolled out a new process for our colleagues to propose smaller projects in response to faculty requests, events, or other needs – in other words, projects of a certain size and scope that are not required to wait for a thematic call for proposals. The idea is that these projects can be easily implemented and therefore do not require extensive project management to complete. Our first completed “easy” project is the Carlo Naya photograph albums of Venice.
In 2016 (perhaps even back in 2015), the digital collections team started working with colleagues in Rubenstein to digitize the set of collections known as “Section A”. The history of this moniker is a little uncertain, so let me just say that Section A is a set of 3000+ small manuscript collections (1-2 folders each) boxed together; each Section A box holds up to 30 collections. Section A collections are highly used and are often the subject of reproduction requests, hence they are perfect candidates for digitization. Our goal has been to set up a mass-digitization pipeline for these collections that involves vetting rights, updating description, evaluating their condition, digitizing them, ingesting them into DDR, crosswalking metadata, and finally making them publicly accessible in the repository and through their finding aids. In 2017 we evaluated 37 boxes for rights restrictions, updated descriptions for 24 boxes, assessed the condition of 31 boxes, digitized 19 boxes, ingested 4 boxes, crosswalked metadata for 2 boxes, and box 1 is now online! Read more about the project in a May Bitstreams post. Although progress has felt slow given all the other projects we manage simultaneously, we really feel like our foot is on the gas now!
You can see the fruits of our digital collection labors in the list of new and migrated collections from the past year. We are excited to see what 2018 will bring!!
We’re experimenting with changing our approach to projects in Software Development and Integration Services (SDIS). There’s been much talk of Agile (see the Agile Manifesto) over the past few years within our department, but we’ve faced challenges implementing this as an approach to our work given our broad portfolio, relatively small team, and large number of internal stakeholders.
After some productive conversations among staff and managers in SDIS, where we reflected on our work over the past few years, we decided to commit to applying the Scrum framework to one or more projects.
There are many resources available for learning about Agile and Scrum, and several of them have been especially useful to me so far in learning about the framework.
Scrum seems best suited to developing new products or software and defines the roles, workflow, and artifacts that help a team make the most of its capacity to build the highest value features first and deliver usable software on a regular and frequent schedule.
To start, we’ll be applying this process to a new project to build a prototype of a research data repository based on Hyrax. We’ve formed a small team, including a product owner, scrum master, and development team, to build the repository. So far, we’ve developed an initial backlog of requirements in the form of user stories in Jira, the software we use to manage projects. We’ve done some backlog refinement to prioritize the most important and highest-value features, and defined acceptance criteria for the ones that we’ll consider first. The development team has estimated story points (a relative measure of effort and complexity) for some of the user stories to help us with sprint planning and release projection. Our first two-week sprint will begin the week after Thanksgiving. By the end of January we expect to have completed four two-week sprints and to have a pilot ready with a basic set of features implemented for evaluation by internal stakeholders.
One of the important aspects of Scrum is that group reflection on the process itself is built into the workflow through retrospective meetings after each sprint. Done right, routine retrospectives reinforce what is working well and allow for adjustments to address things that aren’t. In the future we hope to adapt what we learn from applying the Scrum framework to the research data repository pilot to improve our approach to other aspects of our work in SDIS.
This past year the SNCC Digital Gateway has brought a number of activists to Duke’s campus to discuss lesser-known aspects of the Student Nonviolent Coordinating Committee (SNCC)’s history and how their approach to organizing shifted over time. These sessions ranged from the development of the symbol of the Black Panther for the Lowndes County Freedom Party, to the strength of local people in the Movement in Southwest Georgia, to the global network supporting SNCC’s fight for Black empowerment in the U.S. and across the African Diaspora. Next month, there will be a session focused on music in the Movement, with a public panel the evening of September 19th.
These visiting activist sessions, often spanning the course of a few days, produce hours of audio and video material, as SNCC veterans reengage with the history through conversation with their comrades. And this material is rich, as memories are dusted off and those involved explore how and why they did what they did. However, considering the structure of the SNCC Digital Gateway and wanting to make these 10-hour collections of A/V material digestible and accessible, we’ve had to develop a means of breaking them down.
Step One: Transcription
As is true for many projects, you begin by putting pen to paper (or by typing furiously). With the amount of transcribing that we do for this project, we’re certainly interested in making the process as seamless as possible. We depend on ExpressScribe, which allows you to set hot keys to start, stop, rewind, and fast forward audio material. Another feature is that you can easily adjust the speed at which the recording is being played, which is helpful for keeping your typing flow steady and uninterrupted. For those who really want to dive in, there is a foot pedal extension (yes, one did temporarily live in our project room) that allows you to control the recording with your feet – keeping your fingers even more free to type at lightning speed. After transcribing, it is always good practice to review the transcription, which you can do efficiently while listening to a high speed playback.
Step Two: Selecting Clips
Once these have been transcribed (each session results in an approximately 130-page, single-spaced transcript), it is time to select clips. For the parameters of this project, we keep the clips roughly between 30 seconds and 8 minutes and intentionally try to pull out the most prominent themes from the conversation. We then try to fit our selections into a larger narrative that tells a story. This process takes multiple reviews of the material and a significant amount of back and forth to ensure that the narrative stays true to the sentiments of the entire conversation.
Step Three: Writing the Narrative
We want users to listen to all of the A/V material, but sometimes details need to be laid out so that the clips themselves make sense. This is where the written narrative comes in. Without detracting from the wealth of newly-created audio and video material, we try to fill in some of the gaps and contextualize the clips for those who might be less familiar with the history. In addition to the written narrative, we embed relevant documents and photographs that complement the A/V material and give greater depth to the user’s experience.
Step Four: Creating the Audio Files
With all of the chosen clips pulled from the transcript, it’s time to actually make the audio files. For each of these sessions, we have multiple recorders in the room, in order to ensure everyone can be heard on the tape and that none of the conversation is lost due to recorder malfunction. These recorders are set to record in .WAV files, an uncompressed audio format for maximum audio quality.
One complication with having multiple mics in the room, however, is that the timestamps on the files are not always one-to-one. In order to easily pull the clips from the best recording we have, we have to sync the files. Our process involves first creating a folder system on an external hard drive. We then create a project in Adobe Premiere and import the files. It’s important that these files be on the same hard drive as the project file so that Premiere can easily find them. Then, we make sequences of the recordings and match the waveform from each of the mics. With a combination of using the timestamps on the transcriptions and scrubbing through the material, it’s easy to find the clips we need. From there, we can make any post-production edits that are necessary in Adobe Audition and export them as .mp3 files with Adobe Media Encoder.
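For teams that prefer scripting to a GUI, the cut-and-export step could also be approximated with a short script. The sketch below is a hypothetical alternative using the pydub library (which relies on ffmpeg), not the Adobe Premiere/Audition/Media Encoder workflow the project actually uses; the filenames and clip boundaries are invented:

```python
from pydub import AudioSegment  # pydub requires ffmpeg to be installed

# Hypothetical clip boundaries taken from the transcript's timestamps.
start_ms = (12 * 60 + 30) * 1000   # 00:12:30
end_ms = (15 * 60 + 0) * 1000      # 00:15:00

# Load the synced master recording, slice out the clip, and export an mp3.
session = AudioSegment.from_wav("visiting_activists_session.wav")
clip = session[start_ms:end_ms]
clip.export("sncc_session_clip.mp3", format="mp3", bitrate="192k")
```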
Step Five: Uploading & Populating
Due to the SNCC Digital Gateway’s sustainability requirements, we host the files in a Duke Digital Collections folder and then embed them in the website, which is built on a WordPress platform. These files are then formatted between text, document, and image, to tell a story.
In June of this year I was fortunate to have participated in the inaugural TRLN Institute. Modeled as a sort of Scholarly Communication Institute for TRLN (Triangle Research Libraries Network, a consortium located in the Triangle region of North Carolina), the Institute provided space (the magnificent Hunt Library on North Carolina State University’s campus), time (three full days), and food (Breakfast! Lunch! Coffee!) for groups of 4-6 people from member libraries to get together to exclusively focus on developing innovative solutions to shared problems. Not only was it productive, it was truly delightful to spend time with colleagues from member institutions who, although we are geographically close, don’t get together often enough.
Six projects were chosen from a pool of applicants who proposed topics around this year’s theme of Scholarly Communication:
Supporting Scholarly Communications in Libraries through Project Management Best Practices
Locating Research Data in an Age of Open Access
Clarifying Rights and Maximizing Reuse with RightsStatements.org
Building a Research Data Community of Practice in NC
Building the 21st Century Researcher Brand
Scholarship in the Sandbox: Showcasing Student Works
You can read descriptions of the projects as well as group membership here.
Having this much dedicated and unencumbered time to thoughtfully and intentionally address a problem area with colleagues was invaluable. And the open schedule allowed groups to be flexible as their ideas and expectations changed throughout the course of the three-day program. My own group – Clarifying Rights and Maximizing Reuse with RightsStatements.org – was originally focused on developing practices for the application and representation of RightsStatements.org statements for TRLN libraries’ online digitized collections. Through talking as a group, however, we realized early on that some of the stickiest issues regarding the implementation of a new rights management strategy involve the work an institution has to do to identify appropriate staff, allocate resources, plan, and document the process.
So, we pivoted! Instead of developing a decision matrix for applying the RS.org statements in digital collections (which is what we originally thought our output would be), we instead spent our time drafting a report – a roadmap of sorts – that describes the following important components when implementing RightsStatements.org:
roles and responsibilities (including questions that a person in a role would need to ask)
necessary planning and documentation
example implementations (including steps taken and staff involved – perhaps the most useful section of the report)
I’d say that the first TRLN Institute was a success. I can’t imagine my group having self-organized and produced a document in just over a month without having first had three days to work together in the same space, unencumbered by other responsibilities. I think other groups found valuable traction via the Institute as well, which will result in more collaborative efforts. I look forward to seeing what future TRLN Institutes produce – this is definitely a model to continue!
A recent tweet from my colleague in the Rubenstein Library (pictured above) pretty much sums up the last few weeks at work. Although I rarely work directly with students and classes, I am still impacted by the hustle and bustle in the library when classes are in session. Throughout the busy Spring I found myself saying, oh I’ll have time to work on that over the Summer. Now Summer is here, so it is time to make some progress on those delayed projects while keeping others moving forward. With that in mind here is your late Spring and early Summer round-up of Digital Collections news and updates.
The long anticipated launch of the Radio Haiti Archives is upon us. After many meetings to review the metadata profile, discuss modeling relationships between recordings, and find a pragmatic approach to representing metadata in 3 languages all in the Duke Digital Repository public interface, we are now in preview mode, and it is thrilling. Behind the scenes, Radio Haiti represents a huge step forward in the Duke Digital Repository’s ability to store and play back audio and video files.
You can already listen to many recordings via the Radio Haiti collection guide, and we will share the digital collection with the world in late June or early July. In the meantime, check out this teaser image of the homepage.
My colleague Meghan recently wrote about our ambitious Section A digitization project, which will result in creating finding aids for and digitizing 3000+ small manuscript collections from the Rubenstein Library. This past week the 12 people involved in the project met to review our workflow. Although we are trying to take a mass-digitization and streamlined approach to this project, there are still a lot of people and steps. For example, we spent about 20-30 minutes of our 90-minute meeting reviewing the various status codes we use on our giant Google spreadsheet and when to update them. I’ve also created a 6-page project plan that encompasses both a high- and medium-level view of the project. In addition to that document, each part of the process (appraisal, cataloging review, digitization, etc.) also has its own more detailed documentation. This project is going to last at least a few years, so taking the time to document every step is essential, as is agreeing on status codes and how to use them. It is a big process, but with every box the project gets a little easier.
Diversity and Inclusion Digitization Initiative Proposals and Easy Projects
As Bitstreams readers and DUL colleagues know, this year we instituted 2 new processes for proposing digitization projects. Our second digitization initiative deadline has just passed (it was June 15) and I will be working with the review committee to review new proposals as well as reevaluate 2 proposals from the first round in June and early July. I’m excited to say that we have already approved one project outright (Emma Goldman papers), and plan to announce more approved projects later this Summer.
We also codified “easy project” guidelines and have received several easy project proposals. It is still too soon to assess the new guidelines fully, but so far the process is going well.
Transcription and Closed Captioning
Speaking of A/V developments, another large project planned for this Summer is to begin codifying our captioning and transcription practices. Duke Libraries has had a mandate to create transcriptions and closed captions for newly digitized A/V for over a year. In that time we have been working with vendors on selected projects. Our next steps will serve two fronts: on the programmatic side, we need to review the time and expense our captioning efforts have incurred so far and see how we can scale those efforts to our backlog of publicly accessible A/V. On the technology side, I’ve partnered with one of our amazing developers to sketch out a multi-phase plan for storing captions and time-coded transcriptions and making them accessible and searchable in our user interface. The first phase goes into development this Summer. All of these efforts will no doubt be the subject of a future blog post.
Summer of Documentation
My aspirational Summer project this year is to update digital collections project tracking documentation, review/consolidate/replace/trash existing digital collections documentation and work with the Digital Production Center to create a DPC manual. Admittedly writing and reviewing documentation is not the most exciting Summer plan, but with so many projects and collaborators in the air, this documentation is essential to our productivity, communication practices, and my personal sanity.
Late Spring Collection launches and Migrations
Over the past few months we launched several new digital collections as well as completed the migration of a number of collections from our old platform into the Duke Digital Repository.
In addition to the projects above, we continue to make slow and steady progress on our MSI system, are exploring using the FFv1 format for preserving selected moving image collections, planning the next phase of the Digital Collections migration into the Duke Digital Repository, thinking deeply about collection level metadata and structured metadata, planning to launch newly digitized Gedney images, integrating digital objects in finding aids and more. No doubt some of these efforts will appear in subsequent Bitstreams posts. In the meantime, let’s all try not to let this Summer fly by too quickly!
It may only be 6 months old, but as of May 31, the SNCC Digital Gateway is sporting a new look. Since going live in December 2016, we’ve been doing assessment, talking to contemporary activists and movement veterans and conducting user testing and student surveys. The feedback’s been overwhelmingly positive, but a few suggestions kept coming up. Give people a better sense of who SNCC was right from the homepage, and make it more active. Connect SNCC’s history to organizing today. As one of the young organizers put it, “What is it about SNCC’s legacy now that matters for people?” So we took those suggestions to heart and are proud to present a reworked, redesigned SNCC Digital Gateway. Keep reading for a breakdown of what’s new and why.
The new Today section highlights important strategies and lessons from SNCC’s work and explores their usefulness to today’s struggles. Through short, engaging videos, contemporary activists talk about how SNCC’s work continues to be relevant to their organizing today. The nine framing questions and answers of today’s organizers speak to enduring themes at the heart of SNCC’s work: uniting with local people to build a grassroots movement for change that empowered Black communities and transformed the nation. Check out this example:
More Expansive Homepage
The new homepage is longer and gives visitors to the site more context and direction. It includes descriptions of who SNCC was and links users to The Story of SNCC, which tells an expansive but concise history of SNCC’s work. It features videos from the new Today section, and gives users a way to explore the site through themes like voting rights, the organizing tradition, and Black Power.
Want to know more about voting rights? Black Power? Or are you not as familiar with SNCC’s history and need an entry point? The theme buttons on the homepage give users a window into SNCC’s history through particular aspects of the organization’s work. Theme pages feature select profiles and events focused on a central component of SNCC’s organizing. From there, click through the documents or follow the links to dig deeper into the story.
To improve navigation for the site, we’ve changed the name of the History section to Timeline and the former Perspectives to Our Voices. We’ve also moved the About section to the footer to make space for the new Today section.
Have suggestions? Comments? We’re always interested in what you’re thinking. Add a comment or send us an e-mail to firstname.lastname@example.org.
I’m not sure anyone who currently works in the library has any idea when the phrase “Section A” was first coined as a call number for small manuscript collections. Before the library’s renovation, before we barcoded all our books and boxes — back when the Rubenstein was still RBMSCL, and our reading room carpet was a very bright blue — there was a range of boxes holding single-folder manuscript collections, arranged alphabetically by collection creator. And this range was called Section A.
Presumably there used to be a Section B, Section C, and so on — and it could be that the old shelf ranges were tracked this way, I’m not sure — but the only one that has persisted through all our subsequent stacks moves and barcoding projects has been Section A. Today there are about 3900 small collections held in 175 boxes that make up the Section A call number. We continue to add new single-folder collections to this call number, although thanks to the miracle of barcodes in the catalog, we no longer have to shift files to keep things in perfect alphabetical order. The collections themselves have no relationship to one another except that they are all small. Each collection has a distinct provenance, and the range of topics and time periods is enormous — we have everything from the 17th to the 21st century filed in Section A boxes. Small manuscript collections can also contain a variety of formats: correspondence, writings, receipts, diaries or other volumes, accounts, some photographs, drawings, printed ephemera, and so on. The bang-for-your-buck ratio is pretty high in Section A: though small, the collections tend to be well-described, meaning that there are regular reproduction and reference requests. Section A is used so often that in 2016, Rubenstein Research Services staff approached Digital Collections to propose a mass digitization project, re-purposing the existing catalog description into digital collections within our repository. This will allow remote researchers to browse all the collections easily, and also reduce repetitive reproduction requests.
This project has been met with enthusiasm and trepidation from staff since last summer, when we began to develop a cross-departmental plan to appraise, enhance the description of, and digitize the 3900 small manuscript collections that are housed in Section A. It took us a bit of time, partially due to the migration and other pressing IT priorities, but this month we are celebrating a major milestone: we have finally launched our first 2 Section A collections, meant to serve as a proof of concept as well as a chance for us to firmly define the project’s goals and scope. Check them out: Abolitionist Speech, approximately 1850, and the A. Brouseau and Co. Records, 1864-1866. (Appropriately, we started by digitizing the collections that began with the letter A.)
Why has it been so complicated? First, the sheer number of collections is daunting; while there are plenty of digital collections with huge item counts already in the repository, they tend to come from a single or a few archival collections. Each newly-digitized Section A collection will be a new collection in the repository, which has significant workflow repercussions for the Digital Collections team. There is no unifying thread for Section A collections, so we are not able to apply metadata in batch like we would normally do for outdoor advertising or women’s diaries. Rubenstein Research Services and Library Conservation Department staff have been going box by box through the collections (there are about 25 collections per box) to identify out-of-scope collections (typically reference material, not primary sources), preservation concerns, and copyright concerns. These are excluded from the digitization process. Technical Services staff are also reviewing and editing the Section A collections’ description. This project has led to our enhancing some of our oldest catalog records — updating titles, adding subject or name access, and upgrading the records to RDA, a relatively new standard. Using scripts and batch processes (details on GitHub), the refreshed MARC records are converted to EAD files for each collection, and the digitized folder is linked through ArchivesSpace, our collection management system. We crosswalk the catalog’s name and subject access data to both the finding aid and the repository’s metadata fields, allowing the collection to be discoverable through the Rubenstein finding aid portal, the Duke Libraries catalog, and the Duke Digital Repository.
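As a very rough illustration of the kind of crosswalk involved (a hypothetical sketch, not the actual scripts published on GitHub), the pymarc library can read a refreshed MARC record and emit a bare-bones EAD stub that carries over the title and subject access points; the filename and the EAD elements retained here are assumptions:

```python
"""Hypothetical MARC-to-EAD crosswalk sketch (not the production scripts).
Assumes 'section_a.mrc' holds one refreshed MARC record per Section A collection."""
from xml.sax.saxutils import escape

from pymarc import MARCReader

with open("section_a.mrc", "rb") as marc_file:
    for record in MARCReader(marc_file):
        # Title statement from the 245 field; subject/name access from selected 6xx fields.
        title_field = record["245"]
        title = " ".join(title_field.get_subfields("a", "b")) if title_field else "Untitled collection"
        subjects = [f.format_field() for f in record.get_fields("600", "610", "650", "651")]

        ead_stub = (
            '<ead><archdesc level="collection"><did>'
            f"<unittitle>{escape(title)}</unittitle></did><controlaccess>"
            + "".join(f"<subject>{escape(s)}</subject>" for s in subjects)
            + "</controlaccess></archdesc></ead>"
        )
        print(ead_stub)
```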
It has been really exciting to see the first two collections go live, and there are many more already digitized and just waiting in the wings for us to automate some of our linking and publishing processes. Another future development that we expect will speed up the project is a batch ingest feature for collections entering the repository. With over 3000 collections to ingest, we are eager to streamline our processes and make things as efficient as possible. Stay tuned here for more updates on the Section A project, and keep an eye on Digital Collections if you’d like to explore some of these newly-digitized collections.