‘Tis the time of year for top 10 lists. Here at Duke Digital Collections HQ, we cannot just pick 10, because all our digital collections are tops! What follows is a list of all the digital collections we have launched for public access this calendar year.
Our newest collections include a range of formats and subject areas, from 19th-century manuscripts to African American soldiers' photograph albums to Duke Men's Basketball posters to our first multispectral images of papyrus to be ingested into the repository. We also added new content to 4 existing digital collections. Lastly, our platform migration is still ongoing, but we made some incredible progress this year, as you will see below. Our goal is to finish the migration by the end of 2020.
New Digital Collections
African American Soldiers Photo Albums (browse all 8 or 1 by 1 using the links below):
The featured image is from a mockup of a new home page for our repositories that we're working on in the Libraries, planned for rollout in January 2020.
Working at the Libraries, it can be dizzying to think about all of our commitments.
There’s what we owe our patrons, a body of so many distinct and overlapping communities, all seeking to learn and discover, that we could split the library along an infinite number of lines to meet them where they work and think.
There’s what we owe the future, in our efforts to preserve and share the artifacts of knowledge that we acquire on the market, that scholars create on our own campus, or that seem to form from history and find us somehow.
There’s what we owe the field, and the network of peer libraries that serve their own communities, each of them linked in a web of scholarship with our own. Within our professional network, we seek to support and complement one another, to compete sometimes in ways that move our field forward, and to share what we learn from our experiences.
The needs of information technology underlie nearly all of these activities, and to meet those needs, we have an IT staff that’s modest in size, but prodigious in its skill and its dedication to the mission of the Libraries. Within that group, the responsibility for creating new software, and maintaining what we have, falls to a small team of developers and devops engineers. We depend on them to enhance and support a wide range of platforms, including our web services, our discovery platforms, and our digital repositories.
This fall, we did some reflection on how we want to approach support for our repository platforms. The result of that reflection was a Statement of Commitment to Repositories Support and Development, a document of roughly a page that expresses what we consider to be our values in this area, and the context of priorities in which we do that work.
The statement is explicit that we will not seek to find alternative platforms for our repository services in the next several years, and in particular while the FOLIO transition is underway. This decision is informed by our recognition that migration of content and services across platforms is complex and expensive. It’s also a recognition that we have invested a lot into these existing platforms, and we want to carve out as much space as we can for our talented staff to focus on maintaining and improving them, rather than locking ourselves into all-consuming cycles of content migration.
From a practical perspective, and speaking as the manager who oversees software development in the Libraries, I see this statement as part of an overall strategy to bring focus to our work. It's a small but important symbolic measure that recognizes the drag we create for our software team when we give in to our urge to prioritize everything.
The phrase "context switching" is one that we have borrowed from the parlance of operating systems to describe the effects on a developer of working on multiple projects at once. There are real costs to moving between development environments, code bases, and architectures on the same day, in the same week, during the same sprint, or even within an extended work cycle. We also call this problem "multitasking," and the penalty it imposes on performance is well documented.
Even more than performance, I think of it as a quality of life concern. People are generally happier and more invested when they’re able to do quality work. As a manager, I can work with scheduling and planning to try to mitigate those effects of multitasking on our team. But the responsibility really lies with the organization. We have our commitments, and they are vast in size and scope. We owe it to ourselves to do some introspection now and again, and ask what we can realistically do with what we have, or more accurately, who we are.
(Header image: Illustration by Jørgen Stamp digitalbevaring.dk CC BY 2.5 Denmark)
Here at Duke University Libraries, we often talk about digital preservation as though everyone is familiar with the various corners and implications of the phrase, but “digital preservation” is, in fact, a large and occasionally mystifying topic. What does it mean to “preserve” a digital resource for the long term? What does “the long term” even mean with regard to digital objects? How are libraries engaging in preserving our digital resources? And what are some of the best ways to ensure that your personal documents will be reusable in the future? While the answers to some of these questions are still emerging, the library can help you begin to think about good strategies for keeping your content available to other users over time by highlighting agreed-upon best practices, as well as some of the services we are able to provide to the Duke community.
Not all file formats have proven to be equally robust over time! Have you ever tried to open a document created using a Microsoft Office product from several years ago, only to be greeted with a page full of strangely encoded gibberish? Proprietary software like the products in the Office suite can be convenient and produce polished contemporary documents. But software changes, and there is often no guarantee that the beautifully formatted paper you’ve written using Word will be legible without the appropriate software 5 years down the line. One solution to this problem is to always have a version of that software available to you to use. Libraries are beginning to investigate this strategy (often using a technique called emulation) as an important piece of the digital preservation puzzle. The Emulation as a Service (EaaS) architecture is an emerging tool designed to simplify access to preserved digital assets by allowing end users to interact with the original environments running on different emulators.
An alternative to emulation is to save your files in a format that can be consumed by different, changing versions of software. Experts at cultural heritage institutions like the Library of Congress and the US National Archives and Records Administration have identified an array of file formats that they are reasonably confident the software of the future will be able to consume. Formats like plain text or PDF for textual data, delimiter-separated files (like comma-separated values, or CSV), MP3 and MP4 for audio and video data respectively, and JPEG for still images have all proven to have some measure of durability. What's more, they will help make your content or your data more easily accessible to folks who do not have access to particular kinds of software. It can be helpful to keep these format recommendations in mind when working with your own materials.
File format migration
The formats recommended by the Library of Congress and others have been selected not only because they are interoperable with a wide variety of software applications, but also because they have proven to be relatively stable over time, resisting format obsolescence. The process of moving data from an obsolete format to one that is usable in the present day is known as file format migration, or format conversion. Libraries have yet to establish scalable strategies for extensive migration of obsolete file formats, though it is a subject of widespread concern.
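To make the idea concrete, here is a minimal sketch of a single format migration in Python, using the Pillow imaging library to derive a JPEG from a TIFF master. The file names are hypothetical, and a production workflow would also log provenance and validate the converted file:

```python
# A minimal sketch of one migration path: deriving a JPEG access
# copy from a TIFF master with the Pillow library. File names are
# hypothetical; a real workflow would also log provenance and
# validate the output.
from PIL import Image

with Image.open("diary_page_042.tif") as master:
    # JPEG supports neither alpha channels nor 16-bit depth, so
    # normalize the image mode before saving.
    master.convert("RGB").save("diary_page_042.jpg", "JPEG", quality=90)
```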
Here at DUL, we encourage the use of one of these recommended formats for content that is submitted to us for preservation, and will even go so far as to convert your files prior to preservation in one of our repository platforms, where possible and appropriate. This helps us ensure that your data will be usable in the future. What we can't necessarily promise is that, should you give us content in a file format that isn't one we recommend, a user who is interested in your materials will be able to read or otherwise use your files ten years from now. For some widely used formats, like MP3 and MP4, staff at the Libraries anticipate developing a strategy for migrating our data in the event that those formats become superseded. However, the Libraries do not currently have the staff to monitor rarer, and especially proprietary, formats and convert them into ones that are immediately consumable by contemporary software. The best we can promise is that we are able to deliver to the end users of the future the same digital bits you initially gave us.
Which brings me to a final component of digital preservation: bit-level preservation. At DUL, we calculate a checksum for each of the files we ingest into any of our preservation repositories. Briefly, a checksum is an algorithmically derived alphanumeric hash that is intended to surface errors that may have been introduced to the file during its transmission or storage. A checksum acts somewhat like a digital fingerprint, and is periodically recalculated for each file in the repository environment by the repository software to ensure that nothing has disrupted the bits that compose each individual file. In the event that the recalculated checksum does not match the one recorded when the file was ingested into the repository, we can conclude with some level of certainty that something has gone wrong with the file, and it may be necessary to revert to an earlier version of the data. The process of generating, regenerating, and cross-checking these checksums is a way to ensure the file fixity, or file integrity, of the digital assets that DUL stewards.
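Our repository software handles this bookkeeping automatically, but the underlying idea is simple enough to sketch. Here is a minimal, hypothetical example in Python (the file name is made up, and a real system would also record which algorithm was used and when each audit ran):

```python
import hashlib

def sha256_checksum(path, chunk_size=1024 * 1024):
    """Compute a SHA-256 checksum, reading in chunks so that
    large preservation files don't exhaust memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# On ingest, record the checksum alongside the file ...
recorded = sha256_checksum("photo_album_scan.tiff")

# ... and on each scheduled fixity audit, recompute and compare.
if sha256_checksum("photo_album_scan.tiff") != recorded:
    print("Fixity failure: restore this file from a replica.")
```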
Resonance: the reinforcement or prolongation of sound by reflection from a surface or by the synchronous vibration of a neighboring object
Nearly 4 months have passed since I moved to Durham from my hometown Chicago to join Duke’s Digital Collections & Curation Services team. With feelings of reflection and nostalgia, I have been thinking on the stories and memories that journeys create.
I have always believed a library the perfect place to discover another's story. Libraries and digital collections are dynamic storytelling channels that connect people through narrative and memory. What are libraries if not places dedicated to memories? Memory made incarnate in the turn of a page, the capturing of an image.
Memory is sensation.
In my mind memory is ethereal – wispy and nebulous. Like trying to grasp mist or fog only to be left with the shimmer of dew on your hands. Until one focuses on a detail, then the vision sharpens. Such as the soothing warmth of a pet’s fur. A trace of familiar perfume in the air as a stranger walks by. Hearing the lilt of an accent from your hometown. That heavy, sticky feeling on a muggy summer day.
Memories are made of moments.
I do not recall the first time I visited a library. However, one day my parents took me to the library and I checked out 11 books on dinosaurs. As a child I was fascinated by them. Due to watching so much of The Land Before Time and Jurassic Park no doubt. One of the books had beautiful full-length pullout diagrams. I remember this.
Experiences tether individuals together across time and place. Place, like the telling of a story, is subjective. It holds a finite precision which is absent in the vagueness and vastness of space. This personal aspect is what captures a person when a tale is well told. A corresponding chord is struck, and the story resounds as listeners see themselves reflected.
When a narrative reaches someone with whom it resonates, its impact can be amplified beyond any expectations.
Last week, it was brought to our attention that Duke Digital Collections recently passed 100,000 individual items in the Duke Digital Repository! To celebrate, I want to highlight some of the most recent materials digitized and uploaded from our Section A project. In the past, Bitstreams has blogged about what Section A is and what it means, but it's been a couple of years since that post, and a little refresher couldn't hurt.
What is Section A?
In 2016, the staff of Rubenstein Research Services proposed a mass digitization project of Section A. This is the umbrella term for 175 boxes of historic materials that users often request – manuscripts, correspondence, receipts, diaries, drawings, and more. These boxes contain around 3,900 small collections, each with its own workflow. Every box needs consultation with Rubenstein Research Services, review by Library Conservation Department staff, review by Technical Services, metadata updates, and more, all to make sure that the collections can be launched and hosted within the Duke Digital Repository.
In the 2 years since that blog post, so much has happened! The first 2 Section A collections went live as a sort of proof of concept, and as a way to define what the digitization project would be and what it would look like. We've added over 500 more collections from Section A since then – and that somehow barely even scratches the surface of the entire project! We're digitizing the collections in alphabetical order, and even after all the collections that have gone online, we are still only on the letter "C"!
Nonetheless, there is already plenty of material to check out and enjoy. I was a student of history in college, so in this blog post I want to highlight some of the historic materials from the latter half of the 19th century.
Showing off some of Section A
In 1869, after her work as a nurse in the Civil War, Clara Barton traveled around Europe to Geneva, Switzerland, and Corsica, France. Included in the Duke Digital Collections are her diary and calling cards from her time there. These pages detail where she visited and stayed throughout the year. She also wrote about her views on the different European countries, how Americans and Europeans compare, and more. Despite her storied career and her many travels that year, Miss Barton felt that "I have accomplished very little in a year," and hoped that in 1870 she "may be accounted worthy once more to take my place among the workers of the world, either in my own country or in some other."
Back in America, around 1900, the Rev. John Malachi Bowden began dictating and documenting his experiences as a Confederate soldier during the Civil War – one of many such soldiers a nurse like Miss Barton may have treated. Although Bowden says he was not necessarily a secessionist at the beginning of the Civil War, he joined the 2nd Georgia Regiment in August 1861, after Georgia had seceded. During his time in the regiment, he fought in the Battles of Fredericksburg, Gettysburg, Spotsylvania Court House, and more. In 1864, Union forces captured Bowden and held him as a prisoner at Maryland's Point Lookout Prison, and he describes in great detail what life was like as a POW before his eventual release. He writes that he was "so indignant at being in a Federal prison" that he refused to cut his hair. His hair eventually grew to be shoulder-length, "somewhat like Buffalo Bill's."
Speaking of whom, Duke Digital Collections also has some material from Buffalo Bill (William Frederick Cody), courtesy of the Section A initiative. A showman and entertainer who performed in cowboy shows throughout the latter half of the 19th century, Buffalo Bill was enormously popular wherever he went. In this collection, he writes to a Brother Miner about how he invited seventy-five of his “old Brothers” from Bedford, VA to visit him in Roanoke. There is also a brief itinerary of future shows throughout North Carolina and South Carolina. This includes a stop here in Durham, NC a few weeks after Bill wrote this letter.
Around this time, Walter Clark, associate justice of the North Carolina Supreme Court, began writing his own histories of North Carolina throughout the 18th and 19th centuries. Three of Clark's articles prepared for the University Magazine of the University of North Carolina have been digitized as part of Section A, including an article entitled "North Carolina in War," in which he made note of the generals from North Carolina engaged in every war up to that point. It's possible that John Malachi Bowden was once on the battlefield alongside some of the generals mentioned in Clark's writings. This kind of synergy in our collection is what makes Section A so exciting to dive into.
As the new Still Image Digitization Specialist at the Duke Digital Production Center, seeing projects like this take off in such a spectacular way is near and dear to my heart. Even just the four collections I’ve highlighted here have been so informative. We still have so many more Section A boxes to digitize and host online. It’s so exciting to think of what we might find and what we’ll digitize for all the world to see. Our work never stops, so remember to stay updated on Duke Digital Collections to see some of these newly digitized collections as they become available.
As one of the largest research libraries in the U.S., we have a whole lot of content on the web to consider.
Our website alone comprises over a thousand pages with more than fifty staff contributors. The library catalog interface displays records for over 13 million items at Duke and partner libraries. Our various digital repositories and digital exhibits platforms host hundreds of thousands of interactive digital objects of different types, including images, A/V, documents, datasets, and more. The list goes on.
Any attempt to take a full inventory of the library’s digital content reveals potentially several million web pages under the library’s purview, and all that content is managed and rendered via a dizzying array of technology platforms. We have upwards of a hundred web applications with public-facing interfaces. We built some of these ourselves, some are community-developed (with local customizations), and others we have licensed from vendors. Some interfaces are new, some are old. And some are really old, dating all the way back to the mid-90s.
Ensuring that this content is equally accessible to everyone is important, and it is indeed a significant undertaking. We must also be vigilant to ensure that it stays accessible over time.
With that as our context, I’d like to highlight a few recent efforts in the library to improve the accessibility of our digital resources.
Style Guide With Color Contrast Checks
In January 2019, we launched a new catalog, replacing a decade-old platform and its outdated interface. As we began developing the front-end, we knew we wanted to be consistent, constrained, and intentional in how we styled elements of the interface. We were especially focused on ensuring that any text in the UI had sufficient contrast with its background to be accessible to users with low vision or color-blindness.
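"Sufficient contrast" has a precise meaning here: WCAG 2 defines a contrast ratio based on the relative luminance of the two colors, with a minimum ratio of 4.5:1 for normal body text at the AA level. As a rough illustration, the check can be computed in a few lines of Python (the hex color below is just an illustrative value, not our actual palette):

```python
def relative_luminance(hex_color):
    """Relative luminance of an sRGB hex color, per the WCAG 2.0 formula."""
    channels = [int(hex_color.lstrip("#")[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    linear = [c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
              for c in channels]
    return 0.2126 * linear[0] + 0.7152 * linear[1] + 0.0722 * linear[2]

def contrast_ratio(foreground, background):
    """Contrast ratio between two colors; WCAG AA requires >= 4.5 for body text."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

print(round(contrast_ratio("#0577B1", "#FFFFFF"), 2))  # ~4.91, passes AA
```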
The result was a "living" style guide for the new interface: a real-time, up-to-date reflection of how elements of the UI will appear when using particular color variable names and CSS classes. It helps guide developers and other project team members to make good decisions about colors from our palette and stay in compliance with accessibility guidelines.
In the course of a similar accessibility assessment of DukeSpace, our DSpace-based open-access repository, we were able to identify (and then fix!) several issues. I'll share two strategies in particular that proved to be really effective; I highly recommend using them frequently.
The Keyboard Test
How easy is it to navigate your site using only your keyboard? Can you get where you want to go using TAB, ENTER, SPACE, UP, and DOWN? Is it clear which element of the page currently has focus?
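Spot checks by hand work well, but the exercise can also be scripted. Below is a rough sketch using Selenium's Python bindings – the URL is a placeholder, and a real audit would examine far more than twenty tab stops:

```python
# Requires: pip install selenium (and a Chrome/chromedriver setup).
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()
driver.get("https://example.org/search")  # placeholder URL

# Press TAB repeatedly and log what receives focus, to spot focus
# traps and elements unreachable by keyboard.
for press in range(1, 21):
    driver.switch_to.active_element.send_keys(Keys.TAB)
    focused = driver.switch_to.active_element
    label = focused.get_attribute("aria-label") or focused.text[:40]
    print(press, focused.tag_name, label)

driver.quit()
```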
If you’re a developer like me, chances are you already spend a lot of time using your browser’s Developer Tools pane to look under the hood of web pages, reverse-engineer UIs, mess with styles and markup, or troubleshoot problems.
The Deque Systems aXe Chrome Extension (also available for Firefox) integrates seamlessly into existing Dev Tools. It’s a remarkably useful tool to have in your toolset to help quickly find and fix accessibility issues. Its interface is clear and easy to understand. It finds and succinctly describes accessibility problems, and even tells you how to fix them in your code.
With aXe testing, we quickly learned we had some major issues to fix. The biggest problems revealed were missing form labels and page landmarks, and low contrast on color pairings. Again, these were not hard to fix since the tool explained what to do, and where.
Turning away from DSpace for a moment, see this example article published on a popular academic journal’s website. Note how it fares with an automated aXe accessibility test (197 violations of various types found). And if you were using a keyboard, you’d have to press Tab over 100 times in order to download a PDF of the article.
Libraries are increasingly becoming champions for open access to scholarly research. The overlap in aims between the open access movement and web accessibility in general is quite striking. It all boils down to removing barriers and making access to information as inclusive as possible.
Our open access repository UIs may never be able to match all the feature-rich bells and whistles present in many academic journal websites. But accessibility, well, that’s right up our alley. We can and should do better. It’s all about being true to our values, collaborating with our community of peers, and being vigilant in prioritizing the work.
Look for many more accessibility improvements throughout many of the library’s digital resources as the year progresses.
In early 2017, Duke University Libraries launched a research data curation program designed to help researchers on campus ensure that their data are adequately prepared both for sharing and publication and for long-term preservation and re-use. Why the focus on research data? Data generated by scholars in the course of their investigations are increasingly being recognized as outputs similar in importance to the scholarly publications they support. Open data sharing reinforces unfettered intellectual inquiry; fosters transparency, reproducibility, and broader analysis; and permits the creation of new data sets when data from multiple sources are combined. For these reasons, a growing number of publishers and funding agencies, like PLoS ONE and the National Science Foundation, are requiring researchers to make the data underlying the results of their research openly available.
But data sharing can only be successful if the data have been properly documented and described. And they are only useful in the long term if steps have been taken to mitigate the risks of file format obsolescence and bit rot. To address these concerns, Duke’s data curation workflow will review a researcher’s data for appropriate documentation (such as README files or codebooks), solicit and refine Dublin Core metadata about the dataset, and make sure files are named and arranged in a way that facilitates secondary use. Additionally, the curation team can make suggestions about preferred file formats for long-term re-use and conduct a brief review for personally identifiable information. Once the data package has been reviewed, the curation team can then help researchers make their data available in Duke’s own Research Data Repository, where the data can be licensed and assigned a Digital Object Identifier, ensuring persistent access.
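Parts of that first review pass lend themselves to simple automation. The sketch below is purely illustrative – the directory name and the list of preferred extensions are stand-ins, not DUL's actual policy – but it conveys the kind of checks a curator might script in Python:

```python
from pathlib import Path

# Extensions drawn from the recommendations discussed above; the
# directory name and list are illustrative, not DUL policy.
PREFERRED = {".csv", ".txt", ".pdf", ".jpg", ".mp3", ".mp4"}

def review_package(root):
    """Flag a missing README and any files in non-preferred formats."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    if not any(p.name.lower().startswith("readme") for p in files):
        print("No README found: ask the researcher for documentation.")
    for p in files:
        if p.suffix.lower() not in PREFERRED:
            print(f"Review needed: {p} is not in a preferred format.")

review_package("dataset_submission/")
```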
"The Data Curation Network (DCN) serves as the 'human layer' in the data repository stack and seamlessly connects local data sets to expert data curators via a cross-institutional shared staffing model."
New to Duke's curation workflow is the ability to rely on the domain expertise of our colleagues at a few other research institutions. While our data curators here at Duke possess a wealth of knowledge about general research data best practices, and are especially well-versed in the vagaries of social sciences data, they may not always have all the information they need to sufficiently assess the state of a researcher's dataset. As an answer to this problem, the Data Curation Network, an Alfred P. Sloan Foundation-funded endeavor, has established a cross-institutional staffing model that distributes the domain expertise of each of its partner institutions. Should a curator at one institution encounter data of a kind with which they are unfamiliar, submission to the DCN opens up the possibility for enhanced curation by a network partner with the requisite knowledge.
Duke joins Cornell University, Dryad Digital Repository, Johns Hopkins University, University of Illinois, University of Michigan, University of Minnesota, and Pennsylvania State University in partnering to provide curatorial expertise to the DCN. As of January of this year, the project has moved out of its pilot phase into production, and is actively moving data through the network. If a Duke researcher submits a dataset that our curation team thinks would benefit from further examination by a curator with domain knowledge, we will now reach out to the depositor for clearance to submit the data to the network. We're very excited about this opportunity to enhance our service!
It's that time of year when we like to reflect on all we have done in 2018, and your favorite digital collections and curation services team is no exception. This year, Digital Collections and Curation Services has been focused on getting collections and data into the Digital Repository and making them accessible to the world!
As you will see from the list below, we launched 320 new digital collections, managed substantial additions to 2, and migrated 8. However, these publicly accessible digital collections are just the tip of the iceberg in terms of our work in Digital Collections and Curation Services.
So much more digitization happens behind the scenes than is reflected in the list of new digital collections. Many of our larger projects are years in the making. For example, we continue to digitize Gedney photographs and we hope to make them publicly accessible next year. There is also preservation digitization happening that we cannot make publicly accessible online. This work is essential to our preservation mission, though we cannot share the collections widely in the short term.
We strongly believe in keeping our metadata healthy, so in addition to managing new metadata, we often revisit existing metadata across our repositories in order to ensure its overall quality and functionality.
In 2016, after we launched the first iteration of the Duke Chapel Recordings Digital Collection in the Duke Digital Repository (DDR), we began a collaborative project between Digital Collections and Curation Services, University Archives, and the Duke Divinity School to enhance the metadata. The original metadata was fairly basic and allowed users to identify individual written, audio, and video sermons based on speaker, date, title, and format. All good stuff, but it didn't allow for discovery based on the intellectual content of the sermons themselves. So, it was decided that, at the same time Divinity School staff listened to and corrected machine-generated transcripts for each sermon, they would also capture information that is useful from a homiletic perspective.
At the very beginning of the project, the Divinity School convened two focus groups of preachers from a variety of denominations and backgrounds to ask them how they would like to be able to discover and use a digital collection of sermons. These groups developed a set of terms/categories by which they would like to be able to identify sermons. From there, I worked with the project team to begin thinking about what kinds of fields they would want to capture, and to determine whether or how those fields could map to the existing metadata application profile that we use in the DDR.
It quickly became clear that this project was going to require the creation of new metadata fields in the DDR application. I try to be really judicious about creating new fields (because otherwise, you end up doing this), but in this case, I felt that the need was justified: homiletic metadata is fairly specialized, and given Duke’s commitment to this collecting area, making adjustments to accommodate it seemed more than reasonable. Since I always like to work with best practices, I attempted to identify any extant metadata schemas that might already exist for working with biblical metadata. I felt pretty confident that I would find one, considering that the Bible is actually one of the oldest books out there. While I did find some resources, they were pretty old (think last-updated-in-2006), and all of them were oriented towards marking up actual Biblical texts, rather than the encoding of metadata about those texts.
With no established standards to work with, we set about determining what the fields should be, using the practice of homiletics itself as a guide. We also developed a workflow for capturing this metadata, using a Google spreadsheet with conditional formatting and pre-developed drop-down lists to control and facilitate data entry. And starting from the set of terms/categories developed during the focus groups, we came up with a normalized set of Library of Congress Subject Headings (LCSH) for staff to choose from or add to as needs arose.
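A workflow like this also makes quality control easy to script. As a hypothetical example in Python, once the spreadsheet is exported to CSV, a few lines can confirm that every subject term comes from the normalized list (the file name, column name, and terms here are illustrative):

```python
import csv

# The normalized list would be much longer; these terms (and the
# file and column names) are illustrative only.
APPROVED_SUBJECTS = {"Grace (Theology)", "Hope", "Community"}

with open("chapel_recordings_metadata.csv", newline="") as f:
    # Row 1 is the header, so data rows start at 2.
    for row_num, row in enumerate(csv.DictReader(f), start=2):
        terms = [t.strip() for t in row["subjects"].split(";") if t.strip()]
        for term in terms:
            if term not in APPROVED_SUBJECTS:
                print(f"Row {row_num}: {term!r} is not an approved subject.")
```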
Working with LCSH was in itself a challenge, as it required us to navigate the tension between using a standardized set of headings and including concepts that aren't themselves well represented in the vocabulary. In some cases we diverged from LCSH in the interest of using terms that would be familiar, expected, and recognizable to practitioners of homiletics. One example is the term 'Community', which has a particular meaning in a Biblical context that would be lost had we used the LCSH term 'Communities'.
We rolled out the new metadata properties and values in early August so they could be available for use by attendees at the international homiletics conference, Societas Homiletica, which was held at Duke University August 3-8, 2018. Now, users of the digital collection can facet and browse by: Liturgical Calendar, Biblical Book, Chapter and Verse, and Subject. We’ve also added curated abstracts, and key quotations from the sermons, which are free-text searchable.
The enhanced metadata makes for a much more meaningful experience of the Duke Chapel Recordings, and future plans involve the inclusion of sermon transcripts, as well as the development of a complementary website, maintained by the Duke Divinity School, to provide even more information about the speakers and their sermons. With these enrichments, we are well on our way to having an unparalleled free and open resource for the study of homiletics, and hopefully, in so doing, we will facilitate the discovery and study of preachers whose voices have traditionally been underheard.
While we sometimes talk about “the repository” as if it were a monolith at Duke University Libraries, we have in fact developed and maintained two core platforms that function as repository applications. I’ll describe them briefly, then preview a third that is in development, as well as the rationale behind expanding in this way.