There is a particular fondness that I hold for digital photograph collections. If I had to pinpoint when this began, then I would have to say it started while digitizing material on a simple Epson flatbed scanner as an undergraduate student worker in the archives.
Witnessing the physical become digital is a wonder that never gets old.
Every day we are generating digital content. Pet pics. Food pics. Selfies. Gradually building a collection of experiences as we document our lives in images. Sporadic born digital collections stored on devices and in the cloud.
I do not remember the last time I printed a photograph.
My parents have photo albums that I love. Seeing images of them, then us. The tacky adhesive and the crinkle of thin plastic film as it is pulled back to lift out a photo. That perfect square imprint left behind from where the photo rested on the page.
Pretty sure that Polaroid camera is still around somewhere.
Sometimes I want to pull down my photos from the cloud and just print everything. Make my own album. Have something with heft and weight to share and say, “Hey, hold and look at this.” That sensory experience is invaluable.
Yet, I also value the convenience of being able to view hundreds of photos with the touch of a button.
Duke University Libraries offers access to thousands of images through its Digital Collections.
Here are a couple of photo collections to get you started:
Resonance: the reinforcement or prolongation of sound by reflection from a surface or by the synchronous vibration of a neighboring object
Nearly 4 months have passed since I moved to Durham from my hometown Chicago to join Duke’s Digital Collections & Curation Services team. With feelings of reflection and nostalgia, I have been thinking on the stories and memories that journeys create.
I have always believed a library the perfect place to discover another’s story. Libraries and digital collections are dynamic storytelling channels that connect people through narrative and memory. What are libraries if not places dedicated to memories? Memory made incarnate in the turn of a page, the capturing of an image.
Memory is sensation.
In my mind memory is ethereal – wispy and nebulous. Like trying to grasp mist or fog only to be left with the shimmer of dew on your hands. Until one focuses on a detail, then the vision sharpens. Such as the soothing warmth of a pet’s fur. A trace of familiar perfume in the air as a stranger walks by. Hearing the lilt of an accent from your hometown. That heavy, sticky feeling on a muggy summer day.
Memories are made of moments.
I do not recall the first time I visited a library. However, one day my parents took me to the library and I checked out 11 books on dinosaurs. As a child I was fascinated by them. Due to watching so much of The Land Before Time and Jurassic Park no doubt. One of the books had beautiful full-length pullout diagrams. I remember this.
Experiences tether individuals together across time and place. Place, like the telling of a story, is subjective. It holds a finite precision which is absent in the vagueness and vastness of space. This personal aspect is what captures a person when a tale is well told. A corresponding chord is struck, and the story resounds as listeners see themselves reflected.
When a narrative reaches someone with whom it resonates, its impact can be amplified beyond any expectations.
Last week, it was brought to our attention that Duke Digital Collections recently passed 100,000 individual items found in the Duke Digital Repository! To celebrate, I want to highlight some of the most recent materials digitized and uploaded from our Section A project. In the past, Bitstreams has blogged about what Section A is and what it means, but it’s been a couple of years since that post, and a little refresher couldn’t hurt.
What is Section A?
In 2016, the staff of Rubenstein Research Services proposed a mass digitization project of Section A. This is the umbrella term for 175 boxes of different historic materials that users often request – manuscripts, correspondence, receipts, diaries, drawings, and more. These boxes contain around 3,900 small collections that all have their own workflows. Every box needs consultations from Rubenstein Research Services, review by Library Conservation Department staff, review by Technical Services, metadata updates, and more, all to make sure that the collections can be launched and hosted within the Duke Digital Repository.
In the 2 years since that blog post, so much has happened! The first 2 Section A collections had gone live as a sort of proof-of-concept, and as a way to define what the digitization project would be and what it would look like. We’ve added over 500 more collections from Section A since then. This somehow barely even scratches the surface of the entire project! We’re digitizing the collections in alphabetical order, and even after all the collections that have gone online, we are currently still only on the letter “C”!
Nonetheless, there are already plenty of materials to check out and enjoy. I was a student of history in college, so in this blog post, I want to particularly highlight some of the historic materials from the latter half of the 19th century.
Showing off some of Section A
In 1869, after her work as a nurse in the Civil War, Clara Barton traveled around Europe to Geneva, Switzerland and Corsica, France. Included in the Duke Digital Collections are her diary and calling cards from her time there. These pages detail where she visited and stayed throughout the year. She also wrote about her views on the different European countries, how Americans and Europeans compare, and more. Despite her storied career and her many travels that year, Miss Barton felt that “I have accomplished very little in a year”, and hoped that in 1870, she “may be accounted worthy once more to take my place among the workers of the world, either in my own country or in some other”.
Back in America, around 1900, the Rev. John Malachi Bowden began dictating and documenting his experiences as a Confederate soldier during the Civil War, one of many that a nurse like Miss Barton may have treated. Although Bowden says he was not necessarily a secessionist at the beginning of the Civil War, he joined the 2nd Georgia Regiment in August 1861 after Georgia had seceded. During his time in the regiment, he fought in the Battles of Fredericksburg, Gettysburg, Spotsylvania Court House, and more. In 1864, Union forces captured and held Bowden as a prisoner at Maryland’s Point Lookout Prison, where he describes in great detail what life was like as a POW before his eventual release. He writes that he was “so indignant at being in a Federal prison” that he refused to cut his hair. His hair eventually grew to be shoulder-length, “somewhat like Buffalo Bill’s.”
Speaking of whom, Duke Digital Collections also has some material from Buffalo Bill (William Frederick Cody), courtesy of the Section A initiative. A showman and entertainer who performed in cowboy shows throughout the latter half of the 19th century, Buffalo Bill was enormously popular wherever he went. In this collection, he writes to a Brother Miner about how he invited seventy-five of his “old Brothers” from Bedford, VA to visit him in Roanoke. There is also a brief itinerary of future shows throughout North Carolina and South Carolina. This includes a stop here in Durham, NC a few weeks after Bill wrote this letter.
Around this time, Walter Clark, associate justice of the North Carolina Supreme Court, began writing his own histories of North Carolina throughout the 18th and 19th centuries. Three of Clark’s articles prepared for the University Magazine of the University of North Carolina have been digitized as part of Section A. This includes an article entitled “North Carolina in War”, where he made note of the Generals from North Carolina engaged in every war up to that point. It’s possible that John Malachi Bowden was once on the battlefield alongside some of these generals mentioned in Clark’s writings. This type of synergy in our collection is what makes Section A so exciting to dive into.
As the new Still Image Digitization Specialist at the Duke Digital Production Center, seeing projects like this take off in such a spectacular way is near and dear to my heart. Even just the four collections I’ve highlighted here have been so informative. We still have so many more Section A boxes to digitize and host online. It’s so exciting to think of what we might find and what we’ll digitize for all the world to see. Our work never stops, so remember to stay updated on Duke Digital Collections to see some of these newly digitized collections as they become available.
We’re experimenting with changing our approach to projects in Software Development and Integration Services (SDIS). There’s been much talk of Agile (see the Agile Manifesto) over the past few years within our department, but we’ve faced challenges implementing this as an approach to our work given our broad portfolio, relatively small team, and large number of internal stakeholders.
After some productive conversations among staff and managers in SDIS, where we reflected on our work over the past few years, we decided to commit to applying the Scrum framework to one or more projects.
There are many resources available for learning about Agile and Scrum. The resources I’ve found most useful so far in learning about the framework include:
Scrum seems best suited to developing new products or software and defines the roles, workflow, and artifacts that help a team make the most of its capacity to build the highest value features first and deliver usable software on a regular and frequent schedule.
To start, we’ll be applying this process to a new project to build a prototype of a research data repository based on Hyrax. We’ve formed a small team, including a product owner, scrum master, and development team to build the repository. So far, we’ve developed an initial backlog of requirements in the form of user stories in Jira, the software we use to manage projects. We’ve done some backlog refinement to prioritize the most important and highest value features, and defined acceptance criteria for the ones that we’ll consider first. The development team has estimated the story points (relative estimate of effort and complexity) for some of the user stories to help us with sprint planning and release projection. Our first two-week sprint will begin the week after Thanksgiving. By the end of January we expect to have completed four two-week sprints and have a pilot ready with a basic set of features implemented for evaluation by internal stakeholders.
One of the important aspects of Scrum is that group reflection on the process itself is built into the workflow through retrospective meetings after each sprint. Done right, routine retrospectives serve to reinforce what is working well and allow for adjustments to address things that aren’t. In the future we hope to adapt what we learn from applying the Scrum framework to the research data repository pilot to improve our approach to other aspects of our work in SDIS.
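To make the idea of release projection from story points a little more concrete, here is a minimal sketch in Python. It is not how our team or Jira actually calculates anything; the backlog estimates and the velocity figure are invented for illustration, and the function name project_sprints is hypothetical.

```python
import math


def project_sprints(story_points, velocity):
    """Return how many sprints a backlog needs at a given velocity.

    story_points: list of story-point estimates, one per user story.
    velocity: story points the team is assumed to finish per sprint.
    """
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    # Round up: a partly filled sprint still has to be run.
    return math.ceil(sum(story_points) / velocity)


# Hypothetical estimates for prioritized user stories (not real project data).
backlog = [3, 5, 8, 2, 5, 13, 3, 8]
velocity = 12  # assumed points completed per two-week sprint

sprints = project_sprints(backlog, velocity)
print(f"{sum(backlog)} points at {velocity} per sprint: "
      f"{sprints} sprints, about {sprints * 2} weeks")
```

In practice velocity is only known after a few sprints have been completed, so a projection like this is a rough guide that gets revised as the team learns its pace.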
It’s September, and Duke students aren’t the only folks on campus in back-to-school mode. On the contrary, we here at the Duke Digital Repository are gearing up to begin promoting our research data curation services in real earnest. Over the last eight months, our four new research data staff have been busy getting to know the campus and the libraries, getting to know the repository itself and the tools we’re working with, and establishing a workflow. Now we’re ready to begin actively recruiting research data depositors!
As our colleagues in Data and Visualization Services noted in a presentation just last week, we’re aiming to scale up our data services in a big way by engaging researchers at all stages of the research lifecycle, not just at the very end of a research project. We hope to make this effort a two-front one. Through a series of ongoing workshops and consultations, the Research Data Management Consultants aspire to help researchers develop better data management habits and take the long-term preservation and re-use of their data into account when designing a project or applying for grants. On the back end of things, the Content Analysts will be able to carry out many of the manual tasks that facilitate that long-term preservation and re-use, and are beginning to think about ways in which to tweak our existing software to better accommodate the needs of capital-D Data.
This past spring, the Data Management Consultants carried out a series of workshops intended to help researchers navigate the often muddy waters of data management and data sharing; topics ranged from available and useful tools to the occasionally thorny process of obtaining consent for–and the re-use of–data from human subjects.
Looking forward to the fall, the RDM consultants are planning another series of workshops to expand on the sessions given in the spring, covering new tools and strategies for managing research output. One of the tools we’re most excited to share is the Open Science Framework (OSF) for Institutions, which Duke joined just this spring. OSF is a powerful project management tool that helps promote transparency in research and allows scholars to associate their work and projects with Duke.
On the back-end of things, much work has been done to shore up our existing workflows, and a number of policies–both internal and external–have been met with approval by the Repository Program Committee. The Content Analysts continue to become more familiar with the available repository tools, while weighing in on ways in which we can make the software work better. The better part of the summer was devoted to collecting and analyzing requirements from research data stakeholders (among others), and we hope to put those needs in the development spotlight later this fall.
All of this is to say: we’re ready for it, so bring us your data!
I’m not sure anyone who currently works in the library has any idea when the phrase “Section A” was first coined as a call number for small manuscript collections. Before the library’s renovation, before we barcoded all our books and boxes — back when the Rubenstein was still RBMSCL, and our reading room carpet was a very bright blue — there was a range of boxes holding single-folder manuscript collections, arranged alphabetically by collection creator. And this range was called Section A.
Presumably there used to be a Section B, Section C, and so on — and it could be that the old shelf ranges were tracked this way, I’m not sure — but the only one that has persisted through all our subsequent stacks moves and barcoding projects has been Section A. Today there are about 3900 small collections held in 175 boxes that make up the Section A call number. We continue to add new single-folder collections to this call number, although thanks to the miracle of barcodes in the catalog, we no longer have to shift files to keep things in perfect alphabetical order. The collections themselves have no relationship to one another except that they are all small. Each collection has a distinct provenance, and the range of topics and time periods is enormous — we have everything from the 17th to the 21st century filed in Section A boxes. Small manuscript collections can also contain a variety of formats: correspondence, writings, receipts, diaries or other volumes, accounts, some photographs, drawings, printed ephemera, and so on. The bang-for-your-buck ratio is pretty high in Section A: though small, the collections tend to be well-described, meaning that there are regular reproduction and reference requests. Section A is used so often that in 2016, Rubenstein Research Services staff approached Digital Collections to propose a mass digitization project, re-purposing the existing catalog description into digital collections within our repository. This will allow remote researchers to browse all the collections easily, and also reduce repetitive reproduction requests.
This project has been met with enthusiasm and trepidation from staff since last summer, when we began to develop a cross-departmental plan to appraise, enhance the description of, and digitize the 3900 small manuscript collections that are housed in Section A. It took us a bit of time, partially due to the migration and other pressing IT priorities, but this month we are celebrating a major milestone: we have finally launched our first 2 Section A collections, meant to serve as a proof of concept, as well as a chance for us to firmly define the project’s goals and scope. Check them out: Abolitionist Speech, approximately 1850, and the A. Brouseau and Co. Records, 1864-1866. (Appropriately, we started by digitizing the collections that began with the letter A.)
Why has it been so complicated? First, the sheer number of collections is daunting; while there are plenty of digital collections with huge item counts already in the repository, they tend to come from a single or a few archival collections. Each newly-digitized Section A collection will be a new collection in the repository, which has significant workflow repercussions for the Digital Collections team. There is no unifying thread for Section A collections, so we are not able to apply metadata in batch like we would normally do for outdoor advertising or women’s diaries. Rubenstein Research Services and Library Conservation Department staff have been going box by box through the collections (there are about 25 collections per box) to identify out-of-scope collections (typically reference material, not primary sources), preservation concerns, and copyright concerns. These are excluded from the digitization process. Technical Services staff are also reviewing and editing the Section A collections’ description. This project has led to our enhancing some of our oldest catalog records — updating titles, adding subject or name access, and upgrading the records to RDA, a relatively new standard. Using scripts and batch processes (details on GitHub), the refreshed MARC records are converted to EAD files for each collection, and the digitized folder is linked through ArchivesSpace, our collection management system. We crosswalk the catalog’s name and subject access data to both the finding aid and the repository’s metadata fields, allowing the collection to be discoverable through the Rubenstein finding aid portal, the Duke Libraries catalog, and the Duke Digital Repository.
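For readers curious what a crosswalk looks like in code, here is a minimal sketch in Python. It is not our actual script (those details are on GitHub). The MARC tags and EAD element names are standard, but the mapping table, the repository field names, the function name, and the sample record are all assumptions made up for illustration.

```python
# Hypothetical crosswalk table: MARC tag -> (EAD element, repository field).
# The repository field names are invented for this example.
CROSSWALK = {
    "100": ("persname", "creator"),   # main entry, personal name
    "600": ("persname", "subject"),   # subject added entry, personal name
    "650": ("subject", "subject"),    # subject added entry, topical term
    "651": ("geogname", "spatial"),   # subject added entry, geographic name
}


def crosswalk(marc_fields):
    """Map (tag, value) pairs to finding-aid access points and repository metadata."""
    ead_access = []      # (EAD element, value) pairs for the finding aid
    repo_metadata = {}   # repository field name -> list of values
    for tag, value in marc_fields:
        if tag not in CROSSWALK:
            continue     # tags without a mapping are skipped
        element, field = CROSSWALK[tag]
        ead_access.append((element, value))
        repo_metadata.setdefault(field, []).append(value)
    return ead_access, repo_metadata


# Made-up sample record, not real catalog data.
record = [
    ("100", "Doe, Jane"),
    ("245", "Jane Doe papers"),
    ("650", "Example topic"),
    ("651", "Example place"),
]

ead_access, repo_metadata = crosswalk(record)
print(ead_access)
print(repo_metadata)
```

Keeping the mapping in one table means the same name and subject data feeds both the finding aid and the repository, which is the point of a crosswalk: describe once, publish in several places.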
It has been really exciting to see the first two collections go live, and there are many more already digitized and just waiting in the wings for us to automate some of our linking and publishing processes. Another future development that we expect will speed up the project is a batch ingest feature for collections entering the repository. With over 3000 collections to ingest, we are eager to streamline our processes and make things as efficient as possible. Stay tuned here for more updates on the Section A project, and keep an eye on Digital Collections if you’d like to explore some of these newly-digitized collections.
Besides the research data workflows between our two departments, what other things have the data management consultants and the digital content analysts been doing? In short, we’ve been busy!
In addition to envisioning stakeholder needs (which is an exercise we continuously do), we’ve received and ingested several data collections this year, which has given us an opportunity to also learn from experience. We have been tracking and documenting the types of data we’re receiving, the various needs that these types of data and depositors have, how we approach these needs (including investigating and implementing any additional tools that may help us better address these), how our repository displays the data and associated metadata, and the time spent on our management and curation tasks. Some of these are in the form of spreadsheets, others as draft policies that will first be reviewed by the library’s research data working group and then by a program committee, and others simply as brain dumps for things that require a further, more structured investigation by developers, the metadata architect, subject librarians, and other stakeholders. These documents live in either our shared online folder or our shared Box account, and, if a wider Duke library and/or public audience are required, are moved to our departments’ content collaboration software platforms (currently Confluence/Jira and Basecamp). The collaborative environments of these platforms support the dynamic nature of our work, particularly as our program takes form.
We also value the importance of face-to-face discussions, so we hold weekly meetings to talk through all of this work (we prefer outside when the weather is nice, and because squirrels are awesome).
One of the most exciting, and at times challenging, aspects of where we are is that we are essentially starting from the ground up and therefore able to develop procedures and features (and re-develop, and on and on again) until we find fits that best accommodate our users and their data. We rely heavily on each other’s knowledge about the research data field, and we also engage in periodic environmental scans of other institutions that offer data management and curation services.
When we began in January, we all considered the first 6-9 months as a “pilot phase”, though this description may not be accurate. In the minds of the data management consultants and the digital content analysts, we’re here and ready. Will we run into situations that require an adjustment to our procedures? Absolutely. It’s the nature of our work. Do we want feedback from the Duke community about how our services are (or are not) meeting their needs? Without a doubt. And will the DDR team continue to identify and implement features to better meet end-user needs? Certainly. We fully expect to adjust and readjust our tools and services, with the overall goal of fulfilling future needs before they’re even evident to our users. So, as always, keep watching to see how we grow!
Here at the Duke University Libraries we recently hosted a series of workshops that were part of a larger Research Symposium on campus. It was an opportunity for various campus agencies to talk about all of the evolving and innovative ways that they are planning for and accommodating research data. A few of my colleagues and I were asked to present on the new Research Data program that we’re rolling out in collaboration with the Duke Digital Repository, and we were happy to oblige!
I was asked to speak directly about the various software development initiatives that we have underway with the Duke Digital Repository. Since we’re in the midst of rolling out a brand new program area, we’ve got a lot of things cooking!
When I started planning for the conversation I initially thought I would talk a lot about our Fedora/Hydra stack, and the various inter-related systems that we’re planning to integrate into our repository eco-system. But what resulted from that was a lot of technical terms, and open-source software project names that didn’t mean a whole lot to anyone, especially those not embedded in the work. As a result, I took a step back and decided to focus at a higher level. I wanted to present to our faculty that we were implementing a series of software solutions that would meet their needs for accommodation of their data. This had me revisiting the age-old question: What is our Repository? And for the purposes of this conversation, it boiled down to this:
It is a highly complex, often mind-boggling set of software components, that are wrangled and tamed by a highly talented team with a diversity of skills and experience, all for the purposes of supporting Preservation, Curation, and Access of digital materials.
Those are our tenets or objectives. They are the principles that guide our work. Let’s dig in a bit on each.
Our first objective is Preservation. We want our researchers to feel 100% confident that when they give us their data, we are preserving the integrity, longevity, and persistence of their data.
Our second objective is to support Curation. We aim to do that by providing software solutions that facilitate management and description of file sets, and logical arrangement of complex data sets. This piece is critically important because the data cannot be optimized without solid description and modeling that convey its purpose and intended use and facilitate discovery of the materials for use.
Finally, our work, our software, aims to facilitate discovery & access. We do this by architecting thoughtful solutions that optimize metadata and modeling; we build out features that enhance the consumption and usability of different format types; and we tweak, refine, and optimize our code to enhance performance and user experience.
The repository is a complex beast. It’s a software stack, and an eco-system of components. It’s Fedora. It’s Hydra. It’s a whole lot of other project names that are equally attractive and mystifying. At its core, though, it’s a software initiative, one that seeks to serve up an eco-system of components with optimal functionality that meet the needs and desires of our programmatic stakeholders: our University.
Preservation, Curation, & Access are the heart of it.
Today is an eventful day for the Duke Digital Repository (DDR). Later today, I and several of my colleagues will present on the DDR at Day 1 of the Duke Research Computing Symposium. We’ll be introducing new staff who’ll focus on managing, curating, and preserving research data, as well as the role that the DDR will play as both a service and a platform. This event serves as a soft launch of our plans – which I wrote about last September – to support the work of researchers at Duke.
At the same time, the DDR gets a new look, at least on its home page. For years, we’ve used a rather drab and uninformative page that was essentially the out-of-the-box rendering by Blacklight, our discovery and access layer in the repository stack. Last fall, our DDR Program Committee took up the task of revamping that page to reflect how we conceptualize the repository and its major program areas.
The page design will evolve with the DDR itself, but it went live earlier today. More information about the DDR initiative and our plans will follow in the coming months.
Notes from the Duke University Libraries Digital Projects Team