We have been hard at work architecting a robust Repository program for our Duke University community. And while doing this, we’re in the midst of shoring things up architecturally on the back end. You may be asking yourself: Why all the fuss? What’s the big deal?
Well, part of the fuss is that it’s high time to move beyond the idea that our repository is merely a platform. We’d much prefer that our repository be known as a program: a suite of valuable services that serve the needs of our campus community. The repository will always be a platform. In fact, it will be a rock-solid preservation platform: a space to park your valuable digital assets and feel 100% confident that the Libraries will steward those materials for the long haul. But the repository is much more than a platform; it’s a suite of service goodness that we hope to market and promote!
Secondly, it’s because we’ve got some new and exciting developments happening in Repository-land, specifically in the realm of data management. To start with, the Provost graciously granted four new positions to serve the data needs of the University, and those new positions will sit in the Libraries. We have two Senior Research Specialists and two Content Analysts joining our ranks in early January. These positions will be solely dedicated to refining data curation processes, liaising with faculty on data management best practices, assisting researchers with the curation and deposit of research data, and ensuring persistent access to said data. Pretty cool stuff!
So in preparation for this, we’ve had a few things cooking. To begin with, we are re-designing our Duke Digital Repository homepage. We will highlight three service areas:
Duke Scholarship: This area will feature the research, scholarship and activities of Duke faculty members and academic staff. It will also highlight services in support of open access, copyright, digital publishing, and more.
Research Data: This area will be dedicated to the fruits of Duke Scholarship, and will be an area that features research data and data sets. It will highlight services in support of data curation, data management, data deposit, data citation, and more.
Library Collections: This area will focus on digital collections that are owned or stewarded specifically by the Duke University Libraries. This includes digitized special collections, University Archives material, born digital materials, and more.
We’ve focused on defining a base collections policy for each of these areas, and are in the process of refining our service models and shoring up the policies that will drive preservation and digital asset management of these materials.
So now that I’ve got you all worked up about these new developments, you may be asking, ‘When can I know more?!’ You can expect to see and hear more about these developments (and our newly redesigned website) just after the New Year. In fact, you can likely expect another Bitstreams Repository post around that time with more updates on our progress, a preview of our site, and perhaps a profile or two of the new staff joining our efforts!
Until then, stay tuned, press ‘Save’, and call us if you’re looking for a better, more persistent, more authoritative approach to saving the fruits of your digital labor! (Or email us at firstname.lastname@example.org)
I always appreciate the bird’s-eye view of the work I do gained by attending national conferences, and often come away with novel ideas on how to solve old problems and colleagues to reach out to when I encounter new ones. So I was anticipating as much when I left home for the airport last week to attend the 2016 DLF Forum in Milwaukee, Wisconsin.
And that is indeed the experience I had, but in addition to gaining new ideas and new friends, the keynote for the conference challenged me to think deeply about the broader context in which we as librarians and information professionals do our work, who we do that work for, and whether or not we are living up to the values of inclusivity and accessibility that I hold dear.
The keynote speaker was Stacie Williams, a librarian/archivist who talked about the politics of labor in our communities, both specific to libraries and archives and beyond. She posited that all labor is local, and focused on the caregiving work that, so often performed by women and minorities in under- or unpaid positions, is the “beam underpinning this entire system of labor as we know it … and yet it remains the most invisible part of what makes our economy run”. In order to value ourselves and our work, we need to value all of the labor upon which our society is based. And she asked us to think of the work we do as librarians and archivists – the services we provide – as a form of caregiving to our own communities. She also posited that the information work in which we are engaged has followed the late capitalism trend of an anti-care ethos, and implored us to examine our own institutional practices, asking questions such as:
Do we engage in digitization projects where work was performed by at-will workers with no benefits or unpaid interns, or outsourced to prison workers?
Are we physically situated on university campuses that are inaccessible to our local community, either by way of location or prohibitive expense?
Have we undergone extreme cuts to our workforces, hindering our ability to provide services?
Do our hiring practices replicate systems that reward racial/gender class standards?
Do we build positions into grants that don’t pay living wages?
Williams asked us to interrogate the ways in which our labor practices are problematic and to center library work in the care ethics necessary “to reflect the standards of access and equality that we say we hold in this profession”. As a metadata specialist who spends a large chunk of her time working to create description for cultural heritage materials, this statement was especially resonant: “Few things are more liberatory than being able to tell your own story and history and have control and stewardship over your cultural narrative”. This is a tension I am especially aware of – describing resources for discovery and access in ways that honor and reflect the voices and self-identity of original creators or subjects.
The following days were of course filled with interesting and useful panels and presentations, working lunches, and meetups with new and old colleagues. But the keynote, along with the context of the national election, infused the rest of the conference with a spirit of thoughtfulness and openness toward engaging in a deep exploration of our labor practices and relationships to the communities of people we serve. It has given me a lot to think about, and I’m grateful to the DLF Forum planners for bringing this level of discourse to the annual conference.
Last week I traveled to lovely Princeton, NJ to attend Blacklight Summit. For the second year in a row a smallish group of developers who use or work on Project Blacklight met to talk about our work and learn from each other.
Blacklight is an open source project written in Ruby on Rails that serves as a discovery interface over a Lucene Solr search index. It’s commonly used to build library catalogs, but is generally agnostic about the source and type of the data you want to search. It was even used to help reporters explore the leaked Panama Papers.
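Under the hood, a Blacklight app translates each user search into parameters on Solr’s /select endpoint (query, facets, rows, and so on). As a rough illustration, here’s a small Python sketch that builds that kind of request URL using only the standard library; the Solr host and core name here are assumptions for illustration, not our actual configuration:

```python
from urllib.parse import urlencode

def solr_select_url(base_url, q, facet_fields=(), rows=10):
    """Build a Solr /select URL of the kind a Blacklight app issues."""
    params = [("q", q), ("rows", rows), ("wt", "json")]
    for field in facet_fields:
        # Solr accepts facet.field repeated once per requested facet
        params.append(("facet.field", field))
    if facet_fields:
        params.append(("facet", "true"))
    return f"{base_url}/select?{urlencode(params)}"

# Hypothetical core name and facet fields, just to show the shape:
url = solr_select_url("http://localhost:8983/solr/blacklight-core",
                      "photographs",
                      facet_fields=["format", "subject_topic_facet"])
print(url)
```

The real application layers in much more (access controls, per-collection configuration, pagination), but the request/response cycle with Solr is essentially this simple.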
At Duke we’re using Blacklight as the public interface to our digital repository. Metadata about repository objects is indexed in Solr, and we use Blacklight (with a lot of customizations) to provide access to digital collections, including images, audio, and video. Some of the collections include Gary Monroe Photographs, J. Walter Thompson Ford Advertisements, and Duke Chapel Recordings, among many others.
Blacklight has also been selected to replace the aging Endeca-based catalog that provides search across the TRLN libraries. Expect to hear more about this project in the future.
Blacklight Summit is more of an unconference meeting than a conference, with a relatively small number of participants. It’s a great chance to learn and talk about common problems and interests with library developers from other institutions.
I’m going to give a brief overview of some of what we talked about and did during the two-and-a-half-day meeting, and provide links for you to explore more on your own.
First, a representative from each institution gave a roughly five-minute overview of how they’re using Blacklight:
The group participated in a workshop on customizing Blacklight. The organizers paired people based on experience, so the most experienced and least experienced (self-identified) were paired up, and so on. Here’s a link to the GitHub project for the workshop: https://github.com/projectblacklight/blacklight_summit_demo
We got an update on the state of Blacklight 7. Some of the highlights of what’s coming:
Looking through Duke Libraries’ AdViews collection of television commercials, I recently came across the following commercial for Beech-Nut chewing tobacco, circa 1970:
Obviously this was before tobacco advertising was banned from the television airwaves, which took effect on January 2, 1971, as part of the Public Health Cigarette Smoking Act, signed by President Richard Nixon. At first listen, the commercial’s country-tinged jingle and voice-over narration sound like “The Man in Black,” the legendary Johnny Cash. This would not be unusual, as Cash had previously done radio and television promos sponsored by Beech-Nut, and can be seen in other 1970s television commercials, shilling for such clients as Lionel Trains, Amoco and STP. Presumably, Johnny was low on funds at this point in his career, as his music seemed old-fashioned to the younger generation of record-buyers, and his popularity had waned. Appearing in television commercials may have been necessary to balance his checkbook.
However, the Beech-Nut commercial above is mysterious. It sounds like Johnny Cash, but the pitch is slightly off. It’s also odd that Johnny doesn’t visually appear in the ad, like he does in the Lionel, Amoco and STP commercials. Showing his face would have likely yielded higher pay. This raises the question of whether it is in fact Johnny Cash in the Beech-Nut commercial, or someone imitating Johnny’s baritone singing voice and folksy speaking style. Who would be capable of such close imitation? Well, it could be Johnny’s brother, Tommy Cash. Most fans know about Johnny’s older brother, Jack, who died in a tragic accident when Johnny was a child (ironically, the accident was in a sawmill), but Johnny had six siblings, including younger brother Tommy.
Tommy Cash carved out a recording career of his own and had several hit singles in the late ’60s and early ’70s by conveniently co-opting Johnny’s sound and image. One of his biggest hits was “Six White Horses,” in 1969, a commentary on the deaths of JFK, RFK and MLK. Other hits included “One Song Away” and “Rise and Shine.” Johnny and Tommy can be seen performing together in this performance, singing about their father. It turns out Tommy allowed his voice to be used in television commercials for Pepsi, Burger King, and Beech-Nut. So, it’s likely the Beech-Nut commercial is the work of Tommy Cash, rather than his more famous brother. Tommy, now in his 70s, has continued to record as recently as 2008. Tommy’s also a real estate agent, and handled the sale of Johnny Cash’s home in Tennessee after the deaths of Johnny and wife June Carter Cash in 2003.
The Duke University Libraries software development team just recently returned from a week in Boston, MA at a conference called Hydra Connect. We ate good seafood, admired beautiful cobblestones, strolled along the Charles River, and learned a ton about what’s going on in the Hydra-sphere.
At this point you may be scratching your head, exclaiming: Huh?! Hydra? Hydra-sphere? Have no fear, I shall explain!
Our repository, the Duke Digital Repository, is a Hydra/Fedora repository. Hydra and Fedora are names for two prominent open-source communities in repository land. Fedora concerns itself with architecting the back end of a repository: the storage layer. Hydra, on the other hand, refers to a multitude of end-user applications that one can architect on top of a Fedora repository to perform digital asset management. Pretty cool and pretty handy, especially for someone who has no interest in architecting a repository from scratch.
And for a little context re: open source… the idea is that a community of like-minded individuals who care about a particular thing will band together to develop a massively cool software product that meets a defined need, is supported and extended by the community, and whose source code is freely available for anyone to inspect, modify, and/or enhance.
I italicized ‘free’ to emphasize that while the software itself is free, and while the source code is available for download and modification, it does take a certain suite of skills to architect a Hydra/Fedora repository. It’s not currently an out-of-the-box solution, but it is moving in that direction with Hydra-in-a-Box. But I digress…
So. Why might someone be interested in joining an open-source community such as these? Well, for many reasons, some of which might ring true for you:
Resources are thin. Talented developers are hard to find and harder to recruit. Working with an open source community means that 1) you have the source code to get started, 2) you have a community of people that are available (and generally enthusiastic) about being a resource, and 3) working collaboratively makes everything better. No one wants to go it alone.
Governance. If one gets truly involved at the community level, there are often opportunities for contributing thoughts and opinions that can help shape and guide the software product. That’s super important when you want to get invested in a project and ensure that it fully meets your needs. Going it alone is never a good option, and the whole idea of open source is that it’s participatory, collaborative, and engaged.
Give back. Perhaps you have a great idea. A fantastic use case. Perhaps one that could benefit a whole lot of other people and/or institutions. Well then share the love by participating in open-source. Instead of developing a behemoth locally that is not maintainable, contribute ideas or features or a new product back to the community. It benefits others, and it benefits you, by investing the community in the effort of folding features and enhancements back into the core.
Hydra Connect was a fantastic opportunity to mingle with like-minded professionals doing very similar work, all really enthusiastic to share their efforts. They want you to get excited about their work, to see how they are participating in the community, and to see how they are using this variety of open-source software solutions in new and innovative ways.
It’s easy to get bogged down at a local level with the micro details, and to lose the big picture. It was refreshing to step out of the office and get back into the frame of mind that recognizes and empowers the notion that there is a lot of power in participating in healthy communities of practice. There is also a lot of economy in it.
The team came back to Durham full of great ideas and a lot of enthusiasm. It has fueled a lot of fantastic discussion about the future of our repository software ecosystem and how that complements our desire to focus on integration, community-developed goodness, and sustainable practices for software development.
More to come as we turn that thought process into practice!
As the Digital Collections Program Manager at DUL, I spend most of my time managing projects. I really enjoy the work, and I’m always looking for ways to improve my methods, skills, and overall approach. For this reason, I was excited to join forces with a few colleagues to think about how we could help graduate students develop and sharpen their project management skills. We have been meeting since last spring, and our accomplishments include defining key skills, reaching out to grad school departments about available resources and needs, and assembling a list of project management readings and resources that we think are relevant in the academic context (still a work in progress: http://bit.ly/DHProjMgmt); we are also in the process of planning a workshop. But my favorite project has been making project management themed zines.
Yes, you read that correctly: project management zines. You can print them on letter-sized paper, and they are very easy to assemble (check out the demo our friends in Rubenstein put together). But before you download, read on to learn more about the process behind the time management zine.
Gathering Zine Content
Early on in our work the group decided to focus on five key aspects of project management: time management, communicating with others, logging research activities, goal setting, and document or research management. After talking with faculty we decided to focus on time management and document/research management.
I’ve been working with a colleague on time management tips for grad students, so we spent a lot of time combing Lifehacker and GradHacker and found some really good ideas and great resources! Based on our findings, we decided to break the concept of time management down further into smaller areas: planning, prioritizing and monotasking. From there, we made zines (monotasking coming soon)! We are also working on a libguide and some kind of learning module for a workshop.
Here are a few of my favorite new ideas from our time management research:
Monotasking: sometimes focusing on one task for an extended period of time sounds impossible, but my colleague found some really practical approaches for doing one thing at a time, such as the “Pomodoro technique” (http://pomodorotechnique.com/)
Park your work when multitasking: the idea is that before you move from task a to task b, spend a moment noting where you are leaving off on task a, and what you plan to do next when you come back to it.
Prioritization grids: if you don’t know where to begin with the long list of tasks in front of you (something grad students can surely relate to), try plotting them on a priority matrix. The most popular grid for this kind of work that I found is the Eisenhower grid, which has you rank tasks by urgency and importance (https://www.mindtools.com/pages/article/newHTE_91.htm). Then you accomplish your tasks by grid quadrant in a defined order (starting with tasks that are both important and urgent). Although I haven’t tried this, I feel like you could use other variables depending on your context, perhaps impact and effort. I have an example grid on my zine so you can try this method out yourself!
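For the programmatically inclined, the quadrant ordering the grid prescribes is easy to sketch in a few lines of Python (the task list here is made up purely for illustration):

```python
def eisenhower_order(tasks):
    """Order tasks by Eisenhower quadrant. Each task is a tuple of
    (name, urgent, important); the result lists names in the order
    you'd tackle them: important+urgent first, then important,
    then urgent, then everything else."""
    def quadrant(task):
        _, urgent, important = task
        if important and urgent:
            return 0  # do first
        if important:
            return 1  # schedule time for it
        if urgent:
            return 2  # delegate if you can
        return 3      # drop or defer
    return [name for name, *_ in sorted(tasks, key=quadrant)]

tasks = [
    ("answer email", True, False),
    ("write chapter draft", False, True),
    ("grant deadline today", True, True),
    ("reorganize bookmarks", False, False),
]
print(eisenhower_order(tasks))
# → ['grant deadline today', 'write chapter draft', 'answer email', 'reorganize bookmarks']
```

Swapping in impact/effort (or any other pair of booleans) only changes the labels, not the mechanics.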
Use small amounts of time effectively: this is really a mind shift more than a tool or tip, and relates to the Pomodoro technique. Essentially the idea is to stop thinking that you cannot get anything done in those random 15-30 minute windows of downtime we all have between meetings, classes or other engagements. I often feel defeated by 20 minutes of availability and 4 hours of work to do. So I tried really jumping into those small time blocks, and it has been great. Instead of waiting for a longer time slot to work on a “big” task, I’m getting better at carving away at my projects over time. I’ve found that I can really get more done than I thought in 20 minutes. It has been a game changer for me!
Designing the Zines
I was inspired to make zines by my colleague in Rubenstein, who created a researcher how-to zine. The one-page layout makes the idea of designing a zine much less intimidating. Everyone in the ad-hoc project management group adopted the template, and we designed our zines in a variety of tools: Google Drawings, PowerPoint, and Illustrator. We still have a few more to finish, but you can see our work so far online: http://tinyurl.com/pmzines
Each zine prints out to an 8.5 x 11 piece of paper and can easily be cut and folded into its zine form following an easy gif demo.
Introduction to Project Management (you can use this one as a coloring book too!)
Monotasking for Productive Work Blocks
Planning and Prioritizing
Project Manage your Writing
A few weeks ago I attended my second HopScotch Design Fest in downtown Raleigh. Overall the conference was superb: almost every session I attended was interesting, inspiring, and valuable. Compared to last year, the format this time around centered on themed groups of speakers and shorter presentations followed by a panel discussion. I was especially impressed with two of these sessions.
Design for Storytelling
Daniel Horovitz talked about how he’d reached a point in his career where he was tired of doing design work with computers. He decided to challenge himself and create at least one new piece of art every day using analog techniques (collage, drawing, etc). He began sharing his work online, which led to increased exposure and to clients wanting new projects in the new style he’d developed, instead of the computer-based design work he’d spent most of his career on. Continued exploration and growth in his new techniques led to bigger and bigger projects around the world. His talent and body of work are truly impressive, and it’s inspiring to hear that creative ruts can sometimes lead to reinvention (and success!).
Ekene Eijeoma began his talk by inviting us to turn to the person next to us and say three things: I see you, I value you, and I acknowledge you. This deceptively simple interaction was actually quite powerful; it was a really interesting experience. He went on to demonstrate how empathy has driven his work. I was particularly impressed with his interactive installation Wage Islands. It visualizes which parts of New York City are really affordable for the people who live there and allows users to see how things change with increases and decreases to the minimum wage.
Michelle Higa Fox showed us many examples of the amazing work that her design studio has created. She started off talking about the idea of micro-storytelling and the challenges of reaching users on social media channels, where focus is fleeting and pulled in many directions. Here are a couple of really clever examples:
Her studio also builds seriously impressive interactive installations. She showed us a very recent work that involved transparent LCD screens and dioramas housed behind the screens that were hidden and revealed based on the context, while motion graphic content could be overlaid in front. It was amazing. I couldn’t find any images online, but I did find this video of another really cool interactive wall:
One anecdote she shared, which I found particularly useful, is that it’s very important to account for short experiences when designing these kinds of interfaces, as you can’t expect your users to stick around as long as you’d like them to. I think that’s something we can take more into consideration as we build interfaces for the library.
Design for Hacking Yourself
Brooke Belk led us through a short mindfulness exercise (which was very refreshing) and talked about how a meditation practice can really help creativity flow more easily throughout the day. Something I need to try more often! Alexa Clay talked about her concept of the misfit economy. I was amused by her stories of doing role-playing at tech conferences where she dresses as the Amish Futurist and asks deeply challenging questions about the role of technology in the modern world.
But I was mostly impressed with Lulu Miller’s talk. She was formerly a producer at Radiolab, my favorite show on NPR, and now has her own podcast called Invisibilia, which is all to say that she knows how to tell a good story. She shared a poignant tale about the elusive nature of creative pursuits that she called the house and the bicycle. The story intertwined her experience of pursuing a career in fiction writing while attending grad school in Portland and her neighbor’s struggle to stop building custom bicycles and finish building his house. Other themes included the paradox of intention, having faith in yourself and your work, throwing out the blueprint, and putting out what you have right now! All sage advice for creative types. It really was a lovely experience; I hope it gets published in some form soon.
Back in March I wrote a blog post about the Library exploring Multispectral Imaging (MSI) to see if it was feasible to bring this capability to the Library. It seems that all the stars have aligned, all the ducks have been put in order, the t’s crossed and the i’s dotted, because over the past few days/weeks we have been receiving shipments of MSI equipment, scheduling the painting of walls and installation of tile floors, and finalizing equipment installation and training dates (thanks Molly!). A lot of time and energy went into bringing MSI to the Library, and I’m sure I speak for everyone involved along the way when I say WE ARE REALLY EXCITED!
I won’t get too technical but I feel like geeking out on this a little… like I said… I’m excited!
Lights, Cameras and Digital Backs: To maximize the usefulness of this equipment and the space it will consume we will capture both MSI and full color images with (mostly) the same equipment. MSI and full color capture require different light sources, digital backs and software. In order to capture full color images, we will be using the Atom Lighting and copy stand system and a Phase One IQ180 80MP digital back from Digital Transitions. To capture MSI we will be using narrowband multispectral EurekaLight panels with a Phase One IQ260 Achromatic, 60MP digital back. These two setups will use the same camera body, lens and copy stand. The hope is to set the equipment up in a way that we can “easily” switch between the two setups.
The computer that drives the system: Bill Christianson of R. B. Toth Associates has been working with Library IT to build a workstation that will drive both the MSI and full-color systems. We opted for a dual-boot system because the Capture One software that drives the Phase One digital back for capturing full-color images has been more stable in a Mac environment, and MSI capture requires software that only runs on a Windows system. Complicated, but I’m sure they will work out all the technical details.
The Equipment (Geek out):
Phase One IQ260 Achromatic, 60MP Digital Back
Phase One IQ180, 80MP Digital Back
Phase One iXR Camera Body
Phase One 120mm LS Lens
DT Atom Digitization Bench -Motorized Column (received)
DT Photon LED 20″ Light Banks (received)
Narrowband multispectral EurekaLight panels
Fluorescence filters and control
Workstation (in progress)
Blackout curtains and track (received)
The space: We are moving our current Phase One system and the MSI system into the same room. While full-color capture is pretty straightforward in terms of environment (overhead lights off, continuous light source for exposing material, neutral wall color and no windows), the MSI environment requires total darkness during capture. In order to have both systems in the same room we will be using blackout curtains between the two systems so the MSI system will be able to capture in total darkness and the full-color system will be able to use a continuous light source. While the blackout curtains are a significant upgrade, the overall space needs some minor remodeling. We will be upgrading to full spectrum overhead lighting, gray walls and a tile floor to match the existing lab environment.
As shown above… we have begun to receive MSI equipment, installation and training dates have been finalized, the work station is being built and configured as I write this and the room that will house both Phase One systems has been cleared out and is ready for a makeover… It is actually happening!
What a team effort!
I look forward to future blog posts about the discoveries we will make using our new MSI system!
We’re excited to have released nine digitized collections online this week in the Duke Digital Repository (see the list below ). Some are brand new, and the others have been migrated from older platforms. This brings our tally up to 27 digitized collections in the DDR, and 11,705 items. That’s still just a few drops in what’ll eventually be a triumphantly sloshing bucket, but the development and outreach we completed for this batch is noteworthy. It changes the game for our ability to put digital materials online faster going forward.
Let’s have a look at the new features, and review briefly how and why we ended up here.
Collection Portals: No Developers Needed
Before this week, each digital collection in the DDR required a developer to create some configuration files in order to get a nice-looking, made-to-order portal to the collection. These configs set featured items and their layout, a collection thumbnail, custom rules for metadata fields and facets, blog feeds, and more.
It’s helpful to have this kind of flexibility. It can enhance the usability of collections that have distinctive characteristics and unique needs. It gives us a way to show off photos and other digitized images that’d otherwise look underwhelming. But on the other hand, it takes time and coordination that isn’t always warranted for a collection.
We now have an optimized default portal display for any digital collection we add, so we don’t need custom configuration files for everything. A collection portal is not as fancy unconfigured, but it’s similar, and the essential pieces are present. The upshot is: the digital collections team can now take more items through the full workflow quickly, from start to finish, putting collections online without us developers getting in the way.
To better accommodate our manuscript collections, we added more distinction in the interface between different kinds of image items. A digitized archival folder of loose manuscript material now includes some visual cues to reinforce that it’s a folder and not, e.g., a bound album, a single photograph, or a two-page letter.
We completed a fair amount of folder-level digitization in recent years, especially between 2011 and 2014 as part of a collaborative TRLN Large-Scale Digitization IMLS grant project. That initiative allowed us to experiment with shifting gears to get more digitized content online efficiently. We succeeded in that goal; however, those objects unfortunately never became accessible or discoverable outside of their lengthy, text-heavy archival collection guides (finding aids). They also lacked useful features such as zooming, downloading, linking, and syndication to other sites like DPLA. They were digital collections, but you couldn’t find or view them when searching and browsing digital collections.
Many of this week’s newly launched collections are composed of these digitized folders that were previously siloed off in finding aids. Now they’re finally fully integrated for preservation, discovery, and access alongside our other digital collections in the DDR. They remain viewable from within the finding aids and we link between the interfaces to provide proper context.
Keyboard Nav & Rotation
Two things are bound to increase when digitizing manuscripts en masse at the folder level: 1) the number of images present in any given “item” (folder); 2) the chance that something of interest within those pages ends up oriented sideways or upside-down. We’ve improved the UI a bit for these cases by adding full keyboard navigation and rotation options.
Duke Libraries’ digitization objectives are ambitious. Especially given both the quality and quantity of distinctive, world-class collections in the David M. Rubenstein Library, there’s a constant push to: 1) Go Faster, 2) Do More, 3) Integrate Everything, and 4) Make Everything Good. These needs are often impossibly paradoxical. But we won’t stop trying our best. Our team’s accomplishments this week feel like a step in the right direction.
My work involves a lot of problem-solving, and problem-solving often requires learning new skills. It’s one of the things I like most about my job. Over the past year, I’ve spent most of my time helping Duke’s Rubenstein Library implement ArchivesSpace, an open source web application for managing information about archival collections.
As an archivist and metadata librarian by training (translation: not a programmer), I’ve been working mostly on data mapping and migration tasks, but part of my deep-dive into ArchivesSpace has been learning about the ArchivesSpace API, or, really, learning about APIs in general: how they work, and how to take advantage of them. In particular, I’ve been trying to find ways we can use the ArchivesSpace API to work smarter, not harder, as the saying goes.
Why use the ArchivesSpace API?
Quite simply, the ArchivesSpace API lets you do things you can’t do in the staff interface of the application, especially batch operations.
So what is the ArchivesSpace API? In very simple terms, it is a way to interact with the ArchivesSpace backend without using the application interface. To learn more, you should check out this excellent post from the University of Michigan’s Bentley Historical Library: The ArchivesSpace API.
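For the curious, here is roughly what interacting with the backend looks like in practice. This is a hedged sketch rather than official documentation: the backend URL is an assumption (a default install typically serves the API on port 8089, separate from the staff interface), though the login endpoint and the `X-ArchivesSpace-Session` header are part of the documented API.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical default; adjust for your own ArchivesSpace install.
BACKEND = "http://localhost:8089"

def parse_session(body: str) -> str:
    """The login endpoint returns JSON; the 'session' key holds the token."""
    return json.loads(body)["session"]

def login(user: str, password: str) -> str:
    """POST to /users/:user/login and return a session token to send
    in the X-ArchivesSpace-Session header on subsequent requests."""
    url = (f"{BACKEND}/users/{urllib.parse.quote(user)}/login?"
           + urllib.parse.urlencode({"password": password}))
    with urllib.request.urlopen(url, data=b"") as resp:  # empty body forces a POST
        return parse_session(resp.read().decode("utf-8"))

def get_json(path: str, session: str) -> dict:
    """GET any backend endpoint, e.g. /repositories/2/resources/1."""
    req = urllib.request.Request(BACKEND + path,
                                 headers={"X-ArchivesSpace-Session": session})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Once you have a session token, every record in the system (resources, archival objects, digital objects) is available as JSON at a predictable URI.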
Working with the ArchivesSpace API: Other stuff you might need to know
As with any new technology, it’s hard to learn about APIs in isolation. Figuring out how to work with the ArchivesSpace API has introduced me to a suite of other technologies: the Python programming language, data structure standards like JSON, something called cURL, and even GitHub. These are all technologies I’ve wanted to learn at some point, but I’ve always found it difficult to block out time to explore them without having a concrete problem to solve.
Fortunately (I guess?), ArchivesSpace gave me some concrete problems–lots of them. These problems usually surface when a colleague asks me to perform some kind of batch operation in ArchivesSpace (e.g., export a batch of EAD, update a bunch of URLs, or add a note to a batch of records).
Below are examples of some of the requests I’ve received, along with links to scripts and other tools (on GitHub) that I developed to solve these problems using the ArchivesSpace API.
ArchivesSpace API examples:
“Can you re-publish these 12 finding aids again because I fixed some typos?”
I get this request all the time. To publish finding aids at Duke, we export EAD from ArchivesSpace and post it to a webserver where various stylesheets and scripts help render the XML in our public finding aid interface. Exporting EAD from the ArchivesSpace staff interface is fairly labor-intensive. It involves logging into the application, finding the collection record (resource record in ASpace-speak) you want to export, opening the record, making sure the resource record and all of its components are marked “published,” clicking the export button, and then specifying the export options, filename, and file path where you want to save the XML.
In addition to this long list of steps, the ArchivesSpace EAD export service is really slow, with large finding aids often taking 5-10 minutes to export completely. If you need to post several EADs at once, this entire process could take hours: exporting the record, waiting for the export to finish, and then following the steps again. A few weeks after we went into production with ArchivesSpace I found that I was spending WAY TOO MUCH TIME exporting and re-exporting EAD from ArchivesSpace. There had to be a better way…
asEADpublish_and_export_eadid_input.py – A Python script that batch exports EAD from the ArchivesSpace API based on EADID input. Run from the command line, the script prompts for a list of EADID values separated with commas and checks to see if a resource record’s finding aid status is set to ‘published’. If so, it exports the EAD to a specified location using the EADID as the filename. If it’s not set to ‘published,’ the script updates the finding aid status to ‘published’ and then publishes the resource record and all its components. Then, it exports the modified EAD. See comments in the script for more details.
Below is a screenshot of the script in action. It even prints out some useful information to the terminal (filename | collection number | ASpace record URI | last person to modify | last modified date | export confirmation).
“Can you update the URLs for all the digital objects in this collection?”
We’re migrating most of our digitized content to the new Duke Digital Repository (DDR) and in the process our digital objects are getting new (and hopefully more persistent) URIs. To avoid broken links in our finding aids to digital objects stored in the DDR, we need to update several thousand digital object URLs in ArchivesSpace that point to old locations. Changing the URLs one at a time in the ArchivesSpace staff interface would take, you guessed it, WAY TOO MUCH TIME. While there are probably other ways to change the URLs in batch (SQL updates?), I decided the safest way was to, of course, use the ArchivesSpace API.
asUpdateDAOs.py – A Python script that will batch update Digital Object identifiers and file version URIs in ArchivesSpace based on an input CSV file that contains ref_ids for the linked Archival Object records. The input is a five-column CSV file (without column headers) that includes: [old file version use statement], [old file version URI], [new file version URI], [ASpace ref_id], [ark identifier in DDR (e.g. ark:/87924/r34j0b091)].
[WARNING: The script above only works for ArchivesSpace version 1.5.0 and later because it uses the new “find_by_id” endpoint. The script is also highly customized for our environment, but could easily be modified to make other batch changes to digital object records based on CSV input. I’d recommend testing this in a development environment before using it in production].
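The two building blocks of this kind of update are (1) resolving a component’s ref_id to its record URI via the find_by_id endpoint, and (2) rewriting the `file_uri` values inside a digital object’s `file_versions` list before POSTing the record back. The sketch below illustrates both; it is a simplified stand-in for the real script (the find_by_id endpoint and the `file_versions`/`file_uri` fields are part of the ArchivesSpace JSON model, but error handling and the HTTP round-trips are omitted).

```python
import urllib.parse

def find_by_ref_id_path(repo_id: int, ref_id: str) -> str:
    """Build the find_by_id query (ASpace >= 1.5.0) that resolves a
    component's ref_id to its archival object URI."""
    return (f"/repositories/{repo_id}/find_by_id/archival_objects?"
            + urllib.parse.urlencode({"ref_id[]": ref_id}))

def swap_file_version_uri(digital_object: dict,
                          old_uri: str, new_uri: str) -> bool:
    """Replace old_uri with new_uri in a digital object record's
    file_versions list. Returns True if anything changed; the caller
    then POSTs the modified JSON back to the record's own URI."""
    changed = False
    for fv in digital_object.get("file_versions", []):
        if fv.get("file_uri") == old_uri:
            fv["file_uri"] = new_uri
            changed = True
    return changed
```

Modifying the fetched JSON and POSTing it back whole is the usual pattern for updates in the ArchivesSpace API: there is no partial-update call, so you always round-trip the full record.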
“Can you add a note to these 300 records?”
We often need to add a note or some other bit of metadata to a set of resource records or component records in ArchivesSpace. As you’ve probably learned, making these kinds of batch updates isn’t really possible through the ArchivesSpace staff interface, but you can do it using the ArchivesSpace API!
duke_archival_object_metadata_adder.py – A Python script that reads a CSV input file and batch adds ‘repository processing notes’ to archival object records in ArchivesSpace. The input is a simple two-column CSV file (without column headers) where the first column contains the archival object’s ref_id and the second column contains the text of the note you want to add. You could easily modify this script to batch add metadata to other fields.
[WARNING: Script only works in ArchivesSpace version 1.5.0 and higher].
The ArchivesSpace API is a really powerful tool for getting stuff done in ArchivesSpace. Having an open API is one of the real benefits of an open-source tool like ArchivesSpace. The API enables the community of ArchivesSpace users to develop their own solutions to local problems without having to rely on a central developer or development team.
There is already a healthy ecosystem of ArchivesSpace users who have shared their API tips and tricks with the community. I’d like to thank all of them for sharing their expertise, and more importantly, their example scripts and documentation.
Here are more resources for exploring the ArchivesSpace API: