Let's take a brief tour to surface some of these changes!
Policy and Procedure Development
The recently formed Digital Preservation Advisory Group has been working on policy and procedure to bring DDR into compliance with the ISO 16363 Audit and Certification of Trustworthy Digital Repositories Minimum Criteria. We've been working on diverse policy areas: defining how embargoes may be set; how often fixity must be checked and reported to stakeholders; in what situations content may be removed and who must be involved in that decision; and what conditions necessitate a 'tombstone' to explain the removal of an object. Some of these policies are internal and some have already been made publicly available. For example, see our Deaccession Policy and our Preservation Policy. We've made great progress due to the fantastic example set by our friends at Purdue University Research Repository and others.
Durham, North Carolina, is a lovely city: close to the mountains and the beach, and full of fantastic restaurants! Sometimes, though, your digital assets just need to get away from it all. Digital preservation demands some geographic diversity. No repository wants all of its data to be subject to a hurricane, of course! That's why we've partnered with DuraCloud, a preservation-focused cloud provider, to store copies of our digital assets in geographically diverse locations. Our data now enjoys homes at Duke, at DuraCloud, and in Amazon Glacier!
To bring transparency to the remote replication of our assets and the validation of both local and remote copies, we've recently implemented a process that externalizes these tasks from Fedora and delivers scheduled reports to stakeholders detailing the health of their assets.
Research and Development
The DDR has grown tremendously in the last year, and with it has grown the need to standardize and scale to demand. Writing Python to arrange files to conform to our Standard Ingest Format was a perfectly reasonable solution in early 2016. Likewise, programmatic reformatting of endangered file formats wasn't feasible with the resources available at the time, and we didn't need to worry about traffic scaling back then. Times have changed!
DDR staff are exploring tools that let non-developers easily ingest large amounts of material and methods for identifying and migrating files to better-supported formats. We are also planning for a more sustainable and durable architecture, including increased inter-application messaging that allows us to move processes that have been handled within the repository out to external servers.
In past posts, I’ve paid homage to the audio ancestors with riffs on such endangered–some might say extinct–formats as DAT and Minidisc. This week we turn our attention to the smallest (and perhaps the cutest) tape format of them all: the Microcassette.
Introduced by the Olympus Corporation in 1969, the Microcassette used the same width tape (3.81 mm) as the more common Philips Compact Cassette but housed it in a much smaller and less robust plastic shell. The Microcassette also spooled from right to left (opposite from the compact cassette) and used slower recording speeds of 2.4 and 1.2 cm/s. The speed adjustment, allowing for longer uninterrupted recording times, could be toggled on the recorder itself. For instance, the original MC60 Microcassette allowed for 30 minutes of recorded content per "side" at standard speed and 60 minutes per side at low speed.
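For a rough sense of how those figures relate, here is a quick back-of-the-envelope calculation in Python based only on the speeds and durations above (the implied tape length is an inference from those numbers, not an official Olympus spec):

```python
# Figures from the text above; the tape length is inferred, not an official spec.
standard_speed_cm_s = 2.4        # standard recording speed
low_speed_cm_s = 1.2             # half-speed setting
minutes_per_side_standard = 30   # MC60 at standard speed

# Tape length implied by 30 minutes of travel at 2.4 cm/s
tape_length_cm = standard_speed_cm_s * minutes_per_side_standard * 60
print(f"Implied tape length: {tape_length_cm / 100:.1f} m")            # ~43.2 m

# The same tape at half speed runs twice as long per side
minutes_per_side_low = tape_length_cm / low_speed_cm_s / 60
print(f"Minutes per side at low speed: {minutes_per_side_low:.0f}")    # 60
```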
The microcassette was mostly used for recording voice–e.g. lectures, interviews, and memos. The thin tape (prone to stretching) and slow recording speeds made for a low-fidelity result that was perfectly adequate for the aforementioned applications, but not up to the task of capturing the wide dynamic and frequency range of music. As a result, the microcassette was the go-to format for cheap, portable, hand-held recording in the days before the smartphone and digital recording. It was standard to see a cluster of these around the lectern in a college classroom as late as the mid-1990s. Many of the recorders featured voice-activated recording (to prevent capturing “dead air”) and continuously variable playback speed to make transcription easier.
The tiny tapes were also commonly used in telephone answering machines and dictation machines.
As you may have guessed, the rise of digital recording, handheld devices, and cheap data storage quickly relegated the microcassette to a museum piece by the early 21st century. While the compact cassette has enjoyed a resurgence as a hip medium for underground music, the poor audio quality and durability of the microcassette have largely doomed it to oblivion except among the most willful obscurantists. Still, many Rubenstein Library collections contain these little guys as carriers of valuable primary source material. That means we’re holding onto our Microcassette player for the long haul in all of its atavistic glory.
“There is nothing wrong with your television set. Do not attempt to adjust the picture. We are controlling transmission. We will control the horizontal. We will control the vertical. We repeat: there is nothing wrong with your television set.”
That was part of the cold open of one of the best science fiction shows of the 1960s, "The Outer Limits." The implication was that by controlling everything you saw and heard for the next hour, the show's producers were about to blow your mind and take you to the outer limits of human thought and fantasy, which the show often did.
In regards to controlling the horizontal and the vertical, one of the more mysterious parts of my job is dealing with aspect ratios when digitizing videotape. The aspect ratio of any shape is the proportion of its dimensions. For example, the aspect ratio of a square is always 1 : 1 (width : height). That means, in any square, the width is always equal to the height, regardless of whether a square is 1 inch wide or 10 feet wide. Traditionally, television sets displayed images in a 4 : 3 ratio. So, if you owned a 20" CRT (cathode ray tube) TV back in the olden days, like say 1980, the broadcast image on the screen was 16" wide by 12" high. So, the height was 3/4 the size of the width, or 4 : 3. The 20" dimension was determined by measuring the rectangle diagonally, and was mainly used to categorize and advertise the TV.
Almost all standard-definition analog videotapes, like U-matic, Beta and VHS, have a 4 : 3 aspect ratio. But when digitizing the content, things get more complicated. Analog video monitors display pixels that are tall and thin in shape. The height of these pixels is greater than their width, whereas modern computer displays use pixels that are square in shape. On an analog video monitor, NTSC video displays at roughly 720 (tall and skinny) pixels per horizontal line, and there are 486 visible horizontal lines. If you do the math, 720 x 486 is not 4 : 3. But because the analog pixels are tall and thin, it takes more of them across each line to fill the width of a 4 : 3 video monitor frame.
When Duke Libraries digitizes analog video, we create a master file that is 720 x 486 pixels, so that if someone from the broadcast television world later wants to use the file, it will be native to that traditional standard-definition broadcast specification. However, in order to display the digitized video on Duke’s website, we make a new file, called a derivative, with the dimensions changed to 640 x 480 pixels, because it will ultimately be viewed on computer monitors, laptops and smart phones, which use square pixels. Because the pixels are square, 640 x 480 is mathematically a 4 : 3 aspect ratio, and the video will display properly. The derivative video file is also compressed, so that it will stream smoothly regardless of internet bandwidth limits.
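As a quick illustration of the arithmetic, here is a small sketch based only on the dimensions mentioned above; the 0.9 pixel aspect ratio is derived here rather than stated in the post:

```python
from fractions import Fraction

target_dar = Fraction(4, 3)                   # desired display aspect ratio

# Broadcast master: 720 x 486 with non-square ("tall and skinny") pixels
storage_ratio = Fraction(720, 486)            # ~1.48, clearly not 4:3
pixel_aspect = target_dar / storage_ratio     # width/height each pixel needs for a 4:3 display

# Web derivative: square pixels, so the frame itself must be 4:3
derivative_ratio = Fraction(640, 480)

print(f"720 x 486 storage ratio: {float(storage_ratio):.3f}")
print(f"Pixel aspect ratio implied for 4:3 display: {float(pixel_aspect):.2f}")   # 0.90
print(f"640 x 480 equals 4:3 with square pixels: {derivative_ratio == target_dar}")
```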
“We now return control of your television set to you. Until next week at the same time, when the control voice will take you to – The Outer Limits.”
A little more than a year ago, I wrote about the proposed update to the 508 accessibility standards. And about three weeks ago, the US Access Board published the final rule that contains updates to the 508 accessibility requirements for Information and Communication Technology (ICT). The rules had not been updated since 2001 and as such had greatly lagged behind modern web conventions.
It’s important to note that the 508 guidelines are intended to serve as a vehicle for guiding procurement, while at the same time applying to content created by a given group/agency. As such, the language isn’t always straightforward.
As I outlined in my previous post, a major purpose of the new rule is to move away from regulating types of devices and instead focus on functionality:
… one of the primary purposes of the final rule is to replace the current product-based approach with requirements based on functionality, and, thereby, ensure that accessibility for people with disabilities keeps pace with advances in ICT.
To that effect, one of the biggest changes from the old standard is the adoption of WCAG 2.0 as the compliance level. The fundamental premise of WCAG compliance is that content is 'perceivable, operable, and understandable'; the bottom line is that, as developers, we should strive to make sure all of our content is usable for everyone across all devices. The adoption of WCAG allows the board to offload responsibility for making incremental changes as technology advances (so we don't have to wait another 15 years for updates) and also aligns our standards in the United States with those used around the world.
Harmonization with international standards and guidelines creates a larger marketplace for accessibility solutions, thereby attracting more offerings and increasing the likelihood of commercial availability of accessible ICT options.
Another change has to do with making a wider variety of electronic content accessible, including internal documents. It will be interesting to see to what degree this part of the rule is followed by non-federal agencies.
The Revised 508 Standards specify that all types of public-facing content, as well as nine categories of non-public-facing content that communicate agency official business, have to be accessible, with “content” encompassing all forms of electronic information and data. The existing standards require Federal agencies to make electronic information and data accessible, but do not delineate clearly the scope of covered information and data. As a result, document accessibility has been inconsistent across Federal agencies. By focusing on public-facing content and certain types of agency official communications that are not public facing, the revised requirements bring needed clarity to the scope of electronic content covered by the 508 Standards and, thereby, help Federal agencies make electronic content accessible more consistently.
The new rules do not go into effect until January 2018. There’s also a ‘safe harbor’ clause that protects content that was created before this enforcement date, assuming it was in compliance with the old rules. However, if you update that content after January, you’ll need to make sure it complies with the new final rule.
Existing ICT, including content, that meets the original 508 Standards does not have to be upgraded to meet the refreshed standards unless it is altered. This “safe harbor” clause (E202.2) applies to any component or portion of ICT that complies with the existing 508 Standards and is not altered. Any component or portion of existing, compliant ICT that is altered after the compliance date (January 18, 2018) must conform to the updated 508 Standards.
So long story short, a year from now you should make sure all the content you’re creating meets the new compliance level.
If you’ve visited the Duke University Libraries website in the past month, you may have noticed that it looks a bit more polished than it used to. Over the course of the fall 2016 semester, my talented colleague Michael Daul and I co-led a project to develop and implement a new theme for the site. We flipped the switch to launch the theme on January 6, 2017, the week before spring classes began. In this post, I’ll share some background on the project and its process, and highlight some noteworthy features of the new theme we put in place.
We kicked off the project in August 2016 under the title "Website Refresh" (hat-tip to our friends at NC State Libraries for coining that term). The best way to frame it was not as a "redesign," but as more of a 50,000-mile maintenance tune-up for the site. We had four main goals:
Extend the Life of our current site (in Drupal 7) without a major redesign or redevelopment effort
Refresh the Look of the site to be modern but not drastically different
Better Code by streamlining HTML markup & CSS style code for easier management & flexibility
Enhance Accessibility via improved compliance with WCAG accessibility guidelines
Our site is fairly large and complex (1,200+ pages, for starters). So to keep the scope lean, we included no changes in content, information architecture, or platform (i.e., stayed on Drupal 7). We also worked with a lean stakeholder team to make decisions related to aesthetics.
Extending the Life of the Site
Our old website theme was aging; the project leading to its development began five years ago in Sep 2012, was announced in Jan 2013, and eventually launched about three years ago in Jan 2014. Five years–and even three–is a long time in web years. Sites accumulate a lot of code cruft over time, and the tools for managing and writing code quickly become deprecated. We wanted to invest a little time now to replace some pieces of the site's front-end architecture with newer and better components, in order to buy more time before we'd have to do an expensive full-scale overhaul from the ground up.
Refreshing the Look
Our 2014 site derived a lot of its aesthetic from the main Duke.edu website at the time. Duke's site has changed significantly since then, and meanwhile, web design trends have changed dramatically: flat design is in, skeuomorphism out. Google Web Fonts are in; Times, Arial, Verdana and company are out. Even a three-year-old site on the web can look quite dated.
Beyond evolving aesthetics, the various behind-the-scenes web frameworks and code workflows are in constant, rapid flux; it can really keep a developer’s head on a swivel. Better code means easier maintenance, and to that end our code got a lot better after implementing these solutions:
Bootstrap Upgrade. For our site's HTML/CSS/JS framework, we moved from Bootstrap version 2 (2.3.1) to version 3 (3.3.7). This took weeks of work: it meant thousands of pages of markup revisions, only some of which could be done with a global Search & Replace (see the sketch below).
Sass for CSS. We trashed all of our old theme’s CSS files and started over using Sass, a far more efficient way to express and maintain style rules than vanilla CSS.
Gulp for Automation. Our new theme uses Gulp to automate code tasks like processing Sass into CSS, auto-prefixing style declarations to work on older browsers, and crunching 30+ css files down into one.
Font Awesome. We ditched most of our older image-based icons in favor of Font Awesome ones, which are far easier to reference and style, and faster to load.
Radix. This was an incredibly useful base theme for Drupal that encapsulates/integrates Sass, Gulp, Bootstrap, and FontAwesome. It also helped us get a Bootswatch starter theme in the mix to minimize the local styling we had to do on top of Bootstrap.
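To give a flavor of the Search & Replace portion of the Bootstrap upgrade mentioned above, here is a hypothetical sketch of the kind of scripted rename that handles the mechanical cases; the 'templates' path and the spanN-to-col-md-N rule are illustrative only, and plenty of markup changes still require hand-editing:

```python
import re
from pathlib import Path

# Bootstrap 3 renamed the version 2 grid classes "spanN" to "col-md-N".
# A scripted pass covers renames like this; structural changes still need a human.
GRID_CLASS = re.compile(r'\bspan(\d{1,2})\b')

def upgrade_grid_classes(markup: str) -> str:
    """Rewrite Bootstrap 2 grid classes to their Bootstrap 3 equivalents."""
    return GRID_CLASS.sub(r'col-md-\1', markup)

for path in Path('templates').rglob('*.html'):   # 'templates' is a placeholder path
    original = path.read_text()
    updated = upgrade_grid_classes(original)
    if updated != original:
        path.write_text(updated)
        print(f"updated {path}")
```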
We named our new theme Dulcet and put it up on GitHub.
Some of the code and typography revisions we've made in the "refresh" improve our site's compliance with WCAG 2.0 accessibility guidelines. We're actively working on further assessment and development in this area. Our new theme is better suited to integrate with existing tools, e.g., to automatically add ARIA attributes to interactive page elements.
Feedback or Questions?
We would love to hear from you if you have any feedback on our new site, if you spot any oddities, or if you’re considering doing a similar project and have any questions. We encourage you to explore the site, and hope you find it a refreshing experience.
I am sure you have all been following the Library’s exploration into Multispectral Imaging (MSI) here on Bitstreams, Preservation Underground and the News & Observer. Previous posts have detailed our collaboration with R.B. Toth Associates and the Duke Eye Center, the basic process and equipment, and the wide range of departments that could benefit from MSI. In early December of last year (that sounds like it was so long ago!), we finished readying the room for MSI capture, installed the equipment, and went to MSI boot camp.
Well, boot camp came to us. Meghan Wilson, an independent contractor who has worked with R.B. Toth Associates for many years, started our training with an overview of the equipment and the basic science behind it. She covered the different lighting schemes and when they should be used. She explained MSI applications for identifying resins, adhesives and pigments and how to use UV lighting and filters to expose obscured text. We quickly went from talking to doing. As with any training session worth its salt, things went awry right off the bat (not Meghan's fault). We had powered up the equipment, but the camera would not communicate with the software and the lights would not fire when the shutter was triggered. This was actually a good experience because we had to troubleshoot on the spot and figure out what was going on together as a team. It turns out that there are six different pieces of equipment that have to be powered up in a specific sequence in order for the system to communicate properly (tee up Apollo 13 soundtrack). Once we got the system up and running we took turns driving the software and hardware to capture a number of items that we had pre-selected. This is an involved process that produces a large number of files, which are eventually combined into an image stack that can be manipulated using specialized software. When all is said and done, the files have been converted, cleaned, flattened, and manipulated, and the resulting variations number somewhere in the neighborhood of 300 files. Whoa!
This is not your parents' point-and-shoot—not the room, the lights, the curtains, the hardware, the software, the price tag, none of it. But it is different in another, more important way too. This process is team-driven and interdisciplinary. Our R&D working group is diverse and includes representatives from the following library departments.
The Digital Production Center (DPC) has expertise in high-end, full-spectrum imaging for cultural heritage institutions, along with deep knowledge of the camera and lighting systems involved in MSI and of the storage, naming, and management of large sets of files with complex relationships.
The Rubenstein Library’s Collection Development brings a deep understanding of the collections, provenance and history of materials, and valuable contacts with researchers near and far.
To get the most out of MSI we need all of those skills and perspectives. What MSI really offers is the ability to ask—and we hope answer—strings of good questions. Is there ink beneath that paste-down or paint? Is this a palimpsest? What text is obscured by that stain or fire-damage or water damage? Can we recover it without having to intervene physically? What does the ‘invisible’ text say and what if anything does this tell us about the object’s history? Is the reflectance signature of the ink compatible with the proposed date or provenance of the object? That’s just for starters. But you can see how even framing the right question requires a range of perspectives; we have to understand what kinds of properties MSI is likely to illuminate, what kinds of questions the material objects themselves suggest or demand, what the historical and scholarly stakes are, what the wider implications for our and others’ collections are, and how best to facilitate human interface with the data that we collect. No single person on the team commands all of this.
Working in any large group can be a challenge. But when it all comes together, it is worth it. Below are two renderings of a page from Jantz 723: one processed as a black-and-white image and the other a Principal Component Analysis produced from the MSI capture and processed using ImageJ and a set of tools created by Bill Christens-Barry of R.B. Toth Associates, with false color applied using Photoshop. Using MSI we were able to better reveal this watermark, which had previously been obscured.
I think we feel like 16-year-old kids with newly minted drivers’ licenses who have never driven a car on the highway or out of town. A whole new world has just opened up to us, and we are really excited and a little apprehensive!
Practice, experiment, document, refine. Over the next 12 (16? 18?) months we will work together to hone our collective skills: driving the system, deepening our understanding of the scholarly, conservation, and curatorial use cases for the technology, optimizing workflow, documenting best practices, and getting a firm grip on the scale, pace, and cost of what we can do. The team will assemble monthly, practice what we have learned, and lean on each other's expertise to develop a solid workflow that includes the right expertise at the right time. We will select a wide variety of materials so that we can develop a feel for how far we can push the system and what we can expect day to day. During all of this practice, workflows, guidelines, policies and expectations will come into sharper focus.
As you can tell from the above, we are going to learn a lot over the coming months. We plan to share what we learn via regular posts here and elsewhere. Although we are not prepared yet to offer MSI as a standard library service, we are interested to hear your suggestions for Duke Library collection items that may benefit from MSI imaging. We have a long queue of items that we would like to shoot, and are excited to add more research questions, use cases, and new opportunities to push our skills forward. To suggest materials, contact Molly Bragg, Digital Collections Program Manager (molly.bragg at Duke.edu); Joshua Sosin, Associate Professor in Classical Studies & History (jds15 at Duke.edu); or Andrew Armacost, Curator of Collections (andrew.armacost at Duke.edu).
Noise is an inescapable part of our sonic environment. As I sit at my quiet library desk writing this, I can hear the undercurrent of the building's pipes and HVAC systems, the click-clack of the Scribe overhead book scanner, footsteps from the floor above, doors opening and closing in the hallway, and the various rustlings of my own fidgeting. In our daily lives, our brains tune out much of this extraneous noise to help us focus on the task at hand and be alert to sounds conveying immediately useful information: a colleague's voice, a cell-phone buzz, a fire alarm.
When sound is recorded electronically, however, this tuned-out noise is often pushed to the foreground. This may be due to the recording conditions (e.g. a field recording done on budget equipment in someone’s home or outdoors) or inherent in the recording technology itself (electrical interference, mechanical surface noise). Noise is always present in the audio materials we digitize and archive, many of which are interviews, oral histories, and events recorded to cassette or open reel tape by amateurs in the field. Our first goal is to make the cleanest and most direct analog-to-digital transfer possible, and then save this as our archival master .wav file with no alterations. Once this is accomplished, we have some leeway to work with the digital audio and try to create a more easily listenable and intelligible access copy.
I recently started experimenting with Steinberg WaveLab software to clean up digitized recordings from the Larry Rubin Papers. This collection contains some amazing documentation of Rubin’s work as a civil rights organizer in the 1960s, but the ever-present hum & hiss often threaten to obscure the content. I worked with two plug-ins in WaveLab to try to mitigate the noise while leaving the bulk of the audio information intact.
Even if you don't know it by name, anyone who has used electronic audio equipment has probably heard the dreaded 60 Cycle Hum. This is a fixed low-frequency tone that comes from the 60 Hz alternating current of the main electric power grid (120 volts AC) in the United States. Due to improper grounding and electromagnetic interference from nearby wires and appliances, this current can leak into our audio signals and appear as the ubiquitous 60 Hz hum (disclaimer: you may not be able to hear this as well on tiny laptop speakers or earbuds). WaveLab's De-Buzzer plug-in allowed me to isolate this troublesome frequency and reduce its volume level drastically in relation to the interview material. Starting from a recommended preset, I adjusted the sensitivity of the noise reduction by ear to cut unwanted hum without introducing any obvious digital artifacts in the sound.
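For readers who want to experiment outside of WaveLab, here is a minimal sketch of the same basic idea using a notch filter in Python with SciPy. This is not the De-Buzzer plug-in; the filename and settings are placeholders, and real-world hum often has harmonics at 120 and 180 Hz that would need their own notches:

```python
import numpy as np
from scipy import signal
from scipy.io import wavfile

# Placeholder filename; apply a narrow notch filter centered on the 60 Hz mains frequency.
rate, audio = wavfile.read('interview.wav')
audio = audio.astype(np.float64)

hum_hz = 60.0      # mains frequency in North America
quality = 30.0     # higher Q = narrower notch, so less effect on nearby speech energy

b, a = signal.iirnotch(hum_hz, quality, fs=rate)
cleaned = signal.filtfilt(b, a, audio, axis=0)   # zero-phase filtering preserves timing

# Clip back into 16-bit range before writing the access copy
cleaned = np.clip(cleaned, -32768, 32767).astype(np.int16)
wavfile.write('interview_dehummed.wav', rate, cleaned)
```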
Similarly omnipresent in analog audio is High-Frequency Hiss. This wash of noise is native to any electrical system (see Noise Floor) and is especially problematic in tape-based media where the contact of the recording and playback heads against the tape introduces another level of “surface noise.” I used the De-Noiser plug-in to reduce hiss while being careful not to cut into the high-frequency content too much. Applying this effect too heavily could make the voices in the recording sound dull and muddy, which would be counterproductive to improving overall intelligibility.
Listen to the before & after audio snippets below. While the audio is still far from perfect due to the original recording conditions, conservative application of the noise reduction tools has significantly cleaned up the sound. It’s possible to cut the noise even further with more aggressive use of the effects, but I felt that would do more harm than good to the overall sound quality.
I was fairly pleased with these results and plan to keep working with these and other software tools in the future to create digital audio files that meet the needs of archivists and researchers. We can’t eliminate all of the noise from our media-saturated lives, but we can always keep striving to keep the signal-to-noise ratio at manageable and healthy levels.
A few weeks ago I attended my second HopScotch Design Fest in downtown Raleigh. Overall the conference was superb: almost every session I attended was interesting, inspiring, and valuable. Compared to last year, the format this time was centered on themed groups of speakers and shorter presentations followed by a panel discussion. I was especially impressed with two of these sessions.
Design for Storytelling
Daniel Horovitz talked about how he'd reached a point in his career where he was tired of doing design work with computers. He decided to challenge himself and create at least one new piece of art every day using analog techniques (collage, drawing, etc.). He began sharing his work online, which led to increased exposure and a desire from clients for new projects in the style he'd developed, instead of the computer-based design work he'd spent most of his career on. Continued exploration and growth in his new techniques led to bigger and bigger projects around the world. His talent and body of work are truly impressive, and it's inspiring to hear that creative ruts can sometimes lead to reinvention (and success!).
Ekene Eijeoma began his talk by inviting us to turn to the person next to us and say three things: I see you, I value you, and I acknowledge you. This brief, simple interaction was actually quite powerful and a really interesting experience. He went on to demonstrate how empathy has driven his work. I was particularly impressed with his interactive installation Wage Islands. It visualizes which parts of New York City are really affordable for the people who live there and allows users to see how things change with increases and decreases to the minimum wage.
Michelle Higa Fox showed us many examples of the amazing work that her design studio has created. She started off talking about the idea of micro-storytelling and the challenges of reaching users on social media channels, where focus is fleeting and pulled in many directions. Here are a couple of really clever examples:
Her studio also builds seriously impressive interactive installations. She showed us a very recent piece that involved transparent LCD screens with dioramas housed behind them, hidden and revealed depending on the context, while motion-graphic content could be overlaid in front. It was amazing. I couldn't find any images online, but I did find this video of another really cool interactive wall:
One anecdote she shared, which I found particularly useful, is that it’s very important to account for short experiences when designing these kinds of interfaces, as you can’t expect your users to stick around as long as you’d like them to. I think that’s something we can take more into consideration as we build interfaces for the library.
Design for Hacking Yourself
Brooke Belk led us through a short mindfulness exercise (which was very refreshing) and talked about how practicing meditation can really help creativity flow more easily throughout the day. Something I need to try more often! Alexa Clay talked about her concept of the misfit economy. I was amused by her stories of doing role-playing at tech conferences, where she dresses as the Amish Futurist and asks deeply challenging questions about the role of technology in the modern world.
But I was most impressed with Lulu Miller's talk. She was formerly a producer at Radiolab, my favorite show on NPR, and now has her own podcast called Invisibilia, which is all to say that she knows how to tell a good story. She shared a poignant tale she called "the house and the bicycle," about the elusive nature of creative pursuits. The story intertwined her experience of pursuing a career in fiction writing while attending grad school in Portland and her neighbor's struggle to stop building custom bicycles and finish building his house. Other themes included the paradox of intention, having faith in yourself and your work, throwing out the blueprint, and putting out what you have right now! All sage advice for creative types. It really was a lovely experience, and I hope it gets published in some form soon.
Back in March I wrote a blog post about the Library exploring Multispectral Imaging (MSI) to see if it was feasible to bring this capability to the Library. It seems that all the stars have aligned, all the ducks have been put in order, and the t's have been crossed and the i's dotted: over the past few days and weeks we have been receiving shipments of MSI equipment, scheduling the painting of walls and the installation of tile floors, and finalizing equipment installation and training dates (thanks Molly!). A lot of time and energy went into bringing MSI to the Library, and I'm sure I speak for everyone involved along the way when I say WE ARE REALLY EXCITED!
I won’t get too technical but I feel like geeking out on this a little… like I said… I’m excited!
Lights, Cameras and Digital Backs: To maximize the usefulness of this equipment and the space it will consume, we will capture both MSI and full-color images with (mostly) the same equipment. MSI and full-color capture require different light sources, digital backs and software. In order to capture full-color images, we will be using the Atom Lighting and copy stand system and a Phase One IQ180 80MP digital back from Digital Transitions. To capture MSI we will be using narrowband multispectral EurekaLight panels with a Phase One IQ260 Achromatic, 60MP digital back. These two setups will use the same camera body, lens and copy stand. The hope is to set the equipment up in a way that lets us "easily" switch between the two setups.
The computer that drives the system: Bill Christianson of R. B. Toth Associates has been working with Library IT to build a work station that will drive both the MSI and full color systems. We opted for a dual boot system because the Capture One software that drives the Phase One digital back for capturing full-color images has been more stable in a Mac environment and MSI capture requires software that only runs on a Windows system. Complicated, but I’m sure they will work out all the technical details.
The Equipment (Geek out):
Phase One IQ260 Achromatic, 60MP Digital Back
Phase One IQ180, 80MP Digital Back
Phase One iXR Camera Body
Phase One 120mm LS Lens
DT Atom Digitization Bench -Motorized Column (received)
DT Photon LED 20″ Light Banks (received)
Narrowband multispectral EurekaLight panels
Fluorescence filters and control
Workstation (in progress)
Blackout curtains and track (received)
The space: We are moving our current Phase One system and the MSI system into the same room. While full-color capture is pretty straightforward in terms of environment (overhead lights off, continuous light source for exposing material, neutral wall color and no windows), the MSI environment requires total darkness during capture. In order to have both systems in the same room we will be using blackout curtains between the two systems so the MSI system will be able to capture in total darkness and the full-color system will be able to use a continuous light source. While the blackout curtains are a significant upgrade, the overall space needs some minor remodeling. We will be upgrading to full spectrum overhead lighting, gray walls and a tile floor to match the existing lab environment.
As shown above… we have begun to receive MSI equipment, installation and training dates have been finalized, the workstation is being built and configured as I write this, and the room that will house both Phase One systems has been cleared out and is ready for a makeover… It is actually happening!
What a team effort!
I look forward to future blog posts about the discoveries we will make using our new MSI system!
My work involves a lot of problem-solving, and problem-solving often requires learning new skills. It's one of the things I like most about my job. Over the past year, I've spent most of my time helping Duke's Rubenstein Library implement ArchivesSpace, an open source web application for managing information about archival collections.
As an archivist and metadata librarian by training (translation: not a programmer), I’ve been working mostly on data mapping and migration tasks, but part of my deep-dive into ArchivesSpace has been learning about the ArchivesSpace API, or, really, learning about APIs in general–how they work, and how to take advantage of them. In particular, I’ve been trying to find ways we can use the ArchivesSpace API to work smarter and not harder as the saying goes.
Why use the ArchivesSpace API?
Quite simply, the ArchivesSpace API lets you do things you can’t do in the staff interface of the application, especially batch operations.
So what is the ArchivesSpace API? In very simple terms, it is a way to interact with the ArchivesSpace backend without using the application interface. To learn more, you should check out this excellent post from the University of Michigan’s Bentley Historical Library: The ArchivesSpace API.
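As a concrete (and hedged) illustration of what "interacting with the backend" looks like, here is a minimal Python sketch that logs in and lists repositories. The base URL, username, and password are placeholders for your own ArchivesSpace instance; the login and repositories endpoints shown are the standard ones:

```python
import requests

API = 'http://localhost:8089'   # placeholder backend URL for a local ArchivesSpace

# Authenticate and grab a session token...
resp = requests.post(f'{API}/users/admin/login', params={'password': 'admin'})
session = resp.json()['session']
headers = {'X-ArchivesSpace-Session': session}

# ...then every subsequent call just sends that token as a header.
repos = requests.get(f'{API}/repositories', headers=headers).json()
for repo in repos:
    print(repo['uri'], repo['repo_code'])
```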
Working with the ArchivesSpace API: Other stuff you might need to know
As with any new technology, it’s hard to learn about APIs in isolation. Figuring out how to work with the ArchivesSpace API has introduced me to a suite of other technologies–the Python programming language, data structure standards like JSON, something called cURL, and even GitHub. These are all technologies I’ve wanted to learn at some point in time, but I’ve always found it difficult to block out time to explore them without having a concrete problem to solve.
Fortunately (I guess?), ArchivesSpace gave me some concrete problems–lots of them. These problems usually surface when a colleague asks me to perform some kind of batch operation in ArchivesSpace (e.g. export a batch of EAD, update a bunch of URLs, or add a note to a batch of records).
Below are examples of some of the requests I’ve received and some links to scripts and other tools (on Github) that I developed for solving these problems using the ArchivesSpace API.
ArchivesSpace API examples:
“Can you re-publish these 12 finding aids again because I fixed some typos?”
I get this request all the time. To publish finding aids at Duke, we export EAD from ArchivesSpace and post it to a webserver where various stylesheets and scripts help render the XML in our public finding aid interface. Exporting EAD from the ArchivesSpace staff interface is fairly labor intensive. It involves logging into the application, finding the collection record (resource record in ASpace-speak) you want to export, opening the record, making sure the resource record and all of its components are marked “published,” clicking the export button, and then specifying the export options, filename, and file path where you want to save the XML.
In addition to this long list of steps, the ArchivesSpace EAD export service is really slow, with large finding aids often taking 5-10 minutes to export completely. If you need to post several EADs at once, this entire process could take hours–exporting the record, waiting for the export to finish, and then following the steps again. A few weeks after we went into production with ArchivesSpace I found that I was spending WAY TOO MUCH TIME exporting and re-exporting EAD from ArchivesSpace. There had to be a better way…
asEADpublish_and_export_eadid_input.py – A Python script that batch exports EAD from the ArchivesSpace API based on EADID input. Run from the command line, the script prompts for a list of EADID values separated with commas and checks to see if a resource record’s finding aid status is set to ‘published’. If so, it exports the EAD to a specified location using the EADID as the filename. If it’s not set to ‘published,’ the script updates the finding aid status to ‘published’ and then publishes the resource record and all its components. Then, it exports the modified EAD. See comments in the script for more details.
Below is a screenshot of the script in action. It even prints out some useful information to the terminal (filename | collection number | ASpace record URI | last person to modify | last modified date | export confirmation)
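The actual script linked above does more (status checks, publishing, logging), but the core export call is small. Here is a stripped-down sketch, assuming the standard EAD export endpoint and the same placeholder connection details as the login example earlier:

```python
import requests

def export_ead(repo_id, resource_id, filename, headers, api='http://localhost:8089'):
    """Fetch EAD for one resource record from the ArchivesSpace API and save it to disk."""
    url = f'{api}/repositories/{repo_id}/resource_descriptions/{resource_id}.xml'
    params = {'include_unpublished': 'false', 'numbered_cs': 'true'}
    resp = requests.get(url, params=params, headers=headers, stream=True)
    resp.raise_for_status()
    with open(filename, 'wb') as f:
        for chunk in resp.iter_content(chunk_size=8192):
            f.write(chunk)
    print(f'wrote {filename}')

# Example call (ids and path are placeholders): export_ead(2, 1234, 'ead/collection.xml', headers)
```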
“Can you update the URLs for all the digital objects in this collection?”
We’re migrating most of our digitized content to the new Duke Digital Repository (DDR) and in the process our digital objects are getting new (and hopefully more persistent) URIs. To avoid broken links in our finding aids to digital objects stored in the DDR, we need to update several thousand digital object URLs in ArchivesSpace that point to old locations. Changing the URLs one at a time in the ArchivesSpace staff interface would take, you guessed it, WAY TOO MUCH TIME. While there are probably other ways to change the URLs in batch (SQL updates?), I decided the safest way was to, of course, use the ArchivesSpace API.
asUpdateDAOs.py – A Python script that will batch update Digital Object identifiers and file version URIs in ArchivesSpace based on an input CSV file that contains ref_ids for the linked Archival Object records. The input is a five-column CSV file (without column headers) that includes: [old file version use statement], [old file version URI], [new file version URI], [ASpace ref_id], [ark identifier in DDR (e.g. ark:/87924/r34j0b091)].
[WARNING: The script above only works for ArchivesSpace version 1.5.0 and later because it uses the new “find_by_id” endpoint. The script is also highly customized for our environment, but could easily be modified to make other batch changes to digital object records based on CSV input. I’d recommend testing this in a development environment before using in production].
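For readers curious what that looks like under the hood, here is a hedged sketch (not the actual asUpdateDAOs.py) of the two pieces involved: looking up an archival object by ref_id via the find_by_id endpoint, then rewriting a linked digital object's file URI. The repository id, ref_id, URIs, and session token are placeholders, and the JSON structure reflects my understanding of the ArchivesSpace data model:

```python
import requests

API = 'http://localhost:8089'
headers = {'X-ArchivesSpace-Session': 'your-session-token'}   # placeholder token

# find_by_id returns lightweight refs to the matching archival objects
found = requests.get(f'{API}/repositories/2/find_by_id/archival_objects',
                     params={'ref_id[]': 'placeholder_ref_id'}, headers=headers).json()
ao_uri = found['archival_objects'][0]['ref']
ao = requests.get(f'{API}{ao_uri}', headers=headers).json()

# Follow the instance link to the digital object and update its file version URI
for instance in ao.get('instances', []):
    if instance.get('instance_type') == 'digital_object':
        do_uri = instance['digital_object']['ref']
        do = requests.get(f'{API}{do_uri}', headers=headers).json()
        for fv in do.get('file_versions', []):
            if fv.get('file_uri') == 'OLD-URI-FROM-CSV':
                fv['file_uri'] = 'NEW-URI-FROM-CSV'
        requests.post(f'{API}{do_uri}', json=do, headers=headers)
```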
“Can you add a note to these 300 records?”
We often need to add a note or some other bit of metadata to a set of resource records or component records in ArchivesSpace. As you’ve probably learned, making these kinds of batch updates isn’t really possible through the ArchivesSpace staff interface, but you can do it using the ArchivesSpace API!
duke_archival_object_metadata_adder.py – A Python script that reads a CSV input file and batch adds ‘repository processing notes’ to archival object records in ArchivesSpace. The input is a simple two-column CSV file (without column headers) where the first column contains the archival object’s ref_ID and the second column contains the text of the note you want to add. You could easily modify this script to batch add metadata to other fields.
[WARNING: Script only works in ArchivesSpace version 1.5.0 and higher].
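Again as a hedged sketch rather than the script itself, the core loop looks something like this. The CSV filename, repository id, and session details are placeholders, and it assumes the archival object schema's repository_processing_note field:

```python
import csv
import requests

API = 'http://localhost:8089'
headers = {'X-ArchivesSpace-Session': 'your-session-token'}   # placeholder token

with open('notes_to_add.csv', newline='') as f:               # ref_id, note text per row
    for ref_id, note_text in csv.reader(f):
        found = requests.get(f'{API}/repositories/2/find_by_id/archival_objects',
                             params={'ref_id[]': ref_id}, headers=headers).json()
        ao_uri = found['archival_objects'][0]['ref']
        ao = requests.get(f'{API}{ao_uri}', headers=headers).json()
        ao['repository_processing_note'] = note_text          # add the processing note
        resp = requests.post(f'{API}{ao_uri}', json=ao, headers=headers)
        print(ref_id, resp.status_code)
```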
The ArchivesSpace API is a really powerful tool for getting stuff done in ArchivesSpace. Having an open API is one of the real benefits of an open-source tool like ArchivesSpace. The API enables the community of ArchivesSpace users to develop their own solutions to local problems without having to rely on a central developer or development team.
There is already a healthy ecosystem of ArchivesSpace users who have shared their API tips and tricks with the community. I’d like to thank all of them for sharing their expertise, and more importantly, their example scripts and documentation.
Here are more resources for exploring the ArchivesSpace API: