Around this time last year, I wrote about our ambitious plans to implement ArcLight software for archival discovery and access at Duke in 2020. While this year has certainly laid waste to so many good intentions, our team persisted through the cacophony undeterred, and—I’m proud to report—still hit our mark of going live on July 1, 2020 after a six-month work cycle. The site is available at https://archives.lib.duke.edu/.
Now that we have been live for a while, I thought it’d be worthwhile to summarize what we accomplished, and reflect a bit on how it’s going.
Working Among Peers
I had the pleasure of presenting about ArcLight at the Oct 2020 Blacklight Summit alongside Julie Hardesty (Indiana University) and Trey Pendragon (Princeton University). The three of us shared our experiences implementing ArcLight at our institutions. Though we have the same core ArcLight software underpinning our apps, we have each taken different strategies to build on top of it. Nevertheless, we’re all emerging with solutions that look polished and fill in various gaps to meet our unique local needs. It’s exciting to see how well the software holds up in different contexts, and to be able to glean inspiration from our peers’ platforms.
A lot of content in this post will reiterate what I shared in the presentation. A recording of that presentation will be available soon, and when it is, I’ll post it here.
Back in April, I discussed at length our custom-built features and interface revisions that we had completed by the halfway point for the project. So, now let’s look closer at everything else we added in the second half (and in the post-launch period).
Browsing Collection Contents
This is one of the hardest things to get right in a finding aids UI, so our solution has evolved through many iterations. We created a context sidebar with lightly-animated loading indicators matching the number of items currently loading. The nav sticks with you as you scroll down the page and the Request button stays visible. We also decided to present a list of direct child components in the main page body for any parent component.
At the collection level, we wanted to ensure that users didn’t miss any restrictions info, so we presented a taste of it at the top-right of the page that jumps you to the full description when clicking “More.”
We changed how access and use restrictions are indexed so that components can inherit their restrictions from any ancestor component. Then we made bright yellow banners and icons in the UI to signify that a component has restrictions.
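As a rough illustration of the inheritance idea (not our production indexing rules), here is a minimal Python sketch. The `Component` class, its field names, and the sample titles are all hypothetical; it assumes the nearest restriction wins, checking the component itself first and then each ancestor in turn.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Component:
    """Hypothetical stand-in for one node in a finding aid's hierarchy."""
    title: str
    restriction: Optional[str] = None      # restriction stated on this node, if any
    parent: Optional["Component"] = None   # None for the top-level collection


def effective_restriction(component: Component) -> Optional[str]:
    """Return the nearest restriction: the component's own, else an ancestor's."""
    node = component
    while node is not None:
        if node.restriction:
            return node.restriction
        node = node.parent
    return None


# Hypothetical sample data: only the series states a restriction.
collection = Component("Example Family Papers")
series = Component("Correspondence", restriction="Closed until 2040.", parent=collection)
folder = Component("Letters, 1950-1955", parent=series)

print(effective_restriction(folder))      # inherited from the series
print(effective_restriction(collection))  # nothing applies at the collection level
```

Resolving this once at index time, rather than on every page view, means the UI only needs to check a single field to decide whether to show a banner.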
Hierarchical Record Group Browse
Using the excellent Blacklight Hierarchy plugin, we developed a way to browse University Archives collections by an existing hierarchical Record Group classification system. We encoded the group numbers, titles, nesting, and description in a YAML config file so they’re easy to change as they evolve.
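To give a sense of how a nested classification like this can be looked up, here is a small Python sketch. Everything in it is an assumption for illustration: the dictionary mimics what a parsed YAML config might look like, and the group numbers, titles, and key names (`title`, `children`) are invented rather than taken from our actual file.

```python
from typing import Dict, List, Optional

# Hypothetical data: the shape a parsed YAML config might take once loaded.
RECORD_GROUPS: Dict[str, dict] = {
    "01": {
        "title": "Example Governance Records",
        "children": {
            "01": {"title": "Example Board Minutes", "children": {}},
            "02": {"title": "Example Committee Files", "children": {}},
        },
    },
    "02": {"title": "Example Administrative Offices", "children": {}},
}


def group_path(number: str, groups: Dict[str, dict] = RECORD_GROUPS) -> Optional[List[str]]:
    """Resolve a dotted group number such as "01.02" to its chain of titles."""
    titles: List[str] = []
    level = groups
    for part in number.split("."):
        entry = level.get(part)
        if entry is None:
            return None  # unknown group number at this level
        titles.append(entry["title"])
        level = entry.get("children", {})
    return titles


print(group_path("01.02"))
```

Keeping the titles and nesting in data rather than code is what makes the scheme easy to revise: renaming or moving a group is a config edit, not a code change.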
Digital Repository & Bento Search Integration
ArcLight exists among a wide constellation of other applications supporting and promoting discovery in the library, so integrating with these other pieces was an important part of our implementation. In April, I showed the interaction between ArcLight and our Requests app, as well as rendering digital object viewers/players inline via the Duke Digital Repository (DDR).
Two other locations external to our application now use ArcLight’s APIs to retrieve archival information. The first is the Duke Digital Repository (DDR). When viewing a digital collection or digital object that has a physical counterpart in the archives, we pull archival information for the item into the DDR interface from ArcLight’s JSON API.
The other is our “Bento” search application powering the default All search available from the library website. Now when your query finds matches in ArcLight, you’ll see component-level results under a Collection Guides bento box. Components are contextualized with a linked breadcrumb trail.
Bookmarks Export CSV
COVID-19 brought about many changes to how Duke Libraries retrieve materials for faculty and student research. You may have heard Duke’s Library Takeout song (819K YouTube views & counting!), and if you have, you probably can’t ever un-hear it.
But with archival materials, we’re talking about items that could never be taken out of the building. Materials may only be accessed in a controlled environment in the Rubenstein Reading Room, which remains highly restricted. With so much Duke instruction moving online during COVID, we urgently needed to come up with a better workflow to field an explosion of requests for digitizing archival materials for use in remote instruction.
ArcLight’s Bookmarks feature (which comes via Blacklight) proved to be highly valuable here. We extended the feature to add a CSV export. The CSV is constructed in a way that makes it function as a digitization work order that our Digital Collections & Curation Services staff use to shepherd a request through digitization, metadata creation, and repository ingest. Over 26,000 images have now been digitized for patron instruction requests using this new workflow.
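The general shape of such an export can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names in `FIELDS` and the sample bookmark are hypothetical, and the real work order likely carries different fields.

```python
import csv
import io
from typing import Dict, List

# Hypothetical column names chosen for illustration only.
FIELDS = ["collection_title", "component_title", "container", "url"]


def bookmarks_to_csv(bookmarks: List[Dict[str, str]]) -> str:
    """Serialize bookmarked components as CSV text, one row per bookmark."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for bookmark in bookmarks:
        # Missing values become empty cells so every row has the same columns.
        writer.writerow({field: bookmark.get(field, "") for field in FIELDS})
    return buffer.getvalue()


# Hypothetical sample bookmark.
sample = [{
    "collection_title": "Example Family Papers",
    "component_title": "Letters, 1950-1955",
    "container": "Box 1, Folder 3",
    "url": "https://example.org/catalog/abc123",
}]
print(bookmarks_to_csv(sample))
```

Using a CSV library rather than joining strings by hand matters here, because archival titles and container labels routinely contain commas and need to be quoted correctly.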
Here’s a list of several other custom features we completed after the April midway point.
- Relevancy optimization
- WCAG2.0 AA accessibility
- ARKs & permalinks
- Advanced search modal
- Catalog record links
- Dynamic sitemaps (via gem)
- Creative Commons / RightsStatements.org integration
- Twitter card metadata
- Open Graph metadata
- Google Analytics event tracking with Anonymize IP
- Debug mode for relevancy tuning
Bringing ArcLight online required some major rearchitecting of our pipeline to preview and publish archival data. Our archivists have been using ArchivesSpace for several years to manage the source data, exporting EAD2002 XML files when they are ready to be read by the public UI. Those parts remain the same for now; everything else, however, is new and improved.
Our new process involves two GitLab repositories: one for the EAD data, and another for the ArcLight-based application. The data repo uses GitLab Webhooks to send POST requests to the app to queue up reindexing jobs automatically whenever the data changes. We have a test/preview branch for the data that updates our dev and test servers for the application, so archivists can easily see what any revised or new finding aids will look like before they go live in production.
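To make the webhook step more concrete, here is a hedged Python sketch of how a receiving app might turn a push event into a reindexing plan. It relies on the `object_kind`, `ref`, and per-commit `added` / `modified` / `removed` lists found in GitLab push event payloads. The function name, the branch name, and the `ead/` paths are hypothetical, and the sketch assumes the commits arrive in chronological order.

```python
from typing import Dict, List


def plan_reindex(payload: dict, branch: str = "main") -> Dict[str, List[str]]:
    """Work out which EAD files to (re)index or delete from a push event."""
    empty = {"index": [], "delete": []}
    if payload.get("object_kind") != "push":
        return empty
    if payload.get("ref") != f"refs/heads/{branch}":
        return empty  # ignore pushes to branches this server does not track

    to_index, to_delete = set(), set()
    for commit in payload.get("commits", []):
        for path in commit.get("added", []) + commit.get("modified", []):
            if path.endswith(".xml"):
                to_index.add(path)
                to_delete.discard(path)  # a later change overrides an earlier removal
        for path in commit.get("removed", []):
            if path.endswith(".xml"):
                to_delete.add(path)
                to_index.discard(path)   # a later removal overrides an earlier change
    return {"index": sorted(to_index), "delete": sorted(to_delete)}


# Hypothetical payload trimmed to the fields the function reads.
example_payload = {
    "object_kind": "push",
    "ref": "refs/heads/main",
    "commits": [
        {"added": ["ead/a.xml"], "modified": ["ead/b.xml", "README.md"], "removed": []},
        {"added": [], "modified": [], "removed": ["ead/a.xml"]},
    ],
}
print(plan_reindex(example_payload))
```

In a setup like ours, each path in the resulting plan would then be handed to a background job rather than processed inside the web request, so the webhook can respond quickly.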
We use GitLab CI/CD to easily and automatically deploy changes to the application code to the various servers. Each code change gets systematically checked for passing unit and feature tests, security, and code style before being integrated. We also aim to add automated accessibility testing to our pipeline within the next couple months.
A lot of data gets crunched while indexing EAD documents through Traject into Solr. Our app uses Resque-based background job processing to handle the transactions. With about 4,000 finding aids, this creates around 900,000 Solr documents; the index is currently a little over 1GB. Changes to data get reindexed and reflected in the UI near-instantaneously. If we ever need to reindex every finding aid, it takes only about one hour to complete.
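For a sense of scale, the rounded figures above imply some rough averages. The short calculation below only restates those approximate numbers; it says nothing about how the work is distributed across jobs or how much individual finding aids vary in size.

```python
# Rounded figures quoted above; all are approximations.
finding_aids = 4_000
solr_docs = 900_000
full_reindex_seconds = 60 * 60  # "about one hour"

docs_per_finding_aid = solr_docs / finding_aids
docs_per_second = solr_docs / full_reindex_seconds
finding_aids_per_second = finding_aids / full_reindex_seconds

print(f"{docs_per_finding_aid:.0f} Solr documents per finding aid, on average")
print(f"{docs_per_second:.0f} Solr documents indexed per second during a full reindex")
print(f"{finding_aids_per_second:.2f} finding aids per second during a full reindex")
```

That works out to roughly 225 Solr documents per finding aid and about 250 documents per second over a full reindex.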
What We Have Learned
We have been live for just over four months, and we’re really ecstatic with how everything is going.
In September 2020, our Assessment & User Experience staff conducted ten usability tests using our ArcLight UI, with five experienced archival researchers and five novice users. Kudos to Joyce Chapman, Candice Wang, and Anh Nguyen for their excellent work. Their report is available here. The tests were conducted remotely over Zoom due to COVID restrictions. This was our first foray into remote usability testing.
Novice and advanced participants alike navigated the site fairly easily and understood the contextual elements in the UI. We’re quite pleased with how well our custom features performed (especially the context sidebar, contents lists, redesigned breadcrumb trail). The Advanced Search modal got more use than we had anticipated, but it was effective. We were also somewhat surprised to find that users were not confused by the All Collections vs. This Collection search scope selector when searching the site.
“The interface design does a pretty good job of funneling me to what I need to see… Most of the things I was looking for were in the first place or two I’d suspect they’d be.” — Representative quote from a test participant
A few improvements were recommended as a result of the testing:
- make container information clearer, especially within the requesting workflow
- improve visibility of the online access facet
- make the Show More links in the sidebar context nav clearer
- better delineate between collections and series in the breadcrumb
- replace jargon with clearer labels, especially “Indexed Terms”
We recently implemented changes to address the second, third, and fifth recommendations. We’re still considering options for the first and fourth. Usability testing has been an invaluable part of our development process. It’s a joy (and often a humbling experience!) to see your design work put through its paces with actual users in a usability test. It always helps us understand what we’re doing so we can make things better.
We want to learn more about how often different parts of the UI are used, so we implemented Google Analytics event tracking to anonymously log interactions. We use the Anonymize IP feature to help protect patron privacy.
Some observations so far:
- The context nav sidebar is by far the most interacted-with part of the UI.
- Browsing the Contents section of a component page (list of direct child components) is the second-most frequent interaction.
- Subject, Collection, & Names are the most-used facets, in that order. That does not correlate with the order they appear in the sidebar.
- Links presented in the Online Access banners were clicked 5x more often than the limiter in the Online Access facet (which matches what we found in usability testing)
- Basic keyword searches happen 32x more frequently than advanced searches
Search Engine Optimization (SEO)
We want to be sure that when people search Google for terms that appear in our finding aids, they discover our resources. So when several Blacklight community members combined forces to create a Blacklight Dynamic Sitemaps gem this past year, it caught our eye. We found it super easy to set up, and it got the vast majority of our collection records Google-indexed within a month or so. We are interested in exploring ways to get it to include the component records in the sitemap as well.
Launching ArcLight: Retrospective
We’re pretty proud of how this all turned out. We have accomplished a lot in a relatively short amount of time. And the core software will only improve as the community grows.
At Duke, we already use Blacklight to power a bunch of different discovery applications in our portfolio. And given that the responsibility of supporting ArcLight falls to the same staff who support all of those other apps, it has been unquestionably beneficial for us to be able to work with familiar tooling.
We did encounter a few hurdles along the way, mostly because the software is so new and not yet widely adopted. There are still some rough edges that need to be smoothed out in the core software. Documentation is pretty sparse. We found indexing errors and had to adjust some rules. Relevancy ranking needed a lot of work. Not all of the EAD elements and attributes are accounted for; some things aren’t indexed or displayed in an optimal way.
Still, the pros outweigh the cons by far. With ArcLight, you get an extensible Blacklight-based core, but one tailored specifically to archival data. All the things Blacklight shines at (facets, keyword highlighting, autosuggest, bookmarks, APIs, etc.) are right at your fingertips. We have had a very good experience finding and using Blacklight plugins to add desired features.
Finally, while the ArcLight community is currently small, the larger Blacklight community is not. There is so much amazing work happening out in the Blacklight community–so much positive energy! You can bet it will eventually pay dividends toward making ArcLight an even better solution for archival discovery down the road.
Many thanks go out to our Duke staff members who contributed to getting this project completed successfully. Especially:
- Product Owner: Noah Huffman
- Developers/DevOps: Sean Aery, David Chandek-Stark, Michael Daul, Cory Lown (scrum master)
- Project Sponsors: Will Sexton & Meghan Lyon
- Redesign Team: Noah Huffman (chair), Joyce Chapman, Maggie Dickson, Val Gillispie, Brooke Guthrie, Tracy Jackson, Meghan Lyon, Sara Seten Berghausen
And thank you as well to the Stanford University Libraries staff for spearheading the ArcLight project.