Featured image – Wayback Machine capture of the Tripod2 beta site in February, 2011.
We all design and create platforms that work beautifully for us, that fill us with pride as they expand and grow to meet our programmatic needs, and all the while the world changes around us, the programs scale beyond what we envisioned, and what was once perfectly adaptable becomes unsustainable, appearing to us all of a sudden as a voracious, angry beast, threatening to consume us, or else a rickety contraption, teetering on the verge of a disastrous collapse. I mean, everyone has that experience, right?
In March of 2011, a small team consisting primarily of me and fellow developer Sean Aery rolled out a new, homegrown platform, Tripod2. It became the primary point of access for Duke Digital Collections, the Rubenstein Library’s collection guides, and a handful of metadata-only datasets describing special collections materials. Within a few years, we had already begun talking about migrating all the Tripod2 stuff to new platforms. Yet nearly a decade after its rollout, we still have important content that depends on that platform for access.
Nevertheless, we have made significant progress. Sunsetting Tripod2 became a priority for one of the teams in our Digital Preservation and Publishing Program last year, and we vowed to follow through by the end of 2020. We may not make that target, but we do have firm plans for the remaining work. The migration of digital collections to the Duke Digital Repository has been steady and is nearing completion. This past summer, we rolled out a new platform for the Rubenstein collection guides, based on the ArcLight framework. And we now have a plan to handle the remaining instances of metadata-only databases, a plan that itself relies on the new collection guides platform.
We built Tripod2 on the triptych of Python/Django, Solr, and a document base of METS files. There were many areas of functionality that we never completely developed, but it gave us a range of capability that was crucial in our thinking about digital collections a decade ago – the ability to customize, to support new content types, and to highlight what made each digital collection unique. In fact, the earliest public statement that I can find revealing the existence of Tripod2 is Sean’s blog post, “An increasingly diverse range of formats,” from almost exactly ten years ago. As Sean wrote then, “dealing with format complexity is one of our biggest challenges.”
As the years went by, a number of factors made it difficult to keep Tripod2 current with the scope of our programs and with changes in web technology. The single most prevalent factor was the expanding scope of the Duke Digital Collections program, which began to take on more high-volume digitization efforts. We started adding all of our new digital collections to the Duke Digital Repository (DDR) in 2015, and the effort to migrate from Tripod2 to the repository picked up soon thereafter. That work was subject to all sorts of comical and embarrassing misestimations by me on the pages of this very blog over the years, but thanks to the excellent work by Digital Collections and Curation Services, we are down to the final stages.
Collection and item counters from the Duke Digital Repository’s homepage for Duke Digital Collections, taken from the Internet Archive’s Wayback Machine, approximately a year apart in 2018, 2019, and 2020. The volume of digital collections has roughly doubled in that time, due to both the addition of new collections, and the migration of collections from Tripod2.
Moving digital collections to the DDR went hand-in-hand with far less customization, and far less developer intervention to publish a new collection. Where we used to have developers dedicated to individual platforms, we now work together more broadly as a team, and promote redundancy in our development and support models as much as we can. In both our digital collections program and our approach to software development, we are more efficient and more process-driven.
Given my record of predictions about our work on this blog, I don’t want to be too forward in announcing this transition. We all know that 2020 doesn’t suffer fools gladly, or maybe it suffers some of them but not others, and maybe I shouldn’t talk about 2020 just like this, right out in the open, where 2020 can hear me. So I’ll just leave it here – in recent times, we have done a lot of work toward saying goodbye to Tripod2. Perhaps soon we shall.
Some recent community contributions to the gem, as well as some extra time as we transition from one work cycle to another, provided an opportunity for maintenance and refinement of EDTF-Humanize. The primary improvement is better support for languages other than English via Ruby I18n locale configuration files and a language-specific module override pattern. Support for French is now included, and support for other languages may be added following the same approach.
The primary means of adding additional languages to EDTF-Humanize is to add a translation file to config/locales/. This is the translation file included to support French:
fr:
  date:
    day_names: [Dimanche, Lundi, Mardi, Mercredi, Jeudi, Vendredi, Samedi]
    abbr_day_names: [Dim, Lun, Mar, Mer, Jeu, Ven, Sam]
    # Don't forget the nil at the beginning; there's no such thing as a 0th month
    month_names: [~, Janvier, Février, Mars, Avril, Mai, Juin, Juillet, Août, Septembre, Octobre, Novembre, Decembre]
    abbr_month_names: [~, Jan, Fev, Mar, Avr, Mai, Jun, Jul, Aou, Sep, Oct, Nov, Dec]
    seasons:
      spring: "printemps"
      summer: "été"
      autumn: "automne"
      winter: "hiver"
  edtf:
    terms:
      approximate_date_prefix_day: ""
      approximate_date_prefix_month: ""
      approximate_date_prefix_year: ""
      approximate_date_suffix_day: " environ"
      approximate_date_suffix_month: " environ"
      approximate_date_suffix_year: " environ"
      decade_prefix: "Les années "
      decade_suffix: ""
      century_suffix: ""
      interval_prefix_day: "Du "
      interval_prefix_month: "De "
      interval_prefix_year: "De "
      interval_connector_approximate: " à "
      interval_connector_open: " à "
      interval_connector_day: " au "
      interval_connector_month: " à "
      interval_connector_year: " à "
      interval_unspecified_suffix: "s"
      open_start_interval_with_day: "Jusqu'au %{date}"
      open_start_interval_with_month: "Jusqu'en %{date}"
      open_start_interval_with_year: "Jusqu'en %{date}"
      open_end_interval_with_day: "Depuis le %{date}"
      open_end_interval_with_month: "Depuis %{date}"
      open_end_interval_with_year: "Depuis %{date}"
      set_dates_connector_exclusive: ", "
      set_dates_connector_inclusive: ", "
      set_earlier_prefix_exclusive: 'Le ou avant '
      set_earlier_prefix_inclusive: 'Le et avant '
      set_last_date_connector_exclusive: " ou "
      set_last_date_connector_inclusive: " et "
      set_later_prefix_exclusive: 'Le ou après '
      set_later_prefix_inclusive: 'Le et après '
      set_two_dates_connector_exclusive: " ou "
      set_two_dates_connector_inclusive: " et "
      uncertain_date_suffix: "?"
      unknown: 'Inconnue'
      unspecified_digit_substitute: "x"
    formats:
      day_precision_strftime_format: "%-d %B %Y"
      month_precision_strftime_format: "%B %Y"
      year_precision_strftime_format: "%Y"
In addition to the translation file, the methods used to construct the human-readable string for each EDTF date object type may be completely overridden for a language if needed. For instance, when the date object is an instance of EDTF::Century, the French language uses a different method from the default to construct the humanized form. This override is accomplished by adding a language module for French that includes the Default module and also includes a Century module that overrides the default behavior. The override is shown here (minus the internals of the humanizer method) as an example:
# lib/edtf/humanize/language/french.rb
module Edtf
  module Humanize
    module Language
      module French
        include Default

        module Century
          extend self

          def humanizer(date)
            # Special French handling for EDTF::Century
          end
        end
      end
    end
  end
end
EDTF-Humanize version 2.0.0 is available on rubygems.org and on GitHub. Documentation is available on GitHub. Pull requests are welcome; I’m especially interested in contributions to add support for languages in addition to English and French.
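To illustrate how the locale terms combine at humanize time, here is a simplified, self-contained sketch. It uses a trimmed stand-in for the French locale file and plain string composition in place of the gem's I18n lookups; it is not EDTF-Humanize's actual implementation.

```ruby
require "yaml"

# Trimmed stand-in for config/locales/fr.yml (illustrative keys only).
fr = YAML.safe_load(<<~YML)
  fr:
    edtf:
      terms:
        interval_prefix_year: "De "
        interval_connector_year: " à "
YML

terms = fr.dig("fr", "edtf", "terms")

# Roughly how a year-precision interval like 2010/2012 becomes a French phrase:
# prefix + start year + connector + end year.
humanized = "#{terms['interval_prefix_year']}2010#{terms['interval_connector_year']}2012"
puts humanized  # => "De 2010 à 2012"
```

Because every connector and prefix lives in the locale file, adding a language is mostly a matter of supplying these strings; the module override pattern exists for the cases where string substitution alone is not enough.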
Last week the Duke University Libraries (DUL) development team released a new version of the Duke Digital Repository (DDR), which is the preservation and access platform for digitized, born digital, and purchased library collections. DDR is developed and maintained by DUL staff and it is built using Samvera, Valkyrie and Blacklight components (read all about our migration to Valkyrie which concluded in early 2020).
Look at that beautiful technical metadata!
The primary goal of our new repository features is to provide better support for and access to born digital records. The planning for this work began more than two years ago, when the Rubenstein Library's Digital Records Archivist joined the Digital Collections Implementation Team (DCIT) to help us envision how DDR and our workflows could better support born digital collections. Conversations on this topic between the Rubenstein Library and Digital Strategies and Technology began well before that.
Back in 2018, DCIT developed a list of user stories to address born digital records as well as some other longstanding needs. At the time we evaluated each need based on its difficulty and impact, and then developed a list of high, medium, and low priority features. Fast forward to late 2019, and we designated three folks from DCIT to act as product owners during development. Those folks are our Metadata Architect (Maggie Dickson), Digital Records Archivist ([Matthew] farrell), and me (Head of Digital Collections and Curation Services). Development work began in earnest in January/February, and now, after many meetings, user story refinements, more meetings, and actual development work, here we are!
Notable new features include:
Metadata only view of objects: restrict the object but allow the public to search and discover its metadata
Expose technical metadata for components in the public interface
Better access to full text search in CONTENTdm from DDR
As you can see above, we were able to fit in a few features not specific to born digital records. This is because one of our big priorities is finishing the migration from our legacy Tripod2 platform to DDR in 2020. One of the impediments to doing so (in addition to migrating the actual content) is that Tripod2 connects with our CONTENTdm instance, which is where we provide access to digitized primary sources that require full text search (primarily newspapers and publications). The new DDR features therefore include enhanced links to our collections in CONTENTdm.
We hope these new features provide a better experience for our users as well as a safe and happy home for our born digital records!
Search full text link on a collection landing page. Example of the search within an item interface.
A technology allowing most of us to keep working effectively during the COVID-19 pandemic is called “videotelephony,” which is real-time, simultaneous audio-visual communication between two or more users. Right now, millions of workers and families are using Zoom, FaceTime, WhatsApp, WebEx, Skype and other software to see and hear each other live, using the built-in microphones and video cameras on our computers, tablets and mobile phones.
We take this capability for granted now, but it’s actually been over a century in the making. Generations of trial and error, billions in spent capital, technical brick walls and failed business models have paved the way to this morning’s Zoom meeting with your work team. You might want to change out of your pajamas, by the way.
AT&T’s Picturephone (Model 1) was introduced at the 1964 World’s Fair.
Alexander Graham Bell famously patented the telephone in 1876. Shortly after, the concept of not only hearing the person you are talking to, but also seeing them simultaneously, stirred the imagination of inventors, writers, and artists. It seemed like a reasonably attainable next step. Early terms for a hypothetical device that could accomplish this included the “Telephonoscope” and the “Telectroscope.”
Mr. Bell himself conceived of a device called an “electrical radiophone,” and predicted “the day would come when the man at the telephone would be able to see the distant person to whom he was speaking.” But that day would not come until long after Bell’s death in 1922.
The problem was, the transmission of moving images was a lot more complicated than transmitting audio. Motion picture film, also introduced in the late 1800s, was brought to life by chemicals reacting to silver-halide crystals in a darkroom, but unlike the telephone, electricity played no part in film’s construction or dissemination.
The telephone converted sound waves to electrical signals, as did radio station towers. Neither could transmit without electricity. And a telephone is “full-duplex,” meaning the data is transmitted in both directions, simultaneously, on a single carrier. The next challenge was to somehow electrify moving images, make them full-duplex, and accommodate their exponentially larger bandwidth.
The Picturephone (Model 2). Only a few hundred were sold in the 1970s.
It wasn’t until the late 1930s that cathode-ray-tube television sets were introduced to the world, and the concept of analog video began to gain traction. Unlike motion picture film, video is an electronic medium. Now that moving images were utilizing electricity, they could be transmitted to others, using antennas.
After World War II ended, and Americans had more spending money, black & white television sets became popular household items in the 1950s. But unlike the telephone, communication was still one way. It wasn’t full-duplex. You could see “The Honeymooners,” but they couldn’t see you, and it wasn’t live. Live television broadcasts were rare, and still in the experimental phase.
In 1964, AT&T’s Bell Labs (originally founded by Alexander Graham Bell), introduced the “Picturephone” at the New York World’s Fair and at Disneyland, demonstrating a video call between the two locales. Later, AT&T introduced public videophone booths in New York City, Chicago and Washington, DC. If you were in the New York videophone booth, you could see and hear someone in the Chicago videophone booth, in real time, and it was two-way communication.
The problem was, it was outrageously expensive. A three-minute call cost $225 in today’s money. The technology was finally here, but who could afford it? AT&T poured billions into this concept for years, manufacturing “PicturePhones” and “VideoPhones” for home and office, all the way through 1995, but they were always hampered by the limitations of low-bandwidth telephone lines and very high prices, making them not worth it for the consumer, and never widely adopted.
AT&T’s VideoPhone 2500, released in 1992, priced at $1599.99.
It wasn’t until broadband internet, and high-compression video codecs became widespread in the new millennium, that videotelephony finally became practical, affordable and thus marketable. In recent years, electronics manufacturers began to include video cameras and microphones as a standard feature in CPUs, tablets and mobile phones, making external webcams obsolete. Services like Skype, FaceTime and WebEx were introduced, and later WhatsApp, Zoom and numerous others.
Now it’s simple, and basically free, to have a high-quality, full-color video chat with your friend, partner or co-worker, and a company like Zoom has a net worth of $40 billion. It’s amazing that it took more than 100 years since the invention of the telephone to get here. And just in time for a global pandemic requiring strict physical distancing. Don’t forget to update your clever background image!
On January 20, 2020, we kicked off our first development sprint for implementing ArcLight at Duke as our new finding aids / collection guides platform. We thought our project charter was solid: thorough, well-vetted, with a reasonable set of goals. In the plan was a roadmap identifying a July 1, 2020 launch date and a list of nineteen high-level requirements. There was nary a hint of an impending global pandemic that could upend absolutely everything.
The work wasn’t supposed to look like this, carried out by zooming virtually into each other’s living rooms every day. Code sessions and meetings now require navigating around child supervision shifts and schooling-from-home responsibilities. Our new young office-mates occasionally dance into view or within earshot during our calls. Still, we acknowledge and are grateful for the privilege afforded by this profession to continue to do our work remotely from safe distance.
So, a major shoutout is due to my colleagues in the trenches of this work overcoming the new unforeseen constraints around it, especially Noah Huffman, David Chandek-Stark, and Michael Daul. Our progress to date has only been possible through resilience, collaboration, and willingness to keep pushing ahead together.
Three months after we started the project, we remain on track for a summer 2020 launch.
As a reminder, we began with the core open-source ArcLight platform (demo available) and have been building extensions and modifications in our local application in order to accommodate Duke needs and preferences. With the caveat that there’ll be more changes coming over the next couple months before launch, I want to provide a summary of what we have been able to accomplish so far and some issues we have encountered along the way. Duke staff may access our demo app (IP-restricted) for an up-to-date look at our work in progress.
Homepage
Homepage design for Duke’s ArcLight finding aids site.
Duke Branding. Aimed to make an inviting front door to the finding aids consistent with other modern Duke interfaces, similar to–yet distinguished enough from–other resources like the catalog, digital collections, or Rubenstein Library website.
Featured Items. Built a configurable set of featured items from the collections (with captions), to be displayed randomly (actual selections still in progress).
Dynamic Content. Provided a live count of collections; we might add more indicators for types/counts of materials represented.
Layout
A collection homepage with a sidebar for context navigation.
Sidebar. Replaced the single-column tabbed layout with a sidebar + main content area.
Persistent Collection Info. Made collection & component views more consistent; kept collection links (Summary, Background, etc.) visible/available from component pages.
Width. Widened the largest breakpoint. We wanted to make full use of the screen real estate, especially to make room for potentially lengthy sidebar text.
Navigation
Component pages contextualized through a sidebar navigator and breadcrumb above the main title.
Hierarchical Navigation. Restyled & moved the hierarchical tree navigation into the sidebar. This worked well functionally in ArcLight core, but we felt it would be more effective as a navigational aid when presented beside rather than below the content.
Tooltips & Popovers. Provided some additional context on mouseovers for some navigational elements.
Mouseover context in navigation.
List Child Components. Added a direct-child list in the main content for any series or other component. This makes for a clear navigable table of what’s in the current series / folder / etc. Paginating it helps with performance in cases where we might have 1,000+ sibling components to load.
Breadcrumb Refactor. Emphasized the collection title. Kept some indentation, but aimed for page alignment/legibility plus a balance of emphasis between current component title and collection title.
Breadcrumb trail to show the current component’s nesting.
Search Results
Search results grouped by collection, with keyword highlighting.
“Group by Collection” as the default. Our stakeholders were confused by atomized components as search results outside of the context of their collections, so we tried to emphasize that context in the default search.
Revised search result display. Added keyword highlighting within result titles in Grouped or All view. Made Grouped results display checkboxes for bookmarking & digitized content indicators.
Advanced Search. Kept the global search box simple but added a modal Advanced search option that adds fielded search and some additional filters.
Digital Objects Integration
Digital objects from the Duke Digital Repository are presented inline in the finding aid component page.
DAO Roles. Indexed the @role attribute for <dao> elements; we used that to call templates for different kinds of digital content.
Embedded Object Viewers. Used the Duke Digital Repository’s embed feature, which renders <iframe>s for images and AV.
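As a rough illustration of the kind of lookup involved, the sketch below pulls the role attribute off <dao> elements using REXML. This stands in for the actual indexing pipeline; the fragment and attribute names are simplified EAD for the example, not our production data.

```ruby
require "rexml/document"

# A minimal EAD-like fragment: a <dao> carrying a role attribute that
# tells the front end which display template to use.
ead = <<~XML
  <did>
    <dao role="image-service" href="https://repository.duke.edu/dc/example"/>
  </did>
XML

doc = REXML::Document.new(ead)
# Collect each <dao>'s role; a viewer template can then be chosen per role.
roles = REXML::XPath.match(doc, "//dao").map { |dao| dao.attributes["role"] }
p roles  # => ["image-service"]
```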
Indexing
Whitespace compression. Added a step to the pipeline to remove extra whitespace before indexing. This seems to have slightly accelerated our time-to-index rather than slowing it down.
More text, fewer strings. We encountered cases where note-like fields indexed as strings by ArcLight core (e.g., <scopecontent>) needed to be converted to text because we had more than 32,766 bytes of data (limit for strings) to put in them. In those cases, finding aids were failing to index.
Underscores. For the IDs that end up in a URL for a component, we added an underscore between the finding aid slug and the component ID. We felt these URLs would look cleaner and be better for SEO (our slugs often contain names).
Dates. Changed the date normalization rules (some dates were being omitted from indexing/display).
Bibliographic ID. We succeeded in indexing our bibliographic IDs from our EADs to power a collection-level Request button that leads a user to our homegrown requests system.
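The whitespace compression step above amounts to something like the following, a deliberately naive sketch (the real pipeline operates on full EAD files and is more careful than a single regex):

```ruby
# Collapse runs of whitespace (newlines, tabs, repeated spaces) into single
# spaces before indexing. Naive: it applies to markup and text alike.
def compress_whitespace(ead_string)
  ead_string.gsub(/\s+/, " ").strip
end

result = compress_whitespace("<unittitle>Letters,\n      1912-1914</unittitle>")
puts result  # => "<unittitle>Letters, 1912-1914</unittitle>"
```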
Formatting
EAD -> HTML. We extended the EAD-to-HTML transformation rules for formatted elements to cover more cases (e.g., links like <extptr> & <extref>, or other elements like <archref> & <indexentry>).
Additional formatting and link render rules applied.
Formatting in Titles. We preserved bold or italic formatting in component titles.
ArcLight Core Contributions
We have been able to contribute some of our code back to the ArcLight core project to help out other adopters.
Setting the Stage
The behind-the-scenes foundational work deserves mention here — it represents some of the most complex and challenging aspects of the project. It makes the application development driving the changes I’ve shared above possible.
Gathered a diverse set of 40 representative sample EADs for testing
Dockerized our Duke ArcLight app to simplify developer environment setup
Provisioned a development/demo server for sharing progress with stakeholders
Automated continuous integration and deployment to servers using GitLabCI
Performed targeted data cleanup
Successfully got all 4,000 of our finding aids indexed in Solr on our demo server
Our team has accomplished a lot in three months, in large part due to the solid foundation the ArcLight core software provides. We’re benefiting from some amazing work done by many, many developers who have contributed their expertise and their code to the Blacklight and ArcLight codebases over the years. It has been a real pleasure to be able to build upon an open source engine, a notable contrast to our previous practice of developing everything in-house for finding aids discovery and access.
Still, much remains to be addressed before we can launch this summer.
The Road Ahead
Here’s a list of big things we still plan to tackle by July (other minor revisions/bugfixes will continue as well)…
ASpace -> ArcLight. We need a smoother publication pipeline to regularly get data from ArchivesSpace indexed into ArcLight.
Access & Use Statements. We need to revise the existing inheritance rules and make sure these statements are presented clearly. It’s especially important when materials are indeed restricted.
Relevance Ranking. We know we need to improve the ranking algorithm to ensure the most relevant results for a query appear first.
Analytics. We’ll set up some anonymized tracking to help monitor usage patterns and guide future design decisions.
Sitemap/SEO. It remains important that Google and other crawlers index the finding aids so they are discoverable via the open web.
Accessibility Testing / Optimization. We aim to comply with WCAG2.0 AA guidelines.
Single-Page View. Many of our stakeholders are accustomed to a single-page view of finding aids. There’s no such functionality baked into ArcLight, as its component-by-component views prioritize performance. We might end up providing a downloadable PDF document to meet this need.
More Data Cleanup. ArcLight’s feature set (especially around search/browse) reveals more places where we have suboptimal or inconsistent data lurking in our EADs.
More Community Contributions. We plan to submit more of our enhancements and bugfixes for consideration to be merged into the core ArcLight software.
If you’re a member of the Duke community, we encourage you to explore our demo and provide feedback. To our fellow future ArcLight adopters, we would love to hear how your implementations or plans are shaping up, and identify any ways we might work together toward common goals.
Duke University is an early adopter for FOLIO, an open source library services platform that will give us tools to better support the information needs of our students, faculty, and staff. A core team in Library Systems and Integration Support began forming in January 2019 to help Duke move to FOLIO. I joined that team in January 2019 and began work as an IT Business Analyst.
In preparation for going-live with FOLIO, we formally kicked off our local implementation effort in January 2020. More than 40 local subject experts have joined small group teams to work on different parts of the FOLIO project. These experts are invaluable to Library IT staff: they know how the library’s work is done, which features need to be prioritized over others, and are committed to figuring out how to transition their work into the FOLIO environment.
If you’re reading this in April 2020 and thinking “wasn’t January ten years ago?” you’re not alone. Because the FOLIO Project is international, with partners all over the world, many of us are used to working via remote tools like Slack, Microsoft Teams, and Zoom. But that is a far cry from doing ALL of our work that way, while also taking care of our families and ourselves. It’s a huge credit to all library staff that while the University was swiftly pivoting to remote work, we were able to keep our implementation work going.
One of the first big, messy areas that we knew we needed to work on was locations.
Locations are essential to how patrons know where an item is at the Duke Libraries. When you look up a book in our catalog and the system tells you Where to Find It, it’s using location information from our systems. Library staff also use locations to understand how often items are borrowed, decide when to move items to our off-campus storage, and decide when to buy new items to keep our collections up to date.
A group of FOLIO team members came together from different working areas, including public services, cataloging, acquisitions, digital resources and assessment. I convened those discussions as a lead for our Configurations team. Over the course of late February and March 2020, we met three times as a group using Zoom and delved deep into learning about locations in our current system and how they will work in FOLIO. Staff members shared their knowledge with each other about their functional areas, allowing us to identify potential gaps in FOLIO functionality, as well as things we could improve now, without waiting for FOLIO to deploy.
This team identified two potential paths forward: one that was straightforward, and one that was more creative and would adapt the FOLIO four-level locations in a new way. In our final meeting, where we had hoped to decide between the two options, our subject experts grappled with the challenges, risks, and rewards of the two choices and were able to recommend a path forward together. Ultimately, the team agreed that the creative option was the best choice, but that both options would work, and that guidance helped us decide how to make a first pass on configuring locations and move the project forward.
The most important part of these meetings was valuing the expertise of our library staff and working to support them as they decided what would work the best for the library’s needs. I am deeply appreciative of the staff who committed the time to these discussions while also figuring out how to move their regular jobs to remote work. Our FOLIO implementation is all the better because of their collaborative spirit.
It’s been just over a year since we launched our new catalog in January of 2019. Since then we’ve made improvements to features, performance, and reliability, have developed a long term governance and development strategy, and have plans for future features and enhancements.
During the Spring 2019 semester we experienced a number of outages of the Solr index that powers the new catalog. It proved both frustrating and difficult to track down the root cause of the outages. We took a number of measures to reduce the risk of bot traffic slowing down or crashing the index, including limiting facet paging to 50 pages and results paging to 250 pages, as well as setting limits on OpenSearch queries. We added service monitoring so we are automatically alerted when things go awry, along with automatic restarts under some known bad system conditions. We also identified a bug in the version of Solr we were running that could cause crashes for queries with particular characteristics, and we have since applied a patch to Solr to address it. Happily, the index has not crashed since we implemented these protective measures and bug fixes.
Over the past year we’ve made a number of other improvements to the catalog including:
Caching of the home page and advanced search page has reduced page load times by 75%.
Subject searches are now more precise and do not include stemmed terms.
CDs and DVDs can be searched by accession number.
When digitized copies of Duke material are available at the Internet Archive, links to the digital copy are automatically generated.
Records can be saved to a bookmarks list and shared with others via a stable URL.
Eligible records now have a “Request digitization” link.
Many other small improvements and bug fixes.
We sometimes get requests for features that the catalog already supports:
Anchor links for quick access to record metadata, and a jump to top link.
While development has slowed, the core TRLN team meets monthly to discuss and prioritize new features and fixes, and dedicates time each month to maintenance and new development. We have a number of small improvements and bug fixes in the works. One new feature we’re working on is adding a citation generator that will provide copyable citations in multiple formats (APA, MLA, Chicago, Harvard, and Turabian) for records with OCLC numbers.
We welcome, read, and respond to all feedback and feature requests that come to us through the catalog’s feedback form. Let us know what you think.
After nearly a year of work, the libraries recently launched an updated version of the software stack that powers parts of the Duke Digital Repository. This work primarily centered around migrating the underlying software in our Samvera implementation — which we use to power the DDR — from ActiveFedora to Valkyrie. Moving to Valkyrie gives us the benefits of improved stability along with the flexibility to use different storage solutions, which in turn provides us with options and some degree of future-proofing. Considerable effort was also spent on updating the public and administrative interfaces to use more recent versions of Blacklight and supporting software.
Administrative interface for the DDR
We also used this opportunity to revise the repository landing page at repository.duke.edu; I was involved in building the new version of the home page. Our main goals were to use a header implementation that mirrored our design work in other recent library projects and integrated our ‘unified’ navigation, while maintaining the functionality required by the Samvera software.
DDR home page before the redesign
We also spent a lot of time thinking about how best to illustrate the components of the Duke Digital Repository while trying to keep the content simple and streamlined. In the end we went with a design that emphasizes the two branches of the repository: Library Collections and Duke Scholarship. Each branch in turn links to two destinations — Digitized Collections / Acquired Materials and the Research Data Repository / DukeSpace. The overall design is more compact than before and hopefully an improvement aesthetically as well.
Redesigned DDR home page
We also incorporated a feedback form that persists across the interface so that users can more readily report any difficulties they encounter while using the platform. And finally, we updated the content in the footer to help direct users to the content they are most likely looking for.
Future plans include incorporating our header and footer content more consistently across the repository platforms along with bringing a more unified look and feel to interface components.
Check out the new design and let us know what you think!
Header Image: Collection of extinct and extant turtle skull microCT scans in MorphoSource: bit.ly/3DFossilTurtles
MorphoSource (www.morphosource.org) is a publicly accessible repository for 3D research data, especially data that represents biological specimens. Developers in Evolutionary Anthropology and the Library’s Software Services department have been working to rebuild the application, improving upon the current site’s technology and features. An important part of this rebuild is implementing a more robust data model that will let our users efficiently discover, curate, disseminate, and preserve their data.
A typical deposit in MorphoSource is a file or files that represent a scan of all or part of an organism – such as a bone, tooth, or entire animal. The files may be a mesh or series of images produced through a CT scan. In order to collect all the information necessary to understand the files, the specimen that the files represent, and the processes that created the data, the improved site will guide the researcher in providing additional context for their deposit at the same time that they upload their files. The following describes what kind of metadata the depositor can expect to provide as part of the submission process.
The first step is to determine whether the researcher’s current deposit is derived in some way from data that is already in MorphoSource, or if the depositor would like to also submit those files and metadata. For example, they may be depositing a mesh file that was created from original photographs that are already available through the site. By including links to the raw data in the repository, users can reprocess the files if needed, or run different processes in the future.
MorphoSource collects metadata to provide context for 3D data in the repository
Next, the researcher is asked to identify or describe the biological specimen that was imaged to create their data, either by entering the information themselves or importing it from another site like iDigBio. Metadata entered at this stage includes the information about the institution that owns the specimen, a taxonomy for the specimen, and additional identifying information such as the institution’s collection or catalog number. When the depositor fills in these fields, other users will be able to search for and compare data sets for the same specimen or species.
Moving on from the description of the organism, the depositor then provides information about the device that was used to image the specimen, either by selecting a device that is already in the repository’s database, or by creating a new record, including the manufacturer, model, and modality (MRI, photography, laser scan, etc.) of the device.
Once they have described the specimen and device used for imaging, the depositor then enters metadata about the imaging event itself, such as the technician who did the imaging, the date, and the software used.
With the imaging of the specimen described, the depositor then enters data about any processing that was done to create the files being deposited, including who was responsible, what software was used, and what the process was – for example, creating a mesh or point cloud from photographs. This metadata is important in case there is a need to reprocess the data in the future.
Finally, the researcher completes their deposit by uploading the files themselves. While some technical metadata is extracted automatically, MorphoSource will rely on data depositors to provide other information that is helpful for display, such as the orientation of the scan, or to identify the files, like an external id number. This technical metadata is important for long term preservation of the data sets.
Screen capture of example media page in MorphoSource
The submission process asks the researcher to enter quite a bit of metadata, but the payoff is that users viewing the data on MorphoSource understand what it represents, how it was created, and how it relates to other data in the repository. It becomes easy to discover other media files representing the same specimen or species, or to explore other items from the institution’s or researcher’s collections.
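The deposit metadata gathered in the steps above can be pictured as a small data model: a specimen, an imaging device, an imaging event, any processing that produced the files, and the files themselves. The sketch below is a loose illustration using Python dataclasses, not MorphoSource’s actual schema; every field name and sample value is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Specimen:
    institution: str          # who owns the physical specimen
    taxonomy: str
    catalog_number: str

@dataclass
class Device:
    manufacturer: str
    model: str
    modality: str             # e.g. "CT", "MRI", "photography", "laser scan"

@dataclass
class ImagingEvent:
    technician: str
    date: str
    software: str

@dataclass
class ProcessingEvent:
    responsible: str
    software: str
    description: str          # e.g. "mesh created from CT image series"

@dataclass
class Deposit:
    specimen: Specimen
    device: Device
    imaging: ImagingEvent
    processing: list = field(default_factory=list)  # ProcessingEvent items
    files: list = field(default_factory=list)       # uploaded file names
    derived_from: Optional[str] = None  # raw data already in the repository

# Sample values are invented for illustration.
deposit = Deposit(
    specimen=Specimen("Example Museum", "Testudines", "EM-12345"),
    device=Device("ExampleCorp", "Scanner 9000", "CT"),
    imaging=ImagingEvent("A. Technician", "2019-06-01", "scanner software"),
)
deposit.processing.append(
    ProcessingEvent("A. Researcher", "MeshLab", "mesh created from CT image series"))
deposit.files.append("turtle_skull_mesh.ply")
```

Modeling each step as its own record is what lets users later search across deposits by specimen, species, device, or process.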
Archival collection guides—also known as finding aids—are a critical part of the researcher experience when finding and accessing materials from the David M. Rubenstein Rare Book & Manuscript Library and the Duke University Archives. At present, we have guides for nearly 4,000 collections with upwards of one million components that have some level of description. Our collection guides site is visited by researchers about 400 times per day.
An example collection guide.
In 2020, we’ll be making significant changes to our systems supporting archival discovery and access. The main impetus for this shift is that our current platform has grown outdated and is no longer sustainable going forward. We intend to replace our platform with ArcLight, open source software backed by a community of peer institutions.
Finding Aids at Duke: Innovations Past
At Duke, we’re no strangers to pushing the boundaries of archival discovery through advances in technology. Way back in the mid-1990s, Duke was among the pioneers rendering SGML-encoded finding aids into HTML. For most of the 90s and aughts we used a commercial platform, but in 2007 we decided to develop our own homegrown finding aids front-end (using the Apache Cocoon framework). We then replaced it in 2012 with another in-house platform built on the Django web framework.
Since going homegrown in 2007, we have found some key opportunities to innovate within our platforms. Here are a few examples:
2013. Added inline digital object viewing, as part of a consortium-wide collaborative foray into large-scale manuscript digitization.
2014. Crosswalked metadata using Schema.org markup and used different Google APIs to power rich snippet search results (since deprecated).
Example archival component with inline embedded digital object and sticky navigation.
So, Why Migrate Now?
Our current platform was pretty good for its time, but a lot has changed in eight years. The way we build web applications today is much different than it used to be. And beyond desiring a modern toolset, there are major concerns going forward around size, search/indexing, and support.
Size
We have some enormous finding aids. And we have added more big ones over the years. This causes problems of scale, particularly with an interface like ours that renders each collection as a single web page with all of the text of its contents written in the markup. One of our finding aids contains over 21,000 components; all told it is 9MB of raw EAD transformed into 15MB of HTML.
A large finding aid — 15MB of HTML in a single page.
No amount of caching or server wizardry can change the fact that this is simply too much data to be delivered and rendered in a single webpage, especially for researchers in lower-bandwidth conditions. We need a solution that divides the data for any given finding aid into smaller payloads.
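The payload-splitting idea can be shown with a toy example: rather than render all 21,000 components in one document, divide them into fixed-size pages and deliver one slice per request. This is only an illustration of the principle, not the platform’s code; ArcLight addresses the problem by indexing each component as its own document with its own page.

```python
def paginate(components, per_page=100):
    """Split a flat list of finding-aid components into fixed-size pages."""
    return [components[i:i + per_page]
            for i in range(0, len(components), per_page)]

# 21,000 components in one page was the problem; 210 pages of 100 is tractable.
components = [f"component-{n}" for n in range(21000)]
pages = paginate(components, per_page=100)
print(len(pages))   # 210
```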
Search
Google Custom Search does a pretty nice job of relevance ranking and highlighting where in a finding aid a term matches (after all, that’s Google’s bread-and-butter). However, when used to power search in an application like this, it has some serious limitations. It only returns a maximum of one hundred results per query. Google doesn’t index 100% of the text, especially for our larger finding aids. And some finding aids are just mysteriously omitted despite our best efforts optimizing our markup for SEO and providing a sitemap.
Search Results powered by Google Custom Search
We need search functionality where we have complete control of what gets indexed, when, and how. And we need assurance that the entirety of the materials described will be discoverable.
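The contrast with an outsourced search service can be illustrated with a toy inverted index: when we build the index ourselves, every component’s text goes in, and a query returns every match rather than a capped or sampled subset. The real system would use Solr; this pure-Python sketch only demonstrates the control we gain.

```python
def build_index(docs):
    """Map each term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index

def search(index, term):
    """Return every matching document -- no 100-result cap, nothing omitted."""
    return sorted(index.get(term.lower(), set()))

# Invented component text, for illustration only.
docs = {
    "c001": "Correspondence with family 1890 to 1910",
    "c002": "Photographs of correspondence undated",
}
index = build_index(docs)
print(search(index, "Correspondence"))   # ['c001', 'c002']
```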
Support
This is a familiar story. Homegrown applications used for several years by organizations with a small number of developers and a large number of projects to support become difficult to sustain over time. We have only one developer remaining who can fix our finding aids platform when it breaks, or prevent it from breaking when the systems around it change. Many of the software components powering the system are at or nearing end-of-life and they can’t be easily upgraded.
Where to Go From Here?
It has been clear for a while that we would soon need a new platform for finding aids, but not as clear what platform we should pursue. We had been eyeing the progress of two promising open source community-built solutions emerging from our peer institutions: the ArchivesSpace Public UI (PUI), and ArcLight.
Over 2018-19, my colleague Noah Huffman and I co-led a project to install pilot instances of the ASpace PUI and ArcLight, index all of our finding aids in them, and then evaluate the platforms for their suitability to meet Duke’s needs going forward. The project involved gathering feedback from Duke archivists, curators, research services staff, and our digital collections implementation team. We looked at six criteria: 1) features; 2) ease of migration/customization; 3) integration with other systems; 4) data cleanup considerations; 5) impact on existing workflows; 6) sustainability/maintenance.
Comparison of the ASpace PUI and ArcLight, out-of-the-box UI.
There’s a lot to like about both the ASpace PUI and ArcLight. Feature-wise, they’re fairly comparable. Both are backed by a community of talented, respected peers, and either would be a suitable foundation for a usable, accessible interface to archives. In the end, we recommended that Duke pursue ArcLight, in large part due to its similarity to so much of the other software in our IT portfolio.
Duke is certainly not alone in our desire to replace an outdated, unsustainable homegrown finding aids platform, nor in our intention to use ArcLight as a replacement.
This fall, with tremendous leadership from Stanford University Libraries, five universities collaborated on developing the ArcLight software further to address shared needs. Over a nine-week work cycle from August to October, we had the good fortune of working alongside Stanford, Princeton, Michigan, and Indiana. The team addressed needs on several fronts, especially usability, accessibility, indexing, context/navigation, and integrations.
Duke joined Stanford, Princeton, Michigan, and Indiana for ArcLight Community Work Cycle II in fall 2019.
Three Duke staff members participated: I was a member of the Development Team, Noah Huffman a member of the Product Owners Team, and Will Sexton a member of the Steering Group.
The work cycle is complete and you can try out the current state of the core ArcLight demo application. It includes several finding aids from each of the participating partner institutions. Here are just a few highlights that have us excited about bringing ArcLight to Duke:
Search results can be grouped by collection. Faceted navigation helps pinpoint items of interest from deep within a finding aid.
Components are individually discoverable and have their own pages. Integrations with online content viewers and request systems such as Aeon are possible.
Here’s a final demo video (37 min) that nicely summarizes the work completed in the fall 2019 work cycle.
Lighting the Way
With some serious momentum from the fall ArcLight work cycle and plans taking shape to implement the software in 2020, the Duke Libraries intend to participate in the Stanford-led, IMLS grant-funded Lighting the Way project, a platform-agnostic National Forum on Archival Discovery and Delivery. Per the project website:
Lighting the Way is a year-long project led by Stanford University Libraries running from September 2019-August 2020 focused on convening a series of meetings focused on improving discovery and delivery for archives and special collections.
Coming in 2020: ArcLight Implementation at Duke
There’ll be much more to share about this in the new year, but we are gearing up now for a 2020 ArcLight launch at Duke. As good as the platform is now out-of-the-box, we’ll have to do additional development to address some local needs, including:
Duke branding
An efficient preview/publication workflow
Digital object viewing / repository integration
Sitemap generation
Some data cleanup
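Of the items above, sitemap generation is the easiest to sketch: emit one loc entry per collection (and, with ArcLight, per component) page so that search engines can crawl everything we publish. The URL below is hypothetical.

```python
from xml.sax.saxutils import escape

def sitemap(urls):
    """Render a minimal XML sitemap for the given page URLs."""
    entries = "\n".join(
        f"  <url><loc>{escape(u)}</loc></url>" for u in urls)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            f"{entries}\n"
            "</urlset>")

print(sitemap(["https://archives.example.edu/catalog/collection-1"]))
```

In practice the URL list would be generated from the Solr index, which is exactly the kind of complete-coverage guarantee we lacked with Google Custom Search.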
Building these local customizations will be time well-spent. We’ll also look for more opportunities to collaborate with peers and contribute code back to the community. The future looks bright for Duke with ArcLight lighting the way.
Notes from the Duke University Libraries Digital Projects Team