One of the pleasures of working in an academic library is the opportunity it presents for engagement with communities in our field of work. One such community, of which Duke University Libraries has been a member for some time now, is Samvera, an open-source community that develops software to support digital repositories. I, along with my colleagues Jim Coble, Moira Downey, and Ayse Durmaz, recently attended the Samvera Connect conference in Salt Lake City, and this post is a report on our experience there.
It was my first time attending Samvera Connect, and so it was a chance for me to put faces with names that I had come to know from discussions on Slack and elsewhere. Moira and I participated in a panel with some of our colleagues from the University of Michigan and Indiana University, and it was great to have the opportunity to meet them in person and talk about our work on digital repositories. We spoke on the theme of using the Hyrax platform for research data; you can see our slides here. Moira and I also had a poster on the same theme.
I attended the meetup of the Samvera Interest Group for Advising the Hyrax Roadmap, or SIGAHR, as it is known. There was some introspection in the group about the suitability of the acronym, though it produced no resolution one way or another. Much of the conversation in that meeting focused on support and developer resources for the Hyrax platform. It’s one of the central questions for an open source community like Samvera, and one we’re giving some consideration at Duke after returning from the meeting.
Otherwise, there were several interesting presentations that I attended and would highlight. First, the team from the WGBH Media Library did a presentation titled “Building on Hyrax and Avalon for the American Archive of Public Broadcasting” that I enjoyed a lot. That team has great energy and has developed some interesting solutions for a complex and compelling project.
I also learned much at the workshop titled “Managing Samvera-based Projects & Services,” which was conducted by Hannah Frost, Nabeela Jaffer, and Steve Van Tuyl. Thinking in terms of an extended community requires a different mindset from the way we work locally and on our campuses.
Finally, one of the most interesting presentations came from Hannah Frost and Christina Harlow from Stanford Libraries, outlining the new architecture they have developed for the next iteration of the Stanford Digital Library. It was titled “Making TACOs for Hydras.” The slides are not available, but much of what they covered is included in the GitHub documentation here.
I’ll conclude there and note that the following sections were authored by two of my colleagues at Duke.
Valkyrie and Hyrax (contributed by Jim Coble)
A focus of attention at this year’s Samvera Connect was Valkyrie, a project which enables the use of multiple backends for storing files and metadata in Samvera applications. Historically, Hydra/Samvera applications have had only one option for file and metadata storage; namely, a Fedora repository. Recent versions of Fedora have experienced performance problems in certain circumstances, leading the community to look for different options for storing files and metadata where performance is a key requirement. Valkyrie allows a project to pick and choose among multiple backends depending on the needs of the project. Projects can still use a Fedora repository for storage if that is desired but also have the option of using a Postgres database or Solr for metadata storage and/or a disk filesystem for file storage. Other metadata and file storage adapters are under development to provide Valkyrie with even more options.
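To make the adapter idea concrete, here is a minimal sketch in Python. It is not Valkyrie's actual API (Valkyrie is a Ruby library), and every class and method name below is hypothetical; it only illustrates how an application can depend on a small metadata interface and a small file-storage interface, then swap the backends behind them without changing application code.

```python
import tempfile
from pathlib import Path
from typing import Dict, Protocol, Tuple


class MetadataAdapter(Protocol):
    """Hypothetical interface that any metadata backend must satisfy."""

    def save(self, resource_id: str, metadata: dict) -> None: ...
    def find(self, resource_id: str) -> dict: ...


class StorageAdapter(Protocol):
    """Hypothetical interface that any file backend must satisfy."""

    def upload(self, file_id: str, content: bytes) -> None: ...
    def fetch(self, file_id: str) -> bytes: ...


class InMemoryMetadataAdapter:
    """Stand-in for a database-backed or index-backed metadata store."""

    def __init__(self) -> None:
        self._records: Dict[str, dict] = {}

    def save(self, resource_id: str, metadata: dict) -> None:
        self._records[resource_id] = dict(metadata)

    def find(self, resource_id: str) -> dict:
        return dict(self._records[resource_id])


class InMemoryStorageAdapter:
    """Keeps file contents in a dictionary; handy for tests."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def upload(self, file_id: str, content: bytes) -> None:
        self._files[file_id] = content

    def fetch(self, file_id: str) -> bytes:
        return self._files[file_id]


class DiskStorageAdapter:
    """Writes each file under a root directory on the local filesystem."""

    def __init__(self, root: Path) -> None:
        self._root = Path(root)
        self._root.mkdir(parents=True, exist_ok=True)

    def upload(self, file_id: str, content: bytes) -> None:
        (self._root / file_id).write_bytes(content)

    def fetch(self, file_id: str) -> bytes:
        return (self._root / file_id).read_bytes()


class Repository:
    """Application code sees only the two interfaces, never a backend."""

    def __init__(self, metadata: MetadataAdapter, storage: StorageAdapter) -> None:
        self._metadata = metadata
        self._storage = storage

    def deposit(self, resource_id: str, metadata: dict, content: bytes) -> None:
        self._storage.upload(resource_id, content)
        self._metadata.save(resource_id, metadata)

    def retrieve(self, resource_id: str) -> Tuple[dict, bytes]:
        return self._metadata.find(resource_id), self._storage.fetch(resource_id)


# The same Repository class works with either storage backend.
memory_repo = Repository(InMemoryMetadataAdapter(), InMemoryStorageAdapter())
disk_repo = Repository(
    InMemoryMetadataAdapter(), DiskStorageAdapter(Path(tempfile.mkdtemp()))
)
```

Choosing a backend then becomes a configuration decision made where the `Repository` is constructed, which is the flexibility the paragraph above describes.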
Discussions at the conference favored converting Hyrax (a key Samvera project) to use Valkyrie, and we’ll likely see work happening on that soon. Our Research Data Repository is based on Hyrax, so the eventual Valkyrization of Hyrax would provide us with additional storage options for the files and metadata in that repository (which currently uses Fedora 4). Valkyrie may also be a component in a future migration of the legacy Duke Digital Repository, enabling us to move it off the no-longer-supported Fedora 3 version.
Discoverability of Research Data (contributed by Moira Downey)
In addition to the back-end infrastructure, another growing area of interest around our Hyrax-based Research Data Repository has been increased visibility and discoverability of the content that we publish and preserve through our software applications. New services like Google’s Dataset Search are making it easier for scholars and researchers to find the data they need to support their scholarly endeavors. As institutions responsible for the publication of these data, we want to ensure that the scholarship our repositories are hosting is indexed by these services, heightening its visibility and, we hope, its usability. Over a lunchtime breakout session, the Repository Management Interest Group compiled a list of services similar in nature to Google Dataset Search (Google Scholar, Unpaywall.org, Crossref, DataCite, and SHARE, among others) that we intend to investigate further, with a particular eye toward how our existing repositories are integrated with these services and where we might improve. The group also intends to consider what local practices we might implement to optimize the discoverability of our content, and what changes to the code base we should advocate for in order to connect our content to the web at large.
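As one illustration of the kind of local practice the group might consider: Google Dataset Search discovers datasets through schema.org `Dataset` structured data embedded in a landing page. The sketch below, in Python, builds such a JSON-LD record from a handful of fields. The function names, the example title, the example.org URL, and the DOI are all placeholders invented for illustration; this is not how Hyrax or our repository currently generates markup.

```python
import json
from typing import List, Optional


def dataset_jsonld(
    title: str,
    description: str,
    url: str,
    creators: List[str],
    doi: Optional[str] = None,
) -> str:
    """Return a schema.org Dataset description serialized as JSON-LD."""
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": title,
        "description": description,
        "url": url,
        "creator": [{"@type": "Person", "name": name} for name in creators],
    }
    if doi:
        # Express the DOI as a resolvable identifier.
        record["identifier"] = f"https://doi.org/{doi}"
    return json.dumps(record, indent=2)


def script_tag(jsonld: str) -> str:
    """Wrap JSON-LD in the script element that a landing page would embed."""
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'


# Placeholder values only; none of these refer to a real dataset.
example = dataset_jsonld(
    title="Example survey data",
    description="A placeholder description of a deposited dataset.",
    url="https://example.org/datasets/1",
    creators=["A. Researcher"],
    doi="10.1234/example",
)
print(script_tag(example))
```

A repository could emit a block like this on each dataset landing page, though which fields matter most to each indexing service is exactly the kind of question the group plans to investigate.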