All posts by Will Sexton

A Statement of Commitment

The featured image is from a mockup of a new repositories home page that we’re working on in the Libraries, planned for rollout in January of 2020.

Working at the Libraries, it can be dizzying to think about all of our commitments.

There’s what we owe our patrons, a body of so many distinct and overlapping communities, all seeking to learn and discover, that we could split the library along an infinite number of lines to meet them where they work and think.

There’s what we owe the future, in our efforts to preserve and share the artifacts of knowledge that we acquire on the market, that scholars create on our own campus, or that seem to form from history and find us somehow.

There’s what we owe the field, and the network of peer libraries that serve their own communities, each of them linked in a web of scholarship with our own. Within our professional network, we seek to support and complement one another, to compete sometimes in ways that move our field forward, and to share what we learn from our experiences.

The needs of information technology underlie nearly all of these activities, and to meet those needs, we have an IT staff that’s modest in size, but prodigious in its skill and its dedication to the mission of the Libraries. Within that group, the responsibility for creating new software, and maintaining what we have, falls to a small team of developers and devops engineers. We depend on them to enhance and support a wide range of platforms, including our web services, our discovery platforms, and our digital repositories.

This fall, we did some reflection on how we want to approach support for our repository platforms. The result of that reflection was a Statement of Commitment to Repositories Support and Development, a document of roughly a page that expresses what we consider to be our values in this area, and the context of priorities in which we do that work.

The committee that created the statement was our Digital Preservation and Publishing Program, or DP3 as we call it in house. We summarized our values as “openness, community and peer engagement, and independence from vended platforms,” which have “guided us to build our repositories on open source software platforms.” We place that work within the context of very large, looming priorities like our transition to FOLIO as our Library Services Platform, and the project to renovate Lilly Library. There are others, not mentioned in the statement, that fill the pages of this blog.

The statement is explicit that we will not seek to find alternative platforms for our repository services in the next several years, and in particular while the FOLIO transition is underway. This decision is informed by our recognition that migration of content and services across platforms is complex and expensive. It’s also a recognition that we have invested a lot into these existing platforms, and we want to carve out as much space as we can for our talented staff to focus on maintaining and improving them, rather than locking ourselves into all-consuming cycles of content migration.

From a practical perspective, and speaking as the manager who oversees software development in the Libraries, I see this statement as part of an overall strategy to bring focus to our work. It’s a small but important symbolic measure that recognizes the drag that we create for our software team when we give in to our urge to prioritize everything.

The phrase “context switching” is one that we have borrowed from the parlance of operating systems to describe the effects on a developer of working on multiple projects at once. There are real costs to moving between development environments, code bases, and architectures on the same day, in the same week, during the same sprint, or within even an extended work cycle. We also call this problem “multi-tasking,” and the penalty it imposes on performance is well documented.

Even more than performance, I think of it as a quality of life concern. People are generally happier and more invested when they’re able to do quality work. As a manager, I can work with scheduling and planning to try to mitigate those effects of multitasking on our team. But the responsibility really lies with the organization. We have our commitments, and they are vast in size and scope. We owe it to ourselves to do some introspection now and again, and ask what we can realistically do with what we have, or more accurately, who we are.

Managing impermanence – migration of the Libraries’ digital exhibits

Post contributed by Claire Cahoon, student in the master’s program at the School of Information and Library Science, UNC-Chapel Hill.

This summer I worked as a field experience student in the Software Services department migrating digital exhibits into Omeka 2, Duke’s most current platform. The ultimate goal was to start and document the process of moving exhibits from legacy platforms into Omeka 2.

The reasoning behind the project became clear as we started creating an index of all of the digital exhibits on display in the exhibits website. Out of 97 total exhibits, there were varying degrees of functionality, from the most recent and up-to-date exhibits, to sites with broken links and pages where only text would display, leaving out crucial images. Centralizing these into a single platform should make it easier to create, support, and maintain all of these exhibits.

Screenshot of the sidebar of an exhibit, showing the link to the previous version of the exhibit in the Internet Archive

I found exhibits in Omeka 1, Cascade, Scriptorium, JAlbum, and even found a few mystery platforms that we never identified. Since it was the largest, we decided to work on the Omeka 1 group over the summer, and this week I finished migrating all 34 exhibits – that means that after a few adjustments to make the new exhibits available, Omeka 1 can be shut off!

We worked with Meg Brown, Exhibits Coordinator for the Libraries, and the exhibits department to figure out how each exhibit needed to be represented. Since we were managing expectations from lots of different stakeholders, we landed on the idea of including a link to the archived version of each exhibit in the Wayback Machine, in case the look and feel of the new exhibits is limiting for anyone used to Omeka 1.
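As a rough illustration of how such archive links can be put together, here is a minimal Python sketch. It assumes the Wayback Machine's public URL pattern of `https://web.archive.org/web/<timestamp>/<original URL>`; the function name and the exhibit address are hypothetical, and this is not the code we used in the migration.

```python
from urllib.parse import urlsplit

WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_link(original_url: str, timestamp: str = "") -> str:
    """Build a Wayback Machine link for a page.

    `timestamp` is an optional run of up to 14 digits (YYYYMMDDhhmmss).
    The Wayback Machine redirects to the capture closest to that moment;
    with no timestamp it resolves to a recent capture.
    """
    parts = urlsplit(original_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {original_url!r}")
    if timestamp:
        if not (timestamp.isascii() and timestamp.isdigit() and len(timestamp) <= 14):
            raise ValueError(f"timestamp must be 1 to 14 digits: {timestamp!r}")
        return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"
    return f"{WAYBACK_PREFIX}/{original_url}"


# Hypothetical exhibit address, used only for illustration.
link = wayback_link("http://example.org/exhibits/show/sample", "2019")
print(link)
```

Passing only a year, as above, asks for the capture nearest to that year, which is convenient when the exact date of the last good snapshot is unknown.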

Working with the Internet Archive links and sorting through broken pieces of these exhibits really put into perspective how impermanent the internet is, even for seemingly static information. Without much maintenance, these exhibits lost some of the core content when video links changed, references were lost, and even the most well-written custom code stopped working. I hope that my work this summer will help keep these exhibit materials in working order while also eliminating the need to continue supporting Omeka 1.

While migrating, I came across a few favorite exhibits and items that combined interesting content and some updated features in Omeka 2:

Cover of “Anxious homes: cursory-cleaning for the imminent arrival of visitors or how to give the impression of a clean house in under 20 minutes” by Jackie Batey. Available in the Rubenstein Library: N7433.4.B38 A59 2006

Book + Art: Artists’ books from the Sallie Bingham Center for Women’s History and Culture (and the old version of Book + Art)

John Hope Franklin: Imprint of an American Scholar (and the old version of the John Hope Franklin exhibit)

Cheap Thrills: The Highs and Lows of Paris’s Cabaret Culture (and the old version of Cheap Thrills)

Medicology, or, Home encyclopedia of health: a complete family guide… Vol. I, by Joseph Gibbons Richardson (1904). Available in the Rubenstein Library: RC81 .R52 1904

Animated Anatomies: The Human Body in Anatomical Texts from the 16th to 21st Centuries (and the old version of Animated Anatomies)

Omeka still has some quirks to work out, and the accessibility of the pages and the metadata display are still in the works. However, migrating these exhibits into Omeka 2 will make them much easier to support and improve. Thanks to the team that worked with me and taught me so much this summer: Will Sexton, Michael Daul, and Meg Brown!

News Feeds, Microfilm, and the Stories We Tell Ourselves

A little over a week ago, I watched the searing and provocative TED talk by British journalist Carole Cadwalladr, “Facebook’s role in Brexit – and the threat to democracy.” It got me thinking about a few library things, which I thought might make for an interesting blog post. Then thinking about these library things took me down a series of rabbit holes, interconnecting and nuanced and compelling enough to chew up the entirety of the time I’d set aside for my turn in the Bitstreams blog rotation. There is no breezy, concise blog post that could pull them all together, so I’m just going to do what I can with two of the maybe four or five rabbit holes that I fell into.

Cadwalladr took the stage at a TED conference sponsored by Facebook and Google, and spoke about her investigations into the role of Facebook and Cambridge Analytica in the Brexit vote in 2016. Addressing the big tech leaders present – the “Gods of Silicon Valley: Mark Zuckerberg, Sheryl Sandberg, Larry Page, Sergey Brin and Jack Dorsey” – she levelled a devastating j’accuse – “[W]hat the Brexit vote demonstrates is that liberal democracy is broken. And you broke it. This is not democracy — spreading lies in darkness, paid for with illegal cash, from God knows where. It’s subversion, and you are accessories to it.”

It was a courageous act, and Cadwalladr deserves celebration and recognition for it, even if the place it leaves us is a bleak one. As she would admit later, she felt massive pressure as she spoke. I had a number of reactions to her talk, but one line in particular got me thinking about library things. It occurred when she explained to that audience that “this entire referendum took place in darkness, because it took place on Facebook…, because only you see your news feed, and then it vanishes, so it’s impossible to research anything.” It provoked me to think about how we use “news feeds” – in the form of newspapers themselves – in the study of history, and the role that libraries play in preserving them.


Community and Collaboration at Samvera Connect 2018

One of the pleasures of working in an academic library is the opportunity it presents for engagement with communities in our field of work. One such community, of which Duke University Libraries has been a member for some time now, is Samvera, an open-source community for software development that supports digital repositories. I, along with my colleagues Jim Coble, Moira Downey, and Ayse Durmaz, recently attended the Samvera Connect conference in Salt Lake City, and this post is a report on our experience there.

It was my first time attending Samvera Connect, and so it was a chance for me to put faces with names that I had come to know from discussions on Slack and elsewhere. Moira and I participated in a panel with some of our colleagues from the University of Michigan and Indiana University, and it was great to have the opportunity to meet them in person and talk about our work on digital repositories. We spoke on the theme of using the Hyrax platform for research data; you can see our slides here. Moira and I also had a poster on the same theme.

I attended the meetup of the Samvera Interest Group for Advising the Hyrax Roadmap, or SIGAHR, as it is known. There was some introspection in the group about the suitability of the acronym, though it produced no resolution one way or another. Much of the conversation in that meeting focused on support and developer resources for the Hyrax platform. It’s one of the central questions for an open source community like Samvera, and one we’re giving some consideration at Duke after returning from the meeting.

Otherwise, there were several interesting presentations that I attended and would highlight. First, the team from the WGBH Media Library did a presentation titled “Building on Hyrax and Avalon for the American Archive of Public Broadcasting” that I enjoyed a lot. That team has great energy and has developed some interesting solutions for a complex and compelling project.

I also learned much at the workshop titled “Managing Samvera-based Projects & Services,” which was conducted by Hannah Frost, Nabeela Jaffer, and Steve Van Tuyl. Thinking in terms of an extended community requires a different mindset from the way we work locally and on our campuses.

Finally, one of the most interesting presentations came from Hannah Frost and Christina Harlow from Stanford Libraries, outlining the new architecture they have developed for the next iteration of the Stanford Digital Library. It was titled “Making TACOs for Hydras,” and the slides are not available, but much of what they covered is included in the GitHub documentation here.

I’ll conclude there, and share the following sections, which were authored by two of my colleagues at Duke.

Valkyrie and Hyrax (contributed by Jim Coble)

A focus of attention at this year’s Samvera Connect was Valkyrie, a project which enables the use of multiple backends for storing files and metadata in Samvera applications.  Historically, Hydra/Samvera applications have had only one option for file and metadata storage; namely, a Fedora repository. Recent versions of Fedora have experienced performance problems in certain circumstances, leading the community to look for different options for storing files and metadata where performance is a key requirement.  Valkyrie allows a project to pick and choose among multiple backends depending on the needs of the project. Projects can still use a Fedora repository for storage if that is desired but also have the option of using a Postgres database or Solr for metadata storage and/or a disk filesystem for file storage. Other metadata and file storage adapters are under development to provide Valkyrie with even more options.
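The idea of swappable backends can be sketched in a few lines. The example below is written in Python purely for illustration and is not Valkyrie's API (Valkyrie is a Ruby library); the class and function names are hypothetical. It shows application code that depends only on a small storage interface, with an in-memory adapter and a SQLite adapter standing in for the different metadata backends a project might choose.

```python
import json
import sqlite3
from typing import Protocol


class MetadataAdapter(Protocol):
    """Hypothetical interface: anything that can save and find metadata."""

    def save(self, resource_id: str, metadata: dict) -> None: ...

    def find(self, resource_id: str) -> dict: ...


class MemoryAdapter:
    """Keeps metadata in a plain dictionary; handy for tests."""

    def __init__(self) -> None:
        self._records = {}

    def save(self, resource_id: str, metadata: dict) -> None:
        self._records[resource_id] = dict(metadata)

    def find(self, resource_id: str) -> dict:
        return dict(self._records[resource_id])


class SqliteAdapter:
    """Stores metadata as JSON in a relational table (stand-in for a database backend)."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS resources (id TEXT PRIMARY KEY, metadata TEXT NOT NULL)"
        )

    def save(self, resource_id: str, metadata: dict) -> None:
        self._db.execute(
            "INSERT OR REPLACE INTO resources (id, metadata) VALUES (?, ?)",
            (resource_id, json.dumps(metadata)),
        )
        self._db.commit()

    def find(self, resource_id: str) -> dict:
        row = self._db.execute(
            "SELECT metadata FROM resources WHERE id = ?", (resource_id,)
        ).fetchone()
        if row is None:
            raise KeyError(resource_id)
        return json.loads(row[0])


def title_of(adapter: MetadataAdapter, resource_id: str) -> str:
    """Application code sees only the interface, never the backend."""
    return adapter.find(resource_id)["title"]


for adapter in (MemoryAdapter(), SqliteAdapter()):
    adapter.save("rd-1", {"title": "Sample dataset"})
    print(type(adapter).__name__, title_of(adapter, "rd-1"))
```

Because `title_of` never names a concrete backend, switching storage is a matter of constructing a different adapter, which is the kind of flexibility described above.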

Discussions at the conference favored moving forward to convert Hyrax (a key Samvera project) to use Valkyrie and we’ll likely see work happening on that soon.  Our Research Data Repository is based on Hyrax, so the eventual Valkyrization of Hyrax would provide us with additional storage options for the files and metadata in that repository (which currently uses Fedora 4).  Valkyrie may also be a component in a future migration of the legacy Duke Digital Repository, enabling us to move it off the no-longer-supported Fedora 3 version.

Discoverability of Research Data (contributed by Moira Downey)

In addition to the back-end infrastructure, another growing area of interest around our Hyrax-based Research Data Repository has been increased visibility and discoverability of the content that we publish and preserve through our software applications. New services like Google’s Dataset Search are making it easier for scholars and researchers to find the data they need to support their scholarly endeavors. As institutions responsible for the publication of these data, we want to ensure that the scholarship our repositories are hosting is indexed by these services, heightening its visibility, and hopefully, its usability. Over a lunchtime breakout session, the Repository Management Interest Group compiled a list of services similar to Google Dataset Search in nature (Google Scholar, Unpaywall.org, Crossref, DataCite, and SHARE, among others) that we intend to investigate further, with a particular eye toward how our existing repositories are integrated with these services and where we might improve. The group also intends to consider what local practices we might implement to optimize the discoverability of our content, and what changes to the code base we should advocate for in order to connect our content to the web at large.

 

DDR-RD: Previewing DUL’s new platform for research data

While we sometimes talk about “the repository” as if it were a monolith at Duke University Libraries, we have in fact developed and maintained two core platforms that function as repository applications. I’ll describe them briefly, then preview a third that is in development, as well as the rationale behind expanding in this way.


Living Our Best DSpace Lives

Last week, an indefatigable team at Duke University Libraries released an upgraded version of the DukeSpace platform, completing  the first phase of the critical project that I wrote about in this space in January.  One member of the team remarked that we now surely have “one of the best DSpaces in the world,” and I dare anyone to prove otherwise.

DukeSpace serves as the Libraries’ open-access institutional repository, which makes it a key aspect of our mission to “partner in research,” as outlined in our strategic plan.  As I wrote in January, the version of the DSpace platform that underlies the service had been stuck at 1.7, which was released during 2010 – the year the iPad came out, and Lady Gaga wore a meat dress. We upgraded to version 6.2, though the differences between the two versions are so great that it would be more accurate to call the project a migration.

That migration turned out to be one of the more complex technology projects we’ve undertaken over the years. The main complicating factor was the integration with Symplectic Elements, the Research Information Management System (RIMS) that powers the Scholars at Duke site. As far as we know, we are the first institution to integrate Elements with DSpace 6.2. It was a beast to do, and we are happy to share our knowledge gained if it will help any of our peers out there trying to do the same thing.

Meanwhile, feel free to click on over to DukeSpace and enjoy one of the best DSpaces in the world. And congratulations to one of the mightiest teams assembled since Spain won the World Cup!

Upgrading DukeSpace

The year 2006 was charged with epoch-defining events: Zidane head-butted Materazzi, the astronomers downgraded Pluto, Google bought YouTube, and Duke University Libraries rolled out DukeSpace (PDF). Built on the DSpace platform, DukeSpace has served as our institutional repository for almost a dozen years now, providing access for electronic theses and dissertations and Duke faculty publications.

While the landscape of open access has changed much over the intervening period, we can’t really say the same about the underlying platform of DukeSpace.

At Duke, faculty approved an open access policy in March of 2010; DSpace 1.6 had been released a few weeks earlier. By the end of the year it had moved ahead a dot release to 1.7. Along the way, we did some customization to integrate with Symplectic Elements – the Research Information Management System (RIMS) that powers the Scholars@Duke site. That work essentially locked us into that version of DSpace, which remains in operation even though its final release came in July 2013 and it reached its end of life four years ago.

Animated GIF of Zinedine Zidane head-butting an opponent in the final game of the 2006 FIFA World Cup.
If only I had the skills to photoshop DSpace 6.2 in for Zidane, and 1.7 for Materazzi. GIF from Something Awful.

Beginning last November, we committed to a full upgrade of the DukeSpace platform to the current version (6.2 as of this writing). We had considered alternatives, including replacing the platform with Hyrax, but concluded that that approach would be too complex.

So we are currently coordinating work across a technology team and the Libraries’ open access group. Some of the concerns that we have encountered include:

  • Integrating with updated versions of Symplectic Elements. That same integration that locked us into a version years ago lies at the center of this upgrade. We have basically been handling this process as a separate thread of the larger project. It will be critical for us to maintain the currency of this dependency with subsequent upgrades to both products.
  • Rethinking metadata architecture. The conceptual basis of the institutional repository is greatly informed by the definition and use of metadata. Our Metadata Architect, Maggie Dickson, mentioned this area in her “Metadata Year-in-Review” post back in December. She highlighted the need to make “real headway tackling the problem of identity management – leveraging unique identifiers for people (ORCIDs, for example), rather than relying on name strings, which is inherently error prone.” Many other questions have arisen in this area, requiring extensive and ongoing discussion and coordination between the tech team and the stakeholders.
  • Migration of legacy stats data. How do we migrate usage stats between two versions of a platform so remote from each other in time? It has taken some trial-and-error to solve this one.
  • Replicating or enhancing existing workflows. Again, when two versions of a system are so different that an upgrade seems more like a platform migration, and our infrastructure and staffing have changed over the years, how do we reproduce existing workflows without disrupting them? What opportunities can we take to improve on them without destabilizing the project? Aside from the integration with Elements, we also have the important workflow related to the ingest of electronic theses & dissertations, which employs both self-deposit and file transfer from ProQuest. Re-envisioning and re-implementing workflows such as these takes careful analysis and planning.
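To make the identity-management concern in the metadata item above concrete, here is a small Python sketch of why an identifier is a steadier key than a name string. The records are invented for illustration, using the ORCID iD that ORCID publishes for its fictitious sample researcher, Josiah Carberry. The check-digit routine follows the ISO 7064 MOD 11-2 scheme that ORCID documents for its identifiers; the function names are our own and nothing here reflects how DukeSpace itself handles identities.

```python
from collections import defaultdict


def orcid_check_digit(base_digits: str) -> str:
    """Compute the final character of an ORCID iD from its first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the length, digits, and check character of a hyphenated ORCID iD."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15].upper()


# Invented records: one person, three spellings of the name.
records = [
    {"name": "Carberry, Josiah", "orcid": "0000-0002-1825-0097"},
    {"name": "J. Carberry", "orcid": "0000-0002-1825-0097"},
    {"name": "Josiah S. Carberry", "orcid": "0000-0002-1825-0097"},
]

by_name = defaultdict(list)
by_orcid = defaultdict(list)
for record in records:
    by_name[record["name"]].append(record)
    if is_valid_orcid(record["orcid"]):
        by_orcid[record["orcid"]].append(record)

print(len(by_name), "apparent authors when grouping by name string")
print(len(by_orcid), "author when grouping by ORCID iD")
```

Grouping by name splits one person into several apparent authors, while grouping by a validated identifier keeps the records together.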

While we have run into a few complicating issues during the process so far, we feel confident that we remain on track to roll out the upgraded version during the first quarter of 2018. Pluto remains a dwarf planet, Zidane manages Real Madrid (for now), and to Mark Cuban’s apparent distress, Google still owns YouTube. Soon our own story from 2006 should reach a kind of resolution.

Photograph of the surface of Pluto, taken by the New Horizons spacecraft.
“Pluto’s Majestic Mountains, Frozen Plains and Foggy Hazes” – Image from NASA. Credits: NASA/JHUAPL/SwRI.

 

Squirlicorn, spirit guide of the digital repository: Four things you should know

One thing I’ve learned on my life’s journey is the importance of knowing your spirit guide.

That’s why, by far the most important point that I made in a talk at the TRLN Annual Meeting in July is that the spirit guide of the digital repository movement is the squirlicorn.


Rethinking Repositories at CNI Spring ’17

One of the main areas of emphasis for the CNI Spring 2017 meeting was “new strategies and approaches for institutional repositories (IR).” A few of us at UNC and Duke decided to plug into the zeitgeist by proposing a panel to reflect on some of the ways that we have been rethinking – or even just thinking about – our repositories.


A New Home Page for the Duke Digital Repository

Today is an eventful day for the Duke Digital Repository (DDR). Later today, I and several of my colleagues will present on the DDR at Day 1 of the Duke Research Computing Symposium. We’ll be introducing new staff who’ll focus on managing, curating, and preserving research data, as well as the role that the DDR will play as both a service and a platform. This event serves as a soft launch of our plans – which I wrote about last September – to support the work of researchers at Duke.

Out-of-the-box DDR home page of the past

At the same time, the DDR gets a new look, at least on its home page. For years, we’ve used a rather drab and uninformative page that was essentially the out-of-the-box rendering by Blacklight, our discovery and access layer in the repository stack. Last fall, our DDR Program Committee took up the task of revamping that page to reflect how we conceptualize the repository and its major program areas.

New DDR home page with aerial hero image and three program areas.

The page design will evolve with the DDR itself, but it went live earlier today. More information about the DDR initiative and our plans will follow in the coming months.