All posts by Will Sexton

Behind the Scenes, MorphoSource, Projects, Technology

The Shortest Year

October 15, 2021 Will Sexton

Featured image – screenshot from the Sunset Tripod2 project charter.

Realizing that my most recent post here went up more than a year ago, I pause to reflect. What even happened over these last twelve months? Pandemic and vaccine, election and insurrection, mandates and mayhem – outside of our work bubble, October 2020 to October 2021 has been a churn of unprecedented and often dark happenings. Bitstreams, however, broadcasts from inside the bubble, where we have modeled cooperation and productivity, met many milestones, and kept our collective cool, despite working nearly 100% remotely as a team, with our stakeholders, and across organizational lines.

Last October, I wrote about Sunsetting Tripod2, a homegrown platform for our digital collections and archival finding aids that was also the final service we had running on a physical server. “Firm plans,” I said we had for the work that remained. Still, in looking toward that setting sun, I worried about “all sorts of comical and embarrassing misestimations by myself on the pages of this very blog over the years.” I was optimistic, but cautiously so, that we would banish the ghosts of Django-based systems past.

Reader, I have returned to Bitstreams to tell you that we did it. Sometime in Q1 of 2021, we said so long, farewell, adieu to Tripod2. It was a good feeling, like when you get your laundry folded, or your teeth cleaned, only better.

However, we did more in the past year than just power down exhausted old servers. What follows are a few highlights from the work of the software developers in Duke University Libraries’ Digital Strategies and Technology division, and our collaborators (whom we cannot thank or praise enough), over the past twelve months.

In November, Digital Projects Developer Sean Aery posted on Implementing ArcLight: A Reflection. The work of replacing and improving upon our implementation for the Rubenstein Library’s collection guides was one of the main components that allowed us to turn off Tripod2. We actually completed it in July of 2020, but that team earned its Q4 victory laps, including Sean’s post and a session at Blacklight Summit a few days after my own post last October.

As the new year began, the MorphoSource team rolled out version 2.0 of that platform. MorphoSource Repository Developer Jocelyn Triplett shared A Preview of MorphoSource 2 Beta in these pages on January 20. The launch took place on February 1.

After more than two years of work, we are happy to announce MorphoSource 2.0 is available! Includes color mesh and CT scan web previews, expanded metadata for non-CT modalities, and a greatly improved UI. Let us know what you think! https://t.co/D0zx9PDDE3

— MorphoSource (@MorphoSource) February 1, 2021

One project we had underway as I was writing last October was the integration of Globus, a transfer service for large datasets, into the Duke Research Data Repository. We completed that work in Q1 of 2021, prompting our colleague, Senior Research Data Management Consultant Sophia Lafferty-Hess, to post Share More Data in the Duke Research Data Repository! in a neighboring location that shares our charming cul-de-sac of library blogs.

The seventeen months since the murder of George Floyd have seen major changes in how we think and talk about race in the Libraries. We committed ourselves to the DUL Racial Justice Roadmap, a pathway for recognizing and attacking the pervasive influence of white supremacy in our society, in higher education, at Duke, in the field of librarianship, in our library, in the field of information technology, and in our own IT practices. During this time, members of our division have also participated broadly in DiversifyIT, a campus-wide group of IT professionals who seek to foster a culture of inclusion “by providing professional development, networking, and outreach opportunities.”

Digital Projects Developer Michael Daul shared his own point of view with great thoughtfulness in his April post, What does it mean to be an actively antiracist developer? He touched on representation in the IT industry, acknowledging bias, being aware of one’s own patterns of communication, and bringing these ideas to the systems we build and maintain.

One of the ideas that Michael identified for software development is web accessibility; as he wrote, we can “promote the benefits of building accessible interfaces that follow the practices of universal design.” We put that idea into action a few months later, as Sean described in precise technical terms in his July post, Automated Accessibility Testing and Continuous Integration. Currently that process applies to the ArcLight platform, but when we have a chance, we’ll see if we can expand it to other services.

The question of when we’ll have that chance is a big one, as it hinges on the undertaking that now dominates our attention. Over the past year we have ramped up on the migration of our website from Drupal 7 to Drupal 9, to head off the end-of-life for 7. This project has transformed into the raging beast that our colleagues at NC State Libraries warned us it would at the Code4Lib Southeast in May of 2019.

Screenshot of NC State Libraries presentation on Drupal migration — They warned us – Screenshot from “Drupal 7 to Drupal 8: Our Journey,” by Erik Olson and Meredith Wynn of NC State Libraries’ User Experience Department, presented at Code4Lib Southeast in May of 2019.

We are on a path to complete the Drupal migration in March 2022 – we have “firm plans,” you could say – and I’m certain that its various aspects will come to feature in Bitstreams in due time. For now I will mention that it spawned two sub-projects that have challenged our team over the past six months or so, both of which involve refactoring functionality previously implemented as Drupal modules into standalone Rails applications:

Quicksearch, aka unified search, aka “Bento search” – see Michael’s Bento is Coming! from 2014 – is now a standalone app; it also uses the open-source tool Apache Nutch, rather than Google CSE.
The staff directory app that went live in 2019, which Michael wrote about in Building a new Staff Directory, also no longer runs as a Drupal module.

Each of these implementations was necessary to prepare the way for a massive migration of theme and content that will take place over the coming months.

Screenshot of a Jira issue related to the Decouple Staff Directory project.

When it’s done, maybe we’ll have a chance to catch our breath. Who can really say? I could not have guessed a year ago where we’d be now, and anyway, the period of the last twelve months gets my nod as the shortest year ever. Assuming we’re here, whatever “here” means in the age of remote/hybrid/flexible work arrangements, then I expect we’ll be burning down backlogs, refactoring this or that, deploying some service, and making firm plans for something grand.

Duke Digital Repository, Technology

Sunsetting Tripod2

October 2, 2020 Will Sexton

Featured image – Wayback Machine capture of the Tripod2 beta site in February, 2011.

We all design and create platforms that work beautifully for us, that fill us with pride as they expand and grow to meet our programmatic needs, and all the while the world changes around us, the programs scale beyond what we envisioned, and what was once perfectly adaptable becomes unsustainable, appearing to us all of a sudden as a voracious, angry beast, threatening to consume us, or else a rickety contraption, teetering on the verge of a disastrous collapse. I mean, everyone has that experience, right?

In March of 2011, a small team consisting primarily of me and fellow developer Sean Aery rolled out a new, homegrown platform, Tripod2. It became the primary point of access for Duke Digital Collections, the Rubenstein Library’s collection guides, and a handful of metadata-only datasets describing special collections materials. Within a few years, we had already begun talking about migrating all the Tripod2 stuff to new platforms. Yet nearly a decade after its rollout, we still have important content that depends on that platform for access.

Nevertheless, we have made significant progress. Sunsetting Tripod2 became a priority for one of the teams in our Digital Preservation and Publishing Program last year, and we vowed to follow through by the end of 2020. We may not make that target, but we do have firm plans for the remaining work. The migration of digital collections to the Duke Digital Repository has been steady, and nears its completion. This past summer, we rolled out a new platform for the Rubenstein collection guides, based on the ArcLight framework. And we now have a plan to handle the remaining instances of metadata-only databases, a plan that itself relies on the new collection guides platform.

We built Tripod2 on the triptych of Python/Django, Solr, and a document base of METS files. There were many areas of functionality that we never completely developed, but it gave us a range of capability that was crucial in our thinking about digital collections a decade ago – the ability to customize, to support new content types, and to highlight what made each digital collection unique. In fact, the earliest public statement that I can find revealing the existence of Tripod2 is Sean’s blog post, “An increasingly diverse range of formats,” from almost exactly ten years ago. As Sean wrote then, “dealing with format complexity is one of our biggest challenges.”

As the years went by, a number of factors made it difficult to keep Tripod2 current with the scope of our programs and the changing of web technology. The single most prevalent factor was the expanding scope of the Duke Digital Collections program, which began to take on more high-volume digitization efforts. We began adding all of our new digital collections to the Duke Digital Repository (DDR) in 2015, and the effort to migrate from Tripod2 to the repository picked up soon thereafter. That work was subject to all sorts of comical and embarrassing misestimations by myself on the pages of this very blog over the years, but thanks to the excellent work by Digital Collections and Curation Services, we are down to the final stages.

Collection and item counters from the Duke Digital Repository's homepage for Duke Digital Collections, showing the volume of digital collections roughly doubling between 2018 and 2020. — Collection and item counters from the Duke Digital Repository’s homepage for Duke Digital Collections, taken from the Internet Archive’s Wayback Machine, approximately a year apart in 2018, 2019, and 2020. The volume of digital collections has roughly doubled in that time, due to both the addition of new collections, and the migration of collections from Tripod2.

Moving digital collections to the DDR went hand-in-hand with far less customization, and far less developer intervention to publish a new collection. Where we used to have developers dedicated to individual platforms, we now work together more broadly as a team, and promote redundancy in our development and support models as much as we can. In both our digital collections program and our approach to software development, we are more efficient and more process-driven.

Given my record of predictions about our work on this blog, I don’t want to be too forward in announcing this transition. We all know that 2020 doesn’t suffer fools gladly, or maybe it suffers some of them but not others, and maybe I shouldn’t talk about 2020 just like this, right out in the open, where 2020 can hear me. So I’ll just leave it here – in recent times, we have done a lot of work toward saying goodbye to Tripod2. Perhaps soon we shall.

Uncategorized

DUCC, TUCC, and the origins of digital computing in North Carolina

April 3, 2020 Will Sexton

The feature image is “Triangle University Computation Center IBM System/370 Hardware Configuration,” from Network Management Survey, published in 1974.

The Cut Study and DUCC

The Fall semester of 1958 saw deep concern among the Duke student body with a pressing issue – cutting class. The Undergraduate Faculty Council Committee had taken up a study of class attendance, and planned to issue recommendations for policies on “absence limitations.” Its chair was John Jay Gergen, who had been on the faculty at Duke more than 20 years at that point, serving most of them as head of the Mathematics Department. In September, the Chronicle urged him to “make a sincere effort to show the students the seriousness of the situation and to explain their findings.” They warned him not to “announce suddenly a new policy to the students,” which would be a form of “[t]actless communication” that might “breed discontent among the students.” By all indications, Gergen ignored them.

While the “Cut Study” may have seemed enormously consequential to the students at the time, Gergen was leading a different effort that would have far more lasting impact at Duke. By at least one account, he was a large and imposing man, which could also describe his influence on campus. He channeled some of that influence into his work as the senior faculty member and administrator who oversaw the effort to bring digital computing to the university. While Gergen acted mainly in an administrative role, it was a protege of his who authored the grants that brought in the funding, and did the legwork on setting up an operational computing center.

Continue reading DUCC, TUCC, and the origins of digital computing in North Carolina →

Behind the Scenes, Duke Digital Repository

A Statement of Commitment

November 11, 2019 Will Sexton

The featured image is from a mockup of a new repositories home page that we’re working on in the Libraries, planned for rollout in January of 2020.

Working at the Libraries, it can be dizzying to think about all of our commitments.

There’s what we owe our patrons, a body of so many distinct and overlapping communities, all seeking to learn and discover, that we could split the library along an infinite number of lines to meet them where they work and think.

There’s what we owe the future, in our efforts to preserve and share the artifacts of knowledge that we acquire on the market, that scholars create on our own campus, or that seem to form from history and find us somehow.

There’s what we owe the field, and the network of peer libraries that serve their own communities, each of them linked in a web of scholarship with our own. Within our professional network, we seek to support and complement one another, to compete sometimes in ways that move our field forward, and to share what we learn from our experiences.

The needs of information technology underlie nearly all of these activities, and to meet those needs, we have an IT staff that’s modest in size, but prodigious in its skill and its dedication to the mission of the Libraries. Within that group, the responsibility for creating new software, and maintaining what we have, falls to a small team of developers and devops engineers. We depend on them to enhance and support a wide range of platforms, including our web services, our discovery platforms, and our digital repositories.

This fall, we did some reflection on how we want to approach support for our repository platforms. The result of that reflection was a Statement of Commitment to Repositories Support and Development, a document of roughly a page that expresses what we consider to be our values in this area, and the context of priorities in which we do that work.

The committee that created the statement was our Digital Preservation and Publishing Program, or DP3 as we call it in house. We summarized our values as “openness, community and peer engagement, and independence from vended platforms,” which have “guided us to build our repositories on open source software platforms.” We place that work within the context of very large, looming priorities like our transition to FOLIO as our Library Services Platform, and the project to renovate Lilly Library. There are others, not mentioned in the statement, that fill the pages of this blog.

The statement is explicit that we will not seek to find alternative platforms for our repository services in the next several years, and in particular while the FOLIO transition is underway. This decision is informed by our recognition that migration of content and services across platforms is complex and expensive. It’s also a recognition that we have invested a lot into these existing platforms, and we want to carve out as much space as we can for our talented staff to focus on maintaining and improving them, rather than locking ourselves into all-consuming cycles of content migration.

From a practical perspective, and speaking as the manager who oversees software development in the Libraries, I see this statement as part of an overall strategy to bring focus to our work. It’s a small but important symbolic measure that recognizes the drag that we create for our software team when we give in to our urge to prioritize everything.

The phrase “context switching” is one that we have borrowed from the parlance of operating systems to describe the effects on a developer of working on multiple projects at once. There are real costs to moving between development environments, code bases, and architectures on the same day, in the same week, during the same sprint, or within even an extended work cycle. We also call this problem “multi-tasking,” and the penalty it imposes on performance is well documented.

Even more than performance, I think of it as a quality of life concern. People are generally happier and more invested when they’re able to do quality work. As a manager, I can work with scheduling and planning to try to mitigate those effects of multitasking on our team. But the responsibility really lies with the organization. We have our commitments, and they are vast in size and scope. We owe it to ourselves to do some introspection now and again, and ask what we can realistically do with what we have, or more accurately, who we are.

Digital Exhibits, Projects

Managing impermanence – migration of the Libraries’ digital exhibits

August 2, 2019 Will Sexton

Post contributed by Claire Cahoon, student in the master’s program at the School of Information and Library Science, UNC-Chapel Hill.

This summer I worked as a field experience student in the Software Services department migrating digital exhibits into Omeka 2, Duke’s most current platform. The ultimate goal was to start and document the process of moving exhibits from legacy platforms into Omeka 2.

The reasoning behind the project became clear as we started creating an index of all of the digital exhibits on display in the exhibits website. Out of 97 total exhibits, there were varying degrees of functionality, from the most recent and up-to-date exhibits, to sites with broken links and pages where only text would display, leaving out crucial images. Centralizing these into a single platform should make it easier to create, support, and maintain all of these exhibits.

Screenshot of the sidebar of an exhibit, showing the link to the previous version of the exhibit in the Internet Archive

I found exhibits in Omeka 1, Cascade, Scriptorium, JAlbum, and even found a few mystery platforms that we never identified. Since it was the largest, we decided to work on the Omeka 1 group over the summer, and this week I finished migrating all 34 exhibits – that means that after a few adjustments to make the new exhibits available, Omeka 1 can be shut off!

We worked with Meg Brown, Exhibits Coordinator for the Libraries, and the exhibits department to figure out how each exhibit needed to be represented. Since we were managing expectations from lots of different stakeholders, we landed on the idea to include a link to the archived version of each exhibit in the Wayback Machine, in case the look and feel of the new exhibits is limiting for anyone used to Omeka 1.
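As a rough illustration of the archived-version links described above, here is a minimal sketch of how such a link could be built. It assumes the Wayback Machine's public URL pattern, in which a capture timestamp and the original address follow `https://web.archive.org/web/`. The function name and the exhibit address are hypothetical, and nothing here reflects how the links were actually produced for the migrated exhibits.

```python
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_link(original_url: str, timestamp: str = "") -> str:
    """Build a Wayback Machine link for a page (hypothetical helper).

    `timestamp` is a run of up to 14 digits (YYYYMMDDhhmmss). When it is
    left empty, the link carries no timestamp and the archive decides
    which capture to serve.
    """
    if timestamp:
        is_digits = timestamp.isascii() and timestamp.isdigit()
        if not is_digits or len(timestamp) > 14:
            raise ValueError("timestamp must be 1 to 14 digits")
    parts = [WAYBACK_PREFIX]
    if timestamp:
        parts.append(timestamp)
    parts.append(original_url)
    return "/".join(parts)


# Hypothetical legacy exhibit address, used only for illustration.
old_exhibit = "http://exhibits.example.edu/omeka1/book-art"
archived = wayback_link(old_exhibit, "20190801")
```

A link like this can sit in the sidebar of the migrated exhibit, so visitors who prefer the earlier look and feel can still reach it.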

Working with the Internet Archive links and sorting through broken pieces of these exhibits really put into perspective how impermanent the internet is, even for seemingly static information. Without much maintenance, these exhibits lost some of the core content when video links changed, references were lost, and even the most well-written custom code stopped working. I hope that my work this summer will help keep these exhibit materials in working order while also eliminating the need to continue supporting Omeka 1.

While migrating, I came across a few favorite exhibits and items that combined interesting content and some updated features in Omeka 2:

Cover of “Anxious homes: cursory-cleaning for the imminent arrival of visitors or how to give the impression of a clean house in under 20 minutes” by Jackie Batey. Available in the Rubenstein Library: N7433.4.B38 A59 2006

Book + Art: Artists’ books from the Sallie Bingham Center for Women’s History and Culture (and the old version of Book + Art)

“Party Dress” by Catherine Michaelis
“Anxious homes: cursory-cleaning for the imminent arrival of visitors or how to give the impression of a clean house in under 20 minutes” by Jackie Batey

John Hope Franklin: Imprint of an American Scholar (and the old version of the John Hope Franklin exhibit)

John Hope Franklin’s letters and memos related to his activism, including advising the defense for Brown v. Board of Education.

Cheap Thrills: The Highs and Lows of Paris’s Cabaret Culture (and the old version of Cheap Thrills)

The cabaret pieces to listen to, adapted by student composers for the Duke New Music Ensemble

Medicology, or, Home encyclopedia of health: a complete family guide… Vol. I, by Joseph Gibbons Richardson (1904). Available in the Rubenstein Library: RC81 .R52 1904

Animated Anatomies: The Human Body in Anatomical Texts from the 16th to 21st Centuries (and the old version of Animated Anatomies)

Videos of these anatomy flap-books

Omeka still has some quirks to work out, and the accessibility of the pages and the metadata display are still in the works. However, migrating these exhibits into Omeka 2 will make them much easier to support and change for improvements. Thanks to the team that worked with me and taught me so much this summer: Will Sexton, Michael Daul, and Meg Brown!

Uncategorized

News Feeds, Microfilm, and the Stories We Tell Ourselves

April 29, 2019 Will Sexton

A little over a week ago, I watched the searing and provocative TED talk by British journalist Carole Cadwalladr, “Facebook’s role in Brexit – and the threat to democracy.” It got me thinking about a few library things, which I thought might make for an interesting blog post. Then thinking about these library things took me down a series of rabbit holes, interconnecting and nuanced and compelling enough to chew up the entirety of the time I’d set aside for my turn in the Bitstreams blog rotation. There is no breezy, concise blog post that could pull them all together, so I’m just going to do what I can with two of the maybe four or five rabbit holes that I fell into.

Cadwalladr took the stage at a TED conference sponsored by Facebook and Google, and spoke about her investigations into the role of Facebook and Cambridge Analytica in the Brexit vote in 2016. Addressing the big tech leaders present – the “Gods of Silicon Valley: Mark Zuckerberg, Sheryl Sandberg, Larry Page, Sergey Brin and Jack Dorsey” – she levelled a devastating j’accuse – “[W]hat the Brexit vote demonstrates is that liberal democracy is broken. And you broke it. This is not democracy — spreading lies in darkness, paid for with illegal cash, from God knows where. It’s subversion, and you are accessories to it.”

It was a courageous act, and Cadwalladr deserves celebration and recognition for it, even if the place it leaves us is a bleak one. As she would admit later, she felt massive pressure as she spoke. I had a number of reactions to her talk, but there was one line in particular that got me thinking about library things. It occurred when she explained to that audience that “this entire referendum took place in darkness, because it took place on Facebook…, because only you see your news feed, and then it vanishes, so it’s impossible to research anything.” It provoked me to think about how we use “news feeds” – in the form of newspapers themselves – in the study of history, and the role that libraries play in preserving them.

Continue reading News Feeds, Microfilm, and the Stories We Tell Ourselves →

Uncategorized

Community and Collaboration at Samvera Connect 2018

October 26, 2018 Will Sexton

One of the pleasures of working in an academic library is the opportunity it presents for engagement with communities in our field of work. One such community that Duke University Libraries has been a member of for some time now is Samvera, which is an open-source community for software development that supports digital repositories. I, along with my colleagues Jim Coble, Moira Downey, and Ayse Durmaz, recently attended the Samvera Connect conference in Salt Lake City, and this post is a report on our experience there.

It was my first time attending Samvera Connect, and so it was a chance for me to put faces with names that I had come to know from discussions on Slack and elsewhere. Moira and I participated in a panel with some of our colleagues from the University of Michigan and Indiana University, and it was great to have the opportunity to meet them in person and talk about our work on digital repositories. We spoke on the theme of using the Hyrax platform for research data; you can see our slides here. Moira and I also had a poster on the same theme.

I attended the meetup of the Samvera Interest Group for Advising the Hyrax Roadmap, or SIGAHR, as it is known. There was some introspection in the group about the suitability of the acronym, though it produced no resolution one way or another. Much of the conversation in that meeting focused on support and developer resources for the Hyrax platform. It’s one of the central questions for an open source community like Samvera, and one we’re giving some consideration at Duke after returning from the meeting.

Otherwise, there were several interesting presentations that I attended and would highlight. First, the team from the WGBH Media Library did a presentation titled “Building on Hyrax and Avalon for the American Archive of Public Broadcasting” that I enjoyed a lot. That team has great energy and has developed some interesting solutions for a complex and compelling project.

I also learned much at the workshop titled “Managing Samvera-based Projects & Services,” which was conducted by Hannah Frost, Nabeela Jaffer, and Steve Van Tuyl. Thinking in terms of an extended community requires a different mindset from the way we work locally and on our campuses.

Finally, one of the most interesting presentations came from Hannah Frost and Christina Harlow from Stanford Libraries, outlining the new architecture they have developed for the next iteration of the Stanford Digital Library. It was titled “Making TACOs for Hydras,” and the slides are not available, but much of what they covered is included in the GitHub documentation here.

I’ll conclude there, and note that the following sections were authored by two of my colleagues at Duke.

Valkyrie and Hyrax (contributed by Jim Coble)

A focus of attention at this year’s Samvera Connect was Valkyrie, a project which enables the use of multiple backends for storing files and metadata in Samvera applications. Historically, Hydra/Samvera applications have had only one option for file and metadata storage; namely, a Fedora repository. Recent versions of Fedora have experienced performance problems in certain circumstances, leading the community to look for different options for storing files and metadata where performance is a key requirement. Valkyrie allows a project to pick and choose among multiple backends depending on the needs of the project. Projects can still use a Fedora repository for storage if that is desired but also have the option of using a Postgres database or Solr for metadata storage and/or a disk filesystem for file storage. Other metadata and file storage adapters are under development to provide Valkyrie with even more options.
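The idea of choosing among backends can be sketched in a few lines. The example below is written in Python purely for illustration; Valkyrie itself is a Ruby project, and none of the class or method names here come from its API. They are hypothetical stand-ins that show the general adapter pattern: the application talks to a small interface, and the backend behind that interface can be swapped.

```python
from typing import Dict, Protocol, Tuple


class MetadataAdapter(Protocol):
    """What any metadata backend must offer (hypothetical interface)."""

    def save(self, resource_id: str, metadata: dict) -> None: ...

    def find(self, resource_id: str) -> dict: ...


class FileAdapter(Protocol):
    """What any file-storage backend must offer (hypothetical interface)."""

    def upload(self, resource_id: str, content: bytes) -> None: ...

    def download(self, resource_id: str) -> bytes: ...


class InMemoryMetadata:
    """Stand-in for a real metadata backend such as Fedora, Postgres, or Solr."""

    def __init__(self, label: str) -> None:
        self.label = label
        self._records: Dict[str, dict] = {}

    def save(self, resource_id: str, metadata: dict) -> None:
        self._records[resource_id] = dict(metadata)

    def find(self, resource_id: str) -> dict:
        return dict(self._records[resource_id])


class InMemoryFiles:
    """Stand-in for a real file backend such as Fedora or a disk filesystem."""

    def __init__(self, label: str) -> None:
        self.label = label
        self._blobs: Dict[str, bytes] = {}

    def upload(self, resource_id: str, content: bytes) -> None:
        self._blobs[resource_id] = content

    def download(self, resource_id: str) -> bytes:
        return self._blobs[resource_id]


class Repository:
    """Application code that depends only on the two adapter interfaces."""

    def __init__(self, metadata: MetadataAdapter, files: FileAdapter) -> None:
        self.metadata = metadata
        self.files = files

    def deposit(self, resource_id: str, metadata: dict, content: bytes) -> None:
        self.files.upload(resource_id, content)
        self.metadata.save(resource_id, metadata)

    def retrieve(self, resource_id: str) -> Tuple[dict, bytes]:
        return self.metadata.find(resource_id), self.files.download(resource_id)


# The same application code, configured with two different backend choices.
fedora_style = Repository(InMemoryMetadata("fedora"), InMemoryFiles("fedora"))
mixed_style = Repository(InMemoryMetadata("postgres"), InMemoryFiles("disk"))

for repo in (fedora_style, mixed_style):
    repo.deposit("dataset-1", {"title": "Example dataset"}, b"col_a,col_b\n1,2\n")
```

Because `Repository` never names a particular backend, moving metadata from one store to another becomes a configuration decision rather than a rewrite, which is the appeal described above.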

Discussions at the conference favored moving forward to convert Hyrax (a key Samvera project) to use Valkyrie and we’ll likely see work happening on that soon. Our Research Data Repository is based on Hyrax, so the eventual Valkyrization of Hyrax would provide us with additional storage options for the files and metadata in that repository (which currently uses Fedora 4). Valkyrie may also be a component in a future migration of the legacy Duke Digital Repository, enabling us to move it off the no-longer-supported Fedora 3 version.

Discoverability of Research Data (contributed by Moira Downey)

In addition to the back-end infrastructure, another growing area of interest around our Hyrax-based Research Data Repository has been increased visibility and discoverability of the content that we publish and preserve through our software applications. New services like Google’s Dataset Search are making it easier for scholars and researchers to find the data they need to support their scholarly endeavors. As institutions responsible for the publication of these data, we want to ensure that the scholarship our repositories are hosting is indexed by these services, heightening its visibility, and hopefully, its usability. Over a lunchtime breakout session, the Repository Management Interest Group compiled a list of services similar to Google Dataset Search in nature (Google Scholar, Unpaywall.org, Crossref, Datacite, and SHARE, among others) that we intend to investigate further, with a particular eye toward how our existing repositories are integrated with these services and where we might improve. The group also intends to consider what local practices we might implement to optimize the discoverability of our content, and what changes to the code base we should advocate for in order to connect our content to the web at large.

Duke Digital Repository

DDR-RD: Previewing DUL’s new platform for research data

July 20, 2018 Will Sexton

While we sometimes talk about “the repository” as if it were a monolith at Duke University Libraries, we have in fact developed and maintained two core platforms that function as repository applications. I’ll describe them briefly, then preview a third that is in development, as well as the rationale behind expanding in this way.

Continue reading DDR-RD: Previewing DUL’s new platform for research data →

Duke Digital Repository, Projects, Technology

Living Our Best DSpace Lives

March 30, 2018 Will Sexton

Last week, an indefatigable team at Duke University Libraries released an upgraded version of the DukeSpace platform, completing the first phase of the critical project that I wrote about in this space in January. One member of the team remarked that we now surely have “one of the best DSpaces in the world,” and I dare anyone to prove otherwise.

DukeSpace serves as the Libraries’ open-access institutional repository, which makes it a key aspect of our mission to “partner in research,” as outlined in our strategic plan. As I wrote in January, the version of the DSpace platform that underlies the service had been stuck at 1.7, which was released during 2010 – the year the iPad came out, and Lady Gaga wore a meat dress. We upgraded to version 6.2, though the differences between the two versions are so great that it would be more accurate to call the project a migration.

That migration turned out to be one of the more complex technology projects we’ve undertaken over the years. The main complicating factor was the integration with Symplectic Elements, the Research Information Management System (RIMS) that powers the Scholars at Duke site. As far as we know, we are the first institution to integrate Elements with DSpace 6.2. It was a beast to do, and we are happy to share our knowledge gained if it will help any of our peers out there trying to do the same thing.

Meanwhile, feel free to click on over to DukeSpace and enjoy one of the best DSpaces in the world. And congratulations to one of the mightiest teams assembled since Spain won the World Cup!

Duke Digital Repository

Upgrading DukeSpace

January 19, 2018 Will Sexton

The year 2006 was charged with epoch-defining events: Zidane head-butted Materazzi, the astronomers downgraded Pluto, Google bought YouTube, and Duke University Libraries rolled out DukeSpace (PDF). Built on the DSpace platform, DukeSpace has served as our institutional repository for almost a dozen years now, providing access to electronic theses and dissertations and Duke faculty publications.

While the landscape of open access has changed much over the intervening period, we can’t really say the same about the underlying platform of DukeSpace.

At Duke, faculty approved an open access policy in March of 2010, a few weeks after the release of DSpace 1.6. By the end of the year, the platform had moved ahead a dot release to 1.7. Along the way, we did some customization to integrate with Symplectic Elements – the Research Information Management System (RIMS) that powers the Scholars@Duke site. That work essentially locked us into that version of DSpace, which remains in operation even though its final release came in July 2013 and it reached its end of life four years ago.

Animated GIF of Zinedine Zidane head-butting an opponent in the final game of the 2006 FIFA World Cup. — If only I had the skills to photoshop DSpace 6.2 in for Zidane, and 1.7 for Materazzi. GIF from Something Awful.

Beginning last November, we committed to a full upgrade of the DukeSpace platform to the current version (6.2 as of this writing). We had considered alternatives, including replacing the platform with Hyrax, but concluded that that approach would be too complex.

So we are currently coordinating work across a technology team and the Libraries’ open access group. Some of the concerns that we have encountered include:

Integrating with updated versions of Symplectic Elements. That same integration that locked us into a version years ago lies at the center of this upgrade. We have basically been handling this process as a separate thread of the larger project. It will be critical for us to maintain the currency of this dependency with subsequent upgrades to both products.
Rethinking metadata architecture. The conceptual basis of the institutional repository is greatly informed by the definition and use of metadata. Our Metadata Architect, Maggie Dickson, mentioned this area in her “Metadata Year-in-Review” post back in December. She highlighted the need to make “real headway tackling the problem of identity management – leveraging unique identifiers for people (ORCIDs, for example), rather than relying on name strings, which is inherently error prone.” Many other questions have arisen in this area, requiring extensive and ongoing discussion and coordination between the tech team and the stakeholders.
Migration of legacy stats data. How do we migrate usage stats between two versions of a platform so remote from each other in time? It has taken some trial-and-error to solve this one.
Replicating or enhancing existing workflows. Again, when two versions of a system are so different that an upgrade seems more like a platform migration, and our infrastructure and staffing have changed over the years, how do we reproduce existing workflows without disrupting them? What opportunities can we take to improve on them without destabilizing the project? Aside from the integration with Elements, we also have the important workflow related to the ingest of electronic theses & dissertations, which employs both self-deposit and file transfer from ProQuest. Re-envisioning and re-implementing workflows such as these takes careful analysis and planning.

While we have run into a few complicating issues during the process so far, we feel confident that we remain on track to roll out the upgraded version during the first quarter of 2018. Pluto remains a dwarf planet, Zidane manages Real Madrid (for now), and to Mark Cuban’s apparent distress, Google still owns YouTube. Soon our own story from 2006 should reach a kind of resolution.

The Cut Study and DUCC

Valkyrie and Hyrax (contributed by Jim Coble)

Discoverability of Research Data (contributed by Moira Downey)

Notes from the Duke University Libraries Digital Projects Team