The year 2006 was charged with epoch-defining events: Zidane head-butted Materazzi, astronomers downgraded Pluto, Google bought YouTube, and Duke University Libraries rolled out DukeSpace (PDF). Built on the DSpace platform, DukeSpace has served as our institutional repository for almost a dozen years now, providing access to electronic theses and dissertations and Duke faculty publications.
While the landscape of open access has changed much over the intervening period, we can’t really say the same about the underlying platform of DukeSpace.
At Duke, faculty approved an open access policy in March of 2010, a few weeks after DSpace 1.6 was released. By the end of the year DSpace had moved ahead a dot release to 1.7. Along the way, we did some customization to integrate with Symplectic Elements, the Research Information Management System (RIMS) that powers the Scholars@Duke site. That work essentially locked us into that version of DSpace, which remains in operation even though its final release came in July 2013 and it reached its end of life four years ago.
Beginning last November, we committed to a full upgrade of the DukeSpace platform to the current version (6.2 as of this writing). We had considered alternatives, including replacing the platform with Hyrax, but concluded that such a replacement would be too complex.
So we are currently coordinating work across a technology team and the Libraries’ open access group. Some of the concerns that we have encountered include:
- Integrating with updated versions of Symplectic Elements. That same integration that locked us into a version years ago lies at the center of this upgrade. We have basically been handling this process as a separate thread of the larger project. It will be critical for us to maintain the currency of this dependency with subsequent upgrades to both products.
- Rethinking metadata architecture. The conceptual basis of the institutional repository is greatly informed by the definition and use of metadata. Our Metadata Architect, Maggie Dickson, mentioned this area in her “Metadata Year-in-Review” post back in December. She highlighted the need to make “real headway tackling the problem of identity management – leveraging unique identifiers for people (ORCIDs, for example), rather than relying on name strings, which is inherently error prone.” Many other questions have arisen in this area, requiring extensive and ongoing discussion and coordination between the tech team and the stakeholders.
- Migration of legacy stats data. How do we migrate usage stats between two versions of a platform so remote from each other in time? It has taken some trial-and-error to solve this one.
- Replicating or enhancing existing workflows. Again, when two versions of a system are so different that an upgrade seems more like a platform migration, and our infrastructure and staffing have changed over the years, how do we reproduce existing workflows without disrupting them? What opportunities can we take to improve on them without destabilizing the project? Aside from the integration with Elements, we also have the important workflow related to the ingest of electronic theses & dissertations, which employs both self-deposit and file transfer from ProQuest. Re-envisioning and re-implementing workflows such as these takes careful analysis and planning.
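As an aside on the identity-management point in the list above, here is a minimal Python sketch of why an identifier makes a safer key than a name string. It is not taken from DukeSpace, DSpace, or Elements: the record layout and the function names are hypothetical, chosen only for illustration. The check-digit routine follows the ISO 7064 MOD 11-2 scheme that ORCID documents for its iDs, and the sample iD belongs to Josiah Carberry, the fictitious researcher ORCID uses in its own examples.

```python
import re
from collections import defaultdict

# 16 characters in four hyphenated groups; the last may be the check character X.
ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Return True if the string is well formed and its check character matches."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


def group_by_person(records):
    """Group titles by ORCID iD when one is present and valid, else by name string.

    `records` is a hypothetical list of dicts with "name", "title",
    and an optional "orcid" key.
    """
    groups = defaultdict(list)
    for rec in records:
        orcid = rec.get("orcid")
        if orcid and is_valid_orcid(orcid):
            key = orcid
        else:
            key = "name:" + rec["name"]
        groups[key].append(rec["title"])
    return dict(groups)


# Hypothetical records: two spellings of one author, plus one without an iD.
records = [
    {"name": "Carberry, Josiah", "orcid": "0000-0002-1825-0097", "title": "Paper A"},
    {"name": "J. Carberry", "orcid": "0000-0002-1825-0097", "title": "Paper B"},
    {"name": "Carberry, J.", "title": "Paper C"},
]
grouped = group_by_person(records)
```

In this toy data the first two records collapse into one person despite the different name spellings, while the third, which carries only a name string, stays separate. That third case is the kind of ambiguity the quoted remark describes.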
While we have run into a few complicating issues during the process so far, we feel confident that we remain on track to roll out the upgraded version during the first quarter of 2018. Pluto remains a dwarf planet, Zidane manages Real Madrid (for now), and to Mark Cuban’s apparent distress, Google still owns YouTube. Soon our own story from 2006 should reach a kind of resolution.