While we sometimes talk about “the repository” as if it were a monolith at Duke University Libraries, we have in fact developed and maintained two core platforms that function as repository applications. I’ll describe them briefly, then preview a third that is in development, as well as the rationale behind expanding in this way.
Last week, an indefatigable team at Duke University Libraries released an upgraded version of the DukeSpace platform, completing the first phase of the critical project that I wrote about in this space in January. One member of the team remarked that we now surely have “one of the best DSpaces in the world,” and I dare anyone to prove otherwise.
DukeSpace serves as the Libraries’ open-access institutional repository, which makes it a key aspect of our mission to “partner in research,” as outlined in our strategic plan. As I wrote in January, the version of the DSpace platform that underlies the service had been stuck at 1.7, which was released during 2010 – the year the iPad came out, and Lady Gaga wore a meat dress. We upgraded to version 6.2, though the differences between the two versions are so great that it would be more accurate to call the project a migration.
That migration turned out to be one of the more complex technology projects we’ve undertaken over the years. The main complicating factor was the integration with Symplectic Elements, the Research Information Management System (RIMS) that powers the Scholars at Duke site. As far as we know, we are the first institution to integrate Elements with DSpace 6.2. It was a beast to do, and we are happy to share what we learned if it will help any of our peers out there trying to do the same thing.
Meanwhile, feel free to click on over and enjoy one of the best DSpaces in the world. And congratulations to one of the mightiest teams assembled since Spain won the World Cup!
The year 2006 was charged with epoch-defining events: Zidane head-butted Materazzi, the astronomers downgraded Pluto, Google bought YouTube, and Duke University Libraries rolled out DukeSpace (PDF). Built on the DSpace platform, DukeSpace has served as our institutional repository for almost a dozen years now, providing access for electronic theses and dissertations and Duke faculty publications.
While the landscape of open access has changed much over the intervening period, we can’t really say the same about the underlying platform of DukeSpace.
At Duke, faculty approved an open access policy in March of 2010; DSpace 1.6 had been released a few weeks earlier. By the end of the year it had moved ahead a dot release to 1.7. Along the way, we did some customization to integrate with Symplectic Elements – the Research Information Management System (RIMS) that powers the Scholars@Duke site. That work essentially locked us into that version of DSpace, which remains in operation even though its final release came in July 2013 and it reached its end of life four years ago.
Beginning last November, we committed to a full upgrade of the DukeSpace platform to the current version (6.2 as of this writing). We had considered alternatives, including replacing the platform with Hyrax, but concluded that that approach would be too complex.
So we are currently coordinating work across a technology team and the Libraries’ open access group. Some of the concerns that we have encountered include:
- Integrating with updated versions of Symplectic Elements. That same integration that locked us into a version years ago lies at the center of this upgrade. We have basically been handling this process as a separate thread of the larger project. It will be critical for us to maintain the currency of this dependency with subsequent upgrades to both products.
- Rethinking metadata architecture. The conceptual basis of the institutional repository is greatly informed by the definition and use of metadata. Our Metadata Architect, Maggie Dickson, mentioned this area in her “Metadata Year-in-Review” post back in December. She highlighted the need to make “real headway tackling the problem of identity management – leveraging unique identifiers for people (ORCIDs, for example), rather than relying on name strings, which is inherently error prone.” Many other questions have arisen in this area, requiring extensive and ongoing discussion and coordination between the tech team and the stakeholders.
- Migration of legacy stats data. How do we migrate usage stats between two versions of a platform so remote from each other in time? It has taken some trial-and-error to solve this one.
- Replicating or enhancing existing workflows. Again, when two versions of a system are so different that an upgrade seems more like a platform migration, and our infrastructure and staffing have changed over the years, how do we reproduce existing workflows without disrupting them? What opportunities can we take to improve on them without destabilizing the project? Aside from the integration with Elements, we also have the important workflow related to the ingest of electronic theses & dissertations, which employs both self-deposit and file transfer from ProQuest. Re-envisioning and re-implementing workflows such as these takes careful analysis and planning.
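To make the identity-management point from the list above a bit more concrete: an ORCID iD carries a check character, so a malformed identifier can be caught before it ever lands in a metadata record, which is something a name string can never offer. Here is a minimal sketch in Python, assuming the ISO 7064 MOD 11-2 checksum that ORCID documents for its identifiers. The function names are my own, made up for illustration, and are not part of DSpace, Elements, or our actual tooling.

```python
import re

# Hyphenated ORCID iD: four groups of four, last character a digit or "X".
ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(candidate: str) -> bool:
    """Return True if the string is a well-formed ORCID iD with a valid checksum.

    This only checks structure; it does not confirm that the iD is
    registered or that it belongs to a particular person.
    """
    if not ORCID_PATTERN.fullmatch(candidate):
        return False
    digits = candidate.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]


if __name__ == "__main__":
    for value in ("0000-0002-1825-0097", "0000-0002-1825-0098", "Dickson, Maggie"):
        print(value, is_valid_orcid(value))
```

With something like this, `is_valid_orcid("0000-0002-1825-0097")` passes, while the same iD with a single mistyped digit fails the check. A name string such as "Dickson, Maggie" has no equivalent safeguard.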
While we have run into a few complicating issues during the process so far, we feel confident that we remain on track to roll out the upgraded version during the first quarter of 2018. Pluto remains a dwarf planet, Zidane manages Real Madrid (for now), and to Mark Cuban’s apparent distress, Google still owns YouTube. Soon our own story from 2006 should reach a kind of resolution.
One thing I’ve learned on my life’s journey is the importance of knowing your spirit guide.
That’s why by far the most important point I made in a talk at the TRLN Annual Meeting in July is that the spirit guide of the digital repository movement is the squirlicorn.
One of the main areas of emphasis for the CNI Spring 2017 meeting was “new strategies and approaches for institutional repositories (IR).” A few of us at UNC and Duke decided to plug into the zeitgeist by proposing a panel to reflect on some of the ways that we have been rethinking – or even just thinking about – our repositories.
Today is an eventful day for the Duke Digital Repository (DDR). Later today, I and several of my colleagues will present on the DDR at Day 1 of the Duke Research Computing Symposium. We’ll be introducing new staff who’ll focus on managing, curating, and preserving research data, as well as the role that the DDR will play as both a service and a platform. This event serves as a soft launch of our plans – which I wrote about last September – to support the work of researchers at Duke.
At the same time, the DDR gets a new look, at least on its home page. For years, we’ve used a rather drab and uninformative page that was essentially the out-of-the-box rendering by Blacklight, our discovery and access layer in the repository stack. Last fall, our DDR Program Committee took up the task of revamping that page to reflect how we conceptualize the repository and its major program areas.
The page design will evolve with the DDR itself, but it went live earlier today. More information about the DDR initiative and our plans will follow in the coming months.
The Duke Digital Repository (DDR) is a growing service, and the Libraries are growing to support it. As I post this entry, our jobs page shows three new positions comprising five separate openings that will support the DDR. One is a DevOps position which we have re-envisioned from a salary line that opened with a staff member’s departure. The other four consist of two new positions, with two openings for each, created to meet specific, emerging needs for supporting research data at Duke.
Last fall at Duke, the Vice Provosts for Research and the Vice President for Information Technology convened a Digital Research Faculty Working Group. It included a number of faculty members from around campus, as well as several IT administrators, the latter of whom served in an ex-officio capacity. The Libraries were represented by our Associate University Librarian for Information Technology, Tim McGeary (who happens to be my supervisor).
While I would really prefer to cat-blog my merry way into the holiday weekend, I feel duty-bound to follow up on my previous posts about the digital collections migration project that has dominated our 2016.
Since I last wrote, we have launched two more new collections in the Fedora/Hydra platform that comprises the Duke Digital Repository. The larger of the two, and a major accomplishment for our digital collections program, was the Duke Chapel Recordings. We also completed the Alex Harris Photographs.
Meanwhile, we are working closely with our colleagues in Digital Repository Services to facilitate a whole other migration, from Fedora 3 to 4, and onto a new storage platform. It’s the great wheel in which our own wheel is only the wheel inside the wheel. Like the wheel in the sky, it keeps on turning. We don’t know where we’ll be tomorrow, though we expect the platform migration to be completed inside of a month.
Last time, I wrote hopefully of the needle moving on the migration of digital collections into the new platform, and while behind the scenes the needle is spasming toward the FULL side of the gauge, for the public it still looks stuck just a hair above EMPTY. We have two batches of ten previously published collections ready to re-launch when we roll over to Fedora 4, which we hope will be in June – one is a group of photography collections, and the other a group of manuscripts-based collections.
In the meantime, the work on migrating the digital collections and building a new UI for discovery and access absorbs our team. Much of what we’ve learned and accomplished during this project has related to the migration, and quite a bit has appeared in this blog.
Our Metadata Architect, Maggie Dickson, has undertaken wholesale remediation of twenty years’ worth of digital collections metadata. Dealing with date representation alone has been a critical effort, as evidenced by the series of posts by her and developer Cory Lown on their work with EDTF.
Sean Aery has posted about his work as a developer, including the integration of the OpenSeadragon image viewer into our UI. He also wrote about “View Item in Context,” four words in a hyperlink that represent many hours of analysis, collaboration, and experimentation within our team.
I expect, by the time the wheel has completed another rotation, and it’s my turn again to write for the blog, there will be more to report. Batches will have been launched, features deployed, and metadata remediated. Even more cat pictures will have been posted to the Internet. It’s all one big cycle and the migration is part of it.
Last time I wrote for Bitstreams, I said “Today is the New Future.” It was a day of optimism, as we published for the first time in our next-generation platform for digital collections. The debut of the W. Duke, Sons & Co. Advertising Materials, 1880-1910 was the first visible success of a major effort to migrate our digital collections into the Duke Digital Repository. “Our current plan,” I propounded, “is to have nearly all of the content of Duke Digital Collections available in the new platform by the end of March, 2016.”
Since then we’ve published a second collection – the Benjamin and Julia Stockton Rush Papers – in the new platform, but we’ve also done more extensive planning for the migration. We’ll divide the work into six-week phases or “supersprints” that overlay the shorter sprints of our software development cycle. The work will take longer than I suggested in October – we now project the bulk of it to be completed by the end of the fourth six-week phase, or toward the end of June of this year, with some continuing until deeper in the calendar year.
As it happens, today represents the rollover from Phase 1 to Phase 2 of our plan. Phase 1 was relatively light in its payload. During the next phase – concluding in six weeks on March 28 – we plan to add 24 of the collections currently published in our older platform, as well as two new collections.
As team leader, I take upon myself the hugely important task of assigning mottos to each phase of the project. The motto for Phase 1 was “Plant the seeds in the bottle.” It derives from the story of David Latimer’s bottle garden, which he planted in 1960 and has not watered since Duke Law alum Richard Nixon was president.
This image from the Friedrich Carl Peetz Photographs, along with many other items from our photography and manuscript collections, will be among those re-published in the Duke Digital Repository during Phase 2 of our migration process.
Imagine, I said to the group, we are creating self-sustaining environments for our collections, that we can stash under the staircase next to the wine rack. Maybe we tend to them once or twice, but they thrive without our constant curation and intervention. Everyone sort of looked at me as if I had suggested using a guillotine as a bagel slicer for a staff breakfast event. But they’re all good sports. We hunkered down, and expect to publish one new collection, and re-publish two of the older collections, in the new platform this week.
The motto for Phase 2 is “Move the needle.” The object here is to lean on our work in Phase 1 to complete a much larger batch of materials. We’ll extend our work on photography collections in Phase 1 to include many of the existing photography collections. We’ll also re-publish many of the “manuscript collections,” which is our way of referring to the dozen or so collections that we previously published by embedding content in collection guides.
If we are successful in this approach, by the end of Phase 2 we’ll have migrated a significant portion of the digital collections to the Duke Digital Repository. Each collection, presumably, will flourish, sealed in a fertile, self-regulating environment, like bottle gardens, or wine.
As we’ve written previously, we’re in the process of re-digitizing the William Gedney Photographs, so they will not be migrated to the Duke Digital Repository in Phase 2, but will wait until we’ve completed that project.
Yesterday was Back to the Future day, and the Internet had a lot of fun with it. I guess now it falls to each and every one of us, to determine whether or not today begins a new future. It’s certainly true for Duke Digital Collections.
Today we roll out – softly – the first release of Tripod3, the next-generation platform for digital collections. For now, the current version supports a single, new collection, the W. Duke, Sons & Co. Advertising Materials, 1880-1910. We’re excited about both the collection – which Noah Huffman previewed in this blog almost exactly a year ago – and the platform, which represents a major milestone in a project that began nearly a year ago.
The next few months will see a great deal more work on the project. We have new collections scheduled for December and the first quarter of 2016, we’ll gradually migrate the collections from our existing site, and we’ll be developing the features and the look of the new site in an iterative process of feedback, analysis, and implementation. Our current plan is to have nearly all of the content of Duke Digital Collections available in the new platform by the end of March, 2016.
The completion of the Tripod3 project will mean the end of life for the current-generation platform, which we call, to no one’s surprise, Tripod2. However, we have not set an exact timeline for sunsetting Tripod2. During the transitional phase, we will do everything we can to make the architecture of Duke Digital Collections transparent, and our plans clear.
After the jump, I’ll spend the rest of this post going into a little more depth about the project, but first I want to express my pride and gratitude to an excellent team – you know who you are – who helped us achieve this milestone.