Tag Archives: Duke Digital Repository

Preservation Architecture: Phase 2 – Moving Forward with Duke Digital Repository

 

DukeSpace circa 2013

 

In 2013, the average price for a gallon of gas was $3.80, President Obama was inaugurated for a second term, and Duke University Libraries offered DukeSpace as an institutional repository.  Some things haven’t changed much, but the preservation architecture protecting the digital materials curated by the Libraries has changed a lot!

We still provide DukeSpace, but are laying the foundation to migrate collections and processes to the Duke Digital Repository (DDR).  The DDR was conceived and developed as a digital preservation repository, an environment intended to preserve and sustain Duke's rich digital collections, university scholarship and research data, purchased collections, and history far into the future.  Only through the grace of our partnership with Digital Projects and Production Services has the DDR recently also become a site that no longer hurts the eyes of our visitors.

The Duke Digital Repository endeavors to protect our assets from a large and diverse set of threats. Some threats, such as those identified in the SPOT Model for Risk Assessment, are of course not addressed in the systems model presented here. We formally consider our baseline threats to include:

  • Natural disasters including accidents at our local nuclear power station, fire, and hurricanes
  • Data degradation also known as bit rot or bit decay
  • External actors or threats posed by people external to the DDR team including those who manage our infrastructure
  • Internal actors including intentional or unintentional security risks and exploits by privileged staff in the libraries and supporting IT organizations

Phase 1 of our ingress into digital preservation established that DSpace, the software powering DukeSpace, was not sufficient for our needs, which led to an environmental scan and a pilot project with Fedora, and then with Fedora and Hydra. This provided us with some of the infrastructure to mitigate the threats we had identified, but not all.  In Phase 1 we were able to perform some important preservation tasks, including:

  • Prove authenticity by offering checksum fixity validation on ingest and periodically
  • Identify and report on data degradation
  • Capture context in the form of descriptive, administrative, and technical metadata
  • Identify files in need of remediation using file characterization tools
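As a rough illustration of the first two tasks, checksum fixity validation can be sketched in a few lines of Python. This is a minimal sketch, not the DDR's implementation: the function names (`sha256_of`, `record_fixity`, `audit_fixity`), the choice of SHA-256, and the in-memory manifest are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_fixity(paths):
    """On ingest: build a manifest mapping each path to its checksum."""
    return {str(path): sha256_of(path) for path in paths}


def audit_fixity(manifest):
    """Periodically: recompute checksums and report files that are
    missing or no longer match the value recorded on ingest."""
    failures = []
    for path, expected in manifest.items():
        if not Path(path).is_file():
            failures.append((path, "missing"))
        elif sha256_of(path) != expected:
            failures.append((path, "checksum mismatch"))
    return failures
```

An empty list from `audit_fixity` means every file still matches the checksum recorded on ingest. Any entry in the list is a candidate for the data degradation report.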

Phase 2 allows us to address a greater range of threats and therefore offer a higher level of security to our collections.  In Phase 2 we’re doing several concurrent migrations: migrating our archival storage to infrastructure that will allow for dynamic resizing, de-duplication, and block-level integrity checking; moving to a horizontally scaled server architecture so the repository can grow to meet increasing demands of size (both individual file size and collection size) and traffic; and adopting a cloud replication disaster recovery process using DuraCloud to replace our local-only disk/tape infrastructure.  These changes provide significant protection against our baseline threats by giving our replicas geographic diversity, allowing us to constantly monitor the health of our 3 cloud replicas, and adding administrative diversity to the management of our replicas, so that no single threat can corrupt all 4 copies of our data.
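One way to picture how multiple copies protect each other is a simple agreement check across replicas. The sketch below is illustrative only and does not use DuraCloud or any repository API: the function name `compare_replicas`, the replica names, and the majority rule are assumptions, and the checksums are presumed to have been gathered from each copy by some other means.

```python
from collections import Counter


def compare_replicas(checksums_by_replica):
    """Compare the checksum each replica reports for the same object.

    Returns (consensus, divergent): the checksum held by a strict
    majority of replicas and the sorted names of replicas that disagree.
    If no strict majority exists, consensus is None and every replica
    is flagged for manual review.
    """
    if not checksums_by_replica:
        raise ValueError("at least one replica is required")
    counts = Counter(checksums_by_replica.values())
    consensus, votes = counts.most_common(1)[0]
    if votes * 2 <= len(checksums_by_replica):
        return None, sorted(checksums_by_replica)
    divergent = sorted(
        name
        for name, checksum in checksums_by_replica.items()
        if checksum != consensus
    )
    return consensus, divergent


# Hypothetical example: one local copy and three cloud replicas.
report = compare_replicas(
    {"local": "abc", "cloud-1": "abc", "cloud-2": "xyz", "cloud-3": "abc"}
)
```

With four copies, a single damaged replica is outvoted by the other three and can be repaired from any of them, which is the practical benefit of keeping copies under separate geography and administration.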

More detail about the repository architecture to come.

 

Looking to the Future of the Duke Digital Repository: Defining a Program for Digital Preservation, Management & Access

Our modern-day lives and professional endeavors are teeming with digital output.  We participate in the digital ecosystem every day, contributing our activities, our scholarship, and our work in new and evolving ways.  Some of that contribution gets lost in the Internet ether, and some gets saved, or preserved, in specific, often localized ways that are neither sustainable nor preservable for the long haul.  We here at the Duke University Libraries want to be able to look to the future with confidence, knowing that we have a game plan for capturing and preserving digital objects that are necessary and vital to the university community.  Cue the new Duke Digital Repository.


The Duke Digital Repository is a software development initiative undertaken by the Digital Repository Services department in the Duke University Libraries.  It is a preservation repository built on the open source Fedora software, and it is intended to replace the current manifestation of our institutional repository, DukeSpace.  It is a superior product, provisioned specifically for the preservation, storage, and access of digital objects.  The Duke Digital Repository is fully operational; we are now in the process of refining user interfaces, ingesting new and varied collections, and assessing descriptive metadata needs for ingested collections.


So what’s next?  Well, we’ve got the Duke Digital Repository as a platform; now we need the Duke Digital Repository as a program.  We need to clarify the services and support that we offer to the university community, we need to fully define its stakeholders, and we need to implement an organizational structure to support a robust service.

Here are just a few of the efforts we’re engaged in to define our user groups and assess what they need from a preservation platform and digital support service.  Defining these expectations will allow us to take the next step in crafting a sustainable and relevant program to support the digital scholarship of the university.

  • ITHAKA Faculty Survey: In the Fall semester of 2015, the Libraries deployed the ITHAKA S+R Faculty Survey.  Faculty are considered a primary stakeholder of the repository, as it is well provisioned to meet their data management needs.  260 faculty members responded to the survey, sharing their thoughts on a variety of topics including scholarly communications services, research practices, data preservation and management needs, and much more.  There was a lot of valuable, actionable data contributed, which pertains directly to the repository as a preservation tool, and a service for data support.  The digital repository team is working through this data to identify and target needs and desires in a repository program.
  • Graduate & Undergraduate Advisory Boards:  The Digital Repository staff are also working with the Assessment & User Experience team within the library to reach out to graduate and undergraduate student constituents to capture their voice.  We have collectively identified a list of questions and prompts that will engage them in a discussion about their needs pertaining to the repository as a tool and a service.  From this discussion we are also gauging their understanding of ‘a repository’ and hoping to glean some information that will help us to understand how we might brand and market the repository more effectively.  
  • Fedora Community: Fedora is an open source software product developed and stewarded by the DuraSpace community.  The Duke University Libraries are active participants in the community, which is essentially a consortium of academic institutions that are working toward a common goal of preserving intellectual, cultural, and scientific heritage.  We are reaching out to our community constituents to ask how other institutions similar to ours are supporting their repository programs.  We’re assessing various models of support and generating a discussion around repository support as a resourced program, rather than a simple software solution.  We are also working with Assessment & User Experience to conduct an environmental scan and literature review to gain greater insight into and understanding of best practice.


In short, we want to make the repository special, and relevant to its users.  We want to feel confident that it provides a service that is valuable and necessary for our university community.  We invite your feedback as we embark on this effort.  For further information or to give us your feedback, please contact us.

Moving the Needle: Bring on Phase 2 of the Tripod3/Digital Collections Migration

Last time I wrote for Bitstreams, I said “Today is the New Future.” It was a day of optimism, as we published for the first time in our next-generation platform for digital collections. The debut of the W. Duke, Sons & Co. Advertising Materials, 1880-1910 was the first visible success of a major effort to migrate our digital collections into the Duke Digital Repository. “Our current plan,” I propounded, “Is to have nearly all of the content of Duke Digital Collections available in the new platform by the end of March, 2016.”

Since then we’ve published a second collection – the Benjamin and Julia Stockton Rush Papers – in the new platform, but we’ve also done more extensive planning for the migration. We’ll divide the work into six-week phases or “supersprints” that overlay the shorter sprints of our software development cycle. The work will take longer than I suggested in October – we now project the bulk of it to be completed by the end of the fourth six-week phase, or toward the end of June of this year, with some continuing until deeper in the calendar year.

As it happens, today represents the rollover from Phase 1 to Phase 2 of our plan.  Phase 1 was relatively light in its payload. During the next phase – concluding in six weeks on March 28 – we plan to add 24 of the collections currently published in our older platform, as well as two new collections.

As team leader, I take upon myself the hugely important task of assigning mottos to each phase of the project. The motto for Phase 1 was “Plant the seeds in the bottle.” It derives from the story of David Latimer’s bottle garden, which he planted in 1960 and has not watered since Duke Law alum Richard Nixon was president.

This image from the Friedrich Carl Peetz Photographs, along with many other items from our photography and manuscript collections, will be among those re-published in the Duke Digital Repository during Phase 2 of our migration process.

Imagine, I said to the group, we are creating self-sustaining environments for our collections, that we can stash under the staircase next to the wine rack. Maybe we tend to them once or twice, but they thrive without our constant curation and intervention. Everyone sort of looked at me as if I had suggested using a guillotine as a bagel slicer for a staff breakfast event. But they’re all good sports. We hunkered down, and expect to publish one new collection, and re-publish two of the older collections, in the new platform this week.

The motto for Phase 2 is “Move the needle.” The object here is to lean on our work in Phase 1 to complete a much larger batch of materials. We’ll extend our work on photography collections in Phase 1 to include many of the existing photography collections. We’ll also re-publish many of the “manuscript collections,” which is our way of referring to the dozen or so collections that we previously published by embedding content in collection guides.

If we are successful in this approach, by the end of Phase 2 we’ll have migrated a significant portion of the digital collections to the Duke Digital Repository. Each collection, presumably, will flourish, sealed in a fertile, self-regulating environment, like bottle gardens, or wine.

Here’s a page where we’ll track progress.

As we’ve written previously, we’re in the process of re-digitizing the William Gedney Photographs, so they will not be migrated to the Duke Digital Repository in Phase 2, but will wait until we’ve completed that project.