All posts by Ginny Boyer

A Tough Nut to Crack: Developing Digital Repositories

May 1, 2017 Ginny Boyer 2 Comments

Folks, developing digital repositories is hard. There are so many different layers of complexity built into the stack, compounded by the unique variety of end-users, or stakeholders, that we serve.

Consider the breadth of this work:

Starting at the bottom of the stack, you have our Preservation layer. This is where we capture your bits, and ensure the long-term preservation of your digital assets. But it goes well beyond just logging a single record in a database. It involves capturing the data stream, writing that file and all associated files (metadata) to storage, replicating the data to various geographically dispersed servers, validating the ingest, logging the validation, ensuring successful recovery of replicated assets, and more.
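To make that chain of steps a bit more concrete, here is a minimal sketch in Python. It is not the code our repository runs. The function names (`sha256_of`, `ingest`), the replica directory names, and the choice of SHA-256 are all assumptions made for illustration, and local folders stand in for geographically dispersed servers. It shows the general shape of the work: write the file to each replica, re-read it, compare checksums, and log the result.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ingest(source: Path, replica_dirs: list[Path]) -> list[dict]:
    """Copy source into each replica directory and log a validation record.

    Hypothetical helper: real replication targets would be remote storage,
    not local directories.
    """
    expected = sha256_of(source)
    log = []
    for replica_dir in replica_dirs:
        replica_dir.mkdir(parents=True, exist_ok=True)
        copy = replica_dir / source.name
        shutil.copy2(source, copy)
        # Validate the ingest by re-reading the replica and comparing digests.
        log.append({
            "replica": str(copy),
            "expected": expected,
            "valid": sha256_of(copy) == expected,
        })
    return log


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        original = root / "asset.bin"
        original.write_bytes(b"example bits")
        for record in ingest(original, [root / "site_a", root / "site_b"]):
            print(record["replica"], "valid" if record["valid"] else "FAILED")
```

Even this toy version hints at why the layer is more than "logging a single record": every copy has to be read back and checked, and the check itself has to be recorded.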

All of that comes post-ingest. I’ll not even belabor the complexities of data modeling and ingest here, but you get the idea… it’s hairy stuff. Receiving and massaging a highly diverse body of data into a package appropriate for homogeneous ingest is a monumental effort in normalization.

Move up the stack into our Curation layer. Currently we have a single administrative application that facilitates management and curatorial activities of our digital objects following ingest. Roles or access controls can be managed here, in addition to various types of metadata (description about the item), etc. There are a variety of other applications that are managed at this layer, which interact with, and store, various values that fuel display and functionality within the user interface. This layer is quickly evolving in a way that necessitates diversification. We have found that a single monolithic application is not a one-size-fits-all solution for our stakeholders who are in the business of data production/curation; it is at this layer where we are getting increasing pressure to integrate and inter-operate with a myriad of other tools and platforms for resource/data management. This is tricky business as each of these tools handles data in different ways.

Finally, we have the Discovery layer. The user interface. This is what the public sees and consumes. It’s where access to ingested materials occurs. It is itself an application requiring significant custom development to meet the needs of various programs and collections of materials. It is tightly coupled with the Curation layer, and therefore highly complex and customized to meet the needs of different focal areas. Search functionality is yet another piece of complexity that requires maintenance and customization of a central index. Nothing is OOTB (out of the box). Everything requires configuration and customization.

And ALL of this- all of it- is inter-related. Highly coupled and complex. Few things reap easy wins, and often our work challenges foundational assumptions that have come well before. It’s an exercise in balancing technical debt and moving forward without re-inventing the wheel every six months.

What I have presented here is a simplistic view of our software eco-system. It’s just a snapshot of the various puzzle pieces that support the operation of a production repository. In general, digital repositories are still fairly new on the scene. No one has them figured out entirely and everyone does them a little bit differently. There’s a strength to that which manifests in diverse platforms and a breadth of development possibilities. There’s a weakness to it because there is no cookie-cutter approach that defines an easy path forward.

So it’s an exercise in evolution. In iteration. In patience. In requirements definition. We’re not going to always get it right, and our efforts will largely take a bit of time and experimentation, but we’re constantly working to improve, to enhance, and to mature our repository platform to meet the growing and evolving needs of our University.

So, here’s to many years of hard work ahead! And many successful collaborations with our Duke community to realize our repository’s future. We’re ready if you are!

Uncategorized

Revisiting: What is the Repository?

January 27, 2017 Ginny Boyer

Here at the Duke University Libraries we recently hosted a series of workshops that were part of a larger Research Symposium on campus. It was an opportunity for various campus agencies to talk about all of the evolving and innovative ways that they are planning for and accommodating research data. A few of my colleagues and I were asked to present on the new Research Data program that we’re rolling out in collaboration with the Duke Digital Repository, and we were happy to oblige!

I was asked to speak directly about the various software development initiatives that we have underway with the Duke Digital Repository. Since we’re in the midst of rolling out a brand new program area, we’ve got a lot of things cooking!

When I started planning for the conversation I initially thought I would talk a lot about our Fedora/Hydra stack, and the various inter-related systems that we’re planning to integrate into our repository eco-system. But what resulted from that was a lot of technical terms, and open-source software project names that didn’t mean a whole lot to anyone; especially those not embedded in the work. As a result, I took a step back and decided to focus at a higher level. I wanted to present to our faculty that we were implementing a series of software solutions that would meet their needs for accommodation of their data. This had me revisiting the age-old question: What is our Repository? And for the purposes of this conversation, it boiled down to this:

And this:

It is a highly complex, often mind-boggling set of software components, that are wrangled and tamed by a highly talented team with a diversity of skills and experience, all for the purposes of supporting Preservation, Curation, and Access of digital materials.

Those are our tenets or objectives. They are the principles that guide our work. Let’s dig in a bit on each.

Our first objective is Preservation. We want our researchers to feel 100% confident that when they give us their data, we are preserving its integrity, longevity, and persistence.

Our second objective is to support Curation. We aim to do that by providing software solutions that facilitate management and description of file sets, and logical arrangement of complex data sets. This piece is critically important because the data cannot be optimized without solid description and modeling that informs on its purpose and intended use, and facilitates discovery of the materials for use.

Finally our work, our software, aims to facilitate discovery & access. We do this by architecting thoughtful solutions that optimize metadata and modeling; we build out features that enhance the consumption and usability of different format types; and we tweak, refine and optimize our code to enhance performance and user experience.

The repository is a complex beast. It’s a software stack, and an eco-system of components. It’s Fedora. It’s Hydra. It’s a whole lot of other project names that are equally attractive and mystifying. At its core though, it’s a software initiative- one that seeks to serve up an eco-system of components with optimal functionality that meet the needs and desires of our programmatic stakeholders- our University.

Preservation, Curation, & Access are the heart of it.

Uncategorized

Good Stuff on the Horizon: a Duke Digital Repository Teaser…

December 4, 2016 Ginny Boyer

Folks,

We have been hard at work architecting a robust Repository program for our Duke University community. And while doing this, we’re in the midst of shoring things up architecturally on the back end. You may be asking yourself: Why all the fuss? What’s the big deal?

Well, part of the fuss is that it’s high time to move beyond the idea that our repository is a platform. We’d much prefer that our repository be known as a program. A suite of valuable services that serve the needs of our campus community. The repository will always be a platform. In fact, it will be a rock-solid preservation platform- a space to park your valuable digital assets and feel 100% confident that the Libraries will steward those materials for the long haul. But the repository is much more than a platform; it’s a suite of service goodness that we hope to market and promote!

Secondly, it’s because we’ve got some new and exciting developments happening in Repository-land, specifically in the realm of data management. To start with, the Provost graciously appointed four new positions to serve the data needs of the University, and those new positions will sit in the Libraries. We have two Senior Research Specialists and two Content Analysts joining our ranks in early January. These positions will be solely dedicated to the refinement of data curation processes, liaising with faculty on data management best practice, assisting researchers with the curation and deposit of research data, and acquiring persistent access to said data. Pretty cool stuff!

So in preparation for this, we’ve had a few things cooking. To begin with, we are re-designing our Duke Digital Repository homepage. We will highlight three service areas:

Duke Scholarship: This area will feature the research, scholarship and activities of Duke faculty members and academic staff. It will also highlight services in support of open access, copyright support, digital publishing, and more.
Research Data: This area will be dedicated to the fruits of Duke Scholarship, and will be an area that features research data and data sets. It will highlight services in support of data curation, data management, data deposit, data citation, and more.
Library Collections: This area will focus on digital collections that are owned or stewarded specifically by the Duke University Libraries. This includes digitized special collections, University Archives material, born digital materials, and more.

For each of these areas we’ve focused on defining a base collections policy, and are in the process of refining our service models, and shoring up policy that will drive preservation and digital asset management of these materials.

So now that I’ve got you all worked up about these new developments, you may be asking, ‘When can I know more?!’ You can expect to see and hear more about these developments (and our newly redesigned website) just after the New Year. In fact, you can likely expect another Bitstreams Repository post around that time with more updates on our progress, a preview of our site, and perhaps a profile or two of the new staff joining our efforts!

Until then, stay tuned, press ‘Save’, and call us if you’re looking for a better, more persistent, more authoritative approach to saving the fruits of your digital labor! (Or contact us)

Uncategorized

Open Source Software and Repository land

October 30, 2016 Ginny Boyer 1 Comment

The Duke University Libraries software development team just recently returned from a week in Boston, MA at a conference called Hydra Connect. We ate good seafood, admired beautiful cobblestones, strolled along the Charles River, and learned a ton about what’s going on in the Hydra-sphere.

At this point you may be scratching your head, exclaiming- huh?! Hydra? Hydrasphere? Have no fear, I shall explain!

Our repository, the Duke Digital Repository, is a Hydra/Fedora Repository. Hydra and Fedora are names for two prominent open-source communities in repository land. Fedora concerns itself with architecting the back-end of a repository- the storage layer. Hydra, on the other hand, refers to a multitude of end-user applications that one can architect on top of a Fedora repository to perform digital asset management. Pretty cool and pretty handy. Especially for someone that has no interest in architecting a repository from scratch.

And for a little context re: open source… the idea is that a community of like-minded individuals that care about a particular thing, will band together to develop a massively cool software product that meets a defined need, is supported and extended by the community, and is offered for free for someone to inspect, modify and/or enhance the source code.

I italicized ‘free’ to emphasize that while the software itself is free, and while the source code is available for download and modification, it does take a certain suite of skills to architect a Hydra/Fedora Repository. It’s not currently an out-of-the-box solution, but is moving in that direction with Hydra-in-a-Box. But I digress…

So. Why might someone be interested in joining an open-source community such as these? Well, for many reasons, some of which might ring true for you:

Resources are thin. Talented developers are hard to find and harder to recruit. Working with an open source community means that 1) you have the source code to get started, 2) you have a community of people that are available (and generally enthusiastic) about being a resource, and 3) working collaboratively makes everything better. No one wants to go it alone.
Governance. If one gets truly involved at the community level there are often opportunities for contributing thoughts and opinion that can help to shape and guide the software product. That’s super important when you want to get invested in a project and ensure that it fully meets your need. Going it alone is never a good option, and the whole idea of open-source is that it’s participatory, collaborative, and engaged.
Give back. Perhaps you have a great idea. A fantastic use case. Perhaps one that could benefit a whole lot of other people and/or institutions. Well then share the love by participating in open-source. Instead of developing a behemoth locally that is not maintainable, contribute ideas or features or a new product back to the community. It benefits others, and it benefits you, by investing the community in the effort of folding features and enhancements back into the core.

Hydra Connect was a fantastic opportunity to mingle with like-minded professionals doing very similar work, and all really enthusiastic to share their efforts. They want you to get excited about their work. To see how they are participating in the community. How they are using this variety of open-source software solutions in new and innovative ways.

It’s easy to get bogged down at a local level with the micro details, and to lose the big picture. It was refreshing to step out of the office and get back into the frame of mind that recognizes and empowers the notion that there is a lot of power in participating in healthy communities of practice. There is also a lot of economy in it.

The team came back to Durham full of great ideas and a lot of enthusiasm. It has fueled a lot of fantastic discussion about the future of our repository software eco-system and how that complements our desire to focus on integration, community developed goodness, and sustainable practices for software development.

More to come as we turn that thought process into practice!

Project Hydra

Hydra Connect 2016

Projects, Technology

Developing the Duke Digital Repository is Messy Business

August 28, 2016 Ginny Boyer

Let me tell you something people: Coordinating development of the Duke Digital Repository (DDR) is a crazy logistical affair that involves much ado about… well, everything!

My last post, What is a Repository?, discussed, at a high level, what exactly a digital repository is intended to be and the purpose it serves in the Libraries’ digital ecosystem. If we take a step down from that, we can categorize the DDR as two distinct efforts, 1) a massive software development project and 2) a complex service suite. Both require significant project management and leadership, and necessitate tools to help in coordinating the effort.

There are many, many details that require documenting and tracking through the life cycle of a software development project. Initially we start with requirements- meaning what the tools need to do to meet the end users’ needs. Requirements must be properly documented and must essentially detail a project management plan that can result in a successful product (the software) and the project (the process, and everything that supports success of the product itself). From this we manage a ‘backlog’ of requirements, and pull from the backlog to structure our work. Requirements evolve into tasks that are handed off to developers. Tasks themselves become conversations as the development team determines the best possible approach to getting the work done. In addition to this, there are bugs to track, changes to document, and new requirements evolving all of the time… you can imagine that managing all of this in a simple ‘To Do’ list could get a bit unwieldy.

We realized that our ability to keep all of these many plates spinning necessitated a really solid project management tool. So we embarked on a mission to find just the right one! I’ll share our approach here, in case you and your team have a similar need and could benefit from our experiences.

STEP 1: Establish your business case: Finding the right tool will take effort, and getting buy-in from your team and organization will take even more! Get started early with justifying to your team and your org why a PM tool is necessary to support the work.

STEP 2: Perform a needs assessment: You and your team should get around a table and brainstorm. Ask yourselves what you need this tool to do, what features are critical, what your budget is, etc. Create a matrix where you fully define all of these characteristics to drive your investigation.

STEP 3: Do an environmental scan: What is out there on the market? Do your research and whittle down a list of tools that have potential. Also build on the skills of your team- if you have existing competencies in a given tool, then fully flesh out its features to see if it fits the bill.

STEP 4: Put them through the paces: Choose a select list of tools and see how they match up to your needs assessment. Task a group of people to test-drive the tools, and report out on the experience.

STEP 5: Share your findings: Discuss the findings with your team. Capture the highs and the lows and present the material in a digestible fashion. If it’s possible to get consensus, make a recommendation.

STEP 6: Get buy-in: This is the MOST critical part! Get buy-in from your team to implement the tool. A PM tool can only benefit the team if it is used thoroughly, consistently, and in a team fashion. You don’t want to deal with adverse reactions to the tool after the fact…
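If it helps to picture the matrix from STEP 2 and the scoring from STEP 4, here is one way it could be sketched in Python. This is an illustration only, not the process or the numbers we actually used: the function name `rank_tools`, the criteria, the weights, and the placeholder tools "Tool A" and "Tool B" are all made up. The idea is a weighted average, where each criterion gets a weight for how critical it is and each tool gets a score per criterion from the test-drivers.

```python
def rank_tools(
    weights: dict[str, float],
    scores: dict[str, dict[str, int]],
) -> list[tuple[str, float]]:
    """Return (tool, weighted average score) pairs, best first.

    weights: criterion -> importance (any positive scale)
    scores:  tool -> {criterion -> score}; a missing criterion counts as 0
    """
    total_weight = sum(weights.values())
    ranked = []
    for tool, tool_scores in scores.items():
        weighted = sum(weights[c] * tool_scores.get(c, 0) for c in weights)
        ranked.append((tool, weighted / total_weight))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical criteria and weights from a needs assessment (1 to 5).
    weights = {"issue tracking": 5, "backlog management": 4, "cost": 3}
    # Hypothetical test-drive scores (1 to 5) for two placeholder tools.
    scores = {
        "Tool A": {"issue tracking": 5, "backlog management": 3, "cost": 2},
        "Tool B": {"issue tracking": 3, "backlog management": 4, "cost": 5},
    }
    for tool, score in rank_tools(weights, scores):
        print(f"{tool}: {score:.2f}")
```

A number like this is a conversation starter for STEP 5, not a verdict. It won't tell you whether the team will actually use the tool, which is why buy-in still matters most.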

No matter what tool you choose, you’ll need to follow some simple guidelines to ensure successful adoption:

Once again… Get TEAM buy-in!
Define ownership, or an Admin, of the tool (ideally the Project Manager)
Define basic parameters for use and team expectations
PROVIDE TRAINING
Consider your ecosystem of tools and simplify where appropriate
The more robust the tool, the more support and structure will be required

Trust me when I say that this exercise will not let you down, and will likely yield a wealth of information about the tools that you use, the projects that you manage, your team’s preferences for coordinating the work, and much more!

Uncategorized

What is a Repository?

July 29, 2016 Ginny Boyer

We’ve been talking a lot about the Repository of late, so I thought it might be time to come full circle and make sure we’re all on the same page here…. What exactly is a Repository?

A Repository is essentially a digital shelf. A really, really smart shelf!

It’s the place to safely and securely store digital assets of a wide variety of types for preservation, discovery, and use, though not all materials in the repository may be discoverable or accessible by everyone. So, it’s like a shelf. Except that this shelf is designed to help us preserve these materials and try to ensure they’ll be usable for decades.

This shelf tells us if the materials on it have changed in any way. It tells us when the materials don’t conform to the format specification that describes exactly how a file format is to be represented. This shelf has very specific permissions, a well thought out backup procedure to several corners of the country, a built-in versioning system to allow us to migrate endangered or extinct formats to new, shiny formats, and a bunch of other neat stuff.
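For the curious, here is a rough sketch in Python of what "telling us if something changed" and "checking the format" can look like. It is a simplified illustration, not our repository's code: the `audit` function and the `SIGNATURES` table are hypothetical names, and real format validation inspects far more than the first few bytes of a file. The two signatures shown (the `%PDF-` header and the eight-byte PNG signature) are the standard leading bytes for those formats.

```python
import hashlib

# Leading "magic" bytes for two well-known formats (illustrative subset).
SIGNATURES = {
    "pdf": b"%PDF-",
    "png": b"\x89PNG\r\n\x1a\n",
}


def audit(data: bytes, recorded_sha256: str, declared_format: str) -> dict:
    """Compare content against its recorded checksum and declared format.

    Hypothetical helper: 'unchanged' is a fixity check, 'format_ok' is only
    a shallow signature check, not full conformance validation.
    """
    return {
        "unchanged": hashlib.sha256(data).hexdigest() == recorded_sha256,
        "format_ok": data.startswith(SIGNATURES[declared_format]),
    }


if __name__ == "__main__":
    deposited = b"%PDF-1.4 example content"
    recorded = hashlib.sha256(deposited).hexdigest()  # stored at ingest time

    print(audit(deposited, recorded, "pdf"))          # the file as deposited
    print(audit(deposited + b"!", recorded, "pdf"))   # a single byte added later
```

Run on a schedule across everything on the shelf, a check like the first one is what lets a repository notice silent changes rather than discover them years later.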

The repository is the manifestation of a conviction about the importance of an enduring scholarly record and open and free access to Duke scholarship. It is where we do our best to carve our knowledge in stone for future generations.

Why? is perhaps the most important question of all. There are several approaches to Why? National funding agencies (NIH, NSF, NEH, etc) recognize that science is precariously balanced on shoddy data management practices and increasingly require researchers to deposit their data with a reputable repository. Scholars would like to preserve their work, make it accessible to everyone (not just those who can afford outrageously priced journal subscriptions), and want to increase the reach and impact of their work by providing stable and citable DOIs.

Students want to be able to cite their own theses, dissertations, and capstone papers and to have others discover and cite them. The Library wants to safeguard its investment in digitization of Special Collections. Archives needs a place to securely store university records.

A Repository, specifically our Duke Digital Repository, is the place to preserve our valuable scholarly output for many years to come. It ensures disaster recovery, facilitates access to knowledge, and connects you with an ecosystem of knowledge.

Pretty cool, huh?!

Uncategorized

Looking to the Future of the Duke Digital Repository: Defining a Program for Digital Preservation, Management & Access

March 30, 2016 Ginny Boyer

Our modern day lives and professional endeavors are teeming with digital output. We participate in the digital ecosystem every day, contributing our activities, our scholarship, and our work in new and evolving ways. Some of that contribution gets lost in the Internet ether, and some gets saved, or preserved, in specific, often localized ways that are neither sustainable nor preservable for the long haul. We here at the Duke University Libraries want to be able to look to the future with confidence, knowing that we have a game plan for capturing and preserving digital objects that are necessary and vital to the university community. Cue the new Duke Digital Repository.

The Duke Digital Repository is a software development initiative undertaken by the Digital Repository Services department in the Duke University Libraries. It is a preservation repository architected using the Fedora Open Source software project, which is intended to replace the current manifestation of our institutional repository, Duke Space. It is a superior product that is provisioned specifically for the preservation, storage, and access of digital objects. The Duke Digital Repository is fully operational; we are now in the process of refining user interfaces, ingesting new and varied collections, and assessing descriptive metadata needs for ingested collections.

So what’s next? Well we’ve got the Duke Digital Repository as a platform, now we need the Duke Digital Repository as a program. We need to clarify the services and support that we offer to the university community, we need to fully define its stakeholders, and we need to implement an organizational structure to support a robust service.

Here are just a few things that we’re engaged in that are seeking to define our user groups and assess their needs in a preservation platform and digital support service. Defining these expectations will allow us to take the next step in crafting a sustainable and relevant program to support the digital scholarship of the university.

ITHAKA Faculty Survey: In the Fall semester of 2015, the Libraries deployed the ITHAKA S+R Faculty Survey. Faculty are considered a primary stakeholder of the repository, as it is well provisioned to meet their data management needs. 260 faculty members responded to the survey, sharing their thoughts on a variety of topics including scholarly communications services, research practices, data preservation and management needs, and much more. There was a lot of valuable, actionable data contributed, which pertains directly to the repository as a preservation tool, and a service for data support. The digital repository team is working through this data to identify and target needs and desires in a repository program.

Graduate & Undergraduate Advisory Boards: The Digital Repository staff are also working with the Assessment & User Experience team within the library to reach out to graduate and undergraduate student constituents to capture their voice. We have collectively identified a list of questions and prompts that will engage them in a discussion about their needs pertaining to the repository as a tool and a service. From this discussion we are also gauging their understanding of ‘a repository’ and hoping to glean some information that will help us to understand how we might brand and market the repository more effectively.

Fedora Community: Fedora is an open source software product developed and stewarded by the DuraSpace community. The Duke University Libraries are active participants in the community which is essentially a consortium of academic institutions that are working toward a common goal of preserving intellectual, cultural, and scientific heritage. We are reaching out to our community constituents to ask how other institutions similar to ours are supporting their repository programs. We’re assessing various models of support and generating a discussion around repository support as a resourced program, rather than a simple software solution. We are also working with Assessment & User Experience to conduct an environmental scan and literature review to gain greater insight and understanding of best practice.

In short, we want to make the repository special, and relevant to its users. We want to feel confident that it provides a service that is valuable and necessary for our university community. We invite your feedback as we embark on this effort. For further information or to give us your feedback, please contact us.

Notes from the Duke University Libraries Digital Projects Team