Category Archives: Duke Digital Repository

Going with the Flow: building a research data curation workflow

Why research data? Data generated by scholars in the course of investigation are increasingly being recognized as outputs nearly equal in importance to the scholarly publications they support. Among other benefits, the open sharing of research data reinforces unfettered intellectual inquiry, fosters reproducibility and broader analysis, and permits the creation of new data sets when data from multiple sources are combined. Data sharing, though, starts with data curation.

In January of this year, Duke University Libraries brought on four new staff members–two Research Data Management Consultants and two Digital Content Analysts–to engage in this curatorial effort, and we have spent the last few months mapping out and refining a research data curation workflow to ensure best practices are applied to managing data before, during, and after ingest into the Duke Digital Repository.

What does this workflow entail? A high level overview of the process looks something like the following:

After collecting their data, the researcher will take whatever steps they can to prepare it for deposit. This generally means tasks like cleaning and de-identifying the data, arranging files in a structure expected by the system, and compiling documentation to ensure that the data is comprehensible to future researchers. The Research Data Management Consultants will be on hand to help guide these efforts and provide researchers with feedback about data management best practices as they prepare their materials.

Our form for metadata capture

Depositors will then be asked to complete a metadata form and electronically sign a deposit agreement defining the terms of deposit. After we receive this information, someone from our team will invite the depositor to transfer their files to us, usually through Box.

Consultant tasks

At this stage, the Research Data Management Consultants will begin a preliminary review of the researcher’s data by performing a cursory examination for personally identifying or protected health information, inspecting the researcher’s documentation for comprehension and completeness, analyzing the submitted metadata for compliance with the research data application profile, and evaluating file formats for preservation suitability. If they have any concerns, they will contact the researcher to make some suggestions about ways to better align the deposit with best practices.
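As a rough illustration of the file format check, here is a minimal Python sketch that flags files whose extensions fall outside a preferred list. The list of extensions and the function name `flag_formats` are assumptions made for this example; they are not the repository's actual policy or tooling, and a real review would also look inside the files rather than trusting extensions.

```python
from pathlib import Path

# Hypothetical allow-list for illustration only; not an actual repository policy.
PREFERRED_EXTENSIONS = {".csv", ".txt", ".pdf", ".tif", ".tiff", ".xml", ".json"}


def flag_formats(paths, preferred=PREFERRED_EXTENSIONS):
    """Group paths whose extension is not on the preferred list, keyed by extension."""
    flagged = {}
    for path in paths:
        # Compare case-insensitively so "results.CSV" matches ".csv".
        suffix = Path(path).suffix.lower()
        if suffix not in preferred:
            flagged.setdefault(suffix or "(none)", []).append(str(path))
    return flagged


if __name__ == "__main__":
    deposit = ["data/results.CSV", "data/model.sav", "README"]
    for extension, files in flag_formats(deposit).items():
        print(extension, files)
```

Anything flagged this way would be a prompt for a conversation with the researcher, not an automatic rejection.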

Analyst tasks

When the deposit is in good shape, the Research Data Management Consultants will notify the Digital Content Analysts, who will finalize the file arrangement and migrate some file formats, generate and normalize any necessary or missing metadata, ingest the files into the repository, and assign the deposit a DOI. After the ingest is complete, the Digital Content Analysts will carry out some quality assurance on the data to verify that the deposit was appropriately and coherently structured and that metadata has been correctly assigned. When this is confirmed, they will publish the data in the repository and notify the depositor.

Of course, this workflow isn’t a finished piece–we hope to continue to clarify and optimize the process as we develop relationships with researchers at Duke and receive more data. The Research Data Management Consultants in particular are enthusiastic about the opportunity to engage with scholars earlier in the research life cycle in order to help them better incorporate data curation standards in the beginning phases of their projects. All of us are looking forward to growing into our new roles, while helping to preserve Duke’s research output for some time to come.

Rethinking Repositories at CNI Spring ’17

One of the main areas of emphasis for the CNI Spring 2017 meeting was “new strategies and approaches for institutional repositories (IR).” A few of us at UNC and Duke decided to plug into the zeitgeist by proposing a panel to reflect on some of the ways that we have been rethinking – or even just thinking about – our repositories.


Nuts, Bolts, and Bits: Further Down the Preservation Path

Iceberg. Flickr user: pere.

It’s been a while since we last wrote about the preservation architecture underlying the repository in Preservation Architecture: Phase 2 – Moving Forward with Duke Digital Repository. We’ve made some terrific progress in the interim, but most of that is invisible to our users, not unlike our chilly friends, icebergs.

Let’s take a brief tour to surface some of these changes!

Policy and Procedure Development

The recently formed Digital Preservation Advisory Group has been working on policy and procedure to bring DDR into compliance with the ISO 16363 Audit and Certification of Trustworthy Digital Repositories Minimum Criteria. We’ve been working on diverse policy areas, such as how embargoes may be set; how often fixity must be checked and reported to stakeholders; in what situations content may be removed and who must be involved in that decision; and what conditions necessitate a ‘tombstone’ to explain the removal of an object. Some of these policies are internal and some have already been made publicly available. For example, see our Deaccession Policy and our Preservation Policy. We’ve made great progress thanks to the fantastic example set by our friends at Purdue University Research Repository and others.

Preservation Infrastructure

Duke, DuraCloud, and Glacier

Durham, North Carolina, is a lovely city, close to the mountains and the beach and full of fantastic restaurants! Sometimes, though, your digital assets just need to get away from it all. Digital preservation demands some geographic diversity. No repository wants all of its data to be subject to a hurricane, of course! That’s why we’ve partnered with DuraCloud, a preservation-focused cloud provider, to store copies of our digital assets in geographically diverse locations. Our data now enjoys homes at Duke, at DuraCloud, and in Amazon Glacier!

To bring transparency to the process of remotely replicating our assets and validating the local and remote assets, we’ve recently implemented a process that externalizes these tasks from Fedora and delivers scheduled reports to stakeholders enumerating and detailing the health of their assets.
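To make the idea of validating assets concrete, here is a minimal Python sketch of a fixity check: it recomputes a checksum for each file and compares it with a previously recorded value. The manifest layout, the choice of SHA-256, and the names `sha256_of` and `fixity_report` are assumptions for illustration; this is not a description of how the DDR process or DuraCloud actually works.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_report(manifest, root):
    """Check files under root against a {relative_path: expected_digest} manifest.

    The manifest format is hypothetical and used only for this example.
    """
    report = {"ok": [], "mismatch": [], "missing": []}
    for relative_path, expected in sorted(manifest.items()):
        candidate = Path(root) / relative_path
        if not candidate.is_file():
            report["missing"].append(relative_path)
        elif sha256_of(candidate) == expected:
            report["ok"].append(relative_path)
        else:
            report["mismatch"].append(relative_path)
    return report
```

Running the same check against each copy of an asset, and sending the resulting report to its stakeholders on a schedule, is one simple way such health reports could be produced.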

Research and Development

The DDR has grown tremendously in the last year, and with it has grown the need to standardize and scale to demand. Writing Python to arrange files to conform to our Standard Ingest Format was a perfectly reasonable solution in early 2016. Likewise, programmatic reformatting of endangered file formats wasn’t feasible with the resources available at the time. We also didn’t need to worry about traffic scaling back then. Times have changed!

DDR staff are exploring tools that allow non-developers to easily ingest large amounts of material, as well as methods to identify and migrate files to better-supported formats. We are also planning for a more sustainable and durable architecture, including increased inter-application messaging, so that processes now handled within the repository can be moved to external servers.

A New Home Page for the Duke Digital Repository

Today is an eventful day for the Duke Digital Repository (DDR). Later today, I and several of my colleagues will present on the DDR at Day 1 of the Duke Research Computing Symposium. We’ll be introducing new staff who’ll focus on managing, curating, and preserving research data, as well as the role that the DDR will play as both a service and a platform. This event serves as a soft launch of our plans – which I wrote about last September – to support the work of researchers at Duke.

Out-of-the-box DDR home page of the past

At the same time, the DDR gets a new look, at least on its home page. For years, we’ve used a rather drab and uninformative page that was essentially the out-of-the-box rendering by Blacklight, our discovery and access layer in the repository stack. Last fall, our DDR Program Committee took up the task of revamping that page to reflect how we conceptualize the repository and its major program areas.

New DDR home page with aerial hero image and three program areas.

The page design will evolve with the DDR itself, but it went live earlier today. More information about the DDR initiative and our plans will follow in the coming months.