Infrastructure and Multispectral Imaging in the Library

As we continue to work on our “standard” full-color digitization projects, such as Section A and the William Gedney Photograph Collection (both multiyear efforts), we are also hard at work on Multispectral Imaging (MSI). We have been writing documentation and posting it to our knowledge base, building tools to track MSI requests, and establishing a dedicated storage space for MSI image stacks. Below are some high-level details about this infrastructure and the kinks we are ironing out of the MSI process. As with any new venture, the beginning can be messy and it is tedious to put all the details in order, but in the end it’s worth it.

MSI Knowledge Base

We established a knowledge base for documents related to MSI covering a wide variety of subjects: how-to articles, to-do lists, templates, notes taken during imaging sessions, technical support issues, and more. These documents will help us develop sound guidelines and workflows, which in turn will make our work in this area more consistent, efficient, and productive.

Dedicated storage space

Working with other IT staff, we have established a new server space specifically for MSI. This is a relief because, when we began testing the system, we had no dedicated space for storing MSI image stacks, and most of our established spaces were permission-restricted, which prevented our large MSI group from using them. We also had no file management strategy in place for MSI, which made for some messy file management. From our first demo through initial testing and the eventual purchase of the system, we used a variety of storage spaces and folder structures as we learned the system: our shared Library server, the Digital Production Center’s production server, Box, and Google Drive. Files were all over the place! In our new dedicated space, we have established standard folder structures and file management strategies, and all of our MSI image stacks are now stored in one place. Whew!
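To make the new space predictable, each newly captured item gets the same directory skeleton. Purely as an illustration (the stage and folder names below are hypothetical, not our actual production structure), a small script can lay out that skeleton for a new item:

```python
from pathlib import Path

# Hypothetical standard layout for one MSI item; the folder names used
# in production may differ.
STAGES = ["01_raw_captures", "02_flattened", "03_registered",
          "04_processed", "05_derivatives", "06_delivery"]

def make_item_folders(msi_root: str, item_id: str) -> Path:
    """Create a predictable folder skeleton for a newly captured item."""
    item_dir = Path(msi_root) / item_id
    for stage in STAGES:
        (item_dir / stage).mkdir(parents=True, exist_ok=True)
    return item_dir

if __name__ == "__main__":
    make_item_folders("/mnt/msi_storage", "example_item_0001")
```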

The Request Queue

In the beginning, once the MSI system was up and running, our group had a brainstorming session to identify a variety of material that we could use to test with and hone our skills on the new system. Initially this queue was a bulleted list of items in Basecamp. As we worked through the list, it was sometimes unclear what had already been done and which item was next. The process became more cumbersome because multiple people were working through the list at the same time, both on capture and processing, with no specific reporting mechanism to track who was doing what. We have recently built an MSI Request Queue that tracks items to be captured in a more straightforward, clear manner. We have included the title, barcode, and item information, along with the research question to be answered, its priority level, due date, requester information, and internal contact information. The MSI group will use this queue for a few weeks and then tweak it as necessary. No more confusion.
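For illustration only, here is how one row of that Request Queue might be modeled; the field names mirror the columns described above, and the example values are invented rather than taken from our actual spreadsheet:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class MSIRequest:
    """One row of the MSI Request Queue (illustrative field names only)."""
    title: str
    barcode: str
    item_info: str
    research_question: str
    priority: str                 # e.g. "high", "medium", "low"
    due_date: Optional[date]
    requester: str
    internal_contact: str
    captured: bool = False        # flipped once the item has been imaged

# Hypothetical example entry
request = MSIRequest(
    title="Example manuscript fragment",
    barcode="D00000000000",
    item_info="Single leaf, iron gall ink",
    research_question="Can obscured text be recovered?",
    priority="high",
    due_date=date(2017, 8, 1),
    requester="Rubenstein Research Services",
    internal_contact="DPC staff member",
)
```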

The Processing Queue

As described in a previous post, capturing with MSI produces many image stacks, each containing many files. On average, capturing one page can produce 6 image stacks totaling 364 images. There are 6 different stages of conversion/processing that an image stack goes through before it might be considered “done,” and the fact that everyone on the MSI team has other job responsibilities makes it difficult to carve out a large enough block of time to convert and process the image stacks through all of the stages. This made it difficult to know which items had been completely processed and which had not. We have recently built an MSI Processing Queue that tracks what stage of processing each item is in. We have included root file names, flat field information, PPI, and a column for each phase of processing to indicate whether or not an image stack has passed through that phase. As with the Request Queue, the MSI group will use this queue for a few weeks and then tweak it as necessary. No more confusion.
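Again as an illustration (the phase names below are placeholders, not the actual stage names in our workflow), a Processing Queue row can be thought of as a record with one completion flag per phase:

```python
from dataclasses import dataclass, field
from typing import Dict

# Placeholder names for the six processing phases; the real workflow
# uses its own stage names.
PHASES = ["phase_1", "phase_2", "phase_3", "phase_4", "phase_5", "phase_6"]

@dataclass
class ProcessingRecord:
    """One row of the MSI Processing Queue (illustrative only)."""
    root_file_name: str
    flat_field: str
    ppi: int
    phases: Dict[str, bool] = field(
        default_factory=lambda: {p: False for p in PHASES}
    )

    def mark_done(self, phase: str) -> None:
        self.phases[phase] = True

    def is_complete(self) -> bool:
        return all(self.phases.values())

record = ProcessingRecord(root_file_name="item0001_p001",
                          flat_field="ff_2017-06-01", ppi=600)
record.mark_done("phase_1")
print(record.is_complete())  # False until all six phases are marked done
```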

Duke University East Campus Progress Picture #27

As with most blog posts, the progress described above has been boiled down and simplified so as not to bore you to death, but it represents a fair amount of work nonetheless. Having dedicated storage and a standardized folder structure simplifies the management of a large number of files and puts them in a predictable structure. Streamlining the Request Queue establishes a clear path of work and provides enough information about each request to move forward with a clear goal in mind. The Processing Queue provides a snapshot of the state of processing across multiple requests, with enough information that any staff member familiar with our MSI process can complete a request. Establishing a knowledge base to document our workflows and guidelines ties everything together in an organized and searchable manner, making it easier to find information about established procedures and to troubleshoot technical problems.

It is important to put this infrastructure in place and build a strong foundation for Multispectral Imaging at the Library so it will scale in the future.  This is only the beginning!

_______________________

Want to learn even more about MSI at DUL?

A Summer Day in the Life of Digital Collections

A recent tweet from my colleague in the Rubenstein Library (pictured above) pretty much sums up the last few weeks at work. Although I rarely work directly with students and classes, I am still affected by the hustle and bustle in the library when classes are in session. Throughout the busy Spring I found myself saying, “Oh, I’ll have time to work on that over the Summer.” Now Summer is here, so it is time to make some progress on those delayed projects while keeping others moving forward. With that in mind, here is your late Spring and early Summer round-up of Digital Collections news and updates.

Radio Haiti

A preview of the soon-to-be-live Radio Haiti Archive digital collection.

The long-anticipated launch of the Radio Haiti Archives is upon us. After many meetings to review the metadata profile, discuss modeling relationships between recordings, and find a pragmatic approach to representing metadata in three languages in the Duke Digital Repository public interface, we are now in preview mode, and it is thrilling. Behind the scenes, Radio Haiti represents a huge step forward in the Duke Digital Repository’s ability to store and play back audio and video files.

You can already listen to many recordings via the Radio Haiti collection guide, and we will share the digital collection with the world in late June or early July. In the meantime, check out this teaser image of the homepage.

Section A

My colleague Meghan recently wrote about our ambitious Section A digitization project, which will result in creating finding aids for and digitizing 3000+ small manuscript collections from the Rubenstein Library. This past week the 12 people involved in the project met to review our workflow. Although we are trying to take a mass-digitization, streamlined approach to this project, there are still a lot of people and steps. For example, we spent about 20-30 minutes of our 90-minute meeting reviewing the various status codes we use on our giant Google spreadsheet and when to update them. I’ve also created a six-page project plan that encompasses both high- and medium-level views of the project. In addition to that document, each part of the process (appraisal, cataloging review, digitization, etc.) also has its own more detailed documentation. This project is going to last at least a few years, so taking the time to document every step is essential, as is agreeing on status codes and how to use them. It is a big process, but with every box the project gets a little easier.

Status codes for tracking our evaluation, remediation, and digitization workflow.
Section A Project Plan Summary

Diversity and Inclusion Digitization Initiative Proposals and Easy Projects

As Bitstreams readers and DUL colleagues know, this year we instituted two new processes for proposing digitization projects. Our second digitization initiative deadline has just passed (it was June 15), and in June and early July I will be working with the review committee to evaluate the new proposals as well as to reevaluate two proposals from the first round. I’m excited to say that we have already approved one project outright (the Emma Goldman papers), and we plan to announce more approved projects later this Summer.

We also codified “easy project” guidelines and have received several easy project proposals. It is still too soon to really assess this new track, but so far it is going well.

Transcription and Closed Captioning

Speaking of A/V developments, another large project planned for this Summer is to begin codifying our captioning and transcription practices. Duke Libraries has had a mandate to create transcriptions and closed captions for newly digitized A/V for over a year, and in that time we have been working with vendors on selected projects. Our next steps will serve two fronts. On the programmatic side, we need to review the time and expense our captioning efforts have incurred so far and see how we can scale those efforts to our backlog of publicly accessible A/V. On the technology side, I’ve partnered with one of our amazing developers to sketch out a multi-phase plan for storing captions and time-coded transcriptions and making them accessible and searchable in our user interface. The first phase goes into development this Summer. All of these efforts will no doubt be the subject of a future blog post.

Testing VTT captions of Duke Chapel Recordings in JWPlayer
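To give a sense of what time-coded captions look like under the hood, here is a minimal sketch that turns transcript segments into WebVTT cues, the format referenced in the caption above. The segment data and helper functions are illustrative, not part of our production pipeline.

```python
from datetime import timedelta

def vtt_timestamp(seconds: float) -> str:
    """Format seconds as an HH:MM:SS.mmm WebVTT timestamp."""
    td = timedelta(seconds=seconds)
    total = int(td.total_seconds())
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    millis = int((td.total_seconds() - total) * 1000)
    return f"{hours:02}:{minutes:02}:{secs:02}.{millis:03}"

def to_webvtt(segments):
    """segments: list of (start_seconds, end_seconds, text) tuples."""
    lines = ["WEBVTT", ""]
    for i, (start, end, text) in enumerate(segments, start=1):
        lines.append(str(i))
        lines.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}")
        lines.append(text)
        lines.append("")
    return "\n".join(lines)

# Hypothetical transcript segments
print(to_webvtt([(0.0, 4.5, "Welcome to Duke Chapel."),
                 (4.5, 9.0, "Today's reading comes from...")]))
```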

Summer of Documentation

My aspirational Summer project this year is to update our digital collections project tracking documentation, review/consolidate/replace/trash existing digital collections documentation, and work with the Digital Production Center to create a DPC manual. Admittedly, writing and reviewing documentation is not the most exciting Summer plan, but with so many projects and collaborators in the air, this documentation is essential to our productivity, our communication practices, and my personal sanity.

Late Spring Collection launches and Migrations

Over the past few months we launched several new digital collections as well as completed the migration of a number of collections from our old platform into the Duke Digital Repository.  

New Collections:

Migrated Collections:

…And so Much More!

In addition to the projects above, we continue to make slow and steady progress on our MSI system, explore using the FFV1 format for preserving selected moving image collections, plan the next phase of the Digital Collections migration into the Duke Digital Repository, think deeply about collection-level and structured metadata, prepare to launch the newly digitized Gedney images, integrate digital objects into finding aids, and more. No doubt some of these efforts will appear in subsequent Bitstreams posts. In the meantime, let’s all try not to let this Summer fly by too quickly!

Enjoy Summer while you can!

On TRAC: Assessment Tools and Trustworthiness

Duke Digital Repository is, among other things, a digital preservation platform and the locus of much of our work in that area.  As such, we often ponder the big questions:

  1. What is the repository?
  2. What is digital preservation?
  3. How are we doing?

What is the repository?

Fortunately, Ginny gave us a good start on defining the repository in Revisiting: What is the Repository?  It’s software, hardware, and collaboration. It’s processes, policies, attention, and intention. And while digital preservation is one of the repository’s focuses, digital preservation extends beyond the repository and should far outlive it.

What is digital preservation?

There are scores of definitions, but this Medium Definition from ALCTS is representative:

Digital preservation combines policies, strategies and actions to ensure access to reformatted and born digital content regardless of the challenges of media failure and technological change. The goal of digital preservation is the accurate rendering of authenticated content over time.

This is the short answer to the question: Accurate rendering of authenticated digital content over time.  This is the motivation behind the work described in Preservation Architecture: Phase 2 – Moving Forward with Duke Digital Repository.

How are we doing?

There are two basic methodologies for assessing this work: reactive and proactive. A reactive approach to digital preservation might be characterized by “Hey! We haven’t lost anything yet!”, which is why we prefer the proactive approach.

Digital preservation can be a pretty deep rabbit hole, and it can be an expensive proposition to attempt to mitigate the long tail of risk. Fortunately, the community of practice has developed tools to assist in the planning and execution of trustworthy repositories. At Duke, we have several years’ experience working within the framework of the Center for Research Libraries’ Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC) as the primary assessment tool by which we measure our efforts. Much of the earlier work to document our preservation environment and the supporting institutional commitment was focused on our DSpace repository, DukeSpace. A great deal has changed in the past three years, including significant growth in our team and scope. So, once again we’re working to measure ourselves against the standards of our profession and to use that process to inform our work.

There are three areas of focus in TRAC: Organizational Infrastructure; Digital Object Management; and Technologies, Technical Infrastructure, & Security. These cover a very wide and deep field and include things like:

  • Securing Service Level Agreements for all service providers
  • Documenting the organizational commitments of both Duke University and Duke University Libraries and sustainability plans relating to the repository
  • Creating and implementing routine testing of backup, remote replication, and restoration of data and relevant infrastructure
  • Creating and approving documentation on a wide variety of subjects for internal and external audiences

Back to the question: How are we doing?

Well, we’re making progress! Naturally, we’re starting by ensuring that the basic needs are met first: successfully preserving the bits, maximizing transparency and external validation that we’re not losing the bits, and working on a sustainable, scalable architecture. We have a lot of work ahead of us, of course. The boxes in the illustration are all the same size, but the work they represent is not. For example, the Disaster Recovery Plan at HathiTrust is 61 pages of highly detailed thoughtfulness. However, these works build on each other, so we’re confident that the work we’re doing on the supporting bodies of policy, procedure, and documentation will ease the path to a complete Disaster Recovery Plan.

The New and Improved SNCC Digital Gateway

It may only be six months old, but as of May 31, the SNCC Digital Gateway is sporting a new look. Since going live in December 2016, we’ve been doing assessment, talking to contemporary activists and movement veterans, and conducting user testing and student surveys. The feedback’s been overwhelmingly positive, but a few suggestions kept coming up: give people a better sense of who SNCC was right from the homepage, and make it more active; connect SNCC’s history to organizing today. As one of the young organizers put it, “What is it about SNCC’s legacy now that matters for people?” So we took those suggestions to heart and are proud to present a reworked, redesigned SNCC Digital Gateway. Keep reading for a breakdown of what’s new and why.

Today Section

The new Today section highlights important strategies and lessons from SNCC’s work and explores their usefulness to today’s struggles. Through short, engaging videos, contemporary activists talk about how SNCC’s work continues to be relevant to their organizing today. The nine framing questions and answers of today’s organizers speak to enduring themes at the heart of SNCC’s work: uniting with local people to build a grassroots movement for change that empowered Black communities and transformed the nation. Check out this example:

More Expansive Homepage

The new homepage is longer and gives visitors to the site more context and direction. It includes descriptions of who SNCC was and links users to The Story of SNCC, which tells an expansive but concise history of SNCC’s work. It features videos from the new Today section, and gives users a way to explore the site through themes like voting rights, the organizing tradition, and Black Power.

Themes


Want to know more about voting rights? Black Power? Or are you not as familiar with SNCC’s history and need an entry point? The theme buttons on the homepage give users a window into SNCC’s history through particular aspects of the organization’s work. Theme pages feature select profiles and events focused on a central component of SNCC’s organizing. From there, click through the documents or follow the links to dig deeper into the story.

Navigation Updates

To improve navigation for the site, we’ve changed the name of the History section to Timeline and the former Perspectives to Our Voices. We’ve also moved the About section to the footer to make space for the new Today section.

Have suggestions? Comments? We’re always interested in what you’re thinking. Add a comment or e-mail us at snccdigital@gmail.org.

The ABCs of Digitizing Section A

I’m not sure anyone who currently works in the library has any idea when the phrase “Section A” was first coined as a call number for small manuscript collections. Before the library’s renovation, before we barcoded all our books and boxes — back when the Rubenstein was still RBMSCL, and our reading room carpet was a very bright blue — there was a range of boxes holding single-folder manuscript collections, arranged alphabetically by collection creator. And this range was called Section A.

Box 175 of Section A

Presumably there used to be a Section B, Section C, and so on — and it could be that the old shelf ranges were tracked this way, I’m not sure — but the only one that has persisted through all our subsequent stacks moves and barcoding projects has been Section A. Today there are about 3900 small collections held in 175 boxes that make up the Section A call number. We continue to add new single-folder collections to this call number, although thanks to the miracle of barcodes in the catalog, we no longer have to shift files to keep things in perfect alphabetical order. The collections themselves have no relationship to one another except that they are all small. Each collection has a distinct provenance, and the range of topics and time periods is enormous — we have everything from the 17th to the 21st century filed in Section A boxes. Small manuscript collections can also contain a variety of formats: correspondence, writings, receipts, diaries or other volumes, accounts, some photographs, drawings, printed ephemera, and so on. The bang-for-your-buck ratio is pretty high in Section A: though small, the collections tend to be well-described, meaning that there are regular reproduction and reference requests. Section A is used so often that in 2016, Rubenstein Research Services staff approached Digital Collections to propose a mass digitization project, re-purposing the existing catalog description into digital collections within our repository. This will allow remote researchers to browse all the collections easily, and also reduce repetitive reproduction requests.

This project has been met with both enthusiasm and trepidation from staff since last summer, when we began to develop a cross-departmental plan to appraise, enhance the description of, and digitize the 3900 small manuscript collections that are housed in Section A. It took us a bit of time, partially due to the migration and other pressing IT priorities, but this month we are celebrating a major milestone: we have finally launched our first two Section A collections, meant to serve as a proof of concept as well as a chance for us to firmly define the project’s goals and scope. Check them out: Abolitionist Speech, approximately 1850, and the A. Brouseau and Co. Records, 1864-1866. (Appropriately, we started by digitizing the collections that begin with the letter A.)

A. Brouseau & Co. Records carpet receipts, 1865

Why has it been so complicated? First, the sheer number of collections is daunting; while there are plenty of digital collections with huge item counts already in the repository, they tend to come from a single archival collection or just a few of them. Each newly digitized Section A collection will be a new collection in the repository, which has significant workflow repercussions for the Digital Collections team. There is no unifying thread for Section A collections, so we are not able to apply metadata in batch as we would normally do for outdoor advertising or women’s diaries. Rubenstein Research Services and Library Conservation Department staff have been going box by box through the collections (there are about 25 collections per box) to identify out-of-scope collections (typically reference material, not primary sources), preservation concerns, and copyright concerns; these are excluded from the digitization process. Technical Services staff are also reviewing and editing the Section A collections’ description. This project has led to our enhancing some of our oldest catalog records — updating titles, adding subject or name access, and upgrading the records to RDA, a relatively new descriptive standard. Using scripts and batch processes (details on GitHub), the refreshed MARC records are converted to EAD files for each collection, and the digitized folder is linked through ArchivesSpace, our collection management system. We crosswalk the catalog’s name and subject access data to both the finding aid and the repository’s metadata fields, allowing each collection to be discoverable through the Rubenstein finding aid portal, the Duke Libraries catalog, and the Duke Digital Repository.
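The conversion scripts themselves live on GitHub; as a simplified, hypothetical sketch of the crosswalk idea (not our actual scripts), pulling name and subject access points out of a MARC record with the pymarc library might look something like this:

```python
from pymarc import MARCReader

# MARC tags commonly used for name and subject access points
NAME_TAGS = ["100", "110", "111", "700", "710", "711"]
SUBJECT_TAGS = ["600", "610", "611", "650", "651"]

def access_points(record):
    """Collect name and subject headings from a MARC record."""
    names = [f.format_field() for f in record.get_fields(*NAME_TAGS)]
    subjects = [f.format_field() for f in record.get_fields(*SUBJECT_TAGS)]
    return {"names": names, "subjects": subjects}

with open("section_a_records.mrc", "rb") as fh:   # hypothetical file name
    for record in MARCReader(fh):
        points = access_points(record)
        # These values would then be written into both the EAD finding aid
        # and the repository's metadata fields.
        print(points)
```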

It has been really exciting to see the first two collections go live, and there are many more already digitized and just waiting in the wings for us to automate some of our linking and publishing processes. Another future development that we expect will speed up the project is a batch ingest feature for collections entering the repository. With over 3000 collections to ingest, we are eager to streamline our processes and make things as efficient as possible. Stay tuned here for more updates on the Section A project, and keep an eye on Digital Collections if you’d like to explore some of these newly-digitized collections.

The Research Data Team: Hitting the Ground Running

There has been a lot of blogging over the last year about the Duke Digital Repository’s development and implementation, about its growth as a platform and a program, and about the creation of new positions to support research data management and curation. My fellow digital content analyst also recently posted about how we four new hires have been creating and refining our research data curation workflow since beginning our positions at Duke this past January. It’s obviously been (and continues to be) a very busy time here for the repository team at Duke Libraries, seasoned and new staff alike.

Besides the research data workflows between our two departments, what other things have the data management consultants and the digital content analysts been doing? In short, we’ve been busy!

 

In addition to envisioning stakeholder needs (an exercise we do continuously), we’ve received and ingested several data collections this year, which has given us an opportunity to learn from experience as well. We have been tracking and documenting the types of data we’re receiving, the various needs that these types of data and depositors have, how we approach those needs (including investigating and implementing any additional tools that may help us better address them), how our repository displays the data and associated metadata, and the time spent on our management and curation tasks. Some of this documentation takes the form of spreadsheets, some consists of draft policies that will first be reviewed by the library’s research data working group and then by a program committee, and some is simply brain dumps on topics that require further, more structured investigation by developers, the metadata architect, subject librarians, and other stakeholders. These documents live in either our shared online folder or our shared Box account and, when a wider Duke Libraries and/or public audience is required, are moved to our departments’ content collaboration platforms (currently Confluence/Jira and Basecamp). The collaborative environments of these platforms support the dynamic nature of our work, particularly as our program takes form.

We also value face-to-face discussion, so we hold weekly meetings to talk through all of this work (we prefer to meet outside when the weather is nice, and because squirrels are awesome).

One of the most exciting, and at times challenging, aspects of where we are is that we are essentially starting from the ground up and therefore able to develop procedures and features (and re-develop, and on and on again) until we find fits that best accommodate our users and their data. We rely heavily on each other’s knowledge about the research data field, and we also engage in periodic environmental scans of other institutions that offer data management and curation services.

When we began in January, we all considered the first 6-9 months as a “pilot phase”, though this description may not be accurate. In the minds of the data management consultants and the digital content analysts, we’re here and ready. Will we run into situations that require an adjustment to our procedures? Absolutely. It’s the nature of our work. Do we want feedback from the Duke community about how our services are (or are not) meeting their needs? Without a doubt. And will the DDR team continue to identify and implement features to better meet end-user needs? Certainly. We fully expect to adjust and readjust our tools and services, with the overall goal of fulfilling future needs before they’re even evident to our users. So, as always, keep watching to see how we grow!

A Tough Nut to Crack: Developing Digital Repositories

Folks, developing digital repositories is hard.  There are so many different layers of complexity built into the stack, compounded by the unique variety of end-users, or stakeholders, that we serve.

Consider the breadth of this work:

Starting at the bottom of the stack, you have our Preservation layer.  This is where we capture your bits, and ensure the long-term preservation of your digital assets.  But it goes well beyond just logging a single record in a database.  It involves capturing the data stream, writing that file and all associated files (metadata) to storage, replicating the data to various geographically dispersed servers, validating the ingest, logging the validation, ensuring successful recovery of replicated assets, and more.
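As a toy illustration of just one slice of that layer, the sketch below checks that a stored file and its replica still match the checksum recorded at ingest. It is an assumption-laden simplification of fixity checking, not a description of the repository’s actual preservation software.

```python
import hashlib
from pathlib import Path

def sha256(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def validate_copies(recorded_checksum: str, primary: Path, replica: Path) -> bool:
    """Confirm both the primary copy and its replica match the ingest checksum."""
    ok = True
    for label, copy in (("primary", primary), ("replica", replica)):
        actual = sha256(copy)
        if actual != recorded_checksum:
            print(f"FIXITY FAILURE ({label}): {copy} expected "
                  f"{recorded_checksum}, got {actual}")
            ok = False
    return ok

# Hypothetical paths and checksum value:
# validate_copies("<checksum recorded at ingest>",
#                 Path("/storage/primary/item0001.tif"),
#                 Path("/storage/replica/item0001.tif"))
```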

All of that comes post-ingest.  I’ll not even belabor the complexities of data modeling and ingest here, but you get the idea… it’s hairy stuff.  Receiving and massaging a highly diverse body of data into a package appropriate for homogeneous ingest is a monumental effort in normalization.

Move up the stack and you reach our Curation layer. Currently we have a single administrative application that facilitates management and curatorial activities for our digital objects following ingest. Roles and access controls can be managed here, in addition to various types of metadata (descriptions of the items), and so on. A variety of other applications are also managed at this layer; they interact with, and store, various values that fuel display and functionality within the user interface. This layer is quickly evolving in a way that necessitates diversification. We have found that a single monolithic application is not a one-size-fits-all solution for stakeholders who are in the business of data production and curation; it is at this layer that we feel increasing pressure to integrate and interoperate with a myriad of other tools and platforms for resource and data management. This is tricky business, as each of these tools handles data in different ways.

Finally, we have the Discovery layer.  The user interface.  This is what the public sees and consumes.  It’s where access to ingested materials occurs.  It is itself an application requiring significant custom development to meet the needs of various programs and collections of materials.  It is tightly coupled with the Curation layer, and therefore highly complex and customized to meet the needs of different focal areas.  Search functionality is yet another piece of complexity that requires maintenance and customization of a central index.  Nothing is OOTB (out of the box).  Everything requires configuration and customization.

And ALL of this, all of it, is interrelated. Highly coupled and complex. Few things reap easy wins, and our work often challenges foundational assumptions that came well before. It’s an exercise in balancing technical debt and moving forward without reinventing the wheel every six months.

What I have presented here is a simplistic view of our software ecosystem. It’s just a snapshot of the various puzzle pieces that support the operation of a production repository. In general, digital repositories are still fairly new on the scene. No one has them figured out entirely, and everyone does them a little bit differently. There’s a strength to that, which manifests in diverse platforms and a breadth of development possibilities. There’s also a weakness to it, because there is no cookie-cutter approach that defines an easy path forward.

So it’s an exercise in evolution.  In iteration.  In patience.  In requirements definition.  We’re not going to always get it right, and our efforts will largely take a bit of time and experimentation, but we’re constantly working to improve, to enhance, and to mature our repository platform to meet the growing and evolving needs of our University.

So, here’s to many years of hard work ahead!  And many successful collaborations with our Duke community to realize our repository’s future.  We’re ready if you are!

Multispectral Imaging: What’s it good for?

At the beginning of March, the multispectral imaging working group presented details about the imaging system and the group’s progress so far to other library staff at a First Wednesday event. Representatives from Conservation Services, Data and Visualization Services, the Digital Production Center, the Duke Collaboratory for Classics Computing (DC3), and the Rubenstein Library each shared their involvement and interest in the imaging technology. Our presentation attempted to answer some basic questions about how the equipment works and how it can be used to benefit the scholarly community. You can view a video of that presentation here.

Some of the images we have already shared illustrate a basic benefit or goal of spectral imaging for books and manuscripts: making obscured text visible. But what else can this technology tell us about the objects in our collections? As a library conservator, I am very interested in the ways that this technology can provide more information about the composition and condition of objects, as well as inform conservation treatment decisions and document their efficacy.

Conservators and conservation scientists have been using spectral imaging to help distinguish between and characterize materials for some time. For example, pigments, adhesives, or coatings may appear very different under ultraviolet or infrared radiation. Many labs have the equipment to image under a few wavelengths of light, but our new imaging system allows us to capture at a much broader range of wavelengths and compare the captures in an image stack.

Adhesive samples under visible and UV light.
(Photo credit Art Conservation Department, SUNY Buffalo State)
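As a very rough illustration of what “comparing captures in an image stack” can mean computationally, the snippet below loads two wavelength captures of the same page and visualizes where they differ; regions that respond differently under the two illuminants stand out. The file names and the simple difference approach are illustrative assumptions, not our actual processing workflow.

```python
import numpy as np
from PIL import Image

# Hypothetical captures of the same page under two illumination bands
visible = np.asarray(Image.open("page01_visible_white.tif").convert("F"))
infrared = np.asarray(Image.open("page01_infrared_940nm.tif").convert("F"))

def normalize(band: np.ndarray) -> np.ndarray:
    """Scale a band to the 0-1 range so captures are comparable."""
    return (band - band.min()) / (band.max() - band.min() + 1e-9)

difference = np.abs(normalize(visible) - normalize(infrared))

# Rescale the difference to 8-bit and save; bright areas mark material that
# responds differently under the two bands (e.g. certain inks or repairs).
Image.fromarray((difference * 255).astype(np.uint8)).save("page01_band_difference.png")
```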

Spectral imaging can help to identify the materials used to make or repair an object by the way they react under different light sources. Correct identification of components is important in making the best conservation treatment decisions and might also be used to establish the relative age of a particular object or to verify its authenticity. While spectral imaging offers the promise of a non-destructive tool for identification, it does have limitations, and other analytical techniques may be required.

Pigment and dye-based inks under visible and infrared light.
(Photo credit Image Permanence Institute)

Multispectral imaging offers new opportunities to evaluate and document the condition of objects within our collections. Previous repairs may be so well executed or intentionally obscured that the location or extent of the repair is not obvious under visible light. Areas of paper or parchment that have been replaced or have added reinforcements (such as linings) may appear different from the original when viewed under UV radiation. Spectral imaging can also provide better visual documentation of the degradation of inks (see image below) or of damage from mold or water that is not apparent under visible light.

Iron gall ink degradation.
(Jantz MS#124, Rubenstein Library)

This imaging equipment provides opportunities for better measuring the effectiveness of the treatments that conservators perform in-house. For example, a common treatment that we perform in our lab is the removal of pressure sensitive tape repairs from paper documents using solvents. Spectral imaging before, during, and after treatment could document the effectiveness of the solvents or other employed techniques in removing the tape carrier and adhesive from the paper.

Tape removal before and during treatment under visible and UV light.
(Photo credit Art Conservation Department, SUNY Buffalo State)

Staff from the Conservation Services department have a long history of participating in the library’s digitization program in order to ensure the safety of fragile collection materials. Our department will continue to serve in this capacity for special collections objects undergoing multispectral imaging to answer specific research questions; however, we are also excited to employ this same technology to better care for the cultural heritage within our collections.

______

Want to learn even more about MSI at DUL?

Going with the Flow: building a research data curation workflow

Why research data? Data generated by scholars in the course of investigation are increasingly being recognized as outputs nearly equal in importance to the scholarly publications they support. Among other benefits, the open sharing of research data reinforces unfettered intellectual inquiry, fosters reproducibility and broader analysis, and permits the creation of new data sets when data from multiple sources are combined. Data sharing, though, starts with data curation.

In January of this year, Duke University Libraries brought on four new staff members (two Research Data Management Consultants and two Digital Content Analysts) to engage in this curatorial effort, and we have spent the last few months mapping out and refining a research data curation workflow to ensure best practices are applied to managing data before, during, and after ingest into the Duke Digital Repository.

What does this workflow entail? A high-level overview of the process looks something like the following:

After collecting their data, the researcher will take what steps they can to prepare it for deposit. This generally means tasks like cleaning and de-identifying the data, arranging files in a structure expected by the system, and compiling documentation to ensure that the data is comprehensible to future researchers. The Research Data Management Consultants will be on hand to help guide these efforts and provide researchers with feedback about data management best practices as they prepare their materials.

Our form for metadata capture

Depositors will then be asked to complete a metadata form and electronically sign a deposit agreement defining the terms of deposit. After we receive this information, someone from our team will invite the depositor to transfer their files to us, usually through Box.

Consultant tasks

At this stage, the Research Data Management Consultants will begin a preliminary review of the researcher’s data by performing a cursory examination for personally identifying or protected health information, inspecting the researcher’s documentation for comprehensibility and completeness, analyzing the submitted metadata for compliance with the research data application profile, and evaluating file formats for preservation suitability. If they have any concerns, they will contact the researcher to make suggestions about ways to better align the deposit with best practices.
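One small, automatable piece of that review is screening file formats against a list of preservation-friendly formats and flagging file names that hint at personally identifying information. The format list, keywords, and function below are hypothetical illustrations, not our actual review criteria or tooling.

```python
from pathlib import Path

# Hypothetical examples of preservation-friendly extensions and PII-suggestive keywords
PREFERRED_FORMATS = {".csv", ".txt", ".tif", ".pdf", ".json", ".xml", ".wav"}
PII_KEYWORDS = {"ssn", "dob", "name", "address", "email", "mrn"}

def review_deposit(deposit_dir: str) -> dict:
    """Flag files in non-preferred formats and names suggesting possible PII."""
    findings = {"nonpreferred_formats": [], "possible_pii": []}
    for path in Path(deposit_dir).rglob("*"):
        if not path.is_file():
            continue
        if path.suffix.lower() not in PREFERRED_FORMATS:
            findings["nonpreferred_formats"].append(str(path))
        if any(keyword in path.stem.lower() for keyword in PII_KEYWORDS):
            findings["possible_pii"].append(str(path))
    return findings

# findings = review_deposit("/deposits/example_study")
# Anything flagged here would prompt a conversation with the researcher,
# not an automatic rejection.
```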

Analyst tasks

When the deposit is in good shape, the Research Data Management Consultants will notify the Digital Content Analysts, who will finalize the file arrangement and migrate some file formats, generate and normalize any necessary or missing metadata, ingest the files into the repository, and assign the deposit a DOI. After the ingest is complete, the Digital Content Analysts will carry out some quality assurance on the data to verify that the deposit was appropriately and coherently structured and that metadata has been correctly assigned. When this is confirmed, they will publish the data in the repository and notify the depositor.

Of course, this workflow isn’t a finished piece; we hope to continue to clarify and optimize the process as we develop relationships with researchers at Duke and receive more data. The Research Data Management Consultants in particular are enthusiastic about the opportunity to engage with scholars earlier in the research life cycle in order to help them better incorporate data curation standards in the beginning phases of their projects. All of us are looking forward to growing into our new roles, while helping to preserve Duke’s research output for some time to come.

Rethinking Repositories at CNI Spring ’17

One of the main areas of emphasis for the CNI Spring 2017 meeting was “new strategies and approaches for institutional repositories (IR).” A few of us at UNC and Duke decided to plug into the zeitgeist by proposing a panel to reflect on some of the ways that we have been rethinking – or even just thinking about – our repositories.

