Tag Archives: Repository

A Preview of MorphoSource 2 Beta

January 20, 2021 Jocelyn Triplett

It’s an exciting time for the MorphoSource team, as we work to launch the MorphoSource 2 Beta application next Wednesday!

The new application improves and expands upon the original MorphoSource, a repository for 3D research data, and is being built using Hyrax, an open-source digital repository application widely implemented by libraries to manage digital repositories and collections. The team has been working on the site for the last two and a half years, and is looking forward to our efforts being made available to the MorphoSource community. At launch, users will be able to access records for over 140,000 media files, contributed by 1,500 researchers from all over the world.

While the current site is still available for browsing at www.morphosource.org, we are migrating the repository data over to the new site in preparation for the launch, and have paused the ingest of new data sets. When the migration is complete, users will be able to access the new application at the current URL. Users with an account on the old site will be able to log in to the new site using their MorphoSource 1 credentials.

In my last post in June, I described some of the features that were in development at that time. In this post, I’ll highlight a few recent additions with screenshots from the beta site: Browse, Search, and User Dashboards.

Browse

Browse pages have been added as a quick entry point for users to discover data in several different ways. Users can use these pages to immediately access media, biological specimens, cultural heritage objects, organizations, teams, or projects.

Media Types and Modalities: Users can view all media records of a specific file type, such as image, CT image series, or mesh or point cloud. There are also links to records created by different methods, such as X-Ray, Magnetic Resonance Imaging, or Photogrammetry.

Physical Object Types: Users can follow links to view either all the Biological Specimens or all the Cultural Heritage Objects in MorphoSource.

Biological Taxonomy: Users can find specimen records through the taxonomy browse by drilling down through the taxonomic ranks. The MorphoSource taxonomy records have been imported from the GBIF Backbone Taxonomy or have been created by MorphoSource users.

Taxonomy Browse Page
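To illustrate how drilling down through ranks narrows a set of specimen records, here is a minimal Python sketch. The records, the `lineage` field, and the `children_at` helper are all hypothetical and are not taken from MorphoSource or GBIF; the sketch only shows the general idea of counting records under each child of the currently selected taxon.

```python
from collections import Counter

# Hypothetical specimen records; each lineage runs from higher to lower rank.
SPECIMENS = [
    {"id": "spec-1", "lineage": ("Animalia", "Chordata", "Reptilia", "Testudines")},
    {"id": "spec-2", "lineage": ("Animalia", "Chordata", "Mammalia", "Primates")},
    {"id": "spec-3", "lineage": ("Animalia", "Chordata", "Mammalia", "Rodentia")},
    {"id": "spec-4", "lineage": ("Animalia", "Arthropoda", "Insecta", "Coleoptera")},
]


def children_at(specimens, path):
    """Count specimens under each taxon one rank below the selected path."""
    path = tuple(path)
    depth = len(path)
    counts = Counter()
    for spec in specimens:
        lineage = spec["lineage"]
        # Only lineages that start with the selected path and go deeper count.
        if lineage[:depth] == path and len(lineage) > depth:
            counts[lineage[depth]] += 1
    return dict(counts)


# Start at the top of the tree, then drill down two ranks.
top_level = children_at(SPECIMENS, ())
under_chordata = children_at(SPECIMENS, ("Animalia", "Chordata"))
print(top_level)
print(under_chordata)
```

Each call answers the question a browse page has to answer at every click: which taxa sit directly below the current one, and how many records does each contain?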

Projects: Projects are user-created groupings of media and specimens. From the browse page, projects can be searched by title and sorted by title, description, team, creator, or number of associated media or objects.

MorphoSource 2 Project Browse — Project Browse Page

Teams: Teams are groups of MorphoSource users that share management of media and team projects. A Team may be associated with an organization. The Team browse page lets users search and sort teams in a similar way to the Projects browse page.

Organizations: Lastly, users can view all of the organizations that have biological specimens or cultural heritage objects in MorphoSource. An organization may be an institution, department, collection, facility, lab, or other group. From the Organizations browse page, users can search by name and sort by parent institution name or institution code.

Faceted Searching

In addition to the browse pages, records for Media, Biological Specimens, Cultural Heritage Objects, Organizations, Teams, and Projects can also be found through the MorphoSource search interface. Searching has been customized for the different record types to include relevant facets. The different search categories can be chosen from the dropdown next to the search box ‘Go’ button.

MorphoSource 2 Beta Media Search Results — Media Search Results

Search results for media records can be faceted by file type, modality, object type (biological specimen or cultural heritage object), organization, tag, or membership in a team or project, while search results for objects can be limited by object type, creator, organization, taxonomy, associated media types, associated media tags, and membership of associated media in a team or project. Organization and Team/Project searches similarly have their own sets of facets.

MorphoSource 2 Object Search — Biological Specimen and Cultural Heritage Object Search Results
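For readers unfamiliar with faceting, the short Python sketch below shows the basic mechanics: count the values of a field across the current results, then narrow the results when a value is selected. The records and field names are invented for illustration and this is not MorphoSource's implementation; in a production repository this work would normally be delegated to a search index rather than done in application code.

```python
from collections import Counter

# Hypothetical media records; the field names are illustrative only.
MEDIA = [
    {"title": "Skull scan", "file_type": "CT image series",
     "modality": "X-Ray", "object_type": "biological specimen"},
    {"title": "Femur mesh", "file_type": "mesh",
     "modality": "X-Ray", "object_type": "biological specimen"},
    {"title": "Vase model", "file_type": "mesh",
     "modality": "Photogrammetry", "object_type": "cultural heritage object"},
]


def facet_counts(records, field):
    """Count how many records carry each value of one facet field."""
    return Counter(record[field] for record in records)


def apply_facets(records, selected):
    """Keep only the records that match every selected facet value."""
    return [
        record for record in records
        if all(record.get(field) == value for field, value in selected.items())
    ]


# Narrow the results to meshes, then see which modalities remain.
meshes = apply_facets(MEDIA, {"file_type": "mesh"})
print(facet_counts(meshes, "modality"))
```

Because the counts are recomputed on the narrowed list, each facet always reflects what is still available after the previous selections.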

User Dashboards

Users who register an account on the site will have access to a dashboard that enables them to manage their data downloads. The dashboard is accessed by clicking on the profile icon at the top right of the site, and will open to the user’s media cart. The media cart contains two sections – the top holds all media items that the user currently has permission to download, while the bottom has media items with a restricted status where download has not been requested or approved:

MorphoSource 2 Beta Media Cart — Default User Dashboard

Users who have been granted contributor access to the site will have a dashboard that opens to the media and objects that they have contributed:

MorphoSource 2 Beta Contributor Dashboard — Contributor Dashboard

From the menu at the left, all users can access their previous downloads, or projects, teams, or other repository content to which they have been granted access, and manage their user profile. In addition, contributors can also create and manage projects and teams.

We hope that the browse, search, and dashboard enhancements, along with the other features we have been working on over the last couple of years, will enable users to easily discover and manage data sets in MorphoSource. And although we are looking forward to the launch, we are also excited to continue working on the site, and will be adding even more features in the near future.

MorphoSource

MorphoSource: Features in Development

June 8, 2020 Jocelyn Triplett

For the last two years, developers in Software Services and Duke’s department of Evolutionary Anthropology have been working to rebuild MorphoSource, a repository for 3D research data representing physical objects, primarily biological specimens. MorphoSource 2.0 is being built using Hyrax, an open-source digital repository application widely implemented by libraries to manage digital repositories and collections. While Hyrax already provided much of the core functionality needed for management, access, and preservation of our data, the MorphoSource team has been customizing the application and adding additional features to tailor it to the needs of our users.

As a preview, here are some of the features we’re developing for the MorphoSource community:

Guided Submission Process

MorphoSource is open to subject experts, collection curators, and the public to submit their data and make it accessible to others. The MorphoSource submission process will guide users in entering metadata that provides additional description and context for the files being uploaded. Users will be able to save information about the specimen that was scanned, the equipment that was used to capture the 3D data, the data capture process, and related media in MorphoSource. The form is multi-step and nested, with different fields available or data pre-filled depending on what the user has already entered in earlier sections of the form. The user also has opportunities to search for related organizations, devices, specimens, and media to link to their submission. Once the submission process is complete, depositors are able to return to their data page to edit metadata and redefine relationships to other MorphoSource records.

Morphosource data submission screenshots — A user proceeds through the MorphoSource media submission process.

In the gif above, a MorphoSource user proceeds through the submission process. If the media being uploaded (such as an image stack or 3D mesh) is a derivative of another object that is already in MorphoSource, the user can search for that media and link it to their submission, whereupon related metadata will be auto-filled for them. If it is a totally new submission, the user will proceed from filling in information about the object that was scanned (ownership, taxonomy, and descriptive details) to devices and processing steps used before attaching their files. This ability to nest, search, and associate metadata from the organization level all the way down to an individual object component in a single interface is unique to MorphoSource and is a substantial addition to Hyrax’s code.

Displaying and Updating Media Records

Screenshot of a Media Record in MorphoSource — A media record in MorphoSource.

Following submission, the depositor can view their completed media record, as shown above. Data owners (and other users who have been granted edit permissions by the data owner) have the ability to freely edit their submissions at any time. The edit view (below) allows the user to move between tabs to update metadata or change links to other records in MorphoSource. These tabs offer a fast summary of the types of associated data that are typically collected for complex objects and digitization strategies. This system is also designed to encourage best practices in project and object documentation by reminding the user of critical metadata categories.

3D Viewer

Through a collaboration with Mnemoscene, MorphoSource has created Aleph, a web viewer for 3D models and volumes that can be used by itself or as an extension to the popular library document viewer Universal Viewer. A live example of a specimen in the viewer is below. You can rotate the object by clicking and dragging inside the frame, or switch between slices and volume view by changing the mode under tools.

The Media Cart and Restricted Downloads

A media depositor can choose from several different publication statuses for their data, allowing the data owner to retain different levels of control of both their data sets and the metadata record describing their deposit. While some may choose to publish both their metadata and data sets with an open status, allowing other users to freely view and download them, it is also possible for depositors to restrict either their records or data to other users or organizations, or require that a user is granted permission before they are permitted to download data.

When data owners choose to publish using restricted download, other users are able to view metadata and preview the 3D data in the Aleph viewer, but can’t download the data set until they are granted permission by the data owner, and are required to fill out a request including the manner they intend to use the data. Data owners can easily review and manage these download requests from their user dashboard.

Above is a view of a user’s media cart, where they collect media they intend to download. The top section has media items the user is free to download, either because the media was published with an open publication status or because the user was approved to download by the media owner. Users can download any or all of these items at one time. The bottom section of the page contains items with a restricted download publication status, and allows the user to request these items individually or as a group. Users can also track the status of their request from this page.
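The two-section split of the media cart can be expressed as a simple rule. The Python sketch below is only an illustration of that rule: the class, the field names, and the status strings are assumptions made for the example and do not come from the MorphoSource code.

```python
from dataclasses import dataclass


@dataclass
class CartItem:
    media_id: str
    # Assumed status values for this sketch: "open" or "restricted_download".
    publication_status: str
    # Assumed values: "not_requested", "pending", "approved", or "denied".
    request_status: str = "not_requested"


def can_download(item):
    """An item is downloadable if it is open or the owner approved a request."""
    return item.publication_status == "open" or item.request_status == "approved"


def split_cart(items):
    """Return (downloadable, restricted) lists, mirroring the two cart sections."""
    downloadable = [item for item in items if can_download(item)]
    restricted = [item for item in items if not can_download(item)]
    return downloadable, restricted


cart = [
    CartItem("media-1", "open"),
    CartItem("media-2", "restricted_download", "approved"),
    CartItem("media-3", "restricted_download", "pending"),
    CartItem("media-4", "restricted_download"),
]
top_section, bottom_section = split_cart(cart)
print([item.media_id for item in top_section])
print([item.media_id for item in bottom_section])
```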

In the next image, a data owner views requests to download their media. Requests are grouped by requesting user and then the intended use of the data. Data owners can approve, deny, or clear any or all of the requests. When approving a request, the data owner specifies the amount of time that the media will be available for the requester to download. This person can also clear a request if they want more information from the requester before they approve the download. Requests that have already been decided are available for review in the Previous Requests tab.
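As a rough sketch of the request workflow described above, the Python below groups requests by requesting user and then by intended use, and attaches a download window when a request is approved. All names, status strings, and the choice to measure the window in days are assumptions for illustration, not details of the MorphoSource implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class DownloadRequest:
    requester: str
    intended_use: str
    media_id: str
    status: str = "pending"  # assumed values: "pending", "approved", "denied"
    expires_at: Optional[datetime] = None


def group_requests(requests):
    """Group requests by requesting user, then by intended use."""
    grouped = defaultdict(lambda: defaultdict(list))
    for request in requests:
        grouped[request.requester][request.intended_use].append(request)
    return grouped


def approve(request, days, now):
    """Approve a request and record when the download window closes."""
    request.status = "approved"
    request.expires_at = now + timedelta(days=days)


def is_downloadable(request, now):
    """A request allows download only while its approval window is open."""
    return (
        request.status == "approved"
        and request.expires_at is not None
        and now < request.expires_at
    )


requests = [
    DownloadRequest("user-a", "teaching", "media-1"),
    DownloadRequest("user-a", "teaching", "media-2"),
    DownloadRequest("user-b", "research", "media-1"),
]
grouped = group_requests(requests)

now = datetime(2020, 6, 8)
approve(requests[0], days=30, now=now)
print(is_downloadable(requests[0], now + timedelta(days=10)))
print(is_downloadable(requests[0], now + timedelta(days=31)))
```

Passing `now` in explicitly, rather than reading the clock inside the functions, keeps the expiry rule easy to check.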

Organizational Teams

Organizational teams are the final substantial addition to the new MorphoSource. Any organization, such as a university, museum, or department, can create an organizational team that stores metadata about the team and assigns roles to individual members. Members of the organizational team can also curate team projects, grant other users access to the team’s media, edit the organization’s metadata, and view any media created from objects in their collections. Below is the public view of one of our first sample organizational teams, the Nasher Museum.

Screenshot of an Organizational Team in MorphoSource — An organization page in MorphoSource. Image via Boyer, Silverton, and Winchester, 2020. MorphoSource: Creating a 3D web repository capable of archiving complex workflows and providing novel viewing experiences

The MorphoSource team is looking forward to unveiling the beta version of MorphoSource 2.0 later this year. In the meantime, please visit the current repository at www.morphosource.org. For further reading, check out the recent EuropeanaTech article MorphoSource: Creating a 3D web repository capable of archiving complex workflows and providing novel viewing experiences.

Behind the Scenes, MorphoSource, Technology

Describing 3D Data in MorphoSource 2.0

January 17, 2020 Jocelyn Triplett

Header Image: Collection of extinct and extant turtle skull microCT scans in MorphoSource: bit.ly/3DFossilTurtles

MorphoSource (www.morphosource.org) is a publicly accessible repository for 3D research data, especially data that represents biological specimens. Developers in Evolutionary Anthropology and the Library’s Software Services department have been working to rebuild the application, improving upon the current site’s technology and features. An important part of this rebuild is implementing a more robust data model that will let our users efficiently discover, curate, disseminate, and preserve their data.

A typical deposit in MorphoSource is a file or files that represent a scan of all or part of an organism – such as a bone, tooth, or entire animal. The files may be a mesh or series of images produced through a CT scan. In order to collect all the information necessary to understand the files, the specimen that the files represent, and the processes that created the data, the improved site will guide the researcher in providing additional context for their deposit at the same time that they upload their files. The following describes what kind of metadata the depositor can expect to provide as part of the submission process.

The first step is to determine whether the researcher’s current deposit is derived in some way from data that is already in MorphoSource, or if the depositor would like to also submit those files and metadata. For example, they may be depositing a mesh file that was created from original photographs that are already available through the site. By including links to the raw data in the repository, users can reprocess the files if needed, or run different processes in the future.

MorphoSource collects metadata to provide context for 3D data in the repository

Next, the researcher is asked to identify or describe the biological specimen that was imaged to create their data, either by entering the information themselves or importing it from another site like iDigBio. Metadata entered at this stage includes the information about the institution that owns the specimen, a taxonomy for the specimen, and additional identifying information such as the institution’s collection or catalog number. When the depositor fills in these fields, other users will be able to search for and compare data sets for the same specimen or species.

Moving on from the description of the organism, the depositor then provides information about the device that was used to image the specimen, either by selecting a device that is already in the repository’s database, or by creating a new record, including the manufacturer, model, and modality (MRI, photography, laser scan, etc.) of the device.

Once they have described the specimen and device used for imaging, the depositor then enters metadata about the imaging event itself, such as the technician who did the imaging, the date, and the software used.

With the imaging of the specimen described, the depositor then enters data about any processing that was done to create the files being deposited, including who was responsible, what software was used, and what the process was – for example, creating a mesh or point cloud from photographs. This metadata is important in case there is a need to reprocess the data in the future.

Finally, the researcher completes their deposit by uploading the files themselves. While some technical metadata is extracted automatically, MorphoSource will rely on data depositors to provide other information that is helpful for display, such as the orientation of the scan, or to identify the files, like an external id number. This technical metadata is important for long term preservation of the data sets.
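One way to picture the metadata collected in these steps is as a set of linked records. The Python sketch below is a simplified illustration, not the MorphoSource data model: the class names, fields, and sample values are all hypothetical, chosen only to mirror the steps described in this post.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Specimen:
    institution: str
    taxonomy: str
    catalog_number: str


@dataclass
class Device:
    manufacturer: str
    model: str
    modality: str


@dataclass
class ImagingEvent:
    specimen: Specimen
    device: Device
    technician: str
    date: str
    software: str


@dataclass
class ProcessingEvent:
    responsible: str
    software: str
    description: str


@dataclass
class Media:
    files: List[str]
    imaging_event: Optional[ImagingEvent] = None
    processing_event: Optional[ProcessingEvent] = None
    derived_from: Optional["Media"] = None
    external_id: Optional[str] = None

    def source_specimen(self):
        """Follow derivation links back to the media that was imaged directly."""
        media = self
        while media is not None:
            if media.imaging_event is not None:
                return media.imaging_event.specimen
            media = media.derived_from
        return None


# Placeholder values throughout; none of these describe real records.
specimen = Specimen("Example Museum", "Testudines", "EX-0001")
camera = Device("Example Maker", "Model X", "photography")
photos = Media(
    files=["photo_001.jpg", "photo_002.jpg"],
    imaging_event=ImagingEvent(specimen, camera, "A. Technician", "2020-01-17", "Capture Tool"),
)
mesh = Media(
    files=["skull.ply"],
    processing_event=ProcessingEvent("A. Researcher", "Mesh Tool", "mesh created from photographs"),
    derived_from=photos,
)
print(mesh.source_specimen().catalog_number)
```

Because the mesh keeps a link to the photographs it was derived from, a viewer of the mesh can still reach the specimen, the device, and the raw data without the depositor entering them twice.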

morphosource media page — Screen capture of example media page in MorphoSource

While the submission process asks the researcher to enter quite a bit of metadata, when users view the data on MorphoSource they have an understanding of what the data represents, how it was created, and how it relates to other data in the repository. It becomes easy to discover other media files representing the same specimen, or the same species, or to explore other items from the institution or researcher’s collections.

Conferences, Duke Digital Repository

Rethinking Repositories at CNI Spring ’17

April 7, 2017 Will Sexton

One of the main areas of emphasis for the CNI Spring 2017 meeting was “new strategies and approaches for institutional repositories (IR).” A few of us at UNC and Duke decided to plug into the zeitgeist by proposing a panel to reflect on some of the ways that we have been rethinking – or even just thinking about – our repositories.

Continue reading Rethinking Repositories at CNI Spring ’17 →

Uncategorized

Revisiting: What is the Repository?

January 27, 2017 Ginny Boyer

Here at the Duke University Libraries we recently hosted a series of workshops that were part of a larger Research Symposium on campus. It was an opportunity for various campus agencies to talk about all of the evolving and innovative ways that they are planning for and accommodating research data. A few of my colleagues and I were asked to present on the new Research Data program that we’re rolling out in collaboration with the Duke Digital Repository, and we were happy to oblige!

I was asked to speak directly about the various software development initiatives that we have underway with the Duke Digital Repository. Since we’re in the midst of rolling out a brand new program area, we’ve got a lot of things cooking!

When I started planning for the conversation I initially thought I would talk a lot about our Fedora/Hydra stack, and the various inter-related systems that we’re planning to integrate into our repository eco-system. But what resulted from that was a lot of technical terms, and open-source software project names that didn’t mean a whole lot to anyone; especially those not embedded in the work. As a result, I took a step back and decided to focus at a higher level. I wanted to present to our faculty that we were implementing a series of software solutions that would meet their needs for accommodation of their data. This had me revisiting the age-old question: What is our Repository? And for the purposes of this conversation, it boiled down to this:

And this:

It is a highly complex, often mind-boggling set of software components, that are wrangled and tamed by a highly talented team with a diversity of skills and experience, all for the purposes of supporting Preservation, Curation, and Access of digital materials.

Those are our tenets or objectives. They are the principles that guide our work. Let’s dig in a bit on each.

Our first objective is Preservation. We want our researchers to feel 100% confident that when they give us their data, we are preserving its integrity, longevity, and persistence.

Our second objective is to support Curation. We aim to do that by providing software solutions that facilitate management and description of file sets, and logical arrangement of complex data sets. This piece is critically important because the data cannot be optimized without solid description and modeling that informs users of its purpose and intended use, and that facilitates discovery of the materials for use.

Finally, our work, our software, aims to facilitate discovery & access. We do this by architecting thoughtful solutions that optimize metadata and modeling; we build out features that enhance the consumption and usability of different format types; and we tweak, refine, and optimize our code to enhance performance and user experience.

The repository is a complex beast. It’s a software stack, and an eco-system of components. It’s Fedora. It’s Hydra. It’s a whole lot of other project names that are equally attractive and mystifying. At its core though, it’s a software initiative- one that seeks to serve up an eco-system of components with optimal functionality that meet the needs and desires of our programmatic stakeholders- our University.

Preservation, Curation, & Access are the heart of it.

Uncategorized

Good Stuff on the Horizon: a Duke Digital Repository Teaser…

December 4, 2016 Ginny Boyer

Folks,

We have been hard at work architecting a robust Repository program for our Duke University community. And while doing this, we’re in the midst of shoring things up architecturally on the back end. You may be asking yourself: Why all the fuss? What’s the big deal?

Well, part of the fuss is that it’s high time to move beyond the idea that our repository is a platform. We’d much prefer that our repository be known as a program: a suite of valuable services that serve the needs of our campus community. The repository will always be a platform. In fact, it will be a rock-solid preservation platform- a space to park your valuable digital assets and feel 100% confident that the Libraries will steward those materials for the long haul. But the repository is much more than a platform; it’s a suite of service goodness that we hope to market and promote!

Secondly, it’s because we’ve got some new and exciting developments happening in Repository-land, specifically in the realm of data management. To start with, the Provost graciously appointed four new positions to serve the data needs of the University, and those new positions will sit in the Libraries. We have two Senior Research Specialists and two Content Analysts joining our ranks in early January. These positions will be solely dedicated to the refinement of data curation processes, liaising with faculty on data management best practice, assisting researchers with the curation and deposit of research data, and acquiring persistent access to said data. Pretty cool stuff!

So in preparation for this, we’ve had a few things cooking. To begin with, we are re-designing our Duke Digital Repository homepage. We will highlight three service areas:

Duke Scholarship: This area will feature the research, scholarship and activities of Duke faculty members and academic staff. It will also highlight services in support of open access, copyright support, digital publishing, and more.
Research Data: This area will be dedicated to the fruits of Duke Scholarship, and will be an area that features research data and data sets. It will highlight services in support of data curation, data management, data deposit, data citation, and more.
Library Collections: This area will focus on digital collections that are owned or stewarded specifically by the Duke University Libraries. This includes digitized special collections, University Archives material, born digital materials, and more.

For each of these areas we’ve focused on defining a base collections policy, and we are in the process of refining our service models and shoring up policy that will drive preservation and digital asset management of these materials.

So now that I’ve got you all worked up about these new developments, you may be asking, ‘When can I know more?!’ You can expect to see and hear more about these developments (and our newly redesigned website) just after the New Year. In fact, you can likely expect another Bitstreams Repository post around that time with more updates on our progress, a preview of our site, and perhaps a profile or two of the new staff joining our efforts!

Until then, stay tuned, press ‘Save’, and call us if you’re looking for a better, more persistent, more authoritative approach to saving the fruits of your digital labor! (Or contact us)

Uncategorized

Open Source Software and Repository land

October 30, 2016 Ginny Boyer

The Duke University Libraries software development team just recently returned from a week in Boston, MA at a conference called Hydra Connect. We ate good seafood, admired beautiful cobblestones, strolled along the Charles River, and learned a ton about what’s going on in the Hydra-sphere.

At this point you may be scratching your head, exclaiming- huh?! Hydra? Hydrasphere? Have no fear, I shall explain!

Our repository, the Duke Digital Repository, is a Hydra/Fedora Repository. Hydra and Fedora are names for two prominent open-source communities in repository land. Fedora concerns itself with architecting the back-end of a repository- the storage layer. Hydra, on the other hand, refers to a multitude of end-user applications that one can architect on top of a Fedora repository to perform digital asset management. Pretty cool and pretty handy. Especially for someone that has no interest in architecting a repository from scratch.

And for a little context re: open source… the idea is that a community of like-minded individuals that care about a particular thing, will band together to develop a massively cool software product that meets a defined need, is supported and extended by the community, and is offered for free for someone to inspect, modify and/or enhance the source code.

I italicized ‘free’ to emphasize that while the software itself is free, and while the source code is available for download and modification, it does take a certain suite of skills to architect a Hydra/Fedora Repository. It’s not currently an out-of-the-box solution, but is moving in that direction with Hydra-in-a-Box. But I digress…

So. Why might someone be interested in joining an open-source community such as these? Well, for many reasons, some of which might ring true for you:

Resources are thin. Talented developers are hard to find and harder to recruit. Working with an open source community means that 1) you have the source code to get started, 2) you have a community of people who are available to be a resource (and generally enthusiastic about it), and 3) working collaboratively makes everything better. No one wants to go it alone.
Governance. If one gets truly involved at the community level there are often opportunities for contributing thoughts and opinions that can help to shape and guide the software product. That’s super important when you want to get invested in a project and ensure that it fully meets your needs. Going it alone is never a good option, and the whole idea of open-source is that it’s participatory, collaborative, and engaged.
Give back. Perhaps you have a great idea. A fantastic use case. Perhaps one that could benefit a whole lot of other people and/or institutions. Well then share the love by participating in open-source. Instead of developing a behemoth locally that is not maintainable, contribute ideas or features or a new product back to the community. It benefits others, and it benefits you, by investing the community in the effort of folding features and enhancements back into the core.

Hydra Connect was a fantastic opportunity to mingle with like-minded professionals doing very similar work, and all really enthusiastic to share their efforts. They want you to get excited about their work. To see how they are participating in the community. How they are using this variety of open-source software solutions in new and innovative ways.

It’s easy to get bogged down at a local level with the micro details, and to lose the big picture. It was refreshing to step out of the office and get back into the frame of mind that recognizes and empowers the notion that there is a lot of power in participating in healthy communities of practice. There is also a lot of economy in it.

The team came back to Durham full of great ideas and a lot of enthusiasm. It has fueled a lot of fantastic discussion about the future of our repository software eco-system and how that complements our desire to focus on integration, community developed goodness, and sustainable practices for software development.

More to come as we turn that thought process into practice!

Project Hydra

Hydra Connect 2016

Projects, Technology

Developing the Duke Digital Repository is Messy Business

August 28, 2016 Ginny Boyer

Let me tell you something people: Coordinating development of the Duke Digital Repository (DDR) is a crazy logistical affair that involves much ado about… well, everything!

My last post, What is a Repository?, discussed at a high level what exactly a digital repository is intended to be and the purpose it serves in the Libraries’ digital ecosystem. If we take a step down from that, we can categorize the DDR as two distinct efforts: 1) a massive software development project and 2) a complex service suite. Both require significant project management and leadership, and necessitate tools to help in coordinating the effort.

There are many, many details that require documenting and tracking through the life cycle of a software development project. Initially we start with requirements- meaning what the tools need to do to meet the end users’ needs. Requirements must be properly documented and must essentially detail a project management plan that can result in a successful product (the software) and a successful project (the process, and everything that supports success of the product itself). From this we manage a ‘backlog’ of requirements, and pull from the backlog to structure our work. Requirements evolve into tasks that are handed off to developers. Tasks themselves become conversations as the development team determines the best possible approach to getting the work done. In addition to this, there are bugs to track, changes to document, and new requirements evolving all of the time… you can imagine that managing all of this in a simple ‘To Do’ list could get a bit unwieldy.

We realized that our ability to keep all of these many plates spinning necessitated a really solid project management tool. So we embarked on a mission to find just the right one! I’ll share our approach here, in case you and your team have a similar need and could benefit from our experiences.

STEP 1: Establish your business case: Finding the right tool will take effort, and getting buy-in from your team and organization will take even more! Get started early with justifying to your team and your org why a PM tool is necessary to support the work.

STEP 2: Perform a needs assessment: You and your team should get around a table and brainstorm. Ask yourselves what you need this tool to do, what features are critical, what your budget is, etc. Create a matrix where you fully define all of these characteristics to drive your investigation.

STEP 3: Do an environmental scan: What is out there on the market? Do your research and whittle down a list of tools that have potential. Also build on the skills of your team- if you have existing competencies in a given tool, then fully flesh out its features to see if it fits the bill.

STEP 4: Put them through the paces: Choose a select list of tools and see how they match up to your needs assessment. Task a group of people to test-drive the tools, and report out on the experience.

STEP 5: Share your findings: Discuss the findings with your team. Capture the highs and the lows and present the material in a digestible fashion. If it’s possible to get consensus, make a recommendation.

STEP 6: Get buy-in: This is the MOST critical part! Get buy-in from your team to implement the tool. A PM tool can only benefit the team if it is used thoroughly, consistently, and in a team fashion. You don’t want to deal with adverse reactions to the tool after the fact…
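
As a rough illustration of the matrix from Step 2 and the comparison in Step 4, here is a minimal Python sketch of a weighted scoring matrix. The criteria, weights, tool names, and ratings are all hypothetical placeholders, not the ones our team used; swap in whatever comes out of your own needs assessment.

```python
# Hypothetical criteria from a needs assessment; a higher weight means more critical.
WEIGHTS = {"backlog management": 5, "bug tracking": 4, "cost": 3, "ease of use": 3}

# Hypothetical 1-5 ratings reported by the test-drive group for each candidate tool.
RATINGS = {
    "Tool A": {"backlog management": 4, "bug tracking": 5, "cost": 2, "ease of use": 3},
    "Tool B": {"backlog management": 3, "bug tracking": 3, "cost": 5, "ease of use": 4},
}


def weighted_score(ratings, weights):
    """Return the weighted average rating for one tool."""
    total_weight = sum(weights.values())
    return sum(weights[name] * ratings[name] for name in weights) / total_weight


def rank_tools(all_ratings, weights):
    """Return (tool, score) pairs sorted from best to worst."""
    scores = {tool: weighted_score(r, weights) for tool, r in all_ratings.items()}
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for tool, score in rank_tools(RATINGS, WEIGHTS):
        print(f"{tool}: {score:.2f}")
```

A score like this is only a conversation starter for Step 5; it does not replace the discussion or the buy-in.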

No matter what tool you choose, you’ll need to follow some simple guidelines to ensure successful adoption:

Once again… Get TEAM buy-in!
Define ownership, or an Admin, of the tool (ideally the Project Manager)
Define basic parameters for use and team expectations
PROVIDE TRAINING
Consider your ecosystem of tools and simplify where appropriate
The more robust the tool, the more support and structure will be required

Trust me when I say that this exercise will not let you down, and will likely yield a wealth of information about the tools that you use, the projects that you manage, your team’s preferences for coordinating the work, and much more!

Uncategorized

What is a Repository?

July 29, 2016 Ginny Boyer

We’ve been talking a lot about the Repository of late, so I thought it might be time to come full circle and make sure we’re all on the same page here…. What exactly is a Repository?

A Repository is essentially a digital shelf. A really, really smart shelf!

It’s the place to safely and securely store digital assets of a wide variety of types for preservation, discovery, and use, though not all materials in the repository may be discoverable or accessible by everyone. So, it’s like a shelf. Except that this shelf is designed to help us preserve these materials and try to ensure they’ll be usable for decades.

This shelf tells us if the materials on it have changed in any way. It tells us when the materials don’t conform to the format specification that describes exactly how a file format is to be represented. It also has very specific permissions, a well-thought-out backup procedure that reaches several corners of the country, a built-in versioning system to allow us to migrate endangered or extinct formats to new, shiny formats, and a bunch of other neat stuff.
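
To give a feel for how a shelf can tell whether something on it has changed, here is a minimal Python sketch of a checksum (fixity) check. It is an illustration only, not the repository’s actual implementation: the function names are made up for this example, and SHA-256 is an assumed choice of algorithm.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(path, recorded_checksum):
    """Return True if the file no longer matches the checksum recorded at deposit."""
    return sha256_of(path) != recorded_checksum.lower()
```

The idea is to record the checksum when an item is deposited and recompute it on a schedule; any difference means the bytes on the shelf are no longer the bytes that were put there.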

The repository is the manifestation of a conviction about the importance of an enduring scholarly record and open and free access to Duke scholarship. It is where we do our best to carve our knowledge in stone for future generations.

Why? is perhaps the most important question of all. There are several approaches to Why? National funding agencies (NIH, NSF, NEH, etc.) recognize that science is precariously balanced on shoddy data management practices and increasingly require researchers to deposit their data with a reputable repository. Scholars would like to preserve their work, make it accessible to everyone (not just those who can afford outrageously priced journal subscriptions), and increase the reach and impact of their work by providing stable and citable DOIs.

Students want to be able to cite their own theses, dissertations, and capstone papers and to have others discover and cite them. The Library wants to safeguard its investment in digitization of Special Collections. Archives needs a place to securely store university records.

A Repository, specifically our Duke Digital Repository, is the place to preserve our valuable scholarly output for many years to come. It ensures disaster recovery, facilitates access to knowledge, and connects you with an ecosystem of knowledge.

Pretty cool, huh?!

Behind the Scenes, Technology, Uncategorized

Repository Mega-Migration Update

July 1, 2016 Jim Tuttle

We are shouting it from the roof tops: The migration from Fedora 3 to Fedora 4 is complete! And Digital Repository Services are not the only ones relieved. We appreciate the understanding that our colleagues and users have shown as they’ve been inconvenienced while we’ve built a more resilient, more durable, more sustainable preservation platform in which to store and share our digital assets.

We began the migration of data from Fedora 3 on Monday, May 23rd. Since then, we’ve migrated roughly 337,000 objects in the Duke Digital Repository. The data migration was split into several phases. In case you’re interested, here are the details:

Collections were identified for migration beginning with unpublished collections, which comprise about 70% of the materials in the repository
Collections to be migrated were locked for editing in the Fedora 3 repository to prevent changes that would inadvertently be left out of the migration to the new repository
Collections to be migrated were passed to 10 migration processors for actual ingest into Fedora 4
- Objects were migrated first. This includes the collection object, content objects, item objects, color targets for digital imaging, and attachments (objects related to, but not part of, a collection, like deposit agreements)
- Then relationships between objects were migrated
- Last, metadata was migrated
Collections were then validated in Fedora 4
When validation is complete, collections will be unlocked for editing in Fedora 4

Presto! Voila! That’s it!

While our customized version of the Fedora migrate gem does some validation of migrated content, we’ve elected to build an independent process to provide validation. Some of the validation is straightforward, such as comparing checksums of Fedora 3 files against those in Fedora 4. In other cases, being confident that we’ve migrated everything accurately can be much more difficult. In Fedora 3, we can compare checksums of metadata files, while in Fedora 4 object metadata is stored opaquely in a database without checksums that can be compared. The short of it is that we’re working hard to prove successful migration of all of our content and it’s harder than it looks. It’s kind of like insurance, protecting us from the risk of lost or improperly migrated data.
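
For the straightforward case, the checksum comparison might look something like the Python sketch below. This is not our validation code or the migrate gem; it assumes, purely for illustration, that each repository can export a manifest mapping an object identifier to a file checksum, and the function name and manifest shape are hypothetical.

```python
def compare_manifests(source, target):
    """Compare two {object_id: checksum} manifests.

    `source` stands in for the old repository and `target` for the new one.
    Returns object ids that are missing from the target, unexpected in the
    target, or present in both with different checksums.
    """
    missing = sorted(set(source) - set(target))
    unexpected = sorted(set(target) - set(source))
    mismatched = sorted(
        object_id
        for object_id in source.keys() & target.keys()
        if source[object_id] != target[object_id]
    )
    return {"missing": missing, "unexpected": unexpected, "mismatched": mismatched}


def is_clean(report):
    """Return True when the comparison found no discrepancies at all."""
    return not any(report.values())
```

A comparison like this only covers content files. Metadata that is stored without comparable checksums needs a different strategy, which is where the hard part of validation lives.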

We’re in the final phases of spiffing up the Fedora 4 Digital Repository user interface, which is scheduled to be deployed the week of July 11th. That release will not include any significant design changes; it simply makes the interface compatible with the new Fedora 4 code base. We are planning to release enhancements to our Data & Visualizations collection, and are prioritizing work on the homepage of the Duke Digital Repository… you will likely see an update on that in a subsequent blog post!

Notes from the Duke University Libraries Digital Projects Team