Agile 101

Here in the DUL Information Technology Services organization, we continue to embrace Agile concepts, applied to many different types of projects, including the Integrated Library System (ILS), the development of specialized repositories, and even the exhibits hosted in the Libraries. Check out the amazing new Senses of Venice exhibit that opened last week.

I like to think of Agile as a mindset rather than a specific tool set or framework (like Scrum). The four values set out in the 2001 Agile Manifesto were devised in deliberate contrast to the rigor and slowness of the software development practices of the time, and these concepts are still quite relevant today:

Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan

Sometimes, when things develop as a backlash, the pendulum can swing too far the other way and we throw out some of the tried-and-true good bits. On the other hand, we can slip back into old habits, as described in Steve Blank’s HBR piece, “When Waterfall Principles Sneak Back into Agile Workflows.”

Pendulums swing, but the core idea holds: when you face uncertainty, try something you think might work, get feedback, and adjust accordingly.

What happens when you click “Search?”

How many times each day do you type something into a search box on the web and click “Search”? Have you ever wondered what happens behind the scenes to make this possible? In this post I’ll show how search works on the Duke University Libraries Catalog. I’ll trace the journey from metadata in a MARC record (where our bibliographic data is stored), to transforming that data into something we can index for searching, to how the words you type into the search box are transformed, and finally to how the indexed records and your search interact to produce a relevance-ranked list of results. Let’s get into the weeds!

A MARC record stores bibliographic data that we either purchase from vendors or that metadata specialists at Duke Libraries create. These records look something like this:

To keep this simple, let’s focus on just the main title of the record. That information is recorded in the MARC record’s 245 field, in subfields a, b, f, g, h, k, n, p, and s. I’m not going to explain what each subfield is for, but the Library of Congress maintains extensive documentation about MARC field specifications (see 245 – Title Statement (NR)). Here is an example of a MARC 245 field with a linked 880 field that contains the equivalent title in an alternate script (just to keep things interesting).

=245 10$6880-02$aUrbilder ;$bBlossoming ; Kalligraphie ; O Mensch, bewein' dein' Sünde gross (Arrangement) : for string quartet /$cToshio Hosokawa.
=880 10$6245-02/$1$a原像 ;$b開花 ; 書 (カリグラフィー) ほか : 弦楽四重奏のための /$c細川俊夫.

The first thing that has to happen is we need to get the data out of the MARC record and into a more computer-friendly data format: an array of hashes, which is just a fancy way of saying a list of key-value pairs. The software reads the metadata from the MARC 245 field, joins all the subfields together, and cleans up some punctuation. The software also checks whether the title field contains Arabic, Chinese, Japanese, Korean, or Cyrillic characters, which have to be handled separately from Roman-script languages. From the MARC 245 field and its linked 880 field we end up with the following data structure.

"title_main": [
{
"value": "Urbilder ; Blossoming ; Kalligraphie ; O Mensch, bewein' dein' Sünde gross (Arrangement) : for string quartet"
},
{
"value": "原像 ; 開花 ; 書 (カリグラフィー) ほか : 弦楽四重奏のための",
"lang": "cjk"
}
]
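
To make this concrete, here is a minimal sketch of how that extraction step could be implemented with the ruby-marc gem. This is illustrative only, not our actual indexing code; the constant names and the trailing-punctuation cleanup are simplified assumptions.

require 'marc' # ruby-marc gem

# Subfields of the 245 field that contribute to the main title.
TITLE_SUBFIELDS = %w[a b f g h k n p s].freeze
# Rough test for Chinese, Japanese, or Korean characters.
CJK_REGEX = /\p{Han}|\p{Hiragana}|\p{Katakana}|\p{Hangul}/.freeze

# Join the title subfields of a field and trim trailing punctuation.
def subfield_text(field)
  field.subfields
       .select { |sf| TITLE_SUBFIELDS.include?(sf.code) }
       .map(&:value)
       .join(' ')
       .sub(%r{\s*[/:;,]\s*\z}, '')
end

# Build the title_main structure from a record's 245 field and any
# linked 880 field carrying the title in its original script.
def title_main(record)
  titles = []
  field245 = record['245']
  return titles unless field245

  titles << { 'value' => subfield_text(field245) }

  record.fields('880').each do |f880|
    # The $6 linkage (e.g. "245-02/$1") ties the 880 back to the 245.
    next unless f880['6'].to_s.start_with?('245')

    entry = { 'value' => subfield_text(f880) }
    entry['lang'] = 'cjk' if entry['value'].match?(CJK_REGEX)
    titles << entry
  end
  titles
end

# Usage: MARC::Reader.new('records.mrc').each { |rec| p title_main(rec) }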

We send this data off to an ingest service that prepares the metadata for indexing.

The data is first expanded to multiple fields.

{"title_main_indexed": "Urbilder ; Blossoming ; Kalligraphie ; O Mensch, bewein' dein' Sünde gross (Arrangement) : for string quartet",

"title_main_vernacular_value": "原像 ; 開花 ; 書 (カリグラフィー) ほか : 弦楽四重奏のための",

"title_main_vernacular_lang": "cjk",

"title_main_value": "原像 ; 開花 ; 書 (カリグラフィー) ほか : 弦楽四重奏のための / Urbilder ; Blossoming ; Kalligraphie ; O Mensch, bewein' dein' Sünde gross (Arrangement) : for string quartet"}

title_main_indexed will be indexed for searching.
title_main_vernacular_value holds the non-Roman version of the title to be indexed for searching.
title_main_vernacular_lang holds information about the character set stored in title_main_vernacular_value.
title_main_value holds the data that will be stored for display purposes in the catalog user interface.
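
For illustration, the expansion step could be implemented along these lines. This is a hypothetical sketch limited to the title field; the real ingest service handles many more fields and edge cases.

# Expand the title_main array into the flattened fields shown above.
def expand_title_fields(titles)
  roman = titles.find { |t| t['lang'].nil? }
  vern  = titles.find { |t| t['lang'] }

  fields = { 'title_main_indexed' => roman['value'] }
  if vern
    fields['title_main_vernacular_value'] = vern['value']
    fields['title_main_vernacular_lang']  = vern['lang']
    # Display value: vernacular script first, then the romanized title.
    fields['title_main_value'] = [vern['value'], roman['value']].join(' / ')
  else
    fields['title_main_value'] = roman['value']
  end
  fields
end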

We take this flattened, expanded set of fields and apply a set of rules to prepare the data for the indexer (Solr). These rules append suffixes to each field name and combine the two vernacular fields, producing the following field/value pairs. The suffixes tell the indexer what to do with each field.

{"title_main_indexed_tsearchtp": "Urbilder ; Blossoming ; Kalligraphie ; O Mensch, bewein' dein' Sünde gross (Arrangement) : for string quartet",

"title_main_cjk_v": "原像 ; 開花 ; 書 (カリグラフィー) ほか : 弦楽四重奏のための",

"title_main_t_stored_single": "原像 ; 開花 ; 書 (カリグラフィー) ほか : 弦楽四重奏のための / Urbilder ; Blossoming ; Kalligraphie ; O Mensch, bewein' dein' Sünde gross (Arrangement) : for string quartet" }

When sent to the indexer, the fields are further transformed:

Suffixed Source Field | Solr Field | Solr Field Type | Solr Stored/Indexed Values
title_main_indexed_tsearchtp | title_main_indexed_t | text, stemmed | urbild blossom kalligraphi o mensch bewein dein sund gross arrang for string quartet
title_main_indexed_tsearchtp | title_main_indexed_tp | text, unstemmed | urbilder blossoming kalligraphie o mensch bewein dein sunde gross arrangement for string quartet
title_main_cjk_v | title_main_cjk_v | text, Chinese/Japanese/Korean | 原 像 开花 书 か り く ら ふ ぃ い ほか 弦乐 亖 重奏 の ため の
title_main_t_stored_single | title_main | stored string | 原像 ; 開花 ; 書 (カリグラフィー) ほか : 弦楽四重奏のための / Urbilder ; Blossoming ; Kalligraphie ; O Mensch, bewein’ dein’ Sünde gross (Arrangement) : for string quartet

These are all index-time transformations. They occur when we send records into the index.
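
To give a flavor of how the indexer knows what “stemmed” means, field types like these are declared in the Solr schema. The snippet below is a representative sketch of the pattern, not the catalog’s actual schema; the field type name and the exact analyzer chain are assumptions.

<!-- Sketch of a stemmed text field type; not our production schema. -->
<fieldType name="text_stemmed" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Any field whose name ends in _t gets this stemmed analysis. -->
<dynamicField name="*_t" type="text_stemmed" indexed="true" stored="false"/>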

The query you enter into the search box also gets transformed in different ways and then compared to the indexed fields above. These are query-time transformations. As an example, if I search for the terms “Urbilder Blossom Kalligraphie,” the following transformations and comparisons take place:

The values stored in the records for title_main_indexed_t are evaluated against my search string transformed to urbild blossom kalligraphi.

The values stored in the records for title_main_indexed_tp are evaluated against my search string transformed to urbilder blossom kalligraphie.

The values stored in the records for title_main_cjk_v are evaluated against my search string transformed to urbilder blossom kalligraphie.

Then Solr does some calculations based on relevance rules we configure to determine which documents are matches and how closely they match (signified by the relevance score calculated by Solr). The field value comparisons end up looking like this under the hood in Solr:

+(DisjunctionMaxQuery((
(title_main_cjk_v:urbilder)^50.0 |
(title_main_indexed_tp:urbilder)^500.0 |
(title_main_indexed_t:urbild)^100.0)~1.0)
DisjunctionMaxQuery((
(title_main_cjk_v:blossom)^50.0 |
(title_main_indexed_tp:blossom)^500.0 |
(title_main_indexed_t:blossom)^100.0)~1.0)
DisjunctionMaxQuery((
(title_main_cjk_v:kalligraphie)^50.0 |
(title_main_indexed_tp:kalligraphie)^500.0 |
(title_main_indexed_t:kalligraphi)^100.0)~1.0))~3
DisjunctionMaxQuery((
(title_main_cjk_v:"urbilder blossom kalligraphie")^150.0 |
(title_main_indexed_t:"urbild blossom kalligraphi")^600.0 |
(title_main_indexed_tp:"urbilder blossom kalligraphie")^5000.0)~1.0)
(DisjunctionMaxQuery((
(title_main_cjk_v:"urbilder blossom")^75.0 |
(title_main_indexed_t:"urbild blossom")^200.0 |
(title_main_indexed_tp:"urbilder blossom")^1000.0)~1.0)
DisjunctionMaxQuery((
(title_main_cjk_v:"blossom kalligraphie")^75.0 |
(title_main_indexed_t:"blossom kalligraphi")^200.0 |
(title_main_indexed_tp:"blossom kalligraphie")^1000.0)~1.0))
DisjunctionMaxQuery((
(title_main_cjk_v:"urbilder blossom kalligraphie")^100.0 |
(title_main_indexed_t:"urbild blossom kalligraphi")^350.0 |
(title_main_indexed_tp:"urbilder blossom kalligraphie")^3000.0)~1.0)

The ^nnnn indicates the relevance weight given to any match, while the ~3 indicates how many of the preceding clauses must match for a document to count as a hit. Matches in fields with higher boosts count more than matches in fields with lower boosts. You might also notice that full-phrase matches are boosted the most, two-consecutive-term matches slightly less, and individual term matches the least. Furthermore, unstemmed field matches (those modified least by the indexer, such as title_main_indexed_tp) get more boost than stemmed field matches. This provides the best of both worlds: you still get a match if you search for “blossom” instead of “blossoming,” but if you had searched for “blossoming” the exact term match would boost the document’s score in the results. Solr also considers how common a term is across all documents in the index, so that very common words like “the” don’t boost the relevance score as much as less common words like “kalligraphie.”
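
For the curious, a parsed query like the one above is what Solr’s eDisMax query parser produces. Working backwards from the debug output (an inference from that output, not a copy of our actual request handler configuration), the boost parameters would look roughly like this:

qf  = title_main_indexed_tp^500 title_main_indexed_t^100 title_main_cjk_v^50
pf  = title_main_indexed_tp^5000 title_main_indexed_t^600 title_main_cjk_v^150
pf2 = title_main_indexed_tp^1000 title_main_indexed_t^200 title_main_cjk_v^75
pf3 = title_main_indexed_tp^3000 title_main_indexed_t^350 title_main_cjk_v^100
mm  = 3

Here qf scores each term on its own, pf scores the full phrase, pf2 and pf3 score two- and three-word subphrases, and mm is the minimum number of term clauses that must match (the ~3 in the output above).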

I hope this provides some insight into what happens when you click “Search.” Happy searching.

Building a new Staff Directory

The staff directory on the Library’s website was last overhauled in late 2014, which is to say that it has gotten a bit long in the tooth! For the past few months I’ve been working with my colleagues Sean Aery, Tom Crichlow, and Derrek Croney to revamp the staff application to make it more functional, easier to use, and more visually compelling.

View of the legacy staff directory interface

Our work centered on three major components: an admin interface for HR staff, an edit form for staff members, and the public display for browsing people and departments. We spent considerable time discussing how best to approach the project’s infrastructure. In the end we settled on a hybrid approach: the HR tool would be built as a Ruby on Rails application, and we would update our existing custom Drupal module for staff editing and the public UI.

We created a seed file for our Rails app based on the legacy data from the old application and then got to work building the HR interface. We decided to rely on the Rails Admin gem, as it met most of our use cases and had worked well on other internal projects. As we added features, our database models grew more complex, but Rails makes these kinds of changes very straightforward. We ended up with two main tables (People and Departments) and four auxiliary tables to store extra attributes (External Contacts, Languages, Subject Areas, and Trainings).
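
As a rough sketch (hypothetical model code; the production schema has many more attributes and validations), the associations look something like this:

# Hypothetical sketch of the staff directory models.
class Department < ApplicationRecord
  has_many :people
end

class Person < ApplicationRecord
  belongs_to :department
  has_many :external_contacts
  has_many :languages
  has_many :subject_areas
  has_many :trainings
end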

View of the Rails Admin dashboard

We also made use of the Ancestry gem and the Nestable gem to let HR staff visually sort the department hierarchy. This makes it very easy to move departments around quickly, so the next time we have a large department reorganization it will be simple to represent the changes with this tool.
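
The Ancestry gem stores each department’s place in the tree in a single database column and adds traversal methods to the model, which is what makes reorganizations cheap. A minimal sketch, building on the hypothetical Department model above (the department names here are invented):

class Department < ApplicationRecord
  has_ancestry # stores the tree path in one "ancestry" column
end

# Moving a department is a one-line update:
dept = Department.find_by(name: 'Digital Collections')
dept.update(parent: Department.find_by(name: 'Digital Strategies'))

# Handy queries for rendering the org chart:
Department.roots     # top-level departments
dept.children        # direct subdepartments
dept.subtree.arrange # nested hash of the whole branch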

Nestable gem allows for easy sorting of departments

After the HR interface was working well, we concentrated our efforts on the staff edit form in Drupal. We’d previously augmented the default Drupal profile editor with our extra data fields, but wanted to create a new form to make things cleaner and easier for staff to use. We created a new ‘Staff Profile’ tab and included a link on the old ‘Edit’ tab that points to the new form. Staff can now record their subject areas, preferred personal pronouns, and language expertise, and tie into external services like ORCID and LibGuides.

Edit form for Staff Profile

The public UI in Drupal is where most of our work has gone. We’ve created four approaches to browsing: Departments, A–Z, Subject Specialists, and Executive Group. There is also a name search that incorporates typeahead to help users find staff more efficiently.

The Department view displays a nested view of our complicated organizational structure, which helps users understand how departments relate to one another. You can also drill down through subdepartments when you’ve landed on a department page.

View of departments

Department pages display all staff members in the department, with managers positioned at the top. We also display the department’s contact information and link to the department website if it exists.

Example of a department page

The Staff A–Z list allows users to browse an alphabetized list of all staff in the library. One challenge we’re still working through is staff photos: we are lacking photos for many of our staff, and many of the photos we do have are out of date and inconsistently formatted. We’ve included a default avatar for staff without photos to help with consistency; it also highlights how many staff still need a photo. Stay tuned for improvements on this front!

A-to-Z browse

The Subject Specialists view helps in finding specific subject librarians. We include links to relevant research guides and appointment scheduling. We also have a text filter at the top of the display that can help quickly narrow the results to whatever area you are looking for.

Subject Specialists view

The Executive Group display is a quick way to view the leadership of the library.

Executive Group display

One last thing to highlight is the staff display view. We spent considerable effort refining this, and I think our work has really paid off. The display is clean and modern and a great improvement over what we had before.

View of staff profile in legacy application
View of the same profile in the new application

In addition to standard information like name, title, contact info, and department, we’re displaying:

  • a large photo of the staff person
  • personal pronouns
  • specialized trainings (like Duke’s P.R.I.D.E. program)
  • links out to ORCID, LibGuides, and LibCal scheduling
  • customizable bio (with expandable text display)
  • language expertise
  • subject areas

Our plan is to roll out the new system at the end of the month, so you can look forward to a greatly improved staff directory experience soon!

Managing impermanence – migration of the Libraries’ digital exhibits

Post contributed by Claire Cahoon, student in the master’s program at the School of Information and Library Science, UNC-Chapel Hill.

This summer I worked as a field experience student in the Software Services department migrating digital exhibits into Omeka 2, Duke’s most current platform. The ultimate goal was to start and document the process of moving exhibits from legacy platforms into Omeka 2.

The reasoning behind the project became clear as we created an index of all the digital exhibits on the exhibits website. The 97 exhibits showed varying degrees of functionality, from the most recent and up-to-date exhibits to sites with broken links and pages that displayed only text, leaving out crucial images. Centralizing these into a single platform should make it easier to create, support, and maintain all of these exhibits.

Screenshot of the sidebar of an exhibit, showing the link to the previous version of the exhibit in the Internet Archive

I found exhibits in Omeka 1, Cascade, Scriptorium, JAlbum, and even found a few mystery platforms that we never identified. Since it was the largest, we decided to work on the Omeka 1 group over the summer, and this week I finished migrating all 34 exhibits – that means that after a few adjustments to make the new exhibits available, Omeka 1 can be shut off!

We worked with Meg Brown, Exhibits Coordinator for the Libraries, and the exhibits department to figure out how each exhibit needed to be represented. Since we were managing expectations from many different stakeholders, we landed on including a link to the archived version of each exhibit in the Internet Archive’s Wayback Machine, in case the look and feel of the new exhibits feels limiting to anyone used to Omeka 1.

Working with the Internet Archive links and sorting through broken pieces of these exhibits really put into perspective how impermanent the internet is, even for seemingly static information. Without much maintenance, these exhibits lost some of their core content when video links changed, references were lost, and even the most well-written custom code stopped working. I hope that my work this summer will help keep these exhibit materials in working order while also eliminating the need to continue supporting Omeka 1.

While migrating, I came across a few favorite exhibits and items that combined interesting content and some updated features in Omeka 2:

Cover of “Anxious homes: cursory-cleaning for the imminent arrival of visitors or how to give the impression of a clean house in under 20 minutes” by Jackie Batey. Available in the Rubenstein Library: N7433.4.B38 A59 2006

Book + Art: Artists’ books from the Sallie Bingham Center for Women’s History and Culture (and the old version of Book + Art)

John Hope Franklin: Imprint of an American Scholar (and the old version of the John Hope Franklin exhibit)

Cheap Thrills: The Highs and Lows of Paris’s Cabaret Culture (and the old version of Cheap Thrills)

Medicology, or, Home encyclopedia of health: a complete family guide… Vol. I, by Joseph Gibbons Richardson (1904). Available in the Rubenstein Library: RC81 .R52 1904

Animated Anatomies: The Human Body in Anatomical Texts from the 16th to 21st Centuries (and the old version of Animated Anatomies)

Omeka still has some quirks to work out, and improvements to page accessibility and the metadata display are still in the works. However, migrating these exhibits into Omeka 2 will make them much easier to support and improve. Thanks to the team that worked with me and taught me so much this summer: Will Sexton, Michael Daul, and Meg Brown!

Join our Team!

Do you have photography skills? Do you want to work with cultural heritage materials? Do you seek a highly collaborative work environment dedicated to preserving and making rare materials digitally available? If so, consider applying to be the next Digitization Specialist at Duke!

The Digitization Specialist produces digital surrogates of rare materials, including books, manuscripts, audio, and moving image collections. The ideal candidate will be detail-oriented, possess excellent organizational and project management skills, and be able to work both independently and effectively in a team environment. The successful candidate will join the Digital Collections and Curation Services department and work under the direct supervision of the Digital Production Services manager.

The Digital Production Center (DPC) is a specialized unit dedicated to creating digital surrogates of primary resource materials from Duke University Libraries. Learn more about the DPC on our webpage, or through our department blog, Bitstreams. To get a sense of the variety of interesting and important collections we’ve digitized, immerse yourself in the Duke Digital Collections. We currently have over 640 digital collections comprising 103,247 items – and we’re looking to do even more with your skills!

Duke is a diverse community committed to the principles of excellence, fairness, and respect for all people. As part of this commitment, we actively value diversity in our workplace and learning environments as we seek to take advantage of the rich backgrounds and abilities of everyone. We believe that when we understand, celebrate, and tap into our uniqueness to creatively solve problems and address shared goals, our possibilities are limitless. Duke University Libraries value diversity of thought, perspective, experience, and background and are actively committed to a culture of inclusion and respect.

Duke’s hometown is Durham, North Carolina, a city with vibrant research, medical and arts communities, and numerous shops, restaurants and theaters. Durham is located in the Research Triangle, a growing metropolitan area of more than one million people that provides a wide range of cultural, recreational and educational opportunities. The Triangle is conveniently located just a few hours from the mountains and the coast, offers a moderate climate, and has been ranked among the best places to live and to do business.

Duke offers a comprehensive benefits package, which includes traditional benefits such as health insurance, dental, leave time, and retirement, as well as a wide range of work/life and cultural benefits. More information can be found at: https://hr.duke.edu/benefits. For more information and to apply, please submit an electronic resume, cover letter, and a list of 3 references to https://library.duke.edu/about/jobs/digitizationspecialist. Search for Requisition ID #4778. Review of applications will begin immediately and will continue until the position is filled.

Two Office 365 Tips to Aid Productivity

Tip One: Access Office 365 at Home for Free

Did you know you can install Microsoft Office at home for free? As a permanent Duke employee, you have access to a limited number of downloads of the Office package at no charge. The license works for both PC and Mac, and you can use any browser to access the download. Start by navigating to https://outlook.office.com/mail/inbox to access your webmail. Log in with your NetID and password and you should see your inbox. (Side note: you can also go through Duke OIT at https://oit.duke.edu/what-we-do/applications/office-365 and click “Access Office 365 Email.”)

Once inside, look for the Duke logo and a series of squares (some technicians like to call it the waffle). Click the waffle, then click the “Office 365” link in the top right.

This should navigate you to a webpage that has an “Install Office” link.

The link will then give you the option to download the Office 365 package. Click “Office 365 apps” and the .exe file (.pkg for Mac) will download. Once the download completes, click the file to start the installation.

A setup wizard will then install Word, Excel, PowerPoint, Publisher, Access, Outlook, and OneNote. The first app you open will require a one-time activation with your NetID and password. (Side note: once one app is activated, all apps are activated; there is no need to repeat this for each app.) If everything is done correctly, you should now be able to use Microsoft Office on your home machine. The licenses also apply to mobile devices (phones and tablets).

Tip Two: Create PDF files directly from Office Apps

Many people use Adobe Acrobat to convert documents to the .pdf format, but you can also create PDFs directly from Office apps. The steps below use Microsoft Word, but the procedure works in all Office apps.

When your final document is ready for conversion, find the File menu. Click File, then Save As.

Choose the “Save As” location (I’ll use the Desktop folder for this demonstration), and make sure to remember where you put it! Then click the “Save as Type” box, choose PDF, and click Save.

No need to do anything else: the document is now a .pdf file. Note that the .pdf cannot be edited directly in Microsoft Word; to make further changes, edit the original Word document and convert it again.

Celebrating a New Duke Digital Collections Milestone with Section A

Duke Digital Collections recently passed 100,000 items!

Last week, it was brought to our attention that Duke Digital Collections recently passed 100,000 individual items in the Duke Digital Repository! To celebrate, I want to highlight some of the most recent materials digitized and uploaded from our Section A project. Bitstreams has blogged before about what Section A is and what it means, but it’s been a couple of years since that post, and a little refresher couldn’t hurt.

What is Section A?

In 2016, the staff of Rubenstein Research Services proposed a mass digitization project of Section A, the umbrella term for 175 boxes of frequently requested historic materials: manuscripts, correspondence, receipts, diaries, drawings, and more. These boxes contain around 3,900 small collections, each with its own workflow. Every box needed consultation from Rubenstein Research Services, review by Library Conservation Department staff, review by Technical Services, metadata updates, and more, all to make sure that the collections could be launched and hosted within the Duke Digital Repository.

In the two years since that blog post, so much has happened! The first two Section A collections went live as a proof of concept, and as a way to define what the digitization project would be and what it would look like. We’ve added over 500 more collections from Section A since then, which somehow barely even scratches the surface of the entire project! We’re digitizing the collections in alphabetical order, and even after all the collections that have gone online, we are still only on the letter “C”!

Nonetheless, there are already plenty of materials to check out and enjoy. I was a student of history in college, so in this blog post I want to highlight some of the historic materials from the latter half of the 19th century.

Showing off some of Section A

Clara Barton’s description of the Grand Hotel de la Paix in Lyon, France.

In 1869, after her work as a nurse in the Civil War, Clara Barton traveled around Europe, including Geneva, Switzerland, and Corsica, France. Included in the Duke Digital Collections are her diary and calling cards from her time there. These pages detail where she visited and stayed throughout the year. She also wrote about her views on the different European countries, how Americans and Europeans compare, and more. Despite her storied career and her many travels that year, Miss Barton felt that “I have accomplished very little in a year”, and hoped that in 1870 she “may be accounted worthy once more to take my place among the workers of the world, either in my own country or in some other”.

Back in America, around 1900, the Rev. John Malachi Bowden began dictating and documenting his experiences as a Confederate soldier during the Civil War, one of many soldiers a nurse like Miss Barton may have treated. Although Bowden says he was not necessarily a secessionist at the beginning of the Civil War, he joined the 2nd Georgia Regiment in August 1861, after Georgia had seceded. During his time in the regiment, he fought in the Battles of Fredericksburg, Gettysburg, and Spotsylvania Court House, among others. In 1864, Union forces captured Bowden and held him prisoner at Maryland’s Point Lookout Prison; he describes in great detail what life was like as a POW before his eventual release. He writes that he was “so indignant at being in a Federal prison” that he refused to cut his hair. His hair eventually grew to be shoulder-length, “somewhat like Buffalo Bill’s.”

Speaking of whom, Duke Digital Collections also has some material from Buffalo Bill (William Frederick Cody), courtesy of the Section A initiative. A showman and entertainer who performed in cowboy shows throughout the latter half of the 19th century, Buffalo Bill was enormously popular wherever he went. In this collection, he writes to a Brother Miner about how he invited seventy-five of his “old Brothers” from Bedford, VA to visit him in Roanoke. There is also a brief itinerary of future shows throughout North Carolina and South Carolina. This includes a stop here in Durham, NC a few weeks after Bill wrote this letter.

Buffalo Bill’s letter to his “Brother Miner”, dated October 17, 1916.

Around this time, Walter Clark, associate justice of the North Carolina Supreme Court, began writing his own histories of North Carolina in the 18th and 19th centuries. Three of Clark’s articles prepared for the University Magazine of the University of North Carolina have been digitized as part of Section A, including an article entitled “North Carolina in War,” in which he noted the generals from North Carolina engaged in every war up to that point. It’s possible that John Malachi Bowden was once on the battlefield alongside some of the generals mentioned in Clark’s writings. This type of synergy in our collection is what makes Section A so exciting to dive into.

As the new Still Image Digitization Specialist at the Duke Digital Production Center, seeing projects like this take off in such a spectacular way is near and dear to my heart. Even just the four collections I’ve highlighted here have been so informative. We still have many more Section A boxes to digitize and host online, and it’s exciting to think of what we might find and what we’ll digitize for all the world to see. Our work never stops, so remember to stay tuned to Duke Digital Collections to see some of these newly digitized collections as they become available.

Looking Ahead to MorphoSource 2.0

For the past year, developers in the Library’s Software Services department have been working to rebuild Duke’s MorphoSource repository for 3D research data. The current repository, available at www.morphosource.org, provides a place for researchers and curators to make scans of biological specimens available to other researchers and to the general public.

MorphoSource, first launched in 2013, has become the most popular website for virtual fossils in the world.  The site currently contains sixty thousand data sets representing twenty thousand specimens from seven thousand different species. In 2017, led by Doug Boyer in Duke Evolutionary Anthropology, the project received a National Science Foundation grant. Under this grant, the technical infrastructure for the repository will be moved to the Library’s management, and the user interface is being rebuilt using Hyrax, an open-source digital repository application widely implemented by libraries that manage research data.  The scope of the repository is being expanded to include data for cultural heritage objects, such as museum artifacts, architecture, and archaeological sites. Most importantly, MorphoSource is being improved with better performance, a more intuitive user experience, and expanded functionality for users to view and interact with the data within the site.

Viewing and manipulating CT scans and the derived 3D model of a platypus in the MorphoSource viewer

Management of 3D data is in itself complicated, and it becomes even more so when striving for long-term preservation of the digital representation of a unique biological specimen. In many cases, the specimen no longer exists, and the 3D data becomes the only record of its particular morphology. It’s necessary to collect not only the actual digital files, but also extensive metadata describing both the data’s creation and the specimen that was scanned. This can make the process of contributing data daunting for researchers. To improve the user experience and assist users with entering metadata about their files, MorphoSource 2.0 will guide them through the process. Users will be asked questions about their data: what it represents, when and how it was created, and whether it is a derivative of data already in MorphoSource. As they progress through making their deposit, their answers will direct them through linking the deposit to records already in the repository, or help them enter new metadata about the specimen that was scanned, the facility and equipment used to scan it, and any automated processes that were run to create the files.

Screenshot of a MorphoSource media page showing an alligator skull.

The new repository will also improve the experience of exploring metadata about contributed resources and viewing the accompanying 3D files. All of the data describing technical information, acquisition and processing, ownership and permissions, and related files will be gathered on one page, and users will be able to expand or collapse different metadata sections as their interests dictate. A file viewer will be embedded in the page, with full-screen viewing and several new tools for analyzing the media. Besides moving and spinning the model within the viewer, users can adjust lighting and other factors to focus on different areas of the model, and take custom measurements between points on the specimen. Most exciting, for CT image series, users can scroll through the images along three axes, or convert the images to a 3D model. For some data, users will also be able to share models by embedding the file viewer in a webpage.

The MorphoSource team is very excited about these improvements and plans to launch MorphoSource 2.0 in 2020. Stay tuned for the launch date, and in the meantime please visit the current site: www.morphosource.org.

U-matic for the People

Duke Libraries has a large collection of analog videotapes in several different formats. One of the most common in our archives is 3/4″ videotape, also called “U-matic” (shown above). Invented by Sony in 1969, U-matic was the first videotape format to be housed inside a plastic cassette for portability. Before U-matic, videotape was recorded on very large reels in the 2″ format known as Quadruplex, which required heavy recording and playback machines the size of household refrigerators. U-matic got its name from the shape of the tape path as it wraps around the video head drum, which looks like the letter U.

The VO-3800 enabled TV news crews to record directly to U-matic videotape at breaking news events.

The format was officially released in 1971, and it became popular with television stations after the portable Sony VO-3800 video deck was released in 1974. The VO-3800 enabled TV crews to record directly to U-matic videotape at breaking news events, which previously had to be shot on 16mm film. The news content was now immediately available for broadcast, as opposed to film, which had to be processed in a darkroom first. And the compact videocassettes could be transported to the TV station quickly and easily.

In the 1970s, movie studios also used U-matic tapes to easily transport filmed scenes or “dailies,” such as the first rough cut of “Apocalypse Now.” In 1976, the high-band BVU (Broadcast Video U-matic) version of 3/4″ videotape, with better color reproduction and lower noise levels, replaced the previous low-band version.

The Digital Production Center’s Sony VO-9800P for PAL videotapes (top), and a Sony BVU-950 for NTSC tapes (bottom).

The U-matic format remained popular at TV stations throughout the 1980’s, but was eventually replaced by Sony’s 1/2″ Betacam SP format. The BVU-900 series was the last U-matic product line made by Sony, and Duke Libraries’ Digital Production Center uses two BVU-950s for NTSC tapes, as well as a VO-9800P for tapes in PAL format. A U-matic videotape player in good working order is now an obsolete collector’s item, so they can be hard to find and expensive to purchase.

Unfortunately, most U-matic tapes have not aged well. After decades in storage, many of the videotapes in our collection now have sticky-shed syndrome, a condition in which the oxide that holds the visual content is literally flaking off the polyester tape base and is moist and gummy in texture. When a videotape has sticky-shed, not only will it not play correctly, but the residue can also clog up the tape heads in the U-matic playback deck and then transfer the contaminant to other tapes played afterwards in the same deck.

The DPC’s RTI VT3100 U-matic tape cleaner.

To combat this, we always bake (dehumidify) our U-matic videotapes in a scientific oven at 52°C (125°F) for at least 10 hours. Then we run each tape through a specialized tape-cleaning machine, which fast-forwards and rewinds the tape while using a burnishing blade to wipe off any built-up residue. We also clean the video heads inside our U-matic decks before each playback, using denatured alcohol.

Most of the time, these procedures make a U-matic tape playable, and we are able to digitize it, rescuing the content before the magnetic tape ages and degrades any further. While the U-matic tapes are nearing the end of their life span, the digital surrogates will potentially last for centuries to come, and will be accessible online through our Duke Digital Repository, from anywhere in the world.
