Data Sharing and Equity: Sabrina McCutchan, Data Architect

This post is part of the Research Data Curation Team’s ‘Researcher Highlight’ series.

Equity in Collaboration

The landscape of research and data is enterprising, expansive and diverse. This dynamic is notably visible in the work done at Duke Global Health Institute (DGHI). Collaboration with international partners inherently comes with many challenges. In a conversation with the Duke Research Data Curation team, Sabrina McCutchan of the Research Design and Analysis Core (RDAC) at DGHI shares her thoughts on why data sharing and access is critical to global health research.

Questions of equity must be addressed when discussing research data and scholarship on a global scale. For DGHI, data equity is a priority. International research partners deserve equal access to primary data to better understand what’s happening in their communities, contribute to policy initiatives that support their populations, and support their own professional advancement by publishing in research and medical journals.

“We work with so many different countries, people groups, and populations around the world that often themselves don’t have access to the same infrastructure, technologies or training in data. It can be challenging to collect quality primary data on their own, but [it] becomes a little easier in partnership with a big research institution like Duke.”

Collaborations like the Adolescent Mental Health in Africa Network Initiative (AMANI) demonstrate the significance of data sharing. AMANI is led by Dr. Dorothy Dow of DGHI, Dr. Lukoye Atwoli of Moi University School of Medicine, and Dr. Sylvia Kaaya of Muhimbili University of Health and Allied Sciences (MUHAS) and involves participating researchers from academic and medical institutions in South Africa, Kenya, and Tanzania.

Why Share Data?

As a Data Architect, Sabrina is available to support DGHI in achieving their data sharing goals. She takes a holistic approach to identifying areas where the team needs data support, considering at each stage of the project lifecycle how system design and data architecture will influence how data can be shared. This may entail drafting informed consent documents, developing strategies for de-identification, curating and managing data, or discovering solutions for data storage and publishing. For instance, in collaboration with CDVS Research Data Management Consultants, Sabrina has helped AMANI create a Dataverse to enable sharing restricted access health data for international junior researchers. Data from one of DGHI’s studies are also available in the Duke Research Data Repository.

“All of these components are interconnected to each other. You really need to think about what are going to be the impacts of a decision made early in the process of gathering data for this study further downstream when we’re analyzing that data and publishing findings from it.”

Reproducibility is another reason that sharing and publishing data is important to Sabrina. DGHI wants to increase data availability in accordance with FAIR principles so other researchers can independently verify, reproduce, and iterate on their work. This supports peers and contributes to the advancement of the field. Publishing data in an open repository can also increase their reach and impact. DGHI is also currently examining how to incorporate the CARE principles and other frameworks for ethical data sharing within their international collaborations.

Global collaborations in research are vital in these times. Sabrina advises that it’s important for researchers, especially Principal Investigators, to think holistically about research projects. For example, they can think about data sharing at the very beginning of the project and write consent forms that support what they hope to do with the data. Equitable practices paired with data sharing create opportunities for greater discovery and progress in research.

 

What does it mean to be an actively antiracist developer?

The library has been committed to Diversity, Equity, and Inclusion over the past year and more, specifically through the work of DivE-In and the Anti-Racist Roadmap. And to that end, the Digital Strategies and Technology department, where I work, has also been focusing on these issues. So lately I’ve been thinking a lot about how, as a web developer, I can be actively antiracist in my work.

First, some context. As a cis-gendered white male who is gainfully employed and resides in one of the best places to live in the country, I am soaking in privilege. So take everything I have to say with that large grain of salt. My first job out of college was working at a tech startup that was founded and run by a black person. To my memory, the overall makeup of the staff was something like 40–50% BIPOC, so my introduction to the professional IT world was that it was normal to see people who were different than me. However, in subsequent jobs my coworker pool has been much less diverse and more representative of the industry in general, which is to say very white and very male, which I think is a problem. So how can an industry that lacks diversity actively work on promoting the importance of diversity? How can we push back against systematic racism and oppression when we benefit from those very systems? I don’t think there are any easy answers.

Antiracist Baby by Ibram X. Kendi

I think it’s important to recognize that for organizations driven by top-down decision making, sweeping change needs to come from above. To quote one of my favorite bedtime stories, “Point at policies as the problem, not people. There’s nothing wrong with the people!” But that doesn’t excuse ‘the people’ from doing the hard work that can lead to profound change. I believe an important first step is to acknowledge your own implicit bias (if you are able, attend Duke IT’s Implicit Bias in the Workplace Training). Confronting these issues is an uncomfortable process, but I think ultimately that’s a good thing. And at least for me, I think doing this work is an ongoing process. I don’t think my implicit biases will ever truly go away, so it’s up to me to constantly be on the lookout for them and to broaden my horizons and experiences.

So in addition to working on our internalized biases, I think we can also work on how we communicate with each other as coworkers. In a recent DST-wide meeting concerning racial equity at DUL, the group I was in talked a lot about interpersonal communication. We should recognize that we all have blind spots and patterns that we slip into, like being overly jargony, being terse and/or confrontational, and so on. We have the power to change these patterns. I think we also need to be thoughtful of the language we use and the words that we speak. We need to appreciate diversity of backgrounds and be mindful of the mental taxation of code switching. We can try to help each other feel more comfortable in our own skin and feel safe expressing our thoughts and ideas. I think it’s profoundly important to meet people from a place of empathy and mutual respect. And we should not pass up the opportunities to have difficult conversations with each other. If I say something loaded with a microaggression and make a colleague feel uncomfortable or slighted, I want to be called out. I want to learn from my mistakes, and I would think that’s true for all of my coworkers.

Axe-con is an open and inclusive digital accessibility conference

We can also incorporate anti-racist practices into the things we create. Throughout my career, I’ve tried to always promote the benefits of building accessible interfaces that follow the practices of universal design. Building things with accessibility in mind is good for everyone, not just those who make use of assistive technologies. And as an aside, axe-con 2021 was packed full of great presentations, and recordings are available for free. We can take small steps like removing problematic language from our workflows (“master” branches are now “main”). But I think and hope we can do more. Some areas where I think we have an opportunity to be more proactive would be doing an assessment of our projects and tools to see to what degree (if at all) we seek out feedback and input from BIPOC staff and patrons. How can we make sure their voices are represented in what we create?

I don’t have many good answers, but I will keep listening, and learning, and growing.

An Intern’s Investigation on Decolonizing Archival Descriptions and Legacy Metadata

This post was written by Laurier Cress. Laurier Cress is a graduate student at the University of Denver studying Library Science with an emphasis on digital collections, rare books and manuscripts, and social justice in librarianship and archives. In addition to LIS topics, she is also interested in Medieval and Early Modern European History. Laurier worked as a practicum intern with the Digital Collections and Curation Services Department this winter to investigate auditing practices for decolonizing archival descriptions and metadata. Laurier will complete her master’s degree in the Fall of 2021. In her spare time, she also runs a YouTube channel called Old Dirty History, where she discusses historic events, people, and places throughout history.

Now that diversity, equity, and inclusion (DEI) are popular concerns for libraries throughout the United States, discussions on DEI are inescapable. These three words have become recurring buzzwords dropped in meetings, classroom lectures, class syllabi, presentations, and workshops across the LIS landscape. While in some contexts, topics in DEI are thrown around with no sincere intent or value behind them, some institutions are taking steps to give meaning to DEI in librarianship. As an African American MLIS student at the University of Denver, I can say I have listened to one too many superficial talks on why DEI is important in our field. These conversations customarily exclude any examples of what DEI work actually looks like. When Duke Libraries advertised a practicum opportunity devoted to hands-on experience exploring auditing practices for legacy metadata and harmful archival descriptions, I was immediately sold. I saw this experience as an opportunity to learn what scholars in our field are actually doing to make libraries a more equitable and diverse place.

As a practicum intern in Duke Libraries’ Digital Collections and Curation Services (DCCS) department, I spent three months exploring frameworks for auditing legacy metadata against DEI values and investigating harmful language statements for the department. Part of this work also included applying what I learned to Duke’s collections. Duke’s digital collections boast 131,169 items and 997 collections, across 1,000 years of history from all over the world. Many of the collections represent a diverse array of communities that contribute to the preservation of a variety of cultural identities. It is the responsibility of institutions with cultural heritage holdings to present, catalog, and preserve their collections in a manner that accurately and respectfully portrays the communities depicted within them. However, many institutions housing cultural heritage collections use antiquated archival descriptions and legacy metadata that should be revisited to better reflect 21st century language and ideologies. It is my hope that this brief overview on decolonizing archival collections not only aids Duke, but other institutions as well.

Harmful Language Statement Investigation

During the first phase of my investigation, I conducted an analysis on harmful language statements across several educational institutions throughout the United States. This analysis served as a launchpad for investigating how Duke can improve upon their inclusive description statement for their digital collections. During my investigation, I created a list comprising 41 harmful language statements. Some of these institutions include:

  • The Walters Museum of Art
  • Princeton University
  • University of Denver
  • Stanford University
  • Yale University

After gathering a list of institutions with harmful language statements, the next phase of my investigation was to conduct a comparative analysis to uncover what they had in common and how they differed. For this analysis, 12 harmful language statements were selected at random from the total list. From this investigation, I created the Harmful Statement Research Log to record my findings. The research log comprises two tabs. The first tab includes a list of harmful statements from 12 institutions, with supplemental comments and information about each statement. The second tab provides a list of 15 observations deduced from cross-examining the 12 harmful language statements. Some observations made include placement, length, historical context, and Library of Congress Subject Heading (LCSH) disclaimers. It is important for me to note that, while some of the information provided within the research log is based on pure observation, much of the report also includes conclusions based on personal opinions born from my own perspective as a user.

Decolonizing Archival Descriptions & Legacy Metadata

The next phase in my research was to investigate frameworks and current sentiments on decolonizing archival description and legacy metadata for Duke’s digital collections. Due to the limited amount of research on this subject, most of the information I came across was related to decolonizing collections describing Indigenous peoples in Canada and African American communities. I found that the influence of late 19th- and early 20th-century library classification systems can still be found within archival descriptions and metadata in contemporary library collections. The use of dated language within library and archival collections encourages the inequality of underrepresented groups through the promotion of discriminatory infrastructures established by these earlier classification systems. In many cases, offensive archival descriptions are sourced from donors and creators. While it is important for information institutions to preserve the historical context of records within their collections, descriptions written by creators should be contextualized to help users better understand the racial connotation surrounding the record. Issues regarding contextualizing racist ideologies from the past can be found throughout Duke’s digital collections.

During my investigation, I examined Duke’s MARC records from the collection level to locate examples of harmful language used within their descriptions. The first harmful archival description I encountered was from the Alfred Boyd Papers. The archival description describes a girl referenced within the papers as “a free mulatto girl”. This is an example of when archival description should not shy away from the realities of racist language used during the period the collection was created in; however, context should be applied. “Mulatto” was an offensive term used during the era of slavery in the United States to refer to people of African and White European ancestry. It originates from the Spanish word “mulato”, and its literal meaning is “young mule”. While this word is used to describe the girl within the papers, it should not be used to describe the person within the archival description without historical context.

Screenshot of metadata from the Alfred Boyd papers

When describing materials concerning marginalized peoples, it is important to preserve creator-sourced descriptions, while also contextualizing them. To accomplish this, there should be a defined distinction between descriptions from the creator and the institution’s archivists. Some institutions, like The Morgan Library and Museum, use quotation marks as part of their in-house archival description procedure to differentiate between language originating from collectors or dealers versus their archivists. It is important to preserve contextual information, when racism is at the core of the material being described, in order for users to better understand the collection’s historic significance. While this type of language can bring about feelings of discomfort, it is also important to not allow your desire for comfort to take precedence over conveying histories of oppression and power dynamics. Placing context over personal comfort also takes the form of describing relationships of power and acts of violence just as they are. Acts of racism, colonization, and white supremacy should be labeled as such. For example, Duke’s Stephen Duvall Doar Correspondence collection describes the act of “hiring” enslaved people during the Civil War. Slavery does not imply hired labor because hiring implies some form of compensation. Slavery can only equate to forced labor and should be described as such.

Several academic institutions have taken steps to decolonize their collections. At the beginning of my investigation, a mentor of mine referred me to the University of Alberta Library’s (UAL) Head of Metadata Strategies, Sharon Farnel. Farnel and her colleagues have done extensive work on decolonizing UAL’s holdings related to Indigenous communities. The university declared a call to action to protect the representation of Indigenous groups and to build relationships with other institutions and Indigenous communities. Although UAL’s call to action encompasses more than decolonizing their collections, for the sake of this article, I will solely focus on the framework they established to decolonize their archival descriptions.

Community Engagement is Not Optional

Farnel and her colleagues created a team called the Decolonizing Description Working Group (DDWG). Their purpose was to propose a plan of action on how descriptive metadata practices could more accurately and respectfully represent Indigenous peoples. The DDWG included a Metadata Coordinator, a Cataloguer, a Public Service Librarian, a Coordinator of Indigenous Initiatives, and a self-identified Indigenous MLIS Intern. Much of their work consisted of consulting with the community and collaborating with other institutions. When I reached out to Farnel, she was so kind and generous with sharing her experience as part of the DDWG. Farnel told me that the community engagement approach taken is dependent on the community. Marginalized peoples are not a monolith; therefore, there is no “one size fits all” solution. If you are going to consult community members, recognize the time and expertise the community provides. This relationship has to be mutually beneficial, with the community’s needs and requests at the forefront at all times.

For the DDWG, the best course of action was to start building a relationship with local Indigenous communities. Before engaging with the entire community, the team first engaged with community elders to learn how to proceed with consulting the community from a place of respect. Because the DDWG’s work took place prior to COVID-19, most meetings with the community took place in person. Farnel refers to these meetings as “knowledge gathering events”. Food and beverages were provided, along with a safe space for open conversation. A community elder would start the session to set the tone.

In addition to knowledge gathering events, Aboriginal and non-Aboriginal students and alumni were consulted through an informal short online survey. The survey was advertised through an informal social media posting. Once the participants confirmed the desire to partake in the survey, they received an email with a link to complete it. Participants were asked questions based on their feelings and reactions to potentially changing the Library of Congress Subject Headings (LCSH) that related to Aboriginal content.

Auditing Legacy Metadata and Archival Descriptions

There is more than one approach an institution can take to start auditing legacy metadata and descriptions. In a case study written by Dorothy Berry, who is currently the Digital Collections Program Manager at Harvard’s Houghton Library, she describes a digitization project that took place at the University of Minnesota Libraries. The purpose of the project was to not only digitize African American heritage materials within the university’s holdings, but to also explore ways mass digitization projects can help re-aggregate marginalized materials. This case study serves as an example of how collections can be audited for legacy metadata and archival descriptions during mass digitization projects. Granted, this specific project received funding to support such an undertaking and not all institutions have the funding required to take on an initiative of this magnitude. However, this type of work can be done slowly over a longer period of time. Simply running a report to search for offensive terms such as “negro”, or in my case “mulatto”, is a good place to start. Be open to having discussions with staff to learn what offensive language they also have come across. Self-reflection and research are equally important. Princeton University Library’s inclusive description working group spent two years researching and gathering data on their collections before implementing any changes. Part of their auditing process also included using an XQuery script to locate harmful descriptions and recover histories that were marginalized due to lackluster description.
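As a rough illustration of the "run a report" starting point described above, here is a minimal Python sketch. It is not the script used by Duke, Princeton, or any other institution. The record structure, field names, and sample records are invented for illustration, and the term list contains only the two terms named in this post. The output is meant as a worklist for human review, not an automatic rewrite.

```python
import re

# Starter list: only the two terms named in this post. A real audit list
# would be built with staff and community input.
FLAGGED_TERMS = ["negro", "mulatto"]


def build_pattern(terms):
    """Compile one case-insensitive, whole-word pattern covering all terms."""
    alternatives = "|".join(re.escape(term) for term in terms)
    return re.compile(rf"\b({alternatives})\b", re.IGNORECASE)


def audit_records(records, terms=FLAGGED_TERMS, fields=("title", "description")):
    """Return one report row per match so a person can review it in context."""
    pattern = build_pattern(terms)
    report = []
    for record in records:
        for field in fields:
            text = record.get(field, "")
            for match in pattern.finditer(text):
                start = max(0, match.start() - 40)
                report.append({
                    "id": record.get("id"),
                    "field": field,
                    "term": match.group(1).lower(),
                    "context": text[start:match.end() + 40],
                })
    return report


# Invented sample records (hypothetical IDs and fields).
sample = [
    {"id": "rec-001", "title": "Family papers",
     "description": "Mentions a free mulatto girl."},
    {"id": "rec-002", "title": "Photographs",
     "description": "Street scenes and landscapes."},
]

for row in audit_records(sample):
    print(row["id"], row["field"], row["term"], "|", row["context"])
```

Whole-word matching keeps the report short but misses plurals and variant spellings, so a real term list would need those variants added explicitly.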

Creators Over Community = Problematic

While exploring Duke’s digital collections, one problem that stood out to me the most was the perpetual valorization of creators. This is often found in collections with creators who are white men. Adjectives like “renowned”, “genius”, “talented”, and “preeminent” are used to praise the creators and make the collection more about them instead of the community depicted within the collection. An example of this troublesome language can be found in Duke’s Sidney D. Gamble’s Photographs collection. This collection comprises over 5,000 black and white photographs taken by Sidney D. Gamble during his four visits to China from 1908 to 1932. Content within the photographs encompasses depictions of people, architecture, livestock, landscapes, and more. Very little emphasis is placed on the community represented within this collection. Little, if any, historical or cultural context is given to help educate users on the culture behind the collection. And the predominant language used here is English. However, there is a full page of information on the life and exploits of Gamble.

Screenshot of a description of the Sidney Gamble digital collection.

Describing Communities

Harmful language used to describe individuals represented within digital collections can be found everywhere. This is not always intentional. Dorothy Berry’s presentation with the Sunshine State Digital Network on conscious editing serves as a great source of knowledge on problematic descriptions that can be easily overlooked. Some of Berry’s examples include:

  • Class: Examples include using descriptions such as “poor family” or “below the poverty line”.
  • Race & Ethnicity: Examples include using dehumanizing vocabulary to describe someone of a specific ethnicity or excluding describing someone of a specific race within an image.
  • Gender: Example includes referring to a woman using her husband’s full name (Mrs. John Doe) instead of her own.
  • Ability: Example includes using offensive language like “cripple” to describe disabled individuals.

This is only a handful of problematic description examples from Berry’s presentation. I highly recommend watching not only Berry’s presentation, but the entire Introduction to Conscious Editing Series.

Library of Congress Subject Headings (LCSH) Are Unavoidable

I could talk about LCSH in relation to decolonizing archival descriptions for days on end, but for the sake of wrapping up this post I won’t. In a perfect world we would stop using LCSH altogether. Unfortunately, this is impossible. Many institutions use custom-made subject headings to promote their collections respectfully and appropriately. However, the problem with using custom-made subject headings that are more culturally relevant and respectful is accessibility. If no one is using your custom-made subject headings when conducting a search, users and aggregators won’t find the information. This defeats the purpose of decolonizing archival collections, which is to make collections that represent marginalized communities more accessible.

What we can do is be as cognizant as possible of the LCSHs we are using and avoid harmful subject headings as much as possible. If you are uncertain whether an LCSH is harmful, conduct research or consult with communities who desire to be part of your quest to remove harmful language from your collections. Let your users know why you are limited to subject headings that may be harmful and that you recognize the issue this presents to the communities you serve. Also consider collaborating with Cataloginglab.org to help design new LCSH proposals and to stay abreast of new LCSH that better reflect DEI values. There are also some alternative thesauri, like homosaurus.org and Xwi7xwa Subject Headings, that better describe underrepresented communities.

Resources

In support of Duke Libraries’ intent to decolonize their digital collections, I created a Google Drive folder that includes all the fantastic resources I included in my research on this subject. Some of these resources include metadata auditing practices from other institutions, recommendations on how to include communities in archival description, and frameworks for decolonizing their descriptions.

While this short overview provides a wealth of information gathered from many scholars, associations, and institutions who have worked hard to make libraries a better place for all people, I encourage anyone reading this to continue reading literature on this topic. This overview does not come close to covering half of what invested scholars and institutions have contributed to this work. I do hope it encourages librarians, catalogers, and metadata architects to take a closer look at their collections.

FFV1: The Gains of Lossless

One of the greatest challenges to digitizing analog moving-image sources such as videotape and film reels isn’t the actual digitization. It’s the enormous file sizes that result, and the high costs associated with storing and maintaining those files for long-term preservation. For many years, Duke Libraries has generated 10-bit uncompressed preservation master files when digitizing our vast inventory of analog videotapes.

Unfortunately, one hour of uncompressed video can produce a 100 gigabyte file. That’s at least 50 times larger than an audio preservation file of the same duration, and about 1,000 times larger than most still image preservation files. That’s a lot of data, and as we digitize more and more moving-image material over time, the long-term storage costs for these files can grow rapidly.

To help offset this challenge, Duke Libraries has recently implemented the FFV1 video codec as its primary format for moving image preservation. FFV1 was first created as part of the open-source FFmpeg software project, and has been developed, updated and improved by various contributors in the Association of Moving Image Archivists (AMIA) community.

FFV1 enables lossless compression of moving-image content. Just like uncompressed video, FFV1 delivers the highest possible image resolution, color quality and sharpness, while avoiding the motion compensation and compression artifacts that can occur with “lossy” compression. Yet, FFV1 produces a file that is, on average, 1/3 the size of its uncompressed counterpart.
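To make the size difference concrete, here is a small back-of-the-envelope sketch in Python. It uses only the two figures quoted in this post (roughly 100 gigabytes per hour of uncompressed video, and FFV1 files averaging one third of that size). The 90-hour collection is a hypothetical example, and real results will vary with the source material.

```python
# Figures quoted in the post; actual sizes depend on the footage.
GB_PER_HOUR_UNCOMPRESSED = 100
FFV1_SHRINK_FACTOR = 3  # FFV1 averages about 1/3 the uncompressed size


def estimate_storage_gb(hours):
    """Estimate storage for a collection, uncompressed versus FFV1."""
    uncompressed_gb = hours * GB_PER_HOUR_UNCOMPRESSED
    ffv1_gb = uncompressed_gb / FFV1_SHRINK_FACTOR
    return {
        "uncompressed_gb": uncompressed_gb,
        "ffv1_gb": ffv1_gb,
        "saved_gb": uncompressed_gb - ffv1_gb,
    }


# Hypothetical collection: 90 hours of digitized videotape.
estimate = estimate_storage_gb(90)
for label, gigabytes in estimate.items():
    print(f"{label}: {gigabytes:,.0f} GB")
```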

FFV1 produces a file that is, on average, 1/3 the size of its uncompressed counterpart. Yet, the audio & video content is identical, thanks to lossless compression.

The algorithms used in lossless compression are complex, but if you’ve ever prepared for a fall backpacking trip, and tightly rolled your fluffy goose-down sleeping bag into one of those nifty little stuff-sacks, essentially squeezing all the air out of it, you just employed (a simplified version of) lossless compression. After you set up your tent, and unpack your sleeping bag, it decompresses, and the sleeping bag is now physically identical to the way it was before you packed.

Yet, during the trek to the campsite, it took up a lot less room in your backpack, just like FFV1 files take up a lot less room in our digital repository. Like that sleeping bag, FFV1 lossless compression ensures that the compressed video file is mathematically identical to its pre-compressed state. No data is “lost” or irreversibly altered in the process.

Duke Libraries’ Digital Production Center utilizes a pair of 6-foot-tall video racks, which currently house eight videotape decks covering a variety of obsolete formats such as U-matic (NTSC), U-matic (PAL), Betacam, DigiBeta, VHS (NTSC) and VHS (PAL, Secam). Each deck’s analog output is converted to digital (SDI) using Blackmagic Design Mini Converters.

The SDI signals are sent to a Blackmagic Design Smart Videohub, which is the central routing center for the entire system. Audio mixers and video transcoders allow the Digitization Specialist to tweak the analog signals so the waveform, vectorscope and decibel levels meet broadcast standards and the digitized video is faithful to its analog source. The output is then routed to one of two Retina 5K iMacs via Blackmagic UltraStudio devices, which convert the SDI signal to Thunderbolt 3.

FFV1 video digitization in progress in the Digital Production Center.

Because no major company (Apple, Microsoft, Adobe, Blackmagic, etc.) has yet adopted the FFV1 codec, multiple foundational layers of mostly open-source systems software had to be installed, tested and tweaked on our iMacs to make FFV1 work: Apple’s Xcode, Homebrew, AMIA’s vrecord, FFmpeg, Hex Fiend, AMIA’s ffmprovisr, GitHub Desktop, MediaInfo, and QCTools.

The FFV1 workflow runs from the command line in a terminal, so some familiarity with command-line syntax is helpful for entering the correct commands and deciphering the terminal logs.

The FFV1 files are “wrapped” in the open source Matroska (.mkv) media container. Our FFV1 scripts employ several degrees of quality-control checks, input logs and checksums, which ensure file integrity. The files can then be viewed using VLC media player, for Mac and Windows. Finally, we make an H.264 (.mp4) access derivative from the FFV1 preservation master, which can be sent to patrons, or published via Duke’s Digital Collections Repository.
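For readers curious what the underlying commands might look like, here is a hedged Python sketch that only builds and prints two FFmpeg command lines: one for an FFV1 preservation master in a Matroska container, and one for an H.264 access derivative. These are not the Digital Production Center’s actual scripts. The flags follow commonly published FFmpeg recipes for FFV1 version 3, the file names are hypothetical, and running the printed commands requires FFmpeg to be installed.

```python
import shlex


def ffv1_preservation_command(source, output_mkv):
    """Build an ffmpeg call that encodes video as FFV1 in a Matroska file."""
    return [
        "ffmpeg", "-i", source,
        "-map", "0",        # keep all streams from the source...
        "-dn",              # ...except data streams
        "-c:v", "ffv1",     # lossless FFV1 video codec
        "-level", "3",      # FFV1 version 3
        "-g", "1",          # GOP size 1: every frame is a keyframe
        "-slices", "16",    # divide each frame into 16 slices
        "-slicecrc", "1",   # store a CRC per slice for error detection
        "-c:a", "copy",     # copy audio without re-encoding
        output_mkv,
    ]


def access_derivative_command(master_mkv, output_mp4):
    """Build an ffmpeg call that makes an H.264 (.mp4) access copy."""
    return [
        "ffmpeg", "-i", master_mkv,
        "-c:v", "libx264",       # lossy H.264 video for access
        "-pix_fmt", "yuv420p",   # widely compatible pixel format
        "-c:a", "aac",           # AAC audio
        output_mp4,
    ]


# Hypothetical file names, printed rather than executed.
print(shlex.join(ffv1_preservation_command("tape_001.mov", "tape_001.mkv")))
print(shlex.join(access_derivative_command("tape_001.mkv", "tape_001.mp4")))
```

Building the commands as lists, rather than as one long string, avoids quoting problems when file names contain spaces.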

An added bonus is that, not only can Duke Libraries digitize analog videotapes and film reels in FFV1, we can also utilize the codec (via scripting) to target a large batch of uncompressed video files (that were digitized from analog sources years ago) and make much smaller FFV1 copies, that are mathematically lossless. The script runs checksums on both the original uncompressed video file, and its new FFV1 counterpart, and verifies the content inside each container is identical.
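The verification step can be sketched as follows. This is an illustration of the idea rather than the script described above: it assumes each file’s decoded frames have already been hashed into a text report (for example with FFmpeg’s framemd5 output format), and it compares the two reports while ignoring the comment header lines. The report contents below are invented placeholders, not real hashes.

```python
def frame_hashes(report_text):
    """Return the per-frame lines of a framemd5-style report, skipping '#' headers."""
    lines = (line.strip() for line in report_text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def is_lossless_copy(original_report, copy_report):
    """True only if both reports list the same frames with the same hashes."""
    original = frame_hashes(original_report)
    copy = frame_hashes(copy_report)
    return bool(original) and original == copy


# Invented placeholder reports (columns: stream, dts, pts, duration, size, hash).
uncompressed_report = """#format: frame checksums
#software: example-build-a
0, 0, 0, 1, 699840, 5a1fb2d0c1e7e4a0b8d6f3c2a9e1d4f7
0, 1, 1, 1, 699840, 9c3e7b1a2d4f6e8091a2b3c4d5e6f708
"""

ffv1_report = """#format: frame checksums
#software: example-build-b
0, 0, 0, 1, 699840, 5a1fb2d0c1e7e4a0b8d6f3c2a9e1d4f7
0, 1, 1, 1, 699840, 9c3e7b1a2d4f6e8091a2b3c4d5e6f708
"""

print(is_lossless_copy(uncompressed_report, ffv1_report))
```

Comparing hashes of the decoded frames, rather than hashes of the files themselves, is what allows two files of very different sizes to be confirmed as carrying identical content.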

Now, a digital collection of uncompressed masters that took up 9 terabytes can be deleted, and the newly generated batch of FFV1 files, which takes up only 3 terabytes, becomes the new set of preservation masters for that collection. But no data has been lost, and the content is identical. Just like that goose-down sleeping bag, this helps the Duke University budget managers sleep better at night.

We’re hiring!

The Digital Production Center (DPC) is looking to hire a Digitization Specialist to join our team! The DPC team is at the forefront of enabling students, teachers, and researchers to continue their research by digitizing materials from our library collections. We get to work with a variety of unique and rare materials (in a multitude of formats), and we use professional equipment to get the work done. Imagine working on digitizing papyri and comic books – the spectrum is far and wide! Get a glimpse of the collections that have been digitized by DPC staff by checking out our Duke Digital Collections.

Also, the people are really nice (and right now, we’re working in a socially distanced manner)!

More information about the job description can be found here. The successful candidate should be detail-oriented, possess excellent organizational and project management skills, have scanning experience, and be able to work both independently and effectively in a team environment. This position is part of the Digital Collections and Curation Services department and will report to the Digital Production Services manager.

More information about Duke’s benefit package can be found at https://hr.duke.edu/benefits. For more information and to apply, please submit an electronic resume, cover letter, and a list of 3 references to https://library.duke.edu/about/jobs/digitizationspecialist. Review of applications will begin immediately and will continue until the position is filled.

Seats in the time of COVID: Improving new services with user feedback

In fall 2020, the Libraries quickly developed several new COVID-safe services as we reopened our facilities to students and faculty in the midst of the pandemic. Two such services were Library Takeout, which allows Duke affiliates to pick up reserved books with minimal contact, and an online reservation system for seats and equipment in library study spaces. 

Libraries staff spent significant time over the summer of 2020 developing these new services. Once they were put in operation in the fall of 2020, Assessment & User Experience staff knew we needed to gather feedback from users and analyze data to better understand how the services were working and what could be improved. We developed brief, anonymous feedback surveys to be sent during two-week periods to each person who reserved equipment or a study seat or made an appointment to pick up books. 

 

What did we learn?

The vast majority of the 111 patrons who responded to the Library Takeout survey were extremely satisfied with both wait time and safety precautions, as shown in the figure below.

Satisfaction levels by library location with Library Takeout service

Patrons were also asked what worked well about the process, what did not work well, and whether they had any additional comments or suggestions. There were 69 comments about things that worked well. The most prevalent themes in these compliments were clear instructions, very short wait times, friendly security and staff, access to parking, and adequate safety precautions.

The directions were clear, the parking pass for the Upper Allen lot made arriving on campus for pick up easy, the security staff were helpful and efficient, and the library staff was cheerful and helpful as I’ve come to expect.

Very rigorous about precautions. Keep it that way.

There were 34 comments about things that did not work well, many of which also make suggestions for improvements. For example: 

  • There was interest in the Libraries offering weekend hours for materials pick-up
  • Several students found the check-in requirements at the library entrance confusing 
  • There were complaints about having to make appointments at all to pick up materials 
  • Several students reported issues with their parking passes not opening the gates
  • Interest in having the confirmation email for a scheduled pick-up be sent earlier
  • Several felt that the security presence at the doors was uncomfortable

The survey for seat and equipment reservations received 114 responses during the two-week period in which it was distributed at the beginning of the fall semester. Users were asked how easy five activities were: using the online system to book, checking in, finding the seat/equipment, using it, and cleaning up/checking out. An overwhelming majority of users (89%) found it “extremely easy” to use their seat/equipment. In general, close to two-thirds of users found each of the other activities “extremely easy.” When “somewhat easy” and “extremely easy” responses are combined, between 85% and 97% of respondents found each activity easy. The activity with the lowest “easy” score (85%) was “cleaning up and checking out after your reservation.”

Seat/equipment reservations: how easy or difficult were the following activities?

When asked what worked well about reserving and using a seat or equipment, many praised the booking website for its clarity, simplicity, and ease of use, and also praised the entire process. Students were happy to be assured of a seat when they came to the library, and many commented on how clean, quiet, and nicely socially distanced the library was. Compliments were offered for the signage as well as for the security staff’s assistance in finding seats.

It was easy from start to finish. The security guard at the front was very helpful in explaining how to find my seat.

Was happy to see cleaning supplies to wipe down the desk area. felt safe. good social distancing precautions!

When asked what did not work well about reserving and using a study seat or equipment, reported issues included the following: 

  • Some respondents hadn’t realized they were supposed to check out online or clean their seating area when they were finished. The Libraries should add visuals next to the seats instructing on these procedures. 
  • When reserving, patrons can’t tell which seats are close to electrical outlets or windows. They requested a floorplan, map, or photos of the spaces so they can see where the seats are in relation to other things. 
  • Multiple people asked for the ability to easily extend one’s study time in the same seat if no one had booked it after them by the time their session was up. 
  • For the website, several people complained about the inability to edit reservation times without canceling and rebooking the whole thing, and a few other clunky visual things about the tool used for reservations. 
  • Several people requested weekend hours for the service. 

 

Changes we were able to make based on feedback

By gathering student feedback when we first began offering these services, we were able to quickly make changes so that the services better met our users’ needs. Below is a list of some of the key changes we made in response to survey feedback. 

  1. Revised and expanded opening hours in both Lilly and Perkins & Bostock Libraries in response to student requests and an analysis of usage patterns based on reservation system data.
  2. Removed the “check in” requirement for study seat users early in the fall semester, once we realized this was posing problems
  3. Added floorplans, images, and descriptions to the study seat reservation system so that users can get more info as they book study seats (here’s an example)
  4. Added more physical signage in the buildings to help students find their seats
  5. Developed a guide to study seats, including pictures, descriptions, and amenities of seats in Lilly, Music, Perkins, Bostock, and Rubenstein 
  6. Added online information so that students can easily see which seats do not have access to electrical outlets when deciding which seat to reserve (see this example)
  7. Added an Interview Room for students to book for 90-minute periods. Students can use this space to participate in virtual interviews.
  8. Added information about parking, elevator access, and daily reservation limits to the Reserve a Seat webpage and Reservation system.
  9. Increased outreach and marketing about reservable Study Seats through email blasts, social media, and blog posts. Library Takeout got plenty of buzz through this catchy video that went viral this past fall (870,000 views and counting)!

A Preview of MorphoSource 2 Beta

It’s an exciting time for the MorphoSource team, as we work to launch the MorphoSource 2 Beta application next Wednesday!

The new application improves and expands upon the original MorphoSource, a repository for 3D research data, and is being built using Hyrax, an open-source digital repository application widely implemented by libraries to manage digital repositories and collections. The team has been working on the site for the last two and a half years, and is looking forward to our efforts being made available to the MorphoSource community. At launch, users will be able to access records for over 140,000 media files, contributed by 1,500 researchers from all over the world.

MorphoSource 2 Beta Homepage

While the current site is still available for browsing at www.morphosource.org, we are migrating the repository data over to the new site in preparation for the launch, and have paused the ingest of new data sets. When the migration is complete, users will be able to access the new application at the current URL. Users with an account on the old site will be able to log in to the new site using their MorphoSource 1 credentials.

In my last post in June, I described some of the features that were in development at that time. In this post, I’ll highlight a few recent additions with screenshots from the beta site: Browse, Search, and User Dashboards.

Browse

Browse pages have been added as a quick entry point for users to discover data in several different ways. Users can use these pages to immediately access media, biological specimens, cultural heritage objects, organizations, teams, or projects.

MorphoSource Browse Categories
MorphoSource 2 Beta Browse Categories

Media Types and Modalities: Users can view all media records of a specific file type, such as image, CT image series, or mesh or point cloud. There are also links to records created by different methods, such as X-Ray, Magnetic Resonance Imaging, or Photogrammetry.

Physical Object Types: Links to view either all the Biological Specimens or Cultural Heritage Objects in MorphoSource

Biological Taxonomy:  Users can find specimen records through the taxonomy browse by drilling down through the taxonomic ranks. The MorphoSource taxonomy records have been imported from the GBIF Backbone Taxonomy or have been created by MorphoSource users.

Taxonomy Browse Page

Projects: Projects are user-created groupings of media and specimens. From the browse page, projects can be searched by title and sorted by title, description, team, creator, or number of associated media or objects.

MorphoSource 2 Project Browse
Project Browse Page

Teams: Teams are groups of MorphoSource users that share management of media and team projects. A Team may be associated with an organization. The Team browse page lets users search and sort teams in a similar way to the Projects browse page.

Organizations: Lastly, users can view all of the organizations that have biological specimens or cultural heritage objects in MorphoSource. An organization may be an institution, department, collection, facility, lab, or other group. From the Organizations browse page, users can search by name and sort by parent institution name or institution code.

Faceted Searching

In addition to the browse pages, records for Media, Biological Specimens, Cultural Heritage Objects, Organizations, Teams, and Projects can also be found through the MorphoSource search interface. Searching has been customized for the different record types to include relevant facets. The different search categories can be chosen from the dropdown next to the search box ‘Go’ button.

MorphoSource 2 Beta Media Search Results
Media Search Results

Search results for media records can be faceted by file type, modality, object type (biological specimen or cultural heritage object), organization, tag, or membership in a team or project, while search results for objects can be limited by object type, creator, organization, taxonomy, associated media types, associated media tags, and membership of associated media in a team or project. Organization and Team/Project searches similarly have their own sets of facets.

MorphoSource 2 Object Search
Biological Specimen and Cultural Heritage Object Search Results

User Dashboards

Users who register an account on the site will have access to a dashboard that enables them to manage their data downloads. The dashboard is accessed by clicking on the profile icon at the top right of the site, and will open to the user’s media cart. The media cart contains two sections – the top holds all media items that the user currently has permission to download, while the bottom has media items with a restricted status where download has not been requested or approved:

MorphoSource 2 Beta Media Cart
Default User Dashboard

Users who have been granted contributor access to the site will have a dashboard that opens to the media and objects that they have contributed:

MorphoSource 2 Beta Contributor Dashboard
Contributor Dashboard

From the menu at the left,  all users can access their previous downloads, or projects, teams, or other repository content to which they have been granted access, and manage their user profile.  In addition, contributors can also create and manage projects and teams.

We hope that the browse, search, and dashboard enhancements, along with the other features we have been working on over the last couple of years, will enable users to easily discover and manage data sets in MorphoSource. And although we are looking forward to the launch, we are also excited to continue working on the site, and will be adding even more features in the near future.

2020 Highlights from Digital Collections

Welcome to the 2020 digital collections round up!

In spite of the dumpster fire of 2020, Duke Digital Collections had a productive and action-packed year (maybe too action-packed at times).

As usual, we launched new digital collections and added content to existing ones (full list below). We are also wrapping up our mega-migration from our old digital collections system (Tripod2) to the Duke Digital Repository! This migration has been in process for 5 years. Yes, 5 years. We plan to celebrate this exciting milestone more in January, so stay tuned.

A classroom and auditorium blueprint, digitized for a patron and launched this month.

The Digital Production Center, in collaboration with the Rubenstein Library, shifted to a new folder level workflow for patron and instruction requests. This workflow was introduced just in time for the pandemic and the resulting unprecedented number of digitization requests.  As a result of the demand for digital images, all project work has been put aside and the DPC is focusing on patron and instruction requests only. Since late June, the DPC has produced over 40,000 images!  

Another digital collections highlight from 2020 is the development of new features for our preservation and access interface, the Duke Digital Repository. We have wasted no time using these new features, especially “metadata only” and the DDR to CONTENTdm connection.

Looking ahead to 2021, our priority will be the folder level digitization workflow for researcher and instruction requests. The DPC has received 200+ requests since June, and we need to get all those digitized folders moved into the repository. We are also experimenting with preserving scans created outside of the DPC. For example, Rubenstein Library staff created a huge number of access copies using reading room scanners, and we would like to make them available to others. Lastly, we have a few bigger digital collections to ingest and launch as well.

Thanks to everyone associated with Digital Collections for their incredible work this year!!  Whew, it has been…a year. 

One of our newest digital collections features postcards from Greece: Salonica / Selanik / Thessaloniki
One of the Radio Haiti photographs launched recently.

Laundry list of 2020 Digital Collections

New Collections

Digital Collections Additions

Migrated Collections

Implementing ArcLight: A Reflection

Around this time last year, I wrote about our ambitious plans to implement ArcLight software for archival discovery and access at Duke in 2020. While this year has certainly laid waste to so many good intentions, our team persisted through the cacophony undeterred, and—I’m proud to report—still hit our mark of going live on July 1, 2020 after a six-month work cycle. The site is available at https://archives.lib.duke.edu/.

Now that we have been live for a while, I thought it’d be worthwhile to summarize what we accomplished, and reflect a bit on how it’s going.

Working Among Peers

I had the pleasure of presenting about ArcLight at the Oct 2020 Blacklight Summit alongside Julie Hardesty (Indiana University) and Trey Pendragon (Princeton University). The three of us shared our experiences implementing ArcLight at our institutions. Though we have the same core ArcLight software underpinning our apps, we have each taken different strategies to build on top of it. Nevertheless, we’re all emerging with solutions that look polished and fill in various gaps to meet our unique local needs. It’s exciting to see how well the software holds up in different contexts, and to be able to glean inspiration from our peers’ platforms.

Title slide to ArcLight@Duke presentation at Blacklight Summit
Slides from ArcLight@Duke presentation, 10/7/2020

A lot of content in this post will reiterate what I shared in the presentation.

Custom-Built Features

Back in April, I discussed at length our custom-built features and interface revisions that we had completed by the halfway point for the project. So, now let’s look closer at everything else we added in the second half (and in the post-launch period).

Browsing Collection Contents

This is one of the hardest things to get right in a finding aids UI, so our solution has evolved through many iterations. We created a context sidebar with lightly-animated loading indicators matching the number of items currently loading. The nav sticks with you as you scroll down the page and the Request button stays visible.  We also decided to present a list of direct child components in the main page body for any parent component.

Restrictions

At the collection level, we wanted to ensure that users didn’t miss any restrictions info, so we present a snippet of it at the top-right of the page; clicking “More” jumps to the full description.

Collection restrictions snippet

We changed how access and use restrictions are indexed so that components can inherit their restrictions from any ancestor component. Then we made bright yellow banners and icons in the UI to signify that a component has restrictions.

Restrictions presented on a component
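The inheritance idea can be illustrated with a short Python sketch. It is not the ArcLight indexing code (which is written in Ruby); the `Component` class, the choice to take the nearest ancestor's restriction, and the sample titles and restriction text are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Component:
    """A node in a finding aid hierarchy (hypothetical structure)."""
    title: str
    restriction: Optional[str] = None
    parent: Optional["Component"] = None


def effective_restriction(component: Component) -> Optional[str]:
    """Walk up the ancestors and return the nearest restriction, if any."""
    node: Optional[Component] = component
    while node is not None:
        if node.restriction:
            return node.restriction
        node = node.parent
    return None


# Hypothetical hierarchy: only the series carries a restriction statement.
series = Component("Series 1: Correspondence", restriction="Closed until 2050")
folder = Component("Folder 12", parent=series)
item = Component("Letter, 1962", parent=folder)
open_series = Component("Series 2: Photographs")

print(effective_restriction(item))         # inherited from the series
print(effective_restriction(open_series))  # no restriction anywhere above
```

Resolving the restriction at index time, rather than at display time, means each component's record already knows whether to show a banner without having to look up its ancestors on every page view.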

Hierarchical Record Group Browse

Using the excellent Blacklight Hierarchy plugin, we developed a way to browse University Archives collections by an existing hierarchical Record Group classification system. We encoded the group numbers, titles, nesting, and description in a YAML config file so they’re easy to change as they evolve.

Browse by Record Group
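As a rough illustration of how a nested classification like this can be kept in a config file and walked for display, here is a Python sketch. The structure is written as a Python dictionary shaped like data loaded from a YAML file; the group numbers, titles, and the `subgroups` key are hypothetical and do not reflect Duke's actual Record Group config.

```python
from typing import Dict, Iterator, Tuple

# Hypothetical structure, shaped like data loaded from a YAML config file.
RECORD_GROUPS: Dict[str, dict] = {
    "01": {
        "title": "Example Group A",
        "subgroups": {
            "01": {"title": "Example Subgroup A1"},
            "02": {"title": "Example Subgroup A2"},
        },
    },
    "02": {"title": "Example Group B"},
}


def walk(groups: Dict[str, dict], prefix: str = "",
         depth: int = 0) -> Iterator[Tuple[int, str, str]]:
    """Yield (depth, full group number, title) for every group, depth first."""
    for number, group in sorted(groups.items()):
        full_number = f"{prefix}.{number}" if prefix else number
        yield depth, full_number, group["title"]
        yield from walk(group.get("subgroups", {}), full_number, depth + 1)


for depth, number, title in walk(RECORD_GROUPS):
    print("  " * depth + f"{number} {title}")
```

Keeping the numbers, titles, and nesting in data rather than in code is what makes the classification easy to revise: a change to the hierarchy is an edit to the config file, not to the application.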

Digital Repository & Bento Search Integration

ArcLight exists among a wide constellation of other applications supporting and promoting discovery in the library, so integrating with these other pieces was an important part of our implementation. In April, I showed the interaction between ArcLight and our Requests app, as well as rendering digital object viewers/players inline via the Duke Digital Repository (DDR).

Inline digital object viewing via the DDR

Two other locations external to our application now use ArcLight’s APIs to retrieve archival information.  The first is the Duke Digital Repository (DDR). When viewing a digital collection or digital object that has a physical counterpart in the archives, we pull archival information for the item into the DDR interface from ArcLight’s JSON API.

Duke Digital Repository pulls archival info from ArcLight

The other is our “Bento” search application powering the default All search available from the library website. Now when your query finds matches in ArcLight, you’ll see component-level results under a Collection Guides bento box. Components are contextualized with a linked breadcrumb trail.

ArcLight search results presented in Bento search UI

 

Bookmarks Export CSV

COVID-19 brought about many changes to how staff at Duke Libraries retrieve materials for faculty and student research. You may have heard Duke’s Library Takeout song (819K YouTube views & counting!), and if you have, you probably can’t ever un-hear it.

But with archival materials, we’re talking about items that could never be taken out of the building. Materials may only be accessed in a controlled environment in the Rubenstein Reading Room, which remains highly restricted.  With so much Duke instruction moving online during COVID, we urgently needed to come up with a better workflow to field an explosion of requests for digitizing archival materials for use in remote instruction.

ArcLight’s Bookmarks feature (which comes via Blacklight) proved to be highly valuable here. We extended the feature to add a CSV export. The CSV is constructed in a way that makes it function as a digitization work order that our Digital Collections & Curation Services staff use to shepherd a request through digitization, metadata creation, and repository ingest. Over 26,000 images have now been digitized for patron instruction requests using this new workflow.

Bookmarks export to CSV
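A CSV export of this kind might look something like the following Python sketch. The actual feature is part of our Ruby application and its column layout is not described here, so the field names and the sample bookmark are hypothetical.

```python
import csv
import io
from typing import Dict, List

# Hypothetical column names for a digitization work order.
FIELDS = ["collection", "component_title", "container", "url"]


def bookmarks_to_csv(bookmarks: List[Dict[str, str]]) -> str:
    """Serialize bookmarked components as CSV text, one row per bookmark."""
    buffer = io.StringIO()
    # Unknown keys are ignored; missing keys are written as empty cells.
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(bookmarks)
    return buffer.getvalue()


# Hypothetical bookmark data.
bookmarks = [
    {
        "collection": "Example Family Papers",
        "component_title": "Correspondence, 1901-1910",
        "container": "Box 1, Folder 3",
        "url": "https://archives.lib.duke.edu/",
        "internal_note": "not exported",
    },
    {"collection": "Example Family Papers", "component_title": "Diaries"},
]

print(bookmarks_to_csv(bookmarks))
```

Because each row is one bookmarked component, staff can add their own tracking columns in a spreadsheet as a request moves through digitization, metadata creation, and ingest.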

More Features

Here’s a list of several other custom features we completed after the April midway point.

  • Relevancy optimization
  • WCAG2.0 AA accessibility
  • ARKs & permalinks
  • Advanced search modal
  • Catalog record links
  • Dynamic sitemaps (via gem)
  • Creative Commons / RightsStatements.org integration
  • Twitter card metadata
  • Open Graph metadata
  • Google Analytics event tracking with Anonymize IP
  • Debug mode for relevancy tuning

Data Pipeline

Bringing ArcLight online required some major rearchitecting of our pipeline to preview and publish archival data. Our archivists have been using ArchivesSpace for several years to manage the source data, exporting EAD2002 XML files when they are ready to be read by the public UI. Those parts remain the same for now; everything else, however, is new and improved.
Data pipeline diagram for Duke finding aids

Our new process involves two GitLab repositories: one for the EAD data, and another for the ArcLight-based application. The data repo uses GitLab Webhooks to send POST requests to the app to queue up reindexing  jobs automatically whenever the data changes.  We have a test/preview branch for the data that updates our dev and test servers for the application, so archivists can easily see what any revised or new finding aids will look like before they go live in production.
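To give a feel for the webhook step, here is a simplified Python sketch of how an application might decide which finding aids to reindex when it receives a push event. It relies on the `commits` list that GitLab includes in push event payloads, with `added` and `modified` file paths for each commit; the function name, the `.xml` filter, and the sample paths are assumptions, and our real implementation (in Ruby) differs in its details.

```python
from typing import Dict, List, Set


def finding_aids_to_reindex(payload: Dict) -> List[str]:
    """Collect the EAD files added or modified by the commits in a push."""
    changed: Set[str] = set()
    for commit in payload.get("commits", []):
        for key in ("added", "modified"):
            for path in commit.get(key, []):
                if path.endswith(".xml"):
                    changed.add(path)
    return sorted(changed)


# Hypothetical, trimmed-down push event payload.
payload = {
    "object_kind": "push",
    "ref": "refs/heads/main",
    "commits": [
        {"added": ["ead/example-a.xml"], "modified": [], "removed": []},
        {"added": [], "modified": ["ead/example-a.xml", "README.md"], "removed": []},
        {"added": [], "modified": ["ead/example-b.xml"], "removed": []},
    ],
}

for path in finding_aids_to_reindex(payload):
    print(f"queue reindex job for {path}")
```

Collecting the paths into a set means a finding aid touched by several commits in one push is queued only once. Removed files are deliberately left out of this sketch; deleting a finding aid from the index would need its own handling.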

We use GitLab CI/CD to easily and automatically deploy changes to the application code to the various servers. Each code change gets systematically checked for passing unit and feature tests, security, and code style before being integrated. We also aim to add automated accessibility testing to our pipeline within the next couple of months.

A lot of data gets crunched while indexing EAD documents through Traject into Solr. Our app uses Resque-based background job processing to handle the transactions. With about 4,000 finding aids, this creates around 900,000 Solr documents; the index is currently a little over 1GB. Changes to data get reindexed and reflected in the UI near-instantaneously. If we ever need to reindex every finding aid, it takes only about one hour to complete.
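Those round numbers imply some useful back-of-the-envelope averages, worked out below in Python. The inputs are the approximate figures quoted above, so the results are rough estimates rather than measurements.

```python
# Approximate figures from the paragraph above.
finding_aids = 4_000
solr_documents = 900_000
full_reindex_seconds = 60 * 60  # "about one hour"

# Average number of Solr documents produced per finding aid.
docs_per_finding_aid = solr_documents / finding_aids

# Average indexing throughput during a full reindex.
docs_per_second = solr_documents / full_reindex_seconds

print(docs_per_finding_aid)  # 225.0
print(docs_per_second)       # 250.0
```

In other words, a typical finding aid expands into a couple of hundred Solr documents, and a full reindex processes on the order of a few hundred documents per second.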

What We Have Learned

We have been live for just over four months, and we’re really ecstatic with how everything is going.

Usability

In September 2020, our Assessment & User Experience staff conducted ten usability tests using our ArcLight UI, with five experienced archival researchers and five novice users. Kudos to Joyce Chapman, Candice Wang, and Anh Nguyen for their excellent work. Their report is available here. The tests were conducted remotely over Zoom due to COVID restrictions. This was our first foray into remote usability testing.

Remote usability testing screenshot

Novice and advanced participants alike navigated the site fairly easily and understood the contextual elements in the UI. We’re quite pleased with how well our custom features performed (especially the context sidebar, contents lists, and redesigned breadcrumb trail). The Advanced Search modal got more use than we had anticipated, and it too was effective. We were also somewhat surprised to find that users were not confused by the All Collections vs. This Collection search scope selector when searching the site.

“The interface design does a pretty good job of funneling me to what I need to see… Most of the things I was looking for were in the first place or two I’d suspect they’d be.” — Representative quote from a test participant

A few improvements were recommended as a result of the testing:

  1. make container information clearer, especially within the requesting workflow
  2. improve visibility of the online access facet
  3. make the Show More links in the sidebar context nav clearer
  4. better delineate between collections and series in the breadcrumb
  5. replace jargon with clearer labels, especially “Indexed Terms”

We recently implemented changes to address 2, 3, and 5. We’re still considering options for 1 and 4. Usability testing has been an invaluable part of our development process. It’s a joy (and often a humbling experience!) to see your design work put through its paces with actual users in a usability test. It always helps us understand what we’re doing so we can make things better.

Usage

We want to learn more about how often different parts of the UI are used, so we implemented Google Analytics event tracking to anonymously log interactions. We use the Anonymize IP feature to help protect patron privacy.

Google Analytics Event Tracking
Top Google Analytics event categories & actions, Jul 1 – Nov 20, 2020.

Some observations so far:

  • The context nav sidebar is by far the most interacted-with part of the UI.
  • Browsing the Contents section of a component page (list of direct child components) is the second-most frequent interaction.
  • Subject, Collection, & Names are the most-used facets, in that order. That does not correlate with the order they appear in the sidebar.
  • Links presented in the Online Access banners were clicked 5x more often than the limiter in the Online Access facet (which matches what we found in usability testing)
  • Basic keyword searches happen 32x more frequently than advanced searches

Search Engine Optimization (SEO)

We want to be sure that when people search Google for terms that appear in our finding aids, they discover our resources. So when several Blacklight community members combined forces to create a Blacklight Dynamic Sitemaps gem this past year, it caught our eye. We found it super easy to set up, and it got the vast majority of our collection records Google-indexed within a month or so. We are interested in exploring ways to get it to include the component records in the sitemap as well.

Google Search Console showing index performance

 

Launching ArcLight: Retrospective

We’re pretty proud of how this all turned out. We have accomplished a lot in a relatively short amount of time. And the core software will only improve as the community grows.

At Duke, we already use Blacklight to power a bunch of different discovery applications in our portfolio. And given that the responsibility of supporting ArcLight falls to the same staff who support all of those other apps, it has been unquestionably beneficial for us to be able to work with familiar tooling.

We did encounter a few hurdles along the way, mostly because the software is so new and not yet widely adopted. There are still some rough edges that need to be smoothed out in the core software. Documentation is pretty sparse. We found indexing errors and had to adjust some rules. Relevancy ranking needed a lot of work. Not all of the EAD elements and attributes are accounted for; some things aren’t indexed or displayed in an optimal way.

Still, the pros outweigh the cons by far. With ArcLight, you get an extensible Blacklight-based core, but one tailored specifically to archival data. All the things Blacklight shines at (facets, keyword highlighting, autosuggest, bookmarks, APIs, etc.) are right at your fingertips. We have had a very good experience finding and using Blacklight plugins to add desired features.

Finally, while the ArcLight community is currently small, the larger Blacklight community is not. There is so much amazing work happening out in the Blacklight community–so much positive energy! You can bet it will eventually pay dividends toward making ArcLight an even better solution for archival discovery down the road.

Acknowledgments

Many thanks go out to our Duke staff members who contributed to getting this project completed successfully. Especially:

  • Product Owner: Noah Huffman
  • Developers/DevOps: Sean Aery, David Chandek-Stark, Michael Daul, Cory Lown (scrum master)
  • Project Sponsors: Will Sexton & Meghan Lyon
  • Redesign Team: Noah Huffman (chair), Joyce Chapman, Maggie Dickson, Val Gillispie, Brooke Guthrie, Tracy Jackson, Meghan Lyon, Sara Seten Berghausen

And thank you as well to the Stanford University Libraries staff for spearheading the ArcLight project.


This post was updated on 1/7/21, adding the embedded video recording of the Oct 2020 Blacklight Summit ArcLight presentation.

Access for One, Access for All: DPC’s Approach towards Folder Level Digitization

Earlier this year and prior to the pandemic, Digital Production Center (DPC) staff piloted an alternative approach to digitize patron requests with the Rubenstein Library’s Research Services (RLRS) team. The previous approach was focused on digitizing specific items that instruction librarians and patrons requested, and these items were delivered directly to that person. The alternative strategy, the Folder Level digitization approach, involves digitizing the contents of the entire folder that the item is contained in, ingesting these materials to the Duke Digital Repository (to enable Duke Library staff to retrieve these items), and when possible, publishing these materials so that they are available to anyone with internet access. This soft launch prepared us for what is now an all-hands-on-deck-but-in-a-socially-distant-manner digitization workflow.

Giao Luong Baker assessing folders in the DPC.

Since returning to campus for onsite digitization in late June, the DPC’s primary focus has been to perfect and ramp up this new workflow. It is important to note that the term “folder” in this case is more of a concept and that its contents and their conditions vary widely. Some folders may have 2 pages; other folders have over 300 pages. Some folders consist of pamphlets, notebooks, maps, papyri, and bound items. All this to say that a “folder” is a relatively loose term.

Like many initiatives at Duke Libraries, Folder Level Digitization is not just a DPC operation, it is a collaborative effort. This effort includes RLRS working with instructors and patrons to identify and retrieve the materials. RLRS also works with Rubenstein Library Technical Services (RLTS) to create starter digitization guides, which are the building blocks for our digitization guide. Lastly, RLRS vets the materials and determines their level of access. When necessary, Duke Library’s Conservation team steps in to prepare materials for digitization. After the materials are digitized, ingest and metadata work by the Digital Collections and Curation Services as well as the RLTS teams ensure that the materials are preserved and available in our systems.

Kristin Phelps captures a color target.

Doing this work in the midst of a pandemic requires that DPC work closely with the Rubenstein Library Access Services Reproduction Team (a section of RLRS) to track our workflow using a Google Doc. We track the point where the materials are identified by RLRS, through multiple quarantine periods, scanning, post processing, file delivery, to ingest. Also, DPC staff are digitizing in a manner that is consistent with COVID-19 guidelines. Materials are quarantined before and after they arrive at the DPC, machines and workspaces are cleaned before and after use, capture is done in separate rooms, and quality control is done off site with specialized calibrated monitors.

Since we started Folder Level digitization, the DPC has received close to 200 unique Instruction and Patron requests from RLRS. As of the publication of this post, 207 individual folders (an individual request may contain several folders) have been digitized. In total, we’ve scanned and quality controlled over 26,000 images since we returned to campus!

We hope that digitizing entire folders will increase access to the materials without risking damage through physical handling. So far we anticipate that 80 new digital collections will be ingested to the Duke Digital Repository. This number will only grow as we receive more requests. Folder Level Digitization is an exciting approach towards digital collection development, as it is directly responsive to instruction and researcher needs. With this approach, it is access for one, access for all!

Notes from the Duke University Libraries Digital Projects Team