Indexing variant names from the Library of Congress Name Authority File (LCNAF) in TRLN Discovery

You might or might not have noticed a TRLN Discovery feature announcement in the February TRLN News Roundup. It mentioned that we are now indexing variant names from the Library of Congress Name Authority File in TRLN Discovery. I thought in this post I would expand on what this change means for Duke’s Books & Media catalog, add some details about the technical implementation, and discuss some related features we might add in the future based on this work.

What is it?

First, the practical matter: what does this feature mean for people who search the catalog? Our catalog records contain authoritative forms of creator names. An authoritative form is the specific form of a person’s name chosen by the Library of Congress. For example, the authoritative form of Emily Dickinson’s name is “Dickinson, Emily, 1830-1886.” If you search the Books & Media catalog using this form of the poet’s name, you will find all records associated with her name (example search with the authoritative name). Previously, if you had searched the catalog and added the poet’s middle name, “Elizabeth,” you would likely have missed many relevant results because “Elizabeth” is not included in the authoritative form of the name. It is, however, included in one of the variant names in the LC Name Authority File. The full list of variant names for Emily Dickinson is:

  • Dickinson, Emilia, 1830-1886
  • Dickinson, Emily Elizabeth, 1830-1886
  • Dickinson, Emily (Emily Elizabeth), 1830-1886
  • Dikinson, Ėmili, 1830-1886
  • D̲ikinson, Emily, 1830-1886
  • Ti-chin-sen, Ai-mi-li, 1830-1886
  • דיקינסון, אמילי, 1830־1886
  • דיקינסון, אמילי, 1886־1830
  • Dykinsan, Ėmili, 1830-1886

Emily Dickinson

Since we are now indexing these forms in TRLN Discovery, you get much better results if you happen to add Emily Dickinson’s middle name to your search (example search including a variant form of the name). Additionally, various romanizations and vernacular forms are indexed (example search for “דיקינסון, אמילי”).

If you clicked through to the example searches you may have noticed that the result counts and result order are slightly different when searching the authoritative form vs. the variant forms.

The difference in result counts arises because variant forms are indexed only on records that include a URI referencing the LC Name Authority File. If this URI reference is missing, the variant names are not indexed for that record. Additionally, some records may not have been updated since we implemented this feature. In time, all records that include URIs for names will have variant names indexed.

The difference in result order is due to how the variant names are indexed. For the authoritative form of the name, we distinguish between creators, editors, contributors, etc. and give matches in these categories different boosts in the relevance ranking. At the moment, the variant names from the LCNAF are indexed in a single field, so we lose the nuance needed for more granular relevance ranking. This is something that could be revised in the future if needed.

How does it work?

This feature relies on the fact that our MARC records include URI references to the LC Name Authority File. As an example, here’s a MARC XML 100 Main Entry-Personal Name field for Emily Dickinson with a URI reference to the authority file.

<datafield tag="100" ind1="1" ind2=" ">
<subfield code="a">Dickinson, Emily,</subfield>
<subfield code="d">1830-1886.</subfield>
<subfield code="0">http://id.loc.gov/authorities/names/n79054166</subfield>
</datafield>
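
To make the URI reference concrete, here is a minimal Python sketch of pulling the authority URI out of a field like the one above. It is an illustration only, not the TRLN Discovery ingest code: the function name `lcnaf_uris` and the prefix check are assumptions, and real MARC XML usually carries a namespace that this un-namespaced sample does not.

```python
import xml.etree.ElementTree as ET

# The 100 field from the example above, without a MARC XML namespace.
MARC_FIELD = """<datafield tag="100" ind1="1" ind2=" ">
<subfield code="a">Dickinson, Emily,</subfield>
<subfield code="d">1830-1886.</subfield>
<subfield code="0">http://id.loc.gov/authorities/names/n79054166</subfield>
</datafield>"""

# Assumption: only subfield 0 values with this prefix point at the LCNAF.
LCNAF_PREFIX = "http://id.loc.gov/authorities/names/"


def lcnaf_uris(datafield_xml):
    """Return the LCNAF URIs found in subfield 0 of a single datafield."""
    field = ET.fromstring(datafield_xml)
    uris = []
    for subfield in field.findall("subfield[@code='0']"):
        value = (subfield.text or "").strip()
        if value.startswith(LCNAF_PREFIX):
            uris.append(value)
    return uris


print(lcnaf_uris(MARC_FIELD))
```

A field with no subfield 0 simply yields an empty list, which matches the behavior described earlier: without a URI reference, no variant names can be looked up for that record.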

We store this URI reference in the TRLN Discovery name field and then use it at ingest time to look up and index the variant names from a local cache. Here’s the stored name for Emily Dickinson in the TRLN Discovery index.

names_a: ["{\"name\":\"Dickinson, Emily, 1830-1886\",\"rel\":\"author\",\"type\":\"creator\",\"id\":\"http://id.loc.gov/authorities/names/n79054166\"}"]

The TRLN Discovery ingest service keeps its own cache of the name identifiers and variant names for efficient lookup at ingest time. We use Redis, an open-source, in-memory (very fast) data store to make the variant names available when records are ingested. This local cache is built from the LC Name Authority File. Since the name authority file changes over time we will refresh our local cache of the data every 3 months to keep it up to date. We’ve written a script (Rails Rake task) that automates this update process.
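
The lookup step can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual ingest service: a plain Python dict stands in for the Redis cache (the real key layout is not described here), the function name `variant_names_for` is hypothetical, and the cache holds only two of the variant names listed earlier.

```python
import json

# Stand-in for the Redis cache: authority URI -> list of variant names.
# Assumption: the real cache is keyed in some comparable way.
VARIANT_CACHE = {
    "http://id.loc.gov/authorities/names/n79054166": [
        "Dickinson, Emilia, 1830-1886",
        "Dickinson, Emily Elizabeth, 1830-1886",
    ],
}

# A stored name entry shaped like the names_a example above (a JSON string).
NAMES_A = [
    '{"name":"Dickinson, Emily, 1830-1886","rel":"author",'
    '"type":"creator","id":"http://id.loc.gov/authorities/names/n79054166"}'
]


def variant_names_for(stored_names, cache):
    """Collect variant names for every stored name that carries an authority URI."""
    variants = []
    for raw in stored_names:
        entry = json.loads(raw)
        uri = entry.get("id")
        if uri:
            # Names whose URI is not in the cache contribute nothing.
            variants.extend(cache.get(uri, []))
    return variants


print(variant_names_for(NAMES_A, VARIANT_CACHE))
```

Names stored without an `id` are skipped, so a record lacking the URI reference gets no variant names indexed.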

What’s next?

The addition of stored name authority URIs in the TRLN Discovery index opens up opportunities to add more features in the future. I’m especially interested in displaying more contextual information about creators in our catalog. We could also expose “See also” references from the authority files to make it easier to find works by the same person published under different names (“Twain, Mark, 1835-1910” being a good example):

  • Clemens, Samuel Langhorne, 1835-1910
  • Conte, Louis de, 1835-1910
  • Snodgrass, Quintus Curtius, 1835-1910

Mark Twain

As always, we continue to add features and make incremental improvements to TRLN Discovery, and your feedback is critical. Please let us know how things are working for you using the feedback form available on every page of the Books & Media Catalog.

Library study space design: Intentional, inclusive, flexible

In the Assessment & User Experience department, one of our ongoing tasks is to gather and review patron feedback in order to identify problems and suggest improvements. While the libraries offer a wide variety of services to our patrons, one of the biggest and trickiest areas to get right is the design of our physical spaces. Typically inhabited by students, our library study spaces come in a variety of sizes and shapes and are distributed somewhat haphazardly throughout our buildings. How can we design our study spaces to meet the needs of our patrons? When we have study spaces with different features, how can we let our patrons know about them?

These questions and the need for a deeper assessment of library study space design inspired the formation of a small team – the Spaces With Intentional Furniture Team (or SWIFT). This team was charged with identifying best practices in study space furniture arrangement, as well as making recommendations on opportunities for improvements to existing spaces and outreach efforts. The team reviewed and summarized relevant literature on library study space design in a report (public version now available). In this post, we will share a few of the most surprising and valuable suggestions from our literature review.

Increase privacy in large, open spaces

Some of the floors in our library buildings have large, open study spaces that can accommodate a large number of patrons. Because study space is limited, we are highly motivated to make the most of the space we have. The way a space is designed, however, influences how comfortable patrons feel spending a lot of time in the space.

With large open spaces, the topic of privacy came up across several different studies. In this context, privacy relates both to visibility in a space and to the ability to make noise without being overheard. Even when policies allow for noise in a space, a lack of privacy can make students nervous to go ahead and be noisy. For spaces where silence is the norm, a lack of privacy can make patrons feel on display and especially nervous about any movements or sound they might make.

The literature suggests that there are ways to improve privacy in open spaces. For group spaces, placing dividers or partitions between group table arrangements may both offer privacy and provide useful amenities, like writeable surfaces. For quiet spaces, privacy can be improved by varying the type and height of furniture and by turning furniture in different directions so individuals are not facing each other. Seating density should also be restricted in quiet spaces.

Isolate noisy zones from quiet zones

Controlling noise is a common topic in the literature. Libraries are some of the only spaces on campus that offer a quiet study environment, but the need for quiet spaces must be balanced with the need to engage in the increasingly collaborative work required by modern classes. Libraries are often in central locations on campus and offer prime real estate for groups to meet in between or after classes. How can libraries provide enough quiet space for people who need to work without distractions while still accommodating group work and socializing?

One strategy is to make sure that people feel comfortable with making noise in spaces where it is encouraged. Libraries can position noisy spaces to take advantage of other sources of noise to provide some noise “cover” – for example, a staff service desk, copy machines, elevators, and meeting rooms. Quiet spaces should be isolated from these sources of noise, perhaps by placing them on separate floors. Stacks can also help separate spaces, as books provide some sound absorption, and the visual obstruction reduces visual distractions for students studying quietly.

Reservable private study rooms meet several needs

Sometimes, enforcing noise policies to keep spaces quiet only solves part of the problem. Quiet study spaces reduce distractions caused by noise, but students can be sensitive to other kinds of distractions – visual distractions, strong or chemical smells, etc. For students needing spaces completely free of distractions, libraries might consider creating reservable rooms available for individual study.

This kind of service is useful for more than low-distraction study needs. Making exceptions for pandemics, libraries often employ a first-come, first-served approach to seats in study spaces. Patrons with mobility issues or limited time to study would benefit greatly from being able to reserve a study space in advance. Identifying reservable study spaces for individuals, either within a larger study space or as part of a set of reservable private rooms, might meet a variety of currently unmet needs.

Physical spaces need web presences

As SWIFT begins to think about recommendations, we know we have to address our outreach around spaces. Patrons currently have few options for learning about our spaces. We have some signage in our buildings to identify different noise policies, and we have a few websites that give a basic overview of the spaces, but patrons are often reduced to simply performing exhaustive circuits around the buildings to discover all that we have available. More likely, students find a few of our spaces either by chance or by word of mouth, and if those spaces don’t meet their needs, they may not return.

One detailed review (Brunskill, 2020) offers very explicit guidance on the design of websites to support patrons with disabilities. As is commonly true, improvements that support one group of patrons often improve services for all patrons. Prominently sharing the following information about physical spaces will better support all patrons looking to find their space in the libraries:

  • details about navigating physical spaces (maps, floorplans, photos)
  • sensory information for spaces (noise, privacy, lighting, chemical sensitivity)
  • physical building accessibility
  • parking/transportation information
  • disability services contact (with name, contact form)
  • assistive technologies hardware and equipment
  • any accessibility problems with spaces

Next steps

Throughout our literature review, we saw the same advice over and over again: patrons need variety. There is no one-size-fits-all solution to patron needs. Luckily, at Duke we have several library buildings and many, many study spaces. With some careful planning, we should be able to take an intentional approach to our space design in order to better accommodate the needs of our patrons. The libraries have new groups tasked with acting on these and related recommendations, and while it may take some time, our goal is to create a shared understanding of the best practices for library study space design.

Data Sharing and Equity: Sabrina McCutchan, Data Architect

This post is part of the Research Data Curation Team’s ‘Researcher Highlight’ series.

Equity in Collaboration

The landscape of research and data is enterprising, expansive and diverse. This dynamic is notably visible in the work done at Duke Global Health Institute (DGHI). Collaboration with international partners inherently comes with many challenges. In a conversation with the Duke Research Data Curation team, Sabrina McCutchan of the Research Design and Analysis Core (RDAC) at DGHI shares her thoughts on why data sharing and access is critical to global health research.

Questions of equity must be addressed when discussing research data and scholarship on a global scale. For DGHI, data equity is a priority. International research partners deserve equal access to primary data to better understand what’s happening in their communities, contribute to policy initiatives that support their populations, and support their own professional advancement by publishing in research and medical journals.

“We work with so many different countries, people groups, and populations around the world that often themselves don’t have access to the same infrastructure, technologies or training in data. It can be challenging to collect quality primary data on their own, but becomes a little easier in partnership with a big research institution like Duke.”

Collaborations like the Adolescent Mental Health in Africa Network Initiative (AMANI) demonstrate the significance of data sharing. AMANI is led by Dr. Dorothy Dow of DGHI, Dr. Lukoye Atwoli of Moi University School of Medicine, and Dr. Sylvia Kaaya of Muhimbili University of Health and Allied Sciences (MUHAS) and involves participating researchers from academic and medical institutions in South Africa, Kenya, and Tanzania.

Why Share Data?

As a Data Architect, Sabrina is available to support DGHI in achieving their data sharing goals. She takes a holistic approach to identifying areas where the team needs data support, considering at each stage of the project lifecycle how system design and data architecture will influence how data can be shared. This may entail drafting informed consent documents, developing strategies for de-identification, curating and managing data, or discovering solutions for data storage and publishing. For instance, in collaboration with CDVS Research Data Management Consultants, Sabrina has helped AMANI create a Dataverse to enable sharing restricted access health data for international junior researchers. Data from one of DGHI’s studies are also available in the Duke Research Data Repository.

“All of these components are interconnected to each other. You really need to think about what are going to be the impacts of a decision made early in the process of gathering data for this study further downstream when we’re analyzing that data and publishing findings from it.”

Reproducibility is another reason that sharing and publishing data is important to Sabrina. DGHI wants to increase data availability in accordance with FAIR principles so other researchers can independently verify, reproduce, and iterate on their work. This supports peers and contributes to the advancement of the field. Publishing data in an open repository can also increase their reach and impact. DGHI is also currently examining how to incorporate the CARE principles and other frameworks for ethical data sharing within their international collaborations.

Global collaborations in research are vital in these times. Sabrina advises that it’s important for researchers, especially Principal Investigators, to think holistically about research projects. For example, they should think about data sharing at the very beginning of the project and write consent forms that support what they hope to do with the data. Equitable practices paired with data sharing create opportunities for greater discovery and progress in research.

 

What does it mean to be an actively antiracist developer?

The library has been committed to Diversity, Equity, and Inclusion for more than a year now, specifically through the work of DivE-In and the Anti-Racist Roadmap. To that end, the Digital Strategies and Technology department, where I work, has also been focusing on these issues. So lately I’ve been thinking a lot about how, as a web developer, I can be actively antiracist in my work.

First, some context. As a cis-gendered white male who is gainfully employed and resides in one of the best places to live in the country, I am soaking in privilege. So take everything I have to say with that large grain of salt. My first job out of college was working at a tech startup that was founded and run by a black person. To my memory, the overall makeup of the staff was something like 40–50% BIPOC, so my introduction to the professional IT world was that it was normal to see people who were different than me. However, in subsequent jobs my coworker pool has been much less diverse and more representative of the industry in general, which is to say very white and very male, which I think is a problem. So how can an industry that lacks diversity actively work on promoting the importance of diversity? How can we push back against systemic racism and oppression when we benefit from those very systems? I don’t think there are any easy answers.

Antiracist Baby by Ibram X. Kendi

I think it’s important to recognize that for organizations driven by top-down decision making, sweeping change needs to come from above. To quote one of my favorite bedtime stories, “Point at policies as the problem, not people. There’s nothing wrong with the people!” But that doesn’t excuse ‘the people’ from doing the hard work that can lead to profound change. I believe an important first step is to acknowledge your own implicit bias (if you are able, attend Duke IT’s Implicit Bias in the Workplace Training). Confronting these issues is an uncomfortable process, but I think ultimately that’s a good thing. And at least for me, I think doing this work is an ongoing process. I don’t think my implicit biases will ever truly go away, so it’s up to me to constantly be on the lookout for them and to broaden my horizons and experiences.

So in addition to working on our internalized biases, I think we can also work on how we communicate with each other as coworkers. In a recent DST-wide meeting concerning racial equity at DUL, the group I was in talked a lot about interpersonal communication. We should recognize that we all have blind spots and patterns that we slip into, like being overly jargony, being terse and/or confrontational, and so on. We have the power to change these patterns. I think we also need to be thoughtful of the language we use and the words that we speak. We need to appreciate diversity of backgrounds and be mindful of the mental taxation of code switching. We can try to help each other feel more comfortable in our own skin and feel safe expressing our thoughts and ideas. I think it’s profoundly important to meet people from a place of empathy and mutual respect. And we should not pass up the opportunities to have difficult conversations with each other. If I say something loaded with a microaggression and make a colleague feel uncomfortable or slighted, I want to be called out. I want to learn from my mistakes, and I would think that’s true for all of my coworkers.

Axe-con is an open and inclusive digital accessibility conference

We can also incorporate anti-racist practices into the things we create. Throughout my career, I’ve tried to always promote the benefits of building accessible interfaces that follow the practices of universal design. Building things with accessibility in mind is good for everyone, not just those who make use of assistive technologies. And as an aside, axe-con 2021 was packed full of great presentations, and recordings are available for free. We can take small steps like removing problematic language from our workflows (“master” branches are now “main”). But I think and hope we can do more. One area where I think we have an opportunity to be more proactive is assessing our projects and tools to see to what degree (if at all) we seek out feedback and input from BIPOC staff and patrons. How can we make sure their voices are represented in what we create?

I don’t have many good answers, but I will keep listening, and learning, and growing.

An Intern’s Investigation on Decolonizing Archival Descriptions and Legacy Metadata

This post was written by Laurier Cress. Laurier Cress is a graduate student at the University of Denver studying Library Science with an emphasis on digital collections, rare books and manuscripts, and social justice in librarianship and archives. In addition to LIS topics, she is also interested in Medieval and Early Modern European History. Laurier worked as a practicum intern with the Digital Collections and Curation Services Department this winter to investigate auditing practices for decolonizing archival descriptions and metadata. Laurier will complete her master’s degree in the fall of 2021. In her spare time, she also runs a YouTube channel called Old Dirty History, where she discusses historic events, people, and places.

Now that diversity, equity, and inclusion (DEI) are popular concerns for libraries throughout the United States, discussions on DEI are inescapable. These three words have become recurring buzzwords dropped in meetings, classroom lectures, class syllabi, presentations, and workshops across the LIS landscape. While in some contexts, topics in DEI are thrown around with no sincere intent or value behind them, some institutions are taking steps to give meaning to DEI in librarianship. As an African American MLIS student at the University of Denver, I can say I have listened to one too many superficial talks on why DEI is important in our field. These conversations customarily exclude any examples of what DEI work actually looks like. When Duke Libraries advertised a practicum opportunity devoted to hands-on experience exploring auditing practices for legacy metadata and harmful archival descriptions, I was immediately sold. I saw this experience as an opportunity to learn what scholars in our field are actually doing to make libraries a more equitable and diverse place.

As a practicum intern in Duke Libraries’ Digital Collections and Curation Services (DCCS) department, I spent three months exploring frameworks for auditing legacy metadata against DEI values and investigating harmful language statements for the department. Part of this work also included applying what I learned to Duke’s collections. Duke’s digital collections boast 131,169 items and 997 collections, across 1,000 years of history from all over the world. Many of the collections represent a diverse array of communities that contribute to the preservation of a variety of cultural identities. It is the responsibility of institutions with cultural heritage holdings to present, catalog, and preserve their collections in a manner that accurately and respectfully portrays the communities depicted within them. However, many institutions housing cultural heritage collections use antiquated archival descriptions and legacy metadata that should be revisited to better reflect 21st century language and ideologies. It is my hope that this brief overview on decolonizing archival collections aids not only Duke, but other institutions as well.

Harmful Language Statement Investigation

During the first phase of my investigation, I conducted an analysis of harmful language statements across several educational institutions throughout the United States. This analysis served as a launchpad for investigating how Duke can improve upon their inclusive description statement for their digital collections. During my investigation, I created a list that comprises 41 harmful language statements. Institutions on the list include:

  • The Walters Museum of Art
  • Princeton University
  • University of Denver
  • Stanford University
  • Yale University

After gathering a list of institutions with harmful language statements, the next phase of my investigation was to conduct a comparative analysis to uncover what they had in common and how they differed. For this analysis, 12 harmful language statements were selected at random from the total list. From this investigation, I created the Harmful Statement Research Log to record my findings. The research log comprises two tabs. The first tab includes a list of harmful statements from 12 institutions, with supplemental comments and information about each statement. The second tab provides a list of 15 observations deduced from cross-examining the 12 harmful language statements. Some observations made include placement, length, historical context, and Library of Congress Subject Heading (LCSH) disclaimers. It is important for me to note that, while some of the information provided within the research log is based on pure observation, much of the report also includes conclusions based on personal opinions born from my own perspective as a user.

Decolonizing Archival Descriptions & Legacy Metadata

The next phase in my research was to investigate frameworks and current sentiments on decolonizing archival description and legacy metadata for Duke’s digital collections. Due to the limited amount of research on this subject, most of the information I came across was related to decolonizing collections describing Indigenous peoples in Canada and African American communities. I found that the influence of late 19th- and early 20th-century library classification systems can still be found within archival descriptions and metadata in contemporary library collections. The use of dated language within library and archival collections encourages the inequality of underrepresented groups through the promotion of discriminatory infrastructures established by these earlier classification systems. In many cases, offensive archival descriptions are sourced from donors and creators. While it is important for information institutions to preserve the historical context of records within their collections, descriptions written by creators should be contextualized to help users better understand the racial connotation surrounding the record. Issues regarding contextualizing racist ideologies from the past can be found throughout Duke’s digital collections.

During my investigation, I examined Duke’s MARC records from the collection level to locate examples of harmful language used within their descriptions. The first harmful archival description I encountered was from the Alfred Boyd Papers. The archival description describes a girl referenced within the papers as “a free mulatto girl”. This is an example of when archival description should not shy away from the realities of racist language used during the period the collection was created in; however, context should be applied. “Mulatto” was an offensive term used during the era of slavery in the United States to refer to people of African and White European ancestry. It originates from the Spanish word “mulato”, and its literal meaning is “young mule”. While this word is used to describe the girl within the papers, it should not be used to describe the person within the archival description without historical context.

Screenshot of metadata from the Alfred Boyd papers

When describing materials concerning marginalized peoples, it is important to preserve creator-sourced descriptions, while also contextualizing them. To accomplish this, there should be a defined distinction between descriptions from the creator and the institution’s archivists. Some institutions, like The Morgan Library and Museum, use quotation marks as part of their in-house archival description procedure to differentiate between language originating from collectors or dealers versus their archivists. It is important to preserve contextual information, when racism is at the core of the material being described, in order for users to better understand the collection’s historic significance. While this type of language can bring about feelings of discomfort, it is also important to not allow your desire for comfort to take precedence over conveying histories of oppression and power dynamics. Placing context over personal comfort also takes the form of describing relationships of power and acts of violence just as they are. Acts of racism, colonization, and white supremacy should be labeled as such. For example, Duke’s Stephen Duvall Doar Correspondence collection describes the act of “hiring” enslaved people during the Civil War. Slavery does not imply hired labor because hiring implies some form of compensation. Slavery can only equate to forced labor and should be described as such.

Several academic institutions have taken steps to decolonize their collections. At the beginning of my investigation, a mentor of mine referred me to the University of Alberta Library’s (UAL) Head of Metadata Strategies, Sharon Farnel. Farnel and her colleagues have done extensive work on decolonizing UAL’s holdings related to Indigenous communities. The university declared a call to action to protect the representation of Indigenous groups and to build relationships with other institutions and Indigenous communities. Although UAL’s call to action encompasses more than decolonizing their collections, for the sake of this article, I will focus solely on the framework they established to decolonize their archival descriptions.

Community Engagement is Not Optional

Farnel and her colleagues created a team called the Decolonizing Description Working Group (DDWG). Their purpose was to propose a plan of action on how descriptive metadata practices could more accurately and respectfully represent Indigenous peoples. The DDWG included a Metadata Coordinator, a Cataloguer, a Public Service Librarian, a Coordinator of Indigenous Initiatives, and a self-identified Indigenous MLIS Intern. Much of their work consisted of consulting with the community and collaborating with other institutions. When I reached out to Farnel, she was so kind and generous with sharing her experience as part of the DDWG. Farnel told me that the community engagement approach taken is dependent on the community. Marginalized peoples are not a monolith; therefore, there is no “one size fits all” solution. If you are going to consult community members, recognize the time and expertise the community provides. This relationship has to be mutually beneficial, with the community’s needs and requests at the forefront at all times.

For the DDWG, the best course of action was to start building a relationship with local Indigenous communities. Before engaging with the entire community, the team first engaged with community elders to learn how to proceed with consulting the community from a place of respect. Because the DDWG’s work took place prior to COVID-19, most meetings with the community took place in person. Farnel refers to these meetings as “knowledge gathering events”. Food and beverages were provided, along with a safe space for open conversation. A community elder would start the session to set the tone.

In addition to knowledge gathering events, Aboriginal and non-Aboriginal students and alumni were consulted through an informal short online survey. The survey was advertised through an informal social media posting. Once the participants confirmed the desire to partake in the survey, they received an email with a link to complete it. Participants were asked questions based on their feelings and reactions to potentially changing the Library of Congress Subject Headings (LCSH) that related to Aboriginal content.

Auditing Legacy Metadata and Archival Descriptions

There is more than one approach an institution can take to start auditing legacy metadata and descriptions. In a case study written by Dorothy Berry, who is currently the Digital Collections Program Manager at Harvard’s Houghton Library, she describes a digitization project that took place at the University of Minnesota Libraries. The purpose of the project was not only to digitize African American heritage materials within the university’s holdings, but also to explore ways mass digitization projects can help re-aggregate marginalized materials. This case study serves as an example of how collections can be audited for legacy metadata and archival descriptions during mass digitization projects. Granted, this specific project received funding to support such an undertaking, and not all institutions have the funding required to take on an initiative of this magnitude. However, this type of work can be done slowly over a longer period of time. Simply running a report to search for offensive terms such as “negro”, or in my case “mulatto”, is a good place to start. Be open to having discussions with staff to learn what offensive language they have also come across. Self-reflection and research are equally important. Princeton University Library’s inclusive description working group spent two years researching and gathering data on their collections before implementing any changes. Part of their auditing process also included using an XQuery script to locate harmful descriptions and recover histories that were marginalized due to lackluster description.
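As a rough illustration of what “running a report” might look like, here is a minimal Python sketch. It is not the method used at Minnesota or Princeton (Princeton’s group used an XQuery script). The record structure, the field names (`id`, `title`, `description`), and the sample records are hypothetical, and the term list contains only the two terms mentioned above. A real audit list would be built with staff and community input.

```python
import re

# Terms named in the paragraph above. This list is illustrative only.
FLAGGED_TERMS = ["negro", "mulatto"]


def find_flagged_terms(records, terms=FLAGGED_TERMS):
    """Return (record id, field name, matched term) for each whole-word,
    case-insensitive match found in any string field of a record."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    hits = []
    for record in records:
        for field, value in record.items():
            # Skip the identifier and any non-text fields.
            if field == "id" or not isinstance(value, str):
                continue
            for match in pattern.finditer(value):
                hits.append((record["id"], field, match.group(1).lower()))
    return hits


if __name__ == "__main__":
    # Hypothetical legacy records, invented for this example.
    sample_records = [
        {"id": "rec-001", "title": "Portrait of a mulatto woman",
         "description": "Studio portrait, circa 1900."},
        {"id": "rec-002", "title": "Street scene",
         "description": "Negro vendors at a market."},
        {"id": "rec-003", "title": "Landscape",
         "description": "River valley at dusk."},
    ]
    for record_id, field, term in find_flagged_terms(sample_records):
        print(f"{record_id}: '{term}' found in {field}")
```

Because the match is whole-word only, plurals and spelling variants would need to be added to the list explicitly. A report like this only surfaces candidates; deciding what to change still calls for the human review and community consultation described above.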

Creators Over Community = Problematic

While exploring Duke’s digital collections, one problem that stood out to me the most was the perpetual valorization of creators. This is often found in collections with creators who are white men. Adjectives like “renowned”, “genius”, “talented”, and “preeminent” are used to praise the creators and make the collection more about them instead of the community depicted within the collection. An example of this troublesome language can be found in Duke’s Sidney D. Gamble’s Photographs collection. This collection comprises over 5,000 black and white photographs taken by Sidney D. Gamble during his four visits to China from 1908 to 1932. Content within the photographs encompasses depictions of people, architecture, livestock, landscapes, and more. Very little emphasis is placed on the community represented within this collection. Little, if any, historical or cultural context is given to help educate users on the culture behind the collection. And the predominant language used here is English. However, there is a full page of information on the life and exploits of Gamble.

Screenshot of a description of the Sidney Gamble digital collection.

Describing Communities

Harmful language used to describe individuals represented within digital collections can be found everywhere. This is not always intentional. Dorothy Berry’s presentation with the Sunshine State Digital Network on conscious editing serves as a great source of knowledge on problematic descriptions that can be easily overlooked. Some of Berry’s examples include:

  • Class: Examples include using descriptions such as “poor family” or “below the poverty line”.
  • Race & Ethnicity: Examples include using dehumanizing vocabulary to describe someone of a specific ethnicity or excluding describing someone of a specific race within an image.
  • Gender: Example includes referring to a woman using her husband’s full name (Mrs. John Doe) instead of her own.
  • Ability: Example includes using offensive language like “cripple” to describe disabled individuals.

This is only a handful of problematic description examples from Berry’s presentation. I highly recommend watching not only Berry’s presentation, but the entire Introduction to Conscious Editing Series.

Library of Congress Subject Headings (LCSH) Are Unavoidable

I could talk about LCSH in relation to decolonizing archival descriptions for days on end, but for the sake of wrapping up this post I won’t. In a perfect world we would stop using LCSH altogether. Unfortunately, this is impossible. Many institutions use custom made subject headings to promote their collections respectfully and appropriately. However, the problem with using custom made subject headings that are more culturally relevant and respectful is accessibility. If no one is using your custom-made subject headings when conducting a search, users and aggregators won’t find the information. This defeats the purpose of decolonizing archival collections, which is to make collections that represent marginalized communities more accessible.

What we can do is be as cognizant as possible of the LCSH we are using and avoid harmful subject headings as much as possible. If you are uncertain whether an LCSH is harmful, conduct research or consult with communities who desire to be part of your quest to remove harmful language from your collections. Let your users know why you are limited to subject headings that may be harmful and that you recognize the issue this presents to the communities you serve. Also consider collaborating with Cataloginglab.org to help design new LCSH proposals and to stay abreast of new LCSH that better reflect DEI values. There are also some alternative thesauri, like homosaurus.org and Xwi7xwa Subject Headings, that better describe underrepresented communities.

Resources

In support of Duke Libraries’ intent to decolonize their digital collections, I created a Google Drive folder that includes all the fantastic resources I included in my research on this subject. Some of these resources include metadata auditing practices from other institutions, recommendations on how to include communities in archival description, and frameworks for decolonizing their descriptions.

While this short overview provides a wealth of information gathered from many scholars, associations, and institutions who have worked hard to make libraries a better place for all people, I encourage anyone reading this to continue reading literature on this topic. This overview does not come close to covering half of what invested scholars and institutions have contributed to this work. I do hope it encourages librarians, catalogers, and metadata architects to take a closer look at their collections.

FFV1: The Gains of Lossless

One of the greatest challenges to digitizing analog moving-image sources such as videotape and film reels isn’t the actual digitization. It’s the enormous file sizes that result, and the high costs associated with storing and maintaining those files for long-term preservation. For many years, Duke Libraries has generated 10-bit uncompressed preservation master files when digitizing our vast inventory of analog videotapes.

Unfortunately, one hour of uncompressed video can produce a 100 gigabyte file. That’s at least 50 times larger than an audio preservation file of the same duration, and about 1000 times larger than most still image preservation files. That’s a lot of data, and as we digitize more and more moving-image material over time, the long-term storage costs for these files add up quickly.

To help offset this challenge, Duke Libraries has recently implemented the FFV1 video codec as its primary format for moving image preservation. FFV1 was first created as part of the open-source FFmpeg software project, and has been developed, updated and improved by various contributors in the Association of Moving Image Archivists (AMIA) community.

FFV1 enables lossless compression of moving-image content. Just like uncompressed video, FFV1 delivers the highest possible image resolution, color quality and sharpness, while avoiding the motion compensation and compression artifacts that can occur with “lossy” compression. Yet, FFV1 produces a file that is, on average, 1/3 the size of its uncompressed counterpart.

FFV1 produces a file that is, on average, 1/3 the size of its uncompressed counterpart. Yet, the audio & video content is identical, thanks to lossless compression.
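To make the savings concrete, here is a small back-of-the-envelope sketch in Python. The two constants come from the figures quoted above (about 100 GB per hour of uncompressed video, and FFV1 files averaging one third of that size). The 90-hour collection is a hypothetical example, and real ratios will vary with the content.

```python
# Figures quoted in the post; actual sizes vary by source material.
UNCOMPRESSED_GB_PER_HOUR = 100
FFV1_RATIO = 1 / 3


def estimate_storage_gb(hours, ffv1_ratio=FFV1_RATIO):
    """Return (uncompressed_gb, ffv1_gb) for a given number of video hours."""
    uncompressed_gb = hours * UNCOMPRESSED_GB_PER_HOUR
    ffv1_gb = uncompressed_gb * ffv1_ratio
    return uncompressed_gb, ffv1_gb


if __name__ == "__main__":
    # Hypothetical collection of 90 hours of digitized videotape.
    uncompressed, ffv1 = estimate_storage_gb(90)
    print(f"Uncompressed: {uncompressed / 1000:.1f} TB")
    print(f"FFV1 (estimated): {ffv1 / 1000:.1f} TB")
    print(f"Saved: {(uncompressed - ffv1) / 1000:.1f} TB")
```

Under these assumptions, every hour digitized saves roughly 67 GB of preservation storage.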

The algorithms used in lossless compression are complex, but if you’ve ever prepared for a fall backpacking trip, and tightly rolled your fluffy goose-down sleeping bag into one of those nifty little stuff-sacks, essentially squeezing all the air out of it, you just employed (a simplified version of) lossless compression. After you set up your tent, and unpack your sleeping bag, it decompresses, and the sleeping bag is now physically identical to the way it was before you packed.

Yet, during the trek to the campsite, it took up a lot less room in your backpack, just like FFV1 files take up a lot less room in our digital repository. Like that sleeping bag, FFV1 lossless compression ensures that the compressed video file is mathematically identical to its pre-compressed state. No data is “lost” or irreversibly altered in the process.

Duke Libraries’ Digital Production Center utilizes a pair of 6-foot-tall video racks, which house a current total of eight videotape decks covering a variety of obsolete formats such as U-matic (NTSC), U-matic (PAL), Betacam, DigiBeta, VHS (NTSC) and VHS (PAL, Secam). The output of each deck is converted from analog to digital (SDI) using Blackmagic Design Mini Converters.

The SDI signals are sent to a Blackmagic Design Smart Videohub, which is the central routing center for the entire system. Audio mixers and video transcoders allow the Digitization Specialist to tweak the analog signals so the waveform, vectorscope and decibel levels meet broadcast standards and the digitized video is faithful to its analog source. The output is then routed to one of two Retina 5K iMacs via Blackmagic UltraStudio devices, which convert the SDI signal to Thunderbolt 3.

FFV1 video digitization in progress in the Digital Production Center.

Because no major company (Apple, Microsoft, Adobe, Blackmagic, etc.) has yet adopted the FFV1 codec, multiple foundational layers of mostly open-source systems software had to be installed, tested and tweaked on our iMacs to make FFV1 work: Apple’s Xcode, Homebrew, AMIA’s vrecord, FFmpeg, Hex Fiend, AMIA’s ffmprovisr, GitHub Desktop, MediaInfo, and QCTools.

Our FFV1 workflow runs from the terminal command line, so some familiarity with command-line tools is helpful for entering the correct commands and deciphering the terminal logs.

The FFV1 files are “wrapped” in the open source Matroska (.mkv) media container. Our FFV1 scripts employ several degrees of quality-control checks, input logs and checksums, which ensure file integrity. The files can then be viewed using VLC media player, for Mac and Windows. Finally, we make an H.264 (.mp4) access derivative from the FFV1 preservation master, which can be sent to patrons, or published via Duke’s Digital Collections Repository.

An added bonus is that, not only can Duke Libraries digitize analog videotapes and film reels in FFV1, we can also utilize the codec (via scripting) to target a large batch of uncompressed video files (that were digitized from analog sources years ago) and make much smaller FFV1 copies, that are mathematically lossless. The script runs checksums on both the original uncompressed video file, and its new FFV1 counterpart, and verifies the content inside each container is identical.
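The general shape of such a check can be sketched in Python. This is not the Digital Production Center’s actual script. It is a minimal illustration that assumes FFmpeg is installed and on the PATH, and the function names are hypothetical. It transcodes a file to FFV1 in Matroska, then uses FFmpeg’s `framemd5` output to compare a checksum of every decoded video frame in the original and the copy. For simplicity it checks video frames only (`-an` drops audio), and it compares only the hash column, since timestamps can be expressed differently in different containers.

```python
import subprocess


def ffv1_encode_cmd(src, dst):
    """Build an FFmpeg command that transcodes video to FFV1 (version 3,
    intra-frame only, with per-slice CRCs) and copies the audio as-is."""
    return ["ffmpeg", "-i", src,
            "-c:v", "ffv1", "-level", "3", "-g", "1", "-slicecrc", "1",
            "-c:a", "copy", dst]


def framemd5_cmd(path):
    """Build an FFmpeg command that prints one MD5 per decoded video frame."""
    return ["ffmpeg", "-i", path, "-an", "-f", "framemd5", "-"]


def frame_hashes(framemd5_text):
    """Extract the per-frame hashes (last comma-separated field) from
    framemd5 output, skipping blank lines and '#' header lines."""
    hashes = []
    for line in framemd5_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        hashes.append(line.split(",")[-1].strip())
    return hashes


def is_lossless_copy(src, dst):
    """Return True if both files decode to identical video frames."""
    outputs = []
    for path in (src, dst):
        result = subprocess.run(framemd5_cmd(path), capture_output=True,
                                text=True, check=True)
        outputs.append(frame_hashes(result.stdout))
    return outputs[0] == outputs[1]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    source, copy = "master_uncompressed.mov", "master_ffv1.mkv"
    subprocess.run(ffv1_encode_cmd(source, copy), check=True)
    print("Lossless:", is_lossless_copy(source, copy))
```

In a real batch workflow, the original would be deleted only after a check like this passes, and the audio streams would be verified as well.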

Now, a digital collection of uncompressed masters that took up 9 terabytes can be deleted, and the newly-generated batch of FFV1 files, which only takes up 3 terabytes, becomes the new set of preservation masters for that collection. But no data has been lost, and the content is identical. Just like that goose-down sleeping bag, this helps the Duke University budget managers sleep better at night.

We’re hiring!

The Digital Production Center (DPC) is looking to hire a Digitization Specialist to join our team! The DPC team is at the forefront of enabling students, teachers, and researchers to continue their research by digitizing materials from our library collections. We get to work with a variety of unique and rare materials (in a multitude of formats), and we use professional equipment to get the work done. Imagine working on digitizing papyri and comic books – the spectrum is far and wide! Get a glimpse of the collections that have been digitized by DPC staff by checking out our Duke Digital Collections.

Also, the people are really nice (and right now, we’re working in a socially distanced manner)!

More information about the job description can be found here. The successful candidate should be detail-oriented, possess excellent organizational and project management skills, have scanning experience, and be able to work both independently and effectively in a team environment. This position is part of the Digital Collections and Curation Services department and will report to the Digital Production Services manager.

More information about Duke’s benefit package can be found at https://hr.duke.edu/benefits. For more information and to apply, please submit an electronic resume, cover letter, and a list of 3 references to https://library.duke.edu/about/jobs/digitizationspecialist. Review of applications will begin immediately and will continue until the position is filled.

Seats in the time of COVID: Improving new services with user feedback

In fall 2020, the Libraries quickly developed several new COVID-safe services as we reopened our facilities to students and faculty in the midst of the pandemic. Two such services were Library Takeout, which allows Duke affiliates to pick up reserved books with minimal contact, and an online reservation system for seats and equipment in library study spaces. 

Libraries staff spent significant time over the summer of 2020 developing these new services. Once they were put in operation in the fall of 2020, Assessment & User Experience staff knew we needed to gather feedback from users and analyze data to better understand how the services were working and what could be improved. We developed brief, anonymous feedback surveys to be sent during two-week periods to each person who reserved equipment or a study seat or made an appointment to pick up books. 

 

What did we learn?

The vast majority of the 111 patrons who responded to the Library Takeout survey were extremely satisfied with both wait time and safety precautions, as shown in the figure below.

Satisfaction levels by library location with Library Takeout service

Patrons were also asked what worked well about the process, what did not work well, and whether they had any additional comments or suggestions. There were 69 comments about things that worked well. The most prevalent themes in these compliments were clear instructions, very short wait times, friendly security and staff, access to parking, and adequate safety precautions.

The directions were clear, the parking pass for the Upper Allen lot made arriving on campus for pick up easy, the security staff were helpful and efficient, and the library staff was cheerful and helpful as I’ve come to expect.

Very rigorous about precautions. Keep it that way.

There were 34 comments about things that did not work well, many of which also make suggestions for improvements. For example: 

  • There was interest in the Libraries offering weekend hours for materials pick-up
  • Several students found the check-in requirements at the library entrance confusing 
  • There were complaints about having to make appointments at all to pick up materials 
  • Several students reported issues with their parking passes not opening the gates
  • Interest in having the confirmation email for a scheduled pick-up be sent earlier
  • Several felt that the security presence at the doors was uncomfortable

The survey for seat and equipment reservations received 114 responses in the two-week period in which this survey was distributed at the beginning of the fall semester. Users were asked how easy five activities were: using the online system to book, checking in, finding the seat/equipment, using it, and cleaning up/checking out. An overwhelming majority of users (89%) found it “extremely easy” to use their seat/equipment. In general, close to two-thirds of users found each of the other activities “extremely easy.” When “somewhat easy” and “extremely easy” responses are combined, between 85% and 97% of respondents found each activity easy. The activity with the lowest “easy” score (85%) was “cleaning up and checking out after your reservation.”

Seat/equipment reservations: how easy or difficult were the following activities?

When asked what worked well about reserving and using a seat or equipment, many praised the booking website for its clarity, simplicity, and ease of use, and also praised the entire process. Students were happy to be assured of a seat when they came to the library, and many commented on how clean, quiet, and nicely socially distanced the library was. Compliments were offered for the signage as well as for the security staff’s assistance in finding seats.

It was easy from start to finish. The security guard at the front was very helpful in explaining how to find my seat.

Was happy to see cleaning supplies to wipe down the desk area. felt safe. good social distancing precautions!

When asked what did not work well about reserving and using a study seat or equipment, reported issues included the following: 

  • Some respondents hadn’t realized they were supposed to check out online or clean their seating area when they were finished. The Libraries should add visuals next to the seats instructing on these procedures. 
  • When reserving, patrons can’t tell which seats are close to electrical outlets or windows. They requested a floorplan, map, or photos of the spaces so they can see where the seats are in relation to other things. 
  • Multiple people asked for the ability to easily extend one’s study time in the same seat if no one had booked it after them by the time their session was up. 
  • For the website, several people complained about the inability to edit reservation times without canceling and rebooking the whole thing, and a few other clunky visual things about the tool used for reservations. 
  • Several people requested weekend hours for the service. 

 

Changes we were able to make based on feedback

By gathering student feedback when we first began offering these services, we were able to quickly make changes so that the services better met our users’ needs. Below is a list of some of the key changes we made in response to survey feedback. 

  1. Revised and expanded opening hours in both Lilly and Perkins & Bostock Libraries in response to student requests and an analysis of usage patterns based on reservation system data.
  2. Removed the “check in” requirement for study seat users early in the fall semester, once we realized this was posing problems
  3. Added floorplans, images, and descriptions to the study seat reservation system so that users can get more info as they book study seats (here’s an example)
  4. Added more physical signage in the buildings to help students find their seats
  5. Developed a guide to study seats, including pictures, descriptions, and amenities of seats in Lilly, Music, Perkins, Bostock, and Rubenstein 
  6. Added online information so that students can easily see which seats do not have access to electrical outlets when deciding which seat to reserve (see this example)
  7. Added an Interview Room for students to book for 90-minute periods. Students can use this space to participate in virtual interviews.
  8. Added information about parking, elevator access, and daily reservation limits to the Reserve a Seat webpage and Reservation system.
  9. Increased outreach and marketing about reservable Study Seats through email blasts, social media, and blog posts. Library Takeout got plenty of buzz through this catchy video that went viral this past fall (870,000 views and counting)!

A Preview of MorphoSource 2 Beta

It’s an exciting time for the MorphoSource team, as we work to launch the MorphoSource 2 Beta application next Wednesday!

The new application improves and expands upon the original MorphoSource, a repository for 3D research data, and is being built using Hyrax, an open-source digital repository application widely implemented by libraries to manage digital repositories and collections. The team has been working on the site for the last two and a half years, and is looking forward to our efforts being made available to the MorphoSource community. At launch, users will be able to access records for over 140,000 media files, contributed by 1,500 researchers from all over the world.

MorphoSource 2 Beta Homepage

While the current site is still available for browsing at www.morphosource.org, we are migrating the repository data over to the new site in preparation for the launch, and have paused the ingest of new data sets. When the migration is complete, users will be able to access the new application at the current URL. Users with an account on the old site will be able to log in to the new site using their MorphoSource 1 credentials.

In my last post in June, I described some of the features that were in development at that time. In this post, I’ll highlight a few recent additions with screenshots from the beta site: Browse, Search, and User Dashboards.

Browse

Browse pages have been added as a quick entry point for users to discover data in several different ways. Users can use these pages to immediately access media, biological specimens, cultural heritage objects, organizations, teams, or projects.

MorphoSource Browse Categories
MorphoSource 2 Beta Browse Categories

Media Types and Modalities: Users can view all media records of a specific file type, such as image, CT image series, or mesh or point cloud. There are also links to records created by different methods, such as X-Ray, Magnetic Resonance Imaging, or Photogrammetry.

Physical Object Types: Links to view either all the Biological Specimens or Cultural Heritage Objects in MorphoSource.

Biological Taxonomy:  Users can find specimen records through the taxonomy browse by drilling down through the taxonomic ranks. The MorphoSource taxonomy records have been imported from the GBIF Backbone Taxonomy or have been created by MorphoSource users.

Taxonomy Browse Page

Projects: Projects are user-created groupings of media and specimens. From the browse page, projects can be searched by title and sorted by title, description, team, creator, or number of associated media or objects.

MorphoSource 2 Project Browse
Project Browse Page

Teams: Teams are groups of MorphoSource users that share management of media and team projects. A Team may be associated with an organization. The Team browse page lets users search and sort teams in a similar way to the Projects browse page.

Organizations: Lastly, users can view all of the organizations that have biological specimens or cultural heritage objects in MorphoSource. An organization may be an institution, department, collection, facility, lab, or other group. From the Organizations browse page, users can search by name and sort by parent institution name or institution code.

Faceted Searching

In addition to the browse pages, records for Media, Biological Specimens, Cultural Heritage Objects, Organizations, Teams, and Projects can also be found through the MorphoSource search interface. Searching has been customized for the different record types to include relevant facets. The different search categories can be chosen from the dropdown next to the search box ‘Go’ button.

MorphoSource 2 Beta Media Search Results
Media Search Results

Search results for media records can be faceted by file type, modality, object type (biological specimen or cultural heritage object), organization, tag, or membership in a team or project, while search results for objects can be limited by object type, creator, organization, taxonomy, associated media types, associated media tags, and membership of associated media in a team or project. Organization and Team/Project searches similarly have their own sets of facets.

MorphoSource 2 Object Search
Biological Specimen and Cultural Heritage Object Search Results

User Dashboards

Users who register an account on the site will have access to a dashboard that enables them to manage their data downloads. The dashboard is accessed by clicking on the profile icon at the top right of the site, and will open to the user’s media cart. The media cart contains two sections – the top holds all media items that the user currently has permission to download, while the bottom has media items with a restricted status where download has not been requested or approved:

MorphoSource 2 Beta Media Cart
Default User Dashboard

Users who have been granted contributor access to the site will have a dashboard that opens to the media and objects that they have contributed:

MorphoSource 2 Beta Contributor Dashboard
Contributor Dashboard

From the menu at the left, all users can access their previous downloads, or projects, teams, or other repository content to which they have been granted access, and manage their user profile. In addition, contributors can also create and manage projects and teams.

We hope that the browse, search, and dashboard enhancements, along with the other features we have been working on over the last couple of years, will enable users to easily discover and manage data sets in MorphoSource. And although we are looking forward to the launch, we are also excited to continue working on the site, and will be adding even more features in the near future.

2020 Highlights from Digital Collections

Welcome to the 2020 digital collections round up!

In spite of the dumpster fire of 2020, Duke Digital Collections had a productive and action-packed year (maybe too action-packed at times).

As usual, we launched new digital collections and added content to existing ones (full list below). We are also wrapping up our mega-migration from our old digital collections system (Tripod2) to the Duke Digital Repository! This migration has been in process for 5 years, yes 5 years. We plan to celebrate this exciting milestone more in January, so stay tuned.

A classroom and auditorium blueprint, digitized for a patron and launched this month.

The Digital Production Center, in collaboration with the Rubenstein Library, shifted to a new folder-level workflow for patron and instruction requests. This workflow was introduced just in time for the pandemic and the resulting unprecedented number of digitization requests. As a result of the demand for digital images, all project work has been put aside and the DPC is focusing on patron and instruction requests only. Since late June, the DPC has produced over 40,000 images!

Another digital collections highlight from 2020 is the development of new features for our preservation and access interface, the Duke Digital Repository. We have wasted no time using these new features, especially “metadata only” and the DDR to CONTENTdm connection.

Looking ahead to 2021, our priority will be the folder-level digitization workflow for researcher and instruction requests. The DPC has received 200+ requests since June, and we need to get all those digitized folders moved into the repository. We are also experimenting with preserving scans created outside of the DPC. For example, Rubenstein Library staff created a huge number of access copies using reading room scanners, and we would like to make them available to others. Lastly, we have a few bigger digital collections to ingest and launch as well.

Thanks to everyone associated with Digital Collections for their incredible work this year!!  Whew, it has been…a year. 

One of our newest digital collections features postcards from Greece: Salonica / Selanik / Thessaloniki
One of the Radio Haiti photographs launched recently.

Laundry list of 2020 Digital Collections

New Collections

Digital Collections Additions

Migrated Collections

Notes from the Duke University Libraries Digital Projects Team