Category Archives: Behind the Scenes

What I Did on My Summer Vacation

At the beginning of the school year in elementary school, we were usually given the assignment to write about our summer. I dreaded the assignment. Summers were spent running around from dawn to dusk, maybe a road trip packed into the family car to see relatives, nothing worth writing about. I now understand that it was a great way for teachers to get to know their students; a chance to visit the world through the students’ eyes. I am on my last day of ten days at Duke Kunshan University in Kunshan, China. I was tasked with helping their library as they grow, and this is DKU through my eyes.

The Campus
Photos of Duke Kunshan University
Phase one building of Duke Kunshan University.

Duke Kunshan University is located in Kunshan City, Jiangsu Province. The province is home to many ancient water towns, with buildings lining ancient canals, sidewalks for pedestrians, and shops on the first floor of most buildings. DKU pays homage to the area’s water towns with a pond in the center of campus and small fountains and reflecting pools in front of the academic building. The conference center, the academic building, and faculty residences surround the pond. The academic building is home to the canteen, the library, team rooms, and classroom auditoriums. A building called the Innovation Center is under construction that will house faculty offices and classrooms, with two more phases of buildings and a Duke Gardens area planned. In the center of the pond is a pavilion with arches in tribute to the architecture at Duke.

 

The Library
Photo of Duke Kunshan Library
Duke Kunshan Library

The library has seven full-time staff and two interns. All staff have Master’s degrees, and the interns are studying for their Master’s in Library Science. The staff are from China and Australia and wear multiple hats to keep the library running smoothly.

We worked on setting up loan policies and discussed their need to load patrons into our integrated library system. DKU Library is expanding the types of items it loans and extending borrowing privileges to the families of DKU faculty, staff, and students, as well as to DKU alumni and visitors. Extending privileges to DKU families is very important, as DKU as a whole wants to feel like a strong community to everyone with a link to the University. We started work so they could use the acquisitions module to track budgets and orders, and we solved some technical issues, allowing the staff to send loan notices to patrons and to print spine labels for books.

 

The Area

It wasn’t all work and no play. I visited two local water towns, the cities of Suzhou and Shanghai, and experienced the historic culture of the Kun Opera. I toured an ancient private garden, ate delicious food, shopped in Shanghai, and rode the bullet train which traveled between Kunshan and Shanghai at a speed of 268 km/h. I was honored with a special traditional dinner with the staff that included regional specialties like hairy crab soup, tender loofah and jellyfish. The DKU staff have been welcoming, friendly, and generous. I’m sad to leave and hope I get the opportunity to return. In the meantime, we’ve forged new friendships, new working relationships, and made lasting memories. It was the best summer vacation ever!

Photo of travels in China
Water Towns, Kun Opera, Shanghai Skyline
Food Picture
Food from the trip

Multispectral Imaging Summer Snapshots

If you are a regular Bitstreams reader, you know we just love talking about Multispectral Imaging. Seriously, we can go on and on about it, and we are not the only ones. This week, however, we are keeping it short and sweet, sharing a couple of before-and-after images from one of our most recent imaging sessions.

Below are two stacked images of Ashkar MS 16 (from the Rubenstein Library). The top half of each image shows the manuscript under natural light, and the bottom half shows the results of multispectral imaging and processing. We tend to post black-and-white MSI images most often, as they are generally the most legible; however, our MSI software can produce a lot of wild color variations! The orange one below seemed the most appropriate for a hot NC July afternoon like today. More processing details are included in the image captions below – enjoy!

The text of this manuscript above was revealed primarily with the IR narrowband light at 780 nm.
This image was created using Teem, a tool used to process and visualize scientific raster data. This specific image is the result of flatfielding each wavelength image and arranging them in wavelength order to produce a vector for each pixel. The infinity norm is computed for each vector to produce a scalar value for each pixel which is then histogram-equalized and assigned a color by a color-mapping function.
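The caption’s processing steps (flatfielded wavelength captures stacked into a per-pixel vector, an infinity norm per pixel, then histogram equalization) can be sketched in plain Python. This is only an illustrative toy on tiny nested lists, with invented function names; the actual work described above was done in Teem on full-resolution rasters.

```python
def infinity_norm_image(bands):
    """bands: list of 2-D lists (one per wavelength, all the same shape).
    Returns a 2-D list where each pixel is the maximum absolute value
    across wavelengths (the infinity norm of that pixel's vector)."""
    rows, cols = len(bands[0]), len(bands[0][0])
    return [[max(abs(band[r][c]) for band in bands) for c in range(cols)]
            for r in range(rows)]

def histogram_equalize(image, levels=256):
    """Spread pixel values across the output range using their ranks
    in the sorted histogram (a simple stand-in for true equalization)."""
    flat = sorted(v for row in image for v in row)
    n = len(flat)
    rank = {v: i for i, v in enumerate(flat)}  # last index per value
    return [[round(rank[v] / (n - 1) * (levels - 1)) for v in row]
            for row in image]
```

A color-mapping function would then assign each equalized scalar a color, producing variations like the orange image above.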

Open, Flip, Scan, Close: Observations from The Duke Chronicle Collection Project

Beginning Launch in….

Exciting news from Digital Collections! Digitization of the 1990s decade of The Duke Chronicle is nearing completion. It has been nine months since I started scanning The Chronicle, and I have come across some interesting stories and images. Though I still can’t digest the fact that the 1990s were twenty years ago, flipping through the pages brought back some good memories of those days. The pages also gave me perspective on events I was too young, and too focused on the newest trendy toy, to recall.

It all falls down

As I’m sure some of you remember, in the 1990’s, the world saw the slow destruction of the massive empire that was the Soviet Union. I was much too young to remember the monumental days of the fall of the Berlin Wall and the gradual independence of the Eastern European nations, but the students at Duke were old enough to witness and digest it. Apparently, there was such an interest in the topic that course enrollments skyrocketed in some areas. Since the situation was so new at the time, professors did not have any readings to assign, and previous course materials were made obsolete! I could see myself being one of the many students signing up for these courses.


Barbecue or peace of mind

Another random yet interesting article I found involved hog farms in North Carolina. Allegedly, the smell was so bad and spread so wide that neighbors were experiencing mood changes. A medical psychology professor completed an odor study, and found people were more depressed, angry and tired compared to people who didn’t live near hog farms. It became enough of an issue for local residents to file a lawsuit against the nearby hog farms. Although I have never lived near a hog farm, if I had to smell feces, urine and hog feed every time I came home, I don’t think I would be a happy camper either.


We have come so far

This particular article hit close to home. The University Archives were worried about navigating the preservation of important emails and other electronic documents. They discussed printing emails back in 1999, but we have now moved on to preserving electronic records in their original form. There are even courses dedicated to the subject in the archival field. It’s funny reading this article after scanning it for the very same purpose. Preservation.


Back in the day

Some more goodies I noticed while scanning this project.

Did anyone have any of these state of the art electronics?

Ohh, so this is how you found out what classes were available.

In the meantime 

I know the students, faculty, and staff of the ’90s will probably get a kick out of viewing these old newspaper issues, but I’m sure everyone else will enjoy reading through The Chronicle too. While you wait for the 1990s to be made available publicly, take a look at the current digitized Chronicle collection.


Sustaining Open

On learning that this year’s conference on Open Repositories would be held in Bozeman, Montana, I was initially perplexed. What an odd, out-of-the-way corner of the world in which to hold an international conference on the work of institutional digital repositories. After touching down in Montana, however, it quickly became apparent how appropriate the setting would be to this year’s conference—a geographic metaphor for the conference theme of openness and sustainability. I grew up out west, but coastal California has nothing on the incomprehensibly vast and panoramic expanse of western Montana. I was fortunate enough to pass a few days driving around the state before the conference began, culminating in a long afternoon spent at Yellowstone National Park. As we wrapped up our hike that afternoon by navigating the crowds and the boardwalks hovering over the terraces of the Mammoth Hot Springs, I wondered about the toll our presence took on the park, what responsible consumption of the landscape looks like, and how we might best preserve the park’s beauty for the future.

Beaver Pond Loop Trail, Yellowstone National Park

Tuesday’s opening remarks from Kenning Arlitsch, conference host Montana State University’s Dean of Libraries, reflected these concerns, pivoting from a few words on what “open” means for library and information professionals to a lengthier consideration of the impact of “openness” on the uniqueness and precarity of the greater Yellowstone ecosystem. Dr. Arlitsch noted that “[w]e can always create more digital space, but we cannot create more of these wild spaces.” While I agree unreservedly with the latter part of his statement, as the conference progressed I found myself re-evaluating the whole of that assertion. Although it’s true that we may be able to create more digital space with some ease (particularly as the strict monetary cost of digital storage becomes more manageable), it’s what we do with this space that is meaningful for the future. One of my chief takeaways from my time in Montana was that responsibly stewarding our digital commons and sustaining open knowledge for the long term is hard, complicated work. As the volume of ever more complex digital assets grows, finding ways to responsibly ensure access now and for the future is increasingly difficult.


“Research and Cultural Heritage communities have embraced the idea of Open; open communities, open source software, open data, scholarly communications, and open access publications and collections. These projects and communities require different modes of thinking and resourcing than purchasing vended products. While open may be the way forward, mitigating fatigue, finding sustainable funding, and building flexible digital repository platforms is something most of us are striving for.”


Many of the sessions I attended took the curation of research data in institutional repositories as their focus; in particular, a Monday workshop on “Engaging Liaison Librarians in the Data Deposit Workflow: Starting the Conversation” highlighted that research data curation is taking place through a wide array of variously resourced and staffed workflows across institutions. A good number of institutions do not have their own local repository for data, and even those larger organizations with broad data curation expertise and robust curatorial workflows (like Carnegie Mellon University, representatives from which led the workshop) may outsource their data publishing infrastructure to applications like Figshare, rather than build a local solution. Curatorial tasks tended to mean different things in different organizational contexts, and workflows varied according to staffing capacity. Our workshop breakout group spent some time debating the question of whether institutional repositories should even be in the business of research data curation, given the demanding nature of the work and the disparity in available resources among research organizations. It’s a tough question without any easy answers; while there are some good reasons for institutions to engage in this kind of work where they are able (maintaining local ownership of open data, institutional branding for researchers), it’s hard to escape the conclusion that many IRs are under-equipped from the standpoint of staff or infrastructure to sustainably process the on-coming wave of large-scale research data.

Mammoth Hot Springs, Yellowstone National Park

Elsewhere, from a technical perspective, presentations chiefly seemed to emphasize modularity, microservices, and avoiding reinventing the wheel. Going forward, it seems as though community development and shared solutions to problems held in common will be integral strategies to sustainably preserving our institutional research output and digital cultural heritage. The challenge resides in equitably distributing this work and in providing appropriate infrastructure to support maintenance and governance of the systems preserving and providing access to our data.

Charm City Sounds

Last week I had the opportunity to attend the 52nd Association for Recorded Sound Collections Annual Conference in Baltimore, MD.  From the ARSC website:

Founded in 1966, the Association for Recorded Sound Collections, Inc. is a nonprofit organization dedicated to the preservation and study of sound recordings—in all genres of music and speech, in all formats, and from all periods.

ARSC is unique in bringing together private individuals and institutional professionals. Archivists, librarians, and curators representing many of the world’s leading audiovisual repositories participate in ARSC alongside record collectors, record dealers, researchers, historians, discographers, musicians, engineers, producers, reviewers, and broadcasters.

ARSC’s vitality springs from more than 1000 knowledgeable, passionate, helpful members who really care about sound recordings.

ARSC Annual Conferences encourage open sharing of knowledge through informative presentations, workshops, and panel discussions. Tours, receptions, and special local events heighten the camaraderie that makes ARSC conferences lively and enjoyable.

This quote highlights several of the things that have made ARSC resources valuable and educational to me as the Audio Production Specialist at Duke Libraries:

  • The group’s membership includes both professionals and enthusiasts from a variety of backgrounds and types of institutions.
  • Members’ interests and specialties span a broad array of musical genres, media types, and time periods.
  • The organization serves as a repository of knowledge on obscure and obsolete sound recording media and technology.

This year’s conference offered a number of presentations that were directly relevant to our work here in Digital Collections and Curation Services, highlighting audio collections that have been digitized and the challenges encountered along the way.  Here’s a quick recap of some that stood out to me:

  • “Uncovering the Indian Neck Folk Festival Collection” by Maya Lerman (Folklife Center, Library of Congress).  This presentation showcased a collection of recordings and related documentation from a small invitation-only folk festival that ran from 1961 to 2014 and included early performances from Reverend Gary Davis, Dave Van Ronk, and Bob Dylan.  It touched on some of the difficulties in archiving optical and born-digital media (lack of metadata, deterioration of CD-Rs) as well as the benefits of educating prospective donors on best practices for media and documentation.
  • “A Garage in South Philly: The Vernacular Music Research Archive of Thornton Hagert” by David Sager and Anne Stanfield-Hagert.  This presentation paid tribute to the massive jazz archive of the late Mr. Hagert, comprising over 125,000 items of printed music, 75,000 recordings, 5,500 books, and 2,000 periodicals.  It spoke to the difficulties of selling or donating a private collection of this magnitude without splitting it up and undoing the careful, but idiosyncratic organizational structure as envisioned by the collector.
  • “Freedom is a Constant Struggle: The Golden State Mutual Sound Recordings” by Kelly Besser, Yasmin Dessem and Shanni Miller (UCLA Library).  This presentation covered the audio material from the archive of an African American-owned insurance company founded in 1925 in the Bay Area.  While audio was only a small part of this larger collection, the speakers demonstrated how it added additional context and depth to photographs, video, and written documents.  They also showed how this kind of archival audio can be an important tool in telling the stories of previously suppressed or unheard voices.
  • “Sounds, Sights and Sites of Activism in ’68” by Guha Shankar (Library of Congress).  This presentation examined a collection of recordings from “Resurrection City” in Washington, DC, an encampment that was part of the Poor People’s Campaign, a demonstration for human rights organized by Martin Luther King, Jr. prior to his assassination in 1968.  The talk showed how these archival documents are being accessed and used to inform new forms of social and political activism, and how they are reaching wider circulation via podcasts, websites, public lectures, and exhibitions.

The ARSC Conference also touched on my personal interests in American traditional and vernacular music, especially folk and blues from the early 20th Century.  Presentations on the bluegrass scene in Baltimore, blues guitarist Johnny Shines, education outreach by the creators of PBS’s “American Epic” documentaries, and Hickory, NC’s own Blue Sky Boys provided a welcome break from favorite archivist topics such as metadata, workflows, and quality control.  Other fun parts of the conference included an impromptu jam session, a silent auction of books & records, and posters documenting the musical history of Baltimore.  True to the city’s nickname, I was charmed by my time in Baltimore and inspired by the amazingly diverse and dedicated work towards collecting and preserving our audio heritage by the ARSC community.


To Four Years and Beyond

It is graduation week here at Duke and everyone is scattering about like pollen in the air. There are large tents popping up, students taking pictures in gowns, and people taking long walks across campus. These students, like the groups before them, are embarking on new territory.

They are setting out into the world as adults preparing for the rest of their lives. For four years, they have been studying, partying, and sleeping their way through life as pseudo-grown-ups, but now they have reached an unfamiliar page in their lives. They are faced with societal expectations, financial obligations, and a world that is still in progress. How will this fresh batch of individuals fit into our ever-changing society? I’m sure people have been asking this question for decades, but in asking it I managed to find some digital collections featuring people who contributed to society in various ways.


Judy Richardson took part in the Civil Rights Movement through the Student Nonviolent Coordinating Committee.


Deena Stryker went to Cuba in order to document the Cuban Revolution.


Rabbi Marshall T. Meyer went to Argentina during the Dirty War.


H. Lee Waters travelled through the South to film and showcase the daily lives of Southerners.


All of these individuals went out into the world and gave something to it. For the past four years, our country has witnessed copious changes. We have seen serious adjustments in political climate, social activism, and technology. It will be interesting to see where the 2018 Duke graduates will go and what they will do in their open future.

The Backbone of the Library, the Library Catalog

Did you ever stop to think about how the materials you find in the Library’s catalog search get there?  Did you know the Duke Libraries have three staff members dedicated to making sure Duke’s library catalog is working so faculty and students can do their research? The library catalog is the backbone of the library, and I hope that by the end of this post you will have a new appreciation for some of the people who support it and for what that work involves.

Diagram of functions of a library catalog
Functions of a library catalog

Discovery Services is charged with supporting the integrated library system (ILS), aka “the catalog.” What is an “integrated library system”?  According to Wikipedia, “an ILS (is used) to order and acquire, receive and invoice, catalog, circulate, track and shelve materials.” Our software is used by every staff person in all the Duke Libraries, including the professional school libraries: the Goodson Law Library, the Ford Library at the Fuqua School of Business, the Medical Center Library, and the Duke Kunshan University Library. At Duke, we have been using Ex Libris’s Aleph as our ILS since 2004.

Discovery Services staff work with staff in Technical Services, who handle the acquiring, receiving, invoicing, and cataloging of materials. Our support for that department includes setting up vendors who send orders and bibliographic records via the EDIFACT or MARC formats. Some of our catalogers do original cataloging, describing each book in the MARC format, and a great many of our records are copy cataloged from OCLC. Our ILS needs to be able to load these records, regardless of format, into our relational database.

We work with staff in Access and Delivery Services/Circulation in all the libraries to set up loan policies so that patrons may borrow the materials in our database. All loan policies are based on the type of patron checking out the item, the library that owns the item, and the item’s type. We currently have 59 item types for everything from books to short-term loans, sound CDs, and even 3D scanners! There are 37 patron types, ranging from faculty, grad students, staff, and undergrads to alumni and even retired library employees. And we support a total of 12 libraries. Combine all of those patrons, items, and libraries, and there are a lot of rules! We edit policies for who may request an item and where they can choose to pick it up, when fines are applied, and when overdue and lost notices are sent to patrons. We also load the current course lists and enrollment so students and faculty can use the materials in Course Reserves.
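Conceptually, that policy matrix is a lookup keyed by patron type, owning library, and item type. The sketch below uses entirely made-up names and loan periods purely for illustration; the real rules live in the ILS configuration, not in code like this.

```python
# Illustrative only: a loan-policy lookup keyed by
# (patron type, owning library, item type), with a default fallback.
# The real ILS holds far more combinations
# (37 patron types x 59 item types x 12 libraries).

LOAN_POLICIES = {
    # (patron_type, library, item_type): loan period in days
    ("faculty", "perkins", "book"): 120,
    ("undergrad", "perkins", "book"): 28,
    ("undergrad", "lilly", "dvd"): 3,
    ("grad", "perkins", "3d-scanner"): 1,
}
DEFAULT_LOAN_DAYS = 14

def loan_period(patron_type, library, item_type):
    """Return the loan period in days for a given combination,
    falling back to a default when no specific rule exists."""
    return LOAN_POLICIES.get((patron_type, library, item_type),
                             DEFAULT_LOAN_DAYS)
```

Fine-grained rules like these are why every new patron type or item type multiplies the configuration work.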

Diagram of ILS Connections
ILS Connections

The ILS is connected with other systems. There was a recent post here on Bitstreams about the work of the Discovery Strategy Team. Our ILS, Aleph, is represented in both the whiteboard photo and the Lucidchart photo. One example of an integration point is the Library’s discovery interface. We also connect to the software used at the Library Service Center (GFA). When an item is requested from that location, the request is sent from the ILS to the Library Service Center’s software so staff there can pull and deliver the item. The ILS is also integrated with software outside the library’s support, including the Bursar’s Office, the University’s identity management system, and the University’s accounting system.


We also export our data for projects in which the library is involved, such as HathiTrust, Ivy Plus, TRLN Discovery (coming soon!), and SHARE-VDE. These shared collection projects often require extra work from Discovery Services to make sure the data the project wants is included in our export.

Discovery Services spent the fall semester upgrading Aleph. We worked with our OIT partners to create new virtual servers, install the Aleph software, and migrate our current data to the new version. There were many configuration changes, and we needed to test all of our custom programs to be sure they worked with the new version. We have been using the Aleph software for more than a decade, and while we’ve upgraded the software over the years, libraries have continued to change.

folio logo

We are currently preparing a project to migrate to a new ILS and library services platform, FOLIO. That means moving our eight million bibliographic records and associated information, our two million patron records, and hundreds of thousands of orders, items, and e-resources into the new data format FOLIO will require. We will build new servers, install the software, and review and/or recreate all of the custom programs we currently use. We will integrate FOLIO with all the applications the library uses, as well as applications across campus. It will be a multi-year project that will take thousands of hours of staff time to complete. Discovery Services staff are involved in several of the FOLIO special interest groups, working with people across the world to develop FOLIO.

We work hard to make it easy for our patrons to find library material, request it or borrow it. The next time you check out a book from the library, take a moment to think about all the work that was required behind the scenes to make that book available to you.

Mapping Duke University Libraries’ Discovery System Environment

Just over one year ago, Duke University Libraries’ Web Experience team charged a new subgroup – the Discovery Strategy Team – with “providing cohesion for the Libraries’ discovery environment and facilitat[ing] discussion and activity across the units responsible for the various systems and policies that support discovery for DUL users.” Jacquie Samples, head of the Metadata and Discovery Strategy Department in our Technical Services unit, and I teamed up to co-chair the group, and we were excited to take on this critical work along with eight of our colleagues from across the libraries.

Our first task was one that had long been recognized as a need by many people throughout the library – to create an up-to-date visualization of the systems that underpin DUL’s discovery environment, including the data sources, data flows, connections, and technical/functional ownership for each of these systems. Our goal was not to depict an ideal discovery landscape but rather to depict things as they are now (ideal could come later).

Before we could create a visualization of these systems and how they interact, however, we realized we needed to identify what they were! This part of the process involved creating a giant laundry list of all of the systems in the form of a Google spreadsheet, so we could work on it collaboratively and iteratively. This spreadsheet became the foundation of the document we eventually produced, containing contextual information about the systems including:

  • Name(s) of the system
  • Description/Notes
  • Host
  • Path
  • Links to documentation
  • Technical & functional owners

Once we had our list of systems to work from, we began the process of visualizing how they work here at DUL. Each meeting of the team involved doing a lot of drawing on the whiteboard as we hashed out how a given system works – how staff & other systems interact with it, whether processes are automated or not, frequency of those processes, among other attributes. At the end of these meetings we would have a messy whiteboard drawing like this one:

We were very lucky to have the talented (and patient!) developer and designer Michael Daul on the team for this project, and his role was to take our whiteboard drawings and turn them into beautiful, legible visualizations using Lucidchart:

Once we had created visualizations that represented all of the systems in our spreadsheet, and shared them with stakeholders for feedback, we (ahem, Michael) compiled them into an interactive PDF using Adobe InDesign. We originally had high hopes of creating a super cool interactive and zoomable website where you could move in and out to create dynamic views of the visualizations, but ultimately realized this wouldn’t be easily updatable or sustainable. So, PDF it is, which may not be the fanciest of vehicles but is certainly easily consumed.

We’ve titled our document “Networked Discovery Systems at DUL,” and it contains two main sections: the visualizations that graphically depict the systems, and documentation derived from the spreadsheet we created to provide more information and context for each system. Users can click from a high-level view of the discovery system universe to documentation pages, to granular views of particular ‘constellations’ of systems. Anyone interested in checking it out can download it from this link.

We’ve identified a number of potential use cases for this documentation, and hope that others will surface:

  • New staff orientation
  • Systems transparency
  • Improved communication
  • Planning
  • Troubleshooting

We’re going to keep iterating and updating the PDF as our discovery environment shifts and changes, and hope that having this documentation will help us to identify areas for improvement and get us closer to achieving that ideal discovery environment.

Fun with Solr Queries

Apache Solr is behind many of our systems that provide a way to search and browse via a web application (such as the Duke Digital Repository, parts of our Bento search application, and the not yet public next generation TRLN Discovery catalog). It’s a tool for indexing data and provides a powerful query API. In this post I will document a few Solr querying techniques that might be interesting or useful. In some cases I won’t be able to provide live links to queries because we restrict direct access to Solr. However, many of these Solr querying techniques can be used directly in an application’s search box. In those cases, I will include live links to example queries in the Duke Digital Repository.

Find a list of items from their identifiers.

With this query you can specify exactly what items you want to appear in a search result from a list of identifiers.

Query

id:"duke:448098" OR id:"duke:282429" OR id:"duke:142581"
Try it in the Duke Digital Repository

Find all records that have a value (any value) in a specific field.

This query will find all the items in the repository that have a value in the product field. (As with most of these queries, you must know the field name in Solr.)

Query

product_tesim:*
Try it in the Duke Digital Repository

Find all the items in the repository that are missing a field value.

You can find all items in the repository that don’t have any date metadata. Inquiring minds want to know.

Query

-date_tesim:[* TO *]
Try it in the Duke Digital Repository

Find items using a begins-with (left-anchored) query.

I want to see all items that have a subject term that begins with “Soviet Union.” The example is a left-anchored query and will exactly match fields that begin with “Soviet Union.” (Note, the field must not be tokenized for this to work as expected.)

Query

subject_facet_sim:/Soviet Union.*/
Try it in the Duke Digital Repository

Find items with an ends-with (right-anchored) query.

Again, this will only work as expected with an untokenized field.

Query

subject_facet_sim:/.*20th century/
Try it in the Duke Digital Repository

Some of you might have noticed that these queries look a lot like regular expressions. And you’re right! Read more about Solr’s support for regular expression queries.

The following examples require direct access to Solr, which is restricted to authorized users and applications. Instead of providing live links, I’ll show the basic syntax, a complete example query using http://localhost:8983/solr/core/* as the sample URL for a Solr index, and a sample response from Solr.

Count instances of values in a field.

I want to know how many items in the repository have a workflow state of published and how many are unpublished. To do that I can write a facet query that will count instances of each value in the specified field. (This is another query that will only work as expected with an untokenized field.)

Query

http://localhost:8983/solr/core/select?q=*:*&facet=true&facet.field=workflow_state_ssi&facet.mincount=1&fl=id

Solr Response (truncated)


...
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="workflow_state_ssi">
<int name="published">484075</int>
<int name="unpublished">2228</int>
</lst>
</lst>
</lst>
...
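If you want those counts in a script rather than by eyeballing the XML, the facet counts are easy to pull out with the standard library. This sketch parses the truncated sample response above, wrapped in a `<response>` root element so it is a complete document:

```python
# Extract facet counts from a Solr XML response.
import xml.etree.ElementTree as ET

xml = """<response>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="workflow_state_ssi">
<int name="published">484075</int>
<int name="unpublished">2228</int>
</lst>
</lst>
</lst>
</response>"""

root = ET.fromstring(xml)
# Find the <int> children of the <lst> for our facet field and build a
# {value: count} dict.
counts = {
    el.get("name"): int(el.text)
    for el in root.findall(".//lst[@name='workflow_state_ssi']/int")
}
```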

Collapse multiple records into one result based on a shared field value.

This one is somewhat advanced and likely only useful in particular circumstances. But if you have multiple records that are slight variants of each other, you can collapse each set of variants down to a single result with a collapse query, as long as the records you want to collapse share a field value.

Query

http://localhost:8983/solr/core/select?q=*:*&fq={!collapse%20field=oclc_number%20nullPolicy=expand%20max=termfreq(institution_f,duke)}

  • !collapse instructs Solr to use the Collapsing Query Parser.
  • field=oclc_number instructs Solr to collapse records that share the same value in the oclc_number field.
  • nullPolicy=expand instructs Solr to return any document without a matching OCLC as part of the result set. If this is excluded then records that don’t share an oclc_number with another record will be excluded from the results.
  • max=termfreq(institution_f,duke) instructs Solr, when collapsing multiple records, to select as the representative record the one that has the value “duke” in the institution_f field.
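The percent-encoding in that URL (the `%20`s) can be fiddly to write by hand. A small sketch, using the field names from the example query, shows urlencode doing the escaping for you:

```python
# Assemble the collapse query, letting urlencode percent-encode the
# filter query (spaces, braces, parentheses, etc.).
from urllib.parse import urlencode, urlsplit, parse_qs

params = {
    "q": "*:*",
    "fq": "{!collapse field=oclc_number nullPolicy=expand "
          "max=termfreq(institution_f,duke)}",
}
url = "http://localhost:8983/solr/core/select?" + urlencode(params)

# Round-tripping the query string recovers the raw filter query.
fq = parse_qs(urlsplit(url).query)["fq"][0]
```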

CSV response writer (or JSON, Ruby, etc.)

Solr has a number of tricks up its sleeve when it comes to returning results. By default it will return results as XML, but you can also request JSON or Ruby, among other formats. You specify a response writer by adding the wt parameter to the URL (wt=json, wt=ruby, etc.).

Solr will also return results as a CSV file, which can then be opened in an Excel spreadsheet — a useful feature for working with metadata.

Query

http://localhost:8983/solr/core/select?q=sun&wt=csv&fl=id,title_tesim

Solr Response

id,title_tesim
duke:194006,Sun Bowl...Sun City...
duke:194002,Sun Bowl...Sun City...
duke:194009,Sun Bowl...Sun City.
duke:194019,Sun Bowl...Sun City.
duke:194037,"Sun City\, Sun Bowl"
duke:194036,"Sun City\, Sun Bowl"
duke:194030,Sun City
duke:194073,Sun City
duke:335601,Sun Control
duke:355105,Proved! Fast starts at 30° below zero!
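One wrinkle in the CSV output: Solr escapes commas inside values with a backslash, as in the “Sun City\, Sun Bowl” rows above. A short sketch of reading that response with Python’s csv module, using a few lines of the sample response, with the escapes removed after parsing:

```python
# Parse a Solr CSV response and unescape backslash-escaped commas.
import csv
import io

body = """id,title_tesim
duke:194006,Sun Bowl...Sun City...
duke:194037,"Sun City\\, Sun Bowl"
duke:335601,Sun Control
"""

rows = list(csv.DictReader(io.StringIO(body)))
# Solr escapes commas inside values as '\,'; strip the backslash.
titles = [r["title_tesim"].replace("\\,", ",") for r in rows]
```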

This is just a small sample of useful ways you can query Solr.

Interactive Transcripts have Arrived!


This week Duke Digital Collections added our first set of interactive transcripts to one of our newest digital collections: the Silent Vigil (1968) and Allen Building Takeover (1969) collection of audio recordings.   This marks an exciting milestone in the accessibility efforts Duke University Libraries has been engaged in for the past 2.5 years. Last October, my colleague Sean wrote about our new accessibility features and the technology powering them, and today I’m going to tell you a little more about why we started these efforts as well as share some examples.

Interactive Transcript in the Silent Vigil (1968) and Allen Building Takeover (1969) Audio Recordings

Providing access to captions and transcripts is not new for digital collections. We have been able to provide access to pdf transcripts and captions, both in digital collections and in finding aids, for years. See items from the Behind the Veil and Memory Project digital collections for examples.

In recent years, however, we have stepped up our efforts in creating captions and transcripts. Our work began in response to a 2015 lawsuit brought against Harvard and MIT by the National Association of the Deaf. The lawsuit triggered many discussions in the library, and the Advisory Council for Digital Collections eventually decided that we would proactively create captions or transcripts for all new A/V digital collections, assuming it is feasible and reasonable to do so. The feasible and reasonable part of our policy is key. The Radio Haiti collection, for example, is composed of thousands of recordings primarily in Haitian Creole and French; the cost of transcribing that volume of non-English material makes doing so neither reasonable nor feasible. In addition to our work in the library, Duke has established campus-wide web accessibility guidelines that include captioning and transcription, so our work in digital collections is only one aspect of campus-wide accessibility efforts.

To create transcripts and captions, we have partnered with several vendors since 2015, and we have seen the costs for these services drop dramatically.  Our primary vendor right now is Rev, who also works with Duke’s Academic Media Services department.  Rev guarantees 99% accurate captions or transcripts for $1/minute.

Early on, Duke Digital Collections decided to center our captioning efforts around the WebVTT format, a time-coded, text-based format and a W3C standard. We use it for both audio and video captions when possible, but we can also accommodate legacy transcript formats like pdfs. Transcripts and captions can be easily replaced with new versions if and when edits need to be made.

Examples from the Silent Vigil (1968) and Allen Building Takeover (1969) Audio Recordings

When WebVTT captions are present, they load in the interface as an interactive transcript. The transcript can also be used for navigation: click the text, and playback jumps to that portion of the recording.

Click the image above to see the full item and transcript.

In addition to providing access to transcripts on the screen, we offer downloadable versions of the transcript as a text file, a pdf, or the original WebVTT file.

An advantage of the WebVTT format is its support for “v” (voice) tags, which can be used to mark changes in speaker and even add speaker names to the transcript. This can require additional manual work if the speakers’ names are not obvious to the vendor, but we are excited to have this capability.
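To give a sense of what those voice tags look like, here is a sketch that parses a tiny WebVTT transcript with Python. The cue text and speaker labels are invented for illustration and are not from the collection:

```python
# Parse WebVTT cues, including <v Speaker> voice tags.
import re

vtt = """WEBVTT

00:00:01.000 --> 00:00:04.000
<v Interviewer>Can you describe the scene on the quad?

00:00:04.500 --> 00:00:09.000
<v Interviewee>Students were gathered in silence.
"""

# Match "start --> end", then an optional <v Name> tag, then the cue text.
cue_re = re.compile(
    r"(\d\d:\d\d:\d\d\.\d{3}) --> (\d\d:\d\d:\d\d\.\d{3})\n(?:<v ([^>]+)>)?(.+)"
)

cues = [
    {"start": m.group(1), "end": m.group(2),
     "speaker": m.group(3), "text": m.group(4).strip()}
    for m in cue_re.finditer(vtt)
]
```

(A production parser would handle multi-line cues, cue identifiers, and other WebVTT features; this only shows the voice-tag idea.)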

As Sean described in his blog post, we can also provide access to legacy pdf documents.  They cannot be rendered into an interactive version, but they are still accessible for download.

On a related note, we also have a new feature that links time codes listed in the description metadata field of an item to the corresponding portion of the audio or video file.  This enables librarians to describe specific segments of audio and/or video items.  The Radio Haiti digital collection is the first to utilize this feature, but the feature will be a huge benefit to the H. Lee Waters and Chapel Recordings digital collections as well as many others.

Click the image above to interact with linked time codes.

As mentioned at the top of this post, the Duke Vigil and Allen Building Takeover collection includes our first batch of interactive transcripts.  We plan to launch more this Spring, so stay tuned!!