Category Archives: Conferences

The Politics of Our Labor

I always appreciate the bird’s-eye view of my work that I gain by attending national conferences, and I often come away with novel ideas for solving old problems, as well as colleagues to reach out to when I encounter new ones. So I was anticipating as much when I left home for the airport last week to attend the 2016 DLF Forum in Milwaukee, Wisconsin.


And that is indeed the experience I had, but in addition to gaining new ideas and new friends, the keynote for the conference challenged me to think deeply about the broader context in which we as librarians and information professionals do our work, who we do that work for, and whether or not we are living up to the values of inclusivity and accessibility that I hold dear.

The keynote speaker was Stacie Williams, a librarian/archivist who talked about the politics of labor in our communities, both within libraries and archives and beyond. She posited that all labor is local, and focused on the caregiving work that is so often performed by women and minorities in underpaid or unpaid positions: the “beam underpinning this entire system of labor as we know it … and yet it remains the most invisible part of what makes our economy run”. In order to value ourselves and our work, we need to value all of the labor upon which our society is based. And she asked us to think of the work we do as librarians and archivists – the services we provide – as a form of caregiving for our own communities. She also posited that the information work in which we are engaged has followed the late-capitalist trend toward an anti-care ethos, and implored us to examine our own institutional practices, asking questions such as:

  • Do we engage in digitization projects where the work is performed by at-will workers with no benefits or unpaid interns, or is outsourced to prison workers?
  • Are we physically situated on university campuses that are inaccessible to our local community, either by way of location or prohibitive expense?
  • Have we undergone extreme cuts to our workforces, hindering our ability to provide services?
  • Do our hiring practices replicate systems that reward existing racial, gender, and class hierarchies?
  • Do we build positions into grants that don’t pay living wages?

Williams asked us to interrogate the ways in which our labor practices are problematic and to center library work in the care ethics necessary “to reflect the standards of access and equality that we say we hold in this profession”. As a metadata specialist who spends a large chunk of her time creating description for cultural heritage materials, I found this statement especially resonant: “Few things are more liberatory than being able to tell your own story and history and have control and stewardship over your cultural narrative”. This is a tension I am especially aware of – describing resources for discovery and access in ways that honor and reflect the voices and self-identity of their original creators or subjects.

The following days were of course filled with interesting and useful panels and presentations, working lunches, and meetups with new and old colleagues. But the keynote, along with the context of the national election, infused the rest of the conference with a spirit of thoughtfulness and openness toward engaging in a deep exploration of our labor practices and relationships to the communities of people we serve. It has given me a lot to think about, and I’m grateful to the DLF Forum planners for bringing this level of discourse to the annual conference. 

Blacklight Summit 2016

Last week I traveled to lovely Princeton, NJ to attend Blacklight Summit. For the second year in a row, a smallish group of developers who use or work on Project Blacklight met to talk about our work and learn from each other.

Blacklight is an open source project written in Ruby on Rails that serves as a discovery interface over an Apache Solr search index. It’s commonly used to build library catalogs, but it is generally agnostic about the source and type of the data you want to search. It was even used to help reporters explore the leaked Panama Papers.
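
For a sense of what that looks like in practice, here is a minimal sketch of a Blacklight catalog configuration. The Solr field names ('title_tsim', etc.) are hypothetical stand-ins for whatever your own index contains, not fields from any project described here:

```ruby
# app/controllers/catalog_controller.rb
# A minimal Blacklight configuration. The Solr connection itself is
# configured elsewhere (config/blacklight.yml); the field names here
# are hypothetical stand-ins for your own Solr schema.
class CatalogController < ApplicationController
  include Blacklight::Catalog

  configure_blacklight do |config|
    # Default Solr parameters sent with every search
    config.default_solr_params = { rows: 10 }

    # Field used as the title of each search result
    config.index.title_field = 'title_tsim'

    # Facets rendered in the sidebar
    config.add_facet_field 'format_sim', label: 'Format'

    # Fields shown in result lists and on item pages
    config.add_index_field 'creator_tsim', label: 'Creator'
    config.add_show_field 'description_tsim', label: 'Description'

    # A search option in the search-field dropdown
    config.add_search_field 'all_fields', label: 'All Fields'
  end
end
```
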
At Duke we’re using Blacklight as the public interface to our digital repository. Metadata about repository objects is indexed in Solr, and we use Blacklight (with a lot of customizations) to provide access to digital collections, including images, audio, and video. Some of the collections include the Gary Monroe Photographs, J. Walter Thompson Ford Advertisements, and Duke Chapel Recordings, among many others.

Blacklight has also been selected to replace the aging Endeca-based catalog that provides search across the TRLN libraries. Expect to hear more about this project in the future.
Blacklight Summit is more of an unconference meeting than a conference, with a relatively small number of participants. It’s a great chance to learn and talk about common problems and interests with library developers from other institutions.

I’m going to give a brief overview of some of what we talked about and did during the two-and-a-half-day meeting, and provide links for you to explore more on your own.

First, a representative from each institution gave a roughly five-minute overview of how they’re using Blacklight.

The group participated in a workshop on customizing Blacklight. The organizers paired people based on experience, so the most experienced and least experienced (self-identified) were paired up, and so on. The GitHub project for the workshop: https://github.com/projectblacklight/blacklight_summit_demo

We got an update on the state of Blacklight 7. Some of the highlights of what’s coming:

  • Move to Bootstrap 4 from Bootstrap 3
  • Use of HTML 5 structural elements
  • Better internationalization support
  • Move from helpers to presenters (see the sketch after this list; background on presenters: http://nithinbekal.com/posts/rails-presenters/)
  • Improved code quality
  • Partial structure that makes overrides easier

A release of Blacklight 7 won’t be ready until Bootstrap 4 is released.

There were also several conversations and breakout sessions about Solr, the search platform that powers Blacklight. I won’t go into great detail here, but some topics discussed included:

  • Developing a common Solr schema for library catalogs.
  • Tuning the performance of Solr when the index is updated frequently. (Items that are checked out or returned need to be reindexed relatively quickly to keep availability information up to date; see the sketch after this list.)
  • Support for multilingual indexing and searching in Solr, especially for Chinese, Japanese, and Korean. Stanford has done a lot of work on this.
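
On the index-update point, one common mitigation is Solr’s commitWithin feature, which lets Solr batch updates instead of paying for a hard commit on every change. Here is a hedged sketch using the rsolr gem; the Solr URL and field names are hypothetical:

```ruby
require 'rsolr'  # gem 'rsolr'

# Connect to a hypothetical Solr core.
solr = RSolr.connect(url: 'http://localhost:8983/solr/catalog')

# Update availability for a checked-out item. commitWithin asks Solr
# to make the change searchable within N milliseconds, so frequent
# availability updates don't each force an expensive commit.
solr.add(
  { id: 'item-123', available_b: false },
  add_attributes: { commitWithin: 5000 }
)
```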

I’m sure you’ll be hearing more from me about Blacklight on this blog, especially as we work to build a new TRLN shared catalog with it.

Hopscotch Design Fest 2016

A few weeks ago I attended my second HopScotch Design Fest in downtown Raleigh. Overall the conference was superb – almost every session I attended was interesting, inspiring, and valuable. Compared to last year, the format this time around centered on themed groups of speakers giving shorter presentations, followed by a panel discussion. I was especially impressed with two of these sessions.

Design for Storytelling

Daniel Horovitz talked about how he’d reached a point in his career where he was tired of doing design work with computers. He decided to challenge himself to create at least one new piece of art every day using analog techniques (collage, drawing, etc.). He began sharing his work online, which led to increased exposure and to clients wanting new projects in the style he’d developed, instead of the computer-based design work he’d spent most of his career on. Continued exploration and growth in his new techniques led to bigger and bigger projects around the world. His talent and body of work are truly impressive, and it’s inspiring to hear that creative ruts can sometimes lead to reinvention (and success!).


Ekene Ijeoma began his talk by inviting us to turn to the person next to us and say three things: I see you, I value you, and I acknowledge you. This deceptively simple interaction was actually quite powerful – it was a really interesting experience. He went on to demonstrate how empathy has driven his work. I was particularly impressed with his interactive installation Wage Islands, which visualizes which parts of New York City are really affordable for the people who live there, and lets users see how things change as the minimum wage rises or falls.


Michelle Higa Fox showed us many examples of the amazing work her design studio has created. She started off talking about the idea of micro-storytelling and the challenges of reaching users on social media channels, where focus is fleeting and pulled in many directions. She showed a couple of really clever examples of these micro-stories.


Her studio also builds seriously impressive interactive installations. She showed us a very recent work involving transparent LCD screens with dioramas housed behind them; the dioramas were hidden and revealed depending on context, while motion graphic content could be overlaid in front. It was amazing. I couldn’t find any images online, but I did find this video of another really cool interactive wall:

One anecdote she shared, which I found particularly useful, is that it’s very important to account for short experiences when designing these kinds of interfaces, as you can’t expect your users to stick around as long as you’d like them to. I think that’s something we can take more into consideration as we build interfaces for the library.

Design for Hacking Yourself

Brooke Belk led us through a short mindfulness exercise (which was very refreshing) and talked about how a meditation practice can really help creativity flow more easily throughout the day. Something I need to try more often! Alexa Clay talked about her concept of the misfit economy. I was amused by her stories of role-playing at tech conferences, where she dresses as the Amish Futurist and asks deeply challenging questions about the role of technology in the modern world.

But I was most impressed by Lulu Miller’s talk. She was formerly a producer at Radiolab, my favorite show on NPR, and now has her own podcast called Invisibilia – which is all to say that she knows how to tell a good story. She shared a poignant tale, which she called the house and the bicycle, about the elusive nature of creative pursuits. The story intertwined her experience of pursuing a career in fiction writing while attending grad school in Portland with her neighbor’s struggle to stop building custom bicycles and finish building his house. Other themes included the paradox of intention, having faith in yourself and your work, throwing out the blueprint, and putting out what you have right now! All sage advice for creative types. It really was a lovely experience – I hope it gets published in some form soon.

Star Wars: The Fans Strike Back

At the recent Association of Moving Image Archivists conference in Portland, Oregon, I saw a lot of great presentations related to film and video preservation. As a Star Wars fan, I found one session particularly interesting. It was presented by Jimi Jones, a doctoral student at the University of Illinois at Urbana-Champaign, and grew out of his research into the world of fan edits.

This is a fairly modern phenomenon, whereby fans of a particular film, music recording or television show, often frustrated by the unavailability of that work on modern media, take it upon themselves to make it available, irrespective of copyright and/or the original creator’s wishes. Some fan edits appropriate the work, and alter it significantly, to make their own unique version. Neither Jimi Jones nor AMIA is advocating for fan edits, but merely exploring the sociological and technological implications they may have in the world of film and video digitization and preservation.

An example is the original 1977 theatrical release of “Star Wars” (later retitled Star Wars Episode IV: A New Hope), a movie I spent my entire 1977 summer allowance on as a child, because I was so awestruck that I went back to my local theater to see it again and again. The version I saw then – free of more recently superimposed CGI elements like Jabba the Hutt, and the version in which Han Solo shoots Greedo in the Mos Eisley Cantina before Greedo can shoot Solo – is not commercially available today on any modern high-definition medium such as Blu-ray or HD streaming.

The last time most fans saw the original, unaltered Star Wars Trilogy, it was likely on VHS tape. George Lucas, the creator of Star Wars, insists that his more recent “Special Editions” of the Star Wars Trilogy, with the added CGI and the more politically correct, less trigger-happy Han Solo, are the “definitive” versions. Thus Lucas has refused to allow any other version to be legally distributed for at least the past decade. Many Star Wars fans, however, find this unacceptable, and they are striking back.

Armed with sophisticated video digitization and editing software, a network of Star Wars fans have collaborated to create “Star Wars: Despecialized Edition,” a composite of assorted pre-existing elements that accurately presents the 1977-1983 theatrical versions of the original Star Wars Trilogy in high definition for the first time. The project is led by an English teacher in the Czech Republic who goes by the name “Harmy” online and is referred to as a “guerilla restorationist.” Using BitTorrent and other peer-to-peer networks, fans can now download “Despecialized,” burn it to Blu-ray, print out high-quality cover art, and watch it on their modern widescreen TV sets in high definition.

The fans, rightly or wrongly, claim these are the versions of the films they grew up with, and they have a right to see them, regardless of what George Lucas thinks. Personally, I never liked the changes Lucas later made to the original trilogy, and I agree that “Han Shot First,” or to paraphrase Johnny Cash, “I shot a man named Greedo, just to watch him die.” We all know Greedo was a scumbag who was about to kill Solo anyway, so Han’s preemptive shot in the original Star Wars makes perfect sense. I’m not endorsing piracy, but, as a fan, I certainly understand the pent-up demand for “Star Wars: Despecialized Edition.”


The psychology of nostalgia is interesting, particularly when fans desire something so intensely that they will go to great lengths, technologically and otherwise, to satiate that need. Absence makes the heart, or fan, grow fonder. This is not unique to Star Wars. For instance, Neil Young, one of the best songwriters of his generation, released a major-label record in 1973 called “Time Fades Away” which, to this day, has never been released on compact disc.

The album, recorded on tour while his biggest hit single, “Heart of Gold,” was topping the charts, is an abrupt shift in mood and approach, and the beginning of a darker, more desolate string of albums that fans refer to as “The Ditch Trilogy.” Regarding this period, Neil said: “Heart of Gold put me in the middle of the road. Traveling there soon became a bore, so I headed for the ditch. A rougher ride but I saw more interesting people there.” Many fans, myself included, regard the three records that comprise the ditch trilogy as his greatest achievement, due to their brutal honesty, and Neil’s absolute refusal to play it safe by coasting on his recent mainstream success. But for Neil, Time Fades Away brings up so many bad memories, particularly regarding the death of his guitarist, Danny Whitten, that he has long refused to release it on CD.

In 2005, Neil Young fans gathered at least 14,000 petition signatures to get the album released on compact disc, but that yielded no results. So many fans took it upon themselves, using modern technology, to meticulously transfer mint-condition vinyl copies of “Time Fades Away” from turntable to desktop computer using widely available professional audio software, and then burn the album to CD. Fans also scanned the original cover art from the vinyl record, and made compact disc covers and labels that closely approximate what the CD would have looked like if it had been officially released.

Other fans, using peer-to-peer networks, were able to locate a digital “test pressing” of the audio for a future CD release that Neil nixed before it went into production. Combining that test-pressing audio, free of vinyl static, with professional artwork, the fans were essentially able to produce what Neil refused to allow: a pristine-sounding, professional-looking version of Time Fades Away on compact disc. Perhaps in response, Neil has, just in the last year, allowed Time Fades Away to be released in digital form via his high-resolution 192kHz/24-bit music service, Pono Music.

It’s clear that the main intent of the fans of Star Wars, Time Fades Away, and other works of art is not to profit off their hybrid creations, or to anger the original creators; it’s merely to finally have access to what they are so nostalgic about. Ironically, if it weren’t for the unavailability of these works, a lot of this community, creativity, software mastery, and “guerrilla restoration” would not be taking place. There’s something about certain works being missing from the marketplace that makes fans hunger for them, talk about them, obsess over them, and then find creative ways of acquiring or reproducing them.

This is the same impulse that fuels the fire of toy collectors, book collectors, garage-sale hunters and eBay bidders. It’s this feeling that you had something, or experienced something magical when you were younger, and no one has the right to alter it, or take access to it away from you, not even the person who created it. If you can just find it again, watch it, listen to it and hold it in your hands, you can recapture that youthful feeling, share it with others, and protect the work from oblivion. It seems like just yesterday that I was watching Han Solo shoot Greedo first on the big screen, but that was almost 40 years ago. “’Cause you know how time fades away.”

Who Are You and Why Are You Here: A Duke Digital Collections Poster

This week, my colleague Will Sexton and I (as well as several other Duke folks) are attending the Digital Library Federation conference in beautiful Vancouver, British Columbia. While here, we presented a poster on our work assessing scholarly use of digital collections. Please have a look at our poster below.

Our DLF poster on assessing scholarly use of Duke Digital Collections


If you are interested in learning more about our assessment project, check out our previous blog posts on the topic.

We will also publish a report based on our survey findings sometime in the next few months – so stay tuned!

Fugitive Sheets Wrapup at TRLN 2015

Rachel Ingold (Curator for the History of Medicine Collections at the Rubenstein Library) and I co-presented yesterday at the TRLN Annual Conference 2015 in Chapel Hill, NC:


Raising the Bar for Lifting the Flaps: An Inside Look at the Anatomical Fugitive Sheets Digital Collection at Duke

Sean Aery, Digital Projects Developer, Duke
Rachel Ingold, Curator for the History of Medicine Collections, Duke

Duke’s Digital Collections program recently published a remarkable set of 16th- and 17th-century anatomical fugitive sheets from the Rubenstein Library’s History of Medicine Collections. These illustrated sheets are similar to broadsides, but feature several layers of delicate flaps that lift to reveal the inside of the human body. The presenters will discuss the unique challenges posed by the source material, including conservation, digitization, description, data modeling, and UI design. They will also demonstrate the resulting digital collection, which has already earned several accolades for its innovative yet elegant solutions to a project with a high degree of complexity.


Here are our slides from the session:

For more information on the project, see Rachel’s post introducing the collection and mine explaining how it works. Finally, Ethan Butler at Cuberis also posted a great in-depth look at the technology powering the interface.

Dispatches from the Digital Library Federation Forum

On October 27-29, librarians, archivists, developers, project managers, and others met for the Digital Library Federation (DLF) Forum in Atlanta, GA. The program was packed to the gills with outstanding projects and presenters, and several of us from Duke University Libraries were fortunate enough to attend. Below is a roundup of notes summarizing interesting sessions, software tools, projects, and collections we learned about at the conference.

Please note that these notes were written by humans listening to presentations, so mistakes are inevitable. Click the links to learn more about each tool, project, or session straight from the source.

Tools and Technology

Spotlight is an open-source tool for featuring digitized resources and is being developed at Stanford University.  It appears to have fairly similar functionality to Omeka, but is integrated into Blacklight, a discovery interface used by a growing number of libraries.


The J. Willard Marriott Library at the University of Utah presented on their use of Pamco Imaging tools to capture 360-degree images of artifacts. The library purchased a system from Pamco that includes an automated turntable, a lighting tent, and software to both capture and display the 3-D objects.


There were two short presentations about media walls: one from our friends in Raleigh at the Hunt Library at N.C. State University, and the other from Georgia State. Click the links to see just how much you can do with an amazing media wall.

Projects and Collections

The California Digital Library (CDL) is redesigning and reengineering their digital collections interface to create a kind of mini-Digital Public Library of America just for University of California digital collections. They are building the project on a platform called Nuxeo and storing their data in Amazon Web Services. The new interface and platform development is heavily informed by user studies done on the existing Calisphere digital collections interface.


Emblematica Online is a collection of digitized emblem books contributed by several global institutions, including Duke. The collection is hosted by the University of Illinois at Urbana-Champaign. The project has been conducting user studies and hopes to publish them in the coming year.


The Indiana University Media Digitization and Preservation Initiative started in 2009 with a survey of all the audio and visual materials on campus. In 2011, the initiative proposed digitizing all rare and unique audio and video items within a 15-year period; in 2013, however, the president of the university committed the campus to completing the project within 7 years. To accomplish this ambitious goal, the university formed a public-private partnership with Memnon Archiving Services of Brussels. The university estimates that the project will create over 9 petabytes of data. The initiative has been in the planning phases and should be ramping up in 2015.

Selected Session Notes

The Project Managers group within DLF organized a session on “Cultivating a Culture of Project Management,” followed by a working lunch. Representatives from Johns Hopkins and Brown talked about implementing Agile methodology for managing and developing technical projects. Both libraries spoke positively about moving toward Agile, and about the benefits of clear communication lines and defined development cycles. A speaker from Temple University discussed her methods for tracking and communicating the capacity of her development team; her spreadsheet for doing so took the session by storm (I’m not exaggerating – check out Twitter around the time of this session). Two speakers from the University of Michigan shared their work in creating a project management special interest group within their library to share PM skills, tools, and heartaches.

A session entitled “Beyond the Digital Surrogate” highlighted several projects that are using digitized materials as a starting point for text mining and data visualization. First, many of UNC’s Documenting the American South collections are available as a text download. Second, a tool out of Georgia Tech supports interactive exploration and visualization of text-based archives. Third, a team from the University of Nebraska-Lincoln is developing methods for using visual information to support discovery and analysis of digital collections.


Assessment

“Moving Forward with Digital Library Assessment” was organized around the need to strategically focus our assessment efforts in digital libraries and to better understand and measure the value, impact, and associated costs of what we do.

Community notes for this session

  • Joyce Chapman, Duke University
  • Jody DeRidder, University of Alabama
  • Nettie Lagace, National Information Standards Organization
  • Ho Jung Yoo, University of California, San Diego

Nettie Lagace: update on NISO’s altmetrics initiative.

  • The first phase exposed areas for potential standardization. The community then collectively prioritized those potential projects, and the second phase is now developing best practices. A working group has been formed, with its recommendations due in June 2016.
  • Alternative Metrics Initiative Phase 1 White Paper 

Joyce Chapman: a framework for estimating digitization costs

Jody DeRidder and Ho Jung Yoo: usability testing

  • What critical aspects need to be addressed by a community of practice?
  • What are next steps we can take as a community?

Freedom Summer 50th Anniversary

SNCC workers Lawrence Guyot and Sam Walker at COFO (Council of Federated Organizations) office in Gulfport, Mississippi. Courtesy of www.crmvet.org.

Fifty years ago, hundreds of student volunteers headed south to join the Student Nonviolent Coordinating Committee’s (SNCC) field staff and local people in their fight against white supremacy in Mississippi. This week, veterans of Freedom Summer are gathering at Tougaloo College, just north of Jackson, Mississippi, to commemorate their efforts to remake American democracy.

The 50th anniversary events, however, aren’t only for movement veterans. Students, young organizers, educators, historians, archivists, and local Mississippians make up the nearly one thousand people flocking to Tougaloo’s campus this Wednesday through Saturday. We here at Duke Libraries, as well as members of the SNCC Legacy Project Editorial Board, are in the mix, making connections with both activists and archivists about our forthcoming website, One Person, One Vote: The Legacy of SNCC and the Fight for Voting Rights.

Mississippi Freedom Summer 50th Anniversary. Courtesy of Freedom50.org.

This site will bring together material created in and around SNCC’s struggle for voting rights in the 1960s and pair it with new interpretations of that history by the movement veterans themselves. To pull this off, we’ll be drawing on Duke’s own collection of SNCC-related material, as well as incorporating the wealth of material already digitized by institutions like the University of Southern Mississippi, the Wisconsin Historical Society (its Freedom Summer Collection), and the Mississippi Department of Archives and History, among others.

What becomes clear while circling through the panels, films, and hallway conversations at Freedom Summer 50th events is how the fight for voting rights is really a story of thousands of local people. The One Person, One Vote site will feature these everyday people – Mississippians like Peggy Jean Connor, Fannie Lou Hamer, and Vernon Dahmer, and SNCC workers like Hollis Watkins, Bob Moses, and Charlie Cobb. And the list goes on. It’s not every day that so many of these people come together under one roof, and we’re doing our share of listening to and connecting with the people whose stories will make up the One Person, One Vote site.

Society of North Carolina Archivists Annual Meeting Slides

On Tuesday, April 8, I had the honor of presenting at the annual meeting of the Society of North Carolina Archivists with representatives from Wake Forest University and Davidson College. The focus of our panel was to present alternatives to CONTENTdm, a system widely used by libraries for displaying digital collections. At Duke, we have developed our own Tripod interface to digital collections. Wake Forest and Davidson use a variety of tools, most notably DSpace and Islandora (via Lyrasis), respectively. It was great to present with and learn more about the Wake Forest and Davidson programs! I’ve embedded slides from all three speakers below.





Schema.org and Google for Local Discovery: Some Key Takeaways

Over the past year and a half, among our many other projects, we have been experimenting with a creative new approach to powering searches within digital collections and finding aids using Google’s index of our structured data. My colleague Will Sexton and I have presented this idea in numerous venues, most recently and thoroughly for a recorded ASERL (Association of Southeastern Research Libraries) webinar on June 6, 2013.

We’re eager to share what we’ve learned to date and hope this new blog will make a good outlet. We’ve had some success, but have also encountered some considerable pitfalls along the way.

What We Set Out to Do

I won’t recap all the fine details of the project here, but in a nutshell, here are the problems we’ve been attempting to address:

  • Maintaining our own Solr index takes a ton of time to do right. We don’t have a ton of time.
  • Staff have noted poor relevance rank and poor support for search using non-Roman characters.
  • Our digital collections search box is actually used sparsely (in only 12% of visits).
  • External discovery (e.g., via Google) is of equal or greater importance than our local search for these “inside-out” resources.

Here’s our three-step strategy:

  1. Embed schema.org data in our HTML (using RDFa Lite)
  2. Get Google to index all of our embedded structured data
  3. Use Google’s index of our structured data to power our local search for finding aids & digital collections

Where We Are Today

We mapped several of our metadata fields to schema.org terms, then embedded that schema.org data in all 74,000 digital object pages and all 2,100 finding aids. We’re now using Google’s index of that data to power our default search for:

  1. All of our finding aids (a.k.a. collection guides).  [Example search for “photo”]
  2. One digital collection: Sidney Gamble Photographs. [Example search for “beijing”]

Though the strategy is the same, some of the implementation details are different between our finding aids and digital collections applications. Here are the main differences:

Site                | Service                                                | Google CSE API | Max Results per Query
Finding Aids        | Google Custom Search (free)                            | JS v1.0        | 100
Digital Collections | Google Site Search (premium version of Custom Search)  | XML API        | 1,000


Finding Aids Search

Embedding the Data. We kept it super simple here. We labeled every finding aid page a ‘CollectionPage’ and tagged only a few properties: name, description, creator, and, for collections with digitized content, a thumbnailUrl.

Schema.org tags using RDFa Lite in finding aid HTML
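
To illustrate, here is a minimal sketch of that kind of markup as an ERB partial. The instance variables and field names are hypothetical, but vocab, typeof, and property are the actual RDFa Lite attributes:

```erb
<%# Hypothetical ERB partial; the instance variables and fields
    are illustrative, not our production template. %>
<div vocab="http://schema.org/" typeof="CollectionPage">
  <h1 property="name"><%= @collection.title %></h1>
  <p property="description"><%= @collection.abstract %></p>
  <span property="creator"><%= @collection.creator %></span>
  <% if @collection.thumbnail_url %>
    <meta property="thumbnailUrl" content="<%= @collection.thumbnail_url %>">
  <% end %>
</div>
```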

Rendering Search Results Using Google’s Index. 

This worked great. We used a Google Custom Search Element (CSE) and created our own “rich snippets” using the CSE JavaScript API (v1.0) and the handy templating options Google provides. You can simply “View Source” to see the underlying code: it’s all there in the HTML. The HTML5 data- attributes set all the content and the display logic.

Google JavaScript objects used in search result snippet presentation.


Digital Collections Search: Sidney D. Gamble Collection

Embedding the Data.

Our digital collections introduce more complexity in the structured data than we see in our finding aids. Naturally, we have a wide range of item types with diverse metadata. We want our markup to represent the relationship of an item to its source collection. The item, the webpage that it’s on, the collection it came from, and the media files associated with it all have properties that can be expressed using schema.org terms. So, we tried it all.[1]

Example Schema.org tags used in item pages
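
A hedged ERB sketch of the nested structure described above (and detailed in footnote [1]); again, the variable names are hypothetical:

```erb
<%# Hypothetical ERB sketch of the nested item-page markup
    described in footnote [1]. %>
<div vocab="http://schema.org/" typeof="ItemPage">
  <link property="isPartOf" href="<%= collection_url(@item.collection) %>">
  <div property="about" typeof="CreativeWork">
    <h2 property="name"><%= @item.title %></h2>
    <span property="creator"><%= @item.creator %></span>
    <span property="dateCreated"><%= @item.date %></span>
    <span property="contentLocation"><%= @item.location %></span>
  </div>
</div>
```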

Rendering Search Results Using Google’s Index. 

For the Gamble collection, we succeeded in making queries hit Google’s XML API while preserving the look of our existing search results. Note that the facets on the left side aren’t powered by Google – we haven’t gotten far enough in our experiment to filter the result set based on the structured data, but that’s possible to do.

Search result rendering using Google’s XML API
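
For the curious, hitting the XML API amounts to a simple HTTP GET with a handful of query parameters. Here is a rough Ruby sketch, assuming the parameters documented for the CSE XML API; the cx engine id below is a placeholder:

```ruby
require 'net/http'
require 'uri'

# Fetch one page of results from the CSE / Site Search XML API.
# 'start' pages through results, but no query can page past the
# 1,000th hit regardless of the estimated total.
def cse_xml_search(query, start: 0, num: 20)
  uri = URI('https://www.google.com/cse')
  uri.query = URI.encode_www_form(
    cx:     '012345678901234567890:abcdefghijk',  # placeholder engine id
    client: 'google-csbe',
    output: 'xml_no_dtd',
    q:      query,
    start:  start,
    num:    num
  )
  Net::HTTP.get(uri)  # returns an XML string to parse into snippets
end

xml = cse_xml_search('beijing')
```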

Outcomes 


The Good

We’ve been pleased with the ability to make our own rich snippets and highly customize the appearance of search results without having to do a ton of development. Getting our structured data back from Google’s index to work with is an awesome service, and developing around the schema.org properties that we were already providing has been a nice way to kill two birds with one stone.

For performance, Google CSE is working well in both the finding aids and the Gamble digital collection search for these purposes:

  • getting the most relevant content presented early on in the search result
  • getting results quickly
  • handling non-Roman characters in search terms
  • retrieving a needle in a haystack — an item or handful of items that contain some unique text

The Gotchas

While Google CSE shows relevant results quickly, we’re finding it’s not a good fit for exploratory searching when either of these aspects is important:

  • getting a stable and precise count of relevant results
  • browsing an exhaustive list of results that match a general query

Be careful: queries max out at 100 results with the JavaScript APIs or 1,000 results when using the XML API.  Those limits aren’t obvious in the documentation, yet they might be a deal-breaker for some potential uses.

For queries with several pages of hits, you may get an estimated result count that’s close, but unfortunately things occasionally and inexplicably go sour as you navigate from one result page to the next. E.g., the Gamble digital collection query ‘beijing’ shows about 2,100 results (which is in the ballpark of what Solr returns), yet browse a few pages in and the result set may get truncated severely: you may only be able to actually browse about 200 of the results without issuing more specific query terms.

Other Considerations

Impact on External Discovery

Traffic to digital collections via external search engines has climbed fairly steadily every quarter for the past few years, from 26% of all visits in Jul-Sep 2011 up to 44% in Jan-Mar 2014 (to date) [2]. We entered schema.org tags in Oct 2012; however, we don’t know whether adding that data has contributed at all to this trend. Does schema.org data impact relevance? It’s hard to tell.

Structured Data Syntax + Google APIs

Though RDFa Lite and microdata should be equally acceptable ways to add schema.org tags, Google’s APIs actually work better with microdata if there are nested item types.[3]  And regardless of microdata or RDFa, the Google CSE JavaScript API unfortunately can’t access more than one value for any given property, so that can be problematic [4].

Rich Snippets in Big Google

We’re seeing Google render rich snippets for our videos, because we’ve marked them as schema.org VideoObjects with properties like thumbnailUrl. That’s encouraging! Perhaps someday Google will render better snippets for things like photographs (of which we have a bunch), or maybe even more library domain-specific materials like digitized oral histories, manuscripts, and newspapers.  But at present, none of our other objects seem to trigger nice snippets like this.

A rich snippet triggered by using the schema.org VideoObject type & thumbnailUrl property.

Footnotes

[1] We represented item pages as schema.org “ItemPage” types, using the “isPartOf” property to relate the item page to its corresponding “CollectionPage”. We made the ItemPage “about” a “CreativeWork”. Then we created mappings from many of our metadata fields to CreativeWork properties, e.g., creator, contentLocation, genre, dateCreated.

[2] Digital Collections External Search Traffic by Quarter

Quarter    Visits via Search   % Visits via Search

Jul – Sep 2011   26,621   25.97%
Oct – Dec 2011   32,191   29.59%
Jan – Mar 2012   41,048   32.16%
Apr – Jun 2012   33,872   34.49%
Jul – Sep 2012   28,250   32.40%
Oct – Dec 2012   38,472   36.52% ← entered schema.org tags Oct 19, 2012
Jan – Mar 2013   39,948   35.29%
Apr – Jun 2013   36,641   38.30%
Jul – Sep 2013   35,058   41.88%
Oct – Dec 2013   46,082   43.98%
Jan – Mar 2014   47,123   43.93%

[3] For example, if your RDFa indicates that “an ItemPage is about a CreativeWork whose creator is Sidney Gamble”– the creator of the creative work is not accessible to the API since the CreativeWork is not a top-level item.  To get around that, we had to duplicate all the CreativeWork properties in the HTML <head>, which is unnatural and a bit of a hack.

[4]  Google’s CSE JS APIs also don’t let us retrieve the data when there are multiple values specified for the same field. For a given CreativeWork, we might have six locations that are all important to represent: China; Beijing (China); Huabei xie he nu zi da xue (Beijing, China); 中国; 北京;  华北协和女子大学.  The JSON returned by the API only contains the first value: ‘China’. This, plus the result count limit, made the XML API our only viable choice for digital collections.