Category Archives: Technology

A collaborative approach to developing a new Duke Libraries catalog

Post contributed by: Emily Daly, Thomas Crichlow, and Cory Lown

If you’re a frequent or even casual user of the Duke Libraries catalog, you’ve probably noticed that it’s remained remarkably consistent over the last decade. Consistency can be a good thing, but there is certainly room for improvement in the Duke Libraries catalog, and staff from the libraries at Duke, UNC, and NCSU are excited to replace the current catalog’s aging infrastructure and outdated user interface with an entirely new collaboratively developed open-source discovery layer. While many things are changing, one key feature will remain the same: The catalog will continue to allow users to locate and access materials not only here at Duke but also across the other Triangle Research Libraries member libraries (NCSU, NCCU, UNC).

Users will be able to search for items in the Duke Libraries catalog and then expand to see books and items from NCSU, NCCU, and UNC if they wish.

Commitment to collaboration

In addition to an entirely new central index that supports institutional and consortial searching, the new catalog benefits from a shared, centrally developed codebase as well as locally hosted, customizable catalog interfaces. Perhaps most notably, the new catalog has been built with the needs of libraries and complex bibliographic data in mind. While the software used for the current library catalog has evolved and grown in complexity to support e-commerce and business needs (not higher ed or library needs), the library software development community has been hard at work building specialized discovery layers using the open-source Blacklight framework. Peer institutions including Stanford, Cornell, and Princeton are already using Blacklight for their library catalogs, and there is an active Blacklight development community that Duke is excited to be a part of. Being part of this community enables us to build on the good work already in place in other library catalogs, including more intuitive facets, adaptive linking for subjects and other fields, a more responsive user interface for access via tablets and phones, and the ability to preserve the order of MARC fields when it’s useful to researchers (MARC is an international standard for representing bibliographic and related data).

We’re upping our collaboration game locally, too: This project has given us the opportunity to develop a new model for collaborative software development. Rather than reinvent the wheel at each Triangle Research Library, we’re combining effort and expertise to develop a feature-rich yet highly customizable discovery layer that will serve the needs of researchers across the triangle. To do this, we have adopted an agile project management process with talented developers and dedicated product owners from NCSU, UNC, and Duke. The agile approach has helped us be more productive and efficient during the development phase and increased collaboration across the four Triangle Research Libraries, positioning us well for maintaining and governing the catalog after we go live.

This image depicts the structure of the development team that was formed in May 2017 to collaboratively build the new library catalog.

What’s next?

The development team has already conducted multiple rounds of user testing and made changes to the user interface based on findings. We’re now ready to hear feedback from library staff. To facilitate this, we’ll be launching the Duke instance of the catalog to all library staff next Wednesday, August 1. We encourage staff to explore catalog features and records and then report feedback, providing screenshots, URLs, and other details as needed. We’ll continue user testing this fall and solicit extensive feedback from faculty, students, staff, and general researchers.

Our plan (fingers crossed!) is to officially launch the new Duke Libraries catalog to all users in early 2019, perhaps as soon as the start of the spring semester. A local implementation team is already at work to be sure we’re ready to replace Duke’s old catalog with the new and improved version early next year. Meanwhile, development and interface enhancement of the catalog will continue this fall. While we are pleased with what we’ve accomplished over the last 18 months, there is still significant work to be done before we’ll be ready to go live. Here are a few items on the lengthy TO DO list:

  • finish loading the 16 million records from all four Triangle Research Libraries
  • integrate Duke’s request workflows so users can request items they discover in the new catalog
  • develop a robust Advanced Search interface in response to user demand
  • tune relevance ranking
  • ensure that non-Roman scripts are searchable and display correctly
  • map non-MARC metadata so items such as digital collections records are discoverable

Effective search and display of non-Roman scripts is just one of the many items left on our list before we launch the library catalog to the public.

There is a lot of work ahead to be sure, but what we will launch to staff next week is a functional catalog with nearly 10 million records, and that number is increasing by the day. We invite you to take the new catalog for a spin and tell us what you think so we can make improvements and be ready for all researchers in just a few short months.

Revitalizing DSpace at Duke

Near the tail end of 2017, the Duke Libraries committed to a major multi-version upgrade for DukeSpace (powered by the open-source repository platform DSpace), and assembled an Avengers-like team to combine its members’ complementary powers to conquer it together.  The team persisted through several setbacks and ultimately prevailed in its mission. The new site launched successfully in March 2018.

That same team is now back for a sequel, collaborating to tackle additional issues around system integrations, statistics/reporting, citations, and platform maintenance. Phase II of the project will wrap up this summer.

I’d like to share a bit more about the DSpace upgrade project, beginning with some background on why it’s important and where the platform fits into the larger picture at Duke. Then I’ll share more about the areas to which we have devoted the most developer time and attention over the past several months.   Some of the development efforts were required to make DSpace 6 viable at all for Duke’s ongoing needs. Other efforts have been to strengthen connections between DukeSpace and other platforms.  We have also been enhancing several parts of the user interface to optimize its usability and visual appeal.

DSpace at Duke: What’s in It?

Duke began using DSpace around 2006 as a solution for Duke University Archives to collect and preserve electronic theses and dissertations (ETDs). In 2010, the university adopted an Open Access policy for articles authored by Duke faculty, and DukeSpace became the host platform to make these articles accessible under the policy. These two groups of materials represent the vast majority of the 15,000+ items currently in the platform. Ensuring long-term preservation, discovery, and access to these items is central to the library’s mission.

Integrations With Other Systems

DukeSpace is one of three key technology platforms working in concert to support scholarly communications at Duke. The other two are the proprietary Research Information Management System Symplectic Elements, and the open-source research networking tool VIVO (branded as Scholars@Duke). Here’s a diagram illustrating how the platforms work together, created by my colleague Paolo Mangiafico:

Credit: Paolo Mangiafico


In a nutshell, DSpace plays a critical role in Duke University scholars’ ability to have their research easily discovered, accessed, and used.

  • Faculty use Elements to manage information about their scholarly publications. That information is pulled neatly into Scholars@Duke, which presents for each scholar an authoritative profile that also includes contact info, courses taught, news stories in which they’re mentioned, and more.
  • The Scholars@Duke profile has an SEO-friendly URL, and the data from it is portable: it can be dynamically displayed anywhere else on the web (e.g., departmental websites).
  • Elements is also the place where faculty submit the open access copies of their articles; Elements in turn deposits those files and their metadata to DSpace. Faculty don’t encounter DSpace at all in the process of submitting their work.
  • Publications listed in a Scholars@Duke profile automatically include a link to the published version (which is often behind a paywall), and a link to the open access copy in DSpace (which is globally accessible).

Upgrading DSpace: Ripple Effects

The following diagram expands upon the previous one. It adds boxes to the right to account for ETDs and other materials deposited to DSpace either by batch import mechanisms or directly via the application’s web input forms. In a vacuum, a DSpace upgrade (complex as that is in its own right) would be just the green box. But as part of an array of systems working together, the upgrade meant ripping out and replacing so much more. Each white star on the diagram represents a component that had to be thoroughly investigated and completely redone for this upgrade to succeed.

One of the most complicated factors in the upgrade effort was the bidirectional arrow marked “RT2”:  Symplectic’s new Repository Tools 2 connector. Like its predecessor RT1, it facilitates the deposit of files and metadata from Elements into DSpace (but now via different mechanisms). Unlike RT1, RT2 also permits harvesting files and metadata from DSpace back into Elements, even for items that weren’t originally deposited via Elements.  The biggest challenges there:

  • Divergent metadata architecture. DukeSpace and Elements employ over 60 metadata fields apiece (and they are not the same).
  • Crosswalks. The syntax for munging/mapping data elements from Elements to DSpace (and vice versa) is esoteric, new, and a moving target.
  • Legacy/inconsistent data. DukeSpace metadata had not previously been analyzed or curated in the 12 years it had been collected.
  • Newness. Duke is likely the first institution to integrate DSpace 6.x & Elements via RT2, so a lot had to be figured out through trial & error.

Kudos to superhero metadata architect Maggie Dickson for tackling all of these challenges head-on.

User Interface Enhancements in Action

There are over 2,000 DSpace instances in the world. Most implementors haven’t done much to customize the out-of-the-box templates, which look something like this for an item page:

DSpace interface out of the box. From http://demo.dspace.org/xmlui/

The UI framework itself is outdated (driven via XSLT 1.0 through Cocoon XML pipelines), which makes it hard for anyone to revise substantially. It’s a bit like trying to whittle a block of wood into something ornate using a really blunt instrument. The DSpace community is indeed working on addressing that for DSpace 7.0, but we didn’t have the luxury to wait. So we started with the vanilla template and chipped away at it, one piece at a time. These screenshots highlight the main areas we have been able to address so far.

Bootstrap / Bootswatch Theme

We layered on the same adapted Bootswatch theme in use by the Duke Libraries’ Drupal website and Duke Digital Repository, then applied the shared library masthead. This gives DukeSpace a fairly common look and feel with the rest of the library’s web presence.

Images, Icons, and Filesizes

We configured DSpace to generate and display thumbnail images for all items. Then we added icons corresponding to MIME types to help distinguish different kinds of files. We added really prominent indicators showing when an item is embargoed (and when it will become available), and also revised the filesize display to be clearer and more concise.

Usage & Attention Stats

Out of the box, DSpace item statistics are only available by clicking a link on the item page to go to a separate stats page. We figured out how to tap into the Solr statistics core and transform that data to display item views and file downloads directly in the item sidebar for easier access. We were also successful in showing an Altmetric donut badge for any article with a DOI. Together, these features provide a clear indication on the item page of how much impact a work has made.
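For the curious, the view count boils down to a single query against DSpace’s Solr statistics core. Here is a minimal sketch, assuming DSpace 6 conventions; the core URL and the placeholder item UUID are illustrative, and field names can vary by DSpace version and configuration:

Query

http://localhost:8983/solr/statistics/select?q=type:2 AND id:<item-uuid> AND statistics_type:view AND isBot:false&rows=0

The numFound value in the response is the item’s view count (type:2 restricts results to item-level events, and isBot:false filters out crawler traffic). A similar query using type:0 and owningItem:<item-uuid> counts file downloads.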

Rights

We added a lookup from the item page to retrieve the parent collection’s rights statement, which may contain a statement about Open Access, a Creative Commons license, or other explanatory text. This will hopefully assert rights information in a more natural spot for a user to see it, while at the same time drawing more attention to Duke’s Open Access policy.

Scholars@Duke Profiles & ORCID Links

For any DukeSpace item author with a Scholars@Duke profile, we now display a clickable icon next to their name. This leads to their Scholars@Duke profile, where a visitor can learn much more about the scholar’s background, affiliations, and other research. Making this connection relies on some complicated parts: 1) getting Duke IDs, either automatically from Elements or manually via direct entry; 2) storing the ID in a DSpace field; 3) using the ID to query a VIVO API to retrieve the Scholars@Duke profile URL. We are able to treat a scholar’s ORCID in a similar fashion.

Other Development Areas

Beyond the public-facing UI, these areas in DSpace 6.2 also needed significant development for the upgrade project to succeed:

  • Fixed several bugs related to batch metadata import/export
  • Developed a mechanism to create user accounts via batch operations
  • Modified features related to authority control for metadata values

Coming Soon

By summer 2018, we aim to have the following in place:

Streamlined Sidebar

Add collapsible/expandable facet and browse options to reduce the number of menu links visible at any given time.

Citations

Present a copyable citation on the item page.


…And More!

  • Upgrade the XSLT processor from Xalan to Saxon, using XSLT 3.0; this will enable us to accomplish more with less code going forward
  • Revise the Scholars@Duke profile lookup by using a different VIVO API
  • Create additional browse/facet options
  • Display aggregated stats in more places

We’re excited to get all of these changes in place soon. And we look forward to learning more from our users, our collaborators, and our peers in the DSpace community about what we can do next to improve upon the solid foundation we established during the project’s initial phases.

Shiny New Chrome!

Chrome bumper and grill

In 2008, Google released their free web browser, Chrome.  Its improved speed and features led to quick adoption by users, and by the middle of 2012, Chrome had become the world’s most popular browser. Recent data puts it at over 55% market share [StatCounter].

As smartphones and tablets took off, Google decided to build an “operating system free” computer based around the Chrome browser – the first official Chromebook launched in mid-2011.  The idea was that since everyone is doing their work on the web anyway (assuming your work==Google Docs), there wasn’t a need for most users to have a “full” operating system – especially since full operating systems require maintenance patches and security updates.  Their price point didn’t hurt either – while some models now top out over $1000, many Chromebooks come in under $300.

We purchased one of the cheaper models recently to do some testing and see if it might work for any DUL use-cases.  The specific model was an Acer Chromebook 14, priced at $250.  It has a 14” screen at full HD resolution, a metal body to protect against bumps and bruises, and it promises up to 12 hours of battery life.  Where we’d usually look at CPU and memory specs, these tend to be less important on a Chromebook — you’re basically just surfing the web, so you shouldn’t need a high-end (pricey) CPU nor a lot of memory.  At least that’s the theory.

But what can it do?

Basic websurfing, check!  Google Docs, check!  Mail.duke.edu for work-email, check!  Duke.box.com, check!  LibGuides, LibCal, Basecamp, Jira, Slack, Evernote … check!

LastPass even works to hold all the highly complex, fully secure passwords that you use on all those sites (you do use complex passwords, don’t you?).

Not surprisingly, if you do a lot of your day-to-day work inside a browser, then a Chromebook can easily handle that.  For a lot of office workers, a Chromebook may very well get the job done – sitting in a meeting, typing notes into Evernote; checking email while you’re waiting for a meeting; popping into Slack to send someone a quick note.  All those work perfectly fine.

What about the non-web stuff I do?

Microsoft Word and Excel, well, kinda sorta.  You can upload them to Google Docs and then access them through the usual Google Docs web interface.  Of course, you can then share them as Google Docs with other people, but to get them back into “real” Microsoft Word requires an extra step.

Aleph, umm, no.  SAP for your budgets, umm, no. Those apps simply won’t run on the ChromeOS.  At least not directly.

But just as many of you currently “remote” into your work computer from home, you _can_ use a Chromebook to “remote” into other machines, including “virtual” machines that we can set up to run standard Windows applications.  There’s an extra step or two in the process to reserve a remote system and connect to it.  But if you’re in a job where just a small amount of your work needs “real” Windows applications, there still might be some opportunity to leverage Chromebooks as a cheaper alternative to a laptop.

Final Thoughts:

I’m curious to see where (or whether) Chromebooks might fit into the DUL technology landscape.  Their price is certainly budget-friendly, and since Google automatically updates and patches them, they could reduce IT staff effort.  But there are clearly issues we need to investigate.  Some of them seem solvable, at least technically.  But it’s not clear that the solution will be usable in day-to-day work.

If you’re interested in trying one out, please contact me!


The Backbone of the Library, the Library Catalog

Did you ever stop to think about how the materials you find in the Library’s catalog search get there?  Did you know the Duke Libraries have three staff members dedicated to making sure Duke’s library catalog is working so faculty and students can do their research? The library catalog is the backbone of the library, and I hope by the end of this post you will have a new appreciation for some of the people who support it and what that work involves.

Functions of a library catalog

Discovery Services is charged with supporting the integrated library system (ILS), aka “the catalog”. What is an “integrated library system”?  According to Wikipedia, “an ILS (is used) to order and acquire, receive and invoice, catalog, circulate, track and shelve materials.” Our software is used by every staff person in all the Duke Libraries, including the professional school libraries (the Goodson Law Library, the Ford Library at the Fuqua School of Business, and the Medical Center Library) and the Duke Kunshan University Library. At Duke, we have been using Ex Libris’s Aleph as our ILS since 2004.

Discovery Services staff work with staff in Technical Services, who do the acquiring, receiving, invoicing, and cataloging of materials. Our support for that department includes setting up vendors who send orders and bibliographic records via the EDIFACT format or the MARC format. Some of our catalogers do original cataloging, where they describe the book in the MARC format, and a great many of our records are copy cataloged from OCLC. Our ILS needs to be able to load these records, regardless of format, into our relational database.

We work with staff in Access and Delivery Services/Circulation in all the libraries to set up loan policies so that patrons may borrow the materials in our database. All loan policies are based on the patron type checking out the item, the library that owns the item, and the item’s type. We currently have 59 item types for everything from books, to short-term loans, sound CDs, and even 3D scanners! There are 37 patron types ranging from faculty, grad students, staff, undergrads, and alumni to even retired library employees. And we support a total of 12 libraries. Combine all of those patron types, item types, and libraries, and there are potentially 59 × 37 × 12 = 26,196 combinations to account for: a lot of rules! We edit policies for who may request an item and where they can choose to pick it up, when fines are applied, and when overdue and lost notices are sent to patrons. We also load the current course lists and enrollment so students and faculty can use the materials in Course Reserves.

ILS Connections

The ILS is connected with other systems. There was a recent post here on Bitstreams about the work of the Discovery Strategy Team. Our ILS, Aleph, is represented in both the whiteboard photo and the Lucidchart photo. One example of an integration point is the Library’s discovery interface. We also connect to the software that is used at the Library Service Center (GFA). When an item is requested from that location, the request is sent from the ILS to the software at the Library Service Center so staff there can pull and deliver the item. The ILS is also integrated with software outside of the library’s support, including the Bursar’s Office, the University’s Identity Management system, and the University’s accounting system.


We also export our data for projects in which the library is involved, such as HathiTrust, Ivy Plus, TRLN Discovery (coming soon!), and SHARE-VDE. These shared collection projects often require extra work from Discovery Services to make sure the data the project wants is included in our export.

Discovery Services spent the fall semester working on upgrading Aleph. We worked with our OIT partners to create new virtual servers, install the Aleph software, and upgrade our current data to the new version. There were many configuration changes, and we needed to test all of our custom programs to be sure they worked with the new version. We have been using the Aleph software for more than a decade, and while we’ve upgraded the software over the years, libraries have continued to change.


We are currently preparing a project to migrate to a new ILS and library services platform, FOLIO. That means moving our eight million bibliographic records and associated information, our two million patron records, and hundreds of thousands of orders, items, and e-resources into the new data format FOLIO will require. We will build new servers, install the software, and review and/or recreate all of the custom programs we currently use. We will integrate FOLIO with all the applications the library uses, as well as applications across campus. It will be a multi-year project that will take thousands of hours of staff time to complete. Discovery Services staff are involved in several of the FOLIO special interest groups, working with colleagues across the world to develop the platform.

We work hard to make it easy for our patrons to find library material, request it or borrow it. The next time you check out a book from the library, take a moment to think about all the work that was required behind the scenes to make that book available to you.

Living Our Best DSpace Lives

Last week, an indefatigable team at Duke University Libraries released an upgraded version of the DukeSpace platform, completing  the first phase of the critical project that I wrote about in this space in January.  One member of the team remarked that we now surely have “one of the best DSpaces in the world,” and I dare anyone to prove otherwise.

DukeSpace serves as the Libraries’ open-access institutional repository, which makes it a key aspect of our mission to “partner in research,” as outlined in our strategic plan.  As I wrote in January, the version of the DSpace platform that underlies the service had been stuck at 1.7, which was released during 2010 – the year the iPad came out, and Lady Gaga wore a meat dress. We upgraded to version 6.2, though the differences between the two versions are so great that it would be more accurate to call the project a migration.

That migration turned out to be one of the more complex technology projects we’ve undertaken over the years. The main complicating factor was the integration with Symplectic Elements, the Research Information Management System (RIMS) that powers the Scholars at Duke site. As far as we know, we are the first institution to integrate Elements with DSpace 6.2. It was a beast to do, and we are happy to share our knowledge gained if it will help any of our peers out there trying to do the same thing.

Meanwhile, feel free to click on over and enjoy one of the best DSpaces in the world. And congratulations to one of the mightiest teams assembled since Spain won the World Cup!

Mapping Duke University Libraries’ Discovery System Environment

Just over one year ago, Duke University Libraries’ Web Experience team charged a new subgroup – the Discovery Strategy Team – with “providing cohesion for the Libraries’ discovery environment and facilitat[ing] discussion and activity across the units responsible for the various systems and policies that support discovery for DUL users”. Jacquie Samples, head of the Metadata and Discovery Strategy Department in our Technical Services Unit, and I teamed up to co-chair the group, and we were excited to take on this critical work along with eight of our colleagues from across the libraries.

Our first task was one that had long been recognized as a need by many people throughout the library – to create an up-to-date visualization of the systems that underpin DUL’s discovery environment, including the data sources, data flows, connections, and technical/functional ownership for each of these systems. Our goal was not to depict an ideal discovery landscape but rather to depict things as they are now (ideal could come later).

Before we could create a visualization of these systems and how they interacted, however, we realized we needed to identify what they were! This part of the process involved creating a giant laundry list of all of the systems in the form of a Google spreadsheet, so we could work on it collaboratively and iteratively. This spreadsheet became the foundation of the document we eventually produced, containing contextual information about the systems including:

  • Name(s) of the system
  • Description/Notes
  • Host
  • Path
  • Links to documentation
  • Technical & functional owners

Once we had our list of systems to work from, we began the process of visualizing how they work here at DUL. Each meeting of the team involved doing a lot of drawing on the whiteboard as we hashed out how a given system works – how staff & other systems interact with it, whether processes are automated or not, frequency of those processes, among other attributes. At the end of these meetings we would have a messy whiteboard drawing like this one:

We were very lucky to have the talented (and patient!) developer and designer Michael Daul on the team for this project, and his role was to take our whiteboard drawings and turn them into beautiful, legible visualizations using Lucidchart:

Once we had created visualizations that represented all of the systems in our spreadsheet, and shared them with stakeholders for feedback, we (ahem, Michael) compiled them into an interactive PDF using Adobe InDesign. We originally had high hopes of creating a super cool interactive and zoomable website where you could move in and out to create dynamic views of the visualizations, but ultimately realized this wouldn’t be easily updatable or sustainable. So, PDF it is, which may not be the fanciest of vehicles but is certainly easily consumed.

We’ve titled our document “Networked Discovery Systems at DUL”, and it contains two main sections: the visualizations that graphically depict the systems, and documentation derived from the spreadsheet we created to provide more information and context for each system. Users can click from a high-level view of the discovery system universe to documentation pages, to granular views of particular ‘constellations’ of systems. Anyone interested in checking it out can download it from this link.

We’ve identified a number of potential use cases for this documentation, and hope that others will surface:

  • New staff orientation
  • Systems transparency
  • Improved communication
  • Planning
  • Troubleshooting

We’re going to keep iterating and updating the PDF as our discovery environment shifts and changes, and hope that having this documentation will help us to identify areas for improvement and get us closer to achieving that ideal discovery environment.

Fun with Solr Queries

Apache Solr is behind many of our systems that provide a way to search and browse via a web application (such as the Duke Digital Repository, parts of our Bento search application, and the not yet public next generation TRLN Discovery catalog). It’s a tool for indexing data and provides a powerful query API. In this post I will document a few Solr querying techniques that might be interesting or useful. In some cases I won’t be able to provide live links to queries because we restrict direct access to Solr. However, many of these Solr querying techniques can be used directly in an application’s search box. In those cases, I will include live links to example queries in the Duke Digital Repository.

Find a list of items from their identifiers.

With this query you can specify exactly what items you want to appear in a search result from a list of identifiers.

Query

id:"duke:448098" OR id:"duke:282429" OR id:"duke:142581"
Try it in the Duke Digital Repository

Find all records that have a value (any value) in a specific field.

This query will find all the items in the repository that have a value in the product field. (As with most of these queries, you must know the field name in Solr.)

Query

product_tesim:*
Try it in the Duke Digital Repository

Find all the items in the repository that are missing a field value.

You can find all items in the repository that don’t have any date metadata. Inquiring minds want to know.

Query

-date_tesim:[* TO *]
Try it in the Duke Digital Repository

Find items using a begins-with (left-anchored) query.

I want to see all items that have a subject term that begins with “Soviet Union.” The example is a left-anchored query and will exactly match fields that begin with “Soviet Union.” (Note, the field must not be tokenized for this to work as expected.)

Query

subject_facet_sim:/Soviet Union.*/
Try it in the Duke Digital Repository

Find items with an ends-with (right-anchored) query.

Again, this will only work as expected with an untokenized field.

Query

subject_facet_sim:/.*20th century/
Try it in the Duke Digital Repository

Some of you might have noticed that these queries look a lot like regular expressions. And you’re right! Read more about Solr’s support for regular expression queries.

The following examples require direct access to Solr, which is restricted to authorized users and applications. Instead of providing live links, I’ll show the basic syntax, a complete example query using http://localhost:8983/solr/core/ as the sample URL for a Solr index, and a sample response from Solr.

Count instances of values in a field.

I want to know how many items in the repository have a workflow state of published and how many are unpublished. To do that I can write a facet query that will count instances of each value in the specified field. (This is another query that will only work as expected with an untokenized field.)

Query

http://localhost:8983/solr/core/select?q=*:*&facet=true&facet.field=workflow_state_ssi&facet.mincount=1&fl=id

Solr Response (truncated)


...
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="workflow_state_ssi">
<int name="published">484075</int>
<int name="unpublished">2228</int>
</lst>
</lst>
</lst>
...

Collapse multiple records into one result based on a shared field value.

This one is somewhat advanced and likely only useful in particular circumstances. But if you had multiple records that were slight variants of each other, and wanted to collapse each variant down to a single result, you can do that with a collapse query — as long as the records you want to collapse share a value.

Query

http://localhost:8983/solr/core/select?q=*:*&fq={!collapse%20field=oclc_number%20nullPolicy=expand%20max=termfreq(institution_f,duke)}

  • !collapse instructs Solr to use the Collapsing Query Parser.
  • field=oclc_number instructs Solr to collapse records that share the same value in the oclc_number field.
  • nullPolicy=expand instructs Solr to return any document without a value in the oclc_number field as part of the result set. If this is excluded, then records that have no oclc_number at all will be excluded from the results.
  • max=termfreq(institution_f,duke) instructs Solr to select as the representative record, when collapsing multiple records, the one with the highest frequency of the value “duke” in the institution_f field.

CSV response writer (or JSON, Ruby, etc.)

Solr has a number of tricks up its sleeve when it comes to returning results. By default it will return results as XML. You can also specify JSON, or Ruby. You specify a response writer by adding the wt parameter to the URL (wt=json or wt=ruby, etc.).

Solr will also return results as a CSV file, which can then be opened in an Excel spreadsheet — a useful feature for working with metadata.

Query

http://localhost:8983/solr/core/select?q=sun&wt=csv&fl=id,title_tesim

Solr Response

id,title_tesim
duke:194006,Sun Bowl...Sun City...
duke:194002,Sun Bowl...Sun City...
duke:194009,Sun Bowl...Sun City.
duke:194019,Sun Bowl...Sun City.
duke:194037,"Sun City\, Sun Bowl"
duke:194036,"Sun City\, Sun Bowl"
duke:194030,Sun City
duke:194073,Sun City
duke:335601,Sun Control
duke:355105,Proved! Fast starts at 30° below zero!

This is just a small sample of useful ways you can query Solr.

Adventures in 4K

When it comes to moving image digitization, Duke Libraries’ Digital Production Center primarily deals with obsolete videotape formats like U-matic, Betacam, VHS and DV, which are in standard-definition (SD). We typically don’t work with high-definition (HD) or ultra-high-definition (UHD) video because that is usually “born digital,” and doesn’t need any kind of conversion from analog, or real-time migration from magnetic tape. It’s already in the form of a digital file.

However, when I’m not at Duke, I do like to watch TV at home, in high-definition. This past Christmas, the television in my living room decided to kick the bucket, so I set out to get a new one. I went to my local Best Buy and a few other stores, to check out all the latest and greatest TVs. The first thing I noticed is that just about every TV on the market now features 4K ultra-high-definition (UHD), and many have high dynamic range (HDR).

Before we dive into 4K, some history is in order. Traditional, standard-definition televisions offered 480 lines of vertical resolution, with a 4:3 aspect ratio, meaning the height of the image display is 3/4 of the width. This is how television was broadcast for most of the 20th century. Full HD television, which gained popularity at the turn of the millennium, has 1080 pixels of vertical resolution (over twice as much as SD) and an aspect ratio of 16:9, which makes the height barely more than 1/2 the width.

16:9 more closely resembles the proportions of a movie theater screen, and this change in TV specification helped to usher in the “home theater” era. Once 16:9 HD TVs became popular, the emergence of Blu-ray discs and players allowed consumers to rent or purchase movies, watch them in full HD and hear them in theater-like high fidelity, by adding 5.1 surround sound speakers and subwoofers. Those who could afford it started converting their basements and spare rooms into small movie theaters.


The next step in the television evolution was 4K ultra-high-definition (UHD) TVs, which have flooded big box stores in recent years. 4K UHD has an astounding resolution of 3840 horizontal pixels and 2160 vertical pixels: twice the vertical resolution of full HD, and four and a half times that of SD. Gazing at the images on these 4K TVs in that Best Buy was pretty disorienting. The image is so sharp and finely detailed that it’s almost too much for your eyes and brain to process.
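To put numbers behind those claims (using the common 720 × 480 digital SD frame):

720 × 480 = 345,600 pixels (SD)
1920 × 1080 = 2,073,600 pixels (full HD)
3840 × 2160 = 8,294,400 pixels (4K UHD)

So while 4K doubles the vertical resolution of full HD, it quadruples the total pixel count, and it carries roughly 24 times the pixels of SD.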

For example, looking at footage of a mountain range in 4K UHD feels like you’re seeing more detail than you would if you were actually looking at the same mountain range in person, with your naked eye. And high dynamic range (HDR) increases this effect, by offering a much larger palette of colors and more levels of subtle gradation from light to dark. The latter allows for more detail in the highlight and shadow areas of the image. The 4K experience is a textbook example of hyperreality, which is rapidly encroaching into every aspect of our modern lives, from entertainment to politics.

The next thing that dawned on me was: If I get a 4K TV, where am I going to get the 4K content? No television stations or cable channels are broadcasting in 4K, and my old Blu-ray player doesn’t play 4K. Fortunately, all 4K TVs will also display 1080p HD content beautifully, so that warmed me up to the purchase. It meant I didn’t have to immediately replace my Blu-ray player, or just stare at a black screen night after night, waiting for my favorite TV stations to catch up with the new technology.

The salesperson who was helping me pointed out that Best Buy also sells 4K UHD Blu-ray discs and 4K-ready Blu-ray players, and that some content providers, like Netflix, are streaming many shows in 4K and in HDR, like “Stranger Things,” “Daredevil” and “The Punisher,” to name a few. So I went ahead with the purchase and brought home my new 4K TV. I also picked up a 4K-enabled Roku, which allows anyone with a fast internet connection and a subscription to stream content from Netflix, Amazon and Hulu, as well as access most cable-TV channels via services like DirecTV Now, YouTube TV, Sling and Hulu.

I connected the new TV (a 55” Sony X800E) to my 4K Roku, ethernet, HD antenna and stereo system and sat down to watch. The 1080p broadcasts from the local HD stations looked and sounded great, and so did my favorite 1080p shows streaming from Netflix. I went with a larger TV than I had previously, so that was also a big improvement.

To get the true 4K HDR experience, I upgraded my Netflix account to the 4K-capable version, and started watching the new Marvel series, “The Punisher.” It didn’t look quite as razor sharp as the 4K images did in Best Buy, but that’s likely because the 4K Netflix content is more compressed for streaming, whereas the TVs on the sales floor are playing in-house 4K video that has very little, if any, compression.

As a test, I went back and forth between watching The Punisher in 4K UHD, and watching the same Punisher episodes in HD, using an additional, older Roku through a separate HDMI port. The 4K version did have a lot more detail than its HD counterpart, but it was also grainier, with horizons of clear skies showing additional noise, as if the 4K technology were trying too hard to bring detail out of something that is inherently a flat plane of the same color.

Also, because of the high dynamic range, the image loses a bit of overall contrast when displaying so many subtle gradations between dark and light. 4K streaming also requires a fast internet connection and it downloads a lot of data, so if you want to go 4K, you may need to upgrade your ISP plan, and make sure there are no data caps. I have a 300 Mbps fiber connection, with ethernet cable routed to my TV, and that works perfectly when I’m streaming 4K content.

I have yet to buy a 4K Blu-ray player and try out a 4K Blu-ray disc, so I don’t know how that will look on my new TV, but from what I’ve read, it takes fuller advantage of the 4K data than streaming 4K does. One reason I’m reluctant to buy a 4K Blu-ray player gets back to content. Almost all the 4K Blu-ray discs for sale or rent now are recently made Hollywood movies. If I’m going to buy a 4K Blu-ray player, I want to watch classics like “2001: A Space Odyssey,” “The Godfather,” “Apocalypse Now” and “Vertigo” in 4K, but those aren’t currently available because the studios have yet to release them in 4K. This requires going back to the original film stock and painstakingly digitizing and restoring them in 4K.

Some older films may not have enough inherent resolution to take full advantage of 4K, but it seems like films such as “2001: A Space Odyssey,” which was originally shot in 65 mm, would really be enhanced by a 4K restoration. Filmmakers and the entertainment industry are already experimenting with 8K and 16K technology, so I guess my 4K TV will be obsolete in a few years, and we’ll all be having seizures while watching TV, because our brains will no longer be able to handle the amount of data flooding our senses.

Prepare yourself for 8K and 16K video.


Interactive Transcripts have Arrived!


This week Duke Digital Collections added our first set of interactive transcripts to one of our newest digital collections: the Silent Vigil (1968) and Allen Building Takeover (1969) collection of audio recordings.   This marks an exciting milestone in the accessibility efforts Duke University Libraries has been engaged in for the past 2.5 years. Last October, my colleague Sean wrote about our new accessibility features and the technology powering them, and today I’m going to tell you a little more about why we started these efforts as well as share some examples.

Interactive Transcript in the Silent Vigil (1968) and Allen Building Takeover (1969) Audio Recordings

Providing access to captions and transcripts is not new for digital collections.  We have been able to provide access to PDF transcripts and captions, both in digital collections and finding aids, for years. See items from the Behind the Veil and Memory Project digital collections for examples.

In recent years, however, we stepped up our efforts to create captions and transcripts. Our work began in response to a 2015 lawsuit brought against Harvard and MIT by the National Association of the Deaf. The lawsuit triggered many discussions in the library, and the Advisory Council for Digital Collections eventually decided that we would proactively create captions or transcripts for all new A/V digital collections, assuming it is feasible and reasonable to do so.  The feasible and reasonable part of our policy is key.  The Radio Haiti collection, for example, is composed of thousands of recordings primarily in Haitian Creole and French. The cost of transcribing that volume of non-English material makes doing so neither reasonable nor feasible. In addition to our work in the library, Duke has established campus-wide web accessibility guidelines that include captioning and transcription.  Our work in digital collections is therefore only one aspect of campus-wide accessibility efforts.

To create transcripts and captions, we have partnered with several vendors since 2015, and we have seen the costs for these services drop dramatically.  Our primary vendor right now is Rev, who also works with Duke’s Academic Media Services department.  Rev guarantees 99% accurate captions or transcripts for $1/minute.

Early on, Duke Digital Collections decided to center our captioning efforts around the WebVTT format, which is a time-coded, text-based format and a W3C standard.  We use it for both audio and video captions when possible, but we can also accommodate legacy transcript formats like PDFs.  Transcripts and captions can be easily replaced with new versions if and when edits need to be made.

Examples from the Silent Vigil (1968) and Allen Building Takeover (1969) Audio Recordings

When WebVTT captions are present, they load in the interface as an interactive transcript.  This transcript can be used for navigation purposes; click the text and the file moves to that portion of the recording.

Click the image above to see the full item and transcript.

In addition to providing access to transcripts on the screen, we offer downloadable versions of the transcript as a text file, a PDF, or the original WebVTT format.

An advantage of the WebVTT format is that it includes “v” (voice) tags, which can be used to mark changes in speaker, and names can even be added to the transcript.  This can require additional manual work if the speakers’ names are not obvious to the vendor, but we are excited to have this capability. A minimal example appears below.
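Here is a short, hypothetical WebVTT snippet showing voice tags in action; the timings, text, and speaker names are invented for illustration:

WEBVTT

00:00:01.000 --> 00:00:04.500
<v Moderator>Welcome, everyone. We’re recording this session.

00:00:05.000 --> 00:00:09.000
<v Student Speaker>We gathered on the quad just after noon.

Each cue pairs a start and end time with its text, and the <v Name> tag identifies who is speaking; our interface uses the timings to make the transcript clickable and the voice tags to label speakers.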

As Sean described in his blog post, we can also provide access to legacy PDF documents.  They cannot be rendered into an interactive version, but they are still accessible for download.

On a related note, we also have a new feature that links time codes listed in the description metadata field of an item to the corresponding portion of the audio or video file.  This enables librarians to describe specific segments of audio and/or video items.  The Radio Haiti digital collection is the first to utilize this feature, but the feature will be a huge benefit to the H. Lee Waters and Chapel Recordings digital collections as well as many others.

Click the image above to interact with linked time codes.

As mentioned at the top of this post, the Duke Vigil and Allen Building Takeover collection includes our first batch of interactive transcripts.  We plan to launch more this Spring, so stay tuned!!

A New & Improved Rubenstein Library Website

We kicked off the spring 2018 semester by rolling out a brand-new design for the David M. Rubenstein Library website.  The new site features updated imagery from the collections, better navigation, and more prominent presence for the exhibits currently on display.

January 2018: New design for the Rubenstein Library homepage.


Much credit goes to Katie Henningsen and Kate Collins who championed the project.

Objectives for the Redesign

  • Make wayfinding from the homepage clearer (by reorganizing links into a primary dropdown navigation)
  • Dynamically feature Rubenstein Library exhibits that are currently on display
  • Improve navigation to key Rubenstein site pages from within research center / collection pages
  • Display larger images illustrative of the library’s distinctive and diverse collections
  • Retain aspects of the homepage that have been effective, e.g., hours and resource search boxes
  • Improve the site aesthetic

Internal Navigation

With a new primary navigation in hand on the Rubenstein homepage that links to key pages in the site, we began to explore ways to get visitors to those links in an unobtrusive way when they aren’t on the homepage.  Each research center within the library, e.g., the John W. Hartman Center for Sales, Advertising & Marketing History, has its own sub-site with its own secondary menus, which already contend a bit with the blue Duke Libraries menu in the masthead.  To avoid burying visitors in a Russian nesting doll of navigation, we decided to try dropping the RL menu down from the breadcrumb trail link so it’s tucked away, but still accessible when needed. We’re eager to learn whether or not this is effective.

Rubenstein primary navigation menu available from breadcrumb trail.

A Look Back

Depending on how you count, this is now the seventh or eighth homepage design for the Rubenstein Library (formerly the Rare Book, Manuscript, and Special Collections Library; formerly the Special Collections Library). I thought I’d take a quick stroll down memory lane, courtesy of the Internet Archive’s Wayback Machine, to reflect on how far we have come over the years.

1996

October 1996 (explore)

Features:

  • prominent news, exhibits, and online collections
  • links to online SGML- and HTML-encoded finding aids (42 of them!)
  • a site search box powered by Excite!

1997

April 1997 (explore)

Features:

  • two-column layout with a left-hand nav
  • digitized collections
  • a special collections newsletter called The Broadside
  • became the “Rare Book, Manuscript, and Special Collections Library” in 1997

2005

December 2005 (explore Jan 2006)

Features:

  • color-coded navigation broken into three groups of links
  • image from the collections
  • featured exhibit with image
  • rounded corners and shadows
  • first use of a CMS (content management system named Cascade Server)*

2007

August 2007 (explore)

Features:

  • first time sharing a masthead with rest of the Duke University Libraries
  • retained the lists of links, single collection image, and featured exhibit from previous iteration

2011

September 2011 (explore)

Features:

  • renamed as the David M. Rubenstein Rare Book & Manuscript Library
  • first time with catalog and finding aids search boxes on the homepage
  • first appearance of social media & RSS icons
  • first iteration to display library hours
  • first news carousel appearance

2014

January 2014 (explore)

Features:

  • new site in Drupal content management system
  • first responsive RL website (works well on mobile devices)
  • array of vertical image panels from the collections
  • extended color palette to match Duke University website styles (at the time)
  • gradients and rounded buttons with shadows
  • first time able to search digital collections from RL homepage
  • first site with Login button for Aeon (Special Collections request system)

2017

January 2017 (explore)

Features:

2018

January 2018

Features

  • lightened the overall aesthetic
  • featured image cycling from selections at random (diagonally sliced using CSS clip-path polygons; a minimal sketch follows this list)
  • prominent current exhibits feed with images
  • a primary nav with dropdown menus
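For those curious about the diagonal slicing, it takes only a rule or two of CSS. This is a minimal sketch, not our production stylesheet; the class name and polygon coordinates are invented for illustration:

/* Clip the featured image into a diagonal slab. Each pair is an x y
   vertex of the visible polygon, relative to the element's box. */
.featured-slice {
  clip-path: polygon(0 0, 100% 0, 85% 100%, 0 100%);
}

Browsers simply hide everything outside the polygon, so neighboring slices can be overlapped or offset to produce the sliced-banner effect.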

How long will this latest edition of the Rubenstein Library homepage stick around? Only time will tell, but we’ll surely continue to iterate, learn from the past, and improve with each attempt. For now, we’re pleased with the new site, and hope you will be as well.


* Revised Feb 9, 2018 to reflect that the first version using a content management system was in 2005 rather than 2007.