All posts by Cory Lown

Nobody Wants a Slow Repository

As we’ve been adding features and refining the public interface to Duke’s Digital Repository, the application has become increasingly slow. Don’t worry, the very slowest versions were never deployed beyond our development servers. This blog post is about how I approached addressing the application’s performance problems before they made their way to our production site.

A modern web application, like the public interface to Duke's Digital Repository, is a complex beast, relying on layers of software and services just to deliver a bunch of HTML, CSS, and JavaScript to your web browser. A page like this one, the front page of the Alex Harris collection, takes a lot to build: code to read configuration files, methods that assemble the information needed to build the page, requests to Solr to find the images to display, a request to a separate administrative application service that provides contact information for the collection, another request to fetch related blog posts, and requests to our finding aid application to deliver information about the physical collection. All of these requests take time, and all of them have to finish before anything gets delivered to your browser.

My main suspects for the slowness: HTTP requests to external services, such as the ones mentioned above; and repeated calls to slow methods in the application. But identifying precisely which HTTP requests are slow and what code needs to be optimized takes a bit of sleuthing.

The first thing I wanted to know was: how slow is this thing, really? It turns out it was getting really slow. Too slow. There's old research (1960s old) about computer system performance and its impact on user perception and task performance that still applies today. This article from the Nielsen Norman Group, itself old (1993 old), summarizes the issue nicely.

To determine just how slow things were getting I used Chrome’s developer tools. The “Network” tab in Chrome’s developer tools is where the hard truth comes to light about just how bloated and slow your web application is. Or, as my high school teachers used to say when handing back test results: “read ’em and weep.”


Using the Network tab in Chrome's developer tools, I was able to see that the browser was waiting 15 seconds or more for anything to come back from the server. That is too slow.

The next thing I wanted to know was how many HTTP requests were being made to external services and which ones were being made repeatedly or were taking a long time. For this dose of reality I used the httplog gem, which logs useful information about every HTTP request, including how long the application has to wait for a response.
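Adding it is a small change. A minimal sketch of what I mean, assuming you only want the extra logging during development:

# Gemfile
group :development do
  gem 'httplog'
end

After a bundle install, the logging kicks in automatically for the common Ruby HTTP libraries.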

Once it's in place, httplog starts printing useful information to the log about each HTTP request, such as this set of entries about the request to fetch finding aid information. I can see that the application is waiting over half a second to get a response back from the finding aid service:


D, [2016-08-06T12:51:09.531076 #2529] DEBUG -- : [httplog] Connecting: library.duke.edu:80
D, [2016-08-06T12:51:09.854003 #2529] DEBUG -- : [httplog] Sending: GET http://library.duke.edu:80/rubenstein/findingaids/harrisalex.xml
D, [2016-08-06T12:51:09.855387 #2529] DEBUG -- : [httplog] Data:
D, [2016-08-06T12:51:10.376456 #2529] DEBUG -- : [httplog] Status: 200
D, [2016-08-06T12:51:10.377061 #2529] DEBUG -- : [httplog] Benchmark: 0.520600972 seconds

As I expected, this request and many others were contributing significantly to the application’s slowness.

It was a bit harder to determine which parts of the code and which methods were also making the application slow. For this, I mainly used two approaches. The first was to look at the application log, which tracks how long different views take to assemble. This helped narrow down which parts of the code were especially slow (and also confirmed what I was seeing with httplog). For instance, in the log I can see the different partials that make up the whole page and how long each of them takes to assemble. From the log:


12:51:09 INFO: Rendered digital_collections/_home_featured_collections.html.erb (0.8ms)
12:51:09 INFO: Rendered digital_collections/_home_highlights.html.erb (1.3ms)
12:51:10 INFO: Rendered catalog/_show_finding_aid_full.html.erb (953.4ms)
12:51:11 INFO: Rendered catalog/_show_blog_post_feature.html.erb (0.9ms)
12:51:11 INFO: Rendered catalog/_show_blog_posts.html.erb (914.5ms)

(The finding aid and blog posts are slow due to the aforementioned HTTP requests.)


One particular area of concern was extremely slow searches. To identify the problem I turned to yet another tool. Rack-mini-profiler is a gem that, when added to your project's Gemfile, adds an expandable tab to every page of the site. As you visit pages of the application in a browser, it displays a detailed report of how long each section of the page takes to build. This made it possible to narrow down the areas of the application that were too slow.


What I found was that the thumbnail section of the page, which can appear twenty or more times on a search results page, was very slow. And it wasn't loading the images that was slow; the code that selects the correct thumbnail image was taking a long time to run. (Thumbnail selection is complicated in the repository because there are various types and sources of thumbnails.)

Having identified several contributors to the site's poor performance (expensive thumbnail selection, and frequent and costly HTTP requests to various services), I could now work to address each of the issues.

I used three different approaches to improving the application’s performance: fragment caching, memoization, and code optimization.

Caching


I decided to use fragment caching to address the slow loading of finding aid information. The benefit of caching is that it’s really fast. Once Rails has the snippet of HTML cached (either in memory or on disk, depending on how it’s configured) it can use that fragment of cached markup, bypassing a lot of code and, in this case, that slow HTTP request. One downside to caching is that if something in the finding aid changes the application won’t reflect the change until the cache is cleared or expires (after 7 days in this case).


<% cache("finding_aid_brief_#{document.ead_id}", expires_in: 7.days) do %>
<%= source_collection({ :document => document, :placement => 'left' }) %>
<% end %>

Memoization

Memoization is similar to caching in that you're storing information to be reused rather than recalculating it every time. This can be a useful technique for expensive (slow) methods that get called frequently. The parent_collections_count method returns the total number of collections in a portal of the repository (such as the Digital Collections portal). This method is somewhat expensive because it first has to run a query to get information about all of the collections and then count them. Since this gets used more than once, I'm using Ruby's conditional assignment operator (||=) to tell Ruby not to recalculate the value of @parent_collections_count every time the method is called. With memoization, if the value is already stored, Ruby just reuses the previously calculated value. (There are some gotchas with this technique, but it's very useful in the right circumstances.)


def parent_collections_count
@parent_collections_count ||= response(parent_collections_search).total
end
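One of those gotchas is worth spelling out: ||= assigns whenever the stored value is nil or false, so a memoized method that can legitimately return nil or false will keep re-running its expensive work. In that situation I'd reach for something like this instead (the method name here is hypothetical):

def slow_lookup
  # defined? tells us whether the instance variable has been assigned at all,
  # so a stored nil or false still counts as "already calculated"
  return @slow_lookup if defined?(@slow_lookup)
  @slow_lookup = expensive_query_that_may_return_nil
end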

Code Optimization

One of the reasons thumbnails were slow to load in search results is that some items in the repository have hundreds of images. The method used to find the thumbnail path was loading image path information for all the item’s images rather than just the first one. To address this I wrote a new method that fetches just the item’s first image to use as the item’s thumbnail.
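I won't reproduce the repository's actual method here, but the shape of the fix is easy to sketch. Assuming the image records live in Solr (as they do in our stack) and using made-up field names, asking for a single row is far cheaper than loading every image attached to an item:

require 'rsolr'

# Hypothetical sketch: the query and field names are illustrative,
# not the repository's real Solr schema.
def first_image_thumbnail_path(solr_url, item_id)
  solr = RSolr.connect(url: solr_url)
  response = solr.get('select', params: {
    q: "parent_id_ssi:#{item_id}",    # images attached to this item
    fl: 'thumbnail_path_ss',          # only the field we need
    sort: 'display_order_isi asc',    # the item's first image first
    rows: 1                           # and only one row
  })
  doc = response['response']['docs'].first
  doc && doc['thumbnail_path_ss']
end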

Combined, these changes made a significant improvement to the site’s performance. Overall application speed and performance will remain one of our priorities as we add features to the Duke Digital Repository.

Using Community-Built, Open-Source Software to Build a New Digital Collections Platform

The Library’s Digital Projects Services department has been working with Digital Repository Services on a software project that will eventually replace our existing Digital Collections platform. There will be future posts announcing the new way of discovering and accessing Duke’s Digital Collections, but I want to use this post to reflect on the tools and practices we’ve been using to build this new application.

There are a few important differences between this new, not-yet-released application and our current system. One is that Digital Collections will be part of the library's Digital Repository, which includes a much broader range of digital items and collections. The second is that, because the repository is being developed using Project Hydra, we're using a component of the Hydra stack, Project Blacklight, as the discovery and access layer for Digital Collections.


The Blacklight Wiki explains that:

Blacklight is an open source, Ruby on Rails Engine that provides a basic discovery interface for searching an Apache Solr index, and provides search box, facet constraints, stable document urls, etc., all of which is customizable via Rails (templating) mechanisms.

The Blacklight Development Google Group has posts going back to 2009, and the GitHub repository has commits back to 2009 as well. So, the project’s been actively developed and used for a while. The Project Blacklight website maintains a list of different implementations of the software, where you can see the range of interfaces it has been used to develop.

One of the benefits of using a widely adopted open source platform is access to a community of developers who use the same software. I was able to solve many problems just by searching the Blacklight Development Google Group for answers. Blacklight made it easy to get a basic interface up and running quickly and provided a platform to add local customizations. Because the basics were already in place we were able to spend our time on more specialized features and local requirements. For example, specifying which search filters should appear for a collection and what metadata fields should be included in search were as easy as adding a few lines of configuration code to the application.
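To give a sense of what that configuration looks like, here's a hedged sketch using Blacklight's standard CatalogController hooks; the Solr field names are made up for illustration rather than taken from our real schema:

class CatalogController < ApplicationController
  include Blacklight::Catalog

  configure_blacklight do |config|
    # Search filters (facets) to show in the sidebar
    config.add_facet_field 'collection_facet_sim', label: 'Collection'
    config.add_facet_field 'date_facet_sim', label: 'Date'

    # Metadata fields to display with each search result
    config.add_index_field 'creator_tesim', label: 'Creator'
    config.add_index_field 'description_tesim', label: 'Description'
  end
end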



Even for some of the more specialized features, we've relied as much as possible on available add-ons and tools to extend Blacklight. Because of this, we've been able to add advanced features to the new application without a large amount of development time. For example, we're using the Blacklight Range Limit Ruby Gem to add a visual date picker with a histogram for searching the collections by year.

We also used the Blacklight Gallery Ruby Gem to add an option to view search results as a gallery with larger thumbnails.
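Wiring these add-ons in is mostly a matter of pulling in the gems and adding a bit of configuration. A hedged sketch (the Solr field name is illustrative, and each gem's README covers the rest of its setup):

# Gemfile
gem 'blacklight_range_limit'
gem 'blacklight-gallery'

# app/controllers/catalog_controller.rb
configure_blacklight do |config|
  # Render this facet as a date slider with a histogram
  config.add_facet_field 'pub_year_isim', label: 'Date', range: true
end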


Both of these features were relatively easy to implement because we were able to make use of plugins shared with the Blacklight community.

Another new (to us) tool we're using is the IIPImage server for serving images to the application. Because the image server automatically creates and returns a correctly sized image based on parameters sent in the request, we don't have to pre-generate thumbnails of various sizes to support different displays in the application. The image server can even crop images. And because the image server stores the images as Pyramid TIFFs, we're able to provide very smooth and fast in-browser pan and zoom of images, which works much like Google Maps. To get a better idea of what this means for exploring high-resolution images in your browser, you can explore some of the examples on the IIPImage site.
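To make that concrete, a request to an IIPImage server looks roughly like this (the host and image path are made up; FIF, WID, RGN, and CVT are standard IIP protocol parameters). The first URL asks for a 300-pixel-wide JPEG derivative; the second asks for a JPEG of just the center region of the image:

http://images.example.edu/fcgi-bin/iipsrv.fcgi?FIF=/data/photo.tif&WID=300&CVT=jpeg
http://images.example.edu/fcgi-bin/iipsrv.fcgi?FIF=/data/photo.tif&RGN=0.25,0.25,0.5,0.5&WID=300&CVT=jpeg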

To manage this project we’ve been following Agile project management techniques, which for us meant taking an iterative approach to designing and building features of the application in two week sprints. At the beginning of each sprint we decide what we’re going to work on from a backlog of user stories, and our goal by the end of the two weeks is to have a version of the code that is working and deployed with these features implemented. Each day we have a 15-minute stand-up meeting during which each person reviews what they worked on yesterday, explains what they’re going to work on today, and then notes anything that’s blocking their progress. These quick, daily meetings have helped keep the project moving by increasing communication and helping to focus our work.

We’re still putting some pieces in place, so our new platform for publishing Digital Collections isn’t available yet, but look for it soon along with more information about the project and its first published collection.

Digital Dogs

In a recent feature on their blog, our colleagues at NCSU Libraries posted some photographs of dogs from their collections. Being a person generally interested in dogs and old photographs, I became curious where dogs show up in Duke’s Digital Collections. Using very unsophisticated methods, I searched digital collections for “dogs” and thought I’d share what I found.

Of the 60 or so collections in Digital Collections, 19 contain references to dogs. The table below lists the collections in which dogs or references to dogs appear most frequently.

Digital Collection: Number of Items Referring to Dogs
Outdoor Advertising Association of America (OAAA) Archives, 1885-1990s: 91
Historic American Sheet Music: 40
William Gedney Photographs and Writings: 39
R.C. Maxwell Company Outdoor Advertising: 27
OAAA Slide Library: 24
Sidney D. Gamble Photographs: 12
Emergence of Advertising in America: 11
Hugh Mangum Photographs: 10
Musée des Horreurs: 9
AdViews: 7
Ad*Access: 5
John Paver Papers: 5
Documentary Photographs of Early Soviet Russia: 4
Broadsides & Ephemera: 3
American Song Sheets: 3
Michael Francis Blake Photographs: 2
Italian Posters: 2
Paul Kwilecki Photographs: 2
Medicine and Madison Avenue: 1

As you might guess, not all the results for my search were actually photographs of dogs. Many from the advertising collections were either advertisements for dog food or hot dogs. There were quite a few ads and other materials where the word “dog” was used idiomatically. The most surprising finding to me was the number of songs that are about or reference dogs. These include “Old Dog Tray” and “The Whistler and His Dog” from Historic American Sheet Music, as well as “A Song for Dogs” and “Bull Dog an’ de Baby” from American Song Sheets.

Here’s a sampling of some photographs of dogs from Digital Collections, and a few cats as well.

Paul Kwilecki Photographs

Hugh Mangum Photographs


Sidney D. Gamble Photographs



William Gedney Photographs and Writings







Documentary Photographs of Early Soviet Russia

R.C. Maxwell Company Outdoor Advertising


OAAA Archives

Michael Francis Blake Photographs


Advertising Culture

When I was a kid, one of my favorite things to do while visiting my grandparents was browsing through their collections of old National Geographic and Smithsonian magazines. I was more interested in the advertisements than the content of the articles. Most of the magazines were dated from the 1950s through the 1980s, and they provided me with a glimpse into the world of my parents and grandparents from a time in the twentieth century I had missed.

I also had a fairly obsessive interest in air-cooled Volkswagen Beetles, which had ceased being sold in the US shortly before I was born. They were still a common sight in the 1980s, and something about their odd shape and the distinct beat of their air-cooled boxer engine captured my young imagination. I was therefore delighted when an older cousin who had studied graphic design gave me a collection of several hundred Volkswagen print advertisements that he had clipped from 1960s-era Life magazines for a class project. Hinting at my future profession, I placed each sheet in a protective plastic sleeve, gave each one an accession number, and catalogued them in a spreadsheet.

I think that part of the reason I find old advertisements so interesting is what they can reveal about our cultural past. Because advertisements are designed specifically to sell things, they can reveal the collective desires, values, fears, and anxieties of a culture.

All of this to say, I love browsing the advertising collections at Duke Libraries. I’m especially fond of the outdoor advertising collections: the OAAA Archives, the OAAA Slide Library, and the John E. Brennan Outdoor Advertising Survey Reports. Because most of the items in these collections are photographs or slides of billboards they often capture candid street scenes, providing even more of a sense of the time and place where the advertisements were displayed.

I’ve picked out a few to share that I found interesting or funny for one reason or another. Some of the ads I’ve picked use language that sounds dated now, or display ideas or values that are out-moded. Others just show how things have changed. A few happen to have an old VW in them.

Small Problems, Little Solutions

I have been thinking lately about tools that make tasks I repeat frequently more efficient. For example, I’m an occasional do-it-yourself home repairer and have an old handsaw that works just fine for cutting a few pieces of wood for small repairs. It’s easy to understand how to use the saw, takes very little planning, and takes just a bit of manual effort.


Last summer, however, I faced a larger task of rebuilding a whole section of my deck that had rotted. I began using the handsaw to cut the wood I would need for the repair and quickly realized my usual method was going to take a long time and make me very sore and unhappy. I needed a better tool and method. That better tool was an electric circular saw, which is more expensive, harder to understand how to use, and more dangerous than the handsaw, but much more efficient. Since I have a healthy fear of death and dismemberment, I also took some time to learn how to use the dreadful thing in a safe manner. It took an initial investment in time and effort, but with the electric saw I was able to make much faster and less painful progress repairing the deck.

I encounter similar kinds of problems when writing software and making things for the web. It’s perfectly possible to do these things using a basic text editor to write everything out by hand. I got along fine this way for a long time. But there are many ways to make this work more efficient. The rest of this post is mainly a list of techniques and tools I’ve invested time and energy to learn to use to reduce annoying, repetitive tasks.

My favorite time and effort saver is learning how to execute common tasks in a text editor using keyboard shortcuts. Here are a few examples of shortcuts I use many times a day in my favorite editor, Sublime Text 2. The ones I use the most involve moving the cursor or text around without touching the mouse. (These are specific to Macintosh computers, but similar shortcuts are available in other operating systems.)

  • Hold down the Option key and use the left and right arrow keys to move the cursor a word at a time instead of a space at a time.
  • Hold down the Command key and use the left or right arrow to move to the beginning or end of a line. The up or down arrow will take you to the top or end of the document.
  • Add the shift key to the above shortcuts to select text as the cursor moves.
  • The delete key will also work with these shortcuts.
  • Indent a line of text or a whole block of text using the Command key and the left and right brackets.

There are also more advanced text editor features or plugins that make coding easier by reducing the amount you have to type by hand.

Emmet is a utility that does a few things, but it mainly lets you use abbreviated CSS-selector-style syntax to generate full HTML markup. For example, I can type div.special and when I hit the tab key Emmet automatically turns that into:

<div class="special"></div>

You can string these abbreviations together to generate multi-line, nested HTML markup from a single string.
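Longer abbreviations work the same way. For example, typing ul.nav>li.item*3>a and hitting tab expands to a nested list (the class names here are just for illustration):

<ul class="nav">
  <li class="item"><a href=""></a></li>
  <li class="item"><a href=""></a></li>
  <li class="item"><a href=""></a></li>
</ul>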

SublimeCodeIntel is another plugin for the text editor I use. It adds an intelligent auto-suggest menu that updates as you type with suggestions specific to the programming language you're working in and to the program itself. For example, in PHP, if I type “e” it will suggest “echo” and I can hit enter to use that suggestion. It also remembers things like the variable and class names in the project you're working on. It even seems to learn which terms you use most frequently and suggests those first. It saves a lot of typing.

There are also a couple of utilities I run in a terminal window while I'm working to automate different tasks. Many of these are powered by Guard, a Ruby gem that watches files for changes. This is more useful than it might sound. For example, Guard can run LiveReload: when Guard notices that a file you've told it to watch has changed, it triggers LiveReload, which then refreshes your browser window. With this tool I can make small changes to a project and see the updates in real time in my browser without having to refresh the page manually. There are also Guard utilities for running tests, compressing JavaScript, and generating browser-friendly CSS from easier-to-write-and-maintain (coder-friendly) SCSS.
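As a rough example of what this looks like, a Guardfile for the LiveReload setup can be as small as this (the watch patterns are illustrative; a real project would adjust them to its layout):

# Guardfile
guard 'livereload' do
  # Reload the browser when views, stylesheets, or scripts change
  watch(%r{app/views/.+\.erb$})
  watch(%r{app/assets/stylesheets/.+\.(css|scss)$})
  watch(%r{app/assets/javascripts/.+\.js$})
end

Running bundle exec guard in a terminal starts the watcher, and the guard-livereload plugin tells the browser to refresh whenever one of those files changes.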


These are just a few of the ways I try to streamline repetitive tasks.

Vagrant Up


Writing software for the web has come a long way in the past decade. Tools such as Ruby on Rails, Twitter Bootstrap, and jQuery provide standard methods for solving common problems and providing common features of web applications. These tools help developers use more of their time and energy to solve problems unique to the application they are building.

One of the costs of relying on libraries and frameworks to build software is that your project not only depends on the frameworks and libraries you've chosen to use, but those frameworks and libraries rely on still more components. Compounding this problem, software is never really finished. There are always bugs being fixed and features being changed and added. These attributes of software, that it changes and has dependencies, complicate software projects in a few different ways:

  • You must carefully manage the versions of libraries, frameworks, and other dependencies so that you ensure all the pieces work together.
  • Developers working on multiple projects may have to use different versions of the same libraries and packages for each of their projects.
  • If you’re working with a team of developers on a project all members of the team have to make sure their computers are set up with the correct versions of the software and dependencies of the project.

Thankfully, there are still more tools available to help manage these problems. For instance, Ruby Version Manager (RVM) is a popular tool used by Ruby on Rails developers. It lets the software developer install and switch between different versions of Ruby. Other tools, such as Bundler, make it possible to define exactly which versions of which Ruby gems (Ruby software packages that add functionality to your own project) need to be installed for a particular project. Combined, RVM and Bundler simplify the management of complex project dependencies. There are similar tools available for other programming languages, such as Composer, a dependency manager for PHP.
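To make the Ruby side concrete: a Gemfile lets you pin dependencies to known-good versions (the version numbers here are just for illustration), and Bundler records the exact set it resolved in Gemfile.lock so everyone on the project installs the same thing:

# Gemfile
source 'https://rubygems.org'

gem 'rails', '~> 4.2.0'        # any 4.2.x patch release
gem 'blacklight', '~> 5.14'    # any 5.x release at or above 5.14

Running bundle install reads this file, resolves compatible versions, and locks them in place.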


While many of us already use dependency managers in our work, one tool we haven't been using, and are evaluating for use on a new project, is Vagrant. Vagrant is a tool for creating virtual machines: self-contained, software-defined computer systems that run within a host operating system. For instance, using a virtual machine I could run Windows on my Mac hardware.

Vagrant does a few things that may make it even easier for developers to manage project dependencies.

  • With Vagrant you can write a script that contains a set of instructions about what operating system and other software you want installed in a virtual machine. Creating a virtual machine with all the software you need for a given project is then as simple as typing a single command.
  • Vagrant provides a shared directory between your host operating system and the virtual machine. You can use the operating system you use everyday as you work while the software project runs in a virtual machine. This is significant because it means each developer can continue to use the operating system and tools they prefer while the software they’re building is all running in copies of the exact same system.
  • You can add the script for creating the virtual machine to the project itself, making it very easy for new developers to get the project running. They don't have to go through the sometimes painful process of installing a project's dependencies by hand, because the Vagrant script does it for them.
  • A developer working on multiple projects can have a virtual machine set up for each of their projects so they never interfere with each other and each has the correct dependencies installed.

Here’s how to use Vagrant in the most minimal way:

  1. Download and install VirtualBox
  2. Download and install Vagrant
  3. In a terminal window type:
    vagrant init hashicorp/precise32
  4. After running the following command you will have downloaded, set up, and started a fully functional virtual machine running Ubuntu:
    vagrant up
  5. You can then connect to and start using the running virtual machine by connecting to it via SSH:
    vagrant ssh

"Vagrantup" by Fco.plj - Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Vagrantup.jpg#mediaviewer/File:Vagrantup.jpg

In a more complex setup you’d probably add a provisioning script with instructions for downloading and installing additional software as part of the “vagrant up” process. See the Vagrant documentation for more details about provisioning options.
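As a sketch of what that might look like, here's a Vagrantfile that uses the same box as above with a small inline shell provisioner (the packages installed are just placeholders for whatever a real project needs):

# Vagrantfile
Vagrant.configure("2") do |config|
  config.vm.box = "hashicorp/precise32"

  # Runs when the machine is first created (and again on `vagrant provision`)
  config.vm.provision "shell", inline: <<-SHELL
    apt-get update
    apt-get install -y git curl
  SHELL
end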

We're considering using Vagrant on an upcoming project in an effort to make it easier for all the developers on the project to set up and maintain a working development environment. With Vagrant, just one developer needs to spend the time creating the script that generates the virtual machine for the project. This should save time for the other developers, who only have to install VirtualBox, copy the Vagrantfile, and type “vagrant up.” At least, that's the idea. Vagrant has great documentation, so if you're interested in learning more, their website is a good place to start.