We’re taking a user-centered approach in planning the new Digital Collections web interface to ensure that our new design meets the needs and expectations of the people who use it. One way to discover those needs is to analyze our web traffic in an attempt to decipher user intent when searching and browsing materials in our site. Valuable patterns exist in this data that can help us optimize the site’s utility and performance by supporting actual user information-seeking behaviors. Lou Rosenfeld recently wrote a terrific blog post about this “bottom-up analysis” on A List Apart.
Using aggregated data from Google Analytics, we studied searches performed in our site between May 1st and November 1st this year. We found that Duke Digital Collections was searched approximately 131,000 times during this six-month period; that’s an average of 717 searches per day. The average user spent about three minutes on the site after entering his or her search query and viewed nearly four pages. Visitors also refined their search keywords 26% of the time.
Only three percent of these unique searches were entered from the homepage. Eight percent, a whopping 11,121 unique searches, were entered directly from the Ad*Access portal page, while other popular start pages included the Historic American Sheet Music collection (5%), the Duke Libraries homepage (5%), and the Emergence of Advertising in America collection (3%). The source of our traffic helps to explain this phenomenon: search engines and referring sites are responsible for the majority (81.8%) of DDC’s traffic. Links from search engine results and from social media services like StumbleUpon (19.2% of all referrals), Digg (1.9%), Facebook (1.3%), and Twitter (0.9%) often lead users directly to item pages, bypassing our portals and homepage entirely.
Over 62,000 distinct searches were conducted, though we’ll focus on the top 500. The most frequent search, “beauty,” was entered 643 times; by comparison, the #500 search, “clean,” was entered 24 times. The bulk of the top keyword searches in our system (421 of the top 500, or 84.2%) were single-term queries. These queries were largely topical and exploratory in nature, allowing the user to browse through various results. In other cases, users entered entire phrases or names of persons when they had a more specific subject in mind. This was especially true for searches conducted in Historic American Sheet Music, where users often looked for specific titles of scores or names of composers.
Many searches (37 of the top 500) were for dates, whether a specific year or an entire decade. Top examples include: 1920 (312), 1950 (147), 1920s (138), 1911 (99), 1850 (88), 1930 (88), 1920’s (87).
Other users search for items of a particular format (18 of the top 500 queries). Examples: music (187), advertising (127), book (76), ad (75), poster (74), cookbooks (68), sheet music (65), advertisements (58).
Some users search for entire collections by name, e.g. gamble (68), Gamble (50), adviews (44), though the system currently doesn’t adequately support that function, since it only indexes and returns matching items.
And finally, some users appear to want a way to see everything that can possibly come back in search results, using queries like “a” (71), “all” (41), and the “*” wildcard (32); our system does not currently perform this kind of retrieval.
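The groupings above can be approximated with a few simple rules. The following is a minimal sketch in Python, not the method we used for this analysis: the word lists and the `classify` and `tally` helpers are illustrative assumptions, and the sample counts are the ones quoted above. Note that this sketch assigns each query to exactly one group, whereas the counts reported in this post overlap (a year is also a single-term query).

```python
import re
from collections import Counter

# Sample (query, count) pairs taken from the figures quoted in this post.
SAMPLE = [
    ("beauty", 643), ("1920", 312), ("1920s", 138), ("1920's", 87),
    ("music", 187), ("sheet music", 65), ("gamble", 68), ("Gamble", 50),
    ("a", 71), ("all", 41), ("*", 32),
]

# Illustrative word lists (assumptions, not the lists behind our numbers).
FORMAT_TERMS = {"music", "advertising", "book", "poster", "ad",
                "cookbooks", "sheet music", "advertisements"}
COLLECTION_NAMES = {"gamble", "adviews"}
CATCH_ALL = {"a", "all", "*"}

# Matches a four-digit year or decade: 1920, 1920s, 1920's
YEAR_RE = re.compile(r"^\d{4}(['’]?s)?$")


def classify(query: str) -> str:
    """Assign a query to one group; the first matching rule wins."""
    q = query.strip().lower()
    if q in CATCH_ALL:
        return "catch-all"
    if YEAR_RE.match(q):
        return "year"
    if q in COLLECTION_NAMES:
        return "collection"
    if q in FORMAT_TERMS:
        return "format"
    return "single-term" if len(q.split()) == 1 else "phrase"


def tally(pairs):
    """Sum search counts per group for (query, count) pairs."""
    totals = Counter()
    for query, count in pairs:
        totals[classify(query)] += count
    return totals


if __name__ == "__main__":
    for group, total in tally(SAMPLE).most_common():
        print(f"{group}: {total}")
```

Lowercasing before matching folds “gamble” and “Gamble” into one collection search, which is one reason a rule-based pass like this can be more useful than reading the raw query list.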
Using the Many Eyes toolkit, we have created three data visualizations of the most frequent search queries: one for each of two of our most popular collections, Ad*Access and Historic American Sheet Music, and one for the website at large. Each lets you scan the search terms at a glance and compare their frequency.
A couple of caveats about the data that will help to qualify and clarify:
- Searches performed within the AdViews collection (launched July 21st) are mostly not reflected in this data. Up until Oct 21st, the AdViews site used a different method of searching (relying on MIT’s SIMILE Exhibit code, which searched on-the-fly after each keystroke without generating a new URL).
- Links to search results (canned searches) count as searches. Going to page 2 of search results counts as doing a new search, too, as does toggling list/grid view.
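The second caveat could be reduced in a future analysis by collapsing page views that differ only in display settings. Here is a minimal Python sketch, assuming hypothetical parameter names (`q`, `page`, `view`) and a hypothetical `/search` path; our actual URLs may look different. It cannot tell a canned search from a typed one, since that would require referrer data.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical display-only parameters; the real site's names may differ.
IGNORED_PARAMS = {"page", "view"}


def search_key(url: str):
    """Identify a search by its path and query, ignoring display-only parameters."""
    parts = urlsplit(url)
    params = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query)
        if name not in IGNORED_PARAMS
    )
    return (parts.path, tuple(params))


def count_searches(session_urls):
    """Count searches in one visit, treating consecutive page views of the
    same search (paging, list/grid toggling) as a single search."""
    count = 0
    previous = None
    for url in session_urls:
        key = search_key(url)
        if key != previous:
            count += 1
        previous = key
    return count


if __name__ == "__main__":
    visit = [
        "/search?q=beauty",
        "/search?q=beauty&page=2",     # paging, not a new search
        "/search?q=beauty&view=grid",  # view toggle, not a new search
        "/search?q=soap",
    ]
    print(count_searches(visit))
```

Only consecutive repeats are collapsed, so a visitor who returns to an earlier query after trying another one is still counted as searching again.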
Top 500 Duke Digital Collections Search Terms
Top 100 Historic American Sheet Music Search Terms
Top 50 Ad*Access Search Terms