Our Digital Collections program aspires to build “distinctive digital collections that provide access to Duke’s unique library and archival materials for teaching, learning, and research at Duke and worldwide.” Those are our primary stated objectives, though the reach and the value of putting collections online extend far beyond them. For instance, these uses might not qualify as scholarly, but we celebrate them all the same:
- Heartwarming stories of people experiencing lost loved ones anew after discovering a recorded oral history or a decades-old film
- Engineers using a hundred-year-old photograph to help create a 3-D augmented-reality digital reconstruction of a historical site in China
- Artists assembling collages, sometimes mashing up images from our music and photograph collections
- People organizing, annotating, and sharing their image discoveries on Pinterest
Regardless of how much value we assign to different kinds of uses, determining the impact of our work is a hard problem to solve. There are no simple instruments to measure our outcomes, and the measurements we do take can at times feel uncertain, as if taken of a moving object with a wildly elastic ruler. Some helpful resources are out there, of both theoretical and practical varieties, but focusing on what matters most remains a challenge.
Back to our mission: how much are our collections actually used for the scholarly purposes we trumpet (teaching, learning, and research) versus other, more casual uses? How do we distinguish these uses within the data we collect? Clearer answers could help us in several areas: deciding what we should digitize, telling compelling stories of user engagement that illustrate the value of the collections, and drumming up more interest in the collections within scholarly communities.
Some of my Duke colleagues and I began exploring these questions in depth this year. We’ll have much more to report later, but our work has already uncovered some findings worth sharing. And, of course, we’ve unearthed more questions than answers.
Like many institutions, we use a service called Google Analytics to track how much our collections are accessed. We use analytics to understand what kinds of digitized materials resonate with users online, and to help us make informed improvements to the website. Thankfully, Google doesn’t track any personally identifiable data; data is aggregated to a degree where privacy is protected, yet site owners can still see generally where their traffic comes from.
For example, we know that on average[1], our site visitors view just over 5 pages/visit, and stay for about 3.5 minutes. 60.3% of visitors bounce (that is, leave after seeing only one page). Mobile devices account for 20.1% of traffic. Over 26% of visits come from outside the U.S. The most common way a visit originates is via search engine (37.5%), and social media traffic—especially from Facebook—is quite significant (15.7% of visits). The data is vast; the opportunities for slicing and dicing it seem infinite. And we’ll forever grapple with how best to track, interpret, report, and respond to the things that are most meaningful to us.
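Google Analytics reports these aggregates directly, but as a rough illustration of what the metrics summarize, here is a minimal Python sketch over hypothetical session records (the field names and sample values are invented for illustration, not drawn from our actual data):

```python
# Hypothetical session records, purely for illustration
sessions = [
    {"pages": 1, "seconds": 0,   "mobile": False, "country": "US"},
    {"pages": 6, "seconds": 240, "mobile": True,  "country": "CA"},
    {"pages": 8, "seconds": 390, "mobile": False, "country": "US"},
]

n = len(sessions)
pages_per_visit = sum(s["pages"] for s in sessions) / n
# A "bounce" is a visit that views exactly one page
bounce_rate = sum(1 for s in sessions if s["pages"] == 1) / n
mobile_share = sum(1 for s in sessions if s["mobile"]) / n
intl_share = sum(1 for s in sessions if s["country"] != "US") / n

print(f"{pages_per_visit:.1f} pages/visit, {bounce_rate:.0%} bounce, "
      f"{mobile_share:.0%} mobile, {intl_share:.0%} international")
```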
There are two bits of Analytics data that can provide us with clues about our collections’ use in scholarly environments:
- Traffic on scholarly networks (a filtered view of ISPs)
- Referrals from scholarly pages (a filtered view of Referrer paths)
Tracking these figures (however imperfect) could help us get a better sense of trends in the makeup of our audience, and help us set goals for any outreach efforts we undertake.
Traffic on Scholarly Networks
One key clue for scholarly use is the name of visitors’ Internet Service Provider (ISP). For example, a visit from somewhere on Duke’s campus has an ISP “duke university,” a NYC public school “new york city public schools,” and McGill University (in Canada) “mcgill university.” Of course, plenty of scholarly work gets done off-campus (where an ISP is likely Time Warner, Verizon, AT&T, etc.), and not all network traffic that happens on a campus is actually for scholarly purposes. So there are the usual caveats about signal and noise within the data.
So we know that over the past calendar year[1], we had:
- 11.7% of our visits (“sessions”) from visitors on a scholarly network (as defined in our filters: ISP name contains universit*, college*, or school*)[2]
- 74,724 visits via scholarly networks
- 4,121 unique scholarly network ISPs
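The filter described above can be sketched as a simple pattern match on ISP names. This is a hypothetical Python illustration, not our actual Analytics configuration; the sample ISP names echo the examples mentioned earlier:

```python
import re

# Pattern mirroring the filter described above: ISP name contains
# universit*, college*, or school* (case-insensitive)
SCHOLARLY_ISP = re.compile(r"universit|college|school", re.IGNORECASE)

def is_scholarly_network(isp_name):
    """Return True if an ISP name looks like a scholarly network."""
    return bool(SCHOLARLY_ISP.search(isp_name))

# Hypothetical sample of session ISP names, for illustration only
session_isps = [
    "duke university",
    "time warner cable",
    "new york city public schools",
    "mcgill university",
    "verizon internet services",
]

scholarly = [isp for isp in session_isps if is_scholarly_network(isp)]
share = len(scholarly) / len(session_isps)
print(f"{share:.1%} of sample sessions on scholarly networks")  # → 60.0%
```

The same caveats apply here as in the Analytics view: home ISPs hide plenty of off-campus scholarly use, and campus networks carry plenty of non-scholarly traffic.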
Referrals from Course Websites or Online Syllabi on .Edu Sites
Are our collections used for teaching and learning? How much can we tell simply through web analytics?
A referral happens when someone gets to our site by following a link from another site. In our data, we can see the full web address of any referring page. But can we infer from a site URL whether that site was a course website or an online syllabus, the kinds of pages that would link to ours for the express purpose of teaching? We can try.
In the past year, referrals filtered by an expression[3] designed to isolate course sites and syllabi on .edu sites accounted for:
- 0.18% of total visits
- 1,167 visits
- 68 unique sites (domains)
Or, if we remove the .edu restriction[3]:
- 1.21% of total visits
- 7,718 visits
- 221 unique sites (domains)
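To make the heuristic concrete, here is a sketch of how the footnoted expression might be applied to referring URLs, with an optional .edu restriction. The function name and sample URLs are invented for illustration; this is not our production filter:

```python
import re
from urllib.parse import urlparse

# The referrer-path expression from footnote 3, copied verbatim
COURSE_PATTERN = re.compile(
    r"blackboard|sakai|moodle|webct|schoology|^bb|learn|course|isites|"
    r"syllabus|classroom|^class.|/class/|^classes.|/~CLASS/",
    re.IGNORECASE,
)

def looks_like_course_referral(referrer, require_edu=False):
    """Heuristically flag a referring URL as a course site or syllabus."""
    parsed = urlparse(referrer)
    if require_edu and not parsed.netloc.endswith(".edu"):
        return False
    # Test the host and path together, roughly like GA's "full referrer"
    return bool(COURSE_PATTERN.search(parsed.netloc + parsed.path))

# Hypothetical referrers
print(looks_like_course_referral("https://sakai.duke.edu/portal/site/hist101"))  # True
print(looks_like_course_referral("https://example.com/syllabus/f14", require_edu=True))  # False
```

As the numbers above suggest, requiring a .edu domain sharply narrows the matches, at the cost of missing learning-management systems hosted on other domains.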
It’s hard to assert confidently that this data is accurate, and indeed many of the pages can’t be verified because they’re accessible only to the students in those classes. Regardless, a look at the data through this lens does occasionally help us discover real uses in actual courses, or generate leads for contacting instructors about the ways they’ve used the collections in their curricula.
We know web analytics are just a single tool in a giant toolbox for determining how much our collections are contributing to teaching, learning, and research. One technique we’ve tried is using Google Scholar to track citations of collections, then logging and tagging those citations using Delicious. For instance, here are 70 scholarly citations for our Ad*Access collection. Among the citations are 30 articles, 19 books, and 10 theses. 26 sources cited something from the collection as a primary source. This technique is powerful and illuminates some interesting uses, but it unfortunately takes a lot of time to do well.
We’ve also recently launched a survey on our website that gathers some basic information from visitors about how they’re using the collections. And we have done some outreach with instructors at Duke and beyond. Stay tuned for much more as we explore the data. In the meantime, we would love to hear how others in the field approach answering these very same questions.
1. Data from July 1, 2014 – June 26, 2015.
2. We had first looked at isolating scholarly networks by narrowing to ISP network domains ending in “.edu,” but upon digging further, there are two reasons why the ISP name provides better data: 1) .edu domains are granted only to accredited postsecondary institutions in the U.S., so visits from international universities or middle/high schools wouldn’t count. 2) A full 24% of all our visits have unknowable ISP network domains (“(not set)” or “unknown.unknown”), whereas only 6.3% of visits have unknown ISP names.
3. Full referrer path expression: blackboard|sakai|moodle|webct|schoology|^bb|learn|course|isites|syllabus|classroom|^class.|/class/|^classes.|/~CLASS/