
Capturing the Duke Web

Post contributed by Matthew Farrell, Digital Records Archivist.

I can claim without controversy that the web is among the more popular avenues for communicating, publishing, and otherwise interacting with information. Although professionals involved in the creation of websites often have titles (engineer, web designer, information architect) that borrow the language of counterparts in the physical world, information on the web, and how one experiences it, is inherently ephemeral. Relics of the early web still extant online often owe their continued life to chance: consider the website for the 1996 film Space Jam, or the first website ever made, long thought lost until a copy was discovered on a floppy disk.

To preserve Duke’s web presence, the University Archives partnered in 2010 with Archive-It, a service of the Internet Archive, to take snapshots of various websites. In the five years since, we have captured close to 500 Duke-related websites. Comparing a site’s evolution over time can be striking. This portal allows one to compare Duke homepages at different times. For example:

Duke University homepage, 2010

Duke University homepage, 2015

The following screen captures show the Duke Chapel’s website.

Duke University Chapel homepage, 2010

Duke University Chapel homepage, 2015

While the examples above show changes that are at least partly cosmetic, capturing web content also allows us to preserve and provide access to the social and intellectual conversations on campus. We have had success capturing Develle Dish in both its DukeGroups version and its more recent Sites.Duke iteration.

Because the Duke Fact Checker was not officially associated with the university, his blog went down after his passing in early 2014. Though it’s no longer available at its original URL, we were able to get annual captures of his commentary between 2012 and 2014.

All of this is great, but the archived content was previously difficult to access without knowing how to use the system. As of February 2015, there are two easy ways to browse and search the Duke Web Archives. First, the University Archives created a collection guide to Duke-related websites. Its 500 or so URLs are arranged loosely by organizational type and can be browsed here.

Because of the way the web is crawled, some sites that were captured may not appear in the collection guide. To help address this problem, and to provide another avenue into the collection, Archive-It offers a search function through its Wayback Machine, available here. Using the Wayback search, one can search for any URL; if the site appears in our collection, even if only partially, the search will return it.

We are currently working on capturing social media, so look for future posts on that subject.


5,000 Digital Books and Counting

The Internet Archive just reached an important milestone by digitizing 5,000 books at Duke. The 5,000th book, The British Album: In Two Volumes, contains poetry by “Della Crusca, Anna Matilda, Arley, Benedict, The Bard” and other writers on themes including love, horror, jealousy, and death, and is part of the general collections of the Rubenstein Library. The “Ode to Death” begins “THOU, whose remorseless rage, Nor vows, nor tears assuage, TRIUMPHANT DEATH!—to thee I raise, The bursting notes of dauntless praise!” The second volume can be found here.

The Scribe Scanner. Photo by Rita Johnston.

The Internet Archive scanning center at Duke University has been in operation for a year and a half and has digitized materials from collections within the Rubenstein Library, including the University Archives, Utopian Literature, and Confederate Imprints. I scan about 450 pages per hour and around 50 books a week. Most public-domain books smaller than 11 x 13 inches can be digitized on the Scribe book scanner, as can pamphlets and loose documents.

Books digitized through the Internet Archive are usually available on the site by the next day, are full-text searchable, and can be read in a web browser or downloaded to a computer, e-book reader, or mobile device. You can find newly digitized Duke materials by clicking the RSS feed link at the bottom right of this blog or by visiting the Duke University Libraries Internet Archive page. Patrons can request that a book be digitized by the Internet Archive by contacting Rubenstein Library staff.

Post contributed by Rita Johnston, Scribe scanner operator.