Radio Haiti on YouTube? An Archive in the World

Post contributed by Craig Breaden, Audiovisual Archivist

Radio Haiti on YouTube? Now there’s an idea…. When the Radio Haiti team at the Rubenstein Library embarked on a pilot project to see how the collection would perform on YouTube and the Internet Archive, we imagined it would be a fairly straightforward process, and that it was a natural fit.  The idea for the pilot, funded as part of an NEH grant, came from discussions around how to effectively re-broadcast the archive.  “Take the archive to its listeners,” was a rallying cry, “to Haitians in Haiti!”  This approach captured the spirit of Radio Haiti, whose tireless advocacy for democracy in Haiti was brought to a halt only by assassinations and death threats carried out under an umbrella of impunity.  With our pilot now complete, we are left with some expectations unfulfilled, some questions still unresolved.  But even so, we learned a lot about the process, while enjoying one unqualified success.

If research libraries are square pegs, YouTube is the round hole.  Librarians and archivists love metadata; YouTube loves “views.”  Researchers and users love a good search tool; YouTube loves to put your eyes on ads.  The differences between the missions of an ad-supported social media platform and a dot-EDU library have the potential to obscure the common goal of content delivery.  We knew using YouTube, if not exactly a deal with a devil, demanded compromise and creative thinking.  The first challenge was finding workflows that we could apply to the entire archive, including batch conversion of audio to video and bulk uploading of content and metadata.  It was with the metadata that we started running into trouble.  With paltry character limits on titles, descriptions, and keywords, YouTube left us scratching our heads (when video is clearly the data hog, how does text get such short shrift?) and scrambling for a solution to provide adequate description for the recordings.  The situation seemed especially acute because our Radio Haiti metadata is trilingual (English, Haitian Creole, French) and takes a lot of text space to accommodate our anticipated user populations.  Ultimately we built in a default: every description that exceeded the 5000-character limit was cut off, with an ellipsis added to the end along with a link to the Duke Digital Repository (DDR) page for that recording, so that, on YouTube, we still depended on the Library resource for full description.
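For readers curious what such a default might look like in practice, here is a minimal sketch in Python. It is not the code the Radio Haiti team used. The function name, the example URL, and the choice to count characters with `len()` are all assumptions made for illustration; only the 5000-character limit, the ellipsis, and the link back to the DDR come from the description above.

```python
# Illustrative sketch only: names and the example URL are hypothetical.
YOUTUBE_DESCRIPTION_LIMIT = 5000  # limit cited in the post


def fit_description(description: str, ddr_url: str,
                    limit: int = YOUTUBE_DESCRIPTION_LIMIT) -> str:
    """Return the description unchanged if it fits within the limit.

    Otherwise cut it short and end it with an ellipsis plus a link to the
    full record, keeping the total length within the limit.
    """
    if len(description) <= limit:
        return description

    suffix = "... " + ddr_url
    keep = limit - len(suffix)
    if keep < 0:
        raise ValueError("limit is too small to hold the ellipsis and link")

    # Trim trailing whitespace so the ellipsis follows text directly.
    return description[:keep].rstrip() + suffix


if __name__ == "__main__":
    long_text = "Trilingual description text. " * 300
    result = fit_description(long_text, "https://example.org/ddr/item-0001")
    print(len(result), result[-45:])
```

A short description passes through untouched, so a function like this could be applied to every row of an upload spreadsheet without special-casing.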

View the YouTube pilot here: https://www.youtube.com/channel/UCLUqSmRQNALyrAMYxV44JOQ/videos

The Internet Archive, as its name might suggest, was far more accommodating, offering robust metadata fields without the ads or YouTube’s relentless “Up Next” pushiness.  It has the spirit and ethic of our great public libraries, with a dedication to the public weal.  Radio Haiti would be far from its first radio archive, and its mission, like any real archive’s, is long-term preservation.  There were only two downsides to the Internet Archive platform, and the first one it shared with YouTube:  there was no way to group related recordings (for example, multipart programs) via a relator metadata field in the upload spreadsheet.  That work would have to be done “manually,” in the description field, which might not be a big deal if there were 100 or so recordings, but the Radio Haiti Archive has 5,308 audio files.  Needless to say, the relationships between files that our DDR could make would not be replicated on these platforms.  The second, more obvious downside is that, for all its virtues, the Internet Archive just doesn’t have the audiences that YouTube, media titan, boasts.

View the Internet Archive Pilot here: https://archive.org/details/radiohaiti

And that one unqualified, and unexpected, success? Our team of developers, driven by this pilot project to compress the digital footprint of Duke Digital Repository pages and thus decrease load times in areas with limited digital infrastructure, made successful repository-wide modifications to the DDR. Data transfer required for a first-time visit was cut to as little as one sixth of the original size, meaning users’ browsers could render the site much faster and, in Haiti, where mobile data transfer is limited by plans that are typically purchased daily, more cheaply. So, while allowing faster load times in Haiti for our re-broadcasting of the Radio Haiti Archive, they also made the DDR as a whole more efficient.  For me, this is a great example of a specific need driving innovation. The Radio Haiti project improved the delivery of Duke University Libraries’ digital resources while also providing the opportunity for our team to see both the trees and the forest in our work.

The processing of the Radio Haiti Archive and the Radio Haiti Archive digital collection were made possible through grants from the National Endowment for the Humanities.

Radio Haiti and NEH logos

Capturing the Duke Web

Post contributed by Matthew Farrell, Digital Records Archivist.

I can claim without controversy that the web is among the more popular avenues for communicating, publishing, and otherwise interacting with information. Although professionals involved in the creation of websites often have titles (engineer, web designer, information architect) that borrow the language of counterparts in the physical world, information on the web and how one experiences it is inherently ephemeral. Relics of the early web still extant online often owe their continued life to chance, such as the website for the 1996 film Space Jam or the long-thought-lost-until-a-copy-was-discovered-on-a-floppy-disk first website.

In order to preserve Duke’s web presence, in 2010 the University Archives partnered with Archive-It, a service of the Internet Archive, to take snapshots of various websites. In the five years since, we have captured close to 500 Duke-related websites. Comparing a site’s evolution over time can be striking. This portal allows one to compare Duke homepages at different times. For example:

Duke University homepage, 2010

Duke University homepage, 2015

The following screencaps are for the Duke Chapel’s website.

Duke University Chapel homepage, 2010

Duke University Chapel homepage, 2015

While the changes in the above examples are, at least in part, cosmetic, capturing web content also allows us to preserve and provide access to the social and intellectual conversations on campus. We have had success capturing Develle Dish in both DukeGroups and their more recent Sites.Duke iteration.

Because the Duke Fact Checker was not officially associated with the university, his blog went down after his passing in early 2014. Though it’s no longer available at its original URL, we were able to get annual captures of his commentary between 2012 and 2014.

All of this is great but was previously difficult to access without knowing how to use the system. As of February 2015, there are two easy ways to browse and search through the Duke Web Archives. First, the University Archives created a collection guide to the Duke-related websites. The 500 or so URLs are arranged loosely by organizational type and can be browsed here.

Because of the way the web is crawled, some sites may have been crawled that don’t appear in the collection guide. To help address this problem as well as provide another avenue into the collection, there is a search function provided by Archive-It and their Wayback Machine here. Using the Wayback search, one can search for any URL. If the site appears in our collection, even if only partially, the search will return it.
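As a rough illustration of looking up a URL this way, the sketch below builds an Archive-It Wayback address in Python. It assumes the commonly seen address pattern of collection number, then a timestamp (or `*` to list all captures), then the original URL. The function name and the collection number `0000` are placeholders invented for this example, not the actual identifier of the Duke collection, and the pattern should be checked against Archive-It's own documentation before relying on it.

```python
# Illustrative sketch only: the collection ID below is a placeholder, and the
# URL pattern is an assumption based on how Archive-It Wayback links
# commonly appear.
def archive_it_wayback_url(collection_id: str, site_url: str,
                           timestamp: str = "*") -> str:
    """Build a Wayback address for a site within one Archive-It collection.

    timestamp is either "*" (list every capture) or a YYYYMMDDhhmmss string
    requesting the capture closest to that moment.
    """
    return f"https://wayback.archive-it.org/{collection_id}/{timestamp}/{site_url}"


if __name__ == "__main__":
    # List all captures of a site in a (hypothetical) collection.
    print(archive_it_wayback_url("0000", "http://www.duke.edu/"))
    # Ask for the capture nearest to 1 February 2015.
    print(archive_it_wayback_url("0000", "http://www.duke.edu/", "20150201000000"))
```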

We are currently working on how to capture social media, so look for future posts on that subject.

5,000 Digital Books and Counting

The Internet Archive just reached an important milestone by digitizing 5,000 books at Duke. The 5,000th book, The British Album: In Two Volumes, contains poetry by “Della Crusca, Anna Matilda, Arley, Benedict, The Bard” and other writers on themes including love, horror, jealousy, and death, and is part of the general collections of the Rubenstein Library. The “Ode to Death” begins “THOU, whose remorseless rage, Nor vows, nor tears assuage, TRIUMPHANT DEATH!—to thee I raise, The bursting notes of dauntless praise!” The second volume can be found here.

The Scribe Scanner. Photo by Rita Johnston.

The Internet Archive scanning center at Duke University has been in operation for one and a half years and has digitized materials from collections within the Rubenstein Library, including the University Archives, Utopian Literature, and Confederate Imprints. I scan about 450 pages per hour and around 50 books a week. Most books in the public domain under 11 x 13 inches in size can be digitized on the Scribe book scanner, as well as pamphlets and loose documents.

Books digitized through the Internet Archive are usually available on the site by the next day, are full-text searchable, and can be read in a web browser or downloaded to a computer, e-book reader, or mobile device. You can find newly digitized Duke materials by clicking on the RSS feed link at the bottom right on this blog or by visiting the Duke University Libraries Internet Archive page. Patrons can request a book to be digitized by the Internet Archive by contacting Rubenstein Library staff.

Post contributed by Rita Johnston, Scribe scanner operator.