Category Archives: Behind the Scenes

Digitization Details: Thunderbolts, Waveforms & Black Magic

The technology for digitizing analog videotape is continually evolving. Thanks to increases in data transfer rates and hard drive write speeds, as well as the availability of more powerful computer processors at cheaper price points, the Digital Production Center recently decided to upgrade its video digitization system. Funding for the improved technology was procured by Winston Atkins, Duke Libraries Preservation Officer. Of all the materials we work with in the Digital Production Center, analog videotape has one of the shortest lifespans. Thus, it is high on the list of the Library's priorities for long-term digital preservation. Thanks, Winston!

Thunderbolt is leaving USB in the dust.

Due to innovative design, ease of use, and dominance within the video and filmmaking communities, we decided to go with a combination of products designed by Apple Inc. and Blackmagic Design. A new computer hardware interface recently adopted by Apple and Blackmagic, called Thunderbolt, allows the two companies' products to work seamlessly together at an unprecedented data-transfer speed of 10 gigabits per second, per channel. This is much faster than previously available interfaces such as FireWire and USB. Because video content incorporates an enormous amount of data, the improved data-transfer speed allows the computer to capture the video signal in real time, without interruption or dropped frames.
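
To put that bandwidth in perspective, here is a rough, back-of-the-envelope calculation in PHP of what a 10-bit uncompressed video stream demands. It is only a sketch: the frame dimensions, frame rate, and 4:2:2 sampling below are generic assumptions for NTSC standard definition, not the DPC's exact capture settings.

// Back-of-the-envelope data rate for 10-bit uncompressed 4:2:2 SD video.
// These numbers are generic NTSC assumptions, not our exact capture settings.
$width        = 720;    // active pixels per line
$height       = 486;    // active lines (NTSC)
$fps          = 29.97;  // frames per second
$bitsPerPixel = 20;     // 4:2:2 sampling at 10 bits = 2 samples per pixel x 10 bits

$bitsPerSecond = $width * $height * $bitsPerPixel * $fps;

printf("~%.0f Mbit/s, ~%.0f MB/s, ~%.0f GB per hour of tape\n",
    $bitsPerSecond / 1e6,
    $bitsPerSecond / 8 / 1e6,
    $bitsPerSecond / 8 / 1e9 * 3600);
// Prints roughly: ~210 Mbit/s, ~26 MB/s, ~94 GB per hour of tape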

Blackmagic Design hardware converts the analog video signal to SDI (serial digital interface).

Our new data stream works as follows. Once a tape is playing on an analog videotape deck, the output signal travels through an Analog to SDI (serial digital interface) converter. This converts the content from analog to digital. Next, the digital signal travels via SDI cable through a Blackmagic SmartScope monitor, which allows for monitoring via waveform and vectorscope readouts. A veteran television engineer I know will talk to you for days regarding the physics of this, but, in layperson terms, these readouts let you verify the integrity of the color signal, and make sure your video levels are not too high (blown-out highlights) or too low (crushed shadows). If there is a problem, adjustments can be made via analog video signal processor or time-base corrector to bring the video signal within acceptable limits.

waveform
Blackmagic's SmartScope allows for monitoring of the video's waveform. The signal must stay between 0 and 700 millivolts (left side) or clipping will occur, which means you need to get that videotape to the emergency room, STAT!

Next, the video content travels via SDI cable to a Blackmagic UltraStudio interface, which converts the signal from SDI to Thunderbolt, so it can now be recognized by a computer. The content then travels via Thunderbolt cable to a 27″ Apple iMac with a 3.5 GHz quad-core processor and NVIDIA GeForce graphics processor. Blackmagic's Media Express software writes the data, via Thunderbolt cable, to a G-Drive Pro external storage system as a 10-bit, uncompressed preservation master file. After capture, editing can be done using Apple's Final Cut Pro or QuickTime Pro. Compressed MP4 access derivatives are then batch-processed using Apple's Compressor software, or other utilities such as MPEG-Streamclip. Finally, the preservation master files are uploaded to Duke's servers for long-term storage. Unless there are copyright restrictions, the access derivatives will be published online.
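
For anyone who wants to script that last derivative step rather than run it through a GUI, here is a minimal batch-transcoding sketch. We use Compressor and MPEG-Streamclip in practice; ffmpeg appears here only as a freely available stand-in, and the folder paths and encoding settings are assumptions, not our production configuration.

// Walk a folder of uncompressed .mov masters and write compressed .mp4 access copies.
// ffmpeg, the paths, and the settings below are illustrative stand-ins.
$masterDir = '/Volumes/G-DRIVE/masters';  // hypothetical location of preservation masters
$accessDir = '/Volumes/G-DRIVE/access';   // hypothetical output folder

foreach (glob($masterDir . '/*.mov') as $master) {
    $access = $accessDir . '/' . basename($master, '.mov') . '.mp4';
    $cmd = sprintf('ffmpeg -i %s -c:v libx264 -crf 23 -c:a aac -b:a 192k %s',
        escapeshellarg($master), escapeshellarg($access));
    echo 'Transcoding ' . basename($master) . "...\n";
    shell_exec($cmd);
}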

Video digitization happens in real time. A one-hour tape is digitized in, well, one hour, which is more than enough Bob Hope jokes for anyone.

Leveling Up Our Document Viewer

This past week, we were excited to be able to publish a rare 1804 manuscript copy of the Haitian Declaration of Independence in our digital collections website. We used the project as a catalyst for improving our document-viewing user experience, since we knew our existing platforms just wouldn’t cut it for this particular treasure from the Rubenstein Library collection. In order to present the declaration online, we decided to implement the open-source Diva.js viewer. We’re happy with the results so far and look forward to making more strides in our ability to represent documents in our site as the year progresses.

Haitian Declaration of Independence as seen in Diva.js document viewer with full text transcription.

Challenges to Address

We have had two glaring limitations in providing access to digitized collections to date: 1) a less-than-stellar zoom & pan feature for images and 2) a suboptimal experience for navigating documents with multiple pages. For zooming and panning (see example), we use software called OpenLayers, which is primarily a mapping application. And for paginated items we’ve used two plugins designed to showcase image galleries, Galleria (example) and Colorbox (example). These tools are all pretty good at what they do, but we’ve been using them more as stopgap solutions for things they weren’t really created to do in the first place. As the old saying goes, when all you have is a hammer, everything looks like a nail.

Big (OR Zoom-Dependent) Things

A selection from our digitized Italian Cultural Posters. The “large” derivative is 11,000 x 8,000 pixels, a 28MB JPG.

Traditionally, as we digitize images, whether freestanding or components of a multi-page object, at the end of the process we generate three JPG derivatives per page: a thumbnail (helpful in search results or other item sets), a medium image (what you see on an item's webpage), and a large image (same dimensions as the preservation master, viewed via the 'all sizes' link). That's a common approach, but there are several places where it doesn't work so well. Some things we've digitized are big, as in "shoot them in sections with a camera and stitch the images together" big. And we've got several more materials like this waiting in the wings to be made available. A medium image doesn't always do these things justice, but good luck downloading and navigating a giant 28MB JPG when all you want to do is zoom in a little bit.
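
For readers curious what that derivative step looks like in code, here is a sketch using PHP's Imagick extension. The file names and pixel widths are purely illustrative, not our actual derivative specifications.

// Generate thumbnail, medium, and large JPGs from a master scan.
// File names and widths are illustrative only.
$master = new Imagick('page_0001_master.tif');   // hypothetical master scan

$sizes = array('thumb' => 200, 'medium' => 800, 'large' => 0);  // 0 = keep master dimensions
foreach ($sizes as $label => $width) {
    $jpg = clone $master;
    if ($width > 0) {
        $jpg->thumbnailImage($width, 0);  // height 0 preserves the aspect ratio
    }
    $jpg->setImageFormat('jpeg');
    $jpg->writeImage("page_0001_{$label}.jpg");
    $jpg->clear();
}
$master->clear();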

Likewise, an object doesn’t have to be large to really need easy zooming to be part of the viewing experience. You might want to read the fine print on that newspaper ad, see the surgeon general’s warning on that billboard, or inspect the brushstrokes in that beautiful hand-painted glass lantern slide.

And finally, it’s not easy to anticipate the exact dimensions at which all our images will be useful to a person or program using them. Using our data to power an interactive display for a media wall? A mobile app? A slideshow on the web? You’ll probably want images that are different dimensions than what we’ve stored online. But to date, we haven’t been able to provide ways to specify different parameters (like height, width, and rotation angle) in the image URLs to help people use our images in environments beyond our website.

A page from Mary McCornack Thompson’s 1908 travel diary, limited by its presentation via an image gallery.

Paginated Things

We do love our documentary photography collections, but a lot of our digitized objects are represented by more than just a single image. Take an 11-page piece of sheet music or a 127-page diary, for example. Those aren't just sequences or collections of images. Their pagination is pretty essential to their representation online, but a lot of what characterizes those materials is unfortunately lost in translation when we use gallery tools to display them.

The Intersection of (Big OR Zoom-Dependent) AND Paginated

Here’s where things get interesting and quite a bit more complicated: when zooming, panning, page navigation, and system performance are all essential to interacting with a digital object. There are several tools out there that support these various aspects, but very few that do them all AND do them well. We knew we needed something that did.

Our Solution: Diva.js

We decided to use the open-source Diva.js (Document Image Viewer with AJAX). Developed at the Distributed Digital Music Archives and Libraries Lab (DDMAL) at McGill University, it's "a Javascript frontend for viewing documents, designed to work with digital libraries to present multi-page documents as a single, continuous item" (see About page). We liked its combination of zooming, panning, and page navigation, as well as its extensibility. This Code4Lib article nicely summarizes how it works and why it was developed.

Setting up Diva.js required us to add a few new pieces to our infrastructure. The most significant was an image server (in our case, IIPImage) that could 1) deliver parts of a digital image upon request, and 2) deliver complete images at whatever size is requested via URL parameters.
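
To make that concrete, here is a sketch of the two kinds of requests such a server can answer. The host name and image path are hypothetical; FIF, WID, CVT, and JTL are standard IIPImage (iipsrv) query parameters.

// Two example IIPImage requests (hypothetical server and image path).
$iipBase = 'http://iip.example.edu/fcgi-bin/iipsrv.fcgi';
$image   = '/images/declaration/page_0001.tif';

// 1) The whole page resized to 800 pixels wide, returned as a JPG:
$resized = $iipBase . '?FIF=' . $image . '&WID=800&CVT=jpeg';

// 2) A single tile (tile 15 at resolution level 4), the kind of request Diva.js makes:
$tile = $iipBase . '?FIF=' . $image . '&JTL=4,15';

echo $resized . "\n" . $tile . "\n";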

Our Interface: How it Works

By default, we present a document in our usual item page template that provides branding, context, and metadata. You can scroll up and down to navigate pages, use Page Up or Page Down keys, or enter a page number to jump to a page directly. There’s a slider to zoom in or out, or alternatively you can double-click to zoom in / Ctrl-double-click to zoom out. You can toggle to a grid view of all pages and adjust how many pages to view at once in the grid. There’s a really handy full-screen option, too.

Fulltext transcription presented in fullscreen mode, thumbnail view.
Page 4, zoom level 4, with link to download.

It’s optimized for performance via AJAX-driven “lazy loading”: only the page of the document that you’re currently viewing has to load in your browser, and likewise only the visible part of that page image in the viewer must load (via square tiles). You can also download a complete JPG for a page at the current resolution by clicking the grey arrow.

We extended Diva.js by building a synchronized fulltext pane that displays the transcript of the current page alongside the image (and beneath it in full-screen view). That doesn’t come out-of-the-box, but Diva.js provides some useful hooks into its various functions to enable developing this sort of thing. We also slightly modified the styles.

A tile delivered by IIPImage server

Behind the scenes, we have pyramid TIFF images (one for each page), served up as JPGs by IIPImage server. These files comprise arrays of 256×256 JPG tiles for each available zoom level for the image. Let’s take page 1 of the declaration for example. At zoom level 0 (all the way zoomed out), there’s only one image tile: it’s under 256×256 pixels; level 1 is 4 tiles, level 2 is 12, level 3 is 48, level 4 is 176. The page image at level 5 (all the way zoomed in) includes 682 tiles (example of one), which sounds like a lot, but then again the server only has to deliver the parts that you’re currently viewing.
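
The arithmetic behind those tile counts is straightforward: each zoom level halves the pixel dimensions of the level above it, and each level is carved into 256×256 tiles. The sketch below reproduces the counts listed above, assuming full-resolution dimensions of roughly 7900 × 5500 pixels; those dimensions are an assumption chosen for illustration, not the declaration's measured size.

// Tiles per zoom level for a pyramid image cut into 256x256 tiles.
// The full-resolution dimensions are assumed for illustration.
$fullWidth  = 7900;
$fullHeight = 5500;
$tileSize   = 256;
$maxZoom    = 5;   // level 5 = fully zoomed in

for ($level = 0; $level <= $maxZoom; $level++) {
    $scale  = pow(2, $maxZoom - $level);          // each step down halves the dimensions
    $width  = ceil($fullWidth  / $scale);
    $height = ceil($fullHeight / $scale);
    $tiles  = ceil($width / $tileSize) * ceil($height / $tileSize);
    printf("zoom level %d: %d tiles\n", $level, $tiles);
}
// Prints 1, 4, 12, 48, 176, and 682 tiles for levels 0 through 5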

Every item using Diva.js also needs to load a JSON stream including the dimensions for each page within the document, so we had to generate that data. If there’s a transcript present, we store it as a single HTML file, then use AJAX to dynamically pull in the part of that file that corresponds to the currently-viewed page in the document.
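
Generating that dimension data can be as simple as walking the page images and recording their sizes. The sketch below emits a simplified JSON structure; the real measurement file Diva.js consumes follows its own documented schema, so treat the directory, file names, and field names here as placeholders.

// Record width and height for every page image and write them out as JSON.
// The paths and JSON keys are placeholders, not Diva.js's actual schema.
$pages = glob('/data/declaration/pages/*.tif');
sort($pages);

$measurements = array();
foreach ($pages as $page) {
    list($width, $height) = getimagesize($page);  // reads the pixel dimensions
    $measurements[] = array(
        'filename' => basename($page),
        'width'    => $width,
        'height'   => $height,
    );
}

file_put_contents('declaration_pages.json', json_encode($measurements));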

Diva.js & IIPImage Limitations

It’s a good interface, and is the best document representation we’ve been able to provide to date. Yet it’s far from perfect. There are several areas that are limiting or that we want to explore more as we look to make more documents available in the future.

Out of the box, Diva.js doesn't support page metadata, transcriptions, or search & retrieval within a document. We do display a synchronized transcript, but there's currently no mapping between the text and the location within each page where each word appears, nor can you perform a search and discover which pages contain a given keyword. Other folks using Diva.js are working on robust applications that handle these kinds of interactions, but the degree to which they must customize the application is high. See, for example, the Salzinnes Antiphonal, a 485-page liturgical manuscript with text and music, or a prototype for the Liber Usualis, a 2,000+ page manuscript using optical music recognition to encode melodic fragments.

Diva.js also has discrete zooming, which can feel a little jarring when you jump between zoom levels. It’s not the smooth, continuous zoom experience that is becoming more commonplace in other viewers.

With the IIPImage server, we’ll likely re-evaluate using Pyramid TIFFs vs. JPEG2000s to see which file format works best for our digitization and publication workflow. In either case, there are several compression and caching variables to tinker with to find an ideal balance between image quality, storage space required, and system performance. We also discovered that the IIP server unfortunately strips out the images’ ICC color profiles when it delivers JPGs, so users may not be getting a true-to-form representation of the image colors we captured during digitization.

Next Steps

Launching our first project using Diva.js gives us a solid jumping-off point for expanding our ability to provide useful, compelling representations of our digitized documents online. We’ll assess how well this same approach would scale to other potential projects and in the meantime keep an eye on the landscape to see how things evolve. We’re better equipped now than ever to investigate alternative approaches and complementary tools for doing this work.

We’ll also engage more closely with our esteemed colleagues in the Duke Collaboratory for Classics Computing (DC3), who are at the forefront of building tools and services in support of digital scholarship. Well beyond supporting discovery and access to documents, their work enables a community of scholars to collaboratively transcribe and annotate items (an incredible–and incredibly useful–feat!). There’s a lot we’re eager to learn as we look ahead.

Digitization Details: Before We Push the “Scan” Button

The Digital Production Center at the Perkins Library has a clearly stated mission to “create digital captures of unique, valuable, or compelling primary resources for the purpose of preservation, access, and publication.”  Our mission statement goes on to say, “Our operating principle is to achieve consistent results of a measurable quality. We plan and perform our work in a structured and scalable way, so that our results are predictable and repeatable, and our digital collections are uniform.”

That’s a mouthful!


What it means is that the images have to be consistent not only from image to image within a collection but also from collection to collection over time.  And if that isn't complex enough, this has to be done using many different capture devices.  Each capture device has its own characteristics and records and reproduces color in its own way.

How do we produce consistent images?

There are many variables to consider when solving the puzzle of “consistent results of a measurable quality.”  First, we start with the viewing environment, then move to monitor calibration and profiling, and end with capture device profiling.  All of these variables play a part in producing consistent results.

Full-spectrum lighting is used in the Digital Production Center to create a neutral environment for viewing the original material.  Lighting that is not full spectrum often has a blue, magenta, green, or yellow color shift, which we usually don't notice because our eyes adjust effortlessly.  In the image below you can see the difference between tungsten lighting and neutral lighting.

Tungsten light (left) Neutral light (right)

Our walls are also painted 18 percent gray, which is neutral, so that no color is reflected from the walls onto the original material while we compare it to the digital image.

Now that we have a neutral viewing environment, the next variable to consider is the computer monitors used to view our digitized images.  We use a spectrophotometer (straight out of the Jetsons, right?) made by X-Rite to measure the color accuracy, luminance, and contrast of the monitor.  This hardware/software combination attaches the spectrophotometer to the computer screen to read the brightness (luminance), contrast, white point, and gamma of the monitor, then makes adjustments for optimal viewing.  This is called monitor calibration.  The software then displays a series of color patches with known RGB values, which the spectrophotometer measures, and records the differences.  The result is an ICC display profile.  This profile is saved to the operating system and is used to translate colors from what the monitor natively produces to a more accurate color representation.

Now our environment is neutral and our monitor is calibrated and profiled.  The next step in the process is to profile the capture device, whether it is a high-end digital scan back like the Phase One or BetterLight or an overhead scanner like a Zeutschel.  From Epson flatbed scanners to Nikon slide scanners, all of these devices can be calibrated in the same way.  With all of the auto settings on the scanner turned off, a color target is digitized on the device you wish to calibrate.  The swatches on the color target are known values, similar to the series of color patches used for profiling the monitor.  The digitized target is fed to the profiling software.  Each patch is measured and compared against its known value.  The differences are recorded and the result is an ICC device profile.
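
At the heart of that comparison is a simple distance measurement: how far has a digitized patch drifted from its known value?  The sketch below uses the basic CIE76 delta-E formula on L*a*b* values.  The patch numbers are made up for illustration, and real profiling software works from the full target reference data (and typically uses a more refined delta-E formula).

// Distance between a patch's known (reference) value and its measured value,
// using the simple CIE76 delta-E formula on L*a*b* coordinates.
function deltaE76(array $reference, array $measured) {
    return sqrt(
        pow($reference[0] - $measured[0], 2) +   // L* (lightness)
        pow($reference[1] - $measured[1], 2) +   // a* (green-red axis)
        pow($reference[2] - $measured[2], 2)     // b* (blue-yellow axis)
    );
}

$referencePatch = array(62.7, 36.1, 57.1);   // made-up L*a*b* values for one target patch
$measuredPatch  = array(61.9, 37.2, 55.8);

printf("delta-E: %.2f\n", deltaE76($referencePatch, $measuredPatch));
// A delta-E near 1 is barely perceptible; a large value signals a problem with the device.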

Now that we have a neutral viewing environment for viewing the original material, our eyes don’t need to compensate for any color shift from the overhead lights or reflection from the walls.  Our monitors are calibrated/profiled so that the digitized images display correctly and our devices are profiled so they are able to produce consistent images regardless of what brand or type of capture device we use.

Gretag Macbeth color checker

During our daily workflow we use a Gretag Macbeth color checker to measure the output of the capture devices every day before we begin digitizing material, to verify that each device is still working properly.

All of this work is done before we push the “scan” button to ensure that our results are predictable and repeatable, measurable and scalable.  Amen.

Mapping the Broadsides Collection, or, how to make an interactive map in 30 minutes or less

Ever find yourself with a pile of data that you want to plot on a map? You’ve got names of places and lots of other data associated with those places, maybe even images? Well, this happened to me recently. Let me explain.

A few years ago we published the Broadsides and Ephemera digital collection, which consists of over 4,100 items representing almost every U.S. state. When we cataloged the items in the collection, we made sure to identify, if possible, the state, county, and city of each broadside. We put quite a bit of effort into this part of the metadata work, but recently I got to thinking…what do we have to show for all of that work? Sure, we have a browseable list of place terms and someone can easily search for something like “Ninety-Six, South Carolina.” But, wouldn’t it be more interesting (and useful) if we could see all of the places represented in the Broadsides collection on one interactive map? Of course it would.

So, I decided to make a map. It was about 4:30pm on a Friday and I don’t work past 5, especially on a Friday. Here’s what I came up with in 30 minutes, a Map of Broadside Places. Below, I’ll explain how I used some free and easy-to-use tools like Excel, Open Refine, and Google Fusion Tables to put this together before quittin’ time.

Step 1: Get some structured data with geographic information
Mapping only works if your data contain some geographic information. You don’t necessarily need coordinates, just a list of place names, addresses, zip codes, etc. It helps if the geographic information is separated from any other data in your source, like in a separate spreadsheet column or database field. The more precise, structured, and consistent your geographic data, the easier it will be to map accurately. To produce the Broadsides Map, I simply exported all of the metadata records from our metadata management system (CONTENTdm) as a tab delimited text file, opened it in Excel, and removed some of the columns that I didn’t want to display on the map.

Step 2: Clean up any messy data
For the best results, you’ll want to clean your data. After opening my tabbed file in Excel, I noticed that the place name column contained values for country, state, county, and city all strung together in the same cell but separated with semicolons (e.g. United States; North Carolina; Durham County (N.C.); Durham (N.C.)). Because I was only really interested in plotting the cities on the map, I decided to split the place name column into several columns in order to isolate the city values.

To do this, you have a couple of options. You can use Excel's "text to columns" feature, instructing it to split the column into new columns based on the semicolon delimiter, or you can load your tabbed file into OpenRefine and use its "split column into several columns" feature. Both tools work well for this task, but I prefer OpenRefine because it includes several more advanced data cleaning features. If you've never used OpenRefine before, I highly recommend it. Its "cluster and edit" feature will blow your mind (if you're a metadata librarian).
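
The same split can also be scripted if you would rather not do it in Excel or OpenRefine. Here is a quick PHP sketch using the example place string from above; the column handling is simplified to a single value.

// Split a semicolon-delimited place string and isolate the city value.
$place = 'United States; North Carolina; Durham County (N.C.); Durham (N.C.)';

$parts = array_map('trim', explode(';', $place));
list($country, $state, $county, $city) = array_pad($parts, 4, '');

echo $city . "\n";   // "Durham (N.C.)" -- the value we actually want to plot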

Step 3: Load the cleaned data into Google Fusion Tables
Google Fusion Tables is a great tool for merging two or more data sets and for mapping geographic data. You can access Fusion Tables from your Google Drive (formerly Google Docs) account. Just upload your spreadsheet to Fusion Tables and typically the application will automatically detect if one of your columns contains geographic or location data. If so, it will create a map view in a separate tab, and then begin geocoding the location data.


If Fusion Tables doesn’t automatically detect the geographic data in your source file, you can explicitly change a column’s data type in Fusion Tables to “Location” to trigger the geocoding process. Once the geocoding process begins, Fusion Tables will process every place name in your spreadsheet through the Google Maps API and attempt to plot that place on the map. In essence, it’s as if you were searching for each one of those terms in Google Maps and putting the results of all of those searches on the same map.

Once the geocoding process is complete, you’re left with a map that features a placemark for every place term the service was able to geocode. If you click on any of the placemarks, you’ll see a pop-up information window that, by default, lists all of the other metadata elements and values associated with that record. You’ll notice that the field labels in the info window match the column headers in your spreadsheet. You’ll probably want to tweak some settings to make this info window a little more user-friendly.


Step 4: Make some simple tweaks to add images and clickable links to your map
To change the appearance of the information window, select the "change" option under the map tab, then choose "change info window." From here, you can add or remove fields from the info window display, change the data labels, or add some custom HTML code to turn the titles into clickable links or add thumbnail images. If your spreadsheet contains any sort of URL, or an identifier that you can use to reliably construct a URL, adding these links and images is quite simple. You can call any value in your spreadsheet by referencing the column name in braces (e.g. {Identifier-DukeID}). Below is the custom HTML code I used to style the info window for my Broadsides map. Notice how the data in the {Identifier-DukeID} column is used to construct the links for the titles and image thumbnails in the info window.

[Screenshot: the custom HTML used to style the Broadsides map's info window]

Step 5: Publish your map
Once you're satisfied with your map, you can share a link to it or embed the map in your own web page or blog…like this one. Just choose Tools -> Publish to grab the link or copy and paste the HTML code into your web page or blog.

To learn more about creating maps in Google Fusion Tables, see this Tutorial or contact the Duke Library’s Data and GIS Services.

Can You (Virtually) Dig It?

A group from Duke Libraries recently visited Dr. Maurizio Forte’s Digital Archaeology Initiative (a.k.a. “Dig@Lab”) to learn more about digital imaging of three-dimensional objects and to explore opportunities for collaboration between the lab and the library.

These glasses and stylus allow you to disassemble the layers of a virtual site and rearrange and resize each part.

Dr. Forte (a Professor of Classical Studies, Art, and Visual Studies) and his colleagues were kind enough to demonstrate how they are using 3D imaging technology to “dig for information” in simulated archaeological sites and objects.  Their lab is a fascinating blend of cutting-edge software and display interfaces, such as the Unity 3D software being used in the photo above, and consumer video gaming equipment (recognize that joystick?).

Zeke tries not to laugh as Noah dons the virtual reality goggles.

Using the goggles and joystick above, we took turns exploring the streets and buildings of the ancient city of Regium Lepidi in Northern Italy.  The experience was extremely immersive and somewhat disorienting, from getting lost in narrow alleys to climbing winding staircases for an overhead view of the entire landscape.  The feeling of vertigo from the roof was visceral.  None of us took the challenge to jump off of the roof, which apparently you can do (and which is also very scary, according to the lab researchers).  After taking the goggles off, I felt a heaviness and solidity return to my body as I readjusted to the "real world" around me, similar to the sensation of gravity after stepping off a trampoline.

Alex–can you hear me?

The Libraries and Digital Projects team look forward to working more with Dr. Forte and bringing 3D imaging into our digital collections.

More information about the lab’s work can be found at:

http://sites.duke.edu/digatlab/

 

Mike views a mathematically modeled 3D rendering of a tile mosaic.

(Photos by Molly Bragg and Beth Doyle)

A Sketch for Digital Projects at DUL

The inevitable result, when an IT manager suffering from Acute Administrivia-Induced Ennui gets ahold of dia and starts browsing The Noun Project.

We have all these plans and do all this work with the digital collections and the projects and what have you. Plan-plan-plan, work-work-work, and plan and work some more. Some things get done, others don’t, as we journey for that distant horizon, just on the other side of which lies, “Hooray, we finally got it!”

I started to draw a map for the next phases of that journey a few days ago, and it was going to be really serious. All these plans – repository migration, exhibit platform, workflow management, ArchivesSpace – would be represented in this exacting diagram of our content types and platforms and their interrelations. It might even have multiple sheets, which would layer atop one another like transparencies, to show the evolution of our stuff over time. UML books more than ten years old would be dusted off in the production of this diagram.

Then my brain started to hurt, and I found myself doodling in response. I started having fun with it. You might even say I completely dorked out on it. Thus you have the "Sketch for an almanac of digital projects at Duke University Libraries" above.

Placing whimsical sea monster icons on a field of faux design elements took a lot of my time this week, so I’m afraid I’m not able to write any more about the diagram right now. However, provided it doesn’t prove a source of embarrassment and regret, I might revisit it in the near future.

Using Google Spreadsheets with Timelines

Doris Duke timeline

We’ve been making use of the fabulous Timeline.js library for a while now. The first timeline we published, compiled by Mary Samouelian about the life of Doris Duke, uses Timeline.js to display text and images in an elegant interactive format. Back then the library was called Verite Timeline and our implementation involved parsing XML files using Python to render out the content on the page. And in general, this approach worked great. However, managing and updating the XML files isn’t all that easy. Things also get complicated when more than one person wants to work on them — especially at the same time.

Enter Google Spreadsheets! Timeline.js is now designed to easily grab data from a publicly-published Google spreadsheet and create great looking output out of the box. Managing the timeline data in the spreadsheet is a huge step up from XML files in terms of ease of use for our researchers and for maintainability. And it helps that librarians love spreadsheets. If someone errantly enters some bad data, it’s simple to undo that particular edit as all changes are tracked by default. If a researcher wants to add a new timeline event, they can easily go into the spreadsheet and enter a new row. Changes are reflected on the live page almost immediately.

Spreadsheet data

Timeline.js provides a very helpful template for getting started with entering your data. They require that you include certain key columns and that the columns be named following their data schema. You are free to add additional columns, however, and we’ve played around with doing so in order to include categorical descriptions and the like.

Here is a sample of some data from our Doris Duke timeline.

Data for Doris Duke Timeline

For entries with more than one image, we don’t include a ‘Start Date’ which means Timeline.js will skip over them. We then render these out as smaller thumbnails on our timeline page.

Images on Doris Duke timeline page

Going all-in with spreadsheets

We've published our subsequent timelines using a combination of Google spreadsheet data to generate the Timeline.js output and XML files to load in and display relational data (using the EAC-CPF standard), with Python generating the pages. However, for our latest timeline on the J. Walter Thompson Company (preview the dev version), we've decided to house all of the data (including the CPF relations) in a Google spreadsheet and use PHP to parse everything. This approach means that we no longer need to rely on the XML files, so our researchers can quickly make updates to the timeline pages. We can easily convert the spreadsheet data back into an XML file if the need arises.

J. Walter Thompson Company Timeline

Code snippets

Note: there's an updated syntax for newly created spreadsheets (see the update at the end of this post).

We’re taking advantage of the Google spreadsheet data API that allows for the data to easily be parsed as JSON. Querying the spreadsheet in PHP looks something like this:

// Create a stream context so the request times out instead of hanging
// (the 10-second limit here is just an example value)
$ctx = stream_context_create(array('http' => array('timeout' => 10)));

$theURL = "http://spreadsheets.google.com/feeds/list/[your-spreadsheet-key]/" .
          "od6/public/values?alt=json&callback=displayContent";

$theJSON = file_get_contents($theURL, false, $ctx); // the $ctx variable sets a timeout limit

$theData = json_decode($theJSON, TRUE);

And then we can loop through and parse out the data using something like this:

foreach ($theData['feed']['entry'] as $item) {

	echo $item['gsx$startdate']['$t'];
	// Note that the column names in the spreadsheet are targeted by adding 'gsx$'
	// plus the column name in lowercase with no spaces.
	// You may also want to use 'strtotime' on the dates so that you can
	// transform them using 'date'.

	echo $item['gsx$enddate']['$t'];

	echo $item['gsx$headline']['$t'];

	echo $item['gsx$text']['$t'];

	// ... and so on
}

One important thing to note is that, by default, the above query structure only gets data from the primary worksheet in the spreadsheet (which is targeted using the od6 variable). Should you want to target other worksheets, you'll need to know which 'od' variable to use in your query. You can view the full structure of your spreadsheet by using a URL like this:

https://spreadsheets.google.com/feeds/worksheets/[your-spreadsheet-key]/public/basic

Then match up the ‘od’ instance to the correct content and query it.

Timelines and Drupal

We’ve also decided to integrate the publishing of timelines into our Drupal CMS, which drives the Duke University Libraries website, by developing a custom module. Implementing the backend code as a module will make it easy to apply custom templates in the future so that we can change the look and feel of a timeline for a given context. The module isn’t quite finished yet, but it should be ready in the next week or two. All in all, this new process will allow timelines to be created, published, and updated quickly and easily.


UPDATE

I recently learned that sometime in early 2014, Google changed the syntax for published spreadsheet URLs and is no longer using the spreadsheet key as an identifier. As such, the syntax for retrieving a JSON feed has changed.

The new syntax looks like this:

https://spreadsheets.google.com/feeds/cells/[spreadsheet-ID]/[spreadsheet-index]/public/basic?alt=json&callback=displayContent

‘spreadsheet-ID’ is the string of text that shows up when you publish your spreadsheet:

https://docs.google.com/spreadsheets/d/[spreadsheet-ID]/pubhtml

You can see 'spreadsheet-index' when editing your spreadsheet – it's the value assigned to 'gid'; in the case below, it's '0':

https://docs.google.com/spreadsheets/d/[spreadsheet-ID]/edit#gid=0

I hope this saves you some of the frustration of hunting for documentation on the new syntax.

Post contributed by Michael Daul

Digitization Details: Bringing Duke Living History Into Your Future

Recently, I digitized 123 videotapes from the Duke University Living History Program. Beginning in the early 1970s, Duke University faculty members conducted interviews with prominent world leaders, politicians, and activists. The first interviews were videotaped in Perkins Library at a time when video was groundbreaking technology, almost a decade before consumer-grade VCRs started showing up in people's living rooms. Some of the interviews begin with a visionary introduction by Jay Rutherfurd, who championed the program:

“At the W. R. Perkins library, in Duke University, we now commit this exciting experiment in electronic journalism into your future. May it illuminate well, educate wisely, and relate meaningfully, for future generations.”

Clearly, the “future” that Mr. Rutherfurd envisioned has arrived. Thanks to modern technology, we can now create digital surrogates of these videotaped interviews for long-term preservation and access. The subjects featured in this collection span a variety of generations, nationalities, occupations and political leanings. Interviewees include Les Aspin, Ellsworth Bunker, Dr. Samuel DuBois Cook, Joseph Banks Rhine, Jesse Jackson, Robert McNamara, Dean Rusk, King Mihai of Romania, Terry Sanford, Judy Woodruff, Angier Biddle Duke and many more. The collection also includes videotapes of speeches given on the Duke campus by Ronald Reagan, Abbie Hoffman, Bob Dole, Julian Bond and Elie Wiesel.

Residue wiped off the head of a U-matic playback deck, the result of sticky-shed syndrome.

Many of the interviews were recorded on 3/4″ videotape, also called “U-matic.” Invented by Sony in 1969, the U-matic format was the first videotape to be housed inside a plastic cassette for portability, and would soon replace film as the primary television news-gathering format. Unfortunately, most U-matic tapes have not aged well. After decades in storage, many of the videotapes in our collection now have sticky-shed syndrome, a condition in which the oxide that holds the visual content is literally flaking off the polyester tape base, and is gummy in texture. When a videotape has sticky-shed, not only will it not play correctly, the residue can also clog up the tape heads in the U-matic playback deck, then transfer the contaminant to other tapes played afterwards in the same deck. A U-matic videotape player in good working order is now an obsolete collector’s item, and our tapes are fragile, so we came up with a solution: throw those tapes in the oven!

After baking, the cookies (I mean U-matic videotapes) are ready for digitization!

At first that may sound reckless, but baking audio and videotapes at relatively low temperatures for an extended period of time is a well-tested method for minimizing the effects of sticky-shed syndrome. The Digital Production Center recently acquired a scientific oven, and after initial testing, we baked each Duke Living History U-matic videotape at 52° Celsius (about 125° Fahrenheit) for about 10 hours. Baking the videotapes temporarily drove out the moisture that had accumulated in the binder and made them playable for digitization. About 90% of our U-matic tapes played well after baking. Many of them were unplayable beforehand.

The Digital Production Center’s video rack and routing system.

After giving the videotapes time to cool down, we digitize each tape, in real time, as an uncompressed file (.mov) for long-term preservation. Afterwards, we make a smaller, compressed version (.mp4) of the same recording, which is our access copy. Our U-matic decks are housed in an efficiently designed rack system, which also includes other obsolete videotape formats like VHS, Betacam, and Hi8. Centralized audio and video routers allow us to quickly switch between formats while ensuring a clean, balanced, and accurate conversion from analog to digital. Combining the art of analog tape baking with modern video digitization, the Digital Production Center is able to rescue the content from the videotapes before the magnetic tape ages and degrades any further. While the U-matic tapes are nearing the end of their lifespan, the digital surrogates will potentially last for centuries to come. We are able to benefit from Mr. Rutherfurd's exciting experiment into our future, and carry it forward… into your future. May it illuminate well, educate wisely, and relate meaningfully, for future generations.

 

Post contributed by Alex Marsh

 

Digitization Details: Sidney D. Gamble’s Lantern Slides

I have worked in the Digital Production Center since March of 2005 and I've seen a lot of digital collections published in my time here.  I have seen so many images that sometimes it is difficult to say which collection is my favorite, but the Sidney D. Gamble Photographs have always been near the top.

The Sidney D. Gamble Photographs are an amazing collection of black-and-white photographs of daily life in China taken between 1908 and 1932.  These documentary-style images of urban and rural life, public events, architecture, religious statuary, and the countryside really resonate with me for their unposed, moment-in-time feel.  Recently the Digital Collections Implementation Team was tasked with digitizing a subset of lantern slides from this collection.  What is a lantern slide, you might ask?

Herding Ducks

A lantern slide is a photographic transparency which is glass-mounted and often hand-colored for projection by a "magic lantern."  The magic lantern was the earliest form of slide projector, which, in its earliest incarnation, used candles to project painted slides onto a wall or cloth screen.   The projectionist was often hidden from the audience, making it seem more magical.   By the time the 1840s rolled around, William and Frederick Langenheim had developed photographic processes that enabled a glass plate negative to be printed onto another glass plate by a contact method, creating a positive.  These positives were then painted in the same fashion that the earlier slides were painted (think Kodachrome).  The magic lantern predates the school slate and the chalkboard for use in the classroom.

After working with and enjoying the digitization of the nitrate negatives from the Sidney D. Gamble Photographs, it has been icing on the cake to work with the lantern slides from the same collection so many years later.  While the original black-and-white images resonate with me, the lantern slides have added a whole new dimension to the experience.  On one hand, the black-and-white images lend a sense of history and times past; on the other, the vivid colors of the lantern slides draw me into the scene as if it were the present.

Barbers on Bund

I am in awe of the amount of work and the variety of skill sets required to create a collection such as this.   Sidney D. Gamble, an amateur photographer, trekked across China over four trips spanning 24 years, photographing and processing nitrate negatives in the field without a traditional darkroom, all the while taking notes and labeling the negatives.  Then he came home, created the glass plate positives, and hand-colored over 500 of them.  For being an "amateur photographer," Gamble's images are striking.  The type of camera he used takes skill and knowledge to achieve a reasonably correct exposure.  Processing the film is technically challenging in a traditional darkroom and is made much more difficult in the field.  Taking enough notes while shooting, processing, and traveling so they make sense as a collection is a feat in itself.  The transfer from negative film to positive glass plates on such a scale is a tedious and technical venture.  Then to hand-paint all of the slides takes additional skill and tools.  All of this makes digitization of the material look like child's play.

An inventory of the hand-colored slides was created before digitization began.  Any hand-colored slides with existing black-and-white negatives were identified so they can be displayed together online.  A color-balanced light box was used to illuminate the lantern slides, and a Phase One P65 reprographic camera was used in conjunction with a precision Kaiser copy stand to capture them.   All of the equipment used in the Digital Production Center is color-calibrated and profiled so consistent results can be achieved from capture to capture.  This removes the majority of the subjective decision-making from the digitization process.  Sidney D. Gamble had many variables to contend with to produce the lantern slides, much as the Digital Collections Implementation Team deals with many variables when publishing a digital collection.  From conservation of the physical material, digitization, metadata, and interface design to the technology used to deliver the images online and the servers and network that connect everything to make it happen, there are plenty of variables.  They are just different variables.

Nowadays we photograph and share the minutiae of our lives.  When Sidney Gamble took his photographs he had to be much more deliberate.   I appreciate his deliberateness as much as I appreciate all the people involved in publishing collections.  I look forward to publication of the Sidney D. Gamble lantern slides in the near future and hope you will enjoy this collection as much as I have over the years.

Post Contributed by Mike Adamo

A Day in the Life of Digital Collections

I joined the digital collections team in early December 2013, and from day one I have been immersed in the details of our long list of unique projects, all with their own sets of schedules, stakeholders, and resource needs.  My task has also been to evaluate and improve our overall workflow, create outreach and promotional opportunities (like this blog!), and handle really anything else that comes up related to digital projects. What does that all mean in terms of day-to-day work? It means I attend A LOT of meetings.

Just another day in the Digital Production Center imaging the Haitian Declaration of Independence!

Luckily most of my meetings are absolutely fascinating and revolve around very exciting projects and materials.  Here are some of my favorites from the last few weeks.  Truth be told, I didn't go to all of these in one day, but they are a pretty representative sample of the types of meetings I attend every day.

Haitian Declaration of Independence:  Perhaps you have heard that the Rubenstein Library has a copy of this historic document?  The Digital Collections Implementation Team recently met with RL curator Will Hansen to discuss digitizing and providing access to the declaration, and of course he brought it with him.  It's not that large, to be honest, but very impressive.  In DPPS we are using this project as a catalyst to implement an image server and a new document viewing tool to provide better access to documents like the declaration.

 

“Girl Lost in Thought at Fast Food Counter” Image from the William Gedney Digital Collection

Workflows, Workflows, Workflows:  Every week I attend operational meetings with both the Duke Digital Collections Implementation Team and the Digital Production Center to discuss work in progress, scheduling, new projects, and how to perfect our ever-changing workflows.  Along with my colleagues from Digital Projects and Production Services, I also presented at our monthly ITS meeting, First Wednesday, on our overall process and some of the changes we have been making since I came on board.  Check out all of our slides!

Gedney:  Duke Digital Collections patrons are no strangers to the William Gedney Photographs and Writings digital collection.  The physical collection is being re-processed and we will be digitizing more of it later in 2014.  This is a large project with a long timeline, but we are so excited to provide access to more materials in one of our most popular digital collections.

 

Early Greek MS:  The Rubenstein Library has a large collection of early Greek manuscripts.  Many items have already been digitized, and Rubenstein Technical Services is in the process of cataloging them.  Once cataloging is complete, we will be able to plan the publishing aspects of this project.  Both DPPS and our colleagues in the Duke Collaboratory for Classics Computing are thrilled to provide access to digital versions of these items.

Stay tuned for continuing developments in these and all the other projects we have in progress!

A scanned image from one of the Greek Manuscripts in the Rubenstein collection.

 

Post authored by Molly Bragg