Noise is an inescapable part of our sonic environment. As I sit at my quiet library desk writing this, I can hear the undercurrent of the building’s pipes and HVAC systems, the click-clack of the Scribe overhead book scanner, footsteps from the floor above, doors opening and closing in the hallway, and the various rustlings of my own fidgeting. In our daily lives, our brains tune out much of this extraneous noise to help us focus on the task at hand and be alert to sounds conveying immediately useful information: a colleagues’s voice, a cell-phone buzz, a fire alarm.
When sound is recorded electronically, however, this tuned-out noise is often pushed to the foreground. This may be due to the recording conditions (e.g. a field recording done on budget equipment in someone’s home or outdoors) or inherent in the recording technology itself (electrical interference, mechanical surface noise). Noise is always present in the audio materials we digitize and archive, many of which are interviews, oral histories, and events recorded to cassette or open reel tape by amateurs in the field. Our first goal is to make the cleanest and most direct analog-to-digital transfer possible, and then save this as our archival master .wav file with no alterations. Once this is accomplished, we have some leeway to work with the digital audio and try to create a more easily listenable and intelligible access copy.
I recently started experimenting with Steinberg WaveLab software to clean up digitized recordings from the Larry Rubin Papers. This collection contains some amazing documentation of Rubin’s work as a civil rights organizer in the 1960s, but the ever-present hum & hiss often threaten to obscure the content. I worked with two plug-ins in WaveLab to try to mitigate the noise while leaving the bulk of the audio information intact.
Even if you don’t know it by name, anyone who has used electronic audio equipment has probably heard the dreaded 60 Cycle Hum. This is a fixed low-frequency tone that is related to our main electric power grid operating at 120 volts AC in the United States. Due to improper grounding and electromagnetic interference from nearby wires and appliances, this current can leak into our audio signals and appear as the ubiquitous 60 Hz hum (disclaimer–you may not be able to hear this as well on tiny laptop speakers or earbuds). Wavelab’s De-Buzzer plug-in allowed me to isolate this troublesome frequency and reduce its volume level drastically in relation to the interview material. Starting from a recommended preset, I adjusted the sensitivity of the noise reduction by ear to cut unwanted hum without introducing any obvious digital artifacts in the sound.
Similarly omnipresent in analog audio is High-Frequency Hiss. This wash of noise is native to any electrical system (see Noise Floor) and is especially problematic in tape-based media where the contact of the recording and playback heads against the tape introduces another level of “surface noise.” I used the De-Noiser plug-in to reduce hiss while being careful not to cut into the high-frequency content too much. Applying this effect too heavily could make the voices in the recording sound dull and muddy, which would be counterproductive to improving overall intelligibility.
Listen to the before & after audio snippets below. While the audio is still far from perfect due to the original recording conditions, conservative application of the noise reduction tools has significantly cleaned up the sound. It’s possible to cut the noise even further with more aggressive use of the effects, but I felt that would do more harm than good to the overall sound quality.
I was fairly pleased with these results and plan to keep working with these and other software tools in the future to create digital audio files that meet the needs of archivists and researchers. We can’t eliminate all of the noise from our media-saturated lives, but we can always keep striving to keep the signal-to-noise ratio at manageable and healthy levels.
Back in March I wrote a blog post about the Library exploring Multispectral Imaging (MSI) to see if it was feasible to bring this capability to the Library. It seems that all the stars have aligned, all the ducks have been put in order, the t’s crossed and the i’s dotted because over the past few days/weeks we have been receiving shipments of MSI equipment, scheduling the painting of walls and installation of tile floors and finalizing equipment installation and training dates (thanks Molly!). A lot of time and energy went into bringing MSI to the Library and I’m sure I speak for everyone involved along the way that WE ARE REALLY EXCITED!
I won’t get too technical but I feel like geeking out on this a little… like I said… I’m excited!
Lights, Cameras and Digital Backs: To maximize the usefulness of this equipment and the space it will consume we will capture both MSI and full color images with (mostly) the same equipment. MSI and full color capture require different light sources, digital backs and software. In order to capture full color images, we will be using the Atom Lighting and copy stand system and a Phase One IQ180 80MP digital back from Digital Transitions. To capture MSI we will be using narrowband multispectral EurekaLight panels with a Phase One IQ260 Achromatic, 60MP digital back. These two setups will use the same camera body, lens and copy stand. The hope is to set the equipment up in a way that we can “easily” switch between the two setups.
The computer that drives the system: Bill Christianson of R. B. Toth Associates has been working with Library IT to build a work station that will drive both the MSI and full color systems. We opted for a dual boot system because the Capture One software that drives the Phase One digital back for capturing full-color images has been more stable in a Mac environment and MSI capture requires software that only runs on a Windows system. Complicated, but I’m sure they will work out all the technical details.
The Equipment (Geek out):
Phase One IQ260 Achromatic, 60MP Digital Back
Phase One IQ180, 80MP Digital Back
Phase One iXR Camera Body
Phase One 120mm LS Lens
DT Atom Digitization Bench -Motorized Column (received)
DT Photon LED 20″ Light Banks (received)
Narrowband multispectral EurekaLight panels
Fluorescence filters and control
Workstation (in progress)
Blackout curtains and track (received)
The space: We are moving our current Phase One system and the MSI system into the same room. While full-color capture is pretty straightforward in terms of environment (overhead lights off, continuous light source for exposing material, neutral wall color and no windows), the MSI environment requires total darkness during capture. In order to have both systems in the same room we will be using blackout curtains between the two systems so the MSI system will be able to capture in total darkness and the full-color system will be able to use a continuous light source. While the blackout curtains are a significant upgrade, the overall space needs some minor remodeling. We will be upgrading to full spectrum overhead lighting, gray walls and a tile floor to match the existing lab environment.
As shown above… we have begun to receive MSI equipment, installation and training dates have been finalized, the work station is being built and configured as I write this and the room that will house both Phase One systems has been cleared out and is ready for a makeover… It is actually happening!
What a team effort!
I look forward to future blog posts about the discoveries we will make using our new MSI system!
In the Digital Production Center, many of the videotapes we digitize have “bars and tone” at the beginning of the tape. These are officially called “SMPTE color bars.” SMPTE stands for The Society of Motion Picture and Television Engineers, the organization that established the color bars as the North American video standard, beginning in the 1970s. In addition to the color bars presented visually, there is an audio tone that is emitted from the videotape at the same time, thus the phrase “bars and tone.”
The purpose of bars and tone is to serve as a reference or target for the calibration of color and audio levels coming from the videotape during transmission. The color bars are presented at 75% intensity. The audio tone is a 1kHz sine wave. In the DPC, we can make adjustments to the incoming signal, in order to bring the target values into specification. This is done by monitoring the vectorscope output, and the audio levels. Below, you can see the color bars are in proper alignment on the DPC’s vectorscope readout, after initial adjustment.
We use Blackmagic Design’s SmartView monitors to check the vectorscope, as well as waveform and audio levels. The SmartView is an updated, more compact and lightweight version of the older, analog equipment traditionally used in television studios. The Smartview monitors are integrated into our video rack system, along with other video digitization equipment, and numerous videotape decks.
If you are old enough to have grown up in the black and white television era, you may recognize this old TV test pattern, commonly referred to as the “Indian-head test pattern.” This often appeared just before a TV station began broadcasting in the morning, and again right after the station signed off at night. The design was introduced in 1939 by RCA. The “Indian-head” image was integrated into a pattern of lines and shapes that television engineers used to calibrate broadcast equipment. Because the illustration of the Native American chief contained identifiable shades of gray, and had fine detail in the feathers of the headdress, it was ideal for adjusting brightness and contrast.
When color television debuted in the 1960’s, the “Indian-head test pattern” was replaced with a test card showing color bars, a precursor to the SMPTE color bars. Today, the “Indian-head test pattern” is remembered nostalgically, as a symbol of the advent of television, and as a unique piece of Americana. The master art for the test pattern was discovered in an RCA dumpster in 1970, and has since been sold to a private collector. In 2009, when all U.S. television stations were required to end their analog signal transmission, many of the stations used the Indian-head test pattern as their final analog broadcast image.
If you happen to be rummaging through your parents’ or grandparents’ attic, basement or garage, and stumble upon some old reel-to-reel audiotape, or perhaps some dust-covered videotape reels that seem absurdly large & clunky, they are most likely worthless, except for perhaps sentimental value. Even if these artifacts did, at one time, have some unique historic content, you may never know, because there’s a strong chance that decades of temperature extremes have made the media unplayable. The machines that were once used to play the media are often no longer manufactured, hard to find, and only a handful of retired engineers know how to repair them. That is, if they can find the right spare parts, which no one sells anymore.
However, once in a while, something that is one of a kind miraculously survives. That was the case for Troy Haupt, a resident of North Carolina’s Outer Banks, who discovered that his father, Martin Haupt, had recorded the very first Super Bowl onto 2” Quadruplex color videotape directly from the 1967 live television broadcast. After Martin passed away, the tapes ended up in Troy’s mother’s attic, yet somehow survived the elements.
What makes this so unique is that, in 1967, videotape was very expensive and archiving at television networks was not a priority. So the networks that aired the first Super Bowl, CBS and NBC, did not save any of the broadcast.
But Martin Haupt happened to work for a company that repaired professional videotape recorders, which were, in 1967, cutting edge technology. Taping television broadcasts was part of Martin’s job, a way to test the machines he was rebuilding. Fortunately, Martin went to work the day Super Bowl 1 aired live. The two Quadruplex videotapes that Martin Haupt used to record Super Bowl 1 cost $200 each in 1967. In today’s dollars, that’s almost $3000 total for the two tapes. Buying a “VCR” at your local department store was unfathomable then, and would not be possible for at least another decade. Somehow, Martin missed recording halftime, and part of the third quarter, but it turns out that Martin’s son Troy now owns the most complete known video recording of Super Bowl 1, in which the quarterback Bart Starr led the Green Bay Packers to a 35-10 victory over the Kansas City Chiefs.
For music fans, another treasure was uncovered in a storage locker in Marin County, CA, in 1986. Betty Cantor-Jackson worked for The Grateful Dead’s road crew, and made professional multi-track recordings of many of their best concerts, between 1971-1980, on reel-to-reel audiotape. The Dead were known for marathon concerts in which some extended songs, like “Dark Star” could easily fill an entire audio reel. The band gave Betty permission to record, but she purchased her own gear and blank tape, tapping into the band’s mixing console to capture high-quality, soundboard recordings of the band’s epic concerts during their prime era. Betty held onto her tapes until she fell on hard times in the 1980’s, lost her home, and had to move the tapes to a storage locker. She couldn’t pay the storage fees, so the locker contents went up for auction.
Some 1000 audio reels ended up in the hands of three different buyers, none of whom knew what the tapes contained. Once the music was discovered, copies of the recordings began to leak to hardcore tape-traders within the Deadhead community, and they became affectionately referred to as “The Betty Boards.” It turns out the tapes include some legendary performances, such as the 1971 Capitol Theatre run, and the May 1977 tour, including “Barton Hall, May 8, 1977,” considered by many Deadheads as one of the best Grateful Dead concerts of all time.
You would think the current owners of Super Bowl 1 and Barton Hall, May 8, 1977 would be sitting on gold. But, that’s where the lawyers come in. Legally, the people who possess these tapes own the physical tapes, but not the content on those tapes. So, Troy Haupt owns the 2” inch quadriplex reels of Super Bowl 1, but the NFL owns what you can see on those reels. The NFL owns the copyright of the broadcast. Likewise, The Grateful Dead owns the music on the audio reels, regardless of who owns the physical tape that contains the music. Unfortunately, for NFL fans and Deadheads, this makes the content somewhat inaccessable for now. Troy Haupt has offered to sell his videotapes to the NFL, but they have mostly ignored him. If Troy tries to sell the tapes to a third party instead, the NFL says they will sue him, for unauthorized distribution of their content. The owners of the Grateful Dead tapes face a similar dilema. The band’s management isn’t willing to pay money for the physical tapes, but if the owners, or any third party the owners sell the tapes to, try to distribute the music, they will get sued. However, if it weren’t for Martin Haupt and Betty Cantor-Jackson, who had the foresight to record these events in the first place, the content would not exist at all.
Over the past 6 months or so the Digital Production Center has been collaborating with Duke Collaboratory for Classics Computing (DC3) and the Conservation Services Department to investigate multispectral imaging capabilities for the Library. Multispectral imaging (MSI) is a mode of image capture that uses a series of narrow band lights of specific frequencies along with a series of filters to illuminate an object. Highly tailored hardware and software are used in a controlled environment to capture artifacts with the goal of revealing information not seen by the human eye. This type of capture system in the Library would benefit many departments and researchers alike. Our primary focus for this collaboration are the needs of the Papyri community, Conservation Services along with additional capacity for the Digital Production Center.
Josh Sosin of DC3 was already in contact with Mike Toth of R. B. Toth Associates, a company that is at the leading edge of MSI for Cultural Heritage and research communities, on a joint effort between DC3, Conservation Services and the Duke Eye Center to use Optical Coherence Tomography (OCT) to hopefully reveal hidden layers of mummy masks made of papyri. The DPC has a long standing relationship with Digital Transitions, a reseller of the Phase One digital back, which happens to be the same digital back used in the Toth MSI system. And the Conservation lab was already involved in the OCT collaboration so it was only natural to invite R. B. Toth Associates to the Library to show us their MSI system.
After observing the OCT work done at the Eye Center we made our way to the Library to setup the MSI system. Bill Christens-Barry of R. B. Toth Associates walked me through some very high-level physics related to MSI, we setup the system and got ready to capture selected material which included Ashkar-Gilson manuscripts, various papyri and other material that might benefit from MSI. By the time we started capturing images we had a full house. Crammed into the room were members of DC3, DPC, Conservation, Digital Transitions and Toth Associates all of whom had a stake in this collaboration. After long hours of sitting in the dark (necessary for MSI image capture) we emerged from the room blurry eyed and full of hope that something previously unseen would be revealed.
The resulting captures are as ‘stack’ or ‘block’ of monochromatic images captured using different wavelengths of light and ultraviolet and infrared filters. Using software developed by Bill Christens-Barry to process and manipulate the images will reveal information if it is there by combining, removing or enhancing images in the stack. One of the first items we processed was Ashkar-GilsonMS14 Deuteronomy 4.2-4.23 seen below. This really blew us away.
This item went from nearly unreadable to almost entirely readable! Bill assured me that he had only done minimal processing and that he should be able to uncover more of the text in the darker areas with some fine tuning. The text of this manuscript was revealed primarily through the use of the IR filter and was not necessarily the direct product of exposing the manuscript to individual bands of light but the result is no less spectacular. Because the capture process is so time consuming and time was limited no other Ashkar-Gilson manuscript was digitized at this time.
We digitized the image on the left in 2010 and ever since then, when asked, ‘What is the most exciting thing you have digitized’ I often answer, “The Ashkar-Gilson manuscripts. Manuscripts from ca. 7th to 8th Century C.E. Some of them still have fur on the back and a number of them are unreadable… but you can feel the history.” Now my admiration for these manuscripts is renewed and maybe Josh can tell me what it says.
It is our hope that we can bring this technology to Duke University so we can explore our material in greater depth and reveal information that has not been seen for a very, very long time.
Beth Doyle, Head of Conservation Services, wrote a blog post for Preservation Underground about her experience with MSI. Check it out!
Also, check out this article from the New & Observer.
Duke Libraries’ Digital Collections offer a wealth of primary source material, opening unique windows to cultural moments both long past and quickly closing. In my work as an audio digitization specialist, I take a particular interest in current and historical audio technology and also how it is depicted in other media. The digitized Duke Chronicle newspaper issues from the 1980’s provide a look at how students of the time were consuming and using ever-smaller audio devices in the early days of portable technology.
Sony introduced the Walkman in the U.S. in 1980. Roughly pocket-sized (actually somewhere around the size of a sandwich or small brick), it allowed the user to take their music on the go, listening to cassette tapes on lightweight headphones while walking, jogging, or travelling. The product was wildly successful and ubiquitous in its time, so much so that “walkman” became a generic term for any portable audio device.
The success of the Walkman was probably bolstered by the jogging/fitness craze that began in the late 1970s. Health-conscious consumers could get in shape while listening to their favorite tunes. This points to two of the main concepts that Sony highlighted in their marketing of the Walkman: personalization and privatization.
Previously, the only widely available portable audio devices were transistor radios, meaning that the listener was at the mercy of the DJ or station manager’s musical tastes. However, the Walkman user could choose from their own collection of commercially available albums, or take it a step further, and make custom mixtapes of their favorite songs.
The Walkman also allowed the user to “tune out” surrounding distractions and be immersed in their own private sonic environment. In an increasingly noisy and urbanized world, the listener was able to carve out a small space in the cacophony and confusion. Some models had two headphone jacks so you could even share this space with a friend.
One can see that these guiding concepts behind the Walkman and its successful marketing have only continued to proliferate and accelerate in the world today. We now expect unlimited on-demand media on our handheld devices 24 hours a day. Students of the 1980’s had to make do with a boombox and backpack full of cassette tapes.
Its that time of year when all the year end “best of” lists come out, best music, movies, books, etc. Well, we could not resist following suit this year, so… Ladies in gentlemen, I give you in – no particular order – the 2015 best of list for the Digital Projects and Production Services department (DPPS).
In 2015, DPPS welcomed a new staff member to our team; Maggie Dickson came on board as our metadata architect! She is already leading a team to whip our digital collections metadata into shape, and is actively consulting with the digital repository team and others around the library. Bringing metadata expertise into the DPPS portfolio ensures that collections are as discoverable, shareable, and re-purposable as possible.
King Intern for Digital Collections
DPPS started the year with two large University Archives projects on our plates: the ongoing Duke University Chronicle digitization and a grant to digitize hundreds of Chapel recordings. Thankfully, University Archives allocated funding for us to hire an intern, and what a fabulous intern we found in Jessica Serrao (the proof is in her wonderful blogposts). The internship has been an unqualified success, and we hope to be able to repeat such a collaboration with other units around the library.
Our digital project developers have spent much of the year developing the new Tripod3 interface for the Duke Digital Repository. This process has been an excellent opportunity for cross departmental collaborative application development and implementing Agile methodology with sprints, scrums, and stand up meetings galore! We launched our first collection not the new platform in October and we will have a second one out the door before the end of this year. We plan on building on this success in 2016 as we migrate existing collections over to Tripod3.
Repository ingest planning
Speaking of Tripod3 and the Duke Digital Repository, we have ingesting digital collections into the Duke Digital Repository since 2014. However, we have a plan to kick ingests up a notch (or 5). Although the real work will happen in 2016, the planning has been a long time coming and we are all very excited to be at this phase of the Tripod3 / repository process (even if it will be a lot of work). Stay tuned!
Digital Collections Promotional Card
This is admittedly a small achievement, but it is one that has been on my to-do list for 2 years so it actually feels like a pretty big deal. In 2015, we designed a 5 x 7 postcard to hand out during Digital Production Center (DPC) tours, at conferences, and to any visitors to the library. Also, I just really love to see my UNC fan colleagues cringe every time they turn the card over and see Coach K’s face. Its really the little things that make our work fun.
New Exhibits Website
In anticipation of opening of new exhibit spaces in the renovated Rubenstein library, DPPS collaborated with the exhibits coordinator to create a brand new library exhibits webpage. This is your one stop shop for all library exhibits information in all its well-designed glory.
Audio and Video Preservation
In 2014, the Digital production Center bolstered workflows for preservation based digitization. Unlike our digital collections projects, these preservation digitization efforts do not have a publication outcome so they often go unnoticed. Over the past year, we have quietly digitized around 400 audio cassettes in house (this doesn’t count outsourced Chapel Recordings digitization), some of which need to be dramatically re-housed.
On the video side, efforts have been sidelined by digital preservation storage costs. However some behind the scenes planning is in the works, which means we should be able to do more next year. Also, we were able to purchase a Umatic tape cleaner this year, which while it doesn’t sound very glamorous to the rest of the world, thrills us to no end.
Revisiting the William Gedney Digital Collection
Fans of Duke Digital Collections are familiar with the current Gedney Digital Collection. Both the physical and digital collection have long needed an update. So in recent years, the physical collection has been reprocessed, and this Fall we started an effort to digitized more materials in the collection and to higher standards than were practical in the late 1990s.
When the Rubenstein Library re-opened, our neighbor moved into the new building, and the DPC got to expand into his office! The extra breathing room means more space for our specialists and our equipment, which is not only more comfortable but also better for our digitization practices. The two spaces are separate for now, but we are hoping to be able to combine them in the next year or two.
2015 was a great year in DPPS, and there are many more accomplishments we could add to this list. One of our team mottos is: “great productivity and collaboration, business as usual”. We look forward to more of the same in 2016!
As 2015 winds down, the Digital Production Center is wrapping up a four-year collaboration with the Duke Herbarium to digitize their lichen and bryophyte specimens. The project is funded by the National Science Foundation, and the ultimate goal is to digitize over 2 million specimens from more than 60 collections across the nation. Lichens and bryophytes (mosses and their relatives) are important indicators of climate change. After the images from the participating institutions are uploaded to one central portal, called iDigBio, large-scale distribution mapping will be used to identify regions where environmental changes are taking place, allowing scientists to study the patterns and effects of these changes.
The specimens are first transported from the Duke Herbarium to Perkins Library on a scheduled timeline. Then, we photograph the specimen labels using our Phase One overhead camera. Some of the specimens are very bulky, but our camera’s depth of field is broad enough to keep them in focus. To be clear, what the project is utilizing is not photos of the actual plant specimens themselves, but rather images of the typed and hand-written scientific metadata adorning the envelopes which house the specimens. After we photograph them, the images are uploaded to the national database, where they are available for online research, along with other specimen labels uploaded from universities across the United States. Optical character recognition is used to digest and organize the scientific metadata in the images.
Over the past four years, the Digital Production Center has digitized approximately 100,000 lichen and bryophyte specimens. Many are from the Duke Herbarium, but some other institutions have also asked us to digitize some of their specimens, such as UNC-Chapel Hill, SUNY-Binghamton, Towson University and the University of Richmond. The Duke Herbarium is the second-largest herbarium of all U.S. private universities, next to Harvard. It was started in 1921, and it contains more than 800,000 specimens of vascular plants, bryophytes, algae, lichens, and fungi, some of which were collected as far back as the 1800s. Several specimens have unintentionally humorous names, like the following, which wants to be funky, but isn’t fooling anyone. Ok, maybe only I find that funny.
The project has been extensive, but enjoyable, thanks to the leadership of Duke Herbarium Data Manager Blanka Shaw. Dr. Shaw has personally collected bryophytes on many continents, and has brought a wealth of knowledge, energy and good humor to the collaboration with the Digital Production Center. The Duke Herbarium is open for visitors, and citizen scientists are also needed to volunteer for transcription and georeferencing of the extensive metadata collected in the national database.
We experience a number of different cycles in the Digital Projects and Production Services Department (DPPS). There is of course the project lifecycle, that mysterious abstraction by which we try to find commonalities in work processes that can seem unique for every case. We follow the academic calendar, learn our fate through the annual budget cycle, and attend weekly, monthly, and quarterly meetings.
The annual reporting cycle at Duke University Libraries usually falls to departments in August, with those reports informing a master library report completed later. Because of the activities and commitments around the opening of the Rubenstein Library, the departments were let off the hook for their individual reports this year. Nevertheless, I thought I would use my turn in the Bitstreams rotation to review some highlights from our 2014-15 cycle.
Today we will take a detailed look at how the Duke Chronicle, the university’s beloved newspaper for over 100 years, is digitized. Since our scope of digitization spans nine decades (1905-1989), it is an ongoing project the Digital Production Center (DPC), part of Digital Projects and Production Services (DPPS) and Duke University Libraries’ Digital Collections Program, has been chipping away at. Scanning and digitizing may seem straightforward to many – place an item on a scanner and press scan, for goodness sake! – but we at the DPC want to shed light on our own processes to give you a sense of what we do behind the scenes. It seems like an easy-peasy process of scanning and uploading images online, but there is much more that goes into it than that. Digitizing a large collection of newspapers is not always a fun-filled endeavor, and the physical act of scanning thousands of news pages is done by many dedicated (and patient!) student workers, staff members, and me, the King Intern for Digital Collections.
Many steps in the digitization process do not actually occur in the DPC, but among other teams or departments within the library. Though I focus mainly on the DPC’s responsibilities, I will briefly explain the steps others perform in this digital projects tango…or maybe it’s a waltz?
Each proposed project must first be approved by the Advisory Council for Digital Collections (ACDC), a team that reviews each project for its strategic value. Then it is passed on to the Digital Collections Implementation Team (DCIT) to perform a feasibility study that examines the project’s strengths and weaknesses (see Thomas Crichlow’s post for an overview of these teams). The DCIT then helps guide the project to fruition. After clearing these hoops back in 2013, the Duke Chronicle project started its journey toward digital glory.
We pull 10 years’ worth of newspapers at a time from the University Archives in Rubenstein Library. Only one decade at a time is processed to make the 80+ years of Chronicle publications more manageable. The first stop is Conservation. To make sure the materials are stable enough to withstand digitizing, Conservation must inspect the condition of the paper prior to giving the DPC the go-ahead. Because newspapers since the mid-19th century were printed on cheap and very acidic wood pulp paper, the pages can become brittle over time and may warrant extensive repairs. Senior Conservator, Erin Hammeke, has done great work mending tears and brittle edges of many Chronicle pages since the start of this project. As we embark on digitizing the older decades, from the 1940s and earlier, Erin’s expertise will be indispensable. We rely on her not only to repair brittle pages but to guide the DPC’s strategy when deciding the best and safest way to digitize such fragile materials. Also, several volumes of the Chronicle have been bound, and to gain the best digital image scan these must be removed from their binding. Erin to the rescue!
Now that Conservation has assessed the condition and given the DPC the green light, preliminary prep work must still be done before the scanner comes into play. A digitization guide is created in Microsoft Excel to list each Chronicle issue along with its descriptive metadata (more information about this process can be found in my metadata blog post). This spreadsheet acts as a guide in the digitization process (hence its name, digitization guide!) to keep track of each analog newspaper issue and, once scanned, its corresponding digital image. In this process, each Chronicle issue is inspected to collect the necessary metadata. At this time, a unique identifier is assigned to every issue based on the DPC’s naming conventions. This identifier stays with each item for the duration of its digital life and allows for easy identification of one among thousands of Chronicle issues. At the completion of the digitization guide, the Chronicle is now ready for the scanner.
The Scanning Process
With all loose unbound issues, the Zeutschel is our go-to scanner because it allows for large format items to be imaged on a flat surface. This is less invasive and less damaging to the pages, and is quicker than other scanning methods. The Zeutschel can handle items up to 25 x 18 inches, which accommodates the larger sized formats of the Chronicle used in the 1940s and 1950s. If bound issues must be digitized, due to the absence of a loose copy or the inability to safely dis-bound a volume, the Phase One digital camera system is used as it can better capture large bound pages that may not necessarily lay flat.
For every scanning session, we need the digitization guide handy as it tells what to name the image files using the previously assigned unique identifier. Each issue of the newspaper is scanned as a separate folder of images, with one image representing one page of the newspaper. This system of organization allows for each issue to become its own compound object – multiple files bound together with an XML structure – once published to the website. The Zeutschel’s scanning software helps organize these image files into properly named folders. Of course, no digitization session would be complete without the initial target scan that checks for color calibration (See Mike Adamo’s post for a color calibration crash course).
The scanner’s plate glass can now be raised with the push of a button (or the tap of a foot pedal) and the Chronicle issue is placed on the flatbed. Lowering the plate glass down, the pages are flattened for a better scan result. Now comes the excitement… we can finally press SCAN. For each page, the plate glass is raised, lowered, and the scan button is pressed. Chronicle issues can have anywhere from 2 to 30 or more pages, so you can image this process can become monotonous – or even mesmerizing – at times. Luckily, with the smaller format decades, like the 1970s and 1980s, the inner pages can be scanned two at a time and the Zeutschel software separates them into two images, which cuts down on the scan time. As for the larger formats, the pages are so big you can only fit one on the flatbed. That means each page is a separate scan, but older years tended to publish less issues, so it’s a trade-off. To put the volume of this work into perspective, the 1,408 issues of the 1980s Chronicle took 28,089 scans to complete, while the 1950s Chronicle of about 482 issues took around 3,700 scans to complete.
Every scanned image that pops up on the screen is also checked for alignment and cropping errors that may require a re-scan. Once all the pages in an issue are digitized and checked for errors, clicking the software’s Finalize button will compile the images in the designated folder. We now return to our digitization guide to enter in metadata pertaining to the scanning of that issue, including capture person, capture date, capture device, and what target image relates to this session (subsequent issues do not need a new target scanned, as long as the scanning takes place in the same session).
Now, with the next issue, rinse and repeat: set the software settings and name the folder, scan the issue, finalize, and fill out the digitization guide. You get the gist.
We now find ourselves with a slue of folders filled with digitized Chronicle images. The next phase of the process is quality control (QC). Once every issue from the decade is scanned, the first round of QC checks all images for excess borders to be cropped, crooked images to be squared, and any other minute discrepancy that may have resulted from the scanning process. This could be missing images, pages out of order, or even images scanned upside down. This stage of QC is often performed by student workers who diligently inspect image after image using Adobe Photoshop. The second round of QC is performed by our Digital Production Specialist Zeke Graves who gives every item a final pass.
At this stage, derivatives of the original preservation-quality images are created. The originals are archived in dark storage, while the smaller-sized derivatives are used in the CONTENTdm ingest process. CONTENTdm is the digital collection management software we use that collates the digital images with their appropriate descriptive metadata from our digitization guide, and creates one compound object for each Chronicle issue. It also generates the layer of Optical Character Recognition (OCR) data that makes the Chronicle text searchable, and provides an online interface for users to discover the collection once published on the website. The images and metadata are ingested into CONTENTdm’s Project Client in small batches (1 to 3 years of Chronicle issues) to reduce the chance of upload errors. Once ingested into CONTENTdm, the items are then spot-checked to make sure the metadata paired up with the correct image. During this step, other metadata is added that is specific to CONTENTdm fields, including the ingest person’s initials. Then, another ingest must run to push the files and data from the Project Client to the CONTENTdm server. A third step after this ingest finishes is to approve the items in the CONTENTdm administrative interface. This gives the go-ahead to publish the material online.
Hold on, we aren’t done yet. The project is now passed along to our developers in DPPS who must add this material to our digital collections platform for online discovery and access (they are currently developing Tripod3 to replace the previous Tripod2 platform, which is more eloquently described in Will Sexton’s post back in April). Not only does this improve discoverability, but it makes all of the library’s digital collections look more uniform in their online presentation.
Then, FINALLY, the collection goes live on the web. Now, just repeat the process for every decade of the Duke Chronicle, and you can see how this can become a rather time-heavy and laborious process. A labor of love, that is.
I could have narrowly stuck with describing to you the scanning process and the wonders of the Zeutschel, but I felt that I’d be shortchanging you. Active scanning is only a part of the whole digitization process which warrants a much broader narrative than just “push scan.” Along this journey to digitize the Duke Chronicle, we’ve collectively learned many things. The quirks and trials of each decade inform our process for the next, giving us the chance to improve along the way (to learn how we reflect upon each digital project after completion, go to Molly Bragg’s blog post on post-mortem reports).
If your curiosity is piqued as to how the Duke Chronicle looks online, the Fall 1959-Spring 1970 and January 1980-February 1989 issues are already available to view in our digital collections. The 1970s Chronicle is the next decade slated for publication, followed by the 1950s. Though this isn’t a comprehensive detailed account of the digitization process, I hope it provides you with a clearer picture of how we bring a collection, like the Duke Chronicle, into digital existence.
Notes from the Duke University Libraries Digital Projects Team