In the Digital Production Center, many of the videotapes we digitize have “bars and tone” at the beginning of the tape. These are officially called “SMPTE color bars.” SMPTE stands for The Society of Motion Picture and Television Engineers, the organization that established the color bars as the North American video standard, beginning in the 1970s. In addition to the color bars presented visually, there is an audio tone that is emitted from the videotape at the same time, thus the phrase “bars and tone.”
The purpose of bars and tone is to serve as a reference or target for the calibration of color and audio levels coming from the videotape during transmission. The color bars are presented at 75% intensity. The audio tone is a 1kHz sine wave. In the DPC, we can make adjustments to the incoming signal, in order to bring the target values into specification. This is done by monitoring the vectorscope output, and the audio levels. Below, you can see the color bars are in proper alignment on the DPC’s vectorscope readout, after initial adjustment.
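To make the reference signal a little more concrete, here is a minimal Python sketch that builds one second of a 1 kHz sine tone and lists the seven bars at 75% intensity. The 48 kHz sample rate, the tone amplitude, the 8-bit RGB representation and the function names are all assumptions made for illustration. Real SMPTE bars are defined as video signal levels rather than computer RGB values, so treat this as a rough picture, not a calibration source.

```python
import math

SAMPLE_RATE = 48000   # assumed sample rate in Hz (not stated in the post)
TONE_HZ = 1000        # the 1 kHz reference tone described above


def reference_tone(seconds=1.0, amplitude=0.5):
    """Return float samples of a 1 kHz sine wave (amplitude is an assumption)."""
    count = int(SAMPLE_RATE * seconds)
    return [amplitude * math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE)
            for n in range(count)]


def bars_75_percent():
    """Return the seven bars, left to right, as illustrative 8-bit RGB triples."""
    level = round(0.75 * 255)  # 75% of an 8-bit full scale
    order = [
        ("gray",    (1, 1, 1)),
        ("yellow",  (1, 1, 0)),
        ("cyan",    (0, 1, 1)),
        ("green",   (0, 1, 0)),
        ("magenta", (1, 0, 1)),
        ("red",     (1, 0, 0)),
        ("blue",    (0, 0, 1)),
    ]
    return [(name, tuple(level * c for c in rgb)) for name, rgb in order]


for name, rgb in bars_75_percent():
    print(name, rgb)
```

At a 48 kHz sample rate one cycle of the tone spans 48 samples, which is why a 1 kHz tone is easy to recognize on a meter or waveform display.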
We use Blackmagic Design’s SmartView monitors to check the vectorscope, as well as waveform and audio levels. The SmartView is an updated, more compact and lightweight version of the older, analog equipment traditionally used in television studios. The SmartView monitors are integrated into our video rack system, along with other video digitization equipment, and numerous videotape decks.
If you are old enough to have grown up in the black and white television era, you may recognize this old TV test pattern, commonly referred to as the “Indian-head test pattern.” This often appeared just before a TV station began broadcasting in the morning, and again right after the station signed off at night. The design was introduced in 1939 by RCA. The “Indian-head” image was integrated into a pattern of lines and shapes that television engineers used to calibrate broadcast equipment. Because the illustration of the Native American chief contained identifiable shades of gray, and had fine detail in the feathers of the headdress, it was ideal for adjusting brightness and contrast.
When color television debuted in the 1960s, the “Indian-head test pattern” was replaced with a test card showing color bars, a precursor to the SMPTE color bars. Today, the “Indian-head test pattern” is remembered nostalgically, as a symbol of the advent of television, and as a unique piece of Americana. The master art for the test pattern was discovered in an RCA dumpster in 1970, and has since been sold to a private collector. In 2009, when all U.S. television stations were required to end their analog signal transmission, many of the stations used the Indian-head test pattern as their final analog broadcast image.
In previous posts I have referred to the FADGI standard for still image capture when describing still image creation in the Digital Production Center in support of our Digital Collections Program. We follow this standard in order to create archival files for preservation, long-term retention and access to our materials online. These guidelines help us create digital content in a consistent, scalable and efficient way. The most commonly cited part of the standard is the PPI guidelines for capturing various types of material. It is a collection of charts that contain various material types, physical dimensions and recommended capture specifications. The charts are very useful and relatively easy to read and understand. But this standard includes 93 “exciting” pages of all things still image capture including file specifications, color encoding, data storage, physical environment, backup strategies, metadata and workflows. Below I will boil down the first 50 or so pages.
Full disclosure. Perkins Library and our digitization program didn’t start with any part of these guidelines in place. In fact, these guidelines didn’t exist at the time of our first attempt at in-house digitization in 1993. We didn’t even have an official digitization lab until early 2005. We started with one Epson flatbed scanner and one high end CRT monitor. As our Digital Collections Program has matured, we have been able to add equipment and implement more of the standard starting with scanner and monitor calibration and benchmark testing of capture equipment before purchase. We then established more consistent workflows and technical metadata capture, developed a more robust file naming scheme, file movement and data storage strategies. We now work hard to synchronize our efforts between all of the departments involved in our Digital Collections Program. We are always refining our workflows and processes to become more efficient at publishing and preserving Digital Collections.
Dive Deep. For those of you who would like to take a deep dive into image capture for cultural heritage institutions, here is the full standard. For those of you who don’t fall into this category, I’ve boiled down the standard below. I believe that it’s necessary to use the whole standard in order for a program to become stable and mature. As we did, this can be implemented over time.
Boil It Down. The FADGI standard provides a tiered approach for still image capture, from 1 to 4 stars, with 4 stars being the highest. The 1 and 2 star tiers are used when imaging for access, and the 3 and 4 star tiers are used for archival imaging, with preservation as the focus.
The physical environment: The environment should be color neutral. Walls should be painted a neutral gray to minimize color shifts and flare that might come from a wall color that is not neutral. Monitors should be positioned to avoid glare on the screens (this is why most professional monitors have hoods). Overhead lighting should be around 5000K (tungsten, fluorescent and other bulbs can have yellow, magenta and green color shifts which can affect the perception of the color of an image). Each capture device should be separated so that light spillover doesn’t affect another capture device.
Monitors and Light boxes and viewing of originals: Overhead light or a viewing booth should be set up for viewing of originals and should be a neutral 5000K. A light box used for viewing transmissive material should also be 5000K.
Digital images should be viewed in the colorspace they were captured in and the monitor should be able to display that colorspace. Most monitors display in the sRGB colorspace. However, professional monitors use the AdobeRGB colorspace which is commonly used in cultural heritage image capture. The color temperature of your monitor should be set to the Kelvin temperature that most closely matches the viewing environment. If the overhead lights are 5000K, then the monitor’s color temperature should also be set to 5000K.
Calibration packages, which consist of hardware and software that read and evaluate color, are essential pieces of equipment. These packages normalize the luminosity, color temperature and color balance of a monitor and create an ICC display profile that is used by the computer’s operating system to display colors correctly so that accurate color assessments can be made.
Capture Devices: The market is flooded with capture devices of varying quality. It is important to do research on any new capture device. I recommend skipping the marketing schemes that tout all the bells and whistles and sticking to talking with institutions that have established digital collections programs. This will help to focus research on the few contenders that will produce the files that you need. They will help you slog through how many megapixels are necessary, which lenses are best for which application, and which scanner driver is easiest to use while still getting the best color out of your scanner. Beyond the capture device, other things that come into play are effective scanner drivers that produce the most accurate and consistent results, upgrade paths for your equipment and service packages that help maintain your equipment.
Capture Specifications: I’ll keep this part short because there are a wide variety of charts covering many formats, capture specifications and their corresponding tiers. Below I have simplified the information from the charts. These specifications hover between tier 3 and 4, mostly leaning toward 4.
Always use a FADGI compliant reference target at the beginning of a session to ensure the capture device is within acceptable deviation. The target values differ depending on which reference targets are used. Most targets come with a chart representing the numerical value of each swatch in the target. Our lab uses a classic GretagMacbeth target and our acceptable color deviation is +/- 5 units of color.
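As a rough illustration of that start-of-session check, the sketch below compares measured swatch values against a target's reference chart and reports any swatch that falls outside the tolerance. It assumes that "units of color" means a per-channel difference in 8-bit RGB values, which is one possible reading and not something the standard or our lab procedure is quoted on here. The swatch names and numbers are hypothetical, not taken from a real target chart.

```python
TOLERANCE = 5  # +/- 5 from the post; treating it as per-channel RGB is an assumption


def patches_out_of_tolerance(reference, measured, tolerance=TOLERANCE):
    """Return {swatch: per-channel differences} for swatches exceeding tolerance."""
    failures = {}
    for name, ref_rgb in reference.items():
        diffs = tuple(m - r for m, r in zip(measured[name], ref_rgb))
        if any(abs(d) > tolerance for d in diffs):
            failures[name] = diffs
    return failures


# Hypothetical values for illustration only, not a real reference chart.
reference = {"white": (240, 240, 240), "gray": (120, 120, 120), "black": (50, 50, 50)}
measured = {"white": (238, 241, 239), "gray": (128, 121, 119), "black": (48, 51, 53)}

print(patches_out_of_tolerance(reference, measured))
```

An empty result would mean every swatch is within tolerance and capture can begin. Anything reported would be a cue to recalibrate before scanning.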
Our general technical specs for reflective material including books, documents, photographs and maps are:
Master File Format: TIFF
Resolution: 300 ppi
Bit Depth: 8
Color Depth: 24 bit RGB
Color Space: Adobe 1998
These specifications generally follow the standard. If the materials being scanned are smaller than 5×7 inches we increase the PPI to 400 or 600 depending on the font size and dimensions of the object.
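To show what these numbers mean in practice, here is a small Python sketch that turns physical dimensions and PPI into pixel dimensions and an approximate uncompressed image size for 24 bit RGB. The function names and the example object sizes are hypothetical, and the size estimate ignores TIFF headers and embedded metadata.

```python
def pixel_dimensions(width_in, height_in, ppi):
    """Pixel width and height for an original of the given size in inches."""
    return round(width_in * ppi), round(height_in * ppi)


def uncompressed_bytes(width_in, height_in, ppi, bits_per_pixel=24):
    """Approximate size of the raw image data, ignoring TIFF headers and metadata."""
    w, h = pixel_dimensions(width_in, height_in, ppi)
    return w * h * bits_per_pixel // 8


# A hypothetical 8.5 x 11 inch document at the general 300 ppi setting.
print(pixel_dimensions(8.5, 11, 300))     # (2550, 3300)
print(uncompressed_bytes(8.5, 11, 300))   # 25245000 bytes, roughly 25 MB

# A hypothetical 4 x 6 inch item, smaller than 5 x 7, captured at 600 ppi.
print(pixel_dimensions(4, 6, 600))        # (2400, 3600)
print(uncompressed_bytes(4, 6, 600))      # 25920000 bytes, roughly 26 MB
```

Doubling the PPI quadruples the pixel count, which is why the small item at 600 ppi ends up about as large on disk as the letter-size page at 300 ppi.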
Our general technical specs for transmissive material including acetate, nitrate and glass plate negatives, slides and other positive transmissive material are:
Master File Format: TIFF
Resolution: 3000 – 4000 ppi
Bit Depth: 16
Color Depth: 24 bit RGB
Color Space: Adobe 1998
These specifications generally follow the standard. If the transmissive materials being scanned are larger than 4×5 we decrease the PPI to 1500 or 2000 depending on negative size and condition.
Recommended capture devices: The standard goes into detail on what capture devices to use and not to use when digitizing different types of material. It describes when to use manually operated planetary scanners as opposed to a digital scan back, when to use a digital scan back instead of a flatbed scanner, when and when not to use a sheet fed scanner. Not every device can capture every type of material. In our lab we have 6 different devices to capture a wide variety of material in different states of fragility. We work with our Conservation Department when making decisions on what capture device to use.
General Guidelines for still image capture
Do not apply pressure with a glass platen or otherwise unless approved by a paper conservator.
Do not use vacuum boards or high UV light sources unless approved by a paper conservator.
Do not use auto page turning devices unless approved by a paper conservator.
For master files, pages, documents and photographs should be imaged to include the entire area of the page, document or photograph.
For bound items the digital image should capture as far into the gutter as practical but must include all of the content that is visible to the eye.
If a backing sheet is used on a translucent piece of paper to increase contrast and readability, it must extend beyond the edge of the page to the end of the image on all open sides of the page.
For master files, documents should be imaged to include the entire area and a small amount beyond to define the area.
Do not use lighting systems that raise the surface temperature of the original more than 6 degrees F (3 degrees C) in the total imaging process.
When capturing oversized material, if the sections of a multiple scan item are compiled into a single image, the separate images should be retained for archival and printing purposes.
The use of glass or other materials to hold photographic images flat during capture is allowed, but only when the original will not be harmed by doing so. Care must be taken to assure that flattening a photograph will not result in emulsion cracking, or the base material being damaged. Tightly curled materials must not be forced to lie flat.
For original color transparencies, the tonal scale and color balance of the digital image should match the original transparency being scanned to provide accurate representation of the image.
When scanning negatives, for master files the tonal orientation may be inverted to produce a positive. The resulting image will need to be adjusted to produce a visually-pleasing representation. Digitizing negatives is very analogous to printing negatives in a darkroom and it is very dependent on the photographer’s/technician’s skill and visual literacy to produce a good image. There are few objective metrics for evaluating the overall representation of digital images produced from negatives.
The lack of dynamic range in a film scanning system will result in poor highlight and shadow detail and poor color reproduction.
No image retouching is permitted to master files.
These details were pulled directly from the standard. They cover a lot of ground but there are always decisions to be made that are uniquely related to the material to be digitized. There are 50 or so more pages of this standard related to workflow, color management, data storage, file naming and technical metadata. I’ll have to cover that in my next blog post.
The FADGI standard for still image capture is very thorough but also leaves room to adapt. While we don’t follow everything outlined in the standard we do follow the majority. This standard, years of experience and a lot of trial and error have helped make our program more sound, consistent and scalable.
While most of my Bitstreams posts have focused on my work preserving and archiving audio collections, my job responsibilities also include digitizing materials for display in Duke University Libraries Exhibits. The recent renovation and expansion of the Perkins Library entrance and the Rubenstein Library have opened up significantly more gallery space, meaning more exhibits being rotated through at a faster pace.
Working with such a variety of media spanning different library collections presents a number of challenges and necessitates working closely with our Exhibits and Conservation departments. First, we have to make sure that we have all of the items listed in the inventory provided by the exhibit curator. Secondly, we have to make sure we have all of the relevant information about how each item should be digitally captured (e.g. What image resolution and file specifications? Which pages from a larger volume? What section of a larger map or print?) Next we have to consider handling for items that are in fragile condition and need special attention. Finally, we use all of this information to determine which scanner, camera, or A/V deck is appropriate for each item and what the most efficient order to capture them in is.
All of this planning and preliminary work helps to ensure that the digitization process goes smoothly and that most questions and irregularities have already been addressed. Even so, there are always issues that come up forcing us to improvise creative solutions. For instance: how to level and stabilize a large, fragile folded map that is tipped into a volume with tight binding? How to assemble a seamless composite image of an extremely large poster that has to be photographed in multiple sections? How to minimize glare and reflection from glossy photos that are cupped from age? I won’t give away all of our secrets here, but I’ll provide a couple examples from the Duke Chapel exhibit that is currently on display in the Jerry and Bruce Chappell Family gallery.
This facsimile of a drawing for one of the Chapel’s carved angels was reproduced from an original architectural blueprint. It came to us as a large and tightly rolled blueprint–so large, in fact, that we had to add a piece of plywood to our usual camera work surface to accommodate it. We then strategically placed weights around the blueprint to keep it flattened while not obscuring the section with the drawing. The paper was still slightly wrinkled and buckled in places (which can lead to uneven color and lighting in the resulting digital image) but fortunately the already mottled complexion of the blueprint material made it impossible to notice these imperfections.
These projected images of the Chapel’s stained glass were reproduced from slides taken by a student in 1983 and currently housed in the University Archives. After the first run through our slide scanner, the digital images looked okay on screen, but were noticeably blurry when enlarged. Further investigation of the slides revealed an additional clear plastic protective housing which we were able to carefully remove. Without this extra refractive layer, the digital images were noticeably sharper and more vibrant.
Despite the digitization challenges, it is satisfying to see these otherwise hidden treasures being displayed and enjoyed in places that students, staff, and visitors pass through everyday–and knowing that we played a small part in contributing to the finished product!
If you happen to be rummaging through your parents’ or grandparents’ attic, basement or garage, and stumble upon some old reel-to-reel audiotape, or perhaps some dust-covered videotape reels that seem absurdly large & clunky, they are most likely worthless, except for perhaps sentimental value. Even if these artifacts did, at one time, have some unique historic content, you may never know, because there’s a strong chance that decades of temperature extremes have made the media unplayable. The machines that were once used to play the media are often no longer manufactured, hard to find, and only a handful of retired engineers know how to repair them. That is, if they can find the right spare parts, which no one sells anymore.
However, once in a while, something that is one of a kind miraculously survives. That was the case for Troy Haupt, a resident of North Carolina’s Outer Banks, who discovered that his father, Martin Haupt, had recorded the very first Super Bowl onto 2” Quadruplex color videotape directly from the 1967 live television broadcast. After Martin passed away, the tapes ended up in Troy’s mother’s attic, yet somehow survived the elements.
What makes this so unique is that, in 1967, videotape was very expensive and archiving at television networks was not a priority. So the networks that aired the first Super Bowl, CBS and NBC, did not save any of the broadcast.
But Martin Haupt happened to work for a company that repaired professional videotape recorders, which were, in 1967, cutting edge technology. Taping television broadcasts was part of Martin’s job, a way to test the machines he was rebuilding. Fortunately, Martin went to work the day Super Bowl 1 aired live. The two Quadruplex videotapes that Martin Haupt used to record Super Bowl 1 cost $200 each in 1967. In today’s dollars, that’s almost $3000 total for the two tapes. Buying a “VCR” at your local department store was unfathomable then, and would not be possible for at least another decade. Somehow, Martin missed recording halftime, and part of the third quarter, but it turns out that Martin’s son Troy now owns the most complete known video recording of Super Bowl 1, in which the quarterback Bart Starr led the Green Bay Packers to a 35-10 victory over the Kansas City Chiefs.
For music fans, another treasure was uncovered in a storage locker in Marin County, CA, in 1986. Betty Cantor-Jackson worked for The Grateful Dead’s road crew, and made professional multi-track recordings of many of their best concerts, between 1971 and 1980, on reel-to-reel audiotape. The Dead were known for marathon concerts in which some extended songs, like “Dark Star,” could easily fill an entire audio reel. The band gave Betty permission to record, but she purchased her own gear and blank tape, tapping into the band’s mixing console to capture high-quality soundboard recordings of the band’s epic concerts during their prime era. Betty held onto her tapes until she fell on hard times in the 1980s, lost her home, and had to move the tapes to a storage locker. She couldn’t pay the storage fees, so the locker contents went up for auction.
Some 1000 audio reels ended up in the hands of three different buyers, none of whom knew what the tapes contained. Once the music was discovered, copies of the recordings began to leak to hardcore tape-traders within the Deadhead community, and they became affectionately referred to as “The Betty Boards.” It turns out the tapes include some legendary performances, such as the 1971 Capitol Theatre run, and the May 1977 tour, including “Barton Hall, May 8, 1977,” considered by many Deadheads as one of the best Grateful Dead concerts of all time.
You would think the current owners of Super Bowl 1 and Barton Hall, May 8, 1977 would be sitting on gold. But, that’s where the lawyers come in. Legally, the people who possess these tapes own the physical tapes, but not the content on those tapes. So, Troy Haupt owns the 2” Quadruplex reels of Super Bowl 1, but the NFL owns what you can see on those reels. The NFL owns the copyright of the broadcast. Likewise, The Grateful Dead owns the music on the audio reels, regardless of who owns the physical tape that contains the music. Unfortunately, for NFL fans and Deadheads, this makes the content somewhat inaccessible for now. Troy Haupt has offered to sell his videotapes to the NFL, but they have mostly ignored him. If Troy tries to sell the tapes to a third party instead, the NFL says they will sue him, for unauthorized distribution of their content. The owners of the Grateful Dead tapes face a similar dilemma. The band’s management isn’t willing to pay money for the physical tapes, but if the owners, or any third party the owners sell the tapes to, try to distribute the music, they will get sued. However, if it weren’t for Martin Haupt and Betty Cantor-Jackson, who had the foresight to record these events in the first place, the content would not exist at all.
Over the past 6 months or so the Digital Production Center has been collaborating with Duke Collaboratory for Classics Computing (DC3) and the Conservation Services Department to investigate multispectral imaging capabilities for the Library. Multispectral imaging (MSI) is a mode of image capture that uses a series of narrow band lights of specific frequencies along with a series of filters to illuminate an object. Highly tailored hardware and software are used in a controlled environment to capture artifacts with the goal of revealing information not seen by the human eye. This type of capture system in the Library would benefit many departments and researchers alike. Our primary focus for this collaboration is the needs of the Papyri community and Conservation Services, along with additional capacity for the Digital Production Center.
Josh Sosin of DC3 was already in contact with Mike Toth of R. B. Toth Associates, a company that is at the leading edge of MSI for Cultural Heritage and research communities, on a joint effort between DC3, Conservation Services and the Duke Eye Center to use Optical Coherence Tomography (OCT) to hopefully reveal hidden layers of mummy masks made of papyri. The DPC has a long standing relationship with Digital Transitions, a reseller of the Phase One digital back, which happens to be the same digital back used in the Toth MSI system. And the Conservation lab was already involved in the OCT collaboration so it was only natural to invite R. B. Toth Associates to the Library to show us their MSI system.
After observing the OCT work done at the Eye Center we made our way to the Library to set up the MSI system. Bill Christens-Barry of R. B. Toth Associates walked me through some very high-level physics related to MSI, and then we set up the system and got ready to capture selected material, which included Ashkar-Gilson manuscripts, various papyri and other material that might benefit from MSI. By the time we started capturing images we had a full house. Crammed into the room were members of DC3, DPC, Conservation, Digital Transitions and Toth Associates, all of whom had a stake in this collaboration. After long hours of sitting in the dark (necessary for MSI image capture) we emerged from the room bleary-eyed and full of hope that something previously unseen would be revealed.
The resulting captures are a ‘stack’ or ‘block’ of monochromatic images captured using different wavelengths of light and ultraviolet and infrared filters. Software developed by Bill Christens-Barry is then used to process and manipulate the images; by combining, removing or enhancing images in the stack, it will reveal information if it is there. One of the first items we processed was Ashkar-Gilson MS14 Deuteronomy 4.2-4.23 seen below. This really blew us away.
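For readers curious what "combining" images in a stack can look like, here is a toy Python sketch. It is not Bill Christens-Barry's software and does not claim to reproduce his methods. It simply subtracts one band from another pixel by pixel and then stretches the result to use the full tonal range, which is one simple way a faint difference between bands can be made visible. The band names, the tiny 2 x 3 pixel images and the function names are all made up for illustration.

```python
def band_difference(stack, band_a, band_b):
    """Subtract band_b from band_a pixel by pixel."""
    return [[a - b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(stack[band_a], stack[band_b])]


def stretch(image, out_max=255):
    """Linearly rescale pixel values to 0..out_max to enhance contrast."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A completely flat image has no contrast to stretch.
        return [[0 for _ in row] for row in image]
    return [[round((v - lo) * out_max / (hi - lo)) for v in row] for row in image]


# Made-up 2 x 3 pixel captures keyed by hypothetical band names.
stack = {
    "visible_blue": [[40, 42, 41], [39, 40, 43]],
    "infrared":     [[90, 45, 88], [44, 91, 46]],
}

enhanced = stretch(band_difference(stack, "infrared", "visible_blue"))
print(enhanced)
```

In this made-up data the two bands look nearly identical in some pixels and very different in others. After differencing and stretching, those pixels land at opposite ends of the tonal range.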
This item went from nearly unreadable to almost entirely readable! Bill assured me that he had only done minimal processing and that he should be able to uncover more of the text in the darker areas with some fine tuning. The text of this manuscript was revealed primarily through the use of the IR filter and was not necessarily the direct product of exposing the manuscript to individual bands of light but the result is no less spectacular. Because the capture process is so time consuming and time was limited no other Ashkar-Gilson manuscript was digitized at this time.
We digitized the image on the left in 2010 and ever since then, when asked, ‘What is the most exciting thing you have digitized’ I often answer, “The Ashkar-Gilson manuscripts. Manuscripts from ca. 7th to 8th Century C.E. Some of them still have fur on the back and a number of them are unreadable… but you can feel the history.” Now my admiration for these manuscripts is renewed and maybe Josh can tell me what it says.
It is our hope that we can bring this technology to Duke University so we can explore our material in greater depth and reveal information that has not been seen for a very, very long time.
Beth Doyle, Head of Conservation Services, wrote a blog post for Preservation Underground about her experience with MSI. Check it out!
Also, check out this article from the News & Observer.
It’s that time of year when all the year-end “best of” lists come out: best music, movies, books, etc. Well, we could not resist following suit this year, so… Ladies and gentlemen, I give you, in no particular order, the 2015 “best of” list for the Digital Projects and Production Services department (DPPS).
In 2015, DPPS welcomed a new staff member to our team; Maggie Dickson came on board as our metadata architect! She is already leading a team to whip our digital collections metadata into shape, and is actively consulting with the digital repository team and others around the library. Bringing metadata expertise into the DPPS portfolio ensures that collections are as discoverable, shareable, and re-purposable as possible.
King Intern for Digital Collections
DPPS started the year with two large University Archives projects on our plates: the ongoing Duke University Chronicle digitization and a grant to digitize hundreds of Chapel recordings. Thankfully, University Archives allocated funding for us to hire an intern, and what a fabulous intern we found in Jessica Serrao (the proof is in her wonderful blogposts). The internship has been an unqualified success, and we hope to be able to repeat such a collaboration with other units around the library.
Our digital project developers have spent much of the year developing the new Tripod3 interface for the Duke Digital Repository. This process has been an excellent opportunity for cross-departmental collaborative application development and implementing Agile methodology with sprints, scrums, and stand-up meetings galore! We launched our first collection on the new platform in October and we will have a second one out the door before the end of this year. We plan on building on this success in 2016 as we migrate existing collections over to Tripod3.
Repository ingest planning
Speaking of Tripod3 and the Duke Digital Repository, we have been ingesting digital collections into the Duke Digital Repository since 2014. However, we have a plan to kick ingests up a notch (or 5). Although the real work will happen in 2016, the planning has been a long time coming and we are all very excited to be at this phase of the Tripod3 / repository process (even if it will be a lot of work). Stay tuned!
Digital Collections Promotional Card
This is admittedly a small achievement, but it is one that has been on my to-do list for 2 years so it actually feels like a pretty big deal. In 2015, we designed a 5 x 7 postcard to hand out during Digital Production Center (DPC) tours, at conferences, and to any visitors to the library. Also, I just really love to see my UNC fan colleagues cringe every time they turn the card over and see Coach K’s face. It’s really the little things that make our work fun.
New Exhibits Website
In anticipation of opening of new exhibit spaces in the renovated Rubenstein library, DPPS collaborated with the exhibits coordinator to create a brand new library exhibits webpage. This is your one stop shop for all library exhibits information in all its well-designed glory.
Audio and Video Preservation
In 2014, the Digital Production Center bolstered workflows for preservation-based digitization. Unlike our digital collections projects, these preservation digitization efforts do not have a publication outcome so they often go unnoticed. Over the past year, we have quietly digitized around 400 audio cassettes in house (this doesn’t count outsourced Chapel Recordings digitization), some of which need to be dramatically re-housed.
On the video side, efforts have been sidelined by digital preservation storage costs. However some behind the scenes planning is in the works, which means we should be able to do more next year. Also, we were able to purchase a Umatic tape cleaner this year, which while it doesn’t sound very glamorous to the rest of the world, thrills us to no end.
Revisiting the William Gedney Digital Collection
Fans of Duke Digital Collections are familiar with the current Gedney Digital Collection. Both the physical and digital collection have long needed an update. So in recent years, the physical collection has been reprocessed, and this Fall we started an effort to digitize more materials in the collection, and to higher standards than were practical in the late 1990s.
When the Rubenstein Library re-opened, our neighbor moved into the new building, and the DPC got to expand into his office! The extra breathing room means more space for our specialists and our equipment, which is not only more comfortable but also better for our digitization practices. The two spaces are separate for now, but we are hoping to be able to combine them in the next year or two.
2015 was a great year in DPPS, and there are many more accomplishments we could add to this list. One of our team mottos is: “great productivity and collaboration, business as usual”. We look forward to more of the same in 2016!
The initial thought I had for this blog post was to describe a slice of my day that revolved around the work of William Gedney. I was going to spin a tale about being on the hunt for a light meter to take lux (illuminance) readings used to help calibrate the capture environment of one of our scanners. On my search for the light meter I bumped into the new exhibit of William Gedney’s handmade books displayed in the Chappell Family Gallery in the Perkins Library. I had digitized a number of these books a few months ago and enjoyed pretty much every image in the books. One of the books on display was opened to a particular photograph. To my surprise, I had just digitized a finished print of the same image that very morning while working on a larger project to digitize all of Gedney’s finished prints, proof prints, contact sheets and other material. Once the project is complete (a year or so from now) I will have personally seen, handled and digitized over 20,000 of Gedney’s photographs. Whoa! Would I be able to recognize Gedney images whenever one presented itself just like the book in the gallery? Maybe.
Once the collection is digitized and published through Duke Digital Collections the whole world will be able to see this amazing body of work. Instead of boring you with the details of that story I thought I would just leave you with a few images from the collection. For me, many of Gedney’s photographs have a kinetic energy to them. It seems as if I can almost feel the air. My imagination may be working overtime to achieve this and the reality of what was happening when the photograph was taken may be wholly different but the fact is these photographs spin up my imagination and transport me to the moments he has captured. These photographs inspire me to dust off my enlarger and set up a darkroom.
It may take some time to complete this particular project but there are other William Gedney related projects, materials and events available at Duke.
As 2015 winds down, the Digital Production Center is wrapping up a four-year collaboration with the Duke Herbarium to digitize their lichen and bryophyte specimens. The project is funded by the National Science Foundation, and the ultimate goal is to digitize over 2 million specimens from more than 60 collections across the nation. Lichens and bryophytes (mosses and their relatives) are important indicators of climate change. After the images from the participating institutions are uploaded to one central portal, called iDigBio, large-scale distribution mapping will be used to identify regions where environmental changes are taking place, allowing scientists to study the patterns and effects of these changes.
The specimens are first transported from the Duke Herbarium to Perkins Library on a scheduled timeline. Then, we photograph the specimen labels using our Phase One overhead camera. Some of the specimens are very bulky, but our camera’s depth of field is broad enough to keep them in focus. To be clear, what the project is utilizing is not photos of the actual plant specimens themselves, but rather images of the typed and hand-written scientific metadata adorning the envelopes which house the specimens. After we photograph them, the images are uploaded to the national database, where they are available for online research, along with other specimen labels uploaded from universities across the United States. Optical character recognition is used to digest and organize the scientific metadata in the images.
Over the past four years, the Digital Production Center has digitized approximately 100,000 lichen and bryophyte specimens. Many are from the Duke Herbarium, but some other institutions have also asked us to digitize some of their specimens, such as UNC-Chapel Hill, SUNY-Binghamton, Towson University and the University of Richmond. The Duke Herbarium is the second-largest herbarium of all U.S. private universities, next to Harvard. It was started in 1921, and it contains more than 800,000 specimens of vascular plants, bryophytes, algae, lichens, and fungi, some of which were collected as far back as the 1800s. Several specimens have unintentionally humorous names, like the following, which wants to be funky, but isn’t fooling anyone. Ok, maybe only I find that funny.
The project has been extensive, but enjoyable, thanks to the leadership of Duke Herbarium Data Manager Blanka Shaw. Dr. Shaw has personally collected bryophytes on many continents, and has brought a wealth of knowledge, energy and good humor to the collaboration with the Digital Production Center. The Duke Herbarium is open for visitors, and citizen scientists are also needed to volunteer for transcription and georeferencing of the extensive metadata collected in the national database.
We experience a number of different cycles in the Digital Projects and Production Services Department (DPPS). There is of course the project lifecycle, that mysterious abstraction by which we try to find commonalities in work processes that can seem unique for every case. We follow the academic calendar, learn our fate through the annual budget cycle, and attend weekly, monthly, and quarterly meetings.
The annual reporting cycle at Duke University Libraries usually falls to departments in August, with those reports informing a master library report completed later. Because of the activities and commitments around the opening of the Rubenstein Library, the departments were let off the hook for their individual reports this year. Nevertheless, I thought I would use my turn in the Bitstreams rotation to review some highlights from our 2014-15 cycle.
Today we will take a detailed look at how the Duke Chronicle, the university’s beloved newspaper for over 100 years, is digitized. Since our scope of digitization spans nine decades (1905-1989), it is an ongoing project the Digital Production Center (DPC), part of Digital Projects and Production Services (DPPS) and Duke University Libraries’ Digital Collections Program, has been chipping away at. Scanning and digitizing may seem straightforward to many – place an item on a scanner and press scan, for goodness sake! – but we at the DPC want to shed light on our own processes to give you a sense of what we do behind the scenes. It seems like an easy-peasy process of scanning and uploading images online, but there is much more that goes into it than that. Digitizing a large collection of newspapers is not always a fun-filled endeavor, and the physical act of scanning thousands of news pages is done by many dedicated (and patient!) student workers, staff members, and me, the King Intern for Digital Collections.
Many steps in the digitization process do not actually occur in the DPC, but among other teams or departments within the library. Though I focus mainly on the DPC’s responsibilities, I will briefly explain the steps others perform in this digital projects tango…or maybe it’s a waltz?
Each proposed project must first be approved by the Advisory Council for Digital Collections (ACDC), a team that reviews each project for its strategic value. Then it is passed on to the Digital Collections Implementation Team (DCIT) to perform a feasibility study that examines the project’s strengths and weaknesses (see Thomas Crichlow’s post for an overview of these teams). The DCIT then helps guide the project to fruition. After clearing these hoops back in 2013, the Duke Chronicle project started its journey toward digital glory.
We pull 10 years’ worth of newspapers at a time from the University Archives in Rubenstein Library. Only one decade at a time is processed to make the 80+ years of Chronicle publications more manageable. The first stop is Conservation. To make sure the materials are stable enough to withstand digitizing, Conservation must inspect the condition of the paper prior to giving the DPC the go-ahead. Because newspapers since the mid-19th century were printed on cheap and very acidic wood pulp paper, the pages can become brittle over time and may warrant extensive repairs. Senior Conservator Erin Hammeke has done great work mending tears and brittle edges of many Chronicle pages since the start of this project. As we embark on digitizing the older decades, from the 1940s and earlier, Erin’s expertise will be indispensable. We rely on her not only to repair brittle pages but to guide the DPC’s strategy when deciding the best and safest way to digitize such fragile materials. Also, several volumes of the Chronicle have been bound, and to get the best scan these must be removed from their binding. Erin to the rescue!
Now that Conservation has assessed the condition and given the DPC the green light, preliminary prep work must still be done before the scanner comes into play. A digitization guide is created in Microsoft Excel to list each Chronicle issue along with its descriptive metadata (more information about this process can be found in my metadata blog post). This spreadsheet acts as a guide in the digitization process (hence its name, digitization guide!) to keep track of each analog newspaper issue and, once scanned, its corresponding digital image. In this process, each Chronicle issue is inspected to collect the necessary metadata. At this time, a unique identifier is assigned to every issue based on the DPC’s naming conventions. This identifier stays with each item for the duration of its digital life and allows for easy identification of one among thousands of Chronicle issues. At the completion of the digitization guide, the Chronicle is now ready for the scanner.
The Scanning Process
For all loose, unbound issues, the Zeutschel is our go-to scanner because it allows for large format items to be imaged on a flat surface. This is less invasive and less damaging to the pages, and is quicker than other scanning methods. The Zeutschel can handle items up to 25 x 18 inches, which accommodates the larger sized formats of the Chronicle used in the 1940s and 1950s. If bound issues must be digitized, due to the absence of a loose copy or the inability to safely disbind a volume, the Phase One digital camera system is used as it can better capture large bound pages that may not necessarily lie flat.
For every scanning session, we need the digitization guide handy as it tells what to name the image files using the previously assigned unique identifier. Each issue of the newspaper is scanned as a separate folder of images, with one image representing one page of the newspaper. This system of organization allows for each issue to become its own compound object – multiple files bound together with an XML structure – once published to the website. The Zeutschel’s scanning software helps organize these image files into properly named folders. Of course, no digitization session would be complete without the initial target scan that checks for color calibration (See Mike Adamo’s post for a color calibration crash course).
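For the curious, here is a minimal sketch of what "one folder per issue, one image per page" could look like in practice. The naming pattern, the `scans` root folder, and the example identifier are all made up for illustration; the DPC's actual naming conventions and the Zeutschel software's folder handling are not described in this post.

```python
from pathlib import PurePosixPath

def expected_page_files(issue_id, page_count, root="scans", ext="tif"):
    """List the image paths one issue would produce: one folder per issue,
    one image per page.

    The pattern <root>/<issue_id>/<issue_id>_<page>.<ext> is hypothetical.
    """
    folder = PurePosixPath(root) / issue_id
    return [
        folder / f"{issue_id}_{page:03d}.{ext}"
        for page in range(1, page_count + 1)
    ]

# A made-up identifier for a four-page issue.
for path in expected_page_files("chronicle_example_0001", 4):
    print(path)
```

Keeping the issue identifier in both the folder name and the file name means a stray image can still be traced back to its issue if it ends up in the wrong place.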
The scanner’s plate glass can now be raised with the push of a button (or the tap of a foot pedal) and the Chronicle issue is placed on the flatbed. Lowering the plate glass flattens the pages for a better scan result. Now comes the excitement… we can finally press SCAN. For each page, the plate glass is raised and lowered, and the scan button is pressed. Chronicle issues can have anywhere from 2 to 30 or more pages, so you can imagine this process can become monotonous – or even mesmerizing – at times. Luckily, with the smaller format decades, like the 1970s and 1980s, the inner pages can be scanned two at a time and the Zeutschel software separates them into two images, which cuts down on the scan time. As for the larger formats, the pages are so big you can only fit one on the flatbed. That means each page is a separate scan, but older years tended to publish fewer issues, so it’s a trade-off. To put the volume of this work into perspective, the 1,408 issues of the 1980s Chronicle took 28,089 scans to complete, while the 1950s Chronicle of about 482 issues took around 3,700 scans to complete.
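To make the trade-off concrete, the figures above can be turned into a rough scans-per-issue average. This is only a back-of-the-envelope sketch using the numbers quoted in this post; the 1950s figures are approximate, and a scan is not the same as a page, since inner pages in the 1980s were captured two at a time.

```python
# Figures quoted in the post: (issues, scans) for each decade.
decades = {
    "1980s": (1408, 28089),
    "1950s": (482, 3700),  # both 1950s figures are approximate
}

def scans_per_issue(issues, scans):
    """Average number of scans needed to capture one issue."""
    return scans / issues

averages = {
    decade: round(scans_per_issue(issues, scans), 1)
    for decade, (issues, scans) in decades.items()
}

for decade, average in averages.items():
    print(f"{decade}: about {average} scans per issue")
```

By this rough measure, a 1980s issue needed about 20 scans on average and a 1950s issue about 8, so the later decade was more work on both counts: more issues and more scans per issue.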
Every scanned image that pops up on the screen is also checked for alignment and cropping errors that may require a re-scan. Once all the pages in an issue are digitized and checked for errors, clicking the software’s Finalize button will compile the images in the designated folder. We now return to our digitization guide to enter in metadata pertaining to the scanning of that issue, including capture person, capture date, capture device, and what target image relates to this session (subsequent issues do not need a new target scanned, as long as the scanning takes place in the same session).
Now, with the next issue, rinse and repeat: set the software settings and name the folder, scan the issue, finalize, and fill out the digitization guide. You get the gist.
We now find ourselves with a slew of folders filled with digitized Chronicle images. The next phase of the process is quality control (QC). Once every issue from the decade is scanned, the first round of QC checks all images for excess borders to be cropped, crooked images to be squared, and any other minute discrepancy that may have resulted from the scanning process. This could be missing images, pages out of order, or even images scanned upside down. This stage of QC is often performed by student workers who diligently inspect image after image using Adobe Photoshop. The second round of QC is performed by our Digital Production Specialist Zeke Graves who gives every item a final pass.
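Most of this QC is visual and done by hand, but a missing page is the kind of problem a small script could flag before anyone opens Photoshop. The sketch below is purely illustrative and is not the DPC's actual tooling: it assumes a hypothetical file name pattern ending in `_<page number>.tif` and an expected page count taken from something like the digitization guide.

```python
import re

# Hypothetical naming pattern: <issue id>_<page number>.tif
PAGE_PATTERN = re.compile(r"_(\d+)\.tif$")

def missing_pages(filenames, expected_count):
    """Return the page numbers from 1..expected_count with no matching file."""
    found = set()
    for name in filenames:
        match = PAGE_PATTERN.search(name)
        if match:
            found.add(int(match.group(1)))
    return sorted(set(range(1, expected_count + 1)) - found)

# A made-up four-page issue where pages 2 and 4 were never scanned.
scanned = ["example_issue_001.tif", "example_issue_003.tif"]
print(missing_pages(scanned, 4))
```

A check like this only catches gaps in the numbering. Crooked, badly cropped, or upside-down images still need a person looking at them.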
At this stage, derivatives of the original preservation-quality images are created. The originals are archived in dark storage, while the smaller-sized derivatives are used in the CONTENTdm ingest process. CONTENTdm is the digital collection management software we use that collates the digital images with their appropriate descriptive metadata from our digitization guide, and creates one compound object for each Chronicle issue. It also generates the layer of Optical Character Recognition (OCR) data that makes the Chronicle text searchable, and provides an online interface for users to discover the collection once published on the website. The images and metadata are ingested into CONTENTdm’s Project Client in small batches (1 to 3 years of Chronicle issues) to reduce the chance of upload errors. Once ingested into CONTENTdm, the items are then spot-checked to make sure the metadata paired up with the correct image. During this step, other metadata is added that is specific to CONTENTdm fields, including the ingest person’s initials. Then, another ingest must run to push the files and data from the Project Client to the CONTENTdm server. A third step after this ingest finishes is to approve the items in the CONTENTdm administrative interface. This gives the go-ahead to publish the material online.
Hold on, we aren’t done yet. The project is now passed along to our developers in DPPS who must add this material to our digital collections platform for online discovery and access (they are currently developing Tripod3 to replace the previous Tripod2 platform, which is more eloquently described in Will Sexton’s post back in April). Not only does this improve discoverability, but it makes all of the library’s digital collections look more uniform in their online presentation.
Then, FINALLY, the collection goes live on the web. Now, just repeat the process for every decade of the Duke Chronicle, and you can see how this can become a rather time-heavy and laborious process. A labor of love, that is.
I could have narrowly stuck with describing to you the scanning process and the wonders of the Zeutschel, but I felt that I’d be shortchanging you. Active scanning is only a part of the whole digitization process which warrants a much broader narrative than just “push scan.” Along this journey to digitize the Duke Chronicle, we’ve collectively learned many things. The quirks and trials of each decade inform our process for the next, giving us the chance to improve along the way (to learn how we reflect upon each digital project after completion, go to Molly Bragg’s blog post on post-mortem reports).
If your curiosity is piqued as to how the Duke Chronicle looks online, the Fall 1959-Spring 1970 and January 1980-February 1989 issues are already available to view in our digital collections. The 1970s Chronicle is the next decade slated for publication, followed by the 1950s. Though this isn’t a comprehensive detailed account of the digitization process, I hope it provides you with a clearer picture of how we bring a collection, like the Duke Chronicle, into digital existence.
Notes from the Duke University Libraries Digital Projects Team