EAA: Technical Information

Page Contents

Selection of Materials
Creation of the Images
Database Format
TEI Encoding for Text and Book Materials

Selection of Materials

The materials in the project were selected from eleven separate collections by Hartman Center staff and graduate students in history between May 1998 and December 1998. Advertising items and publications were chosen for inclusion based on attractiveness, size, and their merit as representative of advertising campaigns and companies during the period 1850 – 1920.

Creation of the Images

The processes for creating images in the Emergence of Advertising in America: 1850 – 1920 Project (EAA) and the Ad*Access Project were practically identical. The advertisements in the EAA Project were scanned on UMAX Mirage II and Mirage IIse (11×17″) and Agfa Arcus II (8×14″) flatbed scanners. These were connected to Power Macintosh 7300/200 workstations running Mac OS 8 and Adobe Photoshop 4.0. From September 1998 to August 1999, Duke students working in the Digital Scriptorium scanned over 9,000 images. These master images were created at 150 dpi in 24-bit RGB color and saved in JPEG format. Testing indicated that the 150 dpi color scans provided enough resolution for 1 mm characters to be adequately visible on both existing computer monitors and laser-quality prints. The R. C. Maxwell Company Outdoor Advertising Photographs, however, were scanned at 300 dpi. Testing on the Arcus II flatbed scanners revealed that the majority of straight lines in an image (e.g. building walls, billboards) appeared jagged when scanned at 150 dpi, but straight when scanned at 300 dpi.
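The resolution figures above can be put in terms of pixels with a little arithmetic. The following is a minimal sketch, not part of the project's tooling; the function name is hypothetical, and the only inputs are the 150 dpi and 300 dpi settings and the 1 mm character size mentioned above.

```python
MM_PER_INCH = 25.4  # exact by definition of the inch


def pixels_for_feature(size_mm: float, dpi: int) -> float:
    """Return how many pixels span a feature of the given physical size."""
    return size_mm * dpi / MM_PER_INCH


for dpi in (150, 300):
    pixels = pixels_for_feature(1.0, dpi)
    print(f"{dpi} dpi: a 1 mm character spans about {pixels:.1f} pixels")
# 150 dpi: a 1 mm character spans about 5.9 pixels
# 300 dpi: a 1 mm character spans about 11.8 pixels
```

At 150 dpi a 1 mm character is roughly six pixels tall, and doubling the resolution to 300 dpi doubles the number of pixels available to render an edge, which is in keeping with the straighter lines reported for the 300 dpi scans.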

Each master image passed through a quality control process that checked image quality, amount of skew, page orientation, cropping, color, and any other problems that arose. The wide range of materials required adjusting scanning techniques to ensure the high quality of the images. The primary scanning problem was the prevalence of Moiré patterns caused by halftone dots. Magazine advertisements presented additional difficulties. Differences in paper texture and illustration type (color drawing, photographs, etc.), the use of more than one illustration type in an advertisement, and the overlapping of illustration and text all affected the presence of Moiré patterns differently. A variety of techniques were devised to deal with this issue. The descreening feature found in the image capture software corrected many problems on its own, but the more difficult advertisements also required adjusting the levels and/or blurring or sharpening part or all of the advertising item. The bound items from the Cookbook and the Early Advertising Publication Collections at times had “shadows” in the middle of the text, varying with the number of pages in an item. The oversize length and width of many items from the Broadsides Collection necessitated creating two or more images from one broadside to ensure that all visual information from that item was included.

Scripts to automatically create 72 dpi images and thumbnails from the original scans were developed using the Perl scripting language and ImageMagick, a freely available UNIX graphics package. The conversion consisted of several steps. First, all 9,000 of the original images were transferred to the Scriptorium machine (a dual-processor Sun Sparc 20) by FTP and then arranged according to the directory structure scheme devised at the beginning of the project (see below). This scheme allows for quick server access and ease of file management by creating a tree-like structure in which each branch may contain no more than 100 subdirectories.
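One way to keep every branch at or below 100 subdirectories is to split an item's number into two-digit groups. The sketch below is an illustration only: the project's actual directory layout is not documented here, and the function name, the `images` root, and the assumption of a letter prefix followed by four digits (as in CK0017) are all hypothetical.

```python
import re
from pathlib import PurePosixPath


def item_directory(identifier: str, root: str = "images") -> PurePosixPath:
    """Map an identifier such as CK0017 to a directory like images/CK/00/CK0017.

    Assumed layout (not the project's documented scheme): the category prefix
    forms the first level, and the first two digits of the item number form
    the second, so that level holds at most 100 directories (00-99), each
    holding at most 100 item directories.
    """
    match = re.fullmatch(r"([A-Z]+)(\d{4})", identifier)
    if match is None:
        raise ValueError(f"unexpected identifier format: {identifier!r}")
    prefix, number = match.groups()
    return PurePosixPath(root, prefix, number[:2], identifier)


print(item_directory("CK0017"))  # images/CK/00/CK0017
```

Bounding the fan-out this way keeps directory listings short, which is the property the scheme described above relies on for quick access and easy file management.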

During the scanning and database entry phase, each advertising item was identified by a unique identifier based upon the number issued during the selection phase of the project. These numbers were based on category name (e.g. all Cookbooks begin with the letters CK) and alphabetical/chronological placement in the selection of the items for the project. All the image files were renamed from their working names to a regular and easily identifiable file naming system based on the advertisement’s identifier (CK0017), followed by the size of the image, expressed as “150dpi” in this step (CK0017-150dpi.jpeg). Items with multiple pages required more than one image to capture the total advertisement. These were identified with an image (page) number in addition to the advertisement’s unique identifier (CK0017-03-150dpi.jpeg). The unique identifier serves as the connecting link holding the database records and the images together.
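The naming convention can be expressed as a pair of small helpers. This is a minimal sketch based only on the two example names given above; the function names, the two-digit zero-padded page number, and the regular expression are assumptions, and the original renaming was done with the project's own scripts.

```python
import re
from typing import Optional

# Assumed shape: IDENTIFIER[-PAGE]-SIZE.jpeg, e.g. CK0017-03-150dpi.jpeg
FILENAME_PATTERN = re.compile(
    r"^(?P<identifier>[A-Z]+\d+)(?:-(?P<page>\d{2}))?-(?P<size>[^-]+)\.jpeg$"
)


def image_filename(identifier: str, size: str = "150dpi",
                   page: Optional[int] = None) -> str:
    """Build a file name from an item identifier, optional page, and size label."""
    parts = [identifier]
    if page is not None:
        parts.append(f"{page:02d}")  # pages are zero-padded to two digits
    parts.append(size)
    return "-".join(parts) + ".jpeg"


def identifier_of(filename: str) -> str:
    """Recover the item identifier that links an image to its database record."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return match.group("identifier")


print(image_filename("CK0017"))                # CK0017-150dpi.jpeg
print(image_filename("CK0017", page=3))        # CK0017-03-150dpi.jpeg
print(identifier_of("CK0017-03-150dpi.jpeg"))  # CK0017
```

Because the identifier is always the first component of the name, any image file can be traced back to its database record without a separate lookup table.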

Finally, by taking advantage of the ability of the Scriptorium machine to run multiple processes and employing another machine running Linux, multiple Perl conversion scripts were run both day and night to generate the additional images. These included 9,000 72 dpi images, “small” images, and thumbnails. It was decided that both a “small” image measuring 300 pixels in width and a thumbnail measuring 100 pixels in width would be produced for each item (and additional pages as required). The small image is embedded within the database record. For items with multiple pages, all of the small images appear on one HTML page together with the database information; in the case of the Cookbook and Early Advertising Publications Collections, the small image instead links to a page with all images and transcribed texts from these books.
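The pixel dimensions of the derivative images follow directly from the figures above. The sketch below shows only the size arithmetic, in Python rather than the Perl and ImageMagick actually used; the function names are hypothetical, and the 1275×1650 pixel master (an 8.5×11 inch page at 150 dpi) is an invented example, not a measurement from the collection.

```python
from typing import Dict, Tuple

MASTER_DPI = 150       # resolution of most master scans
SCREEN_DPI = 72        # resolution of the screen-size derivative
SMALL_WIDTH = 300      # width of the "small" image, in pixels
THUMBNAIL_WIDTH = 100  # width of the thumbnail, in pixels

Size = Tuple[int, int]


def resample_to_dpi(width: int, height: int,
                    source_dpi: int = MASTER_DPI,
                    target_dpi: int = SCREEN_DPI) -> Size:
    """Scale pixel dimensions so the image keeps its physical size at a new dpi."""
    scale = target_dpi / source_dpi
    return round(width * scale), round(height * scale)


def fit_to_width(width: int, height: int, target_width: int) -> Size:
    """Scale to a fixed width while preserving the aspect ratio."""
    return target_width, round(height * target_width / width)


def derivative_sizes(width: int, height: int) -> Dict[str, Size]:
    """Return the pixel dimensions of each derivative for one master image."""
    return {
        "72dpi": resample_to_dpi(width, height),
        "small": fit_to_width(width, height, SMALL_WIDTH),
        "thumbnail": fit_to_width(width, height, THUMBNAIL_WIDTH),
    }


print(derivative_sizes(1275, 1650))
# {'72dpi': (612, 792), 'small': (300, 388), 'thumbnail': (100, 129)}
```

Fixing the width rather than the height means every small image and thumbnail lines up in a column of uniform width, whatever the proportions of the original item.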

The EAA database comprises approximately 27,000 individual images and 264 pages of transcribed text. The entire project uses almost 7 gigabytes of disk space. Individual 150 dpi images have an average file size of 740 kilobytes.

Database Format

The database that contained the information for the advertisements was converted from FileMaker Pro 4.0 for Power Macintosh to SGML format in the EAD Version 1.0 DTD. Each database field was mapped to an EAD element made unique by use of attributes.

The resulting SGML database is presented in HTML for ordinary web browsers using DynaWeb software from Enigma. This method allows searches to be limited to the unique fields, enabling highly targeted searching. Access to the database is through both these targeted searches and “canned searches” on the category field, as well as through browsing the specific category (see Category Descriptions). In addition, a user may perform a targeted or keyword search within the DynaWeb interface.

TEI Encoding for Text and Book Materials

Text in the Nicole Di Bona Peterson Cookbook Collection and the Early Advertising Publications was encoded using the Text Encoding Initiative (TEI) TEI Lite DTD. The use of the DTD was supplemented by the TEI Text Encoding in Libraries Guidelines for Best Encoding Practices. The Title Page and the Table of Contents or Index from fifty of the Early Advertising Publications and eighty-two cookbooks are included in the website. All of the transcribed texts are also available as images; the TEI encoding serves as a structure for organizing and presenting the images and text. Each encoded document features a complete description of the source material and its subject content to allow for searchability and resource discovery even without complete transcription of the text. Like the databases, the TEI SGML documents are indexed and presented in HTML for ordinary web browsers using DynaWeb software from Enigma. This allows for subject searches across the entire collection, including the information about the advertising item or book (found in each collection’s database) and the information in each book (the transcribed text from the Cookbooks and Early Advertising Publications).
