Adventures in metadata hygiene: using Open Refine, XSLT, and Excel to dedup and reconcile name and subject headings in EAD

OpenRefine, formerly Google Refine, bills itself as “a free, open source, powerful tool for working with messy data.”  As someone who works with messy data almost every day, I can’t recommend it enough.  While Open Refine is a great tool for cleaning up “grid-shaped data” (spreadsheets), it’s a bit more challenging to use when your source data is in some other format, particularly XML.

Some corporate name terms from an EAD collection guide
Some corporate name terms from an EAD (XML) collection guide

As part of a recent project to migrate data from EAD (Encoded Archival Description) to ArchivesSpace, I needed to clean up about 27,000 name and subject headings spread across over 2,000 EAD records in XML.  Because the majority of these EAD XML files were encoded by hand using a basic text editor (don’t ask why), I knew there were likely to be variants of the same subject and name terms throughout the corpus–terms with extra white space, different punctuation and capitalization, etc.  I needed a quick way to analyze all these terms, dedup them, normalize them, and update the XML before importing it into ArchivesSpace.  I knew Open Refine was the tool for the job, but the process of getting the terms 1) out of the EAD, 2) into OpenRefine for munging, and 3) back into EAD wasn’t something I’d tackled before.

Below is a basic outline of the workflow I devised, combining XSLT, OpenRefine, and, yes, Excel.  I’ve provided links to some source files when available.  As with any major data cleanup project, I’m sure there are 100 better ways to do this, but hopefully somebody will find something useful here.

1. Use XSLT to extract names and subjects from EAD files into a spreadsheet

I’ve said it before, but sooner or later all metadata is a spreadsheet. Here is some XSLT that will extract all the subjects, names, places and genre terms from the <controlaccess> section in a directory full of EAD files and then dump those terms along with some other information into a tab-separated spreadsheet with four columns: original_term, cleaned_term (empty), term_type, and eadid_term_source.

controlaccess_extractor.xsl

 2. Import the spreadsheet into OpenRefine and clean the messy data!

Once you open the resulting tab delimited file in OpenRefine, you’ll see the four columns of data above, with “cleaned_term” column empty. Copy the values from the first column (original_term) to the second column (cleaned_term).  You’ll want to preserve the original terms in the first column and only edit the terms in the second column so you can have a way to match the old values in your EAD with any edited values later on.

OpenRefine offers several amazing tools for viewing and cleaning data.  For my project, I mostly used the “cluster and edit” feature, which applies several different matching algorithms to identify, cluster, and facilitate clean up of term variants. You can read more about clustering in Open Refine here: Clustering in Depth.

In my list of about 27,000 terms, I identified around 1200 term variants in about 2 hours using the “cluster and edit” feature, reducing the total number of unique values from about 18,000 to 16,800 (about 7%). Finding and replacing all 1200 of these variants manually in EAD or even in Excel would have taken days and lots of coffee.

refine_screeshot
Screenshot of “Cluster & Edit” tool in OpenRefine, showing variants that needed to be merged into a single heading.

 

In addition to “cluster and edit,” OpenRefine provides a really powerful way to reconcile your data against known vocabularies.  So, for example, you can configure OpenRefine to query the Library of Congress Subject Heading database and attempt to find LCSH values that match or come close to matching the subject terms in your spreadsheet.  I experimented with this feature a bit, but found the matching a bit unreliable for my needs.  I’d love to explore this feature again with a different data set.  To learn more about vocabulary reconciliation in OpenRefine, check out freeyourmetadata.org

 3. Export the cleaned spreadsheet from OpenRefine as an Excel file

Simple enough.

4. Open the Excel file and use Excel’s “XML Map” feature to export the spreadsheet as XML.

I admit that this is quite a hack, but one I’ve used several times to convert Excel spreadsheets to XML that I can then process with XSLT.  To get Excel to export your spreadsheet as XML, you’ll first need to create a new template XML file that follows the schema you want to output.  Excel refers to this as an “XML Map.”  For my project, I used this one: controlaccess_cleaner_xmlmap.xml

From the Developer tab, choose Source, and then add the sample XML file as the XML Map in the right hand window.  You can read more about using XML Maps in Excel here.

After loading your XML Map, drag the XML elements from the tree view in the right hand window to the top of the matching columns in the spreadsheet.  This will instruct Excel to map data in your columns to the proper XML elements when exporting the spreadsheet as XML.

Once you’ve mapped all your columns, select Export from the developer tab to export all of the spreadsheet data as XML.

Your XML file should look something like this: controlaccess_cleaner_dataset.xml

control_access_dataset_chunk
Sample chunk of exported XML, showing mappings from original terms to cleaned terms, type of term, and originating EAD identifier.

 

5. Use XSLT to batch process your source EAD files and find and replace the original terms with the cleaned terms.

For my project, I bundled the term cleanup as part of a larger XSLT “scrubber” script that fixed several other known issues with our EAD data all at once.  I typically use the Oxygen XML Editor to batch process XML with XSLT, but there are free tools available for this.

Below is a link to the entire XSLT scrubber file, with the templates controlling the <controlaccess> term cleanup on lines 412 to 493.  In order to access the XML file  you saved in step 4 that contains the mappings between old values and cleaned values, you’ll need to call that XML from within your XSLT script (see lines 17-19).

AT-import-fixer.xsl

What this script does, essentially, is process all of your source EAD files at once, finding and replacing all of the old name and subject terms with the ones you normalized and deduped in OpenRefine. To be more specific, for each term in EAD, the XSLT script will find the matching term in the <original_term>field of the XML file you produced in step 4 above.  If it finds a match, it will then replace that original term with the value of the <cleaned_term>.  Below is a sample XSLT template that controls the find and replace of <persname> terms.

XSLT template that find and replaces old values with cleaned ones.
XSLT template that find and replaces old values with cleaned ones.

 

Final Thoughts

Admittedly, cobbling together all these steps was quite an undertaking, but once you have the architecture in place, this workflow can be incredibly useful for normalizing, reconciling, and deduping metadata values in any flavor of XML with just a few tweaks to the files provided.  Give it a try and let me know how it goes, or better yet, tell me a better way…please.

More resources for working with OpenRefine:

“Using Google Refine to Clean Messy Data” (Propublica Blog)

freeyourmetadata.org

Getting to the Finish Line: Wrapping Up Digital Collections Projects

Part of my job as Digital Collections Program Manager is to manage our various projects from idea to proposal to implementation and finally to publication. It can be a long and complicated process with many different people taking part along the way.  When we (we being the Digital Collections Implementation Team or DCIT) launch a project online, there are special blog posts, announcements and media attention.  Everyone feels great about a successful project implementation, however as the excitement of the launch subsides the project team is not quite done. The last step in a digital collections project at Duke is the post project review.

Project post-mortems keeps the team from feeling like the men in this image!

Post project reviews are part of project management best practices for effectively closing and assessing the outcomes of projects.  There are a lot of resources for project management available online, but as usual Wikipedia provides a good summary of project post-mortems as well as the different types and phases of project management in general.   Also if you Google “project post-mortem,” you will get more links then you know what to do with.

Process

 As we finish up projects we conduct what we call a “post-mortem,” and it is essentially a post project review.   The name evokes autopsies, and what we do is not dissimilar but thankfully there are no bodies involved (except when we closed up the recent Anatomical Fugitive Sheets digital collection – eh? see what I did there? wink wink).  The goals of our post mortem process are for the project team to do the following:

  • Reflect on the project’s outcomes both positive and negative
  • Document any unique decisions or methods employed during the project
  • Document resources put into the project.

In practice, this means that I ask the project team to send me comments about what they thought went well and what was challenging about the project in question.   Sometimes we meet in person to do this, but often we send comments through email or our project management tool.  I also meet in person with each project champion as a project wraps up.  Project champions are the people that propose and conceive a project.  I ask everyone the same general questions: what worked about the project and what was challenging. With champions, this conversation is also an opportunity to discuss any future plans for promotion as well as think of any related projects that may come up in the future.

DCIT's Post-Mortem Template
DCIT’s Post-Mortem Template

Once I have all the comments from the team and champion I put these into my post-mortem template (see right – click to expand).  I also pull together project stats such as the number of items published, and the hours spent on the project.  Everyone in the core project team is asked to track and submit the hours they spend on projects, which makes pulling stats an easy process.  I designed the template I use as a word document.  Its structured enough to be organized but unstructured enough for me to add new categories on the fly as needed (for example, we worked with a design contractor on a recent project so I added a “working with contractor” section).

 Seems like a simple enough process right?  It is, assuming you can have two ingredients.  First, you need to have a high degree of trust in your core team and good relationships with project stakeholders.  The ability to speak honestly (really really honestly) about a project is a necessity for the information you gather to be useful.  Secondly, you do actually have to conduct the review.  My team gets pulled so quickly from project to project, its really easy to NOT make time for this process.  What helps my team, is that post mortems are a formal part of our project checklists.  Also, I worked with my team to set up our information gathering process, so we all own it and its relevant and easy for them.

DCIT is never to busy for project reviews!

Impacts

The impacts these documents have on our work are very positive. First there is short term benefit just by having the core team communicate what they thought worked and didn’t work. Since we instituted this in the last year, we have used these lessons learns to make small but important changes to our process.

This process also gives the project team direct feedback from our project champions.  This is something I get a lot through my informal interactions with various stakeholders in my role as project manager, however the core team doesn’t always get exposed to direct feedback both positive and negative.

The long term benefit is using the data in these reports to make predictions about resources needed for future projects, track project outcomes at a program level, and for other uses we haven’t considered yet.

Further Resources

 All in all, I cannot recommend a post project review process to anyone and everyone who is managing projects enough.  If you are not convinced by my template (which is very simple), there are lots of examples out there.  Google “project post-mortem templates” (or similar terminology) to see a huge variety.

There are also a few library and digital collections project related resources you may find useful as well:

Here is a blog post from California Digital Library on project post-mortems that was published in 2010, but remains relevant. 

UCLA’s Library recently published a “Library Special Collections Digital Project Toolkit” that includes an “Assessment and Evaluation” section and a “Closeout Questionnaire”

 

 

A Look Under the Hood—and the Flaps—of the Anatomical Fugitive Sheets Collection

We have digitized some fairly complex objects over the years that have challenged our Digital Collections team to push the boundaries of typical digital library solutions for digitization and publication. It happens often: objects we want to digitize are sort of like something we’ve done for a previous project, but not quite, so we can’t simply mimic whatever we did before to get the new project done. We’re frequently flexing our creative muscles.  In many cases, our most successful projects ended up that way because we didn’t concede to the temptation of representing items digitally in an oversimplified manner, or, worse still, as something they are not.

Working with so many rare and unique items from the Rubenstein Library through the years, we’ve become unfazed by these representation challenges and time and again have simply pulled together our team’s brainpower (and willpower) to make something work. Dare I say it, we’ve been unflappable. But this year, we met our match and surely needed some help.

In March, we published ten anatomical fugitive sheets from the 1500s to 1600s. They’re printed illustrations from the Rubenstein Library’s History of Medicine Collections, depicting the human body using layers of paper flaps that can be lifted to reveal internal organs. They’re amazing. They’re distinctive. And they’re really complicated.

Fugitive Sheet
Fugitive Sheet example, accessible online at http://library.duke.edu/digitalcollections/rubenstein_fgsms01003/ (Photo Credit: Les Todd)

The complexity of this project necessitated enlisting help from beyond the library’s walls. Early on, Prof. Mark Olson in Duke’s Art, Art History & Visual Studies department was instrumental in helping us identify modern technical approaches for capturing and modeling such objects. We contracted out development work through local web firm Cuberis, who programmed the bulk of the UI. In-house, we handled digitization, metadata, and integration with our discovery & access application with a lot of collaborative creativity between the digital collections team, the collection curator, conservators, and rare materials cataloger.

In a moment, I’ll discuss what modern technologies make the Fugitive Sheets interface hum. But first, here’s a look at what others have done with flap-based items.

Flaps in the Wind, Er… Wild

There are a few examples of anatomical flap objects represented on the Web, both at Duke and beyond. Common approaches include:

  1. A Sequence of Images. Capture one image of the full item for every state of the flaps possible, then let a user navigate them as if viewing a paginated document or photo sequence.
  2. Video. Either film someone lifting the flaps, or make an auto-playing video of the image sequence above.
  3. Flash. Develop a Flash application and put a SWF file on the web.

The third approach is actually what powers Duke’s Four Seasons project, which remains one of the best interactive historical anatomy interfaces available today. Developed way back in 2000 by Educational Media Services, Four Seasons began as a Java program distributed on CD-ROM (gasp!) and in subsequent years found a home as a Flash application embedded on the library website.

Flash-based flap interface for The Four Seasons, available at http://library.duke.edu/rubenstein/history-of-medicine/four-seasons
Flash-based flap interface for The Four Seasons, available at http://library.duke.edu/rubenstein/history-of-medicine/four-seasons

Flash has fallen out of favor over the last decade for many reasons, most notably: 1) it won’t work on iOS devices, 2) it’s bad for accessibility, 3) it’s invisible to search engines, and most importantly, 4) most of what Flash used to do exclusively can now be done just as well using HTML5.

Anatomy of a Modern Flap Interface

The Web has made giant leaps forward in the past five years due to advances in HTML, CSS, and Javascript and the evolution of web browsers. Key specs for HTML5 and CSS3 have been supported by all major browsers for several years now.  Below are the vital bits (so to speak) in use by the Anatomical Fugitive Sheets. Many of these things would not have worked (or worked well) on the Web five years ago.

HTML5 Parts

1. SVG (scalable vector graphics). An <svg> element in HTML contains shape data for each flap using a coordinates system. The <path> holds a string with line instructions using shorthand (M, L, c, etc.) for tracing the contour: MoveTo, Lineto, Curveto, Arcto. We duplicate the <path> with a transform attribute to render the shape of the back of the flap.

SVG for flap
SVG coordinates in a <path> element representing the back of a flap.

2. Cross-window messaging API. Each fugitive sheet is rendered within an <iframe> on a page and the clickable layer navigation lives in its parent page, so they’re essentially two separate web pages presented as if one. Having a click in one page do something in another is possible through the Javascript method postMessage, part of the HTML5 spec.

  • From parent page to iframe: frame.contentWindow.postMessage(message, '*');
  • From iframe to parent page: window.top.postMessage(message, '*');

CSS3 Parts

  1. transition Property. Here’s where the flap animation action happens.  The flap elements all have the style declaration transition:1s ease-in-out. That ensures that when a flap property like height changes, it animates over the course of one second, slower at the start and end and quicker in the middle.  Clicking to open a flap calls a Javascript function that simultaneously switches the height of the flap front to zero and the back to its full size.
  2. transform Property. This scales down the figure and all its interactive components for display in the iframe, e.g., body.framed .flip-up-wrapper { transform:scale(.5) }; This scaling doesn’t apply in the full-size and zoomed-in views and thus enables the flaps to work identically at full- or half-resolution.

Capture & Encoding

Capture

Because the fugitive sheets are large and extremely fragile, our Digital Production Center staff and conservators worked carefully together to untangle and prop open each flap to be photographed separately. It often required two or more people to steady and flatten the flaps while being careful not to cast shadows on the layer being shot. I wasn’t there, but in my mind I imagine a game of library Twister.

Staff captured images using an overhead reproduction camera using white paper below each flap to make it easier to later determine and crop the contours. Unlike most images we digitize, the flaps’ derivative images are stored and delivered in PNG format to preserve transparency.

Encoding

As we do for all digital collections, we encode in an XML document the structural, administrative, and descriptive data about the digital objects using accepted library standards so that 1) the data can be preserved and ported between applications, and 2) we can use it to power our discovery & access interface. We use METS, a flexible Library of Congress standard for describing all kinds of digital objects.

METS worked pretty well for representing the flap data (see example), and we tapped into a few parts of the standard that we’ve never or rarely used for other items. Specifically, we:

  • added the LC MIX namespace for technical image metadata
  • used an amdSec to store flap heights & widths
  • used file/@GROUPID to divide flap images between figure 1, figure 2, etc.
  • used fptr/area/@COORDS to hold the SVG path coordinates for each flap

The descriptive metadata for the fugitive sheets posed its own challenges outside the box for our usual projects. All the information about the sheets existed as MARC catalog records, and crosswalking from MARC to anything else is more of an art than a science.

Looking Ahead

We’ll try to build on the accomplishments from the Fugitive Sheets Collection as we tackle new complex digitization projects. The History of Medicine Collections in particular are brimming with items that will be far more challenging than these sheets to model, like paginated flap books with fold-out pages and flaps that open in different directions. Undaunted, we’ll keep flapping our wings to stay aloft.

When MiniDiscs Recorded the Earth

My last several posts have focused on endangered audio formats: open reel tape, compact cassette, and DAT. Each of these media types boasted some advantages over their predecessors, as well as disadvantages that ultimately led to them falling out of favor with most consumers. Whether entirely relegated to our growing tech graveyard or moving into niche and specialty markets, each of the above formats has seen its brightest days and now slowly fades into extinction.

This week, we turn to the MiniDisc, a strange species that arose from Sony Electronics in 1992 and was already well on its way to being no more than a forgotten layer in the technological record by the time its production was discontinued in 2013.

IMG_1783

The MiniDisc was a magneto-optical disc-based system that offered 74 minutes of high-quality digital audio per disc (up to 320 minutes in long-play mode). It utilized a psychoacoustic lossy compression scheme (known as ATRAC) that allowed for significant data compression with little perceptible effect on audio fidelity. This meant you could record near perfect digital copies of CDs, tapes, or records—a revolutionary feat before the rise of writable CDs and hard disc recording. The minidisc platform was also popular in broadcasting and field recording. It was extremely light and portable, had excellent battery life, and possessed a number of sophisticated file editing and naming functions.

Despite these advantages, the format never grabbed a strong foothold in the market for several reasons. The players were expensive, retailing at $750 on launch in December 1992. Even the smaller portable Minidisc “Walkman” never dropped into the low consumer price range.  As a result, relatively few music albums were commercially released on the format. Once affordable CD-Rs and then mp3 players came onto the scene, the Minidisc was all but obsolete without ever truly breaking through to the mainstream.

IMG_1787

I recently unearthed a box containing my first and only Minidisc player, probably purchased used on eBay sometime in the early 2000’s. It filled several needs for me: a field recorder (for capturing ambient sound to be used in audio art and music), a playback device for environmental sounds and backing tracks in performance situations, and a “Walkman” that was smaller, held more music, and skipped less than my clunky portable CD player.

While it was long ago superceded by other electronic tools in my kit, the gaudy metallic yellow still evokes nostalgia. I remember the house I lived in at the time, walks to town with headphones on, excursions into the woods to record birds and creeks and escape the omnipresent hum of traffic and the electrical grid. The handwritten labels on the discs provide clues to personal interests and obsessions of the time: “Circuit Bends,” “Recess – Musique Concrete Master,” “Field Recordings 2/28/04,” “PIL – Second Edition, Keith Hudson – Pick A Dub, Sonic Youth – Sister, Velvet Underground – White Light White Heat.” The sounds and voices of family, friends, and creative collaborators populate these discs as they inhabit the recesses of my memory.

IMG_1780

While some may look at old technology as supplanted and obsolete, I refrain from this kind of Darwinism. The current renaissance of the supposedly extinct vinyl LP has demonstrated that markets and tastes change, and that ancient audio formats can be resurrected and have vital second lives. Opto-magnetic ghosts still walk the earth, and I hear them calling. I’m keeping my Minidisc player.

You’re going to lose: The inherent complexity, and near impossibility, of developing for digital collections

 

“Nobody likes you. Everybody hates you. You’re going to lose. Smile, you f*#~.”

Joe Hallenbeck, The Last Boy Scout

Screen Shot 2015-04-01 at 12.17.56 PMWhile I’m glad not to be living in a Tony Scott movie, on occasion I feel like Bruce Willis’ character near the beginning of “The Last Boy Scout.” Just look at some of the things they say about us.

Current online interfaces to primary source materials do not fully meet the needs of even experienced researchers. (DeRidder and Matheny)

The criticism, it cuts deep. But at least they were trying to be gentle, unlike this author:

[I]n use, more often than not, digital library users and digital libraries are in an adversarial position. (Saracevic, p. 9)

That’s gonna leave a mark. Still, it’s the little shots they take, the sidelong jabs, that hurt the most:

The anxiety over “missing something” was quite common across interviews, and historians often attributed this to the lack of comprehensive search tools for primary sources. (Rumer and Schonfeld, p. 16)

Screen Shot 2015-04-03 at 10.57.02 AM
Item types in Tripod2.

I’m fond of saying that the youtube developers have it easy. They support one content type – and until recently, it was Flash, for pete’s sake – minimal metadata, and then what? Comments? Links to some other videos? Wow, that’s complicated.

By contrast, we’ve developed for no less than fifteen different item types during the life of Tripod2, the platform that we’ve used to provide discovery and access for Duke Digital Collections since March 2011. You want a challenge? Try building an interface for flippable anatomical fugitive sheets.  It’s one thing to create a feature allowing users to embed videos from a flat web-site structure; it’s quite another to allow it from a site loaded with heterogeneous content types, then extend it to include items nested within multiple levels of description in finding aids (for an example, see the “Southwest Georgia Voters Project” item here).

I think the problem set of developing tools for digitized primary sources is one of the most interesting areas in the field of librarianship, and for the digital collections team, it’s one of our favorite areas of work. However, the quotes that open this post (the ones not delivered by Bruce Willis, anyway) are part of a literature that finds significant disparity between the needs of the researchers who form our primary audience and the tools that we – collectively speaking, in the field of digital libraries – have built.

Our team has just begun work on our next-generation platform for digital collections, which we call Tripod3. It will be built on the Fedora/Hydra framework that our Digital Repository Services team is using to develop the Duke Digital Repository. As the project manager, I’m trying to catch up on the recent literature of assessment for digital collections, and consider how we can improve on what we’ve done in the past. It’s one of the main ways  we can engage with researchers, as I wrote about in a previous post.

One of the issues we need to address is the problem of archival context. It’s something that the users of digitized primary sources cite again and again in the studies I’ve read. It manifests itself in a few ways, and could be the subject of a lengthier piece, but I think Chassanoff gives a good sense of it in her study (pp. 470-1):

Overall, findings suggest that historians seem to feel most comfortable using digitized sources when an online environment replicates essential attributes found in archives. Materials should be obtained from a reputable repository, and the online finding aid should provide detailed description. Historians want to be able to access the entire collection online and obtain any needed information about an item’s provenance. Indeed, the possibility that certain materials are omitted from an online collection appears to be more of a concern than it is in person at an archives.

The idea of archival context poses what I think is the central design problem of digital collections. It’s a particular challenge because, while it’s clear that researchers want and require the ability to see an object in its archival context, they also don’t want it. By which I mean, they also want to be able to find everything in the same flat context that everything assumes with a retrieval service like Google.

Archival context implies hierarchy, using the arrangement of the physical materials to order the digital. We were supposed to have broken away from the tyranny of physical arrangement years ago. David Weinberger’s Everything is Miscellaneous trumpeted this change in 2007, and while we had already internalized what he called the “third order of order” by then, it is the unambiguous way of the world now.

With our Tripod2 platform, we built both a shallow “digital collections miscellany” interface at http://library.duke.edu/digitalcollections/, but later started embedding items directly in finding aids.  Examples of the latter include the Jazz Loft Project Records and the Alexander Stephens Papers. What we never did was integrate these two modes of publication for digitized primary sources. Items from finding aids do not appear in search results for the main digital collections site, and items on the main site do not generally link back to the finding aid for their parent collection, and not to the series in which they’re arranged.

While I might give us a passing grade for the subject of “Providing archival context,” it wouldn’t be high enough to get us into, say, Duke. I expect this problem to be at the center of our work on the next-generation platform.


Sources

 

Alexandra Chassanoff, “Historians and the Use of Primary Materials in the Digital Age,” The American Archivist 76, no. 2, 458-480.

Jody L. DeRidder and Kathryn G. Matheny, “What Do Researchers Need? Feedback On Use of Online Primary Source Materials,” D-Lib Magazine 20, no. 7/8, available at http://www.dlib.org/dlib/july14/deridder/07deridder.html

Jennifer Rumer and Roger C. Schonfeld, “Supporting the Changing Research Practices of Historians: Final Report from ITHAKA S+R,” (2012), http://www.sr.ithaka.org/sites/default/files /reports/supporting-the-changing-research-practices-of-historians.pdf.

Tefko Saracevic, “How Were Digital Libraries Evaluated?”, paper first presented at the DELOS WP7 Workshop on the Evaluation of Digital Libraries (2004), available at http://www.scils.rutgers. edu/~tefko/DL_evaluation_LIDA.pdf

Launching One Person, One Vote

Promotional postcard for One Person, One Vote site.
Promotional postcard for One Person, One Vote site.

On Monday, March 2nd, the new website, One Person, One Vote: The Legacy of SNCC and the Fight for Voting Rightswent live. The launch represented an unprecedented feat of collaboration between activists, scholars, archivists, digital specialists, and students. In a year and a half, this group went from wanting to tell a grassroots story of SNCC’s voting rights activism to bringing that idea to fruition in a documentary website.

So what did it take to get there? The short answer is a dedicated group of people who believed in a common goal, mobilized resources, put in the work, and trusted each other’s knowledge and expertise enough to bring the project to life. Here’s a brief look at the people behind-the-scenes:

Advisory Board: Made up of representatives of the SNCC Legacy Project, Duke Libraries, and the Center for Documentary Studies, the Advisory Board tackled the monumental task of raising funds, making a way, and ensuring the future of the project.

Editorial Board: One Person, One Vote site has content galore. It features 82 profiles, multimedia stories, an interactive timeline, and map that collectively tell a story of SNCC’s voting rights activism. The enormous task of prioritizing content fell to the Editorial Board. Three historians, three SNCC veterans, and three Duke Libraries staff spent long hours debating the details of who and what to include and how to do it.

OPOVlogo_mediumProject Team: Once the Editorial Board prioritized content, it was the Project Team’s job to carry out the work. Made up of six undergrads, two grad students, and one intern, the Project Team researched and wrote profiles and created the first drafts of the site’s content.

Visiting Activist Scholars: SNCC veterans and Editorial Board members, Charlie Cobb and Judy Richardson, came to Duke during the 2014 – 2015 academic year to advise the Project Team and work with the Project Manager in creating content for One Person, One Vote. As the students worked to write history from the perspective of the activists and local people, the Visiting Activist Scholars guided them, serving as the project’s “SNCC eyes.”

OPOV_logo_textDesign Contractors: The One Person, One Vote Project hired The Splinter Group to design and create a WordPress theme for the site with input from the Editorial Board.

Duke Libraries Digital Specialists: The amazing people in Duke Libraries’ Digital Production Center and Digital Projects turned One Person, One Vote into a reality. They digitized archival material, built new features, problem-solved, and did a thousand other essential tasks that made One Person, One Vote the functional, sleek, and beautiful site that it is.

Of course, this is only the short list. Many more people within the SNCC Legacy Project, the Center for Documentary Studies, and Duke Libraries arranged meetings and travel plans, designed postcards and wrote press releases, and gave their thoughts and ideas throughout the process. One Person, One Vote is unquestionably the work of many and represents a new way for activists, scholars, and librarians to partner in telling a people’s history.

Man to Fight Computers!

1965 Engineers Show Image_DukEngineer
Duke Engineers Show in March 1965, DukEngineers

Fifty years ago this week, Duke students faced off with computers in model car races and tic-tac-toe matches in the annual Engineers’ Show.  In stark contrast to the up-and-coming computers, a Duke Chronicle article dubbed these human competitors as old-fashioned and obsolete.  Five decades later, although we humans haven’t completely lost our foothold to computers, they have become a much bigger part of our daily lives than in 1965.  Yes, there are those of you out there who fear the imminent robot coup is near, but we mostly have found a way to live alongside this technology we have created.  Perhaps we could call it a peaceful coexistence.

 

Zeutschel Image
Zeutschel Overhead Scanner

At least, that’s how I would describe our relationship to technology here at the Digital Production Center (DPC) where I began my internship six weeks ago.  We may not have the entertaining gadgets of the Engineers’ Show, like a mechanical swimming shark or mechanical monkey climbing a pole, but we do have exciting high-tech scanners like the Zeutschel, which made such instant internet access to articles like “Man To Fight Computers” possible.  The university’s student newspaper has been digitized from fall 1959 to spring 1970, and it is an ongoing project here at the DPC to digitize the rest of the collection spanning from 1905 to 1989.

 

My first scanning project has been the 1970s Duke Chronicle issues.  While standing at the Zeutschel as it works its digitization magic, it is fascinating to read the news headlines and learn university history through pages written by and for the student population.  The Duke Chronicle has been covering campus activities since 1905 when Duke was still Trinity College.  Over the years it has captured the evolution of student life as well as the world beyond East and West Campus.  The Chronicle is like a time capsule in its own right, each issue freezing and preserving moments in time for future generations to enjoy.  This is a wonderful resource for researchers, history nerds (like me!), and Duke enthusiasts alike, and I invite you to explore the digitized collection to see what interesting articles you may find.  And don’t forget to keep checking back with BitStreams to hear about the latest access to other decades of the Duke Chronicle.

 

1965 Engineers Show_DukEngineer
DukEngineer, The College of Engineering magazine, covered this particular Engineers’ Show in their April 1965 issue.

The year 1965 doesn’t seem that distant in time, yet in terms of technological advancement it might as well be eons away from where we are now.  Playing tic-tac-toe against a computer seems arcane compared to today’s game consoles and online gaming communities, but it does put things into perspective.  Since that March day in 1965, it is my hope that man and computer both have put down their boxing gloves.

Small Problems, Little Solutions

I have been thinking lately about tools that make tasks I repeat frequently more efficient. For example, I’m an occasional do-it-yourself home repairer and have an old handsaw that works just fine for cutting a few pieces of wood for small repairs. It’s easy to understand how to use the saw, takes very little planning, and takes just a bit of manual effort.

P1040015

Last summer, however, I faced a larger task of rebuilding a whole section of my deck that had rotted. I began using the handsaw to cut the wood I would need for the repair and quickly realized my usual method was going take a long time and make me very sore and unhappy. I needed a better tool and method. This better tool was an electric circular saw, which is more expensive, harder to understand how to use, and more dangerous than the handsaw, but much more efficient. Since I have a healthy fear of death and dismemberment, I also took some time to learn how to use the dreadful thing in a safe manner. It took an initial investment in time and effort, but with the electric saw I was able to make much faster and less painful progress repairing the deck.

I encounter similar kinds of problems when writing software and making things for the web. It’s perfectly possible to do these things using a basic text editor to write everything out by hand. I got along fine this way for a long time. But there are many ways to make this work more efficient. The rest of this post is mainly a list of techniques and tools I’ve invested time and energy to learn to use to reduce annoying, repetitive tasks.

My favorite time and effort saver is learning how to execute common tasks in a text editor using keyboard shortcuts. Here are a few examples of shortcuts I use many times a day in my favorite editor, Sublime Text 2. The ones I used the most involve moving the cursor or text around without touching the mouse. (These are specific to Macintosh computers, but there are similar shortcuts available in other operating systems.)

  • Hold down the Option key and use the left and right arrow keys to move the cursor a word at a time instead of a space at a time.
  • Hold down the Command key and use the left or right arrow to move to the beginning or end of a line. The up or down arrow will take you to the top or end of the document.
  • Add the shift key to the above shortcuts to select text as the cursor moves.
  • The delete key will also work with these shortcuts.
  • Indent a line of text or a whole block of text using the Command key and the left and right brackets.

There are also more advanced text editor features or plugins that make coding easier by reducing the amount you have to type by hand.

Emmet is a utility that does a few things, but it mainly lets you use abbreviated CSS syntax to generate full HTML markup. For example I can type div.special and when I hit the tab key Emmet automatically turns that into:
Screen Shot 2015-03-13 at 5.17.21 PM
You can string these together to generate multi line nested HTML markup from a single string.

SublimeCodeIntel is another plugin for the text editor I use. It adds an intelligent auto-suggest menu that updates as you type with things that are specific to the programming language you’re working in and the specific program. For example in PHP it if I type “e” it will suggest “echo” and I can hit enter to use that suggestion. It also remembers things like the variable and class names in the project you’re working on. It even seems to learn what terms you use most frequently and suggests those first. It saves much typing.

There are also a couple of utilities I run in a terminal window while I’m working to automate different tasks. Many of these are powered by Guard, which is a Rails Gem that watches for changes to files. This is more useful than it might sound. For example, Guard can run LiveReload. When Guard notices a file has changed that you told it to watch it triggers LiveReload that then refreshes your browser window. With this tool I can make small changes to a project and see the updates in realtime in my browser without having the refresh the page manually. There are also Guard utilities for running tests, compressing JavaScript, and generating browser friendly CSS from easier to write and maintain (coder friendly) SCSS.

687474703a2f2f636c2e6c792f696d6167652f316b336f3172325a3361304a2f67756172642d49636f6e2e706e67

These are just a few of the ways I try to streamline repetitive tasks.

The History of Medicine’s Anatomical Fugitive Sheet Digital Collection

As Curator for the History of Medicine Collections in the Rubenstein Rare Book & Manuscript Library, I have the opportunity to work with incredible items, including Renaissance era amputation saws, physician case books from the nineteenth century, and anatomical illustrations with moveable parts, just to name a few.

HOM1
One of the Anatomical Fugitive Sheets with flap down.
HOM2
Same image as the previous one, but with top flap up.

In my opinion, our holdings of anatomical fugitive sheets are some of the most remarkable and rare items one can find in historical medical collections. Our collection includes ten of these sheets, and each one is fascinating for its own reasons.

These anatomical fugitive sheets, which date from the early sixteenth to the mid-seventeenth centuries, are single sheets, similar to broadsides, that are unique in that they contain overlays or flaps that lift to reveal the inside of the human body.

I have read arguments that such items would have been used by barber surgeons or medical students, but others say these were hung in apothecary shops or purchased and kept by individuals with an interest in knowing what was inside their body. After almost 500 years, it is amazing that these anatomical fugitive sheets still exist. While we do have a few sheets that have lost some or all of their flaps, I think it’s fascinating to examine where flaps are broken. Somehow these broken and missing parts make these sheets more real to me – a reminder that each one has a story to tell. How and when did the flap get torn? How would this have really been used in 1539?

After the success of our Animated Anatomies exhibit, many of my colleagues and I have been discussing how to make our materials that contain flaps available online. I can tell you, it’s no easy task, but I am thrilled that we now have a digital version of our collection of anatomical fugitive sheets. With funding from the Elon Clark Endowment, a local custom web design firm, Cuberis, was outsourced to create the code, making these items interactive. Our own amazing Digital Collections Team not only photographed each overlay, but also took the code and applied it to DUL’s digital collection site, making it all work freely to a public audience.

There are so many people involved in making something like this happen. Thanks to Mark Olson, Cordelia and William Laverack Family Assistant Professor of Art, Art History & Visual Studies here at Duke University, for his role in getting this project started. And here in the DUL – a huge thanks to Erin Hammeke (Conservation), Mike Adamo and Molly Bragg (Digital Production Center), Noah Huffman and Lauren Reno (Rubenstein Library Technical Services), Will Sexton, Cory Lown, and especially Sean Aery (Digital Projects Department). They are an incredible team that makes beautiful things happen. Obviously.

Post contributed by Rachel Ingold

Taken near doorways

We’re continually walking through doorways or passing them by, but how often do we linger to witness the life that unfolds nearby? Let the photographs below be your doorway, connecting you with lives lived in other places and times.

Man holding small boy in the air while a woman looks on from doorway.
Man holding small boy in the air while a woman looks on from doorway, from William Gedney Photographs and Writings

Man in doorway. Woman walking down sidewalk
New York City: Greenwich Village, from Ronald Reis Photographs

Man sitting on chair holding a small child.
Man sitting on chair holding a small child, from William Gedney Photographs and Writings

Woman, boy and man near entrance to store.
Outside entrance to Wynn’s Department Store, 1968 Dec., from Paul Kwilecki Photographs

Woman with cat in doorway
Woman with cat in doorway, Pear Orchard, 1961, from Paul Kwilecki Photographs

family portrait taken in front of doorway.
N479, from Hugh Mangum Photographs

Man eating, with child in background
Man Eating, with Child in Background, from Sidney D. Gamble Photographs

Be adventurous. Explore more images taken by these photographers as displayed within Duke University Libraries’ digitized collections.

Notes from the Duke University Libraries Digital Projects Team