Digital Collections, Technology, User Experience

Perplexed by Context? Slick Sticky Titles Skip the Toll of the Scroll

February 24, 2016 Sean Aery

We have a few new exciting enhancements within our digital collections and archival collection guide interfaces to share this week, all related to the challenge of presenting the proper archival context for materials represented online. This is an enormous problem we’ve previously outlined on this blog, both in terms of reconciling different descriptions of the same things (in multiple metadata formats/systems) and in terms of providing researchers with a clear indication of how a digitized item’s physical counterpart is arranged and described within its source archival collection.

Here are the new features:

View Item in Context Link

Our new digital collections (the ones in the Duke Digital Repository) have included a prominent link (under header “Source Collection”) from a digitized item to its source archival collection with some snippets of info from the collection guide presented in a popover. This was an important step toward connecting the dots, but still only gets someone to the top of the collection guide; from there, researchers are left on their own for figuring out where in the collection an item resides.

Archival source collection info presented for an item in the W. Duke & Sons collection.

Beginning with this week’s newly-available Alex Harris Photographs Collection (and also the Benjamin & Julia Stockton Rush Papers Collection), we take it another step forward and present a deep link directly to the row in the collection guide that describes the item. For now, this link says “View Item in Context.”

A deep link to View Item in Context for an item in the Alex Harris Photographs Collection

This linkage is powered by indicating an ArchivesSpace ID in a digital object’s administrative metadata; it can be the ID for a series, subseries, folder, or item title, so we’re flexible in how granular the connection is between the digital object and its archival description.
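As a rough sketch of how such a deep link might be assembled, one could treat the ArchivesSpace ID as a fragment on the collection guide URL and fall back to the top of the guide when no ID is recorded. The URL, the ID value, and the helper name below are all hypothetical; they are assumptions for illustration and not the actual implementation.

```ruby
require 'uri'

# Hypothetical helper: builds a "View Item in Context" link.
# guide_url - URL of the collection guide (made-up example value below)
# aspace_id - ArchivesSpace ID stored in the item's administrative metadata;
#             assumed here to match an anchor on the guide page
def context_link(guide_url, aspace_id = nil)
  # Without an ID we can only link to the top of the guide.
  return guide_url if aspace_id.nil? || aspace_id.empty?

  uri = URI.parse(guide_url)
  uri.fragment = aspace_id
  uri.to_s
end

puts context_link('https://example.org/guides/alexharris/', 'aspace_ref42')
```

Because the ID can point at a series, subseries, folder, or item, the same helper would cover every level of granularity.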

Sticky Title & Series Info

Our archival collection guides are currently rendered as single webpages broken into sections. Larger collections make for long webpages. Sometimes they’re really super long. Where the contents of the collection are listed, there’s a visual hierarchy in place with nested descriptions of series, subseries, etc. but it’s still difficult to navigate around and simultaneously understand what it is you’re viewing. The physical tedium of scrolling and the cognitive load required to connect related descriptive information located far away on a page make for bad usability.

As of last week, we now keep the title of the collection “stuck” to the top of the screen once you’re no longer viewing the top of the page (it also functions as a link to get back to the top). And even more helpful is a new sticky series header that links to the beginning of the archival series within which the currently visible items were arranged; there’s usually an important description up there that helps contextualize the items listed below. This sticky header is context-aware, meaning it follows you around like a loyal companion, updating itself perpetually to reflect where you are as you navigate up or down.

Title & series information “stuck” to the top of a collection guide.

This feature is powered by the excellent Bootstrap Scrollspy JavaScript utility combined with some custom styling.

All Series Browser

To give researchers easier browsing between different archival series in a collection, we added a link in the sticky header to browse “All Series.” That link pops down a menu to jump directly to the start of each series within the collection guide.

Direct Links to Anything

Researchers can now easily get a link to any row in a collection guide where the contents are described. This can be anything: a series, subseries, folder, or item. It’s simple—just mouseover the row, click the arrow that appears at the left, and copy the URL from the address bar. The row in the collection guide that’s the target of that link gets highlighted in green.

Click the arrow to link directly to a row within the collection guide.

We would love to get feedback on these features to learn whether they’re helpful and see how we might enhance or adjust them going forward. Try them out and let us know what you think!

Special thanks to our metadata gurus Noah Huffman and Maggie Dickson for their contributions on these features.

Behind the Scenes, Digital Collections, Uncategorized

Rediscovering the Tuscarora Indians through The Trinity Archive

February 18, 2016 Meghan Lyon

This is a story about how our own digital collections program led us to rediscover an amazing manuscript collection that has been at Duke since at least 1896. The Trinity Archive, now published as The Archive, is a Duke University student literary and cultural journal, first published in 1887 while the college was still based in Trinity, N.C. It is one of the oldest continuously-published literary magazines in the United States. Early editions of the Trinity Archive, held in the University Archives, were digitized through Duke’s digital collections program and are now available through the Internet Archive.

It turns out that the Duke University Archivist, Valerie Gillispie, enjoys reading digitized issues of the Trinity Archive. While perusing the December 1896 edition, she found an interesting article: “The Removal of the Tuscarora Indians from North Carolina.” Written by Sanders Dent, then manager of the magazine, the article aims to “arrange some facts found in the old papers of General Jeremiah Slade and, thus, preserve an interesting bit of North Carolina history for her future historian. General Slade was one of the Commissioners appointed by the Legislature in 1802 to settle the affairs of the Tuscarora Indians and from his letters we get most of the material for this sketch.” Dent’s article recounts the history of the Tuscarora in North Carolina in the eighteenth and early nineteenth century. Following the end of the Tuscarora War in 1713, many Tuscarora fled to upstate New York and joined the Iroquois Confederacy as the Sixth Nation. Those that remained in North Carolina were granted land in Bertie County, but by the late eighteenth century they too were being forced to lease their land to the whites and leave the state for New York.

Dent’s article liberally quotes from letters held in the Jeremiah Slade Papers. Between 1803 and 1818, Slade served as an agent for the Tuscarora, managing their land leases in North Carolina and tracking money owed them by their white tenants. The papers include letters, receipts, and legal documents between Slade and the Tuscarora in Niagara, New York, with several documents signed with an X by the chiefs representing their tribe. Dent adds in a footnote that Slade’s “papers are now in the possession of the Trinity College Historical Society.”

A power of attorney sent to Jeremiah Slade in 1817, signed by Tuscarora chiefs and warriors

Thanks to Dent’s footnote, Val found that the Jeremiah Slade Papers were now held in the Rubenstein Library (but under his son’s name, as the William Slade Papers). It was an exciting connection to our Rubenstein Library ancestors, the Trinity College Historical Society. Founded by Trinity College students and professors in 1892, TCHS sought to “collect, arrange, and preserve a library of books, pamphlets, maps, charts, manuscripts, papers, paintings, statuary, and other materials illustrative of the history of North Carolina and the South.” It was a history club and a museum and a library all-in-one, and many of the library’s oldest Southern collections were acquired by TCHS before being transferred to Duke’s manuscript department in the early twentieth century. (You can read more about the TCHS here and here.)

Undated letter to Slade from the Tuscarora, asking for transfer of funds and telling him they intended to “prosecute their claims” to the N.C. Legislature

How and when the Slade Papers first came to the Trinity College Historical Society is still a mystery. The TCHS records, held by the University Archives, are incomplete for that period. A clue lies in the Slade Papers, with an 1884 item from J.D.B. Hooper, a professor at the University of North Carolina. Hooper writes that “I have consented to receive from Mr. William B. Slade, a Box of Scraps, culled by him, from newspapers, magazines, &c. with a request that I will endeavor to have them received into some library, public or private, where they may, at some future time, become useful…” He goes on to write, “I think that they may furnish materials for interesting Scrap books, when they shall fall into the hands of a person of leisure and literary taste.” Um, sure. Thanks Professor Hooper! (His papers are held at UNC.) The only other hint I have found as to the initial transfer of the Slade Papers to Duke lies in this undated clipping from the collection:

Undated clipping announcing the transfer of Slade’s scrapbooks to the Trinity College Library

But I can find no record of Slade scrapbooks in our accession logs or catalog records from the 1890s. I can only assume that with the scrapbooks came the box of papers that Hooper mentions. It all must have arrived sometime before 1896, when Dent wrote the Trinity Archive piece.

Since this all came to light after Val’s browsing of the Trinity Archive, we decided to revisit the Slade Family Papers, update their housing, and enhance the collection’s description to reflect contemporary descriptive standards and scholarship interests. The original catalog record had no mention of the Tuscarora, and there was no finding aid or other web presence for the collection. It was really fun to re-process such an old collection and see its contents firsthand. The Tuscarora documents, while fascinating, are only a small piece of the Slade story. The majority of the collection documents the nineteenth-century operations of the Slade plantations, farms, and fisheries around Williamston, N.C. Plus, each generation of the Slade family had many children, so there are a lot of letters between all the siblings and cousins discussing their activities, family life, education, politics, and entertainment. There are also extensive legal and financial documents, including receipts, account books, land deeds, court cases, and other items. I was amazed at the amount of documentation discussing slaves; items recording student life at different North Carolina colleges in the early nineteenth century; letters detailing life in the Confederacy during the Civil War; and materials about postwar recovery and politics, including the new business arrangements between the Slades and their former slaves, now freedmen.

Estate inventory including slave valuations, 1820, in the Slade Family Papers

It’s always wonderful to see what sort of research can happen as a result of digitization and online access to our collections. But the re-processing and new finding aid for the Slade Family Papers was special. It is one of those rare projects where it all came full circle: because the Trinity Archive was available online, we rediscovered this collection, and along with it, further evidence of the work of the Trinity College Historical Society. The TCHS acquired the Slade Family Papers, among many other things, over 120 years ago for future historians to study and use. We are active participants in that legacy today.

Digital Collections, Technology

Moving the Needle: Bring on Phase 2 of the Tripod3/Digital Collections Migration

February 15, 2016 Will Sexton

Last time I wrote for Bitstreams, I said “Today is the New Future.” It was a day of optimism, as we published for the first time in our next-generation platform for digital collections. The debut of the W. Duke, Sons & Co. Advertising Materials, 1880-1910 was the first visible success of a major effort to migrate our digital collections into the Duke Digital Repository. “Our current plan,” I propounded, “is to have nearly all of the content of Duke Digital Collections available in the new platform by the end of March, 2016.”

Since then we’ve published a second collection – the Benjamin and Julia Stockton Rush Papers – in the new platform, but we’ve also done more extensive planning for the migration. We’ll divide the work into six-week phases or “supersprints” that overlay the shorter sprints of our software development cycle. The work will take longer than I suggested in October – we now project the bulk of it to be completed by the end of the fourth six-week phase, or toward the end of June of this year, with some continuing until deeper in the calendar year.

As it happens, today represents the rollover from Phase 1 to Phase 2 of our plan. Phase 1 was relatively light in its payload. During the next phase – concluding in six weeks on March 28 – we plan to add 24 of the collections currently published in our older platform, as well as two new collections.

As team leader, I take upon myself the hugely important task of assigning mottos to each phase of the project. The motto for Phase 1 was “Plant the seeds in the bottle.” It derives from the story of David Latimer’s bottle garden, which he planted in 1960 and has not watered since Duke Law alum Richard Nixon was president.

This image from the Friedrich Carl Peetz Photographs, along with many other items from our photography and manuscript collections, will be among those re-published in the Duke Digital Repository during Phase 2 of our migration process.

Imagine, I said to the group, we are creating self-sustaining environments for our collections, that we can stash under the staircase next to the wine rack. Maybe we tend to them once or twice, but they thrive without our constant curation and intervention. Everyone sort of looked at me as if I had suggested using a guillotine as a bagel slicer for a staff breakfast event. But they’re all good sports. We hunkered down, and expect to publish one new collection, and re-publish two of the older collections, in the new platform this week.

The motto for Phase 2 is “Move the needle.” The object here is to lean on our work in Phase 1 to complete a much larger batch of materials. We’ll extend our work on photography collections in Phase 1 to include many of the existing photography collections. We’ll also re-publish many of the “manuscript collections,” which is our way of referring to the dozen or so collections that we previously published by embedding content in collection guides.

If we are successful in this approach, by the end of Phase 2, we’ll have migrated a significant portion of our digital collections to the Duke Digital Repository. Each collection, presumably, will flourish, sealed in a fertile, self-regulating environment, like bottle gardens, or wine.

Here’s a page where we’ll track progress.

As we’ve written previously, we’re in the process of re-digitizing the William Gedney Photographs, so they will not be migrated to the Duke Digital Repository in Phase 2, but will wait until we’ve completed that project.

Announcements, Collections, New Collections

Catch You on the Flip Side – 1970s Duke Chronicle Digitized and Online

February 9, 2016 Molly Bragg

The 1970s are here! That is, in digital form. The Duke Chronicle digital collection now includes issues from the grooviest decade of the twentieth century.

The American memory of the 1970s is complex, wavering from carefree love to Vietnam and civil rights. As the social turmoil of the 1960s flowed into the 1970s, Terry Sanford was sworn in as president of Duke University. This marked the beginning of his sixteen-year term, but also marked the decade in which Sanford twice ran for president and partook in heated debates with Alabama governor George Wallace. He presided over the university in the midst of the Vietnam War and national protests, the Watergate scandal, and the aftermath of the Allen Building occupation in 1969.

In response to the demands from the Allen Building takeover, the Duke University community worked to improve social inequalities on campus. The 1972 incoming freshman class boasted more than twice as many black students as ever before in university history. Black Studies Program faculty and students struggled to create their own department, which became a controversial issue on campus throughout the ’70s. One Chronicle article even tentatively labeled 1976 as “The Year of the Black at Duke,” reflecting the strides made to incorporate black students and faculty into campus life and academics.

The 1970s was also a decade of change for women at Duke. In 1972, Trinity College and the Woman’s College merged, and not all constituents agreed with the move. Women’s athletics were also shaken by the application of Title IX, implemented by the Department of Health, Education and Welfare (HEW), which prohibited discrimination on the basis of sex. This regulation significantly impacted the future of the Physical Education Department as well as women’s sports at Duke.

Amidst this sea of change at Duke, there were many things that brought students joy — like the Blue Devils defeating UNC 92-84 in basketball, and snowball fights in November.

The addition of the 1970s to the Duke Chronicle digital collection marks a milestone for the Digital Projects and Production Services Department. We can now provide you with a complete run of issues from 1959 to 1989, and the 1950s will be heading your way soon! We invite you to explore the 1970s issues and see for yourself how history unfolded across the nation and across Duke campus.

Post Contributed by Jessica Serrao

Behind the Scenes, Projects

Content Galore: the SNCC Digital Gateway’s Ongoing Challenge

February 5, 2016 Karlyn Forner

Google Drive content log for SNCC Digital Gateway.

So much content. Gobs of content. Never-ending ideas for more content. Content–how to produce, present, and connect it–it’s a challenge the SNCC Digital Gateway Project faces on a daily basis.

The SNCC Digital Gateway deals in two types of content.

First is the content written by the student Project Team under the direction of our SNCC Visiting Activist Scholar. This includes 600 – 700 word profiles of people, stories of events, audiovisual pieces exploring different perspectives in SNCC, and close-ups of the inner workings of SNCC as an organization. When the SNCC Digital Gateway debuts in December of 2016, it will feature over 150 profiles, 50 events, 9 audiovisual pieces, and 25 organizing SNCC pages.

Arrest record for Willie Ricks. Individuals active in civil disturbances, vol. 1, ADAH

The second type of content in the SNCC Digital Gateway is the primary sources embedded within the profiles, events, and organizing SNCC pages. Each piece of written content features 6 – 8 digitized primary sources. These are items like the arrest record of SNCC field secretary Willie Ricks (“Extremely radical, militant individual”), articles from SNCC’s newsletter, The Student Voice, or SNCC activists recounting their organizing experiences in the 1988 We Shall Not Be Moved conference at Trinity College.

Multiply the amount of written content by the number of embedded sources, and that totals well over 1500 items. And that’s only for the 2016 debut…2017 is devoted to producing more content! By the time the SNCC Digital Gateway is complete, it’s aiming to feature 300 – 400 profiles, 100 plus events, 50 organizing SNCC pages, and over 20 audiovisual pieces.

Producing so much content is a challenge in and of itself, and our resources have limits. But the SNCC Digital Gateway also needs to present these vast volumes of content in a user-friendly, intuitive way. One Person, One Vote, the pilot site to SDG, taught us a good deal about what works and doesn’t work in site architecture. We wanted the SNCC Digital Gateway to be more accessible to students and teachers, movement veterans and the general public. That meant providing users ways to explore by people and place, periods, organizations, and ideas. The Editorial Board and project staff have spent months hammering out how best to do that. We ended up with something that looks like this:

Wireframe for the SNCC Digital Gateway sketched on the whiteboard wall of the project room.

In mid-January, we met with Kompleks Creative, the designers of the SNCC Digital Gateway, to see what they thought was possible. Here’s an illustrative recount of the conversation about profiles and how to navigate through them using geography:

SDG: “We want users to be able to sort through profiles by state, region, county, or city, and we’d really like them to be able to get to counties and cities directly.”

Kompleks: “How many counties are you talking about?”

SDG: “Probably 100 or more.”

Kompleks: “Wow. That’s not going to work.”

Don’t worry. We came up with a good solution. But the fact that the SNCC Digital Gateway needs to handle 500 – 600 pieces of content when finished (never mind the thousands of embedded sources) is an ongoing hurdle. The design process is only beginning, so our site architecture questions are far from sorted out. But in the end, the SNCC Digital Gateway needs to bring SNCC’s history to life in a way that both channels how movement activists understood their work and is accessible and compelling for a new generation of young people, teachers, and scholars.

Good thing we’re only half a year into a three-year project.

Uncategorized

It’s Date Night Here at Digital Projects and Production Services

January 29, 2016 Maggie Dickson

Now, I know last week’s Bitstreams post about metadata and date encoding left you wanting to know more about Duke Digital Collections date metadata, the Extended Date Time Format, and how we are planning to apply it. Well, don’t you worry – I’m going to talk about it now in scintillating detail.

As Cory talked about last week, we have a lot of inconsistent, “squishy” date metadata in use in our digital collections, squishy both in terms of what those dates mean and how they are represented. This is a problem when you want to do fun stuff with your dates, like create facets and visualizations, or, um, retrieve reliably comprehensive search results when looking for everything from a time period. So we’re beginning the process of normalizing all of that data. But because we’re talking about special collections materials, date squishiness is not an uncommon occurrence; it is inherent to the materials, and we need to be able to represent it programmatically.

There are 12,146 unique date values present in our digital collections metadata, and these values range from very machine-processable (“January 1, 1936” or “1971-1972”) to a lot less so (“[ca. late 1880s]” or “[1950s] Nov. 22]”), along with a plethora of values all meaning the same thing: “n.d.”, “None”, “undated”, “Unknown”, etc. (plus one inscrutable instance of the word “Philadelphia”). In order to begin the process of normalizing the data, we identified the main patterns those dates took, and came up with a list of 38 rough patterns into which all but 178 values fell. Next we took a stab at converting those patterns to EDTF. The following represents the great bulk of our data:

Pattern	EDTF	Display Example
yyyy	yyyy	1910
yyyy-yyyy; yyyy/yyyy	yyyy/yyyy	1910-1913
mm/dd/yy; yyyy Mon. dd; yyyy-mm-dd; dd-Mon-yy	yyyy-mm-dd	January 1, 1910
yyyy Mon. dd?	yyyy-mm-dd?	January 1, 1910?
yyyy Mon dd-dd	yyyy-mm-dd/yyyy-mm-dd	January 1, 1910 to January 3, 1910
yyyy-mm; yyyy Mon.; yyyy/mm	yyyy-mm	January 1910
yyyy Mon.?	yyyy-mm?	January 1910?
ca. yyyy; circa yyyy	yyyy~	Circa 1910
yyyy?; [yyyy?]	yyyy?	1910?
[yyyy/yyyy?]	yyyy?/yyyy?	1910 to 1913?
yyyy or yyyy	[yyyy,yyyy]	1910 or 1913
yyyx; yyy?; yyy_?; yyy?; [yyy-]	yyyu	191x
yyyys	yyyx	1910s
circa yyyy-yyyy; yyyy-yyyy and n.d.	yyyy~/yyyy~	Circa 1910 to 1913
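To make the table a little more concrete, here is a minimal sketch of how a handful of those patterns might be converted with regular expressions. The `to_edtf` helper and its exact regular expressions are assumptions for illustration only; it covers just a few rows and is not the conversion code we actually use.

```ruby
require 'date'

# Map "Jan" => 1, "Feb" => 2, ... using Ruby's built-in abbreviations.
MONTHS = Date::ABBR_MONTHNAMES.compact.each_with_index.map { |name, i| [name, i + 1] }.to_h

# Hypothetical helper: returns an EDTF string for a few of the raw
# patterns in the table, or nil when the value needs individual attention.
def to_edtf(raw)
  case raw.strip
  when /\A(\d{4})\z/                     then $1                # yyyy
  when /\A(\d{4})\s*[-\/]\s*(\d{4})\z/   then "#{$1}/#{$2}"     # yyyy-yyyy
  when /\A(?:ca\.|circa)\s+(\d{4})\z/i   then "#{$1}~"          # ca. yyyy
  when /\A\[?(\d{4})\?\]?\z/             then "#{$1}?"          # yyyy?; [yyyy?]
  when /\A(\d{4}) or (\d{4})\z/          then "[#{$1},#{$2}]"   # yyyy or yyyy
  when /\A(\d{3})0s\z/                   then "#{$1}x"          # yyyys (decade)
  when /\A(\d{4}) ([A-Z][a-z]{2})\.? (\d{1,2})\z/               # yyyy Mon. dd
    month = MONTHS[$2]
    month ? format('%s-%02d-%02d', $1, month, $3.to_i) : nil
  end
end

['1910', '1910-1913', 'ca. 1910', '[1910?]', '1910s', '1910 Jan. 1', 'Philadelphia'].each do |raw|
  puts "#{raw.inspect} -> #{to_edtf(raw).inspect}"
end
```

Anything that falls through to `nil` (like “Philadelphia”) would land in the pile of outliers to be handled by hand.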

As you can see above, the specification accommodated most of the patterns we identified, but when we tried to encode more nuanced dates, we discovered that we couldn’t quite take the encoding as far as we wanted.

For example, one pattern that shows up in our metadata frequently looks like this:

Circa 1940s-1950s

A decade encoded in EDTF looks like this:

194x (for 1940s)

and we can encode a circa date like this:

1940~ (for circa 1940)

But we can’t combine the two formats – the following is not a valid EDTF date:

194x~ (for circa 1940s)

Ideally, the specification would allow us to create an encoded date that looked like the above date, as well as this:

194x~/195x~ (for circa 1940s to 1950s)

We can work around this deficiency by stripping ‘circa’ from the date ranges and using the ‘unspecified’ encoding:

194u/195u (which we can translate to display as: 1940s to 1950s)

But this approach isn’t ideal, as it is inconsistent with our other usage of the format and isn’t technically ‘correct’, either. Happily, the EDTF specification is open for modification and proposals for modifications are still being taken. A quick glance at the listserv archives indicates that we’re not the only people trying to encode this kind of squishiness.
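The workaround described above (dropping “circa” and using the “unspecified” encoding) could be sketched roughly like this. Both helper names are hypothetical and the regular expressions assume values shaped exactly like “Circa 1940s-1950s”; this is an illustration, not our production code.

```ruby
# Hypothetical helper: "Circa 1940s-1950s" -> "194u/195u".
# Note that the 'circa' qualifier is deliberately thrown away.
def decade_range_to_edtf(raw)
  match = raw.strip.match(/\A(?:circa|ca\.)?\s*(\d{3})0s\s*-\s*(\d{3})0s\z/i)
  return nil unless match

  "#{match[1]}u/#{match[2]}u"
end

# Hypothetical helper: "194u/195u" -> "1940s to 1950s" for display.
def display_decade_range(edtf)
  match = edtf.match(/\A(\d{3})u\/(\d{3})u\z/)
  return nil unless match

  "#{match[1]}0s to #{match[2]}0s"
end

encoded = decade_range_to_edtf('Circa 1940s-1950s')
puts encoded
puts display_decade_range(encoded)
```

The lost “circa” is exactly the information we would like the specification to let us keep.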

In the meantime, we can keep ourselves busy with cleaning up, normalizing, and converting the great bulk of our date metadata, as well as dealing with those 178 outliers individually. We still feel good about using EDTF – it’s a LOT better than our current date situation, and has some good room for improvement, as well. Pretty solid for a first date, I’d say.

Behind the Scenes, Digital Collections, Technology

Enjoy your Metadata: Fun with Date Encoding

January 22, 2016 Cory Lown

Dates are fascinating. They mark time. They’re based, imperfectly, on the pattern of the Earth’s planetary motion. (A year is a little bit longer than 365 days, so our calendar system adds leap years to compensate. Still, it’s off somewhat over thousands of years.) For digital collections it’s important to know when the original item was created or published, so we record date information in our metadata.

But our date metadata has a problem. It’s not consistent. The dates are generally human readable, but equivalent dates are formatted variously (10/4/2015 vs. Oct. 4, 2015 vs. 2015-10-4) and there are different levels of precision (1920s vs. June 1981 vs. Spring 1972), and degrees of certainty about the accuracy of the date (circa 1931). While a person would generally be able to interpret what these dates mean, computers need a lot of instructions to do much of anything but display them. In its current form our date metadata is not consistently formatted or readily machine readable.

To fix this, we’re going to clean up our date metadata with a few goals in mind. We want dates to be searchable — if someone searches for ‘1957’ it would be great if the results included everything in our collection from that year. We also want to be able to: sort search results by date; provide a way to browse our collections by year; and display dates in human readable formats. To meet these goals we aim to transform our date metadata into a consistent, standard, and machine readable format across our digital collections.

Thankfully, there is an international standard for encoding dates, ISO 8601. In brief, ISO 8601 defines a standard way of representing dates and times. Better yet, the standard is implemented by the date and time libraries of many programming languages, making ISO 8601 readily machine readable. At first glance, ISO 8601 seems like the obvious answer to encoding our date metadata, since it specifies a standard way of encoding dates at various levels of precision: year (1975), month (1975-07), or day (1975-07-01).

Let’s see what Ruby (the programming language we’re using to develop our new digital collections platform) can do with ISO 8601 formatted date strings.

Here’s an easy case — an ISO formatted date string with day precision:

> d = Date.iso8601('1975-07-01') => #<Date: 1975-07-01 ((2442595j,0s,0n),+0s,2299161j)>

That output means that Ruby successfully parsed the date string into a date object, which I can use to do things like get information about the year:

> d.year => 1975

Determine whether it’s a Tuesday:

> d.tuesday? => true

And I can generate a reformatted date for display using the strftime method:

> d.strftime('%A, %B %-d, %Y') => "Tuesday, July 1, 1975"

So that’s nice. But there’s a problem. Ruby’s implementation of ISO 8601 is limited. It only handles dates with day precision (1975-07-01) and doesn’t know what to do with dates with only month or year precision.

Ruby will attempt to parse dates with month precision, but interprets the month as the day, contrary to the ISO standard:

> d = Date.iso8601('1975-07') => #<Date: 1975-01-07 ((2442420j,0s,0n),+0s,2299161j)>

> d.day => 7

> d.month => 1

For dates with year precision Ruby just throws up its hands:

> d = Date.iso8601('1975') ArgumentError: invalid date

Derp.
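One possible way around this using only Ruby’s standard library, assuming the strings are already well-formed year, month, or day precision values, is to pick a strptime format based on the shape of the string and carry the precision alongside the parsed date. The `parse_with_precision` helper is hypothetical, a sketch rather than anything in our codebase.

```ruby
require 'date'

# Each supported string shape, paired with its precision and strptime format.
ISO_FORMATS = {
  /\A\d{4}\z/             => [:year,  '%Y'],
  /\A\d{4}-\d{2}\z/       => [:month, '%Y-%m'],
  /\A\d{4}-\d{2}-\d{2}\z/ => [:day,   '%Y-%m-%d']
}.freeze

# Hypothetical helper: returns [Date, precision]. The Date is the earliest
# day covered by the string; the precision says how much of it is meaningful.
def parse_with_precision(string)
  ISO_FORMATS.each do |pattern, (precision, fmt)|
    return [Date.strptime(string, fmt), precision] if string =~ pattern
  end
  raise ArgumentError, "unsupported date string: #{string.inspect}"
end

date, precision = parse_with_precision('1975-07')
puts "#{date} (#{precision} precision)"
```

This still says nothing about approximate or uncertain dates, so it only addresses part of the problem.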

For many items in our collection the precise date is unknown, but an approximate date can be assigned (e.g. “circa 1981”). In other cases we may at best be able to provide decade or century levels of precision (“1920s,” “1900s,” etc.). ISO 8601 doesn’t provide a way to express these more ambiguous dates.

Extended Date Time Format (EDTF) to the rescue!

EDTF is a draft specification of an extension to the ISO 8601 date standard to address some of the limitations of ISO 8601 and to provide a standard way to encode machine readable dates in ways that are useful to cultural heritage institutions. You can read the full draft on the Library of Congress website.

EDTF adds to ISO 8601 several different ways of specifying dates. A few that are important for our date metadata are shown in the table below.

EDTF encoding	meaning
1984?	uncertain: possibly the year 1984, but not definitely
1984~	approximately the year 1984
192x	decade of the 1920s
2001-21	Spring, 2001

Lucky us, the edtf-ruby gem adds EDTF support to Ruby. With edtf-ruby installed, I can do things like this:

Work with dates with month precision:

> d = Date.edtf('1975-07') => Tue, 01 Jul 1975

Ruby creates a Date object with the earliest day in July (the 1st), but it also knows that the precision for the date is month precision:

> d.month_precision? => true

> d.day_precision? => false

Based on this information I can decide how I want to format the date for display, probably something like this:

> d.strftime('%B %Y') => "July 1975"
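The same decision can be sketched without the gem, assuming the precision is already known and passed in as a symbol. The `display_date` helper and the particular display formats are assumptions for illustration.

```ruby
require 'date'

# One display format per precision level.
DISPLAY_FORMATS = {
  year:  '%Y',
  month: '%B %Y',
  day:   '%B %-d, %Y'
}.freeze

# Hypothetical helper: format a Date for display, showing only
# as much detail as the recorded precision supports.
def display_date(date, precision)
  date.strftime(DISPLAY_FORMATS.fetch(precision))
end

july = Date.new(1975, 7, 1)
puts display_date(july, :month)
puts display_date(july, :day)
```

Keeping the formats in one lookup table means the public interface displays every date of a given precision the same way.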

EDTF also adds a way to encode a season, like “Spring 1975”:

> d = Date.edtf('1975-21') => #<EDTF::Season:0x007fcdd3072738 @year=1975, @season=:spring, @qualifier=nil>

Which can be formatted for display:

> d.to_date.strftime("#{d.season.capitalize} %Y") => "Spring 1975"

Or a decade:

> d = Date.edtf('192x') => #<EDTF::Decade:0x007fcdd2067668 @year=1920>

Which can be transformed for display in our public interface:

> d.to_date.strftime('%Ys') => "1920s"

Even though EDTF is still a draft standard, we’ve decided to use it as the encoding format for our date metadata for digital collections because it will allow us to express a wide range of date information in a machine readable format. By transforming our date metadata to EDTF and making use of the edtf-ruby gem to parse EDTF encoded date strings, we’ll be able to make our date metadata work harder — to provide sorting of results by date, searching, browsing and more flexible and consistent human readable display formats.
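As a rough illustration of the sorting and year searching mentioned above, here is a sketch that uses only Ruby’s standard library and plain year, month, or day precision strings (it does not use the edtf-ruby gem or handle qualifiers like “~” and “?”). The helper names are hypothetical.

```ruby
require 'date'

# Hypothetical helper: expand 'YYYY', 'YYYY-MM', or 'YYYY-MM-DD'
# into the range of days that the string covers.
def date_range(string)
  year, month, day = string.split('-').map(&:to_i)
  first = Date.new(year, month || 1, day || 1)
  last =
    if day
      first
    elsif month
      Date.new(year, month, -1) # -1 means the last day of the month
    else
      Date.new(year, 12, 31)
    end
  first..last
end

# Hypothetical helper: does the date overlap the searched year?
def in_year?(string, year)
  range = date_range(string)
  range.begin.year <= year && range.end.year >= year
end

dates = ['1981-06', '1957', '1972-03-01', '1957-10-04']
puts dates.sort_by { |d| date_range(d).begin }.inspect
puts dates.select { |d| in_year?(d, 1957) }.inspect
```

Treating every date as a range of days is one way to let dates of different precision be compared with each other.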

Announcements, Behind the Scenes, Collections, Technology

OHMS-in’ with H. Lee Waters’ Movies of Local People

January 15, 2016 Molly Bragg

Q: How is a silent H. Lee Waters film like an oral history recording?
A: Neither is text searchable.

But, leave it to oral historians to construct solutions for access to audiovisual resources of all stripes. No mistake, they’ve been thinking about it for a long time. Purposefully, profoundly non-textual at their creation, oral histories have since their postwar genesis contended with a central irony: as research they are exploited almost exclusively via textual transcription. Oral histories that don’t get transcribed get, instead, infamously ignored. So as the online floodgates have opened and digital media recorders and players have kept pace, oral historians have seen an opportunity to grapple meaningfully with closing the gap between the text and its source, and perhaps at the same time free the interview from the expectation that it should be transcribed.

Enter OHMS (http://www.oralhistoryonline.org/). In 2013, Doug Boyd at the University of Kentucky debuted the results of an IMLS-funded project to create the Oral History Metadata Synchronizer. A free, open-source tool, OHMS empowers even the smallest oral history archive to encode its media with textual information. The OHMS editor enables the oral historian to easily create item level metadata for an oral history recording, including an index or subject list that can drop a researcher into an interview at that selected point. OHMS can also timestamp an existing transcript, so that researchers can track the audio via the text. In its short life, OHMS has demonstrated a way to bridge the great divide among oral history theorists, which reads something like this: Should our focus be the audio or the transcript?

While it springs from the minds of oral historians, OHMS might more accurately be termed the Media Metadata Synchronizer. When I saw Doug’s presentation on OHMS at the Oral History Association meeting in 2013, two alternative uses immediately came to mind: OHMS had the potential to help us provide bilingual entry to the 3,500+ recordings in our Radio Haiti Collection (currently being digitized), and it could dramatically enhance access to one of Duke’s great collections, the H. Lee Waters Films. Waters filmed his Movies of Local People in mostly smaller communities around North Carolina from 1936 to 1942, using silent reversal film stock. Waters’ effort to supplement his family’s income has over the intervening years become a major historical document of the state during the Great Depression. And yet as rich as the collection is, it is difficult for students, scholars, and filmmakers to find specific scenes or subjects among the thousands of two-second shots Waters put to film. Several years ago, an intern in the archive created shotlists for some of the films, but these existed independently of the films and were not terribly accurate in matching times since they were created using VHS tapes (and VHS players are notorious for displaying incorrect times). OHMS would give us the opportunity to update the shotlists we had and create some new ones, linking description to precise points within the films.

Implementing OHMS at Duke Libraries was a pleasure, mostly because I had the opportunity to work with my colleagues in Digital Projects and Production Services, an outstanding team that can do amazing things with our equally amazing archival resources. Recognizing the open-source spirit of OHMS, Sean Aery, Will Sexton, and Molly Bragg immediately saw how the system could help us get deeper into the Waters films without having to build out a complex infrastructure (or lay out lots of cash). And so, when the H. Lee Waters website went live last year with 35 hours of mostly undescribed digital video (although we did post those older shotlists too, where we had them), it was generally agreed that a phase two would happen sooner rather than later and include a pilot for OHMS shotlists. Rubenstein Audiovisual Intern Olivia Carteaux worked diligently through the spring to normalize existing shotlists and create new ones where possible. This necessitated breaking down the descriptive data we had into spreadsheets, so we could then “crosswalk” the description into the OHMS XML file that is at the heart of the system.

While the OHMS index viewer allows for metadata including title or description, partial transcript, segment synopsis, keywords, subjects, GPS coordinates and a link to a map, we concentrated on providing a descriptive sentence as the title and, where it was easy to find, the location of the action.

While on the face of it generating description for the H. Lee Waters films might seem fairly straightforward, we found a number of challenges in describing his silent moving images. For starters, given Waters’ quick edits, what would adequate frequency of description look like? A new descriptive entry at every cut would be extremely unwieldy. At the same time we recognized that without a spoken or textual counterpart to the image, every time we chose not to describe would deprive potential users of a “way in.” We settled on creating entries whenever the general scene or action changed; for instance, when Waters shifts from a scene on main street to one in front of a mill or school, or within the scene at a school when the action goes from schoolyard play to the pledge of allegiance. Sometimes the shifts are obvious, other times they are more subtle, so watching the action with a deep focus is necessary. We also created new entries whenever Waters created a trick shot, such as a split screen, a speed up or slow down of the action, a reverse shot, or a masking shot. Additionally, storefront signs, buildings, and landmarks also became good places to create entries, depending on their prominence; for these, too, we attempted to create GPS coordinates where we could easily do so.

Our second challenge was how much to invest in each description. “A picture is worth a thousand words” and “every picture tells a story” sum up much of the Waters footage, but brevity was of value to the workflow. One sentence, which did not have to be properly complete — a sort of descriptive bullet point — was decided on as our rule of thumb. In the next phase of this process I hope to use the keywords field more effectively, but that requires a controlled vocabulary, which brings me to our third challenge: normalizing description was the most difficult single piece of describing the films. Turns out there’s not a lot of library-based methodology for describing moving images, although there are general recommended approaches for describing images for the visually impaired. Then, of course, there’s the difficulty in deciding how to represent nuanced factors such as race, ethnicity, class, and gender. It is clear that in the event we undertake to create shotlists for all the Waters films, the first order of business will be to create a thesaurus of terms, to provide consistent description across the films.

When we felt like we had enough transformed shotlists for a pilot OHMS project for the Waters website, the OHMS player was loaded onto a server and the playlists uploaded. Links to the 29 shotlists were then placed below the video windows on their respective pages. To access the video and synchronized description, simply click on the link that says “Synchronized Shot List.” In this initial run we’re hoping to upload about 20 more shotlists, and at that point take a breath and see how we can improve on what we’ve accomplished. Given the challenges of presenting audiovisual resources online, there’s never really a “done,” only steady improvement. OHMS has provided what I believe is a clear step forward on access to the Waters films, and has the potential to help us transform other audiovisual collections into deeply mined treasures of the archive.

Post contributed by Craig Breaden, Audiovisual Archivist, Rubenstein Library

Digital Collections, Equipment, Technology

Future Retro: New Frontiers in Portability

January 8, 2016 Zeke Graves 1 Comment

Duke Libraries’ Digital Collections offer a wealth of primary source material, opening unique windows to cultural moments both long past and quickly closing. In my work as an audio digitization specialist, I take a particular interest in current and historical audio technology and also how it is depicted in other media. The digitized Duke Chronicle newspaper issues from the 1980s provide a look at how students of the time were consuming and using ever-smaller audio devices in the early days of portable technology.

Sony introduced the Walkman in the U.S. in 1980. Roughly pocket-sized (actually somewhere around the size of a sandwich or small brick), it allowed the user to take their music on the go, listening to cassette tapes on lightweight headphones while walking, jogging, or travelling. The product was wildly successful and ubiquitous in its time, so much so that “walkman” became a generic term for any portable audio device.

The success of the Walkman was probably bolstered by the jogging/fitness craze that began in the late 1970s. Health-conscious consumers could get in shape while listening to their favorite tunes. This points to two of the main concepts that Sony highlighted in their marketing of the Walkman: personalization and privatization.

Previously, the only widely available portable audio devices were transistor radios, meaning that the listener was at the mercy of the DJ or station manager’s musical tastes. However, the Walkman user could choose from their own collection of commercially available albums, or take it a step further, and make custom mixtapes of their favorite songs.

The Walkman also allowed the user to “tune out” surrounding distractions and be immersed in their own private sonic environment. In an increasingly noisy and urbanized world, the listener was able to carve out a small space in the cacophony and confusion. Some models had two headphone jacks so you could even share this space with a friend.

One can see that these guiding concepts behind the Walkman and its successful marketing have only continued to proliferate and accelerate in the world today. We now expect unlimited on-demand media on our handheld devices 24 hours a day. Students of the 1980s had to make do with a boombox and a backpack full of cassette tapes.

Technology, User Experience

508 Update

December 17, 2015 Michael Daul 1 Comment

Web accessibility is something that I care a lot about. In the 15-some-odd years that I’ve been doing professional web work, it’s been really satisfying to see accessibility increasingly become an area of focus and importance. While we’re not there yet, I am more and more confident that accessibility and universal design will be embraced not as an afterthought, but as essential and integrated from the first steps of a project.

Accessibility interests have been making headlines this past year, such as with the lawsuit filed against edX (MIT and Harvard). Whereas the edX lawsuit focused on section 504 of the Rehabilitation Act of 1973, the web world and accessibility are usually synonymous with section 508. The current guidelines were enacted in 1998 and are badly in need of an update. In February of this year, the Access Board published a proposed update to the 508 standards. They are going to take a year or so to digest and evaluate all of the comments they have received. It’s expected that the new law will be published in the Federal Register around October of next year. Institutions will have six months to make sure they are compliant, which means everything needs to be ready to go around April of 2017.

I recently attended a webinar on the upcoming changes that was developed by the SSB Bart Group. Key areas of interest to me were as follows.

WCAG 2.0 will be the base standard

The Web Content Accessibility Guidelines (WCAG) are generally a simpler yet stricter set of guidelines for making content available to all users, compared to the existing 508 guidelines. The WCAG standard has been adopted around the world, so the updated rule for section 508 means there will be an international focus on standards.

Focus on functional use instead of product type(s)

The rules will focus less on ‘prescriptive’ fixes and more on general approaches to making content accessible. The current rules are very detailed in terms of what sorts of devices need to do what. The new rule tends to favor user preferences in order to give users control. The goal is to enable the broadest range of users, including those with cognitive disabilities.

Non-web content is now covered

This applies to anything that will be publicly available from an institution, including things like PDFs, office documents, and so on. It also includes social media and email. One thing to note is that only the final document is covered, so working versions may not be accessible. Similarly, archival content is not covered unless it’s made available to the public.

Strengthened interoperability standards

These standards will apply to software and frameworks, as well as mobile and hybrid apps. However, they do not apply specifically to web apps, due to the WCAG safe harbor. The end result should be that it’s easier for assistive technologies to communicate with other software.

Requirements for authoring tools to create accessible content

This means that editing tools like Microsoft Office and Adobe Acrobat will need to output content that is accessible by default. Currently it can take a great deal of effort after the fact to make a document accessible. Oftentimes content creators either lack the knowledge of how to make documents accessible or can’t invest the time needed. I think this change should end up benefiting a lot of users.

In general, the intent of these changes is to help the 508 standards catch up to the modern world of technology. The hoped-for outcome is that accessibility is baked into content from the start and not just included as an afterthought. I think the biggest motivator to consider is that making content accessible doesn’t just benefit disabled users; it makes that content easier to use, find, etc. for everyone.