Bitstreams: The Digital Collections Blog
Duke University Libraries
Behind the Scenes, Digital Collections, Technology

Enjoy your Metadata: Fun with Date Encoding

January 22, 2016 Cory Lown


Dates are fascinating. They mark time. They're based, imperfectly, on the pattern of the Earth's planetary motion. (A year is a little bit longer than 365 days, so our calendar system adds leap years to compensate. Even so, the calendar drifts slightly over thousands of years.) For digital collections it's important to know when the original item was created or published, so we record date information in our metadata.

But our date metadata has a problem: it's not consistent. The dates are generally human readable, but equivalent dates are formatted in various ways (10/4/2015 vs. Oct. 4, 2015 vs. 2015-10-4), they have different levels of precision (1920s vs. June 1981 vs. Spring 1972), and they express different degrees of certainty about the date (circa 1931). While a person can generally interpret what these dates mean, a computer needs a lot of instructions to do much of anything with them beyond displaying them. In its current form, our date metadata is neither consistently formatted nor readily machine readable.
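To make "a lot of instructions" concrete, here is a minimal sketch using only Ruby's standard date library. The helper name parse_day_precision and its list of formats are hypothetical, chosen to match the three day-precision examples above. Every other variation in the data would need its own entry, and dates like "June 1981" or "Spring 1972" fall through entirely.

```ruby
require 'date'

# Hypothetical list of the day-precision patterns seen in the examples above.
# Each new variation found in the data would need another entry here.
DAY_FORMATS = ['%Y-%m-%d', '%m/%d/%Y', '%b. %d, %Y'].freeze

# Hypothetical helper: try each pattern in turn and return the first Date
# that parses, or nil when no pattern matches.
def parse_day_precision(str)
  DAY_FORMATS.each do |fmt|
    begin
      return Date.strptime(str, fmt)
    rescue ArgumentError # Date::Error, raised by newer Ruby versions, is a subclass
      next
    end
  end
  nil
end

['10/4/2015', 'Oct. 4, 2015', '2015-10-4', 'June 1981'].each do |raw|
  parsed = parse_day_precision(raw)
  puts "#{raw.ljust(14)} => #{parsed ? parsed.iso8601 : 'not understood'}"
end
```

The three equivalent strings all come back as the same Date, but only because someone wrote down each pattern in advance.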


To fix this, we're going to clean up our date metadata with a few goals in mind. We want dates to be searchable: if someone searches for '1957', it would be great if the results included everything in our collections from that year. We also want to be able to sort search results by date, provide a way to browse our collections by year, and display dates in human readable formats. To meet these goals, we aim to transform our date metadata into a consistent, standard, and machine readable format across our digital collections.

Thankfully, there is an international standard for encoding dates: ISO 8601. In brief, ISO 8601 defines a standard way of representing dates and times. Better yet, the standard is implemented by the date and time libraries of many programming languages, which makes ISO 8601 dates readily machine readable. At first glance, ISO 8601 seems like the obvious answer for encoding our date metadata, since it specifies a standard way of encoding dates at various levels of precision: year (1975), month (1975-07), or day (1975-07-01).
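One practical payoff of this year-month-day layout, relevant to our goal of sorting by date: ISO 8601 dates with four-digit years sort chronologically even as plain strings, and a less precise date sorts just before the more precise dates it contains. A month-first format does not. The sample values below are illustrative, not drawn from our collections.

```ruby
# ISO 8601 strings: year, then month, then day, all zero-padded.
iso_dates = ['2015-10-04', '1975-07-01', '1975-07', '1981']

# US-style strings: month first, not zero-padded.
us_dates = ['10/4/2015', '9/1/1975']

# Plain string sorting, no date parsing at all.
p iso_dates.sort # => ["1975-07", "1975-07-01", "1981", "2015-10-04"]
p us_dates.sort  # => ["10/4/2015", "9/1/1975"] (2015 lands before 1975)
```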


Let’s see what Ruby (the programming language we’re using to develop our new digital collections platform) can do with ISO 8601 formatted date strings.

Here’s an easy case — an ISO formatted date string with day precision:

> d = Date.iso8601('1975-07-01')
=> #<Date: 1975-07-01 ((2442595j,0s,0n),+0s,2299161j)>

That output means that Ruby successfully parsed the date string into a date object, which I can use to do things like get information about the year:

> d.year
=> 1975

Determine whether it’s a Tuesday:

> d.tuesday?
=> true

And I can generate a reformatted date for display using the strftime method:

> d.strftime('%A, %B %-d, %Y')
=> "Tuesday, July 1, 1975"

So that’s nice. But there’s a problem. Ruby’s implementation of ISO 8601 is limited. It only handles dates with day precision (1975-07-01) and doesn’t know what to do with dates with only month or year precision.

Ruby will attempt to parse dates with month precision, but interprets the month as the day, contrary to the ISO standard:

> d = Date.iso8601('1975-07')
=> #<Date: 1975-01-07 ((2442420j,0s,0n),+0s,2299161j)>
> d.day
=> 7
> d.month
=> 1

For dates with year precision, Ruby just throws up its hands:

> d = Date.iso8601('1975')
ArgumentError: invalid date

Derp.
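Working around this with the standard library alone means writing the precision logic ourselves. Here is a minimal sketch, assuming only the three plain ISO 8601 calendar forms (YYYY, YYYY-MM, YYYY-MM-DD); the helper name parse_iso_with_precision is hypothetical. It returns the earliest day the string could refer to plus the precision, roughly the bookkeeping that the edtf-ruby gem takes care of later in this post.

```ruby
require 'date'

# Hypothetical helper (not part of Ruby's Date API): parse an ISO 8601
# calendar date at year, month, or day precision. Returns
# [earliest Date the string covers, precision symbol], or nil if the
# string is not in one of those three forms. Out-of-range values such as
# '1975-13' still raise ArgumentError from Date.new.
def parse_iso_with_precision(str)
  case str
  when /\A(\d{4})-(\d{2})-(\d{2})\z/
    [Date.new($1.to_i, $2.to_i, $3.to_i), :day]
  when /\A(\d{4})-(\d{2})\z/
    [Date.new($1.to_i, $2.to_i, 1), :month] # first day of the month
  when /\A(\d{4})\z/
    [Date.new($1.to_i, 1, 1), :year]        # first day of the year
  end
end

['1975-07-01', '1975-07', '1975', 'circa 1931'].each do |raw|
  date, precision = parse_iso_with_precision(raw)
  puts date ? "#{raw}: #{date.iso8601} (#{precision} precision)" : "#{raw}: not ISO 8601"
end
```

Note that even this workaround has no answer for "circa 1931".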

For many items in our collection the precise date is unknown, but an approximate date can be assigned, e.g., “circa 1981.” In other cases we can at best provide decade- or century-level precision: “1920s,” “1900s,” etc. ISO 8601 doesn't provide a way to express approximate or uncertain dates, or a decade like the 1920s.

Extended Date Time Format (EDTF) to the rescue!

EDTF is a draft specification of an extension to the ISO 8601 date standard to address some of the limitations of ISO 8601 and to provide a standard way to encode machine readable dates in ways that are useful to cultural heritage institutions. You can read the full draft on the Library of Congress website.

EDTF adds to ISO 8601 several different ways of specifying dates. A few that are important for our date metadata are shown in the table below.

EDTF encoding   Meaning
1984?           uncertain: possibly the year 1984, but not definitely
1984~           approximately the year 1984
192x            the decade of the 1920s
2001-21         Spring 2001
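To make the table concrete, here is a minimal hand-rolled sketch that turns just these four EDTF forms into display strings using only Ruby's standard library. The helper describe_edtf and its wording choices (such as rendering 1984~ as "circa 1984") are hypothetical; it is not a parser for the full draft specification, which is what the edtf-ruby gem provides. The season codes follow the EDTF draft: 21 Spring, 22 Summer, 23 Autumn, 24 Winter.

```ruby
# Season codes used in the EDTF draft (21-24).
SEASONS = { 21 => 'Spring', 22 => 'Summer', 23 => 'Autumn', 24 => 'Winter' }.freeze

# Hypothetical helper covering only the four forms in the table above;
# anything else returns nil.
def describe_edtf(str)
  case str
  when /\A(\d{4})\?\z/ then "#{$1} (uncertain)" # 1984?   -> uncertain year
  when /\A(\d{4})~\z/  then "circa #{$1}"       # 1984~   -> approximate year
  when /\A(\d{3})x\z/  then "#{$1}0s"           # 192x    -> decade
  when /\A(\d{4})-(2[1-4])\z/                   # 2001-21 -> season
    "#{SEASONS.fetch($2.to_i)} #{$1}"
  end
end

%w[1984? 1984~ 192x 2001-21].each { |s| puts "#{s} => #{describe_edtf(s)}" }
```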

Lucky us, the edtf-ruby gem adds EDTF support to Ruby. With edtf-ruby installed and loaded (require 'edtf'), I can do things like this:

Work with dates with month precision:

> d = Date.edtf('1975-07')
=> Tue, 01 Jul 1975

Ruby creates a Date object for the earliest day in July (the 1st), but it also records that the date has only month precision:

> d.month_precision?
=> true

> d.day_precision?
=> false

Based on this information I can decide how I want to format the date for display, probably something like this:

> d.strftime('%B %Y')
=> "July 1975"
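That decision can be captured once, as a small lookup from precision to strftime pattern. A minimal sketch, assuming the precision is available as a symbol; the helper display_date and the patterns in DISPLAY_FORMATS are hypothetical choices, and with edtf-ruby the symbol would come from precision checks like the month_precision? call shown above.

```ruby
require 'date'

# Hypothetical mapping from precision to a human-readable strftime pattern.
# '%-d' drops the leading zero from the day of the month.
DISPLAY_FORMATS = {
  day:   '%B %-d, %Y', # "July 1, 1975"
  month: '%B %Y',      # "July 1975"
  year:  '%Y'          # "1975"
}.freeze

# Hypothetical helper: format a Date at the precision we actually know,
# so a month-precision date never displays an invented day.
def display_date(date, precision)
  date.strftime(DISPLAY_FORMATS.fetch(precision))
end

d = Date.new(1975, 7, 1)
%i[day month year].each { |precision| puts display_date(d, precision) }
```

Using fetch rather than [] means an unexpected precision raises a KeyError instead of silently passing nil to strftime.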

EDTF also adds a way to encode a season, like “Spring 1975”:

> d = Date.edtf('1975-21')
=> #<EDTF::Season:0x007fcdd3072738 @year=1975, @season=:spring, @qualifier=nil>

Which can be formatted for display:

> d.to_date.strftime("#{d.season.capitalize} %Y")
=> "Spring 1975"

Or a decade:

> d = Date.edtf('192x')
=> #<EDTF::Decade:0x007fcdd2067668 @year=1920>

Which can be transformed for display in our public interface:

> d.to_date.strftime('%Ys')
=> "1920s"

Even though EDTF is still a draft standard, we’ve decided to use it as the encoding format for our date metadata for digital collections because it will allow us to express a wide range of date information in a machine readable format. By transforming our date metadata to EDTF and making use of the edtf-ruby gem to parse EDTF encoded date strings, we’ll be able to make our date metadata work harder — to provide sorting of results by date, searching, browsing and more flexible and consistent human readable display formats.
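For searching and browsing by year, one possible approach (an assumption here, not a description of our platform's code) is to reduce every date, whatever its precision, to the inclusive range of days it could mean, and then check whether that range overlaps the requested year. In the minimal sketch below, the Item struct, the made-up sample items, and the helper items_for_year are all hypothetical. It shows that a decade-precision item can reasonably turn up in a search for 1957, and that sorting by each range's first day orders mixed-precision dates.

```ruby
require 'date'

# Hypothetical record: a title plus the inclusive range of days its date covers.
Item = Struct.new(:title, :first_day, :last_day)

# Made-up sample items at month, decade, and day precision.
items = [
  Item.new('Photo, June 1957', Date.new(1957, 6, 1), Date.new(1957, 6, 30)),
  Item.new('Ad, 195x (1950s)', Date.new(1950, 1, 1), Date.new(1959, 12, 31)),
  Item.new('Letter, 1961-03-02', Date.new(1961, 3, 2), Date.new(1961, 3, 2))
]

# Hypothetical helper: items whose date range overlaps any day of +year+.
def items_for_year(items, year)
  year_start = Date.new(year, 1, 1)
  year_end = Date.new(year, 12, 31)
  items.select { |item| item.first_day <= year_end && item.last_day >= year_start }
end

# Sort by the earliest possible day, breaking ties by the latest.
sorted = items.sort_by { |item| [item.first_day, item.last_day] }

puts items_for_year(items, 1957).map(&:title)
puts sorted.map(&:title)
```

Whether a whole decade should match a single-year search is a design choice; a stricter rule would require the item's range to fall entirely within the year.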
