Dates are fascinating. They mark time. They’re based, imperfectly, on the pattern of the Earth’s planetary motion. (A year is a little bit longer than 365 days, so our calendar system adds leap years to compensate. Still, it’s off somewhat over thousands of years.) For digital collections it’s important to know when the original item was created or published, so we record date information in our metadata.
But our date metadata has a problem. It’s not consistent. The dates are generally human readable, but equivalent dates are formatted variously (10/4/2015 vs. Oct. 4, 2015 vs. 2015-10-4), and there are different levels of precision (1920s vs. June 1981 vs. Spring 1972) and degrees of certainty about the accuracy of the date (circa 1931). While a person would generally be able to interpret what these dates mean, computers need a lot of instructions to do much of anything but display them. In its current form, our date metadata is neither consistently formatted nor readily machine readable.
To fix this, we’re going to clean up our date metadata with a few goals in mind. We want dates to be searchable — if someone searches for ‘1957’ it would be great if the results included everything in our collection from that year. We also want to be able to: sort search results by date; provide a way to browse our collections by year; and display dates in human readable formats. To meet these goals we aim to transform our date metadata into a consistent, standard, and machine readable format across our digital collections.
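As a rough illustration of what that transformation involves, the sketch below normalizes the three day-precision variants mentioned above into a single ISO 8601 string. It uses Ruby's standard `date` library. The method name `normalize_date`, the `KNOWN_FORMATS` list, and the choice to read 10/4/2015 as month/day/year are assumptions made for this example, not a description of our actual cleanup process.

```ruby
require 'date'

# Hypothetical list of formats seen in the metadata, tried in order.
# Assumption: slash-separated dates are month/day/year.
KNOWN_FORMATS = ['%m/%d/%Y', '%b. %d, %Y', '%Y-%m-%d'].freeze

# Returns an ISO 8601 date string, or nil when no known format matches.
def normalize_date(raw)
  KNOWN_FORMATS.each do |format|
    begin
      return Date.strptime(raw.strip, format).iso8601
    rescue ArgumentError
      next # this format did not match, so try the next one
    end
  end
  nil
end

['10/4/2015', 'Oct. 4, 2015', '2015-10-4'].each do |raw|
  puts "#{raw} -> #{normalize_date(raw)}"
end
```

Values that match none of the listed formats, such as "circa 1931", come back as nil, so they can be set aside for manual review.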
Thankfully, there is an international standard for encoding dates, ISO 8601. In brief, ISO 8601 defines a standard way of representing dates and times. Better yet, the standard is implemented by the date and time libraries of many programming languages, making ISO 8601 readily machine readable. At first glance, ISO 8601 seems like the obvious answer to encoding our date metadata, since it specifies a standard way of encoding dates at various levels of precision: year (1975), month (1975-07), or day (1975-07-01).
Let’s see what Ruby (the programming language we’re using to develop our new digital collections platform) can do with ISO 8601 formatted date strings.
Here’s an easy case — an ISO formatted date string with day precision:
> d = Date.iso8601('1975-07-01')
=> #<Date: 1975-07-01 ((2442595j,0s,0n),+0s,2299161j)>
That output means that Ruby successfully parsed the date string into a date object, which I can use to do things like get information about the year:
> d.year
=> 1975
Determine whether it’s a Tuesday:
> d.tuesday?
=> true
And I can generate a reformatted date for display using the strftime method:
> d.strftime('%A, %B %-d, %Y')
=> "Tuesday, July 1, 1975"
So that’s nice. But there’s a problem. Ruby’s implementation of ISO 8601 is limited. It only handles dates with day precision (1975-07-01) and doesn’t know what to do with dates with only month or year precision.
Ruby will attempt to parse dates with month precision, but interprets the month as the day, contrary to the ISO standard:
> d = Date.iso8601('1975-07')
=> #<Date: 1975-01-07 ((2442420j,0s,0n),+0s,2299161j)>
> d.day
=> 7
> d.month
=> 1
For dates with year precision Ruby just throws up its hands:
> d = Date.iso8601('1975')
ArgumentError: invalid date
Derp.
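One way around this limitation, without any extra libraries, is to check the shape of the string before handing it to Ruby. The sketch below is a minimal example rather than a full ISO 8601 parser. The method name `parse_iso_with_precision` and the choice to return the first day of the period alongside a precision symbol are assumptions for illustration.

```ruby
require 'date'

# Returns [Date, precision] for YYYY, YYYY-MM or YYYY-MM-DD strings,
# or nil for anything else. Missing parts default to the first day
# of the period (an assumption made for this sketch).
def parse_iso_with_precision(str)
  case str
  when /\A(\d{4})\z/
    [Date.new($1.to_i, 1, 1), :year]
  when /\A(\d{4})-(\d{2})\z/
    [Date.new($1.to_i, $2.to_i, 1), :month]
  when /\A(\d{4})-(\d{2})-(\d{2})\z/
    [Date.new($1.to_i, $2.to_i, $3.to_i), :day]
  end
rescue ArgumentError
  nil # well-formed string but impossible date, e.g. 1975-13
end

date, precision = parse_iso_with_precision('1975-07')
puts "#{date.iso8601} (#{precision} precision)" # prints "1975-07-01 (month precision)"
```

Keeping the precision next to the date matters because the Date object alone cannot tell "July 1975" apart from "July 1, 1975".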
For many items in our collection the precise date is unknown, but an approximate date can be assigned, e.g. “circa 1981.” In other cases we may at best be able to provide decade or century levels of precision: “1920s,” “1900s,” etc. ISO 8601 doesn’t provide a way to express these more ambiguous dates.
Extended Date Time Format (EDTF) to the rescue!
EDTF is a draft specification of an extension to the ISO 8601 date standard to address some of the limitations of ISO 8601 and to provide a standard way to encode machine readable dates in ways that are useful to cultural heritage institutions. You can read the full draft on the Library of Congress website.
EDTF adds to ISO 8601 several different ways of specifying dates. A few that are important for our date metadata are shown in the table below.
EDTF encoding | meaning |
---|---|
1984? | uncertain: possibly the year 1984, but not definitely |
1984~ | approximately the year 1984 |
192x | decade of the 1920s |
2001-21 | Spring, 2001 |
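
To make the table concrete, here is a small hand-rolled sketch that turns those four encodings into display labels using only regular expressions. It is not an EDTF parser and covers nothing beyond the rows above. The method name `edtf_label`, the wording of the labels, and the `SEASONS` lookup are assumptions for this example; the season codes 21 through 24 (spring, summer, autumn, winter) follow the EDTF draft.

```ruby
# Season codes as listed in the EDTF draft.
SEASONS = {
  '21' => 'Spring', '22' => 'Summer', '23' => 'Autumn', '24' => 'Winter'
}.freeze

# Returns a human readable label for the handful of EDTF patterns in
# the table, or nil if the string is not one of them.
def edtf_label(str)
  case str
  when /\A(\d{4})\?\z/        then "possibly #{$1}"      # uncertain year
  when /\A(\d{4})~\z/         then "circa #{$1}"         # approximate year
  when /\A(\d{3})x\z/         then "#{$1}0s"             # decade
  when /\A(\d{4})-(2[1-4])\z/ then "#{SEASONS[$2]} #{$1}" # season
  end
end

%w[1984? 1984~ 192x 2001-21].each do |encoded|
  puts "#{encoded} -> #{edtf_label(encoded)}"
end
```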
Lucky us, the edtf-ruby gem adds EDTF support to Ruby. With edtf-ruby installed, I can do things like this:
Work with dates with month precision:
> d = Date.edtf('1975-07')
=> Tue, 01 Jul 1975
Ruby creates a Date object with the earliest day in July (the 1st), but it also knows that the precision for the date is month precision:
> d.month_precision?
=> true
> d.day_precision?
=> false
Based on this information I can decide how I want to format the date for display, probably something like this:
> d.strftime('%B %Y')
=> "July 1975"
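That decision can be captured in a small helper. The sketch below uses only Ruby's standard `date` library, so the precision is passed in as a symbol rather than read from the date object; with edtf-ruby one would presumably check predicates such as `month_precision?` instead. The method name `display_date` and the `DISPLAY_FORMATS` table are hypothetical.

```ruby
require 'date'

# Hypothetical mapping from precision to a strftime display format.
DISPLAY_FORMATS = {
  year:  '%Y',
  month: '%B %Y',
  day:   '%B %-d, %Y'
}.freeze

# Formats a date for display according to how precise it really is.
# Raises KeyError for a precision that is not in the table.
def display_date(date, precision)
  date.strftime(DISPLAY_FORMATS.fetch(precision))
end

puts display_date(Date.new(1975, 7, 1), :month) # prints "July 1975"
```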
EDTF also adds a way to encode a season, like “Spring 1975”:
> d = Date.edtf('1975-21')
=> #<EDTF::Season:0x007fcdd3072738 @year=1975, @season=:spring, @qualifier=nil>
Which can be formatted for display:
> d.to_date.strftime("#{d.season.capitalize} %Y")
=> "Spring 1975"
Or a decade:
> d = Date.edtf('192x')
=> #<EDTF::Decade:0x007fcdd2067668 @year=1920>
Which can be transformed for display in our public interface:
> d.to_date.strftime('%Ys')
=> "1920s"
Even though EDTF is still a draft standard, we’ve decided to use it as the encoding format for our date metadata for digital collections because it will allow us to express a wide range of date information in a machine readable format. By transforming our date metadata to EDTF and making use of the edtf-ruby gem to parse EDTF encoded date strings, we’ll be able to make our date metadata work harder — to provide sorting of results by date, searching, browsing and more flexible and consistent human readable display formats.
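To give a sense of what that looks like once dates are parsed, here is a minimal sketch of sorting, searching, and browsing by year with plain Ruby `Date` objects. The `Item` struct, the three helper methods, and the sample records are invented for illustration and do not reflect our actual data model or search index.

```ruby
require 'date'

# Hypothetical record: a title plus an already-parsed date.
Item = Struct.new(:title, :date)

# Oldest first.
def sort_by_date(items)
  items.sort_by(&:date)
end

# Everything dated within the given year.
def from_year(items, year)
  items.select { |item| item.date.year == year }
end

# Hash of year => items, suitable for a browse-by-year page.
def browse_by_year(items)
  items.group_by { |item| item.date.year }
end

items = [
  Item.new('Photograph', Date.new(1957, 3, 12)),
  Item.new('Newsletter', Date.new(1931, 1, 1)),
  Item.new('Poster', Date.new(1957, 11, 2))
]

puts sort_by_date(items).map(&:title).join(', ')
puts from_year(items, 1957).map(&:title).join(', ')
puts browse_by_year(items).keys.sort.join(', ')
```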