There’s a new whale in town

I’m going to take some time today to talk about CETEIcean, a new library that Raff Viglianti of MITH and I have been working on. We just pushed out the latest release this morning. CETEIcean grows out of work I’ve been doing for the Digital Latin Library to enable the online presentation of critical editions of Latin texts. Instead of transforming TEI XML into HTML for rendering in a browser, it does a 1:1 conversion of the XML elements into HTML Custom Elements. In practice, that means TEI tags, e.g. <text>, get turned into tags that look like <tei-text>, and these can be registered with the browser so that it treats them as HTML. In browsers that support it, this allows the appearance and function of elements to be customized using CSS and JavaScript. For example, a <ref> element can be displayed as an HTML link, and when it’s clicked, the browser goes to the URL in its `@target` attribute.
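
To make the naming convention concrete, here is a minimal sketch of the name mapping only. The helper `toCustomElementName` is made up for illustration and is not part of CETEIcean's API. Lowercasing is this sketch's own choice, made because valid custom element names must contain a hyphen and no uppercase ASCII letters.

```javascript
// Hypothetical helper (not CETEIcean's API): map a TEI element name
// to a name that is valid for an HTML Custom Element.
// Custom element names need a hyphen and may not contain uppercase
// ASCII letters, so we add a prefix and lowercase the TEI name.
function toCustomElementName(teiName, prefix = "tei") {
  return `${prefix}-${teiName.toLowerCase()}`;
}

const textName = toCustomElementName("text");        // "tei-text"
const headerName = toCustomElementName("teiHeader"); // "tei-teiheader"
```

Because lowercasing discards the original capitalization, a real converter that wants to get back to TEI has to record the source name somewhere, for instance in an attribute on the generated element.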

CETEIcean does all this by using the emerging Web Components standards in browsers where they’re available, and falling back to plain old JavaScript DOM manipulation where they aren’t. This means you should get the same behaviors in most browsers. It is something of a departure from normal TEI practice in that it doesn’t use any XSLT at all. TEI’s relationship with XSLT is a bit fraught. On the one hand, it’s indispensable: you pretty much have to do some sort of transformation on a TEI text to make it usable for your purpose, whether that purpose is to publish it on the web, to turn it into print-ready PDF, or to extract some information from it for further processing. On the other hand, particularly as it pertains to the web, XSLT is a niche technology that is mostly frozen in the state it was in 15 years ago. Browsers only support XSLT 1.0, and while XSLT itself has progressed through versions 2.0 and 3.0, support for those comes in the form of partially-to-entirely proprietary software. You can do XSLT 2.0 in the browser via Saxon-CE, but it is extremely clunky, and the recently announced XSLT 3.0 browser support via Saxon-JS requires a commercial version of Saxon to compile the stylesheets. There’s nothing wrong with selling software, of course, but it is a problem from the point of view of a primarily academic community that promulgates standards, distributes tools for working with those standards, and tries to foster collaboration in the development and improvement of those tools and standards. Or, to put it another way, I’m not going to distribute code or data that requires users to buy a software package to work with it, and the TEI isn’t either. XSLT 1.0 had robust support in both open and closed software. Support for 2.0 was much narrower, with some proprietary implementations and a single, partial (though robust) Open Source implementation from Saxonica. XSLT 3.0 looks to have no OSS implementation at all.
So from my perspective, XSLT is still a useful technology, but one that isn’t aging well.

JavaScript/ECMAScript, on the other hand, for all that there are flaws in the language, has an extremely robust community and ever-improving performance. Web browsers don’t seem to be on their way out either. These technologies seem like a better basis for new development in the markup publishing space. To use CETEIcean, all you need is a TEI document as the source and an HTML wrapper with a bit of JavaScript that includes the library and calls it to load the document. You can add your own CSS and your own behavioral customizations using very basic templates or JavaScript. Check out the demo page for examples.
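
As a rough sketch of the JavaScript side of such a wrapper: the code below assumes the library script has already been included in the page and exposes a `CETEI` constructor with a `getHTML5(url, callback)` method. Check the README for the exact signatures in the release you are using. The function name `showTei`, the file name, and the element id are placeholders. The constructor and the container are passed in as arguments so the function can be exercised outside a browser.

```javascript
// Sketch only: load a TEI file and attach the converted tree to the page.
// Assumption: CETEIClass behaves like the library's CETEI constructor,
// i.e. instances offer getHTML5(url, callback) and hand the converted
// content to the callback.
function showTei(CETEIClass, url, container) {
  const ct = new CETEIClass();
  ct.getHTML5(url, (data) => {
    // Append the converted document to the chosen container element.
    container.appendChild(data);
  });
  return ct;
}

// In the HTML wrapper, after the library's <script> tag, this might be:
// showTei(CETEI, "my-edition.xml", document.getElementById("TEI"));
```

Styling then lives in an ordinary stylesheet that targets the prefixed element names.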

One of the interesting implications of publishing TEI documents this way comes from the different affordances of the technologies. Using CETEIcean, the default behavior is for TEI to be displayed as it’s encoded, which puts some interesting constraints on how you might encode documents. To be sure, it’s perfectly possible to rearrange content using JavaScript, or to create derivative HTML markup. The DLL does this to generate an apparatus criticus from the inline <app> tags in the source, for example. But XSLT makes rearranging the source so simple that it’s almost the default behavior: you don’t really need to care about how things are laid out in the source because your transformation will just rewrite them for you. I’m coming to the conclusion, though, that this is not always a good thing, and that some of TEI’s constraints, the impact of which was smoothed over by XSLT’s transformative power, may lead you to model your texts in ways that are overly complicated. The other perhaps useful constraint CETEIcean provides is that it doesn’t let you throw away any of your text model. You can easily hide bits of your document using display: none in a CSS rule, but they will still be there and available, whereas with XSLT you tend to throw away the stuff you aren’t going to display. In DLL texts, we don’t display alternate witness readings in the main text, for example, but they’re still right there, and if you want to show an alternate reading instead of the base text, all you need do is turn the <rdg> into a <lem> and vice-versa. Having the full text model available in the browser means we can have functions that manipulate it in useful ways. The isomorphism of CETEIcean documents also means it should be possible to have things like robust annotation (since annotation targets should be able to be trivially mapped to the source documents) and in-browser editing (because we can easily turn the HTML back into TEI).
In sum, it seems like this approach has a great deal of potential and starts to get us out of the cul-de-sac our XSLT dependency had put us in.
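
The <rdg>/<lem> swap is simple enough to sketch. To keep the example runnable outside a browser, it models a single <app> entry as a plain object rather than as DOM nodes. The object shape, the function name `promoteReading`, and the sample readings are all invented for illustration; this is not the DLL's actual code.

```javascript
// Simplified stand-in for one <app> entry (an assumption of this sketch):
//   { lem: { wit, text }, rdgs: [{ wit, text }, ...] }
// promoteReading returns a new entry in which the reading from the given
// witness becomes the lemma and the old lemma takes its place as a reading.
function promoteReading(app, wit) {
  const index = app.rdgs.findIndex((rdg) => rdg.wit === wit);
  if (index === -1) {
    return app; // no reading from that witness: leave the entry unchanged
  }
  const rdgs = app.rdgs.slice(); // copy so the original entry is not mutated
  const [chosen] = rdgs.splice(index, 1, app.lem); // swap old lemma in
  return { lem: chosen, rdgs };
}

const entry = {
  lem: { wit: "#A", text: "arma" },
  rdgs: [{ wit: "#B", text: "arva" }],
};
const swapped = promoteReading(entry, "#B");
// swapped.lem.text is "arva"; "arma" is now among swapped.rdgs
```

Nothing is discarded in the swap: both readings survive, only their roles change, which is the point of keeping the whole text model in the browser.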
