Six or seven years ago, we discovered a handy new data mashup service from Yahoo! called Yahoo! Pipes. It had a slick drag-and-drop visual programming interface that made it easy to grab data from a bunch of different live sources, then combine, reshape, and conditionally transform it into a new dynamic feed structured however we needed. “Pipes” was a perfect name: a nod to the | (pipe) character used in Unix to send one command’s output into the next command’s input, and evocative of the blue pipes you dragged between modules in the Pipes UI to funnel data from one to another. It was, quite literally, a series of tubes.
Over the years, we grew to rely on Yahoo! Pipes’ data-mashing wizardry for several features central to the presentation of information on our library website. If you’ve read Bitstreams in the past, you have probably followed a link that was shuttled through Pipes before it was rendered on the website.
Here are some of the things on the library website that Pipes made possible:
- Library Events. Make a single RSS feed of library-sponsored events by combining raw XML data from the Duke University Events Calendar with RSS feeds from six or more departmental calendars.
- New Additions. Create media-rich RSS feeds of New Additions to the library catalog, one per category, by mashing raw XML into Media RSS.
- Blogs. Combine RSS feeds from ten or more library blogs into one shared feed.
- Jobs. Create a shared RSS feed of library job postings matching any of four job types.
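The Blogs and Jobs feeds boil down to three operations: merge several feeds, sort the combined items by date, and keep only items that match a category. Here is a minimal Python sketch of that logic, assuming RSS 2.0 input whose `pubDate` values are RFC 822 dates with a numeric time zone. The function names, sample feeds, and category labels are hypothetical and not taken from Pipes or Huginn.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def merge_feeds(feed_xml_strings):
    """Return the <item> elements of several RSS 2.0 feeds, newest first."""
    items = []
    for xml_text in feed_xml_strings:
        root = ET.fromstring(xml_text)
        items.extend(root.iter("item"))
    # RSS pubDate values are RFC 822 dates, which parsedate_to_datetime parses.
    items.sort(key=lambda item: parsedate_to_datetime(item.findtext("pubDate")),
               reverse=True)
    return items


def filter_by_category(items, wanted):
    """Keep only items with at least one <category> in the wanted set."""
    return [item for item in items
            if any(cat.text in wanted for cat in item.findall("category"))]


# Hypothetical sample feeds standing in for two library blogs.
blog_a = """<rss version="2.0"><channel>
<item><title>Old post</title><category>Staff</category>
<pubDate>Mon, 01 Jun 2015 09:00:00 +0000</pubDate></item>
</channel></rss>"""
blog_b = """<rss version="2.0"><channel>
<item><title>New post</title><category>Student</category>
<pubDate>Tue, 02 Jun 2015 09:00:00 +0000</pubDate></item>
</channel></rss>"""

merged = merge_feeds([blog_a, blog_b])
print([item.findtext("title") for item in merged])  # ['New post', 'Old post']

# Jobs-style filter: keep items whose category matches any wanted type.
students = filter_by_category(merged, {"Student"})
print([item.findtext("title") for item in students])  # ['New post']
```

For the Jobs feed, the `wanted` set would simply hold the four job types; matching "any of" them is what the `any(...)` test does.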
Imagine our dismay in June, when Yahoo! announced it was pulling the plug on Pipes and shutting it down for good in September. In our scramble to find a suitable replacement, we settled on Huginn as the best alternative.
Cleverly named after a raven in Norse mythology, Huginn is an open-source data mashup application. It can do a lot of the things Yahoo! Pipes could, but it’s also quite different.
Similarities to Yahoo! Pipes
- Collect data from various sources on the web and transform it
- Combine disparate data into a single stream
- Emit a new customized feed at a URL for other services to access
Differences from Pipes
- No visual editor; instead, you configure everything by hand-coding JSON
- Open source rather than hosted; you have to run it yourself
- Constantly being improved by developers worldwide
- A Ruby on Rails app; can be forked/customized as needed
To recreate each feed we’d built in Pipes, we had to build two kinds of Huginn Agents: one or more “Website Agents” to gather and extract the data we need, then a “Data Output Agent” to publish a new customized feed. Each Agent is configured with a set of rules written as JSON.
Website Agent
Huginn description: “The Website Agent scrapes a website, XML document, or JSON feed and creates Events based on the results.”
With a Website Agent, we gather data from a source (for us, typically RSS or raw XML). We specify a URL, then define which elements to extract using XPath expressions.
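For readers who haven’t seen Huginn’s JSON options, here is roughly what a Website Agent configuration for an RSS source looks like, built as a Python dict and printed as the JSON you would paste into Huginn. The option names (`url`, `type`, `mode`, `expected_update_period_in_days`, and `extract` with `xpath`/`value` pairs) reflect our reading of Huginn’s Website Agent documentation and should be checked against the help text in your Huginn version; the feed URL and field names are placeholders.

```python
import json

# Sketch of Huginn Website Agent options for an RSS source.
# The URL is a placeholder; confirm option names against your
# Huginn version's Website Agent help text.
website_agent_options = {
    "url": "https://example.org/library-blog/feed",  # placeholder source
    "type": "xml",                 # parse the response as XML
    "mode": "on_change",           # emit events only for items not seen before
    "expected_update_period_in_days": 2,
    "extract": {
        # Each key becomes a field on the emitted events. The XPath picks
        # one node per item; "value" is evaluated against each match.
        "title": {"xpath": "//item/title", "value": "normalize-space(.)"},
        "url": {"xpath": "//item/link", "value": "normalize-space(.)"},
        "pubdate": {"xpath": "//item/pubDate", "value": "normalize-space(.)"},
    },
}

print(json.dumps(website_agent_options, indent=2))
```

Each XPath here matches one node per `<item>`, so the extracted title, URL, and date line up into one event per feed item.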
Data Output Agent
Huginn description: “The Data Output Agent outputs received events as either RSS or JSON. Use it to output a public or private stream of Huginn data.”
The Data Output Agent uses one or more Website Agents as its data sources. We configure rules about what to expose, and we can further refine the output using Liquid templating. For the New Additions feeds, this is where we build a <media:content> element and assemble the URL of a cover image from pieces of data extracted from the raw XML.
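To make the New Additions case concrete, here is a sketch of Data Output Agent options, again built as a Python dict and printed as JSON. The option names (`secrets`, `expected_receive_period_in_days`, `ns_media`, `template`, and `_attributes` for an element that needs attributes) reflect our reading of Huginn’s Data Output Agent documentation and should be verified against your version. The event fields (`title`, `url`, `summary`, `isbn`) and the cover-image URL pattern are hypothetical; the real pattern depends on whichever cover service the catalog uses.

```python
import json

# Sketch of Data Output Agent options for a New Additions feed.
# Liquid tags such as {{ title }} are filled in by Huginn from each
# incoming event. Field names and the cover URL are hypothetical.
data_output_options = {
    "secrets": ["new-additions"],      # token required in the feed's URL
    "expected_receive_period_in_days": 2,
    "ns_media": "true",                # declare the media: namespace in the RSS
    "template": {
        "title": "New Additions to the Library Catalog",
        "description": "Recently added titles",
        "item": {
            "title": "{{ title }}",
            "link": "{{ url }}",
            "description": "{{ summary }}",
            # An element with attributes, rendered as
            # <media:content url="..." medium="image"/>
            "media:content": {
                "_attributes": {
                    "url": "https://covers.example.org/{{ isbn | strip }}.jpg",
                    "medium": "image",
                }
            },
        },
    },
}

print(json.dumps(data_output_options, indent=2))
```

The Liquid `strip` filter trims stray whitespace from the extracted value before it is dropped into the cover URL; assembling the URL from several extracted fields works the same way, with one Liquid tag per field.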
So far, so good. Huginn is now successfully powering most of the feeds that we had previously managed through Yahoo! Pipes. We look forward to seeing what kinds of features are added by the developer community.
Shoutouts to Cory Lown & Michael Daul for all their work making the transition from Pipes to Huginn.