Friday, November 11, 2011

Simplifying RSS feeds with Yahoo! Pipes

About two years ago a friend finally convinced me to start using an RSS reader to manage the list of web sites that I was browsing on a daily basis. RSS stands for Really Simple Syndication and is a very lightweight way for a web site to list its content as it's updated. You'll often see little icons like this that denote that an RSS "feed" is available:

RSS Icon
I'd tried using RSS feeds before with Firefox, but the interface is not very good. It basically turns every feed into a tree of bookmarks and you've got to scan your bookmarks to see if there is new content. Easier than surfing to a bunch of sites, but not a lot.

Enter Google Reader, a dedicated RSS feed reader that looks and acts a lot like GMail or any other e-mail reader. Google Reader gives you folders that hold all your feeds and an "Inbox" that lets you see what's there. You click on a title to read it, and when you've read everything in a feed, a group of feeds, or your whole list of feeds, you click "Mark all as Read" and, voilà, your inbox is cleared out. The interface looks something like this:
Google Reader. This is a slightly older pic, the interface looks a little different and this person has no folders of feeds set up.
So, it's all well and good. You can sign up for hundreds of feeds, peruse thousands of articles on the web every day, and spend very little time doing it. Google Reader tells you what articles of a feed you read. For some feeds, I read almost 100% of the articles, for others, less than 10%.

So all was well and good for a year or so, but I increasingly wanted something a little more flexible. Two cases in point:
  1. I like to read a number of opinion columnists. Several of them work for the NY Times. They each have their own feed (see here for a list of all NYT feeds), but that would mean subscribing to several feeds. While it's easy to mark everything read, marking some things read while leaving others unread is a bit of a pain. The Times also supplies a feed of ALL its columnists, but it doesn't give any indication in the feed of who wrote the column. I have no interest in Maureen Dowd, but I can't filter out what she wrote without clicking through to a web page to see if she wrote it. Plus there are columnists for other newspapers I also want to read.
  2. There are a number of "great deal" type RSS feeds out there, but they suffer from the same problems. Either they mix hard drives and cereal in the same feed (guess which one I'm more interested in) or they have lots of very detailed feeds, so I might want to follow 10 of them.
So I started looking around for a way to aggregate a bunch of feeds into something that Google Reader would see as a single feed. After finding a few simple services, I stumbled upon Yahoo! Pipes, which is a very full-featured way of doing this. Pipes gets its name from the Unix concept of pipes, which is a way of combining a number of simple programs, each feeding its output into the input of the next program. You can do very useful things very quickly this way. For instance, you might do this: cat file.txt | tr -s '[:space:]' '\n' | sort | uniq -c | sort -n which would take a file, split it into individual words, sort them, remove duplicates and count how many times each word is used, and finally sort again by how many times each word was used. You'd end up with an ordered list of how often you used each word.
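To make the "small stages chained together" idea concrete, here is a rough Python sketch of the same word-count pipeline. The function names are my own, made up for illustration, and each one stands in for one stage of the shell command; it's only meant to show the shape of the idea, not to replace the one-liner.

```python
from collections import Counter


def split_words(text):
    """Stage 1: break the text into whitespace-separated words."""
    return text.split()


def count_words(words):
    """Stages 2 and 3: the job of `sort | uniq -c`, one count per distinct word."""
    return Counter(words)


def sort_by_count(counts):
    """Stage 4: order (word, count) pairs from least used to most used.

    Ties are broken alphabetically so the result is predictable.
    """
    return sorted(counts.items(), key=lambda pair: (pair[1], pair[0]))


def word_frequencies(text):
    """Feed each stage's output into the next, like a Unix pipe."""
    return sort_by_count(count_words(split_words(text)))


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    for word, count in word_frequencies(sample):
        print(count, word)
```

Each function knows nothing about the others; the only contract is that one stage's output is the next stage's input, which is exactly what makes pipes easy to rearrange.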

Pipes extends this concept to working with RSS feeds and providing an RSS feed on the back end. So the simplest thing I can do is just aggregate a bunch of feeds into one. For my columnists I go one step further. For each feed, I prepend the name of the columnist to the title before aggregating it, so I see entries like "Brooks: Let's All Feel Superior" and I never see "Dowd: Why I Wrote Another Snarky Column". So you can see what I'm talking about, here is my columnist feed.
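For anyone who thinks better in code than in boxes and wires, the columnist pipe boils down to something like the Python sketch below. This is not what Pipes runs internally; it's just my approximation of the logic using the standard library's XML parser. The two sample feeds, the "Smith" columnist, and the example.com links are placeholders I invented so the sketch runs without touching the network.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feeds standing in for real columnist feeds.
BROOKS_FEED = """<rss version="2.0"><channel><title>Brooks</title>
<item><title>Let's All Feel Superior</title><link>http://example.com/brooks/1</link></item>
</channel></rss>"""

SMITH_FEED = """<rss version="2.0"><channel><title>Smith</title>
<item><title>A Placeholder Column</title><link>http://example.com/smith/1</link></item>
</channel></rss>"""


def items_with_prefix(feed_xml, prefix):
    """Parse one feed and prepend 'prefix: ' to every item title."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        title = item.find("title")
        if title is not None:
            title.text = f"{prefix}: {title.text or ''}"
        items.append(item)
    return items


def merge_feeds(feeds, channel_title="Columnists"):
    """Combine (prefix, feed_xml) pairs into a single RSS document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    for prefix, feed_xml in feeds:
        channel.extend(items_with_prefix(feed_xml, prefix))
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(merge_feeds([("Brooks", BROOKS_FEED), ("Smith", SMITH_FEED)]))
```

Leaving a columnist out is then just a matter of not listing that feed in the first place, which is why the unwanted titles never show up.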

The best part of working with Pipes is that the editor is graphical: you drag little boxes around and connect them by drawing pipes from the output of one box to the input of another. At the bottom of the screen is a debugger that will tell you the output of each box. This video demonstrates the basic concept, although it's a bit dry.

For my deals, I took it a step further. I do something similar, breaking things down by source and keeping only the sources I want. But there are also things I am actively looking to buy. So I put together a massive feed of just about every deal RSS feed out on the web. Then I can split that feed apart, search items for matching patterns (like the next camera I want to buy), and recombine the matches into my overall deals list with ALERT! prepended to their titles. Basically, I've got this machinery scouring the web looking for my next camera, all with a one-time setup. I actually do this by using Yahoo! Pipes to chain together three existing pipes into one (it's easier to make some things modular). Here is my final deals feed.
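The alert part of the deals pipe can be sketched in Python along these lines. Everything here is illustrative: the items are plain dictionaries rather than real feed entries, the "Acme X100" camera is a made-up product, and putting alerts at the top of the list is my assumption for the sketch, not something Pipes requires.

```python
import re


def find_alerts(all_items, patterns):
    """Return copies of the items whose titles match any wanted pattern,
    with 'ALERT! ' prepended to the title."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    alerts = []
    for item in all_items:
        title = item["title"]
        if any(rx.search(title) for rx in compiled):
            alerts.append({**item, "title": f"ALERT! {title}"})
    return alerts


def build_deals(curated_items, all_items, patterns):
    """Recombine: flagged matches first (an assumption), then the curated deals."""
    return find_alerts(all_items, patterns) + list(curated_items)


if __name__ == "__main__":
    curated = [{"title": "2TB hard drive $79"}]
    everything = [
        {"title": "Cereal 2 for $5"},
        {"title": "Acme X100 camera $299"},  # hypothetical product
    ]
    for deal in build_deals(curated, everything, [r"acme\s+x100"]):
        print(deal["title"])
```

Keeping the pattern search separate from the recombining step mirrors the modular setup described above, where smaller pipes are chained into one.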

One issue I did find with Google Reader plus Pipes is that Google Reader does not fetch a feed when you decide to read it. It fetches feeds periodically and caches them, and it decides how often to do this based on how many other people follow the same feeds. Since my feeds basically have an audience of one, Google would like to fetch them only every few hours. However, if you select only one feed and hit the refresh button, Google will re-fetch that feed and display it. For some things, a few hours old is fine, but for time-sensitive information, this forced re-fetch is nice. The only downside is having to home in on one feed before using it (refreshing everything won't trigger it).

Pipes can do a lot more, like turning web pages without feeds into feeds, processing data in other formats, etc. I'm sure I'll be exploring that in the future, but for now I'm really happy that I've been able to reduce the number of feeds I follow, reduce the number of items I see, and increase their relevance. If you are a regular RSS feed user and feel like they are getting hard to manage, take a look at Pipes.
