Pipes experiment

A few days ago I decided to have a little play around with Pipes for the first time. I had a small problem that I thought would be relatively easy to solve with it.

First of all, let me say I love Pipes. I love the whole idea of feed mashing and have been fascinated by the concept for a while. Pipes makes such feed mashing child's play, with no code involved (OK, you can insert some regex). I was not going to go too deep in this particular session, as the problem I had set seemed simple enough. But in implementing it I learnt an important lesson about taking such a high-level graphical view, one I had perhaps forgotten. With all of this simple drag-and-drop connecting of pipe components, my brain forgot to engage in the normal analysis I would undertake during even the simplest programming task. As a result I completed the design within 10 minutes and set it running. Only several days later did I realise the fundamental error in the design, one which any casual Pipes user could easily make.

Here is a screenshot of the design:

screenshot_01

My goal for the pipe is to alert me to blog posts that mention Folknology, so that I can respond to them. The simple version gathers one feed from Technorati and another from Google Blog Search. The URL components let me set the search criteria, and the Fetch components then retrieve the results. The two resulting feeds are merged by a 'Union' component into a single feed, which I then pass through a 'Unique' filter that either passes or blocks each item. This last component eliminates the duplicates that occur across the two sources (Technorati and Google), as many of the same posts are discovered by each search engine. After some basic testing (a couple of minutes) I was happy it was performing as expected, so I subscribed to the feed output to monitor my new pipe.
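For readers who think more easily in code, the design can be sketched roughly as follows. This is only an approximation in Python, not anything Pipes itself exposes: the `union` and `unique` functions, the `link` field and the sample items are all hypothetical stand-ins for the Union component, the Unique filter and the two fetched search feeds.

```python
from itertools import chain

def union(*feeds):
    """Concatenate several feeds (lists of items) into one stream."""
    return chain.from_iterable(feeds)

def unique(items, key="link"):
    """Yield only the first item seen for each value of `key` within this run."""
    seen = set()
    for item in items:
        if item[key] not in seen:
            seen.add(item[key])
            yield item

# Hypothetical sample data standing in for the two fetched search feeds.
technorati = [
    {"link": "http://example.com/a", "title": "Post A"},
    {"link": "http://example.com/b", "title": "Post B"},
]
google = [
    {"link": "http://example.com/b", "title": "Post B"},
    {"link": "http://example.com/c", "title": "Post C"},
]

# Union the two feeds, then drop duplicates that appear in both.
merged = list(unique(union(technorati, google)))
```

Here Post B appears in both sources but only once in `merged`, which is exactly what my couple of minutes of testing confirmed.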

Initially the results were fine, but fairly soon I saw the same posts being repeated. I figured that I had misconfigured the Unique filter, but subsequent investigation showed no such configuration error. The problem is that the filter cannot remember anything between runs: these pipes are stateless in many ways, so the Unique filter only eliminates identical posts that are present in the feeds at the same time. In reality a post may appear in one source at a different time from the other, so as far as the filter is concerned the later occurrence is unique. Thus I could receive the same post twice, separated by a period of days. The Unique filter should really be called a concurrent unique filter, or unique only over a small period of time. The subtlety of this difference can cause rather large errors in the results and, as you can imagine, quite unexpected behavior.
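The failure is easy to reproduce in a few lines. The sketch below is a hypothetical Python model of the behaviour described above, not of how Pipes is actually implemented: `run_pipe` stands for one complete evaluation of the pipe, and the timing of the two runs is invented for illustration.

```python
def run_pipe(*feeds):
    """One stateless run: union the feeds, then drop duplicate links."""
    seen = set()  # rebuilt from scratch on every run, so nothing is remembered
    output = []
    for feed in feeds:
        for item in feed:
            if item["link"] not in seen:
                seen.add(item["link"])
                output.append(item)
    return output

post = {"link": "http://example.com/a", "title": "Post A"}  # hypothetical post

# Day 1: only the first search engine has found the post.
day1 = run_pipe([post], [])
# Day 3: the first engine's feed has moved on; the second has just indexed it.
day3 = run_pipe([], [post])

# Links that the subscriber receives on both days.
repeated = [item["link"] for item in day1 if item in day3]
```

Each run is correct on its own terms, yet `repeated` is not empty: the subscriber sees the same post on both days, because the set of seen links is thrown away at the end of every run.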

Of course, as a programmer tackling this problem I would have written the program around a lookup table of the URLs from the feeds. Each new entry in the feeds would be compared against this lookup table to see whether it had already been captured, and the lookup table in turn would be stored, perhaps in a database or db file. For a programmer, state is built in as variables or objects that can be given persistence (recorded between runs); it is this level of functionality that Pipes does not provide as standard. Pipes is like a form of functional programming with no memory, so for a common group of problems it does not perform well.

The trouble is that this is not conveyed to the user of Pipes. The average business user, for example, would have little forewarning of such issues and would probably discover them only after testing. The lesson here is really a simple one: "you don't know what you don't know until you are shown what you didn't think of". For a system like Pipes to be useful as a business tool, it must somehow build the solutions to this class of problems into its architecture. In other words, limitations like these need to be displayed up front in a way that is simply visualised. In addition, direct community involvement by those more familiar with the pitfalls should be present at the initial engagement, a kind of social hand-holding, perhaps by example, to make the journey less painful and to help the user gain confidence with the tools.
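The lookup-table approach mentioned above might look something like the following minimal sketch. It assumes Python with the standard `sqlite3` module as the "db file"; the `new_items` function, the `seen` table and the default file name `seen_posts.db` are hypothetical names chosen for illustration.

```python
import sqlite3

def new_items(items, db_path="seen_posts.db"):
    """Return the items whose link has not been recorded by any earlier run."""
    conn = sqlite3.connect(db_path)
    try:
        # The lookup table: one row per post URL ever delivered.
        conn.execute("CREATE TABLE IF NOT EXISTS seen (link TEXT PRIMARY KEY)")
        fresh = []
        for item in items:
            link = item["link"]
            row = conn.execute(
                "SELECT 1 FROM seen WHERE link = ?", (link,)
            ).fetchone()
            if row is None:
                # First sighting: remember the link and pass the item through.
                conn.execute("INSERT INTO seen (link) VALUES (?)", (link,))
                fresh.append(item)
        conn.commit()  # persist the table so the next run can consult it
        return fresh
    finally:
        conn.close()
```

Calling `new_items` on the merged feed at the end of each run would let a post through the first time either search engine reports it and suppress it on every later run, however many days apart the two sightings are. The price is exactly the thing Pipes hides from view: a piece of stored state that has to live somewhere between runs.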

This is only a cursory glance at Pipes, and I would love to hear from others who have experimented with it. I would also love to hear opinions on this class of problems, which I am sure Pipes, Teqlo, Coghead etc. are all going to experience with novice participants. What complexities are being passed over, lying in wait for the unwary participant?
