Archive for the ‘RSS’ Category

linkrdr

Tuesday, February 21st, 2012

Linkrdr is a new Web RSS reader with claims similar to Amethyst’s and a very different approach. They went public yesterday and are making changes rapidly. After I had added a dozen feeds that I read often, they announced OPML import support. The initial ranking, before I started reading anything, was basically this: the more links to a topic in my feeds and in the posts they point to, the higher the “relevance”. Reading several dozen posts did not seem to materially alter the order. This morning they announced a new ranking algorithm and the ordering of posts is starting to reflect my interests.

I still prefer Amethyst, but Linkrdr shows more promise than any other RSS reader I’ve tried.

Faraday?

Monday, February 6th, 2012

The most troublesome, ugly, bug-infested code in Amethyst fetches RSS feeds in the face of Internet congestion, down servers, buggy RSS implementations, incorrect encodings, and so on. I’ve tried several times to clean it up, but it is dealing with messes, so it is messy. Mess is just part of its job. I’ve seen BMW repair shops where you could sit on the floor and eat a picnic without worrying about your clothes or food safety. But I’ve never seen a garbage truck that didn’t smell.

Faraday is a library by Mislav, the author of mislav-will_paginate, a gem I (and many others) use for pagination in Ruby on Rails. It is a very nice package. Faraday is middleware for making HTTP requests, the flip side of Rack, which is middleware for handling HTTP requests. This is an intriguing idea. I don’t think I can replace all my ugly RSS fetch code, but it looks like I can replace a lot of it and break the remainder up into smaller, more understandable bits.
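To make the middleware idea concrete, here is a minimal sketch of a request-side middleware stack in plain Ruby. It deliberately does not use Faraday’s actual API. The class names (FakeFetcher, RetryMiddleware, EncodingMiddleware), the env hash keys, and the example URL are all hypothetical, and the innermost fetcher only simulates timeouts instead of touching the network. The point is only that each messy concern can live in its own small wrapper.

```ruby
# Hypothetical sketch of a client-side middleware stack; NOT Faraday's API.
# Each layer wraps the next one and exposes call(env), in the style of Rack.

# Innermost layer: pretends to fetch a feed. A real one would do the HTTP call.
class FakeFetcher
  def initialize(failures)
    @failures = failures # number of simulated timeouts before succeeding
  end

  def call(env)
    if @failures > 0
      @failures -= 1
      raise IOError, "simulated timeout for #{env[:url]}"
    end
    # Body is raw bytes in ISO-8859-1 (0xE9 is "e" with an acute accent).
    env.merge(status: 200, body: "<rss>caf\xE9</rss>".b)
  end
end

# Retries the wrapped layer a limited number of times on IOError.
class RetryMiddleware
  def initialize(app, attempts: 3)
    @app = app
    @attempts = attempts
  end

  def call(env)
    tries = 0
    begin
      tries += 1
      @app.call(env)
    rescue IOError
      retry if tries < @attempts
      raise
    end
  end
end

# Treats the body as the assumed encoding and transcodes it to UTF-8.
class EncodingMiddleware
  def initialize(app, assumed: Encoding::ISO_8859_1)
    @app = app
    @assumed = assumed
  end

  def call(env)
    result = @app.call(env)
    body = result[:body].dup.force_encoding(@assumed).encode(Encoding::UTF_8)
    result.merge(body: body)
  end
end

# Outermost layer first: encoding fix wraps retry, which wraps the fetcher.
stack = EncodingMiddleware.new(RetryMiddleware.new(FakeFetcher.new(2)))
response = stack.call(url: "http://example.com/feed.xml")
```

Here the fetcher fails twice and succeeds on the third attempt, so the caller just sees a UTF-8 body. Retrying and encoding repair can each be tested alone, which is the appeal for the garbage-truck code.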

Even Faster XML Parsing

Saturday, January 28th, 2012

I wrote Fast XML Parsing in Ruby over last summer. It has a number of optimizations in it, including combining a bunch of string compares into one regular expression (regex) compare. It has bothered me that it still does a series of string and regex compares, one after another, until one matches. They could be combined into one (unreadable) regex, if there were a simple (and fast) way to determine which one matched. Right now each regex or set of strings has a different action. A single regex could cover all the valid matches, but how to determine which action?
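One possible answer, sketched below, is to give each alternative in the combined regex its own named group and dispatch on whichever group matched. This is not the code from the article. The TOKEN pattern, the group names, the ACTIONS table, and the tokenize helper are all made up for illustration, and the pattern is far too simple for real RSS. It also says nothing about speed, since finding the matching group is itself a small linear scan.

```ruby
# Hypothetical sketch: one combined regex, one named group per alternative.
TOKEN = %r{
  (?<close_tag></[\w:.-]+>)
| (?<open_tag><[\w:.-]+[^>]*>)
| (?<text>[^<]+)
}x

# One action per group name; each receives the matched text.
ACTIONS = {
  "open_tag"  => ->(s) { [:open, s[/[\w:.-]+/]] },
  "close_tag" => ->(s) { [:close, s[/[\w:.-]+/]] },
  "text"      => ->(s) { [:text, s.strip] }
}.freeze

def tokenize(xml)
  events = []
  xml.scan(TOKEN) do
    match = Regexp.last_match
    # Exactly one named group is non-nil: the alternative that matched.
    name = match.names.find { |n| match[n] }
    events << ACTIONS.fetch(name).call(match[name])
  end
  events
end

events = tokenize("<title>Hello</title>")
```

For the input above, events holds an open event for title, a text event for Hello, and a close event for title.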

I mentioned creating a Domain Specific Language (DSL) for this situation. But there is already something like this, YACC (Yet Another Compiler Compiler). It has been around for decades in the Unix world. YACC handles LALR(1) languages. Regular languages, the ones regular expressions describe, are a subset of LALR(1) languages.

Bison is an open source version of YACC with some additional features. Rbison claims to merge Ruby and Bison to produce a Ruby callable YACC parser (in C) with actions written in Ruby. A great solution, except it has been abandoned by its creator and his repository taken down. Rbison 0.0.7 is in a number of FreeBSD repositories. I looked at it and it shows some work was done, but it is a long way from being usable. The goal may be too ambitious or even impossible.

Racc is YACC written entirely in Ruby. It is usable and I am working on using it to speed up the RSS and OPML parsers described in the article. The OPML parser is working – it passes the test suite and doesn’t blow up running the examples. I haven’t pushed it to github yet. I haven’t run any benchmarks yet, but my current thinking is that it will not be faster. Racc is pure Ruby, while the Ruby regex library routines are in C. I expect the regex library can match a series of regexes faster than a single Racc parser can do the same job.

I expect to have the RSS parser converted to use Racc in the next few days. I’ll post the benchmark results when complete, whichever way they turn out.

State of Blog Search

Tuesday, December 6th, 2011

If you read this blog, you are probably interested in blogs.  Doc Searls has an insightful post about The Near-Death of Blog Search.  The key takeaway for me was that archiving content is great, but if it doesn’t show up in search engines, its value declines sharply.

Fast XML Parsing in Ruby

Tuesday, November 15th, 2011

Dr. Dobb’s Journal just published my article on Fast XML Parsing in Ruby.  It is based on techniques I developed to speed up refreshing RSS feeds for Amethyst.  In the years since I first wrote the code and benchmarked the competition, some of the competitors have improved their speed, but my code is still the fastest (though more specialized).

Google Reader

Thursday, November 3rd, 2011

Google Reader is Amethyst’s most visible competition.  A new release has been rolled out recently that is getting a lot of mostly negative notice.  Chris Wetherell was the original developer of Google Reader.  He has posted some of his thoughts about the direction Google Reader is going on Google Plus here.

I think his statement, “Reader is (was?) for information junkies; not just tech nerds. This market totally exists and is weirdly under-served (and is possibly affluent).”, is correct, but I haven’t found how to reach those people.  In fact, I am seriously considering shutting down Amethyst on the cloud and keeping it as my secret weapon.  There are still some things for me to learn running in the cloud, so for the moment it is going to stay up.

Your Own “River of News”

Wednesday, October 19th, 2011

Dave Winer, one of the originators of RSS, blogged about reaching River2 1.0, his “River of News” aggregator. What is it good for? I’ll let him explain:

Who would find this interesting: news organizations and journalism schools. Operating a river is a way to automate news gathering in your sphere of interest, your community. And for J-schools, it’s a way to give your students a head start on the news system of the future, which will surely operate in this fashion. Imho of course.

I certainly agree a “River of News” on topics of your choice is a valuable resource. Not quite sure why it is restricted to news organizations and journalism schools. I expect installation takes some technical skill (pre-built packages are available for Windows and MacOS), but it isn’t rocket science or brain surgery.

“River of News” is just one of the ways of viewing your feeds in Amethyst (the others are feed-oriented rather than post-oriented, and also valuable).

Conditional GET Requests

Tuesday, September 13th, 2011

Amethyst fetches updates to all subscribed RSS feeds every hour.  And most of the time, there is nothing new.  This is a waste of bandwidth and CPU, for both Amethyst and the RSS feed server.  I briefly looked into Conditional GET support (only get the data if it has changed).  I didn’t dig very deep and everything I found was about supporting responses to Conditional GETs in Rails, not making requests.  Finally it annoyed me enough to dig a little deeper.

The most useful resource I found was a link to RESTful Web Services in Google Books.  It isn’t specific to Rails, but the examples are in Ruby.  I’m using EventMachine HTTP Request to read RSS feeds and it has last_modified and etag accessors. Just store them in your response handling code, then add the stored values to the headers of the next request:

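	# Send back the validators saved from the previous fetch, if any
	request = HttpRequest.new(channel)
	headers = {}
	headers['If-Modified-Since'] = channel.last_modified if channel.last_modified
	headers['If-None-Match'] = channel.etag if channel.etag
	# Request headers are passed through the :head option
	http = request.get :head => headers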

and handle a 304 (Not Modified) response code the same as a successful response with no new posts.
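For readers not using EventMachine, here is a rough sketch of the same round trip with Ruby’s standard Net::HTTP request class. It builds the request but does not send it, so it runs without a network. The Channel struct, the two helper methods, and the example URL are hypothetical and are not Amethyst’s actual code; the headers hash is assumed to use lower-case keys.

```ruby
require "net/http"
require "uri"

# Hypothetical record of what was stored from the previous fetch.
Channel = Struct.new(:url, :last_modified, :etag)

# Build a GET request carrying whatever validators we have stored.
def build_conditional_request(channel)
  request = Net::HTTP::Get.new(URI(channel.url))
  request["If-Modified-Since"] = channel.last_modified if channel.last_modified
  request["If-None-Match"] = channel.etag if channel.etag
  request
end

# Returns nil when nothing changed, otherwise stores the new
# validators for next time and returns the body to be parsed.
def handle_response(channel, status, headers, body)
  return nil if status == 304 # Not Modified: same as "no new posts"
  channel.last_modified = headers["last-modified"]
  channel.etag = headers["etag"]
  body
end

channel = Channel.new("http://example.com/feed.xml", nil, nil)

# First fetch: no validators yet, so the request is unconditional.
first = build_conditional_request(channel)
handle_response(channel, 200,
                { "last-modified" => "Tue, 13 Sep 2011 10:00:00 GMT",
                  "etag" => '"abc123"' },
                "<rss/>")

# Next fetch: the stored validators are echoed back to the server.
second = build_conditional_request(channel)
```

The server compares these validators with its current copy and answers 304 with an empty body if nothing has changed, which is where the savings come from.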

This cut CPU usage by 80%, i.e. CPU load is now 20% of what it was. This may have been premature optimization/scaling, but it was very satisfying and it will reduce my Amazon Web Services bill slightly. The developer is one of the stakeholders you need to keep happy too.

Competing with Zombies

Tuesday, May 24th, 2011

It would be nice if Amethyst showed up in “Web RSS reader” searches in the first page or two.  It doesn’t.  I checked and gave up after 26 pages.  So what is there?  Some blog posts from the height of RSS media attention, 2007-2008, that are still relevant.  But many of the search results are zombies: companies that are no longer in business or no longer in the Web RSS reader business, e.g., PostRank.  Google gives brownie points for longevity.  They may want to rethink that.

Facebook Backtracks on RSS Feeds

Sunday, May 22nd, 2011

The (admittedly quiet) uproar about Twitter and Facebook moving away from supporting RSS feeds (e.g., “Twitter and Facebook Both Quietly Kill RSS, Completely”) has had an effect. Facebook has put back links to the RSS feeds for public pages; see “Facebook Listens. RSS Added Back to Pages. Will Twitter be next?” Thank you, Facebook.

In comments to the earlier post, a Facebook engineer noted that developers prefer the JSON APIs.  True, but non-developers and non-Facebook-specific apps are shut out by a proprietary API.  Both Facebook and Twitter are big enough that they might think that of course everyone will support their API.

Amethyst may support the Twitter API in the future, but we can support Twitter’s RSS feeds immediately.  RSS’s expected 1 hour refresh is not the best match for Twitter’s near real-time nature, but it is an easy first step.  I think of a ladder with doable distances between rungs and an easy first step.  Too many people think when they add another rung (“raising the bar”) they have to cut off the bottom rungs to do it.  Supporting RSS feeds is not terribly resource intensive.  I see little to gain by dropping legacy support and something to lose by excluding potential clients.

With RSS feeds, Amethyst can support Twitter immediately.  And we have a lot of work invested in reliable RSS in the face of timeouts, redirections, not quite standard RSS/XML, etc.  Doing the same for just one non-RSS source is not inviting without some proven interest. The next rung is relatively easy: formatting the Twitter RSS feeds more like the typical Twitter client (i.e. no redundant title and description).  Straightforward, though not as easy as it looks; the changes reach halfway through the Rails stack, from HTML templates almost all the way down to the database.  Moving to support of Twitter’s real-time nature and the Twitter native API requires changes all the way up and down the Rails stack.  It is made more inviting because several envisioned changes require the same deep changes.

Amethyst is over 5 years old.  Many reasonable current uses just didn’t exist back then, e.g. supporting smart phones and other mobile devices.