Archive for the ‘RSS’ Category

linkrdr oops

Thursday, March 1st, 2012

Yesterday when I went to log into linkrdr, my browser auto-filled my Twitter handle and a password. But the form asked for an e-mail address, not a username. I tried anyway. No go; I was back at the same form. I tried authenticating with Twitter. That worked and redirected me to a form asking for my e-mail address, with a ‘Sign up’ submit button rather than ‘Login’. Submitting it gave an error message saying the username and e-mail address already existed. So I took the hint, went back to the login page, and replaced my username (Twitter handle) with my e-mail address. That worked.

linkrdr a week on

Saturday, February 25th, 2012

linkrdr is a Web app with goals similar to Amethyst’s and a very different implementation. I have been using it off and on for a week, mostly shadowing what I’ve already read in Amethyst. The linkrdr team is making rapid improvements. Between their improvements and my reading, it’s starting to reflect my interests. I’m still thrashing about a bit, not understanding what some of the links do. I’m not sure what the icon at the far right does, or how it differs from opening the item (by clicking the greater-than sign at the far left, or the title) and clicking the links there.

I’m not ready to abandon Amethyst yet. Nor linkrdr. If you are interested in a smart RSS reader, give linkrdr a try.

linkrdr

Tuesday, February 21st, 2012

Linkrdr is a new Web RSS reader with claims similar to Amethyst’s and a very different approach. They went public yesterday and are making changes rapidly. After I had added a dozen feeds that I read often, they announced OPML import support. Before I started reading anything, the ranking was basically: the more links to a topic in my feeds and in the posts they point to, the higher its “relevance”. Reading several dozen posts did not seem to materially alter the order. This morning they announced a new ranking algorithm, and the ordering of posts is starting to reflect my interests.

I still prefer Amethyst. Linkrdr shows more promise than any other RSS reader I’ve tried.

Faraday?

Monday, February 6th, 2012

The most troublesome, ugly, bug-infested code in Amethyst fetches RSS feeds in the face of Internet congestion, down servers, buggy RSS implementations, incorrect encodings, and so on. I’ve tried several times to clean it up, but it deals with messes, so it is messy. Mess is just part of its job. I’ve seen BMW repair shops where you could sit on the floor and eat a picnic without worrying about your clothes or the food’s safety. But I’ve never seen a garbage truck that didn’t smell.

Faraday is a library by Mislav, the author of mislav-will_paginate, a gem I (and many others) use for pagination in Ruby on Rails. It is a very nice package. Faraday is middleware for making HTTP requests, the flip side of Rack, which is middleware for handling HTTP requests. This is an intriguing idea. I don’t think I can replace all my ugly RSS fetch code, but it looks like I can replace a lot of it and break the remainder into smaller, more understandable pieces.
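To make the “flip side of Rack” idea concrete, here is a toy middleware stack for outgoing requests, using only plain Ruby. It is a minimal sketch, not Faraday’s API: every class name, the env hash layout, and the stub adapter (which replays canned responses so nothing touches the network) are hypothetical. Each layer wraps the next and can change the request on the way in or the response on the way out, which is how messy fetch code could be split into small, separately testable pieces.

```ruby
# A toy Rack-style middleware stack for *outgoing* requests.
# Hypothetical names throughout; this is not Faraday's API.

# Innermost app. A real adapter would perform the HTTP request; this stub
# replays canned [status, body] pairs per URL so the sketch runs offline.
class StubAdapter
  def initialize(responses)
    @responses = responses
  end

  def call(env)
    status, body = @responses.fetch(env[:url]).shift
    env.merge(status: status, body: body)
  end
end

# Adds default headers on the way in; per-request headers win.
class DefaultHeaders
  def initialize(app, headers)
    @app = app
    @headers = headers
  end

  def call(env)
    @app.call(env.merge(headers: @headers.merge(env.fetch(:headers, {}))))
  end
end

# Retries 5xx responses, making at most `tries` attempts in total.
class RetryOnServerError
  def initialize(app, tries)
    @app = app
    @tries = tries
  end

  def call(env)
    attempts = 0
    loop do
      attempts += 1
      response = @app.call(env)
      return response if response[:status] < 500 || attempts >= @tries
    end
  end
end

# The first middleware listed ends up outermost, as in a Rack builder.
def build_stack(adapter, *middlewares)
  middlewares.reverse.reduce(adapter) { |app, (klass, *args)| klass.new(app, *args) }
end

adapter = StubAdapter.new('http://example.com/feed' => [[503, ''], [200, '<rss/>']])
stack = build_stack(adapter,
                    [DefaultHeaders, { 'User-Agent' => 'Amethyst' }],
                    [RetryOnServerError, 3])
response = stack.call({ url: 'http://example.com/feed' })
puts response[:status] # 200, after one retry
```

Swapping the stub for a real HTTP adapter leaves the middleware untouched, which is the main attraction of the design.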

Even Faster XML Parsing

Saturday, January 28th, 2012

I wrote Fast XML Parsing in Ruby last summer. It has a number of optimizations in it, including combining a bunch of string compares into one regular expression (regex) compare. It has bothered me that it still does a series of string and regex compares, one after another, until one matches. They could be combined into one (unreadable) regex, if there were a simple (and fast) way to determine which alternative matched. Right now each regex or set of strings has a different action. A single regex could cover all the valid matches, but how do you determine which action to take?
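One way to find out which alternative matched is to wrap each alternative in its own named group: after a match, exactly one group is non-nil, and its name selects the action. The sketch below assumes hypothetical token names and patterns; it is not the code from the article, and whether it beats a series of separate compares is something only a benchmark can settle.

```ruby
# Hypothetical token patterns. Order matters: alternation tries them
# left to right, so specific tags must precede the generic catch-all.
PATTERNS = [
  ['item_open',  '<item>'],
  ['item_close', '</item>'],
  ['title',      '<title>[^<]*</title>'],
  ['other_tag',  '<[^>]*>'],
  ['text',       '[^<]+']
].freeze

# Builds (?<item_open><item>)|(?<item_close></item>)|...
COMBINED = Regexp.new(PATTERNS.map { |name, pat| "(?<#{name}>#{pat})" }.join('|'))

def item_events(xml)
  events = []
  xml.scan(COMBINED) do
    m = Regexp.last_match
    # Only the group of the alternative that matched is non-nil.
    name = m.names.find { |n| m[n] }
    case name
    when 'item_open'  then events << [:start_item]
    when 'item_close' then events << [:end_item]
    when 'title'      then events << [:title, m[name][7...-8]] # strip <title> and </title>
    end
    # other_tag and text matches are ignored in this sketch.
  end
  events
end

p item_events('<rss><item><title>Hello</title><link>x</link></item></rss>')
```

The regex stays unreadable, but the action table does not: adding a token means adding one name and one pattern.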

I mentioned creating a Domain Specific Language (DSL) for this situation. But there is already something like this: YACC (Yet Another Compiler Compiler). It has been around for decades in the Unix world. YACC handles LALR(1) languages. Regular expressions describe regular languages, which are a subset of the LALR(1) languages.

Bison is an open source version of YACC with some additional features. Rbison claims to merge Ruby and Bison to produce a Ruby-callable YACC parser (in C) with actions written in Ruby. A great solution, except it has been abandoned by its creator and his repository taken down. Rbison 0.0.7 is in a number of FreeBSD repositories. I looked at it, and it shows some work was done, but it is a long way from being usable. The goal may be too ambitious or even impossible.

Racc is YACC written entirely in Ruby. It is usable, and I am working on using it to speed up the RSS and OPML parsers described in the article. The OPML parser is working: it passes the test suite and doesn’t blow up running the examples. I haven’t pushed it to GitHub yet. I haven’t run any benchmarks yet, but my current thinking is that it will not be faster. It is pure Ruby, while the Ruby regex library routines are in C. I expect a series of regex matches in C to run faster than a single Racc parser.

I expect to have the RSS parser converted to use Racc in the next few days. I’ll post the benchmark results when it’s complete, whichever way they turn out.
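A harness for that kind of comparison might first check that the parsers agree and only then time them with Ruby’s standard Benchmark module, so a fast but wrong parser can’t win. This is a minimal sketch; the two title extractors are hypothetical stand-ins, not the Amethyst RSS parser or its Racc replacement.

```ruby
require 'benchmark'

# Run every parser once, insist they agree, then time each one.
# parsers: { name => callable taking the input string }
def compare_parsers(input, iterations, parsers)
  outputs = parsers.values.map { |parser| parser.call(input) }
  raise ArgumentError, 'parsers disagree' unless outputs.uniq.size == 1

  parsers.map do |name, parser|
    [name, Benchmark.realtime { iterations.times { parser.call(input) } }]
  end.to_h
end

# Hypothetical parser 1: one regex scan.
regex_titles = ->(xml) { xml.scan(%r{<title>([^<]*)</title>}).flatten }

# Hypothetical parser 2: hand-rolled String#index loop.
index_titles = lambda do |xml|
  titles = []
  pos = 0
  while (start = xml.index('<title>', pos))
    start += '<title>'.length
    stop = xml.index('</title>', start)
    break unless stop
    titles << xml[start...stop]
    pos = stop + '</title>'.length
  end
  titles
end

sample = '<rss>' + (1..200).map { |i| "<item><title>Post #{i}</title></item>" }.join + '</rss>'
timings = compare_parsers(sample, 1_000, 'regex' => regex_titles, 'index' => index_titles)
timings.each { |name, secs| printf("%-6s %.3fs\n", name, secs) }
```

Timings vary by machine and Ruby version, so it is worth running each comparison several times before drawing conclusions.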

State of Blog Search

Tuesday, December 6th, 2011

If you read this blog, you are probably interested in blogs.  Doc Searls has an insightful post about The Near-Death of Blog Search.  The key takeaway for me was that archiving content is great, but if it doesn’t show up in search engines, its value declines sharply.

Fast XML Parsing in Ruby

Tuesday, November 15th, 2011

Dr. Dobb’s Journal just published my article, Fast XML Parsing in Ruby. It is based on techniques I developed to speed up refreshing RSS feeds for Amethyst. In the years since I first wrote the code and benchmarked the competition, some of them have improved their speed, but my code is still the fastest (though more specialized).

Google Reader

Thursday, November 3rd, 2011

Google Reader is Amethyst’s most visible competition. A new release has been rolled out recently that is getting a lot of mostly negative notice. Chris Wetherell was the original developer of Google Reader. He has posted some of his thoughts on Google Plus about the direction Google Reader is going.

I think his statement, “Reader is (was?) for information junkies; not just tech nerds. This market totally exists and is weirdly under-served (and is possibly affluent).”, is correct, but I haven’t figured out how to reach those people. In fact, I am seriously considering shutting down Amethyst on the cloud and keeping it as my secret weapon. There are still some things for me to learn from running in the cloud, so for the moment it is going to stay up.

Your Own “River of News”

Wednesday, October 19th, 2011

Dave Winer, one of the originators of RSS, blogged about reaching River2 1.0, his “River of News” aggregator. What is it good for? I’ll let him explain:

Who would find this interesting: news organizations and journalism schools. Operating a river is a way to automate news gathering in your sphere of interest, your community. And for J-schools, it’s a way to give your students a head start on the news system of the future, which will surely operate in this fashion. Imho of course.

I certainly agree that a “River of News” on topics of your choice is a valuable resource. I’m not quite sure why he limits it to news organizations and journalism schools. I expect installation takes some technical skill (pre-built packages are available for Windows and Mac OS), but it isn’t rocket science or brain surgery.

“River of News” is just one of the ways of viewing your feeds in Amethyst (the others are feed-oriented rather than post-oriented, and also valuable).
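The difference between a post-oriented river and a feed-oriented view comes down to how the same posts are ordered and grouped. A minimal sketch with a hypothetical `Post` struct (not Amethyst’s models): the river merges every feed into one newest-first stream, while the feed view keeps posts under their feed.

```ruby
# Hypothetical post record; not Amethyst's data model.
Post = Struct.new(:feed, :title, :published_at)

# Post-oriented "river": every post from every feed, newest first.
def river(posts)
  posts.sort_by(&:published_at).reverse
end

# Feed-oriented view: posts grouped under their feed, each group newest first.
def by_feed(posts)
  posts.group_by(&:feed).transform_values { |feed_posts| river(feed_posts) }
end

t = Time.utc(2011, 10, 19)
posts = [
  Post.new('A', 'a1', t),
  Post.new('B', 'b1', t + 60),
  Post.new('A', 'a2', t + 120)
]
p river(posts).map(&:title)
```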

Conditional GET Requests

Tuesday, September 13th, 2011

Amethyst fetches updates to all subscribed RSS feeds every hour, and most of the time there is nothing new. This wastes bandwidth and CPU, both Amethyst’s and the RSS feed servers’. I briefly looked into Conditional GET support (only get the data if it has changed), but I didn’t dig very deep, and everything I found was about responding to Conditional GETs in Rails, not making them. Finally it annoyed me enough to dig a little deeper.

The most useful resource I found was RESTful Web Services, on Google Books. It isn’t specific to Rails, but the examples are in Ruby. I’m using EventMachine HTTP Request to read RSS feeds, and its response headers have last_modified and etag accessors. Just store them in your response-handling code, then add the stored values to the headers of the next request:

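	require 'em-http'  # em-http-request gem
	# Run inside EventMachine.run; channel.url is assumed to hold the feed URL.
	request = EventMachine::HttpRequest.new(channel.url)
	headers = {}
	headers['If-Modified-Since'] = channel.last_modified if channel.last_modified
	headers['If-None-Match'] = channel.etag if channel.etag
	http = request.get :head => headers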

and handle a 304 (Not Modified) response code the same as a successful response with no new posts.
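Outside EventMachine, the same idea works with Ruby’s standard Net::HTTP. This is a minimal sketch, not Amethyst’s code: the `Channel` struct and the helper names are hypothetical. It sends the validators saved from the last fetch, saves fresh ones from a 200, and treats a 304 as “nothing new”.

```ruby
require 'net/http'
require 'uri'

# Hypothetical stand-in for a subscribed feed record.
Channel = Struct.new(:url, :last_modified, :etag)

# Build a GET that carries the validators saved from the previous fetch.
def conditional_get_request(channel)
  headers = {}
  headers['If-Modified-Since'] = channel.last_modified if channel.last_modified
  headers['If-None-Match']     = channel.etag if channel.etag
  Net::HTTP::Get.new(URI(channel.url), headers)
end

# Returns nil for 304 (nothing new), or the body to parse for a 2xx.
def feed_body_to_parse(channel, response)
  case response
  when Net::HTTPNotModified
    nil
  when Net::HTTPSuccess
    # Remember the new validators for next time.
    channel.last_modified = response['Last-Modified']
    channel.etag          = response['ETag']
    response.body
  else
    raise "unexpected status #{response.code}"
  end
end

def refresh(channel)
  uri = URI(channel.url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(conditional_get_request(channel))
  end
  feed_body_to_parse(channel, response)
end
```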

This cut CPU usage by 80%; CPU load is now 20% of what it was. This may have been premature optimization/scaling, but it was very satisfying, and it will reduce my Amazon Web Services bill slightly. The developer(s) are among the stakeholders you need to keep happy too.