Archive for the ‘RoR’ Category

Faraday?

Monday, February 6th, 2012

The most troublesome, ugly, bug-infested code in Amethyst fetches RSS feeds in the face of Internet congestion, down servers, buggy RSS implementations, incorrect encodings, and so on. I’ve tried several times to clean it up, but it is dealing with messes, so it is messy. Mess is just part of its job. I’ve seen BMW repair shops where you could sit on the floor and eat a picnic without worrying about your clothes or food safety. But I’ve never seen a garbage truck that didn’t smell.

Faraday is a library by Mislav, the author of mislav-will_paginate, a gem I (and many others) use for pagination in Ruby on Rails. It is a very nice package. Faraday is middleware for making HTTP requests, the flip side of Rack, which is middleware for handling HTTP requests. This is an intriguing idea. I don’t think I can replace all my ugly RSS fetch code, but it looks like I can replace a lot of it and break the remainder up into smaller, more understandable bits.

ActiveRecord callback methods deprecated in 2.3.8

Saturday, January 21st, 2012

According to this, instance callback methods were deprecated starting in Rails 2.3.8. An instance callback method looks like this:

class Item < ActiveRecord::Base
  def before_create
    # whatever
  end
end

By ActiveRecord 3.1.3 (used in Padrino 0.10.5), instance callback method support is gone. The supported way to do the above is:

class Item < ActiveRecord::Base
  before_create :set_defaults

  def set_defaults
    # whatever
  end
end

ActiveRecord (AR) had at least two incompatible ways to specify callbacks: instance methods (1st example) and callback queues (2nd example). If you have inherited AR classes (e.g. AdminUser is a child class of User, which is an AR model), the parent class's instance method is overridden and never called from child class instances unless the child explicitly calls super. With callback queues, all callbacks are called. Current practice is described in ActiveRecord::Callbacks.
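The inheritance difference can be shown without Rails. The following is a minimal plain-Ruby sketch, not ActiveRecord's actual implementation: CallbackBase is a hypothetical stand-in for ActiveRecord::Base, and the log method exists only to record which callbacks ran.

```ruby
# Hypothetical stand-in for ActiveRecord::Base, just enough to compare
# the two callback styles. This is NOT how ActiveRecord is implemented.
class CallbackBase
  # Queue style: register a method name to run before create.
  def self.before_create(method_name)
    (@before_create_callbacks ||= []) << method_name
  end

  # A class sees its parent's queued callbacks followed by its own.
  def self.before_create_callbacks
    parent = superclass.respond_to?(:before_create_callbacks) ? superclass.before_create_callbacks : []
    parent + (@before_create_callbacks ||= [])
  end

  # Records the names of the callbacks that ran (illustration only).
  def log
    @log ||= []
  end

  # Instance-method style hook; subclasses override it.
  def before_create; end

  def create
    before_create                                                 # instance-method style
    self.class.before_create_callbacks.each { |name| send(name) } # queue style
    self
  end
end

class User < CallbackBase
  before_create :set_user_defaults

  def before_create
    log << 'User#before_create'
  end

  def set_user_defaults
    log << 'User#set_user_defaults'
  end
end

class AdminUser < User
  before_create :set_admin_defaults

  # Overrides User#before_create; without super the parent's version never runs.
  def before_create
    log << 'AdminUser#before_create'
  end

  def set_admin_defaults
    log << 'AdminUser#set_admin_defaults'
  end
end

admin_log = AdminUser.new.create.log
```

In this sketch admin_log contains both queued callbacks (the parent's and the child's) but only the child's before_create, which is the behavior described above.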

In the Rails version of AmethystRSS.net, I'm using Rails 2.3.14 with instance callback methods, but not seeing any deprecation warnings. Not sure why.

:update Considered Harmful

Wednesday, October 26th, 2011

A common Rails 2 view helper call is something along the lines of link_to_remote "Delete this post", :update => dom_id(post), :url => { :action => "destroy", :id => post.id }. This sends an AJAX request to the server and replaces the contents of the DOM object whose ID equals the return value of dom_id(post) (e.g., “post_1234”) with the response. That is fine unless there is more than one post on the page and the user can click the links faster than the server can respond. In Firefox, as far as I can tell (AFAICT), both requests will replace both posts with the first response from the server. The second response is dropped on the floor. Not good.

This situation is quite possible on Amethyst, e.g. “like” a bunch of posts as fast as you can. Oddly, it is easier to do on my development laptop (no network latency) than on the production server on Amazon Web Services (AWS). I am converting all :updates to treat the response as JavaScript that explicitly names the post to replace and the content to replace it with.

Clever Hack, Smelly Code

Wednesday, October 19th, 2011

A common operation in Amethyst is to retrieve the feed information from the database, along with the number of unread articles in the feed. Operating in pure SQL I could do something like this: SELECT feeds.*, COUNT(articles.id) AS count_articles FROM feeds JOIN articles ON feeds.id = articles.feed_id WHERE feeds.id = 1234 (note: not tested). It is possible in Rails 2 with ActiveRecord to do something similar with feed = Feed.find(1234, :select => 'feeds.*, COUNT(articles.id) AS count_articles', :joins => :articles). And it will work. The article count will be in feed.attributes['count_articles']. But if the find() becomes more complex, it may not. At some point, ActiveRecord will generate its own select and the COUNT(articles.id) will fall on the floor.

Clever, but brittle as all get out. A slight change to my calling code or any update to Rails could break it, a definite code smell. So I’ve ripped out all the places where I did this. Yes, it takes an additional trip to the database, but all or almost all of the necessary data is already sitting in the DB’s cache, so it’s in the low single-digit millisecond range. Not worth brittle code.

Conditional GET Requests

Tuesday, September 13th, 2011

Amethyst fetches updates to all subscribed RSS feeds every hour. And most of the time, there is nothing new. This is a waste of bandwidth and CPU, both Amethyst’s and the RSS feed server’s. I briefly looked into Conditional GET support (only get the data if it has changed). I didn’t dig very deep, and everything I found was about supporting responses to Conditional GETs in Rails, not making requests. Finally it annoyed me enough to dig a little deeper.

The most useful resource I found was a link to RESTful Web Services in Google Books. It isn’t specific to Rails, but the examples are in Ruby. I’m using EventMachine HTTP Request to read RSS feeds and it has last_modified and etag accessors. Just store them in your response handling code, then add the stored values to the headers of the next request:

	# channel holds the validators saved from the previous response
	request = HttpRequest.new(channel)
	headers = {}
	headers['If-Modified-Since'] = channel.last_modified if channel.last_modified
	headers['If-None-Match'] = channel.etag if channel.etag
	# :head passes the request headers
	http = request.get :head => headers

and handle a 304 (Not Modified) response code the same as a successful response with no new posts.
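The response side can be sketched in plain Ruby as follows. This is a minimal illustration, not Amethyst's code: Channel, conditional_headers, handle_response, and parse_posts are hypothetical names, and the response is modeled as a bare status code, a header hash, and a body string rather than an em-http response object.

```ruby
# Hypothetical stand-in for the stored feed record.
Channel = Struct.new(:url, :last_modified, :etag)

# Build conditional request headers from the validators saved on the
# previous successful fetch. Missing validators are simply left out.
def conditional_headers(channel)
  headers = {}
  headers['If-Modified-Since'] = channel.last_modified if channel.last_modified
  headers['If-None-Match']     = channel.etag          if channel.etag
  headers
end

# Placeholder parser (assumption): one post per non-empty line.
def parse_posts(body)
  body.to_s.split("\n").reject(&:empty?)
end

# Returns the list of new posts for this fetch.
def handle_response(channel, status, response_headers, body)
  case status
  when 304
    []  # Not Modified: same as a successful fetch with no new posts
  when 200
    # Remember the validators for the next conditional request.
    channel.last_modified = response_headers['Last-Modified']
    channel.etag          = response_headers['ETag']
    parse_posts(body)
  else
    raise "unexpected status #{status} for #{channel.url}"
  end
end
```

On the first fetch the channel has no validators, so conditional_headers returns an empty hash and the request is an ordinary GET.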

This cut CPU usage by 80%, i.e. CPU load is 20% of what it was. This may have been premature optimization/scaling, but it was very satisfying and it will reduce my Amazon Web Services bill slightly. The developers are stakeholders you need to keep happy too.

Demoing Amethyst tonight (May 9, 2011)

Monday, May 9th, 2011

I will be one of four people demoing their products at Bootstrap Austin tonight at 7-9pm.  It will be held at Link Coworking on Anderson near the Alamo Drafthouse, 2700 West Anderson Lane #205.  Leave yourself some additional time, I’m told it’s buried in the interior and not easy to find.

Reading RSS Feeds w/ EventMachine

Thursday, June 3rd, 2010

Switching from reading multiple RSS feeds sequentially with open-uri to effectively reading in parallel with EventMachine and em-http has cut the maximum elapsed (clock) time from 300 seconds to under 50 seconds and the average from 17 seconds to 5 seconds.  With the change I can double the frequency of feed refreshes without stepping on the next refresh.  This helps even out the load.

EventMachine and Ruby

Monday, May 17th, 2010

EventMachine (a Ruby implementation of the Reactor pattern) has helped solve the problem of timeouts cascading while serially downloading multiple RSS feed refreshes. When there are too many timeouts, the refresh does not complete before the next scheduled refresh. With EventMachine, the refreshes are effectively downloaded in parallel (asynchronous or non-blocking I/O) without the overhead of spawning multiple processes or threads. And Ruby makes it easy with in-line callbacks (blocks), too easy.

In C and most procedural languages, callbacks are named functions, physically separate from where they are attached to a context to be executed later. In C, I find it easier to remember that they are executed after, possibly long after, they are passed to the reactor. The ease of use in Ruby is deceptive. When an RSS feed refresh times out or is temporarily redirected (status 302), the context that set up the reads and callbacks is long gone, yet it is just a few lines above the callback in the code. I’ve resorted to comments in critical places to remind me that appearances are deceptive, so think before you commit.
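A tiny plain-Ruby illustration of the trap, with no EventMachine involved: the pending array is a hypothetical stand-in for the reactor's queue, and the block runs only after the code that registered it has moved on.

```ruby
pending = []          # stand-in for the reactor's list of callbacks
status  = :setting_up

# The block is written in-line, right next to the setup code...
pending << lambda { "callback ran while status was #{status}" }

# ...but the enclosing code keeps going before the block ever runs.
status = :setup_finished

# Later, the "reactor" fires the callbacks.
results = pending.map(&:call)
```

The block sees status as it is when the block runs, not as it was on the line above where the block was written.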

There may be a clean way to immediately restart the RSS feed read of the temporary URL without duplicating code, but I haven’t found it.  Instead, the current code builds a list of temporary redirects as the reads complete.  After all first round RSS feed reads complete, the list of temporary redirects is read using the same code that did the initial refresh reads.  Takes longer than immediately starting an asynchronous read, but I can understand the code.
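The two-round approach can be sketched like this. It is a synchronous plain-Ruby stand-in, not the EventMachine code: fetch_round and refresh_all are hypothetical names, and fetcher is an assumed callable that returns a [status, payload] pair, where the payload is the body for a 200 and the temporary URL for a 302.

```ruby
# targets maps each feed's own URL to the URL to actually fetch.
# Returns [bodies, redirects], both keyed by the feed's own URL.
def fetch_round(targets, fetcher)
  bodies    = {}
  redirects = {}
  targets.each do |feed_url, fetch_url|
    status, payload = fetcher.call(fetch_url)
    case status
    when 200 then bodies[feed_url] = payload
    when 302 then redirects[feed_url] = payload  # remember the temporary URL for round two
    end
  end
  [bodies, redirects]
end

def refresh_all(feed_urls, fetcher)
  first_round = feed_urls.map { |url| [url, url] }.to_h
  bodies, redirects = fetch_round(first_round, fetcher)
  # Second round reuses the same code on the collected temporary URLs.
  # Anything still redirecting after that is dropped in this sketch.
  redirected_bodies, _still_redirected = fetch_round(redirects, fetcher)
  bodies.merge(redirected_bodies)
end
```

Keying both rounds by the feed's own URL keeps the temporary URL from leaking into the result, so the caller never needs to know a redirect happened.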

Brian Kernighan said that debugging is twice as hard as writing the code in the first place. If so, then our cleverest code is beyond our ability to debug.

Amethyst Refresh Slowdowns, Part II

Monday, May 17th, 2010

There are additional scenarios where the RSS feed refresh falls behind.

Deleting words and word-pairs from the database that haven’t been used in months is very uneven. Sometimes nothing, sometimes several minutes of database thrashing. Limiting how many are deleted at a whack and doing the deletes more often evens out the load. This has ceased to be a problem.

Rebuilding the full text search index is normally done once a day.  It was scheduled to happen right after the database backup.  Moving it to start half an hour before the backup and immediately after a refresh completes gives it enough time to finish before the next refresh.

If a refresh starts a little late, that isn’t a problem unless it runs into the refresh after that.  So far, that hasn’t happened.  Eventually it will.  Hopefully I will have a better solution by then.

Database backups still frequently impact refreshes. The impact is small, just a few feeds, a few minutes late, after the bars close and before most of America is awake. I am looking at alternative backup methods.

It has taken some digging to get past the usual suspects and fix or at least identify the real culprits, at least 3 out of the 5 Whys. Spreading out the daily tasks isolates them and has helped identify their impact. Only backups remain as a yet-to-be-solved problem.

Amethyst Refresh Slowdowns

Thursday, May 6th, 2010

I think I have a better handle on the occasional times when Amethyst gets behind updating RSS feeds. There are two interrelated issues. The feed update runs at nice 2 (lower priority) so it doesn’t impact Web server responsiveness, and if it gets behind, it skips refreshes. The latter is so it doesn’t go crazy on the development laptop after resuming from hibernate. The feed updates bog down during database backups (expected) and during the downloads of the backups to off-site storage (unexpected).

Running the feed updates at normal priority helps, but does not eliminate the problem. I’ve just rolled out some changes so the updates aren’t skipped unless they are more than 15 minutes behind. Now that I think about it, this interval can probably be upped to the feed refresh period (1 hour). The effect should be approximately the same.
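The skip rule amounts to a single comparison. A minimal sketch, assuming each refresh knows when it was scheduled to start; run_refresh? and MAX_LAG_SECONDS are hypothetical names, not Amethyst's.

```ruby
MAX_LAG_SECONDS = 15 * 60  # skip only when more than 15 minutes behind

# True if a refresh scheduled for scheduled_at should still run at now.
def run_refresh?(scheduled_at, now = Time.now, max_lag = MAX_LAG_SECONDS)
  (now - scheduled_at) <= max_lag
end
```

Raising the limit to the refresh period would just mean passing 60 * 60 as max_lag. A laptop waking from hibernate is hours behind, so it still skips the stale refreshes either way.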