Archive for the ‘RoR’ Category

Amethyst’s Fate

Wednesday, June 20th, 2012

As mentioned in Beauty, a previous post, I’m shutting down Amethyst on the Internet. I’m my only current customer and it’s taking time and money to keep it running that can be better spent elsewhere. It may not be a viable commercial product, but it continues to be very useful to me. It is the major way I read news. If I can’t drop an RSS feed for some news/information source into Amethyst, I’m much less likely to read it regularly.

Development will be ongoing.  With just one user profile, direction is clearer and fewer tradeoffs need be juggled.  Rails 3 has outgrown my needs but Rails 2 is not getting any updates, bug fixes, or security fixes.  The first two I can deal with myself, but generally not the last.  On the Internet, that’s a problem.  On my own laptop, it’s a problem I can live with.

Padrino is a lighter weight Web framework built on top of Sinatra.  It’s somewhere around the functionality of Rails 1 when I started.  And I know Web frameworks better now so I can fill in most of the holes.  Some of the speed annoyances of Rails 2 are gone on Padrino.  Or at least on the partial conversion to Padrino already done.

I’ve considered open sourcing Amethyst; however, as noted in Beauty, much of the code is not something I’m proud of.  Unless there is a request, I won’t publish it.  I am working on making the Padrino port code I can be proud of and am willing to show in public.  Currently it is running in split mode: the backend that reads the subscribed RSS feeds is the old Rails 2 code, and the front end is partially ported to Padrino.  I’ll push it to GitHub when a minimal front end and complete backend are working in Padrino.

I’ll continue blogging here about the Padrino port.

Software Speed, Programming Languages, and Hardware

Saturday, March 17th, 2012

Three articles I’ve read recently have altered my thinking about data structures, algorithms, and performance. They confirm my belief that the more you and your code know about the structure of your data, the better. My misconception (or perhaps outdated conception) was that once in memory, access times are nearly the same regardless of location. Actually, current main memory (not cache) architecture behaves much like a disk, just orders of magnitude faster. Sequential access is much faster than random access. Google “computer memory row select column select” for details.

The first article, “Software Development for Infrastructure” in the January 2012 issue of Computer, is by Bjarne Stroustrup, the inventor of C++. He talks about a number of software issues, but the most surprising to me was his results from a program that adds N random integers to a sorted list and then deletes random members of the list. He compared a vector (or array) implementation to a doubly linked list. I would expect the array version to be faster for small N. He tested up to N = 500,000 and the vector implementation was faster, with the difference increasing with size. He used a linear search with both implementations, though a binary search on the array would be much faster. Preallocating space for the linked list helped, but the array version was still faster. This I did not expect.

He asked “How do I organize my code and data to minimize memory usage, cache misses, and so on?” His first order answer is:

  • don’t store data unnecessarily
  • keep data compact
  • access memory in a predictable manner

It is on the last point that the above example shines.  The linked list is scattered all over memory; the array is always compact and its access patterns are predictable.
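The shape of that experiment can be sketched in Ruby. This is only an illustration, not Stroustrup’s C++ program: the class and method names (Node, SortedLinkedList, sorted_array_insert, run_comparison) are made up for this sketch, only the insertion half is shown, and interpreter overhead means Ruby timings will not mirror the C++ results. Both containers use the same linear search, so any timing difference comes from how the data is laid out and moved.

```ruby
require 'benchmark'

# Hypothetical doubly linked list node for this sketch.
class Node
  attr_accessor :value, :prev, :next

  def initialize(value)
    @value = value
  end
end

class SortedLinkedList
  def initialize
    @head = nil
  end

  # Linear search for the first node >= value, then splice in before it.
  def insert(value)
    node = Node.new(value)
    if @head.nil? || @head.value >= value
      node.next = @head
      @head.prev = node if @head
      @head = node
      return
    end
    cur = @head
    cur = cur.next while cur.next && cur.next.value < value
    node.next = cur.next
    node.prev = cur
    cur.next.prev = node if cur.next
    cur.next = node
  end

  def to_a
    out = []
    cur = @head
    while cur
      out << cur.value
      cur = cur.next
    end
    out
  end
end

# Same linear search, but on a contiguous Array.
def sorted_array_insert(array, value)
  i = 0
  i += 1 while i < array.size && array[i] < value
  array.insert(i, value)
end

# Insert the same n pseudo-random integers into both containers and time each.
def run_comparison(n, seed = 42)
  rng = Random.new(seed)
  values = Array.new(n) { rng.rand(1_000_000) }
  array = []
  list = SortedLinkedList.new
  t_array = Benchmark.realtime { values.each { |v| sorted_array_insert(array, v) } }
  t_list = Benchmark.realtime { values.each { |v| list.insert(v) } }
  { :array => array, :list => list.to_a, :t_array => t_array, :t_list => t_list }
end
```

Calling run_comparison(5_000) and printing the two timings is enough to try it; as always, measure on your own machine before drawing conclusions.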

Two other articles, “sorting in C++: 3 times faster than C” and “Why we didn’t use a bloom filter”, illustrate that generalizations about programming language speed need to be tested. I think the analysis in the first article of why C++ was faster than C in this case is spot on. I am not convinced that the analysis in the second article is correct; the author has an attitude about the C++ Standard Template Library (STL). He blames Object-Oriented Programming (OOP) and vtables (used in C++ to implement virtual methods). I can think of no good reason to use heavy OOP or virtual methods in his example. He may have bought into somebody’s dogma of using OOP everywhere, then rightly rejected it as inappropriate, and then overgeneralized to rejecting it everywhere. Or he may have just used a poor implementation of the STL.

I think his real speedups are coming from a custom solution that exactly matched his problem (the number of elements in a set intersection) instead of the more general solution of computing the set intersection and counting the elements in it. Note: his first solution was to use Redis for this. Convenient, easy, and I could have predicted ahead of time, dead slow. Building sets with millions of elements over the network is slow. Staying in C or C++ is going to be way faster. Beyond that, measure.
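To make the distinction concrete, here is a minimal Ruby sketch of the two approaches. It is not the article author’s code; the method names are invented, and the specialized version assumes both inputs are already sorted and free of duplicates.

```ruby
require 'set'

# General approach: build the whole intersection, then count it.
def intersection_size_general(a, b)
  (Set.new(a) & Set.new(b)).size
end

# Specialized approach: walk two sorted, duplicate-free arrays in step
# and count matches without allocating a result set.
def intersection_size_sorted(a, b)
  i = j = count = 0
  while i < a.size && j < b.size
    case a[i] <=> b[j]
    when 0
      count += 1
      i += 1
      j += 1
    when -1
      i += 1
    else
      j += 1
    end
  end
  count
end
```

The second version only ever produces one integer, which is all the problem asked for. Whether it is actually faster for a given data set is, again, something to measure.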

More on the Future

Friday, February 24th, 2012

The Sun is Setting on Rails-style MVC Frameworks points out a lot of the same problems I am encountering with Amethyst and suggests a direction to move. It also points out what isn’t there yet in terms of API styles and support libraries/protocols.

New Shiny versus Getting Things Done

Friday, February 24th, 2012

As Your favourite programming language is not good enough points out, don’t fall in love with a programming language and stay with it until “death do you part”.

Rails Went Off The Rails: Why I’m Rebuilding Archaeopteryx In CoffeeScript is a longer rant on fashion, framework developers’ versus framework users’ interests, and when to get off the train to go in a new direction. I largely agree that Rails 3 is heading in a direction I don’t want to go, or that my application doesn’t need.

I’ve started porting it to Padrino, but I wonder if that is the best course. Much has changed in the 5-6 years I’ve been working on Amethyst. Perhaps moving to a different architecture, both server and client side, might be a good idea.

Faraday?

Monday, February 6th, 2012

The most troublesome, ugly, bug-infested code in Amethyst fetches RSS feeds in the face of Internet congestion, down servers, buggy RSS implementations, incorrect encodings, and so on. I’ve tried several times to clean it up, but it is dealing with messes, so it is messy. Mess is just part of its job. I’ve seen BMW repair shops where you could sit on the floor and eat a picnic without worrying about your clothes or food safety. But I’ve never seen a garbage truck that didn’t smell.

Faraday is a library by Mislav, the author of mislav-will_paginate, a gem I (and many others) use for pagination in Ruby on Rails. It is a very nice package. Faraday is middleware for making HTTP requests, the flip side of Rack, middleware for handling HTTP requests. This is an intriguing idea. I don’t think I can replace all my ugly RSS fetch code, but it looks like I can replace a lot of it and break the remainder up into smaller, more understandable bits.
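The middleware idea itself can be sketched in plain Ruby without the gem. This is not Faraday’s API: FakeAdapter, Retry, and FixEncoding are hypothetical classes for illustration, and the only thing borrowed from Rack and Faraday is the convention that each layer wraps the next one and exposes call(env). Each messy concern of feed fetching gets its own small class.

```ruby
# Innermost layer: stands in for the real network call.
# It hands back canned results so the sketch runs offline.
class FakeAdapter
  def initialize(responses)
    @responses = responses
  end

  def call(env)
    result = @responses.shift
    raise result if result.is_a?(Exception)
    env.merge(:status => 200, :body => result)
  end
end

# Retries the inner layers when the connection fails.
class Retry
  def initialize(app, attempts = 2)
    @app = app
    @attempts = attempts
  end

  def call(env)
    tries = 0
    begin
      tries += 1
      @app.call(env)
    rescue IOError
      retry if tries < @attempts
      raise
    end
  end
end

# Treats the body as UTF-8 and replaces any invalid bytes with '?'.
class FixEncoding
  def initialize(app)
    @app = app
  end

  def call(env)
    response = @app.call(env)
    body = response[:body].dup.force_encoding('UTF-8')
    body = body.scrub('?') unless body.valid_encoding?
    response.merge(:body => body)
  end
end

# First attempt fails, second returns a body containing a stray Latin-1 byte.
adapter = FakeAdapter.new([IOError.new('connection reset'), "<rss>caf\xE9</rss>".b])
stack = FixEncoding.new(Retry.new(adapter))
response = stack.call(:url => 'http://example.com/feed.rss')
```

Adding another concern (timeouts, redirects, feed autodiscovery) would mean adding another small class to the stack rather than another branch in one large method.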

ActiveRecord callback methods deprecated in 2.3.8

Saturday, January 21st, 2012

According to this, instance callback methods were deprecated starting in Rails 2.3.8. An instance callback method is like this:

class Item < ActiveRecord::Base
  def before_create
    # whatever
  end
end

By ActiveRecord 3.1.3 (used in Padrino 0.10.5), instance callback method support is gone. The supported way to do the above is:

class Item < ActiveRecord::Base
  before_create :set_defaults

  def set_defaults
    # whatever
  end
end

ActiveRecord (AR) had two or more incompatible ways to specify callbacks: instance methods (1st example) and callback queues (2nd example). If you have inherited AR classes (e.g. AdminUser is a child class of User, which is an AR class), the instance method of the parent class is overridden and never called from child class instances. With callback queues, all callbacks are called. Current practice is described in ActiveRecord::Callbacks.
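The inheritance difference can be shown without ActiveRecord at all. The sketch below uses a hypothetical MiniRecord base class that imitates both styles; it is not how ActiveRecord is implemented, only a small model of the behavior described above.

```ruby
# Hypothetical stand-in for an ActiveRecord-like base class.
class MiniRecord
  # Queue style: each class sees its parent's queued callbacks plus its own.
  def self.before_create_callbacks
    inherited = superclass.respond_to?(:before_create_callbacks) ? superclass.before_create_callbacks : []
    inherited + (@own_callbacks ||= [])
  end

  def self.before_create(method_name)
    (@own_callbacks ||= []) << method_name
  end

  def log
    @log ||= []
  end

  def create
    before_create_hook                                       # instance method style
    self.class.before_create_callbacks.each { |m| send(m) }  # callback queue style
    self
  end

  # Default no-op, meant to be overridden by subclasses.
  def before_create_hook; end
end

class User < MiniRecord
  before_create :queued_user_setup

  def before_create_hook
    log << 'User#before_create_hook'
  end

  def queued_user_setup
    log << 'User.queued_user_setup'
  end
end

class AdminUser < User
  before_create :queued_admin_setup

  # Overrides User's hook; User's version never runs for an AdminUser.
  def before_create_hook
    log << 'AdminUser#before_create_hook'
  end

  def queued_admin_setup
    log << 'AdminUser.queued_admin_setup'
  end
end
```

AdminUser.new.create.log contains both queued callbacks but only the child’s hook method. Calling super from the child’s hook would restore the parent’s behavior, but every subclass has to remember to do it.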

In the Rails version of AmethystRSS.net, I'm using Rails 2.3.14 with instance callback methods, but not seeing any deprecation warnings. Not sure why.

:update Considered Harmful

Wednesday, October 26th, 2011

A common Rails 2 view helper call is something along the lines of link_to_remote "Delete this post", :update => dom_id(post), :url => { :action => "destroy", :id => post.id }. This sends an AJAX request to the server and replaces the contents of the DOM object with ID equal to the return value of dom_id(post) (e.g., “post_1234”) with the response. Fine unless there is more than one post on the page and the user can click the links faster than the server can respond. In Firefox, As Far As I Can Tell (AFAICT), both requests will replace both posts with the first response from the server. The second response is dropped on the floor.  Not good.

This situation is quite possible on Amethyst, e.g. “like” a bunch of posts as fast as you can.  Oddly, it is easier to do on my development laptop (no network latency) than on the production server on Amazon Web Services (AWS).  I am converting all :updates to treat the response as JavaScript that explicitly names the post to replace and the content to replace it with.
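As a rough illustration of what such a response might look like, here is a plain Ruby helper that builds the JavaScript string. The name replace_post_js is hypothetical and this is not Rails’ own RJS machinery; it only shows the idea that the response carries its own target ID instead of relying on the :update option of the request that happened to trigger it.

```ruby
require 'json'

# Hypothetical helper: build a JavaScript response that names its own target,
# so each response updates only the post it belongs to.
def replace_post_js(dom_id, html)
  # to_json produces quoted, escaped string literals for embedding in the script.
  "var el = document.getElementById(#{dom_id.to_json});\n" \
  "if (el) { el.innerHTML = #{html.to_json}; }"
end
```

For example, replace_post_js("post_1234", "Liked") returns a two-line script that looks up post_1234 and sets its contents. The guard on el means a response for a post that has since left the page does nothing rather than raising an error in the browser.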

Clever Hack, Smelly Code

Wednesday, October 19th, 2011

A common operation in Amethyst is to retrieve the feed information from the database, and the number of unread articles in the feed. Operating in pure SQL I could do something like this: SELECT feeds.*, COUNT(articles.id) AS count_articles FROM feeds JOIN articles ON feeds.id = articles.feed_id WHERE feeds.id = 1234 (note: not tested).  It is possible in Rails 2 with ActiveRecord to do something similar with feed = Feed.find(1234, :select => 'feeds.*, COUNT(articles.id) AS count_articles', :joins => :articles). And it will work. The article count will be in feed.attributes['count_articles']. But, if the find() becomes more complex, it may not. At some point, ActiveRecord will generate its own select and the COUNT(articles.id) will fall on the floor.

Clever, but brittle as all get out. A slight change to my calling code or any update to Rails could break it, a definite code smell.  So I’ve ripped out all places where I did this.  Yeah, it takes an additional trip to the database, but all or almost all of the necessary data is already sitting in the DB’s cache, so it’s in the low single digit millisecond range.  Not worth brittle code.

Conditional GET Requests

Tuesday, September 13th, 2011

Amethyst fetches updates to all subscribed RSS feeds every hour.  And most of the time, there is nothing new.  This is a waste of bandwidth and CPU usage, both Amethyst’s and the RSS feed server’s.  I briefly looked into Conditional GET support (only get the data if it has changed).  I didn’t dig very deep and everything I found was about supporting responses to Conditional GETs in Rails, not making requests.  Finally it annoyed me enough to dig a little deeper.

The most useful resource I found was a link to RESTful Web Services in Google Books.  It isn’t specific to Rails, but the examples are in Ruby.  I’m using EventMachine HTTP Request to read RSS feeds and it has last_modified and etag accessors. Just store them in your response handling code, then add the stored values to the headers of the next request:

	request = HttpRequest.new(channel)
	headers = {}
	# Send back the validators saved from the previous fetch, if any
	headers['If-Modified-Since'] = channel.last_modified if channel.last_modified
	headers['If-None-Match'] = channel.etag if channel.etag
	http = request.get :head => headers

and handle a 304 (Not Modified) response code the same as a successful response with no new posts.
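For readers not using EventMachine, the same round trip can be sketched with Ruby’s standard Net::HTTP. The helper names conditional_headers and fetch_feed are assumptions for this sketch, not part of Amethyst, and error handling for other status codes is left out.

```ruby
require 'net/http'
require 'uri'

# Build conditional request headers from validators saved on a previous fetch.
def conditional_headers(last_modified, etag)
  headers = {}
  headers['If-Modified-Since'] = last_modified if last_modified
  headers['If-None-Match'] = etag if etag
  headers
end

# Hypothetical fetch helper using Net::HTTP rather than EventMachine.
# Returns nil when the server answers 304 Not Modified.
def fetch_feed(url, last_modified = nil, etag = nil)
  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port, :use_ssl => uri.scheme == 'https') do |http|
    http.get(uri.request_uri, conditional_headers(last_modified, etag))
  end
  return nil if response.is_a?(Net::HTTPNotModified)

  # Keep the new validators so the next fetch can be conditional too.
  { :body => response.body,
    :last_modified => response['Last-Modified'],
    :etag => response['ETag'] }
end
```

A caller would store the returned :last_modified and :etag alongside the feed and pass them back in on the next hourly run; a nil result means there is nothing new to parse.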

This cut CPU usage by 80%, i.e. CPU load is 20% of what it was. This may have been premature optimization/scaling, but it was very satisfying and it will reduce my Amazon Web Services bill slightly. The developer is one of the stakeholders you need to keep happy, too.

Demoing Amethyst tonight (May 9, 2011)

Monday, May 9th, 2011

I will be one of four people demoing their products at Bootstrap Austin tonight at 7-9pm.  It will be held at Link Coworking on Anderson near the Alamo Drafthouse, 2700 West Anderson Lane #205.  Leave yourself some additional time, I’m told it’s buried in the interior and not easy to find.