In an attempt to speed up parts of Amethyst, I have been adding indexes for every column that is searched on. For example, to find old, unused word/word-pairs, the following SQL statement is used.
DELETE FROM tokens WHERE 162129586585337856 <= fnv AND fnv < 166633186212708352 AND updated_at<'2010-06-22 13:43:06' AND occurrences=0.
fnv is a hash of the word(s) and the range is used to spread the deletes out over a week.) In a mis-guided attempt to speed it up, I added indexes on
occurrences. The result was that adding a single feed could take as much as 15 minutes.
Digging into the MySQL docs, asking questions in the Ruby on Rails forums, and using MySQL’s
EXPLAIN SELECT ... command revealed that generally only one index is used in a query (
fnv in this case). For Amethyst, all actions on this table used the same index and the other indexes where just slowing down writes to the table. I’ve read it several times, indexes can speed up reads, but they slow down writes. Deleting all the indexes on the
tokens table and adding back the one index that is actually used as the primary key cut the add time for the feed mentioned above from 15 minutes to 2 minutes.
Before you rush off and do something similar, check it out on your development machine. The
tokens table has 16 million rows. The delete and add index changes took 6 hours. 6 hours where the table was in read-only mode and the application essentially unusable.