Three articles I’ve read recently have altered my thinking about data structures, algorithms, and performance. They confirm my belief that the more you and your code knows about the structure of your data, the better. My misconception (or perhaps outdated conception) is that once in memory, access times are nearly the same regardless of location. Actually, current main memory (not cache) architecture behaves much like a disk, just orders of magnitude faster. Sequential access is much faster than random access. Google “computer memory row select column select” for details.
The first article, “Software Development for Infrastructure” in the January 2012 issue of Computer, is by Bjarne Stroustrop, the inventor of C++. He talks about a number of software issues, but the most surprising to me was his results from a program to add N random integers to a sorted list and then deleting random members of the list. He compared a vector (or array) implementation to a doubly linked list. I would expect the array version to be faster for small N. He tested up to N = 500,000 and the vector implementation was faster with the difference increasing with size. He used a linear search with both implementations, though a binary search on the array would be much faster. Preallocating space for the linked list helped, but the array version was still faster. This I did not expect.
He asked “How do I organize my code and data to minimize memory usage, cache misses, and so on?” His first order answer is:
- don’t store data unnecessarily
- keep data compact
- access memory in a predictable manner
It is the last point where the above example shines. The linked list is scattered all over memory, the array is alway compact and access patterns predictable.
Two other articles, sorting in C++: 3 times faster than C and Why we didn’t use a bloom filter illustrate that generalizations about programming language speed need to be tested. I think the analysis in the first article of why C++ was faster than C in this case is spot on. I am not convinced that the analysis in the second article is correct, the author has an attitude about the C++ Standard Template Library (STL). He blames Object-Oriented Programming (OOP) and vtables (used in C++ to implement virtual methods). I can think of no good reason to use heavy OOP nor virtual methods in his example. He may have bought into somebody’s dogma of using OOP everywhere and then rightly rejected it as inappropriate and then overgeneralized to rejecting it everywhere. Or may just used just a poor implementation of the STL.
I think his real speedups are coming from a custom solution that exactly matched his problem (the number of elements in a set intersection) instead of the more general solution of computing the set intersection and counting the elements in it. Note: his first solution was to use Redis for this. Convenient, easy, and I could have predicted ahead of time, dead slow. Building sets with millions of elements over the network is slow. Staying in C or C++ is going to be way faster. Beyond that, measure.