Performance Is a Science

For 2,000 years scholars held that heavier objects fall faster than lighter ones, partly because Aristotle couldn’t be bothered to take two minutes to experiment. Hell, he even wrote that men have more teeth than women. Isn’t that crazy? And yet people often rely on this kind of fact-free reasoning to reach conclusions about computer performance, among other things. Worse, they spend their IT budgets or sacrifice code clarity based on these flawed ideas. In reality, computers are far too complex for anyone to handle performance problems by “reasoning” alone.

[Image: Galileo at Pisa. This story is not true, by the way.]

Think about a routine in a modern jitted language. Right off the bat you face hidden magic like type coercion, boxing, and unboxing. Even if you know the language intimately, unknowns are introduced as your code is optimized first by the compiler, then again by the JIT compiler. It is then fed to the CPU, where optimizations such as branch prediction, memory prefetching and caching have drastic performance implications. What’s worse, much of the above can and does change between different versions of compilers, runtimes, and processors. Your ability to predict what is going to happen is limited indeed.

To take another example, consider a user thinking of RAID-0 to boost performance. Whether there are any gains depends on a host of variables. What are the patterns of the I/O workload? Is it dominated by seeks and random operations, or is there a lot of streaming going on? Reads or writes? How does the kernel I/O scheduler play into it? How smart are the RAID controller and drivers? How will a journaling file system impact performance given the need for write barriers? What stripe sizes and file system block sizes will be used? There are way too many interdependent factors and interactions for speculative analysis. Even kernel developers are stumped by surprising and counterintuitive performance results.

Measurement is the only way to go. Without it, you’re in the speculation realm of performance tuning, the kingdom of fools and the deluded. But even measurement has its problems. Maybe you’re investigating a given algorithm by running it thousands of times in a row and timing the results. Is that really a valid test? By doing so you are measuring a special case where the caches are always hot. Do the conclusions hold in practice? Most importantly, do you know what percentage of time is spent in that algorithm in the normal use of the application? Is it even worth optimizing?
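The "is it even worth optimizing?" question can be put in numbers with Amdahl's law: if a routine accounts for a fraction p of total runtime and you make it s times faster, the whole program speeds up by 1 / ((1 - p) + p / s). Here is a minimal sketch in Python; the function name and the sample fractions are made up for illustration, and the real fraction has to come from measuring your own application.

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: whole-program speedup when a part taking `fraction`
    of total runtime is made `local_speedup` times faster."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    # The untouched part still costs (1 - fraction); the tuned part shrinks.
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


if __name__ == "__main__":
    # Hypothetical shares of runtime; measure yours with a profiler.
    for fraction in (0.05, 0.50, 0.90):
        gain = overall_speedup(fraction, 10)
        print(f"{fraction:.0%} of runtime, routine 10x faster: {gain:.2f}x overall")
```

A routine that takes 5% of the runtime buys you roughly a 1.05x overall gain even if you make it ten times faster, which is why knowing the percentage matters more than the micro-benchmark number.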

[Image: the CMS detector at the LHC]

Or say you’ve got a fancy new RAID-0 setup. You run some benchmark that writes large globs of data to the disk and see that your sustained write throughput is twice that of a single disk. Sounds great. Too bad it has no bearing on most real-world workloads. The problem with both the naive timing test and the benchmark is that they are synthetic measurements. They are scarcely better than speculation.

To tackle performance you must make accurate measurements of real-world workloads and obtain quantitative data. Thus we as developers must be proficient with performance measurement tools. For code this usually means profiling, so you know exactly where time is being spent as your app runs. When dealing with complex applications, you may need to build your own instrumentation to collect enough data. Tools like Cachegrind can help paint a fuller picture of reality.
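To make the profiling step concrete, here is a small sketch using Python's standard `cProfile` and `pstats` modules. The workload (`parse`, `crunch`, `handle_request`) is hypothetical and stands in for whatever your app really does; the point is only the shape of the measurement, not the numbers, and a toy like this is no substitute for profiling a real workload.

```python
import cProfile
import io
import pstats


def parse(n):
    # Hypothetical cheap step.
    return [str(i) for i in range(n)]


def crunch(items):
    # Hypothetical expensive step.
    return sum(len(s) ** 2 for s in items for _ in range(50))


def handle_request(n=2000):
    # Stand-in for one unit of real work in the application.
    return crunch(parse(n))


def profile(func, *args, top=5):
    """Run func under cProfile; return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time so the costliest call paths come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, out.getvalue()


if __name__ == "__main__":
    _, report = profile(handle_request)
    print(report)
```

The report lists call counts and time per function, so you can see which routine dominates before you touch any of them.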

For website load times and networks you might use tools like Wireshark and Fiddler, as Google did for Gmail. In databases, use SQL profiling to figure out how much CPU, reading, and writing each query consumes. These numbers are more telling than the time a query takes to run, since the query might be blocked or starved for resources, in which case elapsed time doesn’t mean much. Locks, and who is blocking whom, are also crucial in a database. When looking at a whole system, use your OS tools to record things such as CPU usage, disk queue length, I/Os per second, I/O completion times, swapping activity, and memory usage.

In sum, do what it takes to obtain good data and rely on it. I’m big on empiricism overall, but in performance it is everything. Don’t trust hearsay, don’t assume that what held in version 1 is still true for version 2, and question common wisdom and blog posts like this one. We all make comical mistakes; even Aristotle did. Naturally, it takes theory and analysis to decide what to measure, how to interpret it, and how to make progress. You need real-world measurement plus reasoning. Like science.
