Lucky to be a Programmer

For the past few weeks I’ve been working with a fellow developer on a project that required an all-out programming effort. It’s done now, so we’re back to a regular schedule, but when people hear about the crazy hours they often say they’re sorry. They really shouldn’t be. I would never do this often, or for long periods, or without proper compensation if done for an employer, but the truth is that these programming blitzkriegs are some of my favorite periods in life. Under the right conditions, writing software is so intensely pleasurable it should be illegal.

Many programmers relate to this, but others are taken aback when they hear it. I think it’s because institutions are so good at squeezing the fun out of everything. It’s appalling, for example, how schools can take the most vibrant topics and mangle them into a formulaic, mediocre slog. And so it is for programming. Many corporations turn an inherently rewarding experience into something people just barely stomach in exchange for a paycheck.

That’s too bad. Few things are better than spending time in a creative haze, consumed by ideas, watching your work come to life, going to bed eager to wake up quickly and go try things out. I am not suggesting that excessive hours are needed or even advisable; a sane schedule is a must except for occasional binges. The point is that programming is an intense creative pleasure, a perfect mixture of puzzles, writing, and craftsmanship.

Programming offers intriguing challenges and ample room for invention. Some problems are investigative and reductionist: Why is this code running slowly? What on earth is causing that bug? Others are constructive, like devising algorithms and architectures. All of them are a delight if you enjoy analytical work, immersed in a world full of beasts like malware, routers, caches, protocols, databases, graphs, and numbers.

This analytical side is what most people associate with programming. It does make it interesting, like a complex strategy game. But in most software the primary challenge is communication: with fellow programmers via code and with users via interfaces. By and large, writing code is more essay than puzzle. It is shaping your ideas and schemes into a coherent body; it is seeking clarity, simplicity and conciseness. Both code and interfaces abound with the simple joy of creation.

Another source of pleasure is that under certain conditions, beauty arises in programming. It may sound like bullshit but it’s real, the kind of thing that makes your day better. Take for example Euclid’s 2-line proof that prime numbers are infinite. I think many would find it beautiful - so succinct and such a fascinating result. This is the beauty of math, cold and austere, and it pervades software. It is in clever algorithms like quicksort, in the sources of kernels and compilers, in elegant exploits and in the tricks we pull to solve everyday problems. When you see these solutions, be it a famous algorithm or a mundane trick, you smile and think “how smart” and it feels good. How noble in reason!

A non-math sort of beauty also exists in code, analogous to eloquence in discourse. It’s present in well-factored software that does a lot with little code, in short and crisp methods, in well-done architectures. Some languages make this hard and not all programmers produce it, but it’s a joy to read and work on such code. If you’re working in an expressive language with coworkers whose code you enjoy, it happens often enough to brighten things up.

Now for craftsmanship. In a sense software is abstract - where does program behavior exist but in our minds? Yet we call it building software for a reason. Programs are shaped feature by feature, architectures start out as scaffolds and grow, user interfaces come together, bugs are fixed and hotspots are optimized to make things run fast. Software provides a deeply satisfying sense of craft. We build stuff out of pure ideas and then get to watch it working to solve real problems and make people a little better off. Or far better off, as the case may be.

Take Biology. Despite nearly 400 years of scientific revolution, Biology has been unable to deliver on crucial problems like effective cures for viral infections or cancer. Some of our best progress, like antibiotics, has been due to chance and random experimentation. You start a clinical trial for a hypertension drug and suddenly - whoa - all your subjects have hard-ons! Viagra is born. To be sure, chance plays a role in all endeavours, but Physics and Chemistry have a comprehensive theoretical basis powering systematic improvements, whereas Biology has been largely confined to kludges. Wanna treat cancer? Here, blast the patient with radiation and poison and hopefully the cancer will die first. They’re brilliant kludges, and I’m happy to have them, but it’s a far cry from the precision we’ve had elsewhere.

Software is changing that. Just barely 50 years ago the shape of DNA was being discovered, but now anyone can browse and download hundreds of complete genome sequences. Or look up thousands of genes (DLEC1 for a random example), complete with nucleotide sequence, amino-acid sequence for expressed proteins, literature mentioning the gene, you name it! Or you can search vast gene and protein databases for nucleotide or amino-acid sequences, perhaps after sequencing something in ever-cheaper devices, and get a comprehensive report on the match. The matches don’t have to be exact, because the algorithm in BLAST, the standard sequence search tool, delivers partial matches across databases and species, scored by match likelihood. These advances will enable massive breakthroughs in medicine. Biology is entering a new era, like Physics in the 18th century, propelled by software.
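To give a feel for what "partial matches, scored" means, here is a toy sketch. It is not BLAST's actual algorithm (BLAST uses a faster heuristic and statistical scoring); it is a plain Smith-Waterman local alignment score, and the function name and the scoring values (match, mismatch, gap) are my own assumptions for illustration.

```python
def local_alignment_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between two sequences.

    The scoring values are illustrative assumptions, not BLAST defaults.
    """
    prev = [0] * (len(target) + 1)  # previous row of the scoring matrix
    best = 0
    for q in query:
        curr = [0]
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# Rank a tiny, made-up "database" by how well each entry matches the query.
database = ["ACGT", "ACCT", "CCCC"]
ranked = sorted(database, key=lambda s: local_alignment_score("ACGT", s), reverse=True)
```

An exact hit scores highest, a sequence with one substitution still scores well, and an unrelated one scores zero, which is the general idea behind getting useful results from inexact searches.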

Yeah, sure, biologists have a minor role :P, but we in computing increasingly power major developments in science, culture, and business. When a third-world kid looks up a Wikipedia entry, it’s our work too! We wrote the RFCs and the networking stacks, the browser and MediaWiki, the OSes and the HTTP servers. Not to mention a lot of the Wikipedia entries, but since a few were on company time I’ll leave them aside. The influence of technologists goes beyond bits and bytes: it was a programmer who invented wikis and our community started blogs. A. J. Liebling pointed out correctly that “freedom of the press is guaranteed only to those who own one”. It’s a pity he’s not around to watch our creations break down the stifling conformity and cozy subservience of professional journalism. Less glamorously but to great benefit our applications have delivered steep productivity gains to businesses across the economy. These are a few examples in a long list.

Three years ago, when I finished my undergrad (after being a programmer for many years), I was about to enter med school. At that point, a couple of negative experiences had me somewhat burned out on computer work. I’m happy I stuck with it. I’m still interested in biomedical research, but if I were to get involved I’d rather come in from the software angle, because frankly it’s too much fun to pass on. My mom thinks I’m a typist but oh well.

If you find yourself stuck in a place that’s killing your innate passion for technology, by all means, move the hell on! Don’t stay put while your enthusiasm is slowly drained. It’s hard to find motivated people to hire so you’ve got a major asset already; there are plenty of employers - and companies to be started - that will better suit you. For people who think they might like programming, your mileage may vary, but I highly recommend it as a career. Not only is the outlook bullish on the job front, but as the role of software grows in society we’ll see more exciting and beneficial changes delivered by technology. I’m delighted to be along for the ride as constantly my art and craft I try to master.

PS: thanks for putting up with the irregular posting schedule. The plan is to stick to regular posting now that things have calmed down. And if you like the song, download the mp3 because the YouTube audio doesn’t do it justice.

Of Aviation Crashes and Software Bugs

I just found out that Stephen Colbert’s father and two brothers died in a plane crash on September 11, 1974. Maybe everybody knows this - I’m not sure because I haven’t watched TV in years, so I live in a sort of alternate reality. My only exposure to TV is YouTube clips of Jon Stewart, Colbert, and lots of Dora The Explorer (Jon Stewart is my favorite but Swiper The Fox is a close second, don’t tell my kids though). Now, I may not have TV to keep me informed, but I do read aircraft accident reports and transcripts from cockpit voice recorders. That doesn’t help in small talk with the neighbors, but you read some amazing stuff.

For example, in the accident that killed Colbert’s father the pilots were chatting about politics and used cars during the landing approach. They ignored their altitude and eventually ran the plane into the ground about 3 miles away from the destination airport. The report by the National Transportation Safety Board (NTSB) states that “both crew members [first officer and captain] expressed strong views and mild aggravation concerning the subjects discussed.” Since the full CVR transcript is not available we’re free to imagine a Democrat and a Republican arguing amid altitude alerts.

Aviation accidents are both tragic and fascinating; few accidents can be attributed to a single factor and there is usually, well, a series of unfortunate events leading to a crash. The most interesting CVR transcript I’ve read is Aeroperu 603. It covers an entire flight from the moment the airplane took off with its static ports taped over - causing airspeed, altitude, and vertical speed indicators to behave erratically and provide false data - until the airplane inverted into the Pacific Ocean after its left wing touched the sea, concluding a mad, slow descent in which crew members were bombarded with multiple, false, and often conflicting flight alerts. The transcript captures the increasing levels of desperation, the various alerts, and the plentiful cussing throughout the flight (there’s also audio with subtitles). As you read it your brain hammers the question: how do we build stuff so things like this can’t happen?

Static ports covered by duct tape in Aeroperu 603

The immediate cause of the Aeroperu problem was a mistake by a ground maintenance worker who left duct tape over the airplane’s static ports. But there were a number of failures along the way in maintenance procedures, pilot actions, air traffic control, and arguably aircraft design. This is where agencies like the NTSB and their counterparts abroad do their brilliant and noble work. They analyze the ultimate reason behind each error and failure and then issue recommendations to eradicate whole classes of problems. It’s like the five whys of the Toyota Production System coupled with fixes and on steroids. Fixes are deep and broad, never one-off band aids.

Take the Colbert plane crash. You could define the problem as “chatter during landing” and prohibit that. But the NTSB went further: they saw the problem as “lack of professionalism” and issued two recommendations to the FAA with a series of concrete steps towards boosting professionalism in all aspects of flight. Further NTSB analysis and recommendations culminated a few years later in the Sterile Cockpit Rule, which lays down precise rules for critical phases of flight including takeoff, landing, and operations under 10,000 feet. Each aviation accident, error, and causal factor spurs recommendations to prevent it, and anything like it, from ever happening again. Because the solutions are deep, broad, and smart we have achieved remarkable safety in flight.

In other words, it’s the opposite of what we do in software development and computer security. We programmers like our fixes quick and dirty, yes sirree, “patches” we call them. It doesn’t matter how critical the software is. Until 1997 Sendmail powered 70% of the Internet’s reachable SMTP servers, qualifying it as critical by a reasonable measure (its market share has since decreased). What was the security track record? We had bug after bug after bug, many with disastrous security implications, and all of them fixed with a patch as specific as possible, thereby guaranteeing years of continued new bugs and exploits. Of course this is not as serious as human life, but for software it was pretty damn serious: these were bugs allowing black hats to own thousands of servers remotely.

And what have we learned? If you fast forward a few years, replace “Sendmail” with “WordPress” and “buffer overflow” with “SQL injection/XSS”, cynics might say “nothing.” We have different technologies but the same patch-and-run mindset. I upgraded my blog to WordPress 2.5.1 the other day and boy I feel safe already! Security problems are just one type of bug; the same story plays out for other problems. It’s a habit we programmers have of not fixing things deeply enough, of blocking the sun with a sieve.

We should instead be fixing whole classes of problems so that certain bugs are hard or impossible to implement. This is easier than it sounds. Dan Bernstein wrote a replacement for Sendmail called qmail and in 1997 offered a $500 reward for anyone who found a security vulnerability in his software. The prize went unclaimed and after 10 years he wrote a paper reviewing his approaches, what worked, and what could be better. He identifies only three ways for us to make true progress:

  1. Reduce the bug rate per line of code
  2. Reduce the amount of code
  3. Reduce trusted code (which is different than least privilege)

This post deals only with 1 above; I hope to write about the other two later on. Reducing the bug rate is a holy grail in programming and qmail was very successful in this area. I’m sure it didn’t hurt that Bernstein is a genius, but his techniques are down to earth:

For many years I have been systematically identifying error-prone programming habits—by reviewing the literature, analyzing other people’s mistakes, and analyzing my own mistakes—and redesigning my programming environment to eliminate those habits. (…)

Most programming environments are meta-engineered to make typical software easier to write. They should instead be meta-engineered to make incorrect software harder to write.

In the 1993 book Writing Solid Code Steve Maguire gives similar advice:

The most critical requirement for writing bug-free code is to become attuned to what causes bugs. All of the techniques in this book are the result of programmers asking themselves two questions over and over again, year after year, for every bug found in their code:

  • How could I have automatically detected this bug?
  • How could I have prevented this bug?

For a concrete example, look at SQL Injection. How do you prevent it? If you prevent it by remembering to sanitize each bit of input that goes to the database, then you have not solved the problem, you are using a band aid with a failure rate - it’s Russian Roulette. But you can truly solve the problem by using an architecture or tools such that SQL Injections are impossible to cause. The Ruby on Rails ActiveRecord does this to some degree. In C# 3.0, a great language in many regards, SQL Injections are literally impossible to express in the language’s built-in query mechanism. This is the kind of all-encompassing, solve-it-once-and-for-all solution we must seek.
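Here is a minimal sketch of the difference, using Python's standard sqlite3 module. The users table, the find_user helper, and the hostile input are all made up for illustration. Python can't make injection inexpressible the way a built-in query language can, but funnelling every lookup through a function that only accepts values as parameters gets you much closer than sanitizing by hand.

```python
import sqlite3


def find_user(conn, name):
    """Look up users by name (hypothetical helper).

    The ? placeholder ships `name` to SQLite as data, never as SQL text,
    so nothing in it can change the shape of the query.
    """
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


# Made-up in-memory table, just for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

hostile = "alice' OR '1'='1"

# Band-aid style: build SQL by string concatenation and hope someone sanitized.
unsafe_rows = conn.execute(
    "SELECT id, name FROM users WHERE name = '" + hostile + "'"
).fetchall()

# Parameterized style: the hostile string is just a name nobody has.
safe_rows = find_user(conn, hostile)
```

The concatenated query hands back every row in the table, while the parameterized one finds nobody by that odd name. If the only way to reach the database is through functions like find_user, forgetting to sanitize stops being a possible mistake.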

It’s important to take a broad look at our programming environments to come up with solutions for preventing bugs. This mindset matters more than specific techniques; we’ve got to be in the habit of going well beyond the first “why”. Why have we wasted hundreds of thousands of man hours looking for memory leaks, buffer overflows, and dangling pointers in C/C++ code? It wasn’t just because you forgot to free() or you kept a pointer improperly, no. That was a symptom. The reality is that for most projects using C/C++ was the bug, it didn’t just facilitate bugs. We can’t tolerate environments that breed defects instead of preventing them.

Multi-threaded programming is another example of a perverse environment where things are the opposite of what they should be: writing correct threading code is hard (really hard), but writing threading bugs is natural and takes no effort. Any design that expects widespread mastery of concurrency, ordering, and memory barriers as a condition for correctness is doomed from the start. It needs to be fixed so that bug-free code is automatic rather than miraculous.
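One way to tilt the environment, sketched below under my own assumptions (the worker and sum_of_squares names are hypothetical), is to forbid shared mutable state and let threads talk only through thread-safe queues. It is not the only approach, and it does not make all concurrency bugs impossible, but there is no lock to forget and no counter to corrupt.

```python
import queue
import threading


def worker(tasks, results):
    """Square numbers from `tasks` until a None sentinel arrives."""
    while True:
        n = tasks.get()
        if n is None:
            break
        results.put(n * n)  # results leave the thread only via the queue


def sum_of_squares(numbers, n_workers=4):
    """Sum the squares of `numbers` using worker threads and no shared variables."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for n in numbers:
        tasks.put(n)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    # All producers have finished, so draining the queue here is safe.
    total = 0
    while not results.empty():
        total += results.get()
    return total


answer = sum_of_squares(range(1, 101))
```

The design choice is that correctness comes from the structure (queues in, queues out) rather than from every programmer remembering the locking rules.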

There are a number of layers that can prevent a bug from infecting your code: software process, tools, programming language, libraries, architecture, unit tests, your own habits, etc. Troubleshooting this whole programming stack, not just code, is how we can add depth and breadth to our fixes and make progress. The particulars depend on what kind of programming you do, but here are some questions that might be worth asking, in the spirit of the questions above, when you find a bug:

As airplanes still crash we’ll always have our bugs, but we could do a lot better by improving our programming ecosystem and habits rather than just fixing the problem of the hour. The outstanding work of the NTSB is great inspiration. I’m still scared of flying though - think of all the software in those planes!

Programming Language Jobs and Trends

In the last entry I argued that while learning programming languages comes at a high cost, good programmers should be proficient in multiple languages. I think of programmers as having language sets, based on the idea that knowing one language is not enough for a professional. The task then is to pick the language set wisely to minimize learning time and maximize the benefits to your career. Like other programmers I enjoy toying with different languages, but I’m conservative about fully picking up a language because there’s too much process loss involved. I go for the minimal language set. This is a look at various languages from this point of view; I hope it’s useful to other programmers.

There are many criteria I find important in a programming language, from job market to whether it’s fun. This post looks at jobs, job trends, and overall trends for languages. First, here is data for number of jobs per language across the United States, measured roughly using Dice.com searches.

Jobs per language

These are rough ballparks, but they do reflect the relative demand for each language. I couldn’t search cleanly for C and ML so they’re out. Haskell, OCaml, and Rebol had fewer than 10 jobs each. Recruiting for some of these languages (and for LISP) often happens in a more targeted way. Also, JavaScript is useful nearly everywhere nowadays, so the number of jobs understates its importance. Anyway, that’s them results. Without knowing the supply of programmers for each language, it’s hard to say anything about the relative difficulty of landing a job. But that is not the point; after all, most programmers who spend time on programming blogs get jobs easily (at least in the current markets).

An abundance of jobs is important because it gives you choices. You’re more likely to find a job or contract that better suits you. The more options you have for jobs, the more likely you’ll find the one with the telecommuting, the long vacation, the fired-up team, the interesting project or the right industry. Above all, you have a better shot at working with non-assholes. Martin Fowler says that early in his career he decided that he "wouldn’t work with unpleasant people, however capable they might be", since people matter most. Amen. There’s no better insurance against assholes than multiple job offers or clients.

Java, C++, and C# clearly take the market demand cake. Java is peculiar in that most jobs require some significant experience in some other technology, like IBM WebSphere or BEA WebLogic. So the market is fragmented. By contrast, the C# market is more monolithic: people use whatever Microsoft gives them. There are pros and cons for each one. The main downside for Java is that most programmers qualify for only a subset of jobs, whereas if you know C#/.NET most Microsoft shops are viable. The flip side is that as a Java developer you get to choose a lot of the technologies you use, while for C# you may have to use whatever ships out of the box (sophisticated teams excepted). Java programmers can find sweet contract rates if they know the right stuff. The data at RealRates.com seems to support this (though it looks like they stopped updating the site). C# programmers are more of a commodity. This drives down rates but contributes to platform adoption. For what it’s worth, here’s job growth measured by Indeed:

Job trends graph (Indeed): Java, C#, C++

The C# growth surprised me. If the data is accurate, that’s some striking growth for an established language. In script land, Indeed’s data shows Ruby growing at break-neck rates:

Job trends graph (Indeed): Python, Ruby, Perl, PHP

Update: these Indeed charts show growth, not absolute numbers. They are relative: C# and Ruby are growing faster, but in absolute numbers they’re below their counterparts. If you click on these charts, you’ll be taken to the Indeed web site where you can plot absolute numbers if you’re so inclined.
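To make the growth-versus-absolute distinction concrete, here is a tiny sketch. The job counts are invented for illustration and are not Indeed's data; relative_growth is a hypothetical helper.

```python
def relative_growth(series):
    """Percent change of each point relative to the first point in the series."""
    base = series[0]
    return [100.0 * (value - base) / base for value in series]


# Invented numbers: a big established language and a small fast-growing one.
big_language_jobs = [10000, 11000, 12000]
small_language_jobs = [100, 200, 400]

big_growth = relative_growth(big_language_jobs)      # [0.0, 10.0, 20.0]
small_growth = relative_growth(small_language_jobs)  # [0.0, 100.0, 300.0]
```

On a growth chart the small language towers over the big one (300% versus 20%), even though it still has 30 times fewer jobs in absolute terms.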

Salary data is less relevant. You can’t reason much based on the free online reports (from Indeed, Salary.com, etc.) because the data is dodgy and not broken down by relevant factors. Frankly, language is not the determining factor of salary and contract rates, so there’s no point sweating it. Some specific technologies might command a premium, but it’s hard to generalize to a language. Dr. Al Lee from Payscale.com has a post discussing programming salary comparisons. His discussion is insightful but I would not base any decisions on income statistics. Too much of it is up to you, your negotiation skills, employment setup (employee, brokered contract, direct contract), experience, local market, etc. If you really want some numbers, I’ve set up an Indeed Salary Search for the major languages.

Job market trends don’t fully capture the feeble whims of us programming folk. There are other interesting ways to look at mind share and what might be coming down the pike. O’Reilly published trends based on book sales here, but that’s about a year and a half old. Google Trends yields interesting information, complementary to the Job Trends feature at Indeed. Based on Google Trends you can see an apparent decline in PHP, the rise of Ruby over Perl and Python, and plunges in C++ and J2EE. Financial results for Q4 2007 show strong server revenue growth for both Linux (11.6% year-over-year) and Microsoft (6.9% year-over-year). Microsoft’s Q1 2008 was impressive. Here are some of the Google Trends:

Ruby, Perl, Python

PHP, Java, C#

C#, Visual Basic

I’d take trend analysis with a grain of salt; yet the direction of movement looks consistent across different sources of data. It’s also consistent with the idea one gets from reading blogs and talking to colleagues. Namely, the rise of Ruby among the scripting languages, a relative decline of Java and PHP, and C# moving steadily. I’d take these trends into account when deciding on a language to learn. But a quick look at the job numbers for COBOL should put things in context. There’s no urgency and it’s only one factor.

So much for the market. Picking a language based on market statistics alone would be like choosing your profession based on the projections in the Occupational Outlook Handbook. So next time I’ll write about the languages themselves and where I think they fit within the current programming landscape. That way I get sleep and this post stays manageable.

Language Dabbling Considered Wasteful

[Update: I have renamed this post. The original name, "New Languages Considered Harmful", was supposed to be a tongue-in-cheek way to get people thinking. I chose it playing off of the reference to Dijkstra later in the post. It's absurd when taken literally: every language was once "new", and what exactly should we stick with? ALGOL? Fortran? Analytical engine punch cards? Unfortunately it came across as incendiary, which was not my intention. This post is not about computer science education: I believe it's important for programmers to be exposed to a variety of languages and paradigms as part of their education. It is about how much professionals get from learning new languages. There are enormous productivity gains one gets from sticking with a small set of orthogonal languages; moreover, dabbling in languages is overrated as a way to get better at programming (in particular, when compared to reading high-quality source code). Plus there's a world of stuff to learn, some of which I think has better returns than a new language for most programmers. Examples of these "language sets" could be C/Ruby/LISP, C#/F#/JavaScript, C++/Python/Java, etc. The sets are not static, but in my experience there's seldom good reason for change. In my own personal set I've had two changes in the last 7 years: Ruby displaced Perl and JavaScript got added. The original article is below in all of its flame-inducing glory.]

Learning new programming languages is often a waste of time for professional programmers. It may be a fun waste of time (i.e., a hobby), but it’s a waste nonetheless. If you do it for pleasure, then great, but profit is scarce. Pointing this out among good programmers is heresy: even the pragmatic programmers, whose teachings are by and large excellent, suggest we should learn one new programming language every year. That’s rubbish.

The theory is that by learning a new language you "expand your mind" and become "a better programmer". Right. By that kind of argument we should all be eating LSD (don’t). In reality learning a new language is a gritty business in which most of the effort is spent on low-value tasks with poor return on time invested. You need to get to know the libraries, struggle with the environment, find or tune a text editor, look for tools, and so on. Most of the effort has to do with housekeeping chores, which are surely not mind-expanding by anyone’s measure. If you hope to be productive in the new language, things are even bleaker: proficiency has less to do with the language itself than with the myriad technologies you must master to use it effectively.

Even core language learning offers dubious return. How much does it really help to learn a new syntax? How does it expand your mind to learn new operator precedence quirks? Much of what constitutes a language is lexical and syntactical bureaucracy. Worse, you’re learning absolutely nothing about fundamental aspects of computer science. No algorithms, no operating systems, no compiler theory, no math, no AI. If you’re an undergrad, then you should have time to pick up languages on the side while learning all that, of course. But a professional is making a trade-off: what else could you learn with that time? We’re better off studying business, security, usability, architecture, software estimation, and so on, rather than spending time with a different language every year.

If your goal is better programming, you will learn far more from reading high-quality code bases in your current languages than from a new language. Go read top-notch code in the languages you know already; it’ll teach you techniques and style quickly, plus different ways of thinking about problems, with the added bonus that you can actually use what you learn. You can also understand a lot about programming languages in general (issues like typing, scoping, functional vs. imperative) by reading a good book.

There’s another pernicious effect to language hopping: it hurts your momentum. Every time you rely on your current languages you get a little better. Not in a fluffy expand-your-mind way, but in a concrete way. You learn more about your libraries, you set up a new macro in the editor, you have a chance to use that new language feature. Scott Hanselman argues that learning a new language is sharpening your saw, but I see it as neglecting your half-sharpened saw while playing with the dull, new, shiny one. The upfront cost is not the only one either. It’s better to have 3 razor-sharp saws than 8 so-so ones. Each new language you add to your toolbox is making it harder for you to become furiously productive in any given language.

Clipper manual
Forget Ruby - Here’s Clipper!

Yet, any programmer worth their weight in silicon must know multiple languages. Sometimes the new saw is of a different type altogether, and it’s worth having. Right off the bat there are major obvious reasons. Different systems or parts of a system call for different languages; that’s been true in any environment I’ve ever worked in. For a while this was mostly due to speed and level of control. My first apps were written for MS-DOS in Clipper, which was a database-oriented rapid development language. Fast to develop, but no power. Soon enough we wanted to add features that called for C and assembly. Using C we could write terminate and stay resident (TSR) programs and spice up our apps with features no one else had. Sometimes the issue was not so much power, but speed. There have been many happy marriages to deal with this: Tcl and C, VB and C++, Perl and C, you name it. Fast processors and web apps have largely killed the speed/power motive. Computers can happily run applications written wholly in Python or Ruby. And if they can’t, a different language probably won’t help; you just need more web servers. But alas, you now need to know SQL and JavaScript too, so we’re back to obvious reasons for multiple languages.

Aside from the immediate reasons, there’s some merit to the mind expansion argument. I think being proficient in at least two languages is indeed important for boosting your ability as a developer. This resembles human language: learning a second one changes the way you think and your perception of the world. The third or fourth, not as much. But it can’t be any two languages. If you know Portuguese and Spanish, your mind didn’t have to expand much. Likewise, learning VB.NET and C# doesn’t count. Also, I agree that some programming languages are hazardous to your skills if used exclusively. Edsger Dijkstra claimed COBOL crippled the mind and that its teaching should be regarded as a criminal offense. We all know who’s the new COBOL. Java, the kingdom of nouns, is a programming straitjacket. I imagine Dijkstra would have called for harsh no-parole sentences for any CS Department chairs whose students learn only Java. If you write a lot of Java code, being fluent in a richer language does sharpen your saw. This is true for other statically typed languages, but to varying degrees. More on that in a bit.

Java protects developer from self

You might think this is contradictory. You’d be right. Life’s not simple; sorry, I wish it were. The realities are:

But there’s a sane way to deal with these. Why, you just need to find the minimal language set. The smallest set of languages it takes to crank out great software quickly while growing as a programmer and making rivers of cash. In the next entry I’ll talk about my personal language set and the factors I used to compose it.

Reality-Driven Development

Holy mango! Talk about unexpected. When I wrote my last entry on Feynman and engineering, I was aiming for my 5-strong subscriber base. After one-time deductions of friends and family, that’s a negative number of readers. Not in a million years could I have guessed it would be on Slashdot. But now a decent respect for my newfound readership compels me to explain myself a bit better (or try, anyway).

The biggest controversy was around the "bottom-up" idea. A number of people, including NASA engineers, wrote me about the need for top/down balance. I agree with this view. Feynman’s "bottom-up" is not a dismissal of top-down analysis. As he talks about the lack of a "preliminary study of materials and components" in relation to the engine, it’s clear that such a study would be guided by a plan and exploratory design. After all, engineers can’t randomly test materials until a space shuttle engine crystallizes in front of them. The problem Feynman points out is the lack of essential information about reality in the design. Analysis is important, but it must not overrule or disregard reality. And reality is best exposed by the utmost bottom-up affair: experimentation. Feynman’s bottom-up is empiricism plus the "attitude of highest quality".

John Locke
He came from the same island as Martin Fowler

I’m not going to dwell on philosophy lest this degenerate into postmodern blabber. For those interested, I think Feynman’s flavor of science is best shown in the last chapter in The Character of Physical Law and in the electromagnetism and quantum mechanics bits of The Feynman Lectures on Physics. The brilliant empirical mind behind Appendix F is laid bare in these wonderful, fun books. But how does this apply to software? Empiricism in a project context is described well in the business literature. Here’s what In Search Of Excellence has to say in the chapter "A Bias For Action":

The problem we’re addressing (…) is the all-too-reasonable and rational response to complexity in big companies: coordinate things, study them, form committees, ask for more data(…). Indeed, when the world is complex, as it is in big companies, a complex system often does seem in order. But this process is usually greatly overdone. Complexity causes the lethargy and inertia that make too many companies unresponsive.

The important lesson from the excellent companies is that life doesn’t have to be that way. Their mechanism comprises a wide range of action devices especially in the area of management systems, organizational fluidity, and experiments. (…)

There is no more important trait among excellent companies than an action orientation. (…) They don’t indulge in long reports. Nor do they install formal matrixes. They live in accord with the basic human limitations we described earlier: people can only handle a little bit of information at one time.

Finally, and most important, is the user connection. The customer, especially the sophisticated customer, is a key participant in most successful experimenting processes.

Action and experimentation are the cornerstones of empiricism. No attempt is made to subdue reality by extensive analysis and copious documentation. Reality is invited in via experiments. Instead of agonizing over market research, an empirical company hires interns and develops a product in one summer. A non-empirical company has 43 people planning an off-button design for one year. Empirical companies still rely on analysis. P&G has memos; they’re just limited to one page. But software projects are not after "empirical reality"; we just want working products. Built to Last deftly relates experiments to process in a chapter entitled "Try a Lot of Stuff and Keep What Works":

What looks in hindsight like a brilliant strategy was often the residual result of opportunistic experimentation and "purposeful accidents".

Bill Hewlett told us that HP "never planned more than two or three years out". (…) We could go on with examples from Citicorp, Philip Morris, GE, Sony, and others. (…) We were surprised to find so many examples of key moves by the visionary companies that came about by some process other than planning. Nor do these examples merely represent random luck. No, we found something else at work (…): evolutionary progress. Evolutionary progress begins with small incremental steps

After dubbing 3M the "Mutation Machine From Minnesota" the authors say:

If we had to bet our lives on the continued success and adaptability of any single company (…), we would place that bet on 3M. Using 3M as a blueprint for evolutionary progress at its best, here are five basic lessons (…).

  1. Give it a try - and quick!
  2. Accept that mistakes will be made.
  3. Take small steps.
  4. Give people the room they need.
  5. Mechanisms–build that ticking clock

Built to Last makes the inescapable link to biological evolution, the epitome of bottom-up experimental development. Top companies experiment vigorously with products and processes, driven by the market and organizational metrics. Nature experiments with genetic variation, driven by natural selection. The common theme is that successful systems are driven by reality through experimentation. That’s dandy, but how about software? The best discussion I know of software-as-evolution is the famous LKML thread where Linus shuns top-down design in favor of experimentation. I think of it this way:

Reality-driven development

A good software development process should optimize experimentation and improve feedback from reality. This is what I mean by reality-driven development. And in software the most important realities are user experience and technical quality, while the primary experiments are working software and code. This isn’t a formal model (heh), it’s simply my favorite analogy for software development. I like the name "reality-driven" because when you mention reality people think of users. And I like the model because it helps me focus on important stuff and on effective ideas, like Paul Graham’s advice to release early and let the market design the product. It also has good explanatory power. Firefox is such a great browser due to intense experimentation in the form of add-ons. Waterfall is so awful because reality is ignored: when the time for feedback comes, the project is over.

There is no specific reality-driven methodology. The Agile principles have a lot in common with these ideas (and certainly influenced them), but the devil is in the details. I prefer to think of software engineering in terms of a toolbox, full of techniques we pick and choose for the right situation. Process tools for optimizing experimentation include iterative development, executable architecture, continuous integration, and unit testing.

Based on this model, the two realities we care about are user experience (including the software’s utility) and technical quality. User experience is often neglected in agile and waterfall alike. The measurement tools come from the usability people and from plain old business sense. Techniques include usability testing, observing users, spending time with users (preferably in their habitat), talking to users, and hugging users. Technical quality revolves around the code base and third-party tools. Here we’re looking for the ol’ bit of ultraviolence plus generality, clarity, simplicity, security, etc. Tools include code inspections, code reviews, and metric reports as part of the build. The elusive hiring of good programmers is crucial, but it’s not measurement, so it falls within the "software project" box.

When I think about prerequisites (requirements and top-down design) I do so in the context of this reality-driven model. Prerequisites can optimize experimentation by minimizing cost and risk. I have seen how well-written requirements can quickly take a team from zero to working software that’s close to users’ wishes. Likewise, good top-down design can help achieve technical quality faster. But I think of prerequisites as sketches, not blueprints. I prefer minimal specs that produce working software to be molded by the users. And rigid upfront design is a sure way to a crappy code base or engineering disasters. Alistair Cockburn put it best: "With design I can think very fast, but my thinking is full of little holes."

In the end, feedback from reality helps you avoid Ivory Tower Development and pass the Ultimate Unit Test. You make your users happy. A reality-driven process with management buy-in purges faulty o-rings and gets the right materials in a shuttle engine. It avoids abominable applications. It brings money and fame and huge obelisks in your honor. So now you know my idea of bottom-up:

  1. Have a bias for experiment over analysis, though both have their place.
  2. Optimize experiments: make them as early, fast, cheap, and broad as you can. Analysis can help here.
  3. Experiment vigorously.
  4. Be smart and proactive about measuring reality: user experience and technical quality.
  5. React to feedback. Let reality drive.

Of course, you can turn the empirical machine towards the process itself, and try to improve the way you build rather than what you build ("It’s fractal, dude!"). That’s the whole point of Built to Last. Also, I’ve found that Built to Last and In Search Of Excellence work well for explaining evolutionary/agile ideas to senior management.

I hope I didn’t kill the aforementioned newfound readership by boredom. Thanks for reading and see you next time. The new server arrives on Friday.

Evolutionary Government

Richard Feynman, the Challenger Disaster, and Software Engineering

Challenger Crew

On January 28th, 1986, Space Shuttle Challenger was launched at 11:38am on the 6-day STS-51-L mission. During the first 3 seconds of liftoff the o-rings (o-shaped loops used to connect two cylinders) in the shuttle’s right-hand solid rocket booster (SRB) failed. As a result hot gases with temperatures above 5,000 °F leaked out of the booster, vaporized the o-rings, and damaged the SRB’s joints. The shuttle started its ascent, but seventy-two seconds later the compromised SRB pulled away from the Challenger, leading to sudden lateral acceleration. Pilot Michael J. Smith uttered "Uh oh" just before the shuttle broke up. Torn apart by excessive force, it disintegrated rapidly. Within seconds the severed but nearly intact crew cabin began to free fall and seven astronauts plunged to their deaths. I was a child then and remember watching in horror as Brazilian TV showed the footage.

Challenger Explosion

At the time I didn’t know that SRB engineers had previously warned about problems in the o-rings, but had been dismissed by NASA management. I also didn’t know who Richard Feynman or Ronald Reagan were. It turns out that President Reagan created the Rogers Commission to investigate the disaster. Physicist Feynman was invited as a member, but his independent intellect and direct methods were at odds with the commission’s formal approach. Chairman Rogers, a politician, remarked that Feynman was "becoming a real pain." In the end the commission produced a report, but Feynman’s rebellious opinions were kept out of it. When he threatened to take his name out of the report altogether, they agreed to include his thoughts as Appendix F - Personal Observations on the Reliability of the Shuttle.

It is a good thing it was included, because the 10-page document is a work of brilliance. It has deep insights into the nature of engineering and into how reliable systems are built. And you see, I didn’t put ‘software’ in the title just to trick you. Feynman’s conclusions are general and very much relevant for software development. After all, as Steve McConnell tirelessly points out, there is much in common between software and other engineering disciplines. But don’t take my word for it. Take Feynman’s:

The Space Shuttle Main Engine was handled in a different manner, top down, we might say. The engine was designed and put together all at once with relatively little detailed preliminary study of the material and components. Then when troubles are found in the bearings, turbine blades, coolant pipes, etc., it is more expensive and difficult to discover the causes and make changes.

So software is not the only discipline where the longer a defect stays in the process, the more expensive it is to fix. It’s also not the only discipline where a "top down" design, made in ignorance of detailed bottom-up knowledge, leads to problems. There is however a difference here between design and requirements. The requirements for the engine were clear and well defined. You know, go to space and back, preferably without blowing up. Feynman is arguing not so much against Joel’s functional specs, but rather against top down design such as that advocated by the UML as blueprint crowd. On goes Feynman:

The Space Shuttle Main Engine is a very remarkable machine. It has a greater ratio of thrust to weight than any previous engine. It is built at the edge of, or outside of, previous engineering experience. Therefore, as expected, many different kinds of flaws and difficulties have turned up. Because, unfortunately, it was built in the top-down manner, they are difficult to find and fix. The design aim of a lifetime of 55 missions equivalent firings (27,000 seconds of operation, either in a mission of 500 seconds, or on a test stand) has not been obtained. The engine now requires very frequent maintenance and replacement of important parts, such as turbopumps, bearings, sheet metal housings, etc.

Richard Feynman

Unfortunate top down manner, difficult to find and fix, failure to meet design requirements, frequent maintenance. Sound familiar? Is software engineering really a world apart, removed from its sister disciplines? Feynman elaborates on the difficulty in achieving correctness due to the ‘top down’ approach:

Many of these solved problems are the early difficulties of a new design. Naturally, one can never be sure that all the bugs are out, and, for some, the fix may not have addressed the true cause.

Whether it’s the Linux kernel or shuttle engines, there are fundamental cross-discipline issues in design. One of them is the folly of a top-down approach, which ignores the reality that detailed knowledge about the bottom parts is a necessity, not something that can be abstracted away. He then talks about the avionics system, which was done by a different group at NASA:

The software is checked very carefully in a bottom-up fashion. First, each new line of code is checked, then sections of code or modules with special functions are verified. The scope is increased step by step until the new changes are incorporated into a complete system and checked. This complete output is considered the final product, newly released. But completely independently there is an independent verification group, that takes an adversary attitude to the software development group, and tests and verifies the software as if it were a customer of the delivered product.
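
In today’s terms, the checking in that passage might look roughly like the sketch below. It is only an illustration: the functions (parse_reading, mean, mean_of_lines) are made up for the example and have nothing to do with the actual shuttle software. The smallest pieces are checked first, then the module built from them, and finally a separate set of checks plays the adversarial customer.

```python
import unittest


# Hypothetical functions, invented purely to illustrate the idea.
def parse_reading(text):
    """Smallest unit: turn one line of text into a number."""
    return float(text.strip())


def mean(values):
    """Second unit: average a non-empty list of numbers."""
    if not values:
        raise ValueError("no readings to average")
    return sum(values) / len(values)


def mean_of_lines(lines):
    """Module built from the two units above; blank lines are skipped."""
    return mean([parse_reading(line) for line in lines if line.strip()])


class UnitChecks(unittest.TestCase):
    """Step 1: check each small piece on its own."""

    def test_parse_reading(self):
        self.assertEqual(parse_reading(" 2.5\n"), 2.5)

    def test_mean(self):
        self.assertEqual(mean([1.0, 2.0, 3.0]), 2.0)


class ModuleChecks(unittest.TestCase):
    """Step 2: widen the scope to the pieces working together."""

    def test_mean_of_lines(self):
        self.assertEqual(mean_of_lines(["1", "", "3\n"]), 2.0)


class IndependentVerification(unittest.TestCase):
    """Step 3: act like a skeptical customer and feed in hostile input."""

    def test_empty_input_is_rejected(self):
        with self.assertRaises(ValueError):
            mean_of_lines(["", "   "])

    def test_garbage_is_rejected(self):
        with self.assertRaises(ValueError):
            mean_of_lines(["1", "not a number"])
```

Saved under some file name (say test_readings.py, also made up), the checks run with `python -m unittest test_readings`. In a real team the third class would be written by people other than the authors of the code, which is the part of Feynman’s description that a single file can’t really capture.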

Yes, go ahead and pinch yourself: this is unit testing described in 1986 by the Feynman we know and love. Not only unit testing, but ‘step by step increase’ in scope and ‘adversarial testing attitude’. It’s common to hear we suck at software because it’s a "young discipline", as if the knowledge to do right has not yet been attained. Bollocks! We suck because we constantly ignore well-established, well-known, empirically proven practices. In this regard management is also to blame, especially when it comes to dysfunctional schedules, wrong incentives, poor hiring, and demoralizing policies. Management/engineering tensions and the effects of bad management are keenly discussed by Feynman in his report. Here is one short example:

To summarize then, the computer software checking system and attitude is of the highest quality. There appears to be no process of gradually fooling oneself while degrading standards so characteristic of the Solid Rocket Booster or Space Shuttle Main Engine safety systems. To be sure, there have been recent suggestions by management to curtail such elaborate and expensive tests as being unnecessary at this late date in Shuttle history.

This is one of many passages. I picked it because it touches on other points, such as the ‘attitude of highest quality’ and the ‘process of gradually fooling oneself’. I encourage you to read the whole report, unblemished by yours truly. With respect to software, I take out four main points:

There are other interesting themes in there, and Feynman’s insight can’t be captured in a few bullet points, much less by me. What do you get out of it?

Feynman's last board at Caltech

ASP.NET Runtime Cheat Sheet

This weekend I created an ASP.NET Runtime Cheat Sheet to be used as a quick reference for HttpRequest, HttpRuntime and AppDomain/Process/Identity stuff.

It shows several members of those classes, each with its live value from my site, a link to MSDN, and some explanations. I included links to useful tools (like Process Monitor) and good posts (like anything K. Scott Allen writes). The idea is to be brief and have the highest possible information-per-word ratio.

I wrote all of the information retrieval code as a user control, so it is easy to embed into an application for debugging. The code is in AspNetRuntimeDiagnostics.ascx, MIT-licensed as usual. The output should be restricted to trusted users though, since it hands a lot of information to potential attackers.

I hope this is useful to others. There are a couple of bits that could use more description, which I plan to add. The cheat sheet is a live document; your suggestions and corrections are very welcome.