I just found out that Stephen Colbert’s father and two brothers died in a plane crash on September 11, 1974. Maybe everybody knows this – I’m not sure because I haven’t watched TV in years, so I live in a sort of alternate reality. My only exposure to TV is YouTube clips of Jon Stewart, Colbert, and lots of Dora the Explorer (Jon Stewart is my favorite but Swiper the Fox is a close second, don’t tell my kids though). Now, I may not have TV to keep me informed, but I do read aircraft accident reports and transcripts from cockpit voice recorders. That doesn’t help in small talk with the neighbors, but you read some amazing stuff.
For example, in the accident that killed Colbert’s father the pilots were chatting about politics and used cars during the landing approach. They ignored their altitude and eventually ran the plane into the ground about 3 miles away from the destination airport. The report by the National Transportation Safety Board (NTSB) states that “both crew members [first officer and captain] expressed strong views and mild aggravation concerning the subjects discussed.” Since the full cockpit voice recorder (CVR) transcript is not available we’re free to imagine a Democrat and a Republican arguing amid altitude alerts.
Aviation accidents are both tragic and fascinating; few accidents can be attributed to a single factor and there is usually, well, a series of unfortunate events leading to a crash. The most interesting CVR transcript I’ve read is Aeroperu 603. It covers an entire flight from the moment the airplane took off with its static ports taped over – causing airspeed, altitude, and vertical speed indicators to behave erratically and provide false data – until the airplane inverted into the Pacific Ocean after its left wing touched the sea, concluding a mad, slow descent in which crew members were bombarded with multiple, false, and often conflicting flight alerts. The transcript captures the increasing levels of desperation, the various alerts, and the plentiful cussing throughout the flight (there’s also audio with subtitles). As you read it your brain hammers the question: how do we build stuff so things like this can’t happen?
Static ports covered by duct tape in Aeroperu 603
The immediate cause of the Aeroperu problem was a mistake by a ground maintenance worker who left duct tape over the airplane’s static ports. But there were a number of failures along the way in maintenance procedures, pilot actions, air traffic control, and arguably aircraft design. This is where agencies like the NTSB and their counterparts abroad do their brilliant and noble work. They analyze the ultimate reason behind each error and failure and then issue recommendations to eradicate whole classes of problems. It’s like the five whys of the Toyota Production System on steroids, with fixes attached. Fixes are deep and broad, never one-off band-aids.
Take the Colbert plane crash. You could define the problem as “chatter during landing” and prohibit that. But the NTSB went further: they saw the problem as “lack of professionalism” and issued two recommendations to the FAA with a series of concrete steps towards boosting professionalism in all aspects of flight. Further NTSB analysis and recommendations culminated a few years later in the Sterile Cockpit Rule, which lays down precise rules for critical phases of flight including takeoff, landing, and operations under 10,000 feet. Each aviation accident, error, and causal factor spurs recommendations to prevent it, and anything like it, from ever happening again. Because the solutions are deep, broad, and smart we have achieved remarkable safety in flight.
In other words, it’s the opposite of what we do in software development and computer security. We programmers like our fixes quick and dirty, yes sirree, “patches” we call them. It doesn’t matter how critical the software is. Until 1997 Sendmail powered 70% of the Internet’s reachable SMTP servers, qualifying it as critical by a reasonable measure (its market share has since decreased). What was the security track record? We had bug after bug after bug, many with disastrous security implications, and all of them fixed with a patch as specific as possible, thereby guaranteeing years of continued new bugs and exploits. Of course this is not as serious as human life, but for software it was pretty damn serious: these were bugs allowing black hats to own thousands of servers remotely.
And what have we learned? If you fast forward a few years, replace “Sendmail” with “WordPress” and “buffer overflow” with “SQL injection/XSS”, cynics might say “nothing.” We have different technologies but the same patch-and-run mindset. I upgraded my blog to WordPress 2.5.1 the other day and boy I feel safe already! Security problems are one type of bug; the same story plays out for other problems. It’s a habit we programmers have of not fixing things deeply enough, of blocking the sun with a sieve.
We should instead be fixing whole classes of problems so that certain bugs are hard or impossible to write. This is easier than it sounds. Dan Bernstein wrote a replacement for Sendmail called qmail and in 1997 offered a $500 reward to anyone who found a security vulnerability in his software. The prize went unclaimed and after 10 years he wrote a paper reviewing his approaches, what worked, and what could be better. He identifies only three ways for us to make true progress:
- Reduce the bug rate per line of code
- Reduce the amount of code
- Reduce trusted code (which is different than least privilege)
This post deals only with the first item above; I hope to write about the other two later on. Reducing the bug rate is a holy grail in programming and qmail was very successful in this area. I’m sure it didn’t hurt that Bernstein is a genius, but his techniques are down to earth:
For many years I have been systematically identifying error-prone programming habits—by reviewing the literature, analyzing other people’s mistakes, and analyzing my own mistakes—and redesigning my programming environment to eliminate those habits. (…)
Most programming environments are meta-engineered to make typical software easier to write. They should instead be meta-engineered to make incorrect software harder to write.
In the 1993 book Writing Solid Code Steve Maguire gives similar advice:
The most critical requirement for writing bug-free code is to become attuned to what causes bugs. All of the techniques in this book are the result of programmers asking themselves two questions over and over again, year after year, for every bug found in their code:
- How could I have automatically detected this bug?
- How could I have prevented this bug?
For a concrete example, look at SQL Injection. How do you prevent it? If you prevent it by remembering to sanitize each bit of input that goes to the database, then you have not solved the problem, you are using a band-aid with a failure rate – it’s Russian Roulette. But you can truly solve the problem by using an architecture or tools such that SQL Injections are impossible to cause. The Ruby on Rails ActiveRecord does this to some degree. In C# 3.0, a great language in many regards, SQL Injections are literally impossible to express in the language’s built-in query mechanism (LINQ). This is the kind of all-encompassing, solve-it-once-and-for-all solution we must seek.
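To make the difference concrete, here is a minimal sketch in Python using the standard `sqlite3` module. It is not how ActiveRecord or C# does it; the `users` table and the function names are made up for illustration. The first function relies on the programmer remembering to sanitize. The second keeps the value out of the SQL text entirely, so there is nothing to remember.

```python
import sqlite3


def find_user_unsafe(conn, name):
    # Band-aid territory: the SQL text is glued together from user input,
    # so safety depends on somebody sanitizing `name` every single time.
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # The value travels separately from the SQL text as a bound parameter,
    # so it is only ever treated as data, never as part of the query.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()


def make_demo_db():
    # Hypothetical in-memory table, just enough to run the two queries.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.executemany(
        "INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)]
    )
    return conn


conn = make_demo_db()
attack = "nobody' OR '1'='1"
leaked = find_user_unsafe(conn, attack)   # the OR clause matches every row
blocked = find_user_safe(conn, attack)    # no user has that literal name
```

The deeper fix is to make the second style the only one your data layer exposes, so the unsafe version cannot be written in the first place.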
It’s important to take a broad look at our programming environments to come up with solutions for preventing bugs. This mindset matters more than specific techniques; we’ve got to be in the habit of going well beyond the first “why”. Why have we wasted hundreds of thousands of man-hours looking for memory leaks, buffer overflows, and dangling pointers in C/C++ code? It wasn’t just because you forgot to free() or you kept a pointer improperly, no. That was a symptom. The reality is that for most projects using C/C++ was the bug; it didn’t just facilitate bugs. We can’t tolerate environments that breed defects instead of preventing them.
Multi-threaded programming is another example of a perverse environment where things are the opposite of what they should be: writing correct threading code is hard (really hard), but writing threading bugs is natural and takes no effort. Any design that expects widespread mastery of concurrency, ordering, and memory barriers as a condition for correctness is doomed from the start. It needs to be fixed so that bug-free code is automatic rather than miraculous.
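One modest way to get closer to “automatic” is to never share mutable state between threads at all. The sketch below is only an illustration, with a made-up `word_count` task: each worker gets its own input, returns its own result, and the pool takes care of the hand-off, so there are no locks to get wrong.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    # Pure function: it reads only its argument and touches no shared state,
    # so it is safe to run on any thread without locks.
    return len(text.split())


# Read-only input: a tuple cannot be modified by a stray worker.
documents = (
    "static ports taped over",
    "sterile cockpit rule",
    "patch and run",
)

with ThreadPoolExecutor(max_workers=3) as pool:
    # map() hands each document to a worker and returns results in input order.
    counts = list(pool.map(word_count, documents))

total = sum(counts)  # combined on one thread, after the workers are done
```

This does not make concurrency easy in general. It just shows that when threads share nothing, whole categories of ordering and locking bugs have nowhere to live.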
There are a number of layers that can prevent a bug from infecting your code: software process, tools, programming language, libraries, architecture, unit tests, your own habits, etc. Troubleshooting this whole programming stack, not just code, is how we can add depth and breadth to our fixes and make progress. The particulars depend on what kind of programming you do, but here are some questions that might be worth asking, in the spirit of the questions above, when you find a bug:
- Are you using the right programming language? Does it handle memory for you? Does it help minimize lines of code and duplication? (Here’s a good overall comparison and an interesting empirical study)
- Could a better library or framework have prevented the bug (as in the SQL Injection example above)?
- Can architecture changes prevent that class of bug or mitigate their impact?
- Why did your unit tests fail to catch the bug?
- Could compiler warnings, static analysis, or other tools have found this bug?
- Is it at all possible to avoid explicit threading? If so, shun threads because they’re a bad idea. Otherwise, can you eliminate bugs by isolating the threads (reduce shared state aggressively, use read-only data structures, use as few locks as possible)?
- Is your error-handling strategy simple and consistent? Can you centralize and minimize catch blocks for exceptions?
- Are your class interfaces bug prone? Can you change them to make correct usage obvious, or better yet, incorrect usage impossible?
- Could argument validation have prevented this bug? Assertions?
- Would you have caught this bug if you regularly stepped through newly written code in a debugger while thinking of ways to make the code fail?
- Could software process tools have prevented this bug? Continuous integration, code reviews, programming conventions and so on can help a lot. Can you modify your processes to reduce the bug rate?
- Have you read Code Complete and the Pragmatic Programmer?
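To illustrate the questions about class interfaces and argument validation, here is a small sketch. The `Port` type and `describe` function are hypothetical, invented for this example. The idea is to validate once, at construction, so that every function accepting a `Port` can assume it is dealing with a legal value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    """A TCP port number that is valid by construction (hypothetical type)."""

    number: int

    def __post_init__(self):
        # Argument validation happens once, at the boundary.
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(self.number, bool) or not isinstance(self.number, int):
            raise TypeError("port must be an int")
        if not 1 <= self.number <= 65535:
            raise ValueError(f"port out of range: {self.number}")


def describe(port: Port) -> str:
    # The assertion documents the contract and catches callers passing raw ints.
    assert isinstance(port, Port), "callers must pass a validated Port"
    return f"listening on port {port.number}"


smtp = Port(25)
message = describe(smtp)
```

Because the dataclass is frozen, a `Port` cannot be mutated into an invalid state after it is created, so the check in `__post_init__` holds for the lifetime of the object.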
Just as airplanes still crash, we’ll always have our bugs, but we could do a lot better by improving our programming ecosystem and habits rather than just fixing the problem of the hour. The outstanding work of the NTSB is great inspiration. I’m still scared of flying though – think of all the software in those planes!
[Disclosure: I am a dual American/Brazilian citizen. I've used Linux since 1996 and Microsoft products since 1990. I like both platforms.]
Brazil often makes Linux-related headlines, the latest being the adoption of KDE in Brazilian public schools. It’s clear that Brazil is enamored with Linux, but why? This is an important question for Microsoft since emerging markets are key to sales growth. Microsoft’s Annual Report 2007 reported that “impressive growth included India, China, and Brazil which all delivered revenue growth that topped 40 percent”, which is much faster than growth in developed countries. These markets are also friendly towards Linux and pose significant challenges for Microsoft. This post is my take on the reasons for Brazil’s fondness for Linux. I speak for Brazil since I was born and raised there, but I think much of this applies to the other BRIC countries and emerging markets in general.
The first and obvious argument is economic: free as in beer is a big deal in Brazil’s economy. The table below contrasts the economics of license costs in the US and in Brazil:
| |US|Brazil|
|---|---|---|
|Gross National Income (GNI) per capita|$44,710|$4,710|
|Cost of Windows Vista Business|$186|$364|
|Cost of MS Office 2007 Standard|$289|$587|
|Cost of Business Licenses as % of GNI per capita|1.06%|20.1%|
|Cost of Windows Vista Home Basic|$116|$252|
|Cost of Office Home/Student|$109|$117|
|Cost of Home Licenses as % of GNI per capita|0.5%|7.8%|

All figures in US dollars. An exchange rate of USD$1 = R$1.70 was used to compute the cost of licenses in Brazil.
You might be surprised to learn that Microsoft licenses are nearly twice as expensive in Brazil in absolute terms. I imagine Microsoft charges about the same and Brazil’s brutal tax burden makes up the rest (the taxes are built into the price). But the interesting result is the relative price of licenses in each society, captured as % of GNI per capita. As a proportion of national incomes, business licenses are nineteen times more expensive to Brazilian society and home licenses are fifteen times more expensive. While GNI per capita is not a perfect figure, it reflects the incomes people make, how much they spend to live, and how much they pay in taxes. It is a crucial number when it comes to public policy; it’s not hard to understand why rational policies must dodge licensing costs when possible. If there’s any hope of widespread computer access, then surely we can’t expect people to spend 7.8% of their annual income on Microsoft software licenses alone. The burden on small businesses is also prohibitive. This order-of-magnitude difference is a fundamental problem that can’t be solved by piecemeal license giveaways. Suppose Microsoft gave out Windows and Office wholesale to all schools. Then what happens if those kids need a computer at home or in their parents’ business? License costs are simply out of whack with respect to most of society. Using Linux in public schools, rarely attended by richer kids, seems inescapable.
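If you want to check the arithmetic, here is a quick sketch that recomputes the percentages and ratios from the dollar figures in the table. The variable and function names are mine; the numbers are the ones quoted above. Expect small differences in the last digit depending on how you round.

```python
# Dollar figures copied from the table above.
GNI = {"US": 44_710, "Brazil": 4_710}

# Windows Vista Business + MS Office 2007 Standard
BUSINESS = {"US": 186 + 289, "Brazil": 364 + 587}

# Windows Vista Home Basic + Office Home/Student
HOME = {"US": 116 + 109, "Brazil": 252 + 117}


def share_of_income(cost, country):
    # License bundle cost as a percentage of GNI per capita.
    return 100 * cost[country] / GNI[country]


def relative_burden(cost):
    # How many times heavier the bundle weighs on Brazilian incomes than on US ones.
    return share_of_income(cost, "Brazil") / share_of_income(cost, "US")


business_ratio = relative_burden(BUSINESS)
home_ratio = relative_burden(HOME)

for label, cost in (("business", BUSINESS), ("home", HOME)):
    print(
        f"{label}: US {share_of_income(cost, 'US'):.2f}%, "
        f"Brazil {share_of_income(cost, 'Brazil'):.2f}%, "
        f"ratio {relative_burden(cost):.1f}x"
    )
```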
Notice that I didn’t use Windows Vista Starter Edition in the figures. This is because I find the limitation of three simultaneous programs absurd. It’s hard to believe Microsoft put in such an abominable restriction; it’s one thing to quietly omit features, it’s quite another to slap people in the face with “Sorry, no, only 3 programs! Click OK to continue.” Even the limited hardware supported by Vista Starter can easily run multiple programs, so that’s no excuse. I imagine a kid trying to learn programming on such a machine, trying to run a few tools plus a test application, and being told to bugger off. How is this bridging the digital divide? Besides, there are limitations on buying Vista Starter – a family receiving a donated computer, for example, cannot buy a retail version of it. And to cap it off, if they went ahead and bought OEM, the dollar figure for Vista Starter + Office Home comes to 5% of GNI per capita, still an order of magnitude above the US figure.
Looking at these numbers, you might wonder how Microsoft sales could grow 40% in Brazil last year – I mean, do they even have computers there? It turns out that Brazil has both the 10th largest economy in the world and the 11th worst distribution of income. There are wealthy households, businesses, and government departments to whom license prices matter far less. For example, after the dollar plunge the cost in dollars of a programmer in Brazil is close to that of one in the US, provided the employer is paying all taxes (the norm for mid-size and large businesses). These wealthier pockets comprise a sizable market whose landscape is more similar to the US: labor costs dwarf license costs, MS Office is a near-monopoly, and inertia is in Microsoft’s favor. Since this market is in Brazil’s economy the licensing costs still consume relatively more purchasing power, but Microsoft can definitely compete in these niches. Except there’s more to the story than economics.
Many cultural issues work against Microsoft to mobilize Brazilians in favor of the Penguin. I’ll hit up the three I deem biggest: 1) utter disregard for copyright, 2) strong anti-Microsoft feelings, and 3) Linux alpha geek monopoly. A thorny reality aggravates the first two issues: anti-American sentiment. It’s worth looking at this sentiment for context. The Pew Research Center runs the Pew Global Attitudes Project to track global public opinion on a variety of issues. Their latest report shows continued decline in US image, which has plummeted around the globe in the past 5 years. Here’s the data:
I was shocked to see these results at first. Things might not be as bleak as the numbers suggest though. Some of the backlash is not structural, but rather directed at the current US Administration. It may well subside come January 2009. Yet it’s important for American companies to factor this in when thinking about markets abroad. Nothing new there, except for how bad things got. Anti-American sentiment is particularly strong in three of the BRIC countries: Brazil, Russia, and China. (In Brazil this is only a political/ideological thing; one-on-one, people are still as friendly as ever towards Americans and everyone else.) Keep these numbers in mind when thinking about the factors below, starting with disregard for copyright.
When I was growing up in Brazil, paying for software licenses was about as natural as a third arm growing out of your back. Whenever you needed software, you’d dial up a friendly pirate and buy a “collection” for, uh, $30. That included, my friends, instant home delivery: the guy would drive to your house and deliver the collection. If you were programming at night, he might even bring you a pizza. The best pirates had good access to the warez scene and could find anything in case you had a custom request. In the collection you’d find cracked versions of several major pieces of software from various manufacturers. How convenient. Borland Delphi? Check. Visual Studio? Check. Windows NT 4.0 Server, Workstation? Check. Linux? Check (saves you the download). It was like MSDN for the whole computer industry! The piracy happened regardless of income levels – people “buying” the software were by no means poor (otherwise they wouldn’t have a computer in the first place, at that point in time). Many could easily afford licenses, yet felt absolutely no qualms about piracy. The whole culture disregarded copyrights deeply. To most people, the pirate was doing honest work: downloading all this stuff, burning it, delivering it. An honorable job.
Things have changed since then, but not much. Copyright enforcement is more serious; piracy within mid-size or large businesses is rare. There is more copyright awareness (or indoctrination, depending on whom you ask). Home users still pirate anything they can though. This is not restricted to software either: visit any campus in Brazil and you’ll see rampant photocopying of textbooks. Street vendors sell DVDs filled with MP3s, movies, you name it. I’m sure Hollywood execs have nightmares where they’re roaming the streets of Brazil. The culture still expects free distribution and the environment is very hostile to proprietary software licenses.
Before Windows Genuine Advantage, Microsoft’s strategy was to ignore the pirates: sell to the corporations, let everyone else copy it. Now regular people are growing third arms and paying for Windows licenses. Fair enough: middle-class fat cats should not be ripping off your software. The trouble is that nowadays not all cats are fat: years of sound macroeconomic policy have allowed lower-middle-class people to buy computers. This is a change from when I was there. There are now many folks who, though not poor, definitely have a very hard time paying for licenses. They either go to Linux or go unpatched. The need to buy software would effectively keep these families out of computing: they do back flips just to get the hardware itself, in the hopes of giving their children a better shot. And richer people resent paying for software, however messed up that is. Every customer cut off by Windows Genuine Advantage is a possible conversion to Linux or at the very least a little more pressure towards migration. If people had to pay for Office too, there’d be gnashing of teeth. I’m not suggesting Microsoft is responsible for fixing severe income inequality or supporting middle-class freeloaders; that’s just the nut they need to crack with a creative revenue model because Linux fits like a glove. On to the next issue.
Vista Starter caps you at 3 programs. This is not how you win friends and influence people.
Brazil imported the anti-Microsoft stance common in American geeks, but on top of the usual arguments Microsoft is foreign. This adds fuel to the flame. To the Brazilian Microsoft hater, not only is there an “evil monopoly”, but its profits are repatriated and its jobs are elsewhere. Practices like the 3-program limitation on Vista Starter further erode good will (Brazilians call it the “castrated Windows” among other colorful names). Add a dash of anti-American sentiment and you’ve got some serious resistance. This fiery mood has a strong influence, from the teenager hanging out in #hackers on Brasnet to IT departments to the federal government. Even in a rational self-interest analysis, one might rightly point out that if free/open source software (FOSS) were to wipe out Windows, negative effects on Brazil’s economy would likely be minimal. The wealth, jobs, and opportunity created by Microsoft aren’t in Brazil (productivity gains might be, but that’s a whole different argument). The trade-offs of a potential Linux/Google takeover are different when there’s no national off-the-shelf software industry, plus Google’s revenue model works beautifully in a developing country. This mix of ideological and rational arguments torpedoes Microsoft’s support.
The third cultural issue working for Linux is more subtle. In the US people talk about Microsoft losing the alpha geeks, but in Brazil FOSS has already reached a near-monopoly on them. Again, the standard reasons apply but are augmented by the local realities. Before FOSS, interesting software work was very rare in Brazil and the chance to shape widely used products practically did not exist. Imagine a place where 80% of programmers build boring, low-powered line-of-business applications working in conditions exactly opposite of Peopleware. That’s the US. Make it 99% and you have Brazil. In the US we have a wildly dynamic economy full of start-ups and interesting companies soaking up talent fast, but not so in Brazil. David Solomon, co-author of Windows Internals, was working for DEC at 16. But what if there is no one building a kernel in a 3,000-mile radius? Emigration was the most realistic possibility for interesting work. A 16-year-old would have been out of luck.
The FOSS revolution plus the Internet changed all this. Now people in Brazil can actually develop interesting and widely used programs. We’ve got kernel hackers like Marcelo Tosatti, who maintained the 2.4 Linux kernel series, and Arnaldo Carvalho de Melo, who co-founded the Conectiva distribution. There are Red Hat employees, Debian contributors, committers on various projects, and so on. Lua, the programming language, comes from Brazil. There’s a practical advantage in being able to, say, tune a distribution for a particular purpose (e.g., the distribution being delivered to public schools). But beyond that it’s inspiring to finally be able to work with talented people on cool projects and have a chance to participate, rather than be handed down a proprietary product built abroad over which you have zero control. People are excited about and grateful for this. By the time you mix up these elements nearly all talented CS students and alpha geeks are well into the Linux camp. Unlike the US, the dynamic economy isn’t there to add some fragmentation. When these people go on to make technology choices in government or industry, guess what they’ll pick?
So that’s it. I think these are the main factors in Brazil’s love affair with Linux: economics, disregard for copyright, anti-Microsoft sentiment, and massive alpha geek support. These factors feed off each other, all pushing towards Linux. Millions of kids using KDE would impact the work force eventually. If Microsoft is overzealous in its anti-piracy efforts, it might precipitate faster changes in this delicate market. Meanwhile, Google Docs and OpenOffice are catching on. There are tactical moves Microsoft could make to counter Linux momentum, like a more sustainable licensing model for homes and small businesses (maybe their announced annuity licensing?), better native branding, and perhaps some native development. But Google has done all three already and is very well-liked in Brazil despite the anti-US feelings. My Brazilian friends, even a pragmatic IT manager who plays poker with Microsoft Brazil employees, seem to operate on the assumption that an eventual Linux takeover (with some combination of Google/Google Docs/OpenOffice) is just a matter of time. What holds it back is that all the factors discussed here can spark things up, but until desktop Linux is ready to catch fire you get much hype and little change. The wood does seem drier and drier, so we’ll see. What do you think?