Back in April of last year I wrote about job numbers and trends for programming languages. Since the newspapers tell me we’re all doomed to the soup line in the near future, I decided to compare the job numbers from last year to what we have now. Here’s the result:
All the numbers are from Dice.com, a rough measure to be sure, but useful nonetheless. The average decrease in the number of jobs was 40%, which seems pretty bad. I don’t know what the supply side looks like, but I imagine we now have more job seekers as well. Here is the % decrease in number of jobs, by programming language:
Interesting to see how Python and Ruby held out a bit better, while Perl declined the most. But despite a rocky short term, the overall picture for software engineers looks great according to the US Bureau of Labor Statistics:
- Computer software engineers are one of the occupations projected to grow the fastest and add the most new jobs over the 2006-16 decade.
- Excellent job prospects are expected for applicants with at least a bachelor’s degree in computer engineering or computer science and with practical work experience. (…)
Employment change. Employment of computer software engineers is projected to increase by 38 percent over the 2006 to 2016 period, which is much faster than the average for all occupations. This occupation will generate about 324,000 new jobs over the projections decade, one of the largest employment increases of any occupation.
Not bad huh? Back in 2001 I pestered the BLS economists about this, asking them what they thought offshoring effects would be and so on. They came across as truly bullish on programming, which makes sense to me. The degree to which society depends on computers and programmers will only grow from here. Meanwhile, there is a natural barrier to entry when it comes to programming. When explaining the folly of projects that aim to develop software to replace programmers, Scott Westfall put it this way:
Programmers think more logically. Working through if-then-else conditions is a core capability for any programmer. While working with business teams on requirements, I have often run across cases where the same ability was lacking. (…)
Programmers have a superior ability to analyze problems and come up with solutions. They excel at analyzing preconditions, sequences of events, and outcomes. Certainly, this is a key skill in programming, but it is also useful in troubleshooting and business case analysis. (…)
While people typically think of programmers as coders whose main talent lies in writing the arcane syntax of programming languages, I think that their main talent lies in their ability to analyze, troubleshoot, and solve problems. Code is just the physical manifestation that culminates the thought process of the programmer. (…)
I see two major consequences of this. First, the supply of programmers is constrained because the work requires a fair bit of aptitude that cannot be replaced by training. Second, programmers have a lot of professional options due to these skills, which further hurts supply. I think the economics is in our favor and we’re still lucky to be programmers, though we must be careful during the recession. What do you say? How does it look out there?
Update: you guys have brought up a number of points about the ‘methodology’ behind the Dice.com job numbers. For example, there are seasonal effects on hiring, so it would have been better to compare the two same months. Also, there may be a drop in the usage of Dice.com itself, rather than a drop in the number of available jobs. Besides, many good companies and applicants have turned away from Dice because of poor results for both sides. That is all true. I look at the Dice.com figures as a rough metric. But as a large tech jobs site I think Dice reflects the market at large, albeit imperfectly. A drop of 40% is significant enough that I find it likely it’s a real phenomenon.
Internet protocols are described by RFCs – requests for comments – issued by the Internet Engineering Task Force. The words “must”, “should”, and “may” are used often in these documents to describe what hosts need to do in various situations. As such, the words themselves are defined in RFC 2119.
Back when I did Unix network programming I started to use these words for tagging code comments. Sort of like “hack:” or “todo:” comments, but with a built-in priority. This works great for me, as I can then search for “MUST:”, “SHOULD:”, and “MAY:” tags in the code and see the stuff prioritized. “MUST:” flags unshippable issues, “SHOULD:” is serious business and should be near zero, and “MAY:” is for possible refactorings and low priority stuff. Ideally all tags are temporary of course, as issues are resolved one way or another.
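To make that concrete, a tag is just a comment like `# MUST: handle EINTR around this recv()`. Here’s a minimal Python sketch of a scanner that lists such tags by priority – the file layout and output format are arbitrary, just one way to do it:

```python
import os
import re
import sys

# Matches RFC 2119-style tags in comments, highest priority first.
TAG_RE = re.compile(r"\b(MUST|SHOULD|MAY):\s*(.*)")

def scan(root="."):
    """Collect tagged comments from every .py file under root, by priority."""
    hits = {"MUST": [], "SHOULD": [], "MAY": []}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="ignore") as f:
                for lineno, line in enumerate(f, 1):
                    match = TAG_RE.search(line)
                    if match:
                        hits[match.group(1)].append((path, lineno, match.group(2).strip()))
    return hits

if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for tag, items in scan(root).items():
        print(f"{tag}: {len(items)} open item(s)")
        for path, lineno, text in items:
            print(f"  {path}:{lineno}  {text}")
```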
In general I believe in writing expressive code rather than comments, but these tags have come in handy. Plus they’re fun – there’s something to be said for cultivating the quirky traditions of computing.
For 2,000 years scholars held that heavier objects fall faster than lighter ones, partly because Aristotle couldn’t be bothered to take 2 minutes to experiment. Hell, he even wrote that men have more teeth than women. Isn’t that crazy? And yet, people often rely on this kind of fact-free reasoning to arrive at conclusions about computer performance, among other things. Worse, they spend their IT budgets or sacrifice code clarity based on these flawed ideas. In reality computers are far too complex for anyone to handle performance problems by “reasoning” alone.
Think about a routine in a modern jitted language. Right off the bat you face hidden magic like type coercion, boxing, and unboxing. Even if you know the language intimately, unknowns are introduced as your code is optimized first by the compiler, then again by the JIT compiler. It is then fed to the CPU, where optimizations such as branch prediction, memory prefetching and caching have drastic performance implications. What’s worse, much of the above can and does change between different versions of compilers, runtimes, and processors. Your ability to predict what is going to happen is limited indeed.
To take another example, consider a user thinking of RAID-0 to boost performance. Whether there are any gains depends on a host of variables. What are the patterns of the I/O workload? Is it dominated by seeks and random operations, or is there a lot of streaming going on? Reads or writes? How does the kernel I/O scheduler play into it? How smart are the RAID controller and drivers? How will a journaling file system impact performance given the need for write barriers? What stripe sizes and file system block sizes will be used? There are way too many interdependent factors and interactions for speculative analysis. Even kernel developers are stumped by surprising and counterintuitive performance results.
Measurement is the only way to go. Without it, you’re in the speculation realm of performance tuning, the kingdom of fools and the deluded. But even measurement has its problems. Maybe you’re investigating a given algorithm by running it thousands of times in a row and timing the results. Is that really a valid test? By doing so you are measuring a special case where the caches are always hot. Do the conclusions hold in practice? Most importantly, do you know what percentage of time is spent in that algorithm in the normal use of the application? Is it even worth optimizing?
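Here’s a toy illustration of that pitfall in Python (the workload is a stand-in, and timeit is just one way to do the timing): running the same routine back to back mostly measures the warm-cache best case, which may say little about the routine’s share of real application time.

```python
import random
import timeit

data = [random.random() for _ in range(100_000)]

def total():
    # Stand-in for "the algorithm under test".
    return sum(x * x for x in data)

# Repeating the routine back to back keeps its data in cache, so this
# reports a best case, not necessarily what the application actually sees.
hot = min(timeit.repeat(total, number=1, repeat=100))
print(f"hot-cache best of 100 runs: {hot * 1000:.2f} ms")
```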
Or say you’ve got a fancy new RAID-0 set up. You run some benchmark that writes large globs of data to the disk and see that your sustained write throughput is twice that of a single disk. Sounds great, too bad it has no bearing on most real-world workloads. The problem with the naive timing test and the benchmark is that they are synthetic measurements. They are scarcely better than speculation.
To tackle performance you must make accurate measurements of real-world workloads and obtain quantitative data. Thus we as developers must be proficient with performance measurement tools. For code this usually means profiling so you know exactly where time is being spent as your app runs. When dealing with complex applications, you may need to build instrumentation to collect enough data. Tools like Cachegrind can help paint a fuller picture of reality.
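As one concrete example, a first profiling pass in Python might look like the sketch below (the parse/summarize functions are placeholders for real application code); other languages have equivalent tools:

```python
import cProfile
import pstats

def parse(records):
    return [r.split(",") for r in records]

def summarize(rows):
    return sum(len(r) for r in rows)

def handle_request():
    records = [f"a,b,c,{i}" for i in range(50_000)]
    return summarize(parse(records))

# Profile a representative operation and show where the time went.
profiler = cProfile.Profile()
profiler.enable()
handle_request()
profiler.disable()
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
```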
For website load times and networks you might use tools like Wireshark and Fiddler, as Google did for Gmail. In databases, use SQL profiling to figure out how much CPU, reading, and writing each query is consuming; these are more telling than the time a query takes to run since the query might be blocked or starved for resources, in which case elapsed time doesn’t mean much. Locks and who is blocking whom are also crucial in a database. When looking at a whole system, use your OS tools to record things such as CPU usage, disk queue length, I/Os per second, I/O completion times, swapping activity, and memory usage.
In sum, do what it takes to obtain good data and rely on it. I’m big on empiricism overall, but in performance it is everything. Don’t trust hearsay, don’t assume that what held in version 1 is still true for version 2, question common wisdom and blog posts like this one. We all make comical mistakes, even Aristotle did. Naturally, it takes theory and analysis to decide what to measure, how to interpret it, and how to make progress. You need real-world measurement plus reasoning. Like science.
Coding Horror started a discussion on the relative merits of writing your own code versus using 3rd-party libraries. The main argument for writing your own is borrowed from Joel’s In Defense of the Not-Invented-Here Syndrome:
If it’s a core business function — do it yourself, no matter what.
Pick your core business competencies and goals, and do those in house. If you’re a software company, writing excellent code is how you’re going to succeed. Go ahead and outsource the company cafeteria and the CD-ROM duplication. If you’re a pharmaceutical company, write software for drug research, but don’t write your own accounting package. If you’re a web accounting service, write your own accounting package, but don’t try to create your own magazine ads. If you have customers, never outsource customer service.
Righty. But then why did Fog Creek, Joel’s company, build their Copilot product based on the open source TightVNC? I mean, that is the very core of their product, and they decided to use third-party code. It’s hard to think of a bigger counterexample to his claim (there are plenty of others, like Flickr using ImageMagick and Mac OS X using BSD code). So why the contradiction?
It turns out the quote above is naive and simplistic. Writing excellent code is not how you’re going to succeed. You’ll succeed by delivering value and making users happy, hopefully fast. Given that you have limited resources, you need to prioritize your precious programmer firepower and funnel it into those areas that will help you differentiate and build really useful stuff. Rebuilding compilers and libraries is not how you do that. Which is why Joel, who is anything but naive, borrowed the core engine of his product from VNC and concentrated on delivering smooth usability.
Thinking of “programming” as your core business function is too broad. Only a subset of your programming is truly core. You won’t do better than Prototype, jQuery, or Ext JS. Rails really is great, and so are CakePHP, lex/yacc, and HTML Tidy. The instinct to use third-party libraries is absolutely right. Only when the choices are truly unsuitable or nonexistent should you roll your own. This is especially true now that there are so many high-quality open source libraries. Or at least there are when you’re developing in the open source ecosystem, which is something to take into account when deciding on the right platform.
People often forget to factor in opportunity cost when thinking of tradeoffs. When you rewrite stuff, not only do you spend the time and money, but you also forgo building something else that might have been valuable. So pick the right libraries and concentrate on giving us the goodness only you can build.
A few years back I got three silly tech certifications: Certified Information Systems Security Professional (CISSP), Microsoft Certified Database Administrator (MCDBA), and Microsoft Certified Systems Engineer (MCSE). That sounds like an odd mix of certs for a programmer; I have them because I’m a mercenary (that is, an independent developer) and I also do architecture work. Many programmers regard certifications with scorn, for good reason. But merit, or lack thereof, is one piece of the puzzle. For me the main questions around a certification are:
- Is there market demand for the certification?
- Does the certification truly verify knowledge or ability?
- Should I look for it when hiring?
Certification providers make a lot of claims about jobs and improved incomes due to certification. Most skip over the sticky issue of correlation versus causation. Do certifications boost your income, or does motivation drive both? Or, less graciously, is the data simply crap from people seeking your hard-earned cash? We can get an idea of demand from Dice.com:
Is this the MCDBA? Or is this the CCNA? Or is this the IRA? I thought it was the UK!
By looking at total cert numbers, there’s clearly some demand. Strangely Java certifications seem to be an exception. Please let me know if I made a mistake there. Absolute numbers tell part of the story, but a better test is comparing the number of jobs that mention a given certification to the total number of relevant jobs. The chart below is my attempt at that:
Dice.com – % of relevant jobs mentioning certification
The data is flawed since it’s impossible to query “all relevant jobs”, but I tried to make sure it was sane for each cert (I posted the actual searches as a comment). I think this reasonably captures market demand for certifications, including the fortunate reality that programmers can safely ignore them. Outside of programming, demand is stronger; in security the CISSP is notably successful. There are two points worth mentioning. First, certifications still have an effect on customers and jobs that have not explicitly asked for them. Whether that matters depends on your career: people who are starting out or interview frequently for contracts stand to benefit more. Second, the more clueful a company is, the less stock it puts in certifications. Some hardcore, academic, or start-up workplaces downright shun certifications, especially programming ones.
Which brings us to the next point: do these certs prove anything? For the ones I have obtained, all in multiple-choice format, I feel safe answering “hell no!” Certified people might truly know the subject or they might have studied for a month or they might be good at multiple-choice tests. Basic answering techniques go a long way toward passing the tests. The ol’ elimination, contradiction, and parsing answers out of the tests themselves gets you almost there. Add a few hours studying targeted prep books and materials and random people could obtain random multiple-choice certifications, provided they test well in general.
When I took the tests my programming work involved security, so I fulfilled the CISSP requirement for three years of experience. I had decent knowledge of certain areas covered by the Microsoft stuff and the CISSP, but in other areas I was utterly ignorant. If these tests were up to snuff, I would have had to work much harder. The CISSP, given its better reputation, was a letdown – I found it looser than the Microsoft stuff, which is dodgy to begin with. As luck would have it, in the week prior to the CISSP test (i.e., the study week) I started reading Harry Potter, which is like crack in that it takes your mind off everything, only more addictive. On the evening before the test I thought “crap, I just lost $400 in test fees.” I wondered whether to even show up – “surely the mighty CISSP is harder than the Microsoft stuff and you can’t pass with no study, little sleep, and a head full of Hogwarts.” Not so. Hence, in a Groucho Marx sort of way, I lost all respect for multiple-choice certifications.
In fact, the CISSP unites two negative aspects of certification: a faulty testing mechanism coupled with a heavy-handed experience requirement. I oppose criteria like degrees and years of experience, which correlate weakly with talent and job performance. They also strike me as unfair in a certification – let each employer decide how to value such things independently of the knowledge ‘verified’ by the testing.
My conclusion is that certifications based solely on multiple-choice tests are at best misguided and at worst shams. I don’t see how they could be fixed either. There is a fundamental disconnect between filling in the right blank and getting technology work done. For programming certifications the multiple-choice format is ludicrous. Hence these silly tests will remain a marketing and money-making scheme fueled by naive employers, who put faith in vacuous credentials, and the professionals who play along for supposed career benefits. Or maybe they’re fueled by the naive customers of the employers.
The final question is whether to look for certifications when hiring. In my experience there’s actually some positive correlation between certifications and the quality of job candidates, but it is too weak to be useful. I can’t use certifications as a weeding tool, they don’t guarantee a phone interview is worthwhile, and I would never rely on them for a hiring decision. Multiple-choice certs are silently ignored in my hiring process. The reality might be different for more stringent, hands-on certifications like the Cisco Certified Internetwork Expert (CCIE).
As to whether you should get certified, it’s obviously too particular a decision, but I hope this information helps. One of the best pro-certification arguments is that if you study the stuff anyway, for fun or profit, then why not go ahead and take the tests? It helps you focus somewhat and you get the warm fuzzy I-passed feeling. Coupled with the potential advantages in the market, maybe that’s good enough reason to take them. For those in markets where certifications are strong, it may be necessary. But on principle I’d sooner avoid them and for most programmers I don’t think they make sense.
Certifications that actually proved some knowledge and talent would be useful for job seekers and employers. Where possible, we must shun broken certifications and pressure vendors into adopting a valid testing scheme. This necessarily would involve practical tests, like the ones in the Red Hat Certified Engineer test or the CCIE. Both are head and shoulders above the ones I took. For programmers, certification could use on-the-spot, time-capped programming assignments. The results could be a mixture of the source code produced (the most important piece) and the outcome of an automated test suite. That’d be pretty useful during the selection process: the ability to see source code produced in standard conditions of temperature and pressure, before you sink hours into a candidate.
What do you say?
For the past few weeks I’ve been working with a fellow developer on a project that required an all-out programming effort. It’s done now, so we’re back to a regular schedule, but when people hear about the crazy hours they often say they’re sorry. They really shouldn’t be. I would never do this often, or for long periods, or without proper compensation if done for an employer, but the truth is that these programming blitzkriegs are some of my favorite periods in life. Under the right conditions, writing software is so intensely pleasurable it should be illegal.
Many programmers relate to this, but others are taken aback when they hear it. I think it’s because institutions are so good at squeezing the fun out of everything. It’s appalling for example how schools can take the most vibrant topics and mangle them into formulaic, mediocre slog. And so it is for programming. Many corporations turn an inherently rewarding experience into something people just barely stomach in exchange for a paycheck.
That’s too bad. Few things are better than spending time in a creative haze, consumed by ideas, watching your work come to life, going to bed eager to wake up quickly and go try things out. I am not suggesting that excessive hours are needed or even advisable; a sane schedule is a must except for occasional binges. The point is that programming is an intense creative pleasure, a perfect mixture of puzzles, writing, and craftsmanship.
Programming offers intriguing challenges and ample room for invention. Some problems are investigative and reductionist: Why is this code running slowly? What on earth is causing that bug? Others are constructive, like devising algorithms and architectures. All of them are a delight if you enjoy analytical work, immersed in a world full of beasts like malware, routers, caches, protocols, databases, graphs, and numbers.
This analytical side is what most people associate with programming. It does make it interesting, like a complex strategy game. But in most software the primary challenge is communication: with fellow programmers via code and with users via interfaces. By and large, writing code is more essay than puzzle. It is shaping your ideas and schemes into a coherent body; it is seeking clarity, simplicity and conciseness. Both code and interfaces abound with the simple joy of creation.
Another source of pleasure is that under certain conditions, beauty arises in programming. It may sound like bullshit but it’s real, the kind of thing that makes your day better. Take for example Euclid’s 2-line proof that prime numbers are infinite. I think many would find it beautiful – so succinct and such a fascinating result. This is the beauty of math, cold and austere, and it pervades software. It is in clever algorithms like quicksort, in the sources of kernels and compilers, in elegant exploits and in the tricks we pull to solve everyday problems. When you see these solutions, be it a famous algorithm or a mundane trick, you smile and think “how smart” and it feels good. How noble in reason!
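If you want the flavor in code, here’s the textbook short form of quicksort in Python – not the in-place version you’d use in production, just enough lines to carry the whole idea:

```python
def quicksort(xs):
    """Sort a list by partitioning around a pivot and recursing."""
    if len(xs) <= 1:
        return xs
    pivot, rest = xs[0], xs[1:]
    smaller = [x for x in rest if x <= pivot]
    larger = [x for x in rest if x > pivot]
    return quicksort(smaller) + [pivot] + quicksort(larger)

print(quicksort([5, 3, 8, 1, 9, 2]))  # [1, 2, 3, 5, 8, 9]
```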
A non-math sort of beauty also exists in code, analogous to eloquence in discourse. It’s present in well-factored software that does a lot with little code, in short and crisp methods, in well-done architectures. Some languages make this hard and not all programmers produce it, but it’s a joy to read and work on such code. If you’re working in an expressive language with coworkers whose code you enjoy, it happens often enough to brighten things up.
Now for craftsmanship. In a sense software is abstract – where does program behavior exist but in our minds? Yet we call it building software for a reason. Programs are shaped feature by feature, architectures start out as scaffolds and grow, user interfaces come together, bugs are fixed and hotspots are optimized to make things run fast. Software provides a deeply satisfying sense of craft. We build stuff out of pure ideas and then get to watch it working to solve real problems and make people a little better off. Or far better off, as the case may be.
Take Biology. Despite nearly 400 years of scientific revolution, Biology has been unable to deliver on crucial problems like effective cures for viral infections or cancer. Some of our best progress, like antibiotics, has been due to chance and random experimentation. You start a clinical trial for a hypertension drug and suddenly – whoah – all your subjects have hard-ons! Viagra is born. To be sure, chance plays a role in all endeavours, but Physics and Chemistry have a comprehensive theoretical basis powering systematic improvements, whereas Biology has been largely confined to kludges. Wanna treat cancer? Here, blast the patient with radiation and poison and hopefully the cancer will die first. They’re brilliant kludges, and I’m happy to have them, but it’s a far cry from the precision we’ve had elsewhere.
Software is changing that. Just barely 50 years ago the shape of DNA was being discovered, but now anyone can browse and download hundreds of complete genome sequences. Or look up thousands of genes (DLEC1 for a random example), complete with nucleotide sequence, amino-acid sequence for expressed proteins, literature mentioning the gene, you name it! Or you can search vast gene and protein databases for nucleotide or amino-acid sequences, perhaps after sequencing something in ever-cheaper devices, and get a comprehensive report on the match. It doesn’t matter if they’re exact, because the algorithm in BLAST, the standard sequence search tool, delivers partial matches across databases and species, scored by match likelihood. These advances will enable massive breakthroughs in medicine. Biology is entering a new era, like Physics in the 18th century, propelled by software.
Yeah, sure, biologists have a minor role, but we in computing increasingly power major developments in science, culture, and business. When a third-world kid looks up a Wikipedia entry, it’s our work too! We wrote the RFCs and the networking stacks, the browser and MediaWiki, the OSes and the HTTP servers. Not to mention a lot of the Wikipedia entries, but since a few were on company time I’ll leave them aside. The influence of technologists goes beyond bits and bytes: it was a programmer who invented wikis and our community started blogs. A. J. Liebling pointed out correctly that “freedom of the press is limited to those who own one”. It’s a pity he’s not around to watch our creations break down the stifling conformity and cozy subservience of professional journalism. Less glamorously, but to great benefit, our applications have delivered steep productivity gains to businesses across the economy. These are a few examples in a long list.
Three years ago, when I finished my undergrad (after being a programmer for many years), I was about to enter med school. At that point, a couple of negative experiences had me somewhat burned out on computer work. I’m happy I stuck with it. I’m still interested in biomedical research, but if I were to get involved I’d rather come in from the software angle, because frankly it’s too much fun to pass on. My mom thinks I’m a typist but oh well.
If you find yourself stuck in a place that’s killing your innate passion for technology, by all means, move the hell on! Don’t stay put while your enthusiasm is slowly drained. It’s hard to find motivated people to hire so you’ve got a major asset already; there are plenty of employers – and companies to be started – that will better suit you. For people who think they might like programming, your mileage may vary, but I highly recommend it as a career. Not only is the outlook bullish on the job front, but as the role of software grows in society we’ll see more exciting and beneficial changes delivered by technology. I’m delighted to be along for the ride as constantly my art and craft I try to master.
PS: thanks for putting up with the irregular posting schedule. The plan is to stick to regular posting now that things have calmed down. And if you like the song, download the mp3 because the YouTube audio doesn’t do it justice.
I just found out that Stephen Colbert’s father and two brothers died in a plane crash on September 11, 1974. Maybe everybody knows this – I’m not sure because I haven’t watched TV in years, so I live in a sort of alternate reality. My only exposure to TV are YouTube clips of Jon Stewart, Colbert, and lots of Dora The Explorer (Jon Stewart is my favorite but Swiper The Fox is a close second, don’t tell my kids though). Now, I may not have TV to keep me informed, but I do read aircraft accident reports and transcripts from cockpit voice recorders. That doesn’t help in small talk with the neighbors, but you read some amazing stuff.
For example, in the accident that killed Colbert’s father the pilots were chatting about politics and used cars during the landing approach. They ignored their altitude and eventually ran the plane into the ground about 3 miles away from the destination airport. The report by the National Transportation Safety Board (NTSB) states that “both crew members [first officer and captain] expressed strong views and mild aggravation concerning the subjects discussed.” Since the full CVR transcript is not available we’re free to imagine a democrat and a republican arguing amid altitude alerts.
Aviation accidents are both tragic and fascinating; few accidents can be attributed to a single factor and there is usually, well, a series of unfortunate events leading to a crash. The most interesting CVR transcript I’ve read is Aeroperu 603. It covers an entire flight from the moment the airplane took off with its static ports taped over – causing airspeed, altitude, and vertical speed indicators to behave erratically and provide false data – until the airplane inverted into the Pacific Ocean after its left wing touched the sea, concluding a mad, slow descent in which crew members were bombarded with multiple, false, and often conflicting flight alerts. The transcript captures the increasing levels of desperation, the various alerts, and the plentiful cussing throughout the flight (there’s also audio with subtitles). As you read it your brain hammers the question: how do we build stuff so things like this can’t happen?
Static ports covered by duct tape in Aeroperu 603
The immediate cause of the Aeroperu problem was a mistake by a ground maintenance worker who left duct tape over the airplane’s static ports. But there were a number of failures along the way in maintenance procedures, pilot actions, air traffic control, and arguably aircraft design. This is where agencies like the NTSB and their counterparts abroad do their brilliant and noble work. They analyze the ultimate reason behind each error and failure and then issue recommendations to eradicate whole classes of problems. It’s like the five whys of the Toyota Production System, coupled with fixes, on steroids. Fixes are deep and broad, never one-off band-aids.
Take the Colbert plane crash. You could define the problem as “chatter during landing” and prohibit that. But the NTSB went beyond: they saw the problem as “lack of professionalism” and issued two recommendations to the FAA with a series of concrete steps towards boosting professionalism in all aspects of flight. Further NTSB analysis and recommendations culminated a few years later in the Sterile Cockpit Rule, which lays down precise rules for critical phases of flight including takeoff, landing, and operations below 10,000 feet. Each aviation accident, error, and causal factor spurs recommendations to prevent it, and anything like it, from ever happening again. Because the solutions are deep, broad, and smart we have achieved remarkable safety in flight.
In other words, it’s the opposite of what we do in software development and computer security. We programmers like our fixes quick and dirty, yes sirree, “patches” we call them. It doesn’t matter how critical the software is. Until 1997 Sendmail powered 70% of the Internet’s reachable SMTP servers, qualifying it as critical by a reasonable measure (its market share has since decreased). What was the security track record? We had bug after bug after bug, many with disastrous security implications, and all of them fixed with a patch as specific as possible, thereby guaranteeing years of continued new bugs and exploits. Of course this is not as serious as human life, but for software it was pretty damn serious: these were bugs allowing black hats to own thousands of servers remotely.
And what have we learned? If you fast forward a few years, replace “Sendmail” with “WordPress” and “buffer overflow” with “SQL injection/XSS”, cynics might say “nothing.” We have different technologies but the same patch-and-run mindset. I upgraded my blog to WordPress 2.5.1 the other day and boy I feel safe already! Security problems are one type of bug, the same story happens for other problems. It’s a habit we programmers have of not fixing things deeply enough, of blocking the sun with a sieve.
We should instead be fixing whole classes of problems so that certain bugs are hard or impossible to implement. This is easier than it sounds. Dan Bernstein wrote a replacement for Sendmail called qmail and in 1997 offered a $500 reward for anyone who found a security vulnerability in his software. The prize went unclaimed and after 10 years he wrote a paper reviewing his approaches, what worked, and what could be better. He identifies only three ways for us to make true progress:
- Reduce the bug rate per line of code
- Reduce the amount of code
- Reduce trusted code (which is different than least privilege)
This post deals only with 1 above, I hope to write about the other two later on. Reducing the bug rate is a holy grail in programming and qmail was very successful in this area. I’m sure it didn’t hurt that Bernstein is a genius, but his techniques are down to earth:
For many years I have been systematically identifying error-prone programming habits—by reviewing the literature, analyzing other people’s mistakes, and analyzing my own mistakes—and redesigning my programming environment to eliminate those habits. (…)
Most programming environments are meta-engineered to make typical software easier to write. They should instead be meta-engineered to make incorrect software harder to write.
In the 1993 book Writing Solid Code Steve Maguire gives similar advice:
The most critical requirement for writing bug-free code is to become attuned to what causes bugs. All of the techniques in this book are the result of programmers asking themselves two questions over and over again, year after year, for every bug found in their code:
- How could I have automatically detected this bug?
- How could I have prevented this bug?
For a concrete example, look at SQL Injection. How do you prevent it? If you prevent it by remembering to sanitize each bit of input that goes to the database, then you have not solved the problem, you are using a band aid with a failure rate – it’s Russian Roulette. But you can truly solve the problem by using an architecture or tools such that SQL Injections are impossible to cause. The Ruby on Rails ActiveRecord does this to some degree. In C# 3.0, a great language in many regards, SQL Injections are literally impossible to express in the language’s built-in query mechanism. This is the kind of all-encompassing, solve-it-once-and-for-all solution we must seek.
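To make the contrast concrete, here’s a small Python sketch using the standard sqlite3 module (the table and the hostile input are made up). The first query splices user input into the SQL string and is injectable; the second passes it as a bound parameter, so the input can never be interpreted as SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'alice@example.com')")

user_input = "alice' OR '1'='1"  # hostile input

# Band-aid approach: build SQL by string concatenation and hope every
# caller remembered to sanitize. This one didn't, so the WHERE clause
# becomes always true and the query returns every row.
injectable = conn.execute(
    "SELECT email FROM users WHERE name = '" + user_input + "'"
).fetchall()

# Structural fix: the value travels as a bound parameter, never as SQL,
# so this class of bug cannot occur no matter what the input contains.
safe = conn.execute(
    "SELECT email FROM users WHERE name = ?", (user_input,)
).fetchall()

print(injectable)  # [('alice@example.com',)] -- leaked by the injection
print(safe)        # [] -- nobody is literally named "alice' OR '1'='1"
```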
It’s important to take a broad look at our programming environments to come up with solutions for preventing bugs. This mindset matters more than specific techniques; we’ve got to be in the habit of going well beyond the first “why”. Why have we wasted hundreds of thousands of man hours looking for memory leaks, buffer overflows, and dangling pointers in C/C++ code? It wasn’t just because you forgot to free() or you kept a pointer improperly, no. That was a symptom. The reality is that for most projects using C/C++ was the bug, it didn’t just facilitate bugs. We can’t tolerate environments that breed defects instead of preventing them.
Multi-threaded programming is another example of a perverse environment where things are opposite of what they should be: writing correct threading code is hard (really hard), but writing threading bugs is natural and takes no effort. Any design that expects widespread mastery of concurrency, ordering, and memory barriers as a condition for correctness is doomed from the start. It needs to be fixed so that bug-free code is automatic rather than miraculous.
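As a sketch of what “bug-free by construction” can look like here, consider this Python example (the worker’s job is hypothetical): the threads never share mutable state, all communication goes through thread-safe queues, so there are simply no locks to forget.

```python
import threading
import queue

tasks = queue.Queue()
results = queue.Queue()

def worker():
    # Each worker owns its local state; the only communication is via
    # thread-safe queues, so there is no shared data to protect.
    while True:
        item = tasks.get()
        if item is None:          # sentinel: time to shut down
            break
        results.put(item * item)  # stand-in for real work

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()

for n in range(100):
    tasks.put(n)
for _ in threads:
    tasks.put(None)   # one shutdown sentinel per worker
for t in threads:
    t.join()

total = sum(results.get() for _ in range(100))
print(total)  # sum of squares 0..99 == 328350
```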
There are a number of layers that can prevent a bug from infecting your code: software process, tools, programming language, libraries, architecture, unit tests, your own habits, etc. Troubleshooting this whole programming stack, not just code, is how we can add depth and breadth to our fixes and make progress. The particulars depend on what kind of programming you do, but here are some questions that might be worth asking, in the spirit of the questions above, when you find a bug:
- Are you using the right programming language? Does it handle memory for you? Does it help minimize lines of code and duplication? (Here’s a good overall comparison and an interesting empirical study)
- Could a better library or framework have prevented the bug (as in the SQL Injection example above)?
- Can architecture changes prevent that class of bug or mitigate their impact?
- Why did your unit tests fail to catch the bug?
- Could compiler warnings, static analysis, or other tools have found this bug?
- Is it at all possible to avoid explicit threading? If so, shun threads because they’re a bad idea. Otherwise, can you eliminate bugs by isolating the threads (reduce shared state aggressively, use read-only data structures, use as few locks as possible)?
- Is your error-handling strategy simple and consistent? Can you centralize and minimize catch blocks for exceptions?
- Are your class interfaces bug prone? Can you change them to make correct usage obvious, or better yet, incorrect usage impossible? (See the sketch after this list.)
- Could argument validation have prevented this bug? Assertions?
- Would you have caught this bug if you regularly stepped through newly written code in a debugger while thinking of ways to make the code fail?
- Could software process tools have prevented this bug? Continuous integration, code reviews, programming conventions and so on can help a lot. Can you modify your processes to reduce bug rate?
- Have you read Code Complete and the Pragmatic Programmer?
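On the class interface question, here’s a small Python sketch of the idea (the Session class is hypothetical): the first interface relies on every caller remembering to call close(), while the reworked one makes forgetting impossible.

```python
import contextlib

# Bug-prone interface: the caller must remember to call close(), and
# nothing stops them from using the handle after closing it.
class Session:
    def __init__(self):
        self.open = True
    def query(self, q):
        assert self.open, "query() on a closed session"
        return f"result of {q}"
    def close(self):
        self.open = False

# Reworked interface: the session only exists inside a with-block, so
# "forgot to close" can no longer be written.
@contextlib.contextmanager
def session():
    s = Session()
    try:
        yield s
    finally:
        s.close()

with session() as s:
    print(s.query("SELECT 1"))
# The session is closed here no matter what happened inside the block.
```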
As airplanes still crash we’ll always have our bugs, but we could do a lot better by improving our programming ecosystem and habits rather than just fixing the problem of the hour. The outstanding work of the NTSB is great inspiration. I’m still scared of flying though – think of all the software in those planes!
In the last entry I argued that while learning programming languages comes at a high cost, good programmers should be proficient in multiple languages. I think of programmers as having language sets, based on the idea that knowing one language is not enough for a professional. The task then is to pick the language set wisely to minimize learning time and maximize the benefits to your career. Like other programmers I enjoy toying with different languages, but I’m conservative about fully picking up a language because there’s too much process loss involved. I go for the minimal language set. This is a look at various languages from this point of view; I hope it’s useful to other programmers.
There are many criteria I find important in a programming language, from job market to whether it’s fun. This post looks at jobs, job trends, and overall trends for languages. First, here is data for number of jobs per language across the United States, measured roughly using Dice.com searches.
An abundance of jobs is important because it gives you choices. You’re more likely to find a job or contract that better suits you. The more options you have for jobs, the more likely you’ll find the one with the telecommuting, the long vacation, the fired-up team, the interesting project or the right industry. Above all, you have a better shot at working with non-assholes. Martin Fowler says that early in his career he decided that he "wouldn’t work with unpleasant people, however capable they might be", since people matter most. Amen. There’s no better insurance against assholes than multiple job offers or clients.
Java, C++, and C# clearly take the market demand cake. Java is peculiar in that most jobs require some significant experience in some other technology, like IBM WebSphere or BEA WebLogic. So the market is fragmented. By contrast, the C# market is more monolithic: people use whatever Microsoft gives them. There are pros and cons for each one. The main downside for Java is that most programmers qualify for only a subset of jobs, whereas if you know C#/.NET most Microsoft shops are viable. The flip side is that as a Java developer you get to choose a lot of the technologies you use, while for C# you may have to use whatever ships out of the box (sophisticated teams excepted). Java programmers can find sweet contract rates if they know the right stuff. The data at RealRates.com seems to support this (though it looks like they stopped updating the site). C# programmers are more of a commodity. This drives down rates but contributes to platform adoption. For what it’s worth, here’s job growth measured by Indeed:
The C# growth surprised me. If the data is accurate, that’s some striking growth for an established language. In script land, Indeed’s data shows Ruby growing at break-neck rates:
Update: these Indeed charts show growth, not absolute numbers. They are relative: C# and Ruby are growing faster, but in absolute numbers they’re below their counterparts. If you click on these charts, you’ll be taken to the Indeed web site where you can plot absolute numbers if you’re so inclined.
Salary data is less relevant. You can’t reason much based on the free online reports (from Indeed, Salary.com, etc.) because the data is dodgy and not broken down by relevant factors. Frankly, language is not the determining factor of salary and contract rates, so there’s no point sweating it. Some specific technologies might command a premium, but it’s hard to generalize to a language. Dr. Al Lee from Payscale.com has a post discussing programming salary comparisons. His discussion is insightful but I would not base any decisions on income statistics. Too much of it is up to you, your negotiation skills, employment setup (employee, brokered contract, direct contract), experience, local market, etc. If you really want some numbers, I’ve set up an Indeed Salary Search for the major languages.
Job market trends don’t fully capture the feeble whims of us programming folk. There are other interesting ways to look at mind share and what might be coming down the pike. O’Reilly published trends based on book sales here, but that’s about a year and a half old. Google Trends yields interesting information, complementary to the Job Trends feature at Indeed. Based on Google Trends you can see apparent decline in PHP, the rise of Ruby over Perl and Python, and plunges in C++ and J2EE. Financial results for Q4 2007 show strong server revenue growth for both Linux (11.6% year-over-year) and Microsoft (6.9% year-over-year). Microsoft’s Q1 2008 was impressive. Here are some of the Google Trends:
I’d take trend analysis with a grain of salt; yet the direction of movement looks consistent across different sources of data. It’s also consistent with the idea one gets from reading blogs and talking to colleagues. Namely, the rise of Ruby among the scripting languages, a relative decline of Java and PHP, and C# moving steadily. I’d take these trends into account when deciding on a language to learn. But a quick look at the job numbers for COBOL should put things in context. There’s no urgency and it’s only one factor.
So much for the market. Picking a language based on market statistics alone would be like choosing your profession based on the projections in the Occupational Outlook Handbook. So next time I’ll write about the languages themselves and where I think they fit within the current programming landscape. That way I get sleep and this post stays manageable.
Learning new programming languages is often a waste of time for professional programmers. It may be a fun waste of time (i.e., a hobby), but it’s a waste nonetheless. If you do it for pleasure, then great, but profit is scarce. Pointing this out among good programmers is heresy: even the pragmatic programmers, whose teachings are by and large excellent, suggest we should learn one new programming language every year. That’s rubbish.
The theory is that by learning a new language you "expand your mind" and become "a better programmer". Right. By that kind of argument we should all be eating LSD (don’t). In reality learning a new language is a gritty business in which most of the effort is spent on low-value tasks with poor return on time invested. You need to get to know the libraries, struggle with the environment, find or tune a text editor, look for tools, and so on. Most of the effort has to do with housekeeping chores, which are surely not mind-expanding by anyone’s measure. If you hope to be productive in the new language, things are even bleaker: proficiency has less to do with the language itself than with the myriad technologies you must master to use it effectively.
Even core language learning offers dubious return. How much does it really help to learn a new syntax? How does it expand your mind to learn new operator precedence quirks? Much of what constitutes a language is lexical and syntactical bureaucracy. Worse, you’re learning absolutely nothing about fundamental aspects of computer science. No algorithms, no operating systems, no compiler theory, no math, no AI. If you’re an undergrad, then you should have time to pick up languages on the side while learning all that, of course. But a professional is making a trade-off: what else could you learn with that time? We’re better off studying business, security, usability, architecture, software estimation, and so on, rather than spending time with a different language every year.
If your goal is better programming, you will learn far more from reading high-quality code bases in your current languages than from a new language. Go read top-notch code in the languages you know already; it’ll teach you techniques and style quickly, plus different ways of thinking about problems, with the added bonus that you can actually use what you learn. You can also understand a lot about programming languages in general (issues like typing, scoping, functional vs. imperative) by reading a good book.
There’s another pernicious effect to language hopping: it hurts your momentum. Every time you rely on your current languages you get a little better. Not in a fluffy expand-your-mind way, but in a concrete way. You learn more about your libraries, you set up a new macro in the editor, you have a chance to use that new language feature. Scott Hanselman argues that learning a new language is sharpening your saw, but I see it as neglecting your half-sharpened saw while playing with the dull, new, shiny one. The upfront cost is not the only one either. It’s better to have 3 razor-sharp saws than 8 so-so ones. Each new language you add to your toolbox is making it harder for you to become furiously productive in any given language.
Forget Ruby – Here’s Clipper!
Aside from the immediate reasons, there’s some merit to the mind expansion argument. I think being proficient in at least two languages is indeed important for boosting your ability as a developer. This resembles human language: learning a second one changes the way you think and your perception of the world. The third or fourth, not as much. But it can’t be any two languages. If you know Portuguese and Spanish, your mind didn’t have to expand much. Likewise, learning VB.NET and C# doesn’t count. Also, I agree that some programming languages are hazardous to your skills if used exclusively. Edsger Dijkstra claimed COBOL crippled the mind and that its teaching should be regarded as a criminal offense. We all know who’s the new COBOL. Java, the kingdom of nouns, is a programming straitjacket. I imagine Dijkstra would have called for harsh no-parole sentences for any CS Department chairs whose students learn only Java. If you write a lot of Java code, being fluent in a richer language does sharpen your saw. This is true for other statically typed languages, but to varying degrees. More on that in a bit.
Java protects developer from self
You might think this is contradictory. You’d be right. Life’s not simple; sorry, I wish it were. The realities are:
- Learning new languages is very expensive (in time), a huge opportunity cost
- There is loss associated with using multiple languages: the "jack of all trades, master of none" problem
- A good programmer uses multiple languages
But there’s a sane way to deal with these. Why, you just need to find the minimal language set. The smallest set of languages it takes to crank out great software quickly while growing as a programmer and making rivers of cash. In the next entry I’ll talk about my personal language set and the factors I used to compose it.
Holy mango! Talk about unexpected. When I wrote my last entry on Feynman and engineering, I was aiming for my 5-strong subscriber base. After one-time deductions of friends and family, that’s a negative number of readers. Not in a million years could I have guessed it would be on Slashdot. But now a decent respect for my newfound readership compels me to explain myself a bit better (or try, anyway).
The biggest controversy was around the "bottom-up" idea. A number of people, including NASA engineers, wrote me about the need for balance between top-down and bottom-up. I agree with this view. Feynman’s "bottom-up" is not a dismissal of top-down analysis. As he talks about the lack of a "preliminary study of materials and components" in relation to the engine, it’s clear that such a study would be guided by a plan and exploratory design. After all, engineers can’t randomly test materials until a space shuttle engine crystallizes in front of them. The problem Feynman points out is the lack of essential information about reality in the design. Analysis is important, but it must not overrule or disregard reality. And reality is best exposed by the utmost bottom-up affair: experimentation. Feynman’s bottom-up is empiricism plus the "attitude of highest quality".
He came from the same island as Martin Fowler
I’m not going to dwell on philosophy lest this degenerate into postmodern blabber. For those interested, I think Feynman’s flavor of science is best shown in the last chapter in The Character of Physical Law and in the electromagnetism and quantum mechanics bits of The Feynman Lectures on Physics. The brilliant empirical mind behind Appendix F is laid bare in these wonderful, fun books. But how does this apply to software? Empiricism in a project context is described well in the business literature. Here’s what In Search Of Excellence has to say in the chapter "A Bias For Action":
The problem we’re addressing (…) is the all-too-reasonable and rational response to complexity in big companies: coordinate things, study them, form committees, ask for more data(…). Indeed, when the world is complex, as it is in big companies, a complex system often does seem in order. But this process is usually greatly overdone. Complexity causes the lethargy and inertia that make too many companies unresponsive.
The important lesson from the excellent companies is that life doesn’t have to be that way. Their mechanism comprises a wide range of action devices especially in the area of management systems, organizational fluidity, and experiments. (…)
There is no more important trait among excellent companies than an action orientation. (…) They don’t indulge in long reports. Nor do they install formal matrixes. They live in accord with the basic human limitations we described earlier: people can only handle a little bit of information at one time.
Finally, and most important, is the user connection. The customer, especially the sophisticated customer, is a key participant in most successful experimenting processes.
Action and experimentation are the cornerstones of empiricism. No attempt is made to subdue reality by extensive analysis and copious documentation. Reality is invited in via experiments. Instead of agonizing over market research, an empirical company hires interns and develops a product in one summer. A non-empirical company has 43 people planning an off-button design for one year. Empirical companies still rely on analysis. P&G has memos, they’re just limited to one page. But software projects are not after "empirical reality", we just want working products. Built to Last deftly relates experiments to process in a chapter entitled "Try a Lot of Stuff and Keep What Works":
What looks in hindsight like a brilliant strategy was often the residual result of opportunistic experimentation and "purposeful accidents".
Bill Hewlett told us that HP "never planned more than two or three years out". (…) We could go on with examples from Citicorp, Philip Morris, GE, Sony, and others. (…) We were surprised to find so many examples of key moves by the visionary companies that came about by some process other than planning. Nor do these examples merely represent random luck. No, we found something else at work (…): evolutionary progress. Evolutionary progress begins with small incremental steps
After dubbing 3M the "Mutation Machine From Minnesota" the authors say:
If we had to bet our lives on the continued success and adaptability of any single company (…), we would place that bet on 3M. Using 3M as a blueprint for evolutionary progress at its best, here are five basic lessons (…).
- Give it a try – and quick!
- Accept that mistakes will be made.
- Take small steps.
- Give people the room they need.
- Mechanisms – build that ticking clock.
Built to Last makes the inescapable link to biological evolution, the epitome of bottom-up experimental development. Top companies experiment vigorously with products and processes, driven by the market and organizational metrics. Nature experiments with genetic variation, driven by natural selection. The common theme is that successful systems are driven by reality through experimentation. That’s dandy, but how about software? The best discussion I know of software-as-evolution is the famous LKML thread where Linus shuns top-down design in favor of experimentation. I think of it this way:
A good software development process should optimize experimentation and improve feedback from reality. This is what I mean by reality-driven development. And in software the most important realities are user experience and technical quality, while the primary experiments are working software and code. This isn’t a formal model (heh), it’s simply my favorite analogy for software development. I like the name "reality-driven" because when you mention reality people think of users. And I like the model because it helps me focus on important stuff and on effective ideas, like Paul Graham’s advice to release early and let the market design the product. It also has good explanatory power. Firefox is such a great browser due to intense experimentation in the form of add-ons. Waterfall is so awful because reality is ignored: when the time for feedback comes, the project is over.
There is no specific reality-driven methodology. The Agile principles have a lot in common with these ideas (and certainly influenced them), but the devil is in the details. I prefer to think of software engineering in terms of a toolbox, full of techniques we pick and choose for the right situation. Process tools for optimizing experimentation include iterative development, executable architecture, continuous integration, and unit testing.
Based on this model, the two realities we care about are user experience (including the software’s utility) and technical quality. User experience is often neglected in agile and waterfall alike. The measurement tools come from the usability people and from plain old business sense. Techniques include usability testing, observing users, spending time with users (preferably in their habitat), talking to users, and hugging users. Technical quality revolves around the code base and third party tools. Here we’re looking for the ol’ bit of ultraviolence plus generality, clarity, simplicity, security, etc. Tools include code inspections, code reviews, and metric reports as part of the build. The elusive hiring of good programmers is crucial, but it’s not measurement, so it falls within the "software project" box.
When I think about prerequisites (requirements and top-down design) I do so in the context of this reality-driven model. Prerequisites can optimize experimentation by minimizing cost and risk. I have seen how well-written requirements can quickly take a team from zero to working software that’s close to users’ wishes. Likewise, good top-down design can help achieve technical quality faster. But I think of prerequisites as sketches, not blueprints. I prefer minimal specs that produce working software to be molded by the users. And rigid upfront design is a sure way to a crappy code base or engineering disasters. Alistair Cockburn put it best: "With design I can think very fast, but my thinking is full of little holes."
In the end, feedback from reality helps you avoid Ivory Tower Development and pass the Ultimate Unit Test. You make your users happy. A reality-driven process with management buy-in purges faulty o-rings and gets the right materials in a shuttle engine. It avoids abominable applications. It brings money and fame and huge obelisks in your honor. So now you know my idea of bottom-up:
- Have a bias for experiment over analysis, though both have their place.
- Optimize experiments: make them as early, fast, cheap, and broad as you can. Analysis can help here.
- Experiment vigorously.
- Be smart and proactive about measuring reality: user experience and technical quality.
- React to feedback. Let reality drive.
Of course, you can turn the empirical machine towards the process itself, and try to improve the way you build rather than what you build ("It’s fractal, dude!"). That’s the whole point of Built to Last. Also, I’ve found that Built to Last and In Search Of Excellence work well for explaining evolutionary/agile ideas to senior management.
I hope I didn’t kill the aforementioned newfound readership by boredom. Thanks for reading and see you next time. The new server arrives on Friday.