Software Engineering: 4 - Fix failures. Now!

Post date: Feb 7, 2012 4:49:27 PM

Building code is hard. Sadly, the designs you create, the code you write, and the tests you devise are all, in the end, parts of the system that can fail. A corner case in the design can be exploited to crash the system; a mis-coded logical condition can cause unexpected behavior; a set of tests you thought was complete may not be, and may allow defects to slip through... Yes, programming is a difficult mental task with only your wits standing between proper operation and chaos. But even with the best mental effort, defects will creep in. With any luck, these defects will be detected by some other means and you will have an opportunity to fix them before your system is delivered to your customers.

When should defects be fixed? As soon after you find them as possible. Yes, there are always other things to do rather than fix defects - new components to be designed, new code to write, chatting with your BFF via Facebook... OK, maybe not so much the last one, but there are many advantages in fixing defects quickly and some real dangers in allowing them to linger. When a defect is first discovered, the person who finds it is familiar with what has happened. The system on which the defect was found is more likely to be available for inspection. Waiting for weeks (or even days) to start working on the defect will reduce the amount of information available to you, or may force you through a possibly fruitless reproduction cycle. Stacking bugs like cordwood also adds to basic project overhead: defects need to be scrubbed and triaged, and people need to decide which defects deserve their attention. This can add a significant amount of work for your project team. Leaving defects until later also increases the chance of a project slip. In general, well-defined tasks, like new features, have lower variability in their duration than do more exploratory tasks, like bug fixes. New features are unlikely to uncover surprising architectural issues and, if they do, the problem can often be solved with no schedule impact by deciding to omit one or more features. When you find a defect later in the game, your choices are more limited - fix it and slip, or ship with the defect in the product.

There are also second-order costs associated with leaving defects in code. People start coding around buggy portions of the system - their code becomes distorted because the foundation they are building on is shaky. It is often daunting to determine what "proper operation" even means in the face of known defects. As a result, testing and writing tests become more difficult, fewer tests are written, and the code, over time, becomes worse and worse. And people become scared of tackling changes in buggy code, worried that they'll be the one who gets stuck with rewriting the buggy mess (or, even worse, stuck with enhancing it until the end of their careers). That fear makes coders tentative and saps the aggression a good programmer needs to tackle a batch of code.

So, make no mistake - failures of build systems, of design, of code, of testing are a cancer, eating away at the stability of your system. They are a symptom that something is wrong. A diagnosis must be made immediately and, if you are doing things right, remediation should also be applied immediately. Waiting to fix failures is a recipe for disaster. Don't fall into that trap - fix failures... now!