The Quality of TDD
Posted by Uncle Bob on 02/16/2008
Keith Braithwaite has made an interesting observation here. The basic idea is that code written with TDD has a lower Cyclomatic Complexity (CC) per function than code written without TDD. If this is true, it could imply fewer defects, since high complexity correlates with fault-proneness.
Keith’s metric takes the code for an entire project and boils it down to a single number. His hypothesis is that a system written with TDD will always measure above a certain threshold, indicating very low CC, whereas a system written without TDD may or may not measure above that threshold.
Keith has built a tool, which you can get here, that will generate this metric for most Java projects. He and others have used this tool to measure many different systems, and so far the hypothesis seems to hold water.
The metric can’t tell you whether TDD was used, but it might just be able to tell you that it wasn’t.
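For those unfamiliar with the metric, the cyclomatic complexity of a single method is one plus the number of decision points in it. Here is a rough, illustrative sketch of that idea; it is not Keith’s tool, which works on properly parsed Java, and a naive keyword count like this over-counts matches inside strings and comments:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RoughComplexity {
        // Decision points that each add one to a method's cyclomatic complexity.
        private static final Pattern DECISION = Pattern.compile(
            "\\b(if|for|while|case|catch)\\b|&&|\\|\\||\\?");

        // Approximate CC = 1 + number of decision points in the method body.
        public static int approximateCC(String methodBody) {
            Matcher m = DECISION.matcher(methodBody);
            int decisions = 0;
            while (m.find()) {
                decisions++;
            }
            return 1 + decisions;
        }

        public static void main(String[] args) {
            String body = "if (x > 0 && y > 0) { return x; } else { return y; }";
            System.out.println(approximateCC(body)); // prints 3: one if, one &&
        }
    }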
Comments
Alicia about 12 hours later:
Braithwaite is a genius. I’m sure this will work perfectly.
Keith about 15 hours later:
Hi Bob. Actually, it’s not quite that TDD code has lower complexity per method absolutely (if only), but rather that TDD code shows a preference for simpler methods.
What fascinates me about Rich’s data is that the minimum probability of defects comes at what might be considered quite a high complexity.
By the way, I’m working on a new version of the tool with some visualization features that make the differences between codebases a little easier to grasp. It should be available sometime in the next couple of weeks.
Keith
(PS, the rule is “i before e, except when it’s not” ;)
www.jamesladdcode.com about 19 hours later:
This is a great tool for measuring complexity in Java code: http://www.martyandrews.net/resources/complexian.html
Paddy Healey 3 days later:
Another tool that I think could be interesting is Crap4j: http://www.crap4j.org
Mark Dixon 3 days later:
I just wanted to follow up on the point about Rich’s data showing low defect probabilities at surprisingly high complexity values.
The Enerjy data is calculated entirely at the file level, so you can’t compare the Enerjy numbers with the traditional 7-10 range, which applies at the method level. This comes back to one of the fundamental problems with CC these days: most metrics are measured at the file level, but CC only makes sense at the method level. Averaging the CC over a file makes no sense, since the whole point of the metric is to highlight methods with unusual values. For Enerjy we decided simply to sum the CC of all methods in the file to get a file-level metric.
Another useful approach would be to create a static analysis rule that fires whenever a method has complexity over a certain threshold, and then use the number of firings as a predictor. We had a hard time coming up with a threshold that worked well for real-world code, though. Values below around 20 seemed to fire very often, even on code that, to our mind, didn’t need refactoring. Values above 20 just seemed pointless :-)
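To make both of those approaches concrete, here is a minimal sketch; the per-method CC values are assumed to already exist (in practice they would come from a parser), and all the names here are illustrative:

    import java.util.List;

    public class FileMetrics {
        // Enerjy-style file-level metric: sum the CCs of every method in the file.
        public static int fileComplexity(List<Integer> methodComplexities) {
            return methodComplexities.stream().mapToInt(Integer::intValue).sum();
        }

        // Threshold-rule predictor: count the methods whose CC exceeds a threshold.
        public static long firings(List<Integer> methodComplexities, int threshold) {
            return methodComplexities.stream().filter(cc -> cc > threshold).count();
        }

        public static void main(String[] args) {
            List<Integer> ccs = List.of(1, 1, 3, 7, 24); // made-up per-method CCs
            System.out.println(fileComplexity(ccs));     // 36
            System.out.println(firings(ccs, 20));        // 1
        }
    }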
Mark
unclebob 4 days later:
Mark Dixon said:
Another useful approach would be to create a static analysis rule that fires whenever a method has complexity over a certain threshold, and then use the number of firings as a predictor. We had a hard time coming up with a threshold that worked well for real-world code, though. Values below around 20 seemed to fire very often, even on code that, to our mind, didn’t need refactoring. Values above 20 just seemed pointless :-)
Well, just to be nasty, let me point out that in all 45,000 lines of FitNesse there is only one function with a CC > 20. It happens to be a big switch statement that translates HTTP error codes into strings, e.g., 404 -> “Not Found”.
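For flavor, that function has roughly this shape (a sketch of the idea, not the actual FitNesse source):

    // Each case adds one to the cyclomatic complexity, but the method
    // stays trivially readable and testable.
    public static String reasonPhrase(int status) {
        switch (status) {
            case 200: return "OK";
            case 301: return "Moved Permanently";
            case 400: return "Bad Request";
            case 401: return "Unauthorized";
            case 403: return "Forbidden";
            case 404: return "Not Found";
            case 500: return "Internal Server Error";
            // ...many more codes in the real thing, pushing CC past 20...
            default:  return "Unknown Status";
        }
    }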
Christoph Beck 4 days later:
If Rich’s data is on a per-file basis, it’s not CC (and shouldn’t be called CC). Rather, it comes close to another metric called WMC, or “Weighted Methods per Class”, which sums the CC values of a class’s methods. For a file-based CC metric it might have been better to take the maximum; as it stands, the data presented is not that useful.
@Keith: Now, where does the magic minimum at 11 come from? Let me guess: a typical code base contains a significant number of trivial, bean-like classes, usually without bugs. If we looked at the distribution of the number of properties in these bean classes, we might find a peak at around 5 or 6. Each property has a getter and a setter, each with a CC of 1, so a large number of bean classes have CC values that sum to something around 11. I’m pretty sure the curve would be monotonically increasing if these classes had been excluded.
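To illustrate the guess, here is a hypothetical bean of that shape; it is not taken from any measured code base:

    // Five properties, ten accessors, each accessor with CC 1. The file-level
    // sum is 10, and with six properties it would be 12, right around the
    // observed minimum of 11, yet the class is about as bug-free as code gets.
    public class Customer {
        private String name;
        private String email;
        private int age;
        private boolean active;
        private long id;

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public String getEmail() { return email; }
        public void setEmail(String email) { this.email = email; }
        public int getAge() { return age; }
        public void setAge(int age) { this.age = age; }
        public boolean isActive() { return active; }
        public void setActive(boolean active) { this.active = active; }
        public long getId() { return id; }
        public void setId(long id) { this.id = id; }
    }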
@Bob: I just analyzed FitNesse with STAN. You are right: the project has low CC values. However, to be nasty too, there are around 10 classes in the FitNesse tree with WMC > 38. According to Rich’s data, such classes have a > 50% probability of being fault-prone.
Leandro Zis 7 days later:
@Bob: Why not replace the switch statement with a Map?
unclebob 8 days later:
@Bob: Why not replace the switch statement with a Map?
Could do. But why? In this case the switch statement is simple, expressive, and terse.
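For comparison, the Map version Leandro suggests would look something like this (an illustrative sketch, not FitNesse code):

    import java.util.HashMap;
    import java.util.Map;

    public class HttpReasons {
        private static final Map<Integer, String> REASONS = new HashMap<Integer, String>();
        static {
            REASONS.put(200, "OK");
            REASONS.put(301, "Moved Permanently");
            REASONS.put(404, "Not Found");
            REASONS.put(500, "Internal Server Error");
            // ...and so on for the remaining codes...
        }

        // The decision logic moves into data: the method's CC stays constant
        // no matter how many status codes are added.
        public static String reasonPhrase(int status) {
            String reason = REASONS.get(status);
            return reason != null ? reason : "Unknown Status";
        }
    }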
Keith 16 days later:
@Mark: Is there any chance of the Enerjy results being written up in a more transparent style? I had missed the point that Christoph makes, that what the chart measures isn’t quite Cyclomatic Complexity. I didn’t find it easy to tell from the blog posting (and even less so from the paper) exactly what was being measured.
@Bob: High-CC methods doing problem-free, easy-to-understand, easy-to-test dispatching or marshaling tasks occur in more than one of the codebases I’ve looked at; they are a nice example of how any metric can be misleading if not interpreted properly.
Keith 24 days later:
By the way, Bob’s link has rotted; a new version of the tool is available from http://www.keithbraithwaite.demon.co.uk/professional/software/#measure