Posted by Uncle Bob on 11/27/2008
I really like the concept of BDD (Behavior Driven Development). I think Dan North is brilliant, and has done us all a great service by presenting the concept.
OK, you can “feel” the “but” coming, can’t you?
It’s not so much a “but” as an “aha!”. (The punch line is at the end of this article, so don’t give up in the middle.)
BDD is a variation on TDD. Whereas in TDD we drive the development of a module by “first” stating the requirements as unit tests, in BDD we drive that development by first stating the requirements as, well, requirements. The form of those requirements is fairly rigid, allowing them to be interpreted by a tool that can execute them in a manner that is similar to unit tests.
For example,
GIVEN an employee named Bob making $12 per hour.
WHEN Bob works 40 hours in one week;
THEN Bob will be paid $480 on Friday evening.
The Given/When/Then convention is central to the notion of BDD. It connects the human concept of cause and effect, to the software concept of input/process/output. With enough formality, a tool can be (and has been) written that interprets the intent of the requirement and then drives the system under test (SUT) to ensure that the requirement works as stated.
The argued benefit is that the language you use affects the way you think (see this), and so if you use a language closer to the way humans think about problems, you’ll get better thought processes and therefore better results.
To say this differently, the Given/When/Then convention stimulates better thought processes than the AssertEquals(expected, actual); convention.
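To make the contrast concrete, here is a minimal sketch in Python of the payroll requirement above written in the assert-equals style, with the Given/When/Then phrasing kept only as comments. The `Employee` class and its `weekly_pay` method are hypothetical names invented for this illustration; they do not come from any BDD tool, and the sketch ignores overtime, deductions, and the Friday pay date.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    # Hypothetical domain class, invented for this sketch.
    name: str
    hourly_rate: int  # dollars per hour

    def weekly_pay(self, hours_worked: int) -> int:
        # Straight-time pay only; overtime and deductions are out of scope.
        return self.hourly_rate * hours_worked


def test_bob_is_paid_for_a_forty_hour_week():
    # GIVEN an employee named Bob making $12 per hour
    bob = Employee(name="Bob", hourly_rate=12)
    # WHEN Bob works 40 hours in one week
    pay = bob.weekly_pay(hours_worked=40)
    # THEN Bob will be paid $480
    assert pay == 480


test_bob_is_paid_for_a_forty_hour_week()
```

The requirement is all there, but it has to be read out of the setup, the call, and the assertion; in the Given/When/Then form the same three parts are the text itself.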
But enough of the overview. This isn’t what I wanted to talk about. What struck me the other day was this…
The Given/When/Then syntax of BDD seemed eerily familiar when I first heard about it several years ago. It’s been tickling at the back of my brain since then. Something about that triplet was trying to resonate with something else in my brain.
Then yesterday I realized that Given/When/Then is very similar to If/And/Then; a convention that I have used for the last 20+ years to read state transition tables.
Consider my old standard state transition table: The Subway Turnstile:
I read this as a set of If/And/Then sentences as follows:
If we are in the LOCKED state, and we get a COIN event, then we go to the UNLOCKED state, and we invoke the Unlock action.
If we are in the LOCKED state, and we get a PASS event, then we stay in the LOCKED state, and we invoke the Alarm action.
etc.
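As a rough sketch, the turnstile table and this If/And/Then reading can be written down in Python as a dictionary keyed by (state, event). Only the two LOCKED rows are spelled out in the sentences above; the two UNLOCKED rows, and the action names `Thankyou` and `Lock`, are assumptions filled in for illustration. The function names `step` and `read_aloud` are likewise invented here.

```python
# Transition table: (current state, event) -> (next state, action).
TRANSITIONS = {
    ("LOCKED", "COIN"): ("UNLOCKED", "Unlock"),
    ("LOCKED", "PASS"): ("LOCKED", "Alarm"),
    # The two rows below are assumed; the text above does not spell them out.
    ("UNLOCKED", "COIN"): ("UNLOCKED", "Thankyou"),
    ("UNLOCKED", "PASS"): ("LOCKED", "Lock"),
}


def step(state, event):
    """Look up the transition for a state/event pair."""
    return TRANSITIONS[(state, event)]


def read_aloud(state, event):
    """Render one table row as an If/And/Then sentence."""
    next_state, action = step(state, event)
    verb = "stay in" if next_state == state else "go to"
    return (
        f"If we are in the {state} state, and we get a {event} event, "
        f"then we {verb} the {next_state} state, "
        f"and we invoke the {action} action."
    )


for state, event in TRANSITIONS:
    print(read_aloud(state, event))
```

Each printed line is one row of the table, in the same shape as the two sentences above.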
This strange similarity caused me to realize that GIVEN/WHEN/THEN is simply a state transition, and that BDD is really just a way to describe a finite state machine. Clearly “GIVEN” is the current state of the system to be explored. “WHEN” describes an event or stimulus to that system. “THEN” describes the resulting state of the system. GIVEN/WHEN/THEN is nothing more than a description of a state transition, and the sum of all the GIVEN/WHEN/THEN statements is nothing more than a Finite State Machine.
Perhaps if I rephrase this you might see why I think this is a bit more than a ho-hum.
Some of the brightest minds in our industry, people like Dan North, Dave Astels, David Chelimsky, Aslak Hellesoy, and a host of others, have been pursuing the notion that if we use a better language to describe automated requirements, we will improve the way we think about those requirements, and therefore write better requirements. The better language that they have chosen and used for the last several years uses the Given/When/Then form which, as we have seen, is a description of a finite state machine. And so it all comes back to Turing. It all comes back to marks on a tape. We’ve decided that the best way to describe the requirements of a system is to describe it as a Turing machine.
OK, perhaps I overdid the irony there. Clearly we don’t need to resort to marks on a tape. But still there is a grand irony in all this. The massive churning of neurons struggling with this problem over years and decades has reconnected the circle to the thoughts of that brave pioneer from the 1940s.
But enough of irony. Is this useful? I think it may be. You see, one of the great benefits of describing a problem as a Finite State Machine (FSM) is that you can complete the logic of the problem. That is, if you can enumerate the states and the events, then you know that the number of paths through the system is no larger than S * E. Or, rather, there are no more than S*E transitions from one state to another. More importantly, enumerating them is simply a matter of creating a transition for every combination of state and event.
One of the more persistent problems in BDD (and TDD for that matter) is knowing when you are done. That is, how do you know that you have written enough scenarios (tests)? Perhaps there is some condition that you have forgotten to explore, some pathway through the system that you have not described.
This problem is precisely the kind of problem that FSMs are very good at resolving. If you can enumerate the states and events, then you know the number of paths through the system. So if Given/When/Then statements are truly nothing more than state transitions, all we need to do is enumerate the GIVENs and the WHENs. The number of scenarios will simply be the product of the two counts.
To my knowledge, (which is clearly inadequate) this is not something we’ve ever tried before at the level of a business requirements document. But even if we have, the BDD mindset may make it easier to apply. Indeed, if we can formally enumerate all the Givens and Whens, then a tool could determine whether our requirements document has executed every path, and could find those paths that we had missed.
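A minimal sketch of what such a check might look like, in Python: take the list of Givens, the list of Whens, and the scenarios written so far, then report every Given/When combination that no scenario covers. The function name `missing_scenarios` and the tuple layout of a scenario are assumptions made for this illustration, not the interface of any existing BDD tool. The sample data reuses the turnstile, with only the two LOCKED scenarios written.

```python
from itertools import product


def missing_scenarios(givens, whens, scenarios):
    """Return every (given, when) pair that no scenario covers."""
    covered = {(given, when) for given, when, _then in scenarios}
    return [pair for pair in product(givens, whens) if pair not in covered]


givens = ["LOCKED", "UNLOCKED"]
whens = ["COIN", "PASS"]

# Scenarios written so far, as (given, when, then) triples.
scenarios = [
    ("LOCKED", "COIN", "go to UNLOCKED and invoke Unlock"),
    ("LOCKED", "PASS", "stay in LOCKED and invoke Alarm"),
]

print(len(givens) * len(whens))  # upper bound S * E on the scenario count
print(missing_scenarios(givens, whens, scenarios))
```

Here S * E is 4, two scenarios exist, and the check reports the two UNLOCKED combinations as unexplored. The hard part in practice is the step this sketch takes for granted: agreeing on what the complete lists of Givens and Whens are.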
So, in conclusion, TDD has led us on an interesting path. TDD was adopted as a way to help us phrase low level requirements and drive the development of software based on those requirements. BDD, a variation of TDD, was created to help us think better about higher level requirements, and drive the development of systems using a language better than unit tests. But BDD is really a variation of Finite State Machine specifications, and FSMs can be shown, mathematically, to be complete. Therefore, we may have a way to conclusively demonstrate that our requirements are complete and consistent. (Apologies to Gödel).
In the end, the BDDers may have been right that language improves the way we think about things. Certainly in my silly case, it was the language of BDD that resonated with the language of FSM.
Posted in Uncle Bob's Blatherings, Agile Methods, Clean Code
Comments
Christian B. Hauknes 43 minutes later:
Recently I have had a growing feeling that there is something fundamentally wrong with the way we teach and practice TDD. Most places I come, I see people struggle with TDD, and the tests seem to cost more in maintenance than they return in value.
Last week I was at a company that just started working with automated tests about 8-9 months ago. They had a huge core implemented in C++ where they had written a bunch of characterization tests, and quite a bit of new code written “TDD’ish” in C#. People were highly enthusiastic about TDD, but they obviously had problems, and it seemed the characterization tests actually gave more long term value than the TDD-tests.
I frequently see people making the same mistakes when it comes to TDD. People test methods and implementations, which leads to tests that are too closely coupled to the implementation, difficult to maintain and refactor, and seem to slow down development and changes. After working TDD-style for a couple of years, people usually get better. They start decoupling their tests from their implementation, specifying functionality rather than writing tests.
I think this is due to the mental model that the languages/tools (e.g. xUnit) portray, and the way we normally teach TDD. xUnit uses tests and asserts, linking what we do closely to implementation. The bowling-game example gives a result where you have one class – one test class. Together, this leads to years of a distorted mental image of what’s important and what you should be focusing on.
I think BDD gives a better mental picture, and goes a long way to make this painful journey easier. It is easier to get people to write good tests without having to go through years of pain.
Erlend Oftedal about 1 hour later:
I completely agree with Christian. I often hear people starting out with TDD saying things like “but what should I test?” and naming their tests [methodname]Test and Test1, Test2, etc. This doesn’t really help you to understand the system by reading the tests.
BDD encourages tests that specify the behaviour of your code. Often this means using test names that describe a piece of functionality, like “ShouldThrowExceptionWhen….”. By using this kind of syntax, you can use tools like agiledox to export a list of expected behaviour in your system. It’s also easy to think of tests to write, because all you have to do is describe how you expect the system/component to behave.
The most interesting parts of BDD, though, might be things like RSpec and Cucumber, which allow you to write tests in natural language in the Given/When/Then format. I think this can really help to bridge the gap between business and developers.
JoshG about 1 hour later:
When Stacy Curl (who has worked with Dan and Liz Keogh quite a bit) was working with me in Sydney, we discussed BDD in terms of sets. We played with analysing the statements in order to both optimize them (if two tests have the same preconditions and events, then they can be collapsed at runtime into one test with a union of postconditions), and to assist in determining completeness.
Also, RSpec includes the rbehave tool, which allows tests to be expressed in natural (but quite structured) language.
llewellyn falco about 1 hour later:
I have a different take on the BDD / TDD theory. TDD is like making Legos: little blocks, that you know work, that fit together to make a program. It is a massively more effective process than the current programming “technique” of hack till you think it works. But invariably you end up with some bricks you didn’t need, or it can be hard to figure out which small bricks you want to start with in order to achieve the end result.
It’s a bottom-up approach.
BDD is a top-down approach. Create the outline of the feature you want in the end. Then you need a block, so you write the unit test for that block, but that block needs another block, so another test, etc… until the outline is full. The BDD test can take a long time to pass (1-2 hours sometimes). It gives a good idea of when you are done.
Even my friends who still prefer to practice TDD use and appreciate the first BDD test (usually referred to as a lighthouse test), even if they then build from the bottom up.
But you can do BDD with xUnit; you don’t need it to be English-readable (but you do need it to be READABLE!!!). The big difference is that unit tests can easily change with implementation, but behavior tests should only be changing with user requirements, which are more stationary.
Sebastian Kübeck about 2 hours later:
I don’t know if I got that right, but what you are saying here is that when we do TDD (or BDD), we are programming the system twice: once declaratively and once algorithmically. Now that you mention it, it seems obvious. If you have a complete set of requirements for a system, you have already programmed it. Maybe that closes the circle to Frederick Brooks when he says that we program things twice anyway. So instead of throwing one away, we are creating two in parallel, verifying each other, when we practice TDD (or BDD).
Tim Ross about 3 hours later:
I have recently been teaching a junior developer test-first development using BDD. He had no previous TDD or unit testing experience but managed to pick it up very quickly. He was able to determine which tests to write, simply by considering the behaviour that was required.
Learning TDD is difficult if you don’t know how to write effective tests. BDD seems like a more natural approach for learning test-first development.
Brian Slesinsky about 3 hours later:
It’s an interesting insight but following up on it could easily go in the wrong direction. Tests are more readable than code because they are specific examples rather than abstract rules. To cover all possible state transitions in a formal way would require adding abstraction and logic until the tests are no easier to understand than the code being tested and would no longer be useful for communicating with customers.
On the other hand, perhaps if a system were described as a formal state machine, perhaps it could be used to automatically generate new examples? Suppose the customer starts with some examples. The programmer generalizes them, and then extracts new examples, and then we ask the customer whether they are correct. Based on the results, we change the production code and generate new examples, and so on, until we’re satisfied that we’ve generated enough examples to ensure that the program is correct. (I believe Agitar does something like this.)
Vidar Kongsli about 15 hours later:
I think one of the challenges for people starting out with TDD is that they first and foremost perceive TDD as a testing discipline. TDD is much more; it’s about how you handle your requirements and it is about application design. BDD, I think, helps in the sense that it more explicitly focuses on handling requirements.
Davy Brion about 19 hours later:
We actually have a tool which makes sure that all of our functional requirements are covered by tests. I wrote an introduction to the tool on my blog:
http://davybrion.com/blog/2008/11/genesis-bridging-the-gap-between-requirement-and-code/
Brian Maso 1 day later:
The S * E equation is too simple for real-world stateful systems, because in black box testing there are usually many state transitions which are hidden from external “black box” observability. You have to execute several interactions on a system under test in order to deduce that specific internal state transitions were effected as expected.
I find TDD really difficult to use for very coarse-grained component interfaces, remote services especially (REST-ish or more free-form SOAP-y), for this reason. The number of tests needed for “complete” test coverage of service behavior is huge.
On IBM developerWorks I wrote an article on Operation State Modeling, which is a formal requirement description where you describe the logical behavior of individual operations in terms of initial state, input, output, and final state. It’s basically an expanded form of the Given/When/Then BDD idiom. There are algorithms for automatically breaking down such descriptions into a set of individual Given/When/Then tests. So from a much more concise (cheaper to produce and maintain) description you can get your full test suite.
Sebastian Kübeck 1 day later:
@Brian you are perfectly right. The problem is that you have to cover a huge state space when you are testing against coarse-grained systems. That is the reason why it is way simpler to write tests against small units, as the state space is much more limited there. Bob is taking an idealistic approach here. When you are doing TDD, you are not covering the whole state space and thus never reach S*E in reality. Your Operation State Modeling approach is very interesting! I think we will see a couple of great new tools like this in the near future.
Mario Gleichmann 3 days later:
When I first got in contact with BDD, there weren’t any user stories following the Given/When/Then pattern, but a focus on specifying behaviour instead of verifying state, using a more specification-oriented language (in contrast to a more test-centric vocabulary).
Like Christian said, BDD tries to stay away from ‘testing’ implementation-specific details, because this may become too big a barrier for future refactoring. Accordingly, testing ‘state’ is kind of banished within BDD – instead it’s the observable behaviour of the system you are interested in, no matter if this behaviour spans multiple classes or is located within a single method. This means that you first of all specify interactions instead of looking at some state.
Maybe I miss the point, but from this point of view, BDD is not about requirements in terms of describing a system of (discrete) states and some well-defined transitions between them (very abstractly speaking) but more about specifying the required behaviour. Within this view, I would see BDD more in the tradition of Design by Contract than in the tradition of TDD (yet I know that it’s kind of an evolutionary step, coming from TDD; this is only my very own view on BDD).
http://etorreborre.blogspot.com 3 days later:
Hi Bob,
There are tools to generate scenarios from FSMs: www.smartesting.com. You create UML state machines and the tool defines the corresponding business “targets” (a Given/When/Then) and tries to generate a scenario from the initial state, covering the target.
And then you can show that your requirements are covered.
Nice isn’t it?
The only thing with this approach is that the primary input is a model, which is not always understandable by everyone. The generated scenarios on the other hand can be made human-readable, but they’re now second-hand artifacts in the process.
Eric.
Scott Bellware 6 days later:
I think it’s a shame that the notion of BDD is coupled to acceptance testing and to GWT. I still find great value in RSpec’s context/specification style, even in many cases for acceptance testing where GWT is overkill and overly-elaborate.
I can’t help but think that there’s a missing dollop of pragmatism here in regards to GWT and that this is one of those technology fascination/distraction things that we’re going to have to swing back to the middle on after possibly a couple of years of lost opportunity.
Edward Vielmetti about 1 month later:
The challenge here is that in systems where people are involved, there are bound to be states which are either poorly specifiable or poorly described. Taking your example up top, if poor Bob is having his wages garnished for child support, he’s not going to see his whole paycheck. Iterate across all unpredictable non-systematic interaction.
A second challenge in state machine derived analysis is that when the system is moved to a new environment, some assumptions and hidden states become visible. If, for instance, you are building a file system that has Unix filesystem semantics, you may not be able to determine all of the necessary tests in some naive way until you run into a complex application that pushes the file system hard. The example I have of that is the Apollo DOMAIN port of Usenet News and sendmail, where the Apollo’s mandatory, implicit file locks were not the same as what Unix vanilla file systems expected. (This is circa 1993/1994, based on the Apollo FAQ and experience I had in the late 80s.)
You see similar behavior all the time with systems where the test conditions are for local area network latency and bandwidth constraints, but the real world installation has lag and packet loss and unpredictable round trip times.
Portability to a new machine (new virtual machine, new network architecture, new file system, new operating system, new device) is more than just writing every exhaustive test to reach every corner case – it’s in anticipating which elements of the system are invariants and which ones have to be adjusted. That’s more architecture than testing.
Chris Kleeschulte about 1 month later:
This topic keenly interested me since it parallels research being done in other disciplines that may be a bit ahead of computer science/ software development in taking a concept out of someone’s head, removing the ambiguity, and getting another entity to perform the transformations described. I love the idea of “getting down to brass tacks” and integration testing from the top down. It seems to me that BDD does this well or at least puts the emphasis in the right place. If we can all agree on the ultimate goal of software development, which is producing software that does exactly what it is intended to do, then BDD seems like a good experiment in bringing the concepts of theoretical computer science (PDA’s, FSM, TM) straight up to the stakeholder. BDD, essentially, helps remove ambiguity from the stakeholder’s ideas about their system entities and helps them think clearly leading to better, faster, cheaper software in the end. Software developers can also leverage rigorous mathematical research to prove that the transforms will halt and provide desired output.
bonder about 1 month later:
@Edward,
I think that on top of BDD/FSM you also need to be considering the sheer complexity of the system and finding ways to isolate the complexity.
In your case, Bob’s garnished wages would be a reduction taken from his “base pay” (a business term that the project has developed as part of its shared language).
So the BDD test written above is a little simplistic and could be re-written:
GIVEN an employee named Bob making $12 per hour.
WHEN Bob works 40 hours in one week;
THEN Bob's base pay for the pay period is $480.
And one for the reduction engine would be:
GIVEN an employee named Bob with a garnishment of 50%.
WHEN Bob's base pay is $480
THEN Bob's adjusted pay will be $240
AND $240 will be credited to the associated child support payment account.
Or something like that.
I like to keep in mind what Roger Sessions says about system complexity (paraphrasing here): it is easier to manage the complexity of three sub-systems with 10 states each than one system with 1,000 states.
Paul 10 months later:
“More importantly, enumerating them is simply a matter of creating a transition for every combination of state and event.”
Ok, so we already said that it is a FINITE state machine. But according to Godel’s incompleteness theorem, only trivial systems can be described both completely and without contradiction. As the complexity of this grows, [ i.e. number of states and events ] won’t we hit a limit to our ability to formally describe the system with a grammar?
Maybe my point is academic. I guess the real issue is whether or not this approach is useful. How massive would a project have to be before BDD breaks down? I guess we have to measure relative gains against where our existing approaches break down. My point is, there’s no silver bullet.
“One of the more persistent problems in BDD (and TDD for that matter) is knowing when you are done.”
Oh lord. Just tell me that we’re not going to have to solve the halting problem, too. ;)
peter about 1 year later:
Hi! So am I missing something? Suppose we use the “employee Bob” example: GIVEN an employee named Bob making $12 per hour. WHEN Bob works 40 hours in one week; THEN Bob will be paid $480 on Friday evening. Assuming that is now the SPEC, this would mean that ONLY an employee named BOB would get that pay. Getting this to pass would only prove that the code works if the employee is BOB. What about an employee named Peter (me :-)) on the same pay scale? Seems very brittle…
Kirk about 1 year later:
I think BDD is definitely a natural evolution of TDD in ways of test thinking and syncing up with real customer requirements. The biggest problem I’ve seen when dealing with teams using TDD is the lack of clean code when it comes to testing. Teams want to write quick tests and forget about them only leaving the code to rot and a tangled mess for the next developer to try and sift through. I’ve tried to solve this by creating a simple BDD framework in C# which I am currently using on an XP team with success. Each test is only ever 3 lines of code:
[Test]
public void CanWithDrawMoneyFromAPersonsAccount()
{
    Given(x => x.APersonWithAnAccount())
        .When(x => x.FiveHundredDollarsIsWithdrawn())
        .Then(x => x.TheAccountBalanceShouldReflectTheWithdrawal());
}
The test state is stored in another class separate from the actual test fixture which allows the tester and reader to focus on the meaning of the test scenario (instead of being distracted by the test setups and mocks). It also aids in refactoring and promotes code re-use.
If anyone is doing C# and is interested in this framework (CSSpec) it is available here: http://code.google.com/p/dotnetxp/downloads/list
peter about 1 year later:
@Kirk.. seems clean. Here’s a question.. what happens with Multiple Givens and Whens and Thens….?
Some of the problems encountered: users write the “spec” thinking in implementation details (look at the example in this article). Using that example, should the test actually prove that it ONLY works for BOB? What about employee John, etc.? At which point, and how, do we transition into the abstractions?